Chapter 11 Regex: typo in the code example. Top-level domain validation expression #277

Open
nikolayrantsev opened this issue Dec 28, 2020 · 0 comments

Hello,
Thank you all so much for this course!

  1. I found a small typo in Chapter 11, in the section 'Extracting data using regular expressions':
    ...
    Here is our new regular expression:
    [a-zA-Z0-9]\S*@\S*[a-zA-Z]
    ... followed by the code block that uses this expression:
    If we use this expression in our program, our data is much cleaner:
# Search for lines that have an at sign between characters
# The characters must be a letter or number
import re
hand = open('mbox-short.txt')
for line in hand:
    line = line.rstrip()
    x = re.findall('[a-zA-Z0-9]\S+@\S+[a-zA-Z]', line)
    if len(x) > 0:
        print(x)

# Code: http://www.py4e.com/code3/re07.py

Please replace each "+" with "*" in the line x = re.findall('[a-zA-Z0-9]\S+@\S+[a-zA-Z]', line) so that the code matches the expression given in the text.
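To illustrate why the difference matters, here is a minimal sketch comparing the two patterns. The sample strings are made up for illustration and are not taken from mbox-short.txt.

```python
import re

# Pattern as given in the chapter text ("*" means zero or more characters)
STAR_PATTERN = r'[a-zA-Z0-9]\S*@\S*[a-zA-Z]'
# Pattern as it appears in the code example ("+" means one or more characters)
PLUS_PATTERN = r'[a-zA-Z0-9]\S+@\S+[a-zA-Z]'

# Hypothetical sample strings, chosen only to show the difference
samples = ['a@b', 'ab@cd']

for text in samples:
    print(text, re.findall(STAR_PATTERN, text), re.findall(PLUS_PATTERN, text))
```

With "+", at least two characters are needed on each side of the at sign, so 'a@b' is matched only by the "*" version, while 'ab@cd' is matched by both.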

  2. Interestingly, running the code with the corrected expression [a-zA-Z0-9]\S*@\S*[a-zA-Z] still returns lines such as:
    ['dhorwitz@david-horwitz-6:~/branchManagemnt/sakai_2-5-x']

I would appreciate an explanation of how to improve the expression so that it filters out records that do not meet the email address criterion of having a top-level domain.
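One possible direction, as a rough sketch rather than the course's own answer: restrict the characters allowed after the at sign to letters, digits, dots, and hyphens, and require the match to end with a dot followed by at least two letters. The pattern and the helper name find_emails below are my own assumptions, and this is not a complete email validator.

```python
import re

# Assumed pattern (not from the book): a local part, "@", a domain made of
# letters, digits, dots, and hyphens, then a dot and an alphabetic
# top-level domain of at least two letters.
EMAIL_WITH_TLD = re.compile(r'[a-zA-Z0-9]\S*@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}')

def find_emails(line):
    """Return all substrings of line that look like an address with a TLD."""
    return EMAIL_WITH_TLD.findall(line)

# The record from this issue has no dot after the at sign, so it is dropped
print(find_emails('dhorwitz@david-horwitz-6:~/branchManagemnt/sakai_2-5-x'))
# An ordinary address still matches
print(find_emails('From stephen.marquard@uct.ac.za Sat Jan  5 09:14:16 2008'))
```

The first call should print an empty list and the second should print the single address, since only the second line contains a dot followed by letters after the at sign.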

Thank you!
