3. Regular expressions
3.1. Import
>>> import re
3.2. Usage
Text processing,
Finding patterns,
Data cleaning,
Data validation.
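3.3. Functions within the re package
Function | Meaning and usage | Result |
---|---|---|
re.match | Match the pattern at the beginning of the string | Match object, or None if there is no match |
re.search | Find the first occurrence anywhere in the string | Match object, or None if there is no match |
re.split | Split the string by a separator pattern | List |
re.findall | Find all non-overlapping occurrences | List |
re.finditer | Find all non-overlapping occurrences | Iterator of Match objects |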
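The functions listed above can be tried on a short string. The snippet below is a minimal sketch; the sample text and the variable names are made up for illustration. Note that re.match and re.search return a Match object (or None), which behaves as true or false in a condition.

```python
import re

# Made-up sample text, used only for illustration.
text = "2024-01-15: backup ok; 2024-01-16: backup failed"

# re.match only tries the pattern at the very start of the string.
start = re.match(r"\d{4}", text)
print(start.group())
print(re.match(r"backup", text))   # None, "backup" is not at position 0

# re.search scans the whole string and stops at the first hit.
first = re.search(r"backup (\w+)", text)
print(first.group(1))

# re.split cuts the string wherever the separator pattern matches.
parts = re.split(r";\s*", text)
print(parts)

# re.findall returns a list of strings (whole matches when there are no groups).
dates = re.findall(r"\d{4}-\d{2}-\d{2}", text)
print(dates)

# re.finditer yields Match objects lazily, so positions are available too.
spans = [(m.group(), m.start()) for m in re.finditer(r"ok|failed", text)]
print(spans)
```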
3.4. Character classes
Class | Meaning |
---|---|
. | Any character except a newline |
^ | Beginning of the string, or of each line with re.MULTILINE |
$ | End of the string, or of each line with re.MULTILINE |
* | Zero or more occurrences |
+ | One or more occurrences |
? | Zero or one occurrence |
{n} | Exactly n occurrences |
{n,m} | Between n and m occurrences (no space after the comma) |
\d | Digit - same as [0-9] for ASCII text |
\D | Non-digit - same as [^0-9] |
\w | Word character - same as [a-zA-Z0-9_] for ASCII text |
\W | Non-word character - same as [^a-zA-Z0-9_] |
\s | Whitespace character - same as [ \t\n\r\f\v] |
[abc] | Any one of the characters a, b or c |
[a-z] | Any one character in the range a to z |
() | Group |
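A few of the entries above, combined into small patterns. This is only an illustrative sketch: the sample strings and variable names are invented, and raw strings (r"...") are used so that backslashes reach the regex engine unchanged.

```python
import re

# Anchors plus a quantifier: the whole string must be 2 to 4 digits.
four_digits = re.search(r"^\d{2,4}$", "2024") is not None
five_digits = re.search(r"^\d{2,4}$", "20245") is not None
print(four_digits, five_digits)

# ? makes the preceding character optional.
spellings = re.findall(r"colou?r", "color colour")

# \w+ collects runs of word characters; "-" and "," are not word characters.
tokens = re.findall(r"\w+", "foo_1, bar-2")

# \s+ treats any run of whitespace as one separator.
fields = re.split(r"\s+", "a  b\tc")

# [abc]+ matches runs made only of a, b or c.
runs = re.findall(r"[abc]+", "cab driver")

# () captures parts of the match for later use.
first_page, last_page = re.search(r"(\d+)-(\d+)", "pages 10-20").groups()

print(spellings, tokens, fields, runs, first_page, last_page)
```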
3.5. Exercise - part 1
Create a function check_ip.
The function checks whether an IP address is correct.
Check the function on the following dictionary of hosts:
{
'127.0.0.1': {'correct': None},
'8.8.8.8': {'correct': None},
'x.x.x.x': {'correct': None}
}
* In place of **x.x.x.x** put any address from your network,
* Set the **correct** flag for each host accordingly.
Hint
You may use the following expression, or find or create a more precise one:
^(?:[0-9]{1,3}\.){3}[0-9]{1,3}$
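To see why a more precise expression may be worth the effort, the hinted pattern can be tried on a few candidates. This is a sketch rather than a solution to the exercise; HINT_PATTERN, the candidate list and the results dictionary are names invented here.

```python
import re

# The expression from the hint, compiled once for reuse.
HINT_PATTERN = re.compile(r"^(?:[0-9]{1,3}\.){3}[0-9]{1,3}$")

# Made-up candidates: two valid addresses and three invalid ones.
candidates = ["127.0.0.1", "8.8.8.8", "999.999.999.999", "1.2.3", "a.b.c.d"]

# Map each candidate to whether the pattern accepts it.
results = {c: HINT_PATTERN.match(c) is not None for c in candidates}

for candidate, accepted in results.items():
    print(candidate, accepted)
```

The pattern checks only the shape (four groups of one to three digits), so 999.999.999.999 is accepted even though each octet of an IPv4 address must be in the range 0 to 255.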
3.6. Exercise - part 2
Create a function check_email.
The function checks whether an email address is correct.
3.7. Exercise - part 3
Using the requests library:
Download the content of a page,
Get all HTML tags,
Get all human-readable words.
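As a starting point, the two extraction steps can be tried offline on a hard-coded string. This is a rough sketch under assumptions: the html value is an invented stand-in for a downloaded page body (for example the text attribute of the response returned by requests.get), and regular expressions handle only simple markup, not every HTML construct.

```python
import re

# Invented stand-in for a downloaded page body.
html = "<html><body><h1>Hello regex</h1><p>Tags and <b>words</b>.</p></body></html>"

# Every opening or closing tag: "<", an optional "/", a letter, then anything up to ">".
tags = re.findall(r"</?[a-zA-Z][^>]*>", html)

# Cut the tags out, join the remaining pieces, then collect runs of letters as words.
text_only = " ".join(re.split(r"<[^>]+>", html))
words = re.findall(r"[a-zA-Z]+", text_only)

print(tags)
print(words)
```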
3.8. Exercise - part 4
Using the collections module:
Count the occurrences of each word from Exercise part 3 (second point),
Get the 10 most frequent words,
Get the 70 most frequent words.
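One possible approach is collections.Counter, sketched here on a tiny invented word list instead of the real output of part 3. The most_common method takes the number of entries to return; if fewer distinct words exist, it returns all of them.

```python
from collections import Counter

# Invented word list standing in for the words extracted in part 3.
words = ["regex", "Python", "regex", "match", "python", "Regex"]

# Lowercase each word so that "Regex" and "regex" are counted together.
counts = Counter(w.lower() for w in words)

print(counts["regex"])
print(counts.most_common(2))

# Asking for more entries than there are distinct words returns all of them.
top_70 = counts.most_common(70)
print(len(top_70))
```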