regex

SamadXYZ · Jul 8, 2021 · 3a0c26c
1 parent d255277
commit 3a0c26c
Showing 1 changed file with 55 additions and 37 deletions.
diff --git a/18_Day_Regular_expressions/18_regular_expressions.md b/18_Day_Regular_expressions/18_regular_expressions.md
@@ -21,7 +21,7 @@
 - [📘 Day 18](#-day-18)
   - [Regular Expressions](#regular-expressions)
     - [The *re* Module](#the-re-module)
-    - [Functions in *re* Module](#functions-in-re-module)
+    - [Methods in *re* Module](#methods-in-re-module)
       - [Match](#match)
       - [Search](#search)
       - [Searching for All Matches Using *findall*](#searching-for-all-matches-using-findall)
@@ -37,6 +37,9 @@
     - [Quantifier in RegEx](#quantifier-in-regex)
     - [Cart ^](#cart-)
   - [💻 Exercises: Day 18](#-exercises-day-18)
+    - [Exercises: Level 1](#exercises-level-1)
+    - [Exercises: Level 2](#exercises-level-2)
+    - [Exercises: Level 3](#exercises-level-3)
 
 # 📘 Day 18
 
@@ -52,11 +55,11 @@ After importing the module we can use it to detect or find patterns.
 import re
 ```
 
-### Functions in *re* Module
+### Methods in *re* Module
 
 To find a pattern, we use a set of *re* functions that allow us to search for a match in a string.
 
-* *re.match()*: searches only in the beginning of the first line of the string and returns matched objects if found, else returns none. 
+* *re.match()*: searches only at the beginning of the first line of the string and returns a match object if found; otherwise it returns None.
 * *re.search*: Returns a match object if there is one anywhere in the string, including multiline strings.
 * *re.findall*: Returns a list containing all matches
 * *re.split*:	Takes a string, splits it at the match points, returns a list
@@ -89,6 +92,16 @@ print(substring)       # I love to teach
 
 As you can see from the example above, the pattern we are looking for (or the substring we are looking for) is *I love to teach*. The match function returns an object **only** if the text starts with the pattern.
 
+```py
+import re
+
+txt = 'I love to teach python and javaScript'
+match = re.match('I like to teach', txt, re.I)
+print(match)  # None
+```
+
+The string does not start with *I like to teach*, therefore there was no match and the match method returned None.
+
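Because *re.match* returns None when the string does not start with the pattern, it is safer to check the result before calling methods such as *span()* on it. The following is a minimal sketch of that check; the variable names `at_start` and `not_at_start` are illustrative and not part of the original lesson.

```python
import re

txt = 'I love to teach python and javaScript'

# re.match only succeeds when the pattern matches at the very start of the string
at_start = re.match('I love to teach', txt, re.I)
not_at_start = re.match('python', txt, re.I)

# Check for None before calling methods such as span() or group()
if at_start is not None:
    print(at_start.span())  # (0, 15)

print(not_at_start)  # None
```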
 #### Search
 
 ```py
@@ -129,10 +142,9 @@ I recommend python for a first programming language'''
 # It return a list
 matches = re.findall('language', txt, re.I)
 print(matches)  # ['language', 'language']
-
 ```
 
-As you can see, the word language was found two times in the string. Let's practice some more.
+As you can see, the word *language* was found two times in the string. Let us practice some more.
 Now we will look for both Python and python words in the string:
 
 ```py
@@ -145,7 +157,7 @@ print(matches)  # ['Python', 'python']
 
 ```
 
-Since we are using *re.I* both lowercase and uppercase letters are included. If we don't have that flag, then we will have to write our pattern differently. Let's check it out:
+Since we are using *re.I*, both lowercase and uppercase letters are included. If we do not have the *re.I* flag, then we will have to write our pattern differently. Let us check it out:
 
 ```py
 txt = '''Python is the most beautiful language that a human being has ever created.
@@ -173,24 +185,23 @@ match_replaced = re.sub('[Pp]ython', 'JavaScript', txt, re.I)
 print(match_replaced)  # JavaScript is the most beautiful language that a human being has ever created.
 ```
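One detail worth checking when a flag is passed to *re.sub*: its signature is `re.sub(pattern, repl, string, count=0, flags=0)`, so a flag given as the fourth positional argument is interpreted as *count*, not as *flags*. The sketch below uses a made-up sample string (an assumption, not text from the lesson) to show the difference.

```python
import re

# Hypothetical sample text, chosen so the difference is visible
txt = 'python python python Python'

# re.I has the integer value 2, so here it is taken as count=2
# and the search stays case-sensitive
positional = re.sub('python', 'JS', txt, re.I)
print(positional)  # JS JS python Python

# Passing the flag by keyword applies it as intended
keyword = re.sub('python', 'JS', txt, flags=re.I)
print(keyword)  # JS JS JS JS
```

Writing `flags=re.I` explicitly avoids the ambiguity.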
 
-Let's add one more example. The following string is really hard to read unless we remove the % symbol. Replacing the % with an empty string will clean the text.
+Let us add one more example. The following string is really hard to read unless we remove the % symbol. Replacing the % with an empty string will clean the text.
 
 ```py
 
-txt = '''%I a%m te%%a%%che%r% a%n%d %% I l%o%ve te%ach%ing.
+txt = '''%I a%m te%%a%%che%r% a%n%d %% I l%o%ve te%ach%ing. 
 T%he%re i%s n%o%th%ing as r%ewarding a%s e%duc%at%i%ng a%n%d e%m%p%ow%er%ing p%e%o%ple.
-I fo%und te%a%ching m%ore i%n%t%er%%es%ting t%h%an any other %jobs.
+I fo%und te%a%ching m%ore i%n%t%er%%es%ting t%h%an any other %jobs. 
 D%o%es thi%s m%ot%iv%a%te %y%o%u to b%e a t%e%a%cher?'''
 
 matches = re.sub('%', '', txt)
 print(matches)
 ```
 
 ```sh
-I am teacher and  I love teaching.
-There is nothing as rewarding as educating and empowering people.
-I found teaching more interesting than any other jobs.
-Does this motivate you to be a teacher?
+I am teacher and I love teaching.
+There is nothing as rewarding as educating and empowering people. 
+I found teaching more interesting than any other jobs. Does this motivate you to be a teacher?
 ```
 
 ## Splitting Text Using RegEx Split
@@ -260,15 +271,15 @@ print(matches)  # ['Apple', 'apple']
 
 ![Regular Expression cheat sheet](../images/regex.png)
 
-Let's use examples to clarify the meta characters above 
+Let us use examples to clarify the meta characters above.
 
 ### Square Bracket
 
-Let's use square bracket to include lower and upper case
+Let us use square brackets to match both lower and upper case:
 
 ```py
 regex_pattern = r'[Aa]pple' # this square bracket mean either A or a
-txt = 'Apple and banana are fruits. An old cliche says an apple a day a doctor way has been replaced by a banana a day keeps the doctor far far away. '
+txt = 'Apple and banana are fruits. An old cliche says an apple a day a doctor way has been replaced by a banana a day keeps the doctor far far away.'
 matches = re.findall(regex_pattern, txt)
 print(matches)  # ['Apple', 'apple']
 ```
@@ -277,7 +288,7 @@ If we want to look for the banana, we write the pattern as follows:
 
 ```py
 regex_pattern = r'[Aa]pple|[Bb]anana' # this square bracket means either A or a
-txt = 'Apple and banana are fruits. An old cliche says an apple a day a doctor way has been replaced by a banana a day keeps the doctor far far away. '
+txt = 'Apple and banana are fruits. An old cliche says an apple a day a doctor way has been replaced by a banana a day keeps the doctor far far away.'
 matches = re.findall(regex_pattern, txt)
 print(matches)  # ['Apple', 'banana', 'apple', 'banana']
 ```
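For comparison, here is a small sketch showing that, for this particular text, the character sets and the *re.I* flag give the same result. The names `with_sets` and `with_flag` are illustrative only.

```python
import re

txt = 'Apple and banana are fruits. An old cliche says an apple a day a doctor way has been replaced by a banana a day keeps the doctor far far away.'

# Character sets spell out the accepted first letters explicitly
with_sets = re.findall(r'[Aa]pple|[Bb]anana', txt)

# The re.I flag ignores case for the whole pattern instead
with_flag = re.findall(r'apple|banana', txt, flags=re.I)

print(with_sets)  # ['Apple', 'banana', 'apple', 'banana']
print(with_flag)  # ['Apple', 'banana', 'apple', 'banana']
```

The two approaches are not equivalent in general: the flag version would also match spellings such as `APPLE`, while the character sets only allow the first letter to vary.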
@@ -288,18 +299,18 @@ Using the square bracket and or operator , we manage to extract Apple, apple, Ba
 
 ```py
 regex_pattern = r'\d'  # d is a special character which means digits
-txt = 'This regular expression example was made on December 6,  2019.'
+txt = 'This regular expression example was made on December 6,  2019 and revised on July 8, 2021'
 matches = re.findall(regex_pattern, txt)
-print(matches)  # ['6', '2', '0', '1', '9'], this is not what we want
+print(matches)  # ['6', '2', '0', '1', '9', '8', '2', '0', '2', '1'], this is not what we want
 ```
 
 ### One or more times(+)
 
 ```py
 regex_pattern = r'\d+'  # d is a special character which means digits, + mean one or more times
-txt = 'This regular expression example was made on December 6,  2019.'
+txt = 'This regular expression example was made on December 6,  2019 and revised on July 8, 2021'
 matches = re.findall(regex_pattern, txt)
-print(matches)  # ['6', '2019'] - now, this is better!
+print(matches)  # ['6', '2019', '8', '2021'] - now, this is better!
 ```
 
 ### Period(.)
@@ -332,34 +343,34 @@ Zero or one time. The pattern may not occur or it may occur once.
 
 ```py
 txt = '''I am not sure if there is a convention how to write the word e-mail.
-Some people write it email others may write it as Email or E-mail.'''
+Some people write it as email others may write it as Email or E-mail.'''
 regex_pattern = r'[Ee]-?mail'  # ? means here that '-' is optional
 matches = re.findall(regex_pattern, txt)
 print(matches)  # ['e-mail', 'email', 'Email', 'E-mail']
 ```
 
 ### Quantifier in RegEx
 
-We can specify the length of the substring we are looking for in a text, using a curly bracket. Lets imagine, we are interested in a substring with a length of 4 characters:
+We can specify the length of the substring we are looking for in a text using curly brackets. Let us imagine we are interested in a substring with a length of 4 characters:
 
 ```py
-txt = 'This regular expression example was made on December 6,  2019.'
+txt = 'This regular expression example was made on December 6,  2019 and revised on July 8, 2021'
 regex_pattern = r'\d{4}'  # exactly four times
 matches = re.findall(regex_pattern, txt)
-print(matches)  # ['2019']
+print(matches)  # ['2019', '2021']
 
-txt = 'This regular expression example was made on December 6,  2019.'
+txt = 'This regular expression example was made on December 6,  2019 and revised on July 8, 2021'
 regex_pattern = r'\d{1,4}'   # 1 to 4 digits; no space is allowed inside the braces
 matches = re.findall(regex_pattern, txt)
-print(matches)  # ['6', '2019']
+print(matches)  # ['6', '2019', '8', '2021']
 ```
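Whitespace inside the curly brackets matters. In Python's *re* module, `{1,4}` is a quantifier, while `{1, 4}` (with a space) is treated as literal text, so it will not match one to four digits. A short sketch of the difference, using illustrative variable names:

```python
import re

txt = 'This regular expression example was made on December 6,  2019 and revised on July 8, 2021'

# No space inside the braces: a quantifier meaning 1 to 4 digits
with_quantifier = re.findall(r'\d{1,4}', txt)
print(with_quantifier)  # ['6', '2019', '8', '2021']

# A space inside the braces: the braces are matched literally, so nothing is found
with_space = re.findall(r'\d{1, 4}', txt)
print(with_space)  # []
```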
 
 ### Cart ^
 
 * Starts with
 
 ```py
-txt = 'This regular expression example was made on December 6,  2019.'
+txt = 'This regular expression example was made on December 6,  2019 and revised on July 8, 2021'
 regex_pattern = r'^This'  # ^ means starts with
 matches = re.findall(regex_pattern, txt)
 print(matches)  # ['This']
@@ -368,21 +379,23 @@ print(matches)  # ['This']
 * Negation
 
 ```py
-txt = 'This regular expression example was made on December 6,  2019.'
+txt = 'This regular expression example was made on December 6,  2019 and revised on July 8, 2021'
 regex_pattern = r'[^A-Za-z ]+'  # ^ in set character means negation, not A to Z, not a to z, no space
 matches = re.findall(regex_pattern, txt)
-print(matches)  # ['6,', '2019.']
+print(matches)  # ['6,', '2019', '8,', '2021']
 ```
 
 ## 💻 Exercises: Day 18
 
-  1. What is the most frequent word in the following paragraph?
+### Exercises: Level 1
+ 1. What is the most frequent word in the following paragraph?
 ```py
     paragraph = 'I love teaching. If you do not love teaching what else can you love. I love Python if you do not love something which can give you all the capabilities to develop an application what else can you love.'
 ```
 
 ```sh
-    [(6, 'love'),
+    [
+    (6, 'love'),
     (5, 'you'),
     (3, 'can'),
     (2, 'what'),
@@ -403,18 +416,21 @@ print(matches)  # ['6,', '2019.']
     (1, 'an'),
     (1, 'all'),
     (1, 'Python'),
-    (1, 'If')]
+    (1, 'If')
+    ]
 ```
 
-2. The position of some particles on the horizontal x-axis -12, -4, -3 and -1 in the negative direction, 0 at origin, 4 and 8 in the positive direction. Extract these numbers from this whole text and find the distance between the two furthest particles.
+2. The positions of some particles on the horizontal x-axis are -12, -4, -3 and -1 in the negative direction, 0 at origin, 4 and 8 in the positive direction. Extract these numbers from this whole text and find the distance between the two furthest particles.
 
 ```py
 points = ['-12', '-4', '-3', '-1', '0', '4', '8']
 sorted_points = [-12, -4, -3, -1, 0, 4, 8]
-distance = 12
+distance = 8 - (-12)  # 20
 ```
 
-3. Write a pattern which identifies if a string is a valid python variable
+### Exercises: Level 2
+
+1. Write a pattern which identifies if a string is a valid python variable
 
     ```sh
     is_valid_variable('first_name') # True
@@ -423,7 +439,9 @@ distance = 12
     is_valid_variable('firstname') # True
     ```
 
-4. Clean the following text. After cleaning, count three most frequent words in the string.
+### Exercises: Level 3
+
+1. Clean the following text. After cleaning, count the three most frequent words in the string.
 
     ```py
     sentence = '''%I $am@% a %tea@cher%, &and& I lo%#ve %tea@ching%;. There $is nothing; &as& mo@re rewarding as educa@ting &and& @emp%o@wering peo@ple. ;I found tea@ching m%o@re interesting tha@n any other %jo@bs. %Do@es thi%s mo@tivate yo@u to be a tea@cher!?'''