diff --git a/18_Day_Regular_expressions/18_regular_expressions.md b/18_Day_Regular_expressions/18_regular_expressions.md index 25684ae6..f8366d26 100644 --- a/18_Day_Regular_expressions/18_regular_expressions.md +++ b/18_Day_Regular_expressions/18_regular_expressions.md @@ -21,7 +21,7 @@ - [📘 Day 18](#-day-18) - [Regular Expressions](#regular-expressions) - [The *re* Module](#the-re-module) - - [Functions in *re* Module](#functions-in-re-module) + - [Methods in *re* Module](#methods-in-re-module) - [Match](#match) - [Search](#search) - [Searching for All Matches Using *findall*](#searching-for-all-matches-using-findall) @@ -37,6 +37,9 @@ - [Quantifier in RegEx](#quantifier-in-regex) - [Cart ^](#cart-) - [💻 Exercises: Day 18](#-exercises-day-18) + - [Exercises: Level 1](#exercises-level-1) + - [Exercises: Level 2](#exercises-level-2) + - [Exercises: Level 3](#exercises-level-3) # 📘 Day 18 @@ -52,11 +55,11 @@ After importing the module we can use it to detect or find patterns. import re ``` -### Functions in *re* Module +### Methods in *re* Module To find a pattern we use different set of *re* character sets that allows to search for a match in a string. -* *re.match()*: searches only in the beginning of the first line of the string and returns matched objects if found, else returns none. +* *re.match()*: searches only in the beginning of the first line of the string and returns matched objects if found, else returns None. * *re.search*: Returns a match object if there is one anywhere in the string, including multiline strings. * *re.findall*: Returns a list containing all matches * *re.split*: Takes a string, splits it at the match points, returns a list @@ -89,6 +92,16 @@ print(substring) # I love to teach As you can see from the example above, the pattern we are looking for (or the substring we are looking for) is *I love to teach*. The match function returns an object **only** if the text starts with the pattern. 
+```py +import re + +txt = 'I love to teach python and javaScript' +match = re.match('I like to teach', txt, re.I) +print(match) # None +``` + +The string does not start with *I like to teach*, therefore there was no match and the match method returned None. + #### Search ```py @@ -129,10 +142,9 @@ I recommend python for a first programming language''' # It return a list matches = re.findall('language', txt, re.I) print(matches) # ['language', 'language'] - ``` -As you can see, the word language was found two times in the string. Let's practice some more. +As you can see, the word *language* was found two times in the string. Let us practice some more. Now we will look for both Python and python words in the string: ```py @@ -145,7 +157,7 @@ print(matches) # ['Python', 'python'] ``` -Since we are using *re.I* both lowercase and uppercase letters are included. If we don't have that flag, then we will have to write our pattern differently. Let's check it out: +Since we are using *re.I* both lowercase and uppercase letters are included. If we do not have the re.I flag, then we will have to write our pattern differently. Let us check it out: ```py txt = '''Python is the most beautiful language that a human being has ever created. @@ -173,13 +185,13 @@ match_replaced = re.sub('[Pp]ython', 'JavaScript', txt, re.I) print(match_replaced) # JavaScript is the most beautiful language that a human being has ever created. ``` -Let's add one more example. The following string is really hard to read unless we remove the % symbol. Replacing the % with an empty string will clean the text. +Let us add one more example. The following string is really hard to read unless we remove the % symbol. Replacing the % with an empty string will clean the text. ```py -txt = '''%I a%m te%%a%%che%r% a%n%d %% I l%o%ve te%ach%ing. +txt = '''%I a%m te%%a%%che%r% a%n%d %% I l%o%ve te%ach%ing. T%he%re i%s n%o%th%ing as r%ewarding a%s e%duc%at%i%ng a%n%d e%m%p%ow%er%ing p%e%o%ple. 
-I fo%und te%a%ching m%ore i%n%t%er%%es%ting t%h%an any other %jobs. +I fo%und te%a%ching m%ore i%n%t%er%%es%ting t%h%an any other %jobs. D%o%es thi%s m%ot%iv%a%te %y%o%u to b%e a t%e%a%cher?''' matches = re.sub('%', '', txt) @@ -187,10 +199,9 @@ print(matches) ``` ```sh -I am teacher and I love teaching. -There is nothing as rewarding as educating and empowering people. -I found teaching more interesting than any other jobs. -Does this motivate you to be a teacher? +I am teacher and I love teaching. +There is nothing as rewarding as educating and empowering people. +I found teaching more interesting than any other jobs. Does this motivate you to be a teacher? ``` ## Splitting Text Using RegEx Split @@ -260,15 +271,15 @@ print(matches) # ['Apple', 'apple'] ![Regular Expression cheat sheet](../images/regex.png) -Let's use examples to clarify the meta characters above +Let us use examples to clarify the meta characters above ### Square Bracket -Let's use square bracket to include lower and upper case +Let us use square bracket to include lower and upper case ```py regex_pattern = r'[Aa]pple' # this square bracket mean either A or a -txt = 'Apple and banana are fruits. An old cliche says an apple a day a doctor way has been replaced by a banana a day keeps the doctor far far away. ' +txt = 'Apple and banana are fruits. An old cliche says an apple a day a doctor way has been replaced by a banana a day keeps the doctor far far away.' matches = re.findall(regex_pattern, txt) print(matches) # ['Apple', 'apple'] ``` @@ -277,7 +288,7 @@ If we want to look for the banana, we write the pattern as follows: ```py regex_pattern = r'[Aa]pple|[Bb]anana' # this square bracket means either A or a -txt = 'Apple and banana are fruits. An old cliche says an apple a day a doctor way has been replaced by a banana a day keeps the doctor far far away. ' +txt = 'Apple and banana are fruits. 
An old cliche says an apple a day a doctor way has been replaced by a banana a day keeps the doctor far far away.' matches = re.findall(regex_pattern, txt) print(matches) # ['Apple', 'banana', 'apple', 'banana'] ``` @@ -288,18 +299,18 @@ Using the square bracket and or operator , we manage to extract Apple, apple, Ba ```py regex_pattern = r'\d' # d is a special character which means digits -txt = 'This regular expression example was made on December 6, 2019.' +txt = 'This regular expression example was made on December 6, 2019 and revised on July 8, 2021' matches = re.findall(regex_pattern, txt) -print(matches) # ['6', '2', '0', '1', '9'], this is not what we want +print(matches) # ['6', '2', '0', '1', '9', '8', '2', '0', '2', '1'], this is not what we want ``` ### One or more times(+) ```py regex_pattern = r'\d+' # d is a special character which means digits, + mean one or more times -txt = 'This regular expression example was made on December 6, 2019.' +txt = 'This regular expression example was made on December 6, 2019 and revised on July 8, 2021' matches = re.findall(regex_pattern, txt) -print(matches) # ['6', '2019'] - now, this is better! +print(matches) # ['6', '2019', '8', '2021'] - now, this is better! ``` ### Period(.) @@ -332,7 +343,7 @@ Zero or one time. The pattern may not occur or it may occur once. ```py txt = '''I am not sure if there is a convention how to write the word e-mail. -Some people write it email others may write it as Email or E-mail.''' +Some people write it as email others may write it as Email or E-mail.''' regex_pattern = r'[Ee]-?mail' # ? means here that '-' is optional matches = re.findall(regex_pattern, txt) print(matches) # ['e-mail', 'email', 'Email', 'E-mail'] @@ -340,18 +351,18 @@ print(matches) # ['e-mail', 'email', 'Email', 'E-mail'] ### Quantifier in RegEx -We can specify the length of the substring we are looking for in a text, using a curly bracket. 
Lets imagine, we are interested in a substring with a length of 4 characters: +We can specify the length of the substring we are looking for in a text, using a curly bracket. Let us imagine, we are interested in a substring with a length of 4 characters: ```py -txt = 'This regular expression example was made on December 6, 2019.' +txt = 'This regular expression example was made on December 6, 2019 and revised on July 8, 2021' regex_pattern = r'\d{4}' # exactly four times matches = re.findall(regex_pattern, txt) -print(matches) # ['2019'] +print(matches) # ['2019', '2021'] -txt = 'This regular expression example was made on December 6, 2019.' +txt = 'This regular expression example was made on December 6, 2019 and revised on July 8, 2021' -regex_pattern = r'\d{1, 4}' # 1 to 4 +regex_pattern = r'\d{1,4}' # 1 to 4 digits; no space inside the braces, otherwise they are matched literally matches = re.findall(regex_pattern, txt) -print(matches) # ['6', '2019'] +print(matches) # ['6', '2019', '8', '2021'] ``` ### Cart ^ @@ -359,7 +370,7 @@ print(matches) # ['6', '2019'] * Starts with ```py -txt = 'This regular expression example was made on December 6, 2019.' +txt = 'This regular expression example was made on December 6, 2019 and revised on July 8, 2021' regex_pattern = r'^This' # ^ means starts with matches = re.findall(regex_pattern, txt) print(matches) # ['This'] @@ -368,21 +379,23 @@ print(matches) # ['This'] * Negation ```py -txt = 'This regular expression example was made on December 6, 2019.' +txt = 'This regular expression example was made on December 6, 2019 and revised on July 8, 2021' regex_pattern = r'[^A-Za-z ]+' # ^ in set character means negation, not A to Z, not a to z, no space matches = re.findall(regex_pattern, txt) -print(matches) # ['6,', '2019.'] +print(matches) # ['6,', '2019', '8,', '2021'] ``` ## 💻 Exercises: Day 18 - 1. What is the most frequent word in the following paragraph? +### Exercises: Level 1 + 1. What is the most frequent word in the following paragraph? ```py paragraph = 'I love teaching. If you do not love teaching what else can you love. 
I love Python if you do not love something which can give you all the capabilities to develop an application what else can you love. ``` ```sh - [(6, 'love'), + [ + (6, 'love'), (5, 'you'), (3, 'can'), (2, 'what'), @@ -403,18 +416,21 @@ print(matches) # ['6,', '2019.'] (1, 'an'), (1, 'all'), (1, 'Python'), - (1, 'If')] + (1, 'If') + ] ``` -2. The position of some particles on the horizontal x-axis -12, -4, -3 and -1 in the negative direction, 0 at origin, 4 and 8 in the positive direction. Extract these numbers from this whole text and find the distance between the two furthest particles. +2. The positions of some particles on the horizontal x-axis are -12, -4, -3 and -1 in the negative direction, 0 at origin, 4 and 8 in the positive direction. Extract these numbers from this whole text and find the distance between the two furthest particles. ```py points = ['-1', '2', '-4', '-3', '-1', '0', '4', '8'] sorted_points = [-4, -3, -1, -1, 0, 2, 4, 8] -distance = 12 +distance = 8 - (-4) # 12 ``` -3. Write a pattern which identifies if a string is a valid python variable +### Exercises: Level 2 + +1. Write a pattern which identifies if a string is a valid Python variable ```sh is_valid_variable('first_name') # True @@ -423,7 +439,9 @@ distance = 12 is_valid_variable('firstname') # True ``` -4. Clean the following text. After cleaning, count three most frequent words in the string. +### Exercises: Level 3 + +1. Clean the following text. After cleaning, count the three most frequent words in the string. ```py sentence = '''%I $am@% a %tea@cher%, &and& I lo%#ve %tea@ching%;. There $is nothing; &as& mo@re rewarding as educa@ting &and& @emp%o@wering peo@ple. ;I found tea@ching m%o@re interesting tha@n any other %jo@bs. %Do@es thi%s mo@tivate yo@u to be a tea@cher!?'''