In-Class Exercises, Session 4

Advanced Python


	MATCHING
Ex. 4.1	Match a simple character pattern.
	Search for 'Velas', then try 'Benter' and 'Acme'.
	import re
	lines = [
	    'Acme Corporation is headed by CEO Joseph Benter, and ',
	    'President Maria Velas. Mr. Benter focuses on R&D ',
	    'while Ms. Velas provides vision and major deals for ',
	    'Acme. '
	]
	for line in lines:
	    if re.search(r'', line):
	        print(line)
	Expected Output (for Velas):
	President Maria Velas. Mr. Benter focuses on R&D
	while Ms. Velas provides vision and major deals for

Ex. 4.2	Use 'not' to negate a search. Run the previous pattern with 'not' in front of re.search().
	import re
	lines = [
	    'Acme Corporation is headed by CEO Joseph Benter, and ',
	    'President Maria Velas. Mr. Benter focuses on R&D ',
	    'while Ms. Velas provides vision and major deals for ',
	    'Acme. '
	]
	for line in lines:
	    if re.search(r'Benter', line):
	        print(line)
	Expected Output (for Benter):
	while Ms. Velas provides vision and major deals for
	Acme.
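	Negating a search keeps the items where the pattern is absent. A minimal sketch of the idea, using a made-up list (fruits) and result name (without_apple) that are not part of the exercise data:

```python
import re

# Hypothetical sample data, not taken from the exercise.
fruits = ['apple pie', 'banana bread', 'apple tart', 'cherry cake']

# re.search() returns None when nothing matches, and "not None" is True,
# so this keeps only the strings in which the pattern is NOT found.
without_apple = [item for item in fruits if not re.search(r'apple', item)]
print(without_apple)
```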


	ANCHORS
Ex. 4.3	Anchors - start of string.
	Print only those lines that have 'TEL' at the start:
	import re
	for text_line in ['AURORA HOTEL', 'OPEN12:00 AM - 11:59 PM',
	                  '14200 E ALAMEDA AVE AURORA, CO 80012',
	                  'TEL (303) 344-9901']:
	    if re.search(r'', text_line):
	        print(text_line)
	Expected Output:
	TEL (303) 344-9901

Ex. 4.4	Anchors - end of string.
	Print only those files that end in .jpg
	import re
	filenames = ['image.jpg', 'image.png', 'filejpg.txt', 'file2.doc',
	             'file3.pdf', 'image2.gif', 'image3.jpg', 'image4.jpg']
	for name in filenames:
	    if re.search(r'', name):
	        print(name)
	Expected Output:
	image.jpg
	image3.jpg
	image4.jpg


	BUILT-IN CHARACTER CLASSES
Ex. 4.5	"Digit" character class.
	Match on each string that has a digit.
	import re
	match_strings = [
	    'hello world 00', 'goodbye world ', ' 23 bonjour', 'wilkommen23 ',
	    'aloha', '99', '00', '88557799', 'Que 3 Tal!', 'myfile.jpg',
	    'yourfile.JPG'
	]
	count = 0
	for string in match_strings:
	    if re.search(r'', string):
	        print(string)
	        count += 1
	print(f'count: {count}')
	Expected Output:
	hello world 00
	 23 bonjour
	wilkommen23
	99
	00
	88557799
	Que 3 Tal!
	count: 7

Ex. 4.6	"Word" character class.
	Match each string that has a letter, number or underscore.
	import re
	match_strings = [
	    'hello world 00', 'goodbye world ', ' 23 bonjour', 'wilkommen23 ',
	    'aloha', '99', '00', '88557799', 'Que 3 Tal!', 'myfile.jpg',
	    'yourfile.JPG'
	]
	count = 0
	for string in match_strings:
	    if re.search(r'', string):
	        print(string)
	        count += 1
	print(f'count: {count}')
	Expected Output:
	hello world 00
	goodbye world
	 23 bonjour
	wilkommen23
	aloha
	99
	00
	88557799
	Que 3 Tal!
	myfile.jpg
	yourfile.JPG
	count: 11

Ex. 4.7	"Space" character class.
	Match on each line that has a space.
	import re
	match_strings = [
	    'hello world 00', 'goodbye world ', ' 23 bonjour', 'wilkommen23 ',
	    'aloha', '99', '00', '88557799', 'Que 3 Tal!', 'myfile.jpg',
	    'yourfile.JPG'
	]
	count = 0
	for string in match_strings:
	    if re.search(r'', string):
	        print(string)
	        count += 1
	print(f'count: {count}')
	Expected Output:
	hello world 00
	goodbye world
	 23 bonjour
	wilkommen23
	Que 3 Tal!
	count: 5


	INVERSE CHARACTER CLASSES
Ex. 4.8	"Not a digit" character class.
	Match on each string that has a character that is not a digit.
	import re
	match_strings = [
	    'hello world 00', 'goodbye world ', ' 23 bonjour', 'wilkommen23 ',
	    'aloha', '99', '00', '88557799', 'Que 3 Tal!', 'myfile.jpg',
	    'yourfile.JPG'
	]
	count = 0
	for string in match_strings:
	    if re.search(r'', string):
	        print(string)
	        count += 1
	print(f'count: {count}')
	Expected Output:
	hello world 00
	goodbye world
	 23 bonjour
	wilkommen23
	aloha
	Que 3 Tal!
	myfile.jpg
	yourfile.JPG
	count: 8

Ex. 4.9	"Not a space" character class.
	Match on each string that has any non-spaces.
	import re
	match_strings = [
	    'hello world 00', 'goodbye world ', ' 23 bonjour', 'wilkommen23 ',
	    ' ', 'aloha', '99', '00', '88557799', 'Que 3 Tal!', 'myfile.jpg',
	    'yourfile.JPG'
	]
	count = 0
	for string in match_strings:
	    if re.search(r'', string):
	        print(string)
	        count += 1
	print(f'count: {count}')
	Expected Output:
	hello world 00
	goodbye world
	 23 bonjour
	wilkommen23
	aloha
	99
	00
	88557799
	Que 3 Tal!
	myfile.jpg
	yourfile.JPG
	count: 11


	CUSTOM CHARACTER CLASSES
Ex. 4.10	Custom character class.
	Match on each string that has a capital letter in it.
	import re
	match_strings = [
	    'hello world 00', 'goodbye world ', ' 23 bonjour', 'wilkommen23 ',
	    'aloha', '99', '00', '88557799', 'Que 3 Tal!', 'myfile.jpg',
	    'yourfile.JPG'
	]
	count = 0
	for string in match_strings:
	    if re.search(r'', string):
	        print(string)
	        count += 1
	print(f'count: {count}')
	Expected Output:
	Que 3 Tal!
	yourfile.JPG
	count: 2

Ex. 4.11	Using custom character class with built-in character class.
	Match on each string that has a letter followed by a number.
	import re
	match_strings = [
	    'hello world 00', 'goodbye world ', ' 23 bonjour', 'wilkommen23 ',
	    'aloha', '99', '00', '88557799', 'Que 3 Tal!', 'myfile.jpg',
	    'yourfile.JPG'
	]
	count = 0
	for string in match_strings:
	    if re.search(r'', string):
	        print(string)
	        count += 1
	print(f'count: {count}')
	Expected Output:
	wilkommen23
	count: 1


	INVERSE CUSTOM CHARACTER CLASSES
Ex. 4.12	Inverse Custom Character Class. Match on each string that has any character that is not a letter.
	import re
	match_strings = [
	    'hello world 00', 'goodbye world ', ' 23 bonjour', 'wilkommen23 ',
	    'aloha', '99', '00', '88557799', 'Que 3 Tal!', 'myfile.jpg',
	    'yourfile.JPG'
	]
	count = 0
	for string in match_strings:
	    if re.search(r'', string):
	        print(string)
	        count += 1
	print(f'count: {count}')
	Expected Output:
	hello world 00
	goodbye world
	 23 bonjour
	wilkommen23
	99
	00
	88557799
	Que 3 Tal!
	myfile.jpg
	yourfile.JPG
	count: 10


	THE WILDCARD
Ex. 4.13	Match on each string that ends with a character that is not a digit.
	import re
	match_strings = [
	    'hello world 00', 'goodbye world ', ' 23 bonjour', 'wilkommen23 ',
	    'aloha', '99', '00', '88557799', 'Que 3 Tal!', 'myfile.jpg',
	    'yourfile.JPG'
	]
	count = 0
	for string in match_strings:
	    if re.search(r'', string):
	        print(string)
	        count += 1
	print(f'count: {count}')
	Expected Output:
	goodbye world
	 23 bonjour
	wilkommen23
	aloha
	Que 3 Tal!
	myfile.jpg
	yourfile.JPG
	count: 7

Ex. 4.14	Demo: match on any character.
	Use the wildcard (., a period) to see which strings match it.
	import re
	match_strings = [
	    'hello world 00', 'goodbye world ', ' 23 bonjour', 'wilkommen23 ',
	    'aloha', '99', '00', '88557799', 'Que 3 Tal!', 'myfile.jpg',
	    'yourfile.JPG'
	]
	count = 0
	for string in match_strings:
	    if re.search(r'', string):
	        print(string)
	        count += 1
	print(f'count: {count}')


	LAB 1
Ex. 4.15	Match on each string that starts with a digit.
	import re
	match_strings = [
	    'hello world 00', 'goodbye world ', ' 23 bonjour', 'wilkommen23 ',
	    'aloha', '99', '00', '88557799', 'Que 3 Tal!', 'myfile.jpg',
	    'yourfile.JPG'
	]
	count = 0
	for string in match_strings:
	    if re.search(r'', string):
	        print(string)
	        count += 1
	print(f'count: {count}')
	Expected Output:
	99
	00
	88557799
	count: 3

Ex. 4.16	Match on each string that starts with a space.
	import re
	lines = [
	    'this is the first line,',
	    ' and this is the second line and',
	    ' this is the third line. '
	]
	for line in lines:
	    if re.search(r'', line):
	        print(line)
	Expected Output:
	 and this is the second line and
	 this is the third line.

Ex. 4.17	Loop through and print only lines with some text (not including spaces).
	import re
	text = """line 1
	line 2,
	line 3...
	line4!"""
	lines = text.splitlines()
	for line in lines:
	    if re.search(r'', line):
	        print(line)
	Expected Output:
	line 1
	line 2,
	line 3...
	line4!

Ex. 4.18	Match on each string that ends with a digit.
	import re
	match_strings = [
	    'hello world 00', 'goodbye world ', ' 23 bonjour', 'wilkommen23 ',
	    'aloha', '99', '00', '88557799', 'Que 3 Tal!', 'myfile.jpg',
	    'yourfile.JPG'
	]
	count = 0
	for string in match_strings:
	    if re.search(r'', string):
	        print(string)
	        count += 1
	print(f'count: {count}')
	Expected Output:
	hello world 00
	99
	00
	88557799
	count: 4

Ex. 4.19	Match on each line that ends with a space.
	import re
	lines = [
	    'this is the first line, ',
	    'this is the second line and',
	    'this is the third line. '
	]
	for line in lines:
	    if re.search(r'', line):
	        print(line)
	Expected Output:
	this is the first line,
	this is the third line.

Ex. 4.20	Match on each string that consists only of a 2-digit number.
	import re
	match_strings = [
	    'hello world 00', 'goodbye world ', ' 23 bonjour', 'wilkommen23 ',
	    'aloha', '99', '00', '88557799', 'Que 3 Tal!', 'myfile.jpg',
	    'yourfile.JPG'
	]
	count = 0
	for string in match_strings:
	    if re.search(r'', string):
	        print(string)
	        count += 1
	print(f'count: {count}')
	Expected Output:
	99
	00
	count: 2

Ex. 4.21	Match on a capital letter followed by a lowercase letter.
	import re
	match_strings = [
	    'hello world 00', 'goodbye world ', ' 23 bonjour', 'wilkommen23 ',
	    'aloha', '99', '00', '88557799', 'Que 3 Tal!', 'myfile.jpg',
	    'yourfile.JPG'
	]
	count = 0
	for string in match_strings:
	    if re.search(r'', string):
	        print(string)
	        count += 1
	print(f'count: {count}')
	Expected Output:
	Que 3 Tal!
	count: 1

Ex. 4.22	Match on files with date format YYYY-MM-DD followed by '.txt'.
	import re
	dirlist = ('.', '..', '2010-12-15.txt', '2010-12-16.txt',
	           'testfile.txt', '20101-11-03.txt')
	for item in dirlist:
	    if re.search(r'', item):
	        print(item)
	Expected Output:
	2010-12-15.txt
	2010-12-16.txt

Ex. 4.23	Match on date format MM/DD/YY (and not 4-digit year).
	import re
	dates = ['Jan. 3, 2018', '23-Mar-17', '12/02/98', '12/03/1998', '23.17.2018']
	for date in dates:
	    if re.search(r'', date):
	        print(date)
	Expected Output:
	12/02/98

Ex. 4.24	Determine whether the selected word begins with a vowel. If so, prepend 'an' rather than 'a'.
	import re
	words = ['apple', 'pear', 'orange', 'kiwi', 'elderberry', 'carrot', 'ugli fruit']
	for word in words:
	    if re.search(r'', word):
	        prepend = 'an'
	    else:
	        prepend = 'a'
	    print(f"{prepend} {word}")
	Expected Output:
	an apple
	a pear
	an orange
	a kiwi
	an elderberry
	a carrot
	an ugli fruit


	BUILT-IN QUANTIFIERS
Ex. 4.25	"One or more" quantifier. Match on each string that has one or more letters in it.
	import re
	match_strings = [
	    'hello world 00', 'goodbye world ', ' 23 bonjour', 'wilkommen23 ',
	    'aloha', '99', '00', '88557799', 'Que 3 Tal!', 'myfile.jpg',
	    'yourfile.JPG'
	]
	count = 0
	for string in match_strings:
	    if re.search(r'', string):
	        print(string)
	        count += 1
	print(f'count: {count}')
	Expected Output:
	hello world 00
	goodbye world
	 23 bonjour
	wilkommen23
	aloha
	Que 3 Tal!
	myfile.jpg
	yourfile.JPG
	count: 8

Ex. 4.26	"Zero or one" quantifier.
	Without using a character class (or grouped alternates), use a single regex that matches on each string that has 'a' or 'an' followed by a space.
	import re
	lines = [
	    'This is a wonderful thing. ',
	    "I haven't seen anything like it. ",
	    "Isn't it an exceptional experience? "
	]
	for line in lines:
	    if re.search(r'', line):
	        print(line)
	Expected Output:
	This is a wonderful thing.
	Isn't it an exceptional experience?

Ex. 4.27	"Zero or more" quantifier, quantifiers with anchor.
	Match on all strings that consist only of a 1 followed by zero or more digits.
	import re
	numbers = ['100', '135', '31', '1', 'I think']
	for val in numbers:
	    if re.search(r'', val):
	        print(val)
	Expected Output:
	100
	135
	1

Ex. 4.28	Quantifiers with Anchor. Match on each string that consists only of one or more digit characters.
	import re
	match_strings = [
	    'hello world 00', 'goodbye world ', ' 23 bonjour', 'wilkommen23 ',
	    'aloha', '99', '00', '88557799', 'Que 3 Tal!', 'myfile.jpg',
	    'yourfile.JPG'
	]
	count = 0
	for string in match_strings:
	    if re.search(r'', string):
	        print(string)
	        count += 1
	print(f'count: {count}')
	Expected Output:
	99
	00
	88557799
	count: 3

Ex. 4.29	Quantifiers with Anchor (3). Match on each string that consists only of letters.
	import re
	match_strings = [
	    'hello world 00', 'goodbye world ', ' 23 bonjour', 'wilkommen23 ',
	    'aloha', '99', '00', '88557799', 'Que 3 Tal!', 'myfile.jpg',
	    'yourfile.JPG'
	]
	count = 0
	for string in match_strings:
	    if re.search(r'', string):
	        print(string)
	        count += 1
	print(f'count: {count}')
	Expected Output:
	aloha
	count: 1

Ex. 4.30	Quantifiers with custom character class.
	Match each string that has a capital letter followed by one or more lowercase letters.
	import re
	match_strings = [
	    'hello World 00', 'goodbye C world ', ' 23 bonjour', 'wilkommen23 ',
	    'Aloha', '99', '00', '88557799', 'Que 3 Tal!', 'myfile.jpg',
	    'yourfile.JPG'
	]
	count = 0
	for string in match_strings:
	    if re.search(r'', string):
	        print(string)
	        count += 1
	print(f'count: {count}')
	Expected Output:
	hello World 00
	Aloha
	Que 3 Tal!
	count: 3

Ex. 4.31	Quantifiers with anchors. Match on each string that consists only of letters, numbers or the underscore.
	import re
	match_strings = [
	    'hello world 00', 'goodbye world ', ' 23 bonjour', 'wilkommen23 ',
	    'aloha', '99', '00', '88557799', 'Que 3 Tal!', 'myfile.jpg',
	    'yourfile.JPG'
	]
	count = 0
	for string in match_strings:
	    if re.search(r'', string):
	        print(string)
	        count += 1
	print(f'count: {count}')
	Expected Output:
	aloha
	99
	00
	88557799
	count: 4

Ex. 4.32	Quantifiers with anchors (2). Match on each string that consists only of non-digits.
	import re
	match_strings = [
	    'hello world 00', 'goodbye world ', ' 23 bonjour', 'wilkommen23 ',
	    'aloha', '99', '00', '88557799', 'Que 3 Tal!', 'myfile.jpg',
	    'yourfile.JPG'
	]
	count = 0
	for string in match_strings:
	    if re.search(r'', string):
	        print(string)
	        count += 1
	print(f'count: {count}')
	Expected Output:
	goodbye world
	aloha
	myfile.jpg
	yourfile.JPG
	count: 4

Ex. 4.33	Quantifiers with anchors (3). Match on each string that consists only of non-spaces.
	import re
	match_strings = [
	    'hello world 00', 'goodbye world ', ' 23 bonjour', 'wilkommen23 ',
	    'aloha', '99', '00', '88557799', 'Que 3 Tal!', 'myfile.jpg',
	    'yourfile.JPG'
	]
	count = 0
	for string in match_strings:
	    if re.search(r'', string):
	        print(string)
	        count += 1
	print(f'count: {count}')
	Expected Output:
	aloha
	99
	00
	88557799
	myfile.jpg
	yourfile.JPG
	count: 6


	CUSTOM QUANTIFIERS
Ex. 4.34	Custom quantifier.
	Match on each string that has two or more spaces at the end.
	import re
	match_strings = [
	    'hello world 00', 'goodbye world  ', ' 23 bonjour', 'wilkommen23  ',
	    'aloha', '99', '00', '88557799', 'Que 3 Tal!', 'myfile.jpg',
	    'yourfile.JPG'
	]
	count = 0
	for string in match_strings:
	    if re.search(r'', string):
	        print(string)
	        count += 1
	print(f'count: {count}')
	Expected Output:
	goodbye world
	wilkommen23
	count: 2

Ex. 4.35	Custom quantifier.
	Match on strings that have a capital letter followed by two or more lowercase letters.
	import re
	match_strings = [
	    'hello World 00', 'goodbye As world ', 'To 23 bonjour', 'wilkommen23 ',
	    'Aloha', '99', '00', '88557799', 'Que 3 Tal!', 'myfile.jpg',
	    'yourfile.JPG'
	]
	count = 0
	for string in match_strings:
	    if re.search(r'', string):
	        print(string)
	        count += 1
	print(f'count: {count}')
	Expected Output:
	hello World 00
	Aloha
	Que 3 Tal!
	count: 3

Ex. 4.36	Custom quantifier.
	Print those numbers that are in the millions (i.e., 7 or more digits).
	import re
	nums = ['1', '10', '100', '1000', '10000', '100000', '1000000', '10000000']
	for num in nums:
	    if re.search(r'', num):
	        print(num)
	Expected Output:
	1000000
	10000000

Ex. 4.37	Custom quantifier. Having split the text into words, show those words that are 7 or more characters long.
	import re
	text = """This is the 1000th story, regarding a duck named Quack.
	It was unlikely that Quack could have been given a name like that
	by his mother, so we can only conclude that he was named by the
	author, who has a cuteness problem."""
	words = text.split()
	stripped = [ word.rstrip('.,') for word in words ]
	for word in stripped:
	    if re.search(r'', word):
	        print(word)
	Expected Output:
	regarding
	unlikely
	conclude
	cuteness
	problem

Ex. 4.38	Custom Quantifier.
	A password must be 3-8 characters in length (letters, numbers and underscores are permitted). Validate the below password attempts.
	import re
	attempts = ['1234', 'hello_there', 'password', 'ok', 'what?', 'supercalifrag']
	for password in attempts:
	    if re.search(r'', password):
	        print(f'{password}: validated')
	Expected Output:
	1234: validated
	password: validated


	ESCAPING SPECIAL CHARACTERS
Ex. 4.39	Escape wildcard (aka period). Match on each string that has a letter, number or underscore followed by a period.
	import re
	match_strings = [
	    'hello world 00', 'goodbye world ', ' 23 bonjour', 'wilkommen23 ',
	    'aloha', '99', '00', '88557799', 'Que 3 Tal!', 'myfile.jpg',
	    'yourfile.JPG'
	]
	count = 0
	for string in match_strings:
	    if re.search(r'', string):
	        print(string)
	        count += 1
	print(f'count: {count}')
	Expected Output:
	myfile.jpg
	yourfile.JPG
	count: 2
	Note: why would this work without escaping the period? Because

Ex. 4.40	Escape end anchor (aka dollar sign).
	Match on strings that have a dollar amount, including two decimal places ($23.53).
	import re
	lines = [
	    'The coat cost $239.50.',
	    'The candy cost $1.93',
	    "I didn't buy anything today.",
	    '$1 sale',
	    'I dream of $$$'
	]
	for line in lines:
	    if re.search(r'', line):
	        print(line)
	Expected Output:
	The coat cost $239.50.
	The candy cost $1.93

Ex. 4.41	Escape quantifier character +.
	Match on all lines with positive numbers.
	import re
	numbers = [
	    'Amount: -23.9',
	    'Amount: +43.8',
	    'Amount: -9.03',
	    'Amount: +99.9',
	    'Amount: +22.0'
	]
	for num in numbers:
	    if re.search(r'', num):
	        print(num)
	Expected Output:
	Amount: +43.8
	Amount: +99.9
	Amount: +22.0

Ex. 4.42	Escape quantifier character *.
	Match on all lines with asterisked footnotes.
	import re
	numbers = [
	    'As Ibid* said,',
	    'there should be no greater good ',
	    'than compassion, love, ',
	    'mutual benefit',
	    'and the profit-making motive.',
	]
	for num in numbers:
	    if re.search(r'', num):
	        print(num)
	Expected Output:
	As Ibid* said,
	than compassion, love,
	mutual benefit


	LAB 2
Ex. 4.43	Match on each string that has one or more "word" characters, followed by one or more spaces, followed by one or more "word" characters.
	import re
	match_strings = [
	    'hello world 00', 'goodbye world ', ' 23 bonjour', 'wilkommen23 ',
	    'aloha', '99', '00', '88557799', 'Que 3 Tal!', 'myfile.jpg',
	    'yourfile.JPG'
	]
	count = 0
	for string in match_strings:
	    if re.search(r'', string):
	        print(string)
	        count += 1
	print(f'count: {count}')
	Expected Output:
	hello world 00
	goodbye world
	 23 bonjour
	Que 3 Tal!
	count: 4

Ex. 4.44	Ignore comment lines: print only those lines that don't start with a comment (the first non-space character is a hash mark).
	import re
	text = """
	# this is a program to do stuff
	a = 5
	b = 10 # an int
	if True:
	    # multiply them
	    c = a * b
	"""
	for line in text.splitlines():
	    if not re.search(r'', line):
	        print(line)
	Expected Output:
	a = 5
	b = 10 # an int
	if True:
	    c = a * b

Ex. 4.45	Match those lines that contain a 7-digit hex number (a-fA-F0-9).
	import re
	lines = [
	    'The color code is #ABF2307.',
	    'Mr. Mxyzptlk is 999 years old today.',
	    'The memory address is fc9d223.'
	]
	for line in lines:
	    if re.search(r'', line):
	        print(line)
	Expected Output:
	The color code is #ABF2307.
	The memory address is fc9d223.

Ex. 4.46	Show those lines that contain two capitalized words (as in a name).
	import re
	lines = [
	    'The owner is Gwen Harstridge.',
	    "There aren't a lot of stores like this one.",
	    'Paris is not a lot like Rome.',
	    'I hail from Los Angeles, California.'
	]
	for line in lines:
	    if re.search(r'', line):
	        print(line)
	Expected Output:
	The owner is Gwen Harstridge.
	I hail from Los Angeles, California.


	re.IGNORECASE
Ex. 4.47	Without using a character class, match on each string that ends in .jpg or .JPG (try this another way).
	(hint: use the flag argument (the optional 3rd argument) to re.search())
	import re
	match_strings = [
	    'hello world 00', 'goodbye world ', ' 23 bonjour', 'wilkommen23 ',
	    'aloha', '99', '00', '88557799', 'Que 3 Tal!', 'myfile.jpg',
	    'yourfile.JPG'
	]
	count = 0
	for string in match_strings:
	    if re.search(r'', string):
	        print(string)
	        count += 1
	print(f'count: {count}')
	Expected Output:
	myfile.jpg
	yourfile.JPG
	count: 2
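	The hint about the optional third argument can be illustrated on unrelated data. A small sketch, assuming a made-up list (greetings); re.IGNORECASE is the standard flag from the re module:

```python
import re

# Hypothetical sample data, not taken from the exercise.
greetings = ['Hello there', 'HELLO AGAIN', 'goodbye', 'well, hello']

# The third argument to re.search() is the flags value; re.IGNORECASE
# lets the lowercase pattern match text in any mix of upper and lower case.
matched = [g for g in greetings if re.search(r'hello', g, re.IGNORECASE)]
print(matched)
```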

Ex. 4.48	Print only those files that start with 'image#' ('image' plus a possible number) and end in any of these image extensions: '.jpg', '.png', '.gif'
	import re
	filenames = ['image2.jpg', 'image.png', 'file.txt', 'file2.doc', 'file3.pdf',
	             'image2.gif', 'image3.jpg', 'image4.jpg', 'advert.jpg', 'advert.png']
	for name in filenames:
	    if re.search(r'', name):
	        print(name)
	Expected Output:
	image2.jpg
	image.png
	image2.gif
	image3.jpg
	image4.jpg

Ex. 4.49	Match on each string that ends in .jpg or .JPG
	import re
	match_strings = [
	    'hello world 00', 'goodbye world ', ' 23 bonjour', 'wilkommen23 ',
	    'aloha', '99', '00', '88557799', 'Que 3 Tal!', 'myfile.jpg',
	    'yourfile.JPG'
	]
	count = 0
	for string in match_strings:
	    if re.search(r'', string):
	        print(string)
	        count += 1
	print(f'count: {count}')
	Expected Output:
	myfile.jpg
	yourfile.JPG
	count: 2


	GROUPING FOR QUANTIFYING and ALTERNATES
Ex. 4.50	Quantifying a group. Match on a number with two decimal places and a possible thousands separator (3.95, 3,200.95, etc.)
	First create a pattern that matches one or more digits followed by a comma separator (i.e., matching on '0,', '00,', '000,') and group it with parentheses; quantify the group so that it occurs zero or more times, followed by one or more digits, a period and 2 digits. (Do not use a custom character class for this purpose.)
	import re
	values = ['23.9', '18.2', '23.95', '2,238,000.00', '15,382.92', 'joe', '6.05']    # list of str
	for value in values:
	    matchobj = re.search(r'', value)
	    if matchobj:
	        print(value)
	Expected Output:
	23.95
	2,238,000.00
	15,382.92
	6.05
	Note that if you see the message AttributeError: 'NoneType' object has no attribute 'group', this means that the search did not find a match and returned None, and the code attempted to call .group() on None. Check the string and pattern to determine why it did not match.
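	The failure described in the note above can be avoided by testing for None before calling .group(). A minimal sketch, assuming a hypothetical helper (first_number) and made-up sample strings:

```python
import re

def first_number(text):
    """Return the first run of digits in text, or None if there is none.

    Hypothetical helper, shown only to illustrate the None check.
    """
    matchobj = re.search(r'(\d+)', text)
    if matchobj is None:        # no match: re.search() returned None
        return None
    return matchobj.group(1)    # safe: matchobj is a real match object

print(first_number('room 101'))
print(first_number('no digits here'))
```

	Calling .group() only after the check means a failed search produces None rather than an AttributeError.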

Ex. 4.51	Quantifying a Group (2). Write a single regex that matches on q, Q, quit, Quit, QUIT. Do this without a character class and without the alternate vertical bar.
	import re
	x = input('Do you want to quit? ')    # str, 'QuIt' (sample input)
	if re.search(r'', x):
	    print("you're quitting!")
	else:
	    print("you failed to quit.")
	Expected Output:
	Do you want to quit? QuIt
	you're quitting!


	GROUPING FOR EXTRACTION
Ex. 4.52	Group for extraction.
	Use a parenthetical grouping to extract the number from this text.
	import re
	line = '34: this is a line of text'
	matchobj = re.search(r'', line)
	print(matchobj.group(1))
	Expected Output:
	34
	Note that if you see the message AttributeError: 'NoneType' object has no attribute 'group', this means that the search did not find a match and returned None, and the code attempted to call .group() on None. Check the string and pattern to determine why it did not match.

Ex. 4.53	Group for extraction. Extract the Catalog ID and Publication Date from the text line.
	import re
	rs_row = 'Catalog ID: 2839-587 Pub. Date: 2019-09-03'
	matchobj = re.search(r'', rs_row)
	if matchobj:
	    print(matchobj.group(1))
	    print(matchobj.group(2))
	Expected Output:
	2839-587
	2019-09-03
	Note that if you see the message AttributeError: 'NoneType' object has no attribute 'group', this means that the search did not find a match and returned None, and the code attempted to call .group() on None. Check the string and pattern to determine why it did not match.

Ex. 4.54	Group for extraction. In one regex match, extract the IP address from this log line.
	import re
	line = '172.26.93.208 - - [28/Jun/2012:21:00:17 -0400] "GET /~cmk380/pythondata/image2b.txt HTTP/1.1" 200 30'
	matchobj = re.search(r'', line)
	if matchobj:
	    print(matchobj.group(1))
	Expected Output:
	172.26.93.208
	Note that if you see the message AttributeError: 'NoneType' object has no attribute 'group', this means that the search did not find a match and returned None, and the code attempted to call .group() on None. Check the string and pattern to determine why it did not match.


	'MINIMAL MATCH' QUANTIFIER
Ex. 4.55	Demonstration: "minimal" match.
	The below regex grabs the word Python from the text. Run the code once to observe this. Now add a question mark ? as the character directly after the "one or more" plus sign and run again - you should see that the "one or more word characters" pattern is now matching on as few characters as possible.
	import re
	text = 'My language is Python'
	matchobj = re.search(r'', text)
	print(matchobj.group(1))
	Expected Output:
	P
	Note that if you see the message AttributeError: 'NoneType' object has no attribute 'group', this means that the search did not find a match and returned None, and the code attempted to call .group() on None. Check the string and pattern to determine why it did not match.
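	The difference between the default (greedy) quantifier and the minimal one can be seen side by side on unrelated data. A small sketch, assuming a made-up string (text) and result names (greedy, minimal):

```python
import re

# Hypothetical sample text, not taken from the exercise.
text = 'order 12345 shipped'

# + is greedy: it takes as many digits as it can.
greedy = re.search(r'(\d+)', text).group(1)

# +? is minimal: it stops as soon as the overall pattern can succeed,
# which here is after a single digit.
minimal = re.search(r'(\d+?)', text).group(1)

print(greedy)
print(minimal)
```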

Ex. 4.56	Work with wildcard and minimal match.
	Use the wildcard to match everything between the first pair of angle brackets. Note carefully what was printed. (Angle brackets need no escaping, although square brackets would have to be escaped with a backslash; extraction requires grouping parentheses.)
	import re
	text = 'Discussion of terms <TO COME> after something <PLEASE REVIEW>.'
	matchobj = re.search(r'', text)
	print(matchobj.group(1))
	Expected Output:
	TO COME
	Note that if you see the message AttributeError: 'NoneType' object has no attribute 'group', this means that the search did not find a match and returned None, and the code attempted to call .group() on None. Check the string and pattern to determine why it did not match.

Ex. 4.57	Match on non-search character.
	Perform the same extraction on the below text by searching for a bracket followed by one or more non-brackets. Text extracted should be the same.
	import re
	text = 'Discussion of terms <TO COME> after something <PLEASE REVIEW>.'
	matchobj = re.search(r'', text)
	print(matchobj.group(1))
	Expected Output:
	TO COME
	Note that if you see the message AttributeError: 'NoneType' object has no attribute 'group', this means that the search did not find a match and returned None, and the code attempted to call .group() on None. Check the string and pattern to determine why it did not match.


	GROUPING with .groups()
Ex. 4.58	Retrieve a grouping with .groups().
	In one regex match, extract the status code and bytes downloaded (the last 2 integers on the line) from this log line. Call .groups() on the match object to reveal the extracted values.
	import re
	line = '172.26.93.208 - - [28/Jun/2012:21:00:17 -0400] "GET /~cmk380/pythondata/image2b.txt HTTP/1.1" 200 30'
	matchobj = re.search(r'', line)
	if matchobj:
	    print(matchobj.groups())
	Expected Output:
	('200', '30')
	Note that if you see the message AttributeError: 'NoneType' object has no attribute 'group', this means that the search did not find a match and returned None, and the code attempted to call .group() on None. Check the string and pattern to determine why it did not match.
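	The relationship between .group() and .groups() can be seen on a small unrelated string. A minimal sketch, assuming a made-up input (setting) and pattern:

```python
import re

# Hypothetical sample text, not taken from the exercise.
setting = 'width=640 height=480'
matchobj = re.search(r'width=(\d+) height=(\d+)', setting)

if matchobj:
    print(matchobj.group(0))   # the whole matched text
    print(matchobj.group(1))   # first parenthesized group only
    print(matchobj.groups())   # every group at once, as a tuple of str
```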

Ex. 4.59	Retrieve a grouping with .groups().
	Extract the city, state and zip from the line.
	import re
	line = 'Los Angeles, CA 91604'
	matchobj = re.search(r'', line)
	print(matchobj.groups())
	Expected Output:
	('Los Angeles', 'CA', '91604')
	Note that if you see the message AttributeError: 'NoneType' object has no attribute 'group', this means that the search did not find a match and returned None, and the code attempted to call .group() on None. Check the string and pattern to determine why it did not match.

Ex. 4.60	Quantify for an optional group.
	Pull out all the info about each person (Favorite Color may not be there).
	import re
	results = [
	    'Name: Joe; Favorite Color: Blue; Employee ID: 2395',
	    'Name: Marie; Employee ID: 2321',
	    'Name: Teneski; Favorite Color: Green; Employee ID: 1913'
	]
	for row in results:
	    matchobj = re.search(r'', row)
	    print(matchobj.groups())
	Expected Output:
	('Joe', 'Favorite Color: Blue; ', 'Blue', '2395')
	('Marie', None, None, '2321')
	('Teneski', 'Favorite Color: Green; ', 'Green', '1913')
	Note that if you see the message AttributeError: 'NoneType' object has no attribute 'group', this means that the search did not find a match and returned None, and the code attempted to call .group() on None. Check the string and pattern to determine why it did not match.


	findall() FOR MULTIPLE MATCHES
Ex. 4.61	Group and extract with findall().
	Extract email addresses only for nyu.edu.
	import re
	text = """There are many ways to contact us. Use the general email
	contact@nyu.edu, or email our public liaison at help@nyu.edu. If you
	need tech support you can reach us at askits@nyu.edu.
	Author: Joe Wilson joe@wilson.com"""
	emails = re.findall(r'', text)
	print(emails)
	Expected Output:
	['contact@nyu.edu', 'help@nyu.edu', 'askits@nyu.edu']
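	When a findall() pattern contains a group, the returned list holds the group contents rather than the whole match. A small sketch on unrelated data, assuming made-up names (text, whole, inner):

```python
import re

# Hypothetical sample text, not taken from the exercise.
text = 'id=7, id=42, name=joe, id=105'

# No group: findall() returns each whole match.
whole = re.findall(r'id=\d+', text)

# One group: findall() returns only what the group captured.
inner = re.findall(r'id=(\d+)', text)

print(whole)
print(inner)
```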


	re.sub() FOR SUBSTITUTIONS
Ex. 4.62	Regex substitution. Replace the space separators with commas.
	import re
	args = 'this that other and some other'
	args2 = re.sub(r'', ",", args)
	print(args2)
	Expected Output:
	this,that,other,and,some,other
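	re.sub() takes the pattern, the replacement and the source string, in that order, and returns a new string. A minimal sketch on unrelated data, assuming made-up values (phone, messy):

```python
import re

# Hypothetical sample strings, not taken from the exercise.
phone = '212.555.0199'
messy = 'too   many    spaces'

# The period is escaped so that it matches a literal dot only.
dashed = re.sub(r'\.', '-', phone)

# A quantified pattern replaces each whole run with one replacement.
tidy = re.sub(r'\s+', ' ', messy)

print(dashed)
print(tidy)
```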


	re.split() FOR PATTERN-BASED DELIMITERS
Ex. 4.63	Regex split. Split the user-input comma-separated values string into separate digit values.
	import re
	ui = '23, 14, 7,3,9'
	numbers = re.split(r'', ui)
	print(numbers)
	Expected Output:
	['23', '14', '7', '3', '9']
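	Unlike str.split(), re.split() accepts a pattern, so the delimiter may vary from one position to the next. A small sketch on unrelated data, assuming a made-up string (path) that mixes two delimiter characters:

```python
import re

# Hypothetical sample string, not taken from the exercise.
path = 'usr;local:bin;;python'

# Split on any run of semicolons or colons.
parts = re.split(r'[;:]+', path)
print(parts)
```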

Ex. 4.64	DOTALL wildcard match.
	Extract everything between =code start= and =code end=. Use the re.DOTALL switch to allow the wildcard (.) to match on a newline.
	import re
	text = """Title of This Text
	This is some description...
	=code start=
	a = 5
	b = 5.0
	if a == b:
	    print('yes')
	=code end=
	This is some discussion...
	"""
	matchobj = re.search(r'', text)
	print(matchobj.group(1))
	Expected Output:
	a = 5
	b = 5.0
	if a == b:
	    print('yes')
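	By default the wildcard stops at a newline; re.DOTALL removes that restriction. A minimal sketch on unrelated data, assuming made-up markers (BEGIN, END) and names (without_flag, with_flag):

```python
import re

# Hypothetical sample text, not taken from the exercise.
text = 'BEGIN\nline one\nline two\nEND'

# Without the flag, . cannot cross the newline after BEGIN, so no match.
without_flag = re.search(r'BEGIN(.+)END', text)

# With re.DOTALL, . also matches newlines and the span is captured.
with_flag = re.search(r'BEGIN(.+)END', text, re.DOTALL)

print(without_flag)
print(repr(with_flag.group(1)))
```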

Ex. 4.65	Multiline anchors.
	Use findall() to extract numbers from only the start of each line of the text. Use re.MULTILINE to allow the caret (^) to match at the start of any line.
	import re
	text = """Title of This Text
	23 we want to grab some 99 numbers
	12 but not others, 17 and then some
	5 so we just get 1 the ones on the left side
	93 and me and 23 too
	"""
	matches = re.findall(r'', text)
	print(matches)
	Expected Output:
	['23', '12', '5', '93']
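	The effect of re.MULTILINE on the start anchor can be compared directly. A small sketch on unrelated data, assuming a made-up string (text) and result names (default, multiline):

```python
import re

# Hypothetical sample text, not taken from the exercise.
text = 'alpha one\nbeta two\nalpha three'

# Default: ^ matches only at the very start of the whole string.
default = re.findall(r'^\w+', text)

# re.MULTILINE: ^ also matches just after each newline.
multiline = re.findall(r'^\w+', text, re.MULTILINE)

print(default)
print(multiline)
```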