A Regular Expression (RegEx) is a sequence of characters that defines a search pattern for matching or searching within strings. In Python, a regular expression is written in a specialized syntax and used to match or find other strings or sets of strings.
Python supports regular expressions through the built-in “re” module, which provides functions such as split(), search(), match(), sub(), and findall().
Below are the functions used in Python Regular Expressions:
- re.match()
- re.search()
- re.findall()
re.match Function:
The match function attempts to match the RE pattern at the beginning of the string, with optional flags. It returns a match object on success and None on failure.
Syntax:
re.match(pattern, string, flags=0)
Example:
# import re module
import re

Substring1 = 'tutorialsart'
Substring2 = 'Python'
String1 = '''You are learning Python Regular Expression at tutorialsart.com'''
String2 = 'Python Regular Expression at tutorialsart.com'

# Substring1 is not at the start of String1, so re.match returns None
match_object1 = re.match(Substring1, String1, re.IGNORECASE)
if match_object1:
    print('Matched string: ', match_object1.group())
else:
    print('No match found for Substring1')

# Substring2 is at the start of String2, so this one matches
match_object2 = re.match(Substring2, String2, re.IGNORECASE)
if match_object2:
    print('Match for Substring2: ', match_object2.group())
else:
    print('No match found')
#re.match function-tutorialsart.com
import re

test_string = "Python Regular Expression at tutorialsart.com"
pattern = r'(.*) (.*?) .*'
match_object = re.match(pattern, test_string, re.M | re.I)
if match_object:
    print("match_object.group() : ", match_object.group())
    print("match_object.group(1) : ", match_object.group(1))
    print("match_object.group(2) : ", match_object.group(2))
else:
    print("No match found")
re.search Function:
The search function scans the string and returns a match object for the first location where the RE pattern matches, with optional flags. It returns None if no position matches.
Syntax:
re.search(pattern, string, flags=0)
#re.search function-tutorialsart.com
import re

test_string = "Python Regular Expression"
pattern = r'(.*) (.*?) .*'
search_object = re.search(pattern, test_string, re.M | re.I)
if search_object:
    print("search_object.group() : ", search_object.group())
    print("search_object.group(1) : ", search_object.group(1))
    print("search_object.group(2) : ", search_object.group(2))
else:
    print("No match found")
re.match and re.search take the same parameters and return match objects with the same methods.
These are parameters used in both functions:
Parameters | Description |
pattern | This is the regular expression to be matched |
string | This is the string to be searched. re.match tests the pattern only at the beginning of the string; re.search tests it at every position. |
flags | Different flags can be specified using bitwise OR (|). These are modifiers. |
Below are the match object methods available from both functions:
Object Methods | Description |
groups() | Returns all matching subgroups in a tuple. |
group(num=0) | Returns the entire match (num=0) or the subgroup with the given number. |
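The two methods above can be sketched as follows. The sample text and pattern are illustrative assumptions, not taken from the tutorial's own examples.

```python
import re

# Hypothetical sample text; the pattern captures two words.
text = "Python Regular Expression"
m = re.match(r"(\w+) (\w+)", text)

whole = m.group()        # entire match (same as m.group(0))
first = m.group(1)       # first subgroup only
pair = m.group(1, 2)     # several subgroups at once, returned as a tuple
all_groups = m.groups()  # every subgroup, returned as a tuple

print(whole)
print(first)
print(pair)
print(all_groups)
```

Calling group() on None raises an AttributeError, so real code should check that the match succeeded first, as the examples above do.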
re.findall Function:
The search() function returns only the first match of a given pattern, whereas the findall() function returns all non-overlapping matches of the pattern in the string, as a list, in a single step. To search a file, read its contents into a string first, or call findall() on each line.
Syntax:
re.findall(pattern, string, flags=0)
Example:
#re.findall function-tutorialsart.com
import re

pattern = 'tutorialsart'
string = "Python Regular Expression at tutorialsart.com from tutorialsart.com"
result = re.findall(pattern, string)
if result:
    print('match found:', result)
else:
    print("No match found")
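One detail worth seeing in code is that the shape of the findall() result depends on how many groups the pattern contains. The sketch below uses a made-up input string to show the three cases.

```python
import re

# Hypothetical input: name=age pairs.
text = "alice=30, bob=25, carol=41"

# No groups: a list of whole matches.
no_groups = re.findall(r"\d+", text)

# One group: a list of that group's text only.
one_group = re.findall(r"(\w+)=\d+", text)

# Two or more groups: a list of tuples, one tuple per match.
two_groups = re.findall(r"(\w+)=(\d+)", text)

print(no_groups)
print(one_group)
print(two_groups)
```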
Search and Replace:
The re module also supports search and replace through the sub() function. It replaces all occurrences of the RE pattern in string with repl, unless count is given, in which case at most count occurrences are replaced. It returns the modified string.
Syntax:
re.sub(pattern, repl, string, count=0, flags=0)
Example:
#search and replace
import re

phone = "897-923-451 #Given Number"

# Delete Python-style comments
number = re.sub(r'#.*$', "", phone)
print("Given Number : ", number)

# Remove anything other than digits
number = re.sub(r'\D', "", phone)
print("Given Number : ", number)
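As a small sketch of two further re.sub() features, the snippet below limits the number of replacements with count and reuses captured groups in the replacement string. The phone number is the one from the example above, without the trailing comment.

```python
import re

phone = "897-923-451"

# count=1 replaces only the first occurrence of the pattern.
first_only = re.sub(r"-", "", phone, count=1)

# \1, \2, \3 in the replacement refer to the captured groups.
reordered = re.sub(r"(\d+)-(\d+)-(\d+)", r"\3-\2-\1", phone)

print(first_only)
print(reordered)
```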
Matching Vs Searching:
Python offers two different primitive operations based on regular expressions:
match checks for a match only at the beginning of the string, whereas search checks for a match anywhere within the string.
#search vs match
import re

string = "John and Bob are friends"

matchObj = re.match(r'Bob', string, re.M | re.I)
if matchObj:
    print("match --> matchObj.group() : ", matchObj.group())
else:
    print("No match found")

searchObj = re.search(r'Bob', string, re.M | re.I)
if searchObj:
    print("search --> searchObj.group() : ", searchObj.group())
else:
    print("Nothing found")
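A related point that is easy to miss: re.M does not change where re.match starts. Even in multiline mode, match only tries the very beginning of the string, while search with ^ can find the start of an internal line. The two-line sample string is an assumption for illustration.

```python
import re

# Hypothetical two-line string; "Bob" starts the second line.
text = "first line\nBob is here"

# match still only tries position 0, even with re.M.
m = re.match(r"^Bob", text, re.M)

# search with re.M lets ^ match at the start of the second line.
s = re.search(r"^Bob", text, re.M)

print(m)
print(s.group(), s.start())
```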
Regular Expression Modifiers:
Option Flags
Functions in the re module accept optional modifiers, passed as the flags argument, that control various aspects of matching. You can combine multiple modifiers using bitwise OR (|).
Modifier | Description |
re.I | Performs case-insensitive matching. |
re.L | Interprets words according to the current locale. This interpretation affects the alphabetic group (\w and \W), as well as word boundary behavior (\b and \B). In Python 3 it can be used only with bytes patterns. |
re.X | Permits more readable (“verbose”) regular expression syntax. It ignores whitespace (except inside a set [] or when escaped by a backslash) and treats an unescaped # as a comment marker. |
re.U | Interprets letters according to the Unicode character set. This flag affects the behavior of \w, \W, \b, \B. In Python 3 this is already the default for str patterns. |
re.S | Makes a period (dot) match any character, including a newline. |
re.M | Makes $ match the end of a line (not just the end of the string) and makes ^ match the start of any line (not just the start of the string). |
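The flags in the table can be tried out with a short sketch like the one below. The sample strings and the phone-number pattern are illustrative assumptions.

```python
import re

# Hypothetical two-line sample.
text = "python\nPYTHON rocks"

# re.I and re.M combined with bitwise OR: ^ matches at each line start,
# and letter case is ignored.
line_starts = re.findall(r"^python", text, re.I | re.M)

# Without re.S the dot does not match the newline between the two words.
dot_default = re.search(r"python.PYTHON", text)
dot_all = re.search(r"python.PYTHON", text, re.S)

# re.X allows whitespace and comments inside the pattern.
number = re.compile(r"""
    \d{3}   # first three digits
    -       # literal hyphen
    \d{4}   # last four digits
""", re.X)
found = number.search("call 555-1234").group()

print(line_starts)
print(dot_default, dot_all is not None)
print(found)
```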
Regular-Expression Patterns:
In Python, all characters match themselves except for the metacharacters (+ ? . * ^ $ ( ) [ ] { } | \). A metacharacter can be escaped by preceding it with a backslash.
Given below is the table for those patterns:
Patterns | Description |
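^ | Matches beginning of the string; with re.M, also the beginning of each line. |
. | Matches any single character except a newline. Using the re.S option allows it to match the newline as well. |
$ | Matches end of the string (or just before a trailing newline); with re.M, also the end of each line. |
\G | Matches the point where the last match finished. This is a Perl/Ruby anchor and is not supported by Python's re module. |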
[…] | Matches any single character in brackets. |
[^…] | Matches any single character not in brackets. |
re* | Matches 0 or more occurrences of the preceding expression. |
re? | Matches 0 or 1 occurrence of preceding expression. |
re+ | Matches 1 or more occurrence of the preceding expression. |
re{n,} | Matches n or more occurrences of the preceding expression. |
\b | Matches word boundaries when outside brackets. Matches backspace (0x08) when inside brackets. |
re{n} | Matches exactly n occurrences of the preceding expression. |
re{n,m} | Matches at least n and at most m occurrences of the preceding expression. |
(re) | Groups regular expressions and remembers matched text. |
a|b | Matches either a or b. |
\s | Matches whitespace. Equivalent to [ \t\n\r\f\v] for ASCII text. |
\S | Matches non-whitespace. |
\d | Matches digits. Equivalent to [0-9]. |
\D | Matches non-digits. |
\Z | Matches only at the end of the string. |
\z | Matches end of the string in Perl and Ruby. Python's re module has traditionally offered only \Z for this, so use \Z in portable code. |
\A | Matches beginning of string. |
\w | Matches word characters. |
\W | Matches non-word characters. |
(?#…) | Comments. |
(?>re) | Matches independent pattern without backtracking (atomic group; available in Python 3.11 and later). |
(?!re) | Negative lookahead: matches at a position that is not followed by re. Consumes no characters. |
(?=re) | Positive lookahead: matches at a position that is followed by re. Consumes no characters. |
(?imx) | Turns on i, m, or x options for the whole regular expression. In Python it must appear at the very start of the pattern. |
(?:re) | Groups regular expressions without remembering matched text. |
(?-imx:re) | Temporarily toggles off i, m, or x options within parentheses. |
(?imx:re) | Temporarily toggles on i, m, or x options within parentheses. |
(?-imx) | Toggles off i, m, or x options for the rest of the expression in Perl and Ruby. Python does not accept this form; use (?-imx:re) instead. |
\B | Matches non-word boundaries. |
\n, \t, etc. | Matches newlines, carriage returns, tabs, etc. |
\1…\9 | Matches the text of the nth grouped subexpression. |
\10 | In Python, matches the text of group 10 (group numbers 1 to 99 are allowed). An escape that starts with 0, or that is three octal digits long, is read as the octal code of a character instead. |
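Because metacharacters have special meaning, searching for them literally requires escaping. The sketch below, using a made-up price string, contrasts an unescaped pattern, a manually escaped one, and re.escape(), which escapes a literal string automatically.

```python
import re

# Hypothetical text containing several metacharacters.
price = "Total: $5.00 (approx.)"

# Unescaped: $ is an end-of-string anchor, so this cannot match.
unescaped = re.search(r"$5.00", price)

# Escaped by hand: \$ and \. match the literal characters.
escaped = re.search(r"\$5\.00", price)

# re.escape() builds a pattern that matches the given text literally.
auto = re.search(re.escape("(approx.)"), price)

print(unescaped)
print(escaped.group())
print(auto.group())
```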
Regular Expression Examples:
1) Literal characters:
Example | Description |
python | Match “python”. |
2) Character classes:
Examples | Description |
[Pp]ython | Match “Python” or “python” |
[aeiou] | Match any one lowercase vowel |
rub[ye] | Match “ruby” or “rube” |
[0-9] | Match any digit; same as [0123456789] |
[a-z] | Match any lowercase ASCII letter |
[A-Z] | Match any uppercase ASCII letter |
[a-zA-Z0-9] | Match any of the above |
[^aeiou] | Match anything other than a lowercase vowel |
[^0-9] | Match anything other than a digit |
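A few of the character classes above, applied to short made-up strings:

```python
import re

# Hypothetical sample text.
text = "Python 3 and python 2"

# [Pp] accepts either case for the first letter.
names = re.findall(r"[Pp]ython", text)

# [0-9] matches one digit at a time.
digits = re.findall(r"[0-9]", text)

# A leading ^ inside brackets negates the class.
others = re.findall(r"[^a-zA-Z0-9]", "a-b_c!")

print(names)
print(digits)
print(others)
```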
3) Special Character Classes:
Examples | Description |
. | Match any character except newline |
\D | Match a nondigit: [^0-9] |
\d | Match a digit: [0-9] |
\s | Match a whitespace character: [ \t\r\n\f\v] |
\S | Match nonwhitespace: [^ \t\r\n\f\v] |
\w | Match a single word character: [A-Za-z0-9_] |
\W | Match a nonword character: [^A-Za-z0-9_] |
4) Repetition Cases:
Examples | Description |
ruby? | Match “rub” or “ruby”: the y is optional |
ruby* | Match “rub” plus 0 or more ys |
ruby+ | Match “rub” plus 1 or more ys |
\d{3} | Match exactly 3 digits |
\d{3,} | Match 3 or more digits |
\d{3,5} | Match 3, 4, or 5 digits |
5) Nongreedy repetition:
Examples | Description |
<.*> | Greedy repetition: matches “<python>perl>” |
<.*?> | Nongreedy: matches “<python>” in “<python>perl>” |
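The difference between the two rows above can be checked directly on the same string used in the table:

```python
import re

text = "<python>perl>"

# Greedy: .* grabs as much as possible, up to the last ">".
greedy = re.search(r"<.*>", text).group()

# Nongreedy: .*? stops at the first ">" that allows a match.
lazy = re.search(r"<.*?>", text).group()

print(greedy)
print(lazy)
```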
6) Grouping with Parentheses:
Examples | Description |
\D\d+ | No group: + repeats \d |
(\D\d)+ | Grouped: + repeats \D\d pair |
([Pp]ython(, )?)+ | Match “Python”, “Python, python, python”, etc. |
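The grouping examples can be sketched with re.fullmatch(), which succeeds only if the whole string matches. The test strings are illustrative assumptions.

```python
import re

# No group: + repeats only \d, so a second non-digit breaks the match.
ungrouped_ok = re.fullmatch(r"\D\d+", "a123")
ungrouped_fail = re.fullmatch(r"\D\d+", "a1b2")

# Grouped: + repeats the whole \D\d pair.
grouped = re.fullmatch(r"(\D\d)+", "a1b2c3")

# A repeated group remembers only its last repetition.
last_pair = grouped.group(1)

# The list pattern from the table, with an optional ", " separator.
listing = re.fullmatch(r"([Pp]ython(, )?)+", "Python, python, python")

print(ungrouped_ok is not None, ungrouped_fail)
print(last_pair)
print(listing is not None)
```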
7) Backreferences:
Examples | Description |
([Pp])ython&\1ails | Match python&pails or Python&Pails |
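(['"])[^\1]*\1 | Single or double-quoted string. \1 matches whatever the 1st group matched. \2 matches whatever the 2nd group matched, etc. Note that in Python \1 inside brackets is not a backreference (it denotes the character with octal code 1), so [^\1] does not exclude the opening quote. |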
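A minimal sketch of backreferences in Python follows. The quoted-string pattern here is a variant, not the one from the table: it uses a negative lookahead, (?!\1), in place of a bracket expression, since Python does not treat \1 as a backreference inside brackets. The sample strings are assumptions.

```python
import re

# \1 must repeat whatever ([Pp]) captured, so the cases have to agree.
pair = re.compile(r"([Pp])ython&\1ails")
upper = pair.fullmatch("Python&Pails")
lower = pair.fullmatch("python&pails")
mixed = pair.fullmatch("Python&pails")

# Quoted string: consume characters until the same quote appears again.
quoted = re.compile(r"""(['"])(?:(?!\1).)*\1""")
found = quoted.search('say "it\'s fine" now').group()

print(upper is not None, lower is not None, mixed)
print(found)
```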
8) Special Syntax with Parentheses:
Examples | Description |
R(?#comment) | Matches “R”. All the rest is a comment |
R(?i:uby) | Case-insensitive while matching “uby” |
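(?i)Ruby | Case-insensitive for the whole pattern. Python requires a global inline flag such as (?i) at the start of the expression, so R(?i)uby is rejected by current versions. |
rub(?:y|le) | Group only without creating \1 backreference |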
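The parenthesized forms above behave as follows in Python. This is a small sketch with made-up test strings; the scoped flag form (?i:...) assumes Python 3.6 or later.

```python
import re

# (?i:...) makes only "uby" case-insensitive; the leading R stays strict.
scoped_upper = re.fullmatch(r"R(?i:uby)", "RUBY")
scoped_lower = re.fullmatch(r"R(?i:uby)", "ruby")

# (?:...) groups for alternation but creates no numbered group.
non_capturing = re.fullmatch(r"rub(?:y|le)", "ruble")
captured = non_capturing.groups()

# (?#...) is ignored by the engine.
commented = re.fullmatch(r"R(?#comment)uby", "Ruby")

print(scoped_upper is not None, scoped_lower)
print(captured)
print(commented is not None)
```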
9) Anchors:
Examples | Description |
^Python | Match “Python” at the start of a string or, with re.M, of an internal line. |
\APython | Match “Python” at the start of a string. |
Python\Z | Match “Python” at the end of a string. |
\bPython\b | Match “Python” at a word boundary. |
\brub\B | \B is nonword boundary: match “rub” in “rube” and “ruby” but not alone. |
Python(?=!) | Match “Python”, if followed by an exclamation point. |
Python(?!!) | Match “Python”, if not followed by an exclamation point. |
Python$ | Match “Python” at the end of a string or, with re.M, of a line. |
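The anchors and lookaheads above can be exercised with a short sketch. The test strings are illustrative assumptions.

```python
import re

two_lines = "Python\nPython"

# With re.M, ^ matches at every line start; \A only at the string start.
caret = re.findall(r"^Python", two_lines, re.M)
string_start = re.findall(r"\APython", two_lines, re.M)

# \B requires that "rub" is followed by another word character.
inside_word = re.search(r"\brub\B", "ruby").group()
alone = re.search(r"\brub\B", "rub")

# Lookaheads test what follows without including it in the match.
followed = re.search(r"Python(?=!)", "Python!").group()
not_followed = re.search(r"Python(?!!)", "Python!")

print(caret, string_start)
print(inside_word, alone)
print(followed, not_followed)
```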
10) Alternatives:
Examples | Description |
python|perl | Match “python” or “perl” |
rub(y|le) | Match “ruby” or “ruble” |
Python(!+|\?) | “Python” followed by one or more ! or one ? |