Python Regular Expressions

A regular expression (regex) is a sequence of characters that defines a search pattern for matching text within strings. The pattern is written in a specialized syntax and lets you match or find other strings or sets of strings.

Python supports regular expressions through the built-in “re” module, which provides functions such as split(), search(), match(), sub(), and findall().

The main matching functions covered first are:
  • re.match()
  • re.search()
  • re.findall()

re.match Function:

The match function attempts to match the RE pattern at the beginning of string, with optional flags. The re.match function returns a match object on success and None on failure.

Syntax:
re.match(pattern, string, flags=0)

# import re module
import re

Substring1 = 'tutorialsart'
Substring2 = 'Python'
String1 = '''You are learning Python 
            Regular Expression at tutorialsart.com'''
String2 = 'Python Regular Expression at tutorialsart.com'

# Substring1 is not at the start of String1, so re.match returns None
match_object1 = re.match(Substring1, String1, re.IGNORECASE)
if match_object1:
    print('Matched string: ', match_object1.group())
else:
    print('No match found for Substring1')

# Substring2 is at the start of String2, so re.match succeeds
match_object2 = re.match(Substring2, String2, re.IGNORECASE)
if match_object2:
    print('Match for Substring2: ', match_object2.group())
else:
    print('No match found')

#re.match function-tutorialsart.com
import re

test_string = "Python Regular Expression at tutorialsart.com"
pattern = r'(.*) (.*?) .*'
match_object = re.match(pattern, test_string, re.M|re.I)
if match_object:
    print("match_object.group() : ", match_object.group())
    print("match_object.group(1) : ", match_object.group(1))
    print("match_object.group(2) : ", match_object.group(2))
else:
    print("No match found")

re.search Function:

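The search function scans through string and returns a match object for the first location where the RE pattern matches, with optional flags. It returns None if the pattern matches nowhere in the string.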

Syntax:
re.search(pattern, string, flags=0)

#re.search function-tutorialsart.com
import re

test_string = "Python Regular Expression"
pattern = r'(.*) (.*?) .*'
search_object = re.search(pattern, test_string, re.M|re.I)
if search_object:
    print("search_object.group() : ", search_object.group())
    print("search_object.group(1) : ", search_object.group(1))
    print("search_object.group(2) : ", search_object.group(2))
else:
    print("No match found")

re.match and re.search take the same parameters and return the same kind of match object.

These are parameters used in both functions:

Parameter   Description
pattern     The regular expression to be matched.
string      The string to be searched. re.match tests the pattern only at the beginning of the string; re.search scans the whole string.
flags       Optional modifiers. Several flags can be combined using bitwise OR (|).

Below are the match object methods available from both functions:

Method         Description
groups()       Returns all matched subgroups as a tuple.
group(num=0)   Returns the entire match (num=0) or the subgroup with the given number.
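
As a minimal sketch of the two methods above, the following uses a made-up sample string and pattern (both are assumptions chosen only for illustration) to show how group() and groups() relate:

```python
import re

# Hypothetical sample text and pattern, chosen only for illustration.
text = "Python Regular Expression"
m = re.match(r"(\w+) (\w+)", text)

whole = m.group()    # same as m.group(0): the entire match
first = m.group(1)   # text captured by the first parenthesized group
second = m.group(2)  # text captured by the second group
both = m.groups()    # tuple holding every captured group

print(whole)          # Python Regular
print(first, second)  # Python Regular
print(both)           # ('Python', 'Regular')
```

Because both functions return None on failure, real code should check the result before calling group(), as the earlier examples do.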

re.findall Function:

The search() function returns only the first match of a given pattern, whereas findall() returns all occurrences that match it. findall() scans the string from left to right and returns all non-overlapping matches of the pattern as a list, in a single call.

Syntax:
re.findall(pattern, string, flags=0)

#re.findall function-tutorialsart.com
import re

pattern = 'tutorialsart'
string = "Python Regular Expression at tutorialsart.com from tutorialsart.com"
result = re.findall(pattern, string)
if result:
    print('match found:', result)
else:
    print("No match found")
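
The shape of the list returned by findall() depends on whether the pattern contains groups. The sketch below, using invented sample strings, shows the three common cases: no groups, several groups, and the non-overlapping behavior mentioned above.

```python
import re

# Hypothetical sample text, chosen only for illustration.
text = "a1 b22 c333"

# No groups: a list of the matched substrings.
digits = re.findall(r"\d+", text)

# Two groups: a list of tuples, one tuple per match.
pairs = re.findall(r"([a-z])(\d+)", text)

# Matches never overlap: "aaaa" holds two "aa" matches, not three.
non_overlapping = re.findall(r"aa", "aaaa")

print(digits)           # ['1', '22', '333']
print(pairs)            # [('a', '1'), ('b', '22'), ('c', '333')]
print(non_overlapping)  # ['aa', 'aa']
```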

Search and Replace:

The re module also supports search and replace through re.sub(pattern, repl, string, count=0, flags=0). It replaces occurrences of the pattern in string with repl. All occurrences are replaced unless count is given, in which case at most count replacements are made. The function returns the modified string.

Syntax:
#search and replace

import re

phone = "897-923-451 #Given Number"
# Remove the trailing comment (everything from # to the end of the string)
number = re.sub(r'#.*$', "", phone)
print("Given Number : ", number)
# Remove anything other than digits
number = re.sub(r'\D', "", phone)
print("Digits only : ", number)
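
Two further features of re.sub are worth a small sketch: limiting the number of replacements with count, and reusing captured groups in the replacement text. The date string below is an invented example, not data from this tutorial.

```python
import re

# Hypothetical sample text, chosen only for illustration.
dates = "2024-01-15 and 2023-12-31"

# count=1 replaces only the first occurrence of the pattern.
first_only = re.sub(r"-", "/", dates, count=1)

# \1, \2 and \3 in the replacement refer to the captured groups.
reordered = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\3/\2/\1", dates)

print(first_only)  # 2024/01-15 and 2023-12-31
print(reordered)   # 15/01/2024 and 31/12/2023
```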

Matching Vs Searching:

Python offers two different primitive operations based on regular expressions:

match checks for a match only at the beginning of the string, whereas search checks for a match anywhere in the string.

#search vs match

import re

string = "John and Bob are friends"
# "Bob" is not at the start of the string, so match fails
matchObj = re.match(r'Bob', string, re.M|re.I)
if matchObj:
    print("match --> matchObj.group() : ", matchObj.group())
else:
    print("No match found")
# search scans the whole string, so it finds "Bob"
searchObj = re.search(r'Bob', string, re.M|re.I)
if searchObj:
    print("search --> searchObj.group() : ", searchObj.group())
else:
    print("Nothing found")

Regular Expression Modifiers:

Option Flags

Regular expression functions accept an optional modifier to control various aspects of matching. The modifiers are passed as the optional flags argument. You can provide multiple modifiers by combining them with bitwise OR (|).

Modifier   Description
re.I       Performs case-insensitive matching.
re.L       Interprets words according to the current locale. This interpretation affects the alphabetic group (\w and \W), as well as word boundary behavior (\b and \B). In Python 3 it can be used only with bytes patterns.
re.X       Permits more readable (“verbose”) regular expression syntax. It ignores whitespace (except inside a set [] or when escaped by a backslash) and treats an unescaped # as a comment marker.
re.U       Interprets letters according to the Unicode character set. This flag affects the behavior of \w, \W, \b, \B. In Python 3 this is already the default for str patterns.
re.S       Makes a period (dot) match any character, including a newline.
re.M       Makes $ match the end of a line (not just the end of the string) and makes ^ match the start of any line (not just the start of the string).
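
The effect of the flags is easiest to see side by side. The following sketch uses invented sample text and a hypothetical phone pattern to compare calls with and without re.I, re.M, re.S, and re.X:

```python
import re

# Hypothetical two-line sample text, chosen only for illustration.
text = "first line\nSECOND line"

# re.I ignores case; re.M lets ^ match at the start of every line.
with_both = re.findall(r"^second", text, re.I | re.M)
only_ignorecase = re.findall(r"^second", text, re.I)

# re.S lets the dot cross the newline between the two lines.
dot_default = re.match(r"first.*SECOND", text)
dot_all = re.match(r"first.*SECOND", text, re.S)

# re.X allows whitespace and comments inside the pattern itself.
phone = re.compile(r"""
    (\d{3})   # first block of digits
    -         # literal hyphen
    (\d{3})   # second block of digits
""", re.X)
blocks = phone.match("897-923-451").groups()

print(with_both)        # ['SECOND']
print(only_ignorecase)  # []
print(dot_default)      # None
print(blocks)           # ('897', '923')
```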

Regular-Expression Patterns:

In Python regular expressions, all characters match themselves except for the metacharacters (+ ? . * ^ $ ( ) [ ] { } | \). A metacharacter can be matched literally by preceding it with a backslash.
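
As a short illustration of escaping, the sketch below compares an unescaped dot with an escaped one, and uses re.escape (a standard re function not otherwise covered here) to escape a whole literal string. The sample strings are assumptions made up for the example.

```python
import re

# Hypothetical sample text, chosen only for illustration.
sample = "3.50 3x50"

unescaped = re.findall(r"3.50", sample)   # the dot matches any character
escaped = re.findall(r"3\.50", sample)    # the backslash makes the dot literal

# re.escape builds the escaped form of an arbitrary literal string.
literal = re.escape("(approx.)")
hit = re.search(literal, "Price: 3.50 (approx.)")

print(unescaped)    # ['3.50', '3x50']
print(escaped)      # ['3.50']
print(hit.group())  # (approx.)
```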

Given below is the table for those patterns:

Pattern        Description
^              Matches beginning of the string; with the re.M option, also the beginning of each line.
.              Matches any single character except a newline. Using the re.S option allows it to match the newline as well.
$              Matches end of the string (or just before a trailing newline); with the re.M option, also the end of each line.
\G             Matches the point where the last match finished. Not supported by Python's re module.
[…]            Matches any single character in brackets.
[^…]           Matches any single character not in brackets.
re*            Matches 0 or more occurrences of the preceding expression.
re?            Matches 0 or 1 occurrence of the preceding expression.
re+            Matches 1 or more occurrences of the preceding expression.
re{n,}         Matches n or more occurrences of the preceding expression.
\b             Matches word boundaries when outside brackets. Matches backspace (0x08) when inside brackets.
re{n}          Matches exactly n occurrences of the preceding expression.
re{n,m}        Matches at least n and at most m occurrences of the preceding expression.
(re)           Groups regular expressions and remembers matched text.
a|b            Matches either a or b.
\s             Matches whitespace. Equivalent to [ \t\n\r\f\v] for ASCII text.
\S             Matches non-whitespace.
\d             Matches digits. Equivalent to [0-9] for ASCII text.
\D             Matches non-digits.
\Z             Matches only at the end of the string.
\z             Matches end of the string in other regex flavors. Older versions of Python's re module do not recognize it; use \Z instead.
\A             Matches beginning of string.
\w             Matches word characters.
\W             Matches non-word characters.
(?#…)          Comments.
(?>re)         Matches independent pattern without backtracking (atomic group). Available in Python 3.11 and later.
(?!re)         Negative lookahead: succeeds at a position where re does not match. Consumes no characters.
(?=re)         Positive lookahead: succeeds at a position where re matches. Consumes no characters.
(?imx)         Turns on the i, m, or x options. In Python it must appear at the start of the pattern and applies to the whole regular expression.
(?:re)         Groups regular expressions without remembering matched text.
(?-imx:re)     Temporarily toggles off i, m, or x options within parentheses.
(?imx:re)      Temporarily toggles on i, m, or x options within parentheses.
(?-imx)        Toggles off i, m, or x options in other regex flavors. Not supported by Python's re module; use (?-imx:re) instead.
\B             Matches non-word boundaries.
\n, \t, etc.   Matches newlines, carriage returns, tabs, etc.
\1…\9          Matches nth grouped subexpression.
\10            Matches the 10th grouped subexpression. An escape that starts with 0 or is three octal digits long is read as the octal representation of a character code instead.

Regular Expression Examples:

1) Literal characters:

Example   Description
python    Match “python”.

2) Character classes:

Example        Description
[Pp]ython      Match “Python” or “python”
[aeiou]        Match any one lowercase vowel
rub[ye]        Match “ruby” or “rube”
[0-9]          Match any digit; same as [0123456789]
[a-z]          Match any lowercase ASCII letter
[A-Z]          Match any uppercase ASCII letter
[a-zA-Z0-9]    Match any of the above
[^aeiou]       Match anything other than a lowercase vowel
[^0-9]         Match anything other than a digit
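
A few of the character classes above can be tried out as follows. This is only a sketch: the sample sentence is an assumption invented for the example.

```python
import re

# Hypothetical sample text, chosen only for illustration.
text = "Python and python, ruby and rube, 42"

either_case = re.findall(r"[Pp]ython", text)
ruby_or_rube = re.findall(r"rub[ye]", text)
single_digits = re.findall(r"[0-9]", text)

# A negated class: delete everything that is not a digit.
digits_only = re.sub(r"[^0-9]", "", text)

print(either_case)    # ['Python', 'python']
print(ruby_or_rube)   # ['ruby', 'rube']
print(single_digits)  # ['4', '2']
print(digits_only)    # 42
```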

3) Special Character Classes:

Example   Description
.         Match any character except newline
\D        Match a nondigit: [^0-9]
\d        Match a digit: [0-9]
\s        Match a whitespace character: [ \t\r\n\f]
\S        Match nonwhitespace: [^ \t\r\n\f]
\w        Match a single word character: [A-Za-z0-9_]
\W        Match a nonword character: [^A-Za-z0-9_]

4) Repetition Cases:

Example    Description
ruby?      Match “rub” or “ruby”: the y is optional
ruby*      Match “rub” plus 0 or more ys
ruby+      Match “rub” plus 1 or more ys
\d{3}      Match exactly 3 digits
\d{3,}     Match 3 or more digits
\d{3,5}    Match 3, 4, or 5 digits
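
The repetition operators in the table can be compared on one input. The sketch below runs each pattern with findall() over invented sample strings, so the strings themselves are assumptions for illustration only.

```python
import re

# Hypothetical sample text, chosen only for illustration.
words = "rub ruby rubyyy"

optional_y = re.findall(r"ruby?", words)   # at most one y is consumed
zero_or_more = re.findall(r"ruby*", words)
one_or_more = re.findall(r"ruby+", words)

numbers = "12 123 1234567"
exactly_three = re.findall(r"\d{3}", "1234567")
three_to_five = re.findall(r"\d{3,5}", numbers)

print(optional_y)     # ['rub', 'ruby', 'ruby']
print(zero_or_more)   # ['rub', 'ruby', 'rubyyy']
print(one_or_more)    # ['ruby', 'rubyyy']
print(exactly_three)  # ['123', '456']
print(three_to_five)  # ['123', '12345']
```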

5) Nongreedy repetition:

Example   Description
<.*>      Greedy repetition: matches “<python>perl>”
<.*?>     Nongreedy: matches “<python>” in “<python>perl>”

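The difference between greedy and nongreedy repetition can be checked directly. This minimal sketch reuses the “<python>perl>” string from the table:

```python
import re

text = "<python>perl>"

# Greedy: .* grabs as much as possible, then backs off to the last ">".
greedy = re.match(r"<.*>", text).group()

# Nongreedy: .*? grabs as little as possible, stopping at the first ">".
lazy = re.match(r"<.*?>", text).group()

print(greedy)  # <python>perl>
print(lazy)    # <python>
```
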
6) Grouping with Parentheses:

Example              Description
\D\d+                No group: + repeats \d
(\D\d)+              Grouped: + repeats \D\d pair
([Pp]ython(, )?)+    Match “Python”, “Python, python, python”, etc.
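
To make the effect of the parentheses concrete, the sketch below applies the patterns from the table to invented inputs. Note that when a group is repeated, group(1) holds only the last repetition.

```python
import re

# Without a group, + applies only to \d.
no_group = re.match(r"\D\d+", "a123").group()

# With a group, + repeats the whole \D\d pair.
grouped = re.match(r"(\D\d)+", "a1b2c3")
whole = grouped.group()
last_pair = grouped.group(1)  # a repeated group keeps its last repetition

names = re.match(r"([Pp]ython(, )?)+", "Python, python, python").group()

print(no_group)   # a123
print(whole)      # a1b2c3
print(last_pair)  # c3
print(names)      # Python, python, python
```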

7) Backreferences:

Example               Description
([Pp])ython&\1ails    Match python&pails or Python&Pails
(['"])[^\1]*\1        Single or double-quoted string. \1 matches whatever the 1st group matched. \2 matches whatever the 2nd group matched, etc. In Python, \1 inside brackets is not a backreference, so (['"]).*?\1 is a more reliable form.
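
Backreferences can be exercised as in the sketch below. The quoted-string pattern uses a nongreedy body, (['"])(.*?)\1, which is one common way to write it and is an assumption of this example rather than the only possible form; the input strings are invented.

```python
import re

pattern = r"([Pp])ython&\1ails"

same_upper = re.match(pattern, "Python&Pails")
same_lower = re.match(pattern, "python&pails")
mixed = re.match(pattern, "Python&pails")  # \1 must repeat the captured "P"

# \1 forces the closing quote to be the same character as the opening one.
text = "say \"hi\" and 'bye'"
quoted = re.findall(r"""(['"])(.*?)\1""", text)

print(same_upper is not None)  # True
print(same_lower is not None)  # True
print(mixed)                   # None
print(quoted)                  # [('"', 'hi'), ("'", 'bye')]
```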

8) Special Syntax with Parentheses:

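Example         Description
R(?#comment)    Matches “R”. All the rest is a comment
R(?i:uby)       Case-insensitive while matching “uby”
R(?i)uby        Case-insensitive while matching “uby” in other regex flavors. Python requires (?i) at the start of the pattern, where it applies to the whole expression
rub(?:y|le)     Group only without creating \1 backreference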

9) Anchors:

Example        Description
^Python        Match “Python” at the start of a string or internal line.
\APython       Match “Python” at the start of a string.
Python\Z       Match “Python” at the end of a string.
\bPython\b     Match “Python” at a word boundary.
\brub\B        \B is nonword boundary: match “rub” in “rube” and “ruby” but not alone.
Python(?=!)    Match “Python”, if followed by an exclamation point.
Python(?!!)    Match “Python”, if not followed by an exclamation point.
Python$        Match “Python” at the end of a string or line.
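
Anchors match positions rather than characters, so the sketch below reports where each match starts. The sample strings are assumptions made up for illustration.

```python
import re

# Hypothetical sample text: "Python" starts at offsets 0, 8 and 16.
text = "Python! Python? Pythonic"

def starts(pattern, string, flags=0):
    """Return the start offset of every match of pattern in string."""
    return [m.start() for m in re.finditer(pattern, string, flags)]

followed_by_bang = starts(r"Python(?=!)", text)
not_followed_by_bang = starts(r"Python(?!!)", text)
whole_word = starts(r"\bPython\b", text)
inside_word = re.findall(r"\brub\B", "rub ruby rube")

# ^ matches at internal line starts only when re.M is given.
lines = "Python\nPython"
default_start = starts(r"^Python", lines)
multiline_start = starts(r"^Python", lines, re.M)

print(followed_by_bang)      # [0]
print(not_followed_by_bang)  # [8, 16]
print(whole_word)            # [0, 8]
print(inside_word)           # ['rub', 'rub']
print(default_start)         # [0]
print(multiline_start)       # [0, 7]
```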

10) Alternatives:

Example          Description
python|perl      Match “python” or “perl”
rub(y|le)        Match “ruby” or “ruble”
Python(!+|\?)    “Python” followed by one or more ! or one ?
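
The alternatives above can be tried with findall(). In this sketch the groups are written as non-capturing (?:...) groups, an adjustment made so that findall() returns the whole match instead of only the group contents; the sample strings are invented.

```python
import re

languages = re.findall(r"python|perl", "python perl ruby")

# (?:...) groups the alternatives without capturing them.
ruby_or_ruble = re.findall(r"rub(?:y|le)", "ruby ruble rubble")
punctuated = re.findall(r"Python(?:!+|\?)", "Python!!! Python? Python.")

print(languages)      # ['python', 'perl']
print(ruby_or_ruble)  # ['ruby', 'ruble']
print(punctuated)     # ['Python!!!', 'Python?']
```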