Python Regular Expressions
Regular Expressions (regex or regexp) in Python are patterns used to match character combinations in strings. The re module provides a range of functions to work with regular expressions.
Import the re module
import re
Pattern | Description |
---|---|
. | Matches any character except newline |
^ | Start of the string |
$ | End of the string |
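* | Matches 0 or more repetitions of the preceding element |
+ | Matches 1 or more repetitions of the preceding element |
? | Matches 0 or 1 repetitions of the preceding element |
{n} | Matches exactly n repetitions of the preceding element |
{n,} | Matches n or more repetitions of the preceding element |
{n,m} | Matches between n and m repetitions of the preceding element |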
[abc] | Matches any character in the set |
[^abc] | Matches any character not in the set |
\d | Matches any digit (0-9; in str patterns also other Unicode decimal digits unless re.ASCII is set) |
\D | Matches any non-digit |
\w | Matches any word character (a-z, A-Z, 0-9, _; in str patterns also Unicode letters and digits unless re.ASCII is set) |
\W | Matches any non-word character |
\s | Matches any whitespace |
\S | Matches any non-whitespace |
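To make the table concrete, here is a small sketch that combines a few of the patterns above. The sample strings and variable names are made up for illustration.

```python
import re

# \d{3} and \d{4} match exactly three and four digits; the hyphen is literal.
phone = re.search(r"\d{3}-\d{4}", "call 555-1234 now")
print(phone.group())  # 555-1234

# [^aeiou\s] matches one character that is neither a vowel nor whitespace.
consonants = re.findall(r"[^aeiou\s]", "hello world")
print(consonants)  # ['h', 'l', 'l', 'w', 'r', 'l', 'd']

# ^ and $ anchor the pattern, so the whole string must be word characters.
is_word = re.search(r"^\w+$", "hello_42") is not None
print(is_word)  # True
```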
1) re.match()
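Matches a pattern at the start of the string and returns a match object, or None if the string does not start with the pattern. Called with no arguments, the match object's group() method returns the entire match. With one argument it returns that subgroup, and with several arguments it returns a tuple of the corresponding subgroups.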
result = re.match(r"hello", "hello world")
print(result.group())
Output
hello
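As a slightly fuller sketch of group(), the pattern below uses two capturing groups. It also shows that re.match() returns None when the pattern is not at the start of the string, so it is usually worth checking the result before calling group(). The variable names are illustrative only.

```python
import re

result = re.match(r"(\w+) (\w+)", "hello world")
if result is not None:
    print(result.group())      # hello world (the entire match)
    print(result.group(1))     # hello (first subgroup)
    print(result.group(1, 2))  # ('hello', 'world') (tuple of subgroups)

# "world" is in the string but not at the start, so there is no match.
no_match = re.match(r"world", "hello world")
print(no_match)  # None
```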
2) re.search()
Searches the entire string and returns a match object for the first occurrence of the pattern, or None if there is no match.
result = re.search(r"world", "hello world")
print(result.group())
Output
world
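The difference from re.match() is easiest to see side by side. This is a minimal sketch with an arbitrary sample string; span() reports where the match was found.

```python
import re

text = "hello world"

at_start = re.match(r"world", text)   # only tries position 0
anywhere = re.search(r"world", text)  # scans forward through the string

print(at_start)         # None
print(anywhere.span())  # (6, 11)
```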
3) re.findall()
Returns a list of all non-overlapping matches in the string.
result = re.findall(r"o", "hello world")
print(result)
Output
['o', 'o']
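One detail worth knowing is that the shape of the list returned by re.findall() depends on how many capturing groups the pattern has. The following sketch, using a made-up input string, shows the three cases.

```python
import re

text = "x=1, y=22"

# No groups: each item is the whole match.
whole = re.findall(r"\w=\d+", text)
print(whole)  # ['x=1', 'y=22']

# One group: each item is that group's text.
one_group = re.findall(r"\w=(\d+)", text)
print(one_group)  # ['1', '22']

# Several groups: each item is a tuple of the groups.
pairs = re.findall(r"(\w)=(\d+)", text)
print(pairs)  # [('x', '1'), ('y', '22')]
```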
4) re.finditer()
Returns an iterator yielding match objects for all matches.
for match in re.finditer(r"o", "hello world"):
    print(match.start(), match.group())
Output
4 o
7 o
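Because each item is a full match object, re.finditer() is handy when you need positions as well as text. A small sketch that collects every word together with its span (the variable names are illustrative):

```python
import re

text = "hello world"

# Pair each matched word with its (start, end) position in the string.
spans = [(m.group(), m.span()) for m in re.finditer(r"\w+", text)]
print(spans)  # [('hello', (0, 5)), ('world', (6, 11))]
```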
5) re.sub()
Replaces all occurrences of the pattern with a specified string.
result = re.sub(r"world", "Python", "hello world")
print(result)
Output
hello Python
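re.sub() also accepts a count limit, backreferences in the replacement string, and a function as the replacement. The sketch below shows all three on made-up inputs.

```python
import re

# count=1 replaces only the first occurrence.
first_only = re.sub(r"o", "0", "hello world", count=1)
print(first_only)  # hell0 world

# \1 and \2 in the replacement refer to the captured groups.
swapped = re.sub(r"(\w+) (\w+)", r"\2 \1", "hello world")
print(swapped)  # world hello

# A function receives each match object and returns the replacement text.
doubled = re.sub(r"\d+", lambda m: str(int(m.group()) * 2), "3 apples, 10 pears")
print(doubled)  # 6 apples, 20 pears
```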
6) re.split()
Splits a string by the occurrences of the pattern.
result = re.split(r"\s+", "hello world python")
print(result)
Output
['hello', 'world', 'python']
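Two options of re.split() are worth a quick sketch: maxsplit caps the number of splits, and a capturing group in the pattern keeps the separators in the result. The inputs here are arbitrary examples.

```python
import re

# maxsplit=1 splits once and leaves the rest of the string intact.
limited = re.split(r"\s+", "hello world python", maxsplit=1)
print(limited)  # ['hello', 'world python']

# A capturing group around the separator includes it in the output list.
with_seps = re.split(r"(,)", "a,b,c")
print(with_seps)  # ['a', ',', 'b', ',', 'c']
```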
7) re.compile()
Compiles a regular expression pattern into a regex object for reuse.
pattern = re.compile(r"\d+")
result = pattern.findall("123 abc 456")
print(result)
Output
['123', '456']
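A compiled pattern offers the same operations as methods (search, findall, sub and so on), and flags such as re.IGNORECASE can be set once at compile time. A minimal sketch, with illustrative names and inputs:

```python
import re

# Compile once with a flag, then reuse the object for several operations.
word = re.compile(r"python", re.IGNORECASE)

found = word.findall("Python python PYTHON")
print(found)  # ['Python', 'python', 'PYTHON']

replaced = word.sub("regex", "I like Python")
print(replaced)  # I like regex

has_match = word.search("no snakes here") is not None
print(has_match)  # False
```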