Python Regular Expressions

Regular expressions (regex or regexp) are sequences of characters that define search patterns. They're extremely useful for finding, matching, and manipulating text. In this tutorial, you'll learn how to use Python's built-in re module to perform pattern matching operations.

💡 Real-World Applications

Regular expressions are used in many real-world scenarios:

Form validation (email, phone numbers, passwords)
Data extraction and web scraping
Search and replace operations
Text parsing and processing
Log file analysis

Introduction to Regular Expressions

To use regular expressions in Python, we need to import the re module:

Python

import re

# Basic example: finding a pattern in a string
text = "Contact us at info@example.com for more information."
pattern = r"info@example\.com"  # \. matches a literal dot

if re.search(pattern, text):
    print("Email address found!")
else:
    print("Email address not found.")

⚠️ Note on Raw Strings

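Notice the r prefix in r"info@example\.com". The prefix creates a "raw string", in which Python does not process backslash escapes, so \. reaches the regex engine unchanged and matches a literal dot. This matters because backslashes have special meaning in both Python string literals and regex patterns.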
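
The difference is easy to see in a short experiment. The sketch below uses throwaway strings that are not part of the tutorial's examples, and only standard re functions:

```python
import re

# In a normal string, "\n" is one character (a newline);
# in a raw string, r"\n" is two characters: a backslash and "n".
normal_len = len("\n")
raw_len = len(r"\n")
print(normal_len, raw_len)  # 1 2

# An unescaped dot matches any character, so it also matches "X".
loose = re.search(r"info@example.com", "info@exampleXcom")
# An escaped dot matches only a literal ".".
strict = re.search(r"info@example\.com", "info@exampleXcom")
print(bool(loose), bool(strict))  # True False

# Classic pitfall: in a normal string, "\b" is a backspace character,
# not the regex word-boundary assertion.
without_raw = re.search("\bcat\b", "a cat")
with_raw = re.search(r"\bcat\b", "a cat")
print(bool(without_raw), bool(with_raw))  # False True
```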

Basic Pattern Matching

The re module provides several functions for working with regular expressions:

Python

import re

text = "Python programming is fun and powerful!"

# 1. search() - Find the first match
result = re.search(r"fun", text)
if result:
    print(f"Match found at position: {result.start()}")  # Output: Match found at position: 22

# 2. findall() - Find all matches
matches = re.findall(r"p[a-z]*", text, re.IGNORECASE)
print(matches)  # Output: ['Python', 'programming', 'powerful']

# 3. match() - Check if string starts with the pattern
if re.match(r"Python", text):
    print("Text starts with 'Python'")  # This will print

# 4. split() - Split string by pattern (\s+ is one or more whitespace characters)
words = re.split(r"\s+", text)
print(words)  # Output: ['Python', 'programming', 'is', 'fun', 'and', 'powerful!']

# 5. sub() - Replace pattern with another string
new_text = re.sub(r"fun", "enjoyable", text)
print(new_text)  # Output: Python programming is enjoyable and powerful!
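
A common source of confusion is where each function looks for the pattern. The following sketch contrasts match(), search(), and fullmatch() on the same sentence, and uses finditer() (a standard re function not covered above) to get the position of every hit:

```python
import re

text = "Python programming is fun and powerful!"

# match() only tries at position 0, so "fun" is not found there.
at_start = re.match(r"fun", text)
# search() scans the whole string for the first occurrence.
anywhere = re.search(r"fun", text)
# fullmatch() requires the pattern to cover the entire string.
whole = re.fullmatch(r"Python.*!", text)

print(at_start)           # None
print(anywhere.span())    # (22, 25)
print(whole is not None)  # True

# finditer() yields match objects, so the position of each hit is available.
positions = [(m.group(), m.start())
             for m in re.finditer(r"p\w*", text, re.IGNORECASE)]
print(positions)  # [('Python', 0), ('programming', 7), ('powerful', 30)]
```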

Regular Expression Patterns

Regular expression patterns use special characters to match different types of text:

Character	Description	Example
`.`	Matches any character except newline	`a.c` matches "abc", "axc", etc.
`^`	Matches start of string	`^hello` matches strings starting with "hello"
`$`	Matches end of string	`world$` matches strings ending with "world"
`*`	Matches 0 or more repetitions	`ab*c` matches "ac", "abc", "abbc", etc.
`+`	Matches 1 or more repetitions	`ab+c` matches "abc", "abbc", but not "ac"
`?`	Matches 0 or 1 repetition	`ab?c` matches "ac" or "abc"
`{n}`	Matches exactly n repetitions	`a{3}` matches "aaa"
`{n,}`	Matches n or more repetitions	`a{2,}` matches "aa", "aaa", etc.
`{n,m}`	Matches between n and m repetitions	`a{1,3}` matches "a", "aa", or "aaa"
`[]`	Character set - matches any character in the brackets	`[abc]` matches "a", "b", or "c"
`[^]`	Negated character set - matches any character not in the brackets	`[^abc]` matches any character except "a", "b", or "c"
`\d`	Matches any digit (0-9)	`\d{3}` matches three digits like "123"
`\w`	Matches any word character (letters, digits, and underscore)	`\w+` matches words like "Python3"
`\s`	Matches any whitespace character	`hello\sworld` matches "hello world"
`|`	Alternation (OR)	`cat|dog` matches either "cat" or "dog"
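
A few rows of the table can be checked directly in code. This is a minimal sketch with made-up sample strings; the variable names are only for illustration:

```python
import re

samples = ["ac", "abc", "abbc"]

# Quantifiers: * allows zero "b"s, + requires at least one.
star = [s for s in samples if re.fullmatch(r"ab*c", s)]
plus = [s for s in samples if re.fullmatch(r"ab+c", s)]
print(star)  # ['ac', 'abc', 'abbc']
print(plus)  # ['abc', 'abbc']

# \d{3} means exactly three digits; \w+ is a run of word characters.
digits = re.findall(r"\d{3}", "room 123, ext 4567")
words = re.findall(r"\w+", "Python3 is_fun!")
print(digits)  # ['123', '456']
print(words)   # ['Python3', 'is_fun']

# Alternation, and a negated character set used to drop non-vowels.
pets = re.findall(r"cat|dog", "cat, dog, catalog")
vowels_only = re.sub(r"[^aeiou]", "", "regular")
print(pets)         # ['cat', 'dog', 'cat']
print(vowels_only)  # eua
```

Note that `cat|dog` also matches the "cat" inside "catalog"; patterns match substrings unless you anchor them.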

Common Regex Examples

Email Validation

Python

import re

def is_valid_email(email):
    # Simple email pattern
    pattern = r'^[\w.-]+@[\w.-]+\.\w+$'
    return bool(re.match(pattern, email))

# Test the function
emails = [
    "user@example.com",     # Valid
    "john.doe@company.co",  # Valid
    "invalid@email",        # Invalid - missing top-level domain
    "@missing.com",         # Invalid - missing username
    "spaces not@allowed.com" # Invalid - contains space
]

for email in emails:
    if is_valid_email(email):
        print(f"{email} is a valid email address")
    else:
        print(f"{email} is NOT valid")
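
One subtlety of anchoring with `$`: it also matches just before a trailing newline, so an address read from a file with its line ending still attached would pass. A possible alternative, sketched here with a hypothetical helper name `is_valid_email_strict`, is re.fullmatch(), which must consume the entire string:

```python
import re

# The same simple pattern as above, without the ^ and $ anchors.
EMAIL_BODY = r"[\w.-]+@[\w.-]+\.\w+"

def is_valid_email_strict(email):
    # Hypothetical helper: fullmatch() must consume the whole string,
    # so a trailing newline makes the check fail.
    return re.fullmatch(EMAIL_BODY, email) is not None

with_newline = "user@example.com\n"

# $ matches just before the final newline, so the anchored version accepts it.
anchored = bool(re.match(r"^" + EMAIL_BODY + r"$", with_newline))
strict = is_valid_email_strict(with_newline)
print(anchored, strict)  # True False
```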

Phone Number Extraction

Python

import re

text = """Contact info:
Alice: (123) 456-7890
Bob: 555-123-4567
Charlie: 987.654.3210
"""

# Pattern for different phone formats
pattern = r'[(]?\d{3}[)]?[-.\s]?\d{3}[-.\s]?\d{4}'

# Find all phone numbers
phone_numbers = re.findall(pattern, text)
print("Phone numbers found:")
for number in phone_numbers:
    print(number)

# Output:
# (123) 456-7890
# 555-123-4567
# 987.654.3210

Groups and Capturing

You can use parentheses () to create capture groups in your patterns, which allow you to extract specific parts of the matched text:

Python

import re

# Extracting information from a structured string
log_entry = "2023-05-15 14:32:15 - ERROR - File not found: data.csv"

# Four capture groups: date, time, log level, message
pattern = r"(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}:\d{2}) - (\w+) - (.+)"
match = re.search(pattern, log_entry)

if match:
    date = match.group(1)
    time = match.group(2)
    level = match.group(3)
    message = match.group(4)

    print(f"Date: {date}")
    print(f"Time: {time}")
    print(f"Log Level: {level}")
    print(f"Message: {message}")

# Output:
# Date: 2023-05-15
# Time: 14:32:15
# Log Level: ERROR
# Message: File not found: data.csv
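
Two related behaviours are worth knowing: groups() returns every captured group at once, and findall() returns tuples when the pattern contains more than one group. A short sketch, where the `width=640 height=480` string is an invented sample:

```python
import re

log_entry = "2023-05-15 14:32:15 - ERROR - File not found: data.csv"
pattern = r"(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}:\d{2}) - (\w+) - (.+)"
match = re.search(pattern, log_entry)

# group(0) is the whole match; groups() returns all captured groups as a tuple.
whole = match.group(0)
date, time, level, message = match.groups()
print(level)  # ERROR

# With several capture groups in the pattern, findall() returns a list of tuples.
pairs = re.findall(r"(\w+)=(\d+)", "width=640 height=480")
print(pairs)  # [('width', '640'), ('height', '480')]
```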

Named Groups

For more readable code, you can use named groups with the (?P<name>pattern) syntax:

Python

import re

# Parsing a URL using named groups
url = "https://www.example.com:8080/path/to/page.html?query=value#section"

# Every part except the host is optional; \? matches a literal question mark
pattern = r"(?P<protocol>https?://)?(?P<host>[\w.-]+)(:(?P<port>\d+))?(?P<path>/[\w/.-]*)?(\?(?P<query>[\w=&]+))?(?P<fragment>#[\w-]+)?"
match = re.search(pattern, url)

if match:
    # Access groups by name; unmatched optional groups return None
    protocol = match.group("protocol") or ""
    host = match.group("host") or ""
    port = match.group("port") or "default"
    path = match.group("path") or "/"
    query = match.group("query") or "none"
    fragment = match.group("fragment") or "none"

    print(f"Protocol: {protocol}")
    print(f"Host: {host}")
    print(f"Port: {port}")
    print(f"Path: {path}")
    print(f"Query: {query}")
    print(f"Fragment: {fragment}")
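
Named groups can also be collected into a dictionary with groupdict() and referenced in replacement strings with `\g<name>`. The sketch below uses a smaller, invented date pattern rather than the URL pattern above:

```python
import re

date_pattern = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")

# groupdict() maps each group name to the text it captured.
match = date_pattern.search("Released on 2023-05-15.")
parts = match.groupdict()
print(parts)  # {'year': '2023', 'month': '05', 'day': '15'}

# In a replacement string, \g<name> inserts the text of a named group.
reordered = date_pattern.sub(r"\g<day>/\g<month>/\g<year>", "2023-05-15")
print(reordered)  # 15/05/2023
```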

Flags and Options

Python's re module provides several flags to modify the behavior of regular expressions:

Python

import re

text = """
Python is a programming language.
PYTHON is very popular.
python is easy to learn.
"""

# Case-insensitive matching with re.IGNORECASE
matches = re.findall(r"python", text, re.IGNORECASE)
print(f"Found {len(matches)} occurrences of 'python'")  # Found 3 occurrences of 'python'

# Multi-line mode with re.MULTILINE
# ^ and $ match the start/end of each line
# Combined with re.IGNORECASE, all three lines match
matches = re.findall(r"^python", text, re.MULTILINE | re.IGNORECASE)
print(f"Found {len(matches)} lines starting with 'python'")  # Found 3 lines starting with 'python'

# Dot matches any character including newline with re.DOTALL
pattern_with_dotall = re.compile(r"programming.*popular", re.DOTALL)
match1 = pattern_with_dotall.search(text)
print("With DOTALL:", "Match found" if match1 else "No match")  # With DOTALL: Match found

pattern_without_dotall = re.compile(r"programming.*popular")
match2 = pattern_without_dotall.search(text)
print("Without DOTALL:", "Match found" if match2 else "No match")  # Without DOTALL: No match

Flag	Description
`re.IGNORECASE` or `re.I`	Perform case-insensitive matching
`re.MULTILINE` or `re.M`	Make ^ and $ match the beginning/end of each line
`re.DOTALL` or `re.S`	Make . match any character including newline
`re.VERBOSE` or `re.X`	Allow pattern to contain comments and whitespace
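
Flags are combined with the bitwise OR operator, and the same flags can be written inline at the start of a pattern, for example `(?im)`. A small sketch with an invented three-line string:

```python
import re

text = "Python is a language.\nPYTHON is popular.\npython is easy."

# Flags are combined with the bitwise OR operator.
starts = re.findall(r"^python", text, re.IGNORECASE | re.MULTILINE)
print(len(starts))  # 3

# Without MULTILINE, ^ only matches at the very start of the string.
first_only = re.findall(r"^python", text, re.IGNORECASE)
print(len(first_only))  # 1

# Inline form: (?i) is IGNORECASE, (?m) is MULTILINE.
inline = re.findall(r"(?im)^python", text)
print(inline == starts)  # True
```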

Using Verbose Mode for Complex Patterns

For complex patterns, you can use the re.VERBOSE flag to make your regex more readable:

Python

import re

# Complex pattern for validating a password
# Rules:
# - At least 8 characters
# - Contains at least one uppercase letter
# - Contains at least one lowercase letter
# - Contains at least one digit
# - Contains at least one special character

password_pattern = re.compile(r"""
    ^                   # Start of string
    (?=.*[A-Z])         # At least one uppercase letter
    (?=.*[a-z])         # At least one lowercase letter
    (?=.*\d)            # At least one digit
    (?=.*[!@#$%^&*()])  # At least one special character
    .{8,}               # At least 8 characters long
    $                   # End of string
""", re.VERBOSE)

def is_valid_password(password):
    return bool(password_pattern.match(password))

# Test the function
passwords = [
    "Abc123!",        # Too short
    "password123",    # No uppercase or special char
    "PASSWORD123!",   # No lowercase
    "Password!",      # No digit
    "P@ssw0rd",       # Valid
    "Str0ng!Pass"     # Valid
]

for password in passwords:
    if is_valid_password(password):
        print(f"'{password}' is a valid password")
    else:
        print(f"'{password}' is NOT valid")
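
The `(?=...)` constructs in the password pattern are lookahead assertions: they check that something appears ahead without consuming any characters, which is why several of them can be stacked at the same position. A minimal sketch of the idea, using a simpler hypothetical rule (letters and digits only, at least one digit):

```python
import re

# (?=.*\d) succeeds only if a digit appears somewhere ahead,
# but it consumes nothing, so [A-Za-z\d]+ still starts at position 0.
letters_with_digit = re.compile(r"^(?=.*\d)[A-Za-z\d]+$")

ok = bool(letters_with_digit.match("abc1"))
no_digit = bool(letters_with_digit.match("abcd"))
print(ok, no_digit)  # True False

# The text matched by the lookahead is not part of the result on its own;
# the match below is produced by \w+ alone.
m = re.match(r"(?=.*\d)\w+", "abc1")
print(m.group())  # abc1
```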

Practical Example: Log Parser

Let's build a simple log parser that extracts information from log entries:

Python

import re
from datetime import datetime

log_data = """
2023-01-15 08:22:03 INFO User login successful: alice@example.com
2023-01-15 08:23:15 WARNING Failed login attempt: bob@example.com (wrong password)
2023-01-15 08:25:42 ERROR Database connection failed: timeout after 30s
2023-01-15 09:05:22 INFO User logout: alice@example.com
2023-01-15 09:10:54 ERROR File not found: /data/reports/january.csv
"""

# Define the pattern in verbose mode, one capture group per field
log_pattern = re.compile(r"""
    (\d{4}-\d{2}-\d{2})\s+  # Date (YYYY-MM-DD)
    (\d{2}:\d{2}:\d{2})\s+  # Time (HH:MM:SS)
    (\w+)\s+                # Log level (INFO, WARNING, ERROR)
    (.+)                    # Message (rest of the line)
""", re.VERBOSE)

# Parse the log entries
entries = []
for match in log_pattern.finditer(log_data):
    date_str, time_str, level, message = match.groups()

    # Convert to datetime object
    timestamp = datetime.strptime(f"{date_str} {time_str}", "%Y-%m-%d %H:%M:%S")

    entries.append({
        'timestamp': timestamp,
        'level': level,
        'message': message
    })

# Filter for error entries
error_entries = [entry for entry in entries if entry['level'] == 'ERROR']

# Print the results
print(f"Total log entries: {len(entries)}")
print(f"Error entries: {len(error_entries)}")
print("\nError details:")
for entry in error_entries:
    print(f"{entry['timestamp']}: {entry['message']}")

🎯 Try it yourself!

Create a function that uses regular expressions to extract all URLs from a text document. The function should handle URLs starting with http://, https://, or www.

Python

def extract_urls(text):
    # Your code here
    pass

sample_text = """
Check out these websites:
https://www.python.org
http://example.com/page
Visit www.github.com for code repositories
Email me at user@example.com for more info.
"""

urls = extract_urls(sample_text)
print(urls)  # Should print: ['https://www.python.org', 'http://example.com/page', 'www.github.com']

Best Practices for Regular Expressions

Keep it simple - Use the simplest pattern that does the job
Test thoroughly - Test your regex with various inputs, including edge cases
Use raw strings - Always use raw strings (r"pattern") for regex patterns
Use named groups - For complex patterns, named groups make code more readable
Compile patterns - For patterns used multiple times, compile them first
Use verbose mode - For complex patterns, use re.VERBOSE and comments
Be careful with greedy matching - Use non-greedy quantifiers (*?, +?) when appropriate
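
Two of these points, greedy matching and compiling, are easiest to see in code. The snippet below is a sketch on an invented HTML-like string; it is meant to illustrate quantifier behaviour, not to suggest regex as an HTML parser:

```python
import re

html = "<b>bold</b> and <i>italic</i>"

# Greedy: .+ grabs as much as possible, so one match spans the whole string.
greedy = re.findall(r"<.+>", html)
# Non-greedy: .+? stops at the first ">" it can, giving one match per tag.
lazy = re.findall(r"<.+?>", html)
print(greedy)  # ['<b>bold</b> and <i>italic</i>']
print(lazy)    # ['<b>', '</b>', '<i>', '</i>']

# Compile once, then reuse the pattern object across many inputs.
WORD = re.compile(r"\w+")
counts = [len(WORD.findall(line)) for line in ["one two", "three four five"]]
print(counts)  # [2, 3]
```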

Summary

In this tutorial, you've learned:

How to use Python's re module for regular expressions
Basic pattern matching and common regex metacharacters
How to use capture groups and named groups
Working with regex flags like re.IGNORECASE and re.VERBOSE
Practical examples like email validation, URL parsing, and log analysis

Regular expressions are incredibly powerful for text manipulation, but they can also be complex. Start with simple patterns and gradually build your understanding. With practice, you'll be able to craft efficient patterns for most text processing tasks.
