Using Python Regex to extract phone numbers from a text file
A data analytics engineer with four years of experience as a data engineer. Holds an MSc in Data.
import re

with open('lorem.txt', 'rt') as myfile:  # Open lorem.txt for reading text
    contents = myfile.read()  # Read the entire file into a string
# print(contents)  # Print the string if you want to

# Now let's extract the phone numbers from the text
reg_ex = r"\+?\d+(?:[- (]+\d+\)?)+"
print(re.findall(reg_ex, contents))
The code imports the re module, which provides support for regular expressions in Python.
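To try the idea without an existing lorem.txt, here is a minimal sketch that wraps the same pattern in a function and runs it against a temporary file. The function name extract_phone_numbers_from_file, the constant PHONE_PATTERN, the file name sample.txt, and the sample text are all made up for illustration; they are not part of the original script.

```python
import re
import tempfile
from pathlib import Path

# Same pattern as in the script above, compiled once for reuse.
PHONE_PATTERN = re.compile(r"\+?\d+(?:[- (]+\d+\)?)+")


def extract_phone_numbers_from_file(path):
    """Return every substring of the file at `path` that matches PHONE_PATTERN."""
    text = Path(path).read_text(encoding="utf-8")
    return PHONE_PATTERN.findall(text)


# Demo with made-up data written to a temporary file (hypothetical content).
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = Path(tmp_dir) / "sample.txt"
    sample_path.write_text(
        "Call +1 (555) 123-4567 or 555-987-6543 today.\n", encoding="utf-8"
    )
    found = extract_phone_numbers_from_file(sample_path)

print(found)  # ['+1 (555) 123-4567', '555-987-6543']
```

Compiling the pattern once is optional here; it mainly keeps the pattern and the function that uses it in one place.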
reg_ex = r"\+?\d+(?:[- (]+\d+\)?)+" defines a regular expression pattern. Let's break it down:
- \+? matches an optional plus sign. The backslash escapes the plus sign, which otherwise has a special meaning in regular expressions (it is the "one or more" quantifier).
- \d+ matches one or more digits. This is the first run of digits in the phone number.
- (?:[- (]+\d+\)?)+ is a non-capturing group, (?: ... ), repeated one or more times. Inside the group:
  - [- (]+ matches one or more separator characters: a hyphen, a space, or an opening parenthesis. The square brackets define the set of allowed characters.
  - \d+ matches one or more digits.
  - \)? matches an optional closing parenthesis.
Repeating the non-capturing group with the trailing + lets the pattern match several separator-plus-digits sequences in a row, so a single match covers the entire phone number.
re.findall(reg_ex, contents) searches the contents string for all non-overlapping matches of reg_ex and returns them as a list of strings. Because the group is non-capturing, each list item is the full matched text rather than just the contents of the group.
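Why the group is written as (?: ... ) rather than ( ... ) is easiest to see by comparing the two with re.findall. The sketch below uses a made-up sample string; the variable names are only for illustration. When a pattern contains a capturing group, findall returns the group's text instead of the whole match, and a repeated group keeps only its last repetition.

```python
import re

sample = "Call +1 (555) 123-4567 or 555-987-6543 today."  # made-up text

# Pattern from the article: the repeated group is non-capturing.
non_capturing = r"\+?\d+(?:[- (]+\d+\)?)+"
# Same pattern, but with a capturing group (for comparison only).
capturing = r"\+?\d+([- (]+\d+\)?)+"

# With no capturing groups, findall returns the full matched substrings.
whole_matches = re.findall(non_capturing, sample)
print(whole_matches)  # ['+1 (555) 123-4567', '555-987-6543']

# With one capturing group, findall returns only that group's text,
# and a repeated group holds just its final repetition.
last_group_only = re.findall(capturing, sample)
print(last_group_only)  # ['-4567', '-6543']
```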
Tips:
- \+? makes the plus sign optional: the ? means zero or one occurrence. This lets the pattern match phone numbers with or without a leading plus sign, which usually indicates an international number.
- \d+ matches one or more digits, such as an area code or part of the subscriber number.
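A quick way to check these points is to test a few candidate strings with re.fullmatch, which succeeds only if the entire string fits the pattern. The candidates below are made-up examples. The last one is included to show a consequence of the pattern as written: because the separator group must occur at least once, a number with no separators at all does not match.

```python
import re

PHONE_PATTERN = re.compile(r"\+?\d+(?:[- (]+\d+\)?)+")

# Made-up candidates: with a plus sign, without one, and with no separators.
candidates = ["+44 20 7946 0958", "020 7946 0958", "5551234567"]

# Map each candidate to whether the whole string matches the pattern.
results = {c: PHONE_PATTERN.fullmatch(c) is not None for c in candidates}

for candidate, matched in results.items():
    print(f"{candidate!r}: {matched}")
# '+44 20 7946 0958': True
# '020 7946 0958': True
# '5551234567': False
```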
