Using Python Regex to extract phone numbers from a text file
```python
import re  # Regular-expression support

with open('lorem.txt', 'rt') as myfile:  # Open lorem.txt for reading text
    contents = myfile.read()  # Read the entire file into a string
# print(contents)  # Print the string if you want to

# Now let's extract the phone numbers from the text
reg_ex = r"\+?\d+(?:[- (]+\d+\)?)+"
print(re.findall(reg_ex, contents))
```
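The contents of `lorem.txt` are not shown, so the sketch below runs the same pattern on a hypothetical in-memory string instead of reading a file. This makes it easy to see what `re.findall` returns without creating the file first.

```python
import re

# Hypothetical sample text standing in for the contents of lorem.txt
contents = (
    "Lorem ipsum dolor sit amet. Call +1 (555) 123-4567 or 555-987-6543. "
    "Office: +44 20 7946 0958."
)

reg_ex = r"\+?\d+(?:[- (]+\d+\)?)+"
phone_numbers = re.findall(reg_ex, contents)  # list of every matched substring
print(phone_numbers)
# ['+1 (555) 123-4567', '555-987-6543', '+44 20 7946 0958']
```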
The code imports the `re` module, which provides regular-expression support in Python.
`reg_ex = r"\+?\d+(?:[- (]+\d+\)?)+"` defines the regular-expression pattern. Let's break it down:

- `\+?`: matches an optional plus sign (`\+`). The backslash (`\`) escapes the plus sign because `+` is a quantifier and has a special meaning in regular expressions.
- `\d+`: matches one or more digits (`\d`). This matches the first numeric part of the phone number.
- `(?:[- (]+\d+\)?)+`: a non-capturing group (`(?: ... )`) that is repeated one or more times. Inside it:
  - `[- (]+`: matches one or more hyphens, spaces, or opening parentheses. These characters are listed in the character class `[- (]`. The hyphen comes first, so it is treated as a literal character rather than as a range.
  - `\d+`: matches one or more digits.
  - `\)?`: matches an optional closing parenthesis (`\)`).
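To see the pieces above in action, the sketch below runs `re.fullmatch` on a few hypothetical strings. `fullmatch` succeeds only when the entire string fits the pattern, so it shows which formats the pattern accepts as a whole.

```python
import re

reg_ex = r"\+?\d+(?:[- (]+\d+\)?)+"

# Hypothetical test strings, each exercising a different part of the pattern
samples = [
    "+1 555-123-4567",   # optional plus, then digit groups joined by space/hyphen
    "555 123 4567",      # no plus sign: \+? matches zero times
    "1 (555) 123-4567",  # "(" comes from [- (]+, ")" from the optional \)?
    "5551234567",        # no separator, so the repeated group can never match
    "(555) 123-4567",    # starts with "(", but the pattern must begin with "+" or a digit
]

# Map each sample to whether the whole string matches the pattern
results = {s: bool(re.fullmatch(reg_ex, s)) for s in samples}
for s, ok in results.items():
    print(f"{s!r}: {ok}")
```

The last two samples show two limitations of this pattern. The repeated group needs at least one separator, and a number that begins with a parenthesized area code cannot match from its first character.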
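The trailing `+` after `(?:[- (]+\d+\)?)` repeats the group. This lets the pattern match several separator-and-digits sequences in a row and cover the entire phone number. Because the group must occur at least once, a number needs at least one separator: an unbroken run of digits such as `5551234567` is not matched.

`re.findall(reg_ex, contents)` searches the `contents` string for all non-overlapping matches of the pattern `reg_ex` and returns them as a list of strings. The group is non-capturing, so each list item is the whole matched phone number rather than just the text of the group.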
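The non-capturing group matters because of how `re.findall` treats groups. When a pattern contains a capturing group, `findall` returns the group's text instead of the whole match. A repeated group keeps only its last repetition. The sketch below compares the two forms on a hypothetical sample string. The capturing variant is shown only for contrast and is not part of the original code.

```python
import re

text = "Call 555-123-4567 today."  # hypothetical sample text

# Non-capturing group (?:...): findall returns each whole match
non_capturing = re.findall(r"\+?\d+(?:[- (]+\d+\)?)+", text)

# Same pattern with a capturing group (...): findall returns only the group,
# and a repeated group keeps just its last repetition
capturing = re.findall(r"\+?\d+([- (]+\d+\)?)+", text)

print(non_capturing)  # ['555-123-4567']
print(capturing)      # ['-4567']
```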
Tips:

- `\+?`: the plus sign (`\+`) is optional (`?`), so the pattern matches zero or one plus sign. This allows phone numbers with or without a leading `+`, which usually marks an international number.
- `\d+`: matches one or more digits (`\d`). It matches only the first run of digits, such as a country code or area code. The remaining digit groups, such as the subscriber number, are matched by the repeated non-capturing group.
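Both tips can be checked directly. The sketch below uses a hypothetical sample string to compare the pattern with and without the `\+?` prefix. It then uses a variant with two capturing groups, added here only for illustration, to show which part of the number `\+?\d+` matches and which part the repeated group matches.

```python
import re

text = "Office: +44 20 7946 0958"  # hypothetical sample text

# With the optional \+? prefix, the leading plus sign is part of the match
with_plus = re.findall(r"\+?\d+(?:[- (]+\d+\)?)+", text)

# Without it, the match starts at the first digit and the plus sign is lost
without_plus = re.findall(r"\d+(?:[- (]+\d+\)?)+", text)

# Illustration only: capturing groups added to split the match into
# the \+?\d+ prefix and the part matched by the repeated group
m = re.search(r"(\+?\d+)((?:[- (]+\d+\)?)+)", text)
parts = m.groups()

print(with_plus)     # ['+44 20 7946 0958']
print(without_plus)  # ['44 20 7946 0958']
print(parts)         # ('+44', ' 20 7946 0958')
```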