Using Python Regex to extract phone numbers from a text file

with open('lorem.txt', 'rt') as myfile:   # Open lorem.txt for reading text
    contents = myfile.read()              # Read the entire file to a string
# print(contents)                         # Print the string if you want to

# Now let's extract the phone numbers from the text

import re
reg_ex = r"\+?\d+(?:[- (]+\d+\)?)+"       # Phone number pattern, explained below
print(re.findall(reg_ex, contents))       # Print a list of every match found

The code imports the re module, which provides support for regular expressions in Python.

  1. reg_ex = r"\+?\d+(?:[- (]+\d+\)?)+" defines a regular expression pattern. Let's break it down:

    \+?: Matches an optional plus sign (\+). The backslash \ is used to escape the plus sign because it has a special meaning in regular expressions.

    \d+: Matches one or more digits (\d). This is the first run of digits in the phone number, such as a country code or area code.

    (?:[- (]+\d+\)?)+: This is a non-capturing group (?: ... ) followed by +, so the group's contents must occur one or more times. Each occurrence is a separator followed by another run of digits. Let's break it down further:

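    [- (]+: Matches one or more occurrences of a hyphen, space, or opening parenthesis character. The characters are enclosed within square brackets [- (]. Because the hyphen comes first inside the brackets, it is treated as a literal hyphen rather than as part of a range.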

    \d+: Matches one or more digits.

    \)?: Matches an optional closing parenthesis \).

    The trailing + on the non-capturing group (?:[- (]+\d+\)?)+ lets the regular expression match repeated separator-and-digit sequences, so the entire phone number is matched as a single string. At least one separator is required, so an unbroken run of digits such as 5551234 does not match.
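
To see how the pieces fit together, the pattern can be tried on a short string. This is only a minimal sketch: the sample text and the name sample are made up for illustration and do not come from lorem.txt.

```python
import re

reg_ex = r"\+?\d+(?:[- (]+\d+\)?)+"

# Hypothetical sample text, used only for illustration
sample = "Call +1 (555) 123-4567 or 555-1234 today."

matches = re.findall(reg_ex, sample)
print(matches)  # ['+1 (555) 123-4567', '555-1234']
```

Because the pattern only requires runs of digits joined by separators, it can also match text that is not a phone number, such as "10 20" in an ordinary sentence, so the results may need further filtering.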

  2. re.findall(reg_ex, contents) searches for all non-overlapping matches of the regular expression pattern reg_ex in the contents string. It returns a list of all matched substrings.
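
One detail worth illustrating: re.findall returns whole matches only when the pattern contains no capturing groups, which is a likely reason the group is written as (?: ... ). The sketch below compares the two forms. The capturing variant and the sample string are assumptions added purely for contrast.

```python
import re

text = "555-1234"  # Hypothetical sample string

non_capturing = r"\+?\d+(?:[- (]+\d+\)?)+"  # Pattern used in this article
capturing = r"\+?\d+([- (]+\d+\)?)+"        # Same pattern with a capturing group (contrast only)

whole = re.findall(non_capturing, text)
partial = re.findall(capturing, text)

print(whole)    # ['555-1234']
print(partial)  # ['-1234']
```

With a capturing group, findall returns only what the group matched on its last repetition, so the first part of the number is lost.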

Tips:

    \+?: The plus sign (\+) is optional (?). It matches zero or one occurrence of the plus sign. This allows for phone numbers with or without a plus sign at the beginning, indicating an international number.

    \d+: This matches one or more digits (\d). It matches a numeric portion of the phone number, such as the area code or subscriber number.
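
These two building blocks can be checked on their own with re.fullmatch, which succeeds only when the entire string fits the pattern. A small sketch with made-up inputs (the variable names are illustrative):

```python
import re

prefix = r"\+?\d+"  # Optional plus sign followed by one or more digits

with_plus = re.fullmatch(prefix, "+44") is not None      # Plus sign present
without_plus = re.fullmatch(prefix, "44") is not None    # Plus sign absent
double_plus = re.fullmatch(prefix, "++44") is not None   # More than one plus sign
no_digits = re.fullmatch(prefix, "+") is not None        # No digits at all

print(with_plus, without_plus, double_plus, no_digits)  # True True False False
```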