If you have read my previous posts, you might know I am currently working on a new project to move some services to a self-hosted solution. As part of this, I have been dealing with Unicode characters in regex.
While doing this, I found that I was writing the same function repeatedly, the only difference being the number of matches returned. So I decided to refactor it.
Here is my solution:
import re
from typing import Optional


def findMatches(string, regex) -> Optional[dict]:
    """
    This is a generic matching function.

    Warning! Your regular expression MUST use 'Named Groups' -> (?P<name>...)
    or this function will return an empty dictionary.

    :param string: The text you are searching
    :type string: str
    :param regex: The regular expression string you are using to search
    :type regex: str
    :returns: A dictionary of named key/value pairs. The keys are derived
        from (?P<name>...). None is returned if no match is found.
    :rtype: dict or None
    """
    # Compile the pattern; re.UNICODE is already the default for str patterns in Python 3
    matcher = re.compile(regex, re.UNICODE)
    # match() only matches at the beginning of the string
    match = matcher.match(string)
    if match:
        matches = dict()
        # Copy each named group into a plain dictionary
        for key in match.groupdict():
            matches[key] = match.group(key)
        return matches
    # No Matches
    return None
It takes two arguments: the string to be searched and the regex to be used. I go through the basic process of compiling the re pattern object and running the match.
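One detail of the compile-and-match step is worth seeing in isolation: match() only succeeds when the pattern matches at the very start of the string, while search() scans forward. Here is a minimal sketch of that difference. The pattern and the sample text are my own illustrative choices, not part of the project.

```python
import re

# Hypothetical pattern: one named group that captures a run of word characters.
pattern = re.compile(r"(?P<word>\w+)", re.UNICODE)

# match() succeeds because the word starts at position 0.
at_start = pattern.match("привет мир")

# match() fails here because the string starts with spaces, not a word.
not_at_start = pattern.match("  привет мир")

# search() skips ahead and finds the first word anyway.
found_later = pattern.search("  привет мир")

print(at_start.group("word"))
print(not_at_start)
print(found_later.group("word"))
```

If your text might have leading junk, either account for it in the regex or consider a search-based variant of the function.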
I then go over the match object's groupdict() dictionary to get the names of the keys. I use these keys to build a simple dictionary storing the matching key/value pairs.
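To show the three possible outcomes, here is a small usage sketch. It includes a compact copy of the function so that it runs on its own; the key=value pattern and the sample strings are assumptions made up for the example.

```python
import re
from typing import Optional


def findMatches(string: str, regex: str) -> Optional[dict]:
    """Compact copy of the function from this post, so the example is self-contained."""
    match = re.compile(regex, re.UNICODE).match(string)
    if match:
        return {key: match.group(key) for key in match.groupdict()}
    return None


# Hypothetical pattern: a key and a value separated by "=", both named groups.
KEY_VALUE = r"(?P<key>\w+)=(?P<value>\w+)"

# Named groups and a match at the start: a populated dictionary.
result = findMatches("город=Москва", KEY_VALUE)
print(result)

# The pattern matches, but the groups are unnamed: an empty dictionary.
unnamed = findMatches("город=Москва", r"(\w+)=(\w+)")
print(unnamed)

# The leading space stops match() from succeeding: None.
missing = findMatches(" город=Москва", KEY_VALUE)
print(missing)
```

Because an empty dictionary and None are both falsy, callers that need to tell "matched without named groups" apart from "no match" should compare against None explicitly.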
No more writing the same basic function repeatedly.