If you have been following this series, you might know that I am playing with SPF records. I have turned my eye to the `a` and `mx` mechanisms. As I started looking at the `a` mechanism, I noticed that my current approach using the standard string functions would probably be fairly difficult to implement, so I started to think about using the regex crate. This post is a look at how that went: the challenges and the things that I took away from the experience.
I started out by checking what the possible patterns for an `a` mechanism were. I did some searching and found this link, which also listed some examples:
type | Sample |
---|---|
a | v=spf1 a ~all |
a:domain | v=spf1 a:example.com ~all |
a:hosts | v=spf1 a:mailers.example.com ~all |
a/prefix | v=spf1 a/24 a:offsite.example.com/24 ~all |
Initial attempt
So yes, I suspected this would take a lot of work with plain string functions. But I felt pretty confident in my regex skills, and it seemed pretty straightforward. So I started off with my trusty regex101, set up a pattern and some test data, and came up with a nice regex that covers my cases.
(?P<a_only>^a)|(?:a:)(?P<a_colon>[^\/].+)|(?P<a_slash>a\/\d{1,2})
capture a_only or capture a_colon or capture a_slash
It looked pretty clean in regex101, but I wanted to check things in Rust.
I came across Rustexp and thought: nice, I can throw my pattern in here and check if it works. Spoiler: it doesn't like alternatives, all those `|` pipes.
So I started looking for another way to get things done. I read through some of the regex docs and found `RegexSet`.
With a lot of work I finally settled on the following code. `get_match()` does the work here.
What are we doing here?
- Line 10: Create a `RegexSet` and pass it, along with the string to search, into `get_match()`.
- Lines 30 ~ 33: Within `get_match()`, we assume there will be no match and so prepare to return a `None` if the check is false.
- Lines 35 ~ 42:
  - We see if there is a pattern match.
  - We get the index of the pattern that matched.
  - We get the patterns from the `RegexSet`.
  - We create an actual `Regex::new()`. To this we pass the pattern located at the index we found previously.
  - Then we run the correct pattern on the string.
  - We loop over the captures to get the capture we are looking for.
This is not the best solution. There are a few things that probably need to be addressed.
But I learned how to use RegexSet to some degree.
Discord
I popped onto Discord to share my code and my initial regex101 pattern. I wasn't satisfied with my solution and wanted to see if there were some better options.
The feedback was positive, and I was shown a couple of solutions. Thanks to T-Dark the Holy War Starter for giving his time and ideas.
- gist:0af5220fe7d4ca69dadde481b3aab816
- I was shown an option using my pattern, which initially failed to compile. But with some work I got it working: gist:f317eeb698e7d4790c01edf1a1423b7a.
  I changed my pattern to the following: `r"(?P<a_only>a$)|(?:a:)(?P<a_colon>[^/].+)|(?P<a_slash>a/\d{1,2})"`
  - replace `^a` with `a$`
  - remove the usually required escape sequence `\`
Take Aways
- Rustexp, at the time of writing, is still a bit limited, but may still be of some help.
- The Rust Discord group is a great place to interact with others and ask questions to help in the learning process. They have specific groups for learners.
- In Rust the `/` is not a special character and does not need to be escaped, particularly if you are doing something like `r"...."`.
- The Rust playground is really useful.
Closing Thoughts
Whilst this is not yet fully complete, I wanted to share it now while it is fresh. I will look at updating this post with the final regex solution once I have it fully implemented and tested.