Looking at Regex in Rust

If you have been following this series, you will know that I am playing with SPF records. I have now turned my eye to the a and mx mechanisms. As I started looking at the a mechanism, I noticed that my current approach, using the standard string functions, would probably be fairly difficult to implement. So I started to think about using the regex crate. This post is a look at how that went: the challenges, and the things I took away from the experience.

I started out by checking what the possible patterns for an a mechanism are. I did some searching and found this link.

dmarcian.com

It also listed some examples:

type       Sample
a          v=spf1 a ~all
a:domain   v=spf1 a:example.com ~all
a:hosts    v=spf1 a:mailers.example.com ~all
a/prefix   v=spf1 a/24 a:offsite.example.com/24 ~all

Initial attempt

So yes, this I suspected would take a lot of work. But I felt pretty confident in my regex skills, and it seemed pretty straightforward. So I started off with my trusty regex101, a pattern and some test data, and came up with a nice regex that covers my cases.

(?P<a_only>^a)|(?:a:)(?P<a_colon>[^\/].+)|(?P<a_slash>a\/\d{1,2})

That is: capture a_only, or capture a_colon, or capture a_slash. It looked pretty clean in regex101, but I wanted to check things in Rust. I came across Rustexp and thought: nice, I can throw my pattern in here and check if it works. Spoiler: it doesn’t like alternations, all those | pipes.

So I started looking for another way to get things done. I read through some of the regex docs and found RegexSet.

With a lot of work I finally settled on the following code.

get_match() does the work here.

What are we doing here?

  1. Line 10: Create a RegexSet and pass it along with the string to search into get_match()
  2. Lines 30 ~ 33: Within get_match(), we assume there will be no match and so prepare to return a None if the check is false.
  3. Lines 35 ~ 42:
    1. We see if there is a pattern match.
    2. We get the index of the pattern that matched.
    3. We get the patterns from the RegexSet.
    4. We create an actual Regex with Regex::new(), passing it the pattern located at the index we found previously.
    5. Then we run the correct pattern on the string.
  4. We loop over the captures to get the capture we are looking for.

This is not the best solution. There are a few things that probably need to be addressed.

But I learned how to use RegexSet to some degree.

Discord

I popped onto Discord to share my code and my initial regex101 pattern. I wasn’t satisfied with my solution and wanted to see if there were some better options.

The feedback was positive, and I was shown a couple of solutions. Thanks to T-Dark the Holy War Starter for giving his time and ideas.

  1. gist:0af5220fe7d4ca69dadde481b3aab816
  2. I was shown an option using my pattern which initially failed to compile. But with some work I got it working. gist:f317eeb698e7d4790c01edf1a1423b7a.

I changed my pattern to the following:
r"(?P<a_only>a$)|(?:a:)(?P<a_colon>[^/].+)|(?P<a_slash>a/\d{1,2})"

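  1. Replaced ^a with a$, so the bare a only matches at the end of the string.
  2. Removed the \ escape in front of each /, which is usually required in other regex flavours.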

Take Aways

  1. Rustexp, at the time of writing, is still a bit limited, but it may still be of some help.
  2. The Rust Discord group is a great place to interact with others and ask questions to help in the learning process. They have specific groups for learners.
  3. In Rust the / is not a special character in a regex and does not need to be escaped, particularly if you are doing something like r"...."
  4. The Rust Playground is really useful.

Closing Thoughts

Whilst this is not yet fully complete, I wanted to share it now while it is fresh. I will look at updating this post with the final regex solution once I have it fully implemented and tested.

