Working with double-byte regular expressions in Python 3

As part of my project, Self Hosted Zapier Alternative, I need to run regex searches against the three Japanese scripts: Kanji, Hiragana and Katakana.

Fortunately this is a common problem, so there are good references for it. One of my favourite tools for developing regular expressions, Regex101, also offers support in this area.

I found this useful GitHub Gist.
It is worth checking the gist directly, as there are some follow-up comments and additions. See here

Regex for matching ALL Japanese common & uncommon Kanji (4e00 – 9fcf) ~ The Big Kahuna!
Regex for matching Hiragana or Katakana
Regex for matching Non-Hiragana or Non-Katakana
Regex for matching Hiragana or Katakana or basic punctuation (、。’)
Regex for matching Hiragana or Katakana and random other characters
Regex for matching Hiragana
Regex for matching full-width Katakana (zenkaku 全角)
Regex for matching half-width Katakana (hankaku 半角)
Regex for matching full-width Numbers (zenkaku 全角)
Regex for matching full-width Letters (zenkaku 全角)
Regex for matching Hiragana codespace characters (includes non phonetic characters)
Regex for matching full-width (zenkaku) Katakana codespace characters (includes non phonetic characters)
Regex for matching half-width (hankaku) Katakana codespace characters (this is an old character set so the order is inconsistent with the hiragana)
Regex for matching Japanese Post Codes
Regex for matching Japanese mobile phone numbers (keitai bangou)
Regex for matching Japanese fixed line phone numbers
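
A few of the patterns listed above can be sketched in Python 3 with the standard `re` module. This is a minimal sketch and not a copy of the gist: apart from the Kanji range (4e00 to 9fcf) quoted above, the code point ranges are my own assumptions based on the Unicode Hiragana, Katakana and Halfwidth/Fullwidth blocks, and the sample text and post code are made up for illustration.

```python
import re

# Kanji range as quoted in the list above (4e00 to 9fcf).
KANJI = re.compile(r"[\u4e00-\u9fcf]+")
# Assumed ranges: phonetic Hiragana letters (ぁ to ゖ).
HIRAGANA = re.compile(r"[\u3041-\u3096]+")
# Assumed ranges: full-width Katakana letters (ァ to ヺ) plus the long-vowel mark ー.
FULLWIDTH_KATAKANA = re.compile(r"[\u30a1-\u30fa\u30fc]+")
# Assumed range: half-width Katakana (ｦ to ﾟ).
HALFWIDTH_KATAKANA = re.compile(r"[\uff66-\uff9f]+")
# Assumed range: full-width digits (０ to ９).
FULLWIDTH_DIGITS = re.compile(r"[\uff10-\uff19]+")
# Post code as three digits, a hyphen, four digits. [0-9] is used instead of
# \d because \d also matches full-width digits in Python 3 str patterns.
POST_CODE = re.compile(r"[0-9]{3}-[0-9]{4}")

sample = "渋谷でバスにタッチ"  # made-up sample text
kanji_runs = KANJI.findall(sample)
hiragana_runs = HIRAGANA.findall(sample)
katakana_runs = FULLWIDTH_KATAKANA.findall(sample)

print(kanji_runs)     # ['渋谷']
print(hiragana_runs)  # ['で', 'に']
print(katakana_runs)  # ['バス', 'タッチ']
print(bool(POST_CODE.fullmatch("150-0002")))  # True
```

Python 3 strings are Unicode, so the `\uXXXX` escapes work directly inside character classes without any byte-level handling.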

Using Regex101 I was able to come up with the following expression.


This will successfully match a string such as:
「渋11 渋谷駅行き・駒沢大学駅前」でタッチしました。  
This results in the following three groups:
busname = 渋11
destination = 渋谷駅
boardedat = 駒沢大学駅
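
The original expression is not reproduced here, so the following is only a sketch of one pattern that produces the same three groups from the sample string. It assumes the message always has the shape 「<bus> <destination>行き・<stop>前」 and that the bus name is Kanji followed by digits. Only the group names and the sample string come from the text above. The rest is my assumption.

```python
import re

# Kanji character class, using the range quoted earlier (4e00 to 9fcf).
KANJI = r"[\u4e00-\u9fcf]"

# Assumed message shape: 「<bus> <destination>行き・<stop>前」
PATTERN = re.compile(
    rf"「(?P<busname>{KANJI}+\d+)\s+"   # e.g. 渋11
    rf"(?P<destination>{KANJI}+)行き・"  # text before 行き ("bound for")
    rf"(?P<boardedat>{KANJI}+)前」"      # text before 前 ("in front of")
)

text = "「渋11 渋谷駅行き・駒沢大学駅前」でタッチしました。"
match = PATTERN.search(text)
groups = match.groupdict() if match else {}

for name, value in groups.items():
    print(f"{name} = {value}")
```

Because 行 and 前 are themselves Kanji, the greedy `+` first swallows them and then backtracks by one character so that the literal 行き and 前」 can match. That backtracking is what leaves 渋谷駅 and 駒沢大学駅 in the groups.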

If you are working in PHP you can also use the following:
\p{Han} (the Unicode Han script property, so Kanji are matched as Chinese characters; PCRE needs the /u modifier for this)
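
For comparison, Python's built-in `re` module does not understand `\p{...}` properties (the third-party `regex` package does). A minimal sketch, assuming you want to stay with the standard library, is to fall back to an explicit range. The range used here is the 4e00 to 9fcf one quoted earlier, which is narrower than the full Han script.

```python
import re

try:
    # The standard library rejects \p as a bad escape.
    han = re.compile(r"\p{Han}+")
    used_fallback = False
except re.error:
    # Fallback (assumption): approximate Han with the explicit Kanji range.
    han = re.compile(r"[\u4e00-\u9fcf]+")
    used_fallback = True

han_runs = han.findall("渋谷駅行き")
print(han_runs)
```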

You can also check out my Regex Experiments:
v1 PHP
v2 Python3
