UniRegex

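UniRegex is a compiler that turns Unicode regexes into 8-bit (byte-oriented) regexes for matching UTF-8 strings. This is possible for the same reason that it's possible to write a regex that matches a range of decimal numbers: the range can be split into alternatives, each with a fixed prefix followed by a per-position range. Character classes (not just [a-z] style classes, but also ., \s, etc.) need to be compiled to such range matchers; literal text does not, because its UTF-8 encoding is already a fixed sequence of bytes.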
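As a rough illustration of that range-splitting idea (not UniRegex's actual code), here is a minimal Python sketch that turns one code point range into an alternation of byte ranges. The names split_byte_range, fmt_byte_range and codepoint_range_to_byte_regex are hypothetical, and the sketch assumes surrogates (U+D800 to U+DFFF) should simply be skipped.

```python
def split_byte_range(lo: bytes, hi: bytes) -> list:
    """Cover every UTF-8 encoding from lo to hi (same length) with
    sequences of per-byte (min, max) ranges."""
    n = len(lo)
    if n == 1:
        return [[(lo[0], hi[0])]]
    if lo[0] == hi[0]:
        # Shared first byte: keep it fixed and split the remaining bytes.
        return [[(lo[0], lo[0])] + rest
                for rest in split_byte_range(lo[1:], hi[1:])]
    first, last = lo[0], hi[0]
    min_tail, max_tail = b"\x80" * (n - 1), b"\xbf" * (n - 1)
    out, tail = [], []
    if lo[1:] != min_tail:
        # Partial block at the bottom: lo's first byte, up to the max tail.
        out += [[(first, first)] + rest
                for rest in split_byte_range(lo[1:], max_tail)]
        first += 1
    if hi[1:] != max_tail:
        # Partial block at the top: hi's first byte, from the min tail.
        tail = [[(last, last)] + rest
                for rest in split_byte_range(min_tail, hi[1:])]
        last -= 1
    if first <= last:
        # Full blocks in the middle: any continuation bytes are allowed.
        out.append([(first, last)] + [(0x80, 0xBF)] * (n - 1))
    return out + tail


def fmt_byte_range(a: int, b: int) -> str:
    """Render one byte range as regex source text."""
    return f"\\x{a:02x}" if a == b else f"[\\x{a:02x}-\\x{b:02x}]"


def codepoint_range_to_byte_regex(lo: int, hi: int) -> str:
    """Build a byte-oriented regex matching the UTF-8 encoding of any
    code point in [lo, hi], skipping surrogates (an assumption here)."""
    # Blocks of code points whose encodings have the same length.
    blocks = [(0x0, 0x7F), (0x80, 0x7FF), (0x800, 0xD7FF),
              (0xE000, 0xFFFF), (0x10000, 0x10FFFF)]
    alts = []
    for block_lo, block_hi in blocks:
        start, end = max(lo, block_lo), min(hi, block_hi)
        if start > end:
            continue
        lo_bytes = chr(start).encode("utf-8")
        hi_bytes = chr(end).encode("utf-8")
        for seq in split_byte_range(lo_bytes, hi_bytes):
            alts.append("".join(fmt_byte_range(a, b) for a, b in seq))
    return "(?:" + "|".join(alts) + ")"


if __name__ == "__main__":
    print(codepoint_range_to_byte_regex(0x3044, 0x4000))
```

For the range U+3044 to U+4000 this sketch produces (?:\xe3\x81[\x84-\xbf]|\xe3[\x82-\xbf][\x80-\xbf]|\xe4\x80\x80). Spelling out [\x80-\xbf] for continuation bytes is a choice made for the sketch; a shorter form is possible when the target engine's . matches any byte.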

Unicode Character Category character classes like \p{Alphabetic} are not supported. \d and \w use a non-Unicode definition. \s has a fixed definition that matches ECMAScript's (JavaScript's), rather than being based on a particular Unicode version.

↓ your Unicode regex

↓ 8-bit-regex-compatible equivalent here
(?:[a-z]|\xe3\x81[\x84-\xbf]|\xe3[\x82-\xbf].|\xe4\x80\x80)*?
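
The output above can be checked with any byte-oriented regex engine. A minimal sketch using Python's re module on bytes follows; the helper name matches_whole is hypothetical, and re.DOTALL is passed so that . also matches the byte 0x0a. Decoding the bytes in the pattern shows that it covers a-z plus U+3044 through U+4000.

```python
import re

# Byte-oriented pattern copied from the output shown above.
PATTERN = re.compile(
    rb"(?:[a-z]|\xe3\x81[\x84-\xbf]|\xe3[\x82-\xbf].|\xe4\x80\x80)*?",
    re.DOTALL,
)


def matches_whole(text: str) -> bool:
    """Encode text as UTF-8 and test whether the whole byte string matches."""
    return PATTERN.fullmatch(text.encode("utf-8")) is not None


if __name__ == "__main__":
    # U+3044 and U+3046 are in range; U+3042 and U+4001 are not.
    for sample in ["abc", "\u3044\u3046", "\u3042", "\u4000", "\u4001"]:
        print(repr(sample), matches_whole(sample))
```

Because the pattern ends in a lazy *?, an unanchored search would match the empty string first, which is why the sketch uses fullmatch.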

Known escape sequence types: \xNN, \x{NN...NN}, \uNNNN, \w, \W, \s, \S, \d, \D (no Unicode character category handling)

Report a bug on GitHub! https://github.com/wareya/uniregex/tree/main