Hyperscan — Intel’s library for quickly testing a string against multiple regexps.
Natural Language Isn’t Just English — English isn’t a great representative of the diversity of languages in the world: It’s a spoken language, not a signed language; it has a well-established, long-used roughly phone-based orthographic system; … with white space between words; … using (mostly) only lower-ascii characters; it has relatively little morphology; and, thus, fewer forms of each word; it has relatively fixed word order, etc. It just happens to have a massive training set.