Regex Utils 🔤

Zero-dependency TypeScript library for regex utilities that go beyond string matching. These are surprisingly hard to come by for any programming language. ✨

📚 Documentation
🔍 RegExp Equivalence Checker

API Overview 🚀

🔗 Set-like operations:
- .and(...) - Compute intersection of two regex.
- .not() - Compute the complement of a regex.
- .without(...) - Compute the difference of two regex.
✅ Set-like predicates:
- .isEquivalent(...) - Check whether two regex match the same strings.
- .isSubsetOf(...)
- .isSupersetOf(...)
- .isDisjointFrom(...)
📜 Generate strings:
- .sample(...) - Generate random strings matching a regex.
- .enumerate() - Exhaustively enumerate strings matching a regex.
🔧 Miscellaneous:
- .size() - Count the number of strings that a regex matches.
- .derivative(...) - Compute a Brzozowski derivative of a regex.

Installation 📦

npm install @gruhn/regex-utils

import { RB } from '@gruhn/regex-utils'

Example Use Cases 💡

Generate random strings from Regex 📜

Generate 5 random email addresses:

const email = RB(/^[a-z]+@[a-z]+\.[a-z]{2,3}$/) for (const str of email.sample().take(5)) { console.log(str) }

ky@e.no cc@gg.gaj z@if.ojk vr@y.ehl e@zx.hzq

Generate 5 random email addresses, which have exactly 20 characters:

const emailLength20 = email.and(/^.{20}$/) for (const str of emailLength20.sample().take(5)) { console.log(str) }

kahragjijttzyze@i.mv gnpbjzll@cwoktvw.hhd knqmyotxxblh@yip.ccc kopfpstjlnbq@lal.nmi vrskllsvblqb@gemi.wc

Refactor Regex then Check Equivalence 🔄

Say we identified a regex in the code base that is prone to catastrophic backtracking and came up with a new version:

const oldRegex = /^(?:[a-zA-Z]\:\\|\\\\)([^\\\/\:\*\?\<\>\"\|]+(\\){0,1})+$/ const newRegex = /^(?:[a-zA-Z]:\\|\\\\)([^\\\/\:*?<>"|]+\\?)+$/

Using .isEquivalent we can verify that the refactored version matches exactly the same strings as the old version. That is, whether oldRegex.test(str) === newRegex.test(str) for every possible input string:

RB(oldRegex).isEquivalent(newRegex) // true

There is also a web interface for checking regex equivalence which also shows counterexample strings if the two regular expressions are not equivalent. The source code is a single HTML file: ./equiv-checker.html.

Comment Regex using Complement 💬

How do you write a regex that matches HTML comments like:

<!-- This is a comment -->

A straight forward attempt would be:

<!--.*-->

The problem is that .* also matches the end marker -->, so this is also a match:

<!-- This is a comment --> and this shouldn't be part of it -->

We need to specify that the inner part can be any string that does not contain -->. With .not() (aka. regex complement) this is easy:

import { RB } from '@gruhn/regex-utils' const strContainingCommentEnd = RB(/.*-->.*/) const commentStart = RB('<!--') const commentInner = RB(/^.*-->.*$/).not() const commentEnd = RB('-->') const comment = commentStart.concat(commentInner).concat(commentEnd)

With .toRegExp() we can convert back to a native JavaScript regex:

comment.toRegExp()

/^<!--(---*[^->]|-?[^-])*---*>$/

Password Regex using Intersections 🔐

It's difficult to write a single regex for multiple independent constraints. For example, to specify a valid password. But with regex intersections it's very natural:

import { RB } from '@gruhn/regex-utils' const passwordRegex = RB(/^[a-zA-Z0-9]{12,32}$/) // 12-32 alphanumeric characters .and(/[0-9]/) // at least one number .and(/[A-Z]/) // at least one upper case letter  .and(/[a-z]/) // at least one lower case letter

We can convert this back to a native JavaScript RegExp with:

passwordRegex.toRegExp()

Note

The output RegExp can be very large.

We can also use other utilities like .size() to determine how many potential passwords match this regex:

console.log(passwordRegex.size())

2301586451429392354821768871006991487961066695735482449920n

With .sample() we can generate some of these matches:

for (const str of passwordRegex.sample().take(10)) { console.log(str) }

NEWJIAXQISWT0Wwm lxoegadrzeynezkmtfcIBzzQ9e ypzvhvtwpWk4u6 MSZXXKIKEKWKXLQ8HQ7Ds BCBSFBSMNOLKlgQN5L 8950244600709IW1pg UOTQBLVOTZQWFSAJYBXZNQBEeom0l 520302447164378435bv4dp4ysC 71073970686490eY2Jt4 afgpnxqwUK5B

Solve Advent Of Code 2023 - Day 12 🎄

In the coding puzzle Advent Of Code 2023 - Day 12 you are given pairs of string patterns. An example pair is .??..??...?##. and 1,1,3. Both patterns describe a class of strings and the task is to count the number of strings that match both patterns.

In the first pattern, . and # stand for the literal characters "dot" and "hash". The ? stands for either . or #. This can be written as a regular expression:

for # we simply write #
for . we write o (since . has a special meaning in regular expressions)
for ? we write (o|#)

So the pattern .??..??...?##. would be written as:

const firstRegex = /^o(o|#)(o|#)oo(o|#)(o|#)ooo(o|#)##o$/

In the second pattern, each digit stands for a sequence of # separated by at least one o. This can also be written as a regular expression:

For a digit like 3 we write #{3}.
Between digits we write o+.
Additionally, arbitrary many o are allowed at the start and end, so we add o* at the start and end.

Thus, 1,1,3 would be written as:

const secondRegex = /^o*#{1}o+#{1}o+#{3}o*$/

To solve the task and find the number of strings that match both regex, we can use .and(...) and .size() from regex-utils. .and(...) computes the intersection of two regular expressions. That is, it creates a new regex which exactly matches the strings matched by both input regex.

const intersection = RB(firstRegex).and(secondRegex)

With .size() we can then determine the number of matched strings:

console.log(intersection.size())

4n

While at it, we can also try .enumerate() to list all these matches:

for (const str of intersection.enumerate()) { console.log(str) }

oo#ooo#ooo###o o#oooo#ooo###o oo#oo#oooo###o o#ooo#oooo###o

For a full solution checkout: ./benchmark/aoc2023-day12.ts.

References 📖

Heavily informed by these papers:

Name		Name	Last commit message	Last commit date
Latest commit History 143 Commits
.github/workflows		.github/workflows
benchmark		benchmark
src		src
test		test
.envrc		.envrc
.gitignore		.gitignore
Makefile		Makefile
README.md		README.md
equiv-checker-screenshot.png		equiv-checker-screenshot.png
equiv-checker.html		equiv-checker.html
flake.lock		flake.lock
flake.nix		flake.nix
package-lock.json		package-lock.json
package.json		package.json
tsconfig.build.json		tsconfig.build.json
tsconfig.json		tsconfig.json
typedoc.json		typedoc.json

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Regex Utils 🔤

API Overview 🚀

Installation 📦

Example Use Cases 💡

Generate random strings from Regex 📜

Refactor Regex then Check Equivalence 🔄

Comment Regex using Complement 💬

Password Regex using Intersections 🔐

Solve Advent Of Code 2023 - Day 12 🎄

References 📖

About

Uh oh!

Releases 37

Uh oh!

Languages

gruhn/regex-utils

Folders and files

Latest commit

History

Repository files navigation

Regex Utils 🔤

API Overview 🚀

Installation 📦

Example Use Cases 💡

Generate random strings from Regex 📜

Refactor Regex then Check Equivalence 🔄

Comment Regex using Complement 💬

Password Regex using Intersections 🔐

Solve Advent Of Code 2023 - Day 12 🎄

References 📖

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases 37

Uh oh!

Languages