Posted on Jan 22, 2022

100 Languages Speedrun: Episode 65: Randomized Finite Automaton for Fast Thue Interpreter in Crystal

This is probably the most Computer Science heavy episode so far. If that's not your thing, feel free to skip and come back for the next episode.

A Deterministic Finite Automaton (DFA) is a computer science concept. It's a program with these properties:

at every point it's in one of a limited number of states
it goes through the string one character at a time
based on the current state, the current character, and nothing else, it chooses the next state
once it's done with the string, we extract some information from whichever state it ended up in

DFA Example

So for example let's write a program that matches a string of digits. It can have some extra spaces at the beginning and end, and it can have _ between digits (but not elsewhere, and not several in a row), and it's not allowed to have leading 0s unless the whole number is zero. In regular expression terms it's /^\s*(0|[1-9]\d*(_\d+)*)\s*$/.

Let's try to make a DFA for it:

state Start, if character space, go to state Start
state Start, if character 1-9, go to state Digit
state Start, if character 0, go to state End
state Digit, if character 0-9, go to state Digit
state Digit, if character space, go to state End
state Digit, if character _, go to state Underscore
state Underscore, if character 0-9, go to state Digit
state End, if character space, go to state End

Any state, any character not listed, go to state Fail.

If we reached end of the string, and state is either Digit or End, the string matches. Otherwise it doesn't. Hopefully I didn't mess it up.
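The transition list above can be written down almost literally in Crystal. This is only a sketch: the names State, step and matches? are made up for this example and are not part of the Thue interpreter.

```crystal
# Hypothetical names (State, step, matches?) used only for this sketch.
enum State
  Start
  Digit
  Underscore
  End
  Fail
end

# One transition: depends only on the current state and the current character.
def step(state : State, c : Char) : State
  case {state, c}
  when {State::Start, ' '}           then State::Start
  when {State::Start, '1'..'9'}      then State::Digit
  when {State::Start, '0'}           then State::End
  when {State::Digit, '0'..'9'}      then State::Digit
  when {State::Digit, ' '}           then State::End
  when {State::Digit, '_'}           then State::Underscore
  when {State::Underscore, '0'..'9'} then State::Digit
  when {State::End, ' '}             then State::End
  else
    # Any state, any character not listed
    State::Fail
  end
end

# Run the DFA over the whole string, then check which state we ended up in.
def matches?(str : String) : Bool
  state = State::Start
  str.each_char { |c| state = step(state, c) }
  state == State::Digit || state == State::End
end

puts matches?("  1_000 ") # => true
puts matches?("0")        # => true
puts matches?("007")      # => false
puts matches?("1__0")     # => false
```

A real DFA implementation would likely replace the case with a precomputed table indexed by state and character, but the structure is the same.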

Other Finite Automata

DFAs are nice because it's obvious how to make them super fast - it's just one table lookup per character. It's also possible to combine them - OR, AND, and NOT of any number of DFAs is still a DFA (even if potentially a much bigger one).

This is all nice, but we usually want to know more than just "matches" or "doesn't match". So we came up with so damn many variants of the DFA idea - including the engine behind regular expressions.

What we want for Thue is something like that, except:

we want to know which rule matched
we want to know where it matched
we want to pick exactly one of all possible matches, at random

So I came up with Randomized Finite Automaton - a DFA variant that's perfect for Thue.

Trie

First, let's organize all the rules into a "trie". A trie is a tree-like structure which lets us look up a lot of strings at once. At the root of the tree is a hash with keys being all the first characters. Then at every node there's some data for strings that finish there, and more tries for strings that continue.

For example a trie for this collection of strings and their associated data: {"no": 1, "no": 2, "not": 3, "nu": 4} would be:

root has data: [], and children {'n': trie_n}
trie_n has data: [], and children {'o': trie_no, 'u': trie_nu}
trie_nu has data: [4], and children {}
trie_no has data: [1, 2], and children {"t": trie_not}
trie_not has data: [3], and children {}

The goal of this is that we can have a very big number of rules, and match them all at once, instead of trying every single rule. If we have thousands of rules, this can be a lot faster than a hash table based solution, since a trie-based solution can just look at one character and instantly eliminate hundreds or thousands of potential matches, so only the relevant ones stay.

For example if we have this trie, and string "maybe", we do a check for root.children['m'], root.children['a'], root.children['y'], root.children['b'], root.children['e'], get empty 5 times, and we're done. No matter how many rules starting with n or whatnot we had.

I found one Trie implementation for Crystal but it wasn't doing quite what I wanted. I wanted multiple data per node, it had just one. It wouldn't be too hard to adapt, but tries are super simple so I just wrote my own implementation:

class Trie(T)
  getter :data

  def initialize
    @data = [] of T
    @children = Hash(Char, Trie(T)).new
  end

  def insert(str : String, data : T)
    if str.empty?
      @data.push(data)
    else
      c = str[0]
      @children[c] ||= Trie(T).new
      @children[c].insert(str[1..-1], data)
    end
  end

  def [](c : Char)
    @children[c]?
  end
end
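As a quick check of how the trie behaves, here is a sketch that builds the "no" / "not" / "nu" example from earlier. The Trie class is repeated so the snippet runs on its own, and lookup is a hypothetical helper written just for this example, not part of the interpreter.

```crystal
class Trie(T)
  getter data : Array(T)

  def initialize
    @data = [] of T
    @children = Hash(Char, Trie(T)).new
  end

  def insert(str : String, data : T)
    if str.empty?
      @data.push(data)
    else
      c = str[0]
      @children[c] ||= Trie(T).new
      @children[c].insert(str[1..-1], data)
    end
  end

  def [](c : Char)
    @children[c]?
  end
end

# Hypothetical helper: follow the trie one character at a time and
# return the data stored at the node where the string ends.
def lookup(trie : Trie(Int32), str : String) : Array(Int32)
  node = trie
  str.each_char do |c|
    child = node[c]
    return [] of Int32 unless child
    node = child
  end
  node.data
end

trie = Trie(Int32).new
[{"no", 1}, {"no", 2}, {"not", 3}, {"nu", 4}].each do |str, value|
  trie.insert(str, value)
end

p lookup(trie, "no")    # => [1, 2]
p lookup(trie, "not")   # => [3]
p lookup(trie, "n")     # => []
p lookup(trie, "maybe") # => []
```

Note that "maybe" is rejected at the very first character, no matter how many strings starting with n the trie holds.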

RFA

Now there are two ways to go forward. The first (NFA style), is to remember every partially matched trie. This can potentially mean checking N tries for every character if maximum length of the rule is N.

The other would be to precalculate every combination (DFA style). In principle that would be faster as we guarantee just one lookup per character. The cost would be extra calculation, and potentially a lot bigger tries.

If we expect rules to be fairly short (let's say 10 characters or less), even if there are thousands of rules in our Thue program, then the NFA style solution is just better. If we expect rules to be very long, then DFA solution would win, but I don't think Thue programs would have very big rules.

NFA solution would also be better at ignoring fake rules - like if you use impossible rules as comments (# this is a comment ::= it will never be matched), NFA solution is pretty much unaffected, while DFA solution would have significantly bigger state.

So here's the Randomized Finite Automaton - the core of this episode:

class RFA
  def initialize(@rules : Array(ThueRule))
    @trie = Trie(ThueRule).new
    @rules.each do |rule|
      @trie.insert(rule.left, rule)
    end
  end

  # No empty matches allowed
  def random_match(str : String)
    count = 0
    active = [@trie]
    match = nil

    str.chars.each_with_index do |char, idx|
      next_tries = active.map{|t| t[char]}.compact
      matching_rules = next_tries.flat_map(&.data)
      unless matching_rules.empty?
        count += matching_rules.size
        if rand(count) < matching_rules.size
          rule = matching_rules.sample
          match = {rule: rule, idx: (idx - rule.left.size + 1)}
        end
      end
      active = [@trie, *next_tries]
    end

    match
  end
end

In the constructor we just insert every rule into the main trie.

As we match, we go character by character, and remember every potential trie. That's the main trie, plus any trie which we started matching already. The number of them is bounded by the length of the longest rule, but in practice there would be very few tries active at the same time. (A DFA-style solution would have only 1 trie, basically the result of merging those NFA tries).

Then we go through all the tries and get all the rules matching at current character.

Now here's the fun part - we could use it to generate a list of all possible matches in the string, but that's not what we want, we just want one. So we know we had N matches so far, and M matches at the current character. We pick one of the M at random, then keep it in place of the old match with probability M / (N + M).

The final thing we need to adjust is the index: we subtract the length of the rule's left side minus one. The RFA gives us the position of the last matching character, but it's generally more convenient to know the first. Every rule has a fixed number of characters, so it's very easy.
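The "keep the new batch with probability M / (N + M)" step is a form of reservoir sampling, and it gives every match the same chance of winning: a match in a batch seen when the running total is C gets picked with probability 1/C, and then survives all later batches with probability C / total. The sketch below checks this empirically. The pick_one helper and the sample batches are made up for this example, with strings standing in for matched rules.

```crystal
# Hypothetical helper: pick one item uniformly from a stream of batches,
# visiting each batch once and remembering only the current pick.
def pick_one(batches : Array(Array(String)), rng = Random.new) : String?
  count = 0
  pick = nil
  batches.each do |batch|
    next if batch.empty?
    count += batch.size
    # Replace the old pick with probability M / (N + M),
    # where M = batch.size and N + M = count
    pick = batch.sample(rng) if rng.rand(count) < batch.size
  end
  pick
end

# Batches play the role of "rules matching at the current character"
batches = [["a"], ["b", "c"], Array(String).new, ["d"]]

tally = Hash(String, Int32).new(0)
40_000.times do
  if choice = pick_one(batches)
    tally[choice] += 1
  end
end

# Each of the four items should come out close to 10,000 times
tally.keys.sort.each { |k| puts "#{k}: #{tally[k]}" }
```

The exact counts vary from run to run, so no expected output is shown here.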

Complete Thue Interpreter

Here's the whole program:

#!/usr/bin/env crystal

require "./trie"

class ThueRule
  getter :left

  def initialize(@left : String, @right : String)
    @right = "~\n" if @right == "~"
  end

  def apply(str, idx)
    before = str[0, idx]
    after = str[idx+@left.size .. -1]

    # starts_with? instead of @right[0], so rules with an empty right side don't raise
    if @right.starts_with?('~')
      print @right[1..-1]
      replacement = ""
    elsif @right == ":::"
      replacement = STDIN.gets.not_nil!.chomp
    else
      replacement = @right
    end

    before + replacement + after
  end

  def to_s(io)
    io << "Rule<#{@left.inspect}::=#{@right.inspect}>"
  end
end

class ThueSideParser
  getter :results
  @toparse : Array(Char)

  def initialize(@str : String)
    @results = [""]
    @toparse = @str.chars
    parse
  end

  private def parse
    until @toparse.empty?
      case @toparse[0]
      when '['
        chars = parse_range
        if @results.size == 1
          @results = chars.map{|c| @results[0]+c}
        elsif @results.size == chars.size
          @results = @results.zip(chars).map{|s,c| s+c}
        else
          raise "Sizes of character classes mismatch in #{@str}"
        end
      else
        c = parse_character
        @results = @results.map{|s| s + c}
      end
    end
    @results
  end

  private def parse_character
    if @toparse[0] == '\\'
      @toparse.shift
      raise "Unmatched \\ in #{@str}" if eos?
      c = @toparse.shift
      case c
      when 'n'
        '\n'
      when 's'
        ' '
      else
        c
      end
    else
      @toparse.shift
    end
  end

  private def parse_range
    chars = [] of Char
    @toparse.shift
    loop do
      raise "Character range never closed in #{@str}" if eos?
      if @toparse[0] == ']'
        @toparse.shift
        return chars
      end
      c = parse_character
      raise "Character range never closed in #{@str}" if eos?
      if @toparse[0] == '-'
        @toparse.shift
        e = parse_character
        raise "Invalid character range in #{@str}" if e < c
        chars.concat(c..e)
      else
        chars << c
      end
    end
  end

  private def eos?
    @toparse.empty?
  end
end

class ThueRuleParser
  def initialize(@str : String)
    if @str =~ /\A(.*)::=(.*)\z/
      @valid = true
      @left = $1
      @right = $2
    else
      @left = ""
      @right = ""
      @valid = false
    end
  end

  def valid_rule?
    @valid
  end

  def empty_rule?
    @valid && @left.empty?
  end

  def call
    lefts = ThueSideParser.new(@left).results
    rights = ThueSideParser.new(@right).results

    # Support N-to-1 and 1-to-N rules
    lefts *= rights.size if lefts.size == 1
    rights *= lefts.size if rights.size == 1

    unless lefts.size == rights.size
      raise "Mismatched side of rule #{@str}"
    end

    lefts.zip(rights).map do |left, right|
      ThueRule.new(left, right)
    end
  end
end

class RFA
  def initialize(@rules : Array(ThueRule))
    @trie = Trie(ThueRule).new
    @rules.each do |rule|
      @trie.insert(rule.left, rule)
    end
  end

  # No empty matches allowed
  def random_match(str : String)
    count = 0
    active = [@trie]
    match = nil

    str.chars.each_with_index do |char, idx|
      next_tries = active.map{|t| t[char]}.compact
      matching_rules = next_tries.flat_map(&.data)
      unless matching_rules.empty?
        count += matching_rules.size
        if rand(count) < matching_rules.size
          rule = matching_rules.sample
          match = {rule: rule, idx: (idx - rule.left.size + 1)}
        end
      end
      active = [@trie, *next_tries]
    end

    match
  end
end

class ThueProgram
  def initialize
    @rules = [] of ThueRule
    @initial = ""
    @state = ""
  end

  def load(path)
    lines = File.read_lines(path).map(&.chomp).zip(1..)

    while lines.size > 0
      line, line_no = lines.shift
      # Ignoring invalid rules, they are sometimes used as comments
      parser = ThueRuleParser.new(line)
      next unless parser.valid_rule?
      break if parser.empty_rule?
      @rules.concat parser.call
    end

    @rfa = RFA.new(@rules)

    @initial = lines.map(&.first).join("\n")
  end

  def run(debug)
    @state = @initial
    if debug
      @rules.each do |rule|
        STDERR.puts rule
      end
    end

    while match = @rfa.not_nil!.random_match(@state)
      rule = match[:rule]
      idx = match[:idx]
      if debug
        STDERR.puts "Applying rule #{rule} at #{idx} to #{@state.inspect}"
      end
      @state = rule.apply(@state, idx)
    end

    if debug
      STDERR.puts "No more matches. Final state: #{@state.inspect}"
    end
  end
end

unless ARGV.size == 1
  STDERR.puts "Usage: #{PROGRAM_NAME} <file.thue>"
  exit 1
end

prog = ThueProgram.new
prog.load(ARGV[0])

# Crystal doesn't handle SIGPIPE well and we want to support:
# crystal thue.cr examples/fizzbuzz.thue | head -n 100
begin
  prog.run(!!ENV["DEBUG"]?)
rescue e : IO::Error
  exit if e.os_error == Errno::EPIPE
  raise e
end

Performance

With just this change, we get a decent performance improvement, from 51s to 21s on the 100k FizzBuzz Thue program:

$ time ./thue_rx.cr examples_rx/fizzbuzz.thue | head -n 100000 >/dev/null
./thue_rx.cr examples_rx/fizzbuzz.thue  51.50s user 0.81s system 101% cpu 51.601 total
head -n 100000 > /dev/null  0.16s user 0.39s system 1% cpu 51.590 total
$ time ./thue_rfa.cr examples_rx/fizzbuzz.thue | head -n 100000 >/dev/null
./thue_rfa.cr examples_rx/fizzbuzz.thue  21.47s user 13.90s system 165% cpu 21.418 total
head -n 100000 > /dev/null  0.11s user 0.21s system 1% cpu 21.408 total

Comparing release builds, the difference is consistent but small, 41s to 39s on 500k FizzBuzz Thue program. Both finish 100k FizzBuzz in ~7s:

$ crystal build --release thue_rfa.cr
$ crystal build --release thue_rx.cr
$ time ./thue_rx examples_rx/fizzbuzz.thue | head -n 500000 >/dev/null
./thue_rx examples_rx/fizzbuzz.thue  41.05s user 3.38s system 106% cpu 41.762 total
head -n 500000 > /dev/null  0.45s user 0.88s system 3% cpu 41.760 total
$ time ./thue_rfa examples_rx/fizzbuzz.thue | head -n 500000 >/dev/null
./thue_rfa examples_rx/fizzbuzz.thue  39.44s user 64.53s system 272% cpu 38.119 total
head -n 500000 > /dev/null  0.52s user 0.95s system 3% cpu 38.117 total

I'm not sure if this counts as a win or not. It's a very big improvement on the development build, but a small one on the release build. It should be more significant when running Thue programs with a huge number of rules. I guess FizzBuzz with less than 100 rules didn't really benefit from that.

There are probably a lot of small optimizations that can be applied to RFA#random_match, even without precomputing a single big trie.

Code

All code examples for the series will be in this repository.

Code for the Better Thue Interpreter in Crystal episode is available here.

Top comments (2)

Ary Borenszweig • Jan 22 '22

Really nice post, as usual!

Some small things to further improve the performance:

Use str.each_char_with_index instead of str.chars.each_with_index (the former doesn't allocate memory while the latter does)
Use active.compact_map { |t| t[char] } instead of active.map { |t| t[char] }.compact (one less intermediate array)
Do something like active.clear; active.push @trie; active.concat next_tries instead of creating a new array for each char

That said, I totally understand that the goal is to improve performance compared to before while also keeping the code as readable as possible. I'm just suggesting these because you said "There's probably a lot small optimizations" and because I like optimizing things :-)

Tomasz Wegrzanowski • Jan 22 '22

Oh for sure. I think the biggest performance savings would come from switching from processing characters to processing bytes, as in UTF-8 this works perfectly well without any changes, and then we could just use 256 entry arrays instead of hashes for the trie.

Not like this really makes much difference for programs with so few rules.