Original article: https://kdnanmaga.xyz/how-to-load-a-large-json-file-into-elixir-ets-cache/
Elixir's ETS (Erlang Term Storage) is an in-memory data store that can be accessed across processes. It can be used to build lookup tables, such as geocoders and translators, in web applications.
For an application I was building, I had to load a large dataset (~500k rows) of geocodes into an ETS table before the webserver endpoint starts, so that the data could be shared across all the processes that handle incoming requests.
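For readers unfamiliar with ETS, here is a minimal sketch of the calls involved; the table name, options, and sample data are illustrative, not taken from the original application.

```elixir
# Illustrative only: create a named table, insert a row, and look it up.
:ets.new(:geocodes, [:named_table, :set, :public, read_concurrency: true])

# Any process can read from (and, since the table is :public, write to) it.
:ets.insert(:geocodes, {"560001", {12.9716, 77.5946}})

case :ets.lookup(:geocodes, "560001") do
  [{_key, coords}] -> coords
  [] -> nil
end
```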
First attempt: Load the file and then parse
At first, I attempted loading the whole file into memory, parsing it with Jason, and then inserting the decoded map into ETS:
```elixir
# Here my JSON is one single root object with key-value pairs.
def load_file(filename, tablename) do
  :ets.new(tablename, [:named_table])

  with {:ok, body} <- File.read(filename),
       {:ok, json} <- Jason.decode(body),
       do: load_from_map(json, tablename)
end

# Note: the table is created once in load_file/2; creating the same
# :named_table again here would raise an ArgumentError.
defp load_from_map(parsed_map, tablename) do
  for {k, v} <- parsed_map do
    :ets.insert(tablename, {k, v})
  end
end
```
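One way to run this before the endpoint starts is to call the loader from the application's start/2 callback, ahead of the endpoint in the supervision tree. The module and file names below are hypothetical:

```elixir
defmodule MyApp.Application do
  use Application

  @impl true
  def start(_type, _args) do
    # Load the lookup data before the endpoint is started, so every
    # request-handling process can read from the named table.
    MyApp.GeoCache.load_file("priv/geocodes.json", :geocodes)

    children = [
      MyAppWeb.Endpoint
    ]

    Supervisor.start_link(children, strategy: :one_for_one, name: MyApp.Supervisor)
  end
end
```

A common alternative is to give the table a dedicated owner process (for example a GenServer placed before the endpoint in the children list), so that table ownership and reloads are explicit.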
It worked, but it took quite a while and hogged a lot of RAM; my machine with 4GB of RAM froze for about a minute.
Streaming to the rescue
At this point, I thought there could be a better way to do this, maybe something that doesn't involve reading the entire file into memory. That's when I found Jaxon, a streaming JSON parser. With it, the file is opened as a stream and the JSON is parsed as the stream is being read. Pretty neat, right?
```elixir
# Here my JSON is an array of objects: {"k": <key>, "v": <value>}.
def load_file(filename, tablename) do
  :ets.new(tablename, [:named_table])

  filename
  |> File.stream!()
  |> Jaxon.Stream.from_enumerable()
  |> Jaxon.Stream.query([:root, :all])
  |> Stream.each(fn kv -> :ets.insert(tablename, {kv["k"], kv["v"]}) end)
  |> Stream.run()
end
```
At first this didn't seem to work, and I was disappointed until I realized my JSON wasn't pretty-printed and was all on one line, which means File.stream!/1 (which streams line by line) delivers the entire file as a single giant chunk. I generated a multi-line, pretty-printed JSON file and voilà, it worked!
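If regenerating the file isn't an option, another approach (not from the original post) would be to stream the file in fixed-size byte chunks rather than lines, so the parser still receives the data incrementally even when the JSON sits on a single line:

```elixir
# Sketch: read the file in 64 KB binary chunks instead of line by line.
def load_file(filename, tablename) do
  :ets.new(tablename, [:named_table])

  filename
  |> File.stream!([], 65_536)  # Elixir 1.16+ also accepts File.stream!(filename, 65_536)
  |> Jaxon.Stream.from_enumerable()
  |> Jaxon.Stream.query([:root, :all])
  |> Stream.each(fn kv -> :ets.insert(tablename, {kv["k"], kv["v"]}) end)
  |> Stream.run()
end
```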
Find me on twitter: @xqzkio