This is a longer-form article about how I built JavaScript's fastest “deep clone” function.
Last time, when I talked about implementing “deep equal”, I ended by basically telling readers to draw the rest of the owl.
This time let's get in the weeds.
If you haven't already, read through the post on implementing JavaScript's fastest “deep equal”. It's a short read (~3 minutes), and we'll be picking up where we left off in this post.
tl;dr
The implementation we'll be talking about today is:
- 10-50x faster than `Lodash.cloneDeep`
- 20-120x faster than `window.structuredClone`
If you're skeptical, you can run the benchmarks yourself on Bolt by typing `npm run bench` in the terminal.
Before we get into the how, let's first make sure we're on the same page about the problem we're trying to solve.
Disclaimer:
You might not need a deep copy.
Even if cloning were “free” (instantaneous), you still pay for the memory allocation.
Other solutions, such as structural sharing, can help you balance that cost, and could very well be a better fit for your use case.
As always, make sure you're aware of the tradeoffs, and think critically before you go around minting new copies of your data.
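For example, here's a minimal sketch of a structural-sharing update (the `User` shape and `withCity` helper are hypothetical, just for illustration): only the path you change is copied, and every untouched branch keeps its identity.

```ts
type User = {
  firstName: string
  lastName: string
  addresses: { street1: string; street2: string; city: string }[]
}

// Hypothetical structural-sharing update: only the changed path is re-created.
function withCity(user: User, index: number, city: string): User {
  return {
    ...user, // top-level fields are shared, not cloned
    addresses: user.addresses.map(
      // only the address at `index` is re-created; its siblings keep their identity
      (addr, i) => (i === index ? { ...addr, city } : addr)
    )
  }
}
```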
Let's get into it.
The problem
First, why copy data at all?
ChatGPT said,
Deep copies are useful when you want to safely manipulate nested data structures without affecting the original object — particularly in environments where immutability is critical for correctness and performance.
Good enough for me.
A “naive” solution
How can we make deep cloning fast?
As usual, we start with a naive solution.
Let's say we're working with the following data model:
```ts
const UserSchema = {
  type: 'object',
  properties: {
    firstName: { type: 'string' },
    lastName: { type: 'string' },
    addresses: {
      type: 'array',
      items: {
        type: 'object',
        properties: {
          street1: { type: 'string' },
          street2: { type: 'string' },
          city: { type: 'string' }
        }
      }
    }
  }
}
```
Here's what a user looks like:
```ts
const sherlock = {
  firstName: 'Sherlock',
  lastName: 'Holmes',
  addresses: [{
    street1: '221 Baker St',
    street2: 'Apartment B',
    city: 'London'
  }]
}
```
Let's practice our “unlearning” skills and hardcode the solution, like we did last time.
```ts
type User = {
  firstName: string
  lastName: string
  addresses: {
    street1: string
    street2: string
    city: string
  }[]
}

function deepClone(user: User): User {
  return {
    firstName: user.firstName,
    lastName: user.lastName,
    addresses: user.addresses.map((value) => ({
      street1: value.street1,
      street2: value.street2,
      city: value.city
    }))
  }
}
```
Let's benchmark it.
Benchmarking our naive solution
If you'd like to run the benchmarks yourself, you can do so in the Bolt sandbox.
Here's the breakdown:
```
Lodash        205.09 µs/iter 203.25 µs                     █
     (173.25 µs … 883.04 µs) 829.71 µs                    ▄█
     ( 14.52 kb … 725.08 kb)  65.87 kb ██▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
                  4.49 ipc ( 0.66% stalls)  99.07% L1 data cache
         621.43k cycles   2.79M instructions  33.01% retired LD/ST (920.05k)

deepClone       5.40 µs/iter   5.45 µs                    ▂█
         (5.19 µs … 5.65 µs)   5.61 µs                  ▃███
         ( 0.83 b …  1.97 kb) 169.67 b █▃▇█▃▃▁▃▁████▇▇▅▁▃▅▃▃
                  5.68 ipc ( 1.07% stalls)  99.62% L1 data cache
          17.05k cycles  96.82k instructions  37.40% retired LD/ST ( 36.21k)

                 ┌                                            ┐
          Lodash ┤■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 205.09 µs
       deepClone ┤ 5.40 µs
                 └                                            ┘

                 ┌                                            ┐
          Lodash ├┤──────────────────────────────────────────┤
       deepClone │
                 └                                            ┘
                 5.19 µs           417.45 µs         829.71 µs

summary
  deepClone
    37.98x faster than Lodash
```
Yep, it's fast. It could probably be faster, but this blog post isn't about micro-optimization; it's about drawing the rest of the owl.
Two callouts before we move on:
Consistency
Notice how inconsistent Lodash is. Here's the range:
```
173.25 µs … 883.04 µs

883.04 / 173.25 = 5.1
```
That's a lot of variance. Depending on the run, Lodash sometimes took 400% more time to do the job.
Compare that to our implementation:
```
5.19 µs … 5.65 µs

5.65 / 5.19 = 1.1
```
By comparison, our implementation deviated by only 10%.
So in addition to being faster, it's also more predictable.
That makes sense: ours knows the shape of the data it will receive, so its performance is similarly knowable.
Instruction count
Notice how much work Lodash is doing:
```
621.43k cycles
2.79M instructions
```
Ours:
```
17.05k cycles
96.82k instructions
```
That also makes sense. Our implementation doesn't need to introspect the data, or make any decisions. All it does is clone data.
Interestingly, the instruction counts differ by nearly the same factor as the time spent:

```
Lodash: 2.79M instructions
Ours:   96_820 * 37 = 3.58M instructions
```
Implementing JavaScript's fastest “deep clone”
Like last time, we'll use a “spec” (JSON Schema) to front-load the cost, so we only have to pay it once.
And, like last time, we'll use the schema to build up our program as a string, which we can then pass to the built-in `Function` constructor.
If you haven't seen this pattern before, my previous post talks about the pattern and why it works.
In other words, this is what we want:
```ts
const deepClone = Function('data', `
  return {
    firstName: data.firstName,
    ...data.lastName !== undefined && ({ lastName: data.lastName }),
    addresses: data.addresses.map((value) => ({
      street1: value.street1,
      ...value.street2 !== undefined && ({ street2: value.street2 }),
      city: value.city
    }))
  }
`)
```
Mapping schema to syntax
Our implementation will take a JSON Schema as input, and return a big string.
We'll have to use recursion at some point, but postpone that part for as long as we can.
First, we need to decompose our big problem into smaller sub-problems.
When it comes to recursion, it's almost always easier to start at the leaf nodes.
Leaf nodes
How should we handle string nodes?
Let's peek at our implementation again:
```ts
function deepClone(data) {
  return {
    firstName: data.firstName
    //         𐙘____________𐙘
    //              here
  }
}
```
If we're at a string node, we return the value at that address; in this case, the string `data.firstName`.

Which brings us to our first problem: how do we get the path? How do we know to return `data.firstName`, and not `data.lastName`, or `data.x.y.z` for that matter?
Continuation passing style (CPS)
If you're like me, your first thought might have been to create a local “path” variable, or add a “context” parameter, where we keep track of the path to our data.
That would work, but it sounds tricky: implementing it would require a lot of “top-level” coordination, which (speaking from experience) is brittle and hard to debug. If we screw up the coordination, we'll need to start at the top and work our way down.
Alternatively, we could punt on all that, and make our implementation a function:
```ts
type Builder = (path: string[]) => string

// join the path segments: ['data', 'firstName'] -> 'data.firstName'
const joinPath: Builder = (path) =>
  path.reduce((acc, cur) => acc === '' ? cur : `${acc}.${cur}`, '')

const cloneString: Builder = (path) => joinPath(path)
```
Note: This compilation approach is called “compiling with continuations” or continuation-passing style.
That's actually all we need to do for the leaf nodes.
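As a quick sanity check, assuming the definitions above:

```ts
cloneString(['data', 'firstName'])
// => 'data.firstName'
```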
Array nodes
Okay, let's do arrays.
For arrays, our implementation will need to return a function that accepts a path, and returns a string.
In JSON Schema, arrays are represented like this:
```ts
{ type: 'array', items: { type: 'string' } }
```
But in our case, since we're working from the bottom up, `{ type: 'string' }` will be a function instead:

```ts
{
  type: 'array',
  items: Builder
  //     𐙘_____𐙘
  // we already replaced the string node
  // with `cloneString`
}
```
So we'll need to call it:
```ts
const cloneArray
  : (builder: Builder) => Builder
  = (builder) => (path) =>
    joinPath(path) + `.map((value) => (${builder(['value'])}))`
```
In other words, our array cloner will be called with its child node (`builder`) as an argument; then, since it's a `Builder`, it receives the path from its parent.

It then passes a new path (the string `"value"`) down to its child, since its children don't need to care about the full path from the root anymore.
If you're still confused, it might help to look at our original implementation again:
```ts
{
  // ...
  addresses: data.addresses.map((value) => ({
    //                           𐙘___𐙘
  }))
}
```
That's actually all we need to do for arrays.
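Another quick sanity check, assuming the definitions above (the `tags` field is hypothetical, just to keep the example small):

```ts
const cloneStringArray = cloneArray(cloneString)

cloneStringArray(['data', 'tags'])
// => 'data.tags.map((value) => (value))'
```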
Object nodes
Objects will be a little trickier, but not by much.
Here's what an object schema looks like in JSON Schema:
```ts
{
  type: 'object',
  properties: {
    street1: { type: 'string' },
    street2: { type: 'string' },
    city: { type: 'string' }
  }
}
```
Since we're building from the bottom up, here's what our object schemas will look like mid-traversal:

```ts
{
  type: 'object',
  properties: {
    street1: Builder,
    street2: Builder,
    city: Builder
  }
}
```
In other words, instead of receiving a single builder as a child, we'll be receiving a record whose values are each builders.
We'll also need to call each of the builders with its key appended to the end of the path. Like this:
```ts
const cloneObject
  : (builders: Record<string, Builder>) => Builder
  = (builders) => (path) => {
    const children = Object
      .entries(builders)
      .map(([k, builder]) => k + ': ' + builder([...path, k]))
    return '{' + children.join(', ') + ' }'
  }
```
That's it for objects.
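Here's the same sanity check for objects, assuming the definitions above:

```ts
const cloneAddress = cloneObject({
  street1: cloneString,
  street2: cloneString,
  city: cloneString
})

cloneAddress(['value'])
// => '{street1: value.street1, street2: value.street2, city: value.city }'
```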
Wiring things up
Let's wire up our individual schema handlers:
```ts
type JsonSchema<T> =
  | { type: 'string' }
  | { type: 'array', items: T }
  | { type: 'object', properties: Record<string, T> }

const fold
  : (schema: JsonSchema<Builder>) => Builder
  = (schema) => {
    switch (true) {
      case schema.type === 'string': return cloneString
      case schema.type === 'array': return cloneArray(schema.items)
      case schema.type === 'object': return cloneObject(schema.properties)
    }
  }
```
Note that our `JsonSchema` type isn't recursive. Not only does this help with TypeScript IDE performance, but it also lets us accurately represent the actual type of our schema during traversal.

Note that our `fold` function is also not recursive. We're not recursing yet; we're still wiring things up.
If you haven't seen this pattern before, I wrote about it previously in Recursion in TypeScript (without the tears).
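To make “not recursing yet” concrete: `fold` only handles a single layer, so its children must already be `Builder`s. A quick sketch, assuming the definitions above:

```ts
// fold is one layer deep: the children here are already Builders, not schemas.
const builder = fold({
  type: 'object',
  properties: { firstName: cloneString, lastName: cloneString }
})

builder(['data'])
// => '{firstName: data.firstName, lastName: data.lastName }'
```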
Making it recursive
I said this already, but it's worth reiterating: this section assumes you're familiar with the idea of a Functor.
Prerequisites
If you need to brush up, here are a few resources:
- Recursion in TypeScript, which covers the same pattern in more depth
Okay, we're finally ready to wire up our `fold` function.

To do that, we need two more components:
A JSON Schema Functor
A Functor is an “interface” from functional programming.

Data that satisfies the Functor interface has a single method, `map`, which takes a function and applies it to the element(s) inside the Functor.

If you've ever used `Array.prototype.map` before, then you're already familiar with the Array Functor.
Let's define a Functor for JSON Schema specs:
```ts
const Functor = {
  map<S, T>(fn: (src: S) => T) {
    return (x: JsonSchema<S>): JsonSchema<T> => {
      switch (true) {
        case x.type === 'string':
          return x
        case x.type === 'array':
          return { type: 'array', items: fn(x.items) }
        case x.type === 'object':
          return {
            type: 'object',
            properties: Object.fromEntries(
              Object.entries(x.properties).map(([k, v]) => [k, fn(v)])
            )
          }
      }
    }
  }
}
```
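Note that `map` only transforms a single layer: the children are replaced, but the surrounding shape is untouched. For example (the `'CHILD'` placeholder is just for illustration):

```ts
const replaceChildren = Functor.map((_child: unknown) => 'CHILD')

replaceChildren({ type: 'array', items: { type: 'string' } })
// => { type: 'array', items: 'CHILD' }

replaceChildren({ type: 'string' })
// => { type: 'string' }   (no children to touch)
```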
A `combine` function

To make `fold` recursive, we're going to write a function called `combine`. I call it that because it's closely related to the Y-combinator, and because the traditional name for it (catamorphism) isn't particularly accessible:
```ts
type Fold<T> = (x: JsonSchema<T>) => T

const combine
  : <T>(fold: Fold<T>) => Fold<T>
  = (fold) => function loop(value) {
    return fold(Functor.map(loop)(value))
  }
```
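It might help to trace what `combine(fold)` does on a small schema. Since `loop` maps itself over the children first, leaves are folded before their parents; a quick sketch, assuming everything above:

```ts
const builder = combine(fold)({
  type: 'array',
  items: { type: 'string' }
})
// 1. Functor.map(loop) folds the leaf:  { type: 'array', items: cloneString }
// 2. fold handles the outer layer:      cloneArray(cloneString)

builder(['data'])
// => 'data.map((value) => (value))'
```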
Putting it all together
Let's put it all together now:
```ts
const buildDeepClone = (schema: JsonSchema<any>) =>
  Function(
    'data',
    'return ' + combine(fold)(schema)(['data'])
  )
```
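And here's roughly what using it looks like, assuming the `UserSchema` and `sherlock` from earlier:

```ts
const deepClone = buildDeepClone(UserSchema)
const clone = deepClone(sherlock)

clone === sherlock                      // => false
clone.addresses === sherlock.addresses  // => false (the array is re-created)
clone.addresses[0].city                 // => 'London'
```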
For ~60 lines of code (about 20 of them types), that's not bad, especially for a solution that's an order of magnitude faster than the one that ships with the platform.
Here's a sandbox if you'd like to see it working.
Fuzz testing
We're not done yet. We've tested our `buildDeepClone` function with a few hardcoded inputs, but that doesn't give us enough confidence to ship this as a library.
I don't know about you, but I'm lazy. I'd rather not rack my brain to think of edge cases, and my fingers hurt just thinking about typing them all out.
Not to mention, those kinds of tests are super brittle: if we change something about our implementation, we have to update a bunch of tests by hand, which often involves guesswork to figure out what the tests mean.
Let's fuzz test our `buildDeepClone` function instead.
For that, we'll use a library called `fast-check`. If you're not familiar with it, now you know.
fast-check
`fast-check` is a property testing library that exposes a combinator API that will feel very familiar if you've used `zod` before.
You use these combinators to build up arbitraries. Arbitraries are pseudo-random data generators.
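If you've never used an arbitrary before, here's the general idea (the sampled values below are illustrative; they vary per run):

```ts
import * as fc from 'fast-check'

// fc.string() is an arbitrary; fc.sample draws values from it.
console.log(fc.sample(fc.string(), 3))
// => e.g. [ '', 'S$b%', 'xT2' ]
```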
Our goal:

- generate a random JSON Schema
- generate a random input that satisfies the JSON Schema
- pass the schema to our `buildDeepClone` function to derive a `deepClone` function
- pass the random input to our `deepClone` function
- use `vitest` to make sure that the generated input and the cloned input are deeply equal to each other
Generate a random JSON Schema
First, let's write an arbitrary that generates a “random” (pseudo-random) string schema.
Generating a string schema
```ts
import * as fc from 'fast-check'

const stringSchemaArbitrary = fc.constant({ type: 'string' })
```
In this example we're using the `fc.constant` combinator to generate a constant value.

The reason we need to wrap this constant value in an arbitrary is so that we can compose it with the other arbitraries we need to write.
Generating an array schema
Okay, let's build the array schema arbitrary:
```ts
import * as fc from 'fast-check'

const arraySchemaArbitrary = <T>(model: fc.Arbitrary<T>) =>
  fc.record({
    type: fc.constant('array'),
    items: model
  })
```
Generating an object schema
Now let's build the object schema arbitrary:
```ts
import * as fc from 'fast-check'

const objectSchemaArbitrary = <T>(model: fc.Arbitrary<T>) =>
  fc.record({
    type: fc.constant('object'),
    properties: fc.dictionary(fc.string(), model)
  })
```
All together now
Okay, so we have our component parts. How do we compose them to generate a random JSON Schema?
For that, we'll use `fc.letrec`.

Using the `fc.letrec` API might feel familiar to you. That's because its design is very similar to the “continuation passing” approach we used earlier to generate our stringified function bodies.
Here's what using it looks like:
```ts
import * as fc from 'fast-check'

const SchemaBuilder = fc.letrec((tie) => ({
  string: stringSchemaArbitrary,
  array: arraySchemaArbitrary(tie('*')),
  object: objectSchemaArbitrary(tie('*')),
  ['*']: fc.oneof(
    tie('string'),
    tie('array'),
    tie('object'),
  )
}))
```
Using it looks like this:
```ts
import * as fc from 'fast-check'

const twoSchemas = fc.sample(SchemaBuilder['*'], 2)

console.log(JSON.stringify(twoSchemas[0], null, 2))
// => {
//   "type": "object",
//   "properties": {
//     "$C_5g$G$": { "type": "string" },
//     "G_If_1$_": { "type": "string" },
//     "ku": { "type": "string" },
//     "$_$_9_c$I": {
//       "type": "array",
//       "items": { "type": "string" }
//     },
//     "VF$_$": { "type": "string" },
//     "G": { "type": "string" }
//   }
// }

console.log(JSON.stringify(twoSchemas[1], null, 2))
// => {
//   "type": "array",
//   "items": {
//     "type": "array",
//     "items": { "type": "string" }
//   }
// }
```
That covers our random JSON Schema generator. What about the inputs?
Reading the schema to generate random inputs
For this, we can re-use the `combine` function we wrote earlier to avoid having to deal with walking the schema tree.
```ts
import * as fc from 'fast-check'

export const generateData = combine((schema) => {
  switch (true) {
    case schema.type === 'string': return fc.string()
    case schema.type === 'array': return fc.array(schema.items)
    case schema.type === 'object': return fc.record(schema.properties)
  }
})
```
Let's unpack that:

Our `generateData` function accepts a JSON Schema as input, then pattern matches on the schema's `"type"` property:
- if `schema.type === "string"`, we return a string arbitrary
- if `schema.type === "array"`, we return an array arbitrary, passing the arbitrary at `schema.items` into `fc.array`
- if `schema.type === "object"`, we return a record arbitrary, passing the record of arbitraries at `schema.properties` into `fc.record`
And that's it. Because `combine` handles the recursion for us, we're done.
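As a quick sanity check, here's what sampling a derived arbitrary might look like (the output is illustrative):

```ts
const arb = generateData({ type: 'array', items: { type: 'string' } })

console.log(fc.sample(arb, 1))
// => e.g. [ [ 'Zty$', '', 'q' ] ]   (a random array of random strings)
```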
Writing our test
`fast-check` is designed to integrate seamlessly with `vitest` (and if this issue gets traction, it might someday be integrated as a first-class part of the API 🤞).
Here's all we need to do to fuzz test that our `buildDeepClone` function works with any arbitrary JSON Schema:
```ts
import { assert, test } from 'vitest'
import * as fc from 'fast-check'

test('fuzz test', () => {
  fc.assert(
    fc.property(SchemaBuilder['*'], (schema) => {
      const [data] = fc.sample(generateData(schema))
      const deepClone = buildDeepClone(schema)
      const clone = deepClone(data)
      assert.deepEqual(clone, data)
    }),
    { numRuns: 1000 }
  )
})
```
That's it! Notice that we're running the test 1,000 times. We can of course configure that; by default, fast-check runs the test 100 times.
Here's a sandbox if you'd like to run the test suite yourself.
Tip: If you run the test suite, you can scroll up in the terminal to see the generated schema, generated input, and cloned data for each run.
Closing thoughts
Hopefully this helped you see, if not the whole owl, enough that you could draw the rest if you wanted.
If you'd prefer not to write this kind of thing yourself, take a look at `@traversable/schema`.

We support generating `deepClone`, `deepEqual`, and more for a variety of schema types.
And thanks for reading! This one was pretty long. If you made it all the way through and have feedback, or just want to chat about anything we talked about, feel free to reach out via email at ahrjarrett@gmail.com.
Aside
This approach to building leaner, more localized alternatives to bulky (and often slow) APIs was inspired by, of all things, Dune.
More specifically, it was a video about the thematic role of architecture in the Dune series.
This quote, which compares the Harkonnens' design to the Fremen's, is what got me thinking about it:
The [Harkonnens], even though they're technically the most powerful and resourceful family, struggle to adapt to Arrakis' harsh conditions, and they rely mostly on imported resources.
The Fremen on the other hand have been able to develop solutions that are locally adapted approaches that are specifically designed for [their] climate.
Instead of seeking universally applicable high tech solutions, maybe we need to be looking more for nuanced locally adapted approaches.
– Dami Lee, Dune Cities: Fiction or Reality?
I think this kind of approach could be explored in our line of work more often. It seems like this is a relatively new frontier (for the JavaScript ecosystem, at least), and I'm excited to see what others come up with as we explore the space of possibilities.
That's all for now. Cheers.
Links
- Bolt sandbox where you can run the benchmarks
- The benchmarks themselves
- Examples of the generated code
- How the benchmarks were conducted
- The implementation