DEV Community

andrew jarrett

How to build JavaScript's fastest “deep clone” function

This is a longer-form article about how I built JavaScript's fastest “deep clone” function.

Last time when I talked about implementing “deep equal”, I ended by basically telling readers to draw the rest of the owl.

This time let's get in the weeds.

If you haven't already, read through the post on implementing JavaScript's fastest “deep equal”. It's a short read (~3 minutes), and we'll be picking up where we left off in this post.

tl;dr

The implementation we'll be talking about today is:

  • 10-50x faster than Lodash.deepClone
  • 20-120x faster than window.structuredClone

If you're skeptical, you can run the benchmarks yourself on Bolt by typing npm run bench in the terminal.

Before we get into the how, let's first make sure we're on the same page about the problem we're trying to solve.

Disclaimer:

You might not need a deep copy.

Even if cloning were “free” (instantaneous), you still pay for the memory allocation.

Other solutions, such as structural sharing, can help you balance that cost, and could very well be a better fit for your use case.

As always, make sure you're aware of the tradeoffs, and think critically before you go around minting new copies of your data.

Let's get into it.


The problem

First, why copy data at all?

ChatGPT said,

Deep copies are useful when you want to safely manipulate nested data structures without affecting the original object — particularly in environments where immutability is critical for correctness and performance.

Good enough for me.
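In case the failure mode isn't familiar, here's the classic shallow-copy footgun that deep cloning exists to prevent (the names are just for illustration):

```typescript
// A shallow copy only copies the top level; nested objects are shared.
const original = {
  name: 'Sherlock',
  address: { city: 'London' },
}

const shallow = { ...original }
shallow.address.city = 'Paris' // oops: this mutates `original` too

console.log(original.address.city) // => 'Paris'

// A deep clone severs the link:
const deep = structuredClone(original)
deep.address.city = 'Tokyo'

console.log(original.address.city) // => still 'Paris'
console.log(deep.address.city)     // => 'Tokyo'
```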


A “naive” solution

How can we make deep cloning fast?

As usual we start with a naive solution.

Let's say we're working with the following data model:

const UserSchema = {
  type: 'object',
  properties: {
    firstName: { type: 'string' },
    lastName: { type: 'string' },
    addresses: {
      type: 'array',
      items: {
        type: 'object',
        properties: {
          street1: { type: 'string' },
          street2: { type: 'string' },
          city: { type: 'string' }
        }
      }
    }
  }
}

Here's what a user looks like:

const sherlock = {
  firstName: 'Sherlock',
  lastName: 'Holmes',
  addresses: [{
    street1: '221 Baker St',
    street2: 'Apartment B',
    city: 'London'
  }]
}

Let's practice our “unlearning” skills and hardcode the solution, like we did last time.

type User = {
  firstName: string
  lastName: string
  addresses: {
    street1: string
    street2: string
    city: string
  }[]
}

function deepClone(user: User): User {
  return {
    firstName: user.firstName,
    lastName: user.lastName,
    addresses: user.addresses.map((value) => ({
      street1: value.street1,
      street2: value.street2,
      city: value.city
    }))
  }
}

Let's benchmark it.


Benchmarking our naive solution

If you'd like to run the benchmarks yourself, you can do so in the Bolt sandbox.

Here's the breakdown:

Lodash       205.09 µs/iter  (173.25 µs … 883.04 µs)
             ( 14.52 kb … 725.08 kb)  65.87 kb
               4.49 ipc  (0.66% stalls)  99.07% L1 data cache
             621.43k cycles  2.79M instructions
              33.01% retired LD/ST (920.05k)

deepClone      5.40 µs/iter  (5.19 µs … 5.65 µs)
             (  0.83 b … 1.97 kb)  169.67 b
               5.68 ipc  (1.07% stalls)  99.62% L1 data cache
              17.05k cycles  96.82k instructions
              37.40% retired LD/ST (36.21k)

summary
  deepClone is 37.98x faster than Lodash

Yep, it's fast. It could probably be faster, but this blogpost isn't about micro-optimization – it's about drawing the rest of the owl.

Two callouts before we move on:

Consistency

Notice how inconsistent Lodash is. Here's the range:

173.25 µs … 883.04 µs

883.04 / 173.25 ≈ 5.1

That's a lot of variance. Depending on the run, Lodash sometimes took 400% more time to do the job.

Compare that to our implementation:

5.19 µs … 5.65 µs

5.65 / 5.19 ≈ 1.1

By comparison, our implementation deviated by only 10%.

So in addition to being faster, it's also more predictable.

That makes sense: ours knows the shape of the data it will receive, so its performance is similarly predictable.

Instruction count

Notice how much work Lodash is doing:

621.43k cycles
2.79M instructions

Ours:

17.05k cycles
96.82k instructions

That also makes sense. Our implementation doesn't need to introspect the data, or make any decisions. All it does is clone data.

Interestingly, the instruction counts differ by nearly the same factor as the time spent:

Lodash: 2.79M instructions
Ours:   96,820 × 37 ≈ 3.58M instructions

Implementing JavaScript's fastest “deep clone”

Like last time, we'll use a “spec” (JSON Schema) to front-load the cost, so we only have to pay it once.

And, like last time, we'll use the schema to build up our program as a string, which we can then pass to the built-in Function constructor.

If you haven't seen this pattern before, my previous post talks about the pattern and why it works.

In other words, this is what we want:

const deepClone = Function('data', `
  return {
    firstName: data.firstName,
    ...data.lastName !== undefined && ({ lastName: data.lastName }),
    addresses: data.addresses.map((value) => ({
      street1: value.street1,
      ...value.street2 !== undefined && ({ street2: value.street2 }),
      city: value.city
    }))
  }
`)

Mapping schema to syntax

Our implementation will take a JSON Schema as input, and return a big string.

We'll have to use recursion at some point, but let's postpone that part for as long as we can.

First, we need to decompose our big problem into smaller sub-problems.

When it comes to recursion, it's almost always easier to start at the leaf nodes.

Leaf nodes

How should we handle string nodes?

Let's peek at our implementation again:

function deepClone(data) {
  return {
    firstName: data.firstName
    //         𐙘____________𐙘
    //         here
  }
}

If we're at a string node, we return the value at that address, in this case, the string data.firstName.

Which brings us to our first problem: how do we get the path? How do we know to return data.firstName, and not data.lastName, or data.x.y.z for that matter?

Continuation passing style (CPS)

If you're like me, your first thought might have been to create a local “path” variable, or add a “context” parameter, where we keep track of the path to our data.

That would work, but it sounds tricky. Implementing it would require a lot of “top-level” coordination, which (speaking from experience) is brittle and hard to debug: if we screw up the coordination, we have to start at the top and work our way down.

Alternatively, we could punt on all that, and make our implementation a function:

type Builder = (path: string[]) => string

const joinPath: Builder = (path) =>
  path.reduce((acc, cur) => `${acc}.${cur}`)

const cloneString: Builder = (path) => joinPath(path)

Note: This compilation approach is called “compiling with continuations” or continuation-passing style.

That's actually all we need to do for the leaf nodes.
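To make the continuation concrete, here's the leaf builder exercised in isolation (a standalone sketch of the two definitions above, with made-up paths):

```typescript
type Builder = (path: string[]) => string

const joinPath: Builder = (path) =>
  path.reduce((acc, cur) => `${acc}.${cur}`)

const cloneString: Builder = (path) => joinPath(path)

// The leaf doesn't know where it lives in the tree; the parent
// hands it the full path when the time comes:
console.log(cloneString(['data', 'firstName'])) // => 'data.firstName'
console.log(cloneString(['value', 'street1'])) // => 'value.street1'
```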

Array nodes

Okay, let's do arrays.

For arrays, our implementation will need to return a function that accepts a path, and returns a string.

In JSON Schema, arrays are represented like this:

{
  type: 'array',
  items: { type: 'string' }
}

But in our case, since we're working from the bottom up, { type: 'string' } will be a function, instead:

{
  type: 'array',
  items: Builder
  //     𐙘_____𐙘
  //     we already replaced the string node with `cloneString`
}

So we'll need to call it:

const cloneArray
  : (builder: Builder) => Builder
  = (builder) => (path) =>
      joinPath(path) + `.map((value) => (${builder(['value'])}))`

In other words, our array cloner will be called with its child node (builder) as an argument, then, since it's a Builder, it receives the path from its parent.

It then passes a new path (the string "value") down to its child, since its children don't need to care about the full path from the root anymore.

If you're still confused, it might help to look at our original implementation again:

{
  // ...
  addresses: data.addresses.map((value) => ({
    //                           𐙘___𐙘
  }))
}

That's actually all we need to do for arrays.
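Here's the array builder in isolation (the `data.tags` path is just for illustration):

```typescript
type Builder = (path: string[]) => string

const joinPath: Builder = (path) =>
  path.reduce((acc, cur) => `${acc}.${cur}`)

const cloneString: Builder = (path) => joinPath(path)

const cloneArray = (builder: Builder): Builder => (path) =>
  joinPath(path) + `.map((value) => (${builder(['value'])}))`

// The parent's path ends at the array; children restart from `value`:
console.log(cloneArray(cloneString)(['data', 'tags']))
// => 'data.tags.map((value) => (value))'
```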

Object nodes

Objects will be a little trickier, but not by much.

Here's the JSON Schema for object schemas:

{
  type: 'object',
  properties: {
    street1: { type: 'string' },
    street2: { type: 'string' },
    city: { type: 'string' }
  }
}

Since we're building from the bottom-up, here's what our object schemas will look like, mid-traversal:

{
  type: 'object',
  properties: {
    street1: Builder,
    street2: Builder,
    city: Builder
  }
}

In other words, instead of receiving a single builder as a child, we'll be receiving a record whose values are each builders.

We'll also need to call each of the builders with its key appended to the end of the path. Like this:

const cloneObject
  : (builders: Record<string, Builder>) => Builder
  = (builders) => (path) => {
      const children = Object.entries(builders)
        .map(
          ([k, builder]) => '' + k + ': ' + builder([...path, k])
        )
      return '{' + children.join(', ') + ' }'
    }

That's it for objects.
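Before we automate the recursion, it's worth composing the three builders by hand once, to watch the strings stack up (a sketch using a trimmed-down address object):

```typescript
type Builder = (path: string[]) => string

const joinPath: Builder = (path) =>
  path.reduce((acc, cur) => `${acc}.${cur}`)

const cloneString: Builder = (path) => joinPath(path)

const cloneArray = (builder: Builder): Builder => (path) =>
  joinPath(path) + `.map((value) => (${builder(['value'])}))`

const cloneObject = (builders: Record<string, Builder>): Builder =>
  (path) => {
    const children = Object.entries(builders).map(
      ([k, builder]) => k + ': ' + builder([...path, k])
    )
    return '{' + children.join(', ') + ' }'
  }

// Hand-composed: an array of objects, each with two string fields.
const addressCloner = cloneObject({
  street1: cloneString,
  city: cloneString,
})

console.log(cloneArray(addressCloner)(['data', 'addresses']))
// => 'data.addresses.map((value) => ({street1: value.street1, city: value.city }))'
```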

Wiring things up

Let's wire up our individual schema handlers:

type JsonSchema<T> =
  | { type: 'string' }
  | { type: 'array', items: T }
  | { type: 'object', properties: Record<string, T> }

const fold
  : (schema: JsonSchema<Builder>) => Builder
  = (schema) => {
      switch (true) {
        case schema.type === 'string': return cloneString
        case schema.type === 'array': return cloneArray(schema.items)
        case schema.type === 'object': return cloneObject(schema.properties)
      }
    }

Note that our JsonSchema type isn't recursive. Not only does this help with TypeScript IDE performance, but it also lets us accurately represent the actual type of our schema during traversal.

Note that our fold function is also not recursive. We're not recursing yet; we're still wiring things up.

If you haven't seen this pattern before, I wrote about it previously in Recursion in TypeScript (without the tears).


Making it recursive

It's worth saying up front: this section assumes you're familiar with the idea of a Functor.

Prerequisites

If you need to brush up, here are a few resources:

  1. Functors, applicatives and Monads in pictures

  2. Recursion in TypeScript, which covers the same pattern in more depth

Okay, we're finally ready to wire up our fold function.

To do that, we need 2 more components:

A JSON Schema Functor

A Functor is an “interface” from functional programming.

Data that satisfies the Functor interface has a single method, map, which takes a function, and applies the function to the element(s) inside the Functor.

If you've ever used Array.prototype.map before, then you're already familiar with the Array Functor.

Let's define a Functor for JSON Schema specs:

const Functor = {
  map<S, T>(fn: (src: S) => T) {
    return (x: JsonSchema<S>): JsonSchema<T> => {
      switch (true) {
        case x.type === 'string':
          return x
        case x.type === 'array':
          return { type: 'array', items: fn(x.items) }
        case x.type === 'object':
          return {
            type: 'object',
            properties: Object.fromEntries(
              Object.entries(x.properties).map(
                ([k, v]) => [k, fn(v)]
              )
            )
          }
      }
    }
  }
}
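To build intuition for what map does (and, importantly, doesn't do), here's a small standalone sketch; `markChildren` is just an illustrative name:

```typescript
type JsonSchema<T> =
  | { type: 'string' }
  | { type: 'array', items: T }
  | { type: 'object', properties: Record<string, T> }

const Functor = {
  map<S, T>(fn: (src: S) => T) {
    return (x: JsonSchema<S>): JsonSchema<T> => {
      switch (x.type) {
        case 'string':
          return x
        case 'array':
          return { type: 'array', items: fn(x.items) }
        case 'object':
          return {
            type: 'object',
            properties: Object.fromEntries(
              Object.entries(x.properties).map(
                ([k, v]) => [k, fn(v)] as [string, T]
              )
            )
          }
      }
    }
  }
}

// `map` only transforms the *immediate* children: it is not recursive.
const markChildren = Functor.map((_child: unknown) => 'visited')

console.log(markChildren({ type: 'array', items: { type: 'string' } }))
// => { type: 'array', items: 'visited' }

console.log(markChildren({ type: 'string' }))
// => { type: 'string' } (string nodes have no children, so nothing changes)
```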

A combine function

To make our fold recursive, we're going to write a function called combine. I call it that because it's closely related to the Y-combinator, and because the traditional name for it (catamorphism) isn't particularly accessible:

type Fold<T> = (x: JsonSchema<T>) => T

const combine
  : <T>(fold: Fold<T>) => Fold<T>
  = (fold) => function loop(value) {
      return fold(Functor.map(loop)(value))
    }
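One nice property of this split: combine isn't specific to cloning, so any Fold<T> gets recursion for free. For instance, a node counter (a throwaway sketch, not part of the cloner):

```typescript
type JsonSchema<T> =
  | { type: 'string' }
  | { type: 'array', items: T }
  | { type: 'object', properties: Record<string, T> }

const Functor = {
  map<S, T>(fn: (src: S) => T) {
    return (x: JsonSchema<S>): JsonSchema<T> => {
      switch (x.type) {
        case 'string': return x
        case 'array': return { type: 'array', items: fn(x.items) }
        case 'object': return {
          type: 'object',
          properties: Object.fromEntries(
            Object.entries(x.properties).map(([k, v]) => [k, fn(v)] as [string, T])
          )
        }
      }
    }
  }
}

type Fold<T> = (x: JsonSchema<T>) => T

const combine = <T>(fold: Fold<T>) =>
  function loop(value: any): T {
    return fold(Functor.map(loop)(value))
  }

// A different fold, same recursion scheme: count every node in the schema.
const countNodes = combine<number>((schema) => {
  switch (schema.type) {
    case 'string': return 1
    case 'array': return 1 + schema.items
    case 'object':
      return 1 + Object.values(schema.properties).reduce((a, b) => a + b, 0)
  }
})

console.log(countNodes({
  type: 'object',
  properties: {
    name: { type: 'string' },
    tags: { type: 'array', items: { type: 'string' } }
  }
}))
// => 4 (the object, `name`, the array, and its items)
```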

Putting it all together

Let's put it all together now:

const buildDeepClone = (schema: JsonSchema<any>) =>
  Function(
    'data',
    'return ' + combine(fold)(schema)(['data'])
  )

For ~60 lines of code (about 20 of them types), that's not bad, especially for a solution that's an order of magnitude faster than the one that ships with the platform.

Here's a sandbox if you'd like to see it working.
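If you'd rather run it locally than in a sandbox, here's the whole pipeline condensed into one self-contained snippet (the schema is trimmed to two fields to keep it short):

```typescript
type JsonSchema<T> =
  | { type: 'string' }
  | { type: 'array', items: T }
  | { type: 'object', properties: Record<string, T> }

type Builder = (path: string[]) => string

const joinPath: Builder = (path) =>
  path.reduce((acc, cur) => `${acc}.${cur}`)

const cloneString: Builder = (path) => joinPath(path)

const cloneArray = (builder: Builder): Builder => (path) =>
  joinPath(path) + `.map((value) => (${builder(['value'])}))`

const cloneObject = (builders: Record<string, Builder>): Builder =>
  (path) => {
    const children = Object.entries(builders).map(
      ([k, builder]) => k + ': ' + builder([...path, k])
    )
    return '{' + children.join(', ') + ' }'
  }

const fold = (schema: JsonSchema<Builder>): Builder => {
  switch (schema.type) {
    case 'string': return cloneString
    case 'array': return cloneArray(schema.items)
    case 'object': return cloneObject(schema.properties)
  }
}

const Functor = {
  map<S, T>(fn: (src: S) => T) {
    return (x: JsonSchema<S>): JsonSchema<T> => {
      switch (x.type) {
        case 'string': return x
        case 'array': return { type: 'array', items: fn(x.items) }
        case 'object': return {
          type: 'object',
          properties: Object.fromEntries(
            Object.entries(x.properties).map(([k, v]) => [k, fn(v)] as [string, T])
          )
        }
      }
    }
  }
}

const combine = <T>(f: (x: JsonSchema<T>) => T) =>
  function loop(value: any): T {
    return f(Functor.map(loop)(value))
  }

const buildDeepClone = (schema: JsonSchema<any>) =>
  Function('data', 'return ' + combine(fold)(schema)(['data']))

// Build a cloner and try it out:
const deepClone = buildDeepClone({
  type: 'object',
  properties: {
    firstName: { type: 'string' },
    addresses: {
      type: 'array',
      items: {
        type: 'object',
        properties: { street1: { type: 'string' }, city: { type: 'string' } }
      }
    }
  }
})

const sherlock = {
  firstName: 'Sherlock',
  addresses: [{ street1: '221 Baker St', city: 'London' }]
}

const clone = deepClone(sherlock)
console.log(JSON.stringify(clone) === JSON.stringify(sherlock)) // => true
console.log(clone.addresses === sherlock.addresses)             // => false (a real copy)
```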


Fuzz testing

We're not done yet. We've tested our buildDeepClone function with a few hardcoded inputs, but that doesn't give us enough confidence to ship this as a library.

I don't know about you, but I'm lazy. I'd rather not rack my brain to think of edge cases, and my fingers hurt just thinking about typing them all out.

Not to mention, those kinds of tests are super brittle: if we change something about our implementation, we have to go and update a bunch of tests by hand, which often involves a bunch of guess-work to figure out what the tests mean.

Let's fuzz test our buildDeepClone function instead.

For that, we'll use a library called fast-check. If you're not familiar with it, now you know.

fast-check

fast-check is a property testing library that exposes a combinator API that will feel very familiar if you've used zod before.

You use these combinators to build up arbitraries. Arbitraries are pseudo-random data generators.

Our goal:

  1. generate a random JSON Schema
  2. generate a random input that satisfies the JSON Schema
  3. pass the schema to our buildDeepClone function to derive a deepClone function
  4. pass the random input to our deepClone function
  5. use vitest to make sure that the generated input and the cloned input are deeply equal to each other

Generate a random JSON Schema

First, let's write an arbitrary that generates a “random” (pseudo-random) string schema.

Generating a string schema

import * as fc from 'fast-check'

const stringSchemaArbitrary = fc.constant({ type: 'string' })

In this example we're using the fc.constant combinator to generate a constant value.

The reason we need to wrap this constant value in an arbitrary is so that we can compose it with the other arbitraries we're about to write.

Generating an array schema

Okay, let's build the array schema arbitrary:

import * as fc from 'fast-check'

const arraySchemaArbitrary = <T>(model: fc.Arbitrary<T>) =>
  fc.record({
    type: fc.constant('array'),
    items: model
  })

Generating an object schema

Now let's build the object schema arbitrary:

import * as fc from 'fast-check'

const objectSchemaArbitrary = <T>(model: fc.Arbitrary<T>) =>
  fc.record({
    type: fc.constant('object'),
    properties: fc.dictionary(fc.string(), model)
  })

All together now

Okay, so we have our component parts. How do we compose them to generate a random JSON Schema?

For that, we'll use fc.letrec.

Using the fc.letrec API might feel familiar to you. That's because its design is very similar to the “continuation passing” approach we used earlier to generate our stringified function bodies.

Here's what using it looks like:

import * as fc from 'fast-check'

const SchemaBuilder = fc.letrec((tie) => ({
  string: stringSchemaArbitrary,
  array: arraySchemaArbitrary(tie('*')),
  object: objectSchemaArbitrary(tie('*')),
  ['*']: fc.oneof(
    tie('string'),
    tie('array'),
    tie('object'),
  )
}))

Using it looks like this:

import * as fc from 'fast-check'

const twoSchemas = fc.sample(SchemaBuilder['*'], 2)

console.log(JSON.stringify(twoSchemas[0], null, 2))
// => {
//   "type": "object",
//   "properties": {
//     "$C_5g$G$": { "type": "string" },
//     "G_If_1$_": { "type": "string" },
//     "ku": { "type": "string" },
//     "$_$_9_c$I": {
//       "type": "array",
//       "items": { "type": "string" }
//     },
//     "VF$_$": { "type": "string" },
//     "G": { "type": "string" }
//   }
// }

console.log(JSON.stringify(twoSchemas[1], null, 2))
// => {
//   "type": "array",
//   "items": {
//     "type": "array",
//     "items": { "type": "string" }
//   }
// }

That covers our random JSON Schema generator. What about the inputs?

Reading the schema to generate random inputs

For this, we can re-use the combine function we wrote earlier to avoid having to deal with walking the schema tree.

import * as fc from 'fast-check'

export const generateData = combine((schema) => {
  switch (true) {
    case schema.type === 'string': return fc.string()
    case schema.type === 'array': return fc.array(schema.items)
    case schema.type === 'object': return fc.record(schema.properties)
  }
})

Let's unpack that:

Our generateData function accepts a JSON Schema as input, then pattern matches on the schema's "type" property:

  • if schema.type === "string", we return a string arbitrary
  • if schema.type === "array", we return an array arbitrary, passing the arbitrary at schema.items into fc.array
  • if schema.type === "object", we return a record arbitrary, passing the record of arbitraries at schema.properties into fc.record

And that's it. Because combine handles the recursion for us, we're done.

Writing our test

fast-check is designed to integrate seamlessly with vitest (and if this issue gets traction, it might someday be integrated as a first-class part of the API 🤞).

Here's all we need to do to fuzz test that our buildDeepClone function works with any arbitrary JSON Schema:

import { assert, test } from 'vitest'
import * as fc from 'fast-check'

test('fuzz test', () => {
  fc.assert(
    fc.property(SchemaBuilder['*'], (schema) => {
      const [data] = fc.sample(generateData(schema))
      const deepClone = buildDeepClone(schema)
      const clone = deepClone(data)
      assert.deepEqual(clone, data)
    }),
    { numRuns: 1000 }
  )
})

That's it! Notice that we're running the test 1,000 times. We can of course configure that – by default it runs the test 100 times.

Here's a sandbox if you'd like to run the test suite yourself.

Tip: If you run the test suite, you can scroll up in the terminal to see the generated schema, generated input, and cloned data for each run.


Closing thoughts

Hopefully this helped you see, if not the whole owl, enough that you could draw the rest if you wanted.

If you'd prefer not to write this kind of thing yourself, take a look at @traversable/schema.

We support generating deepClone, deepEqual and more for a wide range of schema types.

And thanks for reading! This one was pretty long. If you made it all the way through and have feedback, or just want to chat about anything we talked about, feel free to reach out via email at ahrjarrett@gmail.com.

Aside

This approach to building leaner, more localized alternatives to bulky (and often slow) APIs was inspired by, of all things, Dune.

More specifically, it was a video about the thematic role of architecture in the Dune series.

This quote, which compares the Harkonnens' design to the Fremen's, is what got me thinking about it:

The [Harkonnens], even though they're technically the most powerful and resourceful family, struggle to adapt to Arrakis' harsh conditions, and they rely mostly on imported resources.

The Fremen on the other hand have been able to develop solutions that are locally adapted approaches that are specifically designed for [their] climate.

Instead of seeking universally applicable high tech solutions, maybe we need to be looking more for nuanced locally adapted approaches.

– Dami Lee, Dune Cities: Fiction or Reality?

I think this kind of approach could be explored in our line of work more often. It seems like this is a relatively new frontier (for the JavaScript ecosystem, at least), and I'm excited to see what others come up with as we explore the space of possibilities.

That's all for now. Cheers.

