This is a longer-form article about how I built JavaScript's fastest “deep clone” function.
Last time, when I talked about implementing “deep equal”, I ended by basically telling readers to draw the rest of the owl.
This time let's get in the weeds.
If you haven't already, read through the post on implementing JavaScript's fastest “deep equal”. It's a short read (~3 minutes), and we'll be picking up where we left off in this post.
tl;dr
The implementation we'll be talking about today is:
- 10-50x faster than `Lodash.cloneDeep`
- 20-120x faster than `window.structuredClone`
If you're skeptical, you can run the benchmarks yourself on Bolt by typing `npm run bench` in the terminal.
Before we get into the how, let's first make sure we're on the same page about the problem we're trying to solve.
Disclaimer:
You might not need a deep copy.
Even if cloning were “free” (instantaneous), you still pay for the memory allocation.
Other solutions, such as structural sharing, can help you balance that cost, and could very well be a better fit for your use case.
As always, make sure you're aware of the tradeoffs, and think critically before you go around minting new copies of your data.
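For example, here's a minimal sketch of a structural-sharing update (the `User` shape and `withCity` helper are hypothetical, just for illustration): only the path you change is copied, and every untouched branch keeps its identity.

```ts
type User = {
  firstName: string
  lastName: string
  addresses: { street1: string; street2: string; city: string }[]
}

// Hypothetical structural-sharing update: only the changed path is re-created.
function withCity(user: User, index: number, city: string): User {
  return {
    ...user, // top-level fields are shared, not cloned
    addresses: user.addresses.map(
      // only the address at `index` is re-created; its siblings keep their identity
      (addr, i) => (i === index ? { ...addr, city } : addr)
    )
  }
}
```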
Let's get into it.
The problem
First, why copy data at all?
ChatGPT said,
Deep copies are useful when you want to safely manipulate nested data structures without affecting the original object — particularly in environments where immutability is critical for correctness and performance.
Good enough for me.
A “naive” solution
How can we make deep cloning fast?
As usual, we start with a naive solution.
Let's say we're working with the following data model:
```ts
const UserSchema = {
  type: 'object',
  properties: {
    firstName: { type: 'string' },
    lastName: { type: 'string' },
    addresses: {
      type: 'array',
      items: {
        type: 'object',
        properties: {
          street1: { type: 'string' },
          street2: { type: 'string' },
          city: { type: 'string' }
        }
      }
    }
  }
}
```
Here's what a user looks like:
```ts
const sherlock = {
  firstName: 'Sherlock',
  lastName: 'Holmes',
  addresses: [{
    street1: '221 Baker St',
    street2: 'Apartment B',
    city: 'London'
  }]
}
```
Let's practice our “unlearning” skills and hardcode the solution, like we did last time.
```ts
type User = {
  firstName: string
  lastName: string
  addresses: {
    street1: string
    street2: string
    city: string
  }[]
}

function deepClone(user: User): User {
  return {
    firstName: user.firstName,
    lastName: user.lastName,
    addresses: user.addresses.map((value) => ({
      street1: value.street1,
      street2: value.street2,
      city: value.city
    }))
  }
}
```
Let's benchmark it.
Benchmarking our naive solution
If you'd like to run the benchmarks yourself, you can do so in the Bolt sandbox.
Here's the breakdown:
```
Lodash        205.09 µs/iter 203.25 µs                     █
     (173.25 µs … 883.04 µs) 829.71 µs                    ▄█
     ( 14.52 kb … 725.08 kb)  65.87 kb ██▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
                  4.49 ipc ( 0.66% stalls)  99.07% L1 data cache
         621.43k cycles   2.79M instructions  33.01% retired LD/ST (920.05k)

deepClone       5.40 µs/iter   5.45 µs                    ▂█
         (5.19 µs … 5.65 µs)   5.61 µs                  ▃███
         ( 0.83 b …  1.97 kb) 169.67 b █▃▇█▃▃▁▃▁████▇▇▅▁▃▅▃▃
                  5.68 ipc ( 1.07% stalls)  99.62% L1 data cache
          17.05k cycles  96.82k instructions  37.40% retired LD/ST ( 36.21k)

                 ┌                                            ┐
          Lodash ┤■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 205.09 µs
       deepClone ┤ 5.40 µs
                 └                                            ┘

                 ┌                                            ┐
          Lodash ├┤──────────────────────────────────────────┤
       deepClone │
                 └                                            ┘
                 5.19 µs           417.45 µs         829.71 µs

summary
  deepClone
    37.98x faster than Lodash
```
Yep, it's fast. It could probably be faster, but this blog post isn't about micro-optimization; it's about drawing the rest of the owl.
Two callouts before we move on:
Consistency
Notice how inconsistent Lodash is. Here's the range:
```
173.25 µs … 883.04 µs

883.04 / 173.25 = 5.1
```
That's a lot of variance. Depending on the run, Lodash sometimes took 400% more time to do the job.
Compare that to our implementation:
```
5.19 µs … 5.65 µs

5.65 / 5.19 = 1.1
```
By comparison, our implementation deviated by only 10%.
So in addition to being faster, it's also more predictable.
That makes sense: ours knows the shape of the data it will receive, so its performance is similarly knowable.
Instruction count
Notice how much work Lodash is doing:
```
621.43k cycles
2.79M instructions
```
Ours:
```
17.05k cycles
96.82k instructions
```
That also makes sense. Our implementation doesn't need to introspect the data, or make any decisions. All it does is clone data.
Interestingly, the instruction counts differ by nearly the same factor as the time spent:

```
Lodash: 2.79M instructions
Ours:   96_820 * 37 = 3.58M instructions
```
Implementing JavaScript's fastest “deep clone”
Like last time, we'll use a “spec” (JSON Schema) to front-load the cost, so we only have to pay it once.
And, like last time, we'll use the schema to build up our program as a string, which we can then pass to the built-in `Function` constructor.
If you haven't seen this pattern before, my previous post talks about the pattern and why it works.
In other words, this is what we want:
```ts
const deepClone = Function('data', `
  return {
    firstName: data.firstName,
    ...data.lastName !== undefined && ({ lastName: data.lastName }),
    addresses: data.addresses.map((value) => ({
      street1: value.street1,
      ...value.street2 !== undefined && ({ street2: value.street2 }),
      city: value.city
    }))
  }
`)
```
Mapping schema to syntax
Our implementation will take a JSON Schema as input, and return a big string.
We'll have to use recursion at some point, but postpone that part for as long as we can.
First, we need to decompose our big problem into smaller sub-problems.
When it comes to recursion, it's almost always easier to start at the leaf nodes.
Leaf nodes
How should we handle string nodes?
Let's peek at our implementation again:
```ts
function deepClone(data) {
  return {
    firstName: data.firstName
    //         𐙘____________𐙘
    //              here
  }
}
```
If we're at a string node, we return the value at that address; in this case, the string `data.firstName`.

Which brings us to our first problem: how do we get the path? How do we know to return `data.firstName`, and not `data.lastName`, or `data.x.y.z` for that matter?
Continuation passing style (CPS)
If you're like me, your first thought might have been to create a local “path” variable, or add a “context” parameter, where we keep track of the path to our data.
That would work, but it sounds tricky: implementing it would require a lot of “top-level” coordination, which (speaking from experience) is brittle and hard to debug. If we screw up the coordination, we'll need to start at the top and work our way down.
Alternatively, we could punt on all that, and make our implementation a function:
```ts
type Builder = (path: string[]) => string

// join the path segments: ['data', 'firstName'] -> 'data.firstName'
const joinPath: Builder = (path) =>
  path.reduce((acc, cur) => acc === '' ? cur : `${acc}.${cur}`, '')

const cloneString: Builder = (path) => joinPath(path)
```
Note: This compilation approach is called “compiling with continuations” or continuation-passing style.
That's actually all we need to do for the leaf nodes.
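As a quick sanity check, assuming the definitions above:

```ts
cloneString(['data', 'firstName'])
// => 'data.firstName'
```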
Array nodes
Okay, let's do arrays.
For arrays, our implementation will need to return a function that accepts a path, and returns a string.
In JSON Schema, arrays are represented like this:
```ts
{ type: 'array', items: { type: 'string' } }
```
But in our case, since we're working from the bottom up, `{ type: 'string' }` will be a function instead:

```ts
{
  type: 'array',
  items: Builder
  //     𐙘_____𐙘
  // we already replaced the string node
  // with `cloneString`
}
```
So we'll need to call it:
```ts
const cloneArray
  : (builder: Builder) => Builder
  = (builder) => (path) =>
    joinPath(path) + `.map((value) => (${builder(['value'])}))`
```
In other words, our array cloner will be called with its child node (`builder`) as an argument; then, since it's a `Builder`, it receives the path from its parent.

It then passes a new path (the string `"value"`) down to its child, since its children don't need to care about the full path from the root anymore.
If you're still confused, it might help to look at our original implementation again:
```ts
{
  // ...
  addresses: data.addresses.map((value) => ({
    //                           𐙘___𐙘
  }))
}
```
That's actually all we need to do for arrays.
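Another quick sanity check, assuming the definitions above (the `tags` field is hypothetical, just to keep the example small):

```ts
const cloneStringArray = cloneArray(cloneString)

cloneStringArray(['data', 'tags'])
// => 'data.tags.map((value) => (value))'
```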
Object nodes
Objects will be a little trickier, but not by much.
Here's what an object schema looks like in JSON Schema:
```ts
{
  type: 'object',
  properties: {
    street1: { type: 'string' },
    street2: { type: 'string' },
    city: { type: 'string' }
  }
}
```
Since we're building from the bottom up, here's what our object schemas will look like mid-traversal:

```ts
{
  type: 'object',
  properties: {
    street1: Builder,
    street2: Builder,
    city: Builder
  }
}
```
In other words, instead of receiving a single builder as a child, we'll be receiving a record whose values are each builders.
We'll also need to call each of the builders with its key appended to the end of the path. Like this:
```ts
const cloneObject
  : (builders: Record<string, Builder>) => Builder
  = (builders) => (path) => {
    const children = Object
      .entries(builders)
      .map(([k, builder]) => k + ': ' + builder([...path, k]))
    return '{' + children.join(', ') + ' }'
  }
```
That's it for objects.
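Here's the same sanity check for objects, assuming the definitions above:

```ts
const cloneAddress = cloneObject({
  street1: cloneString,
  street2: cloneString,
  city: cloneString
})

cloneAddress(['value'])
// => '{street1: value.street1, street2: value.street2, city: value.city }'
```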
Wiring things up
Let's wire up our individual schema handlers:
```ts
type JsonSchema<T> =
  | { type: 'string' }
  | { type: 'array', items: T }
  | { type: 'object', properties: Record<string, T> }

const fold
  : (schema: JsonSchema<Builder>) => Builder
  = (schema) => {
    switch (true) {
      case schema.type === 'string': return cloneString
      case schema.type === 'array': return cloneArray(schema.items)
      case schema.type === 'object': return cloneObject(schema.properties)
    }
  }
```
Note that our `JsonSchema` type isn't recursive. Not only does this help with TypeScript IDE performance, but it also lets us accurately represent the actual type of our schema during traversal.

Note that our `fold` function is also not recursive. We're not recursing yet; we're still wiring things up.
If you haven't seen this pattern before, I wrote about it previously in Recursion in TypeScript (without the tears).
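To make “not recursing yet” concrete: `fold` only handles a single layer, so its children must already be `Builder`s. A quick sketch, assuming the definitions above:

```ts
// fold is one layer deep: the children here are already Builders, not schemas.
const builder = fold({
  type: 'object',
  properties: { firstName: cloneString, lastName: cloneString }
})

builder(['data'])
// => '{firstName: data.firstName, lastName: data.lastName }'
```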
Making it recursive
I said this already, but it's worth reiterating: this section assumes you're familiar with the idea of a Functor.
Prerequisites
If you need to brush up, here are a few resources:
- Recursion in TypeScript, which covers the same pattern in more depth
Okay, we're finally ready to wire up our `fold` function.

To do that, we need two more components:
A JSON Schema Functor
A Functor is an “interface” from functional programming.

Data that satisfies the Functor interface has a single method, `map`, which takes a function and applies it to the element(s) inside the Functor.

If you've ever used `Array.prototype.map` before, then you're already familiar with the Array Functor.
Let's define a Functor for JSON Schema specs:
```ts
const Functor = {
  map<S, T>(fn: (src: S) => T) {
    return (x: JsonSchema<S>): JsonSchema<T> => {
      switch (true) {
        case x.type === 'string':
          return x
        case x.type === 'array':
          return { type: 'array', items: fn(x.items) }
        case x.type === 'object':
          return {
            type: 'object',
            properties: Object.fromEntries(
              Object.entries(x.properties).map(([k, v]) => [k, fn(v)])
            )
          }
      }
    }
  }
}
```
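Note that `map` only transforms a single layer: the children are replaced, but the surrounding shape is untouched. For example (the `'CHILD'` placeholder is just for illustration):

```ts
const replaceChildren = Functor.map((_child: unknown) => 'CHILD')

replaceChildren({ type: 'array', items: { type: 'string' } })
// => { type: 'array', items: 'CHILD' }

replaceChildren({ type: 'string' })
// => { type: 'string' }   (no children to touch)
```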
A `combine` function

To make `fold` recursive, we're going to write a function called `combine`. I call it that because it's closely related to the Y-combinator, and because the traditional name for it (catamorphism) isn't particularly accessible:
```ts
type Fold<T> = (x: JsonSchema<T>) => T

const combine
  : <T>(fold: Fold<T>) => Fold<T>
  = (fold) => function loop(value) {
    return fold(Functor.map(loop)(value))
  }
```
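It might help to trace what `combine(fold)` does on a small schema. Since `loop` maps itself over the children first, leaves are folded before their parents; a quick sketch, assuming everything above:

```ts
const builder = combine(fold)({
  type: 'array',
  items: { type: 'string' }
})
// 1. Functor.map(loop) folds the leaf:  { type: 'array', items: cloneString }
// 2. fold handles the outer layer:      cloneArray(cloneString)

builder(['data'])
// => 'data.map((value) => (value))'
```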
Putting it all together
Let's put it all together now:
```ts
const buildDeepClone = (schema: JsonSchema<any>) =>
  Function(
    'data',
    'return ' + combine(fold)(schema)(['data'])
  )
```
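And here's roughly what using it looks like, assuming the `UserSchema` and `sherlock` from earlier:

```ts
const deepClone = buildDeepClone(UserSchema)
const clone = deepClone(sherlock)

clone === sherlock                      // => false
clone.addresses === sherlock.addresses  // => false (the array is re-created)
clone.addresses[0].city                 // => 'London'
```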
For ~60 lines of code (about 20 of them types), that's not bad, especially for a solution that's an order of magnitude faster than the one that ships with the platform.
Here's a sandbox if you'd like to see it working.
Fuzz testing
We're not done yet. We've tested our `buildDeepClone` function with a few hardcoded inputs, but that doesn't give us enough confidence to ship this as a library.
I don't know about you, but I'm lazy. I'd rather not rack my brain to think of edge cases, and my fingers hurt just thinking about typing them all out.
Not to mention, those kinds of tests are super brittle: if we change something about our implementation, we have to update a bunch of tests by hand, which often involves guesswork to figure out what the tests mean.
Let's fuzz test our `buildDeepClone` function instead.
For that, we'll use a library called `fast-check`. If you're not familiar with it, now you know.
fast-check
`fast-check` is a property testing library that exposes a combinator API that will feel very familiar if you've used `zod` before.
You use these combinators to build up arbitraries. Arbitraries are pseudo-random data generators.
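If you've never used an arbitrary before, here's the general idea (the sampled values below are illustrative; they vary per run):

```ts
import * as fc from 'fast-check'

// fc.string() is an arbitrary; fc.sample draws values from it.
console.log(fc.sample(fc.string(), 3))
// => e.g. [ '', 'S$b%', 'xT2' ]
```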
Our goal:

- generate a random JSON Schema
- generate a random input that satisfies the JSON Schema
- pass the schema to our `buildDeepClone` function to derive a `deepClone` function
- pass the random input to our `deepClone` function
- use `vitest` to make sure that the generated input and the cloned input are deeply equal to each other
Generate a random JSON Schema
First, let's write an arbitrary that generates a “random” (pseudo-random) string schema.
Generating a string schema
```ts
import * as fc from 'fast-check'

const stringSchemaArbitrary = fc.constant({ type: 'string' })
```
In this example we're using the `fc.constant` combinator to generate a constant value.

The reason we need to wrap this constant value in an arbitrary is so that we can compose it with the other arbitraries we need to write.
Generating an array schema
Okay, let's build the array schema arbitrary:
```ts
import * as fc from 'fast-check'

const arraySchemaArbitrary = <T>(model: fc.Arbitrary<T>) =>
  fc.record({
    type: fc.constant('array'),
    items: model
  })
```
Generating an object schema
Now let's build the object schema arbitrary:
```ts
import * as fc from 'fast-check'

const objectSchemaArbitrary = <T>(model: fc.Arbitrary<T>) =>
  fc.record({
    type: fc.constant('object'),
    properties: fc.dictionary(fc.string(), model)
  })
```
All together now
Okay, so we have our component parts. How do we compose them to generate a random JSON Schema?
For that, we'll use `fc.letrec`.

Using the `fc.letrec` API might feel familiar to you. That's because its design is very similar to the “continuation passing” approach we used earlier to generate our stringified function bodies.
Here's what using it looks like:
```ts
import * as fc from 'fast-check'

const SchemaBuilder = fc.letrec((tie) => ({
  string: stringSchemaArbitrary,
  array: arraySchemaArbitrary(tie('*')),
  object: objectSchemaArbitrary(tie('*')),
  ['*']: fc.oneof(
    tie('string'),
    tie('array'),
    tie('object'),
  )
}))
```
Using it looks like this:
```ts
import * as fc from 'fast-check'

const twoSchemas = fc.sample(SchemaBuilder['*'], 2)

console.log(JSON.stringify(twoSchemas[0], null, 2))
// => {
//   "type": "object",
//   "properties": {
//     "$C_5g$G$": { "type": "string" },
//     "G_If_1$_": { "type": "string" },
//     "ku": { "type": "string" },
//     "$_$_9_c$I": {
//       "type": "array",
//       "items": { "type": "string" }
//     },
//     "VF$_$": { "type": "string" },
//     "G": { "type": "string" }
//   }
// }

console.log(JSON.stringify(twoSchemas[1], null, 2))
// => {
//   "type": "array",
//   "items": {
//     "type": "array",
//     "items": { "type": "string" }
//   }
// }
```
That covers our random JSON Schema generator. What about the inputs?
Reading the schema to generate random inputs
For this, we can re-use the `combine` function we wrote earlier to avoid having to deal with walking the schema tree.
```ts
import * as fc from 'fast-check'

export const generateData = combine((schema) => {
  switch (true) {
    case schema.type === 'string': return fc.string()
    case schema.type === 'array': return fc.array(schema.items)
    case schema.type === 'object': return fc.record(schema.properties)
  }
})
```
Let's unpack that:

Our `generateData` function accepts a JSON Schema as input, then pattern matches on the schema's `"type"` property:
- if `schema.type === "string"`, we return a string arbitrary
- if `schema.type === "array"`, we return an array arbitrary, passing the arbitrary at `schema.items` into `fc.array`
- if `schema.type === "object"`, we return a record arbitrary, passing the record of arbitraries at `schema.properties` into `fc.record`
And that's it. Because `combine` handles the recursion for us, we're done.
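As a quick sanity check, here's what sampling a derived arbitrary might look like (the output is illustrative):

```ts
const arb = generateData({ type: 'array', items: { type: 'string' } })

console.log(fc.sample(arb, 1))
// => e.g. [ [ 'Zty$', '', 'q' ] ]   (a random array of random strings)
```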
Writing our test
`fast-check` is designed to integrate seamlessly with `vitest` (and if this issue gets traction, it might someday be integrated as a first-class part of the API 🤞).
Here's all we need to do to fuzz test that our `buildDeepClone` function works with any arbitrary JSON Schema:
```ts
import { assert, test } from 'vitest'
import * as fc from 'fast-check'

test('fuzz test', () => {
  fc.assert(
    fc.property(SchemaBuilder['*'], (schema) => {
      const [data] = fc.sample(generateData(schema))
      const deepClone = buildDeepClone(schema)
      const clone = deepClone(data)
      assert.deepEqual(clone, data)
    }),
    { numRuns: 1000 }
  )
})
```
That's it! Notice that we're running the test 1,000 times. We can of course configure that; by default, fast-check runs the test 100 times.
Here's a sandbox if you'd like to run the test suite yourself.
Tip: If you run the test suite, you can scroll up in the terminal to see the generated schema, generated input, and cloned data for each run.
Closing thoughts
Hopefully this helped you see, if not the whole owl, enough that you could draw the rest if you wanted.
If you'd prefer not to write this kind of thing yourself, take a look at `@traversable/schema`.

We support generating `deepClone`, `deepEqual`, and more for a variety of schema types.
And thanks for reading! This one was pretty long. If you made it all the way through and have feedback, or just want to chat about anything we talked about, feel free to reach out via email at ahrjarrett@gmail.com.
Aside
This approach to building leaner, more localized alternatives to bulky (and often slow) APIs was inspired by, of all things, Dune.
More specifically, it was a video about the thematic role of architecture in the Dune series.
This quote, which compares the Harkonnens' design to the Fremen's, is what got me thinking about it:
The [Harkonnens], even though they're technically the most powerful and resourceful family, struggle to adapt to Arrakis' harsh conditions, and they rely mostly on imported resources.
The Fremen on the other hand have been able to develop solutions that are locally adapted approaches that are specifically designed for [their] climate.
Instead of seeking universally applicable high tech solutions, maybe we need to be looking more for nuanced locally adapted approaches.
– Dami Lee, Dune Cities: Fiction or Reality?
I think this kind of approach could be explored in our line of work more often. It seems like this is a relatively new frontier (for the JavaScript ecosystem, at least), and I'm excited to see what others come up with as we explore the space of possibilities.
That's all for now. Cheers.
Links
- Bolt sandbox where you can run the benchmarks
- The benchmarks themselves
- Examples of the generated code
- How the benchmarks were conducted
- The implementation