While reading the FastAPI documentation, I came across various Python libraries for working with JSON. This got me thinking: which one is the fastest (including Python's built-in json module)? I surfed the net, found some resources and comparisons, and decided to run the benchmarks myself. This article covers the performance numbers I got when testing these libraries, along with the methodology.
Test Setup
- Ubuntu 24.04 LTS
- RAM - 24 GB
- Python version - 3.12
Libraries tested
- `json` - Built-in Python JSON library; widely used but relatively slow.
- `ujson` - Fast C-based JSON parser; a drop-in replacement for `json`.
- `orjson` - Extremely fast Rust-based library with rich type support.
- `rapidjson` - Python wrapper for RapidJSON (C++); good performance and flexibility.
- `msgspec` - Ultra-fast library with optional typed structs for maximum speed.
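One practical difference worth knowing before swapping any of these in: the stdlib's `json.dumps` returns `str`, while `orjson.dumps` (and `msgspec.json.encode`) return `bytes`. A minimal check, with the `orjson` import guarded so the snippet runs even if it isn't installed:

```python
import json

doc = {"id": 1, "name": "User1"}

# stdlib json serializes to a str
assert isinstance(json.dumps(doc), str)

try:
    import orjson
    # orjson serializes to bytes, not str
    assert isinstance(orjson.dumps(doc), bytes)
except ImportError:
    pass
```

This is why "drop-in replacement" claims deserve a quick test: code that concatenates the output with other strings will break on a bytes result.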
Results
| Library | Serialization Time (s) | Deserialization Time (s) |
|---|---|---|
| json | 1.616786 | 1.616203 |
| ujson | 1.413367 | 1.853332 |
| orjson | 0.417962 | 1.272813 |
| rapidjson | 2.044958 | 1.717067 |
| msgspec | 0.489964 | 0.930834 |
Takeaways
- The top contenders are `orjson` and `msgspec` (no surprise there).
- I personally like to use `orjson` when working with FastAPI, as FastAPI has a built-in orjson response class, making it the more developer-friendly option.
- `msgspec`, on the other hand, is like a Swiss Army knife: it also supports other formats such as `yaml` and `toml`, and it has a ton of extra validation features, like validating against a Python class (sort of like Pydantic).
Methodology
Generating sample data
A simple script to generate the testing data (can be made complex for better benchmarking).
```python
import random

def generate_sample_data(n):
    return [
        {
            "id": i,
            "name": f"User{i}",
            "active": bool(i % 2),
            "scores": [random.random() for _ in range(10)],
            "info": {"age": random.randint(20, 40), "city": "City" + str(i)},
        }
        for i in range(n)
    ]

data = generate_sample_data(10_000)
```
Benchmarking function
Store the encoding and decoding methods of every library in a dictionary with the lib names as the key and run this function.
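The `libs` dictionary itself isn't shown in the post; as a sketch (the exact mapping is my assumption), it could be built like this, skipping any optional library that isn't installed:

```python
import json

# Map each library name to its dumps/loads callables.
# The stdlib json module is always available; the third-party
# libraries are imported lazily so the benchmark still runs
# if one of them is missing.
libs = {"json": {"dumps": json.dumps, "loads": json.loads}}

for lib_name in ("ujson", "orjson", "rapidjson"):
    try:
        mod = __import__(lib_name)
        libs[lib_name] = {"dumps": mod.dumps, "loads": mod.loads}
    except ImportError:
        print(f"{lib_name} not installed, skipping")

try:
    import msgspec
    # msgspec uses encode/decode rather than dumps/loads
    libs["msgspec"] = {
        "dumps": msgspec.json.encode,
        "loads": msgspec.json.decode,
    }
except ImportError:
    print("msgspec not installed, skipping")
```

Normalizing every library to the same `dumps`/`loads` keys is what lets the benchmark loop below treat them uniformly.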
```python
import timeit

import pandas as pd

def benchmark_json_libs(data, runs=10):
    results = []
    # Pre-serialize data for each lib so deserialization is timed
    # against each library's own output
    pre_serialized = {}
    for name, funcs in libs.items():
        try:
            pre_serialized[name] = funcs["dumps"](data)
        except Exception as e:
            print(f"Serialization failed for {name}: {e}")
            continue
    for name, funcs in libs.items():
        if name not in pre_serialized:
            continue
        try:
            ser_time = timeit.timeit(lambda: funcs["dumps"](data), number=runs)
            deser_time = timeit.timeit(lambda: funcs["loads"](pre_serialized[name]), number=runs)
            results.append({
                "Library": name,
                "Serialization Time (s)": ser_time,
                "Deserialization Time (s)": deser_time,
            })
        except Exception as e:
            print(f"Benchmarking failed for {name}: {e}")
    return pd.DataFrame(results)
```
Visualization
Visualize the results using seaborn and matplotlib.
```python
import matplotlib.pyplot as plt
import seaborn as sns

fig, axes = plt.subplots(1, 2, figsize=(14, 5))
sns.barplot(x="Library", y="Serialization Time (s)", data=results_df, ax=axes[0])
axes[0].set_title("JSON Serialization Time")
sns.barplot(x="Library", y="Deserialization Time (s)", data=results_df, ax=axes[1])
axes[1].set_title("JSON Deserialization Time")
plt.tight_layout()
plt.show()
```
And that's it. Thanks for reading. Bye.