
Rust Rewrite Enables Cloudflare to Boost CDN Performance and Enhance Security


By adopting Rust for one of its core subsystems, Cloudflare succeeded in reducing response time by 10 ms and boosting performance by 25%. Additionally, the company emphasized that Rust made their system more secure and reduced development time.

On the heels of its successful migration to Rust for the Pingora subsystem, Cloudflare's engineers have rewritten from scratch one of the company's oldest and most critical components, FL, the "brain of Cloudflare":

FL is the brain of Cloudflare. Once a request reaches FL, we then run the various security and performance features in our network. It applies each customer’s unique configuration and settings, from enforcing WAF rules and DDoS protection to routing traffic to the Developer Platform and R2 [Cloudflare's object store, EN].

Cloudflare's architects decided to base FL2 on Oxy, their own internal framework for building proxies, which also includes support for monitoring, soft reloads, and dynamic configuration loading and swapping.

In particular, Oxy's built-in mechanism for graceful restarts is a critical feature for a proxy, since terminating a process would otherwise break all active connections. To prevent this, when an Oxy instance needs to be terminated, it stops accepting new connections but continues serving existing ones until they end naturally.
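The drain behavior described above can be sketched with the Rust standard library alone. This is a minimal illustration, not Oxy's actual API: the `Proxy` type, its method names, and the use of a sleeping thread to stand in for an open connection are all assumptions made for the example.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Hypothetical proxy instance; the names here are illustrative, not Oxy's API.
pub struct Proxy {
    accepting: bool,
    completed: Arc<AtomicUsize>,
    in_flight: Vec<JoinHandle<()>>,
}

impl Proxy {
    pub fn new() -> Self {
        Proxy {
            accepting: true,
            completed: Arc::new(AtomicUsize::new(0)),
            in_flight: Vec::new(),
        }
    }

    /// Accepts a connection unless the instance is draining.
    /// `work` stands in for however long the connection stays open.
    pub fn accept(&mut self, work: Duration) -> bool {
        if !self.accepting {
            return false; // draining: refuse new connections
        }
        let completed = Arc::clone(&self.completed);
        self.in_flight.push(thread::spawn(move || {
            thread::sleep(work); // simulated request handling
            completed.fetch_add(1, Ordering::SeqCst);
        }));
        true
    }

    /// Step 1 of a graceful stop: refuse new connections from now on.
    pub fn begin_drain(&mut self) {
        self.accepting = false;
    }

    /// Step 2: wait for every in-flight connection to finish naturally,
    /// then report how many completed.
    pub fn wait_for_in_flight(self) -> usize {
        for handle in self.in_flight {
            handle.join().expect("connection thread panicked");
        }
        self.completed.load(Ordering::SeqCst)
    }
}
```

A caller would invoke `begin_drain()` when a restart is requested and exit the process only after `wait_for_in_flight()` returns. In a real proxy, a replacement instance would already be accepting new connections by that point.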

One critical hurdle for Cloudflare's architects was how to replace a running system that had underpinned 15 years of Cloudflare products and was still evolving. To prevent their teams from having to implement each new feature twice, once for the LuaJIT-based FL (FL1) and once for the new Rust-based FL2, they created a layer inside FL1 that allowed new modules implemented in Rust for FL2 to run seamlessly.

Instead of maintaining a parallel implementation, teams could implement their logic in Rust and use it to replace the old Lua logic, without waiting for the old system to be fully replaced.

To properly handle such a migration, Cloudflare's architects also defined a clear testing and rollout strategy. For testing, they used Flamingo, a system capable of running thousands of full end-to-end test requests concurrently against both FL1 and FL2. Each change was rolled out gradually, and at each stage it was fully tested against increasing traffic and benchmarked to make sure performance and resource usage remained acceptable.
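The idea of sending the same requests to both implementations and comparing the results can be sketched as a small differential test. This is not Flamingo's design, which the article does not detail: the `find_mismatches` function, the modeling of each proxy as a closure from a request path to a response body, and the choice of one thread per request are all assumptions made to keep the example short.

```rust
use std::thread;

/// Sends every request to both implementations concurrently and returns
/// the requests whose responses differ, in the original request order.
/// `old` and `new` are hypothetical stand-ins for FL1 and FL2.
pub fn find_mismatches<F1, F2>(requests: &[String], old: F1, new: F2) -> Vec<String>
where
    F1: Fn(&str) -> String + Sync,
    F2: Fn(&str) -> String + Sync,
{
    thread::scope(|s| {
        // One scoped thread per request; a real harness would use a pool.
        let handles: Vec<_> = requests
            .iter()
            .map(|req| {
                let (old, new) = (&old, &new);
                s.spawn(move || {
                    let differs = old(req.as_str()) != new(req.as_str());
                    differs.then(|| req.clone())
                })
            })
            .collect();

        // Joining in spawn order keeps the report deterministic.
        handles
            .into_iter()
            .filter_map(|h| h.join().expect("test thread panicked"))
            .collect()
    })
}
```

An empty result means the two implementations agreed on every request in the batch. Any returned request is a candidate regression to investigate before widening the rollout.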

Another key mechanism was FL2's ability to pass any request it could not handle to FL1. This fallback mechanism was essential for gradually increasing FL2 usage without compromising the overall stability of Cloudflare services. During the rollout, as FL2 matured, it handled a growing proportion of traffic, while the amount of traffic falling back to FL1 decreased correspondingly.
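A minimal sketch of such a fallback decision follows. It assumes, purely for illustration, that each request declares the features it needs and that the new proxy knows which features it already supports. The `NewProxy`, `Request`, and `Handler` types and the feature names are hypothetical and do not reflect FL2's internals.

```rust
use std::collections::HashSet;

/// Hypothetical request: only the features it needs matter for routing.
pub struct Request {
    pub path: &'static str,
    pub required_features: Vec<&'static str>,
}

/// Which implementation ends up serving a request.
#[derive(Debug, PartialEq, Eq)]
pub enum Handler {
    Fl2,
    Fl1Fallback,
}

/// Hypothetical stand-in for the new proxy and the features ported so far.
pub struct NewProxy {
    supported: HashSet<String>,
}

impl NewProxy {
    pub fn new(supported: &[&str]) -> Self {
        NewProxy {
            supported: supported.iter().map(|f| f.to_string()).collect(),
        }
    }

    /// A request is handled only if every feature it needs has been ported.
    pub fn can_handle(&self, req: &Request) -> bool {
        req.required_features
            .iter()
            .all(|f| self.supported.contains(*f))
    }

    /// Anything the new proxy cannot handle is passed to the old one.
    pub fn route(&self, req: &Request) -> Handler {
        if self.can_handle(req) {
            Handler::Fl2
        } else {
            Handler::Fl1Fallback
        }
    }

    /// Fraction of the given requests that would fall back to the old proxy.
    pub fn fallback_share(&self, reqs: &[Request]) -> f64 {
        if reqs.is_empty() {
            return 0.0;
        }
        let fallbacks = reqs.iter().filter(|r| !self.can_handle(r)).count();
        fallbacks as f64 / reqs.len() as f64
    }
}
```

As more features are added to the supported set, `fallback_share` shrinks for the same traffic, which mirrors the rollout pattern described above.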

The main advantage of building FL2, say Cloudflare architects, is the performance gain it delivers. This can be traced back to two main factors. First, FL2 is written entirely in Rust, a high-performance language, rather than in a mix of C, Rust, and Lua. Second, FL1 spent a considerable amount of time converting data representations from one language to another, an overhead FL2 avoids. As a result, FL2 requires only half the CPU of FL1 and less than half the memory.

As a final note, FL2 also benefits from Rust's compile-time safety guarantees, reinforced by strong linting and checking rules, as well as strict coding standards, testing, and review processes. This has led to a significant reduction in crashes, with most of the remaining ones attributable to hardware failures.
