perf: add suite for transport performance #2224

iduartgomez · 2025-12-05T18:51:23Z

Transport Layer Performance Analysis and Benchmark Suite

Summary

This PR introduces a comprehensive performance analysis of the Freenet transport layer, identifying critical bottlenecks and providing a benchmark suite for ongoing performance monitoring.

Key Finding: Syscall overhead (70% of per-packet time) is the primary bottleneck, not encryption or serialization.

Benchmark Results

Per-Packet Time Breakdown (1364 bytes)

| Component | Time | % of Total |
|---|---|---|
| UDP syscall (send) | 12.7µs | 70% |
| Serialization (bincode) | 3.82µs | 21% |
| Encryption (AES-128-GCM) | 1.14µs | 6% |
| Channel/async overhead | ~0.5µs | 3% |
| Total | ~18µs | 100% |

Current Throughput Limits

| Metric | Value |
|---|---|
| Max packets/sec | ~80,000 pps |
| Max bandwidth | ~870 Mbps |
| Theoretical (no syscall bottleneck) | ~2.2 Gbps |
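As a quick consistency check, the bandwidth row follows from the packet rate and the 1364-byte packet size used throughout:

$$
80{,}000\ \tfrac{\text{packets}}{\text{s}} \times 1364\ \tfrac{\text{B}}{\text{packet}} \times 8\ \tfrac{\text{bit}}{\text{B}} = 872{,}960{,}000\ \tfrac{\text{bit}}{\text{s}} \approx 870\ \text{Mbps}
$$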

Critical Issues Identified

1. Channel Buffer Size = 1 (HIGH PRIORITY)

Location: crates/core/src/transport/peer_connection.rs:592

```rust
let (sender, receiver) = mpsc::channel(1); // BOTTLENECK
```

Impact: the benchmark shows a 27x throughput difference between channel(1) and channel(100):

| Buffer Size | Throughput |
|---|---|
| 1 | 80K elem/s |
| 100 | 2.2M elem/s |

Fix: Change to mpsc::channel(100) - trivial, zero-risk change.
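The following std-only sketch shows why the buffer size matters. With a bounded channel, a producer can run at most `capacity` packets ahead of the consumer; after that it has to wait. The example uses `std::sync::mpsc::sync_channel` in place of the `tokio::sync::mpsc::channel` from peer_connection.rs so that it needs no dependencies. That substitution is an assumption, and `burst_before_backpressure` is a hypothetical helper, not part of the codebase.

```rust
use std::sync::mpsc::{sync_channel, TrySendError};

/// Hypothetical helper: how many 1364-byte packets can a producer enqueue
/// before a bounded channel reports `Full` (i.e. before it would have to wait
/// for the consumer)? Nothing drains the receiver during the burst.
fn burst_before_backpressure(capacity: usize, burst: usize) -> usize {
    // std's sync_channel stands in for tokio::sync::mpsc::channel here.
    let (tx, rx) = sync_channel::<Vec<u8>>(capacity);
    let mut accepted = 0;
    for _ in 0..burst {
        match tx.try_send(vec![0u8; 1364]) {
            Ok(()) => accepted += 1,
            // Full: an awaiting `send` would park the producer at this point.
            Err(TrySendError::Full(_)) => break,
            Err(TrySendError::Disconnected(_)) => unreachable!("receiver is still alive"),
        }
    }
    drop(rx); // the receiver lived for the whole burst; release it now
    accepted
}
```

With capacity 1, a burst of 10 packets gets only one packet ahead before the producer would block. With capacity 100, the same burst is absorbed entirely. Frequent producer/consumer handoffs of this kind are consistent with the gap the channel(1) vs channel(100) benchmark reports.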

2. Random Nonce Generation (MEDIUM PRIORITY)

Location: crates/core/src/transport/packet_data.rs:151

```rust
let nonce: [u8; NONCE_SIZE] = RNG.with(|rng| rng.borrow_mut().random());
```

Benchmark:

| Method | Time |
|---|---|
| Random | 34.7ns |
| Counter | 6.3ns |

Impact: 5.5x faster with counter-based nonces. Counter nonces are equally secure for AES-GCM when combined with a unique key per connection.

Fix: Use atomic counter + connection ID for nonce generation.
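Here is a minimal sketch of a counter-based nonce. It assumes the standard 96-bit (12-byte) AES-GCM nonce, split into a 4-byte per-connection prefix and an 8-byte big-endian counter. `CounterNonce` and the 4/8 split are illustrative choices, not the layout used in packet_data.rs. The prefix could be a connection ID, as proposed here, or random bytes.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Standard 96-bit AES-GCM nonce length (assumed to match NONCE_SIZE).
const NONCE_SIZE: usize = 12;

/// Hypothetical nonce source: 4-byte per-connection prefix + 8-byte counter.
struct CounterNonce {
    prefix: [u8; 4],
    counter: AtomicU64,
}

impl CounterNonce {
    fn new(prefix: [u8; 4]) -> Self {
        Self { prefix, counter: AtomicU64::new(0) }
    }

    /// Returns a nonce that is unique for this prefix until the counter wraps.
    fn next_nonce(&self) -> [u8; NONCE_SIZE] {
        // Relaxed is enough: an atomic fetch_add alone guarantees each caller
        // gets a distinct value. It wraps on overflow, so a real design must
        // rekey long before 2^64 packets.
        let n = self.counter.fetch_add(1, Ordering::Relaxed);
        let mut nonce = [0u8; NONCE_SIZE];
        nonce[..4].copy_from_slice(&self.prefix);
        nonce[4..].copy_from_slice(&n.to_be_bytes());
        nonce
    }
}
```

The speedup comes from replacing an RNG call with a single atomic increment. Uniqueness holds as long as no two senders sharing a key ever use the same prefix.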

3. No Syscall Batching (HIGH PRIORITY - LARGER EFFORT)

Current: One send() syscall per packet = 12.7µs overhead each.

Solution: Use sendmmsg()/recvmmsg() to batch multiple packets per syscall.

Expected Impact: 10x reduction in syscall overhead, potentially reaching 800K+ pps.

Implementation: Requires socket2 crate and platform-specific code paths.
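`sendmmsg()` submits an array of messages in a single call, so the transport first needs somewhere to accumulate packets. The sketch below shows only that accumulation logic, using a hypothetical `PacketBatcher` type. It counts how many flushes a packet stream needs, under the model that each flush is one syscall. The actual flush would go through `sendmmsg`, for example via the libc or nix crates. That part is omitted to keep the example std-only.

```rust
/// Hypothetical batching buffer (not part of the codebase): packets accumulate
/// until `max_batch` is reached, then the whole batch goes to one flush call.
/// In a real transport the flush would be a single sendmmsg(2) call.
struct PacketBatcher {
    max_batch: usize,
    pending: Vec<Vec<u8>>,
}

impl PacketBatcher {
    fn new(max_batch: usize) -> Self {
        assert!(max_batch > 0, "batch size must be non-zero");
        Self { max_batch, pending: Vec::with_capacity(max_batch) }
    }

    /// Queue one packet; returns the number of flushes (modeled syscalls) triggered.
    fn push<F: FnMut(&[Vec<u8>])>(&mut self, packet: Vec<u8>, sink: &mut F) -> usize {
        self.pending.push(packet);
        if self.pending.len() == self.max_batch {
            self.flush(sink)
        } else {
            0
        }
    }

    /// Hand any pending packets to `sink` in one call; returns 1 if it flushed.
    fn flush<F: FnMut(&[Vec<u8>])>(&mut self, sink: &mut F) -> usize {
        if self.pending.is_empty() {
            return 0;
        }
        sink(&self.pending[..]);
        self.pending.clear();
        1
    }
}

/// Count the flushes needed to send `n` packets of 1364 bytes with a given batch size.
fn syscalls_for(n: usize, max_batch: usize) -> usize {
    let mut batcher = PacketBatcher::new(max_batch);
    let mut delivered = 0;
    let mut sink = |batch: &[Vec<u8>]| delivered += batch.len();
    let mut calls = 0;
    for _ in 0..n {
        calls += batcher.push(vec![0u8; 1364], &mut sink);
    }
    calls += batcher.flush(&mut sink); // send the final partial batch
    assert_eq!(delivered, n, "every packet must reach the sink exactly once");
    calls
}
```

Sending 100 packets takes 100 flushes with a batch size of 1. With a batch size of 32 it takes only 4: three full batches plus one partial. A real implementation would also flush on a short timer, so that a partial batch never waits indefinitely.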

4. Serialization Overhead (MEDIUM PRIORITY - LARGER EFFORT)

Bincode serialization accounts for 21% of per-packet time (3.82µs for 1364 bytes).

Potential Optimizations:

- Pre-computed message templates for common patterns
- Zero-copy serialization (rkyv, flatbuffers)
- Avoid Vec allocations in hot path
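For the "avoid Vec allocations" item, this minimal sketch shows the buffer-reuse pattern: each packet is encoded into a caller-owned `Vec<u8>` that is cleared rather than reallocated. The `Header` type and its little-endian field layout are hypothetical stand-ins for the real bincode-serialized message. With bincode itself, the analogous approach would be to serialize into an existing writer or slice instead of returning a fresh `Vec`.

```rust
/// Hypothetical fixed-layout header, standing in for the real message type.
struct Header {
    conn_id: u32,
    seq: u64,
    payload_len: u16,
}

/// Encode header + payload into a caller-owned buffer. `clear()` keeps the
/// existing allocation, so once `out` has grown to packet size, later calls
/// do not touch the allocator.
fn encode_into(h: &Header, payload: &[u8], out: &mut Vec<u8>) {
    out.clear();
    out.extend_from_slice(&h.conn_id.to_le_bytes());
    out.extend_from_slice(&h.seq.to_le_bytes());
    out.extend_from_slice(&h.payload_len.to_le_bytes());
    out.extend_from_slice(payload);
}
```

Whether this recovers a meaningful share of the 3.82µs serialization cost would have to be measured with the Level 0 benchmarks.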

Files Added

Documentation

docs/architecture/transport_perf_analysis.md - Full architecture analysis
docs/architecture/transport_benchmark_methodology.md - Benchmarking methodology

Benchmarks

crates/core/benches/transport_perf.rs - Criterion benchmark suite

Scripts

scripts/run_benchmarks.sh - Helper script with environment validation

Benchmark Suite

Levels

| Level | What it Measures | Noise Level | CI Safe |
|---|---|---|---|
| 0 | Pure computation (encrypt, serialize) | <2% | ✅ |
| 1 | Protocol logic (channels, routing) | ~5% | ✅ |
| 2 | Syscall overhead (loopback UDP) | ~10% | ⚠️ |
| 3 | System limits (stress tests) | ~15% | ❌ |
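To make the Level 2 row concrete, here is a std-only sketch of the kind of loop it times. It uses real UDP sockets on 127.0.0.1 with one `send_to` and one `recv_from` per packet, so the cost is dominated by the two syscalls and the kernel's loopback path rather than by a NIC. It is an illustration, not the `transport_perf` benchmark itself, and `loopback_per_packet` is a hypothetical name. The ping-pong structure (receive each packet before sending the next) is chosen so that no datagrams are dropped from the socket buffer.

```rust
use std::io;
use std::net::UdpSocket;
use std::time::{Duration, Instant};

/// Mean time per packet for a send_to + recv_from ping-pong over 127.0.0.1.
fn loopback_per_packet(packets: u32, size: usize) -> io::Result<Duration> {
    assert!(packets > 0, "need at least one packet");
    // Port 0 lets the OS pick free ports for both sockets.
    let rx = UdpSocket::bind("127.0.0.1:0")?;
    let tx = UdpSocket::bind("127.0.0.1:0")?;
    rx.set_read_timeout(Some(Duration::from_secs(1)))?; // fail instead of hanging
    let dst = rx.local_addr()?;

    let payload = vec![0xABu8; size];
    let mut buf = vec![0u8; size];
    let start = Instant::now();
    for _ in 0..packets {
        tx.send_to(&payload, dst)?; // syscall 1
        let (n, _from) = rx.recv_from(&mut buf)?; // syscall 2
        assert_eq!(n, size, "unexpected datagram length");
    }
    Ok(start.elapsed() / packets)
}
```

The result depends on the kernel, CPU frequency scaling, and background load, which is why this level is flagged as only conditionally CI-safe.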

Running

```bash
# All benchmarks (without tracing for accurate results)
cargo bench --bench transport_perf --no-default-features --features "redb,websocket"

# Specific levels
cargo bench --bench transport_perf --no-default-features --features "redb,websocket" -- "level0/"

# Using helper script
./scripts/run_benchmarks.sh level0 level1
```

Immediate Action Items

Trivial Fixes (< 1 hour each)

1. Change channel buffer 1 → 100 in peer_connection.rs:592
   - Impact: 27x channel throughput
   - Risk: None
2. Switch to counter-based nonces in packet_data.rs
   - Impact: 5.5x faster nonce generation
   - Risk: None (counter + connection ID is standard practice)

Medium-Term (1-2 days each)

1. Implement sendmmsg/recvmmsg batching
   - Impact: ~10x syscall efficiency
   - Requires: socket2 crate, platform detection
2. Increase inbound channel buffers in connection_handler.rs
   - Related to documented packet drops (2251 packets in 10s)

Longer-Term

- Zero-copy serialization investigation
- GSO/GRO support for kernel offload
- io_uring for Linux 5.1+

Validation

The benchmark suite allows validating any optimization:

```bash
# Before change
git stash
./scripts/run_benchmarks.sh level0 level1

# After change
git stash pop
./scripts/run_benchmarks.sh level0 level1

# Criterion automatically shows delta
```
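Criterion computes its own change estimate when it compares a run against the stored baseline. The sketch below only illustrates how the per-level noise thresholds from the Levels table apply when judging an A/B delta: a relative change smaller than the level's threshold is treated as noise. `compare` and `Verdict` are hypothetical names; the thresholds are the ones listed for each level.

```rust
/// Per-level noise thresholds, taken from the Levels table above.
fn noise_threshold(level: u8) -> Option<f64> {
    match level {
        0 => Some(0.02),
        1 => Some(0.05),
        2 => Some(0.10),
        3 => Some(0.15),
        _ => None,
    }
}

#[derive(Debug, PartialEq)]
enum Verdict {
    Faster,
    Slower,
    WithinNoise,
}

/// Classify a change in mean time per iteration relative to the level's threshold.
fn compare(level: u8, baseline_ns: f64, candidate_ns: f64) -> Option<Verdict> {
    let threshold = noise_threshold(level)?;
    let change = (candidate_ns - baseline_ns) / baseline_ns; // relative change
    Some(if change < -threshold {
        Verdict::Faster
    } else if change > threshold {
        Verdict::Slower
    } else {
        Verdict::WithinNoise
    })
}
```

For example, the nonce measurement (34.7ns → 6.3ns) is a clear Level 0 improvement. A 5% slowdown counts as a regression at Level 0 but falls within the noise band at Level 2.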

- Analyzed test_small_network_get_failure failure modes
- Identified gateway crash root cause (fixed in a283e23)
- Documented PUT operation timeout issues
- Provided recommendations for re-enabling the test
- Suggested modernization using #[freenet_test] macro

Related: freenet#2023, freenet#2043, freenet#2036, freenet#2011

Re-enabled the previously ignored test with key improvements:
- Removed #[ignore] attribute - recent fixes should resolve issues
- Increased PUT timeout: 30s → 90s (accounts for connection delays)
- Increased overall test timeout: 120s → 180s (3 minutes)
- Added detailed error messages for better debugging
- Added documentation of recent fixes that resolved the issues

Recent fixes that should prevent failures:
- a283e23: Fixed gateway crashes during timeout notifications
- 615f02d: Fixed PUT response routing through forwarding peers
- 5734a33: Fixed local caching before forwarding PUTs

Related: freenet#2023

The freenet-ping contract failed to compile because freenet-ping-types uses freenet_stdlib::time::now() when the 'std' feature is disabled, but the 'contract' feature wasn't propagated to freenet-stdlib.

Changes:
- Added 'contract' feature to freenet-ping-types Cargo.toml
- Enabled 'contract' feature in ping contract's types dependency
- This allows WASM contract compilation to access the time::now() function

Fixes a compilation error when test_small_network_get_failure loads and compiles the ping contract at runtime.

…enet#2023

This change annotates test_small_network_get_failure with test_log to capture test execution logs in CI. The test passes locally but fails in CI, and these logs will help us debug the issue.

Changes:
- Add test-log 0.2 to dev-dependencies in freenet-ping-app
- Replace manual logger setup with #[test_log::test] attribute
- Remove unused LevelFilter import
- Logs will now be captured and displayed on test failure

This will help us understand what's happening during CI test execution and identify the root cause of issue freenet#2023.

- Update Cargo.lock to include test-log dependency changes
- Fix formatting (remove extra blank line in test function)
- Resolves CI build failure with --locked flag

This addresses the CI error: "the lock file needs to be updated but --locked was passed"

This commit adds comprehensive documentation and tooling for analyzing and measuring Freenet transport layer performance.

## Analysis Document (docs/architecture/transport_perf_analysis.md)

Documents the current transport architecture and identifies several performance bottlenecks:
- No batch I/O (recvmmsg/sendmmsg): 100x syscall overhead
- Channel buffer=1 for streams: causes packet drops under load
- Over-conservative metadata reservation: 5.4% capacity loss
- No GSO/GRO support: CPU bottleneck at high pps
- Naive rate limiting: no congestion avoidance
- Disabled global rate limiter: no global flow control

## Benchmark Suite (crates/core/benches/transport_perf.rs)

Introduces criterion-based benchmarks in four categories:
1. Microbenchmarks: AES-GCM encryption/decryption, nonce generation, bincode serialization
2. Component benchmarks: mpsc channel throughput, syscall overhead
3. End-to-end benchmarks: placeholder for full transport throughput and latency distribution tests
4. Stress tests: maximum packet rate measurement

Run with: cargo bench -p freenet --bench transport_perf

Adds comprehensive documentation on how to benchmark transport layer performance while isolating from OS/virtualization noise.

## Benchmark Levels

Introduces a 4-level hierarchy from pure computation to full stack:
- Level 0 (Pure Logic): Zero I/O, measures encryption/serialization
  - Completely deterministic, safe for CI
  - 2% noise threshold
- Level 1 (Mock I/O): In-process channels, no syscalls
  - Measures protocol overhead without kernel
  - 5% noise threshold
- Level 2 (Loopback): Real sockets on 127.0.0.1
  - Includes syscall overhead, no NIC
  - 10% noise threshold
- Level 3 (Stress): System limit testing
  - Highly environment-dependent
  - 15% noise threshold

## Key Insights

- Docker on Mac/Windows adds 20-50% overhead (hidden VM)
- CPU isolation (isolcpus) critical for reproducibility
- Use MockSocket infrastructure for protocol-only measurements
- Virtualization overhead: 5-30% depending on config

## Usage

```bash
# CI-safe benchmarks only
cargo bench --bench transport_perf -- "level0/" "level1/"

# Full suite (bare metal, isolated CPUs)
cargo bench --bench transport_perf
```

Rewrites transport benchmarks to minimize measurement noise:

## Noise Reduction Techniques

1. **Level 0 (Pure Logic)**: Zero I/O, zero async
   - Pre-allocated buffers reused across iterations
   - Fixed keys/nonces (measuring crypto, not RNG)
   - std::hint::black_box for DCE prevention
   - 2% noise threshold
2. **Level 1 (Mock I/O)**: For comparative measurement
   - Uses iter_batched to separate setup from hot path
   - tokio runtime created ONCE outside benchmark
   - Measures code changes through consistent overhead
   - Bias (not noise) - cancels out in A/B comparisons
3. **Feature gating**: Run with --no-default-features to disable tracing
   - Even disabled tracing macros have overhead
   - Benchmark script uses: --no-default-features -F redb,websocket

## New Benchmarks

- bench_packet_creation: Combined serialize + encrypt path
- bench_memcpy: Baseline for data movement costs
- bench_channel_try_send: Backpressure overhead
- bench_packet_routing: HashMap lookup with N peers

## Helper Script

Added scripts/run_benchmarks.sh:
- Environment validation (CPU governor, turbo, isolated CPUs)
- Automatic feature flag handling
- Level selection (./run_benchmarks.sh level0 level1)

## Usage for A/B Testing

```bash
git stash && ./scripts/run_benchmarks.sh level1 # baseline
git stash pop && ./scripts/run_benchmarks.sh level1 # with change
# Criterion shows delta automatically
```
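The Level 0 techniques in this commit (pre-allocated buffers reused across iterations, plus `std::hint::black_box`) can be illustrated without Criterion. The sketch below is a deliberately naive timing loop and does not replace Criterion's statistics. `time_per_iter` and `memcpy_packet_cost` are hypothetical helpers; the 1364-byte copy resembles the bench_memcpy baseline only in spirit.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Naive mean-time-per-iteration measurement with a short warm-up.
fn time_per_iter<F: FnMut()>(iters: u32, mut f: F) -> Duration {
    assert!(iters > 0, "need at least one iteration");
    for _ in 0..(iters / 10).max(1) {
        f(); // warm-up: caches, branch predictors, page faults on the buffers
    }
    let start = Instant::now();
    for _ in 0..iters {
        f();
    }
    start.elapsed() / iters
}

/// Time copying one 1364-byte packet into a buffer allocated once, up front.
fn memcpy_packet_cost(iters: u32) -> Duration {
    let src = vec![0x5Au8; 1364];
    let mut dst = vec![0u8; 1364]; // pre-allocated, reused across iterations
    time_per_iter(iters, || {
        // black_box hides the input and output from the optimizer so the copy
        // cannot be hoisted out of the loop or removed as dead code.
        dst.copy_from_slice(black_box(&src));
        black_box(&dst);
    })
}
```

Allocating outside the timed closure keeps allocator cost out of the measurement, which matches the "measuring crypto, not RNG" reasoning applied to fixed keys and nonces.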

Implement high-priority performance improvements from issue freenet#2226:

1. Increase inbound stream channel buffer from 1 to 64
   - Addresses 27x throughput difference identified in benchmarks
   - Reduces channel contention for incoming packet fragments
2. Replace random nonce generation with counter-based approach
   - Uses atomic counter + random prefix for AES-GCM nonces
   - ~5.5x faster than random generation while maintaining uniqueness
   - Random prefix ensures uniqueness across process restarts
   - Counter ensures uniqueness within process lifetime

These changes target the low-effort, high-impact optimizations from the transport layer performance analysis (PR freenet#2224).

claude and others added 19 commits December 4, 2025 23:34

docs: remove investigation report file

3a35845

Co-authored-by: nacho.d.g <iduartgomez@users.noreply.github.com>

refactor: remove redundant comments from test_small_network_get_issue

c663ee1

Cleaned up redundant comments throughout the test file that were explaining self-evident code. Kept the TODO comment as it's actionable. Co-authored-by: nacho.d.g <iduartgomez@users.noreply.github.com>

refactor: remove redundant comments from test_small_network_get_issue

1c44879

Removes unnecessary implementation notes that were no longer relevant after previous fixes to connection management and timeout handling. Related to issue freenet#2023 investigation.

chore: update freenet-ping Cargo.lock after rebase

28eaae9

Update lock file after rebasing PR freenet#2055 onto latest main.

Merge pull request #196 from iduartgomez/claude/sync-upstream-main-01…

13b76a0

…LuLNaER7JBmDiLQXuDAtpv fix: critical operation state management issues (freenet#1977)

Merge branch 'freenet:main' into main

d755f65

Merge branch 'freenet:main' into main

c50d888

fix: resolve closure capture issues in benchmarks

2cf4ee4

fix: correct async runtime usage in level2/level3 benchmarks

beca3b8

docs: add issue template for transport performance analysis PR

00c86b2

fix: remove issue file and fix clippy warnings in benchmarks

fa0d096

iduartgomez marked this pull request as ready for review December 5, 2025 19:26

iduartgomez added this pull request to the merge queue Dec 5, 2025

Merged via the queue into freenet:main with commit b9309dc Dec 5, 2025
8 checks passed

iduartgomez deleted the claude/transport-perf-analysis-01B67NMKH8bo7oHd9zWvZUds branch December 5, 2025 19:39

iduartgomez mentioned this pull request Dec 5, 2025

Transport Layer Performance Improvements #2226 (open, 6 tasks)

iduartgomez mentioned this pull request Dec 5, 2025

perf: transport layer optimizations for throughput #2227 (merged)
