@iduartgomez iduartgomez commented Dec 5, 2025

Transport Layer Performance Analysis and Benchmark Suite

Summary

This PR introduces a comprehensive performance analysis of the Freenet transport layer, identifying critical bottlenecks and providing a benchmark suite for ongoing performance monitoring.

Key Finding: Syscall overhead (70% of per-packet time) is the primary bottleneck, not encryption or serialization.


Benchmark Results

Per-Packet Time Breakdown (1364 bytes)

| Component | Time | % of Total |
|---|---|---|
| UDP syscall (send) | 12.7µs | 70% |
| Serialization (bincode) | 3.82µs | 21% |
| Encryption (AES-128-GCM) | 1.14µs | 6% |
| Channel/async overhead | ~0.5µs | 3% |
| Total | ~18µs | 100% |

Current Throughput Limits

| Metric | Value |
|---|---|
| Max packets/sec | ~80,000 pps |
| Max bandwidth | ~870 Mbps |
| Theoretical (no syscall bottleneck) | ~2.2 Gbps |
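
The throughput figures can be cross-checked against the per-packet breakdown with a little arithmetic. The sketch below is only an illustration: the helper names are hypothetical, and the reading that ~80,000 pps is bounded by the 12.7µs syscall alone, and that the ~2.2 Gbps figure corresponds to serialization plus encryption only, is an assumption that happens to match the numbers rather than something the analysis states.

```rust
/// Packets per second achievable when each packet costs `per_packet_us` microseconds.
fn packets_per_sec(per_packet_us: f64) -> f64 {
    1_000_000.0 / per_packet_us
}

/// Bandwidth in Mbps for a given packet rate and packet size in bytes.
fn bandwidth_mbps(pps: f64, packet_bytes: u32) -> f64 {
    pps * f64::from(packet_bytes) * 8.0 / 1_000_000.0
}

fn main() {
    let packet_bytes = 1364; // packet size used in the breakdown table

    // Assumption: the send() syscall alone bounds the current packet rate.
    let syscall_bound_pps = packets_per_sec(12.7);
    println!("syscall-bound rate: {:.0} pps", syscall_bound_pps);

    // Bandwidth at the reported ~80,000 pps.
    println!("bandwidth: {:.0} Mbps", bandwidth_mbps(80_000.0, packet_bytes));

    // Assumption: without the syscall, only serialization (3.82µs) and
    // encryption (1.14µs) remain on the per-packet path.
    let cpu_bound_pps = packets_per_sec(3.82 + 1.14);
    println!("cpu-bound bandwidth: {:.0} Mbps", bandwidth_mbps(cpu_bound_pps, packet_bytes));
}
```

Under these assumptions the syscall-only bound lands just under 80K pps and the CPU-only bound lands at about 2.2 Gbps, which suggests the reported limits follow from the table rather than being separate measurements.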

Critical Issues Identified

1. Channel Buffer Size = 1 (HIGH PRIORITY)

Location: crates/core/src/transport/peer_connection.rs:592

let (sender, receiver) = mpsc::channel(1); // BOTTLENECK

Impact: Benchmark shows 27x throughput difference between channel(1) and channel(100):

| Buffer Size | Throughput |
|---|---|
| 1 | 80K elem/s |
| 100 | 2.2M elem/s |

Fix: Change to mpsc::channel(100) - trivial, zero-risk change.
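
To illustrate why a capacity of 1 throttles a producer, here is a minimal sketch using the standard library's `sync_channel` as a stand-in for the async `mpsc::channel` in `peer_connection.rs` (presumably tokio's, which is not used here so the example stays dependency-free). The function name `accepted_before_full` is hypothetical. It counts how many items of a burst a bounded channel accepts before the sender would have to wait for the receiver.

```rust
use std::sync::mpsc::{sync_channel, TrySendError};

/// Sends up to `burst` items without the receiver draining anything and
/// returns how many were accepted before the channel reported it was full.
fn accepted_before_full(capacity: usize, burst: usize) -> usize {
    // `_receiver` is kept alive so the channel is not reported as disconnected.
    let (sender, _receiver) = sync_channel::<u32>(capacity);
    let mut accepted = 0;
    for i in 0..burst as u32 {
        match sender.try_send(i) {
            Ok(()) => accepted += 1,
            // Full: a real sender would now block (or await) until the
            // receiver makes progress.
            Err(TrySendError::Full(_)) => break,
            Err(TrySendError::Disconnected(_)) => break,
        }
    }
    accepted
}

fn main() {
    println!("capacity 1 accepted:   {}", accepted_before_full(1, 64));
    println!("capacity 100 accepted: {}", accepted_before_full(100, 64));
}
```

With capacity 1 the producer stalls after every item and must hand off to the consumer each time; a larger buffer lets a burst of packet fragments queue up without that per-item handoff.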


2. Random Nonce Generation (MEDIUM PRIORITY)

Location: crates/core/src/transport/packet_data.rs:151

let nonce: [u8; NONCE_SIZE] = RNG.with(|rng| rng.borrow_mut().random());

Benchmark:

| Method | Time |
|---|---|
| Random | 34.7ns |
| Counter | 6.3ns |

Impact: Counter-based nonces are 5.5x faster to generate. AES-GCM only requires that a nonce never repeats under the same key, so counter nonces are as secure as random ones provided each connection uses a unique key.

Fix: Use atomic counter + connection ID for nonce generation.
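
A minimal sketch of the proposed scheme, assuming a 12-byte (96-bit) nonce laid out as a 4-byte connection ID followed by an 8-byte big-endian counter. The value of `NONCE_SIZE`, the `NonceGenerator` type, and the layout are illustrative assumptions, not the code in `packet_data.rs`.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Assumption: 96-bit nonces, the usual size for AES-GCM.
const NONCE_SIZE: usize = 12;

/// Hypothetical per-connection nonce source: connection ID prefix + counter.
struct NonceGenerator {
    connection_id: [u8; 4],
    counter: AtomicU64,
}

impl NonceGenerator {
    fn new(connection_id: u32) -> Self {
        Self {
            connection_id: connection_id.to_be_bytes(),
            counter: AtomicU64::new(0),
        }
    }

    /// Returns a nonce that is unique for this generator until the 64-bit
    /// counter wraps. Uniqueness across connections relies on each
    /// connection using its own key.
    fn next_nonce(&self) -> [u8; NONCE_SIZE] {
        // Relaxed ordering is enough: only the uniqueness of the returned
        // value matters, not ordering relative to other memory operations.
        let count = self.counter.fetch_add(1, Ordering::Relaxed);
        let mut nonce = [0u8; NONCE_SIZE];
        nonce[..4].copy_from_slice(&self.connection_id);
        nonce[4..].copy_from_slice(&count.to_be_bytes());
        nonce
    }
}

fn main() {
    let generator = NonceGenerator::new(7);
    println!("{:02x?}", generator.next_nonce());
    println!("{:02x?}", generator.next_nonce());
}
```

The atomic increment replaces a call into the thread-local RNG, which is where the speedup in the table would come from. A real implementation would also need to decide what happens when a process restarts with the same key, since the counter would start again from zero.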


3. No Syscall Batching (HIGH PRIORITY - LARGER EFFORT)

Current: One send() syscall per packet = 12.7µs overhead each.

Solution: Use sendmmsg()/recvmmsg() to batch multiple packets per syscall.

Expected Impact: 10x reduction in syscall overhead, potentially reaching 800K+ pps.

Implementation: Requires socket2 crate and platform-specific code paths.
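
The expected gain can be estimated with a simple amortization model. This is a rough sketch with hypothetical function names, and it makes the optimistic assumption that one batched call costs the same 12.7µs as one send() call regardless of how many packets it carries; it does not call sendmmsg() itself.

```rust
/// Number of send syscalls needed to flush `packets` datagrams when each
/// call can carry up to `batch_size` of them (batch_size = 1 models send()).
fn syscalls_needed(packets: usize, batch_size: usize) -> usize {
    assert!(batch_size > 0, "batch size must be non-zero");
    (packets + batch_size - 1) / batch_size
}

/// Amortized syscall cost per packet in microseconds, assuming (optimistically)
/// that a batched call costs the same as a single send() call.
fn amortized_syscall_us(per_call_us: f64, packets: usize, batch_size: usize) -> f64 {
    assert!(packets > 0, "need at least one packet");
    per_call_us * syscalls_needed(packets, batch_size) as f64 / packets as f64
}

fn main() {
    let per_call_us = 12.7; // measured send() cost from the breakdown table
    for batch_size in [1, 10, 32] {
        let cost = amortized_syscall_us(per_call_us, 1000, batch_size);
        println!(
            "batch {:>2}: {:>4} syscalls, {:.2}µs per packet",
            batch_size,
            syscalls_needed(1000, batch_size),
            cost
        );
    }
}
```

Under this model a batch of 10 brings the syscall share to about 1.27µs per packet, which is the order of magnitude behind the 10x estimate. Real gains are likely smaller because part of the kernel work is per packet rather than per call.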


4. Serialization Overhead (MEDIUM PRIORITY - LARGER EFFORT)

Bincode serialization is 21% of packet creation time (3.82µs for 1364 bytes).

Potential Optimizations:

  • Pre-computed message templates for common patterns
  • Zero-copy serialization (rkyv, flatbuffers)
  • Avoid Vec allocations in hot path

Files Added

Documentation

  • docs/architecture/transport_perf_analysis.md - Full architecture analysis
  • docs/architecture/transport_benchmark_methodology.md - Benchmarking methodology

Benchmarks

  • crates/core/benches/transport_perf.rs - Criterion benchmark suite

Scripts

  • scripts/run_benchmarks.sh - Helper script with environment validation

Benchmark Suite

Levels

| Level | What it Measures | Noise Level | CI Safe |
|---|---|---|---|
| 0 | Pure computation (encrypt, serialize) | <2% | Yes |
| 1 | Protocol logic (channels, routing) | ~5% | Yes |
| 2 | Syscall overhead (loopback UDP) | ~10% | ⚠️ |
| 3 | System limits (stress tests) | ~15% | |

Running

    # All benchmarks (without tracing for accurate results)
    cargo bench --bench transport_perf --no-default-features --features "redb,websocket"

    # Specific levels
    cargo bench --bench transport_perf --no-default-features --features "redb,websocket" -- "level0/"

    # Using helper script
    ./scripts/run_benchmarks.sh level0 level1

Immediate Action Items

Trivial Fixes (< 1 hour each)

  • Change channel buffer 1 → 100 in peer_connection.rs:592

    • Impact: 27x channel throughput
    • Risk: None
  • Switch to counter-based nonces in packet_data.rs

    • Impact: 5.5x faster nonce generation
    • Risk: None (counter + connection ID is standard practice)

Medium-Term (1-2 days each)

  • Implement sendmmsg/recvmmsg batching

    • Impact: ~10x syscall efficiency
    • Requires: socket2 crate, platform detection
  • Increase inbound channel buffers in connection_handler.rs

    • Related to documented packet drops (2251 packets in 10s)

Longer-Term

  • Zero-copy serialization investigation
  • GSO/GRO support for kernel offload
  • io_uring for Linux 5.1+

Validation

The benchmark suite allows validating any optimization:

    # Before change
    git stash
    ./scripts/run_benchmarks.sh level0 level1

    # After change
    git stash pop
    ./scripts/run_benchmarks.sh level0 level1

    # Criterion automatically shows delta

Related

  • Packet drop warnings in connection_handler.rs:339
  • Disabled rate limiter comment in connection_handler.rs:155-160
  • TODO comments in peer_connection.rs:35 and outbound_stream.rs:53-57
claude and others added 19 commits December 4, 2025 23:34
- Analyzed test_small_network_get_failure failure modes
- Identified gateway crash root cause (fixed in a283e23)
- Documented PUT operation timeout issues
- Provided recommendations for re-enabling the test
- Suggested modernization using #[freenet_test] macro

Related: freenet#2023, freenet#2043, freenet#2036, freenet#2011

Re-enabled the previously ignored test with key improvements:

- Removed #[ignore] attribute - recent fixes should resolve issues
- Increased PUT timeout: 30s → 90s (accounts for connection delays)
- Increased overall test timeout: 120s → 180s (3 minutes)
- Added detailed error messages for better debugging
- Added documentation of recent fixes that resolved the issues

Recent fixes that should prevent failures:

- a283e23: Fixed gateway crashes during timeout notifications
- 615f02d: Fixed PUT response routing through forwarding peers
- 5734a33: Fixed local caching before forwarding PUTs

Related: freenet#2023

The freenet-ping contract failed to compile because freenet-ping-types uses freenet_stdlib::time::now() when 'std' feature is disabled, but the 'contract' feature wasn't propagated to freenet-stdlib.

Changes:

- Added 'contract' feature to freenet-ping-types Cargo.toml
- Enabled 'contract' feature in ping contract's types dependency
- This allows WASM contract compilation to access time::now() function

Fixes compilation error when test_small_network_get_failure loads and compiles the ping contract at runtime.

Co-authored-by: nacho.d.g <iduartgomez@users.noreply.github.com>
Cleaned up redundant comments throughout the test file that were explaining self-evident code. Kept the TODO comment as it's actionable. Co-authored-by: nacho.d.g <iduartgomez@users.noreply.github.com>
…enet#2023

This change annotates test_small_network_get_failure with test_log to capture test execution logs in CI. The test passes locally but fails in CI, and these logs will help us debug the issue.

Changes:

- Add test-log 0.2 to dev-dependencies in freenet-ping-app
- Replace manual logger setup with #[test_log::test] attribute
- Remove unused LevelFilter import
- Logs will now be captured and displayed on test failure

This will help us understand what's happening during CI test execution and identify the root cause of issue freenet#2023.

- Update Cargo.lock to include test-log dependency changes
- Fix formatting (remove extra blank line in test function)
- Resolves CI build failure with --locked flag

This addresses the CI error: "the lock file needs to be updated but --locked was passed"

Removes unnecessary implementation notes that were no longer relevant after previous fixes to connection management and timeout handling. Related to issue freenet#2023 investigation.
Update lock file after rebasing PR freenet#2055 onto latest main.
…LuLNaER7JBmDiLQXuDAtpv fix: critical operation state management issues (freenet#1977)
This commit adds comprehensive documentation and tooling for analyzing and measuring Freenet transport layer performance.

## Analysis Document (docs/architecture/transport_perf_analysis.md)

Documents the current transport architecture and identifies several performance bottlenecks:

- No batch I/O (recvmmsg/sendmmsg): 100x syscall overhead
- Channel buffer=1 for streams: causes packet drops under load
- Over-conservative metadata reservation: 5.4% capacity loss
- No GSO/GRO support: CPU bottleneck at high pps
- Naive rate limiting: no congestion avoidance
- Disabled global rate limiter: no global flow control

## Benchmark Suite (crates/core/benches/transport_perf.rs)

Introduces criterion-based benchmarks in four categories:

1. Microbenchmarks: AES-GCM encryption/decryption, nonce generation, bincode serialization
2. Component benchmarks: mpsc channel throughput, syscall overhead
3. End-to-end benchmarks: placeholder for full transport throughput and latency distribution tests
4. Stress tests: maximum packet rate measurement

Run with: cargo bench -p freenet --bench transport_perf

Adds comprehensive documentation on how to benchmark transport layer performance while isolating from OS/virtualization noise.

## Benchmark Levels

Introduces a 4-level hierarchy from pure computation to full stack:

- Level 0 (Pure Logic): Zero I/O, measures encryption/serialization
  - Completely deterministic, safe for CI
  - 2% noise threshold
- Level 1 (Mock I/O): In-process channels, no syscalls
  - Measures protocol overhead without kernel
  - 5% noise threshold
- Level 2 (Loopback): Real sockets on 127.0.0.1
  - Includes syscall overhead, no NIC
  - 10% noise threshold
- Level 3 (Stress): System limit testing
  - Highly environment-dependent
  - 15% noise threshold

## Key Insights

- Docker on Mac/Windows adds 20-50% overhead (hidden VM)
- CPU isolation (isolcpus) critical for reproducibility
- Use MockSocket infrastructure for protocol-only measurements
- Virtualization overhead: 5-30% depending on config

## Usage

```bash
# CI-safe benchmarks only
cargo bench --bench transport_perf -- "level0/" "level1/"

# Full suite (bare metal, isolated CPUs)
cargo bench --bench transport_perf
```

Rewrites transport benchmarks to minimize measurement noise:

## Noise Reduction Techniques

1. **Level 0 (Pure Logic)**: Zero I/O, zero async
   - Pre-allocated buffers reused across iterations
   - Fixed keys/nonces (measuring crypto, not RNG)
   - std::hint::black_box for DCE prevention
   - 2% noise threshold

2. **Level 1 (Mock I/O)**: For comparative measurement
   - Uses iter_batched to separate setup from hot path
   - tokio runtime created ONCE outside benchmark
   - Measures code changes through consistent overhead
   - Bias (not noise) - cancels out in A/B comparisons

3. **Feature gating**: Run with --no-default-features to disable tracing
   - Even disabled tracing macros have overhead
   - Benchmark script uses: --no-default-features -F redb,websocket

## New Benchmarks

- bench_packet_creation: Combined serialize + encrypt path
- bench_memcpy: Baseline for data movement costs
- bench_channel_try_send: Backpressure overhead
- bench_packet_routing: HashMap lookup with N peers

## Helper Script

Added scripts/run_benchmarks.sh:

- Environment validation (CPU governor, turbo, isolated CPUs)
- Automatic feature flag handling
- Level selection (./run_benchmarks.sh level0 level1)

## Usage for A/B Testing

```bash
git stash && ./scripts/run_benchmarks.sh level1      # baseline
git stash pop && ./scripts/run_benchmarks.sh level1  # with change
# Criterion shows delta automatically
```

@iduartgomez iduartgomez marked this pull request as ready for review December 5, 2025 19:26
@iduartgomez iduartgomez added this pull request to the merge queue Dec 5, 2025
Merged via the queue into freenet:main with commit b9309dc Dec 5, 2025
8 checks passed
@iduartgomez iduartgomez deleted the claude/transport-perf-analysis-01B67NMKH8bo7oHd9zWvZUds branch December 5, 2025 19:39
iduartgomez pushed a commit to iduartgomez/freenet-core that referenced this pull request Dec 5, 2025
Implement high-priority performance improvements from issue freenet#2226:

1. Increase inbound stream channel buffer from 1 to 64
   - Addresses 27x throughput difference identified in benchmarks
   - Reduces channel contention for incoming packet fragments

2. Replace random nonce generation with counter-based approach
   - Uses atomic counter + random prefix for AES-GCM nonces
   - ~5.5x faster than random generation while maintaining uniqueness
   - Random prefix ensures uniqueness across process restarts
   - Counter ensures uniqueness within process lifetime

These changes target the low-effort, high-impact optimizations from the transport layer performance analysis (PR freenet#2224).