perf: add suite for transport performance #2224
Merged
iduartgomez merged 19 commits into freenet:main from iduartgomez:claude/transport-perf-analysis-01B67NMKH8bo7oHd9zWvZUds Dec 5, 2025
Conversation
- Analyzed test_small_network_get_failure failure modes
- Identified gateway crash root cause (fixed in a283e23)
- Documented PUT operation timeout issues
- Provided recommendations for re-enabling the test
- Suggested modernization using #[freenet_test] macro

Related: freenet#2023, freenet#2043, freenet#2036, freenet#2011
Re-enabled the previously ignored test with key improvements:
- Removed #[ignore] attribute - recent fixes should resolve issues
- Increased PUT timeout: 30s → 90s (accounts for connection delays)
- Increased overall test timeout: 120s → 180s (3 minutes)
- Added detailed error messages for better debugging
- Added documentation of recent fixes that resolved the issues

Recent fixes that should prevent failures:
- a283e23: Fixed gateway crashes during timeout notifications
- 615f02d: Fixed PUT response routing through forwarding peers
- 5734a33: Fixed local caching before forwarding PUTs

Related: freenet#2023
The freenet-ping contract failed to compile because freenet-ping-types uses freenet_stdlib::time::now() when the 'std' feature is disabled, but the 'contract' feature wasn't propagated to freenet-stdlib.

Changes:
- Added 'contract' feature to freenet-ping-types Cargo.toml
- Enabled 'contract' feature in the ping contract's types dependency
- This allows WASM contract compilation to access the time::now() function

Fixes the compilation error when test_small_network_get_failure loads and compiles the ping contract at runtime.
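Based on that description, the feature propagation presumably looks like the following Cargo.toml sketch; the exact feature and dependency declarations beyond those named above are assumptions:

```toml
# freenet-ping-types/Cargo.toml (illustrative sketch, not the verbatim file)
[features]
default = ["std"]
std = []
# Propagate 'contract' down to freenet-stdlib so that
# freenet_stdlib::time::now() is available in no-std WASM builds.
contract = ["freenet-stdlib/contract"]
```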
Co-authored-by: nacho.d.g <iduartgomez@users.noreply.github.com>
Cleaned up redundant comments throughout the test file that were explaining self-evident code. Kept the TODO comment as it's actionable. Co-authored-by: nacho.d.g <iduartgomez@users.noreply.github.com>
…enet#2023

This change annotates test_small_network_get_failure with test_log to capture test execution logs in CI. The test passes locally but fails in CI, and these logs will help us debug the issue.

Changes:
- Add test-log 0.2 to dev-dependencies in freenet-ping-app
- Replace manual logger setup with #[test_log::test] attribute
- Remove unused LevelFilter import
- Logs will now be captured and displayed on test failure

This will help us understand what's happening during CI test execution and identify the root cause of issue freenet#2023.
- Update Cargo.lock to include test-log dependency changes
- Fix formatting (remove extra blank line in test function)
- Resolves CI build failure with --locked flag

This addresses the CI error: "the lock file needs to be updated but --locked was passed"
Removes unnecessary implementation notes that were no longer relevant after previous fixes to connection management and timeout handling. Related to issue freenet#2023 investigation.
Update lock file after rebasing PR freenet#2055 onto latest main.
…LuLNaER7JBmDiLQXuDAtpv fix: critical operation state management issues (freenet#1977)
This commit adds comprehensive documentation and tooling for analyzing and measuring Freenet transport layer performance.

## Analysis Document (docs/architecture/transport_perf_analysis.md)

Documents the current transport architecture and identifies several performance bottlenecks:
- No batch I/O (recvmmsg/sendmmsg): 100x syscall overhead
- Channel buffer=1 for streams: causes packet drops under load
- Over-conservative metadata reservation: 5.4% capacity loss
- No GSO/GRO support: CPU bottleneck at high pps
- Naive rate limiting: no congestion avoidance
- Disabled global rate limiter: no global flow control

## Benchmark Suite (crates/core/benches/transport_perf.rs)

Introduces criterion-based benchmarks in four categories:
1. Microbenchmarks: AES-GCM encryption/decryption, nonce generation, bincode serialization
2. Component benchmarks: mpsc channel throughput, syscall overhead
3. End-to-end benchmarks: placeholder for full transport throughput and latency distribution tests
4. Stress tests: maximum packet rate measurement

Run with: cargo bench -p freenet --bench transport_perf
Adds comprehensive documentation on how to benchmark transport layer performance while isolating it from OS/virtualization noise.

## Benchmark Levels

Introduces a 4-level hierarchy from pure computation to full stack:
- Level 0 (Pure Logic): Zero I/O, measures encryption/serialization - completely deterministic, safe for CI - 2% noise threshold
- Level 1 (Mock I/O): In-process channels, no syscalls - measures protocol overhead without the kernel - 5% noise threshold
- Level 2 (Loopback): Real sockets on 127.0.0.1 - includes syscall overhead, no NIC - 10% noise threshold
- Level 3 (Stress): System limit testing - highly environment-dependent - 15% noise threshold

## Key Insights

- Docker on Mac/Windows adds 20-50% overhead (hidden VM)
- CPU isolation (isolcpus) critical for reproducibility
- Use MockSocket infrastructure for protocol-only measurements
- Virtualization overhead: 5-30% depending on config

## Usage

```bash
# CI-safe benchmarks only
cargo bench --bench transport_perf -- "level0/" "level1/"

# Full suite (bare metal, isolated CPUs)
cargo bench --bench transport_perf
```
Rewrites transport benchmarks to minimize measurement noise:

## Noise Reduction Techniques

1. **Level 0 (Pure Logic)**: Zero I/O, zero async
   - Pre-allocated buffers reused across iterations
   - Fixed keys/nonces (measuring crypto, not RNG)
   - std::hint::black_box for DCE prevention
   - 2% noise threshold

2. **Level 1 (Mock I/O)**: For comparative measurement
   - Uses iter_batched to separate setup from the hot path
   - tokio runtime created ONCE outside the benchmark
   - Measures code changes through consistent overhead
   - Bias (not noise) - cancels out in A/B comparisons

3. **Feature gating**: Run with --no-default-features to disable tracing
   - Even disabled tracing macros have overhead
   - Benchmark script uses: --no-default-features -F redb,websocket

## New Benchmarks

- bench_packet_creation: Combined serialize + encrypt path
- bench_memcpy: Baseline for data movement costs
- bench_channel_try_send: Backpressure overhead
- bench_packet_routing: HashMap lookup with N peers

## Helper Script

Added scripts/run_benchmarks.sh:
- Environment validation (CPU governor, turbo, isolated CPUs)
- Automatic feature flag handling
- Level selection (./run_benchmarks.sh level0 level1)

## Usage for A/B Testing

```bash
git stash && ./scripts/run_benchmarks.sh level1      # baseline
git stash pop && ./scripts/run_benchmarks.sh level1  # with change
# Criterion shows delta automatically
```
iduartgomez pushed a commit to iduartgomez/freenet-core that referenced this pull request Dec 5, 2025
Implement high-priority performance improvements from issue freenet#2226:

1. Increase inbound stream channel buffer from 1 to 64
   - Addresses 27x throughput difference identified in benchmarks
   - Reduces channel contention for incoming packet fragments

2. Replace random nonce generation with counter-based approach
   - Uses atomic counter + random prefix for AES-GCM nonces
   - ~5.5x faster than random generation while maintaining uniqueness
   - Random prefix ensures uniqueness across process restarts
   - Counter ensures uniqueness within process lifetime

These changes target the low-effort, high-impact optimizations from the transport layer performance analysis (PR freenet#2224).
Transport Layer Performance Analysis and Benchmark Suite
Summary
This PR introduces a comprehensive performance analysis of the Freenet transport layer, identifying critical bottlenecks and providing a benchmark suite for ongoing performance monitoring.
Key Finding: Syscall overhead (70% of per-packet time) is the primary bottleneck, not encryption or serialization.
Benchmark Results
Per-Packet Time Breakdown (1364 bytes)
Current Throughput Limits
Critical Issues Identified
1. Channel Buffer Size = 1 (HIGH PRIORITY)

Location: crates/core/src/transport/peer_connection.rs:592

Impact: Benchmark shows a 27x throughput difference between channel(1) and channel(100).

Fix: Change to mpsc::channel(100) - a trivial, zero-risk change.

2. Random Nonce Generation (MEDIUM PRIORITY)
Location: crates/core/src/transport/packet_data.rs:151

Impact: Benchmarks show counter-based nonces are 5.5x faster. Counter nonces are equally secure for AES-GCM when combined with a unique key per connection.
Fix: Use atomic counter + connection ID for nonce generation.
3. No Syscall Batching (HIGH PRIORITY - LARGER EFFORT)
Current: One send() syscall per packet = 12.7µs overhead each.

Solution: Use sendmmsg()/recvmmsg() to batch multiple packets per syscall.

Expected Impact: 10x reduction in syscall overhead, potentially reaching 800K+ pps.
Implementation: Requires the socket2 crate and platform-specific code paths.

4. Serialization Overhead (MEDIUM PRIORITY - LARGER EFFORT)
Bincode serialization is 21% of packet creation time (3.82µs for 1364 bytes).
Potential Optimizations:
Files Added
Documentation
- docs/architecture/transport_perf_analysis.md - Full architecture analysis
- docs/architecture/transport_benchmark_methodology.md - Benchmarking methodology

Benchmarks
- crates/core/benches/transport_perf.rs - Criterion benchmark suite

Scripts
- scripts/run_benchmarks.sh - Helper script with environment validation

Benchmark Suite
Levels
Running
Immediate Action Items
Trivial Fixes (< 1 hour each)
- Change channel buffer 1 → 100 in peer_connection.rs:592
- Switch to counter-based nonces in packet_data.rs

Medium-Term (1-2 days each)
- Implement sendmmsg/recvmmsg batching (socket2 crate, platform detection)
- Increase inbound channel buffers in connection_handler.rs

Longer-Term
Validation
The benchmark suite allows validating any optimization with before-and-after runs; criterion reports the delta automatically.
Related
- connection_handler.rs:339
- connection_handler.rs:155-160
- peer_connection.rs:35 and outbound_stream.rs:53-57