


@sanity commented Dec 5, 2025

Problem

CI occasionally fails on test_three_node_network_connectivity due to duplicate transport keypairs being generated for different nodes. When two nodes have the same public key, routing breaks and PUT operations time out.

Evidence from CI run 19970986958:

  • Gateway key: v6MWKgqJxtAjv5xt
  • Peer1 key: v6MWKgqK9qaZcuaY
  • Peer2 key: v6MWKgqK9qaZcuaY (same as Peer1!)

The test shows all nodes connecting successfully (full mesh with 2 connections each), but PUT operations hang because routing fails when two nodes share the same identity.
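To see why a shared key breaks routing even though every connection succeeds, here is a toy model. It assumes, purely for illustration, that routing state is a map keyed by public key; RoutingTable, add_peer, and route are hypothetical names, not Freenet's actual routing code. Registering a second node under an existing key silently replaces the first node's entry.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a node's public transport key; not Freenet's real type.
type PublicKey = &'static str;

/// Toy routing table mapping each public key to the node that owns it.
/// Assumption for illustration: routing state is keyed by public key.
struct RoutingTable {
    by_key: HashMap<PublicKey, &'static str>,
}

impl RoutingTable {
    fn new() -> Self {
        RoutingTable { by_key: HashMap::new() }
    }

    /// Registers a node under its key; returns the node it displaced, if any.
    fn add_peer(&mut self, key: PublicKey, node: &'static str) -> Option<&'static str> {
        self.by_key.insert(key, node)
    }

    /// Returns the node a message addressed to `key` would be delivered to.
    fn route(&self, key: PublicKey) -> Option<&'static str> {
        self.by_key.get(&key).copied()
    }
}

fn main() {
    let mut table = RoutingTable::new();
    // Keys taken from the CI evidence; peer2 received the same key as peer1.
    table.add_peer("v6MWKgqJxtAjv5xt", "gateway");
    table.add_peer("v6MWKgqK9qaZcuaY", "peer1");
    let displaced = table.add_peer("v6MWKgqK9qaZcuaY", "peer2");

    // Peer1's entry was silently overwritten...
    assert_eq!(displaced, Some("peer1"));
    // ...so anything addressed to peer1's key now reaches peer2.
    println!("{:?}", table.route("v6MWKgqK9qaZcuaY")); // prints Some("peer2")
}
```

In this model every connection is still established; only delivery goes wrong, which matches the symptom of a healthy-looking mesh whose PUT operations hang.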

The issue is intermittent: the test passes consistently locally, and it passed when the CI job was re-run.

Root Cause Investigation

TransportKeypair::new() uses OsRng, which should produce a unique key on every call. The working hypothesis is that, in CI environments under load, RNG behavior occasionally produces collisions, possibly due to:

  • Entropy exhaustion
  • Timing issues
  • Unknown environmental factors on self-hosted runners
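For scale: if the generator works correctly and each public key carries b uniformly random bits (b is an assumption here; the key size of TransportKeypair is not stated in this PR), a union bound over all pairs of n keys gives

$$
P(\text{any two of } n \text{ keys collide}) \le \binom{n}{2}\, 2^{-b} = \frac{n(n-1)}{2}\, 2^{-b}
$$

With n = 3 nodes and even a conservative b = 128, this is $3 \cdot 2^{-128} \approx 8.8 \times 10^{-39}$. A repeat like the one in the evidence above is therefore not plausible as pure chance from a correctly functioning generator, which is why failing fast with explicit diagnostics is worthwhile.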

This Solution

Generate all transport keypairs upfront at the start of each test and verify they're unique before proceeding:

  1. Early detection: Catches the rare RNG collision immediately with a clear, actionable error message
  2. Centralized generation: Uses pre-generated keys from a vector instead of inline generation
  3. Negligible overhead for passing tests: The uniqueness check is a single fast HashSet pass over the keys
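The generate-then-check pattern can be sketched as a generic helper. This is a hypothetical sketch, not the code the #[freenet_test] macro actually emits: generate_unique_keys and its error text are illustrative, the generate closure stands in for TransportKeypair::new(), and K is whatever comparable value identifies a node (such as its public key).

```rust
use std::collections::HashSet;
use std::fmt::Debug;
use std::hash::Hash;

/// Hypothetical helper: generate `count` keys up front, then reject the batch
/// if any two are equal. `generate` stands in for something like
/// `TransportKeypair::new()`; `K` is whatever comparable value identifies a node.
fn generate_unique_keys<K, F>(count: usize, mut generate: F) -> Result<Vec<K>, String>
where
    K: Eq + Hash + Debug,
    F: FnMut() -> K,
{
    // Centralized generation: every key exists before any node is configured.
    let keys: Vec<K> = (0..count).map(|_| generate()).collect();

    {
        // One HashSet pass; `insert` returns false when an equal key is already present.
        let mut seen: HashSet<&K> = HashSet::with_capacity(count);
        for (i, key) in keys.iter().enumerate() {
            if !seen.insert(key) {
                // Failure path only: find the earlier node that got the same key.
                let first = keys.iter().position(|k| k == key).expect("key was seen earlier");
                return Err(format!(
                    "duplicate transport key for nodes {first} and {i}: {key:?} \
                     (likely an RNG collision; re-run the test)"
                ));
            }
        }
    }

    Ok(keys)
}

fn main() {
    // Stand-in generator: a counter, so every "key" is distinct.
    let mut next = 0u64;
    let keys = generate_unique_keys(3, || {
        next += 1;
        next
    })
    .expect("keys must be unique");

    // Nodes take their identity from the vector by position instead of generating inline.
    let (gateway_key, peer_keys) = keys.split_first().expect("at least one key");
    println!("gateway={gateway_key} peers={peer_keys:?}"); // prints gateway=1 peers=[2, 3]
}
```

Running the check before any node starts means a collision surfaces as an immediate error naming both affected nodes, rather than as a PUT timeout much later in the test.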

The fix is in the #[freenet_test] macro codegen, so all tests using the macro benefit from this protection.

Testing

  • cargo test --package freenet --test connectivity passes (4 tests)
  • cargo test --package freenet --test operations passes (9 tests)
  • All pre-commit checks pass (fmt, clippy)

[AI-assisted - Claude]

Co-Authored-By: Claude <noreply@anthropic.com>
@sanity added this pull request to the merge queue Dec 5, 2025
Merged via the queue into main with commit e149ec2 Dec 5, 2025
8 checks passed
@sanity deleted the fix-flaky-key-collision branch December 5, 2025 20:09