fix: Gateway fails to reset encrypted session when peer restarts with new identity #2277

@sanity

Problem

When a peer restarts with a new identity from the same IP:port, the gateway doesn't detect the identity change and continues trying to use the old session encryption keys. The new peer's handshake packets are silently dropped or fail decryption, preventing reconnection.

Impact

  • Peers behind NAT cannot reconnect after restart without also restarting the gateway
  • In production, this would require gateway restarts whenever peers restart, which is not scalable
  • The gateway accumulates stale connection entries

Recommended Approach: Test First

Before implementing a fix, create a test that reproduces this failure.

Suggested test location: crates/core/tests/ or extend freenet-test-network

    #[tokio::test]
    async fn test_peer_reconnect_after_restart() {
        // 1. Start a gateway
        // 2. Start a peer, connect to gateway, note its identity
        // 3. Disconnect the peer, clear its state
        // 4. Restart peer (new identity, same IP:port)
        // 5. Assert: peer successfully reconnects within reasonable timeout
        // 6. Assert: gateway has updated peer identity in its connection table
    }

This test should FAIL with current code, then PASS after the fix.

Steps to Reproduce

  1. Start a gateway with debug logging:

    RUST_LOG="info,freenet::transport=debug" freenet network --is-gateway ...
  2. Start a peer that connects through the gateway (e.g., from behind NAT)

  3. Note the peer's identity (e.g., 5AMPifZWGRfoydocq) and source IP:port (e.g., 136.62.52.28:43227)

  4. Kill the peer, clean its state (rm -rf ~/.local/share/freenet/), and restart it

    • The new peer will have a different identity (e.g., gHKWtWM62CJgyM1U)
    • But same source IP:port due to NAT
  5. Observe the new peer fails to connect with "max connection attempts reached"

Evidence from Logs

Gateway thinks it has connection to OLD identity:

connect_peer: transport entry already has pub_key, tx: 01KCAPRFZHCHZ3KSW076W38100, peer_addr: 136.62.52.28:43227, existing_pub_key: Some(5AMPifZWGRfoydocq) 

New peer (different identity) fails handshake:

Outbound handshake failed: max connection attempts reached, peer_addr: 5.9.111.215:31337, attempts: 22, elapsed_ms: 3142, direction: "outbound" 

tcpdump shows packets ARE flowing both ways:

19:58:02.220516 eno1 In IP 136.62.52.28.43227 > 5.9.111.215.31337: UDP, length 256
19:58:02.222066 eno1 Out IP 5.9.111.215.31337 > 136.62.52.28.43227: UDP, length 74

But the gateway logs show NO inbound messages from the new peer. The packets appear to be dropped at the transport layer because they cannot be decrypted with the old session keys.

Expected Behavior

When the gateway receives handshake packets from a known IP:port that cannot be decrypted with the existing session keys, it should:

  1. Detect this is a new peer identity attempting to connect
  2. Invalidate the old session for that IP:port
  3. Establish a new encrypted session with the new identity
  4. Complete the connection handshake
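The four steps above can be sketched as a state transition on a per-address session table. This is a minimal model, not the freenet transport code: `PeerId`, `Packet`, `Session`, `Outcome` and `Gateway` are hypothetical names, and comparing a numeric `key` stands in for real authenticated decryption.

```rust
use std::collections::HashMap;
use std::net::SocketAddr;

/// Hypothetical stand-in for a peer identity (not a freenet type).
#[derive(Clone, Debug, PartialEq, Eq)]
struct PeerId(String);

#[derive(Debug)]
enum Packet {
    /// Handshake initiation carrying the sender's identity and a fresh key.
    Intro { identity: PeerId, key: u64 },
    /// Session traffic; `key` models which session key sealed the packet.
    Data { key: u64 },
}

struct Session {
    identity: PeerId,
    key: u64,
}

#[derive(Debug, PartialEq, Eq)]
enum Outcome {
    Delivered,
    Established,
    /// An old session for this address was replaced by a new identity.
    Reset { old: PeerId },
    Dropped,
}

#[derive(Default)]
struct Gateway {
    sessions: HashMap<SocketAddr, Session>,
}

impl Gateway {
    fn on_packet(&mut self, from: SocketAddr, packet: Packet) -> Outcome {
        match packet {
            // Data is delivered only if it matches the stored session key.
            Packet::Data { key } => match self.sessions.get(&from) {
                Some(session) if session.key == key => Outcome::Delivered,
                _ => Outcome::Dropped,
            },
            // A handshake always installs a fresh session for this address;
            // `insert` hands back whatever session was there before.
            Packet::Intro { identity, key } => {
                let new = Session { identity: identity.clone(), key };
                match self.sessions.insert(from, new) {
                    Some(old) if old.identity != identity => {
                        Outcome::Reset { old: old.identity }
                    }
                    _ => Outcome::Established,
                }
            }
        }
    }
}
```

In this model a handshake from a known address always wins. A real implementation would presumably need to authenticate the handshake before evicting the old session, so that a spoofed packet cannot disconnect a healthy peer.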

Environment

  • Gateway: freenet 0.1.44 (git: b75ea56-dirty)
  • Peer: freenet 0.1.44 (from crates.io)
  • Peer behind NAT (technic.locut.us -> 136.62.52.28:43227)
  • Gateway on public IP (nova.locut.us -> 5.9.111.215:31337)

Why CI Didn't Catch This

This bug requires:

  • A peer behind NAT (or same source IP:port after restart)
  • Peer restart with new identity
  • Multi-node test infrastructure

CI likely tests with freenet local or simulated networks that don't exercise NAT traversal or peer restart scenarios.
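The "same source IP:port after restart" condition does not strictly need NAT. As a rough sketch using only the standard library, a loopback test can bind a UDP socket, drop it (standing in for the peer process exiting), and bind the same port again. The helper names below are hypothetical and nothing here touches freenet itself; it only shows that the receiving side sees an identical source address both times.

```rust
use std::io;
use std::net::{SocketAddr, UdpSocket};
use std::time::Duration;

/// Binds `local`, sends one datagram to `gateway`, and returns the source
/// address the gateway socket observed. The sending socket is dropped on
/// return, which releases the port much like a process exit would.
fn send_and_observe(
    gateway: &UdpSocket,
    local: SocketAddr,
    payload: &[u8],
) -> io::Result<SocketAddr> {
    let peer = UdpSocket::bind(local)?;
    peer.send_to(payload, gateway.local_addr()?)?;
    let mut buf = [0u8; 64];
    let (_len, src) = gateway.recv_from(&mut buf)?;
    Ok(src)
}

/// Returns the source addresses seen before and after the simulated restart.
fn same_source_after_restart() -> io::Result<(SocketAddr, SocketAddr)> {
    let gateway = UdpSocket::bind("127.0.0.1:0")?;
    // Fail instead of hanging if a datagram is lost.
    gateway.set_read_timeout(Some(Duration::from_secs(2)))?;

    // Let the OS pick a free port once, then reuse it for both "lifetimes".
    let probe = UdpSocket::bind("127.0.0.1:0")?;
    let peer_addr = probe.local_addr()?;
    drop(probe);

    let before = send_and_observe(&gateway, peer_addr, b"old identity")?;
    let after = send_and_observe(&gateway, peer_addr, b"new identity")?;
    Ok((before, after))
}

#[test]
fn gateway_sees_same_source_addr() {
    let (before, after) = same_source_after_restart().unwrap();
    assert_eq!(before, after);
}
```

A regression test for this issue could use the same trick to restart a peer on a fixed port with fresh state, assuming the test harness allows pinning the peer's listening port.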

Suggested Fix

In the transport/connection handler, when receiving packets from a known IP:port that fail decryption:

  1. Check if this looks like a new handshake initiation
  2. If so, clear the stale session and attempt fresh handshake
  3. Add logging for "stale session detected, resetting connection"
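One possible shape for that decrypt-failure path is sketched below. Every name here is an assumption made for illustration: the `Crypto` trait, `handle_inbound`, and `ToyCrypto` are not freenet APIs. The toy "decryption" just checks a key prefix, and a `Vec<String>` stands in for the real logger so the example needs only the standard library.

```rust
use std::collections::HashMap;
use std::net::SocketAddr;

#[derive(Debug, PartialEq, Eq)]
enum Action {
    Deliver(Vec<u8>),
    RestartHandshake,
    Drop,
}

/// Hypothetical hook points for the real transport's crypto.
trait Crypto {
    fn decrypt_with_session(&self, key: &[u8], packet: &[u8]) -> Option<Vec<u8>>;
    fn looks_like_handshake_init(&self, packet: &[u8]) -> bool;
}

fn handle_inbound<C: Crypto>(
    crypto: &C,
    sessions: &mut HashMap<SocketAddr, Vec<u8>>,
    from: SocketAddr,
    packet: &[u8],
    log: &mut Vec<String>,
) -> Action {
    // Unknown address: only a handshake initiation is interesting.
    let Some(key) = sessions.get(&from) else {
        return if crypto.looks_like_handshake_init(packet) {
            Action::RestartHandshake
        } else {
            Action::Drop
        };
    };
    // Known address: try the existing session first.
    if let Some(plain) = crypto.decrypt_with_session(key, packet) {
        return Action::Deliver(plain);
    }
    // Decryption failed. If this looks like a new handshake, treat the
    // stored session as stale, clear it, and start over.
    if crypto.looks_like_handshake_init(packet) {
        sessions.remove(&from);
        log.push(format!(
            "stale session detected, resetting connection, peer_addr: {from}"
        ));
        return Action::RestartHandshake;
    }
    Action::Drop
}

/// Toy implementation for demonstration only; real code would verify an
/// authenticated-encryption tag instead of comparing prefixes.
struct ToyCrypto;

impl Crypto for ToyCrypto {
    fn decrypt_with_session(&self, key: &[u8], packet: &[u8]) -> Option<Vec<u8>> {
        packet.strip_prefix(key).map(|rest| rest.to_vec())
    }
    fn looks_like_handshake_init(&self, packet: &[u8]) -> bool {
        packet.starts_with(b"HELLO")
    }
}
```

Trying the existing session before the handshake check keeps the common path unchanged: healthy sessions never pay for the extra check, and undecryptable packets that are not handshakes are still dropped without touching the session.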

[AI-assisted - Claude]

Metadata

    Labels

      • A-crypto (Area: cryptography)
      • A-networking (Area: Networking, ring protocol, peer discovery)
      • E-medium (Experience needed to fix/implement: Medium / intermediate)
      • P-high (High priority)
      • T-bug (Type: Something is broken)

    Projects

    Status

    Done
