Closed
Labels
- A-networking (Area: Networking, ring protocol, peer discovery)
- E-hard (Experience needed to fix/implement: Hard / a lot)
- P-critical (Critical priority)
- S-blocked (Status: Blocked by external dependency or other issue)
- T-bug (Type: Something is broken)
Description
Problem
GET request responses are being sent but not reaching the requesting peer, causing timeouts. This manifests as flaky CI failures in test_put_contract_three_hop_returns_response and production GET request timeouts.
Symptoms
- CI flakiness: test_put_contract_three_hop_returns_response fails intermittently with "Timeout waiting for get response"
- Production issues: GET requests time out even when the contract exists on the network
What the logs show
From CI failure analysis:
21:42:00.826 - peer-c sends RequestGet to peer-a (via routing)
21:42:00.828 - peer-a receives RequestGet
21:42:00.829 - peer-a sends ReturnGet back to peer-c via explicit address (NAT routing)
21:42:00.829 - "Message successfully sent to peer connection via explicit address"
... 45 seconds pass ...
21:42:45.827 - "Attempt 2/3 to GET from peer C" (timeout, retry)
... same pattern repeats ...
21:43:32.829 - "Attempt 3/3 to GET from peer C"
... final timeout, test fails ...

The response is logged as "successfully sent" but never arrives at peer-c.
Network topology in test
gateway <---> peer-a (has contract) <---> peer-c (requesting)

- peer-c initiates GET
- Request routes through to peer-a (contract location)
- peer-a finds contract, sends ReturnGet
- ReturnGet never reaches peer-c
Hypothesis
The issue appears to be in the return path. Possibilities:
- Connection lookup mismatch: The connection used to send the response may not be the same connection peer-c is listening on
- NAT routing address confusion: The target_addr used for "explicit address" routing may be stale or incorrect
- Message serialization/delivery: The message is queued but not actually delivered
- Channel closure: The receiving channel on peer-c may be closed or not being polled
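To make the "queued but not delivered" and "not being polled" hypotheses concrete, here is a minimal sketch using only the standard library. It is not the project's code: OutboundQueue and drain_pending are hypothetical names, and std::sync::mpsc stands in for whatever channel the real send path uses. It shows that a send call can report success as soon as the message is enqueued, regardless of whether anything ever reads it.

```rust
use std::sync::mpsc::{channel, Receiver, Sender, TryRecvError};

/// Hypothetical stand-in for a per-connection outbound queue.
pub struct OutboundQueue {
    tx: Sender<String>,
}

impl OutboundQueue {
    /// Creates the queue and hands back the receiving half separately,
    /// as a connection task would normally own it.
    pub fn new() -> (Self, Receiver<String>) {
        let (tx, rx) = channel();
        (Self { tx }, rx)
    }

    /// Returns true when the message was enqueued. A "successfully sent"
    /// log emitted here only proves the enqueue, not delivery to the peer.
    pub fn send(&self, msg: &str) -> bool {
        self.tx.send(msg.to_string()).is_ok()
    }
}

/// Counts messages still sitting in the queue because nobody consumed them.
pub fn drain_pending(rx: &Receiver<String>) -> usize {
    let mut pending = 0;
    loop {
        match rx.try_recv() {
            Ok(_) => pending += 1,
            Err(TryRecvError::Empty) | Err(TryRecvError::Disconnected) => break,
        }
    }
    pending
}
```

In this sketch, send keeps returning true for as long as the receiver exists, even if the receiver is never polled; it only starts failing once the receiver is dropped. If the real send path behaves similarly, the "successfully sent" log line would not distinguish between a delivered response and one stuck in a queue.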
Key code paths to investigate
- handle_notification_msg in p2p_protoc - handles routing of responses
- send_to_peer_connection - the "successfully sent" log comes from here
- Connection management - how connections are looked up by address
- The conn_bridge_rx channel handling for outbound messages
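When checking how connections are looked up by address, one failure mode worth ruling out is a table keyed by socket address where the address used for the reply differs from the one the live connection was registered under. The sketch below is an illustration under assumptions, not the actual connection manager: ConnTable, its u64 connection ids, and the example addresses are all hypothetical.

```rust
use std::collections::HashMap;
use std::net::SocketAddr;

/// Hypothetical connection table keyed by the remote socket address.
#[derive(Default)]
pub struct ConnTable {
    // Value is a hypothetical connection id, standing in for a real handle.
    by_addr: HashMap<SocketAddr, u64>,
}

impl ConnTable {
    /// Registers (or replaces) the connection known under `addr`.
    pub fn insert(&mut self, addr: SocketAddr, conn_id: u64) {
        self.by_addr.insert(addr, conn_id);
    }

    /// Looks up the connection for `addr`, if one is registered.
    pub fn lookup(&self, addr: &SocketAddr) -> Option<u64> {
        self.by_addr.get(addr).copied()
    }
}

/// Picks the connection for a reply. `target_addr` is whatever address the
/// response carries; `observed_addr` is the address the live connection
/// actually uses. Returns the connection id chosen for each.
pub fn compare_lookup(
    table: &ConnTable,
    target_addr: &SocketAddr,
    observed_addr: &SocketAddr,
) -> (Option<u64>, Option<u64>) {
    (table.lookup(target_addr), table.lookup(observed_addr))
}
```

If the two lookups disagree (one returns None, or they return different ids), the reply would be routed to a missing or outdated connection while the requester keeps listening on the live one. Logging both values at the point where the response is sent would be one way to confirm or eliminate this hypothesis.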
Reproduction
Run the test multiple times:
for i in {1..10}; do
  cargo test -p freenet test_put_contract_three_hop_returns_response -- --nocapture 2>&1 | tail -5
done

Fails ~50% of the time in CI.
Impact
- Blocks release 0.1.44 (PR #2240, "build: release 0.1.44")
- Affects production reliability of GET operations
- Related to overall network message delivery reliability
Related
- PR #2240, "build: release 0.1.44" (release blocked by this)
- Issue #2237, "fix: gracefully handle AddrInUse error instead of panicking on startup" (AddrInUse panic - separate issue)
- PR #2239, "fix: nat traversal timing and acceptor address bugs" (NAT traversal timing fix - merged, may be related)
[AI-assisted - Claude]