[Feature] Improve transaction propagation #3962

@vicsn

🚀 Feature

A core feature of Narwhal is that transmissions may be included in multiple proposals.

image

The current snarkOS setup errs heavily on the side of safety: clients and validators propagate every valid transmission they see to all of their peers, which costs compute and bandwidth and adds a lot of latency. The graph below shows that certificate generation slows down significantly under load. Previous measurements have shown that this slowdown is in turn caused by nodes waiting on transmission fetching.

Image

We shouldn't get rid of propagation entirely either. In a load test with 6000 transmissions on reference hardware and no propagation at all, all transmissions sometimes landed within a few rounds, but more often only around 5750 landed. This indicates that some older certificates get left behind under stress, so we need at least some propagation.

To improve network throughput, I propose the following:

  • We add a propagate: bool field to struct UnconfirmedTransaction.
  • When clients receive a transaction via /broadcast/transaction, they broadcast it to all peers with propagate: true.
  • When clients receive a transaction via the P2P network, they broadcast it to all peers with propagate: false, but only if the incoming message has propagate: true.
  • When validators receive a transaction via /broadcast/transaction, they broadcast it to all validators with propagate: false.
  • When validators receive a transaction via the P2P network, they broadcast it to all peers with propagate: false, but only if the incoming message has propagate: true. Receiving validators should immediately add the transmission to cache_transmissions so they don't have to fetch it later.
  • To ensure all transmissions land, validators periodically (say every PRIMARY_PING_IN_MS) include transmissions from cache_transmissions which they have not yet seen in any proposal, certificate, or the ledger. We may want to tackle this last point only after [Feature] Add metric for subdag width #3961 is done. The selection can also be based on the validator index.
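
The forwarding rules in the list above can be sketched as a single decision function. This is only an illustration of the proposal, not existing snarkOS code: Role, Source, Audience, Forward, forwarding_decision, and should_cache_on_receipt are hypothetical names, and only the propagate flag and the /broadcast/transaction endpoint come from the text.

```rust
/// Hypothetical node roles; illustrative names, not snarkOS types.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Role {
    Client,
    Validator,
}

/// How a transaction reached this node.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Source {
    /// Submitted through the REST endpoint `/broadcast/transaction`.
    Rest,
    /// Received from a peer, carrying the sender's `propagate` flag.
    P2p { propagate: bool },
}

/// Who the transaction is forwarded to.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Audience {
    AllPeers,
    Validators,
}

/// The outgoing broadcast: its audience and the `propagate` flag to set on it.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Forward {
    pub audience: Audience,
    pub propagate: bool,
}

/// Returns the broadcast to perform, or `None` if the transaction must not
/// be forwarded any further.
pub fn forwarding_decision(role: Role, source: Source) -> Option<Forward> {
    match (role, source) {
        // A client receiving via REST starts the flood and allows one more hop.
        (Role::Client, Source::Rest) => {
            Some(Forward { audience: Audience::AllPeers, propagate: true })
        }
        // A validator receiving via REST tells validators only, with no further hops.
        (Role::Validator, Source::Rest) => {
            Some(Forward { audience: Audience::Validators, propagate: false })
        }
        // Any node receiving a P2P message with `propagate: true` forwards once more.
        (_, Source::P2p { propagate: true }) => {
            Some(Forward { audience: Audience::AllPeers, propagate: false })
        }
        // `propagate: false` ends the chain.
        (_, Source::P2p { propagate: false }) => None,
    }
}

/// Validators cache transmissions received over P2P right away, so that
/// they do not have to fetch them later.
pub fn should_cache_on_receipt(role: Role, source: Source) -> bool {
    matches!((role, source), (Role::Validator, Source::P2p { .. }))
}
```

Under these rules a transaction submitted to a client travels at most two P2P hops, and one submitted to a validator travels exactly one, which is where the bandwidth saving over unconditional flooding would come from.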

The above is also applicable to solutions.
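
The last item of the proposal, periodic re-inclusion of unseen transmissions based on the validator index, could look roughly like the sketch below. Everything here is an assumption made for illustration: TransmissionId is a u64 placeholder rather than the real ID type, select_for_reproposal is a hypothetical helper, and the modulo assignment with per-round rotation is just one possible way to spread the work across validators.

```rust
use std::collections::{HashMap, HashSet};

/// Placeholder ID type; real transmission IDs are not plain integers.
pub type TransmissionId = u64;

/// Picks the cached transmissions that this validator should include in its
/// next proposal.
///
/// * `cache` stands in for `cache_transmissions`.
/// * `seen` holds the IDs already observed in a proposal, certificate, or the ledger.
/// * Each unseen ID is assigned to exactly one validator per round via
///   `(id + round) % num_validators` (an assumed scheme). The rotation by
///   round means a transmission assigned to an offline validator is picked
///   up by a different validator in a later round.
pub fn select_for_reproposal(
    cache: &HashMap<TransmissionId, Vec<u8>>,
    seen: &HashSet<TransmissionId>,
    validator_index: u64,
    num_validators: u64,
    round: u64,
) -> Vec<TransmissionId> {
    assert!(num_validators > 0, "committee must not be empty");
    let mut ids: Vec<TransmissionId> = cache
        .keys()
        .copied()
        // Skip anything that has already landed somewhere.
        .filter(|id| !seen.contains(id))
        // Keep only the IDs assigned to this validator in this round.
        .filter(|id| id.wrapping_add(round) % num_validators == validator_index)
        .collect();
    // Sort for a deterministic result, since HashMap iteration order is not.
    ids.sort_unstable();
    ids
}
```

A validator would call such a helper on a timer, for example every PRIMARY_PING_IN_MS, and add the returned IDs to its next proposal.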

Note that this approach to optimistic broadcast only works when the validators' routers are well-connected. With large networks and a peer limit of 21, that may not be the case, so we may also want to move transmission broadcasts to the Gateway.
