Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

replace engine.Unit with ComponentManager in Pusher Engine #6747

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

tim-barry
Copy link

Using #4219 as an example. Instead of starting new goroutines or directly processing messages in a blocking way, add messages to a queue that a worker pulls from.

The Pusher engine still currently implements network.Engine rather than network.MessageProcessor.

Using #4219 as an example.
Instead of starting new goroutines or directly processing messages
in a blocking way, messages are added to a queue that a worker pulls from.
The Pusher engine still currently implements network.Engine rather than
network.MessageProcessor.
@codecov-commenter
Copy link

codecov-commenter commented Nov 21, 2024

Codecov Report

Attention: Patch coverage is 65.15152% with 23 lines in your changes missing coverage. Please review.

Project coverage is 41.13%. Comparing base (a3676ba) to head (b5418dd).

Files with missing lines Patch % Lines
engine/collection/pusher/engine.go 66.15% 21 Missing and 1 partial ⚠️
engine/testutil/mock/nodes.go 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6747      +/-   ##
==========================================
- Coverage   41.23%   41.13%   -0.11%     
==========================================
  Files        2060     1896     -164     
  Lines      182635   169042   -13593     
==========================================
- Hits        75316    69533    -5783     
+ Misses     101011    93862    -7149     
+ Partials     6308     5647     -661     
Flag Coverage Δ
unittests 41.13% <65.15%> (-0.11%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.


🚨 Try these New Features:

@tim-barry tim-barry marked this pull request as draft November 21, 2024 17:43
tim-barry and others added 4 commits November 21, 2024 10:48
Because the event processing now happens in a worker, any errors
raised within it are no longer visible to the caller of Process().
Because the test checked for error status, moved the tests to the
same package and call the internal processing function directly.
When using Unit, calling Ready would also start the engine.
With ComponentManager, we additionally need to invoke Start.
@tim-barry tim-barry marked this pull request as ready for review November 22, 2024 18:56
asSCGMsg := asEngineWrapper.Payload.(*messages.SubmitCollectionGuarantee)
originID := asEngineWrapper.OriginID

_ = e.process(originID, asSCGMsg)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems incorrect to fully discard any error, should likely be logged instead?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we should be handling all possible error cases. In general, all functions which return an error should document all possible expected error cases in their documentation. Callers which observe an error return that they don't know explicitly how to handle safely, should propagate the error further up the stack. Eventually this results in either the component or entire process restarting.

Here is the part of our coding guidelines talking about error handling: https://github.com/onflow/flow-go/blob/master/CodingConventions.md#error-handling. In short, the rule is to prioritize safety over liveness.

Practically, the way we would do this here is using the irrecoverable.SignalerContext. That has a Throw method which will signal to restart the component or process.

I usually propagate errors all the way up the stack to where the SignalerContext is defined (here, that would be in inboundMessageWorker) then do ctx.Throw(err). You can also use irrecoverable.Throw to throw any context.Context.

Change Suggestions:

  • In general, make sure all functions in the file which return an error document their expected error returns (or the lack of expected errors). Here is an example of error documentation.
  • Propagate the error value to inboundMessageWorker and Throw unexpected errors at that frame.

Copy link
Member

@AlexHentschel AlexHentschel Nov 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree with your comment that discarding any error is bad. Please take a look at our Coding Conventions, if you haven't already. Tldr:

  • just discarding any error: very bad
  • logging error and continuing: still bad (this is the pattern of continuing on best effort basis, which is not a viable approach for high-assurance software systems)
  • desired:
    • explicitly check error type and only continue on errors that are confirmed to be benign.
    • all other errors (outside of the specified benign code path) are fatal

The pusher.Engine is a bit of a simpler case, because it only distributes information to other nodes. So swallowing errors does not undermine safety as it would for many other Engines. BUT, we already had cases where nodes were logging OutOfMemory errors for hours, but the code just dropped those and the node was dysfunctional without the software noticing it ... so even if it is not safety critical, we still generally expect that errors are not swallowed (and neither logged and then just discarded).

There are exceptions to this rule, e.g. if it would be a massive time investment to clear up technical debt and thoroughly documenting possible benign errors. But then, a detailed comment is absolutely required why it is always fine to log and discard any errors in that particular case.

I would suggest to leave error handling for a bit and come back to it after some iterations. After having implemented some of the cleanup and simplifications I suggest further below, I suspect that error handling will become a lot more straightforward.

// Done returns a done channel that is closed once the engine has fully stopped.
func (e *Engine) Done() <-chan struct{} {
return e.unit.Done()
func (e *Engine) processInboundMessages(ctx context.Context) {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should probably make sure functions have doc comments

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, please add a short godoc comment to each function (especially public functions, but ideally private ones too).

@AlexHentschel
Copy link
Member

Issue for the present PR: https://github.com/onflow/flow-go-internal/issues/7018 (it helps reviewers when its linked in the PR description, so they can read up on the context)

Copy link
Member

@jordanschalm jordanschalm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The overall structure of changes looks good. I've added some thoughts related to transitioning to network.MessageProcessor, which we can discuss separately.

In terms of things to address in this PR:

  • Make sure all functions have minimal documentation, including a description of expected error returns (if the function returns any error).
  • Make sure invocations of a function that returns expected errors explicitly handles those errors.
  • Throw unexpected top-level errors to the SignalerContext

asSCGMsg := asEngineWrapper.Payload.(*messages.SubmitCollectionGuarantee)
originID := asEngineWrapper.OriginID

_ = e.process(originID, asSCGMsg)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we should be handling all possible error cases. In general, all functions which return an error should document all possible expected error cases in their documentation. Callers which observe an error return that they don't know explicitly how to handle safely, should propagate the error further up the stack. Eventually this results in either the component or entire process restarting.

Here is the part of our coding guidelines talking about error handling: https://github.com/onflow/flow-go/blob/master/CodingConventions.md#error-handling. In short, the rule is to prioritize safety over liveness.

Practically, the way we would do this here is using the irrecoverable.SignalerContext. That has a Throw method which will signal to restart the component or process.

I usually propagate errors all the way up the stack to where the SignalerContext is defined (here, that would be in inboundMessageWorker) then do ctx.Throw(err). You can also use irrecoverable.Throw to throw any context.Context.

Change Suggestions:

  • In general, make sure all functions in the file which return an error document their expected error returns (or the lack of expected errors). Here is an example of error documentation.
  • Propagate the error value to inboundMessageWorker and Throw unexpected errors at that frame.

Comment on lines +54 to +58
// TODO length observer metrics
inbound, err := fifoqueue.NewFifoQueue(1000)
if err != nil {
return nil, fmt.Errorf("could not create inbound fifoqueue: %w", err)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add those length metrics now for completeness. You can use this engine as an example. We'll also need to add a parameter for the mempoolMetrics to the constructor.

// Done returns a done channel that is closed once the engine has fully stopped.
func (e *Engine) Done() <-chan struct{} {
return e.unit.Done()
func (e *Engine) processInboundMessages(ctx context.Context) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, please add a short godoc comment to each function (especially public functions, but ideally private ones too).

Comment on lines +49 to +51
// TODO convert to network.MessageProcessor
var _ network.Engine = (*Engine)(nil)
var _ component.Component = (*Engine)(nil)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may also be a good exercise to make the change to MessageProcessor here, though likely as a separate PR.

I'll add some thoughts below; I suggest we spend some time discussing this live as well. There is no need to implement any of the comments below before we have a chance to discuss.

The pusher engine is unusual in a couple of ways:

  1. It is using an old framework for internal message-passing, where two components in the same process would pass messages between each other using standard methods (SubmitLocal and ProcessLocal). The meaning of the message was carried by its type (here: messages.SubmitCollectionGuarantee) and the logical structure was similar to that of messages received over the network. Message processing logic needed to potentially handle inputs originating both locally and remotely.
  2. It does not expect to receive any inbound messages from non-local sources. It is send-only from a networking perspective.

Replacing network.Engine with network.MessageProcessor there are a few opportunities for simplifying the implementation:

  • Currently the internal message-passing is done through SubmitLocal. We can remove SubmitLocal and ProcessLocal and replace them with a context-specific function (and corresponding interface type, for use in the caller).
    • A reference example is the Compliance interface, which is exposed by the compliance engine for local message-passing.
    • For this reason, engine.MessageHandler may not be needed in this context. It is better suited for cases where we are handling external messages.
    • We then do not need to check originIDs and do not need both onSubmitCollectionGuarantee and SubmitCollectionGuarantee methods
    • Now, this context-specific function is the only way for local messages to enter the pusher engine, which will simplify our network message processing logic 👇
  • In order to satisfy network.MessageProcessor, we need a Process method, which accepts messages from remote network sources. Since this engine does not expect any remote messages, we should reject all inputs to Process.
    • In general, when we observe an unexpected message from another node, we should flag it as a possible Byzantine message. In the future, these messages would result in a slashing challenge targeting the originator of the message. For now, we can log a warning with the relevant log tag.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1. Had a similar thought 👉 my comment

// started.
func (e *Engine) Ready() <-chan struct{} {
return e.unit.Ready()
func (e *Engine) inboundMessageWorker(ctx irrecoverable.SignalerContext, ready component.ReadyFunc) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a short godoc for this method

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

})
err := e.messageHandler.Process(e.me.NodeID(), event)
if err != nil {
engine.LogError(e.log, err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
engine.LogError(e.log, err)
// TODO: do not swallow errors here, remove this function when transitioning to `network.MessageProcessor`
engine.LogError(e.log, err)

})
err := e.messageHandler.Process(originID, event)
if err != nil {
engine.LogError(e.log, err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
engine.LogError(e.log, err)
// TODO: do not swallow errors here, remove this function when transitioning to `network.MessageProcessor`
engine.LogError(e.log, err)

err := e.messageHandler.Process(originID, event)
if err != nil {
engine.LogError(e.log, err)
}
}

// ProcessLocal processes an event originating on the local node.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently we are directly returning the result of MessageHandler.Process. That function documents a possible expected error return type. At this point we should either:

  • include that expected error type as a possible return value from this function
  • explicitly handle the expected error, or explain why it is not actually expected in this specific context

In this case, the event is originating from a local source (another component in the same process). In general we trust locally generated inputs, so here we could add a comment explaining that, in this context, an IncompatibleInputTypeError should be treated as an exception.

Here is an example where we observe a sentinel error from function "f" and document why, in the context of the caller of "f", that error is an exception.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of improving the implementation of the deprecated Engine interface, I think we should change the implementation to an application-specific interface (see this and this comment)

Copy link
Member

@AlexHentschel AlexHentschel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR. My comments are not a full review (yet). Rather I am aiming at highlighting the areas to pay attention to. When working on an issue, part of the goal is to address the issue, though fixing the issue is frequently not sufficient: one of the major challenges with implementing a high-assurance software system such as Flow is managing intellectual complexity, code complexity and component isolation. When an engineer is required to grok the entire code base or a large portion thereof in order to make any change, then we have lost the ability to extend and maintain the code base on business-relevant time scales.

The Pusher Engine is from a very early time when most engineers knew most of the code. As the code base grows more complex, by adding functionality towards the mature architecture, we need more precision and rigour in the implementation.

We appreciate that it is super hard for new engineers to identify technical debt as such. Therefore, I have provided detailed context why certain aspects are technical debt. As you grow more experienced and familiar with our best practises, it will become easier to spot tech debt. Nevertheless, I want to be clear about expectations: cleaning up tech debt in/around the code you touch as time permits. Always leave the code in a better state than you found it, especially wrt:

  • documentation in general (fixing outdated goDoc and adding missing docs)
  • error documentation in particular and error handling
  • code clarity and removing unnecessary abstraction layers
  • clearly apparent reasoning for correctness (either by the code itself or through added documentation)
  • API safety (is it hard to use the API incorrectly?)
  • test coverage

For quite some time, we will be helping you with spotting areas of tech debt and how to address them. If you see something that looks like tech debt and you don't know how to fix is, please ask - I loved your comment here. I can give you one recommendation: if you don't see anything to be improved at a piece of code (aside from implementing your issue) you should probably ask somebody for input. The code we touch tends to generally be older code (because the newer code often has already the desired features) and the older code is generally in dire need for cleaning up tech debt. Its frequent that engineers spend just as much time, frequently even longer, cleaning up tech debt than working on the actual issue. I think this will also be the case for the Pusher Engine.

Comment on lines 29 to 31
// Engine is the collection pusher engine, which provides access to resources
// held by the collection node.
type Engine struct {
Copy link
Member

@AlexHentschel AlexHentschel Nov 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very very ambiguous phrasing: "provides access to resources held by the collection node." By which means does it "provides access" (request-response?). What specific "resources" does the engine provide access to (all of the resources, many, ...?).

Suggested change
// Engine is the collection pusher engine, which provides access to resources
// held by the collection node.
type Engine struct {
// Engine is part of the Collection Nodes. It broadcasts finalized collections
// that the cluster generates to the broader network.
type Engine struct {

asSCGMsg := asEngineWrapper.Payload.(*messages.SubmitCollectionGuarantee)
originID := asEngineWrapper.OriginID

_ = e.process(originID, asSCGMsg)
Copy link
Member

@AlexHentschel AlexHentschel Nov 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree with your comment that discarding any error is bad. Please take a look at our Coding Conventions, if you haven't already. Tldr:

  • just discarding any error: very bad
  • logging error and continuing: still bad (this is the pattern of continuing on best effort basis, which is not a viable approach for high-assurance software systems)
  • desired:
    • explicitly check error type and only continue on errors that are confirmed to be benign.
    • all other errors (outside of the specified benign code path) are fatal

The pusher.Engine is a bit of a simpler case, because it only distributes information to other nodes. So swallowing errors does not undermine safety as it would for many other Engines. BUT, we already had cases where nodes were logging OutOfMemory errors for hours, but the code just dropped those and the node was dysfunctional without the software noticing it ... so even if it is not safety critical, we still generally expect that errors are not swallowed (and neither logged and then just discarded).

There are exceptions to this rule, e.g. if it would be a massive time investment to clear up technical debt and thoroughly documenting possible benign errors. But then, a detailed comment is absolutely required why it is always fine to log and discard any errors in that particular case.

I would suggest to leave error handling for a bit and come back to it after some iterations. After having implemented some of the cleanup and simplifications I suggest further below, I suspect that error handling will become a lot more straightforward.

Comment on lines +61 to +64
messageHandler := engine.NewMessageHandler(log, notifier, engine.Pattern{
Match: engine.MatchType[*messages.SubmitCollectionGuarantee],
Store: &engine.FifoMessageStore{FifoQueue: inbound},
})
Copy link
Member

@AlexHentschel AlexHentschel Nov 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would recommend to not use the MessageHandler here for the following reasons:

  • I think it is good to apply the patter of the MessageHandler at least once by hand ("learning by doing")
  • The MessageHandler adds a layer of abstraction, if we can do without it that would be preferable in my opinion.
  • In this particular case, the Engine only processes a single input type SubmitCollectionGuarantee. We have tried it before and the message handler is actually more code compared to just doing the same logic by hand.

Copy link
Member

@AlexHentschel AlexHentschel Nov 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please take a look at the core processing logic in the Engine and lets try to remove layers of complexity and indirection where possible:

func (e *Engine) onSubmitCollectionGuarantee(originID flow.Identifier, req *messages.SubmitCollectionGuarantee) error {
if originID != e.me.NodeID() {
return fmt.Errorf("invalid remote request to submit collection guarantee (from=%x)", originID)
}
return e.SubmitCollectionGuarantee(&req.Guarantee)
}
// SubmitCollectionGuarantee submits the collection guarantee to all consensus nodes.
func (e *Engine) SubmitCollectionGuarantee(guarantee *flow.CollectionGuarantee) error {

Looking at this code raises a few questions:

  • we are only propagating collections that are originating from this node (👉 code). But the exported method SubmitCollectionGuarantee is not checking this? Why not?

    • If it is an important check, we should always check it. If we are fine with collections from other nodes being broadcast, then we don't need the check in onSubmitCollectionGuarantee either..
    • turns out, SubmitCollectionGuarantee is only ever called by the pusher.Engine itself (not even the tests call it).

    Lets simplify this: merge SubmitCollectionGuarantee and onSubmitCollectionGuarantee.

  • Further digging into the code, I found that the originID is just completely useless in this context:

    • the pusher.Engine is fed only from internal components (so originID is always filled with the node's own identifier)
    • the pusher.Engine drops messages that are not from itself. Just hypothetically, lets assume that there were collections with originID other than the node's own ID. ❗Note that those messages would be queued first, wasting resources and then we have a different thread discard the Collection after dequeueing. Checks like that should always be applied before queueing irrelevant messages.

    What you see here is a pattern we struggle with a lot: the code is very ambiguous and allows a whole bunch of stuff that is never intended to happen in practise. This is an unsuitable design for engineering complex systems such as Flow, as we then have to manually handle undesired cases.

    • Example 1: messages other than SubmitCollectionGuarantee. In comparison, if we had a method that only accepted SubmitCollectionGuarantee, the compiler would enforce this constraint for us as opposed to us having to write code. Often, people don't bother to manually disallow undesired inputs, resulting in software that is doing something incorrect in case of bugs as opposed to crashing (or the code not compiling in the first place)
    • Example 2: originID. We employ a generic patter which obfuscates that we don't need an originID in this particular example, because the Engine interface does not differentiate between node-external inputs (can be malicious) vs inputs from other components within its own node (always trusted) and mixes modes of processing (synchronous vs asynchronous).

    For those reasons, we deprecated the Engine interface. Lets try to update the pusher and remove the engine interface (for concrete steps towards this, please see my next comment).

For Flow, the APIs should generally be as descriptive and restrictive as possible and make it hard to feed in wrong / mismatching inputs. Writing extra lines of boilerplate code is not a challenge for Flow. In contrast, it is challenging if you have to dig through hundreds or thousands of lines of code because the APIs are too generic and you need to know implementation details to use them correctly.

Comment on lines 130 to 131
// SubmitLocal submits an event originating on the local node.
func (e *Engine) SubmitLocal(event interface{}) {
Copy link
Member

@AlexHentschel AlexHentschel Nov 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As an exercise, try to figure out which other components are feeing inputs into the pusher.Engine. We are lucky here because we have a rarely used message type SubmitCollectionGuarantee. But just imagine you would see a log that says no matching processor for message of type flow.Header from origin. With the Engine interface it is way too time intensive to figure out which components inside a node are feeding which other components with data.

Turns out that the Finalizer is the only component that is feeding the pusher.Engine (other than messages arriving from the networking layer). Let's make our lives easier in the future by improving the code:

  • introduce a dedicated interface GuaranteedCollectionPublisher. It only has one type-safe method
    SubmitCollectionGuarantee(guarantee *flow.CollectionGuarantee)
    this method just takes the message and puts it into the queue. Then we know that the queue will only ever contain elements of type flow.CollectionGuarantee. (currently, the Finalizer creates the wrapper messages.SubmitCollectionGuarantee around CollectionGuarantee ... and the pusher.Engine unwraps it and throws the wrapper away 😑 ).
  • change the Finalizer and all layers in between to work with the GuaranteedCollectionPublisher interface. This improves code clarity: currently, engineers would need to know what functionality was backing the generic Engine interface, e.g. here: because the right type of inputs need to be fed into the pusher.Engine. When using the GuaranteedCollectionPublisher interface we have the following benefits:
    • the type itself carries information about what functionality is behind this interface. Huge improvement of clarity.
    • its easy to find all places that can feed the Pusher with data, because we work with concrete interfaces and context-specific methods (e.g. SubmitCollectionGuarantee).
    • the compiler will reject code that feeds inputs with incompatible type into the Pusher
  • In my opinion, it would improve clarity if we were to discard all inputs from the network.MessageProcessor - conceptually they are messages from different node, which they are already broadcasting. The Pusher is for broadcasting our own messages. But at the moment, inputs from node-internal components as well as messages from other nodes passing through the networking layer share the same code path in the pusher.Enging, which makes it non-trivial to figure out where messages are coming from.

To summarize: one of the main benefits of the Component interface is that it provides lifecycles methods, but no methods for ingesting data - on purpose. This is because all data from components within the node should be passed via context-specific methods. The only exception is the interface network.MessageProcessor, which would still call Process with a generic message type.

Sorry for the lengthy comment. I hope you see how the very ambiguous structure of the Engine makes it really hard to determine from the code what we are expecting and where to drop messages, because its all hidden behind layers of ambiguous interfaces in a huge code base.

func New(log zerolog.Logger, net network.EngineRegistry, state protocol.State, engMetrics module.EngineMetrics, colMetrics module.CollectionMetrics, me module.Local, collections storage.Collections, transactions storage.Transactions) (*Engine, error) {
// TODO length observer metrics
inbound, err := fifoqueue.NewFifoQueue(1000)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure about the queue capacity we should pick here. I think we should have a good argument (and document it!) of why the number we chose makes sense. If we don't want to go through the trouble of defining a sensible limit, I think we could make this clear by using an unbounded queue (implementation).

Anyway, I think a limit makes sense:

  • In a nutshell, a collection must reference a block and expires $\texttt{DefaultTransactionExpiry}$ = 600 blocks thereafter
    // Let E by the transaction expiry. If a transaction T specifies a reference
    // block R with height H, then T may be included in any block B where:
    // * R<-*B - meaning B has R as an ancestor, and
    // * R.height < B.height <= R.height+E
    const DefaultTransactionExpiry = 10 * 60
    (its a subtle mechanic, feel free to ignore)
  • consensus nodes produce about $\rho$ = 1.25 blocks per second
  • collector clusters produce up to $\varrho$ = 3 collections per second

So very very roughly, between a collection being first proposed (usually referencing the newest known block) to the collection expiring, the collector cluster will produce $\texttt{DefaultTransactionExpiry}\cdot \rho^{-1} \cdot \varrho = 1440$ collections. So if we queued 1500 collections max, then we would be relatively confident that collections have been broadcast before they expire.
The downside is that collections might sit in the queue for quite some time, if several hundreds are before them. That could be an argument of queuing only much fewer collections (e.g. 200), so collection that don't make it out fast are dropped.
Another argument we could make is actually using a LIFO queue. If the queue is nearly empty, there is little difference. However, if there is a large backlog of collections to be broadcast, I think its a valid argument of broadcasting the newest first, at least then, some collections make it in time as opposed to all of them being queued past their expiry. Just some thoughts.

@jordanschalm what do you think?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants