Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(runtime) - Mitigate the receipt size limit bug #12633

Open
wants to merge 14 commits into
base: master
Choose a base branch
from

Conversation

jancionear
Copy link
Contributor

@jancionear jancionear commented Dec 17, 2024

There is a bug (#12606) which allows to create receipts which are slightly larger than max_receipt_size.

Let's mitigate the issue by making the runtime able to handle receipts that are above max_receipt_size.
First the size limit check is relaxed to check only new receipts. Then the receipt forwarding logic is modified to pretend that all receipts have congestion_size below max_receipt_size.
This mostly matches the code that was added in 2.3.1 and 2.4.0, but this PR contains one more change that is needed for bandwidth scheduler. Before bandwidth scheduler we were able to forward receipts that are slightly above the max_receipt_size because the big outgoing limit was 4.5 MiB and try_forward pretended that all receipts are at most 4 MiB. With bandwidth scheduler this changes, bandwidth scheduler expects all receipts to be at most max_receipt_size, otherwise there's no guarantee that the receipt will ever be forwarded. To deal with this there's the change which pretends that all receipts are at most max_receipt_size when generating bandwidth requests.
This is just a mitigation, a proper fix will be done in the future.

One more advantage of this patch is that it should allow us to reduce max_receipt_size without crazy protocol upgrade logic. We could reduce it to e.g 2MB and the scheduler would still be able to handle 4MB receipts created in the previous protocol version.

I added some of tests which try to create a receipt above max size and check that everything still works fine.

As usual the test_transaction_limit_for_local_congestion test broke when I added more methods to the test-contract-rs contract. For some reason this test is very sensitive to contract size and the congestion level rapidly drops when size of the contract increases. To deal with this I added a separate small contract that is used exclusively by these congestion control tests, that should fix the issue once and for all.

The change is divided into commits for easier review.

@jancionear jancionear requested review from nagisa and wacban December 17, 2024 02:51
@jancionear jancionear requested a review from a team as a code owner December 17, 2024 02:51
Copy link

codecov bot commented Dec 17, 2024

Codecov Report

Attention: Patch coverage is 89.74359% with 4 lines in your changes missing coverage. Please review.

Project coverage is 70.54%. Comparing base (88ec6c8) to head (48765df).

Files with missing lines Patch % Lines
runtime/runtime/src/congestion_control.rs 82.60% 4 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           master   #12633       +/-   ##
===========================================
+ Coverage    1.66%   70.54%   +68.87%     
===========================================
  Files         669      847      +178     
  Lines      119942   172848    +52906     
  Branches   119942   172848    +52906     
===========================================
+ Hits         2001   121933   +119932     
+ Misses     117838    45813    -72025     
- Partials      103     5102     +4999     
Flag Coverage Δ
backward-compatibility 0.16% <0.00%> (-0.01%) ⬇️
db-migration 0.16% <0.00%> (-0.01%) ⬇️
genesis-check 1.36% <0.00%> (-0.01%) ⬇️
linux 69.22% <66.66%> (+67.55%) ⬆️
linux-nightly 70.14% <89.74%> (?)
pytests 1.66% <0.00%> (-0.01%) ⬇️
sanity-checks 1.47% <0.00%> (-0.01%) ⬇️
unittests 70.37% <89.74%> (?)
upgradability 0.20% <0.00%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

@wacban wacban left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

/// NewReceipt validation. Tolerates some receipts that wouldn't pass new validation. It has to
/// be less strict because:
/// 1) Older receipts might have been created before new validation rules.
/// 2) There is a bug which allows to create receipts that are above the size limit. Runtime has
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought it's not possible to create them above the size limit but that the receipt may increase in size later?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here I used "create" in a broader sense, I wanted to say that it's possible for a large receipt to occur.
And when a max size value is returned, the created receipt is larger than max_receipt_size, so I'd say that counts as "create"

/// 2) There is a bug which allows to create receipts that are above the size limit. Runtime has
/// to handle them gracefully until the receipt size limit bug is fixed.
/// See https://github.com/near/nearcore/issues/12606 for details.
OldReceipt,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. I can see that the delayed and incoming receipts are validated but the buffered receipts are not, as far as I can see. Is it an omission?
  2. Is the validation for old receipts actually necessary or is it just an extra protection? Those receipts are a result of on chain computation so I would assume we trust them here already.

Copy link
Contributor Author

@jancionear jancionear Jan 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think validating old receipts is just a sanity check. When an old receipt fails validation the runtime panics, so we don't expect this validation to ever fail. I think it's fine to not validate buffered receipts.

Comment on lines 402 to 403
if size > max_receipt_size {
if size > max_receipt_size {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this condition got duplicated

if size > max_receipt_size {
tracing::warn!(
target: "runtime",
"try_forward observed a receipt with size exceeding the size limit! receipt_id: {} size: {} size_limit: {}",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: use structured logging - set the values as fields rather than in the formatted message

tracing::warn!(
  receipt_id=?..
  size,
  max_size,
  "try_forward observed a receipt with size exceeding the size limit"
);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find these structured logs a bit harder to read than manually formatted messages, but I guess that's the proper way to do it. Changed to structured.


// A function call will generate a new receipt. Size of this receipt will be equal to
// `max_receipt_size`, it'll pass validation, but then `output_data_receivers` will be modified and
// the receipt's size will go above max_receipt_size. The receipt should be rejected, but currently
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you assert that this condition is actually triggered?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added an assert that there really was a receipt with size above max_receipt_size.

Comment on lines +1787 to +1788
/// Returns a value of size "value_size".
/// Accepts json args, e.g {"value_size": 1000}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just out of curiosity, why not pass it as simple integer instead of the json?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Json is easy to use and it's readable. I find it easier to think about than manually serializing/deserializing integers. And it's easier to add more arguments to a function.

Comment on lines +7 to +8
#[allow(unused)]
extern "C" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Is all of it needed?
  2. Is there any way we can reuse the existing one or refactor it to avoid copy pasting?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah there's a lot of new functions. I tried to use the existing ones, but I want to do very specific things and the existing ones don't really do what I want to do.
It'd be nice if there was an easy way to create a per-test contract, maybe inside the test code itself. One giant contract gets a bit chaotic :/

@jancionear jancionear requested a review from wacban January 5, 2025 14:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants