go/runtime: Support hot-loading of runtime bundles #5961

peternose · 2024-12-06T22:35:57Z

Example status output:

"bundles": [
  {
    "name": "test-runtime",
    "id": "8000000000000000000000000000000000000000000000000000000000000000",
    "components": [
      {
        "kind": "ronl",
        "version": {}
      },
      {
        "kind": "rofl",
        "version": {}
      }
    ]
  }
],

Since runtime versions can be configured dynamically, and none of them may be active at a given moment, this function has no meaningful purpose in its current form and how it is currently used.

Extracted the creation of the provisioner, host info and caching quote service into a dedicated helper function to improve code readability.

Fixes a bug that occurs when attempting to abort the runtime immediately after it starts.

netlify · 2024-12-06T22:36:12Z

✅ Deploy Preview for oasisprotocol-oasis-core canceled.

Name	Link
🔨 Latest commit	`979c1c2`
🔍 Latest deploy log	https://app.netlify.com/sites/oasisprotocol-oasis-core/deploys/675acb25e1adc00008166450

The node can now fetch and verify runtime bundles from remote repositories and automatically update to new versions.

codecov · 2024-12-07T03:17:08Z

Codecov Report

Attention: Patch coverage is 72.08589% with 273 lines in your changes missing coverage. Please review.

Project coverage is 65.33%. Comparing base (ba6735a) to head (35d2d45).
Report is 1 commits behind head on master.

Files with missing lines	Patch %	Lines
go/runtime/bundle/discovery.go	61.45%	75 Missing and 26 partials ⚠️
go/runtime/registry/registry.go	76.30%	24 Missing and 17 partials ⚠️
go/runtime/registry/config.go	70.33%	24 Missing and 11 partials ⚠️
go/runtime/bundle/registry.go	90.06%	9 Missing and 6 partials ⚠️
go/worker/common/committee/node.go	54.16%	9 Missing and 2 partials ⚠️
go/worker/keymanager/worker.go	65.51%	9 Missing and 1 partial ⚠️
go/runtime/host/sandbox/sandbox.go	61.90%	6 Missing and 2 partials ⚠️
go/runtime/host/tdx/qemu.go	0.00%	8 Missing ⚠️
go/runtime/registry/host.go	75.75%	4 Missing and 4 partials ⚠️
go/runtime/bundle/bundle.go	28.57%	4 Missing and 1 partial ⚠️
... and 13 more

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #5961      +/-   ##
==========================================
+ Coverage   64.69%   65.33%   +0.63%     
==========================================
  Files         629      631       +2     
  Lines       64297    64752     +455     
==========================================
+ Hits        41599    42304     +705     
+ Misses      17775    17485     -290     
- Partials     4923     4963      +40


ptrus

(self-note): only reviewed the discovery part now

go/runtime/bundle/discovery.go

ptrus · 2024-12-08T09:17:56Z

go/runtime/bundle/discovery.go

+		"filename", filename,
+	)
+
+	if err := d.registry.AddBundle(src, manifestHash); err != nil {


It is a bit scary that AddBundle needs to "Open" the (untrusted) bundle before verifying the manifest hash. I am not sure, but the current implementation of Open might assume that the bundle came from a (somewhat) trusted source, which might not be the case in the future. We should ensure that Open is safe to use with malicious inputs.

Somewhat related, I remember we talked about 38edded. Now that I have more context, shouldn't that actually be the hash of the entire bundle (not just the manifest)? That would prevent this problem and allow us to verify the downloaded file before opening it.

Validation is actually hidden in the Open function, so the bundle doesn't need to be trusted.

We can refactor this to use bundle checksum instead (maybe in another PR), as that might be more intuitive. However, the current approach allows us to remove the bundles entirely as one could just publish the manifest (e.g. on chain) and the users should then fetch binaries from untrusted sources and verify their checksums.

@kostko What are your thoughts?

The idea for using the manifest hash was that one would not need to always download all of the bundle artifacts as some may be shared between bundles.

We should probably add a separate Open variant (or have the existing one take options), one that takes a manifest hash. Obviously the zip archive parser should be well-fuzzed (which the current implementation is) and since the manifest must be the first file in the archive, we can just verify the manifest hash before continuing to load or parse anything else. I would opt for this variant.

BTW, we should do some bundle fuzzing, just to be sure these things are somewhat robust. Will open an issue.

@kostko I think in this context it would be worth adding some additional documentation of the current and intended status of bundles.

Here is what I have in mind:

right now we have legacy and non-legacy bundles (components)

furthermore we will have two ways of adding bundles (via the paths or configuring them to download).

finally, there is this thing with versions that is not 100% clear yet (what is allowed to be reused, what if you get the same version but different content, exact bundle validation).

I think it may be worth aggregating the requirements/invariants somewhere together with the legacy context: in addition to making them more accessible, it would also make it easier for us to find problematic corner cases/interleavings. Maybe a README.md under go/runtime/bundle? We can later move this to the official docs once those features are released?

Hope I am not complicating things, we could also discuss this offline :).

I think that runtime bundles don't need to be compatible with all previous versions, so I expect we will remove all these legacy fields once we are done refactoring and all runtimes upgrade to new bundles. So I'm not sure if we need a readme file.

go/runtime/bundle/registry.go

go/runtime/bundle/discovery.go

martintomazic · 2024-12-10T14:56:26Z

go/runtime/bundle/discovery.go

+// Init sets up bundle discovery using node configuration and adds configured
+// and cached bundles to the registry.
+func (d *Discovery) Init() error {
+	// Consolidate all bundles in one place, which could be useful
+	// if we implement P2P sharing in the future.
+	if err := d.copyBundles(); err != nil {
+		return err
+	}
+
+	// Add copied and cached bundles to the registry.
+	if err := d.Discover(); err != nil {
+		return err


Init only adds configured bundles to the registry. In addition, some of them may be already previously exploded (cached)? If this is the case somewhere nested WriteExploded will only verify...

However, it does not guarantee to add all the "cached" bundles, as looking at Discover method you only add the *.orc bundles to the registry (i.e. the ones that were configured). This is desired as some of the exploded bundles may be stale, i.e. no longer part of the config. (Automatic removal will be handled as part of #5737).

Do I understand your logic correctly? If so I would suggest removing cached bundles reference from the Init toplevel docs and before d.Discover() to avoid confusion?

Thanks for your reply! :)

All bundles are copied to the bundle folder:

Bundles from the config are copied in Init on line 79.

Bundles fetched from the web are first verified and then copied to the bundle folder.

So the Discovery loads all bundles in the config, all bundles that used to be in the config, and all bundles that were downloaded. The idea is that you always have the bundle and the extracted folder in the bundle directory. This way, it will be easy to implement P2P sharing, as one can just check which orc files are in the bundle directory.

I see and thanks.

So what I suggest to change (super minor but to avoid confusion):

// Init sets up bundle discovery using node configuration and adds configured
// and cached bundles to the registry.

and

// Add copied and cached bundles to the registry.

to

// Init sets up bundle discovery using node configuration and adds configured
// bundles (that are guaranteed to be exploded) to the registry.

and

// Add copied bundles (that are guaranteed to be exploded) to the registry.

or something along those lines, just to emphasize that we are not also moving stale cached bundles (see issue #5737)

Hope it makes sense(?), and feel free to reword as you see fit...

all bundles that used to be in the config

Correct since at init time the .orc file (if configured) was moved to /data/runtimes/bundles and exploded.
Note, nodes that were running previous software (prior to this PR/current version) may have exploded some bundles and later removed them from the config (and thus also the bundle), so only the exploded part will remain. I guess with #5737 the idea was to remove those stale "exploded" bundles plus the ones whose version is now outdated.

Now if we push for the p2p sharing, I guess we go in the other direction, i.e. we further increase the storage load on the nodes as they store essentially all the historical bundles and their exploded paths (technically we could be removing those once outdated and only store bundles). On the positive side we get a more resilient data availability for our software components.

Thinking aloud, if we want to have some kind of p2p sharing we may want only archive nodes responsible for that? For the normal nodes I don't think we want to store outdated bundles and exploded components. In any case we may want to adapt #5737 cleanup requirements?

Happy to tackle this myself in the follow-up once we align on the strategy.

cc @kostko

There is no sense in storing or sharing historic bundles, so any pruning should get rid of them IMO.

OK, this makes sense. RONL + system components in one bundle, and ROFL components in detached bundles.

Also note that ROFL components can only be fetched from detached bundles, not from bundles that contain RONLs.

Our code doesn't distinguish between ROFL and system components. But it would be nice to have a check that ROFLs are always in a detached bundle, and system components in the runtime bundle.

Should we add ROFLSystemKind Kind = "rofl-system" or SystemKind Kind = "system" and verify this in the code?

Oops, maybe we don't need this and we just mark the component as system ROFL once it is read from a non-detached bundle.

I thought "system components" can only be ROFL components with the current design? (Update: except for one and only one RONL.)

But it would be nice to have a check that ROFLs are always in a detached bundle, and system components in the runtime bundle.

If not, is this really "always"? If a system component is not a ROFL and may be used by different bundles, then you would also want to package it separately, i.e. detached.

So "system ROFL component" = non-detached ROFL. There is already the detached flag for this, but maybe it is not obvious? Or there could be an additional component attribute (e.g. system), which would only be allowed to be set if the bundle is non-detached?
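One possible reading of the rule discussed in this thread is sketched below: a detached bundle carries no RONL, while a non-detached (runtime) bundle carries exactly one RONL, and any ROFL in it is treated as a system component. The types, field names, and the `validate` function are hypothetical and heavily simplified; only the kind strings "ronl" and "rofl" and the notion of a detached flag come from the discussion above.

```go
package main

import (
	"errors"
	"fmt"
)

// Kind is a simplified stand-in for the component kind.
type Kind string

const (
	KindRONL Kind = "ronl"
	KindROFL Kind = "rofl"
)

// Component and Manifest are hypothetical, reduced to what the check needs.
type Component struct {
	Kind Kind
}

type Manifest struct {
	Detached   bool
	Components []Component
}

// validate enforces the assumed invariant: detached bundles contain no RONL,
// non-detached bundles contain exactly one RONL.
func validate(m Manifest) error {
	ronls := 0
	for _, c := range m.Components {
		if c.Kind == KindRONL {
			ronls++
		}
	}
	switch {
	case m.Detached && ronls > 0:
		return errors.New("detached bundle must not contain a RONL component")
	case !m.Detached && ronls != 1:
		return errors.New("runtime bundle must contain exactly one RONL component")
	}
	return nil
}

func main() {
	runtime := Manifest{Components: []Component{{KindRONL}, {KindROFL}}}
	detached := Manifest{Detached: true, Components: []Component{{KindROFL}}}
	invalid := Manifest{Detached: true, Components: []Component{{KindRONL}}}

	fmt.Println(validate(runtime) == nil, validate(detached) == nil, validate(invalid) != nil)
}
```

With a check like this, no extra kind such as "rofl-system" is needed: whether a ROFL is a system component follows from the detached flag of the bundle it was read from.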

martintomazic

I have checked the new discovery and registry part. Overall looks super solid. Will try to review the whole PR/integration tomorrow :)

go/runtime/bundle/registry.go

go/runtime/bundle/discovery.go

go/oasis-node/cmd/node/node.go

go/runtime/config/config.go

go/runtime/bundle/discovery.go

go/runtime/bundle/registry.go

martintomazic · 2024-12-11T12:07:12Z

go/runtime/bundle/discovery.go

+// Init sets up bundle discovery using node configuration and adds configured
+// and cached bundles to the registry.
+func (d *Discovery) Init() error {
+	// Consolidate all bundles in one place, which could be useful
+	// if we implement P2P sharing in the future.
+	if err := d.copyBundles(); err != nil {
+		return err
+	}
+
+	// Add copied and cached bundles to the registry.
+	if err := d.Discover(); err != nil {
+		return err


I see and thanks.

So what I suggest to change (super minor but to avoid confusion):

// Init sets up bundle discovery using node configuration and adds configured
// and cached bundles to the registry.

and

// Add copied and cached bundles to the registry.

to

// Init sets up bundle discovery using node configuration and adds configured
// bundles (that are guaranteed to be exploded) to the registry.

and

// Add copied bundles (that are guaranteed to be exploded) to the registry.

or something along those lines, just to emphasize that we are not also moving stale cached bundles (see issue #5737)

Hope it makes sense(?), and feel free to reword as you see fit...

go/runtime/bundle/registry.go

go/runtime/bundle/discovery.go

go/control/api/api.go

go/runtime/host/sandbox/sandbox.go

go/runtime/registry/config.go

go/runtime/bundle/manifest.go

kostko · 2024-12-12T08:36:47Z

go/runtime/bundle/discovery.go

+// Init sets up bundle discovery using node configuration and adds configured
+// and cached bundles to the registry.
+func (d *Discovery) Init() error {
+	// Consolidate all bundles in one place, which could be useful
+	// if we implement P2P sharing in the future.
+	if err := d.copyBundles(); err != nil {
+		return err
+	}
+
+	// Add copied and cached bundles to the registry.
+	if err := d.Discover(); err != nil {
+		return err


There is no sense in storing or sharing historic bundles, so any pruning should get rid of them IMO.

kostko · 2024-12-12T08:42:05Z

go/runtime/bundle/discovery.go

+		return fmt.Errorf("failed to construct URL: %w", err)
+	}
+
+	src, err := d.fetchBundle(url)


So this scheme assumes that all bundles are directly downloadable in the registry by their hash. I would like to avoid the need to copy bundles and instead do it like it was suggested in #5755 (comment), namely that there would be an indirection step:

You construct the URL based on the registry base URL and the manifest hash.

But instead of containing the entire bundle, this URL only contains metadata on where to find the bundle.

This way we can use existing GitHub releases and just point to those releases in these metadata files instead of needing to copy over the actual bundles.
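The indirection step described above could be sketched as follows. The metadata format is not defined anywhere in this thread, so the JSON document with a single `bundle_url` field, the helper names, and the example.com URLs are all hypothetical.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"net/url"
)

// metadata is a hypothetical document served at <registry base URL>/<manifest hash>.
type metadata struct {
	// BundleURL points to the actual bundle, e.g. a release asset.
	BundleURL string `json:"bundle_url"`
}

// metadataURL joins the registry base URL and the hex-encoded manifest hash.
func metadataURL(base, manifestHash string) (string, error) {
	return url.JoinPath(base, manifestHash)
}

// parseMetadata decodes a metadata document and requires an absolute HTTPS
// bundle URL.
func parseMetadata(data []byte) (*metadata, error) {
	var m metadata
	if err := json.Unmarshal(data, &m); err != nil {
		return nil, fmt.Errorf("failed to decode metadata: %w", err)
	}
	u, err := url.Parse(m.BundleURL)
	if err != nil {
		return nil, fmt.Errorf("failed to parse bundle URL: %w", err)
	}
	if u.Scheme != "https" || u.Host == "" {
		return nil, errors.New("bundle URL must be an absolute https URL")
	}
	return &m, nil
}

func main() {
	hash := "8000000000000000000000000000000000000000000000000000000000000000"
	metaURL, err := metadataURL("https://registry.example.com/bundles", hash)
	if err != nil {
		panic(err)
	}
	fmt.Println("metadata:", metaURL)

	// In a real flow this document would be fetched from metaURL.
	doc := []byte(`{"bundle_url": "https://downloads.example.com/releases/test-runtime.orc"}`)
	m, err := parseMetadata(doc)
	if err != nil {
		panic(err)
	}
	fmt.Println("bundle:", m.BundleURL)
}
```

Because the downloaded bundle is still checked against the manifest hash, the metadata file and the location it points to do not need to be trusted.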

go/runtime/bundle/registry.go

go/runtime/bundle/discovery.go

go/runtime/bundle/bundle.go

.changelog/5962.cfg.md

martintomazic · 2024-12-12T12:52:12Z

go/runtime/bundle/discovery.go

 	// requestTimeout is the time limit for http client requests.
 	requestTimeout = 10 * time.Second

-	// maxBundleSizeBytes is the maximum allowed bundle size in bytes.
-	maxBundleSizeBytes = 20 * 1024 * 1024 // 20 MB
+	// maxDefaultBundleSizeBytes is the maximum allowed default bundle size
+	// in bytes.
+	maxDefaultBundleSizeBytes = 20 * 1024 * 1024 // 20 MB
 )


Minor: Will a requestTimeout of 10s be sufficient for big bundles, or do we also want to make this configurable or a function of size?

Is this just an idle timeout (e.g. waiting for a response while nothing is being sent) or the max time it takes for the request to complete?

I would say for the request to complete, since we use it to set http.Client.Timeout:

// Timeout specifies a time limit for requests made by this
// Client. The timeout includes connection time, any
// redirects, and reading the response body. The timer remains
// running after Get, Head, Post, or Do return and will
// interrupt reading of the Response.Body.

What is the max size we anticipate? For reference, I see that our documentation recommends "Recommended: 1 Gbps internet connection with low latency".

Might be a good idea to future proof us by making this parametric...

peternose added 14 commits December 2, 2024 02:39

go/oasis-test-runner/scenario/e2e/runtime: Add additional logging

87eede7

go/worker/common/committee/group: Use runtime ID instead of runtime

6169531

go/runtime/history: Shorten name for has local storage worker flag

1f8cdf0

go/runtime/registry/host: Simplify runtime host node

41d102a

go/runtime/registry/host: Remove function WaitHostedRuntime

c65fabe

Since runtime versions can be configured dynamically, and none of them may be active at a given moment, this function has no meaningful purpose in its current form and how it is currently used.

go/runtime/registry/host: Provision every version only once

fa676c5

go/runtime/registry: Add runtime constructor

d46b4f8

go/runtime/registry/config: Refactor function newRuntimeConfig

b091681

Extracted the creation of the provisioner, host info and caching quote service into a dedicated helper function to improve code readability.

go/runtime/registry: Add support for adding new runtime versions

1e71b29

go/worker/common/committee: Provision newly discovered runtime versions

7f9f1f0

go/runtime/bundle/manifest: Add version to component

eca5d8d

go/runtime/host: Replace runtime bundle with exploded components

6e9f961

go/runtime/host/sandbox: Clear cmds before notifying of runtime start

f287deb

Fixes a bug that occurs when attempting to abort the runtime immediately after it starts.

go/runtime/bundle: Add recommended filename to bundle

4cb5c4b

peternose force-pushed the peternose/feature/hot-loading branch 2 times, most recently from 58dd5fb to 9e3cb33 Compare December 7, 2024 01:45

peternose added 3 commits December 7, 2024 03:50

go/runtime: Support hot-loading of runtime bundles

5d89782

The node can now fetch and verify runtime bundles from remote repositories and automatically update to new versions.

go/control/api: Add bundle status to GetStatus output

bb585be

keymanager: Bump hashbrown to 0.15.1

5da378e

peternose force-pushed the peternose/feature/hot-loading branch from 9e3cb33 to 5da378e Compare December 7, 2024 02:50

peternose marked this pull request as ready for review December 7, 2024 03:18

peternose requested review from kostko, peterjgilbert, pro-wh and ptrus as code owners December 7, 2024 03:18

peternose requested a review from martintomazic December 7, 2024 03:18

ptrus reviewed Dec 8, 2024

go/runtime/bundle/discovery: Add timeout to http requests

2123891

peternose added 2 commits December 8, 2024 13:04

go/runtime/bundle/discovery: Add bundle size limit

a20812f

go/runtime/bundle/discovery: Lock mutex in Init as late as possible

35d2d45

ptrus reviewed Dec 9, 2024

go/runtime/bundle/registry.go (outdated)

go/runtime/bundle/registry.go (outdated)

martintomazic reviewed Dec 10, 2024

go/runtime/bundle/discovery.go (outdated)

martintomazic reviewed Dec 10, 2024

martintomazic assigned martintomazic and unassigned martintomazic Dec 11, 2024

go/runtime/bundle/registry: Sort versions

c003abb

martintomazic reviewed Dec 11, 2024

kostko reviewed Dec 12, 2024

go/runtime/bundle/discovery.go (outdated)

kostko reviewed Dec 12, 2024

go/runtime/bundle/bundle.go

peternose added 3 commits December 12, 2024 12:11

go/runtime/bundle: Fix comments and errors

1e10b4e

go/runtime/bundle: Verify manifest hash when opening a bundle

ebe082c

go/runtime/bundle: Move component to a new file

96fc95a

peternose force-pushed the peternose/feature/hot-loading branch from 49e86ad to 45b2903 Compare December 12, 2024 11:11

kostko reviewed Dec 12, 2024

.changelog/5962.cfg.md (outdated)

peternose added 2 commits December 12, 2024 12:38

go/runtime/bundle: Make max bundle size configurable

38eeb50

go/runtime/bundle: Close zip file if Open fails

979c1c2

peternose force-pushed the peternose/feature/hot-loading branch from 45b2903 to 979c1c2 Compare December 12, 2024 11:38

martintomazic reviewed Dec 12, 2024
