-
Notifications
You must be signed in to change notification settings - Fork 113
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
go/runtime: Support hot-loading of runtime bundles #5961
base: master
Are you sure you want to change the base?
Conversation
Since runtime versions can be configured dynamically, and none of them may be active at a given moment, this function has no meaningful purpose in its current form and how it is currently used.
Extracted the creation of the provisioner, host info and caching quote service into a dedicated helper function to improve code readability.
Fixes a bug that occurs when attempting to abort the runtime immediately after it starts.
✅ Deploy Preview for oasisprotocol-oasis-core canceled.
|
58dd5fb
to
9e3cb33
Compare
The node can now fetch and verify runtime bundles from remote repositories and automatically update to new versions.
9e3cb33
to
5da378e
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #5961 +/- ##
==========================================
+ Coverage 64.69% 65.33% +0.63%
==========================================
Files 629 631 +2
Lines 64297 64752 +455
==========================================
+ Hits 41599 42304 +705
+ Misses 17775 17485 -290
- Partials 4923 4963 +40 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(self-note): only reviewed the discovery part now
"filename", filename, | ||
) | ||
|
||
if err := d.registry.AddBundle(src, manifestHash); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a bit scary that AddBundle
needs to "Open" the (untrusted) bundle before verifying the manifest hash. I am not sure, but the current implementation of Open
might assume that bundle came from a (somewhat) trusted source, which might not be the case in future. We should ensure that Open
is safe to use with malicious inputs.
Somehwat related, I remember we talked about 38edded . Now that I have more context, shouldn't that actually be the hash of the entire bundle (not just the manifest)? That would prevent this problem and allow us to verify the downloaded file before opening it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Validation is actually hidden in the Open
function, so the bundle doesn't need to be trusted.
We can refactor this to use bundle checksum instead (maybe in another PR), as that might be more intuitive. However, the current approach allows us to remove the bundles entirely as one could just publish the manifest (e.g. on chain) and the users should then fetch binaries from untrusted sources and verify their checksums.
@kostko What are your thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea for using the manifest hash was that one would not need to always download all of the bundle artifacts as some may be shared between bundles.
We should probably add a separate Open
variant (or have the existing one take options), one that takes a manifest hash. Obviously the zip archive parser should be well-fuzzed (which the current implementation is) and since the manifest must be the first file in the archive, we can just verify the manifest hash before continuing to load or parse anything else. I would opt for this variant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, we should do some bundle fuzzing, just to be sure these things are somewhat robust. Will open an issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kostko I think in this context it would be worth adding some additional doc of the current and intendend status of bundles.
Here is what I have in mind:
- right now we have legacy and non-legacy bundles (components)
- furthermore we will have two ways of adding bundles (via the
paths
or configuring them to download). - finally, there is this things with versions that is not 100% clear yet (what is allowed to be reused, what if you get same version but different content, exact bundle validation).
I think it may be worth aggregating requirements/invariant somewhere together with legacy context, as in addition to make it more accessible it would also ease us in finding problematic corner cases/interleaving. Maybe a README.md
under go/runtime/bundle
? We can later move this to official docs once those features are released?
Hope I am not complicating, we could also offline :).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that runtime bundles don't need to be compatible with all previous versions, so I expect we will remove all these legacy fields once we are done refactoring and all runtimes upgrade to new bundles. So I'm not sure if we need a readme file.
go/runtime/bundle/discovery.go
Outdated
// Init sets up bundle discovery using node configuration and adds configured | ||
// and cached bundles to the registry. | ||
func (d *Discovery) Init() error { | ||
// Consolidate all bundles in one place, which could be useful | ||
// if we implement P2P sharing in the future. | ||
if err := d.copyBundles(); err != nil { | ||
return err | ||
} | ||
|
||
// Add copied and cached bundles to the registry. | ||
if err := d.Discover(); err != nil { | ||
return err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Init
only adds configured bundles to the registry. In addition, some of them may be already previously exploded (cached)? If this is the case somewhere nested WriteExploded
will only verify...
However, it does not guarantee to add all the "cached" bundles, as looking at Discover
method you only add the *.orc
bundles to the registry (i.e. the ones that were configured). This is desired as some of the exploded bundles may be stale, i.e. no longer part of the config. (Automatic removal will be handled as part of #5737).
Do I understand your logic correctly? If so I would suggest removing cached bundles
reference from the Init
toplevel docs and before d.Discover()
to avoid confusion?
Thanks for your reply! :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All bundles are copied to the bundle folder.:
- Bundles from the config are copied in
Init
on line 79. - Bundles fetched from the web are first verified and then copied to the bundle folder.
So the Discovery
loads all bundles in the config, all bundles that used to be in the config, and all bundles that were downloaded. The idea is that you always have bundle and the extracted folder in the bundle directory. This way, it will be easy to implement p2p2 sharing as one can just check which orc files are in the bundle directory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see and thanks.
So what I suggest to change (super minor but to avoid confusion):
// Init sets up bundle discovery using node configuration and adds configured
// and cached bundles to the registry.
and
// Add copied and cached bundles to the registry.
to
// Init sets up bundle discovery using node configuration and adds configured
// bundles (that are guaranteed to be exploded) to the registry.
and
// Add copied bundles (that are guaranteed to be exploded) to the registry.
or smtg along the lines, just to emphasize that we are not also moving stale cached bundles (see issue #5737)
Hope it makes sense(?), and feel free to reword as you see fit...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
all bundles that used to be in the config
Correct since at init time the .orc
file (if configured) was moved to /data/runtimes/bundles
and exploded.
Note, the nodes that were running previous software (prior to this PR/current version), may have some bundles exploded, later removed them from config (and thus also bundle), so only the exploded part will stay. I guess with #5737 the idea was to remove those stale "exploded" bundles + the ones whose version is now outdated.
Now if we push for the p2p sharing, I guess we go in the other direction, i.e. we further increase the storage load on the nodes as they store essentially all the historical bundles and their exploded paths (technically we could be removing those once outdated and only store bundles). On the positive side we get a more resilient data availability for our software components.
Thinking aloud, if we want to have some kind of p2p sharing we may want only archive nodes responsible for that? For the normal nodes I don't think we want to store outdated bundles and exploded components. In any case we may want to adapt #5737 cleanup requirements?
Happy to tackle this myself in the follow-up once we align on the strategy.
cc @kostko
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no sense in storing or sharing historic bundles, so any pruning should get rid of them IMO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, this make sense. RONL + system components in one bundle, and ROFL components in detached bundles.
Also note that ROFL components can only be fetched from detached bundles, not from bundles that contain RONLs.
Our code doesn't distinguish between ROFL and system components. But it would be nice to have a check that ROFLs are always in a detached bundle, and system components in the runtime bundle.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we add ROFLSystemKind Kind= "rofl-system"
or SystemKind Kind = "system"
and verify this in the code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ups, maybe we don't need this and we just mark the component as system ROFL once is read from an undetached bundle.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have thought "system components" can only be ROFL component with current design? (update expect for one and only one RONL)
But it would be nice to have a check that ROFLs are always in a detached bundle, and system components in the runtime bundle.
If not is this really "always" then, if system component is not rofl and may be used by different bundles, then you also want to package it separately, i.e. detached?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So "system ROFL component" = non-detached ROFL. There is already the detached flag for this, but maybe it is not obvious? Or there could be an additional component attribute (e.g. system
), which would only be allowed to be set if the bundle is non-detached?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have checked new discovery and registry part. Overall looks super solid. Will try to review the whole PR/integration tmr :)
go/runtime/bundle/discovery.go
Outdated
// Init sets up bundle discovery using node configuration and adds configured | ||
// and cached bundles to the registry. | ||
func (d *Discovery) Init() error { | ||
// Consolidate all bundles in one place, which could be useful | ||
// if we implement P2P sharing in the future. | ||
if err := d.copyBundles(); err != nil { | ||
return err | ||
} | ||
|
||
// Add copied and cached bundles to the registry. | ||
if err := d.Discover(); err != nil { | ||
return err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see and thanks.
So what I suggest to change (super minor but to avoid confusion):
// Init sets up bundle discovery using node configuration and adds configured
// and cached bundles to the registry.
and
// Add copied and cached bundles to the registry.
to
// Init sets up bundle discovery using node configuration and adds configured
// bundles (that are guaranteed to be exploded) to the registry.
and
// Add copied bundles (that are guaranteed to be exploded) to the registry.
or smtg along the lines, just to emphasize that we are not also moving stale cached bundles (see issue #5737)
Hope it makes sense(?), and feel free to reword as you see fit...
go/runtime/bundle/discovery.go
Outdated
// Init sets up bundle discovery using node configuration and adds configured | ||
// and cached bundles to the registry. | ||
func (d *Discovery) Init() error { | ||
// Consolidate all bundles in one place, which could be useful | ||
// if we implement P2P sharing in the future. | ||
if err := d.copyBundles(); err != nil { | ||
return err | ||
} | ||
|
||
// Add copied and cached bundles to the registry. | ||
if err := d.Discover(); err != nil { | ||
return err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no sense in storing or sharing historic bundles, so any pruning should get rid of them IMO.
return fmt.Errorf("failed to construct URL: %w", err) | ||
} | ||
|
||
src, err := d.fetchBundle(url) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this scheme assumes that all bundles are directly downloadable in the registry by their hash. I would like to avoid the need to copy bundles and instead do it like it was suggested in #5755 (comment), namely that there would be an indirection step:
- You construct the URL based on the registry base URL and the manifest hash.
- But instead of containing the entire bundle, this URL only contains metadata on where to find the bundle.
This way we can use existing GitHub releases and just point to those releases in these metadata files instead of needing to copy over the actual bundles.
49e86ad
to
45b2903
Compare
45b2903
to
979c1c2
Compare
go/runtime/bundle/discovery.go
Outdated
// requestTimeout is the time limit for http client requests. | ||
requestTimeout = 10 * time.Second | ||
|
||
// maxBundleSizeBytes is the maximum allowed bundle size in bytes. | ||
maxBundleSizeBytes = 20 * 1024 * 1024 // 20 MB | ||
// maxDefaultBundleSizeBytes is the maximum allowed default bundle size | ||
// in bytes. | ||
maxDefaultBundleSizeBytes = 20 * 1024 * 1024 // 20 MB | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: Will requestTimeout
of 10s be sufficient for big bundles or do we also want to make this configurable/function of size?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this just idle timeout (e.g. waiting for response while nothing is being sent) or max time it takes for the request to complete?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would say for the request to complete since we use it to set http.Client.Timeout
// Timeout specifies a time limit for requests made by this
// Client. The timeout includes connection time, any
// redirects, and reading the response body. The timer remains
// running after Get, Head, Post, or Do return and will
// interrupt reading of the Response.Body.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the max size we anticipate? For the reference I see that our documentation recommends "Recommended: 1 Gbps internet connection with low latency".
Might be a good idea to future proof us by making this parametric...
Issue: #5755
Example status output: