Divergence-free worker forking #1139

Open
vigoo opened this issue Dec 5, 2024 · 0 comments

vigoo commented Dec 5, 2024

Follow-up task for #1138

With the implementation described in #1138, the forked worker has exactly the same oplog as the original one up to the oplog index cutoff, except for its first entry, OplogEntry::Create, which contains the new WorkerId.

Because the worker id is observable (via an environment variable), this can lead to divergence.

In this ticket, we solve this by introducing a "shadow worker id". This can be done in the following way:

  • Store a map of shadow worker ids associated with oplog ranges in the WorkerStatusRecord, similar to how DeletedRegions work.
  • Introduce a new oplog entry that adds a new entry to this map.
  • Make sure that folding over the oplog entries populates the latest state of this map.
  • Modify the host function that returns the environment variables so that it overrides the GOLEM_WORKER_ID variable whenever the current oplog index falls within one of the recorded shadow worker id regions.
  • Modify the forking feature to append this new oplog entry to the end of the oplog, after the copied entries.
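
The steps above could be sketched roughly as follows. This is only an illustration under assumptions, not the actual Golem implementation: apart from `OplogEntry::Create`, `WorkerId`, `WorkerStatusRecord` and `GOLEM_WORKER_ID`, every name here (`ShadowRegion`, `OplogEntry::ShadowWorkerId`, `fold_status`, `effective_worker_id`, `get_environment`, `fork_oplog`) is hypothetical, the types are heavily simplified, and oplog indices are assumed to be 1-based and the regions inclusive.

```rust
use std::collections::HashMap;

/// Simplified stand-in for the real worker id type.
#[derive(Clone, Debug, PartialEq, Eq)]
struct WorkerId(String);

/// Hypothetical: an inclusive oplog index range in which a shadow id applies.
#[derive(Clone, Debug, PartialEq, Eq)]
struct ShadowRegion {
    start: u64,
    end: u64,
    shadow_id: WorkerId,
}

/// Simplified oplog. `ShadowWorkerId` is the hypothetical new entry;
/// `Other` stands for every entry type not relevant to this sketch.
#[derive(Clone, Debug)]
enum OplogEntry {
    Create { worker_id: WorkerId },
    ShadowWorkerId { start: u64, end: u64, shadow_id: WorkerId },
    Other,
}

/// Only the part of the status record this sketch needs.
#[derive(Clone, Debug, Default)]
struct WorkerStatusRecord {
    shadow_regions: Vec<ShadowRegion>,
}

/// Folds over the oplog so the status record holds the latest regions.
fn fold_status(oplog: &[OplogEntry]) -> WorkerStatusRecord {
    let mut status = WorkerStatusRecord::default();
    for entry in oplog {
        if let OplogEntry::ShadowWorkerId { start, end, shadow_id } = entry {
            status.shadow_regions.push(ShadowRegion {
                start: *start,
                end: *end,
                shadow_id: shadow_id.clone(),
            });
        }
    }
    status
}

/// Returns the shadow id if `current_index` lies in a recorded region
/// (the most recently added region wins), otherwise the worker's own id.
fn effective_worker_id<'a>(
    status: &'a WorkerStatusRecord,
    own_id: &'a WorkerId,
    current_index: u64,
) -> &'a WorkerId {
    status
        .shadow_regions
        .iter()
        .rev()
        .find(|r| r.start <= current_index && current_index <= r.end)
        .map(|r| &r.shadow_id)
        .unwrap_or(own_id)
}

/// Sketch of the host function: overrides GOLEM_WORKER_ID in the
/// environment with the id that is effective at `current_index`.
fn get_environment(
    base: &HashMap<String, String>,
    status: &WorkerStatusRecord,
    own_id: &WorkerId,
    current_index: u64,
) -> HashMap<String, String> {
    let mut env = base.clone();
    let id = effective_worker_id(status, own_id, current_index);
    env.insert("GOLEM_WORKER_ID".to_string(), id.0.clone());
    env
}

/// Sketch of forking: copy entries 1..=cutoff, replace the first entry
/// with a `Create` for the new id, then append the shadow entry.
fn fork_oplog(
    source: &[OplogEntry],
    cutoff: u64,
    original_id: &WorkerId,
    new_id: &WorkerId,
) -> Vec<OplogEntry> {
    let mut forked: Vec<OplogEntry> =
        source.iter().take(cutoff as usize).cloned().collect();
    if let Some(first) = forked.first_mut() {
        *first = OplogEntry::Create { worker_id: new_id.clone() };
    }
    forked.push(OplogEntry::ShadowWorkerId {
        start: 1,
        end: cutoff,
        shadow_id: original_id.clone(),
    });
    forked
}
```

In this sketch, a worker forked at cutoff 3 would observe the original worker's id while its current oplog index is between 1 and 3, and its own id from index 4 onwards. Whether the region should also cover the appended shadow entry itself is a design choice left open here.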

vigoo added this to the Golem 1.2 milestone Dec 5, 2024