Forking a worker is a general-purpose feature that will be implemented in the worker executor, but it is not used for anything user-facing in Golem OSS at the moment.
The forking operation takes three parameters:
source worker id
target worker id
oplog index cutoff
The request must be executed in the source worker's worker executor, but the target worker id does not have to belong to a shard that is hosted by that executor.
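As a rough illustration of the three parameters, the request could be modeled as below. This is only a sketch: `ForkWorkerRequest`, the simplified `WorkerId` and `OplogIndex` wrappers, and the `check` helper are hypothetical names, not the actual Golem types or the gRPC message definition. The "source and target must differ" check is an assumption derived from the requirement that the source exists and the target does not.

```rust
/// Hypothetical, simplified identifier types; the real Golem types carry more structure.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct WorkerId(pub String);

#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct OplogIndex(pub u64);

/// The three parameters of the fork operation, as listed in the ticket.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ForkWorkerRequest {
    pub source_worker_id: WorkerId,
    pub target_worker_id: WorkerId,
    pub oplog_index_cutoff: OplogIndex,
}

impl ForkWorkerRequest {
    /// Cheap sanity check that needs no storage access (an assumption, not
    /// part of the ticket): a worker cannot be forked onto itself.
    pub fn check(&self) -> Result<(), String> {
        if self.source_worker_id == self.target_worker_id {
            return Err("source and target worker ids must differ".to_string());
        }
        Ok(())
    }
}
```

The existence checks for the source and target workers need access to the executor's state, so they are left out of this sketch.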
The implementation of this ticket must cover the following areas:
Define a new endpoint in the worker executor's gRPC API
Create or update an internal service in the worker executor - DefaultWorkerService is most likely a good fit to hold this new functionality, but if that causes difficulties, a new service can be introduced instead
Implement the actual worker forking in this service, and wire it to the gRPC request handler
Extend the test framework to be able to call fork from tests
Write at least one worker executor test for this
The following list is a draft of what steps the implementation would do in order to perform the forking:
Validate that the target worker ID does not exist and the source worker ID does exist
Get a Worker instance for the source worker with get_or_create_suspended - we don't want to start it if it was not running but we need to acquire the instance
Read the worker's oplog using the Oplog provided by Worker up to the oplog index cutoff
Create a new Oplog (using the OplogService) for the target worker, and append all the elements - NOTE that the first element, Create (or CreateV1) must be altered to contain the new worker ID, as that is the primary source of truth for the identity of a worker.
Use the worker service (by extending WorkerProxy) to resume the newly created worker. It has to go through worker service because it may live in another worker executor, depending on sharding.
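The validation and oplog-copying steps above can be sketched with an in-memory stand-in. Everything here is hypothetical and simplified for illustration: `InMemoryOplogs`, `OplogEntry`, `ForkError`, and the interpretation of the cutoff as "number of entries to copy, counting from 1" are assumptions, not the real `Oplog` / `OplogService` API. Resuming the target through the worker service is not modeled.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified worker identifier.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct WorkerId(pub String);

/// Hypothetical oplog entry: only the Create entry matters for this sketch.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum OplogEntry {
    Create { worker_id: WorkerId },
    Other(String),
}

#[derive(Debug, PartialEq, Eq)]
pub enum ForkError {
    SourceNotFound,
    TargetAlreadyExists,
    InvalidCutoff,
    MissingCreateEntry,
}

/// In-memory stand-in for oplog storage (an assumption for illustration).
#[derive(Default)]
pub struct InMemoryOplogs {
    oplogs: HashMap<WorkerId, Vec<OplogEntry>>,
}

impl InMemoryOplogs {
    pub fn put(&mut self, id: WorkerId, entries: Vec<OplogEntry>) {
        self.oplogs.insert(id, entries);
    }

    pub fn get(&self, id: &WorkerId) -> Option<&[OplogEntry]> {
        self.oplogs.get(id).map(|v| v.as_slice())
    }

    /// Copies the first `cutoff` entries of the source oplog into a new
    /// oplog for the target, rewriting the initial Create entry so that it
    /// carries the target worker id.
    pub fn fork(
        &mut self,
        source: &WorkerId,
        target: &WorkerId,
        cutoff: usize,
    ) -> Result<(), ForkError> {
        // The target must not exist and the source must exist.
        if self.oplogs.contains_key(target) {
            return Err(ForkError::TargetAlreadyExists);
        }
        let source_log = self.oplogs.get(source).ok_or(ForkError::SourceNotFound)?;
        if cutoff == 0 || cutoff > source_log.len() {
            return Err(ForkError::InvalidCutoff);
        }

        // Read the source oplog up to the cutoff.
        let mut forked: Vec<OplogEntry> = source_log[..cutoff].to_vec();

        // The first entry is the source of truth for the worker's identity,
        // so it must be altered to contain the new worker id.
        match forked.first_mut() {
            Some(OplogEntry::Create { worker_id }) => *worker_id = target.clone(),
            _ => return Err(ForkError::MissingCreateEntry),
        }

        self.oplogs.insert(target.clone(), forked);
        Ok(())
    }
}
```

In the real implementation the source entries would be read through the `Worker` instance's `Oplog`, and the new oplog would be created through the `OplogService`; the sketch only shows the ordering of the checks and the rewrite of the first entry.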
By completing this ticket, we have a new exposed worker forking feature which "works", although not completely correctly yet - at this point the forked worker will replay with the new worker id from the start, which can lead to divergence. A separate ticket will improve this situation.