You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
The "undo" feature is a new worker executor function exposed through it's gRPC API, having two possible modes:
undo the worker to a specific oplog index
convenience feature: undo the last N invocations
The "last N invocations" can be implemented on top of the "undo to the given oplog index" feature by reading back from the end of the oplog looking for the nth ExportedFunctionInvoked entry.
On the execution level, the undo feature will take advantage of the already existing "deleted regions" support. Similarly how the Jump entry is implemented, when we undo something in the worker, that will mark the "undone" region as deleted.
In details, implementation of this ticket should consist of the following steps:
Define the new gRPC API
Add a new oplog entry Undo (it should to be different from Jump both to be able to differentiate them, and also because we may have to treat them differently when calculating some part of the worker status)
In the implementation, use get_or_create_suspended to get the Worker and check its status. Undo should only work on Idle or Failed or Exited workers - we don't allow undo when it is executing something (running/suspended/interrupted).
If the request specifies "last n invocation"s, use the worker's Oplog to translate that into a specific oplog index
Write the oplog entry and initiate a restart (by doing a special interrupt - I think the InterruptKind::Jump can be reused here)
Modify calculate_last_known_status to take the new oplog entry into account when calculating the deleted regions, and also when calculating WorkerStatus (as we have to "get back" . NOTE: I think the proper implementation is to change it that it first calculates the final deleted regions, and then in all the other "folds" it takes the deleted regions into account - right now this is not the case and it can be considered a bug.
Add the new undo API into the test DSL
Write worker executor tests that tries out this feature (both versions)
This is not split further into smaller steps, as they only make sense and only testable together.
The text was updated successfully, but these errors were encountered:
The "undo" feature is a new worker executor function exposed through it's gRPC API, having two possible modes:
The "last N invocations" can be implemented on top of the "undo to the given oplog index" feature by reading back from the end of the oplog looking for the nth
ExportedFunctionInvoked
entry.On the execution level, the undo feature will take advantage of the already existing "deleted regions" support. Similarly how the
Jump
entry is implemented, when we undo something in the worker, that will mark the "undone" region as deleted.In details, implementation of this ticket should consist of the following steps:
Undo
(it should to be different fromJump
both to be able to differentiate them, and also because we may have to treat them differently when calculating some part of the worker status)get_or_create_suspended
to get theWorker
and check its status. Undo should only work onIdle
orFailed
orExited
workers - we don't allow undo when it is executing something (running/suspended/interrupted).Oplog
to translate that into a specific oplog indexcalculate_last_known_status
to take the new oplog entry into account when calculating the deleted regions, and also when calculatingWorkerStatus
(as we have to "get back" . NOTE: I think the proper implementation is to change it that it first calculates the final deleted regions, and then in all the other "folds" it takes the deleted regions into account - right now this is not the case and it can be considered a bug.This is not split further into smaller steps, as they only make sense and only testable together.
The text was updated successfully, but these errors were encountered: