-
Notifications
You must be signed in to change notification settings - Fork 143
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
chore: Refactor InvalidRosterException to be a checked exception #17187
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: Tim Farber-Newman <[email protected]>
Signed-off-by: Tim Farber-Newman <[email protected]>
...-node/hedera-app/src/test/java/com/hedera/node/app/roster/schemas/V0540RosterSchemaTest.java
Show resolved
Hide resolved
hedera-node/hedera-app/src/main/java/com/hedera/node/app/roster/schemas/V0540RosterSchema.java
Outdated
Show resolved
Hide resolved
.../swirlds-platform-core/src/main/java/com/swirlds/platform/roster/InvalidRosterException.java
Outdated
Show resolved
Hide resolved
...wirlds-platform-core/src/main/java/com/swirlds/platform/system/address/AddressBookUtils.java
Outdated
Show resolved
Hide resolved
...c/testFixtures/java/com/swirlds/platform/test/fixtures/state/RandomSignedStateGenerator.java
Outdated
Show resolved
Hide resolved
...ode/test-clients/src/main/java/com/hedera/services/bdd/spec/utilops/tss/RekeyScenarioOp.java
Outdated
Show resolved
Hide resolved
...ode/test-clients/src/main/java/com/hedera/services/bdd/spec/utilops/tss/RekeyScenarioOp.java
Outdated
Show resolved
Hide resolved
...swirlds-platform-core/src/main/java/com/swirlds/platform/state/signed/StartupStateUtils.java
Outdated
Show resolved
Hide resolved
Signed-off-by: Tim Farber-Newman <[email protected]>
Signed-off-by: Tim Farber-Newman <[email protected]>
Coverage summary from CodacySee diff coverage on Codacy
Coverage variation details
Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: Diff coverage details
Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: See your quality gate settings Change summary preferencesCodacy stopped sending the deprecated coverage status on June 5th, 2024. Learn more |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm good with everything from a platform perspective. Need one or two services engineers to weight in.
} | ||
} catch (final InvalidRosterException e) { | ||
logger.error("CATASTROPHIC failure loading candidate roster", e); | ||
stack.rollbackFullStack(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure stack.rollbackFullStack()
is the behavior we want here; it doesn't seem catastrophic to the same degree as other situations we designate as 'catastrophic.' If the candidate roster is invalid it's definitely a problem, but rolling back 1) the user transaction and 2) the staking changes because we can't start keying the candidate seems excessive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should there be a rollback at all? FWIW if the roster validation failed, it would have kept bubbling up and no rollback would have been performed (at least not in this section of code). Should the exception just be caught and a log written and no rollback? If so, what are the potential impacts of getting through most of the operations and failing at the very last moment without handling it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exactly, I don't think we want to roll back anything. There's another section of the code that will handle the rollback for a failed transaction.
what are the potential impacts of getting through most of the operations and failing at the very last moment without handling it?
This code runs daily just after midnight, and is critical for correctly establishing the ongoing staking weights of the network. The only roster-related action here is to begin generating TSS key material for the current candidate roster, which roster is currently only required at upgrade boundaries. Said a different way: we don't want an invalid candidate roster–which is a concern, but with far less impact on the network–to be coupled with the critical staking updates.
As for what the code would do to handle an invalid roster scenario, that's a bit more tricky. We would need to think through how an invalid roster would make it into the state, and exactly what 'invalid' means, in order to somehow remediate the issue here. For now I suggest only logging the exception in the code, and also getting the right people together to think through these scenarios.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The original code looks dubious. Does it modify the state? Creating a new WritableRosterStore seems to indicate this. But if this is the case, we need to commit the changes if successful and roll them back if something failed as we do for the other steps.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@netopyr this code runs as part of the first user transaction after midnight, so the commit (or rollback) is designed to happen as part of the normal transaction flow.
} | ||
} | ||
} catch (final InvalidRosterException e) { | ||
throw new IllegalArgumentException("Invalid roster", e); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why wrap InvalidRosterException
with IllegalArgumentException
? Seems like unnecessary nesting for no perceived benefit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
InvalidRosterException
is a checked exception. In order to propagate the exception further up, I would need to add throws InvalidRosterException
to the V0540RosterSchema#restart(MigrationContext)
signature. However, if this is done, then it needs to be added to the signature of the parent class: Schema#restart(MigrationContext)
. Adding it to the Schema
class does not work because the module Schema
resides in (swirlds-state-api
) does not have visibility to the platform-sdk
module where the InvalidRosterException
is located.
So the two options I see are:
- Expose the
platform-sdk
module toswirlds-state-api
module. - Re-throw
InvalidRosterException
as an unchecked exception - which is what was done here.
I'm open to hearing what the preference is, or alternatives.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah right, I remember the question about checked vs. unchecked now, thanks.
In order to propagate the exception further up
Interesting.. does this need to happen so that platform can handle the exception? Or is there another reason?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The platform doesn't have to generate a checked exception. Philosophically, throwing an unchecked exception on the transaction handling thread is a no no, in my mind. If the exception is recoverable with proper error handling, that is the best design.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Throwing an unchecked exception during startup prevents us from entering live execution with bad data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if V0540RosterSchmea
should really be in hedera-app.
Schema
has to be unaware of implementation-specific Exceptions. I think the best approach is to have swirlds-state-api
define some useful Exceptions that implementations can use. Then, InvalidRosterException
can be wrapped in one of these Exceptions. Maybe having a generic RestartException
is sufficient. It can be checked, if wanted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with defining the proper exceptions to throw when a service needs to throw an exception and wrapping the InvalidRosterException
in a service appropriate exception.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find the usefulness of this PR rather questionable. Introducing a checked Exception just to wrap it into a RuntimeException where it becomes inconvenient defies the whole purpose IMO.
} | ||
} catch (final InvalidRosterException e) { | ||
logger.error("CATASTROPHIC failure loading candidate roster", e); | ||
stack.rollbackFullStack(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The original code looks dubious. Does it modify the state? Creating a new WritableRosterStore seems to indicate this. But if this is the case, we need to commit the changes if successful and roll them back if something failed as we do for the other steps.
import com.swirlds.platform.config.AddressBookConfig; | ||
import com.swirlds.platform.roster.InvalidRosterException; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This package structure is surprising. Why are the schemas in hedera-app and everything else related to rosters in platform-sdk?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A Roster
is a platform concept. A RosterService
is not? And the Schemas exist because of the RosterService
?
} | ||
} | ||
} catch (final InvalidRosterException e) { | ||
throw new IllegalArgumentException("Invalid roster", e); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if V0540RosterSchmea
should really be in hedera-app.
Schema
has to be unaware of implementation-specific Exceptions. I think the best approach is to have swirlds-state-api
define some useful Exceptions that implementations can use. Then, InvalidRosterException
can be wrapped in one of these Exceptions. Maybe having a generic RestartException
is sufficient. It can be checked, if wanted.
try { | ||
rosterStore.putActiveRoster(roster, roundNumber); | ||
} catch (final InvalidRosterException e) { | ||
throw new IllegalArgumentException("Invalid roster", e); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is IAE really the right Exception here? The roster gets constructed. I think an InvalidStateException would be more appropriate.
I assume there wouldn't be wrapping it in a runtime exception when we're doing full dynamic address book and we're adopting rosters without a shutdown and restart. Is it appropriate to throw unchecked exceptions on the state handling thread? I think Tim is just trying to handle in the PR the cases where throwing a checked exception needs to be processed by the startup logic (schema migrations). If you agree that throwing a checked exception is appropriate then, we just need to get right the proper handling of it in the calling code paths. |
…tion Signed-off-by: Tim Farber-Newman <[email protected]>
try { | ||
rosterStore.putActiveRoster(roster, roundNumber); | ||
} catch (final InvalidRosterException e) { | ||
throw new IllegalStateException("Invalid roster", e); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I second to other reviewers: this approach seems to defeat the purpose of making the InvalidRosterException
a checked exception.
/** | ||
* A default constructor. | ||
* @param message a message | ||
*/ | ||
public InvalidRosterException(String message) { | ||
public InvalidRosterException(@Nullable final String message) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: why is it @Nullable
? This exception may be difficult to debug if no useful context is provided, so the message should be @NonNull
IMO.
@@ -36,7 +36,7 @@ private RosterValidator() {} | |||
* | |||
* @param roster a roster to validate | |||
*/ | |||
public static void validate(@NonNull final Roster roster) { | |||
public static void validate(@Nullable final Roster roster) throws InvalidRosterException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest to keep the annotation as @NonNull
to help prevent obvious mistakes when using this method in code.
try { | ||
RosterUtils.setActiveRoster(stateInstance, rosterInstance, roundInstance); | ||
} catch (final InvalidRosterException e) { | ||
throw new IllegalArgumentException("Invalid roster", e); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To add to an above comment, I reached the end of the diff in this PR and I've seen exactly one legitimate handling of the InvalidRosterException
where we swallow it in StakePeriodChanges.java
. In every other case we always wrap it into a RuntimeException and simply re-throw. From this usage pattern, I don't believe we want to make this exception a checked exception. This only complicates the code and doesn't really add any meaningful value, or at least I haven't seen it in the above diff.
I suggest to rethink the "why we're doing this change" part and see if there's a better solution to this problem than making the InvalidRosterException
a checked exception.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current handling is only a part of the handling needed in the future. If Services wants an unchecked exception thrown on the transaction handling thread (when we get to fully dynamic address books), then I'll back down my assertion that it should be a checked exception.
Description:
This refactors
InvalidRosterException
from a runtime exception to a checked exception.There are some boundaries across modules that do complicate this transition. For example,
V0540RosterSchema
is outside the main platform code and thus instead of propagating the checked exception up, it is converted back to a runtime exception. The question was posed as to whether things likeSchema
should be exposed to the platform module and thus permit the checked exception further in into the services layer, but no feedback was received. Thus, I am looking for some feedback in this PR as to what is the preferred course of action (exposing the checked exception into the services or re-throwing it as a plainRuntimeException
with theInvalidRosterException
chained to it).RekeyScenarioOp
andStakePeriodChanges
are two other examples that require additional scrutiny.Related issue(s):
Fixes #15766
Notes for reviewer:
Checklist