chore: Refactor InvalidRosterException to be a checked exception #17187

Open
wants to merge 5 commits into base: main


Contributor

@timfn-hg timfn-hg commented Jan 2, 2025

Description:
This refactors InvalidRosterException from a runtime exception to a checked exception.

Some boundaries across modules complicate this transition. For example, V0540RosterSchema is outside the main platform code, so instead of propagating the checked exception upward, it converts it back to a runtime exception. The question was raised earlier whether things like Schema should be exposed to the platform module, which would permit the checked exception to travel further into the services layer, but no feedback was received. I am therefore looking for feedback in this PR on the preferred course of action: either expose the checked exception to the services, or re-throw it as a plain RuntimeException with the InvalidRosterException chained to it. RekeyScenarioOp and StakePeriodChanges are two other examples that require additional scrutiny.

Related issue(s):

Fixes #15766

Notes for reviewer:

Checklist

  • Documented (Code comments, README, etc.)
  • Tested (unit, integration, etc.)

@timfn-hg timfn-hg added this to the v0.59 milestone Jan 2, 2025
@timfn-hg timfn-hg self-assigned this Jan 2, 2025
@timfn-hg timfn-hg requested review from a team as code owners January 2, 2025 17:28
Signed-off-by: Tim Farber-Newman <[email protected]>
Signed-off-by: Tim Farber-Newman <[email protected]>
Signed-off-by: Tim Farber-Newman <[email protected]>

codacy-production bot commented Jan 2, 2025

Coverage summary from Codacy

Coverage variation         Diff coverage
+0.00% (target: -1.00%)    57.78%

Coverage variation details
                                   Coverable lines   Covered lines   Coverage
Common ancestor commit (a484555)   95934             65316           68.08%
Head commit (3da4c8f)              95932 (-2)        65311 (-5)      68.08% (+0.00%)

Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: <coverage of head commit> - <coverage of common ancestor commit>

Diff coverage details
                        Coverable lines   Covered lines   Diff coverage
Pull request (#17187)   45                26              57.78%

Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: <covered lines added or modified>/<coverable lines added or modified> * 100%


codecov bot commented Jan 2, 2025

Codecov Report

Attention: Patch coverage is 57.77778% with 19 lines in your changes missing coverage. Please review.

Project coverage is 64.33%. Comparing base (a484555) to head (3da4c8f).
Report is 15 commits behind head on main.

Files with missing lines                                 Patch %   Lines
...rlds/platform/system/address/AddressBookUtils.java    0.00%     6 Missing ⚠️
...irlds/platform/state/signed/StartupStateUtils.java    0.00%     5 Missing ⚠️
.../com/swirlds/state/lifecycle/RestartException.java    0.00%     4 Missing ⚠️
...era/node/app/roster/schemas/V0540RosterSchema.java    91.30%    2 Missing ⚠️
...app/workflows/handle/steps/StakePeriodChanges.java    60.00%    2 Missing ⚠️

@@            Coverage Diff            @@
##               main   #17187   +/-   ##
=========================================
  Coverage     64.33%   64.33%           
+ Complexity    20941    20939    -2     
=========================================
  Files          2555     2556    +1     
  Lines         96171    96185   +14     
  Branches      10055    10055           
=========================================
+ Hits          61870    61879    +9     
- Misses        30663    30668    +5     
  Partials       3638     3638           

Files with missing lines                                 Coverage Δ
...ode/app/roster/schemas/RosterTransplantSchema.java    96.55% <100.00%> (ø)
...wirlds/platform/roster/InvalidRosterException.java    100.00% <ø> (ø)
.../java/com/swirlds/platform/roster/RosterUtils.java    66.32% <ø> (ø)
...a/com/swirlds/platform/roster/RosterValidator.java    100.00% <ø> (ø)
...ds/platform/state/service/WritableRosterStore.java    85.96% <ø> (ø)
.../main/java/com/swirlds/state/lifecycle/Schema.java    100.00% <ø> (ø)
...era/node/app/roster/schemas/V0540RosterSchema.java    95.12% <91.30%> (-4.88%) ⬇️
...app/workflows/handle/steps/StakePeriodChanges.java    92.00% <60.00%> (-2.45%) ⬇️
.../com/swirlds/state/lifecycle/RestartException.java    0.00% <0.00%> (ø)
...irlds/platform/state/signed/StartupStateUtils.java    49.05% <0.00%> (-1.43%) ⬇️
... and 1 more

... and 14 files with indirect coverage changes

Contributor

@edward-swirldslabs edward-swirldslabs left a comment

I'm good with everything from a platform perspective. Need one or two services engineers to weigh in.

    }
} catch (final InvalidRosterException e) {
    logger.error("CATASTROPHIC failure loading candidate roster", e);
    stack.rollbackFullStack();
Member

@mhess-swl mhess-swl Jan 2, 2025

I'm not sure stack.rollbackFullStack() is the behavior we want here; it doesn't seem catastrophic to the same degree as other situations we designate as 'catastrophic.' If the candidate roster is invalid it's definitely a problem, but rolling back 1) the user transaction and 2) the staking changes because we can't start keying the candidate seems excessive.

Contributor Author

Should there be a rollback at all? FWIW if the roster validation failed, it would have kept bubbling up and no rollback would have been performed (at least not in this section of code). Should the exception just be caught and a log written and no rollback? If so, what are the potential impacts of getting through most of the operations and failing at the very last moment without handling it?

Member

@mhess-swl mhess-swl Jan 2, 2025

Exactly, I don't think we want to roll back anything. There's another section of the code that will handle the rollback for a failed transaction.

> what are the potential impacts of getting through most of the operations and failing at the very last moment without handling it?

This code runs daily just after midnight, and is critical for correctly establishing the ongoing staking weights of the network. The only roster-related action here is to begin generating TSS key material for the current candidate roster, which is currently only required at upgrade boundaries. Said a different way: we don't want an invalid candidate roster (which is a concern, but with far less impact on the network) to be coupled with the critical staking updates.

As for what the code would do to handle an invalid roster scenario, that's a bit more tricky. We would need to think through how an invalid roster would make it into the state, and exactly what 'invalid' means, in order to somehow remediate the issue here. For now I suggest only logging the exception in the code, and also getting the right people together to think through these scenarios.
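A minimal sketch of the log-only handling suggested above, assuming hypothetical stand-in types (StakePeriodSketch, CandidateRosterKeying, and processStakePeriodBoundary are illustrative names, not the real StakePeriodChanges code). The staking updates run first, and an invalid candidate roster is logged without triggering any rollback:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical stand-ins; none of these are the real hedera-app or platform-sdk types.
final class StakePeriodSketch {
    private static final Logger logger = Logger.getLogger(StakePeriodSketch.class.getName());

    /** Stand-in for the checked exception discussed in this PR. */
    static final class InvalidRosterException extends Exception {
        InvalidRosterException(final String message) {
            super(message);
        }
    }

    /** Stand-in for the step that starts keying the candidate roster. */
    interface CandidateRosterKeying {
        void startKeying() throws InvalidRosterException;
    }

    /**
     * Runs the staking updates first, then attempts the roster step.
     * An invalid candidate roster is logged and swallowed, so it cannot undo the staking work.
     *
     * @return the names of the steps that completed
     */
    static List<String> processStakePeriodBoundary(
            final Runnable stakingUpdates, final CandidateRosterKeying keying) {
        final List<String> completed = new ArrayList<>();
        stakingUpdates.run();
        completed.add("staking");
        try {
            keying.startKeying();
            completed.add("keying");
        } catch (final InvalidRosterException e) {
            // No rollback here: the surrounding transaction flow owns commit and rollback.
            logger.log(Level.SEVERE, "Invalid candidate roster; skipping keying", e);
        }
        return completed;
    }

    public static void main(final String[] args) {
        final List<String> completed = processStakePeriodBoundary(
                () -> {},
                () -> {
                    throw new InvalidRosterException("roster has no entries");
                });
        System.out.println(completed);
    }
}
```

With this shape, the decision to commit or roll back stays with the normal transaction flow rather than with this step.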

Contributor

The original code looks dubious. Does it modify the state? Creating a new WritableRosterStore seems to indicate this. But if this is the case, we need to commit the changes if successful and roll them back if something failed as we do for the other steps.

Member

@netopyr this code runs as part of the first user transaction after midnight, so the commit (or rollback) is designed to happen as part of the normal transaction flow.

        }
    }
} catch (final InvalidRosterException e) {
    throw new IllegalArgumentException("Invalid roster", e);
Member

Why wrap InvalidRosterException with IllegalArgumentException? Seems like unnecessary nesting for no perceived benefit

Contributor Author

InvalidRosterException is a checked exception. In order to propagate the exception further up, I would need to add throws InvalidRosterException to the V0540RosterSchema#restart(MigrationContext) signature. However, if this is done, then it needs to be added to the signature of the parent class: Schema#restart(MigrationContext). Adding it to the Schema class does not work because the module Schema resides in (swirlds-state-api) does not have visibility to the platform-sdk module where the InvalidRosterException is located.

So the two options I see are:

  1. Expose the platform-sdk module to swirlds-state-api module.
  2. Re-throw InvalidRosterException as an unchecked exception - which is what was done here.

I'm open to hearing what the preference is, or alternatives.
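To illustrate the constraint described above, here is a small sketch using hypothetical stand-in classes (the nested Schema, RosterSchema, and InvalidRosterException below only mirror the shape of the problem; they are not the real types or signatures). Because the parent method declares no checked exceptions, the override cannot add one, so option 2 wraps the checked exception and keeps it as the cause:

```java
// Hypothetical stand-ins that mirror the shape of the problem, not the real classes.
final class WrapAtBoundarySketch {
    /** Plays the role of InvalidRosterException: checked, and defined where Schema cannot see it. */
    static final class InvalidRosterException extends Exception {
        InvalidRosterException(final String message) {
            super(message);
        }
    }

    /** Plays the role of Schema: restart() declares no checked exceptions. */
    abstract static class Schema {
        abstract void restart();
    }

    /** Plays the role of V0540RosterSchema. */
    static final class RosterSchema extends Schema {
        private final boolean rosterIsValid;

        RosterSchema(final boolean rosterIsValid) {
            this.rosterIsValid = rosterIsValid;
        }

        private void putActiveRoster() throws InvalidRosterException {
            if (!rosterIsValid) {
                throw new InvalidRosterException("roster failed validation");
            }
        }

        // Adding "throws InvalidRosterException" here would not compile,
        // because an override may not widen the parent's throws clause.
        @Override
        void restart() {
            try {
                putActiveRoster();
            } catch (final InvalidRosterException e) {
                // Option 2: re-throw unchecked, keeping the original as the cause.
                throw new IllegalArgumentException("Invalid roster", e);
            }
        }
    }

    public static void main(final String[] args) {
        try {
            new RosterSchema(false).restart();
        } catch (final IllegalArgumentException e) {
            System.out.println(e.getMessage() + " <- " + e.getCause().getMessage());
        }
    }
}
```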

Member

@mhess-swl mhess-swl Jan 2, 2025

Ah right, I remember the question about checked vs. unchecked now, thanks.

> In order to propagate the exception further up

Interesting... does this need to happen so that platform can handle the exception? Or is there another reason?

Contributor

@edward-swirldslabs edward-swirldslabs Jan 2, 2025

The platform doesn't have to generate a checked exception. Philosophically, throwing an unchecked exception on the transaction handling thread is a no-no in my mind. If the exception is recoverable with proper error handling, that is the best design.

Contributor

@edward-swirldslabs edward-swirldslabs Jan 2, 2025

Throwing an unchecked exception during startup prevents us from entering live execution with bad data.

Contributor

I wonder if V0540RosterSchema should really be in hedera-app.

Schema has to be unaware of implementation-specific Exceptions. I think the best approach is to have swirlds-state-api define some useful Exceptions that implementations can use. Then, InvalidRosterException can be wrapped in one of these Exceptions. Maybe having a generic RestartException is sufficient. It can be checked, if wanted.
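A rough sketch of that idea, using hypothetical stand-ins (the nested RestartException, Schema, and RosterSchema here are illustrative only; the real classes have different packages and signatures). The API layer declares a generic checked exception, and the implementation translates its own exception into it:

```java
// Hypothetical stand-ins only; the real Schema, RestartException, and
// InvalidRosterException have different packages and signatures.
final class RestartExceptionSketch {
    /** Generic checked exception owned by the API layer. */
    static final class RestartException extends Exception {
        RestartException(final String message, final Throwable cause) {
            super(message, cause);
        }
    }

    /** API-layer contract: it names only exceptions the API layer defines. */
    interface Schema {
        void restart() throws RestartException;
    }

    /** Implementation-specific checked exception, invisible to the API layer. */
    static final class InvalidRosterException extends Exception {
        InvalidRosterException(final String message) {
            super(message);
        }
    }

    static final class RosterSchema implements Schema {
        private final boolean rosterIsValid;

        RosterSchema(final boolean rosterIsValid) {
            this.rosterIsValid = rosterIsValid;
        }

        @Override
        public void restart() throws RestartException {
            try {
                if (!rosterIsValid) {
                    throw new InvalidRosterException("roster failed validation");
                }
            } catch (final InvalidRosterException e) {
                // Translate to the API-level exception, keeping the cause for debugging.
                throw new RestartException("Roster schema restart failed", e);
            }
        }
    }

    /** Startup code is forced by the compiler to handle the failure explicitly. */
    static boolean runRestart(final Schema schema) {
        try {
            schema.restart();
            return true;
        } catch (final RestartException e) {
            System.err.println(e.getMessage() + ": " + e.getCause().getMessage());
            return false;
        }
    }

    public static void main(final String[] args) {
        System.out.println(runRestart(new RosterSchema(true)));
        System.out.println(runRestart(new RosterSchema(false)));
    }
}
```

The trade-off is that callers see only the generic exception type and must inspect the cause if they need roster-specific detail.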

Contributor

I agree with defining the proper exceptions to throw when a service needs to throw an exception and wrapping the InvalidRosterException in a service appropriate exception.

Contributor

@netopyr netopyr left a comment

I find the usefulness of this PR rather questionable. Introducing a checked Exception just to wrap it into a RuntimeException where it becomes inconvenient defeats the whole purpose IMO.

import com.swirlds.platform.config.AddressBookConfig;
import com.swirlds.platform.roster.InvalidRosterException;
Contributor

This package structure is surprising. Why are the schemas in hedera-app and everything else related to rosters in platform-sdk?

Contributor

A Roster is a platform concept. A RosterService is not? And the Schemas exist because of the RosterService?

try {
    rosterStore.putActiveRoster(roster, roundNumber);
} catch (final InvalidRosterException e) {
    throw new IllegalArgumentException("Invalid roster", e);
Contributor

Is IAE really the right Exception here? The roster gets constructed. I think an IllegalStateException would be more appropriate.

Contributor

edward-swirldslabs commented Jan 3, 2025

> I find the usefulness of this PR rather questionable. Introducing a checked Exception just to wrap it into a RuntimeException where it becomes inconvenient defeats the whole purpose IMO.

I assume we wouldn't be wrapping it in a runtime exception once we have a fully dynamic address book and are adopting rosters without a shutdown and restart. Is it appropriate to throw unchecked exceptions on the state handling thread? I think Tim is just trying to handle, in this PR, the cases where a checked exception needs to be processed by the startup logic (schema migrations).

If you agree that throwing a checked exception is appropriate, then we just need to get the handling right in the calling code paths.

Comment on lines +460 to +464
try {
    rosterStore.putActiveRoster(roster, roundNumber);
} catch (final InvalidRosterException e) {
    throw new IllegalStateException("Invalid roster", e);
}
Contributor

I second the other reviewers: this approach seems to defeat the purpose of making the InvalidRosterException a checked exception.

  /**
   * A default constructor.
   * @param message a message
   */
- public InvalidRosterException(String message) {
+ public InvalidRosterException(@Nullable final String message) {
Contributor

Minor: why is it @Nullable? This exception may be difficult to debug if no useful context is provided, so the message should be @NonNull IMO.

@@ -36,7 +36,7 @@ private RosterValidator() {}
   *
   * @param roster a roster to validate
   */
- public static void validate(@NonNull final Roster roster) {
+ public static void validate(@Nullable final Roster roster) throws InvalidRosterException {
Contributor

I suggest keeping the annotation as @NonNull to help prevent obvious mistakes when using this method in code.
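As a sketch of what a non-null contract could look like at runtime, assuming simplified stand-in types (the nested Roster and InvalidRosterException below are hypothetical and much smaller than the real ones, and the annotations are omitted to keep the example self-contained): a null argument is treated as a caller bug and fails fast, while a malformed roster is reported through the checked exception with a required message.

```java
import java.util.List;
import java.util.Objects;

// Hypothetical stand-ins; the real Roster and RosterValidator are richer than this.
final class NonNullValidateSketch {
    static final class InvalidRosterException extends Exception {
        InvalidRosterException(final String message) {
            // A message is required so a failure always carries some context.
            super(Objects.requireNonNull(message, "message must not be null"));
        }
    }

    /** Simplified roster: just the node weights. */
    static final class Roster {
        final List<Long> weights;

        Roster(final List<Long> weights) {
            this.weights = List.copyOf(weights);
        }
    }

    static void validate(final Roster roster) throws InvalidRosterException {
        // Null is a caller bug, so it fails fast with an unchecked NullPointerException.
        Objects.requireNonNull(roster, "roster must not be null");
        if (roster.weights.isEmpty()) {
            throw new InvalidRosterException("roster has no entries");
        }
        if (roster.weights.stream().anyMatch(w -> w < 0)) {
            throw new InvalidRosterException("roster contains a negative weight");
        }
    }

    public static void main(final String[] args) throws InvalidRosterException {
        validate(new Roster(List.of(1L, 2L)));
        try {
            validate(new Roster(List.of()));
        } catch (final InvalidRosterException e) {
            System.out.println(e.getMessage());
        }
    }
}
```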

Comment on lines +184 to +188
try {
    RosterUtils.setActiveRoster(stateInstance, rosterInstance, roundInstance);
} catch (final InvalidRosterException e) {
    throw new IllegalArgumentException("Invalid roster", e);
}
Contributor

To add to the comment above: I reached the end of the diff in this PR and have seen exactly one legitimate handling of the InvalidRosterException, where we swallow it in StakePeriodChanges.java. In every other case we wrap it into a RuntimeException and simply re-throw. Given this usage pattern, I don't believe we want to make this exception a checked exception. It only complicates the code and doesn't add any meaningful value, or at least I haven't seen any in the above diff.

I suggest rethinking the "why we're doing this change" part and seeing if there's a better solution to this problem than making the InvalidRosterException a checked exception.

Contributor

The current handling is only part of the handling needed in the future. If Services wants an unchecked exception thrown on the transaction handling thread (when we get to fully dynamic address books), then I'll back down from my assertion that it should be a checked exception.

Development

Successfully merging this pull request may close these issues.

Make RosterValidator throw checked exception
5 participants