Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Upgrade][DO NOT update branch] Alpha TestNet v0.0.11 #967

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

okdas
Copy link
Member

@okdas okdas commented Nov 26, 2024

Summary

An upgrade for Alpha TestNet from v0.0.10 to v0.0.11.

Issue

Beta TestNet has been launched using v0.0.11 (release candidate), and we need to upgrade alpha so both networks use the same version.

Type of change

Chain upgrade.

Testing

  • [] An upgrade is tested by upgrading a network provisioned by the old version (v0.0.10) to the new version (v0.0.11)

Sanity Checklist

  • I have tested my changes using the available tooling
  • I have commented my code
  • I have performed a self-review of my own code; both comments & source code
  • I create and reference any new tickets, if applicable
  • I have left TODOs throughout the codebase, if applicable

@okdas okdas added the code health Cleans up some code label Nov 26, 2024
@okdas okdas added this to the Shannon Beta TestNet Launch milestone Nov 26, 2024
@okdas okdas self-assigned this Nov 26, 2024
app/upgrades/v0.0.11.go Outdated Show resolved Hide resolved
@okdas okdas changed the title [Upgrade] Alpha TestNet v0.0.11 [Upgrade][DO NOT update branch] Alpha TestNet v0.0.11 Nov 26, 2024
@okdas
Copy link
Member Author

okdas commented Nov 27, 2024

Upgrade has been tested and validated locally:

5:39PM INF applying upgrade "v0.0.11" at height: 20 module=x/upgrade
5:39PM INF Starting v0.0.11 upgrade handler module=server
5:39PM INF Starting parameter updates for v0.0.11 module=server
5:39PM INF Current session params module=server params={"num_suppliers_per_session":0}
5:39PM INF Successfully updated session params module=server new_params={"num_suppliers_per_session":15}
5:39PM INF Current tokenomics params module=server params={"dao_reward_address":"","mint_allocation_proposer":{"application":0,"dao":0,"proposer":0,"source_owner":0,"supplier":0}}
5:39PM INF Successfully updated tokenomics params module=server new_params={"dao_reward_address":"pokt1r6ja6rz6rpae58njfrsgs5n5sp3r36r2q9j04h","mint_allocation_proposer":{"application":0,"dao":0.1,"proposer":0.05,"source_owner":0.15,"supplier":0.7}}
5:39PM INF Starting authorization updates for v0.0.11 module=server
5:39PM INF Granting authorization module=server
5:39PM INF Successfully granted authorization module=server
5:39PM INF Running module migrations module=server

@okdas
Copy link
Member Author

okdas commented Nov 27, 2024

This is interesting.

The diff between versions shows we need to add a new authorization, but we already have it on Alpha TestNet:

poktrolld q authz grants-by-granter pokt10d07y265gmmuvt4z0w9aw880jnsr700j8yv32t --node=https://shannon-testnet-grove-seed-rpc.alpha.poktroll.com
- authorization:
    type: cosmos-sdk/GenericAuthorization
    value:
      msg: /poktroll.session.MsgUpdateParam
  expiration: "2500-01-01T00:00:00Z"
  grantee: pokt1r6ja6rz6rpae58njfrsgs5n5sp3r36r2q9j04h
  granter: pokt10d07y265gmmuvt4z0w9aw880jnsr700j8yv32t

@okdas
Copy link
Member Author

okdas commented Nov 27, 2024

Tested again after authorization code removed:


5:58PM INF ABCI Handshake App Info hash=48BAFBE9874B2A7625D9834A983AB231C98523767859E5208A277F8518CF66C0 height=19 module=consensus protocol-version=0 software-version=0.0.11-rc-bc2a3c44
5:58PM INF ABCI Replay Blocks appHeight=19 module=consensus stateHeight=19 storeHeight=20
5:58PM INF Replay last block using real app module=consensus
5:58PM INF applying upgrade "v0.0.11" at height: 20 module=x/upgrade
5:58PM INF Starting v0.0.11 upgrade handler module=server
5:58PM INF Starting parameter updates for v0.0.11 module=server
5:58PM INF Current session params module=server params={"num_suppliers_per_session":0}
5:58PM INF Successfully updated session params module=server new_params={"num_suppliers_per_session":15}
5:58PM INF Current tokenomics params module=server params={"dao_reward_address":"","mint_allocation_proposer":{"application":0,"dao":0,"proposer":0,"source_owner":0,"supplier":0}}
5:58PM INF Successfully updated tokenomics params module=server new_params={"dao_reward_address":"pokt1r6ja6rz6rpae58njfrsgs5n5sp3r36r2q9j04h","mint_allocation_proposer":{"application":0,"dao":0.1,"proposer":0.05,"source_owner":0.15,"supplier":0.7}}
5:58PM INF Running module migrations module=server
5:58PM INF adding a new module: 06-solomachine module=server
5:58PM INF adding a new module: 07-tendermint module=server

I think we're good to go. I'll wait for a review, before I cut a new version.

@okdas okdas marked this pull request as ready for review November 27, 2024 01:59
@okdas okdas requested review from Olshansk and bryanchriswhite and removed request for Olshansk November 27, 2024 02:00
// Adds new parameters using ignite's config.yml as a reference. Assuming we don't need any other parameters.
// https://github.com/pokt-network/poktroll/compare/v0.0.10...v0.0.11-rc
applyNewParameters := func(ctx context.Context) (err error) {
logger := cosmosTypes.UnwrapSDKContext(ctx).Logger()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @bryanchriswhite for the hint!

) upgradetypes.UpgradeHandler {

// Adds new parameters using ignite's config.yml as a reference. Assuming we don't need any other parameters.
// https://github.com/pokt-network/poktroll/compare/v0.0.10...v0.0.11-rc
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be -rc or just 0.1?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be an -rc here. We created RC release to bootstrap beta TestNet. The release must have this code (the upgrade handler) inside the binary. So we're just using the RC as a version to create a release on.

app/upgrades/v0.0.11.go Outdated Show resolved Hide resolved
SourceOwner: 0.15,
Application: 0.0,
}
tokenomicsParams.DaoRewardAddress = AlphaTestNetPnfAddress
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be AlphaTestNetPnfAddress or BetaTestNetPnfAddress?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an upgrade exclusive to Alpha TestNet

}

// The diff shows that the only new authz authorization is for the `poktroll.session.MsgUpdateParam` message.
// However, this message is already authorized for the `pokt10d07y265gmmuvt4z0w9aw880jnsr700j8yv32t` address.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the alpha or beta PNF address

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neither/both? This is a gov module address that persists across networks.

@okdas okdas requested a review from Olshansk November 27, 2024 20:46

// Set tokenomics params. The values are based on default values for LocalNet/Beta TestNet.
// Validate with: `poktrolld q tokenomics params --node=https://testnet-validated-validator-rpc.poktroll.com/`
tokenomicsParams := keepers.TokenomicsKeeper.GetParams(ctx)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's the line that caused a panic: https://gist.github.com/okdas/50f417e3a86230e9e1aea85da841e1de

I need to investigate why I did not have this issue during local upgrade from v0.0.10 to v0.0.11. Will try to replicate using ONLY binaries provided in releases.

Could it be the before state is different on Alpha TestNet? If so, going to document this so we check for that in the future.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's the line that caused a panic: gist.github.com/okdas/50f417e3a86230e9e1aea85da841e1de

Why did this cause the panic?

Copy link
Member Author

@okdas okdas Dec 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Olshansk My first hypothesis it has happened due to the protobuf type change, but that should have been caught on the local test. For some reason, it did not. I'm going to be looking into that today.

I expect we will change an upgrade procedure to migrate each module individually, which can preserve the types between upgrades. There are some pros and cons associated with that. So far, I wanted the upgrade process to look simple, but it seems like additional complexity might worth it to avoid such issues in the future.

I need to spend more time to provide a complete answer.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to spend more time to provide a complete answer.

This is REALLY REALLY REALLY REALLY important, so please dive as deep as necessary to the finest grain of detail.

@okdas
Copy link
Member Author

okdas commented Dec 6, 2024

I'm going to remove the merge from main into this branch and do a force push.
This branch is not ready for code from main yet, as I'm still figuring out what happened on Alpha TestNet.

@Olshansk
Copy link
Member

Olshansk commented Dec 6, 2024

@okdas Following up on an offline discussion.

This branch is not ready for code from main yet, as I'm still figuring out what happened on Alpha TestNet.

If you can't resolve it by mid-day today, let's do the following:

  1. Post your latest update + investigation + details
  2. Please move over to load-testing
  3. @bryanchriswhite will pick it up next week
  4. If @bryanchriswhite doesn't figure it out by mid week next week, I'll just in and help
  5. @okdas Please be ready (aside from working on (2)) to potentially help out with some questions

Copy link
Member

@Olshansk Olshansk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Undoing my approval per the latest discussion in this thread

@okdas
Copy link
Member Author

okdas commented Dec 6, 2024

Here's my investigation so far: https://www.notion.so/buildwithgrove/Alpha-TestNet-v0-0-11-Postmortem-152a36edfff680d58411c8fa384f877c?pvs=4

TLDR: KVStore had a different representation of empty params for the tokenomics module (alpha testnet vs local node), which explains the difference in behavior and why this was not caught locally.

I've also included some action items for us (me mostly) to do to avoid this and issues that were uncovered during this upgrade / postmortem investigation.

@bryanchriswhite I would love you to look into this as well, and I'm happy to pair/help.

@okdas
Copy link
Member Author

okdas commented Dec 6, 2024

OK, and now that I sent the previous message, it struck me what exactly happened here.

Alpha TestNet was provisioned using v0.0.9, which had a different protobuf for the tokenomics module params: https://github.com/pokt-network/poktroll/blob/v0.0.9/proto/poktroll/tokenomics/params.proto#L16

And it had a value, too: https://github.com/pokt-network/pocket-network-genesis/blob/master/shannon/testnet-alpha/genesis.json#L1018

That protobuf was then removed in v0.0.10, but the bytes were still preserved in the KVStore: 082A - see them decoded here.

As I did not see any changes in the protobuf when comparing v0.0.10 to v0.0.11, I deemed it safe to read the parameters from KVstore, while it was not.

If we did not remove the parameter or used a different field ID for the new fields, this wouldn't happen.

@bryanchriswhite
Copy link
Contributor

bryanchriswhite commented Dec 9, 2024

TL;DR, we SHOULD use reserved for deleted fields AND we SHOULD be versioning our protobuf types (e.g. proto/tokenomics/v1beta1).

OK, and now that I sent the previous message, it struck me what exactly happened here.

🤯 Nice sleuthing! 🚀

If we did not remove the parameter or used a different field ID for the new fields, this wouldn't happen.

Point taken that going forward, we need to be much more careful about backwards compatibility. In this particular case, I don' think there's any way we could've avoided moving this field (to the shared module) as it was necessary to resolve a dependency cycle. In cases like this, it seems like the correct thing to do, going forward, would be to remove the field but also add its field number (and possibly field name) as reserved.

FWIW, this case would also be a bit unusual, IMHO, to resolve with versioned types as we would end up having multiple conditional branches in the effected module's InitGenesis function to handle the two different versions of genesis. 🤔 I think this makes much less sense than the typical case, which would be where a message or query handler has conditional branches for each msg/query version. "Genesis backwards-compatibility" seems like a bit of an oxymoron. 😂

EDIT: Upon further reflection, I realize that I was conflating the modification the genesis type itself with what's happening here; we changed a dependency of the genesis type. This may actually be more of a typical case after all. To rephrase, the genesis type wasn't modified, neither was it expected that the genesis JSON should be modified. As a result, in such cases, the ImportGenesis indeed MUST include conditional branches to handle the discrepancy between the genesis JSON data and the genesis structure with the modified dependency.

This (re-)raises the point of versoined protobuf source, which we currently are not setup for, neither from a go package nor a proto source package persective. I think we will need to do something like the following in order to pre-allocate sufficient degrees of freedom to ensure we can safely increment the protobuf types into the future, without causing chaos from both the go module/pkg and buf.build perspectives:

Current proto directory structure:

./proto
└── poktroll
    ├── application
    │   └── *.proto
    ├── gateway
    │   └── *.proto
    ├── proof
    │   └── *.proto
    ├── service
    │   └── *.proto
    ├── session
    │   └── *.proto
    ├── shared
    │   └── *.proto
    ├── supplier
    │   └── *.proto
    └── tokenomics
    │   └── *.proto

Conventional (versioned):

./proto
└── poktroll
    ├── application
    │   └── v1beta1
    │       └── *.proto
    ├── gateway
    │   └── v1beta1
    │       └── *.proto
    ├── proof
    │   └── v1beta1
    │       └── *.proto
    ├── service
    │   └── v1beta1
    │       └── *.proto
    ├── session
    │   └── v1beta1
    │       └── *.proto
    ├── shared
    │   └── v1beta1
    │       └── *.proto
    ├── supplier
    │   └── v1beta1
    │       └── *.proto
    └── tokenomics
        └── v1beta1
            └── *.proto

This would result in version-namespaced type URLs (e.g. /poktroll.tokenomics.v1beta1.Params) but I'm not certain what the effect on generated go source would be precisely. I would assume that we would then be generating distinct go types for each version, in distinct packages (e.g. x/types/v1beta1/*.pb{,.gw}.go). This would involve updating the go_package option for all protobuf types.

(cc: @Olshansk )

@okdas okdas force-pushed the alpha-testnet-v0.0.11-upgrade branch from cf24879 to 986ed73 Compare December 10, 2024 00:51
Copy link

gitguardian bot commented Dec 12, 2024

⚠️ GitGuardian has uncovered 3 secrets following the scan of your pull request.

Please consider investigating the findings and remediating the incidents. Failure to do so may lead to compromising the associated services or software components.

🔎 Detected hardcoded secrets in your pull request
GitGuardian id GitGuardian status Secret Commit Filename
13506804 Triggered Generic High Entropy Secret c457f7b localnet/kubernetes/config-path-1.yaml View secret
14150880 Triggered Generic High Entropy Secret c457f7b localnet/kubernetes/config-path-3.yaml View secret
14150881 Triggered Generic High Entropy Secret c457f7b localnet/kubernetes/config-path-2.yaml View secret
🛠 Guidelines to remediate hardcoded secrets
  1. Understand the implications of revoking this secret by investigating where it is used in your code.
  2. Replace and store your secrets safely. Learn here the best practices.
  3. Revoke and rotate these secrets.
  4. If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.

To avoid such incidents in the future consider


🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.

@Olshansk
Copy link
Member

Firstly, great investigation @okdas!

@bryanchriswhite I want to avoid the module/v1beta1/*.proto refactor just now.

Let's continue with:

  • Only incrementing IDs
  • Using reserved fields for deprecated fields

Things we'll look out for:

  • How do other cosmos chains do this?
  • When (if) we hit major proto refactors, we'll revisit package versioning

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
code health Cleans up some code
Projects
Status: 👀 In review
Development

Successfully merging this pull request may close these issues.

3 participants