Feat/tx tracking #5484

Draft · wants to merge 19 commits into base: develop
Conversation

@rdeioris (Contributor) commented Nov 20, 2024

Description

This PR implements another step of #4358, specifically the transaction query system.

As discussed previously, this is an optional endpoint, useful for implementers of services/abstraction APIs, that can be enabled with the already available STACKS_TRANSACTION_LOG environment variable.

The current transaction log system is based on storing each transaction in the "transactions" table of the index db. Each row contains the txid, the index_block_hash and the hex-encoded body of the transaction. In its current form this requires a lot of storage, and many of the stored transactions are effectively orphaned.
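
For reference, a minimal sketch of the kind of table described above (column names and types are illustrative assumptions, not the actual index-db DDL):

```rust
/// Rough shape of the transaction log table described above.
/// Column names/types are assumptions for illustration only.
const TRANSACTIONS_TABLE_SKETCH: &str = r#"
CREATE TABLE IF NOT EXISTS transactions (
    txid TEXT NOT NULL,              -- transaction id, hex encoded
    index_block_hash TEXT NOT NULL,  -- block (by index block hash) that included it
    tx_hex TEXT NOT NULL             -- full hex-encoded transaction body
);
CREATE INDEX IF NOT EXISTS index_transactions_by_txid ON transactions(txid);
"#;
```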

The patch first addresses the "testability" issue: the TRANSACTION_LOG value holding the status of the STACKS_TRANSACTION_LOG env var is a per-process static. In test mode, TRANSACTION_LOG becomes a thread_local (instead of a lazy_static), with the idea that each test requiring it calls borrow_mut() to set it.

There is a bit of refactoring of the TRANSACTION_LOG check, which is now wrapped by an is_transaction_log_enabled() function. This should not be an issue, as the function will be inlined in non-test mode.
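
A minimal sketch of this pattern, assuming the env var is treated as a "1"/unset flag (the exact parsing and module layout in the PR may differ):

```rust
#[cfg(test)]
use std::cell::RefCell;

#[cfg(not(test))]
lazy_static::lazy_static! {
    // Per-process flag, read once from the environment.
    static ref TRANSACTION_LOG: bool =
        std::env::var("STACKS_TRANSACTION_LOG").map(|v| v == "1").unwrap_or(false);
}

#[cfg(test)]
thread_local! {
    // Per-thread flag so each test can toggle it via borrow_mut().
    static TRANSACTION_LOG: RefCell<bool> = RefCell::new(false);
}

#[cfg(not(test))]
#[inline]
pub fn is_transaction_log_enabled() -> bool {
    *TRANSACTION_LOG
}

#[cfg(test)]
pub fn is_transaction_log_enabled() -> bool {
    TRANSACTION_LOG.with(|flag| *flag.borrow())
}
```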

It is not necessarily "the best" solution, just a surgical way to avoid thrashing the current code.

It is important that every test using it restores the original state, so I have used a structure implementing the Drop trait to hold it (honestly, I do not know how Rust-idiomatic that is).
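
Building on the thread_local sketch above, such a guard could look roughly like this (the name and exact shape are hypothetical, not the PR's code):

```rust
/// Test-only guard: flips the thread-local TRANSACTION_LOG flag and restores
/// the previous value when dropped, so a test cannot leak state.
#[cfg(test)]
pub struct TransactionLogGuard {
    previous: bool,
}

#[cfg(test)]
impl TransactionLogGuard {
    pub fn enable() -> Self {
        let previous = TRANSACTION_LOG.with(|flag| *flag.borrow());
        TRANSACTION_LOG.with(|flag| *flag.borrow_mut() = true);
        TransactionLogGuard { previous }
    }
}

#[cfg(test)]
impl Drop for TransactionLogGuard {
    fn drop(&mut self) {
        TRANSACTION_LOG.with(|flag| *flag.borrow_mut() = self.previous);
    }
}

// Usage inside a test:
// let _tx_log = TransactionLogGuard::enable();
// ... run assertions that need the transaction log ...
// The flag is restored automatically when _tx_log goes out of scope.
```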

The user queries by txid, so one of the main tasks of the endpoint is to find the txid with a valid index_block_hash (optionally enforcing a specific tip with the tip query string).

Applicable issues

Additional info (benefits, drawbacks, caveats)

Currently the transaction body is extracted from the Nakamoto staging db instead of the transactions table. This is definitely inefficient, but I did it with the idea that we could probably remove the tx body from the transactions table entirely (drastically reducing the storage it requires, especially since I doubt it will be useful to query invalid/orphaned transactions, and mempool queries are already exposed by the /unconfirmed endpoint).

As an additional point, we could reduce the impact of the patch even further by returning only the index_block_hash in the endpoint output instead of the whole transaction body (the body could then be extracted with an additional /v3/blocks query). Right now a JSON object with the index_block_hash and the body is returned.
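
For illustration, the current payload could be modeled roughly like this (field names are assumptions, not the PR's exact serialization):

```rust
use serde::{Deserialize, Serialize};

/// Approximate shape of the endpoint output described above.
#[derive(Serialize, Deserialize)]
pub struct TxTrackingResponse {
    /// Block (by index block hash) in which the transaction was found.
    pub index_block_hash: String,
    /// Hex-encoded transaction body; could be dropped in favor of a
    /// follow-up /v3/blocks query, as suggested above.
    pub tx: String,
}
```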

Checklist

  • Test coverage for new or modified code paths
  • Changelog is updated
  • Required documentation changes (e.g., docs/rpc/openapi.yaml and rpc-endpoints.md for v2 endpoints, event-dispatcher.md for new events)
  • New clarity functions have corresponding PR in clarity-benchmarking repo
  • New integration test(s) added to bitcoin-tests.yml

@@ -665,6 +680,23 @@ impl<'a> ChainstateTx<'a> {
}
}
}

pub fn get_index_block_hashes_from_txid(
Member

The only way this would return multiple rows is if a transaction was mined in two different blocks. Is that a query this function is intended to serve?

Contributor Author

Well, the same txid can end up multiple times in the transactions table. Obviously we are interested in the txid that actually ends up on the chain, so it will be a two-step process where we return only the row with an index_block_hash that is valid (present in the staging db). If a user knows the index_block_hash this would not be required, but then again, if the index_block_hash is known there is no need for this endpoint (you can just extract the transaction from the block).
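
Roughly, the two-step lookup could be sketched like this (table/column names and the staging-db existence check are assumptions, not the PR's actual queries):

```rust
use rusqlite::{params, Connection};

/// Step 1: collect every index_block_hash recorded for the txid.
/// Step 2: keep only one whose block is actually present in the staging db.
/// The `transactions` and `nakamoto_staging_blocks` names are assumptions.
fn find_valid_index_block_hash(
    index_db: &Connection,
    staging_db: &Connection,
    txid: &str,
) -> rusqlite::Result<Option<String>> {
    let mut stmt =
        index_db.prepare("SELECT index_block_hash FROM transactions WHERE txid = ?1")?;
    let hashes: Vec<String> = stmt
        .query_map(params![txid], |row| row.get(0))?
        .collect::<Result<_, _>>()?;

    for hash in hashes {
        let exists: bool = staging_db.query_row(
            "SELECT EXISTS(SELECT 1 FROM nakamoto_staging_blocks WHERE index_block_hash = ?1)",
            params![hash],
            |row| row.get(0),
        )?;
        if exists {
            return Ok(Some(hash));
        }
    }
    Ok(None)
}
```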

Member

> Well, the same txid can end up multiple times in the transactions table.

Can you explain to me how this happens?

@rdeioris marked this pull request as ready for review December 1, 2024 17:29
@rdeioris requested review from a team as code owners December 1, 2024 17:29
@rdeioris (Contributor Author) commented Dec 1, 2024

Updated from draft to a real pull request. I have cleaned up the code and added a new HttpNotImplemented error (501) for when the transaction log is not enabled.

@aldur requested a review from obycode December 9, 2024 16:35
obycode previously approved these changes Dec 9, 2024

@obycode (Contributor) left a comment:

LGTM

docs/rpc-endpoints.md (outdated; resolved)
.take()
.ok_or(NetError::SendError("`txid` not set".into()))?;

node.with_node_state(|_network, _sortdb, chainstate, _mempool, _rpc_args| {
@jcnelson (Member) commented Dec 9, 2024

It seems like you are trying to answer two separate queries here:

  • What is the transaction for the given txid?
  • Is the transaction on the canonical Stacks fork?

Answering the latter can be very, very expensive with the approach you have taken. You're effectively scanning an unbounded number of blocks' transactions, and doing MARF reads on each block relative to the chain tip.

I don't know if there is value in only returning transactions if they are on the canonical chain, since nearly all the time they will be. Instead, I think it's just as effective to return the transaction and the list of block IDs in which the node knows it occurs (which will almost always be one block). It would then be on the user to verify that those blocks are canonical.
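
For illustration only, a payload following this suggestion might look roughly like the following (field names are assumptions, not from the PR):

```rust
use serde::{Deserialize, Serialize};

/// Possible output shape under this suggestion: the transaction plus every
/// block ID in which the node has recorded it (almost always one entry).
#[derive(Serialize, Deserialize)]
pub struct TxOccurrences {
    pub tx: String,                      // hex-encoded transaction body
    pub index_block_hashes: Vec<String>, // blocks known to contain the tx
}
```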

@jcnelson (Member) commented Dec 9, 2024

I asked in another thread how it could be possible for a transaction to be mined in multiple different Nakamoto blocks, and I think the only way this could happen is if there is a deep Nakamoto fork. This would only happen if there was a Bitcoin fork, and the second sibling Bitcoin block was mined substantially later than the first sibling, and the second (late) sibling became canonical. Nakamoto would preserve both Bitcoin forks and both Stacks forks atop them, and would write a transaction log entry for the same transaction in both forks.

It's not impossible, but it would be so unlikely that it might not be worth solving at this time.

Contributor Author

I like this approach way more than the current one. It will make the code much simpler, and extracting the tx from a block given the index_block_hash is straightforward.

I still get dozens of duplicated txids (with different index_block_hashes) in my testnet transaction log; I will try to understand why this is happening.

### Changed

- Use the same burn view loader in both block validation and block processing
>>>>>>> develop
Member

Merge artifact

@rdeioris marked this pull request as draft January 3, 2025 10:01
@rdeioris (Contributor Author) commented Jan 3, 2025

Converting back to draft, as I am changing the configuration logic as discussed during the Nakamoto meetings.
