aptos-foundation · areshand · Jun 27, 2024
diff --git a/aips/store_vm_debug_info_on_side.md b/aips/store_vm_debug_info_on_side.md
@@ -0,0 +1,104 @@
+---
+aip: (this is determined by the AIP Manager, leave it empty when drafting)
+title: Store VM Debug Info on Side
+author: areshand
+discussions-to (*optional): <a url pointing to the official discussion thread>
+Status: Draft
+last-call-end-date (*optional): <mm/dd/yyyy the last date to leave feedbacks and reviews>
+type: Core
+created: 06/27/2024
+updated (*optional): <mm/dd/yyyy>
+requires (*optional): <AIP number(s)>
+---
+
+# AIP-Store VM Debug Info on Side
+
+**Summary**
+
+We aim to separate transaction failure details from ledger authentication. Currently, our ledger history is verified through hashes of `TransactionInfo`, which includes detailed errors of failed transactions. This debug information helps developers understand failures and update their contracts. However, because the debug information feeds into the root hash calculation, we cannot update error details to improve the debugging experience without breaking backward compatibility.
+
+**High-level Overview**
+
+The error details to be decoupled and stored separately are limited to the `StatusCode` of `MiscellaneousError`. Once this basic flow is established, more non-critical fields could be stored on the side.
+
+```rust
+pub enum ExecutionStatus {
+    Success,
+    OutOfGas,
+    MoveAbort {
+        location: AbortLocation,
+        code: u64,
+        info: Option<AbortInfo>,
+    },
+    ExecutionFailure {
+        location: AbortLocation,
+        function: u16,
+        code_offset: u16,
+    },
+    MiscellaneousError(Option<StatusCode>), // we plan to remove status code
+}
+```
+
+These error detail fields will be set to `None`, and their actual values will be written to a new persistent storage. In the reference implementation, call sites that read this information retrieve the error details from that storage.
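As a rough illustration of the split described above, the following minimal sketch separates a VM status into the part that stays in the hashed ledger data and the part stored on the side. It is not the reference implementation: `StatusCode` is reduced to a plain wrapper, `ExecutionStatus` is trimmed to three variants, and `AuxiliaryData` and `split_status` are hypothetical names introduced only for this example.

```rust
/// Simplified stand-in for the VM status code (assumption: a plain number).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct StatusCode(pub u64);

/// Trimmed-down `ExecutionStatus` with only the variants this sketch needs.
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum ExecutionStatus {
    Success,
    OutOfGas,
    MiscellaneousError(Option<StatusCode>),
}

/// Hypothetical side record holding the detail removed from the hashed status.
#[derive(Clone, Debug, Default, PartialEq, Eq)]
pub struct AuxiliaryData {
    pub detail_error: Option<StatusCode>,
}

/// Splits a VM status into (status kept in the hashed data, data stored on the side).
/// Only `MiscellaneousError` loses its detail; other variants pass through unchanged.
pub fn split_status(status: ExecutionStatus) -> (ExecutionStatus, AuxiliaryData) {
    match status {
        ExecutionStatus::MiscellaneousError(code) => (
            ExecutionStatus::MiscellaneousError(None),
            AuxiliaryData { detail_error: code },
        ),
        other => (other, AuxiliaryData::default()),
    }
}
```

With this shape, changing how the detail code is produced only affects `AuxiliaryData`, so the hashed status stays stable across such changes.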
+
+**Out of Scope**
+
+The VM generates two types of debug info. We don’t plan to migrate the type that records a past failure and stays the same over time, such as `AbortLocation`, `code`, `function`, `AbortInfo`, and the failure `code_offset`.
+
+**Impact**
+
+| Simulation API | Fullnode (Execution Mode) | Fullnode (Applying Output Mode) |
+| --- | --- | --- |
+| No impact | No impact | Missing error details |
+
+There will be no impact on the simulation API; transactions will still be executed by the VM, and the error details will be preserved. The same applies to fullnodes operating in execution mode.
+
+However, for fullnodes that apply transaction outputs without executing the transactions, the detailed error information will be missing.
+
+To mitigate the impact, we are taking the following steps:
+
+1. Switch fullnodes backing the explorer to execution mode so that people can view detailed errors in our explorer. These detailed errors will be pruned within the same pruning window as other state data.
+2. Support restoring detailed error codes for historical transactions through our DB restore tooling, enabling the recovery of error details for old transactions. The aptos-debugger can rerun a past transaction against any fullnode and see the detailed error info. We will update the CLI and documentation to inform developers this is an option.
+
+With these two steps and the simulation API remaining unaffected, we believe the common use cases for detailed errors are covered.
+
+**Alternative solutions**
+
+We considered adding transaction auxiliary data as an additional source for state sync, allowing fullnodes to obtain the auxiliary data through state sync. However, we deprioritized this work based on three considerations:
+
+1. This information is not authenticated by the chain, so this synchronization is not trustless.
+2. Common use cases for detailed errors are already covered.
+3. We are migrating to a consensus observer model where fullnodes will sync by applying transactions.
+
+**Specification and Implementation Details**
+
+Once the VM finishes executing an aborted transaction, it generates both the transaction output and VM status. The VM status contains the detailed error code, which we use to populate the simulation API results, ensuring they remain unchanged.
+
+We are introducing a new field called `TransactionAuxiliaryData` to the BCS serialized `TransactionOutput` and removing the detailed error code from the `TransactionStatus`. The `TransactionAuxiliaryData` is written to a new database and can be read per version. Non-simulation APIs that read on-chain transaction data will check for the presence of auxiliary data and use it to populate detailed errors if available.
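The read path described above could look roughly like the sketch below, which is an assumption-laden illustration rather than the actual API code. `AuxiliaryDb` is a hypothetical in-memory stand-in for the new database, keyed by transaction version, and `status_for_api` is a hypothetical helper; the real `TransactionAuxiliaryData` layout may differ.

```rust
use std::collections::HashMap;

/// Simplified stand-in for the VM status code (assumption: a plain number).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct StatusCode(pub u64);

/// Trimmed-down `ExecutionStatus` with only the variants this sketch needs.
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum ExecutionStatus {
    Success,
    MiscellaneousError(Option<StatusCode>),
}

/// Simplified auxiliary record; the field name is an assumption.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct TransactionAuxiliaryData {
    pub detail_error: Option<StatusCode>,
}

/// Hypothetical in-memory stand-in for the new database, readable per version.
pub struct AuxiliaryDb {
    by_version: HashMap<u64, TransactionAuxiliaryData>,
}

impl AuxiliaryDb {
    pub fn new() -> Self {
        Self { by_version: HashMap::new() }
    }

    pub fn put(&mut self, version: u64, data: TransactionAuxiliaryData) {
        self.by_version.insert(version, data);
    }

    pub fn get(&self, version: u64) -> Option<&TransactionAuxiliaryData> {
        self.by_version.get(&version)
    }
}

/// Returns the status to show to API callers: the on-chain status, enriched
/// with the detailed code when auxiliary data exists for that version.
pub fn status_for_api(on_chain: &ExecutionStatus, version: u64, db: &AuxiliaryDb) -> ExecutionStatus {
    match (on_chain, db.get(version)) {
        (ExecutionStatus::MiscellaneousError(None), Some(aux)) => {
            ExecutionStatus::MiscellaneousError(aux.detail_error)
        }
        // No auxiliary data (e.g. a node that only applied outputs): return as stored.
        _ => on_chain.clone(),
    }
}
```

A node that never wrote auxiliary data for a version falls through to the second arm and returns the status without detail, which matches the "Missing error details" case in the impact table.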
+
+[Diagram (Lucid)](https://documents.lucid.app/documents/c3c3c13f-c3ce-4af0-a4d7-41b01302a4c9/pages/0_0?a=1006&x=-911&y=-287&w=2422&h=1020&store=1&accept=image%2F*&auth=LCA%2027cc0f6fefa91e16d0c7b6d1dc7a30f8e27cbaf3556d470381089b95ed6e9301-ts%3D1719448284)
+
+**Risks and Drawbacks**
+
+If only a subset of validators enable this feature, the root hash of their accumulator will differ from the rest, resulting in a lack of consensus with other validators. To ensure consensus is maintained, all validator nodes must upgrade to a new binary version that supports it before we enable the feature on-chain through a governance proposal.
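To make the divergence risk concrete, here is a toy sketch under stated assumptions: `committed_status` and its `aux_data_enabled` flag are hypothetical names, the status code is reduced to a bare `u64`, and `digest` uses the standard library's `DefaultHasher` purely as a stand-in for the real cryptographic hash behind the accumulator.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Trimmed-down `ExecutionStatus`; the status code is simplified to a `u64`.
#[derive(Clone, Debug, Hash, PartialEq, Eq)]
pub enum ExecutionStatus {
    Success,
    MiscellaneousError(Option<u64>),
}

/// Status a validator would commit to, depending on whether the
/// (hypothetical) feature flag is enabled on that validator.
pub fn committed_status(vm_status: &ExecutionStatus, aux_data_enabled: bool) -> ExecutionStatus {
    match vm_status {
        ExecutionStatus::MiscellaneousError(Some(_)) if aux_data_enabled => {
            ExecutionStatus::MiscellaneousError(None)
        }
        other => other.clone(),
    }
}

/// Toy digest standing in for the real cryptographic hash of the committed data.
pub fn digest(status: &ExecutionStatus) -> u64 {
    let mut hasher = DefaultHasher::new();
    status.hash(&mut hasher);
    hasher.finish()
}
```

For the same failed transaction, a validator with the flag on and one with it off feed different values into `digest`, so their results disagree. This is why the flag should only be enabled after every validator runs a binary that supports it.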
+
+**Security Considerations**
+
+No security concerns
+
+**Future Potential**
+
+The transaction auxiliary data introduced through this change can be used to store other non-critical transaction-related information that needs to be persisted in the database without being used to authenticate the ledger history.
+
+This approach makes it easier to update the error codes for transactions without breaking backward compatibility, allowing us to provide better error details for developers.
+
+**Timeline**
+
+The implementation is complete and hidden behind a feature flag.
+
+**Suggested developer platform support timeline**
+
+We don’t expect any immediate impact on developer experience. In the future, it would be desirable to store this information in the indexer to get longer retention and reduce the burden on the main DBs.
+
+**Suggested deployment timeline**
+
+Once the AIP is approved, we can deploy in the following release.