Storage
• The persistence module of Canopy defines a Store interface for interacting with blockchain storage.
The Store struct is a high-level abstraction layer built on top of a single Key-Value database, providing 3 main components for managing blockchain-related data:
State Store: Maintains all versions of the state (ledger) for every height of the blockchain.
CanoTrie: Enables efficient verification of the entire state contents for any given version (height).
Indexer: Maintains non-state data such as Transactions, Blocks, and Quorum Certificates.
Under the hood, the Store uses BadgerDB in managed mode to keep historical versions of the state, enabling "time-travel" operations and historical queries.
▪ All writes to the State Store, CanoTrie, and the Indexer happen inside a single BadgerDB batch, ensuring they are committed atomically: one safe operation per block height.
The State Store is where the actual data blobs, such as Accounts and Validators, are stored to represent the blockchain's state.
• It serves as the primary storage layer, organized into two parts: 'historical' partitions and 'latest' data.
This split makes it easy to quickly access snapshots, efficiently iterate over data, and safely prune older state without slowing down the system.
Because it is built on a Key-Value implementation, the database does not support schemas like a relational database engine such as SQL.
However, logical groupings and efficient iteration are enabled by utilizing prefix keys.
By prepending succinct byte labels, a Key-Value implementation is transformed into an organized architecture:
▪ By leveraging the lexicographical ordering of the database implementation, deterministic optimizations may be made on numbers using BigEndianEncoding.
Thus, when retrieving a group, the keys are already ordered logically.
CanoTrie is an optimized Sparse Merkle Trie (SMT) designed for key-value storage. It combines properties of prefix trees and Merkle trees to efficiently handle sparse datasets and cryptographic integrity.
• At a high level, CanoTrie is optimized for blockchain operations, allowing efficient proofs of existence within the tree and enabling the creation of a single 'root' hash. This root hash represents the entire state, facilitating easy verification by other nodes to ensure consistency between peers' states.
CanoTrie design:
Sparse Structure: Keys are organized by their binary representation, with internal nodes storing common prefixes to reduce redundant paths
Merkle Hashing: Each node stores a hash derived from its children, enabling cryptographic proofs for efficient verification of data integrity
Optimized Traversals: Operations like insertion, deletion, and lookup focus only on the relevant parts of the tree, minimizing unnecessary traversal of empty nodes
Key-Value Operations: Supports upserts and deletions by dynamically creating or removing nodes while maintaining the Merkle tree structure
Optimizations over standard SMT:
Any leaf nodes without values are set to nil. A parent node is also nil if both children are nil
If a parent has exactly one non-nil child, replace the parent with the non-nil child
A tree always starts with two children, (0x0...) and (0xF...), and a Root
Basic Algorithm:
Tree Traversal
Navigate down the tree to set current to the closest node based on the binary of the target key
Operation
Upsert (Insert or Update)
If the target already matches current: Update the existing node
Otherwise: Create a new node to represent the parent of the target node and current
Replace the pointer to current within its old parent with the new parent
Assign the current node and the target as children of the new parent
Delete
If the target matches current:
Delete current
Replace the pointer to current's parent within current's grandparent with current's sibling
Delete current's parent node
ReHash
Update hash values for all ancestor nodes of the modified node, moving upward until the root
Example 1: Inserting 1101
Before Insert (Key = 1101):
After Insert (Inserted 1101 and 11):
Example 2: Deleting 010
Before Delete (Key = 010):
After Delete (Deleted 010 and compressed the tree):
When writing state data in the Store, 3 write operations are executed:
Latest State Store: { Key , Value }
Historical State Store: { Key , Value }
CanoTrie: { Hash(Key) , Hash(Value) }
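A minimal sketch of these three writes, using in-memory maps in place of the BadgerDB batch (the struct and key formats are illustrative, not Canopy's actual layout):

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// store stands in for the three write targets; maps replace the
// real BadgerDB batch for illustration.
type store struct {
	latest     map[string][]byte // live state: Key -> Value
	historical map[string][]byte // partitioned: "height/Key" -> Value
	trie       map[string][]byte // CanoTrie: Hash(Key) -> Hash(Value)
}

// writeState performs the three write operations per state entry.
func (s *store) writeState(height uint64, key, value []byte) {
	s.latest[string(key)] = value
	s.historical[fmt.Sprintf("%d/%s", height, key)] = value
	hk, hv := sha256.Sum256(key), sha256.Sum256(value)
	s.trie[string(hk[:])] = hv[:]
}

func main() {
	s := &store{map[string][]byte{}, map[string][]byte{}, map[string][]byte{}}
	s.writeState(42, []byte("account/alice"), []byte("balance=100"))
	fmt.Println(len(s.latest), len(s.historical), len(s.trie)) // 1 1 1
}
```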
Though there's an I/O tradeoff for writes, this implementation enables efficiency in the following operations:
Every few days, the Store executes partitioning logic:
Snapshot: At each partition boundary, copy the latest version of every key from the State Store into a new Historical Partition. This ensures that each Historical Partition contains a complete, independent snapshot of state at that height, allowing older partitions to be safely pruned without dependency risk.
Prune: Remove all versions in the Latest State Store that are older than the partition boundary, retaining only the newest versions necessary for live state execution.
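The snapshot-and-prune logic can be sketched as follows, assuming a simple versioned-entry model rather than Canopy's actual storage layout:

```go
package main

import "fmt"

// entry is one version of a key in the latest store; illustrative only.
type entry struct {
	key    string
	height uint64
	value  string
}

// partition copies the newest version of each key at or below the
// boundary into a snapshot, then returns the entries that survive
// pruning: versions newer than the boundary, plus the newest version
// of each key needed for live state execution.
func partition(latest []entry, boundary uint64) (snapshot map[string]string, kept []entry) {
	snapshot = map[string]string{}
	atBoundary := map[string]entry{}
	newest := map[string]entry{}
	for _, e := range latest {
		if cur, ok := newest[e.key]; !ok || e.height > cur.height {
			newest[e.key] = e
		}
		if e.height > boundary {
			continue
		}
		if cur, ok := atBoundary[e.key]; !ok || e.height > cur.height {
			atBoundary[e.key] = e
		}
	}
	for k, e := range atBoundary {
		snapshot[k] = e.value
	}
	for _, e := range latest {
		if e.height > boundary || newest[e.key] == e {
			kept = append(kept, e)
		}
	}
	return
}

func main() {
	latest := []entry{{"alice", 1, "10"}, {"alice", 5, "20"}, {"bob", 2, "7"}}
	snap, kept := partition(latest, 4)
	fmt.Println(snap["alice"], snap["bob"], len(kept)) // 10 7 2
}
```

Because each snapshot is a complete copy at the boundary, older partitions can be dropped without any dependency on the pruned versions.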
Benefits of this design:
✓ Fast State Iteration: The State Store remains lean with minimal version history per key, dramatically improving iteration speed and reducing read amplification.
✓ Efficient Historical Access: Historical data is chunked into partitions, enabling faster point-in-time lookups and smaller, more targeted iterator ranges.
✓ No Reverse Seeks: Thanks to managed timestamps and partitioned snapshots, queries avoid expensive reverse scans.
✓ Organized Historical Hierarchies: Historical partitions are logically structured, making archival, backup, and pruning operations predictable and scalable.
This component indexes important non-ledger data such as Quorum Certificates, Blocks, and Transactions.
• The indexer is designed to allow efficient querying and retrieval, which is crucial for blockchain operations: large value blobs (like Blocks or Transactions) are stored only once, while lightweight indexes are maintained separately.
This approach enables fast and flexible logical queries, such as:
Transaction by Hash
Transactions by Height
Transactions by Sender (address)
Transactions by Recipient (address)
Block by Hash
Block by Height
Quorum Certificate by Height
Double Signers by Height
Checkpoint by ChainID + Height
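A sketch of the store-once, index-many pattern behind these queries; the key formats are hypothetical, not the indexer's actual byte layout:

```go
package main

import "fmt"

// indexer stores each transaction blob once by hash, while the
// height/sender/recipient indexes store only a pointer to that hash.
type indexer struct {
	txByHash  map[string][]byte // hash -> full transaction blob
	hashByKey map[string]string // index key -> hash (lightweight)
}

// indexTx writes the blob once and three lightweight index entries.
func (ix *indexer) indexTx(hash string, blob []byte, height uint64, sender, recipient string) {
	ix.txByHash[hash] = blob
	ix.hashByKey[fmt.Sprintf("height/%d/%s", height, hash)] = hash
	ix.hashByKey[fmt.Sprintf("sender/%s/%s", sender, hash)] = hash
	ix.hashByKey[fmt.Sprintf("recipient/%s/%s", recipient, hash)] = hash
}

func main() {
	ix := &indexer{map[string][]byte{}, map[string]string{}}
	ix.indexTx("abc123", []byte("tx-bytes"), 7, "alice", "bob")
	// one blob stored, three lightweight pointers to it
	fmt.Println(len(ix.txByHash), len(ix.hashByKey)) // 1 3
}
```

A query such as "Transactions by Sender" then becomes a prefix scan over `sender/alice/`, followed by one blob lookup per hit.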
The Mempool is a pre-block, in-memory store component that holds valid transactions that are pending to be included in a block.
• Canopy's default Mempool implementation keeps only valid transactions in the pool by maintaining an ephemeral State Machine that applies each transaction as if the Mempool contents were the next block included in the blockchain.
Pending transactions are validated at the Mempool level based on the current state and any transactions already ahead of them in the queue.
▪ Mempool configurations allow users to limit their mempool by number of bytes and number of transactions; if either limit is exceeded, a certain percentage of the lowest-fee transactions are dropped by the pool.
Implementations that use one structure for state commitments, latest state reads, and historical state reads have proven incredibly inefficient for querying small sets of data, which led to segmenting the Store into distinct logical abstractions.
Importantly, the Mempool is ordered from highest fee to lowest, with the exception of the zero-fee Certificate Result Transaction, which requires a +2/3 signature from an auto-subsidized .
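The fee ordering and size limits can be sketched as below. Note this simplified policy drops transactions one at a time until the limits are met, whereas the actual mempool drops a configured percentage of the lowest-fee transactions; the types are illustrative:

```go
package main

import (
	"fmt"
	"sort"
)

// tx is a minimal pending transaction; fields are illustrative.
type tx struct {
	fee  uint64
	size int
}

// evict orders the pool highest fee first, then keeps transactions
// until the byte or count limit would be exceeded (simplified policy).
func evict(pool []tx, maxBytes, maxCount int) []tx {
	sort.Slice(pool, func(i, j int) bool { return pool[i].fee > pool[j].fee })
	var kept []tx
	total := 0
	for _, t := range pool {
		if len(kept) >= maxCount || total+t.size > maxBytes {
			break
		}
		kept = append(kept, t)
		total += t.size
	}
	return kept
}

func main() {
	pool := []tx{{fee: 1, size: 100}, {fee: 5, size: 100}, {fee: 3, size: 100}}
	kept := evict(pool, 250, 3)
	// the 250-byte limit forces the lowest-fee transaction out
	fmt.Println(len(kept), kept[0].fee) // 2 5
}
```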