Sharding [ /ˈʃɑːdɪŋ/ ] is a method of partitioning a database horizontally across separate servers to improve scalability, performance and data availability.
In distributed ledger technologies (DLTs) like Radix, sharding is used to distribute both data storage and transaction execution across a decentralized network of nodes to achieve a high transactional capacity.
Traditional blockchains like Bitcoin and Ethereum rely on every node in the network processing and storing every transaction. This provides strong decentralization and security by avoiding reliance on any trusted parties. However, it fundamentally limits the throughput of the system to what a single node can validate, resulting in poor scalability.
Sharding aims to overcome this trade-off between scalability, security and decentralization (often called the blockchain trilemma) by creating a mechanism where nodes only need to store and process a subset of the total transactions, known as a ‘shard.’ By splitting up the workload in this manner, sharding enables distributed ledgers to parallelize the processing of transactions across shards, potentially increasing the throughput of the network quadratically compared to a non-sharded system.
Radix has developed an integrated sharding and consensus architecture specifically designed for hyper-scalability of its decentralized network. In Radix’s case, sharding applies to both data availability and transaction execution as both functions are performed by nodes.
The current Radix Mainnet (Babylon) is pre-sharded into a fixed space of 2^256 shards. Responsibility for validating shards is undertaken by groups of validators called shard groups, which may grow or shrink dynamically in response to load. Currently, the number of shard groups is capped at one, but this cap will be lifted with Radix’s forthcoming Xi’an release.
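The relationship between the fixed shard space and a variable number of shard groups can be illustrated with a minimal sketch. The article does not specify how shards are assigned to shard groups, so the equal contiguous ranges used here, and the names `SHARD_SPACE` and `shard_group_for`, are purely hypothetical and are not Radix's actual scheme.

```python
# Hypothetical illustration only: the assignment rule below is an assumption,
# not the mechanism used by Radix.

SHARD_SPACE = 2**256  # fixed number of shards stated in the article


def shard_group_for(shard: int, group_count: int, shard_space: int = SHARD_SPACE) -> int:
    """Map a shard index to one of `group_count` equal, contiguous ranges."""
    if group_count < 1:
        raise ValueError("group_count must be at least 1")
    if not 0 <= shard < shard_space:
        raise ValueError("shard index lies outside the shard space")
    # Integer arithmetic keeps the result exact even for 256-bit values.
    return shard * group_count // shard_space


# With a single shard group (the current cap), every shard maps to group 0.
only_group = shard_group_for(2**255, group_count=1)

# With four groups, the same shard falls into the third range (index 2).
third_group = shard_group_for(2**255, group_count=4)
```

Under this assumption, changing the number of groups only changes which validators cover which range; the shard indices themselves never move.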
Radix's pre-sharding approach, in which the shard space is fixed up front, contrasts with the dynamic adaptive state sharding model adopted by Shardeum, MultiversX, and NEAR, where shards are added incrementally as required. While sharding can improve scalability, an ad hoc approach leads to substantial difficulties, because any change to the shard structure requires reorganizing the entire network, a time-consuming and expensive process. The larger the sharded ledger grows, the more problematic this becomes. Ad hoc sharding also complicates queries and data lookups within the ledger. When data is sharded randomly, it becomes much harder to locate specific transactions or data points, since they could be stored anywhere. This slows down queries, as more extensive searches are required.
Shards on Radix are indexed deterministically by public keys. This means that the shard index for any address can be calculated by taking the public key modulo the size of the shard space.
$$ \Large s_i = p_i \bmod S \qquad \footnotesize \qquad \begin{array}{l} s = shard~index \\ p = public~key\\S = total~ shard~space \end{array} $$
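The calculation above can be sketched in a few lines of Python. This is a minimal illustration, not Radix's implementation: the function name `shard_index` and the choice to read the key bytes as an unsigned big-endian integer are assumptions made for the example.

```python
# Sketch of s_i = p_i mod S. The byte-to-integer encoding is an assumption.

SHARD_SPACE = 2**256  # S: total shard space stated in the article


def shard_index(public_key: bytes, shard_space: int = SHARD_SPACE) -> int:
    """Return the shard index for a public key given as raw bytes."""
    if shard_space <= 0:
        raise ValueError("shard_space must be positive")
    # Interpret the key as an unsigned big-endian integer (assumed encoding),
    # then reduce it modulo the shard space.
    p = int.from_bytes(public_key, "big")
    return p % shard_space


# A made-up 33-byte key: prefix byte 0x02, 31 zero bytes, final byte 0x2A.
example_key = bytes([0x02]) + bytes(31) + bytes([0x2A])
example_shard = shard_index(example_key)
```

Because the result depends only on the key and the fixed shard space, any node can recompute the same index without consulting a lookup table.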
By deterministically grouping related data into the same shard, Radix avoids the need for expensive data reorganization as the network grows. This creates four major advantages:
A key challenge in sharding distributed ledgers is ensuring sufficient security and node coverage across all shards. If some shards have far fewer nodes than others, those shards become easier to attack. Radix employs several techniques to maintain security across its sharded network:
Together, these mechanisms ensure Radix can securely scale to an exponentially growing shard space without running into coverage gaps or centralization issues. The network organically self-regulates to distribute validation across shards.
Main article: Cerberus