Blockchain

Summary

A blockchain is simply a distributed ledger. Unlike in the trust-based model, this ledger is collectively maintained by a peer-to-peer network. Nodes in the network follow consensus rules to maintain a single truthful history of transactions, as long as the majority of the network is not controlled by a few malicious parties. No central authority is needed in this process and trust is established through computation.

Trust-based vs Trustless Transactions

When you pay a 20-dollar bill at a brick-and-mortar store, you no longer possess the bill. So spending the same money at a different store is not possible. On the Internet, when you make a purchase using your debit card, your bank records the transaction. Double-spending is also impossible without hacking into your bank account.

In the brick-and-mortar case, the store needs to trust nothing more than knowing the bill is not counterfeit. Therefore, it does not usually ask for your personal information.  This is an example of offline trustless transaction. On the contrary, for the transaction over the Internet, your bank acts as a trusted 3rd party (intermediary) between you and the online store. This is an example of online trust-based transaction.

trustlesstrust-based

Preventing double-spending becomes much harder for online trustless transactions: is it possible to transact on the Internet without a trusted 3rd party sitting between a seller and a buyer?

The solution proposed by the original bitcoin paper [1] replaces a central authority with a peer-to-peer network, and its private ledger with a public distributed ledger. In this scenario, a party initiates a transaction by publishing it to the peer-to-peer network. Nodes in the network follow a set of consensus rules to validate the transaction and record (or refuse to record) it in the distributed ledger. The intermediary and the trust in between transacting parties and the intermediary is completely removed from the picture. Instead trust in transactions recorded in the distributed ledger is established through computation.

Trustless (online).png

Consensus in a Peer-to-Peer Network

How is consensus reached in a distributed network? In other words, how do nodes in the network agree on which transactions to be added to the distributed ledger and in what order? When there is central control unit in the network, consensus can be simply enforced by this unit making a decision, and broadcasting it to the rest of a network. But this same unit also represents a single point of failure of the whole system.

In comparison, in a peer-to-peer network, each peer has equal privileges and responsibilities. The system is more resilient to local failures (e.g., unreliable nodes or unreliable communication among them). In fact, nodes in a peer-to-peer network can join and leave the network at will, without causing the system to fail. On the flip side, consensus is much hard to achieve. The following example illustrates the challenge in reaching consensus in a peer-to-peer network.

double_spending

In this example, Bob sends a coin to Alice (TX1) and publishes it to the network. For the sake of the simplicity, we assume there are only 3 nodes in the network. TX1 first reaches Node 3, which validates and propagates it to the rest of the network. Suppose Alice tries to double spend the coin received from Bob. First, Alice sends it to John in TX2. Before TX2 is added to the ledger, She sends the same coin to Mike in TX3. As shown in the diagram, TX2 reaches Node 1 first while TX3 reaches Node 2 first. Since these two transactions are originated from the same input, only one can be added to the ledger. As the nodes may receive TX2 and TX3 in different orders, they cannot simply make the decision based on the order when the transaction were received.

Blockchain

The key to reach consensus in a peer-to-peer network is that each peer in the network follows a same set of rules to determine the validity of transactions. To make sense of these rules, it is necessary to introduce some concepts such as transaction blocks,  mining and block-chaining.

When nodes receive transactions, they pack them into blocks. A block is simply a collection of transactions waiting to be added into the ledger. Blocks created by different nodes may contain different transactions. Continuing on with our earlier example, Node 1 may creates a block that contains TX2, while Node 2 may creates a block that contains TX3.

After a node creates a new block, the goal is to add it to the ledger before anyone else does. In order to do so, the node computes a hash for the block using a cryptographic hash function. You can think of a hash as a single fixed-size (256-bit/32-byte) digest of a large piece of data. Hashes are large numbers and are usually written as hexadecimal. For example, the hash for Bitcoin Block #1000 is [4]:

00000000c937983704a73af28acdec37b049d214adbda81d7e2a3dd146f6ed09

Two properties of the cryptographic hash function [3] makes it useful in this context:

  1. the hash function is one-way: while it is quick to compute a hash from its input, it is practically infeasible to recover the input from a hash,
  2. the hash function is deterministic: the same input almost always produces the same hash, and a small change to the input changes its hash dramatically.

To be accepted into the ledger, a block’s hash must have a certain number of leading zeroes, as shown in the example above. To compute a block’s hash, the node includes a random number, called nonce, in the block. The goal is then to find a right nonce for the block so that its hash has required number of leading zeros. The “one-way” property of the hash function means that the only way to find the right nonce is by brute force. That is by trying every possible nonce until the right one is found. The more leading zeros is required, the more hash computation is needed on average to find the right nonce.

We can think of this process as a race, where the nodes compete with one another for the next spot in the ledger. This race is intentionally designed to be a resource-intensive, where resources being computing time and electricity. The winner gets a reward for compensating the resource it consumes. This process is called “mining” as an analogy to gold mining, and mining nodes in the network are called “miners“.

When a block is successfully mined, it is added to the ledger by chaining after the previous block. The block-chaining operation is implemented by including the hash of the previous block as part of the input when computing current block’s hash. The ledger is  then simply a chain of blocks formed in this way, thus the name blockchain. The “deterministic” property of the hash function ensures when a block is added to the blockchain, it and the transactions it contains cannot be modified without changing the hashes of itself and all the blocks chained after it.

It is helpful to visualize a blockchain as a stack of blocks. The following diagram shows the blockchain, assuming TX1 and TX2 in our example is recorded in Block #1000 and #1001 respectively.

blockchain.png

Decentralized Consensus

To reach consensus, nodes in the peer-to-peer network follows a set of  consensus rules to generate and validate transactions and blocks. For example, a transaction-level rule makes sure a new transaction can not spend coins that have already been spent by early transactions. A block-level rule checks if a block’s data produces its claimed hash value, which has required number of leading zeros. [5] describes these rule in great details.

Using the same set of consensus rules, each nodes in the peer-to-peer network validates transactions and blocks independently. This ensure the robustness of the system, as nodes do not depend on one another for decisions-making. After a newly announced block (and the transactions it contains) passes all  the rules, it is considered as valid and is added to the local ledge of the validating node. Because all nodes follow the same set of rules, they eventually add the same blocks to their local ledgers and in the same order, and therefore reach a system-wide consensus. In other words, the peer-to-peer system effectively maintains a single distributed ledger, where each node keeps a local cache of that ledger.

This provides a mechanism to achieve decentralized emergent consensus, which is considered the main innovation in Bitcoin [5]. Decentralized, because there is no central decision-making for reaching consensus during system operation time (although the consensus rules have be agreed on before hand). Emergent, because consensus is not achieved explicitly – there is no election or fixed moment when consensus occurs. Instead, consensus is an emergent artifact of the asynchronous interaction of thousands of independent nodes, all following simple rules.

Computation-based Trust

The cryptographic hash function plays critical roles in providing decentralized consensus. That is why bitcoins and other alt-coins are often called cryptocurrencies. First, mining, resulted directly  from the “one-way” property of the cryptographic hash function, provides proof-of-work for a block to be considered as valid by the network. A proof-of-work is a piece of data that is difficult (costly, time-consuming) to produce, but easy for others to verify.

Recall that to produce a hash with certain number of leading zeros for a block, a miner must perform a large number of hash calculations in a brute force manner. This process consumes computing time and electricity. Once a hash is found, everyone in the network can easily verify the hash is correct (by performing a single hash calculation) and estimates how much resource has been spent on average to produce the hash.

At the same time, the block-chaining operation ensures no one can change transaction history without incurring large cost. More specifically, in order to change any transaction in a block, a miner must repeat the mining operations for that block and all the blocks chained after it, thanks to the “deterministic” property of the cryptographic hash function. Even if a miner did repeat required mining operations and produce a competing chain of blocks, that chain must outpace (be longer than) the original one in order to be accepted by the network.

It is important to realize that mining and block-chaining operations together ensure security of the system, and establish trust in the information recorded in the blockchain. It has been shown [1] that, as long as the majority of the miners is not controlled by the attacker,  the probability diminishes exponentially as the number of blocks the attacker has to catch up with increases. This computation-based trust is in sharp contrast to the trust in an intermediary between transacting parties based largely on good faith.

References

  1. Satoshi Nakamoto, Bitcoin: A Peer-to-Peer Electronic Cash System. Technical Report, 2008. link
  2. Leslie Lamport, Robert Shostak and Marshall Pease, The Byzantine Generals Problem. ACM Transactions on Programming Languages and Systems, pages 382-401, volume 4/3, 1982. link
  3. Cryptographic hash  functionWikipedialink
  4. Block Explorer. blockchain.info. link
  5. Andreas M. Antonopoulos, Mastering Bitcoin: Unlocking Digital Cryptocurrencies. O’Reilly Media, 2014. link