📖
[Deprecated] Launchpad Curriculum
  • README
  • About Launchpad
  • Launchpad Learning Resources
    • The Protocol Labs Network
      • What is the PL Network?
      • Teams in the PL Network
      • Protocol Labs Culture
      • Where We are Headed
      • Open Source Stewardship
      • Open Source Contribution
    • IPFS
      • IPFS Setup (Tutorial)
      • Mutable Content
      • IPFS Public DHT
      • Who Uses IPFS
      • How IPFS Works
      • Data Transfer
      • IPFS Subsystems & Architecture
      • The IPFS Gateways
      • The IPFS API (Tutorial)
      • IPFS Resources
    • IPLD
      • Content Addressing & CIDs
      • Graphs: Merkle DAGs
      • The IPLD Data Model
      • IPLD & IPFS
      • Codecs
      • IPLD Schemas
      • Paths & Selectors
      • Distributed Data Structures
      • The CAR Format
      • IPLD Resources
    • libp2p
      • Core Concepts
      • libp2p Connections and Users
      • Decentralized Messaging
      • Gossipsub
      • Distributed Networking Solutions
      • Nat Traversal and Hole Punching
      • libp2p Resources
    • Filecoin
      • How Filecoin Works
      • Filecoin Cryptoeconomics
      • The Filecoin Protocol
      • Drand
      • Progress & Future Work
      • Filecoin Resources
  • Other Resources
    • Pre-Requisites
      • Web3 & Protocol Labs
      • Building Web3
      • Web3 Tools
    • Protocol Labs Toolkits & SDKs
      • Textile
      • Ceramic
      • Fleek
      • Web3.Storage
      • Piñata
    • Built On IPFS and Filecoin
    • Additional Learning
Powered by GitBook
On this page
  • Merkle Roots
  • Merkle Mutability
  • Further Reading
Edit on GitHub
  1. Launchpad Learning Resources
  2. IPLD

Graphs: Merkle DAGs

PreviousContent Addressing & CIDsNextThe IPLD Data Model

Last updated 2 years ago

DAG is an acronym for Directed Acyclic Graph. This is a basic concept arising from graph theory and defines: any connected set of data where the connections go in only one direction ("directed") and the graph does not contain any cycles ("acyclic"). Ralph Merkle, formalised the hash tree pattern 1979 which specializes a form of DAG. Essentially:

Content being hashed may also contain hash digests of other content; therefore, any content "address" authenticates content "linked" via the inclusion of their digest in the tree below it

Hence: Merkle DAGs.

"Merkle Trees" are another common term for the same thing, however these are often associated with strict trees of hashes, where each level of the tree is only the concatenation of hashes below it. This structure is useful for "inclusion" algorithms but is not strictly necessary to define a "Merkle DAG" (or "tree"). Any node in the tree may contain hashes of content below that node along with any other content that may be useful at that node.

If you haven't worked with Merkle Trees before, .

Merkle Roots

The tip of any Merkle DAG is called its "root". The hash of that root can be said to authenticate (and "link to") all of the other content below it in the graph simply because each point in that graph includes hashes of content below it (i.e. hashes of hashes, all the way down).

A Git commit is the hash digest of the root of a large graph. This root "block" is a concatenation of various pieces of data, including the author of the commit, the timestamp, the previous commit hash digest and the hash digest of the filesystem "tree". The filesystem tree is built by hashing files, then concatenating the hashes of files within a directory to form a directory node which is then hashed and included in the parent of that directory, etc. i.e. a Git commit hash digest only hashes a small piece of data, but that data includes hashes of other data, forming a large tree spanning the files in that commit and all previous commits in the history of the Git repository.

A Bitcoin block is the hash of only 80 bytes of data! But that small packet of data includes the hashes of the previous block as well as the tip of a very large tree containing the transactions contained within that block. Bitcoin and other blockchains use the immutability and directionality of DAGs for authentication and inclusion of the entire history of the chain.

We can use the root of a Merkle DAG to address arbitrarily large amounts of data.

Merkle Mutability

A graph of immutable data can said to be "mutable" if we accept that the hashes will need to change to match the changed data, and these changed hashes will propagate all the way up through the graph to a new root.

Mutating content addressed data in this way provides some interesting properties, we can add, remove or modify data at any point in the graph and generate a new root each time.

  • New roots give us snapshots for each change.

  • Different snapshots (roots) may address much of the same data, since only small parts of the graph may have changed, this gives us de-duplication. One of the ways that Git is efficient is that only the changes need to be stored, and only those changed parts of a tree need be re-hashed to generate a new root (commit).

IPFS benefits from these same properties. A copy of Wikipedia on IPFS may take up many GBs, but changing a single page only requires changes to that page and its parents (up to the new "root"!). Having a CID for the old and new lets us move back and forward in time through the snapshots, while not needing to download the whole thing from scratch.

Further Reading

on Proto.school, a guided, interactive tutorial.

Merkle DAGs: Structuring Data for the Distributed Web
Merkle proofs Explained.
check out the tutorial in Protoschool to get a better understanding