Introduction

Welcome to The Dat Protocol, a technical book about The Dat Project. The Dat protocol is a p2p hypermedia protocol. It provides public-key-addressed file archives which can be synced securely and browsed on-demand.

Who This Book Is For

This book is written for people interested in understanding the details of the Dat protocol, and potentially implementing parts of it themselves. We go into details about the different components that make up the protocol, and provide guidance on how to approach an implementation yourself.

This book is different from the Dat Enhancement Proposals (DEPs). DEPs focus on standardizing and specifying the Dat protocol. This book focuses on providing an introduction to key concepts, and guidance on how to implement them.

Source Code

The source files from which this book is generated can be found on GitHub.

Key Concepts

In this section we'll introduce you to the conceptual components that make up the Dat protocol. You don't need to know these to create your first Dat archive, but they're essential if you're trying to understand what happens at the protocol level.

Flat Tree

Flat Trees are the core data structure that power Dat's Hypercore feeds. They allow us to deterministically represent a tree structure as a vector. This is particularly useful because vectors map elegantly to disk and memory.

Because Flat Trees are deterministic and pre-computed, there is no overhead to using them. In effect this means that Flat Trees are a specific way of indexing into a vector more than they are their own data structure. This makes them uniquely efficient and convenient to implement in a wide range of languages.

Thinking About Flat Trees

You can represent a binary tree in a simple flat list using the following structure:

      3
  1       5
0   2   4   6  ...

Let's rotate the tree on its side for notational purposes:

 0─┐
   1─┐
 2─┘ │
     3
 4─┐ │
   5─┘
 6─┘

Each number represents an index in a flat list. Given the tree:

 D─┐
   B─┐
 E─┘ │
     A
 F─┐ │
   C─┘
 G─┘

In memory this would be expressed as the list (vector) [D B E A F C G], or by index: [0 1 2 3 4 5 6].

Depth

Indexes 0, 2, 4, 6 have a depth of 0. And indexes 1 and 5 have a depth of 1.

depth = 2  ^        3
depth = 1  |    1       5
depth = 0  |  0   2   4   6  ...

If we convert the graph to a chart we could express it as such:

depth = 0 | 0 2 4 6
depth = 1 | 1 5
depth = 2 | 3
depth = 3 |

Now let's add numbers up to 14:

depth = 0 | 0 2 4 6 8 10 12 14
depth = 1 | 1 5 9 13
depth = 2 | 3 11
depth = 3 | 7

Node Kinds

You might notice that the amount of numbers at depth = 0 is vastly greater than the amount at every other depth. We refer to nodes at depth = 0 as leaf nodes, and nodes at every other depth as parent nodes.

leaf nodes   | 0 2 4 6 8 10 12 14
parent nodes | 1 3 5 7 9 11 13

An interesting aspect of flat trees is that the number of leaf nodes and the number of parent nodes stay in balance. This leads to an interesting insight:

  • All even indexes refer to leaf nodes.
  • All uneven indexes refer to parent nodes.

The depth of a tree node can be calculated by counting the number of trailing 1s a node has in binary notation.

5 in binary = 101 (one trailing 1)
3 in binary = 011 (two trailing 1s)
4 in binary = 100 (zero trailing 1s)
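In code, counting trailing 1s maps to a single bit trick: negate the number and count its trailing zeros. A minimal sketch in Rust:

/// Depth of a flat-tree node: the number of trailing 1 bits in its index.
/// depth(5) == 1, depth(3) == 2, depth(4) == 0.
pub fn depth(i: usize) -> usize {
    (!i).trailing_zeros() as usize
}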

Offset

When reading about flat-trees the word offset might regularly pop up. This refers to the offset from the left hand side of the tree.

In the following tree the indexes with an offset of 0 are: [0 1 3 7]:

(0)┐
  (1)┐
 2─┘ │
    (3)┐
 4─┐ │ │
   5─┘ │
 6─┘   │
      (7)

In the next tree the indexes with an offset of 1 are: [2 5 11]:

  0──┐
     1──┐
 (2)─┘  │
        3──┐
  4──┐  │  │
    (5)─┘  │
  6──┘     │
           7
  8──┐     │
     9──┐  │
 10──┘  │  │
      (11)─┘
 12──┐  │
    13──┘
 14──┘
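The offset can be computed directly from an index as well; a small sketch building on the depth trick above (an illustration, not the canonical flat-tree source):

/// Offset of a flat-tree node: how far it is removed from the left-most
/// node at its depth. offset(2) == 1, offset(5) == 1, offset(11) == 1.
pub fn offset(i: usize) -> usize {
    let depth = (!i).trailing_zeros() as usize;
    i >> (depth + 1)
}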

Relationships Between Nodes

When describing nodes we often also talk about the relationships between nodes. This includes words such as uncle and parent.

Take this example tree:

 0─┐
   1─┐
 2─┘ │
     3─┐
 4─┐ │ │
   5─┘ │
 6─┘   │
       7
 8
  • parent: A parent has two children under it, and is always odd-numbered. Node 3 is the parent of 1 and 5.
  • leaf: A node with no children. A leaf node is always even-numbered. Nodes 0, 2, 4, 6 and 8 are leaf nodes.
  • sibling: The other node that shares a parent with the current node. For example nodes 4 and 6 are siblings.
  • uncle: A parent's sibling. Node 1 is the uncle of nodes 4 and 6.
  • root: A top-most node where the full tree under it is complete (i.e. all parent nodes have 2 children). Node 3 is a root node.
  • span: The two nodes that are furthest apart in the sub-tree. The span of node 1 is 0, 2. The span of node 3 is 0, 6.
  • right span: The right-most node in the span. The right span of node 1 is 2. The right span of node 3 is 6.

References

  • https://gist.github.com/jimpick/54adc72f11f38f1fe4bc1d45d3981708
  • https://github.com/jimpick/hypercore-simple-ipld/blob/master/tree-test.js
  • https://datatracker.ietf.org/doc/rfc7574/?include_text=1
  • https://www.datprotocol.com/deps/0002-hypercore/

Merkle Tree

Merkle Trees in Dat are specialized Flat Trees that contain the content of the archives.

In this section we'll cover how Merkle Trees work, and how we use them inside Hypercore.

What Are Merkle Trees?

Wikipedia defines a Merkle Tree as:

A hash tree or Merkle tree is a tree in which every leaf node is labelled with the hash of a data block and every non-leaf node is labelled with the cryptographic hash of the labels of its child nodes.

In flat-tree terminology this means all leaf nodes (even indexes) contain hashes of data, and all parent nodes (uneven indexes) contain hashes of their children's hashes.

Take the following tree:

  0──┐
     1──┐
  2──┘  │
        3
  4──┐  │
     5──┘
  6──┘
  • Nodes 0, 2, 4, and 6 contain the hashes of data.
  • Node 1 contains the hash of hashes from nodes 0 and 2.
  • Node 5 contains the hash of hashes from nodes 4 and 6.
  • Node 3 contains the hash of hashes from nodes 1 and 5.
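To make the two hash roles concrete, here's a small sketch. Dat itself uses BLAKE2b with type tags (see DEP-0002); SHA-256 from the sha2 crate is used here purely for illustration:

use sha2::{Digest, Sha256};

/// Hash of a data block (a leaf node, even index).
fn leaf_hash(data: &[u8]) -> Vec<u8> {
    Sha256::digest(data).to_vec()
}

/// Hash of two child hashes (a parent node, uneven index).
fn parent_hash(left: &[u8], right: &[u8]) -> Vec<u8> {
    let mut hasher = Sha256::new();
    hasher.update(left);
    hasher.update(right);
    hasher.finalize().to_vec()
}

fn main() {
    let h0 = leaf_hash(b"A");
    let h2 = leaf_hash(b"B");
    let h1 = parent_hash(&h0, &h2); // node 1 = hash of nodes 0 and 2
    assert_eq!(h1.len(), 32);
}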

Hypercore Files

A Hypercore's internal structure typically consists of these files:

  • data: a file containing the data added to the Hypercore.
  • tree: a file containing the Merkle tree of hashes derived from the data.
  • signatures: a file containing the cryptographic signatures of the hashes in the tree file.
  • bitfield: a file to keep track of which data we have locally, and which data is part of the network.
  • public key: a file containing the public key. This is used for verifying content.
  • secret key: a file containing the signing key. This is used for adding new content, and is only available on Hypercores you've created.

The names we use for these files here are also the names used in Hypercore's specs and implementations. When inspecting a .dat directory you'll see these terms used as suffixes, for example content.tree or metadata.signatures.

The tree file is responsible for verifying the integrity of the data that's being appended to the feed.

The signature file is responsible for ensuring the integrity of the entire tree at any given state. Every entry in the signature file verifies the current state of the entire tree file.

Not every Hypercore is the same. In most implementations of Dat it's possible to choose how data is stored. For server applications it makes sense to store it in a single file. But for desktop applications it can sometimes make sense to store content directly on disk. For example in the case of (hyper)media files.

Let's look at how these files relate to each other to create a Hypercore feed.

Merkle Trees In Theory

Whenever data is added to Hypercore, a new entry is created in the data file. We then hash the data, and write that hash to a tree file's leaf node. If the leaf node has a sibling, the parent node's hash can be computed. If the new parent node has a sibling, we can compute a new parent node above it. We recurse upward until no more parent nodes can be computed, at which point we'll have reached a root node.

When there are no more hashes left to compute, we gather all the root nodes in the tree, concatenate their hashes, hash the result, and sign it with our private key. We then store the signature in our signatures file at the same index as the entry that was added.
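As a sketch of that signing step, with hash and sign passed in as stand-ins for the real primitives (Dat uses BLAKE2b and Ed25519; the function names here are illustrative):

/// Concatenate the current root hashes, hash the result, and sign it.
fn sign_roots(
    roots: &[Vec<u8>],
    hash: impl Fn(&[u8]) -> Vec<u8>,
    sign: impl Fn(&[u8]) -> Vec<u8>,
) -> Vec<u8> {
    let mut concat = Vec::new();
    for root in roots {
        concat.extend_from_slice(root);
    }
    sign(&hash(&concat))
}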

This might all sound a little abstract though, so let's look at an example.

Merkle Trees In Practice

We're starting off with an empty Hypercore feed. We're planning to add 4 pieces of data to it, one by one: [A B C D].

Entry 1

Let's add our first piece of data to Hypercore. We insert the value "A" as our first entry. In order, the following actions happen:

  1. We append the data to the data file.
  2. We compute a hash (#0) from the data, and store it at index 0 in our tree file.
  3. We gather all root nodes from our tree file (which is just the node at index 0 right now), and compute a cryptographic signature from the hash.
  4. We append the signature to our signatures file.

data

0: A

tree

0: #0 = hash(data[0])

signatures

0: sig(#0)

Entry 2

Let's add our second entry to Hypercore. We now have more nodes, which means things are a little different:

  1. We append the data to the data file.
  2. We compute a hash from the data (#2), and store it at index 1 in our tree file.
  3. Because #2 has a sibling hash (#0), we can now compute the parent hash (#1).
  4. We gather all root nodes from our tree file, and compute a cryptographic signature from the hash. The only root node currently available is #1. Note that while we compute #1 from the hashes in our tree file, we do not store it there. Only even-numbered hashes are stored in the tree file, as the odd-numbered ones can be calculated from them.
  5. We append the signature to our signatures file.

When we talk about "computing a parent hash" it means we concatenate (+) the hashes from both child nodes, and hash the result. Hashes are always the same length, which means every node in the tree is the same length.

data

0: A
1: B

tree

0: #0 = hash(data[0]) ─┐
                       #1 = hash(#0 + #2)
1: #2 = hash(data[1]) ─┘

signatures

0: sig(#0)
1: sig(#1)

Entry 3

So far so good. We're well on our way to building a full tree-structure! Let's continue on our journey, and add our third entry: C.

  1. We append the third piece of data to the data file at index 2.
  2. We compute a hash from the data (#4), and store it at index 2 in our tree file.
  3. We now have two root hashes: #1 and #4. So we concatenate them, hash the result, and sign that hash.
  4. We append the signature to our signatures file.

data

0: A
1: B
2: C

tree

0: #0 = hash(data[0]) ─┐
                       #1 = hash(#0 + #2)
1: #2 = hash(data[1]) ─┘

2: #4 = hash(data[2])

signatures

0: sig(#0)
1: sig(#1)
2: sig(#1 + #4)

Entry 4

Let's add the final piece of data to our feed. This will balance out the tree, and bring us back to a single root hash!

  1. We append the fourth piece of data to the data file at index 3.
  2. We compute a hash from the data (#6), and store it at index 3 in our tree file.
  3. Our only root node is now #3, so we compute a signature from its hash.
  4. We append the signature to our signatures file.

data

0: A
1: B
2: C
3: D

tree

0: #0 = hash(data[0]) ─┐
                       #1 = hash(#0 + #2) ─┐
1: #2 = hash(data[1]) ─┘                   │
                                           #3 = hash(#1 + #5)
2: #4 = hash(data[2]) ─┐                   │
                       #5 = hash(#4 + #6) ─┘
3: #6 = hash(data[3]) ─┘

signatures

0: sig(#0)
1: sig(#1)
2: sig(#1 + #4)
3: sig(#3)

Verifying A Merkle Tree

TODO

Root Nodes

If the number of leaf nodes is a power of 2, the flat tree will only have a single root. Otherwise it'll have more than one.

Storage Format

Each node in the Merkle Tree is stored on disk as a 40 byte buffer: the first 32 bytes are the hash, and the next 8 bytes are the byte size of the tree the node spans.

The format for storing nodes is:

  • 32 byte header which starts with a magic number to indicate what type of file it is.
  • Then a series of nodes, where each index in the sequence corresponds to a position in the Flat Tree.

To read the 6th node from disk (flat tree node #5), you'd use an offset into the file of 32 + 5 * 40, and then read 40 bytes. The first 32 bytes are the hash. The last 8 bytes are the combined length of the data that nodes #4 and #6 reference. The length is encoded as a Big Endian uint64.
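That layout translates to a little arithmetic; a sketch with illustrative helper names:

use std::convert::TryInto;

/// Byte offset of flat-tree node `index` in the tree file:
/// a 32 byte header, then 40 bytes per node.
fn node_offset(index: u64) -> u64 {
    32 + index * 40
}

/// Split a 40 byte node buffer into its hash and its Big Endian
/// uint64 byte length.
fn parse_node(buf: &[u8; 40]) -> (&[u8], u64) {
    let hash = &buf[..32];
    let length = u64::from_be_bytes(buf[32..].try_into().unwrap());
    (hash, length)
}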

References

  • https://gist.github.com/jimpick/54adc72f11f38f1fe4bc1d45d3981708
  • https://github.com/datrs/tree-index/issues/7#issuecomment-419086236

Bitfield

What is a bitfield?

A bitfield is a space-efficient data structure used to figure out which data you have and which data you don't. It's meant to always be kept in memory because it's small enough.

At its core, a bitfield is a way of efficiently describing sets of numbers. You can think of it as a series of bits. When the bit at a position is 1, that position is in the set. When it's 0, it isn't.

Example

bits:  00101
index: 01234

The set above contains 2 and 4. It's stored left to right, because that's the way it's enumerated.

In Dat we use bitfields to describe which pieces of data we have, and which pieces of data we don't. Bitfields are also used internally to index data structures, such as the Merkle Tree.

Indexed Bitfields

The most common bitfield operations in Dat are checking whether we have a piece of data, and finding a piece of data that's missing.

Checking if we have a piece of data is straightforward, as all we have to do is look in the bitfield in the position of the data and see if it's a 1.
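In code that check is a byte lookup plus a bit mask; a small sketch, assuming bits are stored left to right as in the example above:

/// Check whether entry `index` is present in the bitfield.
fn has(bitfield: &[u8], index: usize) -> bool {
    let byte = bitfield[index / 8];
    let mask = 0b1000_0000 >> (index % 8);
    byte & mask != 0
}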

Finding a piece of data we're missing is a bit more tricky: naively it requires a linear scan of the whole bitfield. In order to speed this up, we use an Indexed Bitfield.

Structure

At a high level, Indexed Bitfields are a binary tree structure where each node is made up of 2 bits.

  • 11 indicates all bits under this node are 1s.
  • 00 indicates all bits under this node are 0s.
  • 10 indicates bits under this node are a mixture of 1s and 0s.
  • 01 is currently unused and reserved for (possible) future purposes.

We call this the Tree Index Scheme.

Consider this Indexed Bitfield, written as a sequence of bits:

10101110000000

Because the bits are indexed as a flat-tree, we can print it as a tree structure:

       10
  10       00
10  11   00  00

By looking at the root node we can tell that there are nodes in the tree, but not yet which ones. By going one level lower however, it becomes clear that there are nodes in the left side of the tree, and none in the right side. This means we only need to check the children of the left node to find out exactly which nodes we have.

A fun fact: a completely zeroed-out buffer is a valid Indexed Bitfield - it just means the set is empty.

Optimizing the Structure

Looking at a byte costs the same as looking at a bit, so you want to extract as much information as possible from every byte you read.

Therefore, in order to get the most performance out of our structure, we construct our tree out of whole bytes instead of pairs of bits.

Consider the following scheme. Given two bytes A and B, split each of them into four pairs of two bits. We'll use a1 a2 a3 a4 to indicate the pairs in A, and b1 b2 b3 b4 to indicate the pairs in B.

The parent of A and B is C. C is constructed by applying the Tree Index Scheme to each pair of bits.

                [a1 + a2, a3 + a4, b1 + b2, b3 + b4]
[a1, a2, a3, a4]                                    [b1, b2, b3, b4]

In the example above, we use the + operator to indicate the application of the Tree Index Scheme.

In the future we might make this even more efficient using SIMD instructions, which can operate on more bits at the same time.

Lookup Tables

An efficient implementation of the previous scheme can be done using lookup tables covering all 256 possible byte values.

This is all solely for performance and completely optional. The important part is that the indexing scheme is followed.
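As a sketch of how such a table could be built and used, assuming the 11/00/10 scheme from the Structure section (the function names are illustrative):

/// Combine two 2-bit values according to the Tree Index Scheme:
/// 11 if both are all ones, 00 if both are all zeros, 10 otherwise.
fn combine(left: u8, right: u8) -> u8 {
    match (left, right) {
        (0b11, 0b11) => 0b11,
        (0b00, 0b00) => 0b00,
        _ => 0b10,
    }
}

/// Reduce one byte (four 2-bit pairs) to one 4-bit half of a parent byte.
fn reduce(byte: u8) -> u8 {
    let high = combine((byte >> 6) & 0b11, (byte >> 4) & 0b11);
    let low = combine((byte >> 2) & 0b11, byte & 0b11);
    (high << 2) | low
}

/// Precompute `reduce` for every possible byte.
fn build_table() -> [u8; 256] {
    let mut table = [0u8; 256];
    for i in 0..=255u8 {
        table[i as usize] = reduce(i);
    }
    table
}

/// Parent byte of two child bytes: two lookups, a shift, and an OR.
fn parent_byte(table: &[u8; 256], a: u8, b: u8) -> u8 {
    (table[a as usize] << 4) | table[b as usize]
}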

Types of Bitfields

We have 3 bitfields:

  • Data Bitfield: Indicates which data you have, and which data you don't.
  • Indexed Bitfield: Helps efficiently search through the Data Bitfield using the Tree Index Scheme.
  • Merkle Tree Bitfield: Indicates which nodes in the Merkle Tree you have, and which nodes you don't.

This means that whenever you update the Data Bitfield, you must also update the Indexed Bitfield.

Updating a Byte

If we want to set an index in a bitfield to false, we need to flip a bit to 0. Because we can only operate on bytes, the easiest way to achieve this is to apply a bitmask.

Consider the following lookup table, in binary notation:


let data_update = vec![
  0b01111111, // 127
  0b10111111, // 191
  0b11011111, // 223
  0b11101111, // 239
  0b11110111, // 247
  0b11111011, // 251
  0b11111101, // 253
  0b11111110, // 254
];

There are 8 entries in this table, each with a different bit set to zero. To flip a bit to zero, take the index of the bit you want to flip, look up the entry at that index in the table, and bitwise AND the two numbers.
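For example, clearing the bit at index 2 of a byte:

let byte: u8 = 0b1111_1111;
let mask: u8 = 0b1101_1111; // data_update[2] from the table above
let cleared = byte & mask;  // the bit at index 2 is now 0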

Serialization

For every piece of data there's going to be 1 bit in the Data Bitfield, and 2 bits in the Merkle Tree Bitfield: one for the leaf node and one for a parent node (there are about as many parents as there are leaves).

For every 16 bits in the Data Bitfield, the Indexed Bitfield needs 2 bits to indicate whether they're all 1s, all 0s, or a mixture, and another 2 bits for the Tree Index Scheme, totalling 4 bits in the Indexed Bitfield.

So this translates to the following ratios:

  • Data: 1024 bytes.
  • Merkle Tree: 2048 bytes.
  • Indexed Tree: 256 bytes.

Run Length Encoding

When sending data over the wire, we want to compress the bitfields further. An efficient way of doing this is by using Run Length Encoding (RLE). TODO: explain the module. For now read the README.

Storage


Headers

All the files in Dat's Storage are considered SLEEP files. However, not all files require a header. Only the files that have a variable algorithm (such as hashing, signing) require a SLEEP header.

Usage in Dat

When you append data, you basically update the Merkle Tree. Updating the Merkle Tree creates a new Root Hash. For security we need to sign this Root Hash, so people can trust it's the latest one.

It's important to know that everything in Hypercore is a byproduct of appending data to the Feed. This includes Signatures, Hashes, Root Nodes and more.

Types of Storage

  • data: The concatenated data that was appended to the feed.
  • merkle tree: The hashes of the data, and hashes of hashes of data - stored in a tree. Also stores the length of the data.
  • signatures: We take the Root Hash of the Merkle Tree, and sign that one.
  • bitfield: Space-efficient data structure used to figure out which data you have and what data you don't.
  • key: Ed25519 Public Key (part of a keypair).
  • secret key: Ed25519 Secret Key (part of a keypair).

When you produce a new Signature, the index for the Signature is the same index as for the data you appended to the Feed.

Random Access Storage

Hypercore supports multiple persistence backends through the random-access-storage interface. Each storage backend adheres to a standardized interface to write, read and delete ranges of bytes.

There are many different backends available, but generally implementations provide at least the following:

  • random-access-memory: stores bytes in-memory. Ideal for testing, and providing an initial implementation.
  • random-access-disk: stores bytes on disk. Generally useful as Hypercore's first persistent backend.

Because Hypercore and Dat support partially downloading data, a useful feature is to implement sparse persistence. This means we can write data into memory or to disk with gaps in between, without paying for the gaps.

For example if we want to write bytes 0 to 10, and bytes 9877 to 11000, the space between 10 and 9877 should not cost us anything.

This is generally implemented using pagers. A pager is a vector (or array) where each entry is a range. If we want to access a particular range, we look up the correct entry in the pager, and allocate it if needed. This way we never reserve space for ranges we don't use.

It's generally good to use 1kb pages when accessing memory, and 4kb pages when accessing disk (SSD). This ensures that data will be written as contiguous chunks, which is good for performance.

Implementation Guide

flat-tree

flat-tree is one of the first modules that should be implemented for the Dat protocol. See Key Concepts: Flat Tree for an overview of how Flat Trees work.

Core API

The core flat-tree API consists of several methods that can calculate the nodes relative to each other.
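Most of these methods boil down to arithmetic on a node's depth and offset. Here's a minimal sketch of that arithmetic (an illustration, not the canonical flat-tree source) from which the signatures below can be implemented:

fn depth(i: usize) -> usize {
    (!i).trailing_zeros() as usize
}

fn offset(i: usize) -> usize {
    i >> (depth(i) + 1)
}

/// Index of the node at a given depth and offset.
fn index(depth: usize, offset: usize) -> usize {
    (offset << (depth + 1)) | ((1 << depth) - 1)
}

/// Parent: one level up, at half the offset.
fn parent(i: usize) -> usize {
    index(depth(i) + 1, offset(i) / 2)
}

/// Sibling: same depth, neighbouring offset.
fn sibling(i: usize) -> usize {
    index(depth(i), offset(i) ^ 1)
}

/// Children: one level down, at double the offset; leaves have none.
fn children(i: usize) -> Option<(usize, usize)> {
    let d = depth(i);
    if d == 0 {
        return None;
    }
    let o = offset(i) * 2;
    Some((index(d - 1, o), index(d - 1, o + 1)))
}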

Children

Returns both children of a node. It cannot return any children when querying a leaf node, so the result must be a Null, Maybe, or Option type, depending on the language used.


pub fn children_with_depth(i: usize, depth: usize) -> Option<(usize, usize)>

count

Returns how many nodes are in the subtree that the node spans.


pub fn count_with_depth(i: usize, depth: usize) -> usize

depth

Returns the depth of a node at a given index.


pub fn depth(i: usize) -> usize

full_roots

Returns a list of all the full roots (subtrees where all nodes have either 2 or 0 children).


pub fn full_roots(i: usize) -> Vec<usize>

index

Return the index for a node at a given depth and offset.


pub fn index(depth: usize, offset: usize) -> usize

left_child

Return the left child of the node at index.


pub fn left_child(i: usize) -> Option<usize>

left_span

Returns the left-most child of the node at index.


pub fn left_span(i: usize) -> usize

offset

Returns the offset of a node at a given index. The offset of a node is how far it is removed from the left-most node at its current depth.


pub fn offset(i: usize) -> usize

parent

Returns the parent of a node.


pub fn parent(i: usize) -> usize

right_child

Return the right child of the node at index.


pub fn right_child(i: usize) -> Option<usize>

right_span

Returns the right-most child of the node at index.


pub fn right_span(i: usize) -> usize

sibling

Returns the node that shares the same parent node.


pub fn sibling(i: usize) -> usize

spans

Returns a pair of the left-most and right-most nodes that the node at index spans.


pub fn spans(i: usize) -> (usize, usize)

uncle

Returns the parent's sibling node.


pub fn uncle(i: usize) -> usize

Iterator API

Some upstream modules require stateful traversal of the tree. It can be convenient to expose a stateful iterator module as part of flat-tree for those cases.

The iterator should be a constructor that takes an initial index, and exposes methods to move to child nodes, parent nodes, etc. The iterator should also expose the index so it can be stored or used for other computations.
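A minimal sketch of such an iterator, reusing the parent and sibling helpers sketched above (the method set is illustrative, not a fixed API):

pub struct Iterator {
    index: usize,
}

impl Iterator {
    /// Create an iterator positioned at `index`.
    pub fn new(index: usize) -> Self {
        Self { index }
    }

    /// The current position in the flat tree.
    pub fn index(&self) -> usize {
        self.index
    }

    /// Move to (and return) the parent node.
    pub fn parent(&mut self) -> usize {
        self.index = parent(self.index);
        self.index
    }

    /// Move to (and return) the sibling node.
    pub fn sibling(&mut self) -> usize {
        self.index = sibling(self.index);
        self.index
    }
}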

Optimizations

Calculate Depth

The depth of a tree node can be calculated by counting the number of trailing 1s in its index. Languages such as Rust expose <num>.trailing_zeros(), which counts the amount of zeros at the end of a number (the cttz intrinsic). By combining a bitwise negation (! or NOT operation) with the .trailing_zeros() method, the amount of trailing ones can be counted. On x86 machines this should be around 3 instructions when inlined.

Re-use the depth parameter

Sometimes when calling multiple functions on the same index, depending on the execution environment it can be efficient to reuse the depth parameter.

However, if the optimized depth method is used, there's no strict need to expose the with_depth* methods, as the compiler will detect the duplicate computation and remove it anyway. Mileage may vary, but this technique has been tested on LLVM output for the Rust version. In the worst case there might be a penalty of up to 3 instructions per call, which seems like a negligible penalty.

Calculate children

Calculating whether a node has children can be sped up by checking whether its index is even or uneven. If the index is even, the node is guaranteed to be a leaf node, so it can't have any children.

Spans

Similarly for spans. If an even number is targeted, it is at the bottom of the tree, so it will only span itself. Therefore it's easy to implement an is_even check, and return the index that was passed in if it's true.

Full Roots

To prevent allocations, full_roots could also write into either a pre-allocated vector or a stack-allocated collection instead.

Merkle Tree Stream

We covered how Dat's Merkle Trees work in the Merkle Tree Chapter. In this chapter we'll discuss how to implement it, and what to look out for when optimizing.

Core Abstraction

At its core, the merkle tree stream is a stateful algorithm: it takes data in (leaf nodes), hashes it, and computes as many parent hashes as it can.

This can be expressed as a stateful class that has an .append() or .next() method. Internally it keeps track of three properties:

  • An index to keep track of how far along in the sequence we are.
  • A vector of hashes.
  • The current roots.

To prevent allocations, both the vector of hashes and current roots could be passed in externally, but this is not a requirement.

Methods

merkle-tree-stream should operate as a generic structure without tying itself to any particular hash function. This means that when creating an instance, the methods have to be passed in. The following methods are generally required:

  • leaf: takes data in, and produces a hash of the data.
  • parent: takes two hashes, and produces a new hash.

In strictly typed languages, both the input data and the hash need to be passed as type parameters into the generic structure. In dynamically typed languages, both can generally be expressed as byte arrays.
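Putting the pieces together, here's a sketch of such a generic structure (the names are illustrative, not the exact merkle-tree-stream API):

/// Flat-tree parent (see the flat-tree chapter).
fn flat_parent(i: usize) -> usize {
    let depth = (!i).trailing_zeros() as usize;
    let offset = i >> (depth + 2);
    (offset << (depth + 2)) | ((1 << (depth + 1)) - 1)
}

#[derive(Clone)]
struct Node {
    index: usize,
    hash: Vec<u8>,
}

struct MerkleTreeStream<L, P> {
    leaf: L,       // hashes a block of data
    parent: P,     // hashes two child hashes
    roots: Vec<Node>,
    blocks: usize, // how many leaves we've appended so far
}

impl<L, P> MerkleTreeStream<L, P>
where
    L: Fn(&[u8]) -> Vec<u8>,
    P: Fn(&[u8], &[u8]) -> Vec<u8>,
{
    fn new(leaf: L, parent: P) -> Self {
        Self { leaf, parent, roots: Vec::new(), blocks: 0 }
    }

    /// Append a block; returns the leaf node plus any parent nodes
    /// that could now be computed.
    fn append(&mut self, data: &[u8]) -> Vec<Node> {
        // Leaves live at even flat-tree indexes.
        let leaf = Node { index: 2 * self.blocks, hash: (self.leaf)(data) };
        self.blocks += 1;
        let mut produced = vec![leaf.clone()];
        self.roots.push(leaf);
        // While the two right-most roots are siblings, merge them.
        while self.roots.len() > 1 {
            let n = self.roots.len();
            let (left, right) = (&self.roots[n - 2], &self.roots[n - 1]);
            if flat_parent(left.index) != flat_parent(right.index) {
                break;
            }
            let node = Node {
                index: flat_parent(left.index),
                hash: (self.parent)(&left.hash, &right.hash),
            };
            self.roots.truncate(n - 2);
            produced.push(node.clone());
            self.roots.push(node);
        }
        produced
    }
}

Appending [A B C D] with this sketch produces node 0, then nodes 2 and 1, then node 4, then nodes 6, 5 and 3, matching the walkthrough in the Merkle Tree chapter.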

Async

In practice it's perfectly fine to write the stream as a synchronous method. For most implementations of Dat, I/O tends to be the bottleneck, but on a live system that might not always be the case (e.g. contention with other resources). To improve the overall performance of merkle-tree-stream, it can therefore be useful to schedule work over multiple cores.

The first candidate for parallelization is the hashing of the leaf nodes. Unlike parent hashes, leaf hashes are functions of the data alone, so they can be freely scheduled on different cores.

A bigger challenge is creating the hashes for the parent nodes. Because each parent depends on having two child nodes, a parent node can't be computed before its children have been. This means that if we're only adding a single entry to the stream, the work cannot be parallelized: each hash depends on hashes that came before it.

Take the following tree:

  0──┐
     1──┐
  2──┘  │
        3
  4──┐  │
     5──┘
  6──┘

The hashes of [0 2 4 6] can be computed in parallel. Once those are complete, [1 5] can be computed. And then [3].
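As a sketch, leaf hashing could be fanned out with a data-parallel library; rayon and sha2 here are illustrative choices, not something Hypercore mandates:

use rayon::prelude::*;
use sha2::{Digest, Sha256};

/// Hash all leaf blocks in parallel. Parent hashes still have to be
/// computed level by level once their children are ready.
fn leaf_hashes(blocks: &[&[u8]]) -> Vec<Vec<u8>> {
    blocks
        .par_iter()
        .map(|block| Sha256::digest(block).to_vec())
        .collect()
}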

The exact mechanisms for storing and synchronizing nodes are a topic that could use more research, but a read-write lock around the internal vector of hashes combined with a task queue seems like the right starting point.

memory-pager

Data in Hypercore streams can be replicated out of order. This means that you could have both the first message in a feed and the millionth message in a feed. To ensure that we don't allocate space for a million messages when we only have two, we make use of a data structure called a memory pager.

Memory pagers work by keeping a vector of pointers to fixed-size byte buffers. Each buffer (or page, if you will) is only allocated once data needs to be written into it.

Example

Writing

Say we have a memory pager with pages that are 1 kilobyte (1024 bytes) each. The initial state would be:

[]

All we have is an empty list with no values.

For on-disk pagers, 4 kilobytes is a reasonable default page size, as it maps directly to most operating systems' internal paging structures, which means pages can generally be allocated and cleared efficiently. We use 1 kilobyte pages here to keep the numbers small.

Let's say we want to write the 2000th byte. We need to figure out which page this will be on. We can do this by dividing the index by the page size; the resulting integer is the page we want. E.g. 2000 / 1024 = 1.

In languages that don't support integer division, the equivalent of Math.floor() should be called on the resulting value.

Now that we have the right page number, we need to find the right index into the page. We can do this with the modulo operator: 2000 % 1024 results in index 976. That's because page 1 holds bytes 1024 through 2047, and byte 2000 sits 976 bytes into it.

Putting this together: we need to write to index 976 on the second page (page index 1). This would look like:

[
  None,
  Some(Vec), // 1024 bytes
]

The first page isn't allocated, so we put a None (or null) value there. The second page (index 1) is allocated, so we keep a pointer to its buffer. All unallocated pages in between are kept as None values.

Reading

Reading values from the pager can be split up into two operations:

  • reading a value from an empty page
  • reading a value from an existing page

It's generally recommended that any value read from the memory pager can be an empty variant (None or null), as it might not have been written yet. This is similar to accessing values in most vector implementations.

When a value is read from a page that hasn't been allocated, the resulting value will be empty too.

If the page exists, the index into the buffer must be calculated (see the Writing section), and that value can be returned.

Implementation

We can get the page by dividing the index by the page size:


let page_num = index / page_size;
let page = get_page(page_num); // get the page from our list.

To get the index into the page, we take the index modulo the page size:


let page_index = index % page_size;

Putting these together:


fn set(&mut self, index: usize, value: T) {
  let page_num = index / self.page_size;
  // Grow the list of pages with unallocated (None) entries as needed.
  while self.pages.len() <= page_num {
    self.pages.push(None);
  }
  // Allocate the page on first write.
  let page = self.pages[page_num]
    .get_or_insert_with(|| vec![T::default(); self.page_size]);
  let page_index = index % self.page_size;
  page[page_index] = value;
}

Appendix

The following sections contain reference material you may find useful in your Dat journey.

Terminology

The following terms are used when describing the Dat Protocol.

Terminology Currently in Use

  • feed: The main data structure in Hypercore. Append-only log that uses multiple data structures and algorithms to safely store data.
  • data: Atomic pieces of data that are written to the feed by users.
  • keypair: An Ed25519 keypair used to sign data with.
  • signature: Certificate that proves a data structure was created by a specific keypair.
  • tree: A binary tree mapped as a flat-tree to keep an index of the current data.
  • flat-tree: Mapping of a series of integers to a binary tree structure.
  • bitfield: Space-efficient data structure used to figure out which data you have and what data you don't. Meant to always be kept in memory because it's small enough.
  • run-length-encoding (RLE): Basic compression scheme used to compress bitfields when sending over the wire.
  • parent node: A parent has two children under it, and is always odd-numbered. Node 3 is the parent of 1 and 5.
  • leaf node: A node with no children. A leaf node is always even-numbered. Nodes 0, 2, 4, 6 and 8 are leaf nodes.
  • sibling node: The other node that shares a parent with the current node. For example nodes 4 and 6 are siblings.
  • uncle node: A parent's sibling. Node 1 is the uncle of nodes 4 and 6.
  • root node: A top-most node where the full tree under it is complete (i.e. all parent nodes have 2 children). Node 3 is a root node.
  • node span: The two nodes that are furthest apart in the sub-tree. The span of node 1 is 0, 2. The span of node 3 is 0, 6.
  • right node span: The right-most node in the span. The right span of node 1 is 2. The right span of node 3 is 6.