Flat Tree
Flat Trees are the core data structure that power Dat's Hypercore feeds. They allow us to deterministically represent a tree structure as a vector. This is particularly useful because vectors map elegantly to disk and memory.
Because Flat Trees are deterministic and pre-computed, there is no overhead to using them. In effect this means that Flat Trees are a specific way of indexing into a vector more than they are their own data structure. This makes them uniquely efficient and convenient to implement in a wide range of languages.
Thinking About Flat Trees
You can represent a binary tree in a simple flat list using the following structure:
3
1 5
0 2 4 6 ...
Let's rotate the tree on its side for notational purposes:
0─┐
1─┐
2─┘ │
3
4─┐ │
5─┘
6─┘
Each number represents an index in a flat list. Given the tree:
D─┐
B─┐
E─┘ │
A
F─┐ │
C─┘
G─┘
The way this would be expressed in-memory would be as the list (vector):
[D B E A F C G]
or [0 1 2 3 4 5 6]
.
Depth
Indexes 0, 2, 4, 6 have a depth of 0. And indexes 1 and 5 have a depth of 1.
depth = 2 ^ 3
depth = 1 | 1 5
depth = 0 | 0 2 4 6 ...
If we convert the graph to a chart we could express it as such:
depth = 0 | 0 2 4 6
depth = 1 | 1 5
depth = 2 | 3
depth = 3 |
Now let's add numbers up to 14:
depth = 0 | 0 2 4 6 8 10 12 14
depth = 1 | 1 5 9 13
depth = 2 | 3 11
depth = 3 | 7
Node Kinds
You might be noticing that the numbers at depth = 0
is vastly greater than the
amount of numbers at every other depth. We refer to nodes at depth = 0
as
leaf nodes
, and nodes at every other depth as parent nodes
.
leaf nodes | 0 2 4 6 8 10 12 14
parent nodes | 1 3 5 7 9 11 13
An interesting aspect of flat trees is that the number of leaf nodes
and
number of parent nodes
is in perfect balance. This comes to an interesting
insight:
- All even indexes refer to
leaf nodes
. - All uneven indexes refer to
parent nodes
.
The depth of a tree node can be calculated by counting the number of trailing 1s a node has in binary notation.
5 in binary = 101 (one trailing 1)
3 in binary = 011 (two trailing 1s)
4 in binary = 100 (zero trailing 1s)
Offset
When reading about flat-trees the word offset
might regularly pop up. This
refers to the offset from the left hand side of the tree.
In the following tree the indexes with an offset of 0 are: [0 1 3 7]
:
(0)┐
(1)┐
2─┘ │
(3)┐
4─┐ │ │
5─┘ │
6─┘ │
(7)
In the next tree the indexes with an offset of 1 are: [2 5 11]
:
0──┐
1──┐
(2)─┘ │
3──┐
4──┐ │ │
(5)─┘ │
6──┘ │
7
8──┐ │
9──┐ │
10──┘ │ │
(11)─┘
12──┐ │
13──┘
14──┘
Relationships Between Nodes
When describing nodes we often also talk about the relationship between nodes.
This includes words such as uncle
, and parent
.
Take this example tree:
0─┐
1─┐
2─┘ │
3─┐
4─┐ │ │
5─┘ │
6─┘ │
7
8
- parent: A parent has two children under it, and is always odd-numbered. Node 3 is the parent of 1 and 5.
- leaf: A node with no children. A leaf node is always even-numbered. Nodes 0, 2, 4, 6 and 8 are leaf nodes.
- sibling: The other node that shares a parent with the current node. For example nodes 4 and 6 are siblings.
- uncle: A parent's sibling. Node 1 is the uncle of nodes 4 and 6.
- root: A top-most node where the full tree under it is complete (e.g. all parent nodes have 2 children). Node 3 is a root node.
- span: The two nodes that are furthest away in the sub-tree. The span of
node 1 is
0, 2
. The span of node 3 is0, 6
. - right span: The left-most node in the span. The right span of node 1 is 2. The right span of node 3 is 6.
References
- https://gist.github.com/jimpick/54adc72f11f38f1fe4bc1d45d3981708
- https://github.com/jimpick/hypercore-simple-ipld/blob/master/tree-test.js
- https://datatracker.ietf.org/doc/rfc7574/?include_text=1
- https://www.datprotocol.com/deps/0002-hypercore/