memory-pager
Data in Hypercore streams can be replicated out of order. This means that you could both have the first message in a feed, and the millionth message in a feed. To ensure that we don't allocate space for a million messages when we only have two we make use of a data structure called a memory pager.
Memory pagers work by having a vector of pointers to fixed-sized byte buffers. Each buffer (or page, if you will) is only allocated once there it has data that needs to be written into it.
Example
Writing
Say we have a memory pager with pages that are 4 kilobytes long each. The initial state would be:
[]
All we have is an empty list with no values.
4 kilobytes is a reasonable default for pages, as it maps directly to most operating system's page internal paging structures. This means that this can generally efficiently be allocated & cleared.
Let's say we want to write the 2000th byte. We need to figure out which page
this will be on. We can do this by dividing the index by the page size, and the
resulting number (integer) is the page we want to index. E.g. 2000 / 1024
=
1
.
In languages that don't support integer division, the equivalent of
Math.floor()
should be called on the resulting value.
Now that we have the right page number, we need to find the right index on the
page. We can do this by using the modulo operator. E.g. 2000 % (1024 - 1)
results in index 997
. We subtract 1 from the page size because the first entry
is located a position 0, not position 1.
Putting this together, we need to write the 997th byte on the second page (index 1). This would look as:
[
None,
Some(Vec), // 1024 bytes
]
The first page isn't allocated so we put a None
(or null
) value there. The
second page (index 1) is allocated, so we keep a pointer into the buffer. All
gaps in between bytes are kept as None types.
Reading
Reading values from the pager can be split up into two operations:
- reading a value from an empty page
- reading a value from an existent page
It's generally recommended that any value read from the memory pager can be an
empty variant (None
or null
), as it might not have been written yet. This is
similar to accessing values in most vector implementations.
When a value is read from an empty page, if the page is empty the resulting value will be empty too.
If the page exists, the index into the buffer must be calculated (see the Writing section), and that value can be returned.
Implementation
We can get the page by dividing the index by the page size:
# #![allow(unused_variables)] #fn main() { let page_num = index / (page_size - 1); let page = get_page(page_num); // get the page from our list. #}
To get the adjusted index into the page, we need to translate the page index:
# #![allow(unused_variables)] #fn main() { let page_index = index % (page_size - 1); #}
Putting these together:
# #![allow(unused_variables)] #fn main() { fn set(index: usize, value: T) { let page_num = index / (self.page_size - 1); let page = match self.pages.get(page_num) { Some(page) => page, None => /* allocate a new page and return it */, } let page_index = index % (page_size - 1); page[page_index] = value; } #}