This module implements a specialized hash tree that is used primarily by cluster metadata's anti-entropy exchanges and by metadata clients for determining when groups of metadata keys have changed locally. The tree can be used, generally, for determining the differences in groups of keys, or to find missing groups, between two stores.
Each node of the tree is itself a hash tree, specifically a hashtree. The tree has a fixed height but each node has a
variable amount of children. The height of the tree directly
corresponds to the number of prefixes supported by the tree. A list
of prefixes, or a "prefix list", represent a group of keys. Each
unique prefix list is a node in the tree. The leaves store hashes
for the individual keys in the segments of the node's hashtree. The buckets of the leaves' hashtree provide an efficient
way of determining when keys in the segments differ between two
trees. The tails of the prefix list are used to roll up groups
into parent groups. For example, the prefixes [a, b], [a, c],
[d, e] will be rolled up into parent groups a, containing c
and b, and d, containing only 'e'. The parent group's node has
children corresponding to each child group. The top-hashes of the
child nodes are stored in the parent nodes' segments. The parent
nodes' buckets are used as an efficient method for determining when
child groups differ between two trees. The root node corresponds to
the empty list and it acts like any other node, storing hashes for
the first level of child groups. The top hash of the root node is
the top hash of the tree.
The tree in the example above might store something like:
node parent top-hash segments --------------------------------------------------- root none 1 [{a, 2}, {d, 3}] [a] root 2 [{b, 4}, {c, 5}] [d] root 3 [{e, 6}] [a,b] [a] 4 [{k1, 0}, {k2, 6}, ...] [a,c] [a] 5 [{k1, 1}, {k2, 4}, ...] [d,e] [d] 6 [{k1, 2}, {k2, 3}, ...]
When a key is inserted into the tree it is inserted into the leaf corresponding to the given prefix list. The leaf and its parents are not updated at this time. Instead the leaf is added to a dirty set. The nodes are later updated in bulk.
Updating the hashtree is a two step process. First, a snapshot of
the tree must be obtained. This prevents new writes from affecting
the update. Snapshotting the tree will snapshot each dirty
leaf. Since writes to nodes other than leaves only occur during
updates no snapshot is taken for them. Second, the tree is updated
using the snapshot. The update is performed by updating the hashtree nodes at each level starting with the leaves. The top
hash of each node in a level is inserted into its parent node after
being updated. The list of dirty parents is then updated, moving up
the tree. Once the root is reached and has been updated the process
is complete. This process is designed to minimize the traversal of
the tree and ensure that each node is only updated once.
The typical use for updating a tree is to compare it with another
recently updated tree. Comparison is done with the compare/4
function. Compare provides a sort of fold over the differences of
the tree allowing for callers to determine what to do with those
differences. In addition, the caller can accumulate a value, such
as the difference list or stats about differencces.
The tree implemented in this module assumes that it will be managed by a single process and that all calls will be made to it synchronously, with a couple exceptions:
1. Updating a tree with a snapshot can be done in another process. The snapshot must be taken by the owning process, synchronously. 2. Comparing two trees may be done by a seperate process. Compares should should use a snapshot and only be performed after an update.
The nodes in this tree are backend by LevelDB, however, this is most likely temporary and Cluster Metadata's use of the tree is ephemeral. Trees are only meant to live for the lifetime of a running node and are rebuilt on start. To ensure the tree is fresh each time, when nodes are created the backing LevelDB store is opened, closed, and then re-opened to ensure any lingering files are removed. Additionally, the nodes themselves (references tohashtree, are stored in ets.
diff() = prefix_diff() | key_diffs()
handler_fun(X) = fun((diff(), X) -> X)
insert_opt() = insert_opt_if_missing()
insert_opt_if_missing() = {if_missing, boolean()}
insert_opts() = [insert_opt()]
key_diffs() = {key_diffs, prefixes(), [{missing | remote_missing | different, binary()}]}
new_opt() = new_opt_num_levels() | new_opt_data_dir()
new_opt_data_dir() = {data_dir, file:name_all()}
new_opt_num_levels() = {num_levels, non_neg_integer()}
new_opts() = [new_opt()]
prefix() = atom() | binary()
prefix_diff() = {missing_prefix, local | remote, prefixes()}
prefixes() = [prefix()]
remote_fun() = fun((prefixes(), {get_bucket, {integer(), integer()}} | {key_hashses, integer()}) -> orddict:orddict())
abstract datatype: tree()
abstract datatype: tree_node()
| compare/4 | Compare a local and remote tree. |
| destroy/1 | Destroys the tree cleaning up any used resources. |
| get_bucket/4 | Returns the hashtree buckets for a given node in the
tree. |
| insert/4 | an alias for insert(Prefixes, Key, Hash, [], Tree). |
| insert/5 | Insert a hash into the tree. |
| key_hashes/3 | Returns the hashtree segment hashes for a given node
in the tree. |
| local_compare/2 | Compare two local trees. |
| new/2 | Creates a new hashtree. |
| prefix_hash/2 | Returns the top-hash of the node corresponding to the given prefix list. |
| top_hash/1 | Returns the top-hash of the tree. |
| update_perform/1 | Update the tree with a snapshot obtained by update_snapshot/1. |
| update_snapshot/1 | Snapshot the tree for updating. |
compare(LocalTree::tree(), RemoteFun::remote_fun(), HandlerFun::handler_fun(X), X) -> X
Compare a local and remote tree. RemoteFun is used to
access the buckets and segments of nodes in the remote
tree. HandlerFun will be called for each difference found in the
tree. A difference is either a missing local or remote prefix, or a
list of key differences, which themselves signify different or
missing keys. HandlerAcc is passed to the first call of
HandlerFun and each subsequent call is passed the value returned
by the previous call. The return value of this function is the
return value from the last call to HandlerFun.
destroy(Tree::tree()) -> ok
Destroys the tree cleaning up any used resources. This deletes the LevelDB files for the nodes.
get_bucket(Prefixes::tree_node(), Level::integer(), Bucket::integer(), Tree::tree()) -> orddict:orddict()
Returns the hashtree buckets for a given node in the
tree. This is used primarily for accessing buckets of a remote tree
during compare.
insert(Prefixes::prefixes(), Key::binary(), Hash::binary(), Tree::tree()) -> tree() | {error, term()}
an alias for insert(Prefixes, Key, Hash, [], Tree)
insert(Prefixes::prefixes(), Key::binary(), Hash::binary(), Opts::insert_opts(), Tree::tree()) -> tree() | {error, term()}
Insert a hash into the tree. The length of Prefixes must
correspond to the height of the tree -- the value used for
num_levels when creating the tree. The hash is inserted into
a leaf of the tree and that leaf is marked as dirty. The tree is not
updated at this time. Future operations on the tree should used the
tree returend by this fucntion.
true then the hash is only inserted into the tree
if the key is not already present. This is useful for
ensuring writes concurrent with building the tree
take precedence over older values. false is the default
value.
key_hashes(Prefixes::tree_node(), Segment::integer(), Tree::tree()) -> [{integer(), orddict:orddict()}]
Returns the hashtree segment hashes for a given node
in the tree. This is used primarily for accessing key hashes of a
remote tree during compare.
Compare two local trees. This function is primarily for local debugging and testing.
new(TreeId::term(), Opts::new_opts()) -> tree()
Creates a new hashtree.
Takes the following options: * num_levels - the height of the tree excluding leaves. corresponds to the length of the prefix list passed toinsert/5.
* data_dir - the directory where the LevelDB instances for the nodes will
be stored.
prefix_hash(Prefixes::prefixes(), Tree::tree()) -> undefined | binary()
Returns the top-hash of the node corresponding to the given
prefix list. The length of the prefix list can be less than or
equal to the height of the tree. If the tree has not been updated
or if the prefix list is not found or invalid, then undefined is
returned. Otherwise the hash value from the most recent update is
returned.
top_hash(Tree::tree()) -> undefined | binary()
Returns the top-hash of the tree. This is the top-hash of the root node.
update_perform(Tree::tree()) -> ok
Update the tree with a snapshot obtained by update_snapshot/1. This function may be called by a process other
than the one managing the tree.
Snapshot the tree for updating. The return tree should be
updated using update_perform/1 and to perform future operations
on the tree
Generated by EDoc