Module hashtree_tree

This module implements a specialized hash tree that is used primarily by cluster metadata's anti-entropy exchanges and by metadata clients for determining when groups of metadata keys have changed locally.

Description

This module implements a specialized hash tree that is used primarily by cluster metadata's anti-entropy exchanges and by metadata clients for determining when groups of metadata keys have changed locally. The tree can be used, generally, for determining the differences in groups of keys, or to find missing groups, between two stores.

Each node of the tree is itself a hash tree, specifically a hashtree. The tree has a fixed height but each node has a variable amount of children. The height of the tree directly corresponds to the number of prefixes supported by the tree. A list of prefixes, or a "prefix list", represent a group of keys. Each unique prefix list is a node in the tree. The leaves store hashes for the individual keys in the segments of the node's hashtree. The buckets of the leaves' hashtree provide an efficient way of determining when keys in the segments differ between two trees. The tails of the prefix list are used to roll up groups into parent groups. For example, the prefixes [a, b], [a, c], [d, e] will be rolled up into parent groups a, containing c and b, and d, containing only 'e'. The parent group's node has children corresponding to each child group. The top-hashes of the child nodes are stored in the parent nodes' segments. The parent nodes' buckets are used as an efficient method for determining when child groups differ between two trees. The root node corresponds to the empty list and it acts like any other node, storing hashes for the first level of child groups. The top hash of the root node is the top hash of the tree.

The tree in the example above might store something like:

node parent top-hash segments --------------------------------------------------- root none 1 [{a, 2}, {d, 3}] [a] root 2 [{b, 4}, {c, 5}] [d] root 3 [{e, 6}] [a,b] [a] 4 [{k1, 0}, {k2, 6}, ...] [a,c] [a] 5 [{k1, 1}, {k2, 4}, ...] [d,e] [d] 6 [{k1, 2}, {k2, 3}, ...]

When a key is inserted into the tree it is inserted into the leaf corresponding to the given prefix list. The leaf and its parents are not updated at this time. Instead the leaf is added to a dirty set. The nodes are later updated in bulk.

Updating the hashtree is a two step process. First, a snapshot of the tree must be obtained. This prevents new writes from affecting the update. Snapshotting the tree will snapshot each dirty leaf. Since writes to nodes other than leaves only occur during updates no snapshot is taken for them. Second, the tree is updated using the snapshot. The update is performed by updating the hashtree nodes at each level starting with the leaves. The top hash of each node in a level is inserted into its parent node after being updated. The list of dirty parents is then updated, moving up the tree. Once the root is reached and has been updated the process is complete. This process is designed to minimize the traversal of the tree and ensure that each node is only updated once.

The typical use for updating a tree is to compare it with another recently updated tree. Comparison is done with the compare/4 function. Compare provides a sort of fold over the differences of the tree allowing for callers to determine what to do with those differences. In addition, the caller can accumulate a value, such as the difference list or stats about differencces.

The tree implemented in this module assumes that it will be managed by a single process and that all calls will be made to it synchronously, with a couple exceptions:

1. Updating a tree with a snapshot can be done in another process. The snapshot must be taken by the owning process, synchronously. 2. Comparing two trees may be done by a seperate process. Compares should should use a snapshot and only be performed after an update.

The nodes in this tree are backend by LevelDB, however, this is most likely temporary and Cluster Metadata's use of the tree is ephemeral. Trees are only meant to live for the lifetime of a running node and are rebuilt on start. To ensure the tree is fresh each time, when nodes are created the backing LevelDB store is opened, closed, and then re-opened to ensure any lingering files are removed. Additionally, the nodes themselves (references to hashtree, are stored in ets.

Data Types

diff()

diff() = prefix_diff() | key_diffs()

handler_fun()

handler_fun(X) = fun((diff(), X) -> X)

insert_opt()

insert_opt() = insert_opt_if_missing()

insert_opt_if_missing()

insert_opt_if_missing() = {if_missing, boolean()}

insert_opts()

insert_opts() = [insert_opt()]

key_diffs()

key_diffs() = {key_diffs, prefixes(), [{missing | remote_missing | different, binary()}]}

new_opt()

new_opt() = new_opt_num_levels() | new_opt_data_dir()

new_opt_data_dir()

new_opt_data_dir() = {data_dir, file:name_all()}

new_opt_num_levels()

new_opt_num_levels() = {num_levels, non_neg_integer()}

new_opts()

new_opts() = [new_opt()]

prefix()

prefix() = atom() | binary()

prefix_diff()

prefix_diff() = {missing_prefix, local | remote, prefixes()}

prefixes()

prefixes() = [prefix()]

remote_fun()

remote_fun() = fun((prefixes(), {get_bucket, {integer(), integer()}} | {key_hashses, integer()}) -> orddict:orddict())

tree()

abstract datatype: tree()

tree_node()

abstract datatype: tree_node()

Function Index

compare/4Compare a local and remote tree.
destroy/1Destroys the tree cleaning up any used resources.
get_bucket/4Returns the hashtree buckets for a given node in the tree.
insert/4an alias for insert(Prefixes, Key, Hash, [], Tree).
insert/5Insert a hash into the tree.
key_hashes/3Returns the hashtree segment hashes for a given node in the tree.
local_compare/2Compare two local trees.
new/2Creates a new hashtree.
prefix_hash/2Returns the top-hash of the node corresponding to the given prefix list.
top_hash/1Returns the top-hash of the tree.
update_perform/1Update the tree with a snapshot obtained by update_snapshot/1.
update_snapshot/1Snapshot the tree for updating.

Function Details

compare/4

compare(LocalTree::tree(), RemoteFun::remote_fun(), HandlerFun::handler_fun(X), X) -> X

Compare a local and remote tree. RemoteFun is used to access the buckets and segments of nodes in the remote tree. HandlerFun will be called for each difference found in the tree. A difference is either a missing local or remote prefix, or a list of key differences, which themselves signify different or missing keys. HandlerAcc is passed to the first call of HandlerFun and each subsequent call is passed the value returned by the previous call. The return value of this function is the return value from the last call to HandlerFun.

destroy/1

destroy(Tree::tree()) -> ok

Destroys the tree cleaning up any used resources. This deletes the LevelDB files for the nodes.

get_bucket/4

get_bucket(Prefixes::tree_node(), Level::integer(), Bucket::integer(), Tree::tree()) -> orddict:orddict()

Returns the hashtree buckets for a given node in the tree. This is used primarily for accessing buckets of a remote tree during compare.

insert/4

insert(Prefixes::prefixes(), Key::binary(), Hash::binary(), Tree::tree()) -> tree() | {error, term()}

an alias for insert(Prefixes, Key, Hash, [], Tree)

insert/5

insert(Prefixes::prefixes(), Key::binary(), Hash::binary(), Opts::insert_opts(), Tree::tree()) -> tree() | {error, term()}

Insert a hash into the tree. The length of Prefixes must correspond to the height of the tree -- the value used for num_levels when creating the tree. The hash is inserted into a leaf of the tree and that leaf is marked as dirty. The tree is not updated at this time. Future operations on the tree should used the tree returend by this fucntion.

Insert takes the following options: * if_missing - if true then the hash is only inserted into the tree if the key is not already present. This is useful for ensuring writes concurrent with building the tree take precedence over older values. false is the default value.

key_hashes/3

key_hashes(Prefixes::tree_node(), Segment::integer(), Tree::tree()) -> [{integer(), orddict:orddict()}]

Returns the hashtree segment hashes for a given node in the tree. This is used primarily for accessing key hashes of a remote tree during compare.

local_compare/2

local_compare(T1::tree(), T2::tree()) -> [diff()]

Compare two local trees. This function is primarily for local debugging and testing.

new/2

new(TreeId::term(), Opts::new_opts()) -> tree()

Creates a new hashtree.

Takes the following options: * num_levels - the height of the tree excluding leaves. corresponds to the length of the prefix list passed to insert/5. * data_dir - the directory where the LevelDB instances for the nodes will be stored.

prefix_hash/2

prefix_hash(Prefixes::prefixes(), Tree::tree()) -> undefined | binary()

Returns the top-hash of the node corresponding to the given prefix list. The length of the prefix list can be less than or equal to the height of the tree. If the tree has not been updated or if the prefix list is not found or invalid, then undefined is returned. Otherwise the hash value from the most recent update is returned.

top_hash/1

top_hash(Tree::tree()) -> undefined | binary()

Returns the top-hash of the tree. This is the top-hash of the root node.

update_perform/1

update_perform(Tree::tree()) -> ok

Update the tree with a snapshot obtained by update_snapshot/1. This function may be called by a process other than the one managing the tree.

update_snapshot/1

update_snapshot(Tree::tree()) -> tree()

Snapshot the tree for updating. The return tree should be updated using update_perform/1 and to perform future operations on the tree


Generated by EDoc