B-Trees
B-tree implementation in C++ to index documents based on numerical id's and title strings.
|
B-tree class. More...
#include <BTree.hpp>
Classes | |
struct | BNode |
B-tree node. More... | |
struct | FileHeader |
File header data for BTree indexes. More... | |
struct | OverflowResult |
Provides information to deal with an insertion overflow. More... | |
struct | Statistics |
BTree usage analytics. More... | |
Public Types | |
typedef T | ValueType |
Type of the stored values. More... | |
Public Member Functions | |
BTree () | |
Default constructor. More... | |
~BTree () | |
Destructor. More... | |
bool | create (const char *filepath) |
Initializes BTree for writing. More... | |
bool | load (const char *filepath) |
Initializes BTree for reading only. More... | |
void | insert (const T &value) |
Inserts a value in the tree. More... | |
template<typename U > | |
std::unique_ptr< T > | seek (const U &key) |
Seeks a value that's equivalent to the one provided. More... | |
const Statistics & | getStatistics (bool includeFileBlockCount=false) const |
Returns the BTree usage statistics so far. More... | |
void | resetStatistics () |
Reset all statistics values to 0. More... | |
void | finishInsertions () |
Updates the header with the total blocks in the file. More... | |
Static Public Attributes | |
static constexpr auto | Order = M |
Tree order. More... | |
static constexpr auto | BlockSizeInUse = BlockSize |
Block size in bytes More... | |
Private Types | |
typedef Block< FileHeader, BlockSize > | FileHeaderBlock |
File header block. More... | |
typedef Block< BNode, BlockSize > | BNodeBlock |
Node block. More... | |
Private Member Functions | |
BNodeBlock | readFromDisk (long offset) |
Read a node in the block at the provided offset. More... | |
void | writeToDisk (BNodeBlock &node) |
Writes the node to disk. More... | |
FileHeaderBlock | readHeader () const |
Reads the file header. More... | |
void | writeHeader (const FileHeaderBlock &header) |
Updates the file header in disk. More... | |
std::unique_ptr< OverflowResult > | insert (BNodeBlock &node, T value, long rightNodeOffset=-1) |
Internal method for B-tree insertion. More... | |
Private Attributes | |
std::FILE * | m_file |
File pointer to the file where data will be stored. More... | |
BNodeBlock | m_root |
Root node of the B-tree. More... | |
Statistics | m_stats |
Where BTree stores its read and write statistics. More... | |
B-tree class.
BTree can be used to store T-type values in binary files for fast retrieval later.
Use the method BTree::create before inserting values and, once you've finished inserting, call BTree::finishInsertions to update the file header.
Use BTree::load before reading values.
Example usage:
T | Type of the data to be stored. Must be a POD (Plain Old Data type) and less-than comparable. |
M | Tree order. Each node will store up to 2M values and have up to 2M + 1 children. A node can't be bigger than BLOCK_SIZE bytes. |
BlockSize | Block size to use, in bytes |
|
private |
Node block.
|
private |
File header block.
typedef T BTree< T, M, BlockSize >::ValueType |
Type of the stored values.
Default constructor.
Initializes the statistics with 0 values and makes the file pointer point to null
Destructor.
Closes the file if it's open
bool BTree< T, M, BlockSize >::create | ( | const char * | filepath | ) |
Initializes BTree for writing.
Opens the file in "wb+" mode. Must be called before inserting values in the tree. Also allows reading through BTree::seek.
If there's already a file in the filepath, it'll be overwritten.
The new file will be created in the provided filepath with a header and an empty root node.
After you've finished inserting values, call BTree::finishInsertions to update the file header.
filepath | Path to the file where the tree data will be written |
void BTree< T, M, BlockSize >::finishInsertions | ( | ) |
Updates the header with the total blocks in the file.
Very important to be used once you've finished using a tree that was initialized with BTree::create
const BTree< T, M, BlockSize >::Statistics & BTree< T, M, BlockSize >::getStatistics | ( | bool | includeFileBlockCount = false | ) | const |
Returns the BTree usage statistics so far.
Including the quantity of blocks in the file is optional because it requires reading the file header to update the field. The read value will be stored in memory and can be accessed later without reading the file header again.
includeFileBlockCount | True if you want to update the Statistics::blocksInDisk value, false otherwise |
void BTree< T, M, BlockSize >::insert | ( | const T & | value | ) |
Inserts a value in the tree.
value | Value to insert |
|
private |
Internal method for B-tree insertion.
If the node is a leaf, it seeks the node pointer for the node in which the value must be inserted and tries to insert again recursively. If the node isn't a leaf, inserts normally.
In case of overflow, the method will split the node provided as parameter and return an OverflowResult instance with data for the parent node to deal with the overflow. The parent will try to insert again but, this time, in itself.
The value of rightNodeOffset should be different from -1 only if the method is dealing with an overflow. In this case, it's the parent node which must be trying to insert the node.
node | Node in which to insert |
value | Value to insert |
rightNodeOffset | Offset of the node to the right of the value to insert; if -1, the node to the right doesn't exist yet |
bool BTree< T, M, BlockSize >::load | ( | const char * | filepath | ) |
Initializes BTree for reading only.
Opens the file in "rb" mode. Must be before reading values form a file where values have already been inserted. Insertions won't be possible.
The file must have been previously created by a BTree which successfully called BTree::create and BTree::finishInsertions.
filepath | Path to the file where tree data can be found |
|
private |
Read a node in the block at the provided offset.
Assumes that the provided offset will always be valid. If an invalid offset is provided, the result is undefined.
offset | Node's offset in the file |
|
private |
Reads the file header.
This method assumes that BTree has already been initialized and the file is open. Otherwise, behavior is undefined.
Increments Statistics::blocksRead.
void BTree< T, M, BlockSize >::resetStatistics | ( | ) |
Reset all statistics values to 0.
Use with caution
std::unique_ptr< T > BTree< T, M, BlockSize >::seek | ( | const U & | key | ) |
Seeks a value that's equivalent to the one provided.
If the value is not found, a null pointer will be returned.
U | Type of the key to seek. T and U must be less-than comparable. |
key | The value to seek |
less | Comparer instance |
|
private |
Updates the file header in disk.
header | Block with file header to write |
|
private |
Writes the node to disk.
If the node still hasn't been written to disk (BNode::offset == -1), it'll be appended to the end of the file and BNode::offset will be updated. In case this happens, Statistics::blocksCreated will be incremented.
If the node has already been written to disk it'll simply be updated.
node | Node to write |
|
static |
Block size in bytes
|
private |
File pointer to the file where data will be stored.
|
private |
Root node of the B-tree.
|
mutableprivate |
Where BTree stores its read and write statistics.
|
static |
Tree order.