B-Trees
B-tree implementation in C++ to index documents based on numerical id's and title strings.
Classes | Public Types | Public Member Functions | Static Public Attributes | Private Types | Private Member Functions | Private Attributes | List of all members
BTree< T, M, BlockSize > Class Template Reference

B-tree class. More...

#include <BTree.hpp>

Collaboration diagram for BTree< T, M, BlockSize >:
Collaboration graph
[legend]

Classes

struct  BNode
 B-tree node. More...
 
struct  FileHeader
 File header data for BTree indexes. More...
 
struct  OverflowResult
 Provides information to deal with an insertion overflow. More...
 
struct  Statistics
 BTree usage analytics. More...
 

Public Types

typedef T ValueType
 Type of the stored values. More...
 

Public Member Functions

 BTree ()
 Default constructor. More...
 
 ~BTree ()
 Destructor. More...
 
bool create (const char *filepath)
 Initializes BTree for writing. More...
 
bool load (const char *filepath)
 Initializes BTree for reading only. More...
 
void insert (const T &value)
 Inserts a value in the tree. More...
 
template<typename U >
std::unique_ptr< T > seek (const U &key)
 Seeks a value that's equivalent to the one provided. More...
 
const StatisticsgetStatistics (bool includeFileBlockCount=false) const
 Returns the BTree usage statistics so far. More...
 
void resetStatistics ()
 Reset all statistics values to 0. More...
 
void finishInsertions ()
 Updates the header with the total blocks in the file. More...
 

Static Public Attributes

static constexpr auto Order = M
 Tree order. More...
 
static constexpr auto BlockSizeInUse = BlockSize
 Block size in bytes More...
 

Private Types

typedef Block< FileHeader, BlockSize > FileHeaderBlock
 File header block. More...
 
typedef Block< BNode, BlockSize > BNodeBlock
 Node block. More...
 

Private Member Functions

BNodeBlock readFromDisk (long offset)
 Read a node in the block at the provided offset. More...
 
void writeToDisk (BNodeBlock &node)
 Writes the node to disk. More...
 
FileHeaderBlock readHeader () const
 Reads the file header. More...
 
void writeHeader (const FileHeaderBlock &header)
 Updates the file header in disk. More...
 
std::unique_ptr< OverflowResultinsert (BNodeBlock &node, T value, long rightNodeOffset=-1)
 Internal method for B-tree insertion. More...
 

Private Attributes

std::FILE * m_file
 File pointer to the file where data will be stored. More...
 
BNodeBlock m_root
 Root node of the B-tree. More...
 
Statistics m_stats
 Where BTree stores its read and write statistics. More...
 

Detailed Description

template<typename T, std::size_t M, unsigned int BlockSize = BLOCK_SIZE>
class BTree< T, M, BlockSize >

B-tree class.

BTree can be used to store T-type values in binary files for fast retrieval later.

Use the method BTree::create before inserting values and, once you've finished inserting, call BTree::finishInsertions to update the file header.

Use BTree::load before reading values.

Example usage:

// Insert data
tree.create("filename.bin");
tree.insert(1);
// Read data
tree.load("filename.bin");
auto x = tree.seek(1);
if (x) f(*x); // If found, do something to x
Template Parameters
TType of the data to be stored. Must be a POD (Plain Old Data type) and less-than comparable.
MTree order. Each node will store up to 2M values and have up to 2M + 1 children. A node can't be bigger than BLOCK_SIZE bytes.
BlockSizeBlock size to use, in bytes

Member Typedef Documentation

template<typename T, std::size_t M, unsigned int BlockSize = BLOCK_SIZE>
typedef Block<BNode, BlockSize> BTree< T, M, BlockSize >::BNodeBlock
private

Node block.

template<typename T, std::size_t M, unsigned int BlockSize = BLOCK_SIZE>
typedef Block<FileHeader, BlockSize> BTree< T, M, BlockSize >::FileHeaderBlock
private

File header block.

template<typename T, std::size_t M, unsigned int BlockSize = BLOCK_SIZE>
typedef T BTree< T, M, BlockSize >::ValueType

Type of the stored values.

Constructor & Destructor Documentation

template<typename T , std::size_t M, unsigned int BlockSize>
BTree< T, M, BlockSize >::BTree ( )

Default constructor.

Initializes the statistics with 0 values and makes the file pointer point to null

template<typename T , std::size_t M, unsigned int BlockSize>
BTree< T, M, BlockSize >::~BTree ( )

Destructor.

Closes the file if it's open

Member Function Documentation

template<typename T , std::size_t M, unsigned int BlockSize>
bool BTree< T, M, BlockSize >::create ( const char *  filepath)

Initializes BTree for writing.

Opens the file in "wb+" mode. Must be called before inserting values in the tree. Also allows reading through BTree::seek.

If there's already a file in the filepath, it'll be overwritten.

The new file will be created in the provided filepath with a header and an empty root node.

After you've finished inserting values, call BTree::finishInsertions to update the file header.

Parameters
filepathPath to the file where the tree data will be written
Returns
True if the file was created successfully
template<typename T , std::size_t M, unsigned int BlockSize>
void BTree< T, M, BlockSize >::finishInsertions ( )

Updates the header with the total blocks in the file.

Very important to be used once you've finished using a tree that was initialized with BTree::create

template<typename T , std::size_t M, unsigned int BlockSize>
const BTree< T, M, BlockSize >::Statistics & BTree< T, M, BlockSize >::getStatistics ( bool  includeFileBlockCount = false) const

Returns the BTree usage statistics so far.

Including the quantity of blocks in the file is optional because it requires reading the file header to update the field. The read value will be stored in memory and can be accessed later without reading the file header again.

Parameters
includeFileBlockCountTrue if you want to update the Statistics::blocksInDisk value, false otherwise
Returns
Statistics
template<typename T, std::size_t M, unsigned int BlockSize>
void BTree< T, M, BlockSize >::insert ( const T &  value)

Inserts a value in the tree.

Parameters
valueValue to insert
template<typename T, std::size_t M, unsigned int BlockSize>
std::unique_ptr< typename BTree< T, M, BlockSize >::OverflowResult > BTree< T, M, BlockSize >::insert ( BNodeBlock node,
value,
long  rightNodeOffset = -1 
)
private

Internal method for B-tree insertion.

If the node is a leaf, it seeks the node pointer for the node in which the value must be inserted and tries to insert again recursively. If the node isn't a leaf, inserts normally.

In case of overflow, the method will split the node provided as parameter and return an OverflowResult instance with data for the parent node to deal with the overflow. The parent will try to insert again but, this time, in itself.

The value of rightNodeOffset should be different from -1 only if the method is dealing with an overflow. In this case, it's the parent node which must be trying to insert the node.

Parameters
nodeNode in which to insert
valueValue to insert
rightNodeOffsetOffset of the node to the right of the value to insert; if -1, the node to the right doesn't exist yet
Returns
A pointer with an OverflowResult instance in case of overflow, null otherwise
template<typename T , std::size_t M, unsigned int BlockSize>
bool BTree< T, M, BlockSize >::load ( const char *  filepath)

Initializes BTree for reading only.

Opens the file in "rb" mode. Must be before reading values form a file where values have already been inserted. Insertions won't be possible.

The file must have been previously created by a BTree which successfully called BTree::create and BTree::finishInsertions.

Parameters
filepathPath to the file where tree data can be found
Returns
True if it was possible to open the file in filepath
template<typename T , std::size_t M, unsigned int BlockSize>
BTree< T, M, BlockSize >::BNodeBlock BTree< T, M, BlockSize >::readFromDisk ( long  offset)
private

Read a node in the block at the provided offset.

Assumes that the provided offset will always be valid. If an invalid offset is provided, the result is undefined.

Parameters
offsetNode's offset in the file
Returns
Block with the node found in the provided offset
template<typename T , std::size_t M, unsigned int BlockSize>
BTree< T, M, BlockSize >::FileHeaderBlock BTree< T, M, BlockSize >::readHeader ( ) const
private

Reads the file header.

This method assumes that BTree has already been initialized and the file is open. Otherwise, behavior is undefined.

Increments Statistics::blocksRead.

Returns
Block with the file header
template<typename T , std::size_t M, unsigned int BlockSize>
void BTree< T, M, BlockSize >::resetStatistics ( )

Reset all statistics values to 0.

Use with caution

template<typename T , std::size_t M, unsigned int BlockSize>
template<typename U >
std::unique_ptr< T > BTree< T, M, BlockSize >::seek ( const U &  key)

Seeks a value that's equivalent to the one provided.

If the value is not found, a null pointer will be returned.

Template Parameters
UType of the key to seek. T and U must be less-than comparable.
Parameters
keyThe value to seek
lessComparer instance
Returns
Pointer with the value if found, null otherwise
template<typename T , std::size_t M, unsigned int BlockSize>
void BTree< T, M, BlockSize >::writeHeader ( const FileHeaderBlock header)
private

Updates the file header in disk.

Parameters
headerBlock with file header to write
template<typename T , std::size_t M, unsigned int BlockSize>
void BTree< T, M, BlockSize >::writeToDisk ( BNodeBlock node)
private

Writes the node to disk.

If the node still hasn't been written to disk (BNode::offset == -1), it'll be appended to the end of the file and BNode::offset will be updated. In case this happens, Statistics::blocksCreated will be incremented.

If the node has already been written to disk it'll simply be updated.

Parameters
nodeNode to write

Member Data Documentation

template<typename T, std::size_t M, unsigned int BlockSize = BLOCK_SIZE>
constexpr auto BTree< T, M, BlockSize >::BlockSizeInUse = BlockSize
static

Block size in bytes

template<typename T, std::size_t M, unsigned int BlockSize = BLOCK_SIZE>
std::FILE* BTree< T, M, BlockSize >::m_file
private

File pointer to the file where data will be stored.

template<typename T, std::size_t M, unsigned int BlockSize = BLOCK_SIZE>
BNodeBlock BTree< T, M, BlockSize >::m_root
private

Root node of the B-tree.

template<typename T, std::size_t M, unsigned int BlockSize = BLOCK_SIZE>
Statistics BTree< T, M, BlockSize >::m_stats
mutableprivate

Where BTree stores its read and write statistics.

template<typename T, std::size_t M, unsigned int BlockSize = BLOCK_SIZE>
constexpr auto BTree< T, M, BlockSize >::Order = M
static

Tree order.


The documentation for this class was generated from the following files: