B-Trees
B-tree implementation in C++ to index documents based on numerical id's and title strings.
|
#include "Commands.hpp"
#include <cstring>
#include <iostream>
#include "IdealBTree.hpp"
#include "Entry.hpp"
Classes | |
struct | IdIndex |
Helper struct to store primary indexes. More... | |
struct | TitleIndex |
Helper struct to store secondary indexes. More... | |
struct | HashfileHeader |
Header data for the hashfile. More... | |
Macros | |
#define | ROOT "./" |
File path to the folder where database files will be created. More... | |
#define | ID_TREE_FILENAME "bd-idtree.bin" |
Primary index data filename. More... | |
#define | ID_TREE_FILEPATH ROOT ID_TREE_FILENAME |
Full filepath to the primary index file. More... | |
#define | TITLE_TREE_FILENAME "bd-titletree.bin" |
Secondary index data filename. More... | |
#define | TITLE_TREE_FILEPATH ROOT TITLE_TREE_FILENAME |
Full filepath to the secondary index file. More... | |
#define | HASHFILE_FILENAME "bd-hashfile.bin" |
Hashing file filename. More... | |
#define | HASHFILE_FILEPATH ROOT HASHFILE_FILENAME |
Full filepath to the hashing file. More... | |
#define | HASHFILE_BLOCK_SIZE BLOCK_SIZE |
Block size in bytes to be used in the hashing file More... | |
#define | PATIENCE_STEP 10000 |
Show how many entries have already been read and indexed every once in a while. More... | |
Typedefs | |
typedef IdealBTree< IdIndex > | IdBTree |
Primary index B-tree. More... | |
typedef IdealBTree< TitleIndex > | TitleBTree |
Secondary index B-tree. More... | |
typedef Block< HashfileHeader, HASHFILE_BLOCK_SIZE > | HashfileHeaderBlock |
HashfileHeader block. More... | |
typedef Block< Entry, HASHFILE_BLOCK_SIZE > | EntryBlock |
Entry block. More... | |
Functions | |
bool | operator< (const IdIndex &index, int i) |
Less-than comparator between IdIndex::id and int. More... | |
bool | operator< (int i, const IdIndex &index) |
Less-than comparator between int and IdIndex::id. More... | |
bool | operator< (const TitleIndex &index, const char *title) |
Less-than comparator between TitleIndex::title and string. More... | |
bool | operator< (const char *title, const TitleIndex &index) |
Less-than comparator between string and TitleIndex::title. More... | |
void | upload (const char *filePath) |
Receives a CSV file and creates a database based on its contents. More... | |
void | findrec (long id) |
Finds an entry in the hashfile based on the entry's id. More... | |
void | seek1 (long id) |
Seeks an entry by its id using the primary index. More... | |
void | seek2 (const char *title) |
Seeks an entry by its title using the secondary index. More... | |
#define HASHFILE_BLOCK_SIZE BLOCK_SIZE |
Block size in bytes to be used in the hashing file
#define HASHFILE_FILENAME "bd-hashfile.bin" |
Hashing file filename.
#define HASHFILE_FILEPATH ROOT HASHFILE_FILENAME |
Full filepath to the hashing file.
#define ID_TREE_FILENAME "bd-idtree.bin" |
Primary index data filename.
#define ID_TREE_FILEPATH ROOT ID_TREE_FILENAME |
Full filepath to the primary index file.
#define PATIENCE_STEP 10000 |
Show how many entries have already been read and indexed every once in a while.
#define ROOT "./" |
File path to the folder where database files will be created.
#define TITLE_TREE_FILENAME "bd-titletree.bin" |
Secondary index data filename.
#define TITLE_TREE_FILEPATH ROOT TITLE_TREE_FILENAME |
Full filepath to the secondary index file.
typedef Block<Entry, HASHFILE_BLOCK_SIZE> EntryBlock |
Entry block.
HashfileHeader block.
typedef IdealBTree<IdIndex> IdBTree |
Primary index B-tree.
typedef IdealBTree<TitleIndex> TitleBTree |
Secondary index B-tree.
void findrec | ( | long | id | ) |
Finds an entry in the hashfile based on the entry's id.
The hashfile uses perfect hashing. Therefore, the entry's offset in the file can be calculated readily from the id.
In case the id doesn't result in a valid offset, the procedure will inform.
id | Id of the entry to find |
bool operator< | ( | const IdIndex & | index, |
int | i | ||
) |
Less-than comparator between IdIndex::id and int.
bool operator< | ( | int | i, |
const IdIndex & | index | ||
) |
Less-than comparator between int and IdIndex::id.
bool operator< | ( | const TitleIndex & | index, |
const char * | title | ||
) |
Less-than comparator between TitleIndex::title and string.
bool operator< | ( | const char * | title, |
const TitleIndex & | index | ||
) |
Less-than comparator between string and TitleIndex::title.
void seek1 | ( | long | id | ) |
Seeks an entry by its id using the primary index.
Uses a B-tree to seek the entry by id within the primary index. In case no entry with the id is found, the procedure will inform.
id | Id of the entry to find |
void seek2 | ( | const char * | title | ) |
Seeks an entry by its title using the secondary index.
Uses a B-tree to seek the entry by title within the secondary index.
Will return the first entry found with the title.
In case no entry with the provided title is found, the procedure will inform.
title | Title of the entry to find |
void upload | ( | const char * | filePath | ) |
Receives a CSV file and creates a database based on its contents.
Will load the data in the provided CSV file and, one by one, upload them into a database while indexing each entry by id and title using B-trees.
Each entry should be represented as a line in the CSV file with the following fields, one in each column:
Field | Type | Description |
---|---|---|
ID | Integer | Identifier of the article |
Title | Alpha 300 | Title of the article |
Year | Integer | Publication year |
Authors | Alpha 1024 | List of the authors |
Citations | Integer | Times the article has been cited |
Update | Date time | Time of the last time the article was updated (YYYY-MM-DD HH:mm:SS ) |
Snippet | Alpha 1024 | Text summary of the article's contents |
Details of the execution will be printed as the program executes.
Creates tree files upon completion:
db-hashfile.bin
: where the entries will be stored;db-idindex.bin
: primary index by id;db-titleindex.bin
: secondary index by title.The files will be overwritten if they already exist.
filePath | Path to the CSV file with entries |