B-Trees
B-tree implementation in C++ to index documents based on numerical id's and title strings.
Classes | Macros | Typedefs | Functions
Commands.cpp File Reference
#include "Commands.hpp"
#include <cstring>
#include <iostream>
#include "IdealBTree.hpp"
#include "Entry.hpp"
Include dependency graph for Commands.cpp:

Classes

struct  IdIndex
 Helper struct to store primary indexes. More...
 
struct  TitleIndex
 Helper struct to store secondary indexes. More...
 
struct  HashfileHeader
 Header data for the hashfile. More...
 

Macros

#define ROOT   "./"
 File path to the folder where database files will be created. More...
 
#define ID_TREE_FILENAME   "bd-idtree.bin"
 Primary index data filename. More...
 
#define ID_TREE_FILEPATH   ROOT ID_TREE_FILENAME
 Full filepath to the primary index file. More...
 
#define TITLE_TREE_FILENAME   "bd-titletree.bin"
 Secondary index data filename. More...
 
#define TITLE_TREE_FILEPATH   ROOT TITLE_TREE_FILENAME
 Full filepath to the secondary index file. More...
 
#define HASHFILE_FILENAME   "bd-hashfile.bin"
 Hashing file filename. More...
 
#define HASHFILE_FILEPATH   ROOT HASHFILE_FILENAME
 Full filepath to the hashing file. More...
 
#define HASHFILE_BLOCK_SIZE   BLOCK_SIZE
 Block size in bytes to be used in the hashing file More...
 
#define PATIENCE_STEP   10000
 Show how many entries have already been read and indexed every once in a while. More...
 

Typedefs

typedef IdealBTree< IdIndexIdBTree
 Primary index B-tree. More...
 
typedef IdealBTree< TitleIndexTitleBTree
 Secondary index B-tree. More...
 
typedef Block< HashfileHeader, HASHFILE_BLOCK_SIZEHashfileHeaderBlock
 HashfileHeader block. More...
 
typedef Block< Entry, HASHFILE_BLOCK_SIZEEntryBlock
 Entry block. More...
 

Functions

bool operator< (const IdIndex &index, int i)
 Less-than comparator between IdIndex::id and int. More...
 
bool operator< (int i, const IdIndex &index)
 Less-than comparator between int and IdIndex::id. More...
 
bool operator< (const TitleIndex &index, const char *title)
 Less-than comparator between TitleIndex::title and string. More...
 
bool operator< (const char *title, const TitleIndex &index)
 Less-than comparator between string and TitleIndex::title. More...
 
void upload (const char *filePath)
 Receives a CSV file and creates a database based on its contents. More...
 
void findrec (long id)
 Finds an entry in the hashfile based on the entry's id. More...
 
void seek1 (long id)
 Seeks an entry by its id using the primary index. More...
 
void seek2 (const char *title)
 Seeks an entry by its title using the secondary index. More...
 

Macro Definition Documentation

#define HASHFILE_BLOCK_SIZE   BLOCK_SIZE

Block size in bytes to be used in the hashing file

#define HASHFILE_FILENAME   "bd-hashfile.bin"

Hashing file filename.

#define HASHFILE_FILEPATH   ROOT HASHFILE_FILENAME

Full filepath to the hashing file.

#define ID_TREE_FILENAME   "bd-idtree.bin"

Primary index data filename.

#define ID_TREE_FILEPATH   ROOT ID_TREE_FILENAME

Full filepath to the primary index file.

#define PATIENCE_STEP   10000

Show how many entries have already been read and indexed every once in a while.

#define ROOT   "./"

File path to the folder where database files will be created.

#define TITLE_TREE_FILENAME   "bd-titletree.bin"

Secondary index data filename.

#define TITLE_TREE_FILEPATH   ROOT TITLE_TREE_FILENAME

Full filepath to the secondary index file.

Typedef Documentation

Entry block.

Primary index B-tree.

Secondary index B-tree.

Function Documentation

void findrec ( long  id)

Finds an entry in the hashfile based on the entry's id.

The hashfile uses perfect hashing. Therefore, the entry's offset in the file can be calculated readily from the id.

In case the id doesn't result in a valid offset, the procedure will inform.

Parameters
idId of the entry to find
bool operator< ( const IdIndex index,
int  i 
)

Less-than comparator between IdIndex::id and int.

bool operator< ( int  i,
const IdIndex index 
)

Less-than comparator between int and IdIndex::id.

bool operator< ( const TitleIndex index,
const char *  title 
)

Less-than comparator between TitleIndex::title and string.

bool operator< ( const char *  title,
const TitleIndex index 
)

Less-than comparator between string and TitleIndex::title.

void seek1 ( long  id)

Seeks an entry by its id using the primary index.

Uses a B-tree to seek the entry by id within the primary index. In case no entry with the id is found, the procedure will inform.

Parameters
idId of the entry to find
void seek2 ( const char *  title)

Seeks an entry by its title using the secondary index.

Uses a B-tree to seek the entry by title within the secondary index.

Will return the first entry found with the title.

In case no entry with the provided title is found, the procedure will inform.

Parameters
titleTitle of the entry to find
void upload ( const char *  filePath)

Receives a CSV file and creates a database based on its contents.

Will load the data in the provided CSV file and, one by one, upload them into a database while indexing each entry by id and title using B-trees.

Each entry should be represented as a line in the CSV file with the following fields, one in each column:

Field Type Description
ID Integer Identifier of the article
Title Alpha 300 Title of the article
Year Integer Publication year
Authors Alpha 1024 List of the authors
Citations Integer Times the article has been cited
Update Date time Time of the last time the article was updated (YYYY-MM-DD HH:mm:SS)
Snippet Alpha 1024 Text summary of the article's contents

Details of the execution will be printed as the program executes.

Creates tree files upon completion:

  • db-hashfile.bin: where the entries will be stored;
  • db-idindex.bin: primary index by id;
  • db-titleindex.bin: secondary index by title.

The files will be overwritten if they already exist.

Parameters
filePathPath to the CSV file with entries