Constant creation of buckets with a hashcode in order to store record on secondary memory, has a limit. Supports insertion, key-based search and remove.
explicit operator bool()
- Returns a bool that indicates whether the index has already been created.
void create_index()
- Constructs the hash index file from a fixed length binary data file.
- It creates 2 files: The directory file (.ehashdir) and the hash index (.ehash).
- Accesses to disk: O(n) where n is the total number of records in the data file.
void insert(RecordType &record, long record_ref)
- When overflow happens, a new bucket is pushed to the front of the overflow chain and linked, to allow for more efficient insertions.
- Throws an exception if the key of the record to be inserted is already present and the index is for a primary key.
void remove(KeyType key)
- Removes every record that matches the given key by marking it as removed on the data file.
- Does nothing if the key does not exist.
std::vector<RecordType> search(KeyType key)
- Searches a given key.
- Returns a vector of elements that match the given key.
- If the index was created for primary keys, it returns a single element.
- If no element matches the given key, it returns an empty vector.
If the hash is indexing a primary, secondary key or any other field:
| Member Function | Performance | Description |
|---|---|---|
insert(RecordType &record, long record_ref) | As a hash function, access should be constant. However, we must consider bucket chaining and the idea of descending on the maximum depth of the index. | |
search(KeyType key) | As a hash function, access should be constant. However, we must consider bucket chaining. | |
remove(KeyType key) | As a hash function, access should be constant. However, we must consider bucket chaining. |
struct MovieRecord { int dataId{}; char contentType[16]{'\0'}; char title[256]{'\0'}; short length{}; short releaseYear{}; short endYear{}; int votes{}; float rating{}; int gross{}; char certificate[16]{'\0'}; char description[512]{'\0'}; bool removed{}; // < Note: As of now it is required that your structure implements a bool removed std::string to_string() { std::stringstream ss; ss << "(" << dataId << ", " << contentType << ", " << title << ", " << length << ", " << releaseYear << ", " << endYear << ", " << votes << ", " << rating << ", " << gross << ", " << certificate << ", " << std::boolalpha << removed << ")"; return ss.str(); } }; std::function<bool(char[16], char[16])> equal = [](char a[16], char b[16]) -> bool { return std::string(a) == std::string(b); }; std::function<char *(MovieRecord &)> index = [=](MovieRecord &record) { return record.contentType; }; std::hash<std::string> hasher; std::function<std::size_t(char[16])> hash = [&hasher](char key[16]) { return hasher(std::string(key)); }; ExtendibleHashFile<char[16], MovieRecord, 16, std::function<char *(MovieRecord &)>, std::function<bool(char[16], char[16])>, std::function<std::size_t(char[16])>> extendible_hash_content_type{"movies_and_series.dat", "content_type", false, index, equal, hash}; if (!extendible_hash_content_type) { extendible_hash_content_type.create_index(); } char str[16] = "movie\0"; auto result = extendible_hash_content_type.search(str); std::cout << "Total: " << result.size() << std::endl;See main.cpp for further reference on how to index different data types.