- Notifications
You must be signed in to change notification settings - Fork 179
Add experimental/v1 ON_DISK_TRANSACTIONAL storage #850
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This merge has dropped the performance on certain benchmarks by some 10% (check single_vertex_read, single_edge_read, etc.) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a few comments from today
return replica_info; | ||
}); | ||
} | ||
StorageMode Storage::GetStorageMode() const { return storage_mode_; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is storage_mode_
protected?
Seems like we are not using the accessor when reading it. Can this cause a race condition?
*similar story for the isolation_level_
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for both operations you take exclusive lock (main_lock_)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You take the main_lock_
on write/set but on on read/get.
This is why I was asking about the Accessor
. There we take a shared lock on main_lock_
.
We shouldn't use the Storage::GetStorageMode()
directly since it is not protected. We should access it through the Accessor
or add a shared_lock before.
After our discussion on slack, I think we should add GetStorageMode
and GetIsolationLevel
to the Accessor
and DbAccessor
and replace all interpreter_context->db->GetStorageMode()
calls with a call through an accessor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes it could be, that code exists for a long time it is very possible something is broken
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not taking shared_lock in storage class then, why through accessor? Maybe I am missing something but I find it stupid to duplicate all functionalites to accessor level. Especially when main_lock_ is part of the storage
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could take the shared_lock, but that is also currently missing from the get functions. I think...
Also, from the comments on slack, I think we are trying to limit access to the database from the queries to the appropriate accessors and not directly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got up to storage
} | ||
} | ||
| ||
InterpreterContext::InterpreterContext(std::unique_ptr<storage::Storage> db, InterpreterConfig interpreter_config, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use std::unique_ptr<storage::Storage> &&db
so the user know they are giving up control of the pointer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes
"query. "); | ||
} | ||
| ||
main_guard.unlock(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't the lock protect until the end?
The lock is part of the storage, so that might cause some problems. But here nothing stops some other thread from accessing the database while it is being switched.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The interpreters
lock could work, but it should be held till the end.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lock must be released before new DiskStorage has been created. And yes you are right that some thread can access the DB while being switched. However, this is solved by having creation_storage_mode_ in the accessor. Then upon commit, I just check whether the storage mode is the same as during the start
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But what happens to the storage if someone else is using it while you are switching it?
The interpreter_context->db = std::make_unique<memgraph::storage::DiskStorage>(db_config);
will destroy the old storage. We have to make sure no one else can possibly be using it while this is happening.
Nothing is stopping someone from accessing the original storage between lines:
if (interpreter_context->interpreters->size() > 1) {
and
interpreter_context->db = std::make_unique<memgraph::storage::DiskStorage>(db_config);
.
I imagine that the DiskStorage creation can take some time, so it's not impossible that this happens.
If the only check is on commit, then someone could be using the destroyed storage for a long time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we don't know what happens, probably it would but the pure point of switching to disk is that you should it alone, at the beginning. So yes there is probably a moment in which something would crash but I wouldn't say the problem is someone is using destroyed storage before commit. But again I agree this should be somehow refactored but this is not easy part. Maybe when we tackle the issue of switching from Lab
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe something like this could work.
Every new connection adds an interpreter to the list. So this way they can't use it until we moved the storage.
NOTE: No idea if this code is actually okay.
interpreter_context->interpreters.WithLock([&](const auto &interpreters_) { if (interpreters_.size() > 1) { throw utils::BasicException( "You cannot switch from an in-memory storage mode to the on-disk storage mode when there are " "multiple sessions active. Close all other sessions and try again. As Memgraph Lab uses " "multiple sessions to run queries in parallel, " "it is currently impossible to switch to the on-disk storage mode within Lab. " "Close it, connect to the instance with mgconsole " "and change the storage mode to on-disk from there. Then, you can reconnect with the Lab " "and continue to use the instance as usual."); } main_guard.unlock(); interpreter_context->db = std::make_unique<memgraph::storage::DiskStorage>(interpreter_context->db->config_); });
requested_mode == storage::StorageMode::ON_DISK_TRANSACTIONAL) { | ||
callback = SwitchMemoryDevice(current_mode, requested_mode, interpreter_context).fn; | ||
} else { | ||
if (ActiveTransactionsExist(interpreter_context)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't ActiveTransactionsExist
check be done for the on-disk case as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes but for on-disk there is even stronger check: whether there is only one interpreter. I will rework this at some point because of Lab
if (!db_accessor_) return; | ||
| ||
db_accessor_->Abort(); | ||
for (auto &qe : query_executions_) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we just call query_executions_.clear()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @Darych but I think no because we support prepared queries or something like this so we have to use it or something. There is a good reason for that I am sure.
auto creation_mode = db_accessor_->GetCreationStorageMode(); | ||
if (creation_mode != storage::StorageMode::ON_DISK_TRANSACTIONAL && | ||
current_storage_mode == storage::StorageMode::ON_DISK_TRANSACTIONAL) { | ||
throw QueryException( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Switching to on-disk during a transaction should be impossible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but I have a check:
if (in_explicit_transaction) { throw StorageModeModificationInMulticommandTxException(); }
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this check should be unnecessary since the whole architecture should prevent this from happening.
Currently it is needed, but we should rework the code so that this cannot happen and so the check would not be needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How would architecture prevent? E.g
BEGIN;
CREATE VERTEX
STORAGE MODE ON_DISK_TRANSACTIONAL; // this will fail now
COMMIT
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The void Interpreter::Commit()
is called on COMMIT, right?
The STORAGE MODE ON_DISK_TRANSACTIONAL;
will fail because we are in a transaction.
So when is this check in Commit() true?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
which check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (creation_mode != storage::StorageMode::ON_DISK_TRANSACTIONAL && current_storage_mode == storage::StorageMode::ON_DISK_TRANSACTIONAL) {
That check is done in the Commit() and should never be true, right?
Even if it could be true in the current implementation, I think we should make changes to the code so that it never is true.
We don't allow mode changes in a transaction and we do not allow it if other users are actively using the database.
std::visit([](auto &memory_resource) { memory_resource.Release(); }, execution_memory); | ||
} | ||
| ||
void CleanRuntimeData() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @Darych
if (auth_checker->Has(**impl_it, memgraph::query::AuthQuery::FineGrainedPrivilege::READ)) { | ||
const auto &check_vertex = | ||
it.source_vertex.getImpl() == (*impl_it)->From() ? (*impl_it)->To() : (*impl_it)->From(); | ||
auto edgeAcc = **impl_it; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a copy?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure what was happening here, cc @Darych take a look please
If it is 10% that is more-less ok I think. Buda expected that due to virtual calls. |
auto acc = vertices_.access(); | ||
auto vertex_it = acc.find(gid); | ||
if (vertex_it != acc.end()) { | ||
return VertexAccessor::Create(&*vertex_it, &transaction_, &storage_->indices_, &storage_->constraints_, config_, | ||
view); | ||
} | ||
for (const auto &vec : index_storage_) { | ||
acc = vec->access(); | ||
auto index_it = acc.find(gid); | ||
if (index_it != acc.end()) { | ||
return VertexAccessor::Create(&*index_it, &transaction_, &storage_->indices_, &storage_->constraints_, config_, | ||
view); | ||
} | ||
} | ||
| ||
rocksdb::ReadOptions read_opts; | ||
auto strTs = utils::StringTimestamp(transaction_.start_timestamp); | ||
rocksdb::Slice ts(strTs); | ||
read_opts.timestamp = &ts; | ||
auto *disk_storage = static_cast<DiskStorage *>(storage_); | ||
auto it = std::unique_ptr<rocksdb::Iterator>( | ||
disk_transaction_->GetIterator(read_opts, disk_storage->kvstore_->vertex_chandle)); | ||
for (it->SeekToFirst(); it->Valid(); it->Next()) { | ||
const auto &key = it->key(); | ||
if (Gid::FromUint(std::stoull(utils::ExtractGidFromKey(key.ToString()))) == gid) { | ||
return LoadVertexToMainMemoryCache(key, it->value()); | ||
} | ||
} | ||
return std::nullopt; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think in this function we check whether vertex exists in vertices_ first, then we check whethere there is vertex in index, then we get it from storage and load it to cache. But in loding to cache we again check if vertex exists in skip list and I think this repetition is costly.
auto main_storage_accessor = vertices_.access(); | ||
| ||
const std::string key_str = key.ToString(); | ||
storage::Gid gid = Gid::FromUint(std::stoull(utils::ExtractGidFromKey(key_str))); | ||
if (VertexExistsInCache(main_storage_accessor, gid)) { | ||
return std::nullopt; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function is called by FindVertex. FindVertex already does check whether it is contained in skip list.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like this function should only do the loading, and check should be done before
if (VertexExistsInCache(index_accessor, gid)) { | ||
return std::nullopt; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like this function shouldn't do this check. This check should be done before and then we just here load.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know it is all intertwined but idea I think of this review is to separate it if possible
const auto edge_parts = utils::Split(key.ToStringView(), "|"); | ||
const Gid edge_gid = Gid::FromUint(std::stoull(edge_parts[4])); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this be part of some util function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Passed through some stuff in disk storage
auto edge_acc = edges_.access(); | ||
auto res = edge_acc.find(edge_gid); | ||
if (res != edge_acc.end()) { | ||
return std::nullopt; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we do that check before somewhere, so this function only does deserialization of edge from rocksdb
| ||
const auto [from_gid, to_gid] = std::invoke( | ||
[](const auto &edge_parts) { | ||
if (edge_parts[2] == "0") { // out edge |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this index and zero be some kind of constants?
for (it->SeekToFirst(); it->Valid(); it->Next()) { | ||
LoadVertexToMainMemoryCache(it->key(), it->value()); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we return this as iterable, and then load vertices to main memory cache as something iterates over vertices?
auto main_storage_accessor = vertices_.access(); | ||
| ||
const std::string key_str = key.ToString(); | ||
storage::Gid gid = Gid::FromUint(std::stoull(utils::ExtractGidFromKey(key_str))); | ||
if (VertexExistsInCache(main_storage_accessor, gid)) { | ||
return std::nullopt; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like this function should only do the loading, and check should be done before
VerticesIterable DiskStorage::DiskAccessor::Vertices(LabelId label, View view) { | ||
index_storage_.emplace_back(std::make_unique<utils::SkipList<storage::Vertex>>()); | ||
auto &indexed_vertices = index_storage_.back(); | ||
index_deltas_storage_.emplace_back(std::list<Delta>()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can just pass empty()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But I think we resolved that in other PR
if (VertexExistsInCache(index_accessor, gid)) { | ||
return std::nullopt; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know it is all intertwined but idea I think of this review is to separate it if possible
std::string key_str = index_it->key().ToString(); | ||
std::string it_value_str = index_it->value().ToString(); | ||
Gid curr_gid = Gid::FromUint(std::stoull(utils::ExtractGidFromLabelPropertyIndexStorage(key_str))); | ||
// TODO: andi this will be optimized bla bla |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we shouldn't have this comment here :) Just TODO(andi) optimize
Closes #842