Skip to content

Conversation

gitbuda
Copy link
Member

@gitbuda gitbuda commented Mar 29, 2023

Closes #842

@gitbuda gitbuda added this to the mg-v2.8.0 milestone Mar 29, 2023
@gitbuda gitbuda added the feature feature label Mar 29, 2023
@gitbuda gitbuda modified the milestones: mg-v2.8.0, mg-v2.9.0 Apr 17, 2023
@gitbuda gitbuda changed the title Add ON_DISK_TRANSACTIONAL storage [master < E] Add ON_DISK_TRANSACTIONAL storage May 22, 2023
@gitbuda gitbuda changed the title [master < E] Add ON_DISK_TRANSACTIONAL storage [master < E850] Add ON_DISK_TRANSACTIONAL storage May 22, 2023
@gitbuda gitbuda changed the title [master < E850] Add ON_DISK_TRANSACTIONAL storage [master < E850] Add ON_DISK_TRANSACTIONAL storage v1/experimental Jun 27, 2023
@gitbuda gitbuda changed the title [master < E850] Add ON_DISK_TRANSACTIONAL storage v1/experimental Add experimental/v1 ON_DISK_TRANSACTIONAL storage Jun 28, 2023
@gitbuda gitbuda marked this pull request as ready for review June 29, 2023 09:41
@gitbuda gitbuda merged commit 9d056e7 into master Jun 29, 2023
@gitbuda gitbuda deleted the add-on-disk-transactional-storage branch June 29, 2023 09:44
@andrejtonev
Copy link
Contributor

This merge has dropped the performance on certain benchmarks by some 10% (check single_vertex_read, single_edge_read, etc.)
Double check if this is due to a fundamental memgraph change or something to do with the benchmark itself.

Copy link
Contributor

@andrejtonev andrejtonev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few comments from today

return replica_info;
});
}
StorageMode Storage::GetStorageMode() const { return storage_mode_; }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is storage_mode_ protected?
Seems like we are not using the accessor when reading it. Can this cause a race condition?
*similar story for the isolation_level_

Copy link
Contributor

@as51340 as51340 Aug 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for both operations you take exclusive lock (main_lock_)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You take the main_lock_ on write/set but on on read/get.
This is why I was asking about the Accessor. There we take a shared lock on main_lock_.
We shouldn't use the Storage::GetStorageMode() directly since it is not protected. We should access it through the Accessor or add a shared_lock before.
After our discussion on slack, I think we should add GetStorageMode and GetIsolationLevel to the Accessor and DbAccessor and replace all interpreter_context->db->GetStorageMode() calls with a call through an accessor.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes it could be, that code exists for a long time it is very possible something is broken

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not taking shared_lock in storage class then, why through accessor? Maybe I am missing something but I find it stupid to duplicate all functionalites to accessor level. Especially when main_lock_ is part of the storage

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could take the shared_lock, but that is also currently missing from the get functions. I think...
Also, from the comments on slack, I think we are trying to limit access to the database from the queries to the appropriate accessors and not directly.

Copy link
Contributor

@andrejtonev andrejtonev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got up to storage

}
}

InterpreterContext::InterpreterContext(std::unique_ptr<storage::Storage> db, InterpreterConfig interpreter_config,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use std::unique_ptr<storage::Storage> &&db so the user know they are giving up control of the pointer.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes

"query. ");
}

main_guard.unlock();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't the lock protect until the end?
The lock is part of the storage, so that might cause some problems. But here nothing stops some other thread from accessing the database while it is being switched.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The interpreters lock could work, but it should be held till the end.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lock must be released before new DiskStorage has been created. And yes you are right that some thread can access the DB while being switched. However, this is solved by having creation_storage_mode_ in the accessor. Then upon commit, I just check whether the storage mode is the same as during the start

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But what happens to the storage if someone else is using it while you are switching it?
The interpreter_context->db = std::make_unique<memgraph::storage::DiskStorage>(db_config); will destroy the old storage. We have to make sure no one else can possibly be using it while this is happening.

Nothing is stopping someone from accessing the original storage between lines:
if (interpreter_context->interpreters->size() > 1) {
and
interpreter_context->db = std::make_unique<memgraph::storage::DiskStorage>(db_config);.
I imagine that the DiskStorage creation can take some time, so it's not impossible that this happens.
If the only check is on commit, then someone could be using the destroyed storage for a long time.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't know what happens, probably it would but the pure point of switching to disk is that you should it alone, at the beginning. So yes there is probably a moment in which something would crash but I wouldn't say the problem is someone is using destroyed storage before commit. But again I agree this should be somehow refactored but this is not easy part. Maybe when we tackle the issue of switching from Lab

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe something like this could work.
Every new connection adds an interpreter to the list. So this way they can't use it until we moved the storage.
NOTE: No idea if this code is actually okay.

interpreter_context->interpreters.WithLock([&](const auto &interpreters_) { if (interpreters_.size() > 1) { throw utils::BasicException( "You cannot switch from an in-memory storage mode to the on-disk storage mode when there are " "multiple sessions active. Close all other sessions and try again. As Memgraph Lab uses " "multiple sessions to run queries in parallel, " "it is currently impossible to switch to the on-disk storage mode within Lab. " "Close it, connect to the instance with mgconsole " "and change the storage mode to on-disk from there. Then, you can reconnect with the Lab " "and continue to use the instance as usual."); } main_guard.unlock(); interpreter_context->db = std::make_unique<memgraph::storage::DiskStorage>(interpreter_context->db->config_); }); 
requested_mode == storage::StorageMode::ON_DISK_TRANSACTIONAL) {
callback = SwitchMemoryDevice(current_mode, requested_mode, interpreter_context).fn;
} else {
if (ActiveTransactionsExist(interpreter_context)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't ActiveTransactionsExist check be done for the on-disk case as well?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes but for on-disk there is even stronger check: whether there is only one interpreter. I will rework this at some point because of Lab

if (!db_accessor_) return;

db_accessor_->Abort();
for (auto &qe : query_executions_) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we just call query_executions_.clear()?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @Darych but I think no because we support prepared queries or something like this so we have to use it or something. There is a good reason for that I am sure.

auto creation_mode = db_accessor_->GetCreationStorageMode();
if (creation_mode != storage::StorageMode::ON_DISK_TRANSACTIONAL &&
current_storage_mode == storage::StorageMode::ON_DISK_TRANSACTIONAL) {
throw QueryException(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switching to on-disk during a transaction should be impossible.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but I have a check:

if (in_explicit_transaction) { throw StorageModeModificationInMulticommandTxException(); }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this check should be unnecessary since the whole architecture should prevent this from happening.
Currently it is needed, but we should rework the code so that this cannot happen and so the check would not be needed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would architecture prevent? E.g
BEGIN;
CREATE VERTEX
STORAGE MODE ON_DISK_TRANSACTIONAL; // this will fail now
COMMIT

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The void Interpreter::Commit() is called on COMMIT, right?
The STORAGE MODE ON_DISK_TRANSACTIONAL; will fail because we are in a transaction.
So when is this check in Commit() true?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

which check?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

 if (creation_mode != storage::StorageMode::ON_DISK_TRANSACTIONAL && current_storage_mode == storage::StorageMode::ON_DISK_TRANSACTIONAL) { 

That check is done in the Commit() and should never be true, right?
Even if it could be true in the current implementation, I think we should make changes to the code so that it never is true.
We don't allow mode changes in a transaction and we do not allow it if other users are actively using the database.

std::visit([](auto &memory_resource) { memory_resource.Release(); }, execution_memory);
}

void CleanRuntimeData() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this needed?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @Darych

if (auth_checker->Has(**impl_it, memgraph::query::AuthQuery::FineGrainedPrivilege::READ)) {
const auto &check_vertex =
it.source_vertex.getImpl() == (*impl_it)->From() ? (*impl_it)->To() : (*impl_it)->From();
auto edgeAcc = **impl_it;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a copy?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure what was happening here, cc @Darych take a look please

@as51340
Copy link
Contributor

as51340 commented Aug 9, 2023

This merge has dropped the performance on certain benchmarks by some 10% (check single_vertex_read, single_edge_read, etc.) Double check if this is due to a fundamental memgraph change or something to do with the benchmark itself.

If it is 10% that is more-less ok I think. Buda expected that due to virtual calls.

Comment on lines +745 to +774
auto acc = vertices_.access();
auto vertex_it = acc.find(gid);
if (vertex_it != acc.end()) {
return VertexAccessor::Create(&*vertex_it, &transaction_, &storage_->indices_, &storage_->constraints_, config_,
view);
}
for (const auto &vec : index_storage_) {
acc = vec->access();
auto index_it = acc.find(gid);
if (index_it != acc.end()) {
return VertexAccessor::Create(&*index_it, &transaction_, &storage_->indices_, &storage_->constraints_, config_,
view);
}
}

rocksdb::ReadOptions read_opts;
auto strTs = utils::StringTimestamp(transaction_.start_timestamp);
rocksdb::Slice ts(strTs);
read_opts.timestamp = &ts;
auto *disk_storage = static_cast<DiskStorage *>(storage_);
auto it = std::unique_ptr<rocksdb::Iterator>(
disk_transaction_->GetIterator(read_opts, disk_storage->kvstore_->vertex_chandle));
for (it->SeekToFirst(); it->Valid(); it->Next()) {
const auto &key = it->key();
if (Gid::FromUint(std::stoull(utils::ExtractGidFromKey(key.ToString()))) == gid) {
return LoadVertexToMainMemoryCache(key, it->value());
}
}
return std::nullopt;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think in this function we check whether vertex exists in vertices_ first, then we check whethere there is vertex in index, then we get it from storage and load it to cache. But in loding to cache we again check if vertex exists in skip list and I think this repetition is costly.

Comment on lines +310 to +316
auto main_storage_accessor = vertices_.access();

const std::string key_str = key.ToString();
storage::Gid gid = Gid::FromUint(std::stoull(utils::ExtractGidFromKey(key_str)));
if (VertexExistsInCache(main_storage_accessor, gid)) {
return std::nullopt;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is called by FindVertex. FindVertex already does check whether it is contained in skip list.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like this function should only do the loading, and check should be done before

Comment on lines +341 to +343
if (VertexExistsInCache(index_accessor, gid)) {
return std::nullopt;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like this function shouldn't do this check. This check should be done before and then we just here load.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know it is all intertwined but idea I think of this review is to separate it if possible

Comment on lines +353 to +354
const auto edge_parts = utils::Split(key.ToStringView(), "|");
const Gid edge_gid = Gid::FromUint(std::stoull(edge_parts[4]));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be part of some util function?

Copy link
Contributor

@antoniofilipovic antoniofilipovic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Passed through some stuff in disk storage

Comment on lines +356 to +360
auto edge_acc = edges_.access();
auto res = edge_acc.find(edge_gid);
if (res != edge_acc.end()) {
return std::nullopt;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we do that check before somewhere, so this function only does deserialization of edge from rocksdb


const auto [from_gid, to_gid] = std::invoke(
[](const auto &edge_parts) {
if (edge_parts[2] == "0") { // out edge
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this index and zero be some kind of constants?

Comment on lines +392 to +394
for (it->SeekToFirst(); it->Valid(); it->Next()) {
LoadVertexToMainMemoryCache(it->key(), it->value());
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we return this as iterable, and then load vertices to main memory cache as something iterates over vertices?

Comment on lines +310 to +316
auto main_storage_accessor = vertices_.access();

const std::string key_str = key.ToString();
storage::Gid gid = Gid::FromUint(std::stoull(utils::ExtractGidFromKey(key_str)));
if (VertexExistsInCache(main_storage_accessor, gid)) {
return std::nullopt;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like this function should only do the loading, and check should be done before

VerticesIterable DiskStorage::DiskAccessor::Vertices(LabelId label, View view) {
index_storage_.emplace_back(std::make_unique<utils::SkipList<storage::Vertex>>());
auto &indexed_vertices = index_storage_.back();
index_deltas_storage_.emplace_back(std::list<Delta>());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can just pass empty()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But I think we resolved that in other PR

Comment on lines +341 to +343
if (VertexExistsInCache(index_accessor, gid)) {
return std::nullopt;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know it is all intertwined but idea I think of this review is to separate it if possible

std::string key_str = index_it->key().ToString();
std::string it_value_str = index_it->value().ToString();
Gid curr_gid = Gid::FromUint(std::stoull(utils::ExtractGidFromLabelPropertyIndexStorage(key_str)));
// TODO: andi this will be optimized bla bla
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we shouldn't have this comment here :) Just TODO(andi) optimize

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment