- Notifications
You must be signed in to change notification settings - Fork 14
feat: implement async session creation #1405
Conversation
Also enable session creation in the background thread. I'm planning to change the existing uses of synchronous creation to async but that requires very substantial test changes. However, I did run a couple experiments - one *only* using `AsyncBatchCreateSessions()`, and one with that, and *only* calling `Grow()` from the background thread (that too some hackery) and all of the integration tests, benchmarks, and samples passed in both cases. So, I wanted to get this out for review while I work on making the unit tests function with async ops. Part of #1271
| google/cloud/spanner/internal/session_pool.cc, line 327 at r1 (raw file):
@coryan should I be saving the return value from this somewhere and call cancel() on it in the destructor? or does it not matter? |
| google/cloud/spanner/internal/session_pool.cc, line 327 at r1 (raw file): Previously, mr-salty (Todd Derr) wrote…
to clarify - "this" meaning the |
devbww left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nits.
| future<StatusOr<spanner_proto::BatchCreateSessionsResponse>> | ||
| result) { | ||
| if (auto shared_pool = pool.lock()) { | ||
| shared_pool->HandleBatchCreateSessionsDone(channel, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we log something on failure here?
coryan left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 2 files at r1.
Reviewable status: all files reviewed, 8 unresolved discussions (waiting on @devjgm, @mr-salty, and @scotthart)
google/cloud/spanner/internal/session_pool.h, line 159 at r1 (raw file):
std::vector<std::unique_ptr<Session>> sessions_; // GUARDED_BY(mu_) int total_sessions_ = 0; // GUARDED_BY(mu_) int create_calls_in_progress_ = 0; // GUARDED_BY(mu_)
Should we track the number of calls or the number of sessions that are in progress?
google/cloud/spanner/internal/session_pool.cc, line 209 at r1 (raw file):
} } lk.lock();
FYI: I could not figure out why you needed to reacquire the lock because I lost my train of thought... I think this function could be split maybe? Seems like there is a portion computing how many more sessions are needed, if any, and another sending the request to create that many sessions, that suggests a split? (it could be an artifact of how this looks in
google/cloud/spanner/internal/session_pool.cc, line 327 at r1 (raw file):
Previously, mr-salty (Todd Derr) wrote…
to clarify - "this" meaning the
thenresult (which I assume isfuture<void>?
We can, but as discussed elsewhere, that has no practical effect. AFAIK, there is no way to cancel a unary RPC (streaming RPCs and timers can be cancelled), you can request a cancel, it just does nothing.
google/cloud/spanner/internal/session_pool.cc, line 422 at r1 (raw file):
} // Add sessions to the pool and update counters for `channel` and the pool. int sessions_created = response->session_size();
s/int/auto const/` maybe?
google/cloud/spanner/internal/session_pool.cc, line 430 at r1 (raw file):
std::move(*session.mutable_name()), channel)); } // Shuffle the pool so we distribute returned sessions across channels.
Aren't they associated with a Channel already (like two lines before).
google/cloud/spanner/internal/session_pool.cc, line 432 at r1 (raw file):
// Shuffle the pool so we distribute returned sessions across channels. std::shuffle(sessions_.begin(), sessions_.end(), std::mt19937(std::random_device()()));
You may want to create one of those in the constructor and hold on to it.
Codecov Report
@@ Coverage Diff @@ ## master #1405 +/- ## ========================================== - Coverage 95.2% 95.14% -0.07% ========================================== Files 190 191 +1 Lines 15713 15703 -10 ========================================== - Hits 14960 14941 -19 - Misses 753 762 +9
Continue to review full report at Codecov.
|
mr-salty left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 2 files reviewed, 7 unresolved discussions (waiting on @coryan, @devbww, @devjgm, and @scotthart)
google/cloud/spanner/internal/session_pool.h, line 159 at r1 (raw file):
Previously, coryan (Carlos O'Ryan) wrote…
Should we track the number of calls or the number of sessions that are in progress?
This sent me down a bit of a rathole; I'm not sure number of sessions by itself would help much but sessions for each channel would probably give us some flexibility. I put some thoughts about that in #1406 but think that's too much to bite off right now (this PR is analogous to the current behavior - we don't issue any new calls until all outstanding calls complete)
google/cloud/spanner/internal/session_pool.cc, line 26 at r1 (raw file):
Previously, devbww (Bradley White) wrote…
Is this related?
I'm not sure where that came from... I remember one day getting some clang-tidy because I was #including <math.h> and changing it, except I see math.h wasn't here originally easier, and seems like kind of a hard thing to type accidentally... so maybe some automagic thing inserted it somehow.
google/cloud/spanner/internal/session_pool.cc, line 196 at r1 (raw file):
Previously, devbww (Bradley White) wrote…
statuscould be localized to thekWaitcase.
Done.
google/cloud/spanner/internal/session_pool.cc, line 209 at r1 (raw file):
Previously, coryan (Carlos O'Ryan) wrote…
FYI: I could not figure out why you needed to reacquire the lock because I lost my train of thought... I think this function could be split maybe? Seems like there is a portion computing how many more sessions are needed, if any, and another sending the request to create that many sessions, that suggests a split? (it could be an artifact of how this looks in
SG... I split them.
I also wish there was a scoped object that drops a lock and reacquires it... I had such a thing in a previous life (they were called CriticalSection and AntiCriticalSection).
google/cloud/spanner/internal/session_pool.cc, line 327 at r1 (raw file):
Previously, coryan (Carlos O'Ryan) wrote…
We can, but as discussed elsewhere, that has no practical effect. AFAIK, there is no way to cancel a unary RPC (streaming RPCs and timers can be cancelled), you can request a cancel, it just does nothing.
ack.
google/cloud/spanner/internal/session_pool.cc, line 332 at r1 (raw file):
Previously, devbww (Bradley White) wrote…
Should we log something on failure here?
not sure... we generally haven't before.
google/cloud/spanner/internal/session_pool.cc, line 422 at r1 (raw file):
Previously, coryan (Carlos O'Ryan) wrote…
s/int/auto const/` maybe?
Done.
google/cloud/spanner/internal/session_pool.cc, line 430 at r1 (raw file):
Previously, coryan (Carlos O'Ryan) wrote…
Aren't they associated with a Channel already (like two lines before).
I was typing a long comment here but decided it was better to put it in #1406 for posterity.
So, i think let's leave it like this for now (it was existing code, it just got moved) and revisit.
google/cloud/spanner/internal/session_pool.cc, line 432 at r1 (raw file):
Previously, coryan (Carlos O'Ryan) wrote…
You may want to create one of those in the constructor and hold on to it.
Done.
coryan left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 2 files at r2.
Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @devbww, @devjgm, and @scotthart)
google/cloud/spanner/internal/session_pool.h, line 159 at r1 (raw file):
Previously, mr-salty (Todd Derr) wrote…
This sent me down a bit of a rathole; I'm not sure number of sessions by itself would help much but sessions for each channel would probably give us some flexibility. I put some thoughts about that in #1406 but think that's too much to bite off right now (this PR is analogous to the current behavior - we don't issue any new calls until all outstanding calls complete)
SGTM
devbww left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @devjgm, @mr-salty, and @scotthart)
google/cloud/spanner/internal/session_pool.cc, line 332 at r1 (raw file):
Previously, mr-salty (Todd Derr) wrote…
not sure... we generally haven't before.
But only because we capture and return the status, right? Here we discard it. We might be in the dark about why we're failing to allocate sessions. Perhaps we don't care, but I'm not sure.
devbww left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @devjgm, @mr-salty, and @scotthart)
google/cloud/spanner/internal/session_pool.cc, line 209 at r1 (raw file):
Previously, mr-salty (Todd Derr) wrote…
SG... I split them.
I also wish there was a scoped object that drops a lock and reacquires it... I had such a thing in a previous life (they were called
CriticalSectionandAntiCriticalSection).
AntiCriticalSection is usually a smell. It means the enclosing CriticalSection really isn't one.
coryan left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r3.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @devbww, @devjgm, @mr-salty, and @scotthart)
google/cloud/spanner/internal/session_pool.cc, line 209 at r1 (raw file):
AntiCriticalSectionis usually a smell. It means the enclosing CriticalSection really isn't one.
OTOH, releasing locks while making remote calls is a pattern 🤷♂️ I agree with you that this is a smell, but this one seems fine.
google/cloud/spanner/internal/session_pool.cc, line 332 at r1 (raw file):
Previously, devbww (Bradley White) wrote…
But only because we capture and return the status, right? Here we discard it. We might be in the dark about why we're failing to allocate sessions. Perhaps we don't care, but I'm not sure.
In case it helps: a disabled GCP_LOG() should be extremely cheap (I think an atomic read and an if).
google/cloud/spanner/internal/session_pool.cc, line 147 at r3 (raw file):
return create_counts.status(); } create_calls_in_progress_ += static_cast<int>(create_counts->size());
Consider: maybe create_calls_in_progress_ is of the wrong type?
mr-salty left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @coryan, @devbww, @devjgm, @mr-salty, and @scotthart)
google/cloud/spanner/internal/session_pool.cc, line 209 at r1 (raw file):
Previously, coryan (Carlos O'Ryan) wrote…
AntiCriticalSectionis usually a smell. It means the enclosing CriticalSection really isn't one.OTOH, releasing locks while making remote calls is a pattern 🤷♂️ I agree with you that this is a smell, but this one seems fine.
IDK, we can discuss elsewhere. I could buy the argument that CriticalSection is a poor choice of name because it implies the whole thing is atomic, but the pattern of "drop the lock, do something, reacquire the lock" is pretty common (condition variables, for instance). Certainly explicit unlock() and lock() calls are less safe and harder to grok (per Carlos' original comment).
google/cloud/spanner/internal/session_pool.cc, line 332 at r1 (raw file):
Previously, coryan (Carlos O'Ryan) wrote…
In case it helps: a disabled
GCP_LOG()should be extremely cheap (I think an atomic read and an if).
For the moment Allocate() still does what @devbww describes, so I don't think much has really changed, we might have an extra call that fails and we ignore the result, but we already do that in Initialize. When I make all allocation async we'll have to solve this.
More generally, if someone has some cycles tomorrow we should try to add some kind of SessionPool instrumentation - see #1412 for some musings on that.
mr-salty left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @coryan, @devbww, @devjgm, and @scotthart)
google/cloud/spanner/internal/session_pool.cc, line 147 at r3 (raw file):
Previously, coryan (Carlos O'Ryan) wrote…
Consider: maybe
create_calls_in_progress_is of the wrong type?
oops, I didn't see this comment before... although recall going down this rathole in the past for similar things and deciding to use int rather than caving to the vector API
…panner#1405) * feat: implement async session creation Also enable session creation in the background thread. I'm planning to change the existing uses of synchronous creation to async but that requires very substantial test changes. However, I did run a couple experiments - one *only* using `AsyncBatchCreateSessions()`, and one with that, and *only* calling `Grow()` from the background thread (that too some hackery) and all of the integration tests, benchmarks, and samples passed in both cases. So, I wanted to get this out for review while I work on making the unit tests function with async ops. Part of googleapis/google-cloud-cpp-spanner#1271 * address review comments * fix windows compile error
Also enable session creation in the background thread.
I'm planning to change the existing uses of synchronous creation to
async but that requires very substantial test changes. However, I did
run a couple experiments - one only using
AsyncBatchCreateSessions(),and one with that, and only calling
Grow()from the backgroundthread (that too some hackery) and all of the integration tests,
benchmarks, and samples passed in both cases. So, I wanted to get this
out for review while I work on making the unit tests function with async
ops.
Part of #1271
This change is