Shard aware support #6
Conversation
| Please, don't squash commits - you can always do it later. |
| Hi guys, as discussed on Slack I've pushed the changes necessary for more graceful shard aware connection pooling ;) |
| @vladzcloudius care to take a look at the test in 5b636fc. I guess you have some more ideas on how/what should be tested in that regard |
| @fruch I'd like to see how driver behaves when you tear connections down: send RST from the remote side. |
* unit test for parsing and shard_id calculation
* integration test using trace to verify requests going to the correct shard
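For context, the shard_id calculation under test follows Scylla's documented token-to-shard mapping. A minimal sketch, assuming the murmur3 partitioner and the `nr_shards` / `ignore_msb` values advertised by the node (names here are illustrative, not the driver's actual API):

```python
def shard_id(token, nr_shards, ignore_msb):
    """Map a signed 64-bit murmur3 token to a shard id (illustrative sketch)."""
    mask = 0xFFFFFFFFFFFFFFFF
    biased = (token + 2**63) & mask          # bias the signed token into unsigned space
    shifted = (biased << ignore_msb) & mask  # drop the most significant bits Scylla ignores
    return (shifted * nr_shards) >> 64       # scale down into [0, nr_shards)

# the result is always within [0, nr_shards)
assert 0 <= shard_id(0, 4, 12) < 4
```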
Since the removal happens in an async callback, to be on the safe side, add a lock around the `set.remove()` call
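A minimal sketch of the idea (hypothetical names, not the driver's actual pool code): the shared set can be mutated from the async close callback and from other pool operations, so removal is guarded by a lock.

```python
import threading

class ShardConnections:
    """Illustrative sketch of lock-protected set mutation."""

    def __init__(self):
        self._lock = threading.Lock()
        self._connections = set()

    def add(self, conn):
        with self._lock:
            self._connections.add(conn)

    def remove(self, conn):
        # the async callback may race with other pool operations,
        # so take the lock before mutating the shared set
        with self._lock:
            self._connections.discard(conn)
```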
Refactor the logic of re-opening connections to all shards into its own function so we can reuse it in `_replace()`
Based on @ultrabug's comment, change the debug print to be clearer and more informative
Based on @ultrabug's comments, this should be named better
Based on @ultrabug's comments, this should be faster.
Better explain why we loop at most twice over the shards in this function.
The protocol does not (yet?) allow clients to specify which shard they want to connect to, so we depend on a round-robin allocation of the shard_id made by the host node (see the system.clients table). This means that on a live cluster where client connections come and go, we cannot guarantee that we will get a connection to every shard, even by retrying twice (which also slows down connection startup).

This commit switches to an optimistic approach where we first try to connect as many times as there are shards on the remote cluster. Then, when a routing_key is used and shard aware connection picking can take place, we try to open missing connections as we detect them.

This is more graceful: it allows us not to fail when we miss shard connections, and it reduces connection startup time! A long running client will hopefully get a connection to all shards after a while. That's the best we can do for now until the protocol evolves.
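Roughly, the idea looks like this (a sketch with hypothetical names, not the driver's actual internals): open as many connections as the node reports shards, then, whenever a request routes to a shard we hold no connection for, opportunistically try to fill the gap while serving the request from whatever connection we do have.

```python
import random

class ShardAwarePool:
    """Illustrative sketch only; not the driver's real pool class."""

    def __init__(self, nr_shards, open_connection):
        self.nr_shards = nr_shards
        self._open_connection = open_connection   # callable returning (shard_id, connection)
        self._connections = {}                    # shard_id -> connection

    def connect_initial(self):
        # optimistic first pass: try as many times as there are shards;
        # the node assigns shards round-robin, so some shards may still be missed
        for _ in range(self.nr_shards):
            shard, conn = self._open_connection()
            self._connections.setdefault(shard, conn)

    def borrow(self, shard):
        conn = self._connections.get(shard)
        if conn is not None:
            return conn
        # missing shard detected at routing time: try to fill the gap now
        new_shard, new_conn = self._open_connection()
        self._connections.setdefault(new_shard, new_conn)
        # meanwhile serve the request from any connection we do have
        return random.choice(list(self._connections.values()))
```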
Initial connection attempts to shards are now scheduled instead of blocking at startup. This allows the shard aware driver to connect to a Scylla cluster as fast as to a Cassandra one!
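One way to picture the change, as a sketch using a plain `concurrent.futures` executor (the driver has its own scheduling machinery; `open_one_connection` is a hypothetical callable): per-shard attempts are submitted in the background rather than awaited inline, so startup returns immediately.

```python
from concurrent.futures import ThreadPoolExecutor

def schedule_initial_connections(nr_shards, open_one_connection):
    # submit one attempt per shard instead of opening them synchronously,
    # so cluster/session startup is not delayed by per-shard round-trips
    executor = ThreadPoolExecutor(max_workers=nr_shards)
    futures = [executor.submit(open_one_connection) for _ in range(nr_shards)]
    return futures  # callers may inspect or ignore these; startup does not wait on them
```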
* add test for multiple clients at the same time
* add test for closing connections
* add test for blocking (timing out) connections
On busy systems we could overwhelm the threadpool executor queue with repeated submissions of speculative shard connections to the same shard, resulting in an unbounded number of connection openings to Scylla nodes. This could be seen as a "connection leak" and also did not respect the signal of cluster connections shutting down.
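The fix can be pictured as a small guard (hypothetical names, not the actual driver code): remember which shards already have a pending speculative connection attempt and skip re-submission, and stop submitting entirely once shutdown is signalled.

```python
import threading

class SpeculativeConnectionGuard:
    """Sketch of de-duplicating speculative per-shard connection attempts."""

    def __init__(self):
        self._lock = threading.Lock()
        self._pending_shards = set()
        self.is_shutdown = False

    def try_begin(self, shard_id):
        # returns True only for the first caller per shard; duplicates are dropped
        with self._lock:
            if self.is_shutdown or shard_id in self._pending_shards:
                return False
            self._pending_shards.add(shard_id)
            return True

    def finished(self, shard_id):
        # called when the attempt completes, whether it succeeded or failed
        with self._lock:
            self._pending_shards.discard(shard_id)
```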
| @ultrabug FYI, I've rebased and force-pushed this branch (not very social of me... but I want Travis to start working on top of this PR too) |