- Notifications
You must be signed in to change notification settings - Fork 1.5k
Data distribution improvements #2526
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Data distribution improvements #2526
Conversation
…s of the source unless the team matches exactly
…idth for 5 minutes
… an unmerged shard
…ution rebalancing
fdbserver/DataDistribution.actor.cpp Outdated
| bestSize = teamList[j]->size(); | ||
| } | ||
| } | ||
| break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suppose completeSources has servers: s1, s2, s3, s4, s5; none of s1's teams is a subset of {s1, s2, s3, s4, s5}, which means found = false and bestOption is not found.
However, is it possible that s2 has a team {s2, s3, s4}? In this case, the bestOption exists but the for-loop at Line 765 fails to find it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess also you could have a situation where s1 has a valid team, but there exists a larger one that doesn't include it.
fdbserver/DataDistribution.actor.cpp Outdated
| auto& teamList = self->server_info[ req.sources[i] ]->teams; | ||
| int bestSize = 0; | ||
| for( int i = 0; i < req.completeSources.size(); i++ ) { | ||
| if( self->server_info.count( req.completeSources[i] ) ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think changing the if to
if( !self->server_info.count( req.completeSources[i] ) ) {continue;} is more intuitive.
The purpose of the for-loop is only to find the first server of req.completeSources that is in server_info. It does not loop back once it finds the first such server.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we are looking for the first value to satisfy some criteria, we could also write this using std::find_if. I'm ok with the current code or Meng's suggestion, too, though.
| } | ||
| | ||
| int bestSize = 0; | ||
| for( int i = 0; i < req.completeSources.size(); i++ ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure how often duplicate servers exist in req.completeSources.
If it is not rare, using the complementSources set calculated at the beginning of this function has performance benefit, since the getTeam() function is in hot path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Complete sources should never have duplicates, it was only turned into a set to do a count to lookup if ids exists. getTeam only called roughly once per shard moved, so it is not hot enough to the point that it should cause CPU problems for the DD algorithm.
| auto shardBounds = getShardSizeBounds( merged, maxShardSize ); | ||
| if( endingStats.bytes >= shardBounds.min.bytes || | ||
| getBandwidthStatus( endingStats ) != BandwidthStatusLow || | ||
| now() - lastLowBandwidthTime < SERVER_KNOBS->DD_LOW_BANDWIDTH_DELAY || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand and agree we should not merge a shard unless it has been in lowBandwidth status for a while.
This change will increase the number of shards. Do we have a rough estimation on the extra number of shards this change may potentially increase? (The DD_LOW_BANDWIDTH_DELAY value may affect the number.)
Is the current number of shards close to the maximum number of shards a cluster can tolerate without experiencing performance issues?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The effect this will have on the number of shards is heavily dependent on the client workload. There is very little risk that this will increase the shard count enough to impact the database performance, but we should monitor the number of shards after we roll out this change.
| @fdb-build test this please |
fdbserver/DataDistribution.actor.cpp Outdated
| std::set<UID> completeSources; | ||
| for( int i = 0; i < req.completeSources.size(); i++ ) { | ||
| completeSources.insert( req.completeSources[i] ); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can also be written as:
std::set<UID> completeSources(req.completeSources.begin(), req.completeSources.end()); There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And also if you use this constructor, you should be able to mark it as const, which could clear up any ambiguity about whether you are duplicating the original data so that you can modify it or to just use as a lookup.
fdbserver/DataDistribution.actor.cpp Outdated
| auto& teamList = self->server_info[ req.sources[i] ]->teams; | ||
| int bestSize = 0; | ||
| for( int i = 0; i < req.completeSources.size(); i++ ) { | ||
| if( self->server_info.count( req.completeSources[i] ) ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we are looking for the first value to satisfy some criteria, we could also write this using std::find_if. I'm ok with the current code or Meng's suggestion, too, though.
fdbserver/DataDistribution.actor.cpp Outdated
| bestSize = teamList[j]->size(); | ||
| } | ||
| } | ||
| break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess also you could have a situation where s1 has a valid team, but there exists a larger one that doesn't include it.
| StorageMetrics metrics = wait( tr.waitStorageMetrics( keys, bounds.min, bounds.max, bounds.permittedError, CLIENT_KNOBS->STORAGE_METRICS_SHARD_LIMIT ) ); | ||
| BandwidthStatus newBandwidthStatus = getBandwidthStatus( metrics ); | ||
| if(newBandwidthStatus == BandwidthStatusLow && bandwidthStatus != BandwidthStatusLow) { | ||
| lastLowBandwidthTime = now(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the name of this might cause some confusion, as it to me implies the last time the shard was at low bandwidth rather than the time when low bandwidth started. Maybe something like lastLowBandwidthStartTime is clearer?
| The change to the |
… check all completeSources we do not need to track bestSize, since all teams in the list will be the same size
…as elapses from becoming low bandwidth
…date it based on shardSize changes
These changes improve how data distribution keeps load balanced across servers, improves how shard merges are handled, and improves how data distribution functions when recovering from a storage server failure.