Skip to content

Conversation

@etschannen
Copy link
Contributor

These changes improve how data distribution keeps load balanced across servers, improves how shard merges are handled, and improves how data distribution functions when recovering from a storage server failure.

@etschannen etschannen requested a review from ajbeamon January 10, 2020 02:30
bestSize = teamList[j]->size();
}
}
break;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suppose completeSources has servers: s1, s2, s3, s4, s5; none of s1's teams is a subset of {s1, s2, s3, s4, s5}, which means found = false and bestOption is not found.
However, is it possible that s2 has a team {s2, s3, s4}? In this case, the bestOption exists but the for-loop at Line 765 fails to find it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess also you could have a situation where s1 has a valid team, but there exists a larger one that doesn't include it.

auto& teamList = self->server_info[ req.sources[i] ]->teams;
int bestSize = 0;
for( int i = 0; i < req.completeSources.size(); i++ ) {
if( self->server_info.count( req.completeSources[i] ) ) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think changing the if to
if( !self->server_info.count( req.completeSources[i] ) ) {continue;} is more intuitive.

The purpose of the for-loop is only to find the first server of req.completeSources that is in server_info. It does not loop back once it finds the first such server.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we are looking for the first value to satisfy some criteria, we could also write this using std::find_if. I'm ok with the current code or Meng's suggestion, too, though.

}

int bestSize = 0;
for( int i = 0; i < req.completeSources.size(); i++ ) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure how often duplicate servers exist in req.completeSources.
If it is not rare, using the complementSources set calculated at the beginning of this function has performance benefit, since the getTeam() function is in hot path.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Complete sources should never have duplicates, it was only turned into a set to do a count to lookup if ids exists. getTeam only called roughly once per shard moved, so it is not hot enough to the point that it should cause CPU problems for the DD algorithm.

auto shardBounds = getShardSizeBounds( merged, maxShardSize );
if( endingStats.bytes >= shardBounds.min.bytes ||
getBandwidthStatus( endingStats ) != BandwidthStatusLow ||
now() - lastLowBandwidthTime < SERVER_KNOBS->DD_LOW_BANDWIDTH_DELAY ||
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand and agree we should not merge a shard unless it has been in lowBandwidth status for a while.

This change will increase the number of shards. Do we have a rough estimation on the extra number of shards this change may potentially increase? (The DD_LOW_BANDWIDTH_DELAY value may affect the number.)

Is the current number of shards close to the maximum number of shards a cluster can tolerate without experiencing performance issues?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The effect this will have on the number of shards is heavily dependent on the client workload. There is very little risk that this will increase the shard count enough to impact the database performance, but we should monitor the number of shards after we roll out this change.

@ajbeamon
Copy link
Contributor

@fdb-build test this please

std::set<UID> completeSources;
for( int i = 0; i < req.completeSources.size(); i++ ) {
completeSources.insert( req.completeSources[i] );
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can also be written as:

std::set<UID> completeSources(req.completeSources.begin(), req.completeSources.end()); 
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And also if you use this constructor, you should be able to mark it as const, which could clear up any ambiguity about whether you are duplicating the original data so that you can modify it or to just use as a lookup.

auto& teamList = self->server_info[ req.sources[i] ]->teams;
int bestSize = 0;
for( int i = 0; i < req.completeSources.size(); i++ ) {
if( self->server_info.count( req.completeSources[i] ) ) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we are looking for the first value to satisfy some criteria, we could also write this using std::find_if. I'm ok with the current code or Meng's suggestion, too, though.

bestSize = teamList[j]->size();
}
}
break;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess also you could have a situation where s1 has a valid team, but there exists a larger one that doesn't include it.

StorageMetrics metrics = wait( tr.waitStorageMetrics( keys, bounds.min, bounds.max, bounds.permittedError, CLIENT_KNOBS->STORAGE_METRICS_SHARD_LIMIT ) );
BandwidthStatus newBandwidthStatus = getBandwidthStatus( metrics );
if(newBandwidthStatus == BandwidthStatusLow && bandwidthStatus != BandwidthStatusLow) {
lastLowBandwidthTime = now();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the name of this might cause some confusion, as it to me implies the last time the shard was at low bandwidth rather than the time when low bandwidth started. Maybe something like lastLowBandwidthStartTime is clearer?

@ajbeamon
Copy link
Contributor

The change to the LAST_LIMITED_RATIO knob may resolve #1884.

… check all completeSources we do not need to track bestSize, since all teams in the list will be the same size
@etschannen etschannen merged commit 17e97f2 into apple:release-6.2 Jan 11, 2020
@etschannen etschannen deleted the feature-dd-improvements branch January 13, 2020 22:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

4 participants