Skip to content

Conversation

@ajbeamon
Copy link
Contributor

No description provided.

Copy link
Collaborator

@atn34 atn34 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM (other than documentation build)

Fixes
-----

* If a cluster is upgraded during an `onError` call, the cluster could return a `cluster_version_changed` error. `(PR #1734) <https://github.com/apple/foundationdb/pull/1734>`_.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This appears to break the documentation build

Warning, treated as error:
/this_is_some_very_long_name_dir_needed_to_fix_a_bug_with_debug_rpms/foundationdb/documentation/sphinx/source/release-notes.rst:17: ERROR: Broken role invoked

f = abortableFuture(f, tr.onChange);

return flatMapThreadFuture<Void, Void>(f, [this, e](ErrorOr<Void> ready) {
if(!ready.isError() || ready.getError().code() != error_code_cluster_version_changed) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, so the exact problem here is:

  1. The user sets up two client libraries and then uses one of them to connect to the cluster.
  2. They do a thing that results in an error (like, say, get a transaction conflict).
  3. While waiting on the error, the cluster is upgraded.
  4. The onError call used to propagate that error up, but now it calls onError in the new thing and updates the transaction to use the new client library.

I think that means that for existing users of the retry loops, they would have just seen a retriable error propagate up rather than, say, seeing the transaction retry loop try again with an incompatible protocol version (right?). I ask because I think if it were the latter, then there would be a somewhat compelling case that this should go onto 6.1 as a patch, but I think it's fine on master if only the former.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's a good summary of the problem.

The old behavior should have resulted in it bubbling up a retryable error, but since it came from onError it's likely they wouldn't have actually attempted to retry it.

@alecgrieser alecgrieser merged commit a84f481 into apple:master Jul 2, 2019
@ajbeamon ajbeamon deleted the fix-onerror-retries-on-cluster-version-changed branch July 9, 2019 22:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

3 participants