If onError fails with cluster_version_changed, retry the error on the new transaction. #1734

ajbeamon · 2019-06-20T18:23:08Z

No description provided.

atn34

LGTM (other than documentation build)

atn34 · 2019-06-21T16:04:58Z

documentation/sphinx/source/release-notes.rst

 Fixes
 -----

+* If a cluster is upgraded during an `onError` call, the cluster could return a `cluster_version_changed` error. `(PR #1734) <https://github.com/apple/foundationdb/pull/1734>`_.


This appears to break the documentation build:

Warning, treated as error:
/this_is_some_very_long_name_dir_needed_to_fix_a_bug_with_debug_rpms/foundationdb/documentation/sphinx/source/release-notes.rst:17: ERROR: Broken role invoked

alecgrieser · 2019-06-28T18:39:00Z

fdbclient/MultiVersionTransaction.actor.cpp

+f = abortableFuture(f, tr.onChange);
+
+return flatMapThreadFuture<Void, Void>(f, [this, e](ErrorOr<Void> ready) {
+if(!ready.isError() || ready.getError().code() != error_code_cluster_version_changed) {


Okay, so the exact problem here is:

1. The user sets up two client libraries and then uses one of them to connect to the cluster.
2. They do something that results in an error (say, a transaction conflict).
3. While they are waiting on the onError call for that error, the cluster is upgraded.
4. The onError call used to propagate the resulting cluster_version_changed error up, but now it updates the transaction to use the new client library and calls onError on the new transaction.

I think that means that for existing users of the retry loops, they would have just seen a retriable error propagate up rather than, say, seeing the transaction retry loop try again with an incompatible protocol version (right?). I ask because I think if it were the latter, then there would be a somewhat compelling case that this should go onto 6.1 as a patch, but I think it's fine on master if only the former.
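
The new behavior in step 4 can be sketched in isolation. This is a minimal synchronous sketch, not the code in MultiVersionTransaction.actor.cpp: the real change works on futures (abortableFuture, flatMapThreadFuture), while every type and name below (ErrorCode, ClientTransaction, MultiVersionTransactionSketch, makeTransactionOnNewClient) is a hypothetical stand-in introduced only for illustration. It shows the decision the diff appears to make: pass through anything that is not cluster_version_changed, otherwise switch clients and retry the original error.

```cpp
#include <cassert>
#include <functional>
#include <optional>

// Hypothetical error codes; stand-ins for the real FoundationDB error codes.
enum class ErrorCode { not_committed, cluster_version_changed };

// Result of an onError call: empty means "ready to retry",
// otherwise the error that onError itself failed with.
using OnErrorResult = std::optional<ErrorCode>;

// Hypothetical stand-in for a transaction bound to one client library version.
struct ClientTransaction {
    std::function<OnErrorResult(ErrorCode)> onError;
};

// Hypothetical wrapper that can swap its underlying transaction
// when the cluster version changes.
struct MultiVersionTransactionSketch {
    ClientTransaction current;
    std::function<ClientTransaction()> makeTransactionOnNewClient;

    OnErrorResult onError(ErrorCode e) {
        OnErrorResult ready = current.onError(e);
        // Pass through success and any failure other than cluster_version_changed.
        if (!ready.has_value() || *ready != ErrorCode::cluster_version_changed) {
            return ready;
        }
        // The cluster was upgraded during the call: move to the new client
        // library and retry the original error on the new transaction.
        assert(static_cast<bool>(makeTransactionOnNewClient));
        current = makeTransactionOnNewClient();
        return current.onError(e);
    }
};
```

In this sketch the caller never sees cluster_version_changed when the retry on the new transaction succeeds; it only sees the outcome of onError for the original error.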

ajbeamon · 2019-06-28

Yeah, that's a good summary of the problem.

The old behavior should have resulted in it bubbling up a retryable error, but since it came from onError it's likely they wouldn't have actually attempted to retry it.
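
To illustrate why an error coming out of onError is unlikely to be retried, here is a rough sketch of the usual retry loop shape. It assumes a simplified, exception-based interface; FdbError, Transaction, and runWithRetry are hypothetical names for this example and do not match any real binding. The point is only that an exception thrown by onError is raised inside the catch handler and therefore leaves the loop, even if the error is nominally retryable.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>

// Hypothetical error type; real bindings expose their own error class.
struct FdbError : std::runtime_error {
    bool retryable;
    FdbError(const char* what, bool retryable)
        : std::runtime_error(what), retryable(retryable) {}
};

// Hypothetical minimal transaction interface for this sketch.
struct Transaction {
    std::function<void()> commit;                 // throws FdbError on failure
    std::function<void(const FdbError&)> onError; // returns to signal "retry", throws otherwise
};

// The usual retry loop shape: run the body, commit, and hand any error to onError.
// Returns the number of attempts made.
inline int runWithRetry(Transaction& tr, const std::function<void(Transaction&)>& body) {
    assert(static_cast<bool>(tr.commit) && static_cast<bool>(tr.onError));
    int attempts = 0;
    for (;;) {
        ++attempts;
        try {
            body(tr);
            tr.commit();
            return attempts;
        } catch (const FdbError& e) {
            // If onError itself throws, the exception leaves the loop,
            // even when the thrown error is nominally retryable.
            tr.onError(e);
        }
    }
}
```

Under these assumptions, a caller would need a second, outer retry around runWithRetry to recover from an error raised by onError, which is the gap the change in this PR closes on the client side.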

ajbeamon added 2 commits June 20, 2019 11:21

If onError fails with cluster_version_changed, retry the error on the…

cf30c47

… new transaction.

Update release notes.

9e62bbd

ajbeamon requested review from alecgrieser and atn34 June 20, 2019 18:25

ajbeamon assigned alecgrieser Jun 20, 2019

atn34 approved these changes Jun 21, 2019

Documentation fix

9a55b1f

alecgrieser reviewed Jun 28, 2019

alecgrieser approved these changes Jul 2, 2019

alecgrieser merged commit a84f481 into apple:master Jul 2, 2019

ajbeamon deleted the fix-onerror-retries-on-cluster-version-changed branch July 9, 2019 22:15
