fix: distinguish server timeouts from transport timeouts #43

plamut · 2020-02-25T12:12:22Z

Fixes #40.

This PR fixes timeouts that sometimes occurred too early by distinguishing between the server timeout (the `timeoutMs` API parameter) and the timeout for the transport layer. The two are now independent of each other.

How to reproduce

The ticket description seems to provide a reasonable way to reproduce the problem (repeating a non-trivial query many times). I had some trouble reproducing it consistently on my network, but manually disabling the internet connection can also be used.

See the rest of the notes for remarks/discussion.

Methods might block longer than `timeout`

In the 1.24.0 release, a `timeout` parameter was added to public methods to prevent HTTP requests from hanging indefinitely. However, if the timeout is not provided (e.g. when polling for job completion), trying to estimate it from the server-side "is job done" timeout can lead to spurious timeout errors caused by unpredictable network delays.

Since `timeout` is now passed directly to the underlying requests library as its timeout, the actual duration of a function call can be considerably longer than a wall-clock timeout would allow.

Because this defeats the wall-clock timeout approximation, the logic implementing it has also been removed from the methods that might send multiple requests. The transport timeout now applies to each individual request.
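
To illustrate why a method can block longer than `timeout`, here is a minimal sketch in plain Python. It does not use the library itself: `fake_request`, `run_method`, and `TransportTimeout` are hypothetical names made up for this illustration, and request durations are simulated rather than measured.

```python
class TransportTimeout(Exception):
    """Hypothetical stand-in for a transport-level timeout error."""


def fake_request(duration, timeout):
    """Simulate one HTTP request that takes `duration` seconds.

    Raises TransportTimeout if the request would exceed `timeout`,
    otherwise returns the simulated time spent.
    """
    if timeout is not None and duration > timeout:
        raise TransportTimeout(f"request exceeded {timeout}s")
    return duration


def run_method(request_durations, timeout):
    """Simulate a method that issues several requests in sequence.

    The same `timeout` is passed unchanged to every request, so it
    bounds each request, not the method call as a whole.
    """
    elapsed = 0.0
    for duration in request_durations:
        elapsed += fake_request(duration, timeout)
    return elapsed


# Three 4 s requests each fit within a 5 s per-request timeout,
# yet the method as a whole takes 12 s.
total = run_method([4.0, 4.0, 4.0], timeout=5.0)
print(total)  # prints 12.0
```

A caller that needs a hard wall-clock limit would have to enforce it separately, since the per-request timeout alone does not provide one.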

Timeout errors are not retried

If a transport timeout error is raised, it is currently not retried by the default retry object. We might actually not want to change that, as timeout errors are generally used by the core futures to signal that retrying a request took too long.
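
The following is only a sketch of the idea, not the library's actual retry object: `should_retry`, `call_with_retry`, and both exception classes are hypothetical names. It shows a retry predicate that accepts transient server errors but lets a timeout error propagate on the first attempt.

```python
class TransientServerError(Exception):
    """Hypothetical error that is considered safe to retry."""


class TransportTimeout(Exception):
    """Hypothetical stand-in for a transport-level timeout error."""


def should_retry(exc):
    """Retry predicate: transient server errors only, never timeouts."""
    return isinstance(exc, TransientServerError)


def call_with_retry(func, predicate=should_retry, max_attempts=3):
    """Call `func`, retrying only the errors that `predicate` accepts."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception as exc:
            # Give up on the last attempt or on a non-retryable error.
            if attempt == max_attempts or not predicate(exc):
                raise


attempts = []


def always_times_out():
    """Record the call, then fail with a simulated transport timeout."""
    attempts.append(1)
    raise TransportTimeout("simulated transport timeout")


try:
    call_with_retry(always_times_out)
except TransportTimeout:
    pass

print(len(attempts))  # prints 1: the timeout was not retried
```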

In any case, the default behavior is now again the same as in pre-1.24.0 versions, meaning that the user code that worked with, say, version 1.19.0 should behave the same.

PR checklist

Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
Ensure the tests and linter pass
Code coverage does not decrease (if any source code was changed)
Appropriate docs were updated (if necessary)

A transport layer timeout is made independent of the query timeout, i.e. the maximum time to wait for the query to complete. The query timeout is used by the blocking poll so that the backend does not block for too long when polling for job completion, but the transport can have different timeout requirements, and we do not want it to raise unnecessary timeout errors.

As job methods do not split the timeout anymore between all requests a method might make, the Client methods are adjusted in the same way.
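
As a rough sketch of what "independent" means here (assuming a hypothetical helper named `build_poll_call`; the way the library actually builds its requests is not shown in this PR description), the query timeout only affects the `timeoutMs` API parameter, while the transport timeout is passed along separately and is never derived from it:

```python
def build_poll_call(query_timeout=None, transport_timeout=None):
    """Return (query_params, transport_kwargs) for one polling request.

    Hypothetical helper: `query_timeout` (seconds) becomes the server-side
    `timeoutMs` parameter, while `transport_timeout` (seconds) is handed to
    the transport untouched. Neither value is computed from the other.
    """
    query_params = {}
    if query_timeout is not None:
        query_params["timeoutMs"] = int(query_timeout * 1000)
    transport_kwargs = {"timeout": transport_timeout}
    return query_params, transport_kwargs


# A 10 s query timeout with no transport timeout: the server is asked to
# wait up to 10 s, and the transport gets no deadline of its own.
params, kwargs = build_poll_call(query_timeout=10, transport_timeout=None)
print(params)  # prints {'timeoutMs': 10000}
print(kwargs)  # prints {'timeout': None}
```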

plamut · 2020-03-09T10:11:55Z

@tswast Ping.

(BTW, should I still be requesting BigQuery reviews from you by default? Or should I pick, say, @shollyman instead?)

tswast

LGTM with one question.

Yes, please direct future PRs to @shollyman. He can pull me in when he needs more context.

sonots · 2020-05-14T17:03:40Z

@plamut Hi, I would like a new version to be released.

plamut · 2020-05-14T17:48:51Z

@sonots I will have to check, but since quite a few fixes and additions have been made since the last release, I think there is a decent chance a new version will be released "soon" (say, by the end of the month).

This is just my personal opinion, not an official answer, but I will try to make it happen sooner rather than later.

plamut · 2020-06-10T12:00:07Z

@sonots FYI, a new version of the library has been released; you can now try it out.

linnabrown · 2020-11-18T04:54:18Z

Thanks. Could you provide an example of inserting a CSV file into BigQuery with a longer timeout limit? Thanks

plamut added 2 commits February 25, 2020 11:56

Apply timeout to each of the underlying requests

ceb9690

plamut added the api: bigquery Issues related to the googleapis/python-bigquery API. label Feb 25, 2020

googlebot added the cla: yes This human has signed the Contributor License Agreement. label Feb 25, 2020

Merge branch 'master' into iss-40

89e5c2e

plamut requested a review from tswast March 2, 2020 08:17

plamut added 2 commits March 4, 2020 11:02

Merge branch 'master' into iss-40

7dac01f

Merge branch 'master' into iss-40

9210abd

Merge branch 'master' into iss-40

fb187b9

tswast approved these changes Mar 9, 2020

plamut merged commit a17be5f into googleapis:master Mar 9, 2020

plamut deleted the iss-40 branch March 9, 2020 18:49
