fix: distinguish server timeouts from transport timeouts #43
Add this suggestion to a batch that can be applied as a single commit. This suggestion is invalid because no changes were made to the code. Suggestions cannot be applied while the pull request is closed. Suggestions cannot be applied while viewing a subset of changes. Only one suggestion per line can be applied in a batch. Add this suggestion to a batch that can be applied as a single commit. Applying suggestions on deleted lines is not supported. You must change the existing code in this line in order to create a valid suggestion. Outdated suggestions cannot be applied. This suggestion has been applied or marked resolved. Suggestions cannot be applied from pending reviews. Suggestions cannot be applied on multi-line comments. Suggestions cannot be applied while the pull request is queued to merge. Suggestion cannot be applied right now. Please check back later.
Fixes #40.
This PR fixes the problem with timeouts sometimes occurring too early by making a distinction between the server timeout (the
timeoutMs
API parameter) and the timeout for the transport layer. These two are now independent from each other.How to reproduce
Seems like the ticket description provides a reasonable way to do it (repeating a non-trivial query a lot of times). I had quite some trouble reproducing it consistently on my network, but manually disabling one's internet connection can also be used.
See the rest of the notes for remarks/discussion.
Methods might block longer than
timeout
In the 1.24.0 release, a timeout parameter was added to public methods to prevent HTTP requests from hanging indefinitely. However, if the timeout is not provided (e.g. when polling for job completion), trying to estimate it from the server-side "is job done" timeout can lead to random timeout errors due to random network delays, etc.
Since
timeout
is now directly passed as the timeout to the underlyingrequests
lib, it means that the actual duration of a function call can be considerably longer than a wall clock timeout would be.As this negates the wall clock timeout approximation, the logic dealing with that has also been removed in the methods that might send multiple requests. The transport timeout now applies to each individual request.
Timeout errors are not retried
If a transport timeout error is raised, it is currently not retried by the default
retry
object. We might actually not want to change that, as timeout errors are generally used by the core futures to signal that retrying a request took too long.In any case, the default behavior is now again the same as in pre-1.24.0 versions, meaning that the user code that worked with, say, version 1.19.0 should behave the same.
PR checklist