- Notifications
You must be signed in to change notification settings - Fork 9
Enable client-side timeouts and replace retry logic with reqwest's #39
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable client-side timeouts and replace retry logic with reqwest's #39
Conversation
| 👋 Thanks for assigning @tankyleo as a reviewer! |
src/client.rs Outdated
| .timeout(DEFAULT_TIMEOUT) | ||
| .connect_timeout(DEFAULT_TIMEOUT) | ||
| .read_timeout(DEFAULT_TIMEOUT) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you seems like we could just do with the single global timeout here, no need for connect and read ?
But we can leave it as is and potentially tweak the inner timeouts later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, no strong opinion here.
For reference original PR is #20. On first impression, I'd be in favor of the drop myself. |
Do the added timeouts apply to a single operation? If it is never exceeded, wouldn't we still want What is meant by client-side default delay? |
Yes, they apply for a single read, but also for connecting / detecting dropped connections AFAIU.
Yes, it could be useful, but of course its somewhat redundant if we set a client-side
Ah, sorry, that was a typo I only corrected in the PR title: should have said default timeout, not delay. |
Do you mean
Isn't this already the case when configured as I mentioned above? Number of attempts takes priority over total delay given the way Maybe I'm confused about what is lesser in that example. |
Yes, if each client call is also limited by a timeout, then we'd have either
Say you configure Or, maybe even a bit more confusing would be if the user configured a |
| 🔔 1st Reminder Hey @jkczyz! This PR has been waiting for your review. |
Isn't that expected? "done after 50s" is really "done after 5 attempts".
Yeah, though isn't that a good argument to use select? Or does the current design not allow that given policy timeout is built into the type rather than the calling site being aware of it? |
True, seems like we should? And given that we already use tokio with the time feature, it should be straightforward. I think I'll add a commit to this PR. |
Argh, after looking into it for a bit I have to eat my words: it's actually not trivial, as currently And more generally, with a generic error type, we don't know what error we'd return in case the the timeout happens before the |
Would it help defining an enum parameterized by the error |
| 🔔 2nd Reminder Hey @jkczyz! This PR has been waiting for your review. |
| 🔔 3rd Reminder Hey @jkczyz! This PR has been waiting for your review. |
| 🔔 4th Reminder Hey @jkczyz! This PR has been waiting for your review. |
| 🔔 5th Reminder Hey @jkczyz! This PR has been waiting for your review. |
| 🔔 6th Reminder Hey @jkczyz! This PR has been waiting for your review. |
| 🔔 7th Reminder Hey @jkczyz! This PR has been waiting for your review. |
| 🔔 8th Reminder Hey @jkczyz! This PR has been waiting for your review. |
| 🔔 9th Reminder Hey @jkczyz! This PR has been waiting for your review. |
jkczyz left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Waiting on @tnull (no rush!). Just want to silence the review bot.
5e83c2b to 5670866 Compare reqwest's 5670866 to 4e360cf Compare 9c94320 to 9c3148f Compare f926914 to 8b8910d Compare | Now also tacked-on a commit that corrected the expected HTTP status codes in our mocked test service. Relatedly, I discovered that the Rust vss-server didn't use the correct codes either (a regression from the Java version), fixed in lightningdevkit/vss-server#65 |
.. and finally it turns out that we can drop our |
6bc7bce to b258cec Compare While the `RetryPolicy` has a `MaxTotalDelayRetryPolicy`, the retry `loop` would only check this configured delay once the operation future actually returns a value. However, without client-side timeouts, we're not super sure the operation is actually guaranteed to return anything (even an error, IIUC). So here, we enable some coarse client-side default timeouts to ensure the polled futures eventualy return either the response *or* an error we can handle via our retry logic.
.. as some types are part of our API.
b258cec to e6d8e57 Compare | Rebased to resolve minor conflicts. |
1 similar comment
| fn build_client() -> Client { | ||
| Client::builder() | ||
| .timeout(DEFAULT_TIMEOUT) | ||
| .connect_timeout(DEFAULT_TIMEOUT) | ||
| .read_timeout(DEFAULT_TIMEOUT) | ||
| .build() | ||
| .unwrap() | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could probably inline this in from_client (does that need to be pub?) and pass header_provider from each new method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does that need to be
pub?
Yes, as we want to allow users to override any specific reqwest properties. In fact, in LDK Node we currently make use of from_client_with_headers introduced here to be able to override the default timeouts/max retries.
7cf661b to 817db29 Compare | 🔔 2nd Reminder Hey @tankyleo! This PR has been waiting for your review. |
1 similar comment
| 🔔 2nd Reminder Hey @tankyleo! This PR has been waiting for your review. |
| 🔔 3rd Reminder Hey @tankyleo! This PR has been waiting for your review. |
1 similar comment
| 🔔 3rd Reminder Hey @tankyleo! This PR has been waiting for your review. |
Previously, we'd allow to either re-use a `reqwest::Client` or supply a header provider. Here we add a new constructor that allows us to do both at the same time.
817db29 to 23097bf Compare | So it turns out that not all error cases were retryable, e.g., in LDK Node's |
| }; | ||
| let mock_server = mockito::mock("POST", GET_OBJECT_ENDPOINT) | ||
| .with_status(409) | ||
| .with_status(404) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not asserted in the test want to make sure this is intentional ? I understand why we would not assert that the server responds with a consistent error code - error response map.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not quite sure I follow what you're asking?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ie i change the status code and the test does not fail :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ie i change the status code and the test does not fail :)
Well, we currently determine the error based on the decoded payload. But, we should still mirror the actual server behavior. FWIW, if we wanted to assert something, it would be that the mock service mirrors what we do in production, but if we did so (running a service in CI) it would mitigate the idea of using a mock service in the first place. Not sure if other projects have better practices around mocking..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but if we did so (running a service in CI) it would mitigate the idea of using a mock service in the first place.
Thank you can you help me understand this part ?
Well, we currently determine the error based on the decoded payload. But, we should still mirror the actual server behavior. FWIW, if we wanted to assert something, it would be that the mock service mirrors what we do in production
How about some kind of debug_assert client-side that asserts that the VssError we got from the server matches the expected StatusCode ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is what I had in mind. But again maybe not worth asserting here...
diff --git a/src/error.rs b/src/error.rs index 5955e6a..497a588 100644 --- a/src/error.rs +++ b/src/error.rs @@ -34,7 +34,7 @@ impl VssError { /// Create new instance of `VssError` pub fn new(status: StatusCode, payload: Bytes) -> VssError { match ErrorResponse::decode(&payload[..]) { -Ok(error_response) => VssError::from(error_response), +Ok(error_response) => VssError::from((status, error_response)), Err(e) => { let message = format!( "Unable to decode ErrorResponse from server, HttpStatusCode: {}, DecodeErr: {}", @@ -73,22 +73,35 @@ impl Display for VssError { impl Error for VssError {} -impl From<ErrorResponse> for VssError { -fn from(error_response: ErrorResponse) -> Self { +impl From<(StatusCode, ErrorResponse)> for VssError { +fn from((status, error_response): (StatusCode, ErrorResponse)) -> Self { match error_response.error_code() { -ErrorCode::NoSuchKeyException => VssError::NoSuchKeyError(error_response.message), +ErrorCode::NoSuchKeyException => { +debug_assert_eq!(status, StatusCode::NOT_FOUND); +VssError::NoSuchKeyError(error_response.message) +}, ErrorCode::InvalidRequestException => { +debug_assert_eq!(status, StatusCode::BAD_REQUEST); VssError::InvalidRequestError(error_response.message) }, -ErrorCode::ConflictException => VssError::ConflictError(error_response.message), -ErrorCode::AuthException => VssError::AuthError(error_response.message), +ErrorCode::ConflictException => { +debug_assert_eq!(status, StatusCode::CONFLICT); +VssError::ConflictError(error_response.message) +}, +ErrorCode::AuthException => { +debug_assert_eq!(status, StatusCode::UNAUTHORIZED); +VssError::AuthError(error_response.message) +}, ErrorCode::InternalServerException => { +debug_assert_eq!(status, StatusCode::INTERNAL_SERVER_ERROR); VssError::InternalServerError(error_response.message) }, -_ => VssError::InternalError(format!( -"VSS responded with an unknown error code: {}, message: {}", -error_response.error_code, error_response.message -)), +ErrorCode::Unknown => { +VssError::InternalError(format!( +"VSS responded with an unknown error code: {}, message: {}", +error_response.error_code, error_response.message +)) +}, } } }There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you can you help me understand this part ?
We use a mock service explicitly to avoid having to run the full service in CI. If we now add consistency checks between the mock service and the actual implementation, we might as well just run the tests against the service. That said, it's maybe not the worst idea to begin with, as IMO the mocking always just has the potential to be wrong (as just proven), and given we're otherwise not super conservative about running stuff in CI, I'm not quite sure what we gain with the mocking approach exactly.
This is what I had in mind. But again maybe not worth asserting here...
Hmm, well, it's not only not worth asserting, it's also wrong as the bug would be on the service side. So, following Postel's law, we should be 'liberal in what we accept' and any debug_asserts would need to be added service-side to ensure correctness. However, the protocol seems underspecified there, as nowhere it's actually defined that you should also make use of HTTP status codes AFAIU.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok I think we can leave as is for now, will continue to consider replacing mock with the actual service now that a project like Fedimint wants to use the full VSS service in their CI
Based on #38.
We enable
reqwestclient-level timeouts:~~Additionally, we here rip out our ~broken retry logic and replace it by utilizing
reqwest's retry logic that shipped in the recent v0.12.23 release.~~