DEV Community

Cover image for Add a friendly HTTP polling to your API with Retry-After
David Bernard
David Bernard

Posted on • Originally published at github.com

Add a friendly HTTP polling to your API with Retry-After

Polling is a way to handle long, delayed, queued, asynchron work without blocking a TCP connection. To do (long) polling on http server we need at least 2 endpoints:

  • The endpoint to start the work (eg: POST /start_work or POST /workfor REST like api)
  • The endpoint to provide the result of the work when ready, or a "not ready yet" status to tell "retry later" (eg GET /work/{work_id})

sequence diagram based on body content

This approach implies a per endpoint logic :-( , handled by the Caller!

  • How to read work_id from the result of POST /start_work ? Is the status-code is 200 or 202 ?
  • How to convert work_id into url request for GET /work/{work_id} and handle the response (ready vs not ready) ?
  • What is the retry interval? Is it defined in the documentation, as an arbitrary value or as optional info in the response ?

Concept Evolution: use http redirect & retry-after

  • The server provides not the interval but an estimation for when to try next time via Retry-After http attribute (case insensitive).
  • The server provides the endpoint to get the result via the 303 SEE_OTHER status code and the Location http attribute (and indirectly triggers the retry).
  • The information are provided via http status code & attributes like handling of authentication, trace, circuit breaker, rate-limit...

So on caller side, the logic can be handled in a endpoint agnostic way (eg at the user-agent wrapper level), and reuse for every endpoint that use polling.

On server side, switching between polling and direct response is transparent for the caller.

sequence diagram with using status & header

✅ Pros

  • server can adjust Retry-After, with estimation based on current load, progress of the work,...
  • server can adjust the location of the response maybe to add complementary query parameters,...
  • the protocol becomes is agnostic of the endpoint (may could become a "standard")
  • the caller & user-agent are free to handle the polling as they want, it could like in the first example (with more information) or with a more complex way with queue intermediate state, via sidecar or proxy...
    • user-agent is free to follow redirect automatically or not, and to handle them as a blocking or non-blocking way
    • user-agent handle retry-after like retries on
    • rate-limit: 429 (Too Many Request) + Retry-After
    • downtime: 503 (Service Unavailable) + Retry-After
    • ...
    • the work_id & polling can be (nearly) hide to the Caller, it's like a regular POST request that return the response

❌ Cons

  • the Caller should handle response of GET /work/{work_id} as response of POST /start_work (both possible error,...)
  • often the implementation of user agent about following redirect, should be changed or handled by the wrapper
    • the user-agent should change the method from POST to GET on redirection (allowed for 301 (Move Permanently), 302 (Found), 303 (See Other)), this behavior can be coded at the user-agent wrapper level.
    • some user-agent don't handle Retry-After (remember http header are case insensitive)
    • Some user-agent have a maximum number of redirect (eg with curl Maximum (50) redirects followed)

References

Extracted from RFC 7231 - Hypertext Transfer Protocol (HTTP/1.1): Semantics and Content similar info available at M

  • 303 See Other

... This status code is applicable to any HTTP method. It is primarily used to allow the output of a POST action to redirect the user agent to a selected resource, since doing so provides the information corresponding to the POST response in a form that can be separately identified, bookmarked, and cached, independent of the original request. ...

  • Retry-After > ... When sent with any 3xx (Redirection) response, Retry-After indicates the minimum time that the user agent is asked to wait before issuing the redirected request. ...

Extracted from Retry-After - HTTP | MDN

The Retry-After response HTTP header indicates how long the user agent should wait before making a follow-up request. There are three main cases this header is used:

  • When sent with a 503 (Service Unavailable) response, this indicates how long the service is expected to be unavailable.
  • When sent with a 429 (Too Many Requests) response, this indicates how long to wait before making a new request.
  • When sent with a redirect response, such as 301 (Moved Permanently), this indicates the minimum time that the user agent is asked to wait before issuing the redirected request.

Implementations (aka PoC)

🚧
WARNING: the code is not optimal, and lot of improvements can be done. PR & feedbacks are welcomed (improvements, implementation for other clients,...)
🚧

A basic server

For the PoC, I created a basic http service in Rust. The code is available at sandbox_http/polling/server-axum at development · davidB/sandbox_http.

async fn start_work(Extension(works): Extension<WorkDb>) -> impl IntoResponse { let mut rng: StdRng = SeedableRng::from_entropy(); let work_id = Uuid::new_v4(); let duration = Duration::from_secs(rng.gen_range(1..=20)); let end_at = Instant::now() + duration; let get_url = format!("/work/{}", work_id); let next_try = duration.as_secs() / 2; let mut works = works.lock().expect("acquire works lock to start_work"); works.insert( work_id, Work { work_id, end_at, duration, nb_get_call: 0, }, ); ( StatusCode::SEE_OTHER, [ (http::header::LOCATION, get_url), (http::header::RETRY_AFTER, format!("{}", next_try)), ], ) } async fn work(Path(work_id): Path<Uuid>, Extension(works): Extension<WorkDb>) -> impl IntoResponse { let mut works = works.lock().expect("acquire works lock to get_work"); tracing::info!(?work_id, "request work result"); match works.get_mut(&work_id) { None => (StatusCode::NOT_FOUND).into_response(), Some(work) => { if work.end_at > Instant::now() { work.nb_get_call += 1; let get_url = format!("/work/{}", work.work_id); let next_try = 1; ( StatusCode::SEE_OTHER, [ (http::header::LOCATION, get_url), (http::header::RETRY_AFTER, format!("{}", next_try)), ], ) .into_response() } else { (StatusCode::OK, Json(work.clone())).into_response() } } } } 
Enter fullscreen mode Exit fullscreen mode
  • Do not forgot to protect the endpoint (like others) with a kind of rate-limit

Caller with curl

curl -v --location "http://localhost:8080/start_work" -d "" 
Enter fullscreen mode Exit fullscreen mode
  • Do not use -X POST but -d "" else redirection 303 does not switch from POST to GET.
  • Failed because curl doesn't support Retry-After when follow redirection (see date in the sample below).
  • Loop stops due to "maximum redirect", without this security overload on client and server.
* Trying 127.0.0.1:8080... * Connected to localhost (127.0.0.1) port 8080 (#0) > POST /start_work HTTP/1.1 > Host: localhost:8080 > User-Agent: curl/7.82.0 > Accept: */* > Content-Length: 0 > Content-Type: application/x-www-form-urlencoded > * Mark bundle as not supporting multiuse < HTTP/1.1 303 See Other < location: /work/20913b17-1df3-40ed-b26a-df50414ecc1c < retry-after: 8 < access-control-allow-origin: * < vary: origin < vary: access-control-request-method < vary: access-control-request-headers < content-length: 0 < date: Sun, 15 May 2022 13:07:56 GMT < * Connection #0 to host localhost left intact * Issue another request to this URL: 'http://localhost:8080/work/20913b17-1df3-40ed-b26a-df50414ecc1c' * Switch to GET * Found bundle for host localhost: 0x5586add25af0 [serially] * Can not multiplex, even if we wanted to! * Re-using existing connection! (#0) with host localhost * Connected to localhost (127.0.0.1) port 8080 (#0) > GET /work/20913b17-1df3-40ed-b26a-df50414ecc1c HTTP/1.1 > Host: localhost:8080 > User-Agent: curl/7.82.0 > Accept: */* > * Mark bundle as not supporting multiuse < HTTP/1.1 303 See Other < location: /work/20913b17-1df3-40ed-b26a-df50414ecc1c < retry-after: 1 < access-control-allow-origin: * < vary: origin < vary: access-control-request-method < vary: access-control-request-headers < content-length: 0 < date: Sun, 15 May 2022 13:07:56 GMT < ... * Connection #0 to host localhost left intact * Issue another request to this URL: 'http://localhost:8080/work/20913b17-1df3-40ed-b26a-df50414ecc1c' * Found bundle for host localhost: 0x5586add25af0 [serially] * Can not multiplex, even if we wanted to! * Re-using existing connection! (#0) with host localhost * Connected to localhost (127.0.0.1) port 8080 (#0) > GET /work/20913b17-1df3-40ed-b26a-df50414ecc1c HTTP/1.1 > Host: localhost:8080 > User-Agent: curl/7.82.0 > Accept: */* > * Mark bundle as not supporting multiuse < HTTP/1.1 303 See Other < location: /work/20913b17-1df3-40ed-b26a-df50414ecc1c < retry-after: 1 < access-control-allow-origin: * < vary: origin < vary: access-control-request-method < vary: access-control-request-headers < content-length: 0 < date: Sun, 15 May 2022 13:07:56 GMT < * Connection #0 to host localhost left intact * Maximum (50) redirects followed curl: (47) Maximum (50) redirects followed 
Enter fullscreen mode Exit fullscreen mode

Caller with your favorite programming language

  • There is lot chance that the Retry-After is not well supported by your http client / user-agent. So
    • Test it, or notify your API consumers & customers to test it
    • Open a ticket/issue to the project to request support or make a PR
    • provide a workaround solution (until official support):
    • Disable the default follow redirection,
    • Implement follow redirect with support of Retry-After into your wrapper
  • The way to handle delay can be shared with retry for "rate-limit", "downtime", "circuit-breaker"

As demonstration purpose I make a sample with reqwest one of the most used http client in Rust.
You can look at it at sandbox_http/polling_with_reqwest.rs at development · davidB/sandbox_http

The output of the test:

running 1 test 110ns : check info, then continue, retry or follow 4.002665744s : check info, then continue, retry or follow 5.004217149s : check info, then continue, retry or follow 6.007647326s : check info, then continue, retry or follow 7.010080187s : check info, then continue, retry or follow 8.012471894s : check info, then continue, retry or follow [tests/polling_with_reqwest.rs:152] &body = WorkOutput { nb_get_call: 4, duration: 8s, } test polling_with_reqwest ... ok test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 8.03s 
Enter fullscreen mode Exit fullscreen mode

🎉 Success:

  • The first retry is after 4s because we defined on server as half of the duration of the work,
  • Following call as a duration around 1s
  • The http client doesn't included endpoint dedicated rules (no parse of the body, no build of url,...)
  • Supporting an other endpoint with polling doesn't require additional code
  • Bonus: support of downtime, cirsuit-break & rate-limit returning Retry-After

Top comments (0)