Conversation

@taer
Contributor

@taer taer commented Aug 12, 2021

I was doing a bit of performance testing of a streaming flow interface, and found that request.collect was taking quite some time from the initial invocation of the method, up to 15ms at times. The code looks something like this:

```kotlin
fun exampleStream(requests: Flow<Int>): Flow<Int> {
    val start = System.nanoTime()
    return flow {
        requests.collect { value ->
            val collected = System.nanoTime()
            val thisTime = collected - start
        }
    }
}
```

The p99 of thisTime is what we're seeing vary quite dramatically. I looked at the differences between the flow version and observer version, and found the observer one made a call.request(1) in the startCall part, while the flow version only made that initial request after the call to collect.

I made a few attempts at perf testing this locally, and the numbers seem about the same from the client side. In a prod environment, thisTime is slightly better.

@linux-foundation-easycla

linux-foundation-easycla bot commented Aug 12, 2021

CLA Signed

The committers are authorized under a signed CLA.

"requests flow can only be collected once"
}

call.request(1)
Contributor Author


This is inside the flow

```kotlin
    mutex.withLock { call.close(closeStatus, trailers) }
}

call.request(1)
```
Contributor Author


Moved outside the flow. Made it a bit more eager.
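For readers skimming the diff, here is a rough sketch of the shape of the change; the surrounding structure is illustrative rather than the exact grpc-kotlin internals, though ServerCall.request is the real grpc-java API:

```kotlin
import io.grpc.ServerCall
import kotlinx.coroutines.flow.Flow
import kotlinx.coroutines.flow.flow

// Before (illustrative): the first inbound message is requested lazily,
// only once the handler actually collects the requests flow.
fun <ReqT> lazyRequests(call: ServerCall<ReqT, *>): Flow<ReqT> = flow {
    call.request(1) // runs on first collect
    // ...emit inbound messages as they arrive (elided)...
}

// After (illustrative): the first message is requested eagerly during call
// setup, before the handler method runs, moving the wait into the gRPC layer.
fun <ReqT> eagerRequests(call: ServerCall<ReqT, *>): Flow<ReqT> {
    call.request(1) // runs immediately
    return flow {
        // ...emit inbound messages as they arrive (elided)...
    }
}
```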

@taer taer marked this pull request as ready for review August 12, 2021 19:24
@taer taer marked this pull request as draft August 16, 2021 20:31
@taer taer marked this pull request as ready for review September 28, 2022 02:51
@taer
Contributor Author

taer commented Sep 28, 2022

This pairs with my new PR #360

I can't think of any server that would want to not collect the requests from the stream.

I'm not certain this will add any speed, but it moves the timing into the gRPC layer rather than the flow.

@jamesward jamesward requested a review from lowasser September 28, 2022 15:28
@taer
Contributor Author

taer commented Mar 10, 2023

@lowasser Just checking in to see if this project is still alive or if you have any questions/concerns on this?

@bubenheimer

I'm not @lowasser, but I have the impression that this PR would take away the application's ability to control the timing of the initial request, i.e. it takes initial flow control away from the application. That seems like a bad idea in general.

For example, a server under load might incur additional buffering cost for each connecting client, without the ability to refuse the call straight away.
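To make the capability at stake concrete, here is a minimal sketch under the current lazy behavior; waitForCapacity is a hypothetical admission gate, not a real API:

```kotlin
import kotlinx.coroutines.flow.Flow
import kotlinx.coroutines.flow.flow

// Hypothetical admission gate -- e.g. acquiring a semaphore permit.
suspend fun waitForCapacity() {}

fun handle(requests: Flow<Int>): Flow<Int> = flow {
    // With the current lazy behavior, no inbound message is requested until
    // this collect runs, so a loaded server can defer the initial request.
    waitForCapacity()
    requests.collect { value -> emit(value) }
}
```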

@taer
Contributor Author

taer commented Mar 13, 2023

That is something a server can do with the current implementation, but only for the streaming request style; the unary versions already behave the way this PR proposes.

Also, for that use case of rejecting under load, an interceptor would be a much more likely path.
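For reference, a minimal sketch of that interceptor path, using the standard grpc-java ServerInterceptor API; the overloaded() check is a hypothetical stand-in for real admission logic:

```kotlin
import io.grpc.Metadata
import io.grpc.ServerCall
import io.grpc.ServerCallHandler
import io.grpc.ServerInterceptor
import io.grpc.Status

// Hypothetical load check -- replace with real admission logic.
fun overloaded(): Boolean = false

class LoadSheddingInterceptor : ServerInterceptor {
    override fun <ReqT, RespT> interceptCall(
        call: ServerCall<ReqT, RespT>,
        headers: Metadata,
        next: ServerCallHandler<ReqT, RespT>
    ): ServerCall.Listener<ReqT> {
        if (overloaded()) {
            // Refuse the call before any message is requested or buffered.
            call.close(Status.RESOURCE_EXHAUSTED.withDescription("server overloaded"), Metadata())
            return object : ServerCall.Listener<ReqT>() {}
        }
        return next.startCall(call, headers)
    }
}
```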

From my testing, this really didn't make much of a difference, to be honest. I still saw quite a large gap between the RPC method being invoked and the first element appearing from the collect:

```kotlin
suspend fun rpcMethod(requests: Flow<Request>) {
    val preRequestCollect = System.nanoTime()
    requests.collectIndexed { index, value ->
        val postRequestCollect = System.nanoTime()
        // ...
    }
}
```

The p99 of the measured gap under high load is about 5-6ms. That is what I was attempting to mitigate. The unary versions of the RPC all looked about 5-6ms better because the library collected the request for us and "hid" that time.

I might make one more pass at testing to see if there is anything noticeable. It's largely just moving chairs around to give us more visibility, since we use said interceptor to manage load.

@bubenheimer

bubenheimer commented Mar 13, 2023

@taer With your change, I suspect that even an interceptor will not have the chance to reject the call before the server requests the first message.

I would not want to trade flow control capabilities in my streaming calls for a minimal improvement to call startup latency. That is why I am chiming in here.

I generally expect some call startup latency, and use streaming calls rather than unary to minimize overhead.

@taer
Contributor Author

taer commented Mar 14, 2023

The interceptor code runs well before any of the code in question. The interceptors get a chance to kill the request well before anything we're talking about here has a chance to matter.

All of the interceptors have run before this PR has any effect.

And again, the point is moot, since this is about accounting: the gRPC libraries call the method, and the question is whether the first request is available at that point. In the unary scenario, the library ate the "delay", and the RPC method has the request instantly. In the streaming case, the RPC method is called before the data is available and is responsible for gathering the request.

So if you wanted to kill the request via an interceptor, there is absolutely no delta here. None of this code gets called if an interceptor kills the request. This just makes the unary and streaming paths semi-equivalent.

An opposite approach would be to give the unary methods a Deferred instead of the request, which would also be a valid alternative, but super breaking.

In essence, the problem is that the unary requests spend "the time" in the library, giving a false sense of speed to the unary RPC. The streaming versions can't do this. So if you're measuring your RPC time from start to end, a streaming call of one message will ALWAYS be slower than the equivalent unary version.
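A minimal sketch of that accounting difference; the method shapes mirror grpc-kotlin's generated stubs, with hypothetical Request/Response types:

```kotlin
import kotlinx.coroutines.flow.Flow
import kotlinx.coroutines.flow.flow

class Request
class Response

// Unary: the library collects the request before invoking the method, so the
// wait for the first message is invisible to timing done inside the handler.
suspend fun unaryCall(request: Request): Response = Response()

// Streaming: the method is invoked first; the wait for the first message
// shows up inside the handler, between invocation and the first collect.
fun streamingCall(requests: Flow<Request>): Flow<Response> = flow {
    requests.collect { emit(Response()) }
}
```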

@taer
Contributor Author

taer commented Apr 30, 2024

Closing so this stops showing on my list.

@taer taer closed this Apr 30, 2024
@taer taer deleted the moveRequest branch April 30, 2024 13:51