Change logging formatting to lazily evaluate (performance critical) #60
When comparing `presto-python-client` with `pyhive` for performance, I found this client to run extremely slowly for queries with large throughput. Running `line_profiler` on `fetchall()`, we see this:

Zooming into two lines:
We see that although our `LEVEL` is set to `INFO`, the formatting of these strings is done eagerly and the result is then discarded. The formatting alone takes 7 seconds and 6 seconds respectively out of 43 seconds. For context, PyHive takes roughly 28 seconds. The exact setup to reproduce this benchmark is redacted, but a comparable benchmark would be running `SELECT * FROM some_big_table` from a localhost coordinator. We would love to use this Presto client in our production workflow, but given this performance issue we have decided not to consider it until a fix has been applied.
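To illustrate the difference between eager and lazy log formatting, here is a minimal, self-contained sketch. It is not the client's actual code: the logger name, the `CountingPayload` class, and the message text are all hypothetical stand-ins. It only uses the standard library `logging` module and counts how often the payload is converted to a string while `DEBUG` is disabled.

```python
import logging

# Hypothetical logger for this sketch; level INFO means DEBUG records are dropped.
logger = logging.getLogger("lazy_logging_sketch")
logger.setLevel(logging.INFO)
logger.addHandler(logging.NullHandler())


class CountingPayload:
    """Hypothetical stand-in for a large response; counts string conversions."""

    def __init__(self):
        self.format_calls = 0

    def __str__(self):
        self.format_calls += 1
        return "x" * 1000


def log_eagerly(payload):
    # The message is built before logger.debug() runs, so the formatting
    # cost is paid even though the record is then discarded.
    logger.debug("HTTP response: {}".format(payload))


def log_lazily(payload):
    # The logger checks the level first and only interpolates the
    # arguments if the record is actually going to be emitted.
    logger.debug("HTTP response: %s", payload)


eager = CountingPayload()
log_eagerly(eager)

lazy = CountingPayload()
log_lazily(lazy)

print("eager conversions:", eager.format_calls)
print("lazy conversions:", lazy.format_calls)
```

Passing the arguments separately defers the string conversion to the logging module, which skips it entirely when the level is disabled. When computing the arguments themselves is expensive, an explicit `logger.isEnabledFor(logging.DEBUG)` guard around the call is another option.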