
@o-shevchenko (Contributor) commented Nov 14, 2024

I'm trying to use the executeSelect API and am seeing extremely slow reads.
I tried to use ConnImplBenchmark to measure this, but the schema of the table it queries has changed, so the test no longer works.

Affected table: bigquery-public-data.new_york_taxi_trips.tlc_yellow_trips_2017

Summary of Changes
Added Fields: airport_fee, data_file_year, data_file_month.
Removed Fields: dropoff_longitude, dropoff_latitude, pickup_longitude, pickup_latitude.
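A benchmark that hard-codes column names breaks whenever a public table's schema drifts, as happened here. Below is a minimal sketch of a drift check. It assumes the benchmark's expected column list and the table's current column names are both available as plain strings. `SchemaDiff` is a hypothetical helper and is not part of java-bigquery:

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical helper (not part of java-bigquery): diffs the column names a
 * benchmark expects against the columns a table currently exposes.
 */
final class SchemaDiff {
  private SchemaDiff() {}

  /** Columns present in {@code current} but unknown to {@code expected}, in schema order. */
  static List<String> added(List<String> expected, List<String> current) {
    Set<String> result = new LinkedHashSet<>(current); // keeps schema order
    result.removeAll(expected);
    return new ArrayList<>(result);
  }

  /** Columns listed in {@code expected} that {@code current} no longer has. */
  static List<String> removed(List<String> expected, List<String> current) {
    Set<String> result = new LinkedHashSet<>(expected);
    result.removeAll(current);
    return new ArrayList<>(result);
  }
}
```

With the fields from the summary above, `added` would report airport_fee, data_file_year and data_file_month, and `removed` would report the four longitude/latitude columns. A benchmark setup step could then fail fast with that list instead of failing later inside the query.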

After fixing the test, I can confirm that the benchmark shows speeds similar to what we see in our own use cases: reading 100,000 rows takes ~15-20 seconds, which is extremely slow.

    Running
    ROW 100000 Time: 14978 ms
    ROW 200000 Time: 16409 ms
    ROW 300000 Time: 16966 ms
    ROW 400000 Time: 15963 ms
    ROW 500000 Time: 17480 ms
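The following small sketch restates the progress lines as throughput by parsing them and computing rows per second. It assumes each Time value covers only the most recent 100,000-row batch, because the thread doesn't say whether the benchmark prints per-batch or cumulative times. `BatchThroughput` is a hypothetical helper:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: converts "ROW <n> Time: <ms> ms" progress lines into rows per second. */
final class BatchThroughput {
  // Captures the millisecond value; the row counter itself is not needed.
  private static final Pattern ROW_LINE = Pattern.compile("ROW \\d+ Time: (\\d+) ms");

  private BatchThroughput() {}

  /**
   * Rows per second for each progress line, ASSUMING each Time value is the
   * duration of the last {@code batchSize} rows rather than a running total.
   */
  static List<Double> rowsPerSecond(String log, int batchSize) {
    List<Double> rates = new ArrayList<>();
    Matcher m = ROW_LINE.matcher(log);
    while (m.find()) {
      long millis = Long.parseLong(m.group(1));
      rates.add(batchSize * 1000.0 / millis);
    }
    return rates;
  }
}
```

Under that assumption, the five lines above work out to roughly 5,700 to 6,700 rows per second per batch.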

I'm not sure whether performance has regressed recently, because I can't find any published reference numbers. The benchmark in this post is hard to read: https://cloud.google.com/blog/topics/developers-practitioners/introducing-executeselect-client-library-method-and-how-use-it/
According to its chart, reading 1,000,000 rows should take ~1 s.

Here's what I get on my machine:

    Benchmark                                            (rowLimit)  Mode  Cnt       Score       Error  Units
    ConnImplBenchmark.iterateRecordsUsingReadAPI             500000  avgt    3   76549.893 ± 14496.839  ms/op
    ConnImplBenchmark.iterateRecordsUsingReadAPI            1000000  avgt    3  154957.127 ± 25916.110  ms/op
    ConnImplBenchmark.iterateRecordsWithBigQuery_Query       500000  avgt    3   82508.807 ± 17930.275  ms/op
    ConnImplBenchmark.iterateRecordsWithBigQuery_Query      1000000  avgt    3  165717.219 ± 86960.648  ms/op
    ConnImplBenchmark.iterateRecordsWithoutUsingReadAPI      500000  avgt    3   84504.175 ± 36823.590  ms/op
    ConnImplBenchmark.iterateRecordsWithoutUsingReadAPI     1000000  avgt    3  165142.367 ± 99899.991  ms/op
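The JMH scores are average milliseconds per operation. Assuming one operation iterates over `rowLimit` rows, converting the scores to rows per second makes the gap explicit when compared with the ~1 s per 1,000,000 rows read off the blog chart. The sketch below does that arithmetic. `JmhThroughput` is hypothetical, and the 1 s target is only an approximate reading of that chart:

```java
/**
 * Hypothetical helper: turns JMH average-time scores (ms/op) into rows per
 * second and compares them with a target duration.
 */
final class JmhThroughput {
  private JmhThroughput() {}

  /** Rows per second when one operation reads {@code rowLimit} rows in {@code msPerOp} ms. */
  static double rowsPerSecond(long rowLimit, double msPerOp) {
    return rowLimit * 1000.0 / msPerOp;
  }

  /** How many times longer {@code measuredMs} is than {@code targetMs}. */
  static double slowdown(double measuredMs, double targetMs) {
    return measuredMs / targetMs;
  }

  public static void main(String[] args) {
    // Scores for rowLimit = 1,000,000 copied from the JMH table.
    String[] names = {
      "iterateRecordsUsingReadAPI", "iterateRecordsWithBigQuery_Query", "iterateRecordsWithoutUsingReadAPI"
    };
    double[] scoresMs = {154957.127, 165717.219, 165142.367};
    double targetMs = 1000.0; // ~1 s per 1,000,000 rows, approximate reading of the blog chart
    for (int i = 0; i < names.length; i++) {
      System.out.printf("%-34s %8.0f rows/s  %5.0fx slower than target%n",
          names[i], rowsPerSecond(1_000_000, scoresMs[i]), slowdown(scoresMs[i], targetMs));
    }
  }
}
```

All six results come out at roughly 5,900 to 6,500 rows/s. At 1,000,000 rows, the three methods take about 155x to 166x longer than the ~1 s target.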

I've opened an issue: googleapis/java-bigquerystorage#2764

@o-shevchenko requested a review from a team as a code owner November 14, 2024 18:27
@product-auto-label bot added the "size: m" label Nov 14, 2024

google-cla bot commented Nov 14, 2024

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@product-auto-label bot added the "api: bigquery" label Nov 14, 2024
@o-shevchenko changed the title from "Fix ConnImplBenchmark test" to "fix(test): Update schema for broken ConnImplBenchmark test" Nov 15, 2024
@o-shevchenko (Contributor Author)

@alvarowolfx Could you please help with the review and performance evaluation?
Thanks!

@o-shevchenko (Contributor Author)

@alvarowolfx, did you have a chance to look into it?

@alvarowolfx

@PhongChuong, can you take a look at this one?

@PhongChuong (Contributor) left a comment

Thanks for the fix.
Let's discuss the slow read results further in #2764.

@PhongChuong (Contributor)

/gcbrun

@o-shevchenko (Contributor Author)

> Thanks for the fix. Let's discuss the slow read results further in #2764.

Thanks for the reply. You probably mean googleapis/java-bigquerystorage#2764.

@PhongChuong added the "kokoro:force-run" and "kokoro:run" labels Dec 3, 2024
@yoshi-kokoro removed the "kokoro:run" and "kokoro:force-run" labels Dec 3, 2024
@PhongChuong merged commit 8cf4387 into googleapis:main Dec 3, 2024
17 checks passed
@o-shevchenko deleted the benchmark branch December 10, 2024 13:34

Labels: "api: bigquery", "size: m"

4 participants