tswast
Contributor

@tswast tswast commented Feb 4, 2019

How to get a pandas DataFrame, fast!

The first two examples use the existing BigQuery client. These examples
create a thread pool and read in parallel. The final example shows using
just the new BigQuery Storage client, but only shows how to read with a
single thread.
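
The "create a thread pool and read in parallel" pattern described above can be sketched roughly as follows. This is only an illustration of the pattern, not the sample added in this PR: `read_stream` and `read_all_streams` are hypothetical stand-ins, and the fake rows replace real BigQuery Storage API reads so the sketch runs without credentials.

```python
from concurrent.futures import ThreadPoolExecutor


def read_stream(stream_id):
    """Hypothetical stand-in for reading all rows from one stream.

    A real reader would call the BigQuery Storage API; here each "stream"
    returns three fake rows so the sketch needs no credentials.
    """
    return [{"stream": stream_id, "row": i} for i in range(3)]


def read_all_streams(stream_ids, max_workers=4):
    """Read every stream in a thread pool and merge rows in stream order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so chunks line up with stream_ids.
        chunks = list(pool.map(read_stream, stream_ids))
    # Flatten the per-stream chunks into one list of rows.
    return [row for chunk in chunks for row in chunk]


rows = read_all_streams(["stream-0", "stream-1", "stream-2"])
print(len(rows))  # prints 9
```

With pandas installed, each chunk could instead be turned into a DataFrame and the pieces combined with `pandas.concat`.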

@tswast tswast added the do not merge Indicates a pull request not ready for merge, due to either quality or timing. label Feb 4, 2019
@googlebot googlebot added the cla: yes This human has signed the Contributor License Agreement. label Feb 4, 2019
@tswast tswast removed the do not merge Indicates a pull request not ready for merge, due to either quality or timing. label Feb 5, 2019
@tswast tswast force-pushed the tswast-bqstorage-pandas branch from 0eaca4e to a3a48c0 Compare February 5, 2019 00:37
@tswast tswast requested review from alixhami and shollyman February 5, 2019 00:37
@tswast tswast added the bigquery label Feb 5, 2019
Contributor

@shollyman shollyman left a comment

Looks good, modulo the open question on small results

# [START bigquerystorage_pandas_read_query_results]
import uuid

# Due to a known issue in the BigQuery Storage API (TODO: link to
Contributor

Should we consider simply running a large query that emits enough data to avoid inline results? Pros: it better demonstrates the performance of the new API and saves us from having to revisit the sample. Cons: test time overhead, and potential pitfalls for people kicking the tires with small results. Part of this depends on how the team will maintain their known issues (KI) list.

Contributor Author

> Cons: test time overhead.

Long test time is my main reason for avoiding queries that return big results. I guess it's not so bad since this repo can test the different directories independently.

> Cons: potential pitfalls for people kicking tires with small results.

The current failure case is rather bad: it returns a successful response, but gives you an empty result set. That's a pretty big pitfall, because catching it requires noticing that you didn't get the data you thought you were going to get.
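
One way to catch this kind of silent failure is to compare the number of rows actually read against a count obtained independently. The sketch below is a hypothetical guard, not part of this PR: `RowCountMismatchError` and `check_row_count` are made-up names, and `expected_rows` is assumed to come from some separate source, such as a row count reported for the query's result table.

```python
class RowCountMismatchError(RuntimeError):
    """Raised when a read succeeds but returns an unexpected number of rows."""


def check_row_count(rows, expected_rows):
    """Return rows unchanged, or raise if the read silently lost data.

    expected_rows is an assumption: it must be obtained independently of
    the read itself, otherwise an empty response cannot be detected.
    """
    if len(rows) != expected_rows:
        raise RowCountMismatchError(
            f"expected {expected_rows} rows, got {len(rows)}"
        )
    return rows


# A "successful" read that came back empty is turned into a loud error.
try:
    check_row_count([], expected_rows=5)
except RowCountMismatchError as exc:
    print(exc)
```

Raising instead of returning an empty result means the caller does not have to notice the missing data by eye.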

Contributor

Works for me. Let's confirm how the KIs will be maintained so you can link them.

tswast added 10 commits February 7, 2019 10:46
How to get a pandas DataFrame, fast! The first two examples use the existing BigQuery client. These examples create a thread pool and read in parallel. The final example shows using just the new BigQuery Storage client, but only shows how to read with a single thread.
* Move imports inside region tags.
* Adjust query indentation to match region tags.
Move duplicate imports out of region tags. Add region tag for the whole sample.
to just above the sample where it is used. This makes the complete source code for the sample make more sense (bigquerystorage_pandas_tutorial_all)
@tswast tswast force-pushed the tswast-bqstorage-pandas branch from f064277 to 925fe3b Compare February 7, 2019 18:47
@tswast tswast merged commit e9bc7de into master Feb 7, 2019
@tswast tswast deleted the tswast-bqstorage-pandas branch February 7, 2019 18:53
plamut pushed a commit to plamut/python-bigquery-storage that referenced this pull request Sep 2, 2020
…ogleCloudPlatform/python-docs-samples#1994) * BigQuery Storage API sample for reading pandas dataframe
plamut pushed a commit to googleapis/python-bigquery-storage that referenced this pull request Sep 10, 2020
…ogleCloudPlatform/python-docs-samples#1994) * BigQuery Storage API sample for reading pandas dataframe
Linchin pushed a commit that referenced this pull request Aug 18, 2025
) * BigQuery Storage API sample for reading pandas dataframe
parthea pushed a commit to googleapis/google-cloud-python that referenced this pull request Aug 21, 2025
…ogleCloudPlatform/python-docs-samples#1994) * BigQuery Storage API sample for reading pandas dataframe
parthea pushed a commit to googleapis/google-cloud-python that referenced this pull request Sep 16, 2025
…ogleCloudPlatform/python-docs-samples#1994) * BigQuery Storage API sample for reading pandas dataframe