Reading data using the executeSelect API is slow #2764

@o-shevchenko

Description

We use the executeSelect API to run a SQL query and read the results from BigQuery. We expected good read speed based on this article.

Reading data using the executeSelect API is extremely slow.
Reading 100,000 rows takes 23,930 ms (about 24 seconds).
Profiling showed no obvious hotspot where most of the time is spent.
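
To put the measurement above in comparable units, here is a minimal sketch that converts a row count and elapsed time into rows per second and milliseconds per row. The helper names `rowsPerSecond` and `msPerRow` are hypothetical and only illustrate the arithmetic; they are not part of any BigQuery API.

```kotlin
/** Rows read per second, given a row count and elapsed wall-clock time in milliseconds. */
fun rowsPerSecond(rows: Long, elapsedMs: Long): Double {
    require(elapsedMs > 0) { "elapsedMs must be positive" }
    return rows * 1000.0 / elapsedMs
}

/** Average milliseconds spent per row. */
fun msPerRow(rows: Long, elapsedMs: Long): Double {
    require(rows > 0) { "rows must be positive" }
    return elapsedMs.toDouble() / rows
}

fun main() {
    val rows = 100_000L       // row count reported in this issue
    val elapsedMs = 23_930L   // elapsed time reported in this issue
    println("%.0f rows/s".format(rowsPerSecond(rows, elapsedMs))) // about 4179 rows/s
    println("%.4f ms/row".format(msPerRow(rows, elapsedMs)))      // about 0.2393 ms/row
}
```

So the reported run works out to roughly 4,200 rows per second, or about 0.24 ms per row.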

Are there any recent changes that might cause performance degradation for such an API?
Do you have a benchmark to understand what performance we should expect?
Thanks!

Environment details

  1. com.google.cloud:google-cloud-bigquery:2.43.3
  2. macOS Sonoma (Apple M1)
  3. Java version: 17

Code example

Mono.fromCallable { bigQueryOptionsBuilder.build().service }
    .flatMap { context ->
        val connectionSettings = ConnectionSettings.newBuilder()
            .setRequestTimeout(10L)
            .setUseReadAPI(true)
            .setMaxResults(1000)
            .setNumBufferedRows(1000)
            .setUseQueryCache(true)
            .build()
        val connection = context.createConnection(connectionSettings)
        val bqResult = connection.executeSelect(sql)
        val result = Flux.usingWhen(
            Mono.just(bqResult.resultSet),
            { resultSet -> resultSet.toFlux(bqResult.schema) },
            { _ -> Mono.fromRunnable<Unit> { connection.close() } }
        )
        Mono.just(Data(result, bqResult.schema.toSchema()))
    }

...

fun ResultSet.toFlux(schema: Schema): Flux<DataRecord> {
    return Flux.generate<DataRecord> { sink ->
        if (next()) {
            sink.next(toDataRecord(schema))
        } else {
            sink.complete()
        }
    }
}
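
One way to narrow down where the time goes is to drain the result set in a plain loop, without Reactor or the `toDataRecord` mapping, and compare that timing with the full pipeline. The sketch below assumes `bqResult.resultSet` is a `java.sql.ResultSet`, as the `next()` call in the snippet suggests. `DrainStats`, `drain`, and `fakeResultSet` are hypothetical names introduced here; `fakeResultSet` is only a stand-in so the example runs without a BigQuery connection.

```kotlin
import java.lang.reflect.Proxy
import java.sql.ResultSet

/** Row count and elapsed wall-clock time for one full pass over a result set. */
data class DrainStats(val rows: Long, val elapsedMs: Long)

/** Iterates the result set to the end, counting rows and timing only the next() calls. */
fun drain(resultSet: ResultSet): DrainStats {
    val start = System.nanoTime()
    var rows = 0L
    while (resultSet.next()) rows++
    return DrainStats(rows, (System.nanoTime() - start) / 1_000_000)
}

/** Stand-in ResultSet (assumption for this demo): next() returns true rowCount times, then false. */
fun fakeResultSet(rowCount: Int): ResultSet {
    var remaining = rowCount
    return Proxy.newProxyInstance(
        ResultSet::class.java.classLoader,
        arrayOf<Class<*>>(ResultSet::class.java)
    ) { _, method, _ ->
        // Only next() is stubbed; every other method returns null.
        if (method.name == "next") remaining-- > 0 else null
    } as ResultSet
}

fun main() {
    val stats = drain(fakeResultSet(100_000))
    println("read ${stats.rows} rows in ${stats.elapsedMs} ms")
}
```

In the code above this would be called as `drain(bqResult.resultSet)` in place of the `Flux.usingWhen` block. If the plain loop is also slow, the time is being spent in the client library's read path and not in the Flux wrapper.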

Labels

api: bigquerystorage (Issues related to the googleapis/java-bigquerystorage API.)
