-
Notifications
You must be signed in to change notification settings - Fork 81
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
How can I use bigquery storage to get the result of a query #802
Comments
Could you provide me with the specific endpoints you are asking about? |
This example shows how to retrieve a table, I'd like to retrieve the result of a query in a similar way. Is this possible ? |
Sure -- you can look at the Apache Beam BigQuery query source as one example. This won't necessarily be faster than the default method of fetching query results, however, particularly for small result sets. What is your use case? |
I want to do a query and download the result into Java memory as fast as possible.
Are you talking about the BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();
TableResult result = bigquery.query(queryConfig);
for (FieldValueList row : result.iterateAll()) {
// do stuff
} It takes around 3 minutes to download 10M rows Instead I have tried to do a query with the bigquery API then download the temporary result table with the bigquerystorage API, but I'm not sure if it is the good way to do it: Job queryJob = bigquery.create(JobInfo.newBuilder(queryConfig).build());
queryJob = queryJob.waitFor();
TableId temporaryTable = ((QueryJobConfiguration) queryJob.getConfiguration()).getDestinationTable();
// Use bigquerystorage to download the temporary table to Arrow It takes around 10 seconds to download the same 10M rows.
I am not familiar with Beam, I'll have a look at it :) |
What version of |
I am using <dependencyManagement>
<dependencies>
<dependency>
<groupId>com.google.cloud</groupId>
<artifactId>libraries-bom</artifactId>
<version>16.2.1</version>
<type>pom</type>
<scope>import</scope>
</dependency>
</dependencies>
</dependencyManagement> which gives me google-cloud-bigquery 1.126.1
I mean the time it takes to retrieve all the data into my local JVM This code takes several minutes to iterate over 10M rows: TableResult result = bigquery.query(queryConfig);
long count = 0;
for (FieldValueList row : result.iterateAll()) {
for (FieldValue val : row) {
count++;
}
} |
How wide is your row? How much data roughly is in your 10M+ row? I want to compare this to our benchmark to see if it's normal. Also, can you share a code snippet of how you are doing the same operation using the bigquerystorage API which is taking 10s instead of 3min? Thanks! |
I have 10M rows with 3 columns (integer, string, float) and as CSV it represents ~200MB. Here is my benchmark code for bigquerystorage API (with 1 thread to keep it simple): Example.javaimport com.google.api.gax.rpc.ServerStream;
import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.Job;
import com.google.cloud.bigquery.JobInfo;
import com.google.cloud.bigquery.QueryJobConfiguration;
import com.google.cloud.bigquery.TableId;
import com.google.cloud.bigquery.storage.v1.BigQueryReadClient;
import com.google.cloud.bigquery.storage.v1.CreateReadSessionRequest;
import com.google.cloud.bigquery.storage.v1.DataFormat;
import com.google.cloud.bigquery.storage.v1.ReadRowsRequest;
import com.google.cloud.bigquery.storage.v1.ReadRowsResponse;
import com.google.cloud.bigquery.storage.v1.ReadSession;
import com.google.cloud.bigquery.storage.v1.ReadSession.TableReadOptions;
import com.google.common.base.Preconditions;
import io.atoti.Config.MyTable;
import java.io.IOException;
public class Example {
public static final BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();
public static void main(String[] args) throws InterruptedException, IOException {
final String query = "SELECT * FROM table"; // whatever query with big result
long start = System.currentTimeMillis();
TableId temporaryResultTable = getTemporaryTable(query);
long queryDone = System.currentTimeMillis();
downloadTable(temporaryResultTable);
long end = System.currentTimeMillis();
System.out.println("query time:\t" + (queryDone-start) +"ms");
System.out.println("download time:\t" + (end-queryDone) +"ms");
System.out.println("total time:\t" + (end-start) +"ms");
}
public static TableId getTemporaryTable(String query) throws InterruptedException{
QueryJobConfiguration queryConfig = QueryJobConfiguration.newBuilder(query).setUseLegacySql(false).build();
Job queryJob = bigquery.create(JobInfo.newBuilder(queryConfig).build());
// Wait for the query to complete.
queryJob = queryJob.waitFor();
return ((QueryJobConfiguration) queryJob.getConfiguration()).getDestinationTable();
}
public static void downloadTable(TableId table) throws IOException {
String parent = String.format("projects/%s", table.getProject());
String srcTable =
String.format(
"projects/%s/datasets/%s/tables/%s",
table.getProject(), table.getDataset(), table.getTable());
TableReadOptions options = TableReadOptions.newBuilder().build();
ReadSession.Builder sessionBuilder =
ReadSession.newBuilder()
.setTable(srcTable)
.setDataFormat(DataFormat.ARROW)
.setReadOptions(options);
CreateReadSessionRequest.Builder builder =
CreateReadSessionRequest.newBuilder()
.setParent(parent)
.setReadSession(sessionBuilder)
.setMaxStreamCount(1);
try (BigQueryReadClient client = BigQueryReadClient.create()) {
ReadSession session = client.createReadSession(builder.build());
String streamName = session.getStreams(0).getName();
try (SimpleArrowRowReader reader = new SimpleArrowRowReader(session.getArrowSchema())) {
ReadRowsRequest readRowsRequest =
ReadRowsRequest.newBuilder().setReadStream(streamName).build();
ServerStream<ReadRowsResponse> stream = client.readRowsCallable().call(readRowsRequest);
for (ReadRowsResponse response : stream) {
Preconditions.checkState(response.hasArrowRecordBatch());
reader.processRows(response.getArrowRecordBatch());
}
}
}
}
} SimpleArrowRowReader.javaimport com.google.cloud.bigquery.storage.v1.ArrowRecordBatch;
import com.google.cloud.bigquery.storage.v1.ArrowSchema;
import com.google.common.base.Preconditions;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.FieldVector;
import org.apache.arrow.vector.VectorLoader;
import org.apache.arrow.vector.VectorSchemaRoot;
import org.apache.arrow.vector.ipc.ReadChannel;
import org.apache.arrow.vector.ipc.message.MessageSerializer;
import org.apache.arrow.vector.types.pojo.Field;
import org.apache.arrow.vector.types.pojo.Schema;
import org.apache.arrow.vector.util.ByteArrayReadableSeekableByteChannel;
public class SimpleArrowRowReader implements AutoCloseable {
BufferAllocator allocator = new RootAllocator(Long.MAX_VALUE);
private final VectorSchemaRoot root;
private final VectorLoader loader;
private long count = 0;
public SimpleArrowRowReader(ArrowSchema arrowSchema) throws IOException {
Schema schema =
MessageSerializer.deserializeSchema(
new ReadChannel(
new ByteArrayReadableSeekableByteChannel(
arrowSchema.getSerializedSchema().toByteArray())));
Preconditions.checkNotNull(schema);
List<FieldVector> vectors = new ArrayList<>();
for (Field field : schema.getFields()) {
vectors.add(field.createVector(allocator));
}
root = new VectorSchemaRoot(vectors);
loader = new VectorLoader(root);
}
public void processRows(ArrowRecordBatch batch) throws IOException {
org.apache.arrow.vector.ipc.message.ArrowRecordBatch deserializedBatch =
MessageSerializer.deserializeRecordBatch(
new ReadChannel(
new ByteArrayReadableSeekableByteChannel(
batch.getSerializedRecordBatch().toByteArray())),
allocator);
loader.load(deserializedBatch);
deserializedBatch.close();
long batchSize = root.getRowCount();
// System.out.println("\tBatch size: "+ count);
count += batchSize;
root.clear();
}
@Override
public void close() {
System.out.println("Stream size: " + count);
root.close();
allocator.close();
}
} which outputs Stream size: 10000000 query time: 928ms download time: 8523ms total time: 9451ms I wonder if that is a viable solution because:
But I don't see any solution more "official" for doing that with the same speed |
This seems reasonable. We potentially will be doing a BigQuery client re-write and this will be taken into consideration when we do that. Thank you! |
…cies to v2.7.0 (googleapis#802) [![WhiteSource Renovate](https://app.renovatebot.com/images/banner.svg)](https://renovatebot.com) This PR contains the following updates: | Package | Change | Age | Adoption | Passing | Confidence | |---|---|---|---|---|---| | [com.google.cloud:google-cloud-shared-dependencies](https://togithub.com/googleapis/java-shared-dependencies) | `2.6.0` -> `2.7.0` | [![age](https://badges.renovateapi.com/packages/maven/com.google.cloud:google-cloud-shared-dependencies/2.7.0/age-slim)](https://docs.renovatebot.com/merge-confidence/) | [![adoption](https://badges.renovateapi.com/packages/maven/com.google.cloud:google-cloud-shared-dependencies/2.7.0/adoption-slim)](https://docs.renovatebot.com/merge-confidence/) | [![passing](https://badges.renovateapi.com/packages/maven/com.google.cloud:google-cloud-shared-dependencies/2.7.0/compatibility-slim/2.6.0)](https://docs.renovatebot.com/merge-confidence/) | [![confidence](https://badges.renovateapi.com/packages/maven/com.google.cloud:google-cloud-shared-dependencies/2.7.0/confidence-slim/2.6.0)](https://docs.renovatebot.com/merge-confidence/) | --- ### Release Notes <details> <summary>googleapis/java-shared-dependencies</summary> ### [`v2.7.0`](https://togithub.com/googleapis/java-shared-dependencies/blob/HEAD/CHANGELOG.md#​270-httpsgithubcomgoogleapisjava-shared-dependenciescomparev260v270-2022-01-28) [Compare Source](https://togithub.com/googleapis/java-shared-dependencies/compare/v2.6.0...v2.7.0) ##### Dependencies - update dependency com.fasterxml.jackson:jackson-bom to v2.13.1 ([#​555](https://togithub.com/googleapis/java-shared-dependencies/issues/555)) ([67b5663](https://togithub.com/googleapis/java-shared-dependencies/commit/67b56635509215fd132b2ffbcaed995a6de93879)) - update dependency com.google.api-client:google-api-client-bom to v1.33.1 ([#​580](https://togithub.com/googleapis/java-shared-dependencies/issues/580)) ([895dba2](https://togithub.com/googleapis/java-shared-dependencies/commit/895dba2afef6c295b0e21a4ccd27feee1b6af8e2)) - update dependency com.google.api:api-common to v2.1.3 ([#​587](https://togithub.com/googleapis/java-shared-dependencies/issues/587)) ([5dbbce8](https://togithub.com/googleapis/java-shared-dependencies/commit/5dbbce80baa045818757912e6db4f222c12b75f6)) - update dependency com.google.auth:google-auth-library-bom to v1.4.0 ([#​578](https://togithub.com/googleapis/java-shared-dependencies/issues/578)) ([c274c62](https://togithub.com/googleapis/java-shared-dependencies/commit/c274c62905192bcc48c82104bf50cbeaa1894c2b)) - update dependency com.google.errorprone:error_prone_annotations to v2.11.0 ([#​582](https://togithub.com/googleapis/java-shared-dependencies/issues/582)) ([314da38](https://togithub.com/googleapis/java-shared-dependencies/commit/314da38c9f12cf425b5d1be8297e9f2a78d13948)) - update dependency com.google.http-client:google-http-client-bom to v1.41.2 ([#​579](https://togithub.com/googleapis/java-shared-dependencies/issues/579)) ([c8df06d](https://togithub.com/googleapis/java-shared-dependencies/commit/c8df06ddb3f4a975e928bbdd9dfae293e81efa39)) - update dependency com.google.oauth-client:google-oauth-client-bom to v1.33.0 ([#​576](https://togithub.com/googleapis/java-shared-dependencies/issues/576)) ([3315988](https://togithub.com/googleapis/java-shared-dependencies/commit/3315988059ca0223f078b0f760960674bfa8c04a)) - update dependency com.google.protobuf:protobuf-bom to v3.19.3 ([#​571](https://togithub.com/googleapis/java-shared-dependencies/issues/571)) ([c1931e1](https://togithub.com/googleapis/java-shared-dependencies/commit/c1931e1c4611ca3d1e2afac1c9e983cf1ec4a7de)) - update dependency io.grpc:grpc-bom to v1.44.0 ([#​586](https://togithub.com/googleapis/java-shared-dependencies/issues/586)) ([52c47b7](https://togithub.com/googleapis/java-shared-dependencies/commit/52c47b7425af438ba1a82dac87f9002353a6a87a)) - update dependency org.checkerframework:checker-qual to v3.21.1 ([#​554](https://togithub.com/googleapis/java-shared-dependencies/issues/554)) ([81bbb91](https://togithub.com/googleapis/java-shared-dependencies/commit/81bbb91774ab67161371bfd0dfd143de89dee4c9)) - update gax.version to v2.11.0 ([#​581](https://togithub.com/googleapis/java-shared-dependencies/issues/581)) ([1899612](https://togithub.com/googleapis/java-shared-dependencies/commit/189961226f84cdd4e283780d2c7d4a4dffc3e00c)) - update gax.version to v2.9.0 ([#​573](https://togithub.com/googleapis/java-shared-dependencies/issues/573)) ([ee5e38e](https://togithub.com/googleapis/java-shared-dependencies/commit/ee5e38efea574eeb2b5803192da98ba1ce1c8e6e)) - update google.common-protos.version to v2.7.2 ([#​588](https://togithub.com/googleapis/java-shared-dependencies/issues/588)) ([caac0e6](https://togithub.com/googleapis/java-shared-dependencies/commit/caac0e68e336b801becb6322b5b81fce1824ad8a)) - update google.core.version to v2.4.0 ([#​589](https://togithub.com/googleapis/java-shared-dependencies/issues/589)) ([6a292e3](https://togithub.com/googleapis/java-shared-dependencies/commit/6a292e3530a06d6b228f3154ed4e979ef68d7aeb)) - update iam.version to v1.2.1 ([#​569](https://togithub.com/googleapis/java-shared-dependencies/issues/569)) ([62dd270](https://togithub.com/googleapis/java-shared-dependencies/commit/62dd2705b257c4bb539bfe619234f38d0e13b138)) - update opencensus.version to v0.30.0 ([#​552](https://togithub.com/googleapis/java-shared-dependencies/issues/552)) ([ebae62e](https://togithub.com/googleapis/java-shared-dependencies/commit/ebae62eca75234db4fd571f42b8ed0f6ce1bd1f7)) - update opencensus.version to v0.31.0 ([#​585](https://togithub.com/googleapis/java-shared-dependencies/issues/585)) ([021b066](https://togithub.com/googleapis/java-shared-dependencies/commit/021b066a833af1629c89fcca593d16b9786ee0bf)) </details> --- ### Configuration 📅 **Schedule**: At any time (no schedule defined). 🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied. ♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox. 🔕 **Ignore**: Close this PR and you won't be reminded about this update again. --- - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, click this checkbox. --- This PR has been generated by [WhiteSource Renovate](https://renovate.whitesourcesoftware.com). View repository job log [here](https://app.renovatebot.com/dashboard#github/googleapis/java-storage-nio).
🤖 I have created a release *beep* *boop* --- ### [0.123.19](googleapis/java-storage-nio@v0.123.18...v0.123.19) (2022-02-03) ### Dependencies * **java:** update actions/github-script action to v5 ([googleapis#1339](googleapis/java-storage-nio#1339)) ([googleapis#800](googleapis/java-storage-nio#800)) ([4c82c37](googleapis/java-storage-nio@4c82c37)) * update actions/github-script action to v5 ([googleapis#799](googleapis/java-storage-nio#799)) ([40febb2](googleapis/java-storage-nio@40febb2)) * update dependency com.google.cloud:google-cloud-shared-dependencies to v2.7.0 ([googleapis#802](googleapis/java-storage-nio#802)) ([2beefb6](googleapis/java-storage-nio@2beefb6)) * update dependency com.google.cloud:google-cloud-storage to v2.2.3 ([googleapis#786](googleapis/java-storage-nio#786)) ([b82657c](googleapis/java-storage-nio@b82657c)) * update dependency com.google.cloud:google-cloud-storage to v2.3.0 ([googleapis#796](googleapis/java-storage-nio#796)) ([e822be5](googleapis/java-storage-nio@e822be5)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
I'd like to download the result of a query as fast as possible.
The examples in https://github.com/googleapis/java-bigquerystorage/tree/master/samples/snippets/src/main/java/com/example/bigquerystorage are all about downloading/uploading to a table.
How can use this library to send a query and download the result ? Is there any example about it somewhere ?
The text was updated successfully, but these errors were encountered: