load_table_from_dataframe support JSON column dtype #1966

Open

jlynchMicron opened this issue Jul 2, 2024 · 4 comments
Labels
api: bigquery - Issues related to the googleapis/python-bigquery API.
type: feature request - ‘Nice-to-have’ improvement, new feature or different behavior or design.

Comments

jlynchMicron commented Jul 2, 2024

Is your feature request related to a problem? Please describe.
python-bigquery does not currently appear to support uploading DataFrames when one of the destination table's columns has the JSON type.

Quick partial code example (data is a pandas DataFrame and table_obj the destination table, both defined elsewhere):

from google.cloud import bigquery

client = bigquery.Client()

schema = [
    bigquery.SchemaField("system_id", "STRING"),
    bigquery.SchemaField("summery_type", "STRING"),
    bigquery.SchemaField("summery_datetime", "DATETIME"),
    bigquery.SchemaField("summery_is_good", "BOOL"),
    bigquery.SchemaField("summery_data", "JSON"),  # stored in the dataframe as a JSON-formatted string
]
job_config = bigquery.LoadJobConfig(schema=schema)

job = client.load_table_from_dataframe(data, table_obj, job_config=job_config)
print(job.result())

Result:
google.api_core.exceptions.BadRequest: 400 Unsupported field type: JSON; reason: invalid, message: Unsupported field type: JSON

Describe the solution you'd like
Implement support for loading DataFrames that contain JSON columns to BigQuery.

Additional context

$ conda list | grep bigquery
google-cloud-bigquery-core 3.25.0             pyhd8ed1ab_0  
google-cloud-bigquery-storage 2.25.0             pyhca7485f_0  
google-cloud-bigquery-storage-core 2.25.0             pyhca7485f_0

Related issues:
googleapis/python-bigquery-sqlalchemy#399
googleapis/python-bigquery-pandas#698
googleapis/python-bigquery-dataframes#816

product-auto-label bot added the api: bigquery label Jul 2, 2024

jlynchMicron (Author) commented Jul 2, 2024

I have also tried uploading the data as dtype STRING in hopes that it would be converted to JSON server-side, but that also results in the following error (table name redacted):

google.api_core.exceptions.BadRequest: 400 POST https://bigquery.googleapis.com/upload/bigquery/v2/projects/gdw-dev-sse/jobs?uploadType=resumable: Provided Schema does not match Table XXXXXXXXXXXXXXXX. Field summery_data has changed type from JSON to STRING

jlynchMicron (Author) commented

I was able to upload my data with the following code, but JSON support should be added for pandas dataframes.

Code:

# Datetime objects are not JSON serializable; convert them to strings first.
data['summery_datetime'] = data['summery_datetime'].astype(str)
# Serialize rows as dicts and load them as JSON instead of Parquet.
data = data.to_dict(orient='records')
job = client.load_table_from_json(data, table_obj, job_config=job_config)
print(job.result())
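
A slightly more general version of this workaround (a sketch, not from the thread; load_dataframe_as_json is a hypothetical helper): stringify every datetime-like column before handing the rows to load_table_from_json, since pandas.Timestamp values are not JSON serializable.

import pandas as pd
from google.cloud import bigquery

def load_dataframe_as_json(client, df, table, job_config):
    # Stringify datetime columns first; pandas.Timestamp values are
    # not JSON serializable.
    df = df.copy()
    for col in df.select_dtypes(include=["datetime", "datetimetz"]).columns:
        df[col] = df[col].astype(str)
    # Load the rows as JSON instead of Parquet.
    rows = df.to_dict(orient="records")
    return client.load_table_from_json(rows, table, job_config=job_config)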

tswast (Contributor) commented Jul 8, 2024

Thanks @jlynchMicron for providing a workaround. I think there are a few problems we'll need to work through. One is that the BigQuery backend doesn't support JSON in load jobs from Parquet files (https://cloud.google.com/bigquery/docs/loading-data-cloud-storage-parquet#type_conversions), and Parquet is the format load_table_from_dataframe uses by default. Please file a feature request at https://cloud.google.com/support/docs/issue-trackers (specifically, "Create new BigQuery issue").

Another possible workaround is to use CSV as the source format. In your example:

schema = [
    bigquery.SchemaField("system_id", "STRING"),
    bigquery.SchemaField("summery_type", "STRING"),
    bigquery.SchemaField("summery_datetime", "DATETIME"),
    bigquery.SchemaField("summery_is_good", "BOOL"),
    bigquery.SchemaField("summery_data", "JSON"),  # stored in the dataframe as a JSON-formatted string
]
job_config = bigquery.LoadJobConfig(
    schema=schema,
    source_format=bigquery.SourceFormat.CSV,
)

job = client.load_table_from_dataframe(data, table_obj, job_config=job_config)
print(job.result())
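
With CSV as the source format, the client serializes the DataFrame via to_csv rather than Parquet, which should sidestep the Parquet type-conversion limitation above. The trade-off, noted below: CSV cannot represent nested STRUCT/ARRAY columns, so this route only works for flat schemas.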

For Googlers watching this issue, I have proposed a design (go/bf-json) that adds a JSONDtype to https://github.com/googleapis/python-db-dtypes-pandas, which would let us autodetect when JSON is used in a DataFrame. Before that, though, we could choose the appropriate serialization format based on the provided schema: Parquet must be used for STRUCT/ARRAY columns, while CSV must be used for JSON.
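
A rough sketch of that schema-based choice (illustrative only; pick_source_format is a hypothetical helper, not part of the client):

from google.cloud import bigquery

def pick_source_format(schema):
    # Parquet load jobs cannot carry JSON columns, and CSV cannot
    # represent nested STRUCT/ARRAY values, so inspect the column
    # types and pick whichever format they allow.
    has_json = any(field.field_type == "JSON" for field in schema)
    has_nested = any(
        field.field_type in ("RECORD", "STRUCT") or field.mode == "REPEATED"
        for field in schema
    )
    if has_json and has_nested:
        raise ValueError(
            "No single load format supports both JSON and STRUCT/ARRAY columns."
        )
    return bigquery.SourceFormat.CSV if has_json else bigquery.SourceFormat.PARQUET

job_config = bigquery.LoadJobConfig(
    schema=schema,
    source_format=pick_source_format(schema),
)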

tswast added the type: feature request label Jul 8, 2024
leahecole assigned Linchin and unassigned leahecole Jul 16, 2024
Linchin assigned chelsea-lin and unassigned Linchin Jul 23, 2024

Linchin (Contributor) commented Jul 23, 2024

googleapis/python-db-dtypes-pandas#284 can potentially fulfill this feature request.
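
If that lands, usage might look roughly like the following (a speculative sketch of the PR's direction; the JSONDtype name, the example table, and the autodetection behavior are all assumptions, not a released API):

import db_dtypes  # assumed to expose JSONDtype once #284 is released
import pandas as pd
from google.cloud import bigquery

df = pd.DataFrame(
    {
        "system_id": ["a1"],
        # Assumption: a JSON extension dtype that stores Python objects
        # serialized as JSON.
        "summery_data": pd.Series([{"temp": 72}], dtype=db_dtypes.JSONDtype()),
    }
)

# Assumption: the client could then detect the JSON column and choose a
# compatible serialization format without an explicit schema override.
client = bigquery.Client()
client.load_table_from_dataframe(df, "my_dataset.my_table").result()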
