Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cannot load_table_from_dataframe when BigQuery table has a JSON column. #1405

Closed
twointum opened this issue Nov 11, 2022 · 6 comments
Closed
Labels
api: bigquery Issues related to the googleapis/python-bigquery API. priority: p3 Desirable enhancement or fix. May not be included in next release.

Comments

@twointum
Copy link

Environment details

  • OS type and version:
  • Python version: 3.9.2
  • pip version: 22.2.2
  • google-cloud-bigquery version: 2.34.4

Steps to reproduce

  1. Call Rest API
  2. Load to datafame, ex df = pd.DataFrame(response.json())
  3. client.load_table_from_dataframe(df, 'project.dataset.table')

Stack trace

C:\Users\xxxxxxxxx\AppData\Local\Programs\Python\Python39\lib\site-packages\google\cloud\bigquery\_pandas_helpers.py:244: UserWarning: Unable to determine type for field 'jsonFieldName'.
  warnings.warn("Unable to determine type for field '{}'.".format(bq_field.name))```

Making sure to follow these steps will guarantee the quickest resolution possible.

Thanks!
@product-auto-label product-auto-label bot added the api: bigquery Issues related to the googleapis/python-bigquery API. label Nov 11, 2022
@cecabrera
Copy link

Same issue here :(

@brainysmurf
Copy link

Same here. Any workaround?

@twointum
Copy link
Author

twointum commented Apr 11, 2023

So my issue was that when a JSON column is loaded to a DF column, the dataframe changes the format. For example, it changes double quotes to single quotes. To remedy, I do this sanitization step before writing to BQ. Let me know it that works!

df['myJsonColumn'] = df['myJsonColumn'].apply(json.dumps)

PS: Also, when I set up the BQ table, I used the STRING data type to collect the JSON column and then use json_extract_scalar() to pull data out of it. I don't know if that's a good practice, but it works very well for me!

@chalmerlowe
Copy link
Contributor

Setting this to P3: there appears to be a workaround and due to limits on manpower and scheduling.

@chalmerlowe chalmerlowe added the priority: p3 Desirable enhancement or fix. May not be included in next release. label Aug 17, 2023
@Linchin Linchin changed the title Cannot load_table_from_datastore when BigQuery table has a JSON column. Cannot load_table_from_dataframe when BigQuery table has a JSON column. Feb 3, 2024
@ArnoldHueteG
Copy link

After some investigation, I've determined that this particular issue appears to be resolved. I tested it using a specific example below and found that the issue stemmed from the nested JSON being automatically converted into a dictionary during the DataFrame creation process. Following this conversion, the client successfully loaded the example.

I hope this insight might be helpful for anyone who encounters a similar situation in the future. Looking for another issue to be my first open source contribution 😅

Environment details
OS type and version: macOS 14.1.1
Python version: 3.10
pip version: 24.0
google-cloud-bigquery version: 3.17.1
pandas: 2.2.0

json_data = [
    {
        "id": 1,
        "name": "Alice",
        "age": 30,
        "address": {
            "street": "123 Apple St",
            "city": "New York",
            "state": "NY",
            "zip": 10001
        }
    },
    {
        "id": 2,
        "name": "Bob",
        "age": 35,
        "address": {
            "street": "456 Orange Ave",
            "city": "Los Angeles",
            "state": "CA",
            "zip": 90001
        }
    },
]

df = pd.DataFrame(json_data)
load_job = client.load_table_from_dataframe(
    df, "xenon-world-399922.oss.your_table"
)
load_job.result() 

@chalmerlowe
Copy link
Contributor

Closing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: bigquery Issues related to the googleapis/python-bigquery API. priority: p3 Desirable enhancement or fix. May not be included in next release.
Projects
None yet
Development

No branches or pull requests

5 participants