Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Getting superset error: 'str' object has no attribute 'tzinfo' when using elasticsearch with superset:// #29574

Open
3 tasks done
mudasirmirza opened this issue Jul 12, 2024 · 4 comments
Labels
data:connect:elasticsearch Related to Elasticsearch global:error Related to global errors affecting the platform

Comments

@mudasirmirza
Copy link

Bug description

I have added 2 data source. One ElaticSearch (named as es) and one MySQL (named as mysql).
I have enabled the "querying across database" feature using the feature FLAG as described in "https://superset.apache.org/docs/configuration/databases/#querying-across-databases" and named the meta db as "superset_meta"

Now when I go to sqllab and select "superset_meta" and then a simple query like below

SELECT
    customer_id as CustomerID,
    customer_name as CustomerName
FROM
    "es.customer_availability"

I get below error

Superset meta database Error
superset error: 'str' object has no attribute 'tzinfo'

This may be triggered by:
Issue 1002 - The database returned an unexpected error. 

And in the docker-compose logs, I see below

superset_app          | 2024-07-12 09:52:20,424:DEBUG:superset.stats_logger:[stats_logger] (timing) sqllab.query.time_executing_query | 1077.662841796875 
superset_app          | 2024-07-12 09:52:20,427:DEBUG:superset.sql_lab:Query 92: 'str' object has no attribute 'tzinfo'                                                                                         
superset_app          | 2024-07-12 09:52:20,450:DEBUG:superset.stats_logger:[stats_logger] (incr) SqlLabRestApi.execute_sql_query.error                                                                         
superset_app          | 2024-07-12 09:52:20,451:WARNING:superset.views.base:SupersetErrorsException
superset_app          | Traceback (most recent call last):                                                                                                                                                      
superset_app          |   File "/usr/local/lib/python3.10/site-packages/flask/app.py", line 1823, in full_dispatch_request                                                                                      
superset_app          |     rv = self.dispatch_request()                                                                                                                                                        
superset_app          |   File "/usr/local/lib/python3.10/site-packages/flask/app.py", line 1799, in dispatch_request                                                                                           
superset_app          |     return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)                                                                                                            
superset_app          |   File "/usr/local/lib/python3.10/site-packages/flask_appbuilder/security/decorators.py", line 95, in wraps
superset_app          |     return f(self, *args, **kwargs)                                                                                                                                                     
superset_app          |   File "/app/superset/views/base_api.py", line 127, in wraps                                                                                                                            
superset_app          |     raise ex                                                                                                                                                                            
superset_app          |   File "/app/superset/views/base_api.py", line 121, in wraps                                                                                                                            
superset_app          |     duration, response = time_function(f, self, *args, **kwargs)                                                                                                                        
superset_app          |   File "/app/superset/utils/core.py", line 1463, in time_function                                                                                                                       
superset_app          |     response = func(*args, **kwargs)                                                                                                                                                    
superset_app          |   File "/app/superset/views/base_api.py", line 93, in wraps
superset_app          |     return f(self, *args, **kwargs)                                                                                                                                                     
superset_app          |   File "/app/superset/utils/log.py", line 255, in wrapper                                                                                                                               
superset_app          |     value = f(*args, **kwargs)                                                                                                                                                          
superset_app          |   File "/app/superset/sqllab/api.py", line 409, in execute_sql_query                                                                                                                    
superset_app          |     command_result: CommandResult = command.run()                                                                                                                                       
superset_app          |   File "/app/superset/commands/sql_lab/execute.py", line 121, in run                                                                                                                    
superset_app          |     raise ex                                                                                                                                                                            
superset_app          |   File "/app/superset/commands/sql_lab/execute.py", line 103, in run                                                                                                                    
superset_app          |     status = self._run_sql_json_exec_from_scratch()
superset_app          |   File "/app/superset/commands/sql_lab/execute.py", line 163, in _run_sql_json_exec_from_scratch                                                                                        
superset_app          |     raise ex                                                                                                                                                                            
superset_app          |   File "/app/superset/commands/sql_lab/execute.py", line 158, in _run_sql_json_exec_from_scratch                                                                                        
superset_app          |     return self._sql_json_executor.execute(                                                                                                                                             
superset_app          |   File "/app/superset/sqllab/sql_json_executer.py", line 111, in execute                                                                                                                
superset_app          |     if data["errors"]:  # type: ignore                                                                                                                                                  
superset_app          | superset.exceptions.SupersetErrorsException: [SupersetError(message="superset error: 'str' object has no attribute 'tzinfo'", error_type=<SupersetErrorType.GENERIC_DB_ENGINE_ERROR: 'GENERIC_DB_ENGINE_ERROR'>, level=<ErrorLevel.ERROR: 'error'>, extra={'engine_name': 'Superset meta database', 'issue_codes': [{'code': 1002, 'message': 'Issue 1002 - The database returned an unexpected error.'}]})]
superset_app          | 172.18.0.1 - - [12/Jul/2024:09:52:20 +0000] "POST /api/v1/sqllab/execute/ HTTP/1.1" 500 297 "http://127.0.0.1:8088/sqllab/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36"                                                                                                                                                           
superset_app          | 2024-07-12 09:52:21,530:DEBUG:superset.stats_logger:[stats_logger] (incr) log
superset_app          | 172.18.0.1 - - [12/Jul/2024:09:52:21 +0000] "POST /superset/log/?explode=events HTTP/1.1" 200 1 "http://127.0.0.1:8088/sqllab/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36"

In the logs, I can clearly see that it fetched the data but for some reason it is trying to find tzinfo in the fetched json from the elasticsearch data source.

Also in the logs, when it stared to process this query, it made a call like below

superset_app          | 2024-07-12 09:52:20,052:DEBUG:elasticsearch:> {"query":"SHOW COLUMNS FROM \"customer_availability\"","fetch_size":10000}

And after this I see the log showing the fetched columns.

I am not sure why is it still trying to look for tzinfo.

Please let me know if any more information is required from my side related to this issue.

How to reproduce the bug

  1. Enable ENABLE_SUPERSET_META_DB in superset_config.py
  2. Add elasticsearch-dbapi in docker/requirements-local.txt
  3. Start docker docker compose -f docker-compose-image-tag.yml up
  4. Add ElasticSearch data source and MySQL data source and superset meta db
  5. Run query as shown in the main description

Screenshots/recordings

No response

Superset version

master / latest-dev

Python version

3.9

Node version

18 or greater

Browser

Chrome

Additional context

No response

Checklist

  • I have searched Superset docs and Slack and didn't find a solution to my problem.
  • I have searched the GitHub issue tracker and didn't find a similar bug report.
  • I have checked Superset's logs for errors and if I found a relevant Python stacktrace, I included it here as text in the "additional context" section.
@dosubot dosubot bot added data:connect:elasticsearch Related to Elasticsearch global:error Related to global errors affecting the platform labels Jul 12, 2024
Copy link

dosubot bot commented Jul 12, 2024

@mudasirmirza Hello! I'm here to assist you with any issues you're facing. Let's work together to resolve the problem.

The error 'str' object has no attribute 'tzinfo' occurs because the code expects a datetime object with timezone information but encounters a string instead. To resolve this, you need to ensure that datetime objects in your data have the tzinfo attribute set correctly.

Here are the steps to fix this issue:

  1. Preprocess Data to Ensure Timezone Information:
    Convert string representations of datetime to actual datetime objects with timezone information before they are processed. You can modify the data preprocessing step as follows:

    from datetime import datetime
    import pytz
    
    def preprocess_data(data):
        for row in data:
            for i, value in enumerate(row):
                if isinstance(value, str):
                    try:
                        # Attempt to parse the string as a datetime object
                        dt = datetime.fromisoformat(value)
                        # Add timezone information if missing
                        if dt.tzinfo is None:
                            dt = dt.replace(tzinfo=pytz.UTC)
                        row[i] = dt
                    except ValueError:
                        # If parsing fails, keep the original string
                        pass
        return data
    
    # Example usage
    data = [
        ("2023-10-01T12:00:00", "some other value"),
        ("2023-10-02T13:00:00+00:00", "another value"),
    ]
    
    preprocessed_data = preprocess_data(data)
  2. Ensure convert_dttm Method Handles Datetime Objects Correctly:
    The convert_dttm method in the ElasticSearchEngineSpec class should convert datetime objects to the appropriate string format for Elasticsearch queries. Here is the relevant implementation:

    class ElasticSearchEngineSpec(BaseEngineSpec):  # pylint: disable=abstract-method
        engine = "elasticsearch"
        engine_name = "ElasticSearch (SQL API)"
        time_groupby_inline = True
        allows_joins = False
        allows_subqueries = True
        allows_sql_comments = False
    
        @classmethod
        def convert_dttm(
            cls, target_type: str, dttm: datetime, db_extra: Optional[dict[str, Any]] = None
        ) -> Optional[str]:
            db_extra = db_extra or {}
    
            sqla_type = cls.get_sqla_column_type(target_type)
            if isinstance(sqla_type, types.DateTime):
                es_version = db_extra.get("version")
                supports_dttm_parse = False
                try:
                    if es_version:
                        supports_dttm_parse = Version(es_version) >= Version("7.8")
                except Exception as ex:  # pylint: disable=broad-except
                    logger.error("Unexpected error while convert es_version", exc_info=True)
                    logger.exception(ex)
    
                if supports_dttm_parse:
                    datetime_formatted = dttm.isoformat(sep=" ", timespec="seconds")
                    return (
                        f"""DATETIME_PARSE('{datetime_formatted}', 'yyyy-MM-dd HH:mm:ss')"""
                    )
    
                return f"""CAST('{dttm.isoformat(timespec="seconds")}' AS DATETIME)"""
    
            return None

By ensuring that your datetime objects have the tzinfo attribute set and are correctly formatted for Elasticsearch queries, you can avoid the 'str' object has no attribute 'tzinfo' error when querying an Elasticsearch data source in Apache Superset [1][2].

To continue talking to Dosu, mention @dosu.

@mudasirmirza
Copy link
Author

Just to add to it. If I run the same query directly selecting "ElasticSearch" as the data source and selecting the "Index"

SELECT
    customer_id as CustomerID,
    customer_name as CustomerName
FROM
    "customer_availability"

This query returns proper results.

@hardikjain178
Copy link

Hello, were you able to solve the above issue? I am also running into the same problem when trying to get elasticsearch table data using superset meta db. The query works fine when using "ElasticSearch" as a datasource. Let me know if you were able to solve it and how?

@mudasirmirza
Copy link
Author

Hi @hardikjain178 ,

I was unable to resolve this and because of this I just dropped working on this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
data:connect:elasticsearch Related to Elasticsearch global:error Related to global errors affecting the platform
Projects
None yet
Development

No branches or pull requests

2 participants