Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ParsedQuery] Check is_read_only_query too slow #29559

Open
3 tasks done
vuongtlt13 opened this issue Jul 11, 2024 · 1 comment
Open
3 tasks done

[ParsedQuery] Check is_read_only_query too slow #29559

vuongtlt13 opened this issue Jul 11, 2024 · 1 comment
Labels
#bug:performance Performance bugs validation:required A committer should validate the issue

Comments

@vuongtlt13
Copy link

Bug description

Problem

First, I have a sql statement with length is 17121. I need check **is_read_only_query**

Logic for checking is_read_only_query is at superset/db_engine_specs/base.py#L2027
image

Test result

image

As you can see:

  • For current ParsedQuery class:
    Each function in is_select, is_explain, is_show will format sql statement by calling sqlparse.format function. But it takes quite long (~0.332s for one call). So you need to call 3 times for checking is_read_only_query --> Total time: 0.9818s

Solution

I wrote an PR to reduce time for checking is_read_only_query at #29557

Please help me to review it

How to reproduce the bug

Go to a dashboard and check time for loading result

Screenshots/recordings

No response

Superset version

master / latest-dev

Python version

3.9

Node version

16

Browser

Chrome

Additional context

No response

Checklist

  • I have searched Superset docs and Slack and didn't find a solution to my problem.
  • I have searched the GitHub issue tracker and didn't find a similar bug report.
  • I have checked Superset's logs for errors and if I found a relevant Python stacktrace, I included it here as text in the "additional context" section.
@dosubot dosubot bot added #bug:performance Performance bugs validation:required A committer should validate the issue labels Jul 11, 2024
Copy link

dosubot bot commented Jul 11, 2024

Hey there, @vuongtlt13! 👋 I'm here to assist you with any bugs, questions, or contributions. Let's squash those bugs together!

The pull request you mentioned introduces the is_select_query method and uses it within the is_readonly_query method to improve performance. This change reduces the number of times the sqlparse.format function is called, which significantly decreases the time required to check if a query is read-only.

Here is the relevant code from the pull request:

@classmethod
def is_readonly_query(cls, parsed_query: ParsedQuery) -> bool:
    """
    Pessimistic readonly, 100% sure statement won't mutate anything.
    """
    return KustoKqlEngineSpec.is_select_query(
        parsed_query
    ) or parsed_query.sql.startswith(".show")

@classmethod
def is_select_query(cls, parsed_query: ParsedQuery) -> bool:
    return not parsed_query.sql.startswith(".")

This change ensures that the is_readonly_query method only calls sqlparse.format once, reducing the total time from approximately 0.9818s to 0.3278s [1].

Additionally, there are existing unit tests for the is_readonly_query method that cover various scenarios for both SQL and KQL queries. These tests ensure that the method correctly identifies read-only queries [2][3][4].

You can review the pull request and run the tests to verify the improvements.

To continue talking to Dosu, mention @dosu.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
#bug:performance Performance bugs validation:required A committer should validate the issue
Projects
None yet
Development

No branches or pull requests

1 participant