Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Overhaul solve_dependencies() for performance #5554

Open
wants to merge 16 commits into
base: master
Choose a base branch
from

Conversation

kristjanvalur
Copy link
Contributor

@kristjanvalur kristjanvalur commented Oct 28, 2022

This pr provides an overhaul of solve_dependencies(). It is aimed at fixing performance and increasing readability.

  1. A hidden performance bug is removed, when an extra level of dependencies is traversed, even if the cache is hit.
  2. A hidden performance bug is removed when a new Depends is created for every dependency if there are any dependency overrides.
  3. The evaluation of builtin-dependencies is moved into specialized callables/closures to avoid a lengthy series of if statements for every Dependency evaluation
  4. the call-context of dependency solving is moved into its own DependencySolverContext dataclass avoiding a lot of boilerplate code in the main solve_dependencies() function

A new test is added to test deep Depends hierarchies and caching.

A specialized unittests is also added which uses the timeit module to measure performance of Depends processing. We run an AsyncClient to reduce overhead, for complete requests. Also direct evaluation of the endpoint dependencies, both for Depends and also Request . (The test does nothing by default using if False:. Could also add a pytest switch to enable it.)

On a local PC, I measure, with the pytest_cahce_perf unittest enabled and pytest --quiet -k test_deep_cache_perf unittest:

  • before:
    tests/test_dependency_perf.py deep cache client requests
    did 1000 calls in 0.3157117000000653 seconds
    time per call: 0.32ms, rate: 3167.45/s
    deep cache direct solve
    did 2000 calls in 0.2325039000002107 seconds
    time per call: 0.12ms, rate: 8602.01/s
    request direct solve
    did 50000 calls in 0.3208132000004298 seconds
    time per call: 0.01ms, rate: 155853.94/s
    
  • after:
    tests/test_dependency_perf.py deep cache client requests
    did 1000 calls in 0.2511885999997503 seconds
    time per call: 0.25ms, rate: 3981.07/s
    deep cache direct solve
    did 5000 calls in 0.2859181999997418 seconds
    time per call: 0.06ms, rate: 17487.52/s
    request direct solve
    did 100000 calls in 0.3904711000000134 seconds
    time per call: 0.00ms, rate: 256100.90/s
    

Performance of deep Depends hierarchies is doubled, mostly due to the fixing of the cache mechanic. But simple dependencies, like the Request, are also faster, since the whole depdendency evaluation loop is simpler, and we use dispatch functions to fill the parameters rather than switch statements.

Running the entire testsuite, with:
pytest --quiet tests
reports 28.4s before and 28.1 after the change. The whole testsuite however typically creates an app and runs the endpoint once. Also, there is tremendous overhead in using TestClient rather than AsyncClient for each endpoint call. The optimizations are expected to trade app init time for endpoint execution time.

@github-actions
Copy link
Contributor

@github-actions
Copy link
Contributor

@github-actions
Copy link
Contributor

@codecov
Copy link

codecov bot commented Oct 28, 2022

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (cf73051) 100.00% compared to head (059a07f) 100.00%.
Report is 1074 commits behind head on master.

❗ Current head 059a07f differs from pull request most recent head 4336a43. Consider uploading reports for the commit 4336a43 to get more accurate results

Additional details and impacted files
@@            Coverage Diff            @@
##            master     #5554   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files          540       541    +1     
  Lines        13969     14064   +95     
=========================================
+ Hits         13969     14064   +95     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@github-actions
Copy link
Contributor

@kristjanvalur kristjanvalur changed the title Overhaul solve_dependencies() Overhaul solve_dependencies() for performance Oct 28, 2022
@github-actions
Copy link
Contributor

@kristjanvalur kristjanvalur marked this pull request as ready for review October 28, 2022 17:45
@github-actions
Copy link
Contributor

@github-actions
Copy link
Contributor

@github-actions
Copy link
Contributor

@github-actions
Copy link
Contributor

@github-actions
Copy link
Contributor

@github-actions
Copy link
Contributor

@github-actions
Copy link
Contributor

@github-actions
Copy link
Contributor

The documentation does not list `Response` as as valid dependency for
WebSocket handlers.
@github-actions
Copy link
Contributor

github-actions bot commented Nov 4, 2022

@@ -454,77 +455,76 @@ async def solve_generator(
return await stack.enter_async_context(cm)


@dataclasses.dataclass
class DependencySolverContext:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are you using dataclases.dataclass here instead of Pydantic?

I just want to know if there's a reason in particular

Copy link
Contributor Author

@kristjanvalur kristjanvalur Nov 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dataclass is basically a quick way to declare a class with initializers. I'm not looking for pydantic's validation. This is supposed to be low overhead.
Could have used a plain class instead and written my own __init__().
Could also have made the solve_dependencies() an instance method, but I don't see much of that kind of programming style in FastAPI so I left it functional.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, very good PR. I really liked it.

@github-actions
Copy link
Contributor

github-actions bot commented Dec 1, 2022

@github-actions
Copy link
Contributor

@github-actions
Copy link
Contributor

context.values = values
context.errors = errors
for dependency_getter in dependant.dependency_getters:
if inspect.iscoroutinefunction(dependency_getter):
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to have two list of getters in order to avoid the iscoroutine check?

Also if it's done in that way, one could easily use gather instead of waiting each coroutine individually.

Copy link
Contributor Author

@kristjanvalur kristjanvalur Dec 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, we can do that, but see my other comment about automatically gathering. We'd have to to some performance tests to make sure it doesn't slow down simple getters.

Copy link
Contributor Author

@kristjanvalur kristjanvalur Dec 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I'm looking at the code, and these are the "built-in" dependencies (Headers, params, etc)
There is only one type of async built-in dependency, and that tis the get_body_params().
This is used for Body() type parameters. And the only kind of Body() parameter where anything async happens is the File() one. FastAPI already schedules FILE sequence reads internally... and there is only one getter created, for all the Body type params. but I'll split it into a separate method and get rid of the iscoroutinefunction test.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are so few comments in the code, it takes a lot of effort to reverse-engineer this :)


solved_result = await solve_dependencies(
request=request,
sub_values, sub_errors = await solve_dependencies(
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we try to run all these subtasks in parallel? Same for the code below

        if is_gen_callable(call) or is_async_gen_callable(call):
            stack = context.request.scope.get("fastapi_astack")
            assert isinstance(stack, AsyncExitStack)
            solved = await solve_generator(
                call=call, stack=stack, sub_values=sub_values
            )
        elif is_coroutine_callable(call):
            solved = await call(**sub_values)
        else:
            solved = await run_in_threadpool(call, **sub_values)
        if sub_dependant.name is not None:
            values[sub_dependant.name] = solved

See comment #5472 (comment)

Copy link
Contributor Author

@kristjanvalur kristjanvalur Dec 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's tricky. The thing is often dependencies don't do any IO at all, but they are marked async to be not marked sync. All sync dependencies get run from a threadpool, because they potentially block. This is a huge performance waste for a dependency which does something relatively simple without any IO.

Similarly, calling asyncio.gather on a bunch of async dependencies which don't do any IO is potentially quite wasteful. It means that each will have a Task created and that we have to go and have a whole loop through the event loop while these Tasks are scheduled and run to completion. This will cause some unnecessary latency.

So, I'm not super convinced that it should be automatically done. I think we should leave that up to the application to decide if multiple dependencies can be wrapped in a single common dependency and then explicitly call asyncio.gather

At least, we'd need to do some performance testing before deciding to go down that route. And I'd like for it to be a separate PR, because that would be a significant change in logic.

for field in dependant.path_params:

def get_path_param_getter(field: ModelField) -> DependencyGetter:
def get_param(context: DependencySolverContext) -> None:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are all these getters defined inline? As they don't seem to be using any local state, could you move them out and declare them at the top level?

Note that this is specially necessary for getters defined in a loop as in each iteration it will create a new function, generating more memory garbage.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a style decision. My original implementation attempted to not use the outer function at all (e.g. get_path_param_getter(), but crate the closure directly (e.g. get_param(), but that doesn't work in a loop neccessitating the outer helper function. I kept it at the same place as the closure to keep all the logic in the same place.

You need not worry about "memory garbage". The outer function is assigned to a local variable and gets destroyed when the "get_dependent_dependency_getters()" exits. Each iteration just redefines the function. But I can remove it out of the loop if you want. It will just reduce the locality of the logic a bit. Python is nice like that, local functions definitions are just fine.

This also happens during the setup of the endpoints so it is not critical performance-wise.

But I'm happy to move them out to the top level if you think that style is better.

Copy link
Contributor Author

@kristjanvalur kristjanvalur Dec 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've moved the definitions out of the loop scope, though.

@github-actions
Copy link
Contributor

@github-actions
Copy link
Contributor

@github-actions
Copy link
Contributor

@github-actions
Copy link
Contributor

@kristjanvalur
Copy link
Contributor Author

kristjanvalur commented Feb 23, 2023

Is there any interest in reducing call overhead for FastAPI endpoint dependencies?

@nonnibunk
Copy link

Could we get this PR merged, I'd love to get this into my project.

@alejsdev alejsdev added feature New feature or request p3 and removed investigate labels Jan 15, 2024
@Tishka17
Copy link

If you are interested in faster dependency solving, I can suggest my project which can be used with fastapi together (or without it, if you need)

https://github.com/reagento/dishka/tree/develop

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature New feature or request p3
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants