Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve performance of async execution #142

Open
Cito opened this issue Oct 14, 2021 · 6 comments
Open

Improve performance of async execution #142

Cito opened this issue Oct 14, 2021 · 6 comments
Assignees
Labels
help wanted Extra attention is needed investigate Needs investigaton or experimentation optimization Code optimizations and performance issues

Comments

@Cito
Copy link
Member

Cito commented Oct 14, 2021

When executing a query against a schema that contains async resolvers, then the executor method creates some overhead by defining and calling async functions (async def async_... inside methods in execute.py).

We should check whether pre-defining these functions or static methods of the ExecutionContext could reduce this overhead and boost performance. See also benchmarks provided in #141.

@jkimbo
Copy link
Member

jkimbo commented Oct 14, 2021

@Cito would you be open to splitting the ExecutionContext class into sync and async versions? That way we can drop most of the is_awaitable checks and simplify the code significantly.

@Cito
Copy link
Member Author

Cito commented Oct 15, 2021

We should consider all options, but the API should be backward compatible, to not cause another turmoil in the GraphQL-Python ecoystem, and the code and API should not deviate too much from GraphQL.js so that it's easier to keep up with the development there.

Currently, the ExecutionContext can deal with mixed resolvers (some sync, and some async). This is similar to GraphQL.js, but it is more complicated to implement in Python because contrary to JavaScript you can't await something non-awaitable here.

What we could do is create optimized versions (possibly subclasses) of the ExecutionContext for the two cases when all resolvers (including type resolvers) are sync or all are async. The schema could get an attribute that tells whether it is sync, async or mixed, which could be determined automatically via introspection at schema build time. If possible, the executor could then call the optimized version. This way it would be fully backward compatible and transparent to the caller, and always as performant as possible.

If we do this, we should strive for as little code duplication as possible (just as much as needed to solve the performance issues).

Before experimenting with such a big (internal) change, we should first check how much we can gain by predefining the async functions as explained above. Maybe this already suffices. For optimization, in the case of a purely sync or async schema, you can also pass an is_awaitable function that always returns either False or True as. This would avoid the costly real checks every time (see also #54 regarding the is_awaitable function and the optimization that has been already done here).

Using a profiler like Yappi should also give insights into where things can be further optimized.

@Cito Cito assigned Cito and jkimbo Oct 15, 2021
@Cito Cito added investigate Needs investigaton or experimentation optimization Code optimizations and performance issues labels Oct 15, 2021
@jkimbo
Copy link
Member

jkimbo commented Oct 15, 2021

We should consider all options, but the API should be backward compatible, to not cause another turmoil in the GraphQL-Python ecoystem, and the code and API should not deviate too much from GraphQL.js so that it's easier to keep up with the development there.

Absolutely!

Before experimenting with such a big (internal) change, we should first check how much we can gain by predefining the async functions as explained above. Maybe this already suffices.

Agreed. I can try and put together a PR to see how much it would improve things.

For optimization, in the case of a purely sync or async schema, you can also pass an is_awaitable function that always returns either False or True as.

I think this already happens with the sync execution pathway but I don't think it's possible to always return True for async schema's because the default resolver is still sync.

@Cito
Copy link
Member Author

Cito commented Oct 15, 2021

I think this already happens with the sync execution pathway but I don't think it's possible to always return True for async schema's because the default resolver is still sync.

Good points. Regarding the default resolvers, we could either use async default resolvers in this case or we could simply make a case distinction in the async execute_field method when there is no field resolver, and similarly for the type resolver in complete_abstract_value.

@Cito Cito added the help wanted Extra attention is needed label Dec 28, 2021
@Cito
Copy link
Member Author

Cito commented Dec 28, 2021

See also #101 for performance optimization.

@Cito
Copy link
Member Author

Cito commented Dec 28, 2021

For the records: In #142 the locally defined async methods were moved to the class level, but this did not make things better. The next step could be to create an execution context that assumes every resolver is async and that is optimized for that case, and see if that can improve performance significantly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Extra attention is needed investigate Needs investigaton or experimentation optimization Code optimizations and performance issues
Projects
None yet
Development

No branches or pull requests

2 participants