Benchmark of the Trino client's data retrieval speed: it seems to be the bottleneck of the data pipeline #404

Open
1 task done
zeddit opened this issue Aug 28, 2023 · 0 comments

Expected behavior

When running a simple select * from db, throughput should not be lower than querying the original database directly; otherwise Trino itself slows down the whole system.

For example, reading data directly from the database with SQLAlchemy reaches 100MB/s, but once Trino is in the path, overall speed drops to only 10MB/s.

Actual behavior

Retrieving data through Trino is much slower than reading from the database directly: roughly 10-20MB/s versus about 100MB/s.

Steps To Reproduce

I benchmarked the Python client to isolate the bottleneck.

I used the memory connector, so the data resides in Trino itself; the measured time covers only moving the data out of Trino and into the client.

Even in this setup, throughput is only about 10-20MB/s, while my backend database delivers about 100MB/s over a single connection.
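The measurement described above could be sketched roughly as follows. This is not the script used for the numbers in this issue; it is a minimal illustration that times `execute` plus `fetchmany` on any DB-API cursor. The helper name `measure_fetch_throughput`, the batch size, and the byte estimate (length of each value's text form) are assumptions made for illustration, and an in-memory SQLite table stands in for the data source so the sketch runs without a server. For the real test, a cursor from `trino.dbapi.connect(...)` would presumably be passed in instead.

```python
import sqlite3
import time


def measure_fetch_throughput(cursor, query, batch_size=10_000):
    """Hypothetical helper: run `query` and return (rows, approx_bytes, seconds)."""
    start = time.perf_counter()
    cursor.execute(query)
    rows = 0
    approx_bytes = 0
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            break
        rows += len(batch)
        # Rough size estimate: UTF-8 length of each value's text form.
        # This is an approximation, not the number of bytes on the wire.
        approx_bytes += sum(len(str(v).encode("utf-8")) for row in batch for v in row)
    elapsed = time.perf_counter() - start
    return rows, approx_bytes, elapsed


# Stand-in data source (assumption) so the sketch is runnable as-is.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, payload TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    ((i, "x" * 100) for i in range(50_000)),
)

rows, approx_bytes, elapsed = measure_fetch_throughput(conn.cursor(), "SELECT * FROM t")
mb_per_s = approx_bytes / 1e6 / elapsed
print(f"{rows} rows, ~{approx_bytes / 1e6:.1f} MB in {elapsed:.3f}s ({mb_per_s:.1f} MB/s)")
```

Running the same helper against both the Trino cursor and the backend database cursor, with the same query and batch size, would give directly comparable MB/s figures.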

Log output

Screenshot 2023-08-24 13 42 05

Operating System

ubuntu 20.04

Trino Python client version

latest

Trino Server version

latest

Python version

3.10

Are you willing to submit PR?

  • Yes I am willing to submit a PR!