
Performance with persistent tables (Sled) #1457

Open

kanekoshoyu opened this issue Feb 1, 2024 · 10 comments


kanekoshoyu commented Feb 1, 2024

Hi,

lately I've been doing quite a lot with GlueSQL, and one of the things I'm doing is storing financial data. However, I find that GlueSQL's performance is not sufficient for my requirements. In particular, the persistent tables are really slow, so I was wondering whether there is any optimization for SharedMemoryStorage and SledStorage that you would recommend.

Currently I have about 100 (symbols) * 24 * 60 * 60 (seconds) rows of data that I have to query many times; each data point is one row in the table. I was expecting to run multiple queries per second, but a single selection already takes a long time, as shown below.

1 single selection:
in-memory DB 1hr data: 700ms
persistent DB 1hr data: 2700ms
in-memory DB 24hr data: 23000ms
persistent DB 24hr data: 82500ms

I changed the sled Config to use a 2 GB data cache, which is exactly the amount of data stored, and set the mode to HighThroughput (roughly as sketched below). It improved performance a bit.

persistent DB 24hr data (cache set to 2GB): 65000ms
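
For reference, that configuration looks roughly like the sketch below. It is a minimal sketch against recent GlueSQL; it assumes `sled` is available as a direct dependency and that SledStorage can be built from a sled::Config via TryFrom, and the path is hypothetical.

```rust
use gluesql::prelude::Glue;
use gluesql::sled_storage::SledStorage;

// Sketch of the configuration described above; adjust to your GlueSQL version.
fn open_persistent_db() -> Result<Glue<SledStorage>, Box<dyn std::error::Error>> {
    let config = sled::Config::default()
        .path("data/market.sled")               // hypothetical path
        .cache_capacity(2 * 1024 * 1024 * 1024) // 2 GB cache, in bytes
        .mode(sled::Mode::HighThroughput);

    let storage = SledStorage::try_from(config)?;
    Ok(Glue::new(storage))
}
```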

I wonder whether proper use of a primary key within GlueSQL would help with performance. Any tunable options would be great to hear about.

kanekoshoyu changed the title from "Cache with persistent tables (Sled)" to "Performance with persistent tables (Sled)" on Feb 1, 2024

devgony (Collaborator) commented Feb 1, 2024

Did you mean that selecting 360,000 (100 * 60 * 60) rows from sled storage took 2.7 seconds, or 1 row out of 360,000 rows? Did that include client latency to fetch all the rows, or just the execution time of the query?

I also wonder what the full query text was.

@JakkuSakura

https://gist.github.com/JakkuSakura/4bb9678501dbabf56c1b6d95269740aa
This is the source code used for the benchmark. It is not the exact code, but it still shows the problem.

persistent database insertion for 1s data: 7ms
volatile database insertion for 1s data: 1ms
volatile shared database selection with 1 symbols 24hr data: 1581ms, 1hr=false
volatile shared database selection with 1 symbols 24hr data: 1587ms, 1hr=true
persistent database selection with 1 symbols 1hr data: 93ms, 1hr=false
volatile shared database selection with 1 symbols 1hr data: 75ms, 1hr=false
persistent database selection with 1 symbols 24hr data: 2067ms, 1hr=false
persistent database selection with 1 symbols 24hr data: 2071ms, 1hr=true
It's completely unusable for us, and there's no simple way we can improve it.
(1hr=true means that we select one hour's worth of data out of the whole dataset.)
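
For context, the selections being timed have roughly this shape (hypothetical table and column names; the real schema and code are in the gist above):

```rust
// Hypothetical schema and queries matching the benchmark description above.
const CREATE_TABLE: &str =
    "CREATE TABLE candle (symbol TEXT, timestamp INTEGER, price DECIMAL);";

// 1hr=false: one symbol, scanning the whole dataset
const SELECT_WHOLE_RANGE: &str = "SELECT * FROM candle WHERE symbol = 'BTCUSDT';";

// 1hr=true: one symbol, restricted to a one-hour window of timestamps
const SELECT_ONE_HOUR: &str = "
    SELECT * FROM candle
    WHERE symbol = 'BTCUSDT'
      AND timestamp >= 1706745600 AND timestamp < 1706749200;";
```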


kanekoshoyu commented Feb 4, 2024

Did you mean that selecting 360,000 (100 * 60 * 60) rows from sled storage took 2.7 seconds, or 1 row out of 360,000 rows? Did that include client latency to fetch all the rows, or just the execution time of the query?

I also wonder what the full query text was.

It is 3,600 rows out of 360,000, i.e. getting one symbol out of 100 symbols for 1 hour of data.


kanekoshoyu commented Feb 4, 2024

Jakku set up a test with a slight modification to the above, changing the symbol string to a symbol ID. It helped quite a lot, so my next optimization guess is to change from Decimal to f64.

persistent database insertion for 1s data: 2ms
volatile database insertion for 1s data: 1ms
volatile shared database selection with 1 symbols 24hr data: 380ms, 1hr=false
volatile shared database selection with 1 symbols 24hr data: 385ms, 1hr=true
persistent database selection with 1 symbols 1hr data: 36ms, 1hr=false
volatile shared database selection with 1 symbols 1hr data: 17ms, 1hr=false
persistent database selection with 1 symbols 24hr data: 826ms, 1hr=true
persistent database selection with 1 symbols 24hr data: 838ms, 1hr=false

But again, a query that selects 86,400 rows out of 8,640,000 in a persistent table taking 800+ ms is quite slow; we are aiming for somewhere below 50 ms.
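
Concretely, the tweak amounts to something like this (hypothetical names again; the Decimal-to-f64 idea would be the analogous swap of DECIMAL for FLOAT):

```rust
// Before: the filter compares a TEXT symbol on every row.
const CREATE_WITH_TEXT: &str =
    "CREATE TABLE candle (symbol TEXT, timestamp INTEGER, price DECIMAL);";

// After: an integer symbol_id is much cheaper to compare while filtering.
const CREATE_WITH_ID: &str =
    "CREATE TABLE candle (symbol_id INTEGER, timestamp INTEGER, price DECIMAL);";
const SELECT_BY_ID: &str = "SELECT * FROM candle WHERE symbol_id = 42;";
```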


kanekoshoyu commented Feb 4, 2024

I'm wondering what the limiting factors on performance are here. I get that changing the symbol string to an ID improves performance because it reduces the time spent comparing symbols. But does, say, the size of each row make a big difference to GlueSQL selection performance?

Roughly speaking, sled is structured as a BTreeMap, so the best possible query should be O(log N). However, 1 hr vs 24 hr of data currently takes 36 ms vs 838 ms, roughly the 24x growth of O(N) rather than the ~4.5x that O(log N) would suggest.

The same applies to SharedMemoryStorage: 17 ms vs 385 ms is also around 24x.

The per-row work is taking so much time that what should be an O(log N) problem effectively becomes an O(N) one.
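
Spelling that out with the persistent-storage numbers above:

```
dataset size:  8,640,000 rows / 360,000 rows = 24x
query time:    838 ms / 36 ms ≈ 23x    (grows roughly linearly, i.e. O(N))
log2(24) ≈ 4.5                         (the growth an O(log N) lookup would suggest)
```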


jeromegn commented May 7, 2024

@kanekoshoyu I'm interested in knowing if you've figured this out.

Since SharedMemoryStorage also shows the degradation, I wonder if you could find the cause with a CPU profile. I suspect this is not a sled problem but possibly something internal to GlueSQL?

@kanekoshoyu

Hi @jeromegn
Frankly, we haven't figured out how to solve the root cause. Instead, I came up with two workarounds I can share with you.

1. For any real-time data analysis I just use the in-memory DB; I only use the persistent DB for storing and querying data afterwards, like a more traditional use of a database.

This is because I ran it with flamegraph and saw roughly a quarter of the runtime spent on file I/O. One more thing I found is that the stock sled storage does not handle concurrency well: between the start and end of a transaction it locks the file, so each transaction has to be "atomic". I have a modded version wrapped in Arc<RwLock> so that you can run it concurrently without the database being locked (you need to terminate the program gracefully, though); a rough sketch of the idea follows at the end of this comment.

2. Instead of having 100 symbols in a single table, I split them into 100 individual tables.

Although the table storage structure might be a hashmap, filtering and sorting still make the query O(n) by nature, so a query over one hundredth of the data is at least 100 times faster.
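
Here is that sketch of the Arc<RwLock> wrapper idea from workaround 1 (not the actual modded crate, which is linked in the next comment). It assumes a recent GlueSQL where Glue::execute is async and returns the executed payloads; adjust to the version you use.

```rust
use std::sync::Arc;

use gluesql::prelude::{Glue, Payload};
use gluesql::sled_storage::SledStorage;
use tokio::sync::RwLock;

type BoxError = Box<dyn std::error::Error + Send + Sync>;

// Sketch only: share one sled-backed Glue instance across tasks and serialize
// statements behind an RwLock, instead of opening/locking the sled file per use.
#[derive(Clone)]
pub struct SharedSledGlue {
    inner: Arc<RwLock<Glue<SledStorage>>>,
}

impl SharedSledGlue {
    pub fn new(path: &str) -> Result<Self, BoxError> {
        let storage = SledStorage::new(path)?;
        Ok(Self {
            inner: Arc::new(RwLock::new(Glue::new(storage))),
        })
    }

    pub async fn execute(&self, sql: &str) -> Result<Vec<Payload>, BoxError> {
        // Clones of SharedSledGlue can be handed to many tasks; the lock
        // serializes access so transactions do not trip over each other.
        let mut glue = self.inner.write().await;
        Ok(glue.execute(sql).await?)
    }
}
```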

@kanekoshoyu

https://github.com/kanekoshoyu/gluesql_shared_sled_storage

This is the link to the modded sled storage.

@kanekoshoyu

Also, a few micro-optimisations here and there:

Do not sort or filter on strings; it is too slow. Try using an index instead.

Try using the AST directly instead of the query string; each query string is converted to an AST at run time.
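
The last point looks roughly like this (a sketch from memory of recent GlueSQL; the parse/translate/execute_stmt entry points and signatures may differ in the version you use):

```rust
use gluesql::core::parse_sql::parse;
use gluesql::core::translate::translate;
use gluesql::prelude::{Glue, Payload};
use gluesql::sled_storage::SledStorage;

type BoxError = Box<dyn std::error::Error + Send + Sync>;

// Sketch of the "reuse the AST" idea: parse and translate the SQL text once,
// then execute the resulting Statement repeatedly instead of re-parsing the
// string on every call.
async fn run_many(
    glue: &mut Glue<SledStorage>,
    sql: &str,
    times: usize,
) -> Result<Vec<Payload>, BoxError> {
    let parsed = parse(sql)?;                // sqlparser AST
    let statement = translate(&parsed[0])?;  // GlueSQL AST, built once

    let mut payloads = Vec::with_capacity(times);
    for _ in 0..times {
        payloads.push(glue.execute_stmt(&statement).await?);
    }
    Ok(payloads)
}
```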


marvin-j97 commented Jun 3, 2024

I changed the sled Config to use a 2 GB data cache, which is exactly the amount of data stored, and set the mode to HighThroughput. It improved performance a bit.

"HighThroughput" and "LowSpace" modes don't actually do anything by the way. I'm unsure why it's even possible to set them, it's a no-op.
