Virtual table implementations are slow #20

Open
anacrolix opened this issue Oct 27, 2022 · 6 comments
Comments

@anacrolix
Contributor

I have a virtual table implementation that generates about 40 rows per filter on average. With 12 million inputs it gets very slow. I think I've tracked the issue down to calls from C into Go being very slow: 60% of the CPU time is spent scheduling a goroutine to handle each incoming call.

One possible solution is an optimized version of a vtable cursor that doesn't need to call into Go for column, next, rowid, eof, and maybe close (but not filter). I think filter could return a fully populated set of rows and columns, and the C code could walk that. This would cut the C-to-Go calls by 5 per row, so about 200 calls per filter in my use case.
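For illustration, a minimal sketch of what such an opt-in interface might look like on the Go side (the names `Row` and `ArrayFilterer` are hypothetical, not part of this package's API, and the argument types are only stand-ins):

```go
// Hypothetical opt-in interface: a cursor variant whose filter call
// materializes the whole result set, so the C side can walk rows and
// columns without calling back into Go for next/column/rowid/eof.
type Row struct {
	Rowid   int64
	Columns []interface{} // one value per declared column
}

type ArrayFilterer interface {
	// FilterAll plays the role of Filter, but returns every matching
	// row in a single call instead of exposing a per-row cursor to C.
	FilterAll(indexNumber int, indexString string, args ...interface{}) ([]Row, error)
}
```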

@riyaz-ali
Owner

I like the idea of a buffered cursor! It could be useful in cases where you anyway perform a full fetch in the Filter() function and then use the Next() + Column() + Eof() functions to increment a counter and read the value at the current counter.

That being said, I'd still like it to be opt-in rather than the default behavior, as the current behavior is simple and generic enough to support this and other use cases too.
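As a rough illustration of that pattern against the current API shape (the method signatures below are simplified stand-ins rather than the package's actual VirtualCursor signatures, and fetchEverything is a hypothetical helper):

```go
// A cursor that does the full fetch in Filter and then serves
// Next/Eof/Rowid from memory by incrementing a counter. Illustrative only.
type bufferedCursor struct {
	rows [][]interface{} // one slice of column values per row
	pos  int
}

// fetchEverything stands in for whatever upstream query produces the rows.
func fetchEverything() ([][]interface{}, error) {
	return [][]interface{}{{"a", 1}, {"b", 2}}, nil
}

func (c *bufferedCursor) Filter( /* idxNum, idxStr, args... */ ) error {
	rows, err := fetchEverything() // run the whole query up front
	if err != nil {
		return err
	}
	c.rows, c.pos = rows, 0
	return nil
}

func (c *bufferedCursor) Next() error           { c.pos++; return nil }
func (c *bufferedCursor) Eof() bool             { return c.pos >= len(c.rows) }
func (c *bufferedCursor) Rowid() (int64, error) { return int64(c.pos), nil }

// Column would report c.rows[c.pos][i] for column i via whatever
// result-reporting mechanism the package provides; Close would drop c.rows.
```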

Also, for the numbers you've shared, can you provide a reproducible example and/or CPU and memory profiles?

@anacrolix
Contributor Author

Yeah, it would definitely be opt-in. I suspect it'd need an alternative interface to VirtualCursor, possibly surfaced through an alternate Open method.
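Very roughly, the binding layer could detect the opt-in at cursor-open time without changing the default path (ArrayFilterer here is the hypothetical interface sketched above; only the VirtualCursor and Open names come from the existing API):

```go
// Sketch of the opt-in dispatch: if the cursor returned by Open also
// implements the hypothetical ArrayFilterer, register C-side callbacks
// that walk the materialized rows; otherwise keep the per-row callbacks.
func setupCursor(tab interface{ Open() (VirtualCursor, error) }) error {
	cur, err := tab.Open()
	if err != nil {
		return err
	}
	if buffered, ok := cur.(ArrayFilterer); ok {
		_ = buffered // fast path: rows materialized once per filter, walked in C
		return nil
	}
	_ = cur // slow path: per-row next/column/rowid/eof calls from C into Go
	return nil
}
```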

@anacrolix
Contributor Author

(CPU profile screenshot: pprof002)

@anacrolix
Contributor Author

anacrolix commented Oct 28, 2022

Trying the idea out in https://github.com/anacrolix/riyaz-ali-sqlite/compare/master...anacrolix:riyaz-ali-sqlite:array-filter?expand=1 (I had to put my C callbacks in their own file; otherwise I got duplicate symbol errors, or no symbol at all if I made them static). It's about 3x faster for a test using count(*). That's to be expected: for every filter, I save the next and eof calls into Go. If columns were read, it would speed up further by exactly as many extra column calls as were avoided. Unfortunately it's still very slow; just the cost of hitting filter is very high. The sys cost dropped a lot though, from 39s down to 3.8s, not sure why, probably less allocation. Here's the changed CPU pprof.

(CPU profile screenshot: pprof003)

@anacrolix
Contributor Author

My estimates were off: the speedup is much larger (an intermediate function unrelated to the vtable was adding significant CPU cost to my tests). For every filter, I was making about 40 eof and 40 next calls. Removing those reduced my runtime from ~2m30s to ~50s, where in both cases about ~35s was overhead elsewhere, so approximately an 8x speedup. Opting in to allocating everything up front is fine; see the branch. Switching to Rust, however, the overhead is negligible and the speedup is off the charts. It's definitely worth noting for anyone else trying this that the scheduler cost of calling from C into Go is very, very high.
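For anyone who wants to measure that transition cost in isolation, here's a small self-contained benchmark sketch, unrelated to this repository (the package and function names are made up). It keeps the C loop in a separate file's preamble, which also sidesteps the duplicate-symbol issue mentioned above:

```go
// file: callback.go — exports a trivial Go function to C.
package cgocost

import "C"

//export goCallback
func goCallback(x C.long) C.long { return x }
```

```go
// file: helper.go — the C loop lives in this file's preamble, so the
// definition exists exactly once and isn't duplicated across files.
package cgocost

/*
extern long goCallback(long x); // implemented in Go via //export

long call_go_n_times(long n) {
    long sum = 0;
    for (long i = 0; i < n; i++) {
        sum += goCallback(i); // every iteration is a full C-to-Go transition
    }
    return sum;
}
*/
import "C"

// CallGoNTimes asks C to invoke the Go callback n times and returns the sum.
func CallGoNTimes(n int) int64 { return int64(C.call_go_n_times(C.long(n))) }
```

```go
// file: callback_test.go — each b.N iteration is one C-to-Go round trip,
// roughly what every next/column/rowid/eof call on a vtable cursor pays.
package cgocost

import "testing"

func BenchmarkCToGoCallback(b *testing.B) {
	CallGoNTimes(b.N)
}
```

Note this measures the case where the call chain starts from Go; callbacks arriving on threads the Go runtime doesn't already know about (as with a loadable extension in a non-Go host) are likely to be more expensive still.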

@anacrolix
Contributor Author

It's also worth noting that virtual tables are just slow in general in SQLite. I'm not sure why, but it's quite noticeable, and it's worth avoiding them entirely (even when they're implemented in Rust, and presumably C too) and doing the equivalent behaviour directly in your application if possible.
