Virtual table implementations are slow #20

Open
anacrolix opened this issue Oct 27, 2022 · 6 comments
Comments

@anacrolix
Contributor

I have a virtual table implementation that generates about 40 rows per filter on average. With 12 million inputs it gets very slow. I think I've tracked the issue down to calls from C into Go being very slow: 60% of the CPU time is spent scheduling a goroutine to handle each incoming call.

One possible solution is an optimized version of a vtable cursor that doesn't need to call into Go for column, next, rowid, eof, and maybe close (but not filter). I think filter could return a fully populated set of rows and columns, and the C code could walk that. This would cut the C-to-Go calls by 5 per row, so about 200 calls per filter in my use case.
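For illustration, a minimal sketch of what such an opt-in interface might look like on the Go side (the names `Row` and `ArrayFilterer` are hypothetical, not part of this package's API, and the argument types are only stand-ins):

```go
// Hypothetical opt-in interface: a cursor variant whose filter call
// materializes the whole result set, so the C side can walk rows and
// columns without calling back into Go for next/column/rowid/eof.
type Row struct {
	Rowid   int64
	Columns []interface{} // one value per declared column
}

type ArrayFilterer interface {
	// FilterAll plays the role of Filter, but returns every matching
	// row in a single call instead of exposing a per-row cursor to C.
	FilterAll(indexNumber int, indexString string, args ...interface{}) ([]Row, error)
}
```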

@riyaz-ali
Owner

I like the idea of a buffered cursor! It could be useful in cases where you anyway perform a full fetch in the Filter() function and then use the Next() + Column() + Eof() functions to increment a counter and read the value at the current counter.

That being said, I'd still like it to be opt-in rather than the default behavior, as the current behavior is simple and generic enough to support this and other use cases too.
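As a rough illustration of that pattern against the current API shape (the method signatures below are simplified stand-ins rather than the package's actual VirtualCursor signatures, and fetchEverything is a hypothetical helper):

```go
// A cursor that does the full fetch in Filter and then serves
// Next/Eof/Rowid from memory by incrementing a counter. Illustrative only.
type bufferedCursor struct {
	rows [][]interface{} // one slice of column values per row
	pos  int
}

// fetchEverything stands in for whatever upstream query produces the rows.
func fetchEverything() ([][]interface{}, error) {
	return [][]interface{}{{"a", 1}, {"b", 2}}, nil
}

func (c *bufferedCursor) Filter( /* idxNum, idxStr, args... */ ) error {
	rows, err := fetchEverything() // run the whole query up front
	if err != nil {
		return err
	}
	c.rows, c.pos = rows, 0
	return nil
}

func (c *bufferedCursor) Next() error           { c.pos++; return nil }
func (c *bufferedCursor) Eof() bool             { return c.pos >= len(c.rows) }
func (c *bufferedCursor) Rowid() (int64, error) { return int64(c.pos), nil }

// Column would report c.rows[c.pos][i] for column i via whatever
// result-reporting mechanism the package provides; Close would drop c.rows.
```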

Also, for the numbers you've shared, can you provide a reproducible example and/or CPU and memory profiles?

@anacrolix
Contributor Author

Yeah, it would definitely be opt-in. I suspect it'd need an alternative interface to VirtualCursor, possibly surfaced through an alternate Open method.
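Very roughly, the binding layer could detect the opt-in at cursor-open time without changing the default path (ArrayFilterer here is the hypothetical interface sketched above; only the VirtualCursor and Open names come from the existing API):

```go
// Sketch of the opt-in dispatch: if the cursor returned by Open also
// implements the hypothetical ArrayFilterer, register C-side callbacks
// that walk the materialized rows; otherwise keep the per-row callbacks.
func setupCursor(tab interface{ Open() (VirtualCursor, error) }) error {
	cur, err := tab.Open()
	if err != nil {
		return err
	}
	if buffered, ok := cur.(ArrayFilterer); ok {
		_ = buffered // fast path: rows materialized once per filter, walked in C
		return nil
	}
	_ = cur // slow path: per-row next/column/rowid/eof calls from C into Go
	return nil
}
```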

@anacrolix
Contributor Author

(CPU profile screenshot: pprof002)

@anacrolix
Contributor Author

anacrolix commented Oct 28, 2022

Trying the idea out in https://github.com/anacrolix/riyaz-ali-sqlite/compare/master...anacrolix:riyaz-ali-sqlite:array-filter?expand=1 (I had to put my C callbacks in their own file; otherwise I got duplicate symbol errors, or no symbol at all if I made them static). It's about 3x faster for a test using count(*). That's to be expected: for every filter, I save the next and eof calls into Go. If columns were read, it would speed up further by exactly as many extra column calls as were avoided. Unfortunately it's still very slow; just the cost of hitting filter is very high. The sys cost dropped a lot though, from 39s down to 3.8s, not sure why, probably less allocation. Here's the changed CPU pprof.

(CPU profile screenshot: pprof003)

@anacrolix
Contributor Author

My estimates were off: the speedup is much larger (an intermediate function unrelated to the vtable was adding significant CPU cost to my tests). For every filter, I was making about 40 eof and 40 next calls. Removing those reduced my runtime from ~2m30s to ~50s, where in both cases about ~35s was overhead elsewhere, so approximately an 8x speedup. Opting in to allocating everything up front is fine; see the branch. Switching to Rust, however, the overhead is negligible and the speedup is off the charts. It's definitely worth noting for anyone else trying this that the scheduler cost of calling from C into Go is very, very high.
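For anyone who wants to measure that transition cost in isolation, here's a small self-contained benchmark sketch, unrelated to this repository (the package and function names are made up). It keeps the C loop in a separate file's preamble, which also sidesteps the duplicate-symbol issue mentioned above:

```go
// file: callback.go — exports a trivial Go function to C.
package cgocost

import "C"

//export goCallback
func goCallback(x C.long) C.long { return x }
```

```go
// file: helper.go — the C loop lives in this file's preamble, so the
// definition exists exactly once and isn't duplicated across files.
package cgocost

/*
extern long goCallback(long x); // implemented in Go via //export

long call_go_n_times(long n) {
    long sum = 0;
    for (long i = 0; i < n; i++) {
        sum += goCallback(i); // every iteration is a full C-to-Go transition
    }
    return sum;
}
*/
import "C"

// CallGoNTimes asks C to invoke the Go callback n times and returns the sum.
func CallGoNTimes(n int) int64 { return int64(C.call_go_n_times(C.long(n))) }
```

```go
// file: callback_test.go — each b.N iteration is one C-to-Go round trip,
// roughly what every next/column/rowid/eof call on a vtable cursor pays.
package cgocost

import "testing"

func BenchmarkCToGoCallback(b *testing.B) {
	CallGoNTimes(b.N)
}
```

Note this measures the case where the call chain starts from Go; callbacks arriving on threads the Go runtime doesn't already know about (as with a loadable extension in a non-Go host) are likely to be more expensive still.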

@anacrolix
Contributor Author

It's also worth noting that virtual tables are just slow in general in SQLite. I'm not sure why, but it's quite noticeable, and it's worth avoiding them entirely (even when they're implemented in Rust, and presumably C too) and doing the equivalent behaviour directly in your application if possible.
