-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(sql): fix several bugs in aggregation (avg/sum/first_value) window functions #4429
fix(sql): fix several bugs in aggregation (avg/sum/first_value) window functions #4429
Conversation
thank you for the contribution once again, good find! Fuzz test is very useful, we need more of those. It looks to me though that fuzz test just might be made to use SQL. What do you think? Outside of SQL tests are fragile. For example, you can generate random number of data partitions. If you know partition boundaries, you can run vanilla sum() on these boundaries and compare result with window function. Hope this makes sense? Regarding fuzz tests you have:
|
I'm not sure if fuzz tests in this specific scenario should be on the SQL level. I see following benefits from current "low-level" implementation of fuzz-tests
But, I agreed that current tests are pretty fragile (at least Let me know your opinion about this suggestion and points above, @bluestreak01. Regarding the general fuzz-tests structure I agreed with you. I guess it's better for me to look at some good example of existing fuzz tests and make mine similar to them. Can you point me to some good references in the codebase? (a bit of off-topic): also, I kind of like how fuzzing implemented in Go stdlib. On the high level it has 2 modes: test mode & fuzz mode.
Not sure if same DX can be achieved in Java (in Go single test can dynamically generate sub-tests, e.g. based on the directory content, which help in the debugging), but I think that 2 modes for fuzz tests can be useful (not sure, maybe QuestDB already have this). |
@bluestreak01, what do you think? |
BTW, we can extract fuzz test out from this branch and merge patch of window functions. And continue work/discussion on fuzz tests in separate PR. @bluestreak01, @nwoolmer - thoughts? |
hi Nikita, can you keep the fuzz test but start it from random seed The fix itself is very good and useful! |
@bluestreak01, yes, sure. Done. Now all tests in Is it enough or should I reduce it even further? |
- total run time of WindowFunctionUnitTest now ~7sec (on my local laptop)
dbd058a
to
5e15f43
Compare
core/src/test/java/io/questdb/test/griffin/engine/window/WindowFunctionUnitTest.java
Outdated
Show resolved
Hide resolved
core/src/test/java/io/questdb/test/griffin/engine/window/WindowFunctionUnitTest.java
Outdated
Show resolved
Hide resolved
tests can be optimised further, to make execution time neglegable Could you also format the modified files using IntelliJ formatter? |
@bluestreak01, what blocks us from merging the branch 😛 ? |
PRs from forks run limited CI unfortunately. We're still dealing with the aftermath of merging previous PRs. Perhaps for the next release. We need to figure out a way to run full CI. |
Oh, ok, got it( That's fine. |
Context
New weekend - new PR to the
QuestDB
!I read the doc and noticed nice support for window aggregation functions. I started to experimenting with the queries in the live demo (it's really GREAT invention of QuestDB; I don't think I will ever start to touch the code without this live demo console!) and sample from the doc with both bounds specified started to get weird results:
This query doesn't pass basic sanity check which I performed by eye - some values in the
moving_agg
column were negative (but all prices are positive for sure!). So, there is definitely something wrong with aggregation functions.Changes
WindowFunctionUnitTest.java
with 2 fuzz tests for window functions with and without partitioning + bunch of basic unit tests from which I started to investigate the bugframeSize > 0
check were added)UNBOUNDED
partitioned window aggregation had incorrect initialization of the buffer parameters. As code written in such a way that for unbounded case only suffix outside of the frame stored in the buffer -QuestDB
need to be careful and differentiate between cases whenrangeHi = 0
andrangeHi != 0
firstIdx
can be updated to zero but it can be suddenly overwritten with some stale value ofnewFirstIdx
FirstValueOverRowsFrameFunction
implQuestions
QuestDB
almost don't use any "containers" for operating with several fields of same data structures (e.g.sum, frameSize, startOffset, ...
). Instead,QuestDB
code organized in such a way that all these fields "embedded" in the root class which performs actual computation. This approach leads to the following main issues (from my point of view):QuestDB
really need to enforce such style of coding or there are other options (which maybe rely on some clever Java compiler optimization techniques) which will allow to structure code more nicely but also have zero overhead at the runtime?