drop hints support #125

ruslandoga · 2023-09-19T08:38:41Z

It doesn't seem like ClickHouse supports hints. In Plausible query.hints are used as a workaround to pass the SAMPLE clause.

Maybe it can be similar to input/1 helper function:

q = from t in sample("table", 0.1), select: count(t.id) * 10
# select count(t.id) * 10 from table t sample 0.1

The text was updated successfully, but these errors were encountered:

hkrutzer · 2023-09-19T10:09:37Z

I use them to add FINAL as I wrote here https://github.com/plausible/ecto_ch/pull/103/files (which I will update soon :) )

ruslandoga · 2023-09-19T10:10:41Z

It doesn't seem like FINAL is a hint though. I think ClickHouse is using settings in place of hints: ClickHouse/ClickHouse#11719 (comment)

hkrutzer · 2023-09-19T10:12:30Z

Indeed it's not a hint but it seems to be the only way to pass it into that query from Ecto. I don't think Ecto supports a settings clause?

ruslandoga · 2023-09-19T10:13:40Z

It doesn't have to go into the Ecto query, settings can be passed separately from the query.

Zarathustra2 · 2023-09-19T10:20:35Z

I think final still has some issues because it cannot move always the where clause to prewhere. See ClickHouse/ClickHouse#31411

So I would advise to not use/promote the usage of final too much.

hkrutzer · 2023-09-19T10:22:21Z

Some queries cannot be performed without FINAL but I think that is tangential to this issue.

Zarathustra2 · 2023-09-19T10:23:45Z

Really? Do you have an example? I am curious because I cannot think of an example maybe because I never encountered such a situation

hkrutzer · 2023-09-19T10:35:09Z

When using an AggregatingMergeTree and you need the merged result for example

Zarathustra2 · 2023-09-19T10:37:22Z

You can use aggregating functions - sum, avg, argMaxMerge, etc. - + group by to have the same effect as your final query

hkrutzer · 2023-09-19T10:51:45Z

I don't use it like that, I merge several partial records into one final record, like this (simplified):

`id` FixedString(26),
`started_at` DateTime('UTC') MATERIALIZED ULIDStringToDateTime(id),
`disconnected_at` SimpleAggregateFunction(max, DateTime('UTC')),
`user_responded_at` SimpleAggregateFunction(max, DateTime('UTC')),
`properties` SimpleAggregateFunction(groupArrayArray, Array(String))

Additional properties (in the properties column) can be added to the record at any time, and the disconnected_at and user_responded_at are inserted later as well. They are eventually merged based on the ID column.

Zarathustra2 · 2023-09-19T10:57:42Z

But can't you just do

select id, started_at, max(disconnected_at) as disconnected_at, max(user_responded_at) as user_responded_at, groupArrayArray(properties) as properties from <table> group by id, started_at

Maybe I am not really following. Still I don't think there is any query which you can only run with final and not as a group by + aggregation.

hkrutzer · 2023-09-19T11:33:23Z

I don't know if I can, but why would I do it that way? That is a way longer query. Perhaps you are right and FINAL is not strictly speaking necessary but I don't see the advantage of writing a longer query.

Zarathustra2 · 2023-09-19T11:36:17Z

It depends on your use case. If you have a small dataset final should be fine. If you have a larger dataset final should be avoided. If you don't run into performance issues or don't care about speed then final is also fine.

hkrutzer · 2023-09-19T11:36:40Z

If I query with a group by, I get

10 rows in set. Elapsed: 6.001 sec. Processed 33.86 million rows, 802.51 MB (5.64 million rows/s., 133.73 MB/s.)

and with the FINAL modifier:

10 rows in set. Elapsed: 0.584 sec. Processed 94.00 thousand rows, 110.76 KB (160.89 thousand rows/s., 189.56 KB/s.)

With GROUP BY, Clickhouse will probably scan every row, while FINAL can be performed in a more efficient way.

When I repeat the query it is even worse:

10 rows in set. Elapsed: 6.421 sec. Processed 33.86 million rows, 802.52 MB (5.27 million rows/s., 124.98 MB/s.)

remains.

With FINAL:

10 rows in set. Elapsed: 0.096 sec. Processed 94.01 thousand rows, 111.21 KB (982.83 thousand rows/s., 1.16 MB/s.)

So this is over 70 times slower and it looks like it will slow down further with additional records.

Group by

select conversation_id, max(disconnected_at) as disconnected_at, max(user_responded_at) as user_responded_at from conversations group by conversation_id limit 10 format Vertical

final

select conversation_id, disconnected_at, user_responded_at from conversations final limit 10 format Vertical

Zarathustra2 · 2023-09-19T11:38:37Z

I mean 0.584 isn't bad depending on use case but also I dont see the full query & table so cannot say whether the query without final is performant written or not (but most likely it is not)

hkrutzer · 2023-09-19T11:42:29Z

If you say so. I will stop derailing this issue, I think I've made a clear case for FINAL.

Zarathustra2 · 2023-09-19T11:43:32Z

The case is not clear because I don't see the table definition and the queries you used that result in that huge performance diff

Edit: I am not against the PR but I am just saying that in all cases you can have at least equal performance and in most cases better performance without final modifier

ruslandoga mentioned this issue Sep 22, 2023

custom FROM experiment #129

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

drop hints support #125

drop hints support #125

ruslandoga commented Sep 19, 2023 •

edited

hkrutzer commented Sep 19, 2023

ruslandoga commented Sep 19, 2023 •

edited

hkrutzer commented Sep 19, 2023

ruslandoga commented Sep 19, 2023

Zarathustra2 commented Sep 19, 2023

hkrutzer commented Sep 19, 2023

Zarathustra2 commented Sep 19, 2023

hkrutzer commented Sep 19, 2023

Zarathustra2 commented Sep 19, 2023

hkrutzer commented Sep 19, 2023

Zarathustra2 commented Sep 19, 2023 •

edited

hkrutzer commented Sep 19, 2023 •

edited

Zarathustra2 commented Sep 19, 2023

hkrutzer commented Sep 19, 2023 •

edited

Zarathustra2 commented Sep 19, 2023 •

edited

hkrutzer commented Sep 19, 2023

Zarathustra2 commented Sep 19, 2023 •

edited

drop hints support #125

drop hints support #125

Comments

ruslandoga commented Sep 19, 2023 • edited

hkrutzer commented Sep 19, 2023

ruslandoga commented Sep 19, 2023 • edited

hkrutzer commented Sep 19, 2023

ruslandoga commented Sep 19, 2023

Zarathustra2 commented Sep 19, 2023

hkrutzer commented Sep 19, 2023

Zarathustra2 commented Sep 19, 2023

hkrutzer commented Sep 19, 2023

Zarathustra2 commented Sep 19, 2023

hkrutzer commented Sep 19, 2023

Zarathustra2 commented Sep 19, 2023 • edited

hkrutzer commented Sep 19, 2023 • edited

Zarathustra2 commented Sep 19, 2023

hkrutzer commented Sep 19, 2023 • edited

Group by

final

Zarathustra2 commented Sep 19, 2023 • edited

hkrutzer commented Sep 19, 2023

Zarathustra2 commented Sep 19, 2023 • edited

ruslandoga commented Sep 19, 2023 •

edited

ruslandoga commented Sep 19, 2023 •

edited

Zarathustra2 commented Sep 19, 2023 •

edited

hkrutzer commented Sep 19, 2023 •

edited

hkrutzer commented Sep 19, 2023 •

edited

Zarathustra2 commented Sep 19, 2023 •

edited

Zarathustra2 commented Sep 19, 2023 •

edited