
ClickHouse encounters an out-of-memory problem when querying array fields #63679

Open
asdas123123-svg opened this issue May 13, 2024 · 3 comments
Labels
question (Question?) · st-need-info (We need extra data to continue; waiting for response)

Comments

@asdas123123-svg

Hello, I ran into a serious problem while using ClickHouse and I hope to resolve it.

Specifically:
Initially the queries ran normally, but last Saturday memory usage suddenly spiked for the same query and no data could be retrieved. Checking the logs showed that the query's memory usage was too large.
[screenshot: "问题" ("problem")]
Obviously this is caused by insufficient memory. However, my table holds only 400,000 records and occupies less than 1 GB of disk space. Why does querying the array field consume so much memory, and how can I solve this problem?

@asdas123123-svg added the question label on May 13, 2024
@UnamedRus
Contributor

How big are the arrays?

What if you set max_block_size = 1000?
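A minimal sketch of applying that setting per query, assuming the `record` schema posted later in the thread:

```sql
-- Process at most 1000 rows per block, so fewer (potentially huge)
-- array values are held in memory at the same time.
SELECT id, org_log_ids, datasource
FROM record
LIMIT 10
SETTINGS max_block_size = 1000;
```

The setting can also be applied session-wide with `SET max_block_size = 1000;`.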

@davenger
Member

Yes, it's hard to figure out what consumed the memory without at least knowing the table schema and the sizes of the arrays in individual rows: if one (or several) rows contain a huge array, it has to be fully read and uncompressed into memory when the query is processed.
Also note that the progress output says: 359.48 thousand rows, 17.11 GB
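A sketch of diagnostic queries along these lines, assuming the `record` table and `org_log_ids` column from the comment below:

```sql
-- Compressed vs. uncompressed size of each column: a large gap here
-- means the column expands heavily in memory when it is read.
SELECT
    name,
    formatReadableSize(data_compressed_bytes)   AS compressed,
    formatReadableSize(data_uncompressed_bytes) AS uncompressed
FROM system.columns
WHERE table = 'record';

-- Array sizes per row: a few huge arrays can dominate memory usage.
SELECT
    max(length(org_log_ids)) AS max_len,
    avg(length(org_log_ids)) AS avg_len
FROM record;
```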

@davenger added the st-need-info label on May 13, 2024
@asdas123123-svg
Author

I increased the memory limit and then ran a query sorting by array length; I think the cause is that the arrays are too long. Is there an appropriate range for the length of this array? For example, given this table:

record {
    id          Int64 COMMENT 'ID',
    org_log_ids Array(String) DEFAULT [],
    datasource  String
}

When my statement is `SELECT id, org_log_ids, datasource FROM record LIMIT 10;`, does the array field (org_log_ids) undergo some aggregation that leads to excessive memory usage?
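A sketch of the length-based sorting query mentioned above, which identifies the rows whose arrays are largest (assuming the schema shown in this comment):

```sql
-- Rank rows by array length to find the outliers that
-- dominate memory usage when the column is read.
SELECT id, length(org_log_ids) AS n_elements
FROM record
ORDER BY n_elements DESC
LIMIT 10;
```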
