
Fixed max buffer limitation #1639

Merged
merged 1 commit into main on May 21, 2024

Conversation

chernser
Contributor

@chernser chernser commented May 16, 2024

Summary

The max buffer size can now be overridden by the user.
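A hedged sketch of how a user might supply such an override. The property key `max_buffer_size` and the helper `withMaxBuffer` are assumptions for illustration only, not taken from this PR:

```java
import java.util.Properties;

public class MaxBufferConfig {
    // Hypothetical property key -- an assumption, not confirmed by this PR.
    static final String MAX_BUFFER_SIZE_KEY = "max_buffer_size";

    // Build a connection-style Properties object carrying the override.
    static Properties withMaxBuffer(long bytes) {
        Properties props = new Properties();
        props.setProperty(MAX_BUFFER_SIZE_KEY, String.valueOf(bytes));
        return props;
    }

    public static void main(String[] args) {
        // Size the buffer near the expected dataset size (~750 MB in the
        // benchmark discussed in this conversation).
        Properties props = withMaxBuffer(750L * 1024 * 1024);
        System.out.println(props.getProperty(MAX_BUFFER_SIZE_KEY)); // 786432000
    }
}
```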

Checklist

Delete items not relevant to your PR:

  • Unit and integration tests covering the common scenarios were added
  • A human-readable description of the changes was provided to include in CHANGELOG
  • For significant changes, documentation in https://github.com/ClickHouse/clickhouse-docs was updated with further explanations or tutorials

@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

Contributor

@Paultagoras Paultagoras left a comment


This looks fine to me but I'll let Mark take a look as well - have you noticed a performance difference?

@chernser
Contributor Author

@Paultagoras
I see significant improvements when I set the buffer almost equal to the dataset size. With smaller buffers, the transfer takes about 4 seconds; with a 750 MB buffer it is around 2 seconds. For comparison, the whole dataset is generated in memory in 1600 ms (1.6 s).

I think that when the buffer is big, it may fit the whole dataset quickly, but if more space is needed, allocating another buffer will be costly. Smaller buffers have another side effect: they require more IO calls to push data to the OS level.

So a good recommendation here may be to use a buffer size greater than the estimated dataset size, to avoid expensive allocations.
Note for us: we need to research a buffer allocation algorithm that would be most effective for sending big datasets.
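The allocation-cost argument above can be sketched with plain JDK arithmetic. This assumes a capacity-doubling growth strategy (as used by e.g. `ByteArrayOutputStream`); `growthCount` is an illustrative name, not code from this PR. Each growth step is a fresh allocation plus a copy of everything written so far, which is why pre-sizing above the dataset size wins:

```java
public class BufferSizing {
    // Model: how many times must an internal buffer of size `initial` grow
    // (doubling each time) to hold `total` bytes? Each growth is a new
    // allocation plus a full copy of the data accumulated so far.
    static int growthCount(long initial, long total) {
        long capacity = initial;
        int growths = 0;
        while (capacity < total) {
            capacity *= 2;
            growths++;
        }
        return growths;
    }

    public static void main(String[] args) {
        long dataset = 750L * 1024 * 1024; // ~750 MB, as in the benchmark above
        System.out.println(growthCount(8 * 1024, dataset));   // 17 grow-and-copy cycles
        System.out.println(growthCount(dataset + 1, dataset)); // 0 -- pre-sized above dataset
    }
}
```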

@Paultagoras
Contributor


What sort of memory footprint did that have? I assume the logic of a smaller buffer is that it doesn't keep so much stored in RAM.

@chernser
Contributor Author

As I see it, with smaller buffers we have to block more frequently to copy them to the TCP layer. I'm talking about the current implementation here, but I have not tested it in this way.
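The "more IO calls with small buffers" effect can be demonstrated with plain JDK streams. The `CountingStream` sink below is an illustrative stand-in for the OS/TCP layer, not the client's actual implementation; it simply counts how often the buffered layer pushes data down:

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class FlushCount {
    // Stand-in for the OS/TCP layer: counts downward pushes (syscall analogue).
    static class CountingStream extends OutputStream {
        int pushes = 0;
        @Override public void write(int b) { pushes++; }
        @Override public void write(byte[] b, int off, int len) { pushes++; }
    }

    // Write `totalBytes` in 1 KB chunks through a buffer of `bufferSize`
    // and report how many pushes reached the underlying stream.
    static int pushesFor(int bufferSize, int totalBytes) {
        CountingStream sink = new CountingStream();
        try (BufferedOutputStream out = new BufferedOutputStream(sink, bufferSize)) {
            byte[] chunk = new byte[1024];
            for (int written = 0; written < totalBytes; written += chunk.length) {
                out.write(chunk);
            }
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
        return sink.pushes;
    }

    public static void main(String[] args) {
        int total = 1024 * 1024; // 1 MB of data
        System.out.println(pushesFor(4 * 1024, total));    // 256 pushes with a 4 KB buffer
        System.out.println(pushesFor(1024 * 1024, total)); // 1 push when buffer >= dataset
    }
}
```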

@chernser chernser merged commit a57402e into main May 21, 2024
58 checks passed
@chernser chernser deleted the fix_max_buffsize branch May 24, 2024 17:54