Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce size of buffer, stringBuffer and tape. #42

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

ZhaiMo15
Copy link

In class JsonValue, the default size of buffer and stringBuffer, as well as long[] tape in class Tape, is 34M.
But in practice it's not necessary.
This patch reduce the size of them from 34M to its actual size.

@ZhaiMo15
Copy link
Author

I believe in future, the JsonValue might be deep copied, thus the size of byte[] and long[] is important.

@ZhaiMo15 ZhaiMo15 force-pushed the ReduceBufferSize branch 2 times, most recently from e72bac3 to 9fcfa59 Compare April 16, 2024 09:50
In class JsonValue, the default size of buffer and stringBuffer, as
well as long[] tape in class Tape, is 34M.
But in practice it's not necessary.
This patch reduce the size of them from 34M to its actual size.
@@ -34,7 +34,8 @@ public JsonValue parse(byte[] buffer, int len) {
}

private byte[] padIfNeeded(byte[] buffer, int len) {
if (buffer.length - len < PADDING) {
if (buffer.length - len < PADDING && len < capacity) {
byte[] paddedBuffer = new byte[len + PADDING];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason for maintaining the paddedBuffer all the time, regardless of whether it is necessary or not, is to avoid allocations on hot paths. However, I see at least two issues with padding in general. Firstly, it requires adding this extra branch. Secondly, it complicates the API: on one hand, the user doesn't need to be aware of it, but on the other hand, if they want to achieve the best performance, they should pad the input. Therefore, I've been considering removing the need for padding altogether. It should be possible, although I haven't thoroughly researched this topic.

To summarize: I'd start by verifying if removing the padding is possible. If so, I'd remove it and test the performance of the parser. If there is no regression compared to the current version with padding, we have a win-win situation.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think by design, the padding is 64 bytes. However, the 'paddedBuffer' is 34MB, that's such a waste. I'm just change the padding size to 64 bytes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with you that this is a waste. However, in your approach, you are potentially allocating a new array on every call of the parse method, which can be costly.

I've been working on removing the padding entirely. It's a bit complicated, but we will see if it is feasible. I'll report back.


int stringBufferLen = stringParser.getStringBufferIdx();
byte[] newStringBuffer = new byte[stringBufferLen];
System.arraycopy(stringBuffer, 0, newStringBuffer, 0, stringBufferLen);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this change related to #36? I'm asking because I'm a bit concerned that we need another allocation on the parsing path.

Copy link
Author

@ZhaiMo15 ZhaiMo15 Apr 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. I think there must be an allocation somewhere, if we want to save the information of "old" data.

Copy link
Author

@ZhaiMo15 ZhaiMo15 Apr 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And as I mentioned above, the default size of buffer and stringBuffer, as well as long[] tape in class Tape, is 34M. If we allocated 34M * 3 for each element, the cost is way too much.

@ZhaiMo15 ZhaiMo15 closed this Jun 11, 2024
@ZhaiMo15 ZhaiMo15 deleted the ReduceBufferSize branch June 11, 2024 03:05
@ZhaiMo15 ZhaiMo15 restored the ReduceBufferSize branch June 11, 2024 03:25
@ZhaiMo15 ZhaiMo15 reopened this Jun 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants