Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

This exposes 'reset' for object and array instances. #1696

Merged
merged 10 commits into from
Aug 21, 2021

Conversation

lemire
Copy link
Member

@lemire lemire commented Aug 12, 2021

I am not super happy about exposing these methods, but it appears that we need to be able to scan arrays and objects twice (@NicolasJiaxin ?) in some instances. These rewind functions will do just that.

It is somewhat dangerous and should be used with caution because if you access string values multiple times, it will be unsafe (and inefficient). That is, they do not rewind the string buffer.

Note that this functionality competes with @jkeiser 's 'fork' idea but I really would prefer to release 1.0 in the near future and the fork functionality seems speculative. Currently, we have always just one json_iterator instance, and it seems that fork would allow us to copy that... which could potentially expose other bugs.

The benefit of this implementation is that it is non-invasive, so realistic for release 1.0.

If it works out for 1.0, we can revisit this design post-1.0.

@jkeiser
Copy link
Member

jkeiser commented Aug 13, 2021

I'm OK with it. We can mark it dangerous later if we come up with something better, but it is OK to have things like rewind() come with warning labels. People don't read the labels on common things, but rewind is a word that screams caution in the first place.

@lemire
Copy link
Member Author

lemire commented Aug 13, 2021

@jkeiser I'd rather not expose it if we can avoid it, but @NicolasJiaxin has been doing some serious work with it and he'll be able to give us feedback soon enough.

@lemire
Copy link
Member Author

lemire commented Aug 13, 2021

@jkeiser Thanks for the feedback btw.

@jkeiser
Copy link
Member

jkeiser commented Aug 16, 2021

I agree we should try not to expose it if we can do fork or something safer/clearer instead, just saying it's not necessarily a disastrous API sin in this case :)

@lemire
Copy link
Member Author

lemire commented Aug 16, 2021

@NicolasJiaxin I have exposed the bug you reported as part of this PR.

@NicolasJiaxin
Copy link
Contributor

@lemire If this is ever merged, maybe you could consider adding arrays/objects/values rewind for at_pointer() to enable multiple queries for arrays/objects/values (the same way we added rewind() for document::at_pointer() to enable multiple queries on a single documents). Of course, values should be either arrays or objects. Something like this could do (pseudocode):

value::rewind() {
    json_type t;
   SIMDJSON_TRY(type().get(t));
   switch (t)
   {
   case json_type::array:
      array(value).rewind();
      break;
    case json_type::object:
      object(value).rewind();
      break;
    default:
      return INCORRECT_TYPE;
    }
}

Also, this does not even have to be public. It can be protected, and I think this would work internally for at_pointer(). Just a thought..

@lemire
Copy link
Member Author

lemire commented Aug 18, 2021

If this is ever merged

Yes. I expect to merge this PR. I think that @jkeiser has de facto approved it (he wrote "I am OK with it") so it is not silly.

maybe you could consider adding arrays/objects/values rewind for at_pointer() to enable multiple queries for arrays/objects/values (the same way we added rewind() for document::at_pointer() to enable multiple queries on a single documents). Of course, values should be either arrays or objects.

At a glance, it seems like a good proposal but I am concerned with a confusion between rewind() for documents and rewind for arrays and objects. They do different things. A document rewind resets everything. All references are gone and the string buffer is cleared. An array/object rewind just points you back at the beginning. It does not clear the string buffer. You still can't consume values multiple times.

So, suppose I have this...

{"a":"my string"}

And I do std::string_view(at_pointer("/a")) multiple times on a document, each time "my string" is moved to our string buffer. That's what is backing the std::string_view instance. If the at_pointer is the document, then each call rewinds the string buffer.

However, we don't do that for object and arrays. We just set back the point at the beginning. If someone grabbed a string value, it got written to our string buffer. So if you see a loop where you repeatedly access the same values in the same array, you will end up with a buffer overflow.

We could still support what you seek, but we would need that a rewind call on the an object or array resets the string buffer.

I will open a distinct issue. (Update: see #1701 )

@lemire
Copy link
Member Author

lemire commented Aug 18, 2021

I am now thinking that we could call rewind for object and array "reset" instead of "rewind" so that it is clearly distinct from "document.rewind()". It is indeed a distinct functionality and I used the same name for both. Thoughts? cc @jkeiser @NicolasJiaxin

@NicolasJiaxin
Copy link
Contributor

I am now thinking that we could call rewind for object and array "reset" instead of "rewind" so that it is clearly distinct from "document.rewind()". It is indeed a distinct functionality and I used the same name for both. Thoughts? cc @jkeiser @NicolasJiaxin

Yes I think that is a nice distinction to make, rewind is exclusive for documents

@lemire
Copy link
Member Author

lemire commented Aug 18, 2021

@NicolasJiaxin I will make the change now.

@lemire lemire changed the title This exposes 'rewind' for object and array instances. This exposes 'reset' for object and array instances. Aug 18, 2021
@lemire
Copy link
Member Author

lemire commented Aug 18, 2021

@NicolasJiaxin I have just renamed these functions to reset.

Note that your proposal is not a bad one and I think we will implement it. I just don't want us and the users to get confused.

@NicolasJiaxin
Copy link
Contributor

@lemire Yes, I just wanted to share the thought, because I came across an issue in lemire/rcppsimdjson#2 where value.reset() might be useful, but I found other, maybe better, ways around it.

@lemire
Copy link
Member Author

lemire commented Aug 18, 2021

I found other, maybe better, ways around it.

I love working with smart people.

@lemire
Copy link
Member Author

lemire commented Aug 21, 2021

I am going to merge this. I am not super happy, but it looks like a step forward.

@lemire lemire merged commit 6bed34a into master Aug 21, 2021
@lemire lemire deleted the dlemire/exposing_array_object_rewind branch August 21, 2021 14:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants