Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[C++] string_view for all target compilations #3109

Closed
xTachyon opened this issue Mar 6, 2021 · 4 comments
Closed

[C++] string_view for all target compilations #3109

xTachyon opened this issue Mar 6, 2021 · 4 comments

Comments

@xTachyon
Copy link
Contributor

xTachyon commented Mar 6, 2021

std::string_view is a great addition to the standard. It provides a great and cheap way to pass const strings around that could come from a std::string, from a c string or even from some in-memory buffer that doesn't even have to be null terminated.

I noticed there was an attempt to add usages of this type in #2847 by checking in a macro for 17. However, since this still is C++11 library, this poses some problems:

  1. Firstly, the obvious one that I'm the most concerned about is that for builds made on C++11 and C++14 there is still the unnecessary string copy that compilers can't yet optimize out.
    For the ANTLRInputStream class, even if it has a constructor that takes const char*, size_t that looks like it doesn't make a copy, that actually creates a std::string and it passes down to the other constructor. So even on a C++17 build this class is not saved from extra copies.

  2. As someone pointed out in another PR, this creates 2 separate ABIs for the library, making it impossible to link antlr built with 17 and a binary built with 11 or vice versa.

  3. This significantly reduces the ability detect compilation errors and/or runtime errors without building and running tests with the library compiled with both 17 and pre-17.

The solutions that I thought of involve replacing std::string_view with a substitute or removing it at all. This can be done because std::string_view doesn't use any language feature introduced in 17.

  1. string-view-lite looks like a full implementation of string_view and can be configured to not use std::string_view even when building with 17 in order to not break ABI again.

  2. string_view is a trivial class to implement and so far it doesn't look like Antlr will need anything more the constructors, .size() and .data(). One can implement a small StringView that can be embedded directly in this project.

  3. Maybe it would be a good idea to take a step back and just provide overloads that take const char*, size_t for any constructor/method that doesn't copy the input string and remove string_view altogether.

To be fair, I am not a big fan of the last suggestion. I think either suggestion 1 or 2 would be a good fit for the project, but I like 2 the most.

What do you think about this?

@xTachyon
Copy link
Contributor Author

xTachyon commented Mar 6, 2021

Sorry for pinging but it seems you're the guy who I should talk to @mike-lischke .

@mike-lischke
Copy link
Member

I'm trying to figure out what you are after here and, to my shame, I currently fail at that. The C++ runtime is not per se a C++11 lib. I use it with C++17 already long before PR #2847 was added.

Are you saying the addition of the string_view c-tor was a bad decision and you want it removed? With some distance to when PR #2847 was filed I would now prefer to have another c-tor, instead of modifying one depending on which C++ dialect is used to compile the C++ runtime. But otherwise this was a good addition.

When using the string_view c-tor only one copy of the input is made: when UTF-8 is converted to UTF-32. The :ANTLRInputStream(const char data_[], size_t numberOfActualCharsInArray) is rather a reminiscence to the Java API, but you don't have to use it and it doesn't depend on the C++ dialect. For the stream input we need a temporary copy because we don't have a UTF conversion function which can take a stream.

For the first point 3: I don't get what you mean here. If the C++ dialect doesn't match you will get a compilation error.

About your solutions: can you explain what you actually want to solve? It's not clear to me. Can you perhaps file a PR so I can better understand what you wanna have changed?

@mike-lischke
Copy link
Member

mike-lischke commented Mar 9, 2021

@xTachyon After doing a few tests I realized that the change to compile the c-tor of ANTRLInputStream differently depending on the used C++ dialect caused trouble even in my existing applications. So I carried out my idea to use 2 different c-tors instead. This should take care of all relevant scenarios. Please review my PR #3113.

@xTachyon
Copy link
Contributor Author

xTachyon commented Mar 9, 2021

My problem with the current design is that if you're using the library with pre-17, the only way to use ANTLRInputStream without making temporary copies is to pass an already constructed std::string, which forces you to make a copy if you don't have the std::string already constructed with just the content you want to pass. So from my point of view there's no way to use this class without making a copy.

To solve that I proposed to get a string_view-like type in pre-17 or to make sure the const char*, size_t overloads don't copy the memory. I understand that the utf8->utf32 conversion makes a copy, but my problem was the copy made before that function gets called. I also understand the reason to make a copy when the class is used with a stream.

I also suspect that a string_view for pre-17 could be used in a lot more places, like the static variables generated by the parser that currently use std::string, but that's a story for another time.

About your PR, I agree that the dual ctors idea solve the ABI change. The only thing I'd add is that load(const std::string &input) should be available always, similar to what you did in the ctor, so the interface is similar.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants