Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UBSAN "index out of bounds" for generated C++ code #2073

Closed
jeroen opened this issue Sep 6, 2016 · 15 comments
Closed

UBSAN "index out of bounds" for generated C++ code #2073

jeroen opened this issue Sep 6, 2016 · 15 comments

Comments

@jeroen
Copy link

jeroen commented Sep 6, 2016

UBSAN shows index out of bounds errors for generated C++ code when setting the repeated fields. Is this a bug or am I using it incorrectly? An example test.proto:

syntax = "proto2";
package mytest;

message MYTEST {
  repeated double realValue = 2;
  repeated sint32 intValue = 3;
}

Compile to C++ using protoc test.proto --cpp_out=. Then main.cc looks like this:

#include "test.pb.h"

int main(){
  mytest::MYTEST x;
  x.add_realvalue(123.123);
  x.add_intvalue(123L);
  return 0;
}

Compile everything with sanitizer flags:

g++ -fsanitize=address,undefined,bounds-strict main.cc test.pb.cc \
  $(pkg-config --cflags --libs protobuf)

And then when I run it:

root@8e50c80421c4:~/test# ./a.out
/usr/include/google/protobuf/repeated_field.h:1289:35: runtime error: index 4 out of bounds for type 'double [1]'
/usr/include/google/protobuf/repeated_field.h:1289:35: runtime error: index 4 out of bounds for type 'int [1]'
/usr/include/google/protobuf/repeated_field.h:282:38: runtime error: index 4 out of bounds for type 'int [1]'
/usr/include/google/protobuf/repeated_field.h:282:38: runtime error: index 4 out of bounds for type 'double [1]'

I am using stock gcc and protobuf from Debian Testing:

root@8e50c80421c4:~/test# g++ --version
g++ (Debian 6.1.1-11) 6.1.1 20160802
Copyright (C) 2016 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

root@8e50c80421c4:~/test# protoc --version
libprotoc 3.0.0
@xfxyjwf
Copy link
Contributor

xfxyjwf commented Sep 6, 2016

In protobuf we defined the storage of a repeated field as:

  struct Rep {
    Arena* arena;
    Element elements[1];
  }

but actually used Rep* to point to a larger space and may access rep->elements[4]. I guess this triggers the UBSAN check.

No idea how to make UBSAN happy with that but the protobuf code is working as intended.

@seishun
Copy link
Contributor

seishun commented Sep 9, 2016

It seems UBSAN is being overzealous here. As far as I know, there is no section in the standard that explicitly disallows writing out of bounds if there is sufficient storage.

@khingblue
Copy link
Contributor

@seishun I guess that's triggered by bounds-strict which passed to g++.

@jwijffels
Copy link

Hello there. I have the same issue with an R wrapper around the sentencepiece C++ library. https://github.com/google/sentencepiece
That library also uses protobuf-lite and gives exactly the same UBSAN message (snippet of log at https://www.stats.ox.ac.uk/pub/bdr/memtests/gcc-UBSAN/sentencepiece/ shown below)

third_party/protobuf-lite/google/protobuf/repeated_field.h:1537:35: runtime error: index 1 out of bounds for type 'void *[1]'
    #0 0x7fa7d1e1b6d6 in google::protobuf::RepeatedPtrField<sentencepiece::ModelProto_SentencePiece>::TypeHandler::Type* google::protobuf::internal::RepeatedPtrFieldBase::Add<google::protobuf::RepeatedPtrField<sentencepiece::ModelProto_SentencePiece>::TypeHandler>(google::protobuf::RepeatedPtrField<sentencepiece::ModelProto_SentencePiece>::TypeHandler::Type*) third_party/protobuf-lite/google/protobuf/repeated_field.h:1537
    #1 0x7fa7d1e1b6d6 in google::protobuf::RepeatedPtrField<sentencepiece::ModelProto_SentencePiece>::Add() third_party/protobuf-lite/google/protobuf/repeated_field.h:1977
    #2 0x7fa7d1e1b6d6 in sentencepiece::ModelProto::add_pieces() sentencepiece/src/builtin_pb/sentencepiece_model.pb.h:3390
    #3 0x7fa7d1e1b6d6 in sentencepiece::TrainerInterface::Serialize(sentencepiece::ModelProto*) const sentencepiece/src/trainer_interface.cc:469
    #4 0x7fa7d1e5f72f in sentencepiece::TrainerInterface::SaveModel(absl::string_view) const sentencepiece/src/trainer_interface.cc:509
    #5 0x7fa7d1e6beee in sentencepiece::TrainerInterface::Save() const sentencepiece/src/trainer_interface.cc:547
    #6 0x7fa7d1c31230 in sentencepiece::character::Trainer::Train() sentencepiece/src/char_model_trainer.cc:57
    #7 0x7fa7d1ddb0d4 in sentencepiece::SentencePieceTrainer::Train(sentencepiece::TrainerSpec const&, sentencepiece::NormalizerSpec const&) sentencepiece/src/sentencepiece_trainer.cc:52
    #8 0x7fa7d1de55f7 in sentencepiece::SentencePieceTrainer::Train(sentencepiece::util::min_string_view) sentencepiece/src/sentencepiece_trainer.cc:120
    #9 0x7fa7d21475a9 in spc_train(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) /data/gannet/ripley/R/packages/tests-gcc-SAN/sentencepiece/src/rcpp_sentencepiece.cpp:11

@xfxyjwf / @seishun Is there going to be a fix for these address sanitizer issues?

@jwijffels
Copy link

@TeBoring
Can we reopen this issue. I'm using protobuf in R package sentencepiece https://cran.r-project.org/package=sentencepiece which wraps https://github.com/google/sentencepiece from a google colleague of yours which uses protobuf.
I'm not allowed not put an update of this R package to CRAN because of exactly this Address Sanitizer issue reported here.

@acozzette
Copy link
Member

@jwijffels I think you are right that this is undefined behavior. It is called a "flexible array member" and is technically valid in C but not C++. However, I found from researching this that it appears to be a pretty common pattern and is supported by GCC, Clang, and MSVC. I'm not sure that we can justify spending the time to fix this just to be technically compliant with the standard, even though all major compilers already behave the expected way. The feature may end up in a future C++ standard anyway. Would the CRAN owners perhaps be receptive to making an exception if they knew more of the context around this?

@acozzette acozzette reopened this Jul 1, 2020
@jeroen
Copy link
Author

jeroen commented Jul 1, 2020

@acozzette the cran maintainers don't accept that. They specifically enforce that all code follows the standards, because it should also run on scientific supercomputers with very unusual architectures and compilers, such as intel icc, solaris studio, AIX, and so on. Therefore undefined behavior is not allowed, even if it still works in gcc/clang.

So if you believe this is valid behavior, it should be fixed in gcc-ubsan.

If you believe fixing this would negatively affect performance, you could condition it in a macro, such that we can compile with -D_STRICT_CXX_STANDARD or something like that so we can use that in the R bindings.

@jwijffels
Copy link

@acozzette I already explained cran about the expanding arrays but it was a clear 'no can't do any more'.
Is this project already using OSS-Fuzz, this issue would be a great fit. Looking forward to seeing improvements on possible fixes for this address sanitizer issue.

@acozzette
Copy link
Member

We now have a fix for this but it is just waiting to be synced from our internal repo to GitHub.

@jwijffels
Copy link

🥳

@jeroen
Copy link
Author

jeroen commented Jul 31, 2020

@acozzette can you point us to the commit, and update this issue if it has been released?

@jeroen
Copy link
Author

jeroen commented Aug 14, 2020

@acozzette has this landed yet?

@acozzette
Copy link
Member

@jeroen This has now landed in commit 95e6c5b.

@jeroen
Copy link
Author

jeroen commented Aug 19, 2020

Thanks. Will this only be part of the 4.0 release, or also get fixed in 3.x ?

@acozzette
Copy link
Member

We were originally planning to release 4.0 but decided instead to just go with 3.13 for now (which we released at the end of last week). I'm not sure if our next release will be 3.14 or 4.0, but either way it will include this fix.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants