Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Buffer overflow with -CF -8 options and 0x00 0xFF input #628

Open
szhorvat opened this issue Feb 25, 2024 · 23 comments
Open

Buffer overflow with -CF -8 options and 0x00 0xFF input #628

szhorvat opened this issue Feb 25, 2024 · 23 comments

Comments

@szhorvat
Copy link

AddressSanitizer detect a global buffer overflow in all our scanners (five different ones) when compiled with the -CF -8 options and fed with a two-byte input of 0x00 0xFF.

An example of one of the simpler scanners is https://github.com/igraph/igraph/blob/58e01aa8594b98c198118b507f29186854cdbc3b/src/io/ncol-lexer.l

The crash happens on the first line within a for loop like this:

		for ( yy_c = YY_SC_TO_UI(*yy_cp);
		      (yy_trans_info = &yy_current_state[yy_c])->
		yy_verify == yy_c;
		      yy_c = YY_SC_TO_UI(*++yy_cp) )
			yy_current_state += yy_trans_info->yy_nxt;
		}

Before investigating further, and putting in the time to produce a minimal example, I wanted to ask if similar issues are known.

@Mightyjo
Copy link
Contributor

Mightyjo commented Feb 25, 2024 via email

@szhorvat
Copy link
Author

Yes, correct.

@Mightyjo
Copy link
Contributor

Mightyjo commented Feb 25, 2024

This wasn't what I thought. It was more interesting.

I think everything is fine. Here's what I see happening and why I think ASAN is mad. yy_current_state is a pointer into a state transition table. (The specific table changes, but ASAN would give a different error if it weren't initialized to one of them.)

With your example scanner and using the -CF, "full speed", option, yy_current_state is initialized pointing into the 1D array of start states:
yy_current_state = yy_start_state_list[YY_G(yy_start)];

A couple of lines later it's re-assigned to point into the 2D transition table yy_nxt:
while ((yy_current_state = yy_nxt[yy_current_state][ M4_EC(YY_SC_TO_UI(*yy_cp)) ]) > 0) {

Then without further re-assignment, we hit the block on which ASAN halted:

for ( yy_c = YY_SC_TO_UI(*yy_cp);
      (yy_trans_info = &yy_current_state[yy_c])->yy_verify == yy_c;
      yy_c = YY_SC_TO_UI(*++yy_cp) )
    yy_current_state += yy_trans_info->yy_nxt;
}

I think the second assignment sets ASAN's assumption about yy_current_state's size and dimension. When we immediately violate those assumptions by treating it as a 1D array and pointer again ASAN throws the global buffer overflow error.

I don't think there's really an out-of-bounds access here. I think we just broke the Address Sanitizer by feeding it conflicting type information. Please post an ASAN log if you can. I'm curious what it thinks is happening and curious whether we can give it better hints.

BREAK BREAK

Something you might see in the future, and this is a fun one: In Fullspeed mode, flex routinely accesses yy_current_state as a 1D array with negative indexes. You'll mostly see this as yy_current_state[-1].yy_nxt when we generating backing-up actions.

@szhorvat
Copy link
Author

szhorvat commented Feb 25, 2024

I don't think there's really an out-of-bounds access here. I think we just broke the Address Sanitizer by feeding it conflicting type information. Please post an ASAN log if you can. I'm curious what it thinks is happening and curious whether we can give it better hints.

Sure, the full log is at the end.

This issue, as well as #627, came up while using OSS-fuzz, a fuzz testing service for open-source projects. With one of the new fuzz targets we started to get timeouts, and it turned out that the Flex scanner was surprisingly slow on some edge cases that the fuzzer tended to hit. Using either -Cf or -CF sped it up very considerably. However, with -Cf we hit #627 and with -CF this ASan issue. The problem is that this is blocking fuzzing. At this point I think I'll just try to address the bad performance in other ways than generating full tables (if you have any suggestions, let me know).

ASan is not supposed to have false positives, but it could of course have bugs. I was using Clang 17.0.6 on macOS for this test, and I believe OSS-fuzz is currently using Clang 15. These ASan logs are from this program with a 2-byte input of 0x00 0xFF, but this program is probably not something you should try to work with directly on your end.

"located 0 bytes after global variable 'yy_transition'" seems a bit strange in the logs below. If I understand this right, we should be hitting inside yy_transition, correct? I'm not sure what "0 bytes after" means.

==17951==ERROR: AddressSanitizer: global-buffer-overflow on address 0x000102b68ba4 at pc 0x000102b11d30 bp 0x00016d413a20 sp 0x00016d413a18
READ of size 2 at 0x000102b68ba4 thread T0
    #0 0x102b11d2c in igraph_ncol_yylex ncol-lexer.c:1335
    #1 0x102b0bb88 in igraph_ncol_yyparse ncol-parser.c:1409
    #2 0x102b03f90 in igraph_read_graph_ncol ncol.c:157
    #3 0x1029edcb0 in LLVMFuzzerTestOneInput read_ncol.cpp:41
    #4 0x102b3afac in fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned long) FuzzerLoop.cpp:614
    #5 0x102b2894c in fuzzer::RunOneTest(fuzzer::Fuzzer*, char const*, unsigned long) FuzzerDriver.cpp:324
    #6 0x102b2d934 in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) FuzzerDriver.cpp:859
    #7 0x102b55fe0 in main FuzzerMain.cpp:20
    #8 0x182f810dc  (<unknown module>)
    #9 0x312bfffffffffffc  (<unknown module>)

0x000102b68ba4 is located 0 bytes after global variable 'yy_transition' defined in '/Users/szhorvat/Repos/igraph-main/igraph/build/src/io/parsers/ncol-lexer.c' (0x102b66f40) of size 7268
SUMMARY: AddressSanitizer: global-buffer-overflow ncol-lexer.c:1335 in igraph_ncol_yylex
Shadow bytes around the buggy address:
  0x000102b68900: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x000102b68980: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x000102b68a00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x000102b68a80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x000102b68b00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x000102b68b80: 00 00 00 00[04]f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9
  0x000102b68c00: f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9
  0x000102b68c80: f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9
  0x000102b68d00: f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9
  0x000102b68d80: f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9
  0x000102b68e00: f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb

@Mightyjo
Copy link
Contributor

Regarding your question on StackOverflow, Piotr Siupa has the right idea. Declare your own string accumulator, then append to it in an action for <STRBODY>[^\0"]. Don't include a repetition operator on that rule. Match one character at a time. Add a rule like <STRBODY>\" to match the end of the string and copy the accumulator to yytext. You'll probably need to use the %array option in this case. Altogether, that will avoid keeping state for back-tracking and your standard scanner will be faster without -Cf or -CF.

Still thinking about the ASAN log. That's odd stuff.

@Mightyjo
Copy link
Contributor

Two requests:

  1. Any chance you can post the C or C++ version of the generated scanner? I'm reading the skeleton and sometimes it helps to see how that boiled down into source. I don't understand where yy_transition is coming into play, unless it's pulling that name out of a linker table.

  2. Do you get a crash, core dump, etc. when you run this test case without ASAN? ASAN will force an exit when it sees an out-of-bounds access based on its memory map.

ASAN is really good. It has no false positives in the sense that it never reports a memory access that didn't happen. It builds its shadow map based on compile-time information, though. I suspect we mess them up, but I haven't dug in deeply yet.

@szhorvat
Copy link
Author

szhorvat commented Feb 26, 2024

It seems I can reproduce this with the simple example from the manual: https://westes.github.io/flex/manual/Simple-Examples.html#Simple-Examples

        int num_lines = 0, num_chars = 0;

%%
\n      ++num_lines; ++num_chars;
.       ++num_chars;

%%

int main()
        {
        yylex();
        printf( "# of lines = %d, # of chars = %d\n",
                num_lines, num_chars );
        }
flex -CF -8 -o ex.c ex.l 
cc ex.c -fsanitize=address -g -fno-omit-frame-pointer -O0 -lfl -o ex

Find a way to create a file infile with only two byes, 0x00 and 0xFF. Maybe echo -ne '\x00\xFF' > infile.

Run ./ex < infile

@szhorvat
Copy link
Author

2. Do you get a crash, core dump, etc. when you run this test case without ASAN?

No, not so far.

@Mightyjo
Copy link
Contributor

Thanks for the example! I'll check as soon as I can. In case it matters, are you using clang, gcc, or something else?

@szhorvat
Copy link
Author

I tried both with Clang v15 and v17 (macOS on ARM) and GCC v14 (Fedora in Docker on ARM).

Here's the GCC ASan output for the minimal example:

==126==ERROR: AddressSanitizer: global-buffer-overflow on address 0x000000421048 at pc 0x0000004107b8 bp 0xffffc68aa360 sp 0xffffc68aa378
READ of size 2 at 0x000000421048 thread T0
    #0 0x4107b4 in yylex //ex.c:798
    #1 0x41543c in main //ex.l:12
    #2 0xffff831209d8 in __libc_start_call_main (/lib64/libc.so.6+0x309d8) (BuildId: 8adaf91039cb8cc151d7ea5fbb18d8f0aac3087c)
    #3 0xffff83120ab8 in __libc_start_main@GLIBC_2.17 (/lib64/libc.so.6+0x30ab8) (BuildId: 8adaf91039cb8cc151d7ea5fbb18d8f0aac3087c)
    #4 0x41026c in _start (/ex+0x41026c) (BuildId: 0a5cc01e80029fa667818cd6c58615b5be349193)

0x000000421048 is located 56 bytes before global variable '*.LC0' defined in 'ex.c' (0x421080) of size 51
  '*.LC0' is ascii string 'fatal flex scanner internal error--no action found'
0x000000421048 is located 0 bytes after global variable 'yy_transition' defined in 'ex.c:361:35' (0x420020) of size 4136
SUMMARY: AddressSanitizer: global-buffer-overflow //ex.c:798 in yylex

@szhorvat
Copy link
Author

szhorvat commented Feb 26, 2024

It's maybe interesting to mention that if I run without ASan, on macOS I get:

fatal flex scanner internal error--no action found

This doesn't look right, does it?

On Fedora, I get the expected output (regardless of whether compiling with GCC and Clang):

# of lines = 0, # of chars = 2

The Flex output is not of the same size on the two platforms, even through I used flex 2.6.4 on both. On macOS I see 1739 lines in ex.c (flex from MacPorts) and on Fedora I see 1823 lines. Not sure why, neither seem to apply significant patches:

https://github.com/macports/macports-ports/tree/master/devel/flex/files
https://src.fedoraproject.org/rpms/flex/tree/rawhide

@westes
Copy link
Owner

westes commented Feb 26, 2024

Thank you both for the very careful and thorough investigation on this issue. That we're handling a specific instance of crashing is awesome.

As a sanity check, would you confirm what version of flex you're using for this? (I didn't see it in my skim of the comments.)

Unless otherwise stated, we work on the tip of the master branch.

@szhorvat
Copy link
Author

Version 2.6.4 on all systems. A few days ago I tried to compile flex from the master branch, following these instructions, and installing all requirements listed there, but did not succeed. Some generated C headers appear to contain unevaluated code (m4 code? autoconf? I'm not very familiar with these). Any chance of a bugfix release this year? It's been 7 years since the last one, meaning 2.6.4 is old enough to be learning to write in school now ;-)

@westes
Copy link
Owner

westes commented Feb 26, 2024

Any chance of a bugfix release this year?

Yes, it's my top priority for flex at the moment. (There's just all the other things happening, so I'm not promising a specific date yet.)

@Mightyjo
Copy link
Contributor

I tried both with Clang v15 and v17 (macOS on ARM) and GCC v14 (Fedora in Docker on ARM).

I should get back to this today or tomorrow.

By macOS on ARM, are we talking about m1, m2, etc.? I don't own one of those to debug on, but it's looking like I should.

@szhorvat
Copy link
Author

By macOS on ARM, are we talking about m1, m2, etc.?

I don't think it should matter, but it's an M2. Could you not reproduce it on Intel? The same happened on the OSS-fuzz infrastructure which is x86_64 on Linux. Unfortunately I won't have direct access to Intel hardware for perhaps two more weeks.

I should get back to this today or tomorrow.

Resolving this is not urgent for me, but I hope the report will be useful for the long term improvement of Flex.

@Mightyjo
Copy link
Contributor

Just making sure I have all the info I need in case I can't reproduce. Otherwise I'll bash my head against it all weekend.

Your trouble building the main branch has me worried. We had an issue recently where main won't bootstrap on m2. I've built it on ARM though, so I don't know why. It's been really hard to debug using GHActions runners.

@Mightyjo
Copy link
Contributor

Mightyjo commented Mar 3, 2024

Quick update. I've been able to reproduce the ASAN reports on ubuntu 20.04 from the HEAD of the main branch. Our line numbers are off by 8, so I'm no longer concerned that it's an architecture flag thing in the build script. More hunting to follow.

@Mightyjo
Copy link
Contributor

Mightyjo commented Mar 4, 2024

I found a solution, but proving I've solved the right problem will take some extraordinary evidence.

Looks like we've been making the Fullspeed table too big for the last 30+ years, but it almost never causes a problem. I'm getting a PR together so folks can check my patch against their scanners. I'm having a hard time believing myself on this.

Mightyjo added a commit to Mightyjo/flex that referenced this issue Mar 4, 2024
Correct fullspeed table size computation from 'tblend + numecs + 1' to
'tblend + 2 + 1' to avoid allocating extra space that is never initialized.

Refs: westes#628
@Mightyjo
Copy link
Contributor

Mightyjo commented Mar 6, 2024

@szhorvat did the PR fix the issue for you? My mac mini is late arriving.

@szhorvat
Copy link
Author

szhorvat commented Mar 6, 2024

I'm travelling this week, I'll try again when I'm back. However, last time I tried, I wasn't able to build flex from the master branch, as some headers were mis-generated ... Can you please check that the prerequisites mentioned in the installation document are correct and complete?

@Mightyjo
Copy link
Contributor

Mightyjo commented Mar 6, 2024

The prerequisites list is complete and up to date. However, I've noticed quirks with packages on some distros. In particular, I have to install autopoint in addition to gettext on Ubuntu. Likewise, you may have to install a Tex system if your texinfo package doesn't list or provide its own dependencies.

My mac arrives next Tuesday. If we need to debug the build system we can do it then.

@Mightyjo
Copy link
Contributor

However, last time I tried, I wasn't able to build flex from the master branch, as some headers were mis-generated ... Can you please check that the prerequisites mentioned in the installation document are correct and complete?

I've moved this over to #632, and I have partial solutions now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants