Emscripten asm.js build of cmark-gfm overflows Javascript stack and can't be loaded in web browser #365

Open
ptc-shunt opened this issue May 15, 2024 · 0 comments

ptc-shunt commented May 15, 2024

When building cmark-gfm to asm.js with Emscripten, I found that the cmark library triggers a JavaScript stack overflow while it is being loaded. This happens in both debug and release builds.

To reproduce:

  • Install Emscripten SDK (reproduced with 3.1.50 and 3.1.59)
  • Modify api_test/CMakeLists.txt to make emscripten output an HTML test page (see attached CMakeLists.txt)
  • Build using the attached batch file
  • Go to api_test inside the build directory
  • Double-click api_test.html
  • Open Developer Tools in your web browser
  • In the Console tab, there will be a message saying "Uncaught RangeError: Maximum call stack size exceeded"

CMakeLists.txt
build_cmarkgfm_js.bat.txt

It appears this error is emitted while loading the cmark-gfm library code, i.e. before any execution has happened.

I tracked it down to the very large switch statement in case_fold_switch.inc. If you open api_test.js (debug build, for readability) in an editor, search for cmark_utf8proc_case_fold, and scroll down, you will see that Emscripten has generated a very deeply nested set of {} blocks, over 1400 levels deep. This evidently exceeds the browser's JavaScript stack capacity:

function cmark_utf8proc_case_fold($0, $1, $2) {
  $0 = $0 | 0;
  $1 = $1 | 0;
  $2 = $2 | 0;
  var $5 = 0, $22 = 0, wasm2js_i32$0 = 0, wasm2js_i32$1 = 0;
  $5 = __stack_pointer - 32 | 0;
  __stack_pointer = $5;
  HEAP32[($5 + 28 | 0) >> 2] = $0;
  HEAP32[($5 + 24 | 0) >> 2] = $1;
  HEAP32[($5 + 20 | 0) >> 2] = $2;
  label$1 : {
   label$2 : while (1) {
    if (!((HEAP32[($5 + 20 | 0) >> 2] | 0 | 0) > (0 | 0) & 1 | 0)) {
     break label$1
    }
    (wasm2js_i32$0 = $5, wasm2js_i32$1 = cmark_utf8proc_iterate(HEAP32[($5 + 24 | 0) >> 2] | 0 | 0, HEAP32[($5 + 20 | 0) >> 2] | 0 | 0, $5 + 16 | 0 | 0) | 0), HEAP32[(wasm2js_i32$0 + 12 | 0) >> 2] = wasm2js_i32$1;
    label$3 : {
     label$4 : {
      if (!((HEAP32[($5 + 12 | 0) >> 2] | 0 | 0) >= (0 | 0) & 1 | 0)) {
       break label$4
      }
      $22 = HEAP32[($5 + 16 | 0) >> 2] | 0;
      label$5 : {
       label$6 : {
        label$7 : {
         label$8 : {
          label$9 : {
           label$10 : {
            label$11 : {
             label$12 : {
              label$13 : {
               label$14 : {
                label$15 : {
                 label$16 : {
                  label$17 : {
                   label$18 : {
                    label$19 : {     // ... goes on for many, many levels, up to label$1407

Evidently this is an Emscripten code generation issue that the huge switch statement provokes. I tried changing the compiler optimization setting (including -Os), but it didn't help.

I was able to work around it by rearranging the code in utf8.c and case_fold_switch.inc to use if statements instead of a switch. The code generated from that is much less deeply nested, and it loads and runs correctly (the console reports that all tests passed).
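
For illustration only, the shape of that rearrangement might look roughly like the sketch below. It is not the actual patch: the function names `fold_with_switch` and `fold_with_ifs` are hypothetical, only three code points are shown, and the real code pushes the folded characters into a buffer rather than returning them. Whether a flat if chain avoids the deep nesting in every Emscripten version is an assumption based on the result reported above.

```c
#include <stdint.h>

/* Original shape: one switch with a case per code point (three shown). */
uint32_t fold_with_switch(uint32_t c) {
  switch (c) {
  case 0x0041: return 0x0061; /* LATIN CAPITAL A -> small a */
  case 0x0042: return 0x0062; /* LATIN CAPITAL B -> small b */
  case 0x00C0: return 0x00E0; /* A WITH GRAVE, capital -> small */
  default:     return c;      /* no folding rule: keep as-is */
  }
}

/* Workaround shape: the same rules as a flat sequence of if statements. */
uint32_t fold_with_ifs(uint32_t c) {
  if (c == 0x0041) return 0x0061;
  if (c == 0x0042) return 0x0062;
  if (c == 0x00C0) return 0x00E0;
  return c;                   /* no folding rule: keep as-is */
}
```

Both functions return the same value for every input, so the change affects only how the compiler lowers the code, not its behavior.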

Another alternative might be some sort of lookup table.
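
A minimal sketch of what such a lookup table could look like, assuming a sorted array searched with the standard `bsearch`. The names `fold_entry`, `fold_table`, and `fold_lookup` are hypothetical, and the table holds only four illustrative entries; a real one would presumably be generated from the same data as case_fold_switch.inc. Each entry allows up to three output code points because some foldings expand to more than one character.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* One folding rule: a source code point and up to three replacements. */
typedef struct {
  uint32_t from;
  uint8_t  count;
  uint32_t to[3];
} fold_entry;

/* Illustrative subset, sorted by `from` so that bsearch can be used. */
static const fold_entry fold_table[] = {
  {0x0041, 1, {0x0061}},         /* A -> a */
  {0x0042, 1, {0x0062}},         /* B -> b */
  {0x00C0, 1, {0x00E0}},         /* A WITH GRAVE, capital -> small */
  {0x00DF, 2, {0x0073, 0x0073}}, /* SHARP S -> "ss" */
};

/* bsearch comparator: the key is a code point, the element a table entry. */
static int compare_entry(const void *key, const void *elem) {
  uint32_t c = *(const uint32_t *)key;
  uint32_t from = ((const fold_entry *)elem)->from;
  return (c > from) - (c < from);
}

/* Writes the folded form of c to out (room for 3) and returns its length. */
size_t fold_lookup(uint32_t c, uint32_t out[3]) {
  const fold_entry *e = bsearch(&c, fold_table,
                                sizeof fold_table / sizeof fold_table[0],
                                sizeof fold_table[0], compare_entry);
  if (e == NULL) {
    out[0] = c; /* no rule: the code point folds to itself */
    return 1;
  }
  for (size_t i = 0; i < e->count; i++)
    out[i] = e->to[i];
  return e->count;
}
```

Because the rules live in data rather than control flow, the generated function body stays small regardless of how many code points the table covers.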
