Build fails on Termux / aarch64 #413

Closed
digikar99 opened this issue Feb 21, 2021 · 17 comments

@digikar99

digikar99 commented Feb 21, 2021

Hi, thank you for this library!

I was looking into using this library for a project and wanted to check the performance difference on an aarch64 Android device I own. I'm using Termux for this, and `make` fails with the errors below. I think it's related to linking libm, but I'm not sure where exactly the fix goes, or whether a non-trivial amount of work is required to make this work on Termux. I'll let you know if I find a fix.

Terminal output of `make`:
[  2%] Built target mkrename_gnuabi
[  3%] Built target sleefgnuabiadvsimddp
[  4%] Built target sleefgnuabiadvsimdsp
[  5%] Built target sleefgnuabi
[  6%] Built target addSuffix
[  7%] Built target mkrename
[  8%] Built target renamedspscalar.h_generated
[  9%] Built target mkdisp
[ 10%] Built target dispscalar.c_generated
[ 14%] Built target headers
[ 15%] Built target dispscalar_obj
[ 16%] Built target renamePURECFMA_SCALAR.h_generated
[ 18%] Built target sleefdetpurecfma_scalar
[ 20%] Built target sleefpurecfma_scalar
[ 22%] Built target sleefscalar
[ 22%] Built target renamePUREC_SCALAR.h_generated
[ 23%] Built target sleefdetpurec_scalar
[ 25%] Built target sleefpurec_scalar
[ 26%] Built target renameADVSIMDNOFMA.h_generated
[ 27%] Built target sleefadvsimdnofma
[ 28%] Built target mkalias
[ 28%] Built target alias_advsimd.h_generated
[ 28%] Built target renameADVSIMD.h_generated
[ 30%] Built target sleefadvsimd
[ 30%] Built target common
[ 32%] Built target sleefdetadvsimdnofma
[ 34%] Built target sleefdetadvsimd
[ 35%] Built target sleef
[ 36%] Built target mkmasked_gnuabi
[ 37%] Linking C executable ../../bin/tester
/data/data/com.termux/files/usr/bin/aarch64-linux-android-ld: CMakeFiles/tester.dir/tester.c.o: in function `do_test':
/data/data/com.termux/files/home/sleef/src/libm-tester/tester.c:2335: undefined reference to `nextafter'
/data/data/com.termux/files/usr/bin/aarch64-linux-android-ld: /data/data/com.termux/files/home/sleef/src/libm-tester/tester.c:2335: undefined reference to `nextafter'
/data/data/com.termux/files/usr/bin/aarch64-linux-android-ld: /data/data/com.termux/files/home/sleef/src/libm-tester/tester.c:2343: undefined reference to `nextafter'
/data/data/com.termux/files/usr/bin/aarch64-linux-android-ld: /data/data/com.termux/files/home/sleef/src/libm-tester/tester.c:2343: undefined reference to `nextafter'
/data/data/com.termux/files/usr/bin/aarch64-linux-android-ld: /data/data/com.termux/files/home/sleef/src/libm-tester/tester.c:2351: undefined reference to `nextafter'
/data/data/com.termux/files/usr/bin/aarch64-linux-android-ld: CMakeFiles/tester.dir/tester.c.o:/data/data/com.termux/files/home/sleef/src/libm-tester/tester.c:2351: more undefined references to `nextafter' follow
/data/data/com.termux/files/usr/bin/aarch64-linux-android-ld: CMakeFiles/tester.dir/tester.c.o: in function `do_test':
/data/data/com.termux/files/home/sleef/src/libm-tester/tester.c:2585: undefined reference to `ilogb'
/data/data/com.termux/files/usr/bin/aarch64-linux-android-ld: /data/data/com.termux/files/home/sleef/src/libm-tester/tester.c:2586: undefined reference to `ilogb'
/data/data/com.termux/files/usr/bin/aarch64-linux-android-ld: /data/data/com.termux/files/home/sleef/src/libm-tester/tester.c:2612: undefined reference to `nextafter'
/data/data/com.termux/files/usr/bin/aarch64-linux-android-ld: /data/data/com.termux/files/home/sleef/src/libm-tester/tester.c:2931: undefined reference to `nextafterf'
/data/data/com.termux/files/usr/bin/aarch64-linux-android-ld: /data/data/com.termux/files/home/sleef/src/libm-tester/tester.c:2931: undefined reference to `nextafterf'
/data/data/com.termux/files/usr/bin/aarch64-linux-android-ld: /data/data/com.termux/files/home/sleef/src/libm-tester/tester.c:2935: undefined reference to `nextafterf'
/data/data/com.termux/files/usr/bin/aarch64-linux-android-ld: /data/data/com.termux/files/home/sleef/src/libm-tester/tester.c:2935: undefined reference to `nextafterf'
/data/data/com.termux/files/usr/bin/aarch64-linux-android-ld: /data/data/com.termux/files/home/sleef/src/libm-tester/tester.c:2944: undefined reference to `nextafterf'
/data/data/com.termux/files/usr/bin/aarch64-linux-android-ld: CMakeFiles/tester.dir/tester.c.o:/data/data/com.termux/files/home/sleef/src/libm-tester/tester.c:2944: more undefined references to `nextafterf' follow
/data/data/com.termux/files/usr/bin/aarch64-linux-android-ld: CMakeFiles/tester.dir/tester.c.o: in function `do_test':
/data/data/com.termux/files/home/sleef/src/libm-tester/tester.c:(.text+0x36e40): undefined reference to `pow'
/data/data/com.termux/files/usr/bin/aarch64-linux-android-ld: /data/data/com.termux/files/home/sleef/src/libm-tester/tester.c:3663: undefined reference to `pow'
/data/data/com.termux/files/usr/bin/aarch64-linux-android-ld: /data/data/com.termux/files/home/sleef/src/libm-tester/tester.c:(.text+0x36f08): undefined reference to `pow'
/data/data/com.termux/files/usr/bin/aarch64-linux-android-ld: /data/data/com.termux/files/home/sleef/src/libm-tester/tester.c:3663: undefined reference to `pow'
/data/data/com.termux/files/usr/bin/aarch64-linux-android-ld: /data/data/com.termux/files/home/sleef/src/libm-tester/tester.c:3663: undefined reference to `pow'
/data/data/com.termux/files/usr/bin/aarch64-linux-android-ld: CMakeFiles/tester.dir/tester.c.o:/data/data/com.termux/files/home/sleef/src/libm-tester/tester.c:(.text+0x37624): more undefined references to `pow' follow
/data/data/com.termux/files/usr/bin/aarch64-linux-android-ld: CMakeFiles/tester.dir/tester.c.o: in function `do_test':
/data/data/com.termux/files/home/sleef/src/libm-tester/tester.c:4177: undefined reference to `ilogb'
/data/data/com.termux/files/usr/bin/aarch64-linux-android-ld: /data/data/com.termux/files/home/sleef/src/libm-tester/tester.c:4179: undefined reference to `ilogb'
/data/data/com.termux/files/usr/bin/aarch64-linux-android-ld: /data/data/com.termux/files/home/sleef/src/libm-tester/tester.c:4187: undefined reference to `ilogb'
/data/data/com.termux/files/usr/bin/aarch64-linux-android-ld: /data/data/com.termux/files/home/sleef/src/libm-tester/tester.c:4189: undefined reference to `ilogb'
/data/data/com.termux/files/usr/bin/aarch64-linux-android-ld: /data/data/com.termux/files/home/sleef/src/libm-tester/tester.c:(.text+0x48b9c): undefined reference to `pow'
/data/data/com.termux/files/usr/bin/aarch64-linux-android-ld: /data/data/com.termux/files/home/sleef/src/libm-tester/tester.c:4199: undefined reference to `ilogb'
/data/data/com.termux/files/usr/bin/aarch64-linux-android-ld: /data/data/com.termux/files/home/sleef/src/libm-tester/tester.c:4201: undefined reference to `ilogb'
/data/data/com.termux/files/usr/bin/aarch64-linux-android-ld: /data/data/com.termux/files/home/sleef/src/libm-tester/tester.c:(.text+0x48d08): undefined reference to `pow'
/data/data/com.termux/files/usr/bin/aarch64-linux-android-ld: /data/data/com.termux/files/home/sleef/src/libm-tester/tester.c:4211: undefined reference to `ilogb'
/data/data/com.termux/files/usr/bin/aarch64-linux-android-ld: /data/data/com.termux/files/home/sleef/src/libm-tester/tester.c:4213: undefined reference to `ilogb'
/data/data/com.termux/files/usr/bin/aarch64-linux-android-ld: /data/data/com.termux/files/home/sleef/src/libm-tester/tester.c:(.text+0x4e304): undefined reference to `pow'
/data/data/com.termux/files/usr/bin/aarch64-linux-android-ld: /data/data/com.termux/files/home/sleef/src/libm-tester/tester.c:4462: undefined reference to `pow'
/data/data/com.termux/files/usr/bin/aarch64-linux-android-ld: /data/data/com.termux/files/home/sleef/src/libm-tester/tester.c:(.text+0x4e3d0): undefined reference to `pow'
/data/data/com.termux/files/usr/bin/aarch64-linux-android-ld: /data/data/com.termux/files/home/sleef/src/libm-tester/tester.c:4462: undefined reference to `pow'
/data/data/com.termux/files/usr/bin/aarch64-linux-android-ld: /data/data/com.termux/files/home/sleef/src/libm-tester/tester.c:4462: undefined reference to `pow'
/data/data/com.termux/files/usr/bin/aarch64-linux-android-ld: CMakeFiles/tester.dir/tester.c.o:/data/data/com.termux/files/home/sleef/src/libm-tester/tester.c:(.text+0x4eb04): more undefined references to `pow' follow
/data/data/com.termux/files/usr/bin/aarch64-linux-android-ld: CMakeFiles/tester.dir/testerutil.c.o: in function `countULPdp':
/data/data/com.termux/files/home/sleef/src/libm-tester/testerutil.c:175: undefined reference to `frexp'
/data/data/com.termux/files/usr/bin/aarch64-linux-android-ld: /data/data/com.termux/files/home/sleef/src/libm-tester/testerutil.c:176: undefined reference to `ldexpl'
/data/data/com.termux/files/usr/bin/aarch64-linux-android-ld: /data/data/com.termux/files/home/sleef/src/libm-tester/testerutil.c:176: undefined reference to `fmaxl'
/data/data/com.termux/files/usr/bin/aarch64-linux-android-ld: CMakeFiles/tester.dir/testerutil.c.o: in function `countULP2dp':
/data/data/com.termux/files/home/sleef/src/libm-tester/testerutil.c:223: undefined reference to `frexp'
/data/data/com.termux/files/usr/bin/aarch64-linux-android-ld: /data/data/com.termux/files/home/sleef/src/libm-tester/testerutil.c:224: undefined reference to `ldexpl'
/data/data/com.termux/files/usr/bin/aarch64-linux-android-ld: /data/data/com.termux/files/home/sleef/src/libm-tester/testerutil.c:224: undefined reference to `fmaxl'
/data/data/com.termux/files/usr/bin/aarch64-linux-android-ld: CMakeFiles/tester.dir/testerutil.c.o: in function `countULPsp':
/data/data/com.termux/files/home/sleef/src/libm-tester/testerutil.c:256: undefined reference to `frexp'
/data/data/com.termux/files/usr/bin/aarch64-linux-android-ld: CMakeFiles/tester.dir/testerutil.c.o: in function `countULP2sp':
/data/data/com.termux/files/home/sleef/src/libm-tester/testerutil.c:283: undefined reference to `frexp'
clang-9: error: linker command failed with exit code 1 (use -v to see invocation)
make[2]: *** [src/libm-tester/CMakeFiles/tester.dir/build.make:122: bin/tester] Error 1
make[1]: *** [CMakeFiles/Makefile2:1073: src/libm-tester/CMakeFiles/tester.dir/all] Error 2
make: *** [Makefile:161: all] Error 2
@shibatch
Owner

Hello,

Please check whether specifying `-DBUILD_TESTS=FALSE` as a cmake option works.

@digikar99
Author

digikar99 commented Feb 21, 2021

Hi, thanks!

Yes, `make` works, but `make install` fails, so I need a different installation prefix (edit: trying CMAKE_INSTALL_PREFIX; will get back if there are any issues!):

Install the project...
-- Install configuration: "Debug"
CMake Error at src/libm/cmake_install.cmake:52 (file):
  file cannot create directory: /usr/local/lib.  Maybe need administrative
  privileges.
Call Stack (most recent call first):
  src/cmake_install.cmake:42 (include)
  cmake_install.cmake:42 (include)


make: *** [Makefile:94: install] Error 1

@digikar99
Author

Yes, `cmake -DBUILD_TESTS=FALSE -DCMAKE_INSTALL_PREFIX=/data/data/com.termux/files/usr ..` solves it.

As far as performance goes, `Sleef_sinhf4_u10` and its three other variants are 10 times slower (`Sleef_sin` is 5 times slower) than the corresponding scalar functions packaged with clang in Termux. I suspect the lack of inlining has something to do with it, but I'm not sure, and that's a separate issue.

Fully resolving this issue might require being able to compile and run the tests, but otherwise things seem to work. Thanks!

@shibatch
Owner

Are you comparing scalar functions?
Scalar functions in SLEEF are not meant to be fast, as I wrote in the FAQ.

@shibatch
Owner

A 5x difference is too much.
You are probably comparing the wrong functions.

@digikar99
Author

I'm comparing SLEEF's vector versions with libm's scalar versions. On x86_64 the speeds are comparable; on aarch64, however, the SIMD equivalents of sin and sinh provided by SLEEF are several times slower than the scalar versions provided by libm. (I'll upload a gist.)

@shibatch
Owner

So you are comparing different functions. That's not a fair comparison.

@shibatch
Owner

Depending on the CPU micro architecture, the FPU may not be vectorized internally. You should be careful when comparing performance.

@digikar99
Author

https://gist.github.com/digikar99/5c103a9c017c49d6cf9d9c5cfde3e60c

> Depending on the CPU micro architecture

That could be it. Are any of these features supposed to work (performance-wise) with arm_neon.h? Or, if arm_neon.h isn't the way forward for ASIMD, could you point me to any resources on it?

fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp

@digikar99
Author

digikar99 commented Feb 21, 2021

> The index is always zero in your code.

The pointers are being incremented:

      a_ptr += 1;
      b_ptr += 1;

Or am I misreading something?


> The calls to sin and sinh are optimized away.

If I comment out `b_ptr[0] = sin(a_ptr[0]);`, it runs in 0.006 s. With -S, I get the following assembly, which contains `bl sin` (branch with link, i.e. the call to sin):

gcc -O2 -fno-vectorize -S sin.c -lm && cat sin.s
clang-9: warning: -lm: 'linker' input unused [-Wunused-command-line-argument]
        .text
        .file   "sin.c"
        .globl  main                    // -- Begin function main
        .p2align        2
        .type   main,@function
main:                                   // @main
// %bb.0:
        stp     x24, x23, [sp, #-64]!   // 16-byte Folded Spill
        mov     w9, #16384
        mov     w10, #16384
        adrp    x11, a
        mov     x8, xzr
        movk    w9, #50460, lsl #16
        movk    w10, #17820, lsl #16
        add     x11, x11, :lo12:a
        mov     w12, #5000
        stp     x22, x21, [sp, #16]     // 16-byte Folded Spill
        stp     x20, x19, [sp, #32]     // 16-byte Folded Spill
        stp     x29, x30, [sp, #48]     // 16-byte Folded Spill
        add     x29, sp, #48            // =48
.LBB0_1:                                // =>This Inner Loop Header: Depth=1
        scvtf   s0, w8
        fmov    s1, w9
        fmov    s2, w10
        fadd    s0, s0, s1
        fdiv    s0, s0, s2
        str     s0, [x11, x8, lsl #2]
        add     x8, x8, #1              // =1
        cmp     x8, x12
        b.ne    .LBB0_1
// %bb.2:
        adrp    x20, b
        adrp    x21, a
        mov     w19, wzr
        add     x20, x20, :lo12:b
        add     x21, x21, :lo12:a
.LBB0_3:                                // =>This Loop Header: Depth=1
                                        //     Child Loop BB0_4 Depth 2
        mov     w22, #20000
        mov     x23, x21
        mov     x24, x20
.LBB0_4:                                //   Parent Loop BB0_3 Depth=1
                                        // =>  This Inner Loop Header: Depth=2
        ldr     s0, [x23], #4
        fcvt    d0, s0
        bl      sin
        fcvt    s0, d0
        subs    x22, x22, #4            // =4
        str     s0, [x24], #4
        b.ne    .LBB0_4
// %bb.5:                               //   in Loop: Header=BB0_3 Depth=1
        add     w19, w19, #1            // =1
        cmp     w19, #1000              // =1000
        b.ne    .LBB0_3
// %bb.6:
        ldp     x29, x30, [sp, #48]     // 16-byte Folded Reload
        ldp     x20, x19, [sp, #32]     // 16-byte Folded Reload
        ldp     x22, x21, [sp, #16]     // 16-byte Folded Reload
        mov     w0, wzr
        ldp     x24, x23, [sp], #64     // 16-byte Folded Reload
        ret
.Lfunc_end0:
        .size   main, .Lfunc_end0-main
                                        // -- End function
        .type   a,@object               // @a
        .comm   a,20000,4
        .type   b,@object               // @b
        .comm   b,20000,4

        .ident  "clang version 9.0.1 "
        .section        ".note.GNU-stack","",@progbits

@shibatch
Owner

I was wrong in the last comment.
M is set to 100000 in the aarch64 sleef code. This should be 5000.
I ran the programs on my computers, and the aarch64 SLEEF binary is at least faster than the scalar binaries.

@shibatch
Owner

sin scalar : 0.366
sin sleef : 0.175
sinh scalar : 0.367
sinh sleef : 0.116

on ODROID-N2

@digikar99
Author

digikar99 commented Feb 21, 2021

Oops, that was a typo in the gist; the code I actually ran used 5000.

Could you share the output of `lscpu` or `cat /proc/cpuinfo` for your aarch64 CPU, the features in particular?

Edit: found the CPU details, so I don't have much of an idea what's wrong! I'll check whether this is reproducible for other owners of the device. Thanks for your time!

@digikar99
Author

Could you also confirm whether you used Termux or something else?

@shibatch
Owner

I don't have a computer that runs Android.

@digikar99
Author

digikar99 commented Feb 23, 2021

Well, okay, even on another device it is slow (this time ~40x slower!).

But I tried it on Debian noroot, and the performance is as expected (SLEEF is faster).

Also of note is the size of the output of `objdump -d lib/libsleef.so > libsleef.txt`:

  • [aarch64] in Termux: ~300 MB
  • [amd64] on my laptop: ~20 MB
  • [aarch64] in Debian noroot: ~5 MB

In Termux, the "usual" root is not at "/", so if the build process assumes "/" is the root, it may be that some essential files are not being found, hence the difference.

Another observation: in Termux, `grep -n -A50 'Sleef_sinf4_u10' libsleef.txt` returns a lot of `ldr` and `str` instructions, while the dump from Debian noroot contains densely packed vector instructions (as it should). Here is the Termux output, in case it interests you (or let me know if I should upload the whole dump to Google Drive or somewhere else):

Some lines of `grep -n -A50 'Sleef_sinf4_u10' libsleef.txt`
6743138:00000000019dce5c <Sleef_sinf4_u10>:
6743139- 19dce5c:	a9be7bfd 	stp	x29, x30, [sp, #-32]!
6743140- 19dce60:	f9000bfc 	str	x28, [sp, #16]
6743141- 19dce64:	910003fd 	mov	x29, sp
6743142- 19dce68:	d1401bff 	sub	sp, sp, #0x6, lsl #12
6743143- 19dce6c:	d11103ff 	sub	sp, sp, #0x440
6743144- 19dce70:	914013e8 	add	x8, sp, #0x4, lsl #12
6743145- 19dce74:	91162108 	add	x8, x8, #0x588
6743146- 19dce78:	3d8103e0 	str	q0, [sp, #1024]
6743147- 19dce7c:	3dc103e0 	ldr	q0, [sp, #1024]
6743148- 19dce80:	3d8107e0 	str	q0, [sp, #1040]
6743149- 19dce84:	3dc107e0 	ldr	q0, [sp, #1040]
6743150- 19dce88:	3d810fe0 	str	q0, [sp, #1072]
6743151- 19dce8c:	3dc10fe0 	ldr	q0, [sp, #1072]
6743152- 19dce90:	4ea0f800 	fabs	v0.4s, v0.4s
6743153- 19dce94:	3d810be0 	str	q0, [sp, #1056]
6743154- 19dce98:	3dc10be0 	ldr	q0, [sp, #1056]
6743155- 19dce9c:	52a85f49 	mov	w9, #0x42fa0000            	// #1123680256
6743156- 19dcea0:	b91cefe9 	str	w9, [sp, #7404]
6743157- 19dcea4:	bd5cefe1 	ldr	s1, [sp, #7404]
6743158- 19dcea8:	bd1d1fe1 	str	s1, [sp, #7452]
6743159- 19dceac:	914007ea 	add	x10, sp, #0x1, lsl #12
6743160- 19dceb0:	9134714a 	add	x10, x10, #0xd1c
6743161- 19dceb4:	4d40c942 	ld1r	{v2.4s}, [x10]
6743162- 19dceb8:	3d873fe2 	str	q2, [sp, #7408]
6743163- 19dcebc:	3dc73fe2 	ldr	q2, [sp, #7408]
6743164- 19dcec0:	3d8743e2 	str	q2, [sp, #7424]
6743165- 19dcec4:	3dc743e2 	ldr	q2, [sp, #7424]
6743166- 19dcec8:	3d874fe0 	str	q0, [sp, #7472]
6743167- 19dcecc:	3d874be2 	str	q2, [sp, #7456]
6743168- 19dced0:	3dc74fe0 	ldr	q0, [sp, #7472]
6743169- 19dced4:	3dc74be2 	ldr	q2, [sp, #7456]
6743170- 19dced8:	3d875be0 	str	q0, [sp, #7520]
6743171- 19dcedc:	3d8757e2 	str	q2, [sp, #7504]
6743172- 19dcee0:	3dc75be0 	ldr	q0, [sp, #7520]
6743173- 19dcee4:	3dc757e2 	ldr	q2, [sp, #7504]
6743174- 19dcee8:	6ea0e440 	fcmgt	v0.4s, v2.4s, v0.4s
6743175- 19dceec:	3d8753e0 	str	q0, [sp, #7488]
6743176- 19dcef0:	3dc753e0 	ldr	q0, [sp, #7488]
6743177- 19dcef4:	3d87ebe0 	str	q0, [sp, #8096]
6743178- 19dcef8:	3dc7ebe0 	ldr	q0, [sp, #8096]
6743179- 19dcefc:	3d87f3e0 	str	q0, [sp, #8128]
6743180- 19dcf00:	3dc7f3e0 	ldr	q0, [sp, #8128]
6743181- 19dcf04:	fd0fdfe0 	str	d0, [sp, #8120]
6743182- 19dcf08:	fd4fdfe0 	ldr	d0, [sp, #8120]
6743183- 19dcf0c:	3dc7ebe2 	ldr	q2, [sp, #8096]
6743184- 19dcf10:	3d8807e2 	str	q2, [sp, #8208]
6743185- 19dcf14:	3dc807e2 	ldr	q2, [sp, #8208]
6743186- 19dcf18:	6e024042 	ext	v2.16b, v2.16b, v2.16b, #8
6743187- 19dcf1c:	fd1007e2 	str	d2, [sp, #8200]
6743188- 19dcf20:	fd5007e2 	ldr	d2, [sp, #8200]
--

@shibatch
Owner

I have no idea, but it looks similar to the following problem:

android/ndk#82
