Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Misc. problems on Android using termux #156

Open
Vindaar opened this issue Jun 23, 2020 · 4 comments
Open

Misc. problems on Android using termux #156

Vindaar opened this issue Jun 23, 2020 · 4 comments
Labels
bug 🪲 Something isn't working

Comments

@Vindaar
Copy link
Contributor

Vindaar commented Jun 23, 2020

Trying to run code using Weave on Android within termux results in (at least two) different problems.

This is running on a Oneplus 6 with:

u0_a198 at localhost in ~/src/weave ツ uname -a
Linux localhost 4.9.179-perf+ #1 SMP PREEMPT Sat Feb 22 01:01:52 CST 2020 aarch64 Android
u0_a198 at localhost in ~/src/weave ツ nim --version
Nim Compiler Version 1.2.2 [Android: arm64]
Compiled at 2020-06-16
Copyright (c) 2006-2020 by Andreas Rumpf

git hash: be34b5abe23e027b71a3a9da67037aac7afea6bc
active boot switches: -d:release

First of all trying to run the weave test suite results in a sigsegv:

u0_a198 at localhost in ~/src/weave ツ nimble test
  Executing task test in /data/data/com.termux/files/home/src/weave/weave.nimble

========================================================================================
Running [ c  ] weave/cross_thread_com/channels_spsc_single.nim
========================================================================================
Testing if 2 threads can send data
-----------------------------------
Traceback (most recent call last)
channels_spsc_single.nim(187) channels_spsc_single
channels_spsc_single.nim(171) main
memory_pools.nim(480)    initialize
memory_pools.nim(243)    newArena
channels_mpsc_unbounded_batch.nim(80) initialize
atomics.nim(302)         store
SIGSEGV: Illegal storage access. (Attempt to read from nil?)
Error: execution of an external program failed: '/data/data/com.termux/files/home/src/weave/build/channels_spsc_single '
stack trace: (most recent call last)
/data/data/com.termux/files/home/nimblecache/nimscriptapi.nim(168, 16)
/data/data/com.termux/files/home/src/weave/weave_4720.nims(35, 8) testTask
/data/data/com.termux/files/home/src/weave/weave_4720.nims(32, 8) test
/data/data/com.termux/files/usr/lib/nim/lib/system/nimscript.nim(260, 7) exec
/data/data/com.termux/files/usr/lib/nim/lib/system/nimscript.nim(260, 7) Error: unhandled exception: FAILED: nim c  --verbosity:0 --hints:off --warnings:off --threads:on -d:release --stacktrace:on --linetrace:on --outdir:build -r weave/cross_thread_com/channels_spsc_single.nim [OSError]
     Error: Exception raised during nimble script execution

Originally I tried to run @HugoGranstrom's nimbody benchmark for fun, which results in a linker error due to pthread_setaffinity_np not being defined:

/data/data/com.termux/files/usr/bin/aarch64-linux-android-ld: /data/data/com.termux/files/home/.cache/nim/nimbody_benchmark_r/stdlib_system.nim.c.o: in function `pinToCpu__fHDMYDG2VIuvwYmv2GhDRQ':
stdlib_system.nim.c:(.text+0x8458): undefined reference to `pthread_setaffinity_np'
/data/data/com.termux/files/usr/bin/aarch64-linux-android-ld: /data/data/com.termux/files/home/.cache/nim/nimbody_benchmark_r/@m..@sweave@sweave@sruntime.nim.c.o: in function `init__NAeFbG9bHEd9byvJB8MEex6g':
@m..@sweave@sweave@sruntime.nim.c:(.text+0x214): undefined reference to `pthread_setaffinity_np'
clang-9: error: linker command failed with exit code 1 (use -v to see invocation)

If there's anything else that would be useful, let me know.

@mratsim mratsim added the bug 🪲 Something isn't working label Jun 24, 2020
@HJarausch
Copy link

I have a similar problem. I found that there is no pthread_setaffinity_np on Android but Weave tries to call it which gives a linker error.

mratsim added a commit that referenced this issue Mar 7, 2021
@mratsim
Copy link
Owner

mratsim commented Mar 7, 2021

Fixed for Android. I'm unsure about termux, will have to test because my CI does run fine on ARM64 on Travis 32 cores CPUs.

@HJarausch
Copy link

Thanks Mamy. Unfortunately it seems to more complicated.
There are still other problems - see below

First, I've tried a different fix than yours but got an error message
cpu_set(cpu, cpuset) cannot be called which I don't understand

proc sched_setaffinity(
       thread: Pthread,
       cpuset_size: csize_t,
       cpuset: ptr CpuSet
  ) {.header: "<sched.h>".}

...
proc set_thread_affinity(t: Pthread, cpu: int32) {.inline.}=
  var cpuset {.noinit.}: CpuSet
  when defined(osx):
    {.warning: "To improve performance we should pin threads to cores.\n" &
                "This is not possible with MacOS".}
  elif defined(android):
    cpu_zero(cpuset)
    cpu_set(cpu, cpuset)   # error  cpu_set(cpu, cpuset) cannot be called
    sched_setaffinity(Pthread(0), sizeof(cpuset), addr cpuset)
  else:

    cpu_zero(cpuset)
    cpu_set(cpu, cpuset)
    pthread_setaffinity_np(t, sizeof(CpuSet), cpuset)

Back to your fix I need in addtion primitives/affinity_posix.nim

-proc set_thread_affinity(t: Pthread, cpu: int32) {.inline.}=
+proc set_thread_affinity(t: Pthread, cpu: csize_t) {.inline.}=

Furthermore, I need to patch memory/allocs.nim

 when defined(windows):
   proc aligned_alloc(alignment, size: csize_t): pointer {.sideeffect,importc:"_aligned_malloc", header:"<malloc.h>".}
   proc wv_freeAligned*[T](p: ptr T){.sideeffect,importc:"_aligned_free", header:"<malloc.h>".}
-elif defined(osx):
+elif defined(osx) or defined(android):
   proc posix_memalign(mem: var pointer, alignment, size: csize_t){.sideeffect,importc, header:"<stdlib.h>".}
   proc aligned_alloc(alignment, size: csize_t): pointer {.inline.} =
     posix_memalign(result, alignment, size)

And last - worst - I have to patch Nim's /usr/lib/nim/system/threadlocalstorage.nim , as well

@@ -194,8 +194,15 @@
   proc cpusetIncl(cpu: cint; s: var CpuSet) {.
     importc: "CPU_SET", header: schedh.}
 
-  proc setAffinity(thread: SysThread; setsize: csize_t; s: var CpuSet) {.
-    importc: "pthread_setaffinity_np", header: pthreadh.}
+  when defined(android):
+    proc sched_setaffinity(thread: SysThread;  cpuset_size: csize_t, ptrCpuset: ptr  CpuSet) {.
+      importc: "sched_setaffinity", header: schedh.}
+
+    proc setAffinity(thread: SysThread; setsize: csize_t; s: var CpuSet) =
+      sched_setaffinity(thread, setsize, addr s)
+  else:
+    proc setAffinity(thread: SysThread; setsize: csize_t; s: var CpuSet) {.
+      importc: "pthread_setaffinity_np", header: pthreadh.}
 
 
 const

since the linker tries to find pthread_setaffinity_np which doesn't exist.
Should I create a pull request for this?

With all these patches, Weave does work under Termux (Android).
I am not sure if it's possible to improve on your patch disabling setAffinity altogether.

By the way,
a short time ago, I asked for debugging a problem with Weave in my application (Ant colony optimization).
After several hours of hunting a non-existent bug in my code, I found out that I need to
put parallelFor in an syncScope(): block, which is very non-intuitive if one comes from OpenMP or Nim's experimental
parallel clause.

@mratsim
Copy link
Owner

mratsim commented Mar 8, 2021

It's strange that you need to patch Nim itself, I wasn't aware that Android didn't implement the full pthread interface.

After several hours of hunting a non-existent bug in my code, I found out that I need to
put parallelFor in an syncScope(): block, which is very non-intuitive if one comes from OpenMP or Nim's experimental
parallel clause.

You can use awaitable loops like this https://github.com/mratsim/weave#awaitable-loop.

OpenMP evolved from a sync parallelism with a master thread that spawns all the parallel-for work while Weave is fundamentally async and any thread can spawn anything at anytime and can have barriers independently from others.

Furthermore, since OpenMP, decades happened and API for parallelism evolved, there have been many complaints about OpenMP implicit barriers that killed performance by requiring extra synchronization so I decided to remove all barriers by default and use Future/Flowvar (awaitable loops) and structured parallelism (syncScope which is like the async-finish construct in the X10 programming language which built on lessons learned from high-performance computing) to specify all synchronization points.

So I can indeed improve the documentation, but in the end I think it's much better if all synchronizations happens explicitly and is grep-able in the code than if synchronization is implicit. This means that if threads are desync-ed, some synchronization is missing. And it has performance benefits as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug 🪲 Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants