
Segmentation fault closing application with multiple isolates and displaying an image #634

Open
jslater89 opened this issue Dec 3, 2021 · 17 comments


@jslater89
Contributor

Hover doctor

$ hover doctor
hover: Hover version v0.47.0 running on linux
hover: Sharing packaging tools
hover: darwin-bundle is supported
hover: darwin-dmg is supported
hover: To package darwin-pkg these tools are required: mkbom,xar
hover: Install bomutils from your package manager or from https://github.com/hogliux/bomutils
hover: Install xar from your package manager or from https://github.com/mackyle/xar
hover: To still package darwin-pkg without the required tools installed you need to run hover with the `--docker` flag.
hover: linux-appimage is supported
hover: linux-deb is supported
hover: To package linux-pkg these tools are required: makepkg
hover: You need to be on Arch Linux or another distro that uses pacman as package manager to use this. Installing makepkg on other distros is hard and dangerous.
hover: To still package linux-pkg without the required tools installed you need to run hover with the `--docker` flag.
hover: To package linux-rpm these tools are required: rpmbuild
hover: You need to be on Red Hat Linux or another distro that uses rpm as package manager to use this. Installing rpmbuild on other distros is hard and dangerous.
hover: To still package linux-rpm without the required tools installed you need to run hover with the `--docker` flag.
hover: To package linux-snap these tools are required: snapcraft
hover: Install snapd from your package manager or from https://snapcraft.io/docs/installing-snapd
hover: To still package linux-snap without the required tools installed you need to run hover with the `--docker` flag.
hover: windows-msi is supported
hover: 
hover: Sharing flutter version
Flutter 2.8.0-3.3.pre • channel beta • https://github.com/flutter/flutter.git
Framework • revision 262b70ece1 (2 days ago) • 2021-12-01 13:00:48 -0800
Engine • revision 06a7363b0c
Tools • Dart 2.15.0 (build 2.15.0-268.18.beta)
hover: Flutter engine commit: https://github.com/flutter/engine/commit/06a7363b0cfd4092fe06eb80f829b5fbc94fd32a
hover: Finding out the C compiler version
gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

hover: Sharing the content of go.mod
module skypirates_flutter/go

go 1.13

require (
        github.com/go-flutter-desktop/go-flutter v0.44.0
        github.com/go-flutter-desktop/plugins/path_provider v0.4.0
        github.com/go-flutter-desktop/plugins/shared_preferences v0.4.3
        github.com/google/uuid v1.3.0 // indirect
        github.com/hajimehoshi/go-mp3 v0.3.2 // indirect
        github.com/hajimehoshi/oto v1.0.1 // indirect
        github.com/jfreymuth/oggvorbis v1.0.3 // indirect
        github.com/jslater89/warble/go v0.1.0
        github.com/mewkiz/pkg v0.0.0-20211102230744-16a6ce8f1b77 // indirect
        github.com/pkg/errors v0.9.1
        golang.org/x/exp v0.0.0-20211129234152-8a230f1f7d7a // indirect
        golang.org/x/image v0.0.0-20211028202545-6944b10bf410 // indirect
        golang.org/x/mobile v0.0.0-20211109191125-d61a72f26a1a // indirect
        golang.org/x/sys v0.0.0-20211124211545-fe61309f8881 // indirect
)

// replace github.com/jslater89/warble/go => /home/jay/development/personal/warble/go
// replace github.com/go-flutter-desktop/go-flutter => /home/jay/development/personal/go-flutter-desktop/go-flutter
hover: Sharing the content of hover.yaml
application-name: Skypirates!
executable-name: ""
package-name: ""
organization-name: ""
license: ""
target: lib/main_desktop.dart
branch: ""
cache-path: ""
opengl: ""
engine-version: ""
hover: Sharing the content of go/cmd
go/cmd/import-path_provider-plugin.go   go/cmd/import-shared_preferences-plugin.go      go/cmd/import-warble-plugin.go  go/cmd/main.go  go/cmd/options.go

Problem description

The application segfaults on the io.flutter.ui thread when it is closed while additional isolates are running. This happens with the debug engine (which comes from Google) as well as with the profile/release engines (which we build ourselves). The official Flutter desktop embedding doesn't crash under the same circumstances, so even though nothing from the project binary appears in the backtrace, it seems to be go-flutter-desktop related, somehow.

(gdb) backtrace
#0  0x00007ffff78f6e78 in ?? () from /path/to/project/go/build/outputs/linux-debug_unopt/libflutter_engine.so
#1  0x00007ffff78fde66 in ?? () from /path/to/project/go/build/outputs/linux-debug_unopt/libflutter_engine.so
#2  0x00007ffff77d0a40 in ?? () from /path/to/project/go/build/outputs/linux-debug_unopt/libflutter_engine.so
#3  0x00007ffff77d0aa7 in ?? () from /path/to/project/go/build/outputs/linux-debug_unopt/libflutter_engine.so
#4  0x00007ffff76cde49 in ?? () from /path/to/project/go/build/outputs/linux-debug_unopt/libflutter_engine.so
#5  0x00007ffff79cd485 in ?? () from /path/to/project/go/build/outputs/linux-debug_unopt/libflutter_engine.so
#6  0x00007ffff749ed73 in ?? () from /path/to/project/go/build/outputs/linux-debug_unopt/libflutter_engine.so
#7  0x00007ffff74a6ce6 in ?? () from /path/to/project/go/build/outputs/linux-debug_unopt/libflutter_engine.so
#8  0x00007ffff749ec11 in ?? () from /path/to/project/go/build/outputs/linux-debug_unopt/libflutter_engine.so
#9  0x00007ffff74a52b5 in ?? () from /path/to/project/go/build/outputs/linux-debug_unopt/libflutter_engine.so
#10 0x00007ffff5bdb609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#11 0x00007ffff5a05293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

I haven't managed to find a minimal reproduction yet—it's happening in a large app that isn't currently in a public repository, and I haven't been able to work out which part is causing the problem. I suspect it'll be easier to narrow down once I get an engine built with debug symbols, but I thought I'd check in here to see if there were any ideas.

@jslater89
Contributor Author

jslater89 commented Dec 4, 2021

#0  sk_sp<GrGLCaps>::get (this=0x58) at ../../third_party/skia/include/core/SkRefCnt.h:298
#1  0x00007ffff726c139 in GrGLContextInfo::caps (this=0x0) at ../../third_party/skia/src/gpu/gl/GrGLContext.h:46
#2  0x00007ffff726bd74 in GrGLGpu::glCaps (this=0x7fffc40df250) at ../../third_party/skia/src/gpu/gl/GrGLGpu.h:50
#3  0x00007ffff729a28d in GrGLGpu::deleteSync (this=0x7fffc40df250, sync=0x1) at ../../third_party/skia/src/gpu/gl/GrGLGpu.cpp:4044
#4  0x00007ffff72bc20c in GrGLSemaphore::~GrGLSemaphore (this=0x7fffc40e2d00) at ../../third_party/skia/src/gpu/gl/GrGLSemaphore.cpp:18
#5  0x00007ffff72bc239 in GrGLSemaphore::~GrGLSemaphore (this=0x7fffc40e2d00) at ../../third_party/skia/src/gpu/gl/GrGLSemaphore.cpp:16
#6  0x00007ffff6f11c2c in std::__1::default_delete<GrSemaphore>::operator() (this=0x7fffc40e5f70, __ptr=0x7fffc40e2d00) at ../../third_party/libcxx/include/memory:2338
#7  0x00007ffff6f11bec in std::__1::unique_ptr<GrSemaphore, std::__1::default_delete<GrSemaphore> >::reset (this=0x7fffc40e5f70, __p=0x0) at ../../third_party/libcxx/include/memory:2593
#8  0x00007ffff6f113e9 in std::__1::unique_ptr<GrSemaphore, std::__1::default_delete<GrSemaphore> >::~unique_ptr (this=0x7fffc40e5f70) at ../../third_party/libcxx/include/memory:2547
#9  0x00007ffff6f0e2d3 in GrBackendTextureImageGenerator::RefHelper::~RefHelper (this=0x7fffc40e5f10) at ../../third_party/skia/src/gpu/GrBackendTextureImageGenerator.cpp:41
#10 0x00007ffff6f1157d in SkNVRefCnt<GrBackendTextureImageGenerator::RefHelper>::unref (this=0x7fffc40e5f10) at ../../third_party/skia/include/core/SkRefCnt.h:180
#11 0x00007ffff6f0e84b in GrBackendTextureImageGenerator::~GrBackendTextureImageGenerator (this=0xc7bc70 <kDartIsolateSnapshotData+794128>) at ../../third_party/skia/src/gpu/GrBackendTextureImageGenerator.cpp:81
#12 0x00007ffff6f0e899 in GrBackendTextureImageGenerator::~GrBackendTextureImageGenerator (this=0xc7bc70 <kDartIsolateSnapshotData+794128>) at ../../third_party/skia/src/gpu/GrBackendTextureImageGenerator.cpp:80
#13 0x00007ffff6a4e54c in std::__1::default_delete<SkImageGenerator>::operator() (this=0x7fffc40ebfc8, __ptr=0xc7bc70 <kDartIsolateSnapshotData+794128>) at ../../third_party/libcxx/include/memory:2338
#14 0x00007ffff6a4e4cc in std::__1::unique_ptr<SkImageGenerator, std::__1::default_delete<SkImageGenerator> >::reset (this=0x7fffc40ebfc8, __p=0x0) at ../../third_party/libcxx/include/memory:2593
#15 0x00007ffff6a4e2b9 in std::__1::unique_ptr<SkImageGenerator, std::__1::default_delete<SkImageGenerator> >::~unique_ptr (this=0x7fffc40ebfc8) at ../../third_party/libcxx/include/memory:2547
#16 0x00007ffff6c1e96a in SharedGenerator::~SharedGenerator (this=0x7fffc40ebfc0) at ../../third_party/skia/src/image/SkImage_Lazy.cpp:38
#17 0x00007ffff6c1e91d in SkNVRefCnt<SharedGenerator>::unref (this=0x7fffc40ebfc0) at ../../third_party/skia/include/core/SkRefCnt.h:180
#18 0x00007ffff6c1d8b0 in SkSafeUnref<SharedGenerator> (obj=0x7fffc40ebfc0) at ../../third_party/skia/include/core/SkRefCnt.h:150
#19 0x00007ffff6c1dd7c in sk_sp<SharedGenerator>::~sk_sp (this=0x7fffc40ec020) at ../../third_party/skia/include/core/SkRefCnt.h:251
#20 0x00007ffff6c1e352 in SkImage_Lazy::~SkImage_Lazy (this=0x7fffc40ebff0) at ../../third_party/skia/src/image/SkImage_Lazy.h:21
#21 0x00007ffff6c1e389 in SkImage_Lazy::~SkImage_Lazy (this=0x7fffc40ebff0) at ../../third_party/skia/src/image/SkImage_Lazy.h:21
#22 0x00007ffff696b45f in SkRefCntBase::internal_dispose (this=0x7fffc40ebff0) at ../../third_party/skia/include/core/SkRefCnt.h:98
#23 0x00007ffff62f49a0 in SkRefCntBase::unref (this=0x7fffc40ebff0) at ../../third_party/skia/include/core/SkRefCnt.h:77
#24 0x00007ffff6348d40 in SkSafeUnref<SkImage> (obj=0x7fffc40ebff0) at ../../third_party/skia/include/core/SkRefCnt.h:150
#25 0x00007ffff6348d8c in sk_sp<SkImage>::~sk_sp (this=0x7fffcc4d8470) at ../../third_party/skia/include/core/SkRefCnt.h:251
#26 0x00007ffff7499829 in flutter::DrawImageRectOp::~DrawImageRectOp (this=0x7fffcc4d8430) at ../../flutter/flow/display_list.cc:576
#27 0x00007ffff7483510 in flutter::DisposeOps (ptr=0x7fffcc4d8478 "", end=0x7fffcc4d8530 "") at ../../flutter/flow/display_list.cc:907
#28 0x00007ffff7483207 in flutter::DisplayList::~DisplayList (this=0x7fffcc2de470) at ../../flutter/flow/display_list.cc:1022
#29 0x00007ffff7483669 in flutter::DisplayList::~DisplayList (this=0x7fffcc2de470) at ../../flutter/flow/display_list.cc:1020
#30 0x00007ffff696b45f in SkRefCntBase::internal_dispose (this=0x7fffcc2de470) at ../../third_party/skia/include/core/SkRefCnt.h:98
#31 0x00007ffff62f49a0 in SkRefCntBase::unref (this=0x7fffcc2de470) at ../../third_party/skia/include/core/SkRefCnt.h:77
#32 0x00007ffff74cc80f in flutter::SkiaUnrefQueue::Drain (this=0x7fffc40e19e0) at ../../flutter/flow/skia_gpu_object.cc:44
#33 0x00007ffff74cdd3d in flutter::SkiaUnrefQueue::Unref(SkRefCnt*)::$_0::operator()() const (this=0x7fffc4000ec8) at ../../flutter/flow/skia_gpu_object.cc:30
#34 0x00007ffff74cdcfd in std::__1::__invoke<flutter::SkiaUnrefQueue::Unref(SkRefCnt*)::$_0&> (__f=...) at ../../third_party/libcxx/include/type_traits:3530
#35 0x00007ffff74cdcad in std::__1::__invoke_void_return_wrapper<void>::__call<flutter::SkiaUnrefQueue::Unref(SkRefCnt*)::$_0&>(flutter::SkiaUnrefQueue::Unref(SkRefCnt*)::$_0&) (__args=...) at ../../third_party/libcxx/include/__functional_base:348
#36 0x00007ffff74cdc7d in std::__1::__function::__alloc_func<flutter::SkiaUnrefQueue::Unref(SkRefCnt*)::$_0, std::__1::allocator<flutter::SkiaUnrefQueue::Unref(SkRefCnt*)::$_0>, void ()>::operator()() (this=0x7fffc4000ec8)
    at ../../third_party/libcxx/include/functional:1533
#37 0x00007ffff74ccfd9 in std::__1::__function::__func<flutter::SkiaUnrefQueue::Unref(SkRefCnt*)::$_0, std::__1::allocator<flutter::SkiaUnrefQueue::Unref(SkRefCnt*)::$_0>, void ()>::operator()() (this=0x7fffc4000ec0)
    at ../../third_party/libcxx/include/functional:1707
#38 0x00007ffff62ff8d2 in std::__1::__function::__value_func<void ()>::operator()() const (this=0x7fffd62c0cf0) at ../../third_party/libcxx/include/functional:1860
#39 0x00007ffff62ff875 in std::__1::function<void ()>::operator()() const (this=0x7fffd62c0cf0) at ../../third_party/libcxx/include/functional:2419
#40 0x00007ffff63e8ec4 in fml::MessageLoopImpl::FlushTasks (this=0x7fffc4000b80, type=fml::FlushType::kAll) at ../../flutter/fml/message_loop_impl.cc:130
#41 0x00007ffff63e8d9a in fml::MessageLoopImpl::RunExpiredTasksNow (this=0x7fffc4000b80) at ../../flutter/fml/message_loop_impl.cc:143
#42 0x00007ffff64032ba in fml::MessageLoopLinux::OnEventFired (this=0x7fffc4000b80) at ../../flutter/fml/platform/linux/message_loop_linux.cc:89
#43 0x00007ffff640326c in fml::MessageLoopLinux::Run (this=0x7fffc4000b80) at ../../flutter/fml/platform/linux/message_loop_linux.cc:70
#44 0x00007ffff63e8d39 in fml::MessageLoopImpl::DoRun (this=0x7fffc4000b80) at ../../flutter/fml/message_loop_impl.cc:96
#45 0x00007ffff63e7e9d in fml::MessageLoop::Run (this=0x7fffc4000b60) at ../../flutter/fml/message_loop.cc:49
#46 0x00007ffff63fdfe7 in fml::Thread::Thread(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&)::$_0::operator()() const (this=0xb7d288 <dart::observatory::observatory_assets_archive_+7042104>)
    at ../../flutter/fml/thread.cc:35
#47 0x00007ffff63fdf3d in std::__1::__invoke<fml::Thread::Thread(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&)::$_0> (__f=...) at ../../third_party/libcxx/include/type_traits:3530
#48 0x00007ffff63fdee5 in std::__1::__thread_execute<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, fml::Thread::Thread(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&)::$_0>(std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, fml::Thread::Thread(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&)::$_0>&, std::__1::__tuple_indices<>) (__t=...) at ../../third_party/libcxx/include/thread:341
#49 0x00007ffff63fdc78 in std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, fml::Thread::Thread(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&)::$_0> >(void*) (__vp=0xb7d280 <dart::observatory::observatory_assets_archive_+7042096>) at ../../third_party/libcxx/include/thread:351
#50 0x00007ffff3e33609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#51 0x00007ffff3c5d293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

@jslater89
Contributor Author

A minimal example is an app which displays an image, starts a background isolate, and calls SystemNavigator.pop. See: https://github.com/jslater89/go-flutter-desktop-repro

It seems like it must be in the drawing code somewhere, but I haven't quite tracked down what happens on close yet.

@jslater89
Contributor Author

After some gdb-fu today, I've found that the crash happens because a GrGLGpu is used after being freed in the multiple-isolate case: its destructor runs, and then its deleteSync method is called.

I'm not sure why, yet, but the message loop stuff in the backtrace seems plausible? One message loop per isolate, wires crossed somehow so that both loops end up with an event that leads to deleting a GrGLGpu shared between threads?

I may not have time to dig in any more this weekend, but I'll continue nosing around as I have time.

@jslater89
Contributor Author

I had a little more time to look at things today, and found confirmation that it's related to the message loop/task runner setup: with multiple isolates and an image drawn, EventLoop.PostTask gets called from a non-main thread.

Out of time for now, but I have two avenues of attack: tracing the non-main call to the event loop backward, and working forward from the two-isolates-no-images case.

@pchampio
Member

pchampio commented Dec 6, 2021

Sorry for the late response, I'm also out of time^^
I did the following modification, turning this line:

a.engine.RunTask, // Flush tasks

into this:

	eventLoop := newEventLoop(
		glfw.PostEmptyEvent, // Wakeup GLFW
		func(t *embedder.FlutterTask) error {
			if !a.window.ShouldClose() {
				return a.engine.RunTask(t)
			}
			return nil
		}, // Flush tasks
	)

But it didn't work out!

Task thread assignment is handled by the flutter engine framework with:

go-flutter/event-loop.go

Lines 52 to 56 in a180924

// RunOnCurrentThread return true if tasks posted on the
// calling thread will be run on that same thread.
func (t *EventLoop) RunOnCurrentThread() bool {
	return currentthread.Equal(currentthread.ID(), t.mainThreadID)
}

Maybe this requirement isn't met:

go-flutter/event-loop.go

Lines 58 to 60 in a180924

// PostTask posts a Flutter engine tasks to the event loop for delayed execution.
// PostTask must ALWAYS be called on the same goroutine/thread as `newEventLoop`
func (t *EventLoop) PostTask(task embedder.FlutterTask, targetTimeNanos uint64) {

@pchampio
Member

pchampio commented Dec 6, 2021

The error occurs when we call:

res := (Result)(C.FlutterEngineShutdown(flu.Engine))
if res != ResultSuccess {
	return res.GoError("engine.Shutdown()")
}

@jslater89
Contributor Author

I did the following modification [...]

Ooh, good thought.

I have this addition in event-loop.go:

if !t.RunOnCurrentThread() {
	log.Fatalf("Thread mismatch for task: %v %v %v", currentthread.ID(), t.mainThreadID, task)
}

Sometime later this week or this weekend, I'll trace that back into the Flutter source to find out where the event loop/tasker is getting shared between threads, and what we can do on the Go side to fix it. (Or, alternatively, what the default Linux embedder does.)

@jslater89
Contributor Author

I may be barking up the wrong tree with the event-loop thread mismatch. Even without spawning extra isolates, the event loop gets called from the io.flutter.ui thread.

#0  github.com/go-flutter-desktop/go-flutter.(*EventLoop).PostTask (t=0xc0000a4360, task=..., targetTimeNanos=77445180810000) at /home/jay/development/personal/go-flutter-desktop/go-flutter/event-loop.go:63
#1  0x000000000057edd8 in github.com/go-flutter-desktop/go-flutter.(*EventLoop).PostTask-fm (task=..., targetTimeNanos=77445180810000) at /home/jay/development/personal/go-flutter-desktop/go-flutter/event-loop.go:61
#2  0x0000000000550e7c in github.com/go-flutter-desktop/go-flutter/embedder.proxy_post_task_callback (task=..., targetTimeNanos=77445180810000, userData=0xc0000b8420) at /home/jay/development/personal/go-flutter-desktop/go-flutter/embedder/embedder_proxy.go:94
#3  github.com/go-flutter-desktop/go-flutter/embedder._cgoexpwrap_1d54f1096e3c_proxy_post_task_callback (p0=..., p1=77445180810000, p2=0xc0000b8420) at _cgo_gotypes.go:917
#4  0x000000000047c0ab in runtime.call32 () at /usr/lib/go-1.13/src/runtime/asm_amd64.s:539
#5  0x000000000042a067 in runtime.cgocallbackg1 (ctxt=0) at /usr/lib/go-1.13/src/runtime/cgocall.go:314
#6  0x0000000000429e11 in runtime.cgocallbackg (ctxt=0) at /usr/lib/go-1.13/src/runtime/cgocall.go:191
#7  0x000000000047d67b in runtime.cgocallback_gofunc () at /usr/lib/go-1.13/src/runtime/asm_amd64.s:793
#8  0x000000000047ddc1 in runtime.goexit () at /usr/lib/go-1.13/src/runtime/asm_amd64.s:1357
#9  0x0000000000000000 in ?? ()
(gdb) info threads
  Id   Target Id                                            Frame 
  1    Thread 0x7ffff3af4bc0 (LWP 128988) "test"            0x00007ffff3d4e12b in __GI___select (nfds=nfds@entry=5, readfds=readfds@entry=0x7fffffffdbc0, writefds=writefds@entry=0x0, exceptfds=exceptfds@entry=0x0, timeout=timeout@entry=0x7fffffffdbb0)
    at ../sysdeps/unix/sysv/linux/select.c:41
  2    Thread 0x7ffff18a2700 (LWP 129009) "test"            runtime.usleep () at /usr/lib/go-1.13/src/runtime/sys_linux_amd64.s:131
  3    Thread 0x7fffebfff700 (LWP 129010) "test"            runtime.futex () at /usr/lib/go-1.13/src/runtime/sys_linux_amd64.s:536
  4    Thread 0x7ffff10a1700 (LWP 129011) "test"            runtime.futex () at /usr/lib/go-1.13/src/runtime/sys_linux_amd64.s:536
  5    Thread 0x7ffff08a0700 (LWP 129012) "test"            runtime.futex () at /usr/lib/go-1.13/src/runtime/sys_linux_amd64.s:536
  6    Thread 0x7fffeb7fe700 (LWP 129013) "test"            runtime.futex () at /usr/lib/go-1.13/src/runtime/sys_linux_amd64.s:536
  7    Thread 0x7fffeaffd700 (LWP 129014) "test"            runtime.futex () at /usr/lib/go-1.13/src/runtime/sys_linux_amd64.s:536
* 8    Thread 0x7fffe8df5700 (LWP 129040) "io.flutter.ui"   github.com/go-flutter-desktop/go-flutter.(*EventLoop).PostTask (t=0xc0000a4360, task=..., targetTimeNanos=77445180810000) at /home/jay/development/personal/go-flutter-desktop/go-flutter/event-loop.go:63
  9    Thread 0x7fffd22c1700 (LWP 129041) "io.flutter.io"   0x00007ffff3d585ce in epoll_wait (epfd=22, events=0x7fffd22c0da8, maxevents=1, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
  10   Thread 0x7fffd1ac0700 (LWP 129042) "io.worker.1"     futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0xa7f840) at ../sysdeps/nptl/futex-internal.h:183
  11   Thread 0x7fffd12bf700 (LWP 129043) "io.worker.2"     futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0xa7f840) at ../sysdeps/nptl/futex-internal.h:183
  12   Thread 0x7fffd0abe700 (LWP 129044) "io.worker.3"     futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0xa7f840) at ../sysdeps/nptl/futex-internal.h:183
  13   Thread 0x7fffc9be5700 (LWP 129045) "io.worker.4"     futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0xa7f840) at ../sysdeps/nptl/futex-internal.h:183
  14   Thread 0x7fffc93e4700 (LWP 129046) "io.worker.5"     futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0xa7f840) at ../sysdeps/nptl/futex-internal.h:183
  15   Thread 0x7fffc8be3700 (LWP 129047) "io.worker.6"     futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0xa7f840) at ../sysdeps/nptl/futex-internal.h:183
  16   Thread 0x7fffc3fff700 (LWP 129048) "io.worker.7"     futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0xa7f840) at ../sysdeps/nptl/futex-internal.h:183
  17   Thread 0x7fffc37fe700 (LWP 129049) "io.worker.8"     futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0xa7f840) at ../sysdeps/nptl/futex-internal.h:183
--Type <RET> for more, q to quit, c to continue without paging--
  18   Thread 0x7fffc2ffd700 (LWP 129050) "io.worker.9"     futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0xa7f840) at ../sysdeps/nptl/futex-internal.h:183
  19   Thread 0x7fffc27fc700 (LWP 129051) "io.worker.10"    futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0xa7f840) at ../sysdeps/nptl/futex-internal.h:183
  20   Thread 0x7fffc1ffb700 (LWP 129052) "io.worker.11"    futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0xa7f840) at ../sysdeps/nptl/futex-internal.h:183
  21   Thread 0x7fffc17fa700 (LWP 129053) "io.worker.12"    futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0xa7f840) at ../sysdeps/nptl/futex-internal.h:183
  22   Thread 0x7fffe8530700 (LWP 129054) "dart:io EventHa" 0x00007ffff3d585ce in epoll_wait (epfd=27, events=0x7fffe852fda0, maxevents=16, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
  23   Thread 0x7fffd01ff700 (LWP 129055) "DartWorker"      futex_abstimed_wait_cancelable (private=<optimized out>, abstime=0x7fffd01fe930, clockid=<optimized out>, expected=0, futex_word=0xa1af48) at ../sysdeps/nptl/futex-internal.h:320
  24   Thread 0x7fffc0dff700 (LWP 129056) "DartWorker"      futex_abstimed_wait_cancelable (private=<optimized out>, abstime=0x7fffc0dfe930, clockid=<optimized out>, expected=0, futex_word=0xa1af4c) at ../sysdeps/nptl/futex-internal.h:320
  25   Thread 0x7fffc0cfe700 (LWP 129057) "DartWorker"      futex_abstimed_wait_cancelable (private=<optimized out>, abstime=0x7fffc0cfd930, clockid=<optimized out>, expected=0, futex_word=0xa1af48) at ../sysdeps/nptl/futex-internal.h:320
  26   Thread 0x7fffc08ff700 (LWP 129058) "DartWorker"      0x00007ffff7d6aa0b in dart::kernel::Reader::ReadUInt (this=0x7fffc08fd688) at ../../third_party/dart/runtime/vm/kernel_binary.h:313

@jslater89
Contributor Author

That was indeed barking up the wrong tree: looking at embedder.h, PostTask doesn't need to be called from the same thread as newEventLoop, as long as the posted task runs on that thread, which is what happens.
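
To make that concrete, here's a minimal sketch of the contract as I understand it (all names below are hypothetical, not go-flutter's actual internals): PostTask only needs to be thread-safe and to wake the loop, while task execution stays pinned to the thread that created the loop.

package eventloop

import "sync"

// Hypothetical sketch of the embedder.h task-runner contract; the type
// and method names are illustrative, not go-flutter's real code.
type engineTask struct{ id uint64 }

type loop struct {
	mu      sync.Mutex
	pending []engineTask
	wakeup  func() // e.g. glfw.PostEmptyEvent in go-flutter
}

// PostTask may be called from any engine thread (io.flutter.ui included);
// it only enqueues the task and wakes the main loop.
func (l *loop) PostTask(t engineTask) {
	l.mu.Lock()
	l.pending = append(l.pending, t)
	l.mu.Unlock()
	l.wakeup()
}

// RunPending is the half of the contract the engine relies on: it must
// only ever run on the thread that called newEventLoop.
func (l *loop) RunPending(run func(engineTask)) {
	l.mu.Lock()
	tasks := l.pending
	l.pending = nil
	l.mu.Unlock()
	for _, t := range tasks {
		run(t)
	}
}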

I think it has to be something in the task scheduling somewhere, going by the fact that Flutter is trying to free the skia image after freeing the GPU. I just can't work out how it's happening yet.

@pchampio
Member

pchampio commented Dec 9, 2021

The event-loop thread mismatch may potentially be barking up the wrong tree. Even without spawning extra isolates, the event loop gets called from the io.flutter.ui thread.

Yea, IIRC flutter has multiple task runners: https://github.com/flutter/flutter/wiki/The-Engine-architecture#threading
And it's up to the engine to select the correct runner to execute a given task on.

We must inform the engine which thread was used to initialize the graphics context (the main thread), and this is done through:

go-flutter/event-loop.go

Lines 52 to 56 in a180924

// RunOnCurrentThread return true if tasks posted on the
// calling thread will be run on that same thread.
func (t *EventLoop) RunOnCurrentThread() bool {
	return currentthread.Equal(currentthread.ID(), t.mainThreadID)
}

This enables the engine to execute tasks of the UI runner on the go-flutter main thread. Seems logical.

newEventLoop, so long as the posted task runs on that thread, which is what happens.

Yea correct! The comment is wrong.

What is bugging me is that the error occurs on FlutterEngineShutdown, so no tasks should be running after that point?
Maybe I'm wrong and we should empty the task lists (rough sketch after the snippet below)?

go-flutter/event-loop.go

Lines 17 to 18 in a180924

// store the task (event) by their priorities
priorityqueue *priorityqueue.PriorityQueue
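
If we do want to empty the task list before shutdown, the rough shape would be something like this. It's only a sketch: the queue type below is a self-contained stand-in, not the actual priorityqueue API used in event-loop.go; it just shows the idea of discarding queued engine tasks instead of running them.

package main

import (
	"container/heap"
	"fmt"
)

// taskItem and taskQueue are hypothetical stand-ins for the tasks held
// by go-flutter's priorityqueue; the real types live in event-loop.go.
type taskItem struct {
	fireTimeNanos uint64
}

type taskQueue []*taskItem

func (q taskQueue) Len() int            { return len(q) }
func (q taskQueue) Less(i, j int) bool  { return q[i].fireTimeNanos < q[j].fireTimeNanos }
func (q taskQueue) Swap(i, j int)       { q[i], q[j] = q[j], q[i] }
func (q *taskQueue) Push(x interface{}) { *q = append(*q, x.(*taskItem)) }
func (q *taskQueue) Pop() interface{} {
	old := *q
	item := old[len(old)-1]
	*q = old[:len(old)-1]
	return item
}

// drain discards every queued task without running it; the idea would be
// to call this before FlutterEngineShutdown, so no stale task can touch
// engine state that is already torn down.
func drain(q *taskQueue) {
	for q.Len() > 0 {
		heap.Pop(q)
	}
}

func main() {
	q := &taskQueue{}
	heap.Push(q, &taskItem{fireTimeNanos: 10})
	heap.Push(q, &taskItem{fireTimeNanos: 5})
	drain(q)
	fmt.Println("pending tasks after drain:", q.Len()) // prints 0
}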

@jslater89
Contributor Author

jslater89 commented Dec 9, 2021

On second thought, I think you're right—I'd have to step through it with the debugger again, but it does seem like the event loop is no longer looping by the time we get to FlutterEngineShutdown. That is, it's blocked while FlutterEngineShutdown happens, and outside the loop that actually runs tasks.

Hmm. Maybe I need to see if I can find where Flutter tears down the root isolate, and make sure that's not happening more than once for some reason? But it doesn't seem like it is, based on breakpoints and what hits them. It does, rather, seem like a simple 'things happening out of order' problem.

@jslater89
Contributor Author

Working forward from startup, it looks like the two threads that see segfaults in my experience (io.flutter.io and io.flutter.ui) are created and managed by the Flutter engine itself. (The only two task runners we're even allowed to provide are the platform and raster ones.)

flutter::TaskRunners task_runners(
      kFlutterThreadName,
      platform_task_runner,                    // platform
      render_task_runner,                      // raster
      thread_host.ui_thread->GetTaskRunner(),  // ui (always engine managed)
      thread_host.io_thread->GetTaskRunner()   // io (always engine managed)
  );

So whatever's happening is at least partially the Flutter engine's fault, maybe? It doesn't seem to be happening on a thread we manage, at least. I'll have to look at what the other embedders are doing, too. Maybe we need to flush our message loops on exit, prior to shutdown, or something.
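
For the record, the "flush before shutdown" idea would look roughly like this on our side. Every name below is an assumed placeholder (hasPending, runPendingTasks, and the shutdown wrapper are not real go-flutter API); it only shows the intended ordering.

package main

import "fmt"

// Sketch: service the event loop until no engine work remains, and only
// then call FlutterEngineShutdown.
type fakeLoop struct{ pending []func() }

func (l *fakeLoop) hasPending() bool { return len(l.pending) > 0 }

func (l *fakeLoop) runPendingTasks() {
	tasks := l.pending
	l.pending = nil
	for _, t := range tasks {
		t() // in go-flutter this would be a.engine.RunTask(...)
	}
}

type fakeEngine struct{}

// shutdown stands in for the cgo call to C.FlutterEngineShutdown.
func (e *fakeEngine) shutdown() error { return nil }

func shutdownCleanly(l *fakeLoop, e *fakeEngine) error {
	for l.hasPending() {
		l.runPendingTasks()
	}
	return e.shutdown()
}

func main() {
	l := &fakeLoop{pending: []func(){func() { fmt.Println("ran a late engine task") }}}
	if err := shutdownCleanly(l, &fakeEngine{}); err != nil {
		fmt.Println("shutdown error:", err)
	}
}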

I noticed that the FlutterEmbedderGLFW example in the flutter engine repository never actually calls FlutterEngineShutdown. That seems like potentially poor practice, but it does keep the crash from happening, so I can continue work on the project I'm using this embedder for in parallel with looking into this bug.

@jslater89 jslater89 changed the title Segmentation fault closing application with multiple isolates Segmentation fault closing application with multiple isolates and displaying an image Dec 10, 2021
@jslater89
Contributor Author

I did some stepping through the Flutter engine shutdown procedure tonight, without much to show for it. There are some bits and bobs in shell.cc that tear things down cleanly. They run the platform and render tasks on the current thread correctly, and the work they queue on the UI and IO threads is also correctly timed.

Then, after all the teardown is seemingly done, some events shake out of the IO thread's message loop task queue and try to free some Skia image objects after the GPU is gone. I think my next step is going to be two runs of the shutdown process, one with two isolates and one with one, and breakpoints on the particular Skia objects that get torn down late, so I can see where/how the Flutter code interacts with them and why they're in the IO queue late.

@pchampio
Member

Your previous message reminded me that we do something particular in:

FlutterCustomTaskRunners custom_task_runners = {};
custom_task_runners.struct_size = sizeof(FlutterCustomTaskRunners);
// Render task and platform task are handled by the same TaskRunner
custom_task_runners.platform_task_runner = &platform_task_runner;
custom_task_runners.render_task_runner = &platform_task_runner;
Args->custom_task_runners = &custom_task_runners;

And also: #134 (comment)

@jslater89
Contributor Author

Some lunchtime investigation suggests that letting Flutter manage the render task runner doesn't make a difference, which makes sense. Flutter's teardown code seems to occur in the order I'd expect it to.

  fml::TaskRunner::RunNowOrPostTask(
      task_runners_.GetUITaskRunner(),
      fml::MakeCopyable([this, &ui_latch]() mutable {
        engine_.reset();
        ui_latch.Signal();
      }));
  ui_latch.Wait();

  fml::TaskRunner::RunNowOrPostTask(
      task_runners_.GetRasterTaskRunner(),
      fml::MakeCopyable(
          [this, rasterizer = std::move(rasterizer_), &gpu_latch]() mutable {
            rasterizer.reset();
            this->weak_factory_gpu_.reset();
            gpu_latch.Signal();
          }));
  gpu_latch.Wait();

  fml::TaskRunner::RunNowOrPostTask(
      task_runners_.GetIOTaskRunner(),
      fml::MakeCopyable([io_manager = std::move(io_manager_),
                         platform_view = platform_view_.get(),
                         &io_latch]() mutable {
        io_manager.reset();
        if (platform_view) {
          platform_view->ReleaseResourceContext();
        }
        io_latch.Signal();
      }));

  io_latch.Wait();

  // The platform view must go last because it may be holding onto platform side
  // counterparts to resources owned by subsystems running on other threads. For
  // example, the NSOpenGLContext on the Mac.
  fml::TaskRunner::RunNowOrPostTask(
      task_runners_.GetPlatformTaskRunner(),
      fml::MakeCopyable([platform_view = std::move(platform_view_),
                         &platform_latch]() mutable {
        platform_view.reset();
        platform_latch.Signal();
      }));
  platform_latch.Wait();

Today's lunchtime work says that it has to do with the SkiaUnrefQueue owned by ShellIOManager. (ShellIOManager works on the IO task runner/thread, and its responsibilities seem mainly limited to loading and processing images.) The queue's Drain method is called during the teardown code above, before the GrGLGpu object gets destroyed (I think; it's been a bit), which clears out all the events in it. There's a note that it's the caller's responsibility to ensure no further unrefs get queued after draining the queue manually; I read that as implying that by the time you drain manually, you've probably also deleted the resources a late unref would need to succeed.
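
To make that invariant concrete, here's a toy Go model of the behavior described above. It's a stand-in for flutter's C++ SkiaUnrefQueue, not the actual implementation.

package sketch

import "sync"

// Toy model of the drain invariant: once the queue is drained, any
// further Unref is operating on resources that no longer exist.
type unrefQueue struct {
	mu      sync.Mutex
	pending []func() // deferred releases that need the GPU alive
	drained bool
}

func (q *unrefQueue) Unref(release func()) {
	q.mu.Lock()
	defer q.mu.Unlock()
	if q.drained {
		// This is essentially our crash: the GrGLGpu backing these
		// releases was destroyed around the time the queue was drained.
		panic("unref posted after manual drain")
	}
	q.pending = append(q.pending, release)
}

// Drain runs every queued release and marks the queue dead; per the note
// in the flutter source, the caller must guarantee nothing posts to the
// queue afterwards.
func (q *unrefQueue) Drain() {
	q.mu.Lock()
	releases := q.pending
	q.pending = nil
	q.drained = true
	q.mu.Unlock()
	for _, release := range releases {
		release()
	}
}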

Later on, after the teardown code above completes, another event shows up in io_manager's unref queue. It comes from Picture's dispose method. Picture has a reference to the io_manager unref queue since, as the name suggests, it manages the loaded image. So, for some reason, when tearing down the app with two isolates, the UI is disposed of (or the UI dispose tasks run) after the renderer teardown finishes.

Getting closer now!

@jslater89
Contributor Author

I found the first major difference between the background-isolate and no-background-isolate cases.

With no background isolate, SkiaGPUObject<SkImage>::reset() happens on the io.flutter.ui thread. The backtrace indicates that it runs in the UI message loop, as a consequence of RuntimeController::~RuntimeController() executing while the shell tears down the message loops. I think that means reset pushes the unref job onto the io_manager's unref queue, which then gets drained as the IO thread tears down.

With a background isolate, SkiaGPUObject<SkImage>::reset() happens on a plain old DartWorker thread, as the Dart VM is tearing down an isolate group, outside of the UI message loop. I think that means there's no guarantee it happens before the IO thread teardown stuff happens, which is the point at which it's no longer safe to be unreffing images.

This seems potentially meaningful, but also maybe not our fault? I think my next step is probably going to be opening an issue in the Flutter repository to see if they'll take this seriously.
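
For anyone following along, the suspected hazard boils down to something like this toy Go model. It's my reading of the traces above, not a faithful reproduction of the engine; the real actors are fml latches, the IO task runner, and a DartWorker thread.

package main

import (
	"fmt"
	"sync"
)

// Toy model of the suspected race: shell teardown and isolate-group
// teardown run on different threads, and nothing orders the DartWorker's
// late unref against the IO thread's queue drain.
func main() {
	var wg sync.WaitGroup
	wg.Add(2)

	// io.flutter.io teardown, synchronized with the shell via latches:
	go func() {
		defer wg.Done()
		fmt.Println("io: Drain() unref queue, then GrGLGpu is destroyed")
	}()

	// DartWorker destroying the background isolate group. In the
	// single-isolate case this cleanup happens inside the UI message
	// loop instead, which would be why only multi-isolate runs crash.
	go func() {
		defer wg.Done()
		fmt.Println("worker: SkiaGPUObject<SkImage>::reset() posts an unref")
	}()

	// Either print order is possible; the order where the unref lands
	// after the drain is the use-after-free we're seeing.
	wg.Wait()
}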

@pchampio
Member

Yea, I think we should open an issue on the flutter repo.
Without additional information, I don't see how calls to FlutterEngineShutdown can result in a segfault.
