Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deadlocks when using widgets with data #2348

Closed
namezis opened this issue Aug 4, 2021 · 13 comments
Closed

Deadlocks when using widgets with data #2348

namezis opened this issue Aug 4, 2021 · 13 comments
Labels
bug Something isn't working

Comments

@namezis
Copy link

namezis commented Aug 4, 2021

Description:

I have an application with split horizontal layout:

	middle := container.NewHSplit(
		program.MainTree,
		program.DetailsContainer,
	)

Left is tree, and right is place for tree node details form.

program.MainTree = &widget.Tree{}
...
program.DetailsContainer = container.NewMax(widget.NewLabel("No selection"))

When tapped on some of the tree nodes one can get the details to modify node properties.
Some forms have data bind text or checkbox entries. Data bind to global structure fields.

I seem to get very sporadic, I assume dead lock, in fyne.io/fyne/v2@v2.0.3/data/binding/queue.go.
The deadlock happens when I change e.g.: checkbox and quickly move to another node.

This particular fatal error was: sync: Unlock of unlocked RWMutex
In other cases the UI stops refreshing, but it can still be moved and be closed.

runtime.throw(0xb6bfd5, 0x20)
/usr/lib/go-1.16/src/runtime/panic.go:1117 +0x72 fp=0xc0003d3b60 sp=0xc0003d3b30 pc=0x4630f2
sync.throw(0xb6bfd5, 0x20)
/usr/lib/go-1.16/src/runtime/panic.go:1103 +0x35 fp=0xc0003d3b80 sp=0xc0003d3b60 pc=0x497bd5
sync.(*RWMutex).Unlock(0xc000cb34f8)
/usr/lib/go-1.16/src/sync/rwmutex.go:142 +0x6a fp=0xc0003d3bc8 sp=0xc0003d3b80 pc=0x4ab18a
fyne.io/fyne/v2/widget.(*entryRenderer).Refresh(0xc0010a9000)
/home/user/go/pkg/mod/fyne.io/fyne/v2@v2.0.3/widget/entry.go:1246 +0x7a5 fp=0xc0003d3e50 sp=0xc0003d3bc8 pc=0x82a145
fyne.io/fyne/v2/widget.(*BaseWidget).Refresh(0xc001927200)
/home/user/go/pkg/mod/fyne.io/fyne/v2@v2.0.3/widget/widget.go:139 +0x82 fp=0xc0003d3ea0 sp=0xc0003d3e50 pc=0x84e7a2
fyne.io/fyne/v2/widget.(*Entry).Bind.func2()
/home/user/go/pkg/mod/fyne.io/fyne/v2@v2.0.3/widget/entry.go:140 +0x1dc fp=0xc0003d3f58 sp=0xc0003d3ea0 pc=0x84f6bc
fyne.io/fyne/v2/data/binding.(*listener).DataChanged(0xc001301890)
/home/user/go/pkg/mod/fyne.io/fyne/v2@v2.0.3/data/binding/binding.go:55 +0x2a fp=0xc0003d3f68 sp=0xc0003d3f58 pc=0x814b6a
fyne.io/fyne/v2/data/binding.DataListener.DataChanged-fm()
/home/user/go/pkg/mod/fyne.io/fyne/v2@v2.0.3/data/binding/binding.go:40 +0x44 fp=0xc0003d3f90 sp=0xc0003d3f68 pc=0x818ee4
fyne.io/fyne/v2/data/binding.processItems()
/home/user/go/pkg/mod/fyne.io/fyne/v2@v2.0.3/data/binding/queue.go:56 +0x71 fp=0xc0003d3fe0 sp=0xc0003d3f90 pc=0x818711
runtime.goexit()
/usr/lib/go-1.16/src/runtime/asm_amd64.s:1371 +0x1 fp=0xc0003d3fe8 sp=0xc0003d3fe0 pc=0x49c161
created by fyne.io/fyne/v2/data/binding.init.0
/home/user/go/pkg/mod/fyne.io/fyne/v2@v2.0.3/data/binding/queue.go:15 +0x35

from main form:

goroutine 14 [runnable]:
sync.(*RWMutex).RUnlock(0xc0001582d0)
/usr/lib/go-1.16/src/sync/rwmutex.go:75 +0x65
fyne.io/fyne/v2/app.(*settings).Theme(0xc0001582d0, 0xbaee98, 0xc00007e7e0)
/home/user/go/pkg/mod/fyne.io/fyne/v2@v2.0.3/app/settings.go:65 +0xa6
fyne.io/fyne/v2/theme.current(0x0, 0x0)
/home/user/go/pkg/mod/fyne.io/fyne/v2@v2.0.3/theme/theme.go:336 +0x9f
fyne.io/fyne/v2/theme.safeColorLookup(0xb54ee5, 0x6, 0x0, 0x0, 0x0)
/home/user/go/pkg/mod/fyne.io/fyne/v2@v2.0.3/theme/theme.go:579 +0x31
fyne.io/fyne/v2/theme.ShadowColor(0x0, 0x0)
/home/user/go/pkg/mod/fyne.io/fyne/v2@v2.0.3/theme/theme.go:437 +0x51
fyne.io/fyne/v2/widget.(*Entry).CreateRenderer(0xc001927500, 0x0, 0x0)
/home/user/go/pkg/mod/fyne.io/fyne/v2@v2.0.3/widget/entry.go:164 +0xc9
fyne.io/fyne/v2/internal/cache.Renderer(0xbb4188, 0xc001927500, 0x0, 0x0)
/home/user/go/pkg/mod/fyne.io/fyne/v2@v2.0.3/internal/cache/widget.go:29 +0x2a4
fyne.io/fyne/v2/widget.(*BaseWidget).MinSize(0xc001927500, 0x0)
/home/user/go/pkg/mod/fyne.io/fyne/v2@v2.0.3/widget/widget.go:86 +0x65
fyne.io/fyne/v2/widget.(*Entry).MinSize(0xc001927500, 0x0)
/home/user/go/pkg/mod/fyne.io/fyne/v2@v2.0.3/widget/entry.go:374 +0x74
fyne.io/fyne/v2/layout.(*formLayout).tableCellsSize(0x1372f70, 0xc0002acf00, 0xa, 0xa, 0x0, 0x0, 0x0, 0x0)
/home/user/go/pkg/mod/fyne.io/fyne/v2@v2.0.3/layout/formlayout.go:58 +0x2d9
fyne.io/fyne/v2/layout.(*formLayout).MinSize(0x1372f70, 0xc0002acf00, 0xa, 0xa, 0x0)
/home/user/go/pkg/mod/fyne.io/fyne/v2@v2.0.3/layout/formlayout.go:111 +0x74
fyne.io/fyne/v2.NewContainerWithLayout(0xbabd90, 0x1372f70, 0xc0002acf00, 0xa, 0xa, 0x0)
/home/user/go/pkg/mod/fyne.io/fyne/v2@v2.0.3/container.go:47 +0x109
fyne.io/fyne/v2/container.New(0xbabd90, 0x1372f70, 0xc0002acf00, 0xa, 0xa, 0x0)
/home/user/go/pkg/mod/fyne.io/fyne/v2@v2.0.3/container/container.go:12 +0x5c
user.app/app/objects.(*Controller).GetDetailsForm(0xc000430710, 0xc000312da0, 0x1b, 0x0, 0x0) <- My function creating layout with data bind elements
/home/user/app/objects/controller.go:128 +0xe85
main.main.func23(0xc000312da0, 0x1b)
/home/user/app/main.go:182 +0x733
fyne.io/fyne/v2/widget.(*Tree).Select(0xc000166d00, 0xc000312da0, 0x1b)
/home/user/go/pkg/mod/fyne.io/fyne/v2@v2.0.3/widget/tree.go:224 +0x405
fyne.io/fyne/v2/widget.(*treeNode).Tapped(0xc00044b320, 0xc0014c9330)
/home/user/go/pkg/mod/fyne.io/fyne/v2@v2.0.3/widget/tree.go:625 +0x5e
fyne.io/fyne/v2/internal/driver/glfw.(*window).mouseClicked.func7()
/home/user/go/pkg/mod/fyne.io/fyne/v2@v2.0.3/internal/driver/glfw/window.go:799 +0x53
fyne.io/fyne/v2/internal/driver/glfw.(*window).runEventQueue(0xc0001741c0)
/home/user/go/pkg/mod/fyne.io/fyne/v2@v2.0.3/internal/driver/glfw/window.go:1265 +0xc3
created by fyne.io/fyne/v2/internal/driver/glfw.(*gLDriver).createWindow.func1
/home/user/go/pkg/mod/fyne.io/fyne/v2@v2.0.3/internal/driver/glfw/window.go:1289 +0x1ac

It could be that I am doing something wrong, but currently I cannot see where. All my go routines are parked in correct places.

To Reproduce and Example code:

The application is not one that I can paste here. If it helps fyne I am willing to prepare simplified version and add reference here.
I want to highlight here that I can work with this app for hours without problems. At times I can have several 100s of nodes on the tree.

Screenshots:

MyApp_659

Device:

  • Linux
  • Mint 19.3 Cinnamon
  • go1.16.4 linux/amd64
  • fyne/v2@v2.0.3
@namezis namezis added the unverified A bug that has been reported but not verified label Aug 4, 2021
@Jacalz
Copy link
Member

Jacalz commented Aug 4, 2021

Please add a short code example that we can use to replicate. It makes it much easier asker to try and replicate and having one also means that we can rule out the potential that the error could be in your own code :)

@namezis
Copy link
Author

namezis commented Aug 5, 2021

I have reproduced this deadlock on a smaller version of code.

The call stack in VS Code is:

Selection_682

First mutex:
Selection_683

Second mutex:
Selection_684

In this case all the widgets are still rendering but do not respond to events, except the main window frame which can be resized and closed.

This is the main window view when the deadlock happened:

Tree Example_685

To replicate; the procedure is to move quickly between the tree nodes e.g: "sample6" and "sample9", but before the switch change the selection or other data on "sample6". I think hence the binding.(*listener).DataChanged. The event is trying to update - I assume the data changed - but it clashes with the new layout creation somehow.

Selection_686

To replicate the dead lock one needs to exercise the node switch several times. This may not happen for hours, and then suddenly in less expected moment.

I admit the indexing of the tree nodes is not the fastest, but it will do for now.

Sample source code can be found here: https://github.com/namezis/fyne-tree-with-details-form

I stopped using the widget components "*WithData" for critical parts of the code, and the app seems to be more stable.
The alternative if to use standard components with OnChange to update the new values.

Any comments are welcome. Do not hesitate to ask for more help.

@andydotxyz
Copy link
Member

Thanks this info is really helpful.
I can't quite track the cause, neither of those locks holds it open - both are property access.
Something else in the stack must have the lock held I think.

@namezis
Copy link
Author

namezis commented Aug 7, 2021

That is correct.

In another instance of this problem.

List of goroutines:

Goroutine 1 - Start: /usr/lib/go-1.16/src/runtime/proc.go:115 runtime.main (0x45dfc0) [select]
Goroutine 2 - Start: /usr/lib/go-1.16/src/runtime/proc.go:267 runtime.forcegchelper (0x45e3e0) [force gc (idle) 452306h3m26.302676813s]
Goroutine 3 - Start: /usr/lib/go-1.16/src/runtime/mgcsweep.go:156 runtime.bgsweep (0x44a000) [GC sweep wait]
Goroutine 4 - Start: /usr/lib/go-1.16/src/runtime/mgcscavenge.go:252 runtime.bgscavenge (0x448020) [GC scavenge wait]
Goroutine 18 - Start: /usr/lib/go-1.16/src/runtime/mfinal.go:161 runtime.runfinq (0x43f1a0) [finalizer wait 452306h3m26.302764095s]
Goroutine 19 - Start: /usr/lib/go-1.16/src/sync/once.go:42 sync.(*Once).Do (0x4a8720) [sleep 452306h1m26.286461985s]
Goroutine 20 - Start: /home/user/go/pkg/mod/fyne.io/fyne/v2@v2.0.3/data/binding/queue.go:21 fyne.io/fyne/v2/data/binding.makeInfiniteQueue.func1 (0x682ce0) [select]
Goroutine 21 - Start: /home/user/go/pkg/mod/fyne.io/fyne/v2@v2.0.3/data/binding/queue.go:52 fyne.io/fyne/v2/data/binding.processItems (0x682c20) [semacquire]
Goroutine 22 - Start: /home/user/go/pkg/mod/fyne.io/fyne/v2@v2.0.3/app/app.go:118 fyne.io/fyne/v2/app.newAppWithDriver.func1 (0x709140) [chan receive 452306h3m26.302925664s]
Goroutine 23 - Start: /usr/lib/go-1.16/src/runtime/mgc.go:1877 runtime.gcBgMarkWorker (0x442ea0) [GC worker (idle)]
Goroutine 34 - Start: /home/user/go/pkg/mod/github.com/fsnotify/fsnotify@v1.4.9/inotify.go:172 github.com/fsnotify/fsnotify.(*Watcher).readEvents (0x704ce0) (thread 6369)
Goroutine 35 - Start: /home/user/go/pkg/mod/fyne.io/fyne/v2@v2.0.3/app/settings_desktop.go:41 fyne.io/fyne/v2/app.watchFile.func1 (0x709540) [chan receive 452306h3m26.303035367s]
Goroutine 36 - Start: /home/user/go/pkg/mod/fyne.io/fyne/v2@v2.0.3/internal/driver/glfw/loop.go:181 fyne.io/fyne/v2/internal/driver/glfw.(*gLDriver).startDrawThread.func1 (0x6c2a00) [select]
Goroutine 37 - Start: /home/user/go/pkg/mod/fyne.io/fyne/v2@v2.0.3/internal/driver/glfw/window.go:1259 fyne.io/fyne/v2/internal/driver/glfw.(*window).runEventQueue (0x6c13e0) [semacquire]
Goroutine 38 - Start: /home/user/go/pkg/mod/fyne.io/fyne/v2@v2.0.3/internal/driver/glfw/canvas.go:493 fyne.io/fyne/v2/internal/driver/glfw.(*glCanvas).setupThemeListener.func1 (0x6c2160) [chan receive 452306h3m26.303124733s]
Goroutine 39 - Start: /usr/lib/go-1.16/src/runtime/mgc.go:1877 runtime.gcBgMarkWorker (0x442ea0) [GC worker (idle) 452306h3m26.303204685s]
Goroutine 40 - Start: /usr/lib/go-1.16/src/runtime/mgc.go:1877 runtime.gcBgMarkWorker (0x442ea0) [GC worker (idle)]
Goroutine 41 - Start: /usr/lib/go-1.16/src/runtime/mgc.go:1877 runtime.gcBgMarkWorker (0x442ea0) [GC worker (idle)]
Goroutine 50 - Start: /usr/lib/go-1.16/src/runtime/mgc.go:1877 runtime.gcBgMarkWorker (0x442ea0) [GC worker (idle)]
Goroutine 51 - Start: /usr/lib/go-1.16/src/runtime/mgc.go:1877 runtime.gcBgMarkWorker (0x442ea0) [GC worker (idle)]
Goroutine 52 - Start: /usr/lib/go-1.16/src/runtime/mgc.go:1877 runtime.gcBgMarkWorker (0x442ea0) [GC worker (idle)]
Goroutine 53 - Start: /usr/lib/go-1.16/src/runtime/mgc.go:1877 runtime.gcBgMarkWorker (0x442ea0) [GC worker (idle)]
Goroutine 67 - Start: /home/user/go/pkg/mod/fyne.io/fyne/v2@v2.0.3/internal/painter/svg_cache.go:107 fyne.io/fyne/v2/internal/painter.SvgCacheMonitorTheme.func1 (0x659ca0) [chan receive 452306h3m26.303413944s]

Goroutine 21 frame 5 at /home/user/go/pkg/mod/fyne.io/fyne/v2@v2.0.3/widget/widget.go:214 (PC: 0x6a0c65)
209: })
210: }
211:
212: // Disabled returns true if this widget is currently disabled or false if it can currently be interacted with.
213: func (w *DisableableWidget) Disabled() bool {
=> 214: w.propertyLock.RLock()
215: defer w.propertyLock.RUnlock()
216:
217: return w.disabled
218: }
219:

Goroutine 37 frame 5 at /home/user/go/pkg/mod/fyne.io/fyne/v2@v2.0.3/widget/widget.go:55 (PC: 0x6a023c)
50: w.propertyLock.RUnlock()
51: if baseSize == size {
52: return
53: }
54:
=> 55: w.propertyLock.Lock()
56: w.size = size
57: w.propertyLock.Unlock()
58:
59: if impl == nil {
60: return

I have attached the whole stack: stack-trace-lock-r1.txt
There is no code that is being processed inside the lock.

Does this mean that some code went over the w.propertyLock and left it locked?

Or, it could be that the lock is not yet initialized?

@namezis
Copy link
Author

namezis commented Aug 7, 2021

I noticed that in:

Goroutine 21 frame 6 at /home/user/go/pkg/mod/fyne.io/fyne/v2@v2.0.3/widget/check.go:53 (PC: 0x687050)

I can get the label of the check widget, so I populated this with tree id for which the tree tap happens.

I kept modifying only the checkbox, and the checkbox for which the lock is about is the same component in both goroutine 21 and goroutine 37.

Goroutine 21 frame 6 at /home/user/go/pkg/mod/fyne.io/fyne/v2@v2.0.3/widget/check.go:53 (PC: 0x687050)
48:
49: // applyTheme updates this Check to the current theme
50: func (c *checkRenderer) applyTheme() {
51: c.label.Color = theme.ForegroundColor()
52: c.label.TextSize = theme.TextSize()
=> 53: if c.check.Disabled() {
54: c.label.Color = theme.DisabledColor()
55: }
56: }
57:
58: func (c *checkRenderer) Refresh() {

Goroutine 37 frame 10 at /work/go-tests/fyne/fyne-tree-with-details-form/objects/info.go:65 (PC: 0x710494)
60:
61: paramsCard := widget.NewCard(
62: "Params",
63: "",
64: container.NewVBox(
=> 65: container.New(layout.NewFormLayout(),
66: widget.NewLabel("Param1:"), widget.NewEntryWithData(boundParam1),
67: widget.NewLabel("Param2:"), widget.NewEntryWithData(boundParam2),
68: widget.NewLabel("Param3:"), widget.NewEntryWithData(boundParam3),
69: widget.NewLabel("Param4:"), widget.NewEntryWithData(boundParam4),
70: widget.NewSeparator(), widget.NewSeparator(),

Why is the Goroutine 21 executing update when the layout is still being prepared?

@andydotxyz
Copy link
Member

This may relate to the underlying race that I am looking at where render thread is calling at the same time as layout when initialised from an external update.
See #1028 for more thoughts on this. I would love to have this sorted in 2.1, so this information could really help.

@andydotxyz andydotxyz added bug Something isn't working and removed unverified A bug that has been reported but not verified labels Aug 7, 2021
@andydotxyz andydotxyz added this to the Aberlour (2.1) milestone Aug 7, 2021
@namezis
Copy link
Author

namezis commented Aug 8, 2021

Ok, Please forgot about changing any data on the form with data bound widgets to replicate this issue.

The deadlock may happen just simply by creating a layout with widgets of kind *WithData.

One of the threads that locks is created by any widget.New*WithData constructor, and within the Bind call.
This seems to be designed to add initial DataChanged queued event:

Selection_721

As far as I understand that, the execution of that call is nondeterministic. There is no tooling to control that.

The following is the DataChanged callback executed in above code:

image

At some point, this simple code:

image

will start position the widgets (Move or Resize code in e.g formlayout.go function Layout) while another go routine will try to perform (*BaseWidget).Refresh on it:

image

Now the check propertyLock is abused in (*checkRenderer).Refresh() - best to my understanding:

image

down the stack frame we have:

image

I checked the pointers to c.check.propertyLock and w.propertyLock and they are the same.

There is a bit of material about usage of recursive read locking. The best summary can be found here:

https://stackoverflow.com/questions/14670979/recursive-locking-in-go/14671462#14671462

and in the RWMutext code itself:

image

[...] (RLock locks) should not be used for recursive read locking; a blocked Lock call excludes new readers from acquiring the lock. [...]

Russ Cox describes solution to the problem under the following link:

https://groups.google.com/d/msg/golang-nuts/XqW1qcuZgKg/Ui3nQkeLV80J

Recursive (aka reentrant) mutexes are a bad idea.

[...]
Let's take a look at recursive mutexes.
Suppose we have code like this:

func F() {
mu.Lock()
... do some stuff ...
G()
... do some more stuff ...
mu.Unlock()
}

func G() {
mu.Lock()
... do some stuff ...
mu.Unlock()
}
[...]
If you need to implement functionality that can be called
with or without holding a mutex, the clearest thing to do
is to write two versions. For example, instead of the above G,
you could write:

// To be called with mu already held.
// Caller must be careful to ensure that ...
func g() {
... do some stuff ...
}

func G() {
mu.Lock()
g()
mu.Unlock()
}

It looks like the simplest resolution to this particular deadlock is to use c.check.disabled, with lower case, if such was to exist:

image

without the RLocks:

image

Any thoughts?

@andydotxyz
Copy link
Member

Thanks for tracking this down, excellent work.
I agree that in appplyTheme we should be using the field directly as we already hold the lock.
Would you like to open a PR for this or shall I?

In the long term it is probable that renderer and widget should have different locks - but I don't think that would have fixed this particular issue.

@namezis
Copy link
Author

namezis commented Aug 9, 2021

Andrew,

Feel free to take it over.

@andydotxyz
Copy link
Member

Thanks, this should be fixed on develop, specifically version 187cb29d.
Let me know how you get on :).

@namezis
Copy link
Author

namezis commented Aug 12, 2021

Thanks very much. I am looking forward to use it.

@williambrode
Copy link

williambrode commented Oct 26, 2022

This issue still exists in the (c *checkRenderer) updateResource() and (c *checkRenderer) updateFocusIndicator() which are run inside (c *checkRenderer) Refresh(). I just hit this same deadlock during (c *checkRenderer) updateResource(). As you've figured out in this issue the Read lock can't be acquired recursively since it may have to wait on a write lock.

Here is the flow for the issue:

func (c *checkRenderer) Refresh() {
	c.check.propertyLock.RLock() <------ Takes lock
	c.applyTheme()
	c.updateLabel()
	c.updateResource() <----
	c.updateFocusIndicator()
	c.check.propertyLock.RUnlock()
	canvas.Refresh(c.check.super())
}


func (c *checkRenderer) updateResource() {
	res := theme.CheckButtonIcon()
	if c.check.Checked {
		res = theme.NewPrimaryThemedResource(theme.CheckButtonCheckedIcon())
	}
	if c.check.Disabled() { <----
		if c.check.Checked {
			res = theme.NewDisabledResource(theme.CheckButtonCheckedIcon())
		} else {
			res = theme.NewDisabledResource(res)
		}
	}
	c.icon.Resource = res
}


// Disabled returns true if this widget is currently disabled or false if it can currently be interacted with.
func (w *DisableableWidget) Disabled() bool {
	w.propertyLock.RLock() <------  Tries to lock again
	defer w.propertyLock.RUnlock()

	return w.disabled
}

I wonder if in the tests you could find this situation by replacing they mutex with a custom struct that can verify if the read lock was already taken in the call stack somehow. Or if there is some static analysis tool that can do it.

@andydotxyz do you want me to create a separate issue for these 2 instances or reopen this one?

@andydotxyz
Copy link
Member

Given the age of this issue can you open a new one please? I'd guess the code is more recent than the original fix and someone didn't notice that calling the public method from inside the private lock is a bad thing...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

4 participants