Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clients get disconnected by notifications from Firefox #284

Open
torokati44 opened this issue Feb 26, 2022 · 16 comments
Open

Clients get disconnected by notifications from Firefox #284

torokati44 opened this issue Feb 26, 2022 · 16 comments

Comments

@torokati44
Copy link

torokati44 commented Feb 26, 2022

I have a (seemingly) fairly complex issue at hand, and I'm not sure which component of many is at fault.
Basically I'm cross-posting this issue here: https://bugs.kde.org/show_bug.cgi?id=450050

In short: I can reliably trigger the disconnection of a number of independent clients (such as plasmashell, dbus-monitor, and Bustle) from the session bus by sending desktop notifications repeatedly from a page in Firefox.
When this happens, Firefox gets blocked (maybe waiting for a reply from the bus?), and any invocations of notify-send after this also just hang indefinitely.

@dvdhrm
Copy link
Member

dvdhrm commented Mar 8, 2022

Do you have any hints in the system-log or system-journal about this? Any messages from dbus-broker?

@torokati44
Copy link
Author

torokati44 commented Mar 10, 2022

I don't think so, not really:

[attila:~]$ journalctl -u dbus-broker -b0 
Mar 10 02:37:53 ryzen systemd[1]: Starting D-Bus System Message Bus...
Mar 10 02:37:53 ryzen dbus-broker-launch[800]: Looking up NSS user entry for 'dbus'...
Mar 10 02:37:53 ryzen dbus-broker-launch[800]: NSS returned no entry for 'dbus'
Mar 10 02:37:53 ryzen dbus-broker-launch[800]: Invalid user-name in /usr/share/dbus-1/system.conf +18: <user>dbus</user>
Mar 10 02:37:53 ryzen systemd[1]: Started D-Bus System Message Bus.
Mar 10 02:37:53 ryzen dbus-broker-lau[800]: Ready
Mar 10 02:37:54 ryzen dbus-broker[802]: Dispatched 836 messages @ 0(±1)μs / message.
Mar 10 02:37:54 ryzen systemd[1]: Stopping D-Bus System Message Bus...
Mar 10 02:37:54 ryzen systemd[1]: dbus-broker.service: Deactivated successfully.
Mar 10 02:37:54 ryzen systemd[1]: Stopped D-Bus System Message Bus.
Mar 10 02:37:56 ryzen systemd[1]: Starting D-Bus System Message Bus...
Mar 10 02:37:56 ryzen dbus-broker-launch[1287]: Ignoring duplicate name 'org.freedesktop.ratbag1' in service file '/usr/share/dbus-1/system-services/org.freedesktop.ratbag1.service'
Mar 10 02:37:56 ryzen systemd[1]: Started D-Bus System Message Bus.
Mar 10 02:37:56 ryzen dbus-broker-lau[1287]: Ready
[attila:~]$ 

That "Invalid user-name" looks to be https://bugzilla.redhat.com/show_bug.cgi?id=1996198 and can be harmless, while the duplicate name in the service file is explicitly ignored, so...

I also tried to look for ways to start dbus-broker in "verbose" mode, to no avail.

@dvdhrm
Copy link
Member

dvdhrm commented Mar 10, 2022

I also tried to look for ways to start dbus-broker in "verbose" mode, to no avail.

It unconditionally prints any warnings/errors.

Does that log include messages when the bug actually appears?

@torokati44
Copy link
Author

It unconditionally prints any warnings/errors.

I see.

Does that log include messages when the bug actually appears?

Of course. These disconnections result in no log output from dbus-broker whatsoever.

@torokati44
Copy link
Author

torokati44 commented Mar 10, 2022

Also, a minor correction about notify-send: it does terminate eventually, it just takes about 55 seconds (+/- 5 seconds). I suspect a read/poll timeout, because the notification doesn't actually appear. (EDIT: Well, that can also be because plasmashell, which is supposed to show the notification, is also disconnected by this time.)

I can instead sometimes see this in Bustle:
image

Which is a bit suspect - 2E being the hexadecimal ASCII for . of course.
I'm not at all familiar with the DBus protocol though, so it could also be totally fine.

@torokati44
Copy link
Author

Is there any way I could raise the log level, or enable some verbose/debug/tracing mode, where everything (not just errors) is dumped into the journal, so that I (we?) can try getting to the end of this? :/

@torokati44
Copy link
Author

torokati44 commented Mar 21, 2022

I have some .pcap captures saved by Bustle from when this happens.
Does this give anyone any hints at all?

image
(Sorry for the enormous image...)

The point it not the last, selected message, but the one above it - and the fact that when I select that, Wireshark also crashes... :| (EDIT: The elided "Length" of this message is 59486538 btw.)

I can imagine this not being healthy to dbus-broker, or some of the clients, either?

@torokati44
Copy link
Author

Hmmm, could the fact that I have literally over 9000 tabs open in 4 Firefox windows have something to do with this? 🤔
Just because looking at it in a hex editor (which doesn't crash), that erroneous message seems to contain within it ~900 "entries", each with a subtext "field" for a tab name I have open, an actions one for who knows what, and an icon-data for the favicon; and more.

So, when these 50+ MiB long messages are running around between the browser and the window manager all the time (possibly on each window focus change, most likely also involving additional serialization/deserialization at the endpoints?), is it possible that something somewhere gets out of sync, times out, gets limited, or in any way garbled?

@polarathene
Copy link

polarathene commented Mar 26, 2022

50+ MiB long messages are running around between the browser and the window manager all the time
...
gets limited

I'm not affected by this issue personally (or a user of dbus-broker yet for that matter), but while glancing over the issue history, this quota issue might be relevant?:

The UID in question has ~250MB of pending data in the broker and has reached it's quota. The quota is a hard limit, and rate limiting is not supported.
A client causing a user to exceed their quota is not supported.

Although later:

We are neither disconnecting a client nor forcibly closing a session. We simply refuse further resource allocation if the limits are reached.
This is a DoS protection so one user cannot consume infinite shared resources and thus affect operation of a different user.

They also mention raising quota and connection limits as workarounds. Perhaps doing so reduces or delays the problem you're experiencing here? This wiki page may also be relevant.


9k tabs is probably a really niche scenario 😅

I personally often have up to 20-30 windows across virtual desktops with 2-3k tabs across them. That isn't causing problems like described here, but it does add friction to my UX, so I'll be managing them via separate VM instances instead.. might be applicable to you if the 9k tabs is legit usage beyond temporary/testing purposes.

@dvdhrm
Copy link
Member

dvdhrm commented May 5, 2022

Hmmm, could the fact that I have literally over 9000 tabs open in 4 Firefox windows have something to do with this?

Can you reproduce this with <20 tags open? If no, it might very well be related.

And yes, I can easily imagine there being bus-notifications that encapsulate the full information of those open tabs, and thus might easily exceed signature-limits, body-limits, or quota-limits of the process in question.

Note that dbus-broker is behaving normally, according to your logs. If an application is unresponsive, it is really up to that application to report where it hangs. But please note that the firefox-bugtracker might be unwilling to spend much time debugging issues with 9000 tabs open (but I might be wrong...).

Can you try narrowing this down? Try figuring out under which circumstances this happens. Try figuring out why clients disconnect (are you telling me several clients disconnect without any message in the system logs whatsoever?), try finding reproducers for other people to test.

@torokati44
Copy link
Author

Can you reproduce this with <20 tags open?

I'm starting to go mad because I can't seem to reproduce it on my other (almost identical, software-wise) machine with however many tabs, nor on this machine with any other Firefox instance (a separate Beta build for example) running, also with however many tabs. It's only this Firefox (the main one on the system) on this particular machine.

Even though I have, so far (already with significant effort), reduced the number of open tabs to below 4000, in only two windows.

Can you try narrowing this down?

Yes, I'm trying really hard...

Try figuring out why clients disconnect

At least Bustle reports some sort of error, I have posted a screenshot of it attached to the KDE bug linked in the OP: https://bugsfiles.kde.org/attachment.cgi?id=147085

(are you telling me several clients disconnect without any message in the system logs whatsoever?)

That's exactly what I'm seeing.

@torokati44
Copy link
Author

And again, it's really weird that I have to copy-paste something first - either a screenshot from Spectacle into GIMP, or some text from Firefox to LibreOffice Writer - in order to trigger it.
Or at least doing that makes the disconnections happen with really high probability quite soon, while it barely ever happens, if at all, if I'm not doing that.

@devkral
Copy link

devkral commented Nov 10, 2022

have you checked that dbus-broker is activated properly and dbus is disabled? After some updates of my distro dbus was partly active again.
Maybe this is the issue.
(gnome-keyring seems to disconnect in my case after hibernation on the one pc, while opening links on the other pc with firefox doesn't work. I think a partly dropped in dbus-broker is the issue, but I am still evaluating)

@torokati44
Copy link
Author

Well, that is possible... As my system was also distro-upgraded into dbus-broker from the "vanilla" server. But so far I found no evidence that there's still anything active other than dbus-broker...
Please keep me updated when you find something out!

@devkral
Copy link

devkral commented Nov 22, 2022

my firefox problem isn't fixed by removing dbus-broker (so it is not related). The other problem I still evaluate

@devkral
Copy link

devkral commented Nov 22, 2022

dbus-broker is also not the culprit of the second issue

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants