
WHEP source ends up in deadlock #3108

Closed · 3 of 14 tasks
RouquinBlanc opened this issue Mar 5, 2024 · 1 comment · Fixed by #3110
Labels: bug (Something isn't working), webrtc

RouquinBlanc (Contributor):

Which version are you using?

v1.5.1

Which operating system are you using?

  • Linux amd64 standard
  • Linux amd64 Docker
  • Linux arm64 standard
  • Linux arm64 Docker
  • Linux arm7 standard
  • Linux arm7 Docker
  • Linux arm6 standard
  • Linux arm6 Docker
  • Windows amd64 standard
  • Windows amd64 Docker (WSL backend)
  • macOS amd64 standard
  • macOS amd64 Docker
  • macOS arm64 standard
  • Other (please describe)

Describe the issue

This is probably very similar to #3062, but I'm reporting it separately until there is more evidence that it is a duplicate.

In short, the instance gets stuck and is unable to reconnect a WHEP source; this happens more often on bad connections.

Describe how to replicate the issue

  1. Start a mediamtx instance with a WHEP source configured (a configuration sketch is shown below). For the remote side, we use another mediamtx instance serving an RTSP stream.
  2. If the connection to the WHEP source goes down or the source is unreachable, mediamtx never reconnects until it is restarted.
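
For reference, a minimal path configuration for step 1 might look like the sketch below; remote-host, the port and the path names are placeholders, and the whep:// source scheme is an assumption based on the mediamtx documentation:

# mediamtx.yml (sketch, not the actual configuration from this report)
paths:
  cam:
    # pull the remote stream over WHEP; the exact URL depends on the remote server
    source: whep://remote-host:8889/cam/whep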

Logs were captured, including a goroutine listing taken while the issue was happening.
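
For anyone trying to capture the same data: a goroutine listing in the aggregated format shown below can be fetched from mediamtx's built-in pprof endpoint, assuming pprof is enabled in the configuration (the address shown is the default):

# in mediamtx.yml:
#   pprof: yes
#   pprofAddress: 127.0.0.1:9999
curl 'http://127.0.0.1:9999/debug/pprof/goroutine?debug=1' > goroutines.txt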

It looks like this may be similar to another reported issue, where blocking inside a webrtc callback causes a deadlock.

Looking at the goroutine listing, this one is blocked (PeerConnection.Close waiting inside pion's ICE agent):

1 @ 0x1043ef218 0x1043b98a4 0x1043b9464 0x104ae1840 0x104b47464 0x104b4a7ec 0x104b5fc58 0x104b853d4 0x104b853c1 0x104bd7224 0x104be78f8 0x104429084
#	0x104ae183f	github.com/pion/ice/v2.(*Agent).Close+0xdf						/Users/xxxxxxxx/workspace/go/pkg/mod/github.com/aler9/ice/v2@v2.0.0-20231112223552-32d34dfcf3a1/agent.go:955
#	0x104b47463	github.com/pion/webrtc/v3.(*ICEGatherer).Close+0x63					/Users/xxxxxxxx/workspace/go/pkg/mod/github.com/aler9/webrtc/v3@v3.0.0-20231112223655-e402ed2689c6/icegatherer.go:197
#	0x104b4a7eb	github.com/pion/webrtc/v3.(*ICETransport).Stop+0xab					/Users/xxxxxxxx/workspace/go/pkg/mod/github.com/aler9/webrtc/v3@v3.0.0-20231112223655-e402ed2689c6/icetransport.go:202
#	0x104b5fc57	github.com/pion/webrtc/v3.(*PeerConnection).Close+0x3f7					/Users/xxxxxxxx/workspace/go/pkg/mod/github.com/aler9/webrtc/v3@v3.0.0-20231112223655-e402ed2689c6/peerconnection.go:2088
#	0x104b853d3	github.com/bluenviron/mediamtx/internal/protocols/webrtc.(*PeerConnection).Close+0x433	/Users/xxxxxxxx/workspace/go/src/mediamtx/internal/protocols/webrtc/peer_connection.go:142
#	0x104b853c0	github.com/bluenviron/mediamtx/internal/protocols/webrtc.(*WHIPClient).Read+0x420	/Users/xxxxxxxx/workspace/go/src/mediamtx/internal/protocols/webrtc/whip_client.go:146
#	0x104bd7223	github.com/bluenviron/mediamtx/internal/staticsources/webrtc.(*Source).Run+0x1f3	/Users/xxxxxxxx/workspace/go/src/mediamtx/internal/staticsources/webrtc/source.go:54
#	0x104be78f7	github.com/bluenviron/mediamtx/internal/core.(*staticSourceHandler).run.func1.1+0x47	/Users/xxxxxxxx/workspace/go/src/mediamtx/internal/core/static_source_handler.go:172

And this one is blocked at the same time (the OnICECandidate callback inside PeerConnection.Start):

1 @ 0x1043ef218 0x1044026b8 0x104b832a0 0x104b473a4 0x104ae4754 0x104add81c 0x104429084
#	0x104b8329f	github.com/bluenviron/mediamtx/internal/protocols/webrtc.(*PeerConnection).Start.func3+0x17f	/Users/xxxxxxxx/workspace/go/src/mediamtx/internal/protocols/webrtc/peer_connection.go:127
#	0x104b473a3	github.com/pion/webrtc/v3.(*ICEGatherer).Gather.func1+0x273					/Users/xxxxxxxx/workspace/go/pkg/mod/github.com/aler9/webrtc/v3@v3.0.0-20231112223655-e402ed2689c6/icegatherer.go:177
#	0x104ae4753	github.com/pion/ice/v2.(*Agent).onCandidate+0x83						/Users/xxxxxxxx/workspace/go/pkg/mod/github.com/aler9/ice/v2@v2.0.0-20231112223552-32d34dfcf3a1/agent_handlers.go:34
#	0x104add81b	github.com/pion/ice/v2.(*Agent).candidateRoutine+0x4b						/Users/xxxxxxxx/workspace/go/pkg/mod/github.com/aler9/ice/v2@v2.0.0-20231112223552-32d34dfcf3a1/agent_handlers.go:58

The issue can be reproduced systematically by forcing a failure in WHIPClient.Read after c.pc.Start(). In practice PostOffer is the call most likely to fail, but any call to c.pc.Close() before the read loop starts will deadlock. A minimal sketch of the pattern follows.
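
To make the failure mode concrete, here is a minimal, self-contained Go sketch of the pattern; all names are hypothetical and this is not the actual pion/mediamtx code. Close() waits for the goroutine that delivers candidates, while that goroutine is stuck inside a callback sending on a channel that nobody reads, because the read loop never started:

package main

import (
    "fmt"
    "sync"
    "time"
)

// agent mimics (very loosely) pion's ICE agent: candidates are delivered
// to a callback from a dedicated goroutine, and Close waits for that
// goroutine to exit.
type agent struct {
    events      chan string
    done        chan struct{}
    wg          sync.WaitGroup
    onCandidate func(string)
}

func newAgent(onCandidate func(string)) *agent {
    a := &agent{
        events:      make(chan string),
        done:        make(chan struct{}),
        onCandidate: onCandidate,
    }
    a.wg.Add(1)
    go a.candidateRoutine()
    return a
}

func (a *agent) candidateRoutine() {
    defer a.wg.Done()
    for {
        select {
        case c := <-a.events:
            a.onCandidate(c) // if the callback blocks, so does this goroutine
        case <-a.done:
            return
        }
    }
}

func (a *agent) Close() {
    close(a.done)
    a.wg.Wait() // never returns if candidateRoutine is stuck in the callback
}

func main() {
    // The callback forwards candidates on an unbuffered channel that is
    // only drained by a read loop which, in the failure scenario, never
    // starts (e.g. because PostOffer failed right after Start).
    candidates := make(chan string)

    a := newAgent(func(c string) {
        candidates <- c // blocks forever: nobody is reading
    })

    a.events <- "candidate:1" // candidateRoutine picks it up and blocks
    time.Sleep(100 * time.Millisecond)

    fmt.Println("calling Close, which will never return")
    a.Close() // waits for candidateRoutine -> deadlock
}

Running this ends with the Go runtime's "all goroutines are asleep - deadlock!" abort, mirroring the two blocked stacks in the listing above.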

Did you attach the server logs?

failure.log
goroutines.txt

Logs and pprof goroutine list captured while the issue was occurring

Did you attach a network dump?

no (does not look relevant)

@aler9 added the bug (Something isn't working) and webrtc labels on Mar 6, 2024
@aler9 linked pull request #3110 on Mar 6, 2024 that will close this issue
aler9 added a commit referencing this issue on Mar 6, 2024
Co-authored-by: Jonathan Martin <jonathan.martin@marss.com>
Co-authored-by: aler9 <46489434+aler9@users.noreply.github.com>
A later comment from a contributor:

This issue is mentioned in release v1.7.0 🚀
Check out the entire changelog in the release notes.
