A possible dead lock in connection close notification Go channel #11

Closed
rickyzhang82 opened this issue Aug 26, 2021 · 5 comments

@rickyzhang82

The deadlock happens while the connection is closing. I have since lost the core dump, but here is the sequence:

  1. Register a connection close notification Go channel by passing it to connection.NotifyClose.
  2. The connection is disconnected and enters the closing state.
  3. Create a channel on the closing connection.
  4. If the connection close notification Go channel is not being consumed, step 3 deadlocks.

The documentation should state that a separate goroutine should be dedicated to consuming the connection close notification Go channel.

@DanielePalaia
Contributor

Hi, this one is quite old. Anyway, I tried to reproduce the issue as you described it by adding a unit test, and I wasn't able to:

func TestShouldNotWaitAfterConnectionClosedNewChannelCreatedIssue44(t *testing.T) {
	conn := integrationConnection(t, "TestShouldNotWaitAfterConnectionClosedNewChannelCreatedIssue44")
	ch, err := conn.Channel()
	if err != nil {
		t.Fatalf("channel error: %v", err)
	}

	conn.NotifyClose(make(chan *Error, 1))
	/*go func() {
		<-closed
	}()*/

	_, err = ch.PublishWithDeferredConfirm("test", "test", false, false, Publishing{Body: []byte("abc")})
	if err != nil {
		t.Fatalf("PublishWithDeferredConfirm error: %v", err)
	}

	ch.Close()
	conn.Close()

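	// The connection is already closed, so Channel is expected to fail
	// promptly instead of blocking.
	if _, err = conn.Channel(); err == nil {
		t.Fatalf("expected an error when opening a channel on a closed connection")
	}
}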

Was the scenario different?

@DanielePalaia
Contributor

I wasn't able to reproduce this.
If you notice the deadlock again, please open a new issue and provide a trace so we can investigate.

@rickyzhang82
Author

Sorry, I didn't get your email until now.

Your unit test doesn't replicate my sequence. The deadlock happens between consuming the Go channel from conn.NotifyClose and creating a new RabbitMQ channel with conn.Channel() while the RabbitMQ connection is in the closing state.

In our logic, we have a select statement that:

  • consumes the Go channel from conn.NotifyClose.
  • consumes the message deliveries sent from RabbitMQ. Once a message is received, we create a new channel on the same connection.

We found the problem in a high-availability test by randomly killing one of the RabbitMQ server nodes in the cluster. The mitigation is to consume the Go channel from conn.NotifyClose in a separate goroutine as soon as possible; otherwise, creating a new channel gets stuck indefinitely.

@Gsantomaggio
Member

@rickyzhang82
Could you please provide a code snippet? We tried different ways to reproduce the problem, without luck.
Thank you

@DanielePalaia
Contributor

Hi, yes, we have been trying to reproduce it in various ways but have not been able to, for example with this flow:

func TestShouldNotWaitAfterConnectionClosedNewChannelCreatedIssueXXXX(t *testing.T) {
	conn := integrationConnection(t, "TestShouldNotWaitAfterConnectionClosedNewChannelCreatedIssueXXXX")
	ch, err := conn.Channel()
	if err != nil {
		t.Fatalf("channel error: %v", err)
	}

	messages, _ := ch.Consume("test", "#", false, false, false, false, nil)

	closed := conn.NotifyClose(make(chan *Error, 1))
	var wg sync.WaitGroup

	wg.Add(1)

	go func() {
		// Without Done, wg.Wait() below never returns.
		defer wg.Done()

		select {
		case <-closed:
			t.Logf("connection is now closed")
			/*ch, err = conn.Channel()
			if err != nil {
				t.Fatalf("channel error: %v", err)
			}*/

		case d, _ := <-messages:
			t.Logf(
				"got %dB delivery: [%v] %q",
				len(d.Body),
				d.DeliveryTag,
				d.Body,
			)

			err := d.Ack(true)
			if err != nil {
				t.Logf("Error in Ack %v", err)
			}

			// Simulate a network issue (close the connection from the management API, etc.)
			time.Sleep(120 * time.Second)

			// Create a new channel. t.Fatalf must not be called outside the
			// test goroutine, so the failure is reported with t.Errorf.
			ch, err = conn.Channel()
			if err != nil {
				t.Errorf("channel error: %v", err)
			}
		}
	}()

	/*ch.Close()
	conn.Close()*/

	wg.Wait()

}

I tried reopening the channel in both case <-closed and case d, _ := <-messages after closing the connection from the client code or from the RabbitMQ UI, but without luck. If you are still able to reproduce the issue, the best thing would be to provide a trace, so that we at least have some idea of where to begin looking.
