CQ: Queue crash using SAC #5460

Closed
Zerpet opened this issue Aug 8, 2022 · 7 comments

@Zerpet
Contributor

Zerpet commented Aug 8, 2022

While investigating rabbitmq/amqp091-go#106, we found a reliable way to reproduce a queue crash using two consumers on a Classic Queue with Single Active Consumer (SAC) enabled.

I will shortly upload a slightly modified version of the original code from rabbitmq/amqp091-go#106 that reproduces the issue.

Code for repro: https://github.com/Zerpet/amqp091-go-repro-106/tree/main

Steps to repro:

  1. Start one consumer (it will declare a CQ with the SAC argument). Set it to "crash" after 4-5 seconds
  2. Start second consumer
  3. Start producer

Observed behaviour:

  • Consumer 1 starts consuming as soon as there are ready messages
  • Consumer 1 "crashes" after 4-5 seconds, as expected
  • Consumer 2 does not take over the consumption
  • A stack trace and an error are observed in RabbitMQ logs
  • A message related to queue recovery is observed in RabbitMQ logs

The issue referenced has RabbitMQ logs attached with the error mentioned above.

@laststem

Any news on this?
The crash still occurs in version 3.11.

@mkuratczyk
Contributor

I had a quick look and could indeed reproduce it easily with the provided Go code, though somehow not with perf-test. Anyway, the stack trace is:

supervisor: {<0.1203.0>,rabbit_amqqueue_sup}
errorContext: child_terminated
reason: {{badmatch,false},
         [{rabbit_amqqueue_process,'-attempt_delivery/4-fun-0-',11,
              [{file,"deps/rabbit/src/rabbit_amqqueue_process.erl"},
               {line,689}]},
          {rabbit_queue_consumers,deliver_to_consumer,4,
              [{file,"deps/rabbit/src/rabbit_queue_consumers.erl"},
               {line,275}]},
          {rabbit_queue_consumers,deliver_to_consumer,3,
              [{file,"deps/rabbit/src/rabbit_queue_consumers.erl"},
               {line,262}]},
          {rabbit_queue_consumers,deliver,6,
              [{file,"deps/rabbit/src/rabbit_queue_consumers.erl"},
               {line,222}]},
          {rabbit_amqqueue_process,attempt_delivery,4,
              [{file,"deps/rabbit/src/rabbit_amqqueue_process.erl"},
               {line,688}]},
          {rabbit_amqqueue_process,deliver_or_enqueue,3,
              [{file,"deps/rabbit/src/rabbit_amqqueue_process.erl"},
               {line,758}]},
          {rabbit_amqqueue_process,handle_cast,2,
              [{file,"deps/rabbit/src/rabbit_amqqueue_process.erl"},
               {line,1585}]},
          {gen_server2,handle_msg,2,
              [{file,"deps/rabbit_common/src/gen_server2.erl"},
               {line,1067}]}]}
offender: [{pid,<0.1204.0>},
           {id,rabbit_amqqueue},
           {mfargs,
               {rabbit_prequeue,start_link,
                   [{amqqueue,
                        {resource,<<"/">>,queue,<<"queue">>},
                        true,false,none,
                        [{<<"x-single-active-consumer">>,bool,true},
                         {<<"x-queue-version">>,signedint,2}],
                        none,[],[],[],undefined,undefined,[],[],live,0,[],
                        <<"/">>,
                        #{user => <<"guest">>},
                        rabbit_classic_queue,#{}},
                    declare,<0.1202.0>]}},
           {restart_type,transient},
           {significant,true},
           {shutdown,600000},
           {child_type,worker}]

Seems like the backing queue is not empty in this case:
https://github.com/rabbitmq/rabbitmq-server/blob/main/deps/rabbit/src/rabbit_amqqueue_process.erl#L689

I'll defer to @lhoguin as it also affects CQv2.

@lhoguin
Contributor

lhoguin commented Nov 24, 2022

Hello, we will take a look next week. In the meantime it would be helpful if you could check whether the problem also occurs on 3.9.

@laststem

Hello @lhoguin,
I don't know the exact patch versions, but I've already tested it on 3.8.x, 3.9.x, 3.10.x, and 3.11.x, and they all fail.

Thanks for taking a look soon.

@lhoguin
Contributor

lhoguin commented Nov 24, 2022

Alright, thanks, that helps a lot. It means the problem is not related to the many changes I have made since 3.10. I'll get back to you when I have something.

@lhoguin
Contributor

lhoguin commented Nov 28, 2022

Hello, I have pushed a potential fix in the following PR: #6502

The culprit was an old assertion that I do not believe is necessary, but the tests haven't finished running, so we will see.

@lhoguin
Contributor

lhoguin commented Dec 1, 2022

PR was merged so I'm closing this issue. Thanks!

@lhoguin lhoguin closed this as completed Dec 1, 2022
@michaelklishin michaelklishin added this to the 3.11.5 milestone Dec 1, 2022