Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deadletter mechanism does not integrate well w/ transactional message brokers #4917

Open
clemensv opened this issue Jul 19, 2022 · 23 comments
Open

Comments

@clemensv
Copy link

func (a *DaprRuntime) sendToDeadLetterIfConfigured(name string, msg *pubsub.NewMessage) (isDeadLetterConfigured bool, err error) {

The deadletter mechanism in DAPR creates a broken abstraction for all scenarios where the underlying provider is a transactional message broker that has native DLQ functionality. The broker itself will generally continue to use its own deadlettering mechanism when messages expire or when messages exceed the delivery count, which are both conditions that DAPR will trigger as implemented (the Service Bus Component surely will).

In that case, you'll end up with two deadletter drop destinations rather than one, which is an operational management issue.

Message brokers also implement the move of a message queue into the deadletter queue as a transaction operation and that is not the case here. It's possible for a message to be put into the deadletter queue and yet fail to be settled in the origin queue. That's not the case with the built-in mechanism inside of a message broker.

For all messaging provides that do have DLQ support, messages that must be deadlettered must be deadlettered/rejected by the message broker into the native deadletter mechanism in order to have a consistent location for deadlettered messages, independent of whether the app or broker chose to reject them.

@yaron2
Copy link
Member

yaron2 commented Jul 19, 2022

Thanks for raising this @clemensv. Dead letter topics are off by default, and are an opt-in feature. I suggest we move this to docs as this is really about documenting a best practice which advises users to not turn on Dapr Dead Letter Topics if the underlying message broker supports it natively.

@clemensv
Copy link
Author

I don't think that is sufficient. If the underlying broker supports DLQ, all operations that lead to a message being deadlettered then also ought to reject the message into the broker's DLQ. That doesn't happen if you just turn the mechanism off.

@olitomlinson
Copy link

@yaron2

I wonder if it is a possible to author a component which has a native DLQ (such as Azure Service Bus) differently

  • Skip the creation of a dapr-owned DLQ
  • Skip the movement of the message to that dapr-owned DLQ, and let the underlying broker handle it as expected

However dapr could still bind to the native DLQ, and delegate messages to the user-code handler as normal?

The dev experience would be the same for end-users and operators, however a more robust DLQ implementation would be available for those brokers that are natively capable.

My knowledge of ASB is a little dated now, but I know there was (may be still is) a convention to get the name of DLQ when given the Topic/Queue name, which might make discovery of the DLQ particularly easy. I believe the DLQ for ASB is always enabled too which helps with predictability. I'll stop speaking now while Clemens is in the room incase I embarrass myself :P

@yaron2
Copy link
Member

yaron2 commented Jul 21, 2022

Reading this link, it seems that if Dapr DLQ isn't enabled, AND if any of the scenarios leading to a message going into the ASB DLQ occurs, then from Dapr's perspective the message is rejected.

Let's play out the following scenario:

User keeps Dapr DLQ disabled and uses ASB that's configured to send messages to the DLQ after 10 retries. Dapr reads the message off ASB and attempts 10 deliveries, all of them fail. ASB will then move the message to it's native DLQ, and Dapr consumers will no longer consume the message.

Based on this and also the described system rules in the link above, it seems to me like Dapr won't interfere with the native DLQ mechanism.

@olitomlinson
Copy link

olitomlinson commented Jul 21, 2022

I understand that Dapr won't interfere with the native DLQ mechanism providing that Dapr PubSub DLQ is disabled.

I could be out of my depth here, but I'm hearing here is Clemens is saying is that when Dapr PubSub DLQ is enabled AND the component utilises a Native DLQ in the broker, then the Dapr-Created DLQ should not be utilised under any circumstance (And I interpret this as any calls to sendToDeadLetterIfConfigured should not move the message to the Dapr-owned DLQ, but instead the message should be NACKd/rejected via the native broker API.

@yaron2
Copy link
Member

yaron2 commented Jul 21, 2022

I understand that Dapr won't interfere with the native DLQ mechanism providing that Dapr PubSub DLQ is disabled.

I could be out of my depth here, but I'm hearing here is Clemens is saying is that when Dapr PubSub DLQ is enabled AND the component utilises a Native DLQ in the broker, then the Dapr-Created DLQ should not be utilised under any circumstance (And I interpret this as any calls to sendToDeadLetterIfConfigured should not move the message to the Dapr-owned DLQ, but instead the message should be NACKd/rejected via the native broker API.

I agree. Hence my suggestion to move this to docs, since this should really be a recommendation that says "only enable Dapr dead letter if your message broker doesn't support native dead letter".

Dapr can't know if a message broker has native DLT support (and whether it's enabled or not).

@olitomlinson
Copy link

I agree. Hence my suggestion to move this to docs, since this should really be a recommendation that says "only enable Dapr dead letter if your message broker doesn't support native dead letter".

I guess my thoughts on this is if we know that enabling DLQ on ASB component is going to cause a potential dangerous situation, is a recommendation enough? Should the ASB component be hardcoded to DLQ Disabled and not overridable by the user?

@yaron2
Copy link
Member

yaron2 commented Jul 21, 2022

I agree. Hence my suggestion to move this to docs, since this should really be a recommendation that says "only enable Dapr dead letter if your message broker doesn't support native dead letter".

I guess my thoughts on this is if we know that enabling DLQ on ASB component is going to cause a potential dangerous situation, is a recommendation enough? Should the ASB component be hardcoded to DLQ Disabled and not overridable by the user?

You mean Dapr DLQ disabled? I don't think so. If a user enables ASB DLQ and then explicitly also enables Dapr DLQ then there's not much we can do. Dapr DLQ is not a component specific implementation, it sits in daprd and this can't be disabled per component.

@olitomlinson
Copy link

Sorry, I've not been explicit enough. I'll try again

I guess my thoughts on this is if we know that enabling Dapr PubSub DLQ (and the underlying broker is ASB) is going to cause a potential dangerous situation, is a recommendation enough? Should the ASB component be hardcoded to DLQ Disabled and not overridable by the user?


Btw I'm pretty sure that native-DLQ are always enabled in ASB, and they can't be turned off/disabled.

@yaron2
Copy link
Member

yaron2 commented Jul 21, 2022

Sorry, I've not been explicit enough. I'll try again

I guess my thoughts on this is if we know that enabling Dapr PubSub DLQ (and the underlying broker is ASB) is going to cause a potential dangerous situation, is a recommendation enough? Should the ASB component be hardcoded to DLQ Disabled and not overridable by the user?

Btw I'm pretty sure that native-DLQ are always enabled in ASB, and they can't be turned off/disabled.

Technically, we can't program the ASB component to have DLQ disabled.

@clemensv
Copy link
Author

Deadlettering may quite well be a conscious act by the application that considers a message and finds it faulty. That path and any other condition at the DAPR level that causes the message to be deadlettered should go into the native DLQ. I have not found where DAPR has a path to explicitly move messages into the native DLQ along with a reason code; did I miss that?

@jjcollinge
Copy link
Contributor

jjcollinge commented Jul 22, 2022

I don't believe there is any capability to explicitly move a message to a DLQ (ASB or otherwise) in Dapr today. @yaron2 could we potentially do this is using explicit response codes from the app's handler (similar to retry/drop)? We match the response code and raise a specific error, do not retry on that error type, passing it back up to the pubsub component's handler (assumes Dapr's DLQ is disabled but we could also ignore this error type for Dapr's DLQ). Then in the pubsub component we can check for that error being returned and use the native SDK (or equivalent) to dead letter the message explicitly in the broker?

@yaron2
Copy link
Member

yaron2 commented Aug 10, 2022

I don't believe there is any capability to explicitly move a message to a DLQ (ASB or otherwise) in Dapr today. @yaron2 could we potentially do this is using explicit response codes from the app's handler (similar to retry/drop)? We match the response code and raise a specific error, do not retry on that error type, passing it back up to the pubsub component's handler (assumes Dapr's DLQ is disabled but we could also ignore this error type for Dapr's DLQ). Then in the pubsub component we can check for that error being returned and use the native SDK (or equivalent) to dead letter the message explicitly in the broker?

That could work, provided the underlying SDK supports it.

@bledbet
Copy link

bledbet commented Aug 19, 2022

I would also be in favor of an option for the subscribing service to return a specific response code that would cause dapr to deadletter immediately. As background (https://discord.com/channels/778680217417809931/901141467840524289/1009925708858007715) I was a little surprised to find that there is a specific response code (404) to "drop" the message - I really can't see how this is different from the app logging it's own warning about the message intentionally being ignored and returning a 200. When I first read the documentation, I was thinking "well, maybe they are using 'drop' as a generic term because some of these 'brokers' don't have native DLQ functionality, and maybe a broker that does have native DLQ functionality will actually deadletter on a 404", but found out that it did not. I am accustomed to differentiating between retryable errors (which should roll back to the broker and become available to be consumed again) and non-retryable errors (which should be deadlettered immediately because it would only waste resources to retry), and it sounds like this is not currently supported.

I am also very much in favor of leveraging the broker's native DLQ mechanism for many of the same reasons expressed here. Another reason that I haven't seen expressed is that (unless I am misunderstanding) it seems that if I declare a subscription with a deadLetterTopic, I also have to declare a subscription to that topic that I'm using for DL's (at least if I want dapr to create that topic for me) - and if I do that, I also have to build the subscriber service for it. Since I might not know ahead of time what problems might cause DL's, I might not know what to do with them programmatically yet - initially, I may just want them to accumulate in the DLQ where a support team can look at them and figure out if/how to automate their handling (probably manually fixing and requeuing in the mean-time), allowing me to build out the solution more iteratively. Also, wouldn't the dead letter topic need a dead letter topic?

It feels like the deadLetterTopic is just there to compensate for the 'brokers' that don't have native DLQ's, and it seems to make sense to either warn or throw if a configuration is used that is not safe.

One other idea is that some brokers (like ASB) have forwarding capability - would it make sense for the component for such a broker to leverage the native DLQ functionality, and if a deadLetterTopic is provided, set the native DLQ up to forward to the specified topic? Further, would it make sense for dapr to create the topic for the specified deadLetterTopic without having to declare a subscription to that topic (I shouldn't be forced to be ready to have a consumer for the dead letters).

@bledbet
Copy link

bledbet commented Aug 22, 2022

Another area where integration with transactional message brokers might be improved is in how dapr leverages the broker when following a retry policy.

I'm getting expected behavior under "normal circumstances" when using an Azure ServiceBus pubsub with maxDeliveryCount=1 and an exponential retry policy with maxRetries=10. The message is tried 10 times with an increasing amount of time between each try, and it finally ends up on the deadletter queue for the subscription.

I'm a little concerned with having maxDeliveryCount set to 1 in a production environment though.
1) It seems like a transient issue with the dapr sidecar communicating with the ASB instance (like a failure to renew a message lease, or a failure to commit the read) would cause a message to go straight to deadletter prematurely, and
2) It seems like a dapr sidecar restart would cause any messages that have been read into memory (by prefetchCount and/or maxActiveMessages) but not processed yet to go straight to deadletter.

But if you change maxDeliveryCount to 10 and maxRetries to 1, you get the 10 retries without the exponential backoff, which would reduce the risk of deadlettering because of transient error, but obviously doesn't behave as desired with respect to the backoff.

Something to consider would be that the ASB pubsub component should leverage the broker's built-in max delivery count mechanism when following a retry policy. By that, I mean that, in the case of an ASB pubsub, each time dapr gets a "retry" back from the subscriber service, it should continue to hold the message lease for the appropriate amount of time, as specified by the retry policy, and then it should Abandon() the message, "rolling it back" to the broker, and letting the broker decide when it goes to deadletter, rather than having it's own retry loop.

@olitomlinson
Copy link

initially, I may just want them to accumulate in the DLQ where a support team can look at them and figure out if/how to automate their handling (probably manually fixing and requeuing in the mean-time), allowing me to build out the solution more iteratively.

Completely agree with this. The native DLQ is the perfect place to hold the messages until they can be triaged. In the past we have built 'shovel' tools that allow us to PeekLock a DL message, modify it if necessary, and then push it back to the queue/topic (never bothered with transactions across queues as our processes were idempotent and resilient to retries)

I'm currently thinking about how I would achieve a similar process without the native DLQ holding messages. It feels like it will be more complex - essentially taking the whole message envelope and serialising it to a dapr state store, where it can be triaged, with the intent of faithfully reproducing it back into the originating dapr Topic to be reprocessed as a new message. Which I'm certain will be a more nuanced and difficult process... I definitely need to think more on this :(

@dapr-bot
Copy link
Collaborator

This issue has been automatically marked as stale because it has not had activity in the last 60 days. It will be closed in the next 7 days unless it is tagged (pinned, good first issue, help wanted or triaged/resolved) or other activity occurs. Thank you for your contributions.

@dapr-bot dapr-bot added the stale Issues and PRs without response label Dec 14, 2022
@dapr-bot
Copy link
Collaborator

This issue has been automatically closed because it has not had activity in the last 67 days. If this issue is still valid, please ping a maintainer and ask them to label it as pinned, good first issue, help wanted or triaged/resolved. Thank you for your contributions.

@mukundansundar mukundansundar removed the stale Issues and PRs without response label Dec 22, 2022
@dapr-bot
Copy link
Collaborator

This issue has been automatically marked as stale because it has not had activity in the last 60 days. It will be closed in the next 7 days unless it is tagged (pinned, good first issue, help wanted or triaged/resolved) or other activity occurs. Thank you for your contributions.

@dapr-bot dapr-bot added the stale Issues and PRs without response label Feb 20, 2023
@dapr-bot
Copy link
Collaborator

This issue has been automatically closed because it has not had activity in the last 67 days. If this issue is still valid, please ping a maintainer and ask them to label it as pinned, good first issue, help wanted or triaged/resolved. Thank you for your contributions.

@olitomlinson
Copy link

olitomlinson commented Mar 28, 2024

Given that as of 1.13 the semantics of DROP hint has been fixed to send to the Dapr DLQ, if DQL is configured, it makes sense that this is revisited.

For certain PubSub Components that support Native DLQ (Such as Azure Service Bus) it makes sense to invoke the native DLQ mechanism (when Dapr DLQ is not configured) - this will directly the resolve the original issue that @clemensv raised

ASB has a deadletter method available in the Go SDK

@yaron2

Will this require work in the runtime first, as its my understanding that an interface doesn't exist for the runtime to signal to a contrib component to invoke a native DLQ?

@olitomlinson olitomlinson reopened this Mar 28, 2024
@dapr-bot dapr-bot removed the stale Issues and PRs without response label Mar 28, 2024
@dapr-bot
Copy link
Collaborator

This issue has been automatically marked as stale because it has not had activity in the last 60 days. It will be closed in the next 7 days unless it is tagged (pinned, good first issue, help wanted or triaged/resolved) or other activity occurs. Thank you for your contributions.

@dapr-bot dapr-bot added the stale Issues and PRs without response label May 27, 2024
@olitomlinson
Copy link

bump

@dapr-bot dapr-bot removed the stale Issues and PRs without response label May 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants