-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Deadletter mechanism does not integrate well w/ transactional message brokers #4917
Comments
Thanks for raising this @clemensv. Dead letter topics are off by default, and are an opt-in feature. I suggest we move this to docs as this is really about documenting a best practice which advises users to not turn on Dapr Dead Letter Topics if the underlying message broker supports it natively. |
I don't think that is sufficient. If the underlying broker supports DLQ, all operations that lead to a message being deadlettered then also ought to reject the message into the broker's DLQ. That doesn't happen if you just turn the mechanism off. |
I wonder if it is a possible to author a component which has a native DLQ (such as Azure Service Bus) differently
However dapr could still bind to the native DLQ, and delegate messages to the user-code handler as normal? The dev experience would be the same for end-users and operators, however a more robust DLQ implementation would be available for those brokers that are natively capable. My knowledge of ASB is a little dated now, but I know there was (may be still is) a convention to get the name of DLQ when given the Topic/Queue name, which might make discovery of the DLQ particularly easy. I believe the DLQ for ASB is always enabled too which helps with predictability. I'll stop speaking now while Clemens is in the room incase I embarrass myself :P |
Reading this link, it seems that if Dapr DLQ isn't enabled, AND if any of the scenarios leading to a message going into the ASB DLQ occurs, then from Dapr's perspective the message is rejected. Let's play out the following scenario: User keeps Dapr DLQ disabled and uses ASB that's configured to send messages to the DLQ after 10 retries. Dapr reads the message off ASB and attempts 10 deliveries, all of them fail. ASB will then move the message to it's native DLQ, and Dapr consumers will no longer consume the message. Based on this and also the described system rules in the link above, it seems to me like Dapr won't interfere with the native DLQ mechanism. |
I understand that Dapr won't interfere with the native DLQ mechanism providing that Dapr PubSub DLQ is disabled. I could be out of my depth here, but I'm hearing here is Clemens is saying is that when Dapr PubSub DLQ is enabled AND the component utilises a Native DLQ in the broker, then the Dapr-Created DLQ should not be utilised under any circumstance (And I interpret this as any calls to |
I agree. Hence my suggestion to move this to docs, since this should really be a recommendation that says "only enable Dapr dead letter if your message broker doesn't support native dead letter". Dapr can't know if a message broker has native DLT support (and whether it's enabled or not). |
I guess my thoughts on this is if we know that enabling DLQ on ASB component is going to cause a potential dangerous situation, is a recommendation enough? Should the ASB component be hardcoded to DLQ Disabled and not overridable by the user? |
You mean Dapr DLQ disabled? I don't think so. If a user enables ASB DLQ and then explicitly also enables Dapr DLQ then there's not much we can do. Dapr DLQ is not a component specific implementation, it sits in |
Sorry, I've not been explicit enough. I'll try again I guess my thoughts on this is if we know that enabling Dapr PubSub DLQ (and the underlying broker is ASB) is going to cause a potential dangerous situation, is a recommendation enough? Should the ASB component be hardcoded to DLQ Disabled and not overridable by the user? Btw I'm pretty sure that native-DLQ are always enabled in ASB, and they can't be turned off/disabled. |
Technically, we can't program the ASB component to have DLQ disabled. |
Deadlettering may quite well be a conscious act by the application that considers a message and finds it faulty. That path and any other condition at the DAPR level that causes the message to be deadlettered should go into the native DLQ. I have not found where DAPR has a path to explicitly move messages into the native DLQ along with a reason code; did I miss that? |
I don't believe there is any capability to explicitly move a message to a DLQ (ASB or otherwise) in Dapr today. @yaron2 could we potentially do this is using explicit response codes from the app's handler (similar to retry/drop)? We match the response code and raise a specific error, do not retry on that error type, passing it back up to the pubsub component's handler (assumes Dapr's DLQ is disabled but we could also ignore this error type for Dapr's DLQ). Then in the pubsub component we can check for that error being returned and use the native SDK (or equivalent) to dead letter the message explicitly in the broker? |
That could work, provided the underlying SDK supports it. |
I would also be in favor of an option for the subscribing service to return a specific response code that would cause dapr to deadletter immediately. As background (https://discord.com/channels/778680217417809931/901141467840524289/1009925708858007715) I was a little surprised to find that there is a specific response code (404) to "drop" the message - I really can't see how this is different from the app logging it's own warning about the message intentionally being ignored and returning a 200. When I first read the documentation, I was thinking "well, maybe they are using 'drop' as a generic term because some of these 'brokers' don't have native DLQ functionality, and maybe a broker that does have native DLQ functionality will actually deadletter on a 404", but found out that it did not. I am accustomed to differentiating between retryable errors (which should roll back to the broker and become available to be consumed again) and non-retryable errors (which should be deadlettered immediately because it would only waste resources to retry), and it sounds like this is not currently supported. I am also very much in favor of leveraging the broker's native DLQ mechanism for many of the same reasons expressed here. Another reason that I haven't seen expressed is that (unless I am misunderstanding) it seems that if I declare a subscription with a deadLetterTopic, I also have to declare a subscription to that topic that I'm using for DL's (at least if I want dapr to create that topic for me) - and if I do that, I also have to build the subscriber service for it. Since I might not know ahead of time what problems might cause DL's, I might not know what to do with them programmatically yet - initially, I may just want them to accumulate in the DLQ where a support team can look at them and figure out if/how to automate their handling (probably manually fixing and requeuing in the mean-time), allowing me to build out the solution more iteratively. Also, wouldn't the dead letter topic need a dead letter topic? It feels like the deadLetterTopic is just there to compensate for the 'brokers' that don't have native DLQ's, and it seems to make sense to either warn or throw if a configuration is used that is not safe. One other idea is that some brokers (like ASB) have forwarding capability - would it make sense for the component for such a broker to leverage the native DLQ functionality, and if a deadLetterTopic is provided, set the native DLQ up to forward to the specified topic? Further, would it make sense for dapr to create the topic for the specified deadLetterTopic without having to declare a subscription to that topic (I shouldn't be forced to be ready to have a consumer for the dead letters). |
Another area where integration with transactional message brokers might be improved is in how dapr leverages the broker when following a retry policy. I'm getting expected behavior under "normal circumstances" when using an Azure ServiceBus pubsub with maxDeliveryCount=1 and an exponential retry policy with maxRetries=10. The message is tried 10 times with an increasing amount of time between each try, and it finally ends up on the deadletter queue for the subscription. I'm a little concerned with having maxDeliveryCount set to 1 in a production environment though. But if you change maxDeliveryCount to 10 and maxRetries to 1, you get the 10 retries without the exponential backoff, which would reduce the risk of deadlettering because of transient error, but obviously doesn't behave as desired with respect to the backoff. Something to consider would be that the ASB pubsub component should leverage the broker's built-in max delivery count mechanism when following a retry policy. By that, I mean that, in the case of an ASB pubsub, each time dapr gets a "retry" back from the subscriber service, it should continue to hold the message lease for the appropriate amount of time, as specified by the retry policy, and then it should Abandon() the message, "rolling it back" to the broker, and letting the broker decide when it goes to deadletter, rather than having it's own retry loop. |
Completely agree with this. The native DLQ is the perfect place to hold the messages until they can be triaged. In the past we have built 'shovel' tools that allow us to PeekLock a DL message, modify it if necessary, and then push it back to the queue/topic (never bothered with transactions across queues as our processes were idempotent and resilient to retries) I'm currently thinking about how I would achieve a similar process without the native DLQ holding messages. It feels like it will be more complex - essentially taking the whole message envelope and serialising it to a dapr state store, where it can be triaged, with the intent of faithfully reproducing it back into the originating dapr Topic to be reprocessed as a new message. Which I'm certain will be a more nuanced and difficult process... I definitely need to think more on this :( |
This issue has been automatically marked as stale because it has not had activity in the last 60 days. It will be closed in the next 7 days unless it is tagged (pinned, good first issue, help wanted or triaged/resolved) or other activity occurs. Thank you for your contributions. |
This issue has been automatically closed because it has not had activity in the last 67 days. If this issue is still valid, please ping a maintainer and ask them to label it as pinned, good first issue, help wanted or triaged/resolved. Thank you for your contributions. |
This issue has been automatically marked as stale because it has not had activity in the last 60 days. It will be closed in the next 7 days unless it is tagged (pinned, good first issue, help wanted or triaged/resolved) or other activity occurs. Thank you for your contributions. |
This issue has been automatically closed because it has not had activity in the last 67 days. If this issue is still valid, please ping a maintainer and ask them to label it as pinned, good first issue, help wanted or triaged/resolved. Thank you for your contributions. |
Given that as of 1.13 the semantics of DROP hint has been fixed to send to the Dapr DLQ, if DQL is configured, it makes sense that this is revisited. For certain PubSub Components that support Native DLQ (Such as Azure Service Bus) it makes sense to invoke the native DLQ mechanism (when Dapr DLQ is not configured) - this will directly the resolve the original issue that @clemensv raised ASB has a deadletter method available in the Go SDK Will this require work in the runtime first, as its my understanding that an interface doesn't exist for the runtime to signal to a contrib component to invoke a native DLQ? |
This issue has been automatically marked as stale because it has not had activity in the last 60 days. It will be closed in the next 7 days unless it is tagged (pinned, good first issue, help wanted or triaged/resolved) or other activity occurs. Thank you for your contributions. |
bump |
dapr/pkg/runtime/runtime.go
Line 548 in b8ae13b
The deadletter mechanism in DAPR creates a broken abstraction for all scenarios where the underlying provider is a transactional message broker that has native DLQ functionality. The broker itself will generally continue to use its own deadlettering mechanism when messages expire or when messages exceed the delivery count, which are both conditions that DAPR will trigger as implemented (the Service Bus Component surely will).
In that case, you'll end up with two deadletter drop destinations rather than one, which is an operational management issue.
Message brokers also implement the move of a message queue into the deadletter queue as a transaction operation and that is not the case here. It's possible for a message to be put into the deadletter queue and yet fail to be settled in the origin queue. That's not the case with the built-in mechanism inside of a message broker.
For all messaging provides that do have DLQ support, messages that must be deadlettered must be deadlettered/rejected by the message broker into the native deadletter mechanism in order to have a consistent location for deadlettered messages, independent of whether the app or broker chose to reject them.
The text was updated successfully, but these errors were encountered: