-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixes: #15194 - Check the whole queue for matching queued webhooks #15318
base: develop
Are you sure you want to change the base?
Fixes: #15194 - Check the whole queue for matching queued webhooks #15318
Conversation
a2e6304
to
1728997
Compare
@peteeckel thanks for all your work on this, both in capturing the root issue and in devising a solution! Admittedly, I haven't attempted to reproduce this firsthand, but thanks to your very detailed notes I believe I understand the root issue. Given that we cannot rely on merely checking the most recent queued item ( If you're open to it, I'd be happy to continue your work in this direction. (Of course, you're also welcome to continue tackling it yourself.) Let me know what you think, and thanks again! |
Hi Jeremy, you're very welcome - I know that some of the bugs I find are slightly exotic and that you're probably working much more than usual to get NetBox 4 out, so never mind about the delay. As for converting the quere to a dictionary: That makes a lot of sense to me, provided that nobody actually relies on it being a queue, which probably isn't the case. The only thing that might be of concern is that some users might make assumptions about the order in which webhooks are triggered, so my first thought was to use an In CPython, dicts are in fact always ordered, but the question is whether we can rely on NetBox always being run on CPython? Trying to err on the safe side I'd assume the answer is 'no', so the queue should be replaced by an I'm currently on a business trip until Thursday, so I can start working on it again on Friday - if that's not an issue I'll give it a try. On the other hand, the sooner the problem is fixed the better, so don't hesitate to go for it. |
Hi @jeremystretch, I just had a look at the code and it seems that converting the queue to a hash map is not a trivial change. To me, there seems to be a showstopper in So while changing it is rather simple, it will definitely break documented or at least established interfaces, which would probably be possible with the switch to NetBox 4 but not earlier, correct? |
Hi @jeremystretch, as this seems to be stalled on the question of the interface to methods in the I think moving from a queue to a hash map would require changing the (documented?) interface, so the question is whether or not this is an option (or whether I am missing something). If it gets changed, it would probably be best with NetBox 4.0 as a major version. Or were you suggesting converting the queue to a hash map just for the sake of the check whether an item is already in the queue, without changing the queue structure as such? I doubt this would be an efficient use of resources, as repeatedly copying the data into a hash map is more effort than the linear search I implemented. If not, do you require any changes to my PR? |
@peteeckel Please add a test that illustrates the behavior you're describing, so that the developer who picks this up can be confident in their understanding of the problem. Additionally, given the sensitivity of the code path that this PR targets, we need to include tests which confirm the continued correct operation of the rest of the code path's behavior. For example, there's some concern about the possibility of race conditions when multiple actors try to modify the same object. |
Hi @jeffgdotorg, that's an interesting request :-)
|
By the way, I can add a test to the NetBox DNS test suite. Not sure if that would help, though. |
@peteeckel my biggest concern, given the nature of the bug, is that some future change to the logging logic could reintroduce the error. If you can walk me through exactly what's happening in the plugin, I might be able to rig up a test to replicate it in NetBox. |
HI @jeremystretch, thanks for the quick reaction. I constructed some code to reproduce the behviour in the underlying issue. It can be triggered in multiple ways, all requiring a specifically crafted The easiest way to trigger it is to call When there is an m2m relationship, the issue is also triggered when the The only thing my patch does is to check the whole previous queue instead of only the last event in the queue for events for the same object, and not only With your earlier suggestion of changing the data structure of the queue to a hash map, that would be much simpler and more elegant, however at the cost of breaking the interface for the |
Just in case you are asking yourself "Why should anyone want to do that kind of stuff?" ... Take a DNS zone, now update the list of name servers for the zone. The latter is an m2m relationship.
And there you have everything described above in a single request. I documented the result in peteeckel/netbox-plugin-dns#89 (comment), including a verification that the PR actually solves this. |
Fixes: #15194
This PR changes the logic how changes to M2M relationships are matched to changes in the base object to which the M2M relationship is related. The current code assumes that the
post_save
is immediately followed by them2m_changed
signal for the former, but that is not generally true as anysave()
operation for a different object occurring in between will separate the two entries in the queue. This results in two webhooks being triggered, the first one of which has incorrect data in its payload.The new code finds the position of the webhook data for the object change in the queue and updates it with the M2M change data and ensures only one webhook is triggered, and with the correct data.