Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MQTT: QoS2 tests and fixes #4406

Merged
merged 6 commits into from Aug 23, 2023
Merged

MQTT: QoS2 tests and fixes #4406

merged 6 commits into from Aug 23, 2023

Conversation

levb
Copy link
Contributor

@levb levb commented Aug 18, 2023

Adds QoS2 related tests and 2 uncovered bug fixes to #4349.

Still WIP/to come:

  • test that QoS2 session state (pubrel) gets saved/restored.
  • figure out a test and a solution for saving/restoring the PI mapping/use state when server is restarted

@kozlovic I'd really appreciate suggestions on

  • (a) more tests to add (or modify for QoS2), especially WRT cleanup, clustering
  • (b) what to do about [MQTT-3.10.4-3], see comments in TestMQTTQoS2InflightMsgsDeliveredAfterUnsubscribe

@levb levb requested a review from kozlovic August 18, 2023 13:30
@levb levb requested a review from a team as a code owner August 18, 2023 13:30
server/mqtt.go Outdated
@@ -3859,6 +3859,7 @@ func (c *client) mqttProcessPubRec(pi uint16) error {

sess.mu.Lock()
if sess.c != c {
sess.mu.Unlock()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed that the new test you added would flap on my computer. What fixed it was to move the "jsa.storeMsg" under the lock and I think I understand why: otherwise it would race between the pubRelCb that would update the pending map and when a bit lower you lookup the ackSubject it would use the "wrong" one.
We may need to have another call to discuss better.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kozlovic I think I understand the issue you're describing. 2/5 I don't like placing the storeMsg roundtrip under the session lock as the solution.

There may be a way to fix this by re-arranging the order of execution/error handling - the only reason the storeMsg call is first in the function is because we must respond, but this can be done explicitly in the earlier error return paths, and then after the Unlock when in-memory map is already prepared for mqttDeliverPubRelCb. Unfortunately, we may then be racing with another mqttDeliverMsgCbQoS12. I don't like going down that path.

A more verbose but 1/5 cleaner solution is to separate the PI <> ACK mappings for PUBRELs altogether. I was trying to avoid it, but perhaps wrongly. Will try this now.

Since I have not seen the new tests flapping (yet?), can you please add the exact go/CPU/OS environment and the go test command? Then again, since we've ran into this, I should probably add an explicit wrapper function and a test that would forcefully wait for the PUBREL delivery attempt before returning from "storeMsg".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(as for a call - please see Slack)

@levb levb merged commit 03c22ed into lev-mqtt-qos2 Aug 23, 2023
2 checks passed
@levb levb deleted the lev-mqtt-qos2-test branch August 23, 2023 16:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants