Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DTLS server does not handle retransmits of the client's last flight #479

Open
mniestroj opened this issue Jun 22, 2022 · 2 comments
Open

Comments

@mniestroj
Copy link

Your environment.

  • Version: v2.1.5 with small adjustments to "example listener"

What did you do?

Run example PSK listener from https://github.com/mniestroj/pion-dtls/tree/handshake-not-finishing, which contains some adjustments for mbedTLS client that I use. Also the retransmit timer was extended from 1s to 30s to help reproduce the issue.

Communication was done on tun/tap interface, because it actually happens between virtual machine (QEMU) and host. In order to simulate packet loss, following command was executed on host:

$ tc qdisc add dev tap0 root netem loss 75%

which basically makes 75% of the packets to be dropped, which is great way to test handshake retransmission mechanisms (and reproduce this bug).

$ PION_LOG_TRACE=all go run ./examples/listen/psk/main.go
dtls TRACE: 17:27:13.476609 handshaker.go:166: [handshake:server] Flight 0: Preparing
dtls TRACE: 17:27:13.476647 handshaker.go:166: [handshake:server] Flight 0: Sending
dtls TRACE: 17:27:13.476651 handshaker.go:166: [handshake:server] Flight 0: Waiting
dtls TRACE: 17:27:13.476709 handshaker.go:282: [handshake:server] Flight 0 -> Flight 2
dtls TRACE: 17:27:13.476714 handshaker.go:166: [handshake:server] Flight 2: Preparing
dtls TRACE: 17:27:13.476716 handshaker.go:166: [handshake:server] Flight 2: Sending
dtls TRACE: 17:27:13.476719 conn.go:387: [handshake:server] -> HelloVerifyRequest (epoch: 0, seq: 0)
dtls TRACE: 17:27:13.476741 handshaker.go:166: [handshake:server] Flight 2: Waiting
dtls TRACE: 17:27:14.485394 handshaker.go:282: [handshake:server] Flight 2 -> Flight 2
dtls TRACE: 17:27:14.485406 handshaker.go:166: [handshake:server] Flight 2: Preparing
dtls TRACE: 17:27:14.485410 handshaker.go:166: [handshake:server] Flight 2: Sending
dtls TRACE: 17:27:14.485414 conn.go:387: [handshake:server] -> HelloVerifyRequest (epoch: 0, seq: 0)
dtls TRACE: 17:27:14.485432 handshaker.go:166: [handshake:server] Flight 2: Waiting
dtls TRACE: 17:27:15.495397 handshaker.go:282: [handshake:server] Flight 2 -> Flight 2
dtls TRACE: 17:27:15.495410 handshaker.go:166: [handshake:server] Flight 2: Preparing
dtls TRACE: 17:27:15.495414 handshaker.go:166: [handshake:server] Flight 2: Sending
dtls TRACE: 17:27:15.495418 conn.go:387: [handshake:server] -> HelloVerifyRequest (epoch: 0, seq: 0)
dtls TRACE: 17:27:15.495437 handshaker.go:166: [handshake:server] Flight 2: Waiting
dtls TRACE: 17:27:16.505384 handshaker.go:282: [handshake:server] Flight 2 -> Flight 2
dtls TRACE: 17:27:16.505397 handshaker.go:166: [handshake:server] Flight 2: Preparing
dtls TRACE: 17:27:16.505407 handshaker.go:166: [handshake:server] Flight 2: Sending
dtls TRACE: 17:27:16.505412 conn.go:387: [handshake:server] -> HelloVerifyRequest (epoch: 0, seq: 0)
dtls TRACE: 17:27:16.505430 handshaker.go:166: [handshake:server] Flight 2: Waiting
dtls TRACE: 17:27:17.515407 handshaker.go:282: [handshake:server] Flight 2 -> Flight 2
dtls TRACE: 17:27:17.515424 handshaker.go:166: [handshake:server] Flight 2: Preparing
dtls TRACE: 17:27:17.515427 handshaker.go:166: [handshake:server] Flight 2: Sending
dtls TRACE: 17:27:17.515431 conn.go:387: [handshake:server] -> HelloVerifyRequest (epoch: 0, seq: 0)
dtls TRACE: 17:27:17.515461 handshaker.go:166: [handshake:server] Flight 2: Waiting
dtls TRACE: 17:27:17.529679 handshaker.go:282: [handshake:server] Flight 2 -> Flight 4
dtls TRACE: 17:27:17.529686 handshaker.go:166: [handshake:server] Flight 4: Preparing
dtls TRACE: 17:27:17.529690 handshaker.go:166: [handshake:server] Flight 4: Sending
dtls TRACE: 17:27:17.529693 conn.go:387: [handshake:server] -> ServerHello (epoch: 0, seq: 1)
dtls TRACE: 17:27:17.529697 conn.go:387: [handshake:server] -> ServerHelloDone (epoch: 0, seq: 2)
dtls TRACE: 17:27:17.529713 handshaker.go:166: [handshake:server] Flight 4: Waiting
dtls DEBUG: 17:27:17.572600 conn.go:737: CipherSuite not initialized, queuing packet
dtls DEBUG: 17:27:17.572610 conn.go:660: received packet of next epoch, queuing packet
Client's hint: mark-one-id@one 
dtls TRACE: 17:27:17.572636 conn.go:743: server: <- ChangeCipherSpec (epoch: 1)
dtls TRACE: 17:27:17.572656 handshaker.go:282: [handshake:server] Flight 4 -> Flight 6
dtls TRACE: 17:27:17.572660 handshaker.go:166: [handshake:server] Flight 6: Preparing
dtls TRACE: 17:27:17.572667 handshaker.go:234: [handshake:server] -> changeCipherSpec (epoch: 1)
dtls TRACE: 17:27:17.572670 handshaker.go:166: [handshake:server] Flight 6: Sending
dtls TRACE: 17:27:17.572673 conn.go:387: [handshake:server] -> Finished (epoch: 1, seq: 3)
dtls TRACE: 17:27:17.572697 handshaker.go:166: [handshake:server] Flight 6: Finished
dtls TRACE: 17:27:17.572713 conn.go:216: Handshake Completed
Connected to 192.0.2.1:64571
dtls TRACE: 17:27:18.585513 conn.go:743: server: <- ChangeCipherSpec (epoch: 1)
dtls TRACE: 17:27:19.595466 conn.go:743: server: <- ChangeCipherSpec (epoch: 1)

wireshark1

What did you expect?

Server should follow DTLS spec, which says: "the node that transmits the last flight MUST respond to a retransmit of the peer's last flight with a retransmit of the last flight". This means that server should respond to client retransmissions, as the handshake was not complete from the client point of view (due to packet loss).

Server thinks that handshake has completed. This is not true, because mbedTLS client did not receive the last message from server, so it keeps retransmitting previous message.

According to https://datatracker.ietf.org/doc/html/rfc6347#section-4.2.4:

   In addition, for at least twice the default MSL defined for TCP,
   when in the FINISHED state, the node that transmits the last flight
   (the server in an ordinary handshake or the client in a resumed
   handshake) MUST respond to a retransmit of the peer's last flight
   with a retransmit of the last flight.  This avoids deadlock
   conditions if the last flight gets lost.  This requirement applies to
   DTLS 1.0 as well, and though not explicit in DTLS1, it was always
   required for the state machine to function correctly.  To see why
   this is necessary, consider what happens in an ordinary handshake if
   the server's Finished message is lost: the server believes the
   handshake is complete but it actually is not.  As the client is
   waiting for the Finished message, the client's retransmit timer will
   fire and it will retransmit the client's Finished message.  This will
   cause the server to respond with its own Finished message, completing
   the handshake.  The same logic applies on the server side for the
   resumed handshake.

What happened?

Server ignores client retransmissions. This can be seen on wireshark logs (screen) above.

@jdbruijn
Copy link
Member

jdbruijn commented Feb 1, 2023

We think we have the same problem as this. In our case the server is pion/dtls and the client is an embedded device communicating over cellular (e.g. NB-IoT). The client occasionally seems to miss the server's Hello Verify Request message and retransmits its Client Hello (with sequence number 0). The server ignores this as duplicate packet (see logs below) and doesn't send a new Hello Verify Request. Attached the Wireshark captures. (In this case I'm not sure whether the replay should stop this packet or whether the handshake should continue with a new server Hello Verify Request after the client hello on no. 12.) Hope this helps analysing this issue.

capture.pcap.zip

image

Server log retransmissions
dtls TRACE: 15:04:51.828203 handshaker.go:166: [handshake:server] Flight 0: Preparing
dtls TRACE: 15:04:51.828276 handshaker.go:166: [handshake:server] Flight 0: Sending
dtls TRACE: 15:04:51.828302 handshaker.go:166: [handshake:server] Flight 0: Waiting
dtls TRACE: 15:04:51.828408 handshaker.go:282: [handshake:server] Flight 0 -> Flight 2
dtls TRACE: 15:04:51.828424 handshaker.go:166: [handshake:server] Flight 2: Preparing
dtls TRACE: 15:04:51.828436 handshaker.go:166: [handshake:server] Flight 2: Sending
dtls TRACE: 15:04:51.828447 conn.go:387: [handshake:server] -> HelloVerifyRequest (epoch: 0, seq: 0)
dtls TRACE: 15:04:51.828496 handshaker.go:166: [handshake:server] Flight 2: Waiting
dtls TRACE: 15:04:52.829430 handshaker.go:166: [handshake:server] Flight 2: Waiting
dtls TRACE: 15:04:53.830491 handshaker.go:166: [handshake:server] Flight 2: Waiting
dtls TRACE: 15:04:54.831180 handshaker.go:166: [handshake:server] Flight 2: Waiting
dtls TRACE: 15:04:55.832146 handshaker.go:166: [handshake:server] Flight 2: Waiting
dtls TRACE: 15:04:56.833152 handshaker.go:166: [handshake:server] Flight 2: Waiting
dtls DEBUG: 15:04:57.449185 conn.go:674: discarded duplicated packet (epoch: 0, seq: 0)
dtls TRACE: 15:04:57.833916 handshaker.go:166: [handshake:server] Flight 2: Waiting
dtls TRACE: 15:04:58.834248 handshaker.go:166: [handshake:server] Flight 2: Waiting
dtls TRACE: 15:04:59.834361 handshaker.go:166: [handshake:server] Flight 2: Waiting
dtls TRACE: 15:05:00.834596 handshaker.go:166: [handshake:server] Flight 2: Waiting
dtls TRACE: 15:05:01.835640 handshaker.go:166: [handshake:server] Flight 2: Waiting
dtls TRACE: 15:05:02.836381 handshaker.go:166: [handshake:server] Flight 2: Waiting
dtls TRACE: 15:05:03.837412 handshaker.go:166: [handshake:server] Flight 2: Waiting
dtls TRACE: 15:05:04.837987 handshaker.go:166: [handshake:server] Flight 2: Waiting
dtls TRACE: 15:05:05.838382 handshaker.go:166: [handshake:server] Flight 2: Waiting
dtls TRACE: 15:05:06.838829 handshaker.go:166: [handshake:server] Flight 2: Waiting
dtls TRACE: 15:05:07.840758 handshaker.go:166: [handshake:server] Flight 2: Waiting
dtls TRACE: 15:05:08.840882 handshaker.go:166: [handshake:server] Flight 2: Waiting
dtls TRACE: 15:05:09.841116 handshaker.go:166: [handshake:server] Flight 2: Waiting
dtls TRACE: 15:05:10.842098 handshaker.go:166: [handshake:server] Flight 2: Waiting
dtls TRACE: 15:05:11.846328 handshaker.go:166: [handshake:server] Flight 2: Waiting
dtls TRACE: 15:05:12.850823 handshaker.go:166: [handshake:server] Flight 2: Waiting
dtls DEBUG: 15:05:13.529103 conn.go:674: discarded duplicated packet (epoch: 0, seq: 0)
dtls TRACE: 15:05:13.851419 handshaker.go:166: [handshake:server] Flight 2: Waiting
dtls TRACE: 15:05:14.852340 handshaker.go:166: [handshake:server] Flight 2: Waiting
dtls TRACE: 15:05:15.853112 handshaker.go:166: [handshake:server] Flight 2: Waiting
dtls TRACE: 15:05:16.853419 handshaker.go:166: [handshake:server] Flight 2: Waiting
dtls TRACE: 15:05:17.854099 handshaker.go:166: [handshake:server] Flight 2: Waiting
dtls TRACE: 15:05:18.855161 handshaker.go:166: [handshake:server] Flight 2: Waiting
dtls TRACE: 15:05:19.856138 handshaker.go:166: [handshake:server] Flight 2: Waiting
dtls TRACE: 15:05:20.857056 handshaker.go:166: [handshake:server] Flight 2: Waiting
2023/01/31 15:05:21.828226 dtls-listener.go:47: dtls: cannot accept connection: cannot accept connection: handshake error: context deadline exceeded
Server log normal connection
dtls TRACE: 14:54:47.469307 conn.go:387: [handshake:server] -> ServerHello (epoch: 0, seq: 1)
dtls TRACE: 14:54:47.469319 conn.go:387: [handshake:server] -> ServerHelloDone (epoch: 0, seq: 2)
dtls TRACE: 14:54:47.469374 handshaker.go:166: [handshake:server] Flight 4: Waiting
dtls DEBUG: 14:54:47.583354 conn.go:737: CipherSuite not initialized, queuing packet
dtls DEBUG: 14:54:47.583430 conn.go:660: received packet of next epoch, queuing packet
2023/01/31 14:54:47.593816 psk-authenticator.go:22: HTTP request (identityHint: ODY3Nzg3MDUwMjU2MTI2, duration: 10.283515ms)
dtls TRACE: 14:54:47.593977 conn.go:743: server: <- ChangeCipherSpec (epoch: 1)
dtls TRACE: 14:54:47.594044 handshaker.go:282: [handshake:server] Flight 4 -> Flight 6
dtls TRACE: 14:54:47.594065 handshaker.go:166: [handshake:server] Flight 6: Preparing
dtls TRACE: 14:54:47.594130 handshaker.go:234: [handshake:server] -> changeCipherSpec (epoch: 1)
dtls TRACE: 14:54:47.594147 handshaker.go:166: [handshake:server] Flight 6: Sending
dtls TRACE: 14:54:47.594163 conn.go:387: [handshake:server] -> Finished (epoch: 1, seq: 3)
dtls TRACE: 14:54:47.594245 handshaker.go:166: [handshake:server] Flight 6: Finished
dtls TRACE: 14:54:47.594270 conn.go:216: Handshake Completed
2023/01/31 14:54:49.840175 handler.go:30: Identity hint: ODY3Nzg3MDUwMjU2MTI2
2023/01/31 14:54:49.840231 handler.go:33: Incoming POST  request from 80.187.98.113:21504
2023/01/31 14:54:49.840241 handler.go:42: Remote address '80.187.98.113' at port 21504
2023/01/31 14:54:49.840254 handler.go:79: Convert octet-stream data to CBOR (queries: [format=cbor])
2023/01/31 14:54:49.861455 http.go:110: HTTP response (duration: 21.085935ms code: 202, content-type: text/plain; charset=utf-8, length: 1)
2023/01/31 14:54:49.861507 handler.go:120: Received HTTP response on POST /iot
2023/01/31 14:54:49.861520 handler.go:128:   Code: 202, Body: 30

@shimataro
Copy link

shimataro commented Sep 29, 2023

My problem may also be related to this issue.

According to RFC6347 section 4.2.8,

In cases where a server believes it has an existing association on a given host/port quartet and it receives an epoch=0 ClientHello, it SHOULD proceed with a new handshake but MUST NOT destroy the existing association until the client has demonstrated reachability either by completing a cookie exchange or by completing a complete handshake including delivering a verifiable Finished message.

However, it appears that the server does not respond to ClientHello with existing parameters.


These are log screenshots on the client.
The server is built with PionDTLS, the client is not.
(client=10.254.0.162, server=100.66.1.41)

ClientHello (10.254.0.162:49157 -> 100.66.1.41:23088)

image

No.34: ClientHello (10.254.0.162:49157 -> 100.66.1.41:23088)
No.38: HelloVerifyRequest

ClientHello with existing parameters (10.254.0.162:49157 -> 100.66.1.41:23088): **problem phase**

image

No.72-: ClientHello with existing parameters (10.254.0.162:49157 -> 100.66.1.41:23088; ignored)

ClientHello with new parameters (10.254.0.162:49160 -> 100.66.1.41:23088)

image

No.176: ClientHello with new parameters (10.254.0.162:49160 -> 100.66.1.41:23088; client port is changed)
No.181: HelloVerifyRequest

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants