
[Feature Request] MTU discovery aka iterate on packet size #873

Open
jmanteau opened this issue Dec 17, 2023 · 9 comments
Labels: enhancement (New feature or request)

@jmanteau

Trippy is an incredible tool that I have started to integrate into my tooling.
One interesting addition could be to iterate on the packet size to discover the maximum size each hop can pass without fragmentation, hence discovering the allowable MTU on the path.
In a way, this would reimplement PMTUD and show it graphically in the output.
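The core of what's being proposed can be sketched as a binary search over probe sizes. A minimal Python sketch (all names hypothetical; `probe_fits` stands in for sending a DF-set probe of a given size and reporting whether it crossed the path without a Fragmentation Needed error):

```python
def discover_path_mtu(probe_fits, lo=68, hi=1500):
    """Binary-search the largest packet size in [lo, hi] that fits, or None.

    `probe_fits(size)` is a hypothetical callback: send a probe of `size`
    bytes with DF set and return True unless a "Fragmentation Needed"
    ICMP error comes back.
    """
    best = None
    while lo <= hi:
        mid = (lo + hi) // 2
        if probe_fits(mid):
            best = mid      # mid fits, try larger sizes
            lo = mid + 1
        else:
            hi = mid - 1    # mid was rejected, try smaller sizes
    return best

# Simulated path with a 1400-byte MTU:
print(discover_path_mtu(lambda size: size <= 1400))  # 1400
```

A real implementation would of course drive this per hop and deal with loss and rate-limiting, which is where the design discussion in this thread comes in.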

@fujiapple852 fujiapple852 added triage enhancement New feature or request and removed triage labels Dec 19, 2023
@fujiapple852
Owner

@jmanteau could this be done during normal tracing or would this need to be a separate mode of operation?

I've only pondered this briefly but I think it would need to be a separate mode as nodes which respond with Fragmentation Needed will not be useful for general tracing/ping purposes?

I'm keen to keep the feature set of Trippy narrowly focused and so would only like to implement this if it could be an addition to the basic trace data, much as ICMP extension data is, for example.

@jmanteau
Author

From an Internet point of view, the MTU part can be useful but is less interesting. Where this info becomes interesting is in the Enterprise world, with diverse WAN links and encapsulation done over the network.
I agree that it should be a switch to activate and not shown by default. There will also be an inherent limitation: you can only see the "MTU Path" up to the smallest MTU on the path, as the hops after it cannot be reached with anything larger.
This is a suggestion, as I see potential for Trippy to aggregate the different ICMP use cases and not only traceroute. But I fully understand if you decide this is not the main goal of Trippy (ping sweep still works fine!)

@fujiapple852
Owner

I think being able to see the "MTU Path" could be valuable, especially in ECMP contexts where the MTU may differ between paths.

Another idea would be to send extra probes, outside of those used for normal tracing, with varying MTUs such that path MTU data could be obtained alongside, but not instead of, the trace data.

I think we'd have to PoC this and see how feasible it is. Is this something you'd be able to attempt?

@jmanteau
Author

From a networking and general programming point of view, yes I could. However, I am a complete novice in Rust and I don't know the Trippy codebase.
I could try a separate PoC outside of Trippy, in Python, to demonstrate the idea?

@fujiapple852
Owner

@jmanteau Thanks for offering. I think the concept and mechanism for MTU discovery is well understood; the "trick" here is figuring out how to integrate it with the existing codebase, so I don't think a PoC outside of Trippy would add much.

@fujiapple852 fujiapple852 self-assigned this Apr 20, 2024
@fujiapple852 fujiapple852 added this to the 0.11.0 milestone Apr 20, 2024
@mpenning

@fujiapple852, I echo the use-case of this feature request and would point out the following that may help as part of the implementation...

tracepath pmtu

VXLAN is quite popular between datacenters these days, but VXLAN adds roughly 50 bytes of encapsulation overhead at the IP layer. As such, an option to print pmtu would help trip's use-case in testing MTU issues (which can be a headache for people like me... a network engineer).
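As a back-of-the-envelope check on that ~50-byte figure (assuming an IPv4 underlay and no 802.1Q tags):

```python
# VXLAN encapsulation overhead per packet, in bytes.
OUTER_IPV4 = 20  # outer IPv4 header (no options)
OUTER_UDP = 8    # outer UDP header
VXLAN_HDR = 8    # VXLAN header
INNER_ETH = 14   # inner (encapsulated) Ethernet header

overhead = OUTER_IPV4 + OUTER_UDP + VXLAN_HDR + INNER_ETH
inner_ip_mtu = 1500 - overhead  # room left for the inner IP packet

print(overhead)      # 50
print(inner_ip_mtu)  # 1450
```

which is why a 1500-byte underlay typically leaves only 1450 bytes for the tenant's IP packets.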

FYI, one of the very useful things about tracepath is that it prints pmtu by default.

Example

mpenning@mudslide:~$ sudo tracepath cisco.com
 1?: [LOCALHOST]                      pmtu 1500
 1:  no reply
 2:  172.31.255.1                                          5.059ms
 3:  192.168.1.254                                        24.597ms
 4:  107-192-216-1.lightspeed.austtx.sbcglobal.net        10.967ms
 5:  71.149.90.20                                         10.296ms
 6:  no reply
 7:  32.130.16.9                                          17.184ms asymm  9
 8:  gar24.dlstx.ip.att.net                               31.814ms
 9:  12.247.85.46                                         34.750ms
10:  128.107.2.9                                          33.816ms
11:  72.163.0.102                                         33.575ms asymm 12
12:  rcdn9-cd2-dmzdcc-gw2-por1.cisco.com                  32.711ms
13:  rcdn9-bb07-fab1-sw3811-dmzdcc1uplink.cisco.com       32.719ms asymm 14
...
     Too many hops: pmtu 1500
     Resume: pmtu 1500

This section reinforces the original request, above. Also see below...

Ethernet MTU / Jumbo frames

FYI, these days you can manually set the ethernet MTU of switches to over 9000 bytes. Multiple vendors allow jumbo frames and you can't rightly test jumbo frame capabilities with trip today... (also see next paragraph)

Packet-size: IPv4 and IPv6

As hinted above, the following setting should change as part of implementing this feature...

mpenning@mudslide:~$ sudo bin/trip --packet-size 1500 4.2.2.2
Error: packet-size (1500) must be between 48 and 1024 inclusive for Ipv4thenIpv6
                                          ^^^^^^^^^^^
mpenning@mudslide:~$

Perhaps 1024 was initially chosen for simplicity, but it gets in the way of characterizing the limits of real jumbo frame networks and host IP stacks.

DF Flag: configurable

The maximum IPv4 packet and IPv6 packet (if fragmentation is allowed) is 65,535 bytes (for IPv4, see RFC 791). As part of this feature, I would like the IPv4 / IPv6 DF-flag to be configurable, but leave the default as DF set (which is what trip defaults to for IPv4... I can't readily test IPv6).

@fujiapple852
Owner

fujiapple852 commented May 15, 2024

@mpenning (cc @jmanteau) i'm keen on the idea of adding the ability to determine and display path MTU in Trippy, but i'm not yet clear on the right implementation approach and there are a number of complexities in Trippy not present in tools like tracepath that need to be accounted for here.

I've written up some notes on the options and challenges below.

Probe Options

The two broad options I can see are:

Intrusive

The first option, which i'll name "intrusive", would involve varying the MTU for legitimate tracing probes, much as Trippy varies the TTL of probes today. Trippy would interpret ICMP DestinationUnreachable messages with code 4 (i.e. "Fragmentation needed, DF bit set") responses and use these to determine the lowest path MTU.

The advantage of this approach is that it works "in-band" with the existing tracing strategy, i.e. we don't have to send any additional probe packets for path MTU discovery.

However, this would mean that if a probe fails along the path due to the MTU, it will prevent that host (and any subsequent hosts on the path) from returning the response it otherwise would have (i.e. ICMP TimeExceeded), which will distort the tracing statistics. For example, a probe with an initial TTL of 4 may generate an error at the 3rd hop on the path due to the MTU; the TimeExceeded error that would otherwise have been returned by the 4th hop will then never occur, and instead the tracer will see the MTU error from hop 3.

This also raises the question: during MTU discovery, would Trippy vary the probe size per hop within a round, or between rounds? My gut feeling is that Trippy should use a consistent packet size for all probes within a single round and only vary the size (when needed) between rounds. This would lead to some rounds being "truncated" (no probe will get past the first host which rejects the probe due to the MTU), which seems weird.

Dedicated

The second option, which i'll name "dedicated", would involve the tracer sending dedicated probes over and above those used for regular tracing, for the sole purpose of determining the path MTU.

This has the advantage of avoiding any negative interaction with the normal statistics used for tracing. However, it is not immediately clear how this would work: assuming only UDP here, Trippy needs a way to distinguish probes by sequence number, uses various techniques (classic, Paris, Dublin) to stuff this into an outgoing probe packet, and would need to do this for both regular probes and dedicated MTU probes. This would need to be tightly integrated into the tracing sliding window algorithm, which is already complex.

Other Issues

Either way, Trippy would also need logic to decide when to vary the packet length and how to deal with legacy systems which do not set the "Next-Hop MTU" in the ICMP DestinationUnreachable message. The guidance in RFC 1191 section 7.1 seems sensible here.
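For reference, the fallback in RFC 1191 section 7.1 is to step down to the next-lower value from a table of common MTU "plateaus" when the Frag-Needed reply carries no Next-Hop MTU. A sketch of that lookup (plateau values adapted from the RFC's table):

```python
# Common MTU plateaus from RFC 1191, section 7.1 (largest first).
PLATEAUS = [65535, 32000, 17914, 8166, 4352, 2002, 1492, 1006, 508, 296, 68]

def next_lower_plateau(failed_size):
    """Next PMTU estimate when a Frag-Needed reply has no Next-Hop MTU."""
    for plateau in PLATEAUS:
        if plateau < failed_size:
            return plateau
    return None  # already at or below the minimum; give up

print(next_lower_plateau(1500))  # 1492
```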

Flows / Paris / Dublin

There is also the issue of "flows", which Trippy records for UDP/paris and UDP/dublin: the path MTU is, clearly, a function of the path, so each path will potentially have a separate MTU, and each such MTU can change over the lifetime of the trace. It isn't immediately clear to me how Trippy could associate a given DestinationUnreachable with code 4 to an existing flow, given a flow is identified by the list of hosts traversed, and this list will be truncated when probes fail due to MTU.

Maybe path MTU discovery would only be supported for UDP/classic mode as a simplification.

Packet Size

I don't recall why 1024 was chosen as the maximum packet size initially; it was perhaps an arbitrary choice or inspired by other tools. For tracing purposes, this limit on packet size is not a big issue; it only becomes an issue for path MTU discovery on links which support large frames.

One practical issue with increasing this limit is that Trippy attempts to avoid heap allocations in the tracer hot path and so allocates the probe buffer on the stack. Increasing this substantially above 1024 would increase the required maximum stack size of the application, though I don't think even 64k would be a deal breaker.

Changing the maximum packet size also introduces complexities for the implementation of IPv6/dublin (see the IPv6 section below), but these should be possible to overcome, perhaps by having separate maximum packet size limits for different modes. This would need some more thought.

Don't Fragment Bit

Trippy sets the DF bit for IPv4 unconditionally. This was done mainly to allow for UDP/dublin, which uses the IPv4 fragment identification field; that is only safe to do if DF is set. Outside of MTU discovery, is there a tracing use-case for not setting DF? I can't think of a reason to run a trace with IPv4 probes that are fragmented at the IP layer. Having said that, I don't see any harm in allowing this for UDP/classic and UDP/paris, and I feel it should "just work" in Trippy today, though I have not tested it.

IPv6

IPv6 routers do not fragment packets in transit and there is no DF bit, so another mechanism is needed to support path MTU discovery, as set out in the various RFCs. I haven't explored this yet.

IPv6/UDP/dublin would be problematic as the sequence number is encoded as the size of the payload, and so the two features would be in direct conflict and MTU discovery would need to be disallowed for IPv6/UDP/dublin tracing.

Payload Patterns

Trippy allows users to specify an octet to use as the repeated payload. This should work as expected when the payload is varied in length as the packet size is adjusted.

@c-git
Collaborator

c-git commented May 16, 2024

I don't feel very strongly about MTU detection, as it is not a need that I have, but I understand why it is relevant to others. I just wanted to jump in with my opinion on the first option, Intrusive. I don't think this would be a good idea, as distorting the statistics for all later hops could confuse users. I think it would be very hard to surface a sufficient warning so users understand that the stats are not correct because MTU detection was being attempted. Unfortunately, I don't have a solution, as the other option, while it seems less bad to me, does seem much more complicated.

That said, I'm wondering if MTU is something that changes often? Maybe a separate mode or feature could be used for its detection. Maybe even option one, Intrusive: because you are in MTU mode, and you went there by choice and not by default, you hopefully understand the impact on the stats. Just an idea.

@mpenning

Hello @fujiapple852, I've considered your comment above and thought I'd share the following...

PMTU Detection CLI option

I think the thing that makes the most sense is that PMTU detection should NOT be on by default and it should be triggered from a CLI option.

Additionally, I think the user should be required to specify the max MTU size to be detected in the CLI option. For nodes that do not share what the MTU should be in the ICMP response, this avoids having to send an absurd number of probes for MTU values beyond what the user knows is configured in their network (and sometimes people will know what the max MTU should be). If they do not know the max, they can set the max MTU to some large value (beyond what they believe the equipment is capable of).

PMTU unit of measure

In all cases, I think the MTU should be specified as the maximum IP packet size in bytes (including IPv4 / IPv6 headers and associated options).
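With that convention, the transport payload for a given MTU follows by subtracting the fixed headers. A sketch assuming no IPv4 options or IPv6 extension headers:

```python
IPV4_HDR = 20  # IPv4 header without options
IPV6_HDR = 40  # fixed IPv6 header
UDP_HDR = 8    # UDP header

def max_udp_payload(mtu, ipv6=False):
    """Largest UDP payload fitting in one unfragmented packet of `mtu` bytes."""
    return mtu - (IPV6_HDR if ipv6 else IPV4_HDR) - UDP_HDR

print(max_udp_payload(1500))             # 1472
print(max_udp_payload(1500, ipv6=True))  # 1452
```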

PMTU implementation ideas

Based on a comparison of intrusive and dedicated MTU detection, I think a dedicated MTU detection mode should be used; however, I think it's helpful to keep tracing the path as usual while detecting PMTU, because you want to know whether a hop is dropping packets. If a specific combination of IP address / hop TTL value is dropping all packets, it's pointless to try detecting the MTU at that node, and you should just bypass MTU detection while that node drops all packets.

Furthermore, once you find that a node fails to return an MTU probe, the question remains whether the node dropped the probe due to MTU size, packet loss, or control-plane ICMP rate-limiting. As such, I think the user should be given the option of specifying how many MTU probes should be sent per-packet-size, per-hop. This places the burden on the user to set how reliable they want the MTU detection to be.
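The retry count maps directly onto a loss-rate assumption: if each probe is independently lost with probability p, all k probes are lost by pure chance with probability p^k. A small sketch of that arithmetic (illustrative only):

```python
import math

def probes_needed(loss_rate, confidence):
    """Probes per packet size so P(all lost by chance) <= 1 - confidence.

    Solves loss_rate**k <= 1 - confidence for the smallest integer k.
    """
    return math.ceil(math.log(1 - confidence) / math.log(loss_rate))

print(probes_needed(0.2, 0.99))  # 3 probes: 0.2**3 = 0.008 <= 0.01
```

So at 20% per-probe loss, three unanswered probes of a given size give 99% confidence that the size genuinely exceeds the path MTU (ignoring ICMP rate-limiting, which breaks the independence assumption).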

Flows / Paris / Dublin

I personally think that MTU detection for ICMP, UDP, and TCP are all valuable; I have witnessed different results from ICMP vs UDP vs TCP traces.

Large Packet size use-case

The reason I mention fragmented IP packet sizes up to 64k is because real networks may have problems with:

  • Fragments administratively blocked
  • Reassembly (which is sometimes done on the router in IPv4 networks)

DF-bit

You are correct that there is no DF bit in IPv6, but there is an IPv6 more-fragments bit (in the fragment extension header). IPv6 fragmentation is still permitted at the sending station, so I believe that testing it is useful in IPv6 corner cases.
