doc: design doc for end host without dispatcher #4280

Merged
merged 14 commits into from
May 23, 2024

Conversation

matzf
Member

@matzf matzf commented Oct 11, 2022

Design document for proposal #3961.

[doc]


In a separate PR, once this is merged, I will attempt to improve the organisation of these design documents and add a template.



@matzf
Member Author

matzf commented Oct 11, 2022

For the continued discussion: please continue discussing the overall idea in #3961, and discuss the specifics of the proposed design document in this PR here.

Contributor

@marcfrei marcfrei left a comment

Thanks a lot, Matthias! I very much support this proposal, especially to have a convincing deployment story for mobile and small embedded platforms.

Another niche use case that would benefit from this design is applications that rely on kernel- or hardware-based send and receive timestamps. In our ongoing time synchronization project, we first tried to implement timestamping with a somewhat messy extension of the dispatcher and the ReliableSocket protocol over the UNIX domain socket. We finally decided not to run the default dispatcher on time server hosts and instead to listen directly on the end-host port from our application. This works for dedicated time servers, but it wouldn't really be practical for end hosts where, for example, multiple other SCION-based apps are deployed besides a time service client.

Contributor

@marcfrei marcfrei left a comment

:lgtm:

Reviewed 1 of 2 files at r1, 1 of 1 files at r2, all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @matzf)

Contributor

@oncilla oncilla left a comment

Reviewable status: all files reviewed (commit messages unreviewed), 1 unresolved discussion (waiting on @matzf)


doc/design/router-port-dispatch.rst line 168 at r3 (raw file):

All SCMP messages would be forwarded to the default end-host port and dispatched from there to the correct application with the SCMP daemon.

Same issue as above; this still requires a shared component on the end-host if applications should be able to receive SCMP messages.

I'm wondering whether the overhead of pushing the SCMP quote parsing into the router is worth it just to avoid installing an additional application.
Parsing down to the L4 is already not great for performance. But parsing the L4 contents to then get the quote to know where to forward the packet to is even more expensive.

Checking whether the shared application is installed is fairly simple, and the user can be instructed to install the right application, which is a one-time operation.
In terms of complexity for the user, it feels about the same as granting camera access to an application.
And as soon as this would make it to the kernel, the additional app is also no longer needed. (To be fair, that's gonna take some time)

Furthermore, while SCMP messages are good for efficient fail-over, they are solely informational.
Everything should work without SCMP messages (if not, we need to address this).
I would not classify them as crucial (nor should they be).

Code quote:

this still requires a shared component on the end-host if applications should be able to receive SCMP messages.

Contributor

@oncilla oncilla left a comment

Reviewable status: all files reviewed (commit messages unreviewed), 2 unresolved discussions (waiting on @matzf)

a discussion (no related file):
One detail that has not been discussed yet is the SVC redirection.

Currently, the dispatcher delivers SVC packets to a random process that registered for this service.
Currently, only the control service and the discovery service need to reply to SVC resolution.

As I see it, there are two ways to solve this problem:

  • The router is currently configured with the IP addresses to which it dispatches SVC packets. We could modify this list to contain UDP addresses. The services then open 2 UDP ports: one for SVC redirects and one for the actual QUIC connections.

  • The "SCMP daemon" handles the redirection directly. I.e., the Service registers the port where it will accept QUIC connections with the "SCMP daemon". The services then only have to open 1 UDP port.

I slightly prefer the first option because it seems more reliable: it involves fewer processes. E.g., if the SCMP daemon unexpectedly stops, all SVC redirects fail. Also, after it starts again, all the services would need to re-register.


Contributor

@oncilla oncilla left a comment

Reviewable status: all files reviewed (commit messages unreviewed), 2 unresolved discussions (waiting on @matzf)

a discussion (no related file):

(fwiw, I played around with the second approach in a hackathon because it required no changes to the router. It is feasible, but requires quite some state management to handle edge cases.)


Member Author

@matzf matzf left a comment

Reviewable status: all files reviewed (commit messages unreviewed), 2 unresolved discussions (waiting on @oncilla)


doc/design/router-port-dispatch.rst line 168 at r3 (raw file):
That's a very fair concern.

Checking whether the shared application is installed is fairly simple, and the user can be instructed to install the right application, which is a one-time operation.
In terms of complexity for the user, it feels about the same as granting camera access to an application.
And as soon as this would make it to the kernel, the additional app is also no longer needed. (To be fair, that's gonna take some time)

I don't share this view. While such an approach may be appropriate for niche cases like SCION demo applications, it will not work for more "consumer" oriented applications that just want to make use of SCION whenever it is available. For example, suppose a browser (let's say Brave) or a video chat tool (e.g. Zoom) adds SCION support at some point. If it then suddenly prompts the user with "hey please install this third party application SCION SCMP dispatcher to make things even better", this would confuse and annoy a lot of people, and only a tiny fraction would follow through.

Also, somebody would have to own the shared dependency application. That would probably need to be us -- maintaining a user-facing package for various different platforms does not seem fun.

Furthermore, while SCMP messages are good for efficient fail-over, they are solely informational.
Everything should work without SCMP messages (if not, we need to address this).
I would not classify them as crucial (nor should they be).

That's a good point. So we could just say, tough luck, no SCMPs on this platform. Or, alternatively, we could offload the SCMP quote parsing to the router as proposed, but strictly rate limit this -- as they are optional, dropping any fraction is fair game.


Alternative idea:

Set the SCION FlowID based on the source port for all UDP packets. The FlowID is a 20-bit field, so we have room for the 16-bit port plus 4 free bits. SCMP errors already use the FlowID of the offending packet today.
When delivering the SCMP message to the end host, the underlay destination port is determined by extracting the corresponding bits from the FlowID of the SCMP message. The quote does not have to be inspected.
This approach could also be used for SCMP Echo and Traceroute replies.

Setting the FlowID in this particular way would be required only for hosts that want packets to be port-dispatched by the router. Once we don't need this anymore (because e.g. SCION support is available from the operating system), the Flow ID field can again be chosen as an arbitrary high-entropy identifier.
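
To make the bit arithmetic concrete, here is a minimal sketch of one possible encoding. The layout (port in the low 16 bits, spare bits on top) and the names `encodeFlowID` and `underlayPortFromFlowID` are assumptions for illustration; the discussion does not fix which bits carry the port.

```go
package main

import "fmt"

const (
	flowIDBits = 20 // width of the SCION FlowID field
	portBits   = 16 // width of a UDP port number
)

// encodeFlowID packs the UDP source port into the low 16 bits of the FlowID
// and the 4 spare bits into the high bits. Assumed layout, for illustration.
func encodeFlowID(srcPort uint16, spare uint8) uint32 {
	id := uint32(spare&0xF)<<portBits | uint32(srcPort)
	return id & (1<<flowIDBits - 1)
}

// underlayPortFromFlowID recovers the port from the FlowID of an SCMP
// message, so the quoted packet does not have to be inspected.
func underlayPortFromFlowID(flowID uint32) uint16 {
	return uint16(flowID & (1<<portBits - 1))
}

func main() {
	flowID := encodeFlowID(40001, 0x3)
	fmt.Printf("flowID=%#x port=%d\n", flowID, underlayPortFromFlowID(flowID))
}
```

Only the sending host needs to follow such a layout; the router merely copies the FlowID of the offending packet into the SCMP error and reads the port bits back when delivering it.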

Member Author

@matzf matzf left a comment

Reviewable status: all files reviewed (commit messages unreviewed), 2 unresolved discussions (waiting on @oncilla)

a discussion (no related file):

Previously, oncilla (Dominik Roos) wrote…

(fwiw, I played around with the second approach in a hackathon because it required no changes to the router. It is feasible, but requires quite some state management to handle edge cases.)

Thanks for pointing this out!

The first approach seems perfectly appropriate to me. I'll include this in the document.
Just for the record, the 2 ports are only necessary due to "implementation details" in snet/appnet. The service could also listen on the same UDP/IP port for both purposes; the application could internally dispatch packets for the two cases (based on the SCION destination address type).


Contributor

@oncilla oncilla left a comment

Reviewable status: all files reviewed (commit messages unreviewed), 2 unresolved discussions (waiting on @matzf)

a discussion (no related file):

The service could also listen on the same UDP/IP port for both purposes; the application could internally dispatch packets for the two cases (based on the SCION destination address type).

True. But I think it is easier to let the kernel take care of doing the dispatching instead of us needing to implement additional dispatching logic.


Contributor

@JordiSubira JordiSubira left a comment

Reviewable status: all files reviewed (commit messages unreviewed), 5 unresolved discussions (waiting on @matzf and @oncilla)


doc/design/router-port-dispatch.rst line 27 at r3 (raw file):

The application then receives and transmits SCION packets (exclusively) over this unix domain socket connection.

The dispatcher also implements other end host stack functionality; in particular, it replies to certain SCMP messages like SCMP Echo (ping).

Another

Code quote:

 other

doc/design/router-port-dispatch.rst line 34 at r3 (raw file):

- It is a single point of failure for the SCION network stack of a host. If it goes down, a lot of processes (control service, gateway, tooling) go haywire.
- It lives in userspace, moving every forwarded packet from kernelspace to userspace and back to kernelspace, meaning a performance hit.

user space; or at least try to make it consistent, e.g., above we have user space

Code quote:

userspace

doc/design/router-port-dispatch.rst line 159 at r3 (raw file):

- it also fixes many of the problems listed above related to unix domain sockets, reconnection, etc.

However, this still requires a shared component on the end-host if multiple applications want to use SCION concurrently.

end host

Code quote:

end-host

doc/design/router-port-dispatch.rst line 168 at r3 (raw file):

I find this alternative idea just fine if we want to spare the quoted packet inspection. How do SCMP errors use the FlowID now? Is it used in a similar way, to decide which upper-layer process must handle the packet?

Member Author

@matzf matzf left a comment

Reviewable status: 1 of 2 files reviewed, 5 unresolved discussions (waiting on @JordiSubira, @marcfrei, and @oncilla)


doc/design/router-port-dispatch.rst line 27 at r3 (raw file):

Previously, JordiSubira wrote…

Another

Hmm, "functionality" is in plural here (uncountable). To me, "other" seems better, or at least also acceptable.


doc/design/router-port-dispatch.rst line 34 at r3 (raw file):

Previously, JordiSubira wrote…

user space; or at least try to make it consistent, e.g., above we have user space

Done.


doc/design/router-port-dispatch.rst line 159 at r3 (raw file):

Previously, JordiSubira wrote…

end host

Done.


doc/design/router-port-dispatch.rst line 168 at r3 (raw file):

Previously, JordiSubira wrote…

I find this alternative idea just fine if we want to spare the quoted packet inspection. How do SCMP errors use the FlowID now? Is it used in a similar way, to decide which upper-layer process must handle the packet?

The FlowID is meant to be consumed by network devices, in particular the routers, to allow safely processing packets in parallel; only packets with the same FlowID are required to be kept in order.

This is not currently used in any more specific way for SCMPs. The router sets the FlowID of any SCMP error/traceroute response packet to the FlowID of the offending/request packet. That's simply the most sensible choice, but currently nothing depends on this behavior; with the proposed alternative idea, this would become mandatory.

The application process that receives an SCMP error message is determined by inspecting the source UDP port of the quoted offending message. SCMP traceroute or echo reply packets are delivered to applications based on the identifier field of these messages. Currently this happens in the dispatcher; in the long-term future, it would happen in the OS network stack.
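
The demultiplexing rule described above can be sketched as follows. This is a simplified model, not the dispatcher's code: the `scmpMessage` struct and the `demuxKey` function are hypothetical stand-ins for a parsed SCMP message and the lookup key used to find the receiving application.

```go
package main

import "fmt"

type scmpKind int

const (
	scmpError scmpKind = iota
	scmpEchoReply
	scmpTracerouteReply
)

// scmpMessage is a simplified, hypothetical view of a parsed SCMP message.
type scmpMessage struct {
	kind          scmpKind
	identifier    uint16 // set for echo and traceroute replies
	quotedSrcPort uint16 // UDP source port of the quoted offending packet (errors only)
}

// demuxKey returns the value used to select the receiving application:
// the quoted source port for errors, the identifier for replies.
func demuxKey(m scmpMessage) (uint16, error) {
	switch m.kind {
	case scmpError:
		return m.quotedSrcPort, nil
	case scmpEchoReply, scmpTracerouteReply:
		return m.identifier, nil
	default:
		return 0, fmt.Errorf("unsupported SCMP kind %d", m.kind)
	}
}

func main() {
	key, err := demuxKey(scmpMessage{kind: scmpError, quotedSrcPort: 40001})
	fmt.Println(key, err)
}
```

The error branch is the expensive one, since it requires parsing the quote; the FlowID idea would replace exactly that lookup.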

Contributor

@JordiSubira JordiSubira left a comment

:lgtm:

Reviewed 1 of 2 files at r1, 1 of 1 files at r4, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @matzf and @oncilla)


doc/design/router-port-dispatch.rst line 27 at r3 (raw file):

Previously, matzf (Matthias Frei) wrote…

Hmm, "functionality" is in plural here (uncountable). To me, "other" seems better, or at least also acceptable.

No big deal, it can behave as both according to the dict.


doc/design/router-port-dispatch.rst line 168 at r3 (raw file):

Thanks for the explanation! If we decide to adopt this alternative, I'd summarize and add this explanation.

Member Author

@matzf matzf left a comment

Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @oncilla)

a discussion (no related file):

Previously, oncilla (Dominik Roos) wrote…

The service could also listen on the same UDP/IP port for both purposes; the application could internally dispatch packets for the two cases (based on the SCION destination address type).

True. But I think it is easier to let the kernel take care of doing the dispatching instead of us needing to implement additional dispatching logic.

Ok, done.



doc/design/router-port-dispatch.rst line 168 at r3 (raw file):

Previously, JordiSubira wrote…

Thanks for the explanation! If we decide to adopt this alternative, I'd summarize and add this explanation.

Any more opinions on this point?

Summary of the proposed options:

  • SCMP daemon (leading to no SCMPs on systems where this is not running/available)
  • SCMP quote parsing in the router (as currently described in the design document), possibly with strict rate limits
  • (ab-)using the FlowID to encode an underlay port for SCMP error message replies.

My current personal favorite is the FlowID hack.
I understood that @oncilla, you preferred the SCMP daemon variant over the parsing in the router, but what is your opinion about the FlowID hack?

Contributor

@marcfrei marcfrei left a comment

Reviewed all commit messages.
Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @matzf and @oncilla)


doc/design/router-port-dispatch.rst line 83 at r5 (raw file):

^^^^^^^^^^^

The remaining functionality of the dispatcher, namely responding to SCMP echo requests, is implemented to a new, very simple "SCMP daemon".

"implemented in"?

Contributor

@marcfrei marcfrei left a comment

Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @matzf and @oncilla)


doc/design/router-port-dispatch.rst line 168 at r3 (raw file):

Two questions to better understand the three proposed alternatives:

  • How likely is it that some future SCMP message types will not be "solely informational"? I.e., for example, that a particular SCMP error message type will become important for applications on the end host to work properly? I think this question is important because the new design really should not (or cannot) assume the presence of a shared SCMP daemon on the end host.
  • Setting the FlowID based on the underlay port only for SCMP traffic wouldn't work because the assumption here is that peeking into the L4 payload could be too expensive also at the source of an SCMP message, right?

Member Author

@matzf matzf left a comment

Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @oncilla)


doc/design/router-port-dispatch.rst line 168 at r3 (raw file):

How likely is it that some future SCMP message types will not be "solely informational"? I.e., for example, that a particular SCMP error message type will become important for applications on the end host to work properly? I think this question is important because the new design really should not (or cannot) assume the presence of a shared SCMP daemon on the end host.

The "interface down" error messages already fall into this category, to some extent. Applications need to be aware that they cannot rely solely on these SCMP error messages to determine path health (the SCMPs are rate limited and, e.g., are not available if the first router on the path is dead). Still, knowing where the path failure occurs can help a lot in handling it. Otherwise applications are just groping in the dark; in many cases it's hard to even tell whether the path is down or whether the remote host or application failed. So applications either need to give up after some retries, pessimistically avoid a large section of the internet ("the failure might be in any of the ASes on this path, so avoid them all"), or explore a large number of paths to determine the broken link.

Setting the FlowID based on the underlay port only for SCMP traffic wouldn't work because the assumption here is that peeking into the L4 payload could be too expensive also at the source of an SCMP message, right?

The idea was to set this "specially crafted" FlowID already when sending out the UDP packets. For SCMP error messages, we just use the FlowID of the offending packet (i.e. the value in the first line of the packet). The router already does this, and this seems like a sensible choice anyway.
The good thing is that only the sending host needs to be aware of this scheme. The bad bit is that all applications running on this host need to be aware of this scheme.

Contributor

@marcfrei marcfrei left a comment

Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @matzf and @oncilla)


doc/design/router-port-dispatch.rst line 168 at r3 (raw file):

Thanks! By using the underlay source port as the FlowID, applications would lose the ability to implement recommendations like, e.g., the use of flow labels in QUIC [1]. This wouldn't be too big of a disadvantage?

[1] https://www.rfc-editor.org/rfc/rfc9000.html#name-use-of-ipv6-flow-label-and-

Member Author

@matzf matzf left a comment

Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @oncilla)


doc/design/router-port-dispatch.rst line 168 at r3 (raw file):

Previously, marcfrei (Marc Frei) wrote…

Thanks! By using the underlay source port as the FlowID, applications would lose the ability to implement recommendations like, e.g., the use of flow labels in QUIC [1]. This wouldn't be too big of a disadvantage?

[1] https://www.rfc-editor.org/rfc/rfc9000.html#name-use-of-ipv6-flow-label-and-

Correct, although there are still 4 bits left to play with 😉
In this particular case, it seems unproblematic; the recommendation is to avoid reusing the same flow ID when migrating a QUIC connection to a new address. When a new socket is opened for the new address to migrate to, it will be assigned a different (random) port and thus, with the proposed approach, automatically use a different flow ID.

I don't know how big of a problem this is in general. Reading RFC 6437, it doesn't feel like this would be super terrible though.

Contributor

@marcfrei marcfrei left a comment

Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @matzf and @oncilla)


doc/design/router-port-dispatch.rst line 168 at r3 (raw file):

So using the FlowID seems to have no obvious disadvantages so far, ignoring the fact that the source port would appear three times in an outgoing UDP/SCION packet...

@JordiSubira
Contributor

For the record, I've started implementing the part on which there is consensus (i.e., I'll leave the SCMP daemon for later). Any idea how long we should keep compatibility, and what port range we should use for it?

Contributor

@oncilla oncilla left a comment

Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @marcfrei and @matzf)


doc/design/router-port-dispatch.rst line 168 at r3 (raw file):

Previously, marcfrei (Marc Frei) wrote…

So using the FlowID seems to have no obvious disadvantages so far, ignoring the fact that the source port would appear three times in an outgoing UDP/SCION packet...

Sorry for the late response.

This is quite an important topic, and the decision has a large impact on current and future deployments.
Because these changes are so fundamental, I have also brought up this topic internally at Anapaya, and we have discussed them extensively.

There are three main concerns that we have when changing such fundamental parts of the networking stack.

  1. Can the change be rolled out incrementally?
  2. Do we fragment the ecosystem?
  3. Do we obstruct future progress?

Concern 1 is very important to us (Anapaya), because we already have a large productive deployment which must be updated incrementally.
We cannot have a single update day where the behavior of a full AS changes atomically. We must provide an interruption-free upgrade path at the granularity of one host/router.

Concern 2 is important if we want to keep the ecosystem coherent. It is also required if we want to support ASes that contain components from different vendors.

Concern 3 is important to us in the long term. We do not want to back ourselves into a corner that we cannot get out of anymore. While this discussion is about an implementation detail that will not be put in the standard, we think it will still serve as a reference implementation and will be considered the de facto standard.

Thoughts

Given these three concerns, here are our thoughts on the proposed solutions:

Doing all the processing in the end host (replacing the dispatcher with an eBPF solution)

This would be the cleanest solution because it does not introduce yet another layer violation.
Conceptually, the underlay port should not be tied to the UDP/SCION port. This change is compatible with the current deployment,
and does not fragment the ecosystem. It can also be incrementally rolled out.

However, it obstructs future progress on mobile platforms. Running eBPF on an Android host requires either root
privileges or a system code signature. Whether it is supported on iOS is unknown to me.

UDP/SCION Port == Underlay Port && SCMP Daemon does SCMP error dispatching

This solution introduces a layer violation by tying the underlay to the UDP/SCION layer.
It is incrementally deployable with the proposed reserved port range for packets where the rewrite should happen.
We run the risk of fragmenting the ecosystem, but it is rather low. The current dispatcher allocates ports
sequentially in the range 32768-65535.
If the reserved range is chosen carefully, a clash is very unlikely.

The drawback of this approach is the SCMP daemon. According to @marcfrei, it is hard to implement on mobile platforms.
Thus, we might obstruct future progress with this solution.

UDP/SCION Port == Underlay Port && SCMP Quote parsing

This solution is essentially the same as the solution above, except that the SCMP Quote is read by the router and the underlay port is set to the original source port.
Same as above, this change should be incrementally deployable, and we run low risk of fragmenting the ecosystem.

The additional benefit is that the end hosts do not need the SCMP daemon to receive SCMP error messages. This will simplify deployments on mobile platforms a lot. However, it has a severe performance impact. Limiting the performance of routers that are under heavy load, as a hack to compensate for a mobile OS that cannot provide the necessary tooling, feels off.

Flow ID

This solution introduces two layer violations. It ties both the underlay and the flow ID to UDP/SCION.
This change can be incrementally rolled out, if we assume that the flow ID has not been used so far.
However, it is important to note that all other proposals above are AS-local changes. This change, on the other hand, requires all ASes to behave in an expected manner around the flow ID (re-using the flow ID for the SCMP error packet).

Tying the flow ID to the source port limits the number of different flows per source port to 16 (the 16-bit port leaves only 4 of the 20 flow ID bits free). This might not be an issue for client applications that can open as many ports as they want, but on the server side, this can be quite restrictive.

According to RFC 6437, "flow label values should be chosen such that their bits exhibit a high degree of variability".
This would be violated because even though ports can be chosen from the 16-bit space, usually they have very low variability.
Of course, the flow ID is not the only input, so it might be fine. But I don't really like going against an IPv6 RFC that was the inspiration for our flow ID.

While the flow ID is currently not used by scionproto, we (Anapaya) are actively working on RSS that includes the flow ID. We were planning to use on the order of hundreds of flow IDs per connection. Limiting the number of different flows to 16 is concerning to us as this heavily restricts the possible values.
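
The bit budget behind the limit of 16 can be checked with a few lines of arithmetic. This is only a sketch: it assumes the 20-bit flow ID of the SCION common header and a hypothetical layout in which the 16-bit source port occupies the low bits. The names `pack_flow_id` and `port_from_flow_id` and the layout itself are illustrative, not taken from the proposal.

```python
FLOW_ID_BITS = 20   # width of the flow ID field in the SCION common header
PORT_BITS = 16      # width of a UDP port

# Bits left for distinguishing flows once the port is embedded.
FREE_BITS = FLOW_ID_BITS - PORT_BITS
FLOWS_PER_PORT = 1 << FREE_BITS


def pack_flow_id(src_port: int, flow_index: int) -> int:
    """Hypothetical layout: flow index in the high bits, port in the low 16 bits."""
    if not 0 <= src_port < (1 << PORT_BITS):
        raise ValueError("port out of range")
    if not 0 <= flow_index < FLOWS_PER_PORT:
        raise ValueError("flow index does not fit in the remaining bits")
    return (flow_index << PORT_BITS) | src_port


def port_from_flow_id(flow_id: int) -> int:
    """Recover the embedded port from the low 16 bits of the flow ID."""
    return flow_id & ((1 << PORT_BITS) - 1)
```

Under these assumptions a server listening on a single port can present at most `FLOWS_PER_PORT` distinct flow IDs, which is far below the hundreds per connection mentioned for RSS.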

Conclusion

While replacing the dispatcher with an eBPF program that deals with the rewriting would be our favorite solution, we understand that it is not feasible at the moment for many platforms. In the interest of bringing native SCION applications to the end hosts, I'm in favor of going forward with one of the other proposals.

However, the choice is not obvious to me yet. I would push against sticking it in the flow ID, for the reasons outlined above. It is one more layer violation and limits the usefulness of the flow ID too much for its actual purpose.

Parsing the quote on the router is quite expensive. Essentially, we at least double the parsing effort. It would be best to avoid this if possible. Thus, I'm torn between parsing the SCMP packets in the router and having an SCMP daemon on the end host. If we were to implement the parsing option, we would definitely start with SCMP rate limiting.

@marcfrei could you elaborate on the challenges of implementing an SCMP daemon on mobile platforms?

Member Author

@matzf matzf left a comment

Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @oncilla)


doc/design/router-port-dispatch.rst line 168 at r3 (raw file):

Previously, oncilla (Dominik Roos) wrote…

Thanks for the very structured write up, that's great. For what it's worth, I believe we've implicitly considered exactly these three concerns when we discussed this design, but it's great to spell this out explicitly.

A general point regarding the "layer violation" objection: the way I see it, the UDP underlay is already a stop-gap approach. There is a mismatch between the address type in the SCION header (IP v4/v6) and the underlay (UDP, i.e. IP + port). Now, actually adding the UDP port to the SCION address would seem very odd. Instead, the "right" approach would be to have IP directly as the underlay.
My guess as to why we don't have an IP underlay directly (yet) is that it's just much more practical to use UDP than to define a new protocol number, which would certainly be dropped by all kinds of middleboxes. So, the UDP underlay is just a pragmatic, convenient choice to aid deployability.
Since the UDP underlay is already "wrong", milking it a bit more to further aid deployability seems fair game.
I admit this is a bit of a foot-in-the-door argument.

What we ideally want to achieve is to make it possible to deploy multiple SCION-native applications independently and fully self-contained. There should be no requirement for any non-standard system components to be installed or running, and no coordination between different applications.
The eBPF dispatcher approach does not achieve this (in general, not only on mobile platforms); something needs to install the eBPF program, and doing so even requires special privileges.
Requiring an SCMP daemon fails this ideal goal in the same way (it needs to be installed and started somehow).

I can understand the concerns against abusing the flow ID.


Here are two variants on the approaches suggested before:

UDP/SCION Port == Underlay Port && shared responsibility for SCMP error dispatching

This solution is essentially the same as the SCMP daemon approach, except that instead of having a dedicated SCMP daemon process, all SCION applications listen on UDP port 30041 for SCMP error messages with SO_REUSEPORT (or some analogous platform-specific option). The applications parse the SCMP packets and forward them to the correct application (sent via IP to localhost:port).
This approach does not achieve the ideal of "no coordination between applications". The coordination effort is modest, though, and there is no need to explicitly register with, or start up and shut down, central components. The mechanism appears to be available in some form on almost any platform. There is, however, information leakage between the non-privileged applications (applications can see the quoted message that caused the SCMP error).
On Linux, the SO_REUSEPORT mechanism is limited to processes with the same user ID; this is a bit awkward for our use case, but not a huge problem. Applications need to ignore bind errors when opening this socket and, in case of a bind error, periodically retry opening the socket to cover the case where all applications previously "serving" the SCMP error port have gone away.
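
To make the socket handling concrete, here is a minimal sketch of how an application might try to join the shared SCMP error port. It assumes a platform that exposes `SO_REUSEPORT` (e.g. Linux); the helper name `open_shared_scmp_socket` and the choice of the loopback address are hypothetical, and the parsing and forwarding of SCMP packets is left out entirely.

```python
import socket

SCMP_ERROR_PORT = 30041  # shared port named in the proposal above


def open_shared_scmp_socket(port=SCMP_ERROR_PORT, host="127.0.0.1"):
    """Try to join the group of sockets sharing `port`.

    Returns the bound socket, or None if the option is unavailable or the
    bind fails (for example because the port is held by another user ID).
    """
    reuseport = getattr(socket, "SO_REUSEPORT", None)
    if reuseport is None:
        return None
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.setsockopt(socket.SOL_SOCKET, reuseport, 1)
        sock.bind((host, port))
    except OSError:
        # Bind errors are ignored, as described above; the caller retries later.
        sock.close()
        return None
    return sock
```

An application would call this at startup and, whenever it returns `None`, call it again periodically so that the port is picked up once the previous holders have exited.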

UDP/SCION Port == Underlay Port && limited SCMP Quote parsing by abusing the reserved bits in the common header

This is a Frankenstein's monster mash-up of the SCMP Quote parsing and the Flow ID ideas.
To avoid having to parse the full quoted packet header down to L4, we squirrel away the port information in a fixed location early in the packet header.
We could again use the Flow ID for this idea, but there are actually 16 reserved bits in the common header just waiting to be (ab)used.
In contrast to the Flow ID approach, this does not require any behavioral change in the routers of other ASes, as long as they faithfully copy the offending packet into the quote.
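
A rough sketch of what "a fixed location early in the packet header" buys: the router reads two bytes at a constant offset of the quoted packet instead of walking the address and path headers down to L4. The offset used here assumes that the 16 reserved bits are the last two bytes of a 12-byte common header; that layout and the helper names `stash_port` and `port_from_quote` are assumptions for illustration.

```python
import struct

COMMON_HEADER_LEN = 12   # assumed size of the SCION common header in bytes
RSV_OFFSET = 10          # assumed offset of the 16 reserved bits


def stash_port(common_header: bytes, src_port: int) -> bytes:
    """Sender side: write the source port into the reserved bits."""
    if len(common_header) != COMMON_HEADER_LEN:
        raise ValueError("expected a 12-byte common header")
    return common_header[:RSV_OFFSET] + struct.pack("!H", src_port)


def port_from_quote(quote: bytes) -> int:
    """Router side: read the port from the quoted packet at a fixed offset."""
    if len(quote) < COMMON_HEADER_LEN:
        raise ValueError("quote too short to contain a common header")
    (port,) = struct.unpack_from("!H", quote, RSV_OFFSET)
    return port
```

The lookup cost is constant and independent of the path length, which is the point of the idea; whether reusing the reserved bits is acceptable is a separate question.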

@marcfrei
Contributor

... could you elaborate on the challenges of implementing an SCMP daemon on mobile platforms?

I don't know whether and how this could work on Android but at least on iOS/iPadOS there is as far as I can see simply no concept of a user-installable background daemon. There is however the possibility to install and enable a separate app that provides a so-called Network Extension (https://developer.apple.com/documentation/networkextension) which could come close to what is needed here.

We would need to investigate whether it is feasible in principle to implement SCMP daemon functionality based on what the network extension model allows. In addition, there's then also the question how such a network extension would work together with other network extensions. VPN apps (like, e.g. Tailscale) have to be implemented as network extensions and at least VPN extensions are mutually exclusive among each other. Independent of the technical feasibility it also seems to me that such a solution would be rather finicky to manage for typical end-users. Installing a network extension requires entry of the device passcode which already feels a little scary to begin with. It's then very easy to imagine support situations where a SCION-based application doesn't work as expected. The provider of such an app would need to ask the user to check whether the separate SCION SCMP daemon app that provides the network extension is already installed and enabled correctly. The approach thus wouldn't really satisfy Matthias's requirement:

What we ideally want to achieve is to make it possible to deploy multiple SCION-native applications independently and fully self-contained. There should be no requirement for any non-standard system components to be installed or running, and no coordination between different applications.

Collaborator

@lukedirtwalker lukedirtwalker left a comment

Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @JordiSubira and @matzf)

Contributor

@JordiSubira JordiSubira left a comment

Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @matzf)


doc/dev/design/router-port-dispatch.rst line 164 at r11 (raw file):

         "min": <port>,
         "max": <port>
       }

Right now I am using the following format, based on Marc's suggestion on the implementation PR (https://reviewable.io/reviews/scionproto/scion/4344#-NqRlyUyA8rSxIAJFYWy):

"endhost_port_range": "1024-65535",

It also allows the syntax "" and "-" to indicate an empty port range. Where should we change it?

Code quote:

       "dispatched_ports": {
         "min": <port>,
         "max": <port>
       }

Member Author

@matzf matzf left a comment

Reviewable status: 2 of 3 files reviewed, 3 unresolved discussions (waiting on @JordiSubira, @lukedirtwalker, and @marcfrei)


doc/dev/design/router-port-dispatch.rst line 164 at r11 (raw file):

Previously, JordiSubira wrote…

Right now I am using the following format, based on Marc's suggestion on the implementation PR (https://reviewable.io/reviews/scionproto/scion/4344#-NqRlyUyA8rSxIAJFYWy):

"endhost_port_range": "1024-65535",

It also allows the syntax "" and "-" to indicate an empty port range. Where should we change it?

Done, I've changed it here. I've also added an "all" value that is a shorthand for 1-65535. When documenting the intended procedure for the update, I realized that this will be the value used most of the time once the update is complete, so it makes sense to have a short value.
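
The accepted spellings discussed in this thread ("MIN-MAX", "" or "-" for the empty range, and "all" for 1-65535) can be summarised in a small parser. This is a sketch of the syntax only, not the parser used in the implementation PR; the function name `parse_port_range` and the choice to represent the empty range as `None` are assumptions.

```python
from typing import Optional, Tuple


def parse_port_range(value: str) -> Optional[Tuple[int, int]]:
    """Return (min, max) for a port range string, or None for the empty range."""
    value = value.strip()
    if value in ("", "-"):
        return None
    if value == "all":
        return (1, 65535)
    low_text, sep, high_text = value.partition("-")
    if not sep:
        raise ValueError(f"expected MIN-MAX, got {value!r}")
    low, high = int(low_text), int(high_text)
    if not 1 <= low <= high <= 65535:
        raise ValueError(f"invalid port range {value!r}")
    return (low, high)
```

For example, `parse_port_range("1024-65535")` yields `(1024, 65535)`, while `parse_port_range("-")` yields `None`.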


doc/dev/design/router-port-dispatch.rst line 172 at r11 (raw file):

Previously, marcfrei (Marc Frei) wrote…

Resolved in an offline discussion: the port range is intended to get removed or made wider again after an AS is updated completely.

I've added more details on the compatibility mechanisms and the intended update procedure now, as discussed in the chat and in Tuesday's call.
The significance of this port range has been somewhat reduced since the first version of this document (which had not yet included the "shim dispatcher"). Now it is really only the range of (mostly ephemeral) ports that can be used by "new" devices/applications without the shim dispatcher, during the transition phase before the update of the AS is complete. I hope this is clearer now -- admittedly, it had not been entirely clear to me before thinking it through again to write it down.

Member Author

@matzf matzf left a comment

Reviewable status: 2 of 3 files reviewed, 2 unresolved discussions (waiting on @JordiSubira, @lukedirtwalker, and @marcfrei)


doc/dev/design/router-port-dispatch.rst line 215 at r11 (raw file):

Previously, marcfrei (Marc Frei) wrote…

unknown

Done.

Collaborator

@lukedirtwalker lukedirtwalker left a comment

Reviewed 1 of 1 files at r13, all commit messages.
Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @JordiSubira and @matzf)


doc/dev/design/router-port-dispatch.rst line 208 at r13 (raw file):

Conversely, if our long term vision materializes and we'd have SCION support directly built-in to the operating system's network stack, then this workaround becomes obsolete.
In an optimistic scenario, where there are millions of end hosts running SCION-enabled applications, we can not expect that all devices and applications will updated to the same level of SCION support within a useful time frame.

Suggestion:

will be updated 

Collaborator

@lukedirtwalker lukedirtwalker left a comment

Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @JordiSubira and @matzf)


doc/dev/design/router-port-dispatch.rst line 208 at r13 (raw file):

Conversely, if our long term vision materializes and we'd have SCION support directly built-in to the operating system's network stack, then this workaround becomes obsolete.
In an optimistic scenario, where there are millions of end hosts running SCION-enabled applications, we can not expect that all devices and applications will updated to the same level of SCION support within a useful time frame.

typo:

Contributor

@marcfrei marcfrei left a comment

:lgtm:

Reviewed all commit messages.
Reviewable status: all files reviewed, 7 unresolved discussions (waiting on @JordiSubira and @matzf)


doc/dev/design/router-port-dispatch.rst line 153 at r13 (raw file):

  The processing rule above is extended:

  2. If the underlay UDP/IP destination port determined above is within the port range specified in the topology configuration,

above, i.e. in processing rule 1,

Code quote:

above

doc/dev/design/router-port-dispatch.rst line 202 at r13 (raw file):

   new applications/devices without the shim dispatcher, they can pick the empty range in step 1.,
   and state 3. is skipped.

Should we add an additional note that explicitly explains how the transition works for server applications listening on well-known ports? I guess they wouldn't start listening in the dispatched_ports range but just use the shim dispatcher until dispatched_ports gets extended to also cover the well known ports?


doc/dev/design/router-port-dispatch.rst line 216 at r13 (raw file):

- enable shim dispatcher for services outside of intended ``dispatched_port`` range
- shrink ``dispatched_ports`` range and configure this on routers and hosts

When can this step safely be initiated, i.e., when do we know that all hosts in a given AS have enabled the shim dispatchers again?


doc/dev/design/router-port-dispatch.rst line 279 at r13 (raw file):

     Remove the packet dispatching/forwarding functionality from "dispatcher".
     Only SCMP echo responder remains in dispatcher. Rename to "SCMP Daemon" (scmpd).
   - set suitable default for port range in ``dispatched_ports`` topology configuration

configuration.

Code quote:

configuration

matzf pushed a commit that referenced this pull request May 17, 2024
…4344)

Implement the dispatcher-less end host with the UDP port dispatch in the
router, as discussed in #4280.
Applications (using snet) now open underlay UDP ports directly, and use
the same port number for the underlay UDP and SCION/UDP. This
SCION_UDP.dst_port number is used by the router as underlay port when
forwarding packets to destination hosts.
The `dispatcher` has been completely refactored and pruned. It now
serves only as a responder for SCMP echo/traceroute requests, and, as a
transition mechanism, acts as a stateless "shim" that forwards UDP
datagrams.
The `reliable/sock` package has been removed.
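
The forwarding decision described in this commit message can be sketched as follows. The in-range case (use `SCION_UDP.dst_port` as the underlay port) comes from the text above; the fallback to a fixed port on which the shim listens, and the value 30041 for it, are assumptions made for illustration, as is the function name `underlay_dst_port`.

```python
from typing import Optional, Tuple

SHIM_DISPATCHER_PORT = 30041  # assumed fixed port on which the shim listens


def underlay_dst_port(scion_udp_dst_port: int,
                      dispatched_ports: Optional[Tuple[int, int]]) -> int:
    """Pick the underlay UDP port for a packet delivered to an end host.

    `dispatched_ports` is the (min, max) range from the topology
    configuration, or None for the empty range.
    """
    if dispatched_ports is not None:
        low, high = dispatched_ports
        if low <= scion_udp_dst_port <= high:
            # Application opened this underlay port directly.
            return scion_udp_dst_port
    # Outside the range: hand the packet to the shim for forwarding.
    return SHIM_DISPATCHER_PORT
```

With the empty range every packet goes to the shim, and with the full range every packet goes straight to the application, which matches the two ends of the transition described in the design document.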
Member Author

@matzf matzf left a comment

Reviewable status: 1 of 3 files reviewed, 4 unresolved discussions (waiting on @JordiSubira, @lukedirtwalker, and @marcfrei)


doc/dev/design/router-port-dispatch.rst line 202 at r13 (raw file):

Previously, marcfrei (Marc Frei) wrote…

Should we add an additional note that explicitly explains how the transition works for server applications listening on well-known ports? I guess they wouldn't start listening in the dispatched_ports range but just use the shim dispatcher until dispatched_ports gets extended to also cover the well known ports?

Good idea, done.


doc/dev/design/router-port-dispatch.rst line 216 at r13 (raw file):

Previously, marcfrei (Marc Frei) wrote…

When can this step safely be initiated, i.e., when do we know that all hosts in a given AS have enabled the shim dispatchers again?

Good question, I don't have a good answer. I've added a note to point out this flaw.

Contributor

@marcfrei marcfrei left a comment

Reviewed 2 of 2 files at r14, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @JordiSubira)

Member Author

@matzf matzf left a comment

Dismissed @JordiSubira from 2 discussions.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @matzf)

@matzf matzf merged commit c43edd7 into scionproto:master May 23, 2024
4 checks passed
@matzf matzf deleted the doc-no-dispatcher branch May 23, 2024 08:51