Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handling of invalid utf-8 code points according to [mqtt-v3.1.1-plus-errata01] #59

Open
rockebee opened this issue May 23, 2019 · 0 comments

Comments

@rockebee
Copy link

Hi,
as part of a production pen test for an application using Aedes as MQTT broker, a question came up how the broker handles invalid UTF-8 code points in topic strings.
According to the MQTT spec (cmp. http://docs.oasis-open.org/mqtt/mqtt/v3.1.1/errata01/os/mqtt-v3.1.1-errata01-os-complete.html#_Toc442180829) and as highlighted by a whitepaper from Trendmicro (https://documents.trendmicro.com/assets/white_papers/wp-the-fragility-of-industrial-IoTs-data-backbone.pdf, section 2.1.2) some code points (i. e. control characters) MUST close the network connection, for some others it MAY close the network connection (e. g. U+0001..U+001F, U+007F..U+009F). However, I could not find any of the filtering anywhere in the code of neither Aedes nor mqtt-packet, which I was supposing to be the relevant candidate for doing so (as it provides the parser for the topic).

The conformance statements from the spec:

The character data in a UTF-8 encoded string MUST be well-formed UTF-8 as defined by the Unicode specification [Unicode] and restated in RFC 3629 [RFC3629]. In particular this data MUST NOT include encodings of code points between U+D800 and U+DFFF. If a Server or Client receives a Control Packet containing ill-formed UTF-8 it MUST close the Network Connection [MQTT-1.5.3-1].

A UTF-8 encoded string MUST NOT include an encoding of the null character U+0000. If a receiver (Server or Client) receives a Control Packet containing U+0000 it MUST close the Network Connection [MQTT-1.5.3-2].

The data SHOULD NOT include encodings of the Unicode [Unicode] code points listed below. If a receiver (Server or Client) receives a Control Packet containing any of them it MAY close the Network Connection:

U+0001..U+001F control characters 
U+007F..U+009F control characters 
Code points defined in the Unicode specification [Unicode] to be non-characters (for example U+0FFFF) 

A UTF-8 encoded sequence 0xEF 0xBB 0xBF is always to be interpreted to mean U+FEFF ("ZERO WIDTH NO-BREAK SPACE") wherever it appears in a string and MUST NOT be skipped over or stripped off by a packet receiver [MQTT-1.5.3-3].

Is this part of the spec just not implemented anywhere or am I looking at the wrong code base?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant