Handling of invalid utf-8 code points according to [mqtt-v3.1.1-plus-errata01] #59

rockebee · 2019-05-23T10:04:56Z

Hi,
during a production pen test of an application that uses Aedes as its MQTT broker, the question came up of how the broker handles invalid UTF-8 code points in topic strings.
According to the MQTT spec (see http://docs.oasis-open.org/mqtt/mqtt/v3.1.1/errata01/os/mqtt-v3.1.1-errata01-os-complete.html#_Toc442180829) and as highlighted by a Trend Micro whitepaper (https://documents.trendmicro.com/assets/white_papers/wp-the-fragility-of-industrial-IoTs-data-backbone.pdf, section 2.1.2), a receiver MUST close the network connection when it receives certain data (ill-formed UTF-8, including encodings of U+D800..U+DFFF, and the null character U+0000) and MAY close it for some other code points (e.g. the control characters U+0001..U+001F and U+007F..U+009F). However, I could not find this filtering anywhere in the code of either Aedes or mqtt-packet, which I assumed to be the relevant candidate, since it provides the parser for the topic.

The conformance statements from the spec:

The character data in a UTF-8 encoded string MUST be well-formed UTF-8 as defined by the Unicode specification [Unicode] and restated in RFC 3629 [RFC3629]. In particular this data MUST NOT include encodings of code points between U+D800 and U+DFFF. If a Server or Client receives a Control Packet containing ill-formed UTF-8 it MUST close the Network Connection [MQTT-1.5.3-1].

A UTF-8 encoded string MUST NOT include an encoding of the null character U+0000. If a receiver (Server or Client) receives a Control Packet containing U+0000 it MUST close the Network Connection [MQTT-1.5.3-2].
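To illustrate the two MUST rules above, here is a minimal sketch of what such a check could look like in Node.js. It is not code from Aedes or mqtt-packet; `decodeMqttString` is a hypothetical helper name, and the sketch assumes a Node.js build where the global `TextDecoder` supports the `fatal` option.

```javascript
// Hypothetical helper, not part of Aedes or mqtt-packet.
// Decodes the bytes of an MQTT UTF-8 string and throws when the two
// MUST-level rules are violated, so that the caller can close the connection.
function decodeMqttString(bytes) {
  // fatal: true makes decode() throw on ill-formed UTF-8, which includes
  // encodings of the surrogates U+D800..U+DFFF and overlong forms.
  // ignoreBOM: true keeps a leading U+FEFF in the result instead of dropping it.
  const decoder = new TextDecoder('utf-8', { fatal: true, ignoreBOM: true });

  let text;
  try {
    text = decoder.decode(bytes);
  } catch (err) {
    // Corresponds to [MQTT-1.5.3-1].
    throw new Error('ill-formed UTF-8 in MQTT string');
  }

  if (text.includes('\u0000')) {
    // Corresponds to [MQTT-1.5.3-2].
    throw new Error('U+0000 in MQTT string');
  }
  return text;
}

// Usage: a valid topic decodes normally, an encoded surrogate is rejected.
console.log(decodeMqttString(Buffer.from('sensors/temp')));
try {
  decodeMqttString(Buffer.from([0xed, 0xa0, 0x80])); // encoding of U+D800
} catch (err) {
  console.log(err.message);
}
```

Where such a check would belong (in the packet parser or in the broker) is left open here; the sketch only shows the validation itself.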

The data SHOULD NOT include encodings of the Unicode [Unicode] code points listed below. If a receiver (Server or Client) receives a Control Packet containing any of them it MAY close the Network Connection:

U+0001..U+001F control characters 
U+007F..U+009F control characters 
Code points defined in the Unicode specification [Unicode] to be non-characters (for example U+0FFFF) 
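For the optional (MAY) part, a check on the already decoded string could look roughly like the following. This is only a sketch: `hasDiscouragedCodePoint` is a hypothetical name, and whether a broker should actually disconnect on a match is a policy decision that the spec leaves open.

```javascript
// Hypothetical helper; the name and the policy around it are assumptions.
// Returns true if the string contains a code point the spec says SHOULD NOT
// be included: U+0001..U+001F, U+007F..U+009F, or a Unicode non-character.
function hasDiscouragedCodePoint(text) {
  for (const ch of text) { // iterates by code point, not by UTF-16 unit
    const cp = ch.codePointAt(0);
    const isControl =
      (cp >= 0x0001 && cp <= 0x001f) || (cp >= 0x007f && cp <= 0x009f);
    // Non-characters: U+FDD0..U+FDEF and the last two code points of
    // every plane (U+FFFE, U+FFFF, U+1FFFE, U+1FFFF, ...).
    const isNonCharacter =
      (cp >= 0xfdd0 && cp <= 0xfdef) || (cp & 0xfffe) === 0xfffe;
    if (isControl || isNonCharacter) return true;
  }
  return false;
}

console.log(hasDiscouragedCodePoint('sensors/temp'));       // false
console.log(hasDiscouragedCodePoint('sensors/\u0001temp')); // true
```

U+0000 is deliberately not covered here, because it falls under the MUST rule [MQTT-1.5.3-2] rather than this optional list.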

A UTF-8 encoded sequence 0xEF 0xBB 0xBF is always to be interpreted to mean U+FEFF ("ZERO WIDTH NO-BREAK SPACE") wherever it appears in a string and MUST NOT be skipped over or stripped off by a packet receiver [MQTT-1.5.3-3].
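As a side note on [MQTT-1.5.3-3]: whether the sequence 0xEF 0xBB 0xBF survives decoding depends on the decoder settings. The sketch below uses the standard `TextDecoder` available as a global in Node.js; it only demonstrates the decoder behaviour and makes no claim about which decoding path Aedes or mqtt-packet actually use.

```javascript
// 0xEF 0xBB 0xBF followed by the letter "a".
const bytes = Uint8Array.of(0xef, 0xbb, 0xbf, 0x61);

// With default settings, TextDecoder drops a leading U+FEFF from the output.
const stripped = new TextDecoder('utf-8').decode(bytes);

// With ignoreBOM: true, U+FEFF is kept in the decoded string.
const preserved = new TextDecoder('utf-8', { ignoreBOM: true }).decode(bytes);

console.log(stripped.length, preserved.length);      // 1 2
console.log(preserved.codePointAt(0).toString(16));  // feff
```

A receiver that decodes strings with the default settings would therefore strip the leading U+FEFF, which the spec forbids.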

Is this part of the spec just not implemented anywhere or am I looking at the wrong code base?

stapelberg mentioned this issue Jan 7, 2021

hmq does not close connection when receiving invalid utf-8 fhmq/hmq#104

Open
