Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

baggage: Accept non-ASCII keys #5132

Open
wants to merge 12 commits into
base: main
Choose a base branch
from
1 change: 1 addition & 0 deletions CHANGELOG.md
Expand Up @@ -28,6 +28,7 @@ This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.htm
### Changed

- `SpanFromContext` and `SpanContextFromContext` in `go.opentelemetry.io/otel/trace` no longer make a heap allocation when the passed context has no span. (#5049)
- `NewMemberRaw` in `go.opentelemetry.io/otel/baggage` allows UTF-8 string. (#5132)

### Fixed

Expand Down
31 changes: 29 additions & 2 deletions baggage/baggage.go
Expand Up @@ -8,6 +8,7 @@ import (
"fmt"
"net/url"
"strings"
"unicode/utf8"

"go.opentelemetry.io/otel/internal/baggage"
)
Expand Down Expand Up @@ -241,7 +242,12 @@ func NewMember(key, value string, props ...Property) (Member, error) {

// NewMemberRaw returns a new Member from the passed arguments.
//
// The passed key must be compliant with W3C Baggage specification.
// The passed key and value must be valid UTF-8 string.
Copy link
Member

@pellared pellared Apr 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NewPropertyRaw (and tests) should be updated as well.
The name/key passed to NewPropertyRaw can be any valid UTF-8 string.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am unsure about this, as the property is defined in the W3C baggage, not the otel specification. And, the W3C didn't allow UTF-8 in the key.

https://www.w3.org/TR/baggage/#key

Additional metadata MAY be appended to values in the form of property set, represented as semi-colon ; delimited list of keys and/or key-value pairs, e.g. ;k1=v1;k2;k3=v3. Property keys and values are given no specific meaning by this specification. Leading and trailing OWS is allowed and is not considered to be a part of the property key or value.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

property is defined in the W3C baggage, not the otel specification

You are right. The OTel spec does not even have the "Members" and "Property" distinction reflecting W3C definitions. I would even say that the OTel spec says "properties" to reflect "members" in W3C which makes it very confusing.

I feel we should update the OTel spec first. Are you able to follow-up on this?

Copy link
Member

@pellared pellared Apr 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought it was designed not to define "Property" from the spec, as it was only a W3C extension of baggage value.

But yeah, the spec can mention the W3C property somewhere to prevent confusion. I can follow up on this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I missed this:

Metadata Optional metadata associated with the name-value pair. This should be an opaque wrapper for a string with no semantic meaning. Left opaque to allow for future functionality.

So actually, I think it should have similar behavior and it is not actually blocked by specification.

// However, the specific Propagators that are used to transmit baggage entries across
// component boundaries may impose their own restrictions on baggage key.
Comment on lines +272 to +273
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should update Member.String() and Property.String() so that we do not encode them if its names are invalid according to W3C Baggage Propagator spec. Reference #4946 (comment)

Copy link
Member Author

@XSAM XSAM Apr 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel we should allow Member.String() to return percent-encoded UTF-8 from keys, as OTel baggage allows UTF-8 in keys. If we do not encode them, it will be strange, as the result of String can miss some keys.

However, the W3C baggage propagator uses Member.String() directly for encoding. Maybe we should let the W3C baggage propagator drop some members if the key is UTF-8. Though additional memory allocation could occur, I am not sure whether we should go for this path or not.

Copy link
Member Author

@XSAM XSAM Apr 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we provide a method to expose baggage.List, so we could avoid memory allocation? (it is under "go.opentelemetry.io/otel/internal/baggage")

// For example, the W3C Baggage specification restricts the baggage keys to strings that
// satisfy the token definition from RFC7230, Section 3.2.6.
// For maximum compatibility, alpha-numeric value are strongly recommended to be used as baggage key.
func NewMemberRaw(key, value string, props ...Property) (Member, error) {
m := Member{
key: key,
Expand Down Expand Up @@ -313,9 +319,12 @@ func (m Member) validate() error {
return fmt.Errorf("%w: %q", errInvalidMember, m)
}

if !validateKey(m.key) {
if !validateBaggageName(m.key) {
return fmt.Errorf("%w: %q", errInvalidKey, m.key)
}
if !validateBaggageValue(m.value) {
return fmt.Errorf("%w: %q", errInvalidValue, m.value)
}
return m.properties.validate()
}

Expand Down Expand Up @@ -630,6 +639,23 @@ func skipSpace(s string, offset int) int {
return i
}

// validateBaggageName checks if the string is a valid OpenTelemetry Baggage name.
// Baggage name is a valid UTF-8 string.
func validateBaggageName(s string) bool {
if len(s) == 0 {
return false
}

return utf8.ValidString(s)
}

// validateBaggageValue checks if the string is a valid OpenTelemetry Baggage value.
// Baggage value is a valid UTF-8 strings.
func validateBaggageValue(s string) bool {
return utf8.ValidString(s)
}

// validateKey checks if the string is a valid W3C Baggage key.
func validateKey(s string) bool {
if len(s) == 0 {
return false
Expand Down Expand Up @@ -658,6 +684,7 @@ func validateKeyChar(c int32) bool {
c == 0x7e
}

// validateValue checks if the string is a valid W3C Baggage value.
func validateValue(s string) bool {
for _, c := range s {
if !validateValueChar(c) {
Expand Down
48 changes: 48 additions & 0 deletions baggage/baggage_test.go
Expand Up @@ -140,6 +140,11 @@ func TestNewKeyValueProperty(t *testing.T) {
assert.ErrorIs(t, err, errInvalidValue)
assert.Equal(t, Property{}, p)

// wrong value with wrong decoding
p, err = NewKeyValueProperty("key", "%zzzzz")
assert.ErrorIs(t, err, errInvalidValue)
assert.Equal(t, Property{}, p)

p, err = NewKeyValueProperty("key", "value")
assert.NoError(t, err)
assert.Equal(t, Property{key: "key", value: "value", hasValue: true}, p)
Expand Down Expand Up @@ -409,6 +414,15 @@ func TestBaggageParse(t *testing.T) {
"foo": {Value: "ąść"},
},
},
{
name: "encoded UTF-8 string in key",
in: "a=b,%C4%85%C5%9B%C4%87=%C4%85%C5%9B%C4%87",
want: baggage.List{
"a": {Value: "b"},
// The percent-encoded key won't be decoded.
"%C4%85%C5%9B%C4%87": {Value: "ąść"},
},
},
{
name: "invalid member: empty",
in: "foo=,,bar=",
Expand Down Expand Up @@ -861,6 +875,10 @@ func TestMemberValidation(t *testing.T) {
m.hasData = true
assert.ErrorIs(t, m.validate(), errInvalidKey)

// Invalid UTF-8 in value
m.key, m.value = "k", string([]byte{255})
assert.ErrorIs(t, m.validate(), errInvalidValue)

m.key, m.value = "k", "\\"
assert.NoError(t, m.validate())
}
Expand All @@ -882,6 +900,11 @@ func TestNewMember(t *testing.T) {
}
assert.Equal(t, expected, m)

// wrong value with invalid token
val = ";"
_, err = NewMember(key, val, p)
assert.ErrorIs(t, err, errInvalidValue)

// wrong value with wrong decoding
val = "%zzzzz"
_, err = NewMember(key, val, p)
Expand Down Expand Up @@ -926,6 +949,31 @@ func TestNewMemberRaw(t *testing.T) {
assert.Equal(t, expected, m)
}

func TestBaggageUTF8(t *testing.T) {
testCases := map[string]string{
"ąść": "B% 💼",

// Case sensitive
"a": "a",
"A": "A",
}

var members []Member
for k, v := range testCases {
m, err := NewMemberRaw(k, v)
require.NoError(t, err)

members = append(members, m)
}

b, err := New(members...)
require.NoError(t, err)

for k, v := range testCases {
assert.Equal(t, v, b.Member(k).Value())
}
}

func TestPropertiesValidate(t *testing.T) {
p := properties{{}}
assert.ErrorIs(t, p.validate(), errInvalidKey)
Expand Down