Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UTF-8: Add partial support for parsing UTF-8 metric and label names #13271

Merged
merged 1 commit into from
Feb 16, 2024

Conversation

ywwg
Copy link
Contributor

@ywwg ywwg commented Dec 8, 2023

This PR adds support for the new grammar of {"metric_name", "l1"="val"} to promql and some of the exposition formats. This grammar will also be valid for non-UTF-8 names. UTF-8 names will not be considered valid unless model.NameValidationScheme is changed.

This PR does not update the go expfmt parser in text_parse.go, which will be addressed by prometheus/common#554.

Correction, this PR does update the promql parser.

Part of #13095

@ywwg ywwg added this to the OTEL Support milestone Dec 8, 2023
@ywwg ywwg changed the title UTF-8: Add support for parsing UTF8 metric and label names UTF-8: Add partial support for parsing UTF8 metric and label names Feb 6, 2024
@ywwg ywwg marked this pull request as ready for review February 6, 2024 19:30
@ywwg ywwg requested a review from beorn7 February 6, 2024 19:34
Copy link
Member

@beorn7 beorn7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally looks good. Mostly nitpicking in the comment and requesting more test cases.

I don't really feel I'm the right person to review this because I am not familiar at all with the generated parser. I might also ask dumb questions in my comments.

Let's try to find a reviewer that knows those things better.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Warning: I'm not familiar with the lexer syntax. I'll just rubberstamp this one if the results look OK to me. Experts, please chime in in time to stop me.

{"http.status",q="0.9",a="b"} 8.3835e-05
{"http.status",q="0.9",a="b"} 8.3835e-05
{q="0.9","http.status",a="b"} 8.3835e-05
{"go.gc_duration_seconds_sum"} 0.004304266`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a few test cases with "weirder" characters:

  • Multibyte characters like "ö" or "€".
  • Whitespace in the name.
  • An escaped " in the name.
  • { and } as part of the name.

Also, what about quoted label names? Including label names with all the weird characters above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oo, found some bugs, thanks :)

model/textparse/promparse.go Outdated Show resolved Hide resolved
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same flag here as for the OpenMetrics lexer file: I don't really know the lexer syntax.

model/textparse/promparse.go Outdated Show resolved Hide resolved
model/textparse/promparse.go Show resolved Hide resolved
model/textparse/promparse.go Outdated Show resolved Hide resolved
@@ -476,3 +519,11 @@ func (p *OpenMetricsParser) getFloatValue(t token, after string) (float64, error
}
return val, nil
}

func (p *OpenMetricsParser) verifyMetricName(start, end int) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question here as below for the corresponding method in promparse.go.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apparently you removed verifyMetricName from promparse.go, but left it in here. But this should go, too, shouldn't it?

model/textparse/promparse_test.go Outdated Show resolved Hide resolved
scrape/scrape_test.go Show resolved Hide resolved
@beorn7
Copy link
Member

beorn7 commented Feb 8, 2024

I believe the Go tests should pass if you run go mod tidy.

@beorn7
Copy link
Member

beorn7 commented Feb 8, 2024

This PR does not update PromQL parsing

It does now, doesn't it?

However, the frontend cod isn't updated yet, so while things work, the UI is complaining passive-aggressively with squiggly lines:

image

@ywwg
Copy link
Contributor Author

ywwg commented Feb 8, 2024

It does now, doesn't it?

fixed

@ywwg
Copy link
Contributor Author

ywwg commented Feb 8, 2024

yes, another parser that will have to be fixed is the typescript one -- there is a task for that but I will create a proper issue to raise visibilitiy

@ywwg ywwg changed the title UTF-8: Add partial support for parsing UTF8 metric and label names UTF-8: Add partial support for parsing UTF-8 metric and label names Feb 8, 2024
@@ -582,7 +582,11 @@ label_match_list: label_match_list COMMA label_matcher
;

label_matcher : IDENTIFIER match_op STRING
{ $$ = yylex.(*parser).newLabelMatcher($1, $2, $3); }
{ $$ = yylex.(*parser).newLabelMatcher($1, $2, $3); }
| string_identifier match_op STRING
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need string_identifier match_op error rule? or is the parser error useful without it already?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would that be like
{"x"=} ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok yes, adding that makes the error a little better

@ywwg
Copy link
Contributor Author

ywwg commented Feb 13, 2024

test failure is I think unrelated but I'll dig in

@ywwg
Copy link
Contributor Author

ywwg commented Feb 13, 2024

Hm this test failure is a bad sign -- there have been multiple places where people are testing the value of the negotiated format against the string constants. This is Bad because there are now many different ways the formats can look and be equivalent. How do we prevent people from doing these bad string comparisons? make the string constants non-exported?

@ywwg
Copy link
Contributor Author

ywwg commented Feb 13, 2024

fuzz error looks concerning, I'm trying to figure out how to run that test myself

@ywwg
Copy link
Contributor Author

ywwg commented Feb 14, 2024

I was able to debug the fuzz error, fixing

@ywwg
Copy link
Contributor Author

ywwg commented Feb 14, 2024

got it

Copy link
Member

@beorn7 beorn7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good now with fixed fuzzing and extended tests, as far as I can say.

See comments for minor things, otherwise I'd say this is good to go. Last chance for others to chime in!

go.mod Outdated
@@ -52,7 +52,7 @@ require (
github.com/prometheus/alertmanager v0.26.0
github.com/prometheus/client_golang v1.18.0
github.com/prometheus/client_model v0.5.0
github.com/prometheus/common v0.46.0
github.com/prometheus/common v0.46.1-0.20240206181200-773d5664eb8d
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's not pin to specific commits here. I have just tagged 0.47.0 in common, so you can use it now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see the tag on github, do you need to push it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have pushed it an hour ago, and even created a nice release.

Maybe Github is once more very eventually consistent?
https://github.com/prometheus/common/releases/tag/0.47.0

@@ -476,3 +519,11 @@ func (p *OpenMetricsParser) getFloatValue(t token, after string) (float64, error
}
return val, nil
}

func (p *OpenMetricsParser) verifyMetricName(start, end int) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apparently you removed verifyMetricName from promparse.go, but left it in here. But this should go, too, shouldn't it?

@ywwg
Copy link
Contributor Author

ywwg commented Feb 15, 2024

Apparently you removed verifyMetricName from promparse.go, but left it in here. But this should go, too, shouldn't it?

ah yes sorry

@beorn7
Copy link
Member

beorn7 commented Feb 15, 2024

Thank you. I think you can do the squashing and DCO'ing now, and then we can merge (barring any new objections from others).

func TestEvaluations(t *testing.T) {
RunBuiltinTests(t, newTestEngine())
}
// func TestEvaluations(t *testing.T) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops

This adds support for the new grammar of `{"metric_name", "l1"="val"}` to promql and some of the exposition formats.
This grammar will also be valid for non-UTF-8 names.
UTF-8 names will not be considered valid unless model.NameValidationScheme is changed.

This does not update the go expfmt parser in text_parse.go, which will be addressed by prometheus/common#554.

Part of prometheus#13095

Signed-off-by: Owen Williams <owen.williams@grafana.com>
@beorn7 beorn7 merged commit ac10cd4 into prometheus:main Feb 16, 2024
24 checks passed
@beorn7
Copy link
Member

beorn7 commented Feb 16, 2024

🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

None yet

3 participants