Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

document how to remove port from the instance label #2296

Open
calestyo opened this issue Mar 15, 2023 · 8 comments
Open

document how to remove port from the instance label #2296

calestyo opened this issue Mar 15, 2023 · 8 comments

Comments

@calestyo
Copy link

Hey.

AFAICS the following is not yet "officially" documented (sorry in case I just missed it).

What many people seem to want (given the number of stackoverflow/etc. questions and blog entries about it) is to have their instance labels consisting of just the hostname without the port.

There many different (unofficial) FAQs, blog posts, etc. ... all giving some different description on how to do it, using relabel_configs.

One (nearly) particular good one is IMO https://stackoverflow.com/a/63414542/6646161 , which has the nice feature of not overriding any manually set "<labelname>": "<labelvalue>" like e.g. in a <file_sd_config> - assuming that if an admin explicitly set something there, it should be kept as is (including the :port, which should removed when it was just inherited from the host:port of the target setting.

However, IMO there's a problem with the above stackoverflow solution, namely that it would fail at least in the case when the instance label contains a ;, which could be the case again, when manually set.

I came up with my own solution, which I think should work even in that case:

    relabel_configs:
      - source_labels: [instance]
        target_label:  instance
        regex:         '(.+)'
        replacement:   '_${1}:0'
      - source_labels: [instance,__address__]
        #separator:   ;
        target_label:  instance
        regex:         '^;(.+)'
        replacement:   ';${1}'
      - source_labels: [instance]
        target_label:  instance
        regex:         '^.(.+):\d+$'
        replacement:   '${1}'
  • The first rule is only for the case when the instance label has been manually set by the config (and not via the target’s host:port) ... if so, it will be non-zero in length and .+ will match (otherwise it won't).

    It merely adds a leading _ (indicating that this manually set) and a dummy :0 port to be removed again later.
  • The second rule is only to bring in the __address__ and only matches if there is a leading ; (which is only the case if instance is empty - if not, there would be a leading _ from the first rule). It leaves the leading ; so that in both cases there's some leading character (either _ or ;).
  • The third rule strips the leading character (either _ or ;) and the port.

This strictly requires the value of separator to be ; so the above setting might be uncommented if this is added to the documentation, in case the default value should ever change.

Please feel free to improve that (or come up with something even better)... but in any case, I think it (or some better version) would be worth to be added to the official documentation at some place.

Thanks,
Chris.

@calestyo
Copy link
Author

Maybe the above regexps could be improved a bit further - not from a functional PoV but in terms of understanding:

AFAIU, in Prometheus regexps are always anchored (which doesn't matter above), but one could either remove all my anchors (which I think makes it less readable as in all other places where regexps are used they're typically not anchored)... or fully anchor them explicitly on both sides.

But again, as all the unanchored sides have (.+) it doesn’t really matter.

@candlerb
Copy link

candlerb commented Apr 28, 2023

I believe you're over-thinking this. Just pick some character for the separator which you know is never going to appear in the __address__, given that you know it has to contain either an IP address or a DNS name.

    relabel_configs:
      - source_labels: [instance, __address__]
        separator: '!'
        regex: '!([^!]+):(\d+)'
        target_label: instance
        replacement: '${1}'   # this is implicit default anyway

Space works fine, as indeed should the default semicolon (how many DNS names do you see that contain semicolons?). Otherwise there are plenty of non-printing and unicode symbols you could use.

However, if I were documenting this as best practice, I wouldn't even do that. I would simply show:

    relabel_configs:
      - source_labels: [__address__]
        regex: '(.+):(\d+)'
        target_label: instance
        replacement: '${1}'   # this is implicit default anyway

This is because IMO, best practice is that the instance label consistently comes either from the rewriting rules, or from per-target labels in service discovery. Therefore, if you're doing it in rewriting, it's OK to ignore any value set in service discovery.

In any case, setting the instance label in SD only works in limited cases (static and file SD); as soon as you upgrade to a more sophisticated SD mechanism, they won't directly set the 'instance' label so you have to do it in rewriting rules.

@candlerb
Copy link

Prometheus regexps are always anchored (which doesn't matter above), but one could either remove all my anchors (which I think makes it less readable as in all other places where regexps are used they're typically not anchored)... or fully anchor them explicitly on both sides.

In Prometheus the anchoring of regexps is pervasive, and in my opinion it can and should be relied on. For example, in PromQL you would idiomatically write

foo{instance=~"bar|baz"}

and not

foo{instance=~"^(bar|baz)$"}

@calestyo
Copy link
Author

I believe you're over-thinking this. Just pick some character for the separator which you know is never going to appear in the address, given that you know it has to contain either an IP address or a DNS name.

I don't think that this would really cover all thinkable cases, though.
instance, AFAIU, is only initialised with __address__ if not set, and only then one would have the guarantee that e.g. space isn't used as "literal" character.

But some awkward setups may simply always manually set instance to whichever value they like.

As for the anchoring,... as said, I'd be open to change that... though personally I still find it less easily readable without the anchors.

@candlerb
Copy link

instance, AFAIU, is only initialised with __address__ if not set

That's correct - and only after the target rewriting rules have run. So if instance was not set in the SD, and not set it target rewriting, then it's set to a copy of __address__.

and only then one would have the guarantee that e.g. space isn't used as "literal" character

Sorry, I don't understand that. __address__ as it comes from the service discovery should consist of either ipaddress:port or hostname:port, to be usable directly.

If it comes in any other form, then it needs rewriting to be applied to it anyway. For example, if SD returns hostname<space>ipaddress, then you can have rewriting rules to set "instance" to the part before the space, and "address" to the part after the space with ":port" appended.

But some awkward setups may simply always manually set instance to whichever value they like

They could, and it should work with the rules I gave. Can you give a concrete example where it would not?

@calestyo
Copy link
Author

Sorry, I don't understand that. address as it comes from the service discovery should consist of either ipaddress:port or hostname:port, to be usable directly.

Yes.

If it comes in any other form, then it needs rewriting to be applied to it anyway. For example, if SD returns hostnameipaddress, then you can have rewriting rules to set "instance" to the part before the space, and "address" to the part after the space with ":port" appended.

If anything would return this as __address__ I'd rather consider it even to be a bug.

They could, and it should work with the rules I gave. Can you give a concrete example where it would not?

As far as I understand it, you 2nd rule, will simply always take __address__, try to split it up and if that works, it would set instance to that part before the final colon, right?
So it would override instance even if it had already set some value, e.g. intentionally by other label rewriting or target labels.

IMO that shouldn't be done, because if a user intentionally set that (or intentionally accepts foreign labels) then these are likely the values that are truly wanted.

And as far as I understand your 1st rule, if some awkward setup intentionally set instance to !foo it would result in something like !foo!hostname:port, which, when matched against the regex !([^!]+):(\d+) wouldn't match at all as the first subexpression can only match foo but then it expects a : but finds the 2nd !.

@candlerb
Copy link

If anything would return this as __address__ I'd rather consider it even to be a bug.

static_configs and file_sd_configs can return whatever strings you like for the __address__. (http_sd_configs too, now I think of it). Example here of how it can be useful.

As far as I understand it, you 2nd rule, will simply always take __address__, try to split it up and if that works, it would set instance to that part before the final colon, right?
So it would override instance even if it had already set some value, e.g. intentionally by other label rewriting or target labels.

That's correct. As I said before, that case was for the case where instance is always being set by rewriting rules. The first, and slightly more complex rule, is if you want any instance label set by static_configs etc to take precedence. I consider it bad practice to "mix and match" like that: I think you should be consistent, and either set it in the SD data, or set it in rewriting.

And as far as I understand your 1st rule, if some awkward setup intentionally set instance to !foo it would result in something like !foo!hostname:port, which, when matched against the regex !([^!]+):(\d+) wouldn't match at all as the first subexpression can only match foo but then it expects a : but finds the 2nd !.

Well that's true, but it provides the exactly behaviour you wanted: if any instance label is present, with any value, then the regex does not match and the instance label is not overwritten.

It doesn't make any difference whether the existing value is instance="foo" or instance="!foo". Neither of these will match the regex, because of the inserted '!' separator

The value being matched is <instance>!<__address__> and it only matches if ! is in the first position and nowhere else, which can only be true if instance is unset.

@jdbarnes-isi
Copy link

i wish this was in the official example/config file, so it was more obvious that it is necessary if you dont want port numbers in your other consuming interfaces. cheers, and thanks for the thread, it helped me set a relabel stanza on every job in my config file.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants