-
Notifications
You must be signed in to change notification settings - Fork 451
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add documentation for helm memberlist cluster label migration #7961
Conversation
Signed-off-by: Jon Kartago Lamida <me@lamida.net>
weight: 110 | ||
--- | ||
|
||
# Introduction |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚫 [doc-validator] reported by reviewdog 🐶
The first heading 'Introduction' doesn't match the title 'Migrating Mimir Helm Installation to Use Memberlist Cluster Label'.
Decide which of the first heading or title is most applicable and change the other to match, or update them both.
|
||
The migration should take around 30 minutes. The risk of not doing this migration if you have Mimir and Tempo or Loki in the same cluster is the possibility of those different backend to talk with each other. | ||
|
||
# How The Migration Solve The Issue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚫 [doc-validator] reported by reviewdog 🐶
The first heading 'How The Migration Solve The Issue' doesn't match the title 'Migrating Mimir Helm Installation to Use Memberlist Cluster Label'.
Decide which of the first heading or title is most applicable and change the other to match, or update them both.
|
||
Cluster label will prevent such situation by only allowing communication for components that has same cluster label. Once that is enabled, before memberlist try to communicate with other components, it will verify whether that new components are having same cluster label and only allowing memberlist to communicate if the parties are having the same cluster label. | ||
|
||
# Migration |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚫 [doc-validator] reported by reviewdog 🐶
The first heading 'Migration' doesn't match the title 'Migrating Mimir Helm Installation to Use Memberlist Cluster Label'.
Decide which of the first heading or title is most applicable and change the other to match, or update them both.
|
||
Apply the configuration changes by running `helm upgrade mimir-distributed -f values.yaml`. | ||
|
||
# 4. Verifying The Migration |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚫 [doc-validator] reported by reviewdog 🐶
The first heading '4. Verifying The Migration' doesn't match the title 'Migrating Mimir Helm Installation to Use Memberlist Cluster Label'.
Decide which of the first heading or title is most applicable and change the other to match, or update them both.
Signed-off-by: Jon Kartago Lamida <me@lamida.net>
Signed-off-by: Jon Kartago Lamida <me@lamida.net>
Signed-off-by: Jon Kartago Lamida <me@lamida.net>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems more like this is an important configuration step rather than a migration task. Perhaps the title should be "Configure a unique cluster label"?
This looks like a task topic, there's some useful information for how to structure a task topic in https://grafana.com/docs/writers-toolkit/structure/topic-types/task/.
I haven't reviewed the prose of the PR yet, I think we should consider the name of the file and get the structure more aligned with the guidance in https://grafana.com/docs/writers-toolkit/structure/topic-types/task/ and then I can provide a style review for the wording.
@@ -0,0 +1,96 @@ | |||
--- | |||
description: "Migrating Mimir Helm Installation to Use Memberlist Cluster Label" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the description should provide more information on what the reader can expect to get from this page.
The description is presented to the user in search engines and on social media when links to this page are used.
A good example is https://github.com/grafana/mimir/blob/main/docs/sources/mimir/set-up/migrate/migrate-from-cortex.md?plain=1#L4-L7
https://grafana.com/docs/writers-toolkit/write/front-matter/#description-required
title: "Migrating Mimir Helm Installation to Use Memberlist Cluster Label" | ||
menuTitle: "Migrating Mimir Helm Installation to Use Memberlist Cluster Label" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Titles and headings should use sentence case rather than title case.
We should also avoid gerunds ("ing" endings in verbs).
Titles should also use the full product name "Grafana Mimir" for search engine optimization.
I have corrected this as an example, but please see my general review comment because we may want to rename this page.
title: "Migrating Mimir Helm Installation to Use Memberlist Cluster Label" | |
menuTitle: "Migrating Mimir Helm Installation to Use Memberlist Cluster Label" | |
title: "Migrate Grafana Mimir Helm chart to use memberlist cluster label" |
Also the menuTitle
can be omitted if it is the same as the title.
In this case, I think we need a much shorter title that omits the product name.
Considering a page rename may make this menuTitle
much shorter, perhaps Configure a memberlist cluster label
?
# Migrating Mimir Helm Installation to Use Memberlist Cluster Label | ||
|
||
## Introduction |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general, we don't want stacked headings (headings with no content between them).
The introductory content can be moved out of the H2 and become the conceptual material at the start of this page.
{{% docs/reference %}} | ||
[memberlist]: "/ -> /docs/mimir/<MIMIR_DOCS_VERSION>/references/architecture/memberlist-and-the-gossip-protocol" | ||
[hash ring]: "/ -> /docs/mimir/<MIMIR_DOCS_VERSION>/references/architecture/hash-ring" | ||
{{% /docs/reference %}} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think there is any need to use docs/reference
here because the links are the same and this content isn't reused via Hugo mounts as far as I know. You only need to use that shortcode for content that is used in multiple website projects like being both in Grafana and Grafana Cloud documentation.
Instead, you can just use URLs with <MIMIR_DOCS_VERSION>
for version substitution.
Grafana Mimir uses [memberlist] to share works and deciding which component replica to send the workload such as when ingesting series. Memberlist encodes the replica information in data structure called [hash ring], this is a consistent hashing scheme in which different ingesters instance tokens are placed around the ring. The token position in the ring determines which ingester should handle a request. The information of this hash ring is delivered to different components by using gossip protocol. | ||
|
||
By default, hash ring is global. If we also have Grafana Tempo and Grafana Loki in the same Kubernetes cluster, their components can unintentionally talk and sending data with each other, because Loki and Tempo also uses memberlist. This possibility to happen increases in cluster where the pods may reuse each other's IPs after churning. | ||
|
||
In this document we will describe the step on how to migrate a Mimir installation to prevent two separate memberlist gossip hash ring to join into one. | ||
|
||
There are three steps of the migration which we will describe in details in the migration section. But in brief summary the steps are: | ||
|
||
1. Disable memberlist cluster label verification | ||
1. Set cluster label to all Mimir components | ||
1. Enable memberlist cluster label verification again | ||
|
||
The migration should take around 30 minutes. The risk of not doing this migration if you have Mimir and Tempo or Loki in the same cluster is the possibility of those different backend to talk with each other. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the second sentence on L24 is the most important bit of this introduction. A reader wants to know why they should perform this task and they may not really care about the conceptual information on L12.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed many repetition and make a more concise introduction in the beginning of the section.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
overall I think this article conflates memberlist (which is a cluster membership protocol) with our key-value implementation on top of memberlist. It's not a huge deal for the purpose of the article, but it would be nice to have it correct
...harts/mimir-distributed/migration-guides/migrate-enable-cluster-label-verification/_index.md
Outdated
Show resolved
Hide resolved
...harts/mimir-distributed/migration-guides/migrate-enable-cluster-label-verification/_index.md
Outdated
Show resolved
Hide resolved
...harts/mimir-distributed/migration-guides/migrate-enable-cluster-label-verification/_index.md
Outdated
Show resolved
Hide resolved
...harts/mimir-distributed/migration-guides/migrate-enable-cluster-label-verification/_index.md
Outdated
Show resolved
Hide resolved
...harts/mimir-distributed/migration-guides/migrate-enable-cluster-label-verification/_index.md
Outdated
Show resolved
Hide resolved
...harts/mimir-distributed/migration-guides/migrate-enable-cluster-label-verification/_index.md
Outdated
Show resolved
Hide resolved
...harts/mimir-distributed/migration-guides/migrate-enable-cluster-label-verification/_index.md
Outdated
Show resolved
Hide resolved
...harts/mimir-distributed/migration-guides/migrate-enable-cluster-label-verification/_index.md
Outdated
Show resolved
Hide resolved
...harts/mimir-distributed/migration-guides/migrate-enable-cluster-label-verification/_index.md
Outdated
Show resolved
Hide resolved
Signed-off-by: Jon Kartago Lamida <me@lamida.net>
Signed-off-by: Jon Kartago Lamida <me@lamida.net>
Signed-off-by: Jon Kartago Lamida <me@lamida.net>
Signed-off-by: Jon Kartago Lamida <me@lamida.net>
Signed-off-by: Jon Kartago Lamida <me@lamida.net>
Signed-off-by: Jon Kartago Lamida <me@lamida.net>
Signed-off-by: Jon Kartago Lamida <me@lamida.net>
Signed-off-by: Jon Kartago Lamida <me@lamida.net>
Signed-off-by: Jon Kartago Lamida <me@lamida.net>
Signed-off-by: Jon Kartago Lamida <me@lamida.net>
weight: 110 | ||
--- | ||
|
||
# Configure a unique Grafana Mimir's Memberlist cluster label in helm installation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚫 [doc-validator] reported by reviewdog 🐶
The first heading 'Configure a unique Grafana Mimir's Memberlist cluster label in helm installation' doesn't match the title 'Configure a unique Grafana Mimir's Memberlist cluster label in the mimir-distributed Helm chart installation'.
Decide which of the first heading or title is most applicable and change the other to match, or update them both.
Signed-off-by: Jon Kartago Lamida <me@lamida.net>
I was not fully aware on the writers-toolkit before this. It's really useful and I will refer to it next time working on other documentation. I have updated the structure slightly and hopefully matched the task topic structure. But please advise again if this is still not enough. |
Signed-off-by: Jon Kartago Lamida <me@lamida.net>
I think it is important to make it correct. Thanks for highlighting. I rewrote and shortened most of the section, hopefully, now the information is more accurate. |
Signed-off-by: Jon Kartago Lamida <me@lamida.net>
I haven't tested step by step of configuration update. I will do it tomorrow and adjust the step accordingly if find out something is not accurate after the test. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
leaving mostly wording suggestions
Multiple [Memberlist](https://grafana.com/docs/mimir/<MIMIR_VERSION>/references/architecture/memberlist-and-the-gossip-protocol/) [gossip ring](https://grafana.com/docs/mimir/<MIMIR_VERSION>/references/architecture/hash-ring/) cluster is at risk of merging into one without enabling cluster label verification. | ||
For example, if a Mimir, Tempo or Loki are running in the same Kubernetes cluster, they might communicate with each other without this configuration update. | ||
Once cluster label verification is enabled, before Mimir components communicate with other components, it will verify whether those other components are having same cluster label. | ||
The process to update the configuration should take around 30 minutes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it can be more than 30 minutes for a large cluster - 3 rollouts 1hr each can span 3hrs. Should we just say that it will take 3 rollouts of the whole cluster?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this make more sense. Even 30 minutes that I put there was also quite abrupt.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
did you mean to apply this change to the doc?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Strange. I remember that I changed this as what was suggested. Maybe it got stashed in between of other commits. I confirm the commit this time in cf11b13.
...harts/mimir-distributed/migration-guides/migrate-enable-cluster-label-verification/_index.md
Outdated
Show resolved
Hide resolved
...harts/mimir-distributed/migration-guides/migrate-enable-cluster-label-verification/_index.md
Outdated
Show resolved
Hide resolved
...harts/mimir-distributed/migration-guides/migrate-enable-cluster-label-verification/_index.md
Outdated
Show resolved
Hide resolved
...harts/mimir-distributed/migration-guides/migrate-enable-cluster-label-verification/_index.md
Outdated
Show resolved
Hide resolved
...harts/mimir-distributed/migration-guides/migrate-enable-cluster-label-verification/_index.md
Outdated
Show resolved
Hide resolved
...harts/mimir-distributed/migration-guides/migrate-enable-cluster-label-verification/_index.md
Outdated
Show resolved
Hide resolved
...harts/mimir-distributed/migration-guides/migrate-enable-cluster-label-verification/_index.md
Outdated
Show resolved
Hide resolved
...harts/mimir-distributed/migration-guides/migrate-enable-cluster-label-verification/_index.md
Outdated
Show resolved
Hide resolved
...harts/mimir-distributed/migration-guides/migrate-enable-cluster-label-verification/_index.md
Outdated
Show resolved
Hide resolved
…grate-enable-cluster-label-verification/_index.md Co-authored-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
…grate-enable-cluster-label-verification/_index.md Co-authored-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
…grate-enable-cluster-label-verification/_index.md Co-authored-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
…grate-enable-cluster-label-verification/_index.md Co-authored-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
…grate-enable-cluster-label-verification/_index.md Co-authored-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
…grate-enable-cluster-label-verification/_index.md Co-authored-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
…grate-enable-cluster-label-verification/_index.md Co-authored-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
…grate-enable-cluster-label-verification/_index.md Co-authored-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
…grate-enable-cluster-label-verification/_index.md Co-authored-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
…grate-enable-cluster-label-verification/_index.md Co-authored-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
…grate-enable-cluster-label-verification/_index.md Co-authored-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
Signed-off-by: Jon Kartago Lamida <me@lamida.net>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Nice work and thanks for addressing my comments!
@jdbaldry do you want to take another look before we merge this? Or should we ask the tech writer starting soon to take a look?
...harts/mimir-distributed/migration-guides/migrate-enable-cluster-label-verification/_index.md
Outdated
Show resolved
Hide resolved
...harts/mimir-distributed/migration-guides/migrate-enable-cluster-label-verification/_index.md
Outdated
Show resolved
Hide resolved
...harts/mimir-distributed/migration-guides/migrate-enable-cluster-label-verification/_index.md
Outdated
Show resolved
Hide resolved
…grate-enable-cluster-label-verification/_index.md Co-authored-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
…grate-enable-cluster-label-verification/_index.md Co-authored-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
I did one round of testing running the steps from the documentation. I also ran the test when we updated the cluster label directly without following the three steps in the doc. Updating the cluster label directly will create a temporary Memberlist partition where old pods will run in the old Memberlist cluster and can't join the new Memberlist cluster with the new cluster label. Eventually, the old Memberlist cluster will be gone and all components will join the new Memberlist cluster. Nevertheless, I guess the goal of following the steps in the doc is to avoid that inconsistency when temporarily different mimir components are running in the different Memberlist clusters. |
…grate-enable-cluster-label-verification/_index.md Co-authored-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
Signed-off-by: Jon Kartago Lamida <me@lamida.net>
Thanks for the revision here. I think it looks a lot better with the changes you have made and we can definitely get this in. After the new writer has ramped up, they can come back to this and do a closer pass for style conformity :) |
sounds good; @lamida should we merge this in then? |
@dimitarvdimitrov actually I am under impression that @jdbaldry wanted me to wait for the new writer to review this first before merging. 🤔 |
Sorry for the confusion. I meant that you can merge this as it is and the new writer can revise this later as they become more acquainted with the project |
What this PR does
Add documentation to migrate mimir helm installation to enable cluster label verification. The rendered doc is here.
Which issue(s) this PR fixes or relates to
Fixes #2865
Checklist
CHANGELOG.md
updated - the order of entries should be[CHANGE]
,[FEATURE]
,[ENHANCEMENT]
,[BUGFIX]
.about-versioning.md
updated with experimental features.