Node Tolerations and Node Selector on components #52

Open
mitchellmaler opened this issue Oct 2, 2020 · 13 comments
Labels
bug Something isn't working

Comments

@mitchellmaler

Is your feature request related to a problem? Please describe.
Currently we are trying to deploy the compactor to a specific set of nodes with extra disk space. We added workloadOverrides with nodeSelector and tolerations, but they do not seem to be picked up. It's also possible that we are using the overrides incorrectly, since the documentation on how to use them is limited.

compactor:
  workloadOverrides:
    tolerations:
      - effect: NoSchedule
        key: dedicated
        value: prometheus
    nodeSelector:
      dedicated: prometheus

Describe the solution you'd like to see
Either have specific fields to add these to the components (compactor and others) or allow the overrides to pull them in.

Describe alternatives you've considered
Patching the deployment after the operator creates it.
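
For reference, a rough sketch of that workaround as a strategic merge patch. The file name compactor-patch.yaml is hypothetical, and the name of the compactor Deployment that the operator generates is not stated here, so it would have to be looked up first. The operator may also revert the change the next time it reconciles, so this is a stopgap at best.

```yaml
# compactor-patch.yaml (hypothetical file name)
# Adds the scheduling constraints to the pod template of an existing Deployment.
spec:
  template:
    spec:
      nodeSelector:
        dedicated: prometheus
      tolerations:
        - key: dedicated
          operator: Equal   # Kubernetes default when operator is omitted
          value: prometheus
          effect: NoSchedule
```

On kubectl versions that support the --patch-file flag, this could be applied with something like `kubectl patch deployment <compactor-deployment> --patch-file compactor-patch.yaml`, where `<compactor-deployment>` is a placeholder.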

@tarokkk
Contributor

tarokkk commented Oct 2, 2020

The configuration looks good; this is possibly a bug.

tarokkk added the bug label Oct 2, 2020
@mitchellmaler
Author

We are running operator version 0.1.0.

Here is our ObjectStore manifest:

apiVersion: monitoring.banzaicloud.io/v1alpha1
kind: ObjectStore
metadata:
  name: thanos
  namespace: telemetry-system
spec:
  compactor:
    workloadOverrides:
      spec:
        nodeSelector:
          dedicated: prometheus
        tolerations:
        - effect: NoSchedule
          key: dedicated
          value: prometheus
  config:
    mountFrom:
      secretKeyRef:
        key: thanos.yaml
        name: thanos-objstore-config

@Ultrafenrir
Contributor

+1
The same happens with 0.1.1.

This is fairly critical for us; a fix would be much appreciated.

@Ultrafenrir
Contributor

Ultrafenrir commented Oct 25, 2020

@mitchellmaler I looked into the code and found the problem. The config below works fine for all resources:

apiVersion: monitoring.banzaicloud.io/v1alpha1
kind: ObjectStore
metadata:
  name: secret
spec:
  config:
    mountFrom:
      secretKeyRef:
        name: secret
        key: object-store.yaml
  compactor:
    workloadOverrides:
      tolerations:
        - effect: NoSchedule
          key: dedicated
          value: prometheus
      nodeSelector:
        dedicated: prometheus

Please check on your side, and then we can close the issue :)

@kody-abe

kody-abe commented Feb 16, 2021

+1 This is also an issue for me. I tried what @Ultrafenrir posted and it did not work with operator version 0.1.3.

I saw that this had changed to deploymentOverrides, which did not work either.

apiVersion: monitoring.banzaicloud.io/v1alpha1
kind: ObjectStore
spec:
  bucketWeb:
    deploymentOverrides:
      spec:
        template:
          spec:
            nodeSelector:
              stack: system
  compactor:
    deploymentOverrides:
      spec:
        template:
          spec:
            nodeSelector:
              stack: system
  config:
    mountFrom:
      secretKeyRef:
        key: object-store.yaml
        name: thanos-s3-configuration

@kody-abe

@tarokkk Any thoughts on this? Unfortunately, this is blocking us, and I might have to build this out a different way.

@tarokkk
Contributor

tarokkk commented Feb 22, 2021

This week we will release completely rewritten override functionality, which should solve this issue as well.

@pepov
Contributor

pepov commented Feb 22, 2021

I'm testing this scenario with the latest override mechanism introduced in master.

@pepov
Contributor

pepov commented Feb 22, 2021

@kody-abe master works fine for me with the above config. Could you please give it a try in a dev/staging environment with the current master image?

@kody-abe

@pepov Can confirm, master works as expected with the above config! Looking forward to the release.

@kody-abe

@pepov Spoke too soon. This did not work for rule or storeGateway in the Thanos object, but it did work for query.

# thanos.monitoring.banzaicloud.io "thanos" was not valid:
# * <nil>: Invalid value: "The edited file failed validation": [ValidationError(Thanos.spec.rule): unknown field "deploymentOverrides" in io.banzaicloud.monitoring.v1alpha1.Thanos.spec.rule, ValidationError(Thanos.spec.storeGateway.deploymentOverrides): unknown field "spec" in io.banzaicloud.monitoring.v1alpha1.Thanos.spec.storeGateway.deploymentOverrides]
apiVersion: monitoring.banzaicloud.io/v1alpha1
kind: Thanos
metadata:
  name: thanos
spec:
  query:
    deploymentOverrides:
      spec:
        template:
          spec:
            nodeSelector:
              stack: system
  rule:
    deploymentOverrides:
      spec:
        template:
          spec:
            nodeSelector:
              stack: system
  storeGateway:
    deploymentOverrides:
      spec:
        template:
          spec:
            nodeSelector:
              stack: system

@pepov
Contributor

pepov commented Feb 24, 2021

For this to work, you have to use the latest CRDs from master. Also note that rule does not have a deploymentOverrides field but a statefulsetOverrides field.
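
A sketch of the manifest above adjusted accordingly. It assumes that statefulsetOverrides accepts the same spec.template.spec nesting as deploymentOverrides, which should be checked against the CRDs on master before relying on it:

```yaml
apiVersion: monitoring.banzaicloud.io/v1alpha1
kind: Thanos
metadata:
  name: thanos
spec:
  query:
    deploymentOverrides:
      spec:
        template:
          spec:
            nodeSelector:
              stack: system
  rule:
    # rule exposes statefulsetOverrides instead of deploymentOverrides
    statefulsetOverrides:
      spec:
        template:
          spec:
            nodeSelector:
              stack: system
  storeGateway:
    # Unchanged; the 'unknown field "spec"' error should go away
    # once the CRDs from master are installed (assumption based on the note above)
    deploymentOverrides:
      spec:
        template:
          spec:
            nodeSelector:
              stack: system
```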

@kody-abe

Yes! That was the ticket. This now works as expected for all types. Thanks @pepov! Looking forward to the release.
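
For anyone arriving here with the original compactor use case, a sketch of how the nodeSelector and tolerations from the first comment might look in the new format. It assumes that tolerations are accepted next to nodeSelector under spec.template.spec, as in a regular Kubernetes pod spec; only nodeSelector is confirmed in this thread.

```yaml
apiVersion: monitoring.banzaicloud.io/v1alpha1
kind: ObjectStore
metadata:
  name: thanos
  namespace: telemetry-system
spec:
  compactor:
    deploymentOverrides:
      spec:
        template:
          spec:
            nodeSelector:
              dedicated: prometheus
            tolerations:
              - key: dedicated
                operator: Equal   # Kubernetes default when operator is omitted
                value: prometheus
                effect: NoSchedule
  config:
    mountFrom:
      secretKeyRef:
        key: thanos.yaml
        name: thanos-objstore-config
```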

pepov removed their assignment May 13, 2024