
Plugin manager doesn't handle case where charm deployed with config options set #280

Open
Mehdi-Bendriss opened this issue May 1, 2024 · 6 comments · May be fixed by #282
Labels
bug Something isn't working

Comments

@Mehdi-Bendriss
Contributor

Mehdi-Bendriss commented May 1, 2024

The charm should handle, from the get-go, being deployed with config options set and without the opensearch service being up.

Mehdi-Bendriss added the bug label May 1, 2024

@phvalguima
Contributor

@Mehdi-Bendriss indeed, I've opened a PR to shuffle the checks in the plugin manager.

@phvalguima
Contributor

Did some digging, @Mehdi-Bendriss.

Back in the day, the decision to wait for the OpenSearch cluster was made because we needed: (1) to cover the case where some plugins manage things via API calls; (2) to know the opensearch version; and (3) to load the default settings from /_cluster/_settings. So we needed the cluster up and running.

I am breaking this up into:

  1. Any plugin that needs to manage things via API calls should check the health of the cluster using check_plugin_manager_health
  2. Moving opensearch_distro.version to read the workload_version file we already ship, instead of making an API call (see the sketch below)
  3. Waiving the need to load the default settings if this particular unit is powered down: that makes sense, since at this point we can make any config changes and the unit will eventually be powered back up later

That still frees config_changed to just call plugin_manager.run() before everything is set, since the run() method only changes hard configuration.
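
A minimal sketch of item (2), assuming the workload_version file sits in the charm root directory (the exact location and the helper name are illustrative, not the charm's actual code):

```python
import os
from pathlib import Path


def local_workload_version() -> str:
    """Read the OpenSearch version from the workload_version file packed with
    the charm, so the version is known even when the service is not running."""
    charm_dir = Path(os.environ.get("JUJU_CHARM_DIR", "."))
    return (charm_dir / "workload_version").read_text().strip()
```

opensearch_distro.version could then fall back to the API only when the file is missing.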

@phvalguima
Contributor

That is going to demand some attention on OpenSearchKeystore. We now have to account for the case where the keystore has not yet been created because the first start has not finished. In that case, we need to save the keystore password for later and pass it to opensearch at startup time, instead of leaving opensearch to manage it.
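
A rough sketch of that idea, with hypothetical names (`_pending`, `apply_pending`, `_keystore_cli`) and an assumed keystore path; in the real charm the pending entries would have to live in the peer databag or a secret rather than in memory so they survive across hooks:

```python
import os


class OpenSearchKeystore:
    KEYSTORE_PATH = "/usr/share/opensearch/config/opensearch.keystore"  # assumed location

    def __init__(self):
        self._pending = {}  # entries requested before the keystore exists

    def add(self, key: str, value: str) -> None:
        if not os.path.exists(self.KEYSTORE_PATH):
            # First start has not finished, so the keystore is not there yet:
            # keep the secret and replay it at startup time.
            self._pending[key] = value
            return
        self._keystore_cli("add", key, value)

    def apply_pending(self) -> None:
        # Called right before starting opensearch, once the keystore exists.
        for key, value in self._pending.items():
            self._keystore_cli("add", key, value)
        self._pending.clear()

    def _keystore_cli(self, *args: str) -> None:
        ...  # wraps the opensearch-keystore binary
```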

@phvalguima
Contributor

Whilst I agree we should handle the plugin_manager configuration independently of whether the cluster is ready, I did quite some digging into this issue and found we are stuck in an endless loop: an opensearch-peers-changed hook is issued > it runs the deferred config-changed > which changes the content of the peer databag > config-changed gets deferred again > the databag change retriggers a new peers-changed.

[screenshot attached]
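
One generic way to break a loop like this, regardless of which field changes, is to make databag writes idempotent, i.e. only write when the serialized value actually differs. A hedged sketch (`put_if_changed` is not an existing helper in the charm):

```python
def put_if_changed(databag, key: str, value: str) -> bool:
    """Write value under key only if it differs from what is stored;
    return True when a write (and hence a relation-changed) happened."""
    if databag.get(key) == value:
        return False
    databag[key] = value
    return True
```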

@phvalguima
Contributor

Now, this is caused by a change that happens within opensearch_peers_relation_changed. At the deferred config-changed, the peer databag starts with:

'deployment-description': '{"config": {"cluster_name": "backup-test", "init_hold": false, "roles": ["cluster_manager"], "data_temperature": null}, "start": "start-with-provided-roles", "pending_directives": [], "typ": "main-orchestrator", "app": "main", "state": {"value": "active", "message": ""}, "promotion_time": 1714724984.931757}'

And finishes with:

'deployment-description': '{"config": {"cluster_name": "backup-test", "init_hold": false, "roles": ["cluster_manager"], "data_temperature": null}, "start": "start-with-provided-roles", "pending_directives": [], "typ": "main-orchestrator", "app": "main", "state": {"value": "active", "message": ""}, "promotion_time": 1714724436.7171}'

That is caused by this:

values["promotion_time"] = datetime.now().timestamp()

Which wrongly resets the promotion_time every time the description is rebuilt from the databag.
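
A minimal, self-contained sketch (not the charm's actual models.py) of how the validator could keep an existing promotion_time instead of re-stamping it; it assumes pydantic v1 and a post root validator, as the stack trace suggests:

```python
from datetime import datetime
from enum import Enum
from typing import Optional

from pydantic import BaseModel, root_validator


class DeploymentType(str, Enum):
    MAIN_ORCHESTRATOR = "main-orchestrator"


class DeploymentDescription(BaseModel):
    typ: DeploymentType
    promotion_time: Optional[float] = None

    @root_validator
    def set_promotion_time(cls, values):
        # Only stamp the promotion time when it is not already set, so a
        # description round-tripped through the peer databag keeps its value.
        if values.get("promotion_time") is None and values.get("typ") == DeploymentType.MAIN_ORCHESTRATOR:
            values["promotion_time"] = datetime.now().timestamp()
        return values
```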

Full stack trace:

  /var/lib/juju/agents/unit-main-0/charm/src/charm.py(264)<module>()
-> main(OpenSearchOperatorCharm)
  /var/lib/juju/agents/unit-main-0/charm/venv/ops/main.py(544)main()
-> manager.run()
  /var/lib/juju/agents/unit-main-0/charm/venv/ops/main.py(520)run()
-> self._emit()
  /var/lib/juju/agents/unit-main-0/charm/venv/ops/main.py(506)_emit()
-> self.framework.reemit()
  /var/lib/juju/agents/unit-main-0/charm/venv/ops/framework.py(859)reemit()
-> self._reemit()
  /var/lib/juju/agents/unit-main-0/charm/venv/ops/framework.py(939)_reemit()
-> custom_handler(event)
  /var/lib/juju/agents/unit-main-0/charm/lib/charms/opensearch/v0/opensearch_base_charm.py(617)_on_config_changed()
-> previous_deployment_desc = self.opensearch_peer_cm.deployment_desc()
  /var/lib/juju/agents/unit-main-0/charm/lib/charms/opensearch/v0/opensearch_peer_clusters.py(323)deployment_desc()
-> return DeploymentDescription.from_dict(current_deployment_desc)
  /var/lib/juju/agents/unit-main-0/charm/lib/charms/opensearch/v0/models.py(39)from_dict()
-> return cls(**input_dict)
  /var/lib/juju/agents/unit-main-0/charm/venv/pydantic/main.py(339)__init__()
-> values, fields_set, validation_error = validate_model(__pydantic_self__.__class__, data)
  /var/lib/juju/agents/unit-main-0/charm/venv/pydantic/main.py(1100)validate_model()
-> values = validator(cls_, values)
> /var/lib/juju/agents/unit-main-0/charm/lib/charms/opensearch/v0/models.py(202)set_promotion_time()
-> if values["typ"] == DeploymentType.MAIN_ORCHESTRATOR:
