Ansible based operator templating failed when CR contains complex error status #4721

Closed
sundag opened this issue Apr 7, 2021 · 1 comment
Labels
language/ansible Issue is related to an Ansible operator project

sundag commented Apr 7, 2021

Bug Report

What did you do?

We created an Ansible-based operator to deploy our solution. We use the original CR values in our Ansible templates, in expressions like the one below:

{{ _icp4a_ibm_com_icp4acluster.spec.bai_configuration is defined or baml_bai_in_optional_components }}

That works well for a newly applied CR with a simple status field. It breaks when the status contains complex error information that also embeds almost all of the CR fields, like this:

    conditions:
    - ansibleResult:
        changed: 78
        completion: 2021-03-11T00:31:52.462449
        failures: 0
        ok: 2103
        skipped: 1759
      lastTransitionTime: "2021-03-11T00:31:53Z"
      message: |-
        An unhandled exception occurred while running the lookup plugin 'template'. Error was a <class 'ansible.errors.AnsibleError'>, original message: An unhandled exception occurred while templating '{'apiVersion': 'icp4a.ibm.com/v1', 'kind': 'ICP4ACluster', 'metadata': {'annotations': {'kubectl.kubernetes.io/last-applied-configuration': '{"apiVersion":"icp4a.ibm.com/v1","kind":"ICP4ACluster","metadata":{"annotations":{},"labels":{"app.kubernetes.io
...
com.ibm.automation.cloud.ecm.tai.ECMTamTrustAssociationInterceptor.accessControlGroup=\\\\"cn=participants,cn=groups,O=IBM,C=US\\\\"\\\\n                useRegistrySecurityNameForSubject=\\\\"false\\\\"\\\\n                
...
\'f:traceSpecification\': {}}, \'f:replica_count\': {}}, \'f:scim\': {\'.\': {}, \'f:autoscaling\': {\'.\': {}, \'f:enabled\': {}, \'f:maxreplicas\': {}, \'f:minreplicas\': {}, \'f:targetAverageUtilization\': {}}, \'f:custom_xml\': {}, \'f:logs\': {\'.\': {}, \'f:traceSpecification\': {}}, \'f:replica_count\': {}}, \'f:sso\': {\'.\': {}, \'f:access_token_lifetime\': {}, 
...
        'pod_list' is undefined
        unknown playbook failure
        Business Automation Insights: Installation failed. For details, see the output log.
      reason: Failed
      status: "False"
      type: Failure
    - lastTransitionTime: "2021-03-11T00:31:53Z"
      message: Running reconciliation
      reason: Running
      status: "True"
      type: Running

The templating error happens on the next reconcile:

TASK [BAML : Whether to deploy intelligent task prioritization server] *********
task path: /opt/ansible/roles/BAML/tasks/init-variables.yml:18
Wednesday 07 April 2021  03:58:30 +0000 (0:00:00.052)       0:11:46.372 ******* 
...
fatal: [localhost]: FAILED! => {"msg": "The conditional check 'baml_configuration.intelligent_task_prioritization is defined and baml_bai_install_requested' failed. The error was: An unhandled exception occurred while templating '{{ _icp4a_ibm_com_icp4acluster.spec.bai_configuration is defined or baml_bai_in_optional_components }}'. Error was a <class 'ansible.errors.AnsibleError'>, original message: An unhandled exception occurred while templating
...

The error log also contains almost the whole CR.

What did you expect to see?

The reconcile and templating should work and should not be affected by the fields in the CR status.

What did you see instead? Under which circumstances?

A templating error occurs when the CR status fields contain complex values.

Environment

Ansible-based operator

/language ansible

Kubernetes cluster type:

OpenShift 4.5.x

$ operator-sdk version

quay.io/operator-framework/ansible-operator:v0.17.2
Ansible upgraded to 2.10.5

$ go version (if language is Go)

$ kubectl version

Possible Solution

After removing the CR status values through the Kubernetes API, the reconcile works again.
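
A minimal sketch of what that workaround amounts to, assuming the status is cleared with a JSON merge patch (RFC 7386), where a `null` value removes a key. The `cr` dictionary is a hypothetical, heavily shortened stand-in for the real resource, and `merge_patch` is a local helper written only to show the effect of the patch; it is not part of any Kubernetes client. If the CRD enables the status subresource, the patch would probably need to go to the `/status` endpoint, which is also an assumption here.

```python
import json


def merge_patch(target, patch):
    """Apply a JSON merge patch (RFC 7386) and return the patched copy."""
    if not isinstance(patch, dict):
        # A non-object patch replaces the target outright.
        return patch
    result = dict(target) if isinstance(target, dict) else {}
    for key, value in patch.items():
        if value is None:
            # null in a merge patch means "delete this key".
            result.pop(key, None)
        else:
            result[key] = merge_patch(result.get(key), value)
    return result


# Hypothetical, shortened stand-in for the CR from this report.
cr = {
    "apiVersion": "icp4a.ibm.com/v1",
    "kind": "ICP4ACluster",
    "spec": {"bai_configuration": {}},
    "status": {
        "conditions": [
            {"type": "Failure", "reason": "Failed", "status": "False"},
        ]
    },
}

# Patch that drops the whole status block.
clear_status_patch = {"status": None}

# Body that would be sent with Content-Type: application/merge-patch+json.
patch_body = json.dumps(clear_status_patch)

# Local view of the resource after the patch is applied.
cleaned = merge_patch(cr, clear_status_patch)
```

Only `status` is removed; `spec` and the other top-level fields are left untouched, which matches the observation that the reconcile works again once the old error text is gone.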

Additional context

openshift-ci-robot added the language/ansible label Apr 7, 2021

estroz commented Apr 12, 2021

You seem to be using a very old version of the ansible-operator image and an unsupported version of Ansible (only 2.9.z is supported in master). I recommend upgrading to the latest ansible-operator release, then trying the above steps again.

The issue here may be related to unsafe vars, which was fixed in #4566 and will be part of v1.6. Feel free to re-open this issue if you do not see your problem resolved.
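
For readers unfamiliar with the term, here is a toy model of why unmarked variables can be a problem. It assumes the failure comes from variable values being templated again when they are looked up, so that template syntax embedded in a status message gets evaluated. The `render` function, the `Unsafe` marker class, and the variable names are all hypothetical and are not Ansible's or the operator's implementation; they only imitate the behaviour in a few lines.

```python
import re


class Unsafe(str):
    """Marker type: text that must be returned as-is and never templated."""


class TemplateError(Exception):
    """Raised when the toy renderer cannot resolve an expression."""


_EXPR = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(text, variables, depth=0):
    """Replace {{ name }} with variables[name], re-templating each value."""
    if isinstance(text, Unsafe):
        # Unsafe text is passed through untouched.
        return str(text)
    if depth > 10:
        raise TemplateError("template recursion too deep")

    def substitute(match):
        name = match.group(1)
        if name not in variables:
            raise TemplateError(f"'{name}' is undefined")
        # The looked-up value is itself treated as a template.
        return render(variables[name], variables, depth + 1)

    return _EXPR.sub(substitute, text)


def try_render(text, variables):
    """Render and return the result, or the error text on failure."""
    try:
        return render(text, variables)
    except TemplateError as exc:
        return f"ERROR: {exc}"


# Hypothetical status text that happens to contain template syntax.
message = "failed while templating '{{ pod_list }}'"

plain_vars = {"status_message": message}
safe_vars = {"status_message": Unsafe(message)}

unmarked_result = try_render("{{ status_message }}", plain_vars)
marked_result = try_render("{{ status_message }}", safe_vars)
```

With the unmarked value the lookup fails because the embedded expression is evaluated; with the value marked unsafe the same lookup returns the stored text unchanged.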

estroz closed this as completed Apr 12, 2021