Recursion issues as of 1.1.0 / ansible 2.10.2 #4202

Closed
faust64 opened this issue Nov 7, 2020 · 7 comments
Labels: area/dependency, language/ansible, release-blocker

faust64 commented Nov 7, 2020

Bug Report

What did you do?

Changed my operator base image from 1.0.1 to 1.1.0.

What did you expect to see?

Nothing special: the same results as with 1.0.1.

What did you see instead? Under which circumstances?

Playbooks crashing.

In practice, my operator deploys components that can depend on other components deployed through another CR.

At some point, the playbook detects those components and may start waiting for them. This can be a build that needs to complete or a service that needs to start up. The sample below waits for a PipelineRun:

    - include_role:
        name: commons
        tasks_from: helpers/wait-for.yaml
      vars:
        check_with: "{{ lookup('k8s', api_version='tekton.dev/v1beta1',
                               kind='PipelineRun', namespace=namespace,
                               resource_name='xxx') | default('') }}"
        obj_name: "xxx"
        retries_for: 20
        wait_for: 10

Entering the wait-for.yaml:

- name: "Waits for {{ obj_name }} to startup"
  block:
  - name: "Checks latest {{ obj_name }} status"
    debug:
      msg: |
        blabla
    delay: "{{ wait_for | default(10) }}"
    ignore_errors: True
    retries: "{{ retries_for | default(10) }}"
    until:
    - condA
    - condB
    - condC
    - condD
    - cond...
    - condN
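
For readers less familiar with Ansible's retries / delay / until keywords, here is a rough Python analogue of what the helper above is meant to do: poll a check, and stop once every condition in the list holds. This is only a sketch and not Ansible's implementation; `wait_for`, `check`, `conditions` and the injectable `sleep` are hypothetical names introduced for illustration.

```python
import time
from typing import Callable, Iterable


def wait_for(
    check: Callable[[], dict],
    conditions: Iterable[Callable[[dict], bool]],
    retries: int = 10,
    delay: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `check` until all conditions hold or attempts run out.

    Hypothetical helper: the list of conditions is AND-ed together,
    mirroring a multi-item `until:` list.
    """
    conditions = list(conditions)
    for attempt in range(retries):
        result = check()
        # Every condition must be true for the wait to succeed.
        if all(cond(result) for cond in conditions):
            return True
        # Do not sleep after the final attempt.
        if attempt < retries - 1:
            sleep(delay)
    return False
```

With this shape, a list of N conditions and a single "condA and condB and ..." expression are logically equivalent, which is why the rewrite discussed further down should not change behaviour.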

That code had been working well until I upgraded my operator base image to 1.1.0.

Since then, I see an error such as "maximum recursion depth exceeded while calling a Python object".
For a given CR, the trace always points at the same line: evaluation stops partway through the list of conditions (e.g. the trace mentions condD when deploying CR A, condN for CR B, ...).
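
For context, that message is Python's RecursionError, raised once the interpreter's call stack grows past sys.getrecursionlimit(). The sketch below is unrelated to Ansible's actual code path (the root cause is not established in this report); `nest` and `overflows` are made-up names that only show how call depth alone triggers the error.

```python
import sys


def nest(depth: int) -> int:
    """Recurse `depth` levels deep, one Python frame per level."""
    if depth == 0:
        return 0
    return 1 + nest(depth - 1)


def overflows(depth: int) -> bool:
    """Return True if recursing `depth` levels raises RecursionError."""
    try:
        nest(depth)
    except RecursionError:
        return True
    return False


if __name__ == "__main__":
    limit = sys.getrecursionlimit()
    print("recursion limit:", limit)
    print("shallow call overflows:", overflows(10))
    print("deep call overflows:", overflows(limit + 100))
```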

A quick workaround to try and avoid these errors is to rewrite the until clause as a single expression, such as:

until:
- condA and condB and condC ....

This way, some of my playbooks eventually manage to reach their end.
Then again, as soon as I have to wait for an object to become ready, I'm still very likely to overflow the Python stack.

How come?
This used to work great.
Any chance it would get fixed?

Looking at the Ansible issues, the project is aware of this (ansible/ansible#71920). As far as I understand, it should be fixed in 2.10.2. If that were the case, though, I wouldn't be here.

Environment

Operator type:

/language ansible

Kubernetes cluster type:

vanilla

$ operator-sdk version

1.1.0

$ kubectl version

1.18.3

Possible Solution

Not a fix / temporary workaround: rewriting playbook conditions, refactoring them into a one-liner

asmacdo (Member) commented Nov 9, 2020

Operator SDK 1.0.1 and 1.1.0 are both still using Ansible 2.9 https://github.com/operator-framework/operator-sdk/blob/v1.1.0/hack/image/ansible/Dockerfile#L24

jberkhahn added this to the v1.3.0 milestone Nov 9, 2020
jberkhahn modified the milestones: v1.3.0, v1.4.0 Nov 11, 2020
faust64 (Author) commented Nov 14, 2020

Sure, 1.0.1 does ship with Ansible 2.9, and I had no issue with it.

However, I just pulled the latest 1.1.0 image, and I confirm that it ships with 2.10.
You're right that the Dockerfile suggests otherwise, which is odd.

$ docker pull quay.io/operator-framework/ansible-operator:v1.1.0
v1.1.0: Pulling from operator-framework/ansible-operator
ec1681b6a383: Already exists 
c4d668e229cd: Already exists 
96f82905e73b: Pull complete 
....
$ docker run -it --entrypoint /bin/sh quay.io/operator-framework/ansible-operator:v1.1.0
sh-4.4$ ansible --version
ansible 2.10.2
sh-4.4$ pip3 list | grep ansible
DEPRECATION: The default format will switch to columns in the future. You can use --format=(legacy|columns) (or define a format=(legacy|columns) in your pip.conf under the [list] section) to disable this warning.
ansible (2.10.1)
ansible-base (2.10.2)
ansible-runner (1.3.4)
ansible-runner-http (1.0.0)

dalbani commented Nov 16, 2020

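Out of curiosity, I tracked down the meaning of ansible~=2.9: it's called a "compatible release", which in this case translates to >= 2.9, == 2.* according to https://www.python.org/dev/peps/pep-0440/#compatible-release. Ansible 2.10.x satisfies that constraint, which would explain why the image ended up with 2.10.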

For reproducibility purposes, shouldn't all the versions, including ansible, be pinned to a specific version in the Dockerfile?
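
To illustrate the "compatible release" rule, here is a minimal standard-library sketch for plain numeric versions. It is not pip's implementation and ignores pre-releases, post-releases and epochs; `parse` and `compatible_release_matches` are hypothetical helper names.

```python
from typing import Tuple


def parse(version: str) -> Tuple[int, ...]:
    """Split a plain numeric version such as '2.10.2' into integers."""
    return tuple(int(part) for part in version.split("."))


def compatible_release_matches(spec: str, candidate: str) -> bool:
    """Approximate PEP 440 `~=spec` for simple numeric versions.

    `~=X.Y`   behaves like `>= X.Y, == X.*`
    `~=X.Y.Z` behaves like `>= X.Y.Z, == X.Y.*`
    """
    base = parse(spec)
    cand = parse(candidate)
    if len(base) < 2:
        raise ValueError("~= needs at least two release segments")
    # Everything but the last segment of the spec must match exactly.
    prefix = base[:-1]
    # Pad with zeros so that '2.9' and '2.9.0' compare as equal.
    width = max(len(base), len(cand))
    padded_base = base + (0,) * (width - len(base))
    padded_cand = cand + (0,) * (width - len(cand))
    return cand[: len(prefix)] == prefix and padded_cand >= padded_base


if __name__ == "__main__":
    for candidate in ("2.8.5", "2.9.15", "2.10.2", "3.0"):
        print(candidate, compatible_release_matches("2.9", candidate))
```

Under this reading, a spec with three segments such as ~=2.9.0 would stay within the 2.9 series, while an exact == pin removes the ambiguity entirely.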

dalbani commented Nov 19, 2020

For reference, I have raised my question about version pinning in #4237.

asmacdo added the release-blocker label Dec 14, 2020
asmacdo (Member) commented Dec 14, 2020

Using Ansible 2.10 was a mistake, and I will be pinning to 2.9 in the fix for #4237.

Adding this as a release blocker since we should have been using 2.9.

estroz modified the milestones: v1.5.0, v1.3.0 Dec 15, 2020
estroz added the area/dependency and language/ansible labels and removed the blocked label Dec 15, 2020
asmacdo (Member) commented Dec 15, 2020

Getting the requirements installed via requirements.txt turned out to be more complex than I originally realized. I'm pinning the top-level deps by hand, which fixes this issue. #4321

asmacdo (Member) commented Dec 15, 2020

Fixed by #4321

asmacdo closed this as completed Dec 15, 2020