Recursion issues as of 1.1.0 / ansible 2.10.2 #4202

faust64 · 2020-11-07T07:15:21Z

Bug Report

What did you do?

Changed my base image from 1.0.1 to 1.1.0.

What did you expect to see?

Nothing special, similar results to 1.0.1 I guess.

What did you see instead? Under which circumstances?

Playbooks crashing.

In practice, my operator deploys components that may depend on other components deployed through another CR.

At some point, I detect those components and may start waiting for them (for example, a build that needs to complete or a service that needs to start up). Sample below for a PipelineRun:

    - include_role:
        name: commons
        tasks_from: helpers/wait-for.yaml
      vars:
        check_with: "{{ lookup('k8s', api_version='tekton.dev/v1beta1',
                               kind='PipelineRun', namespace=namespace,
                               resource_name='xxx') | default('') }}"
        obj_name: "xxx"
        retries_for: 20
        wait_for: 10

Entering the wait-for.yaml:

    - name: "Waits for {{ obj_name }} to startup"
      block:
      - name: "Checks latest {{ obj_name }} status"
        debug:
          msg: |
            blabla
        delay: "{{ wait_for | default(10) }}"
        ignore_errors: True
        retries: "{{ retries_for | default(10) }}"
        until:
        - condA
        - condB
        - condC
        - condD
        - cond...
        - condN

That code had been working pretty well until I upgraded my operator base image to 1.1.0.

Since then, I see errors such as "maximum recursion depth exceeded while calling a Python object".
For a given CR, the trace always mentions the same line: at some point, it stops reading the conditions partway through them (e.g. the trace mentions condD when deploying CR A, condN for CR B, ...).

Obviously, a quick workaround to try and avoid these would be to rewrite the until clause as a single expression, such as:

    until:
    - condA and condB and condC ....

This way, some of my playbooks eventually manage to reach their end.
Then again, as soon as I have to wait for an object to become ready, I'm still very likely to overflow the Python stack.

How come?
This used to work great.
Any chance it would get fixed?

Looking at the Ansible issues, this seems to be a known problem (ansible/ansible#71920). AFAIU, it should be fixed in 2.10.2. Though if that were the case, I wouldn't be here.

Environment

Operator type:

/language ansible

Kubernetes cluster type:

vanilla

$ operator-sdk version

1.1.0

$ kubectl version

1.18.3

Possible Solution

Not a fix, only a temporary workaround: rewriting the playbook conditions, refactoring them into a one-liner.


asmacdo · 2020-11-09T18:51:36Z

Operator SDK 1.0.1 and 1.1.0 are both still using Ansible 2.9: https://github.com/operator-framework/operator-sdk/blob/v1.1.0/hack/image/ansible/Dockerfile#L24

faust64 · 2020-11-14T12:42:50Z

Sure, 1.0.1 does ship with Ansible 2.9, and I had no issue with it.

Though I just pulled the latest 1.1.0 image, and I can confirm that it ships with 2.10.
Yet you're right, that Dockerfile suggests otherwise. That's weird.

    $ docker pull quay.io/operator-framework/ansible-operator:v1.1.0
    v1.1.0: Pulling from operator-framework/ansible-operator
    ec1681b6a383: Already exists
    c4d668e229cd: Already exists
    96f82905e73b: Pull complete
    ....
    $ docker run -it --entrypoint /bin/sh quay.io/operator-framework/ansible-operator:v1.1.0
    sh-4.4$ ansible --version
    ansible 2.10.2
    sh-4.4$ pip3 list | grep ansible
    DEPRECATION: The default format will switch to columns in the future. You can use --format=(legacy|columns) (or define a format=(legacy|columns) in your pip.conf under the [list] section) to disable this warning.
    ansible (2.10.1)
    ansible-base (2.10.2)
    ansible-runner (1.3.4)
    ansible-runner-http (1.0.0)
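
A check like the one above could be automated to catch this kind of drift in a base image. The following is only a rough Python sketch: the function names are hypothetical, and it assumes the first line of the output looks like "ansible 2.10.2" as in the session above (other Ansible releases may print a different format).

```python
import re


def parse_ansible_version(output):
    """Extract (major, minor, patch) from the first line of `ansible --version`.

    Assumes the first line has the form 'ansible X.Y.Z', as in the session above.
    """
    lines = output.strip().splitlines()
    if not lines:
        raise ValueError("empty version output")
    match = re.match(r"ansible\s+(\d+)\.(\d+)\.(\d+)", lines[0])
    if match is None:
        raise ValueError(f"unrecognised version line: {lines[0]!r}")
    return tuple(int(group) for group in match.groups())


def in_expected_series(version, series=(2, 9)):
    """Return True if the version tuple starts with the expected series."""
    return version[:len(series)] == tuple(series)


if __name__ == "__main__":
    version = parse_ansible_version("ansible 2.10.2\n")
    # The image was expected to ship the 2.9 series, so this reports a mismatch.
    print(version, "ok" if in_expected_series(version) else "unexpected series")
```

Feeding it the captured output of ansible --version from the container would flag the 2.10.2 install as outside the 2.9 series.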

dalbani · 2020-11-16T11:04:51Z

Out of curiosity, I looked up the meaning of ansible~=2.9: it's called a "compatible release", which translates in this case to >= 2.9, == 2.*, according to https://www.python.org/dev/peps/pep-0440/#compatible-release. That is why a 2.10.x release satisfies it.

For reproducibility purposes, shouldn't all the versions, including ansible, be pinned to a specific version in the Dockerfile?
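
To illustrate the compatible-release rule, here is a minimal Python sketch for plain numeric versions only. The helper names are made up for illustration, and it deliberately ignores pre-releases, epochs and the other PEP 440 details, so it is not how pip itself evaluates specifiers.

```python
def parse(version):
    """Split a purely numeric dotted version such as '2.10.2' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def is_compatible_release(candidate, spec):
    """Return True if `candidate` satisfies `~= spec` (numeric versions only).

    PEP 440 defines `~= V.N` as `>= V.N, == V.*`: the spec minus its last
    component is a prefix the candidate must match, and the candidate must
    not be older than the spec.
    """
    spec_parts = parse(spec)
    if len(spec_parts) < 2:
        raise ValueError("~= needs at least two version components")
    cand_parts = parse(candidate)

    # Pad with zeros so that '2.9' and '2.9.0' compare as equal.
    width = max(len(cand_parts), len(spec_parts))

    def pad(parts):
        return parts + (0,) * (width - len(parts))

    prefix = spec_parts[:-1]
    padded_cand = pad(cand_parts)
    return padded_cand[:len(prefix)] == prefix and padded_cand >= pad(spec_parts)


if __name__ == "__main__":
    for candidate in ("2.8.1", "2.9.15", "2.10.2", "3.0"):
        print(candidate, is_compatible_release(candidate, "2.9"))
```

Under this rule 2.10.2 matches ~=2.9, whereas a specifier with three components such as ~=2.9.0 would only admit 2.9.x releases.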

dalbani · 2020-11-19T13:30:44Z

For reference, I have raised my question about version pinning in #4237.

asmacdo · 2020-12-14T20:45:29Z

Using Ansible 2.10 was a mistake, and I will be pinning to 2.9 in the fix for #4237.

Adding this as a release blocker since we should have been using 2.9.

asmacdo · 2020-12-15T15:34:57Z

Getting the requirements installed via requirements.txt turned out to be more complex than I originally realized. I'm pinning the top-level deps by hand, which fixes this issue: #4321

asmacdo · 2020-12-15T19:38:01Z

Fixed by #4321

faust64 mentioned this issue Nov 7, 2020

Concurrency with multiple CR #3585

Closed

jberkhahn assigned asmacdo Nov 9, 2020

jberkhahn added this to the v1.3.0 milestone Nov 9, 2020

jberkhahn added the blocked label Nov 11, 2020

jberkhahn modified the milestones: v1.3.0, v1.4.0 Nov 11, 2020

asmacdo mentioned this issue Dec 14, 2020

Provide a way to replicate Ansible/Python environment of Docker image #4237

Closed

asmacdo added the release-blocker label Dec 14, 2020

estroz modified the milestones: v1.5.0, v1.3.0 Dec 15, 2020

estroz added the area/dependency and language/ansible labels and removed the blocked label Dec 15, 2020

asmacdo closed this as completed Dec 15, 2020
