
Graceful continuation on NOT_FOUND #170

Open
jimconner opened this issue Jun 27, 2022 · 2 comments

Comments

@jimconner

Hi - We are using cf-python-client as part of a task which downloads each app-blob in turn. As we have tens of thousands of containers, this job takes 8-10 hours to complete.

The job is theoretically quite straightforward...

for app in cfClient.v3.apps:
    space_name = app.space()['name']
    org_name = app.space().organization()['name']
    ...

More often than not, during the running of this task, we'll encounter an error along the lines of...

cloudfoundry_client.errors.InvalidStatusCode: NOT_FOUND = {"errors": [{"detail": "Space not found", "title": "CF-ResourceNotFound", "code": 10010}]}

or

cloudfoundry_client.errors.InvalidStatusCode: NOT_FOUND = {"errors": [{"detail": "App not found", "title": "CF-ResourceNotFound", "code": 10010}]} 

Given the number of orgs/spaces/apps we have in service, it is not too surprising that an app or space which was present at the start of the task is no longer around 6-8 hours later.

When we encounter an error like this we have no option but to restart the 8-10 hour task from the beginning and hope that it makes it through without an error.

I was just wondering if there are any options to allow for more graceful continuation when objects are no longer found? This doesn't seem to be something I can handle with a try/except block. Are there any options to make errors like this non-fatal?

Cheers!!
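A minimal sketch of catching the failure per item, so one deleted app or space skips that entry instead of aborting the whole crawl (`iter_skipping` is a hypothetical helper, not part of cf-python-client):

```python
def iter_skipping(items, fetch, errors=(Exception,)):
    """Yield fetch(item) for each item, silently skipping ones that raise."""
    for item in items:
        try:
            yield fetch(item)
        except errors:
            continue  # entity vanished mid-run; move on to the next one

# Hypothetical wiring against the client from this thread:
# from cloudfoundry_client.errors import InvalidStatusCode
# for name in iter_skipping(cfClient.v3.apps,
#                           lambda app: app.space()['name'],
#                           errors=(InvalidStatusCode,)):
#     ...
```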

@antechrestos
Member

@jimconner Normally the cfClient.v3.apps call returns a generator that crawls the applications you have access to, using the pagination mechanism offered by the API.

As objects are deserialized and then returned with the yield operator, I think that App not found happens when you try to use the app.

However, I think that you may optimise your calls:

  • app.space() makes a request against the cloud controller to get the space linked to the app object (see the documentation)
  • app.space().organization() makes a request to get the space (one more time) and then requests the API to get the organisation. As there is no cache, you end up fetching the same space and organisation several times (for a space, 2 * number_of_applications_in_space times; for an organisation, number_of_applications_in_org times)

Did you try the following approach?

from cloudfoundry_client.errors import InvalidStatusCode

for org in cfClient.v3.organizations:
    org_name = org['name']
    try:
        for space in cfClient.v3.spaces.list(organization_guids=org['guid']):
            space_name = space['name']
            try:
                for app in cfClient.v3.apps.list(space_guids=space['guid']):
                    app_name = app['name']
            except InvalidStatusCode:
                pass
    except InvalidStatusCode:
        pass
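Since there is no cache, another option is to fetch space and organization names once up front and look them up by guid. A rough sketch (`build_lookup` is a hypothetical helper, and the `relationships` path assumes the standard CF v3 app resource shape):

```python
def build_lookup(entities):
    """Map each entity's guid to its name, in a single pass over the collection."""
    return {e['guid']: e['name'] for e in entities}

# Hypothetical usage with the client from this thread:
# space_names = build_lookup(cfClient.v3.spaces)
# org_names = build_lookup(cfClient.v3.organizations)
# for app in cfClient.v3.apps:
#     space_guid = app['relationships']['space']['data']['guid']
#     space_name = space_names.get(space_guid, '<deleted>')
```

Names cached before the run can be stale for the short-lived spaces mentioned above, but a lookup miss is then just a missing dict key rather than a fatal API error.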

@jimconner
Copy link
Author

Thanks for replying @antechrestos - I've been through a few different approaches for this so far. I started out by fetching all apps, spaces and orgs and used list comprehensions to convert them into in-memory objects, which was nice and fast, but gave me NOT_FOUND issues. I changed to the current app.space().organization() approach in an attempt to fetch things more dynamically (which mostly worked out OK).

I'll try your suggested approach sometime soon, but I suspect the same issue will persist. We have some short-lived orgs/spaces/apps that are created and destroyed by pipelines. Looping through v3.organizations will likely give us the same result. We're hosting >5K apps on this particular CF foundation and there may be a few hundred apps within any given org. The script in question here is fetching the current droplet for each of the apps so that we can create SBOMs and run scanning tools across them. This is always going to be a slow process due to the size and number of blobs involved.

The other place I need to work out some kind of graceful continuation is for when there are glitches on the network (or in one of the three different types of load balancer that sit in front of our estate) that cause a packet or two to vanish unexpectedly.

  File "/usr/lib/python3.9/http/client.py", line 276, in _read_status
    raise RemoteDisconnected("Remote end closed connection without"
urllib3.exceptions.ProtocolError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))

If you've got any ideas for how to make the client reconnect automatically then I'm all ears :-)
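One avenue for the disconnects, sketched here under the assumption that you can reach the underlying requests.Session the client uses (how to get at it is not shown): mount urllib3's Retry so idempotent GETs are retried with backoff instead of raising ProtocolError.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

retries = Retry(
    total=5,                          # up to 5 attempts per request
    backoff_factor=1,                 # sleep 1s, 2s, 4s, ... between tries
    status_forcelist=[502, 503, 504], # also retry these gateway errors
    allowed_methods=["GET"],          # only retry idempotent reads
)  # note: `allowed_methods` needs urllib3 >= 1.26 (older: method_whitelist)

session = requests.Session()
session.mount("https://", HTTPAdapter(max_retries=retries))
# Connection-level failures (like RemoteDisconnected) also count against
# `total`, so a single dropped connection no longer kills the whole run.
```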

Thanks again for replying @antechrestos - Please don't feel under any pressure. This is quite a low-priority problem for me.
