
Graceful continuation on NOT_FOUND #170

Open
jimconner opened this issue Jun 27, 2022 · 2 comments

Comments

@jimconner

Hi - We are using cf-python-client as part of a task which downloads each app-blob in turn. As we have tens of thousands of containers, this job takes 8-10 hours to complete.

The job is theoretically quite straightforward...

for app in cfClient.v3.apps:
    space_name = app.space()['name']
    org_name = app.space().organization()['name']
    ...

More often than not, during the running of this task, we'll encounter an error along the lines of...

cloudfoundry_client.errors.InvalidStatusCode: NOT_FOUND = {"errors": [{"detail": "Space not found", "title": "CF-ResourceNotFound", "code": 10010}]}

or

cloudfoundry_client.errors.InvalidStatusCode: NOT_FOUND = {"errors": [{"detail": "App not found", "title": "CF-ResourceNotFound", "code": 10010}]} 

Given the number of orgs/spaces/apps we have in service, it is not too surprising that an app or space which was present at the start of the task is no longer around 6-8 hours later.

When we encounter an error like this we have no option but to restart the 8-10 hour task from the beginning and hope that it makes it through without an error.

I was just wondering if there are any options to allow for more graceful continuation when objects are no longer found? This doesn't seem to be something I can handle with a try/except block. Are there any options to make errors like this non-fatal?

Cheers!!
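A minimal sketch of catching the failure per item, so one deleted app or space skips that entry instead of aborting the whole crawl (`iter_skipping` is a hypothetical helper, not part of cf-python-client):

```python
def iter_skipping(items, fetch, errors=(Exception,)):
    """Yield fetch(item) for each item, silently skipping ones that raise."""
    for item in items:
        try:
            yield fetch(item)
        except errors:
            continue  # entity vanished mid-run; move on to the next one

# Hypothetical wiring against the client from this thread:
# from cloudfoundry_client.errors import InvalidStatusCode
# for name in iter_skipping(cfClient.v3.apps,
#                           lambda app: app.space()['name'],
#                           errors=(InvalidStatusCode,)):
#     ...
```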

@antechrestos
Member

@jimconner Normally the cfClient.v3.apps call returns a generator that crawls the applications you have access to, using the pagination mechanism offered by the API.

As objects are deserialized and then returned with the yield operator, I think that App not found happens when you try to use the app.

However, I think that you may optimise your calls:

  • app.space() makes a request against the cloud controller to get the space linked to the app object (see the documentation)
  • app.space().organization() makes a request to get the space (one more time) and then requests the API to get the organisation. As there is no cache, you end up fetching the same space and organisation several times (for a space, 2 * number_of_applications_in_space times; for an organisation, number_of_applications_in_org times)

Did you try the following approach?

from cloudfoundry_client.errors import InvalidStatusCode

for org in cfClient.v3.organizations:
    org_name = org['name']
    try:
        for space in cfClient.v3.spaces.list(organization_guids=org['guid']):
            space_name = space['name']
            try:
                for app in cfClient.v3.apps.list(space_guids=space['guid']):
                    app_name = app['name']
            except InvalidStatusCode:
                pass
    except InvalidStatusCode:
        pass
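Since there is no cache, another option is to fetch space and organization names once up front and look them up by guid. A rough sketch (`build_lookup` is a hypothetical helper, and the `relationships` path assumes the standard CF v3 app resource shape):

```python
def build_lookup(entities):
    """Map each entity's guid to its name, in a single pass over the collection."""
    return {e['guid']: e['name'] for e in entities}

# Hypothetical usage with the client from this thread:
# space_names = build_lookup(cfClient.v3.spaces)
# org_names = build_lookup(cfClient.v3.organizations)
# for app in cfClient.v3.apps:
#     space_guid = app['relationships']['space']['data']['guid']
#     space_name = space_names.get(space_guid, '<deleted>')
```

Names cached before the run can be stale for the short-lived spaces mentioned above, but a lookup miss is then just a missing dict key rather than a fatal API error.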

@jimconner
Copy link
Author

Thanks for replying @antechrestos - I've been through a few different approaches for this so far. I started out by fetching all apps, spaces and orgs and used list comprehensions to convert them into in-memory objects, which was nice and fast, but gave me NOT_FOUND issues. I changed to the current app.space().organization() approach in an attempt to fetch things more dynamically (which mostly worked out OK).

I'll try your suggested approach sometime soon, but I suspect the same issue will persist. We have some short-lived orgs/spaces/apps that are created and destroyed by pipelines. Looping through v3.organizations will likely give us the same result. We're hosting >5K apps on this particular CF foundation and there may be a few hundred apps within any given org. The script in question here is fetching the current droplet for each of the apps so that we can create SBOMs and run scanning tools across them. This is always going to be a slow process due to the size and number of blobs involved.

The other place I need to work out some kind of graceful continuation is for when there are glitches on the network (or in one of the three different types of load balancer that sit in front of our estate) that cause a packet or two to vanish unexpectedly.

  File "/usr/lib/python3.9/http/client.py", line 276, in _read_status
    raise RemoteDisconnected("Remote end closed connection without"
urllib3.exceptions.ProtocolError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))

If you've got any ideas for how to make the client reconnect automatically then I'm all ears :-)
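One avenue for the disconnects, sketched here under the assumption that you can reach the underlying requests.Session the client uses (how to get at it is not shown): mount urllib3's Retry so idempotent GETs are retried with backoff instead of raising ProtocolError.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

retries = Retry(
    total=5,                          # up to 5 attempts per request
    backoff_factor=1,                 # sleep 1s, 2s, 4s, ... between tries
    status_forcelist=[502, 503, 504], # also retry these gateway errors
    allowed_methods=["GET"],          # only retry idempotent reads
)  # note: `allowed_methods` needs urllib3 >= 1.26 (older: method_whitelist)

session = requests.Session()
session.mount("https://", HTTPAdapter(max_retries=retries))
# Connection-level failures (like RemoteDisconnected) also count against
# `total`, so a single dropped connection no longer kills the whole run.
```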

Thanks again for replying @antechrestos - Please don't feel under any pressure. This is quite a low-priority problem for me.
