-
I am building out a shell that would use pypyr pipelines for commands to help automate common engineering tasks. Given that this is a running shell, the pipelines could potentially run more than once before exiting, so I will either need to disable the cache or clear it. Do you have any guidance on how this is best accomplished?
Replies: 5 comments 1 reply
-
hey @biggiebk, this sounds like an awesome project and a nifty use for pypyr! Let me know how it goes, I'd love to see it in action or your work in progress!

As you've probably noticed, the pypyr cache is alive for as long as the process is. The cache is in memory, so it dies alongside the process. In other words, if you're spawning pypyr as a subprocess from your shell, then each pypyr invocation will have its own cache - you don't need to clear anything, you'll get a fresh cache instance in each spawned subprocess.

If you're calling pypyr as an API from your shell, however, the cache will stay alive as long as the calling process is alive. This is generally not a bad thing. The most expensive operation in pypyr by a wide, wide margin is parsing the pipeline yaml. For snappy pipeline executions when running via API, you probably don't want to clear the pipeline cache just for giggles. There is no objection to the same pipeline running more than once from cache - in fact, that's what the cache is for: to speed up execution for subsequent runs of the same pipeline.

If the pipeline yaml has NOT changed, and no underlying step/parser/backoff modules have changed, then you DON'T need to reload the cache. However, the cache won't auto-reload any changes made to the original pipeline yaml on disk if you updated your pipeline since the 1st time that pipeline ran in the current process.

If you do want to pick up changes to the underlying pipeline yaml since the 1st run, or changes to custom modules like steps, then you can refresh the caches like this:

```python
from pypyr.cache.loadercache import loader_cache

# clear all cached pipelines
loader_cache.clear_pipes()

# or only clear pipelines for the specified loader
loader_cache.clear_pipes(loader_name='mypackage.my_custom_loader')
```

There are a couple of other caches too - in each case, calling `clear()` empties that cache:

```python
from pypyr.cache.backoffcache import backoff_cache
backoff_cache.clear()

from pypyr.cache.loadercache import loader_cache
# notice you can call clear() independently from clear_pipes().
# clear() will effectively result in pipelines being cleared also,
# because pipelines cache by loader. i.e. if you call .clear() you
# don't need to call clear_pipes().
loader_cache.clear()

from pypyr.cache.namespacecache import pystring_namespace_cache
pystring_namespace_cache.clear()

from pypyr.cache.parsercache import contextparser_cache
contextparser_cache.clear()

from pypyr.cache.stepcache import step_cache
step_cache.clear()
```
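If your shell ends up clearing every cache in one go, the individual `clear()` calls above could be wrapped in a single helper. This is just a sketch, assuming pypyr is installed - the function name is hypothetical, not part of pypyr's API:

```python
def clear_all_pypyr_caches():
    """Clear every pypyr in-process cache so the next run reloads fresh.

    Imports live inside the function so this sketch only needs pypyr
    at call time, not at definition time.
    """
    from pypyr.cache.backoffcache import backoff_cache
    from pypyr.cache.loadercache import loader_cache
    from pypyr.cache.namespacecache import pystring_namespace_cache
    from pypyr.cache.parsercache import contextparser_cache
    from pypyr.cache.stepcache import step_cache

    # loader_cache.clear() also drops cached pipelines, so no separate
    # clear_pipes() call is needed here.
    for cache in (backoff_cache, loader_cache, pystring_namespace_cache,
                  contextparser_cache, step_cache):
        cache.clear()
```

Your shell could then expose this as a single "reload" command.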
-
Yeah, I think it should be an interesting project and I will definitely let you know once I have something up and running.

I still do not seem to be able to get the cache to clear when using the API calls. Here is a quick and dirty example of what I am doing.

Example python:

```python
from pypyr import pipelinerunner
from pypyr.cache.loadercache import loader_cache
from pypyr.cache.backoffcache import backoff_cache
from pypyr.cache.namespacecache import pystring_namespace_cache
from pypyr.cache.parsercache import contextparser_cache
from pypyr.cache.stepcache import step_cache

while True:
    input()
    context = pipelinerunner.run(pipeline_name='test')
    print(context['myoutput'])
    backoff_cache.clear()
    loader_cache.clear()
    pystring_namespace_cache.clear()
    contextparser_cache.clear()
    step_cache.clear()
```

Contents of test.yaml:
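A minimal test.yaml consistent with the behavior described below could look something like this - a hypothetical reconstruction, since the exact steps aren't shown in the thread:

```yaml
# hypothetical reconstruction of test.yaml
steps:
  - name: pypyr.steps.contextsetf
    in:
      contextSetf:
        myoutput: First run
```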
In between runs I update myoutput in test.yaml to say "Second run" and then attempt another go at it. Sadly, the second run still only prints out "First run". Any direction you can provide as to what I have wrong would be appreciated.
-
oh sorry @biggiebk, I clean forgot to mention that the file loader also keeps its own cache... Add the following to your code and it'll work:

```python
from pypyr.loaders.file import _file_cache as file_cache
file_cache.clear()
```

I don't know exactly how you're envisaging your user flows... it might be that your shell has an explicit command "clear cache" or "reload" or something that explicitly clears the caches, so you give your users control over exactly when they want the relatively expensive reloads & yaml parsing to happen (I guess a user should have a pretty good idea that they just edited some yaml, therefore -> reload)... or maybe for your use case it's better just always to assume the assets need reloading.

With this in mind, for future: a single call to clear all the caches,

```python
from pypyr import pipelinerunner

context = pipelinerunner.run(pipeline_name='my-pipe')
pipelinerunner.clear_cache_all()
```

and a config switch to disable caching entirely:

```python
from pypyr.config import config
from pypyr import pipelinerunner

config.no_cache = True
context = pipelinerunner.run(pipeline_name='my-pipe')
```

The idea would be that no-cache mode would bypass the cache entirely, loading everything fresh each time. Word of warning, I suspect the no-cache mode might still end up with these caveats:

A) Python itself caches imported modules for the lifetime of the process, so changes to custom modules on disk won't be picked up without an explicit reload:

```python
import importlib
importlib.reload(module)
```

Be warned this is generally not something you should do willy-nilly. Some python code implicitly assumes that the module is a singleton and will load once and once alone, so you might surprise developers who have some initialization side effects in the module. This is not really a pypyr thing, it's how python works.

To make this clear... if a pypyrista creates a custom step in `./mystep.py`:

```python
# ./mystep.py
def run_step(context):
    print("this is my custom step")
```

The python runtime will keep on using the version of `mystep` it first imported, even if the file changes on disk afterwards. I have a suspicion you're probably more interested in the pipeline yaml changes, so this shouldn't affect you too much - but still something to keep in mind and communicate to your users so they don't get surprised!

B) custom retry backoff strategies getting cached unless we take some extra care there... I guess you could argue this is getting into some esoteric terrain, however - I'll need to think through the pros and cons on this a bit more.
-
@yaythomas, thanks, that did the trick. As for the cache, I was originally thinking I'd like it just disabled, or just cleared out after each run, however I see this differently now. I'd like the user to be able to make that decision based on what they are doing. Essentially, if they are making a lot of pipeline updates they would be able to switch into a dev mode that either disables the cache or clears it after each run. Plus, provide a clear-cache command should somebody want to sanitize the env or they just made a quick change to a pipeline. I think the ability to clear all the cache quickly, and to disable it, would be nice features that I would love to have available. Understood on the python behavior.
-
hey @biggiebk, I just published a new release with #316 and #317. Please see release notes here: https://pypyr.io/updates/releases/v5.8.0/

You'll find both the "clear all" and the "disable cache" features there.

You can use any of the usual config sources in pypyr to set `no_cache` mode (toml, yaml or $env), but if you just want to do it in code I also added an example for you in the docs here: https://pypyr.io/docs/api/run-pipeline/#disable-cache

Hope this helps!