-
I am building out a shell that would use pypyr pipelines for commands to help automate common engineering tasks. Given that this is a running shell, the pipelines could potentially run more than once before exiting, so I will either need to disable the cache or clear it. Do you have any guidance on how this is best accomplished?
Replies: 5 comments 1 reply
-
hey @biggiebk, this sounds like an awesome project and a nifty use for pypyr! Let me know how it goes, I'd love to see it in action or your work in progress!

As you've probably noticed, the pypyr cache is alive for as long as the process is. The cache is in memory, so it dies alongside the process. In other words, if you're spawning pypyr as a subprocess from your shell, then each pypyr invocation will have its own cache - you don't need to clear anything, you'll get a fresh cache instance in each spawned subprocess.

If you're calling pypyr as an API from your shell, however, the cache will stay alive as long as the calling process is alive. This is generally not a bad thing. The most expensive operation in pypyr by a wide, wide margin is parsing the pipeline yaml. For snappy pipeline executions when running via API, you probably don't want to clear the pipeline cache just for giggles. There is no objection to the same pipeline running more than once from cache - in fact, that's what the cache is for: to speed up execution for subsequent runs of the same pipeline.

If the pipeline yaml has NOT changed, and no underlying step/parser/backoff modules have changed, then you DON'T need to reload the cache. However, the cache won't auto-reload any changes made to the original pipeline yaml on disk if you updated your pipeline since the 1st time that pipeline ran in the current process.

If you do want to pick up changes to the underlying pipeline yaml since the 1st run, or changes to custom modules like steps, then you can refresh the caches like this:

```python
from pypyr.cache.loadercache import loader_cache

# clear all cached pipelines
loader_cache.clear_pipes()

# or only clear pipelines for the specified loader
loader_cache.clear_pipes(loader_name='mypackage.my_custom_loader')
```

There are a couple of other caches too - in each case, calling `clear()` empties that cache:

```python
from pypyr.cache.backoffcache import backoff_cache
backoff_cache.clear()

from pypyr.cache.loadercache import loader_cache
# notice you can call clear() independently from clear_pipes().
# clear() will effectively result in pipelines being cleared also,
# because pipelines cache by loader. i.e. if you call .clear() you
# don't need to call clear_pipes().
loader_cache.clear()

from pypyr.cache.namespacecache import pystring_namespace_cache
pystring_namespace_cache.clear()

from pypyr.cache.parsercache import contextparser_cache
contextparser_cache.clear()

from pypyr.cache.stepcache import step_cache
step_cache.clear()
```
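If your shell ends up clearing every cache in one go, the individual `clear()` calls above could be wrapped in a single helper. This is just a sketch, assuming pypyr is installed - the function name is hypothetical, not part of pypyr's API:

```python
def clear_all_pypyr_caches():
    """Clear every pypyr in-process cache so the next run reloads fresh.

    Imports live inside the function so this sketch only needs pypyr
    at call time, not at definition time.
    """
    from pypyr.cache.backoffcache import backoff_cache
    from pypyr.cache.loadercache import loader_cache
    from pypyr.cache.namespacecache import pystring_namespace_cache
    from pypyr.cache.parsercache import contextparser_cache
    from pypyr.cache.stepcache import step_cache

    # loader_cache.clear() also drops cached pipelines, so no separate
    # clear_pipes() call is needed here.
    for cache in (backoff_cache, loader_cache, pystring_namespace_cache,
                  contextparser_cache, step_cache):
        cache.clear()
```

Your shell could then expose this as a single "reload" command.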
-
Yeah, I think it should be an interesting project and I will definitely let you know once I have something up and running.

I still do not seem to be able to get the cache to clear when using the API calls. Here is a quick and dirty example of what I am doing.

Example python:

```python
from pypyr import pipelinerunner
from pypyr.cache.loadercache import loader_cache
from pypyr.cache.backoffcache import backoff_cache
from pypyr.cache.namespacecache import pystring_namespace_cache
from pypyr.cache.parsercache import contextparser_cache
from pypyr.cache.stepcache import step_cache

while True:
    input()
    context = pipelinerunner.run(pipeline_name='test')
    print(context['myoutput'])
    backoff_cache.clear()
    loader_cache.clear()
    pystring_namespace_cache.clear()
    contextparser_cache.clear()
    step_cache.clear()
```

Contents of test.yaml:
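A minimal test.yaml consistent with the behavior described below could look something like this - a hypothetical reconstruction, since the exact steps aren't shown in the thread:

```yaml
# hypothetical reconstruction of test.yaml
steps:
  - name: pypyr.steps.contextsetf
    in:
      contextSetf:
        myoutput: First run
```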
In between runs I update myoutput in test.yaml to say "Second run" and then attempt another go at it. Sadly, the second run still only prints out "First run". Any direction you can provide as to what I have wrong would be appreciated.
-
oh sorry @biggiebk, I clean forgot to mention that the file loader also keeps its own cache... Add the following to your code and it'll work:

```python
from pypyr.loaders.file import _file_cache as file_cache
file_cache.clear()
```

I don't know exactly how you're envisaging your user flows... it might be that your shell has an explicit command "clear cache" or "reload" or something that explicitly clears the caches, so you give your users control over exactly when they want the relatively expensive reloads & yaml parsing to happen (I guess a user should have a pretty good idea that they just edited some yaml, therefore -> reload)... or maybe for your use case it's better just always to assume the assets need reloading.

With this in mind, for future: a single call to clear all the caches,

```python
from pypyr import pipelinerunner

context = pipelinerunner.run(pipeline_name='my-pipe')
pipelinerunner.clear_cache_all()
```

and a config switch to disable caching entirely:

```python
from pypyr.config import config
from pypyr import pipelinerunner

config.no_cache = True
context = pipelinerunner.run(pipeline_name='my-pipe')
```

The idea would be that no-cache mode would bypass the cache entirely, loading everything fresh each time. Word of warning, I suspect the no-cache mode might still end up with these caveats:

A) Python itself caches imported modules for the lifetime of the process, so changes to custom modules on disk won't be picked up without an explicit reload:

```python
import importlib
importlib.reload(module)
```

Be warned this is generally not something you should do willy-nilly. Some python code implicitly assumes that the module is a singleton and will load once and once alone, so you might surprise developers who have some initialization side effects in the module. This is not really a pypyr thing, it's how python works.

To make this clear... if a pypyrista creates a custom step in `./mystep.py`:

```python
# ./mystep.py
def run_step(context):
    print("this is my custom step")
```

The python runtime will keep on using the version of `mystep` it first imported, even if the file changes on disk afterwards. I have a suspicion you're probably more interested in the pipeline yaml changes, so this shouldn't affect you too much - but still something to keep in mind and communicate to your users so they don't get surprised!

B) custom retry backoff strategies getting cached unless we take some extra care there... I guess you could argue this is getting into some esoteric terrain, however - I'll need to think through the pros and cons on this a bit more.
-
@yaythomas, thanks, that did the trick. As for the cache, I was originally thinking I'd like it just disabled, or just cleared out after each run, however I see this differently now. I'd like the user to be able to make that decision based on what they are doing. Essentially, if they are making a lot of pipeline updates they would be able to switch into a dev mode that either disables the cache or clears it after each run. Plus, provide a clear-cache command should somebody want to sanitize the env or they just made a quick change to a pipeline. I think the ability to clear all the cache quickly, and to disable it, would be nice features that I would love to have available. Understood on the python behavior.
-
hey @biggiebk, I just published a new release with #316 and #317. Please see release notes here: https://pypyr.io/updates/releases/v5.8.0/

You'll find both the "clear all" and the "disable cache" features there.

You can use any of the usual config sources in pypyr to set `no_cache` mode (toml, yaml or $env), but if you just want to do it in code I also added an example for you in the docs here: https://pypyr.io/docs/api/run-pipeline/#disable-cache

Hope this helps!