Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not swallow PicklingError #1359

Merged
merged 11 commits into from Feb 15, 2023

Conversation

Nielius
Copy link
Contributor

@Nielius Nielius commented Nov 8, 2022

Currently, when caching fails because the function output can't be pickled, there is no warning, but the output will not be cached. This surprises users, who expect their output to be cached.

This is related to #952 and #896 . It does not solve the underlying race condition in #896, but it does implement the suggestion made by @tomMoral in this comment: it just raises an error when the caching fails due to pickling.

@codecov
Copy link

codecov bot commented Nov 8, 2022

Codecov Report

Base: 93.32% // Head: 93.96% // Increases project coverage by +0.63% 🎉

Coverage data is based on head (e4d1c2a) compared to base (83ba40f).
Patch coverage: 100.00% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1359      +/-   ##
==========================================
+ Coverage   93.32%   93.96%   +0.63%     
==========================================
  Files          52       52              
  Lines        7300     7328      +28     
==========================================
+ Hits         6813     6886      +73     
+ Misses        487      442      -45     
Impacted Files Coverage Δ
joblib/_store_backends.py 91.58% <100.00%> (+0.81%) ⬆️
joblib/memory.py 95.26% <100.00%> (+0.01%) ⬆️
joblib/test/test_store_backends.py 94.54% <100.00%> (+3.11%) ⬆️
joblib/test/test_memory.py 98.58% <0.00%> (+0.14%) ⬆️
joblib/test/test_parallel.py 97.05% <0.00%> (+0.23%) ⬆️
joblib/_parallel_backends.py 93.75% <0.00%> (+0.73%) ⬆️
joblib/func_inspect.py 92.34% <0.00%> (+1.09%) ⬆️
joblib/logger.py 86.84% <0.00%> (+1.31%) ⬆️
joblib/_memmapping_reducer.py 95.88% <0.00%> (+1.49%) ⬆️
joblib/disk.py 92.06% <0.00%> (+1.58%) ⬆️
... and 4 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

Copy link
Contributor

@tomMoral tomMoral left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello,

Thanks a lot for this PR :)
It looks simple enough but this change could break existing code, so we should do the change in two steps:

  • First raise a warning (instead of swallowing the error).
  • Then in a later version change to raise the PicklingError.

Could you change this to raise a warning instead? Also could you please move the try/except in the write_func?

This might be worth it to add a test, to make sure the error is correctly raised.

joblib/_store_backends.py Outdated Show resolved Hide resolved
@Nielius
Copy link
Contributor Author

Nielius commented Nov 12, 2022

Hi,

Thanks for the quick feedback @tomMoral !

I've addressed your feedback by:

  • moving the try/except block
  • changing to a warning instead of raising an exception
  • adding a unit test to test that a warning is being logged

In addition, I've also changed the catch-all except block to log a warning. When I was writing the test, I noticed how frustrating it was that it literally swallows every kind of error. I think the warnings.warn( ...) is better than just doing nothing --- do you agree?

Copy link
Contributor

@tomMoral tomMoral left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding the tests!!

LGTM, just a small question for putting a dedicated try/except for concurrent folder and raising others in the future?

'failed to pickle output. '
'In future versions of joblib '
'this will raise an exception.'
'Exception: %s.' % e,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We try to avoid the old formating synthax in the new code base, so please use:

Suggested change
'Exception: %s.' % e,
f'Exception: {e}.'

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, could you please rewrap the string to better use the available <80 char limit of PEP8?

Some people complain than 80 is not enough, others seem reluctant to use more than 60 apparently ;) 80 is a good middle ground.

Comment on lines 202 to 206
except Exception as e: # noqa: E722
warnings.warn(
'Unable to cache to disk. '
'Possibly a race condition in the creation of the directory. '
'Exception: %s' % e)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we are changing a bit this, I would be in favor of sanitizing this, by putting a try/except for OSError in create_location, and having this raise a deprecation warning saying that any other error will be raised in future version of joblib. WDYT?

Small nitpick on style.

Suggested change
except Exception as e: # noqa: E722
warnings.warn(
'Unable to cache to disk. '
'Possibly a race condition in the creation of the directory. '
'Exception: %s' % e)
except Exception as e: # noqa: E722
warnings.warn(
'Unable to cache to disk. '
'Possibly a race condition in the creation of the directory. '
f'Exception: {e}'
)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So following the conversation with @ogrisel , I would not do a deprecation warning but just a warning for the error.
Maybe create a specfic CacheWarning that will be easy to filter out?

@ogrisel
Copy link
Contributor

ogrisel commented Nov 16, 2022

We could even have a constructor parameter on the Memory object: error="ignore", error="warn", or error="raise".

I would be in favor of using error="warn" by default but users who prefer other options are free to switch easily.

@tomMoral
Copy link
Contributor

This feels a bit YAGNI to me. There are already many parameters that users don't understand, I would avoid adding some if possible.
The rational of swallowing error linked to dumping was that caching should not create extra problems due to OS (memory/ disk space/collision). So thinking back, I would say that raising a warning for any error except pickling one is the right behavior.
That way, users can implement the three behaviors simply with warning filters? WDYT?

@ogrisel
Copy link
Contributor

ogrisel commented Nov 17, 2022

Alright then.

Copy link
Contributor

@jeremiedbb jeremiedbb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@tomMoral, @ogrisel, I applied the requested modifications regarding formatting and added a new warning for all errors but pickling error (it's available from joblib.memory).

Copy link
Contributor

@tomMoral tomMoral left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks @Nielius and @jeremiedbb for finishing this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants