Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inconsistent Delete Behavior #246

Open
ESattler opened this issue Nov 10, 2022 · 5 comments
Open

Inconsistent Delete Behavior #246

ESattler opened this issue Nov 10, 2022 · 5 comments

Comments

@ESattler
Copy link

Version: 22.1.0

I noticed in my larger script inconsistent behavior where it was throwing an error deleting when it shouldn't, so I recreated a smaller example:

from collections import OrderedDict

test = OrderedDict([
    ('name', 'dog'), 
    ('test', '12345'), 
    ('inputs', OrderedDict([
        ('animal', 'cat'), 
        ('test-value', 'test'),
    ])), 
    ('test2', [OrderedDict([
        ('name', 'dog2')
    ])])
])

import pprint

pprint.pprint(test)

import glom

print(glom.glom(test, "inputs.animal"))
glom.delete(test, "inputs.animal")

pprint.pprint(test)

If I run this over and over, sometimes it works and sometimes it doesn't and throws an error

glom.mutation.PathDeleteError: error raised while processing, details below.
 Target-spec trace (most recent last):
 - Target: OrderedDict([('name', 'dog'), ('test', '12345'), ('inputs', OrderedDict([('animal', 'c... (len=4)
 - Spec: Delete(Path('inputs', 'animal'))
 - Spec: Path('inputs')
glom.mutation.PathDeleteError: could not delete 'animal' on object at Path('inputs'), got error: AttributeError('animal')

Which doesn't make any sense. it successfully prints out what is at inputs.animal but fails then to delete that path and it's inconsistent. Out of 10 runs in a row, 3 of them failed for deleting.

I tried version 20.11.0 too out of curiosity and it didn't seem to change anything.

@ESattler
Copy link
Author

Digging into it deeper, I've figured out that when it fails, it's because here _delete returns a <built-in function delattr> but when it passes, it is <built-in function delitem>

@kurtbrose
Copy link
Collaborator

I suspect what is happening here is that delete is mutating the dict, and the second time you run it, it fails.

Here's a quick example script, you can see as long as we use a fresh input it works, but if we re-use the same data structure twice we get that exact error.

from collections import OrderedDict
import glom

def build():
   return OrderedDict([
      ('name', 'dog'), 
      ('test', '12345'), 
      ('inputs', OrderedDict([
          ('animal', 'cat'), 
          ('test-value', 'test'),
      ])), 
      ('test2', [OrderedDict([
          ('name', 'dog2')
      ])])
   ])

glom.delete(build(), "inputs.animal")
glom.delete(build(), "inputs.animal")
glom.delete(build(), "inputs.animal")

test = build()
glom.delete(test, "inputs.animal")
glom.delete(test, "inputs.animal")

You mentioned this was a minimization of an error in a more complex case. Maybe the original case is different?

Also, does delete destructively mutating its input not what you expected? Were you expecting it to return a mutated copy?

@ESattler
Copy link
Author

@kurtbrose So in my actual use case we are reading in a YAML file and modifying it with glom to delete stuff and then dumping it back. I noticed sometimes it would update and sometimes it wouldn't and it's because I had ignore missing set to true which was masking the error of it not finding the element even though it existed.

In my script example above, I was actually just straight running that exact script over and over and sometimes it would fail and sometimes it wouldn't but as you can see, I'm not using the same dictionary. Start of the script we initialize test and then perform actions on it. So maybe it could be a memory issue with Python? But I am fresh running those exact lines and seeing it inconsistently work and fail randomly.

Out of curiosity, did you run my example above? I'm curious if you can replicate it. I actually decided to try this on my home computer running Windows and using the Windows Subsystem for Linux and I can't get it to fail. When it was failing before, it was on a Macbook and the other people who tested it for me were also running Mac, so it makes me wonder if some weird memory or caching issue with Mac and running the script.

@mahmoud
Copy link
Owner

mahmoud commented Jan 18, 2023

Hey! So I tried this back in November and had no luck, but now on 3.10 I think I've repro'd the behavior! It's super late here, and the issue is super funky, so forgive these rough notes.

First, a bit about my repro. When the script above is run with python -m pdb, it succeeds. Without the -m pdb it fails. Adding unconditional prints in glom core, same effect; remove the prints, and the failure came back. Always on the second try though, which makes me suspect there's something ugly in __pycache__. Some sort of import side effect? Likely something to do with dict/set random seed (does pdb use a consistent seed?). Very wild!

Next: Quick fix. Explicitly register the OrderedDict type here. We'd been relying on dict functionality to pick it up, but clearly that doesn't always work as intended. Why does this work?

At the moment, when glom encounters a type that it doesn't recognize, it tries to find the closest type it knows about. Then it attempts to use the operation handler associated with that. This has two implications:

  1. Unrelated abstract data types can win this "closest type" check. Here, ObjStyleKeys was being declared the closest type to OrderedDict, due to some nondeterminism somewhere.

More importantly:

  1. Autodiscover functions are only run against explicitly registered types. Delete's autodiscover reaches the correct conclusion about OrderedDict. Might need to rethink this, especially now that we have op type caching.

Anyways, super interesting bug, thanks for reporting it and making such a great minimal case! This was a tough one to report and I'm really glad you did.

I'll push the quick fix for OD momentarily, for release with glom 23 this week. Deeper fix to follow. Thanks again!

@kurtbrose
Copy link
Collaborator

Oops, I see now I missed in the original report:

got error: AttributeError('animal')

That means it was trying to do __delattr__ on an attribute, not __delitem__ on a key.

A temporary work-around would be to use a T; that removes ambiguity and glom doesn't try to infer which operator is correct for the type.

glom.delete(test, T["inputs"]["animal"])

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants