
BTree persisted in ZODB throws KeyError when accessing item clearly in the Btree - maxKey() causes an Access Violation #102

Open
matthchr opened this issue Mar 15, 2019 · 8 comments


@matthchr

I have a BTree which I am persisting via ZODB, and in which I add and remove items over time. Somehow I've ended up with a BTree that contains a key for a particular item but raises a KeyError when I try to actually access that item. Whatever happened to this BTree has been persisted to my ZODB: even closing and reloading from the DB reproduces the same behavior.

In the code snippets below, repro is the BTree in question, loaded from ZODB.

print('job-0000000014' in repro)  # prints True
print('job-0000000386' in repro)  # prints False
for item in repro:
    print(item)

shows

...
job-0000000014
...
job-0000000386

Finally, this code raises a KeyError on job-0000000386:

for item in repro:
    print('{} = {}'.format(item, repro[item]))

I stepped through a bit in a debugger and it seems that this item is the last item in the last bucket, and interestingly, I can get the item from the bucket directly -- the following code works fine and returns the object t2.

bucket = repro._firstbucket
while bucket._next is not None:
    bucket = bucket._next
t1 = bucket.get('job-0000000281')
t2 = bucket.get('job-0000000386')

Digging a bit more, it seems that calling repro.maxKey() triggers an access violation: -1073741819 (0xC0000005)

If you need more details about the structure of the BTree that repros the problem, I can share them with you (unfortunately I am not able to construct a new one which reproduces the problem deterministically, but I have a copy of one from a ZODB).

@d-maurer

Is the problem reproducible on the result of pickling and then unpickling your tree?
If it is, you could attach the pickle of your tree to this issue and we would be able to analyse the tree. If it is not, pickling and unpickling gives you a way to mostly repair your tree.
Of course, it is possible that your tree can no longer be pickled.
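
A minimal round-trip check might look roughly like this (a sketch, assuming repro is the loaded tree from the snippets above and its contents can be pickled outside the ZODB connection):

    import pickle

    # round-trip the damaged tree through a plain pickle
    copy = pickle.loads(pickle.dumps(repro))

    # check whether the same key still misbehaves on the copy
    print('job-0000000386' in copy)
    try:
        print(copy['job-0000000386'])
    except KeyError:
        print('KeyError survives the pickle round trip')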

Note that you should also tell which ZODB version you are using.

A BTree consists of two parts: a tree-shaped access structure and a linked list of buckets. Your observation indicates that the access structure is damaged. Fortunately, the entire relevant information is contained in the linked bucket list, so you can rebuild the tree from it: create a new tree, iterate over the bucket list and put the buckets' contents into the new tree. Thus, you will be able to recover. Nevertheless, it might be interesting to learn in what way the structure is damaged.
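
A rebuild along those lines might look roughly like this (a sketch, assuming the tree is an OOBTree named repro as in the snippets above; use whichever BTree flavour you actually store):

    from BTrees.OOBTree import OOBTree

    # create a fresh tree and refill it from the intact linked bucket list
    rebuilt = OOBTree()
    bucket = repro._firstbucket
    while bucket is not None:
        for key, value in bucket.items():
            rebuilt[key] = value
        bucket = bucket._next
    # store rebuilt in place of the damaged tree and commit the transaction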

@matthchr

matthchr commented Mar 15, 2019

@d-maurer thanks for the quick response. We're using ZODB 5.5.1, BTrees 4.5.1

Yes, the problem persists through pickling -- even modifying the broken BTree in a variety of ways (such as removing other items and then re-persisting it) doesn't make the issue go away.

I've attached the BTree in question saved into a basic ZODB with the same version mentioned above, located at matthchr.zip -- unzip it and it contains matthchr.db.

You can crack it open like so:

    import ZODB

    zodb = ZODB.DB('matthchr.db')
    conn = zodb.open()
    repro = conn.root()['a']

I'm not super interested in repairing this (there's not any critical data in the BTree which I am afraid of losing) -- more in understanding either what we did wrong to cause this, or if there is a bug in ZODB/BTrees which needs to be fixed.

@d-maurer

d-maurer commented Mar 15, 2019 via email

@d-maurer

d-maurer commented Mar 15, 2019 via email

@matthchr

The instance where we saw this bug may have had concurrent writes. We do our best to avoid them via locking, but after I saw this I went and looked and realized there's at least one place where we weren't locking properly and could've allowed a concurrent write (an add with a simultaneous delete).

I don't think there would have been many, though. At most 2-3 concurrent writes would've happened, certainly not more than that.

I've seen 5-6 additional repros of this issue since I started looking for it. I'll make some changes to improve our locking and see if that resolves the problem, as a concurrency issue is what it felt like to me as well...
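
A minimal sketch of the kind of locking described, assuming a single in-process lock shared by all writers (the function names here are made up for illustration):

    import threading

    tree_lock = threading.Lock()  # shared by every code path that mutates the tree

    def add_job(tree, key, value):
        # serialize adds and deletes so two threads never mutate the BTree at once
        with tree_lock:
            tree[key] = value

    def remove_job(tree, key):
        with tree_lock:
            del tree[key]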

@d-maurer

d-maurer commented Mar 16, 2019 via email

@d-maurer

d-maurer commented Mar 16, 2019 via email

@d-maurer

d-maurer commented Mar 19, 2019 via email
