MDB_NOTFOUND: No matching key/data pair found after full disk #587

datryn-ribdun · 2024-01-21T15:57:47Z

I just found out that the VPS hosting my 3 urbit ships had run out of disk space, and so those ships had crashed. I ran chop on 2 of them and they started working fine. My main (~datryn-ribdun) was giving me a "loom corrupt" error that mentioned "north" (the exact error was lost after some VPS reboots), but I remembered that deleting <pier>/.urb/chk triggers a replay of all events, which resolves snapshot corruption issues.
Now when I run datryn-ribdun/.run --loom 32 (my usual command) I get the typical start lines, followed by a failure:

urbit 2.12
boot: home is /home/urbit/urbit-ships/datryn-ribdun
loom: mapped 2048MB
lite: arvo formula 2a2274c9
lite: core 4bb376f0
lite: final state 4bb376f0
loom: mapped 4096MB
boot: protected loom
live: logical boot
boot: installed 661 jets
------------------playback starting ----------------------
pier: replaying events 1-2907645618
lmdb: read: initial cursor_get failed at 1: MDB_NOTFOUND: No matching key/data pair found
pier: disk read bail

I also tried <pier>/.run vile and got vile: unable to extract key file
I was pretty confident that deleting <pier>/.urb/chk was safe, but now I'm worried I somehow deleted some key file. <pier>/.urb/log is still 37G, so I believe I still have my event history. Any ideas how to proceed? I'd hate to have to breach my main.

Vere 2.12

mrdomino · 2024-01-21T16:14:57Z

Issue is also present for roll on the develop branch. AFAICT it happens any time the checkpoint is deleted on a pier that has been rolled or (apparently) chopped. ~~Nothing to do with the full disk.~~

mrdomino · 2024-01-21T16:20:45Z

It may not be that cut-and-dried. I should say instead: I have also experienced the MDB_NOTFOUND error on piers that have been rolled on 3.0 prerelease.

My testing so far (IIRC - this was yesterday or so) has revealed, all on 3.0 prerelease:

Delete chk, no roll: pier replays events successfully
Delete chk, roll: MDB_NOTFOUND
Don't delete chk, roll: no errors

datryn-ribdun · 2024-01-21T16:24:25Z

^^ Seems like a related issue, but I never ran roll and never even had a successful chop because it was complaining about loom being corrupted.

mrdomino · 2024-01-22T01:29:40Z

The roll issue seems easily resolvable; just a matter of the correct checkpoint not being copied in. Manually copying in the north.bin and south.bin from the checkpoint fixes it.

Are there any contents under .urb/chk on your pier? In the error state, I had a north.bin and south.bin that were both size 0.
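
A quick way to check for the error state described here (north.bin and south.bin present but size 0) is sketched below. This is only an illustration, not part of vere: the function name `empty_checkpoint_images` is made up, and it assumes the checkpoint lives at <pier>/.urb/chk with exactly those two image files, as in this thread.

```python
from pathlib import Path


def empty_checkpoint_images(pier, names=("north.bin", "south.bin")):
    """Return the image names under <pier>/.urb/chk that are missing or 0 bytes.

    Hypothetical helper: the directory layout and file names are taken
    from this thread, not from any vere API.
    """
    chk = Path(pier) / ".urb" / "chk"
    bad = []
    for name in names:
        image = chk / name
        # A missing file and a zero-length file are both treated as unusable.
        if not image.is_file() or image.stat().st_size == 0:
            bad.append(name)
    return bad
```

An empty list means both images exist and are non-empty; it says nothing about whether their contents are valid.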

datryn-ribdun · 2024-01-27T03:50:02Z

Yup, I see:

-rw-rw-r-- 1 urbit urbit    0 Jan 15 22:38 north.bin
-rw-rw-r-- 1 urbit urbit    0 Jan 15 22:38 south.bin

urbit is my user on this vm.

datryn-ribdun · 2024-01-27T03:52:52Z

I just tried rm -r .urb/chk followed by ./.run play and got the following:

loom: mapped 2048MB
boot: protected loom
live: logical boot
boot: installed 661 jets
lmdb: read: initial cursor_get failed at 1: MDB_NOTFOUND: No matching key/data pair found
boot: read failed
mars: boot fail

mrdomino · 2024-01-27T14:25:10Z

You don't have any other good checkpoints, e.g. under bhk?

datryn-ribdun · 2024-01-28T04:30:29Z

I had no idea bhk was a backup that could be swapped in for chk.
I tried a cp bhk/* chk/ and started the ship. It's been replaying for a few hours, so hopefully this will work.
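
For anyone repeating the cp bhk/* chk/ step, here is a slightly more defensive sketch of the same idea. It assumes the ship is stopped, that the backup lives at <pier>/.urb/bhk, and that only north.bin and south.bin need copying (the cp above copies everything in bhk). The function name `restore_checkpoint_from_backup` is made up for illustration, and nothing in this thread confirms that the restore leads to a successful boot.

```python
import shutil
from pathlib import Path


def restore_checkpoint_from_backup(pier, names=("north.bin", "south.bin")):
    """Copy checkpoint images from <pier>/.urb/bhk into <pier>/.urb/chk.

    Hypothetical helper: paths and file names follow this thread.
    Raises FileNotFoundError if any backup image is missing or empty,
    before anything in chk is touched.
    """
    urb = Path(pier) / ".urb"
    bhk, chk = urb / "bhk", urb / "chk"

    # Validate every backup image first so a bad backup never overwrites chk.
    for name in names:
        src = bhk / name
        if not src.is_file() or src.stat().st_size == 0:
            raise FileNotFoundError(f"no usable backup image: {src}")

    chk.mkdir(parents=True, exist_ok=True)
    copied = []
    for name in names:
        shutil.copy2(bhk / name, chk / name)  # copy2 also preserves timestamps
        copied.append(name)
    return copied
```

Checking the backup before copying avoids replacing chk with another pair of 0-byte files.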

Assuming this fixes things, there are probably 2 things that could be improved in vere:

1. If there is no .urb/chk/ directory, why does vere make one, create a 0-byte north.bin and south.bin, and then complain that "No matching key/data pair found"? It seems like it should fail before this point with something like "No bin files found, did you delete chk/? Try moving the .bin files from .urb/bhk into .urb/chk." IDK on the wording, but some way to not scare the user into thinking their ship is perma-broken.
2. Not filling the disk to 0 bytes remaining. Once the disk is full, it's a pain to have to find something to delete, then chop, then boot the ship to make sure things work, then delete the chop backup. I might be overfitting and thinking this is a more general problem than it actually is, but anyone who runs on a cheap VPS probably has <100GB of disk, and a well-used ship can easily pass that if you're not regularly chopping.
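
Until something like this exists in vere, the disk-space concern can be approximated from outside with a free-space check run before booting or chopping. This is a minimal sketch, not vere behavior: `has_headroom` is a made-up name and the threshold is whatever the operator picks.

```python
import shutil


def has_headroom(path, min_free_bytes):
    """Return True if the filesystem holding `path` has at least
    `min_free_bytes` free.

    Hypothetical helper; the threshold is an operator choice, not a
    value taken from vere.
    """
    return shutil.disk_usage(path).free >= min_free_bytes
```

For example, `has_headroom("/home/urbit/urbit-ships", 10 * 2**30)` would report whether at least 10 GiB is free on the filesystem holding the ships (the 10 GiB figure is arbitrary).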

datryn-ribdun · 2024-01-28T16:59:18Z

After many hours of

pier: ($event_number): play: done
pier: ($event_number+1): play: done

my terminal was spammed with

recover: top: meme

recover: top: meme

recover: top: meme
.....
....
loom: external fault: 0x50

datryn-ribdun · 2024-01-28T21:47:03Z

Trying again with ./.run play --loom 32 and killing all other RAM heavy processes running on this VPS.

datryn-ribdun · 2024-01-29T04:52:08Z

I tried with the above command, and even ./.run --loom 33 on the theory that adding some loom headroom might help, but every time I hit the same issue:

recover: top: meme
loom: external fault: 0x50 (0x20000000 : 0x280000000)

Assertion '0' failed in pkg/noun/manage.c:1791
home:bailing out
Aborted
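
For reference on the --loom values tried above: the boot log at the top of this thread shows "loom: mapped 4096MB" for --loom 32, and the run without the flag shows 2048MB, which is consistent with the value being a power-of-two exponent in bytes. Assuming that reading holds (it is inferred from the logs here, not checked against the vere source), the mapping works out as follows. `loom_megabytes` is a made-up name.

```python
def loom_megabytes(bits):
    """Size in MB of a loom of 2**bits bytes.

    Assumes --loom N means 2**N bytes, which matches the
    'loom: mapped 4096MB' line logged for --loom 32 in this thread.
    """
    return 2**bits // 2**20


for bits in (31, 32, 33):
    print(f"--loom {bits}: {loom_megabytes(bits)}MB")
```

Under that assumption --loom 33 asks for 8192MB. Whether the VPS can actually back that much memory is a separate question this sketch does not answer.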

Tenari · 2024-05-13T21:40:08Z

Seems related: urbit/urbit#6989
