Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Odd database corruption #72

Open
DanielOaks opened this issue Mar 26, 2021 · 5 comments
Open

Odd database corruption #72

DanielOaks opened this issue Mar 26, 2021 · 5 comments

Comments

@DanielOaks
Copy link

DanielOaks commented Mar 26, 2021

Hi tidwall! We've been using buntdb in our irc server for a number of years now and just ran into a really odd instance of a corrupt db from a user. We've been looking into it over here: ergochat/ergo#1603

Basically, two commands in the db are truncated, in two separate places in the middle of the database file. The actual file itself isn't truncated, just those two entries, so it doesn't seem to be the same sort of error as #71?

The user shutdown our software successfully, and after a reboot buntdb just returns invalid database.


Here's the first instance of the failure w/ anonymised data (~60k lines into a ~69k line db file) where ~20 characters are truncated from the command. The third long json blob is what's cut off:

*3
$3
set
$35
account.lastseen danieliamanadminus
$635
{"":"2021-03-12T19:02:23.967591811Z","AdfdDfdsaSds":"2021-03-12T15:29:07.517666927Z","someoneAnywayLol":"2021-01-10T14:59:37.444691184Z","aliceo":"2021-01-10T10:32:25.489426821Z","koryCaccoAccountUwu":"2021-02-13T19:41:14.449738789Z","dananan":"2021-01-21T07:16:31.289440487Z","danantheman":"2021-01-03T12:19:36.737684151Z","MarioHey":"2021-01-13T07:49:24.638691227Z","MarMarMarMarioioioioi":"2021-03-12T14:06:04.567924428Z","ackYesIGotItYaBastardi":"2021-03-12T19:27:37.905129873Z","thisiseomth":"2021-02-03T07:48:35.702562025Z","anotheruserhere":"2021-01-23T10:01:09.970222719Z","mariomaybeidunnowh":"2021-02-24T15:29:32.572548859Z"}
*3
$3
set
$35
account.lastseen danieliamanadminus
$635
{"":"2021-03-12T19:26:52.925761759Z","AdfdDfdsaSds":"2021-03-12T15:29:07.517666927Z","someoneAnywayLol":"2021-01-10T14:59:37.444691184Z","aliceo":"2021-01-10T10:32:25.489426821Z","koryCaccoAccountUwu":"2021-02-13T19:41:14.449738789Z","dananan":"2021-01-21T07:16:31.289440487Z","danantheman":"2021-01-03T12:19:36.737684151Z","MarioHey":"2021-01-13T07:49:24.638691227Z","MarMarMarMarioioioioi":"2021-03-12T14:06:04.567924428Z","ackYesIGotItYaBastardi":"2021-03-12T19:27:37.905129873Z","thisiseomth":"2021-02-03T07:48:35.702562025Z","anotheruserhere":"2021-01-23T10:01:09.970222719Z","mariomaybeidunnowh":"2021-02-24T15:29:32.572548859Z"}
*3
$3
set
$35
account.lastseen danieliamanadminus
$635
{"":"2021-03-12T19:30:56.923724078Z","AdfdDfdsaSds":"2021-03-12T19:31:04.382088854Z","someoneAnywayLol":"2021-01-10T14:59:37.444691184Z","aliceo":"2021-01-10T10:32:25.489426821Z","koryCaccoAccountUwu":"2021-02-13T19:41:14.449738789Z","dananan":"2021-01-21T07:16:31.289440487Z","danantheman":"2021-01-03T12:19:36.737684151Z","MarioHey":"2021-01-13T07:49:24.638691227Z","MarMarMarMarioioioioi":"2021-03-12T14:06:04.567924428Z","ackYesIGotItYaBastardi":"2021-03-12T19:27:37.905129873Z","thisiseomth":"2021-02-03T07:48:35.702562025Z","anotheruserhere":"2021-01-23T10:01:09.970222719Z","mariomaybeidunnowh":"2021-02-24T1*3
$3
set
$20
account.lastseen ab4
$42
{"nvuer":"2021-03-12T19:45:13.201442563Z"}
*3
$3
set
$19
account.lastseen z1
$41
{"qygd":"2021-03-12T20:02:29.353327449Z"}
*3
$3
set
$35
account.lastseen danieliamanadminus
$635
{"":"2021-03-12T20:03:24.975077942Z","AdfdDfdsaSds":"2021-03-12T19:33:43.461586221Z","someoneAnywayLol":"2021-01-10T14:59:37.444691184Z","aliceo":"2021-01-10T10:32:25.489426821Z","koryCaccoAccountUwu":"2021-02-13T19:41:14.449738789Z","dananan":"2021-01-21T07:16:31.289440487Z","danantheman":"2021-01-03T12:19:36.737684151Z","MarioHey":"2021-01-13T07:49:24.638691227Z","MarMarMarMarioioioioi":"2021-03-12T14:06:04.567924428Z","ackYesIGotItYaBastardi":"2021-03-12T19:27:37.905129873Z","thisiseomth":"2021-02-03T07:48:35.702562025Z","anotheruserhere":"2021-01-23T10:01:09.970222719Z","mariomaybeidunnowh":"2021-02-24T15:29:32.572548859Z"}
*3
$3
set
$20
account.lastseen ab4
$42
{"nvuer":"2021-03-12T20:45:13.201454136Z"}

The above instance returns an error at line 842 because it gets dumped in the middle of a line.


The second instance looks almost identical, just that ~103 characters are cut off and it happens to fall on a \r\n so buntdb thinks it's finished a command, tries to read a new one, and gets an a instead of a * (since * indicates 'new command') and returns an error on line 781. This second instance is about 63k lines into the 69k line db file.

There's no extra data inserted into the db at all, just the end of those two long lines that are missing.


If it can help, we open up our db here and we save that key in the Update call here.

Thanks again for the great library, it's done really well for our uses so far! I wanted to track this down further, but figured it'd be wise to ask first since it looked strange. If you're after any more info just let me know.

@tidwall
Copy link
Owner

tidwall commented Mar 26, 2021

Hi Daniel.

I agree that it does not seem to be the same sort of error as #71.

All data is sequentially appended to the tail of buntdb db file it should not be possible to write in the middle of the file. Also, loading a corrupted db file or a file with partially written entries (as in #71) is not allowed and buntdb will returns an error, so it shouldn't be caused by a previous process failing to finish writing data or a bad fsync.

The only thing that I can think that might cause an issue like this is if two or more processes opened the buntdb database at the same time and each tried to write entries at the same time.

@tidwall
Copy link
Owner

tidwall commented Mar 29, 2021

I've been thinking about this issue some more. I'm wondering if I should not introduce an flock feature to BuntDB. If the problem is related to multiple processes clobbering the file then perhaps this could help.

@tidwall
Copy link
Owner

tidwall commented Apr 1, 2021

I've been banging around in the code and I saw that there is a write operation that could partially write data and still fail.

It's likely not common, but in the case of low system resources, it could happen. Which would leave the database in a corrupted state.

I wrote a test utility to reproduce the issue.

https://gist.github.com/tidwall/e823bcb75558c2a590cc17227d630ee1

I fixed the problem b6a7490 and pushed a new version v1.2.3.

@slingamn
Copy link

slingamn commented Apr 2, 2021

Thanks so much! I agree that this is more likely than concurrent file access. We were thinking that if concurrent file access were causing this, the second half of the write would have to be interleaved somewhere else in the file, but we couldn't find it.

@tidwall
Copy link
Owner

tidwall commented Apr 2, 2021

You’re welcome. I’m glad it’s squared away. Let me know if you happen upon any more problems. Thanks for the detailed reporting.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants