I have a mirror pool experiencing some data corruption. It looks to have been caused by a PCIe issue while writing, which garbled a few blocks so they can't be retrieved from either disk. Only a few checksum errors show up during scrubs.
The corrupted file in question is a KVM qcow2 disk image, so I thought perhaps I could find the damage from inside the guest, fix it, and maybe the now-unallocated block(s) would fall off in a scrub and everything would be fine.
As part of this cleanup, I also removed some old snapshots from the dataset in question. After doing the above, I tried to run a scrub. This process hung. See error below.
Describe how to reproduce the problem
I have no idea how to reproduce this. The assert line seems to point at the ashift size being incorrect on a block somewhere, but I have no idea how that could have happened.
Include any warning/errors/backtraces from the system logs
I've tried booting to rescue mode, but the same thing happens when I attempt to import the pool. Is there any way to recover the data? It seems like the data on disk should still be good aside from this one place where it encounters a mismatch in the ashift value, or is the entire vdev toast?
Specifically, the error means "the offset we are trying to free is not a valid offset on this vdev" - e.g. if you had an ashift 12 (4k) vdev, and tried to free something that was 512b into it, you'd trip this.
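That validity condition boils down to a simple alignment test. Here's an illustrative sketch in shell with made-up values (this is not the actual ZFS assert, just the arithmetic behind it):

```shell
#!/bin/sh
# Illustrative only: on an ashift=12 (4k) vdev, every offset ZFS
# allocates or frees must be a multiple of 1 << ashift.
ashift=12
sector=$((1 << ashift))   # 4096 bytes
offset=512                # a free 512b into the vdev

if [ $((offset % sector)) -eq 0 ]; then
    echo "offset $offset: valid on ashift=$ashift"
else
    echo "offset $offset: misaligned, would trip the assert"
fi
```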
You might be able to import the pool read-only (since it then won't try to free anything or update space accounting) and yank your data off that way.
I did try a read-only import but it threw the same error message and hung again. I just have the one vdev and it is 4k ashift. Would this be pointing at a single block's offset or is this something about the metadata of the entire vdev?
Edit: I've also tried
echo 1 >> /sys/module/zfs/parameters/zfs_recover
Followed by a read-only import with no luck. Same panic error.
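For reference, the full sequence I'm attempting looks like this ("tank" stands in for my actual pool name):

```shell
# Enable ZFS recovery mode, which relaxes some assertions on damaged pools.
echo 1 > /sys/module/zfs/parameters/zfs_recover

# Then attempt the read-only import; "tank" is a placeholder pool name.
zpool import -o readonly=on tank
```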
You might be able to use the zdb feature in git (not yet in a release) to emit send streams from things too damaged to import normally, and get datasets off that way.
I believe it's referring to thinking it has an element to free from a metaslab, and then finding the free is misaligned - e.g. the metaslab is 4k aligned and it's trying to remove something 512b into it. So it is a specific object, but one that's already pending removal as far as the pool is concerned.
You could try a read-only import at an older txg, for one of the older txgs still listed in the uberblocks; that might or might not fly. You could also use zdb to "simulate" whether that's going to panic, rather than letting your kernel do it.
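A rough sketch of what that could look like, with placeholder device/pool names and txg number (flags from memory - double-check the zdb and zpool-import man pages; `zpool import -T` is the not-really-documented rewind-to-txg option, as I understand it):

```shell
# Dump the vdev labels; repeating -l increases verbosity so the
# uberblock ring is shown, letting you pick a surviving older txg.
zdb -lll /dev/disk/by-id/DISK-part1

# Let zdb walk the pool at that txg in userspace first, so any panic
# happens in zdb rather than your kernel ("tank" and the txg are placeholders).
zdb -e -t 1234567 tank

# If that looks sane, try the read-only rewind import at that txg.
zpool import -o readonly=on -T 1234567 tank
```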