Re: Corrupted filesystem, looking for guidance

From: Chris Murphy <lists@colorremedies.com>
To: "Sébastien Luttringer" <seblu@seblu.net>
Cc: linux-btrfs <linux-btrfs@vger.kernel.org>
Subject: Re: Corrupted filesystem, looking for guidance
Date: Tue, 12 Feb 2019 15:57:41 -0700
Message-ID: <CAJCQCtSJyDOhw=oCN5kjrZexu0H9pOOiPGyJBGRNUtn2wruKAw@mail.gmail.com>
In-Reply-To: <7ef0e91501a04cd4c5e0d942db638a0b50ef3ec3.camel@seblu.net>

On Mon, Feb 11, 2019 at 8:16 PM Sébastien Luttringer <seblu@seblu.net> wrote:

>
> I'm not able to find the root cause of the btrfs corruption. All disks looks
> healthy (selftest ok, no error logged), no kernel trace of link failure or
> something.
> I run a check on the md layer, and 2 mismatch was discovered:
> Feb 11 04:02:35 kernel: md127: mismatch sector in range 490387096-490387104
> Feb 11 04:31:14 kernel: md127: mismatch sector in range 1024770720-1024770728
> I run a repair (resync) but mismatch are still around after.
>

Both mismatches span eight 512-byte sectors, which is consistent with bad
data on a single 4096-byte physical sector of an Advanced Format drive.
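
A minimal sketch of that arithmetic, using the two ranges from the quoted
log lines. It assumes the end of each range is exclusive, which is what
makes each range eight sectors long; range_bytes is a hypothetical helper
name, not part of any tool.

```shell
# range_bytes START-END
# Print the length of an md mismatch range in 512-byte sectors and in bytes.
# Assumption: the END value is exclusive.
range_bytes() {
    start=${1%-*}
    end=${1#*-}
    sectors=$((end - start))
    echo "$sectors sectors, $((sectors * 512)) bytes"
}

range_bytes 490387096-490387104
range_bytes 1024770720-1024770728
```

Both calls report 8 sectors and 4096 bytes, i.e. one physical sector each.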

About this command:

echo repair > /sys/block/mdX/md/sync_action

FYI: it only does full-stripe reads, recomputes parity, and overwrites the
parity strip. It assumes the data strips are correct, so long as the
underlying member devices do not return a read error. And the only way they
can return a read error is if their SCT ERC time is less than the kernel's
SCSI command timer. Otherwise the kernel gives up on the command before the
drive reports the error, the bad sector never gets rewritten, and errors
can accumulate.

smartctl -l scterc /dev/sdX
cat /sys/block/sdX/device/timeout

The first must be a shorter time than the second. Mind the units: smartctl
reports SCT ERC in tenths of a second, while the sysfs timeout is in
seconds. If the first is disabled and can't be enabled, then the generally
accepted assumed maximum time for recoveries is an almost unbelievable 180
seconds, so the second needs to be set to 180. That setting is not
persistent: you'll need a udev rule or startup script to set it at every
boot.
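
As a rough sketch of that comparison, with the two values copied by hand
from the commands above: erc_fits_timeout is a hypothetical helper, and the
sample values (ERC 70, timer 30) are illustrative rather than taken from
the reporter's system.

```shell
# erc_fits_timeout ERC_DECISECONDS SCSI_TIMEOUT_SECONDS
# Succeeds when the drive stops retrying a bad sector before the kernel
# gives up on the command. smartctl prints SCT ERC in units of 100 ms
# (70 means 7.0 seconds); the sysfs timeout file is in seconds.
erc_fits_timeout() {
    [ "$1" -lt "$(( $2 * 10 ))" ]
}

# Illustrative values: ERC of 7.0 seconds against a 30 second command timer.
if erc_fits_timeout 70 30; then
    echo "ok: drive reports the error before the kernel times out"
else
    echo "not ok: lower SCT ERC or raise the SCSI command timer"
fi

# If SCT ERC cannot be enabled, the startup script would do something like
# this as root for each member drive (sdX is a placeholder):
#   echo 180 > /sys/block/sdX/device/timeout
```

The comparison is strict on purpose: an ERC time equal to the command timer
still leaves no margin for the drive to report the error.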

It is sufficient to merely run a check, rather than repair, to trigger the
proper md RAID fixup from a device read error.

Getting a mismatch on a check means there's a hardware problem somewhere.
The mismatch count only tells you there is a mismatch between data strips
and their parity strip. It doesn't tell you which device is wrong. And if
there are no read errors, and no link resets, and yet you get mismatches,
that suggests silent data corruption. Further, if the mismatches are
consistently in the same sector range, it suggests the repair scrub
returned one set of data, and the subsequent check scrub returned different
data - that's the only way you get mismatches following a repair scrub.
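
To see whether the mismatches keep landing in the same place, the ranges
can be pulled out of the kernel log after each scrub and compared. The
following is only a sketch: mismatch_ranges is a hypothetical helper, and
it assumes the log lines look like the ones quoted at the top of this
message.

```shell
# mismatch_ranges
# Read kernel log lines on stdin and print the distinct sector ranges that
# md reported as mismatched, one per line.
mismatch_ranges() {
    sed -n 's/.*mismatch sector in range \([0-9][0-9]*-[0-9][0-9]*\).*/\1/p' |
        sort -u
}

# Possible usage after a check or repair has finished (md127 is the array
# from the report above):
#   dmesg | mismatch_ranges
#   cat /sys/block/md127/md/mismatch_cnt
```

If the same ranges show up after a repair and again after the following
check, that matches the situation described above.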

All Btrfs can do in this case is fall back on its second copy of the
metadata, and that only helps if it was using DUP metadata. Then it can
recover, so long as the origin of the problem isn't a memory defect. If
it's bad RAM, then chances are both copies of metadata will be identically
wrong and thus no help in recovery.

> How could I save my filesystem? Should I try --repair or --init-csum-tree?

If it mounts read-only, update your backups. That is the first priority. Be
prepared to need them. If it will not mount read-only anymore, then I
suggest 'btrfs restore' to scrape data out of the volume to a backup while
it's still possible. Any repair attempt means writing changes, and any
writes are inherently risky in this situation. So yeah - if the data is
important, focus on backups first.
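
A minimal sketch of that order of operations, assuming the array is
/dev/md127 as in the log above. rescue_data is a hypothetical wrapper, and
the mount point and destination are placeholders; the destination must be
on a different, healthy filesystem.

```shell
# rescue_data DEVICE MOUNTPOINT DEST
# Try a read-only mount first; if that fails, fall back to btrfs restore,
# which reads from the unmounted device and copies files into DEST.
# Neither path writes to the damaged filesystem.
rescue_data() {
    if mount -o ro "$1" "$2"; then
        echo "mounted read-only at $2; copy your data to $3 now"
    else
        btrfs restore -v "$1" "$3"
    fi
}

# Possible usage, as root:
#   rescue_data /dev/md127 /mnt /path/to/backup
```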

Next, I expect that until the RAID is healthy, it will be difficult to
repair the file system successfully. And for the RAID to be healthy, the
memory and storage hardware first need to be known-good: the fact that
there are mismatches following an md repair scrub directly suggests
hardware issues. The linux-raid list is usually quite helpful in tracking
down such problems, including which devices are suspect, but they're going
to ask the same questions about SCT ERC and SCSI command timer values I
mentioned earlier, and will want to figure out why you're continuing to see
mismatches even after a repair scrub, which is not normal.

---
Chris Murphy