Re: Corrupted filesystem, looking for guidance

From: Chris Murphy <lists@colorremedies.com>
To: "Sébastien Luttringer" <seblu@seblu.net>
Cc: Chris Murphy <lists@colorremedies.com>,
	linux-btrfs <linux-btrfs@vger.kernel.org>
Subject: Re: Corrupted filesystem, looking for guidance
Date: Sat, 23 Feb 2019 17:00:42 -0700
Message-ID: <CAJCQCtQzqvtVyzZETUzsH+ZqD8QuFSOGd9fxpEPT3LkLXRbr9A@mail.gmail.com>
In-Reply-To: <4fd5e655c49278cf5967b2774ab34e4a0571f722.camel@seblu.net>

On Sat, Feb 23, 2019 at 11:14 AM Sébastien Luttringer <seblu@seblu.net> wrote:

> What I don't get is how this could end up to silent sector corruption or let
> accumulate bad sectors. A read timeout, a link reset will end up with an error
> kick at minimum one drive from the array, forcing a full rebuild. No?

No. Link resets don't result in a drive being kicked out of an array.

Bad sectors accumulate because after a link reset the drive never
reports a discrete read error with the sector's LBA, and md needs that
to know which sector to repair and where to obtain the good copy (the
mirror, or stripe reconstruction from parity if parity raid).
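
As a toy model of those two outcomes, here's a short Python sketch. It
is not md's actual logic: read_outcome and its parameters are names
made up for illustration, and the 30 second figure is the default Linux
SCSI command timer.

```python
def read_outcome(drive_recovery_s, command_timer_s=30.0):
    """Toy model of one read that hits a bad sector.

    drive_recovery_s: how long the drive retries internally before it
        gives up and reports a read error (hypothetical parameter).
    command_timer_s: kernel SCSI command timer; 30 s is the Linux default.
    """
    if drive_recovery_s < command_timer_s:
        # The drive gives up first and reports a read error with the
        # failing LBA, so md can rebuild that sector from the mirror or
        # from parity and write it back.
        return "read error with LBA: md repairs the sector"
    # The kernel gives up first: the link is reset, the pending command
    # is dropped, and no LBA is ever reported, so nothing gets repaired.
    return "link reset, no LBA: bad sector is left in place"


for recovery_s, timer_s in [(7.0, 30.0), (120.0, 30.0), (120.0, 180.0)]:
    print(recovery_s, timer_s, "->", read_outcome(recovery_s, timer_s))
```

The third case is why raising the command timer helps on drives that
can't be told to give up sooner: the drive gets to report its error
before the kernel resets the link.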

>
> I discovered that my SAS drives have no such timeout and they don't need an ERC
> value to be defined. So, I updated my timeout to 180 when my drives are SATA
> and doesn't support ERC. Thanks a lot for making me discovering this.

SAS drives you probably don't need to worry about. I'm pretty sure all
of them do a fast error recovery in less than 30 seconds. I'm not sure
off hand how to find the actual recovery time, other than digging
through the manufacturer specs for that make/model.
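
For the SATA drives, the per-drive decision you describe can be
sketched roughly like this in Python. The helper name and its input are
hypothetical; the only real values are the 30 second kernel default,
the 180 you picked, and the fact that smartctl reports SCT ERC in
tenths of a second (0 meaning disabled).

```python
DEFAULT_COMMAND_TIMER_S = 30    # Linux default, /sys/block/<dev>/device/timeout
FALLBACK_COMMAND_TIMER_S = 180  # value used in this thread for drives without ERC


def pick_command_timer(erc_deciseconds):
    """Return the command timer in seconds to use for one drive.

    erc_deciseconds: the drive's SCT ERC read setting in tenths of a
        second (70 means 7.0 s), or None if the drive doesn't support
        ERC. A value of 0 means ERC is disabled.
    """
    if erc_deciseconds is None or erc_deciseconds == 0:
        # No bounded error recovery: give the drive plenty of time to
        # report the read error itself before the kernel resets the link.
        return FALLBACK_COMMAND_TIMER_S
    if erc_deciseconds / 10 < DEFAULT_COMMAND_TIMER_S:
        # The drive gives up well before the default timer fires.
        return DEFAULT_COMMAND_TIMER_S
    # ERC is set, but to something longer than the default timer.
    return FALLBACK_COMMAND_TIMER_S


print(pick_command_timer(70))    # ERC at 7.0 s
print(pick_command_timer(None))  # no ERC support
```

The point is only that the drive's recovery time has to be shorter than
the kernel's command timer, whichever side you end up adjusting.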

> > If you do want to move to strictly Btrfs, I suggest raid5 for data but
> > use raid1 for metadata instead of raid5. Metadata raid 5 writes can't
> > really be assured to be atomic. Using raid1 metadata is less fragile.
> Make sense. Is raid10 suitable (atomic) option for metadata? Looks like
> performance are better than raid1?

It performs better than raid1, but since a full metadata write can be
striped among multiple drives, you run into the same problem as with
parity raid: the metadata write isn't guaranteed to be complete until
all drives commit all parts of it to stable media. So it's maybe not
really atomic, it depends. I'd expect SAS drives don't lie, and
actually commit to stable media when they say they have. Therefore
barriers should work as expected.

> > --repair should be safe but even in 4.20.1 tools you'll see the man
> > page says it's dangerous and you should ask on list before using it.
> Few month ago I was strongly advised to ask here before calling repair.
> Are you saying that it's no more useful?

Ask on list before using it, or just realize you're taking a chance.
It's quite a lot safer than it was a few years ago. But it still
sometimes makes things worse.

> > Well at this point if you ran a those commands the file system is
> > different so you should refresh the thread by posting current normal
> > mount (no options) kernel messages; and also 'btrfs check' output
> > without repair; and also output from btrfs-debug-tree. If the problem
> > is simple enough and a dev has time it might be they get you a file
> > system specific patch to apply and it can be fixed. But it's really
> > important that you stop making changes to the file system in the
> > meantime. Just gather information. Be deliberate.
> It's a pity that there is yet no solution without involving a human. I'll not
> request developer time which could be used to improve the filesystem. :)

Well a lot of times they're able to improve the file system by
figuring out how to fix the edge cases that result in problems.

-- 
Chris Murphy