From: Mikael Abrahamsson <swmike@swm.pp.se>
To: Chris Murphy <lists@colorremedies.com>
Cc: Linux RAID <linux-raid@vger.kernel.org>
Subject: Re: Filesystem corruption on RAID1
Date: Mon, 21 Aug 2017 10:37:16 +0200 (CEST)
Message-ID: <alpine.DEB.2.20.1708211027370.3655@uplift.swm.pp.se>
In-Reply-To: <CAJCQCtQNx=8-16Xu1ffxqYh04W3mDy_qiPSysBw-g=fwWOtDMA@mail.gmail.com>

On Sun, 20 Aug 2017, Chris Murphy wrote:

> Since md doesn't read from both mirrors, it's possible there's a read 
> from a non-corrupt drive, which presents good information to fsck, which 
> then sees no reason to fix anything in that block; but the other mirror 
> does have corruption which thus goes undetected.

That was exactly what I wrote.

> One way of dealing with it is to scrub (repair) so they both have the 
> same information to hand over to fsck. Fixups then get replicated to 
> disks by md.

Yes, it is, but that would require a full repair before doing fsck. That 
seems excessive because that will take hours on larger drives.
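
For reference, a full repair pass is requested by writing "repair" to the 
array's sync_action file in sysfs ("check" only counts mismatches, which 
show up in mismatch_cnt). A minimal Python sketch of that; the helper 
names and the sysfs_root parameter are my own, added only so the code can 
be tried against a scratch directory instead of a live array:

```python
from pathlib import Path

# Actions this sketch allows; md accepts others as well.
VALID_ACTIONS = {"check", "repair", "idle"}


def request_sync_action(md_name, action, sysfs_root="/sys/block"):
    """Write a scrub action to <sysfs_root>/<md_name>/md/sync_action.

    Needs root on a real system. sysfs_root is a hypothetical knob for
    testing, not something md itself knows about.
    """
    if action not in VALID_ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")
    path = Path(sysfs_root) / md_name / "md" / "sync_action"
    path.write_text(action + "\n")
    return path


def read_mismatch_cnt(md_name, sysfs_root="/sys/block"):
    """Return the mismatch counter reported after a check/repair pass."""
    path = Path(sysfs_root) / md_name / "md" / "mismatch_cnt"
    return int(path.read_text().strip())
```

On a real machine this would be request_sync_action("md0", "repair"), and 
then waiting the hours mentioned above before fsck can start.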

> Another way is to split the mirror (make one device faulty), and then
> fix the remaining drive (now degraded). If that goes well, the 2nd
> device can be re-added. Here's a caveat thought: how it resync's will
> depend on the write-intent bitmap being present. I have no idea if
> write-intent bitmaps on two drives can get out of sync and what the
> ensuing behavior is, but I'd like to think md will discover the fixed
> drive event count is higher than the re-added one, and if necessary
> does a full resync, rather than possibly re-introducing any
> corruption.

This doesn't solve the problem, because it never checks whether the second 
mirror is out of sync with the first one: on re-add, md only resyncs the 
regions that were written while the array was degraded. It doesn't fix the 
"fsck read the block and it was fine, but on the second drive it's not 
fine" case.
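
A toy model of what I mean, in Python. This is not md code, just lists 
standing in for drives and a set standing in for the write-intent bitmap; 
all names are made up for illustration:

```python
def bitmap_resync(good, readded, dirty_blocks):
    """Copy only the blocks flagged dirty while degraded: good -> readded."""
    for blk in sorted(dirty_blocks):
        readded[blk] = good[blk]
    return readded


# Two mirrors; block 2 is silently corrupt on mirror B.
mirror_a = ["a0", "a1", "a2", "a3"]
mirror_b = ["a0", "a1", "XX", "a3"]

# B is marked faulty. fsck on the degraded array reads block 2 from A,
# finds it fine and writes nothing there; it only rewrites block 1.
dirty = set()
mirror_a[1] = "a1-fixed"
dirty.add(1)

# B is re-added and resynced from the bitmap.
bitmap_resync(mirror_a, mirror_b, dirty)

still_differs = [i for i, (x, y) in enumerate(zip(mirror_a, mirror_b)) if x != y]
print(still_differs)  # [2]
```

Block 1 gets copied over, but block 2 was never written during the 
degraded period, so the bad copy on B survives the re-add.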

In that case fsck would have to be modified to rewrite every block it 
reads, marking them dirty so that they get synced.

However, this again causes the problem that if there is a URE 
(unrecoverable read error) on the remaining drive of the degraded array, 
things will fail.

The only way to solve this is to add more code to implement a new mode 
which would be "repair-on-read".

I understand that we can't necessarily detect which drive has the right or 
wrong information, but this way we can at least make sure that when fsck 
is done, all the inodes and other metadata are consistent. Everything 
that fsck touched during the fsck will be consistent across all drives, 
with correct parity. It might not contain the "best" information that 
could have been presented by a more intelligent algorithm/metadata, but at 
least it's better than today when after a fsck run you don't know if 
parity is correct or not.

It would also be a good diagnostic tool for admins. If you suspect that 
you're getting inconsistencies and you're fine with the performance 
degradation, md could log the inconsistencies somewhere so you know about 
them.
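
To make the "repair-on-read" idea concrete, here is a rough Python sketch 
for the mirror case. No such mode exists in md today; the class name, the 
"first copy wins" policy and the log list are all assumptions of mine. For 
parity RAID the equivalent step would be to recompute and rewrite parity 
for the stripe that was read.

```python
class RepairOnReadMirror:
    """Toy RAID1: every read compares all copies and rewrites mismatches."""

    def __init__(self, mirrors):
        self.mirrors = mirrors  # one list of blocks per drive
        self.log = []           # block numbers found inconsistent

    def read(self, blk):
        copies = [m[blk] for m in self.mirrors]
        # We can't tell which copy is right, so pick the first one.
        chosen = copies[0]
        if any(c != chosen for c in copies):
            self.log.append(blk)      # diagnostic record for the admin
            for m in self.mirrors:
                m[blk] = chosen       # make all drives agree
        return chosen


array = RepairOnReadMirror([["a0", "a1", "a2"], ["a0", "XX", "a2"]])
for blk in (0, 1):  # the blocks fsck happens to touch
    array.read(blk)
print(array.log)  # [1]
```

After the run, everything that was read is identical on both drives, and 
the log says which blocks were not. Whether the surviving copy is the 
"best" one is a separate question, as noted above.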

-- 
Mikael Abrahamsson    email: swmike@swm.pp.se
