linux-fsdevel.vger.kernel.org archive mirror
From: "Guilherme G. Piccoli" <gpiccoli@canonical.com>
To: NeilBrown <neilb@suse.com>, linux-raid@vger.kernel.org, shli@kernel.org
Cc: kernel@gpiccoli.net, jay.vosburgh@canonical.com,
	dm-devel@redhat.com, linux-block@vger.kernel.org,
	linux-fsdevel@vger.kernel.org
Subject: Re: [RFC] [PATCH 0/1] Introduce emergency raid0 stop for mounted arrays
Date: Thu, 2 Aug 2018 10:30:57 -0300
Message-ID: <7cfa220e-c04d-b69e-fe39-5b9277f0538d@canonical.com>
In-Reply-To: <87tvodpmc8.fsf@notabene.neil.brown.name>

On 01/08/2018 22:51, NeilBrown wrote:
>> [...] 
> If you have a hard drive and some sectors or tracks stop working, I think
> you would still expect IO to the other sectors or tracks to keep
> working.
> 
> For this reason, the behaviour of md/raid0 is to continue to serve IO to
> working devices, and only fail IO to failed/missing devices.
> 

Hi Neil, thanks for your quick response. I agree with you about the
potential sector failure: it shouldn't automatically fail the entire
array because of a single failed write.

The check I'm using in the patch is against the member device's request
queue: if a raid0 member's queue is dying/dead, we consider that device
dead and, as a consequence, mark the whole array dead.
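A userspace sketch of that check, for illustration only; it is not the patch's code. The types and names (`struct member`, `struct raid0_array`, `queue_dying`, `raid0_check_members`) are hypothetical stand-ins: `queue_dying` models "this member's request queue is dying/dead", and the array-level `dead` flag is assumed to be sticky once set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one raid0 member: only the queue state matters here. */
struct member {
	const char *name;
	bool queue_dying;	/* stands in for "request queue is dying/dead" */
};

/* Hypothetical model of a raid0 array and its members. */
struct raid0_array {
	struct member *members;
	size_t nr_members;
	bool dead;		/* once set, the whole array is treated as failed */
};

/*
 * Mark the whole array dead if any member's queue is dying/dead.
 * The flag is sticky: a member reappearing does not clear it.
 * Returns true if the array is (now) dead.
 */
static bool raid0_check_members(struct raid0_array *arr)
{
	assert(arr != NULL);

	for (size_t i = 0; i < arr->nr_members; i++) {
		if (arr->members[i].queue_dying) {
			arr->dead = true;
			break;
		}
	}
	return arr->dead;
}
```

Note the asymmetry with the sector-failure case: a single failed write never sets `dead`; only a member whose queue is going away does.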

In my understanding of raid0/striping, data is split among N devices,
called raid members. If one member fails, data written to the array will
certainly be corrupted, since the "portion" of data going to the failed
device won't be stored.
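To make the striping argument concrete, here is a simplified mapping sketch. It assumes all members are the same size, a fixed chunk size in sectors, and chunks assigned round-robin across members; md's real raid0 layout also handles members of different sizes, which this ignores. The names `raid0_map` and `sectors_on_member` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Where a given array sector lands in the simplified raid0 layout. */
struct stripe_loc {
	unsigned int member;	/* index of the member device */
	uint64_t member_sector;	/* sector offset on that member */
};

/*
 * Simplified raid0 mapping: equal-size members, fixed chunk size,
 * chunks assigned round-robin across the members.
 */
static struct stripe_loc raid0_map(uint64_t sector, uint64_t chunk_sectors,
				   unsigned int nr_members)
{
	assert(chunk_sectors > 0 && nr_members > 0);

	uint64_t chunk = sector / chunk_sectors;	/* which chunk of the array */
	uint64_t in_chunk = sector % chunk_sectors;	/* offset inside that chunk */
	struct stripe_loc loc = {
		.member = (unsigned int)(chunk % nr_members),
		.member_sector = (chunk / nr_members) * chunk_sectors + in_chunk,
	};
	return loc;
}

/*
 * Count how many sectors of the range [start, start + len) map to
 * member `failed`, i.e. how much of that range is lost if it is missing.
 */
static uint64_t sectors_on_member(uint64_t start, uint64_t len,
				  uint64_t chunk_sectors, unsigned int nr_members,
				  unsigned int failed)
{
	uint64_t lost = 0;

	for (uint64_t s = start; s < start + len; s++)
		if (raid0_map(s, chunk_sectors, nr_members).member == failed)
			lost++;
	return lost;
}
```

With 2 members and 128-sector chunks, a 1024-sector write starting at sector 0 puts 512 sectors on member 1, so losing that member leaves holes throughout any file spanning several chunks, while a write confined to a single chunk on the surviving member is unaffected.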

Regarding the current behavior, one test I ran was to remove one device
from a 2-disk raid0 array and then write a file. The write completed
normally (no errors from the userspace perspective), and I hashed the
file with md5. I then rebooted the machine, the raid0 array came back
with both devices, and guess what?
The written file was there, but corrupted (its hash was different). I
don't think this is acceptable: a user could write important data and
never realize it was being corrupted while writing.


> It seems reasonable that you might want a different behaviour, but I
> think that should be optional.  i.e. you would need to explicitly set a
> "one-out-all-out" flag on the array.  I'm not sure if this should cause
> reads to fail, but it seems quite reasonable that it would cause all
> writes to fail.
> 
> I would only change the kernel to recognise the flag and refuse any
> writes after any error has been seen.
> I would use udev/mdadm to detect a device removal and to mark the
> relevant component device as missing.
>

Using udev/mdadm to notice that a member has failed and that the array
must be stopped might work; it was my first approach. The main issue here
is timing: it takes "some time" until userspace is aware of the failure,
so there is a window in which writes are sent between

(A) the array member failing/being removed and
(B) mdadm noticing and instructing the driver to refuse new writes;

Between (A) and (B), those writes are seen as completed, since they are
indeed complete, at least from the page cache's point of view. Writeback
then tries to flush them, and they either fail or end up stored in
corrupted form (the file will be present in the array's filesystem after
the array is restored, but corrupted).
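The (A)-(B) window can be modelled as a simple timeline. This sketch assumes writes issued before (A) have already been written back, writes issued between (A) and (B) are acknowledged by the page cache but at risk at writeback, and writes from (B) onward are refused. The names and millisecond timestamps are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Outcome of a write in the simplified timeline model. */
enum write_fate {
	WRITE_OK,	/* issued before the member failed, at (A) */
	WRITE_AT_RISK,	/* acknowledged via page cache, but between (A) and (B) */
	WRITE_REFUSED,	/* issued at or after the refusal point (B) */
};

/*
 * Classify a write issued at t_ms, given the member failure time a_ms
 * and the time b_ms at which new writes start being refused.
 */
static enum write_fate classify_write(unsigned long t_ms, unsigned long a_ms,
				      unsigned long b_ms)
{
	assert(a_ms <= b_ms);
	if (t_ms < a_ms)
		return WRITE_OK;
	if (t_ms < b_ms)
		return WRITE_AT_RISK;
	return WRITE_REFUSED;
}

/* Count the writes that land inside the (A)-(B) window. */
static size_t count_at_risk(const unsigned long *times_ms, size_t n,
			    unsigned long a_ms, unsigned long b_ms)
{
	size_t at_risk = 0;

	for (size_t i = 0; i < n; i++)
		if (classify_write(times_ms[i], a_ms, b_ms) == WRITE_AT_RISK)
			at_risk++;
	return at_risk;
}
```

For writes at 500, 1000, 1200, 2900, 3000 and 4000 ms with (A) = 1000 and (B) = 3000, three writes fall in the window; if (B) coincides with (A), as an in-kernel check aims for, none do.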

So the in-kernel mechanism closes most of the (A)-(B) window, although
we still seem to have problems with nested arrays because of this same
window, even with the in-kernel mechanism: it takes some time to remove
the top array when a "far" bottom member fails.
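A rough model of why nesting reopens the window, under the assumption (not established in the thread) that each level only inspects its direct members, so it can be marked dead only after the level below it is, and that each level takes a fixed, hypothetical delay to react. `struct nest_level` and `propagate_failure` are illustrative names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical level in a stack of nested arrays; level 0 is the physical disk. */
struct nest_level {
	const char *name;
	unsigned long notice_delay_ms;	/* assumed time for this level to react */
	bool dead;
};

/*
 * Propagate a bottom-member failure upward, one level at a time.
 * Returns the total time until the top level is marked dead, i.e. the
 * width of the window during which the top array still accepts writes.
 */
static unsigned long propagate_failure(struct nest_level *levels, size_t nr_levels)
{
	unsigned long elapsed_ms = 0;

	assert(levels != NULL || nr_levels == 0);
	if (nr_levels == 0)
		return 0;

	levels[0].dead = true;	/* the physical bottom member fails at t = 0 */
	for (size_t i = 1; i < nr_levels; i++) {
		/* level i can only notice once its direct member (level i - 1) is dead */
		elapsed_ms += levels[i].notice_delay_ms;
		levels[i].dead = true;
	}
	return elapsed_ms;
}
```

In this model the window grows with nesting depth: a disk under an inner raid0 (5 ms) under a top raid0 (7 ms) leaves the top array accepting writes for 12 ms after the disk disappears.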

Any suggestions on how to deal with this definitively are highly
appreciated.
Thanks,


Guilherme

Thread overview: 7+ messages
2018-08-01 14:56 [RFC] [PATCH 0/1] Introduce emergency raid0 stop for mounted arrays Guilherme G. Piccoli
2018-08-01 14:56 ` [RFC] [PATCH 1/1] md/raid0: Introduce emergency stop for raid0 arrays Guilherme G. Piccoli
2018-08-02  1:51 ` [RFC] [PATCH 0/1] Introduce emergency raid0 stop for mounted arrays NeilBrown
2018-08-02 13:30   ` Guilherme G. Piccoli [this message]
2018-08-02 21:37     ` NeilBrown
2018-08-09 23:17       ` Guilherme G. Piccoli
2018-09-03 12:16         ` Guilherme G. Piccoli