From: NeilBrown <neilb@suse.com>
To: "Guilherme G. Piccoli" <gpiccoli@canonical.com>,
	linux-raid@vger.kernel.org, shli@kernel.org
Cc: gpiccoli@canonical.com, kernel@gpiccoli.net,
	jay.vosburgh@canonical.com, dm-devel@redhat.com,
	linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org
Subject: Re: [RFC] [PATCH 0/1] Introduce emergency raid0 stop for mounted arrays
Date: Thu, 02 Aug 2018 11:51:35 +1000
Message-ID: <87tvodpmc8.fsf@notabene.neil.brown.name>
In-Reply-To: <20180801145608.19880-1-gpiccoli@canonical.com>

On Wed, Aug 01 2018, Guilherme G. Piccoli wrote:

> Currently the md driver relies completely on userspace to stop an
> array in case of some failure. There's an interesting case for raid0: if
> we remove a raid0 member, like PCI hot(un)plugging an NVMe device, and
> the raid0 array is _mounted_, mdadm cannot stop the array, since the tool
> tries to open the block device (to perform the ioctl) with the O_EXCL flag.
>
> So, in this case the array is still alive - users may write to this
> "broken-yet-alive" array and unless they check the kernel log or some
> other monitor tool, everything will seem fine and the writes are completed
> with no errors. To be more precise, direct writes will not work, but since
> writes are usually done in the regular way, i.e., backed by the page
> cache, the most common scenario is a user being able to regularly write
> to a broken raid0, and get all their data corrupted.
>
> PROPOSAL:
> The idea proposed here to fix this behavior is to mimic other block devices:
> if one has a filesystem mounted on a block device on top of an NVMe or
> SCSI disk and the disk gets removed, writes are prevented, errors are
> observed and it's obvious something is wrong. Same goes for USB sticks,
> which are sometimes even removed physically from the machine without
> getting their filesystem unmounted before.
>
> We believe right now the md driver is not behaving properly for raid0
> arrays (it is handling these errors for other levels though). The approach
> taken for raid-0 is basically an emergency removal procedure, in which I/O
> is blocked from the device, the regular clean-up happens and the associated
> disk is deleted. It went through extensive testing, as detailed below.
>
> It is not all roses; we have some caveats that need to be resolved.
> Feedback is _much appreciated_.

If you have a hard drive and some sectors or tracks stop working, I think
you would still expect IO to the other sectors or tracks to keep
working.

For this reason, the behaviour of md/raid0 is to continue to serve IO to
working devices, and only fail IO to failed/missing devices.
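
To make the point above concrete, here is a toy userspace model (not kernel
code, and not taken from the patch) of how a striped array decides which
member an IO lands on. The 128-sector chunk size, the equal-sized members and
the names map_sector() and submit_io() are assumptions for illustration only:

```python
import errno

CHUNK_SECTORS = 128  # assumed chunk size: 64 KiB in 512-byte sectors


def map_sector(sector, n_members, chunk_sectors=CHUNK_SECTORS):
    """Return (member index, sector on that member) for an array sector.

    Assumes a simple layout with equal-sized members.
    """
    chunk, offset = divmod(sector, chunk_sectors)
    stripe, member = divmod(chunk, n_members)
    return member, stripe * chunk_sectors + offset


def submit_io(sector, n_members, missing):
    """Return 0 on success, or -EIO only if the target member is missing."""
    member, _ = map_sector(sector, n_members)
    return -errno.EIO if member in missing else 0
```

With two members and member 1 missing, submit_io(0, 2, {1}) still succeeds
while submit_io(128, 2, {1}) fails, which is the "serve IO to working
devices" behaviour in miniature.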

It seems reasonable that you might want a different behaviour, but I
think that should be optional.  i.e. you would need to explicitly set a
"one-out-all-out" flag on the array.  I'm not sure if this should cause
reads to fail, but it seems quite reasonable that it would cause all
writes to fail.

I would only change the kernel to recognise the flag and refuse any
writes after any error has been seen.
I would use udev/mdadm to detect a device removal and to mark the
relevant component device as missing.
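
A rough sketch of how such a flag could behave, again as a toy userspace
model rather than kernel code. The class name Raid0Model, the
one_out_all_out attribute and mark_missing() are hypothetical. The model
also assumes that marking a member missing counts as "an error has been
seen", and it leaves reads unaffected by the flag, which is one possible
answer to the open question above rather than a decision:

```python
import errno


class Raid0Model:
    """Toy model of a striped array with an optional one-out-all-out flag."""

    def __init__(self, n_members, chunk_sectors=128, one_out_all_out=False):
        self.n_members = n_members
        self.chunk_sectors = chunk_sectors
        self.one_out_all_out = one_out_all_out  # hypothetical per-array flag
        self.missing = set()
        self.error_seen = False

    def _member(self, sector):
        # Member index that holds this array sector (equal-sized members).
        return (sector // self.chunk_sectors) % self.n_members

    def mark_missing(self, member):
        # Stands in for udev/mdadm reporting a removed component device.
        # Assumption: this counts as "an error has been seen".
        self.missing.add(member)
        self.error_seen = True

    def write(self, sector):
        # With the flag set, refuse every write once any error was seen.
        if self.one_out_all_out and self.error_seen:
            return -errno.EIO
        if self._member(sector) in self.missing:
            self.error_seen = True
            return -errno.EIO
        return 0

    def read(self, sector):
        # Reads fail only on the missing member in this model.
        return -errno.EIO if self._member(sector) in self.missing else 0
```

Without the flag, a write to a sector on a surviving member keeps
succeeding after mark_missing(); with one_out_all_out=True the same write
returns -EIO.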

NeilBrown

