On Wed, Aug 01 2018, Guilherme G. Piccoli wrote:

> Currently the md driver relies completely on userspace to stop an
> array in case of some failure. There's an interesting case for raid0: if
> we remove a raid0 member, e.g. by PCI hot-unplugging an NVMe device, while
> the raid0 array is _mounted_, mdadm cannot stop the array, since the tool
> tries to open the block device (to perform the ioctl) with the O_EXCL flag.
>
> So, in this case the array is still alive - users may write to this
> "broken-yet-alive" array and, unless they check the kernel log or some
> other monitoring tool, everything will seem fine and the writes complete
> with no errors. To be more precise, direct writes will not work, but since
> writes are usually done the regular way, i.e., backed by the page
> cache, the most common scenario is a user being able to keep writing
> to a broken raid0 and getting all their data corrupted.
>
> PROPOSAL:
> The idea proposed here to fix this behavior is to mimic other block devices:
> if one has a filesystem mounted on a block device on top of an NVMe or
> SCSI disk and the disk gets removed, writes are prevented, errors are
> observed and it's obvious something is wrong. The same goes for USB sticks,
> which are sometimes even physically removed from the machine without
> their filesystem being unmounted first.
>
> We believe that right now the md driver is not behaving properly for raid0
> arrays (it does handle these errors for other levels, though). The approach
> taken for raid0 is basically an emergency removal procedure, in which I/O
> to the device is blocked, the regular clean-up happens and the associated
> disk is deleted. It went through extensive testing, as detailed below.
>
> It's not all roses: we have some caveats that need to be resolved.
> Feedback is _much appreciated_.

If you have a hard drive and some sectors or tracks stop working, I think
you would still expect IO to the other sectors or tracks to keep
working.

For this reason, the behaviour of md/raid0 is to continue to serve IO to
working devices, and only fail IO to failed/missing devices.
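As a rough illustration of that behaviour, here is a toy userspace
model (not the md code). It assumes equal-sized members and plain
round-robin striping; the chunk size and the names member_for_sector()
and submit_io() are made up for the example.

```python
import errno

CHUNK_SECTORS = 128  # assumed chunk size: 64 KiB with 512-byte sectors


def member_for_sector(sector, n_members, chunk_sectors=CHUNK_SECTORS):
    """Map an array sector to a member index (simple round-robin striping)."""
    return (sector // chunk_sectors) % n_members


def submit_io(sector, n_members, missing):
    """Return 0 if the target member is present, -EIO if it is missing.

    `missing` is the set of member indexes that have failed or been removed.
    IO that maps to any other member is unaffected.
    """
    member = member_for_sector(sector, n_members)
    if member in missing:
        return -errno.EIO
    return 0
```

With two members and member 1 missing, IO to the first chunk still
succeeds while IO to the second chunk fails, which is the "bad sectors
on a hard drive" analogy above.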

It seems reasonable that you might want a different behaviour, but I
think that should be optional.  i.e. you would need to explicitly set a
"one-out-all-out" flag on the array.  I'm not sure if this should cause
reads to fail, but it seems quite reasonable that it would cause all
writes to fail.

I would only change the kernel to recognise the flag and refuse any
writes after any error has been seen.
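
Very roughly, and only as a userspace sketch of the policy rather than
kernel code, the flag could behave like this. The class name
Raid0Model and the attribute one_out_all_out are hypothetical, and
reads are left alone here since that part is an open question.

```python
import errno
from dataclasses import dataclass


@dataclass
class Raid0Model:
    """Toy model of an array with an optional "one-out-all-out" flag."""
    one_out_all_out: bool = False  # hypothetical opt-in flag, off by default
    error_seen: bool = False       # set once any IO error has been observed

    def submit(self, is_write, target_missing):
        """Return 0 on success or -EIO on failure for a single request."""
        if target_missing:
            # IO to a failed/missing member always fails, and is remembered.
            self.error_seen = True
            return -errno.EIO
        if is_write and self.one_out_all_out and self.error_seen:
            # With the flag set, refuse every write after the first error.
            return -errno.EIO
        return 0
```

Without the flag, writes to working members keep succeeding after an
error, as they do today; with it, only reads to working members do.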
I would use udev/mdadm to detect a device removal and to mark the
relevant component device as missing.

NeilBrown