* Should a raid-0 array immediately stop if a component disk is removed?
@ 2018-04-27 21:49 ` Guilherme G. Piccoli
From: Guilherme G. Piccoli @ 2018-04-27 21:49 UTC (permalink / raw)
  To: linux-raid; +Cc: linux-scsi, linux-block, linux-nvme, kernel

Hello, we've noticed an interesting behavior when using a raid-0 md
array. Suppose we have a 2-disk raid-0 array with a mount point set -
in our tests, we've used the ext4 filesystem. If we remove one of the
component disks via sysfs[0], userspace is notified, but the mdadm
tool fails to stop the array[1] (it cannot open the array device node
with the O_EXCL flag, so it never issues the STOP_ARRAY ioctl). Even
if we circumvent mdadm's O_EXCL open, the md driver still refuses the
ioctl because the array is mounted.
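
For reference, a minimal reproduction sketch of our setup (device and
mount-point names here are just examples, adjust to your system):

  # build a 2-disk raid-0 array, put ext4 on it and mount it
  mdadm --create /dev/md0 --level=0 --raid-devices=2 \
      /dev/nvme0n1 /dev/nvme1n1
  mkfs.ext4 /dev/md0
  mount /dev/md0 /mnt/test

  # yank one component out from under md (see [0] for the scsi variant)
  echo 1 > /sys/block/nvme0n1/device/device/remove

  # trying to stop the array now fails while the filesystem is mounted
  mdadm --stop /dev/md0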

As a result, the array stays mounted and we can even read from and
write to it, although filesystem errors show up in dmesg[2].
Eventually, after several _minutes_, the filesystem gets remounted
read-only.
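
I suspect the delayed switch to read-only only happens once ext4 hits
an error on its own metadata/journal writes rather than on data
writes; the filesystem's configured error behavior can be checked
with, e.g. (md0 is just an example name):

  # error behavior recorded in the superblock (continue/remount-ro/panic)
  tune2fs -l /dev/md0 | grep -i "errors behavior"
  # errors= mount option currently in effect, if any
  grep md0 /proc/mounts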

During this odd window, in which the array has lost a component disk
but is still mounted/active (and accepting reads/writes), we tried
reads, writes, and the sync command, all of which "succeeded" (the
commands themselves didn't fail, although errors showed up in dmesg).
Only when "dd" was executed with "oflag=direct" did the writes fail
immediately. This was observed with both nvme and scsi disks
composing the raid-0 array.
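
A sketch of the writes we tried during that window (file name and
sizes are arbitrary):

  # buffered write: dd and even a following sync report success,
  # while dmesg fills up with I/O errors
  dd if=/dev/zero of=/mnt/test/file bs=1M count=64
  sync

  # direct I/O bypasses the page cache, so the failure shows up
  # immediately
  dd if=/dev/zero of=/mnt/test/file bs=1M count=64 oflag=direct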

We've started to pursue a solution to this, since it seems like odd
behavior, but it's worth checking with the CC'ed lists whether this
is "by design" or whether it has already been discussed in the past
(maybe an idea was proposed). Tests were executed with v4.17-rc2 and
the upstream mdadm tool.

Thanks in advance,


Guilherme


[0] To remove the array component disk, we've used:

a) For nvme: "echo 1 > /sys/block/nvme0n1/device/device/remove"
b) For scsi: "echo 1 > /sys/block/sda/device/delete"


[1] mdadm tool tries to fail the array by executing: "mdadm -If
<component_block_device>"


[2] Example:
[...]
[88.719] Buffer I/O error on device md0, logical block 157704
[88.722] Buffer I/O error on device md0, logical block 157705
[88.725] EXT4-fs warning (device md0): ext4_end_bio:323: I/O error 10
writing to inode 14 (offset 0 size 8388608 starting block 79744)
[88.725] EXT4-fs warning (device md0): ext4_end_bio:323: I/O error 10
writing to inode 14 (offset 0 size 8388608 starting block 80000)
[...]


* Re: Should a raid-0 array immediately stop if a component disk is removed?
  2018-04-27 21:49 ` Guilherme G. Piccoli
@ 2018-04-27 22:11   ` Wols Lists
From: Wols Lists @ 2018-04-27 22:11 UTC (permalink / raw)
  To: Guilherme G. Piccoli, linux-raid
  Cc: linux-block, kernel, linux-nvme, linux-scsi

On 27/04/18 22:49, Guilherme G. Piccoli wrote:
> Hello, we've noticed an interesting behavior when using a raid-0 md
> array. Suppose we have a 2-disk raid-0 array with a mount point set -
> in our tests, we've used the ext4 filesystem. If we remove one of the
> component disks via sysfs[0], userspace is notified, but the mdadm
> tool fails to stop the array[1] (it cannot open the array device node
> with the O_EXCL flag, so it never issues the STOP_ARRAY ioctl). Even
> if we circumvent mdadm's O_EXCL open, the md driver still refuses the
> ioctl because the array is mounted.

Sounds like you're not using mdadm to remove the disk. So why do you
expect mdadm to stop the array immediately? It doesn't know anything is
wrong until it trips over the missing disk.
> 
> As a result, the array stays mounted and we can even read from and
> write to it, although filesystem errors show up in dmesg[2].
> Eventually, after several _minutes_, the filesystem gets remounted
> read-only.

Is your array linear or striped? If it's striped, I would expect it to
fall over in a heap very quickly. If it's linear, it depends whether you
remove drive 0 or drive 1. If you remove drive 0, it will fall over very
quickly. If you remove drive 1, the fuller your array the quicker it
will fall over (if your array isn't very full, drive 1 may well not be
used in which case the array might not fall over at all!)
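
For what it's worth, you can check which personality the array is
actually using (md0 being just an example name) with something like:

  cat /proc/mdstat
  mdadm --detail /dev/md0 | grep -i level
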
> 
> During this odd window, in which the array has lost a component disk
> but is still mounted/active (and accepting reads/writes), we tried
> reads, writes, and the sync command, all of which "succeeded" (the
> commands themselves didn't fail, although errors showed up in dmesg).
> Only when "dd" was executed with "oflag=direct" did the writes fail
> immediately. This was observed with both nvme and scsi disks
> composing the raid-0 array.
> 
> We've started to pursue a solution to this, since it seems like odd
> behavior, but it's worth checking with the CC'ed lists whether this
> is "by design" or whether it has already been discussed in the past
> (maybe an idea was proposed). Tests were executed with v4.17-rc2 and
> the upstream mdadm tool.

Note that raid-0 is NOT redundant. Standard advice is "if a drive fails,
expect to lose your data". So the fact that your array limps on should
be the pleasant surprise, not that it blows up in ways you didn't expect.
> 
> Thanks in advance,
> 
> 
> Guilherme

Cheers,
Wol



* Re: Should a raid-0 array immediately stop if a component disk is removed?
  2018-04-27 22:11   ` Wols Lists
@ 2018-04-27 22:54     ` Guilherme G. Piccoli
From: Guilherme G. Piccoli @ 2018-04-27 22:54 UTC (permalink / raw)
  To: Wols Lists, linux-raid; +Cc: linux-scsi, linux-block, linux-nvme, kernel

Thanks for your quick reply, Anthony!
Inline comments below:


On 27/04/2018 19:11, Wols Lists wrote:
> On 27/04/18 22:49, Guilherme G. Piccoli wrote:
> [...]
> Sounds like you're not using mdadm to remove the disk. So why do you
> expect mdadm to stop the array immediately? It doesn't know anything is
> wrong until it trips over the missing disk.

In fact, mdadm is aware something is wrong - it tries to stop the
array by running "mdadm -If <array-component-just-removed>", but it
fails because the mount point prevents it from stopping the array.
And the question lies exactly at this point: should the array be
(successfully) stopped? I think it should, since otherwise we can
keep writing to the disks, causing data corruption.
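
One way to watch what md itself reports for the array during that
window (md0 is just an example name):

  # the array plus its (now missing) member devices
  cat /proc/mdstat
  # md's notion of the array state, straight from sysfs
  cat /sys/block/md0/md/array_state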


> [...]
> Is your array linear or striped? If it's striped, I would expect it to
> fall over in a heap very quickly. If it's linear, it depends whether you

It's striped. I was able to keep writing for some time (minutes).


> [...] 
> Note that raid-0 is NOT redundant. Standard advice is "if a drive fails,
> expect to lose your data". So the fact that your array limps on should
> be the pleasant surprise, not that it blows up in ways you didn't expect.

OK, I understand that. But imagine the following scenario: a regular
user has a component disk removed for some reason, and they don't
look at the logs before (or after) writing - the user can write data
thinking everything is fine, and that data ends up corrupted. I'd
expect userspace writes to fail as soon as possible once one of the
raid-0 components is gone.

Thanks,


Guilherme
