* mdraid: raid1 and iscsi-multipath devices - never faults but should!
From: Thomas Rosenstein @ 2020-10-22 10:24 UTC
  To: linux-raid

Hello,

I'm trying to do something interesting; the structure looks like this:

xfs
- mdraid
   - multipath (with no_path_retry = fail)
     - iscsi path 1
     - iscsi path 2
   - multipath (with no_path_retry = fail)
     - iscsi path 1
     - iscsi path 2
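
For reference, the layers are set up roughly like this. This is only a
minimal sketch; the map names (mpatha/mpathb) and the exact mdadm
parameters are placeholders, not my actual configuration:

# /etc/multipath.conf excerpt: fail I/O as soon as all paths to a map are gone
defaults {
    no_path_retry    fail
}

# RAID1 across the two multipath maps, XFS on top
mdadm --create /dev/md127 --level=1 --raid-devices=2 \
    /dev/mapper/mpatha /dev/mapper/mpathb
mkfs.xfs /dev/md127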

During normal operation everything looks good. Once a path fails (i.e.
the iSCSI target is removed), the array goes degraded; if the path comes
back, nothing happens.

Q1) Can I enable auto recovery for failed devices?

If the device is re-added manually (or by software), everything resyncs
and it works again. As it should be.
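
What I do by hand today (and what any auto-recovery would have to do for
me) is roughly the following; the map name is a placeholder:

mdadm /dev/md127 --remove failed               # drop the faulty slot
mdadm /dev/md127 --re-add /dev/mapper/mpathb   # or --add if re-add is refused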

If BOTH devices fail at the same time (worst-case scenario), it gets
wonky. I would expect a total hang (as with iSCSI and multipath
queue_if_no_path). Instead:

1) XFS reports Input/Output error
2) dmesg has logs like:

[Thu Oct 22 09:25:28 2020] Buffer I/O error on dev md127, logical block 
41472, async page read
[Thu Oct 22 09:25:28 2020] Buffer I/O error on dev md127, logical block 
41473, async page read
[Thu Oct 22 09:25:28 2020] Buffer I/O error on dev md127, logical block 
41474, async page read
[Thu Oct 22 09:25:28 2020] Buffer I/O error on dev md127, logical block 
41475, async page read
[Thu Oct 22 09:25:28 2020] Buffer I/O error on dev md127, logical block 
41476, async page read
[Thu Oct 22 09:25:28 2020] Buffer I/O error on dev md127, logical block 
41477, async page read
[Thu Oct 22 09:25:28 2020] Buffer I/O error on dev md127, logical block 
41478, async page read

3) mdadm --detail /dev/md127 shows:

/dev/md127:
            Version : 1.2
      Creation Time : Wed Oct 21 17:25:22 2020
         Raid Level : raid1
         Array Size : 96640 (94.38 MiB 98.96 MB)
      Used Dev Size : 96640 (94.38 MiB 98.96 MB)
       Raid Devices : 2
      Total Devices : 2
        Persistence : Superblock is persistent

        Update Time : Thu Oct 22 09:23:35 2020
              State : clean, degraded
     Active Devices : 1
    Working Devices : 1
     Failed Devices : 1
      Spare Devices : 0

Consistency Policy : resync

               Name : v-b08c6663-7296-4c66-9faf-ac687
               UUID : cc282a5c:59a499b3:682f5e6f:36f9c490
             Events : 122

     Number   Major   Minor   RaidDevice State
        0     253        2        0      active sync   /dev/dm-2
        -       0        0        1      removed

        1     253        3        -      faulty   /dev/dm-

4) I can read from /dev/md127, but only as much as is already cached
(see the dmesg logs above).
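
A quick way to see that only cached data is still readable (a sketch; the
block counts are arbitrary) is to repeat the read with O_DIRECT, which
bypasses the page cache:

dd if=/dev/md127 of=/dev/null bs=4k count=16                # buffered, may be served from cache
dd if=/dev/md127 of=/dev/null bs=4k count=16 iflag=direct   # goes to the failed members, should return an I/O error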


In my opinion the following should happen instead, or at least it should
be configurable. I expect:
1) XFS hangs indefinitely (like multipath queue_if_no_path)
2) mdadm shows FAULTED as the State

Q2) Can this be configured in any way?

After BOTH paths are recovered, nothing works anymore, and the raid 
doesn't recover automatically.
Only a complete unmount and stop followed by an assemble and mount makes 
the raid function again.
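
Concretely, the only sequence that brings it back for me looks like this
(the mount point and map names are placeholders):

umount /mnt/test
mdadm --stop /dev/md127
mdadm --assemble /dev/md127 /dev/mapper/mpatha /dev/mapper/mpathb
mount /dev/md127 /mnt/test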

Q3) Is that expected behavior?

Thanks
Thomas Rosenstein


* Re: mdraid: raid1 and iscsi-multipath devices - never faults but should!
From: Jack Wang @ 2020-10-22 11:44 UTC
  To: Thomas Rosenstein; +Cc: linux-raid

Thomas Rosenstein <thomas.rosenstein@creamfinance.com> wrote on Thu, Oct 22, 2020 at 12:28 PM:
>
> Hello,
>
> I'm trying to do something interesting; the structure looks like this:
>
> xfs
> - mdraid
>    - multipath (with no_path_retry = fail)
>      - iscsi path 1
>      - iscsi path 2
>    - multipath (with no_path_retry = fail)
>      - iscsi path 1
>      - iscsi path 2
>
> During normal operation everything looks good. Once a path fails (i.e.
> the iSCSI target is removed), the array goes degraded; if the path
> comes back, nothing happens.
>
> Q1) Can I enable auto recovery for failed devices?
>
> If the device is re-added manually (or by software), everything resyncs
> and it works again. As it should be.
>
> If BOTH devices fail at the same time (worst-case scenario), it gets
> wonky. I would expect a total hang (as with iSCSI and multipath
> queue_if_no_path). Instead:
>
> 1) XFS reports Input/Output error
> 2) dmesg has logs like:
>
> [Thu Oct 22 09:25:28 2020] Buffer I/O error on dev md127, logical block
> 41472, async page read
> [Thu Oct 22 09:25:28 2020] Buffer I/O error on dev md127, logical block
> 41473, async page read
> [Thu Oct 22 09:25:28 2020] Buffer I/O error on dev md127, logical block
> 41474, async page read
> [Thu Oct 22 09:25:28 2020] Buffer I/O error on dev md127, logical block
> 41475, async page read
> [Thu Oct 22 09:25:28 2020] Buffer I/O error on dev md127, logical block
> 41476, async page read
> [Thu Oct 22 09:25:28 2020] Buffer I/O error on dev md127, logical block
> 41477, async page read
> [Thu Oct 22 09:25:28 2020] Buffer I/O error on dev md127, logical block
> 41478, async page read
>
> 3) mdadm --detail /dev/md127 shows:
>
> /dev/md127:
>             Version : 1.2
>       Creation Time : Wed Oct 21 17:25:22 2020
>          Raid Level : raid1
>          Array Size : 96640 (94.38 MiB 98.96 MB)
>       Used Dev Size : 96640 (94.38 MiB 98.96 MB)
>        Raid Devices : 2
>       Total Devices : 2
>         Persistence : Superblock is persistent
>
>         Update Time : Thu Oct 22 09:23:35 2020
>               State : clean, degraded
>      Active Devices : 1
>     Working Devices : 1
>      Failed Devices : 1
>       Spare Devices : 0
>
> Consistency Policy : resync
>
>                Name : v-b08c6663-7296-4c66-9faf-ac687
>                UUID : cc282a5c:59a499b3:682f5e6f:36f9c490
>              Events : 122
>
>      Number   Major   Minor   RaidDevice State
>         0     253        2        0      active sync   /dev/dm-2
>         -       0        0        1      removed
>
>         1     253        3        -      faulty   /dev/dm-
>
> 4) I can read from /dev/md127, but only as much as is already cached
> (see the dmesg logs above).
>
>
> In my opinion the following should happen instead, or at least it
> should be configurable. I expect:
> 1) XFS hangs indefinitely (like multipath queue_if_no_path)
> 2) mdadm shows FAULTED as the State

>
> Q2) Can this be configured in any way?
You can enable the last device to be failed as well, see commit
9a567843f7ce ("md: allow last device to be forcibly removed from RAID1/RAID10.")
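(IIRC the knob that commit adds is a per-array sysfs attribute, roughly:
  echo 1 > /sys/block/md127/md/fail_last_dev
but please double-check the attribute name on your kernel.)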
>
> After BOTH paths are recovered, nothing works anymore, and the raid
> doesn't recover automatically.
> Only a complete unmount and stop followed by an assemble and mount makes
> the raid function again.
>
> Q3) Is that expected behavior?
>
> Thanks
> Thomas Rosenstein


* Re: mdraid: raid1 and iscsi-multipath devices - never faults but should!
From: Thomas Rosenstein @ 2020-10-22 12:38 UTC
  To: Jack Wang; +Cc: linux-raid



On 22 Oct 2020, at 13:44, Jack Wang wrote:

> Thomas Rosenstein <thomas.rosenstein@creamfinance.com> wrote on Thu,
> Oct 22, 2020 at 12:28 PM:
>>
>> Hello,
>>
>> I'm trying to do something interesting; the structure looks like this:
>>
>> xfs
>> - mdraid
>>    - multipath (with no_path_retry = fail)
>>      - iscsi path 1
>>      - iscsi path 2
>>    - multipath (with no_path_retry = fail)
>>      - iscsi path 1
>>      - iscsi path 2
>>
>> During normal operation everything looks good. Once a path fails
>> (i.e. the iSCSI target is removed), the array goes degraded; if the
>> path comes back, nothing happens.
>>
>> Q1) Can I enable auto recovery for failed devices?
>>
>> If the device is re-added manually (or by software), everything
>> resyncs and it works again. As it should be.
>>
>> If BOTH devices fail at the same time (worst-case scenario), it gets
>> wonky. I would expect a total hang (as with iSCSI and multipath
>> queue_if_no_path). Instead:
>>
>> 1) XFS reports Input/Output error
>> 2) dmesg has logs like:
>>
>> [Thu Oct 22 09:25:28 2020] Buffer I/O error on dev md127, logical 
>> block
>> 41472, async page read
>> [Thu Oct 22 09:25:28 2020] Buffer I/O error on dev md127, logical 
>> block
>> 41473, async page read
>> [Thu Oct 22 09:25:28 2020] Buffer I/O error on dev md127, logical 
>> block
>> 41474, async page read
>> [Thu Oct 22 09:25:28 2020] Buffer I/O error on dev md127, logical 
>> block
>> 41475, async page read
>> [Thu Oct 22 09:25:28 2020] Buffer I/O error on dev md127, logical 
>> block
>> 41476, async page read
>> [Thu Oct 22 09:25:28 2020] Buffer I/O error on dev md127, logical 
>> block
>> 41477, async page read
>> [Thu Oct 22 09:25:28 2020] Buffer I/O error on dev md127, logical 
>> block
>> 41478, async page read
>>
>> 3) mdadm --detail /dev/md127 shows:
>>
>> /dev/md127:
>>             Version : 1.2
>>       Creation Time : Wed Oct 21 17:25:22 2020
>>          Raid Level : raid1
>>          Array Size : 96640 (94.38 MiB 98.96 MB)
>>       Used Dev Size : 96640 (94.38 MiB 98.96 MB)
>>        Raid Devices : 2
>>       Total Devices : 2
>>         Persistence : Superblock is persistent
>>
>>         Update Time : Thu Oct 22 09:23:35 2020
>>               State : clean, degraded
>>      Active Devices : 1
>>     Working Devices : 1
>>      Failed Devices : 1
>>       Spare Devices : 0
>>
>> Consistency Policy : resync
>>
>>                Name : v-b08c6663-7296-4c66-9faf-ac687
>>                UUID : cc282a5c:59a499b3:682f5e6f:36f9c490
>>              Events : 122
>>
>>      Number   Major   Minor   RaidDevice State
>>         0     253        2        0      active sync   /dev/dm-2
>>         -       0        0        1      removed
>>
>>         1     253        3        -      faulty   /dev/dm-
>>
>> 4) I can read from /dev/md127, but only as much as is already cached
>> (see the dmesg logs above).
>>
>>
>> In my opinion the following should happen instead, or at least it
>> should be configurable. I expect:
>> 1) XFS hangs indefinitely (like multipath queue_if_no_path)
>> 2) mdadm shows FAULTED as the State
>
>>
>> Q2) Can this be configured in any way?
> You can enable the last device to be failed as well, see commit
> 9a567843f7ce ("md: allow last device to be forcibly removed from
> RAID1/RAID10.")

That did work; the last device moved to faulty. Is there a way to recover
from that, or is the array completely broken at that point?
I tried to re-add the first device after it's back up, but that leads to
a recovery/resync loop.
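
For completeness, the re-add I tried looks roughly like this (the map
name is a placeholder for my actual multipath device):

mdadm /dev/md127 --re-add /dev/mapper/mpatha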

BTW, this is kernel 5.4.60.

>>
>> After BOTH paths are recovered, nothing works anymore, and the raid
>> doesn't recover automatically.
>> Only a complete unmount and stop followed by an assemble and mount 
>> makes
>> the raid function again.
>>
>> Q3) Is that expected behavior?
>>
>> Thanks
>> Thomas Rosenstein
