* mdraid: raid1 and iscsi-multipath devices - never faults but should!
@ 2020-10-22 10:24 Thomas Rosenstein
2020-10-22 11:44 ` Jack Wang
0 siblings, 1 reply; 3+ messages in thread
From: Thomas Rosenstein @ 2020-10-22 10:24 UTC (permalink / raw)
To: linux-raid
Hello,
I'm trying to do something interesting; the structure looks like this:
xfs
- mdraid
- multipath (with no_path_retry = fail)
- iscsi path 1
- iscsi path 2
- multipath (with no_path_retry = fail)
- iscsi path 1
- iscsi path 2
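(For reference, the multipath behavior meant here is the `no_path_retry` setting; this is only a sketch of the relevant fragment of /etc/multipath.conf, not the exact config used:)

```
defaults {
    # fail I/O immediately when all paths are down,
    # instead of queueing it until a path comes back
    no_path_retry fail
}
```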
During normal operation everything looks good. Once a path fails (i.e.
the iscsi target is removed), the array goes degraded; if the path comes
back, nothing happens.
Q1) Can I enable auto recovery for failed devices?
If the device is re-added manually (or by software), everything resyncs
and it works again, as it should.
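(The manual re-add meant above is roughly the following; the device name /dev/dm-3 is taken from the mdadm output further down and may differ:)

```shell
# after the iscsi path is back and the multipath map is active again,
# re-add the previously failed leg and watch the resync
mdadm /dev/md127 --re-add /dev/dm-3
cat /proc/mdstat
```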
If BOTH devices fail at the same time (worst-case scenario), it gets
wonky. I would expect a total hang (as with iscsi and multipath
queue_if_no_path), but instead:
1) XFS reports Input/Output error
2) dmesg has logs like:
[Thu Oct 22 09:25:28 2020] Buffer I/O error on dev md127, logical block
41472, async page read
[Thu Oct 22 09:25:28 2020] Buffer I/O error on dev md127, logical block
41473, async page read
[Thu Oct 22 09:25:28 2020] Buffer I/O error on dev md127, logical block
41474, async page read
[Thu Oct 22 09:25:28 2020] Buffer I/O error on dev md127, logical block
41475, async page read
[Thu Oct 22 09:25:28 2020] Buffer I/O error on dev md127, logical block
41476, async page read
[Thu Oct 22 09:25:28 2020] Buffer I/O error on dev md127, logical block
41477, async page read
[Thu Oct 22 09:25:28 2020] Buffer I/O error on dev md127, logical block
41478, async page read
3) mdadm --detail /dev/md127 shows:
/dev/md127:
Version : 1.2
Creation Time : Wed Oct 21 17:25:22 2020
Raid Level : raid1
Array Size : 96640 (94.38 MiB 98.96 MB)
Used Dev Size : 96640 (94.38 MiB 98.96 MB)
Raid Devices : 2
Total Devices : 2
Persistence : Superblock is persistent
Update Time : Thu Oct 22 09:23:35 2020
State : clean, degraded
Active Devices : 1
Working Devices : 1
Failed Devices : 1
Spare Devices : 0
Consistency Policy : resync
Name : v-b08c6663-7296-4c66-9faf-ac687
UUID : cc282a5c:59a499b3:682f5e6f:36f9c490
Events : 122
Number Major Minor RaidDevice State
0 253 2 0 active sync /dev/dm-2
- 0 0 1 removed
1 253 3 - faulty /dev/dm-
4) I can read from /dev/md127, but only as much as is already in the
page cache (see the dmesg logs above)
In my opinion this should not happen, or at least it should be
configurable. I expect:
1) XFS hangs indefinitely (like multipath queue_if_no_path)
2) mdadm shows FAULTED as the State
Q2) Can this be configured in any way?
After BOTH paths are recovered, nothing works anymore, and the raid
doesn't recover automatically. Only a complete unmount and stop,
followed by an assemble and mount, makes the raid function again.
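(The recovery sequence meant above, sketched; the mount point and member device names are placeholders:)

```shell
umount /mnt/data                                  # placeholder mount point
mdadm --stop /dev/md127
mdadm --assemble /dev/md127 /dev/dm-2 /dev/dm-3   # member names may differ
mount /dev/md127 /mnt/data
```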
Q3) Is that expected behavior?
Thanks
Thomas Rosenstein
* Re: mdraid: raid1 and iscsi-multipath devices - never faults but should!
2020-10-22 10:24 mdraid: raid1 and iscsi-multipath devices - never faults but should! Thomas Rosenstein
@ 2020-10-22 11:44 ` Jack Wang
2020-10-22 12:38 ` Thomas Rosenstein
0 siblings, 1 reply; 3+ messages in thread
From: Jack Wang @ 2020-10-22 11:44 UTC (permalink / raw)
To: Thomas Rosenstein; +Cc: linux-raid
Thomas Rosenstein <thomas.rosenstein@creamfinance.com> wrote on Thursday, 22 Oct 2020 at 12:28:
>
> Hello,
>
> I'm trying to do something interesting; the structure looks like this:
>
> xfs
> - mdraid
> - multipath (with no_path_retry = fail)
> - iscsi path 1
> - iscsi path 2
> - multipath (with no_path_retry = fail)
> - iscsi path 1
> - iscsi path 2
>
> During normal operation everything looks good. Once a path fails (i.e.
> the iscsi target is removed), the array goes degraded; if the path
> comes back, nothing happens.
>
> Q1) Can I enable auto recovery for failed devices?
>
> If the device is re-added manually (or by software), everything resyncs
> and it works again, as it should.
>
> If BOTH devices fail at the same time (worst-case scenario), it gets
> wonky. I would expect a total hang (as with iscsi and multipath
> queue_if_no_path), but instead:
>
> 1) XFS reports Input/Output error
> 2) dmesg has logs like:
>
> [Thu Oct 22 09:25:28 2020] Buffer I/O error on dev md127, logical block
> 41472, async page read
> [Thu Oct 22 09:25:28 2020] Buffer I/O error on dev md127, logical block
> 41473, async page read
> [Thu Oct 22 09:25:28 2020] Buffer I/O error on dev md127, logical block
> 41474, async page read
> [Thu Oct 22 09:25:28 2020] Buffer I/O error on dev md127, logical block
> 41475, async page read
> [Thu Oct 22 09:25:28 2020] Buffer I/O error on dev md127, logical block
> 41476, async page read
> [Thu Oct 22 09:25:28 2020] Buffer I/O error on dev md127, logical block
> 41477, async page read
> [Thu Oct 22 09:25:28 2020] Buffer I/O error on dev md127, logical block
> 41478, async page read
>
> 3) mdadm --detail /dev/md127 shows:
>
> /dev/md127:
> Version : 1.2
> Creation Time : Wed Oct 21 17:25:22 2020
> Raid Level : raid1
> Array Size : 96640 (94.38 MiB 98.96 MB)
> Used Dev Size : 96640 (94.38 MiB 98.96 MB)
> Raid Devices : 2
> Total Devices : 2
> Persistence : Superblock is persistent
>
> Update Time : Thu Oct 22 09:23:35 2020
> State : clean, degraded
> Active Devices : 1
> Working Devices : 1
> Failed Devices : 1
> Spare Devices : 0
>
> Consistency Policy : resync
>
> Name : v-b08c6663-7296-4c66-9faf-ac687
> UUID : cc282a5c:59a499b3:682f5e6f:36f9c490
> Events : 122
>
> Number Major Minor RaidDevice State
> 0 253 2 0 active sync /dev/dm-2
> - 0 0 1 removed
>
> 1 253 3 - faulty /dev/dm-
>
> 4) I can read from /dev/md127, but only as much as is already in the
> page cache (see the dmesg logs above)
>
>
> In my opinion this should not happen, or at least it should be
> configurable. I expect:
> 1) XFS hangs indefinitely (like multipath queue_if_no_path)
> 2) mdadm shows FAULTED as the State
>
> Q2) Can this be configured in any way?
You can enable failing the last device as well; see commit
9a567843f7ce ("md: allow last device to be forcibly removed from RAID1/RAID10.")
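(That commit adds a per-array sysfs attribute, fail_last_dev; if I recall correctly it is toggled like this -- the array name md127 is taken from your output:)

```shell
# opt in to failing the last remaining leg instead of keeping it
# nominally active while all its paths are gone
echo 1 > /sys/block/md127/md/fail_last_dev
```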
>
> After BOTH paths are recovered, nothing works anymore, and the raid
> doesn't recover automatically.
> Only a complete unmount and stop followed by an assemble and mount makes
> the raid function again.
>
> Q3) Is that expected behavior?
>
> Thanks
> Thomas Rosenstein
* Re: mdraid: raid1 and iscsi-multipath devices - never faults but should!
2020-10-22 11:44 ` Jack Wang
@ 2020-10-22 12:38 ` Thomas Rosenstein
0 siblings, 0 replies; 3+ messages in thread
From: Thomas Rosenstein @ 2020-10-22 12:38 UTC (permalink / raw)
To: Jack Wang; +Cc: linux-raid
On 22 Oct 2020, at 13:44, Jack Wang wrote:
> Thomas Rosenstein <thomas.rosenstein@creamfinance.com>
> wrote on Thursday, 22 Oct 2020 at 12:28:
>>
>> Hello,
>>
>> I'm trying to do something interesting; the structure looks like this:
>>
>> xfs
>> - mdraid
>> - multipath (with no_path_retry = fail)
>> - iscsi path 1
>> - iscsi path 2
>> - multipath (with no_path_retry = fail)
>> - iscsi path 1
>> - iscsi path 2
>>
>> During normal operation everything looks good. Once a path fails
>> (i.e. the iscsi target is removed), the array goes degraded; if the
>> path comes back, nothing happens.
>>
>> Q1) Can I enable auto recovery for failed devices?
>>
>> If the device is re-added manually (or by software), everything
>> resyncs and it works again, as it should.
>>
>> If BOTH devices fail at the same time (worst-case scenario), it gets
>> wonky. I would expect a total hang (as with iscsi and multipath
>> queue_if_no_path), but instead:
>>
>> 1) XFS reports Input/Output error
>> 2) dmesg has logs like:
>>
>> [Thu Oct 22 09:25:28 2020] Buffer I/O error on dev md127, logical
>> block
>> 41472, async page read
>> [Thu Oct 22 09:25:28 2020] Buffer I/O error on dev md127, logical
>> block
>> 41473, async page read
>> [Thu Oct 22 09:25:28 2020] Buffer I/O error on dev md127, logical
>> block
>> 41474, async page read
>> [Thu Oct 22 09:25:28 2020] Buffer I/O error on dev md127, logical
>> block
>> 41475, async page read
>> [Thu Oct 22 09:25:28 2020] Buffer I/O error on dev md127, logical
>> block
>> 41476, async page read
>> [Thu Oct 22 09:25:28 2020] Buffer I/O error on dev md127, logical
>> block
>> 41477, async page read
>> [Thu Oct 22 09:25:28 2020] Buffer I/O error on dev md127, logical
>> block
>> 41478, async page read
>>
>> 3) mdadm --detail /dev/md127 shows:
>>
>> /dev/md127:
>> Version : 1.2
>> Creation Time : Wed Oct 21 17:25:22 2020
>> Raid Level : raid1
>> Array Size : 96640 (94.38 MiB 98.96 MB)
>> Used Dev Size : 96640 (94.38 MiB 98.96 MB)
>> Raid Devices : 2
>> Total Devices : 2
>> Persistence : Superblock is persistent
>>
>> Update Time : Thu Oct 22 09:23:35 2020
>> State : clean, degraded
>> Active Devices : 1
>> Working Devices : 1
>> Failed Devices : 1
>> Spare Devices : 0
>>
>> Consistency Policy : resync
>>
>> Name : v-b08c6663-7296-4c66-9faf-ac687
>> UUID : cc282a5c:59a499b3:682f5e6f:36f9c490
>> Events : 122
>>
>> Number Major Minor RaidDevice State
>> 0 253 2 0 active sync /dev/dm-2
>> - 0 0 1 removed
>>
>> 1 253 3 - faulty /dev/dm-
>>
>> 4) I can read from /dev/md127, but only as much as is already in the
>> page cache (see the dmesg logs above)
>>
>>
>> In my opinion this should not happen, or at least it should be
>> configurable. I expect:
>> 1) XFS hangs indefinitely (like multipath queue_if_no_path)
>> 2) mdadm shows FAULTED as the State
>
>>
>> Q2) Can this be configured in any way?
> You can enable failing the last device as well; see commit
> 9a567843f7ce ("md: allow last device to be forcibly removed from
> RAID1/RAID10.")
That did work; the last device moved to faulty. Is there a way to
recover from that, or is the array completely broken at that point?
I tried to re-add the first device after it's back up, but that leads to
a recovery/resync loop.
btw. kernel 5.4.60
>>
>> After BOTH paths are recovered, nothing works anymore, and the raid
>> doesn't recover automatically.
>> Only a complete unmount and stop followed by an assemble and mount
>> makes
>> the raid function again.
>>
>> Q3) Is that expected behavior?
>>
>> Thanks
>> Thomas Rosenstein