* Dealing with constantly failing paths
@ 2018-09-13  9:42 Özkan Göksu
  2018-09-13 17:45 ` Benjamin Marzinski
  0 siblings, 1 reply; 2+ messages in thread
From: Özkan Göksu @ 2018-09-13  9:42 UTC (permalink / raw)
  To: dm-devel



Hello.
I'm sorry to e-mail you here, but I could not find an answer elsewhere.

When a disk starts to die slowly, multipath keeps failing and reinstating its
paths forever (I'm using an LSI 3008 HBA with a SAS JBOD, not an FC network),
because the kernel never takes the faulted disk offline. This is causing
terrible problems for me.

I'm using: multipath-tools 0.7.4-1
Linux DEV2 4.14.67-1-lts #1


dmesg:
    Sep 13 11:20:17 DEV2 kernel: sd 0:0:190:0: attempting task abort! scmd(ffff88110e632948)
    Sep 13 11:20:17 DEV2 kernel: sd 0:0:190:0: [sdft] tag#3 CDB: opcode=0x0 00 00 00 00 00 00
    Sep 13 11:20:17 DEV2 kernel: scsi target0:0:190: handle(0x0037), sas_address(0x5000c50093d4e7c6), phy(38)
    Sep 13 11:20:17 DEV2 kernel: scsi target0:0:190: enclosure_logical_id(0x500304800929ec7f), slot(37)
    Sep 13 11:20:17 DEV2 kernel: scsi target0:0:190: enclosure level(0x0001),connector name(1   )
    Sep 13 11:20:17 DEV2 kernel: sd 0:0:190:0: task abort: SUCCESS scmd(ffff88110e632948)
    Sep 13 11:20:18 DEV2 kernel: device-mapper: multipath: Failing path 130:240.
    Sep 13 11:25:34 DEV2 kernel: device-mapper: multipath: Reinstating path 130:240.
Full dmesg example: https://paste.ubuntu.com/p/H9NMWxNfgD/

As you can see, the kernel aborted the task and multipath then failed the
path. I want to get rid of this problem by telling multipath "do not
reinstate the path", so that the zombie disk stays dead.
If I don't kick the disk out, it causes an HBA reset, I lose every disk in
my pool, and the ZFS pool is suspended.
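
By "kicking the disk out" I mean something like the following (a rough
sketch only; sdft is the path device from the log above):

    # Tell multipathd to mark the flapping path as failed.
    multipathd -k'fail path sdft'

    # Take the SCSI device offline so the kernel stops sending I/O to it...
    echo offline > /sys/block/sdft/device/state

    # ...or remove the device from the system entirely.
    echo 1 > /sys/block/sdft/device/delete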

I'm not saying this problem is related to multipathd; I just think this
would save me.

So how can I tell multipath not to reinstate a path that has failed X times?
Thank you.


* Re: Dealing with constantly failing paths
  2018-09-13  9:42 Dealing with constantly failing paths Özkan Göksu
@ 2018-09-13 17:45 ` Benjamin Marzinski
  0 siblings, 0 replies; 2+ messages in thread
From: Benjamin Marzinski @ 2018-09-13 17:45 UTC (permalink / raw)
  To: Özkan Göksu; +Cc: dm-devel

On Thu, Sep 13, 2018 at 12:42:54PM +0300, Özkan Göksu wrote:
>    Hello.
>    I'm sorry to e-mail you here, but I could not find an answer elsewhere.
>    When a disk starts to die slowly, multipath keeps failing and
>    reinstating its paths forever (I'm using an LSI 3008 HBA with a SAS
>    JBOD, not an FC network), because the kernel never takes the faulted
>    disk offline. This is causing terrible problems for me.
>    I'm using: multipath-tools 0.7.4-1
>    Linux DEV2 4.14.67-1-lts #1
>    dmesg:
>        Sep 13 11:20:17 DEV2 kernel: sd 0:0:190:0: attempting task abort! scmd(ffff88110e632948)
>        Sep 13 11:20:17 DEV2 kernel: sd 0:0:190:0: [sdft] tag#3 CDB: opcode=0x0 00 00 00 00 00 00
>        Sep 13 11:20:17 DEV2 kernel: scsi target0:0:190: handle(0x0037), sas_address(0x5000c50093d4e7c6), phy(38)
>        Sep 13 11:20:17 DEV2 kernel: scsi target0:0:190: enclosure_logical_id(0x500304800929ec7f), slot(37)
>        Sep 13 11:20:17 DEV2 kernel: scsi target0:0:190: enclosure level(0x0001),connector name(1   )
>        Sep 13 11:20:17 DEV2 kernel: sd 0:0:190:0: task abort: SUCCESS scmd(ffff88110e632948)
>        Sep 13 11:20:18 DEV2 kernel: device-mapper: multipath: Failing path 130:240.
>        Sep 13 11:25:34 DEV2 kernel: device-mapper: multipath: Reinstating path 130:240.
>    Full dmesg example: https://paste.ubuntu.com/p/H9NMWxNfgD/
>
>    As you can see, the kernel aborted the task and multipath then failed
>    the path. I want to get rid of this problem by telling multipath "do
>    not reinstate the path", so that the zombie disk stays dead.
>    If I don't kick the disk out, it causes an HBA reset, I lose every
>    disk in my pool, and the ZFS pool is suspended.
>    I'm not saying this problem is related to multipathd; I just think
>    this would save me.
>    So how can I tell multipath not to reinstate a path that has failed
>    X times?
>    Thank you.

In recent releases there are two separate methods to do this. Both of
them involve setting multiple multipath.conf parameters. The older
method is to set "delay_wait_checks" and "delay_watch_checks". The newer
one is to set "marginal_path_double_failed_time",
"marginal_path_err_sample_time", "marginal_path_err_rate_threshold", and
"marginal_path_err_recheck_gap_time". You can look in the multipath.conf
man page to see whether both sets of options are available to you and
how they work. If the version of the multipath tools you are using has
both sets of options, the "marginal_path_*" options should do a better
job of finding these marginal paths.
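
A minimal multipath.conf sketch along these lines (the values are
illustrative only, not recommendations; check "man 5 multipath.conf" on
your system for the exact units and defaults, and whether each option is
supported by your version):

    defaults {
        # Newer method: keep "marginal" paths out of use until they have
        # been error-free for a while.
        marginal_path_double_failed_time    60
        marginal_path_err_sample_time       120
        marginal_path_err_rate_threshold    10
        marginal_path_err_recheck_gap_time  300

        # Older method (use one set of options or the other, not both):
        # delay_watch_checks   12
        # delay_wait_checks    12
    }

After editing /etc/multipath.conf, reload the daemon, e.g. with
multipathd -k'reconfigure' or by restarting the multipathd service.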

-Ben


> --
> dm-devel mailing list
> dm-devel@redhat.com
> https://www.redhat.com/mailman/listinfo/dm-devel

