All of lore.kernel.org
 help / color / mirror / Atom feed
* deterministic io throughput in multipath
@ 2016-12-19 11:50 Muneendra Kumar M
  2016-12-19 12:09 ` Hannes Reinecke
  2016-12-21 16:09 ` Benjamin Marzinski
  0 siblings, 2 replies; 21+ messages in thread
From: Muneendra Kumar M @ 2016-12-19 11:50 UTC (permalink / raw)
  To: dm-devel


[-- Attachment #1.1: Type: text/plain, Size: 829 bytes --]

Customers running Linux hosts (mostly RHEL) that use a SAN for block storage complain that the Linux multipath stack is not resilient to non-deterministic storage network behavior. This has caused many customers to move away to non-Linux-based servers. The intent of the patch below, and the prevailing issues, are described below. With this design we are seeing the Linux multipath stack become resilient to such network issues. We hope that getting this patch accepted will help drive adoption of more Linux servers that use SAN networks.
I have already sent the design details to the community in a different mail thread; they are available at the link below.
https://www.redhat.com/archives/dm-devel/2016-December/msg00122.html
Could you please go through the design and send us your comments?

Regards,
Muneendra.



[-- Attachment #1.2: Type: text/html, Size: 3573 bytes --]

[-- Attachment #2: Type: text/plain, Size: 0 bytes --]



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: deterministic io throughput in multipath
  2016-12-19 11:50 deterministic io throughput in multipath Muneendra Kumar M
@ 2016-12-19 12:09 ` Hannes Reinecke
  2016-12-21 16:09 ` Benjamin Marzinski
  1 sibling, 0 replies; 21+ messages in thread
From: Hannes Reinecke @ 2016-12-19 12:09 UTC (permalink / raw)
  To: dm-devel

On 12/19/2016 12:50 PM, Muneendra Kumar M wrote:
> Customers using Linux host (mostly RHEL host) using a SAN network for
> block storage, complain the Linux multipath stack is not resilient to
> handle non-deterministic storage network behaviors. This has caused many
> customer move away to non-linux based servers. The intent of the below
> patch and the prevailing issues are given below. With the below design
> we are seeing the Linux multipath stack becoming resilient to such
> network issues. We hope by getting this patch accepted will help in more
> Linux server adoption that use SAN network.
> 
> I have already sent the design details to the community in a different
> mail chain and the details are available in the below link.
> 
> https://www.redhat.com/archives/dm-devel/2016-December/msg00122.html.
> 
> Can you please go through the design and send the comments to us.  
> 
This issue comes up from time to time.
The standard answer here is that using 'service-time' as the path
selector _should_ already give you the expected results; namely, any
path exhibiting intermittent I/O errors should have a higher latency
than any functional path.
Hence the 'service-time' path selector should switch away from those
paths automatically.
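For reference, a minimal multipath.conf fragment selecting this policy might look like the following (an illustrative sketch only; the selector can equally be set per-device in a devices section):

```
defaults {
        path_selector "service-time 0"
}
```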

Have you tried this?

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: deterministic io throughput in multipath
  2016-12-19 11:50 deterministic io throughput in multipath Muneendra Kumar M
  2016-12-19 12:09 ` Hannes Reinecke
@ 2016-12-21 16:09 ` Benjamin Marzinski
  2016-12-22  5:39   ` Muneendra Kumar M
  2016-12-26  9:42   ` Muneendra Kumar M
  1 sibling, 2 replies; 21+ messages in thread
From: Benjamin Marzinski @ 2016-12-21 16:09 UTC (permalink / raw)
  To: Muneendra Kumar M; +Cc: dm-devel

Have you looked into the delay_watch_checks and delay_wait_checks
configuration parameters?  The idea behind them is to minimize the use
of paths that are intermittently failing.
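As a hedged sketch, these parameters are set in multipath.conf and take a number of path-checker intervals; the values below are purely illustrative:

```
defaults {
        delay_watch_checks 12
        delay_wait_checks  12
}
```

With a polling_interval of 5 seconds, 12 checks would correspond to roughly a minute of watching or waiting.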

-Ben

On Mon, Dec 19, 2016 at 11:50:36AM +0000, Muneendra Kumar M wrote:
>    Customers using Linux host (mostly RHEL host) using a SAN network for
>    block storage, complain the Linux multipath stack is not resilient to
>    handle non-deterministic storage network behaviors. This has caused many
>    customer move away to non-linux based servers. The intent of the below
>    patch and the prevailing issues are given below. With the below design we
>    are seeing the Linux multipath stack becoming resilient to such network
>    issues. We hope by getting this patch accepted will help in more Linux
>    server adoption that use SAN network.
> 
>    I have already sent the design details to the community in a different
>    mail chain and the details are available in the below link.
> 
>    [1]https://www.redhat.com/archives/dm-devel/2016-December/msg00122.html.
> 
>    Can you please go through the design and send the comments to us.  
> 
>     
> 
>    Regards,
> 
>    Muneendra.
> 
>     
> 
>     
> 
> References
> 
>    Visible links
>    1. https://www.redhat.com/archives/dm-devel/2016-December/msg00122.html

> --
> dm-devel mailing list
> dm-devel@redhat.com
> https://www.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: deterministic io throughput in multipath
  2016-12-21 16:09 ` Benjamin Marzinski
@ 2016-12-22  5:39   ` Muneendra Kumar M
  2016-12-26  9:42   ` Muneendra Kumar M
  1 sibling, 0 replies; 21+ messages in thread
From: Muneendra Kumar M @ 2016-12-22  5:39 UTC (permalink / raw)
  To: Benjamin Marzinski; +Cc: dm-devel

Hi Ben,

Thanks for the reply.
I will look into these parameters, do the internal testing, and let you know the results.

Regards,
Muneendra.

-----Original Message-----
From: Benjamin Marzinski [mailto:bmarzins@redhat.com] 
Sent: Wednesday, December 21, 2016 9:40 PM
To: Muneendra Kumar M <mmandala@Brocade.com>
Cc: dm-devel@redhat.com
Subject: Re: [dm-devel] deterministic io throughput in multipath

Have you looked into the delay_watch_checks and delay_wait_checks configuration parameters?  The idea behind them is to minimize the use of paths that are intermittently failing.

-Ben

On Mon, Dec 19, 2016 at 11:50:36AM +0000, Muneendra Kumar M wrote:
>    Customers using Linux host (mostly RHEL host) using a SAN network for
>    block storage, complain the Linux multipath stack is not resilient to
>    handle non-deterministic storage network behaviors. This has caused many
>    customer move away to non-linux based servers. The intent of the below
>    patch and the prevailing issues are given below. With the below design we
>    are seeing the Linux multipath stack becoming resilient to such network
>    issues. We hope by getting this patch accepted will help in more Linux
>    server adoption that use SAN network.
> 
>    I have already sent the design details to the community in a different
>    mail chain and the details are available in the below link.
> 
>    [1]https://www.redhat.com/archives/dm-devel/2016-December/msg00122.html
> 
>    Can you please go through the design and send the comments to us.
> 
>     
> 
>    Regards,
> 
>    Muneendra.
> 
>     
> 
>     
> 
> References
> 
>    Visible links
>    1. https://www.redhat.com/archives/dm-devel/2016-December/msg00122.html

> --
> dm-devel mailing list
> dm-devel@redhat.com
> https://www.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: deterministic io throughput in multipath
  2016-12-21 16:09 ` Benjamin Marzinski
  2016-12-22  5:39   ` Muneendra Kumar M
@ 2016-12-26  9:42   ` Muneendra Kumar M
  2017-01-03 17:12     ` Benjamin Marzinski
  1 sibling, 1 reply; 21+ messages in thread
From: Muneendra Kumar M @ 2016-12-26  9:42 UTC (permalink / raw)
  To: Benjamin Marzinski; +Cc: dm-devel

Hi Ben,

Suppose there are two paths on a dm device (dm-1), say sda and sdb, as below.

#  multipath -ll
	mpathd (3600110d001ee7f0102050001cc0b6751) dm-1 SANBlaze,VLUN MyLun
	size=8.0M features='0' hwhandler='0' wp=rw
	`-+- policy='round-robin 0' prio=50 status=active
	  |- 8:0:1:0  sda 8:48 active ready  running
	  `- 9:0:1:0  sdb 8:64 active ready  running          

On sda I am seeing a lot of errors, due to which the sda path fluctuates between the failed and active states.

My requirement is something like this: if sda fails more than 5 times within an hour, then I want to keep sda in the failed state for a few hours (3 hours).

And the data should travel only through the sdb path.
Will this be possible with the below parameters?
Can you let me know what values I should use for delay_watch_checks and delay_wait_checks?

Regards,
Muneendra.



-----Original Message-----
From: Muneendra Kumar M 
Sent: Thursday, December 22, 2016 11:10 AM
To: 'Benjamin Marzinski' <bmarzins@redhat.com>
Cc: dm-devel@redhat.com
Subject: RE: [dm-devel] deterministic io throughput in multipath

Hi Ben,

Thanks for the reply.
I will look into this parameters will do the internal testing and let you know the results.

Regards,
Muneendra.

-----Original Message-----
From: Benjamin Marzinski [mailto:bmarzins@redhat.com] 
Sent: Wednesday, December 21, 2016 9:40 PM
To: Muneendra Kumar M <mmandala@Brocade.com>
Cc: dm-devel@redhat.com
Subject: Re: [dm-devel] deterministic io throughput in multipath

Have you looked into the delay_watch_checks and delay_wait_checks configuration parameters?  The idea behind them is to minimize the use of paths that are intermittently failing.

-Ben

On Mon, Dec 19, 2016 at 11:50:36AM +0000, Muneendra Kumar M wrote:
>    Customers using Linux host (mostly RHEL host) using a SAN network for
>    block storage, complain the Linux multipath stack is not resilient to
>    handle non-deterministic storage network behaviors. This has caused many
>    customer move away to non-linux based servers. The intent of the below
>    patch and the prevailing issues are given below. With the below design we
>    are seeing the Linux multipath stack becoming resilient to such network
>    issues. We hope by getting this patch accepted will help in more Linux
>    server adoption that use SAN network.
> 
>    I have already sent the design details to the community in a different
>    mail chain and the details are available in the below link.
> 
>    [1]https://www.redhat.com/archives/dm-devel/2016-December/msg00122.html
> 
>    Can you please go through the design and send the comments to us.
> 
>     
> 
>    Regards,
> 
>    Muneendra.
> 
>     
> 
>     
> 
> References
> 
>    Visible links
>    1. https://www.redhat.com/archives/dm-devel/2016-December/msg00122.html

> --
> dm-devel mailing list
> dm-devel@redhat.com
> https://www.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: deterministic io throughput in multipath
  2016-12-26  9:42   ` Muneendra Kumar M
@ 2017-01-03 17:12     ` Benjamin Marzinski
  2017-01-04 13:26       ` Muneendra Kumar M
  2017-01-16 11:19       ` Muneendra Kumar M
  0 siblings, 2 replies; 21+ messages in thread
From: Benjamin Marzinski @ 2017-01-03 17:12 UTC (permalink / raw)
  To: Muneendra Kumar M; +Cc: dm-devel

On Mon, Dec 26, 2016 at 09:42:48AM +0000, Muneendra Kumar M wrote:
> Hi Ben,
> 
> If there are two paths on a dm-1 say sda and sdb as below.
> 
> #  multipath -ll
> 	mpathd (3600110d001ee7f0102050001cc0b6751) dm-1 SANBlaze,VLUN MyLun
> 	size=8.0M features='0' hwhandler='0' wp=rw
> 	`-+- policy='round-robin 0' prio=50 status=active
> 	  |- 8:0:1:0  sda 8:48 active ready  running
> 	  `- 9:0:1:0  sdb 8:64 active ready  running          
> 
> And on sda if iam seeing lot of errors due to which the sda path is fluctuating from failed state to active state and vicevera.
> 
> My requirement is something like this if sda is failed for more then 5 times in a hour duration ,then I want to keep the sda in failed state for few hours (3hrs)
> 
> And the data should travel only thorugh sdb path.
> Will this be possible with the below parameters.

No. delay_watch_checks sets for how many path checks multipathd watches a
path that has recently come back from the failed state. If the path fails
again within this window, multipathd delays it.  This means that the
delay is always triggered by two failures within the time limit.  It's
possible to adapt this to count the number of failures, and act after a
certain number within a certain timeframe, but it would take a bit more
work.

delay_wait_checks doesn't guarantee that it will delay for any set
length of time.  Instead, it sets the number of consecutive successful
path checks that must occur before the path is usable again. You could
set this to the equivalent of 3 hours of path checks, but if a check
failed during that time, the 3 hours would start over again.

-Ben

> Can you just let me know what values I should add for delay_watch_checks and delay_wait_checks.
> 
> Regards,
> Muneendra.
> 
> 
> 
> -----Original Message-----
> From: Muneendra Kumar M 
> Sent: Thursday, December 22, 2016 11:10 AM
> To: 'Benjamin Marzinski' <bmarzins@redhat.com>
> Cc: dm-devel@redhat.com
> Subject: RE: [dm-devel] deterministic io throughput in multipath
> 
> Hi Ben,
> 
> Thanks for the reply.
> I will look into this parameters will do the internal testing and let you know the results.
> 
> Regards,
> Muneendra.
> 
> -----Original Message-----
> From: Benjamin Marzinski [mailto:bmarzins@redhat.com] 
> Sent: Wednesday, December 21, 2016 9:40 PM
> To: Muneendra Kumar M <mmandala@Brocade.com>
> Cc: dm-devel@redhat.com
> Subject: Re: [dm-devel] deterministic io throughput in multipath
> 
> Have you looked into the delay_watch_checks and delay_wait_checks configuration parameters?  The idea behind them is to minimize the use of paths that are intermittently failing.
> 
> -Ben
> 
> On Mon, Dec 19, 2016 at 11:50:36AM +0000, Muneendra Kumar M wrote:
> >    Customers using Linux host (mostly RHEL host) using a SAN network for
> >    block storage, complain the Linux multipath stack is not resilient to
> >    handle non-deterministic storage network behaviors. This has caused many
> >    customer move away to non-linux based servers. The intent of the below
> >    patch and the prevailing issues are given below. With the below design we
> >    are seeing the Linux multipath stack becoming resilient to such network
> >    issues. We hope by getting this patch accepted will help in more Linux
> >    server adoption that use SAN network.
> > 
> >    I have already sent the design details to the community in a different
> >    mail chain and the details are available in the below link.
> > 
> >    [1]https://www.redhat.com/archives/dm-devel/2016-December/msg00122.html
> > 
> >    Can you please go through the design and send the comments to us.
> > 
> >     
> > 
> >    Regards,
> > 
> >    Muneendra.
> > 
> >     
> > 
> >     
> > 
> > References
> > 
> >    Visible links
> >    1. https://www.redhat.com/archives/dm-devel/2016-December/msg00122.html
> 
> > --
> > dm-devel mailing list
> > dm-devel@redhat.com
> > https://www.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: deterministic io throughput in multipath
  2017-01-03 17:12     ` Benjamin Marzinski
@ 2017-01-04 13:26       ` Muneendra Kumar M
  2017-01-16 11:19       ` Muneendra Kumar M
  1 sibling, 0 replies; 21+ messages in thread
From: Muneendra Kumar M @ 2017-01-04 13:26 UTC (permalink / raw)
  To: Benjamin Marzinski; +Cc: dm-devel

Hi Ben,
Thanks for the information.

Regards,
Muneendra.

-----Original Message-----
From: Benjamin Marzinski [mailto:bmarzins@redhat.com] 
Sent: Tuesday, January 03, 2017 10:42 PM
To: Muneendra Kumar M <mmandala@Brocade.com>
Cc: dm-devel@redhat.com
Subject: Re: [dm-devel] deterministic io throughput in multipath

On Mon, Dec 26, 2016 at 09:42:48AM +0000, Muneendra Kumar M wrote:
> Hi Ben,
> 
> If there are two paths on a dm-1 say sda and sdb as below.
> 
> #  multipath -ll
> 	mpathd (3600110d001ee7f0102050001cc0b6751) dm-1 SANBlaze,VLUN MyLun
> 	size=8.0M features='0' hwhandler='0' wp=rw
> 	`-+- policy='round-robin 0' prio=50 status=active
> 	  |- 8:0:1:0  sda 8:48 active ready  running
> 	  `- 9:0:1:0  sdb 8:64 active ready  running          
> 
> And on sda if iam seeing lot of errors due to which the sda path is fluctuating from failed state to active state and vicevera.
> 
> My requirement is something like this if sda is failed for more then 5 
> times in a hour duration ,then I want to keep the sda in failed state 
> for few hours (3hrs)
> 
> And the data should travel only thorugh sdb path.
> Will this be possible with the below parameters.

No. delay_watch_checks sets for how many path checks multipathd watches a path that has recently come back from the failed state. If the path fails again within this window, multipathd delays it.  This means that the delay is always triggered by two failures within the time limit.  It's possible to adapt this to count the number of failures, and act after a certain number within a certain timeframe, but it would take a bit more work.

delay_wait_checks doesn't guarantee that it will delay for any set length of time.  Instead, it sets the number of consecutive successful path checks that must occur before the path is usable again. You could set this to the equivalent of 3 hours of path checks, but if a check failed during that time, the 3 hours would start over again.

-Ben

> Can you just let me know what values I should add for delay_watch_checks and delay_wait_checks.
> 
> Regards,
> Muneendra.
> 
> 
> 
> -----Original Message-----
> From: Muneendra Kumar M
> Sent: Thursday, December 22, 2016 11:10 AM
> To: 'Benjamin Marzinski' <bmarzins@redhat.com>
> Cc: dm-devel@redhat.com
> Subject: RE: [dm-devel] deterministic io throughput in multipath
> 
> Hi Ben,
> 
> Thanks for the reply.
> I will look into this parameters will do the internal testing and let you know the results.
> 
> Regards,
> Muneendra.
> 
> -----Original Message-----
> From: Benjamin Marzinski [mailto:bmarzins@redhat.com]
> Sent: Wednesday, December 21, 2016 9:40 PM
> To: Muneendra Kumar M <mmandala@Brocade.com>
> Cc: dm-devel@redhat.com
> Subject: Re: [dm-devel] deterministic io throughput in multipath
> 
> Have you looked into the delay_watch_checks and delay_wait_checks configuration parameters?  The idea behind them is to minimize the use of paths that are intermittently failing.
> 
> -Ben
> 
> On Mon, Dec 19, 2016 at 11:50:36AM +0000, Muneendra Kumar M wrote:
> >    Customers using Linux host (mostly RHEL host) using a SAN network for
> >    block storage, complain the Linux multipath stack is not resilient to
> >    handle non-deterministic storage network behaviors. This has caused many
> >    customer move away to non-linux based servers. The intent of the below
> >    patch and the prevailing issues are given below. With the below design we
> >    are seeing the Linux multipath stack becoming resilient to such network
> >    issues. We hope by getting this patch accepted will help in more Linux
> >    server adoption that use SAN network.
> > 
> >    I have already sent the design details to the community in a different
> >    mail chain and the details are available in the below link.
> > 
> >    [1]https://www.redhat.com/archives/dm-devel/2016-December/msg00122.html
> > 
> >    Can you please go through the design and send the comments to us.
> > 
> >     
> > 
> >    Regards,
> > 
> >    Muneendra.
> > 
> >     
> > 
> >     
> > 
> > References
> > 
> >    Visible links
> >    1. https://www.redhat.com/archives/dm-devel/2016-December/msg00122.html
> 
> > --
> > dm-devel mailing list
> > dm-devel@redhat.com
> > https://www.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: deterministic io throughput in multipath
  2017-01-03 17:12     ` Benjamin Marzinski
  2017-01-04 13:26       ` Muneendra Kumar M
@ 2017-01-16 11:19       ` Muneendra Kumar M
  2017-01-17  1:04         ` Benjamin Marzinski
  1 sibling, 1 reply; 21+ messages in thread
From: Muneendra Kumar M @ 2017-01-16 11:19 UTC (permalink / raw)
  To: Benjamin Marzinski; +Cc: dm-devel


[-- Attachment #1.1: Type: text/plain, Size: 7579 bytes --]

Hi Ben,
Following the discussion below, we came up with an approach that meets our requirement.
I have attached the patch, which is working well in our field tests.

Could you please review the attached patch and give us your comments?
Below are the files that have been changed:

 libmultipath/config.c      |  3 +++
 libmultipath/config.h      |  9 +++++++++
 libmultipath/configure.c   |  3 +++
 libmultipath/defaults.h    |  1 +
 libmultipath/dict.c        | 80 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 libmultipath/dict.h        |  1 +
 libmultipath/propsel.c     | 44 ++++++++++++++++++++++++++++++++++++++++++++
 libmultipath/propsel.h     |  6 ++++++
 libmultipath/structs.h     | 12 +++++++++++-
 libmultipath/structs_vec.c | 10 ++++++++++
 multipath/multipath.conf.5 | 58 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 multipathd/main.c          | 61 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--

We have added three new config parameters, described below.
1. san_path_err_threshold:
        If set to a value greater than 0, multipathd will watch paths and track how many times each path has failed due to errors. If the number of failures on a particular path exceeds san_path_err_threshold, the path will not be reinstated until san_path_err_recovery_time has elapsed. The failures must occur within san_path_err_threshold_window; otherwise the path is considered good enough to reinstate.

2. san_path_err_threshold_window:
        If set to a value greater than 0, multipathd will check whether the number of path failures has exceeded san_path_err_threshold within this time frame. If so, the path will not be reinstated until san_path_err_recovery_time.

3. san_path_err_recovery_time:
        If set to a value greater than 0, then when the path failures have exceeded san_path_err_threshold within san_path_err_threshold_window, multipathd will keep the path in the failed state for the san_path_err_recovery_time duration. Once san_path_err_recovery_time has expired, the failed path will be reinstated.
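For the scenario discussed earlier in the thread (more than 5 failures within an hour keeping the path failed for 3 hours), a configuration sketch might look like this. Note this assumes the window and recovery time are expressed in seconds, which should be verified against the patch's multipath.conf.5 additions:

```
defaults {
        san_path_err_threshold        5
        san_path_err_threshold_window 3600
        san_path_err_recovery_time    10800
}
```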

Regards,
Muneendra.

-----Original Message-----
From: Muneendra Kumar M
Sent: Wednesday, January 04, 2017 6:56 PM
To: 'Benjamin Marzinski' <bmarzins@redhat.com>
Cc: dm-devel@redhat.com
Subject: RE: [dm-devel] deterministic io throughput in multipath

Hi Ben,
Thanks for the information.

Regards,
Muneendra.

-----Original Message-----
From: Benjamin Marzinski [mailto:bmarzins@redhat.com]
Sent: Tuesday, January 03, 2017 10:42 PM
To: Muneendra Kumar M <mmandala@Brocade.com>
Cc: dm-devel@redhat.com
Subject: Re: [dm-devel] deterministic io throughput in multipath

On Mon, Dec 26, 2016 at 09:42:48AM +0000, Muneendra Kumar M wrote:
> Hi Ben,
>
> If there are two paths on a dm-1 say sda and sdb as below.
>
> #  multipath -ll
>       mpathd (3600110d001ee7f0102050001cc0b6751) dm-1 SANBlaze,VLUN MyLun
>       size=8.0M features='0' hwhandler='0' wp=rw
>       `-+- policy='round-robin 0' prio=50 status=active
>         |- 8:0:1:0  sda 8:48 active ready  running
>         `- 9:0:1:0  sdb 8:64 active ready  running
>
> And on sda if iam seeing lot of errors due to which the sda path is fluctuating from failed state to active state and vicevera.
>
> My requirement is something like this if sda is failed for more then 5
> times in a hour duration ,then I want to keep the sda in failed state
> for few hours (3hrs)
>
> And the data should travel only thorugh sdb path.
> Will this be possible with the below parameters.

No. delay_watch_checks sets for how many path checks multipathd watches a path that has recently come back from the failed state. If the path fails again within this window, multipathd delays it.  This means that the delay is always triggered by two failures within the time limit.  It's possible to adapt this to count the number of failures, and act after a certain number within a certain timeframe, but it would take a bit more work.

delay_wait_checks doesn't guarantee that it will delay for any set length of time.  Instead, it sets the number of consecutive successful path checks that must occur before the path is usable again. You could set this to the equivalent of 3 hours of path checks, but if a check failed during that time, the 3 hours would start over again.

-Ben

> Can you just let me know what values I should add for delay_watch_checks and delay_wait_checks.
>
> Regards,
> Muneendra.
>
>
>
> -----Original Message-----
> From: Muneendra Kumar M
> Sent: Thursday, December 22, 2016 11:10 AM
> To: 'Benjamin Marzinski' <bmarzins@redhat.com>
> Cc: dm-devel@redhat.com
> Subject: RE: [dm-devel] deterministic io throughput in multipath
>
> Hi Ben,
>
> Thanks for the reply.
> I will look into this parameters will do the internal testing and let you know the results.
>
> Regards,
> Muneendra.
>
> -----Original Message-----
> From: Benjamin Marzinski [mailto:bmarzins@redhat.com]
> Sent: Wednesday, December 21, 2016 9:40 PM
> To: Muneendra Kumar M <mmandala@Brocade.com>
> Cc: dm-devel@redhat.com
> Subject: Re: [dm-devel] deterministic io throughput in multipath
>
> Have you looked into the delay_watch_checks and delay_wait_checks configuration parameters?  The idea behind them is to minimize the use of paths that are intermittently failing.
>
> -Ben
>
> On Mon, Dec 19, 2016 at 11:50:36AM +0000, Muneendra Kumar M wrote:
> >    Customers using Linux host (mostly RHEL host) using a SAN network for
> >    block storage, complain the Linux multipath stack is not resilient to
> >    handle non-deterministic storage network behaviors. This has caused many
> >    customer move away to non-linux based servers. The intent of the below
> >    patch and the prevailing issues are given below. With the below design we
> >    are seeing the Linux multipath stack becoming resilient to such network
> >    issues. We hope by getting this patch accepted will help in more Linux
> >    server adoption that use SAN network.
> >
> >    I have already sent the design details to the community in a different
> >    mail chain and the details are available in the below link.
> >
> >    [1]https://www.redhat.com/archives/dm-devel/2016-December/msg00122.html
> >
> >    Can you please go through the design and send the comments to us.
> >
> >
> >
> >    Regards,
> >
> >    Muneendra.
> >
> >
> >
> >
> >
> > References
> >
> >    Visible links
> >    1. https://www.redhat.com/archives/dm-devel/2016-December/msg00122.html
>
> > --
> > dm-devel mailing list
> > dm-devel@redhat.com
> > https://www.redhat.com/mailman/listinfo/dm-devel


[-- Attachment #1.2: Type: text/html, Size: 13525 bytes --]

[-- Attachment #2: san_path_err.patch --]
[-- Type: application/octet-stream, Size: 20594 bytes --]

diff --git a/libmultipath/config.c b/libmultipath/config.c
index 15ddbd8..19adb97 100644
--- a/libmultipath/config.c
+++ b/libmultipath/config.c
@@ -348,6 +348,9 @@ merge_hwe (struct hwentry * dst, struct hwentry * src)
 	merge_num(delay_wait_checks);
 	merge_num(skip_kpartx);
 	merge_num(max_sectors_kb);
+	merge_num(san_path_err_threshold);
+	merge_num(san_path_err_threshold_window);
+	merge_num(san_path_err_recovery_time);
 
 	/*
 	 * Make sure features is consistent with
diff --git a/libmultipath/config.h b/libmultipath/config.h
index 9670020..2985958 100644
--- a/libmultipath/config.h
+++ b/libmultipath/config.h
@@ -65,6 +65,9 @@ struct hwentry {
 	int deferred_remove;
 	int delay_watch_checks;
 	int delay_wait_checks;
+	int san_path_err_threshold;
+	int san_path_err_threshold_window;
+	int san_path_err_recovery_time;
 	int skip_kpartx;
 	int max_sectors_kb;
 	char * bl_product;
@@ -93,6 +96,9 @@ struct mpentry {
 	int deferred_remove;
 	int delay_watch_checks;
 	int delay_wait_checks;
+	int san_path_err_threshold;
+	int san_path_err_threshold_window;
+	int san_path_err_recovery_time;
 	int skip_kpartx;
 	int max_sectors_kb;
 	uid_t uid;
@@ -138,6 +144,9 @@ struct config {
 	int processed_main_config;
 	int delay_watch_checks;
 	int delay_wait_checks;
+	int san_path_err_threshold;
+	int san_path_err_threshold_window;
+	int san_path_err_recovery_time;
 	int uxsock_timeout;
 	int strict_timing;
 	int retrigger_tries;
diff --git a/libmultipath/configure.c b/libmultipath/configure.c
index a0fcad9..0f50826 100644
--- a/libmultipath/configure.c
+++ b/libmultipath/configure.c
@@ -294,6 +294,9 @@ int setup_map(struct multipath *mpp, char *params, int params_size)
 	select_deferred_remove(conf, mpp);
 	select_delay_watch_checks(conf, mpp);
 	select_delay_wait_checks(conf, mpp);
+	select_san_path_err_threshold(conf, mpp);
+	select_san_path_err_threshold_window(conf, mpp);
+	select_san_path_err_recovery_time(conf, mpp);
 	select_skip_kpartx(conf, mpp);
 	select_max_sectors_kb(conf, mpp);
 
diff --git a/libmultipath/defaults.h b/libmultipath/defaults.h
index b9b0a37..9e8059c 100644
--- a/libmultipath/defaults.h
+++ b/libmultipath/defaults.h
@@ -24,6 +24,7 @@
 #define DEFAULT_DETECT_PRIO	DETECT_PRIO_ON
 #define DEFAULT_DEFERRED_REMOVE	DEFERRED_REMOVE_OFF
 #define DEFAULT_DELAY_CHECKS	DELAY_CHECKS_OFF
+#define DEFAULT_ERR_CHECKS	ERR_CHECKS_OFF
 #define DEFAULT_UEVENT_STACKSIZE 256
 #define DEFAULT_RETRIGGER_DELAY	10
 #define DEFAULT_RETRIGGER_TRIES	3
diff --git a/libmultipath/dict.c b/libmultipath/dict.c
index dc21846..a5689bd 100644
--- a/libmultipath/dict.c
+++ b/libmultipath/dict.c
@@ -1074,6 +1074,72 @@ declare_hw_snprint(delay_wait_checks, print_delay_checks)
 declare_mp_handler(delay_wait_checks, set_delay_checks)
 declare_mp_snprint(delay_wait_checks, print_delay_checks)
 
+
+static int
+set_path_err_info(vector strvec, void *ptr)
+{
+        int *int_ptr = (int *)ptr;
+        char * buff;
+
+        buff = set_value(strvec);
+        if (!buff)
+                return 1;
+
+        if (!strcmp(buff, "no") || !strcmp(buff, "0"))
+                *int_ptr = ERR_CHECKS_OFF;
+        else if ((*int_ptr = atoi(buff)) < 1)
+                *int_ptr = ERR_CHECKS_UNDEF;
+
+        FREE(buff);
+        return 0;
+}
+
+int
+print_path_err_info(char * buff, int len, void *ptr)
+{
+        int *int_ptr = (int *)ptr;
+
+        switch(*int_ptr) {
+        case ERR_CHECKS_UNDEF:
+                return 0;
+        case ERR_CHECKS_OFF:
+                return snprintf(buff, len, "\"off\"");
+        default:
+                return snprintf(buff, len, "%i", *int_ptr);
+        }
+}
+
+
+
+
+
+declare_def_handler(san_path_err_threshold, set_path_err_info)
+declare_def_snprint(san_path_err_threshold, print_path_err_info)
+declare_ovr_handler(san_path_err_threshold, set_path_err_info)
+declare_ovr_snprint(san_path_err_threshold, print_path_err_info)
+declare_hw_handler(san_path_err_threshold, set_path_err_info)
+declare_hw_snprint(san_path_err_threshold, print_path_err_info)
+declare_mp_handler(san_path_err_threshold, set_path_err_info)
+declare_mp_snprint(san_path_err_threshold, print_path_err_info)
+
+declare_def_handler(san_path_err_threshold_window, set_path_err_info)
+declare_def_snprint(san_path_err_threshold_window, print_path_err_info)
+declare_ovr_handler(san_path_err_threshold_window, set_path_err_info)
+declare_ovr_snprint(san_path_err_threshold_window, print_path_err_info)
+declare_hw_handler(san_path_err_threshold_window, set_path_err_info)
+declare_hw_snprint(san_path_err_threshold_window, print_path_err_info)
+declare_mp_handler(san_path_err_threshold_window, set_path_err_info)
+declare_mp_snprint(san_path_err_threshold_window, print_path_err_info)
+
+
+declare_def_handler(san_path_err_recovery_time, set_path_err_info)
+declare_def_snprint(san_path_err_recovery_time, print_path_err_info)
+declare_ovr_handler(san_path_err_recovery_time, set_path_err_info)
+declare_ovr_snprint(san_path_err_recovery_time, print_path_err_info)
+declare_hw_handler(san_path_err_recovery_time, set_path_err_info)
+declare_hw_snprint(san_path_err_recovery_time, print_path_err_info)
+declare_mp_handler(san_path_err_recovery_time, set_path_err_info)
+declare_mp_snprint(san_path_err_recovery_time, print_path_err_info)
 static int
 def_uxsock_timeout_handler(struct config *conf, vector strvec)
 {
@@ -1404,6 +1470,10 @@ init_keywords(vector keywords)
 	install_keyword("config_dir", &def_config_dir_handler, &snprint_def_config_dir);
 	install_keyword("delay_watch_checks", &def_delay_watch_checks_handler, &snprint_def_delay_watch_checks);
 	install_keyword("delay_wait_checks", &def_delay_wait_checks_handler, &snprint_def_delay_wait_checks);
+        install_keyword("san_path_err_threshold", &def_san_path_err_threshold_handler, &snprint_def_san_path_err_threshold);
+        install_keyword("san_path_err_threshold_window", &def_san_path_err_threshold_window_handler, &snprint_def_san_path_err_threshold_window);
+        install_keyword("san_path_err_recovery_time", &def_san_path_err_recovery_time_handler, &snprint_def_san_path_err_recovery_time);
+
 	install_keyword("find_multipaths", &def_find_multipaths_handler, &snprint_def_find_multipaths);
 	install_keyword("uxsock_timeout", &def_uxsock_timeout_handler, &snprint_def_uxsock_timeout);
 	install_keyword("retrigger_tries", &def_retrigger_tries_handler, &snprint_def_retrigger_tries);
@@ -1486,6 +1556,9 @@ init_keywords(vector keywords)
 	install_keyword("deferred_remove", &hw_deferred_remove_handler, &snprint_hw_deferred_remove);
 	install_keyword("delay_watch_checks", &hw_delay_watch_checks_handler, &snprint_hw_delay_watch_checks);
 	install_keyword("delay_wait_checks", &hw_delay_wait_checks_handler, &snprint_hw_delay_wait_checks);
+        install_keyword("san_path_err_threshold", &hw_san_path_err_threshold_handler, &snprint_hw_san_path_err_threshold);
+        install_keyword("san_path_err_threshold_window", &hw_san_path_err_threshold_window_handler, &snprint_hw_san_path_err_threshold_window);
+        install_keyword("san_path_err_recovery_time", &hw_san_path_err_recovery_time_handler, &snprint_hw_san_path_err_recovery_time);
 	install_keyword("skip_kpartx", &hw_skip_kpartx_handler, &snprint_hw_skip_kpartx);
 	install_keyword("max_sectors_kb", &hw_max_sectors_kb_handler, &snprint_hw_max_sectors_kb);
 	install_sublevel_end();
@@ -1515,6 +1588,10 @@ init_keywords(vector keywords)
 	install_keyword("deferred_remove", &ovr_deferred_remove_handler, &snprint_ovr_deferred_remove);
 	install_keyword("delay_watch_checks", &ovr_delay_watch_checks_handler, &snprint_ovr_delay_watch_checks);
 	install_keyword("delay_wait_checks", &ovr_delay_wait_checks_handler, &snprint_ovr_delay_wait_checks);
+        install_keyword("san_path_err_threshold", &ovr_san_path_err_threshold_handler, &snprint_ovr_san_path_err_threshold);
+        install_keyword("san_path_err_threshold_window", &ovr_san_path_err_threshold_window_handler, &snprint_ovr_san_path_err_threshold_window);
+        install_keyword("san_path_err_recovery_time", &ovr_san_path_err_recovery_time_handler, &snprint_ovr_san_path_err_recovery_time);
+
 	install_keyword("skip_kpartx", &ovr_skip_kpartx_handler, &snprint_ovr_skip_kpartx);
 	install_keyword("max_sectors_kb", &ovr_max_sectors_kb_handler, &snprint_ovr_max_sectors_kb);
 
@@ -1543,6 +1620,9 @@ init_keywords(vector keywords)
 	install_keyword("deferred_remove", &mp_deferred_remove_handler, &snprint_mp_deferred_remove);
 	install_keyword("delay_watch_checks", &mp_delay_watch_checks_handler, &snprint_mp_delay_watch_checks);
 	install_keyword("delay_wait_checks", &mp_delay_wait_checks_handler, &snprint_mp_delay_wait_checks);
+	install_keyword("san_path_err_threshold", &mp_san_path_err_threshold_handler, &snprint_mp_san_path_err_threshold);
+	install_keyword("san_path_err_threshold_window", &mp_san_path_err_threshold_window_handler, &snprint_mp_san_path_err_threshold_window);
+	install_keyword("san_path_err_recovery_time", &mp_san_path_err_recovery_time_handler, &snprint_mp_san_path_err_recovery_time);
 	install_keyword("skip_kpartx", &mp_skip_kpartx_handler, &snprint_mp_skip_kpartx);
 	install_keyword("max_sectors_kb", &mp_max_sectors_kb_handler, &snprint_mp_max_sectors_kb);
 	install_sublevel_end();
diff --git a/libmultipath/dict.h b/libmultipath/dict.h
index 4cd03c5..adaa9f1 100644
--- a/libmultipath/dict.h
+++ b/libmultipath/dict.h
@@ -15,5 +15,6 @@ int print_fast_io_fail(char * buff, int len, void *ptr);
 int print_dev_loss(char * buff, int len, void *ptr);
 int print_reservation_key(char * buff, int len, void * ptr);
 int print_delay_checks(char * buff, int len, void *ptr);
+int print_path_err_info(char * buff, int len, void *ptr);
 
 #endif /* _DICT_H */
diff --git a/libmultipath/propsel.c b/libmultipath/propsel.c
index c0bc616..f4ca378 100644
--- a/libmultipath/propsel.c
+++ b/libmultipath/propsel.c
@@ -643,7 +643,51 @@ out:
 	return 0;
 
 }
+int select_san_path_err_threshold(struct config *conf, struct multipath *mp)
+{
+        char *origin, buff[12];
+
+        mp_set_mpe(san_path_err_threshold);
+        mp_set_ovr(san_path_err_threshold);
+        mp_set_hwe(san_path_err_threshold);
+        mp_set_conf(san_path_err_threshold);
+        mp_set_default(san_path_err_threshold, DEFAULT_ERR_CHECKS);
+out:
+        print_path_err_info(buff, 12, &mp->san_path_err_threshold);
+        condlog(3, "%s: san_path_err_threshold = %s %s", mp->alias, buff, origin);
+        return 0;
+}
+
+int select_san_path_err_threshold_window(struct config *conf, struct multipath *mp)
+{
+        char *origin, buff[12];
+
+        mp_set_mpe(san_path_err_threshold_window);
+        mp_set_ovr(san_path_err_threshold_window);
+        mp_set_hwe(san_path_err_threshold_window);
+        mp_set_conf(san_path_err_threshold_window);
+        mp_set_default(san_path_err_threshold_window, DEFAULT_ERR_CHECKS);
+out:
+        print_path_err_info(buff, 12, &mp->san_path_err_threshold_window);
+        condlog(3, "%s: san_path_err_threshold_window = %s %s", mp->alias, buff, origin);
+        return 0;
+
+}
+int select_san_path_err_recovery_time(struct config *conf, struct multipath *mp)
+{
+        char *origin, buff[12];
 
+        mp_set_mpe(san_path_err_recovery_time);
+        mp_set_ovr(san_path_err_recovery_time);
+        mp_set_hwe(san_path_err_recovery_time);
+        mp_set_conf(san_path_err_recovery_time);
+        mp_set_default(san_path_err_recovery_time, DEFAULT_ERR_CHECKS);
+out:
+        print_path_err_info(buff, 12, &mp->san_path_err_recovery_time);
+        condlog(3, "%s: san_path_err_recovery_time = %s %s", mp->alias, buff, origin);
+        return 0;
+
+}
 int select_skip_kpartx (struct config *conf, struct multipath * mp)
 {
 	char *origin;
diff --git a/libmultipath/propsel.h b/libmultipath/propsel.h
index ad98fa5..88b5840 100644
--- a/libmultipath/propsel.h
+++ b/libmultipath/propsel.h
@@ -24,3 +24,9 @@ int select_delay_watch_checks (struct config *conf, struct multipath * mp);
 int select_delay_wait_checks (struct config *conf, struct multipath * mp);
 int select_skip_kpartx (struct config *conf, struct multipath * mp);
 int select_max_sectors_kb (struct config *conf, struct multipath * mp);
+int select_san_path_err_threshold_window(struct config *conf, struct multipath *mp);
+int select_san_path_err_threshold(struct config *conf, struct multipath *mp);
+int select_san_path_err_recovery_time(struct config *conf, struct multipath *mp);
+
+
+
diff --git a/libmultipath/structs.h b/libmultipath/structs.h
index 396f69d..8b7a803 100644
--- a/libmultipath/structs.h
+++ b/libmultipath/structs.h
@@ -156,6 +156,10 @@ enum delay_checks_states {
 	DELAY_CHECKS_OFF = -1,
 	DELAY_CHECKS_UNDEF = 0,
 };
+enum err_checks_states {
+	ERR_CHECKS_OFF = -1,
+	ERR_CHECKS_UNDEF = 0,
+};
 
 enum initialized_states {
 	INIT_FAILED,
@@ -223,7 +227,10 @@ struct path {
 	int initialized;
 	int retriggers;
 	int wwid_changed;
-
+	unsigned int path_failures;
+	time_t   failure_start_time;
+	time_t dis_reinstante_time;
+	int disable_reinstate;
 	/* configlet pointers */
 	struct hwentry * hwe;
 };
@@ -255,6 +262,9 @@ struct multipath {
 	int deferred_remove;
 	int delay_watch_checks;
 	int delay_wait_checks;
+	int san_path_err_threshold;
+	int san_path_err_threshold_window;
+	int san_path_err_recovery_time;
 	int skip_kpartx;
 	int max_sectors_kb;
 	unsigned int dev_loss;
diff --git a/libmultipath/structs_vec.c b/libmultipath/structs_vec.c
index 22be8e0..bf84b17 100644
--- a/libmultipath/structs_vec.c
+++ b/libmultipath/structs_vec.c
@@ -546,6 +546,7 @@ int update_multipath (struct vectors *vecs, char *mapname, int reset)
 	struct pathgroup  *pgp;
 	struct path *pp;
 	int i, j;
+	struct timespec start_time;
 
 	mpp = find_mp_by_alias(vecs->mpvec, mapname);
 
@@ -570,6 +571,15 @@ int update_multipath (struct vectors *vecs, char *mapname, int reset)
 				int oldstate = pp->state;
 				condlog(2, "%s: mark as failed", pp->dev);
 				mpp->stat_path_failures++;
+				/* capture the time when we see the first failure on the path */
+				if(pp->path_failures == 0) {
+					if (clock_gettime(CLOCK_MONOTONIC, &start_time) != 0)
+						start_time.tv_sec = 0;
+					pp->failure_start_time = start_time.tv_sec;
+	
+				}
+				/*Increment the number of path failures*/
+				pp->path_failures++;
 				pp->state = PATH_DOWN;
 				if (oldstate == PATH_UP ||
 				    oldstate == PATH_GHOST)
diff --git a/multipath/multipath.conf.5 b/multipath/multipath.conf.5
index 36589f5..7dfd48a 100644
--- a/multipath/multipath.conf.5
+++ b/multipath/multipath.conf.5
@@ -751,6 +751,46 @@ The default is: \fB/etc/multipath/conf.d/\fR
 .
 .
 .TP
+.B san_path_err_threshold
+If set to a value greater than 0, multipathd will watch paths and track how many
+times a path has failed due to errors. If the number of failures on a particular
+path exceeds san_path_err_threshold, the path will not be reinstated until
+san_path_err_recovery_time has elapsed. The failures must occur within the
+san_path_err_threshold_window time frame; otherwise the path is considered
+healthy enough to be reinstated.
+.RS
+.TP
+The default is: \fBno\fR
+.RE
+.
+.
+.TP
+.B san_path_err_threshold_window
+If set to a value greater than 0, multipathd will check whether the path failures
+have exceeded san_path_err_threshold within this time frame, i.e.
+san_path_err_threshold_window. If so, the path will not be reinstated until
+san_path_err_recovery_time has elapsed.
+The san_path_err_threshold_window value should be given in seconds.
+.RS
+.TP
+The default is: \fBno\fR
+.RE
+.
+.
+.TP
+.B san_path_err_recovery_time
+If set to a value greater than 0, multipathd will keep a path in the failed state
+for san_path_err_recovery_time once its failures have exceeded
+san_path_err_threshold within san_path_err_threshold_window. When
+san_path_err_recovery_time has expired, the failed path is reinstated.
+The san_path_err_recovery_time value should be given in seconds.
+.RS
+.TP
+The default is: \fBno\fR
+.RE
+.
+.
+.TP
 .B delay_watch_checks
 If set to a value greater than 0, multipathd will watch paths that have
 recently become valid for this many checks. If they fail again while they are
@@ -1015,6 +1055,12 @@ are taken from the \fIdefaults\fR or \fIdevices\fR section:
 .TP
 .B deferred_remove
 .TP
+.B san_path_err_threshold
+.TP
+.B san_path_err_threshold_window
+.TP
+.B san_path_err_recovery_time
+.TP
 .B delay_watch_checks
 .TP
 .B delay_wait_checks
@@ -1128,6 +1174,12 @@ section:
 .TP
 .B deferred_remove
 .TP
+.B san_path_err_threshold
+.TP
+.B san_path_err_threshold_window
+.TP
+.B san_path_err_recovery_time
+.TP
 .B delay_watch_checks
 .TP
 .B delay_wait_checks
@@ -1192,6 +1244,12 @@ the values are taken from the \fIdevices\fR or \fIdefaults\fR sections:
 .TP
 .B deferred_remove
 .TP
+.B san_path_err_threshold
+.TP
+.B san_path_err_threshold_window
+.TP
+.B san_path_err_recovery_time
+.TP
 .B delay_watch_checks
 .TP
 .B delay_wait_checks
diff --git a/multipathd/main.c b/multipathd/main.c
index adc3258..facfc03 100644
--- a/multipathd/main.c
+++ b/multipathd/main.c
@@ -1486,7 +1486,54 @@ void repair_path(struct path * pp)
 	checker_repair(&pp->checker);
 	LOG_MSG(1, checker_message(&pp->checker));
 }
+static int check_path_validity_err( struct path * pp){
+	struct timespec start_time;
+	int disable_reinstate = 0;
+
+	if (clock_gettime(CLOCK_MONOTONIC, &start_time) != 0)
+		start_time.tv_sec = 0;
+
+		/* if the number of path failures exceeds san_path_err_threshold */
+		if ((pp->mpp->san_path_err_threshold > 0) && (pp->path_failures > pp->mpp->san_path_err_threshold)) {
+			condlog(3,"\npath %s :hit the error threshold\n",pp->dev);
+
+			if(!pp->disable_reinstate){
+				/* if the error threshold was hit within the san_path_err_threshold_window
+				 * time frame, do not reinstate the path until san_path_err_recovery_time:
+				 * keep the path in the failed state for san_path_err_recovery_time so
+				 * that the customer can rectify the issue within this time. Once
+				 * san_path_err_recovery_time has elapsed, the path is automatically
+				 * reinstated. */
+				if((pp->mpp->san_path_err_threshold_window > 0) && 
+				   ((start_time.tv_sec - pp->failure_start_time) < pp->mpp->san_path_err_threshold_window)){
+					condlog(3,"\npath %s :hit the error threshold within the threshold window time\n",pp->dev);
+					disable_reinstate = 1; 
+					pp->dis_reinstante_time = start_time.tv_sec ;
+					pp->disable_reinstate = 1;
+				}else{
+					/* even though the number of errors exceeds san_path_err_threshold,
+					 * they did not occur within the san_path_err_threshold_window, so
+					 * discard these errors and start watching for errors afresh
+					 */
+					pp->path_failures = 0;
+					pp->disable_reinstate = 0;
+
+				}
+			}
+			if(pp->disable_reinstate){
+				disable_reinstate = 1;
+				if((pp->mpp->san_path_err_recovery_time > 0) && 
+				   (start_time.tv_sec - pp->dis_reinstante_time ) > pp->mpp->san_path_err_recovery_time){
+					disable_reinstate =0;
+					pp->path_failures = 0;
+					pp->disable_reinstate = 0;
+					condlog(3,"\npath %s :reinstate the path after err recovery time\n",pp->dev);
+				}
 
+			}
+		}
+	return  disable_reinstate;
+}
 /*
  * Returns '1' if the path has been checked, '-1' if it was blacklisted
  * and '0' otherwise
@@ -1503,7 +1550,11 @@ check_path (struct vectors * vecs, struct path * pp, int ticks)
 	int retrigger_tries, checkint;
 	struct config *conf;
 	int ret;
+	struct timespec start_time;
 
+	if (clock_gettime(CLOCK_MONOTONIC, &start_time) != 0)
+		start_time.tv_sec = 0;
+	
 	if ((pp->initialized == INIT_OK ||
 	     pp->initialized == INIT_REQUESTED_UDEV) && !pp->mpp)
 		return 0;
@@ -1615,12 +1666,18 @@ check_path (struct vectors * vecs, struct path * pp, int ticks)
 	 * and if target supports only implicit tpgs mode.
 	 * this will prevent unnecessary i/o by dm on stand-by
 	 * paths if there are no other active paths in map.
+	 *
+	 * when path failures have exceeded san_path_err_threshold
+	 * within san_path_err_threshold_window, we don't reinstate the
+	 * failed path for san_path_err_recovery_time
 	 */
-	disable_reinstate = (newstate == PATH_GHOST &&
+	disable_reinstate = ((newstate == PATH_GHOST &&
 			    pp->mpp->nr_active == 0 &&
-			    pp->tpgs == TPGS_IMPLICIT) ? 1 : 0;
+			    pp->tpgs == TPGS_IMPLICIT) ? 1 :
+			    check_path_validity_err(pp));
 
 	pp->chkrstate = newstate;
+
 	if (newstate != pp->state) {
 		int oldstate = pp->state;
 		pp->state = newstate;

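For reference, the three new options could be enabled with a multipath.conf fragment like the following. This is an illustrative sketch only; the values shown are examples, not recommendations, and the options only exist once the patch above is applied:

```
defaults {
	# fail a path that errors more than 10 times...
	san_path_err_threshold          10
	# ...within a 60-second window...
	san_path_err_threshold_window   60
	# ...and keep it failed for 300 seconds before reinstating it
	san_path_err_recovery_time      300
}
```

The same keywords are also accepted in the devices, overrides and multipaths sections, following the usual multipath.conf precedence.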
+++ b/multipath/multipath.conf.5
@@ -751,6 +751,46 @@ The default is: \fB/etc/multipath/conf.d/\fR
 .
 .
 .TP
+.B san_path_err_threshold
+If set to a value greater than 0, multipathd will watch paths and check how many
+times a path has failed due to errors. If the number of failures on a particular
+path is greater than san_path_err_threshold, the path will not be reinstated
+until san_path_err_recovery_time. These path failures must occur within the
+san_path_err_threshold_window time frame; otherwise the path is considered good
+enough to reinstate.
+.RS
+.TP
+The default is: \fBno\fR
+.RE
+.
+.
+.TP
+.B san_path_err_threshold_window
+If set to a value greater than 0, multipathd will check whether the path failures
+have exceeded san_path_err_threshold within this time frame, i.e. within
+san_path_err_threshold_window. If so, the path will not be reinstated until
+san_path_err_recovery_time.
+The san_path_err_threshold_window value should be in seconds.
+.RS
+.TP
+The default is: \fBno\fR
+.RE
+.
+.
+.TP
+.B san_path_err_recovery_time
+If set to a value greater than 0, multipathd will make sure that when path failures
+have exceeded san_path_err_threshold within san_path_err_threshold_window, the path
+is placed in the failed state for the san_path_err_recovery_time duration. Once
+san_path_err_recovery_time has expired, the failed path is reinstated.
+The san_path_err_recovery_time value should be in seconds.
+.RS
+.TP
+The default is: \fBno\fR
+.RE
+.
+.
+.TP
 .B delay_watch_checks
 If set to a value greater than 0, multipathd will watch paths that have
 recently become valid for this many checks. If they fail again while they are
@@ -1015,6 +1055,12 @@ are taken from the \fIdefaults\fR or \fIdevices\fR section:
 .TP
 .B deferred_remove
 .TP
+.B san_path_err_threshold
+.TP
+.B san_path_err_threshold_window
+.TP
+.B san_path_err_recovery_time
+.TP
 .B delay_watch_checks
 .TP
 .B delay_wait_checks
@@ -1128,6 +1174,12 @@ section:
 .TP
 .B deferred_remove
 .TP
+.B san_path_err_threshold
+.TP
+.B san_path_err_threshold_window
+.TP
+.B san_path_err_recovery_time
+.TP
 .B delay_watch_checks
 .TP
 .B delay_wait_checks
@@ -1192,6 +1244,12 @@ the values are taken from the \fIdevices\fR or \fIdefaults\fR sections:
 .TP
 .B deferred_remove
 .TP
+.B san_path_err_threshold
+.TP
+.B san_path_err_threshold_window
+.TP
+.B san_path_err_recovery_time
+.TP
 .B delay_watch_checks
 .TP
 .B delay_wait_checks
diff --git a/multipathd/main.c b/multipathd/main.c
index adc3258..facfc03 100644
--- a/multipathd/main.c
+++ b/multipathd/main.c
@@ -1486,7 +1486,54 @@ void repair_path(struct path * pp)
 	checker_repair(&pp->checker);
 	LOG_MSG(1, checker_message(&pp->checker));
 }
+static int check_path_validity_err( struct path * pp){
+	struct timespec start_time;
+	int disable_reinstate = 0;
+
+	if (clock_gettime(CLOCK_MONOTONIC, &start_time) != 0)
+		start_time.tv_sec = 0;
+
+		/* If the number of path failures is more than san_path_err_threshold */
+		if((pp->mpp->san_path_err_threshold > 0)&& (pp->path_failures > pp->mpp->san_path_err_threshold)){
+			condlog(3,"\npath %s :hit the error threshold\n",pp->dev);
+
+			if(!pp->disable_reinstate){
+				/* If the error threshold was hit within the san_path_err_threshold_window
+				 * time frame, do not reinstate the path until san_path_err_recovery_time:
+				 * keep the path in the failed state for san_path_err_recovery_time so that
+				 * the customer can rectify the issue within this time. Once
+				 * san_path_err_recovery_time completes, the path is automatically reinstated.
+				 */
+				if((pp->mpp->san_path_err_threshold_window > 0) && 
+				   ((start_time.tv_sec - pp->failure_start_time) < pp->mpp->san_path_err_threshold_window)){
+					condlog(3,"\npath %s :hit the error threshold within the threshold window time\n",pp->dev);
+					disable_reinstate = 1; 
+					pp->dis_reinstante_time = start_time.tv_sec ;
+					pp->disable_reinstate = 1;
+				}else{
+					/* Even though the number of errors is greater than san_path_err_threshold,
+					 * since they did not occur within the san_path_err_threshold_window time frame
+					 * we should not take these errors into account and must re-watch the errors.
+					 */
+					pp->path_failures = 0;
+					pp->disable_reinstate = 0;
+
+				}
+			}
+			if(pp->disable_reinstate){
+				disable_reinstate = 1;
+				if((pp->mpp->san_path_err_recovery_time > 0) && 
+				   (start_time.tv_sec - pp->dis_reinstante_time ) > pp->mpp->san_path_err_recovery_time){
+					disable_reinstate =0;
+					pp->path_failures = 0;
+					pp->disable_reinstate = 0;
+					 condlog(3,"\npath %s :reinstate the path after err recovery time\n",pp->dev);
+				}
 
+			}
+		}
+	return  disable_reinstate;
+}
 /*
  * Returns '1' if the path has been checked, '-1' if it was blacklisted
  * and '0' otherwise
@@ -1503,7 +1550,11 @@ check_path (struct vectors * vecs, struct path * pp, int ticks)
 	int retrigger_tries, checkint;
 	struct config *conf;
 	int ret;
+	struct timespec start_time;
 
+	if (clock_gettime(CLOCK_MONOTONIC, &start_time) != 0)
+		start_time.tv_sec = 0;
+	
 	if ((pp->initialized == INIT_OK ||
 	     pp->initialized == INIT_REQUESTED_UDEV) && !pp->mpp)
 		return 0;
@@ -1615,12 +1666,18 @@ check_path (struct vectors * vecs, struct path * pp, int ticks)
 	 * and if target supports only implicit tpgs mode.
 	 * this will prevent unnecessary i/o by dm on stand-by
 	 * paths if there are no other active paths in map.
+	 *
+	 * when path failures have exceeded the san_path_err_threshold
+	 * within san_path_err_threshold_window then we don't reinstate
+	 * the failed path for san_path_err_recovery_time
 	 */
-	disable_reinstate = (newstate == PATH_GHOST &&
+	disable_reinstate = ((newstate == PATH_GHOST &&
 			    pp->mpp->nr_active == 0 &&
-			    pp->tpgs == TPGS_IMPLICIT) ? 1 : 0;
+			    pp->tpgs == TPGS_IMPLICIT) ? 1 :
+			    check_path_validity_err(pp));
 
 	pp->chkrstate = newstate;
+
 	if (newstate != pp->state) {
 		int oldstate = pp->state;
 		pp->state = newstate;
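For reference, once merged, the three options added by this patch could be enabled together with a multipath.conf fragment like the following. The values are purely illustrative (5 failures within one hour, then a three-hour hold-down), matching the use case discussed later in this thread:

```
defaults {
        san_path_err_threshold          5
        san_path_err_threshold_window   3600
        san_path_err_recovery_time      10800
}
```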

[-- Attachment #4: Type: text/plain, Size: 0 bytes --]



^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: deterministic io throughput in multipath
  2017-01-16 11:19       ` Muneendra Kumar M
@ 2017-01-17  1:04         ` Benjamin Marzinski
  2017-01-17 10:43           ` Muneendra Kumar M
  2017-01-23 11:02           ` Muneendra Kumar M
  0 siblings, 2 replies; 21+ messages in thread
From: Benjamin Marzinski @ 2017-01-17  1:04 UTC (permalink / raw)
  To: Muneendra Kumar M; +Cc: dm-devel

On Mon, Jan 16, 2017 at 11:19:19AM +0000, Muneendra Kumar M wrote:
>    Hi Ben,
>    After the below discussion we  came with the approach which will meet our
>    requirement.
>    I have attached the patch which is working good in our field tests.
>    Could you please review the attached patch and provide us your valuable
>    comments .

I can see a number of issues with this patch.

First, some nit-picks:
- I assume "dis_reinstante_time" should be "dis_reinstate_time"

- The indenting in check_path_validity_err is wrong, which made it
  confusing until I noticed that

if (clock_gettime(CLOCK_MONOTONIC, &start_time) != 0)

  doesn't have an open brace, and shouldn't indent the rest of the
  function.

- You call clock_gettime in check_path, but never use the result.

- In dict.c, instead of writing your own functions that are the same as
  the *_delay_checks functions, you could make those functions generic
  and use them for both.  To go match the other generic function names
  they would probably be something like

set_off_int_undef

print_off_int_undef

  You would also need to change DELAY_CHECKS_* and ERR_CHECKS_* to
  point to some common enum that you created, the way
  user_friendly_names_states (to name one of many) does. The generic
  enum used by *_off_int_undef would be something like.

enum no_undef {
	NU_NO = -1,
	NU_UNDEF = 0,
}

  The idea is to try to cut down on the number of functions that are
  simply copy-pasting other functions in dict.c.
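A minimal sketch of what the shared helpers could look like, following the names suggested above. The real dict.c handlers parse a vector of config words, so the signatures here are simplified for illustration; only the no/int/undef logic is the point:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical generic handlers, per the review's suggestion: one
 * set_off_int_undef()/print_off_int_undef() pair shared by the
 * *_delay_checks and *_err_checks options, instead of a copy-pasted
 * function pair per option. */
enum no_undef {
	NU_NO = -1,
	NU_UNDEF = 0,
};

static int set_off_int_undef(const char *buff, int *int_ptr)
{
	if (!strcmp(buff, "no") || !strcmp(buff, "0"))
		*int_ptr = NU_NO;
	else if ((*int_ptr = atoi(buff)) < 1)
		*int_ptr = NU_UNDEF;
	return 0;
}

static int print_off_int_undef(char *buff, int len, int value)
{
	switch (value) {
	case NU_UNDEF:
		return 0;	/* unset: print nothing */
	case NU_NO:
		return snprintf(buff, len, "\"no\"");
	default:
		return snprintf(buff, len, "%i", value);
	}
}
```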


Those are all minor cleanup issues, but there are some bigger problems.

Instead of checking if san_path_err_threshold,
san_path_err_threshold_window, and san_path_err_recovery_time are
greater than zero seperately, you should probably check them all at the
start of check_path_validity_err, and return 0 unless they all are set.
Right now, if a user sets san_path_err_threshold and
san_path_err_threshold_window but not san_path_err_recovery_time, their
path will never recover after it hits the error threshold.  I'm pretty
sure that you don't mean to permanently disable the paths.


time_t is a signed type, which means that if you get the clock time in
update_multpath and then fail to get the clock time in
check_path_validity_err, this check:

(start_time.tv_sec - pp->failure_start_time) < pp->mpp->san_path_err_threshold_window

will always be true.  I realize that clock_gettime is very unlikely to
fail.  But if it does, probably the safest thing to so is to just
immediately return 0 in check_path_validity_err.


The way you set path_failures in update_multipath may not get you what
you want.  It will only count path failures found by the kernel, and not
the path checker.  If the check_path finds the error, pp->state will be
set to PATH_DOWN before pp->dmstate is set to PSTATE_FAILED. That means
you will not increment path_failures. Perhaps this is what you want, but
I would assume that you would want to count every time the path goes
down regardless of whether multipathd or the kernel noticed it.


I'm not super enthusiastic about how the san_path_err_threshold_window
works.  First, it starts counting from when the path goes down, so if
the path takes long enough to get restored, and then fails immediately,
it can just keep failing and it will never hit the
san_path_err_threshold_window, since it spends so much of that time with
the path failed.  Also, the window gets set on the first error, and
never reset until the number of errors is over the threshold.  This
means that if you get one early error and then a bunch of errors much
later, you will go for (2 x san_path_err_threshold) - 1 errors until you
stop reinstating the path, because of the window reset in the middle of
the string of errors.  It seems like a better idea would be to have
check_path_validity_err reset path_failures as soon as it notices that
you are past san_path_err_threshold_window, instead of waiting till the
number of errors hits san_path_err_threshold.


If I was going to design this, I think I would have
san_path_err_threshold and san_path_err_recovery_time like you do, but
instead of having a san_path_err_threshold_window, I would have
something like san_path_err_forget_rate.  The idea is that every
san_path_err_forget_rate number of successful path checks you decrement
path_failures by 1. This means that there is no window after which you
reset.  If the path failures come in faster than the forget rate, you
will eventually hit the error threshold. This also has the benefit of
easily not counting time when the path was down as time where the path
wasn't having problems. But if you don't like my idea, yours will
work fine with some polish.
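The forget-rate idea can be modeled in a few lines. This is a toy illustration of the proposal, not code from the patch; the struct and function names are made up, and in multipathd the success/failure hooks would correspond to path-checker results:

```c
#include <assert.h>

/* Toy model of san_path_err_forget_rate: every forget_rate consecutive
 * successful checks forgets one recorded failure, so only failures that
 * arrive faster than the forget rate can accumulate to the threshold.
 * There is no fixed window, and checks only run while the path is up,
 * so downtime never counts as trouble-free time. */
struct toy_path {
	int path_failures;
	int san_path_err_forget_rate;	/* successes per forgotten failure */
	int forget_tick;		/* successes since last decrement */
};

static void on_check_success(struct toy_path *pp)
{
	if (pp->san_path_err_forget_rate <= 0 || pp->path_failures == 0)
		return;
	if (++pp->forget_tick >= pp->san_path_err_forget_rate) {
		pp->forget_tick = 0;
		pp->path_failures--;
	}
}

static void on_check_failure(struct toy_path *pp)
{
	pp->path_failures++;
}

static int over_threshold(const struct toy_path *pp, int threshold)
{
	return threshold > 0 && pp->path_failures > threshold;
}
```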

-Ben


>    Below are the files that has been changed .
>     
>    libmultipath/config.c      |  3 +++
>    libmultipath/config.h      |  9 +++++++++
>    libmultipath/configure.c   |  3 +++
>    libmultipath/defaults.h    |  1 +
>    libmultipath/dict.c             | 80
>    ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>    libmultipath/dict.h        |  1 +
>    libmultipath/propsel.c     | 44
>    ++++++++++++++++++++++++++++++++++++++++++++
>    libmultipath/propsel.h     |  6 ++++++
>    libmultipath/structs.h     | 12 +++++++++++-
>    libmultipath/structs_vec.c | 10 ++++++++++
>    multipath/multipath.conf.5 | 58
>    ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>    multipathd/main.c          | 61
>    +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
>     
>    We have added three new config parameters whose description is below.
>    1.san_path_err_threshold:
>            If set to a value greater than 0, multipathd will watch paths and
>    check how many times a path has been failed due to errors. If the number
>    of failures on a particular path is greater then the
>    san_path_err_threshold then the path will not  reinstate  till
>    san_path_err_recovery_time. These path failures should occur within a
>    san_path_err_threshold_window time frame, if not we will consider the path
>    is good enough to reinstate.
>     
>    2.san_path_err_threshold_window:
>            If set to a value greater than 0, multipathd will check whether
>    the path failures has exceeded  the san_path_err_threshold within this
>    time frame i.e san_path_err_threshold_window . If so we will not reinstate
>    the path till          san_path_err_recovery_time.
>     
>    3.san_path_err_recovery_time:
>    If set to a value greater than 0, multipathd will make sure that when path
>    failures has exceeded the san_path_err_threshold within
>    san_path_err_threshold_window then the path  will be placed in failed
>    state for san_path_err_recovery_time duration. Once
>    san_path_err_recovery_time has timeout  we will reinstate the failed path
>    .
>     
>    Regards,
>    Muneendra.
>     
>    -----Original Message-----
>    From: Muneendra Kumar M
>    Sent: Wednesday, January 04, 2017 6:56 PM
>    To: 'Benjamin Marzinski' <bmarzins@redhat.com>
>    Cc: dm-devel@redhat.com
>    Subject: RE: [dm-devel] deterministic io throughput in multipath
>     
>    Hi Ben,
>    Thanks for the information.
>     
>    Regards,
>    Muneendra.
>     
>    -----Original Message-----
>    From: Benjamin Marzinski [[1]mailto:bmarzins@redhat.com]
>    Sent: Tuesday, January 03, 2017 10:42 PM
>    To: Muneendra Kumar M <[2]mmandala@Brocade.com>
>    Cc: [3]dm-devel@redhat.com
>    Subject: Re: [dm-devel] deterministic io throughput in multipath
>     
>    On Mon, Dec 26, 2016 at 09:42:48AM +0000, Muneendra Kumar M wrote:
>    > Hi Ben,
>    >
>    > If there are two paths on a dm-1 say sda and sdb as below.
>    >
>    > #  multipath -ll
>    >        mpathd (3600110d001ee7f0102050001cc0b6751) dm-1 SANBlaze,VLUN
>    MyLun
>    >        size=8.0M features='0' hwhandler='0' wp=rw
>    >        `-+- policy='round-robin 0' prio=50 status=active
>    >          |- 8:0:1:0  sda 8:48 active ready  running
>    >          `- 9:0:1:0  sdb 8:64 active ready  running         
>    >
>    > And on sda if iam seeing lot of errors due to which the sda path is
>    fluctuating from failed state to active state and vicevera.
>    >
>    > My requirement is something like this if sda is failed for more then 5
>    > times in a hour duration ,then I want to keep the sda in failed state
>    > for few hours (3hrs)
>    >
>    > And the data should travel only thorugh sdb path.
>    > Will this be possible with the below parameters.
>     
>    No. delay_watch_checks sets how many path checks you watch a path that has
>    recently come back from the failed state. If the path fails again within
>    this time, multipath device delays it.  This means that the delay is
>    always trigger by two failures within the time limit.  It's possible to
>    adapt this to count numbers of failures, and act after a certain number
>    within a certain timeframe, but it would take a bit more work.
>     
>    delay_wait_checks doesn't guarantee that it will delay for any set length
>    of time.  Instead, it sets the number of consecutive successful path
>    checks that must occur before the path is usable again. You could set this
>    for 3 hours of path checks, but if a check failed during this time, you
>    would restart the 3 hours over again.
>     
>    -Ben
>     
>    > Can you just let me know what values I should add for delay_watch_checks
>    and delay_wait_checks.
>    >
>    > Regards,
>    > Muneendra.
>    >
>    >
>    >
>    > -----Original Message-----
>    > From: Muneendra Kumar M
>    > Sent: Thursday, December 22, 2016 11:10 AM
>    > To: 'Benjamin Marzinski' <[4]bmarzins@redhat.com>
>    > Cc: [5]dm-devel@redhat.com
>    > Subject: RE: [dm-devel] deterministic io throughput in multipath
>    >
>    > Hi Ben,
>    >
>    > Thanks for the reply.
>    > I will look into this parameters will do the internal testing and let
>    you know the results.
>    >
>    > Regards,
>    > Muneendra.
>    >
>    > -----Original Message-----
>    > From: Benjamin Marzinski [[6]mailto:bmarzins@redhat.com]
>    > Sent: Wednesday, December 21, 2016 9:40 PM
>    > To: Muneendra Kumar M <[7]mmandala@Brocade.com>
>    > Cc: [8]dm-devel@redhat.com
>    > Subject: Re: [dm-devel] deterministic io throughput in multipath
>    >
>    > Have you looked into the delay_watch_checks and delay_wait_checks
>    configuration parameters?  The idea behind them is to minimize the use of
>    paths that are intermittently failing.
>    >
>    > -Ben
>    >
>    > On Mon, Dec 19, 2016 at 11:50:36AM +0000, Muneendra Kumar M wrote:
>    > >    Customers using Linux host (mostly RHEL host) using a SAN network
>    for
>    > >    block storage, complain the Linux multipath stack is not resilient
>    to
>    > >    handle non-deterministic storage network behaviors. This has caused
>    many
>    > >    customer move away to non-linux based servers. The intent of the
>    below
>    > >    patch and the prevailing issues are given below. With the below
>    design we
>    > >    are seeing the Linux multipath stack becoming resilient to such
>    network
>    > >    issues. We hope by getting this patch accepted will help in more
>    Linux
>    > >    server adoption that use SAN network.
>    > >
>    > >    I have already sent the design details to the community in a
>    different
>    > >    mail chain and the details are available in the below link.
>    > >
>    > >   
>    [1][9]https://www.redhat.com/archives/dm-devel/2016-December/msg00122.html
>    .
>    > >
>    > >    Can you please go through the design and send the comments to us.
>    > >
>    > >     
>    > >
>    > >    Regards,
>    > >
>    > >    Muneendra.
>    > >
>    > >     
>    > >
>    > >     
>    > >
>    > > References
>    > >
>    > >    Visible links
>    > >    1.
>    > >
>    > > [10]https://www.redhat.com/archives/dm-devel/2016-December/msg00122.html
>    >
>    > > --
>    > > dm-devel mailing list
>    > > [11]dm-devel@redhat.com
>    > >
>    > > [12]https://www.redhat.com/mailman/listinfo/dm-devel
>     
> 
> References
> 
>    Visible links
>    1. mailto:bmarzins@redhat.com
>    2. mailto:mmandala@brocade.com
>    3. mailto:dm-devel@redhat.com
>    4. mailto:bmarzins@redhat.com
>    5. mailto:dm-devel@redhat.com
>    6. mailto:bmarzins@redhat.com
>    7. mailto:mmandala@brocade.com
>    8. mailto:dm-devel@redhat.com
>    9. https://www.redhat.com/archives/dm-devel/2016-December/msg00122.html
>   10. https://www.redhat.com/archives/dm-devel/2016-December/msg00122.html
>   11. mailto:dm-devel@redhat.com
>   12. https://www.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: deterministic io throughput in multipath
  2017-01-17  1:04         ` Benjamin Marzinski
@ 2017-01-17 10:43           ` Muneendra Kumar M
  2017-01-23 11:02           ` Muneendra Kumar M
  1 sibling, 0 replies; 21+ messages in thread
From: Muneendra Kumar M @ 2017-01-17 10:43 UTC (permalink / raw)
  To: Benjamin Marzinski; +Cc: dm-devel


[-- Attachment #1.1: Type: text/plain, Size: 14819 bytes --]

Hi Ben,
Thanks for the review.
In dict.c, I will make the functions generic so they can be used by both the delay_checks and err_checks options.

We want to increment the path failures every time the path goes down, regardless of whether multipathd or the kernel noticed the failure. Thanks for pointing this out.

I completely agree with the idea you mentioned below of replacing san_path_err_threshold_window with san_path_err_forget_rate. This avoids counting time when the path was down as time when the path wasn't having problems.

I will incorporate all the changes mentioned below and resend the patch once testing is done.

Regards,
Muneendra.



-----Original Message-----
From: Benjamin Marzinski [mailto:bmarzins@redhat.com]
Sent: Tuesday, January 17, 2017 6:35 AM
To: Muneendra Kumar M <mmandala@Brocade.com>
Cc: dm-devel@redhat.com
Subject: Re: [dm-devel] deterministic io throughput in multipath

On Mon, Jan 16, 2017 at 11:19:19AM +0000, Muneendra Kumar M wrote:
>    Hi Ben,
>    After the below discussion we  came with the approach which will meet our
>    requirement.
>    I have attached the patch which is working good in our field tests.
>    Could you please review the attached patch and provide us your valuable
>    comments .

I can see a number of issues with this patch.

First, some nit-picks:
- I assume "dis_reinstante_time" should be "dis_reinstate_time"

- The indenting in check_path_validity_err is wrong, which made it
  confusing until I noticed that

if (clock_gettime(CLOCK_MONOTONIC, &start_time) != 0)

  doesn't have an open brace, and shouldn't indent the rest of the
  function.

- You call clock_gettime in check_path, but never use the result.

- In dict.c, instead of writing your own functions that are the same as
  the *_delay_checks functions, you could make those functions generic
  and use them for both.  To go match the other generic function names
  they would probably be something like

set_off_int_undef

print_off_int_undef

  You would also need to change DELAY_CHECKS_* and ERR_CHECKS_* to
  point to some common enum that you created, the way
  user_friendly_names_states (to name one of many) does. The generic
  enum used by *_off_int_undef would be something like.

enum no_undef {
        NU_NO = -1,
        NU_UNDEF = 0,
}

  The idea is to try to cut down on the number of functions that are
  simply copy-pasting other functions in dict.c.


Those are all minor cleanup issues, but there are some bigger problems.

Instead of checking if san_path_err_threshold, san_path_err_threshold_window, and san_path_err_recovery_time are greater than zero seperately, you should probably check them all at the start of check_path_validity_err, and return 0 unless they all are set.
Right now, if a user sets san_path_err_threshold and san_path_err_threshold_window but not san_path_err_recovery_time, their path will never recover after it hits the error threshold.  I'm pretty sure that you don't mean to permanently disable the paths.


time_t is a signed type, which means that if you get the clock time in update_multpath and then fail to get the clock time in check_path_validity_err, this check:

(start_time.tv_sec - pp->failure_start_time) < pp->mpp->san_path_err_threshold_window

will always be true.  I realize that clock_gettime is very unlikely to fail.  But if it does, probably the safest thing to so is to just immediately return 0 in check_path_validity_err.


The way you set path_failures in update_multipath may not get you what you want.  It will only count path failures found by the kernel, and not the path checker.  If the check_path finds the error, pp->state will be set to PATH_DOWN before pp->dmstate is set to PSTATE_FAILED. That means you will not increment path_failures. Perhaps this is what you want, but I would assume that you would want to count every time the path goes down regardless of whether multipathd or the kernel noticed it.


I'm not super enthusiastic about how the san_path_err_threshold_window works.  First, it starts counting from when the path goes down, so if the path takes long enough to get restored, and then fails immediately, it can just keep failing and it will never hit the san_path_err_threshold_window, since it spends so much of that time with the path failed.  Also, the window gets set on the first error, and never reset until the number of errors is over the threshold.  This means that if you get one early error and then a bunch of errors much later, you will go for (2 x san_path_err_threshold) - 1 errors until you stop reinstating the path, because of the window reset in the middle of the string of errors.  It seems like a better idea would be to have check_path_validity_err reset path_failures 
 as soon as it notices that you are past san_path_err_threshold_window, instead of waiting till the number of errors hits san_path_err_threshold.


If I was going to design this, I think I would have san_path_err_threshold and san_path_err_recovery_time like you do, but instead of having a san_path_err_threshold_window, I would have something like san_path_err_forget_rate.  The idea is that every san_path_err_forget_rate number of successful path checks you decrement path_failures by 1. This means that there is no window after which you reset.  If the path failures come in faster than the forget rate, you will eventually hit the error threshold. This also has the benefit of easily not counting time when the path was down as time where the path wasn't having problems. But if you don't like my idea, yours will work fine with some polish.

-Ben


>    Below are the files that has been changed .
>
>    libmultipath/config.c      |  3 +++
>    libmultipath/config.h      |  9 +++++++++
>    libmultipath/configure.c   |  3 +++
>    libmultipath/defaults.h    |  1 +
>    libmultipath/dict.c             | 80
>    ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>    libmultipath/dict.h        |  1 +
>    libmultipath/propsel.c     | 44
>    ++++++++++++++++++++++++++++++++++++++++++++
>    libmultipath/propsel.h     |  6 ++++++
>    libmultipath/structs.h     | 12 +++++++++++-
>    libmultipath/structs_vec.c | 10 ++++++++++
>    multipath/multipath.conf.5 | 58
>    ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>    multipathd/main.c          | 61
>    +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
>
>    We have added three new config parameters whose description is below.
>    1.san_path_err_threshold:
>            If set to a value greater than 0, multipathd will watch paths and
>    check how many times a path has been failed due to errors. If the number
>    of failures on a particular path is greater then the
>    san_path_err_threshold then the path will not  reinstate  till
>    san_path_err_recovery_time. These path failures should occur within a
>    san_path_err_threshold_window time frame, if not we will consider the path
>    is good enough to reinstate.
>
>    2.san_path_err_threshold_window:
>            If set to a value greater than 0, multipathd will check whether
>    the path failures has exceeded  the san_path_err_threshold within this
>    time frame i.e san_path_err_threshold_window . If so we will not reinstate
>    the path till          san_path_err_recovery_time.
>
>    3.san_path_err_recovery_time:
>    If set to a value greater than 0, multipathd will make sure that when path
>    failures has exceeded the san_path_err_threshold within
>    san_path_err_threshold_window then the path  will be placed in failed
>    state for san_path_err_recovery_time duration. Once
>    san_path_err_recovery_time has timeout  we will reinstate the failed path
>    .
>
>    Regards,
>    Muneendra.
>
>    -----Original Message-----
>    From: Muneendra Kumar M
>    Sent: Wednesday, January 04, 2017 6:56 PM
>    To: 'Benjamin Marzinski' <bmarzins@redhat.com<mailto:bmarzins@redhat.com>>
>    Cc: dm-devel@redhat.com<mailto:dm-devel@redhat.com>
>    Subject: RE: [dm-devel] deterministic io throughput in multipath
>
>    Hi Ben,
>    Thanks for the information.
>
>    Regards,
>    Muneendra.
>
>    -----Original Message-----
>    From: Benjamin Marzinski [[1]mailto:bmarzins@redhat.com]
>    Sent: Tuesday, January 03, 2017 10:42 PM
>    To: Muneendra Kumar M <[2]mmandala@Brocade.com<mailto:mmandala@Brocade.com>>
>    Cc: [3]dm-devel@redhat.com<mailto:dm-devel@redhat.com>
>    Subject: Re: [dm-devel] deterministic io throughput in multipath
>
>    On Mon, Dec 26, 2016 at 09:42:48AM +0000, Muneendra Kumar M wrote:
>    > Hi Ben,
>    >
>    > If there are two paths on a dm-1 say sda and sdb as below.
>    >
>    > #  multipath -ll
>    >        mpathd (3600110d001ee7f0102050001cc0b6751) dm-1 SANBlaze,VLUN
>    MyLun
>    >        size=8.0M features='0' hwhandler='0' wp=rw
>    >        `-+- policy='round-robin 0' prio=50 status=active
>    >          |- 8:0:1:0  sda 8:48 active ready  running
>    >          `- 9:0:1:0  sdb 8:64 active ready  running
>    >
>    > And on sda if iam seeing lot of errors due to which the sda path is
>    fluctuating from failed state to active state and vicevera.
>    >
>    > My requirement is something like this: if sda has failed more than 5
>    > times in an hour, then I want to keep sda in the failed state
>    > for a few hours (3 hrs).
>    >
>    > And the data should travel only through the sdb path.
>    > Will this be possible with the below parameters?
>
>    No. delay_watch_checks sets for how many path checks you watch a path that
>    has recently come back from the failed state. If the path fails again within
>    this time, the multipath device delays it.  This means that the delay is
>    always triggered by two failures within the time limit.  It's possible to
>    adapt this to count numbers of failures, and act after a certain number
>    within a certain timeframe, but it would take a bit more work.
>
>    delay_wait_checks doesn't guarantee that it will delay for any set length
>    of time.  Instead, it sets the number of consecutive successful path
>    checks that must occur before the path is usable again. You could set this
>    for 3 hours of path checks, but if a check failed during this time, you
>    would restart the 3 hours over again.
>
>    -Ben
>
>    > Can you just let me know what values I should add for delay_watch_checks
>    and delay_wait_checks.
>    >
>    > Regards,
>    > Muneendra.
>    >
>    >
>    >
>    > -----Original Message-----
>    > From: Muneendra Kumar M
>    > Sent: Thursday, December 22, 2016 11:10 AM
>    > To: 'Benjamin Marzinski' <[4]bmarzins@redhat.com<mailto:bmarzins@redhat.com>>
>    > Cc: [5]dm-devel@redhat.com<mailto:dm-devel@redhat.com>
>    > Subject: RE: [dm-devel] deterministic io throughput in multipath
>    >
>    > Hi Ben,
>    >
>    > Thanks for the reply.
>    > I will look into this parameters will do the internal testing and let
>    you know the results.
>    >
>    > Regards,
>    > Muneendra.
>    >
>    > -----Original Message-----
>    > From: Benjamin Marzinski [[6]mailto:bmarzins@redhat.com]
>    > Sent: Wednesday, December 21, 2016 9:40 PM
>    > To: Muneendra Kumar M <[7]mmandala@Brocade.com<mailto:mmandala@Brocade.com>>
>    > Cc: [8]dm-devel@redhat.com<mailto:dm-devel@redhat.com>
>    > Subject: Re: [dm-devel] deterministic io throughput in multipath
>    >
>    > Have you looked into the delay_watch_checks and delay_wait_checks
>    configuration parameters?  The idea behind them is to minimize the use of
>    paths that are intermittently failing.
>    >
>    > -Ben
>    >
>    > On Mon, Dec 19, 2016 at 11:50:36AM +0000, Muneendra Kumar M wrote:
>    > >    Customers using Linux hosts (mostly RHEL hosts) with a SAN network for
>    > >    block storage complain that the Linux multipath stack is not resilient
>    > >    in handling non-deterministic storage network behaviors. This has
>    > >    caused many customers to move away to non-Linux-based servers. The
>    > >    intent of the below patch and the prevailing issues are given below.
>    > >    With the below design, we are seeing the Linux multipath stack
>    > >    becoming resilient to such network issues. We hope that getting this
>    > >    patch accepted will help in more Linux server adoption that uses SAN
>    > >    networks.
>    > >
>    > >    I have already sent the design details to the community in a different
>    > >    mail chain, and the details are available in the below link.
>    > >
>    > >
>    [1][9]https://www.redhat.com/archives/dm-devel/2016-December/msg00122.html.
>    > >
>    > >    Can you please go through the design and send the comments to us.
>    > >
>    > >
>    > >
>    > >    Regards,
>    > >
>    > >    Muneendra.
>    > >
>    > >
>    > >
>    > >
>    > >
>    > > References
>    > >
>    > >    Visible links
>    > >    1. https://www.redhat.com/archives/dm-devel/2016-December/msg00122.html
>    >
>    > > --
>    > > dm-devel mailing list
>    > > [11]dm-devel@redhat.com<mailto:dm-devel@redhat.com>
>    > >
>    > > https://www.redhat.com/mailman/listinfo/dm-devel
>
>
> References
>
>    Visible links
>    1. mailto:bmarzins@redhat.com
>    2. mailto:mmandala@brocade.com
>    3. mailto:dm-devel@redhat.com
>    4. mailto:bmarzins@redhat.com
>    5. mailto:dm-devel@redhat.com
>    6. mailto:bmarzins@redhat.com
>    7. mailto:mmandala@brocade.com
>    8. mailto:dm-devel@redhat.com
>    9. https://www.redhat.com/archives/dm-devel/2016-December/msg00122.html
>   10. https://urldefense.proofpoint.com/v2/url?u=https-3A__www.redhat.com_
>   11. mailto:dm-devel@redhat.com
>   12.
> https://urldefense.proofpoint.com/v2/url?u=https-3A__www.redhat.com_





[-- Attachment #1.2: Type: text/html, Size: 28081 bytes --]

[-- Attachment #2: Type: text/plain, Size: 0 bytes --]



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: deterministic io throughput in multipath
  2017-01-17  1:04         ` Benjamin Marzinski
  2017-01-17 10:43           ` Muneendra Kumar M
@ 2017-01-23 11:02           ` Muneendra Kumar M
  2017-01-25  9:28             ` Benjamin Marzinski
  1 sibling, 1 reply; 21+ messages in thread
From: Muneendra Kumar M @ 2017-01-23 11:02 UTC (permalink / raw)
  To: Benjamin Marzinski; +Cc: dm-devel


[-- Attachment #1.1: Type: text/plain, Size: 16054 bytes --]

Hi Ben,
I have made the changes as per the below review comments.

Could you please review the attached patch and provide us your valuable comments.

Below are the files that have been changed:

libmultipath/config.c      |  3 +++
 libmultipath/config.h      |  9 +++++++++
 libmultipath/configure.c   |  3 +++
 libmultipath/defaults.h    |  3 ++-
 libmultipath/dict.c        | 84 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++------------------------
 libmultipath/dict.h        |  3 +--
 libmultipath/propsel.c     | 48 ++++++++++++++++++++++++++++++++++++++++++++++--
 libmultipath/propsel.h     |  3 +++
 libmultipath/structs.h     | 14 ++++++++++----
 libmultipath/structs_vec.c |  6 ++++++
 multipath/multipath.conf.5 | 57 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 multipathd/main.c          | 70 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++---
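
For reference, a hypothetical multipath.conf fragment exercising the three new tunables might look like the following (the values are purely illustrative, not recommendations):

```
defaults {
        san_path_err_threshold      5
        san_path_err_forget_rate    100
        san_path_err_recovery_time  10800
}
```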


Regards,
Muneendra.

_____________________________________________
From: Muneendra Kumar M
Sent: Tuesday, January 17, 2017 4:13 PM
To: 'Benjamin Marzinski' <bmarzins@redhat.com>
Cc: dm-devel@redhat.com
Subject: RE: [dm-devel] deterministic io throughput in multipath


Hi Ben,
Thanks for the review.
In dict.c, I will make generic functions which will be used by both delay_checks and err_checks.

We want to increment the path failures every time the path goes down, regardless of whether multipathd or the kernel noticed the failure.
Thanks for pointing this.

I completely agree with the idea you mentioned below of replacing san_path_err_threshold_window with san_path_err_forget_rate. This will avoid counting time when the path was down as time when the path wasn't having problems.

I will incorporate all the changes mentioned below and will resend the patch once the testing is done.

Regards,
Muneendra.



-----Original Message-----
From: Benjamin Marzinski [mailto:bmarzins@redhat.com]
Sent: Tuesday, January 17, 2017 6:35 AM
To: Muneendra Kumar M <mmandala@Brocade.com<mailto:mmandala@Brocade.com>>
Cc: dm-devel@redhat.com<mailto:dm-devel@redhat.com>
Subject: Re: [dm-devel] deterministic io throughput in multipath

On Mon, Jan 16, 2017 at 11:19:19AM +0000, Muneendra Kumar M wrote:
>    Hi Ben,
>    After the below discussion we  came with the approach which will meet our
>    requirement.
>    I have attached the patch which is working good in our field tests.
>    Could you please review the attached patch and provide us your valuable
>    comments .

I can see a number of issues with this patch.

First, some nit-picks:
- I assume "dis_reinstante_time" should be "dis_reinstate_time"

- The indenting in check_path_validity_err is wrong, which made it
  confusing until I noticed that

if (clock_gettime(CLOCK_MONOTONIC, &start_time) != 0)

  doesn't have an open brace, and shouldn't indent the rest of the
  function.

- You call clock_gettime in check_path, but never use the result.

- In dict.c, instead of writing your own functions that are the same as
  the *_delay_checks functions, you could make those functions generic
  and use them for both.  To go match the other generic function names
  they would probably be something like

set_off_int_undef

print_off_int_undef

  You would also need to change DELAY_CHECKS_* and ERR_CHECKS_* to
  point to some common enum that you created, the way
  user_friendly_names_states (to name one of many) does. The generic
  enum used by *_off_int_undef would be something like.

enum no_undef {
        NU_NO = -1,
        NU_UNDEF = 0,
}

  The idea is to try to cut down on the number of functions that are
  simply copy-pasting other functions in dict.c.


Those are all minor cleanup issues, but there are some bigger problems.

Instead of checking whether san_path_err_threshold, san_path_err_threshold_window, and san_path_err_recovery_time are greater than zero separately, you should probably check them all at the start of check_path_validity_err, and return 0 unless they are all set.
Right now, if a user sets san_path_err_threshold and san_path_err_threshold_window but not san_path_err_recovery_time, their path will never recover after it hits the error threshold.  I'm pretty sure that you don't mean to permanently disable the paths.


time_t is a signed type, which means that if you get the clock time in update_multipath and then fail to get the clock time in check_path_validity_err, this check:
start_time.tv_sec - pp->failure_start_time) < pp->mpp->san_path_err_threshold_window

will always be true.  I realize that clock_gettime is very unlikely to fail.  But if it does, probably the safest thing to do is to just immediately return 0 in check_path_validity_err.


The way you set path_failures in update_multipath may not get you what you want.  It will only count path failures found by the kernel, and not the path checker.  If the check_path finds the error, pp->state will be set to PATH_DOWN before pp->dmstate is set to PSTATE_FAILED. That means you will not increment path_failures. Perhaps this is what you want, but I would assume that you would want to count every time the path goes down regardless of whether multipathd or the kernel noticed it.


I'm not super enthusiastic about how the san_path_err_threshold_window works.  First, it starts counting from when the path goes down, so if the path takes long enough to get restored, and then fails immediately, it can just keep failing and it will never hit the san_path_err_threshold_window, since it spends so much of that time with the path failed.  Also, the window gets set on the first error, and never reset until the number of errors is over the threshold.  This means that if you get one early error and then a bunch of errors much later, you will go for (2 x san_path_err_threshold) - 1 errors until you stop reinstating the path, because of the window reset in the middle of the string of errors.  It seems like a better idea would be to have check_path_validity_err reset path_failures as soon as it notices that you are past san_path_err_threshold_window, instead of waiting till the number of errors hits san_path_err_threshold.


If I was going to design this, I think I would have san_path_err_threshold and san_path_err_recovery_time like you do, but instead of having a san_path_err_threshold_window, I would have something like san_path_err_forget_rate.  The idea is that after every san_path_err_forget_rate successful path checks, you decrement path_failures by 1. This means that there is no window after which you reset.  If the path failures come in faster than the forget rate, you will eventually hit the error threshold. This also has the benefit of naturally not counting time when the path was down as time when the path wasn't having problems. But if you don't like my idea, yours will work fine with some polish.

-Ben


>    Below are the files that has been changed .
>
>    libmultipath/config.c      |  3 +++
>    libmultipath/config.h      |  9 +++++++++
>    libmultipath/configure.c   |  3 +++
>    libmultipath/defaults.h    |  1 +
>    libmultipath/dict.c             | 80
>    ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>    libmultipath/dict.h        |  1 +
>    libmultipath/propsel.c     | 44
>    ++++++++++++++++++++++++++++++++++++++++++++
>    libmultipath/propsel.h     |  6 ++++++
>    libmultipath/structs.h     | 12 +++++++++++-
>    libmultipath/structs_vec.c | 10 ++++++++++
>    multipath/multipath.conf.5 | 58
>    ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>    multipathd/main.c          | 61
>    +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
>
>    We have added three new config parameters whose description is below.
>    1.san_path_err_threshold:
>            If set to a value greater than 0, multipathd will watch paths and
>    check how many times a path has failed due to errors. If the number
>    of failures on a particular path is greater than the
>    san_path_err_threshold, the path will not be reinstated until
>    san_path_err_recovery_time. These path failures should occur within a
>    san_path_err_threshold_window time frame; if not, we will consider the
>    path good enough to reinstate.
>
>    2.san_path_err_threshold_window:
>            If set to a value greater than 0, multipathd will check whether
>    the path failures have exceeded the san_path_err_threshold within this
>    time frame, i.e. san_path_err_threshold_window. If so, we will not
>    reinstate the path until san_path_err_recovery_time.
>
>    3.san_path_err_recovery_time:
>    If set to a value greater than 0, multipathd will make sure that when path
>    failures have exceeded the san_path_err_threshold within
>    san_path_err_threshold_window, the path will be placed in the failed
>    state for the san_path_err_recovery_time duration. Once
>    san_path_err_recovery_time has elapsed, we will reinstate the failed path.
>
>    Regards,
>    Muneendra.
>
>    -----Original Message-----
>    From: Muneendra Kumar M
>    Sent: Wednesday, January 04, 2017 6:56 PM
>    To: 'Benjamin Marzinski' <bmarzins@redhat.com<mailto:bmarzins@redhat.com>>
>    Cc: dm-devel@redhat.com<mailto:dm-devel@redhat.com>
>    Subject: RE: [dm-devel] deterministic io throughput in multipath
>
>    Hi Ben,
>    Thanks for the information.
>
>    Regards,
>    Muneendra.
>
>    -----Original Message-----
>    From: Benjamin Marzinski [[1]mailto:bmarzins@redhat.com]
>    Sent: Tuesday, January 03, 2017 10:42 PM
>    To: Muneendra Kumar M <[2]mmandala@Brocade.com<mailto:mmandala@Brocade.com>>
>    Cc: [3]dm-devel@redhat.com<mailto:dm-devel@redhat.com>
>    Subject: Re: [dm-devel] deterministic io throughput in multipath
>
>    On Mon, Dec 26, 2016 at 09:42:48AM +0000, Muneendra Kumar M wrote:
>    > Hi Ben,
>    >
>    > If there are two paths on a dm-1 say sda and sdb as below.
>    >
>    > #  multipath -ll
>    >        mpathd (3600110d001ee7f0102050001cc0b6751) dm-1 SANBlaze,VLUN
>    MyLun
>    >        size=8.0M features='0' hwhandler='0' wp=rw
>    >        `-+- policy='round-robin 0' prio=50 status=active
>    >          |- 8:0:1:0  sda 8:48 active ready  running
>    >          `- 9:0:1:0  sdb 8:64 active ready  running
>    >
>    > And on sda, if I am seeing a lot of errors due to which the sda path is
>    fluctuating from the failed state to the active state and vice versa.
>    >
>    > My requirement is something like this: if sda has failed more than 5
>    > times in an hour, then I want to keep sda in the failed state
>    > for a few hours (3 hrs).
>    >
>    > And the data should travel only through the sdb path.
>    > Will this be possible with the below parameters?
>
>    No. delay_watch_checks sets for how many path checks you watch a path that
>    has recently come back from the failed state. If the path fails again within
>    this time, the multipath device delays it.  This means that the delay is
>    always triggered by two failures within the time limit.  It's possible to
>    adapt this to count numbers of failures, and act after a certain number
>    within a certain timeframe, but it would take a bit more work.
>
>    delay_wait_checks doesn't guarantee that it will delay for any set length
>    of time.  Instead, it sets the number of consecutive successful path
>    checks that must occur before the path is usable again. You could set this
>    for 3 hours of path checks, but if a check failed during this time, you
>    would restart the 3 hours over again.
>
>    -Ben
>
>    > Can you just let me know what values I should add for delay_watch_checks
>    and delay_wait_checks.
>    >
>    > Regards,
>    > Muneendra.
>    >
>    >
>    >
>    > -----Original Message-----
>    > From: Muneendra Kumar M
>    > Sent: Thursday, December 22, 2016 11:10 AM
>    > To: 'Benjamin Marzinski' <[4]bmarzins@redhat.com<mailto:bmarzins@redhat.com>>
>    > Cc: [5]dm-devel@redhat.com<mailto:dm-devel@redhat.com>
>    > Subject: RE: [dm-devel] deterministic io throughput in multipath
>    >
>    > Hi Ben,
>    >
>    > Thanks for the reply.
>    > I will look into this parameters will do the internal testing and let
>    you know the results.
>    >
>    > Regards,
>    > Muneendra.
>    >
>    > -----Original Message-----
>    > From: Benjamin Marzinski [[6]mailto:bmarzins@redhat.com]
>    > Sent: Wednesday, December 21, 2016 9:40 PM
>    > To: Muneendra Kumar M <[7]mmandala@Brocade.com<mailto:mmandala@Brocade.com>>
>    > Cc: [8]dm-devel@redhat.com<mailto:dm-devel@redhat.com>
>    > Subject: Re: [dm-devel] deterministic io throughput in multipath
>    >
>    > Have you looked into the delay_watch_checks and delay_wait_checks
>    configuration parameters?  The idea behind them is to minimize the use of
>    paths that are intermittently failing.
>    >
>    > -Ben
>    >
>    > On Mon, Dec 19, 2016 at 11:50:36AM +0000, Muneendra Kumar M wrote:
>    > >    Customers using Linux hosts (mostly RHEL hosts) with a SAN network for
>    > >    block storage complain that the Linux multipath stack is not resilient
>    > >    in handling non-deterministic storage network behaviors. This has
>    > >    caused many customers to move away to non-Linux-based servers. The
>    > >    intent of the below patch and the prevailing issues are given below.
>    > >    With the below design, we are seeing the Linux multipath stack
>    > >    becoming resilient to such network issues. We hope that getting this
>    > >    patch accepted will help in more Linux server adoption that uses SAN
>    > >    networks.
>    > >
>    > >    I have already sent the design details to the community in a different
>    > >    mail chain, and the details are available in the below link.
>    > >
>    > >
>    [1][9]https://www.redhat.com/archives/dm-devel/2016-December/msg00122.html.
>    > >
>    > >    Can you please go through the design and send the comments to us.
>    > >
>    > >
>    > >
>    > >    Regards,
>    > >
>    > >    Muneendra.
>    > >
>    > >
>    > >
>    > >
>    > >
>    > > References
>    > >
>    > >    Visible links
>    > >    1. https://www.redhat.com/archives/dm-devel/2016-December/msg00122.html
>    >
>    > > --
>    > > dm-devel mailing list
>    > > [11]dm-devel@redhat.com<mailto:dm-devel@redhat.com>
>    > >
>    > > https://www.redhat.com/mailman/listinfo/dm-devel
>
>
> References
>
>    Visible links
>    1. mailto:bmarzins@redhat.com
>    2. mailto:mmandala@brocade.com
>    3. mailto:dm-devel@redhat.com
>    4. mailto:bmarzins@redhat.com
>    5. mailto:dm-devel@redhat.com
>    6. mailto:bmarzins@redhat.com
>    7. mailto:mmandala@brocade.com
>    8. mailto:dm-devel@redhat.com
>    9. https://www.redhat.com/archives/dm-devel/2016-December/msg00122.html
>   10. https://urldefense.proofpoint.com/v2/url?u=https-3A__www.redhat.com_
>   11. mailto:dm-devel@redhat.com
>   12.
> https://urldefense.proofpoint.com/v2/url?u=https-3A__www.redhat.com_





[-- Attachment #1.2: Type: text/html, Size: 31945 bytes --]

[-- Attachment #2: san_path_error.patch --]
[-- Type: application/octet-stream, Size: 23007 bytes --]

diff --git a/libmultipath/config.c b/libmultipath/config.c
index 15ddbd8..be384af 100644
--- a/libmultipath/config.c
+++ b/libmultipath/config.c
@@ -348,6 +348,9 @@ merge_hwe (struct hwentry * dst, struct hwentry * src)
 	merge_num(delay_wait_checks);
 	merge_num(skip_kpartx);
 	merge_num(max_sectors_kb);
+	merge_num(san_path_err_threshold);
+	merge_num(san_path_err_forget_rate);
+	merge_num(san_path_err_recovery_time);
 
 	/*
 	 * Make sure features is consistent with
diff --git a/libmultipath/config.h b/libmultipath/config.h
index 9670020..9e47894 100644
--- a/libmultipath/config.h
+++ b/libmultipath/config.h
@@ -65,6 +65,9 @@ struct hwentry {
 	int deferred_remove;
 	int delay_watch_checks;
 	int delay_wait_checks;
+	int san_path_err_threshold;
+	int san_path_err_forget_rate;
+	int san_path_err_recovery_time;
 	int skip_kpartx;
 	int max_sectors_kb;
 	char * bl_product;
@@ -93,6 +96,9 @@ struct mpentry {
 	int deferred_remove;
 	int delay_watch_checks;
 	int delay_wait_checks;
+	int san_path_err_threshold;
+	int san_path_err_forget_rate;
+	int san_path_err_recovery_time;
 	int skip_kpartx;
 	int max_sectors_kb;
 	uid_t uid;
@@ -138,6 +144,9 @@ struct config {
 	int processed_main_config;
 	int delay_watch_checks;
 	int delay_wait_checks;
+	int san_path_err_threshold;
+	int san_path_err_forget_rate;
+	int san_path_err_recovery_time;
 	int uxsock_timeout;
 	int strict_timing;
 	int retrigger_tries;
diff --git a/libmultipath/configure.c b/libmultipath/configure.c
index a0fcad9..5ad3007 100644
--- a/libmultipath/configure.c
+++ b/libmultipath/configure.c
@@ -294,6 +294,9 @@ int setup_map(struct multipath *mpp, char *params, int params_size)
 	select_deferred_remove(conf, mpp);
 	select_delay_watch_checks(conf, mpp);
 	select_delay_wait_checks(conf, mpp);
+	select_san_path_err_threshold(conf, mpp);
+	select_san_path_err_forget_rate(conf, mpp);
+	select_san_path_err_recovery_time(conf, mpp);
 	select_skip_kpartx(conf, mpp);
 	select_max_sectors_kb(conf, mpp);
 
diff --git a/libmultipath/defaults.h b/libmultipath/defaults.h
index b9b0a37..3ef1579 100644
--- a/libmultipath/defaults.h
+++ b/libmultipath/defaults.h
@@ -23,7 +23,8 @@
 #define DEFAULT_RETAIN_HWHANDLER RETAIN_HWHANDLER_ON
 #define DEFAULT_DETECT_PRIO	DETECT_PRIO_ON
 #define DEFAULT_DEFERRED_REMOVE	DEFERRED_REMOVE_OFF
-#define DEFAULT_DELAY_CHECKS	DELAY_CHECKS_OFF
+#define DEFAULT_DELAY_CHECKS	NU_NO
+#define DEFAULT_ERR_CHECKS	NU_NO
 #define DEFAULT_UEVENT_STACKSIZE 256
 #define DEFAULT_RETRIGGER_DELAY	10
 #define DEFAULT_RETRIGGER_TRIES	3
diff --git a/libmultipath/dict.c b/libmultipath/dict.c
index dc21846..4754572 100644
--- a/libmultipath/dict.c
+++ b/libmultipath/dict.c
@@ -1023,7 +1023,7 @@ declare_mp_handler(reservation_key, set_reservation_key)
 declare_mp_snprint(reservation_key, print_reservation_key)
 
 static int
-set_delay_checks(vector strvec, void *ptr)
+set_off_int_undef(vector strvec, void *ptr)
 {
 	int *int_ptr = (int *)ptr;
 	char * buff;
@@ -1033,47 +1033,69 @@ set_delay_checks(vector strvec, void *ptr)
 		return 1;
 
 	if (!strcmp(buff, "no") || !strcmp(buff, "0"))
-		*int_ptr = DELAY_CHECKS_OFF;
+		*int_ptr = NU_NO;
 	else if ((*int_ptr = atoi(buff)) < 1)
-		*int_ptr = DELAY_CHECKS_UNDEF;
+		*int_ptr = NU_UNDEF;
 
 	FREE(buff);
 	return 0;
 }
 
 int
-print_delay_checks(char * buff, int len, void *ptr)
+print_off_int_undef(char * buff, int len, void *ptr)
 {
 	int *int_ptr = (int *)ptr;
 
 	switch(*int_ptr) {
-	case DELAY_CHECKS_UNDEF:
+	case NU_UNDEF:
 		return 0;
-	case DELAY_CHECKS_OFF:
+	case NU_NO:
 		return snprintf(buff, len, "\"off\"");
 	default:
 		return snprintf(buff, len, "%i", *int_ptr);
 	}
 }
 
-declare_def_handler(delay_watch_checks, set_delay_checks)
-declare_def_snprint(delay_watch_checks, print_delay_checks)
-declare_ovr_handler(delay_watch_checks, set_delay_checks)
-declare_ovr_snprint(delay_watch_checks, print_delay_checks)
-declare_hw_handler(delay_watch_checks, set_delay_checks)
-declare_hw_snprint(delay_watch_checks, print_delay_checks)
-declare_mp_handler(delay_watch_checks, set_delay_checks)
-declare_mp_snprint(delay_watch_checks, print_delay_checks)
-
-declare_def_handler(delay_wait_checks, set_delay_checks)
-declare_def_snprint(delay_wait_checks, print_delay_checks)
-declare_ovr_handler(delay_wait_checks, set_delay_checks)
-declare_ovr_snprint(delay_wait_checks, print_delay_checks)
-declare_hw_handler(delay_wait_checks, set_delay_checks)
-declare_hw_snprint(delay_wait_checks, print_delay_checks)
-declare_mp_handler(delay_wait_checks, set_delay_checks)
-declare_mp_snprint(delay_wait_checks, print_delay_checks)
-
+declare_def_handler(delay_watch_checks, set_off_int_undef)
+declare_def_snprint(delay_watch_checks, print_off_int_undef)
+declare_ovr_handler(delay_watch_checks, set_off_int_undef)
+declare_ovr_snprint(delay_watch_checks, print_off_int_undef)
+declare_hw_handler(delay_watch_checks, set_off_int_undef)
+declare_hw_snprint(delay_watch_checks, print_off_int_undef)
+declare_mp_handler(delay_watch_checks, set_off_int_undef)
+declare_mp_snprint(delay_watch_checks, print_off_int_undef)
+declare_def_handler(delay_wait_checks, set_off_int_undef)
+declare_def_snprint(delay_wait_checks, print_off_int_undef)
+declare_ovr_handler(delay_wait_checks, set_off_int_undef)
+declare_ovr_snprint(delay_wait_checks, print_off_int_undef)
+declare_hw_handler(delay_wait_checks, set_off_int_undef)
+declare_hw_snprint(delay_wait_checks, print_off_int_undef)
+declare_mp_handler(delay_wait_checks, set_off_int_undef)
+declare_mp_snprint(delay_wait_checks, print_off_int_undef)
+declare_def_handler(san_path_err_threshold, set_off_int_undef)
+declare_def_snprint(san_path_err_threshold, print_off_int_undef)
+declare_ovr_handler(san_path_err_threshold, set_off_int_undef)
+declare_ovr_snprint(san_path_err_threshold, print_off_int_undef)
+declare_hw_handler(san_path_err_threshold, set_off_int_undef)
+declare_hw_snprint(san_path_err_threshold, print_off_int_undef)
+declare_mp_handler(san_path_err_threshold, set_off_int_undef)
+declare_mp_snprint(san_path_err_threshold, print_off_int_undef)
+declare_def_handler(san_path_err_forget_rate, set_off_int_undef)
+declare_def_snprint(san_path_err_forget_rate, print_off_int_undef)
+declare_ovr_handler(san_path_err_forget_rate, set_off_int_undef)
+declare_ovr_snprint(san_path_err_forget_rate, print_off_int_undef)
+declare_hw_handler(san_path_err_forget_rate, set_off_int_undef)
+declare_hw_snprint(san_path_err_forget_rate, print_off_int_undef)
+declare_mp_handler(san_path_err_forget_rate, set_off_int_undef)
+declare_mp_snprint(san_path_err_forget_rate, print_off_int_undef)
+declare_def_handler(san_path_err_recovery_time, set_off_int_undef)
+declare_def_snprint(san_path_err_recovery_time, print_off_int_undef)
+declare_ovr_handler(san_path_err_recovery_time, set_off_int_undef)
+declare_ovr_snprint(san_path_err_recovery_time, print_off_int_undef)
+declare_hw_handler(san_path_err_recovery_time, set_off_int_undef)
+declare_hw_snprint(san_path_err_recovery_time, print_off_int_undef)
+declare_mp_handler(san_path_err_recovery_time, set_off_int_undef)
+declare_mp_snprint(san_path_err_recovery_time, print_off_int_undef)
 static int
 def_uxsock_timeout_handler(struct config *conf, vector strvec)
 {
@@ -1404,6 +1426,10 @@ init_keywords(vector keywords)
 	install_keyword("config_dir", &def_config_dir_handler, &snprint_def_config_dir);
 	install_keyword("delay_watch_checks", &def_delay_watch_checks_handler, &snprint_def_delay_watch_checks);
 	install_keyword("delay_wait_checks", &def_delay_wait_checks_handler, &snprint_def_delay_wait_checks);
+        install_keyword("san_path_err_threshold", &def_san_path_err_threshold_handler, &snprint_def_san_path_err_threshold);
+        install_keyword("san_path_err_forget_rate", &def_san_path_err_forget_rate_handler, &snprint_def_san_path_err_forget_rate);
+        install_keyword("san_path_err_recovery_time", &def_san_path_err_recovery_time_handler, &snprint_def_san_path_err_recovery_time);
+
 	install_keyword("find_multipaths", &def_find_multipaths_handler, &snprint_def_find_multipaths);
 	install_keyword("uxsock_timeout", &def_uxsock_timeout_handler, &snprint_def_uxsock_timeout);
 	install_keyword("retrigger_tries", &def_retrigger_tries_handler, &snprint_def_retrigger_tries);
@@ -1486,6 +1512,9 @@ init_keywords(vector keywords)
 	install_keyword("deferred_remove", &hw_deferred_remove_handler, &snprint_hw_deferred_remove);
 	install_keyword("delay_watch_checks", &hw_delay_watch_checks_handler, &snprint_hw_delay_watch_checks);
 	install_keyword("delay_wait_checks", &hw_delay_wait_checks_handler, &snprint_hw_delay_wait_checks);
+        install_keyword("san_path_err_threshold", &hw_san_path_err_threshold_handler, &snprint_hw_san_path_err_threshold);
+        install_keyword("san_path_err_forget_rate", &hw_san_path_err_forget_rate_handler, &snprint_hw_san_path_err_forget_rate);
+        install_keyword("san_path_err_recovery_time", &hw_san_path_err_recovery_time_handler, &snprint_hw_san_path_err_recovery_time);
 	install_keyword("skip_kpartx", &hw_skip_kpartx_handler, &snprint_hw_skip_kpartx);
 	install_keyword("max_sectors_kb", &hw_max_sectors_kb_handler, &snprint_hw_max_sectors_kb);
 	install_sublevel_end();
@@ -1515,6 +1544,10 @@ init_keywords(vector keywords)
 	install_keyword("deferred_remove", &ovr_deferred_remove_handler, &snprint_ovr_deferred_remove);
 	install_keyword("delay_watch_checks", &ovr_delay_watch_checks_handler, &snprint_ovr_delay_watch_checks);
 	install_keyword("delay_wait_checks", &ovr_delay_wait_checks_handler, &snprint_ovr_delay_wait_checks);
+	install_keyword("san_path_err_threshold", &ovr_san_path_err_threshold_handler, &snprint_ovr_san_path_err_threshold);
+	install_keyword("san_path_err_forget_rate", &ovr_san_path_err_forget_rate_handler, &snprint_ovr_san_path_err_forget_rate);
+	install_keyword("san_path_err_recovery_time", &ovr_san_path_err_recovery_time_handler, &snprint_ovr_san_path_err_recovery_time);
+
 	install_keyword("skip_kpartx", &ovr_skip_kpartx_handler, &snprint_ovr_skip_kpartx);
 	install_keyword("max_sectors_kb", &ovr_max_sectors_kb_handler, &snprint_ovr_max_sectors_kb);
 
@@ -1543,6 +1576,9 @@ init_keywords(vector keywords)
 	install_keyword("deferred_remove", &mp_deferred_remove_handler, &snprint_mp_deferred_remove);
 	install_keyword("delay_watch_checks", &mp_delay_watch_checks_handler, &snprint_mp_delay_watch_checks);
 	install_keyword("delay_wait_checks", &mp_delay_wait_checks_handler, &snprint_mp_delay_wait_checks);
+	install_keyword("san_path_err_threshold", &mp_san_path_err_threshold_handler, &snprint_mp_san_path_err_threshold);
+	install_keyword("san_path_err_forget_rate", &mp_san_path_err_forget_rate_handler, &snprint_mp_san_path_err_forget_rate);
+	install_keyword("san_path_err_recovery_time", &mp_san_path_err_recovery_time_handler, &snprint_mp_san_path_err_recovery_time);
 	install_keyword("skip_kpartx", &mp_skip_kpartx_handler, &snprint_mp_skip_kpartx);
 	install_keyword("max_sectors_kb", &mp_max_sectors_kb_handler, &snprint_mp_max_sectors_kb);
 	install_sublevel_end();
diff --git a/libmultipath/dict.h b/libmultipath/dict.h
index 4cd03c5..2d6097d 100644
--- a/libmultipath/dict.h
+++ b/libmultipath/dict.h
@@ -14,6 +14,5 @@ int print_no_path_retry(char * buff, int len, void *ptr);
 int print_fast_io_fail(char * buff, int len, void *ptr);
 int print_dev_loss(char * buff, int len, void *ptr);
 int print_reservation_key(char * buff, int len, void * ptr);
-int print_delay_checks(char * buff, int len, void *ptr);
-
+int print_off_int_undef(char * buff, int len, void *ptr);
 #endif /* _DICT_H */
diff --git a/libmultipath/propsel.c b/libmultipath/propsel.c
index c0bc616..e4afef7 100644
--- a/libmultipath/propsel.c
+++ b/libmultipath/propsel.c
@@ -623,7 +623,7 @@ int select_delay_watch_checks(struct config *conf, struct multipath *mp)
 	mp_set_conf(delay_watch_checks);
 	mp_set_default(delay_watch_checks, DEFAULT_DELAY_CHECKS);
 out:
-	print_delay_checks(buff, 12, &mp->delay_watch_checks);
+	print_off_int_undef(buff, 12, &mp->delay_watch_checks);
 	condlog(3, "%s: delay_watch_checks = %s %s", mp->alias, buff, origin);
 	return 0;
 }
@@ -638,12 +638,56 @@ int select_delay_wait_checks(struct config *conf, struct multipath *mp)
 	mp_set_conf(delay_wait_checks);
 	mp_set_default(delay_wait_checks, DEFAULT_DELAY_CHECKS);
 out:
-	print_delay_checks(buff, 12, &mp->delay_wait_checks);
+	print_off_int_undef(buff, 12, &mp->delay_wait_checks);
 	condlog(3, "%s: delay_wait_checks = %s %s", mp->alias, buff, origin);
 	return 0;
 
 }
+int select_san_path_err_threshold(struct config *conf, struct multipath *mp)
+{
+	char *origin, buff[12];
+
+	mp_set_mpe(san_path_err_threshold);
+	mp_set_ovr(san_path_err_threshold);
+	mp_set_hwe(san_path_err_threshold);
+	mp_set_conf(san_path_err_threshold);
+	mp_set_default(san_path_err_threshold, DEFAULT_ERR_CHECKS);
+out:
+	print_off_int_undef(buff, 12, &mp->san_path_err_threshold);
+	condlog(3, "%s: san_path_err_threshold = %s %s", mp->alias, buff, origin);
+	return 0;
+}
+
+int select_san_path_err_forget_rate(struct config *conf, struct multipath *mp)
+{
+	char *origin, buff[12];
+
+	mp_set_mpe(san_path_err_forget_rate);
+	mp_set_ovr(san_path_err_forget_rate);
+	mp_set_hwe(san_path_err_forget_rate);
+	mp_set_conf(san_path_err_forget_rate);
+	mp_set_default(san_path_err_forget_rate, DEFAULT_ERR_CHECKS);
+out:
+	print_off_int_undef(buff, 12, &mp->san_path_err_forget_rate);
+	condlog(3, "%s: san_path_err_forget_rate = %s %s", mp->alias, buff, origin);
+	return 0;
+}
+
+int select_san_path_err_recovery_time(struct config *conf, struct multipath *mp)
+{
+	char *origin, buff[12];
 
+	mp_set_mpe(san_path_err_recovery_time);
+	mp_set_ovr(san_path_err_recovery_time);
+	mp_set_hwe(san_path_err_recovery_time);
+	mp_set_conf(san_path_err_recovery_time);
+	mp_set_default(san_path_err_recovery_time, DEFAULT_ERR_CHECKS);
+out:
+	print_off_int_undef(buff, 12, &mp->san_path_err_recovery_time);
+	condlog(3, "%s: san_path_err_recovery_time = %s %s", mp->alias, buff, origin);
+	return 0;
+}
+
 int select_skip_kpartx (struct config *conf, struct multipath * mp)
 {
 	char *origin;
diff --git a/libmultipath/propsel.h b/libmultipath/propsel.h
index ad98fa5..e5b6f93 100644
--- a/libmultipath/propsel.h
+++ b/libmultipath/propsel.h
@@ -24,3 +24,6 @@ int select_delay_watch_checks (struct config *conf, struct multipath * mp);
 int select_delay_wait_checks (struct config *conf, struct multipath * mp);
 int select_skip_kpartx (struct config *conf, struct multipath * mp);
 int select_max_sectors_kb (struct config *conf, struct multipath * mp);
+int select_san_path_err_forget_rate(struct config *conf, struct multipath *mp);
+int select_san_path_err_threshold(struct config *conf, struct multipath *mp);
+int select_san_path_err_recovery_time(struct config *conf, struct multipath *mp);
diff --git a/libmultipath/structs.h b/libmultipath/structs.h
index 396f69d..6edd927 100644
--- a/libmultipath/structs.h
+++ b/libmultipath/structs.h
@@ -152,9 +152,9 @@ enum scsi_protocol {
 	SCSI_PROTOCOL_UNSPEC = 0xf, /* No specific protocol */
 };
 
-enum delay_checks_states {
-	DELAY_CHECKS_OFF = -1,
-	DELAY_CHECKS_UNDEF = 0,
+enum no_undef_states {
+	NU_NO = -1,
+	NU_UNDEF = 0,
 };
 
 enum initialized_states {
@@ -223,7 +223,10 @@ struct path {
 	int initialized;
 	int retriggers;
 	int wwid_changed;
-
+	unsigned int path_failures;
+	time_t dis_reinstate_time;
+	int disable_reinstate;
+	int san_path_err_forget_rate;
 	/* configlet pointers */
 	struct hwentry * hwe;
 };
@@ -255,6 +258,9 @@ struct multipath {
 	int deferred_remove;
 	int delay_watch_checks;
 	int delay_wait_checks;
+	int san_path_err_threshold;
+	int san_path_err_forget_rate;
+	int san_path_err_recovery_time;
 	int skip_kpartx;
 	int max_sectors_kb;
 	unsigned int dev_loss;
diff --git a/libmultipath/structs_vec.c b/libmultipath/structs_vec.c
index 22be8e0..1dbc3b2 100644
--- a/libmultipath/structs_vec.c
+++ b/libmultipath/structs_vec.c
@@ -570,6 +570,12 @@ int update_multipath (struct vectors *vecs, char *mapname, int reset)
 				int oldstate = pp->state;
 				condlog(2, "%s: mark as failed", pp->dev);
 				mpp->stat_path_failures++;
+				/* assign san_path_err_forget_rate on the first failure seen on the path */
+				if (pp->path_failures == 0) {
+					pp->san_path_err_forget_rate = pp->mpp->san_path_err_forget_rate;
+				}
+				/* increment the number of path failures */
+				pp->path_failures++;
 				pp->state = PATH_DOWN;
 				if (oldstate == PATH_UP ||
 				    oldstate == PATH_GHOST)
diff --git a/multipath/multipath.conf.5 b/multipath/multipath.conf.5
index 36589f5..3c564ad 100644
--- a/multipath/multipath.conf.5
+++ b/multipath/multipath.conf.5
@@ -751,6 +751,45 @@ The default is: \fB/etc/multipath/conf.d/\fR
 .
 .
 .TP
+.B san_path_err_threshold
+If set to a value greater than 0, multipathd will watch paths and check how many
+times a path has failed due to errors. If the number of failures on a particular
+path is greater than san_path_err_threshold, the path will not be reinstated
+until san_path_err_recovery_time expires. These path failures should occur
+within san_path_err_forget_rate checks; if not, the path is considered good
+enough to reinstate.
+.RS
+.TP
+The default is: \fBno\fR
+.RE
+.
+.
+.TP
+.B san_path_err_forget_rate
+If set to a value greater than 0, multipathd will check whether the path
+failure count has exceeded san_path_err_threshold within this many checks,
+i.e. san_path_err_forget_rate. If so, the path will not be reinstated until
+san_path_err_recovery_time expires.
+.RS
+.TP
+The default is: \fBno\fR
+.RE
+.
+.
+.TP
+.B san_path_err_recovery_time
+If set to a value greater than 0, multipathd will ensure that when the path
+failure count has exceeded san_path_err_threshold within san_path_err_forget_rate
+checks, the path is placed in the failed state for san_path_err_recovery_time.
+Once san_path_err_recovery_time has elapsed, the failed path is reinstated.
+san_path_err_recovery_time should be given in seconds.
+.RS
+.TP
+The default is: \fBno\fR
+.RE
+.
+.
+.TP
 .B delay_watch_checks
 If set to a value greater than 0, multipathd will watch paths that have
 recently become valid for this many checks. If they fail again while they are
@@ -1015,6 +1054,12 @@ are taken from the \fIdefaults\fR or \fIdevices\fR section:
 .TP
 .B deferred_remove
 .TP
+.B san_path_err_threshold
+.TP
+.B san_path_err_forget_rate
+.TP
+.B san_path_err_recovery_time
+.TP
 .B delay_watch_checks
 .TP
 .B delay_wait_checks
@@ -1128,6 +1173,12 @@ section:
 .TP
 .B deferred_remove
 .TP
+.B san_path_err_threshold
+.TP
+.B san_path_err_forget_rate
+.TP
+.B san_path_err_recovery_time
+.TP
 .B delay_watch_checks
 .TP
 .B delay_wait_checks
@@ -1192,6 +1243,12 @@ the values are taken from the \fIdevices\fR or \fIdefaults\fR sections:
 .TP
 .B deferred_remove
 .TP
+.B san_path_err_threshold
+.TP
+.B san_path_err_forget_rate
+.TP
+.B san_path_err_recovery_time
+.TP
 .B delay_watch_checks
 .TP
 .B delay_wait_checks
diff --git a/multipathd/main.c b/multipathd/main.c
index adc3258..43d07ab 100644
--- a/multipathd/main.c
+++ b/multipathd/main.c
@@ -1486,7 +1486,57 @@ void repair_path(struct path * pp)
 	checker_repair(&pp->checker);
 	LOG_MSG(1, checker_message(&pp->checker));
 }
+static int check_path_validity_err(struct path *pp) {
+	struct timespec start_time;
+	int disable_reinstate = 0;
 
+	if (!((pp->mpp->san_path_err_threshold > 0) &&
+	    (pp->mpp->san_path_err_forget_rate > 0) &&
+	    (pp->mpp->san_path_err_recovery_time > 0))) {
+		return disable_reinstate;
+	}
+
+	if (clock_gettime(CLOCK_MONOTONIC, &start_time) != 0) {
+		return disable_reinstate;
+	}
+
+	if (!pp->disable_reinstate) {
+		if (pp->path_failures) {
+			/* If the error threshold has been hit within
+			 * san_path_err_forget_rate cycles, do not reinstate
+			 * the path until san_path_err_recovery_time: keep it
+			 * failed so the problem can be rectified, then
+			 * reinstate it automatically once
+			 * san_path_err_recovery_time has elapsed. */
+			if ((pp->path_failures > pp->mpp->san_path_err_threshold) &&
+					(pp->san_path_err_forget_rate > 0)) {
+				condlog(2, "%s: hit error threshold", pp->dev);
+				pp->dis_reinstate_time = start_time.tv_sec;
+				pp->disable_reinstate = 1;
+				disable_reinstate = 1;
+			} else if ((pp->san_path_err_forget_rate > 0)) {
+				pp->san_path_err_forget_rate--;
+			} else {
+				/* for every san_path_err_forget_rate
+				 * successful path checks, decrement
+				 * path_failures by 1 */
+				pp->path_failures--;
+				pp->san_path_err_forget_rate = pp->mpp->san_path_err_forget_rate;
+			}
+		}
+	} else {
+		disable_reinstate = 1;
+		if ((pp->mpp->san_path_err_recovery_time > 0) &&
+				(start_time.tv_sec - pp->dis_reinstate_time) > pp->mpp->san_path_err_recovery_time) {
+			disable_reinstate = 0;
+			pp->path_failures = 0;
+			pp->disable_reinstate = 0;
+			pp->san_path_err_forget_rate = pp->mpp->san_path_err_forget_rate;
+			condlog(3, "%s: reinstate path after err recovery time", pp->dev);
+		}
+	}
+	return  disable_reinstate;
+}
 /*
  * Returns '1' if the path has been checked, '-1' if it was blacklisted
  * and '0' otherwise
@@ -1502,7 +1552,7 @@ check_path (struct vectors * vecs, struct path * pp, int ticks)
 	int oldchkrstate = pp->chkrstate;
 	int retrigger_tries, checkint;
 	struct config *conf;
 	int ret;
 
 	if ((pp->initialized == INIT_OK ||
 	     pp->initialized == INIT_REQUESTED_UDEV) && !pp->mpp)
@@ -1610,17 +1660,31 @@ check_path (struct vectors * vecs, struct path * pp, int ticks)
 			pp->wait_checks = 0;
 	}
 
+	if (newstate == PATH_DOWN || newstate == PATH_GHOST) {
+		/* assign san_path_err_forget_rate on the first failure seen on the path */
+		if (pp->path_failures == 0) {
+			pp->san_path_err_forget_rate = pp->mpp->san_path_err_forget_rate;
+		}
+		pp->path_failures++;
+	}
+
 	/*
 	 * don't reinstate failed path, if its in stand-by
 	 * and if target supports only implicit tpgs mode.
 	 * this will prevent unnecessary i/o by dm on stand-by
 	 * paths if there are no other active paths in map.
+	 *
+	 * when path failures have exceeded san_path_err_threshold
+	 * within san_path_err_forget_rate checks, don't reinstate
+	 * the failed path for san_path_err_recovery_time
 	 */
-	disable_reinstate = (newstate == PATH_GHOST &&
+	disable_reinstate = ((newstate == PATH_GHOST &&
 			    pp->mpp->nr_active == 0 &&
-			    pp->tpgs == TPGS_IMPLICIT) ? 1 : 0;
+			    pp->tpgs == TPGS_IMPLICIT) ? 1 :
+			    check_path_validity_err(pp));
 
 	pp->chkrstate = newstate;
+
 	if (newstate != pp->state) {
 		int oldstate = pp->state;
 		pp->state = newstate;
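
For reference, the three new options the patch introduces would be set in multipath.conf along these lines (a hypothetical fragment; the values are illustrative only, not recommendations):

```
defaults {
	san_path_err_threshold     6
	san_path_err_forget_rate   14
	san_path_err_recovery_time 60
}
```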

[-- Attachment #3: Type: text/plain, Size: 0 bytes --]



^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: deterministic io throughput in multipath
  2017-01-23 11:02           ` Muneendra Kumar M
@ 2017-01-25  9:28             ` Benjamin Marzinski
  2017-01-25 11:48               ` Muneendra Kumar M
  0 siblings, 1 reply; 21+ messages in thread
From: Benjamin Marzinski @ 2017-01-25  9:28 UTC (permalink / raw)
  To: Muneendra Kumar M; +Cc: dm-devel

This looks fine to me.  If this is what you want to push, I'm o.k. with it.
But I'd like to make some suggestions that you are free to ignore.

Right now you have to check in two places to see if the path failed (in
update_multipath and check_path). If you look at the delayed_*_checks
code, it flags the path failures when you reinstate the path in
check_path, since this will only happen there.

Next, right now you use the disable_reinstate code to deal with the
devices when they shouldn't be reinstated. The issue with this is that
the path appears to be up when people look at its state, but still isn't
being used. If you do the check early and set the path state to
PATH_DELAYED, like delayed_*_checks does, then the path is clearly
marked when users look to see why it isn't being used. Also, if you exit
check_path early, then you won't be running the prioritizer on these
likely-unstable paths.

Finally, the way you use dis_reinstate_time, a flakey device can get
reinstated as soon as it comes back up, as long as it was down for long
enough, simply because pp->dis_reinstate_time reached
mpp->san_path_err_recovery_time while the device was failed.
delayed_*_checks depends on a number of successful path checks, so
you know that the device has at least been nominally functional for
san_path_err_recovery_time.

Like I said, you don't have to change any of this to make me happy with
your patch. But if you did change all of these, then the current
delay_*_checks code would just end up being a special case of your code.
I'd really like to pull out the delayed_*_checks code and just keep your
version, since it seems more useful. It would be nice to keep the same
functionality. But even if you don't make these changes, I still think
we should pull out the delayed_*_checks code, since they both do the
same general thing, and your code does it better.

-Ben

On Mon, Jan 23, 2017 at 11:02:42AM +0000, Muneendra Kumar M wrote:
>    Hi Ben,
>    I have made the changes as per the below review comments.
>     
>    Could you please review the attached patch and provide us your valuable
>    comments.
>    Below are the files that has been changed .
>     
>    libmultipath/config.c      |  3 +++
>    libmultipath/config.h      |  9 +++++++++
>    libmultipath/configure.c   |  3 +++
>    libmultipath/defaults.h    |  3 ++-
>    libmultipath/dict.c        | 84
>    ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++------------------------
>    libmultipath/dict.h        |  3 +--
>    libmultipath/propsel.c     | 48
>    ++++++++++++++++++++++++++++++++++++++++++++++--
>    libmultipath/propsel.h     |  3 +++
>    libmultipath/structs.h     | 14 ++++++++++----
>    libmultipath/structs_vec.c |  6 ++++++
>    multipath/multipath.conf.5 | 57
>    +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>    multipathd/main.c          | 70
>    +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++---
>     
>     
>    Regards,
>    Muneendra.
>     
>    _____________________________________________
>    From: Muneendra Kumar M
>    Sent: Tuesday, January 17, 2017 4:13 PM
>    To: 'Benjamin Marzinski' <bmarzins@redhat.com>
>    Cc: dm-devel@redhat.com
>    Subject: RE: [dm-devel] deterministic io throughput in multipath
>     
>     
>    Hi Ben,
>    Thanks for the review.
>    In dict.c I will make generic functions which will be
>    used by both delay_checks and err_checks.
>     
>    We want to increment the path failures every time the path goes down
>    regardless of whether multipathd or the kernel noticed the failure of
>    paths.
>    Thanks for pointing this.
>     
>    I will completely agree with the idea which you mentioned below by
>    reconsidering the san_path_err_threshold_window with
>    san_path_err_forget_rate. This will avoid counting time when the path was
>    down as time where the path wasn't having problems.
>     
>    I will incorporate all the changes mentioned below and will resend the
>    patch once the testing is done.
>     
>    Regards,
>    Muneendra.
>     
>     
>     
>    -----Original Message-----
>    From: Benjamin Marzinski [[1]mailto:bmarzins@redhat.com]
>    Sent: Tuesday, January 17, 2017 6:35 AM
>    To: Muneendra Kumar M <[2]mmandala@Brocade.com>
>    Cc: [3]dm-devel@redhat.com
>    Subject: Re: [dm-devel] deterministic io throughput in multipath
>     
>    On Mon, Jan 16, 2017 at 11:19:19AM +0000, Muneendra Kumar M wrote:
>    >    Hi Ben,
>    >    After the below discussion we  came with the approach which will meet
>    our
>    >    requirement.
>    >    I have attached the patch which is working good in our field tests.
>    >    Could you please review the attached patch and provide us your
>    valuable
>    >    comments .
>     
>    I can see a number of issues with this patch.
>     
>    First, some nit-picks:
>    - I assume "dis_reinstante_time" should be "dis_reinstate_time"
>     
>    - The indenting in check_path_validity_err is wrong, which made it
>      confusing until I noticed that
>     
>    if (clock_gettime(CLOCK_MONOTONIC, &start_time) != 0)
>     
>      doesn't have an open brace, and shouldn't indent the rest of the
>      function.
>     
>    - You call clock_gettime in check_path, but never use the result.
>     
>    - In dict.c, instead of writing your own functions that are the same as
>      the *_delay_checks functions, you could make those functions generic
>      and use them for both.  To go match the other generic function names
>      they would probably be something like
>     
>    set_off_int_undef
>     
>    print_off_int_undef
>     
>      You would also need to change DELAY_CHECKS_* and ERR_CHECKS_* to
>      point to some common enum that you created, the way
>      user_friendly_names_states (to name one of many) does. The generic
>      enum used by *_off_int_undef would be something like.
>     
>    enum no_undef {
>            NU_NO = -1,
>            NU_UNDEF = 0,
>    }
>     
>      The idea is to try to cut down on the number of functions that are
>      simply copy-pasting other functions in dict.c.
>     
>     
>    Those are all minor cleanup issues, but there are some bigger problems.
>     
>    Instead of checking if san_path_err_threshold,
>    san_path_err_threshold_window, and san_path_err_recovery_time are greater
>    than zero separately, you should probably check them all at the start of
>    check_path_validity_err, and return 0 unless they all are set.
>    Right now, if a user sets san_path_err_threshold and
>    san_path_err_threshold_window but not san_path_err_recovery_time, their
>    path will never recover after it hits the error threshold.  I'm pretty sure
>    that you don't mean to permanently disable the paths.
>     
>     
>    time_t is a signed type, which means that if you get the clock time in
>    update_multipath and then fail to get the clock time in
>    check_path_validity_err, this check:
>     
>    start_time.tv_sec - pp->failure_start_time) <
>    pp->mpp->san_path_err_threshold_window
>     
>    will always be true.  I realize that clock_gettime is very unlikely to
>    fail.  But if it does, probably the safest thing to do is to just
>    immediately return 0 in check_path_validity_err.
>     
>     
>    The way you set path_failures in update_multipath may not get you what you
>    want.  It will only count path failures found by the kernel, and not the
>    path checker.  If the check_path finds the error, pp->state will be set to
>    PATH_DOWN before pp->dmstate is set to PSTATE_FAILED. That means you will
>    not increment path_failures. Perhaps this is what you want, but I would
>    assume that you would want to count every time the path goes down
>    regardless of whether multipathd or the kernel noticed it.
>     
>     
>    I'm not super enthusiastic about how the san_path_err_threshold_window
>    works.  First, it starts counting from when the path goes down, so if the
>    path takes long enough to get restored, and then fails immediately, it can
>    just keep failing and it will never hit the san_path_err_threshold_window,
>    since it spends so much of that time with the path failed.  Also, the
>    window gets set on the first error, and never reset until the number of
>    errors is over the threshold.  This means that if you get one early error
>    and then a bunch of errors much later, you will go for (2 x
>    san_path_err_threshold) - 1 errors until you stop reinstating the path,
>    because of the window reset in the middle of the string of errors.  It
>    seems like a better idea would be to have check_path_validity_err reset
>    path_failures as soon as it notices that you are past
>    san_path_err_threshold_window, instead of waiting till the number of
>    errors hits san_path_err_threshold.
>     
>     
>    If I was going to design this, I think I would have san_path_err_threshold
>    and san_path_err_recovery_time like you do, but instead of having a
>    san_path_err_threshold_window, I would have something like
>    san_path_err_forget_rate.  The idea is that every san_path_err_forget_rate
>    number of successful path checks you decrement path_failures by 1. This
>    means that there is no window after which you reset.  If the path failures
>    come in faster than the forget rate, you will eventually hit the error
>    threshold. This also has the benefit of easily not counting time when the
>    path was down as time where the path wasn't having problems. But if you
>    don't like my idea, yours will work fine with some polish.
>     
>    -Ben
>     
>     
>    >    Below are the files that has been changed .
>    >     
>    >    libmultipath/config.c      |  3 +++
>    >    libmultipath/config.h      |  9 +++++++++
>    >    libmultipath/configure.c   |  3 +++
>    >    libmultipath/defaults.h    |  1 +
>    >    libmultipath/dict.c             | 80
>    >   
>    ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>    >    libmultipath/dict.h        |  1 +
>    >    libmultipath/propsel.c     | 44
>    >    ++++++++++++++++++++++++++++++++++++++++++++
>    >    libmultipath/propsel.h     |  6 ++++++
>    >    libmultipath/structs.h     | 12 +++++++++++-
>    >    libmultipath/structs_vec.c | 10 ++++++++++
>    >    multipath/multipath.conf.5 | 58
>    >    ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>    >    multipathd/main.c          | 61
>    >    +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
>    >     
>    >    We have added three new config parameters whose description is below.
>    >    1.san_path_err_threshold:
>    >            If set to a value greater than 0, multipathd will watch paths
>    and
>    >    check how many times a path has been failed due to errors. If the
>    number
>    >    of failures on a particular path is greater then the
>    >    san_path_err_threshold then the path will not  reinstate  till
>    >    san_path_err_recovery_time. These path failures should occur within a
>    >    san_path_err_threshold_window time frame, if not we will consider the
>    path
>    >    is good enough to reinstate.
>    >     
>    >    2.san_path_err_threshold_window:
>    >            If set to a value greater than 0, multipathd will check
>    whether
>    >    the path failures has exceeded  the san_path_err_threshold within
>    this
>    >    time frame i.e san_path_err_threshold_window . If so we will not
>    reinstate
>    >    the path till          san_path_err_recovery_time.
>    >     
>    >    3.san_path_err_recovery_time:
>    >    If set to a value greater than 0, multipathd will make sure that when
>    path
>    >    failures has exceeded the san_path_err_threshold within
>    >    san_path_err_threshold_window then the path  will be placed in failed
>    >    state for san_path_err_recovery_time duration. Once
>    >    san_path_err_recovery_time has timeout  we will reinstate the failed
>    path
>    >    .
>    >     
>    >    Regards,
>    >    Muneendra.
>    >     
>    >    -----Original Message-----
>    >    From: Muneendra Kumar M
>    >    Sent: Wednesday, January 04, 2017 6:56 PM
>    >    To: 'Benjamin Marzinski' <[4]bmarzins@redhat.com>
>    >    Cc: [5]dm-devel@redhat.com
>    >    Subject: RE: [dm-devel] deterministic io throughput in multipath
>    >     
>    >    Hi Ben,
>    >    Thanks for the information.
>    >     
>    >    Regards,
>    >    Muneendra.
>    >     
>    >    -----Original Message-----
>    >    From: Benjamin Marzinski [[1][6]mailto:bmarzins@redhat.com]
>    >    Sent: Tuesday, January 03, 2017 10:42 PM
>    >    To: Muneendra Kumar M <[2][7]mmandala@Brocade.com>
>    >    Cc: [3][8]dm-devel@redhat.com
>    >    Subject: Re: [dm-devel] deterministic io throughput in multipath
>    >     
>    >    On Mon, Dec 26, 2016 at 09:42:48AM +0000, Muneendra Kumar M wrote:
>    >    > Hi Ben,
>    >    >
>    >    > If there are two paths on a dm-1 say sda and sdb as below.
>    >    >
>    >    > #  multipath -ll
>    >    >        mpathd (3600110d001ee7f0102050001cc0b6751) dm-1
>    SANBlaze,VLUN
>    >    MyLun
>    >    >        size=8.0M features='0' hwhandler='0' wp=rw
>    >    >        `-+- policy='round-robin 0' prio=50 status=active
>    >    >          |- 8:0:1:0  sda 8:48 active ready  running
>    >    >          `- 9:0:1:0  sdb 8:64 active ready  running         
>    >    >
>    >    > And on sda if iam seeing lot of errors due to which the sda path is
>    >    fluctuating from failed state to active state and vicevera.
>    >    >
>    >    > My requirement is something like this if sda is failed for more
>    then 5
>    >    > times in a hour duration ,then I want to keep the sda in failed
>    state
>    >    > for few hours (3hrs)
>    >    >
>    >    > And the data should travel only thorugh sdb path.
>    >    > Will this be possible with the below parameters.
>    >     
>    >    No. delay_watch_checks sets how many path checks you watch a path that
>    has
>    >    recently come back from the failed state. If the path fails again
>    within
>    >    this time, multipath device delays it.  This means that the delay is
>    >    always trigger by two failures within the time limit.  It's possible
>    to
>    >    adapt this to count numbers of failures, and act after a certain
>    number
>    >    within a certain timeframe, but it would take a bit more work.
>    >     
>    >    delay_wait_checks doesn't guarantee that it will delay for any set
>    length
>    >    of time.  Instead, it sets the number of consecutive successful path
>    >    checks that must occur before the path is usable again. You could set
>    this
>    >    for 3 hours of path checks, but if a check failed during this time,
>    you
>    >    would restart the 3 hours over again.
>    >     
>    >    -Ben
>    >     
>    >    > Can you just let me know what values I should add for
>    delay_watch_checks
>    >    and delay_wait_checks.
>    >    >
>    >    > Regards,
>    >    > Muneendra.
>    >    >
>    >    >
>    >    >
>    >    > -----Original Message-----
>    >    > From: Muneendra Kumar M
>    >    > Sent: Thursday, December 22, 2016 11:10 AM
>    >    > To: 'Benjamin Marzinski' <[4][9]bmarzins@redhat.com>
>    >    > Cc: [5][10]dm-devel@redhat.com
>    >    > Subject: RE: [dm-devel] deterministic io throughput in multipath
>    >    >
>    >    > Hi Ben,
>    >    >
>    >    > Thanks for the reply.
>    >    > I will look into this parameters will do the internal testing and
>    let
>    >    you know the results.
>    >    >
>    >    > Regards,
>    >    > Muneendra.
>    >    >
>    >    > -----Original Message-----
>    >    > From: Benjamin Marzinski [[6][11]mailto:bmarzins@redhat.com]
>    >    > Sent: Wednesday, December 21, 2016 9:40 PM
>    >    > To: Muneendra Kumar M <[7][12]mmandala@Brocade.com>
>    >    > Cc: [8][13]dm-devel@redhat.com
>    >    > Subject: Re: [dm-devel] deterministic io throughput in multipath
>    >    >
>    >    > Have you looked into the delay_watch_checks and delay_wait_checks
>    >    configuration parameters?  The idea behind them is to minimize the
>    use of
>    >    paths that are intermittently failing.
>    >    >
>    >    > -Ben
>    >    >
>    >    > On Mon, Dec 19, 2016 at 11:50:36AM +0000, Muneendra Kumar M wrote:
>    >    > >    Customers using Linux host (mostly RHEL host) using a SAN
>    network
>    >    for
>    >    > >    block storage, complain the Linux multipath stack is not
>    resilient
>    >    to
>    >    > >    handle non-deterministic storage network behaviors. This has
>    caused
>    >    many
>    >    > >    customer move away to non-linux based servers. The intent of
>    the
>    >    below
>    >    > >    patch and the prevailing issues are given below. With the
>    below
>    >    design we
>    >    > >    are seeing the Linux multipath stack becoming resilient to
>    such
>    >    network
>    >    > >    issues. We hope by getting this patch accepted will help in
>    more
>    >    Linux
>    >    > >    server adoption that use SAN network.
>    >    > >
>    >    > >    I have already sent the design details to the community in a
>    >    different
>    >    > >    mail chain and the details are available in the below link.
>    >    > >
>    >    > >   
>    >   
>    [1][9][14]https://www.redhat.com/archives/dm-devel/2016-December/msg00122.html
>    >    .
>    >    > >
>    >    > >    Can you please go through the design and send the comments to
>    us.
>    >    > >
>    >    > >     
>    >    > >
>    >    > >    Regards,
>    >    > >
>    >    > >    Muneendra.
>    >    > >
>    >    > >     
>    >    > >
>    >    > >     
>    >    > >
>    >    > > References
>    >    > >
>    >    > >    Visible links
>    >    > >    1.
>    >    > >
>    >   
>    [10][15]https://www.redhat.com/archives/dm-devel/2016-December/msg00122.html
>    >    >
>    >    > > --
>    >    > > dm-devel mailing list
>    >    > > dm-devel@redhat.com
>    >    > > https://www.redhat.com/mailman/listinfo/dm-devel
>    >     
>    >

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: deterministic io throughput in multipath
  2017-01-25  9:28             ` Benjamin Marzinski
@ 2017-01-25 11:48               ` Muneendra Kumar M
  2017-01-25 13:07                 ` Benjamin Marzinski
  0 siblings, 1 reply; 21+ messages in thread
From: Muneendra Kumar M @ 2017-01-25 11:48 UTC (permalink / raw)
  To: Benjamin Marzinski; +Cc: dm-devel

Hi Ben,
Thanks for the review.
I will consider the points below and make the necessary changes.

I have two general questions which may not be related to this:
1) Are there any standard tests we should run to check the functionality of the multipath daemon?
2) I am new to git; are there standard steps we generally follow to push changes?

Regards,
Muneendra.



-----Original Message-----
From: Benjamin Marzinski [mailto:bmarzins@redhat.com] 
Sent: Wednesday, January 25, 2017 2:59 PM
To: Muneendra Kumar M <mmandala@Brocade.com>
Cc: dm-devel@redhat.com
Subject: Re: [dm-devel] deterministic io throughput in multipath

This looks fine to me.  If this is what you want to push, I'm o.k. with it.
But I'd like to make some suggestions that you are free to ignore.

Right now you have to check in two places to see if the path failed (in update_multipath and check_path). If you look at the delayed_*_checks code, it flags the path failures when you reinstate the path in check_path, since this will only happen there.

Next, right now you use the disable_reinstate code to deal with the devices when they shouldn't be reinstated. The issue with this is that the path appears to be up when people look at its state, but still isn't being used. If you do the check early and set the path state to PATH_DELAYED, like delayed_*_checks does, then the path is clearly marked when users look to see why it isn't being used. Also, if you exit check_path early, then you won't be running the prioritizer on these likely-unstable paths.

Finally, the way you use dis_reinstate_time, a flakey device can get reinstated as soon as it comes back up, as long as it was down for long enough, simply because pp->dis_reinstate_time reached
mpp->san_path_err_recovery_time while the device was failed.
delayed_*_checks depends on a number of successful path checks, so you know that the device has at least been nominally functional for san_path_err_recovery_time.

Like I said, you don't have to change any of this to make me happy with your patch. But if you did change all of these, then the current delay_*_checks code would just end up being a special case of your code.
I'd really like to pull out the delayed_*_checks code and just keep your version, since it seems more useful. It would be nice to keep the same functionality. But even if you don't make these changes, I still think we should pull out the delayed_*_checks code, since they both do the same general thing, and your code does it better.

-Ben

On Mon, Jan 23, 2017 at 11:02:42AM +0000, Muneendra Kumar M wrote:
>    Hi Ben,
>    I have made the changes as per the review comments below.
>     
>    Could you please review the attached patch and provide us your
>    valuable comments.
>    Below are the files that have been changed.
>     
>    libmultipath/config.c      |  3 +++
>    libmultipath/config.h      |  9 +++++++++
>    libmultipath/configure.c   |  3 +++
>    libmultipath/defaults.h    |  3 ++-
>    libmultipath/dict.c        | 84
>    ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++------------------------
>    libmultipath/dict.h        |  3 +--
>    libmultipath/propsel.c     | 48
>    ++++++++++++++++++++++++++++++++++++++++++++++--
>    libmultipath/propsel.h     |  3 +++
>    libmultipath/structs.h     | 14 ++++++++++----
>    libmultipath/structs_vec.c |  6 ++++++
>    multipath/multipath.conf.5 | 57
>    +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>    multipathd/main.c          | 70
>    
> +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++---
>     
>     
>    Regards,
>    Muneendra.
>     
>    _____________________________________________
>    From: Muneendra Kumar M
>    Sent: Tuesday, January 17, 2017 4:13 PM
>    To: 'Benjamin Marzinski' <bmarzins@redhat.com>
>    Cc: dm-devel@redhat.com
>    Subject: RE: [dm-devel] deterministic io throughput in multipath
>     
>     
>    Hi Ben,
>    Thanks for the review.
>    In dict.c I will make generic functions which will be used by both
>    delay_checks and err_checks.
>     
>    We want to increment the path failures every time the path goes down
>    regardless of whether multipathd or the kernel noticed the failure of
>    paths.
>    Thanks for pointing this.
>     
>    I will completely agree with the idea which you mentioned below by
>    reconsidering the san_path_err_threshold_window with
>    san_path_err_forget_rate. This will avoid counting time when the path was
>    down as time where the path wasn't having problems.
>     
>    I will incorporate all the changes mentioned below and will resend the
>    patch once the testing is done.
>     
>    Regards,
>    Muneendra.
>     
>     
>     
>    -----Original Message-----
>    From: Benjamin Marzinski [[1]mailto:bmarzins@redhat.com]
>    Sent: Tuesday, January 17, 2017 6:35 AM
>    To: Muneendra Kumar M <[2]mmandala@Brocade.com>
>    Cc: [3]dm-devel@redhat.com
>    Subject: Re: [dm-devel] deterministic io throughput in multipath
>     
>    On Mon, Jan 16, 2017 at 11:19:19AM +0000, Muneendra Kumar M wrote:
>    >    Hi Ben,
>    >    After the discussion below, we came up with an approach which
>    >    will meet our requirement.
>    >    I have attached the patch, which is working well in our field
>    >    tests.
>    >    Could you please review the attached patch and provide us your
>    >    valuable comments.
>     
>    I can see a number of issues with this patch.
>     
>    First, some nit-picks:
>    - I assume "dis_reinstante_time" should be "dis_reinstate_time"
>     
>    - The indenting in check_path_validity_err is wrong, which made it
>      confusing until I noticed that
>     
>    if (clock_gettime(CLOCK_MONOTONIC, &start_time) != 0)
>     
>      doesn't have an open brace, and shouldn't indent the rest of the
>      function.
>     
>    - You call clock_gettime in check_path, but never use the result.
>     
>    - In dict.c, instead of writing your own functions that are the same as
>      the *_delay_checks functions, you could make those functions generic
>      and use them for both.  To go match the other generic function names
>      they would probably be something like
>     
>    set_off_int_undef
>     
>    print_off_int_undef
>     
>      You would also need to change DELAY_CHECKS_* and ERR_CHECKS_* to
>      point to some common enum that you created, the way
>      user_friendly_names_states (to name one of many) does. The generic
>      enum used by *_off_int_undef would be something like.
>     
>    enum no_undef {
>            NU_NO = -1,
>            NU_UNDEF = 0,
>    }
>     
>      The idea is to try to cut down on the number of functions that are
>      simply copy-pasting other functions in dict.c.
>     
>     
>    Those are all minor cleanup issues, but there are some bigger problems.
>     
>    Instead of checking if san_path_err_threshold,
>    san_path_err_threshold_window, and san_path_err_recovery_time are greater
>    than zero separately, you should probably check them all at the start of
>    check_path_validity_err, and return 0 unless they all are set.
>    Right now, if a user sets san_path_err_threshold and
>    san_path_err_threshold_window but not san_path_err_recovery_time, their
>    path will never recover after it hits the error threshold.  I'm pretty sure
>    that you don't mean to permanently disable the paths.
>     
>     
>    time_t is a signed type, which means that if you get the clock time in
>    update_multpath and then fail to get the clock time in
>    check_path_validity_err, this check:
>     
>    start_time.tv_sec - pp->failure_start_time) <
>    pp->mpp->san_path_err_threshold_window
>     
>    will always be true.  I realize that clock_gettime is very unlikely to
>    fail.  But if it does, probably the safest thing to do is to just
>    immediately return 0 in check_path_validity_err.
>     
>     
>    The way you set path_failures in update_multipath may not get you what you
>    want.  It will only count path failures found by the kernel, and not the
>    path checker.  If the check_path finds the error, pp->state will be set to
>    PATH_DOWN before pp->dmstate is set to PSTATE_FAILED. That means you will
>    not increment path_failures. Perhaps this is what you want, but I would
>    assume that you would want to count every time the path goes down
>    regardless of whether multipathd or the kernel noticed it.
>     
>     
>    I'm not super enthusiastic about how the san_path_err_threshold_window
>    works.  First, it starts counting from when the path goes down, so if the
>    path takes long enough to get restored, and then fails immediately, it can
>    just keep failing and it will never hit the san_path_err_threshold_window,
>    since it spends so much of that time with the path failed.  Also, the
>    window gets set on the first error, and never reset until the number of
>    errors is over the threshold.  This means that if you get one early error
>    and then a bunch of errors much later, you will go for (2 x
>    san_path_err_threshold) - 1 errors until you stop reinstating the path,
>    because of the window reset in the middle of the string of errors.  It
>    seems like a better idea would be to have check_path_validity_err reset
>    path_failures as soon as it notices that you are past
>    san_path_err_threshold_window, instead of waiting till the number of
>    errors hits san_path_err_threshold.
>     
>     
>    If I was going to design this, I think I would have san_path_err_threshold
>    and san_path_err_recovery_time like you do, but instead of having a
>    san_path_err_threshold_window, I would have something like
>    san_path_err_forget_rate.  The idea is that every san_path_err_forget_rate
>    number of successful path checks you decrement path_failures by 1. This
>    means that there is no window after which you reset.  If the path failures
>    come in faster than the forget rate, you will eventually hit the error
>    threshold. This also has the benefit of easily not counting time when the
>    path was down as time where the path wasn't having problems. But if you
>    don't like my idea, yours will work fine with some polish.
>     
>    -Ben
>     
>     
>    >    Below are the files that has been changed .
>    >     
>    >    libmultipath/config.c      |  3 +++
>    >    libmultipath/config.h      |  9 +++++++++
>    >    libmultipath/configure.c   |  3 +++
>    >    libmultipath/defaults.h    |  1 +
>    >    libmultipath/dict.c             | 80
>    >   
>    ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>    >    libmultipath/dict.h        |  1 +
>    >    libmultipath/propsel.c     | 44
>    >    ++++++++++++++++++++++++++++++++++++++++++++
>    >    libmultipath/propsel.h     |  6 ++++++
>    >    libmultipath/structs.h     | 12 +++++++++++-
>    >    libmultipath/structs_vec.c | 10 ++++++++++
>    >    multipath/multipath.conf.5 | 58
>    >    ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>    >    multipathd/main.c          | 61
>    >    +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
>    >     
>    >    We have added three new config parameters whose descriptions are
>    >    below.
>    >    1. san_path_err_threshold:
>    >            If set to a value greater than 0, multipathd will watch
>    >    paths and check how many times a path has failed due to errors.
>    >    If the number of failures on a particular path is greater than
>    >    san_path_err_threshold, the path will not be reinstated until
>    >    san_path_err_recovery_time. These path failures must occur within
>    >    a san_path_err_threshold_window time frame; if not, we consider
>    >    the path good enough to reinstate.
>    >     
>    >    2. san_path_err_threshold_window:
>    >            If set to a value greater than 0, multipathd will check
>    >    whether the path failures have exceeded san_path_err_threshold
>    >    within this time frame (san_path_err_threshold_window). If so,
>    >    the path will not be reinstated until san_path_err_recovery_time.
>    >     
>    >    3. san_path_err_recovery_time:
>    >            If set to a value greater than 0, multipathd will make
>    >    sure that when path failures have exceeded san_path_err_threshold
>    >    within san_path_err_threshold_window, the path will be placed in
>    >    the failed state for the san_path_err_recovery_time duration.
>    >    Once san_path_err_recovery_time has timed out, the failed path
>    >    will be reinstated.
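For context, a sketch of how the three options described above might look in a multipath.conf defaults section. The option names follow this patch; the values and their units are purely illustrative, and the patch itself defines the exact semantics:

```
defaults {
	# treat a path as unstable after 5 failures ...
	san_path_err_threshold          5
	# ... occurring within this threshold window
	san_path_err_threshold_window   3600
	# keep such a path failed for this long before reinstating it
	san_path_err_recovery_time      10800
}
```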
>    >     
>    >    Regards,
>    >    Muneendra.
>    >     
>    >    -----Original Message-----
>    >    From: Muneendra Kumar M
>    >    Sent: Wednesday, January 04, 2017 6:56 PM
>    >    To: 'Benjamin Marzinski' <[4]bmarzins@redhat.com>
>    >    Cc: [5]dm-devel@redhat.com
>    >    Subject: RE: [dm-devel] deterministic io throughput in multipath
>    >     
>    >    Hi Ben,
>    >    Thanks for the information.
>    >     
>    >    Regards,
>    >    Muneendra.
>    >     
>    >    -----Original Message-----
>    >    From: Benjamin Marzinski [[1][6]mailto:bmarzins@redhat.com]
>    >    Sent: Tuesday, January 03, 2017 10:42 PM
>    >    To: Muneendra Kumar M <[2][7]mmandala@Brocade.com>
>    >    Cc: [3][8]dm-devel@redhat.com
>    >    Subject: Re: [dm-devel] deterministic io throughput in multipath
>    >     
>    >    On Mon, Dec 26, 2016 at 09:42:48AM +0000, Muneendra Kumar M wrote:
>    >    > Hi Ben,
>    >    >
>    >    > If there are two paths on a dm-1 say sda and sdb as below.
>    >    >
>    >    > #  multipath -ll
>    >    >        mpathd (3600110d001ee7f0102050001cc0b6751) dm-1
>    SANBlaze,VLUN
>    >    MyLun
>    >    >        size=8.0M features='0' hwhandler='0' wp=rw
>    >    >        `-+- policy='round-robin 0' prio=50 status=active
>    >    >          |- 8:0:1:0  sda 8:48 active ready  running
>    >    >          `- 9:0:1:0  sdb 8:64 active ready  running         
>    >    >
>    >    > And on sda, if I am seeing a lot of errors due to which the sda
>    >    > path is fluctuating from the failed state to the active state
>    >    > and vice versa:
>    >    >
>    >    > My requirement is something like this: if sda has failed more
>    >    > than 5 times within an hour, then I want to keep sda in the
>    >    > failed state for a few hours (3 hrs),
>    >    >
>    >    > and the data should travel only through the sdb path.
>    >    > Will this be possible with the below parameters?
>    >     
>    >    No. delay_watch_checks sets how many path checks you watch a path
>    >    that has recently come back from the failed state. If the path
>    >    fails again within this time, the multipath device delays it.
>    >    This means that the delay is always triggered by two failures
>    >    within the time limit.  It's possible to adapt this to count
>    >    numbers of failures, and act after a certain number within a
>    >    certain timeframe, but it would take a bit more work.
>    >     
>    >    delay_wait_checks doesn't guarantee that it will delay for any
>    >    set length of time.  Instead, it sets the number of consecutive
>    >    successful path checks that must occur before the path is usable
>    >    again. You could set this for 3 hours of path checks, but if a
>    >    check failed during this time, you would restart the 3 hours over
>    >    again.
>    >     
>    >    -Ben
>    >     
>    >    > Can you just let me know what values I should add for
>    >    > delay_watch_checks and delay_wait_checks.
>    >    >
>    >    > Regards,
>    >    > Muneendra.
>    >    >
>    >    >
>    >    >
>    >    > -----Original Message-----
>    >    > From: Muneendra Kumar M
>    >    > Sent: Thursday, December 22, 2016 11:10 AM
>    >    > To: 'Benjamin Marzinski' <[4][9]bmarzins@redhat.com>
>    >    > Cc: [5][10]dm-devel@redhat.com
>    >    > Subject: RE: [dm-devel] deterministic io throughput in multipath
>    >    >
>    >    > Hi Ben,
>    >    >
>    >    > Thanks for the reply.
>    >    > I will look into these parameters, do the internal testing, and
>    >    > let you know the results.
>    >    >
>    >    > Regards,
>    >    > Muneendra.
>    >    >
>    >    > -----Original Message-----
>    >    > From: Benjamin Marzinski [[6][11]mailto:bmarzins@redhat.com]
>    >    > Sent: Wednesday, December 21, 2016 9:40 PM
>    >    > To: Muneendra Kumar M <[7][12]mmandala@Brocade.com>
>    >    > Cc: [8][13]dm-devel@redhat.com
>    >    > Subject: Re: [dm-devel] deterministic io throughput in multipath
>    >    >
>    >    > Have you looked into the delay_watch_checks and delay_wait_checks
>    >    > configuration parameters?  The idea behind them is to minimize the
>    >    > use of paths that are intermittently failing.
>    >    >
>    >    > -Ben
>    >    >
>    >    > On Mon, Dec 19, 2016 at 11:50:36AM +0000, Muneendra Kumar M wrote:
>    >    > >    Customers using Linux host (mostly RHEL host) using a SAN
>    >    > >    network for block storage, complain the Linux multipath
>    >    > >    stack is not resilient to handle non-deterministic storage
>    >    > >    network behaviors. This has caused many customer move away
>    >    > >    to non-linux based servers. The intent of the below patch
>    >    > >    and the prevailing issues are given below. With the below
>    >    > >    design we are seeing the Linux multipath stack becoming
>    >    > >    resilient to such network issues. We hope by getting this
>    >    > >    patch accepted will help in more Linux server adoption that
>    >    > >    use SAN network.
>    >    > >
>    >    > >    I have already sent the design details to the community in a
>    >    > >    different mail chain and the details are available in the
>    >    > >    below link.
>    >    > >
>    >    > >    https://www.redhat.com/archives/dm-devel/2016-December/msg00122.html
>    >    > >
>    >    > >    Can you please go through the design and send the comments
>    >    > >    to us.
>    >    > >
>    >    > >     
>    >    > >
>    >    > >    Regards,
>    >    > >
>    >    > >    Muneendra.
>    >    > >
>    >    > >     
>    >    > >
>    >    > >     
>    >    > >
>    >    >
>    >     
>    >

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: deterministic io throughput in multipath
  2017-01-25 11:48               ` Muneendra Kumar M
@ 2017-01-25 13:07                 ` Benjamin Marzinski
  2017-02-01 11:58                   ` Muneendra Kumar M
  0 siblings, 1 reply; 21+ messages in thread
From: Benjamin Marzinski @ 2017-01-25 13:07 UTC (permalink / raw)
  To: Muneendra Kumar M; +Cc: dm-devel

On Wed, Jan 25, 2017 at 11:48:33AM +0000, Muneendra Kumar M wrote:
> Hi Ben,
> Thanks for the review.
> I will consider the points below and make the necessary changes.
> 
> I have two general questions which may not be related to this.
> 1) Are there any standard tests we should run to check the functionality of the multipath daemon?

No. multipath doesn't have a standard set of regression tests.  You need
to do your own testing.

> 2) I am new to git; are there standard steps we generally follow to push changes?

You don't need to use git to push a patch, but it is easier to process
if your patch is inline in the email instead of as an attachment
(assuming your mail client doesn't mangle the patch).

If you want to use git, you just need to commit your patches to a branch
off the head of master. Then you can build patches with

# git format-patch --cover-letter -s -n -o <dir> origin

and send them with

# git send-email --to "device-mapper development <dm-devel@redhat.com>" --cc "Christophe Varoqui <christophe.varoqui@opensvc.com>" --no-chain-reply-to --suppress-from <dir>

You may first need to set up your git name and email.
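For completeness, the identity setup step looks like this. This is a sketch of the workflow Ben describes: the name and email values are placeholders, the `outgoing/` directory is an arbitrary choice, and git send-email additionally needs your sendemail.* SMTP settings configured:

```shell
# One-time committer identity, used in patch authorship and send-email:
git config --global user.name  "Your Name"
git config --global user.email "you@example.com"

# Then, from a branch based on origin's master, build the patch files:
git format-patch --cover-letter -s -n -o outgoing/ origin

# And post the series to the list:
git send-email --to "device-mapper development <dm-devel@redhat.com>" \
        --cc "Christophe Varoqui <christophe.varoqui@opensvc.com>" \
        --no-chain-reply-to --suppress-from outgoing/
```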

-Ben

> Regards,
> Muneendra.
> 
> 
> 
> -----Original Message-----
> From: Benjamin Marzinski [mailto:bmarzins@redhat.com] 
> Sent: Wednesday, January 25, 2017 2:59 PM
> To: Muneendra Kumar M <mmandala@Brocade.com>
> Cc: dm-devel@redhat.com
> Subject: Re: [dm-devel] deterministic io throughput in multipath
> 
> This looks fine to me.  If this is what you want to push, I'm o.k. with it.
> But I'd like to make some suggestions that you are free to ignore.
> 
> Right now you have to check in two places to see if the path failed (in update_multipath and check_path). If you look at the delayed_*_checks code, it flags the path failures when you reinstate the path in check_path, since this will only happen there.
> 
> Next, right now you use the disable_reinstate code to deal with the devices when they shouldn't be reinstated. The issue with this is that the path appears to be up when people look at its state, but still isn't being used. If you do the check early and set the path state to PATH_DELAYED, like delayed_*_checks does, then the path is clearly marked when users look to see why it isn't being used. Also, if you exit check_path early, then you won't be running the prioritizer on these likely-unstable paths.
> 
> Finally, the way you use dis_reinstate_time, a flakey device can get reinstated as soon as it comes back up, as long as it was down for long enough, simply because pp->dis_reinstate_time reached
> mpp->san_path_err_recovery_time while the device was failed.
> delayed_*_checks depends on a number of successful path checks, so you know that the device has at least been nominally functional for san_path_err_recovery_time.
> 
> Like I said, you don't have to change any of this to make me happy with your patch. But if you did change all of these, then the current delay_*_checks code would just end up being a special case of your code.
> I'd really like to pull out the delayed_*_checks code and just keep your version, since it seems more useful. It would be nice to keep the same functionality. But even if you don't make these changes, I still think we should pull out the delayed_*_checks code, since they both do the same general thing, and your code does it better.
> 
> -Ben
> 
> On Mon, Jan 23, 2017 at 11:02:42AM +0000, Muneendra Kumar M wrote:
> >    Hi Ben,
> >    I have made the changes as per the review comments below.
> >     
> >    Could you please review the attached patch and provide us your
> >    valuable comments.
> >    Below are the files that have been changed.
> >     
> >    libmultipath/config.c      |  3 +++
> >    libmultipath/config.h      |  9 +++++++++
> >    libmultipath/configure.c   |  3 +++
> >    libmultipath/defaults.h    |  3 ++-
> >    libmultipath/dict.c        | 84
> >    ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++------------------------
> >    libmultipath/dict.h        |  3 +--
> >    libmultipath/propsel.c     | 48
> >    ++++++++++++++++++++++++++++++++++++++++++++++--
> >    libmultipath/propsel.h     |  3 +++
> >    libmultipath/structs.h     | 14 ++++++++++----
> >    libmultipath/structs_vec.c |  6 ++++++
> >    multipath/multipath.conf.5 | 57
> >    +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >    multipathd/main.c          | 70
> >    
> > +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++---
> >     
> >     
> >    Regards,
> >    Muneendra.
> >     
> >    _____________________________________________
> >    From: Muneendra Kumar M
> >    Sent: Tuesday, January 17, 2017 4:13 PM
> >    To: 'Benjamin Marzinski' <bmarzins@redhat.com>
> >    Cc: dm-devel@redhat.com
> >    Subject: RE: [dm-devel] deterministic io throughput in multipath
> >     
> >     
> >    Hi Ben,
> >    Thanks for the review.
> >    In dict.c I will make generic functions which will be used by both
> >    delay_checks and err_checks.
> >     
> >    We want to increment the path failures every time the path goes down
> >    regardless of whether multipathd or the kernel noticed the failure of
> >    paths.
> >    Thanks for pointing this.
> >     
> >    I will completely agree with the idea which you mentioned below by
> >    reconsidering the san_path_err_threshold_window with
> >    san_path_err_forget_rate. This will avoid counting time when the path was
> >    down as time where the path wasn't having problems.
> >     
> >    I will incorporate all the changes mentioned below and will resend the
> >    patch once the testing is done.
> >     
> >    Regards,
> >    Muneendra.
> >     
> >     
> >     
> >    -----Original Message-----
> >    From: Benjamin Marzinski [[1]mailto:bmarzins@redhat.com]
> >    Sent: Tuesday, January 17, 2017 6:35 AM
> >    To: Muneendra Kumar M <[2]mmandala@Brocade.com>
> >    Cc: [3]dm-devel@redhat.com
> >    Subject: Re: [dm-devel] deterministic io throughput in multipath
> >     
> >    On Mon, Jan 16, 2017 at 11:19:19AM +0000, Muneendra Kumar M wrote:
> >    >    Hi Ben,
> >    >    After the below discussion we  came with the approach which will meet
> >    our
> >    >    requirement.
> >    >    I have attached the patch which is working good in our field tests.
> >    >    Could you please review the attached patch and provide us your
> >    valuable
> >    >    comments .
> >     
> >    I can see a number of issues with this patch.
> >     
> >    First, some nit-picks:
> >    - I assume "dis_reinstante_time" should be "dis_reinstate_time"
> >     
> >    - The indenting in check_path_validity_err is wrong, which made it
> >      confusing until I noticed that
> >     
> >    if (clock_gettime(CLOCK_MONOTONIC, &start_time) != 0)
> >     
> >      doesn't have an open brace, and shouldn't indent the rest of the
> >      function.
> >     
> >    - You call clock_gettime in check_path, but never use the result.
> >     
> >    - In dict.c, instead of writing your own functions that are the same as
> >      the *_delay_checks functions, you could make those functions generic
> >      and use them for both.  To go match the other generic function names
> >      they would probably be something like
> >     
> >    set_off_int_undef
> >     
> >    print_off_int_undef
> >     
> >      You would also need to change DELAY_CHECKS_* and ERR_CHECKS_* to
> >      point to some common enum that you created, the way
> >      user_friendly_names_states (to name one of many) does. The generic
> >      enum used by *_off_int_undef would be something like.
> >     
> >    enum no_undef {
> >            NU_NO = -1,
> >            NU_UNDEF = 0,
> >    }
> >     
> >      The idea is to try to cut down on the number of functions that are
> >      simply copy-pasting other functions in dict.c.
> >     
> >     
> >    Those are all minor cleanup issues, but there are some bigger problems.
> >     
> >    Instead of checking if san_path_err_threshold,
> >    san_path_err_threshold_window, and san_path_err_recovery_time are greater
> >    than zero separately, you should probably check them all at the start of
> >    check_path_validity_err, and return 0 unless they all are set.
> >    Right now, if a user sets san_path_err_threshold and
> >    san_path_err_threshold_window but not san_path_err_recovery_time, their
> >    path will never recover after it hits the error threshold.  I'm pretty sure
> >    that you don't mean to permanently disable the paths.
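The suggested up-front check could be sketched like this (a hypothetical helper, not the actual patch code; the real change may structure it differently):

```c
#include <assert.h>

/* Return non-zero only when all three tunables are configured.
 * check_path_validity_err() would return 0 immediately otherwise,
 * so a partially configured setup never permanently disables a path. */
static int err_checking_enabled(int threshold, int window, int recovery_time)
{
	return threshold > 0 && window > 0 && recovery_time > 0;
}
```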
> >     
> >     
> >    time_t is a signed type, which means that if you get the clock time in
> >    update_multipath and then fail to get the clock time in
> >    check_path_validity_err, this check:
> >     
> >    start_time.tv_sec - pp->failure_start_time) <
> >    pp->mpp->san_path_err_threshold_window
> >     
> >    will always be true.  I realize that clock_gettime is very unlikely to
> >    fail.  But if it does, probably the safest thing to do is to just
> >    immediately return 0 in check_path_validity_err.
> >     
> >     
> >    The way you set path_failures in update_multipath may not get you what you
> >    want.  It will only count path failures found by the kernel, and not the
> >    path checker.  If the check_path finds the error, pp->state will be set to
> >    PATH_DOWN before pp->dmstate is set to PSTATE_FAILED. That means you will
> >    not increment path_failures. Perhaps this is what you want, but I would
> >    assume that you would want to count every time the path goes down
> >    regardless of whether multipathd or the kernel noticed it.
> >     
> >     
> >    I'm not super enthusiastic about how the san_path_err_threshold_window
> >    works.  First, it starts counting from when the path goes down, so if the
> >    path takes long enough to get restored, and then fails immediately, it can
> >    just keep failing and it will never hit the san_path_err_threshold_window,
> >    since it spends so much of that time with the path failed.  Also, the
> >    window gets set on the first error, and never reset until the number of
> >    errors is over the threshold.  This means that if you get one early error
> >    and then a bunch of errors much later, you will go for (2 x
> >    san_path_err_threshold) - 1 errors until you stop reinstating the path,
> >    because of the window reset in the middle of the string of errors.  It
> >    seems like a better idea would be to have check_path_validity_err reset
> >    path_failures as soon as it notices that you are past
> >    san_path_err_threshold_window, instead of waiting till the number of
> >    errors hits san_path_err_threshold.
> >     
> >     
> >    If I was going to design this, I think I would have san_path_err_threshold
> >    and san_path_err_recovery_time like you do, but instead of having a
> >    san_path_err_threshold_window, I would have something like
> >    san_path_err_forget_rate.  The idea is that every san_path_err_forget_rate
> >    number of successful path checks you decrement path_failures by 1. This
> >    means that there is no window after which you reset.  If the path failures
> >    come in faster than the forget rate, you will eventually hit the error
> >    threshold. This also has the benefit of easily not counting time when the
> >    path was down as time where the path wasn't having problems. But if you
> >    don't like my idea, yours will work fine with some polish.
> >     
> >    -Ben
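The forget-rate proposal above can be modeled as follows (an illustrative sketch only, not code from multipath-tools; all names are hypothetical):

```c
#include <assert.h>

/* Model of the forget-rate idea: every `forget_rate` consecutive
 * successful path checks forget one recorded failure; the path is
 * throttled once the failure count reaches `threshold`. */
struct path_stats {
	int path_failures;	/* failures recorded so far */
	int good_checks;	/* successes since the last decrement */
};

static void on_path_failure(struct path_stats *ps)
{
	ps->path_failures++;
	ps->good_checks = 0;
}

static void on_path_check_ok(struct path_stats *ps, int forget_rate)
{
	if (ps->path_failures == 0 || forget_rate <= 0)
		return;
	if (++ps->good_checks >= forget_rate) {
		ps->path_failures--;	/* forget one old failure */
		ps->good_checks = 0;
	}
}

static int over_threshold(const struct path_stats *ps, int threshold)
{
	return threshold > 0 && ps->path_failures >= threshold;
}
```

If failures arrive faster than one per `forget_rate` successful checks, `path_failures` ratchets up until it crosses the threshold; otherwise it drains back toward zero, and time spent with the path down is never counted as problem-free time.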
> >     
> >     
> >    >    Below are the files that has been changed .
> >    >     
> >    >    libmultipath/config.c      |  3 +++
> >    >    libmultipath/config.h      |  9 +++++++++
> >    >    libmultipath/configure.c   |  3 +++
> >    >    libmultipath/defaults.h    |  1 +
> >    >    libmultipath/dict.c             | 80
> >    >   
> >    ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >    >    libmultipath/dict.h        |  1 +
> >    >    libmultipath/propsel.c     | 44
> >    >    ++++++++++++++++++++++++++++++++++++++++++++
> >    >    libmultipath/propsel.h     |  6 ++++++
> >    >    libmultipath/structs.h     | 12 +++++++++++-
> >    >    libmultipath/structs_vec.c | 10 ++++++++++
> >    >    multipath/multipath.conf.5 | 58
> >    >    ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >    >    multipathd/main.c          | 61
> >    >    +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
> >    >     
> >    >    We have added three new config parameters whose description is below.
> >    >    1.san_path_err_threshold:
> >    >            If set to a value greater than 0, multipathd will watch paths
> >    and
> >    >    check how many times a path has been failed due to errors. If the
> >    number
> >    >    of failures on a particular path is greater than the
> >    >    san_path_err_threshold then the path will not  reinstate  till
> >    >    san_path_err_recovery_time. These path failures should occur within a
> >    >    san_path_err_threshold_window time frame, if not we will consider the
> >    path
> >    >    is good enough to reinstate.
> >    >     
> >    >    2.san_path_err_threshold_window:
> >    >            If set to a value greater than 0, multipathd will check
> >    whether
> >    >    the path failures has exceeded  the san_path_err_threshold within
> >    this
> >    >    time frame i.e san_path_err_threshold_window . If so we will not
> >    reinstate
> >    >    the path till          san_path_err_recovery_time.
> >    >     
> >    >    3.san_path_err_recovery_time:
> >    >    If set to a value greater than 0, multipathd will make sure that when
> >    path
> >    >    failures has exceeded the san_path_err_threshold within
> >    >    san_path_err_threshold_window then the path  will be placed in failed
> >    >    state for san_path_err_recovery_time duration. Once
> >    >    san_path_err_recovery_time has timeout  we will reinstate the failed
> >    path
> >    >    .
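Taken together, the three settings above might appear in a hypothetical multipath.conf defaults section like this (values purely illustrative, assuming the window and recovery time are in seconds as the patch's man-page additions would define):

```
defaults {
        san_path_err_threshold        5
        san_path_err_threshold_window 3600
        san_path_err_recovery_time    10800
}
```

With such values, a path failing more than 5 times within the one-hour window would be held in the failed state for three hours before being reinstated.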
> >    >     
> >    >    Regards,
> >    >    Muneendra.
> >    >     
> >    >    -----Original Message-----
> >    >    From: Muneendra Kumar M
> >    >    Sent: Wednesday, January 04, 2017 6:56 PM
> >    >    To: 'Benjamin Marzinski' <[4]bmarzins@redhat.com>
> >    >    Cc: [5]dm-devel@redhat.com
> >    >    Subject: RE: [dm-devel] deterministic io throughput in multipath
> >    >     
> >    >    Hi Ben,
> >    >    Thanks for the information.
> >    >     
> >    >    Regards,
> >    >    Muneendra.
> >    >     
> >    >    -----Original Message-----
> >    >    From: Benjamin Marzinski [[1][6]mailto:bmarzins@redhat.com]
> >    >    Sent: Tuesday, January 03, 2017 10:42 PM
> >    >    To: Muneendra Kumar M <[2][7]mmandala@Brocade.com>
> >    >    Cc: [3][8]dm-devel@redhat.com
> >    >    Subject: Re: [dm-devel] deterministic io throughput in multipath
> >    >     
> >    >    On Mon, Dec 26, 2016 at 09:42:48AM +0000, Muneendra Kumar M wrote:
> >    >    > Hi Ben,
> >    >    >
> >    >    > If there are two paths on a dm-1 say sda and sdb as below.
> >    >    >
> >    >    > #  multipath -ll
> >    >    >        mpathd (3600110d001ee7f0102050001cc0b6751) dm-1
> >    SANBlaze,VLUN
> >    >    MyLun
> >    >    >        size=8.0M features='0' hwhandler='0' wp=rw
> >    >    >        `-+- policy='round-robin 0' prio=50 status=active
> >    >    >          |- 8:0:1:0  sda 8:48 active ready  running
> >    >    >          `- 9:0:1:0  sdb 8:64 active ready  running         
> >    >    >
> >    >    > And on sda if iam seeing lot of errors due to which the sda path is
> >    >    fluctuating from failed state to active state and vicevera.
> >    >    >
> >    >    > My requirement is something like this if sda is failed for more
> >    then 5
> >    >    > times in a hour duration ,then I want to keep the sda in failed
> >    state
> >    >    > for few hours (3hrs)
> >    >    >
> >    >    > And the data should travel only thorugh sdb path.
> >    >    > Will this be possible with the below parameters.
> >    >     
> >    >    No. delay_watch_checks sets how may path checks you watch a path that
> >    has
> >    >    recently come back from the failed state. If the path fails again
> >    within
> >    >    this time, multipath device delays it.  This means that the delay is
> >    >    always trigger by two failures within the time limit.  It's possible
> >    to
> >    >    adapt this to count numbers of failures, and act after a certain
> >    number
> >    >    within a certain timeframe, but it would take a bit more work.
> >    >     
> >    >    delay_wait_checks doesn't guarantee that it will delay for any set
> >    length
> >    >    of time.  Instead, it sets the number of consecutive successful path
> >    >    checks that must occur before the path is usable again. You could set
> >    this
> >    >    for 3 hours of path checks, but if a check failed during this time,
> >    you
> >    >    would restart the 3 hours over again.
> >    >     
> >    >    -Ben
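The delay_wait_checks behavior described above — a required number of consecutive successful checks, with any failure restarting the count — can be modeled as (illustrative only, not the actual multipath-tools code):

```c
#include <assert.h>

/* Model of delay_wait_checks gating reinstatement: the path must
 * pass `delay_wait_checks` consecutive checks; a single failure
 * restarts the count, so no fixed delay time is guaranteed. */
struct waiting_path {
	int consecutive_ok;
};

static int path_usable(struct waiting_path *wp, int check_ok,
		       int delay_wait_checks)
{
	if (!check_ok) {
		wp->consecutive_ok = 0;	/* failure restarts the wait */
		return 0;
	}
	if (++wp->consecutive_ok >= delay_wait_checks)
		return 1;		/* enough consecutive successes */
	return 0;
}
```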
> >    >     
> >    >    > Can you just let me know what values I should add for
> >    delay_watch_checks
> >    >    and delay_wait_checks.
> >    >    >
> >    >    > Regards,
> >    >    > Muneendra.
> >    >    >
> >    >    >
> >    >    >
> >    >    > -----Original Message-----
> >    >    > From: Muneendra Kumar M
> >    >    > Sent: Thursday, December 22, 2016 11:10 AM
> >    >    > To: 'Benjamin Marzinski' <[4][9]bmarzins@redhat.com>
> >    >    > Cc: [5][10]dm-devel@redhat.com
> >    >    > Subject: RE: [dm-devel] deterministic io throughput in multipath
> >    >    >
> >    >    > Hi Ben,
> >    >    >
> >    >    > Thanks for the reply.
> >    >    > I will look into this parameters will do the internal testing and
> >    let
> >    >    you know the results.
> >    >    >
> >    >    > Regards,
> >    >    > Muneendra.
> >    >    >
> >    >    > -----Original Message-----
> >    >    > From: Benjamin Marzinski [[6][11]mailto:bmarzins@redhat.com]
> >    >    > Sent: Wednesday, December 21, 2016 9:40 PM
> >    >    > To: Muneendra Kumar M <[7][12]mmandala@Brocade.com>
> >    >    > Cc: [8][13]dm-devel@redhat.com
> >    >    > Subject: Re: [dm-devel] deterministic io throughput in multipath
> >    >    >
> >    >    > Have you looked into the delay_watch_checks and delay_wait_checks
> >    >    configuration parameters?  The idea behind them is to minimize the
> >    use of
> >    >    paths that are intermittently failing.
> >    >    >
> >    >    > -Ben
> >    >    >
> >    >    > On Mon, Dec 19, 2016 at 11:50:36AM +0000, Muneendra Kumar M wrote:
> >    >    > >    Customers using Linux host (mostly RHEL host) using a SAN
> >    network
> >    >    for
> >    >    > >    block storage, complain the Linux multipath stack is not
> >    resilient
> >    >    to
> >    >    > >    handle non-deterministic storage network behaviors. This has
> >    caused
> >    >    many
> >    >    > >    customer move away to non-linux based servers. The intent of
> >    the
> >    >    below
> >    >    > >    patch and the prevailing issues are given below. With the
> >    below
> >    >    design we
> >    >    > >    are seeing the Linux multipath stack becoming resilient to
> >    such
> >    >    network
> >    >    > >    issues. We hope by getting this patch accepted will help in
> >    more
> >    >    Linux
> >    >    > >    server adoption that use SAN network.
> >    >    > >
> >    >    > >    I have already sent the design details to the community in a
> >    >    different
> >    >    > >    mail chain and the details are available in the below link.
> >    >    > >
> >    >    > >   
> >    >   
> >    [1][9][14]https://www.redhat.com/archives/dm-devel/2016-December/msg00122.html
> >    >    .
> >    >    > >
> >    >    > >    Can you please go through the design and send the comments to
> >    us.
> >    >    > >
> >    >    > >     
> >    >    > >
> >    >    > >    Regards,
> >    >    > >
> >    >    > >    Muneendra.
> >    >    > >
> >    >    > >     
> >    >    > >
> >    >    > >     
> >    >    > >
> >    >    >
> >    >    > > --
> >    >    > > dm-devel mailing list
> >    >    > > [11][16]dm-devel@redhat.com
> >    >    > > https://www.redhat.com/mailman/listinfo/dm-devel
> >    >     
> >    >
> >     
> >     
> >     
> >     
> > 
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: deterministic io throughput in multipath
  2017-01-25 13:07                 ` Benjamin Marzinski
@ 2017-02-01 11:58                   ` Muneendra Kumar M
  2017-02-02  1:50                     ` Benjamin Marzinski
  0 siblings, 1 reply; 21+ messages in thread
From: Muneendra Kumar M @ 2017-02-01 11:58 UTC (permalink / raw)
  To: Benjamin Marzinski; +Cc: dm-devel

[-- Attachment #1: Type: text/plain, Size: 26444 bytes --]

Hi Ben,
I have made the changes as per the review comments below.
Could you please review the attached patch and provide your comments.

Below are the files that have been changed:

libmultipath/config.c      |  3 +++
 libmultipath/config.h      |  9 +++++++++
 libmultipath/configure.c   |  3 +++
 libmultipath/defaults.h    |  3 ++-
 libmultipath/dict.c        | 84 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++------------------------
 libmultipath/dict.h        |  3 +--
 libmultipath/propsel.c     | 48 ++++++++++++++++++++++++++++++++++++++++++++++--
 libmultipath/propsel.h     |  3 +++
 libmultipath/structs.h     | 14 ++++++++++----
 multipath/multipath.conf.5 | 57 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 multipathd/main.c          | 97 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++----
 11 files changed, 287 insertions(+), 37 deletions(-)

Thanks for the general information provided below.
I will submit the changes as described once the review is done.

Regards,
Muneendra.


-----Original Message-----
From: Benjamin Marzinski [mailto:bmarzins@redhat.com] 
Sent: Wednesday, January 25, 2017 6:37 PM
To: Muneendra Kumar M <mmandala@Brocade.com>
Cc: dm-devel@redhat.com
Subject: Re: [dm-devel] deterministic io throughput in multipath

On Wed, Jan 25, 2017 at 11:48:33AM +0000, Muneendra Kumar M wrote:
> Hi Ben,
> Thanks for the review .
> I will consider the below points and will do the necessary changes.
> 
> I have two general questions which may not be related to this.
> 1) Are there any standard tests that we need to run to check the functionality of the multipath daemon?

No. multipath doesn't have a standard set of regression tests.  You need to do your own testing.

> 2) I am new to git; are there any standard steps we generally follow to push the changes?

You don't need to use git to push a patch, but it is easier to process if your patch is inline in the email instead of as an attachment (assuming your mail client doesn't mangle the patch).

If you want to use git, you just need to commit your patches to a branch off the head of master. Then you can build patches with

# git format-patch --cover-letter -s -n -o <dir> origin

and send them with

# git send-email --to "device-mapper development <dm-devel@redhat.com>" --cc "Christophe Varoqui <christophe.varoqui@opensvc.com>" --no-chain-reply-to --suppress-from <dir>

You may first need to set up your git name and email.
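For example, the identity setup might look like this (placeholder values — substitute your own; the smtp setting is optional and provider-specific):

```shell
# One-time identity setup before using `git send-email`
# (placeholder name and address -- use your own):
git config --global user.name "Your Name"
git config --global user.email "you@example.com"

# Optional: tell git send-email which SMTP server to use
# (hypothetical value; adjust for your mail provider):
git config --global sendemail.smtpserver smtp.example.com
```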

-Ben

> Regards,
> Muneendra.
> 
> 
> 
> -----Original Message-----
> From: Benjamin Marzinski [mailto:bmarzins@redhat.com]
> Sent: Wednesday, January 25, 2017 2:59 PM
> To: Muneendra Kumar M <mmandala@Brocade.com>
> Cc: dm-devel@redhat.com
> Subject: Re: [dm-devel] deterministic io throughput in multipath
> 
> This looks fine to me.  If this is what you want to push, I'm o.k. with it.
> But I'd like to make some suggestions that you are free to ignore.
> 
> Right now you have to check in two places to see if the path failed (in update_multipath and check_path). If you look at the delayed_*_checks code, it flags the path failures when you reinstate the path in check_path, since this will only happen there.
> 
> Next, right now you use the disable_reinstate code to deal with the devices when they shouldn't be reinstated. The issue with this is that the path appears to be up when people look at its state, but still isn't being used. If you do the check early and set the path state to PATH_DELAYED, like delayed_*_checks does, then the path is clearly marked when users look to see why it isn't being used. Also, if you exit check_path early, then you won't be running the prioritizer on these likely-unstable paths.
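The early-exit idea could be modeled like this (a sketch with hypothetical names, not the real check_path() code, although PATH_DELAYED does exist in multipath-tools):

```c
#include <assert.h>
#include <time.h>

enum path_state { PATH_UP, PATH_DOWN, PATH_DELAYED };

/* Hypothetical helper: if a path is still inside its error-recovery
 * window, report PATH_DELAYED early so callers skip the prioritizer
 * and users can see why the path is not being used. */
static enum path_state classify_path(time_t now, time_t disable_until,
				     int checker_ok)
{
	if (!checker_ok)
		return PATH_DOWN;
	if (now < disable_until)
		return PATH_DELAYED;	/* up, but intentionally unused */
	return PATH_UP;
}
```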
> 
> Finally, the way you use dis_reinstate_time, a flaky device can get 
> reinstated as soon as it comes back up, as long as it was down for long 
> enough, simply because pp->dis_reinstate_time reached
> mpp->san_path_err_recovery_time while the device was failed.
> delayed_*_checks depends on a number of successful path checks, so you know that the device has at least been nominally functional for san_path_err_recovery_time.
> 
> Like I said, you don't have to change any of this to make me happy with your patch. But if you did change all of these, then the current delay_*_checks code would just end up being a special case of your code.
> I'd really like to pull out the delayed_*_checks code and just keep your version, since it seems more useful. It would be nice to keep the same functionality. But even if you don't make these changes, I still think we should pull out the delayed_*_checks code, since they both do the same general thing, and your code does it better.
> 
> -Ben
> 
> On Mon, Jan 23, 2017 at 11:02:42AM +0000, Muneendra Kumar M wrote:
> >    Hi Ben,
> >    I have made the changes as per the below review comments .
> >     
> >    Could you please review the attached patch and provide us your valuable
> >    comments .
> >    Below are the files that has been changed .
> >     
> >    libmultipath/config.c      |  3 +++
> >    libmultipath/config.h      |  9 +++++++++
> >    libmultipath/configure.c   |  3 +++
> >    libmultipath/defaults.h    |  3 ++-
> >    libmultipath/dict.c        | 84
> >    ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++------------------------
> >    libmultipath/dict.h        |  3 +--
> >    libmultipath/propsel.c     | 48
> >    ++++++++++++++++++++++++++++++++++++++++++++++--
> >    libmultipath/propsel.h     |  3 +++
> >    libmultipath/structs.h     | 14 ++++++++++----
> >    libmultipath/structs_vec.c |  6 ++++++
> >    multipath/multipath.conf.5 | 57
> >    +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >    multipathd/main.c          | 70
> >    
> > +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++---
> >     
> >     
> >    Regards,
> >    Muneendra.
> >     
> >    _____________________________________________
> >    From: Muneendra Kumar M
> >    Sent: Tuesday, January 17, 2017 4:13 PM
> >    To: 'Benjamin Marzinski' <bmarzins@redhat.com>
> >    Cc: dm-devel@redhat.com
> >    Subject: RE: [dm-devel] deterministic io throughput in multipath
> >     
> >     
> >    Hi Ben,
> >    Thanks for the review.
> >    In dict.c  I will make sure I will make generic functions which will be
> >    used by both delay_checks and err_checks.
> >     
> >    We want to increment the path failures every time the path goes down
> >    regardless of whether multipathd or the kernel noticed the failure of
> >    paths.
> >    Thanks for pointing this.
> >     
> >    I will completely agree with the idea which you mentioned below by
> >    reconsidering the san_path_err_threshold_window with
> >    san_path_err_forget_rate. This will avoid counting time when the path was
> >    down as time where the path wasn't having problems.
> >     
> >    I will incorporate all the changes mentioned below and will resend the
> >    patch once the testing is done.
> >     
> >    Regards,
> >    Muneendra.
> >     
> >     
> >     
> >    -----Original Message-----
> >    From: Benjamin Marzinski [[1]mailto:bmarzins@redhat.com]
> >    Sent: Tuesday, January 17, 2017 6:35 AM
> >    To: Muneendra Kumar M <[2]mmandala@Brocade.com>
> >    Cc: [3]dm-devel@redhat.com
> >    Subject: Re: [dm-devel] deterministic io throughput in multipath
> >     
> >    On Mon, Jan 16, 2017 at 11:19:19AM +0000, Muneendra Kumar M wrote:
> >    >    Hi Ben,
> >    >    After the below discussion we  came with the approach which will meet
> >    our
> >    >    requirement.
> >    >    I have attached the patch which is working good in our field tests.
> >    >    Could you please review the attached patch and provide us your
> >    valuable
> >    >    comments .
> >     
> >    I can see a number of issues with this patch.
> >     
> >    First, some nit-picks:
> >    - I assume "dis_reinstante_time" should be "dis_reinstate_time"
> >     
> >    - The indenting in check_path_validity_err is wrong, which made it
> >      confusing until I noticed that
> >     
> >    if (clock_gettime(CLOCK_MONOTONIC, &start_time) != 0)
> >     
> >      doesn't have an open brace, and shouldn't indent the rest of the
> >      function.
> >     
> >    - You call clock_gettime in check_path, but never use the result.
> >     
> >    - In dict.c, instead of writing your own functions that are the same as
> >      the *_delay_checks functions, you could make those functions generic
> >      and use them for both.  To go match the other generic function names
> >      they would probably be something like
> >     
> >    set_off_int_undef
> >     
> >    print_off_int_undef
> >     
> >      You would also need to change DELAY_CHECKS_* and ERR_CHECKS_* to
> >      point to some common enum that you created, the way
> >      user_friendly_names_states (to name one of many) does. The generic
> >      enum used by *_off_int_undef would be something like.
> >     
> >    enum no_undef {
> >            NU_NO = -1,
> >            NU_UNDEF = 0,
> >    }
> >     
> >      The idea is to try to cut down on the number of functions that are
> >      simply copy-pasting other functions in dict.c.
> >     
> >     
> >    Those are all minor cleanup issues, but there are some bigger problems.
> >     
> >    Instead of checking if san_path_err_threshold,
> >    san_path_err_threshold_window, and san_path_err_recovery_time are greater
> >    than zero separately, you should probably check them all at the start of
> >    check_path_validity_err, and return 0 unless they all are set.
> >    Right now, if a user sets san_path_err_threshold and
> >    san_path_err_threshold_window but not san_path_err_recovery_time, their
> >    path will never recover after it hits the error threshold.  I'm pretty sure
> >    that you don't mean to permanently disable the paths.
> >     
> >     
> >    time_t is a signed type, which means that if you get the clock time in
> >    update_multipath and then fail to get the clock time in
> >    check_path_validity_err, this check:
> >     
> >    start_time.tv_sec - pp->failure_start_time) <
> >    pp->mpp->san_path_err_threshold_window
> >     
> >    will always be true.  I realize that clock_gettime is very unlikely to
> >    fail.  But if it does, probably the safest thing to do is to just
> >    immediately return 0 in check_path_validity_err.
> >     
> >     
> >    The way you set path_failures in update_multipath may not get you what you
> >    want.  It will only count path failures found by the kernel, and not the
> >    path checker.  If the check_path finds the error, pp->state will be set to
> >    PATH_DOWN before pp->dmstate is set to PSTATE_FAILED. That means you will
> >    not increment path_failures. Perhaps this is what you want, but I would
> >    assume that you would want to count every time the path goes down
> >    regardless of whether multipathd or the kernel noticed it.
> >     
> >     
> >    I'm not super enthusiastic about how the san_path_err_threshold_window
> >    works.  First, it starts counting from when the path goes down, so if
> >    the path takes long enough to get restored and then fails immediately,
> >    it can just keep failing and never reach san_path_err_threshold within
> >    the window, since it spends so much of that time with the path failed.
> >    Also, the window gets set on the first error, and never reset until the
> >    number of errors is over the threshold.  This means that if you get one
> >    early error and then a bunch of errors much later, you will go for (2 x
> >    san_path_err_threshold) - 1 errors until you stop reinstating the path,
> >    because of the window reset in the middle of the string of errors.  It
> >    seems like a better idea would be for check_path_validity_err to reset
> >    path_failures as soon as it notices that you are past
> >    san_path_err_threshold_window, instead of waiting till the number of
> >    errors hits san_path_err_threshold.
> >     
> >     
> >    If I was going to design this, I think I would have san_path_err_threshold
> >    and san_path_err_recovery_time like you do, but instead of having a
> >    san_path_err_threshold_window, I would have something like
> >    san_path_err_forget_rate.  The idea is that every san_path_err_forget_rate
> >    successful path checks, you decrement path_failures by 1. This means
> >    that there is no window after which you reset.  If the path failures
> >    come in faster than the forget rate, you will eventually hit the error
> >    threshold. This also has the benefit of not counting time when the path
> >    was down as trouble-free time. But if you don't like my idea, yours will
> >    work fine with some polish.
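The forget-rate bookkeeping described above can be sketched as follows. All names here (path_err_state, account_check) are hypothetical illustrations, not the actual multipathd code: every forget_rate consecutive successful checks, one recorded failure is forgotten.

```c
/* Hypothetical per-path error accounting implementing the
 * forget-rate idea sketched in the mail above. */
struct path_err_state {
	int path_failures;	/* failures recorded so far */
	int good_checks;	/* successful checks since the last decrement */
};

/* Called once per path-checker tick; returns 1 when the recorded
 * failures have reached the threshold (i.e. hold the path down). */
static int account_check(struct path_err_state *st, int path_up,
			 int forget_rate, int threshold)
{
	if (!path_up) {
		st->path_failures++;
		st->good_checks = 0;	/* time spent down never counts as good */
	} else if (st->path_failures > 0 &&
		   ++st->good_checks >= forget_rate) {
		st->path_failures--;	/* forget one old failure */
		st->good_checks = 0;
	}
	return st->path_failures >= threshold;
}
```

With this shape, failures arriving faster than the forget rate accumulate toward the threshold, while widely spaced failures are forgotten before they can add up.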
> >     
> >    -Ben
> >     
> >     
> >    >    Below are the files that have been changed.
> >    >     
> >    >    libmultipath/config.c      |  3 +++
> >    >    libmultipath/config.h      |  9 +++++++++
> >    >    libmultipath/configure.c   |  3 +++
> >    >    libmultipath/defaults.h    |  1 +
> >    >    libmultipath/dict.c        | 80 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >    >    libmultipath/dict.h        |  1 +
> >    >    libmultipath/propsel.c     | 44 ++++++++++++++++++++++++++++++++++++++++++++
> >    >    libmultipath/propsel.h     |  6 ++++++
> >    >    libmultipath/structs.h     | 12 +++++++++++-
> >    >    libmultipath/structs_vec.c | 10 ++++++++++
> >    >    multipath/multipath.conf.5 | 58 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >    >    multipathd/main.c          | 61 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
> >    >     
> >    >    We have added three new config parameters whose description is below.
> >    >    1. san_path_err_threshold:
> >    >            If set to a value greater than 0, multipathd will watch
> >    >    paths and check how many times a path has failed due to errors. If
> >    >    the number of failures on a particular path is greater than
> >    >    san_path_err_threshold, the path will not be reinstated until
> >    >    san_path_err_recovery_time. These path failures should occur within
> >    >    a san_path_err_threshold_window time frame; if not, we consider the
> >    >    path good enough to reinstate.
> >    >     
> >    >    2. san_path_err_threshold_window:
> >    >            If set to a value greater than 0, multipathd will check
> >    >    whether the path failures have exceeded san_path_err_threshold
> >    >    within this time frame, i.e. san_path_err_threshold_window. If so,
> >    >    we will not reinstate the path until san_path_err_recovery_time.
> >    >     
> >    >    3. san_path_err_recovery_time:
> >    >            If set to a value greater than 0, and the path failures
> >    >    have exceeded san_path_err_threshold within
> >    >    san_path_err_threshold_window, the path will be placed in the
> >    >    failed state for the san_path_err_recovery_time duration. Once
> >    >    san_path_err_recovery_time has elapsed, we will reinstate the
> >    >    failed path.
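Putting the three options together, the scenario described later in this thread (treat a path as bad after 5 failures within an hour, and hold it down for 3 hours) would look roughly like this in multipath.conf. The values are illustrative, and both time values are assumed to be in seconds:

```
defaults {
	san_path_err_threshold         5
	san_path_err_threshold_window  3600
	san_path_err_recovery_time     10800
}
```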
> >    >     
> >    >    Regards,
> >    >    Muneendra.
> >    >     
> >    >    -----Original Message-----
> >    >    From: Muneendra Kumar M
> >    >    Sent: Wednesday, January 04, 2017 6:56 PM
> >    >    To: 'Benjamin Marzinski' <[4]bmarzins@redhat.com>
> >    >    Cc: [5]dm-devel@redhat.com
> >    >    Subject: RE: [dm-devel] deterministic io throughput in multipath
> >    >     
> >    >    Hi Ben,
> >    >    Thanks for the information.
> >    >     
> >    >    Regards,
> >    >    Muneendra.
> >    >     
> >    >    -----Original Message-----
> >    >    From: Benjamin Marzinski [[1][6]mailto:bmarzins@redhat.com]
> >    >    Sent: Tuesday, January 03, 2017 10:42 PM
> >    >    To: Muneendra Kumar M <[2][7]mmandala@Brocade.com>
> >    >    Cc: [3][8]dm-devel@redhat.com
> >    >    Subject: Re: [dm-devel] deterministic io throughput in multipath
> >    >     
> >    >    On Mon, Dec 26, 2016 at 09:42:48AM +0000, Muneendra Kumar M wrote:
> >    >    > Hi Ben,
> >    >    >
> >    >    > If there are two paths on a dm-1 say sda and sdb as below.
> >    >    >
> >    >    > #  multipath -ll
> >    >    >        mpathd (3600110d001ee7f0102050001cc0b6751) dm-1
> >    SANBlaze,VLUN
> >    >    MyLun
> >    >    >        size=8.0M features='0' hwhandler='0' wp=rw
> >    >    >        `-+- policy='round-robin 0' prio=50 status=active
> >    >    >          |- 8:0:1:0  sda 8:48 active ready  running
> >    >    >          `- 9:0:1:0  sdb 8:64 active ready  running         
> >    >    >
> >    >    > And on sda, if I am seeing a lot of errors due to which the sda
> >    >    > path is fluctuating between the failed and active states.
> >    >    >
> >    >    > My requirement is something like this: if sda has failed more
> >    >    > than 5 times in an hour, then I want to keep sda in the failed
> >    >    > state for a few hours (3 hrs),
> >    >    >
> >    >    > and the data should travel only through the sdb path.
> >    >    > Will this be possible with the below parameters?
> >    >     
> >    >    No. delay_watch_checks sets for how many path checks multipathd
> >    >    watches a path that has recently come back from the failed state.
> >    >    If the path fails again within this time, the multipath device
> >    >    delays it.  This means that the delay is always triggered by two
> >    >    failures within the time limit.  It's possible to adapt this to
> >    >    count numbers of failures, and act after a certain number within a
> >    >    certain timeframe, but it would take a bit more work.
> >    >     
> >    >    delay_wait_checks doesn't guarantee that it will delay for any set
> >    >    length of time.  Instead, it sets the number of consecutive
> >    >    successful path checks that must occur before the path is usable
> >    >    again.  You could set this for 3 hours of path checks, but if a
> >    >    check failed during this time, you would restart the 3 hours over
> >    >    again.
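The restart-on-failure behavior described above can be sketched as follows; the names are hypothetical illustrations, not the actual multipathd implementation:

```c
/* Sketch of delay_wait_checks semantics: a delayed path becomes
 * usable again only after delay_wait_checks CONSECUTIVE successful
 * checks; a single failed check restarts the count from zero. */
static int wait_checks_tick(int *consecutive_ok, int check_ok,
			    int delay_wait_checks)
{
	if (check_ok)
		(*consecutive_ok)++;
	else
		*consecutive_ok = 0;	/* one failure restarts the wait */
	return *consecutive_ok >= delay_wait_checks;	/* path usable again? */
}
```

This is why a 3-hour setting gives no upper bound on the delay: every intermittent failure during the wait pushes recovery another full interval into the future.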
> >    >     
> >    >    -Ben
> >    >     
> >    >    > Can you just let me know what values I should add for
> >    >    > delay_watch_checks and delay_wait_checks?
> >    >    >
> >    >    > Regards,
> >    >    > Muneendra.
> >    >    >
> >    >    >
> >    >    >
> >    >    > -----Original Message-----
> >    >    > From: Muneendra Kumar M
> >    >    > Sent: Thursday, December 22, 2016 11:10 AM
> >    >    > To: 'Benjamin Marzinski' <[4][9]bmarzins@redhat.com>
> >    >    > Cc: [5][10]dm-devel@redhat.com
> >    >    > Subject: RE: [dm-devel] deterministic io throughput in multipath
> >    >    >
> >    >    > Hi Ben,
> >    >    >
> >    >    > Thanks for the reply.
> >    >    > I will look into this parameters will do the internal testing and
> >    let
> >    >    you know the results.
> >    >    >
> >    >    > Regards,
> >    >    > Muneendra.
> >    >    >
> >    >    > -----Original Message-----
> >    >    > From: Benjamin Marzinski [[6][11]mailto:bmarzins@redhat.com]
> >    >    > Sent: Wednesday, December 21, 2016 9:40 PM
> >    >    > To: Muneendra Kumar M <[7][12]mmandala@Brocade.com>
> >    >    > Cc: [8][13]dm-devel@redhat.com
> >    >    > Subject: Re: [dm-devel] deterministic io throughput in multipath
> >    >    >
> >    >    > Have you looked into the delay_watch_checks and delay_wait_checks
> >    >    > configuration parameters?  The idea behind them is to minimize
> >    >    > the use of paths that are intermittently failing.
> >    >    >
> >    >    > -Ben
> >    >    >
> >    >    > On Mon, Dec 19, 2016 at 11:50:36AM +0000, Muneendra Kumar M wrote:
> >    >    > >    Customers using Linux hosts (mostly RHEL hosts) with a SAN
> >    >    > >    network for block storage complain that the Linux multipath
> >    >    > >    stack is not resilient enough to handle non-deterministic
> >    >    > >    storage network behaviors. This has caused many customers to
> >    >    > >    move away to non-Linux servers. The intent of the below
> >    >    > >    patch and the prevailing issues are given below. With the
> >    >    > >    below design we are seeing the Linux multipath stack become
> >    >    > >    resilient to such network issues. We hope that getting this
> >    >    > >    patch accepted will help adoption of more Linux servers that
> >    >    > >    use a SAN network.
> >    >    > >
> >    >    > >    I have already sent the design details to the community in a
> >    >    > >    different mail chain; the details are available at the link
> >    >    > >    below.
> >    >    > >
> >    >    > >    https://www.redhat.com/archives/dm-devel/2016-December/msg00122.html
> >    >    > >
> >    >    > >    Can you please go through the design and send your comments
> >    >    > >    to us.
> >    >    > >
> >    >    > >     
> >    >    > >
> >    >    > >    Regards,
> >    >    > >
> >    >    > >    Muneendra.
> >    >    > >
> >    >    > >     
> >    >    > >
> >    >    > >     
> >    >    > >
> >    >    > > References
> >    >    > >
> >    >    > >    Visible links
> >    >    > >    1. https://www.redhat.com/archives/dm-devel/2016-December/msg00122.html
> >    >    >
> >    >    > > --
> >    >    > > dm-devel mailing list
> >    >    > > dm-devel@redhat.com
> >    >    > > https://www.redhat.com/mailman/listinfo/dm-devel
> >    >     
> >    >
> >    > References
> >    >
> >    >    Visible links
> >    >    1. mailto:bmarzins@redhat.com
> >    >    2. mailto:mmandala@brocade.com
> >    >    3. mailto:dm-devel@redhat.com
> >    >    4. mailto:bmarzins@redhat.com
> >    >    5. mailto:dm-devel@redhat.com
> >    >    6. mailto:bmarzins@redhat.com
> >    >    7. mailto:mmandala@brocade.com
> >    >    8. mailto:dm-devel@redhat.com
> >    >    9. https://www.redhat.com/archives/dm-devel/2016-December/msg00122.html
> >    >   10. https://www.redhat.com/
> >    >   11. mailto:dm-devel@redhat.com
> >    >   12. https://www.redhat.com/
> >     
> >     
> >     
> >     
> > 
> > References
> > 
> >    Visible links
> >    1. mailto:bmarzins@redhat.com
> >    2. mailto:mmandala@brocade.com
> >    3. mailto:dm-devel@redhat.com
> >    4. mailto:bmarzins@redhat.com
> >    5. mailto:dm-devel@redhat.com
> >    6. mailto:bmarzins@redhat.com
> >    7. mailto:mmandala@brocade.com
> >    8. mailto:dm-devel@redhat.com
> >    9. mailto:bmarzins@redhat.com
> >   10. mailto:dm-devel@redhat.com
> >   11. mailto:bmarzins@redhat.com
> >   12. mailto:mmandala@brocade.com
> >   13. mailto:dm-devel@redhat.com
> >   14. https://www.redhat.com/archives/dm-devel/2016-December/msg00122.html
> >   15. https://www.redhat.com/
> >   16. mailto:dm-devel@redhat.com
> >   17. https://www.redhat.com/
> >   18. mailto:bmarzins@redhat.com
> >   19. mailto:mmandala@brocade.com
> >   20. mailto:dm-devel@redhat.com
> >   21. mailto:bmarzins@redhat.com
> >   22. mailto:dm-devel@redhat.com
> >   23. mailto:bmarzins@redhat.com
> >   24. mailto:mmandala@brocade.com
> >   25. mailto:dm-devel@redhat.com
> >   26. https://www.redhat.com/archives/dm-devel/2016-December/msg00122.html
> >   27. https://www.redhat.com/
> >   28. mailto:dm-devel@redhat.com
> >   29. https://www.redhat.com/
> 

[-- Attachment #2: san_path_error.patch --]
[-- Type: application/octet-stream, Size: 23194 bytes --]

diff --git a/libmultipath/config.c b/libmultipath/config.c
index 15ddbd8..be384af 100644
--- a/libmultipath/config.c
+++ b/libmultipath/config.c
@@ -348,6 +348,9 @@ merge_hwe (struct hwentry * dst, struct hwentry * src)
 	merge_num(delay_wait_checks);
 	merge_num(skip_kpartx);
 	merge_num(max_sectors_kb);
+	merge_num(san_path_err_threshold);
+	merge_num(san_path_err_forget_rate);
+	merge_num(san_path_err_recovery_time);
 
 	/*
 	 * Make sure features is consistent with
diff --git a/libmultipath/config.h b/libmultipath/config.h
index 9670020..9e47894 100644
--- a/libmultipath/config.h
+++ b/libmultipath/config.h
@@ -65,6 +65,9 @@ struct hwentry {
 	int deferred_remove;
 	int delay_watch_checks;
 	int delay_wait_checks;
+	int san_path_err_threshold;
+	int san_path_err_forget_rate;
+	int san_path_err_recovery_time;
 	int skip_kpartx;
 	int max_sectors_kb;
 	char * bl_product;
@@ -93,6 +96,9 @@ struct mpentry {
 	int deferred_remove;
 	int delay_watch_checks;
 	int delay_wait_checks;
+	int san_path_err_threshold;
+	int san_path_err_forget_rate;
+	int san_path_err_recovery_time;
 	int skip_kpartx;
 	int max_sectors_kb;
 	uid_t uid;
@@ -138,6 +144,9 @@ struct config {
 	int processed_main_config;
 	int delay_watch_checks;
 	int delay_wait_checks;
+	int san_path_err_threshold;
+	int san_path_err_forget_rate;
+	int san_path_err_recovery_time;
 	int uxsock_timeout;
 	int strict_timing;
 	int retrigger_tries;
diff --git a/libmultipath/configure.c b/libmultipath/configure.c
index a0fcad9..5ad3007 100644
--- a/libmultipath/configure.c
+++ b/libmultipath/configure.c
@@ -294,6 +294,9 @@ int setup_map(struct multipath *mpp, char *params, int params_size)
 	select_deferred_remove(conf, mpp);
 	select_delay_watch_checks(conf, mpp);
 	select_delay_wait_checks(conf, mpp);
+	select_san_path_err_threshold(conf, mpp);
+	select_san_path_err_forget_rate(conf, mpp);
+	select_san_path_err_recovery_time(conf, mpp);
 	select_skip_kpartx(conf, mpp);
 	select_max_sectors_kb(conf, mpp);
 
diff --git a/libmultipath/defaults.h b/libmultipath/defaults.h
index b9b0a37..3ef1579 100644
--- a/libmultipath/defaults.h
+++ b/libmultipath/defaults.h
@@ -23,7 +23,8 @@
 #define DEFAULT_RETAIN_HWHANDLER RETAIN_HWHANDLER_ON
 #define DEFAULT_DETECT_PRIO	DETECT_PRIO_ON
 #define DEFAULT_DEFERRED_REMOVE	DEFERRED_REMOVE_OFF
-#define DEFAULT_DELAY_CHECKS	DELAY_CHECKS_OFF
+#define DEFAULT_DELAY_CHECKS	NU_NO
+#define DEFAULT_ERR_CHECKS	NU_NO
 #define DEFAULT_UEVENT_STACKSIZE 256
 #define DEFAULT_RETRIGGER_DELAY	10
 #define DEFAULT_RETRIGGER_TRIES	3
diff --git a/libmultipath/dict.c b/libmultipath/dict.c
index dc21846..4754572 100644
--- a/libmultipath/dict.c
+++ b/libmultipath/dict.c
@@ -1023,7 +1023,7 @@ declare_mp_handler(reservation_key, set_reservation_key)
 declare_mp_snprint(reservation_key, print_reservation_key)
 
 static int
-set_delay_checks(vector strvec, void *ptr)
+set_off_int_undef(vector strvec, void *ptr)
 {
 	int *int_ptr = (int *)ptr;
 	char * buff;
@@ -1033,47 +1033,69 @@ set_delay_checks(vector strvec, void *ptr)
 		return 1;
 
 	if (!strcmp(buff, "no") || !strcmp(buff, "0"))
-		*int_ptr = DELAY_CHECKS_OFF;
+		*int_ptr = NU_NO;
 	else if ((*int_ptr = atoi(buff)) < 1)
-		*int_ptr = DELAY_CHECKS_UNDEF;
+		*int_ptr = NU_UNDEF;
 
 	FREE(buff);
 	return 0;
 }
 
 int
-print_delay_checks(char * buff, int len, void *ptr)
+print_off_int_undef(char * buff, int len, void *ptr)
 {
 	int *int_ptr = (int *)ptr;
 
 	switch(*int_ptr) {
-	case DELAY_CHECKS_UNDEF:
+	case NU_UNDEF:
 		return 0;
-	case DELAY_CHECKS_OFF:
+	case NU_NO:
 		return snprintf(buff, len, "\"off\"");
 	default:
 		return snprintf(buff, len, "%i", *int_ptr);
 	}
 }
 
-declare_def_handler(delay_watch_checks, set_delay_checks)
-declare_def_snprint(delay_watch_checks, print_delay_checks)
-declare_ovr_handler(delay_watch_checks, set_delay_checks)
-declare_ovr_snprint(delay_watch_checks, print_delay_checks)
-declare_hw_handler(delay_watch_checks, set_delay_checks)
-declare_hw_snprint(delay_watch_checks, print_delay_checks)
-declare_mp_handler(delay_watch_checks, set_delay_checks)
-declare_mp_snprint(delay_watch_checks, print_delay_checks)
-
-declare_def_handler(delay_wait_checks, set_delay_checks)
-declare_def_snprint(delay_wait_checks, print_delay_checks)
-declare_ovr_handler(delay_wait_checks, set_delay_checks)
-declare_ovr_snprint(delay_wait_checks, print_delay_checks)
-declare_hw_handler(delay_wait_checks, set_delay_checks)
-declare_hw_snprint(delay_wait_checks, print_delay_checks)
-declare_mp_handler(delay_wait_checks, set_delay_checks)
-declare_mp_snprint(delay_wait_checks, print_delay_checks)
-
+declare_def_handler(delay_watch_checks, set_off_int_undef)
+declare_def_snprint(delay_watch_checks, print_off_int_undef)
+declare_ovr_handler(delay_watch_checks, set_off_int_undef)
+declare_ovr_snprint(delay_watch_checks, print_off_int_undef)
+declare_hw_handler(delay_watch_checks, set_off_int_undef)
+declare_hw_snprint(delay_watch_checks, print_off_int_undef)
+declare_mp_handler(delay_watch_checks, set_off_int_undef)
+declare_mp_snprint(delay_watch_checks, print_off_int_undef)
+declare_def_handler(delay_wait_checks, set_off_int_undef)
+declare_def_snprint(delay_wait_checks, print_off_int_undef)
+declare_ovr_handler(delay_wait_checks, set_off_int_undef)
+declare_ovr_snprint(delay_wait_checks, print_off_int_undef)
+declare_hw_handler(delay_wait_checks, set_off_int_undef)
+declare_hw_snprint(delay_wait_checks, print_off_int_undef)
+declare_mp_handler(delay_wait_checks, set_off_int_undef)
+declare_mp_snprint(delay_wait_checks, print_off_int_undef)
+declare_def_handler(san_path_err_threshold, set_off_int_undef)
+declare_def_snprint(san_path_err_threshold, print_off_int_undef)
+declare_ovr_handler(san_path_err_threshold, set_off_int_undef)
+declare_ovr_snprint(san_path_err_threshold, print_off_int_undef)
+declare_hw_handler(san_path_err_threshold, set_off_int_undef)
+declare_hw_snprint(san_path_err_threshold, print_off_int_undef)
+declare_mp_handler(san_path_err_threshold, set_off_int_undef)
+declare_mp_snprint(san_path_err_threshold, print_off_int_undef)
+declare_def_handler(san_path_err_forget_rate, set_off_int_undef)
+declare_def_snprint(san_path_err_forget_rate, print_off_int_undef)
+declare_ovr_handler(san_path_err_forget_rate, set_off_int_undef)
+declare_ovr_snprint(san_path_err_forget_rate, print_off_int_undef)
+declare_hw_handler(san_path_err_forget_rate, set_off_int_undef)
+declare_hw_snprint(san_path_err_forget_rate, print_off_int_undef)
+declare_mp_handler(san_path_err_forget_rate, set_off_int_undef)
+declare_mp_snprint(san_path_err_forget_rate, print_off_int_undef)
+declare_def_handler(san_path_err_recovery_time, set_off_int_undef)
+declare_def_snprint(san_path_err_recovery_time, print_off_int_undef)
+declare_ovr_handler(san_path_err_recovery_time, set_off_int_undef)
+declare_ovr_snprint(san_path_err_recovery_time, print_off_int_undef)
+declare_hw_handler(san_path_err_recovery_time, set_off_int_undef)
+declare_hw_snprint(san_path_err_recovery_time, print_off_int_undef)
+declare_mp_handler(san_path_err_recovery_time, set_off_int_undef)
+declare_mp_snprint(san_path_err_recovery_time, print_off_int_undef)
 static int
 def_uxsock_timeout_handler(struct config *conf, vector strvec)
 {
@@ -1404,6 +1426,10 @@ init_keywords(vector keywords)
 	install_keyword("config_dir", &def_config_dir_handler, &snprint_def_config_dir);
 	install_keyword("delay_watch_checks", &def_delay_watch_checks_handler, &snprint_def_delay_watch_checks);
 	install_keyword("delay_wait_checks", &def_delay_wait_checks_handler, &snprint_def_delay_wait_checks);
+	install_keyword("san_path_err_threshold", &def_san_path_err_threshold_handler, &snprint_def_san_path_err_threshold);
+	install_keyword("san_path_err_forget_rate", &def_san_path_err_forget_rate_handler, &snprint_def_san_path_err_forget_rate);
+	install_keyword("san_path_err_recovery_time", &def_san_path_err_recovery_time_handler, &snprint_def_san_path_err_recovery_time);
+
 	install_keyword("find_multipaths", &def_find_multipaths_handler, &snprint_def_find_multipaths);
 	install_keyword("uxsock_timeout", &def_uxsock_timeout_handler, &snprint_def_uxsock_timeout);
 	install_keyword("retrigger_tries", &def_retrigger_tries_handler, &snprint_def_retrigger_tries);
@@ -1486,6 +1512,9 @@ init_keywords(vector keywords)
 	install_keyword("deferred_remove", &hw_deferred_remove_handler, &snprint_hw_deferred_remove);
 	install_keyword("delay_watch_checks", &hw_delay_watch_checks_handler, &snprint_hw_delay_watch_checks);
 	install_keyword("delay_wait_checks", &hw_delay_wait_checks_handler, &snprint_hw_delay_wait_checks);
+	install_keyword("san_path_err_threshold", &hw_san_path_err_threshold_handler, &snprint_hw_san_path_err_threshold);
+	install_keyword("san_path_err_forget_rate", &hw_san_path_err_forget_rate_handler, &snprint_hw_san_path_err_forget_rate);
+	install_keyword("san_path_err_recovery_time", &hw_san_path_err_recovery_time_handler, &snprint_hw_san_path_err_recovery_time);
 	install_keyword("skip_kpartx", &hw_skip_kpartx_handler, &snprint_hw_skip_kpartx);
 	install_keyword("max_sectors_kb", &hw_max_sectors_kb_handler, &snprint_hw_max_sectors_kb);
 	install_sublevel_end();
@@ -1515,6 +1544,10 @@ init_keywords(vector keywords)
 	install_keyword("deferred_remove", &ovr_deferred_remove_handler, &snprint_ovr_deferred_remove);
 	install_keyword("delay_watch_checks", &ovr_delay_watch_checks_handler, &snprint_ovr_delay_watch_checks);
 	install_keyword("delay_wait_checks", &ovr_delay_wait_checks_handler, &snprint_ovr_delay_wait_checks);
+	install_keyword("san_path_err_threshold", &ovr_san_path_err_threshold_handler, &snprint_ovr_san_path_err_threshold);
+	install_keyword("san_path_err_forget_rate", &ovr_san_path_err_forget_rate_handler, &snprint_ovr_san_path_err_forget_rate);
+	install_keyword("san_path_err_recovery_time", &ovr_san_path_err_recovery_time_handler, &snprint_ovr_san_path_err_recovery_time);
+
 	install_keyword("skip_kpartx", &ovr_skip_kpartx_handler, &snprint_ovr_skip_kpartx);
 	install_keyword("max_sectors_kb", &ovr_max_sectors_kb_handler, &snprint_ovr_max_sectors_kb);
 
@@ -1543,6 +1576,9 @@ init_keywords(vector keywords)
 	install_keyword("deferred_remove", &mp_deferred_remove_handler, &snprint_mp_deferred_remove);
 	install_keyword("delay_watch_checks", &mp_delay_watch_checks_handler, &snprint_mp_delay_watch_checks);
 	install_keyword("delay_wait_checks", &mp_delay_wait_checks_handler, &snprint_mp_delay_wait_checks);
+	install_keyword("san_path_err_threshold", &mp_san_path_err_threshold_handler, &snprint_mp_san_path_err_threshold);
+	install_keyword("san_path_err_forget_rate", &mp_san_path_err_forget_rate_handler, &snprint_mp_san_path_err_forget_rate);
+	install_keyword("san_path_err_recovery_time", &mp_san_path_err_recovery_time_handler, &snprint_mp_san_path_err_recovery_time);
 	install_keyword("skip_kpartx", &mp_skip_kpartx_handler, &snprint_mp_skip_kpartx);
 	install_keyword("max_sectors_kb", &mp_max_sectors_kb_handler, &snprint_mp_max_sectors_kb);
 	install_sublevel_end();
diff --git a/libmultipath/dict.h b/libmultipath/dict.h
index 4cd03c5..2d6097d 100644
--- a/libmultipath/dict.h
+++ b/libmultipath/dict.h
@@ -14,6 +14,5 @@ int print_no_path_retry(char * buff, int len, void *ptr);
 int print_fast_io_fail(char * buff, int len, void *ptr);
 int print_dev_loss(char * buff, int len, void *ptr);
 int print_reservation_key(char * buff, int len, void * ptr);
-int print_delay_checks(char * buff, int len, void *ptr);
-
+int print_off_int_undef(char * buff, int len, void *ptr);
 #endif /* _DICT_H */
diff --git a/libmultipath/propsel.c b/libmultipath/propsel.c
index c0bc616..e4afef7 100644
--- a/libmultipath/propsel.c
+++ b/libmultipath/propsel.c
@@ -623,7 +623,7 @@ int select_delay_watch_checks(struct config *conf, struct multipath *mp)
 	mp_set_conf(delay_watch_checks);
 	mp_set_default(delay_watch_checks, DEFAULT_DELAY_CHECKS);
 out:
-	print_delay_checks(buff, 12, &mp->delay_watch_checks);
+	print_off_int_undef(buff, 12, &mp->delay_watch_checks);
 	condlog(3, "%s: delay_watch_checks = %s %s", mp->alias, buff, origin);
 	return 0;
 }
@@ -638,12 +638,56 @@ int select_delay_wait_checks(struct config *conf, struct multipath *mp)
 	mp_set_conf(delay_wait_checks);
 	mp_set_default(delay_wait_checks, DEFAULT_DELAY_CHECKS);
 out:
-	print_delay_checks(buff, 12, &mp->delay_wait_checks);
+	print_off_int_undef(buff, 12, &mp->delay_wait_checks);
 	condlog(3, "%s: delay_wait_checks = %s %s", mp->alias, buff, origin);
 	return 0;
 
 }
+int select_san_path_err_threshold(struct config *conf, struct multipath *mp)
+{
+	char *origin, buff[12];
+
+	mp_set_mpe(san_path_err_threshold);
+	mp_set_ovr(san_path_err_threshold);
+	mp_set_hwe(san_path_err_threshold);
+	mp_set_conf(san_path_err_threshold);
+	mp_set_default(san_path_err_threshold, DEFAULT_ERR_CHECKS);
+out:
+	print_off_int_undef(buff, 12, &mp->san_path_err_threshold);
+	condlog(3, "%s: san_path_err_threshold = %s %s", mp->alias, buff, origin);
+	return 0;
+}
+
+int select_san_path_err_forget_rate(struct config *conf, struct multipath *mp)
+{
+	char *origin, buff[12];
+
+	mp_set_mpe(san_path_err_forget_rate);
+	mp_set_ovr(san_path_err_forget_rate);
+	mp_set_hwe(san_path_err_forget_rate);
+	mp_set_conf(san_path_err_forget_rate);
+	mp_set_default(san_path_err_forget_rate, DEFAULT_ERR_CHECKS);
+out:
+	print_off_int_undef(buff, 12, &mp->san_path_err_forget_rate);
+	condlog(3, "%s: san_path_err_forget_rate = %s %s", mp->alias, buff, origin);
+	return 0;
+}
+
+int select_san_path_err_recovery_time(struct config *conf, struct multipath *mp)
+{
+	char *origin, buff[12];
 
+	mp_set_mpe(san_path_err_recovery_time);
+	mp_set_ovr(san_path_err_recovery_time);
+	mp_set_hwe(san_path_err_recovery_time);
+	mp_set_conf(san_path_err_recovery_time);
+	mp_set_default(san_path_err_recovery_time, DEFAULT_ERR_CHECKS);
+out:
+	print_off_int_undef(buff, 12, &mp->san_path_err_recovery_time);
+	condlog(3, "%s: san_path_err_recovery_time = %s %s", mp->alias, buff, origin);
+	return 0;
+}
+
 int select_skip_kpartx (struct config *conf, struct multipath * mp)
 {
 	char *origin;
diff --git a/libmultipath/propsel.h b/libmultipath/propsel.h
index ad98fa5..e5b6f93 100644
--- a/libmultipath/propsel.h
+++ b/libmultipath/propsel.h
@@ -24,3 +24,6 @@ int select_delay_watch_checks (struct config *conf, struct multipath * mp);
 int select_delay_wait_checks (struct config *conf, struct multipath * mp);
 int select_skip_kpartx (struct config *conf, struct multipath * mp);
 int select_max_sectors_kb (struct config *conf, struct multipath * mp);
+int select_san_path_err_forget_rate(struct config *conf, struct multipath *mp);
+int select_san_path_err_threshold(struct config *conf, struct multipath *mp);
+int select_san_path_err_recovery_time(struct config *conf, struct multipath *mp);
diff --git a/libmultipath/structs.h b/libmultipath/structs.h
index 396f69d..6edd927 100644
--- a/libmultipath/structs.h
+++ b/libmultipath/structs.h
@@ -152,9 +152,9 @@ enum scsi_protocol {
 	SCSI_PROTOCOL_UNSPEC = 0xf, /* No specific protocol */
 };
 
-enum delay_checks_states {
-	DELAY_CHECKS_OFF = -1,
-	DELAY_CHECKS_UNDEF = 0,
+enum no_undef_states {
+	NU_NO = -1,
+	NU_UNDEF = 0,
 };
 
 enum initialized_states {
@@ -223,7 +223,10 @@ struct path {
 	int initialized;
 	int retriggers;
 	int wwid_changed;
-
+	unsigned int path_failures;
+	time_t dis_reinstate_time;
+	int disable_reinstate;
+	int san_path_err_forget_rate;
 	/* configlet pointers */
 	struct hwentry * hwe;
 };
@@ -255,6 +258,9 @@ struct multipath {
 	int deferred_remove;
 	int delay_watch_checks;
 	int delay_wait_checks;
+	int san_path_err_threshold;
+	int san_path_err_forget_rate;
+	int san_path_err_recovery_time;
 	int skip_kpartx;
 	int max_sectors_kb;
 	unsigned int dev_loss;
diff --git a/multipath/multipath.conf.5 b/multipath/multipath.conf.5
index 36589f5..3c564ad 100644
--- a/multipath/multipath.conf.5
+++ b/multipath/multipath.conf.5
@@ -751,6 +751,45 @@ The default is: \fB/etc/multipath/conf.d/\fR
 .
 .
 .TP
+.B san_path_err_threshold
+If set to a value greater than 0, multipathd will watch paths and check how
+many times a path has failed due to errors. If the number of failures on a
+particular path is greater than san_path_err_threshold, the path will not be
+reinstated until san_path_err_recovery_time. These path failures should occur
+within san_path_err_forget_rate checks; if not, the path is considered good
+enough to reinstate.
+.RS
+.TP
+The default is: \fBno\fR
+.RE
+.
+.
+.TP
+.B san_path_err_forget_rate
+If set to a value greater than 0, multipathd will check whether the path
+failures have exceeded san_path_err_threshold within this many checks, i.e.
+san_path_err_forget_rate. If so, the path will not be reinstated until
+san_path_err_recovery_time.
+.RS
+.TP
+The default is: \fBno\fR
+.RE
+.
+.
+.TP
+.B san_path_err_recovery_time
+If set to a value greater than 0, and the path failures have exceeded
+san_path_err_threshold within san_path_err_forget_rate checks, multipathd will
+place the path in the failed state for the san_path_err_recovery_time duration.
+Once san_path_err_recovery_time has elapsed, the failed path will be
+reinstated. The san_path_err_recovery_time value should be in seconds.
+.RS
+.TP
+The default is: \fBno\fR
+.RE
+.
+.
+.TP
 .B delay_watch_checks
 If set to a value greater than 0, multipathd will watch paths that have
 recently become valid for this many checks. If they fail again while they are
@@ -1015,6 +1054,12 @@ are taken from the \fIdefaults\fR or \fIdevices\fR section:
 .TP
 .B deferred_remove
 .TP
+.B san_path_err_threshold
+.TP
+.B san_path_err_forget_rate
+.TP
+.B san_path_err_recovery_time
+.TP
 .B delay_watch_checks
 .TP
 .B delay_wait_checks
@@ -1128,6 +1173,12 @@ section:
 .TP
 .B deferred_remove
 .TP
+.B san_path_err_threshold
+.TP
+.B san_path_err_forget_rate
+.TP
+.B san_path_err_recovery_time
+.TP
 .B delay_watch_checks
 .TP
 .B delay_wait_checks
@@ -1192,6 +1243,12 @@ the values are taken from the \fIdevices\fR or \fIdefaults\fR sections:
 .TP
 .B deferred_remove
 .TP
+.B san_path_err_threshold
+.TP
+.B san_path_err_forget_rate
+.TP
+.B san_path_err_recovery_time
+.TP
 .B delay_watch_checks
 .TP
 .B delay_wait_checks
diff --git a/multipathd/main.c b/multipathd/main.c
index adc3258..d6d68a4 100644
--- a/multipathd/main.c
+++ b/multipathd/main.c
@@ -1487,6 +1487,70 @@ void repair_path(struct path * pp)
 	LOG_MSG(1, checker_message(&pp->checker));
 }
 
+static int check_path_reinstate_state(struct path * pp) {
+	struct timespec start_time;
+	int disable_reinstate = 1;
+
+	if (!((pp->mpp->san_path_err_threshold > 0) && 
+				(pp->mpp->san_path_err_forget_rate > 0) &&
+				(pp->mpp->san_path_err_recovery_time >0))) {
+		return disable_reinstate;
+	}
+
+	if (clock_gettime(CLOCK_MONOTONIC, &start_time) != 0) {
+		return disable_reinstate;	
+	}
+
+	if ((start_time.tv_sec - pp->dis_reinstate_time ) > pp->mpp->san_path_err_recovery_time) {
+		disable_reinstate =0;
+		pp->path_failures = 0;
+		pp->disable_reinstate = 0;
+		pp->san_path_err_forget_rate = pp->mpp->san_path_err_forget_rate;
+		condlog(3,"\npath %s :reinstate the path after err recovery time\n",pp->dev);
+	}
+	return  disable_reinstate;
+}
+
+static int check_path_validity_err (struct path * pp) {
+	struct timespec start_time;
+	int disable_reinstate = 0;
+
+	if (!((pp->mpp->san_path_err_threshold > 0) && 
+				(pp->mpp->san_path_err_forget_rate > 0) &&
+				(pp->mpp->san_path_err_recovery_time >0))) {
+		return disable_reinstate;
+	}
+
+	if (clock_gettime(CLOCK_MONOTONIC, &start_time) != 0) {
+		return disable_reinstate;	
+	}
+	if (!pp->disable_reinstate) {
+		if (pp->path_failures) {
+			/*if the error threshold has hit hit within the san_path_err_forget_rate
+			 *cycles donot reinstante the path till the san_path_err_recovery_time
+			 *place the path in failed state till san_path_err_recovery_time so that the
+			 *cutomer can rectify the issue within this time .Once the completion of
+			 *san_path_err_recovery_time it should automatically reinstantate the path
+			 */
+			if ((pp->path_failures > pp->mpp->san_path_err_threshold) &&
+					(pp->san_path_err_forget_rate > 0)) {
+				printf("\n%s:%d: %s hit error threshold \n",__func__,__LINE__,pp->dev);
+				pp->dis_reinstate_time = start_time.tv_sec ;
+				pp->disable_reinstate = 1;
+				disable_reinstate = 1;
+			} else if ((pp->san_path_err_forget_rate > 0)) {
+				pp->san_path_err_forget_rate--;
+			} else {
+				/*for every san_path_err_forget_rate number
+				 *of successful path checks decrement path_failures by 1
+				 */
+				pp->path_failures --;
+				pp->san_path_err_forget_rate = pp->mpp->san_path_err_forget_rate;
+			}
+		}
+	}
+	return  disable_reinstate;
+}
 /*
  * Returns '1' if the path has been checked, '-1' if it was blacklisted
  * and '0' otherwise
@@ -1502,7 +1566,7 @@ check_path (struct vectors * vecs, struct path * pp, int ticks)
 	int oldchkrstate = pp->chkrstate;
 	int retrigger_tries, checkint;
 	struct config *conf;
-	int ret;
+	int ret;	
 
 	if ((pp->initialized == INIT_OK ||
 	     pp->initialized == INIT_REQUESTED_UDEV) && !pp->mpp)
@@ -1601,6 +1665,18 @@ check_path (struct vectors * vecs, struct path * pp, int ticks)
 		return 0;
 
 	if ((newstate == PATH_UP || newstate == PATH_GHOST) &&
+	     pp->disable_reinstate) {
+		/*
+		 * check if the path is in failed state for more than san_path_err_recovery_time
+		 * if not place the path in delayed state
+		 */
+		if (check_path_reinstate_state(pp)) {
+			pp->state = PATH_DELAYED;
+			return 1;
+		}
+	}
+	
+	if ((newstate == PATH_UP || newstate == PATH_GHOST) &&
 	     pp->wait_checks > 0) {
 		if (pp->mpp->nr_active > 0) {
 			pp->state = PATH_DELAYED;
@@ -1609,18 +1685,31 @@ check_path (struct vectors * vecs, struct path * pp, int ticks)
 		} else
 			pp->wait_checks = 0;
 	}
-
+	if ((newstate == PATH_DOWN || newstate == PATH_GHOST ||
+		pp->state == PATH_DOWN)) {
+		/*assigned  the path_err_forget_rate when we see the first failure on the path*/
+		if(pp->path_failures == 0){
+			pp->san_path_err_forget_rate = pp->mpp->san_path_err_forget_rate;
+		}
+		pp->path_failures++;
+	}
 	/*
 	 * don't reinstate failed path, if its in stand-by
 	 * and if target supports only implicit tpgs mode.
 	 * this will prevent unnecessary i/o by dm on stand-by
 	 * paths if there are no other active paths in map.
+	 *
+	 * when path failures has exceeded the san_path_err_threshold 
+	 * within san_path_err_forget_rate then we don't reinstate
+	 * failed path for san_path_err_recovery_time
 	 */
-	disable_reinstate = (newstate == PATH_GHOST &&
+	disable_reinstate = ((newstate == PATH_GHOST &&
 			    pp->mpp->nr_active == 0 &&
-			    pp->tpgs == TPGS_IMPLICIT) ? 1 : 0;
+			    pp->tpgs == TPGS_IMPLICIT) ? 1 :
+			    check_path_validity_err(pp));
 
 	pp->chkrstate = newstate;
+
 	if (newstate != pp->state) {
 		int oldstate = pp->state;
 		pp->state = newstate;

[-- Attachment #3: Type: text/plain, Size: 0 bytes --]



^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: deterministic io throughput in multipath
  2017-02-01 11:58                   ` Muneendra Kumar M
@ 2017-02-02  1:50                     ` Benjamin Marzinski
  2017-02-02 11:48                       ` Muneendra Kumar M
  0 siblings, 1 reply; 21+ messages in thread
From: Benjamin Marzinski @ 2017-02-02  1:50 UTC (permalink / raw)
  To: Muneendra Kumar M; +Cc: device-mapper development

This is certainly moving in the right direction.  There are a couple of
things I would change. check_path_reinstate_state() will automatically
disable the path if there are configuration problems. If things aren't
configured correctly, or the code can't get the current time, it seems
like it should allow the path to get reinstated, to avoid keeping
a perfectly good path down indefinitely.  Also, if you look at the
delay_*_checks code, it automatically reinstates a problematic
path if there are no other paths to use. This seems like a good idea as
well.

Also, your code increments path_failures every time the checker fails.
This means that if a device is down for a while, when it comes back up,
it will get delayed.  I'm not sure if this is intentional, or if you
were trying to track the number of times the path was restored and then
failed again, instead of the total time a path was failed for.

Perhaps it would be easier to show the kind of changes I would make with
a patch.  What do you think about this? I haven't done much testing on
it at all, but these are the changes I would make.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 libmultipath/config.c |   3 +
 libmultipath/dict.c   |   2 +-
 multipathd/main.c     | 149 +++++++++++++++++++++++---------------------------
 3 files changed, 72 insertions(+), 82 deletions(-)

diff --git a/libmultipath/config.c b/libmultipath/config.c
index be384af..5837dc6 100644
--- a/libmultipath/config.c
+++ b/libmultipath/config.c
@@ -624,6 +624,9 @@ load_config (char * file)
 	conf->disable_changed_wwids = DEFAULT_DISABLE_CHANGED_WWIDS;
 	conf->remove_retries = 0;
 	conf->max_sectors_kb = DEFAULT_MAX_SECTORS_KB;
+	conf->san_path_err_threshold = DEFAULT_ERR_CHECKS;
+	conf->san_path_err_forget_rate = DEFAULT_ERR_CHECKS;
+	conf->san_path_err_recovery_time = DEFAULT_ERR_CHECKS;
 
 	/*
 	 * preload default hwtable
diff --git a/libmultipath/dict.c b/libmultipath/dict.c
index 4754572..ae94c88 100644
--- a/libmultipath/dict.c
+++ b/libmultipath/dict.c
@@ -1050,7 +1050,7 @@ print_off_int_undef(char * buff, int len, void *ptr)
 	case NU_UNDEF:
 		return 0;
 	case NU_NO:
-		return snprintf(buff, len, "\"off\"");
+		return snprintf(buff, len, "\"no\"");
 	default:
 		return snprintf(buff, len, "%i", *int_ptr);
 	}
diff --git a/multipathd/main.c b/multipathd/main.c
index d6d68a4..305e236 100644
--- a/multipathd/main.c
+++ b/multipathd/main.c
@@ -1488,69 +1488,70 @@ void repair_path(struct path * pp)
 }
 
 static int check_path_reinstate_state(struct path * pp) {
-	struct timespec start_time;
-	int disable_reinstate = 1;
-
-	if (!((pp->mpp->san_path_err_threshold > 0) && 
-				(pp->mpp->san_path_err_forget_rate > 0) &&
-				(pp->mpp->san_path_err_recovery_time >0))) {
-		return disable_reinstate;
-	}
-
-	if (clock_gettime(CLOCK_MONOTONIC, &start_time) != 0) {
-		return disable_reinstate;	
+	struct timespec curr_time;
+
+	if (pp->disable_reinstate) {
+		/* If we don't know how much time has passed, automatically
+		 * reinstate the path, just to be safe. Also, if there are
+		 * no other usable paths, reinstate the path */
+		if (clock_gettime(CLOCK_MONOTONIC, &curr_time) != 0 ||
+		    pp->mpp->nr_active == 0) {
+			condlog(2, "%s : reinstating path early", pp->dev);
+			goto reinstate_path;
+		}
+		if ((curr_time.tv_sec - pp->dis_reinstate_time ) > pp->mpp->san_path_err_recovery_time) {
+			condlog(2,"%s : reinstate the path after err recovery time", pp->dev);
+			goto reinstate_path;
+		}
+		return 1;
 	}
 
-	if ((start_time.tv_sec - pp->dis_reinstate_time ) > pp->mpp->san_path_err_recovery_time) {
-		disable_reinstate =0;
-		pp->path_failures = 0;
-		pp->disable_reinstate = 0;
-		pp->san_path_err_forget_rate = pp->mpp->san_path_err_forget_rate;
-		condlog(3,"\npath %s :reinstate the path after err recovery time\n",pp->dev);
+	/* forget errors on a working path */
+	if ((pp->state == PATH_UP || pp->state == PATH_GHOST) &&
+	    pp->path_failures > 0) {
+		if (pp->san_path_err_forget_rate > 0)
+			pp->san_path_err_forget_rate--;
+		else {
+			/* for every san_path_err_forget_rate number of 
+			 * successful path checks decrement path_failures by 1
+			 */
+			pp->path_failures--;
+			pp->san_path_err_forget_rate = pp->mpp->san_path_err_forget_rate;
+		}
+		return 0;
 	}
-	return  disable_reinstate;
-}
 
-static int check_path_validity_err (struct path * pp) {
-	struct timespec start_time;
-	int disable_reinstate = 0;
+	/* If the path isn't recovering from a failed state, do nothing */
+	if (pp->state != PATH_DOWN && pp->state != PATH_SHAKY &&
+	    pp->state != PATH_TIMEOUT)
+		return 0;
 
-	if (!((pp->mpp->san_path_err_threshold > 0) && 
-				(pp->mpp->san_path_err_forget_rate > 0) &&
-				(pp->mpp->san_path_err_recovery_time >0))) {
-		return disable_reinstate;
-	}
+	if (pp->path_failures == 0)
+		 pp->san_path_err_forget_rate = pp->mpp->san_path_err_forget_rate;
+	pp->path_failures++;
 
-	if (clock_gettime(CLOCK_MONOTONIC, &start_time) != 0) {
-		return disable_reinstate;	
-	}
-	if (!pp->disable_reinstate) {
-		if (pp->path_failures) {
-			/*if the error threshold has hit hit within the san_path_err_forget_rate
-			 *cycles donot reinstante the path till the san_path_err_recovery_time
-			 *place the path in failed state till san_path_err_recovery_time so that the
-			 *cutomer can rectify the issue within this time .Once the completion of
-			 *san_path_err_recovery_time it should automatically reinstantate the path
-			 */
-			if ((pp->path_failures > pp->mpp->san_path_err_threshold) &&
-					(pp->san_path_err_forget_rate > 0)) {
-				printf("\n%s:%d: %s hit error threshold \n",__func__,__LINE__,pp->dev);
-				pp->dis_reinstate_time = start_time.tv_sec ;
-				pp->disable_reinstate = 1;
-				disable_reinstate = 1;
-			} else if ((pp->san_path_err_forget_rate > 0)) {
-				pp->san_path_err_forget_rate--;
-			} else {
-				/*for every san_path_err_forget_rate number
-				 *of successful path checks decrement path_failures by 1
-				 */
-				pp->path_failures --;
-				pp->san_path_err_forget_rate = pp->mpp->san_path_err_forget_rate;
-			}
-		}
+	/* if we don't know the current time, we don't know how long to
+	 * delay the path, so there's no point in checking if we should */
+	if (clock_gettime(CLOCK_MONOTONIC, &curr_time) != 0)
+		return 0;
+	/* when path failures have exceeded the san_path_err_threshold
+	 * place the path in delayed state till san_path_err_recovery_time
+	 * so that the customer can rectify the issue within this time. After
+	 * the completion of san_path_err_recovery_time it should
+	 * automatically reinstate the path */
+	if (pp->path_failures > pp->mpp->san_path_err_threshold) {
+		condlog(2, "%s : hit error threshold. Delaying path reinstatement", pp->dev);
+		pp->dis_reinstate_time = curr_time.tv_sec;
+		pp->disable_reinstate = 1;
+		return 1;
 	}
-	return  disable_reinstate;
+	return 0;
+reinstate_path:
+	pp->path_failures = 0;
+	pp->disable_reinstate = 0;
+	return 0;
 }
+
 /*
  * Returns '1' if the path has been checked, '-1' if it was blacklisted
  * and '0' otherwise
@@ -1566,7 +1567,7 @@ check_path (struct vectors * vecs, struct path * pp, int ticks)
 	int oldchkrstate = pp->chkrstate;
 	int retrigger_tries, checkint;
 	struct config *conf;
-	int ret;	
+	int ret;
 
 	if ((pp->initialized == INIT_OK ||
 	     pp->initialized == INIT_REQUESTED_UDEV) && !pp->mpp)
@@ -1664,16 +1665,15 @@ check_path (struct vectors * vecs, struct path * pp, int ticks)
 	if (!pp->mpp)
 		return 0;
 
+	/* We only need to check if the path should be delayed when the
+	 * path is actually usable and san_path_err is configured */
 	if ((newstate == PATH_UP || newstate == PATH_GHOST) &&
-	     pp->disable_reinstate) {
-		/*
-		 * check if the path is in failed state for more than san_path_err_recovery_time
-		 * if not place the path in delayed state
-		 */
-		if (check_path_reinstate_state(pp)) {
-			pp->state = PATH_DELAYED;
-			return 1;
-		}
+	    pp->mpp->san_path_err_threshold > 0 &&
+	    pp->mpp->san_path_err_forget_rate > 0 &&
+	    pp->mpp->san_path_err_recovery_time > 0 &&
+	    check_path_reinstate_state(pp)) {
+		pp->state = PATH_DELAYED;
+		return 1;
 	}
 	
 	if ((newstate == PATH_UP || newstate == PATH_GHOST) &&
@@ -1685,31 +1685,18 @@ check_path (struct vectors * vecs, struct path * pp, int ticks)
 		} else
 			pp->wait_checks = 0;
 	}
-	if ((newstate == PATH_DOWN || newstate == PATH_GHOST ||
-		pp->state == PATH_DOWN)) {
-		/*assigned  the path_err_forget_rate when we see the first failure on the path*/
-		if(pp->path_failures == 0){
-			pp->san_path_err_forget_rate = pp->mpp->san_path_err_forget_rate;
-		}
-		pp->path_failures++;
-	}
+
 	/*
 	 * don't reinstate failed path, if its in stand-by
 	 * and if target supports only implicit tpgs mode.
 	 * this will prevent unnecessary i/o by dm on stand-by
 	 * paths if there are no other active paths in map.
-	 *
-	 * when path failures has exceeded the san_path_err_threshold 
-	 * within san_path_err_forget_rate then we don't reinstate
-	 * failed path for san_path_err_recovery_time
 	 */
-	disable_reinstate = ((newstate == PATH_GHOST &&
+	disable_reinstate = (newstate == PATH_GHOST &&
 			    pp->mpp->nr_active == 0 &&
-			    pp->tpgs == TPGS_IMPLICIT) ? 1 :
-			    check_path_validity_err(pp));
+			    pp->tpgs == TPGS_IMPLICIT) ? 1 : 0;
 
 	pp->chkrstate = newstate;
-
 	if (newstate != pp->state) {
 		int oldstate = pp->state;
 		pp->state = newstate;
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: deterministic io throughput in multipath
  2017-02-02  1:50                     ` Benjamin Marzinski
@ 2017-02-02 11:48                       ` Muneendra Kumar M
  2017-02-02 17:39                         ` Benjamin Marzinski
  0 siblings, 1 reply; 21+ messages in thread
From: Muneendra Kumar M @ 2017-02-02 11:48 UTC (permalink / raw)
  To: Benjamin Marzinski; +Cc: device-mapper development

[-- Attachment #1: Type: text/plain, Size: 11218 bytes --]

Hi Ben,
The changes you suggested below are good, thank you.
I have taken your changes and made a few modifications to get the functionality working.
I have tested this on our setup and it works fine.

We need to increment path_failures every time the checker fails.
If a device is down for a while, when it comes back up it will be delayed only if its path failures exceed the error threshold.
Whether the checker fails or the kernel identifies the failures, we need to capture those, as they indicate the state of the path and the target.
The code below already takes care of this.

Could you please review the attached patch and give us your comments.

Below are the files that has been changed .

libmultipath/config.c      |  6 ++++++
 libmultipath/config.h      |  9 +++++++++
 libmultipath/configure.c   |  3 +++
 libmultipath/defaults.h    |  3 ++-
 libmultipath/dict.c        | 86 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-------------------------
 libmultipath/dict.h        |  3 +--
 libmultipath/propsel.c     | 48 ++++++++++++++++++++++++++++++++++++++++++++++--
 libmultipath/propsel.h     |  3 +++
 libmultipath/structs.h     | 14 ++++++++++----
 multipath/multipath.conf.5 | 57 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 multipathd/main.c          | 83 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 11 files changed, 281 insertions(+), 34 deletions(-)

Regards,
Muneendra.




-----Original Message-----
From: Benjamin Marzinski [mailto:bmarzins@redhat.com] 
Sent: Thursday, February 02, 2017 7:20 AM
To: Muneendra Kumar M <mmandala@Brocade.com>
Cc: device-mapper development <dm-devel@redhat.com>
Subject: RE: [dm-devel] deterministic io throughput in multipath

This is certainly moving in the right direction.  There are a couple of things I would change. check_path_reinstate_state() will automatically disable the path if there are configuration problems. If things aren't configured correctly, or the code can't get the current time, it seems like it should allow the path to get reinstated, to avoid keeping a perfectly good path down indefinitely.  Also, if you look at the delay_*_checks code, it automatically reinstates a problematic path if there are no other paths to use. This seems like a good idea as well.

Also, your code increments path_failures every time the checker fails.
This means that if a device is down for a while, when it comes back up, it will get delayed.  I'm not sure if this is intentional, or if you were trying to track the number of times the path was restored and then failed again, instead of the total time a path was failed for.

Perhaps it would be easier to show the kind of changes I would make with a patch.  What do you think about this? I haven't done much testing on it at all, but these are the changes I would make.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 libmultipath/config.c |   3 +
 libmultipath/dict.c   |   2 +-
 multipathd/main.c     | 149 +++++++++++++++++++++++---------------------------
 3 files changed, 72 insertions(+), 82 deletions(-)

diff --git a/libmultipath/config.c b/libmultipath/config.c index be384af..5837dc6 100644
--- a/libmultipath/config.c
+++ b/libmultipath/config.c
@@ -624,6 +624,9 @@ load_config (char * file)
 	conf->disable_changed_wwids = DEFAULT_DISABLE_CHANGED_WWIDS;
 	conf->remove_retries = 0;
 	conf->max_sectors_kb = DEFAULT_MAX_SECTORS_KB;
+	conf->san_path_err_threshold = DEFAULT_ERR_CHECKS;
+	conf->san_path_err_forget_rate = DEFAULT_ERR_CHECKS;
+	conf->san_path_err_recovery_time = DEFAULT_ERR_CHECKS;
 
 	/*
 	 * preload default hwtable
diff --git a/libmultipath/dict.c b/libmultipath/dict.c index 4754572..ae94c88 100644
--- a/libmultipath/dict.c
+++ b/libmultipath/dict.c
@@ -1050,7 +1050,7 @@ print_off_int_undef(char * buff, int len, void *ptr)
 	case NU_UNDEF:
 		return 0;
 	case NU_NO:
-		return snprintf(buff, len, "\"off\"");
+		return snprintf(buff, len, "\"no\"");
 	default:
 		return snprintf(buff, len, "%i", *int_ptr);
 	}
diff --git a/multipathd/main.c b/multipathd/main.c index d6d68a4..305e236 100644
--- a/multipathd/main.c
+++ b/multipathd/main.c
@@ -1488,69 +1488,70 @@ void repair_path(struct path * pp)  }
 
 static int check_path_reinstate_state(struct path * pp) {
-	struct timespec start_time;
-	int disable_reinstate = 1;
-
-	if (!((pp->mpp->san_path_err_threshold > 0) && 
-				(pp->mpp->san_path_err_forget_rate > 0) &&
-				(pp->mpp->san_path_err_recovery_time >0))) {
-		return disable_reinstate;
-	}
-
-	if (clock_gettime(CLOCK_MONOTONIC, &start_time) != 0) {
-		return disable_reinstate;	
+	struct timespec curr_time;
+
+	if (pp->disable_reinstate) {
+		/* If we don't know how much time has passed, automatically
+		 * reinstate the path, just to be safe. Also, if there are
+		 * no other usable paths, reinstate the path */
+		if (clock_gettime(CLOCK_MONOTONIC, &curr_time) != 0 ||
+		    pp->mpp->nr_active == 0) {
+			condlog(2, "%s : reinstating path early", pp->dev);
+			goto reinstate_path;
+		}
+		if ((curr_time.tv_sec - pp->dis_reinstate_time ) > pp->mpp->san_path_err_recovery_time) {
+			condlog(2,"%s : reinstate the path after err recovery time", pp->dev);
+			goto reinstate_path;
+		}
+		return 1;
 	}
 
-	if ((start_time.tv_sec - pp->dis_reinstate_time ) > pp->mpp->san_path_err_recovery_time) {
-		disable_reinstate =0;
-		pp->path_failures = 0;
-		pp->disable_reinstate = 0;
-		pp->san_path_err_forget_rate = pp->mpp->san_path_err_forget_rate;
-		condlog(3,"\npath %s :reinstate the path after err recovery time\n",pp->dev);
+	/* forget errors on a working path */
+	if ((pp->state == PATH_UP || pp->state == PATH_GHOST) &&
+	    pp->path_failures > 0) {
+		if (pp->san_path_err_forget_rate > 0)
+			pp->san_path_err_forget_rate--;
+		else {
+			/* for every san_path_err_forget_rate number of 
+			 * successful path checks decrement path_failures by 1
+			 */
+			pp->path_failures--;
+			pp->san_path_err_forget_rate = pp->mpp->san_path_err_forget_rate;
+		}
+		return 0;
 	}
-	return  disable_reinstate;
-}
 
-static int check_path_validity_err (struct path * pp) {
-	struct timespec start_time;
-	int disable_reinstate = 0;
+	/* If the path isn't recovering from a failed state, do nothing */
+	if (pp->state != PATH_DOWN && pp->state != PATH_SHAKY &&
+	    pp->state != PATH_TIMEOUT)
+		return 0;
 
-	if (!((pp->mpp->san_path_err_threshold > 0) && 
-				(pp->mpp->san_path_err_forget_rate > 0) &&
-				(pp->mpp->san_path_err_recovery_time >0))) {
-		return disable_reinstate;
-	}
+	if (pp->path_failures == 0)
+		 pp->san_path_err_forget_rate = pp->mpp->san_path_err_forget_rate;
+	pp->path_failures++;
 
-	if (clock_gettime(CLOCK_MONOTONIC, &start_time) != 0) {
-		return disable_reinstate;	
-	}
-	if (!pp->disable_reinstate) {
-		if (pp->path_failures) {
-			/*if the error threshold has hit hit within the san_path_err_forget_rate
-			 *cycles donot reinstante the path till the san_path_err_recovery_time
-			 *place the path in failed state till san_path_err_recovery_time so that the
-			 *cutomer can rectify the issue within this time .Once the completion of
-			 *san_path_err_recovery_time it should automatically reinstantate the path
-			 */
-			if ((pp->path_failures > pp->mpp->san_path_err_threshold) &&
-					(pp->san_path_err_forget_rate > 0)) {
-				printf("\n%s:%d: %s hit error threshold \n",__func__,__LINE__,pp->dev);
-				pp->dis_reinstate_time = start_time.tv_sec ;
-				pp->disable_reinstate = 1;
-				disable_reinstate = 1;
-			} else if ((pp->san_path_err_forget_rate > 0)) {
-				pp->san_path_err_forget_rate--;
-			} else {
-				/*for every san_path_err_forget_rate number
-				 *of successful path checks decrement path_failures by 1
-				 */
-				pp->path_failures --;
-				pp->san_path_err_forget_rate = pp->mpp->san_path_err_forget_rate;
-			}
-		}
+	/* if we don't know the current time, we don't know how long to
+	 * delay the path, so there's no point in checking if we should */
+	if (clock_gettime(CLOCK_MONOTONIC, &curr_time) != 0)
+		return 0;
+	/* when path failures have exceeded the san_path_err_threshold
+	 * place the path in delayed state till san_path_err_recovery_time
+	 * so that the customer can rectify the issue within this time. After
+	 * the completion of san_path_err_recovery_time it should
+	 * automatically reinstate the path */
+	if (pp->path_failures > pp->mpp->san_path_err_threshold) {
+		condlog(2, "%s : hit error threshold. Delaying path reinstatement", pp->dev);
+		pp->dis_reinstate_time = curr_time.tv_sec;
+		pp->disable_reinstate = 1;
+		return 1;
 	}
-	return  disable_reinstate;
+	return 0;
+reinstate_path:
+	pp->path_failures = 0;
+	pp->disable_reinstate = 0;
+	return 0;
 }
+
 /*
  * Returns '1' if the path has been checked, '-1' if it was blacklisted
  * and '0' otherwise
@@ -1566,7 +1567,7 @@ check_path (struct vectors * vecs, struct path * pp, int ticks)
 	int oldchkrstate = pp->chkrstate;
 	int retrigger_tries, checkint;
 	struct config *conf;
-	int ret;	
+	int ret;
 
 	if ((pp->initialized == INIT_OK ||
 	     pp->initialized == INIT_REQUESTED_UDEV) && !pp->mpp) @@ -1664,16 +1665,15 @@ check_path (struct vectors * vecs, struct path * pp, int ticks)
 	if (!pp->mpp)
 		return 0;
 
+	/* We only need to check if the path should be delayed when the
+	 * path is actually usable and san_path_err is configured */
 	if ((newstate == PATH_UP || newstate == PATH_GHOST) &&
-	     pp->disable_reinstate) {
-		/*
-		 * check if the path is in failed state for more than san_path_err_recovery_time
-		 * if not place the path in delayed state
-		 */
-		if (check_path_reinstate_state(pp)) {
-			pp->state = PATH_DELAYED;
-			return 1;
-		}
+	    pp->mpp->san_path_err_threshold > 0 &&
+	    pp->mpp->san_path_err_forget_rate > 0 &&
+	    pp->mpp->san_path_err_recovery_time > 0 &&
+	    check_path_reinstate_state(pp)) {
+		pp->state = PATH_DELAYED;
+		return 1;
 	}
 	
 	if ((newstate == PATH_UP || newstate == PATH_GHOST) && @@ -1685,31 +1685,18 @@ check_path (struct vectors * vecs, struct path * pp, int ticks)
 		} else
 			pp->wait_checks = 0;
 	}
-	if ((newstate == PATH_DOWN || newstate == PATH_GHOST ||
-		pp->state == PATH_DOWN)) {
-		/*assigned  the path_err_forget_rate when we see the first failure on the path*/
-		if(pp->path_failures == 0){
-			pp->san_path_err_forget_rate = pp->mpp->san_path_err_forget_rate;
-		}
-		pp->path_failures++;
-	}
+
 	/*
 	 * don't reinstate failed path, if its in stand-by
 	 * and if target supports only implicit tpgs mode.
 	 * this will prevent unnecessary i/o by dm on stand-by
 	 * paths if there are no other active paths in map.
-	 *
-	 * when path failures has exceeded the san_path_err_threshold 
-	 * within san_path_err_forget_rate then we don't reinstate
-	 * failed path for san_path_err_recovery_time
 	 */
-	disable_reinstate = ((newstate == PATH_GHOST &&
+	disable_reinstate = (newstate == PATH_GHOST &&
 			    pp->mpp->nr_active == 0 &&
-			    pp->tpgs == TPGS_IMPLICIT) ? 1 :
-			    check_path_validity_err(pp));
+			    pp->tpgs == TPGS_IMPLICIT) ? 1 : 0;
 
 	pp->chkrstate = newstate;
-
 	if (newstate != pp->state) {
 		int oldstate = pp->state;
 		pp->state = newstate;
--
1.8.3.1


[-- Attachment #2: san_path_error.patch --]
[-- Type: application/octet-stream, Size: 22168 bytes --]

diff --git a/libmultipath/config.c b/libmultipath/config.c
index 15ddbd8..5837dc6 100644
--- a/libmultipath/config.c
+++ b/libmultipath/config.c
@@ -348,6 +348,9 @@ merge_hwe (struct hwentry * dst, struct hwentry * src)
 	merge_num(delay_wait_checks);
 	merge_num(skip_kpartx);
 	merge_num(max_sectors_kb);
+	merge_num(san_path_err_threshold);
+	merge_num(san_path_err_forget_rate);
+	merge_num(san_path_err_recovery_time);
 
 	/*
 	 * Make sure features is consistent with
@@ -621,6 +624,9 @@ load_config (char * file)
 	conf->disable_changed_wwids = DEFAULT_DISABLE_CHANGED_WWIDS;
 	conf->remove_retries = 0;
 	conf->max_sectors_kb = DEFAULT_MAX_SECTORS_KB;
+	conf->san_path_err_threshold = DEFAULT_ERR_CHECKS;
+	conf->san_path_err_forget_rate = DEFAULT_ERR_CHECKS;
+	conf->san_path_err_recovery_time = DEFAULT_ERR_CHECKS;
 
 	/*
 	 * preload default hwtable
diff --git a/libmultipath/config.h b/libmultipath/config.h
index 9670020..9e47894 100644
--- a/libmultipath/config.h
+++ b/libmultipath/config.h
@@ -65,6 +65,9 @@ struct hwentry {
 	int deferred_remove;
 	int delay_watch_checks;
 	int delay_wait_checks;
+	int san_path_err_threshold;
+	int san_path_err_forget_rate;
+	int san_path_err_recovery_time;
 	int skip_kpartx;
 	int max_sectors_kb;
 	char * bl_product;
@@ -93,6 +96,9 @@ struct mpentry {
 	int deferred_remove;
 	int delay_watch_checks;
 	int delay_wait_checks;
+	int san_path_err_threshold;
+	int san_path_err_forget_rate;
+	int san_path_err_recovery_time;
 	int skip_kpartx;
 	int max_sectors_kb;
 	uid_t uid;
@@ -138,6 +144,9 @@ struct config {
 	int processed_main_config;
 	int delay_watch_checks;
 	int delay_wait_checks;
+	int san_path_err_threshold;
+	int san_path_err_forget_rate;
+	int san_path_err_recovery_time;
 	int uxsock_timeout;
 	int strict_timing;
 	int retrigger_tries;
diff --git a/libmultipath/configure.c b/libmultipath/configure.c
index a0fcad9..5ad3007 100644
--- a/libmultipath/configure.c
+++ b/libmultipath/configure.c
@@ -294,6 +294,9 @@ int setup_map(struct multipath *mpp, char *params, int params_size)
 	select_deferred_remove(conf, mpp);
 	select_delay_watch_checks(conf, mpp);
 	select_delay_wait_checks(conf, mpp);
+	select_san_path_err_threshold(conf, mpp);
+	select_san_path_err_forget_rate(conf, mpp);
+	select_san_path_err_recovery_time(conf, mpp);
 	select_skip_kpartx(conf, mpp);
 	select_max_sectors_kb(conf, mpp);
 
diff --git a/libmultipath/defaults.h b/libmultipath/defaults.h
index b9b0a37..3ef1579 100644
--- a/libmultipath/defaults.h
+++ b/libmultipath/defaults.h
@@ -23,7 +23,8 @@
 #define DEFAULT_RETAIN_HWHANDLER RETAIN_HWHANDLER_ON
 #define DEFAULT_DETECT_PRIO	DETECT_PRIO_ON
 #define DEFAULT_DEFERRED_REMOVE	DEFERRED_REMOVE_OFF
-#define DEFAULT_DELAY_CHECKS	DELAY_CHECKS_OFF
+#define DEFAULT_DELAY_CHECKS	NU_NO
+#define DEFAULT_ERR_CHECKS	NU_NO
 #define DEFAULT_UEVENT_STACKSIZE 256
 #define DEFAULT_RETRIGGER_DELAY	10
 #define DEFAULT_RETRIGGER_TRIES	3
diff --git a/libmultipath/dict.c b/libmultipath/dict.c
index dc21846..ae94c88 100644
--- a/libmultipath/dict.c
+++ b/libmultipath/dict.c
@@ -1023,7 +1023,7 @@ declare_mp_handler(reservation_key, set_reservation_key)
 declare_mp_snprint(reservation_key, print_reservation_key)
 
 static int
-set_delay_checks(vector strvec, void *ptr)
+set_off_int_undef(vector strvec, void *ptr)
 {
 	int *int_ptr = (int *)ptr;
 	char * buff;
@@ -1033,47 +1033,69 @@ set_delay_checks(vector strvec, void *ptr)
 		return 1;
 
 	if (!strcmp(buff, "no") || !strcmp(buff, "0"))
-		*int_ptr = DELAY_CHECKS_OFF;
+		*int_ptr = NU_NO;
 	else if ((*int_ptr = atoi(buff)) < 1)
-		*int_ptr = DELAY_CHECKS_UNDEF;
+		*int_ptr = NU_UNDEF;
 
 	FREE(buff);
 	return 0;
 }
 
 int
-print_delay_checks(char * buff, int len, void *ptr)
+print_off_int_undef(char * buff, int len, void *ptr)
 {
 	int *int_ptr = (int *)ptr;
 
 	switch(*int_ptr) {
-	case DELAY_CHECKS_UNDEF:
+	case NU_UNDEF:
 		return 0;
-	case DELAY_CHECKS_OFF:
-		return snprintf(buff, len, "\"off\"");
+	case NU_NO:
+		return snprintf(buff, len, "\"no\"");
 	default:
 		return snprintf(buff, len, "%i", *int_ptr);
 	}
 }
 
-declare_def_handler(delay_watch_checks, set_delay_checks)
-declare_def_snprint(delay_watch_checks, print_delay_checks)
-declare_ovr_handler(delay_watch_checks, set_delay_checks)
-declare_ovr_snprint(delay_watch_checks, print_delay_checks)
-declare_hw_handler(delay_watch_checks, set_delay_checks)
-declare_hw_snprint(delay_watch_checks, print_delay_checks)
-declare_mp_handler(delay_watch_checks, set_delay_checks)
-declare_mp_snprint(delay_watch_checks, print_delay_checks)
-
-declare_def_handler(delay_wait_checks, set_delay_checks)
-declare_def_snprint(delay_wait_checks, print_delay_checks)
-declare_ovr_handler(delay_wait_checks, set_delay_checks)
-declare_ovr_snprint(delay_wait_checks, print_delay_checks)
-declare_hw_handler(delay_wait_checks, set_delay_checks)
-declare_hw_snprint(delay_wait_checks, print_delay_checks)
-declare_mp_handler(delay_wait_checks, set_delay_checks)
-declare_mp_snprint(delay_wait_checks, print_delay_checks)
-
+declare_def_handler(delay_watch_checks, set_off_int_undef)
+declare_def_snprint(delay_watch_checks, print_off_int_undef)
+declare_ovr_handler(delay_watch_checks, set_off_int_undef)
+declare_ovr_snprint(delay_watch_checks, print_off_int_undef)
+declare_hw_handler(delay_watch_checks, set_off_int_undef)
+declare_hw_snprint(delay_watch_checks, print_off_int_undef)
+declare_mp_handler(delay_watch_checks, set_off_int_undef)
+declare_mp_snprint(delay_watch_checks, print_off_int_undef)
+declare_def_handler(delay_wait_checks, set_off_int_undef)
+declare_def_snprint(delay_wait_checks, print_off_int_undef)
+declare_ovr_handler(delay_wait_checks, set_off_int_undef)
+declare_ovr_snprint(delay_wait_checks, print_off_int_undef)
+declare_hw_handler(delay_wait_checks, set_off_int_undef)
+declare_hw_snprint(delay_wait_checks, print_off_int_undef)
+declare_mp_handler(delay_wait_checks, set_off_int_undef)
+declare_mp_snprint(delay_wait_checks, print_off_int_undef)
+declare_def_handler(san_path_err_threshold, set_off_int_undef)
+declare_def_snprint(san_path_err_threshold, print_off_int_undef)
+declare_ovr_handler(san_path_err_threshold, set_off_int_undef)
+declare_ovr_snprint(san_path_err_threshold, print_off_int_undef)
+declare_hw_handler(san_path_err_threshold, set_off_int_undef)
+declare_hw_snprint(san_path_err_threshold, print_off_int_undef)
+declare_mp_handler(san_path_err_threshold, set_off_int_undef)
+declare_mp_snprint(san_path_err_threshold, print_off_int_undef)
+declare_def_handler(san_path_err_forget_rate, set_off_int_undef)
+declare_def_snprint(san_path_err_forget_rate, print_off_int_undef)
+declare_ovr_handler(san_path_err_forget_rate, set_off_int_undef)
+declare_ovr_snprint(san_path_err_forget_rate, print_off_int_undef)
+declare_hw_handler(san_path_err_forget_rate, set_off_int_undef)
+declare_hw_snprint(san_path_err_forget_rate, print_off_int_undef)
+declare_mp_handler(san_path_err_forget_rate, set_off_int_undef)
+declare_mp_snprint(san_path_err_forget_rate, print_off_int_undef)
+declare_def_handler(san_path_err_recovery_time, set_off_int_undef)
+declare_def_snprint(san_path_err_recovery_time, print_off_int_undef)
+declare_ovr_handler(san_path_err_recovery_time, set_off_int_undef)
+declare_ovr_snprint(san_path_err_recovery_time, print_off_int_undef)
+declare_hw_handler(san_path_err_recovery_time, set_off_int_undef)
+declare_hw_snprint(san_path_err_recovery_time, print_off_int_undef)
+declare_mp_handler(san_path_err_recovery_time, set_off_int_undef)
+declare_mp_snprint(san_path_err_recovery_time, print_off_int_undef)
 static int
 def_uxsock_timeout_handler(struct config *conf, vector strvec)
 {
@@ -1404,6 +1426,10 @@ init_keywords(vector keywords)
 	install_keyword("config_dir", &def_config_dir_handler, &snprint_def_config_dir);
 	install_keyword("delay_watch_checks", &def_delay_watch_checks_handler, &snprint_def_delay_watch_checks);
 	install_keyword("delay_wait_checks", &def_delay_wait_checks_handler, &snprint_def_delay_wait_checks);
+        install_keyword("san_path_err_threshold", &def_san_path_err_threshold_handler, &snprint_def_san_path_err_threshold);
+        install_keyword("san_path_err_forget_rate", &def_san_path_err_forget_rate_handler, &snprint_def_san_path_err_forget_rate);
+        install_keyword("san_path_err_recovery_time", &def_san_path_err_recovery_time_handler, &snprint_def_san_path_err_recovery_time);
+
 	install_keyword("find_multipaths", &def_find_multipaths_handler, &snprint_def_find_multipaths);
 	install_keyword("uxsock_timeout", &def_uxsock_timeout_handler, &snprint_def_uxsock_timeout);
 	install_keyword("retrigger_tries", &def_retrigger_tries_handler, &snprint_def_retrigger_tries);
@@ -1486,6 +1512,9 @@ init_keywords(vector keywords)
 	install_keyword("deferred_remove", &hw_deferred_remove_handler, &snprint_hw_deferred_remove);
 	install_keyword("delay_watch_checks", &hw_delay_watch_checks_handler, &snprint_hw_delay_watch_checks);
 	install_keyword("delay_wait_checks", &hw_delay_wait_checks_handler, &snprint_hw_delay_wait_checks);
+        install_keyword("san_path_err_threshold", &hw_san_path_err_threshold_handler, &snprint_hw_san_path_err_threshold);
+        install_keyword("san_path_err_forget_rate", &hw_san_path_err_forget_rate_handler, &snprint_hw_san_path_err_forget_rate);
+        install_keyword("san_path_err_recovery_time", &hw_san_path_err_recovery_time_handler, &snprint_hw_san_path_err_recovery_time);
 	install_keyword("skip_kpartx", &hw_skip_kpartx_handler, &snprint_hw_skip_kpartx);
 	install_keyword("max_sectors_kb", &hw_max_sectors_kb_handler, &snprint_hw_max_sectors_kb);
 	install_sublevel_end();
@@ -1515,6 +1544,10 @@ init_keywords(vector keywords)
 	install_keyword("deferred_remove", &ovr_deferred_remove_handler, &snprint_ovr_deferred_remove);
 	install_keyword("delay_watch_checks", &ovr_delay_watch_checks_handler, &snprint_ovr_delay_watch_checks);
 	install_keyword("delay_wait_checks", &ovr_delay_wait_checks_handler, &snprint_ovr_delay_wait_checks);
+        install_keyword("san_path_err_threshold", &ovr_san_path_err_threshold_handler, &snprint_ovr_san_path_err_threshold);
+        install_keyword("san_path_err_forget_rate", &ovr_san_path_err_forget_rate_handler, &snprint_ovr_san_path_err_forget_rate);
+        install_keyword("san_path_err_recovery_time", &ovr_san_path_err_recovery_time_handler, &snprint_ovr_san_path_err_recovery_time);
+
 	install_keyword("skip_kpartx", &ovr_skip_kpartx_handler, &snprint_ovr_skip_kpartx);
 	install_keyword("max_sectors_kb", &ovr_max_sectors_kb_handler, &snprint_ovr_max_sectors_kb);
 
@@ -1543,6 +1576,9 @@ init_keywords(vector keywords)
 	install_keyword("deferred_remove", &mp_deferred_remove_handler, &snprint_mp_deferred_remove);
 	install_keyword("delay_watch_checks", &mp_delay_watch_checks_handler, &snprint_mp_delay_watch_checks);
 	install_keyword("delay_wait_checks", &mp_delay_wait_checks_handler, &snprint_mp_delay_wait_checks);
+	install_keyword("san_path_err_threshold", &mp_san_path_err_threshold_handler, &snprint_mp_san_path_err_threshold);
+	install_keyword("san_path_err_forget_rate", &mp_san_path_err_forget_rate_handler, &snprint_mp_san_path_err_forget_rate);
+	install_keyword("san_path_err_recovery_time", &mp_san_path_err_recovery_time_handler, &snprint_mp_san_path_err_recovery_time);
 	install_keyword("skip_kpartx", &mp_skip_kpartx_handler, &snprint_mp_skip_kpartx);
 	install_keyword("max_sectors_kb", &mp_max_sectors_kb_handler, &snprint_mp_max_sectors_kb);
 	install_sublevel_end();
diff --git a/libmultipath/dict.h b/libmultipath/dict.h
index 4cd03c5..2d6097d 100644
--- a/libmultipath/dict.h
+++ b/libmultipath/dict.h
@@ -14,6 +14,5 @@ int print_no_path_retry(char * buff, int len, void *ptr);
 int print_fast_io_fail(char * buff, int len, void *ptr);
 int print_dev_loss(char * buff, int len, void *ptr);
 int print_reservation_key(char * buff, int len, void * ptr);
-int print_delay_checks(char * buff, int len, void *ptr);
-
+int print_off_int_undef(char * buff, int len, void *ptr);
 #endif /* _DICT_H */
diff --git a/libmultipath/propsel.c b/libmultipath/propsel.c
index c0bc616..e4afef7 100644
--- a/libmultipath/propsel.c
+++ b/libmultipath/propsel.c
@@ -623,7 +623,7 @@ int select_delay_watch_checks(struct config *conf, struct multipath *mp)
 	mp_set_conf(delay_watch_checks);
 	mp_set_default(delay_watch_checks, DEFAULT_DELAY_CHECKS);
 out:
-	print_delay_checks(buff, 12, &mp->delay_watch_checks);
+	print_off_int_undef(buff, 12, &mp->delay_watch_checks);
 	condlog(3, "%s: delay_watch_checks = %s %s", mp->alias, buff, origin);
 	return 0;
 }
@@ -638,12 +638,56 @@ int select_delay_wait_checks(struct config *conf, struct multipath *mp)
 	mp_set_conf(delay_wait_checks);
 	mp_set_default(delay_wait_checks, DEFAULT_DELAY_CHECKS);
 out:
-	print_delay_checks(buff, 12, &mp->delay_wait_checks);
+	print_off_int_undef(buff, 12, &mp->delay_wait_checks);
 	condlog(3, "%s: delay_wait_checks = %s %s", mp->alias, buff, origin);
 	return 0;
 
 }
+int select_san_path_err_threshold(struct config *conf, struct multipath *mp)
+{
+        char *origin, buff[12];
+
+        mp_set_mpe(san_path_err_threshold);
+        mp_set_ovr(san_path_err_threshold);
+        mp_set_hwe(san_path_err_threshold);
+        mp_set_conf(san_path_err_threshold);
+        mp_set_default(san_path_err_threshold, DEFAULT_ERR_CHECKS);
+out:
+        print_off_int_undef(buff, 12, &mp->san_path_err_threshold);
+        condlog(3, "%s: san_path_err_threshold = %s %s", mp->alias, buff, origin);
+        return 0;
+}
+
+int select_san_path_err_forget_rate(struct config *conf, struct multipath *mp)
+{
+        char *origin, buff[12];
+
+        mp_set_mpe(san_path_err_forget_rate);
+        mp_set_ovr(san_path_err_forget_rate);
+        mp_set_hwe(san_path_err_forget_rate);
+        mp_set_conf(san_path_err_forget_rate);
+        mp_set_default(san_path_err_forget_rate, DEFAULT_ERR_CHECKS);
+out:
+        print_off_int_undef(buff, 12, &mp->san_path_err_forget_rate);
+        condlog(3, "%s: san_path_err_forget_rate = %s %s", mp->alias, buff, origin);
+        return 0;
+
+}
+int select_san_path_err_recovery_time(struct config *conf, struct multipath *mp)
+{
+        char *origin, buff[12];
 
+        mp_set_mpe(san_path_err_recovery_time);
+        mp_set_ovr(san_path_err_recovery_time);
+        mp_set_hwe(san_path_err_recovery_time);
+        mp_set_conf(san_path_err_recovery_time);
+        mp_set_default(san_path_err_recovery_time, DEFAULT_ERR_CHECKS);
+out:
+        print_off_int_undef(buff, 12, &mp->san_path_err_recovery_time);
+        condlog(3, "%s: san_path_err_recovery_time = %s %s", mp->alias, buff, origin);
+        return 0;
+
+}
 int select_skip_kpartx (struct config *conf, struct multipath * mp)
 {
 	char *origin;
diff --git a/libmultipath/propsel.h b/libmultipath/propsel.h
index ad98fa5..e5b6f93 100644
--- a/libmultipath/propsel.h
+++ b/libmultipath/propsel.h
@@ -24,3 +24,6 @@ int select_delay_watch_checks (struct config *conf, struct multipath * mp);
 int select_delay_wait_checks (struct config *conf, struct multipath * mp);
 int select_skip_kpartx (struct config *conf, struct multipath * mp);
 int select_max_sectors_kb (struct config *conf, struct multipath * mp);
+int select_san_path_err_forget_rate(struct config *conf, struct multipath *mp);
+int select_san_path_err_threshold(struct config *conf, struct multipath *mp);
+int select_san_path_err_recovery_time(struct config *conf, struct multipath *mp);
diff --git a/libmultipath/structs.h b/libmultipath/structs.h
index 396f69d..6edd927 100644
--- a/libmultipath/structs.h
+++ b/libmultipath/structs.h
@@ -152,9 +152,9 @@ enum scsi_protocol {
 	SCSI_PROTOCOL_UNSPEC = 0xf, /* No specific protocol */
 };
 
-enum delay_checks_states {
-	DELAY_CHECKS_OFF = -1,
-	DELAY_CHECKS_UNDEF = 0,
+enum no_undef_states {
+	NU_NO = -1,
+	NU_UNDEF = 0,
 };
 
 enum initialized_states {
@@ -223,7 +223,10 @@ struct path {
 	int initialized;
 	int retriggers;
 	int wwid_changed;
-
+	unsigned int path_failures;
+	time_t dis_reinstate_time;
+	int disable_reinstate;
+	int san_path_err_forget_rate;
 	/* configlet pointers */
 	struct hwentry * hwe;
 };
@@ -255,6 +258,9 @@ struct multipath {
 	int deferred_remove;
 	int delay_watch_checks;
 	int delay_wait_checks;
+	int san_path_err_threshold;
+	int san_path_err_forget_rate;
+	int san_path_err_recovery_time;
 	int skip_kpartx;
 	int max_sectors_kb;
 	unsigned int dev_loss;
diff --git a/multipath/multipath.conf.5 b/multipath/multipath.conf.5
index 36589f5..3c564ad 100644
--- a/multipath/multipath.conf.5
+++ b/multipath/multipath.conf.5
@@ -751,6 +751,45 @@ The default is: \fB/etc/multipath/conf.d/\fR
 .
 .
 .TP
+.B san_path_err_threshold
+If set to a value greater than 0, multipathd will watch paths and check how many
+times a path has failed due to errors. If the number of failures on a particular
+path is greater than san_path_err_threshold, the path will not be reinstated
+until san_path_err_recovery_time has elapsed. These path failures must occur
+within san_path_err_forget_rate checks; otherwise the path is considered good
+enough to reinstate.
+.RS
+.TP
+The default is: \fBno\fR
+.RE
+.
+.
+.TP
+.B san_path_err_forget_rate
+If set to a value greater than 0, multipathd will check whether the number of
+path failures has exceeded san_path_err_threshold within this many checks,
+i.e. within san_path_err_forget_rate checks. If so, the path is not reinstated
+until san_path_err_recovery_time has elapsed.
+.RS
+.TP
+The default is: \fBno\fR
+.RE
+.
+.
+.TP
+.B san_path_err_recovery_time
+If set to a value greater than 0, then when the number of path failures has
+exceeded san_path_err_threshold within san_path_err_forget_rate checks, the
+path is placed in the failed state for san_path_err_recovery_time. Once
+san_path_err_recovery_time has expired, the failed path is reinstated.
+The san_path_err_recovery_time value is in seconds.
+.RS
+.TP
+The default is: \fBno\fR
+.RE
+.
+.
+.TP
 .B delay_watch_checks
 If set to a value greater than 0, multipathd will watch paths that have
 recently become valid for this many checks. If they fail again while they are
@@ -1015,6 +1054,12 @@ are taken from the \fIdefaults\fR or \fIdevices\fR section:
 .TP
 .B deferred_remove
 .TP
+.B san_path_err_threshold
+.TP
+.B san_path_err_forget_rate
+.TP
+.B san_path_err_recovery_time
+.TP
 .B delay_watch_checks
 .TP
 .B delay_wait_checks
@@ -1128,6 +1173,12 @@ section:
 .TP
 .B deferred_remove
 .TP
+.B san_path_err_threshold
+.TP
+.B san_path_err_forget_rate
+.TP
+.B san_path_err_recovery_time
+.TP
 .B delay_watch_checks
 .TP
 .B delay_wait_checks
@@ -1192,6 +1243,12 @@ the values are taken from the \fIdevices\fR or \fIdefaults\fR sections:
 .TP
 .B deferred_remove
 .TP
+.B san_path_err_threshold
+.TP
+.B san_path_err_forget_rate
+.TP
+.B san_path_err_recovery_time
+.TP
 .B delay_watch_checks
 .TP
 .B delay_wait_checks
diff --git a/multipathd/main.c b/multipathd/main.c
index adc3258..4a1a7ef 100644
--- a/multipathd/main.c
+++ b/multipathd/main.c
@@ -1487,6 +1487,83 @@ void repair_path(struct path * pp)
 	LOG_MSG(1, checker_message(&pp->checker));
 }
 
+static int check_path_reinstate_state(struct path * pp) {
+	struct timespec curr_time;
+	if (!((pp->mpp->san_path_err_threshold > 0) &&
+				(pp->mpp->san_path_err_forget_rate > 0) &&
+				(pp->mpp->san_path_err_recovery_time >0))) {
+		return 0;
+	}
+	
+	if (pp->disable_reinstate) {
+		/* If we don't know how much time has passed, automatically
+		 * reinstate the path, just to be safe. Also, if there are
+		 * no other usable paths, reinstate the path 
+		 */
+		if (clock_gettime(CLOCK_MONOTONIC, &curr_time) != 0 ||
+				pp->mpp->nr_active == 0) {
+			condlog(2, "%s : reinstating path early", pp->dev);
+			goto reinstate_path;
+		}
+		if ((curr_time.tv_sec - pp->dis_reinstate_time ) > pp->mpp->san_path_err_recovery_time) {
+			condlog(2,"%s : reinstate the path after err recovery time", pp->dev);
+			goto reinstate_path;
+		}
+		return 1;
+	}
+	/* forget errors on a working path */
+	if ((pp->state == PATH_UP || pp->state == PATH_GHOST) &&
+			pp->path_failures > 0) {
+		if (pp->san_path_err_forget_rate > 0){
+			pp->san_path_err_forget_rate--;
+		} else {
+			/* for every san_path_err_forget_rate number of 
+			 * successful path checks decrement path_failures by 1
+			 */
+			pp->path_failures--;
+			pp->san_path_err_forget_rate = pp->mpp->san_path_err_forget_rate;
+		}
+		return 0;
+	}
+
+	/* If the path isn't recovering from a failed state, do nothing */
+	if (pp->state != PATH_DOWN && pp->state != PATH_SHAKY &&
+			pp->state != PATH_TIMEOUT)
+		return 0;
+
+	if (pp->path_failures == 0)
+		pp->san_path_err_forget_rate = pp->mpp->san_path_err_forget_rate;
+
+	pp->path_failures++;
+
+	/* if we don't know the current time, we don't know how long to
+	 * delay the path, so there's no point in checking if we should 
+	 */
+
+	if (clock_gettime(CLOCK_MONOTONIC, &curr_time) != 0)
+		return 0;
+	/* when path failures has exceeded the san_path_err_threshold
+	 * place the path in delayed state till san_path_err_recovery_time
+	 * so that the customer can rectify the issue within this time. After
+	 * the completion of san_path_err_recovery_time it should
+	 * automatically reinstate the path 
+	 */
+	if (pp->path_failures > pp->mpp->san_path_err_threshold) {
+		condlog(2, "%s : hit error threshold. Delaying path reinstatement", pp->dev);
+		pp->dis_reinstate_time = curr_time.tv_sec;
+		pp->disable_reinstate = 1;
+		return 1;
+	} else {
+		return 0;
+	}
+
+reinstate_path:
+	pp->path_failures = 0;
+	pp->disable_reinstate = 0;
+	pp->san_path_err_forget_rate = 0;
+	return 0;
+}
+
 /*
  * Returns '1' if the path has been checked, '-1' if it was blacklisted
  * and '0' otherwise
@@ -1601,6 +1678,12 @@ check_path (struct vectors * vecs, struct path * pp, int ticks)
 		return 0;
 
 	if ((newstate == PATH_UP || newstate == PATH_GHOST) &&
+			check_path_reinstate_state(pp)) {
+		pp->state = PATH_DELAYED;
+		return 1;
+	}
+
+	if ((newstate == PATH_UP || newstate == PATH_GHOST) &&
 	     pp->wait_checks > 0) {
 		if (pp->mpp->nr_active > 0) {
 			pp->state = PATH_DELAYED;

[-- Attachment #3: Type: text/plain, Size: 0 bytes --]



^ permalink raw reply related	[flat|nested] 21+ messages in thread
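[Editor's note: the reinstatement logic of check_path_reinstate_state() in the patch above can be sketched as a small standalone model. This is illustrative Python, not multipathd code; the class name, method names, and simplified state values (PATH_UP/PATH_DOWN/PATH_DELAYED) are all hypothetical, and the real daemon also handles PATH_GHOST, PATH_SHAKY, and PATH_TIMEOUT.]

```python
# Illustrative model of the san_path_err_* reinstatement logic (a sketch,
# not the actual multipathd implementation).

import time

PATH_UP, PATH_DOWN, PATH_DELAYED = "up", "down", "delayed"

class PathErrTracker:
    def __init__(self, err_threshold, forget_rate, recovery_time):
        # All three tunables must be > 0 for the feature to engage,
        # mirroring the guard at the top of check_path_reinstate_state().
        self.err_threshold = err_threshold      # failures before delaying
        self.forget_rate = forget_rate          # clean checks per forgotten failure
        self.recovery_time = recovery_time      # seconds to keep the path delayed
        self.path_failures = 0
        self.forget_countdown = 0
        self.disable_reinstate = False
        self.dis_reinstate_time = 0.0

    def check(self, state, now=None):
        """Return True if the path should stay delayed this checker cycle."""
        now = time.monotonic() if now is None else now
        if self.disable_reinstate:
            # Reinstate once the recovery window has elapsed.
            if now - self.dis_reinstate_time > self.recovery_time:
                self.path_failures = 0
                self.disable_reinstate = False
                return False
            return True
        if state == PATH_UP and self.path_failures > 0:
            # Forget one failure for every forget_rate successful checks.
            if self.forget_countdown > 0:
                self.forget_countdown -= 1
            else:
                self.path_failures -= 1
                self.forget_countdown = self.forget_rate
            return False
        if state != PATH_DOWN:
            return False
        if self.path_failures == 0:
            self.forget_countdown = self.forget_rate
        self.path_failures += 1
        if self.path_failures > self.err_threshold:
            # Threshold exceeded: delay reinstatement for recovery_time.
            self.dis_reinstate_time = now
            self.disable_reinstate = True
            return True
        return False
```

A path that flaps more than err_threshold times before its failures are forgotten is held delayed for recovery_time seconds, after which it is reinstated automatically.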

* Re: deterministic io throughput in multipath
  2017-02-02 11:48                       ` Muneendra Kumar M
@ 2017-02-02 17:39                         ` Benjamin Marzinski
  2017-02-02 18:02                           ` Muneendra Kumar M
  0 siblings, 1 reply; 21+ messages in thread
From: Benjamin Marzinski @ 2017-02-02 17:39 UTC (permalink / raw)
  To: Muneendra Kumar M; +Cc: device-mapper development

This looks fine. Thanks for all your work on this

ACK

-Ben

On Thu, Feb 02, 2017 at 11:48:39AM +0000, Muneendra Kumar M wrote:
> Hi Ben,
> The below changes suggested by you are good. Thanks for it.
> I have taken your changes and made few changes to make the functionality working.
> I have tested the same on the setup which works fine.
> 
> We need to increment the path_failures every time checker fails.
> if a device is down for a while, when it comes back up, it will get delayed only if the path failures exceeds the error threshold.
> Whether checker fails or kernel identifies the failures we need  to capture those as it tells the state of the path and target.
> The below code has already taken care of this.
> 
> Could you please review the attached patch and provide us your valuable comments .
> 
> Below are the files that has been changed .
> 
> libmultipath/config.c      |  6 ++++++
>  libmultipath/config.h      |  9 +++++++++
>  libmultipath/configure.c   |  3 +++
>  libmultipath/defaults.h    |  3 ++-
>  libmultipath/dict.c        | 86 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-------------------------
>  libmultipath/dict.h        |  3 +--
>  libmultipath/propsel.c     | 48 ++++++++++++++++++++++++++++++++++++++++++++++--
>  libmultipath/propsel.h     |  3 +++
>  libmultipath/structs.h     | 14 ++++++++++----
>  multipath/multipath.conf.5 | 57 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  multipathd/main.c          | 83 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  11 files changed, 281 insertions(+), 34 deletions(-)
> 
> Regards,
> Muneendra.
> 
> 
> 
> 
> -----Original Message-----
> From: Benjamin Marzinski [mailto:bmarzins@redhat.com] 
> Sent: Thursday, February 02, 2017 7:20 AM
> To: Muneendra Kumar M <mmandala@Brocade.com>
> Cc: device-mapper development <dm-devel@redhat.com>
> Subject: RE: [dm-devel] deterministic io throughput in multipath
> 
> This is certainly moving in the right direction.  There are a couple of things I would change. check_path_reinstate_state() will automatically disable the path if there are configuration problems. If things aren't configured correctly, or the code can't get the current time, it seems like it should allow the path to get reinstated, to avoid keeping a perfectly good path down indefinitely.  Also, if you look at the delay_*_checks code, it automatically reinstates a problematic path if there are no other paths to use. This seems like a good idea as well.
> 
> Also, your code increments path_failures every time the checker fails.
> This means that if a device is down for a while, when it comes back up, it will get delayed.  I'm not sure if this is intentional, or if you were trying to track the number of times the path was restored and then failed again, instead of the total time a path was failed for.
> 
> Perhaps it would be easier to show the kind of changes I would make with a patch.  What do you think about this? I haven't done much testing on it at all, but these are the changes I would make.
> 
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> ---
>  libmultipath/config.c |   3 +
>  libmultipath/dict.c   |   2 +-
>  multipathd/main.c     | 149 +++++++++++++++++++++++---------------------------
>  3 files changed, 72 insertions(+), 82 deletions(-)
> 
> diff --git a/libmultipath/config.c b/libmultipath/config.c index be384af..5837dc6 100644
> --- a/libmultipath/config.c
> +++ b/libmultipath/config.c
> @@ -624,6 +624,9 @@ load_config (char * file)
>  	conf->disable_changed_wwids = DEFAULT_DISABLE_CHANGED_WWIDS;
>  	conf->remove_retries = 0;
>  	conf->max_sectors_kb = DEFAULT_MAX_SECTORS_KB;
> +	conf->san_path_err_threshold = DEFAULT_ERR_CHECKS;
> +	conf->san_path_err_forget_rate = DEFAULT_ERR_CHECKS;
> +	conf->san_path_err_recovery_time = DEFAULT_ERR_CHECKS;
>  
>  	/*
>  	 * preload default hwtable
> diff --git a/libmultipath/dict.c b/libmultipath/dict.c index 4754572..ae94c88 100644
> --- a/libmultipath/dict.c
> +++ b/libmultipath/dict.c
> @@ -1050,7 +1050,7 @@ print_off_int_undef(char * buff, int len, void *ptr)
>  	case NU_UNDEF:
>  		return 0;
>  	case NU_NO:
> -		return snprintf(buff, len, "\"off\"");
> +		return snprintf(buff, len, "\"no\"");
>  	default:
>  		return snprintf(buff, len, "%i", *int_ptr);
>  	}
> diff --git a/multipathd/main.c b/multipathd/main.c index d6d68a4..305e236 100644
> --- a/multipathd/main.c
> +++ b/multipathd/main.c
> @@ -1488,69 +1488,70 @@ void repair_path(struct path * pp)  }
>  
>  static int check_path_reinstate_state(struct path * pp) {
> -	struct timespec start_time;
> -	int disable_reinstate = 1;
> -
> -	if (!((pp->mpp->san_path_err_threshold > 0) && 
> -				(pp->mpp->san_path_err_forget_rate > 0) &&
> -				(pp->mpp->san_path_err_recovery_time >0))) {
> -		return disable_reinstate;
> -	}
> -
> -	if (clock_gettime(CLOCK_MONOTONIC, &start_time) != 0) {
> -		return disable_reinstate;	
> +	struct timespec curr_time;
> +
> +	if (pp->disable_reinstate) {
> +		/* If we don't know how much time has passed, automatically
> +		 * reinstate the path, just to be safe. Also, if there are
> +		 * no other usable paths, reinstate the path */
> +		if (clock_gettime(CLOCK_MONOTONIC, &curr_time) != 0 ||
> +		    pp->mpp->nr_active == 0) {
> +			condlog(2, "%s : reinstating path early", pp->dev);
> +			goto reinstate_path;
> +		}
> +		if ((curr_time.tv_sec - pp->dis_reinstate_time ) > pp->mpp->san_path_err_recovery_time) {
> +			condlog(2,"%s : reinstate the path after err recovery time", pp->dev);
> +			goto reinstate_path;
> +		}
> +		return 1;
>  	}
>  
> -	if ((start_time.tv_sec - pp->dis_reinstate_time ) > pp->mpp->san_path_err_recovery_time) {
> -		disable_reinstate =0;
> -		pp->path_failures = 0;
> -		pp->disable_reinstate = 0;
> -		pp->san_path_err_forget_rate = pp->mpp->san_path_err_forget_rate;
> -		condlog(3,"\npath %s :reinstate the path after err recovery time\n",pp->dev);
> +	/* forget errors on a working path */
> +	if ((pp->state == PATH_UP || pp->state == PATH_GHOST) &&
> +	    pp->path_failures > 0) {
> +		if (pp->san_path_err_forget_rate > 0)
> +			pp->san_path_err_forget_rate--;
> +		else {
> +			/* for every san_path_err_forget_rate number of 
> +			 * successful path checks decrement path_failures by 1
> +			 */
> +			pp->path_failures--;
> +			pp->san_path_err_forget_rate = pp->mpp->san_path_err_forget_rate;
> +		}
> +		return 0;
>  	}
> -	return  disable_reinstate;
> -}
>  
> -static int check_path_validity_err (struct path * pp) {
> -	struct timespec start_time;
> -	int disable_reinstate = 0;
> +	/* If the path isn't recovering from a failed state, do nothing */
> +	if (pp->state != PATH_DOWN && pp->state != PATH_SHAKY &&
> +	    pp->state != PATH_TIMEOUT)
> +		return 0;
>  
> -	if (!((pp->mpp->san_path_err_threshold > 0) && 
> -				(pp->mpp->san_path_err_forget_rate > 0) &&
> -				(pp->mpp->san_path_err_recovery_time >0))) {
> -		return disable_reinstate;
> -	}
> +	if (pp->path_failures == 0)
> +		 pp->san_path_err_forget_rate = pp->mpp->san_path_err_forget_rate;
> +	pp->path_failures++;
>  
> -	if (clock_gettime(CLOCK_MONOTONIC, &start_time) != 0) {
> -		return disable_reinstate;	
> -	}
> -	if (!pp->disable_reinstate) {
> -		if (pp->path_failures) {
> -			/*if the error threshold has hit hit within the san_path_err_forget_rate
> -			 *cycles donot reinstante the path till the san_path_err_recovery_time
> -			 *place the path in failed state till san_path_err_recovery_time so that the
> -			 *cutomer can rectify the issue within this time .Once the completion of
> -			 *san_path_err_recovery_time it should automatically reinstantate the path
> -			 */
> -			if ((pp->path_failures > pp->mpp->san_path_err_threshold) &&
> -					(pp->san_path_err_forget_rate > 0)) {
> -				printf("\n%s:%d: %s hit error threshold \n",__func__,__LINE__,pp->dev);
> -				pp->dis_reinstate_time = start_time.tv_sec ;
> -				pp->disable_reinstate = 1;
> -				disable_reinstate = 1;
> -			} else if ((pp->san_path_err_forget_rate > 0)) {
> -				pp->san_path_err_forget_rate--;
> -			} else {
> -				/*for every san_path_err_forget_rate number
> -				 *of successful path checks decrement path_failures by 1
> -				 */
> -				pp->path_failures --;
> -				pp->san_path_err_forget_rate = pp->mpp->san_path_err_forget_rate;
> -			}
> -		}
> +	/* if we don't know the current time, we don't know how long to
> +	 * delay the path, so there's no point in checking if we should */
> +	if (clock_gettime(CLOCK_MONOTONIC, &curr_time) != 0)
> +		return 0;
> +	/* when path failures has exceeded the san_path_err_threshold
> +	 * place the path in delayed state till san_path_err_recovery_time
> +	 * so that the customer can rectify the issue within this time. After
> +	 * the completion of san_path_err_recovery_time it should
> +	 * automatically reinstate the path */
> +	if (pp->path_failures > pp->mpp->san_path_err_threshold) {
> +		condlog(2, "%s : hit error threshold. Delaying path reinstatement", pp->dev);
> +		pp->dis_reinstate_time = curr_time.tv_sec;
> +		pp->disable_reinstate = 1;
> +		return 1;
>  	}
> -	return  disable_reinstate;
> +	return 0;
> +reinstate_path:
> +	pp->path_failures = 0;
> +	pp->disable_reinstate = 0;
> +	return 0;
>  }
> +
>  /*
>   * Returns '1' if the path has been checked, '-1' if it was blacklisted
>   * and '0' otherwise
> @@ -1566,7 +1567,7 @@ check_path (struct vectors * vecs, struct path * pp, int ticks)
>  	int oldchkrstate = pp->chkrstate;
>  	int retrigger_tries, checkint;
>  	struct config *conf;
> -	int ret;	
> +	int ret;
>  
>  	if ((pp->initialized == INIT_OK ||
>  	     pp->initialized == INIT_REQUESTED_UDEV) && !pp->mpp) @@ -1664,16 +1665,15 @@ check_path (struct vectors * vecs, struct path * pp, int ticks)
>  	if (!pp->mpp)
>  		return 0;
>  
> +	/* We only need to check if the path should be delayed when the
> +	 * the path is actually usable and san_path_err is configured */
>  	if ((newstate == PATH_UP || newstate == PATH_GHOST) &&
> -	     pp->disable_reinstate) {
> -		/*
> -		 * check if the path is in failed state for more than san_path_err_recovery_time
> -		 * if not place the path in delayed state
> -		 */
> -		if (check_path_reinstate_state(pp)) {
> -			pp->state = PATH_DELAYED;
> -			return 1;
> -		}
> +	    pp->mpp->san_path_err_threshold > 0 &&
> +	    pp->mpp->san_path_err_forget_rate > 0 &&
> +	    pp->mpp->san_path_err_recovery_time > 0 &&
> +	    check_path_reinstate_state(pp)) {
> +		pp->state = PATH_DELAYED;
> +		return 1;
>  	}
>  	
>  	if ((newstate == PATH_UP || newstate == PATH_GHOST) && @@ -1685,31 +1685,18 @@ check_path (struct vectors * vecs, struct path * pp, int ticks)
>  		} else
>  			pp->wait_checks = 0;
>  	}
> -	if ((newstate == PATH_DOWN || newstate == PATH_GHOST ||
> -		pp->state == PATH_DOWN)) {
> -		/*assigned  the path_err_forget_rate when we see the first failure on the path*/
> -		if(pp->path_failures == 0){
> -			pp->san_path_err_forget_rate = pp->mpp->san_path_err_forget_rate;
> -		}
> -		pp->path_failures++;
> -	}
> +
>  	/*
>  	 * don't reinstate failed path, if its in stand-by
>  	 * and if target supports only implicit tpgs mode.
>  	 * this will prevent unnecessary i/o by dm on stand-by
>  	 * paths if there are no other active paths in map.
> -	 *
> -	 * when path failures has exceeded the san_path_err_threshold 
> -	 * within san_path_err_forget_rate then we don't reinstate
> -	 * failed path for san_path_err_recovery_time
>  	 */
> -	disable_reinstate = ((newstate == PATH_GHOST &&
> +	disable_reinstate = (newstate == PATH_GHOST &&
>  			    pp->mpp->nr_active == 0 &&
> -			    pp->tpgs == TPGS_IMPLICIT) ? 1 :
> -			    check_path_validity_err(pp));
> +			    pp->tpgs == TPGS_IMPLICIT) ? 1 : 0;
>  
>  	pp->chkrstate = newstate;
> -
>  	if (newstate != pp->state) {
>  		int oldstate = pp->state;
>  		pp->state = newstate;
> --
> 1.8.3.1
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread
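[Editor's note: for reference, here is a minimal sketch of how the three new options discussed in this thread might be set in multipath.conf once the patch is applied. The values are illustrative only, not recommendations; they can also be set per-device in a devices or multipaths section.]

```
defaults {
	# delay reinstatement once a path has failed more than 10 times
	san_path_err_threshold     10
	# forget one recorded failure per 100 successful checker cycles
	san_path_err_forget_rate   100
	# keep a flapping path in the failed state for 60 seconds
	san_path_err_recovery_time 60
}
```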

* Re: deterministic io throughput in multipath
  2017-02-02 17:39                         ` Benjamin Marzinski
@ 2017-02-02 18:02                           ` Muneendra Kumar M
  2017-02-02 18:29                             ` Benjamin Marzinski
  0 siblings, 1 reply; 21+ messages in thread
From: Muneendra Kumar M @ 2017-02-02 18:02 UTC (permalink / raw)
  To: Benjamin Marzinski; +Cc: device-mapper development

Hi Ben,
Thanks for the review.
So can I push my changes as mentioned by you in the below mail using git.

Regards,
Muneendra.


-----Original Message-----
From: Benjamin Marzinski [mailto:bmarzins@redhat.com] 
Sent: Thursday, February 02, 2017 11:09 PM
To: Muneendra Kumar M <mmandala@Brocade.com>
Cc: device-mapper development <dm-devel@redhat.com>
Subject: Re: [dm-devel] deterministic io throughput in multipath

This looks fine. Thanks for all your work on this

ACK

-Ben

On Thu, Feb 02, 2017 at 11:48:39AM +0000, Muneendra Kumar M wrote:
> Hi Ben,
> The changes suggested by you are good. Thanks for them.
> I have taken your changes and made a few changes to make the functionality work.
> I have tested this on my setup, and it works fine.
> 
> We need to increment path_failures every time the checker fails.
> If a device is down for a while, when it comes back up it will get delayed only if the number of path failures exceeds the error threshold.
> Whether the checker fails or the kernel identifies the failures, we need to capture them, as they tell us the state of the path and target.
> The code below has already taken care of this.
> 
> Could you please review the attached patch and provide us your comments.
> 
> Below are the files that has been changed .
> 
> libmultipath/config.c      |  6 ++++++
>  libmultipath/config.h      |  9 +++++++++
>  libmultipath/configure.c   |  3 +++
>  libmultipath/defaults.h    |  3 ++-
>  libmultipath/dict.c        | 86 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-------------------------
>  libmultipath/dict.h        |  3 +--
>  libmultipath/propsel.c     | 48 ++++++++++++++++++++++++++++++++++++++++++++++--
>  libmultipath/propsel.h     |  3 +++
>  libmultipath/structs.h     | 14 ++++++++++----
>  multipath/multipath.conf.5 | 57 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  multipathd/main.c          | 83 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  11 files changed, 281 insertions(+), 34 deletions(-)
> 
> Regards,
> Muneendra.
> 
> 
> 
> 
> -----Original Message-----
> From: Benjamin Marzinski [mailto:bmarzins@redhat.com] 
> Sent: Thursday, February 02, 2017 7:20 AM
> To: Muneendra Kumar M <mmandala@Brocade.com>
> Cc: device-mapper development <dm-devel@redhat.com>
> Subject: RE: [dm-devel] deterministic io throughput in multipath
> 
> This is certainly moving in the right direction.  There are a couple of things I would change. check_path_reinstate_state() will automatically disable the path if there are configuration problems. If things aren't configured correctly, or the code can't get the current time, it seems like it should allow the path to get reinstated, to avoid keeping a perfectly good path down indefinitely.  Also, if you look at the delay_*_checks code, it automatically reinstates a problematic path if there are no other paths to use. This seems like a good idea as well.
> 
> Also, your code increments path_failures every time the checker fails.
> This means that if a device is down for a while, when it comes back up, it will get delayed.  I'm not sure if this is intentional, or if you were trying to track the number of times the path was restored and then failed again, instead of the total time a path was failed for.
> 
> Perhaps it would be easier to show the kind of changes I would make with a patch.  What do you think about this? I haven't done much testing on it at all, but these are the changes I would make.
> 
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> ---
>  libmultipath/config.c |   3 +
>  libmultipath/dict.c   |   2 +-
>  multipathd/main.c     | 149 +++++++++++++++++++++++---------------------------
>  3 files changed, 72 insertions(+), 82 deletions(-)
> 
> diff --git a/libmultipath/config.c b/libmultipath/config.c index be384af..5837dc6 100644
> --- a/libmultipath/config.c
> +++ b/libmultipath/config.c
> @@ -624,6 +624,9 @@ load_config (char * file)
>  	conf->disable_changed_wwids = DEFAULT_DISABLE_CHANGED_WWIDS;
>  	conf->remove_retries = 0;
>  	conf->max_sectors_kb = DEFAULT_MAX_SECTORS_KB;
> +	conf->san_path_err_threshold = DEFAULT_ERR_CHECKS;
> +	conf->san_path_err_forget_rate = DEFAULT_ERR_CHECKS;
> +	conf->san_path_err_recovery_time = DEFAULT_ERR_CHECKS;
>  
>  	/*
>  	 * preload default hwtable
> diff --git a/libmultipath/dict.c b/libmultipath/dict.c index 4754572..ae94c88 100644
> --- a/libmultipath/dict.c
> +++ b/libmultipath/dict.c
> @@ -1050,7 +1050,7 @@ print_off_int_undef(char * buff, int len, void *ptr)
>  	case NU_UNDEF:
>  		return 0;
>  	case NU_NO:
> -		return snprintf(buff, len, "\"off\"");
> +		return snprintf(buff, len, "\"no\"");
>  	default:
>  		return snprintf(buff, len, "%i", *int_ptr);
>  	}
> diff --git a/multipathd/main.c b/multipathd/main.c index d6d68a4..305e236 100644
> --- a/multipathd/main.c
> +++ b/multipathd/main.c
> @@ -1488,69 +1488,70 @@ void repair_path(struct path * pp)  }
>  
>  static int check_path_reinstate_state(struct path * pp) {
> -	struct timespec start_time;
> -	int disable_reinstate = 1;
> -
> -	if (!((pp->mpp->san_path_err_threshold > 0) && 
> -				(pp->mpp->san_path_err_forget_rate > 0) &&
> -				(pp->mpp->san_path_err_recovery_time >0))) {
> -		return disable_reinstate;
> -	}
> -
> -	if (clock_gettime(CLOCK_MONOTONIC, &start_time) != 0) {
> -		return disable_reinstate;	
> +	struct timespec curr_time;
> +
> +	if (pp->disable_reinstate) {
> +		/* If we don't know how much time has passed, automatically
> +		 * reinstate the path, just to be safe. Also, if there are
> +		 * no other usable paths, reinstate the path */
> +		if (clock_gettime(CLOCK_MONOTONIC, &curr_time) != 0 ||
> +		    pp->mpp->nr_active == 0) {
> +			condlog(2, "%s : reinstating path early", pp->dev);
> +			goto reinstate_path;
> +		}
> +		if ((curr_time.tv_sec - pp->dis_reinstate_time ) > pp->mpp->san_path_err_recovery_time) {
> +			condlog(2,"%s : reinstate the path after err recovery time", pp->dev);
> +			goto reinstate_path;
> +		}
> +		return 1;
>  	}
>  
> -	if ((start_time.tv_sec - pp->dis_reinstate_time ) > pp->mpp->san_path_err_recovery_time) {
> -		disable_reinstate =0;
> -		pp->path_failures = 0;
> -		pp->disable_reinstate = 0;
> -		pp->san_path_err_forget_rate = pp->mpp->san_path_err_forget_rate;
> -		condlog(3,"\npath %s :reinstate the path after err recovery time\n",pp->dev);
> +	/* forget errors on a working path */
> +	if ((pp->state == PATH_UP || pp->state == PATH_GHOST) &&
> +	    pp->path_failures > 0) {
> +		if (pp->san_path_err_forget_rate > 0)
> +			pp->san_path_err_forget_rate--;
> +		else {
> +			/* for every san_path_err_forget_rate number of 
> +			 * successful path checks decrement path_failures by 1
> +			 */
> +			pp->path_failures--;
> +			pp->san_path_err_forget_rate = pp->mpp->san_path_err_forget_rate;
> +		}
> +		return 0;
>  	}
> -	return  disable_reinstate;
> -}
>  
> -static int check_path_validity_err (struct path * pp) {
> -	struct timespec start_time;
> -	int disable_reinstate = 0;
> +	/* If the path isn't recovering from a failed state, do nothing */
> +	if (pp->state != PATH_DOWN && pp->state != PATH_SHAKY &&
> +	    pp->state != PATH_TIMEOUT)
> +		return 0;
>  
> -	if (!((pp->mpp->san_path_err_threshold > 0) && 
> -				(pp->mpp->san_path_err_forget_rate > 0) &&
> -				(pp->mpp->san_path_err_recovery_time >0))) {
> -		return disable_reinstate;
> -	}
> +	if (pp->path_failures == 0)
> +		 pp->san_path_err_forget_rate = pp->mpp->san_path_err_forget_rate;
> +	pp->path_failures++;
>  
> -	if (clock_gettime(CLOCK_MONOTONIC, &start_time) != 0) {
> -		return disable_reinstate;	
> -	}
> -	if (!pp->disable_reinstate) {
> -		if (pp->path_failures) {
> -			/*if the error threshold has hit hit within the san_path_err_forget_rate
> -			 *cycles donot reinstante the path till the san_path_err_recovery_time
> -			 *place the path in failed state till san_path_err_recovery_time so that the
> -			 *cutomer can rectify the issue within this time .Once the completion of
> -			 *san_path_err_recovery_time it should automatically reinstantate the path
> -			 */
> -			if ((pp->path_failures > pp->mpp->san_path_err_threshold) &&
> -					(pp->san_path_err_forget_rate > 0)) {
> -				printf("\n%s:%d: %s hit error threshold \n",__func__,__LINE__,pp->dev);
> -				pp->dis_reinstate_time = start_time.tv_sec ;
> -				pp->disable_reinstate = 1;
> -				disable_reinstate = 1;
> -			} else if ((pp->san_path_err_forget_rate > 0)) {
> -				pp->san_path_err_forget_rate--;
> -			} else {
> -				/*for every san_path_err_forget_rate number
> -				 *of successful path checks decrement path_failures by 1
> -				 */
> -				pp->path_failures --;
> -				pp->san_path_err_forget_rate = pp->mpp->san_path_err_forget_rate;
> -			}
> -		}
> +	/* if we don't know the current time, we don't know how long to
> +	 * delay the path, so there's no point in checking if we should */
> +	if (clock_gettime(CLOCK_MONOTONIC, &curr_time) != 0)
> +		return 0;
> +	/* when path failures have exceeded the san_path_err_threshold,
> +	 * place the path in the delayed state till san_path_err_recovery_time
> +	 * so that the customer can rectify the issue within this time. After
> +	 * san_path_err_recovery_time completes, it should
> +	 * automatically reinstate the path */
> +	if (pp->path_failures > pp->mpp->san_path_err_threshold) {
> +		condlog(2, "%s : hit error threshold. Delaying path reinstatement", pp->dev);
> +		pp->dis_reinstate_time = curr_time.tv_sec;
> +		pp->disable_reinstate = 1;
> +		return 1;
>  	}
> -	return  disable_reinstate;
> +	return 0;
> +reinstate_path:
> +	pp->path_failures = 0;
> +	pp->disable_reinstate = 0;
> +	return 0;
>  }
> +
>  /*
>   * Returns '1' if the path has been checked, '-1' if it was blacklisted
>   * and '0' otherwise
> @@ -1566,7 +1567,7 @@ check_path (struct vectors * vecs, struct path * pp, int ticks)
>  	int oldchkrstate = pp->chkrstate;
>  	int retrigger_tries, checkint;
>  	struct config *conf;
> -	int ret;	
> +	int ret;
>  
>  	if ((pp->initialized == INIT_OK ||
>  	     pp->initialized == INIT_REQUESTED_UDEV) && !pp->mpp) @@ -1664,16 +1665,15 @@ check_path (struct vectors * vecs, struct path * pp, int ticks)
>  	if (!pp->mpp)
>  		return 0;
>  
> +	/* We only need to check if the path should be delayed when the
> +	 * path is actually usable and san_path_err is configured */
>  	if ((newstate == PATH_UP || newstate == PATH_GHOST) &&
> -	     pp->disable_reinstate) {
> -		/*
> -		 * check if the path is in failed state for more than san_path_err_recovery_time
> -		 * if not place the path in delayed state
> -		 */
> -		if (check_path_reinstate_state(pp)) {
> -			pp->state = PATH_DELAYED;
> -			return 1;
> -		}
> +	    pp->mpp->san_path_err_threshold > 0 &&
> +	    pp->mpp->san_path_err_forget_rate > 0 &&
> +	    pp->mpp->san_path_err_recovery_time > 0 &&
> +	    check_path_reinstate_state(pp)) {
> +		pp->state = PATH_DELAYED;
> +		return 1;
>  	}
>  	
>  	if ((newstate == PATH_UP || newstate == PATH_GHOST) && @@ -1685,31 +1685,18 @@ check_path (struct vectors * vecs, struct path * pp, int ticks)
>  		} else
>  			pp->wait_checks = 0;
>  	}
> -	if ((newstate == PATH_DOWN || newstate == PATH_GHOST ||
> -		pp->state == PATH_DOWN)) {
> -		/*assigned  the path_err_forget_rate when we see the first failure on the path*/
> -		if(pp->path_failures == 0){
> -			pp->san_path_err_forget_rate = pp->mpp->san_path_err_forget_rate;
> -		}
> -		pp->path_failures++;
> -	}
> +
>  	/*
>  	 * don't reinstate failed path, if its in stand-by
>  	 * and if target supports only implicit tpgs mode.
>  	 * this will prevent unnecessary i/o by dm on stand-by
>  	 * paths if there are no other active paths in map.
> -	 *
> -	 * when path failures has exceeded the san_path_err_threshold 
> -	 * within san_path_err_forget_rate then we don't reinstate
> -	 * failed path for san_path_err_recovery_time
>  	 */
> -	disable_reinstate = ((newstate == PATH_GHOST &&
> +	disable_reinstate = (newstate == PATH_GHOST &&
>  			    pp->mpp->nr_active == 0 &&
> -			    pp->tpgs == TPGS_IMPLICIT) ? 1 :
> -			    check_path_validity_err(pp));
> +			    pp->tpgs == TPGS_IMPLICIT) ? 1 : 0;
>  
>  	pp->chkrstate = newstate;
> -
>  	if (newstate != pp->state) {
>  		int oldstate = pp->state;
>  		pp->state = newstate;
> --
> 1.8.3.1
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: deterministic io throughput in multipath
  2017-02-02 18:02                           ` Muneendra Kumar M
@ 2017-02-02 18:29                             ` Benjamin Marzinski
  2017-02-03 11:43                               ` Muneendra Kumar M
  0 siblings, 1 reply; 21+ messages in thread
From: Benjamin Marzinski @ 2017-02-02 18:29 UTC (permalink / raw)
  To: Muneendra Kumar M; +Cc: device-mapper development

On Thu, Feb 02, 2017 at 06:02:57PM +0000, Muneendra Kumar M wrote:
> Hi Ben,
> Thanks for the review.
> So can I push my changes as mentioned by you in the below mail using git.

Sure.

-Ben

> 
> Regards,
> Muneendra.
> 
> 
> [earlier messages quoted in full above; trimmed here]
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: deterministic io throughput in multipath
  2017-02-02 18:29                             ` Benjamin Marzinski
@ 2017-02-03 11:43                               ` Muneendra Kumar M
  0 siblings, 0 replies; 21+ messages in thread
From: Muneendra Kumar M @ 2017-02-03 11:43 UTC (permalink / raw)
  To: Benjamin Marzinski; +Cc: device-mapper development


[-- Attachment #1.1: Type: text/plain, Size: 4984 bytes --]

Hi Ben,
I committed my patches to a branch off the head of master.
But when I used the below command I am getting the below errors. Not sure whether the mail has been sent to dm-devel@redhat.com.

# git send-email --to "device-mapper development <dm-devel@redhat.com>" --cc "Christophe Varoqui <christophe.varoqui@opensvc.com>" --no-chain-reply-to --suppress-from <dir>

I am seeing the below error. Could you please help me with this?


Content-Description: Notification
Content-Type: text/plain; charset=us-ascii

This is the mail system at host localhost.localdomain.

I'm sorry to have to inform you that your message could not
be delivered to one or more recipients. It's attached below.

For further assistance, please send mail to postmaster.

If you do so, please include this problem report. You can
delete your own text from the attached returned message.

                   The mail system

<christophe.varoqui@opensvc.com>: host spool.mail.gandi.net[217.70.184.6] said:
    550 5.1.8 <root@localhost.localdomain>: Sender address rejected: Domain not
    found (in reply to RCPT TO command)

--E52C4C13C372.1486112814/localhost.localdomain
Content-Description: Delivery report
Content-Type: message/delivery-status

Reporting-MTA: dns; localhost.localdomain
X-Postfix-Queue-ID: E52C4C13C372
X-Postfix-Sender: rfc822; root@localhost.localdomain
Arrival-Date: Fri,  3 Feb 2017 14:36:22 +0530 (IST)

Final-Recipient: rfc822; christophe.varoqui@opensvc.com
Action: failed
Status: 5.1.8
Remote-MTA: dns; spool.mail.gandi.net
Diagnostic-Code: smtp; 550 5.1.8 <root@localhost.localdomain>: Sender address
    rejected: Domain not found

--E52C4C13C372.1486112814/localhost.localdomain
Content-Description: Undelivered Message
Content-Type: message/rfc822

Return-Path: <root@localhost.localdomain>
Received: by localhost.localdomain (Postfix, from userid 0)
        id E52C4C13C372; Fri,  3 Feb 2017 14:36:22 +0530 (IST)
From: M Muneendra Kumar <mmandala@brocade.com>
To: device-mapper development <dm-devel@redhat.com>
Cc: Christophe Varoqui <christophe.varoqui@opensvc.com>,
        Benjamin Marzinski <bmarzins@redhat.com>
Subject: [PATCH 0/1] multipathd: deterministic io throughput in multipath
Date: Fri,  3 Feb 2017 14:36:21 +0530
Message-Id: <1486112782-12706-1-git-send-email-mmandala@brocade.com>
X-Mailer: git-send-email 1.8.3.1

Regards,
Muneendra.
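[Editorial note: the bounce shows git falling back to root@localhost.localdomain as the envelope sender, which the receiving MTA cannot resolve. One hedged sketch of a fix is to give git send-email a resolvable identity and relay in ~/.gitconfig; the addresses and server below are illustrative assumptions, not known-good values:]

```
[user]
	email = mmandala@brocade.com
[sendemail]
	from = M Muneendra Kumar <mmandala@brocade.com>
	; hypothetical relay; substitute your organization's SMTP server
	smtpServer = smtp.example.com
```

With such settings in place, the same git send-email invocation should present a sender domain the remote MTA can look up.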

[earlier messages quoted in full above; trimmed here]

[-- Attachment #1.2: Type: text/html, Size: 10692 bytes --]

[-- Attachment #2: Type: text/plain, Size: 0 bytes --]



^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2017-02-03 11:43 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-12-19 11:50 deterministic io throughput in multipath Muneendra Kumar M
2016-12-19 12:09 ` Hannes Reinecke
2016-12-21 16:09 ` Benjamin Marzinski
2016-12-22  5:39   ` Muneendra Kumar M
2016-12-26  9:42   ` Muneendra Kumar M
2017-01-03 17:12     ` Benjamin Marzinski
2017-01-04 13:26       ` Muneendra Kumar M
2017-01-16 11:19       ` Muneendra Kumar M
2017-01-17  1:04         ` Benjamin Marzinski
2017-01-17 10:43           ` Muneendra Kumar M
2017-01-23 11:02           ` Muneendra Kumar M
2017-01-25  9:28             ` Benjamin Marzinski
2017-01-25 11:48               ` Muneendra Kumar M
2017-01-25 13:07                 ` Benjamin Marzinski
2017-02-01 11:58                   ` Muneendra Kumar M
2017-02-02  1:50                     ` Benjamin Marzinski
2017-02-02 11:48                       ` Muneendra Kumar M
2017-02-02 17:39                         ` Benjamin Marzinski
2017-02-02 18:02                           ` Muneendra Kumar M
2017-02-02 18:29                             ` Benjamin Marzinski
2017-02-03 11:43                               ` Muneendra Kumar M
