* RE: [PATCH 07/14] scsi_transport_srp: Add transport layer error handling
@ 2013-06-24 13:48 Jack Wang
       [not found] ` <51C84E39.80806-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
  0 siblings, 1 reply; 21+ messages in thread
From: Jack Wang @ 2013-06-24 13:48 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Roland Dreier, David Dillow, Vu Pham, Sebastian Riemer,
	linux-rdma, linux-scsi, James Bottomley, Mike Christie

> I'm not sure it's possible to avoid such a race without introducing
> a new mutex. How about something like the (untested) SCSI core patch
> below, and invoking scsi_block_eh() and scsi_unblock_eh() around any
> reconnect activity not initiated from the SCSI EH thread ?
> 
> [PATCH] Add scsi_block_eh() and scsi_unblock_eh()
> 
Hi Bart,

The description doesn't match the code at all. Do you mean to serialize
the reconnect activity with this new block_eh_mutex?

Cheers,
Jack
> ---
>  drivers/scsi/hosts.c      |    1 +
>  drivers/scsi/scsi_error.c |   10 ++++++++++
>  include/scsi/scsi_host.h  |    1 +
>  3 files changed, 12 insertions(+)
> 
> diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c
> index 17e2ccb..0df3ec8 100644
> --- a/drivers/scsi/hosts.c
> +++ b/drivers/scsi/hosts.c
> @@ -360,6 +360,7 @@ struct Scsi_Host *scsi_host_alloc(struct scsi_host_template *sht, int privsize)
>         init_waitqueue_head(&shost->host_wait);
> 
>         mutex_init(&shost->scan_mutex);
> +       mutex_init(&shost->block_eh_mutex);
> 
>         /*
>          * subtract one because we increment first then return, but we need to
> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
> index ab16930..566daaa 100644
> --- a/drivers/scsi/scsi_error.c
> +++ b/drivers/scsi/scsi_error.c
> @@ -551,6 +551,10 @@ static int scsi_begin_eh(struct Scsi_Host *host)
>  {
>         int res;
> 
> +       res = mutex_lock_interruptible(&host->block_eh_mutex);
> +       if (res)
> +               goto out;
> +
>         spin_lock_irq(host->host_lock);
>         switch (host->shost_state) {
>         case SHOST_DEL:
> @@ -565,6 +569,10 @@ static int scsi_begin_eh(struct Scsi_Host *host)
>         }
>         spin_unlock_irq(host->host_lock);
> 
> +       if (res)
> +               mutex_unlock(&host->block_eh_mutex);
> +
> +out:
>         return res;
>  }
> 
> @@ -579,6 +587,8 @@ static void scsi_end_eh(struct Scsi_Host *host)
>         if (host->eh_active == 0)
>                 wake_up(&host->host_wait);
>         spin_unlock_irq(host->host_lock);
> +
> +       mutex_unlock(&host->block_eh_mutex);
>  }
> 
>  /**
> diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
> index 9785e51..d7ce065 100644
> --- a/include/scsi/scsi_host.h
> +++ b/include/scsi/scsi_host.h
> @@ -573,6 +573,7 @@ struct Scsi_Host {
>         spinlock_t              *host_lock;
> 
>         struct mutex            scan_mutex;/* serialize scanning activity */
> +       struct mutex            block_eh_mutex; /* block ML LLD EH calls */
> 
>         struct list_head        eh_cmd_q;
>         struct task_struct    * ehandler;  /* Error recovery thread. */


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 07/14] scsi_transport_srp: Add transport layer error handling
       [not found] ` <51C84E39.80806-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
@ 2013-06-24 15:50   ` Bart Van Assche
       [not found]     ` <51C86AB4.1000906-HInyCGIudOg@public.gmane.org>
  0 siblings, 1 reply; 21+ messages in thread
From: Bart Van Assche @ 2013-06-24 15:50 UTC (permalink / raw)
  To: Jack Wang
  Cc: Roland Dreier, David Dillow, Vu Pham, Sebastian Riemer,
	linux-rdma, linux-scsi, James Bottomley, Mike Christie

On 06/24/13 15:48, Jack Wang wrote:
>> I'm not sure it's possible to avoid such a race without introducing
>> a new mutex. How about something like the (untested) SCSI core patch
>> below, and invoking scsi_block_eh() and scsi_unblock_eh() around any
>> reconnect activity not initiated from the SCSI EH thread ?
>>
>> [PATCH] Add scsi_block_eh() and scsi_unblock_eh()
>>
> Hi Bart,
>
> The description doesn't match the code at all, do you mean try to
> serialize the reconnect activity with this new block_eh_mutex?

In case it wasn't clear, the actual scsi_block_eh() and 
scsi_unblock_eh() calls aren't present in the patch I posted, only their 
implementation.

P.S.: please fix your e-mail client such that it does not break e-mail 
threading. There are no "In-Reply-To:" tags in the header of your e-mails.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 07/14] scsi_transport_srp: Add transport layer error handling
       [not found]     ` <51C86AB4.1000906-HInyCGIudOg@public.gmane.org>
@ 2013-06-24 16:05       ` Jack Wang
  0 siblings, 0 replies; 21+ messages in thread
From: Jack Wang @ 2013-06-24 16:05 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Roland Dreier, David Dillow, Vu Pham, Sebastian Riemer,
	linux-rdma, linux-scsi, James Bottomley, Mike Christie

On 06/24/2013 05:50 PM, Bart Van Assche wrote:
> On 06/24/13 15:48, Jack Wang wrote:
>>> I'm not sure it's possible to avoid such a race without introducing
>>> a new mutex. How about something like the (untested) SCSI core patch
>>> below, and invoking scsi_block_eh() and scsi_unblock_eh() around any
>>> reconnect activity not initiated from the SCSI EH thread ?
>>>
>>> [PATCH] Add scsi_block_eh() and scsi_unblock_eh()
>>>
>> Hi Bart,
>>
>> The description doesn't match the code at all, do you mean try to
>> serialize the reconnect activity with this new block_eh_mutex?
> 
> In case it wasn't clear, the actual scsi_block_eh() and
> scsi_unblock_eh() calls aren't present in the patch I posted, only their
> implementation.
As the patch title is "Add scsi_block_eh() and scsi_unblock_eh()", that is
why I asked. Thanks for the explanation.
> 
> P.S.: please fix your e-mail client such that it does not break e-mail
> threading. There are no "In-Reply-To:" tags in the header of your e-mails.
> 
Sorry for the inconvenience, I will fix it.

Thanks,
Jack
> Thanks,
> 
> Bart.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 07/14] scsi_transport_srp: Add transport layer error handling
       [not found]     ` <51C764FB.6070207-hcNo3dDEHLuVc3sceRu5cw@public.gmane.org>
@ 2013-06-24  7:37       ` Bart Van Assche
  0 siblings, 0 replies; 21+ messages in thread
From: Bart Van Assche @ 2013-06-24  7:37 UTC (permalink / raw)
  To: Mike Christie
  Cc: Roland Dreier, David Dillow, Vu Pham, Sebastian Riemer,
	linux-rdma, linux-scsi, James Bottomley

On 06/23/13 23:13, Mike Christie wrote:
> On 06/12/2013 08:28 AM, Bart Van Assche wrote:
>> +		/*
>> +		 * It can occur that after fast_io_fail_tmo expired and before
>> +		 * dev_loss_tmo expired that the SCSI error handler has
>> +		 * offlined one or more devices. scsi_target_unblock() doesn't
>> +		 * change the state of these devices into running, so do that
>> +		 * explicitly.
>> +		 */
>> +		spin_lock_irq(shost->host_lock);
>> +		__shost_for_each_device(sdev, shost)
>> +			if (sdev->sdev_state == SDEV_OFFLINE)
>> +				sdev->sdev_state = SDEV_RUNNING;
>> +		spin_unlock_irq(shost->host_lock);
> 
> Is it possible for this to race with scsi_eh_offline_sdevs? Can it be
> looping over cmds offlining devices while this is looping over devices
> onlining them?
> 
> It seems this can also happen for all transports/drivers. Maybe a a scsi
> eh/lib helper function that syncrhonizes with the scsi eh completion
> would be better.

I'm not sure it's possible to avoid such a race without introducing
a new mutex. How about something like the (untested) SCSI core patch
below, and invoking scsi_block_eh() and scsi_unblock_eh() around any
reconnect activity not initiated from the SCSI EH thread?

[PATCH] Add scsi_block_eh() and scsi_unblock_eh()

---
 drivers/scsi/hosts.c      |    1 +
 drivers/scsi/scsi_error.c |   10 ++++++++++
 include/scsi/scsi_host.h  |    1 +
 3 files changed, 12 insertions(+)

diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c
index 17e2ccb..0df3ec8 100644
--- a/drivers/scsi/hosts.c
+++ b/drivers/scsi/hosts.c
@@ -360,6 +360,7 @@ struct Scsi_Host *scsi_host_alloc(struct scsi_host_template *sht, int privsize)
 	init_waitqueue_head(&shost->host_wait);
 
 	mutex_init(&shost->scan_mutex);
+	mutex_init(&shost->block_eh_mutex);
 
 	/*
 	 * subtract one because we increment first then return, but we need to
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index ab16930..566daaa 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -551,6 +551,10 @@ static int scsi_begin_eh(struct Scsi_Host *host)
 {
 	int res;
 
+	res = mutex_lock_interruptible(&host->block_eh_mutex);
+	if (res)
+		goto out;
+
 	spin_lock_irq(host->host_lock);
 	switch (host->shost_state) {
 	case SHOST_DEL:
@@ -565,6 +569,10 @@ static int scsi_begin_eh(struct Scsi_Host *host)
 	}
 	spin_unlock_irq(host->host_lock);
 
+	if (res)
+		mutex_unlock(&host->block_eh_mutex);
+
+out:
 	return res;
 }
 
@@ -579,6 +587,8 @@ static void scsi_end_eh(struct Scsi_Host *host)
 	if (host->eh_active == 0)
 		wake_up(&host->host_wait);
 	spin_unlock_irq(host->host_lock);
+
+	mutex_unlock(&host->block_eh_mutex);
 }
 
 /**
diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
index 9785e51..d7ce065 100644
--- a/include/scsi/scsi_host.h
+++ b/include/scsi/scsi_host.h
@@ -573,6 +573,7 @@ struct Scsi_Host {
 	spinlock_t		*host_lock;
 
 	struct mutex		scan_mutex;/* serialize scanning activity */
+	struct mutex		block_eh_mutex; /* block ML LLD EH calls */
 
 	struct list_head	eh_cmd_q;
 	struct task_struct    * ehandler;  /* Error recovery thread. */
-- 
1.7.10.4



^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH 07/14] scsi_transport_srp: Add transport layer error handling
  2013-06-12 13:28 ` [PATCH 07/14] scsi_transport_srp: Add transport layer error handling Bart Van Assche
       [not found]   ` <51B8777B.5050201-HInyCGIudOg@public.gmane.org>
@ 2013-06-23 21:13   ` Mike Christie
       [not found]     ` <51C764FB.6070207-hcNo3dDEHLuVc3sceRu5cw@public.gmane.org>
  1 sibling, 1 reply; 21+ messages in thread
From: Mike Christie @ 2013-06-23 21:13 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Roland Dreier, David Dillow, Vu Pham, Sebastian Riemer,
	linux-rdma, linux-scsi, James Bottomley

On 06/12/2013 08:28 AM, Bart Van Assche wrote:
> +		/*
> +		 * It can occur that after fast_io_fail_tmo expired and before
> +		 * dev_loss_tmo expired that the SCSI error handler has
> +		 * offlined one or more devices. scsi_target_unblock() doesn't
> +		 * change the state of these devices into running, so do that
> +		 * explicitly.
> +		 */
> +		spin_lock_irq(shost->host_lock);
> +		__shost_for_each_device(sdev, shost)
> +			if (sdev->sdev_state == SDEV_OFFLINE)
> +				sdev->sdev_state = SDEV_RUNNING;
> +		spin_unlock_irq(shost->host_lock);

Is it possible for this to race with scsi_eh_offline_sdevs? Can it be
looping over cmds offlining devices while this is looping over devices
onlining them?

It seems this can also happen for all transports/drivers. Maybe a SCSI
eh/lib helper function that synchronizes with the SCSI EH completion
would be better.
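
One possible shape for such a helper, purely as a sketch (the helper name is
hypothetical; scsi_host_in_recovery() and the host_wait wait queue are
existing midlayer facilities, and this mirrors what
scsi_block_when_processing_errors() already does on the LLD side):

#include <linux/wait.h>
#include <scsi/scsi_host.h>

/* Hypothetical helper: wait until the SCSI EH for this host has finished
 * before transport code starts changing device states again. */
static void scsi_wait_for_eh(struct Scsi_Host *shost)
{
        wait_event(shost->host_wait, !scsi_host_in_recovery(shost));
}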

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 07/14] scsi_transport_srp: Add transport layer error handling
  2013-06-19 15:27 ` Bart Van Assche
@ 2013-06-21 12:17   ` Jack Wang
  0 siblings, 0 replies; 21+ messages in thread
From: Jack Wang @ 2013-06-21 12:17 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: David Dillow, Vu Pham, Sebastian Riemer, linux-rdma, linux-scsi,
	James Bottomley, Roland Dreier

On 06/19/2013 05:27 PM, Bart Van Assche wrote:
> On 06/19/13 15:44, Jack Wang wrote:
>>> +        /*
>>> +         * It can occur that after fast_io_fail_tmo expired and before
>>> +         * dev_loss_tmo expired that the SCSI error handler has
>>> +         * offlined one or more devices. scsi_target_unblock() doesn't
>>> +         * change the state of these devices into running, so do that
>>> +         * explicitly.
>>> +         */
>>> +        spin_lock_irq(shost->host_lock);
>>> +        __shost_for_each_device(sdev, shost)
>>> +            if (sdev->sdev_state == SDEV_OFFLINE)
>>> +                sdev->sdev_state = SDEV_RUNNING;
>>> +        spin_unlock_irq(shost->host_lock);
>>
>> Do you have test case to verify this behaviour?
> 
> Hello Jack,
> 
> This is what I came up with after analyzing why a so-called "port
> flapping" test failed. The concept of that test is simple: use
> ibportstate to disable and reenable the proper IB port on the switch
> with random intervals and check whether I/O starts running again if the
> path remains operational long enough. When running such a test for a few
> days with random intervals between a few seconds and a few minutes
> sooner or later it will occur that scsi_try_host_reset() succeeds and
> that scsi_eh_test_devices() fails. That will cause the SCSI error
> handler to offline devices. Hence the above code to change the offline
> state into running after a reconnect succeeds. I'm not proud of that
> code but I couldn't find a better solution. Maybe the above code won't
> be necessary anymore once we switch to Hannes' new SCSI error handler.
> 
> Bart.

Thanks, Bart, for the reply. In fact we have seen the same problem you
describe here. It's reasonable to set the device back to RUNNING after a
reconnect succeeds. I'm curious why scsi_target_unblock() doesn't handle
this case.

I'm not sure the new SCSI EH from Hannes will avoid the SCSI EH setting
devices offline in such a situation, but at least it will avoid one bad
LUN blocking the whole host.

Jack


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 07/14] scsi_transport_srp: Add transport layer error handling
  2013-06-19 13:44 Jack Wang
@ 2013-06-19 15:27 ` Bart Van Assche
  2013-06-21 12:17   ` Jack Wang
  0 siblings, 1 reply; 21+ messages in thread
From: Bart Van Assche @ 2013-06-19 15:27 UTC (permalink / raw)
  To: Jack Wang
  Cc: David Dillow, Vu Pham, Sebastian Riemer, linux-rdma, linux-scsi,
	James Bottomley, Roland Dreier

On 06/19/13 15:44, Jack Wang wrote:
>> +		/*
>> +		 * It can occur that after fast_io_fail_tmo expired and before
>> +		 * dev_loss_tmo expired that the SCSI error handler has
>> +		 * offlined one or more devices. scsi_target_unblock() doesn't
>> +		 * change the state of these devices into running, so do that
>> +		 * explicitly.
>> +		 */
>> +		spin_lock_irq(shost->host_lock);
>> +		__shost_for_each_device(sdev, shost)
>> +			if (sdev->sdev_state == SDEV_OFFLINE)
>> +				sdev->sdev_state = SDEV_RUNNING;
>> +		spin_unlock_irq(shost->host_lock);
>
> Do you have test case to verify this behaviour?

Hello Jack,

This is what I came up with after analyzing why a so-called "port 
flapping" test failed. The concept of that test is simple: use 
ibportstate to disable and reenable the proper IB port on the switch 
with random intervals and check whether I/O starts running again if the 
path remains operational long enough. When running such a test for a few 
days with random intervals between a few seconds and a few minutes 
sooner or later it will occur that scsi_try_host_reset() succeeds and 
that scsi_eh_test_devices() fails. That will cause the SCSI error 
handler to offline devices. Hence the above code to change the offline 
state into running after a reconnect succeeds. I'm not proud of that 
code but I couldn't find a better solution. Maybe the above code won't 
be necessary anymore once we switch to Hannes' new SCSI error handler.

Bart.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* RE: [PATCH 07/14] scsi_transport_srp: Add transport layer error handling
@ 2013-06-19 13:44 Jack Wang
  2013-06-19 15:27 ` Bart Van Assche
  0 siblings, 1 reply; 21+ messages in thread
From: Jack Wang @ 2013-06-19 13:44 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: David Dillow, Vu Pham, Sebastian Riemer, linux-rdma, linux-scsi,
	James Bottomley, Roland Dreier

> +		/*
> +		 * It can occur that after fast_io_fail_tmo expired and before
> +		 * dev_loss_tmo expired that the SCSI error handler has
> +		 * offlined one or more devices. scsi_target_unblock() doesn't
> +		 * change the state of these devices into running, so do that
> +		 * explicitly.
> +		 */
> +		spin_lock_irq(shost->host_lock);
> +		__shost_for_each_device(sdev, shost)
> +			if (sdev->sdev_state == SDEV_OFFLINE)
> +				sdev->sdev_state = SDEV_RUNNING;
> +		spin_unlock_irq(shost->host_lock);
Hello Bart,

Do you have test case to verify this behaviour?

Regards,
Jack

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 07/14] scsi_transport_srp: Add transport layer error handling
       [not found]                   ` <51C09202.2040503-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2013-06-19 13:00                     ` Bart Van Assche
  0 siblings, 0 replies; 21+ messages in thread
From: Bart Van Assche @ 2013-06-19 13:00 UTC (permalink / raw)
  To: Vu Pham
  Cc: Roland Dreier, David Dillow, Sebastian Riemer, linux-rdma,
	linux-scsi, James Bottomley

On 06/18/13 18:59, Vu Pham wrote:
> Bart Van Assche wrote:
>> On 06/14/13 19:59, Vu Pham wrote:
>>>> On 06/13/13 21:43, Vu Pham wrote:
>>>>> If rport's state is already SRP_RPORT_BLOCKED, I don't think we need
>>>>> to do extra block with scsi_block_requests()
>>>>
>>>> Please keep in mind that srp_reconnect_rport() can be called from two
>>>> different contexts: that function can not only be called from inside
>>>> the SRP transport layer but also from inside the SCSI error handler
>>>> (see also the srp_reset_device() modifications in a later patch in
>>>> this series). If this function is invoked from the context of the SCSI
>>>> error handler the chance is high that the SCSI device will have
>>>> another state than SDEV_BLOCK. Hence the scsi_block_requests() call in
>>>> this function.
>>> Yes, srp_reconnect_rport() can be called from two contexts; however, it
>>> deals with same rport & rport's state.
>>> I'm thinking something like this:
>>>
>>>     if (rport->state != SRP_RPORT_BLOCKED) {
>>>      scsi_block_requests(shost);
>>
>> Sorry but I'm afraid that that approach would still allow the user to
>> unblock one or more SCSI devices via sysfs during the
>> i->f->reconnect(rport) call, something we do not want.
>>
> I don't think that user can unblock scsi device(s) via sysfs if you use
> scsi_block_requests(shost) in srp_start_tl_fail_timers().

Hello Vu,

If scsi_block_requests() were used in srp_start_tl_fail_timers() instead
of scsi_target_block(), then multipathd would no longer be able to notice
that a path is blocked after the fast_io_fail and dev_loss timers have
started, and hence wouldn't be able to use the optimization where blocked
paths are skipped when queueing a new I/O request.

Bart.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 07/14] scsi_transport_srp: Add transport layer error handling
  2013-06-15  9:52               ` Bart Van Assche
       [not found]                 ` <51BC3945.9030900-HInyCGIudOg@public.gmane.org>
@ 2013-06-18 16:59                 ` Vu Pham
       [not found]                   ` <51C09202.2040503-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  1 sibling, 1 reply; 21+ messages in thread
From: Vu Pham @ 2013-06-18 16:59 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Roland Dreier, David Dillow, Sebastian Riemer, linux-rdma,
	linux-scsi, James Bottomley

Bart Van Assche wrote:
> On 06/14/13 19:59, Vu Pham wrote:
>>> On 06/13/13 21:43, Vu Pham wrote:
>>>>> +/**
>>>>> + * srp_tmo_valid() - check timeout combination validity
>>>>> + *
>>>>> + * If no fast I/O fail timeout has been configured then the device
>>>>> loss timeout
>>>>> + * must be below SCSI_DEVICE_BLOCK_MAX_TIMEOUT. If a fast I/O fail
>>>>> timeout has
>>>>> + * been configured then it must be below the device loss timeout.
>>>>> + */
>>>>> +int srp_tmo_valid(int fast_io_fail_tmo, int dev_loss_tmo)
>>>>> +{
>>>>> +    return (fast_io_fail_tmo < 0 && 1 <= dev_loss_tmo &&
>>>>> +        dev_loss_tmo <= SCSI_DEVICE_BLOCK_MAX_TIMEOUT)
>>>>> +        || (0 <= fast_io_fail_tmo &&
>>>>> +            (dev_loss_tmo < 0 ||
>>>>> +             (fast_io_fail_tmo < dev_loss_tmo &&
>>>>> +              dev_loss_tmo < LONG_MAX / HZ))) ? 0 : -EINVAL;
>>>>> +}
>>>>> +EXPORT_SYMBOL_GPL(srp_tmo_valid);
>>>> fast_io_fail_tmo is off, one cannot turn off dev_loss_tmo with
>>>> negative value
>>>> dev_loss_tmo is off, one cannot turn off fast_io_fail_tmo with
>>>> negative value
>>>
>>> OK, will update the documentation such that it correctly refers to
>>> "off" instead of a negative value and I will also mention that
>>> dev_loss_tmo can now be disabled.
>>>
>> It's not only the documentation but also the code logic, you cannot turn
>> dev_loss_tmo off if fast_io_fail_tmo already turned off and vice versa
>> with the return statement above.
>
> Does this mean that you think it would be useful to disable both the 
> fast_io_fail and the dev_loss mechanisms, and hence rely on the user 
> to remove remote ports that have disappeared and on the SCSI command 
> timeout to detect path failures ?
Yes.

> I'll start testing this to see whether that combination does not 
> trigger any adverse behavior.
>

Ok

>>>> If rport's state is already SRP_RPORT_BLOCKED, I don't think we need
>>>> to do extra block with scsi_block_requests()
>>>
>>> Please keep in mind that srp_reconnect_rport() can be called from two
>>> different contexts: that function can not only be called from inside
>>> the SRP transport layer but also from inside the SCSI error handler
>>> (see also the srp_reset_device() modifications in a later patch in
>>> this series). If this function is invoked from the context of the SCSI
>>> error handler the chance is high that the SCSI device will have
>>> another state than SDEV_BLOCK. Hence the scsi_block_requests() call in
>>> this function.
>> Yes, srp_reconnect_rport() can be called from two contexts; however, it
>> deals with same rport & rport's state.
>> I'm thinking something like this:
>>
>>     if (rport->state != SRP_RPORT_BLOCKED) {
>>      scsi_block_requests(shost);
>
> Sorry but I'm afraid that that approach would still allow the user to 
> unblock one or more SCSI devices via sysfs during the 
> i->f->reconnect(rport) call, something we do not want.
>
I don't think that a user can unblock SCSI device(s) via sysfs if you use
scsi_block_requests(shost) in srp_start_tl_fail_timers().

-vu

>> I think that we can use only the pair
>> scsi_block_requests()/scsi_unblock_requests() unless the advantage of
>> multipathd recognizing the SDEV_BLOCK is noticeable.
>
> I think the advantage of multipathd recognizing the SDEV_BLOCK state 
> before the fast_io_fail_tmo timer has expired is important. Multipathd 
> does not queue I/O to paths that are in the SDEV_BLOCK state so 
> setting that state helps I/O to fail over more quickly, especially for 
> large values of fast_io_fail_tmo.
>
> Hope this helps,
>
> Bart.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 07/14] scsi_transport_srp: Add transport layer error handling
       [not found]                           ` <51BEBAEA.4080202-HInyCGIudOg@public.gmane.org>
  2013-06-17  8:10                             ` Hannes Reinecke
@ 2013-06-17 10:13                             ` Sebastian Riemer
  1 sibling, 0 replies; 21+ messages in thread
From: Sebastian Riemer @ 2013-06-17 10:13 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Hannes Reinecke, Vu Pham, Roland Dreier, David Dillow,
	linux-rdma, linux-scsi, James Bottomley

On 17.06.2013 09:29, Bart Van Assche wrote:
> On 06/17/13 09:14, Hannes Reinecke wrote:
>> On 06/17/2013 09:04 AM, Bart Van Assche wrote:
>>> I agree that the value of fast_io_fail_tmo should be kept small.
>>> Although as you explained changing the SCSI device state into
>>> SDEV_BLOCK doesn't help for I/O that has already been queued on a
>>> failed path, I think it's still useful for I/O that is queued after
>>> the fast_io_fail timer has been started and before that timer has
>>> expired.
>>
>> Why, but of course.
>>
>> The typical scenario would be:
>> -> detect link-loss
>> -> call scsi_block_request()
>> -> start dev_loss_tmo and fast_io_fail_tmo
>>
>> -> When fast_io_fail_tmo triggers:
>>     -> Abort all outstanding requests
>>
>> -> When dev_loss_tmo triggers:
>>     -> Abort all outstanding requests
>>     -> Remove/disable the I_T nexus
>>     -> call scsi_unblock_request()
>>
>> However, if and whether multipath detects SDEV_BLOCK doesn't
>> guarantee a fast failover; in fact is was only added rather recently
>> as it's not a big win in most cases.
> 
> Even if setting the state SDEV_BLOCK doesn't help much with improving
> failover time, it still has the advantage over using
> scsi_block_requests() that it can be overridden by a user via sysfs.

In my opinion SDEV_BLOCK can help the reconnect. The only reason for a
high fast_io_fail_tmo is that you don't use multipath at all and hope that
the connection becomes available again before that timeout. You place the
reconnects in between so that there is a chance that a reconnect succeeds
and the transport layer error work can be canceled.

But I have to look at all of your patches first to see how you
implemented the big picture.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 07/14] scsi_transport_srp: Add transport layer error handling
       [not found]                           ` <51BEBAEA.4080202-HInyCGIudOg@public.gmane.org>
@ 2013-06-17  8:10                             ` Hannes Reinecke
  2013-06-17 10:13                             ` Sebastian Riemer
  1 sibling, 0 replies; 21+ messages in thread
From: Hannes Reinecke @ 2013-06-17  8:10 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Vu Pham, Roland Dreier, David Dillow, Sebastian Riemer,
	linux-rdma, linux-scsi, James Bottomley

On 06/17/2013 09:29 AM, Bart Van Assche wrote:
> On 06/17/13 09:14, Hannes Reinecke wrote:
>> On 06/17/2013 09:04 AM, Bart Van Assche wrote:
>>> I agree that the value of fast_io_fail_tmo should be kept small.
>>> Although as you explained changing the SCSI device state into
>>> SDEV_BLOCK doesn't help for I/O that has already been queued on a
>>> failed path, I think it's still useful for I/O that is queued after
>>> the fast_io_fail timer has been started and before that timer has
>>> expired.
>>
>> Why, but of course.
>>
>> The typical scenario would be:
>> -> detect link-loss
>> -> call scsi_block_request()
>> -> start dev_loss_tmo and fast_io_fail_tmo
>>
>> -> When fast_io_fail_tmo triggers:
>>     -> Abort all outstanding requests
>>
>> -> When dev_loss_tmo triggers:
>>     -> Abort all outstanding requests
>>     -> Remove/disable the I_T nexus
>>     -> call scsi_unblock_request()
>>
>> However, if and whether multipath detects SDEV_BLOCK doesn't
>> guarantee a fast failover; in fact is was only added rather recently
>> as it's not a big win in most cases.
> 
> Even if setting the state SDEV_BLOCK doesn't help much with
> improving failover time, it still has the advantage over using
> scsi_block_requests() that it can be overridden by a user via sysfs.
> 
Argl. I meant scsi_internal_device_block(), not
scsi_block_requests(). Of course.

So we're arguing a non-existing point :-)

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare-l3A5Bk7waGM@public.gmane.org			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 07/14] scsi_transport_srp: Add transport layer error handling
  2013-06-17  7:14                       ` Hannes Reinecke
@ 2013-06-17  7:29                         ` Bart Van Assche
       [not found]                           ` <51BEBAEA.4080202-HInyCGIudOg@public.gmane.org>
  0 siblings, 1 reply; 21+ messages in thread
From: Bart Van Assche @ 2013-06-17  7:29 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Vu Pham, Roland Dreier, David Dillow, Sebastian Riemer,
	linux-rdma, linux-scsi, James Bottomley

On 06/17/13 09:14, Hannes Reinecke wrote:
> On 06/17/2013 09:04 AM, Bart Van Assche wrote:
>> I agree that the value of fast_io_fail_tmo should be kept small.
>> Although as you explained changing the SCSI device state into
>> SDEV_BLOCK doesn't help for I/O that has already been queued on a
>> failed path, I think it's still useful for I/O that is queued after
>> the fast_io_fail timer has been started and before that timer has
>> expired.
>
> Why, but of course.
>
> The typical scenario would be:
> -> detect link-loss
> -> call scsi_block_request()
> -> start dev_loss_tmo and fast_io_fail_tmo
>
> -> When fast_io_fail_tmo triggers:
>     -> Abort all outstanding requests
>
> -> When dev_loss_tmo triggers:
>     -> Abort all outstanding requests
>     -> Remove/disable the I_T nexus
>     -> call scsi_unblock_request()
>
> However, if and whether multipath detects SDEV_BLOCK doesn't
> guarantee a fast failover; in fact is was only added rather recently
> as it's not a big win in most cases.

Even if setting the state SDEV_BLOCK doesn't help much with improving 
failover time, it still has the advantage over using 
scsi_block_requests() that it can be overridden by a user via sysfs.
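
To make the comparison concrete, here is a sketch of the two blocking
mechanisms being discussed (both calls are existing midlayer APIs; the
wrapper function itself is only illustrative):

#include <scsi/scsi_device.h>
#include <scsi/scsi_host.h>

/* Illustrative only: the two ways of stopping I/O that are being compared. */
static void example_block_paths(struct Scsi_Host *shost)
{
        /*
         * Sets a per-host flag that is tested in the request function; it is
         * not visible as a device state, so neither multipathd nor a sysfs
         * write can override it.
         */
        scsi_block_requests(shost);

        /*
         * Moves every scsi_device below the host into SDEV_BLOCK; multipathd
         * notices that state and a user can override it through the sysfs
         * "state" attribute.
         */
        scsi_target_block(&shost->shost_gendev);
}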

Bart.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 07/14] scsi_transport_srp: Add transport layer error handling
  2013-06-17  7:04                     ` Bart Van Assche
@ 2013-06-17  7:14                       ` Hannes Reinecke
  2013-06-17  7:29                         ` Bart Van Assche
  0 siblings, 1 reply; 21+ messages in thread
From: Hannes Reinecke @ 2013-06-17  7:14 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Vu Pham, Roland Dreier, David Dillow, Sebastian Riemer,
	linux-rdma, linux-scsi, James Bottomley

On 06/17/2013 09:04 AM, Bart Van Assche wrote:
> On 06/17/13 08:18, Hannes Reinecke wrote:
>> On 06/15/2013 11:52 AM, Bart Van Assche wrote:
[ .. ]
>>>
>>> I think the advantage of multipathd recognizing the SDEV_BLOCK state
>>> before the fast_io_fail_tmo timer has expired is important.
>>> Multipathd does not queue I/O to paths that are in the SDEV_BLOCK
>>> state so setting that state helps I/O to fail over more quickly,
>>> especially for large values of fast_io_fail_tmo.
>>>
>> Sadly it doesn't work that way.
>>
>> SDEV_BLOCK will instruct multipath to not queue _new_ I/Os to the
>> path, but there still will be I/O queued on that path.
>> For these multipath _has_ to wait for I/O completion.
>> And as it turns out, in most cases the application itself will wait
>> for completion on these I/O before continue sending more I/O.
>> So in effect multipath would queue new I/O to other paths, but won't
>> _receive_ new I/O as the upper layers are still waiting for
>> completion of the queued I/O.
>>
>> The only way to excite fast failover with multipathing is to set
>> fast_io_fail to a _LOW_ value (eg 5 seconds), as this will terminate
>> the outstanding I/Os.
>>
>> Large values of fast_io_fail will almost guarantee sluggish I/O
>> failover.
> 
> Hello Hannes,
> 
> I agree that the value of fast_io_fail_tmo should be kept small.
> Although as you explained changing the SCSI device state into
> SDEV_BLOCK doesn't help for I/O that has already been queued on a
> failed path, I think it's still useful for I/O that is queued after
> the fast_io_fail timer has been started and before that timer has
> expired.
> 
Why, but of course.

The typical scenario would be:
-> detect link-loss
-> call scsi_block_request()
-> start dev_loss_tmo and fast_io_fail_tmo

-> When fast_io_fail_tmo triggers:
   -> Abort all outstanding requests

-> When dev_loss_tmo triggers:
   -> Abort all outstanding requests
   -> Remove/disable the I_T nexus
   -> call scsi_unblock_request()

However, whether or not multipath detects SDEV_BLOCK doesn't guarantee a
fast failover; in fact it was only added rather recently as it's not a big
win in most cases.
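
(As a rough illustration of that scenario, the two delayed-work handlers
could look like the sketch below. The names rport_fast_io_fail_timedout(),
rport_dev_loss_timedout(), fast_io_fail_work, dev_loss_work and
rport_to_shost() follow the posted series; the bodies are illustrative only,
not the actual implementation.)

#include <linux/workqueue.h>
#include <scsi/scsi_device.h>
#include <scsi/scsi_host.h>
#include <scsi/scsi_transport_srp.h>

static void rport_fast_io_fail_timedout(struct work_struct *work)
{
        struct srp_rport *rport = container_of(work, struct srp_rport,
                                               fast_io_fail_work.work);
        struct Scsi_Host *shost = rport_to_shost(rport);

        /* Terminate outstanding requests so that multipath can fail over. */
        scsi_target_unblock(&shost->shost_gendev, SDEV_TRANSPORT_OFFLINE);
}

static void rport_dev_loss_timedout(struct work_struct *work)
{
        struct srp_rport *rport = container_of(work, struct srp_rport,
                                               dev_loss_work.work);
        struct Scsi_Host *shost = rport_to_shost(rport);

        /* Fail anything still pending, then remove the I_T nexus. */
        scsi_target_unblock(&shost->shost_gendev, SDEV_TRANSPORT_OFFLINE);
        srp_rport_del(rport);
}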

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 07/14] scsi_transport_srp: Add transport layer error handling
  2013-06-17  6:18                   ` Hannes Reinecke
@ 2013-06-17  7:04                     ` Bart Van Assche
  2013-06-17  7:14                       ` Hannes Reinecke
  0 siblings, 1 reply; 21+ messages in thread
From: Bart Van Assche @ 2013-06-17  7:04 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Vu Pham, Roland Dreier, David Dillow, Sebastian Riemer,
	linux-rdma, linux-scsi, James Bottomley

On 06/17/13 08:18, Hannes Reinecke wrote:
> On 06/15/2013 11:52 AM, Bart Van Assche wrote:
>> On 06/14/13 19:59, Vu Pham wrote:
>>>> On 06/13/13 21:43, Vu Pham wrote:
>>>>>> +/**
>>>>>> + * srp_tmo_valid() - check timeout combination validity
>>>>>> + *
>>>>>> + * If no fast I/O fail timeout has been configured then the
>>>>>> device
>>>>>> loss timeout
>>>>>> + * must be below SCSI_DEVICE_BLOCK_MAX_TIMEOUT. If a fast I/O
>>>>>> fail
>>>>>> timeout has
>>>>>> + * been configured then it must be below the device loss timeout.
>>>>>> + */
>>>>>> +int srp_tmo_valid(int fast_io_fail_tmo, int dev_loss_tmo)
>>>>>> +{
>>>>>> +    return (fast_io_fail_tmo < 0 && 1 <= dev_loss_tmo &&
>>>>>> +        dev_loss_tmo <= SCSI_DEVICE_BLOCK_MAX_TIMEOUT)
>>>>>> +        || (0 <= fast_io_fail_tmo &&
>>>>>> +            (dev_loss_tmo < 0 ||
>>>>>> +             (fast_io_fail_tmo < dev_loss_tmo &&
>>>>>> +              dev_loss_tmo < LONG_MAX / HZ))) ? 0 : -EINVAL;
>>>>>> +}
>>>>>> +EXPORT_SYMBOL_GPL(srp_tmo_valid);
>>>>> fast_io_fail_tmo is off, one cannot turn off dev_loss_tmo with
>>>>> negative value
>>>>> dev_loss_tmo is off, one cannot turn off fast_io_fail_tmo with
>>>>> negative value
>>>>
>>>> OK, will update the documentation such that it correctly refers to
>>>> "off" instead of a negative value and I will also mention that
>>>> dev_loss_tmo can now be disabled.
>>>>
>>> It's not only the documentation but also the code logic, you
>>> cannot turn
>>> dev_loss_tmo off if fast_io_fail_tmo already turned off and vice
>>> versa
>>> with the return statement above.
>>
>> Does this mean that you think it would be useful to disable both the
>> fast_io_fail and the dev_loss mechanisms, and hence rely on the user
>> to remove remote ports that have disappeared and on the SCSI command
>> timeout to detect path failures ? I'll start testing this to see
>> whether that combination does not trigger any adverse behavior.
>>
>>>>> If rport's state is already SRP_RPORT_BLOCKED, I don't think we
>>>>> need
>>>>> to do extra block with scsi_block_requests()
>>>>
>>>> Please keep in mind that srp_reconnect_rport() can be called from
>>>> two
>>>> different contexts: that function can not only be called from inside
>>>> the SRP transport layer but also from inside the SCSI error handler
>>>> (see also the srp_reset_device() modifications in a later patch in
>>>> this series). If this function is invoked from the context of the
>>>> SCSI
>>>> error handler the chance is high that the SCSI device will have
>>>> another state than SDEV_BLOCK. Hence the scsi_block_requests()
>>>> call in
>>>> this function.
>>> Yes, srp_reconnect_rport() can be called from two contexts;
>>> however, it
>>> deals with same rport & rport's state.
>>> I'm thinking something like this:
>>>
>>>      if (rport->state != SRP_RPORT_BLOCKED) {
>>>       scsi_block_requests(shost);
>>
>> Sorry but I'm afraid that that approach would still allow the user
>> to unblock one or more SCSI devices via sysfs during the
>> i->f->reconnect(rport) call, something we do not want.
>>
>>> I think that we can use only the pair
>>> scsi_block_requests()/scsi_unblock_requests() unless the advantage of
>>> multipathd recognizing the SDEV_BLOCK is noticeable.
>>
>> I think the advantage of multipathd recognizing the SDEV_BLOCK state
>> before the fast_io_fail_tmo timer has expired is important.
>> Multipathd does not queue I/O to paths that are in the SDEV_BLOCK
>> state so setting that state helps I/O to fail over more quickly,
>> especially for large values of fast_io_fail_tmo.
>>
> Sadly it doesn't work that way.
>
> SDEV_BLOCK will instruct multipath to not queue _new_ I/Os to the
> path, but there still will be I/O queued on that path.
> For these multipath _has_ to wait for I/O completion.
> And as it turns out, in most cases the application itself will wait
> for completion on these I/O before continue sending more I/O.
> So in effect multipath would queue new I/O to other paths, but won't
> _receive_ new I/O as the upper layers are still waiting for
> completion of the queued I/O.
>
> The only way to excite fast failover with multipathing is to set
> fast_io_fail to a _LOW_ value (eg 5 seconds), as this will terminate
> the outstanding I/Os.
>
> Large values of fast_io_fail will almost guarantee sluggish I/O
> failover.

Hello Hannes,

I agree that the value of fast_io_fail_tmo should be kept small. 
Although, as you explained, changing the SCSI device state into SDEV_BLOCK 
doesn't help for I/O that has already been queued on a failed path, I 
think it's still useful for I/O that is queued after the fast_io_fail 
timer has been started and before that timer has expired.

Bart.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 07/14] scsi_transport_srp: Add transport layer error handling
       [not found]                 ` <51BC3945.9030900-HInyCGIudOg@public.gmane.org>
@ 2013-06-17  6:18                   ` Hannes Reinecke
  2013-06-17  7:04                     ` Bart Van Assche
  0 siblings, 1 reply; 21+ messages in thread
From: Hannes Reinecke @ 2013-06-17  6:18 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Vu Pham, Roland Dreier, David Dillow, Sebastian Riemer,
	linux-rdma, linux-scsi, James Bottomley

On 06/15/2013 11:52 AM, Bart Van Assche wrote:
> On 06/14/13 19:59, Vu Pham wrote:
>>> On 06/13/13 21:43, Vu Pham wrote:
>>>>> +/**
>>>>> + * srp_tmo_valid() - check timeout combination validity
>>>>> + *
>>>>> + * If no fast I/O fail timeout has been configured then the
>>>>> device
>>>>> loss timeout
>>>>> + * must be below SCSI_DEVICE_BLOCK_MAX_TIMEOUT. If a fast I/O
>>>>> fail
>>>>> timeout has
>>>>> + * been configured then it must be below the device loss timeout.
>>>>> + */
>>>>> +int srp_tmo_valid(int fast_io_fail_tmo, int dev_loss_tmo)
>>>>> +{
>>>>> +    return (fast_io_fail_tmo < 0 && 1 <= dev_loss_tmo &&
>>>>> +        dev_loss_tmo <= SCSI_DEVICE_BLOCK_MAX_TIMEOUT)
>>>>> +        || (0 <= fast_io_fail_tmo &&
>>>>> +            (dev_loss_tmo < 0 ||
>>>>> +             (fast_io_fail_tmo < dev_loss_tmo &&
>>>>> +              dev_loss_tmo < LONG_MAX / HZ))) ? 0 : -EINVAL;
>>>>> +}
>>>>> +EXPORT_SYMBOL_GPL(srp_tmo_valid);
>>>> fast_io_fail_tmo is off, one cannot turn off dev_loss_tmo with
>>>> negative value
>>>> dev_loss_tmo is off, one cannot turn off fast_io_fail_tmo with
>>>> negative value
>>>
>>> OK, will update the documentation such that it correctly refers to
>>> "off" instead of a negative value and I will also mention that
>>> dev_loss_tmo can now be disabled.
>>>
>> It's not only the documentation but also the code logic, you
>> cannot turn
>> dev_loss_tmo off if fast_io_fail_tmo already turned off and vice
>> versa
>> with the return statement above.
> 
> Does this mean that you think it would be useful to disable both the
> fast_io_fail and the dev_loss mechanisms, and hence rely on the user
> to remove remote ports that have disappeared and on the SCSI command
> timeout to detect path failures ? I'll start testing this to see
> whether that combination does not trigger any adverse behavior.
> 
>>>> If rport's state is already SRP_RPORT_BLOCKED, I don't think we
>>>> need
>>>> to do extra block with scsi_block_requests()
>>>
>>> Please keep in mind that srp_reconnect_rport() can be called from
>>> two
>>> different contexts: that function can not only be called from inside
>>> the SRP transport layer but also from inside the SCSI error handler
>>> (see also the srp_reset_device() modifications in a later patch in
>>> this series). If this function is invoked from the context of the
>>> SCSI
>>> error handler the chance is high that the SCSI device will have
>>> another state than SDEV_BLOCK. Hence the scsi_block_requests()
>>> call in
>>> this function.
>> Yes, srp_reconnect_rport() can be called from two contexts;
>> however, it
>> deals with same rport & rport's state.
>> I'm thinking something like this:
>>
>>     if (rport->state != SRP_RPORT_BLOCKED) {
>>      scsi_block_requests(shost);
> 
> Sorry but I'm afraid that that approach would still allow the user
> to unblock one or more SCSI devices via sysfs during the
> i->f->reconnect(rport) call, something we do not want.
> 
>> I think that we can use only the pair
>> scsi_block_requests()/scsi_unblock_requests() unless the advantage of
>> multipathd recognizing the SDEV_BLOCK is noticeable.
> 
> I think the advantage of multipathd recognizing the SDEV_BLOCK state
> before the fast_io_fail_tmo timer has expired is important.
> Multipathd does not queue I/O to paths that are in the SDEV_BLOCK
> state so setting that state helps I/O to fail over more quickly,
> especially for large values of fast_io_fail_tmo.
> 
Sadly it doesn't work that way.

SDEV_BLOCK will instruct multipath to not queue _new_ I/Os to the
path, but there still will be I/O queued on that path.
For these multipath _has_ to wait for I/O completion.
And as it turns out, in most cases the application itself will wait
for completion of these I/Os before continuing to send more I/O.
So in effect multipath would queue new I/O to other paths, but won't
_receive_ new I/O as the upper layers are still waiting for
completion of the queued I/O.

The only way to achieve fast failover with multipathing is to set
fast_io_fail to a _LOW_ value (e.g. 5 seconds), as this will terminate
the outstanding I/Os.

Large values of fast_io_fail will almost guarantee sluggish I/O
failover.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare-l3A5Bk7waGM@public.gmane.org			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 07/14] scsi_transport_srp: Add transport layer error handling
       [not found]             ` <51BB5A04.3080901-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2013-06-15  9:52               ` Bart Van Assche
       [not found]                 ` <51BC3945.9030900-HInyCGIudOg@public.gmane.org>
  2013-06-18 16:59                 ` Vu Pham
  0 siblings, 2 replies; 21+ messages in thread
From: Bart Van Assche @ 2013-06-15  9:52 UTC (permalink / raw)
  To: Vu Pham
  Cc: Roland Dreier, David Dillow, Sebastian Riemer, linux-rdma,
	linux-scsi, James Bottomley

On 06/14/13 19:59, Vu Pham wrote:
>> On 06/13/13 21:43, Vu Pham wrote:
>>>> +/**
>>>> + * srp_tmo_valid() - check timeout combination validity
>>>> + *
>>>> + * If no fast I/O fail timeout has been configured then the device
>>>> loss timeout
>>>> + * must be below SCSI_DEVICE_BLOCK_MAX_TIMEOUT. If a fast I/O fail
>>>> timeout has
>>>> + * been configured then it must be below the device loss timeout.
>>>> + */
>>>> +int srp_tmo_valid(int fast_io_fail_tmo, int dev_loss_tmo)
>>>> +{
>>>> +    return (fast_io_fail_tmo < 0 && 1 <= dev_loss_tmo &&
>>>> +        dev_loss_tmo <= SCSI_DEVICE_BLOCK_MAX_TIMEOUT)
>>>> +        || (0 <= fast_io_fail_tmo &&
>>>> +            (dev_loss_tmo < 0 ||
>>>> +             (fast_io_fail_tmo < dev_loss_tmo &&
>>>> +              dev_loss_tmo < LONG_MAX / HZ))) ? 0 : -EINVAL;
>>>> +}
>>>> +EXPORT_SYMBOL_GPL(srp_tmo_valid);
>>> fast_io_fail_tmo is off, one cannot turn off dev_loss_tmo with
>>> negative value
>>> dev_loss_tmo is off, one cannot turn off fast_io_fail_tmo with
>>> negative value
>>
>> OK, will update the documentation such that it correctly refers to
>> "off" instead of a negative value and I will also mention that
>> dev_loss_tmo can now be disabled.
>>
> It's not only the documentation but also the code logic, you cannot turn
> dev_loss_tmo off if fast_io_fail_tmo already turned off and vice versa
> with the return statement above.

Does this mean that you think it would be useful to disable both the 
fast_io_fail and the dev_loss mechanisms, and hence rely on the user to 
remove remote ports that have disappeared and on the SCSI command 
timeout to detect path failures? I'll start testing this to see whether 
that combination does not trigger any adverse behavior.

>>> If rport's state is already SRP_RPORT_BLOCKED, I don't think we need
>>> to do extra block with scsi_block_requests()
>>
>> Please keep in mind that srp_reconnect_rport() can be called from two
>> different contexts: that function can not only be called from inside
>> the SRP transport layer but also from inside the SCSI error handler
>> (see also the srp_reset_device() modifications in a later patch in
>> this series). If this function is invoked from the context of the SCSI
>> error handler the chance is high that the SCSI device will have
>> another state than SDEV_BLOCK. Hence the scsi_block_requests() call in
>> this function.
> Yes, srp_reconnect_rport() can be called from two contexts; however, it
> deals with same rport & rport's state.
> I'm thinking something like this:
>
>     if (rport->state != SRP_RPORT_BLOCKED) {
>      scsi_block_requests(shost);

Sorry but I'm afraid that that approach would still allow the user to 
unblock one or more SCSI devices via sysfs during the 
i->f->reconnect(rport) call, something we do not want.

> I think that we can use only the pair
> scsi_block_requests()/scsi_unblock_requests() unless the advantage of
> multipathd recognizing the SDEV_BLOCK is noticeable.

I think the advantage of multipathd recognizing the SDEV_BLOCK state 
before the fast_io_fail_tmo timer has expired is important. Multipathd 
does not queue I/O to paths that are in the SDEV_BLOCK state so setting 
that state helps I/O to fail over more quickly, especially for large 
values of fast_io_fail_tmo.

Hope this helps,

Bart.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 07/14] scsi_transport_srp: Add transport layer error handling
       [not found]         ` <51BB1857.7040802-HInyCGIudOg@public.gmane.org>
@ 2013-06-14 17:59           ` Vu Pham
       [not found]             ` <51BB5A04.3080901-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 21+ messages in thread
From: Vu Pham @ 2013-06-14 17:59 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Roland Dreier, David Dillow, Sebastian Riemer, linux-rdma,
	linux-scsi, James Bottomley

Hello Bart,

> On 06/13/13 21:43, Vu Pham wrote:
>   
>> Hello Bart,
>>
>>     
>>> +What:        /sys/class/srp_remote_ports/port-<h>:<n>/dev_loss_tmo
>>> +Date:        September 1, 2013
>>> +KernelVersion:    3.11
>>> +Contact:    linux-scsi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>>> +Description:    Number of seconds the SCSI layer will wait after a 
>>> transport
>>> +        layer error has been observed before removing a target port.
>>> +        Zero means immediate removal.
>>>       
>> A negative value will disable the target port removal.
>>
>> <snip>
>>     
>>> +
>>> +/**
>>> + * srp_tmo_valid() - check timeout combination validity
>>> + *
>>> + * If no fast I/O fail timeout has been configured then the device 
>>> loss timeout
>>> + * must be below SCSI_DEVICE_BLOCK_MAX_TIMEOUT. If a fast I/O fail 
>>> timeout has
>>> + * been configured then it must be below the device loss timeout.
>>> + */
>>> +int srp_tmo_valid(int fast_io_fail_tmo, int dev_loss_tmo)
>>> +{
>>> +    return (fast_io_fail_tmo < 0 && 1 <= dev_loss_tmo &&
>>> +        dev_loss_tmo <= SCSI_DEVICE_BLOCK_MAX_TIMEOUT)
>>> +        || (0 <= fast_io_fail_tmo &&
>>> +            (dev_loss_tmo < 0 ||
>>> +             (fast_io_fail_tmo < dev_loss_tmo &&
>>> +              dev_loss_tmo < LONG_MAX / HZ))) ? 0 : -EINVAL;
>>> +}
>>> +EXPORT_SYMBOL_GPL(srp_tmo_valid);
>>>       
>> fast_io_fail_tmo is off, one cannot turn off dev_loss_tmo with negative 
>> value
>> dev_loss_tmo is off, one cannot turn off fast_io_fail_tmo with negative 
>> value
>>     
>
> OK, will update the documentation such that it correctly refers to "off" instead of a negative value and I will also mention that dev_loss_tmo can now be disabled.
>
>   
It's not only the documentation but also the code logic: you cannot turn 
dev_loss_tmo off if fast_io_fail_tmo is already turned off, and vice versa, 
with the return statement above.
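
For illustration, a relaxed check that would also accept the combination in
which both timers are disabled could look like this (a sketch only, not part
of the posted patch; it keeps the other constraints of the quoted
srp_tmo_valid() unchanged):

#include <linux/errno.h>
#include <linux/jiffies.h>
#include <linux/kernel.h>
#include <scsi/scsi_device.h>

int srp_tmo_valid(int fast_io_fail_tmo, int dev_loss_tmo)
{
        /* Both timers off: rely solely on the SCSI command timeout. */
        if (fast_io_fail_tmo < 0 && dev_loss_tmo < 0)
                return 0;
        /* Only dev_loss enabled: limited by scsi_target_block(). */
        if (fast_io_fail_tmo < 0)
                return (1 <= dev_loss_tmo &&
                        dev_loss_tmo <= SCSI_DEVICE_BLOCK_MAX_TIMEOUT) ?
                        0 : -EINVAL;
        /* fast_io_fail enabled: dev_loss either off or strictly larger. */
        if (dev_loss_tmo < 0 ||
            (fast_io_fail_tmo < dev_loss_tmo && dev_loss_tmo < LONG_MAX / HZ))
                return 0;
        return -EINVAL;
}
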
>> <snip>
>>     
>>> + * srp_reconnect_rport - reconnect by invoking 
>>> srp_function_template.reconnect()
>>> + *
>>> + * Blocks SCSI command queueing before invoking reconnect() such that 
>>> the
>>> + * scsi_host_template.queuecommand() won't be invoked concurrently with
>>> + * reconnect(). This is important since a reconnect() implementation may
>>> + * reallocate resources needed by queuecommand(). Please note that this
>>> + * function neither waits until outstanding requests have finished 
>>> nor tries
>>> + * to abort these. It is the responsibility of the reconnect() 
>>> function to
>>> + * finish outstanding commands before reconnecting to the target port.
>>> + */
>>> +int srp_reconnect_rport(struct srp_rport *rport)
>>> +{
>>> +    struct Scsi_Host *shost = rport_to_shost(rport);
>>> +    struct srp_internal *i = to_srp_internal(shost->transportt);
>>> +    struct scsi_device *sdev;
>>> +    int res;
>>> +
>>> +    pr_debug("SCSI host %s\n", dev_name(&shost->shost_gendev));
>>> +
>>> +    res = mutex_lock_interruptible(&rport->mutex);
>>> +    if (res) {
>>> +        pr_debug("%s: mutex_lock_interruptible() returned %d\n",
>>> +             dev_name(&shost->shost_gendev), res);
>>> +        goto out;
>>> +    }
>>> +
>>> +    spin_lock_irq(shost->host_lock);
>>> +    scsi_block_requests(shost);
>>> +    spin_unlock_irq(shost->host_lock);
>>> +
>>>       
>> In scsi_block_requests() definition, no locks are assumed held.
>>     
>
> Good catch :-) However, if you look around in drivers/scsi you will see that several SCSI LLD drivers invoke scsi_block_requests() with the host lock held. I'm not sure whether these LLDs or the scsi_block_requests() documentation is incorrect. Anyway, I'll leave the locking statements out since these are not necessary around this call of scsi_block_requests().
>
>   
>> If rport's state is already SRP_RPORT_BLOCKED, I don't think we need to 
>> do extra block with scsi_block_requests()
>>     
>
> Please keep in mind that srp_reconnect_rport() can be called from two different contexts: that function can not only be called from inside the SRP transport layer but also from inside the SCSI error handler (see also the srp_reset_device() modifications in a later patch in this series). If this function is invoked from the context of the SCSI error handler the chance is high that the SCSI device will have another state than SDEV_BLOCK. Hence the scsi_block_requests() call in this function.
>   
Yes, srp_reconnect_rport() can be called from two contexts; however, it 
deals with the same rport and the rport's state.
I'm thinking of something like this:

    if (rport->state != SRP_RPORT_BLOCKED)
        scsi_block_requests(shost);

    while (scsi_request_fn_active(shost))
        msleep(20);

    res = i->f->reconnect(rport);

    pr_debug("%s (state %d): transport.reconnect() returned %d\n",
         dev_name(&shost->shost_gendev), rport->state, res);
    if (res == 0) {
        cancel_delayed_work(&rport->fast_io_fail_work);
        cancel_delayed_work(&rport->dev_loss_work);
        rport->failed_reconnects = 0;
        scsi_unblock_requests(shost);
        srp_rport_set_state(rport, SRP_RPORT_RUNNING);
        /*
         * It can occur that after fast_io_fail_tmo expired and before
         * dev_loss_tmo expired that the SCSI error handler has
         * offlined one or more devices. scsi_target_unblock() doesn't
         * change the state of these devices into running, so do that
         * explicitly.
         */
        spin_lock_irq(shost->host_lock);
        __shost_for_each_device(sdev, shost)
            if (sdev->sdev_state == SDEV_OFFLINE)
                sdev->sdev_state = SDEV_RUNNING;
        spin_unlock_irq(shost->host_lock);
    } else if (rport->state != SRP_RPORT_BLOCKED) {
        scsi_unblock_requests(shost);
        mutex_unlock(&rport->mutex);
        /* We should take into account the time that has already elapsed,
         * i.e. the SCSI command timeout in seconds. */
        srp_start_tl_fail_timers(rport);
        mutex_lock(&rport->mutex);
    }

>   
>> Shouldn't we avoid using both scsi_block/unblock_requests() and 
>> scsi_target_block/unblock()?
>> ie. in srp_start_tl_fail_timers()  call scsi_block_requests(), in 
>> __rport_fast_io_fail_timedout() and rport_dev_loss_timedout() call 
>> scsi_unblock_requests()
>> and using scsi_block/unblock_requests in this function.
>>     
>
> I agree that using only one of these two mechanisms would be more elegant. However, the advantage of using scsi_target_block() as long as fast_io_fail_tmo has not expired is that this functions set the SDEV_BLOCK state which is recognized by multipathd.
Could you point out the advantage of multipathd recognizing the 
SDEV_BLOCK state while fast_io_fail_tmo has not expired?

>  The reason scsi_block_requests() is used in the above function is to prevent that a user can reenable requests via sysfs while a new InfiniBand RC connection is being established. A call of srp_queuecommand() concurrently with srp_create_target_ib() can namely result in a kernel crash.
>
>   
Agreed.
I think that we can use only the pair 
scsi_block_requests()/scsi_unblock_requests() unless the advantage of 
multipathd recognizing the SDEV_BLOCK is noticeable.
>>> +    while (scsi_request_fn_active(shost))
>>> +        msleep(20);
>>> +
>>> +    res = i->f->reconnect(rport);
>>> +    scsi_unblock_requests(shost);
>>> +    pr_debug("%s (state %d): transport.reconnect() returned %d\n",
>>> +         dev_name(&shost->shost_gendev), rport->state, res);
>>> +    if (res == 0) {
>>> +        cancel_delayed_work(&rport->fast_io_fail_work);
>>> +        cancel_delayed_work(&rport->dev_loss_work);
>>> +        rport->failed_reconnects = 0;
>>> +        scsi_target_unblock(&shost->shost_gendev, SDEV_RUNNING);
>>> +        srp_rport_set_state(rport, SRP_RPORT_RUNNING);
>>> +        /*
>>> +         * It can occur that after fast_io_fail_tmo expired and before
>>> +         * dev_loss_tmo expired that the SCSI error handler has
>>> +         * offlined one or more devices. scsi_target_unblock() doesn't
>>> +         * change the state of these devices into running, so do that
>>> +         * explicitly.
>>> +         */
>>> +        spin_lock_irq(shost->host_lock);
>>> +        __shost_for_each_device(sdev, shost)
>>> +            if (sdev->sdev_state == SDEV_OFFLINE)
>>> +                sdev->sdev_state = SDEV_RUNNING;
>>> +        spin_unlock_irq(shost->host_lock);
>>> +    } else if (rport->state == SRP_RPORT_RUNNING) {
>>> +        srp_rport_set_state(rport, SRP_RPORT_FAIL_FAST);
>>> +        scsi_target_unblock(&shost->shost_gendev,
>>> +                    SDEV_TRANSPORT_OFFLINE);
>>>       
>> Would this unblock override the fast_io_fail_tmo functionality?
>>     
>
> Sorry but I don't think so. If something goes wrong with the transport layer then the function srp_start_tl_fail_timers() gets invoked. That function starts with changing the state from SRP_RPORT_RUNNING into SRP_RPORT_BLOCKED. In other words, the code below "if (rport->state == SRP_RPORT_RUNNING)" can only be reached if srp_reconnect_rport() is invoked from the context of the SCSI error handler and not if it is called from inside one of the new SRP transport layer timers. So whether or not that code will be reached depends on whether fast_io_fail_tmo is below or above the SCSI command timeout.
>
>   
Yes, if one sets the RC retry_count to 7 and fast_io_fail_tmo is larger than 
the SCSI command timeout, the SCSI command timeout may trigger before a 
transport error is detected, so srp_reconnect_rport() will be called in 
SCSI error handler context and the "else if" statement above will override 
the fast_io_fail_tmo functionality.
Please look back at my reference code above.
>>> +static enum blk_eh_timer_return srp_timed_out(struct scsi_cmnd *scmd)
>>> +{
>>> +    struct scsi_device *sdev = scmd->device;
>>> +
>>> +    pr_debug("timeout for sdev %s\n", dev_name(&sdev->sdev_gendev));
>>> +    return scsi_device_blocked(sdev) ? BLK_EH_RESET_TIMER :
>>> +        BLK_EH_NOT_HANDLED;
>>>       
>> If one turns off fast_io_fail_tmo, rport_fast_io_fail_timedout() won't be 
>> called, and if reconnect fails, scsi_device_blocked() remains true and error 
>> recovery cannot happen.
>> I think it should be
>> return (scsi_device_blocked(sdev) && rport->fast_io_fail_tmo > 0) ? 
>> BLK_EH_RESET_TIMER : BLK_EH_NOT_HANDLED;
>>     
>
> How about the following alternative ?
>   
It looks good. Let me do some testing.

-vu

> diff --git a/drivers/scsi/scsi_transport_srp.c b/drivers/scsi/scsi_transport_srp.c
> index 53b34b3..84574fb 100644
> --- a/drivers/scsi/scsi_transport_srp.c
> +++ b/drivers/scsi/scsi_transport_srp.c
> @@ -550,11 +550,12 @@ void srp_start_tl_fail_timers(struct srp_rport *rport)
>                 goto out_unlock;
>         pr_debug("%s new state: %d\n", dev_name(&shost->shost_gendev),
>                  rport->state);
> -       scsi_target_block(&shost->shost_gendev);
>  
> -       if (rport->fast_io_fail_tmo >= 0)
> +       if (rport->fast_io_fail_tmo >= 0) {
> +               scsi_target_block(&shost->shost_gendev);
>                 queue_delayed_work(system_long_wq, &rport->fast_io_fail_work,
>                                    1UL * rport->fast_io_fail_tmo * HZ);
> +       }
>         if (rport->dev_loss_tmo >= 0)
>                 queue_delayed_work(system_long_wq, &rport->dev_loss_work,
>                                    1UL * rport->dev_loss_tmo * HZ);
>
> Bart.
>   

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 07/14] scsi_transport_srp: Add transport layer error handling
  2013-06-13 19:43     ` Vu Pham
@ 2013-06-14 13:19       ` Bart Van Assche
       [not found]         ` <51BB1857.7040802-HInyCGIudOg@public.gmane.org>
  0 siblings, 1 reply; 21+ messages in thread
From: Bart Van Assche @ 2013-06-14 13:19 UTC (permalink / raw)
  To: Vu Pham
  Cc: Roland Dreier, David Dillow, Sebastian Riemer, linux-rdma,
	linux-scsi, James Bottomley

On 06/13/13 21:43, Vu Pham wrote:
> Hello Bart,
> 
>>
>> +What:        /sys/class/srp_remote_ports/port-<h>:<n>/dev_loss_tmo
>> +Date:        September 1, 2013
>> +KernelVersion:    3.11
>> +Contact:    linux-scsi@vger.kernel.org, linux-rdma@vger.kernel.org
>> +Description:    Number of seconds the SCSI layer will wait after a 
>> transport
>> +        layer error has been observed before removing a target port.
>> +        Zero means immediate removal.
> A negative value will disable the target port removal.
>
> <snip>
>> +
>> +/**
>> + * srp_tmo_valid() - check timeout combination validity
>> + *
>> + * If no fast I/O fail timeout has been configured then the device 
>> loss timeout
>> + * must be below SCSI_DEVICE_BLOCK_MAX_TIMEOUT. If a fast I/O fail 
>> timeout has
>> + * been configured then it must be below the device loss timeout.
>> + */
>> +int srp_tmo_valid(int fast_io_fail_tmo, int dev_loss_tmo)
>> +{
>> +    return (fast_io_fail_tmo < 0 && 1 <= dev_loss_tmo &&
>> +        dev_loss_tmo <= SCSI_DEVICE_BLOCK_MAX_TIMEOUT)
>> +        || (0 <= fast_io_fail_tmo &&
>> +            (dev_loss_tmo < 0 ||
>> +             (fast_io_fail_tmo < dev_loss_tmo &&
>> +              dev_loss_tmo < LONG_MAX / HZ))) ? 0 : -EINVAL;
>> +}
>> +EXPORT_SYMBOL_GPL(srp_tmo_valid);
> When fast_io_fail_tmo is off, one cannot turn off dev_loss_tmo with a 
> negative value; when dev_loss_tmo is off, one cannot turn off 
> fast_io_fail_tmo with a negative value.

OK, will update the documentation such that it correctly refers to "off" instead of a negative value and I will also mention that dev_loss_tmo can now be disabled.
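
For reference, this is how the srp_tmo_valid() implementation quoted above
treats a few combinations (assuming SCSI_DEVICE_BLOCK_MAX_TIMEOUT is 600, which
is its value if I remember correctly):

	srp_tmo_valid(-1, 600)  ->  0        /* fast_io_fail off, dev_loss within the block limit */
	srp_tmo_valid(-1, -1)   ->  -EINVAL  /* both off: rejected, the case you point out */
	srp_tmo_valid(15, 600)  ->  0        /* fast_io_fail below dev_loss */
	srp_tmo_valid(15, -1)   ->  0        /* dev_loss off, fast_io_fail set */
	srp_tmo_valid(600, 15)  ->  -EINVAL  /* fast_io_fail not below dev_loss */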

> <snip>
>>
>> + * srp_reconnect_rport - reconnect by invoking 
>> srp_function_template.reconnect()
>> + *
>> + * Blocks SCSI command queueing before invoking reconnect() such that 
>> the
>> + * scsi_host_template.queuecommand() won't be invoked concurrently with
>> + * reconnect(). This is important since a reconnect() implementation may
>> + * reallocate resources needed by queuecommand(). Please note that this
>> + * function neither waits until outstanding requests have finished 
>> nor tries
>> + * to abort these. It is the responsibility of the reconnect() 
>> function to
>> + * finish outstanding commands before reconnecting to the target port.
>> + */
>> +int srp_reconnect_rport(struct srp_rport *rport)
>> +{
>> +    struct Scsi_Host *shost = rport_to_shost(rport);
>> +    struct srp_internal *i = to_srp_internal(shost->transportt);
>> +    struct scsi_device *sdev;
>> +    int res;
>> +
>> +    pr_debug("SCSI host %s\n", dev_name(&shost->shost_gendev));
>> +
>> +    res = mutex_lock_interruptible(&rport->mutex);
>> +    if (res) {
>> +        pr_debug("%s: mutex_lock_interruptible() returned %d\n",
>> +             dev_name(&shost->shost_gendev), res);
>> +        goto out;
>> +    }
>> +
>> +    spin_lock_irq(shost->host_lock);
>> +    scsi_block_requests(shost);
>> +    spin_unlock_irq(shost->host_lock);
>> +
> In scsi_block_requests() definition, no locks are assumed held.

Good catch :-) However, if you look around in drivers/scsi you will see that several SCSI LLDs invoke scsi_block_requests() with the host lock held. I'm not sure whether it is these LLDs or the scsi_block_requests() documentation that is incorrect. Anyway, I'll leave the locking statements out since they are not necessary around this call of scsi_block_requests().
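
The start of srp_reconnect_rport() then simply becomes (untested, the rest of
the function stays as posted):

	res = mutex_lock_interruptible(&rport->mutex);
	if (res) {
		pr_debug("%s: mutex_lock_interruptible() returned %d\n",
			 dev_name(&shost->shost_gendev), res);
		goto out;
	}

	scsi_block_requests(shost);

	while (scsi_request_fn_active(shost))
		msleep(20);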

> If rport's state is already SRP_RPORT_BLOCKED, I don't think we need to 
> do extra block with scsi_block_requests()

Please keep in mind that srp_reconnect_rport() can be called from two different contexts: that function can not only be called from inside the SRP transport layer but also from inside the SCSI error handler (see also the srp_reset_device() modifications in a later patch in this series). If this function is invoked from the context of the SCSI error handler the chance is high that the SCSI device will have a state other than SDEV_BLOCK. Hence the scsi_block_requests() call in this function.

> Shouldn't we avoid using both scsi_block/unblock_requests() and 
> scsi_target_block/unblock()?
> ie. in srp_start_tl_fail_timers()  call scsi_block_requests(), in 
> __rport_fast_io_fail_timedout() and rport_dev_loss_timedout() call 
> scsi_unblock_requests()
> and using scsi_block/unblock_requests in this function.

I agree that using only one of these two mechanisms would be more elegant. However, the advantage of using scsi_target_block() as long as fast_io_fail_tmo has not expired is that this function sets the SDEV_BLOCK state which is recognized by multipathd. The reason scsi_block_requests() is used in the above function is to prevent a user from re-enabling requests via sysfs while a new InfiniBand RC connection is being established. A call of srp_queuecommand() that runs concurrently with srp_create_target_ib() can namely result in a kernel crash.

>> +    while (scsi_request_fn_active(shost))
>> +        msleep(20);
>> +
>> +    res = i->f->reconnect(rport);
>> +    scsi_unblock_requests(shost);
>> +    pr_debug("%s (state %d): transport.reconnect() returned %d\n",
>> +         dev_name(&shost->shost_gendev), rport->state, res);
>> +    if (res == 0) {
>> +        cancel_delayed_work(&rport->fast_io_fail_work);
>> +        cancel_delayed_work(&rport->dev_loss_work);
>> +        rport->failed_reconnects = 0;
>> +        scsi_target_unblock(&shost->shost_gendev, SDEV_RUNNING);
>> +        srp_rport_set_state(rport, SRP_RPORT_RUNNING);
>> +        /*
>> +         * It can occur that after fast_io_fail_tmo expired and before
>> +         * dev_loss_tmo expired that the SCSI error handler has
>> +         * offlined one or more devices. scsi_target_unblock() doesn't
>> +         * change the state of these devices into running, so do that
>> +         * explicitly.
>> +         */
>> +        spin_lock_irq(shost->host_lock);
>> +        __shost_for_each_device(sdev, shost)
>> +            if (sdev->sdev_state == SDEV_OFFLINE)
>> +                sdev->sdev_state = SDEV_RUNNING;
>> +        spin_unlock_irq(shost->host_lock);
>> +    } else if (rport->state == SRP_RPORT_RUNNING) {
>> +        srp_rport_set_state(rport, SRP_RPORT_FAIL_FAST);
>> +        scsi_target_unblock(&shost->shost_gendev,
>> +                    SDEV_TRANSPORT_OFFLINE);
> Would this unblock override the fast_io_fail_tmo functionality?

Sorry but I don't think so. If something goes wrong with the transport layer then the function srp_start_tl_fail_timers() gets invoked. That function starts with changing the state from SRP_RPORT_RUNNING into SRP_RPORT_BLOCKED. In other words, the code below "if (rport->state == SRP_RPORT_RUNNING)" can only be reached if srp_reconnect_rport() is invoked from the context of the SCSI error handler and not if it is called from inside one of the new SRP transport layer timers. So whether or not that code will be reached depends on whether fast_io_fail_tmo is below or above the SCSI command timeout.

>> +static enum blk_eh_timer_return srp_timed_out(struct scsi_cmnd *scmd)
>> +{
>> +    struct scsi_device *sdev = scmd->device;
>> +
>> +    pr_debug("timeout for sdev %s\n", dev_name(&sdev->sdev_gendev));
>> +    return scsi_device_blocked(sdev) ? BLK_EH_RESET_TIMER :
>> +        BLK_EH_NOT_HANDLED;
> if one turn off fast_io_fail_tmo, rport_fast_io_fail_timedout() won't be 
> called, and reconnect is failed, scsi_device_blocked remains true, error 
> recovery cannot happen
> I think it should be
> return (scsi_device_blocked(sdev) && rport->fast_io_fail_tmo > 0) ? 
> BLK_EH_RESET_TIMER : BLK_EH_NOT_HANDLED;

How about the following alternative ?

diff --git a/drivers/scsi/scsi_transport_srp.c b/drivers/scsi/scsi_transport_srp.c
index 53b34b3..84574fb 100644
--- a/drivers/scsi/scsi_transport_srp.c
+++ b/drivers/scsi/scsi_transport_srp.c
@@ -550,11 +550,12 @@ void srp_start_tl_fail_timers(struct srp_rport *rport)
                goto out_unlock;
        pr_debug("%s new state: %d\n", dev_name(&shost->shost_gendev),
                 rport->state);
-       scsi_target_block(&shost->shost_gendev);
 
-       if (rport->fast_io_fail_tmo >= 0)
+       if (rport->fast_io_fail_tmo >= 0) {
+               scsi_target_block(&shost->shost_gendev);
                queue_delayed_work(system_long_wq, &rport->fast_io_fail_work,
                                   1UL * rport->fast_io_fail_tmo * HZ);
+       }
        if (rport->dev_loss_tmo >= 0)
                queue_delayed_work(system_long_wq, &rport->dev_loss_work,
                                   1UL * rport->dev_loss_tmo * HZ);

Bart.

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH 07/14] scsi_transport_srp: Add transport layer error handling
       [not found]   ` <51B8777B.5050201-HInyCGIudOg@public.gmane.org>
@ 2013-06-13 19:43     ` Vu Pham
  2013-06-14 13:19       ` Bart Van Assche
  0 siblings, 1 reply; 21+ messages in thread
From: Vu Pham @ 2013-06-13 19:43 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Roland Dreier, David Dillow, Sebastian Riemer, linux-rdma,
	linux-scsi, James Bottomley

Hello Bart,

>  
> +What:		/sys/class/srp_remote_ports/port-<h>:<n>/dev_loss_tmo
> +Date:		September 1, 2013
> +KernelVersion:	3.11
> +Contact:	linux-scsi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> +Description:	Number of seconds the SCSI layer will wait after a transport
> +		layer error has been observed before removing a target port.
> +		Zero means immediate removal.
>   
A negative value will disable the target port removal.
> +
> +What:		/sys/class/srp_remote_ports/port-<h>:<n>/fast_io_fail_tmo
> +Date:		September 1, 2013
> +KernelVersion:	3.11
> +Contact:	linux-scsi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> +Description:	Number of seconds the SCSI layer will wait after a transport
> +		layer error has been observed before failing I/O. Zero means
> +		immediate removal. A negative value will disable this
> +		behavior.
>
>   
<snip>
> +
> +/**
> + * srp_tmo_valid() - check timeout combination validity
> + *
> + * If no fast I/O fail timeout has been configured then the device loss timeout
> + * must be below SCSI_DEVICE_BLOCK_MAX_TIMEOUT. If a fast I/O fail timeout has
> + * been configured then it must be below the device loss timeout.
> + */
> +int srp_tmo_valid(int fast_io_fail_tmo, int dev_loss_tmo)
> +{
> +	return (fast_io_fail_tmo < 0 && 1 <= dev_loss_tmo &&
> +		dev_loss_tmo <= SCSI_DEVICE_BLOCK_MAX_TIMEOUT)
> +		|| (0 <= fast_io_fail_tmo &&
> +		    (dev_loss_tmo < 0 ||
> +		     (fast_io_fail_tmo < dev_loss_tmo &&
> +		      dev_loss_tmo < LONG_MAX / HZ))) ? 0 : -EINVAL;
> +}
> +EXPORT_SYMBOL_GPL(srp_tmo_valid);
>   
When fast_io_fail_tmo is off, one cannot turn off dev_loss_tmo with a 
negative value; when dev_loss_tmo is off, one cannot turn off 
fast_io_fail_tmo with a negative value.

<snip>
>
> + * srp_reconnect_rport - reconnect by invoking srp_function_template.reconnect()
> + *
> + * Blocks SCSI command queueing before invoking reconnect() such that the
> + * scsi_host_template.queuecommand() won't be invoked concurrently with
> + * reconnect(). This is important since a reconnect() implementation may
> + * reallocate resources needed by queuecommand(). Please note that this
> + * function neither waits until outstanding requests have finished nor tries
> + * to abort these. It is the responsibility of the reconnect() function to
> + * finish outstanding commands before reconnecting to the target port.
> + */
> +int srp_reconnect_rport(struct srp_rport *rport)
> +{
> +	struct Scsi_Host *shost = rport_to_shost(rport);
> +	struct srp_internal *i = to_srp_internal(shost->transportt);
> +	struct scsi_device *sdev;
> +	int res;
> +
> +	pr_debug("SCSI host %s\n", dev_name(&shost->shost_gendev));
> +
> +	res = mutex_lock_interruptible(&rport->mutex);
> +	if (res) {
> +		pr_debug("%s: mutex_lock_interruptible() returned %d\n",
> +			 dev_name(&shost->shost_gendev), res);
> +		goto out;
> +	}
> +
> +	spin_lock_irq(shost->host_lock);
> +	scsi_block_requests(shost);
> +	spin_unlock_irq(shost->host_lock);
> +
>   
In scsi_block_requests() definition, no locks are assumed held.

If rport's state is already SRP_RPORT_BLOCKED, I don't think we need to 
do extra block with scsi_block_requests()

Shouldn't we avoid using both scsi_block/unblock_requests() and 
scsi_target_block/unblock()?
ie. in srp_start_tl_fail_timers()  call scsi_block_requests(), in 
__rport_fast_io_fail_timedout() and rport_dev_loss_timedout() call 
scsi_unblock_requests()
and using scsi_block/unblock_requests in this function.
> +	while (scsi_request_fn_active(shost))
> +		msleep(20);
> +
> +	res = i->f->reconnect(rport);
> +	scsi_unblock_requests(shost);
> +	pr_debug("%s (state %d): transport.reconnect() returned %d\n",
> +		 dev_name(&shost->shost_gendev), rport->state, res);
> +	if (res == 0) {
> +		cancel_delayed_work(&rport->fast_io_fail_work);
> +		cancel_delayed_work(&rport->dev_loss_work);
> +		rport->failed_reconnects = 0;
> +		scsi_target_unblock(&shost->shost_gendev, SDEV_RUNNING);
> +		srp_rport_set_state(rport, SRP_RPORT_RUNNING);
> +		/*
> +		 * It can occur that after fast_io_fail_tmo expired and before
> +		 * dev_loss_tmo expired that the SCSI error handler has
> +		 * offlined one or more devices. scsi_target_unblock() doesn't
> +		 * change the state of these devices into running, so do that
> +		 * explicitly.
> +		 */
> +		spin_lock_irq(shost->host_lock);
> +		__shost_for_each_device(sdev, shost)
> +			if (sdev->sdev_state == SDEV_OFFLINE)
> +				sdev->sdev_state = SDEV_RUNNING;
> +		spin_unlock_irq(shost->host_lock);
> +	} else if (rport->state == SRP_RPORT_RUNNING) {
> +		srp_rport_set_state(rport, SRP_RPORT_FAIL_FAST);
> +		scsi_target_unblock(&shost->shost_gendev,
> +				    SDEV_TRANSPORT_OFFLINE);
>   
Would this unblock override the fast_io_fail_tmo functionality?

> +	}
> +	mutex_unlock(&rport->mutex);
> +
> +out:
> +	return res;
> +}
> +EXPORT_SYMBOL(srp_reconnect_rport);
> +
> +static void srp_reconnect_work(struct work_struct *work)
> +{
> +	struct srp_rport *rport = container_of(to_delayed_work(work),
> +					struct srp_rport, reconnect_work);
> +	struct Scsi_Host *shost = rport_to_shost(rport);
> +	int delay, res;
> +
> +	res = srp_reconnect_rport(rport);
> +	if (res != 0) {
> +		shost_printk(KERN_ERR, shost,
> +			     "reconnect attempt %d failed (%d)\n",
> +			     ++rport->failed_reconnects, res);
> +		delay = rport->reconnect_delay *
> +			min(100, max(1, rport->failed_reconnects - 10));
> +		if (delay > 0)
> +			queue_delayed_work(system_long_wq,
> +					   &rport->reconnect_work, delay * HZ);
> +	}
> +}
> +
> +static void __rport_fast_io_fail_timedout(struct srp_rport *rport)
> +{
> +	struct Scsi_Host *shost = rport_to_shost(rport);
> +	struct srp_internal *i;
> +
> +	lockdep_assert_held(&rport->mutex);
> +
> +	if (srp_rport_set_state(rport, SRP_RPORT_FAIL_FAST))
> +		return;
> +	scsi_target_unblock(rport->dev.parent, SDEV_TRANSPORT_OFFLINE);
> +
> +	/* Involve the LLDD if possible to terminate all io on the rport. */
> +	i = to_srp_internal(shost->transportt);
> +	if (i->f->terminate_rport_io)
> +		i->f->terminate_rport_io(rport);
> +}
> +
> +/**
> + * rport_fast_io_fail_timedout() - fast I/O failure timeout handler
> + *
> + * Unblocks the SCSI host.
> + */
> +static void rport_fast_io_fail_timedout(struct work_struct *work)
> +{
> +	struct srp_rport *rport = container_of(to_delayed_work(work),
> +					struct srp_rport, fast_io_fail_work);
> +	struct Scsi_Host *shost = rport_to_shost(rport);
> +
> +	pr_debug("fast_io_fail_tmo expired for %s.\n",
> +		 dev_name(&shost->shost_gendev));
> +
> +	mutex_lock(&rport->mutex);
> +	__rport_fast_io_fail_timedout(rport);
> +	mutex_unlock(&rport->mutex);
> +}
> +
> +/**
> + * rport_dev_loss_timedout() - device loss timeout handler
> + *
> + * Note: rport->ft->rport_delete must either unblock the SCSI host or schedule
> + * SCSI host removal.
> + */
> +static void rport_dev_loss_timedout(struct work_struct *work)
> +{
> +	struct srp_rport *rport = container_of(to_delayed_work(work),
> +					struct srp_rport, dev_loss_work);
> +	struct Scsi_Host *shost = rport_to_shost(rport);
> +	struct srp_internal *i = to_srp_internal(shost->transportt);
> +
> +	pr_err("dev_loss_tmo expired for %s.\n",
> +	       dev_name(&shost->shost_gendev));
> +
> +	mutex_lock(&rport->mutex);
> +	WARN_ON(srp_rport_set_state(rport, SRP_RPORT_LOST) != 0);
> +	scsi_target_unblock(rport->dev.parent, SDEV_TRANSPORT_OFFLINE);
> +	mutex_unlock(&rport->mutex);
> +
> +	i->f->rport_delete(rport);
> +}
> +
> +/**
> + * srp_start_tl_fail_timers() - start the transport layer failure timers
> + *
> + * Start the transport layer fast I/O failure and device loss timers. Do not
> + * modify a timer that was already started.
> + */
> +void srp_start_tl_fail_timers(struct srp_rport *rport)
> +{
> +	struct Scsi_Host *shost = rport_to_shost(rport);
> +	int delay;
> +
> +	mutex_lock(&rport->mutex);
> +	pr_debug("%s current state: %d\n", dev_name(&shost->shost_gendev),
> +		 rport->state);
> +	if (srp_rport_set_state(rport, SRP_RPORT_BLOCKED) != 0)
> +		goto out_unlock;
> +	pr_debug("%s new state: %d\n", dev_name(&shost->shost_gendev),
> +		 rport->state);
> +	scsi_target_block(&shost->shost_gendev);
> +
> +	if (rport->fast_io_fail_tmo >= 0)
> +		queue_delayed_work(system_long_wq, &rport->fast_io_fail_work,
> +				   1UL * rport->fast_io_fail_tmo * HZ);
> +	if (rport->dev_loss_tmo >= 0)
> +		queue_delayed_work(system_long_wq, &rport->dev_loss_work,
> +				   1UL * rport->dev_loss_tmo * HZ);
> +	delay = rport->reconnect_delay;
> +	if (delay > 0)
> +		queue_delayed_work(system_long_wq, &rport->reconnect_work,
> +				   1UL * delay * HZ);
> +out_unlock:
> +	mutex_unlock(&rport->mutex);
> +}
> +EXPORT_SYMBOL(srp_start_tl_fail_timers);
> +
> +/**
> + * srp_timed_out - SRP transport intercept of the SCSI timeout EH
> + *
> + * If a timeout occurs while an rport is in the blocked state, ask the SCSI
> + * EH to continue waiting (BLK_EH_RESET_TIMER). Otherwise let the SCSI core
> + * handle the timeout (BLK_EH_NOT_HANDLED).
> + *
> + * Note: This function is called from soft-IRQ context and with the request
> + * queue lock held.
> + */
> +static enum blk_eh_timer_return srp_timed_out(struct scsi_cmnd *scmd)
> +{
> +	struct scsi_device *sdev = scmd->device;
> +
> +	pr_debug("timeout for sdev %s\n", dev_name(&sdev->sdev_gendev));
> +	return scsi_device_blocked(sdev) ? BLK_EH_RESET_TIMER :
> +		BLK_EH_NOT_HANDLED;
>   
If one turns off fast_io_fail_tmo, rport_fast_io_fail_timedout() won't be 
called, and if reconnect fails, scsi_device_blocked() remains true and error 
recovery cannot happen.
I think it should be something like the sketch below:
return (scsi_device_blocked(sdev) && rport->fast_io_fail_tmo > 0) ? 
BLK_EH_RESET_TIMER : BLK_EH_NOT_HANDLED;
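
As a complete function that could look as follows (untested sketch;
shost_to_rport() is a made-up helper to look up the srp_rport that belongs to
a Scsi_Host, your patch does not provide one):

static enum blk_eh_timer_return srp_timed_out(struct scsi_cmnd *scmd)
{
	struct scsi_device *sdev = scmd->device;
	struct srp_rport *rport = shost_to_rport(sdev->host); /* hypothetical */

	pr_debug("timeout for sdev %s\n", dev_name(&sdev->sdev_gendev));
	return scsi_device_blocked(sdev) && rport->fast_io_fail_tmo > 0 ?
		BLK_EH_RESET_TIMER : BLK_EH_NOT_HANDLED;
}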

-vu
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH 07/14] scsi_transport_srp: Add transport layer error handling
  2013-06-12 13:17 [PATCH 0/14] IB SRP initiator patches for kernel 3.11 Bart Van Assche
@ 2013-06-12 13:28 ` Bart Van Assche
       [not found]   ` <51B8777B.5050201-HInyCGIudOg@public.gmane.org>
  2013-06-23 21:13   ` Mike Christie
  0 siblings, 2 replies; 21+ messages in thread
From: Bart Van Assche @ 2013-06-12 13:28 UTC (permalink / raw)
  To: Roland Dreier
  Cc: David Dillow, Vu Pham, Sebastian Riemer, linux-rdma, linux-scsi,
	James Bottomley

Add the necessary functions in the SRP transport module to allow
an SRP initiator driver to implement transport layer error handling
similar to the functionality already provided by the FC transport
layer. This includes:
- Support for implementing fast_io_fail_tmo, the time that should
  elapse after having detected a transport layer problem and
  before failing I/O.
- Support for implementing dev_loss_tmo, the time that should
  elapse after having detected a transport layer problem and
  before removing a remote port.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Cc: Roland Dreier <roland@purestorage.com>
Cc: James Bottomley <JBottomley@Parallels.com>
Cc: David Dillow <dillowda@ornl.gov>
Cc: Vu Pham <vu@mellanox.com>
Cc: Sebastian Riemer <sebastian.riemer@profitbricks.com>
---
 Documentation/ABI/stable/sysfs-transport-srp |   36 ++
 drivers/scsi/scsi_transport_srp.c            |  489 +++++++++++++++++++++++++-
 include/scsi/scsi_transport_srp.h            |   54 ++-
 3 files changed, 576 insertions(+), 3 deletions(-)

diff --git a/Documentation/ABI/stable/sysfs-transport-srp b/Documentation/ABI/stable/sysfs-transport-srp
index b36fb0d..b7759b1 100644
--- a/Documentation/ABI/stable/sysfs-transport-srp
+++ b/Documentation/ABI/stable/sysfs-transport-srp
@@ -5,6 +5,23 @@ Contact:	linux-scsi@vger.kernel.org, linux-rdma@vger.kernel.org
 Description:	Instructs an SRP initiator to disconnect from a target and to
 		remove all LUNs imported from that target.
 
+What:		/sys/class/srp_remote_ports/port-<h>:<n>/dev_loss_tmo
+Date:		September 1, 2013
+KernelVersion:	3.11
+Contact:	linux-scsi@vger.kernel.org, linux-rdma@vger.kernel.org
+Description:	Number of seconds the SCSI layer will wait after a transport
+		layer error has been observed before removing a target port.
+		Zero means immediate removal.
+
+What:		/sys/class/srp_remote_ports/port-<h>:<n>/fast_io_fail_tmo
+Date:		September 1, 2013
+KernelVersion:	3.11
+Contact:	linux-scsi@vger.kernel.org, linux-rdma@vger.kernel.org
+Description:	Number of seconds the SCSI layer will wait after a transport
+		layer error has been observed before failing I/O. Zero means
+		immediate removal. A negative value will disable this
+		behavior.
+
 What:		/sys/class/srp_remote_ports/port-<h>:<n>/port_id
 Date:		June 27, 2007
 KernelVersion:	2.6.24
@@ -12,8 +29,27 @@ Contact:	linux-scsi@vger.kernel.org
 Description:	16-byte local SRP port identifier in hexadecimal format. An
 		example: 4c:49:4e:55:58:20:56:49:4f:00:00:00:00:00:00:00.
 
+What:		/sys/class/srp_remote_ports/port-<h>:<n>/reconnect_delay
+Date:		September 1, 2013
+KernelVersion:	3.11
+Contact:	linux-scsi@vger.kernel.org, linux-rdma@vger.kernel.org
+Description:	Number of seconds the SCSI layer will wait after a reconnect
+		attempt failed before retrying.
+
 What:		/sys/class/srp_remote_ports/port-<h>:<n>/roles
 Date:		June 27, 2007
 KernelVersion:	2.6.24
 Contact:	linux-scsi@vger.kernel.org
 Description:	Role of the remote port. Either "SRP Initiator" or "SRP Target".
+
+What:		/sys/class/srp_remote_ports/port-<h>:<n>/state
+Date:		September 1, 2013
+KernelVersion:	3.11
+Contact:	linux-scsi@vger.kernel.org, linux-rdma@vger.kernel.org
+Description:	State of the transport layer to the remote port. "running" if
+		the transport layer is operational; "blocked" if a transport
+		layer error has been encountered but the fast_io_fail_tmo
+		timer has not yet fired; "fail-fast" after the
+		fast_io_fail_tmo timer has fired and before the "dev_loss_tmo"
+		timer has fired; "lost" after the "dev_loss_tmo" timer has
+		fired and before the port is finally removed.
diff --git a/drivers/scsi/scsi_transport_srp.c b/drivers/scsi/scsi_transport_srp.c
index f7ba94a..53b34b3 100644
--- a/drivers/scsi/scsi_transport_srp.c
+++ b/drivers/scsi/scsi_transport_srp.c
@@ -24,12 +24,15 @@
 #include <linux/err.h>
 #include <linux/slab.h>
 #include <linux/string.h>
+#include <linux/delay.h>
 
 #include <scsi/scsi.h>
 #include <scsi/scsi_device.h>
 #include <scsi/scsi_host.h>
 #include <scsi/scsi_transport.h>
+#include <scsi/scsi_cmnd.h>
 #include <scsi/scsi_transport_srp.h>
+#include "scsi_priv.h"
 #include "scsi_transport_srp_internal.h"
 
 struct srp_host_attrs {
@@ -38,7 +41,7 @@ struct srp_host_attrs {
 #define to_srp_host_attrs(host)	((struct srp_host_attrs *)(host)->shost_data)
 
 #define SRP_HOST_ATTRS 0
-#define SRP_RPORT_ATTRS 3
+#define SRP_RPORT_ATTRS 8
 
 struct srp_internal {
 	struct scsi_transport_template t;
@@ -54,6 +57,28 @@ struct srp_internal {
 
 #define	dev_to_rport(d)	container_of(d, struct srp_rport, dev)
 #define transport_class_to_srp_rport(dev) dev_to_rport((dev)->parent)
+static inline struct Scsi_Host *rport_to_shost(struct srp_rport *r)
+{
+	return dev_to_shost(r->dev.parent);
+}
+
+/**
+ * srp_tmo_valid() - check timeout combination validity
+ *
+ * If no fast I/O fail timeout has been configured then the device loss timeout
+ * must be below SCSI_DEVICE_BLOCK_MAX_TIMEOUT. If a fast I/O fail timeout has
+ * been configured then it must be below the device loss timeout.
+ */
+int srp_tmo_valid(int fast_io_fail_tmo, int dev_loss_tmo)
+{
+	return (fast_io_fail_tmo < 0 && 1 <= dev_loss_tmo &&
+		dev_loss_tmo <= SCSI_DEVICE_BLOCK_MAX_TIMEOUT)
+		|| (0 <= fast_io_fail_tmo &&
+		    (dev_loss_tmo < 0 ||
+		     (fast_io_fail_tmo < dev_loss_tmo &&
+		      dev_loss_tmo < LONG_MAX / HZ))) ? 0 : -EINVAL;
+}
+EXPORT_SYMBOL_GPL(srp_tmo_valid);
 
 static int srp_host_setup(struct transport_container *tc, struct device *dev,
 			  struct device *cdev)
@@ -134,10 +159,441 @@ static ssize_t store_srp_rport_delete(struct device *dev,
 
 static DEVICE_ATTR(delete, S_IWUSR, NULL, store_srp_rport_delete);
 
+static ssize_t show_srp_rport_state(struct device *dev,
+				    struct device_attribute *attr,
+				    char *buf)
+{
+	static const char *const state_name[] = {
+		[SRP_RPORT_RUNNING]	= "running",
+		[SRP_RPORT_BLOCKED]	= "blocked",
+		[SRP_RPORT_FAIL_FAST]	= "fail-fast",
+		[SRP_RPORT_LOST]	= "lost",
+	};
+	struct srp_rport *rport = transport_class_to_srp_rport(dev);
+	enum srp_rport_state state = rport->state;
+
+	return sprintf(buf, "%s\n",
+		       (unsigned)state < ARRAY_SIZE(state_name) ?
+		       state_name[state] : "???");
+}
+
+static DEVICE_ATTR(state, S_IRUGO, show_srp_rport_state, NULL);
+
+static ssize_t show_reconnect_delay(struct device *dev,
+				    struct device_attribute *attr, char *buf)
+{
+	struct srp_rport *rport = transport_class_to_srp_rport(dev);
+
+	return sprintf(buf, "%d\n", rport->reconnect_delay);
+}
+
+static ssize_t store_reconnect_delay(struct device *dev,
+				     struct device_attribute *attr,
+				     const char *buf, const size_t count)
+{
+	struct srp_rport *rport = transport_class_to_srp_rport(dev);
+	char ch[16];
+	int res, delay;
+
+	sprintf(ch, "%.*s", min_t(int, sizeof(ch) - 1, count), buf);
+	res = kstrtoint(ch, 0, &delay);
+	if (res)
+		goto out;
+
+	res = mutex_lock_interruptible(&rport->mutex);
+	if (res)
+		goto out;
+	if (rport->reconnect_delay <= 0 && delay > 0 &&
+	    rport->state != SRP_RPORT_RUNNING) {
+		queue_delayed_work(system_long_wq, &rport->reconnect_work,
+				   delay * HZ);
+	} else if (delay <= 0) {
+		cancel_delayed_work(&rport->reconnect_work);
+	}
+	rport->reconnect_delay = delay;
+	mutex_unlock(&rport->mutex);
+
+	res = count;
+
+out:
+	return res;
+}
+
+static DEVICE_ATTR(reconnect_delay, S_IRUGO | S_IWUSR, show_reconnect_delay,
+		   store_reconnect_delay);
+
+static ssize_t show_failed_reconnects(struct device *dev,
+				      struct device_attribute *attr, char *buf)
+{
+	struct srp_rport *rport = transport_class_to_srp_rport(dev);
+
+	return sprintf(buf, "%d\n", rport->failed_reconnects);
+}
+
+static DEVICE_ATTR(failed_reconnects, S_IRUGO, show_failed_reconnects, NULL);
+
+static ssize_t show_srp_rport_fast_io_fail_tmo(struct device *dev,
+					       struct device_attribute *attr,
+					       char *buf)
+{
+	struct srp_rport *rport = transport_class_to_srp_rport(dev);
+
+	if (rport->fast_io_fail_tmo >= 0)
+		return sprintf(buf, "%d\n", rport->fast_io_fail_tmo);
+	else
+		return sprintf(buf, "off\n");
+}
+
+static ssize_t store_srp_rport_fast_io_fail_tmo(struct device *dev,
+						struct device_attribute *attr,
+						const char *buf, size_t count)
+{
+	struct srp_rport *rport = transport_class_to_srp_rport(dev);
+	char ch[16], *p;
+	int res;
+	int fast_io_fail_tmo;
+
+	sprintf(ch, "%.*s", min_t(int, sizeof(ch) - 1, count), buf);
+	p = strchr(ch, '\n');
+	if (p)
+		*p = '\0';
+
+	if (strcmp(ch, "off") != 0) {
+		res = kstrtoint(ch, 0, &fast_io_fail_tmo);
+		if (res)
+			goto out;
+	} else {
+		fast_io_fail_tmo = -1;
+	}
+	res = srp_tmo_valid(fast_io_fail_tmo, rport->dev_loss_tmo);
+	if (res)
+		goto out;
+	rport->fast_io_fail_tmo = fast_io_fail_tmo;
+	res = count;
+
+out:
+	return res;
+}
+
+static DEVICE_ATTR(fast_io_fail_tmo, S_IRUGO | S_IWUSR,
+		   show_srp_rport_fast_io_fail_tmo,
+		   store_srp_rport_fast_io_fail_tmo);
+
+static ssize_t show_srp_rport_dev_loss_tmo(struct device *dev,
+					   struct device_attribute *attr,
+					   char *buf)
+{
+	struct srp_rport *rport = transport_class_to_srp_rport(dev);
+
+	if (rport->dev_loss_tmo >= 0)
+		return sprintf(buf, "%d\n", rport->dev_loss_tmo);
+	else
+		return sprintf(buf, "off\n");
+}
+
+static ssize_t store_srp_rport_dev_loss_tmo(struct device *dev,
+					    struct device_attribute *attr,
+					    const char *buf, size_t count)
+{
+	struct srp_rport *rport = transport_class_to_srp_rport(dev);
+	char ch[16];
+	int res;
+	int dev_loss_tmo;
+
+	sprintf(ch, "%.*s", min_t(int, sizeof(ch) - 1, count), buf);
+	if (strcmp(ch, "off") != 0) {
+		res = kstrtoint(ch, 0, &dev_loss_tmo);
+		if (res)
+			goto out;
+	} else {
+		dev_loss_tmo = -1;
+	}
+	res = srp_tmo_valid(rport->fast_io_fail_tmo, dev_loss_tmo);
+	if (res)
+		goto out;
+	rport->dev_loss_tmo = dev_loss_tmo;
+	res = count;
+
+out:
+	return res;
+}
+
+static DEVICE_ATTR(dev_loss_tmo, S_IRUGO | S_IWUSR,
+		   show_srp_rport_dev_loss_tmo,
+		   store_srp_rport_dev_loss_tmo);
+
+static int srp_rport_set_state(struct srp_rport *rport,
+			       enum srp_rport_state new_state)
+{
+	enum srp_rport_state old_state = rport->state;
+
+	lockdep_assert_held(&rport->mutex);
+
+	switch (new_state) {
+	case SRP_RPORT_RUNNING:
+		switch (old_state) {
+		case SRP_RPORT_LOST:
+			goto invalid;
+		default:
+			break;
+		}
+		break;
+	case SRP_RPORT_BLOCKED:
+		switch (old_state) {
+		case SRP_RPORT_RUNNING:
+			break;
+		default:
+			goto invalid;
+		}
+		break;
+	case SRP_RPORT_FAIL_FAST:
+		switch (old_state) {
+		case SRP_RPORT_LOST:
+			goto invalid;
+		default:
+			break;
+		}
+		break;
+	case SRP_RPORT_LOST:
+		break;
+	}
+	rport->state = new_state;
+	return 0;
+
+invalid:
+	return -EINVAL;
+}
+
+/**
+ * scsi_request_fn_active - number of kernel threads inside scsi_request_fn()
+ */
+static int scsi_request_fn_active(struct Scsi_Host *shost)
+{
+	struct scsi_device *sdev;
+	struct request_queue *q;
+	int request_fn_active = 0;
+
+	shost_for_each_device(sdev, shost) {
+		q = sdev->request_queue;
+
+		spin_lock_irq(q->queue_lock);
+		request_fn_active += q->request_fn_active;
+		spin_unlock_irq(q->queue_lock);
+	}
+
+	return request_fn_active;
+}
+
+/**
+ * srp_reconnect_rport - reconnect by invoking srp_function_template.reconnect()
+ *
+ * Blocks SCSI command queueing before invoking reconnect() such that the
+ * scsi_host_template.queuecommand() won't be invoked concurrently with
+ * reconnect(). This is important since a reconnect() implementation may
+ * reallocate resources needed by queuecommand(). Please note that this
+ * function neither waits until outstanding requests have finished nor tries
+ * to abort these. It is the responsibility of the reconnect() function to
+ * finish outstanding commands before reconnecting to the target port.
+ */
+int srp_reconnect_rport(struct srp_rport *rport)
+{
+	struct Scsi_Host *shost = rport_to_shost(rport);
+	struct srp_internal *i = to_srp_internal(shost->transportt);
+	struct scsi_device *sdev;
+	int res;
+
+	pr_debug("SCSI host %s\n", dev_name(&shost->shost_gendev));
+
+	res = mutex_lock_interruptible(&rport->mutex);
+	if (res) {
+		pr_debug("%s: mutex_lock_interruptible() returned %d\n",
+			 dev_name(&shost->shost_gendev), res);
+		goto out;
+	}
+
+	spin_lock_irq(shost->host_lock);
+	scsi_block_requests(shost);
+	spin_unlock_irq(shost->host_lock);
+
+	while (scsi_request_fn_active(shost))
+		msleep(20);
+
+	res = i->f->reconnect(rport);
+	scsi_unblock_requests(shost);
+	pr_debug("%s (state %d): transport.reconnect() returned %d\n",
+		 dev_name(&shost->shost_gendev), rport->state, res);
+	if (res == 0) {
+		cancel_delayed_work(&rport->fast_io_fail_work);
+		cancel_delayed_work(&rport->dev_loss_work);
+		rport->failed_reconnects = 0;
+		scsi_target_unblock(&shost->shost_gendev, SDEV_RUNNING);
+		srp_rport_set_state(rport, SRP_RPORT_RUNNING);
+		/*
+		 * It can occur that after fast_io_fail_tmo expired and before
+		 * dev_loss_tmo expired that the SCSI error handler has
+		 * offlined one or more devices. scsi_target_unblock() doesn't
+		 * change the state of these devices into running, so do that
+		 * explicitly.
+		 */
+		spin_lock_irq(shost->host_lock);
+		__shost_for_each_device(sdev, shost)
+			if (sdev->sdev_state == SDEV_OFFLINE)
+				sdev->sdev_state = SDEV_RUNNING;
+		spin_unlock_irq(shost->host_lock);
+	} else if (rport->state == SRP_RPORT_RUNNING) {
+		srp_rport_set_state(rport, SRP_RPORT_FAIL_FAST);
+		scsi_target_unblock(&shost->shost_gendev,
+				    SDEV_TRANSPORT_OFFLINE);
+	}
+	mutex_unlock(&rport->mutex);
+
+out:
+	return res;
+}
+EXPORT_SYMBOL(srp_reconnect_rport);
+
+static void srp_reconnect_work(struct work_struct *work)
+{
+	struct srp_rport *rport = container_of(to_delayed_work(work),
+					struct srp_rport, reconnect_work);
+	struct Scsi_Host *shost = rport_to_shost(rport);
+	int delay, res;
+
+	res = srp_reconnect_rport(rport);
+	if (res != 0) {
+		shost_printk(KERN_ERR, shost,
+			     "reconnect attempt %d failed (%d)\n",
+			     ++rport->failed_reconnects, res);
+		delay = rport->reconnect_delay *
+			min(100, max(1, rport->failed_reconnects - 10));
+		if (delay > 0)
+			queue_delayed_work(system_long_wq,
+					   &rport->reconnect_work, delay * HZ);
+	}
+}
+
+static void __rport_fast_io_fail_timedout(struct srp_rport *rport)
+{
+	struct Scsi_Host *shost = rport_to_shost(rport);
+	struct srp_internal *i;
+
+	lockdep_assert_held(&rport->mutex);
+
+	if (srp_rport_set_state(rport, SRP_RPORT_FAIL_FAST))
+		return;
+	scsi_target_unblock(rport->dev.parent, SDEV_TRANSPORT_OFFLINE);
+
+	/* Involve the LLDD if possible to terminate all io on the rport. */
+	i = to_srp_internal(shost->transportt);
+	if (i->f->terminate_rport_io)
+		i->f->terminate_rport_io(rport);
+}
+
+/**
+ * rport_fast_io_fail_timedout() - fast I/O failure timeout handler
+ *
+ * Unblocks the SCSI host.
+ */
+static void rport_fast_io_fail_timedout(struct work_struct *work)
+{
+	struct srp_rport *rport = container_of(to_delayed_work(work),
+					struct srp_rport, fast_io_fail_work);
+	struct Scsi_Host *shost = rport_to_shost(rport);
+
+	pr_debug("fast_io_fail_tmo expired for %s.\n",
+		 dev_name(&shost->shost_gendev));
+
+	mutex_lock(&rport->mutex);
+	__rport_fast_io_fail_timedout(rport);
+	mutex_unlock(&rport->mutex);
+}
+
+/**
+ * rport_dev_loss_timedout() - device loss timeout handler
+ *
+ * Note: rport->ft->rport_delete must either unblock the SCSI host or schedule
+ * SCSI host removal.
+ */
+static void rport_dev_loss_timedout(struct work_struct *work)
+{
+	struct srp_rport *rport = container_of(to_delayed_work(work),
+					struct srp_rport, dev_loss_work);
+	struct Scsi_Host *shost = rport_to_shost(rport);
+	struct srp_internal *i = to_srp_internal(shost->transportt);
+
+	pr_err("dev_loss_tmo expired for %s.\n",
+	       dev_name(&shost->shost_gendev));
+
+	mutex_lock(&rport->mutex);
+	WARN_ON(srp_rport_set_state(rport, SRP_RPORT_LOST) != 0);
+	scsi_target_unblock(rport->dev.parent, SDEV_TRANSPORT_OFFLINE);
+	mutex_unlock(&rport->mutex);
+
+	i->f->rport_delete(rport);
+}
+
+/**
+ * srp_start_tl_fail_timers() - start the transport layer failure timers
+ *
+ * Start the transport layer fast I/O failure and device loss timers. Do not
+ * modify a timer that was already started.
+ */
+void srp_start_tl_fail_timers(struct srp_rport *rport)
+{
+	struct Scsi_Host *shost = rport_to_shost(rport);
+	int delay;
+
+	mutex_lock(&rport->mutex);
+	pr_debug("%s current state: %d\n", dev_name(&shost->shost_gendev),
+		 rport->state);
+	if (srp_rport_set_state(rport, SRP_RPORT_BLOCKED) != 0)
+		goto out_unlock;
+	pr_debug("%s new state: %d\n", dev_name(&shost->shost_gendev),
+		 rport->state);
+	scsi_target_block(&shost->shost_gendev);
+
+	if (rport->fast_io_fail_tmo >= 0)
+		queue_delayed_work(system_long_wq, &rport->fast_io_fail_work,
+				   1UL * rport->fast_io_fail_tmo * HZ);
+	if (rport->dev_loss_tmo >= 0)
+		queue_delayed_work(system_long_wq, &rport->dev_loss_work,
+				   1UL * rport->dev_loss_tmo * HZ);
+	delay = rport->reconnect_delay;
+	if (delay > 0)
+		queue_delayed_work(system_long_wq, &rport->reconnect_work,
+				   1UL * delay * HZ);
+out_unlock:
+	mutex_unlock(&rport->mutex);
+}
+EXPORT_SYMBOL(srp_start_tl_fail_timers);
+
+/**
+ * srp_timed_out - SRP transport intercept of the SCSI timeout EH
+ *
+ * If a timeout occurs while an rport is in the blocked state, ask the SCSI
+ * EH to continue waiting (BLK_EH_RESET_TIMER). Otherwise let the SCSI core
+ * handle the timeout (BLK_EH_NOT_HANDLED).
+ *
+ * Note: This function is called from soft-IRQ context and with the request
+ * queue lock held.
+ */
+static enum blk_eh_timer_return srp_timed_out(struct scsi_cmnd *scmd)
+{
+	struct scsi_device *sdev = scmd->device;
+
+	pr_debug("timeout for sdev %s\n", dev_name(&sdev->sdev_gendev));
+	return scsi_device_blocked(sdev) ? BLK_EH_RESET_TIMER :
+		BLK_EH_NOT_HANDLED;
+}
+
 static void srp_rport_release(struct device *dev)
 {
 	struct srp_rport *rport = dev_to_rport(dev);
 
+	cancel_delayed_work_sync(&rport->reconnect_work);
+	cancel_delayed_work_sync(&rport->fast_io_fail_work);
+	cancel_delayed_work_sync(&rport->dev_loss_work);
+
 	put_device(dev->parent);
 	kfree(rport);
 }
@@ -214,12 +670,15 @@ struct srp_rport *srp_rport_add(struct Scsi_Host *shost,
 {
 	struct srp_rport *rport;
 	struct device *parent = &shost->shost_gendev;
+	struct srp_internal *i = to_srp_internal(shost->transportt);
 	int id, ret;
 
 	rport = kzalloc(sizeof(*rport), GFP_KERNEL);
 	if (!rport)
 		return ERR_PTR(-ENOMEM);
 
+	mutex_init(&rport->mutex);
+
 	device_initialize(&rport->dev);
 
 	rport->dev.parent = get_device(parent);
@@ -228,6 +687,17 @@ struct srp_rport *srp_rport_add(struct Scsi_Host *shost,
 	memcpy(rport->port_id, ids->port_id, sizeof(rport->port_id));
 	rport->roles = ids->roles;
 
+	if (i->f->reconnect)
+		rport->reconnect_delay = i->f->reconnect_delay ?
+			*i->f->reconnect_delay : 10;
+	INIT_DELAYED_WORK(&rport->reconnect_work, srp_reconnect_work);
+	rport->fast_io_fail_tmo = i->f->fast_io_fail_tmo ?
+		*i->f->fast_io_fail_tmo : 15;
+	rport->dev_loss_tmo = i->f->dev_loss_tmo ? *i->f->dev_loss_tmo : 600;
+	INIT_DELAYED_WORK(&rport->fast_io_fail_work,
+			  rport_fast_io_fail_timedout);
+	INIT_DELAYED_WORK(&rport->dev_loss_work, rport_dev_loss_timedout);
+
 	id = atomic_inc_return(&to_srp_host_attrs(shost)->next_port_id);
 	dev_set_name(&rport->dev, "port-%d:%d", shost->host_no, id);
 
@@ -277,6 +747,12 @@ void srp_rport_del(struct srp_rport *rport)
 	transport_remove_device(dev);
 	device_del(dev);
 	transport_destroy_device(dev);
+
+	mutex_lock(&rport->mutex);
+	if (rport->state == SRP_RPORT_BLOCKED)
+		__rport_fast_io_fail_timedout(rport);
+	mutex_unlock(&rport->mutex);
+
 	put_device(dev);
 }
 EXPORT_SYMBOL_GPL(srp_rport_del);
@@ -328,6 +804,8 @@ srp_attach_transport(struct srp_function_template *ft)
 	if (!i)
 		return NULL;
 
+	i->t.eh_timed_out = srp_timed_out;
+
 	i->t.tsk_mgmt_response = srp_tsk_mgmt_response;
 	i->t.it_nexus_response = srp_it_nexus_response;
 
@@ -345,6 +823,15 @@ srp_attach_transport(struct srp_function_template *ft)
 	count = 0;
 	i->rport_attrs[count++] = &dev_attr_port_id;
 	i->rport_attrs[count++] = &dev_attr_roles;
+	if (ft->has_rport_state) {
+		i->rport_attrs[count++] = &dev_attr_state;
+		i->rport_attrs[count++] = &dev_attr_fast_io_fail_tmo;
+		i->rport_attrs[count++] = &dev_attr_dev_loss_tmo;
+	}
+	if (ft->reconnect) {
+		i->rport_attrs[count++] = &dev_attr_reconnect_delay;
+		i->rport_attrs[count++] = &dev_attr_failed_reconnects;
+	}
 	if (ft->rport_delete)
 		i->rport_attrs[count++] = &dev_attr_delete;
 	i->rport_attrs[count++] = NULL;
diff --git a/include/scsi/scsi_transport_srp.h b/include/scsi/scsi_transport_srp.h
index 5a2d2d1..e077496 100644
--- a/include/scsi/scsi_transport_srp.h
+++ b/include/scsi/scsi_transport_srp.h
@@ -13,6 +13,18 @@ struct srp_rport_identifiers {
 	u8 roles;
 };
 
+enum srp_rport_state {
+	SRP_RPORT_RUNNING,
+	SRP_RPORT_BLOCKED,
+	SRP_RPORT_FAIL_FAST,
+	SRP_RPORT_LOST,
+};
+
+/**
+ * struct srp_rport
+ * @mutex:   Protects against concurrent rport fast_io_fail / dev_loss_tmo /
+ *   reconnect activity.
+ */
 struct srp_rport {
 	/* for initiator and target drivers */
 
@@ -23,11 +35,27 @@ struct srp_rport {
 
 	/* for initiator drivers */
 
-	void *lld_data;	/* LLD private data */
+	void			*lld_data;	/* LLD private data */
+
+	struct mutex		mutex;
+	enum srp_rport_state	state;
+	int			reconnect_delay;
+	int			failed_reconnects;
+	struct delayed_work	reconnect_work;
+	int			fast_io_fail_tmo;
+	int			dev_loss_tmo;
+	struct delayed_work	fast_io_fail_work;
+	struct delayed_work	dev_loss_work;
 };
 
 struct srp_function_template {
 	/* for initiator drivers */
+	bool has_rport_state;
+	int *reconnect_delay;
+	int *fast_io_fail_tmo;
+	int *dev_loss_tmo;
+	int (*reconnect)(struct srp_rport *rport);
+	void (*terminate_rport_io)(struct srp_rport *rport);
 	void (*rport_delete)(struct srp_rport *rport);
 	/* for target drivers */
 	int (* tsk_mgmt_response)(struct Scsi_Host *, u64, u64, int);
@@ -43,7 +71,29 @@ extern void srp_rport_put(struct srp_rport *rport);
 extern struct srp_rport *srp_rport_add(struct Scsi_Host *,
 				       struct srp_rport_identifiers *);
 extern void srp_rport_del(struct srp_rport *);
-
+extern int srp_tmo_valid(int fast_io_fail_tmo, int dev_loss_tmo);
+extern int srp_reconnect_rport(struct srp_rport *rport);
+extern void srp_start_tl_fail_timers(struct srp_rport *rport);
 extern void srp_remove_host(struct Scsi_Host *);
 
+/**
+ * srp_chkready() - evaluate the transport layer state before I/O
+ *
+ * Returns a SCSI result code that can be returned by the LLDD. The role of
+ * this function is similar to that of fc_remote_port_chkready().
+ */
+static inline int srp_chkready(struct srp_rport *rport)
+{
+	switch (rport->state) {
+	case SRP_RPORT_RUNNING:
+	case SRP_RPORT_BLOCKED:
+	default:
+		return 0;
+	case SRP_RPORT_FAIL_FAST:
+		return DID_TRANSPORT_FAILFAST << 16;
+	case SRP_RPORT_LOST:
+		return DID_NO_CONNECT << 16;
+	}
+}
+
 #endif
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2013-06-24 16:05 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-06-24 13:48 [PATCH 07/14] scsi_transport_srp: Add transport layer error handling Jack Wang
     [not found] ` <51C84E39.80806-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
2013-06-24 15:50   ` Bart Van Assche
     [not found]     ` <51C86AB4.1000906-HInyCGIudOg@public.gmane.org>
2013-06-24 16:05       ` Jack Wang
  -- strict thread matches above, loose matches on Subject: below --
2013-06-19 13:44 Jack Wang
2013-06-19 15:27 ` Bart Van Assche
2013-06-21 12:17   ` Jack Wang
2013-06-12 13:17 [PATCH 0/14] IB SRP initiator patches for kernel 3.11 Bart Van Assche
2013-06-12 13:28 ` [PATCH 07/14] scsi_transport_srp: Add transport layer error handling Bart Van Assche
     [not found]   ` <51B8777B.5050201-HInyCGIudOg@public.gmane.org>
2013-06-13 19:43     ` Vu Pham
2013-06-14 13:19       ` Bart Van Assche
     [not found]         ` <51BB1857.7040802-HInyCGIudOg@public.gmane.org>
2013-06-14 17:59           ` Vu Pham
     [not found]             ` <51BB5A04.3080901-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2013-06-15  9:52               ` Bart Van Assche
     [not found]                 ` <51BC3945.9030900-HInyCGIudOg@public.gmane.org>
2013-06-17  6:18                   ` Hannes Reinecke
2013-06-17  7:04                     ` Bart Van Assche
2013-06-17  7:14                       ` Hannes Reinecke
2013-06-17  7:29                         ` Bart Van Assche
     [not found]                           ` <51BEBAEA.4080202-HInyCGIudOg@public.gmane.org>
2013-06-17  8:10                             ` Hannes Reinecke
2013-06-17 10:13                             ` Sebastian Riemer
2013-06-18 16:59                 ` Vu Pham
     [not found]                   ` <51C09202.2040503-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2013-06-19 13:00                     ` Bart Van Assche
2013-06-23 21:13   ` Mike Christie
     [not found]     ` <51C764FB.6070207-hcNo3dDEHLuVc3sceRu5cw@public.gmane.org>
2013-06-24  7:37       ` Bart Van Assche
