All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] scsi_dh_alua: add missing transitioning state support
@ 2010-08-17 19:05 Mike Snitzer
  2010-08-17 19:23 ` Nicholas A. Bellinger
  0 siblings, 1 reply; 14+ messages in thread
From: Mike Snitzer @ 2010-08-17 19:05 UTC (permalink / raw)
  To: James Bottomley; +Cc: Mike Christie, linux-scsi

Handle transitioning in the prep_fn.
Handle transitioning in alua_rtpg's implicit alua code too.

These gaps were identified during controller failover testing of an
ALUA array.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
 drivers/scsi/device_handler/scsi_dh_alua.c |   10 +++++++---
 1 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/device_handler/scsi_dh_alua.c b/drivers/scsi/device_handler/scsi_dh_alua.c
index 1a970a7..c1eedc5 100644
--- a/drivers/scsi/device_handler/scsi_dh_alua.c
+++ b/drivers/scsi/device_handler/scsi_dh_alua.c
@@ -616,6 +616,9 @@ static int alua_rtpg(struct scsi_device *sdev, struct alua_dh_data *h)
 		    h->state == TPGS_STATE_STANDBY)
 			/* Useable path if active */
 			err = SCSI_DH_OK;
+		else if (h->state == TPGS_STATE_TRANSITIONING)
+			/* State transition, retry */
+			goto retry;
 		else
 			/* Path unuseable for unavailable/offline */
 			err = SCSI_DH_DEV_OFFLINED;
@@ -698,13 +701,14 @@ static int alua_prep_fn(struct scsi_device *sdev, struct request *req)
 	struct alua_dh_data *h = get_alua_data(sdev);
 	int ret = BLKPREP_OK;
 
-	if (h->state != TPGS_STATE_OPTIMIZED &&
-	    h->state != TPGS_STATE_NONOPTIMIZED) {
+	if (h->state == TPGS_STATE_TRANSITIONING)
+		ret = BLKPREP_DEFER;
+	else if (h->state != TPGS_STATE_OPTIMIZED &&
+		 h->state != TPGS_STATE_NONOPTIMIZED) {
 		ret = BLKPREP_KILL;
 		req->cmd_flags |= REQ_QUIET;
 	}
 	return ret;
-
 }
 
 static const struct scsi_dh_devlist alua_dev_list[] = {

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH] scsi_dh_alua: add missing transitioning state support
  2010-08-17 19:05 [PATCH] scsi_dh_alua: add missing transitioning state support Mike Snitzer
@ 2010-08-17 19:23 ` Nicholas A. Bellinger
  2010-08-30  9:36   ` Hannes Reinecke
  0 siblings, 1 reply; 14+ messages in thread
From: Nicholas A. Bellinger @ 2010-08-17 19:23 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: James Bottomley, Mike Christie, linux-scsi

On Tue, 2010-08-17 at 15:05 -0400, Mike Snitzer wrote:
> Handle transitioning in the prep_fn.
> Handle transitioning in alua_rtpg's implicit alua code too.
> 
> These gaps were identified during controller failover testing of an
> ALUA array.
> 
> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
> ---
>  drivers/scsi/device_handler/scsi_dh_alua.c |   10 +++++++---
>  1 files changed, 7 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/scsi/device_handler/scsi_dh_alua.c b/drivers/scsi/device_handler/scsi_dh_alua.c
> index 1a970a7..c1eedc5 100644
> --- a/drivers/scsi/device_handler/scsi_dh_alua.c
> +++ b/drivers/scsi/device_handler/scsi_dh_alua.c
> @@ -616,6 +616,9 @@ static int alua_rtpg(struct scsi_device *sdev, struct alua_dh_data *h)
>  		    h->state == TPGS_STATE_STANDBY)
>  			/* Useable path if active */
>  			err = SCSI_DH_OK;
> +		else if (h->state == TPGS_STATE_TRANSITIONING)
> +			/* State transition, retry */
> +			goto retry;
>  		else
>  			/* Path unuseable for unavailable/offline */
>  			err = SCSI_DH_DEV_OFFLINED;
> @@ -698,13 +701,14 @@ static int alua_prep_fn(struct scsi_device *sdev, struct request *req)
>  	struct alua_dh_data *h = get_alua_data(sdev);
>  	int ret = BLKPREP_OK;
>  
> -	if (h->state != TPGS_STATE_OPTIMIZED &&
> -	    h->state != TPGS_STATE_NONOPTIMIZED) {
> +	if (h->state == TPGS_STATE_TRANSITIONING)
> +		ret = BLKPREP_DEFER;
> +	else if (h->state != TPGS_STATE_OPTIMIZED &&
> +		 h->state != TPGS_STATE_NONOPTIMIZED) {
>  		ret = BLKPREP_KILL;
>  		req->cmd_flags |= REQ_QUIET;
>  	}
>  	return ret;
> -
>  }
>  

Makes sense to me..

Acked-by: Nicholas A. Bellinger <nab@linux-iscsi.org>


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] scsi_dh_alua: add missing transitioning state support
  2010-08-17 19:23 ` Nicholas A. Bellinger
@ 2010-08-30  9:36   ` Hannes Reinecke
  2010-08-31 15:11     ` Mike Snitzer
  0 siblings, 1 reply; 14+ messages in thread
From: Hannes Reinecke @ 2010-08-30  9:36 UTC (permalink / raw)
  To: Nicholas A. Bellinger
  Cc: Mike Snitzer, James Bottomley, Mike Christie, linux-scsi

Nicholas A. Bellinger wrote:
> On Tue, 2010-08-17 at 15:05 -0400, Mike Snitzer wrote:
>> Handle transitioning in the prep_fn.
>> Handle transitioning in alua_rtpg's implicit alua code too.
>>
>> These gaps were identified during controller failover testing of an
>> ALUA array.
>>
>> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
>> ---
>>  drivers/scsi/device_handler/scsi_dh_alua.c |   10 +++++++---
>>  1 files changed, 7 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/scsi/device_handler/scsi_dh_alua.c b/drivers/scsi/device_handler/scsi_dh_alua.c
>> index 1a970a7..c1eedc5 100644
>> --- a/drivers/scsi/device_handler/scsi_dh_alua.c
>> +++ b/drivers/scsi/device_handler/scsi_dh_alua.c
>> @@ -616,6 +616,9 @@ static int alua_rtpg(struct scsi_device *sdev, struct alua_dh_data *h)
>>  		    h->state == TPGS_STATE_STANDBY)
>>  			/* Useable path if active */
>>  			err = SCSI_DH_OK;
>> +		else if (h->state == TPGS_STATE_TRANSITIONING)
>> +			/* State transition, retry */
>> +			goto retry;
>>  		else
>>  			/* Path unuseable for unavailable/offline */
>>  			err = SCSI_DH_DEV_OFFLINED;
>> @@ -698,13 +701,14 @@ static int alua_prep_fn(struct scsi_device *sdev, struct request *req)
>>  	struct alua_dh_data *h = get_alua_data(sdev);
>>  	int ret = BLKPREP_OK;
>>  
>> -	if (h->state != TPGS_STATE_OPTIMIZED &&
>> -	    h->state != TPGS_STATE_NONOPTIMIZED) {
>> +	if (h->state == TPGS_STATE_TRANSITIONING)
>> +		ret = BLKPREP_DEFER;
>> +	else if (h->state != TPGS_STATE_OPTIMIZED &&
>> +		 h->state != TPGS_STATE_NONOPTIMIZED) {
>>  		ret = BLKPREP_KILL;
>>  		req->cmd_flags |= REQ_QUIET;
>>  	}
>>  	return ret;
>> -
>>  }
>>  
> 
> Makes sense to me..
> 
> Acked-by: Nicholas A. Bellinger <nab@linux-iscsi.org>
> 
Not so fast. There are two problems with this approach:

The path is retried indefinitely. Arrays are _supposed_ to be in 'transitioning'
only temporary; however, if the array is stuck due to a fw error we're stuck in 'defer',
too.

Secondly this path fails with 'directio' multipath checker. Remember that 'directio'
is using 'fs' requests, not block-pc ones. Hence for all I/O the prep_fn() callback
is evaluated, which will return 'DEFER' here once the path is in transitioning.
And the state is never updated as RTPG is never called.

I'm currently preparing a patch which addressed these situations, too.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: scsi_dh_alua: add missing transitioning state support
  2010-08-30  9:36   ` Hannes Reinecke
@ 2010-08-31 15:11     ` Mike Snitzer
  2010-09-20 15:35       ` Mike Snitzer
  0 siblings, 1 reply; 14+ messages in thread
From: Mike Snitzer @ 2010-08-31 15:11 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Nicholas A. Bellinger, James Bottomley, Mike Christie, linux-scsi

On Mon, Aug 30 2010 at  5:36am -0400,
Hannes Reinecke <hare@suse.de> wrote:

> Nicholas A. Bellinger wrote:
> > On Tue, 2010-08-17 at 15:05 -0400, Mike Snitzer wrote:
> >> Handle transitioning in the prep_fn.
> >> Handle transitioning in alua_rtpg's implicit alua code too.
> >>
> >> These gaps were identified during controller failover testing of an
> >> ALUA array.
> >>
> >> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
> >> ---
> >>  drivers/scsi/device_handler/scsi_dh_alua.c |   10 +++++++---
> >>  1 files changed, 7 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/drivers/scsi/device_handler/scsi_dh_alua.c b/drivers/scsi/device_handler/scsi_dh_alua.c
> >> index 1a970a7..c1eedc5 100644
> >> --- a/drivers/scsi/device_handler/scsi_dh_alua.c
> >> +++ b/drivers/scsi/device_handler/scsi_dh_alua.c
> >> @@ -616,6 +616,9 @@ static int alua_rtpg(struct scsi_device *sdev, struct alua_dh_data *h)
> >>  		    h->state == TPGS_STATE_STANDBY)
> >>  			/* Useable path if active */
> >>  			err = SCSI_DH_OK;
> >> +		else if (h->state == TPGS_STATE_TRANSITIONING)
> >> +			/* State transition, retry */
> >> +			goto retry;
> >>  		else
> >>  			/* Path unuseable for unavailable/offline */
> >>  			err = SCSI_DH_DEV_OFFLINED;
> >> @@ -698,13 +701,14 @@ static int alua_prep_fn(struct scsi_device *sdev, struct request *req)
> >>  	struct alua_dh_data *h = get_alua_data(sdev);
> >>  	int ret = BLKPREP_OK;
> >>  
> >> -	if (h->state != TPGS_STATE_OPTIMIZED &&
> >> -	    h->state != TPGS_STATE_NONOPTIMIZED) {
> >> +	if (h->state == TPGS_STATE_TRANSITIONING)
> >> +		ret = BLKPREP_DEFER;
> >> +	else if (h->state != TPGS_STATE_OPTIMIZED &&
> >> +		 h->state != TPGS_STATE_NONOPTIMIZED) {
> >>  		ret = BLKPREP_KILL;
> >>  		req->cmd_flags |= REQ_QUIET;
> >>  	}
> >>  	return ret;
> >> -
> >>  }
> >>  
> > 
> > Makes sense to me..
> > 
> > Acked-by: Nicholas A. Bellinger <nab@linux-iscsi.org>
> > 
> Not so fast. There are two problems with this approach:
> 
> The path is retried indefinitely. Arrays are _supposed_ to be in 'transitioning'
> only temporary; however, if the array is stuck due to a fw error we're stuck in 'defer',
> too.

And what is the problem with that?  The IO will eventually time out.

> Secondly this path fails with 'directio' multipath checker. Remember that 'directio'
> is using 'fs' requests, not block-pc ones. Hence for all I/O the prep_fn() callback
> is evaluated, which will return 'DEFER' here once the path is in transitioning.
> And the state is never updated as RTPG is never called.

Testing ALUA with directio path checker did not result in such immutable
state in the few instances that TPGS_STATE_TRANSITIONING was seen in
alua_prep_fn.

> I'm currently preparing a patch which addressed these situations, too.

OK, please share.

Thanks,
Mike

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: scsi_dh_alua: add missing transitioning state support
  2010-08-31 15:11     ` Mike Snitzer
@ 2010-09-20 15:35       ` Mike Snitzer
  2010-09-21  2:27         ` Mike Christie
  0 siblings, 1 reply; 14+ messages in thread
From: Mike Snitzer @ 2010-09-20 15:35 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Nicholas A. Bellinger, James Bottomley, Mike Christie, linux-scsi

Hi Hannes,

On Tue, Aug 31 2010 at 11:11am -0400,
Mike Snitzer <snitzer@redhat.com> wrote:

> On Mon, Aug 30 2010 at  5:36am -0400,
> Hannes Reinecke <hare@suse.de> wrote:
> 
> > Nicholas A. Bellinger wrote:
> > > On Tue, 2010-08-17 at 15:05 -0400, Mike Snitzer wrote:
> > >> Handle transitioning in the prep_fn.
> > >> Handle transitioning in alua_rtpg's implicit alua code too.
> > >>
> > >> These gaps were identified during controller failover testing of an
> > >> ALUA array.
> > >>
> > >> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
> > >> ---
> > >>  drivers/scsi/device_handler/scsi_dh_alua.c |   10 +++++++---
> > >>  1 files changed, 7 insertions(+), 3 deletions(-)
> > >>
> > >> diff --git a/drivers/scsi/device_handler/scsi_dh_alua.c b/drivers/scsi/device_handler/scsi_dh_alua.c
> > >> index 1a970a7..c1eedc5 100644
> > >> --- a/drivers/scsi/device_handler/scsi_dh_alua.c
> > >> +++ b/drivers/scsi/device_handler/scsi_dh_alua.c
> > >> @@ -616,6 +616,9 @@ static int alua_rtpg(struct scsi_device *sdev, struct alua_dh_data *h)
> > >>  		    h->state == TPGS_STATE_STANDBY)
> > >>  			/* Useable path if active */
> > >>  			err = SCSI_DH_OK;
> > >> +		else if (h->state == TPGS_STATE_TRANSITIONING)
> > >> +			/* State transition, retry */
> > >> +			goto retry;
> > >>  		else
> > >>  			/* Path unuseable for unavailable/offline */
> > >>  			err = SCSI_DH_DEV_OFFLINED;
> > >> @@ -698,13 +701,14 @@ static int alua_prep_fn(struct scsi_device *sdev, struct request *req)
> > >>  	struct alua_dh_data *h = get_alua_data(sdev);
> > >>  	int ret = BLKPREP_OK;
> > >>  
> > >> -	if (h->state != TPGS_STATE_OPTIMIZED &&
> > >> -	    h->state != TPGS_STATE_NONOPTIMIZED) {
> > >> +	if (h->state == TPGS_STATE_TRANSITIONING)
> > >> +		ret = BLKPREP_DEFER;
> > >> +	else if (h->state != TPGS_STATE_OPTIMIZED &&
> > >> +		 h->state != TPGS_STATE_NONOPTIMIZED) {
> > >>  		ret = BLKPREP_KILL;
> > >>  		req->cmd_flags |= REQ_QUIET;
> > >>  	}
> > >>  	return ret;
> > >> -
> > >>  }
> > >>  
> > > 
> > > Makes sense to me..
> > > 
> > > Acked-by: Nicholas A. Bellinger <nab@linux-iscsi.org>
> > > 
> > Not so fast. There are two problems with this approach:
> > 
> > The path is retried indefinitely. Arrays are _supposed_ to be in 'transitioning'
> > only temporary; however, if the array is stuck due to a fw error we're stuck in 'defer',
> > too.
> 
> And what is the problem with that?  The IO will eventually time out.

To restate as a question: even though we'll retry in alua_rtpg();
shouldn't the SCSI command eventually time out (via
scsi_attempt_requeue_command)?

Note that my proposed change to alua_rtpg() just adds a
TPGS_STATE_TRANSITIONING handler for implicit ALUA -- explicit ALUA
already has such a handler.

Allowing implicit ALUA to fall through (as we do currently) causes a
return alua_rtpg() of SCSI_DH_DEV_OFFLINED -- which isn't the correct
state.

> > Secondly this path fails with 'directio' multipath checker. Remember that 'directio'
> > is using 'fs' requests, not block-pc ones. Hence for all I/O the prep_fn() callback
> > is evaluated, which will return 'DEFER' here once the path is in transitioning.
> > And the state is never updated as RTPG is never called.
> 
> Testing ALUA with directio path checker did not result in such immutable
> state in the few instances that TPGS_STATE_TRANSITIONING was seen in
> alua_prep_fn.

I had another look and I see what you're saying.  Thanks for catching this!

> > I'm currently preparing a patch which addressed these situations, too.
> 
> OK, please share.

Do you have that patch you were preparing?  I look forward to seeing
your solution to this.

Thanks,
Mike

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: scsi_dh_alua: add missing transitioning state support
  2010-09-20 15:35       ` Mike Snitzer
@ 2010-09-21  2:27         ` Mike Christie
  2010-09-21  2:28           ` Mike Christie
  2010-09-21 19:33           ` Mike Snitzer
  0 siblings, 2 replies; 14+ messages in thread
From: Mike Christie @ 2010-09-21  2:27 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Hannes Reinecke, Nicholas A. Bellinger, James Bottomley, linux-scsi

On 09/20/2010 10:35 AM, Mike Snitzer wrote:
> Hi Hannes,
>
> On Tue, Aug 31 2010 at 11:11am -0400,
> Mike Snitzer<snitzer@redhat.com>  wrote:
>
>> On Mon, Aug 30 2010 at  5:36am -0400,
>> Hannes Reinecke<hare@suse.de>  wrote:
>>
>>> Nicholas A. Bellinger wrote:
>>>> On Tue, 2010-08-17 at 15:05 -0400, Mike Snitzer wrote:
>>>>> Handle transitioning in the prep_fn.
>>>>> Handle transitioning in alua_rtpg's implicit alua code too.
>>>>>
>>>>> These gaps were identified during controller failover testing of an
>>>>> ALUA array.
>>>>>
>>>>> Signed-off-by: Mike Snitzer<snitzer@redhat.com>
>>>>> ---
>>>>>   drivers/scsi/device_handler/scsi_dh_alua.c |   10 +++++++---
>>>>>   1 files changed, 7 insertions(+), 3 deletions(-)
>>>>>
>>>>> diff --git a/drivers/scsi/device_handler/scsi_dh_alua.c b/drivers/scsi/device_handler/scsi_dh_alua.c
>>>>> index 1a970a7..c1eedc5 100644
>>>>> --- a/drivers/scsi/device_handler/scsi_dh_alua.c
>>>>> +++ b/drivers/scsi/device_handler/scsi_dh_alua.c
>>>>> @@ -616,6 +616,9 @@ static int alua_rtpg(struct scsi_device *sdev, struct alua_dh_data *h)
>>>>>   		    h->state == TPGS_STATE_STANDBY)
>>>>>   			/* Useable path if active */
>>>>>   			err = SCSI_DH_OK;
>>>>> +		else if (h->state == TPGS_STATE_TRANSITIONING)
>>>>> +			/* State transition, retry */
>>>>> +			goto retry;
>>>>>   		else
>>>>>   			/* Path unuseable for unavailable/offline */
>>>>>   			err = SCSI_DH_DEV_OFFLINED;
>>>>> @@ -698,13 +701,14 @@ static int alua_prep_fn(struct scsi_device *sdev, struct request *req)
>>>>>   	struct alua_dh_data *h = get_alua_data(sdev);
>>>>>   	int ret = BLKPREP_OK;
>>>>>
>>>>> -	if (h->state != TPGS_STATE_OPTIMIZED&&
>>>>> -	    h->state != TPGS_STATE_NONOPTIMIZED) {
>>>>> +	if (h->state == TPGS_STATE_TRANSITIONING)
>>>>> +		ret = BLKPREP_DEFER;
>>>>> +	else if (h->state != TPGS_STATE_OPTIMIZED&&
>>>>> +		 h->state != TPGS_STATE_NONOPTIMIZED) {
>>>>>   		ret = BLKPREP_KILL;
>>>>>   		req->cmd_flags |= REQ_QUIET;
>>>>>   	}
>>>>>   	return ret;
>>>>> -
>>>>>   }
>>>>>
>>>>
>>>> Makes sense to me..
>>>>
>>>> Acked-by: Nicholas A. Bellinger<nab@linux-iscsi.org>
>>>>
>>> Not so fast. There are two problems with this approach:
>>>
>>> The path is retried indefinitely. Arrays are _supposed_ to be in 'transitioning'
>>> only temporary; however, if the array is stuck due to a fw error we're stuck in 'defer',
>>> too.
>>
>> And what is the problem with that?  The IO will eventually time out.
>
> To restate as a question: even though we'll retry in alua_rtpg();
> shouldn't the SCSI command eventually time out (via
> scsi_attempt_requeue_command)?

That function is only in RHEL. Requests that are prepd and sent to the 
scsi layer and driver would eventually timeout in scsi_softirq_done in 
upstream.

alua_prep_fn prevents the IO from getting sent to the scsi layer so we 
do not hit the check in scsi_softirq_done though.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: scsi_dh_alua: add missing transitioning state support
  2010-09-21  2:27         ` Mike Christie
@ 2010-09-21  2:28           ` Mike Christie
  2010-09-21 19:33           ` Mike Snitzer
  1 sibling, 0 replies; 14+ messages in thread
From: Mike Christie @ 2010-09-21  2:28 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Hannes Reinecke, Nicholas A. Bellinger, James Bottomley, linux-scsi

On 09/20/2010 09:27 PM, Mike Christie wrote:
> On 09/20/2010 10:35 AM, Mike Snitzer wrote:
>> Hi Hannes,
>>
>> On Tue, Aug 31 2010 at 11:11am -0400,
>> Mike Snitzer<snitzer@redhat.com> wrote:
>>
>>> On Mon, Aug 30 2010 at 5:36am -0400,
>>> Hannes Reinecke<hare@suse.de> wrote:
>>>
>>>> Nicholas A. Bellinger wrote:
>>>>> On Tue, 2010-08-17 at 15:05 -0400, Mike Snitzer wrote:
>>>>>> Handle transitioning in the prep_fn.
>>>>>> Handle transitioning in alua_rtpg's implicit alua code too.
>>>>>>
>>>>>> These gaps were identified during controller failover testing of an
>>>>>> ALUA array.
>>>>>>
>>>>>> Signed-off-by: Mike Snitzer<snitzer@redhat.com>
>>>>>> ---
>>>>>> drivers/scsi/device_handler/scsi_dh_alua.c | 10 +++++++---
>>>>>> 1 files changed, 7 insertions(+), 3 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/scsi/device_handler/scsi_dh_alua.c
>>>>>> b/drivers/scsi/device_handler/scsi_dh_alua.c
>>>>>> index 1a970a7..c1eedc5 100644
>>>>>> --- a/drivers/scsi/device_handler/scsi_dh_alua.c
>>>>>> +++ b/drivers/scsi/device_handler/scsi_dh_alua.c
>>>>>> @@ -616,6 +616,9 @@ static int alua_rtpg(struct scsi_device *sdev,
>>>>>> struct alua_dh_data *h)
>>>>>> h->state == TPGS_STATE_STANDBY)
>>>>>> /* Useable path if active */
>>>>>> err = SCSI_DH_OK;
>>>>>> + else if (h->state == TPGS_STATE_TRANSITIONING)
>>>>>> + /* State transition, retry */
>>>>>> + goto retry;
>>>>>> else
>>>>>> /* Path unuseable for unavailable/offline */
>>>>>> err = SCSI_DH_DEV_OFFLINED;
>>>>>> @@ -698,13 +701,14 @@ static int alua_prep_fn(struct scsi_device
>>>>>> *sdev, struct request *req)
>>>>>> struct alua_dh_data *h = get_alua_data(sdev);
>>>>>> int ret = BLKPREP_OK;
>>>>>>
>>>>>> - if (h->state != TPGS_STATE_OPTIMIZED&&
>>>>>> - h->state != TPGS_STATE_NONOPTIMIZED) {
>>>>>> + if (h->state == TPGS_STATE_TRANSITIONING)
>>>>>> + ret = BLKPREP_DEFER;
>>>>>> + else if (h->state != TPGS_STATE_OPTIMIZED&&
>>>>>> + h->state != TPGS_STATE_NONOPTIMIZED) {
>>>>>> ret = BLKPREP_KILL;
>>>>>> req->cmd_flags |= REQ_QUIET;
>>>>>> }
>>>>>> return ret;
>>>>>> -
>>>>>> }
>>>>>>
>>>>>
>>>>> Makes sense to me..
>>>>>
>>>>> Acked-by: Nicholas A. Bellinger<nab@linux-iscsi.org>
>>>>>
>>>> Not so fast. There are two problems with this approach:
>>>>
>>>> The path is retried indefinitely. Arrays are _supposed_ to be in
>>>> 'transitioning'
>>>> only temporary; however, if the array is stuck due to a fw error
>>>> we're stuck in 'defer',
>>>> too.
>>>
>>> And what is the problem with that? The IO will eventually time out.
>>
>> To restate as a question: even though we'll retry in alua_rtpg();
>> shouldn't the SCSI command eventually time out (via
>> scsi_attempt_requeue_command)?
>
> That function is only in RHEL. Requests that are prepd and sent to the
> scsi layer and driver would eventually timeout in scsi_softirq_done in
> upstream.

Did we want to move this check to the block layer btw? If so it could 
solve the problem.


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: scsi_dh_alua: add missing transitioning state support
  2010-09-21  2:27         ` Mike Christie
  2010-09-21  2:28           ` Mike Christie
@ 2010-09-21 19:33           ` Mike Snitzer
  2010-09-21 21:14             ` Mike Christie
  1 sibling, 1 reply; 14+ messages in thread
From: Mike Snitzer @ 2010-09-21 19:33 UTC (permalink / raw)
  To: Mike Christie
  Cc: Hannes Reinecke, Nicholas A. Bellinger, James Bottomley, linux-scsi

On Mon, Sep 20 2010 at 10:27pm -0400,
Mike Christie <michaelc@cs.wisc.edu> wrote:

> On 09/20/2010 10:35 AM, Mike Snitzer wrote:
> >Hi Hannes,
> >
> >On Tue, Aug 31 2010 at 11:11am -0400,
> >Mike Snitzer<snitzer@redhat.com>  wrote:
> >
> >>On Mon, Aug 30 2010 at  5:36am -0400,
> >>Hannes Reinecke<hare@suse.de>  wrote:
> >>
> >>>Nicholas A. Bellinger wrote:
> >>>>On Tue, 2010-08-17 at 15:05 -0400, Mike Snitzer wrote:
> >>>>>Handle transitioning in the prep_fn.
> >>>>>Handle transitioning in alua_rtpg's implicit alua code too.
> >>>>>
> >>>>>These gaps were identified during controller failover testing of an
> >>>>>ALUA array.
> >>>>>
> >>>>>Signed-off-by: Mike Snitzer<snitzer@redhat.com>
> >>>>>---
> >>>>>  drivers/scsi/device_handler/scsi_dh_alua.c |   10 +++++++---
> >>>>>  1 files changed, 7 insertions(+), 3 deletions(-)
> >>>>>
> >>>>>diff --git a/drivers/scsi/device_handler/scsi_dh_alua.c b/drivers/scsi/device_handler/scsi_dh_alua.c
> >>>>>index 1a970a7..c1eedc5 100644
> >>>>>--- a/drivers/scsi/device_handler/scsi_dh_alua.c
> >>>>>+++ b/drivers/scsi/device_handler/scsi_dh_alua.c
> >>>>>@@ -616,6 +616,9 @@ static int alua_rtpg(struct scsi_device *sdev, struct alua_dh_data *h)
> >>>>>  		    h->state == TPGS_STATE_STANDBY)
> >>>>>  			/* Useable path if active */
> >>>>>  			err = SCSI_DH_OK;
> >>>>>+		else if (h->state == TPGS_STATE_TRANSITIONING)
> >>>>>+			/* State transition, retry */
> >>>>>+			goto retry;
> >>>>>  		else
> >>>>>  			/* Path unuseable for unavailable/offline */
> >>>>>  			err = SCSI_DH_DEV_OFFLINED;
> >>>>>@@ -698,13 +701,14 @@ static int alua_prep_fn(struct scsi_device *sdev, struct request *req)
> >>>>>  	struct alua_dh_data *h = get_alua_data(sdev);
> >>>>>  	int ret = BLKPREP_OK;
> >>>>>
> >>>>>-	if (h->state != TPGS_STATE_OPTIMIZED&&
> >>>>>-	    h->state != TPGS_STATE_NONOPTIMIZED) {
> >>>>>+	if (h->state == TPGS_STATE_TRANSITIONING)
> >>>>>+		ret = BLKPREP_DEFER;
> >>>>>+	else if (h->state != TPGS_STATE_OPTIMIZED&&
> >>>>>+		 h->state != TPGS_STATE_NONOPTIMIZED) {
> >>>>>  		ret = BLKPREP_KILL;
> >>>>>  		req->cmd_flags |= REQ_QUIET;
> >>>>>  	}
> >>>>>  	return ret;
> >>>>>-
> >>>>>  }
> >>>>>
> >>>>
> >>>>Makes sense to me..
> >>>>
> >>>>Acked-by: Nicholas A. Bellinger<nab@linux-iscsi.org>
> >>>>
> >>>Not so fast. There are two problems with this approach:
> >>>
> >>>The path is retried indefinitely. Arrays are _supposed_ to be in 'transitioning'
> >>>only temporary; however, if the array is stuck due to a fw error we're stuck in 'defer',
> >>>too.
> >>
> >>And what is the problem with that?  The IO will eventually time out.
> >
> >To restate as a question: even though we'll retry in alua_rtpg();
> >shouldn't the SCSI command eventually time out (via
> >scsi_attempt_requeue_command)?
> 
> That function is only in RHEL. Requests that are prepd and sent to
> the scsi layer and driver would eventually timeout in
> scsi_softirq_done in upstream.
> 
> alua_prep_fn prevents the IO from getting sent to the scsi layer so
> we do not hit the check in scsi_softirq_done though.

That is only the case if alua_prep_fn were to return BLKPREP_DEFER
right?

Taking a step back:
1) the proposed alua_prep_fn BLKPREP_DEFER change is not correct and is
   no longer being proposed...

2) the patch also modified alua_rtpg() so implicit ALUA would retry
   (just like explicit ALUA currently does) if TPGS_STATE_TRANSITIONING
   - so why should we avoid retry for implicit but do it for explicit?

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: scsi_dh_alua: add missing transitioning state support
  2010-09-21 19:33           ` Mike Snitzer
@ 2010-09-21 21:14             ` Mike Christie
  2010-09-22 10:13               ` Hannes Reinecke
  0 siblings, 1 reply; 14+ messages in thread
From: Mike Christie @ 2010-09-21 21:14 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Hannes Reinecke, Nicholas A. Bellinger, James Bottomley, linux-scsi

On 09/21/2010 02:33 PM, Mike Snitzer wrote:
> On Mon, Sep 20 2010 at 10:27pm -0400,
> Mike Christie<michaelc@cs.wisc.edu>  wrote:
>
>> On 09/20/2010 10:35 AM, Mike Snitzer wrote:
>>> Hi Hannes,
>>>
>>> On Tue, Aug 31 2010 at 11:11am -0400,
>>> Mike Snitzer<snitzer@redhat.com>   wrote:
>>>
>>>> On Mon, Aug 30 2010 at  5:36am -0400,
>>>> Hannes Reinecke<hare@suse.de>   wrote:
>>>>
>>>>> Nicholas A. Bellinger wrote:
>>>>>> On Tue, 2010-08-17 at 15:05 -0400, Mike Snitzer wrote:
>>>>>>> Handle transitioning in the prep_fn.
>>>>>>> Handle transitioning in alua_rtpg's implicit alua code too.
>>>>>>>
>>>>>>> These gaps were identified during controller failover testing of an
>>>>>>> ALUA array.
>>>>>>>
>>>>>>> Signed-off-by: Mike Snitzer<snitzer@redhat.com>
>>>>>>> ---
>>>>>>>   drivers/scsi/device_handler/scsi_dh_alua.c |   10 +++++++---
>>>>>>>   1 files changed, 7 insertions(+), 3 deletions(-)
>>>>>>>
>>>>>>> diff --git a/drivers/scsi/device_handler/scsi_dh_alua.c b/drivers/scsi/device_handler/scsi_dh_alua.c
>>>>>>> index 1a970a7..c1eedc5 100644
>>>>>>> --- a/drivers/scsi/device_handler/scsi_dh_alua.c
>>>>>>> +++ b/drivers/scsi/device_handler/scsi_dh_alua.c
>>>>>>> @@ -616,6 +616,9 @@ static int alua_rtpg(struct scsi_device *sdev, struct alua_dh_data *h)
>>>>>>>   		    h->state == TPGS_STATE_STANDBY)
>>>>>>>   			/* Useable path if active */
>>>>>>>   			err = SCSI_DH_OK;
>>>>>>> +		else if (h->state == TPGS_STATE_TRANSITIONING)
>>>>>>> +			/* State transition, retry */
>>>>>>> +			goto retry;
>>>>>>>   		else
>>>>>>>   			/* Path unuseable for unavailable/offline */
>>>>>>>   			err = SCSI_DH_DEV_OFFLINED;
>>>>>>> @@ -698,13 +701,14 @@ static int alua_prep_fn(struct scsi_device *sdev, struct request *req)
>>>>>>>   	struct alua_dh_data *h = get_alua_data(sdev);
>>>>>>>   	int ret = BLKPREP_OK;
>>>>>>>
>>>>>>> -	if (h->state != TPGS_STATE_OPTIMIZED&&
>>>>>>> -	    h->state != TPGS_STATE_NONOPTIMIZED) {
>>>>>>> +	if (h->state == TPGS_STATE_TRANSITIONING)
>>>>>>> +		ret = BLKPREP_DEFER;
>>>>>>> +	else if (h->state != TPGS_STATE_OPTIMIZED&&
>>>>>>> +		 h->state != TPGS_STATE_NONOPTIMIZED) {
>>>>>>>   		ret = BLKPREP_KILL;
>>>>>>>   		req->cmd_flags |= REQ_QUIET;
>>>>>>>   	}
>>>>>>>   	return ret;
>>>>>>> -
>>>>>>>   }
>>>>>>>
>>>>>>
>>>>>> Makes sense to me..
>>>>>>
>>>>>> Acked-by: Nicholas A. Bellinger<nab@linux-iscsi.org>
>>>>>>
>>>>> Not so fast. There are two problems with this approach:
>>>>>
>>>>> The path is retried indefinitely. Arrays are _supposed_ to be in 'transitioning'
>>>>> only temporary; however, if the array is stuck due to a fw error we're stuck in 'defer',
>>>>> too.
>>>>
>>>> And what is the problem with that?  The IO will eventually time out.
>>>
>>> To restate as a question: even though we'll retry in alua_rtpg();
>>> shouldn't the SCSI command eventually time out (via
>>> scsi_attempt_requeue_command)?
>>
>> That function is only in RHEL. Requests that are prepd and sent to
>> the scsi layer and driver would eventually timeout in
>> scsi_softirq_done in upstream.
>>
>> alua_prep_fn prevents the IO from getting sent to the scsi layer so
>> we do not hit the check in scsi_softirq_done though.
>
> That is only the case if alua_prep_fn were to return BLKPREP_DEFER
> right?

Yeah. I misread the email above. Just to clarify..

For the alua case, we will always retry due the prep_fn issue I mentioned.

For the alua_rtpg() retry case you were discussing we will also always 
retry, because the timer check in 
scsi_attempt_requeue_command/scsi_softirq_done will break us from 
retrying forever in that execution of the request started by the 
submit_rtpg call. However, the added goto retry will would just end up 
starting a another execution. To handle Hannes comment I think you would 
want to add some retry/timer checks in alua_rtpg to prevent that from 
retrying forever.

>
> 2) the patch also modified alua_rtpg() so implicit ALUA would retry
>     (just like explicit ALUA currently does) if TPGS_STATE_TRANSITIONING
>     - so why should we avoid retry for implicit but do it for explicit?

Leaving that for Hannes. I cannot think of a reason. Probably just did 
not do it.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: scsi_dh_alua: add missing transitioning state support
  2010-09-21 21:14             ` Mike Christie
@ 2010-09-22 10:13               ` Hannes Reinecke
  2010-09-22 12:29                 ` Mike Snitzer
  0 siblings, 1 reply; 14+ messages in thread
From: Hannes Reinecke @ 2010-09-22 10:13 UTC (permalink / raw)
  To: Mike Christie
  Cc: Mike Snitzer, Nicholas A. Bellinger, James Bottomley, linux-scsi

[-- Attachment #1: Type: text/plain, Size: 4863 bytes --]

Mike Christie wrote:
> On 09/21/2010 02:33 PM, Mike Snitzer wrote:
>> On Mon, Sep 20 2010 at 10:27pm -0400,
>> Mike Christie<michaelc@cs.wisc.edu>  wrote:
>>
>>> On 09/20/2010 10:35 AM, Mike Snitzer wrote:
>>>> Hi Hannes,
>>>>
>>>> On Tue, Aug 31 2010 at 11:11am -0400,
>>>> Mike Snitzer<snitzer@redhat.com>   wrote:
>>>>
>>>>> On Mon, Aug 30 2010 at  5:36am -0400,
>>>>> Hannes Reinecke<hare@suse.de>   wrote:
>>>>>
>>>>>> Nicholas A. Bellinger wrote:
>>>>>>> On Tue, 2010-08-17 at 15:05 -0400, Mike Snitzer wrote:
>>>>>>>> Handle transitioning in the prep_fn.
>>>>>>>> Handle transitioning in alua_rtpg's implicit alua code too.
>>>>>>>>
>>>>>>>> These gaps were identified during controller failover testing of an
>>>>>>>> ALUA array.
>>>>>>>>
>>>>>>>> Signed-off-by: Mike Snitzer<snitzer@redhat.com>
>>>>>>>> ---
>>>>>>>>   drivers/scsi/device_handler/scsi_dh_alua.c |   10 +++++++---
>>>>>>>>   1 files changed, 7 insertions(+), 3 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/drivers/scsi/device_handler/scsi_dh_alua.c
>>>>>>>> b/drivers/scsi/device_handler/scsi_dh_alua.c
>>>>>>>> index 1a970a7..c1eedc5 100644
>>>>>>>> --- a/drivers/scsi/device_handler/scsi_dh_alua.c
>>>>>>>> +++ b/drivers/scsi/device_handler/scsi_dh_alua.c
>>>>>>>> @@ -616,6 +616,9 @@ static int alua_rtpg(struct scsi_device
>>>>>>>> *sdev, struct alua_dh_data *h)
>>>>>>>>               h->state == TPGS_STATE_STANDBY)
>>>>>>>>               /* Useable path if active */
>>>>>>>>               err = SCSI_DH_OK;
>>>>>>>> +        else if (h->state == TPGS_STATE_TRANSITIONING)
>>>>>>>> +            /* State transition, retry */
>>>>>>>> +            goto retry;
>>>>>>>>           else
>>>>>>>>               /* Path unuseable for unavailable/offline */
>>>>>>>>               err = SCSI_DH_DEV_OFFLINED;
>>>>>>>> @@ -698,13 +701,14 @@ static int alua_prep_fn(struct scsi_device
>>>>>>>> *sdev, struct request *req)
>>>>>>>>       struct alua_dh_data *h = get_alua_data(sdev);
>>>>>>>>       int ret = BLKPREP_OK;
>>>>>>>>
>>>>>>>> -    if (h->state != TPGS_STATE_OPTIMIZED&&
>>>>>>>> -        h->state != TPGS_STATE_NONOPTIMIZED) {
>>>>>>>> +    if (h->state == TPGS_STATE_TRANSITIONING)
>>>>>>>> +        ret = BLKPREP_DEFER;
>>>>>>>> +    else if (h->state != TPGS_STATE_OPTIMIZED&&
>>>>>>>> +         h->state != TPGS_STATE_NONOPTIMIZED) {
>>>>>>>>           ret = BLKPREP_KILL;
>>>>>>>>           req->cmd_flags |= REQ_QUIET;
>>>>>>>>       }
>>>>>>>>       return ret;
>>>>>>>> -
>>>>>>>>   }
>>>>>>>>
>>>>>>>
>>>>>>> Makes sense to me..
>>>>>>>
>>>>>>> Acked-by: Nicholas A. Bellinger<nab@linux-iscsi.org>
>>>>>>>
>>>>>> Not so fast. There are two problems with this approach:
>>>>>>
>>>>>> The path is retried indefinitely. Arrays are _supposed_ to be in
>>>>>> 'transitioning'
>>>>>> only temporary; however, if the array is stuck due to a fw error
>>>>>> we're stuck in 'defer',
>>>>>> too.
>>>>>
>>>>> And what is the problem with that?  The IO will eventually time out.
>>>>
>>>> To restate as a question: even though we'll retry in alua_rtpg();
>>>> shouldn't the SCSI command eventually time out (via
>>>> scsi_attempt_requeue_command)?
>>>
>>> That function is only in RHEL. Requests that are prepd and sent to
>>> the scsi layer and driver would eventually timeout in
>>> scsi_softirq_done in upstream.
>>>
>>> alua_prep_fn prevents the IO from getting sent to the scsi layer so
>>> we do not hit the check in scsi_softirq_done though.
>>
>> That is only the case if alua_prep_fn were to return BLKPREP_DEFER
>> right?
> 
> Yeah. I misread the email above. Just to clarify..
> 
> For the alua case, we will always retry due the prep_fn issue I mentioned.
> 
> For the alua_rtpg() retry case you were discussing we will also always
> retry, because the timer check in
> scsi_attempt_requeue_command/scsi_softirq_done will break us from
> retrying forever in that execution of the request started by the
> submit_rtpg call. However, the added goto retry will would just end up
> starting a another execution. To handle Hannes comment I think you would
> want to add some retry/timer checks in alua_rtpg to prevent that from
> retrying forever.
> 
>>
>> 2) the patch also modified alua_rtpg() so implicit ALUA would retry
>>     (just like explicit ALUA currently does) if TPGS_STATE_TRANSITIONING
>>     - so why should we avoid retry for implicit but do it for explicit?
> 
> Leaving that for Hannes. I cannot think of a reason. Probably just did
> not do it.

Finally I got around to answering this.

I've attached a patch which I made the other day which seems to work
reasonably well.
Looks better from my side, so if you agree I'll be sending it
upstream properly.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)

[-- Attachment #2: scsi-dh-alua-handle-all-states --]
[-- Type: text/plain, Size: 4696 bytes --]

>From d3f02c90db3e3177309b78726d082e17dd772ee2 Mon Sep 17 00:00:00 2001
From: Hannes Reinecke <hare@suse.de>
Date: Wed, 22 Sep 2010 12:09:07 +0200
Subject: [PATCH] scsi_dh_alua: Handle all states correctly

For ALUA we should be handling all states, independent of whether
is explicit or implicit. For 'Transitioning' we should be retry
for a certain amount of time; after that an error should be
returned.

Signed-off-by: Hannes Reinecke <hare@suse.de>

diff --git a/drivers/scsi/device_handler/scsi_dh_alua.c b/drivers/scsi/device_handler/scsi_dh_alua.c
index 1a970a7..c6f57e3 100644
--- a/drivers/scsi/device_handler/scsi_dh_alua.c
+++ b/drivers/scsi/device_handler/scsi_dh_alua.c
@@ -31,6 +31,7 @@
 #define TPGS_STATE_NONOPTIMIZED		0x1
 #define TPGS_STATE_STANDBY		0x2
 #define TPGS_STATE_UNAVAILABLE		0x3
+#define TPGS_STATE_LBA_DEPENDENT	0x4
 #define TPGS_STATE_OFFLINE		0xe
 #define TPGS_STATE_TRANSITIONING	0xf
 
@@ -39,6 +40,7 @@
 #define TPGS_SUPPORT_NONOPTIMIZED	0x02
 #define TPGS_SUPPORT_STANDBY		0x04
 #define TPGS_SUPPORT_UNAVAILABLE	0x08
+#define TPGS_SUPPORT_LBA_DEPENDENT	0x10
 #define TPGS_SUPPORT_OFFLINE		0x40
 #define TPGS_SUPPORT_TRANSITION		0x80
 
@@ -460,6 +462,8 @@ static char print_alua_state(int state)
 		return 'S';
 	case TPGS_STATE_UNAVAILABLE:
 		return 'U';
+	case TPGS_STATE_LBA_DEPENDENT:
+		return 'L';
 	case TPGS_STATE_OFFLINE:
 		return 'O';
 	case TPGS_STATE_TRANSITIONING:
@@ -542,7 +546,9 @@ static int alua_rtpg(struct scsi_device *sdev, struct alua_dh_data *h)
 	int len, k, off, valid_states = 0;
 	char *ucp;
 	unsigned err;
+	unsigned long expiry, interval = 10;
 
+	expiry = round_jiffies_up(jiffies + ALUA_FAILOVER_TIMEOUT);
  retry:
 	err = submit_rtpg(sdev, h);
 
@@ -553,7 +559,7 @@ static int alua_rtpg(struct scsi_device *sdev, struct alua_dh_data *h)
 			return SCSI_DH_IO;
 
 		err = alua_check_sense(sdev, &sense_hdr);
-		if (err == ADD_TO_MLQUEUE)
+		if (err == ADD_TO_MLQUEUE && time_before(jiffies, expiry))
 			goto retry;
 		sdev_printk(KERN_INFO, sdev,
 			    "%s: rtpg sense code %02x/%02x/%02x\n",
@@ -587,38 +593,36 @@ static int alua_rtpg(struct scsi_device *sdev, struct alua_dh_data *h)
 	}
 
 	sdev_printk(KERN_INFO, sdev,
-		    "%s: port group %02x state %c supports %c%c%c%c%c%c\n",
+		    "%s: port group %02x state %c supports %c%c%c%c%c%c%c\n",
 		    ALUA_DH_NAME, h->group_id, print_alua_state(h->state),
 		    valid_states&TPGS_SUPPORT_TRANSITION?'T':'t',
 		    valid_states&TPGS_SUPPORT_OFFLINE?'O':'o',
+		    valid_states&TPGS_SUPPORT_LBA_DEPENDENT?'L':'l',
 		    valid_states&TPGS_SUPPORT_UNAVAILABLE?'U':'u',
 		    valid_states&TPGS_SUPPORT_STANDBY?'S':'s',
 		    valid_states&TPGS_SUPPORT_NONOPTIMIZED?'N':'n',
 		    valid_states&TPGS_SUPPORT_OPTIMIZED?'A':'a');
 
-	if (h->tpgs & TPGS_MODE_EXPLICIT) {
-		switch (h->state) {
-		case TPGS_STATE_TRANSITIONING:
+	switch (h->state) {
+	case TPGS_STATE_TRANSITIONING:
+		if (time_before(jiffies, expiry)) {
 			/* State transition, retry */
+			interval *= 10;
+			msleep(interval);
 			goto retry;
-			break;
-		case TPGS_STATE_OFFLINE:
-			/* Path is offline, fail */
-			err = SCSI_DH_DEV_OFFLINED;
-			break;
-		default:
-			break;
 		}
-	} else {
-		/* Only Implicit ALUA support */
-		if (h->state == TPGS_STATE_OPTIMIZED ||
-		    h->state == TPGS_STATE_NONOPTIMIZED ||
-		    h->state == TPGS_STATE_STANDBY)
-			/* Useable path if active */
-			err = SCSI_DH_OK;
-		else
-			/* Path unuseable for unavailable/offline */
-			err = SCSI_DH_DEV_OFFLINED;
+		/* Transitioning time exceeded */
+		err = SCSI_DH_RETRY;
+		break;
+	case TPGS_STATE_OFFLINE:
+	case TPGS_STATE_UNAVAILABLE:
+		/* Path unuseable for unavailable/offline */
+		err = SCSI_DH_DEV_OFFLINED;
+		break;
+	default:
+		/* Useable path if active */
+		err = SCSI_DH_OK;
+		break;
 	}
 	return err;
 }
@@ -672,7 +676,9 @@ static int alua_activate(struct scsi_device *sdev,
 			goto out;
 	}
 
-	if (h->tpgs & TPGS_MODE_EXPLICIT && h->state != TPGS_STATE_OPTIMIZED) {
+	if (h->tpgs & TPGS_MODE_EXPLICIT &&
+	    h->state != TPGS_STATE_OPTIMIZED &&
+	    h->state != TPGS_STATE_LBA_DEPENDENT) {
 		h->callback_fn = fn;
 		h->callback_data = data;
 		err = submit_stpg(h);
@@ -698,8 +704,11 @@ static int alua_prep_fn(struct scsi_device *sdev, struct request *req)
 	struct alua_dh_data *h = get_alua_data(sdev);
 	int ret = BLKPREP_OK;
 
-	if (h->state != TPGS_STATE_OPTIMIZED &&
-	    h->state != TPGS_STATE_NONOPTIMIZED) {
+	if (h->state == TPGS_STATE_TRANSITIONING)
+		ret = BLKPREP_DEFER;
+	else if (h->state != TPGS_STATE_OPTIMIZED &&
+		 h->state != TPGS_STATE_NONOPTIMIZED &&
+		 h->state != TPGS_STATE_LBA_DEPENDENT) {
 		ret = BLKPREP_KILL;
 		req->cmd_flags |= REQ_QUIET;
 	}

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: scsi_dh_alua: add missing transitioning state support
  2010-09-22 10:13               ` Hannes Reinecke
@ 2010-09-22 12:29                 ` Mike Snitzer
  2010-09-23  7:15                   ` Hannes Reinecke
  0 siblings, 1 reply; 14+ messages in thread
From: Mike Snitzer @ 2010-09-22 12:29 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Mike Christie, Nicholas A. Bellinger, James Bottomley, linux-scsi



On Wed, Sep 22 2010 at  6:13am -0400,
Hannes Reinecke <hare@suse.de> wrote:

> Mike Christie wrote:
> > On 09/21/2010 02:33 PM, Mike Snitzer wrote:
> >> 2) the patch also modified alua_rtpg() so implicit ALUA would retry
> >>     (just like explicit ALUA currently does) if TPGS_STATE_TRANSITIONING
> >>     - so why should we avoid retry for implicit but do it for explicit?
> > 
> > Leaving that for Hannes. I cannot think of a reason. Probably just did
> > not do it.
> 
> Finally I got around to answering this.
> 
> I've attached a patch which I made the other day which seems to work
> reasonably well.
> Looks better from my side, so if you agree I'll be sending it
> upstream properly.

Looks good for covering this "2)" change above.

But "1)" change you had concern with (in my previous patch) was that
alua_prep_fn would return BLKPREP_DEFER if TPGS_STATE_TRANSITIONING.
This was bad in that FS requests (like the directio path checker) would
always call ->prep_fn and that if TPGS_STATE_TRANSITIONING the RTPG
state would never be reevaluated.. leaving us stuck in BLKPREP_DEFER.

Seems I'm missing a new flow that proves this is no longer a concern.

Please advise, thanks.
Mike

> From d3f02c90db3e3177309b78726d082e17dd772ee2 Mon Sep 17 00:00:00 2001
> From: Hannes Reinecke <hare@suse.de>
> Date: Wed, 22 Sep 2010 12:09:07 +0200
> Subject: [PATCH] scsi_dh_alua: Handle all states correctly
> 
> For ALUA we should be handling all states, independent of whether
> is explicit or implicit. For 'Transitioning' we should be retry
> for a certain amount of time; after that an error should be
> returned.
> 
> Signed-off-by: Hannes Reinecke <hare@suse.de>
> 
> diff --git a/drivers/scsi/device_handler/scsi_dh_alua.c b/drivers/scsi/device_handler/scsi_dh_alua.c
> index 1a970a7..c6f57e3 100644
> --- a/drivers/scsi/device_handler/scsi_dh_alua.c
> +++ b/drivers/scsi/device_handler/scsi_dh_alua.c
...
> @@ -698,8 +704,11 @@ static int alua_prep_fn(struct scsi_device *sdev, struct request *req)
>  	struct alua_dh_data *h = get_alua_data(sdev);
>  	int ret = BLKPREP_OK;
>  
> -	if (h->state != TPGS_STATE_OPTIMIZED &&
> -	    h->state != TPGS_STATE_NONOPTIMIZED) {
> +	if (h->state == TPGS_STATE_TRANSITIONING)
> +		ret = BLKPREP_DEFER;
> +	else if (h->state != TPGS_STATE_OPTIMIZED &&
> +		 h->state != TPGS_STATE_NONOPTIMIZED &&
> +		 h->state != TPGS_STATE_LBA_DEPENDENT) {
>  		ret = BLKPREP_KILL;
>  		req->cmd_flags |= REQ_QUIET;
>  	}


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: scsi_dh_alua: add missing transitioning state support
  2010-09-22 12:29                 ` Mike Snitzer
@ 2010-09-23  7:15                   ` Hannes Reinecke
  2010-09-23 13:44                     ` Mike Snitzer
  0 siblings, 1 reply; 14+ messages in thread
From: Hannes Reinecke @ 2010-09-23  7:15 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Mike Christie, Nicholas A. Bellinger, James Bottomley, linux-scsi

[-- Attachment #1: Type: text/plain, Size: 1827 bytes --]

Mike Snitzer wrote:
> 
> On Wed, Sep 22 2010 at  6:13am -0400,
> Hannes Reinecke <hare@suse.de> wrote:
> 
>> Mike Christie wrote:
>>> On 09/21/2010 02:33 PM, Mike Snitzer wrote:
>>>> 2) the patch also modified alua_rtpg() so implicit ALUA would retry
>>>>     (just like explicit ALUA currently does) if TPGS_STATE_TRANSITIONING
>>>>     - so why should we avoid retry for implicit but do it for explicit?
>>> Leaving that for Hannes. I cannot think of a reason. Probably just did
>>> not do it.
>> Finally I got around to answering this.
>>
>> I've attached a patch which I made the other day which seems to work
>> reasonably well.
>> Looks better from my side, so if you agree I'll be sending it
>> upstream properly.
> 
> Looks good for covering this "2)" change above.
> 
> But "1)" change you had concern with (in my previous patch) was that
> alua_prep_fn would return BLKPREP_DEFER if TPGS_STATE_TRANSITIONING.
> This was bad in that FS requests (like the directio path checker) would
> always call ->prep_fn and that if TPGS_STATE_TRANSITIONING the RTPG
> state would never be reevaluated.. leaving us stuck in BLKPREP_DEFER.
> 
> Seems I'm missing a new flow that proves this is no longer a concern.
> 
The device_handler will keep retrying any path in 'transitioning'.
After ALUA_FAILOVER_TIMEOUT the device handler will either have
updated the port state to something other than 'transitioning' or
returned with 'SCSI_DH_RETRY'.
But yes, you are right; we should be setting the port to something
else than 'transitioning' then. Probably 'standby' is a good choice
here.
New patch attached.

Thanks for the review.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)

[-- Attachment #2: scsi-dh-alua-handle-all-states --]
[-- Type: text/plain, Size: 4832 bytes --]

>From db02ba3c33a4a1dbfcdc4de1b73e378f7b3d8925 Mon Sep 17 00:00:00 2001
From: Hannes Reinecke <hare@suse.de>
Date: Wed, 22 Sep 2010 12:09:07 +0200
Subject: [PATCH] scsi_dh_alua: Handle all states correctly

For ALUA we should be handling all states, independent of whether
is explicit or implicit. For 'Transitioning' we should be retry
for a certain amount of time; after that we're setting the port
to 'Standby' and return SCSI_DH_RETRY to signal upper layers
a retry is in order here.

Signed-off-by: Hannes Reinecke <hare@suse.de>

diff --git a/drivers/scsi/device_handler/scsi_dh_alua.c b/drivers/scsi/device_handler/scsi_dh_alua.c
index 1a970a7..c7e11ed 100644
--- a/drivers/scsi/device_handler/scsi_dh_alua.c
+++ b/drivers/scsi/device_handler/scsi_dh_alua.c
@@ -31,6 +31,7 @@
 #define TPGS_STATE_NONOPTIMIZED		0x1
 #define TPGS_STATE_STANDBY		0x2
 #define TPGS_STATE_UNAVAILABLE		0x3
+#define TPGS_STATE_LBA_DEPENDENT	0x4
 #define TPGS_STATE_OFFLINE		0xe
 #define TPGS_STATE_TRANSITIONING	0xf
 
@@ -39,6 +40,7 @@
 #define TPGS_SUPPORT_NONOPTIMIZED	0x02
 #define TPGS_SUPPORT_STANDBY		0x04
 #define TPGS_SUPPORT_UNAVAILABLE	0x08
+#define TPGS_SUPPORT_LBA_DEPENDENT	0x10
 #define TPGS_SUPPORT_OFFLINE		0x40
 #define TPGS_SUPPORT_TRANSITION		0x80
 
@@ -460,6 +462,8 @@ static char print_alua_state(int state)
 		return 'S';
 	case TPGS_STATE_UNAVAILABLE:
 		return 'U';
+	case TPGS_STATE_LBA_DEPENDENT:
+		return 'L';
 	case TPGS_STATE_OFFLINE:
 		return 'O';
 	case TPGS_STATE_TRANSITIONING:
@@ -542,7 +546,9 @@ static int alua_rtpg(struct scsi_device *sdev, struct alua_dh_data *h)
 	int len, k, off, valid_states = 0;
 	char *ucp;
 	unsigned err;
+	unsigned long expiry, interval = 10;
 
+	expiry = round_jiffies_up(jiffies + ALUA_FAILOVER_TIMEOUT);
  retry:
 	err = submit_rtpg(sdev, h);
 
@@ -553,7 +559,7 @@ static int alua_rtpg(struct scsi_device *sdev, struct alua_dh_data *h)
 			return SCSI_DH_IO;
 
 		err = alua_check_sense(sdev, &sense_hdr);
-		if (err == ADD_TO_MLQUEUE)
+		if (err == ADD_TO_MLQUEUE && time_before(jiffies, expiry))
 			goto retry;
 		sdev_printk(KERN_INFO, sdev,
 			    "%s: rtpg sense code %02x/%02x/%02x\n",
@@ -587,38 +593,37 @@ static int alua_rtpg(struct scsi_device *sdev, struct alua_dh_data *h)
 	}
 
 	sdev_printk(KERN_INFO, sdev,
-		    "%s: port group %02x state %c supports %c%c%c%c%c%c\n",
+		    "%s: port group %02x state %c supports %c%c%c%c%c%c%c\n",
 		    ALUA_DH_NAME, h->group_id, print_alua_state(h->state),
 		    valid_states&TPGS_SUPPORT_TRANSITION?'T':'t',
 		    valid_states&TPGS_SUPPORT_OFFLINE?'O':'o',
+		    valid_states&TPGS_SUPPORT_LBA_DEPENDENT?'L':'l',
 		    valid_states&TPGS_SUPPORT_UNAVAILABLE?'U':'u',
 		    valid_states&TPGS_SUPPORT_STANDBY?'S':'s',
 		    valid_states&TPGS_SUPPORT_NONOPTIMIZED?'N':'n',
 		    valid_states&TPGS_SUPPORT_OPTIMIZED?'A':'a');
 
-	if (h->tpgs & TPGS_MODE_EXPLICIT) {
-		switch (h->state) {
-		case TPGS_STATE_TRANSITIONING:
+	switch (h->state) {
+	case TPGS_STATE_TRANSITIONING:
+		if (time_before(jiffies, expiry)) {
 			/* State transition, retry */
+			interval *= 10;
+			msleep(interval);
 			goto retry;
-			break;
-		case TPGS_STATE_OFFLINE:
-			/* Path is offline, fail */
-			err = SCSI_DH_DEV_OFFLINED;
-			break;
-		default:
-			break;
 		}
-	} else {
-		/* Only Implicit ALUA support */
-		if (h->state == TPGS_STATE_OPTIMIZED ||
-		    h->state == TPGS_STATE_NONOPTIMIZED ||
-		    h->state == TPGS_STATE_STANDBY)
-			/* Useable path if active */
-			err = SCSI_DH_OK;
-		else
-			/* Path unuseable for unavailable/offline */
-			err = SCSI_DH_DEV_OFFLINED;
+		/* Transitioning time exceeded, set port to standby */
+		err = SCSI_DH_RETRY;
+		h->state = TPGS_STATE_STANDBY;
+		break;
+	case TPGS_STATE_OFFLINE:
+	case TPGS_STATE_UNAVAILABLE:
+		/* Path unuseable for unavailable/offline */
+		err = SCSI_DH_DEV_OFFLINED;
+		break;
+	default:
+		/* Useable path if active */
+		err = SCSI_DH_OK;
+		break;
 	}
 	return err;
 }
@@ -672,7 +677,9 @@ static int alua_activate(struct scsi_device *sdev,
 			goto out;
 	}
 
-	if (h->tpgs & TPGS_MODE_EXPLICIT && h->state != TPGS_STATE_OPTIMIZED) {
+	if (h->tpgs & TPGS_MODE_EXPLICIT &&
+	    h->state != TPGS_STATE_OPTIMIZED &&
+	    h->state != TPGS_STATE_LBA_DEPENDENT) {
 		h->callback_fn = fn;
 		h->callback_data = data;
 		err = submit_stpg(h);
@@ -698,8 +705,11 @@ static int alua_prep_fn(struct scsi_device *sdev, struct request *req)
 	struct alua_dh_data *h = get_alua_data(sdev);
 	int ret = BLKPREP_OK;
 
-	if (h->state != TPGS_STATE_OPTIMIZED &&
-	    h->state != TPGS_STATE_NONOPTIMIZED) {
+	if (h->state == TPGS_STATE_TRANSITIONING)
+		ret = BLKPREP_DEFER;
+	else if (h->state != TPGS_STATE_OPTIMIZED &&
+		 h->state != TPGS_STATE_NONOPTIMIZED &&
+		 h->state != TPGS_STATE_LBA_DEPENDENT) {
 		ret = BLKPREP_KILL;
 		req->cmd_flags |= REQ_QUIET;
 	}

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: scsi_dh_alua: add missing transitioning state support
  2010-09-23  7:15                   ` Hannes Reinecke
@ 2010-09-23 13:44                     ` Mike Snitzer
  2010-09-23 18:53                       ` Mike Snitzer
  0 siblings, 1 reply; 14+ messages in thread
From: Mike Snitzer @ 2010-09-23 13:44 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Mike Christie, Nicholas A. Bellinger, James Bottomley, linux-scsi

On Thu, Sep 23 2010 at  3:15am -0400,
Hannes Reinecke <hare@suse.de> wrote:

> The device_handler will keep retrying any path in 'transitioning'.
> After ALUA_FAILOVER_TIMEOUT the device handler will either have
> updated the port state to something other than 'transitioning' or
> returned with 'SCSI_DH_RETRY'.
> But yes, you are right; we should be setting the port to something
> else than 'transitioning' then. Probably 'standby' is a good choice
> here.
> New patch attached.
> 
> Thanks for the review.
> 
> Cheers,
> 
> Hannes
> -- 
> Dr. Hannes Reinecke		      zSeries & Storage
> hare@suse.de			      +49 911 74053 688
> SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
> GF: Markus Rex, HRB 16746 (AG Nürnberg)

> From db02ba3c33a4a1dbfcdc4de1b73e378f7b3d8925 Mon Sep 17 00:00:00 2001
> From: Hannes Reinecke <hare@suse.de>
> Date: Wed, 22 Sep 2010 12:09:07 +0200
> Subject: [PATCH] scsi_dh_alua: Handle all states correctly
> 
> For ALUA we should be handling all states, independent of whether
> is explicit or implicit. For 'Transitioning' we should be retry
> for a certain amount of time; after that we're setting the port
> to 'Standby' and return SCSI_DH_RETRY to signal upper layers
> a retry is in order here.
> 
> Signed-off-by: Hannes Reinecke <hare@suse.de>

Acked-by: Mike Snitzer <snitzer@redhat.com>

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: scsi_dh_alua: add missing transitioning state support
  2010-09-23 13:44                     ` Mike Snitzer
@ 2010-09-23 18:53                       ` Mike Snitzer
  0 siblings, 0 replies; 14+ messages in thread
From: Mike Snitzer @ 2010-09-23 18:53 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Mike Christie, Nicholas A. Bellinger, James Bottomley, linux-scsi

On Thu, Sep 23 2010 at  9:44am -0400,
Mike Snitzer <snitzer@redhat.com> wrote:

> > From db02ba3c33a4a1dbfcdc4de1b73e378f7b3d8925 Mon Sep 17 00:00:00 2001
> > From: Hannes Reinecke <hare@suse.de>
> > Date: Wed, 22 Sep 2010 12:09:07 +0200
> > Subject: [PATCH] scsi_dh_alua: Handle all states correctly
> > 
> > For ALUA we should be handling all states, independent of whether
> > is explicit or implicit. For 'Transitioning' we should be retry
> > for a certain amount of time; after that we're setting the port
> > to 'Standby' and return SCSI_DH_RETRY to signal upper layers
> > a retry is in order here.
> > 
> > Signed-off-by: Hannes Reinecke <hare@suse.de>
> 
> Acked-by: Mike Snitzer <snitzer@redhat.com>

BTW, we're missing the include for the msleep() introduced in this
patch:

#include <linux/delay.h>

James, any chance you'd add that?

Or should Hannes send a revised patch?

Mike

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2010-09-23 18:53 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-08-17 19:05 [PATCH] scsi_dh_alua: add missing transitioning state support Mike Snitzer
2010-08-17 19:23 ` Nicholas A. Bellinger
2010-08-30  9:36   ` Hannes Reinecke
2010-08-31 15:11     ` Mike Snitzer
2010-09-20 15:35       ` Mike Snitzer
2010-09-21  2:27         ` Mike Christie
2010-09-21  2:28           ` Mike Christie
2010-09-21 19:33           ` Mike Snitzer
2010-09-21 21:14             ` Mike Christie
2010-09-22 10:13               ` Hannes Reinecke
2010-09-22 12:29                 ` Mike Snitzer
2010-09-23  7:15                   ` Hannes Reinecke
2010-09-23 13:44                     ` Mike Snitzer
2010-09-23 18:53                       ` Mike Snitzer

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.