* Scsi Error handling query
@ 2015-03-26 13:38 Kashyap Desai
  2015-03-26 15:57 ` Hannes Reinecke
  0 siblings, 1 reply; 7+ messages in thread
From: Kashyap Desai @ 2015-03-26 13:38 UTC (permalink / raw)
  To: hare, linux-scsi

Hi Hannes,

I was going through one of the slides posted at the link below.

http://events.linuxfoundation.org/sites/events/files/slides/SCSI-EH.pdf

Slide #59 has the data below. I was trying to correlate it with the latest
upstream code, but there are a few things I do not understand. Does Linux
block I/O to the device and target before it actually starts legacy EH
recovery? Also, how does the Linux SCSI stack achieve task set abort?

Proposed SCSI EH strategy
• Send command aborts after timeout
• EH Recovery starts:
‒ Block I/O to the device
       ‒ Issue 'Task Set Abort'
‒ Block I/O to the target
       ‒ Issue I_T Nexus Reset
       ‒ Complete outstanding command on success
‒ Engage current EH strategy
       ‒ LUN Reset, Target Reset etc


Thanks, Kashyap
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: Scsi Error handling query
  2015-03-26 13:38 Scsi Error handling query Kashyap Desai
@ 2015-03-26 15:57 ` Hannes Reinecke
  2015-03-26 18:43   ` Kashyap Desai
  0 siblings, 1 reply; 7+ messages in thread
From: Hannes Reinecke @ 2015-03-26 15:57 UTC (permalink / raw)
  To: Kashyap Desai, linux-scsi

On 03/26/2015 02:38 PM, Kashyap Desai wrote:
> Hi Hannes,
> 
> I was going through one of the slide posted at below link.
> 
> http://events.linuxfoundation.org/sites/events/files/slides/SCSI-EH.pdf
> 
> Slide #59 has below data. I was trying to correlate with latest upstream
> code, but do not understand few things. Does Linux handle blocking I/O to
> the device and target before it actually start legacy EH recovery ?

Yes. This is handled by 'scsi_eh_scmd_add()', which adds the command
to the internal 'eh_entry' list and starts recovery once all
remaining outstanding commands are completed.
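
As a rough illustration of the hand-off described above, here is a minimal
user-space sketch in C. It is not kernel code: every name in it (model_host,
model_eh_cmd_add() and so on) is hypothetical and only models the idea that a
failed command is parked on a list and recovery starts once no healthy
commands remain outstanding.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names are kernel symbols. */
struct model_cmd {
    struct model_cmd *next_failed;   /* link in the host's failed-command list */
};

struct model_host {
    int busy;                        /* commands issued and not yet finished */
    int failed;                      /* commands parked for error handling */
    struct model_cmd *failed_list;   /* head of the failed-command list */
    bool in_recovery;                /* set once any command has failed */
    bool eh_running;                 /* stands in for waking the EH thread */
};

/* Start recovery only when every command still counted as busy has failed. */
static void model_eh_maybe_start(struct model_host *h)
{
    if (h->failed > 0 && h->busy == h->failed)
        h->eh_running = true;
}

/* A timed-out command is parked on the failed list. */
static void model_eh_cmd_add(struct model_host *h, struct model_cmd *c)
{
    h->in_recovery = true;
    c->next_failed = h->failed_list;
    h->failed_list = c;
    h->failed++;
    model_eh_maybe_start(h);
}

/* A healthy command completes normally. */
static void model_cmd_done(struct model_host *h)
{
    assert(h->busy > h->failed);     /* only non-failed commands can complete */
    h->busy--;
    model_eh_maybe_start(h);
}
```

With three commands in flight and one of them failing, the model only sets
eh_running after the other two have completed.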

> Also, how does linux scsi stack achieve task set abort ?
> 
Currently we don't :-)
The presentation was a roadmap about future EH updates.

> Proposed SCSI EH strategy
> • Send command aborts after timeout
> • EH Recovery starts:
> ‒ Block I/O to the device
>        ‒ Issue 'Task Set Abort'
> ‒ Block I/O to the target
>        ‒ Issue I_T Nexus Reset
>        ‒ Complete outstanding command on success
> ‒ Engage current EH strategy
>        ‒ LUN Reset, Target Reset etc
> 
The current plans for EH updates are:

- Convert eh_host_reset_handler() to take Scsi_Host as argument
  - Convert EH host reset to do a host rescan after try_host_reset()
    succeeded
  - Terminate failed scmds prior to calling try_host_reset()
  => with that we should be able to instantiate a quick failover
     when running under multipathing, as then I/Os will be returned
     prior to the host reset (which is known to take quite a long
     time)

- Convert the remaining eh_XXX_reset_handler() to take the
  appropriate structure as argument.
  This will require some work, as some EH handler implementations
  re-use the command tag (or even the actual command) for sending
  TMFs.

- Implementing a 'transport reset' EH function; to be called
  after the current EH LUN Reset

- Investigating the possibility of an asynchronous 'task set abort',
  and making the 'transport reset' EH function asynchronous, too.

I've got a patchset for the first step, but the others still require
some work ...
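
To make the ordering in the first step concrete, here is a small user-space
sketch in C. It is only a model under assumptions: the event names and
model_host_reset_stage() are hypothetical, and the real work (returning
commands to the upper layers, resetting the host, rescanning) is reduced to
log entries so that the sequence can be checked.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event log; the names are illustrative only. */
enum model_event { EV_CMD_RETURNED, EV_HOST_RESET, EV_RESCAN };

#define MODEL_MAX_EVENTS 16

struct model_log {
    enum model_event ev[MODEL_MAX_EVENTS];
    size_t n;
};

static void model_log_add(struct model_log *log, enum model_event e)
{
    assert(log->n < MODEL_MAX_EVENTS);
    log->ev[log->n++] = e;
}

/*
 * Model of the planned host-reset stage:
 *   1. hand the failed commands back first, so that a multipath layer
 *      could fail over without waiting for the reset;
 *   2. run the (slow) host reset;
 *   3. rescan the host only if the reset succeeded.
 * Returns 0 on success and -1 if the reset failed.
 */
static int model_host_reset_stage(struct model_log *log, int nr_failed,
                                  int reset_ok)
{
    for (int i = 0; i < nr_failed; i++)
        model_log_add(log, EV_CMD_RETURNED);

    model_log_add(log, EV_HOST_RESET);
    if (!reset_ok)
        return -1;

    model_log_add(log, EV_RESCAN);
    return 0;
}
```

The point of the ordering is that the failed I/O is returned before the long
reset begins, rather than after it.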

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		               zSeries & Storage
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

* RE: Scsi Error handling query
  2015-03-26 15:57 ` Hannes Reinecke
@ 2015-03-26 18:43   ` Kashyap Desai
  2015-03-27 16:02     ` Hannes Reinecke
  0 siblings, 1 reply; 7+ messages in thread
From: Kashyap Desai @ 2015-03-26 18:43 UTC (permalink / raw)
  To: Hannes Reinecke, linux-scsi

> -----Original Message-----
> From: Hannes Reinecke [mailto:hare@suse.de]
> Sent: Thursday, March 26, 2015 9:28 PM
> To: Kashyap Desai; linux-scsi@vger.kernel.org
> Subject: Re: Scsi Error handling query
>
> Yes. This is handled by 'scsi_eh_scmd_add()', which adds the command to
> the internal 'eh_entry' list and starts recovery once all remaining
> outstanding commands are completed.

Thanks Hannes! scsi_eh_scmd_add() moves the shost state to recovery, so
that means further I/O is blocked for the whole host, not just for the
device/target whose command timed out. Right?
As I understand it, the new SCSI error handling improvements will allow I/O
to the other devices attached to the host, blocking only the I/O that
belongs to the specific target.

Also, one more thing to clarify: the presentation used the term "task set
aborts". Does this mean task set abort is handled by traversing the complete
list of timed-out commands and sending an individual TASK ABORT for each?


* Re: Scsi Error handling query
  2015-03-26 18:43   ` Kashyap Desai
@ 2015-03-27 16:02     ` Hannes Reinecke
  2015-03-30 11:45       ` Kashyap Desai
  0 siblings, 1 reply; 7+ messages in thread
From: Hannes Reinecke @ 2015-03-27 16:02 UTC (permalink / raw)
  To: Kashyap Desai, linux-scsi

On 03/26/2015 07:43 PM, Kashyap Desai wrote:
> Thanks Hannes! scsi_eh_scmd_add() moves the shost state to recovery, so
> that means further I/O is blocked for the whole host, not just for the
> device/target whose command timed out. Right?
> As I understand it, the new SCSI error handling improvements will allow I/O
> to the other devices attached to the host, blocking only the I/O that
> belongs to the specific target.
> 
> Also, one more thing to clarify: the presentation used the term "task set
> aborts". Does this mean task set abort is handled by traversing the complete
> list of timed-out commands and sending an individual TASK ABORT for each?
> 
No. The idea was to send 'task set aborts' as a single TMF.
However, I'm not sure if I'll be going ahead with that one; once
we've issued a 'transport reset' the commands will be gone anyway.
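
The difference between the two approaches can be sketched in user-space C.
This is a model under assumptions, not driver code: the struct and function
names are hypothetical, and a "TMF" is reduced to a counter. The only facts
taken from SAM are that ABORT TASK addresses one command while ABORT TASK SET
addresses the whole task set of a logical unit in a single request.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of an outstanding command. */
struct model_cmd {
    unsigned lun;        /* logical unit the command was sent to */
    bool outstanding;    /* still owned by the target */
};

/*
 * Per-command approach: one ABORT TASK style request for every
 * outstanding command on the LUN. Returns the number of TMFs sent.
 */
static size_t model_abort_each(struct model_cmd *cmds, size_t n, unsigned lun)
{
    size_t tmfs = 0;

    for (size_t i = 0; i < n; i++) {
        if (cmds[i].lun == lun && cmds[i].outstanding) {
            cmds[i].outstanding = false;
            tmfs++;
        }
    }
    return tmfs;
}

/*
 * Single-TMF approach: one ABORT TASK SET style request; the target
 * itself clears every command in that LUN's task set.
 */
static size_t model_abort_task_set(struct model_cmd *cmds, size_t n,
                                   unsigned lun)
{
    for (size_t i = 0; i < n; i++) {
        if (cmds[i].lun == lun)
            cmds[i].outstanding = false;
    }
    return 1;
}
```

Both functions leave commands on other LUNs untouched; they differ only in
how many requests have to travel to the target.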

Cheers,

Hannes

* RE: Scsi Error handling query
  2015-03-27 16:02     ` Hannes Reinecke
@ 2015-03-30 11:45       ` Kashyap Desai
  2015-03-30 15:12         ` Hannes Reinecke
  0 siblings, 1 reply; 7+ messages in thread
From: Kashyap Desai @ 2015-03-30 11:45 UTC (permalink / raw)
  To: Hannes Reinecke, linux-scsi

> -----Original Message-----
> From: Hannes Reinecke [mailto:hare@suse.de]
> Sent: Friday, March 27, 2015 9:32 PM
> To: Kashyap Desai; linux-scsi@vger.kernel.org
> Subject: Re: Scsi Error handling query
>
> No. The idea was to send 'task set aborts' as a single TMF.

Thanks Hannes! OK, so the idea was to have a single TMF for "task set abort".
I am not sure how to frame my next question, but what if the Linux SCSI layer
traverses each I/O of one particular target and issues an individual task
abort? Don't we call that "task set aborts"? How should the LLD interface
for "task set aborts" as a single TMF? My understanding is that "task set
abort" will be internally converted to individual task aborts, either by the
SCSI layer or by the HBA FW.

> However, I'm not sure if I'll be going ahead with that one; once we've
> issued a 'transport reset' the commands will be gone anyway.

* Re: Scsi Error handling query
  2015-03-30 11:45       ` Kashyap Desai
@ 2015-03-30 15:12         ` Hannes Reinecke
  2015-03-31 13:33           ` Kashyap Desai
  0 siblings, 1 reply; 7+ messages in thread
From: Hannes Reinecke @ 2015-03-30 15:12 UTC (permalink / raw)
  To: Kashyap Desai, linux-scsi

On 03/30/2015 01:45 PM, Kashyap Desai wrote:
>> No. The idea was to send 'task set aborts' as a single TMF.
> 
> Thanks Hannes! OK, so the idea was to have a single TMF for "task set abort".
> I am not sure how to frame my next question, but what if the Linux SCSI layer
> traverses each I/O of one particular target and issues an individual task
> abort? Don't we call that "task set aborts"? How should the LLD interface
> for "task set aborts" as a single TMF? My understanding is that "task set
> abort" will be internally converted to individual task aborts, either by the
> SCSI layer or by the HBA FW.
> 
Why? There _is_ a 'task set abort' TMF defined in SAM.
If the firmware doesn't implement it, I'd have thought the respective
command would simply be failed?

However, at this point I'm not sure if 'task set abort' is actually
required; it _should_ be superseded by the new 'transport reset' EH
handler.
On the FC side this will translate into a relogin, which will
automatically abort all outstanding tasks.
SAS even has a dedicated TMF IT NEXUS LOSS, which looks like it
could be used here.
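
For orientation, the escalation order under discussion can be written down as
a small user-space C sketch. All names are hypothetical. The order abort, LUN
reset, target reset, bus reset, host reset mirrors the existing EH escalation;
placing the proposed 'transport reset' directly after the LUN reset follows
the plan quoted earlier in this thread, and keeping a separate target reset
after it is purely an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical escalation ladder, mildest stage first. */
enum model_stage {
    MODEL_ABORT_CMD,
    MODEL_LUN_RESET,
    MODEL_TRANSPORT_RESET,   /* proposed step; its placement is an assumption */
    MODEL_TARGET_RESET,
    MODEL_BUS_RESET,
    MODEL_HOST_RESET,
    MODEL_STAGE_COUNT        /* sentinel: nothing recovered the commands */
};

/* Callback type: returns true if the given stage recovered the commands. */
typedef bool (*model_stage_fn)(enum model_stage stage, void *ctx);

/*
 * Try each stage in order and return the first one that succeeds,
 * or MODEL_STAGE_COUNT if every stage fails.
 */
static enum model_stage model_escalate(model_stage_fn try_stage, void *ctx)
{
    for (int s = MODEL_ABORT_CMD; s < MODEL_STAGE_COUNT; s++) {
        if (try_stage((enum model_stage)s, ctx))
            return (enum model_stage)s;
    }
    return MODEL_STAGE_COUNT;
}

/* Example callback: succeeds once the stage reaches the threshold in ctx. */
static bool model_succeeds_from(enum model_stage stage, void *ctx)
{
    return stage >= *(const enum model_stage *)ctx;
}
```

If the transport reset already clears all outstanding tasks, the walk stops
there and a separate task set abort stage is not needed in this model.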

Cheers,

Hannes

* RE: Scsi Error handling query
  2015-03-30 15:12         ` Hannes Reinecke
@ 2015-03-31 13:33           ` Kashyap Desai
  0 siblings, 0 replies; 7+ messages in thread
From: Kashyap Desai @ 2015-03-31 13:33 UTC (permalink / raw)
  To: Hannes Reinecke, linux-scsi

> -----Original Message-----
> From: Hannes Reinecke [mailto:hare@suse.de]
> Sent: Monday, March 30, 2015 8:43 PM
> To: Kashyap Desai; linux-scsi@vger.kernel.org
> Subject: Re: Scsi Error handling query
>
> Why? There _is_ a 'task set abort' TMF defined in SAM.
> If the firmware doesn't implement it, I'd have thought the respective
> command would simply be failed?
Understood this part. A single "task set abort" TMF is what is being
addressed here.

Is there any harm in the low-level driver doing the following for LUN/target
reset?

Once the LLD enters the eh_abort_handler callback, it tries a "Task Set
Abort" as a single TMF (most likely converting the single "task set abort"
into a FW-specific format).
It waits for completion and checks whether things are really resolved. If
not, it issues a Target Reset (from the FW), which will be a SAS Link Hard
Reset.

This way the driver can abort the commands related to the associated I_T
nexus.
I initially thought Linux SCSI error handling actually does this, because it
traverses all timed-out commands from the error handling thread (which
would be almost equivalent to a single "task set abort" TMF).
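
The flow proposed above can be sketched as a small user-space model in C. It
is not driver code: model_fw, its two request functions and
model_lld_abort_handler() are hypothetical stand-ins, and whether the
firmware request succeeds is simply a flag. The sketch only shows the control
flow (try the single task set abort, fall back to the target reset) and does
not answer whether doing this inside eh_abort_handler is safe.

```c
#include <assert.h>
#include <stdbool.h>

/* Outcome of the modelled recovery attempt. */
enum model_result {
    MODEL_RECOVERED_BY_ABORT,
    MODEL_RECOVERED_BY_RESET,
    MODEL_NOT_RECOVERED
};

/* Hypothetical firmware stand-in: flags decide whether a request works. */
struct model_fw {
    bool task_set_abort_works;
    bool target_reset_works;
    int requests_sent;       /* how many requests reached the firmware */
};

/* Send one "task set abort" request and wait for its completion. */
static bool model_fw_task_set_abort(struct model_fw *fw)
{
    fw->requests_sent++;
    return fw->task_set_abort_works;
}

/* Send one target reset request and wait for its completion. */
static bool model_fw_target_reset(struct model_fw *fw)
{
    fw->requests_sent++;
    return fw->target_reset_works;
}

/* Two-step recovery: escalate to target reset only if the abort fails. */
static enum model_result model_lld_abort_handler(struct model_fw *fw)
{
    if (model_fw_task_set_abort(fw))
        return MODEL_RECOVERED_BY_ABORT;

    if (model_fw_target_reset(fw))
        return MODEL_RECOVERED_BY_RESET;

    return MODEL_NOT_RECOVERED;
}
```

In the good case only one request is sent; the heavier reset is reached only
when the abort did not resolve things.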

>
> However, at this point I'm not sure if 'task set abort' is actually
> required; it _should_ be superseded by the new 'transport reset' EH handler.
> On the FC side this will translate into a relogin, which will
> automatically abort all outstanding tasks.
> SAS even has a dedicated TMF IT NEXUS LOSS, which looks like it could be
> used here.

I am not sure whether my comment/query above is along the same lines.

