* SCSI TMF processing; tag allocation
@ 2010-11-06  6:02 Luben Tuikov
  2010-11-13 12:37 ` Matthew Wilcox
  2010-11-15 18:53 ` Douglas Gilbert
  0 siblings, 2 replies; 11+ messages in thread
From: Luben Tuikov @ 2010-11-06  6:02 UTC (permalink / raw)
  To: Greg KH, linux-scsi, linux-kernel, tj, jens.axboe, James.Bottomley

Consider the following scenario:

Tags 2, 3, ..., X are pending in the device, NOT necessarily submitted in that order (this is the "task set").

Tag 8 times out and is aborted via TMF ABORT TASK. The device immediately returns okay, and the driver tells SCSI Core that all is well.

SCSI Core immediately sends a TUR, reusing the tag, tag 8.

Then the rest of the commands that were pending in the task set complete successfully (2, 3, ..., X, and 8), in the order they were submitted: tasks 2 to X are I/O commands, and 8, being a TUR, has no implicit HEAD OF QUEUE attribute, so it completes last. Now the device task set is empty. The Linux device driver completes all of those commands to SCSI Core successfully (meaning done() was called with a result of 0).

SCSI Core, though, continues to abort the rest of the task set, tags 2, ..., X. The device says okay, since they are no longer in its task set. So SCSI Core's error "handling" routine sends ABORT TASK for tag 3. *BUT* immediately after the TMF we see a TUR with, guess what, tag 3, since as far as the SCSI/block layer is concerned, tag 3 is free (bitmap in the block layer).

Sure enough, the device receives both, back to back: first the ABORT TASK
TMF with a tag of task to be managed (TTBM) of 3, and then the TUR
with tag 3. And sure enough, the device aborts the TUR.

The TUR times out in SCSI Core/the block layer, and SCSI Core tries to
abort it again, sending another ABORT TASK TMF with a TTBM of 3, that of
the TUR.

Meanwhile, error handling goes on, sending ABORT TASK TMFs for the rest of the commands in the task set, and at the end it sends LU RESET and I_T NEXUS RESET (well, the driver does, but nevertheless).

There are several issues here, involving people at various layers (in no particular order):

First, SCSI Core should probably send ABORT TASK SET. Sending ABORT TASK for each task at the device is also okay, but adds transport overhead.

Second, SCSI Core should not send ABORT TASK for a completed task. Sure, the device will reply with FUNCTION COMPLETE either way, but in the
aforementioned case the task server aborts a command that was never
meant to be aborted (because the tag was reused; see below). That is,
if a command is in the "eh" queue and has completed, don't send ABORT
TASK for it.
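A minimal sketch of the check proposed in the second point, assuming a hypothetical `struct eh_cmd` that records whether done() has already been called for each command queued for error handling. `send_abort_task()` is a hypothetical stand-in for however an LLDD issues the TMF; it is not a SCSI midlayer API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-command state kept by error handling. */
struct eh_cmd {
	unsigned int tag;
	bool completed;		/* done() has already been called */
};

/* Hypothetical transport hook; records the TTBM of the last TMF sent. */
static unsigned int last_ttbm;

static void send_abort_task(unsigned int ttbm)
{
	/* A real LLDD would build and queue the ABORT TASK TMF here. */
	last_ttbm = ttbm;
}

/*
 * Send ABORT TASK only for commands that have not completed; the tag of
 * a completed command may already belong to a new command (e.g. a TUR).
 * Returns the number of TMFs sent.
 */
static unsigned int eh_abort_pending(const struct eh_cmd *cmds, size_t n)
{
	unsigned int sent = 0;

	assert(cmds != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (cmds[i].completed)
			continue;
		send_abort_task(cmds[i].tag);
		sent++;
	}
	return sent;
}
```

With commands 2 and 3 already completed and only 8 outstanding, a single ABORT TASK (TTBM 8) goes out.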

Third, and most importantly, tags should form an increasing sequence, and a tag should not be reused until all the other tags after it and before it have been reused. This can be accomplished, for example, by calling find_next_zero_bit() with a non-zero "offset", namely the last allocated
tag, in a modulo-the-number-of-tags manner. That is, either find_next_zero_bit()
could wrap around when started from an offset, or the caller, blk_queue_start_tag(),
could implement the wrap via two calls to this function.
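A userspace sketch of the wrap-around allocation described in the third point. `next_zero()` is a hypothetical stand-in with the same return convention as the kernel's find_next_zero_bit() (it returns `size` when no clear bit is found), and `struct tag_map`, `tag_alloc()` and `tag_free()` are illustrative names, not block-layer code. The search starts just past the last allocated tag; if it runs off the end, a second scan covers `[0, start)`.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_TAGS 32u	/* hypothetical queue depth */

struct tag_map {
	bool busy[NR_TAGS];	/* true = tag in use */
	unsigned int last;	/* last tag handed out */
};

/* Stand-in for find_next_zero_bit(): first clear index >= offset,
 * or size if there is none. */
static unsigned int next_zero(const bool *busy, unsigned int size,
			      unsigned int offset)
{
	for (unsigned int i = offset; i < size; i++)
		if (!busy[i])
			return i;
	return size;
}

/* Round-robin allocation: search from just past the last tag handed
 * out, then wrap and search [0, start). Returns -1 if all tags are busy. */
static int tag_alloc(struct tag_map *m)
{
	unsigned int start = (m->last + 1) % NR_TAGS;
	unsigned int tag = next_zero(m->busy, NR_TAGS, start);

	if (tag == NR_TAGS) {
		tag = next_zero(m->busy, start, 0);	/* second call: the wrap */
		if (tag == start)
			return -1;
	}
	m->busy[tag] = true;
	m->last = tag;
	return (int)tag;
}

static void tag_free(struct tag_map *m, unsigned int tag)
{
	assert(tag < NR_TAGS && m->busy[tag]);
	m->busy[tag] = false;
}
```

Initialising `last` to `NR_TAGS - 1` makes the first allocation return tag 0. Unlike a lowest-free-bit search, a freshly freed tag is not handed out again until the search position has wrapped around to it.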

Fourth, transport protocols need tags for purposes other than sending
I/O commands, for example for sending task management functions. LLDDs should
be given callbacks to allocate a free tag, but only if #3 above is implemented.

Fifth, all commands that enter queuecommand() should be tagged, regardless
of whether the device supports tags, and of how many. At the moment this
isn't so, and transports are forced to reserve the first tag the
transport supports for untagged commands and the rest for tagged
commands; for example, INQUIRY arrives untagged (tag 0), and then a
READ arrives with tag 0 (tagged). This adds work for LLDDs, which must
check whether the "request" is tagged and, if it is not, assign it a tag,
or, if it is, offset its tag by the number of tags reserved
for untagged tasks (normally just one).
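To make that extra LLDD work concrete, here is a minimal sketch assuming a hypothetical transport whose tag 0 is reserved for the single untagged command, with block-layer tags shifted up by one. Without the shift, the untagged INQUIRY and a READ with block tag 0 would both go out as tag 0. `transport_tag()` and `NR_UNTAGGED_RESERVED` are illustrative names only.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical: transport tag 0 is reserved for the one untagged
 * command; block-layer tags are shifted up past the reserved slot. */
#define NR_UNTAGGED_RESERVED 1u

static unsigned int transport_tag(bool tagged, unsigned int blk_tag,
				  unsigned int nr_transport_tags)
{
	if (!tagged)
		return 0;	/* the reserved slot; blk_tag is meaningless */
	assert(blk_tag + NR_UNTAGGED_RESERVED < nr_transport_tags);
	return blk_tag + NR_UNTAGGED_RESERVED;
}
```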

    Luben


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: SCSI TMF processing; tag allocation
  2010-11-06  6:02 SCSI TMF processing; tag allocation Luben Tuikov
@ 2010-11-13 12:37 ` Matthew Wilcox
  2010-11-15  9:28   ` Jens Axboe
  2010-11-16 20:28   ` Vladislav Bolkhovitin
  2010-11-15 18:53 ` Douglas Gilbert
  1 sibling, 2 replies; 11+ messages in thread
From: Matthew Wilcox @ 2010-11-13 12:37 UTC (permalink / raw)
  To: Luben Tuikov
  Cc: Greg KH, linux-scsi, linux-kernel, tj, James Bottomley, Jens Axboe

On Fri, Nov 05, 2010 at 11:02:43PM -0700, Luben Tuikov wrote:
> Sure enough, the device receives both, back to back: first the ABORT TASK
> TMF with tag of task to be managed (TTBM) of 3, and then it receives TUR
> with tag of 3. Sure enough the device aborts the TUR.

I think that's acceptable device behaviour, but it's not guaranteed
device behaviour, as far as I can tell.  Receiving the abort and then
the task isn't guaranteed to abort the task.

> First, SCSI Core should probably send ABORT TASK SET. Sending ABORT TASK for each task at the device is also okay, but adds transport overhead.

Agreed.

> Second, SCSI Core should not send ABORT TASK for a completed task. Sure, the device will reply with FUNCTION COMPLETE either way, but in the
> aforementioned case the task server aborts a command which wasn't
> intended to be aborted (and since the tag was reused--see below). That is,
> if the command is in the "eh" queue, and is completed, don't send ABORT
> TASK for it.

Agreed.

> Third, and most importantly, tags should form an increasing sequence and should not be reused until all other tags after it and before it have been reused. This for example can be accomplished if one were to use
> find_next_zero_bit() with non-zero "offset", it being the last allocated
> tag in a modulo the number of tags manner. That is, find_next_zero_bit()
> could wrap as well as starting from an offset or the caller could implement
> that via two calls to this function, in blk_queue_start_tag().

It might be more efficient too.  If we cycle through, we can start by
just trying to assign the next tag; it will likely succeed.  If it's
already assigned, then we can search.
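A sketch of the fast path suggested here: try the tag after the last one handed out, and fall back to a search only when it is busy. `struct tag_ring` and its functions are hypothetical; the `searches` counter exists only to show how often the fast path misses.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_TAGS 16u	/* hypothetical queue depth */

struct tag_ring {
	bool busy[NR_TAGS];
	unsigned int next;	/* tag to try first */
	unsigned int searches;	/* times the fast path missed */
};

static int ring_alloc(struct tag_ring *r)
{
	unsigned int tag = r->next;

	if (r->busy[tag]) {
		/* Slow path: scan the remaining tags, wrapping once. */
		unsigned int i;

		r->searches++;
		for (i = 1; i < NR_TAGS; i++) {
			tag = (r->next + i) % NR_TAGS;
			if (!r->busy[tag])
				break;
		}
		if (i == NR_TAGS)
			return -1;	/* every tag is busy */
	}
	r->busy[tag] = true;
	r->next = (tag + 1) % NR_TAGS;
	return (int)tag;
}

static void ring_free(struct tag_ring *r, unsigned int tag)
{
	assert(tag < NR_TAGS && r->busy[tag]);
	r->busy[tag] = false;
}
```

When commands complete roughly in issue order, the tag at `next` is usually free, so most allocations are a single bit test.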

> Fourth, transport protocols need tags for other purposes than just sending
> I/O commands, for example sending task management functions. LLDDs should
> be given callbacks to allocate a free tag, only if #3 above is implemented.

Definitely!

> Fifth, all commands that enter queuecommand() should be tagged, regardless
> of whether the device supports tags and how many. At the moment, this
> isn't so, and transports are forced to reserve the first tag the
> transport supports for non-tagged commands and the rest for tagged
> commands. For example INQUIRY coming untagged (tag 0), and then
> READ coming with tag 0 (tagged). This adds additional work for LLDDs to
> check whether the "request" is tagged or not and assign it a tag if
> it is not, or offset the tag if it is tagged with the offset reserved
> for untagged tasks (normally just one).

This one confuses me, though.  I've not seen it happen.  As far as I can
tell, the SCSI layer will either send a single untagged command, or it
will send one-or-more tagged commands.  I haven't been able to provoke it
into sending an untagged command _and_ a tagged command at the same time.

For UAS, I didn't reserve a tag; I just use tag 0 for untagged
commands, and refuse to send a tagged command if there's an untagged
command outstanding.  I could put a printk in to see if that logic
ever triggers...
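The UAS policy described above could look roughly like this. It is a sketch, not the uas driver's code: `struct uas_dev`, `uas_try_issue()` and `uas_complete()` are hypothetical, and refusing a second untagged command while tag 0 is busy is an added assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-device state. */
struct uas_dev {
	bool untagged_outstanding;	/* tag 0 in use */
	unsigned int tagged_outstanding;
};

/* Returns true if the command may be sent now. Untagged commands use
 * tag 0; tagged commands are held back while one is outstanding. */
static bool uas_try_issue(struct uas_dev *d, bool tagged)
{
	if (tagged) {
		if (d->untagged_outstanding)
			return false;
		d->tagged_outstanding++;
		return true;
	}
	if (d->untagged_outstanding)
		return false;		/* assumption: tag 0 is single-use */
	d->untagged_outstanding = true;
	return true;
}

static void uas_complete(struct uas_dev *d, bool tagged)
{
	if (tagged) {
		assert(d->tagged_outstanding > 0);
		d->tagged_outstanding--;
	} else {
		assert(d->untagged_outstanding);
		d->untagged_outstanding = false;
	}
}
```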

SPC-4 seems to indicate (in the INQUIRY command, 6.4.2) that support
for queueing is now non-optional, and even support for basic queueing
has been removed, replaced only with the 'command management model'.
I must confess to being somewhat confused by the difference between the
'basic queue' model and 'command management' model, but ideally as a
driver writer, I shouldn't have to understand it (because it would be
handled for me by the SCSI core).

-- 
Matthew Wilcox				Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."

* Re: SCSI TMF processing; tag allocation
  2010-11-13 12:37 ` Matthew Wilcox
@ 2010-11-15  9:28   ` Jens Axboe
  2010-11-15 14:33     ` James Bottomley
  2010-11-16 20:28   ` Vladislav Bolkhovitin
  1 sibling, 1 reply; 11+ messages in thread
From: Jens Axboe @ 2010-11-15  9:28 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Luben Tuikov, Greg KH, linux-scsi, linux-kernel, tj, James Bottomley

>> Third, and most importantly, tags should form an increasing sequence and should not be reused until all other tags after it and before it have been reused. This for example can be accomplished if one were to use
>> find_next_zero_bit() with non-zero "offset", it being the last allocated
>> tag in a modulo the number of tags manner. That is, find_next_zero_bit()
>> could wrap as well as starting from an offset or the caller could implement
>> that via two calls to this function, in blk_queue_start_tag().

Care to explain your reasoning? Is it for starvation issues? I, for one, am not
aware of any correctness issues in that regard, but cycling tags in
this fashion seems like a good idea, if only to prevent starvation by
an ill-behaved device.

-- 
Jens Axboe


* Re: SCSI TMF processing; tag allocation
  2010-11-15  9:28   ` Jens Axboe
@ 2010-11-15 14:33     ` James Bottomley
  2010-11-15 14:40       ` Alan Cox
  2010-11-15 14:46       ` Matthew Wilcox
  0 siblings, 2 replies; 11+ messages in thread
From: James Bottomley @ 2010-11-15 14:33 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Matthew Wilcox, Luben Tuikov, Greg KH, linux-scsi, linux-kernel, tj

On Mon, 2010-11-15 at 10:28 +0100, Jens Axboe wrote:
> >> Third, and most importantly, tags should form an increasing sequence and should not be reused until all other tags after it and before it have been reused. This for example can be accomplished if one were to use
> >> find_next_zero_bit() with non-zero "offset", it being the last allocated
> >> tag in a modulo the number of tags manner. That is, find_next_zero_bit()
> >> could wrap as well as starting from an offset or the caller could implement
> >> that via two calls to this function, in blk_queue_start_tag().
> 
> Care to explain your reasoning? For starvation issues? At least I'm not
> aware of any correctness issues in that regard, but doing tag cycling in
> this fashion seems like a good idea just to prevent starvation alone by
> an ill behaving device.

Right, it's the clock algorithm to prevent tag starvation.  If you have
hands representing the first and last tag and they're never allowed to
cross, the device can't starve any tag for too long because eventually
it will be the only outstanding command.
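A minimal sketch of a clock-style allocator as described here, not the former 53c700 code: tags are handed out strictly in order at the head hand, and the tail hand only advances past tags that have completed. If the device starves the oldest command, the head eventually catches up with the tail and allocation stalls until that command completes. `struct clock_tags` and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_TAGS 8u	/* hypothetical queue depth */

struct clock_tags {
	bool busy[NR_TAGS];
	unsigned int head;	/* next tag to hand out */
	unsigned int tail;	/* oldest tag not yet reclaimed */
	unsigned int used;	/* tags between tail and head */
};

static int clock_alloc(struct clock_tags *c)
{
	if (c->used == NR_TAGS)
		return -1;	/* hands have met: wait for the oldest tag */
	unsigned int tag = c->head;

	c->busy[tag] = true;
	c->head = (c->head + 1) % NR_TAGS;
	c->used++;
	return (int)tag;
}

static void clock_complete(struct clock_tags *c, unsigned int tag)
{
	assert(tag < NR_TAGS && c->busy[tag]);
	c->busy[tag] = false;
	/* Sweep the tail past completed tags; it stops at the oldest
	 * command still outstanding. */
	while (c->used > 0 && !c->busy[c->tail]) {
		c->tail = (c->tail + 1) % NR_TAGS;
		c->used--;
	}
}
```

If tags 1 to 7 complete while tag 0 is starved, `clock_alloc()` returns -1 until tag 0 completes; the tail then sweeps past all eight tags at once.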

It's not the only algorithm, however.  Banging down an ordered tag every
200 or so commands has exactly the same effect.  In fact, the clock
algorithm is what the 53c700 driver used (before it was converted to
generic tags), and the ordered tag approach is what aic7xxx uses.

Realistically, tag starvation isn't really a problem.  It was a known
issue for 80s era hardware.  I've got some of the oldest drives on the
planet and I didn't see a problem when the clock algorithm was removed
from 53c700.

James



* Re: SCSI TMF processing; tag allocation
  2010-11-15 14:33     ` James Bottomley
@ 2010-11-15 14:40       ` Alan Cox
  2010-11-15 14:53         ` James Bottomley
  2010-11-15 14:46       ` Matthew Wilcox
  1 sibling, 1 reply; 11+ messages in thread
From: Alan Cox @ 2010-11-15 14:40 UTC (permalink / raw)
  To: James Bottomley
  Cc: Jens Axboe, Matthew Wilcox, Luben Tuikov, Greg KH, linux-scsi,
	linux-kernel, tj

> Realistically, tag starvation isn't really a problem.  It was a known
> issue for 80s era hardware.  I've got some of the oldest drives on the
> planet and I didn't see a problem when the clock algorithm was removed
> from 53c700.

I'm not entirely sure that is the case, at least for some early NCQ-capable
ATA drives 8(

* Re: SCSI TMF processing; tag allocation
  2010-11-15 14:33     ` James Bottomley
  2010-11-15 14:40       ` Alan Cox
@ 2010-11-15 14:46       ` Matthew Wilcox
  2010-11-15 14:52         ` James Bottomley
  1 sibling, 1 reply; 11+ messages in thread
From: Matthew Wilcox @ 2010-11-15 14:46 UTC (permalink / raw)
  To: James Bottomley
  Cc: Jens Axboe, Luben Tuikov, Greg KH, linux-scsi, linux-kernel, tj

On Mon, Nov 15, 2010 at 08:33:15AM -0600, James Bottomley wrote:
> Right, it's the clock algorithm to prevent tag starvation.  If you have
> hands representing the first and last tag and they're never allowed to
> cross, the device can't starve any tag for too long because eventually
> it will be the only outstanding command.
> 
> It's not the only algorithm however.  Banging down an ordered tag every
> 200 or so commands has exactly the same effect.  In fact the clock
> algorithm was what the 53c700 driver used (before it was converted to
> generic tags) and the ordered tag what aic7xxx uses.
> 
> Realistically, tag starvation isn't really a problem.  It was a known
> issue for 80s era hardware.  I've got some of the oldest drives on the
> planet and I didn't see a problem when the clock algorithm was removed
> from 53c700.

The problem is that each driver is solving the problem in its own way
right now, which is clearly daft.  And no drive manufactured in the past
fifteen years supports ordered tags anyway, so they're only a placebo
at this point.

* Re: SCSI TMF processing; tag allocation
  2010-11-15 14:46       ` Matthew Wilcox
@ 2010-11-15 14:52         ` James Bottomley
  0 siblings, 0 replies; 11+ messages in thread
From: James Bottomley @ 2010-11-15 14:52 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Jens Axboe, Luben Tuikov, Greg KH, linux-scsi, linux-kernel, tj

On Mon, 2010-11-15 at 07:46 -0700, Matthew Wilcox wrote:
> On Mon, Nov 15, 2010 at 08:33:15AM -0600, James Bottomley wrote:
> > Right, it's the clock algorithm to prevent tag starvation.  If you have
> > hands representing the first and last tag and they're never allowed to
> > cross, the device can't starve any tag for too long because eventually
> > it will be the only outstanding command.
> > 
> > It's not the only algorithm however.  Banging down an ordered tag every
> > 200 or so commands has exactly the same effect.  In fact the clock
> > algorithm was what the 53c700 driver used (before it was converted to
> > generic tags) and the ordered tag what aic7xxx uses.
> > 
> > Realistically, tag starvation isn't really a problem.  It was a known
> > issue for 80s era hardware.  I've got some of the oldest drives on the
> > planet and I didn't see a problem when the clock algorithm was removed
> > from 53c700.
> 
> The problem is that each driver is solving the problem in its own way
> right now, which is clearly daft.  And no drive manufactured in the past
> fifteen years supports ordered tags anyway, so they're only a placebo
> at this point.

Actually, most drivers ignore the issue ... which is why I don't think
there's much of a problem.  The aic7xxx is the only one I know doing
something about this.  Like I said, I removed the clock algorithm from
53c700 and haven't had a problem (yet, I suppose).

James



* Re: SCSI TMF processing; tag allocation
  2010-11-15 14:40       ` Alan Cox
@ 2010-11-15 14:53         ` James Bottomley
  0 siblings, 0 replies; 11+ messages in thread
From: James Bottomley @ 2010-11-15 14:53 UTC (permalink / raw)
  To: Alan Cox
  Cc: Jens Axboe, Matthew Wilcox, Luben Tuikov, Greg KH, linux-scsi,
	linux-kernel, tj

On Mon, 2010-11-15 at 14:40 +0000, Alan Cox wrote:
> > Realistically, tag starvation isn't really a problem.  It was a known
> > issue for 80s era hardware.  I've got some of the oldest drives on the
> > planet and I didn't see a problem when the clock algorithm was removed
> > from 53c700.
> 
> I'm not entirely sure that is the case at least for some early NCQ
> capable ATA drives 8(

OK, since ATA seems to be reinventing all sorts of bad 80s SCSI
behaviour, I'll concede this point.

James



* Re: SCSI TMF processing; tag allocation
  2010-11-06  6:02 SCSI TMF processing; tag allocation Luben Tuikov
  2010-11-13 12:37 ` Matthew Wilcox
@ 2010-11-15 18:53 ` Douglas Gilbert
  2010-11-15 19:09   ` Matthew Wilcox
  1 sibling, 1 reply; 11+ messages in thread
From: Douglas Gilbert @ 2010-11-15 18:53 UTC (permalink / raw)
  To: ltuikov
  Cc: Greg KH, linux-scsi, linux-kernel, tj, jens.axboe, James.Bottomley

On 10-11-06 02:02 AM, Luben Tuikov wrote:
> Here is the following scenario:
>
> Tags 2, 3, ..., X are pending in the device, NOT necessarily submitted in that order (this is the "task set").
>
> Tag 8 times out and is aborted via TMF ABORT TASK. Immediately the device returns okay and the driver says to SCSI core that all is well.
>
> SCSI Core sends TUR immediately, reusing the tag, tag 8.
>
> Then the rest of the commands that were pending in the task set complete with success (2, 3, ..., X, and 8), in the order they were submitted, since tasks 2 to X are I/O commands and 8 being TUR has no implicit HEAD OF QUEUE it completes last with success. Now the device task set is empty. Those commands are completed okay with the Linux device driver with SCSI Core (meaning done() were called, result is 0).

Just checked: the SCSI commands with an "implicit head of
queue" attribute are:
   - INQUIRY
   - REPORT LUNS
   - READ CAPACITY (10+16) [SBC]
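The list above can be expressed as a CDB check. The opcodes are the standard values (INQUIRY 12h, READ CAPACITY (10) 25h, REPORT LUNS A0h, and READ CAPACITY (16) as SERVICE ACTION IN (16), 9Eh, with service action 10h), but the macro names and `implicit_head_of_queue()` are hypothetical, and the check covers only the commands named here. TEST UNIT READY (opcode 00h) is not among them, consistent with the TUR in the original scenario completing behind the queued I/O.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define OP_INQUIRY		0x12
#define OP_READ_CAPACITY_10	0x25
#define OP_SERVICE_ACTION_IN_16	0x9e	/* carries READ CAPACITY (16) */
#define SA_READ_CAPACITY_16	0x10
#define OP_REPORT_LUNS		0xa0

static bool implicit_head_of_queue(const uint8_t *cdb, size_t len)
{
	assert(cdb != NULL && len > 0);
	switch (cdb[0]) {
	case OP_INQUIRY:
	case OP_REPORT_LUNS:
	case OP_READ_CAPACITY_10:
		return true;
	case OP_SERVICE_ACTION_IN_16:
		/* The service action is in bits 4:0 of byte 1. */
		return len > 1 && (cdb[1] & 0x1f) == SA_READ_CAPACITY_16;
	default:
		return false;
	}
}
```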

On the subject of UAS, due to some USB-3 streaming protocol
requirements, three tag values cannot be used by SCSI. They
are 0h, FFFEh and FFFFh. So those available for SCSI tag
use are 1h to FFFDh inclusive.

Doug Gilbert


* Re: SCSI TMF processing; tag allocation
  2010-11-15 18:53 ` Douglas Gilbert
@ 2010-11-15 19:09   ` Matthew Wilcox
  0 siblings, 0 replies; 11+ messages in thread
From: Matthew Wilcox @ 2010-11-15 19:09 UTC (permalink / raw)
  To: Douglas Gilbert
  Cc: ltuikov, Greg KH, linux-scsi, linux-kernel, tj, jens.axboe,
	James.Bottomley

On Mon, Nov 15, 2010 at 01:53:28PM -0500, Douglas Gilbert wrote:
> On the subject of UAS, due to some USB-3 streaming protocol
> requirements, three tag values cannot be used by SCSI. They
> are 0h, FFFEh and FFFFh. So those available for SCSI tag
> use are 1h to FFFDh inclusive.

I add 1 to the tag value handed to me by the block layer, and use that
as the tag value that I tell the UAS device this command has.  So if I
ask for 256 tags (0-255), I'll hand values 1-256 to the device.
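A sketch of this mapping combined with the UAS restrictions from the message being replied to: block-layer tag `t` becomes UAS tag `t + 1`, and a valid UAS tag must lie in 1h to FFFDh. The names are hypothetical. Under this scheme, N block-layer tags (0 to N-1) map to UAS tags 1 to N, so N can be at most FFFDh.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define UAS_TAG_MIN 0x0001u
#define UAS_TAG_MAX 0xfffdu	/* 0h, FFFEh and FFFFh are unusable */

static bool uas_tag_valid(unsigned int tag)
{
	return tag >= UAS_TAG_MIN && tag <= UAS_TAG_MAX;
}

/* Map a 0-based block-layer tag to a UAS tag by adding 1. */
static uint16_t uas_tag_from_blk(unsigned int blk_tag)
{
	assert(blk_tag < UAS_TAG_MAX);	/* keeps the result <= FFFDh */
	return (uint16_t)(blk_tag + 1);
}
```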

* Re: SCSI TMF processing; tag allocation
  2010-11-13 12:37 ` Matthew Wilcox
  2010-11-15  9:28   ` Jens Axboe
@ 2010-11-16 20:28   ` Vladislav Bolkhovitin
  1 sibling, 0 replies; 11+ messages in thread
From: Vladislav Bolkhovitin @ 2010-11-16 20:28 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Luben Tuikov, Greg KH, linux-scsi, linux-kernel, tj,
	James Bottomley, Jens Axboe

Matthew Wilcox, on 11/13/2010 03:37 PM wrote:
> SPC-4 seems to indicate (in the INQUIRY command, 6.4.2) that support
> for queueing is now non-optional, and even support for basic queueing
> has been removed, replaced only with the 'command management model'.
> I must confess to being somewhat confused by the difference between the
> 'basic queue' model and 'command management' model

The 'command management model' is what was called the 'full task management
model' in SAM-3. It differs from SAM-3's 'basic task management model'
in that (basically) in the latter:

 - Only SIMPLE or ORDERED commands are supported (not both)

 - The reordering of SIMPLE commands can't be controlled

 - On any error, all queued commands must be aborted, and this behavior
can't be controlled.

Vlad

