All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] Fix SCSI & USB Storage CHECK CONDITION handling
@ 2019-09-26 22:08 Damien Le Moal
  2019-09-26 22:08 ` [PATCH 1/2] scsi: save/restore command resid for error handling Damien Le Moal
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Damien Le Moal @ 2019-09-26 22:08 UTC (permalink / raw)
  To: linux-scsi, Martin K . Petersen, linux-usb, usb-storage,
	Alan Stern, Greg Kroah-Hartman
  Cc: Justin Piszcz

If a non-passthrough command is terminated with a CHECK CONDITION, the
scsi error recovery code reuses the failed command scsi_cmnd structure
to process error recovery request sense. To preserve information
regarding the failed command, the functions scsi_eh_prep_cmnd() and
scsi_eh_restore_cmnd() respectively save and restore the original
command information. However, the resid field of the failed command
request structure is not preserved and reused for the request sense
handling, leading to the original command having an incorrect resid
when:
A) The command is not retried and terminated with an error
B) The command completes after retry and the underlying LLD does not set
   resid for a fully completed command (resid=0)

The first patch of this series addresses case (A) above by adding resid
as part of the command information saved using struct scsi_eh_save.

Case B can be observed with a WD My Book USB disks when a read or write
command is sent to the disk while the disk is in deep sleep mode
(spun down) due to a long period of inactivity (~30min).
In such case, the following command sequence happen:
1) The read or write command is terminated after a few seconds with
   CHECK CONDITION and an asc/ascq of 04/01 (LOGICAL UNIT IS IN PROCESS
   OF BECOMING READY)
2) In response to this failure, the USB mass storage driver triggers
   autosense processing, reusing the command descriptor to issue a
   request sense command with a 96B sense buffer size. The reply
   to this command gives a NOT READY / LOGICAL UNIT IS IN PROCESS
   OF BECOMING READY sense of 18B, resulting in a resid of 78B.
3) The original command is retried and failed again, with step 2
   repeated, until the drive spins up and becomes ready.
4) When the original command completes after the drive has become ready,
   the request sense command resid of 78B is seen by the scsi disk
   driver sd_done() function and wrongly generates a warning about the
   unaligned value reported.

This problem is fixed in patch 2 by always setting a command resid to 0
when there is no residual in usb_stor_Bulk_transport(). Note that
usb_stor_CB_transport() does not need changes since usb_stor_bulk_srb()
always sets the resid for a completed command, regardless of the
residual value.

Damien Le Moal (2):
  scsi: save/restore command resid for error handling
  usb: Clear scsi command resid when residue is 0

 drivers/scsi/scsi_error.c       | 2 ++
 drivers/usb/storage/transport.c | 9 +++++++++
 include/scsi/scsi_eh.h          | 1 +
 3 files changed, 12 insertions(+)

-- 
2.21.0


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH 1/2] scsi: save/restore command resid for error handling
  2019-09-26 22:08 [PATCH 0/2] Fix SCSI & USB Storage CHECK CONDITION handling Damien Le Moal
@ 2019-09-26 22:08 ` Damien Le Moal
  2019-09-27 19:20   ` Christoph Hellwig
  2019-09-26 22:08 ` [PATCH 2/2] usb: Clear scsi command resid when residue is 0 Damien Le Moal
  2019-09-26 23:57 ` [PATCH 0/2] Fix SCSI & USB Storage CHECK CONDITION handling Alan Stern
  2 siblings, 1 reply; 11+ messages in thread
From: Damien Le Moal @ 2019-09-26 22:08 UTC (permalink / raw)
  To: linux-scsi, Martin K . Petersen, linux-usb, usb-storage,
	Alan Stern, Greg Kroah-Hartman
  Cc: Justin Piszcz

When a command is terminated with CHECK CONDITION and request sense
executed by hijacking the command descriptor, the original command resid
is lost and replaced with the resid from the execution of request sense.
If based on the obtained sense data the command is aborted and not
retried, the resid that will be seen by drivers such as sd will be the
resid of the request sense execution and not the value from the original
command failure. Make sure this does not happen by adding resid as part
of the command information saved using struct scsi_eh_save.

Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
---
 drivers/scsi/scsi_error.c | 2 ++
 include/scsi/scsi_eh.h    | 1 +
 2 files changed, 3 insertions(+)

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 1c470e31ae81..d4ac13979189 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -967,6 +967,7 @@ void scsi_eh_prep_cmnd(struct scsi_cmnd *scmd, struct scsi_eh_save *ses,
 	ses->data_direction = scmd->sc_data_direction;
 	ses->sdb = scmd->sdb;
 	ses->result = scmd->result;
+	ses->resid = scsi_get_resid(scmd);
 	ses->underflow = scmd->underflow;
 	ses->prot_op = scmd->prot_op;
 	ses->eh_eflags = scmd->eh_eflags;
@@ -1029,6 +1030,7 @@ void scsi_eh_restore_cmnd(struct scsi_cmnd* scmd, struct scsi_eh_save *ses)
 	scmd->sc_data_direction = ses->data_direction;
 	scmd->sdb = ses->sdb;
 	scmd->result = ses->result;
+	scsi_set_resid(scmd, ses->resid);
 	scmd->underflow = ses->underflow;
 	scmd->prot_op = ses->prot_op;
 	scmd->eh_eflags = ses->eh_eflags;
diff --git a/include/scsi/scsi_eh.h b/include/scsi/scsi_eh.h
index 3810b340551c..9caa9b262a32 100644
--- a/include/scsi/scsi_eh.h
+++ b/include/scsi/scsi_eh.h
@@ -32,6 +32,7 @@ extern int scsi_ioctl_reset(struct scsi_device *, int __user *);
 struct scsi_eh_save {
 	/* saved state */
 	int result;
+	unsigned int resid;
 	int eh_eflags;
 	enum dma_data_direction data_direction;
 	unsigned underflow;
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 2/2] usb: Clear scsi command resid when residue is 0
  2019-09-26 22:08 [PATCH 0/2] Fix SCSI & USB Storage CHECK CONDITION handling Damien Le Moal
  2019-09-26 22:08 ` [PATCH 1/2] scsi: save/restore command resid for error handling Damien Le Moal
@ 2019-09-26 22:08 ` Damien Le Moal
  2019-09-26 23:57 ` [PATCH 0/2] Fix SCSI & USB Storage CHECK CONDITION handling Alan Stern
  2 siblings, 0 replies; 11+ messages in thread
From: Damien Le Moal @ 2019-09-26 22:08 UTC (permalink / raw)
  To: linux-scsi, Martin K . Petersen, linux-usb, usb-storage,
	Alan Stern, Greg Kroah-Hartman
  Cc: Justin Piszcz

A scsi command terminated with CHECK CONDITION is hijacked to execute
request sense with the original command information preserved and
restored respectively with scsi_eh_prep_cmnd() and
scsi_eh_restore_cmnd() in usb_stor_invoke_transport(). This means that
if the original command is retried, the command resid may not be 0,
indicating the residual at the time of the previous failed execution. If
the command is retried and fully succeed, the residual of 0 should thus
be set, always, to avoid reporting to the upper layer an incorrect
non-zero residual.

Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
---
 drivers/usb/storage/transport.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/drivers/usb/storage/transport.c b/drivers/usb/storage/transport.c
index 96cb0409dd89..c69fcbe467ef 100644
--- a/drivers/usb/storage/transport.c
+++ b/drivers/usb/storage/transport.c
@@ -1287,6 +1287,15 @@ int usb_stor_Bulk_transport(struct scsi_cmnd *srb, struct us_data *us)
 			scsi_set_resid(srb, max(scsi_get_resid(srb),
 			                                       (int) residue));
 		}
+	} else if (!residue) {
+		/*
+		 * The command may have been retried after a CHECK CONDITION
+		 * failure and autosense execution, which can result in resid
+		 * to indicate the residual at the time of the failure instead
+		 * of 0. Clear the resid here to indicate that the command
+		 * fully completed.
+		 */
+		scsi_set_resid(srb, 0);
 	}
 
 	/* based on the status code, we report good or bad */
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH 0/2] Fix SCSI & USB Storage CHECK CONDITION handling
  2019-09-26 22:08 [PATCH 0/2] Fix SCSI & USB Storage CHECK CONDITION handling Damien Le Moal
  2019-09-26 22:08 ` [PATCH 1/2] scsi: save/restore command resid for error handling Damien Le Moal
  2019-09-26 22:08 ` [PATCH 2/2] usb: Clear scsi command resid when residue is 0 Damien Le Moal
@ 2019-09-26 23:57 ` Alan Stern
  2019-09-27  0:17   ` Damien Le Moal
  2019-09-27  0:54   ` Douglas Gilbert
  2 siblings, 2 replies; 11+ messages in thread
From: Alan Stern @ 2019-09-26 23:57 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: linux-scsi, Martin K . Petersen, linux-usb, usb-storage,
	Greg Kroah-Hartman, Justin Piszcz

On Fri, 27 Sep 2019, Damien Le Moal wrote:

> If a non-passthrough command is terminated with a CHECK CONDITION, the
> scsi error recovery code reuses the failed command scsi_cmnd structure
> to process error recovery request sense. To preserve information
> regarding the failed command, the functions scsi_eh_prep_cmnd() and
> scsi_eh_restore_cmnd() respectively save and restore the original
> command information. However, the resid field of the failed command
> request structure is not preserved and reused for the request sense
> handling, leading to the original command having an incorrect resid
> when:
> A) The command is not retried and terminated with an error
> B) The command completes after retry and the underlying LLD does not set
>    resid for a fully completed command (resid=0)
> 
> The first patch of this series addresses case (A) above by adding resid
> as part of the command information saved using struct scsi_eh_save.

Good catch.

> Case B can be observed with a WD My Book USB disks when a read or write
> command is sent to the disk while the disk is in deep sleep mode
> (spun down) due to a long period of inactivity (~30min).
> In such case, the following command sequence happen:
> 1) The read or write command is terminated after a few seconds with
>    CHECK CONDITION and an asc/ascq of 04/01 (LOGICAL UNIT IS IN PROCESS
>    OF BECOMING READY)
> 2) In response to this failure, the USB mass storage driver triggers
>    autosense processing, reusing the command descriptor to issue a
>    request sense command with a 96B sense buffer size. The reply
>    to this command gives a NOT READY / LOGICAL UNIT IS IN PROCESS
>    OF BECOMING READY sense of 18B, resulting in a resid of 78B.
> 3) The original command is retried and failed again, with step 2
>    repeated, until the drive spins up and becomes ready.
> 4) When the original command completes after the drive has become ready,
>    the request sense command resid of 78B is seen by the scsi disk
>    driver sd_done() function and wrongly generates a warning about the
>    unaligned value reported.

But with the 1/1 patch in place, 4 won't happen any more, right?  
sd_done() will see the resid from the successful read or write.

> This problem is fixed in patch 2 by always setting a command resid to 0
> when there is no residual in usb_stor_Bulk_transport(). Note that
> usb_stor_CB_transport() does not need changes since usb_stor_bulk_srb()
> always sets the resid for a completed command, regardless of the
> residual value.

Exactly the same reasoning shows that usb_stor_Bulk_transport() also
does not need changes, doesn't it?  Which means that patch 2/2 is 
unnecessary.

Alan Stern

PS: The correct term is "residue", not "residual".  I know that the 
code sometimes uses the wrong word.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 0/2] Fix SCSI & USB Storage CHECK CONDITION handling
  2019-09-26 23:57 ` [PATCH 0/2] Fix SCSI & USB Storage CHECK CONDITION handling Alan Stern
@ 2019-09-27  0:17   ` Damien Le Moal
  2019-09-27 16:34     ` Alan Stern
  2019-09-27  0:54   ` Douglas Gilbert
  1 sibling, 1 reply; 11+ messages in thread
From: Damien Le Moal @ 2019-09-27  0:17 UTC (permalink / raw)
  To: Alan Stern
  Cc: linux-scsi, Martin K . Petersen, linux-usb, usb-storage,
	Greg Kroah-Hartman, Justin Piszcz

On 2019/09/26 16:57, Alan Stern wrote:
> On Fri, 27 Sep 2019, Damien Le Moal wrote:
> 
>> If a non-passthrough command is terminated with a CHECK CONDITION, the
>> scsi error recovery code reuses the failed command scsi_cmnd structure
>> to process error recovery request sense. To preserve information
>> regarding the failed command, the functions scsi_eh_prep_cmnd() and
>> scsi_eh_restore_cmnd() respectively save and restore the original
>> command information. However, the resid field of the failed command
>> request structure is not preserved and reused for the request sense
>> handling, leading to the original command having an incorrect resid
>> when:
>> A) The command is not retried and terminated with an error
>> B) The command completes after retry and the underlying LLD does not set
>>    resid for a fully completed command (resid=0)
>>
>> The first patch of this series addresses case (A) above by adding resid
>> as part of the command information saved using struct scsi_eh_save.
> 
> Good catch.
> 
>> Case B can be observed with a WD My Book USB disks when a read or write
>> command is sent to the disk while the disk is in deep sleep mode
>> (spun down) due to a long period of inactivity (~30min).
>> In such case, the following command sequence happen:
>> 1) The read or write command is terminated after a few seconds with
>>    CHECK CONDITION and an asc/ascq of 04/01 (LOGICAL UNIT IS IN PROCESS
>>    OF BECOMING READY)
>> 2) In response to this failure, the USB mass storage driver triggers
>>    autosense processing, reusing the command descriptor to issue a
>>    request sense command with a 96B sense buffer size. The reply
>>    to this command gives a NOT READY / LOGICAL UNIT IS IN PROCESS
>>    OF BECOMING READY sense of 18B, resulting in a resid of 78B.
>> 3) The original command is retried and failed again, with step 2
>>    repeated, until the drive spins up and becomes ready.
>> 4) When the original command completes after the drive has become ready,
>>    the request sense command resid of 78B is seen by the scsi disk
>>    driver sd_done() function and wrongly generates a warning about the
>>    unaligned value reported.
> 
> But with the 1/1 patch in place, 4 won't happen any more, right?  
> sd_done() will see the resid from the successful read or write.

No it will not because the USB driver does not set the residue when the value
returned for the command is 0, that is, when the command succeeds after the disk
spins up. E.g., in my test, I used a 4096B read at LBA 0. With patch 1 only, I
see a residue of 4096 since the command is entirely failed with CHECK CONDITION
until the drive spins up and becomes ready, at which point the resid becomes 0
but is not set with scsi_set_resid(). Without patch 1 nor 2, I see a residue of
78B which is the residue from the request sense executed between retries of the
read command.

Patch 2 is needed to make sure that the residue is set to 0 when the command
fully completes. If there is no failure/request sense/retry dance, resid is
already 0 (initialized value) so there is no problem. The problem shows up only
for CHECK CONDITION + retry patterns. Tests confirm this. Only patch 1 does not
solve the problem.

>> This problem is fixed in patch 2 by always setting a command resid to 0
>> when there is no residual in usb_stor_Bulk_transport(). Note that
>> usb_stor_CB_transport() does not need changes since usb_stor_bulk_srb()
>> always sets the resid for a completed command, regardless of the
>> residual value.
> 
> Exactly the same reasoning shows that usb_stor_Bulk_transport() also
> does not need changes, doesn't it?  Which means that patch 2/2 is 
> unnecessary.

In usb_stor_Bulk_transport(), since scsi_set_resid() is only called under the

if (residue && !(us->fflags & US_FL_IGNORE_RESIDUE))

condition, resid is not set if it is 0.

Whereas in usb_stor_CB_transport(), through the call to usb_stor_bulk_srb(),
resid is always set, unconditionally, with:

scsi_set_resid(srb, scsi_bufflen(srb) - partial);

where partial is the command length for a fully completed command (the variable
name is misleading), leading to resid being set to 0 as needed for successful
commands.

Please let me know if I missed something. I can reproduce the problem 100% of
the time, even though it is painful due to the 30min wait between tests to wait
for the drive going to sleep (if I force a sleep power state, the problem does
not happen due to power control the drive USB bridge FW, I am guessing, which
will wake up the disk before the command is sent).

> 
> Alan Stern
> 
> PS: The correct term is "residue", not "residual".  I know that the 
> code sometimes uses the wrong word.

Thanks... I keep using resid but trying to write proper sentences, I mess up the
word extension :)


-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 0/2] Fix SCSI & USB Storage CHECK CONDITION handling
  2019-09-26 23:57 ` [PATCH 0/2] Fix SCSI & USB Storage CHECK CONDITION handling Alan Stern
  2019-09-27  0:17   ` Damien Le Moal
@ 2019-09-27  0:54   ` Douglas Gilbert
  2019-09-27 14:11     ` Alan Stern
  1 sibling, 1 reply; 11+ messages in thread
From: Douglas Gilbert @ 2019-09-27  0:54 UTC (permalink / raw)
  To: Alan Stern, Damien Le Moal
  Cc: linux-scsi, Martin K . Petersen, linux-usb, usb-storage,
	Greg Kroah-Hartman, Justin Piszcz

On 2019-09-26 7:57 p.m., Alan Stern wrote:
> On Fri, 27 Sep 2019, Damien Le Moal wrote:
> 
>> If a non-passthrough command is terminated with a CHECK CONDITION, the
>> scsi error recovery code reuses the failed command scsi_cmnd structure
>> to process error recovery request sense. To preserve information
>> regarding the failed command, the functions scsi_eh_prep_cmnd() and
>> scsi_eh_restore_cmnd() respectively save and restore the original
>> command information. However, the resid field of the failed command
>> request structure is not preserved and reused for the request sense
>> handling, leading to the original command having an incorrect resid
>> when:
>> A) The command is not retried and terminated with an error
>> B) The command completes after retry and the underlying LLD does not set
>>     resid for a fully completed command (resid=0)
>>
>> The first patch of this series addresses case (A) above by adding resid
>> as part of the command information saved using struct scsi_eh_save.
> 
> Good catch.
> 
>> Case B can be observed with a WD My Book USB disks when a read or write
>> command is sent to the disk while the disk is in deep sleep mode
>> (spun down) due to a long period of inactivity (~30min).
>> In such case, the following command sequence happen:
>> 1) The read or write command is terminated after a few seconds with
>>     CHECK CONDITION and an asc/ascq of 04/01 (LOGICAL UNIT IS IN PROCESS
>>     OF BECOMING READY)
>> 2) In response to this failure, the USB mass storage driver triggers
>>     autosense processing, reusing the command descriptor to issue a
>>     request sense command with a 96B sense buffer size. The reply
>>     to this command gives a NOT READY / LOGICAL UNIT IS IN PROCESS
>>     OF BECOMING READY sense of 18B, resulting in a resid of 78B.
>> 3) The original command is retried and failed again, with step 2
>>     repeated, until the drive spins up and becomes ready.
>> 4) When the original command completes after the drive has become ready,
>>     the request sense command resid of 78B is seen by the scsi disk
>>     driver sd_done() function and wrongly generates a warning about the
>>     unaligned value reported.
> 
> But with the 1/1 patch in place, 4 won't happen any more, right?
> sd_done() will see the resid from the successful read or write.
> 
>> This problem is fixed in patch 2 by always setting a command resid to 0
>> when there is no residual in usb_stor_Bulk_transport(). Note that
>> usb_stor_CB_transport() does not need changes since usb_stor_bulk_srb()
>> always sets the resid for a completed command, regardless of the
>> residual value.
> 
> Exactly the same reasoning shows that usb_stor_Bulk_transport() also
> does not need changes, doesn't it?  Which means that patch 2/2 is
> unnecessary.
> 
> Alan Stern
> 
> PS: The correct term is "residue", not "residual".  I know that the
> code sometimes uses the wrong word.

Digging into my T10 document archive I found this cam3r03.pdf :

   − cam_resid;
     The data residual length member contains the difference in twos
     complement form of the number of data bytes transferred by the
     HA for the SCSI command compared with the number of bytes
     requested by the CCB cam_dxfer_len member. This is calculated
     by the total number of bytes requested to be transferred by the
     CCB minus the actual number of bytes transferred by the HA.

CAM is a now withdrawn T10 standard from the 1990s that was influential
at the time. FreeBSD's SCSI subsystem is (still) based on CAM.

For a more recent standard/draft there is fcp5r00.pdf that uses the
term "residual value" when defining its fcp_resid.

The only reference to the term "residue" that I found is in CAM: an
optional message: IGNORE WIDE RESIDUE .

So I would leave the naming up to the patch author. It is pretty
clear what is being referred to in either case.

Doug Gilbert


P.S. I prefer "residual" because it is more flexible being both
an adjective and a noun.
[Ref: https://www.lexico.com/en/definition/residual]



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 0/2] Fix SCSI & USB Storage CHECK CONDITION handling
  2019-09-27  0:54   ` Douglas Gilbert
@ 2019-09-27 14:11     ` Alan Stern
  2019-09-27 23:33       ` Finn Thain
  0 siblings, 1 reply; 11+ messages in thread
From: Alan Stern @ 2019-09-27 14:11 UTC (permalink / raw)
  To: Douglas Gilbert
  Cc: Damien Le Moal, linux-scsi, Martin K . Petersen, linux-usb,
	usb-storage, Greg Kroah-Hartman, Justin Piszcz

On Thu, 26 Sep 2019, Douglas Gilbert wrote:

> On 2019-09-26 7:57 p.m., Alan Stern wrote:

> > PS: The correct term is "residue", not "residual".  I know that the
> > code sometimes uses the wrong word.
> 
> Digging into my T10 document archive I found this cam3r03.pdf :
> 
>    − cam_resid;
>      The data residual length member contains the difference in twos
>      complement form of the number of data bytes transferred by the
>      HA for the SCSI command compared with the number of bytes
>      requested by the CCB cam_dxfer_len member. This is calculated
>      by the total number of bytes requested to be transferred by the
>      CCB minus the actual number of bytes transferred by the HA.
> 
> CAM is a now withdrawn T10 standard from the 1990s that was influential
> at the time. FreeBSD's SCSI subsystem is (still) based on CAM.

I was going by my old copy of X3T9.2 Project 375D Rev 10L, Working 
Draft, "Information technology - Small Computer System Interface - 2", 
from back in 1996.  Yes, it's thoroughly out of date, but you would 
think that the nomenclature would still be accurate.  At any rate, it 
includes 24 instances of the word "residue" and no instances of 
"residual".

On the other hand, my copy of X3T10/792D Rev 12b, draft proposed, 
"Information technology - SCSI-2 Common access method transport and 
SCSI interface module", dated 1995, contains 24 instances of "residual" 
and only 2 instances of "residue".

So I guess this was never defined precisely.

> For a more recent standard/draft there is fcp5r00.pdf that uses the
> term "residual value" when defining its fcp_resid.
> 
> The only reference to the term "residue" that I found is in CAM: an
> optional message: IGNORE WIDE RESIDUE .
> 
> So I would leave the naming up to the patch author. It is pretty
> clear what is being referred to in either case.

True enough.

Alan Stern

> Doug Gilbert
> 
> 
> P.S. I prefer "residual" because it is more flexible being both
> an adjective and a noun.
> [Ref: https://www.lexico.com/en/definition/residual]


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 0/2] Fix SCSI & USB Storage CHECK CONDITION handling
  2019-09-27  0:17   ` Damien Le Moal
@ 2019-09-27 16:34     ` Alan Stern
  2019-09-27 20:51       ` Damien Le Moal
  0 siblings, 1 reply; 11+ messages in thread
From: Alan Stern @ 2019-09-27 16:34 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: linux-scsi, Martin K . Petersen, linux-usb, usb-storage,
	Greg Kroah-Hartman, Justin Piszcz

On Fri, 27 Sep 2019, Damien Le Moal wrote:

> On 2019/09/26 16:57, Alan Stern wrote:
> > On Fri, 27 Sep 2019, Damien Le Moal wrote:
> > 
> >> If a non-passthrough command is terminated with a CHECK CONDITION, the
> >> scsi error recovery code reuses the failed command scsi_cmnd structure
> >> to process error recovery request sense. To preserve information
> >> regarding the failed command, the functions scsi_eh_prep_cmnd() and
> >> scsi_eh_restore_cmnd() respectively save and restore the original
> >> command information. However, the resid field of the failed command
> >> request structure is not preserved and reused for the request sense
> >> handling, leading to the original command having an incorrect resid
> >> when:
> >> A) The command is not retried and terminated with an error
> >> B) The command completes after retry and the underlying LLD does not set
> >>    resid for a fully completed command (resid=0)
> >>
> >> The first patch of this series addresses case (A) above by adding resid
> >> as part of the command information saved using struct scsi_eh_save.
> > 
> > Good catch.
> > 
> >> Case B can be observed with a WD My Book USB disks when a read or write
> >> command is sent to the disk while the disk is in deep sleep mode
> >> (spun down) due to a long period of inactivity (~30min).
> >> In such case, the following command sequence happen:
> >> 1) The read or write command is terminated after a few seconds with
> >>    CHECK CONDITION and an asc/ascq of 04/01 (LOGICAL UNIT IS IN PROCESS
> >>    OF BECOMING READY)
> >> 2) In response to this failure, the USB mass storage driver triggers
> >>    autosense processing, reusing the command descriptor to issue a
> >>    request sense command with a 96B sense buffer size. The reply
> >>    to this command gives a NOT READY / LOGICAL UNIT IS IN PROCESS
> >>    OF BECOMING READY sense of 18B, resulting in a resid of 78B.
> >> 3) The original command is retried and failed again, with step 2
> >>    repeated, until the drive spins up and becomes ready.
> >> 4) When the original command completes after the drive has become ready,
> >>    the request sense command resid of 78B is seen by the scsi disk
> >>    driver sd_done() function and wrongly generates a warning about the
> >>    unaligned value reported.
> > 
> > But with the 1/1 patch in place, 4 won't happen any more, right?  
> > sd_done() will see the resid from the successful read or write.
> 
> No it will not because the USB driver does not set the residue when the value
> returned for the command is 0, that is, when the command succeeds after the disk
> spins up. E.g., in my test, I used a 4096B read at LBA 0. With patch 1 only, I
> see a residue of 4096 since the command is entirely failed with CHECK CONDITION
> until the drive spins up and becomes ready, at which point the resid becomes 0
> but is not set with scsi_set_resid(). Without patch 1 nor 2, I see a residue of
> 78B which is the residue from the request sense executed between retries of the
> read command.

I don't doubt your results.  But you need to do a better job of 
explaining how the existing code leads to those results.

For example, you said the driver does not set the command's residue 
when the command succeeds.  But in fact, one of the first things 
usb_stor_invoke_transport() does -- before it begins transmitting the 
command -- is

	scsi_set_resid(srb, 0);

So if the command is successful, even if the transport layer routine 
doesn't set the residue explicitly, the value should still be 0.

So if you see the residue not getting set properly, you should explain 
exactly how that happens.

> Patch 2 is needed to make sure that the residue is set to 0 when the command
> fully completes. If there is no failure/request sense/retry dance, resid is
> already 0 (initialized value) so there is no problem. The problem shows up only
> for CHECK CONDITION + retry patterns. Tests confirm this. Only patch 1 does not
> solve the problem.

But the residue gets reset to 0 every time before the command is
retried, doesn't it?

Unless for your device, the driver doesn't run 
usb_stor_invoke_transport().  The only way that can happen is if you 
use one of the modular subdrivers -- in which case the bug lies in that 
subdriver and should be fixed there, not in transport.c.

> >> This problem is fixed in patch 2 by always setting a command resid to 0
> >> when there is no residual in usb_stor_Bulk_transport(). Note that
> >> usb_stor_CB_transport() does not need changes since usb_stor_bulk_srb()
> >> always sets the resid for a completed command, regardless of the
> >> residual value.
> > 
> > Exactly the same reasoning shows that usb_stor_Bulk_transport() also
> > does not need changes, doesn't it?  Which means that patch 2/2 is 
> > unnecessary.
> 
> In usb_stor_Bulk_transport(), since scsi_set_resid() is only called under the
> 
> if (residue && !(us->fflags & US_FL_IGNORE_RESIDUE))
> 
> condition, resid is not set if it is 0.

But usb_stor_Bulk_transport() calls usb_stor_bulk_srb(), which does set
the residue, as you realize:

> Whereas in usb_stor_CB_transport(), through the call to usb_stor_bulk_srb(),
> resid is always set, unconditionally, with:
> 
> scsi_set_resid(srb, scsi_bufflen(srb) - partial);
> 
> where partial is the command length for a fully completed command (the variable
> name is misleading), leading to resid being set to 0 as needed for successful
> commands.
> 
> Please let me know if I missed something. I can reproduce the problem 100% of
> the time, even though it is painful due to the 30min wait between tests to wait
> for the drive going to sleep (if I force a sleep power state, the problem does
> not happen due to power control the drive USB bridge FW, I am guessing, which
> will wake up the disk before the command is sent).

Please investigate a little more closely into what's going wrong and
present more details so that others can understand it too.

Alan Stern


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/2] scsi: save/restore command resid for error handling
  2019-09-26 22:08 ` [PATCH 1/2] scsi: save/restore command resid for error handling Damien Le Moal
@ 2019-09-27 19:20   ` Christoph Hellwig
  0 siblings, 0 replies; 11+ messages in thread
From: Christoph Hellwig @ 2019-09-27 19:20 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: linux-scsi, Martin K . Petersen, linux-usb, usb-storage,
	Alan Stern, Greg Kroah-Hartman, Justin Piszcz

On Fri, Sep 27, 2019 at 07:08:43AM +0900, Damien Le Moal wrote:
> When a command is terminated with CHECK CONDITION and request sense
> executed by hijacking the command descriptor, the original command resid
> is lost and replaced with the resid from the execution of request sense.
> If based on the obtained sense data the command is aborted and not
> retried, the resid that will be seen by drivers such as sd will be the
> resid of the request sense execution and not the value from the original
> command failure. Make sure this does not happen by adding resid as part
> of the command information saved using struct scsi_eh_save.
> 
> Cc: stable@vger.kernel.org
> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
> ---
>  drivers/scsi/scsi_error.c | 2 ++
>  include/scsi/scsi_eh.h    | 1 +
>  2 files changed, 3 insertions(+)
> 
> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
> index 1c470e31ae81..d4ac13979189 100644
> --- a/drivers/scsi/scsi_error.c
> +++ b/drivers/scsi/scsi_error.c
> @@ -967,6 +967,7 @@ void scsi_eh_prep_cmnd(struct scsi_cmnd *scmd, struct scsi_eh_save *ses,
>  	ses->data_direction = scmd->sc_data_direction;
>  	ses->sdb = scmd->sdb;
>  	ses->result = scmd->result;
> +	ses->resid = scsi_get_resid(scmd);

Don't we also need to reset the resid to 0 here as we do in the
queuecommand path?  That would take care of what you are trying to do
for usb-storage in the next patch in generic code.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 0/2] Fix SCSI & USB Storage CHECK CONDITION handling
  2019-09-27 16:34     ` Alan Stern
@ 2019-09-27 20:51       ` Damien Le Moal
  0 siblings, 0 replies; 11+ messages in thread
From: Damien Le Moal @ 2019-09-27 20:51 UTC (permalink / raw)
  To: Alan Stern
  Cc: linux-scsi, Martin K . Petersen, linux-usb, usb-storage,
	Greg Kroah-Hartman, Justin Piszcz

On 2019/09/27 9:34, Alan Stern wrote:
> On Fri, 27 Sep 2019, Damien Le Moal wrote:
> 
>> On 2019/09/26 16:57, Alan Stern wrote:
>>> On Fri, 27 Sep 2019, Damien Le Moal wrote:
>>>
>>>> If a non-passthrough command is terminated with a CHECK CONDITION, the
>>>> scsi error recovery code reuses the failed command scsi_cmnd structure
>>>> to process error recovery request sense. To preserve information
>>>> regarding the failed command, the functions scsi_eh_prep_cmnd() and
>>>> scsi_eh_restore_cmnd() respectively save and restore the original
>>>> command information. However, the resid field of the failed command
>>>> request structure is not preserved and reused for the request sense
>>>> handling, leading to the original command having an incorrect resid
>>>> when:
>>>> A) The command is not retried and terminated with an error
>>>> B) The command completes after retry and the underlying LLD does not set
>>>>    resid for a fully completed command (resid=0)
>>>>
>>>> The first patch of this series addresses case (A) above by adding resid
>>>> as part of the command information saved using struct scsi_eh_save.
>>>
>>> Good catch.
>>>
>>>> Case B can be observed with a WD My Book USB disks when a read or write
>>>> command is sent to the disk while the disk is in deep sleep mode
>>>> (spun down) due to a long period of inactivity (~30min).
>>>> In such case, the following command sequence happen:
>>>> 1) The read or write command is terminated after a few seconds with
>>>>    CHECK CONDITION and an asc/ascq of 04/01 (LOGICAL UNIT IS IN PROCESS
>>>>    OF BECOMING READY)
>>>> 2) In response to this failure, the USB mass storage driver triggers
>>>>    autosense processing, reusing the command descriptor to issue a
>>>>    request sense command with a 96B sense buffer size. The reply
>>>>    to this command gives a NOT READY / LOGICAL UNIT IS IN PROCESS
>>>>    OF BECOMING READY sense of 18B, resulting in a resid of 78B.
>>>> 3) The original command is retried and failed again, with step 2
>>>>    repeated, until the drive spins up and becomes ready.
>>>> 4) When the original command completes after the drive has become ready,
>>>>    the request sense command resid of 78B is seen by the scsi disk
>>>>    driver sd_done() function and wrongly generates a warning about the
>>>>    unaligned value reported.
>>>
>>> But with the 1/1 patch in place, 4 won't happen any more, right?  
>>> sd_done() will see the resid from the successful read or write.
>>
>> No it will not because the USB driver does not set the residue when the value
>> returned for the command is 0, that is, when the command succeeds after the disk
>> spins up. E.g., in my test, I used a 4096B read at LBA 0. With patch 1 only, I
>> see a residue of 4096 since the command is entirely failed with CHECK CONDITION
>> until the drive spins up and becomes ready, at which point the resid becomes 0
>> but is not set with scsi_set_resid(). Without patch 1 nor 2, I see a residue of
>> 78B which is the residue from the request sense executed between retries of the
>> read command.
> 
> I don't doubt your results.  But you need to do a better job of 
> explaining how the existing code leads to those results.
> 
> For example, you said the driver does not set the command's residue 
> when the command succeeds.  But in fact, one of the first things 
> usb_stor_invoke_transport() does -- before it begins transmitting the 
> command -- is
> 
> 	scsi_set_resid(srb, 0);
> 
> So if the command is successful, even if the transport layer routine 
> doesn't set the residue explicitly, the value should still be 0.

Yes, you are right. I missed that one. Patch 2 is not needed at all.

> So if you see the residue not getting set properly, you should explain 
> exactly how that happens.

I got confused in my analysis because I missed that the actual timing of
sd_done() execution is after every command retry+request sense pair, that is,
what usb_stor_invoke_transport() does. Without patch 1, the resid is wrong on
completion of the initial command because of the request sense, nothing to do
with resid not being initialized.
W
ith patch 1, sd_done() sees a resid equal to the entire command size in case of
failure, and 0 (as set by usb_stor_invoke_transport()) in case of success.

So as you suggested, only patch 1 is necessary. I will resend it with the
additional resid initialization for autosense for security, since that could be
a problem for a sense request that has no residue (the request sense is issued
directly using us->transport(), not using usb_stor_invoke_transport()).

>> Patch 2 is needed to make sure that the residue is set to 0 when the command
>> fully completes. If there is no failure/request sense/retry dance, resid is
>> already 0 (initialized value) so there is no problem. The problem shows up only
>> for CHECK CONDITION + retry patterns. Tests confirm this. Only patch 1 does not
>> solve the problem.
> 
> But the residue gets reset to 0 every time before the command is
> retried, doesn't it?

Yes it is. Checked.

> Unless for your device, the driver doesn't run 
> usb_stor_invoke_transport().  The only way that can happen is if you 
> use one of the modular subdrivers -- in which case the bug lies in that 
> subdriver and should be fixed there, not in transport.c.

The device is using usb_storage (Bulk transport, scsi transparent protocol). But
the problem is higher up in the stack anyway.

>>>> This problem is fixed in patch 2 by always setting a command resid to 0
>>>> when there is no residual in usb_stor_Bulk_transport(). Note that
>>>> usb_stor_CB_transport() does not need changes since usb_stor_bulk_srb()
>>>> always sets the resid for a completed command, regardless of the
>>>> residual value.
>>>
>>> Exactly the same reasoning shows that usb_stor_Bulk_transport() also
>>> does not need changes, doesn't it?  Which means that patch 2/2 is 
>>> unnecessary.
>>
>> In usb_stor_Bulk_transport(), since scsi_set_resid() is only called under the
>>
>> if (residue && !(us->fflags & US_FL_IGNORE_RESIDUE))
>>
>> condition, resid is not set if it is 0.
> 
> But usb_stor_Bulk_transport() calls usb_stor_bulk_srb(), which does set
> the residue, as you realize:

Yes, absolutely. Checked !

>> Whereas in usb_stor_CB_transport(), through the call to usb_stor_bulk_srb(),
>> resid is always set, unconditionally, with:
>>
>> scsi_set_resid(srb, scsi_bufflen(srb) - partial);
>>
>> where partial is the command length for a fully completed command (the variable
>> name is misleading), leading to resid being set to 0 as needed for successful
>> commands.
>>
>> Please let me know if I missed something. I can reproduce the problem 100% of
>> the time, even though it is painful due to the 30min wait between tests to wait
>> for the drive going to sleep (if I force a sleep power state, the problem does
>> not happen due to power control the drive USB bridge FW, I am guessing, which
>> will wake up the disk before the command is sent).
> 
> Please investigate a little more closely into what's going wrong and
> present more details so that others can understand it too.

I did and realized it was my mistake. USB is not to blame. Only handling of
resid in save/restore of struct scsi_eh_save.

Thanks for your comments !

Best regards.

-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 0/2] Fix SCSI & USB Storage CHECK CONDITION handling
  2019-09-27 14:11     ` Alan Stern
@ 2019-09-27 23:33       ` Finn Thain
  0 siblings, 0 replies; 11+ messages in thread
From: Finn Thain @ 2019-09-27 23:33 UTC (permalink / raw)
  To: Alan Stern
  Cc: Douglas Gilbert, Damien Le Moal, linux-scsi, Martin K . Petersen,
	linux-usb, usb-storage, Greg Kroah-Hartman, Justin Piszcz

[snipped selective measurements of word usage]

On Fri, 27 Sep 2019, Alan Stern wrote:

> 
> So I guess this was never defined precisely.
> 

The O.E.D. defines these terms:

	residual
	a. [...]
	n. 1. a quantity remaining after other things have been subtracted 
	      or allowed for. [...]
	   2. [...]
	   3. [...]

and

	residue
	n. a small amount of something that remains after the main part 
	   has gone or been taken or used.


So, writing "residue" could be misleading as the connotation is that the 
remaining part is the lesser part.

I think that's probably why the term "residual" gets used in finance (and 
in SCSI transfers).

-- 

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2019-09-27 23:41 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-09-26 22:08 [PATCH 0/2] Fix SCSI & USB Storage CHECK CONDITION handling Damien Le Moal
2019-09-26 22:08 ` [PATCH 1/2] scsi: save/restore command resid for error handling Damien Le Moal
2019-09-27 19:20   ` Christoph Hellwig
2019-09-26 22:08 ` [PATCH 2/2] usb: Clear scsi command resid when residue is 0 Damien Le Moal
2019-09-26 23:57 ` [PATCH 0/2] Fix SCSI & USB Storage CHECK CONDITION handling Alan Stern
2019-09-27  0:17   ` Damien Le Moal
2019-09-27 16:34     ` Alan Stern
2019-09-27 20:51       ` Damien Le Moal
2019-09-27  0:54   ` Douglas Gilbert
2019-09-27 14:11     ` Alan Stern
2019-09-27 23:33       ` Finn Thain

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.