All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] scsi: Fix failed request error code
@ 2018-04-04  6:51 Damien Le Moal
  2018-04-04  6:57 ` Hannes Reinecke
  0 siblings, 1 reply; 6+ messages in thread
From: Damien Le Moal @ 2018-04-04  6:51 UTC (permalink / raw)
  To: linux-scsi, Martin K . Petersen
  Cc: Bart Van Assche, Hannes Reinecke, Hannes Reinecke, stable

With the introduction of commit e39a97353e53 ("scsi: core: return
BLK_STS_OK for DID_OK in __scsi_error_from_host_byte()"), a command that
failed with hostbyte=DID_OK and driverbyte=DRIVER_SENSE but lacking
additional sense information will have a return code set to BLK_STS_OK.
This results in the request issuer to see successful request execution
despite the failure. An example of such case is an unaligned write on a
host managed ZAC disk connected to a SAS HBA with a malfunctioning SAT.
The unaligned write command gets aborted but has no additional sense
information.

sd 10:0:0:0: [sde] tag#3905 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
sd 10:0:0:0: [sde] tag#3905 Sense Key : Aborted Command [current]
sd 10:0:0:0: [sde] tag#3905 Add. Sense: No additional sense information
sd 10:0:0:0: [sde] tag#3905 CDB: Write(16) 8a 00 00 00 00 00 02 0c 00 01 00 00 00 01 00 00
print_req_error: I/O error, dev sde, sector 274726920

In scsi_io_completion(), sense key handling to not change the request
error code and success being reported to the issuer.

Fix this by making sure that the error code always indicates an error
if scsi_io_completion() decide that the action to be taken for a failed
command is to not retry it and terminate it immediately (ACTION_FAIL) .

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Fixes: e39a97353e53 ("scsi: core: return BLK_STS_OK for DID_OK in __scsi_error_from_host_byte()")
Cc: Hannes Reinecke <hare@suse.com>
Cc: <stable@vger.kernel.org>
---
 drivers/scsi/scsi_lib.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index c84f931388f2..87579bfcc186 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1002,6 +1002,15 @@ void scsi_io_completion(struct scsi_cmnd *cmd, unsigned int good_bytes)
 				scsi_print_command(cmd);
 			}
 		}
+		/*
+		 * The command failed and should not be retried. If the host
+		 * byte is DID_OK, then __scsi_error_from_host_byte() returned
+		 * BLK_STS_OK and error indicates a success. Make sure to not
+		 * use that as the completion code and always return an
+		 * I/O error.
+		 */
+		if (error == BLK_STS_OK)
+			error = BLK_STS_IOERR;
 		if (!scsi_end_request(req, error, blk_rq_err_bytes(req), 0))
 			return;
 		/*FALLTHRU*/
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] scsi: Fix failed request error code
  2018-04-04  6:51 [PATCH] scsi: Fix failed request error code Damien Le Moal
@ 2018-04-04  6:57 ` Hannes Reinecke
  2018-04-04  7:06   ` Damien Le Moal
  0 siblings, 1 reply; 6+ messages in thread
From: Hannes Reinecke @ 2018-04-04  6:57 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: linux-scsi, Martin K . Petersen, Bart Van Assche,
	Hannes Reinecke, stable

On Wed,  4 Apr 2018 15:51:38 +0900
Damien Le Moal <damien.lemoal@wdc.com> wrote:

> With the introduction of commit e39a97353e53 ("scsi: core: return
> BLK_STS_OK for DID_OK in __scsi_error_from_host_byte()"), a command
> that failed with hostbyte=DID_OK and driverbyte=DRIVER_SENSE but
> lacking additional sense information will have a return code set to
> BLK_STS_OK. This results in the request issuer to see successful
> request execution despite the failure. An example of such case is an
> unaligned write on a host managed ZAC disk connected to a SAS HBA
> with a malfunctioning SAT. The unaligned write command gets aborted
> but has no additional sense information.
> 
> sd 10:0:0:0: [sde] tag#3905 FAILED Result: hostbyte=DID_OK
> driverbyte=DRIVER_SENSE sd 10:0:0:0: [sde] tag#3905 Sense Key :
> Aborted Command [current] sd 10:0:0:0: [sde] tag#3905 Add. Sense: No
> additional sense information sd 10:0:0:0: [sde] tag#3905 CDB:
> Write(16) 8a 00 00 00 00 00 02 0c 00 01 00 00 00 01 00 00
> print_req_error: I/O error, dev sde, sector 274726920
> 
> In scsi_io_completion(), sense key handling to not change the request
> error code and success being reported to the issuer.
> 
> Fix this by making sure that the error code always indicates an error
> if scsi_io_completion() decide that the action to be taken for a
> failed command is to not retry it and terminate it immediately
> (ACTION_FAIL) .
> 
> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
> Fixes: e39a97353e53 ("scsi: core: return BLK_STS_OK for DID_OK in
> __scsi_error_from_host_byte()") Cc: Hannes Reinecke <hare@suse.com>
> Cc: <stable@vger.kernel.org>
> ---
>  drivers/scsi/scsi_lib.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index c84f931388f2..87579bfcc186 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -1002,6 +1002,15 @@ void scsi_io_completion(struct scsi_cmnd *cmd,
> unsigned int good_bytes) scsi_print_command(cmd);
>  			}
>  		}
> +		/*
> +		 * The command failed and should not be retried. If
> the host
> +		 * byte is DID_OK, then
> __scsi_error_from_host_byte() returned
> +		 * BLK_STS_OK and error indicates a success. Make
> sure to not
> +		 * use that as the completion code and always return
> an
> +		 * I/O error.
> +		 */
> +		if (error == BLK_STS_OK)
> +			error = BLK_STS_IOERR;
>  		if (!scsi_end_request(req, error,
> blk_rq_err_bytes(req), 0)) return;
>  		/*FALLTHRU*/

That looks wrong.
Shouldn't __scsi_error_from_host_byte() return the correct status here?

Cheers,

Hannes

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] scsi: Fix failed request error code
  2018-04-04  6:57 ` Hannes Reinecke
@ 2018-04-04  7:06   ` Damien Le Moal
  2018-04-04  7:35     ` Hannes Reinecke
  0 siblings, 1 reply; 6+ messages in thread
From: Damien Le Moal @ 2018-04-04  7:06 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: linux-scsi, Martin K . Petersen, Bart Van Assche,
	Hannes Reinecke, stable

Hannes,

On 4/4/18 15:57, Hannes Reinecke wrote:
> On Wed,  4 Apr 2018 15:51:38 +0900
> Damien Le Moal <damien.lemoal@wdc.com> wrote:
> 
>> With the introduction of commit e39a97353e53 ("scsi: core: return
>> BLK_STS_OK for DID_OK in __scsi_error_from_host_byte()"), a command
>> that failed with hostbyte=DID_OK and driverbyte=DRIVER_SENSE but
>> lacking additional sense information will have a return code set to
>> BLK_STS_OK. This results in the request issuer to see successful
>> request execution despite the failure. An example of such case is an
>> unaligned write on a host managed ZAC disk connected to a SAS HBA
>> with a malfunctioning SAT. The unaligned write command gets aborted
>> but has no additional sense information.
>>
>> sd 10:0:0:0: [sde] tag#3905 FAILED Result: hostbyte=DID_OK
>> driverbyte=DRIVER_SENSE sd 10:0:0:0: [sde] tag#3905 Sense Key :
>> Aborted Command [current] sd 10:0:0:0: [sde] tag#3905 Add. Sense: No
>> additional sense information sd 10:0:0:0: [sde] tag#3905 CDB:
>> Write(16) 8a 00 00 00 00 00 02 0c 00 01 00 00 00 01 00 00
>> print_req_error: I/O error, dev sde, sector 274726920
>>
>> In scsi_io_completion(), sense key handling to not change the request
>> error code and success being reported to the issuer.
>>
>> Fix this by making sure that the error code always indicates an error
>> if scsi_io_completion() decide that the action to be taken for a
>> failed command is to not retry it and terminate it immediately
>> (ACTION_FAIL) .
>>
>> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
>> Fixes: e39a97353e53 ("scsi: core: return BLK_STS_OK for DID_OK in
>> __scsi_error_from_host_byte()") Cc: Hannes Reinecke <hare@suse.com>
>> Cc: <stable@vger.kernel.org>
>> ---
>>  drivers/scsi/scsi_lib.c | 9 +++++++++
>>  1 file changed, 9 insertions(+)
>>
>> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
>> index c84f931388f2..87579bfcc186 100644
>> --- a/drivers/scsi/scsi_lib.c
>> +++ b/drivers/scsi/scsi_lib.c
>> @@ -1002,6 +1002,15 @@ void scsi_io_completion(struct scsi_cmnd *cmd,
>> unsigned int good_bytes) scsi_print_command(cmd);
>>  			}
>>  		}
>> +		/*
>> +		 * The command failed and should not be retried. If
>> the host
>> +		 * byte is DID_OK, then
>> __scsi_error_from_host_byte() returned
>> +		 * BLK_STS_OK and error indicates a success. Make
>> sure to not
>> +		 * use that as the completion code and always return
>> an
>> +		 * I/O error.
>> +		 */
>> +		if (error == BLK_STS_OK)
>> +			error = BLK_STS_IOERR;
>>  		if (!scsi_end_request(req, error,
>> blk_rq_err_bytes(req), 0)) return;
>>  		/*FALLTHRU*/
> 
> That looks wrong.
> Shouldn't __scsi_error_from_host_byte() return the correct status here?

My drive said:

sd 10:0:0:0: [sde] tag#3905 FAILED Result: hostbyte=DID_OK
driverbyte=DRIVER_SENSE
sd 10:0:0:0: [sde] tag#3905 Sense Key : Aborted Command [current]
sd 10:0:0:0: [sde] tag#3905 Add. Sense: No additional sense information
sd 10:0:0:0: [sde] tag#3905 CDB: Write(16) 8a 00 00 00 00 00 02 0c 00 01
00 00 00 01 00 00

Since hostbyte is DID_OK, __scsi_error_from_host_byte() returns
BLK_STS_OK. The HBA fails to give sense data, so the ABORTED_COMMAND
case in scsi_io_completion() "switch (sshdr.sense_key)" does nothing and
error stays equal to success. scsi_end_request() gets called with that
and dd sees a success...

There are also plenty of other sense keys cases where error is not
changed despite the fact that error can be BLK_STS_SUCCESS (in fact, I
think this is likely the most common case since an command failure with
hostbyte=DID_OK and driverbyte=DRIVER_SENSE is probably the most common one.

My patch is a bit of a hammer and makes sure that an ACTION_FAIL request
is completed as a failure... Am I getting all this wrong ?

Best.

-- 
Damien Le Moal,
Western Digital

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] scsi: Fix failed request error code
  2018-04-04  7:06   ` Damien Le Moal
@ 2018-04-04  7:35     ` Hannes Reinecke
  2018-04-04  7:39       ` Damien Le Moal
  0 siblings, 1 reply; 6+ messages in thread
From: Hannes Reinecke @ 2018-04-04  7:35 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: linux-scsi, Martin K . Petersen, Bart Van Assche,
	Hannes Reinecke, stable

On Wed, 4 Apr 2018 07:06:58 +0000
Damien Le Moal <Damien.LeMoal@wdc.com> wrote:

> Hannes,
> 
> On 4/4/18 15:57, Hannes Reinecke wrote:
> > On Wed,  4 Apr 2018 15:51:38 +0900
> > Damien Le Moal <damien.lemoal@wdc.com> wrote:
> >   
> >> With the introduction of commit e39a97353e53 ("scsi: core: return
> >> BLK_STS_OK for DID_OK in __scsi_error_from_host_byte()"), a command
> >> that failed with hostbyte=DID_OK and driverbyte=DRIVER_SENSE but
> >> lacking additional sense information will have a return code set to
> >> BLK_STS_OK. This results in the request issuer to see successful
> >> request execution despite the failure. An example of such case is
> >> an unaligned write on a host managed ZAC disk connected to a SAS
> >> HBA with a malfunctioning SAT. The unaligned write command gets
> >> aborted but has no additional sense information.
> >>
> >> sd 10:0:0:0: [sde] tag#3905 FAILED Result: hostbyte=DID_OK
> >> driverbyte=DRIVER_SENSE sd 10:0:0:0: [sde] tag#3905 Sense Key :
> >> Aborted Command [current] sd 10:0:0:0: [sde] tag#3905 Add. Sense:
> >> No additional sense information sd 10:0:0:0: [sde] tag#3905 CDB:
> >> Write(16) 8a 00 00 00 00 00 02 0c 00 01 00 00 00 01 00 00
> >> print_req_error: I/O error, dev sde, sector 274726920
> >>
> >> In scsi_io_completion(), sense key handling to not change the
> >> request error code and success being reported to the issuer.
> >>
> >> Fix this by making sure that the error code always indicates an
> >> error if scsi_io_completion() decide that the action to be taken
> >> for a failed command is to not retry it and terminate it
> >> immediately (ACTION_FAIL) .
> >>
> >> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
> >> Fixes: e39a97353e53 ("scsi: core: return BLK_STS_OK for DID_OK in
> >> __scsi_error_from_host_byte()") Cc: Hannes Reinecke <hare@suse.com>
> >> Cc: <stable@vger.kernel.org>
> >> ---
> >>  drivers/scsi/scsi_lib.c | 9 +++++++++
> >>  1 file changed, 9 insertions(+)
> >>
> >> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> >> index c84f931388f2..87579bfcc186 100644
> >> --- a/drivers/scsi/scsi_lib.c
> >> +++ b/drivers/scsi/scsi_lib.c
> >> @@ -1002,6 +1002,15 @@ void scsi_io_completion(struct scsi_cmnd
> >> *cmd, unsigned int good_bytes) scsi_print_command(cmd);
> >>  			}
> >>  		}
> >> +		/*
> >> +		 * The command failed and should not be retried.
> >> If the host
> >> +		 * byte is DID_OK, then
> >> __scsi_error_from_host_byte() returned
> >> +		 * BLK_STS_OK and error indicates a success. Make
> >> sure to not
> >> +		 * use that as the completion code and always
> >> return an
> >> +		 * I/O error.
> >> +		 */
> >> +		if (error == BLK_STS_OK)
> >> +			error = BLK_STS_IOERR;
> >>  		if (!scsi_end_request(req, error,
> >> blk_rq_err_bytes(req), 0)) return;
> >>  		/*FALLTHRU*/  
> > 
> > That looks wrong.
> > Shouldn't __scsi_error_from_host_byte() return the correct status
> > here?  
> 
> My drive said:
> 
> sd 10:0:0:0: [sde] tag#3905 FAILED Result: hostbyte=DID_OK
> driverbyte=DRIVER_SENSE
> sd 10:0:0:0: [sde] tag#3905 Sense Key : Aborted Command [current]
> sd 10:0:0:0: [sde] tag#3905 Add. Sense: No additional sense
> information sd 10:0:0:0: [sde] tag#3905 CDB: Write(16) 8a 00 00 00 00
> 00 02 0c 00 01 00 00 00 01 00 00
> 
> Since hostbyte is DID_OK, __scsi_error_from_host_byte() returns
> BLK_STS_OK. The HBA fails to give sense data, so the ABORTED_COMMAND
> case in scsi_io_completion() "switch (sshdr.sense_key)" does nothing
> and error stays equal to success. scsi_end_request() gets called with
> that and dd sees a success...
> 
> There are also plenty of other sense keys cases where error is not
> changed despite the fact that error can be BLK_STS_SUCCESS (in fact, I
> think this is likely the most common case since an command failure
> with hostbyte=DID_OK and driverbyte=DRIVER_SENSE is probably the most
> common one.
> 
> My patch is a bit of a hammer and makes sure that an ACTION_FAIL
> request is completed as a failure... Am I getting all this wrong ?
> 

Just checked with the code, and send a new patch.
Which I find a bit clearer.

Fact is that __scsi_error_from_host_byte() is not sufficient to
evaluate the return code, so taking its return value with any further
checks is indeed wrong.
But then it never said so on the boilerplate of
__scsi_error_from_host_byte() ...

So please do check the 'v2' version of the patch.

Cheers,

Hannes

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] scsi: Fix failed request error code
  2018-04-04  7:35     ` Hannes Reinecke
@ 2018-04-04  7:39       ` Damien Le Moal
  2018-04-04  7:51         ` Damien Le Moal
  0 siblings, 1 reply; 6+ messages in thread
From: Damien Le Moal @ 2018-04-04  7:39 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: linux-scsi, Martin K . Petersen, Bart Van Assche,
	Hannes Reinecke, stable

Hannes,

On 4/4/18 16:35, Hannes Reinecke wrote:
> On Wed, 4 Apr 2018 07:06:58 +0000
> Damien Le Moal <Damien.LeMoal@wdc.com> wrote:
> 
>> Hannes,
>>
>> On 4/4/18 15:57, Hannes Reinecke wrote:
>>> On Wed,  4 Apr 2018 15:51:38 +0900
>>> Damien Le Moal <damien.lemoal@wdc.com> wrote:
>>>   
>>>> With the introduction of commit e39a97353e53 ("scsi: core: return
>>>> BLK_STS_OK for DID_OK in __scsi_error_from_host_byte()"), a command
>>>> that failed with hostbyte=DID_OK and driverbyte=DRIVER_SENSE but
>>>> lacking additional sense information will have a return code set to
>>>> BLK_STS_OK. This results in the request issuer to see successful
>>>> request execution despite the failure. An example of such case is
>>>> an unaligned write on a host managed ZAC disk connected to a SAS
>>>> HBA with a malfunctioning SAT. The unaligned write command gets
>>>> aborted but has no additional sense information.
>>>>
>>>> sd 10:0:0:0: [sde] tag#3905 FAILED Result: hostbyte=DID_OK
>>>> driverbyte=DRIVER_SENSE sd 10:0:0:0: [sde] tag#3905 Sense Key :
>>>> Aborted Command [current] sd 10:0:0:0: [sde] tag#3905 Add. Sense:
>>>> No additional sense information sd 10:0:0:0: [sde] tag#3905 CDB:
>>>> Write(16) 8a 00 00 00 00 00 02 0c 00 01 00 00 00 01 00 00
>>>> print_req_error: I/O error, dev sde, sector 274726920
>>>>
>>>> In scsi_io_completion(), sense key handling to not change the
>>>> request error code and success being reported to the issuer.
>>>>
>>>> Fix this by making sure that the error code always indicates an
>>>> error if scsi_io_completion() decide that the action to be taken
>>>> for a failed command is to not retry it and terminate it
>>>> immediately (ACTION_FAIL) .
>>>>
>>>> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
>>>> Fixes: e39a97353e53 ("scsi: core: return BLK_STS_OK for DID_OK in
>>>> __scsi_error_from_host_byte()") Cc: Hannes Reinecke <hare@suse.com>
>>>> Cc: <stable@vger.kernel.org>
>>>> ---
>>>>  drivers/scsi/scsi_lib.c | 9 +++++++++
>>>>  1 file changed, 9 insertions(+)
>>>>
>>>> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
>>>> index c84f931388f2..87579bfcc186 100644
>>>> --- a/drivers/scsi/scsi_lib.c
>>>> +++ b/drivers/scsi/scsi_lib.c
>>>> @@ -1002,6 +1002,15 @@ void scsi_io_completion(struct scsi_cmnd
>>>> *cmd, unsigned int good_bytes) scsi_print_command(cmd);
>>>>  			}
>>>>  		}
>>>> +		/*
>>>> +		 * The command failed and should not be retried.
>>>> If the host
>>>> +		 * byte is DID_OK, then
>>>> __scsi_error_from_host_byte() returned
>>>> +		 * BLK_STS_OK and error indicates a success. Make
>>>> sure to not
>>>> +		 * use that as the completion code and always
>>>> return an
>>>> +		 * I/O error.
>>>> +		 */
>>>> +		if (error == BLK_STS_OK)
>>>> +			error = BLK_STS_IOERR;
>>>>  		if (!scsi_end_request(req, error,
>>>> blk_rq_err_bytes(req), 0)) return;
>>>>  		/*FALLTHRU*/  
>>>
>>> That looks wrong.
>>> Shouldn't __scsi_error_from_host_byte() return the correct status
>>> here?  
>>
>> My drive said:
>>
>> sd 10:0:0:0: [sde] tag#3905 FAILED Result: hostbyte=DID_OK
>> driverbyte=DRIVER_SENSE
>> sd 10:0:0:0: [sde] tag#3905 Sense Key : Aborted Command [current]
>> sd 10:0:0:0: [sde] tag#3905 Add. Sense: No additional sense
>> information sd 10:0:0:0: [sde] tag#3905 CDB: Write(16) 8a 00 00 00 00
>> 00 02 0c 00 01 00 00 00 01 00 00
>>
>> Since hostbyte is DID_OK, __scsi_error_from_host_byte() returns
>> BLK_STS_OK. The HBA fails to give sense data, so the ABORTED_COMMAND
>> case in scsi_io_completion() "switch (sshdr.sense_key)" does nothing
>> and error stays equal to success. scsi_end_request() gets called with
>> that and dd sees a success...
>>
>> There are also plenty of other sense keys cases where error is not
>> changed despite the fact that error can be BLK_STS_SUCCESS (in fact, I
>> think this is likely the most common case since an command failure
>> with hostbyte=DID_OK and driverbyte=DRIVER_SENSE is probably the most
>> common one.
>>
>> My patch is a bit of a hammer and makes sure that an ACTION_FAIL
>> request is completed as a failure... Am I getting all this wrong ?
>>
> 
> Just checked with the code, and send a new patch.
> Which I find a bit clearer.
> 
> Fact is that __scsi_error_from_host_byte() is not sufficient to
> evaluate the return code, so taking its return value with any further
> checks is indeed wrong.
> But then it never said so on the boilerplate of
> __scsi_error_from_host_byte() ...
> 
> So please do check the 'v2' version of the patch.
> 

It fixes the particular problem I am seeing.
But looking more at this code, isn't there a problem with
sshdr.sense_key == RECOVERED_ERROR ?

if (sense_valid && (sshdr.sense_key == RECOVERED_ERROR)) {
	...
	result = 0;
        /* for passthrough error may be set */
        error = BLK_STS_OK;
}

Then the error will be changed wrongly to BLK_STS_IOERR.

My original change had the same problem.


-- 
Damien Le Moal,
Western Digital

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] scsi: Fix failed request error code
  2018-04-04  7:39       ` Damien Le Moal
@ 2018-04-04  7:51         ` Damien Le Moal
  0 siblings, 0 replies; 6+ messages in thread
From: Damien Le Moal @ 2018-04-04  7:51 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: linux-scsi, Martin K . Petersen, Bart Van Assche,
	Hannes Reinecke, stable



On 4/4/18 16:39, Damien Le Moal wrote:
> Hannes,
> 
> On 4/4/18 16:35, Hannes Reinecke wrote:
>> On Wed, 4 Apr 2018 07:06:58 +0000
>> Damien Le Moal <Damien.LeMoal@wdc.com> wrote:
>>
>>> Hannes,
>>>
>>> On 4/4/18 15:57, Hannes Reinecke wrote:
>>>> On Wed,  4 Apr 2018 15:51:38 +0900
>>>> Damien Le Moal <damien.lemoal@wdc.com> wrote:
>>>>   
>>>>> With the introduction of commit e39a97353e53 ("scsi: core: return
>>>>> BLK_STS_OK for DID_OK in __scsi_error_from_host_byte()"), a command
>>>>> that failed with hostbyte=DID_OK and driverbyte=DRIVER_SENSE but
>>>>> lacking additional sense information will have a return code set to
>>>>> BLK_STS_OK. This results in the request issuer to see successful
>>>>> request execution despite the failure. An example of such case is
>>>>> an unaligned write on a host managed ZAC disk connected to a SAS
>>>>> HBA with a malfunctioning SAT. The unaligned write command gets
>>>>> aborted but has no additional sense information.
>>>>>
>>>>> sd 10:0:0:0: [sde] tag#3905 FAILED Result: hostbyte=DID_OK
>>>>> driverbyte=DRIVER_SENSE sd 10:0:0:0: [sde] tag#3905 Sense Key :
>>>>> Aborted Command [current] sd 10:0:0:0: [sde] tag#3905 Add. Sense:
>>>>> No additional sense information sd 10:0:0:0: [sde] tag#3905 CDB:
>>>>> Write(16) 8a 00 00 00 00 00 02 0c 00 01 00 00 00 01 00 00
>>>>> print_req_error: I/O error, dev sde, sector 274726920
>>>>>
>>>>> In scsi_io_completion(), sense key handling to not change the
>>>>> request error code and success being reported to the issuer.
>>>>>
>>>>> Fix this by making sure that the error code always indicates an
>>>>> error if scsi_io_completion() decide that the action to be taken
>>>>> for a failed command is to not retry it and terminate it
>>>>> immediately (ACTION_FAIL) .
>>>>>
>>>>> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
>>>>> Fixes: e39a97353e53 ("scsi: core: return BLK_STS_OK for DID_OK in
>>>>> __scsi_error_from_host_byte()") Cc: Hannes Reinecke <hare@suse.com>
>>>>> Cc: <stable@vger.kernel.org>
>>>>> ---
>>>>>  drivers/scsi/scsi_lib.c | 9 +++++++++
>>>>>  1 file changed, 9 insertions(+)
>>>>>
>>>>> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
>>>>> index c84f931388f2..87579bfcc186 100644
>>>>> --- a/drivers/scsi/scsi_lib.c
>>>>> +++ b/drivers/scsi/scsi_lib.c
>>>>> @@ -1002,6 +1002,15 @@ void scsi_io_completion(struct scsi_cmnd
>>>>> *cmd, unsigned int good_bytes) scsi_print_command(cmd);
>>>>>  			}
>>>>>  		}
>>>>> +		/*
>>>>> +		 * The command failed and should not be retried.
>>>>> If the host
>>>>> +		 * byte is DID_OK, then
>>>>> __scsi_error_from_host_byte() returned
>>>>> +		 * BLK_STS_OK and error indicates a success. Make
>>>>> sure to not
>>>>> +		 * use that as the completion code and always
>>>>> return an
>>>>> +		 * I/O error.
>>>>> +		 */
>>>>> +		if (error == BLK_STS_OK)
>>>>> +			error = BLK_STS_IOERR;
>>>>>  		if (!scsi_end_request(req, error,
>>>>> blk_rq_err_bytes(req), 0)) return;
>>>>>  		/*FALLTHRU*/  
>>>>
>>>> That looks wrong.
>>>> Shouldn't __scsi_error_from_host_byte() return the correct status
>>>> here?  
>>>
>>> My drive said:
>>>
>>> sd 10:0:0:0: [sde] tag#3905 FAILED Result: hostbyte=DID_OK
>>> driverbyte=DRIVER_SENSE
>>> sd 10:0:0:0: [sde] tag#3905 Sense Key : Aborted Command [current]
>>> sd 10:0:0:0: [sde] tag#3905 Add. Sense: No additional sense
>>> information sd 10:0:0:0: [sde] tag#3905 CDB: Write(16) 8a 00 00 00 00
>>> 00 02 0c 00 01 00 00 00 01 00 00
>>>
>>> Since hostbyte is DID_OK, __scsi_error_from_host_byte() returns
>>> BLK_STS_OK. The HBA fails to give sense data, so the ABORTED_COMMAND
>>> case in scsi_io_completion() "switch (sshdr.sense_key)" does nothing
>>> and error stays equal to success. scsi_end_request() gets called with
>>> that and dd sees a success...
>>>
>>> There are also plenty of other sense keys cases where error is not
>>> changed despite the fact that error can be BLK_STS_SUCCESS (in fact, I
>>> think this is likely the most common case since an command failure
>>> with hostbyte=DID_OK and driverbyte=DRIVER_SENSE is probably the most
>>> common one.
>>>
>>> My patch is a bit of a hammer and makes sure that an ACTION_FAIL
>>> request is completed as a failure... Am I getting all this wrong ?
>>>
>>
>> Just checked with the code, and send a new patch.
>> Which I find a bit clearer.
>>
>> Fact is that __scsi_error_from_host_byte() is not sufficient to
>> evaluate the return code, so taking its return value with any further
>> checks is indeed wrong.
>> But then it never said so on the boilerplate of
>> __scsi_error_from_host_byte() ...
>>
>> So please do check the 'v2' version of the patch.
>>
> 
> It fixes the particular problem I am seeing.
> But looking more at this code, isn't there a problem with
> sshdr.sense_key == RECOVERED_ERROR ?
> 
> if (sense_valid && (sshdr.sense_key == RECOVERED_ERROR)) {
> 	...
> 	result = 0;
>         /* for passthrough error may be set */
>         error = BLK_STS_OK;
> }
> 
> Then the error will be changed wrongly to BLK_STS_IOERR.
> 
> My original change had the same problem.

Ignore. It does not go through the change in that case thanks to:
	if (!(blk_rq_bytes(req) == 0 && error) &&
            !scsi_end_request(req, error, good_bytes, 0))

                return;
Assuming the RECOVERED_ERROR has no left over bytes. Otherwise, it goes
through requeue. Both cases being before the error code overwrite, no
problem.

Thanks !

-- 
Damien Le Moal,
Western Digital

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2018-04-04  7:51 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-04-04  6:51 [PATCH] scsi: Fix failed request error code Damien Le Moal
2018-04-04  6:57 ` Hannes Reinecke
2018-04-04  7:06   ` Damien Le Moal
2018-04-04  7:35     ` Hannes Reinecke
2018-04-04  7:39       ` Damien Le Moal
2018-04-04  7:51         ` Damien Le Moal

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.