All of lore.kernel.org
 help / color / mirror / Atom feed
From: Martin Kepplinger <martin.kepplinger@puri.sm>
To: Alan Stern <stern@rowland.harvard.edu>
Cc: James Bottomley <James.Bottomley@hansenpartnership.com>,
	Bart Van Assche <bvanassche@acm.org>,
	Can Guo <cang@codeaurora.org>,
	martin.petersen@oracle.com, linux-scsi@vger.kernel.org,
	linux-kernel@vger.kernel.org, kernel@puri.sm
Subject: Re: [PATCH] scsi: sd: add runtime pm to open / release
Date: Sun, 9 Aug 2020 11:20:22 +0200	[thread overview]
Message-ID: <d3b6f7b8-5345-1ae1-4f79-5dde226e74f1@puri.sm> (raw)
In-Reply-To: <20200808150542.GB256751@rowland.harvard.edu>

On 08.08.20 17:05, Alan Stern wrote:
> On Sat, Aug 08, 2020 at 08:59:09AM +0200, Martin Kepplinger wrote:
>> On 07.08.20 16:30, Alan Stern wrote:
>>> On Fri, Aug 07, 2020 at 11:51:21AM +0200, Martin Kepplinger wrote:
>>>> it's really strange: below is the change I'm trying. Of course that's
>>>> only for testing the functionality, nothing how a patch could look like.
>>>>
>>>> While I remember it had worked, now (weirdly since I tried that mounting
>>>> via fstab) it doesn't anymore!
>>>>
>>>> What I understand (not much): I handle the error with "retry" via the
>>>> new flag, but scsi_decide_disposition() returns SUCCESS because of "no
>>>> more retries"; but it's the first and only time it's called.
>>>
>>> Are you saying that scmd->allowed is set to 0?  Or is scsi_notry_cmd() 
>>> returning a nonzero value?  Whichever is true, why does it happen that 
>>> way?
>>
>> scsi_notry_cmd() is returning 1. (it's retry 1 of 5 allowed).
>>
>> why is it returning 1? REQ_FAILFAST_DEV is set. It's DID_OK, then "if
>> (status_byte(scmd->result) != CHECK_CONDITION)" appearently is not true,
>> then at the end it returns 1 because of REQ_FAILFAST_DEV.
>>
>> that seems to come from the block layer. why and when? could I change
>> that so that the scsi error handling stays in control?
> 
> The only place I see where that flag might get set is in 
> blk_mq_bio_to_request() in block/blk-mq.c, which does:
> 
> 	if (bio->bi_opf & REQ_RAHEAD)
> 		rq->cmd_flags |= REQ_FAILFAST_MASK;
> 
> So apparently read-ahead reads are supposed to fail fast (i.e., without 
> retries), presumably because they are optional after all.
> 
>>> What is the failing command?  Is it a READ(10)?
>>
>> Not sure how I'd answer that, but here's the test to trigger the error:
>>
>> mount /dev/sda1 /mnt
>> cd /mnt
>> ls
>> cp file ~/ (if ls "works" and doesn't yet trigger the error)
>>
>> and that's the (familiar looking) logs when doing so. again: despite the
>> mentioned workaround in scsi_error and the new expected_media_change
>> flag *is* set and gets cleared, as it should be. REQ_FAILFAST_DEV seems
>> to override what I want to do is scsi_error:
>>
>> [   55.557629] sd 0:0:0:0: [sda] tag#0 UNKNOWN(0x2003) Result:
>> hostbyte=0x00 driverbyte=0x08 cmd_age=0s
>> [   55.557639] sd 0:0:0:0: [sda] tag#0 Sense Key : 0x6 [current]
>> [   55.557646] sd 0:0:0:0: [sda] tag#0 ASC=0x28 ASCQ=0x0
>> [   55.557657] sd 0:0:0:0: [sda] tag#0 CDB: opcode=0x28 28 00 00 08 fc
>> e0 00 00 01 00
> 
> Yes, 0x28 is READ(10).  Likely this is a read-ahead request, although I 
> don't know how we can tell for sure.
> 
>> [   55.557666] blk_update_request: I/O error, dev sda, sector 589024 op
>> 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
>> [   55.568899] sd 0:0:0:0: [sda] tag#0 device offline or changed
>> [   55.574691] blk_update_request: I/O error, dev sda, sector 589025 op
>> 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
>> [   55.585756] sd 0:0:0:0: [sda] tag#0 device offline or changed
>> [   55.591562] blk_update_request: I/O error, dev sda, sector 589026 op
>> 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
>> [   55.602274] sd 0:0:0:0: [sda] tag#0 device offline or changed
>> (... goes on with the same)
> 
> Is such a drastic response really appropriate for the failure of a 
> read-ahead request?  It seems like a more logical response would be to 
> let the request fail but keep the device online.
> 
> Of course, that would only solve part of your problem -- your log would 
> still get filled with those "I/O error" messages even though they 
> wouldn't be fatal.  Probably a better approach would be to make the new 
> expecting_media_change flag override scsi_no_retry_cmd().
> 
> But this is not my area of expertise.  Maybe someone else will have more 
> to say.
> 
> Alan Stern
> 

Hey Alan, I'm really glad for that, I suspected some of this but I have
little experience in scsi/block layers, so that is super helpful.

I'd appreciate an opinion on the below workaround that *seems* to work
now (let's see, I had thought so before :)

Whether or not this helps to find a real solution, let's see. But
integration of such a flag in the error handling paths is what's
interesting for now:


--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -565,6 +565,17 @@ int scsi_check_sense(struct scsi_cmnd *scmd)
 				return NEEDS_RETRY;
 			}
 		}
+		if (scmd->device->expecting_media_change) {
+			if (sshdr.asc == 0x28 && sshdr.ascq == 0x00) {
+				/* clear expecting_media_change in
+				 * scsi_noretry_cmd() because we need
+				 * to override possible "failfast" overrides
+				 * that block readahead can cause.
+				 */
+				return NEEDS_RETRY;
+			}
+		}
+
 		/*
 		 * we might also expect a cc/ua if another LUN on the target
 		 * reported a UA with an ASC/ASCQ of 3F 0E -
@@ -1744,6 +1755,15 @@ int scsi_noretry_cmd(struct scsi_cmnd *scmd)
 		return (scmd->request->cmd_flags & REQ_FAILFAST_DRIVER);
 	}

+	/*
+	 * We need to have retries when expecting_media_change is set.
+	 * So we need to return success and clear the flag here.
+	 */
+	if (scmd->device->expecting_media_change) {
+		scmd->device->expecting_media_change = 0;
+		return 0;
+	}
+
 	if (status_byte(scmd->result) != CHECK_CONDITION)
 		return 0;

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index d90fefffe31b..bb583e403b81 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -3642,6 +3642,8 @@ static int sd_resume(struct device *dev)
 	if (!sdkp)	/* E.g.: runtime resume at the start of sd_probe() */
 		return 0;

+	sdkp->device->expecting_media_change = 1;
+
 	if (!sdkp->device->manage_start_stop)
 		return 0;

diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
index bc5909033d13..f5fc1af68e00 100644
--- a/include/scsi/scsi_device.h
+++ b/include/scsi/scsi_device.h
@@ -169,6 +169,7 @@ struct scsi_device {
 				 * this device */
 	unsigned expecting_cc_ua:1; /* Expecting a CHECK_CONDITION/UNIT_ATTN
 				     * because we did a bus reset. */
+	unsigned expecting_media_change:1;
 	unsigned use_10_for_rw:1; /* first try 10-byte read / write */
 	unsigned use_10_for_ms:1; /* first try 10-byte mode sense/select */
 	unsigned set_dbd_for_ms:1; /* Set "DBD" field in mode sense */
--

  reply	other threads:[~2020-08-09  9:21 UTC|newest]

Thread overview: 68+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-06-23 11:10 [PATCH] scsi: sd: add runtime pm to open / release Martin Kepplinger
2020-06-24 13:33 ` Bart Van Assche
2020-06-25  8:16   ` Martin Kepplinger
2020-06-25 14:52     ` Alan Stern
2020-06-26  3:53     ` Bart Van Assche
2020-06-26 15:07     ` Bart Van Assche
2020-06-26 15:44       ` Alan Stern
2020-06-28  2:37         ` Bart Van Assche
2020-06-28 13:10           ` Alan Stern
2020-06-29  9:42         ` Martin Kepplinger
2020-06-29 16:15           ` Alan Stern
2020-06-29 16:56             ` Bart Van Assche
2020-06-29 17:40               ` Alan Stern
2020-06-30  3:33             ` Martin Kepplinger
2020-06-30 13:38               ` Alan Stern
2020-06-30 15:59             ` Bart Van Assche
2020-06-30 18:02               ` Alan Stern
2020-06-30 19:23                 ` Bart Van Assche
2020-06-30 19:38                   ` Alan Stern
2020-06-30 23:31                     ` Bart Van Assche
2020-07-01  0:49                       ` Alan Stern
2020-07-06 16:41                         ` Alan Stern
2020-07-28  7:02                           ` Martin Kepplinger
2020-07-28 20:02                             ` Alan Stern
2020-07-29 14:12                               ` Martin Kepplinger
2020-07-29 14:32                                 ` Alan Stern
2020-07-29 14:44                                   ` Martin K. Petersen
2020-07-29 14:56                                     ` Alan Stern
2020-07-29 14:46                                   ` James Bottomley
2020-07-29 14:53                                     ` James Bottomley
2020-07-29 15:40                                       ` Martin Kepplinger
2020-07-29 15:44                                         ` James Bottomley
2020-07-29 16:43                                           ` Martin Kepplinger
2020-07-29 18:25                                             ` Alan Stern
2020-07-29 18:29                                               ` James Bottomley
2020-07-30  8:52                                                 ` Martin Kepplinger
2020-07-30  8:54                                                   ` Martin Kepplinger
2020-07-30 15:10                                                   ` Alan Stern
2020-08-04  9:39                                                     ` Martin Kepplinger
2020-08-07  9:51                                                       ` Martin Kepplinger
2020-08-07 14:30                                                         ` Alan Stern
2020-08-08  6:59                                                           ` Martin Kepplinger
2020-08-08 15:05                                                             ` Alan Stern
2020-08-09  9:20                                                               ` Martin Kepplinger [this message]
2020-08-09 15:26                                                                 ` Alan Stern
2020-08-10 12:03                                                                   ` Martin Kepplinger
2020-08-10 14:13                                                                     ` Alan Stern
2020-08-11  7:55                                                                       ` Martin Kepplinger
2020-08-11 13:48                                                                         ` Alan Stern
2020-08-23 14:57                                                                         ` [PATCH] block: Fix bug in runtime-resume handling Alan Stern
2020-08-24 17:48                                                                           ` Bart Van Assche
2020-08-24 20:13                                                                             ` Alan Stern
2020-08-26  7:48                                                                               ` Martin Kepplinger
2020-08-27 17:42                                                                                 ` Martin Kepplinger
2020-08-27 20:29                                                                                   ` Alan Stern
2020-08-29  7:24                                                                                     ` Martin Kepplinger
2020-08-29 15:26                                                                                       ` Alan Stern
2020-08-29 16:33                                                                                         ` Martin Kepplinger
2020-08-29 18:56                                                                                           ` Alan Stern
2020-08-30  0:38                                                                                             ` Bart Van Assche
2020-08-30  1:06                                                                                               ` Alan Stern
2020-07-29 15:40                                       ` [PATCH] scsi: sd: add runtime pm to open / release Alan Stern
2020-07-29 15:49                                         ` James Bottomley
2020-07-29 16:17                                           ` Alan Stern
2020-07-29 15:52                                         ` Martin Kepplinger
2020-07-29 18:10                                   ` Douglas Gilbert
2020-07-30  8:05             ` Martin Kepplinger
2020-07-30 15:14               ` Alan Stern

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=d3b6f7b8-5345-1ae1-4f79-5dde226e74f1@puri.sm \
    --to=martin.kepplinger@puri.sm \
    --cc=James.Bottomley@hansenpartnership.com \
    --cc=bvanassche@acm.org \
    --cc=cang@codeaurora.org \
    --cc=kernel@puri.sm \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-scsi@vger.kernel.org \
    --cc=martin.petersen@oracle.com \
    --cc=stern@rowland.harvard.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.