Linux-SCSI Archive on lore.kernel.org
 help / color / Atom feed
From: Martin Kepplinger <martin.kepplinger@puri.sm>
To: Alan Stern <stern@rowland.harvard.edu>
Cc: James Bottomley <James.Bottomley@hansenpartnership.com>,
	Bart Van Assche <bvanassche@acm.org>,
	Can Guo <cang@codeaurora.org>,
	martin.petersen@oracle.com, linux-scsi@vger.kernel.org,
	linux-kernel@vger.kernel.org, kernel@puri.sm
Subject: Re: [PATCH] scsi: sd: add runtime pm to open / release
Date: Sun, 9 Aug 2020 11:20:22 +0200
Message-ID: <d3b6f7b8-5345-1ae1-4f79-5dde226e74f1@puri.sm> (raw)
In-Reply-To: <20200808150542.GB256751@rowland.harvard.edu>

On 08.08.20 17:05, Alan Stern wrote:
> On Sat, Aug 08, 2020 at 08:59:09AM +0200, Martin Kepplinger wrote:
>> On 07.08.20 16:30, Alan Stern wrote:
>>> On Fri, Aug 07, 2020 at 11:51:21AM +0200, Martin Kepplinger wrote:
>>>> it's really strange: below is the change I'm trying. Of course that's
>>>> only for testing the functionality, nothing how a patch could look like.
>>>>
>>>> While I remember it had worked, now (weirdly since I tried that mounting
>>>> via fstab) it doesn't anymore!
>>>>
>>>> What I understand (not much): I handle the error with "retry" via the
>>>> new flag, but scsi_decide_disposition() returns SUCCESS because of "no
>>>> more retries"; but it's the first and only time it's called.
>>>
>>> Are you saying that scmd->allowed is set to 0?  Or is scsi_notry_cmd() 
>>> returning a nonzero value?  Whichever is true, why does it happen that 
>>> way?
>>
>> scsi_notry_cmd() is returning 1. (it's retry 1 of 5 allowed).
>>
>> why is it returning 1? REQ_FAILFAST_DEV is set. It's DID_OK, then "if
>> (status_byte(scmd->result) != CHECK_CONDITION)" appearently is not true,
>> then at the end it returns 1 because of REQ_FAILFAST_DEV.
>>
>> that seems to come from the block layer. why and when? could I change
>> that so that the scsi error handling stays in control?
> 
> The only place I see where that flag might get set is in 
> blk_mq_bio_to_request() in block/blk-mq.c, which does:
> 
> 	if (bio->bi_opf & REQ_RAHEAD)
> 		rq->cmd_flags |= REQ_FAILFAST_MASK;
> 
> So apparently read-ahead reads are supposed to fail fast (i.e., without 
> retries), presumably because they are optional after all.
> 
>>> What is the failing command?  Is it a READ(10)?
>>
>> Not sure how I'd answer that, but here's the test to trigger the error:
>>
>> mount /dev/sda1 /mnt
>> cd /mnt
>> ls
>> cp file ~/ (if ls "works" and doesn't yet trigger the error)
>>
>> and that's the (familiar looking) logs when doing so. again: despite the
>> mentioned workaround in scsi_error and the new expected_media_change
>> flag *is* set and gets cleared, as it should be. REQ_FAILFAST_DEV seems
>> to override what I want to do is scsi_error:
>>
>> [   55.557629] sd 0:0:0:0: [sda] tag#0 UNKNOWN(0x2003) Result:
>> hostbyte=0x00 driverbyte=0x08 cmd_age=0s
>> [   55.557639] sd 0:0:0:0: [sda] tag#0 Sense Key : 0x6 [current]
>> [   55.557646] sd 0:0:0:0: [sda] tag#0 ASC=0x28 ASCQ=0x0
>> [   55.557657] sd 0:0:0:0: [sda] tag#0 CDB: opcode=0x28 28 00 00 08 fc
>> e0 00 00 01 00
> 
> Yes, 0x28 is READ(10).  Likely this is a read-ahead request, although I 
> don't know how we can tell for sure.
> 
>> [   55.557666] blk_update_request: I/O error, dev sda, sector 589024 op
>> 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
>> [   55.568899] sd 0:0:0:0: [sda] tag#0 device offline or changed
>> [   55.574691] blk_update_request: I/O error, dev sda, sector 589025 op
>> 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
>> [   55.585756] sd 0:0:0:0: [sda] tag#0 device offline or changed
>> [   55.591562] blk_update_request: I/O error, dev sda, sector 589026 op
>> 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
>> [   55.602274] sd 0:0:0:0: [sda] tag#0 device offline or changed
>> (... goes on with the same)
> 
> Is such a drastic response really appropriate for the failure of a 
> read-ahead request?  It seems like a more logical response would be to 
> let the request fail but keep the device online.
> 
> Of course, that would only solve part of your problem -- your log would 
> still get filled with those "I/O error" messages even though they 
> wouldn't be fatal.  Probably a better approach would be to make the new 
> expecting_media_change flag override scsi_no_retry_cmd().
> 
> But this is not my area of expertise.  Maybe someone else will have more 
> to say.
> 
> Alan Stern
> 

Hey Alan, I'm really glad for that, I suspected some of this but I have
little experience in scsi/block layers, so that is super helpful.

I'd appreciate an opinion on the below workaround that *seems* to work
now (let's see, I had thought so before :)

Whether or not this helps to find a real solution, let's see. But
integration of such a flag in the error handling paths is what's
interesting for now:


--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -565,6 +565,17 @@ int scsi_check_sense(struct scsi_cmnd *scmd)
 				return NEEDS_RETRY;
 			}
 		}
+		if (scmd->device->expecting_media_change) {
+			if (sshdr.asc == 0x28 && sshdr.ascq == 0x00) {
+				/* clear expecting_media_change in
+				 * scsi_noretry_cmd() because we need
+				 * to override possible "failfast" overrides
+				 * that block readahead can cause.
+				 */
+				return NEEDS_RETRY;
+			}
+		}
+
 		/*
 		 * we might also expect a cc/ua if another LUN on the target
 		 * reported a UA with an ASC/ASCQ of 3F 0E -
@@ -1744,6 +1755,15 @@ int scsi_noretry_cmd(struct scsi_cmnd *scmd)
 		return (scmd->request->cmd_flags & REQ_FAILFAST_DRIVER);
 	}

+	/*
+	 * We need to have retries when expecting_media_change is set.
+	 * So we need to return success and clear the flag here.
+	 */
+	if (scmd->device->expecting_media_change) {
+		scmd->device->expecting_media_change = 0;
+		return 0;
+	}
+
 	if (status_byte(scmd->result) != CHECK_CONDITION)
 		return 0;

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index d90fefffe31b..bb583e403b81 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -3642,6 +3642,8 @@ static int sd_resume(struct device *dev)
 	if (!sdkp)	/* E.g.: runtime resume at the start of sd_probe() */
 		return 0;

+	sdkp->device->expecting_media_change = 1;
+
 	if (!sdkp->device->manage_start_stop)
 		return 0;

diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
index bc5909033d13..f5fc1af68e00 100644
--- a/include/scsi/scsi_device.h
+++ b/include/scsi/scsi_device.h
@@ -169,6 +169,7 @@ struct scsi_device {
 				 * this device */
 	unsigned expecting_cc_ua:1; /* Expecting a CHECK_CONDITION/UNIT_ATTN
 				     * because we did a bus reset. */
+	unsigned expecting_media_change:1;
 	unsigned use_10_for_rw:1; /* first try 10-byte read / write */
 	unsigned use_10_for_ms:1; /* first try 10-byte mode sense/select */
 	unsigned set_dbd_for_ms:1; /* Set "DBD" field in mode sense */
--

  reply index

Thread overview: 68+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-06-23 11:10 Martin Kepplinger
2020-06-24 13:33 ` Bart Van Assche
2020-06-25  8:16   ` Martin Kepplinger
2020-06-25 14:52     ` Alan Stern
2020-06-26  3:53     ` Bart Van Assche
2020-06-26 15:07     ` Bart Van Assche
2020-06-26 15:44       ` Alan Stern
2020-06-28  2:37         ` Bart Van Assche
2020-06-28 13:10           ` Alan Stern
2020-06-29  9:42         ` Martin Kepplinger
2020-06-29 16:15           ` Alan Stern
2020-06-29 16:56             ` Bart Van Assche
2020-06-29 17:40               ` Alan Stern
2020-06-30  3:33             ` Martin Kepplinger
2020-06-30 13:38               ` Alan Stern
2020-06-30 15:59             ` Bart Van Assche
2020-06-30 18:02               ` Alan Stern
2020-06-30 19:23                 ` Bart Van Assche
2020-06-30 19:38                   ` Alan Stern
2020-06-30 23:31                     ` Bart Van Assche
2020-07-01  0:49                       ` Alan Stern
2020-07-06 16:41                         ` Alan Stern
2020-07-28  7:02                           ` Martin Kepplinger
2020-07-28 20:02                             ` Alan Stern
2020-07-29 14:12                               ` Martin Kepplinger
2020-07-29 14:32                                 ` Alan Stern
2020-07-29 14:44                                   ` Martin K. Petersen
2020-07-29 14:56                                     ` Alan Stern
2020-07-29 14:46                                   ` James Bottomley
2020-07-29 14:53                                     ` James Bottomley
2020-07-29 15:40                                       ` Martin Kepplinger
2020-07-29 15:44                                         ` James Bottomley
2020-07-29 16:43                                           ` Martin Kepplinger
2020-07-29 18:25                                             ` Alan Stern
2020-07-29 18:29                                               ` James Bottomley
2020-07-30  8:52                                                 ` Martin Kepplinger
2020-07-30  8:54                                                   ` Martin Kepplinger
2020-07-30 15:10                                                   ` Alan Stern
2020-08-04  9:39                                                     ` Martin Kepplinger
2020-08-07  9:51                                                       ` Martin Kepplinger
2020-08-07 14:30                                                         ` Alan Stern
2020-08-08  6:59                                                           ` Martin Kepplinger
2020-08-08 15:05                                                             ` Alan Stern
2020-08-09  9:20                                                               ` Martin Kepplinger [this message]
2020-08-09 15:26                                                                 ` Alan Stern
2020-08-10 12:03                                                                   ` Martin Kepplinger
2020-08-10 14:13                                                                     ` Alan Stern
2020-08-11  7:55                                                                       ` Martin Kepplinger
2020-08-11 13:48                                                                         ` Alan Stern
2020-08-23 14:57                                                                         ` [PATCH] block: Fix bug in runtime-resume handling Alan Stern
2020-08-24 17:48                                                                           ` Bart Van Assche
2020-08-24 20:13                                                                             ` Alan Stern
2020-08-26  7:48                                                                               ` Martin Kepplinger
2020-08-27 17:42                                                                                 ` Martin Kepplinger
2020-08-27 20:29                                                                                   ` Alan Stern
2020-08-29  7:24                                                                                     ` Martin Kepplinger
2020-08-29 15:26                                                                                       ` Alan Stern
2020-08-29 16:33                                                                                         ` Martin Kepplinger
2020-08-29 18:56                                                                                           ` Alan Stern
2020-08-30  0:38                                                                                             ` Bart Van Assche
2020-08-30  1:06                                                                                               ` Alan Stern
2020-07-29 15:40                                       ` [PATCH] scsi: sd: add runtime pm to open / release Alan Stern
2020-07-29 15:49                                         ` James Bottomley
2020-07-29 16:17                                           ` Alan Stern
2020-07-29 15:52                                         ` Martin Kepplinger
2020-07-29 18:10                                   ` Douglas Gilbert
2020-07-30  8:05             ` Martin Kepplinger
2020-07-30 15:14               ` Alan Stern

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=d3b6f7b8-5345-1ae1-4f79-5dde226e74f1@puri.sm \
    --to=martin.kepplinger@puri.sm \
    --cc=James.Bottomley@hansenpartnership.com \
    --cc=bvanassche@acm.org \
    --cc=cang@codeaurora.org \
    --cc=kernel@puri.sm \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-scsi@vger.kernel.org \
    --cc=martin.petersen@oracle.com \
    --cc=stern@rowland.harvard.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-SCSI Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-scsi/0 linux-scsi/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-scsi linux-scsi/ https://lore.kernel.org/linux-scsi \
		linux-scsi@vger.kernel.org
	public-inbox-index linux-scsi

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-scsi


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git