From: Martin Kepplinger <martin.kepplinger@puri.sm> To: Alan Stern <stern@rowland.harvard.edu> Cc: James Bottomley <James.Bottomley@hansenpartnership.com>, Bart Van Assche <bvanassche@acm.org>, Can Guo <cang@codeaurora.org>, martin.petersen@oracle.com, linux-scsi@vger.kernel.org, linux-kernel@vger.kernel.org, kernel@puri.sm Subject: Re: [PATCH] scsi: sd: add runtime pm to open / release Date: Sun, 9 Aug 2020 11:20:22 +0200 [thread overview] Message-ID: <d3b6f7b8-5345-1ae1-4f79-5dde226e74f1@puri.sm> (raw) In-Reply-To: <20200808150542.GB256751@rowland.harvard.edu> On 08.08.20 17:05, Alan Stern wrote: > On Sat, Aug 08, 2020 at 08:59:09AM +0200, Martin Kepplinger wrote: >> On 07.08.20 16:30, Alan Stern wrote: >>> On Fri, Aug 07, 2020 at 11:51:21AM +0200, Martin Kepplinger wrote: >>>> it's really strange: below is the change I'm trying. Of course that's >>>> only for testing the functionality, nothing how a patch could look like. >>>> >>>> While I remember it had worked, now (weirdly since I tried that mounting >>>> via fstab) it doesn't anymore! >>>> >>>> What I understand (not much): I handle the error with "retry" via the >>>> new flag, but scsi_decide_disposition() returns SUCCESS because of "no >>>> more retries"; but it's the first and only time it's called. >>> >>> Are you saying that scmd->allowed is set to 0? Or is scsi_notry_cmd() >>> returning a nonzero value? Whichever is true, why does it happen that >>> way? >> >> scsi_notry_cmd() is returning 1. (it's retry 1 of 5 allowed). >> >> why is it returning 1? REQ_FAILFAST_DEV is set. It's DID_OK, then "if >> (status_byte(scmd->result) != CHECK_CONDITION)" appearently is not true, >> then at the end it returns 1 because of REQ_FAILFAST_DEV. >> >> that seems to come from the block layer. why and when? could I change >> that so that the scsi error handling stays in control? > > The only place I see where that flag might get set is in > blk_mq_bio_to_request() in block/blk-mq.c, which does: > > if (bio->bi_opf & REQ_RAHEAD) > rq->cmd_flags |= REQ_FAILFAST_MASK; > > So apparently read-ahead reads are supposed to fail fast (i.e., without > retries), presumably because they are optional after all. > >>> What is the failing command? Is it a READ(10)? >> >> Not sure how I'd answer that, but here's the test to trigger the error: >> >> mount /dev/sda1 /mnt >> cd /mnt >> ls >> cp file ~/ (if ls "works" and doesn't yet trigger the error) >> >> and that's the (familiar looking) logs when doing so. again: despite the >> mentioned workaround in scsi_error and the new expected_media_change >> flag *is* set and gets cleared, as it should be. REQ_FAILFAST_DEV seems >> to override what I want to do is scsi_error: >> >> [ 55.557629] sd 0:0:0:0: [sda] tag#0 UNKNOWN(0x2003) Result: >> hostbyte=0x00 driverbyte=0x08 cmd_age=0s >> [ 55.557639] sd 0:0:0:0: [sda] tag#0 Sense Key : 0x6 [current] >> [ 55.557646] sd 0:0:0:0: [sda] tag#0 ASC=0x28 ASCQ=0x0 >> [ 55.557657] sd 0:0:0:0: [sda] tag#0 CDB: opcode=0x28 28 00 00 08 fc >> e0 00 00 01 00 > > Yes, 0x28 is READ(10). Likely this is a read-ahead request, although I > don't know how we can tell for sure. > >> [ 55.557666] blk_update_request: I/O error, dev sda, sector 589024 op >> 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0 >> [ 55.568899] sd 0:0:0:0: [sda] tag#0 device offline or changed >> [ 55.574691] blk_update_request: I/O error, dev sda, sector 589025 op >> 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0 >> [ 55.585756] sd 0:0:0:0: [sda] tag#0 device offline or changed >> [ 55.591562] blk_update_request: I/O error, dev sda, sector 589026 op >> 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0 >> [ 55.602274] sd 0:0:0:0: [sda] tag#0 device offline or changed >> (... goes on with the same) > > Is such a drastic response really appropriate for the failure of a > read-ahead request? It seems like a more logical response would be to > let the request fail but keep the device online. > > Of course, that would only solve part of your problem -- your log would > still get filled with those "I/O error" messages even though they > wouldn't be fatal. Probably a better approach would be to make the new > expecting_media_change flag override scsi_no_retry_cmd(). > > But this is not my area of expertise. Maybe someone else will have more > to say. > > Alan Stern > Hey Alan, I'm really glad for that, I suspected some of this but I have little experience in scsi/block layers, so that is super helpful. I'd appreciate an opinion on the below workaround that *seems* to work now (let's see, I had thought so before :) Whether or not this helps to find a real solution, let's see. But integration of such a flag in the error handling paths is what's interesting for now: --- a/drivers/scsi/scsi_error.c +++ b/drivers/scsi/scsi_error.c @@ -565,6 +565,17 @@ int scsi_check_sense(struct scsi_cmnd *scmd) return NEEDS_RETRY; } } + if (scmd->device->expecting_media_change) { + if (sshdr.asc == 0x28 && sshdr.ascq == 0x00) { + /* clear expecting_media_change in + * scsi_noretry_cmd() because we need + * to override possible "failfast" overrides + * that block readahead can cause. + */ + return NEEDS_RETRY; + } + } + /* * we might also expect a cc/ua if another LUN on the target * reported a UA with an ASC/ASCQ of 3F 0E - @@ -1744,6 +1755,15 @@ int scsi_noretry_cmd(struct scsi_cmnd *scmd) return (scmd->request->cmd_flags & REQ_FAILFAST_DRIVER); } + /* + * We need to have retries when expecting_media_change is set. + * So we need to return success and clear the flag here. + */ + if (scmd->device->expecting_media_change) { + scmd->device->expecting_media_change = 0; + return 0; + } + if (status_byte(scmd->result) != CHECK_CONDITION) return 0; diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c index d90fefffe31b..bb583e403b81 100644 --- a/drivers/scsi/sd.c +++ b/drivers/scsi/sd.c @@ -3642,6 +3642,8 @@ static int sd_resume(struct device *dev) if (!sdkp) /* E.g.: runtime resume at the start of sd_probe() */ return 0; + sdkp->device->expecting_media_change = 1; + if (!sdkp->device->manage_start_stop) return 0; diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h index bc5909033d13..f5fc1af68e00 100644 --- a/include/scsi/scsi_device.h +++ b/include/scsi/scsi_device.h @@ -169,6 +169,7 @@ struct scsi_device { * this device */ unsigned expecting_cc_ua:1; /* Expecting a CHECK_CONDITION/UNIT_ATTN * because we did a bus reset. */ + unsigned expecting_media_change:1; unsigned use_10_for_rw:1; /* first try 10-byte read / write */ unsigned use_10_for_ms:1; /* first try 10-byte mode sense/select */ unsigned set_dbd_for_ms:1; /* Set "DBD" field in mode sense */ --
next prev parent reply other threads:[~2020-08-09 9:21 UTC|newest] Thread overview: 68+ messages / expand[flat|nested] mbox.gz Atom feed top 2020-06-23 11:10 Martin Kepplinger 2020-06-24 13:33 ` Bart Van Assche 2020-06-25 8:16 ` Martin Kepplinger 2020-06-25 14:52 ` Alan Stern 2020-06-26 3:53 ` Bart Van Assche 2020-06-26 15:07 ` Bart Van Assche 2020-06-26 15:44 ` Alan Stern 2020-06-28 2:37 ` Bart Van Assche 2020-06-28 13:10 ` Alan Stern 2020-06-29 9:42 ` Martin Kepplinger 2020-06-29 16:15 ` Alan Stern 2020-06-29 16:56 ` Bart Van Assche 2020-06-29 17:40 ` Alan Stern 2020-06-30 3:33 ` Martin Kepplinger 2020-06-30 13:38 ` Alan Stern 2020-06-30 15:59 ` Bart Van Assche 2020-06-30 18:02 ` Alan Stern 2020-06-30 19:23 ` Bart Van Assche 2020-06-30 19:38 ` Alan Stern 2020-06-30 23:31 ` Bart Van Assche 2020-07-01 0:49 ` Alan Stern 2020-07-06 16:41 ` Alan Stern 2020-07-28 7:02 ` Martin Kepplinger 2020-07-28 20:02 ` Alan Stern 2020-07-29 14:12 ` Martin Kepplinger 2020-07-29 14:32 ` Alan Stern 2020-07-29 14:44 ` Martin K. Petersen 2020-07-29 14:56 ` Alan Stern 2020-07-29 14:46 ` James Bottomley 2020-07-29 14:53 ` James Bottomley 2020-07-29 15:40 ` Martin Kepplinger 2020-07-29 15:44 ` James Bottomley 2020-07-29 16:43 ` Martin Kepplinger 2020-07-29 18:25 ` Alan Stern 2020-07-29 18:29 ` James Bottomley 2020-07-30 8:52 ` Martin Kepplinger 2020-07-30 8:54 ` Martin Kepplinger 2020-07-30 15:10 ` Alan Stern 2020-08-04 9:39 ` Martin Kepplinger 2020-08-07 9:51 ` Martin Kepplinger 2020-08-07 14:30 ` Alan Stern 2020-08-08 6:59 ` Martin Kepplinger 2020-08-08 15:05 ` Alan Stern 2020-08-09 9:20 ` Martin Kepplinger [this message] 2020-08-09 15:26 ` Alan Stern 2020-08-10 12:03 ` Martin Kepplinger 2020-08-10 14:13 ` Alan Stern 2020-08-11 7:55 ` Martin Kepplinger 2020-08-11 13:48 ` Alan Stern 2020-08-23 14:57 ` [PATCH] block: Fix bug in runtime-resume handling Alan Stern 2020-08-24 17:48 ` Bart Van Assche 2020-08-24 20:13 ` Alan Stern 2020-08-26 7:48 ` Martin Kepplinger 2020-08-27 17:42 ` Martin Kepplinger 2020-08-27 20:29 ` Alan Stern 2020-08-29 7:24 ` Martin Kepplinger 2020-08-29 15:26 ` Alan Stern 2020-08-29 16:33 ` Martin Kepplinger 2020-08-29 18:56 ` Alan Stern 2020-08-30 0:38 ` Bart Van Assche 2020-08-30 1:06 ` Alan Stern 2020-07-29 15:40 ` [PATCH] scsi: sd: add runtime pm to open / release Alan Stern 2020-07-29 15:49 ` James Bottomley 2020-07-29 16:17 ` Alan Stern 2020-07-29 15:52 ` Martin Kepplinger 2020-07-29 18:10 ` Douglas Gilbert 2020-07-30 8:05 ` Martin Kepplinger 2020-07-30 15:14 ` Alan Stern
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=d3b6f7b8-5345-1ae1-4f79-5dde226e74f1@puri.sm \ --to=martin.kepplinger@puri.sm \ --cc=James.Bottomley@hansenpartnership.com \ --cc=bvanassche@acm.org \ --cc=cang@codeaurora.org \ --cc=kernel@puri.sm \ --cc=linux-kernel@vger.kernel.org \ --cc=linux-scsi@vger.kernel.org \ --cc=martin.petersen@oracle.com \ --cc=stern@rowland.harvard.edu \ --subject='Re: [PATCH] scsi: sd: add runtime pm to open / release' \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.