From mboxrd@z Thu Jan 1 00:00:00 1970 From: Hannes Reinecke Subject: Re: [PATCHv3 1/6] scsi_error: count medium access timeout only once per EH run Date: Thu, 2 Mar 2017 09:02:42 +0100 Message-ID: <21ec4ad7-9a56-dc2b-3baa-9c85eb0c1527@suse.de> References: <1488359720-130871-1-git-send-email-hare@suse.de> <1488359720-130871-2-git-send-email-hare@suse.de> <1488410641.2699.14.camel@sandisk.com> Mime-Version: 1.0 Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 8bit Return-path: Received: from mx2.suse.de ([195.135.220.15]:50957 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751229AbdCBOfY (ORCPT ); Thu, 2 Mar 2017 09:35:24 -0500 In-Reply-To: <1488410641.2699.14.camel@sandisk.com> Sender: linux-scsi-owner@vger.kernel.org List-Id: linux-scsi@vger.kernel.org To: Bart Van Assche , "martin.petersen@oracle.com" Cc: "bblock@linux.vnet.ibm.com" , "hch@lst.de" , "linux-scsi@vger.kernel.org" , "hare@suse.com" , "james.bottomley@hansenpartnership.com" , "emilne@redhat.com" , "loberman@redhat.com" , "maier@de.ibm.com" On 03/02/2017 12:24 AM, Bart Van Assche wrote: > On Wed, 2017-03-01 at 10:15 +0100, Hannes Reinecke wrote: >> The current medium access timeout counter will be increased for >> each command, so if there are enough failed commands we'll hit >> the medium access timeout for even a single failure. > > This sentence describes multiple failed commands as a single failure. > That's confusing to me. Did you perhaps intend "for a single device > failure"? > >> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c >> index f2cafae..cec439c 100644 >> --- a/drivers/scsi/scsi_error.c >> +++ b/drivers/scsi/scsi_error.c >> @@ -58,6 +58,7 @@ >> static int scsi_eh_try_stu(struct scsi_cmnd *scmd); >> static int scsi_try_to_abort_cmd(struct scsi_host_template *, >> struct scsi_cmnd *); >> +static int scsi_eh_reset(struct scsi_cmnd *scmd); >> >> /* called with shost->host_lock held */ >> void scsi_eh_wakeup(struct Scsi_Host *shost) >> @@ -249,6 +250,7 @@ int scsi_eh_scmd_add(struct scsi_cmnd *scmd, int eh_flag) >> if (scmd->eh_eflags & SCSI_EH_ABORT_SCHEDULED) >> eh_flag &= ~SCSI_EH_CANCEL_CMD; >> scmd->eh_eflags |= eh_flag; >> + scsi_eh_reset(scmd); >> list_add_tail(&scmd->eh_entry, &shost->eh_cmd_q); >> shost->host_failed++; >> scsi_eh_wakeup(shost); >> @@ -1107,7 +1109,19 @@ static int scsi_eh_action(struct scsi_cmnd *scmd, int rtn) >> if (!blk_rq_is_passthrough(scmd->request)) { >> struct scsi_driver *sdrv = scsi_cmd_to_driver(scmd); >> if (sdrv->eh_action) >> - rtn = sdrv->eh_action(scmd, rtn); >> + rtn = sdrv->eh_action(scmd, rtn, false); >> + } >> + return rtn; >> +} >> + >> +static int scsi_eh_reset(struct scsi_cmnd *scmd) >> +{ >> + int rtn = SUCCESS; >> + >> + if (!blk_rq_is_passthrough(scmd->request)) { >> + struct scsi_driver *sdrv = scsi_cmd_to_driver(scmd); >> + if (sdrv->eh_action) >> + rtn = sdrv->eh_action(scmd, rtn, true); >> } >> return rtn; >> } > > Can this function be moved up such that we don't need a new forward declaration? > Why, but of course. I just put is here to be next to the original scsi_eh_action() function. >> @@ -1689,18 +1689,28 @@ static int sd_pr_clear(struct block_device *bdev, u64 key) >> * sd_eh_action - error handling callback >> * @scmd: sd-issued command that has failed >> * @eh_disp: The recovery disposition suggested by the midlayer >> + * @reset: Reset medium access counter > > It seems to me that @reset does not trigger a reset of the medium access counter > but rather of the variable that prevents the medium access error counter to be > incremented? > Correct. Will be fixing up the description. >> + * recovery). >> + * We have to be careful to count a medium access failure only once >> + * per SCSI EH run; there might be several timed out commands which > > Did you perhaps intend "once per device per SCSI EH run"? > Yes. >> --- a/drivers/scsi/sd.h >> +++ b/drivers/scsi/sd.h >> @@ -106,6 +106,7 @@ struct scsi_disk { >> unsigned rc_basis: 2; >> unsigned zoned: 2; >> unsigned urswrz : 1; >> + unsigned medium_access_reset : 1; > > The name of this new member is confusing to me. How about using the name > "ignore_medium_access_errors" instead? And since this is a boolean, how > about using true and false in assignments to this variable? > Ok, will be doing so. Cheers, Hannes -- Dr. Hannes Reinecke zSeries & Storage hare@suse.de +49 911 74053 688 SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)