All of lore.kernel.org
 help / color / mirror / Atom feed
From: Benjamin Block <bblock@linux.vnet.ibm.com>
To: Hannes Reinecke <hare@suse.de>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>,
	Christoph Hellwig <hch@lst.de>,
	James Bottomley <james.bottomley@hansenpartnership.com>,
	Bart van Assche <bart.vanassche@sandisk.com>,
	linux-scsi@vger.kernel.org
Subject: Re: [PATCHv3 6/6] scsi: make asynchronous aborts mandatory
Date: Tue, 14 Mar 2017 18:33:22 +0100	[thread overview]
Message-ID: <20170314173322.GA19037@bblock-ThinkPad-W530> (raw)
In-Reply-To: <1488359720-130871-7-git-send-email-hare@suse.de>

Hello Hannes,

On Wed, Mar 01, 2017 at 10:15:20AM +0100, Hannes Reinecke wrote:
> There hasn't been any reports for HBAs where asynchronous abort
> would not work, so we should make it mandatory and remove
> the fallback.
> 
> Signed-off-by: Hannes Reinecke <hare@suse.de>
> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
> ---
>  Documentation/scsi/scsi_eh.txt | 18 ++++------
>  drivers/scsi/scsi_error.c      | 81 ++++--------------------------------------
>  drivers/scsi/scsi_lib.c        |  2 +-
>  drivers/scsi/scsi_priv.h       |  3 +-
>  include/scsi/scsi_host.h       |  5 ---
>  5 files changed, 15 insertions(+), 94 deletions(-)
> 
> diff --git a/Documentation/scsi/scsi_eh.txt b/Documentation/scsi/scsi_eh.txt
> index 4edb9c1c..11e447b 100644
> --- a/Documentation/scsi/scsi_eh.txt
> +++ b/Documentation/scsi/scsi_eh.txt
> @@ -70,7 +70,7 @@ with the command.
>  	scmd is requeued to blk queue.
> 
>   - otherwise
> -	scsi_eh_scmd_add(scmd, 0) is invoked for the command.  See
> +	scsi_eh_scmd_add(scmd) is invoked for the command.  See
>  	[1-3] for details of this function.
> 
> 
> @@ -103,9 +103,7 @@ function
>          eh_timed_out() callback did not handle the command.
>  	Step #2 is taken.
> 
> - 2. If the host supports asynchronous completion (as indicated by the
> -    no_async_abort setting in the host template) scsi_abort_command()
> -    is invoked to schedule an asynchrous abort.
> + 2. scsi_abort_command() is invoked to schedule an asynchrous abort.
>      Asynchronous abort are not invoked for commands which the
>      SCSI_EH_ABORT_SCHEDULED flag is set (this indicates that the command
>      already had been aborted once, and this is a retry which failed),
> @@ -127,16 +125,13 @@ function
> 
>   scmds enter EH via scsi_eh_scmd_add(), which does the following.
> 
> - 1. Turns on scmd->eh_eflags as requested.  It's 0 for error
> -    completions and SCSI_EH_CANCEL_CMD for timeouts.
> + 1. Links scmd->eh_entry to shost->eh_cmd_q
> 
> - 2. Links scmd->eh_entry to shost->eh_cmd_q
> + 2. Sets SHOST_RECOVERY bit in shost->shost_state
> 
> - 3. Sets SHOST_RECOVERY bit in shost->shost_state
> + 3. Increments shost->host_failed
> 
> - 4. Increments shost->host_failed
> -
> - 5. Wakes up SCSI EH thread if shost->host_busy == shost->host_failed
> + 4. Wakes up SCSI EH thread if shost->host_busy == shost->host_failed
> 
>   As can be seen above, once any scmd is added to shost->eh_cmd_q,
>  SHOST_RECOVERY shost_state bit is turned on.  This prevents any new
> @@ -252,7 +247,6 @@ scmd->allowed.
> 
>   1. Error completion / time out
>      ACTION: scsi_eh_scmd_add() is invoked for scmd
> -	- set scmd->eh_eflags
>  	- add scmd to shost->eh_cmd_q
>  	- set SHOST_RECOVERY
>  	- shost->host_failed++
> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
> index 802a081..7b70ee9 100644
> --- a/drivers/scsi/scsi_error.c
> +++ b/drivers/scsi/scsi_error.c
> @@ -163,7 +163,7 @@ static int scsi_host_eh_past_deadline(struct Scsi_Host *shost)
>  		}
>  	}
> 
> -	scsi_eh_scmd_add(scmd, 0);
> +	scsi_eh_scmd_add(scmd);
>  }
> 
>  /**
> @@ -217,9 +217,8 @@ static int scsi_host_eh_past_deadline(struct Scsi_Host *shost)
>  /**
>   * scsi_eh_scmd_add - add scsi cmd to error handling.
>   * @scmd:	scmd to run eh on.
> - * @eh_flag:	optional SCSI_EH flag.
>   */
> -void scsi_eh_scmd_add(struct scsi_cmnd *scmd, int eh_flag)
> +void scsi_eh_scmd_add(struct scsi_cmnd *scmd)
>  {
>  	struct Scsi_Host *shost = scmd->device->host;
>  	unsigned long flags;
> @@ -235,9 +234,6 @@ void scsi_eh_scmd_add(struct scsi_cmnd *scmd, int eh_flag)
>  	if (shost->eh_deadline != -1 && !shost->last_reset)
>  		shost->last_reset = jiffies;
> 
> -	if (scmd->eh_eflags & SCSI_EH_ABORT_SCHEDULED)
> -		eh_flag &= ~SCSI_EH_CANCEL_CMD;
> -	scmd->eh_eflags |= eh_flag;
>  	scsi_eh_reset(scmd);
>  	list_add_tail(&scmd->eh_entry, &shost->eh_cmd_q);
>  	shost->host_failed++;
> @@ -271,10 +267,9 @@ enum blk_eh_timer_return scsi_times_out(struct request *req)
>  		rtn = host->hostt->eh_timed_out(scmd);
> 
>  	if (rtn == BLK_EH_NOT_HANDLED) {
> -		if (host->hostt->no_async_abort ||
> -		    scsi_abort_command(scmd) != SUCCESS) {
> +		if (scsi_abort_command(scmd) != SUCCESS) {
>  			set_host_byte(scmd, DID_TIME_OUT);
> -			scsi_eh_scmd_add(scmd, SCSI_EH_CANCEL_CMD);
> +			scsi_eh_scmd_add(scmd);
>  		}
>  	}
> 
> @@ -327,7 +322,7 @@ static inline void scsi_eh_prt_fail_stats(struct Scsi_Host *shost,
>  		list_for_each_entry(scmd, work_q, eh_entry) {
>  			if (scmd->device == sdev) {
>  				++total_failures;
> -				if (scmd->eh_eflags & SCSI_EH_CANCEL_CMD)
> +				if (scmd->eh_eflags & SCSI_EH_ABORT_SCHEDULED)
>  					++cmd_cancel;
>  				else
>  					++cmd_failed;
> @@ -1153,8 +1148,7 @@ int scsi_eh_get_sense(struct list_head *work_q,
>  	 * should not get sense.
>  	 */
>  	list_for_each_entry_safe(scmd, next, work_q, eh_entry) {
> -		if ((scmd->eh_eflags & SCSI_EH_CANCEL_CMD) ||
> -		    (scmd->eh_eflags & SCSI_EH_ABORT_SCHEDULED) ||
> +		if ((scmd->eh_eflags & SCSI_EH_ABORT_SCHEDULED) ||
>  		    SCSI_SENSE_VALID(scmd))
>  			continue;
> 
> @@ -1294,61 +1288,6 @@ static int scsi_eh_test_devices(struct list_head *cmd_list,
>  	return list_empty(work_q);
>  }
> 
> -
> -/**
> - * scsi_eh_abort_cmds - abort pending commands.
> - * @work_q:	&list_head for pending commands.
> - * @done_q:	&list_head for processed commands.
> - *
> - * Decription:
> - *    Try and see whether or not it makes sense to try and abort the
> - *    running command.  This only works out to be the case if we have one
> - *    command that has timed out.  If the command simply failed, it makes
> - *    no sense to try and abort the command, since as far as the shost
> - *    adapter is concerned, it isn't running.
> - */
> -static int scsi_eh_abort_cmds(struct list_head *work_q,
> -			      struct list_head *done_q)
> -{
> -	struct scsi_cmnd *scmd, *next;
> -	LIST_HEAD(check_list);
> -	int rtn;
> -	struct Scsi_Host *shost;
> -
> -	list_for_each_entry_safe(scmd, next, work_q, eh_entry) {
> -		if (!(scmd->eh_eflags & SCSI_EH_CANCEL_CMD))
> -			continue;
> -		shost = scmd->device->host;
> -		if (scsi_host_eh_past_deadline(shost)) {
> -			list_splice_init(&check_list, work_q);
> -			SCSI_LOG_ERROR_RECOVERY(3,
> -				scmd_printk(KERN_INFO, scmd,
> -					    "%s: skip aborting cmd, past eh deadline\n",
> -					    current->comm));
> -			return list_empty(work_q);
> -		}
> -		SCSI_LOG_ERROR_RECOVERY(3,
> -			scmd_printk(KERN_INFO, scmd,
> -				     "%s: aborting cmd\n", current->comm));
> -		rtn = scsi_try_to_abort_cmd(shost->hostt, scmd);
> -		if (rtn == FAILED) {
> -			SCSI_LOG_ERROR_RECOVERY(3,
> -				scmd_printk(KERN_INFO, scmd,
> -					    "%s: aborting cmd failed\n",
> -					     current->comm));
> -			list_splice_init(&check_list, work_q);
> -			return list_empty(work_q);
> -		}
> -		scmd->eh_eflags &= ~SCSI_EH_CANCEL_CMD;
> -		if (rtn == FAST_IO_FAIL)
> -			scsi_eh_finish_cmd(scmd, done_q);
> -		else
> -			list_move_tail(&scmd->eh_entry, &check_list);
> -	}
> -
> -	return scsi_eh_test_devices(&check_list, work_q, done_q, 0);
> -}
> -
>  /**
>   * scsi_eh_try_stu - Send START_UNIT to device.
>   * @scmd:	&scsi_cmnd to send START_UNIT
> @@ -1691,11 +1630,6 @@ static void scsi_eh_offline_sdevs(struct list_head *work_q,
>  		sdev_printk(KERN_INFO, scmd->device, "Device offlined - "
>  			    "not ready after error recovery\n");
>  		scsi_device_set_state(scmd->device, SDEV_OFFLINE);
> -		if (scmd->eh_eflags & SCSI_EH_CANCEL_CMD) {
> -			/*
> -			 * FIXME: Handle lost cmds.
> -			 */
> -		}
>  		scsi_eh_finish_cmd(scmd, done_q);
>  	}
>  	return;
> @@ -2139,8 +2073,7 @@ static void scsi_unjam_host(struct Scsi_Host *shost)
>  	SCSI_LOG_ERROR_RECOVERY(1, scsi_eh_prt_fail_stats(shost, &eh_work_q));
> 
>  	if (!scsi_eh_get_sense(&eh_work_q, &eh_done_q))
> -		if (!scsi_eh_abort_cmds(&eh_work_q, &eh_done_q))
> -			scsi_eh_ready_devs(shost, &eh_work_q, &eh_done_q);
> +		scsi_eh_ready_devs(shost, &eh_work_q, &eh_done_q);
> 
>  	spin_lock_irqsave(shost->host_lock, flags);
>  	if (shost->eh_deadline != -1)
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index 0735a46..98b2df8 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -1593,7 +1593,7 @@ static void scsi_softirq_done(struct request *rq)
>  			scsi_queue_insert(cmd, SCSI_MLQUEUE_DEVICE_BUSY);
>  			break;
>  		default:
> -			scsi_eh_scmd_add(cmd, 0);
> +			scsi_eh_scmd_add(cmd);
>  			break;
>  	}
>  }
> diff --git a/drivers/scsi/scsi_priv.h b/drivers/scsi/scsi_priv.h
> index 5be6cbf6..e20ab10 100644
> --- a/drivers/scsi/scsi_priv.h
> +++ b/drivers/scsi/scsi_priv.h
> @@ -18,7 +18,6 @@
>  /*
>   * Scsi Error Handler Flags
>   */
> -#define SCSI_EH_CANCEL_CMD	0x0001	/* Cancel this cmd */
>  #define SCSI_EH_ABORT_SCHEDULED	0x0002	/* Abort has been scheduled */
> 
>  #define SCSI_SENSE_VALID(scmd) \
> @@ -72,7 +71,7 @@ extern int scsi_dev_info_list_add_keyed(int compatible, char *vendor,
>  extern int scsi_error_handler(void *host);
>  extern int scsi_decide_disposition(struct scsi_cmnd *cmd);
>  extern void scsi_eh_wakeup(struct Scsi_Host *shost);
> -extern void scsi_eh_scmd_add(struct scsi_cmnd *, int);
> +extern void scsi_eh_scmd_add(struct scsi_cmnd *);
>  void scsi_eh_ready_devs(struct Scsi_Host *shost,
>  			struct list_head *work_q,
>  			struct list_head *done_q);
> diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
> index 3cd8c3b..afb0481 100644
> --- a/include/scsi/scsi_host.h
> +++ b/include/scsi/scsi_host.h
> @@ -452,11 +452,6 @@ struct scsi_host_template {
>  	unsigned no_write_same:1;
> 
>  	/*
> -	 * True if asynchronous aborts are not supported
> -	 */
> -	unsigned no_async_abort:1;
> -
> -	/*
>  	 * Countdown for host blocking with no commands outstanding.
>  	 */
>  	unsigned int max_host_blocked;
> -- 
> 1.8.5.6
> 

Hmm so, I guess we compromise in terms of how granular we want to
recover? 

When say an abort for command A on LUN 1 behind Port α fails for some
reason, then we also skip all possible aborts for command B on LUN 2
behind Port α and command C on LUN 1 behind Port β? (The host might
already be in recovery by the time command B and C fail)

I mean, in the end, all traffic holds on that host anyway, as long as we
get into EH - which is true as soon as one abort fails. Might as well
skip the wait time till we get there.

I would like it, if we could document that fact a bit more direct.
Otherwise I think that's good. The code also looks good.


Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com>


                                                    Beste Grüße / Best regards,
                                                      - Benjamin Block
-- 
Linux on z Systems Development         /         IBM Systems & Technology Group
		  IBM Deutschland Research & Development GmbH 
Vorsitz. AufsR.: Martina Koederitz     /        Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: AmtsG Stuttgart, HRB 243294

  reply	other threads:[~2017-03-14 17:33 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-03-01  9:15 [PATCHv3 0/6] SCSI EH cleanup Hannes Reinecke
2017-03-01  9:15 ` [PATCHv3 1/6] scsi_error: count medium access timeout only once per EH run Hannes Reinecke
2017-03-01 13:50   ` Steffen Maier
2017-03-01 23:24   ` Bart Van Assche
2017-03-02  8:02     ` Hannes Reinecke
2017-03-02 20:16   ` Benjamin Block
2017-03-13 10:20     ` Hannes Reinecke
2017-03-13 13:37   ` Mauricio Faria de Oliveira
2017-03-13 14:48     ` Hannes Reinecke
2017-03-13 15:54       ` Mauricio Faria de Oliveira
2017-03-01  9:15 ` [PATCHv3 2/6] libsas: allow async aborts Hannes Reinecke
2017-03-01  9:15 ` [PATCHv3 3/6] scsi: make eh_eflags persistent Hannes Reinecke
2017-03-01 23:29   ` Bart Van Assche
2017-03-14 17:51   ` Benjamin Block
2017-03-01  9:15 ` [PATCHv3 4/6] scsi_error: do not escalate failed EH command Hannes Reinecke
2017-03-14 17:56   ` Benjamin Block
2017-03-15 13:54     ` Hannes Reinecke
2017-03-16 11:01       ` Benjamin Block
2017-03-16 11:53         ` Hannes Reinecke
2017-03-21 19:05           ` Benjamin Block
2017-03-23 13:11             ` Hannes Reinecke
2017-03-01  9:15 ` [PATCHv3 5/6] scsi: make scsi_eh_scmd_add() always succeed Hannes Reinecke
2017-03-01 23:34   ` Bart Van Assche
2017-03-14 18:05   ` Benjamin Block
2017-03-01  9:15 ` [PATCHv3 6/6] scsi: make asynchronous aborts mandatory Hannes Reinecke
2017-03-14 17:33   ` Benjamin Block [this message]
2017-03-15 13:54     ` Hannes Reinecke
2017-03-15 17:55       ` Benjamin Block
2017-03-16 14:06         ` Hannes Reinecke

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170314173322.GA19037@bblock-ThinkPad-W530 \
    --to=bblock@linux.vnet.ibm.com \
    --cc=bart.vanassche@sandisk.com \
    --cc=hare@suse.de \
    --cc=hch@lst.de \
    --cc=james.bottomley@hansenpartnership.com \
    --cc=linux-scsi@vger.kernel.org \
    --cc=martin.petersen@oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.