All of lore.kernel.org
 help / color / mirror / Atom feed
From: Ewan Milne <emilne@redhat.com>
To: James Bottomley <jbottomley@parallels.com>
Cc: Hannes Reinecke <hare@suse.de>,
	"linux-scsi@vger.kernel.org" <linux-scsi@vger.kernel.org>,
	Ren Mingxin <renmx@cn.fujitsu.com>,
	Bart van Assche <bvanassche@acm.org>,
	Joern Engel <joern@logfs.org>
Subject: Re: [PATCH 7/7] scsi: Add 'eh_deadline' to limit SCSI EH runtime
Date: Thu, 17 Oct 2013 10:27:56 -0400	[thread overview]
Message-ID: <1382020076.3812.78.camel@localhost.localdomain> (raw)
In-Reply-To: <1381936290.1864.11.camel@dabdike>

On Wed, 2013-10-16 at 19:22 +0000, James Bottomley wrote:
> What about instead:
> 
> static int scsi_host_eh_past_deadline(struct Scsi_Host *shost, int percent) {
> 	if (!shost->last_reset || !shost->eh_deadline)
> 		return 0;
> 
> 	if (time_before(jiffies,
> 			shost->last_reset + shost->eh_deadline * percent/100))
> 		return 0;
> 
> 	return 1;
> }
> 
> which allows us to have
> 
> if (scsi_host_eh_past_deadline(shost, 50)) {
> 
> in scsi_eh_abort_cmds()
> 
> if (scsi_host_eh_past_deadline(shost, 66) {
> 
> in scsi_eh_bus_device_reset()
> 
> say 83 in target reset, and 100 in bus reset.
> 
> Thus ensuring we at least get a crack at the reset chain?
> 
> James
> 

Well, conceptually that seems like a good idea, since if there
is limited time available it is probably wiser to spend it on
higher-level recovery instead of timing out trying to deal with
individual devices, but we didn't do any testing on how long
the bus device reset/target reset/bus reset take.  The host
reset was about 10 seconds for lpfc, and the maximum time was
(command timeout) + (eh deadline) + (host reset time).

However...

With this enhancement, the maximum time could be much longer if
we attempt to e.g. perform a bus reset right before the
eh_deadline expires, because drivers like lpfc iterate over the
targets and send a target reset to each one (with a timeout).

The original problem that prompted this change was that a
target became inaccessible, and nothing the EH did was ever going
to do anything except timeout, until the host reset was performed,
at which point the FC login would fail and the HBA would start
failing commands immediately instead of them timing out.

I guess the main thing is that there should be some way to
explain to people what value to use for eh_deadline in order
for I/O to complete within a specified amount of time (e.g. before
some other node in a cluster shoots us because we are unresponsive).

-Ewan





  reply	other threads:[~2013-10-17 14:28 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-07-01  6:50 [PATCHv2 0/7] Limit overall SCSI EH runtime Hannes Reinecke
2013-07-01  6:50 ` [PATCH 1/7] dpt_i2o: Remove DPTI_STATE_IOCTL Hannes Reinecke
2013-07-01  6:50 ` [PATCH 2/7] dpt_i2o: return SCSI_MLQUEUE_HOST_BUSY when in reset Hannes Reinecke
2013-07-01  6:50 ` [PATCH 3/7] advansys: Remove 'last_reset' references Hannes Reinecke
2013-07-01  6:50 ` [PATCH 4/7] tmscsim: Move 'last_reset' into host structure Hannes Reinecke
2013-07-01  6:50 ` [PATCH 5/7] dc395: Move 'last_reset' into internal " Hannes Reinecke
2013-07-01  6:50 ` [PATCH 6/7] scsi: remove check for 'resetting' Hannes Reinecke
2013-07-01  6:50 ` [PATCH 7/7] scsi: Add 'eh_deadline' to limit SCSI EH runtime Hannes Reinecke
2013-09-20  7:48   ` Ren Mingxin
2013-10-16 19:22   ` James Bottomley
2013-10-17 14:27     ` Ewan Milne [this message]
2013-10-23  9:25     ` Hannes Reinecke
2013-10-23  7:46       ` James Bottomley
2013-10-23  9:49         ` Hannes Reinecke
2013-07-01 17:44 ` [PATCHv2 0/7] Limit overall " Jörn Engel
2013-07-01 19:23   ` James Bottomley
2013-07-01 20:55     ` Jörn Engel
2013-07-02  5:48       ` Hannes Reinecke
2013-07-02  6:37       ` James Bottomley
2013-07-02 14:58         ` Jörn Engel
2013-07-02 16:33           ` James Bottomley
2013-07-02 15:50             ` Jörn Engel
2013-07-10 20:35 ` Ewan Milne
2013-07-12  5:54   ` Ren Mingxin
2013-07-12 13:30     ` Ewan Milne
2013-07-15 10:33       ` Ren Mingxin
2013-07-26  9:52         ` Ren Mingxin
2013-08-07  6:43   ` Ren Mingxin
2013-08-29 13:06     ` Hannes Reinecke
2013-09-24 20:51       ` Ric Wheeler
2013-09-25  5:48         ` Hannes Reinecke
2013-10-02 16:21           ` Hannes Reinecke
  -- strict thread matches above, loose matches on Subject: below --
2013-06-10 11:11 [PATCH " Hannes Reinecke
2013-06-10 11:11 ` [PATCH 7/7] scsi: Add 'eh_deadline' to limit " Hannes Reinecke
2013-06-27 14:33   ` Ewan Milne
2013-06-28  7:14     ` Hannes Reinecke
2013-06-28 12:54       ` Ewan Milne
2013-06-28  7:29   ` Bart Van Assche
2013-06-28  7:42     ` Hannes Reinecke

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1382020076.3812.78.camel@localhost.localdomain \
    --to=emilne@redhat.com \
    --cc=bvanassche@acm.org \
    --cc=hare@suse.de \
    --cc=jbottomley@parallels.com \
    --cc=joern@logfs.org \
    --cc=linux-scsi@vger.kernel.org \
    --cc=renmx@cn.fujitsu.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.