From: Mike Snitzer <snitzer@redhat.com>
To: Mike Anderson <andmike@linux.vnet.ibm.com>
Cc: Mike Christie <michaelc@cs.wisc.edu>,
	"dm-devel@redhat.com James Bottomley" <James.Bottomley@suse.de>,
	linux-scsi@vger.kernel.org
Subject: Re: block_abort_queue (blk_abort_request) racing with scsi_request_fn
Date: Wed, 17 Nov 2010 16:55:41 -0500
Message-ID: <20101117215541.GA7785@redhat.com>
In-Reply-To: <20101117174925.GA2176@linux.vnet.ibm.com>

On Wed, Nov 17 2010 at 12:49pm -0500,
Mike Anderson <andmike@linux.vnet.ibm.com> wrote:

> Mike Snitzer <snitzer@redhat.com> wrote:
> > Hi Mike,
> > 
> > On Fri, Nov 12 2010 at 12:54pm -0500,
> > Mike Anderson <andmike@linux.vnet.ibm.com> wrote:
> > 
> > > By not directly timing out the I/O but accelerating the timeout by a
> > > factor. The value could be calculated as a percentage of the queue timeout
> > > value for a default, with the option of exposing a sysfs attribute
> > > similar to fast_io_fail_tmo. The attribute could also provide an off
> > > method, which we do not have today; it is my bad that we do not have one
> > > (I posted the features patch to multipath but did not follow up, which
> > > would have provided an off).
> > 
> > You're referring to these patches:
> > https://patchwork.kernel.org/patch/96674/
> > https://patchwork.kernel.org/patch/96673/
> > 
> 
> Yes these are the patches that I was referring to.
> 
> > Do you have an interest in pursuing these further? 
> 
> Yes.
> 
> > In the near-term
> > should we default to off (so introduce MP_FEATURE_ABORT_Q) -- given the
> > current race which exposes corruption?
> > 
> 
> Given the current race exposure, defaulting to off might be the best choice.

OK, I can work to refresh these patches, invert the logic to default to
off, and repost.  But in addition I'll post a 3rd patch that disallows
anything but off.
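
To make the plan concrete, here is a minimal userspace sketch (not the
actual dm-mpath code) of a default-off flag plus a guard that refuses to
turn it on. MP_FEATURE_ABORT_Q is the name floated above; the
"abort_queue" feature word, struct mpath_features, parse_feature() and
should_abort_queue() are hypothetical names made up for illustration.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical feature bit, named after the MP_FEATURE_ABORT_Q idea above. */
#define MP_FEATURE_ABORT_Q (1u << 0)

/* Assumed container for per-multipath feature flags (illustration only). */
struct mpath_features {
	unsigned int flags;	/* zero-initialized, so abort-on-fail is off */
};

/*
 * Parse one feature word from the table line.
 * allow_abort_q == false models the proposed 3rd patch: anything but
 * "off" is rejected until the race is fixed.
 * Returns 0 on success, -1 if the word is unknown or disallowed.
 */
static int parse_feature(struct mpath_features *f, const char *word,
			 bool allow_abort_q)
{
	if (strcmp(word, "abort_queue") == 0) {
		if (!allow_abort_q)
			return -1;
		f->flags |= MP_FEATURE_ABORT_Q;
		return 0;
	}
	return -1;
}

/* The path-failure code would consult this before aborting the queue. */
static bool should_abort_queue(const struct mpath_features *f)
{
	return (f->flags & MP_FEATURE_ABORT_Q) != 0;
}
```

With allow_abort_q set to false, a table that asks for the feature fails
to load rather than silently enabling the racy path; reverting the 3rd
patch would amount to passing true.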

> > Or are you now interested in accelerating the timeout?  I'd need to
> > review this thread in more detail to give you an opinion.  But I do know
> > that simply disabling dm-mpath's call to blk_abort_queue() enables some
> > extensive path failure load testing to _not_ cause the list corruption
> > that leads to a crash.
> 
> I think the on/off control plus a fix to address the issue when it is on
> would be good. Since I do not believe we want to impact the normal IO
> path with more lock bouncing, modifying the blk_abort_queue
> function appeared to be one of the least disruptive options. There might
> be others.

OK, I'll defer to you (and/or Mike C) to propose that additional fix
to allow us to safely enable the feature.  As part of that patch you'd
revert the small change from my 3rd patch that disallows anything but
off?

It could be that we won't need my 3rd patch, if your additional fix for
the race can be developed and tested quickly.  But "quickly" is all
relative: the comprehensive load testing I've done only counts as
successful if it runs ~50 hours without crashing.

Thanks,
Mike
