All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mike Anderson <andmike@linux.vnet.ibm.com>
To: device-mapper development <dm-devel@redhat.com>
Cc: James Bottomley <James.Bottomley@suse.de>, linux-scsi@vger.kernel.org
Subject: Re: [dm-devel] block_abort_queue (blk_abort_request) racing with scsi_request_fn
Date: Wed, 10 Nov 2010 08:30:47 -0800	[thread overview]
Message-ID: <20101110163047.GA26201@linux.vnet.ibm.com> (raw)
In-Reply-To: <4CDA49F8.9050406@cs.wisc.edu>

Mike Christie <michaelc@cs.wisc.edu> wrote:
> On 11/10/2010 01:09 AM, Mike Christie wrote:
> >On 05/12/2010 12:23 AM, Mike Anderson wrote:
> >>I was looking at a dump from a weekend run and I believe I am seeing a
> >>case where blk_abort_request through blk_abort_queue picked up a request
> >>for timeout that scsi_request_fn decided not to start. This test was
> >>under
> >>error injection.
> >>
> >>I assume the case in scsi_request_fn this is hitting is that a request
> >>has
> >>been put on the timeout_list with blk_start_request and then one of the
> >>not_ready checks is hit and the request is decided not to be started. I
> >>believe the drop
> >>
> >>It appears that my usage of walking the timeout_list in block_abort_queue
> >>and using blk_mark_rq_complete in block_abort_request will not work in
> >>this case.
> >>
> >>While it would be good to have way to ensure a command is started, it is
> >>unclear if even at a low timeout of 1 second that a user other than
> >>blk_abort_queue would hit this race.
> >>
> >>The dropping / acquiring of host_lock and queue_lock in scsi_request_fn
> >>and scsi_dispatch_cmd make it unclear to me if usage of
> >>blk_mark_rq_complete will cover all cases.
> >>
> >>I looked at checking serial_number in scsi_times_out along with a couple
> >>blk_mark_rq_complete additions, but unclear if this would be good and
> >>/ or
> >>work in all cases.
> >>
> >>I looked at just accelerating deadline by some default value but unclear
> >>if that would be acceptable.
> >>
> >>I also looked at just using just the mark interface I previously posted
> >>and not calling blk_abort_request at all, but that would change current
> >>behavior that has been in use for a while.
> >>
> >
> >Did you ever solve this? I am hitting this with the dm-multipath
> >blk_abort_queue case (the email I sent you a couple weeks ago).
> >

No. I am also not seeing it in my recent error injection testing.

Is your test configuration / error injection testing able to reproduce
fairly reliably. If so can you provide some general details on how you
are generating this error.

> >It seems we could fix this by just having blk_requeue_request do a check
> >for if the request timedout similar to what scsi used to do. A hacky way
> >might be to have 2 requeue functions.
> >
> >blk_requeue_completed_request - This is the blk_requeue_request we have
> >today. It unconditionally requeues the request. It should only be used
> >if the command has been completed either from blk_complete_request or
> >from block layer timeout handling
> >(blk_rq_timed_out_timer->blk_rq_timed_out->rq_timed_out_fn).
> >
> >blk_requeue_running_request - This checks if the timer is running before
> >requeueing the request. If blk_rq_timed_out_timer/blk_rq_timed_out has
> >taken over the request and is going to handle it then this function just
> >returns and does not requeue. So for this we could just have it check if
> >the queue has a rq_timed_out_fn and if rq->timeout_list is empty or not.
> >
> >I think this might be confusing to use, so I tried something slightly
> >different below.
> >
> >
> >I also tried a patch where we just add another req bit. We set it in
> >blk_rq_timed_out_timer and clear it in a new function that clears it
> >then calls blk_requeue_reqeust. The new function:
> >
> >blk_requeue_timedout_request - used when request is to be requeued if a
> >LLD q->rq_timed_out_fn returned BLK_EH_NOT_HANDLED and has resolved the
> >problem and wants the request to be requeued. This function clears
> >REQ_ATOM_TIMEDOUT and then calls blk_requeue_request.
> >
> >blk_reqeuue_request would then check if REQ_ATOM_TIMEDOUT is set and if
> >it was just drop the request assuming the rq_timed_out_fn was handling
> >it. This still requires the caller to know how the command is supposed
> >to be reqeueud. But I think it might be easier since the driver here has
> >returned BLK_EH_NOT_HANDLED in the q timeout fn so they know that they
> >are going to be handling the request in a special way.
> >
> >I attached the last idea here. Made over Linus's tree. Only compile tested.
> >
> 
> Oops, nevermind. I think this is trying to solve a slightly
> different problem. I saw your other mail. My patch will not handle
> the case where:
> 
> 1. cmd is in scsi_reqeust_fn has run blk_start_request and dropped
> the queue_lock. Has not yet taken the host lock and incremented host
> busy counters.
> 2. blk_abort_queue runs q->rq_timed_out_fn and adds cmd to host eh list.
> 3. Somehow scsi eh runs and is finishing its stuff before #1 has
> done anything, so the cmd was just processed by scsi eh *and* at the
> same time is still lingering in scsi_request_fn (somehow #1 has
> still not taken the host lock).
> 

While #1 could also return with a busy from queuecommand which will call
scsi_queue_insert with no check for complete. One could add a
blk_mark_rq_complete check prior to calling scsi_queue_insert. This does
not cover the not_ready label case in scsi_request_fn.

Another option might be to make blk_abort less aggressive if we cannot
close all the paths and switch it to more of a drain model, but then we
may be in the same boat in selecting how fast we can drain based what we
perceive to be a safe time value. This option leaves the existing races
open in the scsi_request_fn / scsi_dispatch_cmd.

-andmike
--
Michael Anderson
andmike@linux.vnet.ibm.com

  reply	other threads:[~2010-11-10 16:31 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-05-12  5:23 block_abort_queue (blk_abort_request) racing with scsi_request_fn Mike Anderson
2010-11-10  7:09 ` [dm-devel] " Mike Christie
2010-11-10  7:30   ` Mike Christie
2010-11-10 16:30     ` Mike Anderson [this message]
2010-11-10 21:16       ` Mike Christie
2010-11-12 17:54         ` Mike Anderson
2010-11-16 21:39           ` Mike Snitzer
2010-11-17 17:49             ` [dm-devel] " Mike Anderson
2010-11-17 21:55               ` Mike Snitzer
2010-11-18  4:40                 ` [PATCH v2] dm mpath: add feature flag to control call to blk_abort_queue Mike Snitzer
2010-11-18  7:20                   ` Mike Anderson
2010-11-18 15:48                     ` Mike Snitzer
2010-11-18 15:48                     ` [PATCH v3] " Mike Snitzer
2010-11-18 19:16                       ` (unknown), Mike Snitzer
2010-11-18 19:21                         ` Mike Snitzer
2010-11-18 19:19                       ` [PATCH v4] dm mpath: avoid call to blk_abort_queue by default Mike Snitzer
2010-11-18 20:07                         ` [PATCH v5] " Mike Snitzer
2010-11-18 20:18                           ` [dm-devel] " Alasdair G Kergon
2010-11-18 20:39                             ` Mike Anderson
2010-11-18 21:48                             ` [PATCH] dm mpath: disable call to blk_abort_queue and related code Mike Snitzer
2010-11-23  1:00                               ` [PATCH v2] dm mpath: revert "dm: Call blk_abort_queue on failed paths" Mike Snitzer

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20101110163047.GA26201@linux.vnet.ibm.com \
    --to=andmike@linux.vnet.ibm.com \
    --cc=James.Bottomley@suse.de \
    --cc=dm-devel@redhat.com \
    --cc=linux-scsi@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.