* block_abort_queue (blk_abort_request) racing with scsi_request_fn
@ 2010-05-12  5:23 Mike Anderson
  2010-11-10  7:09 ` [dm-devel] " Mike Christie
  0 siblings, 1 reply; 21+ messages in thread
From: Mike Anderson @ 2010-05-12  5:23 UTC (permalink / raw)
  To: Jens Axboe, James Bottomley; +Cc: dm-devel, linux-scsi

I was looking at a dump from a weekend run and I believe I am seeing a
case where blk_abort_request through blk_abort_queue picked up a request
for timeout that scsi_request_fn decided not to start. This test was under
error injection.

I assume the case in scsi_request_fn this is hitting is that a request has
been put on the timeout_list by blk_start_request and then one of the
not_ready checks is hit and the request is decided not to be started. I
believe the drop of the queue_lock in that window is what opens the race.

It appears that my usage of walking the timeout_list in blk_abort_queue
and using blk_mark_rq_complete in blk_abort_request will not work in
this case.
 
While it would be good to have a way to ensure a command has started, it
is unclear whether, even at a timeout as low as 1 second, any user other
than blk_abort_queue would hit this race.

The dropping / acquiring of host_lock and queue_lock in scsi_request_fn
and scsi_dispatch_cmd make it unclear to me whether usage of
blk_mark_rq_complete will cover all cases.

I looked at checking serial_number in scsi_times_out along with a couple
of blk_mark_rq_complete additions, but it is unclear whether this would be
correct and/or work in all cases.

I looked at just accelerating the deadline by some default value, but it
is unclear whether that would be acceptable.

I also looked at using just the mark interface I previously posted
and not calling blk_abort_request at all, but that would change current
behavior that has been in use for a while.

Looking for suggestions.

Thanks,

-andmike
--
Michael Anderson
andmike@linux.vnet.ibm.com


* Re: [dm-devel] block_abort_queue (blk_abort_request) racing with scsi_request_fn
  2010-05-12  5:23 block_abort_queue (blk_abort_request) racing with scsi_request_fn Mike Anderson
@ 2010-11-10  7:09 ` Mike Christie
  2010-11-10  7:30   ` Mike Christie
  0 siblings, 1 reply; 21+ messages in thread
From: Mike Christie @ 2010-11-10  7:09 UTC (permalink / raw)
  To: device-mapper development
  Cc: Mike Anderson, Jens Axboe, James Bottomley, linux-scsi

[-- Attachment #1: Type: text/plain, Size: 3589 bytes --]

On 05/12/2010 12:23 AM, Mike Anderson wrote:
> I was looking at a dump from a weekend run and I believe I am seeing a
> case where blk_abort_request through blk_abort_queue picked up a request
> for timeout that scsi_request_fn decided not to start. This test was under
> error injection.
>
> I assume the case in scsi_request_fn this is hitting is that a request has
> been put on the timeout_list with blk_start_request and then one of the
> not_ready checks is hit and the request is decided not to be started. I
> believe the drop
>
> It appears that my usage of walking the timeout_list in block_abort_queue
> and using blk_mark_rq_complete in block_abort_request will not work in
> this case.
>
> While it would be good to have way to ensure a command is started, it is
> unclear if even at a low timeout of 1 second that a user other than
> blk_abort_queue would hit this race.
>
> The dropping / acquiring of host_lock and queue_lock in scsi_request_fn
> and scsi_dispatch_cmd make it unclear to me if usage of
> blk_mark_rq_complete will cover all cases.
>
> I looked at checking serial_number in scsi_times_out along with a couple
> blk_mark_rq_complete additions, but unclear if this would be good and / or
> work in all cases.
>
> I looked at just accelerating deadline by some default value but unclear
> if that would be acceptable.
>
> I also looked at just using just the mark interface I previously posted
> and not calling blk_abort_request at all, but that would change current
> behavior that has been in use for a while.
>

Did you ever solve this? I am hitting this with the dm-multipath 
blk_abort_queue case (the email I sent you a couple weeks ago).

It seems we could fix this by just having blk_requeue_request check 
whether the request has timed out, similar to what scsi used to do. A 
hacky way might be to have two requeue functions.

blk_requeue_completed_request - This is the blk_requeue_request we have 
today. It unconditionally requeues the request. It should only be used 
if the command has been completed either from blk_complete_request or 
from block layer timeout handling 
(blk_rq_timed_out_timer->blk_rq_timed_out->rq_timed_out_fn).

blk_requeue_running_request - This checks if the timer is running before 
requeueing the request. If blk_rq_timed_out_timer/blk_rq_timed_out has 
taken over the request and is going to handle it then this function just 
returns and does not requeue. So for this we could just have it check if 
the queue has a rq_timed_out_fn and if rq->timeout_list is empty or not.

I think this might be confusing to use, so I tried something slightly 
different below.


I also tried a patch where we just add another req bit. We set it in 
blk_rq_timed_out_timer and clear it in a new function that then calls 
blk_requeue_request. The new function:

blk_requeue_timedout_request - used when a request is to be requeued after 
an LLD's q->rq_timed_out_fn returned BLK_EH_NOT_HANDLED, has resolved the 
problem, and wants the request to be requeued. This function clears 
REQ_ATOM_TIMEDOUT and then calls blk_requeue_request.

blk_requeue_request would then check if REQ_ATOM_TIMEDOUT is set and, if 
it is, just drop the request, assuming the rq_timed_out_fn is handling 
it. This still requires the caller to know how the command is supposed 
to be requeued. But I think it might be easier, since the driver here has 
returned BLK_EH_NOT_HANDLED in the q timeout fn, so it knows it is 
going to be handling the request in a special way.

I attached the last idea here. Made over Linus's tree. Only compile tested.

[-- Attachment #2: blk-requeue-timedout-request.patch --]
[-- Type: text/plain, Size: 5492 bytes --]

diff --git a/block/blk-core.c b/block/blk-core.c
index f0834e2..92279d4 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -981,6 +981,9 @@ EXPORT_SYMBOL(blk_make_request);
  */
 void blk_requeue_request(struct request_queue *q, struct request *rq)
 {
+	if (test_bit(REQ_ATOM_TIMEDOUT, &rq->atomic_flags))
+		return;
+
 	blk_delete_timer(rq);
 	blk_clear_rq_complete(rq);
 	trace_block_rq_requeue(q, rq);
diff --git a/block/blk-softirq.c b/block/blk-softirq.c
index ee9c216..e0a8d11 100644
--- a/block/blk-softirq.c
+++ b/block/blk-softirq.c
@@ -156,8 +156,10 @@ void blk_complete_request(struct request *req)
 {
 	if (unlikely(blk_should_fake_timeout(req->q)))
 		return;
-	if (!blk_mark_rq_complete(req))
+	if (!blk_mark_rq_complete(req)) {
+		blk_clear_rq_timedout(req);
 		__blk_complete_request(req);
+	}
 }
 EXPORT_SYMBOL(blk_complete_request);
 
diff --git a/block/blk-timeout.c b/block/blk-timeout.c
index 4f0c06c..afc6e5f 100644
--- a/block/blk-timeout.c
+++ b/block/blk-timeout.c
@@ -97,6 +97,7 @@ static void blk_rq_timed_out(struct request *req)
 		 * and we can move more of the generic scsi eh code to
 		 * the blk layer.
 		 */
+		blk_mark_rq_timedout(req);
 		break;
 	default:
 		printk(KERN_ERR "block: bad eh return: %d\n", ret);
@@ -104,6 +105,25 @@ static void blk_rq_timed_out(struct request *req)
 	}
 }
 
+/**
+ * blk_requeue_timedout_request - put a request that timedout back on queue
+ * @q:		request queue where request should be inserted
+ * @rq:		request to be inserted
+ *
+ * Description:
+ *	If a module has returned BLK_EH_NOT_HANDLED from its
+ *	rq_timed_out_fn and needs to requeue the request this
+ *	function should be used instead of blk_requeue_request.
+ *
+ *	queue_lock must be held.
+ */
+void blk_requeue_timedout_request(struct request_queue *q, struct request *req)
+{
+	blk_clear_rq_timedout(req);
+	blk_requeue_request(q, req);
+}
+EXPORT_SYMBOL_GPL(blk_requeue_timedout_request);
+
 void blk_rq_timed_out_timer(unsigned long data)
 {
 	struct request_queue *q = (struct request_queue *) data;
diff --git a/block/blk.h b/block/blk.h
index 2db8f32..ad93258 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -30,6 +30,7 @@ void __generic_unplug_device(struct request_queue *);
  */
 enum rq_atomic_flags {
 	REQ_ATOM_COMPLETE = 0,
+	REQ_ATOM_TIMEDOUT = 1,
 };
 
 /*
@@ -46,7 +47,17 @@ static inline void blk_clear_rq_complete(struct request *rq)
 	clear_bit(REQ_ATOM_COMPLETE, &rq->atomic_flags);
 }
 
-/*
+static inline int blk_mark_rq_timedout(struct request *rq)
+{
+	return test_and_set_bit(REQ_ATOM_TIMEDOUT, &rq->atomic_flags);
+}
+
+static inline void blk_clear_rq_timedout(struct request *rq)
+{
+	clear_bit(REQ_ATOM_TIMEDOUT, &rq->atomic_flags);
+}
+
+/*
  * Internal elevator interface
  */
 #define ELV_ON_HASH(rq)		(!hlist_unhashed(&(rq)->hash))
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 1de30eb..06df25a 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -1680,7 +1680,11 @@ void scsi_eh_flush_done_q(struct list_head *done_q)
 							  " retry cmd: %p\n",
 							  current->comm,
 							  scmd));
-				scsi_queue_insert(scmd, SCSI_MLQUEUE_EH_RETRY);
+				printk(KERN_ERR "scmd %p %p %p\n", scmd,
+					scmd->eh_entry.next, scmd->eh_entry.prev);
+
+				scsi_queue_insert(scmd,
+						SCSI_MLQUEUE_EH_TIMEDOUT_RETRY);
 		} else {
 			/*
 			 * If just we got sense for the device (called
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index eafeeda..cef49b2 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -158,7 +158,10 @@ static int __scsi_queue_insert(struct scsi_cmnd *cmd, int reason, int unbusy)
 	 * and plugs the queue appropriately.
          */
 	spin_lock_irqsave(q->queue_lock, flags);
-	blk_requeue_request(q, cmd->request);
+	if (reason == SCSI_MLQUEUE_EH_TIMEDOUT_RETRY)
+		blk_requeue_timedout_request(q, cmd->request);
+	else
+		blk_requeue_request(q, cmd->request);
 	spin_unlock_irqrestore(q->queue_lock, flags);
 
 	scsi_run_queue(q);
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 5027a59..e56f28e 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -205,7 +205,10 @@ typedef int (dma_drain_needed_fn)(struct request *);
 typedef int (lld_busy_fn) (struct request_queue *q);
 
 enum blk_eh_timer_return {
-	BLK_EH_NOT_HANDLED,
+	BLK_EH_NOT_HANDLED,	/* If this is returned the module must
+				 * call blk_requeue_timedout_request to
+				 * requeue it
+				 */
 	BLK_EH_HANDLED,
 	BLK_EH_RESET_TIMER,
 };
@@ -653,6 +656,8 @@ extern struct request *blk_get_request(struct request_queue *, int, gfp_t);
 extern struct request *blk_make_request(struct request_queue *, struct bio *,
 					gfp_t);
 extern void blk_insert_request(struct request_queue *, struct request *, int, void *);
+extern void blk_requeue_timedout_request(struct request_queue *,
+					 struct request *);
 extern void blk_requeue_request(struct request_queue *, struct request *);
 extern void blk_add_request_payload(struct request *rq, struct page *page,
 		unsigned int len);
diff --git a/include/scsi/scsi.h b/include/scsi/scsi.h
index 216af85..5bde952 100644
--- a/include/scsi/scsi.h
+++ b/include/scsi/scsi.h
@@ -442,6 +442,7 @@ static inline int scsi_is_wlun(unsigned int lun)
 #define SCSI_MLQUEUE_DEVICE_BUSY 0x1056
 #define SCSI_MLQUEUE_EH_RETRY    0x1057
 #define SCSI_MLQUEUE_TARGET_BUSY 0x1058
+#define SCSI_MLQUEUE_EH_TIMEDOUT_RETRY	0x1059
 
 /*
  *  Use these to separate status msg and our bytes


* Re: [dm-devel] block_abort_queue (blk_abort_request) racing with scsi_request_fn
  2010-11-10  7:09 ` [dm-devel] " Mike Christie
@ 2010-11-10  7:30   ` Mike Christie
  2010-11-10 16:30     ` Mike Anderson
  0 siblings, 1 reply; 21+ messages in thread
From: Mike Christie @ 2010-11-10  7:30 UTC (permalink / raw)
  To: device-mapper development; +Cc: Mike Anderson, James Bottomley, linux-scsi

On 11/10/2010 01:09 AM, Mike Christie wrote:
> On 05/12/2010 12:23 AM, Mike Anderson wrote:
>> I was looking at a dump from a weekend run and I believe I am seeing a
>> case where blk_abort_request through blk_abort_queue picked up a request
>> for timeout that scsi_request_fn decided not to start. This test was
>> under
>> error injection.
>>
>> I assume the case in scsi_request_fn this is hitting is that a request
>> has
>> been put on the timeout_list with blk_start_request and then one of the
>> not_ready checks is hit and the request is decided not to be started. I
>> believe the drop
>>
>> It appears that my usage of walking the timeout_list in block_abort_queue
>> and using blk_mark_rq_complete in block_abort_request will not work in
>> this case.
>>
>> While it would be good to have way to ensure a command is started, it is
>> unclear if even at a low timeout of 1 second that a user other than
>> blk_abort_queue would hit this race.
>>
>> The dropping / acquiring of host_lock and queue_lock in scsi_request_fn
>> and scsi_dispatch_cmd make it unclear to me if usage of
>> blk_mark_rq_complete will cover all cases.
>>
>> I looked at checking serial_number in scsi_times_out along with a couple
>> blk_mark_rq_complete additions, but unclear if this would be good and
>> / or
>> work in all cases.
>>
>> I looked at just accelerating deadline by some default value but unclear
>> if that would be acceptable.
>>
>> I also looked at just using just the mark interface I previously posted
>> and not calling blk_abort_request at all, but that would change current
>> behavior that has been in use for a while.
>>
>
> Did you ever solve this? I am hitting this with the dm-multipath
> blk_abort_queue case (the email I sent you a couple weeks ago).
>
> It seems we could fix this by just having blk_requeue_request do a check
> for if the request timedout similar to what scsi used to do. A hacky way
> might be to have 2 requeue functions.
>
> blk_requeue_completed_request - This is the blk_requeue_request we have
> today. It unconditionally requeues the request. It should only be used
> if the command has been completed either from blk_complete_request or
> from block layer timeout handling
> (blk_rq_timed_out_timer->blk_rq_timed_out->rq_timed_out_fn).
>
> blk_requeue_running_request - This checks if the timer is running before
> requeueing the request. If blk_rq_timed_out_timer/blk_rq_timed_out has
> taken over the request and is going to handle it then this function just
> returns and does not requeue. So for this we could just have it check if
> the queue has a rq_timed_out_fn and if rq->timeout_list is empty or not.
>
> I think this might be confusing to use, so I tried something slightly
> different below.
>
>
> I also tried a patch where we just add another req bit. We set it in
> blk_rq_timed_out_timer and clear it in a new function that clears it
> then calls blk_requeue_reqeust. The new function:
>
> blk_requeue_timedout_request - used when request is to be requeued if a
> LLD q->rq_timed_out_fn returned BLK_EH_NOT_HANDLED and has resolved the
> problem and wants the request to be requeued. This function clears
> REQ_ATOM_TIMEDOUT and then calls blk_requeue_request.
>
> blk_reqeuue_request would then check if REQ_ATOM_TIMEDOUT is set and if
> it was just drop the request assuming the rq_timed_out_fn was handling
> it. This still requires the caller to know how the command is supposed
> to be reqeueud. But I think it might be easier since the driver here has
> returned BLK_EH_NOT_HANDLED in the q timeout fn so they know that they
> are going to be handling the request in a special way.
>
> I attached the last idea here. Made over Linus's tree. Only compile tested.
>

Oops, never mind. I think this is trying to solve a slightly different 
problem. I saw your other mail. My patch will not handle the case where:

1. cmd in scsi_request_fn has run blk_start_request and dropped the 
queue_lock. It has not yet taken the host lock and incremented the host 
busy counters.
2. blk_abort_queue runs q->rq_timed_out_fn and adds the cmd to the host eh list.
3. Somehow scsi eh runs and finishes its work before #1 has done 
anything, so the cmd has just been processed by scsi eh *and* at the same 
time is still lingering in scsi_request_fn (somehow #1 has still not 
taken the host lock).

The scsi eh then does 
scsi_eh_flush_done_q->scsi_queue_insert->blk_requeue_request on the 
request while the request is also being processed in scsi_request_fn.

So now we hit some BUG_ONs in the blk code. The cmd from #1 then finally 
grabs the host lock, but it is too late.


* Re: [dm-devel] block_abort_queue (blk_abort_request) racing with scsi_request_fn
  2010-11-10  7:30   ` Mike Christie
@ 2010-11-10 16:30     ` Mike Anderson
  2010-11-10 21:16       ` Mike Christie
  0 siblings, 1 reply; 21+ messages in thread
From: Mike Anderson @ 2010-11-10 16:30 UTC (permalink / raw)
  To: device-mapper development; +Cc: James Bottomley, linux-scsi

Mike Christie <michaelc@cs.wisc.edu> wrote:
> On 11/10/2010 01:09 AM, Mike Christie wrote:
> >On 05/12/2010 12:23 AM, Mike Anderson wrote:
> >>I was looking at a dump from a weekend run and I believe I am seeing a
> >>case where blk_abort_request through blk_abort_queue picked up a request
> >>for timeout that scsi_request_fn decided not to start. This test was
> >>under
> >>error injection.
> >>
> >>I assume the case in scsi_request_fn this is hitting is that a request
> >>has
> >>been put on the timeout_list with blk_start_request and then one of the
> >>not_ready checks is hit and the request is decided not to be started. I
> >>believe the drop
> >>
> >>It appears that my usage of walking the timeout_list in block_abort_queue
> >>and using blk_mark_rq_complete in block_abort_request will not work in
> >>this case.
> >>
> >>While it would be good to have way to ensure a command is started, it is
> >>unclear if even at a low timeout of 1 second that a user other than
> >>blk_abort_queue would hit this race.
> >>
> >>The dropping / acquiring of host_lock and queue_lock in scsi_request_fn
> >>and scsi_dispatch_cmd make it unclear to me if usage of
> >>blk_mark_rq_complete will cover all cases.
> >>
> >>I looked at checking serial_number in scsi_times_out along with a couple
> >>blk_mark_rq_complete additions, but unclear if this would be good and
> >>/ or
> >>work in all cases.
> >>
> >>I looked at just accelerating deadline by some default value but unclear
> >>if that would be acceptable.
> >>
> >>I also looked at just using just the mark interface I previously posted
> >>and not calling blk_abort_request at all, but that would change current
> >>behavior that has been in use for a while.
> >>
> >
> >Did you ever solve this? I am hitting this with the dm-multipath
> >blk_abort_queue case (the email I sent you a couple weeks ago).
> >

No. I am also not seeing it in my recent error injection testing.

Is your test configuration / error injection testing able to reproduce
this fairly reliably? If so, can you provide some general details on how
you are generating this error?

> >It seems we could fix this by just having blk_requeue_request do a check
> >for if the request timedout similar to what scsi used to do. A hacky way
> >might be to have 2 requeue functions.
> >
> >blk_requeue_completed_request - This is the blk_requeue_request we have
> >today. It unconditionally requeues the request. It should only be used
> >if the command has been completed either from blk_complete_request or
> >from block layer timeout handling
> >(blk_rq_timed_out_timer->blk_rq_timed_out->rq_timed_out_fn).
> >
> >blk_requeue_running_request - This checks if the timer is running before
> >requeueing the request. If blk_rq_timed_out_timer/blk_rq_timed_out has
> >taken over the request and is going to handle it then this function just
> >returns and does not requeue. So for this we could just have it check if
> >the queue has a rq_timed_out_fn and if rq->timeout_list is empty or not.
> >
> >I think this might be confusing to use, so I tried something slightly
> >different below.
> >
> >
> >I also tried a patch where we just add another req bit. We set it in
> >blk_rq_timed_out_timer and clear it in a new function that clears it
> >then calls blk_requeue_reqeust. The new function:
> >
> >blk_requeue_timedout_request - used when request is to be requeued if a
> >LLD q->rq_timed_out_fn returned BLK_EH_NOT_HANDLED and has resolved the
> >problem and wants the request to be requeued. This function clears
> >REQ_ATOM_TIMEDOUT and then calls blk_requeue_request.
> >
> >blk_reqeuue_request would then check if REQ_ATOM_TIMEDOUT is set and if
> >it was just drop the request assuming the rq_timed_out_fn was handling
> >it. This still requires the caller to know how the command is supposed
> >to be reqeueud. But I think it might be easier since the driver here has
> >returned BLK_EH_NOT_HANDLED in the q timeout fn so they know that they
> >are going to be handling the request in a special way.
> >
> >I attached the last idea here. Made over Linus's tree. Only compile tested.
> >
> 
> Oops, nevermind. I think this is trying to solve a slightly
> different problem. I saw your other mail. My patch will not handle
> the case where:
> 
> 1. cmd is in scsi_reqeust_fn has run blk_start_request and dropped
> the queue_lock. Has not yet taken the host lock and incremented host
> busy counters.
> 2. blk_abort_queue runs q->rq_timed_out_fn and adds cmd to host eh list.
> 3. Somehow scsi eh runs and is finishing its stuff before #1 has
> done anything, so the cmd was just processed by scsi eh *and* at the
> same time is still lingering in scsi_request_fn (somehow #1 has
> still not taken the host lock).
> 

#1 could also return with a busy from queuecommand, which will call
scsi_queue_insert with no check for complete. One could add a
blk_mark_rq_complete check prior to calling scsi_queue_insert, but this
does not cover the not_ready label case in scsi_request_fn.

Another option might be to make blk_abort less aggressive if we cannot
close all the paths, and switch it to more of a drain model. But then we
may be in the same boat in selecting how fast we can drain, based on what
we perceive to be a safe time value. This option leaves the existing races
in scsi_request_fn / scsi_dispatch_cmd open.

-andmike
--
Michael Anderson
andmike@linux.vnet.ibm.com


* Re: [dm-devel] block_abort_queue (blk_abort_request) racing with scsi_request_fn
  2010-11-10 16:30     ` Mike Anderson
@ 2010-11-10 21:16       ` Mike Christie
  2010-11-12 17:54         ` Mike Anderson
  0 siblings, 1 reply; 21+ messages in thread
From: Mike Christie @ 2010-11-10 21:16 UTC (permalink / raw)
  To: Mike Anderson; +Cc: device-mapper development, James Bottomley, linux-scsi

On 11/10/2010 10:30 AM, Mike Anderson wrote:
> Mike Christie<michaelc@cs.wisc.edu>  wrote:
>> On 11/10/2010 01:09 AM, Mike Christie wrote:
>>> On 05/12/2010 12:23 AM, Mike Anderson wrote:
>>>> I was looking at a dump from a weekend run and I believe I am seeing a
>>>> case where blk_abort_request through blk_abort_queue picked up a request
>>>> for timeout that scsi_request_fn decided not to start. This test was
>>>> under
>>>> error injection.
>>>>
>>>> I assume the case in scsi_request_fn this is hitting is that a request
>>>> has
>>>> been put on the timeout_list with blk_start_request and then one of the
>>>> not_ready checks is hit and the request is decided not to be started. I
>>>> believe the drop
>>>>
>>>> It appears that my usage of walking the timeout_list in block_abort_queue
>>>> and using blk_mark_rq_complete in block_abort_request will not work in
>>>> this case.
>>>>
>>>> While it would be good to have way to ensure a command is started, it is
>>>> unclear if even at a low timeout of 1 second that a user other than
>>>> blk_abort_queue would hit this race.
>>>>
>>>> The dropping / acquiring of host_lock and queue_lock in scsi_request_fn
>>>> and scsi_dispatch_cmd make it unclear to me if usage of
>>>> blk_mark_rq_complete will cover all cases.
>>>>
>>>> I looked at checking serial_number in scsi_times_out along with a couple
>>>> blk_mark_rq_complete additions, but unclear if this would be good and
>>>> / or
>>>> work in all cases.
>>>>
>>>> I looked at just accelerating deadline by some default value but unclear
>>>> if that would be acceptable.
>>>>
>>>> I also looked at just using just the mark interface I previously posted
>>>> and not calling blk_abort_request at all, but that would change current
>>>> behavior that has been in use for a while.
>>>>
>>>
>>> Did you ever solve this? I am hitting this with the dm-multipath
>>> blk_abort_queue case (the email I sent you a couple weeks ago).
>>>
>
> No. I am also not seeing it in my recent error injection testing.
>
> Is your test configuration / error injection testing able to reproduce
> fairly reliably. If so can you provide some general details on how you
> are generating this error.

In one test we just run dm-multipath over FC with the default timeouts, 
then disable the target controller. This leads to IO timing out and the 
scsi eh running. We then bring the controller back up, and depending on 
timing that can lead either to the scsi eh failing and IO being failed 
upwards, or to the scsi eh succeeding and IO being retried.

Also a really easy, though probably unrealistic, way to hit something 
similar (more like what you hit in your 1 second timeout case in the 
other mail) is to just set the queue timeout to 0, set the queue_depth 
lower than the number of commands you are sending, and let the IO test 
run on the scsi disk. This sort of replicates the problem because the 
request times out while it is in scsi_request_fn, similar to how 
blk_abort_queue can be called on the cmd while it is in there.


>
>>> It seems we could fix this by just having blk_requeue_request do a check
>>> for if the request timedout similar to what scsi used to do. A hacky way
>>> might be to have 2 requeue functions.
>>>
>>> blk_requeue_completed_request - This is the blk_requeue_request we have
>>> today. It unconditionally requeues the request. It should only be used
>>> if the command has been completed either from blk_complete_request or
>> >from block layer timeout handling
>>> (blk_rq_timed_out_timer->blk_rq_timed_out->rq_timed_out_fn).
>>>
>>> blk_requeue_running_request - This checks if the timer is running before
>>> requeueing the request. If blk_rq_timed_out_timer/blk_rq_timed_out has
>>> taken over the request and is going to handle it then this function just
>>> returns and does not requeue. So for this we could just have it check if
>>> the queue has a rq_timed_out_fn and if rq->timeout_list is empty or not.
>>>
>>> I think this might be confusing to use, so I tried something slightly
>>> different below.
>>>
>>>
>>> I also tried a patch where we just add another req bit. We set it in
>>> blk_rq_timed_out_timer and clear it in a new function that clears it
>>> then calls blk_requeue_reqeust. The new function:
>>>
>>> blk_requeue_timedout_request - used when request is to be requeued if a
>>> LLD q->rq_timed_out_fn returned BLK_EH_NOT_HANDLED and has resolved the
>>> problem and wants the request to be requeued. This function clears
>>> REQ_ATOM_TIMEDOUT and then calls blk_requeue_request.
>>>
>>> blk_reqeuue_request would then check if REQ_ATOM_TIMEDOUT is set and if
>>> it was just drop the request assuming the rq_timed_out_fn was handling
>>> it. This still requires the caller to know how the command is supposed
>>> to be reqeueud. But I think it might be easier since the driver here has
>>> returned BLK_EH_NOT_HANDLED in the q timeout fn so they know that they
>>> are going to be handling the request in a special way.
>>>
>>> I attached the last idea here. Made over Linus's tree. Only compile tested.
>>>
>>
>> Oops, nevermind. I think this is trying to solve a slightly
>> different problem. I saw your other mail. My patch will not handle
>> the case where:
>>
>> 1. cmd is in scsi_reqeust_fn has run blk_start_request and dropped
>> the queue_lock. Has not yet taken the host lock and incremented host
>> busy counters.
>> 2. blk_abort_queue runs q->rq_timed_out_fn and adds cmd to host eh list.
>> 3. Somehow scsi eh runs and is finishing its stuff before #1 has
>> done anything, so the cmd was just processed by scsi eh *and* at the
>> same time is still lingering in scsi_request_fn (somehow #1 has
>> still not taken the host lock).
>>
>
> While #1 could also return with a busy from queuecommand which will call
> scsi_queue_insert with no check for complete. One could add a
> blk_mark_rq_complete check prior to calling scsi_queue_insert. This does
> not cover the not_ready label case in scsi_request_fn.
>

Yeah, I was trying not to have to add blk_mark_rq_complete checks in 
the scsi layer and just have the block layer handle it. I think my patch 
in the previous mail covers both the scsi_request_fn and 
scsi_dispatch_cmd/queuecommand cases. The patch should work as long as 
the scsi eh has not already completed and run blk_requeue_request on the 
request. The problem with my patch is only the case above, where the cmd 
gets blk_abort_queue/timeout run on it and the scsi eh somehow is able to 
complete and run scsi_queue_insert while scsi_request_fn is still trying 
to process the request.

This could be solved by just having the scsi eh thread flush the queue 
before running. This would make sure the cmd is not running in 
scsi_request_fn while the scsi eh is also processing it. We sort of do 
this today with the host_busy checks in scsi_error_handler. The problem 
is the window from the time scsi_request_fn drops the queue_lock until 
it grabs the host_lock to increment host_busy. In that window, 
blk_abort_request can run, the scsi_eh thread can see host_busy == 
host_failed, start the scsi eh, run to completion, and hit the problem 
I described above.




> Another option might be to make blk_abort less aggressive if we cannot
> close all the paths and switch it to more of a drain model, but then we
> may be in the same boat in selecting how fast we can drain based what we
> perceive to be a safe time value. This option leaves the existing races
> open in the scsi_request_fn / scsi_dispatch_cmd.
>
> -andmike
> --
> Michael Anderson
> andmike@linux.vnet.ibm.com



* Re: [dm-devel] block_abort_queue (blk_abort_request) racing with scsi_request_fn
  2010-11-10 21:16       ` Mike Christie
@ 2010-11-12 17:54         ` Mike Anderson
  2010-11-16 21:39           ` Mike Snitzer
  0 siblings, 1 reply; 21+ messages in thread
From: Mike Anderson @ 2010-11-12 17:54 UTC (permalink / raw)
  To: Mike Christie; +Cc: device-mapper development, James Bottomley, linux-scsi

Mike Christie <michaelc@cs.wisc.edu> wrote:
> On 11/10/2010 10:30 AM, Mike Anderson wrote:
> >
> >Is your test configuration / error injection testing able to reproduce
> >fairly reliably. If so can you provide some general details on how you
> >are generating this error.
> 
> In one test we just run dm-mutlipath over FC with the default
> timeouts, then disable the target controller. This leads to IO
> timing out and the scsi eh running. We then bring the controller
> back up and depending on timing that then can lead to either the
> scsi eh failing and IO being failed upwards or the scsi eh
> succeeding and IO retried.
> 
> Also a really easy way, but probably unrealistic, to hit something
> similar (more like what you hit in your 1 second timeout case in the
> other mail) is if you just set the queue timeout to 0, the
> queue_depth lower than the number of commands you are sending and
> let the IO test run on the scsi disk. This sort of replicates the
> problem because the request timesout while it is in the
> scsi_request_fn similar to how the blk_abort_queue can be called on
> the cmd while is in there.
> 

I ran the timeout-set-to-zero test using "modprobe scsi_debug
max_queue=2 virtual_gb=20" along with fio. Without any changes the I/O
locks up within a minute. I applied your patch and the I/O locked up in
about the same amount of time, which based on your previous mail I
think is to be expected. I then reordered scsi_request_fn to do the
host_busy work up front, prior to getting a request and starting it. It
has been running for ~18 hours (although it is slow due to all the
timeouts occurring).

The reordering in scsi_request_fn concerns me, as we are bouncing the
locks more and it is unclear how much impact the reordering has on I/O
rates. I did not attach a patch as it currently does not look very good,
and I think I am missing the handling of a couple of error cases.

> 
> >>>
> >>>I also tried a patch where we just add another req bit. We set it in
> >>>blk_rq_timed_out_timer and clear it in a new function that clears it
> >>>then calls blk_requeue_reqeust. The new function:
> >>>
> >>>blk_requeue_timedout_request - used when request is to be requeued if a
> >>>LLD q->rq_timed_out_fn returned BLK_EH_NOT_HANDLED and has resolved the
> >>>problem and wants the request to be requeued. This function clears
> >>>REQ_ATOM_TIMEDOUT and then calls blk_requeue_request.
> >>>
> >>>blk_reqeuue_request would then check if REQ_ATOM_TIMEDOUT is set and if
> >>>it was just drop the request assuming the rq_timed_out_fn was handling
> >>>it. This still requires the caller to know how the command is supposed
> >>>to be reqeueud. But I think it might be easier since the driver here has
> >>>returned BLK_EH_NOT_HANDLED in the q timeout fn so they know that they
> >>>are going to be handling the request in a special way.
> >>>
> >>>I attached the last idea here. Made over Linus's tree. Only compile tested.
> >>>
> >>
> >>Oops, nevermind. I think this is trying to solve a slightly
> >>different problem. I saw your other mail. My patch will not handle
> >>the case where:
> >>
> >>1. cmd is in scsi_reqeust_fn has run blk_start_request and dropped
> >>the queue_lock. Has not yet taken the host lock and incremented host
> >>busy counters.
> >>2. blk_abort_queue runs q->rq_timed_out_fn and adds cmd to host eh list.
> >>3. Somehow scsi eh runs and is finishing its stuff before #1 has
> >>done anything, so the cmd was just processed by scsi eh *and* at the
> >>same time is still lingering in scsi_request_fn (somehow #1 has
> >>still not taken the host lock).
> >>
> >
> >While #1 could also return with a busy from queuecommand which will call
> >scsi_queue_insert with no check for complete. One could add a
> >blk_mark_rq_complete check prior to calling scsi_queue_insert. This does
> >not cover the not_ready label case in scsi_request_fn.
> >
> 
> Yeah, I was trying to not to have to add blk_mark_rq_complete checks
> in the scsi layer and just have the block layer handle it. I think
> my patch in the previous mail covers both the scsi_request_fn and
> scsi_dispatch_cmd/queuecommand cases. The patch should work as long
> as the scsi eh has not already completed and run blk_requeue_request
> on it. The problem with my patch is only that case above where the
> cmd gets blk_abort_queued/timedout run on it and the scsi eh somehow
> is able to complete and run scsi_queue_insert while scsi_request_fn
> is still trying to process the request.
> 
> This could be solved by just having the scsi eh thread flush the
> queue before running. This would make sure the cmd is not running in
> scsi_request_fn while the scsi the scsi is also processing it. We
> sort of do this today with the host_busy checks in
> scsi_error_handler. The problem is that window from the time
> scsi_request_fn drops the queue_lock and grabs the host_lock to
> increment host_busy. In the window, blk_abort_request can run, the
> scsi_eh thread can see host_busy == host_failed, start the scsi eh,
> run to completion and hit the problem I described above.
> 
> 
> 
> 
> >Another option might be to make blk_abort less aggressive if we cannot
> >close all the paths and switch it to more of a drain model, but then we
> >may be in the same boat in selecting how fast we can drain based what we
> >perceive to be a safe time value. This option leaves the existing races
> >open in the scsi_request_fn / scsi_dispatch_cmd.


With the lock push-down work and other cleanups, I assume some of the
lock / unlock pairs in scsi_request_fn might be reduced, making a
reordering that addresses this issue less of an impact on the I/O path.
This might be incorrect.

As mentioned above, an option might be to provide a floor for the
timeout portion of the abort instead of timing the request out directly.

While this seems like leaving a hole for others to fall into, it might
be the least disruptive option for most users and may have the side
effect of avoiding an eh wakeup in some cases. At the same time, a
safe-in-all-cases value cannot really be selected.

We currently provide some protection for the SG interfaces against
hitting this race by enforcing a timeout floor of BLK_MIN_SG_TIMEOUT.
We also default the sd interface to SD_TIMEOUT, making it unlikely that
a user would hit this case unless they modified the timeout to a low
value, as in the experiment above.

Instead of directly timing out the I/O, we would accelerate the timeout
by a factor. The value could be calculated as a percentage of the queue
timeout value by default, with the option of exposing a sysfs attribute
similar to fast_io_fail_tmo. The attribute could also provide an off
setting, which we do not have today and is my bad that we do not have
(I posted the features patch to multipath but did not follow up, which
would have provided an off). Since the queue is cleared, increasing the
lifetime of the I/O by a small amount should still provide a good
latency reduction. The intent was to reduce high I/O latency during
some failure cases (since we no longer fast-fail timeouts: the
timeout * N retries plus a large number of queued I/Os case). At the
same time, this interface needs to be safe.

Is this pushing-off of the real fix a bad idea to consider, or a
direction worth looking at?

-andmike
--
Michael Anderson
andmike@linux.vnet.ibm.com

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: block_abort_queue (blk_abort_request) racing with scsi_request_fn
  2010-11-12 17:54         ` Mike Anderson
@ 2010-11-16 21:39           ` Mike Snitzer
  2010-11-17 17:49             ` [dm-devel] " Mike Anderson
  0 siblings, 1 reply; 21+ messages in thread
From: Mike Snitzer @ 2010-11-16 21:39 UTC (permalink / raw)
  To: Mike Anderson
  Cc: Mike Christie, James Bottomley, linux-scsi, device-mapper development

Hi Mike,

On Fri, Nov 12 2010 at 12:54pm -0500,
Mike Anderson <andmike@linux.vnet.ibm.com> wrote:

> By not directly timing out the I/O but accelerating the timeout by a
> factor. The value could be calculated as a percentage of the queue timeout
> value for a default with the option of exposing a sysfs attribute
> similar to fast_io_fail_tmo. The attribute could also provide a off
> method which we do not have today and is my bad that we do not have one
> (I posted the features patch to multipath but did not followup which
> would have provided a off).

You're referring to these patches:
https://patchwork.kernel.org/patch/96674/
https://patchwork.kernel.org/patch/96673/

Do you have an interest in pursuing these further?  In the near-term
should we default to off (so introduce MP_FEATURE_ABORT_Q) -- given the
current race which exposes corruption?

Or are you now interested in accelerating the timeout?  I'd need to
review this thread in more detail to give you an opinion.  But I do know
that simply disabling dm-mpath's call to blk_abort_queue() enables some
extensive path failure load testing to _not_ cause the list corruption
that leads to a crash.

Mike

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [dm-devel] block_abort_queue (blk_abort_request) racing with scsi_request_fn
  2010-11-16 21:39           ` Mike Snitzer
@ 2010-11-17 17:49             ` Mike Anderson
  2010-11-17 21:55               ` Mike Snitzer
  0 siblings, 1 reply; 21+ messages in thread
From: Mike Anderson @ 2010-11-17 17:49 UTC (permalink / raw)
  To: device-mapper development; +Cc: Mike Christie, James Bottomley, linux-scsi

Mike Snitzer <snitzer@redhat.com> wrote:
> Hi Mike,
> 
> On Fri, Nov 12 2010 at 12:54pm -0500,
> Mike Anderson <andmike@linux.vnet.ibm.com> wrote:
> 
> > By not directly timing out the I/O but accelerating the timeout by a
> > factor. The value could be calculated as a percentage of the queue timeout
> > value for a default with the option of exposing a sysfs attribute
> > similar to fast_io_fail_tmo. The attribute could also provide a off
> > method which we do not have today and is my bad that we do not have one
> > (I posted the features patch to multipath but did not followup which
> > would have provided a off).
> 
> You're referring to these patches:
> https://patchwork.kernel.org/patch/96674/
> https://patchwork.kernel.org/patch/96673/
> 

Yes these are the patches that I was referring to.

> Do you have an interest in pursuing these further? 

Yes.

> In the near-term
> should we default to off (so introduce MP_FEATURE_ABORT_Q) -- given the
> current race which exposes corruption?
> 

Given the current race exposure default to off might be the best choice.

> Or are you now interested in accelerating the timeout?  I'd need to
> review this thread in more detail to give you an opinion.  But I do know
> that simply disabling dm-mpath's call to blk_abort_queue() enables some
> extensive path failure load testing to _not_ cause the list corruption
> that leads to a crash.

I think the on/off control, plus a fix to address the issue when it is
on, would be good. Since I do not believe we want to impact the normal
I/O path with more lock bouncing, modifying the blk_abort_queue
function appeared to be one of the least disruptive options. There
might be others.

-andmike
--
Michael Anderson
andmike@linux.vnet.ibm.com

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: block_abort_queue (blk_abort_request) racing with scsi_request_fn
  2010-11-17 17:49             ` [dm-devel] " Mike Anderson
@ 2010-11-17 21:55               ` Mike Snitzer
  2010-11-18  4:40                 ` [PATCH v2] dm mpath: add feature flag to control call to blk_abort_queue Mike Snitzer
  0 siblings, 1 reply; 21+ messages in thread
From: Mike Snitzer @ 2010-11-17 21:55 UTC (permalink / raw)
  To: Mike Anderson
  Cc: Mike Christie, dm-devel@redhat.com, James Bottomley, linux-scsi

On Wed, Nov 17 2010 at 12:49pm -0500,
Mike Anderson <andmike@linux.vnet.ibm.com> wrote:

> Mike Snitzer <snitzer@redhat.com> wrote:
> > Hi Mike,
> > 
> > On Fri, Nov 12 2010 at 12:54pm -0500,
> > Mike Anderson <andmike@linux.vnet.ibm.com> wrote:
> > 
> > > By not directly timing out the I/O but accelerating the timeout by a
> > > factor. The value could be calculated as a percentage of the queue timeout
> > > value for a default with the option of exposing a sysfs attribute
> > > similar to fast_io_fail_tmo. The attribute could also provide a off
> > > method which we do not have today and is my bad that we do not have one
> > > (I posted the features patch to multipath but did not followup which
> > > would have provided a off).
> > 
> > You're referring to these patches:
> > https://patchwork.kernel.org/patch/96674/
> > https://patchwork.kernel.org/patch/96673/
> > 
> 
> Yes these are the patches that I was referring to.
> 
> > Do you have an interest in pursuing these further? 
> 
> Yes.
> 
> > In the near-term
> > should we default to off (so introduce MP_FEATURE_ABORT_Q) -- given the
> > current race which exposes corruption?
> > 
> 
> Given the current race exposure default to off might be the best choice.

OK, I can work to refresh these patches, invert the logic to default to
off, and repost.  But in addition I'll post a 3rd patch that disallows
anything but off.

> > Or are you now interested in accelerating the timeout?  I'd need to
> > review this thread in more detail to give you an opinion.  But I do know
> > that simply disabling dm-mpath's call to blk_abort_queue() enables some
> > extensive path failure load testing to _not_ cause the list corruption
> > that leads to a crash.
> 
> I think the on/off control plus a fix to address the issue when it is on
> would be good. Since I do not believe we want the impact the normal IO
> path by more lock bouncing adding modification of the blk_abort_queue
> function appeared like one of the least distributive options. There might
> be others.

OK, I'll defer to you (and/or Mike C) to propose that additional fix
to allow us to safely enable the feature.  As part of that patch you'd
revert the small change from my 3rd patch that disallows anything but
off?

Could be that we won't need my 3rd patch -- if your additional fix for
the race can be developed and tested quickly.  But "quickly" is all
relative, the comprehensive load testing I've done is successful if it
lasts ~50 hours without crashing.

Thanks,
Mike

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH v2] dm mpath: add feature flag to control call to blk_abort_queue
  2010-11-17 21:55               ` Mike Snitzer
@ 2010-11-18  4:40                 ` Mike Snitzer
  2010-11-18  7:20                   ` Mike Anderson
  0 siblings, 1 reply; 21+ messages in thread
From: Mike Snitzer @ 2010-11-18  4:40 UTC (permalink / raw)
  To: dm-devel; +Cc: linux-scsi, Mike Anderson, Mike Christie

Add 'features' member to 'struct multipath' and introduce
MPF_ABORT_QUEUE as the first feature flag.  If "abort_queue_on_failure"
is provided during mpath device configuration the MPF_ABORT_QUEUE
feature flag will be set and blk_abort_queue() will be called during
path failure to allow for lower latency path failures.

But this feature has been disabled by default due to a race (between
blk_abort_queue and scsi_request_fn) that can lead to list corruption,
from: https://www.redhat.com/archives/dm-devel/2010-November/msg00085.html

"the cmd gets blk_abort_queued/timedout run on it and the scsi eh
somehow is able to complete and run scsi_queue_insert while
scsi_request_fn is still trying to process the request."

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: Mike Anderson <andmike@linux.vnet.ibm.com>
Cc: Mike Christie <michaelc@cs.wisc.edu>
---
 drivers/md/dm-mpath.c |   19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

Index: linux-2.6/drivers/md/dm-mpath.c
===================================================================
--- linux-2.6.orig/drivers/md/dm-mpath.c
+++ linux-2.6/drivers/md/dm-mpath.c
@@ -56,8 +56,14 @@ struct priority_group {
 	struct list_head pgpaths;
 };
 
+/*
+ * Bits for the m->features
+ */
+#define MPF_ABORT_QUEUE 0
+
 /* Multipath context */
 struct multipath {
+	unsigned long features;
 	struct list_head list;
 	struct dm_target *ti;
 
@@ -146,7 +152,8 @@ static void deactivate_path(struct work_
 	struct pgpath *pgpath =
 		container_of(work, struct pgpath, deactivate_path);
 
-	blk_abort_queue(pgpath->path.dev->bdev->bd_disk->queue);
+	if (test_bit(MPF_ABORT_QUEUE, &pgpath->pg->m->features))
+		blk_abort_queue(pgpath->path.dev->bdev->bd_disk->queue);
 }
 
 static struct priority_group *alloc_priority_group(void)
@@ -813,6 +820,11 @@ static int parse_features(struct arg_set
 			continue;
 		}
 
+		if (!strnicmp(param_name, MESG_STR("abort_queue_on_failure"))) {
+			set_bit(MPF_ABORT_QUEUE, &m->features);
+			continue;
+		}
+
 		if (!strnicmp(param_name, MESG_STR("pg_init_retries")) &&
 		    (argc >= 1)) {
 			r = read_param(_params + 1, shift(as),
@@ -1382,11 +1394,14 @@ static int multipath_status(struct dm_ta
 		DMEMIT("2 %u %u ", m->queue_size, m->pg_init_count);
 	else {
 		DMEMIT("%u ", m->queue_if_no_path +
-			      (m->pg_init_retries > 0) * 2);
+			      (m->pg_init_retries > 0) * 2 +
+			      test_bit(MPF_ABORT_QUEUE, &m->features));
 		if (m->queue_if_no_path)
 			DMEMIT("queue_if_no_path ");
 		if (m->pg_init_retries)
 			DMEMIT("pg_init_retries %u ", m->pg_init_retries);
+		if (test_bit(MPF_ABORT_QUEUE, &m->features))
+			DMEMIT("abort_queue_on_failure ");
 	}
 
 	if (!m->hw_handler_name || type == STATUSTYPE_INFO)

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2] dm mpath: add feature flag to control call to blk_abort_queue
  2010-11-18  4:40                 ` [PATCH v2] dm mpath: add feature flag to control call to blk_abort_queue Mike Snitzer
@ 2010-11-18  7:20                   ` Mike Anderson
  2010-11-18 15:48                     ` Mike Snitzer
  2010-11-18 15:48                     ` [PATCH v3] " Mike Snitzer
  0 siblings, 2 replies; 21+ messages in thread
From: Mike Anderson @ 2010-11-18  7:20 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: dm-devel, linux-scsi, Mike Christie

Mike S,

Thanks for doing the refresh / update of the patch.
I just did a quick test and did not see any issues but did have a couple
of questions.

Two questions:

1.) Should we bump the multipath targets version for this change?

2.) A general question on the length of the feature name
"abort_queue_on_failure": while the descriptive name is nice, I noticed
that if I have two features the multipath output line starts wrapping.
I guess we could make the feature name shorter, but eventually, if we
added more features, the line would wrap anyway, so a shorter name
would just postpone the wrapping.

(I had an old experimental patch where I was trying to use the feature
flags to control the selection of fast-fail flags and whether full
recovery was done on the I/O. The output was kind of messy, but that is
to be expected with this type of text interface vs. an attribute-based
alternative.)

Thanks,

-andmike
--
Michael Anderson
andmike@linux.vnet.ibm.com

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2] dm mpath: add feature flag to control call to blk_abort_queue
  2010-11-18  7:20                   ` Mike Anderson
@ 2010-11-18 15:48                     ` Mike Snitzer
  2010-11-18 15:48                     ` [PATCH v3] " Mike Snitzer
  1 sibling, 0 replies; 21+ messages in thread
From: Mike Snitzer @ 2010-11-18 15:48 UTC (permalink / raw)
  To: Mike Anderson; +Cc: dm-devel, linux-scsi, Mike Christie

On Thu, Nov 18 2010 at  2:20am -0500,
Mike Anderson <andmike@linux.vnet.ibm.com> wrote:

> Mike S,
> 
> Thanks for doing the refresh / update of the patch.
> I just did a quick test and did not see any issues but did have a couple
> of questions.
> 
> Two questions:
> 
> 1.) Should we bump the multipath targets version for this change?

Yes, not doing so was an oversight.  I'll bump the version and post v3.

> 2.) A general question on the length of the feature name
> "abort_queue_on_failure" while the descriptive name is nice I noticed if
> I have two features that the multipath output line starts wrapping. I
> guess we could make the feature name shorter, but eventually if we added
> more features the line would eventually wrap so a shorted name will just
> stop wrapping now.

I'm open to other suggestions for the feature name.

I agree that we don't want feature names to get too long.  But they do
need to be descriptive.  So we need to have some balance.  I could've
used "abort_q_on_failure" but I went for "queue" to maintain symmetry
with "queue_if_no_path".  Similarly, abbreviating "failure" to "fail"
seemed like it didn't buy much (less clear?). *shrug* Maybe
"abort_queue_on_fail" offers a better balance?

As for the wrapping, I don't think there is anything we can do to avoid
it (given the current interface).

Thanks,
Mike

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH v3] dm mpath: add feature flag to control call to blk_abort_queue
  2010-11-18  7:20                   ` Mike Anderson
  2010-11-18 15:48                     ` Mike Snitzer
@ 2010-11-18 15:48                     ` Mike Snitzer
  2010-11-18 19:16                       ` (unknown), Mike Snitzer
  2010-11-18 19:19                       ` [PATCH v4] dm mpath: avoid call to blk_abort_queue by default Mike Snitzer
  1 sibling, 2 replies; 21+ messages in thread
From: Mike Snitzer @ 2010-11-18 15:48 UTC (permalink / raw)
  To: dm-devel; +Cc: linux-scsi, Mike Anderson, Mike Christie

Add 'features' member to 'struct multipath' and introduce
MPF_ABORT_QUEUE as the first feature flag.  If "abort_queue_on_fail"
is provided during mpath device configuration the MPF_ABORT_QUEUE
feature flag will be set and blk_abort_queue() will be called during
path failure to allow for lower latency path failures.

But this feature has been disabled by default due to a race (between
blk_abort_queue and scsi_request_fn) that can lead to list corruption,
from: https://www.redhat.com/archives/dm-devel/2010-November/msg00085.html

"the cmd gets blk_abort_queued/timedout run on it and the scsi eh
somehow is able to complete and run scsi_queue_insert while
scsi_request_fn is still trying to process the request."

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: Mike Anderson <andmike@linux.vnet.ibm.com>
Cc: Mike Christie <michaelc@cs.wisc.edu>
---
 drivers/md/dm-mpath.c |   21 ++++++++++++++++++---
 1 file changed, 18 insertions(+), 3 deletions(-)

Index: linux-2.6/drivers/md/dm-mpath.c
===================================================================
--- linux-2.6.orig/drivers/md/dm-mpath.c
+++ linux-2.6/drivers/md/dm-mpath.c
@@ -56,8 +56,14 @@ struct priority_group {
 	struct list_head pgpaths;
 };
 
+/*
+ * Bits for the m->features
+ */
+#define MPF_ABORT_QUEUE 0
+
 /* Multipath context */
 struct multipath {
+	unsigned long features;
 	struct list_head list;
 	struct dm_target *ti;
 
@@ -146,7 +152,8 @@ static void deactivate_path(struct work_
 	struct pgpath *pgpath =
 		container_of(work, struct pgpath, deactivate_path);
 
-	blk_abort_queue(pgpath->path.dev->bdev->bd_disk->queue);
+	if (test_bit(MPF_ABORT_QUEUE, &pgpath->pg->m->features))
+		blk_abort_queue(pgpath->path.dev->bdev->bd_disk->queue);
 }
 
 static struct priority_group *alloc_priority_group(void)
@@ -813,6 +820,11 @@ static int parse_features(struct arg_set
 			continue;
 		}
 
+		if (!strnicmp(param_name, MESG_STR("abort_queue_on_fail"))) {
+			set_bit(MPF_ABORT_QUEUE, &m->features);
+			continue;
+		}
+
 		if (!strnicmp(param_name, MESG_STR("pg_init_retries")) &&
 		    (argc >= 1)) {
 			r = read_param(_params + 1, shift(as),
@@ -1382,11 +1394,14 @@ static int multipath_status(struct dm_ta
 		DMEMIT("2 %u %u ", m->queue_size, m->pg_init_count);
 	else {
 		DMEMIT("%u ", m->queue_if_no_path +
-			      (m->pg_init_retries > 0) * 2);
+			      (m->pg_init_retries > 0) * 2 +
+			      test_bit(MPF_ABORT_QUEUE, &m->features));
 		if (m->queue_if_no_path)
 			DMEMIT("queue_if_no_path ");
 		if (m->pg_init_retries)
 			DMEMIT("pg_init_retries %u ", m->pg_init_retries);
+		if (test_bit(MPF_ABORT_QUEUE, &m->features))
+			DMEMIT("abort_queue_on_fail ");
 	}
 
 	if (!m->hw_handler_name || type == STATUSTYPE_INFO)
@@ -1655,7 +1670,7 @@ out:
  *---------------------------------------------------------------*/
 static struct target_type multipath_target = {
 	.name = "multipath",
-	.version = {1, 1, 1},
+	.version = {1, 2, 0},
 	.module = THIS_MODULE,
 	.ctr = multipath_ctr,
 	.dtr = multipath_dtr,

^ permalink raw reply	[flat|nested] 21+ messages in thread

* (unknown), 
  2010-11-18 15:48                     ` [PATCH v3] " Mike Snitzer
@ 2010-11-18 19:16                       ` Mike Snitzer
  2010-11-18 19:21                         ` Mike Snitzer
  2010-11-18 19:19                       ` [PATCH v4] dm mpath: avoid call to blk_abort_queue by default Mike Snitzer
  1 sibling, 1 reply; 21+ messages in thread
From: Mike Snitzer @ 2010-11-18 19:16 UTC (permalink / raw)
  To: dm-devel; +Cc: linux-scsi

diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
index 487ecda..723dc19 100644
--- a/drivers/md/dm-mpath.c
+++ b/drivers/md/dm-mpath.c
@@ -56,8 +56,14 @@ struct priority_group {
 	struct list_head pgpaths;
 };
 
+/*
+ * Bits for the m->features
+ */
+#define MPF_ABORT_QUEUE 0
+
 /* Multipath context */
 struct multipath {
+	unsigned long features;
 	struct list_head list;
 	struct dm_target *ti;
 
@@ -813,6 +819,11 @@ static int parse_features(struct arg_set *as, struct multipath *m)
 			continue;
 		}
 
+		if (!strnicmp(param_name, MESG_STR("abort_queue_on_fail"))) {
+			set_bit(MPF_ABORT_QUEUE, &m->features);
+			continue;
+		}
+
 		if (!strnicmp(param_name, MESG_STR("pg_init_retries")) &&
 		    (argc >= 1)) {
 			r = read_param(_params + 1, shift(as),
@@ -995,7 +1006,9 @@ static int fail_path(struct pgpath *pgpath)
 		      pgpath->path.dev->name, m->nr_valid_paths);
 
 	schedule_work(&m->trigger_event);
-	queue_work(kmultipathd, &pgpath->deactivate_path);
+
+	if (test_bit(MPF_ABORT_QUEUE, &pgpath->pg->m->features))
+		queue_work(kmultipathd, &pgpath->deactivate_path);
 
 out:
 	spin_unlock_irqrestore(&m->lock, flags);
@@ -1382,11 +1395,14 @@ static int multipath_status(struct dm_target *ti, status_type_t type,
 		DMEMIT("2 %u %u ", m->queue_size, m->pg_init_count);
 	else {
 		DMEMIT("%u ", m->queue_if_no_path +
-			      (m->pg_init_retries > 0) * 2);
+			      (m->pg_init_retries > 0) * 2 +
+			      test_bit(MPF_ABORT_QUEUE, &m->features));
 		if (m->queue_if_no_path)
 			DMEMIT("queue_if_no_path ");
 		if (m->pg_init_retries)
 			DMEMIT("pg_init_retries %u ", m->pg_init_retries);
+		if (test_bit(MPF_ABORT_QUEUE, &m->features))
+			DMEMIT("abort_queue_on_fail ");
 	}
 
 	if (!m->hw_handler_name || type == STATUSTYPE_INFO)
@@ -1490,6 +1506,10 @@ static int multipath_message(struct dm_target *ti, unsigned argc, char **argv)
 		} else if (!strnicmp(argv[0], MESG_STR("fail_if_no_path"))) {
 			r = queue_if_no_path(m, 0, 0);
 			goto out;
+		} else if (!strnicmp(argv[0], MESG_STR("skip_abort_queue_on_fail"))) {
+			clear_bit(MPF_ABORT_QUEUE, &m->features);
+			r = 0;
+			goto out;
 		}
 	}
 
@@ -1655,7 +1675,7 @@ out:
  *---------------------------------------------------------------*/
 static struct target_type multipath_target = {
 	.name = "multipath",
-	.version = {1, 1, 1},
+	.version = {1, 2, 0},
 	.module = THIS_MODULE,
 	.ctr = multipath_ctr,
 	.dtr = multipath_dtr,

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v4] dm mpath: avoid call to blk_abort_queue by default
  2010-11-18 15:48                     ` [PATCH v3] " Mike Snitzer
  2010-11-18 19:16                       ` (unknown), Mike Snitzer
@ 2010-11-18 19:19                       ` Mike Snitzer
  2010-11-18 20:07                         ` [PATCH v5] " Mike Snitzer
  1 sibling, 1 reply; 21+ messages in thread
From: Mike Snitzer @ 2010-11-18 19:19 UTC (permalink / raw)
  To: dm-devel; +Cc: linux-scsi, Mike Anderson, Mike Christie

Multipath was previously made to use blk_abort_queue() to allow for
lower latency path deactivation (commit 224cb3e9).  The call to
blk_abort_queue has proven to be unsafe, and is now disabled by default,
due to a race (between blk_abort_queue and scsi_request_fn) that can
lead to list corruption, from:
https://www.redhat.com/archives/dm-devel/2010-November/msg00085.html

"the cmd gets blk_abort_queued/timedout run on it and the scsi eh
somehow is able to complete and run scsi_queue_insert while
scsi_request_fn is still trying to process the request."

It is expected that this race will be fixed in the near-term so it makes
little sense to remove all associated code.  Providing control over the
call to blk_abort_queue() facilitates continued testing and future
flexibility to opt-in to lower latency path deactivation.  Opting to
enable this feature will emit a warning for the time being.

Add 'features' member to 'struct multipath' and introduce
MPF_ABORT_QUEUE as the first feature flag.  If "abort_queue_on_fail"
is provided during mpath device configuration the MPF_ABORT_QUEUE
feature flag will be set and blk_abort_queue() will be called during
path deactivation.  The MPF_ABORT_QUEUE feature flag may be cleared
using the "skip_abort_queue_on_fail" message.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: Mike Anderson <andmike@linux.vnet.ibm.com>
Cc: Mike Christie <michaelc@cs.wisc.edu>
---
 drivers/md/dm-mpath.c |   27 ++++++++++++++++++++++++---
 1 files changed, 24 insertions(+), 3 deletions(-)

v4: - revised the patch subject and header to emphasize default to off
    - fixed missing m->lock locking when using test_bit() on m->features
      (MPF_ABORT_QUEUE is checked early in fail_path() to avoid queuing
      unnecessary work).
    - also added ability to clear MPF_ABORT_QUEUE feature flag via
      "skip_abort_queue_on_fail" message.

v3: bumped target version and switched feature name from
    "abort_queue_on_failure" to "abort_queue_on_fail".

diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
index 487ecda..ecec1af 100644
--- a/drivers/md/dm-mpath.c
+++ b/drivers/md/dm-mpath.c
@@ -56,8 +56,14 @@ struct priority_group {
 	struct list_head pgpaths;
 };
 
+/*
+ * Bits for the m->features
+ */
+#define MPF_ABORT_QUEUE 0
+
 /* Multipath context */
 struct multipath {
+	unsigned long features;
 	struct list_head list;
 	struct dm_target *ti;
 
@@ -813,6 +819,12 @@ static int parse_features(struct arg_set *as, struct multipath *m)
 			continue;
 		}
 
+		if (!strnicmp(param_name, MESG_STR("abort_queue_on_fail"))) {
+			set_bit(MPF_ABORT_QUEUE, &m->features);
+			DMWARN("Enabling use of blk_abort_queue is unsafe.");
+			continue;
+		}
+
 		if (!strnicmp(param_name, MESG_STR("pg_init_retries")) &&
 		    (argc >= 1)) {
 			r = read_param(_params + 1, shift(as),
@@ -995,7 +1007,9 @@ static int fail_path(struct pgpath *pgpath)
 		      pgpath->path.dev->name, m->nr_valid_paths);
 
 	schedule_work(&m->trigger_event);
-	queue_work(kmultipathd, &pgpath->deactivate_path);
+
+	if (test_bit(MPF_ABORT_QUEUE, &pgpath->pg->m->features))
+		queue_work(kmultipathd, &pgpath->deactivate_path);
 
 out:
 	spin_unlock_irqrestore(&m->lock, flags);
@@ -1382,11 +1396,14 @@ static int multipath_status(struct dm_target *ti, status_type_t type,
 		DMEMIT("2 %u %u ", m->queue_size, m->pg_init_count);
 	else {
 		DMEMIT("%u ", m->queue_if_no_path +
-			      (m->pg_init_retries > 0) * 2);
+			      (m->pg_init_retries > 0) * 2 +
+			      test_bit(MPF_ABORT_QUEUE, &m->features));
 		if (m->queue_if_no_path)
 			DMEMIT("queue_if_no_path ");
 		if (m->pg_init_retries)
 			DMEMIT("pg_init_retries %u ", m->pg_init_retries);
+		if (test_bit(MPF_ABORT_QUEUE, &m->features))
+			DMEMIT("abort_queue_on_fail ");
 	}
 
 	if (!m->hw_handler_name || type == STATUSTYPE_INFO)
@@ -1490,6 +1507,10 @@ static int multipath_message(struct dm_target *ti, unsigned argc, char **argv)
 		} else if (!strnicmp(argv[0], MESG_STR("fail_if_no_path"))) {
 			r = queue_if_no_path(m, 0, 0);
 			goto out;
+		} else if (!strnicmp(argv[0], MESG_STR("skip_abort_queue_on_fail"))) {
+			clear_bit(MPF_ABORT_QUEUE, &m->features);
+			r = 0;
+			goto out;
 		}
 	}
 
@@ -1655,7 +1676,7 @@ out:
  *---------------------------------------------------------------*/
 static struct target_type multipath_target = {
 	.name = "multipath",
-	.version = {1, 1, 1},
+	.version = {1, 2, 0},
 	.module = THIS_MODULE,
 	.ctr = multipath_ctr,
 	.dtr = multipath_dtr,

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re:
  2010-11-18 19:16                       ` (unknown), Mike Snitzer
@ 2010-11-18 19:21                         ` Mike Snitzer
  0 siblings, 0 replies; 21+ messages in thread
From: Mike Snitzer @ 2010-11-18 19:21 UTC (permalink / raw)
  To: dm-devel; +Cc: linux-scsi

please ignore this patch...

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH v5] dm mpath: avoid call to blk_abort_queue by default
  2010-11-18 19:19                       ` [PATCH v4] dm mpath: avoid call to blk_abort_queue by default Mike Snitzer
@ 2010-11-18 20:07                         ` Mike Snitzer
  2010-11-18 20:18                           ` [dm-devel] " Alasdair G Kergon
  0 siblings, 1 reply; 21+ messages in thread
From: Mike Snitzer @ 2010-11-18 20:07 UTC (permalink / raw)
  To: dm-devel; +Cc: linux-scsi, Mike Anderson, Mike Christie

Multipath was previously made to use blk_abort_queue() to allow for
lower latency path deactivation (commit 224cb3e9).  The call to
blk_abort_queue has proven to be unsafe, and is now disabled by default,
due to a race (between blk_abort_queue and scsi_request_fn) that can
lead to list corruption, from:
https://www.redhat.com/archives/dm-devel/2010-November/msg00085.html

"the cmd gets blk_abort_queued/timedout run on it and the scsi eh
somehow is able to complete and run scsi_queue_insert while
scsi_request_fn is still trying to process the request."

It is expected that this race will be fixed in the near-term, so it makes
little sense to remove all associated code.  Providing control over the
call to blk_abort_queue() facilitates continued testing and future
flexibility to opt-in to lower latency path deactivation.  Opting to
enable this feature will emit a warning for the time being.

Add 'features' member to 'struct multipath' and introduce
MPF_ABORT_QUEUE as the first feature flag.  If "abort_queue_on_fail"
is provided, via message or during mpath device configuration, the
MPF_ABORT_QUEUE feature flag will be set and blk_abort_queue() will be
called during path deactivation.  The MPF_ABORT_QUEUE feature flag may
be cleared using the "skip_abort_queue_on_fail" message.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: Mike Anderson <andmike@linux.vnet.ibm.com>
Cc: Mike Christie <michaelc@cs.wisc.edu>
---
 drivers/md/dm-mpath.c |   46 +++++++++++++++++++++++++++++++++++++++++++---
 1 files changed, 43 insertions(+), 3 deletions(-)

v5: - introduced abort_queue_on_fail() to perform m->lock locking around
      both set_bit and clear_bit
    - also added ability to set MPF_ABORT_QUEUE feature flag via
      "abort_queue_on_fail" message.

v4: - revised the patch subject and header to emphasize default to off
    - fixed missing m->lock locking when using test_bit() on m->features
      (MPF_ABORT_QUEUE is checked early in fail_path() to avoid queuing
      unnecessary work).
    - also added ability to clear MPF_ABORT_QUEUE feature flag via
      "skip_abort_queue_on_fail" message.

v3: bumped target version and switched feature name from
    "abort_queue_on_failure" to "abort_queue_on_fail".

diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
index 487ecda..a6f11c3 100644
--- a/drivers/md/dm-mpath.c
+++ b/drivers/md/dm-mpath.c
@@ -56,8 +56,14 @@ struct priority_group {
 	struct list_head pgpaths;
 };
 
+/*
+ * Bits for the m->features
+ */
+#define MPF_ABORT_QUEUE 0
+
 /* Multipath context */
 struct multipath {
+	unsigned long features;
 	struct list_head list;
 	struct dm_target *ti;
 
@@ -414,6 +420,24 @@ static int queue_if_no_path(struct multipath *m, unsigned queue_if_no_path,
 	return 0;
 }
 
+static int abort_queue_on_fail(struct multipath *m, unsigned abort_queue_on_fail)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&m->lock, flags);
+
+	if (abort_queue_on_fail) {
+		set_bit(MPF_ABORT_QUEUE, &m->features);
+		DMWARN("Enabling use of blk_abort_queue is unsafe.");
+	} else
+		clear_bit(MPF_ABORT_QUEUE, &m->features);
+
+	spin_unlock_irqrestore(&m->lock, flags);
+
+	return 0;
+}
+
+
 /*-----------------------------------------------------------------
  * The multipath daemon is responsible for resubmitting queued ios.
  *---------------------------------------------------------------*/
@@ -813,6 +837,11 @@ static int parse_features(struct arg_set *as, struct multipath *m)
 			continue;
 		}
 
+		if (!strnicmp(param_name, MESG_STR("abort_queue_on_fail"))) {
+			r = abort_queue_on_fail(m, 1);
+			continue;
+		}
+
 		if (!strnicmp(param_name, MESG_STR("pg_init_retries")) &&
 		    (argc >= 1)) {
 			r = read_param(_params + 1, shift(as),
@@ -995,7 +1024,9 @@ static int fail_path(struct pgpath *pgpath)
 		      pgpath->path.dev->name, m->nr_valid_paths);
 
 	schedule_work(&m->trigger_event);
-	queue_work(kmultipathd, &pgpath->deactivate_path);
+
+	if (test_bit(MPF_ABORT_QUEUE, &pgpath->pg->m->features))
+		queue_work(kmultipathd, &pgpath->deactivate_path);
 
 out:
 	spin_unlock_irqrestore(&m->lock, flags);
@@ -1382,11 +1413,14 @@ static int multipath_status(struct dm_target *ti, status_type_t type,
 		DMEMIT("2 %u %u ", m->queue_size, m->pg_init_count);
 	else {
 		DMEMIT("%u ", m->queue_if_no_path +
-			      (m->pg_init_retries > 0) * 2);
+			      (m->pg_init_retries > 0) * 2 +
+			      test_bit(MPF_ABORT_QUEUE, &m->features));
 		if (m->queue_if_no_path)
 			DMEMIT("queue_if_no_path ");
 		if (m->pg_init_retries)
 			DMEMIT("pg_init_retries %u ", m->pg_init_retries);
+		if (test_bit(MPF_ABORT_QUEUE, &m->features))
+			DMEMIT("abort_queue_on_fail ");
 	}
 
 	if (!m->hw_handler_name || type == STATUSTYPE_INFO)
@@ -1490,6 +1524,12 @@ static int multipath_message(struct dm_target *ti, unsigned argc, char **argv)
 		} else if (!strnicmp(argv[0], MESG_STR("fail_if_no_path"))) {
 			r = queue_if_no_path(m, 0, 0);
 			goto out;
+		} else if (!strnicmp(argv[0], MESG_STR("abort_queue_on_fail"))) {
+			r = abort_queue_on_fail(m, 1);
+			goto out;
+		} else if (!strnicmp(argv[0], MESG_STR("skip_abort_queue_on_fail"))) {
+			r = abort_queue_on_fail(m, 0);
+			goto out;
 		}
 	}
 
@@ -1655,7 +1695,7 @@ out:
  *---------------------------------------------------------------*/
 static struct target_type multipath_target = {
 	.name = "multipath",
-	.version = {1, 1, 1},
+	.version = {1, 2, 0},
 	.module = THIS_MODULE,
 	.ctr = multipath_ctr,
 	.dtr = multipath_dtr,

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [dm-devel] [PATCH v5] dm mpath: avoid call to blk_abort_queue by default
  2010-11-18 20:07                         ` [PATCH v5] " Mike Snitzer
@ 2010-11-18 20:18                           ` Alasdair G Kergon
  2010-11-18 20:39                             ` Mike Anderson
  2010-11-18 21:48                             ` [PATCH] dm mpath: disable call to blk_abort_queue and related code Mike Snitzer
  0 siblings, 2 replies; 21+ messages in thread
From: Alasdair G Kergon @ 2010-11-18 20:18 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: dm-devel, Mike Christie, Mike Anderson, linux-scsi

On Thu, Nov 18, 2010 at 03:07:33PM -0500, Mike Snitzer wrote:
> It is expected that this race will be fixed in the near-term, so it makes
> little sense to remove all associated code.  Providing control over the
> call to blk_abort_queue() facilitates continued testing and future
> flexibility to opt-in to lower latency path deactivation.  Opting to
> enable this feature will emit a warning for the time being.
 
Ah, but then the *feature* keyword - which is only going to be needed
temporarily until it's fixed - becomes a permanent part of the interface...

If it's broken, let's just disable it (#define-style) until it gets fixed.
(And only if it doesn't get fixed in a reasonable amount of time, remove the
code.)

Alasdair


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [dm-devel] [PATCH v5] dm mpath: avoid call to blk_abort_queue by default
  2010-11-18 20:18                           ` [dm-devel] " Alasdair G Kergon
@ 2010-11-18 20:39                             ` Mike Anderson
  2010-11-18 21:48                             ` [PATCH] dm mpath: disable call to blk_abort_queue and related code Mike Snitzer
  1 sibling, 0 replies; 21+ messages in thread
From: Mike Anderson @ 2010-11-18 20:39 UTC (permalink / raw)
  To: Mike Snitzer, dm-devel, Mike Christie, linux-scsi

Alasdair G Kergon <agk@redhat.com> wrote:
> On Thu, Nov 18, 2010 at 03:07:33PM -0500, Mike Snitzer wrote:
> > It is expected that this race will be fixed in the near-term, so it makes
> > little sense to remove all associated code.  Providing control over the
> > call to blk_abort_queue() facilitates continued testing and future
> > flexibility to opt-in to lower latency path deactivation.  Opting to
> > enable this feature will emit a warning for the time being.
> 
> Ah, but then the *feature* keyword - which is only going to be needed
> temporarily until it's fixed - becomes a permanent part of the interface...
> 
> If it's broken, let's just disable it (#define-style) until it gets fixed.
> (And only if it doesn't get fixed in a reasonable amount of time, remove the
> code.)

I understand that if it's broken we could #define it out in dm, or we
could #define the code in blk_abort_queue.

I believe the thought was that, once the race issue is addressed, others
may still want control, as there may be cases where waking up the error
handler and issuing an extra abort is not wanted.

-andmike
--
Michael Anderson
andmike@linux.vnet.ibm.com

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH] dm mpath: disable call to blk_abort_queue and related code
  2010-11-18 20:18                           ` [dm-devel] " Alasdair G Kergon
  2010-11-18 20:39                             ` Mike Anderson
@ 2010-11-18 21:48                             ` Mike Snitzer
  2010-11-23  1:00                               ` [PATCH v2] dm mpath: revert "dm: Call blk_abort_queue on failed paths" Mike Snitzer
  1 sibling, 1 reply; 21+ messages in thread
From: Mike Snitzer @ 2010-11-18 21:48 UTC (permalink / raw)
  To: dm-devel; +Cc: Mike Christie, Mike Anderson

Multipath was previously made to use blk_abort_queue() to allow for
lower latency path deactivation (commit 224cb3e9).  The call to
blk_abort_queue has proven to be unsafe due to a race (between
blk_abort_queue and scsi_request_fn) that can lead to list corruption,
from: https://www.redhat.com/archives/dm-devel/2010-November/msg00085.html

"the cmd gets blk_abort_queued/timedout run on it and the scsi eh
somehow is able to complete and run scsi_queue_insert while
scsi_request_fn is still trying to process the request."

It is hoped that this race will be fixed soon so it makes little sense
to remove all associated code at this point.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: Mike Anderson <andmike@linux.vnet.ibm.com>
Cc: Mike Christie <michaelc@cs.wisc.edu>
---
 drivers/md/dm-mpath.c |   13 +++++++++++++
 1 files changed, 13 insertions(+), 0 deletions(-)

diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
index 487ecda..6292e41 100644
--- a/drivers/md/dm-mpath.c
+++ b/drivers/md/dm-mpath.c
@@ -24,6 +24,9 @@
 #define DM_MSG_PREFIX "multipath"
 #define MESG_STR(x) x, sizeof(x)
 
+/* Avoid calling blk_abort_queue until race with scsi_request_fn is fixed */
+#define MPATH_CALL_BLK_ABORT_QUEUE 0
+
 /* Path properties */
 struct pgpath {
 	struct list_head list;
@@ -33,7 +36,9 @@ struct pgpath {
 	unsigned fail_count;		/* Cumulative failure count */
 
 	struct dm_path path;
+#if MPATH_CALL_BLK_ABORT_QUEUE
 	struct work_struct deactivate_path;
+#endif
 	struct work_struct activate_path;
 };
 
@@ -116,7 +121,9 @@ static struct workqueue_struct *kmultipathd, *kmpath_handlerd;
 static void process_queued_ios(struct work_struct *work);
 static void trigger_event(struct work_struct *work);
 static void activate_path(struct work_struct *work);
+#if MPATH_CALL_BLK_ABORT_QUEUE
 static void deactivate_path(struct work_struct *work);
+#endif
 
 
 /*-----------------------------------------------
@@ -129,7 +136,9 @@ static struct pgpath *alloc_pgpath(void)
 
 	if (pgpath) {
 		pgpath->is_active = 1;
+#if MPATH_CALL_BLK_ABORT_QUEUE
 		INIT_WORK(&pgpath->deactivate_path, deactivate_path);
+#endif
 		INIT_WORK(&pgpath->activate_path, activate_path);
 	}
 
@@ -141,6 +150,7 @@ static void free_pgpath(struct pgpath *pgpath)
 	kfree(pgpath);
 }
 
+#if MPATH_CALL_BLK_ABORT_QUEUE
 static void deactivate_path(struct work_struct *work)
 {
 	struct pgpath *pgpath =
@@ -148,6 +158,7 @@ static void deactivate_path(struct work_struct *work)
 
 	blk_abort_queue(pgpath->path.dev->bdev->bd_disk->queue);
 }
+#endif
 
 static struct priority_group *alloc_priority_group(void)
 {
@@ -995,7 +1006,9 @@ static int fail_path(struct pgpath *pgpath)
 		      pgpath->path.dev->name, m->nr_valid_paths);
 
 	schedule_work(&m->trigger_event);
+#if MPATH_CALL_BLK_ABORT_QUEUE
 	queue_work(kmultipathd, &pgpath->deactivate_path);
+#endif
 
 out:
 	spin_unlock_irqrestore(&m->lock, flags);

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v2] dm mpath: revert "dm: Call blk_abort_queue on failed paths"
  2010-11-18 21:48                             ` [PATCH] dm mpath: disable call to blk_abort_queue and related code Mike Snitzer
@ 2010-11-23  1:00                               ` Mike Snitzer
  0 siblings, 0 replies; 21+ messages in thread
From: Mike Snitzer @ 2010-11-23  1:00 UTC (permalink / raw)
  To: dm-devel; +Cc: Mike Christie, Mike Anderson

This reverts commit 224cb3e981f1b2f9f93dbd49eaef505d17d894c2.

Multipath was previously made to use blk_abort_queue() to allow for
lower latency path deactivation.  The call to blk_abort_queue has proven
to be unsafe due to a race (between blk_abort_queue and scsi_request_fn)
that can lead to list corruption, from:
https://www.redhat.com/archives/dm-devel/2010-November/msg00085.html

"the cmd gets blk_abort_queued/timedout run on it and the scsi eh
somehow is able to complete and run scsi_queue_insert while
scsi_request_fn is still trying to process the request."

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: Mike Anderson <andmike@linux.vnet.ibm.com>
Cc: Mike Christie <michaelc@cs.wisc.edu>
---
 drivers/md/dm-mpath.c |   12 ------------
 1 file changed, 12 deletions(-)

v2: revert commit rather than #if 0 the code in question

Index: linux-2.6/drivers/md/dm-mpath.c
===================================================================
--- linux-2.6.orig/drivers/md/dm-mpath.c
+++ linux-2.6/drivers/md/dm-mpath.c
@@ -33,7 +33,6 @@ struct pgpath {
 	unsigned fail_count;		/* Cumulative failure count */
 
 	struct dm_path path;
-	struct work_struct deactivate_path;
 	struct work_struct activate_path;
 };
 
@@ -116,7 +115,6 @@ static struct workqueue_struct *kmultipa
 static void process_queued_ios(struct work_struct *work);
 static void trigger_event(struct work_struct *work);
 static void activate_path(struct work_struct *work);
-static void deactivate_path(struct work_struct *work);
 
 
 /*-----------------------------------------------
@@ -129,7 +127,6 @@ static struct pgpath *alloc_pgpath(void)
 
 	if (pgpath) {
 		pgpath->is_active = 1;
-		INIT_WORK(&pgpath->deactivate_path, deactivate_path);
 		INIT_WORK(&pgpath->activate_path, activate_path);
 	}
 
@@ -141,14 +138,6 @@ static void free_pgpath(struct pgpath *p
 	kfree(pgpath);
 }
 
-static void deactivate_path(struct work_struct *work)
-{
-	struct pgpath *pgpath =
-		container_of(work, struct pgpath, deactivate_path);
-
-	blk_abort_queue(pgpath->path.dev->bdev->bd_disk->queue);
-}
-
 static struct priority_group *alloc_priority_group(void)
 {
 	struct priority_group *pg;
@@ -995,7 +984,6 @@ static int fail_path(struct pgpath *pgpa
 		      pgpath->path.dev->name, m->nr_valid_paths);
 
 	schedule_work(&m->trigger_event);
-	queue_work(kmultipathd, &pgpath->deactivate_path);
 
 out:
 	spin_unlock_irqrestore(&m->lock, flags);

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2010-11-23  1:00 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-05-12  5:23 block_abort_queue (blk_abort_request) racing with scsi_request_fn Mike Anderson
2010-11-10  7:09 ` [dm-devel] " Mike Christie
2010-11-10  7:30   ` Mike Christie
2010-11-10 16:30     ` Mike Anderson
2010-11-10 21:16       ` Mike Christie
2010-11-12 17:54         ` Mike Anderson
2010-11-16 21:39           ` Mike Snitzer
2010-11-17 17:49             ` [dm-devel] " Mike Anderson
2010-11-17 21:55               ` Mike Snitzer
2010-11-18  4:40                 ` [PATCH v2] dm mpath: add feature flag to control call to blk_abort_queue Mike Snitzer
2010-11-18  7:20                   ` Mike Anderson
2010-11-18 15:48                     ` Mike Snitzer
2010-11-18 15:48                     ` [PATCH v3] " Mike Snitzer
2010-11-18 19:16                       ` (unknown), Mike Snitzer
2010-11-18 19:21                         ` Mike Snitzer
2010-11-18 19:19                       ` [PATCH v4] dm mpath: avoid call to blk_abort_queue by default Mike Snitzer
2010-11-18 20:07                         ` [PATCH v5] " Mike Snitzer
2010-11-18 20:18                           ` [dm-devel] " Alasdair G Kergon
2010-11-18 20:39                             ` Mike Anderson
2010-11-18 21:48                             ` [PATCH] dm mpath: disable call to blk_abort_queue and related code Mike Snitzer
2010-11-23  1:00                               ` [PATCH v2] dm mpath: revert "dm: Call blk_abort_queue on failed paths" Mike Snitzer
