* dm-mq and end_clone_request()
@ 2016-07-19 22:57 Bart Van Assche
  2016-07-20 14:08 ` Mike Snitzer
  0 siblings, 1 reply; 74+ messages in thread
From: Bart Van Assche @ 2016-07-19 22:57 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: device-mapper development

Hello Mike,

If I run a fio data integrity test against kernel v4.7-rc7 then I often 
see that fio reports I/O errors if a path is removed despite 
queue_if_no_path having been set in /etc/multipath.conf. Further 
analysis showed that this happens because during SCSI device removal a 
SCSI device enters state SDEV_CANCEL before the block layer queue is 
marked as "dying". In that state I/O requests submitted to that SCSI 
device are failed with -EIO. The behavior for end_clone_request() in 
drivers/md/dm.c for such requests is as follows:
- With multiqueue support disabled, call __blk_put_request() and ignore
   the "error" argument passed to end_clone_request().
- With multiqueue support enabled, pass the "error" argument to
   dm_complete_request().
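
For reference, here is roughly what this function looks like -- an 
abridged sketch of v4.7's end_clone_request() (then in 
drivers/md/dm.c), reconstructed from memory, so treat the details as 
approximate:

static void end_clone_request(struct request *clone, int error)
{
	struct dm_rq_target_io *tio = clone->end_io_data;

	if (!clone->q->mq_ops) {
		/*
		 * Request-fn case: drop the reference the clone holds on
		 * its queue; the clone itself comes from DM's own mempool.
		 */
		__blk_put_request(clone->q, clone);
	}

	/*
	 * Completion of the original request is deferred to softirq
	 * context via dm_complete_request() in both cases.
	 */
	dm_complete_request(tio->orig, error);
}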

Shouldn't end_clone_request() requeue failed requests in both cases 
instead of passing the I/O error to the submitter only if multiqueue is 
enabled?

Thanks,

Bart.


* Re: dm-mq and end_clone_request()
  2016-07-19 22:57 dm-mq and end_clone_request() Bart Van Assche
@ 2016-07-20 14:08 ` Mike Snitzer
  2016-07-20 14:27   ` Mike Snitzer
  0 siblings, 1 reply; 74+ messages in thread
From: Mike Snitzer @ 2016-07-20 14:08 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: device-mapper development, linux-scsi

On Tue, Jul 19 2016 at  6:57pm -0400,
Bart Van Assche <bart.vanassche@sandisk.com> wrote:

> Hello Mike,
> 
> If I run a fio data integrity test against kernel v4.7-rc7 then I
> often see that fio reports I/O errors if a path is removed despite
> queue_if_no_path having been set in /etc/multipath.conf. Further
> analysis showed that this happens because during SCSI device removal
> a SCSI device enters state SDEV_CANCEL before the block layer queue
> is marked as "dying". In that state I/O requests submitted to that
> SCSI device are failed with -EIO. The behavior for
> end_clone_request() in drivers/md/dm.c for such requests is as
> follows:
> - With multiqueue support disabled, call __blk_put_request() and ignore
>   the "error" argument passed to end_clone_request().

The __blk_put_request() isn't contributing to this blk-mq problem.  The
need for it is unique to the request_fn case.

> - With multiqueue support enabled, pass the "error" argument to
>   dm_complete_request().

The error arg is passed to dm_complete_request() regardless of queue
type but it is only immediately used by the blk-mq API (via
blk_mq_complete_request).
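
For reference, dm_complete_request() around v4.7 looks roughly like 
this (the if/else is visible verbatim in a patch further down this 
thread; the surrounding lines are from memory):

static void dm_complete_request(struct request *rq, int error)
{
	struct dm_rq_target_io *tio = tio_from_request(rq);

	/* stashed for dm_done(), which runs later in softirq context */
	tio->error = error;
	if (!rq->q->mq_ops)
		blk_complete_request(rq);
	else
		/* only the blk-mq path sees "error" immediately */
		blk_mq_complete_request(rq, error);
}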

> Shouldn't end_clone_request() requeue failed requests in both cases
> instead of passing the I/O error to the submitter only if multiqueue
> is enabled?

Pretty sure you'll find it is _not_ blk-mq that is passing the error
up.  (But if I'm proven wrong that will be welcome news).

The error passed to dm_complete_request() is always used to set
tio->error which is later used by dm_done().  DM core handles errors
later via softirq in dm_done() -- where the error is passed into the
target_type's rq_end_io hook.

So in DM multipath's do_end_io() you'll see we finally act on the error 
we got from the lower layer.  And if the error is -EIO, noretry_error()
will return true and -EIO will be returned up the IO stack.
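
Roughly, from v4.7's drivers/md/dm-mpath.c (abridged sketch from 
memory; treat as approximate):

static int do_end_io(struct multipath *m, struct request *clone,
		     int error, struct dm_mpath_io *mpio)
{
	int r = DM_ENDIO_REQUEUE;

	if (!error && !clone->errors)
		return 0;	/* I/O completed successfully */

	if (noretry_error(error))
		return error;	/* fatal: propagate up the stack */

	if (mpio->pgpath)
		fail_path(mpio->pgpath);

	if (!atomic_read(&m->nr_valid_paths) &&
	    !test_bit(MPATHF_QUEUE_IF_NO_PATH, &m->flags) &&
	    !must_push_back(m))
		r = -EIO;	/* no paths left and not queueing */

	return r;
}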

In the end we're relying on SCSI to properly categorize the underlying
faults as retryable vs not -- via SCSI's differentiated IO errors.

Unfortunately I'm not seeing anything that DM multipath can do
differently here.  -EIO is _always_ propagated up.

It is strange that all the dm-mq testing that has been done didn't ever
catch this.  The mptest testsuite is a baseline for validating DM
multipath (and request-based DM core) changes.  But I've also had Red
Hat's QE hammer dm-mq with heavy IO (using the "dt" utility) on a
larger NetApp testbed in the face of regular controller faults.

It must be that this SDEV_CANCEL scenario is a race that is relatively
unique/rare to your testbed?

This raises the question: should SCSI be returning something other than
-EIO for this case?  E.g. an error that is retryable?
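
For context, I/O submitted to a device in SDEV_CANCEL is killed during 
request prep -- a rough sketch of the state switch in 
scsi_prep_state_check() (drivers/scsi/scsi_lib.c, from memory; 
SDEV_CANCEL lands in the default arm), where BLKPREP_KILL ultimately 
surfaces as -EIO:

	switch (sdev->sdev_state) {
	case SDEV_OFFLINE:
	case SDEV_TRANSPORT_OFFLINE:
	case SDEV_DEL:
		ret = BLKPREP_KILL;	/* request fails with -EIO */
		break;
	case SDEV_BLOCK:
	case SDEV_CREATED_BLOCK:
	case SDEV_QUIESCE:
		ret = BLKPREP_DEFER;	/* requeued and retried later */
		break;
	default:			/* includes SDEV_CANCEL */
		ret = BLKPREP_KILL;	/* normal I/O is killed */
	}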

Mike


* Re: dm-mq and end_clone_request()
  2016-07-20 14:08 ` Mike Snitzer
@ 2016-07-20 14:27   ` Mike Snitzer
  2016-07-20 17:37     ` Bart Van Assche
  0 siblings, 1 reply; 74+ messages in thread
From: Mike Snitzer @ 2016-07-20 14:27 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: device-mapper development, linux-scsi

On Wed, Jul 20 2016 at 10:08am -0400,
Mike Snitzer <snitzer@redhat.com> wrote:

> On Tue, Jul 19 2016 at  6:57pm -0400,
> Bart Van Assche <bart.vanassche@sandisk.com> wrote:
> 
> > Hello Mike,
> > 
> > If I run a fio data integrity test against kernel v4.7-rc7 then I
> > often see that fio reports I/O errors if a path is removed despite
> > queue_if_no_path having been set in /etc/multipath.conf. Further
> > analysis showed that this happens because during SCSI device removal
> > a SCSI device enters state SDEV_CANCEL before the block layer queue
> > is marked as "dying". In that state I/O requests submitted to that
> > SCSI device are failed with -EIO. The behavior for
> > end_clone_request() in drivers/md/dm.c for such requests is as
...
> > - With multiqueue support enabled, pass the "error" argument to
> >   dm_complete_request().
> 
> The error arg is passed to dm_complete_request() regardless of queue
> type but it is only immediately used by the blk-mq API (via
> blk_mq_complete_request).
> 
> > Shouldn't end_clone_request() requeue failed requests in both cases
> > instead of passing the I/O error to the submitter only if multiqueue
> > is enabled?
> 
> Pretty sure you'll find it is _not_ blk-mq that is passing the error
> up.  (But if I'm proven wrong that will be welcome news).
>
> The error passed to dm_complete_request() is always used to set
> tio->error which is later used by dm_done().  DM core handles errors
> later via softirq in dm_done() -- where the error is passed into the
> target_type's rq_end_io hook.
> 
> So in DM multipath's do_end_io() you'll see we finally act on the error
> we got from the lower layer.  And if the error is -EIO, noretry_error()
> will return true and -EIO will be returned up the IO stack.

For some reason I thought -EIO was considered not retryable.  That's
obviously wrong (e.g. noretry_error() doesn't seize on -EIO).
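
For reference, v4.7's noretry_error() in drivers/md/dm-mpath.c (sketch 
from memory) -- note that -EIO is absent from the list, so dm-mpath 
treats it as retryable:

static int noretry_error(int error)
{
	switch (error) {
	case -EOPNOTSUPP:
	case -EREMOTEIO:
	case -EILSEQ:
	case -ENODATA:
	case -ENOSPC:
		return 1;	/* final errors: do not retry */
	}
	/* anything else could be a path failure, so retry */
	return 0;
}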

> In the end we're relying on SCSI to properly categorize the underlying
> faults as retryable vs not -- via SCSI's differentiated IO errors.
> 
> Unfortunately I'm not seeing anything that DM multipath can do
> differently here.  -EIO is _always_ propagated up.
> 
> It is strange that all the dm-mq testing that has been done didn't ever
> catch this.  The mptest testsuite is a baseline for validating DM
> multipath (and request-based DM core) changes.  But I've also had Red
> Hat's QE hammer dm-mq with heavy IO (using the "dt" utility) on a
> larger NetApp testbed in the face of regular controller faults.
> 
> It must be that this SDEV_CANCEL scenario is a race that is relatively
> unique/rare to your testbed?
> 
> This raises the question: should SCSI be returning something other than
> -EIO for this case?  E.g. an error that is retryable?

So it must be that blk-mq is somehow returning -EIO earlier based on
rq->errors that is established during blk_mq_complete_request().

Please try this patch (not happy with it since it assumes all
request-based DM targets will handle IO errors -- which is fine for now
since DM multipath is the only one).  Could be you've already tried
this?  Does it fix your problem?

diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c
index 7a96618..347ff25 100644
--- a/drivers/md/dm-rq.c
+++ b/drivers/md/dm-rq.c
@@ -414,7 +414,7 @@ static void dm_complete_request(struct request *rq, int error)
 	if (!rq->q->mq_ops)
 		blk_complete_request(rq);
 	else
-		blk_mq_complete_request(rq, error);
+		blk_mq_complete_request(rq, 0);
 }
 
 /*


* Re: dm-mq and end_clone_request()
  2016-07-20 14:27   ` Mike Snitzer
@ 2016-07-20 17:37     ` Bart Van Assche
  2016-07-20 18:33       ` Mike Snitzer
  2016-07-21 20:32       ` Mike Snitzer
  0 siblings, 2 replies; 74+ messages in thread
From: Bart Van Assche @ 2016-07-20 17:37 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: device-mapper development, linux-scsi

On 07/20/2016 07:27 AM, Mike Snitzer wrote:
> On Wed, Jul 20 2016 at 10:08am -0400,
> Mike Snitzer <snitzer@redhat.com> wrote:
> [ ... ]
> diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c
> index 7a96618..347ff25 100644
> --- a/drivers/md/dm-rq.c
> +++ b/drivers/md/dm-rq.c
> @@ -414,7 +414,7 @@ static void dm_complete_request(struct request *rq, int error)
>  	if (!rq->q->mq_ops)
>  		blk_complete_request(rq);
>  	else
> -		blk_mq_complete_request(rq, error);
> +		blk_mq_complete_request(rq, 0);
>  }
>
>  /*

Hello Mike,

Thanks for the patch. But unfortunately even with this patch applied I 
see fio reporting I/O errors when run on top of dm-mq after a 
(simulated) cable pull. I think these errors occur because 
dm_softirq_done() fails requests if clone == NULL and tio->error != 
0. Adding WARN_ON_ONCE(error && !tio->clone) in dm_complete_request() 
produced the following call stack:

Workqueue: kblockd blk_mq_run_work_fn
Call Trace:
  [<ffffffff8132aa27>] dump_stack+0x68/0xa1
  [<ffffffff81061ed6>] __warn+0xc6/0xe0
  [<ffffffff81061fa8>] warn_slowpath_null+0x18/0x20
  [<ffffffffa0253fda>] dm_complete_request+0x8a/0xb0 [dm_mod]
  [<ffffffffa0255200>] map_request+0x70/0x2e0 [dm_mod]
  [<ffffffffa025771d>] dm_mq_queue_rq+0x7d/0x120 [dm_mod]
  [<ffffffff8131195a>] __blk_mq_run_hw_queue+0x1da/0x350
  [<ffffffff813120a0>] blk_mq_run_work_fn+0x10/0x20
  [<ffffffff8107f279>] process_one_work+0x1f9/0x6a0
  [<ffffffff8107f769>] worker_thread+0x49/0x490
  [<ffffffff81085f7a>] kthread+0xea/0x100
  [<ffffffff8162faff>] ret_from_fork+0x1f/0x40
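
For context, the clone == NULL path in dm_softirq_done() looks roughly 
like this in v4.7 (abridged sketch from memory):

static void dm_softirq_done(struct request *rq)
{
	struct dm_rq_target_io *tio = tio_from_request(rq);
	struct request *clone = tio->clone;

	if (!clone) {
		/*
		 * The request was never cloned/dispatched to a path:
		 * complete the original request with tio->error.
		 */
		if (!rq->q->mq_ops)
			blk_end_request_all(rq, tio->error);
		else
			blk_mq_end_request(rq, tio->error);
		return;
	}
	/* ... normal completion of the cloned request follows ... */
}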

Please let me know if you need more information.

Thanks,

Bart.


* Re: dm-mq and end_clone_request()
  2016-07-20 17:37     ` Bart Van Assche
@ 2016-07-20 18:33       ` Mike Snitzer
  2016-07-21 20:58         ` [dm-devel] " Bart Van Assche
       [not found]         ` <317679447.7168375.1469729769593.JavaMail.zimbra@redhat.com>
  2016-07-21 20:32       ` Mike Snitzer
  1 sibling, 2 replies; 74+ messages in thread
From: Mike Snitzer @ 2016-07-20 18:33 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: device-mapper development, linux-scsi

On Wed, Jul 20 2016 at  1:37pm -0400,
Bart Van Assche <bart.vanassche@sandisk.com> wrote:

> On 07/20/2016 07:27 AM, Mike Snitzer wrote:
> >On Wed, Jul 20 2016 at 10:08am -0400,
> >Mike Snitzer <snitzer@redhat.com> wrote:
> >[ ... ]
> >diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c
> >index 7a96618..347ff25 100644
> >--- a/drivers/md/dm-rq.c
> >+++ b/drivers/md/dm-rq.c
> >@@ -414,7 +414,7 @@ static void dm_complete_request(struct request *rq, int error)
> > 	if (!rq->q->mq_ops)
> > 		blk_complete_request(rq);
> > 	else
> >-		blk_mq_complete_request(rq, error);
> >+		blk_mq_complete_request(rq, 0);
> > }
> >
> > /*
> 
> Hello Mike,
> 
> Thanks for the patch. But unfortunately even with this patch applied
> I see fio reporting I/O errors when run on top of dm-mq after a
> (simulated) cable pull. I think these errors occur because
> dm_softirq_done() fails requests if clone == NULL and tio->error !=
> 0. Adding WARN_ON_ONCE(error && !tio->clone) in
> dm_complete_request() produced the following call stack:
> 
> Workqueue: kblockd blk_mq_run_work_fn
> Call Trace:
>  [<ffffffff8132aa27>] dump_stack+0x68/0xa1
>  [<ffffffff81061ed6>] __warn+0xc6/0xe0
>  [<ffffffff81061fa8>] warn_slowpath_null+0x18/0x20
>  [<ffffffffa0253fda>] dm_complete_request+0x8a/0xb0 [dm_mod]
>  [<ffffffffa0255200>] map_request+0x70/0x2e0 [dm_mod]
>  [<ffffffffa025771d>] dm_mq_queue_rq+0x7d/0x120 [dm_mod]
>  [<ffffffff8131195a>] __blk_mq_run_hw_queue+0x1da/0x350
>  [<ffffffff813120a0>] blk_mq_run_work_fn+0x10/0x20
>  [<ffffffff8107f279>] process_one_work+0x1f9/0x6a0
>  [<ffffffff8107f769>] worker_thread+0x49/0x490
>  [<ffffffff81085f7a>] kthread+0xea/0x100
>  [<ffffffff8162faff>] ret_from_fork+0x1f/0x40
> 
> Please let me know if you need more information.

Cable pull testing isn't new.  I'd have thought it was covered (albeit
simulated) via mptest.  Does mptest pass for you?
https://github.com/snitm/mptest

Last I knew your tests were all passing with dm-mq.  Would love to
understand how/when they have regressed.

If clone is NULL then the request wasn't actually issued to (or even
allocated from the tag space of) the underlying blk-mq request_queue
(via clone_and_map_rq's call to __multipath_map).

The original request could've been previously cloned and failed due to
the cable pull; it then began retry via requeue, but on retry
blk_mq_alloc_request() could've failed in __multipath_map() (maybe
the request_queue was now "dying"?).  Or __multipath_map() failed for
some other reason.

Would be interesting to know the error returned from map_request()'s
ti->type->clone_and_map_rq().  Really should just be DM_MAPIO_REQUEUE.
But the stack you've provided shows map_request calling
dm_complete_request(), which implies dm_kill_unmapped_request() is being
called due to ti->type->clone_and_map_rq() returning < 0.
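
For reference, dm_kill_unmapped_request() is a thin wrapper (sketch 
from memory of the v4.7 code), so a negative return from 
clone_and_map_rq() flows straight into dm_complete_request() with that 
error:

static void dm_kill_unmapped_request(struct request *rq, int error)
{
	rq->cmd_flags |= REQ_FAILED;
	dm_complete_request(rq, error);	/* matches the stack trace above */
}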

Could be that __multipath_map() is returning an error even before it
would've otherwise called blk_mq_alloc_request().

You said you are using queue_if_no_path?  Would be interesting to see
where __multipath_map() is returning with an error.

Could be that __multipath_map() should respond differently if
blk_mq_alloc_request() fails and the queue is dying -- at a minimum the
path should be failed.  But before we explore that we need to understand
why the clone_and_map_rq() is returning < 0.


* Re: dm-mq and end_clone_request()
  2016-07-20 17:37     ` Bart Van Assche
  2016-07-20 18:33       ` Mike Snitzer
@ 2016-07-21 20:32       ` Mike Snitzer
  2016-07-21 20:40         ` [dm-devel] " Bart Van Assche
  1 sibling, 1 reply; 74+ messages in thread
From: Mike Snitzer @ 2016-07-21 20:32 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: device-mapper development, linux-scsi

On Wed, Jul 20 2016 at  1:37pm -0400,
Bart Van Assche <bart.vanassche@sandisk.com> wrote:

> On 07/20/2016 07:27 AM, Mike Snitzer wrote:
> >On Wed, Jul 20 2016 at 10:08am -0400,
> >Mike Snitzer <snitzer@redhat.com> wrote:
> >[ ... ]
> >diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c
> >index 7a96618..347ff25 100644
> >--- a/drivers/md/dm-rq.c
> >+++ b/drivers/md/dm-rq.c
> >@@ -414,7 +414,7 @@ static void dm_complete_request(struct request *rq, int error)
> > 	if (!rq->q->mq_ops)
> > 		blk_complete_request(rq);
> > 	else
> >-		blk_mq_complete_request(rq, error);
> >+		blk_mq_complete_request(rq, 0);
> > }
> >
> > /*
> 
> Hello Mike,
> 
> Thanks for the patch. But unfortunately even with this patch applied
> I see fio reporting I/O errors when run on top of dm-mq after a
> (simulated) cable pull.

BTW, curious, how is it that you're simulating the cable pull?  I'd like
to see if doing the same, via a new test in mptest, exposes this issue
you're hitting.

Mike


* Re: [dm-devel] dm-mq and end_clone_request()
  2016-07-21 20:32       ` Mike Snitzer
@ 2016-07-21 20:40         ` Bart Van Assche
  0 siblings, 0 replies; 74+ messages in thread
From: Bart Van Assche @ 2016-07-21 20:40 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: device-mapper development, linux-scsi

On 07/21/2016 01:32 PM, Mike Snitzer wrote:
> BTW, curious, how is it that you're simulating the cable pull?  I'd like
> to see if doing the same, via a new test in mptest, exposes this issue
> you're hitting.

Hello Mike,

Writing into the /sys/class/srp_remote_ports/port-*/delete sysfs 
attribute causes the SRP initiator driver to call scsi_remove_host(), 
which also happens if the dev_loss_tmo timer expires.

Bart.


* Re: [dm-devel] dm-mq and end_clone_request()
  2016-07-20 18:33       ` Mike Snitzer
@ 2016-07-21 20:58         ` Bart Van Assche
  2016-07-25 17:53           ` Mike Snitzer
       [not found]         ` <317679447.7168375.1469729769593.JavaMail.zimbra@redhat.com>
  1 sibling, 1 reply; 74+ messages in thread
From: Bart Van Assche @ 2016-07-21 20:58 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: device-mapper development, linux-scsi

On 07/20/2016 11:33 AM, Mike Snitzer wrote:
> Would be interesting to know the error returned from map_request()'s
> ti->type->clone_and_map_rq().  Really should just be DM_MAPIO_REQUEUE.
> But the stack you've provided shows map_request calling
> dm_complete_request(), which implies dm_kill_unmapped_request() is being
> called due to ti->type->clone_and_map_rq() returning < 0.

Hello Mike,

Apparently certain requests fail with -EIO because DM_DEV_SUSPEND ioctls 
are being submitted to the same multipath target. As you know 
DM_DEV_SUSPEND changes QUEUE_IF_NO_PATH from 1 into 0. A WARN_ON() 
statement that I added in driver dm-mpath statement learned me that 
multipathd is submitting these DM_DEV_SUSPEND ioctls. In the output of 
strace -fp$(pidof multipathd) I found the following:

[pid 13927] ioctl(5, DM_TABLE_STATUS, 0x7fa1000483f0) = 0
[pid 13927] write(1, "mpathbe: failed to setup multipa"..., 35) = 35
[pid 13927] write(1, "dm-0: uev_add_map failed\n", 25) = 25
[pid 13927] write(1, "uevent trigger error\n", 21) = 21
[pid 13927] write(1, "sdh: remove path (uevent)\n", 26) = 26
[pid 13927] ioctl(5, DM_TABLE_LOAD, 0x7fa1000483f0) = 0
[pid 13927] ioctl(5, DM_DEV_SUSPEND, 0x7fa1000483f0) = 0

I'm still analyzing these and other messages.

Bart.


* Re: dm-mq and end_clone_request()
  2016-07-21 20:58         ` [dm-devel] " Bart Van Assche
@ 2016-07-25 17:53           ` Mike Snitzer
  2016-07-25 21:23             ` Mike Snitzer
  2016-07-26  6:02             ` Hannes Reinecke
  0 siblings, 2 replies; 74+ messages in thread
From: Mike Snitzer @ 2016-07-25 17:53 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: device-mapper development, linux-scsi

On Thu, Jul 21 2016 at  4:58pm -0400,
Bart Van Assche <bart.vanassche@sandisk.com> wrote:

> On 07/20/2016 11:33 AM, Mike Snitzer wrote:
> >Would be interesting to know the error returned from map_request()'s
> >ti->type->clone_and_map_rq().  Really should just be DM_MAPIO_REQUEUE.
> >But the stack you've provided shows map_request calling
> >dm_complete_request(), which implies dm_kill_unmapped_request() is being
> >called due to ti->type->clone_and_map_rq() returning < 0.
> 
> Hello Mike,
> 
> Apparently certain requests fail with -EIO because DM_DEV_SUSPEND
> ioctls are being submitted to the same multipath target. As you know
> DM_DEV_SUSPEND changes QUEUE_IF_NO_PATH from 1 to 0. A WARN_ON()
> statement that I added in the dm-mpath driver showed me that
> multipathd is submitting these DM_DEV_SUSPEND ioctls. In the output
> of strace -fp$(pidof multipathd) I found the following:
> 
> [pid 13927] ioctl(5, DM_TABLE_STATUS, 0x7fa1000483f0) = 0
> [pid 13927] write(1, "mpathbe: failed to setup multipa"..., 35) = 35
> [pid 13927] write(1, "dm-0: uev_add_map failed\n", 25) = 25
> [pid 13927] write(1, "uevent trigger error\n", 21) = 21
> [pid 13927] write(1, "sdh: remove path (uevent)\n", 26) = 26
> [pid 13927] ioctl(5, DM_TABLE_LOAD, 0x7fa1000483f0) = 0
> [pid 13927] ioctl(5, DM_DEV_SUSPEND, 0x7fa1000483f0) = 0
> 
> I'm still analyzing these and other messages.

The various ioctls you're seeing are just multipathd responding to the
failures.  Part of reloading a table (with revised path info, etc) is to
suspend and then resume the device that is being updated.

But I'm not actually sure on the historic reasoning of why
queue_if_no_path is disabled (and active setting saved) on suspend.

I'll think about this further but maybe others recall why?


* Re: dm-mq and end_clone_request()
  2016-07-25 17:53           ` Mike Snitzer
@ 2016-07-25 21:23             ` Mike Snitzer
  2016-07-25 22:00               ` Bart Van Assche
  2016-07-26  6:02             ` Hannes Reinecke
  1 sibling, 1 reply; 74+ messages in thread
From: Mike Snitzer @ 2016-07-25 21:23 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: device-mapper development, linux-scsi

On Mon, Jul 25 2016 at  1:53pm -0400,
Mike Snitzer <snitzer@redhat.com> wrote:

> On Thu, Jul 21 2016 at  4:58pm -0400,
> Bart Van Assche <bart.vanassche@sandisk.com> wrote:
> 
> > On 07/20/2016 11:33 AM, Mike Snitzer wrote:
> > >Would be interesting to know the error returned from map_request()'s
> > >ti->type->clone_and_map_rq().  Really should just be DM_MAPIO_REQUEUE.
> > >But the stack you've provided shows map_request calling
> > >dm_complete_request(), which implies dm_kill_unmapped_request() is being
> > >called due to ti->type->clone_and_map_rq() returning < 0.
> > 
> > Hello Mike,
> > 
> > Apparently certain requests fail with -EIO because DM_DEV_SUSPEND
> > ioctls are being submitted to the same multipath target. As you know
> > DM_DEV_SUSPEND changes QUEUE_IF_NO_PATH from 1 to 0. A WARN_ON()
> > statement that I added in the dm-mpath driver showed me that
> > multipathd is submitting these DM_DEV_SUSPEND ioctls. In the output
> > of strace -fp$(pidof multipathd) I found the following:
> > 
> > [pid 13927] ioctl(5, DM_TABLE_STATUS, 0x7fa1000483f0) = 0
> > [pid 13927] write(1, "mpathbe: failed to setup multipa"..., 35) = 35
> > [pid 13927] write(1, "dm-0: uev_add_map failed\n", 25) = 25
> > [pid 13927] write(1, "uevent trigger error\n", 21) = 21
> > [pid 13927] write(1, "sdh: remove path (uevent)\n", 26) = 26
> > [pid 13927] ioctl(5, DM_TABLE_LOAD, 0x7fa1000483f0) = 0
> > [pid 13927] ioctl(5, DM_DEV_SUSPEND, 0x7fa1000483f0) = 0
> > 
> > I'm still analyzing these and other messages.
> 
> The various ioctls you're seeing are just multipathd responding to the
> failures.  Part of reloading a table (with revised path info, etc) is to
> suspend and then resume the device that is being updated.
> 
> But I'm not actually sure on the historic reasoning of why
> queue_if_no_path is disabled (and active setting saved) on suspend.
> 
> I'll think about this further but maybe others recall why?

I think it dates back to when we queued IO within the multipath target.
Commit e809917735ebf ("dm mpath: push back requests instead of
queueing") obviously changed how we handle the retry.

But regardless must_push_back() should catch the case where
queue_if_no_path is cleared during suspend (by checking if current !=
saved).
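
For reference, this is the v4.7 must_push_back() being discussed (the 
same body shows up as the '-' lines of a patch later in this thread):

static int must_push_back(struct multipath *m)
{
	return (test_bit(MPATHF_QUEUE_IF_NO_PATH, &m->flags) ||
		((test_bit(MPATHF_QUEUE_IF_NO_PATH, &m->flags) !=
		  test_bit(MPATHF_SAVED_QUEUE_IF_NO_PATH, &m->flags)) &&
		 dm_noflush_suspending(m->ti)));
}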

So I'd be curious to know if your debugging has enabled you to identify 
exactly where in the dm-mpath.c code the -EIO return is being 
established.  do_end_io() is the likely candidate -- but again the
must_push_back() check should prevent it and DM_ENDIO_REQUEUE should
be returned.

Mike


* Re: dm-mq and end_clone_request()
  2016-07-25 21:23             ` Mike Snitzer
@ 2016-07-25 22:00               ` Bart Van Assche
  2016-07-26  1:16                 ` Mike Snitzer
  0 siblings, 1 reply; 74+ messages in thread
From: Bart Van Assche @ 2016-07-25 22:00 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: device-mapper development, linux-scsi

On 07/25/2016 02:23 PM, Mike Snitzer wrote:
> So I'd be curious to know if your debugging has enabled you to identify
> exactly where in the dm-mpath.c code the -EIO return is being
> established.  do_end_io() is the likely candidate -- but again the
> must_push_back() check should prevent it and DM_ENDIO_REQUEUE should
> be returned.

Hello Mike,

Thanks for looking further into this. The pr_info() statement that I 
added to the following code block in __multipath_map() fired, which 
told me that this code block triggered the -EIO return:

	if (!pgpath) {
		if (!must_push_back(m))
			r = -EIO;	/* Failed */
		pr_info("%s(): (a) returning %d\n", __func__, r);
		return r;
	}

 From the system log:

kernel: mpath 254:0: queue_if_no_path 1 -> 0
kernel: __multipath_map(): (a) returning -5

The code that I had added in queue_if_no_path() is as follows:

	old = test_bit(MPATHF_QUEUE_IF_NO_PATH, &m->flags);
         [ ... ]
	pr_info("mpath %s: queue_if_no_path %d -> %d\n",
		dm_device_name(dm_table_get_md(m->ti->table)), old,
		queue_if_no_path);

Bart.


* Re: dm-mq and end_clone_request()
  2016-07-25 22:00               ` Bart Van Assche
@ 2016-07-26  1:16                 ` Mike Snitzer
  2016-07-26 22:51                   ` Bart Van Assche
  2016-07-27 20:09                   ` Mike Snitzer
  0 siblings, 2 replies; 74+ messages in thread
From: Mike Snitzer @ 2016-07-26  1:16 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: device-mapper development, linux-scsi

On Mon, Jul 25 2016 at  6:00pm -0400,
Bart Van Assche <bart.vanassche@sandisk.com> wrote:

> On 07/25/2016 02:23 PM, Mike Snitzer wrote:
> >So I'd be curious to know if your debugging has enabled you to identify
> >exactly where in the dm-mpath.c code the -EIO return is being
> >established.  do_end_io() is the likely candidate -- but again the
> >must_push_back() check should prevent it and DM_ENDIO_REQUEUE should
> >be returned.
> 
> Hello Mike,
> 
> Thanks for looking further into this. The pr_info() statement that I
> added to the following code block in __multipath_map() fired, which
> told me that this code block triggered the -EIO return:
> 
> 	if (!pgpath) {
> 		if (!must_push_back(m))
> 			r = -EIO;	/* Failed */
> 		pr_info("%s(): (a) returning %d\n", __func__, r);
> 		return r;
> 	}
> 
> From the system log:
> 
> kernel: mpath 254:0: queue_if_no_path 1 -> 0
> kernel: __multipath_map(): (a) returning -5
> 
> The code that I had added in queue_if_no_path() is as follows:
> 
> 	old = test_bit(MPATHF_QUEUE_IF_NO_PATH, &m->flags);
>         [ ... ]
> 	pr_info("mpath %s: queue_if_no_path %d -> %d\n",
> 		dm_device_name(dm_table_get_md(m->ti->table)), old,
> 		queue_if_no_path);

Hi Bart,

Please try this patch to see if it fixes your issue, thanks.

diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
index 52baf8a..287caa7 100644
--- a/drivers/md/dm-mpath.c
+++ b/drivers/md/dm-mpath.c
@@ -433,10 +433,17 @@ failed:
  */
 static int must_push_back(struct multipath *m)
 {
-	return (test_bit(MPATHF_QUEUE_IF_NO_PATH, &m->flags) ||
-		((test_bit(MPATHF_QUEUE_IF_NO_PATH, &m->flags) !=
-		  test_bit(MPATHF_SAVED_QUEUE_IF_NO_PATH, &m->flags)) &&
-		 dm_noflush_suspending(m->ti)));
+	bool r;
+	unsigned long flags;
+
+	spin_lock_irqsave(&m->lock, flags);
+	r = (test_bit(MPATHF_QUEUE_IF_NO_PATH, &m->flags) ||
+	     ((test_bit(MPATHF_QUEUE_IF_NO_PATH, &m->flags) !=
+	       test_bit(MPATHF_SAVED_QUEUE_IF_NO_PATH, &m->flags)) &&
+	      dm_noflush_suspending(m->ti)));
+	spin_unlock_irqrestore(&m->lock, flags);
+
+	return r;
 }
 
 /*


* Re: dm-mq and end_clone_request()
  2016-07-25 17:53           ` Mike Snitzer
  2016-07-25 21:23             ` Mike Snitzer
@ 2016-07-26  6:02             ` Hannes Reinecke
  2016-07-26 16:06               ` Mike Snitzer
  1 sibling, 1 reply; 74+ messages in thread
From: Hannes Reinecke @ 2016-07-26  6:02 UTC (permalink / raw)
  To: dm-devel

On 07/25/2016 07:53 PM, Mike Snitzer wrote:
> On Thu, Jul 21 2016 at  4:58pm -0400,
> Bart Van Assche <bart.vanassche@sandisk.com> wrote:
>
>> On 07/20/2016 11:33 AM, Mike Snitzer wrote:
>>> Would be interesting to know the error returned from map_request()'s
>>> ti->type->clone_and_map_rq().  Really should just be DM_MAPIO_REQUEUE.
>>> But the stack you've provided shows map_request calling
>>> dm_complete_request(), which implies dm_kill_unmapped_request() is being
>>> called due to ti->type->clone_and_map_rq() returning < 0.
>>
>> Hello Mike,
>>
>> Apparently certain requests fail with -EIO because DM_DEV_SUSPEND
>> ioctls are being submitted to the same multipath target. As you know
>> DM_DEV_SUSPEND changes QUEUE_IF_NO_PATH from 1 to 0. A WARN_ON()
>> statement that I added in the dm-mpath driver showed me that
>> multipathd is submitting these DM_DEV_SUSPEND ioctls. In the output
>> of strace -fp$(pidof multipathd) I found the following:
>>
>> [pid 13927] ioctl(5, DM_TABLE_STATUS, 0x7fa1000483f0) = 0
>> [pid 13927] write(1, "mpathbe: failed to setup multipa"..., 35) = 35
>> [pid 13927] write(1, "dm-0: uev_add_map failed\n", 25) = 25
>> [pid 13927] write(1, "uevent trigger error\n", 21) = 21
>> [pid 13927] write(1, "sdh: remove path (uevent)\n", 26) = 26
>> [pid 13927] ioctl(5, DM_TABLE_LOAD, 0x7fa1000483f0) = 0
>> [pid 13927] ioctl(5, DM_DEV_SUSPEND, 0x7fa1000483f0) = 0
>>
>> I'm still analyzing these and other messages.
>
> The various ioctls you're seeing are just multipathd responding to the
> failures.  Part of reloading a table (with revised path info, etc) is to
> suspend and then resume the device that is being updated.
>
> But I'm not actually sure on the historic reasoning of why
> queue_if_no_path is disabled (and active setting saved) on suspend.
>
> I'll think about this further but maybe others recall why?
>
Yes, originally multipath was using the device-mapper internal queueing 
mechanism, which meant that queued I/O was holding a reference to the 
table. So you couldn't change the table while I/O was pending/queued, 
and you'd have to unset queue_if_no_path to allow the tables to be swapped.
(Or something. That's the reasoning I gave to myself when seeing that 
construct. And I found myself pretty convincing. Not that I've ever 
tested that.)
So with commit e809917735ebf ("dm mpath: push back requests instead of
queueing") that obviously isn't true anymore, and I'd love to see that
quirk go.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


* Re: dm-mq and end_clone_request()
  2016-07-26  6:02             ` Hannes Reinecke
@ 2016-07-26 16:06               ` Mike Snitzer
  0 siblings, 0 replies; 74+ messages in thread
From: Mike Snitzer @ 2016-07-26 16:06 UTC (permalink / raw)
  To: Hannes Reinecke; +Cc: dm-devel

On Tue, Jul 26 2016 at  2:02am -0400,
Hannes Reinecke <hare@suse.de> wrote:

> On 07/25/2016 07:53 PM, Mike Snitzer wrote:
> >
> >The various ioctls you're seeing are just multipathd responding to the
> >failures.  Part of reloading a table (with revised path info, etc) is to
> >suspend and then resume the device that is being updated.
> >
> >But I'm not actually sure on the historic reasoning of why
> >queue_if_no_path is disabled (and active setting saved) on suspend.
> >
> >I'll think about this further but maybe others recall why?
> >
> Yes, originally multipath was using the device-mapper internal
> queueing mechanism, which meant that queued I/O was holding a
> reference to the table. So you couldn't change the table while I/O
> was pending/queued, and you'd have to unset queue_if_no_path to
> allow the tables to be swapped.
> (Or something. That's the reasoning I gave to myself when seeing
> that construct. And I found myself pretty convincing. Not that I've
> ever tested that.)
> So with commit e809917735ebf ("dm mpath: push back requests instead of
> queueing") that obviously isn't true anymore, and I'd love to see that
> quirk go.

Yeah, but unfortunately (or fortunately?) I've reinstated bio-based
multipath support -- for the current 4.8 merge -- which actually _does_
make use of (and reintroduces) the multipath target's internal queueing,
but only for bio-based use, see:

https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-4.8&id=76e33fe4e2c4363c2b9f627472bd43dc235c3406


* Re: dm-mq and end_clone_request()
  2016-07-26  1:16                 ` Mike Snitzer
@ 2016-07-26 22:51                   ` Bart Van Assche
  2016-07-27 14:08                     ` Mike Snitzer
  2016-07-27 20:09                   ` Mike Snitzer
  1 sibling, 1 reply; 74+ messages in thread
From: Bart Van Assche @ 2016-07-26 22:51 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: device-mapper development, linux-scsi

On 07/25/2016 06:15 PM, Mike Snitzer wrote:
> Please try this patch to see if it fixes your issue, thanks.
> 
> diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
> index 52baf8a..287caa7 100644
> --- a/drivers/md/dm-mpath.c
> +++ b/drivers/md/dm-mpath.c
> @@ -433,10 +433,17 @@ failed:
>   */
>  static int must_push_back(struct multipath *m)
>  {
> -	return (test_bit(MPATHF_QUEUE_IF_NO_PATH, &m->flags) ||
> -		((test_bit(MPATHF_QUEUE_IF_NO_PATH, &m->flags) !=
> -		  test_bit(MPATHF_SAVED_QUEUE_IF_NO_PATH, &m->flags)) &&
> -		 dm_noflush_suspending(m->ti)));
> +	bool r;
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&m->lock, flags);
> +	r = (test_bit(MPATHF_QUEUE_IF_NO_PATH, &m->flags) ||
> +	     ((test_bit(MPATHF_QUEUE_IF_NO_PATH, &m->flags) !=
> +	       test_bit(MPATHF_SAVED_QUEUE_IF_NO_PATH, &m->flags)) &&
> +	      dm_noflush_suspending(m->ti)));
> +	spin_unlock_irqrestore(&m->lock, flags);
> +
> +	return r;
>  }
>  
>  /*
 
Hello Mike,

Thank you for having made this patch available. Unfortunately even
with this patch applied I still see fio reporting I/O errors and the
following text still appears in the system log immediately before the
I/O errors are reported (due to debug statements I added in the device
mapper; mpath 254:0 and mpathbe refer to the same dm device):

Jul 26 15:40:37 ion-dev-ib-ini kernel: mpath 254:0: queue_if_no_path 0 -> 1
Jul 26 15:40:37 ion-dev-ib-ini kernel: executing DM ioctl DEV_SUSPEND on mpathbe
Jul 26 15:40:37 ion-dev-ib-ini kernel: mpath 254:0: queue_if_no_path 1 -> 0

BTW, I have not yet been able to determine which user space code
triggers the DEV_SUSPEND ioctl. A condlog() call I had added just 
above the dm_simplecmd_flush(DM_DEVICE_SUSPEND, mapname, 0) call in 
multipathd did not produce any output.

Bart.


* Re: dm-mq and end_clone_request()
  2016-07-26 22:51                   ` Bart Van Assche
@ 2016-07-27 14:08                     ` Mike Snitzer
  2016-07-27 15:52                       ` [dm-devel] " Benjamin Marzinski
  0 siblings, 1 reply; 74+ messages in thread
From: Mike Snitzer @ 2016-07-27 14:08 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: device-mapper development, linux-scsi

On Tue, Jul 26 2016 at  6:51pm -0400,
Bart Van Assche <bart.vanassche@sandisk.com> wrote:

> On 07/25/2016 06:15 PM, Mike Snitzer wrote:
> > Please try this patch to see if it fixes your issue, thanks.
> > 
> > diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
> > index 52baf8a..287caa7 100644
> > --- a/drivers/md/dm-mpath.c
> > +++ b/drivers/md/dm-mpath.c
> > @@ -433,10 +433,17 @@ failed:
> >   */
> >  static int must_push_back(struct multipath *m)
> >  {
> > -	return (test_bit(MPATHF_QUEUE_IF_NO_PATH, &m->flags) ||
> > -		((test_bit(MPATHF_QUEUE_IF_NO_PATH, &m->flags) !=
> > -		  test_bit(MPATHF_SAVED_QUEUE_IF_NO_PATH, &m->flags)) &&
> > -		 dm_noflush_suspending(m->ti)));
> > +	bool r;
> > +	unsigned long flags;
> > +
> > +	spin_lock_irqsave(&m->lock, flags);
> > +	r = (test_bit(MPATHF_QUEUE_IF_NO_PATH, &m->flags) ||
> > +	     ((test_bit(MPATHF_QUEUE_IF_NO_PATH, &m->flags) !=
> > +	       test_bit(MPATHF_SAVED_QUEUE_IF_NO_PATH, &m->flags)) &&
> > +	      dm_noflush_suspending(m->ti)));
> > +	spin_unlock_irqrestore(&m->lock, flags);
> > +
> > +	return r;
> >  }
> >  
> >  /*
>  
> Hello Mike,
> 
> Thank you for having made this patch available. Unfortunately even
> with this patch applied I still see fio reporting I/O errors and the
> following text still appears in the system log immediately before the
> I/O errors are reported (due to debug statements I added in the device
> mapper; mpath 254:0 and mpathbe refer to the same dm device):
> 
> Jul 26 15:40:37 ion-dev-ib-ini kernel: mpath 254:0: queue_if_no_path 0 -> 1
> Jul 26 15:40:37 ion-dev-ib-ini kernel: executing DM ioctl DEV_SUSPEND on mpathbe
> Jul 26 15:40:37 ion-dev-ib-ini kernel: mpath 254:0: queue_if_no_path 1 -> 0

This is all as expected.  Only question I have: is
dm_noflush_suspending() false? -- I assume so given must_push_back() is
returning false.

I'm struggling to appreciate why must_push_back() is only true if
noflush is used.  Regardless of which type, if there are no paths and
queue_if_no_path was configured (implied by current != saved) then we
shouldn't be returning -EIO back up the stack.

> BTW, I have not yet been able to determine which user space code
> triggers the DEV_SUSPEND ioctl. A condlog() call I had added just
> above the dm_simplecmd_flush(DM_DEVICE_SUSPEND, mapname, 0) call in
> multipathd did not produce any output.

I need to dig into the multipath-tools userspace code more to be able to
answer.  But I've cc'd Ben Marzinski explicitly to get his insight.

Curious if multipath-tools _always_ use the noflush variant of suspend?
If not then we're setting ourselves up to return -EIO when we shouldn't.

Mike


* Re: [dm-devel] dm-mq and end_clone_request()
  2016-07-27 14:08                     ` Mike Snitzer
@ 2016-07-27 15:52                       ` Benjamin Marzinski
  2016-07-27 19:06                         ` Bart Van Assche
  0 siblings, 1 reply; 74+ messages in thread
From: Benjamin Marzinski @ 2016-07-27 15:52 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: Bart Van Assche, device-mapper development, linux-scsi

On Wed, Jul 27, 2016 at 10:08:28AM -0400, Mike Snitzer wrote:
> On Tue, Jul 26 2016 at  6:51pm -0400,
> Bart Van Assche <bart.vanassche@sandisk.com> wrote:
> 
> > On 07/25/2016 06:15 PM, Mike Snitzer wrote:
> > > Please try this patch to see if it fixes your issue, thanks.
> > > 
> > > diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
> > > index 52baf8a..287caa7 100644
> > > --- a/drivers/md/dm-mpath.c
> > > +++ b/drivers/md/dm-mpath.c
> > > @@ -433,10 +433,17 @@ failed:
> > >   */
> > >  static int must_push_back(struct multipath *m)
> > >  {
> > > -	return (test_bit(MPATHF_QUEUE_IF_NO_PATH, &m->flags) ||
> > > -		((test_bit(MPATHF_QUEUE_IF_NO_PATH, &m->flags) !=
> > > -		  test_bit(MPATHF_SAVED_QUEUE_IF_NO_PATH, &m->flags)) &&
> > > -		 dm_noflush_suspending(m->ti)));
> > > +	bool r;
> > > +	unsigned long flags;
> > > +
> > > +	spin_lock_irqsave(&m->lock, flags);
> > > +	r = (test_bit(MPATHF_QUEUE_IF_NO_PATH, &m->flags) ||
> > > +	     ((test_bit(MPATHF_QUEUE_IF_NO_PATH, &m->flags) !=
> > > +	       test_bit(MPATHF_SAVED_QUEUE_IF_NO_PATH, &m->flags)) &&
> > > +	      dm_noflush_suspending(m->ti)));
> > > +	spin_unlock_irqrestore(&m->lock, flags);
> > > +
> > > +	return r;
> > >  }
> > >  
> > >  /*
> >  
> > Hello Mike,
> > 
> > Thank you for having made this patch available. Unfortunately even
> > with this patch applied I still see fio reporting I/O errors and the
> > following text still appears in the system log immediately before the
> > I/O errors are reported (due to debug statements I added in the device
> > mapper; mpath 254:0 and mpathbe refer to the same dm device):
> > 
> > Jul 26 15:40:37 ion-dev-ib-ini kernel: mpath 254:0: queue_if_no_path 0 -> 1
> > Jul 26 15:40:37 ion-dev-ib-ini kernel: executing DM ioctl DEV_SUSPEND on mpathbe
> > Jul 26 15:40:37 ion-dev-ib-ini kernel: mpath 254:0: queue_if_no_path 1 -> 0
> 
> This is all as expected.  Only question I have: is
> dm_noflush_suspending() false? -- I assume so given must_push_back() is
> returning false.
> 
> I'm struggling to appreciate why must_push_back() is only true if
> noflush is used.  Regardless of which type, if there are no paths and
> queue_if_no_path was configured (implied by current != saved) then we
> shouldn't be returning -EIO back up the stack.
> 
> > BTW, I have not yet been able to determine which user space code
> > triggers the DEV_SUSPEND ioctl. A condlog() call I had added just
> > above the dm_simplecmd_flush(DM_DEVICE_SUSPEND, mapname, 0) call in
> > multipathd did not produce any output.

if you look in drivers/md/dm-ioctl.c at do_resume(), device mapper
internally does a suspend when you call resume with a new table loaded.
That's when these suspends are happening.

In the userspace tools, this happens in the DM_DEVICE_RESUME calls after
dm_addmap_reload(), which loads the new table.  These all happen in the
domap() function.
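
For reference, an abridged sketch of do_resume() showing the implicit 
suspend before the table swap (from memory of v4.7's 
drivers/md/dm-ioctl.c; details approximate):

static int do_resume(struct dm_ioctl *param)
{
	/* ... look up md and the pre-loaded table (new_map) ... */

	if (new_map) {
		if (param->flags & DM_NOFLUSH_FLAG)
			suspend_flags |= DM_SUSPEND_NOFLUSH_FLAG;
		/* implicit suspend before swapping in the new table */
		if (!dm_suspended_md(md))
			dm_suspend(md, suspend_flags);

		old_map = dm_swap_table(md, new_map);
	}

	if (dm_suspended_md(md))
		r = dm_resume(md);

	/* ... */
}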
 
> I need to dig into the multipath-tools userspace code more to be able to
> answer.  But I've cc'd Ben Marzinski explicitly to get his insight.
> 
> Curious if multipath-tools _always_ use the noflush variant of suspend?
> If not then we're setting ourselves up to return -EIO when we shouldn't.

There is only one time when we don't use noflush. That's when we resize
the table, and that's because we could otherwise have IOs that are past
the end of the device. It's been a known issue for a while now that you
cannot resize a multipath device with no working paths. Or (to say it in
a way that doesn't assume that people are stupid) if you lose all of
your paths while resizing a multipath device, IOs will fail. I've never
heard of anyone pushing back on that limitation.

-Ben

> Mike
> 
> --
> dm-devel mailing list
> dm-devel@redhat.com
> https://www.redhat.com/mailman/listinfo/dm-devel


* Re: [dm-devel] dm-mq and end_clone_request()
  2016-07-27 15:52                       ` [dm-devel] " Benjamin Marzinski
@ 2016-07-27 19:06                         ` Bart Van Assche
  2016-07-27 19:54                           ` Mike Snitzer
  0 siblings, 1 reply; 74+ messages in thread
From: Bart Van Assche @ 2016-07-27 19:06 UTC (permalink / raw)
  To: Benjamin Marzinski, Mike Snitzer; +Cc: device-mapper development, linux-scsi

On 07/27/2016 08:52 AM, Benjamin Marzinski wrote:
> if you look in drivers/md/dm-ioctl.c at do_resume(), device mapper
> internally does a suspend when you call resume with a new table loaded.
> That's when these suspends are happening.
>
> In the userspace tools, this happens in the DM_DEVICE_RESUME calls after
> dm_addmap_reload(), which loads the new table.  These all happen in the
> domap() function.

Thanks Ben for chiming in. As far as I can see md->map is only used in 
the I/O submission path but not in the I/O completion path. So why 
suspend and resume I/O before activating the new map? Do you think it 
would be safe to activate the new map without suspending and resuming I/O?

Thanks,

Bart.


* Re: dm-mq and end_clone_request()
  2016-07-27 19:06                         ` Bart Van Assche
@ 2016-07-27 19:54                           ` Mike Snitzer
  0 siblings, 0 replies; 74+ messages in thread
From: Mike Snitzer @ 2016-07-27 19:54 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: Benjamin Marzinski, device-mapper development, linux-scsi

On Wed, Jul 27 2016 at  3:06pm -0400,
Bart Van Assche <bart.vanassche@sandisk.com> wrote:

> On 07/27/2016 08:52 AM, Benjamin Marzinski wrote:
> >if you look in drivers/md/dm-ioctl.c at do_resume(), device mapper
> >internally does a suspend when you call resume with a new table loaded.
> >That's when these suspends are happening.
> >
> >In the userspace tools, this happens in the DM_DEVICE_RESUME calls after
> >dm_addmap_reload(), which loads the new table.  These all happen in the
> >domap() function.

multipathd's only call to dm_addmap_reload() with flush = true is the
ACT_RESIZE case in domap().

> Thanks Ben for chiming in. As far as I can see md->map is only used
> in the I/O submission path but not in the I/O completion path. So
> why suspend and resume I/O before activating the new map? Do you
> think it would be safe to activate the new map without suspending
> and resuming I/O?

That is the DM state machine.  Before a new table can be swapped in, via
resume, the old map needs to first be suspended.

I appreciate you just hunting for _why_ on this but questioning this
aspect of the userspace <-> kernel ioctl interface between multipathd and
dm-mpath isn't productive.

I'll focus on reviewing 4.7's flag management (if there is anywhere that
queue_if_no_path and the saved_ variant are managed without holding the
lock).  Basically the 4.7 changes that reduced/dropped holding the
m->lock in so many places could have allowed for some race that is
causing must_push_back() to return false (in 4.7) when it previously
returned true (<= 4.6).


* Re: dm-mq and end_clone_request()
  2016-07-26  1:16                 ` Mike Snitzer
  2016-07-26 22:51                   ` Bart Van Assche
@ 2016-07-27 20:09                   ` Mike Snitzer
  2016-07-27 23:05                     ` Bart Van Assche
  1 sibling, 1 reply; 74+ messages in thread
From: Mike Snitzer @ 2016-07-27 20:09 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: device-mapper development, linux-scsi

On Mon, Jul 25 2016 at  9:16pm -0400,
Mike Snitzer <snitzer@redhat.com> wrote:
 
> Hi Bart,
> 
> Please try this patch to see if it fixes your issue, thanks.
> 
> diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
> index 52baf8a..287caa7 100644
> --- a/drivers/md/dm-mpath.c
> +++ b/drivers/md/dm-mpath.c
> @@ -433,10 +433,17 @@ failed:
>   */
>  static int must_push_back(struct multipath *m)
>  {
> -	return (test_bit(MPATHF_QUEUE_IF_NO_PATH, &m->flags) ||
> -		((test_bit(MPATHF_QUEUE_IF_NO_PATH, &m->flags) !=
> -		  test_bit(MPATHF_SAVED_QUEUE_IF_NO_PATH, &m->flags)) &&
> -		 dm_noflush_suspending(m->ti)));
> +	bool r;
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&m->lock, flags);
> +	r = (test_bit(MPATHF_QUEUE_IF_NO_PATH, &m->flags) ||
> +	     ((test_bit(MPATHF_QUEUE_IF_NO_PATH, &m->flags) !=
> +	       test_bit(MPATHF_SAVED_QUEUE_IF_NO_PATH, &m->flags)) &&
> +	      dm_noflush_suspending(m->ti)));
> +	spin_unlock_irqrestore(&m->lock, flags);
> +
> +	return r;
>  }
>  
>  /*

In addition to the above patch, please apply this patch and retest your
4.7 kernel:

diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
index 287caa7..16583c1 100644
--- a/drivers/md/dm-mpath.c
+++ b/drivers/md/dm-mpath.c
@@ -1416,12 +1416,14 @@ static void multipath_postsuspend(struct dm_target *ti)
 static void multipath_resume(struct dm_target *ti)
 {
 	struct multipath *m = ti->private;
+	unsigned long flags;
 
+	spin_lock_irqsave(&m->lock, flags);
 	if (test_bit(MPATHF_SAVED_QUEUE_IF_NO_PATH, &m->flags))
 		set_bit(MPATHF_QUEUE_IF_NO_PATH, &m->flags);
 	else
 		clear_bit(MPATHF_QUEUE_IF_NO_PATH, &m->flags);
-	smp_mb__after_atomic();
+	spin_unlock_irqrestore(&m->lock, flags);
 }
 
 /*


* Re: dm-mq and end_clone_request()
  2016-07-27 20:09                   ` Mike Snitzer
@ 2016-07-27 23:05                     ` Bart Van Assche
  2016-07-28 13:33                       ` Mike Snitzer
  0 siblings, 1 reply; 74+ messages in thread
From: Bart Van Assche @ 2016-07-27 23:05 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: device-mapper development, linux-scsi

On 07/27/2016 01:09 PM, Mike Snitzer wrote:
>  In addition to the above patch, please apply this patch and retest your
> 4.7 kernel:
>
> diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
> index 287caa7..16583c1 100644
> --- a/drivers/md/dm-mpath.c
> +++ b/drivers/md/dm-mpath.c
> @@ -1416,12 +1416,14 @@ static void multipath_postsuspend(struct dm_target *ti)
>  static void multipath_resume(struct dm_target *ti)
>  {
>  	struct multipath *m = ti->private;
> +	unsigned long flags;
>
> +	spin_lock_irqsave(&m->lock, flags);
>  	if (test_bit(MPATHF_SAVED_QUEUE_IF_NO_PATH, &m->flags))
>  		set_bit(MPATHF_QUEUE_IF_NO_PATH, &m->flags);
>  	else
>  		clear_bit(MPATHF_QUEUE_IF_NO_PATH, &m->flags);
> -	smp_mb__after_atomic();
> +	spin_unlock_irqrestore(&m->lock, flags);
>  }
>
>  /*

Hello Mike,

Thanks again for having made this patch available. I will test it as 
soon as I have the time. BTW, in the meantime I ran a few tests with 
DM_MQ_DEFAULT=n since until now I ran all tests with DM_MQ_DEFAULT=y. 
The result of these tests is as follows:
* v4.6.0, v4.6.5 and v4.7.0 with DM_MQ_DEFAULT=y: first simulated path 
removal triggers I/O errors.
* v4.6.4, v4.6.5 and v4.7.0 with DM_MQ_DEFAULT=n: test passes more than 
100 iterations.

I have not yet run any tests with kernel v4.5.x because in the test I 
ran, the ib_srp and ib_srpt drivers are loaded on the same system, and 
because I need five v4.7 LIO patches to make this test pass, but 
unfortunately these patches do not apply cleanly on the v4.5.x code base.

Please let me know if you need more information.

Bart.


* Re: dm-mq and end_clone_request()
  2016-07-27 23:05                     ` Bart Van Assche
@ 2016-07-28 13:33                       ` Mike Snitzer
  2016-07-28 15:23                         ` Bart Van Assche
  0 siblings, 1 reply; 74+ messages in thread
From: Mike Snitzer @ 2016-07-28 13:33 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: device-mapper development, linux-scsi

On Wed, Jul 27 2016 at  7:05pm -0400,
Bart Van Assche <bart.vanassche@sandisk.com> wrote:

> On 07/27/2016 01:09 PM, Mike Snitzer wrote:
> > In addition to the above patch, please apply this patch and retest your
> >4.7 kernel:
> >
> >diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
> >index 287caa7..16583c1 100644
> >--- a/drivers/md/dm-mpath.c
> >+++ b/drivers/md/dm-mpath.c
> >@@ -1416,12 +1416,14 @@ static void multipath_postsuspend(struct dm_target *ti)
> > static void multipath_resume(struct dm_target *ti)
> > {
> > 	struct multipath *m = ti->private;
> >+	unsigned long flags;
> >
> >+	spin_lock_irqsave(&m->lock, flags);
> > 	if (test_bit(MPATHF_SAVED_QUEUE_IF_NO_PATH, &m->flags))
> > 		set_bit(MPATHF_QUEUE_IF_NO_PATH, &m->flags);
> > 	else
> > 		clear_bit(MPATHF_QUEUE_IF_NO_PATH, &m->flags);
> >-	smp_mb__after_atomic();
> >+	spin_unlock_irqrestore(&m->lock, flags);
> > }
> >
> > /*
> 
> Hello Mike,
> 
> Thanks again for having made this patch available. I will test it as
> soon as I have the time. BTW, in the meantime I ran a few tests with
> DM_MQ_DEFAULT=n since until now I ran all tests with
> DM_MQ_DEFAULT=y. The result of these tests is as follows:
> * v4.6.0, v4.6.5 and v4.7.0 with DM_MQ_DEFAULT=y: first simulated
> path removal triggers I/O errors.
> * v4.6.4, v4.6.5 and v4.7.0 with DM_MQ_DEFAULT=n: test passes more
> than 100 iterations.

I think this may point to an SRP issue then.  Is the synthetic "cable
pull" (by writing to /sys/class/srp_remote_ports/port-*/delete)
representative of what actually happens if a cable is physically pulled?

Or is your synthetic method hitting the device way harder than would
happen with an actual production fault?

Again, there hasn't been any report of failures (EIO or otherwise) with
extensive scsi-mq and dm-mq testing on a larger FC testbed.

> I have not yet run any tests with kernel v4.5.x because in the test
> I ran, the ib_srp and ib_srpt drivers are loaded on the same system,
> and because I need five v4.7 LIO patches to make this test pass, but
> unfortunately these patches do not apply cleanly on the v4.5.x code
> base.
> 
> Please let me know if you need more information.

Can the target core be made to use SRP in loopback (local test machine)
mode?  The mptest harness currently defaults to using tcmloop.  Would be
great if I could somehow exercise the SRP code without needing a
full-blown IB setup.

But if there isn't a way to achieve that test coverage I can
probably/hopefully get access to a subset of a larger IB/SRP testbed.

Please advise, thanks.
Mike


* Re: dm-mq and end_clone_request()
  2016-07-28 13:33                       ` Mike Snitzer
@ 2016-07-28 15:23                         ` Bart Van Assche
  2016-07-28 15:40                           ` Mike Snitzer
  0 siblings, 1 reply; 74+ messages in thread
From: Bart Van Assche @ 2016-07-28 15:23 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: device-mapper development, linux-scsi

On 07/28/2016 06:33 AM, Mike Snitzer wrote:
> On Wed, Jul 27 2016 at  7:05pm -0400,
> Bart Van Assche <bart.vanassche@sandisk.com> wrote:
>> Thanks again for having made this patch available. I will test it as
>> soon as I have the time. BTW, in the meantime I ran a few tests with
>> DM_MQ_DEFAULT=n since until now I ran all tests with
>> DM_MQ_DEFAULT=y. The result of these tests is as follows:
>> * v4.6.0, v4.6.5 and v4.7.0 with DM_MQ_DEFAULT=y: first simulated
>> path removal triggers I/O errors.
>> * v4.6.4, v4.6.5 and v4.7.0 with DM_MQ_DEFAULT=n: test passes more
>> than 100 iterations.
>
> I think this may point to an SRP issue then.  Is the synthetic "cable
> pull" (by writing to /sys/class/srp_remote_ports/port-*/delete)
> representative of what actually happens if a cable is physically pulled?
>
> Or is your synthetic method hitting the device way harder than would
> happen with an actual production fault?
>
> Again, there hasn't been any report of failures (EIO or otherwise) with
> extensive scsi-mq and dm-mq testing on a larger FC testbed.

Hello Mike,

Sorry but I disagree that the ib_srp driver would be causing the EIO 
errors because:
* All tests, including the tests that pass, were run with
   CONFIG_SCSI_MQ_DEFAULT=y in the kernel config. The same code paths
   were triggered in the ib_srp driver by all the tests
   (CONFIG_DM_MQ_DEFAULT=y and CONFIG_DM_MQ_DEFAULT=n).
* In my previous e-mails I have shown that the EIO error code is
   generated by the dm-mpath driver after all (SRP) paths have gone. So
   how could the ib_srp driver be involved?

There is an important difference between the SCSI FC drivers and ib_srp: 
after dev_loss_tmo expires FC drivers call scsi_remove_target() while 
the SRP transport layer triggers a call of scsi_remove_host().

Both writing into /sys/class/srp_remote_ports/*/delete and pulling a
cable make the ib_srp driver call scsi_remove_host(). The only 
difference is the timing. With the former method it is more likely that 
the time between submitting I/O and calling scsi_remove_host() is small.

>> I have not yet run any tests with kernel v4.5.x because in the test
>> I ran, the ib_srp and ib_srpt drivers are loaded on the same system,
>> and because I need five v4.7 LIO patches to make this test pass, but
>> unfortunately these patches do not apply cleanly on the v4.5.x code
>> base.
>>
>> Please let me know if you need more information.
>
> Can the target core be made to use SRP in loopback (local test machine)
> mode?  The mptest harness currently defaults to using tcmloop.  Would be
> great if I could somehow exercise the SRP code without needing a
> full-blown IB setup.
>
> But if there isn't a way to achieve that test coverage I can
> probably/hopefully get access to a subset of a larger IB/SRP testbed.

All InfiniBand HCAs that I have encountered so far support loopback as 
long as at least one HCA port is up (either connected to a switch or 
connected to another HCA port and opensm is running against one of these 
two ports).

The scripts I used to test the ib_srp driver are available at 
https://github.com/bvanassche/srp-test.

Thanks,

Bart.


* Re: dm-mq and end_clone_request()
  2016-07-28 15:23                         ` Bart Van Assche
@ 2016-07-28 15:40                           ` Mike Snitzer
  2016-07-29  6:28                             ` [dm-devel] " Hannes Reinecke
  0 siblings, 1 reply; 74+ messages in thread
From: Mike Snitzer @ 2016-07-28 15:40 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: device-mapper development, linux-scsi

On Thu, Jul 28 2016 at 11:23am -0400,
Bart Van Assche <bart.vanassche@sandisk.com> wrote:

> On 07/28/2016 06:33 AM, Mike Snitzer wrote:
> >On Wed, Jul 27 2016 at  7:05pm -0400,
> >Bart Van Assche <bart.vanassche@sandisk.com> wrote:
> >>Thanks again for having made this patch available. I will test it as
> >>soon as I have the time. BTW, in the meantime I ran a few tests with
> >>DM_MQ_DEFAULT=n since until now I ran all tests with
> >>DM_MQ_DEFAULT=y. The result of these tests is as follows:
> >>* v4.6.0, v4.6.5 and v4.7.0 with DM_MQ_DEFAULT=y: first simulated
> >>path removal triggers I/O errors.
> >>* v4.6.4, v4.6.5 and v4.7.0 with DM_MQ_DEFAULT=n: test passes more
> >>than 100 iterations.
> >
> >I think this may point to an SRP issue then.  Is the synthetic "cable
> >pull" (by writing to /sys/class/srp_remote_ports/port-*/delete)
> >representative of what actually happens if a cable is physically pulled?
> >
> >Or is your synthetic method hitting the device way harder than would
> >happen with an actual production fault?
> >
> >Again, there hasn't been any report of failures (EIO or otherwise) with
> >extensive scsi-mq and dm-mq testing on a larger FC testbed.
> 
> Hello Mike,
> 
> Sorry but I disagree that the ib_srp driver would be causing the EIO
> errors because:
> * All tests, including the tests that pass, were run with
>   CONFIG_SCSI_MQ_DEFAULT=y in the kernel config. The same code paths
>   were triggered in the ib_srp driver by all the tests
>   (CONFIG_DM_MQ_DEFAULT=y and CONFIG_DM_MQ_DEFAULT=n).
> * In my previous e-mails I have shown that the EIO error code is
>   generated by the dm-mpath driver after all (SRP) paths have gone. So
>   how could the ib_srp driver be involved?
> 
> There is an important difference between the SCSI FC drivers and
> ib_srp: after dev_loss_tmo expires FC drivers call
> scsi_remove_target() while the SRP transport layer triggers a call
> of scsi_remove_host().
> 
> Both writing into /sys/class/srp_remote_ports/*/delete and pulling a
> cable make the ib_srp driver call scsi_remove_host(). The only
> difference is the timing. With the former method it is more likely
> that the time between submitting I/O and calling scsi_remove_host()
> is small.

Reality is I just need a testbed to reproduce.  This back and forth
isn't really helping us converge on _why_ must_push_back() is returning
false for your case.  I need to know what exactly is causing that method
to return false in your case.

As is, hard to see why blk-mq vs .request_fn interface for DM mpath
device would cause must_push_back() to return false vs true.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [dm-devel] dm-mq and end_clone_request()
  2016-07-28 15:40                           ` Mike Snitzer
@ 2016-07-29  6:28                             ` Hannes Reinecke
  0 siblings, 0 replies; 74+ messages in thread
From: Hannes Reinecke @ 2016-07-29  6:28 UTC (permalink / raw)
  To: Mike Snitzer, Bart Van Assche; +Cc: device-mapper development, linux-scsi

On 07/28/2016 05:40 PM, Mike Snitzer wrote:
> On Thu, Jul 28 2016 at 11:23am -0400,
> Bart Van Assche <bart.vanassche@sandisk.com> wrote:
> 
>> On 07/28/2016 06:33 AM, Mike Snitzer wrote:
[ .. ]
> 
> Reality is I just need a testbed to reproduce.  This back and forth
> isn't really helping us converge on _why_ must_push_back() is returning
> false for your case.  I need to know what exactly is causing that method
> to return false in your case.
> 
> As is, hard to see why blk-mq vs .request_fn interface for DM mpath
> device would cause must_push_back() to return false vs true.
> 
I wonder if that isn't the same issue I've seen (and tried to discuss at
LSF): hitting the printk in blk_cloned_rq_check_limits().
If I were to hazard a guess, I'd say that the queue limits become
temporarily invalidated during failover, and we manage to submit an
I/O at just that time.
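
For context, the check in question looks roughly like this in that
era's block core (paraphrased from memory, not copied from the tree); a
cloned request that momentarily exceeds the destination queue's limits
is failed with -EIO:

#include <linux/blkdev.h>

static int blk_cloned_rq_check_limits(struct request_queue *q,
				      struct request *rq)
{
	if (blk_rq_sectors(rq) > blk_queue_get_max_sectors(q, rq->cmd_flags)) {
		printk(KERN_ERR "%s: over max size limit.\n", __func__);
		return -EIO;
	}

	/* Recount segments after cloning; the destination queue may
	 * allow fewer physical segments than the source queue did. */
	blk_recalc_rq_segments(rq);
	if (rq->nr_phys_segments > queue_max_segments(q)) {
		printk(KERN_ERR "%s: over max segments limit.\n", __func__);
		return -EIO;
	}

	return 0;
}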

I am currently working on getting FCoE to run over virtio; if that works
it should give us a good synthetic testbed for reproducing this.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: dm-mq and end_clone_request()
       [not found]                         ` <17da3ab0-233a-2cec-f921-bfd42c953ccc@sandisk.com>
@ 2016-08-01 17:59                           ` Mike Snitzer
  2016-08-01 18:55                             ` Bart Van Assche
  0 siblings, 1 reply; 74+ messages in thread
From: Mike Snitzer @ 2016-08-01 17:59 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: Laurence Oberman, dm-devel, linux-scsi

With this debug patch on top of v4.7:

diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
index 52baf8a..22baf29 100644
--- a/drivers/md/dm-mpath.c
+++ b/drivers/md/dm-mpath.c
@@ -433,10 +433,22 @@ failed:
  */
 static int must_push_back(struct multipath *m)
 {
+	bool queue_if_no_path = test_bit(MPATHF_QUEUE_IF_NO_PATH, &m->flags);
+	bool suspend_active = (test_bit(MPATHF_QUEUE_IF_NO_PATH, &m->flags) !=
+			       test_bit(MPATHF_SAVED_QUEUE_IF_NO_PATH, &m->flags));
+	bool suspending = (suspend_active && dm_noflush_suspending(m->ti));
+
+#if 0
 	return (test_bit(MPATHF_QUEUE_IF_NO_PATH, &m->flags) ||
 		((test_bit(MPATHF_QUEUE_IF_NO_PATH, &m->flags) !=
 		  test_bit(MPATHF_SAVED_QUEUE_IF_NO_PATH, &m->flags)) &&
 		 dm_noflush_suspending(m->ti)));
+#else
+	if (!queue_if_no_path || !suspending)
+		DMERR_LIMIT("%s: queue_if_no_path=%d suspend_active=%d suspending=%d",
+			    __func__, queue_if_no_path, suspend_active, suspending);
+	return (queue_if_no_path || suspending);
+#endif
 }
 
 /*
@@ -459,7 +471,7 @@ static int __multipath_map(struct dm_target *ti, struct request *clone,
 		pgpath = choose_pgpath(m, nr_bytes);
 
 	if (!pgpath) {
-		if (!must_push_back(m))
+		if (WARN_ON_ONCE(!must_push_back(m)))
 			r = -EIO;	/* Failed */
 		return r;
 	} else if (test_bit(MPATHF_QUEUE_IO, &m->flags) ||
@@ -1347,7 +1359,7 @@ static int do_end_io(struct multipath *m, struct request *clone,
 
 	if (!atomic_read(&m->nr_valid_paths)) {
 		if (!test_bit(MPATHF_QUEUE_IF_NO_PATH, &m->flags)) {
-			if (!must_push_back(m))
+			if (WARN_ON_ONCE(!must_push_back(m)))
 				r = -EIO;
 		} else {
 			if (error == -EBADE)

I got:

Aug  1 13:41:48 client kernel: blk_update_request: I/O error, dev sdbf, sector 34376
Aug  1 13:41:48 client kernel: blk_update_request: I/O error, dev sdbf, sector 183608
Aug  1 13:41:48 client kernel: device-mapper: multipath: Failing path 67:144.
Aug  1 13:41:48 client kernel: blk_update_request: I/O error, dev sdbf, sector 237352
Aug  1 13:41:48 client kernel: device-mapper: multipath: must_push_back: queue_if_no_path=0 suspend_active=1 suspending=1
Aug  1 13:41:48 client kernel: device-mapper: multipath: must_push_back: queue_if_no_path=0 suspend_active=1 suspending=1
Aug  1 13:41:48 client kernel: device-mapper: multipath: must_push_back: queue_if_no_path=0 suspend_active=1 suspending=1
Aug  1 13:41:48 client kernel: device-mapper: multipath: must_push_back: queue_if_no_path=0 suspend_active=1 suspending=1
Aug  1 13:41:48 client kernel: device-mapper: multipath: must_push_back: queue_if_no_path=0 suspend_active=1 suspending=0
Aug  1 13:41:48 client kernel: ------------[ cut here ]------------
Aug  1 13:41:48 client kernel: WARNING: CPU: 10 PID: 6445 at drivers/md/dm-mpath.c:474 __multipath_map.isra.11+0xb7/0x240 [dm_multipath]
Aug  1 13:41:48 client kernel: Modules linked in: dm_round_robin(O) xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw iptable_filter rpcrdma ib_isert iscsi_target_mod target_core_mod ib_iser libiscsi scsi_transport_iscsi ib_srp scsi_transport_srp ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel glue_helper
Aug  1 13:41:48 client kernel: lrw gf128mul ablk_helper cryptd ipmi_ssif iTCO_wdt iTCO_vendor_support sg ipmi_si hpilo pcspkr ipmi_msghandler acpi_power_meter hpwdt lpc_ich shpchp i7core_edac mfd_core edac_core pcc_cpufreq acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd grace sunrpc dm_multipath(O) ip_tables xfs libcrc32c mlx5_ib ib_core sd_mod radeon i2c_algo_bit drm_kms_helper syscopyarea sysfillrect mlx5_core sysimgblt fb_sys_fops ttm drm crc32c_intel serio_raw bnx2 hpsa i2c_core ptp pps_core scsi_transport_sas dm_mod(O) [last unloaded: dm_log]
Aug  1 13:41:48 client kernel: CPU: 10 PID: 6445 Comm: kworker/10:1H Tainted: G          IO    4.7.0.snitm #1
Aug  1 13:41:48 client kernel: Hardware name: HP ProLiant DL380 G7, BIOS P67 08/16/2015
Aug  1 13:41:48 client kernel: Workqueue: kblockd blk_mq_requeue_work
Aug  1 13:41:48 client kernel: 0000000000000286 000000008737cd27 ffff880607ecfbb8 ffffffff8134259f
Aug  1 13:41:48 client kernel: 0000000000000000 0000000000000000 ffff880607ecfbf8 ffffffff81088fb1
Aug  1 13:41:48 client kernel: 000001da00000000 ffff880bd0393100 0000000000000000 0000000000000000
Aug  1 13:41:48 client kernel: Call Trace:
Aug  1 13:41:48 client kernel: [<ffffffff8134259f>] dump_stack+0x63/0x84
Aug  1 13:41:48 client kernel: [<ffffffff81088fb1>] __warn+0xd1/0xf0
Aug  1 13:41:48 client kernel: [<ffffffff810890ed>] warn_slowpath_null+0x1d/0x20
Aug  1 13:41:48 client kernel: [<ffffffffa0104e27>] __multipath_map.isra.11+0xb7/0x240 [dm_multipath]
Aug  1 13:41:48 client kernel: [<ffffffffa0104fca>] multipath_clone_and_map+0x1a/0x20 [dm_multipath]
Aug  1 13:41:48 client kernel: [<ffffffffa0002242>] map_request+0xd2/0x240 [dm_mod]
Aug  1 13:41:48 client kernel: [<ffffffffa000407e>] dm_mq_queue_rq+0x7e/0x110 [dm_mod]
Aug  1 13:41:48 client kernel: [<ffffffff81320d32>] __blk_mq_run_hw_queue+0x1f2/0x370
Aug  1 13:41:48 client kernel: [<ffffffff81320b25>] blk_mq_run_hw_queue+0x95/0xb0
Aug  1 13:41:48 client kernel: [<ffffffff81322438>] ? blk_mq_insert_request+0x88/0xc0
Aug  1 13:41:48 client kernel: [<ffffffff81321ba5>] blk_mq_start_hw_queue+0x15/0x20
Aug  1 13:41:48 client kernel: [<ffffffff81321be2>] blk_mq_start_hw_queues+0x32/0x50
Aug  1 13:41:48 client kernel: [<ffffffff813229e5>] blk_mq_requeue_work+0x115/0x140
Aug  1 13:41:48 client kernel: [<ffffffff810a1e72>] process_one_work+0x152/0x400
Aug  1 13:41:48 client kernel: [<ffffffff810a2765>] worker_thread+0x125/0x4b0
Aug  1 13:41:48 client kernel: [<ffffffff810a2640>] ? rescuer_thread+0x380/0x380
Aug  1 13:41:48 client kernel: [<ffffffff810a8298>] kthread+0xd8/0xf0
Aug  1 13:41:48 client kernel: [<ffffffff816ba17f>] ret_from_fork+0x1f/0x40
Aug  1 13:41:48 client kernel: [<ffffffff810a81c0>] ? kthread_park+0x60/0x60
Aug  1 13:41:48 client kernel: ---[ end trace 1be159facc3adabe ]---
Aug  1 13:41:48 client kernel: blk_update_request: I/O error, dev dm-28, sector 267144
Aug  1 13:41:48 client multipathd: 360001ff0b035d0000000001a8d8a001b: load table [0 62277025792 multipath 1 queue_if_no_path 0 0 0]
Aug  1 13:41:48 client multipathd: 360001ff0b035d0000000001a8d8a001b: Entering recovery mode: max_retries=12

This says to me that must_push_back is returning false because
dm_noflush_suspending() is false.  When this happens -EIO will escape up
the IO stack.

And this confirms that must_push_back() calling dm_noflush_suspending()
is quite suspect given queue_if_no_path was configured: we should
_always_ pushback if no paths are available.

I'll dig deeper on really understanding _why_ must_push_back() is coded
like it is.  There is a deep historic reason but hell if I can recall
what it is...

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* Re: dm-mq and end_clone_request()
  2016-08-01 17:59                           ` Mike Snitzer
@ 2016-08-01 18:55                             ` Bart Van Assche
  2016-08-01 19:15                               ` Mike Snitzer
  2016-08-01 20:46                               ` Mike Snitzer
  0 siblings, 2 replies; 74+ messages in thread
From: Bart Van Assche @ 2016-08-01 18:55 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: Laurence Oberman, dm-devel, linux-scsi

On 08/01/2016 10:59 AM, Mike Snitzer wrote:
> This says to me that must_push_back is returning false because
> dm_noflush_suspending() is false.  When this happens -EIO will escape up
> the IO stack.
>
> And this confirms that must_push_back() calling dm_noflush_suspending()
> is quite suspect given queue_if_no_path was configured: we should
> _always_ pushback if no paths are available.
>
> I'll dig deeper on really understanding _why_ must_push_back() is coded
> like it is.

Hello Mike,

Earlier I had reported that I observe this behavior with 
CONFIG_DM_MQ_DEFAULT=y after the first simulated cable pull. I have been 
able to reproduce this behavior with CONFIG_DM_MQ_DEFAULT=n but it takes 
a large number of iterations to trigger this behavior. The output that 
appears on my setup in the kernel log with a bunch of printk()'s added 
in the dm-mpath driver for CONFIG_DM_MQ_DEFAULT=n is as follows (mpath 
254:0 and /dev/mapper/mpathbe refer to the same multipath device):

[  314.755582] mpath 254:0: queue_if_no_path 0 -> 1
[  314.770571] executing DM ioctl DEV_SUSPEND on mpathbe
[  314.770622] mpath 254:0: queue_if_no_path 1 -> 0
[  314.770657] __multipath_map(): (a) returning -5
[  314.770657] map_request(): clone_and_map_rq() returned -5
[  314.770658] dm_complete_request: error = -5

Bart.


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: dm-mq and end_clone_request()
  2016-08-01 18:55                             ` Bart Van Assche
@ 2016-08-01 19:15                               ` Mike Snitzer
  2016-08-01 20:46                               ` Mike Snitzer
  1 sibling, 0 replies; 74+ messages in thread
From: Mike Snitzer @ 2016-08-01 19:15 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: Laurence Oberman, dm-devel, linux-scsi

On Mon, Aug 01 2016 at  2:55pm -0400,
Bart Van Assche <bart.vanassche@sandisk.com> wrote:

> On 08/01/2016 10:59 AM, Mike Snitzer wrote:
> >This says to me that must_push_back is returning false because
> >dm_noflush_suspending() is false.  When this happens -EIO will escape up
> >the IO stack.
> >
> >And this confirms that must_push_back() calling dm_noflush_suspending()
> >is quite suspect given queue_if_no_path was configured: we should
> >_always_ pushback if no paths are available.
> >
> >I'll dig deeper on really understanding _why_ must_push_back() is coded
> >like it is.
> 
> Hello Mike,
> 
> Earlier I had reported that I observe this behavior with
> CONFIG_DM_MQ_DEFAULT=y after the first simulated cable pull. I have
> been able to reproduce this behavior with CONFIG_DM_MQ_DEFAULT=n but
> it takes a large number of iterations to trigger this behavior. The
> output that appears on my setup in the kernel log with a bunch of
> printk()'s added in the dm-mpath driver for CONFIG_DM_MQ_DEFAULT=n
> is as follows (mpath 254:0 and /dev/mapper/mpathbe refer to the same
> multipath device):
> 
> [  314.755582] mpath 254:0: queue_if_no_path 0 -> 1
> [  314.770571] executing DM ioctl DEV_SUSPEND on mpathbe
> [  314.770622] mpath 254:0: queue_if_no_path 1 -> 0
> [  314.770657] __multipath_map(): (a) returning -5
> [  314.770657] map_request(): clone_and_map_rq() returned -5
> [  314.770658] dm_complete_request: error = -5

OK, that makes no sense at all (nevermind that your trace really isn't
useful, my debug patch is more so).

The old .request_fn code checks !blk_queue_stopped() in dm_request_fn().
When a dm-mpath device is suspended the queue gets stopped (same goes
for dm-mq).  So once a request-based DM device is suspended the .map_rq
and .clone_and_map_rq hooks shouldn't get called.

BUT the difference between .request_fn and dm-mq is that dm-mq doesn't
ever check if the queue is stopped.  We rely on blk-core to ensure that
IO is not submitted (via dm_mq_queue_rq) while a device is suspended
(and because the queue is stopped it shouldn't be).

Now you're saying both cases aren't working... which is really confusing.
Basically the request-based DM core should protect against requests
being mapped while a device is suspended... yet there seems to be mounting
evidence that isn't the case.
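
To make the asymmetry concrete, here is a rough sketch (not the actual
v4.7 source; bodies elided and names suffixed with _sketch to mark them
as illustrative) of the two dispatch paths:

#include <linux/blkdev.h>
#include <linux/blk-mq.h>

/* Legacy .request_fn path: re-checks the stopped flag on every
 * iteration, so dispatch halts once suspend stops the queue. */
static void dm_request_fn_sketch(struct request_queue *q)
{
	while (!blk_queue_stopped(q)) {
		/* ... peek a request and map it via .clone_and_map_rq ... */
	}
}

/* dm-mq path: no equivalent check; it trusts blk-mq core not to call
 * .queue_rq once blk_mq_stop_hw_queues() has run during suspend. */
static int dm_mq_queue_rq_sketch(struct blk_mq_hw_ctx *hctx,
				 const struct blk_mq_queue_data *bd)
{
	/* ... map the request unconditionally ... */
	return BLK_MQ_RQ_QUEUE_OK;
}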

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: dm-mq and end_clone_request()
  2016-08-01 18:55                             ` Bart Van Assche
  2016-08-01 19:15                               ` Mike Snitzer
@ 2016-08-01 20:46                               ` Mike Snitzer
  2016-08-01 22:41                                   ` Bart Van Assche
  1 sibling, 1 reply; 74+ messages in thread
From: Mike Snitzer @ 2016-08-01 20:46 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: Laurence Oberman, dm-devel, linux-scsi

On Mon, Aug 01 2016 at  2:55pm -0400,
Bart Van Assche <bart.vanassche@sandisk.com> wrote:

> On 08/01/2016 10:59 AM, Mike Snitzer wrote:
> >This says to me that must_push_back is returning false because
> >dm_noflush_suspending() is false.  When this happens -EIO will escape up
> >the IO stack.
> >
> >And this confirms that must_push_back() calling dm_noflush_suspending()
> >is quite suspect given queue_if_no_path was configured: we should
> >_always_ pushback if no paths are available.
> >
> >I'll dig deeper on really understanding _why_ must_push_back() is coded
> >like it is.
> 
> Hello Mike,
> 
> Earlier I had reported that I observe this behavior with
> CONFIG_DM_MQ_DEFAULT=y after the first simulated cable pull. I have been
> able to reproduce this behavior with CONFIG_DM_MQ_DEFAULT=n but it takes a
> large number of iterations to trigger this behavior. The output that appears
> on my setup in the kernel log with a bunch of printk()'s added in the
> dm-mpath driver for CONFIG_DM_MQ_DEFAULT=n is as follows (mpath 254:0 and
> /dev/mapper/mpathbe refer to the same multipath device):
> 
> [  314.755582] mpath 254:0: queue_if_no_path 0 -> 1
> [  314.770571] executing DM ioctl DEV_SUSPEND on mpathbe
> [  314.770622] mpath 254:0: queue_if_no_path 1 -> 0
> [  314.770657] __multipath_map(): (a) returning -5
> [  314.770657] map_request(): clone_and_map_rq() returned -5
> [  314.770658] dm_complete_request: error = -5

Hi Bart,

Please retry both variants (CONFIG_DM_MQ_DEFAULT=y first) with this patch
applied.  Interested to see if things look better for you (WARN_ON_ONCEs
added just to see if we hit the corresponding suspend/stopped state
while mapping requests -- if so this speaks to an inherently racy
problem that will need further investigation for a proper fix but
results from this should let us know if we're closer).

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 1b2f962..0e0f6e0 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -2007,6 +2007,9 @@ static int map_request(struct dm_rq_target_io *tio, struct request *rq,
 	struct dm_target *ti = tio->ti;
 	struct request *clone = NULL;
 
+	if (WARN_ON_ONCE(unlikely(dm_suspended_md(md))))
+		return DM_MAPIO_REQUEUE;
+
 	if (tio->clone) {
 		clone = tio->clone;
 		r = ti->type->map_rq(ti, clone, &tio->info);
@@ -2722,6 +2725,9 @@ static int dm_mq_queue_rq(struct blk_mq_hw_ctx *hctx,
 		dm_put_live_table(md, srcu_idx);
 	}
 
+	if (WARN_ON_ONCE(unlikely(test_bit(BLK_MQ_S_STOPPED, &hctx->state))))
+		return BLK_MQ_RQ_QUEUE_BUSY;
+
 	if (ti->type->busy && ti->type->busy(ti))
 		return BLK_MQ_RQ_QUEUE_BUSY;
 

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* Re: dm-mq and end_clone_request()
  2016-08-01 20:46                               ` Mike Snitzer
@ 2016-08-01 22:41                                   ` Bart Van Assche
  0 siblings, 0 replies; 74+ messages in thread
From: Bart Van Assche @ 2016-08-01 22:41 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: dm-devel, Laurence Oberman, linux-scsi

On 08/01/2016 01:46 PM, Mike Snitzer wrote:
> Please retry both variants (CONFIG_DM_MQ_DEFAULT=y first) with this patch
> applied.  Interested to see if things look better for you (WARN_ON_ONCEs
> added just to see if we hit the corresponding suspend/stopped state
> while mapping requests -- if so this speaks to an inherently racy
> problem that will need further investigation for a proper fix but
> results from this should let us know if we're closer).
> 
> diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> index 1b2f962..0e0f6e0 100644
> --- a/drivers/md/dm.c
> +++ b/drivers/md/dm.c
> @@ -2007,6 +2007,9 @@ static int map_request(struct dm_rq_target_io *tio, struct request *rq,
>  	struct dm_target *ti = tio->ti;
>  	struct request *clone = NULL;
>  
> +	if (WARN_ON_ONCE(unlikely(dm_suspended_md(md))))
> +		return DM_MAPIO_REQUEUE;
> +
>  	if (tio->clone) {
>  		clone = tio->clone;
>  		r = ti->type->map_rq(ti, clone, &tio->info);
> @@ -2722,6 +2725,9 @@ static int dm_mq_queue_rq(struct blk_mq_hw_ctx *hctx,
>  		dm_put_live_table(md, srcu_idx);
>  	}
>  
> +	if (WARN_ON_ONCE(unlikely(test_bit(BLK_MQ_S_STOPPED, &hctx->state))))
> +		return BLK_MQ_RQ_QUEUE_BUSY;
> +
>  	if (ti->type->busy && ti->type->busy(ti))
>  		return BLK_MQ_RQ_QUEUE_BUSY;

Hello Mike,

The test results with this patch and also the three other patches that
have been posted in the context of this e-mail thread applied on top of
kernel v4.7 are as follows:

(1) CONFIG_DM_MQ_DEFAULT=y and fio running on top of XFS:

From the system log:

[ ... ]
mpath 254:0: queue_if_no_path 0 -> 1
executing DM ioctl DEV_SUSPEND on mpathbe
mpath 254:0: queue_if_no_path 1 -> 0
__multipath_map(): (a) returning -5
map_request(): clone_and_map_rq() returned -5
dm_complete_request: error = -5
dm_softirq_done: dm-0 tio->error = -5
blk_update_request: I/O error (-5), dev dm-0, sector 311960
[ ... ]

After this test finished, "dmsetup remove_all" failed and the following
message appeared in the system log: "device-mapper: ioctl: remove_all
left 1 open device(s)".

Note: when I reran this test after a reboot "dmsetup remove_all" succeeded.


(2) CONFIG_DM_MQ_DEFAULT=y and fio running on top of ext4:

From the system log:
[ ... ]
[  146.023067] WARNING: CPU: 2 PID: 482 at drivers/md/dm.c:2748 dm_mq_queue_rq+0xc1/0x150 [dm_mod]
[  146.026073] Workqueue: kblockd blk_mq_run_work_fn
[  146.026083] Call Trace:
[  146.026087]  [<ffffffff81320047>] dump_stack+0x68/0xa1
[  146.026090]  [<ffffffff81061c46>] __warn+0xc6/0xe0
[  146.026092]  [<ffffffff81061d18>] warn_slowpath_null+0x18/0x20
[  146.026098]  [<ffffffffa0286791>] dm_mq_queue_rq+0xc1/0x150 [dm_mod]
[  146.026100]  [<ffffffff81306f7a>] __blk_mq_run_hw_queue+0x1da/0x350
[  146.026102]  [<ffffffff813076c0>] blk_mq_run_work_fn+0x10/0x20
[  146.026105]  [<ffffffff8107efe9>] process_one_work+0x1f9/0x6a0
[  146.026109]  [<ffffffff8107f4d9>] worker_thread+0x49/0x490
[  146.026116]  [<ffffffff81085cda>] kthread+0xea/0x100
[  146.026119]  [<ffffffff81624fbf>] ret_from_fork+0x1f/0x40
[ ... ]
[  146.269194] mpath 254:1: queue_if_no_path 0 -> 1
[  146.276502] executing DM ioctl DEV_SUSPEND on mpathbf
[  146.276556] mpath 254:1: queue_if_no_path 1 -> 0
[  146.276560] __multipath_map(): (a) returning -5
[  146.276561] map_request(): clone_and_map_rq() returned -5
[  146.276562] dm_complete_request: error = -5
[  146.276563] dm_softirq_done: dm-1 tio->error = -5
[  146.276566] blk_update_request: I/O error (-5), dev dm-1, sector 2097144
[ ... ]

After this test finished running "dmsetup remove_all" and unloading ib_srp
succeeded.


(3) CONFIG_DM_MQ_DEFAULT=n and fio running on top of XFS:

The first run of this test passed. During the second run fio reported
an I/O error. From the system log:

[ ... ]
[ 1290.010886] mpath 254:0: queue_if_no_path 0 -> 1
[ 1290.026905] executing DM ioctl DEV_SUSPEND on mpathbe
[ 1290.026960] mpath 254:0: queue_if_no_path 1 -> 0
[ 1290.027001] __multipath_map(): (a) returning -5
[ 1290.027002] map_request(): clone_and_map_rq() returned -5
[ 1290.027003] dm_complete_request: error = -5
[ ... ]


(4) CONFIG_DM_MQ_DEFAULT=n and fio running on top of ext4:

The first two runs of this test passed. After the second run "dmsetup
remove_all" failed and the following error message appeared in the system
log: "device-mapper: ioctl: remove_all left 1 open device(s)". The following
kernel thread might be the one that was holding open /dev/dm-0:

# ps aux | grep dio/
root      5306  0.0  0.0      0     0 ?        S<   15:24   0:00 [dio/dm-0]


Please let me know if you need more information.

Bart.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: dm-mq and end_clone_request()
  2016-08-01 22:41                                   ` Bart Van Assche
@ 2016-08-02 17:45                                   ` Mike Snitzer
  2016-08-03  0:19                                     ` Bart Van Assche
  -1 siblings, 1 reply; 74+ messages in thread
From: Mike Snitzer @ 2016-08-02 17:45 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: dm-devel, Laurence Oberman, linux-scsi

On Mon, Aug 01 2016 at  6:41pm -0400,
Bart Van Assche <bart.vanassche@sandisk.com> wrote:

> On 08/01/2016 01:46 PM, Mike Snitzer wrote:
> > Please retry both variants (CONFIG_DM_MQ_DEFAULT=y first) with this patch
> > applied.  Interested to see if things look better for you (WARN_ON_ONCEs
> > added just to see if we hit the corresponding suspend/stopped state
> > while mapping requests -- if so this speaks to an inherently racy
> > problem that will need further investigation for a proper fix but
> > results from this should let us know if we're closer).
> > 
> > diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> > index 1b2f962..0e0f6e0 100644
> > --- a/drivers/md/dm.c
> > +++ b/drivers/md/dm.c
> > @@ -2007,6 +2007,9 @@ static int map_request(struct dm_rq_target_io *tio, struct request *rq,
> >  	struct dm_target *ti = tio->ti;
> >  	struct request *clone = NULL;
> >  
> > +	if (WARN_ON_ONCE(unlikely(dm_suspended_md(md))))
> > +		return DM_MAPIO_REQUEUE;
> > +
> >  	if (tio->clone) {
> >  		clone = tio->clone;
> >  		r = ti->type->map_rq(ti, clone, &tio->info);
> > @@ -2722,6 +2725,9 @@ static int dm_mq_queue_rq(struct blk_mq_hw_ctx *hctx,
> >  		dm_put_live_table(md, srcu_idx);
> >  	}
> >  
> > +	if (WARN_ON_ONCE(unlikely(test_bit(BLK_MQ_S_STOPPED, &hctx->state))))
> > +		return BLK_MQ_RQ_QUEUE_BUSY;
> > +
> >  	if (ti->type->busy && ti->type->busy(ti))
> >  		return BLK_MQ_RQ_QUEUE_BUSY;
> 
> Hello Mike,
> 
> The test results with this patch and also the three other patches that
> have been posted in the context of this e-mail thread applied on top of
> kernel v4.7 are as follows:

Hi Bart,

Please do these same tests against a v4.7 kernel with the 4 patches from
this branch applied (no need for your other debug patches):
https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/log/?h=dm-4.7-mpath-fixes
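
For reference, one way to apply them is to fetch the branch directly
(repo URL inferred from the cgit link above):

git fetch git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git dm-4.7-mpath-fixes
git checkout -b dm-4.7-mpath-fixes FETCH_HEAD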

I've had good results with my blk-mq SRP based testing.

Thanks,
Mike

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: dm-mq and end_clone_request()
  2016-08-02 17:45                                   ` Mike Snitzer
@ 2016-08-03  0:19                                     ` Bart Van Assche
  2016-08-03  0:40                                       ` Mike Snitzer
  0 siblings, 1 reply; 74+ messages in thread
From: Bart Van Assche @ 2016-08-03  0:19 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: dm-devel, Laurence Oberman, linux-scsi

On 08/02/2016 10:45 AM, Mike Snitzer wrote:
> Please do these same tests against a v4.7 kernel with the 4 patches from
> this branch applied (no need for your other debug patches):
> https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/log/?h=dm-4.7-mpath-fixes
> 
> I've had good results with my blk-mq SRP based testing.

Hello Mike,

Thanks again for having made these patches available. The results of my
tests are as follows:

(1) CONFIG_DM_MQ_DEFAULT=y, fio running on top of xfs

The first simulated cable pull caused the following messages to appear:

[  428.716566] mpath 254:1: queue_if_no_path 1 -> 0
[  428.729671] __multipath_map(): (a) returning -5
[  428.729730] map_request(): clone_and_map_rq() returned -5
[  428.729788] dm_complete_request: error = -5
[  428.729846] dm_softirq_done: dm-1 tio->error = -5
[  428.729904] blk_update_request: 880 callbacks suppressed
[  428.729970] blk_update_request: I/O error (-5), dev dm-1, sector 2097024

(2) CONFIG_DM_MQ_DEFAULT=y, fio running on top of ext4

The first simulated cable pull caused the following messages to appear:

[  162.894737] mpath 254:0: queue_if_no_path 0 -> 1
[  162.903155] executing DM ioctl DEV_SUSPEND on mpathbe
[  162.903207] mpath 254:0: queue_if_no_path 1 -> 0
[  162.903255] device-mapper: multipath: must_push_back: queue_if_no_path=0 suspend_active=1 suspending=0
[  162.903256] __multipath_map(): (a) returning -5
[  162.903257] map_request(): clone_and_map_rq() returned -5
[  162.903258] dm_complete_request: error = -5
[  162.903259] dm_softirq_done: dm-0 tio->error = -5
[  162.903261] blk_update_request: I/O error (-5), dev dm-0, sector 263424
[  162.903284] Buffer I/O error on dev dm-0, logical block 32928, lost sync page write

(3) CONFIG_DM_MQ_DEFAULT=n, fio running on top of xfs

This test passed once but on the second run fio reported "bad magic header" after
a large number of iterations. I'm still analyzing the logs.

(4) CONFIG_DM_MQ_DEFAULT=n, fio running on top of ext4

Ran this test three times. The first two runs passed, but during the third run
fio again reported I/O errors and I found the following in the kernel log:

[  954.048860] __multipath_map(): (a) returning -5
[  954.048861] map_request(): clone_and_map_rq() returned -5
[  954.048862] dm_complete_request: error = -5
[  954.048870] dm_softirq_done: dm-0 tio->error = -5
[  954.048873] blk_update_request: 15 callbacks suppressed
[  954.048874] blk_update_request: I/O error (-5), dev dm-0, sector 159976

Bart.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: dm-mq and end_clone_request()
  2016-08-03  0:19                                     ` Bart Van Assche
@ 2016-08-03  0:40                                       ` Mike Snitzer
  2016-08-03  1:33                                         ` Laurence Oberman
  2016-08-03 16:55                                         ` Bart Van Assche
  0 siblings, 2 replies; 74+ messages in thread
From: Mike Snitzer @ 2016-08-03  0:40 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: dm-devel, Laurence Oberman, linux-scsi

On Tue, Aug 02 2016 at  8:19pm -0400,
Bart Van Assche <bart.vanassche@sandisk.com> wrote:

> On 08/02/2016 10:45 AM, Mike Snitzer wrote:
> > Please do these same tests against a v4.7 kernel with the 4 patches from
> > this branch applied (no need for your other debug patches):
> > https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/log/?h=dm-4.7-mpath-fixes
> > 
> > I've had good results with my blk-mq SRP based testing.
> 
> Hello Mike,
> 
> Thanks again for having made these patches available. The results of my
> tests are as follows:

Disappointing.  But I asked you to run the v4.7 kernel patches I
pointed to _without_ any of your debug patches.

I cannot reproduce on our SRP testbed with the fixes I provided.  We're
now in a place where there would appear to be something very unique to
your environment causing these failures.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: dm-mq and end_clone_request()
  2016-08-03  0:40                                       ` Mike Snitzer
@ 2016-08-03  1:33                                         ` Laurence Oberman
  2016-08-03  2:10                                           ` Mike Snitzer
  2016-08-03 16:06                                           ` Bart Van Assche
  2016-08-03 16:55                                         ` Bart Van Assche
  1 sibling, 2 replies; 74+ messages in thread
From: Laurence Oberman @ 2016-08-03  1:33 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: Bart Van Assche, dm-devel, linux-scsi

Hi Bart

I simplified the test to 2 simple scripts, running against only one XFS file system.
Can you validate these and tell me if they are enough to emulate what you are doing?
Perhaps our test-suite is too simple.

Start the test

# cat run_test.sh
#!/bin/bash
logger "Starting Bart's test"
#for i in `seq 1 10`
for i in 1
do
	fio --verify=md5 -rw=randwrite --size=10M --bs=4K --loops=$((10**6)) \
        --iodepth=64 --group_reporting --sync=1 --direct=1 --ioengine=libaio \
        --directory="/data-$i" --name=data-integrity-test --thread --numjobs=16 \
        --runtime=600 --output=fio-output.txt >/dev/null &
done

Delete the hosts; I wait 10s in between host deletions.
But I also tested with 3s and it is still stable with Mike's patches.

#!/bin/bash
for i in /sys/class/srp_remote_ports/*
do
 echo "Deleting host $i, it will re-connect via srp_daemon" 
 echo 1 > $i/delete
 sleep 10
done

Check for I/O errors affecting XFS: we now have none with the patches Mike provided.
After recovery I can create files in the XFS mount with no issues.

Can you use my scripts and 1 mount and see if it still fails for you?

Thanks
Laurence

----- Original Message -----
> From: "Mike Snitzer" <snitzer@redhat.com>
> To: "Bart Van Assche" <bart.vanassche@sandisk.com>
> Cc: dm-devel@redhat.com, "Laurence Oberman" <loberman@redhat.com>, linux-scsi@vger.kernel.org
> Sent: Tuesday, August 2, 2016 8:40:14 PM
> Subject: Re: dm-mq and end_clone_request()
> 
> On Tue, Aug 02 2016 at  8:19pm -0400,
> Bart Van Assche <bart.vanassche@sandisk.com> wrote:
> 
> > On 08/02/2016 10:45 AM, Mike Snitzer wrote:
> > > Please do these same tests against a v4.7 kernel with the 4 patches from
> > > this branch applied (no need for your other debug patches):
> > > https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/log/?h=dm-4.7-mpath-fixes
> > > 
> > > I've had good results with my blk-mq SRP based testing.
> > 
> > Hello Mike,
> > 
> > Thanks again for having made these patches available. The results of my
> > tests are as follows:
> 
> Disappointing.  But I asked you to run the v4.7 kernel patches I
> pointed to _without_ any of your debug patches.
> 
> I cannot reproduce on our SRP testbed with the fixes I provided.  We're
> now in a place where there would appear to be something very unique to
> your environment causing these failures.
> 

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: dm-mq and end_clone_request()
  2016-08-03  1:33                                         ` Laurence Oberman
@ 2016-08-03  2:10                                           ` Mike Snitzer
  2016-08-03  2:18                                             ` Laurence Oberman
  2016-08-03 16:06                                           ` Bart Van Assche
  1 sibling, 1 reply; 74+ messages in thread
From: Mike Snitzer @ 2016-08-03  2:10 UTC (permalink / raw)
  To: Laurence Oberman; +Cc: Bart Van Assche, dm-devel, linux-scsi

On Tue, Aug 02 2016 at  9:33pm -0400,
Laurence Oberman <loberman@redhat.com> wrote:

> Hi Bart
> 
> I simplified the test to 2 simple scripts, running against only one XFS
> file system.
> Can you validate these and tell me if they are enough to emulate what you are
> doing?
> Perhaps our test-suite is too simple.
> 
> Start the test
> 
> # cat run_test.sh
> #!/bin/bash
> logger "Starting Bart's test"
> #for i in `seq 1 10`
> for i in 1
> do
> 	fio --verify=md5 -rw=randwrite --size=10M --bs=4K --loops=$((10**6)) \
>         --iodepth=64 --group_reporting --sync=1 --direct=1 --ioengine=libaio \
>         --directory="/data-$i" --name=data-integrity-test --thread --numjobs=16 \
>         --runtime=600 --output=fio-output.txt >/dev/null &
> done
> 
> Delete the hosts; I wait 10s in between host deletions.
> But I also tested with 3s and it is still stable with Mike's patches.
> 
> #!/bin/bash
> for i in /sys/class/srp_remote_ports/*
> do
>  echo "Deleting host $i, it will re-connect via srp_daemon" 
>  echo 1 > $i/delete
>  sleep 10
> done
> 
> Check for I/O errors affecting XFS: we now have none with the patches Mike provided.
> After recovery I can create files in the XFS mount with no issues.
> 
> Can you use my scripts and 1 mount and see if it still fails for you?

In parallel we can try Bart's testsuite that he shared earlier in this
thread: https://github.com/bvanassche/srp-test

README.md says:
"Running these tests manually is tedious. Hence this test suite that
tests the SRP initiator and target drivers by loading both drivers on
the same server, by logging in using the IB loopback functionality and
by sending I/O through the SRP initiator driver to a RAM disk exported
by the SRP target driver."

This could explain why Bart is still seeing issues.  He isn't testing
real hardware -- as such he is using ramdisk to expose races, etc.

Mike

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: dm-mq and end_clone_request()
  2016-08-03  2:10                                           ` Mike Snitzer
@ 2016-08-03  2:18                                             ` Laurence Oberman
  2016-08-03  2:55                                               ` Laurence Oberman
  0 siblings, 1 reply; 74+ messages in thread
From: Laurence Oberman @ 2016-08-03  2:18 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: Bart Van Assche, dm-devel, linux-scsi



----- Original Message -----
> From: "Mike Snitzer" <snitzer@redhat.com>
> To: "Laurence Oberman" <loberman@redhat.com>
> Cc: "Bart Van Assche" <bart.vanassche@sandisk.com>, dm-devel@redhat.com, linux-scsi@vger.kernel.org
> Sent: Tuesday, August 2, 2016 10:10:12 PM
> Subject: Re: dm-mq and end_clone_request()
> 
> On Tue, Aug 02 2016 at  9:33pm -0400,
> Laurence Oberman <loberman@redhat.com> wrote:
> 
> > Hi Bart
> > 
> > I simplified the test to 2 simple scripts and only running against one XFS
> > file system.
> > Can you validate these and tell me if its enough to emulate what you are
> > doing.
> > Perhaps our test-suite is too simple.
> > 
> > Start the test
> > 
> > # cat run_test.sh
> > #!/bin/bash
> > logger "Starting Bart's test"
> > #for i in `seq 1 10`
> > for i in 1
> > do
> > 	fio --verify=md5 -rw=randwrite --size=10M --bs=4K --loops=$((10**6)) \
> >         --iodepth=64 --group_reporting --sync=1 --direct=1
> >         --ioengine=libaio \
> >         --directory="/data-$i" --name=data-integrity-test --thread
> >         --numjobs=16 \
> >         --runtime=600 --output=fio-output.txt >/dev/null &
> > done
> > 
> > Delete the host, I wait 10s in between host deletions.
> > But I also tested with 3s and still its stable with Mike's patches.
> > 
> > #!/bin/bash
> > for i in /sys/class/srp_remote_ports/*
> > do
> >  echo "Deleting host $i, it will re-connect via srp_daemon"
> >  echo 1 > $i/delete
> >  sleep 10
> > done
> > 
> > Check for I/O errors affecting XFS and we now have none with the patches
> > Mike provided.
> > After recovery I can create files in the xfs mount with no issues.
> > 
> > Can you use my scripts and 1 mount and see if it still fails for you.
> 
> In parallel we can try Bart's testsuite that he shared earlier in this
> thread: https://github.com/bvanassche/srp-test
> 
> README.md says:
> "Running these tests manually is tedious. Hence this test suite that
> tests the SRP initiator and target drivers by loading both drivers on
> the same server, by logging in using the IB loopback functionality and
> by sending I/O through the SRP initiator driver to a RAM disk exported
> by the SRP target driver."
> 
> This could explain why Bart is still seeing issues.  He isn't testing
> real hardware -- as such he is using ramdisk to expose races, etc.
> 
> Mike
> 

Hi Mike,

I looked at Bart's scripts; they looked fine, but I wanted a more simplified way to bring the error out.
Using ramdisk as an LIO backend via ib_srpt to serve LUNs is not uncommon.
That is the same way I do it when I am not connected to a large array, as it is the only way I can get EDR-like speeds.

I don't think it is racing due to the ramdisk back-end, but maybe we need to ramp ours up to run more in parallel in a loop.

I will run 21 parallel runs and see if it makes a difference tonight and report back tomorrow.
Clearly, prior to your final patches we were escaping back to the FS layer with errors, but since your patches, at least in our test harness, that is resolved.

Thanks
Laurence

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: dm-mq and end_clone_request()
  2016-08-03  2:18                                             ` Laurence Oberman
@ 2016-08-03  2:55                                               ` Laurence Oberman
  2016-08-03 15:10                                                 ` Laurence Oberman
  0 siblings, 1 reply; 74+ messages in thread
From: Laurence Oberman @ 2016-08-03  2:55 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: Bart Van Assche, dm-devel, linux-scsi



----- Original Message -----
> From: "Laurence Oberman" <loberman@redhat.com>
> To: "Mike Snitzer" <snitzer@redhat.com>
> Cc: "Bart Van Assche" <bart.vanassche@sandisk.com>, dm-devel@redhat.com, linux-scsi@vger.kernel.org
> Sent: Tuesday, August 2, 2016 10:18:30 PM
> Subject: Re: dm-mq and end_clone_request()
> 
> 
> 
> ----- Original Message -----
> > From: "Mike Snitzer" <snitzer@redhat.com>
> > To: "Laurence Oberman" <loberman@redhat.com>
> > Cc: "Bart Van Assche" <bart.vanassche@sandisk.com>, dm-devel@redhat.com,
> > linux-scsi@vger.kernel.org
> > Sent: Tuesday, August 2, 2016 10:10:12 PM
> > Subject: Re: dm-mq and end_clone_request()
> > 
> > On Tue, Aug 02 2016 at  9:33pm -0400,
> > Laurence Oberman <loberman@redhat.com> wrote:
> > 
> > > Hi Bart
> > > 
> > > I simplified the test to 2 simple scripts and only running against one
> > > XFS
> > > file system.
> > > Can you validate these and tell me if its enough to emulate what you are
> > > doing.
> > > Perhaps our test-suite is too simple.
> > > 
> > > Start the test
> > > 
> > > # cat run_test.sh
> > > #!/bin/bash
> > > logger "Starting Bart's test"
> > > #for i in `seq 1 10`
> > > for i in 1
> > > do
> > > 	fio --verify=md5 -rw=randwrite --size=10M --bs=4K --loops=$((10**6)) \
> > >         --iodepth=64 --group_reporting --sync=1 --direct=1
> > >         --ioengine=libaio \
> > >         --directory="/data-$i" --name=data-integrity-test --thread
> > >         --numjobs=16 \
> > >         --runtime=600 --output=fio-output.txt >/dev/null &
> > > done
> > > 
> > > Delete the host, I wait 10s in between host deletions.
> > > But I also tested with 3s and still its stable with Mike's patches.
> > > 
> > > #!/bin/bash
> > > for i in /sys/class/srp_remote_ports/*
> > > do
> > >  echo "Deleting host $i, it will re-connect via srp_daemon"
> > >  echo 1 > $i/delete
> > >  sleep 10
> > > done
> > > 
> > > Check for I/O errors affecting XFS and we now have none with the patches
> > > Mike provided.
> > > After recovery I can create files in the xfs mount with no issues.
> > > 
> > > Can you use my scripts and 1 mount and see if it still fails for you.
> > 
> > In parallel we can try Bart's testsuite that he shared earlier in this
> > thread: https://github.com/bvanassche/srp-test
> > 
> > README.md says:
> > "Running these tests manually is tedious. Hence this test suite that
> > tests the SRP initiator and target drivers by loading both drivers on
> > the same server, by logging in using the IB loopback functionality and
> > by sending I/O through the SRP initiator driver to a RAM disk exported
> > by the SRP target driver."
> > 
> > This could explain why Bart is still seeing issues.  He isn't testing
> > real hardware -- as such he is using ramdisk to expose races, etc.
> > 
> > Mike
> > 
> 
> Hi Mike,
> 
> I looked at Bart's scripts, they looked fine but I wanted a more simplified
> way to bring the error out.
> Using ramdisk is not uncommon as an LIO backend via ib_srpt to serve LUNS.
> That is the same way I do it when I am not connected to a large array as it
> is the only way I can get EDR like speeds.
> 
> I don't thinks its racing due to the ramdisk back-end but  maybe we need to
> ramp ours up to run more in parallel in a loop.
> 
> I will run 21 parallel runs and see if it makes a difference tonight and
> report back tomorrow.
> Clearly prior to your final patches we were escaping back to the FS layer
> with errors but since your patches, at least in out test harness that is
> resolved.
> 
> Thanks
> Laurence
> --
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

Hello

I ran 20 parallel runs with 3 loops through host deletion, and in each case fio survived with no hard error escaping to the FS layer.
It is solid in our test bed.
Keep in mind we have no ib_srpt loaded, as we have a hardware-based array and are connected directly to the array with EDR 100.
I am also not removing and reloading modules as happens in Bart's scripts, and also not trying to delete mpath maps etc.

I focused only on the I/O error that was escaping up to the FS layer.
I will check in with Bart tomorrow.

Thanks
Laurence

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: dm-mq and end_clone_request()
  2016-08-03  2:55                                               ` Laurence Oberman
@ 2016-08-03 15:10                                                 ` Laurence Oberman
  0 siblings, 0 replies; 74+ messages in thread
From: Laurence Oberman @ 2016-08-03 15:10 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: Bart Van Assche, dm-devel, linux-scsi



----- Original Message -----
> From: "Laurence Oberman" <loberman@redhat.com>
> To: "Mike Snitzer" <snitzer@redhat.com>
> Cc: "Bart Van Assche" <bart.vanassche@sandisk.com>, dm-devel@redhat.com, linux-scsi@vger.kernel.org
> Sent: Tuesday, August 2, 2016 10:55:59 PM
> Subject: Re: dm-mq and end_clone_request()
> 
> 
> 
> ----- Original Message -----
> > From: "Laurence Oberman" <loberman@redhat.com>
> > To: "Mike Snitzer" <snitzer@redhat.com>
> > Cc: "Bart Van Assche" <bart.vanassche@sandisk.com>, dm-devel@redhat.com,
> > linux-scsi@vger.kernel.org
> > Sent: Tuesday, August 2, 2016 10:18:30 PM
> > Subject: Re: dm-mq and end_clone_request()
> > 
> > 
> > 
> > ----- Original Message -----
> > > From: "Mike Snitzer" <snitzer@redhat.com>
> > > To: "Laurence Oberman" <loberman@redhat.com>
> > > Cc: "Bart Van Assche" <bart.vanassche@sandisk.com>, dm-devel@redhat.com,
> > > linux-scsi@vger.kernel.org
> > > Sent: Tuesday, August 2, 2016 10:10:12 PM
> > > Subject: Re: dm-mq and end_clone_request()
> > > 
> > > On Tue, Aug 02 2016 at  9:33pm -0400,
> > > Laurence Oberman <loberman@redhat.com> wrote:
> > > 
> > > > Hi Bart
> > > > 
> > > > I simplified the test to 2 simple scripts and only running against one
> > > > XFS
> > > > file system.
> > > > Can you validate these and tell me if its enough to emulate what you
> > > > are
> > > > doing.
> > > > Perhaps our test-suite is too simple.
> > > > 
> > > > Start the test
> > > > 
> > > > # cat run_test.sh
> > > > #!/bin/bash
> > > > logger "Starting Bart's test"
> > > > #for i in `seq 1 10`
> > > > for i in 1
> > > > do
> > > > 	fio --verify=md5 -rw=randwrite --size=10M --bs=4K --loops=$((10**6)) \
> > > >         --iodepth=64 --group_reporting --sync=1 --direct=1
> > > >         --ioengine=libaio \
> > > >         --directory="/data-$i" --name=data-integrity-test --thread
> > > >         --numjobs=16 \
> > > >         --runtime=600 --output=fio-output.txt >/dev/null &
> > > > done
> > > > 
> > > > Delete the host, I wait 10s in between host deletions.
> > > > But I also tested with 3s and still its stable with Mike's patches.
> > > > 
> > > > #!/bin/bash
> > > > for i in /sys/class/srp_remote_ports/*
> > > > do
> > > >  echo "Deleting host $i, it will re-connect via srp_daemon"
> > > >  echo 1 > $i/delete
> > > >  sleep 10
> > > > done
> > > > 
> > > > Check for I/O errors affecting XFS and we now have none with the
> > > > patches
> > > > Mike provided.
> > > > After recovery I can create files in the xfs mount with no issues.
> > > > 
> > > > Can you use my scripts and 1 mount and see if it still fails for you.
> > > 
> > > In parallel we can try Bart's testsuite that he shared earlier in this
> > > thread: https://github.com/bvanassche/srp-test
> > > 
> > > README.md says:
> > > "Running these tests manually is tedious. Hence this test suite that
> > > tests the SRP initiator and target drivers by loading both drivers on
> > > the same server, by logging in using the IB loopback functionality and
> > > by sending I/O through the SRP initiator driver to a RAM disk exported
> > > by the SRP target driver."
> > > 
> > > This could explain why Bart is still seeing issues.  He isn't testing
> > > real hardware -- as such he is using ramdisk to expose races, etc.
> > > 
> > > Mike
> > > 
> > 
> > Hi Mike,
> > 
> > I looked at Bart's scripts, they looked fine but I wanted a more simplified
> > way to bring the error out.
> > Using ramdisk is not uncommon as an LIO backend via ib_srpt to serve LUNS.
> > That is the same way I do it when I am not connected to a large array as it
> > is the only way I can get EDR like speeds.
> > 
> > I don't thinks its racing due to the ramdisk back-end but  maybe we need to
> > ramp ours up to run more in parallel in a loop.
> > 
> > I will run 21 parallel runs and see if it makes a difference tonight and
> > report back tomorrow.
> > Clearly prior to your final patches we were escaping back to the FS layer
> > with errors but since your patches, at least in out test harness that is
> > resolved.
> > 
> > Thanks
> > Laurence
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > 
> 
> Hello
> 
> I ran 20 parallel runs with 3 loops through host deletion and in each case
> fio survived with no hard error escaping to the FS layer.
> Its solid in our test bed,
> Keep in mind we have no ib_srpt loaded as we have a hardware based array and
> are connected directly to the array with EDR 100.
> I am also not removing and reloading modules like is happening in Barts's
> scripts and also not trying to delete mpath maps etc.
> 
> I focused only on the I/O error that was escaping up to the FS layer.
> I will check in with Bart tomorrow.
> 
> Thanks
> Laurence
> --
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

Hi Bart

Looking back at your email.

I also get these, but those are expected: we are in the process of doing I/O when we yank the hosts, so in-flight requests are affected.

Aug  2 22:41:23 jumpclient kernel: device-mapper: multipath: Failing path 8:192.
Aug  2 22:41:23 jumpclient kernel: blk_update_request: I/O error, dev sdm, sector 258504
Aug  2 22:41:23 jumpclient kernel: blk_update_request: I/O error, dev sdm, sector 60320

However, with the patches applied, I never get any of the errors you show any more:
[  162.903284] Buffer I/O error on dev dm-0, logical block 32928, lost sync page write

I will work with you to understand why, with Mike's patches, it is now stable here but not in your configuration.

Thanks
Laurence

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: dm-mq and end_clone_request()
  2016-08-03  1:33                                         ` Laurence Oberman
  2016-08-03  2:10                                           ` Mike Snitzer
@ 2016-08-03 16:06                                           ` Bart Van Assche
  2016-08-03 17:25                                             ` Laurence Oberman
  2016-08-03 18:03                                             ` [dm-devel] " Laurence Oberman
  1 sibling, 2 replies; 74+ messages in thread
From: Bart Van Assche @ 2016-08-03 16:06 UTC (permalink / raw)
  To: Laurence Oberman, Mike Snitzer; +Cc: dm-devel, linux-scsi

On 08/02/2016 06:33 PM, Laurence Oberman wrote:
> #!/bin/bash
> for i in /sys/class/srp_remote_ports/*
> do
>  echo "Deleting host $i, it will re-connect via srp_daemon"
>  echo 1 > $i/delete
>  sleep 10
> done

Hello Laurence,

Sorry but the above looks wrong to me. There should be a second loop 
around this loop and the sleep statement should be moved from the inner 
loop to the outer loop. The above code logs out one (initiator, target) 
port pair at a time instead of logging out all paths at once.
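
As a hypothetical sketch of what I mean (it assumes the same sysfs layout
and that srp_daemon re-establishes the logins afterwards):

#!/bin/bash
for iteration in 1 2 3                  # outer loop: repeat the event
do
 for i in /sys/class/srp_remote_ports/* # inner loop: no sleep in here,
 do                                     # so all paths drop together
  echo "Deleting host $i, it will re-connect via srp_daemon"
  echo 1 > $i/delete
 done
 sleep 10                               # sleep once per iteration, giving
done                                    # srp_daemon time to re-login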

Bart.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: dm-mq and end_clone_request()
  2016-08-03  0:40                                       ` Mike Snitzer
  2016-08-03  1:33                                         ` Laurence Oberman
@ 2016-08-03 16:55                                         ` Bart Van Assche
  2016-08-04  9:53                                           ` Hannes Reinecke
  2016-08-04 16:10                                           ` Mike Snitzer
  1 sibling, 2 replies; 74+ messages in thread
From: Bart Van Assche @ 2016-08-03 16:55 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: dm-devel, Laurence Oberman, linux-scsi

On 08/02/2016 05:40 PM, Mike Snitzer wrote:
> But I asked you to run the v4.7 kernel patches I
> pointed to _without_ any of your debug patches.

I need several patches to fix bugs that are not related to the device 
mapper, e.g. "sched: Avoid that __wait_on_bit_lock() hangs" 
(https://lkml.org/lkml/2016/8/3/289).

Bart.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: dm-mq and end_clone_request()
  2016-08-03 16:06                                           ` Bart Van Assche
@ 2016-08-03 17:25                                             ` Laurence Oberman
  2016-08-03 18:03                                             ` [dm-devel] " Laurence Oberman
  1 sibling, 0 replies; 74+ messages in thread
From: Laurence Oberman @ 2016-08-03 17:25 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: dm-devel, linux-scsi, Mike Snitzer



----- Original Message -----
> From: "Bart Van Assche" <bart.vanassche@sandisk.com>
> To: "Laurence Oberman" <loberman@redhat.com>, "Mike Snitzer" <snitzer@redhat.com>
> Cc: dm-devel@redhat.com, linux-scsi@vger.kernel.org
> Sent: Wednesday, August 3, 2016 12:06:17 PM
> Subject: Re: [dm-devel] dm-mq and end_clone_request()
> 
> On 08/02/2016 06:33 PM, Laurence Oberman wrote:
> > #!/bin/bash
> > for i in /sys/class/srp_remote_ports/*
> > do
> >  echo "Deleting host $i, it will re-connect via srp_daemon"
> >  echo 1 > $i/delete
> >  sleep 10
> > done
> 
> Hello Laurence,
> 
> Sorry but the above looks wrong to me. There should be a second loop
> around this loop and the sleep statement should be moved from the inner
> loop to the outer loop. The above code logs out one (initiator, target)
> port pair at a time instead of logging out all paths at once.
> 
> Bart.
> 

Hi Bart

It logs out each host in turn with a 10s sleep in between.
I actually reduced the sleep to 3s last night.
We do end up with all paths lost, but not at precisely the same second.

Are you saying we have to lose all paths at the same time?
That is easy to fix, and I was running it that way in the beginning; I will re-test.

Thanks
Laurence

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [dm-devel] dm-mq and end_clone_request()
  2016-08-03 16:06                                           ` Bart Van Assche
  2016-08-03 17:25                                             ` Laurence Oberman
@ 2016-08-03 18:03                                             ` Laurence Oberman
  1 sibling, 0 replies; 74+ messages in thread
From: Laurence Oberman @ 2016-08-03 18:03 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: Mike Snitzer, dm-devel, linux-scsi



----- Original Message -----
> From: "Bart Van Assche" <bart.vanassche@sandisk.com>
> To: "Laurence Oberman" <loberman@redhat.com>, "Mike Snitzer" <snitzer@redhat.com>
> Cc: dm-devel@redhat.com, linux-scsi@vger.kernel.org
> Sent: Wednesday, August 3, 2016 12:06:17 PM
> Subject: Re: [dm-devel] dm-mq and end_clone_request()
> 
> On 08/02/2016 06:33 PM, Laurence Oberman wrote:
> > #!/bin/bash
> > for i in /sys/class/srp_remote_ports/*
> > do
> >  echo "Deleting host $i, it will re-connect via srp_daemon"
> >  echo 1 > $i/delete
> >  sleep 10
> > done
> 
> Hello Laurence,
> 
> Sorry but the above looks wrong to me. There should be a second loop
> around this loop and the sleep statement should be moved from the inner
> loop to the outer loop. The above code logs out one (initiator, target)
> port pair at a time instead of logging out all paths at once.
> 
> Bart.
> 

Hi Bart

Latest tests are still good on our side.
I am now taking both paths out at the same time but still we seem stable here.
The first test removed the sleep and we still had a delay; the second test added a background (&) so the deletes ran as close as possible to the same time.
Both tests passed.

I will email the messages log just to you.

With no sleep we still have a 9s gap when we delete the paths, and we are good.

Aug  3 13:41:21 jumpclient multipathd: 360001ff0b035d000000000008d700001: remaining active paths: 1
Aug  3 13:41:22 jumpclient multipathd: 360001ff0b035d000000000028d720003: remaining active paths: 1
Aug  3 13:41:22 jumpclient multipathd: 360001ff0b035d000000000048d740005: remaining active paths: 1
Aug  3 13:41:22 jumpclient multipathd: 360001ff0b035d000000000068d760007: remaining active paths: 1
Aug  3 13:41:23 jumpclient multipathd: 360001ff0b035d0000000000b8d7b000c: remaining active paths: 1
Aug  3 13:41:23 jumpclient multipathd: 360001ff0b035d0000000000d8d7d000e: remaining active paths: 1
Aug  3 13:41:23 jumpclient multipathd: 360001ff0b035d000000000118d810012: remaining active paths: 1
Aug  3 13:41:24 jumpclient multipathd: 360001ff0b035d000000000138d830014: remaining active paths: 1
Aug  3 13:41:24 jumpclient multipathd: 360001ff0b035d000000000158d850016: remaining active paths: 1
Aug  3 13:41:25 jumpclient multipathd: 360001ff0b035d000000000178d870018: remaining active paths: 1
Aug  3 13:41:25 jumpclient multipathd: 360001ff0b035d000000000198d89001a: remaining active paths: 1
Aug  3 13:41:25 jumpclient multipathd: 360001ff0b035d0000000001a8d8a001b: remaining active paths: 1
Aug  3 13:41:25 jumpclient multipathd: 360001ff0b035d0000000001c8d8c001d: remaining active paths: 1
Aug  3 13:41:26 jumpclient multipathd: 360001ff0b035d0000000001e8d8e001f: remaining active paths: 1
Aug  3 13:41:26 jumpclient multipathd: 360001ff0b035d0000000001f8d8f0020: remaining active paths: 1
Aug  3 13:41:26 jumpclient multipathd: 360001ff0b035d000000000208d900021: remaining active paths: 1
Aug  3 13:41:26 jumpclient multipathd: 360001ff0b035d000000000228d920023: remaining active paths: 1
Aug  3 13:41:28 jumpclient multipathd: 360001ff0b035d000000000248d940025: remaining active paths: 1
Aug  3 13:41:29 jumpclient multipathd: 360001ff0b035d000000000268d960027: remaining active paths: 1
Aug  3 13:41:29 jumpclient multipathd: 360001ff0b035d000000000278d970028: remaining active paths: 1
Aug  3 13:41:30 jumpclient multipathd: 360001ff0b035d000000000288d980029: remaining active paths: 1
Aug  3 13:41:35 jumpclient multipathd: 360001ff0b035d000000000008d700001: remaining active paths: 0
Aug  3 13:41:36 jumpclient multipathd: 360001ff0b035d000000000028d720003: remaining active paths: 0
Aug  3 13:41:37 jumpclient multipathd: 360001ff0b035d000000000048d740005: remaining active paths: 0
Aug  3 13:41:37 jumpclient multipathd: 360001ff0b035d000000000068d760007: remaining active paths: 0
Aug  3 13:41:38 jumpclient multipathd: 360001ff0b035d0000000000b8d7b000c: remaining active paths: 0
Aug  3 13:41:38 jumpclient multipathd: 360001ff0b035d0000000000d8d7d000e: remaining active paths: 0
Aug  3 13:41:38 jumpclient multipathd: 360001ff0b035d000000000108d800011: remaining active paths: 0
Aug  3 13:41:38 jumpclient multipathd: 360001ff0b035d000000000118d810012: remaining active paths: 0
Aug  3 13:41:38 jumpclient multipathd: 360001ff0b035d000000000138d830014: remaining active paths: 0
Aug  3 13:41:39 jumpclient multipathd: 360001ff0b035d000000000158d850016: remaining active paths: 0
Aug  3 13:41:39 jumpclient multipathd: 360001ff0b035d000000000178d870018: remaining active paths: 0
Aug  3 13:41:39 jumpclient multipathd: 360001ff0b035d000000000198d89001a: remaining active paths: 0
Aug  3 13:41:39 jumpclient multipathd: 360001ff0b035d0000000001a8d8a001b: remaining active paths: 0
Aug  3 13:41:39 jumpclient multipathd: 360001ff0b035d0000000001c8d8c001d: remaining active paths: 0
Aug  3 13:41:39 jumpclient multipathd: 360001ff0b035d0000000001e8d8e001f: remaining active paths: 0
Aug  3 13:41:41 jumpclient multipathd: 360001ff0b035d0000000001f8d8f0020: remaining active paths: 0
Aug  3 13:41:41 jumpclient multipathd: 360001ff0b035d000000000208d900021: remaining active paths: 0
Aug  3 13:41:43 jumpclient multipathd: 360001ff0b035d000000000248d940025: remaining active paths: 0
Aug  3 13:41:43 jumpclient multipathd: 360001ff0b035d000000000268d960027: remaining active paths: 0
Aug  3 13:41:44 jumpclient multipathd: 360001ff0b035d000000000288d980029: remaining active paths: 0
Aug  3 13:42:44 jumpclient multipathd: 360001ff0b035d000000000138d830014: remaining active paths: 2
Aug  3 13:42:44 jumpclient multipathd: 360001ff0b035d000000000158d850016: remaining active paths: 2

These are the only errors and they are expected.

Aug  3 13:41:22 jumpclient kernel: blk_update_request: I/O error, dev sdd, sector 31141264880
Aug  3 13:41:22 jumpclient kernel: blk_update_request: I/O error, dev sdd, sector 79928
Aug  3 13:41:22 jumpclient kernel: blk_update_request: I/O error, dev sdd, sector 65264
Aug  3 13:41:22 jumpclient kernel: blk_update_request: I/O error, dev sdd, sector 55232
Aug  3 13:41:22 jumpclient kernel: blk_update_request: I/O error, dev sdd, sector 14152
Aug  3 13:41:22 jumpclient kernel: blk_update_request: I/O error, dev sdd, sector 168880
Aug  3 13:41:22 jumpclient kernel: blk_update_request: I/O error, dev sdd, sector 269392
Aug  3 13:41:22 jumpclient kernel: blk_update_request: I/O error, dev sdd, sector 309200
Aug  3 13:41:22 jumpclient kernel: blk_update_request: I/O error, dev sdd, sector 87520
Aug  3 13:41:22 jumpclient kernel: blk_update_request: I/O error, dev sdd, sector 7744
Aug  3 13:41:28 jumpclient kernel: blk_update_request: I/O error, dev sdca, sector 119984
Aug  3 13:41:29 jumpclient kernel: blk_update_request: I/O error, dev sdcd, sector 31139908984
Aug  3 13:41:29 jumpclient kernel: blk_update_request: I/O error, dev sdcd, sector 131136
Aug  3 13:41:29 jumpclient kernel: blk_update_request: I/O error, dev sdcd, sector 97536
Aug  3 13:41:29 jumpclient kernel: blk_update_request: I/O error, dev sdcd, sector 123264
Aug  3 13:41:29 jumpclient kernel: blk_update_request: I/O error, dev sdcd, sector 110336
Aug  3 13:41:29 jumpclient kernel: blk_update_request: I/O error, dev sdcd, sector 158136
Aug  3 13:41:29 jumpclient kernel: blk_update_request: I/O error, dev sdcd, sector 156136
Aug  3 13:41:29 jumpclient kernel: blk_update_request: I/O error, dev sdcd, sector 173072
Aug  3 13:41:29 jumpclient kernel: blk_update_request: I/O error, dev sdcd, sector 6984
Aug  3 13:41:35 jumpclient kernel: blk_update_request: I/O error, dev sdc, sector 130224
Aug  3 13:41:35 jumpclient kernel: blk_update_request: I/O error, dev sdc, sector 225816
Aug  3 13:41:36 jumpclient kernel: blk_update_request: I/O error, dev sdc, sector 248120
Aug  3 13:41:36 jumpclient kernel: blk_update_request: I/O error, dev sdc, sector 242528
Aug  3 13:41:36 jumpclient kernel: blk_update_request: I/O error, dev sdk, sector 251248
Aug  3 13:41:36 jumpclient kernel: blk_update_request: I/O error, dev sdk, sector 242032
Aug  3 13:41:36 jumpclient kernel: blk_update_request: I/O error, dev sdk, sector 203736
Aug  3 13:41:36 jumpclient kernel: blk_update_request: I/O error, dev sdk, sector 31141107808
Aug  3 13:41:36 jumpclient kernel: blk_update_request: I/O error, dev sdk, sector 233336
Aug  3 13:41:36 jumpclient kernel: blk_update_request: I/O error, dev sdk, sector 187944
Aug  3 13:41:41 jumpclient kernel: blk_update_request: I/O error, dev sdbl, sector 85800
Aug  3 13:41:41 jumpclient kernel: blk_update_request: I/O error, dev sdbl, sector 74120
Aug  3 13:41:41 jumpclient kernel: blk_update_request: I/O error, dev sdbl, sector 78216
Aug  3 13:41:41 jumpclient kernel: blk_update_request: I/O error, dev sdbl, sector 79976
Aug  3 13:41:41 jumpclient kernel: blk_update_request: I/O error, dev sdbl, sector 79552
Aug  3 13:41:41 jumpclient kernel: blk_update_request: I/O error, dev sdbl, sector 87888
Aug  3 13:41:43 jumpclient kernel: blk_update_request: I/O error, dev sdbt, sector 274368
Aug  3 13:41:43 jumpclient kernel: blk_update_request: I/O error, dev sdbt, sector 31139814080
Aug  3 13:41:43 jumpclient kernel: blk_update_request: I/O error, dev sdbx, sector 6776
Aug  3 13:41:43 jumpclient kernel: blk_update_request: I/O error, dev sdbx, sector 302152

Changing the script to add & so that we take both paths away at the same time, we still seem to survive here.

This is my configuration:

360001ff0b035d000000000078d770008 dm-9 DDN     ,SFA14K          
size=29T features='3 queue_if_no_path queue_mode mq' hwhandler='0' wp=rw
|-+- policy='round-robin 0' prio=90 status=active
| `- 1:0:0:7  sday 67:32  active ready running
`-+- policy='round-robin 0' prio=10 status=enabled
  `- 2:0:0:7  sdj  8:144  active ready running

       device {
        vendor                  "DDN"
        product                 "SFA14K"
        path_grouping_policy    group_by_prio
        prio                    alua
        path_selector           "round-robin 0"
        path_checker            tur
        failback                2
        rr_weight               uniform
        no_path_retry           12
        dev_loss_tmo            10
        fast_io_fail_tmo        5
	features     "1 queue_if_no_path"
    }


With the & added:

#!/bin/bash
for i in /sys/class/srp_remote_ports/*
do
 echo "Deleting host $i, it will re-connect via srp_daemon" 
 echo 1 > $i/delete &
 #sleep 3
done
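
One caveat with this backgrounded form (a hypothetical refinement, not what
I ran): without a trailing "wait", the script exits as soon as the loop
finishes, possibly before the deletes have completed. A sketch with it added:

#!/bin/bash
for i in /sys/class/srp_remote_ports/*
do
 echo "Deleting host $i, it will re-connect via srp_daemon"
 echo 1 > $i/delete &
done
wait   # block until every backgrounded delete has returned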

Here we lose both paths at the same time.

[root@jumpclient bart_tests]# grep remaining messages
Aug  3 13:49:38 jumpclient multipathd: 360001ff0b035d000000000008d700001: remaining active paths: 0
Aug  3 13:49:38 jumpclient multipathd: 360001ff0b035d000000000028d720003: remaining active paths: 0
Aug  3 13:49:38 jumpclient multipathd: 360001ff0b035d000000000048d740005: remaining active paths: 0
Aug  3 13:49:41 jumpclient multipathd: 360001ff0b035d000000000068d760007: remaining active paths: 0
Aug  3 13:49:42 jumpclient multipathd: 360001ff0b035d0000000000d8d7d000e: remaining active paths: 0
Aug  3 13:49:45 jumpclient multipathd: 360001ff0b035d000000000118d810012: remaining active paths: 0
Aug  3 13:49:45 jumpclient multipathd: 360001ff0b035d000000000108d800011: remaining active paths: 0
Aug  3 13:49:47 jumpclient multipathd: 360001ff0b035d000000000158d850016: remaining active paths: 0
Aug  3 13:49:48 jumpclient multipathd: 360001ff0b035d000000000178d870018: remaining active paths: 0
Aug  3 13:49:48 jumpclient multipathd: 360001ff0b035d000000000198d89001a: remaining active paths: 0
Aug  3 13:49:48 jumpclient multipathd: 360001ff0b035d0000000001a8d8a001b: remaining active paths: 0
Aug  3 13:49:55 jumpclient multipathd: 360001ff0b035d0000000001e8d8e001f: remaining active paths: 0
Aug  3 13:49:55 jumpclient multipathd: 360001ff0b035d0000000001f8d8f0020: remaining active paths: 0
Aug  3 13:49:58 jumpclient multipathd: 360001ff0b035d000000000248d940025: remaining active paths: 0
Aug  3 13:49:59 jumpclient multipathd: 360001ff0b035d000000000268d960027: remaining active paths: 0
Aug  3 13:50:00 jumpclient multipathd: 360001ff0b035d000000000288d980029: remaining active paths: 0
Aug  3 13:51:17 jumpclient multipathd: 360001ff0b035d000000000038d730004: remaining active paths: 2
Aug  3 13:51:17 jumpclient multipathd: 360001ff0b035d000000000028d720003: remaining active paths: 2
Aug  3 13:51:19 jumpclient multipathd: 360001ff0b035d000000000078d770008: remaining active paths: 2
Aug  3 13:51:20 jumpclient multipathd: 360001ff0b035d0000000000a8d7a000b: remaining active paths: 2
Aug  3 13:51:23 jumpclient multipathd: 360001ff0b035d0000000000d8d7d000e: remaining active paths: 2
Aug  3 13:51:24 jumpclient multipathd: 360001ff0b035d000000000108d800011: remaining active paths: 2
Aug  3 13:51:25 jumpclient multipathd: 360001ff0b035d0000000000f8d7f0010: remaining active paths: 2
Aug  3 13:51:26 jumpclient multipathd: 360001ff0b035d000000000128d820013: remaining active paths: 2
Aug  3 13:51:29 jumpclient multipathd: 360001ff0b035d0000000001c8d8c001d: remaining active paths: 2
Aug  3 13:51:33 jumpclient multipathd: 360001ff0b035d000000000228d920023: remaining active paths: 2
Aug  3 13:51:34 jumpclient multipathd: 360001ff0b035d000000000238d930024: remaining active paths: 2

We still survive.

[root@jumpclient bart_tests]# grep -i error messages
Aug  3 13:49:38 jumpclient kernel: blk_update_request: I/O error, dev sdc, sector 98288
Aug  3 13:49:38 jumpclient kernel: blk_update_request: I/O error, dev sdc, sector 98320
Aug  3 13:49:38 jumpclient kernel: blk_update_request: I/O error, dev sdc, sector 46976
Aug  3 13:49:38 jumpclient kernel: blk_update_request: I/O error, dev sde, sector 216720
Aug  3 13:49:38 jumpclient kernel: blk_update_request: I/O error, dev sdg, sector 130672
Aug  3 13:49:41 jumpclient kernel: blk_update_request: I/O error, dev sdi, sector 56984
Aug  3 13:49:41 jumpclient kernel: blk_update_request: I/O error, dev sdi, sector 56120
Aug  3 13:49:41 jumpclient kernel: blk_update_request: I/O error, dev sdi, sector 62112
Aug  3 13:49:42 jumpclient kernel: blk_update_request: I/O error, dev sdp, sector 156944
Aug  3 13:49:42 jumpclient kernel: blk_update_request: I/O error, dev sdp, sector 31140975496
Aug  3 13:49:44 jumpclient kernel: blk_update_request: I/O error, dev sdt, sector 207392
Aug  3 13:49:44 jumpclient kernel: blk_update_request: I/O error, dev sdt, sector 200568
Aug  3 13:49:44 jumpclient kernel: blk_update_request: I/O error, dev sdt, sector 251048
Aug  3 13:49:44 jumpclient kernel: blk_update_request: I/O error, dev sdt, sector 247616
Aug  3 13:49:44 jumpclient kernel: blk_update_request: I/O error, dev sdt, sector 210592
Aug  3 13:49:44 jumpclient kernel: blk_update_request: I/O error, dev sdt, sector 200120
Aug  3 13:49:44 jumpclient kernel: blk_update_request: I/O error, dev sdt, sector 203000
Aug  3 13:49:44 jumpclient kernel: blk_update_request: I/O error, dev sdt, sector 248640
Aug  3 13:49:47 jumpclient kernel: blk_update_request: I/O error, dev sdx, sector 48232
Aug  3 13:49:48 jumpclient kernel: blk_update_request: I/O error, dev sdz, sector 9984
Aug  3 13:49:55 jumpclient kernel: blk_update_request: I/O error, dev sdag, sector 130512
Aug  3 13:49:58 jumpclient kernel: blk_update_request: I/O error, dev sdai, sector 39040
Aug  3 13:49:58 jumpclient kernel: blk_update_request: I/O error, dev sdam, sector 31140570528
Aug  3 13:49:59 jumpclient kernel: blk_update_request: I/O error, dev sdao, sector 204552
Aug  3 13:50:00 jumpclient kernel: blk_update_request: I/O error, dev sdaq, sector 31142052904




^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: dm-mq and end_clone_request()
  2016-08-03 16:55                                         ` Bart Van Assche
@ 2016-08-04  9:53                                           ` Hannes Reinecke
  2016-08-04 10:09                                             ` Hannes Reinecke
  2016-08-04 16:10                                           ` Mike Snitzer
  1 sibling, 1 reply; 74+ messages in thread
From: Hannes Reinecke @ 2016-08-04  9:53 UTC (permalink / raw)
  To: dm-devel

On 08/03/2016 06:55 PM, Bart Van Assche wrote:
> On 08/02/2016 05:40 PM, Mike Snitzer wrote:
>> But I asked you to run the v4.7 kernel patches I
>> pointed to _without_ any of your debug patches.
> 
> I need several patches to fix bugs that are not related to the device
> mapper, e.g. "sched: Avoid that __wait_on_bit_lock() hangs"
> (https://lkml.org/lkml/2016/8/3/289).
> 
Hmm. Can you test with this patch?

diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
index 7790a70..9daed03 100644
--- a/drivers/md/dm-mpath.c
+++ b/drivers/md/dm-mpath.c
@@ -439,8 +439,7 @@ static int must_push_back(struct multipath *m)
 {
        return (test_bit(MPATHF_QUEUE_IF_NO_PATH, &m->flags) ||
                ((test_bit(MPATHF_QUEUE_IF_NO_PATH, &m->flags) !=
-                 test_bit(MPATHF_SAVED_QUEUE_IF_NO_PATH, &m->flags)) &&
-                dm_noflush_suspending(m->ti)));
+                 test_bit(MPATHF_SAVED_QUEUE_IF_NO_PATH, &m->flags)));
 }

 /*

Reasoning:
The original check for dm_noflush_suspending() was for bio-based
drivers, which needed to queue I/O within the device-mapper core.
So during suspend this I/O would keep a reference to the device-mapper
core and the table couldn't be swapped.
For request-based multipathing, however, the I/O is _never_ held within
the device-mapper core but rather pushed back to the request queue.
I.e. even for pushback the I/O will never hold a reference to the
device-mapper core, and the tables can be swapped irrespective of the
'dm_noflush_suspending()' setting.

Or that's the idea, at least :-)

Yes Mike, I know, it's not going to work with bio-based multipathing.
But this is just for figuring out where the real issue is.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* Re: dm-mq and end_clone_request()
  2016-08-04  9:53                                           ` Hannes Reinecke
@ 2016-08-04 10:09                                             ` Hannes Reinecke
  2016-08-04 15:10                                               ` Mike Snitzer
  0 siblings, 1 reply; 74+ messages in thread
From: Hannes Reinecke @ 2016-08-04 10:09 UTC (permalink / raw)
  To: dm-devel

On 08/04/2016 11:53 AM, Hannes Reinecke wrote:
> On 08/03/2016 06:55 PM, Bart Van Assche wrote:
>> On 08/02/2016 05:40 PM, Mike Snitzer wrote:
>>> But I asked you to run the v4.7 kernel patches I
>>> pointed to _without_ any of your debug patches.
>>
>> I need several patches to fix bugs that are not related to the device
>> mapper, e.g. "sched: Avoid that __wait_on_bit_lock() hangs"
>> (https://lkml.org/lkml/2016/8/3/289).
>>
> Hmm. Can you test with this patch?
> 
> diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
> index 7790a70..9daed03 100644
> --- a/drivers/md/dm-mpath.c
> +++ b/drivers/md/dm-mpath.c
> @@ -439,8 +439,7 @@ static int must_push_back(struct multipath *m)
>  {
>         return (test_bit(MPATHF_QUEUE_IF_NO_PATH, &m->flags) ||
>                 ((test_bit(MPATHF_QUEUE_IF_NO_PATH, &m->flags) !=
> -                 test_bit(MPATHF_SAVED_QUEUE_IF_NO_PATH, &m->flags)) &&
> -                dm_noflush_suspending(m->ti)));
> +                 test_bit(MPATHF_SAVED_QUEUE_IF_NO_PATH, &m->flags)));
>  }
> 
>  /*
> 
> Reasoning:
> The original check for dm_noflush_suspending() was for bio-based
> drivers, which needed to queue I/O within the device-mapper core.
> So during suspend this I/O would keep a reference to the device-mapper
> core and the table couldn't be swapped.
> For request-based multipathing, however, the I/O is _never_ held within
> the device-mapper core but rather pushed back to the request queue.
> I.e. even for pushback the I/O will never hold a reference to the
> device-mapper core, and the tables can be swapped irrespective of the
> 'dm_noflush_suspending()' setting.
> 
> Or that's the idea, at least :-)
> 
> Yes Mike, I know, it's not going to work with bio-based multipathing.
> But this is just for figuring out where the real issue is.
> 
And indeed.

multipathd is calling DM_SUSPEND _without_ the noflush_suspending flag.
(On the grounds that originally it needed to flush all I/O from the
device-mapper core).
> This will cause I/O errors if any I/O is executed after
> ->presuspend has been called.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: dm-mq and end_clone_request()
  2016-08-04 10:09                                             ` Hannes Reinecke
@ 2016-08-04 15:10                                               ` Mike Snitzer
  0 siblings, 0 replies; 74+ messages in thread
From: Mike Snitzer @ 2016-08-04 15:10 UTC (permalink / raw)
  To: Hannes Reinecke; +Cc: dm-devel

On Thu, Aug 04 2016 at  6:09am -0400,
Hannes Reinecke <hare@suse.de> wrote:

> On 08/04/2016 11:53 AM, Hannes Reinecke wrote:
> > On 08/03/2016 06:55 PM, Bart Van Assche wrote:
> >> On 08/02/2016 05:40 PM, Mike Snitzer wrote:
> >>> But I asked you to run the v4.7 kernel patches I
> >>> pointed to _without_ any of your debug patches.
> >>
> >> I need several patches to fix bugs that are not related to the device
> >> mapper, e.g. "sched: Avoid that __wait_on_bit_lock() hangs"
> >> (https://lkml.org/lkml/2016/8/3/289).
> >>
> > Hmm. Can you test with this patch?
> > 
> > diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
> > index 7790a70..9daed03 100644
> > --- a/drivers/md/dm-mpath.c
> > +++ b/drivers/md/dm-mpath.c
> > @@ -439,8 +439,7 @@ static int must_push_back(struct multipath *m)
> >  {
> >         return (test_bit(MPATHF_QUEUE_IF_NO_PATH, &m->flags) ||
> >                 ((test_bit(MPATHF_QUEUE_IF_NO_PATH, &m->flags) !=
> > -                 test_bit(MPATHF_SAVED_QUEUE_IF_NO_PATH, &m->flags)) &&
> > -                dm_noflush_suspending(m->ti)));
> > +                 test_bit(MPATHF_SAVED_QUEUE_IF_NO_PATH, &m->flags)));
> >  }
> > 
> >  /*
> > 
> > Reasoning:
> > The original check for dm_noflush_suspending() was for bio-based
> > drivers, which needed to queue I/O within the device-mapper core.
> > So during suspend this I/O would keep a reference to the device-mapper
> > core and the table couldn't be swapped.
> > For request-based multipathing, however, the I/O is _never_ held within
> > the device-mapper core but rather pushed back to the request queue.
> > I.e. even for pushback the I/O will never hold a reference to the
> > device-mapper core, and the tables can be swapped irrespective of the
> > 'dm_noflush_suspending()' setting.
> > 
> > Or that's the idea, at least :-)
> > 
> > Yes Mike, I know, it's not going to work with bio-based multipathing.
> > But this is just for figuring out where the real issue is.
> > 
> And indeed.
> 
> multipathd is calling DM_SUSPEND _without_ the noflush_suspending flag.
> (On the grounds that originally it needed to flush all I/O from the
> device-mapper core).
> This will cause I/O errors if any I/O is executed after
> ->presuspend has been called.

The only time multipathd doesn't use noflush is on resize.  Otherwise
I'm pretty sure it _does_ use noflush all the time.

But the point is that the map method shouldn't be called while the
multipath device is suspended.

I already provided fixes for this, staged here:
https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/log/?h=dm-4.8

and relative to 4.7:
https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/log/?h=dm-4.7-mpath-fixes

With these patches our testing on real SRP hardware testbed (fast DDN
backend) doesn't see any IO errors.

But I'll revisit must_push_back relative to dm_noflush_suspending();
specifically the new must_push_back_rq() could be made to not check
dm_noflush_suspending().

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: dm-mq and end_clone_request()
  2016-08-03 16:55                                         ` Bart Van Assche
  2016-08-04  9:53                                           ` Hannes Reinecke
@ 2016-08-04 16:10                                           ` Mike Snitzer
  2016-08-04 17:42                                             ` Bart Van Assche
  1 sibling, 1 reply; 74+ messages in thread
From: Mike Snitzer @ 2016-08-04 16:10 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: dm-devel, Laurence Oberman, linux-scsi

On Wed, Aug 03 2016 at 12:55pm -0400,
Bart Van Assche <bart.vanassche@sandisk.com> wrote:

> On 08/02/2016 05:40 PM, Mike Snitzer wrote:
> >But I asked you to run the v4.7 kernel patches I
> >pointed to _without_ any of your debug patches.
> 
> I need several patches to fix bugs that are not related to the
> device mapper, e.g. "sched: Avoid that __wait_on_bit_lock() hangs"
> (https://lkml.org/lkml/2016/8/3/289).

OK, but you have way more changes than seem needed.  In particular the
blk-mq error handling changes look suspect.

I'm also not sure what REQ_FAIL_IF_NO_PATH is all about (I vaguely recall
seeing it before and suggesting you use SCSI's more traditional
differentiated I/O errors).

Anyway, at this point you're having us test too many changes that aren't
yet upstream:

$ git diff bart/srp-initiator-for-next dm/dm-4.7-mpath-fixes -- drivers block include kernel | diffstat
 block/bio-integrity.c                   |    1
 block/blk-cgroup.c                      |    4
 block/blk-core.c                        |   16 ---
 block/blk-mq.c                          |   16 ---
 block/partition-generic.c               |    3
 drivers/acpi/acpica/nswalk.c            |    1
 drivers/infiniband/core/rw.c            |   24 +++--
 drivers/infiniband/core/verbs.c         |    9 --
 drivers/infiniband/hw/hfi1/Kconfig      |    1
 drivers/infiniband/hw/mlx4/qp.c         |    6 -
 drivers/infiniband/sw/rdmavt/Kconfig    |    1
 drivers/infiniband/ulp/isert/ib_isert.c |    2
 drivers/infiniband/ulp/isert/ib_isert.h |    1
 drivers/infiniband/ulp/srp/ib_srp.c     |  131 --------------------------------
 drivers/infiniband/ulp/srp/ib_srp.h     |    5 -
 drivers/infiniband/ulp/srpt/ib_srpt.c   |   10 +-
 drivers/infiniband/ulp/srpt/ib_srpt.h   |    6 -
 drivers/md/dm-crypt.c                   |    4
 drivers/md/dm-ioctl.c                   |   77 +++++++++---------
 drivers/md/dm-mpath.c                   |   32 -------
 drivers/md/dm.c                         |   22 -----
 drivers/scsi/scsi_lib.c                 |   36 +-------
 drivers/scsi/scsi_priv.h                |    2
 drivers/scsi/scsi_scan.c                |    2
 drivers/scsi/scsi_sysfs.c               |   48 -----------
 drivers/scsi/sd.c                       |    6 -
 drivers/scsi/sg.c                       |    3
 include/linux/blk-mq.h                  |    3
 include/linux/blk_types.h               |    5 -
 include/linux/blkdev.h                  |    1
 include/linux/dmar.h                    |    2
 include/rdma/ib_verbs.h                 |    6 -
 include/scsi/scsi_device.h              |    2
 kernel/sched/wait.c                     |    2
 34 files changed, 106 insertions(+), 384 deletions(-)

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: dm-mq and end_clone_request()
  2016-08-04 16:10                                           ` Mike Snitzer
@ 2016-08-04 17:42                                             ` Bart Van Assche
  2016-08-04 23:58                                               ` Mike Snitzer
  0 siblings, 1 reply; 74+ messages in thread
From: Bart Van Assche @ 2016-08-04 17:42 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: dm-devel, Laurence Oberman, linux-scsi

On 08/04/2016 09:10 AM, Mike Snitzer wrote:
> Anyway, at this point you're having us test too many changes that aren't
> yet upstream:
>
> $ git diff bart/srp-initiator-for-next dm/dm-4.7-mpath-fixes -- drivers block include kernel | diffstat
>  block/bio-integrity.c                   |    1
>  block/blk-cgroup.c                      |    4
>  block/blk-core.c                        |   16 ---
>  block/blk-mq.c                          |   16 ---
>  block/partition-generic.c               |    3
>  drivers/acpi/acpica/nswalk.c            |    1
>  drivers/infiniband/core/rw.c            |   24 +++--
>  drivers/infiniband/core/verbs.c         |    9 --
>  drivers/infiniband/hw/hfi1/Kconfig      |    1
>  drivers/infiniband/hw/mlx4/qp.c         |    6 -
>  drivers/infiniband/sw/rdmavt/Kconfig    |    1
>  drivers/infiniband/ulp/isert/ib_isert.c |    2
>  drivers/infiniband/ulp/isert/ib_isert.h |    1
>  drivers/infiniband/ulp/srp/ib_srp.c     |  131 --------------------------------
>  drivers/infiniband/ulp/srp/ib_srp.h     |    5 -
>  drivers/infiniband/ulp/srpt/ib_srpt.c   |   10 +-
>  drivers/infiniband/ulp/srpt/ib_srpt.h   |    6 -
>  drivers/md/dm-crypt.c                   |    4
>  drivers/md/dm-ioctl.c                   |   77 +++++++++---------
>  drivers/md/dm-mpath.c                   |   32 -------
>  drivers/md/dm.c                         |   22 -----
>  drivers/scsi/scsi_lib.c                 |   36 +-------
>  drivers/scsi/scsi_priv.h                |    2
>  drivers/scsi/scsi_scan.c                |    2
>  drivers/scsi/scsi_sysfs.c               |   48 -----------
>  drivers/scsi/sd.c                       |    6 -
>  drivers/scsi/sg.c                       |    3
>  include/linux/blk-mq.h                  |    3
>  include/linux/blk_types.h               |    5 -
>  include/linux/blkdev.h                  |    1
>  include/linux/dmar.h                    |    2
>  include/rdma/ib_verbs.h                 |    6 -
>  include/scsi/scsi_device.h              |    2
>  kernel/sched/wait.c                     |    2
>  34 files changed, 106 insertions(+), 384 deletions(-)

Hello Mike,

Most of the changes you are referring to either are already upstream,
are expected to arrive in Linus' tree later this week, or only add
debugging pr_info() statements. The changes that either are already
upstream or that are expected to be upstream soon are:

$ for b in origin/master dledford-rdma/k.o/for-4.8-1 dledford-rdma/k.o/for-4.8-2; do \
    git log v4.7..$b --author="Bart Van Assche" | grep ^commit -A4 | sed -n 's/^    //p'; done
block: Fix spelling in a source code comment
dm ioctl: Simplify parameter buffer management code
dm crypt: Fix sparse complaints
block/blk-cgroup.c: Declare local symbols static
block/bio-integrity.c: Add #include "blk.h"
block/partition-generic.c: Remove a set-but-not-used variable
IB/hfi1: Disable by default
IB/rdmavt: Disable by default
IB/isert: Remove an unused member variable
IB/srpt: Simplify srpt_queue_response()
IB/srpt: Limit the number of SG elements per work request
IB/core, RDMA RW API: Do not exceed QP SGE send limit
IB/core: Make rdma_rw_ctx_init() initialize all used fields

Bart.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: dm-mq and end_clone_request()
  2016-08-04 17:42                                             ` Bart Van Assche
@ 2016-08-04 23:58                                               ` Mike Snitzer
  2016-08-05  1:07                                                 ` Laurence Oberman
  2016-08-05 18:40                                                 ` Bart Van Assche
  0 siblings, 2 replies; 74+ messages in thread
From: Mike Snitzer @ 2016-08-04 23:58 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: dm-devel, Laurence Oberman, linux-scsi

I've staged another fix, Laurence is seeing success with this added:
https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-4.8&id=d50a6450104c237db1dc75314d17b78c990a8c05

I'll be sending all the fixes I've queued to Linus tonight or early
tomorrow (since I'll then be on vacation until Monday 8/15).

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: dm-mq and end_clone_request()
  2016-08-04 23:58                                               ` Mike Snitzer
@ 2016-08-05  1:07                                                 ` Laurence Oberman
  2016-08-05 11:43                                                   ` Laurence Oberman
  2016-08-05 18:40                                                 ` Bart Van Assche
  1 sibling, 1 reply; 74+ messages in thread
From: Laurence Oberman @ 2016-08-05  1:07 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: Bart Van Assche, dm-devel, linux-scsi



----- Original Message -----
> From: "Mike Snitzer" <snitzer@redhat.com>
> To: "Bart Van Assche" <bart.vanassche@sandisk.com>
> Cc: dm-devel@redhat.com, "Laurence Oberman" <loberman@redhat.com>, linux-scsi@vger.kernel.org
> Sent: Thursday, August 4, 2016 7:58:50 PM
> Subject: Re: dm-mq and end_clone_request()
> 
> I've staged another fix, Laurence is seeing success with this added:
> https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-4.8&id=d50a6450104c237db1dc75314d17b78c990a8c05
> 
> I'll be sending all the fixes I've queued to Linus tonight or early
> tomorrow (since I'll then be on vacation until Monday 8/15).
> 
Hello Bart,

I applied that patch to your kernel and, while I still obviously see all the debug logging, it's no longer failing fio for me.
I ran 8 loops with 20 parallel fio runs. This was on a different server to the one I had been testing on.

However I am concerned about timing playing a part here, so let us know what you find.

Thanks
Laurence

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: dm-mq and end_clone_request()
  2016-08-05  1:07                                                 ` Laurence Oberman
@ 2016-08-05 11:43                                                   ` Laurence Oberman
  2016-08-05 15:39                                                     ` Laurence Oberman
  2016-08-05 18:42                                                     ` [dm-devel] " Bart Van Assche
  0 siblings, 2 replies; 74+ messages in thread
From: Laurence Oberman @ 2016-08-05 11:43 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: Bart Van Assche, dm-devel, linux-scsi



----- Original Message -----
> From: "Laurence Oberman" <loberman@redhat.com>
> To: "Mike Snitzer" <snitzer@redhat.com>
> Cc: "Bart Van Assche" <bart.vanassche@sandisk.com>, dm-devel@redhat.com, linux-scsi@vger.kernel.org
> Sent: Thursday, August 4, 2016 9:07:28 PM
> Subject: Re: dm-mq and end_clone_request()
> 
> 
> 
> ----- Original Message -----
> > From: "Mike Snitzer" <snitzer@redhat.com>
> > To: "Bart Van Assche" <bart.vanassche@sandisk.com>
> > Cc: dm-devel@redhat.com, "Laurence Oberman" <loberman@redhat.com>,
> > linux-scsi@vger.kernel.org
> > Sent: Thursday, August 4, 2016 7:58:50 PM
> > Subject: Re: dm-mq and end_clone_request()
> > 
> > I've staged another fix, Laurence is seeing success with this added:
> > https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-4.8&id=d50a6450104c237db1dc75314d17b78c990a8c05
> > 
> > I'll be sending all the fixes I've queued to Linus tonight or early
> > tomorrow (since I'll then be on vacation until Monday 8/15).
> > 
> Hello Bart,
> 
> I applied that patch to your kernel and, while I still obviously see all the
> debug logging, it's no longer failing fio for me.
> I ran 8 loops with 20 parallel fio runs. This was on a different server to
> the one I had been testing on.
> 
> However I am concerned about timing playing a part here, so let us know
> what you find.
> 
> Thanks
> Laurence
Replying to my own message:

Hi Bart, Mike

Further testing has shown we are still exposed here so more investigation is necessary.
The above patch seems to help but I still see sporadic cases of errors escaping up the stack.

I expect you will see the same so more work to do here to figure this out.

Thanks
Laurence

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: dm-mq and end_clone_request()
  2016-08-05 11:43                                                   ` Laurence Oberman
@ 2016-08-05 15:39                                                     ` Laurence Oberman
  2016-08-05 15:43                                                       ` Bart Van Assche
  2016-08-05 18:42                                                     ` [dm-devel] " Bart Van Assche
  1 sibling, 1 reply; 74+ messages in thread
From: Laurence Oberman @ 2016-08-05 15:39 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: Bart Van Assche, dm-devel, linux-scsi



----- Original Message -----
> From: "Laurence Oberman" <loberman@redhat.com>
> To: "Mike Snitzer" <snitzer@redhat.com>
> Cc: "Bart Van Assche" <bart.vanassche@sandisk.com>, dm-devel@redhat.com, linux-scsi@vger.kernel.org
> Sent: Friday, August 5, 2016 7:43:30 AM
> Subject: Re: dm-mq and end_clone_request()
> 
> 
> 
> ----- Original Message -----
> > From: "Laurence Oberman" <loberman@redhat.com>
> > To: "Mike Snitzer" <snitzer@redhat.com>
> > Cc: "Bart Van Assche" <bart.vanassche@sandisk.com>, dm-devel@redhat.com,
> > linux-scsi@vger.kernel.org
> > Sent: Thursday, August 4, 2016 9:07:28 PM
> > Subject: Re: dm-mq and end_clone_request()
> > 
> > 
> > 
> > ----- Original Message -----
> > > From: "Mike Snitzer" <snitzer@redhat.com>
> > > To: "Bart Van Assche" <bart.vanassche@sandisk.com>
> > > Cc: dm-devel@redhat.com, "Laurence Oberman" <loberman@redhat.com>,
> > > linux-scsi@vger.kernel.org
> > > Sent: Thursday, August 4, 2016 7:58:50 PM
> > > Subject: Re: dm-mq and end_clone_request()
> > > 
> > > I've staged another fix, Laurence is seeing success with this added:
> > > https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-4.8&id=d50a6450104c237db1dc75314d17b78c990a8c05
> > > 
> > > I'll be sending all the fixes I've queued to Linus tonight or early
> > > tomorrow (since I'll then be on vacation until Monday 8/15).
> > > 
> > Hello Bart,
> > 
> > I applied that patch to your kernel and, while I still obviously see all the
> > debug logging, it's no longer failing fio for me.
> > I ran 8 loops with 20 parallel fio runs. This was on a different server to
> > the one I had been testing on.
> > 
> > However I am concerned about timing playing a part here, so let us know
> > what you find.
> > 
> > Thanks
> > Laurence
> Replying to my own message:
> 
> Hi Bart, Mike
> 
> Further testing has shown we are still exposed here so more investigation is
> necessary.
> The above patch seems to help but I still see sporadic cases of errors
> escaping up the stack.
> 
> I expect you will see the same so more work to do here to figure this out.
> 
> Thanks
> Laurence
> 
Hello Bart

I completely forgot I had set no_path_retry=12, so after 12 retries it will error out.
This is likely why I had different results seemingly affected by timing.
Mike reminded me of it this morning.

What do you have set for no_path_retry? When I set it to queue, it blocks the paths from coming back for some reason.
I am now investigating why that is happening :).
I see now I need to add "simultaneous all paths lost" scenarios to my QA testing, as it's not a common scenario.
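
For reference, my understanding of the two settings, written as hypothetical
multipath.conf fragments:

defaults {
        # Retry for 12 checker intervals after the last path fails,
        # then fail I/O up the stack (what I had set).
        no_path_retry 12
}

defaults {
        # Queue I/O indefinitely until a path comes back.
        no_path_retry queue
}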

Thanks
Laurence

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: dm-mq and end_clone_request()
  2016-08-05 15:39                                                     ` Laurence Oberman
@ 2016-08-05 15:43                                                       ` Bart Van Assche
  0 siblings, 0 replies; 74+ messages in thread
From: Bart Van Assche @ 2016-08-05 15:43 UTC (permalink / raw)
  To: Laurence Oberman, Mike Snitzer; +Cc: dm-devel, linux-scsi

On 08/05/2016 08:39 AM, Laurence Oberman wrote:
> I completely forgot I had set no_path_retry=12, so after 12 retries it will error out.
> This is likely why I had different results seemingly affected by timing.
> Mike reminded me of it this morning.
> 
> What do you have set for no_path_retry? When I set it to queue, it blocks the paths from coming back for some reason.
> I am now investigating why that is happening :).
> I see now I need to add "simultaneous all paths lost" scenarios to my QA testing, as it's not a common scenario.

Hello Laurence,

I'm using the following multipath.conf file for the tests I run:

defaults {
	user_friendly_names	yes
	queue_without_daemon	no
}

blacklist {
	device {
		vendor			"ATA"
		product			".*"
	}
}

devices {
	device {
		vendor			"SCST_BIO|LIO-ORG"
		product			".*"
		features		"3 queue_if_no_path pg_init_retries 50"
		path_grouping_policy	group_by_prio
		path_selector		"queue-length 0"
		path_checker		tur
	}
}

blacklist_exceptions {
        property        ".*"
}

Bart.


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [dm-devel] dm-mq and end_clone_request()
  2016-08-04 23:58                                               ` Mike Snitzer
  2016-08-05  1:07                                                 ` Laurence Oberman
@ 2016-08-05 18:40                                                 ` Bart Van Assche
  1 sibling, 0 replies; 74+ messages in thread
From: Bart Van Assche @ 2016-08-05 18:40 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: dm-devel, Laurence Oberman, linux-scsi

On 08/04/2016 04:58 PM, Mike Snitzer wrote:
> I've staged another fix, Laurence is seeing success with this added:
> https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-4.8&id=d50a6450104c237db1dc75314d17b78c990a8c05

Thanks Mike. I have started testing that fix this morning.

Bart.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [dm-devel] dm-mq and end_clone_request()
  2016-08-05 11:43                                                   ` Laurence Oberman
  2016-08-05 15:39                                                     ` Laurence Oberman
@ 2016-08-05 18:42                                                     ` Bart Van Assche
  2016-08-06 14:47                                                       ` Laurence Oberman
  1 sibling, 1 reply; 74+ messages in thread
From: Bart Van Assche @ 2016-08-05 18:42 UTC (permalink / raw)
  To: Laurence Oberman, Mike Snitzer; +Cc: dm-devel, linux-scsi

On 08/05/2016 04:43 AM, Laurence Oberman wrote:
> Further testing has shown we are still exposed here so more investigation is necessary.
> The above patch seems to help but I still see sporadic cases of errors escaping up the stack.
>
> I expect you will see the same so more work to do here to figure this out.

Hello Laurence,

Unfortunately I also still see sporadic I/O errors when testing 
all-paths-down with CONFIG_DM_MQ_DEFAULT=n (I have not yet tried to 
retest with CONFIG_DM_MQ_DEFAULT=y).

Bart.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: dm-mq and end_clone_request()
  2016-08-05 18:42                                                     ` [dm-devel] " Bart Van Assche
@ 2016-08-06 14:47                                                       ` Laurence Oberman
  2016-08-07 22:31                                                         ` [dm-devel] " Bart Van Assche
  2016-08-08 15:11                                                         ` Bart Van Assche
  0 siblings, 2 replies; 74+ messages in thread
From: Laurence Oberman @ 2016-08-06 14:47 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: dm-devel, linux-scsi, Mike Snitzer



----- Original Message -----
> From: "Bart Van Assche" <bart.vanassche@sandisk.com>
> To: "Laurence Oberman" <loberman@redhat.com>, "Mike Snitzer" <snitzer@redhat.com>
> Cc: dm-devel@redhat.com, linux-scsi@vger.kernel.org
> Sent: Friday, August 5, 2016 2:42:49 PM
> Subject: Re: [dm-devel] dm-mq and end_clone_request()
> 
> On 08/05/2016 04:43 AM, Laurence Oberman wrote:
> > Further testing has shown we are still exposed here so more investigation
> > is necessary.
> > The above patch seems to help but I still see sporadic cases of errors
> > escaping up the stack.
> >
> > I expect you will see the same so more work to do here to figure this out.
> 
> Hello Laurence,
> 
> Unfortunately I also still see sporadic I/O errors when testing
> all-paths-down with CONFIG_DM_MQ_DEFAULT=n (I have not yet tried to
> retest with CONFIG_DM_MQ_DEFAULT=y).
> 
> Bart.
> 
Hello Bart,

I am still debugging this, now that I have no_path_retry=queue and not a count :)
I am often hitting the host delete race; have you seen this in your testing during debugging?

I am using your kernel built from your git tree that has Mike's patches applied.
4.7.0bart

[66813.896159] Hardware name: HP ProLiant DL380 G7, BIOS P67 08/16/2015
[66813.933246] Workqueue: srp_remove srp_remove_work [ib_srp]
[66813.964703]  0000000000000086 00000000d185b9ce ffff88060fa03d20 ffffffff813456df
[66814.007292]  0000000000000000 0000000000000000 ffff88060fa03d60 ffffffff81089fb1
[66814.049336]  0000007da067604b ffff880c01643d80 0000000000017ec0 ffff880c016447dc
[66814.091725] Call Trace:
[66814.104775]  <IRQ>  [<ffffffff813456df>] dump_stack+0x63/0x84
[66814.136507]  [<ffffffff81089fb1>] __warn+0xd1/0xf0
[66814.163118]  [<ffffffff8108a0ed>] warn_slowpath_null+0x1d/0x20
[66814.195409]  [<ffffffff8104fd7e>] native_smp_send_reschedule+0x3e/0x40
[66814.231954]  [<ffffffff810b47db>] try_to_wake_up+0x30b/0x390
[66814.263661]  [<ffffffff810b4912>] default_wake_function+0x12/0x20
[66814.297713]  [<ffffffff810ccb05>] __wake_up_common+0x55/0x90
[66814.330021]  [<ffffffff810ccb53>] __wake_up_locked+0x13/0x20
[66814.361906]  [<ffffffff81261179>] ep_poll_callback+0xb9/0x200
[66814.392784]  [<ffffffff810ccb05>] __wake_up_common+0x55/0x90
[66814.424908]  [<ffffffff810ccc59>] __wake_up+0x39/0x50
[66814.454327]  [<ffffffff810e1f80>] wake_up_klogd_work_func+0x40/0x60
[66814.490152]  [<ffffffff81177b6d>] irq_work_run_list+0x4d/0x70
[66814.523007]  [<ffffffff810710d0>] ? do_flush_tlb_all+0x50/0x50
[66814.556161]  [<ffffffff81177bbc>] irq_work_run+0x2c/0x30
[66814.586677]  [<ffffffff8110ab5f>] flush_smp_call_function_queue+0x8f/0x160
[66814.625667]  [<ffffffff8110b613>] generic_smp_call_function_single_interrupt+0x13/0x60
[66814.669276]  [<ffffffff81050167>] smp_call_function_interrupt+0x27/0x40
[66814.706255]  [<ffffffff816c7e9c>] call_function_interrupt+0x8c/0xa0
[66814.741406]  <EOI>  [<ffffffff8118e733>] ? panic+0x1ef/0x233
[66814.772851]  [<ffffffff8118e72f>] ? panic+0x1eb/0x233
[66814.800207]  [<ffffffff810308f8>] oops_end+0xb8/0xd0
[66814.827454]  [<ffffffff8106977e>] no_context+0x13e/0x3a0
[66814.858368]  [<ffffffff811f3feb>] ? __slab_free+0x9b/0x280
[66814.890365]  [<ffffffff81069ace>] __bad_area_nosemaphore+0xee/0x1d0
[66814.926508]  [<ffffffff81069bc4>] bad_area_nosemaphore+0x14/0x20
[66814.959939]  [<ffffffff8106a269>] __do_page_fault+0x89/0x4a0
[66814.992039]  [<ffffffff811f3feb>] ? __slab_free+0x9b/0x280
[66815.023052]  [<ffffffff8106a6b0>] do_page_fault+0x30/0x80
[66815.053368]  [<ffffffff816c8b88>] page_fault+0x28/0x30
[66815.083196]  [<ffffffff814ae4e9>] ? __scsi_remove_device+0x79/0x160
[66815.117444]  [<ffffffff814ae5c2>] ? __scsi_remove_device+0x152/0x160
[66815.152051]  [<ffffffff814ac790>] scsi_forget_host+0x60/0x70
[66815.183939]  [<ffffffff814a0137>] scsi_remove_host+0x77/0x110
[66815.216152]  [<ffffffffa0677be0>] srp_remove_work+0x90/0x200 [ib_srp]
[66815.253221]  [<ffffffff810a2e72>] process_one_work+0x152/0x400
[66815.286221]  [<ffffffff810a3765>] worker_thread+0x125/0x4b0
[66815.317313]  [<ffffffff810a3640>] ? rescuer_thread+0x380/0x380
[66815.349770]  [<ffffffff810a9298>] kthread+0xd8/0xf0
[66815.376082]  [<ffffffff816c6b3f>] ret_from_fork+0x1f/0x40
[66815.404767]  [<ffffffff810a91c0>] ? kthread_park+0x60/0x60
[66815.436448] ---[ end trace bfaf79198d0976f5 ]---

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [dm-devel] dm-mq and end_clone_request()
  2016-08-06 14:47                                                       ` Laurence Oberman
@ 2016-08-07 22:31                                                         ` Bart Van Assche
  2016-08-08 12:45                                                           ` Laurence Oberman
  2016-08-08 15:11                                                         ` Bart Van Assche
  1 sibling, 1 reply; 74+ messages in thread
From: Bart Van Assche @ 2016-08-07 22:31 UTC (permalink / raw)
  To: Laurence Oberman; +Cc: Mike Snitzer, dm-devel, linux-scsi

On 08/06/16 07:47, Laurence Oberman wrote:
> [66813.933246] Workqueue: srp_remove srp_remove_work [ib_srp]
> [ ... ]
> [66815.152051]  [<ffffffff814ac790>] scsi_forget_host+0x60/0x70
> [66815.183939]  [<ffffffff814a0137>] scsi_remove_host+0x77/0x110
> [66815.216152]  [<ffffffffa0677be0>] srp_remove_work+0x90/0x200 [ib_srp]
> [66815.253221]  [<ffffffff810a2e72>] process_one_work+0x152/0x400
> [66815.286221]  [<ffffffff810a3765>] worker_thread+0x125/0x4b0
> [66815.317313]  [<ffffffff810a3640>] ? rescuer_thread+0x380/0x380
> [66815.349770]  [<ffffffff810a9298>] kthread+0xd8/0xf0
> [66815.376082]  [<ffffffff816c6b3f>] ret_from_fork+0x1f/0x40
> [66815.404767]  [<ffffffff810a91c0>] ? kthread_park+0x60/0x60

Hello Laurence,

This is a callstack I have not yet encountered myself during any test. 
Please provide the output of the following commands:
$ gdb /lib/modules/$(uname -r)/build/vmlinux
(gdb) list *(scsi_forget_host+0x60)

Thanks,

Bart.


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [dm-devel] dm-mq and end_clone_request()
  2016-08-07 22:31                                                         ` [dm-devel] " Bart Van Assche
@ 2016-08-08 12:45                                                           ` Laurence Oberman
  2016-08-08 13:44                                                               ` Johannes Thumshirn
  0 siblings, 1 reply; 74+ messages in thread
From: Laurence Oberman @ 2016-08-08 12:45 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: Mike Snitzer, dm-devel, linux-scsi



----- Original Message -----
> From: "Bart Van Assche" <bvanassche@acm.org>
> To: "Laurence Oberman" <loberman@redhat.com>
> Cc: "Mike Snitzer" <snitzer@redhat.com>, dm-devel@redhat.com, linux-scsi@vger.kernel.org
> Sent: Sunday, August 7, 2016 6:31:11 PM
> Subject: Re: [dm-devel] dm-mq and end_clone_request()
> 
> On 08/06/16 07:47, Laurence Oberman wrote:
> > [66813.933246] Workqueue: srp_remove srp_remove_work [ib_srp]
> > [ ... ]
> > [66815.152051]  [<ffffffff814ac790>] scsi_forget_host+0x60/0x70
> > [66815.183939]  [<ffffffff814a0137>] scsi_remove_host+0x77/0x110
> > [66815.216152]  [<ffffffffa0677be0>] srp_remove_work+0x90/0x200 [ib_srp]
> > [66815.253221]  [<ffffffff810a2e72>] process_one_work+0x152/0x400
> > [66815.286221]  [<ffffffff810a3765>] worker_thread+0x125/0x4b0
> > [66815.317313]  [<ffffffff810a3640>] ? rescuer_thread+0x380/0x380
> > [66815.349770]  [<ffffffff810a9298>] kthread+0xd8/0xf0
> > [66815.376082]  [<ffffffff816c6b3f>] ret_from_fork+0x1f/0x40
> > [66815.404767]  [<ffffffff810a91c0>] ? kthread_park+0x60/0x60
> 
> Hello Laurence,
> 
> This is a callstack I have not yet encountered myself during any test.
> Please provide the output of the following commands:
> $ gdb /lib/modules/$(uname -r)/build/vmlinux
> (gdb) list *(scsi_forget_host+0x60)
> 
> Thanks,
> 
> Bart.
> 
> 
[loberman@jumptest1 linux]$ gdb vmlinux
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-80.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/loberman/bart/linux/vmlinux...done.
(gdb) list *(scsi_forget_host+0x60)
0xffffffff814ac790 is in scsi_forget_host (drivers/scsi/scsi_scan.c:1895).
1890		list_for_each_entry(sdev, &shost->__devices, siblings) {
1891			if (sdev->sdev_state == SDEV_DEL)
1892				continue;
1893			spin_unlock_irqrestore(shost->host_lock, flags);
1894			__scsi_remove_device(sdev);
1895			goto restart;
1896		}
1897		spin_unlock_irqrestore(shost->host_lock, flags);
1898	}
1899	

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [dm-devel] dm-mq and end_clone_request()
  2016-08-08 12:45                                                           ` Laurence Oberman
@ 2016-08-08 13:44                                                               ` Johannes Thumshirn
  0 siblings, 0 replies; 74+ messages in thread
From: Johannes Thumshirn @ 2016-08-08 13:44 UTC (permalink / raw)
  To: Laurence Oberman; +Cc: Bart Van Assche, Mike Snitzer, dm-devel, linux-scsi

On Mon, Aug 08, 2016 at 08:45:59AM -0400, Laurence Oberman wrote:
> 
> 
> ----- Original Message -----
> > From: "Bart Van Assche" <bvanassche@acm.org>
> > To: "Laurence Oberman" <loberman@redhat.com>
> > Cc: "Mike Snitzer" <snitzer@redhat.com>, dm-devel@redhat.com, linux-scsi@vger.kernel.org
> > Sent: Sunday, August 7, 2016 6:31:11 PM
> > Subject: Re: [dm-devel] dm-mq and end_clone_request()
> > 
> > On 08/06/16 07:47, Laurence Oberman wrote:
> > > [66813.933246] Workqueue: srp_remove srp_remove_work [ib_srp]
> > > [ ... ]
> > > [66815.152051]  [<ffffffff814ac790>] scsi_forget_host+0x60/0x70
> > > [66815.183939]  [<ffffffff814a0137>] scsi_remove_host+0x77/0x110
> > > [66815.216152]  [<ffffffffa0677be0>] srp_remove_work+0x90/0x200 [ib_srp]
> > > [66815.253221]  [<ffffffff810a2e72>] process_one_work+0x152/0x400
> > > [66815.286221]  [<ffffffff810a3765>] worker_thread+0x125/0x4b0
> > > [66815.317313]  [<ffffffff810a3640>] ? rescuer_thread+0x380/0x380
> > > [66815.349770]  [<ffffffff810a9298>] kthread+0xd8/0xf0
> > > [66815.376082]  [<ffffffff816c6b3f>] ret_from_fork+0x1f/0x40
> > > [66815.404767]  [<ffffffff810a91c0>] ? kthread_park+0x60/0x60
> > 
> > Hello Laurence,
> > 
> > This is a callstack I have not yet encountered myself during any test.
> > Please provide the output of the following commands:
> > $ gdb /lib/modules/$(uname -r)/build/vmlinux
> > (gdb) list *(scsi_forget_host+0x60)
> > 
> > Thanks,
> > 
> > Bart.
> > 
> [loberman@jumptest1 linux]$ gdb vmlinux
> GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-80.el7
> Copyright (C) 2013 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "x86_64-redhat-linux-gnu".
> For bug reporting instructions, please see:
> <http://www.gnu.org/software/gdb/bugs/>...
> Reading symbols from /home/loberman/bart/linux/vmlinux...done.
> (gdb) list *(scsi_forget_host+0x60)
> 0xffffffff814ac790 is in scsi_forget_host (drivers/scsi/scsi_scan.c:1895).
> 1890		list_for_each_entry(sdev, &shost->__devices, siblings) {
> 1891			if (sdev->sdev_state == SDEV_DEL)
> 1892				continue;
> 1893			spin_unlock_irqrestore(shost->host_lock, flags);
> 1894			__scsi_remove_device(sdev);
> 1895			goto restart;
> 1896		}
> 1897		spin_unlock_irqrestore(shost->host_lock, flags);
> 1898	}
> 1899	


So __scsi_remove_device() is also checking for sdev->sdev_state == SDEV_DEL
and returns if so. If it had the chance to do so, the goto restart would
be hit and we'd restart the list traversal. The if in turn just continues,
and I've seen endless loops with this pattern (check the 40998193560 ->
90a88d6ef88edc -> f05795d3d771f30a7bd commit chain).
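
To make the pattern concrete, here is a minimal userspace model of the
traversal in question (stand-in types and values, not the kernel source;
the host_lock is only indicated by comments):

#include <stdio.h>

enum sdev_state { SDEV_RUNNING, SDEV_CANCEL, SDEV_DEL };

struct sdev {
	enum sdev_state state;
	struct sdev *next;
};

static void forget_host(struct sdev *head)
{
restart:
	/* spin_lock_irqsave(shost->host_lock, flags) would go here */
	for (struct sdev *s = head; s; s = s->next) {
		if (s->state == SDEV_DEL)
			continue;	/* skip devices already torn down */
		/* spin_unlock_irqrestore() here; the list may now change */
		s->state = SDEV_DEL;	/* stand-in for __scsi_remove_device() */
		goto restart;		/* re-take the lock, rescan from the top */
	}
	/* spin_unlock_irqrestore() here */
}

int main(void)
{
	struct sdev c = { SDEV_DEL, NULL };	/* already deleted */
	struct sdev b = { SDEV_RUNNING, &c };
	struct sdev a = { SDEV_CANCEL, &b };

	forget_host(&a);
	printf("final states: %d %d %d (%d == SDEV_DEL)\n",
	       a.state, b.state, c.state, SDEV_DEL);
	return 0;
}

The model terminates because the stand-in removal always reaches SDEV_DEL;
if the real __scsi_remove_device() ever returned without moving the device
to SDEV_DEL, the rescan would keep finding the same entry forever.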

Might want to give the below patch a shot?


From fee838ebfea88b581994b3f855eab8da20b07fc9 Mon Sep 17 00:00:00 2001
From: Johannes Thumshirn <jthumshirn@suse.de>
Date: Mon, 8 Aug 2016 15:41:09 +0200
Subject: [PATCH] scsi: restart full list search when re-encountering a deleted device

__scsi_remove_device() already checks for reentrancy with a deleted device,
so there's no need to do it in scsi_forget_host() as well.

Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
---
 drivers/scsi/scsi_scan.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
index e0a78f5..1c5a4d6 100644
--- a/drivers/scsi/scsi_scan.c
+++ b/drivers/scsi/scsi_scan.c
@@ -1890,8 +1890,6 @@ void scsi_forget_host(struct Scsi_Host *shost)
  restart:
 	spin_lock_irqsave(shost->host_lock, flags);
 	list_for_each_entry(sdev, &shost->__devices, siblings) {
-		if (sdev->sdev_state == SDEV_DEL)
-			continue;
 		spin_unlock_irqrestore(shost->host_lock, flags);
 		__scsi_remove_device(sdev);
 		goto restart;
-- 
2.9.2


-- 
Johannes Thumshirn                                          Storage
jthumshirn@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* Re: [dm-devel] dm-mq and end_clone_request()
  2016-08-08 13:44                                                               ` Johannes Thumshirn
  (?)
@ 2016-08-08 14:32                                                               ` Laurence Oberman
  -1 siblings, 0 replies; 74+ messages in thread
From: Laurence Oberman @ 2016-08-08 14:32 UTC (permalink / raw)
  To: Johannes Thumshirn; +Cc: Bart Van Assche, Mike Snitzer, dm-devel, linux-scsi



----- Original Message -----
> From: "Johannes Thumshirn" <jthumshirn@suse.de>
> To: "Laurence Oberman" <loberman@redhat.com>
> Cc: "Bart Van Assche" <bvanassche@acm.org>, "Mike Snitzer" <snitzer@redhat.com>, dm-devel@redhat.com,
> linux-scsi@vger.kernel.org
> Sent: Monday, August 8, 2016 9:44:40 AM
> Subject: Re: [dm-devel] dm-mq and end_clone_request()
> 
> On Mon, Aug 08, 2016 at 08:45:59AM -0400, Laurence Oberman wrote:
> > 
> > 
> > ----- Original Message -----
> > > From: "Bart Van Assche" <bvanassche@acm.org>
> > > To: "Laurence Oberman" <loberman@redhat.com>
> > > Cc: "Mike Snitzer" <snitzer@redhat.com>, dm-devel@redhat.com,
> > > linux-scsi@vger.kernel.org
> > > Sent: Sunday, August 7, 2016 6:31:11 PM
> > > Subject: Re: [dm-devel] dm-mq and end_clone_request()
> > > 
> > > On 08/06/16 07:47, Laurence Oberman wrote:
> > > > [66813.933246] Workqueue: srp_remove srp_remove_work [ib_srp]
> > > > [ ... ]
> > > > [66815.152051]  [<ffffffff814ac790>] scsi_forget_host+0x60/0x70
> > > > [66815.183939]  [<ffffffff814a0137>] scsi_remove_host+0x77/0x110
> > > > [66815.216152]  [<ffffffffa0677be0>] srp_remove_work+0x90/0x200
> > > > [ib_srp]
> > > > [66815.253221]  [<ffffffff810a2e72>] process_one_work+0x152/0x400
> > > > [66815.286221]  [<ffffffff810a3765>] worker_thread+0x125/0x4b0
> > > > [66815.317313]  [<ffffffff810a3640>] ? rescuer_thread+0x380/0x380
> > > > [66815.349770]  [<ffffffff810a9298>] kthread+0xd8/0xf0
> > > > [66815.376082]  [<ffffffff816c6b3f>] ret_from_fork+0x1f/0x40
> > > > [66815.404767]  [<ffffffff810a91c0>] ? kthread_park+0x60/0x60
> > > 
> > > Hello Laurence,
> > > 
> > > This is a callstack I have not yet encountered myself during any test.
> > > Please provide the output of the following commands:
> > > $ gdb /lib/modules/$(uname -r)/build/vmlinux
> > > (gdb) list *(scsi_forget_host+0x60)
> > > 
> > > Thanks,
> > > 
> > > Bart.
> > > 
> > [loberman@jumptest1 linux]$ gdb vmlinux
> > GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-80.el7
> > Copyright (C) 2013 Free Software Foundation, Inc.
> > License GPLv3+: GNU GPL version 3 or later
> > <http://gnu.org/licenses/gpl.html>
> > This is free software: you are free to change and redistribute it.
> > There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
> > and "show warranty" for details.
> > This GDB was configured as "x86_64-redhat-linux-gnu".
> > For bug reporting instructions, please see:
> > <http://www.gnu.org/software/gdb/bugs/>...
> > Reading symbols from /home/loberman/bart/linux/vmlinux...done.
> > (gdb) list *(scsi_forget_host+0x60)
> > 0xffffffff814ac790 is in scsi_forget_host (drivers/scsi/scsi_scan.c:1895).
> > 1890		list_for_each_entry(sdev, &shost->__devices, siblings) {
> > 1891			if (sdev->sdev_state == SDEV_DEL)
> > 1892				continue;
> > 1893			spin_unlock_irqrestore(shost->host_lock, flags);
> > 1894			__scsi_remove_device(sdev);
> > 1895			goto restart;
> > 1896		}
> > 1897		spin_unlock_irqrestore(shost->host_lock, flags);
> > 1898	}
> > 1899
> 
> 
> So __scsi_remove_device() is also checking for sdev->sdev_state == SDEV_DEL
> and returns if so. If it had the chance to do so, the goto restart would
> be hit and we'd restart the list traversal. The if in turn just continues,
> and I've seen endless loops with this pattern (check the 40998193560 ->
> 90a88d6ef88edc -> f05795d3d771f30a7bd commit chain).
> 
> Might want to give the below patch a shot?
> 
> 
> From fee838ebfea88b581994b3f855eab8da20b07fc9 Mon Sep 17 00:00:00 2001
> From: Johannes Thumshirn <jthumshirn@suse.de>
> Date: Mon, 8 Aug 2016 15:41:09 +0200
> Subject: [PATCH] scsi: restart full list search when re-encountering a
> deleted device
> 
> __scsi_remove_device() already checks for reentrancy with a deleted device,
> so there's no need to do it in scsi_forget_host() as well.
> 
> Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
> ---
>  drivers/scsi/scsi_scan.c | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
> index e0a78f5..1c5a4d6 100644
> --- a/drivers/scsi/scsi_scan.c
> +++ b/drivers/scsi/scsi_scan.c
> @@ -1890,8 +1890,6 @@ void scsi_forget_host(struct Scsi_Host *shost)
>   restart:
>  	spin_lock_irqsave(shost->host_lock, flags);
>  	list_for_each_entry(sdev, &shost->__devices, siblings) {
> -		if (sdev->sdev_state == SDEV_DEL)
> -			continue;
>  		spin_unlock_irqrestore(shost->host_lock, flags);
>  		__scsi_remove_device(sdev);
>  		goto restart;
> --
> 2.9.2
> 
> 
> --
> Johannes Thumshirn                                          Storage
> jthumshirn@suse.de                                +49 911 74053 689
> SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
> GF: Felix Imendörffer, Jane Smithard, Graham Norton
> HRB 21284 (AG Nürnberg)
> Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850
> 
Johannes,
Thanks very much.
Regards
Laurence

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [dm-devel] dm-mq and end_clone_request()
  2016-08-08 13:44                                                               ` Johannes Thumshirn
  (?)
  (?)
@ 2016-08-08 14:54                                                               ` Bart Van Assche
  -1 siblings, 0 replies; 74+ messages in thread
From: Bart Van Assche @ 2016-08-08 14:54 UTC (permalink / raw)
  To: Johannes Thumshirn, Laurence Oberman; +Cc: Mike Snitzer, dm-devel, linux-scsi

On 08/08/16 06:44, Johannes Thumshirn wrote:
> So __scsi_remove_device() is also checking for sdev->sdev_state == SDEV_DEL
> and returns if so. If it had the chance to do so, the goto restart would
> be hit and we'd restart the list traversal. The if in turn just continues,
> and I've seen endless loops with this pattern (check the 40998193560 ->
> 90a88d6ef88edc -> f05795d3d771f30a7bd commit chain).
>
> Might want to give the below patch a shot?
>
>
> From fee838ebfea88b581994b3f855eab8da20b07fc9 Mon Sep 17 00:00:00 2001
> From: Johannes Thumshirn <jthumshirn@suse.de>
> Date: Mon, 8 Aug 2016 15:41:09 +0200
> Subject: [PATCH] scsi: restart full list search when re-encountering a deleted device
>
> __scsi_remove_device() already checks for reentrancy with a deleted device,
> so there's no need to do it in scsi_forget_host() as well.
>
> Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
> ---
>  drivers/scsi/scsi_scan.c | 2 --
>  1 file changed, 2 deletions(-)
>
> diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
> index e0a78f5..1c5a4d6 100644
> --- a/drivers/scsi/scsi_scan.c
> +++ b/drivers/scsi/scsi_scan.c
> @@ -1890,8 +1890,6 @@ void scsi_forget_host(struct Scsi_Host *shost)
>   restart:
>  	spin_lock_irqsave(shost->host_lock, flags);
>  	list_for_each_entry(sdev, &shost->__devices, siblings) {
> -		if (sdev->sdev_state == SDEV_DEL)
> -			continue;
>  		spin_unlock_irqrestore(shost->host_lock, flags);
>  		__scsi_remove_device(sdev);
>  		goto restart;

Hello Johannes,

Sorry, but I do not agree with this patch. It would cause
scsi_forget_host() to busy-wait until
scsi_device_dev_release_usercontext() is called whenever it encounters a
SCSI device on the host list that is in state SDEV_DEL. Additionally, I
don't see how this patch avoids an endless loop.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [dm-devel] dm-mq and end_clone_request()
  2016-08-06 14:47                                                       ` Laurence Oberman
  2016-08-07 22:31                                                         ` [dm-devel] " Bart Van Assche
@ 2016-08-08 15:11                                                         ` Bart Van Assche
  2016-08-08 15:26                                                           ` Laurence Oberman
  1 sibling, 1 reply; 74+ messages in thread
From: Bart Van Assche @ 2016-08-08 15:11 UTC (permalink / raw)
  To: Laurence Oberman; +Cc: dm-devel, linux-scsi, Mike Snitzer, Johannes Thumshirn

On 08/06/16 07:47, Laurence Oberman wrote:
> [66814.772851]  [<ffffffff8118e72f>] ? panic+0x1eb/0x233
> [66814.800207]  [<ffffffff810308f8>] oops_end+0xb8/0xd0
> [66814.827454]  [<ffffffff8106977e>] no_context+0x13e/0x3a0
> [66814.858368]  [<ffffffff811f3feb>] ? __slab_free+0x9b/0x280
> [66814.890365]  [<ffffffff81069ace>] __bad_area_nosemaphore+0xee/0x1d0
> [66814.926508]  [<ffffffff81069bc4>] bad_area_nosemaphore+0x14/0x20
> [66814.959939]  [<ffffffff8106a269>] __do_page_fault+0x89/0x4a0
> [66814.992039]  [<ffffffff811f3feb>] ? __slab_free+0x9b/0x280
> [66815.023052]  [<ffffffff8106a6b0>] do_page_fault+0x30/0x80
> [66815.053368]  [<ffffffff816c8b88>] page_fault+0x28/0x30
> [66815.083196]  [<ffffffff814ae4e9>] ? __scsi_remove_device+0x79/0x160
> [66815.117444]  [<ffffffff814ae5c2>] ? __scsi_remove_device+0x152/0x160
> [66815.152051]  [<ffffffff814ac790>] scsi_forget_host+0x60/0x70
> [66815.183939]  [<ffffffff814a0137>] scsi_remove_host+0x77/0x110
> [66815.216152]  [<ffffffffa0677be0>] srp_remove_work+0x90/0x200 [ib_srp]
> [66815.253221]  [<ffffffff810a2e72>] process_one_work+0x152/0x400
> [66815.286221]  [<ffffffff810a3765>] worker_thread+0x125/0x4b0
> [66815.317313]  [<ffffffff810a3640>] ? rescuer_thread+0x380/0x380
> [66815.349770]  [<ffffffff810a9298>] kthread+0xd8/0xf0
> [66815.376082]  [<ffffffff816c6b3f>] ret_from_fork+0x1f/0x40
> [66815.404767]  [<ffffffff810a91c0>] ? kthread_park+0x60/0x60

Hello Laurence and Johannes,

Later today I will start testing the following patch:


Subject: [PATCH] Fix a use-after-free in scsi_forget_host()

Avoid that scsi_forget_host() can call __scsi_remove_device()
while scsi_device_dev_release_usercontext() is freeing the same
SCSI device.
---
 drivers/scsi/scsi_scan.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
index e0a78f5..6cffc90 100644
--- a/drivers/scsi/scsi_scan.c
+++ b/drivers/scsi/scsi_scan.c
@@ -1890,10 +1890,11 @@ void scsi_forget_host(struct Scsi_Host *shost)
  restart:
 	spin_lock_irqsave(shost->host_lock, flags);
 	list_for_each_entry(sdev, &shost->__devices, siblings) {
-		if (sdev->sdev_state == SDEV_DEL)
+		if (sdev->sdev_state == SDEV_DEL || scsi_device_get(sdev) < 0)
 			continue;
 		spin_unlock_irqrestore(shost->host_lock, flags);
 		__scsi_remove_device(sdev);
+		scsi_device_put(sdev);
 		goto restart;
 	}
 	spin_unlock_irqrestore(shost->host_lock, flags);
-- 
2.9.2
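
The idea here is to pin the device with a reference while the host_lock is
still held, so that it cannot be freed once the lock is dropped. A minimal
userspace sketch of that pin-across-unlock idiom follows;
get_unless_zero()/put() merely mirror the scsi_device_get()/
scsi_device_put() return convention and are not kernel API:

#include <stdio.h>
#include <stdlib.h>

struct object {
	int refcount;		/* would be a kref in the kernel */
};

static int get_unless_zero(struct object *obj)
{
	if (obj->refcount == 0)
		return -1;	/* already on its way to being freed */
	obj->refcount++;
	return 0;
}

static void put(struct object *obj)
{
	if (--obj->refcount == 0)
		free(obj);
}

int main(void)
{
	struct object *obj = calloc(1, sizeof(*obj));

	obj->refcount = 1;
	/* lock held: pin the object before dropping the lock */
	if (get_unless_zero(obj) == 0) {
		/* lock dropped: obj stays valid even if the last
		 * external reference is dropped concurrently */
		printf("operating on pinned object\n");
		put(obj);	/* release our pin */
	}
	put(obj);		/* final put frees the object */
	return 0;
}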



^ permalink raw reply related	[flat|nested] 74+ messages in thread

* Re: [dm-devel] dm-mq and end_clone_request()
  2016-08-08 15:11                                                         ` Bart Van Assche
@ 2016-08-08 15:26                                                           ` Laurence Oberman
  2016-08-08 15:28                                                             ` Bart Van Assche
  2016-08-08 22:39                                                             ` Bart Van Assche
  0 siblings, 2 replies; 74+ messages in thread
From: Laurence Oberman @ 2016-08-08 15:26 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: dm-devel, linux-scsi, Mike Snitzer, Johannes Thumshirn



----- Original Message -----
> From: "Bart Van Assche" <Bart.VanAssche@sandisk.com>
> To: "Laurence Oberman" <loberman@redhat.com>
> Cc: dm-devel@redhat.com, linux-scsi@vger.kernel.org, "Mike Snitzer" <snitzer@redhat.com>, "Johannes Thumshirn"
> <jthumshirn@suse.de>
> Sent: Monday, August 8, 2016 11:11:56 AM
> Subject: Re: [dm-devel] dm-mq and end_clone_request()
> 
> On 08/06/16 07:47, Laurence Oberman wrote:
> > [66814.772851]  [<ffffffff8118e72f>] ? panic+0x1eb/0x233
> > [66814.800207]  [<ffffffff810308f8>] oops_end+0xb8/0xd0
> > [66814.827454]  [<ffffffff8106977e>] no_context+0x13e/0x3a0
> > [66814.858368]  [<ffffffff811f3feb>] ? __slab_free+0x9b/0x280
> > [66814.890365]  [<ffffffff81069ace>] __bad_area_nosemaphore+0xee/0x1d0
> > [66814.926508]  [<ffffffff81069bc4>] bad_area_nosemaphore+0x14/0x20
> > [66814.959939]  [<ffffffff8106a269>] __do_page_fault+0x89/0x4a0
> > [66814.992039]  [<ffffffff811f3feb>] ? __slab_free+0x9b/0x280
> > [66815.023052]  [<ffffffff8106a6b0>] do_page_fault+0x30/0x80
> > [66815.053368]  [<ffffffff816c8b88>] page_fault+0x28/0x30
> > [66815.083196]  [<ffffffff814ae4e9>] ? __scsi_remove_device+0x79/0x160
> > [66815.117444]  [<ffffffff814ae5c2>] ? __scsi_remove_device+0x152/0x160
> > [66815.152051]  [<ffffffff814ac790>] scsi_forget_host+0x60/0x70
> > [66815.183939]  [<ffffffff814a0137>] scsi_remove_host+0x77/0x110
> > [66815.216152]  [<ffffffffa0677be0>] srp_remove_work+0x90/0x200 [ib_srp]
> > [66815.253221]  [<ffffffff810a2e72>] process_one_work+0x152/0x400
> > [66815.286221]  [<ffffffff810a3765>] worker_thread+0x125/0x4b0
> > [66815.317313]  [<ffffffff810a3640>] ? rescuer_thread+0x380/0x380
> > [66815.349770]  [<ffffffff810a9298>] kthread+0xd8/0xf0
> > [66815.376082]  [<ffffffff816c6b3f>] ret_from_fork+0x1f/0x40
> > [66815.404767]  [<ffffffff810a91c0>] ? kthread_park+0x60/0x60
> 
> Hello Laurence and Johannes,
> 
> Later today I will start testing the following patch:
> 
> 
> Subject: [PATCH] Fix a use-after-free in scsi_forget_host()
> 
> Avoid that scsi_forget_host() can call __scsi_remove_device()
> while scsi_device_dev_release_usercontext() is freeing the same
> SCSI device.
> ---
>  drivers/scsi/scsi_scan.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
> index e0a78f5..6cffc90 100644
> --- a/drivers/scsi/scsi_scan.c
> +++ b/drivers/scsi/scsi_scan.c
> @@ -1890,10 +1890,11 @@ void scsi_forget_host(struct Scsi_Host *shost)
>   restart:
>  	spin_lock_irqsave(shost->host_lock, flags);
>  	list_for_each_entry(sdev, &shost->__devices, siblings) {
> -		if (sdev->sdev_state == SDEV_DEL)
> +		if (sdev->sdev_state == SDEV_DEL || scsi_device_get(sdev) < 0)
>  			continue;
>  		spin_unlock_irqrestore(shost->host_lock, flags);
>  		__scsi_remove_device(sdev);
> +		scsi_device_put(sdev);
>  		goto restart;
>  	}
>  	spin_unlock_irqrestore(shost->host_lock, flags);
> --
> 2.9.2
> 
> 
> 
Hello Bart, Johannes

I will test this as well.
I have lost my DDN array today (sadly :)) but I have two systems back to back again, using ramdisk on one of them to serve LUNs.

If I pull from https://github.com/bvanassche/linux again and switch to the srp-initiator-for-next branch, will I get all of Mike's latest patches from last week plus this one?
I guess I can just check myself, but might as well just ask.

Thanks!!
Laurence

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [dm-devel] dm-mq and end_clone_request()
  2016-08-08 15:26                                                           ` Laurence Oberman
@ 2016-08-08 15:28                                                             ` Bart Van Assche
  2016-08-08 22:39                                                             ` Bart Van Assche
  1 sibling, 0 replies; 74+ messages in thread
From: Bart Van Assche @ 2016-08-08 15:28 UTC (permalink / raw)
  To: Laurence Oberman; +Cc: dm-devel, linux-scsi, Mike Snitzer, Johannes Thumshirn

On 08/08/16 08:26, Laurence Oberman wrote:
>
>
> ----- Original Message -----
>> From: "Bart Van Assche" <Bart.VanAssche@sandisk.com>
>> To: "Laurence Oberman" <loberman@redhat.com>
>> Cc: dm-devel@redhat.com, linux-scsi@vger.kernel.org, "Mike Snitzer" <snitzer@redhat.com>, "Johannes Thumshirn"
>> <jthumshirn@suse.de>
>> Sent: Monday, August 8, 2016 11:11:56 AM
>> Subject: Re: [dm-devel] dm-mq and end_clone_request()
>>
>> On 08/06/16 07:47, Laurence Oberman wrote:
>>> [66814.772851]  [<ffffffff8118e72f>] ? panic+0x1eb/0x233
>>> [66814.800207]  [<ffffffff810308f8>] oops_end+0xb8/0xd0
>>> [66814.827454]  [<ffffffff8106977e>] no_context+0x13e/0x3a0
>>> [66814.858368]  [<ffffffff811f3feb>] ? __slab_free+0x9b/0x280
>>> [66814.890365]  [<ffffffff81069ace>] __bad_area_nosemaphore+0xee/0x1d0
>>> [66814.926508]  [<ffffffff81069bc4>] bad_area_nosemaphore+0x14/0x20
>>> [66814.959939]  [<ffffffff8106a269>] __do_page_fault+0x89/0x4a0
>>> [66814.992039]  [<ffffffff811f3feb>] ? __slab_free+0x9b/0x280
>>> [66815.023052]  [<ffffffff8106a6b0>] do_page_fault+0x30/0x80
>>> [66815.053368]  [<ffffffff816c8b88>] page_fault+0x28/0x30
>>> [66815.083196]  [<ffffffff814ae4e9>] ? __scsi_remove_device+0x79/0x160
>>> [66815.117444]  [<ffffffff814ae5c2>] ? __scsi_remove_device+0x152/0x160
>>> [66815.152051]  [<ffffffff814ac790>] scsi_forget_host+0x60/0x70
>>> [66815.183939]  [<ffffffff814a0137>] scsi_remove_host+0x77/0x110
>>> [66815.216152]  [<ffffffffa0677be0>] srp_remove_work+0x90/0x200 [ib_srp]
>>> [66815.253221]  [<ffffffff810a2e72>] process_one_work+0x152/0x400
>>> [66815.286221]  [<ffffffff810a3765>] worker_thread+0x125/0x4b0
>>> [66815.317313]  [<ffffffff810a3640>] ? rescuer_thread+0x380/0x380
>>> [66815.349770]  [<ffffffff810a9298>] kthread+0xd8/0xf0
>>> [66815.376082]  [<ffffffff816c6b3f>] ret_from_fork+0x1f/0x40
>>> [66815.404767]  [<ffffffff810a91c0>] ? kthread_park+0x60/0x60
>>
>> Hello Laurence and Johannes,
>>
>> Later today I will start testing the following patch:
>>
>>
>> Subject: [PATCH] Fix a use-after-free in scsi_forget_host()
>>
>> Avoid that scsi_forget_host() can call __scsi_remove_device()
>> while scsi_device_dev_release_usercontext() is freeing the same
>> SCSI device.
>> ---
>>  drivers/scsi/scsi_scan.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
>> index e0a78f5..6cffc90 100644
>> --- a/drivers/scsi/scsi_scan.c
>> +++ b/drivers/scsi/scsi_scan.c
>> @@ -1890,10 +1890,11 @@ void scsi_forget_host(struct Scsi_Host *shost)
>>   restart:
>>  	spin_lock_irqsave(shost->host_lock, flags);
>>  	list_for_each_entry(sdev, &shost->__devices, siblings) {
>> -		if (sdev->sdev_state == SDEV_DEL)
>> +		if (sdev->sdev_state == SDEV_DEL || scsi_device_get(sdev) < 0)
>>  			continue;
>>  		spin_unlock_irqrestore(shost->host_lock, flags);
>>  		__scsi_remove_device(sdev);
>> +		scsi_device_put(sdev);
>>  		goto restart;
>>  	}
>>  	spin_unlock_irqrestore(shost->host_lock, flags);
>> --
>> 2.9.2
>>
>>
>>
> Hello Bart, Johannes
>
> I will test this as well.
> I have lost my DDN array today (sadly :)) but I have two systems back to back again, using ramdisk on one of them to serve LUNs.
>
> If I pull from https://github.com/bvanassche/linux again and switch to the srp-initiator-for-next branch, will I get all of Mike's latest patches from last week plus this one?
> I guess I can just check myself, but might as well just ask.

Hello Laurence,

The above patch is not yet available on my srp-initiator-for-next 
branch. I will push out that patch later today after it has been tested.

Bart.


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [dm-devel] dm-mq and end_clone_request()
  2016-08-08 15:26                                                           ` Laurence Oberman
  2016-08-08 15:28                                                             ` Bart Van Assche
@ 2016-08-08 22:39                                                             ` Bart Van Assche
  2016-08-08 22:52                                                               ` Laurence Oberman
  1 sibling, 1 reply; 74+ messages in thread
From: Bart Van Assche @ 2016-08-08 22:39 UTC (permalink / raw)
  To: Laurence Oberman; +Cc: dm-devel, Mike Snitzer, linux-scsi, Johannes Thumshirn

On 08/08/2016 08:26 AM, Laurence Oberman wrote:
> I will test this as well.
> I have lost my DDN array today (sadly :)) but I have two systems
> back to back again, using ramdisk on one of them to serve LUNs.
> 
> If I pull from https://github.com/bvanassche/linux again and
> switch to the srp-initiator-for-next branch, will I get all of Mike's
> latest patches from last week plus this one? I guess I can just check
> myself, but might as well just ask.

Hello Laurence,

Sorry but I do not yet have a fix available for the scsi_forget_host()
crash you reported in an earlier e-mail. But Mike's latest patches
including the patch below are now available at
https://github.com/bvanassche/linux in the srp-initiator-for-next
branch. Further feedback is welcome.

Thanks,

Bart.

[PATCH] Check invariants at runtime

Warn if sdev->sdev_state != SDEV_DEL when __scsi_remove_device()
returns. Check whether all __scsi_remove_device() callers hold the
scan_mutex.
---
 drivers/scsi/scsi_sysfs.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index 82209ad4..a21e321 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -1312,6 +1312,8 @@ void __scsi_remove_device(struct scsi_device *sdev)
 {
 	struct device *dev = &sdev->sdev_gendev, *sdp = NULL;
 
+	lockdep_assert_held(&sdev->host->scan_mutex);
+
 	/*
 	 * This cleanup path is not reentrant and while it is impossible
 	 * to get a new reference with scsi_device_get() someone can still
@@ -1321,8 +1323,11 @@ void __scsi_remove_device(struct scsi_device *sdev)
 		return;
 
 	if (sdev->is_visible) {
-		if (scsi_device_set_state(sdev, SDEV_CANCEL) != 0)
+		if (scsi_device_set_state(sdev, SDEV_CANCEL) != 0) {
+			WARN_ONCE(sdev->sdev_state != SDEV_DEL,
+				  "sdev state %d\n", sdev->sdev_state);
 			return;
+		}
 
 		bsg_unregister_queue(sdev->request_queue);
 		sdp = scsi_get_ulpdev(dev);
@@ -1339,6 +1344,8 @@ void __scsi_remove_device(struct scsi_device *sdev)
 	 * device.
 	 */
 	scsi_device_set_state(sdev, SDEV_DEL);
+	WARN_ONCE(sdev->sdev_state != SDEV_DEL, "sdev state %d\n",
+		  sdev->sdev_state);
 	blk_cleanup_queue(sdev->request_queue);
 	cancel_work_sync(&sdev->requeue_work);
 
-- 
2.9.2
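
For anyone unfamiliar with WARN_ONCE(): it evaluates to the condition and
warns at most once per call site. A rough userspace approximation (GNU C;
the real kernel macro also dumps a stack trace):

#include <stdio.h>

#define WARN_ONCE(cond, fmt, ...)					\
	({								\
		static int __warned;					\
		int __cond = !!(cond);					\
		if (__cond && !__warned) {				\
			__warned = 1;					\
			fprintf(stderr, "WARNING: " fmt, ##__VA_ARGS__); \
		}							\
		__cond;							\
	})

int main(void)
{
	for (int state = 0; state < 3; state++)
		WARN_ONCE(state != 2, "sdev state %d\n", state);
	/* prints "WARNING: sdev state 0" once; later mismatches stay quiet */
	return 0;
}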

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* Re: [dm-devel] dm-mq and end_clone_request()
  2016-08-08 22:39                                                             ` Bart Van Assche
@ 2016-08-08 22:52                                                               ` Laurence Oberman
  2016-08-09  0:09                                                                 ` Laurence Oberman
  0 siblings, 1 reply; 74+ messages in thread
From: Laurence Oberman @ 2016-08-08 22:52 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: dm-devel, Mike Snitzer, linux-scsi, Johannes Thumshirn



----- Original Message -----
> From: "Bart Van Assche" <bart.vanassche@sandisk.com>
> To: "Laurence Oberman" <loberman@redhat.com>
> Cc: dm-devel@redhat.com, "Mike Snitzer" <snitzer@redhat.com>, linux-scsi@vger.kernel.org, "Johannes Thumshirn"
> <jthumshirn@suse.de>
> Sent: Monday, August 8, 2016 6:39:07 PM
> Subject: Re: [dm-devel] dm-mq and end_clone_request()
> 
> On 08/08/2016 08:26 AM, Laurence Oberman wrote:
> > I will test this as well.
> > I have lost my DDN array today (sadly :)) but I have two systems
> > back to back again, using ramdisk on one of them to serve LUNs.
> > 
> > If I pull from https://github.com/bvanassche/linux again and
> > switch to the srp-initiator-for-next branch, will I get all of Mike's
> > latest patches from last week plus this one? I guess I can just check
> > myself, but might as well just ask.
> 
> Hello Laurence,
> 
> Sorry but I do not yet have a fix available for the scsi_forget_host()
> crash you reported in an earlier e-mail. But Mike's latest patches
> including the patch below are now available at
> https://github.com/bvanassche/linux in the srp-initiator-for-next
> branch. Further feedback is welcome.
> 
> Thanks,
> 
> Bart.
> 
> [PATCH] Check invariants at runtime
> 
> Warn if sdev->sdev_state != SDEV_DEL when __scsi_remove_device()
> returns. Check whether all __scsi_remove_device() callers hold the
> scan_mutex.
> ---
>  drivers/scsi/scsi_sysfs.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
> index 82209ad4..a21e321 100644
> --- a/drivers/scsi/scsi_sysfs.c
> +++ b/drivers/scsi/scsi_sysfs.c
> @@ -1312,6 +1312,8 @@ void __scsi_remove_device(struct scsi_device *sdev)
>  {
>  	struct device *dev = &sdev->sdev_gendev, *sdp = NULL;
>  
> +	lockdep_assert_held(&sdev->host->scan_mutex);
> +
>  	/*
>  	 * This cleanup path is not reentrant and while it is impossible
>  	 * to get a new reference with scsi_device_get() someone can still
> @@ -1321,8 +1323,11 @@ void __scsi_remove_device(struct scsi_device *sdev)
>  		return;
>  
>  	if (sdev->is_visible) {
> -		if (scsi_device_set_state(sdev, SDEV_CANCEL) != 0)
> +		if (scsi_device_set_state(sdev, SDEV_CANCEL) != 0) {
> +			WARN_ONCE(sdev->sdev_state != SDEV_DEL,
> +				  "sdev state %d\n", sdev->sdev_state);
>  			return;
> +		}
>  
>  		bsg_unregister_queue(sdev->request_queue);
>  		sdp = scsi_get_ulpdev(dev);
> @@ -1339,6 +1344,8 @@ void __scsi_remove_device(struct scsi_device *sdev)
>  	 * device.
>  	 */
>  	scsi_device_set_state(sdev, SDEV_DEL);
> +	WARN_ONCE(sdev->sdev_state != SDEV_DEL, "sdev state %d\n",
> +		  sdev->sdev_state);
>  	blk_cleanup_queue(sdev->request_queue);
>  	cancel_work_sync(&sdev->requeue_work);
>  
> --
> 2.9.2
> 
Hello Bart

No problem, Sir. I did apply the patch just to help you test, and so far it has been stable.
I will revert it and carry on my debugging of the dm issue.
I do have the other patches in the original pull request I took, so I am running with all of Mike's patches.

Many Thanks as always for all the help you provide all of us.

Thanks
Laurence


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [dm-devel] dm-mq and end_clone_request()
  2016-08-08 22:52                                                               ` Laurence Oberman
@ 2016-08-09  0:09                                                                 ` Laurence Oberman
  2016-08-09 15:51                                                                   ` Bart Van Assche
  0 siblings, 1 reply; 74+ messages in thread
From: Laurence Oberman @ 2016-08-09  0:09 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: dm-devel, Mike Snitzer, linux-scsi, Johannes Thumshirn



----- Original Message -----
> From: "Laurence Oberman" <loberman@redhat.com>
> To: "Bart Van Assche" <bart.vanassche@sandisk.com>
> Cc: dm-devel@redhat.com, "Mike Snitzer" <snitzer@redhat.com>, linux-scsi@vger.kernel.org, "Johannes Thumshirn"
> <jthumshirn@suse.de>
> Sent: Monday, August 8, 2016 6:52:47 PM
> Subject: Re: [dm-devel] dm-mq and end_clone_request()
> 
> 
> 
> ----- Original Message -----
> > From: "Bart Van Assche" <bart.vanassche@sandisk.com>
> > To: "Laurence Oberman" <loberman@redhat.com>
> > Cc: dm-devel@redhat.com, "Mike Snitzer" <snitzer@redhat.com>,
> > linux-scsi@vger.kernel.org, "Johannes Thumshirn"
> > <jthumshirn@suse.de>
> > Sent: Monday, August 8, 2016 6:39:07 PM
> > Subject: Re: [dm-devel] dm-mq and end_clone_request()
> > 
> > On 08/08/2016 08:26 AM, Laurence Oberman wrote:
> > > I will test this as well.
> > > I have lost my DDN array today (sadly :)) but I have two systems
> > > back to back again, using ramdisk on one of them to serve LUNs.
> > > 
> > > If I pull from https://github.com/bvanassche/linux again and
> > > switch to the srp-initiator-for-next branch, will I get all of Mike's
> > > latest patches from last week plus this one? I guess I can just check
> > > myself, but might as well just ask.
> > 
> > Hello Laurence,
> > 
> > Sorry but I do not yet have a fix available for the scsi_forget_host()
> > crash you reported in an earlier e-mail. But Mike's latest patches
> > including the patch below are now available at
> > https://github.com/bvanassche/linux in the srp-initiator-for-next
> > branch. Further feedback is welcome.
> > 
> > Thanks,
> > 
> > Bart.
> > 
> > [PATCH] Check invariants at runtime
> > 
> > Warn if sdev->sdev_state != SDEV_DEL when __scsi_remove_device()
> > returns. Check whether all __scsi_remove_device() callers hold the
> > scan_mutex.
> > ---
> >  drivers/scsi/scsi_sysfs.c | 9 ++++++++-
> >  1 file changed, 8 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
> > index 82209ad4..a21e321 100644
> > --- a/drivers/scsi/scsi_sysfs.c
> > +++ b/drivers/scsi/scsi_sysfs.c
> > @@ -1312,6 +1312,8 @@ void __scsi_remove_device(struct scsi_device *sdev)
> >  {
> >  	struct device *dev = &sdev->sdev_gendev, *sdp = NULL;
> >  
> > +	lockdep_assert_held(&sdev->host->scan_mutex);
> > +
> >  	/*
> >  	 * This cleanup path is not reentrant and while it is impossible
> >  	 * to get a new reference with scsi_device_get() someone can still
> > @@ -1321,8 +1323,11 @@ void __scsi_remove_device(struct scsi_device *sdev)
> >  		return;
> >  
> >  	if (sdev->is_visible) {
> > -		if (scsi_device_set_state(sdev, SDEV_CANCEL) != 0)
> > +		if (scsi_device_set_state(sdev, SDEV_CANCEL) != 0) {
> > +			WARN_ONCE(sdev->sdev_state != SDEV_DEL,
> > +				  "sdev state %d\n", sdev->sdev_state);
> >  			return;
> > +		}
> >  
> >  		bsg_unregister_queue(sdev->request_queue);
> >  		sdp = scsi_get_ulpdev(dev);
> > @@ -1339,6 +1344,8 @@ void __scsi_remove_device(struct scsi_device *sdev)
> >  	 * device.
> >  	 */
> >  	scsi_device_set_state(sdev, SDEV_DEL);
> > +	WARN_ONCE(sdev->sdev_state != SDEV_DEL, "sdev state %d\n",
> > +		  sdev->sdev_state);
> >  	blk_cleanup_queue(sdev->request_queue);
> >  	cancel_work_sync(&sdev->requeue_work);
> >  
> > --
> > 2.9.2
> > 
> Hello Bart
> 
> No problem, Sir. I did apply the patch just to help you test, and so far it
> has been stable.
> I will revert it and carry on my debugging of the dm issue.
> I do have the other patches in the original pull request I took, so I am
> running with all of Mike's patches.
> 
> Many Thanks as always for all the help you provide all of us.
> 
> Thanks
> Laurence
> 
> 
Hello Bart

So now back to a 10 LUN dual path (ramdisk backed) two-server configuration I am unable to reproduce the dm issue.
Recovery is very fast with the servers connected back to back.
This is using your kernel and this multipath.conf

        device {
                vendor "LIO-ORG"
                product "*"
                path_grouping_policy "multibus"
                path_selector "round-robin 0"
                path_checker "tur"
                features "0"
                hardware_handler "0"
                no_path_retry "queue"
        }

Mike's patches have definitely stabilized this issue for me on this configuration.

I will see if I can move to a larger target server that has more memory and allocate more mpath devices.
I feel this issue in large configurations is now rooted in multipath not bringing back maps sometimes even when the actual paths are back via srp_daemon.
I am still tracking that down.

If you recall, last week I caused some of our own issues by forgetting I had a no_path_retry 12 hiding in my multipath.conf.
Since removing that and spending most of the weekend testing on the DDN array (had to give that back today), 
most of my issues were either the sporadic host delete race or multipath not re-instantiating paths.

I don't know if this helps, but since applying your latest patch I have not seen the host delete race.

Thanks
Laurence

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: dm-mq and end_clone_request()
  2016-08-09  0:09                                                                 ` Laurence Oberman
@ 2016-08-09 15:51                                                                   ` Bart Van Assche
  2016-08-09 17:12                                                                     ` [dm-devel] " Laurence Oberman
  0 siblings, 1 reply; 74+ messages in thread
From: Bart Van Assche @ 2016-08-09 15:51 UTC (permalink / raw)
  To: Laurence Oberman; +Cc: dm-devel, linux-scsi, Mike Snitzer, Johannes Thumshirn

On 08/08/2016 05:09 PM, Laurence Oberman wrote:
> So now back to a 10 LUN dual path (ramdisk backed) two-server
> configuration I am unable to reproduce the dm issue.
> Recovery is very fast with the servers connected back to back.
> This is using your kernel and this multipath.conf
> 
> [ ... ]
> 
> Mike's patches have definitely stabilized this issue for me on this
> configuration.
> 
> I will see if I can move to a larger target server that has more
> memory and allocate more mpath devices. I feel this issue in large
> configurations is now rooted in multipath not bringing back maps
> sometimes even when the actual paths are back via srp_daemon.
> I am still tracking that down.
> 
> If you recall, last week I caused some of our own issues by
> forgetting I had a no_path_retry 12 hiding in my multipath.conf.
> Since removing that and spending most of the weekend testing on
> the DDN array (had to give that back today), most of my issues
> were either the sporadic host delete race or multipath not
> re-instantiating paths.
> 
> I don't know if this helps, but since applying your latest patch I
> have not seen the host delete race.

Hello Laurence,

My latest SCSI core patch adds additional instrumentation to the SCSI
core but does not change the behavior of the SCSI core. So it cannot
fix the scsi_forget_host() crash you had reported.

On my setup, with the kernel code from the srp-initiator-for-next
branch and with CONFIG_DM_MQ_DEFAULT=n, I still see that, when I run the
srp-test software, fio reports I/O errors every now and then. What
I see in syslog seems to indicate that these I/O errors are generated
by dm-mpath:

Aug  9 08:45:39 ion-dev-ib-ini kernel: mpath 254:1: queue_if_no_path 1 -> 0
Aug  9 08:45:39 ion-dev-ib-ini kernel: must_push_back: 107 callbacks suppressed
Aug  9 08:45:39 ion-dev-ib-ini kernel: device-mapper: multipath: must_push_back: queue_if_no_path=0 suspend_active=1 suspending=0
Aug  9 08:45:39 ion-dev-ib-ini kernel: __multipath_map(): (a) returning -5
Aug  9 08:45:39 ion-dev-ib-ini kernel: map_request(): clone_and_map_rq() returned -5
Aug  9 08:45:39 ion-dev-ib-ini kernel: dm_complete_request: error = -5
Aug  9 08:45:39 ion-dev-ib-ini kernel: dm_softirq_done: dm-1 tio->error = -5
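
The must_push_back decision in those lines boils down to something like
the following sketch (assuming the three printed flags combine roughly as
their names suggest; this is not the actual dm-mpath source):

#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

static bool must_push_back(bool queue_if_no_path, bool suspend_active,
			   bool suspending)
{
	/* Push requests back to the block layer only if the map still
	 * queues on path loss, or while a suspend is in progress. */
	return queue_if_no_path || (suspend_active && suspending);
}

int main(void)
{
	/* The logged values: queue_if_no_path just flipped 1 -> 0,
	 * suspend_active=1, suspending=0. */
	if (!must_push_back(false, true, false))
		printf("__multipath_map() fails the request: %d (-EIO)\n",
		       -EIO);
	return 0;
}

With the logged values the push back is refused, so the request completes
with -EIO, matching the -5 in the trace above.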

Bart.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [dm-devel] dm-mq and end_clone_request()
  2016-08-09 15:51                                                                   ` Bart Van Assche
@ 2016-08-09 17:12                                                                     ` Laurence Oberman
  2016-08-09 17:16                                                                       ` Bart Van Assche
  0 siblings, 1 reply; 74+ messages in thread
From: Laurence Oberman @ 2016-08-09 17:12 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: dm-devel, Mike Snitzer, linux-scsi, Johannes Thumshirn



----- Original Message -----
> From: "Bart Van Assche" <bart.vanassche@sandisk.com>
> To: "Laurence Oberman" <loberman@redhat.com>
> Cc: dm-devel@redhat.com, "Mike Snitzer" <snitzer@redhat.com>, linux-scsi@vger.kernel.org, "Johannes Thumshirn"
> <jthumshirn@suse.de>
> Sent: Tuesday, August 9, 2016 11:51:00 AM
> Subject: Re: [dm-devel] dm-mq and end_clone_request()
> 
> On 08/08/2016 05:09 PM, Laurence Oberman wrote:
> > So now back to a 10 LUN dual path (ramdisk backed) two-server
> > configuration I am unable to reproduce the dm issue.
> > Recovery is very fast with the servers connected back to back.
> > This is using your kernel and this multipath.conf
> > 
> > [ ... ]
> > 
> > Mike's patches have definitely stabilized this issue for me on this
> > configuration.
> > 
> > I will see if I can move to a larger target server that has more
> > memory and allocate more mpath devices. I feel this issue in large
> > configurations is now rooted in multipath not bringing back maps
> > sometimes even when the actual paths are back via srp_daemon.
> > I am still tracking that down.
> > 
> > If you recall, last week I caused some of our own issues by
> > forgetting I had a no_path_retry 12 hiding in my multipath.conf.
> > Since removing that and spending most of the weekend testing on
> > the DDN array (had to give that back today), most of my issues
> > were either the sporadic host delete race or multipath not
> > re-instantiating paths.
> > 
> > I don't know if this helps, but since applying your latest patch I
> > have not seen the host delete race.
> 
> Hello Laurence,
> 
> My latest SCSI core patch adds additional instrumentation to the SCSI
> core but does not change the behavior of the SCSI core. So it cannot
> fix the scsi_forget_host() crash you had reported.
> 
> On my setup, with the kernel code from the srp-initiator-for-next
> branch and with CONFIG_DM_MQ_DEFAULT=n, I still see that, when I run the
> srp-test software, fio reports I/O errors every now and then. What
> I see in syslog seems to indicate that these I/O errors are generated
> by dm-mpath:
> 
> Aug  9 08:45:39 ion-dev-ib-ini kernel: mpath 254:1: queue_if_no_path 1 -> 0
> Aug  9 08:45:39 ion-dev-ib-ini kernel: must_push_back: 107 callbacks
> suppressed
> Aug  9 08:45:39 ion-dev-ib-ini kernel: device-mapper: multipath:
> must_push_back: queue_if_no_path=0 suspend_active=1 suspending=0
> Aug  9 08:45:39 ion-dev-ib-ini kernel: __multipath_map(): (a) returning -5
> Aug  9 08:45:39 ion-dev-ib-ini kernel: map_request(): clone_and_map_rq()
> returned -5
> Aug  9 08:45:39 ion-dev-ib-ini kernel: dm_complete_request: error = -5
> Aug  9 08:45:39 ion-dev-ib-ini kernel: dm_softirq_done: dm-1 tio->error = -5
> 
> Bart.
> 
> 
Hello Bart

I was talking about this patch

--- a/drivers/scsi/scsi_scan.c
+++ b/drivers/scsi/scsi_scan.c
@@ -1890,10 +1890,11 @@ void scsi_forget_host(struct Scsi_Host *shost)
  restart:
         spin_lock_irqsave(shost->host_lock, flags);
         list_for_each_entry(sdev, &shost->__devices, siblings) {
-                if (sdev->sdev_state == SDEV_DEL)
+                if (sdev->sdev_state == SDEV_DEL || scsi_device_get(sdev) < 0)
                         continue;
                 spin_unlock_irqrestore(shost->host_lock, flags);
                 __scsi_remove_device(sdev);
+                scsi_device_put(sdev);
                 goto restart;
         }
         spin_unlock_irqrestore(shost->host_lock, flags);
-- 

This is the one I applied; that's not just instrumentation, right?

Thanks
Laurence

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [dm-devel] dm-mq and end_clone_request()
  2016-08-09 17:12                                                                     ` [dm-devel] " Laurence Oberman
@ 2016-08-09 17:16                                                                       ` Bart Van Assche
  2016-08-09 17:21                                                                         ` Laurence Oberman
  0 siblings, 1 reply; 74+ messages in thread
From: Bart Van Assche @ 2016-08-09 17:16 UTC (permalink / raw)
  To: Laurence Oberman; +Cc: dm-devel, linux-scsi, Mike Snitzer, Johannes Thumshirn

On 08/09/2016 10:12 AM, Laurence Oberman wrote:
> I was talking about this patch
>
> --- a/drivers/scsi/scsi_scan.c
> +++ b/drivers/scsi/scsi_scan.c
> @@ -1890,10 +1890,11 @@ void scsi_forget_host(struct Scsi_Host *shost)
>   restart:
>          spin_lock_irqsave(shost->host_lock, flags);
>          list_for_each_entry(sdev, &shost->__devices, siblings) {
> -                if (sdev->sdev_state == SDEV_DEL)
> +                if (sdev->sdev_state == SDEV_DEL || scsi_device_get(sdev) < 0)
>                          continue;
>                  spin_unlock_irqrestore(shost->host_lock, flags);
>                  __scsi_remove_device(sdev);
> +                scsi_device_put(sdev);
>                  goto restart;
>          }
>          spin_unlock_irqrestore(shost->host_lock, flags);

Hello Laurence,

Did you run your tests with that patch applied? If so, it would help if 
you could rerun your tests without that patch. If the above patch makes 
a difference it means that it can happen that __scsi_remove_device() 
does not change the device state into SDEV_DEL. That's a bug and we need 
to know whether or not __scsi_remove_device() behaves correctly.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [dm-devel] dm-mq and end_clone_request()
  2016-08-09 17:16                                                                       ` Bart Van Assche
@ 2016-08-09 17:21                                                                         ` Laurence Oberman
  2016-08-10 21:38                                                                           ` Laurence Oberman
  0 siblings, 1 reply; 74+ messages in thread
From: Laurence Oberman @ 2016-08-09 17:21 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: dm-devel, linux-scsi, Mike Snitzer, Johannes Thumshirn



----- Original Message -----
> From: "Bart Van Assche" <bart.vanassche@sandisk.com>
> To: "Laurence Oberman" <loberman@redhat.com>
> Cc: dm-devel@redhat.com, linux-scsi@vger.kernel.org, "Mike Snitzer" <snitzer@redhat.com>, "Johannes Thumshirn"
> <jthumshirn@suse.de>
> Sent: Tuesday, August 9, 2016 1:16:52 PM
> Subject: Re: [dm-devel] dm-mq and end_clone_request()
> 
> On 08/09/2016 10:12 AM, Laurence Oberman wrote:
> > I was talking about this patch
> >
> > --- a/drivers/scsi/scsi_scan.c
> > +++ b/drivers/scsi/scsi_scan.c
> > @@ -1890,10 +1890,11 @@ void scsi_forget_host(struct Scsi_Host *shost)
> >   restart:
> >          spin_lock_irqsave(shost->host_lock, flags);
> >          list_for_each_entry(sdev, &shost->__devices, siblings) {
> > -                if (sdev->sdev_state == SDEV_DEL)
> > +                if (sdev->sdev_state == SDEV_DEL || scsi_device_get(sdev)
> > < 0)
> >                          continue;
> >                  spin_unlock_irqrestore(shost->host_lock, flags);
> >                  __scsi_remove_device(sdev);
> > +                scsi_device_put(sdev);
> >                  goto restart;
> >          }
> >          spin_unlock_irqrestore(shost->host_lock, flags);
> 
> Hello Laurence,
> 
> Did you run your tests with that patch applied? If so, it would help if
> you could rerun your tests without that patch. If the above patch makes
> a difference it means that it can happen that __scsi_remove_device()
> does not change the device state into SDEV_DEL. That's a bug and we need
> to know whether or not __scsi_remove_device() behaves correctly.
> 
> Thanks,
> 
> Bart.
> 
Yes Sir, I ran all of yesterday's tests on your kernel with that patch applied.
Of course, it may well just be luck/coincidence that the host delete race is no longer happening,
so I agree we need to re-run the tests; I will revert and re-run.
I will probably only get back to you tomorrow with the results.

Thanks
Laurence

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [dm-devel] dm-mq and end_clone_request()
  2016-08-09 17:21                                                                         ` Laurence Oberman
@ 2016-08-10 21:38                                                                           ` Laurence Oberman
  2016-08-11 16:51                                                                             ` Laurence Oberman
  0 siblings, 1 reply; 74+ messages in thread
From: Laurence Oberman @ 2016-08-10 21:38 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: dm-devel, linux-scsi, Mike Snitzer, Johannes Thumshirn



----- Original Message -----
> From: "Laurence Oberman" <loberman@redhat.com>
> To: "Bart Van Assche" <bart.vanassche@sandisk.com>
> Cc: dm-devel@redhat.com, linux-scsi@vger.kernel.org, "Mike Snitzer" <snitzer@redhat.com>, "Johannes Thumshirn"
> <jthumshirn@suse.de>
> Sent: Tuesday, August 9, 2016 1:21:15 PM
> Subject: Re: [dm-devel] dm-mq and end_clone_request()
> 
> 
> 
> ----- Original Message -----
> > From: "Bart Van Assche" <bart.vanassche@sandisk.com>
> > To: "Laurence Oberman" <loberman@redhat.com>
> > Cc: dm-devel@redhat.com, linux-scsi@vger.kernel.org, "Mike Snitzer"
> > <snitzer@redhat.com>, "Johannes Thumshirn"
> > <jthumshirn@suse.de>
> > Sent: Tuesday, August 9, 2016 1:16:52 PM
> > Subject: Re: [dm-devel] dm-mq and end_clone_request()
> > 
> > On 08/09/2016 10:12 AM, Laurence Oberman wrote:
> > > I was talking about this patch
> > >
> > > --- a/drivers/scsi/scsi_scan.c
> > > +++ b/drivers/scsi/scsi_scan.c
> > > @@ -1890,10 +1890,11 @@ void scsi_forget_host(struct Scsi_Host *shost)
> > >   restart:
> > >          spin_lock_irqsave(shost->host_lock, flags);
> > >          list_for_each_entry(sdev, &shost->__devices, siblings) {
> > > -                if (sdev->sdev_state == SDEV_DEL)
> > > +                if (sdev->sdev_state == SDEV_DEL ||
> > > scsi_device_get(sdev)
> > > < 0)
> > >                          continue;
> > >                  spin_unlock_irqrestore(shost->host_lock, flags);
> > >                  __scsi_remove_device(sdev);
> > > +                scsi_device_put(sdev);
> > >                  goto restart;
> > >          }
> > >          spin_unlock_irqrestore(shost->host_lock, flags);
> > 
> > Hello Laurence,
> > 
> > Did you run your tests with that patch applied? If so, it would help if
> > you could rerun your tests without that patch. If the above patch makes
> > a difference it means that it can happen that __scsi_remove_device()
> > does not change the device state into SDEV_DEL. That's a bug and we need
> > to know whether or not __scsi_remove_device() behaves correctly.
> > 
> > Thanks,
> > 
> > Bart.
> > 
> Yes Sir, I ran all of yesterday's tests on your kernel with that patch applied.
> Of course, it may well just be luck/coincidence that the host delete race is
> no longer happening,
> so I agree we need to re-run the tests; I will revert and re-run.
> I will probably only get back to you tomorrow with the results.
> 
> Thanks
> Laurence
> 
Hello Bart

I only just got time now to revert that patch and build a kernel.
Will test this tonight and let you know if I am back to seeing panics sporadically without the patch.
As already mentioned, this is a different configuration from the one I had when I was able to reproduce the panic.
This means that no longer hitting this stack trace and panic may turn out to have nothing to do with the patch I applied.

Thanks
Laurence 

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [dm-devel] dm-mq and end_clone_request()
  2016-08-10 21:38                                                                           ` Laurence Oberman
@ 2016-08-11 16:51                                                                             ` Laurence Oberman
  0 siblings, 0 replies; 74+ messages in thread
From: Laurence Oberman @ 2016-08-11 16:51 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: dm-devel, linux-scsi, Mike Snitzer, Johannes Thumshirn



----- Original Message -----
> From: "Laurence Oberman" <loberman@redhat.com>
> To: "Bart Van Assche" <bart.vanassche@sandisk.com>
> Cc: dm-devel@redhat.com, linux-scsi@vger.kernel.org, "Mike Snitzer" <snitzer@redhat.com>, "Johannes Thumshirn"
> <jthumshirn@suse.de>
> Sent: Wednesday, August 10, 2016 5:38:16 PM
> Subject: Re: [dm-devel] dm-mq and end_clone_request()
> 
> 
> 
> ----- Original Message -----
> > From: "Laurence Oberman" <loberman@redhat.com>
> > To: "Bart Van Assche" <bart.vanassche@sandisk.com>
> > Cc: dm-devel@redhat.com, linux-scsi@vger.kernel.org, "Mike Snitzer"
> > <snitzer@redhat.com>, "Johannes Thumshirn"
> > <jthumshirn@suse.de>
> > Sent: Tuesday, August 9, 2016 1:21:15 PM
> > Subject: Re: [dm-devel] dm-mq and end_clone_request()
> > 
> > 
> > 
> > ----- Original Message -----
> > > From: "Bart Van Assche" <bart.vanassche@sandisk.com>
> > > To: "Laurence Oberman" <loberman@redhat.com>
> > > Cc: dm-devel@redhat.com, linux-scsi@vger.kernel.org, "Mike Snitzer"
> > > <snitzer@redhat.com>, "Johannes Thumshirn"
> > > <jthumshirn@suse.de>
> > > Sent: Tuesday, August 9, 2016 1:16:52 PM
> > > Subject: Re: [dm-devel] dm-mq and end_clone_request()
> > > 
> > > On 08/09/2016 10:12 AM, Laurence Oberman wrote:
> > > > I was talking about this patch
> > > >
> > > > --- a/drivers/scsi/scsi_scan.c
> > > > +++ b/drivers/scsi/scsi_scan.c
> > > > @@ -1890,10 +1890,11 @@ void scsi_forget_host(struct Scsi_Host *shost)
> > > >   restart:
> > > >          spin_lock_irqsave(shost->host_lock, flags);
> > > >          list_for_each_entry(sdev, &shost->__devices, siblings) {
> > > > -                if (sdev->sdev_state == SDEV_DEL)
> > > > +                if (sdev->sdev_state == SDEV_DEL || scsi_device_get(sdev) < 0)
> > > >                          continue;
> > > >                  spin_unlock_irqrestore(shost->host_lock, flags);
> > > >                  __scsi_remove_device(sdev);
> > > > +                scsi_device_put(sdev);
> > > >                  goto restart;
> > > >          }
> > > >          spin_unlock_irqrestore(shost->host_lock, flags);
> > > 
> > > Hello Laurence,
> > > 
> > > Did you run your tests with that patch applied? If so, it would help if
> > > you could rerun your tests without that patch. If the above patch makes
> > > a difference it means that it can happen that __scsi_remove_device()
> > > does not change the device state into SDEV_DEL. That's a bug and we need
> > > to know whether or not __scsi_remove_device() behaves correctly.
> > > 
> > > Thanks,
> > > 
> > > Bart.
> > > 
> > Yes Sir, I ran all of yesterday's tests on your kernel with that patch
> > applied. Of course it may well just be luck/coincidence that the host
> > delete race is no longer happening, so I agree we need to re-run the
> > tests; I will revert and re-run. I will probably only get back to you
> > tomorrow with the results.
> > 
> > Thanks
> > Laurence
> > 
> Hello Bart,
> 
> I only now got time to revert that patch and build a kernel.
> I will test tonight and let you know whether the sporadic panics return
> without the patch.
> As already mentioned, this is a different configuration from the one in
> which I was able to reproduce the panic, so not hitting this stack trace
> and panic may turn out to have nothing to do with the patch I applied.
> 
> Thanks
> Laurence
> 
Hello Bart,

I can no longer reproduce the stack trace even with the patch reverted, so it is behaving as you expected and the patch, as you already said, is not valid.
I ran about 30 fio tests with your kernel and multiple host deletions, and I hit only one hard fio error.
My tests now produce the same results you are seeing.

That single fio error came only after many more executions of the test, so these fio errors are not easy to trigger.

I am away on vacation for 10 days starting tomorrow.

Thanks
Laurence

^ permalink raw reply	[flat|nested] 74+ messages in thread

end of thread, other threads:[~2016-08-11 16:51 UTC | newest]

Thread overview: 74+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-07-19 22:57 dm-mq and end_clone_request() Bart Van Assche
2016-07-20 14:08 ` Mike Snitzer
2016-07-20 14:27   ` Mike Snitzer
2016-07-20 17:37     ` Bart Van Assche
2016-07-20 18:33       ` Mike Snitzer
2016-07-21 20:58         ` [dm-devel] " Bart Van Assche
2016-07-25 17:53           ` Mike Snitzer
2016-07-25 21:23             ` Mike Snitzer
2016-07-25 22:00               ` Bart Van Assche
2016-07-26  1:16                 ` Mike Snitzer
2016-07-26 22:51                   ` Bart Van Assche
2016-07-27 14:08                     ` Mike Snitzer
2016-07-27 15:52                       ` [dm-devel] " Benjamin Marzinski
2016-07-27 19:06                         ` Bart Van Assche
2016-07-27 19:54                           ` Mike Snitzer
2016-07-27 20:09                   ` Mike Snitzer
2016-07-27 23:05                     ` Bart Van Assche
2016-07-28 13:33                       ` Mike Snitzer
2016-07-28 15:23                         ` Bart Van Assche
2016-07-28 15:40                           ` Mike Snitzer
2016-07-29  6:28                             ` [dm-devel] " Hannes Reinecke
2016-07-26  6:02             ` Hannes Reinecke
2016-07-26 16:06               ` Mike Snitzer
     [not found]         ` <317679447.7168375.1469729769593.JavaMail.zimbra@redhat.com>
     [not found]           ` <6880321d-e14f-169b-d100-6e460dd9bd09@sandisk.com>
     [not found]             ` <1110327939.7305916.1469819453678.JavaMail.zimbra@redhat.com>
     [not found]               ` <a5c1a149-b1a2-b5a4-2207-bdaf32db3cbd@sandisk.com>
     [not found]                 ` <757522831.7667712.1470059860543.JavaMail.zimbra@redhat.com>
     [not found]                   ` <536022978.7668211.1470060125271.JavaMail.zimbra@redhat.com>
     [not found]                     ` <931235537.7668834.1470060339483.JavaMail.zimbra@redhat.com>
     [not found]                       ` <1264951811.7684268.1470065187014.JavaMail.zimbra@redhat.com>
     [not found]                         ` <17da3ab0-233a-2cec-f921-bfd42c953ccc@sandisk.com>
2016-08-01 17:59                           ` Mike Snitzer
2016-08-01 18:55                             ` Bart Van Assche
2016-08-01 19:15                               ` Mike Snitzer
2016-08-01 20:46                               ` Mike Snitzer
2016-08-01 22:41                                 ` Bart Van Assche
2016-08-01 22:41                                   ` Bart Van Assche
2016-08-02 17:45                                   ` Mike Snitzer
2016-08-03  0:19                                     ` Bart Van Assche
2016-08-03  0:40                                       ` Mike Snitzer
2016-08-03  1:33                                         ` Laurence Oberman
2016-08-03  2:10                                           ` Mike Snitzer
2016-08-03  2:18                                             ` Laurence Oberman
2016-08-03  2:55                                               ` Laurence Oberman
2016-08-03 15:10                                                 ` Laurence Oberman
2016-08-03 16:06                                           ` Bart Van Assche
2016-08-03 17:25                                             ` Laurence Oberman
2016-08-03 18:03                                             ` [dm-devel] " Laurence Oberman
2016-08-03 16:55                                         ` Bart Van Assche
2016-08-04  9:53                                           ` Hannes Reinecke
2016-08-04 10:09                                             ` Hannes Reinecke
2016-08-04 15:10                                               ` Mike Snitzer
2016-08-04 16:10                                           ` Mike Snitzer
2016-08-04 17:42                                             ` Bart Van Assche
2016-08-04 23:58                                               ` Mike Snitzer
2016-08-05  1:07                                                 ` Laurence Oberman
2016-08-05 11:43                                                   ` Laurence Oberman
2016-08-05 15:39                                                     ` Laurence Oberman
2016-08-05 15:43                                                       ` Bart Van Assche
2016-08-05 18:42                                                     ` [dm-devel] " Bart Van Assche
2016-08-06 14:47                                                       ` Laurence Oberman
2016-08-07 22:31                                                         ` [dm-devel] " Bart Van Assche
2016-08-08 12:45                                                           ` Laurence Oberman
2016-08-08 13:44                                                             ` Johannes Thumshirn
2016-08-08 13:44                                                               ` Johannes Thumshirn
2016-08-08 14:32                                                               ` Laurence Oberman
2016-08-08 14:54                                                               ` Bart Van Assche
2016-08-08 15:11                                                         ` Bart Van Assche
2016-08-08 15:26                                                           ` Laurence Oberman
2016-08-08 15:28                                                             ` Bart Van Assche
2016-08-08 22:39                                                             ` Bart Van Assche
2016-08-08 22:52                                                               ` Laurence Oberman
2016-08-09  0:09                                                                 ` Laurence Oberman
2016-08-09 15:51                                                                   ` Bart Van Assche
2016-08-09 17:12                                                                     ` [dm-devel] " Laurence Oberman
2016-08-09 17:16                                                                       ` Bart Van Assche
2016-08-09 17:21                                                                         ` Laurence Oberman
2016-08-10 21:38                                                                           ` Laurence Oberman
2016-08-11 16:51                                                                             ` Laurence Oberman
2016-08-05 18:40                                                 ` Bart Van Assche
2016-07-21 20:32       ` Mike Snitzer
2016-07-21 20:40         ` [dm-devel] " Bart Van Assche
