* v4.8 dm-mpath
From: Bart Van Assche @ 2016-08-16 17:32 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: device-mapper development

Hello Mike,

If I trigger failover and failback with kernel v4.8-rc2 and ib_srp then I
see the following:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000080
IP: [<ffffffff8130d03b>] blk_mq_insert_request+0x3b/0xc0
CPU: 4 PID: 12606 Comm: kdmwork-254:1 Not tainted 4.8.0-rc2-dbg+ #1
Hardware name: Dell Inc. PowerEdge R430/03XKDV, BIOS 1.0.2 11/17/2014
task: ffff880363d3e240 task.stack: ffff8803618ac000
RIP: 0010:[<ffffffff8130d03b>]  [<ffffffff8130d03b>] blk_mq_insert_request+0x3b/0xc0
Call Trace:
 [<ffffffff81300da9>] blk_insert_cloned_request+0xa9/0x1e0
 [<ffffffffa04302f0>] map_request+0x190/0x2d0 [dm_mod]
 [<ffffffffa043044d>] map_tio_request+0x1d/0x40 [dm_mod]
 [<ffffffff81087101>] kthread_worker_fn+0xd1/0x1b0
 [<ffffffff81086fba>] kthread+0xea/0x100
 [<ffffffff8162d53f>] ret_from_fork+0x1f/0x40

(gdb) list *(blk_mq_insert_request+0x3b)
0xffffffff8130d03b is in blk_mq_insert_request (block/blk-mq.c:1078).
1073            struct request_queue *q = rq->q;
1074            struct blk_mq_hw_ctx *hctx;
1075            struct blk_mq_ctx *ctx = rq->mq_ctx, *current_ctx;
1076
1077            current_ctx = blk_mq_get_ctx(q);
1078            if (!cpu_online(ctx->cpu))
1079                    rq->mq_ctx = ctx = current_ctx;
1080
1081            hctx = q->mq_ops->map_queue(q, ctx->cpu);
1082
(gdb) print &((struct blk_mq_ctx*)0)->cpu
$1 = (unsigned int *) 0x80

I think this means that ctx was NULL when ctx->cpu was dereferenced: the
fault address 0x80 matches the offset of the cpu member.
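
For reference, the arithmetic can be checked with a small stand-alone
sketch; the struct below is a hypothetical stand-in with its member
forced to offset 0x80, since the real struct blk_mq_ctx layout is
private to block/blk-mq.h:

#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for struct blk_mq_ctx, only to illustrate the
 * fault address: 0x80 bytes of padding place 'cpu' at offset 0x80.
 */
struct fake_blk_mq_ctx {
	char pad[0x80];
	unsigned int cpu;
};

int main(void)
{
	/*
	 * Prints 0x80: a read of ctx->cpu through a NULL ctx pointer
	 * accesses exactly that address, matching the oops above.
	 */
	printf("offsetof(cpu) = %#zx\n",
	       offsetof(struct fake_blk_mq_ctx, cpu));
	return 0;
}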

This was observed with the same test software and configuration I
used for my kernel v4.7 tests (CONFIG_SCSI_MQ_DEFAULT=y and
CONFIG_DM_MQ_DEFAULT=n).

The above callstack was not observed while testing kernel v4.7.

Can you have a look at this?

Thanks,

Bart.


* Re: v4.8 dm-mpath
From: Mike Snitzer @ 2016-08-16 19:12 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: device-mapper development

On Tue, Aug 16 2016 at  1:32pm -0400,
Bart Van Assche <bart.vanassche@sandisk.com> wrote:

> Hello Mike,
> 
> If I trigger failover and failback with kernel v4.8-rc2 and ib_srp then I
> see the following:
> 
> BUG: unable to handle kernel NULL pointer dereference at 0000000000000080
> IP: [<ffffffff8130d03b>] blk_mq_insert_request+0x3b/0xc0
> CPU: 4 PID: 12606 Comm: kdmwork-254:1 Not tainted 4.8.0-rc2-dbg+ #1
> Hardware name: Dell Inc. PowerEdge R430/03XKDV, BIOS 1.0.2 11/17/2014
> task: ffff880363d3e240 task.stack: ffff8803618ac000
> RIP: 0010:[<ffffffff8130d03b>]  [<ffffffff8130d03b>] blk_mq_insert_request+0x3b/0xc0
> Call Trace:
>  [<ffffffff81300da9>] blk_insert_cloned_request+0xa9/0x1e0
>  [<ffffffffa04302f0>] map_request+0x190/0x2d0 [dm_mod]
>  [<ffffffffa043044d>] map_tio_request+0x1d/0x40 [dm_mod]
>  [<ffffffff81087101>] kthread_worker_fn+0xd1/0x1b0
>  [<ffffffff81086fba>] kthread+0xea/0x100
>  [<ffffffff8162d53f>] ret_from_fork+0x1f/0x40
> 
> (gdb) list *(blk_mq_insert_request+0x3b)
> 0xffffffff8130d03b is in blk_mq_insert_request (block/blk-mq.c:1078).
> 1073            struct request_queue *q = rq->q;
> 1074            struct blk_mq_hw_ctx *hctx;
> 1075            struct blk_mq_ctx *ctx = rq->mq_ctx, *current_ctx;
> 1076
> 1077            current_ctx = blk_mq_get_ctx(q);
> 1078            if (!cpu_online(ctx->cpu))
> 1079                    rq->mq_ctx = ctx = current_ctx;
> 1080
> 1081            hctx = q->mq_ops->map_queue(q, ctx->cpu);
> 1082
> (gdb) print &((struct blk_mq_ctx*)0)->cpu
> $1 = (unsigned int *) 0x80
> 
> I think this means that ctx was NULL when ctx->cpu was dereferenced: the
> fault address 0x80 matches the offset of the cpu member.
> 
> This was observed with the same test software and configuration I
> used for my kernel v4.7 tests (CONFIG_SCSI_MQ_DEFAULT=y and
> CONFIG_DM_MQ_DEFAULT=n).
> 
> The above callstack was not observed while testing kernel v4.7.

I only applied the 3 patches that I asked you to include in your v4.7
kernel(s), which I made available with this branch:
https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/log/?h=dm-4.7-mpath-fixes

Could be that something changed in the v4.8 block core's blk-mq code
that needs to be taken into account; we'll see.

> Can you have a look at this?

I'll get linux-dm.git's 'dm-4.8' branch (v4.8-rc2) loaded on one of my
testbed systems to run the mptest testsuite.  It should provide coverage
for simple failover and failback.


* Re: v4.8 dm-mpath
From: Mike Snitzer @ 2016-08-17  2:43 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: device-mapper development

On Tue, Aug 16 2016 at  3:12pm -0400,
Mike Snitzer <snitzer@redhat.com> wrote:

> On Tue, Aug 16 2016 at  1:32pm -0400,
> Bart Van Assche <bart.vanassche@sandisk.com> wrote:
> 
> > Can you have a look at this?
> 
> I'll get linux-dm.git's 'dm-4.8' branch (v4.8-rc2) loaded on one of my
> testbed systems to run the mptest testsuite.  It should provide coverage
> for simple failover and failback.

I successfully ran mptest against that v4.8-rc2 based kernel.  Any
chance you could try mptest to see if it also passes for you?  Or if it
is able to trigger the issue for you.

mptest's runtest script defaults to setting up scsi-mq and dm-mq.


* Re: v4.8 dm-mpath
From: Bart Van Assche @ 2016-08-18  0:29 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: device-mapper development

On 08/16/2016 07:43 PM, Mike Snitzer wrote:
> On Tue, Aug 16 2016 at  3:12pm -0400, Mike Snitzer <snitzer@redhat.com> wrote:
>> On Tue, Aug 16 2016 at  1:32pm -0400, Bart Van Assche <bart.vanassche@sandisk.com> wrote:
>>> Can you have a look at this?
>>
>> I'll get linux-dm.git's 'dm-4.8' branch (v4.8-rc2) loaded on one of my
>> testbed systems to run the mptest testsuite.  It should provide coverage
>> for simple failover and failback.
>
> I successfully ran mptest against that v4.8-rc2 based kernel.  Any
> chance you could try mptest to see if it also passes for you?  Or if it
> is able to trigger the issue for you.
>
> mptest's runtest script defaults to setting up scsi-mq and dm-mq.

Hello Mike,

I will run mptest as soon as I have the time.

Earlier today I came up with three patches that avoid the hang in 
truncate_inode_pages_range() that I had reported before. It would be 
appreciated if you could have a look at these patches. Please keep in 
mind that I'm not a dm expert.

Thanks,

Bart.


[-- Attachment #2: 0001-block-Avoid-that-requests-get-stuck-when-stopping-an.patch --]
[-- Type: text/x-patch, Size: 2176 bytes --]

From 0e8e9f7d7489f5e3b0bf9e0c59257277d08a2ec0 Mon Sep 17 00:00:00 2001
From: Bart Van Assche <bart.vanassche@sandisk.com>
Date: Wed, 17 Aug 2016 14:18:36 -0700
Subject: [PATCH 1/3] block: Avoid that requests get stuck when stopping and
 restarting a queue

If a driver calls blk_start_queue() after a queue has been marked
"dead", avoid that requests get stuck by invoking __blk_run_queue()
directly if QUEUE_FLAG_DEAD has been set.

While testing the dm-mpath driver I noticed that sporadically a
readahead request was generated by the filesystem on top of the dm
device but that this I/O request was never processed. This happened
only while the dm core was suspending and resuming I/O.

The following backtrace (triggered by a WARN_ON_ONCE(true) statement
that I had inserted just before the __blk_run_queue(q) call) shows
that this code path is relevant:

WARNING: CPU: 7 PID: 4707 at drivers/md/dm.c:2035 map_request+0x1d5/0x340 [dm_mod]
Aug 17 15:47:35 ion-dev-ib-ini kernel: CPU: 7 PID: 4707 Comm: kdmwork-254:0 Tainted: G        W       4.7.0-dbg+ #2
Call Trace:
 [<ffffffff81322ee7>] dump_stack+0x68/0xa1
 [<ffffffff81061e06>] __warn+0xc6/0xe0
 [<ffffffff81061ed8>] warn_slowpath_null+0x18/0x20
 [<ffffffffa04282d5>] map_request+0x1d5/0x340 [dm_mod]
 [<ffffffffa042845d>] map_tio_request+0x1d/0x40 [dm_mod]
 [<ffffffff81085fe1>] kthread_worker_fn+0xd1/0x1b0
 [<ffffffff81085e9a>] kthread+0xea/0x100
 [<ffffffff8162857f>] ret_from_fork+0x1f/0x40

References: commit 704605711ef0 ("block: Avoid scheduling delayed work on a dead queue")
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: stable <stable@vger.kernel.org>
---
 block/blk-core.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/block/blk-core.c b/block/blk-core.c
index 0d9638e..5f75709 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -355,6 +355,8 @@ void blk_run_queue_async(struct request_queue *q)
 {
 	if (likely(!blk_queue_stopped(q) && !blk_queue_dead(q)))
 		mod_delayed_work(kblockd_workqueue, &q->delay_work, 0);
+	else
+		__blk_run_queue(q);
 }
 EXPORT_SYMBOL(blk_run_queue_async);
 
-- 
2.9.2


[-- Attachment #3: 0002-dm-Change-return-type-of-__dm_internal_suspend-to-in.patch --]
[-- Type: text/x-patch, Size: 1706 bytes --]

From 9b103cfb2d9b4cdec73c7d09188b4d42b041d49e Mon Sep 17 00:00:00 2001
From: Bart Van Assche <bart.vanassche@sandisk.com>
Date: Wed, 17 Aug 2016 16:26:02 -0700
Subject: [PATCH 2/3] dm: Change return type of __dm_internal_suspend() to int

---
 drivers/md/dm.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 3689b7f..6eccdf3 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -3379,16 +3379,18 @@ out:
  * It may be used only from the kernel.
  */
 
-static void __dm_internal_suspend(struct mapped_device *md, unsigned suspend_flags)
+static int __dm_internal_suspend(struct mapped_device *md,
+				 unsigned suspend_flags)
 {
 	struct dm_table *map = NULL;
+	int ret = 0;
 
 	if (md->internal_suspend_count++)
-		return; /* nested internal suspend */
+		goto out; /* nested internal suspend */
 
 	if (dm_suspended_md(md)) {
 		set_bit(DMF_SUSPENDED_INTERNALLY, &md->flags);
-		return; /* nest suspend */
+		goto out; /* nest suspend */
 	}
 
 	map = rcu_dereference_protected(md->map, lockdep_is_held(&md->suspend_lock));
@@ -3399,10 +3401,13 @@ static void __dm_internal_suspend(struct mapped_device *md, unsigned suspend_fla
 	 * would require changing .presuspend to return an error -- avoid this
 	 * until there is a need for more elaborate variants of internal suspend.
 	 */
-	(void) __dm_suspend(md, map, suspend_flags, TASK_UNINTERRUPTIBLE,
-			    DMF_SUSPENDED_INTERNALLY);
+	ret = __dm_suspend(md, map, suspend_flags, TASK_UNINTERRUPTIBLE,
+			   DMF_SUSPENDED_INTERNALLY);
 
 	dm_table_postsuspend_targets(map);
+
+out:
+	return ret;
 }
 
 static void __dm_internal_resume(struct mapped_device *md)
-- 
2.9.2


[-- Attachment #4: 0003-dm-Avoid-that-requests-get-stuck-while-destroying-a-.patch --]
[-- Type: text/x-patch, Size: 1911 bytes --]

From f6e58e71eec584b3f66fe8367340ca896d29408e Mon Sep 17 00:00:00 2001
From: Bart Van Assche <bart.vanassche@sandisk.com>
Date: Wed, 17 Aug 2016 15:26:07 -0700
Subject: [PATCH 3/3] dm: Avoid that requests get stuck while destroying a dm
 device

---
 drivers/md/dm.c | 25 ++++++++++++++-----------
 1 file changed, 14 insertions(+), 11 deletions(-)

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 6eccdf3..ae4bfa0 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -2958,10 +2958,13 @@ const char *dm_device_name(struct mapped_device *md)
 }
 EXPORT_SYMBOL_GPL(dm_device_name);
 
+static int __dm_internal_suspend(struct mapped_device *md,
+				 unsigned suspend_flags);
+
 static void __dm_destroy(struct mapped_device *md, bool wait)
 {
-	struct dm_table *map;
-	int srcu_idx;
+	struct request_queue *q = dm_get_md_queue(md);
+	int res;
 
 	might_sleep();
 
@@ -2970,21 +2973,21 @@ static void __dm_destroy(struct mapped_device *md, bool wait)
 	set_bit(DMF_FREEING, &md->flags);
 	spin_unlock(&_minor_lock);
 
-	if (dm_request_based(md) && md->kworker_task)
-		flush_kthread_worker(&md->kworker);
+	/*
+	 * Avoid that dm_make_request() gets called after
+	 * __dm_internal_suspend() finished.
+	 */
+	spin_lock_irq(q->queue_lock);
+	queue_flag_set(QUEUE_FLAG_DYING, q);
+	spin_unlock_irq(q->queue_lock);
 
 	/*
 	 * Take suspend_lock so that presuspend and postsuspend methods
 	 * do not race with internal suspend.
 	 */
 	mutex_lock(&md->suspend_lock);
-	map = dm_get_live_table(md, &srcu_idx);
-	if (!dm_suspended_md(md)) {
-		dm_table_presuspend_targets(map);
-		dm_table_postsuspend_targets(map);
-	}
-	/* dm_put_live_table must be before msleep, otherwise deadlock is possible */
-	dm_put_live_table(md, srcu_idx);
+	res = __dm_internal_suspend(md, DM_SUSPEND_NOFLUSH_FLAG);
+	WARN_ONCE(res, "__dm_internal_suspend() returned %d\n", res);
 	mutex_unlock(&md->suspend_lock);
 
 	/*
-- 
2.9.2



* Re: v4.8 dm-mpath
From: Mike Snitzer @ 2016-08-18  1:54 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: device-mapper development

On Wed, Aug 17 2016 at  8:29pm -0400,
Bart Van Assche <bart.vanassche@sandisk.com> wrote:

> On 08/16/2016 07:43 PM, Mike Snitzer wrote:
> >On Tue, Aug 16 2016 at  3:12pm -0400, Mike Snitzer <snitzer@redhat.com> wrote:
> >>On Tue, Aug 16 2016 at  1:32pm -0400, Bart Van Assche <bart.vanassche@sandisk.com> wrote:
> >>>Can you have a look at this?
> >>
> >>I'll get linux-dm.git's 'dm-4.8' branch (v4.8-rc2) loaded on one of my
> >>testbed systems to run the mptest testsuite.  It should provide coverage
> >>for simple failover and failback.
> >
> >I successfully ran mptest against that v4.8-rc2 based kernel.  Any
> >chance you could try mptest to see if it also passes for you?  Or if it
> >is able to trigger the issue for you.
> >
> >mptest's runtest script defaults to setting up scsi-mq and dm-mq.
> 
> Hello Mike,
> 
> I will run mptest as soon as I have the time.

OK, thanks.
 
> Earlier today I came up with three patches that avoid the hang in
> truncate_inode_pages_range() that I had reported before. It would be
> appreciated if you could have a look at these patches. Please keep
> in mind that I'm not a dm expert.

I don't recall where you mentioned a hang in
truncate_inode_pages_range().. ah I see it here:
https://patchwork.kernel.org/patch/9202331/

Comments inlined below.

> From 0e8e9f7d7489f5e3b0bf9e0c59257277d08a2ec0 Mon Sep 17 00:00:00 2001
> From: Bart Van Assche <bart.vanassche@sandisk.com>
> Date: Wed, 17 Aug 2016 14:18:36 -0700
> Subject: [PATCH 1/3] block: Avoid that requests get stuck when stopping and
>  restarting a queue
> 
> If a driver calls blk_start_queue() after a queue has been marked
> "dead", avoid that requests get stuck by invoking __blk_run_queue()
> directly if QUEUE_FLAG_DEAD has been set.
> 
> While testing the dm-mpath driver I noticed that sporadically a
> readahead request was generated by the filesystem on top of the dm
> device but that this I/O request was never processed. This happened
> only while the dm core was suspending and resuming I/O.
> 
> The following backtrace (triggered by a WARN_ON_ONCE(true) statement
> that I had inserted just before the __blk_run_queue(q) call) shows
> that this code path is relevant:
> 
> WARNING: CPU: 7 PID: 4707 at drivers/md/dm.c:2035 map_request+0x1d5/0x340 [dm_mod]
> Aug 17 15:47:35 ion-dev-ib-ini kernel: CPU: 7 PID: 4707 Comm: kdmwork-254:0 Tainted: G        W       4.7.0-dbg+ #2
> Call Trace:
>  [<ffffffff81322ee7>] dump_stack+0x68/0xa1
>  [<ffffffff81061e06>] __warn+0xc6/0xe0
>  [<ffffffff81061ed8>] warn_slowpath_null+0x18/0x20
>  [<ffffffffa04282d5>] map_request+0x1d5/0x340 [dm_mod]
>  [<ffffffffa042845d>] map_tio_request+0x1d/0x40 [dm_mod]
>  [<ffffffff81085fe1>] kthread_worker_fn+0xd1/0x1b0
>  [<ffffffff81085e9a>] kthread+0xea/0x100
>  [<ffffffff8162857f>] ret_from_fork+0x1f/0x40
> 
> References: commit 704605711ef0 ("block: Avoid scheduling delayed work on a dead queue")
> Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Mike Snitzer <snitzer@redhat.com>
> Cc: stable <stable@vger.kernel.org>
> ---
>  block/blk-core.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 0d9638e..5f75709 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -355,6 +355,8 @@ void blk_run_queue_async(struct request_queue *q)
>  {
>  	if (likely(!blk_queue_stopped(q) && !blk_queue_dead(q)))
>  		mod_delayed_work(kblockd_workqueue, &q->delay_work, 0);
> +	else
> +		__blk_run_queue(q);
>  }
>  EXPORT_SYMBOL(blk_run_queue_async);

Your else clause would catch the case when the queue is stopped but
not dead... __blk_run_queue() will handle this cleanly but I think the
logic could be: else if (blk_queue_dead(q))

Also, strikes me as odd to have an _async interface resort to a
synchronous call.  Why is that needed?  Are you sure it is safe for all
callers?
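
In other words, something along these lines (an untested sketch of that
suggestion, relative to the blk_run_queue_async() quoted above):

void blk_run_queue_async(struct request_queue *q)
{
	if (likely(!blk_queue_stopped(q) && !blk_queue_dead(q)))
		mod_delayed_work(kblockd_workqueue, &q->delay_work, 0);
	else if (blk_queue_dead(q))
		/*
		 * Only a dead queue falls back to running the queue
		 * synchronously; a queue that is merely stopped is
		 * left alone.
		 */
		__blk_run_queue(q);
}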

> From 9b103cfb2d9b4cdec73c7d09188b4d42b041d49e Mon Sep 17 00:00:00 2001
> From: Bart Van Assche <bart.vanassche@sandisk.com>
> Date: Wed, 17 Aug 2016 16:26:02 -0700
> Subject: [PATCH 2/3] dm: Change return type of __dm_internal_suspend() to int
> 
> ---
>  drivers/md/dm.c | 15 ++++++++++-----
>  1 file changed, 10 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> index 3689b7f..6eccdf3 100644
> --- a/drivers/md/dm.c
> +++ b/drivers/md/dm.c
> @@ -3379,16 +3379,18 @@ out:
>   * It may be used only from the kernel.
>   */
>  
> -static void __dm_internal_suspend(struct mapped_device *md, unsigned suspend_flags)
> +static int __dm_internal_suspend(struct mapped_device *md,
> +				 unsigned suspend_flags)
>  {
>  	struct dm_table *map = NULL;
> +	int ret = 0;
>  
>  	if (md->internal_suspend_count++)
> -		return; /* nested internal suspend */
> +		goto out; /* nested internal suspend */
>  
>  	if (dm_suspended_md(md)) {
>  		set_bit(DMF_SUSPENDED_INTERNALLY, &md->flags);
> -		return; /* nest suspend */
> +		goto out; /* nest suspend */
>  	}
>  
>  	map = rcu_dereference_protected(md->map, lockdep_is_held(&md->suspend_lock));
> @@ -3399,10 +3401,13 @@ static void __dm_internal_suspend(struct mapped_device *md, unsigned suspend_fla
>  	 * would require changing .presuspend to return an error -- avoid this
>  	 * until there is a need for more elaborate variants of internal suspend.
>  	 */
> -	(void) __dm_suspend(md, map, suspend_flags, TASK_UNINTERRUPTIBLE,
> -			    DMF_SUSPENDED_INTERNALLY);
> +	ret = __dm_suspend(md, map, suspend_flags, TASK_UNINTERRUPTIBLE,
> +			   DMF_SUSPENDED_INTERNALLY);
>  
>  	dm_table_postsuspend_targets(map);
> +
> +out:
> +	return ret;
>  }
>  
>  static void __dm_internal_resume(struct mapped_device *md)

So you only want your new __dm_internal_suspend() caller, __dm_destroy(),
to issue a WARN_ON if __dm_suspend() fails.

Is this _really_ important or did you just do this for debugging
purposes?


> From f6e58e71eec584b3f66fe8367340ca896d29408e Mon Sep 17 00:00:00 2001
> From: Bart Van Assche <bart.vanassche@sandisk.com>
> Date: Wed, 17 Aug 2016 15:26:07 -0700
> Subject: [PATCH 3/3] dm: Avoid that requests get stuck while destroying a dm
>  device
> 
> ---
>  drivers/md/dm.c | 25 ++++++++++++++-----------
>  1 file changed, 14 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> index 6eccdf3..ae4bfa0 100644
> --- a/drivers/md/dm.c
> +++ b/drivers/md/dm.c
> @@ -2958,10 +2958,13 @@ const char *dm_device_name(struct mapped_device *md)
>  }
>  EXPORT_SYMBOL_GPL(dm_device_name);
>  
> +static int __dm_internal_suspend(struct mapped_device *md,
> +				 unsigned suspend_flags);
> +
>  static void __dm_destroy(struct mapped_device *md, bool wait)
>  {
> -	struct dm_table *map;
> -	int srcu_idx;
> +	struct request_queue *q = dm_get_md_queue(md);
> +	int res;
>  
>  	might_sleep();
>  
> @@ -2970,21 +2973,21 @@ static void __dm_destroy(struct mapped_device *md, bool wait)
>  	set_bit(DMF_FREEING, &md->flags);
>  	spin_unlock(&_minor_lock);
>  
> -	if (dm_request_based(md) && md->kworker_task)
> -		flush_kthread_worker(&md->kworker);

The patch header needs more detail, e.g. that __dm_suspend(), via
__dm_internal_suspend(), will call flush_kthread_worker().  But that is a
nit in the grand scheme of things.  Bigger concerns below.

> +	/*
> +	 * Avoid that dm_make_request() gets called after
> +	 * __dm_internal_suspend() finished.
> +	 */
> +	spin_lock_irq(q->queue_lock);
> +	queue_flag_set(QUEUE_FLAG_DYING, q);
> +	spin_unlock_irq(q->queue_lock);

Feels like the case for marking the queue as DYING can be made
independent of changes to the mechanism for flushing IO here.
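
For illustration, an untested sketch of what that independent change
might look like, i.e. setting QUEUE_FLAG_DYING up front while keeping
the existing flush and presuspend/postsuspend sequence (assembled only
from the hunks quoted above):

static void __dm_destroy(struct mapped_device *md, bool wait)
{
	struct request_queue *q = dm_get_md_queue(md);
	struct dm_table *map;
	int srcu_idx;

	might_sleep();

	/* ... DMF_FREEING handling under _minor_lock as before ... */

	/* Stop new I/O from entering the device. */
	spin_lock_irq(q->queue_lock);
	queue_flag_set(QUEUE_FLAG_DYING, q);
	spin_unlock_irq(q->queue_lock);

	/* Keep the existing flush and presuspend/postsuspend sequence. */
	if (dm_request_based(md) && md->kworker_task)
		flush_kthread_worker(&md->kworker);

	mutex_lock(&md->suspend_lock);
	map = dm_get_live_table(md, &srcu_idx);
	if (!dm_suspended_md(md)) {
		dm_table_presuspend_targets(map);
		dm_table_postsuspend_targets(map);
	}
	/* dm_put_live_table must be before msleep, otherwise deadlock is possible */
	dm_put_live_table(md, srcu_idx);
	mutex_unlock(&md->suspend_lock);

	/* ... rest of __dm_destroy() unchanged ... */
}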

>  
>  	/*
>  	 * Take suspend_lock so that presuspend and postsuspend methods
>  	 * do not race with internal suspend.
>  	 */
>  	mutex_lock(&md->suspend_lock);
> -	map = dm_get_live_table(md, &srcu_idx);
> -	if (!dm_suspended_md(md)) {
> -		dm_table_presuspend_targets(map);
> -		dm_table_postsuspend_targets(map);
> -	}
> -	/* dm_put_live_table must be before msleep, otherwise deadlock is possible */
> -	dm_put_live_table(md, srcu_idx);
> +	res = __dm_internal_suspend(md, DM_SUSPEND_NOFLUSH_FLAG);
> +	WARN_ONCE(res, "__dm_internal_suspend() returned %d\n", res);
>  	mutex_unlock(&md->suspend_lock);
>  
>  	/*

Please be explicit about why switching to using __dm_internal_suspend()
is important.  Are you just going for code reuse or is there a more
important reason?

I can infer you probably want the call to dm_stop_queue()...

But I need way more convincing on why __dm_internal_suspend() is needed.
As you can see from the header for commit ffcc3936416, the internal
suspend code is relatively complex.  Increasing its use, especially in a
method such as __dm_destroy, needs a good reason.

But that aside, what raises serious concerns for me is that you're using
DM_SUSPEND_NOFLUSH_FLAG where we absolutely _need_ to flush IO (given
we're in __dm_destroy).

At a minimum DM_SUSPEND_NOFLUSH_FLAG must not be used.
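
A minimal sketch of that adjustment, under the assumption that passing 0
for suspend_flags gives the flushing (non-noflush) behavior:

	/* 0: flush outstanding I/O instead of requeueing it (assumed semantics) */
	res = __dm_internal_suspend(md, 0);
	WARN_ONCE(res, "__dm_internal_suspend() returned %d\n", res);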

Mike


* Re: v4.8 dm-mpath
From: Bart Van Assche @ 2016-08-25 17:40 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: device-mapper development

On 08/17/2016 06:54 PM, Mike Snitzer wrote:
> On Wed, Aug 17 2016 at  8:29pm -0400,
> Bart Van Assche <bart.vanassche@sandisk.com> wrote:
>> Earlier today I came up with three patches that avoid the hang in
>> truncate_inode_pages_range() that I had reported before. It would be
>> appreciated if you could have a look at these patches. Please keep
>> in mind that I'm not a dm expert.
>
> I don't recall where you mentioned a hang in
> truncate_inode_pages_range().. ah I see it here:
> https://patchwork.kernel.org/patch/9202331/

Hello Mike,

As usual, thanks for the quick feedback. But it seems like I sent my 
e-mail too soon: after I had sent my e-mail I ran again into the 
truncate_inode_pages_range() hang. So far I have run the fio + simulated 
cable pulling test with the following configurations:
* scsi-mq + single queue dm: hang occurs after a few iterations.
* scsi-mq + dm-mq: hang occurs after a few iterations.
* single queue scsi + dm-mq: test passes.

I will have another look at the blk-mq core and scsi-mq core layers.

Bart.


* Re: v4.8 dm-mpath
From: Bart Van Assche @ 2016-08-25 21:06 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: device-mapper development

On 08/16/2016 07:43 PM, Mike Snitzer wrote:
> On Tue, Aug 16 2016 at  3:12pm -0400,
> Mike Snitzer <snitzer@redhat.com> wrote:
>
>> On Tue, Aug 16 2016 at  1:32pm -0400,
>> Bart Van Assche <bart.vanassche@sandisk.com> wrote:
>>
>>> Can you have a look at this?
>>
>> I'll get linux-dm.git's 'dm-4.8' branch (v4.8-rc2) loaded on one of my
>> testbed systems to run the mptest testsuite.  It should provide coverage
>> for simple failover and failback.
>
> I successfully ran mptest against that v4.8-rc2 based kernel.  Any
> chance you could try mptest to see if it also passes for you?  Or if it
> is able to trigger the issue for you.
>
> mptest's runtest script defaults to setting up scsi-mq and dm-mq.

Hello Mike,

mptest on top of scsi_debug passes on v4.8-rc3 on the same system I used 
to run my SRP tests:

# ./runtest
[ ... ]
PASSED:  test_00_no_failure test_01_sdev_offline test_02_sdev_delete 
test_03_dm_failpath test_04_dm_switchpg
FAILED:

My SRP tests pass on the same setup if I disable scsi-mq. However, I 
still sporadically see I/O errors if I run srp-test 02 on kernel 
v4.8-rc3 with scsi-mq and dm-mq:
# srp-test/run_tests -t 02
[ ... ]
verify: bad magic header 0, wanted acca at file 
/root/mnt1/data-integrity-test.1.0 offset 581632, length 4096
fio: io_u error on file /root/mnt1/data-integrity-test.8.0: Input/output 
error: write offset=8888320, buflen=4096

Bart.


* Re: v4.8 dm-mpath
From: Mike Snitzer @ 2016-08-26 14:26 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: device-mapper development

On Thu, Aug 25 2016 at  1:40pm -0400,
Bart Van Assche <bart.vanassche@sandisk.com> wrote:

> On 08/17/2016 06:54 PM, Mike Snitzer wrote:
> >On Wed, Aug 17 2016 at  8:29pm -0400,
> >Bart Van Assche <bart.vanassche@sandisk.com> wrote:
> >>Earlier today I came up with three patches that avoid the hang in
> >>truncate_inode_pages_range() that I had reported before. It would be
> >>appreciated if you could have a look at these patches. Please keep
> >>in mind that I'm not a dm expert.
> >
> >I don't recall where you mentioned a hang in
> >truncate_inode_pages_range().. ah I see it here:
> >https://patchwork.kernel.org/patch/9202331/
> 
> Hello Mike,
> 
> As usual, thanks for the quick feedback. But it seems like I sent my
> e-mail too soon: after I had sent my e-mail I ran again into the
> truncate_inode_pages_range() hang.

I was skeptical that your 3 earlier patches (particularly the one that
makes __dm_destroy use internal suspend) would fix anything you care
about in your testing.  __dm_destroy is only used once all references on
the DM mpath device are dropped.  When you do your fio + cable pull tests
you're just bouncing underlying paths around.  You aren't _ever_
destroying the multipath device.  That is why your __dm_destroy patch
seemed off the mark to me.

> So far I have run the fio +
> simulated cable pulling test with the following configurations:
> * scsi-mq + single queue dm: hang occurs after a few iterations.
> * scsi-mq + dm-mq: hang occurs after a few iterations.
> * single queue scsi + dm-mq: test passes.
> 
> I will have another look at the blk-mq core and scsi-mq core layers.

OK.


* Re: v4.8 dm-mpath
From: Bart Van Assche @ 2016-08-26 15:33 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: device-mapper development

On 08/26/2016 07:26 AM, Mike Snitzer wrote:
> On Thu, Aug 25 2016 at  1:40pm -0400,
> Bart Van Assche <bart.vanassche@sandisk.com> wrote:
>> As usual, thanks for the quick feedback. But it seems like I sent my
>> e-mail too soon: after I had sent my e-mail I ran again into the
>> truncate_inode_pages_range() hang.
>
> I was skeptical that your 3 earlier patches (particularly the one that
> makes __dm_destroy use internal suspend) would fix anything you care
> about in your testing.  __dm_destroy is only used once all references on
> the DM mpath device are dropped.  When you do your fio + cable pull tests
> you're just bouncing underlying paths around.  You aren't _ever_
> destroying the multipath device.  That is why your __dm_destroy patch
> seemed off the mark to me.

Hello Mike,

In case it wasn't clear, I want to drop the three patches you referred 
to. But I also want to clarify that my tests *do* trigger 
__dm_destroy(). If you have a look at the srp-test scripts then you will 
see that "dmsetup remove" is invoked after each test. What I see is that 
lock_page() and other page cache functions hang sporadically around the 
time the dm device is removed, most likely due to I/O that is submitted 
but never completed. That's why I started looking at the scsi-mq/blk-mq 
device removal code.

Bart.


* Re: v4.8 dm-mpath
From: Mike Snitzer @ 2016-08-26 16:35 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: device-mapper development

On Fri, Aug 26 2016 at 11:33am -0400,
Bart Van Assche <bart.vanassche@sandisk.com> wrote:

> On 08/26/2016 07:26 AM, Mike Snitzer wrote:
> >On Thu, Aug 25 2016 at  1:40pm -0400,
> >Bart Van Assche <bart.vanassche@sandisk.com> wrote:
> >>As usual, thanks for the quick feedback. But it seems like I sent my
> >>e-mail too soon: after I had sent my e-mail I ran again into the
> >>truncate_inode_pages_range() hang.
> >
> >I was skeptical that your 3 earlier patches (particularly the one that
> >makes __dm_destroy use internal suspend) would fix anything you care
> >about in your testing.  __dm_destroy is only used once all references on
> >the DM mpath device are dropped.  When you do your fio + cable pull tests
> >you're just bouncing underlying paths around.  You aren't _ever_
> >destroying the multipath device.  That is why your __dm_destroy patch
> >seemed off the mark to me.
> 
> Hello Mike,
> 
> In case it wasn't clear, I want to drop the three patches you
> referred to. But I also want to clarify that my tests *do* trigger
> __dm_destroy(). If you have a look at the srp-test scripts then you
> will see that "dmsetup remove" is invoked after each test. What I
> see is that lock_page() and other page cache functions hang
> sporadically around the time the dm device is removed, most likely
> due to I/O that is submitted but never completed. That's why I
> started looking at the scsi-mq/blk-mq device removal code.

We're going round and round with a test that doesn't reflect 99% of the
usage that DM multipath sees.  I think we need to take a step back and
re-evaluate the test in question.

Could well be that there is some problem with outstanding IO racing with
DM multipath device removal.  BUT I'd really appreciate it if you could
make the 'dmsetup remove' phase secondary.  You're welcome to keep the
test you have (with DM device removal mixed with IO), make it
configurable with a flag or whatever, but it strikes me as much more of
a niche concern.  I'm not dismissing the need to make whatever it is
you're doing work, but we're seriously conflating all the variables in
play.

Customers aren't removing their multipath devices a lot.  So can we do
this?

Test step 1:
Let's at least verify that DM multipath's fault handling is reliable
during normal IO in the face of cable pulls (be they synthetic pulls or
real ones).

Test step 2:
Once the IO completes (after paths are restored) and fio ends, _then_
the DM multipath devices can be removed.

You'll note that all mptest tests follow this 2-step pattern.
