* [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues()
@ 2018-03-28  1:20 Ming Lei
  2018-03-28  3:22 ` Jens Axboe
  0 siblings, 1 reply; 40+ messages in thread
From: Ming Lei @ 2018-03-28  1:20 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Stefan Haberland,
	Christian Borntraeger, Ming Lei, Christoph Hellwig

Since commit 20e4d813931961fe ("blk-mq: simplify queue mapping & schedule
with each possisble CPU"), it has become easier to end up with an unmapped
hctx on some CPU topologies, i.e. an hctx may not be mapped to any CPU.

This patch avoids the warning in __blk_mq_delay_run_hw_queue() by checking
in blk_mq_run_hw_queues() whether the hctx is mapped before running it.

blk_mq_run_hw_queues() is often called from the completion path of SCSI and
other drivers, so this warning has to be addressed.

Reported-by: Stefan Haberland <sth@linux.vnet.ibm.com>
Cc: Christoph Hellwig <hch@lst.de>
Fixes: 20e4d813931961fe ("blk-mq: simplify queue mapping & schedule with each possisble CPU")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 16e83e6df404..48f25a63833b 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1459,7 +1459,7 @@ void blk_mq_run_hw_queues(struct request_queue *q, bool async)
 	int i;
 
 	queue_for_each_hw_ctx(q, hctx, i) {
-		if (blk_mq_hctx_stopped(hctx))
+		if (blk_mq_hctx_stopped(hctx) || !blk_mq_hw_queue_mapped(hctx))
 			continue;
 
 		blk_mq_run_hw_queue(hctx, async);
-- 
2.9.5
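
For context, the "mapped" test used by this patch is a small inline helper
in block/blk-mq.h. A paraphrased sketch of its definition around the 4.16-rc
kernels discussed here (an approximation for orientation, not the
authoritative source):

	/*
	 * Paraphrased sketch: a hw queue counts as "mapped" once it has
	 * software contexts assigned to it and a tag set allocated.
	 */
	static inline bool blk_mq_hw_queue_mapped(struct blk_mq_hw_ctx *hctx)
	{
		return hctx->nr_ctx && hctx->tags;
	}

The patch above simply skips hctxs for which this returns false while
iterating in blk_mq_run_hw_queues().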


* Re: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues()
  2018-03-28  1:20 [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues() Ming Lei
@ 2018-03-28  3:22 ` Jens Axboe
  2018-03-28  7:45   ` Christian Borntraeger
  0 siblings, 1 reply; 40+ messages in thread
From: Jens Axboe @ 2018-03-28  3:22 UTC (permalink / raw)
  To: Ming Lei
  Cc: linux-block, Christoph Hellwig, Stefan Haberland,
	Christian Borntraeger, Christoph Hellwig

On 3/27/18 7:20 PM, Ming Lei wrote:
> Since commit 20e4d813931961fe ("blk-mq: simplify queue mapping & schedule
> with each possisble CPU"), it has become easier to end up with an unmapped
> hctx on some CPU topologies, i.e. an hctx may not be mapped to any CPU.
> 
> This patch avoids the warning in __blk_mq_delay_run_hw_queue() by checking
> in blk_mq_run_hw_queues() whether the hctx is mapped before running it.
> 
> blk_mq_run_hw_queues() is often called from the completion path of SCSI and
> other drivers, so this warning has to be addressed.

I don't like this very much. You're catching just one particular case,
and if the hw queue has pending IO (for instance), then it's just wrong.

How about something like the below? Totally untested...

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 16e83e6df404..4c04ac124e5d 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1307,6 +1307,14 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
 	int srcu_idx;
 
 	/*
+	 * Warn if the queue isn't mapped AND we have pending IO. Not being
+	 * mapped isn't necessarily a huge issue, if we don't have pending IO.
+	 */
+	if (!blk_mq_hw_queue_mapped(hctx) &&
+	    !WARN_ON_ONCE(blk_mq_hctx_has_pending(hctx)))
+		return;
+
+	/*
 	 * We should be running this queue from one of the CPUs that
 	 * are mapped to it.
 	 *
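
The "pending IO" test used in the hunk above covers the hctx dispatch list,
the per-CPU software queues and the I/O scheduler. A paraphrased sketch of
blk_mq_hctx_has_pending() as of the 4.16-rc code (an approximation, not the
authoritative source):

	static bool blk_mq_hctx_has_pending(struct blk_mq_hw_ctx *hctx)
	{
		/* requests already pulled off for dispatch but not issued */
		return !list_empty_careful(&hctx->dispatch) ||
			/* software queues that still hold queued requests */
			sbitmap_any_bit_set(&hctx->ctx_map) ||
			/* work the elevator may be holding for this hctx */
			blk_mq_sched_has_work(hctx);
	}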

-- 
Jens Axboe


* Re: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues()
  2018-03-28  3:22 ` Jens Axboe
@ 2018-03-28  7:45   ` Christian Borntraeger
  2018-03-28 14:38     ` Jens Axboe
  2018-03-28 15:26     ` Ming Lei
  0 siblings, 2 replies; 40+ messages in thread
From: Christian Borntraeger @ 2018-03-28  7:45 UTC (permalink / raw)
  To: Jens Axboe, Ming Lei
  Cc: linux-block, Christoph Hellwig, Stefan Haberland, Christoph Hellwig

FWIW, this patch does not fix the issue for me:

ostname=? addr=? terminal=? res=success'
[   21.454961] WARNING: CPU: 3 PID: 1882 at block/blk-mq.c:1410 __blk_mq_delay_run_hw_queue+0xbe/0xd8
[   21.454968] Modules linked in: scsi_dh_rdac scsi_dh_emc scsi_dh_alua dm_mirror dm_region_hash dm_log dm_multipath dm_mod autofs4
[   21.454984] CPU: 3 PID: 1882 Comm: dasdconf.sh Not tainted 4.16.0-rc7+ #26
[   21.454987] Hardware name: IBM 2964 NC9 704 (LPAR)
[   21.454990] Krnl PSW : 00000000c0131ea3 000000003ea2f7bf (__blk_mq_delay_run_hw_queue+0xbe/0xd8)
[   21.454996]            R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
[   21.455005] Krnl GPRS: 0000013abb69a000 0000013a00000000 0000013ac6c0dc00 0000000000000001
[   21.455008]            0000000000000000 0000013abb69a710 0000013a00000000 00000001b691fd98
[   21.455011]            00000001b691fd98 0000013ace4775c8 0000000000000001 0000000000000000
[   21.455014]            0000013ac6c0dc00 0000000000b47238 00000001b691fc08 00000001b691fbd0
[   21.455032] Krnl Code: 000000000069c596: ebaff0a00004	lmg	%r10,%r15,160(%r15)
                          000000000069c59c: c0f4ffff7a5e	brcl	15,68ba58
                         #000000000069c5a2: a7f40001		brc	15,69c5a4
                         >000000000069c5a6: e340f0c00004	lg	%r4,192(%r15)
                          000000000069c5ac: ebaff0a00004	lmg	%r10,%r15,160(%r15)
                          000000000069c5b2: 07f4		bcr	15,%r4
                          000000000069c5b4: c0e5fffffeea	brasl	%r14,69c388
                          000000000069c5ba: a7f4fff6		brc	15,69c5a6
[   21.455067] Call Trace:
[   21.455072] ([<00000001b691fd98>] 0x1b691fd98)
[   21.455079]  [<000000000069c692>] blk_mq_run_hw_queue+0xba/0x100 
[   21.455083]  [<000000000069c740>] blk_mq_run_hw_queues+0x68/0x88 
[   21.455089]  [<000000000069b956>] __blk_mq_complete_request+0x11e/0x1d8 
[   21.455091]  [<000000000069ba9c>] blk_mq_complete_request+0x8c/0xc8 
[   21.455103]  [<00000000008aa250>] dasd_block_tasklet+0x158/0x490 
[   21.455110]  [<000000000014c742>] tasklet_hi_action+0x92/0x120 
[   21.455118]  [<0000000000a7cfc0>] __do_softirq+0x120/0x348 
[   21.455122]  [<000000000014c212>] irq_exit+0xba/0xd0 
[   21.455130]  [<000000000010bf92>] do_IRQ+0x8a/0xb8 
[   21.455133]  [<0000000000a7c298>] io_int_handler+0x130/0x298 
[   21.455136] Last Breaking-Event-Address:
[   21.455138]  [<000000000069c5a2>] __blk_mq_delay_run_hw_queue+0xba/0xd8
[   21.455140] ---[ end trace be43f99a5d1e553e ]---
[   21.510046] dasdconf.sh Warning: 0.0.241e is already online, not configuring


On 03/28/2018 05:22 AM, Jens Axboe wrote:
> On 3/27/18 7:20 PM, Ming Lei wrote:
>> Since commit 20e4d813931961fe ("blk-mq: simplify queue mapping & schedule
>> with each possisble CPU"), it has become easier to end up with an unmapped
>> hctx on some CPU topologies, i.e. an hctx may not be mapped to any CPU.
>>
>> This patch avoids the warning in __blk_mq_delay_run_hw_queue() by checking
>> in blk_mq_run_hw_queues() whether the hctx is mapped before running it.
>>
>> blk_mq_run_hw_queues() is often called from the completion path of SCSI and
>> other drivers, so this warning has to be addressed.
> 
> I don't like this very much. You're catching just one particular case,
> and if the hw queue has pending IO (for instance), then it's just wrong.
> 
> How about something like the below? Totally untested...
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 16e83e6df404..4c04ac124e5d 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -1307,6 +1307,14 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
>  	int srcu_idx;
> 
>  	/*
> +	 * Warn if the queue isn't mapped AND we have pending IO. Not being
> +	 * mapped isn't necessarily a huge issue, if we don't have pending IO.
> +	 */
> +	if (!blk_mq_hw_queue_mapped(hctx) &&
> +	    !WARN_ON_ONCE(blk_mq_hctx_has_pending(hctx)))
> +		return;
> +
> +	/*
>  	 * We should be running this queue from one of the CPUs that
>  	 * are mapped to it.
>  	 *
> 


* Re: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues()
  2018-03-28  7:45   ` Christian Borntraeger
@ 2018-03-28 14:38     ` Jens Axboe
  2018-03-28 14:53       ` Jens Axboe
  2018-03-28 15:26     ` Ming Lei
  1 sibling, 1 reply; 40+ messages in thread
From: Jens Axboe @ 2018-03-28 14:38 UTC (permalink / raw)
  To: Christian Borntraeger, Ming Lei
  Cc: linux-block, Christoph Hellwig, Stefan Haberland, Christoph Hellwig

On 3/28/18 1:45 AM, Christian Borntraeger wrote:
> FWIW, this patch does not fix the issue for me:

Looks like I didn't do the delayed path. How about the below?


diff --git a/block/blk-mq.c b/block/blk-mq.c
index 16e83e6df404..fd663ae1094c 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1302,10 +1302,23 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list,
 	return (queued + errors) != 0;
 }
 
+static bool blk_mq_bail_unmapped(struct blk_mq_hw_ctx *hctx)
+{
+	/*
+	 * Warn if the queue isn't mapped AND we have pending IO. Not being
+	 * mapped isn't necessarily a huge issue, if we don't have pending IO.
+	 */
+	return !blk_mq_hw_queue_mapped(hctx) &&
+		!WARN_ON_ONCE(blk_mq_hctx_has_pending(hctx));
+}
+
 static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
 {
 	int srcu_idx;
 
+	if (blk_mq_bail_unmapped(hctx))
+		return;
+
 	/*
 	 * We should be running this queue from one of the CPUs that
 	 * are mapped to it.
@@ -1399,9 +1412,8 @@ static int blk_mq_hctx_next_cpu(struct blk_mq_hw_ctx *hctx)
 static void __blk_mq_delay_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async,
 					unsigned long msecs)
 {
-	if (WARN_ON_ONCE(!blk_mq_hw_queue_mapped(hctx)))
+	if (blk_mq_bail_unmapped(hctx))
 		return;
-
 	if (unlikely(blk_mq_hctx_stopped(hctx)))
 		return;
 

-- 
Jens Axboe


* Re: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues()
  2018-03-28 14:38     ` Jens Axboe
@ 2018-03-28 14:53       ` Jens Axboe
  2018-03-28 15:38         ` Christian Borntraeger
  0 siblings, 1 reply; 40+ messages in thread
From: Jens Axboe @ 2018-03-28 14:53 UTC (permalink / raw)
  To: Christian Borntraeger, Ming Lei
  Cc: linux-block, Christoph Hellwig, Stefan Haberland, Christoph Hellwig

On 3/28/18 8:38 AM, Jens Axboe wrote:
> On 3/28/18 1:45 AM, Christian Borntraeger wrote:
>> FWIW, this patch does not fix the issue for me:
> 
> Looks like I didn't do the delayed path. How about the below?

OK, final version... This is more in line with what I originally
suggested.

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 16e83e6df404..c90016c36a70 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1306,6 +1306,10 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
 {
 	int srcu_idx;
 
+	if (!blk_mq_hw_queue_mapped(hctx) &&
+	    !WARN_ON_ONCE(blk_mq_hctx_has_pending(hctx)))
+		return;
+
 	/*
 	 * We should be running this queue from one of the CPUs that
 	 * are mapped to it.
@@ -1399,9 +1403,6 @@ static int blk_mq_hctx_next_cpu(struct blk_mq_hw_ctx *hctx)
 static void __blk_mq_delay_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async,
 					unsigned long msecs)
 {
-	if (WARN_ON_ONCE(!blk_mq_hw_queue_mapped(hctx)))
-		return;
-
 	if (unlikely(blk_mq_hctx_stopped(hctx)))
 		return;
 
@@ -1586,9 +1587,6 @@ static void blk_mq_run_work_fn(struct work_struct *work)
 
 void blk_mq_delay_queue(struct blk_mq_hw_ctx *hctx, unsigned long msecs)
 {
-	if (WARN_ON_ONCE(!blk_mq_hw_queue_mapped(hctx)))
-		return;
-
 	/*
 	 * Stop the hw queue, then modify currently delayed work.
 	 * This should prevent us from running the queue prematurely.

-- 
Jens Axboe


* Re: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues()
  2018-03-28  7:45   ` Christian Borntraeger
  2018-03-28 14:38     ` Jens Axboe
@ 2018-03-28 15:26     ` Ming Lei
  2018-03-28 15:36       ` Christian Borntraeger
  1 sibling, 1 reply; 40+ messages in thread
From: Ming Lei @ 2018-03-28 15:26 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Stefan Haberland,
	Christoph Hellwig

Hi Christian,

On Wed, Mar 28, 2018 at 09:45:10AM +0200, Christian Borntraeger wrote:
> FWIW, this patch does not fix the issue for me:
> 
> ostname=? addr=? terminal=? res=success'
> [   21.454961] WARNING: CPU: 3 PID: 1882 at block/blk-mq.c:1410 __blk_mq_delay_run_hw_queue+0xbe/0xd8
> [   21.454968] Modules linked in: scsi_dh_rdac scsi_dh_emc scsi_dh_alua dm_mirror dm_region_hash dm_log dm_multipath dm_mod autofs4
> [   21.454984] CPU: 3 PID: 1882 Comm: dasdconf.sh Not tainted 4.16.0-rc7+ #26
> [   21.454987] Hardware name: IBM 2964 NC9 704 (LPAR)
> [   21.454990] Krnl PSW : 00000000c0131ea3 000000003ea2f7bf (__blk_mq_delay_run_hw_queue+0xbe/0xd8)
> [   21.454996]            R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
> [   21.455005] Krnl GPRS: 0000013abb69a000 0000013a00000000 0000013ac6c0dc00 0000000000000001
> [   21.455008]            0000000000000000 0000013abb69a710 0000013a00000000 00000001b691fd98
> [   21.455011]            00000001b691fd98 0000013ace4775c8 0000000000000001 0000000000000000
> [   21.455014]            0000013ac6c0dc00 0000000000b47238 00000001b691fc08 00000001b691fbd0
> [   21.455032] Krnl Code: 000000000069c596: ebaff0a00004	lmg	%r10,%r15,160(%r15)
>                           000000000069c59c: c0f4ffff7a5e	brcl	15,68ba58
>                          #000000000069c5a2: a7f40001		brc	15,69c5a4
>                          >000000000069c5a6: e340f0c00004	lg	%r4,192(%r15)
>                           000000000069c5ac: ebaff0a00004	lmg	%r10,%r15,160(%r15)
>                           000000000069c5b2: 07f4		bcr	15,%r4
>                           000000000069c5b4: c0e5fffffeea	brasl	%r14,69c388
>                           000000000069c5ba: a7f4fff6		brc	15,69c5a6
> [   21.455067] Call Trace:
> [   21.455072] ([<00000001b691fd98>] 0x1b691fd98)
> [   21.455079]  [<000000000069c692>] blk_mq_run_hw_queue+0xba/0x100 
> [   21.455083]  [<000000000069c740>] blk_mq_run_hw_queues+0x68/0x88 
> [   21.455089]  [<000000000069b956>] __blk_mq_complete_request+0x11e/0x1d8 
> [   21.455091]  [<000000000069ba9c>] blk_mq_complete_request+0x8c/0xc8 
> [   21.455103]  [<00000000008aa250>] dasd_block_tasklet+0x158/0x490 
> [   21.455110]  [<000000000014c742>] tasklet_hi_action+0x92/0x120 
> [   21.455118]  [<0000000000a7cfc0>] __do_softirq+0x120/0x348 
> [   21.455122]  [<000000000014c212>] irq_exit+0xba/0xd0 
> [   21.455130]  [<000000000010bf92>] do_IRQ+0x8a/0xb8 
> [   21.455133]  [<0000000000a7c298>] io_int_handler+0x130/0x298 
> [   21.455136] Last Breaking-Event-Address:
> [   21.455138]  [<000000000069c5a2>] __blk_mq_delay_run_hw_queue+0xba/0xd8
> [   21.455140] ---[ end trace be43f99a5d1e553e ]---
> [   21.510046] dasdconf.sh Warning: 0.0.241e is already online, not configuring

Thinking about this further, I still can't see the root cause of this
issue.

After commit 20e4d813931961fe ("blk-mq: simplify queue mapping & schedule with
each possisble CPU"), each hw queue should be mapped to at least one CPU, so
this issue shouldn't happen. Maybe blk_mq_map_queues() is doing the wrong thing?
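
To make that expectation concrete: after that commit, blk_mq_map_queues() is
supposed to spread all possible CPUs over the hw queues, so every queue gets
at least one CPU whenever nr_hw_queues does not exceed the number of possible
CPUs. A simplified illustration of the idea (illustrative only, not the
actual kernel source; the real code also keeps SMT siblings on the same
queue):

	/* Illustrative only: assign every possible CPU a queue round-robin. */
	static void example_spread_possible_cpus(unsigned int *mq_map,
						 unsigned int nr_queues)
	{
		unsigned int cpu;

		for_each_possible_cpu(cpu)
			mq_map[cpu] = cpu % nr_queues;
	}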

Could you dump 'lscpu' and provide blk-mq debugfs for your DASD via the
following command?

(cd /sys/kernel/debug/block/$DASD && find . -type f -exec grep -aH . {} \;)

Thanks,
Ming


* Re: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues()
  2018-03-28 15:26     ` Ming Lei
@ 2018-03-28 15:36       ` Christian Borntraeger
  2018-03-28 15:44         ` Christian Borntraeger
  2018-03-29  2:00         ` Ming Lei
  0 siblings, 2 replies; 40+ messages in thread
From: Christian Borntraeger @ 2018-03-28 15:36 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Stefan Haberland,
	Christoph Hellwig

[-- Attachment #1: Type: text/plain, Size: 4631 bytes --]



On 03/28/2018 05:26 PM, Ming Lei wrote:
> Hi Christian,
> 
> On Wed, Mar 28, 2018 at 09:45:10AM +0200, Christian Borntraeger wrote:
>> FWIW, this patch does not fix the issue for me:
>>
>> ostname=? addr=? terminal=? res=success'
>> [   21.454961] WARNING: CPU: 3 PID: 1882 at block/blk-mq.c:1410 __blk_mq_delay_run_hw_queue+0xbe/0xd8
>> [   21.454968] Modules linked in: scsi_dh_rdac scsi_dh_emc scsi_dh_alua dm_mirror dm_region_hash dm_log dm_multipath dm_mod autofs4
>> [   21.454984] CPU: 3 PID: 1882 Comm: dasdconf.sh Not tainted 4.16.0-rc7+ #26
>> [   21.454987] Hardware name: IBM 2964 NC9 704 (LPAR)
>> [   21.454990] Krnl PSW : 00000000c0131ea3 000000003ea2f7bf (__blk_mq_delay_run_hw_queue+0xbe/0xd8)
>> [   21.454996]            R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
>> [   21.455005] Krnl GPRS: 0000013abb69a000 0000013a00000000 0000013ac6c0dc00 0000000000000001
>> [   21.455008]            0000000000000000 0000013abb69a710 0000013a00000000 00000001b691fd98
>> [   21.455011]            00000001b691fd98 0000013ace4775c8 0000000000000001 0000000000000000
>> [   21.455014]            0000013ac6c0dc00 0000000000b47238 00000001b691fc08 00000001b691fbd0
>> [   21.455032] Krnl Code: 000000000069c596: ebaff0a00004	lmg	%r10,%r15,160(%r15)
>>                           000000000069c59c: c0f4ffff7a5e	brcl	15,68ba58
>>                          #000000000069c5a2: a7f40001		brc	15,69c5a4
>>                          >000000000069c5a6: e340f0c00004	lg	%r4,192(%r15)
>>                           000000000069c5ac: ebaff0a00004	lmg	%r10,%r15,160(%r15)
>>                           000000000069c5b2: 07f4		bcr	15,%r4
>>                           000000000069c5b4: c0e5fffffeea	brasl	%r14,69c388
>>                           000000000069c5ba: a7f4fff6		brc	15,69c5a6
>> [   21.455067] Call Trace:
>> [   21.455072] ([<00000001b691fd98>] 0x1b691fd98)
>> [   21.455079]  [<000000000069c692>] blk_mq_run_hw_queue+0xba/0x100 
>> [   21.455083]  [<000000000069c740>] blk_mq_run_hw_queues+0x68/0x88 
>> [   21.455089]  [<000000000069b956>] __blk_mq_complete_request+0x11e/0x1d8 
>> [   21.455091]  [<000000000069ba9c>] blk_mq_complete_request+0x8c/0xc8 
>> [   21.455103]  [<00000000008aa250>] dasd_block_tasklet+0x158/0x490 
>> [   21.455110]  [<000000000014c742>] tasklet_hi_action+0x92/0x120 
>> [   21.455118]  [<0000000000a7cfc0>] __do_softirq+0x120/0x348 
>> [   21.455122]  [<000000000014c212>] irq_exit+0xba/0xd0 
>> [   21.455130]  [<000000000010bf92>] do_IRQ+0x8a/0xb8 
>> [   21.455133]  [<0000000000a7c298>] io_int_handler+0x130/0x298 
>> [   21.455136] Last Breaking-Event-Address:
>> [   21.455138]  [<000000000069c5a2>] __blk_mq_delay_run_hw_queue+0xba/0xd8
>> [   21.455140] ---[ end trace be43f99a5d1e553e ]---
>> [   21.510046] dasdconf.sh Warning: 0.0.241e is already online, not configuring
> 
> Thinking about this further, I still can't see the root cause of this
> issue.
> 
> After commit 20e4d813931961fe ("blk-mq: simplify queue mapping & schedule with
> each possisble CPU"), each hw queue should be mapped to at least one CPU, so
> this issue shouldn't happen. Maybe blk_mq_map_queues() is doing the wrong thing?
> 
> Could you dump 'lscpu' and provide blk-mq debugfs for your DASD via the
> following command?

# lscpu
Architecture:        s390x
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Big Endian
CPU(s):              16
On-line CPU(s) list: 0-15
Thread(s) per core:  2
Core(s) per socket:  8
Socket(s) per book:  3
Book(s) per drawer:  2
Drawer(s):           4
NUMA node(s):        1
Vendor ID:           IBM/S390
Machine type:        2964
CPU dynamic MHz:     5000
CPU static MHz:      5000
BogoMIPS:            20325.00
Hypervisor:          PR/SM
Hypervisor vendor:   IBM
Virtualization type: full
Dispatching mode:    horizontal
L1d cache:           128K
L1i cache:           96K
L2d cache:           2048K
L2i cache:           2048K
L3 cache:            65536K
L4 cache:            491520K
NUMA node0 CPU(s):   0-15
Flags:               esan3 zarch stfle msa ldisp eimm dfp edat etf3eh highgprs te vx sie

# lsdasd 
Bus-ID     Status      Name      Device  Type  BlkSz  Size      Blocks
==============================================================================
0.0.3f75   active      dasda     94:0    ECKD  4096   21129MB   5409180
0.0.3f76   active      dasdb     94:4    ECKD  4096   21129MB   5409180
0.0.3f77   active      dasdc     94:8    ECKD  4096   21129MB   5409180
0.0.3f74   active      dasdd     94:12   ECKD  4096   21129MB   5409180

> 
> (cd /sys/kernel/debug/block/$DASD && find . -type f -exec grep -aH . {} \;)


see attachment:

[-- Attachment #2: log --]
[-- Type: text/plain, Size: 21552 bytes --]

dasda/range:4
dasda/capability:10
dasda/inflight:       0        0
dasda/ext_range:4
dasda/power/runtime_suspended_time:0
dasda/power/runtime_active_time:0
dasda/power/control:auto
dasda/power/runtime_status:unsupported
dasda/dev:94:0
dasda/hidden:0
dasda/ro:0
dasda/mq/7/nr_tags:1024
dasda/mq/7/nr_reserved_tags:0
dasda/mq/7/cpu_list:7
dasda/mq/15/nr_tags:1024
dasda/mq/15/nr_reserved_tags:0
dasda/mq/15/cpu_list:15
dasda/mq/5/nr_tags:1024
dasda/mq/5/nr_reserved_tags:0
dasda/mq/5/cpu_list:5
dasda/mq/13/nr_tags:1024
dasda/mq/13/nr_reserved_tags:0
dasda/mq/13/cpu_list:13
dasda/mq/3/nr_tags:1024
dasda/mq/3/nr_reserved_tags:0
dasda/mq/3/cpu_list:3
dasda/mq/11/nr_tags:1024
dasda/mq/11/nr_reserved_tags:0
dasda/mq/11/cpu_list:11
dasda/mq/1/nr_tags:1024
dasda/mq/1/nr_reserved_tags:0
dasda/mq/1/cpu_list:1
dasda/mq/8/nr_tags:1024
dasda/mq/8/nr_reserved_tags:0
dasda/mq/8/cpu_list:8
dasda/mq/6/nr_tags:1024
dasda/mq/6/nr_reserved_tags:0
dasda/mq/6/cpu_list:6
dasda/mq/14/nr_tags:1024
dasda/mq/14/nr_reserved_tags:0
dasda/mq/14/cpu_list:14
dasda/mq/4/nr_tags:1024
dasda/mq/4/nr_reserved_tags:0
dasda/mq/4/cpu_list:4
dasda/mq/12/nr_tags:1024
dasda/mq/12/nr_reserved_tags:0
dasda/mq/12/cpu_list:12
dasda/mq/2/nr_tags:1024
dasda/mq/2/nr_reserved_tags:0
dasda/mq/2/cpu_list:2
dasda/mq/10/nr_tags:1024
dasda/mq/10/nr_reserved_tags:0
dasda/mq/10/cpu_list:10
dasda/mq/0/nr_tags:1024
dasda/mq/0/nr_reserved_tags:0
dasda/mq/0/cpu_list:0, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281
dasda/mq/9/nr_tags:1024
dasda/mq/9/nr_reserved_tags:0
dasda/mq/9/cpu_list:9
dasda/stat:     126        0    10728       60        0        0        0        0        0       10       20
dasda/removable:0
dasda/size:43273440
dasda/alignment_offset:0
dasda/queue/hw_sector_size:4096
dasda/queue/max_discard_segments:1
dasda/queue/max_segment_size:4096
dasda/queue/physical_block_size:4096
dasda/queue/discard_max_bytes:0
dasda/queue/rotational:0
dasda/queue/iosched/fifo_batch:16
dasda/queue/iosched/read_expire:500
dasda/queue/iosched/writes_starved:2
dasda/queue/iosched/write_expire:5000
dasda/queue/iosched/front_merges:1
dasda/queue/write_same_max_bytes:0
dasda/queue/zoned:none
dasda/queue/max_sectors_kb:760
dasda/queue/discard_zeroes_data:0
dasda/queue/read_ahead_kb:128
dasda/queue/discard_max_hw_bytes:0
dasda/queue/wbt_lat_usec:75000
dasda/queue/nomerges:0
dasda/queue/max_segments:65535
dasda/queue/rq_affinity:1
dasda/queue/iostats:1
dasda/queue/dax:0
dasda/queue/minimum_io_size:4096
dasda/queue/chunk_sectors:0
dasda/queue/io_poll:1
dasda/queue/write_zeroes_max_bytes:0
dasda/queue/max_hw_sectors_kb:760
dasda/queue/add_random:0
dasda/queue/optimal_io_size:0
dasda/queue/nr_requests:256
dasda/queue/scheduler:[mq-deadline] kyber none
dasda/queue/discard_granularity:0
dasda/queue/logical_block_size:4096
dasda/queue/io_poll_delay:-1
dasda/queue/max_integrity_segments:0
dasda/queue/write_cache:write through
dasda/trace/end_lba:disabled
dasda/trace/act_mask:disabled
dasda/trace/start_lba:disabled
dasda/trace/enable:0
dasda/trace/pid:disabled
dasda/uevent:MAJOR=94
dasda/uevent:MINOR=0
dasda/uevent:DEVNAME=dasda
dasda/uevent:DEVTYPE=disk
dasda/integrity/write_generate:0
dasda/integrity/device_is_integrity_capable:0
dasda/integrity/tag_size:0
dasda/integrity/read_verify:0
dasda/integrity/protection_interval_bytes:0
dasda/integrity/format:none
dasda/discard_alignment:0
dasda/dasda1/start:192
dasda/dasda1/inflight:       0        0
dasda/dasda1/power/runtime_suspended_time:0
dasda/dasda1/power/runtime_active_time:0
dasda/dasda1/power/control:auto
dasda/dasda1/power/runtime_status:unsupported
dasda/dasda1/dev:94:1
dasda/dasda1/ro:0
dasda/dasda1/partition:1
dasda/dasda1/stat:     115        0    10216       60        0        0        0        0        0       10       20
dasda/dasda1/size:43273248
dasda/dasda1/alignment_offset:0
dasda/dasda1/trace/end_lba:disabled
dasda/dasda1/trace/act_mask:disabled
dasda/dasda1/trace/start_lba:disabled
dasda/dasda1/trace/enable:0
dasda/dasda1/trace/pid:disabled
dasda/dasda1/uevent:MAJOR=94
dasda/dasda1/uevent:MINOR=1
dasda/dasda1/uevent:DEVNAME=dasda1
dasda/dasda1/uevent:DEVTYPE=partition
dasda/dasda1/uevent:PARTN=1
dasda/dasda1/discard_alignment:0
dasdb/range:4
dasdb/capability:10
dasdb/inflight:       0        0
dasdb/ext_range:4
dasdb/power/runtime_suspended_time:0
dasdb/power/runtime_active_time:0
dasdb/power/control:auto
dasdb/power/runtime_status:unsupported
dasdb/dev:94:4
dasdb/hidden:0
dasdb/ro:0
dasdb/mq/7/nr_tags:1024
dasdb/mq/7/nr_reserved_tags:0
dasdb/mq/7/cpu_list:7
dasdb/mq/15/nr_tags:1024
dasdb/mq/15/nr_reserved_tags:0
dasdb/mq/15/cpu_list:15
dasdb/mq/5/nr_tags:1024
dasdb/mq/5/nr_reserved_tags:0
dasdb/mq/5/cpu_list:5
dasdb/mq/13/nr_tags:1024
dasdb/mq/13/nr_reserved_tags:0
dasdb/mq/13/cpu_list:13
dasdb/mq/3/nr_tags:1024
dasdb/mq/3/nr_reserved_tags:0
dasdb/mq/3/cpu_list:3
dasdb/mq/11/nr_tags:1024
dasdb/mq/11/nr_reserved_tags:0
dasdb/mq/11/cpu_list:11
dasdb/mq/1/nr_tags:1024
dasdb/mq/1/nr_reserved_tags:0
dasdb/mq/1/cpu_list:1
dasdb/mq/8/nr_tags:1024
dasdb/mq/8/nr_reserved_tags:0
dasdb/mq/8/cpu_list:8
dasdb/mq/6/nr_tags:1024
dasdb/mq/6/nr_reserved_tags:0
dasdb/mq/6/cpu_list:6
dasdb/mq/14/nr_tags:1024
dasdb/mq/14/nr_reserved_tags:0
dasdb/mq/14/cpu_list:14
dasdb/mq/4/nr_tags:1024
dasdb/mq/4/nr_reserved_tags:0
dasdb/mq/4/cpu_list:4
dasdb/mq/12/nr_tags:1024
dasdb/mq/12/nr_reserved_tags:0
dasdb/mq/12/cpu_list:12
dasdb/mq/2/nr_tags:1024
dasdb/mq/2/nr_reserved_tags:0
dasdb/mq/2/cpu_list:2
dasdb/mq/10/nr_tags:1024
dasdb/mq/10/nr_reserved_tags:0
dasdb/mq/10/cpu_list:10
dasdb/mq/0/nr_tags:1024
dasdb/mq/0/nr_reserved_tags:0
dasdb/mq/0/cpu_list:0, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281
dasdb/mq/9/nr_tags:1024
dasdb/mq/9/nr_reserved_tags:0
dasdb/mq/9/cpu_list:9
dasdb/stat:     129        0    10504       50        1        0        8        0        0       10       10
dasdb/removable:0
dasdb/size:43273440
dasdb/alignment_offset:0
dasdb/queue/hw_sector_size:4096
dasdb/queue/max_discard_segments:1
dasdb/queue/max_segment_size:4096
dasdb/queue/physical_block_size:4096
dasdb/queue/discard_max_bytes:0
dasdb/queue/rotational:0
dasdb/queue/iosched/fifo_batch:16
dasdb/queue/iosched/read_expire:500
dasdb/queue/iosched/writes_starved:2
dasdb/queue/iosched/write_expire:5000
dasdb/queue/iosched/front_merges:1
dasdb/queue/write_same_max_bytes:0
dasdb/queue/zoned:none
dasdb/queue/max_sectors_kb:760
dasdb/queue/discard_zeroes_data:0
dasdb/queue/read_ahead_kb:128
dasdb/queue/discard_max_hw_bytes:0
dasdb/queue/wbt_lat_usec:75000
dasdb/queue/nomerges:0
dasdb/queue/max_segments:65535
dasdb/queue/rq_affinity:1
dasdb/queue/iostats:1
dasdb/queue/dax:0
dasdb/queue/minimum_io_size:4096
dasdb/queue/chunk_sectors:0
dasdb/queue/io_poll:1
dasdb/queue/write_zeroes_max_bytes:0
dasdb/queue/max_hw_sectors_kb:760
dasdb/queue/add_random:0
dasdb/queue/optimal_io_size:0
dasdb/queue/nr_requests:256
dasdb/queue/scheduler:[mq-deadline] kyber none
dasdb/queue/discard_granularity:0
dasdb/queue/logical_block_size:4096
dasdb/queue/io_poll_delay:-1
dasdb/queue/max_integrity_segments:0
dasdb/queue/write_cache:write through
dasdb/trace/end_lba:disabled
dasdb/trace/act_mask:disabled
dasdb/trace/start_lba:disabled
dasdb/trace/enable:0
dasdb/trace/pid:disabled
dasdb/uevent:MAJOR=94
dasdb/uevent:MINOR=4
dasdb/uevent:DEVNAME=dasdb
dasdb/uevent:DEVTYPE=disk
dasdb/integrity/write_generate:0
dasdb/integrity/device_is_integrity_capable:0
dasdb/integrity/tag_size:0
dasdb/integrity/read_verify:0
dasdb/integrity/protection_interval_bytes:0
dasdb/integrity/format:none
dasdb/dasdb1/start:192
dasdb/dasdb1/inflight:       0        0
dasdb/dasdb1/power/runtime_suspended_time:0
dasdb/dasdb1/power/runtime_active_time:0
dasdb/dasdb1/power/control:auto
dasdb/dasdb1/power/runtime_status:unsupported
dasdb/dasdb1/dev:94:5
dasdb/dasdb1/ro:0
dasdb/dasdb1/partition:1
dasdb/dasdb1/stat:     118        0     9992       30        1        0        8        0        0        0        0
dasdb/dasdb1/size:43273248
dasdb/dasdb1/alignment_offset:0
dasdb/dasdb1/trace/end_lba:disabled
dasdb/dasdb1/trace/act_mask:disabled
dasdb/dasdb1/trace/start_lba:disabled
dasdb/dasdb1/trace/enable:0
dasdb/dasdb1/trace/pid:disabled
dasdb/dasdb1/uevent:MAJOR=94
dasdb/dasdb1/uevent:MINOR=5
dasdb/dasdb1/uevent:DEVNAME=dasdb1
dasdb/dasdb1/uevent:DEVTYPE=partition
dasdb/dasdb1/uevent:PARTN=1
dasdb/dasdb1/discard_alignment:0
dasdb/discard_alignment:0
dasdc/range:4
dasdc/capability:10
dasdc/inflight:       0        0
dasdc/ext_range:4
dasdc/power/runtime_suspended_time:0
dasdc/power/runtime_active_time:0
dasdc/power/control:auto
dasdc/power/runtime_status:unsupported
dasdc/dev:94:8
dasdc/hidden:0
dasdc/ro:0
dasdc/mq/7/nr_tags:1024
dasdc/mq/7/nr_reserved_tags:0
dasdc/mq/7/cpu_list:7
dasdc/mq/15/nr_tags:1024
dasdc/mq/15/nr_reserved_tags:0
dasdc/mq/15/cpu_list:15
dasdc/mq/5/nr_tags:1024
dasdc/mq/5/nr_reserved_tags:0
dasdc/mq/5/cpu_list:5
dasdc/mq/13/nr_tags:1024
dasdc/mq/13/nr_reserved_tags:0
dasdc/mq/13/cpu_list:13
dasdc/mq/3/nr_tags:1024
dasdc/mq/3/nr_reserved_tags:0
dasdc/mq/3/cpu_list:3
dasdc/mq/11/nr_tags:1024
dasdc/mq/11/nr_reserved_tags:0
dasdc/mq/11/cpu_list:11
dasdc/mq/1/nr_tags:1024
dasdc/mq/1/nr_reserved_tags:0
dasdc/mq/1/cpu_list:1
dasdc/mq/8/nr_tags:1024
dasdc/mq/8/nr_reserved_tags:0
dasdc/mq/8/cpu_list:8
dasdc/mq/6/nr_tags:1024
dasdc/mq/6/nr_reserved_tags:0
dasdc/mq/6/cpu_list:6
dasdc/mq/14/nr_tags:1024
dasdc/mq/14/nr_reserved_tags:0
dasdc/mq/14/cpu_list:14
dasdc/mq/4/nr_tags:1024
dasdc/mq/4/nr_reserved_tags:0
dasdc/mq/4/cpu_list:4
dasdc/mq/12/nr_tags:1024
dasdc/mq/12/nr_reserved_tags:0
dasdc/mq/12/cpu_list:12
dasdc/mq/2/nr_tags:1024
dasdc/mq/2/nr_reserved_tags:0
dasdc/mq/2/cpu_list:2
dasdc/mq/10/nr_tags:1024
dasdc/mq/10/nr_reserved_tags:0
dasdc/mq/10/cpu_list:10
dasdc/mq/0/nr_tags:1024
dasdc/mq/0/nr_reserved_tags:0
dasdc/mq/0/cpu_list:0, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281
dasdc/mq/9/nr_tags:1024
dasdc/mq/9/nr_reserved_tags:0
dasdc/mq/9/cpu_list:9
dasdc/stat:     129        0    10504       50        1        0        8        0        0        0        0
dasdc/removable:0
dasdc/size:43273440
dasdc/alignment_offset:0
dasdc/queue/hw_sector_size:4096
dasdc/queue/max_discard_segments:1
dasdc/queue/max_segment_size:4096
dasdc/queue/physical_block_size:4096
dasdc/queue/discard_max_bytes:0
dasdc/queue/rotational:0
dasdc/queue/iosched/fifo_batch:16
dasdc/queue/iosched/read_expire:500
dasdc/queue/iosched/writes_starved:2
dasdc/queue/iosched/write_expire:5000
dasdc/queue/iosched/front_merges:1
dasdc/queue/write_same_max_bytes:0
dasdc/queue/zoned:none
dasdc/queue/max_sectors_kb:760
dasdc/queue/discard_zeroes_data:0
dasdc/queue/read_ahead_kb:128
dasdc/queue/discard_max_hw_bytes:0
dasdc/queue/wbt_lat_usec:75000
dasdc/queue/nomerges:0
dasdc/queue/max_segments:65535
dasdc/queue/rq_affinity:1
dasdc/queue/iostats:1
dasdc/queue/dax:0
dasdc/queue/minimum_io_size:4096
dasdc/queue/chunk_sectors:0
dasdc/queue/io_poll:1
dasdc/queue/write_zeroes_max_bytes:0
dasdc/queue/max_hw_sectors_kb:760
dasdc/queue/add_random:0
dasdc/queue/optimal_io_size:0
dasdc/queue/nr_requests:256
dasdc/queue/scheduler:[mq-deadline] kyber none
dasdc/queue/discard_granularity:0
dasdc/queue/logical_block_size:4096
dasdc/queue/io_poll_delay:-1
dasdc/queue/max_integrity_segments:0
dasdc/queue/write_cache:write through
dasdc/trace/end_lba:disabled
dasdc/trace/act_mask:disabled
dasdc/trace/start_lba:disabled
dasdc/trace/enable:0
dasdc/trace/pid:disabled
dasdc/uevent:MAJOR=94
dasdc/uevent:MINOR=8
dasdc/uevent:DEVNAME=dasdc
dasdc/uevent:DEVTYPE=disk
dasdc/dasdc1/start:192
dasdc/dasdc1/inflight:       0        0
dasdc/dasdc1/power/runtime_suspended_time:0
dasdc/dasdc1/power/runtime_active_time:0
dasdc/dasdc1/power/control:auto
dasdc/dasdc1/power/runtime_status:unsupported
dasdc/dasdc1/dev:94:9
dasdc/dasdc1/ro:0
dasdc/dasdc1/partition:1
dasdc/dasdc1/stat:     118        0     9992       50        1        0        8        0        0        0        0
dasdc/dasdc1/size:43273248
dasdc/dasdc1/alignment_offset:0
dasdc/dasdc1/trace/end_lba:disabled
dasdc/dasdc1/trace/act_mask:disabled
dasdc/dasdc1/trace/start_lba:disabled
dasdc/dasdc1/trace/enable:0
dasdc/dasdc1/trace/pid:disabled
dasdc/dasdc1/uevent:MAJOR=94
dasdc/dasdc1/uevent:MINOR=9
dasdc/dasdc1/uevent:DEVNAME=dasdc1
dasdc/dasdc1/uevent:DEVTYPE=partition
dasdc/dasdc1/uevent:PARTN=1
dasdc/dasdc1/discard_alignment:0
dasdc/integrity/write_generate:0
dasdc/integrity/device_is_integrity_capable:0
dasdc/integrity/tag_size:0
dasdc/integrity/read_verify:0
dasdc/integrity/protection_interval_bytes:0
dasdc/integrity/format:none
dasdc/discard_alignment:0
dasdd/range:4
dasdd/capability:10
dasdd/inflight:       0        0
dasdd/ext_range:4
dasdd/power/runtime_suspended_time:0
dasdd/power/runtime_active_time:0
dasdd/power/control:auto
dasdd/power/runtime_status:unsupported
dasdd/dev:94:12
dasdd/hidden:0
dasdd/ro:0
dasdd/mq/7/nr_tags:1024
dasdd/mq/7/nr_reserved_tags:0
dasdd/mq/7/cpu_list:7
dasdd/mq/15/nr_tags:1024
dasdd/mq/15/nr_reserved_tags:0
dasdd/mq/15/cpu_list:15
dasdd/mq/5/nr_tags:1024
dasdd/mq/5/nr_reserved_tags:0
dasdd/mq/5/cpu_list:5
dasdd/mq/13/nr_tags:1024
dasdd/mq/13/nr_reserved_tags:0
dasdd/mq/13/cpu_list:13
dasdd/mq/3/nr_tags:1024
dasdd/mq/3/nr_reserved_tags:0
dasdd/mq/3/cpu_list:3
dasdd/mq/11/nr_tags:1024
dasdd/mq/11/nr_reserved_tags:0
dasdd/mq/11/cpu_list:11
dasdd/mq/1/nr_tags:1024
dasdd/mq/1/nr_reserved_tags:0
dasdd/mq/1/cpu_list:1
dasdd/mq/8/nr_tags:1024
dasdd/mq/8/nr_reserved_tags:0
dasdd/mq/8/cpu_list:8
dasdd/mq/6/nr_tags:1024
dasdd/mq/6/nr_reserved_tags:0
dasdd/mq/6/cpu_list:6
dasdd/mq/14/nr_tags:1024
dasdd/mq/14/nr_reserved_tags:0
dasdd/mq/14/cpu_list:14
dasdd/mq/4/nr_tags:1024
dasdd/mq/4/nr_reserved_tags:0
dasdd/mq/4/cpu_list:4
dasdd/mq/12/nr_tags:1024
dasdd/mq/12/nr_reserved_tags:0
dasdd/mq/12/cpu_list:12
dasdd/mq/2/nr_tags:1024
dasdd/mq/2/nr_reserved_tags:0
dasdd/mq/2/cpu_list:2
dasdd/mq/10/nr_tags:1024
dasdd/mq/10/nr_reserved_tags:0
dasdd/mq/10/cpu_list:10
dasdd/mq/0/nr_tags:1024
dasdd/mq/0/nr_reserved_tags:0
dasdd/mq/0/cpu_list:0, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281
dasdd/mq/9/nr_tags:1024
dasdd/mq/9/nr_reserved_tags:0
dasdd/mq/9/cpu_list:9
dasdd/stat:    6591     1270   412856     4720    24963     7143   743920    30530        0     3670     8400
dasdd/removable:0
dasdd/size:43273440
dasdd/alignment_offset:0
dasdd/queue/hw_sector_size:4096
dasdd/queue/max_discard_segments:1
dasdd/queue/max_segment_size:4096
dasdd/queue/physical_block_size:4096
dasdd/queue/discard_max_bytes:0
dasdd/queue/rotational:0
dasdd/queue/iosched/fifo_batch:16
dasdd/queue/iosched/read_expire:500
dasdd/queue/iosched/writes_starved:2
dasdd/queue/iosched/write_expire:5000
dasdd/queue/iosched/front_merges:1
dasdd/queue/write_same_max_bytes:0
dasdd/queue/zoned:none
dasdd/queue/max_sectors_kb:760
dasdd/queue/discard_zeroes_data:0
dasdd/queue/read_ahead_kb:128
dasdd/queue/discard_max_hw_bytes:0
dasdd/queue/wbt_lat_usec:75000
dasdd/queue/nomerges:0
dasdd/queue/max_segments:65535
dasdd/queue/rq_affinity:1
dasdd/queue/iostats:1
dasdd/queue/dax:0
dasdd/queue/minimum_io_size:4096
dasdd/queue/chunk_sectors:0
dasdd/queue/io_poll:1
dasdd/queue/write_zeroes_max_bytes:0
dasdd/queue/max_hw_sectors_kb:760
dasdd/queue/add_random:0
dasdd/queue/optimal_io_size:0
dasdd/queue/nr_requests:256
dasdd/queue/scheduler:[mq-deadline] kyber none
dasdd/queue/discard_granularity:0
dasdd/queue/logical_block_size:4096
dasdd/queue/io_poll_delay:-1
dasdd/queue/max_integrity_segments:0
dasdd/queue/write_cache:write through
dasdd/trace/end_lba:disabled
dasdd/trace/act_mask:disabled
dasdd/trace/start_lba:disabled
dasdd/trace/enable:0
dasdd/trace/pid:disabled
dasdd/uevent:MAJOR=94
dasdd/uevent:MINOR=12
dasdd/uevent:DEVNAME=dasdd
dasdd/uevent:DEVTYPE=disk
dasdd/dasdd1/start:192
dasdd/dasdd1/inflight:       0        0
dasdd/dasdd1/power/runtime_suspended_time:0
dasdd/dasdd1/power/runtime_active_time:0
dasdd/dasdd1/power/control:auto
dasdd/dasdd1/power/runtime_status:unsupported
dasdd/dasdd1/dev:94:13
dasdd/dasdd1/ro:0
dasdd/dasdd1/partition:1
dasdd/dasdd1/stat:    6580     1270   412344     4720    24963     7143   743920    30530        0     3670     8400
dasdd/dasdd1/size:43273248
dasdd/dasdd1/alignment_offset:0
dasdd/dasdd1/trace/end_lba:disabled
dasdd/dasdd1/trace/act_mask:disabled
dasdd/dasdd1/trace/start_lba:disabled
dasdd/dasdd1/trace/enable:0
dasdd/dasdd1/trace/pid:disabled
dasdd/dasdd1/uevent:MAJOR=94
dasdd/dasdd1/uevent:MINOR=13
dasdd/dasdd1/uevent:DEVNAME=dasdd1
dasdd/dasdd1/uevent:DEVTYPE=partition
dasdd/dasdd1/uevent:PARTN=1
dasdd/dasdd1/discard_alignment:0
dasdd/integrity/write_generate:0
dasdd/integrity/device_is_integrity_capable:0
dasdd/integrity/tag_size:0
dasdd/integrity/read_verify:0
dasdd/integrity/protection_interval_bytes:0
dasdd/integrity/format:none
dasdd/discard_alignment:0


* Re: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues()
  2018-03-28 14:53       ` Jens Axboe
@ 2018-03-28 15:38         ` Christian Borntraeger
  0 siblings, 0 replies; 40+ messages in thread
From: Christian Borntraeger @ 2018-03-28 15:38 UTC (permalink / raw)
  To: Jens Axboe, Ming Lei
  Cc: linux-block, Christoph Hellwig, Stefan Haberland, Christoph Hellwig

With that patch I now get:

[   40.620619] virbr0: port 1(virbr0-nic) entered disabled state
[   47.418592] run queue from wrong CPU 3, hctx inactive
[   47.418602] CPU: 3 PID: 2153 Comm: kworker/3:1H Tainted: G        W        4.16.0-rc7+ #27
[   47.418604] Hardware name: IBM 2964 NC9 704 (LPAR)
[   47.418613] Workqueue: kblockd blk_mq_run_work_fn
[   47.418615] Call Trace:
[   47.418621] ([<0000000000113b86>] show_stack+0x56/0x80)
[   47.418626]  [<0000000000a5cd9a>] dump_stack+0x82/0xb0 
[   47.418627]  [<000000000069c4be>] __blk_mq_run_hw_queue+0x136/0x160 
[   47.418631]  [<0000000000163906>] process_one_work+0x1be/0x420 
[   47.418633]  [<0000000000163bc0>] worker_thread+0x58/0x458 
[   47.418635]  [<000000000016a9d0>] kthread+0x148/0x160 
[   47.418639]  [<0000000000a7bf3a>] kernel_thread_starter+0x6/0xc 
[   47.418640]  [<0000000000a7bf34>] kernel_thread_starter+0x0/0xc 
[   77.670407] run queue from wrong CPU 4, hctx inactive
[   77.670416] CPU: 4 PID: 2155 Comm: kworker/4:1H Tainted: G        W        4.16.0-rc7+ #27
[   77.670418] Hardware name: IBM 2964 NC9 704 (LPAR)
[   77.670428] Workqueue: kblockd blk_mq_run_work_fn
[   77.670430] Call Trace:
[   77.670436] ([<0000000000113b86>] show_stack+0x56/0x80)
[   77.670441]  [<0000000000a5cd9a>] dump_stack+0x82/0xb0 
[   77.670442]  [<000000000069c4be>] __blk_mq_run_hw_queue+0x136/0x160 
[   77.670446]  [<0000000000163906>] process_one_work+0x1be/0x420 
[   77.670448]  [<0000000000163bc0>] worker_thread+0x58/0x458 
[   77.670450]  [<000000000016a9d0>] kthread+0x148/0x160 
[   77.670454]  [<0000000000a7bf3a>] kernel_thread_starter+0x6/0xc 
[   77.670455]  [<0000000000a7bf34>] kernel_thread_starter+0x0/0xc 
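
For orientation when reading the log above: the "run queue from wrong CPU
..., hctx inactive" message comes from a sanity check at the top of
__blk_mq_run_hw_queue(). A paraphrased, simplified sketch of that check in
the 4.16-rc code (approximate, not the authoritative source); "inactive"
means hctx->cpumask is empty at that point:

	/* Simplified sketch of the check in __blk_mq_run_hw_queue(). */
	if (!cpumask_test_cpu(raw_smp_processor_id(), hctx->cpumask)) {
		printk(KERN_WARNING "run queue from wrong CPU %d, hctx %s\n",
		       raw_smp_processor_id(),
		       cpumask_empty(hctx->cpumask) ? "inactive" : "active");
		dump_stack();
	}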



On 03/28/2018 04:53 PM, Jens Axboe wrote:
> On 3/28/18 8:38 AM, Jens Axboe wrote:
>> On 3/28/18 1:45 AM, Christian Borntraeger wrote:
>>> FWIW, this patch does not fix the issue for me:
>>
>> Looks like I didn't do the delayed path. How about the below?
> 
> OK, final version... This is more in line with what I originally
> suggested.
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 16e83e6df404..c90016c36a70 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -1306,6 +1306,10 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
>  {
>  	int srcu_idx;
> 
> +	if (!blk_mq_hw_queue_mapped(hctx) &&
> +	    !WARN_ON_ONCE(blk_mq_hctx_has_pending(hctx)))
> +		return;
> +
>  	/*
>  	 * We should be running this queue from one of the CPUs that
>  	 * are mapped to it.
> @@ -1399,9 +1403,6 @@ static int blk_mq_hctx_next_cpu(struct blk_mq_hw_ctx *hctx)
>  static void __blk_mq_delay_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async,
>  					unsigned long msecs)
>  {
> -	if (WARN_ON_ONCE(!blk_mq_hw_queue_mapped(hctx)))
> -		return;
> -
>  	if (unlikely(blk_mq_hctx_stopped(hctx)))
>  		return;
> 
> @@ -1586,9 +1587,6 @@ static void blk_mq_run_work_fn(struct work_struct *work)
> 
>  void blk_mq_delay_queue(struct blk_mq_hw_ctx *hctx, unsigned long msecs)
>  {
> -	if (WARN_ON_ONCE(!blk_mq_hw_queue_mapped(hctx)))
> -		return;
> -
>  	/*
>  	 * Stop the hw queue, then modify currently delayed work.
>  	 * This should prevent us from running the queue prematurely.
> 


* Re: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues()
  2018-03-28 15:36       ` Christian Borntraeger
@ 2018-03-28 15:44         ` Christian Borntraeger
  2018-03-29  2:00         ` Ming Lei
  1 sibling, 0 replies; 40+ messages in thread
From: Christian Borntraeger @ 2018-03-28 15:44 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Stefan Haberland,
	Christoph Hellwig

FWIW, these logs were from a different system (with fewer disks and CPUs).
The related log is:

[    4.114191] dasd-eckd.2aa01a: 0.0.3f77: New DASD 3390/0C (CU 3990/01) with 30051 cylinders, 15 heads, 224 sectors
[    4.114852] dasd-eckd.2aa01a: 0.0.3f74: New DASD 3390/0C (CU 3990/01) with 30051 cylinders, 15 heads, 224 sectors
[    4.122361] dasd-eckd.412b53: 0.0.3f77: DASD with 4 KB/block, 21636720 KB total size, 48 KB/track, compatible disk layout
[    4.122811] dasd-eckd.412b53: 0.0.3f74: DASD with 4 KB/block, 21636720 KB total size, 48 KB/track, compatible disk layout
[    4.123568]  dasdc:VOL1/  0X3F77: dasdc1
[    4.124092]  dasdd:VOL1/  0X3F74: dasdd1
[    4.286220] WARNING: CPU: 1 PID: 1262 at block/blk-mq.c:1402 __blk_mq_delay_run_hw_queue+0xbe/0xd8
[    4.286225] Modules linked in: autofs4
[    4.286231] CPU: 1 PID: 1262 Comm: dasdconf.sh Not tainted 4.16.0-20180323.rc6.git0.792f5024dd01.300.fc27.s390x #1
[    4.286232] Hardware name: IBM 2964 NC9 704 (LPAR)
[    4.286236] Krnl PSW : 0000000053ccfc28 00000000c4b59c51 (__blk_mq_delay_run_hw_queue+0xbe/0xd8)
[    4.286239]            R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
[    4.286242] Krnl GPRS: 00000003da4eb000 0000000300000000 00000003dae67000 0000000000000001
[    4.286243]            0000000000000000 00000003da4eb710 0000000300000000 000000010dbafd98
[    4.286245]            000000010dbafd98 0000000000000001 0000000000000001 0000000000000000
[    4.286248]            00000003dae67000 0000000000aaa750 000000010dbafc00 000000010dbafbc8
[    4.286256] Krnl Code: 0000000000698e46: ebaff0a00004        lmg     %r10,%r15,160(%r15)
                          0000000000698e4c: c0f4ffff7aca        brcl    15,6883e0
                         #0000000000698e52: a7f40001            brc     15,698e54
                         >0000000000698e56: e340f0c00004        lg      %r4,192(%r15)
                          0000000000698e5c: ebaff0a00004        lmg     %r10,%r15,160(%r15)
                          0000000000698e62: 07f4                bcr     15,%r4
                          0000000000698e64: c0e5ffffff02        brasl   %r14,698c68
                          0000000000698e6a: a7f4fff6            brc     15,698e56
[    4.286301] Call Trace:
[    4.286304] ([<000000010dbafc08>] 0x10dbafc08)
[    4.286306]  [<0000000000698f5a>] blk_mq_run_hw_queue+0x82/0x180 
[    4.286308]  [<00000000006990c0>] blk_mq_run_hw_queues+0x68/0x88 
[    4.286310]  [<00000000006982de>] __blk_mq_complete_request+0x11e/0x1d8 
[    4.286313]  [<0000000000698424>] blk_mq_complete_request+0x8c/0xc8 
[    4.286319]  [<000000000082c5d0>] dasd_block_tasklet+0x158/0x490 
[    4.286325]  [<000000000014bc9a>] tasklet_hi_action+0x92/0x120 
[    4.286329]  [<00000000009feeb0>] __do_softirq+0x120/0x348 
[    4.286331]  [<000000000014b76a>] irq_exit+0xba/0xd0 
[    4.286335]  [<000000000010bf92>] do_IRQ+0x8a/0xb8 
[    4.286337]  [<00000000009fe180>] io_int_handler+0x130/0x298 
[    4.286338] Last Breaking-Event-Address:
[    4.286340]  [<0000000000698e52>] __blk_mq_delay_run_hw_queue+0xba/0xd8
[    4.286342] ---[ end trace 0d746eb6f9348354 ]---

On 03/28/2018 05:36 PM, Christian Borntraeger wrote:
> 
> 
> On 03/28/2018 05:26 PM, Ming Lei wrote:
>> Hi Christian,
>>
>> On Wed, Mar 28, 2018 at 09:45:10AM +0200, Christian Borntraeger wrote:
>>> FWIW, this patch does not fix the issue for me:
>>>
>>> ostname=? addr=? terminal=? res=success'
>>> [   21.454961] WARNING: CPU: 3 PID: 1882 at block/blk-mq.c:1410 __blk_mq_delay_run_hw_queue+0xbe/0xd8
>>> [   21.454968] Modules linked in: scsi_dh_rdac scsi_dh_emc scsi_dh_alua dm_mirror dm_region_hash dm_log dm_multipath dm_mod autofs4
>>> [   21.454984] CPU: 3 PID: 1882 Comm: dasdconf.sh Not tainted 4.16.0-rc7+ #26
>>> [   21.454987] Hardware name: IBM 2964 NC9 704 (LPAR)
>>> [   21.454990] Krnl PSW : 00000000c0131ea3 000000003ea2f7bf (__blk_mq_delay_run_hw_queue+0xbe/0xd8)
>>> [   21.454996]            R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
>>> [   21.455005] Krnl GPRS: 0000013abb69a000 0000013a00000000 0000013ac6c0dc00 0000000000000001
>>> [   21.455008]            0000000000000000 0000013abb69a710 0000013a00000000 00000001b691fd98
>>> [   21.455011]            00000001b691fd98 0000013ace4775c8 0000000000000001 0000000000000000
>>> [   21.455014]            0000013ac6c0dc00 0000000000b47238 00000001b691fc08 00000001b691fbd0
>>> [   21.455032] Krnl Code: 000000000069c596: ebaff0a00004	lmg	%r10,%r15,160(%r15)
>>>                           000000000069c59c: c0f4ffff7a5e	brcl	15,68ba58
>>>                          #000000000069c5a2: a7f40001		brc	15,69c5a4
>>>                          >000000000069c5a6: e340f0c00004	lg	%r4,192(%r15)
>>>                           000000000069c5ac: ebaff0a00004	lmg	%r10,%r15,160(%r15)
>>>                           000000000069c5b2: 07f4		bcr	15,%r4
>>>                           000000000069c5b4: c0e5fffffeea	brasl	%r14,69c388
>>>                           000000000069c5ba: a7f4fff6		brc	15,69c5a6
>>> [   21.455067] Call Trace:
>>> [   21.455072] ([<00000001b691fd98>] 0x1b691fd98)
>>> [   21.455079]  [<000000000069c692>] blk_mq_run_hw_queue+0xba/0x100 
>>> [   21.455083]  [<000000000069c740>] blk_mq_run_hw_queues+0x68/0x88 
>>> [   21.455089]  [<000000000069b956>] __blk_mq_complete_request+0x11e/0x1d8 
>>> [   21.455091]  [<000000000069ba9c>] blk_mq_complete_request+0x8c/0xc8 
>>> [   21.455103]  [<00000000008aa250>] dasd_block_tasklet+0x158/0x490 
>>> [   21.455110]  [<000000000014c742>] tasklet_hi_action+0x92/0x120 
>>> [   21.455118]  [<0000000000a7cfc0>] __do_softirq+0x120/0x348 
>>> [   21.455122]  [<000000000014c212>] irq_exit+0xba/0xd0 
>>> [   21.455130]  [<000000000010bf92>] do_IRQ+0x8a/0xb8 
>>> [   21.455133]  [<0000000000a7c298>] io_int_handler+0x130/0x298 
>>> [   21.455136] Last Breaking-Event-Address:
>>> [   21.455138]  [<000000000069c5a2>] __blk_mq_delay_run_hw_queue+0xba/0xd8
>>> [   21.455140] ---[ end trace be43f99a5d1e553e ]---
>>> [   21.510046] dasdconf.sh Warning: 0.0.241e is already online, not configuring
>>
>> Thinking about this further, I still can't see the root cause of this
>> issue.
>>
>> After commit 20e4d813931961fe ("blk-mq: simplify queue mapping & schedule with
>> each possisble CPU"), each hw queue should be mapped to at least one CPU, so
>> this issue shouldn't happen. Maybe blk_mq_map_queues() is doing the wrong thing?
>>
>> Could you dump 'lscpu' and provide blk-mq debugfs for your DASD via the
>> following command?
> 
> # lscpu
> Architecture:        s390x
> CPU op-mode(s):      32-bit, 64-bit
> Byte Order:          Big Endian
> CPU(s):              16
> On-line CPU(s) list: 0-15
> Thread(s) per core:  2
> Core(s) per socket:  8
> Socket(s) per book:  3
> Book(s) per drawer:  2
> Drawer(s):           4
> NUMA node(s):        1
> Vendor ID:           IBM/S390
> Machine type:        2964
> CPU dynamic MHz:     5000
> CPU static MHz:      5000
> BogoMIPS:            20325.00
> Hypervisor:          PR/SM
> Hypervisor vendor:   IBM
> Virtualization type: full
> Dispatching mode:    horizontal
> L1d cache:           128K
> L1i cache:           96K
> L2d cache:           2048K
> L2i cache:           2048K
> L3 cache:            65536K
> L4 cache:            491520K
> NUMA node0 CPU(s):   0-15
> Flags:               esan3 zarch stfle msa ldisp eimm dfp edat etf3eh highgprs te vx sie
> 
> # lsdasd 
> Bus-ID     Status      Name      Device  Type  BlkSz  Size      Blocks
> ==============================================================================
> 0.0.3f75   active      dasda     94:0    ECKD  4096   21129MB   5409180
> 0.0.3f76   active      dasdb     94:4    ECKD  4096   21129MB   5409180
> 0.0.3f77   active      dasdc     94:8    ECKD  4096   21129MB   5409180
> 0.0.3f74   active      dasdd     94:12   ECKD  4096   21129MB   5409180
> 
>>
>> (cd /sys/kernel/debug/block/$DASD && find . -type f -exec grep -aH . {} \;)
> 
> 
> see attachment:
> 


* Re: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues()
  2018-03-28 15:36       ` Christian Borntraeger
  2018-03-28 15:44         ` Christian Borntraeger
@ 2018-03-29  2:00         ` Ming Lei
  2018-03-29  7:23           ` Christian Borntraeger
  1 sibling, 1 reply; 40+ messages in thread
From: Ming Lei @ 2018-03-29  2:00 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Stefan Haberland,
	Christoph Hellwig

On Wed, Mar 28, 2018 at 05:36:53PM +0200, Christian Borntraeger wrote:
> 
> 
> On 03/28/2018 05:26 PM, Ming Lei wrote:
> > Hi Christian,
> > 
> > On Wed, Mar 28, 2018 at 09:45:10AM +0200, Christian Borntraeger wrote:
> >> FWIW, this patch does not fix the issue for me:
> >>
> >> ostname=? addr=? terminal=? res=success'
> >> [   21.454961] WARNING: CPU: 3 PID: 1882 at block/blk-mq.c:1410 __blk_mq_delay_run_hw_queue+0xbe/0xd8
> >> [   21.454968] Modules linked in: scsi_dh_rdac scsi_dh_emc scsi_dh_alua dm_mirror dm_region_hash dm_log dm_multipath dm_mod autofs4
> >> [   21.454984] CPU: 3 PID: 1882 Comm: dasdconf.sh Not tainted 4.16.0-rc7+ #26
> >> [   21.454987] Hardware name: IBM 2964 NC9 704 (LPAR)
> >> [   21.454990] Krnl PSW : 00000000c0131ea3 000000003ea2f7bf (__blk_mq_delay_run_hw_queue+0xbe/0xd8)
> >> [   21.454996]            R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
> >> [   21.455005] Krnl GPRS: 0000013abb69a000 0000013a00000000 0000013ac6c0dc00 0000000000000001
> >> [   21.455008]            0000000000000000 0000013abb69a710 0000013a00000000 00000001b691fd98
> >> [   21.455011]            00000001b691fd98 0000013ace4775c8 0000000000000001 0000000000000000
> >> [   21.455014]            0000013ac6c0dc00 0000000000b47238 00000001b691fc08 00000001b691fbd0
> >> [   21.455032] Krnl Code: 000000000069c596: ebaff0a00004	lmg	%r10,%r15,160(%r15)
> >>                           000000000069c59c: c0f4ffff7a5e	brcl	15,68ba58
> >>                          #000000000069c5a2: a7f40001		brc	15,69c5a4
> >>                          >000000000069c5a6: e340f0c00004	lg	%r4,192(%r15)
> >>                           000000000069c5ac: ebaff0a00004	lmg	%r10,%r15,160(%r15)
> >>                           000000000069c5b2: 07f4		bcr	15,%r4
> >>                           000000000069c5b4: c0e5fffffeea	brasl	%r14,69c388
> >>                           000000000069c5ba: a7f4fff6		brc	15,69c5a6
> >> [   21.455067] Call Trace:
> >> [   21.455072] ([<00000001b691fd98>] 0x1b691fd98)
> >> [   21.455079]  [<000000000069c692>] blk_mq_run_hw_queue+0xba/0x100 
> >> [   21.455083]  [<000000000069c740>] blk_mq_run_hw_queues+0x68/0x88 
> >> [   21.455089]  [<000000000069b956>] __blk_mq_complete_request+0x11e/0x1d8 
> >> [   21.455091]  [<000000000069ba9c>] blk_mq_complete_request+0x8c/0xc8 
> >> [   21.455103]  [<00000000008aa250>] dasd_block_tasklet+0x158/0x490 
> >> [   21.455110]  [<000000000014c742>] tasklet_hi_action+0x92/0x120 
> >> [   21.455118]  [<0000000000a7cfc0>] __do_softirq+0x120/0x348 
> >> [   21.455122]  [<000000000014c212>] irq_exit+0xba/0xd0 
> >> [   21.455130]  [<000000000010bf92>] do_IRQ+0x8a/0xb8 
> >> [   21.455133]  [<0000000000a7c298>] io_int_handler+0x130/0x298 
> >> [   21.455136] Last Breaking-Event-Address:
> >> [   21.455138]  [<000000000069c5a2>] __blk_mq_delay_run_hw_queue+0xba/0xd8
> >> [   21.455140] ---[ end trace be43f99a5d1e553e ]---
> >> [   21.510046] dasdconf.sh Warning: 0.0.241e is already online, not configuring
> > 
> > Thinking about this issue further, I can't understand the root cause for
> > this issue.
> > 
> > After commit 20e4d813931961fe ("blk-mq: simplify queue mapping & schedule with
> > each possisble CPU"), each hw queue should be mapped to at least one CPU, that
> > means this issue shouldn't happen. Maybe blk_mq_map_queues() works wrong?
> > 
> > Could you dump 'lscpu' and provide blk-mq debugfs for your DASD via the
> > following command?
> 
> # lscpu
> Architecture:        s390x
> CPU op-mode(s):      32-bit, 64-bit
> Byte Order:          Big Endian
> CPU(s):              16
> On-line CPU(s) list: 0-15
> Thread(s) per core:  2
> Core(s) per socket:  8
> Socket(s) per book:  3
> Book(s) per drawer:  2
> Drawer(s):           4
> NUMA node(s):        1
> Vendor ID:           IBM/S390
> Machine type:        2964
> CPU dynamic MHz:     5000
> CPU static MHz:      5000
> BogoMIPS:            20325.00
> Hypervisor:          PR/SM
> Hypervisor vendor:   IBM
> Virtualization type: full
> Dispatching mode:    horizontal
> L1d cache:           128K
> L1i cache:           96K
> L2d cache:           2048K
> L2i cache:           2048K
> L3 cache:            65536K
> L4 cache:            491520K
> NUMA node0 CPU(s):   0-15
> Flags:               esan3 zarch stfle msa ldisp eimm dfp edat etf3eh highgprs te vx sie
> 
> # lsdasd 
> Bus-ID     Status      Name      Device  Type  BlkSz  Size      Blocks
> ==============================================================================
> 0.0.3f75   active      dasda     94:0    ECKD  4096   21129MB   5409180
> 0.0.3f76   active      dasdb     94:4    ECKD  4096   21129MB   5409180
> 0.0.3f77   active      dasdc     94:8    ECKD  4096   21129MB   5409180
> 0.0.3f74   active      dasdd     94:12   ECKD  4096   21129MB   5409180

I have tried to emulate your CPU topo via VM and the blk-mq mapping of
null_blk is basically similar to your DASD mapping, but I still can't
reproduce your issue.

BTW, do you need to do cpu hotplug or other actions for triggering this warning?

> 
> > 
> > (cd /sys/kernel/debug/block/$DASD && find . -type f -exec grep -aH . {} \;)
> 
> 
> see attachement:

> dasda/range:4
> dasda/capability:10
> dasda/inflight:       0        0
> dasda/ext_range:4
> dasda/power/runtime_suspended_time:0
> dasda/power/runtime_active_time:0
> dasda/power/control:auto
> dasda/power/runtime_status:unsupported
> dasda/dev:94:0

No, it is from sysfs instead of debugfs, so please run the following
command:

	(cd /sys/kernel/debug/block/$DASD && find . -type f -exec grep -aH . {} \;)

Thanks,
Ming

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues()
  2018-03-29  2:00         ` Ming Lei
@ 2018-03-29  7:23           ` Christian Borntraeger
  2018-03-29  9:09             ` Christian Borntraeger
  2018-03-29  9:52             ` Ming Lei
  0 siblings, 2 replies; 40+ messages in thread
From: Christian Borntraeger @ 2018-03-29  7:23 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Stefan Haberland,
	Christoph Hellwig

[-- Attachment #1: Type: text/plain, Size: 5626 bytes --]



On 03/29/2018 04:00 AM, Ming Lei wrote:
> On Wed, Mar 28, 2018 at 05:36:53PM +0200, Christian Borntraeger wrote:
>>
>>
>> On 03/28/2018 05:26 PM, Ming Lei wrote:
>>> Hi Christian,
>>>
>>> On Wed, Mar 28, 2018 at 09:45:10AM +0200, Christian Borntraeger wrote:
>>>> FWIW, this patch does not fix the issue for me:
>>>>
>>>> ostname=? addr=? terminal=? res=success'
>>>> [   21.454961] WARNING: CPU: 3 PID: 1882 at block/blk-mq.c:1410 __blk_mq_delay_run_hw_queue+0xbe/0xd8
>>>> [   21.454968] Modules linked in: scsi_dh_rdac scsi_dh_emc scsi_dh_alua dm_mirror dm_region_hash dm_log dm_multipath dm_mod autofs4
>>>> [   21.454984] CPU: 3 PID: 1882 Comm: dasdconf.sh Not tainted 4.16.0-rc7+ #26
>>>> [   21.454987] Hardware name: IBM 2964 NC9 704 (LPAR)
>>>> [   21.454990] Krnl PSW : 00000000c0131ea3 000000003ea2f7bf (__blk_mq_delay_run_hw_queue+0xbe/0xd8)
>>>> [   21.454996]            R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
>>>> [   21.455005] Krnl GPRS: 0000013abb69a000 0000013a00000000 0000013ac6c0dc00 0000000000000001
>>>> [   21.455008]            0000000000000000 0000013abb69a710 0000013a00000000 00000001b691fd98
>>>> [   21.455011]            00000001b691fd98 0000013ace4775c8 0000000000000001 0000000000000000
>>>> [   21.455014]            0000013ac6c0dc00 0000000000b47238 00000001b691fc08 00000001b691fbd0
>>>> [   21.455032] Krnl Code: 000000000069c596: ebaff0a00004	lmg	%r10,%r15,160(%r15)
>>>>                           000000000069c59c: c0f4ffff7a5e	brcl	15,68ba58
>>>>                          #000000000069c5a2: a7f40001		brc	15,69c5a4
>>>>                          >000000000069c5a6: e340f0c00004	lg	%r4,192(%r15)
>>>>                           000000000069c5ac: ebaff0a00004	lmg	%r10,%r15,160(%r15)
>>>>                           000000000069c5b2: 07f4		bcr	15,%r4
>>>>                           000000000069c5b4: c0e5fffffeea	brasl	%r14,69c388
>>>>                           000000000069c5ba: a7f4fff6		brc	15,69c5a6
>>>> [   21.455067] Call Trace:
>>>> [   21.455072] ([<00000001b691fd98>] 0x1b691fd98)
>>>> [   21.455079]  [<000000000069c692>] blk_mq_run_hw_queue+0xba/0x100 
>>>> [   21.455083]  [<000000000069c740>] blk_mq_run_hw_queues+0x68/0x88 
>>>> [   21.455089]  [<000000000069b956>] __blk_mq_complete_request+0x11e/0x1d8 
>>>> [   21.455091]  [<000000000069ba9c>] blk_mq_complete_request+0x8c/0xc8 
>>>> [   21.455103]  [<00000000008aa250>] dasd_block_tasklet+0x158/0x490 
>>>> [   21.455110]  [<000000000014c742>] tasklet_hi_action+0x92/0x120 
>>>> [   21.455118]  [<0000000000a7cfc0>] __do_softirq+0x120/0x348 
>>>> [   21.455122]  [<000000000014c212>] irq_exit+0xba/0xd0 
>>>> [   21.455130]  [<000000000010bf92>] do_IRQ+0x8a/0xb8 
>>>> [   21.455133]  [<0000000000a7c298>] io_int_handler+0x130/0x298 
>>>> [   21.455136] Last Breaking-Event-Address:
>>>> [   21.455138]  [<000000000069c5a2>] __blk_mq_delay_run_hw_queue+0xba/0xd8
>>>> [   21.455140] ---[ end trace be43f99a5d1e553e ]---
>>>> [   21.510046] dasdconf.sh Warning: 0.0.241e is already online, not configuring
>>>
>>> Thinking about this issue further, I can't understand the root cause for
>>> this issue.
>>>
>>> After commit 20e4d813931961fe ("blk-mq: simplify queue mapping & schedule with
>>> each possisble CPU"), each hw queue should be mapped to at least one CPU, that
>>> means this issue shouldn't happen. Maybe blk_mq_map_queues() works wrong?
>>>
>>> Could you dump 'lscpu' and provide blk-mq debugfs for your DASD via the
>>> following command?
>>
>> # lscpu
>> Architecture:        s390x
>> CPU op-mode(s):      32-bit, 64-bit
>> Byte Order:          Big Endian
>> CPU(s):              16
>> On-line CPU(s) list: 0-15
>> Thread(s) per core:  2
>> Core(s) per socket:  8
>> Socket(s) per book:  3
>> Book(s) per drawer:  2
>> Drawer(s):           4
>> NUMA node(s):        1
>> Vendor ID:           IBM/S390
>> Machine type:        2964
>> CPU dynamic MHz:     5000
>> CPU static MHz:      5000
>> BogoMIPS:            20325.00
>> Hypervisor:          PR/SM
>> Hypervisor vendor:   IBM
>> Virtualization type: full
>> Dispatching mode:    horizontal
>> L1d cache:           128K
>> L1i cache:           96K
>> L2d cache:           2048K
>> L2i cache:           2048K
>> L3 cache:            65536K
>> L4 cache:            491520K
>> NUMA node0 CPU(s):   0-15
>> Flags:               esan3 zarch stfle msa ldisp eimm dfp edat etf3eh highgprs te vx sie
>>
>> # lsdasd 
>> Bus-ID     Status      Name      Device  Type  BlkSz  Size      Blocks
>> ==============================================================================
>> 0.0.3f75   active      dasda     94:0    ECKD  4096   21129MB   5409180
>> 0.0.3f76   active      dasdb     94:4    ECKD  4096   21129MB   5409180
>> 0.0.3f77   active      dasdc     94:8    ECKD  4096   21129MB   5409180
>> 0.0.3f74   active      dasdd     94:12   ECKD  4096   21129MB   5409180
> 
> I have tried to emulate your CPU topo via VM and the blk-mq mapping of
> null_blk is basically similar to your DASD mapping, but I still can't
> reproduce your issue.
> 
> BTW, do you need to do cpu hotplug or other actions for triggering this warning?

No, without hotplug.
> 
>>
>>>
>>> (cd /sys/kernel/debug/block/$DASD && find . -type f -exec grep -aH . {} \;)
>>
>>
>> see attachement:
> 
>> dasda/range:4
>> dasda/capability:10
>> dasda/inflight:       0        0
>> dasda/ext_range:4
>> dasda/power/runtime_suspended_time:0
>> dasda/power/runtime_active_time:0
>> dasda/power/control:auto
>> dasda/power/runtime_status:unsupported
>> dasda/dev:94:0
> 
> No, it is from sysfs instead of debugfs, so please run the following

Eeks. Yes sorry. New version.

[-- Attachment #2: log.gz --]
[-- Type: application/gzip, Size: 204740 bytes --]

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues()
  2018-03-29  7:23           ` Christian Borntraeger
@ 2018-03-29  9:09             ` Christian Borntraeger
  2018-03-29  9:40               ` Ming Lei
  2018-03-29  9:52             ` Ming Lei
  1 sibling, 1 reply; 40+ messages in thread
From: Christian Borntraeger @ 2018-03-29  9:09 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Stefan Haberland,
	Christoph Hellwig



On 03/29/2018 09:23 AM, Christian Borntraeger wrote:
> 
> 
> On 03/29/2018 04:00 AM, Ming Lei wrote:
>> On Wed, Mar 28, 2018 at 05:36:53PM +0200, Christian Borntraeger wrote:
>>>
>>>
>>> On 03/28/2018 05:26 PM, Ming Lei wrote:
>>>> Hi Christian,
>>>>
>>>> On Wed, Mar 28, 2018 at 09:45:10AM +0200, Christian Borntraeger wrote:
>>>>> FWIW, this patch does not fix the issue for me:
>>>>>
>>>>> ostname=? addr=? terminal=? res=success'
>>>>> [   21.454961] WARNING: CPU: 3 PID: 1882 at block/blk-mq.c:1410 __blk_mq_delay_run_hw_queue+0xbe/0xd8
>>>>> [   21.454968] Modules linked in: scsi_dh_rdac scsi_dh_emc scsi_dh_alua dm_mirror dm_region_hash dm_log dm_multipath dm_mod autofs4
>>>>> [   21.454984] CPU: 3 PID: 1882 Comm: dasdconf.sh Not tainted 4.16.0-rc7+ #26
>>>>> [   21.454987] Hardware name: IBM 2964 NC9 704 (LPAR)
>>>>> [   21.454990] Krnl PSW : 00000000c0131ea3 000000003ea2f7bf (__blk_mq_delay_run_hw_queue+0xbe/0xd8)
>>>>> [   21.454996]            R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
>>>>> [   21.455005] Krnl GPRS: 0000013abb69a000 0000013a00000000 0000013ac6c0dc00 0000000000000001
>>>>> [   21.455008]            0000000000000000 0000013abb69a710 0000013a00000000 00000001b691fd98
>>>>> [   21.455011]            00000001b691fd98 0000013ace4775c8 0000000000000001 0000000000000000
>>>>> [   21.455014]            0000013ac6c0dc00 0000000000b47238 00000001b691fc08 00000001b691fbd0
>>>>> [   21.455032] Krnl Code: 000000000069c596: ebaff0a00004	lmg	%r10,%r15,160(%r15)
>>>>>                           000000000069c59c: c0f4ffff7a5e	brcl	15,68ba58
>>>>>                          #000000000069c5a2: a7f40001		brc	15,69c5a4
>>>>>                          >000000000069c5a6: e340f0c00004	lg	%r4,192(%r15)
>>>>>                           000000000069c5ac: ebaff0a00004	lmg	%r10,%r15,160(%r15)
>>>>>                           000000000069c5b2: 07f4		bcr	15,%r4
>>>>>                           000000000069c5b4: c0e5fffffeea	brasl	%r14,69c388
>>>>>                           000000000069c5ba: a7f4fff6		brc	15,69c5a6
>>>>> [   21.455067] Call Trace:
>>>>> [   21.455072] ([<00000001b691fd98>] 0x1b691fd98)
>>>>> [   21.455079]  [<000000000069c692>] blk_mq_run_hw_queue+0xba/0x100 
>>>>> [   21.455083]  [<000000000069c740>] blk_mq_run_hw_queues+0x68/0x88 
>>>>> [   21.455089]  [<000000000069b956>] __blk_mq_complete_request+0x11e/0x1d8 
>>>>> [   21.455091]  [<000000000069ba9c>] blk_mq_complete_request+0x8c/0xc8 
>>>>> [   21.455103]  [<00000000008aa250>] dasd_block_tasklet+0x158/0x490 
>>>>> [   21.455110]  [<000000000014c742>] tasklet_hi_action+0x92/0x120 
>>>>> [   21.455118]  [<0000000000a7cfc0>] __do_softirq+0x120/0x348 
>>>>> [   21.455122]  [<000000000014c212>] irq_exit+0xba/0xd0 
>>>>> [   21.455130]  [<000000000010bf92>] do_IRQ+0x8a/0xb8 
>>>>> [   21.455133]  [<0000000000a7c298>] io_int_handler+0x130/0x298 
>>>>> [   21.455136] Last Breaking-Event-Address:
>>>>> [   21.455138]  [<000000000069c5a2>] __blk_mq_delay_run_hw_queue+0xba/0xd8
>>>>> [   21.455140] ---[ end trace be43f99a5d1e553e ]---
>>>>> [   21.510046] dasdconf.sh Warning: 0.0.241e is already online, not configuring
>>>>
>>>> Thinking about this issue further, I can't understand the root cause for
>>>> this issue.

FWIW, Limiting CONFIG_NR_CPUS to 64 seems to make the problem go away.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues()
  2018-03-29  9:09             ` Christian Borntraeger
@ 2018-03-29  9:40               ` Ming Lei
  2018-03-29 10:10                 ` Christian Borntraeger
  0 siblings, 1 reply; 40+ messages in thread
From: Ming Lei @ 2018-03-29  9:40 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Stefan Haberland,
	Christoph Hellwig

On Thu, Mar 29, 2018 at 11:09:08AM +0200, Christian Borntraeger wrote:
> 
> 
> On 03/29/2018 09:23 AM, Christian Borntraeger wrote:
> > 
> > 
> > On 03/29/2018 04:00 AM, Ming Lei wrote:
> >> On Wed, Mar 28, 2018 at 05:36:53PM +0200, Christian Borntraeger wrote:
> >>>
> >>>
> >>> On 03/28/2018 05:26 PM, Ming Lei wrote:
> >>>> Hi Christian,
> >>>>
> >>>> On Wed, Mar 28, 2018 at 09:45:10AM +0200, Christian Borntraeger wrote:
> >>>>> FWIW, this patch does not fix the issue for me:
> >>>>>
> >>>>> ostname=? addr=? terminal=? res=success'
> >>>>> [   21.454961] WARNING: CPU: 3 PID: 1882 at block/blk-mq.c:1410 __blk_mq_delay_run_hw_queue+0xbe/0xd8
> >>>>> [   21.454968] Modules linked in: scsi_dh_rdac scsi_dh_emc scsi_dh_alua dm_mirror dm_region_hash dm_log dm_multipath dm_mod autofs4
> >>>>> [   21.454984] CPU: 3 PID: 1882 Comm: dasdconf.sh Not tainted 4.16.0-rc7+ #26
> >>>>> [   21.454987] Hardware name: IBM 2964 NC9 704 (LPAR)
> >>>>> [   21.454990] Krnl PSW : 00000000c0131ea3 000000003ea2f7bf (__blk_mq_delay_run_hw_queue+0xbe/0xd8)
> >>>>> [   21.454996]            R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
> >>>>> [   21.455005] Krnl GPRS: 0000013abb69a000 0000013a00000000 0000013ac6c0dc00 0000000000000001
> >>>>> [   21.455008]            0000000000000000 0000013abb69a710 0000013a00000000 00000001b691fd98
> >>>>> [   21.455011]            00000001b691fd98 0000013ace4775c8 0000000000000001 0000000000000000
> >>>>> [   21.455014]            0000013ac6c0dc00 0000000000b47238 00000001b691fc08 00000001b691fbd0
> >>>>> [   21.455032] Krnl Code: 000000000069c596: ebaff0a00004	lmg	%r10,%r15,160(%r15)
> >>>>>                           000000000069c59c: c0f4ffff7a5e	brcl	15,68ba58
> >>>>>                          #000000000069c5a2: a7f40001		brc	15,69c5a4
> >>>>>                          >000000000069c5a6: e340f0c00004	lg	%r4,192(%r15)
> >>>>>                           000000000069c5ac: ebaff0a00004	lmg	%r10,%r15,160(%r15)
> >>>>>                           000000000069c5b2: 07f4		bcr	15,%r4
> >>>>>                           000000000069c5b4: c0e5fffffeea	brasl	%r14,69c388
> >>>>>                           000000000069c5ba: a7f4fff6		brc	15,69c5a6
> >>>>> [   21.455067] Call Trace:
> >>>>> [   21.455072] ([<00000001b691fd98>] 0x1b691fd98)
> >>>>> [   21.455079]  [<000000000069c692>] blk_mq_run_hw_queue+0xba/0x100 
> >>>>> [   21.455083]  [<000000000069c740>] blk_mq_run_hw_queues+0x68/0x88 
> >>>>> [   21.455089]  [<000000000069b956>] __blk_mq_complete_request+0x11e/0x1d8 
> >>>>> [   21.455091]  [<000000000069ba9c>] blk_mq_complete_request+0x8c/0xc8 
> >>>>> [   21.455103]  [<00000000008aa250>] dasd_block_tasklet+0x158/0x490 
> >>>>> [   21.455110]  [<000000000014c742>] tasklet_hi_action+0x92/0x120 
> >>>>> [   21.455118]  [<0000000000a7cfc0>] __do_softirq+0x120/0x348 
> >>>>> [   21.455122]  [<000000000014c212>] irq_exit+0xba/0xd0 
> >>>>> [   21.455130]  [<000000000010bf92>] do_IRQ+0x8a/0xb8 
> >>>>> [   21.455133]  [<0000000000a7c298>] io_int_handler+0x130/0x298 
> >>>>> [   21.455136] Last Breaking-Event-Address:
> >>>>> [   21.455138]  [<000000000069c5a2>] __blk_mq_delay_run_hw_queue+0xba/0xd8
> >>>>> [   21.455140] ---[ end trace be43f99a5d1e553e ]---
> >>>>> [   21.510046] dasdconf.sh Warning: 0.0.241e is already online, not configuring
> >>>>
> >>>> Thinking about this issue further, I can't understand the root cause for
> >>>> this issue.
> 
> FWIW, Limiting CONFIG_NR_CPUS to 64 seems to make the problem go away.

I think the following patch is needed, and this way aligns to the mapping
created via managed IRQ at least.

diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c
index 9f8cffc8a701..638ab5c11b3c 100644
--- a/block/blk-mq-cpumap.c
+++ b/block/blk-mq-cpumap.c
@@ -14,13 +14,12 @@
 #include "blk.h"
 #include "blk-mq.h"
 
+/*
+ * Given there isn't CPU hotplug handler in blk-mq, map all possible CPUs to
+ * queues even it isn't present yet.
+ */
 static int cpu_to_queue_index(unsigned int nr_queues, const int cpu)
 {
-	/*
-	 * Non present CPU will be mapped to queue index 0.
-	 */
-	if (!cpu_present(cpu))
-		return 0;
 	return cpu % nr_queues;
 }
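
Purely as an illustration of what the hunk above changes, here is a small
userspace sketch that compares the two mapping rules.  The numbers are
assumptions for the example (64 possible CPUs, 16 present CPUs as in the
lscpu output, 64 hw queues); only the shape of the result matters, not the
exact figures.

	/* mapping_sim.c - illustrative only, not kernel code */
	#include <stdio.h>
	#include <stdbool.h>

	#define NR_QUEUES	64	/* assumed hw queue count (illustrative) */
	#define NR_POSSIBLE	64	/* assumed possible CPUs (illustrative)  */
	#define NR_PRESENT	16	/* present CPUs 0..15, as in lscpu       */

	static bool cpu_is_present(int cpu)
	{
		return cpu < NR_PRESENT;
	}

	/* old rule: every non-present CPU collapses onto queue 0 */
	static int old_map(int nr_queues, int cpu)
	{
		if (!cpu_is_present(cpu))
			return 0;
		return cpu % nr_queues;
	}

	/* new rule from the hunk above: plain cpu % nr_queues for all CPUs */
	static int new_map(int nr_queues, int cpu)
	{
		return cpu % nr_queues;
	}

	int main(void)
	{
		int old_cnt[NR_QUEUES] = { 0 }, new_cnt[NR_QUEUES] = { 0 };
		int cpu, q, old_unmapped = 0, new_unmapped = 0;

		for (cpu = 0; cpu < NR_POSSIBLE; cpu++) {
			old_cnt[old_map(NR_QUEUES, cpu)]++;
			new_cnt[new_map(NR_QUEUES, cpu)]++;
		}
		for (q = 0; q < NR_QUEUES; q++) {
			old_unmapped += !old_cnt[q];
			new_unmapped += !new_cnt[q];
		}
		/* old rule: queues 16..63 get no CPU at all; new rule: none stay empty */
		printf("old rule: %d unmapped queues\n", old_unmapped);
		printf("new rule: %d unmapped queues\n", new_unmapped);
		return 0;
	}

With the old rule the present CPUs cover only queues 0..15 and every
remaining queue ends up with no CPU at all, i.e. an unmapped hctx; with the
new rule every queue gets at least one possible CPU, although that CPU may
well be offline.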

Thanks,
Ming

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues()
  2018-03-29  7:23           ` Christian Borntraeger
  2018-03-29  9:09             ` Christian Borntraeger
@ 2018-03-29  9:52             ` Ming Lei
  2018-03-29 10:11               ` Christian Borntraeger
  2018-03-29 10:13               ` Ming Lei
  1 sibling, 2 replies; 40+ messages in thread
From: Ming Lei @ 2018-03-29  9:52 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Stefan Haberland,
	Christoph Hellwig

On Thu, Mar 29, 2018 at 09:23:10AM +0200, Christian Borntraeger wrote:
> 
> 
> On 03/29/2018 04:00 AM, Ming Lei wrote:
> > On Wed, Mar 28, 2018 at 05:36:53PM +0200, Christian Borntraeger wrote:
> >>
> >>
> >> On 03/28/2018 05:26 PM, Ming Lei wrote:
> >>> Hi Christian,
> >>>
> >>> On Wed, Mar 28, 2018 at 09:45:10AM +0200, Christian Borntraeger wrote:
> >>>> FWIW, this patch does not fix the issue for me:
> >>>>
> >>>> ostname=? addr=? terminal=? res=success'
> >>>> [   21.454961] WARNING: CPU: 3 PID: 1882 at block/blk-mq.c:1410 __blk_mq_delay_run_hw_queue+0xbe/0xd8
> >>>> [   21.454968] Modules linked in: scsi_dh_rdac scsi_dh_emc scsi_dh_alua dm_mirror dm_region_hash dm_log dm_multipath dm_mod autofs4
> >>>> [   21.454984] CPU: 3 PID: 1882 Comm: dasdconf.sh Not tainted 4.16.0-rc7+ #26
> >>>> [   21.454987] Hardware name: IBM 2964 NC9 704 (LPAR)
> >>>> [   21.454990] Krnl PSW : 00000000c0131ea3 000000003ea2f7bf (__blk_mq_delay_run_hw_queue+0xbe/0xd8)
> >>>> [   21.454996]            R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
> >>>> [   21.455005] Krnl GPRS: 0000013abb69a000 0000013a00000000 0000013ac6c0dc00 0000000000000001
> >>>> [   21.455008]            0000000000000000 0000013abb69a710 0000013a00000000 00000001b691fd98
> >>>> [   21.455011]            00000001b691fd98 0000013ace4775c8 0000000000000001 0000000000000000
> >>>> [   21.455014]            0000013ac6c0dc00 0000000000b47238 00000001b691fc08 00000001b691fbd0
> >>>> [   21.455032] Krnl Code: 000000000069c596: ebaff0a00004	lmg	%r10,%r15,160(%r15)
> >>>>                           000000000069c59c: c0f4ffff7a5e	brcl	15,68ba58
> >>>>                          #000000000069c5a2: a7f40001		brc	15,69c5a4
> >>>>                          >000000000069c5a6: e340f0c00004	lg	%r4,192(%r15)
> >>>>                           000000000069c5ac: ebaff0a00004	lmg	%r10,%r15,160(%r15)
> >>>>                           000000000069c5b2: 07f4		bcr	15,%r4
> >>>>                           000000000069c5b4: c0e5fffffeea	brasl	%r14,69c388
> >>>>                           000000000069c5ba: a7f4fff6		brc	15,69c5a6
> >>>> [   21.455067] Call Trace:
> >>>> [   21.455072] ([<00000001b691fd98>] 0x1b691fd98)
> >>>> [   21.455079]  [<000000000069c692>] blk_mq_run_hw_queue+0xba/0x100 
> >>>> [   21.455083]  [<000000000069c740>] blk_mq_run_hw_queues+0x68/0x88 
> >>>> [   21.455089]  [<000000000069b956>] __blk_mq_complete_request+0x11e/0x1d8 
> >>>> [   21.455091]  [<000000000069ba9c>] blk_mq_complete_request+0x8c/0xc8 
> >>>> [   21.455103]  [<00000000008aa250>] dasd_block_tasklet+0x158/0x490 
> >>>> [   21.455110]  [<000000000014c742>] tasklet_hi_action+0x92/0x120 
> >>>> [   21.455118]  [<0000000000a7cfc0>] __do_softirq+0x120/0x348 
> >>>> [   21.455122]  [<000000000014c212>] irq_exit+0xba/0xd0 
> >>>> [   21.455130]  [<000000000010bf92>] do_IRQ+0x8a/0xb8 
> >>>> [   21.455133]  [<0000000000a7c298>] io_int_handler+0x130/0x298 
> >>>> [   21.455136] Last Breaking-Event-Address:
> >>>> [   21.455138]  [<000000000069c5a2>] __blk_mq_delay_run_hw_queue+0xba/0xd8
> >>>> [   21.455140] ---[ end trace be43f99a5d1e553e ]---
> >>>> [   21.510046] dasdconf.sh Warning: 0.0.241e is already online, not configuring
> >>>
> >>> Thinking about this issue further, I can't understand the root cause for
> >>> this issue.
> >>>
> >>> After commit 20e4d813931961fe ("blk-mq: simplify queue mapping & schedule with
> >>> each possisble CPU"), each hw queue should be mapped to at least one CPU, that
> >>> means this issue shouldn't happen. Maybe blk_mq_map_queues() works wrong?
> >>>
> >>> Could you dump 'lscpu' and provide blk-mq debugfs for your DASD via the
> >>> following command?
> >>
> >> # lscpu
> >> Architecture:        s390x
> >> CPU op-mode(s):      32-bit, 64-bit
> >> Byte Order:          Big Endian
> >> CPU(s):              16
> >> On-line CPU(s) list: 0-15
> >> Thread(s) per core:  2
> >> Core(s) per socket:  8
> >> Socket(s) per book:  3
> >> Book(s) per drawer:  2
> >> Drawer(s):           4
> >> NUMA node(s):        1
> >> Vendor ID:           IBM/S390
> >> Machine type:        2964
> >> CPU dynamic MHz:     5000
> >> CPU static MHz:      5000
> >> BogoMIPS:            20325.00
> >> Hypervisor:          PR/SM
> >> Hypervisor vendor:   IBM
> >> Virtualization type: full
> >> Dispatching mode:    horizontal
> >> L1d cache:           128K
> >> L1i cache:           96K
> >> L2d cache:           2048K
> >> L2i cache:           2048K
> >> L3 cache:            65536K
> >> L4 cache:            491520K
> >> NUMA node0 CPU(s):   0-15
> >> Flags:               esan3 zarch stfle msa ldisp eimm dfp edat etf3eh highgprs te vx sie
> >>
> >> # lsdasd 
> >> Bus-ID     Status      Name      Device  Type  BlkSz  Size      Blocks
> >> ==============================================================================
> >> 0.0.3f75   active      dasda     94:0    ECKD  4096   21129MB   5409180
> >> 0.0.3f76   active      dasdb     94:4    ECKD  4096   21129MB   5409180
> >> 0.0.3f77   active      dasdc     94:8    ECKD  4096   21129MB   5409180
> >> 0.0.3f74   active      dasdd     94:12   ECKD  4096   21129MB   5409180
> > 
> > I have tried to emulate your CPU topo via VM and the blk-mq mapping of
> > null_blk is basically similar to your DASD mapping, but I still can't
> > reproduce your issue.
> > 
> > BTW, do you need to do cpu hotplug or other actions for triggering this warning?
> 
> No, without hotplug.

From the debugfs log, hctx0 is mapped to lots of CPU, so it shouldn't be
unmapped, could you check if it is hctx0 which is unmapped when the
warning is triggered? If not, what is the unmapped hctx? And you can do
that by adding one extra line:

	printk("unmapped hctx %d", hctx->queue_num);
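
A minimal sketch of where such a line could sit; the surrounding function
body is reconstructed from memory rather than quoted from the tree, so the
exact layout of __blk_mq_delay_run_hw_queue() may differ:

	static void __blk_mq_delay_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async,
						unsigned long msecs)
	{
		/* roughly where the warning at block/blk-mq.c:1410 fires */
		if (unlikely(!blk_mq_hw_queue_mapped(hctx))) {
			printk("unmapped hctx %d", hctx->queue_num);
			WARN_ON_ONCE(1);
			return;
		}

		/* ... rest of the function unchanged ... */
	}

That way the dmesg line names the hw queue that was run without any CPU
mapped to it.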

Thanks,
Ming

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues()
  2018-03-29  9:40               ` Ming Lei
@ 2018-03-29 10:10                 ` Christian Borntraeger
  2018-03-29 10:48                   ` Ming Lei
  0 siblings, 1 reply; 40+ messages in thread
From: Christian Borntraeger @ 2018-03-29 10:10 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Stefan Haberland,
	Christoph Hellwig



On 03/29/2018 11:40 AM, Ming Lei wrote:
> On Thu, Mar 29, 2018 at 11:09:08AM +0200, Christian Borntraeger wrote:
>>
>>
>> On 03/29/2018 09:23 AM, Christian Borntraeger wrote:
>>>
>>>
>>> On 03/29/2018 04:00 AM, Ming Lei wrote:
>>>> On Wed, Mar 28, 2018 at 05:36:53PM +0200, Christian Borntraeger wrote:
>>>>>
>>>>>
>>>>> On 03/28/2018 05:26 PM, Ming Lei wrote:
>>>>>> Hi Christian,
>>>>>>
>>>>>> On Wed, Mar 28, 2018 at 09:45:10AM +0200, Christian Borntraeger wrote:
>>>>>>> FWIW, this patch does not fix the issue for me:
>>>>>>>
>>>>>>> ostname=? addr=? terminal=? res=success'
>>>>>>> [   21.454961] WARNING: CPU: 3 PID: 1882 at block/blk-mq.c:1410 __blk_mq_delay_run_hw_queue+0xbe/0xd8
>>>>>>> [   21.454968] Modules linked in: scsi_dh_rdac scsi_dh_emc scsi_dh_alua dm_mirror dm_region_hash dm_log dm_multipath dm_mod autofs4
>>>>>>> [   21.454984] CPU: 3 PID: 1882 Comm: dasdconf.sh Not tainted 4.16.0-rc7+ #26
>>>>>>> [   21.454987] Hardware name: IBM 2964 NC9 704 (LPAR)
>>>>>>> [   21.454990] Krnl PSW : 00000000c0131ea3 000000003ea2f7bf (__blk_mq_delay_run_hw_queue+0xbe/0xd8)
>>>>>>> [   21.454996]            R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
>>>>>>> [   21.455005] Krnl GPRS: 0000013abb69a000 0000013a00000000 0000013ac6c0dc00 0000000000000001
>>>>>>> [   21.455008]            0000000000000000 0000013abb69a710 0000013a00000000 00000001b691fd98
>>>>>>> [   21.455011]            00000001b691fd98 0000013ace4775c8 0000000000000001 0000000000000000
>>>>>>> [   21.455014]            0000013ac6c0dc00 0000000000b47238 00000001b691fc08 00000001b691fbd0
>>>>>>> [   21.455032] Krnl Code: 000000000069c596: ebaff0a00004	lmg	%r10,%r15,160(%r15)
>>>>>>>                           000000000069c59c: c0f4ffff7a5e	brcl	15,68ba58
>>>>>>>                          #000000000069c5a2: a7f40001		brc	15,69c5a4
>>>>>>>                          >000000000069c5a6: e340f0c00004	lg	%r4,192(%r15)
>>>>>>>                           000000000069c5ac: ebaff0a00004	lmg	%r10,%r15,160(%r15)
>>>>>>>                           000000000069c5b2: 07f4		bcr	15,%r4
>>>>>>>                           000000000069c5b4: c0e5fffffeea	brasl	%r14,69c388
>>>>>>>                           000000000069c5ba: a7f4fff6		brc	15,69c5a6
>>>>>>> [   21.455067] Call Trace:
>>>>>>> [   21.455072] ([<00000001b691fd98>] 0x1b691fd98)
>>>>>>> [   21.455079]  [<000000000069c692>] blk_mq_run_hw_queue+0xba/0x100 
>>>>>>> [   21.455083]  [<000000000069c740>] blk_mq_run_hw_queues+0x68/0x88 
>>>>>>> [   21.455089]  [<000000000069b956>] __blk_mq_complete_request+0x11e/0x1d8 
>>>>>>> [   21.455091]  [<000000000069ba9c>] blk_mq_complete_request+0x8c/0xc8 
>>>>>>> [   21.455103]  [<00000000008aa250>] dasd_block_tasklet+0x158/0x490 
>>>>>>> [   21.455110]  [<000000000014c742>] tasklet_hi_action+0x92/0x120 
>>>>>>> [   21.455118]  [<0000000000a7cfc0>] __do_softirq+0x120/0x348 
>>>>>>> [   21.455122]  [<000000000014c212>] irq_exit+0xba/0xd0 
>>>>>>> [   21.455130]  [<000000000010bf92>] do_IRQ+0x8a/0xb8 
>>>>>>> [   21.455133]  [<0000000000a7c298>] io_int_handler+0x130/0x298 
>>>>>>> [   21.455136] Last Breaking-Event-Address:
>>>>>>> [   21.455138]  [<000000000069c5a2>] __blk_mq_delay_run_hw_queue+0xba/0xd8
>>>>>>> [   21.455140] ---[ end trace be43f99a5d1e553e ]---
>>>>>>> [   21.510046] dasdconf.sh Warning: 0.0.241e is already online, not configuring
>>>>>>
>>>>>> Thinking about this issue further, I can't understand the root cause for
>>>>>> this issue.
>>
>> FWIW, Limiting CONFIG_NR_CPUS to 64 seems to make the problem go away.
> 
> I think the following patch is needed, and this way aligns to the mapping
> created via managed IRQ at least.
> 
> diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c
> index 9f8cffc8a701..638ab5c11b3c 100644
> --- a/block/blk-mq-cpumap.c
> +++ b/block/blk-mq-cpumap.c
> @@ -14,13 +14,12 @@
>  #include "blk.h"
>  #include "blk-mq.h"
> 
> +/*
> + * Given there isn't CPU hotplug handler in blk-mq, map all possible CPUs to
> + * queues even it isn't present yet.
> + */
>  static int cpu_to_queue_index(unsigned int nr_queues, const int cpu)
>  {
> -	/*
> -	 * Non present CPU will be mapped to queue index 0.
> -	 */
> -	if (!cpu_present(cpu))
> -		return 0;
>  	return cpu % nr_queues;
>  }
> 
> Thanks,
> Ming
> 

With that I no longer see the WARN_ON but the other warning instead:

[   31.903096] audit: type=1130 audit(1522318064.439:41): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-tmpfiles-setup-dev comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[   31.903100] audit: type=1131 audit(1522318064.439:42): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-tmpfiles-setup-dev comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[   31.985756] systemd-journald[379]: Received SIGTERM from PID 1 (systemd).
[   32.000543] systemd: 18 output lines suppressed due to ratelimiting
[   32.209496] EXT4-fs (dasdc1): re-mounted. Opts: (null)
[   32.234808] systemd-journald[2490]: Received request to flush runtime journal from PID 1
[   32.359832] tun: Universal TUN/TAP device driver, 1.6
[   32.470841] run queue from wrong CPU 18, hctx active
[   32.470845] CPU: 18 PID: 2131 Comm: kworker/18:1H Not tainted 4.16.0-rc7+ #31
[   32.470847] Hardware name: IBM 2964 NC9 704 (LPAR)
[   32.470856] Workqueue: kblockd blk_mq_run_work_fn
[   32.470857] Call Trace:
[   32.470862] ([<0000000000113b86>] show_stack+0x56/0x80)
[   32.470867]  [<0000000000a5cd02>] dump_stack+0x82/0xb0 
[   32.470869]  [<000000000069c40a>] __blk_mq_run_hw_queue+0x12a/0x130 
[   32.470873]  [<0000000000163906>] process_one_work+0x1be/0x420 
[   32.470875]  [<0000000000163bc0>] worker_thread+0x58/0x458 
[   32.470877]  [<000000000016a9d0>] kthread+0x148/0x160 
[   32.470880]  [<0000000000a7bea2>] kernel_thread_starter+0x6/0xc 
[   32.470882]  [<0000000000a7be9c>] kernel_thread_starter+0x0/0xc 
[   32.470889] run queue from wrong CPU 18, hctx active
[   32.470891] CPU: 18 PID: 2131 Comm: kworker/18:1H Not tainted 4.16.0-rc7+ #31
[   32.470892] Hardware name: IBM 2964 NC9 704 (LPAR)
[   32.470894] Workqueue: kblockd blk_mq_run_work_fn
[   32.470895] Call Trace:
[   32.470897] ([<0000000000113b86>] show_stack+0x56/0x80)
[   32.470898]  [<0000000000a5cd02>] dump_stack+0x82/0xb0 
[   32.470900]  [<000000000069c40a>] __blk_mq_run_hw_queue+0x12a/0x130 
[   32.470902]  [<0000000000163906>] process_one_work+0x1be/0x420 
[   32.470903]  [<0000000000163bc0>] worker_thread+0x58/0x458 
[   32.470905]  [<000000000016a9d0>] kthread+0x148/0x160 
[   32.470906]  [<0000000000a7bea2>] kernel_thread_starter+0x6/0xc 
[   32.470908]  [<0000000000a7be9c>] kernel_thread_starter+0x0/0xc 
[   32.470910] run queue from wrong CPU 18, hctx active
[   32.470911] CPU: 18 PID: 2131 Comm: kworker/18:1H Not tainted 4.16.0-rc7+ #31
[   32.470913] Hardware name: IBM 2964 NC9 704 (LPAR)
[   32.470914] Workqueue: kblockd blk_mq_run_work_fn
[   32.470916] Call Trace:
[   32.470918] ([<0000000000113b86>] show_stack+0x56/0x80)
[   32.470919]  [<0000000000a5cd02>] dump_stack+0x82/0xb0 
[   32.470921]  [<000000000069c40a>] __blk_mq_run_hw_queue+0x12a/0x130 
[   32.470922]  [<0000000000163906>] process_one_work+0x1be/0x420 
[   32.470924]  [<0000000000163bc0>] worker_thread+0x58/0x458 
[   32.470925]  [<000000000016a9d0>] kthread+0x148/0x160 
[   32.470927]  [<0000000000a7bea2>] kernel_thread_starter+0x6/0xc 
[   32.470929]  [<0000000000a7be9c>] kernel_thread_starter+0x0/0xc 
[   32.470930] run queue from wrong CPU 18, hctx active
[   32.470932] CPU: 18 PID: 2131 Comm: kworker/18:1H Not tainted 4.16.0-rc7+ #31
[   32.470933] Hardware name: IBM 2964 NC9 704 (LPAR)
[   32.470935] Workqueue: kblockd blk_mq_run_work_fn
[   32.470936] Call Trace:
[   32.470938] ([<0000000000113b86>] show_stack+0x56/0x80)
[   32.470939]  [<0000000000a5cd02>] dump_stack+0x82/0xb0 
[   32.470941]  [<000000000069c40a>] __blk_mq_run_hw_queue+0x12a/0x130 
[   32.470943]  [<0000000000163906>] process_one_work+0x1be/0x420 
[   32.470944]  [<0000000000163bc0>] worker_thread+0x58/0x458 
[   32.470946]  [<000000000016a9d0>] kthread+0x148/0x160 
[   32.470947]  [<0000000000a7bea2>] kernel_thread_starter+0x6/0xc 
[   32.470949]  [<0000000000a7be9c>] kernel_thread_starter+0x0/0xc 
[   32.470950] run queue from wrong CPU 18, hctx active
[   32.470952] CPU: 18 PID: 2131 Comm: kworker/18:1H Not tainted 4.16.0-rc7+ #31
[   32.470953] Hardware name: IBM 2964 NC9 704 (LPAR)
[   32.470955] Workqueue: kblockd blk_mq_run_work_fn
[   32.470956] Call Trace:
[   32.470958] ([<0000000000113b86>] show_stack+0x56/0x80)
[   32.470959]  [<0000000000a5cd02>] dump_stack+0x82/0xb0 
[   32.470961]  [<000000000069c40a>] __blk_mq_run_hw_queue+0x12a/0x130 
[   32.470963]  [<0000000000163906>] process_one_work+0x1be/0x420 
[   32.470964]  [<0000000000163bc0>] worker_thread+0x58/0x458 
[   32.470966]  [<000000000016a9d0>] kthread+0x148/0x160 
[   32.470967]  [<0000000000a7bea2>] kernel_thread_starter+0x6/0xc 
[   32.470969]  [<0000000000a7be9c>] kernel_thread_starter+0x0/0xc 
[   32.470971] run queue from wrong CPU 18, hctx active
[   32.470972] CPU: 18 PID: 2131 Comm: kworker/18:1H Not tainted 4.16.0-rc7+ #31
[   32.470973] Hardware name: IBM 2964 NC9 704 (LPAR)
[   32.470975] Workqueue: kblockd blk_mq_run_work_fn
[   32.470976] Call Trace:
[   32.470978] ([<0000000000113b86>] show_stack+0x56/0x80)
[   32.470979]  [<0000000000a5cd02>] dump_stack+0x82/0xb0 
[   32.470981]  [<000000000069c40a>] __blk_mq_run_hw_queue+0x12a/0x130 
[   32.470983]  [<0000000000163906>] process_one_work+0x1be/0x420 
[   32.470985]  [<0000000000163bc0>] worker_thread+0x58/0x458 
[   32.470986]  [<000000000016a9d0>] kthread+0x148/0x160 
[   32.470988]  [<0000000000a7bea2>] kernel_thread_starter+0x6/0xc 
[   32.470989]  [<0000000000a7be9c>] kernel_thread_starter+0x0/0xc 
[   32.470991] run queue from wrong CPU 18, hctx active
[   32.470992] CPU: 18 PID: 2131 Comm: kworker/18:1H Not tainted 4.16.0-rc7+ #31
[   32.470993] Hardware name: IBM 2964 NC9 704 (LPAR)
[   32.470995] Workqueue: kblockd blk_mq_run_work_fn
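
For reference, the "run queue from wrong CPU ..., hctx active" line above is
printed by a sanity check near the top of __blk_mq_run_hw_queue().  The
sketch below is reconstructed from memory of the v4.16 code and may differ
in detail; it triggers when the queue is run on a CPU that is not in
hctx->cpumask while hctx->next_cpu is still online:

	if (!cpumask_test_cpu(raw_smp_processor_id(), hctx->cpumask) &&
	    cpu_online(hctx->next_cpu)) {
		printk(KERN_WARNING "run queue from wrong CPU %d, hctx %s\n",
		       raw_smp_processor_id(),
		       cpumask_empty(hctx->cpumask) ? "inactive" : "active");
		dump_stack();
	}

So with the cpu % nr_queues mapping no hctx is unmapped any more, but the
kblockd worker can still run a hctx on a CPU outside its cpumask, which is
what this message reports.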

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues()
  2018-03-29  9:52             ` Ming Lei
@ 2018-03-29 10:11               ` Christian Borntraeger
  2018-03-29 10:12                 ` Christian Borntraeger
  2018-03-29 10:13               ` Ming Lei
  1 sibling, 1 reply; 40+ messages in thread
From: Christian Borntraeger @ 2018-03-29 10:11 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Stefan Haberland,
	Christoph Hellwig



On 03/29/2018 11:52 AM, Ming Lei wrote:
> From the debugfs log, hctx0 is mapped to lots of CPU, so it shouldn't be
> unmapped, could you check if it is hctx0 which is unmapped when the
> warning is triggered? If not, what is the unmapped hctx? And you can do
> that by adding one extra line:
> 
> 	printk("unmapped hctx %d", hctx->queue_num);

Where do you want that printk?
> 
> Thanks,
> Ming

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues()
  2018-03-29 10:11               ` Christian Borntraeger
@ 2018-03-29 10:12                 ` Christian Borntraeger
  0 siblings, 0 replies; 40+ messages in thread
From: Christian Borntraeger @ 2018-03-29 10:12 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Stefan Haberland,
	Christoph Hellwig



On 03/29/2018 12:11 PM, Christian Borntraeger wrote:
> 
> 
> On 03/29/2018 11:52 AM, Ming Lei wrote:
>> From the debugfs log, hctx0 is mapped to lots of CPU, so it shouldn't be
>> unmapped, could you check if it is hctx0 which is unmapped when the
>> warning is triggered? If not, what is the unmapped hctx? And you can do
>> that by adding one extra line:
>>
>> 	printk("unmapped hctx %d", hctx->queue_num);
> 
> Where do you want that printk?

And do you want it with or without the other patch that you have just sent?

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues()
  2018-03-29  9:52             ` Ming Lei
  2018-03-29 10:11               ` Christian Borntraeger
@ 2018-03-29 10:13               ` Ming Lei
  1 sibling, 0 replies; 40+ messages in thread
From: Ming Lei @ 2018-03-29 10:13 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Stefan Haberland,
	Christoph Hellwig

On Thu, Mar 29, 2018 at 05:52:16PM +0800, Ming Lei wrote:
> On Thu, Mar 29, 2018 at 09:23:10AM +0200, Christian Borntraeger wrote:
> > 
> > 
> > On 03/29/2018 04:00 AM, Ming Lei wrote:
> > > On Wed, Mar 28, 2018 at 05:36:53PM +0200, Christian Borntraeger wrote:
> > >>
> > >>
> > >> On 03/28/2018 05:26 PM, Ming Lei wrote:
> > >>> Hi Christian,
> > >>>
> > >>> On Wed, Mar 28, 2018 at 09:45:10AM +0200, Christian Borntraeger wrote:
> > >>>> FWIW, this patch does not fix the issue for me:
> > >>>>
> > >>>> ostname=? addr=? terminal=? res=success'
> > >>>> [   21.454961] WARNING: CPU: 3 PID: 1882 at block/blk-mq.c:1410 __blk_mq_delay_run_hw_queue+0xbe/0xd8
> > >>>> [   21.454968] Modules linked in: scsi_dh_rdac scsi_dh_emc scsi_dh_alua dm_mirror dm_region_hash dm_log dm_multipath dm_mod autofs4
> > >>>> [   21.454984] CPU: 3 PID: 1882 Comm: dasdconf.sh Not tainted 4.16.0-rc7+ #26
> > >>>> [   21.454987] Hardware name: IBM 2964 NC9 704 (LPAR)
> > >>>> [   21.454990] Krnl PSW : 00000000c0131ea3 000000003ea2f7bf (__blk_mq_delay_run_hw_queue+0xbe/0xd8)
> > >>>> [   21.454996]            R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
> > >>>> [   21.455005] Krnl GPRS: 0000013abb69a000 0000013a00000000 0000013ac6c0dc00 0000000000000001
> > >>>> [   21.455008]            0000000000000000 0000013abb69a710 0000013a00000000 00000001b691fd98
> > >>>> [   21.455011]            00000001b691fd98 0000013ace4775c8 0000000000000001 0000000000000000
> > >>>> [   21.455014]            0000013ac6c0dc00 0000000000b47238 00000001b691fc08 00000001b691fbd0
> > >>>> [   21.455032] Krnl Code: 000000000069c596: ebaff0a00004	lmg	%r10,%r15,160(%r15)
> > >>>>                           000000000069c59c: c0f4ffff7a5e	brcl	15,68ba58
> > >>>>                          #000000000069c5a2: a7f40001		brc	15,69c5a4
> > >>>>                          >000000000069c5a6: e340f0c00004	lg	%r4,192(%r15)
> > >>>>                           000000000069c5ac: ebaff0a00004	lmg	%r10,%r15,160(%r15)
> > >>>>                           000000000069c5b2: 07f4		bcr	15,%r4
> > >>>>                           000000000069c5b4: c0e5fffffeea	brasl	%r14,69c388
> > >>>>                           000000000069c5ba: a7f4fff6		brc	15,69c5a6
> > >>>> [   21.455067] Call Trace:
> > >>>> [   21.455072] ([<00000001b691fd98>] 0x1b691fd98)
> > >>>> [   21.455079]  [<000000000069c692>] blk_mq_run_hw_queue+0xba/0x100 
> > >>>> [   21.455083]  [<000000000069c740>] blk_mq_run_hw_queues+0x68/0x88 
> > >>>> [   21.455089]  [<000000000069b956>] __blk_mq_complete_request+0x11e/0x1d8 
> > >>>> [   21.455091]  [<000000000069ba9c>] blk_mq_complete_request+0x8c/0xc8 
> > >>>> [   21.455103]  [<00000000008aa250>] dasd_block_tasklet+0x158/0x490 
> > >>>> [   21.455110]  [<000000000014c742>] tasklet_hi_action+0x92/0x120 
> > >>>> [   21.455118]  [<0000000000a7cfc0>] __do_softirq+0x120/0x348 
> > >>>> [   21.455122]  [<000000000014c212>] irq_exit+0xba/0xd0 
> > >>>> [   21.455130]  [<000000000010bf92>] do_IRQ+0x8a/0xb8 
> > >>>> [   21.455133]  [<0000000000a7c298>] io_int_handler+0x130/0x298 
> > >>>> [   21.455136] Last Breaking-Event-Address:
> > >>>> [   21.455138]  [<000000000069c5a2>] __blk_mq_delay_run_hw_queue+0xba/0xd8
> > >>>> [   21.455140] ---[ end trace be43f99a5d1e553e ]---
> > >>>> [   21.510046] dasdconf.sh Warning: 0.0.241e is already online, not configuring
> > >>>
> > >>> Thinking about this issue further, I can't understand the root cause for
> > >>> this issue.
> > >>>
> > >>> After commit 20e4d813931961fe ("blk-mq: simplify queue mapping & schedule with
> > >>> each possisble CPU"), each hw queue should be mapped to at least one CPU, that
> > >>> means this issue shouldn't happen. Maybe blk_mq_map_queues() works wrong?
> > >>>
> > >>> Could you dump 'lscpu' and provide blk-mq debugfs for your DASD via the
> > >>> following command?
> > >>
> > >> # lscpu
> > >> Architecture:        s390x
> > >> CPU op-mode(s):      32-bit, 64-bit
> > >> Byte Order:          Big Endian
> > >> CPU(s):              16
> > >> On-line CPU(s) list: 0-15
> > >> Thread(s) per core:  2
> > >> Core(s) per socket:  8
> > >> Socket(s) per book:  3
> > >> Book(s) per drawer:  2
> > >> Drawer(s):           4
> > >> NUMA node(s):        1
> > >> Vendor ID:           IBM/S390
> > >> Machine type:        2964
> > >> CPU dynamic MHz:     5000
> > >> CPU static MHz:      5000
> > >> BogoMIPS:            20325.00
> > >> Hypervisor:          PR/SM
> > >> Hypervisor vendor:   IBM
> > >> Virtualization type: full
> > >> Dispatching mode:    horizontal
> > >> L1d cache:           128K
> > >> L1i cache:           96K
> > >> L2d cache:           2048K
> > >> L2i cache:           2048K
> > >> L3 cache:            65536K
> > >> L4 cache:            491520K
> > >> NUMA node0 CPU(s):   0-15
> > >> Flags:               esan3 zarch stfle msa ldisp eimm dfp edat etf3eh highgprs te vx sie
> > >>
> > >> # lsdasd 
> > >> Bus-ID     Status      Name      Device  Type  BlkSz  Size      Blocks
> > >> ==============================================================================
> > >> 0.0.3f75   active      dasda     94:0    ECKD  4096   21129MB   5409180
> > >> 0.0.3f76   active      dasdb     94:4    ECKD  4096   21129MB   5409180
> > >> 0.0.3f77   active      dasdc     94:8    ECKD  4096   21129MB   5409180
> > >> 0.0.3f74   active      dasdd     94:12   ECKD  4096   21129MB   5409180
> > > 
> > > I have tried to emulate your CPU topo via VM and the blk-mq mapping of
> > > null_blk is basically similar to your DASD mapping, but I still can't
> > > reproduce your issue.
> > > 
> > > BTW, do you need to do cpu hotplug or other actions for triggering this warning?
> > 
> > No, without hotplug.
> 
> From the debugfs log, hctx0 is mapped to lots of CPU, so it shouldn't be
> unmapped, could you check if it is hctx0 which is unmapped when the
> warning is triggered? If not, what is the unmapped hctx? And you can do
> that by adding one extra line:
> 
> 	printk("unmapped hctx %d", hctx->queue_num);

It should be triggered when running any hctx from 16 to 63, not hctx 0.

I see why I didn't trigger it with null_blk: null_blk doesn't run all hw
queues, so I should have used scsi_debug instead.

The patch touching blk-mq-cpumap.c that I sent earlier should then
address this issue.

Thanks,
Ming

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues()
  2018-03-29 10:10                 ` Christian Borntraeger
@ 2018-03-29 10:48                   ` Ming Lei
  2018-03-29 10:49                     ` Christian Borntraeger
  0 siblings, 1 reply; 40+ messages in thread
From: Ming Lei @ 2018-03-29 10:48 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Stefan Haberland,
	Christoph Hellwig

On Thu, Mar 29, 2018 at 12:10:11PM +0200, Christian Borntraeger wrote:
> 
> 
> On 03/29/2018 11:40 AM, Ming Lei wrote:
> > On Thu, Mar 29, 2018 at 11:09:08AM +0200, Christian Borntraeger wrote:
> >>
> >>
> >> On 03/29/2018 09:23 AM, Christian Borntraeger wrote:
> >>>
> >>>
> >>> On 03/29/2018 04:00 AM, Ming Lei wrote:
> >>>> On Wed, Mar 28, 2018 at 05:36:53PM +0200, Christian Borntraeger wrote:
> >>>>>
> >>>>>
> >>>>> On 03/28/2018 05:26 PM, Ming Lei wrote:
> >>>>>> Hi Christian,
> >>>>>>
> >>>>>> On Wed, Mar 28, 2018 at 09:45:10AM +0200, Christian Borntraeger wrote:
> >>>>>>> FWIW, this patch does not fix the issue for me:
> >>>>>>>
> >>>>>>> ostname=? addr=? terminal=? res=success'
> >>>>>>> [   21.454961] WARNING: CPU: 3 PID: 1882 at block/blk-mq.c:1410 __blk_mq_delay_run_hw_queue+0xbe/0xd8
> >>>>>>> [   21.454968] Modules linked in: scsi_dh_rdac scsi_dh_emc scsi_dh_alua dm_mirror dm_region_hash dm_log dm_multipath dm_mod autofs4
> >>>>>>> [   21.454984] CPU: 3 PID: 1882 Comm: dasdconf.sh Not tainted 4.16.0-rc7+ #26
> >>>>>>> [   21.454987] Hardware name: IBM 2964 NC9 704 (LPAR)
> >>>>>>> [   21.454990] Krnl PSW : 00000000c0131ea3 000000003ea2f7bf (__blk_mq_delay_run_hw_queue+0xbe/0xd8)
> >>>>>>> [   21.454996]            R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
> >>>>>>> [   21.455005] Krnl GPRS: 0000013abb69a000 0000013a00000000 0000013ac6c0dc00 0000000000000001
> >>>>>>> [   21.455008]            0000000000000000 0000013abb69a710 0000013a00000000 00000001b691fd98
> >>>>>>> [   21.455011]            00000001b691fd98 0000013ace4775c8 0000000000000001 0000000000000000
> >>>>>>> [   21.455014]            0000013ac6c0dc00 0000000000b47238 00000001b691fc08 00000001b691fbd0
> >>>>>>> [   21.455032] Krnl Code: 000000000069c596: ebaff0a00004	lmg	%r10,%r15,160(%r15)
> >>>>>>>                           000000000069c59c: c0f4ffff7a5e	brcl	15,68ba58
> >>>>>>>                          #000000000069c5a2: a7f40001		brc	15,69c5a4
> >>>>>>>                          >000000000069c5a6: e340f0c00004	lg	%r4,192(%r15)
> >>>>>>>                           000000000069c5ac: ebaff0a00004	lmg	%r10,%r15,160(%r15)
> >>>>>>>                           000000000069c5b2: 07f4		bcr	15,%r4
> >>>>>>>                           000000000069c5b4: c0e5fffffeea	brasl	%r14,69c388
> >>>>>>>                           000000000069c5ba: a7f4fff6		brc	15,69c5a6
> >>>>>>> [   21.455067] Call Trace:
> >>>>>>> [   21.455072] ([<00000001b691fd98>] 0x1b691fd98)
> >>>>>>> [   21.455079]  [<000000000069c692>] blk_mq_run_hw_queue+0xba/0x100 
> >>>>>>> [   21.455083]  [<000000000069c740>] blk_mq_run_hw_queues+0x68/0x88 
> >>>>>>> [   21.455089]  [<000000000069b956>] __blk_mq_complete_request+0x11e/0x1d8 
> >>>>>>> [   21.455091]  [<000000000069ba9c>] blk_mq_complete_request+0x8c/0xc8 
> >>>>>>> [   21.455103]  [<00000000008aa250>] dasd_block_tasklet+0x158/0x490 
> >>>>>>> [   21.455110]  [<000000000014c742>] tasklet_hi_action+0x92/0x120 
> >>>>>>> [   21.455118]  [<0000000000a7cfc0>] __do_softirq+0x120/0x348 
> >>>>>>> [   21.455122]  [<000000000014c212>] irq_exit+0xba/0xd0 
> >>>>>>> [   21.455130]  [<000000000010bf92>] do_IRQ+0x8a/0xb8 
> >>>>>>> [   21.455133]  [<0000000000a7c298>] io_int_handler+0x130/0x298 
> >>>>>>> [   21.455136] Last Breaking-Event-Address:
> >>>>>>> [   21.455138]  [<000000000069c5a2>] __blk_mq_delay_run_hw_queue+0xba/0xd8
> >>>>>>> [   21.455140] ---[ end trace be43f99a5d1e553e ]---
> >>>>>>> [   21.510046] dasdconf.sh Warning: 0.0.241e is already online, not configuring
> >>>>>>
> >>>>>> Thinking about this issue further, I can't understand the root cause for
> >>>>>> this issue.
> >>
> >> FWIW, Limiting CONFIG_NR_CPUS to 64 seems to make the problem go away.
> > 
> > I think the following patch is needed, and this way aligns to the mapping
> > created via managed IRQ at least.
> > 
> > diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c
> > index 9f8cffc8a701..638ab5c11b3c 100644
> > --- a/block/blk-mq-cpumap.c
> > +++ b/block/blk-mq-cpumap.c
> > @@ -14,13 +14,12 @@
> >  #include "blk.h"
> >  #include "blk-mq.h"
> > 
> > +/*
> > + * Given there isn't CPU hotplug handler in blk-mq, map all possible CPUs to
> > + * queues even it isn't present yet.
> > + */
> >  static int cpu_to_queue_index(unsigned int nr_queues, const int cpu)
> >  {
> > -	/*
> > -	 * Non present CPU will be mapped to queue index 0.
> > -	 */
> > -	if (!cpu_present(cpu))
> > -		return 0;
> >  	return cpu % nr_queues;
> >  }
> > 
> > Thanks,
> > Ming
> > 
> 
> With that I no longer see the WARN_ON but the other warning instead:
> 
> [   31.903096] audit: type=1130 audit(1522318064.439:41): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-tmpfiles-setup-dev comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
> [   31.903100] audit: type=1131 audit(1522318064.439:42): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-tmpfiles-setup-dev comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
> [   31.985756] systemd-journald[379]: Received SIGTERM from PID 1 (systemd).
> [   32.000543] systemd: 18 output lines suppressed due to ratelimiting
> [   32.209496] EXT4-fs (dasdc1): re-mounted. Opts: (null)
> [   32.234808] systemd-journald[2490]: Received request to flush runtime journal from PID 1
> [   32.359832] tun: Universal TUN/TAP device driver, 1.6
> [   32.470841] run queue from wrong CPU 18, hctx active

But your 'lscpu' log showed that you only have 16 CPUs online (0-15), and
you also said CPU hotplug isn't involved in your test, so I am just
wondering where CPU 18 comes from?

Thanks,
Ming

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues()
  2018-03-29 10:48                   ` Ming Lei
@ 2018-03-29 10:49                     ` Christian Borntraeger
  2018-03-29 11:43                       ` Ming Lei
  0 siblings, 1 reply; 40+ messages in thread
From: Christian Borntraeger @ 2018-03-29 10:49 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Stefan Haberland,
	Christoph Hellwig



On 03/29/2018 12:48 PM, Ming Lei wrote:
> On Thu, Mar 29, 2018 at 12:10:11PM +0200, Christian Borntraeger wrote:
>>
>>
>> On 03/29/2018 11:40 AM, Ming Lei wrote:
>>> On Thu, Mar 29, 2018 at 11:09:08AM +0200, Christian Borntraeger wrote:
>>>>
>>>>
>>>> On 03/29/2018 09:23 AM, Christian Borntraeger wrote:
>>>>>
>>>>>
>>>>> On 03/29/2018 04:00 AM, Ming Lei wrote:
>>>>>> On Wed, Mar 28, 2018 at 05:36:53PM +0200, Christian Borntraeger wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 03/28/2018 05:26 PM, Ming Lei wrote:
>>>>>>>> Hi Christian,
>>>>>>>>
>>>>>>>> On Wed, Mar 28, 2018 at 09:45:10AM +0200, Christian Borntraeger wrote:
>>>>>>>>> FWIW, this patch does not fix the issue for me:
>>>>>>>>>
>>>>>>>>> ostname=? addr=? terminal=? res=success'
>>>>>>>>> [   21.454961] WARNING: CPU: 3 PID: 1882 at block/blk-mq.c:1410 __blk_mq_delay_run_hw_queue+0xbe/0xd8
>>>>>>>>> [   21.454968] Modules linked in: scsi_dh_rdac scsi_dh_emc scsi_dh_alua dm_mirror dm_region_hash dm_log dm_multipath dm_mod autofs4
>>>>>>>>> [   21.454984] CPU: 3 PID: 1882 Comm: dasdconf.sh Not tainted 4.16.0-rc7+ #26
>>>>>>>>> [   21.454987] Hardware name: IBM 2964 NC9 704 (LPAR)
>>>>>>>>> [   21.454990] Krnl PSW : 00000000c0131ea3 000000003ea2f7bf (__blk_mq_delay_run_hw_queue+0xbe/0xd8)
>>>>>>>>> [   21.454996]            R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
>>>>>>>>> [   21.455005] Krnl GPRS: 0000013abb69a000 0000013a00000000 0000013ac6c0dc00 0000000000000001
>>>>>>>>> [   21.455008]            0000000000000000 0000013abb69a710 0000013a00000000 00000001b691fd98
>>>>>>>>> [   21.455011]            00000001b691fd98 0000013ace4775c8 0000000000000001 0000000000000000
>>>>>>>>> [   21.455014]            0000013ac6c0dc00 0000000000b47238 00000001b691fc08 00000001b691fbd0
>>>>>>>>> [   21.455032] Krnl Code: 000000000069c596: ebaff0a00004	lmg	%r10,%r15,160(%r15)
>>>>>>>>>                           000000000069c59c: c0f4ffff7a5e	brcl	15,68ba58
>>>>>>>>>                          #000000000069c5a2: a7f40001		brc	15,69c5a4
>>>>>>>>>                          >000000000069c5a6: e340f0c00004	lg	%r4,192(%r15)
>>>>>>>>>                           000000000069c5ac: ebaff0a00004	lmg	%r10,%r15,160(%r15)
>>>>>>>>>                           000000000069c5b2: 07f4		bcr	15,%r4
>>>>>>>>>                           000000000069c5b4: c0e5fffffeea	brasl	%r14,69c388
>>>>>>>>>                           000000000069c5ba: a7f4fff6		brc	15,69c5a6
>>>>>>>>> [   21.455067] Call Trace:
>>>>>>>>> [   21.455072] ([<00000001b691fd98>] 0x1b691fd98)
>>>>>>>>> [   21.455079]  [<000000000069c692>] blk_mq_run_hw_queue+0xba/0x100 
>>>>>>>>> [   21.455083]  [<000000000069c740>] blk_mq_run_hw_queues+0x68/0x88 
>>>>>>>>> [   21.455089]  [<000000000069b956>] __blk_mq_complete_request+0x11e/0x1d8 
>>>>>>>>> [   21.455091]  [<000000000069ba9c>] blk_mq_complete_request+0x8c/0xc8 
>>>>>>>>> [   21.455103]  [<00000000008aa250>] dasd_block_tasklet+0x158/0x490 
>>>>>>>>> [   21.455110]  [<000000000014c742>] tasklet_hi_action+0x92/0x120 
>>>>>>>>> [   21.455118]  [<0000000000a7cfc0>] __do_softirq+0x120/0x348 
>>>>>>>>> [   21.455122]  [<000000000014c212>] irq_exit+0xba/0xd0 
>>>>>>>>> [   21.455130]  [<000000000010bf92>] do_IRQ+0x8a/0xb8 
>>>>>>>>> [   21.455133]  [<0000000000a7c298>] io_int_handler+0x130/0x298 
>>>>>>>>> [   21.455136] Last Breaking-Event-Address:
>>>>>>>>> [   21.455138]  [<000000000069c5a2>] __blk_mq_delay_run_hw_queue+0xba/0xd8
>>>>>>>>> [   21.455140] ---[ end trace be43f99a5d1e553e ]---
>>>>>>>>> [   21.510046] dasdconf.sh Warning: 0.0.241e is already online, not configuring
>>>>>>>>
>>>>>>>> Thinking about this issue further, I can't understand the root cause for
>>>>>>>> this issue.
>>>>
>>>> FWIW, Limiting CONFIG_NR_CPUS to 64 seems to make the problem go away.
>>>
>>> I think the following patch is needed, and this way aligns to the mapping
>>> created via managed IRQ at least.
>>>
>>> diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c
>>> index 9f8cffc8a701..638ab5c11b3c 100644
>>> --- a/block/blk-mq-cpumap.c
>>> +++ b/block/blk-mq-cpumap.c
>>> @@ -14,13 +14,12 @@
>>>  #include "blk.h"
>>>  #include "blk-mq.h"
>>>
>>> +/*
>>> + * Given there isn't CPU hotplug handler in blk-mq, map all possible CPUs to
>>> + * queues even it isn't present yet.
>>> + */
>>>  static int cpu_to_queue_index(unsigned int nr_queues, const int cpu)
>>>  {
>>> -	/*
>>> -	 * Non present CPU will be mapped to queue index 0.
>>> -	 */
>>> -	if (!cpu_present(cpu))
>>> -		return 0;
>>>  	return cpu % nr_queues;
>>>  }
>>>
>>> Thanks,
>>> Ming
>>>
>>
>> With that I no longer see the WARN_ON but the other warning instead:
>>
>> [   31.903096] audit: type=1130 audit(1522318064.439:41): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-tmpfiles-setup-dev comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
>> [   31.903100] audit: type=1131 audit(1522318064.439:42): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-tmpfiles-setup-dev comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
>> [   31.985756] systemd-journald[379]: Received SIGTERM from PID 1 (systemd).
>> [   32.000543] systemd: 18 output lines suppressed due to ratelimiting
>> [   32.209496] EXT4-fs (dasdc1): re-mounted. Opts: (null)
>> [   32.234808] systemd-journald[2490]: Received request to flush runtime journal from PID 1
>> [   32.359832] tun: Universal TUN/TAP device driver, 1.6
>> [   32.470841] run queue from wrong CPU 18, hctx active
> 
> But your 'lscpu' log showed that you only have 16 CPUs online (0-15), and
> you also said CPU hotplug isn't involved in your test, so I am just
> wondering where CPU 18 comes from?


I have 2 test systems, one with 44 CPUs and the other with 16.
The one with 16 now shows the same flood of messages:


[    4.454510]  dasdc:VOL1/  0X3F77: dasdc1
[    4.454592]  dasdd:VOL1/  0X3F74: dasdd1
[    4.593711] run queue from wrong CPU 0, hctx active
[    4.593715] CPU: 0 PID: 4 Comm: kworker/0:0H Not tainted 4.16.0-rc7+ #1
[    4.593717] Hardware name: IBM 2964 NC9 704 (LPAR)
[    4.593724] Workqueue: kblockd blk_mq_run_work_fn
[    4.593726] Call Trace:
[    4.593731] ([<0000000000113b86>] show_stack+0x56/0x80)
[    4.593735]  [<0000000000a5acd2>] dump_stack+0x82/0xb0 
[    4.593737]  [<000000000069a3ea>] __blk_mq_run_hw_queue+0x12a/0x130 
[    4.593741]  [<0000000000162ede>] process_one_work+0x1be/0x420 
[    4.593742]  [<0000000000163198>] worker_thread+0x58/0x458 
[    4.593745]  [<0000000000169fa8>] kthread+0x148/0x160 
[    4.593748]  [<0000000000a79e72>] kernel_thread_starter+0x6/0xc 
[    4.593749]  [<0000000000a79e6c>] kernel_thread_starter+0x0/0xc 
[    4.611606] dasdconf.sh Warning: 0.0.3f75 is already online, not configuring
[    4.626454] run queue from wrong CPU 10, hctx active
[    4.626456] CPU: 10 PID: 62 Comm: kworker/10:0H Not tainted 4.16.0-rc7+ #1
[    4.626458] Hardware name: IBM 2964 NC9 704 (LPAR)
[    4.626462] Workqueue: kblockd blk_mq_run_work_fn
[    4.626463] Call Trace:
[    4.626466] ([<0000000000113b86>] show_stack+0x56/0x80)
[    4.626468]  [<0000000000a5acd2>] dump_stack+0x82/0xb0 
[    4.626469]  [<000000000069a3ea>] __blk_mq_run_hw_queue+0x12a/0x130 
[    4.626471]  [<0000000000162ede>] process_one_work+0x1be/0x420 
[    4.626473]  [<0000000000163198>] worker_thread+0x58/0x458 
[    4.626474]  [<0000000000169fa8>] kthread+0x148/0x160 
[    4.626476]  [<0000000000a79e72>] kernel_thread_starter+0x6/0xc 
[    4.626477]  [<0000000000a79e6c>] kernel_thread_starter+0x0/0xc 
[    4.699514] dasdconf.sh Warning: 0.0.3f76 is already online, not configuring
[    4.709725] random: crng init done
[    4.711200] dasdconf.sh Warning: 0.0.3f74 is already online, not configuring
[    4.718452] dasdconf.sh Warning: 0.0.3f77 is already online, not configuring
[    4.726455] EXT4-fs (dasdd1): mounted filesystem with ordered data mode. Opts: (null)
[    5.075280] systemd-journald[208]: Received SIGTERM from PID 1 (systemd).
[    5.114536] run queue from wrong CPU 8, hctx active
[    5.114539] CPU: 8 PID: 1542 Comm: kworker/8:1H Not tainted 4.16.0-rc7+ #1
[    5.114541] Hardware name: IBM 2964 NC9 704 (LPAR)
[    5.114544] Workqueue: kblockd blk_mq_run_work_fn
[    5.114545] Call Trace:
[    5.114548] ([<0000000000113b86>] show_stack+0x56/0x80)
[    5.114550]  [<0000000000a5acd2>] dump_stack+0x82/0xb0 
[    5.114551]  [<000000000069a3ea>] __blk_mq_run_hw_queue+0x12a/0x130 
[    5.114553]  [<0000000000162ede>] process_one_work+0x1be/0x420 
[    5.114555]  [<0000000000163198>] worker_thread+0x58/0x458 
[    5.114556]  [<0000000000169fa8>] kthread+0x148/0x160 
[    5.114558]  [<0000000000a79e72>] kernel_thread_starter+0x6/0xc 
[    5.114559]  [<0000000000a79e6c>] kernel_thread_starter+0x0/0xc 
[    5.137222] systemd: 16 output lines suppressed due to ratelimiting
[    5.663932] run queue from wrong CPU 7, hctx active
[    5.663959] CPU: 7 PID: 1574 Comm: kworker/7:1H Not tainted 4.16.0-rc7+ #1
[    5.663972] Hardware name: IBM 2964 NC9 704 (LPAR)
[    5.663999] Workqueue: kblockd blk_mq_run_work_fn
[    5.664012] Call Trace:
[    5.664034] ([<0000000000113b86>] show_stack+0x56/0x80)
[    5.664053]  [<0000000000a5acd2>] dump_stack+0x82/0xb0 
[    5.664064]  [<000000000069a3ea>] __blk_mq_run_hw_queue+0x12a/0x130 
[    5.664082]  [<0000000000162ede>] process_one_work+0x1be/0x420 
[    5.664097]  [<0000000000163198>] worker_thread+0x58/0x458 
[    5.664110]  [<0000000000169fa8>] kthread+0x148/0x160 
[    5.664123]  [<0000000000a79e72>] kernel_thread_starter+0x6/0xc 
[    5.664136]  [<0000000000a79e6c>] kernel_thread_starter+0x0/0xc 
[    5.796783] run queue from wrong CPU 7, hctx active
[    5.796811] CPU: 7 PID: 1574 Comm: kworker/7:1H Not tainted 4.16.0-rc7+ #1
[    5.796828] Hardware name: IBM 2964 NC9 704 (LPAR)
[    5.796850] Workqueue: kblockd blk_mq_run_work_fn
[    5.796866] Call Trace:
[    5.796874] ([<0000000000113b86>] show_stack+0x56/0x80)
[    5.796878]  [<0000000000a5acd2>] dump_stack+0x82/0xb0 
[    5.796888]  [<000000000069a3ea>] __blk_mq_run_hw_queue+0x12a/0x130 
[    5.796902]  [<0000000000162ede>] process_one_work+0x1be/0x420 
[    5.796917]  [<0000000000163198>] worker_thread+0x58/0x458 
[    5.796931]  [<0000000000169fa8>] kthread+0x148/0x160 
[    5.796944]  [<0000000000a79e72>] kernel_thread_starter+0x6/0xc 
[    5.796957]  [<0000000000a79e6c>] kernel_thread_starter+0x0/0xc 
[    5.824996] run queue from wrong CPU 7, hctx active
[    5.825017] CPU: 7 PID: 1574 Comm: kworker/7:1H Not tainted 4.16.0-rc7+ #1
[    5.825028] Hardware name: IBM 2964 NC9 704 (LPAR)
[    5.825046] Workqueue: kblockd blk_mq_run_work_fn
[    5.825061] Call Trace:
[    5.825076] ([<0000000000113b86>] show_stack+0x56/0x80)
[    5.825089]  [<0000000000a5acd2>] dump_stack+0x82/0xb0 
[    5.825105]  [<000000000069a3ea>] __blk_mq_run_hw_queue+0x12a/0x130 
[    5.825119]  [<0000000000162ede>] process_one_work+0x1be/0x420 
[    5.825133]  [<0000000000163198>] worker_thread+0x58/0x458 
[    5.825147]  [<0000000000169fa8>] kthread+0x148/0x160 
[    5.825160]  [<0000000000a79e72>] kernel_thread_starter+0x6/0xc 
[    5.825176]  [<0000000000a79e6c>] kernel_thread_starter+0x0/0xc 
[    5.900186] run queue from wrong CPU 7, hctx active
[    5.900211] CPU: 7 PID: 1574 Comm: kworker/7:1H Not tainted 4.16.0-rc7+ #1
[    5.900246] Hardware name: IBM 2964 NC9 704 (LPAR)
[    5.900269] Workqueue: kblockd blk_mq_run_work_fn
[    5.900280] Call Trace:
[    5.900298] ([<0000000000113b86>] show_stack+0x56/0x80)
[    5.900314]  [<0000000000a5acd2>] dump_stack+0x82/0xb0 
[    5.900318]  [<000000000069a3ea>] __blk_mq_run_hw_queue+0x12a/0x130 
[    5.900322]  [<0000000000162ede>] process_one_work+0x1be/0x420 
[    5.900338]  [<0000000000163198>] worker_thread+0x58/0x458 
[    5.900351]  [<0000000000169fa8>] kthread+0x148/0x160 
[    5.900365]  [<0000000000a79e72>] kernel_thread_starter+0x6/0xc 
[    5.900379]  [<0000000000a79e6c>] kernel_thread_starter+0x0/0xc 
[    6.221462] EXT4-fs (dasdd1): re-mounted. Opts: (null)
[    6.249875] run queue from wrong CPU 4, hctx active
[    6.249883] CPU: 4 PID: 1515 Comm: kworker/4:1H Not tainted 4.16.0-rc7+ #1
[    6.249886] Hardware name: IBM 2964 NC9 704 (LPAR)
[    6.249892] Workqueue: kblockd blk_mq_run_work_fn
[    6.249895] Call Trace:
[    6.249899] ([<0000000000113b86>] sho

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues()
  2018-03-29 10:49                     ` Christian Borntraeger
@ 2018-03-29 11:43                       ` Ming Lei
  2018-03-29 11:49                         ` Christian Borntraeger
  0 siblings, 1 reply; 40+ messages in thread
From: Ming Lei @ 2018-03-29 11:43 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Stefan Haberland,
	Christoph Hellwig

On Thu, Mar 29, 2018 at 12:49:55PM +0200, Christian Borntraeger wrote:
> 
> 
> On 03/29/2018 12:48 PM, Ming Lei wrote:
> > On Thu, Mar 29, 2018 at 12:10:11PM +0200, Christian Borntraeger wrote:
> >>
> >>
> >> On 03/29/2018 11:40 AM, Ming Lei wrote:
> >>> On Thu, Mar 29, 2018 at 11:09:08AM +0200, Christian Borntraeger wrote:
> >>>>
> >>>>
> >>>> On 03/29/2018 09:23 AM, Christian Borntraeger wrote:
> >>>>>
> >>>>>
> >>>>> On 03/29/2018 04:00 AM, Ming Lei wrote:
> >>>>>> On Wed, Mar 28, 2018 at 05:36:53PM +0200, Christian Borntraeger wrote:
> >>>>>>>
> >>>>>>>
> >>>>>>> On 03/28/2018 05:26 PM, Ming Lei wrote:
> >>>>>>>> Hi Christian,
> >>>>>>>>
> >>>>>>>> On Wed, Mar 28, 2018 at 09:45:10AM +0200, Christian Borntraeger wrote:
> >>>>>>>>> FWIW, this patch does not fix the issue for me:
> >>>>>>>>>
> >>>>>>>>> ostname=? addr=? terminal=? res=success'
> >>>>>>>>> [   21.454961] WARNING: CPU: 3 PID: 1882 at block/blk-mq.c:1410 __blk_mq_delay_run_hw_queue+0xbe/0xd8
> >>>>>>>>> [   21.454968] Modules linked in: scsi_dh_rdac scsi_dh_emc scsi_dh_alua dm_mirror dm_region_hash dm_log dm_multipath dm_mod autofs4
> >>>>>>>>> [   21.454984] CPU: 3 PID: 1882 Comm: dasdconf.sh Not tainted 4.16.0-rc7+ #26
> >>>>>>>>> [   21.454987] Hardware name: IBM 2964 NC9 704 (LPAR)
> >>>>>>>>> [   21.454990] Krnl PSW : 00000000c0131ea3 000000003ea2f7bf (__blk_mq_delay_run_hw_queue+0xbe/0xd8)
> >>>>>>>>> [   21.454996]            R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
> >>>>>>>>> [   21.455005] Krnl GPRS: 0000013abb69a000 0000013a00000000 0000013ac6c0dc00 0000000000000001
> >>>>>>>>> [   21.455008]            0000000000000000 0000013abb69a710 0000013a00000000 00000001b691fd98
> >>>>>>>>> [   21.455011]            00000001b691fd98 0000013ace4775c8 0000000000000001 0000000000000000
> >>>>>>>>> [   21.455014]            0000013ac6c0dc00 0000000000b47238 00000001b691fc08 00000001b691fbd0
> >>>>>>>>> [   21.455032] Krnl Code: 000000000069c596: ebaff0a00004	lmg	%r10,%r15,160(%r15)
> >>>>>>>>>                           000000000069c59c: c0f4ffff7a5e	brcl	15,68ba58
> >>>>>>>>>                          #000000000069c5a2: a7f40001		brc	15,69c5a4
> >>>>>>>>>                          >000000000069c5a6: e340f0c00004	lg	%r4,192(%r15)
> >>>>>>>>>                           000000000069c5ac: ebaff0a00004	lmg	%r10,%r15,160(%r15)
> >>>>>>>>>                           000000000069c5b2: 07f4		bcr	15,%r4
> >>>>>>>>>                           000000000069c5b4: c0e5fffffeea	brasl	%r14,69c388
> >>>>>>>>>                           000000000069c5ba: a7f4fff6		brc	15,69c5a6
> >>>>>>>>> [   21.455067] Call Trace:
> >>>>>>>>> [   21.455072] ([<00000001b691fd98>] 0x1b691fd98)
> >>>>>>>>> [   21.455079]  [<000000000069c692>] blk_mq_run_hw_queue+0xba/0x100 
> >>>>>>>>> [   21.455083]  [<000000000069c740>] blk_mq_run_hw_queues+0x68/0x88 
> >>>>>>>>> [   21.455089]  [<000000000069b956>] __blk_mq_complete_request+0x11e/0x1d8 
> >>>>>>>>> [   21.455091]  [<000000000069ba9c>] blk_mq_complete_request+0x8c/0xc8 
> >>>>>>>>> [   21.455103]  [<00000000008aa250>] dasd_block_tasklet+0x158/0x490 
> >>>>>>>>> [   21.455110]  [<000000000014c742>] tasklet_hi_action+0x92/0x120 
> >>>>>>>>> [   21.455118]  [<0000000000a7cfc0>] __do_softirq+0x120/0x348 
> >>>>>>>>> [   21.455122]  [<000000000014c212>] irq_exit+0xba/0xd0 
> >>>>>>>>> [   21.455130]  [<000000000010bf92>] do_IRQ+0x8a/0xb8 
> >>>>>>>>> [   21.455133]  [<0000000000a7c298>] io_int_handler+0x130/0x298 
> >>>>>>>>> [   21.455136] Last Breaking-Event-Address:
> >>>>>>>>> [   21.455138]  [<000000000069c5a2>] __blk_mq_delay_run_hw_queue+0xba/0xd8
> >>>>>>>>> [   21.455140] ---[ end trace be43f99a5d1e553e ]---
> >>>>>>>>> [   21.510046] dasdconf.sh Warning: 0.0.241e is already online, not configuring
> >>>>>>>>
> >>>>>>>> Thinking about this issue further, I can't understand the root cause for
> >>>>>>>> this issue.
> >>>>
> >>>> FWIW, Limiting CONFIG_NR_CPUS to 64 seems to make the problem go away.
> >>>
> >>> I think the following patch is needed, and this way aligns to the mapping
> >>> created via managed IRQ at least.
> >>>
> >>> diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c
> >>> index 9f8cffc8a701..638ab5c11b3c 100644
> >>> --- a/block/blk-mq-cpumap.c
> >>> +++ b/block/blk-mq-cpumap.c
> >>> @@ -14,13 +14,12 @@
> >>>  #include "blk.h"
> >>>  #include "blk-mq.h"
> >>>
> >>> +/*
> >>> + * Given there isn't CPU hotplug handler in blk-mq, map all possible CPUs to
> >>> + * queues even it isn't present yet.
> >>> + */
> >>>  static int cpu_to_queue_index(unsigned int nr_queues, const int cpu)
> >>>  {
> >>> -	/*
> >>> -	 * Non present CPU will be mapped to queue index 0.
> >>> -	 */
> >>> -	if (!cpu_present(cpu))
> >>> -		return 0;
> >>>  	return cpu % nr_queues;
> >>>  }
> >>>
> >>> Thanks,
> >>> Ming
> >>>
> >>
> >> With that I no longer see the WARN_ON but the other warning instead:
> >>
> >> [   31.903096] audit: type=1130 audit(1522318064.439:41): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-tmpfiles-setup-dev comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
> >> [   31.903100] audit: type=1131 audit(1522318064.439:42): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-tmpfiles-setup-dev comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
> >> [   31.985756] systemd-journald[379]: Received SIGTERM from PID 1 (systemd).
> >> [   32.000543] systemd: 18 output lines suppressed due to ratelimiting
> >> [   32.209496] EXT4-fs (dasdc1): re-mounted. Opts: (null)
> >> [   32.234808] systemd-journald[2490]: Received request to flush runtime journal from PID 1
> >> [   32.359832] tun: Universal TUN/TAP device driver, 1.6
> >> [   32.470841] run queue from wrong CPU 18, hctx active
> > 
> > But your 'lscpu' log showed that you only have 16 CPUs online(0~15) and
> > you also said CPU hotplug isn't involved in your test, so I am just
> > wondering where the CPU 18 is from?
> 
> 
> I have 2 test systems. One with 44CPU the other with 16.
> The one  with 16 now has the same flood of messages:
> 
> 
>     4.454510]  dasdc:VOL1/  0X3F77: dasdc1
> [    4.454592]  dasdd:VOL1/  0X3F74: dasdd1
> [    4.593711] run queue from wrong CPU 0, hctx active

I still can't reproduce your issue. Could you please collect the debugfs
log again on your 16-core system after applying the patch to blk-mq-cpumap.c?

Thanks,
Ming

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues()
  2018-03-29 11:43                       ` Ming Lei
@ 2018-03-29 11:49                         ` Christian Borntraeger
  2018-03-30  2:53                           ` Ming Lei
  0 siblings, 1 reply; 40+ messages in thread
From: Christian Borntraeger @ 2018-03-29 11:49 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Stefan Haberland,
	Christoph Hellwig

[-- Attachment #1: Type: text/plain, Size: 6559 bytes --]



On 03/29/2018 01:43 PM, Ming Lei wrote:
> On Thu, Mar 29, 2018 at 12:49:55PM +0200, Christian Borntraeger wrote:
>>
>>
>> On 03/29/2018 12:48 PM, Ming Lei wrote:
>>> On Thu, Mar 29, 2018 at 12:10:11PM +0200, Christian Borntraeger wrote:
>>>>
>>>>
>>>> On 03/29/2018 11:40 AM, Ming Lei wrote:
>>>>> On Thu, Mar 29, 2018 at 11:09:08AM +0200, Christian Borntraeger wrote:
>>>>>>
>>>>>>
>>>>>> On 03/29/2018 09:23 AM, Christian Borntraeger wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 03/29/2018 04:00 AM, Ming Lei wrote:
>>>>>>>> On Wed, Mar 28, 2018 at 05:36:53PM +0200, Christian Borntraeger wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 03/28/2018 05:26 PM, Ming Lei wrote:
>>>>>>>>>> Hi Christian,
>>>>>>>>>>
>>>>>>>>>> On Wed, Mar 28, 2018 at 09:45:10AM +0200, Christian Borntraeger wrote:
>>>>>>>>>>> FWIW, this patch does not fix the issue for me:
>>>>>>>>>>>
>>>>>>>>>>> ostname=? addr=? terminal=? res=success'
>>>>>>>>>>> [   21.454961] WARNING: CPU: 3 PID: 1882 at block/blk-mq.c:1410 __blk_mq_delay_run_hw_queue+0xbe/0xd8
>>>>>>>>>>> [   21.454968] Modules linked in: scsi_dh_rdac scsi_dh_emc scsi_dh_alua dm_mirror dm_region_hash dm_log dm_multipath dm_mod autofs4
>>>>>>>>>>> [   21.454984] CPU: 3 PID: 1882 Comm: dasdconf.sh Not tainted 4.16.0-rc7+ #26
>>>>>>>>>>> [   21.454987] Hardware name: IBM 2964 NC9 704 (LPAR)
>>>>>>>>>>> [   21.454990] Krnl PSW : 00000000c0131ea3 000000003ea2f7bf (__blk_mq_delay_run_hw_queue+0xbe/0xd8)
>>>>>>>>>>> [   21.454996]            R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
>>>>>>>>>>> [   21.455005] Krnl GPRS: 0000013abb69a000 0000013a00000000 0000013ac6c0dc00 0000000000000001
>>>>>>>>>>> [   21.455008]            0000000000000000 0000013abb69a710 0000013a00000000 00000001b691fd98
>>>>>>>>>>> [   21.455011]            00000001b691fd98 0000013ace4775c8 0000000000000001 0000000000000000
>>>>>>>>>>> [   21.455014]            0000013ac6c0dc00 0000000000b47238 00000001b691fc08 00000001b691fbd0
>>>>>>>>>>> [   21.455032] Krnl Code: 000000000069c596: ebaff0a00004	lmg	%r10,%r15,160(%r15)
>>>>>>>>>>>                           000000000069c59c: c0f4ffff7a5e	brcl	15,68ba58
>>>>>>>>>>>                          #000000000069c5a2: a7f40001		brc	15,69c5a4
>>>>>>>>>>>                          >000000000069c5a6: e340f0c00004	lg	%r4,192(%r15)
>>>>>>>>>>>                           000000000069c5ac: ebaff0a00004	lmg	%r10,%r15,160(%r15)
>>>>>>>>>>>                           000000000069c5b2: 07f4		bcr	15,%r4
>>>>>>>>>>>                           000000000069c5b4: c0e5fffffeea	brasl	%r14,69c388
>>>>>>>>>>>                           000000000069c5ba: a7f4fff6		brc	15,69c5a6
>>>>>>>>>>> [   21.455067] Call Trace:
>>>>>>>>>>> [   21.455072] ([<00000001b691fd98>] 0x1b691fd98)
>>>>>>>>>>> [   21.455079]  [<000000000069c692>] blk_mq_run_hw_queue+0xba/0x100 
>>>>>>>>>>> [   21.455083]  [<000000000069c740>] blk_mq_run_hw_queues+0x68/0x88 
>>>>>>>>>>> [   21.455089]  [<000000000069b956>] __blk_mq_complete_request+0x11e/0x1d8 
>>>>>>>>>>> [   21.455091]  [<000000000069ba9c>] blk_mq_complete_request+0x8c/0xc8 
>>>>>>>>>>> [   21.455103]  [<00000000008aa250>] dasd_block_tasklet+0x158/0x490 
>>>>>>>>>>> [   21.455110]  [<000000000014c742>] tasklet_hi_action+0x92/0x120 
>>>>>>>>>>> [   21.455118]  [<0000000000a7cfc0>] __do_softirq+0x120/0x348 
>>>>>>>>>>> [   21.455122]  [<000000000014c212>] irq_exit+0xba/0xd0 
>>>>>>>>>>> [   21.455130]  [<000000000010bf92>] do_IRQ+0x8a/0xb8 
>>>>>>>>>>> [   21.455133]  [<0000000000a7c298>] io_int_handler+0x130/0x298 
>>>>>>>>>>> [   21.455136] Last Breaking-Event-Address:
>>>>>>>>>>> [   21.455138]  [<000000000069c5a2>] __blk_mq_delay_run_hw_queue+0xba/0xd8
>>>>>>>>>>> [   21.455140] ---[ end trace be43f99a5d1e553e ]---
>>>>>>>>>>> [   21.510046] dasdconf.sh Warning: 0.0.241e is already online, not configuring
>>>>>>>>>>
>>>>>>>>>> Thinking about this issue further, I can't understand the root cause for
>>>>>>>>>> this issue.
>>>>>>
>>>>>> FWIW, Limiting CONFIG_NR_CPUS to 64 seems to make the problem go away.
>>>>>
>>>>> I think the following patch is needed, and this way aligns to the mapping
>>>>> created via managed IRQ at least.
>>>>>
>>>>> diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c
>>>>> index 9f8cffc8a701..638ab5c11b3c 100644
>>>>> --- a/block/blk-mq-cpumap.c
>>>>> +++ b/block/blk-mq-cpumap.c
>>>>> @@ -14,13 +14,12 @@
>>>>>  #include "blk.h"
>>>>>  #include "blk-mq.h"
>>>>>
>>>>> +/*
>>>>> + * Given there isn't CPU hotplug handler in blk-mq, map all possible CPUs to
>>>>> + * queues even it isn't present yet.
>>>>> + */
>>>>>  static int cpu_to_queue_index(unsigned int nr_queues, const int cpu)
>>>>>  {
>>>>> -	/*
>>>>> -	 * Non present CPU will be mapped to queue index 0.
>>>>> -	 */
>>>>> -	if (!cpu_present(cpu))
>>>>> -		return 0;
>>>>>  	return cpu % nr_queues;
>>>>>  }
>>>>>
>>>>> Thanks,
>>>>> Ming
>>>>>
>>>>
>>>> With that I no longer see the WARN_ON but the other warning instead:
>>>>
>>>> [   31.903096] audit: type=1130 audit(1522318064.439:41): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-tmpfiles-setup-dev comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
>>>> [   31.903100] audit: type=1131 audit(1522318064.439:42): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-tmpfiles-setup-dev comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
>>>> [   31.985756] systemd-journald[379]: Received SIGTERM from PID 1 (systemd).
>>>> [   32.000543] systemd: 18 output lines suppressed due to ratelimiting
>>>> [   32.209496] EXT4-fs (dasdc1): re-mounted. Opts: (null)
>>>> [   32.234808] systemd-journald[2490]: Received request to flush runtime journal from PID 1
>>>> [   32.359832] tun: Universal TUN/TAP device driver, 1.6
>>>> [   32.470841] run queue from wrong CPU 18, hctx active
>>>
>>> But your 'lscpu' log showed that you only have 16 CPUs online(0~15) and
>>> you also said CPU hotplug isn't involved in your test, so I am just
>>> wondering where the CPU 18 is from?
>>
>>
>> I have 2 test systems. One with 44CPU the other with 16.
>> The one  with 16 now has the same flood of messages:
>>
>>
>>     4.454510]  dasdc:VOL1/  0X3F77: dasdc1
>> [    4.454592]  dasdd:VOL1/  0X3F74: dasdd1
>> [    4.593711] run queue from wrong CPU 0, hctx active
> 
> Still can't reproduce your issue, could you please collect debugfs
> log again on your 16-core system after applying the patch on blk_mq_cpumap.c?

done.

If you need anything quick, you can reach me via irc freenode or oftc (user borntraeger)

[-- Attachment #2: dmesg.gz --]
[-- Type: application/gzip, Size: 10778 bytes --]

[-- Attachment #3: log.gz --]
[-- Type: application/gzip, Size: 338323 bytes --]

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues()
  2018-03-29 11:49                         ` Christian Borntraeger
@ 2018-03-30  2:53                           ` Ming Lei
  2018-04-04  8:18                             ` Christian Borntraeger
  0 siblings, 1 reply; 40+ messages in thread
From: Ming Lei @ 2018-03-30  2:53 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Stefan Haberland,
	Christoph Hellwig

On Thu, Mar 29, 2018 at 01:49:29PM +0200, Christian Borntraeger wrote:
> 
> 
> On 03/29/2018 01:43 PM, Ming Lei wrote:
> > On Thu, Mar 29, 2018 at 12:49:55PM +0200, Christian Borntraeger wrote:
> >>
> >>
> >> On 03/29/2018 12:48 PM, Ming Lei wrote:
> >>> On Thu, Mar 29, 2018 at 12:10:11PM +0200, Christian Borntraeger wrote:
> >>>>
> >>>>
> >>>> On 03/29/2018 11:40 AM, Ming Lei wrote:
> >>>>> On Thu, Mar 29, 2018 at 11:09:08AM +0200, Christian Borntraeger wrote:
> >>>>>>
> >>>>>>
> >>>>>> On 03/29/2018 09:23 AM, Christian Borntraeger wrote:
> >>>>>>>
> >>>>>>>
> >>>>>>> On 03/29/2018 04:00 AM, Ming Lei wrote:
> >>>>>>>> On Wed, Mar 28, 2018 at 05:36:53PM +0200, Christian Borntraeger wrote:
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On 03/28/2018 05:26 PM, Ming Lei wrote:
> >>>>>>>>>> Hi Christian,
> >>>>>>>>>>
> >>>>>>>>>> On Wed, Mar 28, 2018 at 09:45:10AM +0200, Christian Borntraeger wrote:
> >>>>>>>>>>> FWIW, this patch does not fix the issue for me:
> >>>>>>>>>>>
> >>>>>>>>>>> ostname=? addr=? terminal=? res=success'
> >>>>>>>>>>> [   21.454961] WARNING: CPU: 3 PID: 1882 at block/blk-mq.c:1410 __blk_mq_delay_run_hw_queue+0xbe/0xd8
> >>>>>>>>>>> [   21.454968] Modules linked in: scsi_dh_rdac scsi_dh_emc scsi_dh_alua dm_mirror dm_region_hash dm_log dm_multipath dm_mod autofs4
> >>>>>>>>>>> [   21.454984] CPU: 3 PID: 1882 Comm: dasdconf.sh Not tainted 4.16.0-rc7+ #26
> >>>>>>>>>>> [   21.454987] Hardware name: IBM 2964 NC9 704 (LPAR)
> >>>>>>>>>>> [   21.454990] Krnl PSW : 00000000c0131ea3 000000003ea2f7bf (__blk_mq_delay_run_hw_queue+0xbe/0xd8)
> >>>>>>>>>>> [   21.454996]            R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
> >>>>>>>>>>> [   21.455005] Krnl GPRS: 0000013abb69a000 0000013a00000000 0000013ac6c0dc00 0000000000000001
> >>>>>>>>>>> [   21.455008]            0000000000000000 0000013abb69a710 0000013a00000000 00000001b691fd98
> >>>>>>>>>>> [   21.455011]            00000001b691fd98 0000013ace4775c8 0000000000000001 0000000000000000
> >>>>>>>>>>> [   21.455014]            0000013ac6c0dc00 0000000000b47238 00000001b691fc08 00000001b691fbd0
> >>>>>>>>>>> [   21.455032] Krnl Code: 000000000069c596: ebaff0a00004	lmg	%r10,%r15,160(%r15)
> >>>>>>>>>>>                           000000000069c59c: c0f4ffff7a5e	brcl	15,68ba58
> >>>>>>>>>>>                          #000000000069c5a2: a7f40001		brc	15,69c5a4
> >>>>>>>>>>>                          >000000000069c5a6: e340f0c00004	lg	%r4,192(%r15)
> >>>>>>>>>>>                           000000000069c5ac: ebaff0a00004	lmg	%r10,%r15,160(%r15)
> >>>>>>>>>>>                           000000000069c5b2: 07f4		bcr	15,%r4
> >>>>>>>>>>>                           000000000069c5b4: c0e5fffffeea	brasl	%r14,69c388
> >>>>>>>>>>>                           000000000069c5ba: a7f4fff6		brc	15,69c5a6
> >>>>>>>>>>> [   21.455067] Call Trace:
> >>>>>>>>>>> [   21.455072] ([<00000001b691fd98>] 0x1b691fd98)
> >>>>>>>>>>> [   21.455079]  [<000000000069c692>] blk_mq_run_hw_queue+0xba/0x100 
> >>>>>>>>>>> [   21.455083]  [<000000000069c740>] blk_mq_run_hw_queues+0x68/0x88 
> >>>>>>>>>>> [   21.455089]  [<000000000069b956>] __blk_mq_complete_request+0x11e/0x1d8 
> >>>>>>>>>>> [   21.455091]  [<000000000069ba9c>] blk_mq_complete_request+0x8c/0xc8 
> >>>>>>>>>>> [   21.455103]  [<00000000008aa250>] dasd_block_tasklet+0x158/0x490 
> >>>>>>>>>>> [   21.455110]  [<000000000014c742>] tasklet_hi_action+0x92/0x120 
> >>>>>>>>>>> [   21.455118]  [<0000000000a7cfc0>] __do_softirq+0x120/0x348 
> >>>>>>>>>>> [   21.455122]  [<000000000014c212>] irq_exit+0xba/0xd0 
> >>>>>>>>>>> [   21.455130]  [<000000000010bf92>] do_IRQ+0x8a/0xb8 
> >>>>>>>>>>> [   21.455133]  [<0000000000a7c298>] io_int_handler+0x130/0x298 
> >>>>>>>>>>> [   21.455136] Last Breaking-Event-Address:
> >>>>>>>>>>> [   21.455138]  [<000000000069c5a2>] __blk_mq_delay_run_hw_queue+0xba/0xd8
> >>>>>>>>>>> [   21.455140] ---[ end trace be43f99a5d1e553e ]---
> >>>>>>>>>>> [   21.510046] dasdconf.sh Warning: 0.0.241e is already online, not configuring
> >>>>>>>>>>
> >>>>>>>>>> Thinking about this issue further, I can't understand the root cause for
> >>>>>>>>>> this issue.
> >>>>>>
> >>>>>> FWIW, Limiting CONFIG_NR_CPUS to 64 seems to make the problem go away.
> >>>>>
> >>>>> I think the following patch is needed, and this way aligns to the mapping
> >>>>> created via managed IRQ at least.
> >>>>>
> >>>>> diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c
> >>>>> index 9f8cffc8a701..638ab5c11b3c 100644
> >>>>> --- a/block/blk-mq-cpumap.c
> >>>>> +++ b/block/blk-mq-cpumap.c
> >>>>> @@ -14,13 +14,12 @@
> >>>>>  #include "blk.h"
> >>>>>  #include "blk-mq.h"
> >>>>>
> >>>>> +/*
> >>>>> + * Given there isn't CPU hotplug handler in blk-mq, map all possible CPUs to
> >>>>> + * queues even it isn't present yet.
> >>>>> + */
> >>>>>  static int cpu_to_queue_index(unsigned int nr_queues, const int cpu)
> >>>>>  {
> >>>>> -	/*
> >>>>> -	 * Non present CPU will be mapped to queue index 0.
> >>>>> -	 */
> >>>>> -	if (!cpu_present(cpu))
> >>>>> -		return 0;
> >>>>>  	return cpu % nr_queues;
> >>>>>  }
> >>>>>
> >>>>> Thanks,
> >>>>> Ming
> >>>>>
> >>>>
> >>>> With that I no longer see the WARN_ON but the other warning instead:
> >>>>
> >>>> [   31.903096] audit: type=1130 audit(1522318064.439:41): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-tmpfiles-setup-dev comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
> >>>> [   31.903100] audit: type=1131 audit(1522318064.439:42): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-tmpfiles-setup-dev comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
> >>>> [   31.985756] systemd-journald[379]: Received SIGTERM from PID 1 (systemd).
> >>>> [   32.000543] systemd: 18 output lines suppressed due to ratelimiting
> >>>> [   32.209496] EXT4-fs (dasdc1): re-mounted. Opts: (null)
> >>>> [   32.234808] systemd-journald[2490]: Received request to flush runtime journal from PID 1
> >>>> [   32.359832] tun: Universal TUN/TAP device driver, 1.6
> >>>> [   32.470841] run queue from wrong CPU 18, hctx active
> >>>
> >>> But your 'lscpu' log showed that you only have 16 CPUs online(0~15) and
> >>> you also said CPU hotplug isn't involved in your test, so I am just
> >>> wondering where the CPU 18 is from?
> >>
> >>
> >> I have 2 test systems. One with 44CPU the other with 16.
> >> The one  with 16 now has the same flood of messages:
> >>
> >>
> >>     4.454510]  dasdc:VOL1/  0X3F77: dasdc1
> >> [    4.454592]  dasdd:VOL1/  0X3F74: dasdd1
> >> [    4.593711] run queue from wrong CPU 0, hctx active
> > 
> > Still can't reproduce your issue, could you please collect debugfs
> > log again on your 16-core system after applying the patch on blk_mq_cpumap.c?
> 
> done.

OK, thanks. From the dumped mapping everything looks fine: each hctx
is mapped to 4 CPUs, and only one of them is online.
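
Just to illustrate what that mapping means: with the patch above every
possible CPU is folded onto the hw queues by plain modulo arithmetic, so
e.g. 64 possible CPUs over 16 queues gives 4 CPUs per hctx. A minimal
userspace sketch of the arithmetic only (the 64/16 numbers are assumed
for illustration, not taken from your config):

#include <stdio.h>

/* same arithmetic as the patched cpu_to_queue_index() */
static int cpu_to_queue_index(unsigned int nr_queues, int cpu)
{
	return cpu % nr_queues;
}

int main(void)
{
	int cpu;

	/* assumed: 64 possible CPUs, 16 hw queues -> 4 CPUs per hctx */
	for (cpu = 0; cpu < 64; cpu++)
		printf("cpu %d -> hctx %d\n", cpu, cpu_to_queue_index(16, cpu));
	return 0;
}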

But I still don't know why the hctx is run from the wrong CPU, so it looks
like we need to dump more info. Could you apply the following debug patch
and post the log?

---
diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c
index 9f8cffc8a701..638ab5c11b3c 100644
--- a/block/blk-mq-cpumap.c
+++ b/block/blk-mq-cpumap.c
@@ -14,13 +14,12 @@
 #include "blk.h"
 #include "blk-mq.h"
 
+/*
+ * Given there isn't CPU hotplug handler in blk-mq, map all CPUs to
+ * queues even it isn't present yet.
+ */
 static int cpu_to_queue_index(unsigned int nr_queues, const int cpu)
 {
-	/*
-	 * Non present CPU will be mapped to queue index 0.
-	 */
-	if (!cpu_present(cpu))
-		return 0;
 	return cpu % nr_queues;
 }
 
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 16e83e6df404..65767be7927d 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1325,9 +1325,14 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
 	 */
 	if (!cpumask_test_cpu(raw_smp_processor_id(), hctx->cpumask) &&
 		cpu_online(hctx->next_cpu)) {
-		printk(KERN_WARNING "run queue from wrong CPU %d, hctx %s\n",
-			raw_smp_processor_id(),
+		int cpu;
+		printk(KERN_WARNING "run queue from wrong CPU %d/%d, hctx-%d %s\n",
+			raw_smp_processor_id(), hctx->next_cpu,
+			hctx->queue_num,
 			cpumask_empty(hctx->cpumask) ? "inactive": "active");
+		for_each_cpu(cpu, hctx->cpumask)
+			printk("%d ", cpu);
+		printk("\n");
 		dump_stack();
 	}
 
> 
> If you need anything quick, you can reach me via irc freenode or oftc (user borntraeger)

OK, that is cool!

Thanks,
Ming

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues()
  2018-03-30  2:53                           ` Ming Lei
@ 2018-04-04  8:18                             ` Christian Borntraeger
  2018-04-05 16:05                               ` Ming Lei
  0 siblings, 1 reply; 40+ messages in thread
From: Christian Borntraeger @ 2018-04-04  8:18 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Stefan Haberland,
	Christoph Hellwig

[-- Attachment #1: Type: text/plain, Size: 8698 bytes --]



On 03/30/2018 04:53 AM, Ming Lei wrote:
> On Thu, Mar 29, 2018 at 01:49:29PM +0200, Christian Borntraeger wrote:
>>
>>
>> On 03/29/2018 01:43 PM, Ming Lei wrote:
>>> On Thu, Mar 29, 2018 at 12:49:55PM +0200, Christian Borntraeger wrote:
>>>>
>>>>
>>>> On 03/29/2018 12:48 PM, Ming Lei wrote:
>>>>> On Thu, Mar 29, 2018 at 12:10:11PM +0200, Christian Borntraeger wrote:
>>>>>>
>>>>>>
>>>>>> On 03/29/2018 11:40 AM, Ming Lei wrote:
>>>>>>> On Thu, Mar 29, 2018 at 11:09:08AM +0200, Christian Borntraeger wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 03/29/2018 09:23 AM, Christian Borntraeger wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 03/29/2018 04:00 AM, Ming Lei wrote:
>>>>>>>>>> On Wed, Mar 28, 2018 at 05:36:53PM +0200, Christian Borntraeger wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 03/28/2018 05:26 PM, Ming Lei wrote:
>>>>>>>>>>>> Hi Christian,
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Mar 28, 2018 at 09:45:10AM +0200, Christian Borntraeger wrote:
>>>>>>>>>>>>> FWIW, this patch does not fix the issue for me:
>>>>>>>>>>>>>
>>>>>>>>>>>>> ostname=? addr=? terminal=? res=success'
>>>>>>>>>>>>> [   21.454961] WARNING: CPU: 3 PID: 1882 at block/blk-mq.c:1410 __blk_mq_delay_run_hw_queue+0xbe/0xd8
>>>>>>>>>>>>> [   21.454968] Modules linked in: scsi_dh_rdac scsi_dh_emc scsi_dh_alua dm_mirror dm_region_hash dm_log dm_multipath dm_mod autofs4
>>>>>>>>>>>>> [   21.454984] CPU: 3 PID: 1882 Comm: dasdconf.sh Not tainted 4.16.0-rc7+ #26
>>>>>>>>>>>>> [   21.454987] Hardware name: IBM 2964 NC9 704 (LPAR)
>>>>>>>>>>>>> [   21.454990] Krnl PSW : 00000000c0131ea3 000000003ea2f7bf (__blk_mq_delay_run_hw_queue+0xbe/0xd8)
>>>>>>>>>>>>> [   21.454996]            R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
>>>>>>>>>>>>> [   21.455005] Krnl GPRS: 0000013abb69a000 0000013a00000000 0000013ac6c0dc00 0000000000000001
>>>>>>>>>>>>> [   21.455008]            0000000000000000 0000013abb69a710 0000013a00000000 00000001b691fd98
>>>>>>>>>>>>> [   21.455011]            00000001b691fd98 0000013ace4775c8 0000000000000001 0000000000000000
>>>>>>>>>>>>> [   21.455014]            0000013ac6c0dc00 0000000000b47238 00000001b691fc08 00000001b691fbd0
>>>>>>>>>>>>> [   21.455032] Krnl Code: 000000000069c596: ebaff0a00004	lmg	%r10,%r15,160(%r15)
>>>>>>>>>>>>>                           000000000069c59c: c0f4ffff7a5e	brcl	15,68ba58
>>>>>>>>>>>>>                          #000000000069c5a2: a7f40001		brc	15,69c5a4
>>>>>>>>>>>>>                          >000000000069c5a6: e340f0c00004	lg	%r4,192(%r15)
>>>>>>>>>>>>>                           000000000069c5ac: ebaff0a00004	lmg	%r10,%r15,160(%r15)
>>>>>>>>>>>>>                           000000000069c5b2: 07f4		bcr	15,%r4
>>>>>>>>>>>>>                           000000000069c5b4: c0e5fffffeea	brasl	%r14,69c388
>>>>>>>>>>>>>                           000000000069c5ba: a7f4fff6		brc	15,69c5a6
>>>>>>>>>>>>> [   21.455067] Call Trace:
>>>>>>>>>>>>> [   21.455072] ([<00000001b691fd98>] 0x1b691fd98)
>>>>>>>>>>>>> [   21.455079]  [<000000000069c692>] blk_mq_run_hw_queue+0xba/0x100 
>>>>>>>>>>>>> [   21.455083]  [<000000000069c740>] blk_mq_run_hw_queues+0x68/0x88 
>>>>>>>>>>>>> [   21.455089]  [<000000000069b956>] __blk_mq_complete_request+0x11e/0x1d8 
>>>>>>>>>>>>> [   21.455091]  [<000000000069ba9c>] blk_mq_complete_request+0x8c/0xc8 
>>>>>>>>>>>>> [   21.455103]  [<00000000008aa250>] dasd_block_tasklet+0x158/0x490 
>>>>>>>>>>>>> [   21.455110]  [<000000000014c742>] tasklet_hi_action+0x92/0x120 
>>>>>>>>>>>>> [   21.455118]  [<0000000000a7cfc0>] __do_softirq+0x120/0x348 
>>>>>>>>>>>>> [   21.455122]  [<000000000014c212>] irq_exit+0xba/0xd0 
>>>>>>>>>>>>> [   21.455130]  [<000000000010bf92>] do_IRQ+0x8a/0xb8 
>>>>>>>>>>>>> [   21.455133]  [<0000000000a7c298>] io_int_handler+0x130/0x298 
>>>>>>>>>>>>> [   21.455136] Last Breaking-Event-Address:
>>>>>>>>>>>>> [   21.455138]  [<000000000069c5a2>] __blk_mq_delay_run_hw_queue+0xba/0xd8
>>>>>>>>>>>>> [   21.455140] ---[ end trace be43f99a5d1e553e ]---
>>>>>>>>>>>>> [   21.510046] dasdconf.sh Warning: 0.0.241e is already online, not configuring
>>>>>>>>>>>>
>>>>>>>>>>>> Thinking about this issue further, I can't understand the root cause for
>>>>>>>>>>>> this issue.
>>>>>>>>
>>>>>>>> FWIW, Limiting CONFIG_NR_CPUS to 64 seems to make the problem go away.
>>>>>>>
>>>>>>> I think the following patch is needed, and this way aligns to the mapping
>>>>>>> created via managed IRQ at least.
>>>>>>>
>>>>>>> diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c
>>>>>>> index 9f8cffc8a701..638ab5c11b3c 100644
>>>>>>> --- a/block/blk-mq-cpumap.c
>>>>>>> +++ b/block/blk-mq-cpumap.c
>>>>>>> @@ -14,13 +14,12 @@
>>>>>>>  #include "blk.h"
>>>>>>>  #include "blk-mq.h"
>>>>>>>
>>>>>>> +/*
>>>>>>> + * Given there isn't CPU hotplug handler in blk-mq, map all possible CPUs to
>>>>>>> + * queues even it isn't present yet.
>>>>>>> + */
>>>>>>>  static int cpu_to_queue_index(unsigned int nr_queues, const int cpu)
>>>>>>>  {
>>>>>>> -	/*
>>>>>>> -	 * Non present CPU will be mapped to queue index 0.
>>>>>>> -	 */
>>>>>>> -	if (!cpu_present(cpu))
>>>>>>> -		return 0;
>>>>>>>  	return cpu % nr_queues;
>>>>>>>  }
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Ming
>>>>>>>
>>>>>>
>>>>>> With that I no longer see the WARN_ON but the other warning instead:
>>>>>>
>>>>>> [   31.903096] audit: type=1130 audit(1522318064.439:41): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-tmpfiles-setup-dev comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
>>>>>> [   31.903100] audit: type=1131 audit(1522318064.439:42): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-tmpfiles-setup-dev comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
>>>>>> [   31.985756] systemd-journald[379]: Received SIGTERM from PID 1 (systemd).
>>>>>> [   32.000543] systemd: 18 output lines suppressed due to ratelimiting
>>>>>> [   32.209496] EXT4-fs (dasdc1): re-mounted. Opts: (null)
>>>>>> [   32.234808] systemd-journald[2490]: Received request to flush runtime journal from PID 1
>>>>>> [   32.359832] tun: Universal TUN/TAP device driver, 1.6
>>>>>> [   32.470841] run queue from wrong CPU 18, hctx active
>>>>>
>>>>> But your 'lscpu' log showed that you only have 16 CPUs online(0~15) and
>>>>> you also said CPU hotplug isn't involved in your test, so I am just
>>>>> wondering where the CPU 18 is from?
>>>>
>>>>
>>>> I have 2 test systems. One with 44CPU the other with 16.
>>>> The one  with 16 now has the same flood of messages:
>>>>
>>>>
>>>>     4.454510]  dasdc:VOL1/  0X3F77: dasdc1
>>>> [    4.454592]  dasdd:VOL1/  0X3F74: dasdd1
>>>> [    4.593711] run queue from wrong CPU 0, hctx active
>>>
>>> Still can't reproduce your issue, could you please collect debugfs
>>> log again on your 16-core system after applying the patch on blk_mq_cpumap.c?
>>
>> done.
> 
> OK, thanks, from the dumped mapping, looks everything is fine, each hctx
> is mapped to 4 CPUs, and only one of them is online.
> 
> But still don't know why hctx is run from wrong CPU, looks we need more
> info to dump, could you apply the following debug patch and post the
> log?
> 
> ---
> diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c
> index 9f8cffc8a701..638ab5c11b3c 100644
> --- a/block/blk-mq-cpumap.c
> +++ b/block/blk-mq-cpumap.c
> @@ -14,13 +14,12 @@
>  #include "blk.h"
>  #include "blk-mq.h"
> 
> +/*
> + * Given there isn't CPU hotplug handler in blk-mq, map all CPUs to
> + * queues even it isn't present yet.
> + */
>  static int cpu_to_queue_index(unsigned int nr_queues, const int cpu)
>  {
> -	/*
> -	 * Non present CPU will be mapped to queue index 0.
> -	 */
> -	if (!cpu_present(cpu))
> -		return 0;
>  	return cpu % nr_queues;
>  }
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 16e83e6df404..65767be7927d 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -1325,9 +1325,14 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
>  	 */
>  	if (!cpumask_test_cpu(raw_smp_processor_id(), hctx->cpumask) &&
>  		cpu_online(hctx->next_cpu)) {
> -		printk(KERN_WARNING "run queue from wrong CPU %d, hctx %s\n",
> -			raw_smp_processor_id(),
> +		int cpu;
> +		printk(KERN_WARNING "run queue from wrong CPU %d/%d, hctx-%d %s\n",
> +			raw_smp_processor_id(), hctx->next_cpu,
> +			hctx->queue_num,
>  			cpumask_empty(hctx->cpumask) ? "inactive": "active");
> +		for_each_cpu(cpu, hctx->cpumask)
> +			printk("%d ", cpu);
> +		printk("\n");
>  		dump_stack();
>  	}
> 


attached. FWIW, it looks like these messages happen mostly during boot and come less
and less often the longer the system runs. Could it be that the workqueue is misplaced
before it runs for the first time, but then it is ok?
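
(For context, my understanding is that the run work is queued on whatever
CPU blk_mq_hctx_next_cpu() returns at queueing time, roughly as sketched
below; this is just my reading of the path, not the literal code:)

/*
 * Sketch of my understanding: the delayed run work is bound to
 * blk_mq_hctx_next_cpu(hctx), so whatever that returns when the work
 * is queued decides which CPU later executes blk_mq_run_work_fn()
 * for this hctx.  A stale value before the first run would explain
 * the messages above.
 */
static void queue_run_work_sketch(struct blk_mq_hw_ctx *hctx,
				  unsigned long msecs)
{
	kblockd_mod_delayed_work_on(blk_mq_hctx_next_cpu(hctx),
				    &hctx->run_work,
				    msecs_to_jiffies(msecs));
}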



[-- Attachment #2: dmesg.gz --]
[-- Type: application/gzip, Size: 22606 bytes --]

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues()
  2018-04-04  8:18                             ` Christian Borntraeger
@ 2018-04-05 16:05                               ` Ming Lei
  2018-04-05 16:11                                 ` Ming Lei
  2018-04-06  8:35                                 ` Christian Borntraeger
  0 siblings, 2 replies; 40+ messages in thread
From: Ming Lei @ 2018-04-05 16:05 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Stefan Haberland,
	Christoph Hellwig

On Wed, Apr 04, 2018 at 10:18:13AM +0200, Christian Borntraeger wrote:
> 
> 
> On 03/30/2018 04:53 AM, Ming Lei wrote:
> > On Thu, Mar 29, 2018 at 01:49:29PM +0200, Christian Borntraeger wrote:
> >>
> >>
> >> On 03/29/2018 01:43 PM, Ming Lei wrote:
> >>> On Thu, Mar 29, 2018 at 12:49:55PM +0200, Christian Borntraeger wrote:
> >>>>
> >>>>
> >>>> On 03/29/2018 12:48 PM, Ming Lei wrote:
> >>>>> On Thu, Mar 29, 2018 at 12:10:11PM +0200, Christian Borntraeger wrote:
> >>>>>>
> >>>>>>
> >>>>>> On 03/29/2018 11:40 AM, Ming Lei wrote:
> >>>>>>> On Thu, Mar 29, 2018 at 11:09:08AM +0200, Christian Borntraeger wrote:
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On 03/29/2018 09:23 AM, Christian Borntraeger wrote:
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On 03/29/2018 04:00 AM, Ming Lei wrote:
> >>>>>>>>>> On Wed, Mar 28, 2018 at 05:36:53PM +0200, Christian Borntraeger wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> On 03/28/2018 05:26 PM, Ming Lei wrote:
> >>>>>>>>>>>> Hi Christian,
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Wed, Mar 28, 2018 at 09:45:10AM +0200, Christian Borntraeger wrote:
> >>>>>>>>>>>>> FWIW, this patch does not fix the issue for me:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> ostname=? addr=? terminal=? res=success'
> >>>>>>>>>>>>> [   21.454961] WARNING: CPU: 3 PID: 1882 at block/blk-mq.c:1410 __blk_mq_delay_run_hw_queue+0xbe/0xd8
> >>>>>>>>>>>>> [   21.454968] Modules linked in: scsi_dh_rdac scsi_dh_emc scsi_dh_alua dm_mirror dm_region_hash dm_log dm_multipath dm_mod autofs4
> >>>>>>>>>>>>> [   21.454984] CPU: 3 PID: 1882 Comm: dasdconf.sh Not tainted 4.16.0-rc7+ #26
> >>>>>>>>>>>>> [   21.454987] Hardware name: IBM 2964 NC9 704 (LPAR)
> >>>>>>>>>>>>> [   21.454990] Krnl PSW : 00000000c0131ea3 000000003ea2f7bf (__blk_mq_delay_run_hw_queue+0xbe/0xd8)
> >>>>>>>>>>>>> [   21.454996]            R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
> >>>>>>>>>>>>> [   21.455005] Krnl GPRS: 0000013abb69a000 0000013a00000000 0000013ac6c0dc00 0000000000000001
> >>>>>>>>>>>>> [   21.455008]            0000000000000000 0000013abb69a710 0000013a00000000 00000001b691fd98
> >>>>>>>>>>>>> [   21.455011]            00000001b691fd98 0000013ace4775c8 0000000000000001 0000000000000000
> >>>>>>>>>>>>> [   21.455014]            0000013ac6c0dc00 0000000000b47238 00000001b691fc08 00000001b691fbd0
> >>>>>>>>>>>>> [   21.455032] Krnl Code: 000000000069c596: ebaff0a00004	lmg	%r10,%r15,160(%r15)
> >>>>>>>>>>>>>                           000000000069c59c: c0f4ffff7a5e	brcl	15,68ba58
> >>>>>>>>>>>>>                          #000000000069c5a2: a7f40001		brc	15,69c5a4
> >>>>>>>>>>>>>                          >000000000069c5a6: e340f0c00004	lg	%r4,192(%r15)
> >>>>>>>>>>>>>                           000000000069c5ac: ebaff0a00004	lmg	%r10,%r15,160(%r15)
> >>>>>>>>>>>>>                           000000000069c5b2: 07f4		bcr	15,%r4
> >>>>>>>>>>>>>                           000000000069c5b4: c0e5fffffeea	brasl	%r14,69c388
> >>>>>>>>>>>>>                           000000000069c5ba: a7f4fff6		brc	15,69c5a6
> >>>>>>>>>>>>> [   21.455067] Call Trace:
> >>>>>>>>>>>>> [   21.455072] ([<00000001b691fd98>] 0x1b691fd98)
> >>>>>>>>>>>>> [   21.455079]  [<000000000069c692>] blk_mq_run_hw_queue+0xba/0x100 
> >>>>>>>>>>>>> [   21.455083]  [<000000000069c740>] blk_mq_run_hw_queues+0x68/0x88 
> >>>>>>>>>>>>> [   21.455089]  [<000000000069b956>] __blk_mq_complete_request+0x11e/0x1d8 
> >>>>>>>>>>>>> [   21.455091]  [<000000000069ba9c>] blk_mq_complete_request+0x8c/0xc8 
> >>>>>>>>>>>>> [   21.455103]  [<00000000008aa250>] dasd_block_tasklet+0x158/0x490 
> >>>>>>>>>>>>> [   21.455110]  [<000000000014c742>] tasklet_hi_action+0x92/0x120 
> >>>>>>>>>>>>> [   21.455118]  [<0000000000a7cfc0>] __do_softirq+0x120/0x348 
> >>>>>>>>>>>>> [   21.455122]  [<000000000014c212>] irq_exit+0xba/0xd0 
> >>>>>>>>>>>>> [   21.455130]  [<000000000010bf92>] do_IRQ+0x8a/0xb8 
> >>>>>>>>>>>>> [   21.455133]  [<0000000000a7c298>] io_int_handler+0x130/0x298 
> >>>>>>>>>>>>> [   21.455136] Last Breaking-Event-Address:
> >>>>>>>>>>>>> [   21.455138]  [<000000000069c5a2>] __blk_mq_delay_run_hw_queue+0xba/0xd8
> >>>>>>>>>>>>> [   21.455140] ---[ end trace be43f99a5d1e553e ]---
> >>>>>>>>>>>>> [   21.510046] dasdconf.sh Warning: 0.0.241e is already online, not configuring
> >>>>>>>>>>>>
> >>>>>>>>>>>> Thinking about this issue further, I can't understand the root cause for
> >>>>>>>>>>>> this issue.
> >>>>>>>>
> >>>>>>>> FWIW, Limiting CONFIG_NR_CPUS to 64 seems to make the problem go away.
> >>>>>>>
> >>>>>>> I think the following patch is needed, and this way aligns to the mapping
> >>>>>>> created via managed IRQ at least.
> >>>>>>>
> >>>>>>> diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c
> >>>>>>> index 9f8cffc8a701..638ab5c11b3c 100644
> >>>>>>> --- a/block/blk-mq-cpumap.c
> >>>>>>> +++ b/block/blk-mq-cpumap.c
> >>>>>>> @@ -14,13 +14,12 @@
> >>>>>>>  #include "blk.h"
> >>>>>>>  #include "blk-mq.h"
> >>>>>>>
> >>>>>>> +/*
> >>>>>>> + * Given there isn't CPU hotplug handler in blk-mq, map all possible CPUs to
> >>>>>>> + * queues even it isn't present yet.
> >>>>>>> + */
> >>>>>>>  static int cpu_to_queue_index(unsigned int nr_queues, const int cpu)
> >>>>>>>  {
> >>>>>>> -	/*
> >>>>>>> -	 * Non present CPU will be mapped to queue index 0.
> >>>>>>> -	 */
> >>>>>>> -	if (!cpu_present(cpu))
> >>>>>>> -		return 0;
> >>>>>>>  	return cpu % nr_queues;
> >>>>>>>  }
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>> Ming
> >>>>>>>
> >>>>>>
> >>>>>> With that I no longer see the WARN_ON but the other warning instead:
> >>>>>>
> >>>>>> [   31.903096] audit: type=1130 audit(1522318064.439:41): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-tmpfiles-setup-dev comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
> >>>>>> [   31.903100] audit: type=1131 audit(1522318064.439:42): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-tmpfiles-setup-dev comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
> >>>>>> [   31.985756] systemd-journald[379]: Received SIGTERM from PID 1 (systemd).
> >>>>>> [   32.000543] systemd: 18 output lines suppressed due to ratelimiting
> >>>>>> [   32.209496] EXT4-fs (dasdc1): re-mounted. Opts: (null)
> >>>>>> [   32.234808] systemd-journald[2490]: Received request to flush runtime journal from PID 1
> >>>>>> [   32.359832] tun: Universal TUN/TAP device driver, 1.6
> >>>>>> [   32.470841] run queue from wrong CPU 18, hctx active
> >>>>>
> >>>>> But your 'lscpu' log showed that you only have 16 CPUs online(0~15) and
> >>>>> you also said CPU hotplug isn't involved in your test, so I am just
> >>>>> wondering where the CPU 18 is from?
> >>>>
> >>>>
> >>>> I have 2 test systems. One with 44CPU the other with 16.
> >>>> The one  with 16 now has the same flood of messages:
> >>>>
> >>>>
> >>>>     4.454510]  dasdc:VOL1/  0X3F77: dasdc1
> >>>> [    4.454592]  dasdd:VOL1/  0X3F74: dasdd1
> >>>> [    4.593711] run queue from wrong CPU 0, hctx active
> >>>
> >>> Still can't reproduce your issue, could you please collect debugfs
> >>> log again on your 16-core system after applying the patch on blk_mq_cpumap.c?
> >>
> >> done.
> > 
> > OK, thanks, from the dumped mapping, looks everything is fine, each hctx
> > is mapped to 4 CPUs, and only one of them is online.
> > 
> > But still don't know why hctx is run from wrong CPU, looks we need more
> > info to dump, could you apply the following debug patch and post the
> > log?
> > 
> > ---
> > diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c
> > index 9f8cffc8a701..638ab5c11b3c 100644
> > --- a/block/blk-mq-cpumap.c
> > +++ b/block/blk-mq-cpumap.c
> > @@ -14,13 +14,12 @@
> >  #include "blk.h"
> >  #include "blk-mq.h"
> > 
> > +/*
> > + * Given there isn't CPU hotplug handler in blk-mq, map all CPUs to
> > + * queues even it isn't present yet.
> > + */
> >  static int cpu_to_queue_index(unsigned int nr_queues, const int cpu)
> >  {
> > -	/*
> > -	 * Non present CPU will be mapped to queue index 0.
> > -	 */
> > -	if (!cpu_present(cpu))
> > -		return 0;
> >  	return cpu % nr_queues;
> >  }
> > 
> > diff --git a/block/blk-mq.c b/block/blk-mq.c
> > index 16e83e6df404..65767be7927d 100644
> > --- a/block/blk-mq.c
> > +++ b/block/blk-mq.c
> > @@ -1325,9 +1325,14 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
> >  	 */
> >  	if (!cpumask_test_cpu(raw_smp_processor_id(), hctx->cpumask) &&
> >  		cpu_online(hctx->next_cpu)) {
> > -		printk(KERN_WARNING "run queue from wrong CPU %d, hctx %s\n",
> > -			raw_smp_processor_id(),
> > +		int cpu;
> > +		printk(KERN_WARNING "run queue from wrong CPU %d/%d, hctx-%d %s\n",
> > +			raw_smp_processor_id(), hctx->next_cpu,
> > +			hctx->queue_num,
> >  			cpumask_empty(hctx->cpumask) ? "inactive": "active");
> > +		for_each_cpu(cpu, hctx->cpumask)
> > +			printk("%d ", cpu);
> > +		printk("\n");
> >  		dump_stack();
> >  	}
> > 
> 
> 
> attached. FWIW, it looks like these messages happen mostly during boot and come less
> > and less often the longer the system runs. Could it be that the workqueue is misplaced
> before it runs for the first time, but then it is ok?

It doesn't look like a workqueue issue; the log shows that hctx->next_cpu is
figured out wrongly by blk_mq_hctx_next_cpu(). Maybe on your arch either
'nr_cpu_ids' or 'cpu_online_mask' is in an odd state during boot and breaks
blk_mq_hctx_next_cpu().
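
For reference, the selection I mean works roughly like this (a simplified
sketch of the idea only, not the literal code in blk-mq.c):

/*
 * Sketch: advance to the next CPU that is both in hctx->cpumask and
 * online, wrapping back to the first such CPU once we run past
 * nr_cpu_ids.  If cpu_online_mask or nr_cpu_ids is in an odd state
 * during early boot, the result can be a CPU outside the expected set.
 */
static int next_cpu_sketch(struct blk_mq_hw_ctx *hctx)
{
	int next_cpu = cpumask_next_and(hctx->next_cpu, hctx->cpumask,
					cpu_online_mask);

	if (next_cpu >= nr_cpu_ids)
		next_cpu = cpumask_first_and(hctx->cpumask, cpu_online_mask);

	return next_cpu;
}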

Could you please apply the following patch and provide the dmesg boot log?

---
diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c
index 9f8cffc8a701..638ab5c11b3c 100644
--- a/block/blk-mq-cpumap.c
+++ b/block/blk-mq-cpumap.c
@@ -14,13 +14,12 @@
 #include "blk.h"
 #include "blk-mq.h"
 
+/*
+ * Given there isn't CPU hotplug handler in blk-mq, map all CPUs to
+ * queues even it isn't present yet.
+ */
 static int cpu_to_queue_index(unsigned int nr_queues, const int cpu)
 {
-	/*
-	 * Non present CPU will be mapped to queue index 0.
-	 */
-	if (!cpu_present(cpu))
-		return 0;
 	return cpu % nr_queues;
 }
 
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 90838e998f66..996f8a963026 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1324,9 +1324,18 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
 	 */
 	if (!cpumask_test_cpu(raw_smp_processor_id(), hctx->cpumask) &&
 		cpu_online(hctx->next_cpu)) {
-		printk(KERN_WARNING "run queue from wrong CPU %d, hctx %s\n",
-			raw_smp_processor_id(),
+		int cpu;
+		printk(KERN_WARNING "run queue from wrong CPU %d/%d, hctx-%d %s\n",
+			raw_smp_processor_id(), hctx->next_cpu,
+			hctx->queue_num,
 			cpumask_empty(hctx->cpumask) ? "inactive": "active");
+		printk("dump CPUs mapped to this hctx:\n");
+		for_each_cpu(cpu, hctx->cpumask)
+			printk("%d ", cpu);
+		printk("\n");
+		printk("nr_cpu_ids is %d, and dump online cpus:\n", nr_cpu_ids);
+		for_each_cpu(cpu, cpu_online_mask)
+			printk("%d ", cpu);
 		dump_stack();
 	}
 

Thanks,
Ming

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues()
  2018-04-05 16:05                               ` Ming Lei
@ 2018-04-05 16:11                                 ` Ming Lei
  2018-04-05 17:39                                   ` Christian Borntraeger
  2018-04-06  8:35                                 ` Christian Borntraeger
  1 sibling, 1 reply; 40+ messages in thread
From: Ming Lei @ 2018-04-05 16:11 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Stefan Haberland,
	Christoph Hellwig

On Fri, Apr 06, 2018 at 12:05:03AM +0800, Ming Lei wrote:
> On Wed, Apr 04, 2018 at 10:18:13AM +0200, Christian Borntraeger wrote:
> > 
> > 
> > On 03/30/2018 04:53 AM, Ming Lei wrote:
> > > On Thu, Mar 29, 2018 at 01:49:29PM +0200, Christian Borntraeger wrote:
> > >>
> > >>
> > >> On 03/29/2018 01:43 PM, Ming Lei wrote:
> > >>> On Thu, Mar 29, 2018 at 12:49:55PM +0200, Christian Borntraeger wrote:
> > >>>>
> > >>>>
> > >>>> On 03/29/2018 12:48 PM, Ming Lei wrote:
> > >>>>> On Thu, Mar 29, 2018 at 12:10:11PM +0200, Christian Borntraeger wrote:
> > >>>>>>
> > >>>>>>
> > >>>>>> On 03/29/2018 11:40 AM, Ming Lei wrote:
> > >>>>>>> On Thu, Mar 29, 2018 at 11:09:08AM +0200, Christian Borntraeger wrote:
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>> On 03/29/2018 09:23 AM, Christian Borntraeger wrote:
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> On 03/29/2018 04:00 AM, Ming Lei wrote:
> > >>>>>>>>>> On Wed, Mar 28, 2018 at 05:36:53PM +0200, Christian Borntraeger wrote:
> > >>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>> On 03/28/2018 05:26 PM, Ming Lei wrote:
> > >>>>>>>>>>>> Hi Christian,
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> On Wed, Mar 28, 2018 at 09:45:10AM +0200, Christian Borntraeger wrote:
> > >>>>>>>>>>>>> FWIW, this patch does not fix the issue for me:
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> ostname=? addr=? terminal=? res=success'
> > >>>>>>>>>>>>> [   21.454961] WARNING: CPU: 3 PID: 1882 at block/blk-mq.c:1410 __blk_mq_delay_run_hw_queue+0xbe/0xd8
> > >>>>>>>>>>>>> [   21.454968] Modules linked in: scsi_dh_rdac scsi_dh_emc scsi_dh_alua dm_mirror dm_region_hash dm_log dm_multipath dm_mod autofs4
> > >>>>>>>>>>>>> [   21.454984] CPU: 3 PID: 1882 Comm: dasdconf.sh Not tainted 4.16.0-rc7+ #26
> > >>>>>>>>>>>>> [   21.454987] Hardware name: IBM 2964 NC9 704 (LPAR)
> > >>>>>>>>>>>>> [   21.454990] Krnl PSW : 00000000c0131ea3 000000003ea2f7bf (__blk_mq_delay_run_hw_queue+0xbe/0xd8)
> > >>>>>>>>>>>>> [   21.454996]            R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
> > >>>>>>>>>>>>> [   21.455005] Krnl GPRS: 0000013abb69a000 0000013a00000000 0000013ac6c0dc00 0000000000000001
> > >>>>>>>>>>>>> [   21.455008]            0000000000000000 0000013abb69a710 0000013a00000000 00000001b691fd98
> > >>>>>>>>>>>>> [   21.455011]            00000001b691fd98 0000013ace4775c8 0000000000000001 0000000000000000
> > >>>>>>>>>>>>> [   21.455014]            0000013ac6c0dc00 0000000000b47238 00000001b691fc08 00000001b691fbd0
> > >>>>>>>>>>>>> [   21.455032] Krnl Code: 000000000069c596: ebaff0a00004	lmg	%r10,%r15,160(%r15)
> > >>>>>>>>>>>>>                           000000000069c59c: c0f4ffff7a5e	brcl	15,68ba58
> > >>>>>>>>>>>>>                          #000000000069c5a2: a7f40001		brc	15,69c5a4
> > >>>>>>>>>>>>>                          >000000000069c5a6: e340f0c00004	lg	%r4,192(%r15)
> > >>>>>>>>>>>>>                           000000000069c5ac: ebaff0a00004	lmg	%r10,%r15,160(%r15)
> > >>>>>>>>>>>>>                           000000000069c5b2: 07f4		bcr	15,%r4
> > >>>>>>>>>>>>>                           000000000069c5b4: c0e5fffffeea	brasl	%r14,69c388
> > >>>>>>>>>>>>>                           000000000069c5ba: a7f4fff6		brc	15,69c5a6
> > >>>>>>>>>>>>> [   21.455067] Call Trace:
> > >>>>>>>>>>>>> [   21.455072] ([<00000001b691fd98>] 0x1b691fd98)
> > >>>>>>>>>>>>> [   21.455079]  [<000000000069c692>] blk_mq_run_hw_queue+0xba/0x100 
> > >>>>>>>>>>>>> [   21.455083]  [<000000000069c740>] blk_mq_run_hw_queues+0x68/0x88 
> > >>>>>>>>>>>>> [   21.455089]  [<000000000069b956>] __blk_mq_complete_request+0x11e/0x1d8 
> > >>>>>>>>>>>>> [   21.455091]  [<000000000069ba9c>] blk_mq_complete_request+0x8c/0xc8 
> > >>>>>>>>>>>>> [   21.455103]  [<00000000008aa250>] dasd_block_tasklet+0x158/0x490 
> > >>>>>>>>>>>>> [   21.455110]  [<000000000014c742>] tasklet_hi_action+0x92/0x120 
> > >>>>>>>>>>>>> [   21.455118]  [<0000000000a7cfc0>] __do_softirq+0x120/0x348 
> > >>>>>>>>>>>>> [   21.455122]  [<000000000014c212>] irq_exit+0xba/0xd0 
> > >>>>>>>>>>>>> [   21.455130]  [<000000000010bf92>] do_IRQ+0x8a/0xb8 
> > >>>>>>>>>>>>> [   21.455133]  [<0000000000a7c298>] io_int_handler+0x130/0x298 
> > >>>>>>>>>>>>> [   21.455136] Last Breaking-Event-Address:
> > >>>>>>>>>>>>> [   21.455138]  [<000000000069c5a2>] __blk_mq_delay_run_hw_queue+0xba/0xd8
> > >>>>>>>>>>>>> [   21.455140] ---[ end trace be43f99a5d1e553e ]---
> > >>>>>>>>>>>>> [   21.510046] dasdconf.sh Warning: 0.0.241e is already online, not configuring
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Thinking about this issue further, I can't understand the root cause for
> > >>>>>>>>>>>> this issue.
> > >>>>>>>>
> > >>>>>>>> FWIW, Limiting CONFIG_NR_CPUS to 64 seems to make the problem go away.
> > >>>>>>>
> > >>>>>>> I think the following patch is needed, and this way aligns to the mapping
> > >>>>>>> created via managed IRQ at least.
> > >>>>>>>
> > >>>>>>> diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c
> > >>>>>>> index 9f8cffc8a701..638ab5c11b3c 100644
> > >>>>>>> --- a/block/blk-mq-cpumap.c
> > >>>>>>> +++ b/block/blk-mq-cpumap.c
> > >>>>>>> @@ -14,13 +14,12 @@
> > >>>>>>>  #include "blk.h"
> > >>>>>>>  #include "blk-mq.h"
> > >>>>>>>
> > >>>>>>> +/*
> > >>>>>>> + * Given there isn't CPU hotplug handler in blk-mq, map all possible CPUs to
> > >>>>>>> + * queues even it isn't present yet.
> > >>>>>>> + */
> > >>>>>>>  static int cpu_to_queue_index(unsigned int nr_queues, const int cpu)
> > >>>>>>>  {
> > >>>>>>> -	/*
> > >>>>>>> -	 * Non present CPU will be mapped to queue index 0.
> > >>>>>>> -	 */
> > >>>>>>> -	if (!cpu_present(cpu))
> > >>>>>>> -		return 0;
> > >>>>>>>  	return cpu % nr_queues;
> > >>>>>>>  }
> > >>>>>>>
> > >>>>>>> Thanks,
> > >>>>>>> Ming
> > >>>>>>>
> > >>>>>>
> > >>>>>> With that I no longer see the WARN_ON but the other warning instead:
> > >>>>>>
> > >>>>>> [   31.903096] audit: type=1130 audit(1522318064.439:41): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-tmpfiles-setup-dev comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
> > >>>>>> [   31.903100] audit: type=1131 audit(1522318064.439:42): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-tmpfiles-setup-dev comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
> > >>>>>> [   31.985756] systemd-journald[379]: Received SIGTERM from PID 1 (systemd).
> > >>>>>> [   32.000543] systemd: 18 output lines suppressed due to ratelimiting
> > >>>>>> [   32.209496] EXT4-fs (dasdc1): re-mounted. Opts: (null)
> > >>>>>> [   32.234808] systemd-journald[2490]: Received request to flush runtime journal from PID 1
> > >>>>>> [   32.359832] tun: Universal TUN/TAP device driver, 1.6
> > >>>>>> [   32.470841] run queue from wrong CPU 18, hctx active
> > >>>>>
> > >>>>> But your 'lscpu' log showed that you only have 16 CPUs online(0~15) and
> > >>>>> you also said CPU hotplug isn't involved in your test, so I am just
> > >>>>> wondering where the CPU 18 is from?
> > >>>>
> > >>>>
> > >>>> I have 2 test systems, one with 44 CPUs, the other with 16.
> > >>>> The one with 16 now has the same flood of messages:
> > >>>>
> > >>>>
> > >>>>     4.454510]  dasdc:VOL1/  0X3F77: dasdc1
> > >>>> [    4.454592]  dasdd:VOL1/  0X3F74: dasdd1
> > >>>> [    4.593711] run queue from wrong CPU 0, hctx active
> > >>>
> > >>> Still can't reproduce your issue; could you please collect the debugfs
> > >>> log again on your 16-core system after applying the patch to blk-mq-cpumap.c?
> > >>
> > >> done.
> > > 
> > > OK, thanks. From the dumped mapping everything looks fine: each hctx
> > > is mapped to 4 CPUs, and only one of them is online.
> > > 
> > > But I still don't know why the hctx is run from the wrong CPU, so it looks
> > > like we need to dump more info. Could you apply the following debug patch
> > > and post the log?
> > > 
> > > ---
> > > diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c
> > > index 9f8cffc8a701..638ab5c11b3c 100644
> > > --- a/block/blk-mq-cpumap.c
> > > +++ b/block/blk-mq-cpumap.c
> > > @@ -14,13 +14,12 @@
> > >  #include "blk.h"
> > >  #include "blk-mq.h"
> > > 
> > > +/*
> > > + * Given there isn't CPU hotplug handler in blk-mq, map all CPUs to
> > > + * queues even it isn't present yet.
> > > + */
> > >  static int cpu_to_queue_index(unsigned int nr_queues, const int cpu)
> > >  {
> > > -	/*
> > > -	 * Non present CPU will be mapped to queue index 0.
> > > -	 */
> > > -	if (!cpu_present(cpu))
> > > -		return 0;
> > >  	return cpu % nr_queues;
> > >  }
> > > 
> > > diff --git a/block/blk-mq.c b/block/blk-mq.c
> > > index 16e83e6df404..65767be7927d 100644
> > > --- a/block/blk-mq.c
> > > +++ b/block/blk-mq.c
> > > @@ -1325,9 +1325,14 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
> > >  	 */
> > >  	if (!cpumask_test_cpu(raw_smp_processor_id(), hctx->cpumask) &&
> > >  		cpu_online(hctx->next_cpu)) {
> > > -		printk(KERN_WARNING "run queue from wrong CPU %d, hctx %s\n",
> > > -			raw_smp_processor_id(),
> > > +		int cpu;
> > > +		printk(KERN_WARNING "run queue from wrong CPU %d/%d, hctx-%d %s\n",
> > > +			raw_smp_processor_id(), hctx->next_cpu,
> > > +			hctx->queue_num,
> > >  			cpumask_empty(hctx->cpumask) ? "inactive": "active");
> > > +		for_each_cpu(cpu, hctx->cpumask)
> > > +			printk("%d ", cpu);
> > > +		printk("\n");
> > >  		dump_stack();
> > >  	}
> > > 
> > 
> > 
> > attached. FWIW, it looks like these messages happen mostly during boot and come less
> > and less often the longer the system runs. Could it be that the workqueue is misplaced
> > before it runs for the first time, but then it is OK?
> 
> This doesn't look like a workqueue issue; it shows that hctx->next_cpu is
> computed wrongly by blk_mq_hctx_next_cpu(). Maybe on your arch either 'nr_cpu_ids'
> or 'cpu_online_mask' is odd during booting and breaks blk_mq_hctx_next_cpu().
> 
> Could you please apply the following patch and provide the dmesg boot log?

And please also post the 'lscpu' log from the test machine.
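
If it helps, a one-off debug print along these lines (just a sketch using the stock cpumask printk helpers, not part of the patch above) would show both values early during boot:

	/* dump what blk-mq will see; %*pbl prints a cpumask as a CPU list */
	pr_info("nr_cpu_ids=%u online=%*pbl possible=%*pbl\n",
		nr_cpu_ids,
		cpumask_pr_args(cpu_online_mask),
		cpumask_pr_args(cpu_possible_mask));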

Thanks,
Ming

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues()
  2018-04-05 16:11                                 ` Ming Lei
@ 2018-04-05 17:39                                   ` Christian Borntraeger
  2018-04-05 17:43                                     ` Christian Borntraeger
  2018-04-06  8:41                                     ` Ming Lei
  0 siblings, 2 replies; 40+ messages in thread
From: Christian Borntraeger @ 2018-04-05 17:39 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Stefan Haberland,
	Christoph Hellwig

[-- Attachment #1: Type: text/plain, Size: 384 bytes --]



On 04/05/2018 06:11 PM, Ming Lei wrote:
>>
>> Could you please apply the following patch and provide the dmesg boot log?
> 
> And please post out the 'lscpu' log together from the test machine too.

attached.

As I said before, this seems to go away with CONFIG_NR_CPUS=64 or smaller.
We have 282 nr_cpu_ids here (max 141 CPUs on that z13 with SMT2) but only 8 cores
== 16 threads.




[-- Attachment #2: dmesg.gz --]
[-- Type: application/gzip, Size: 22264 bytes --]

[-- Attachment #3: lscpu --]
[-- Type: text/plain, Size: 808 bytes --]

Architecture:        s390x
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Big Endian
CPU(s):              16
On-line CPU(s) list: 0-15
Thread(s) per core:  2
Core(s) per socket:  8
Socket(s) per book:  3
Book(s) per drawer:  2
Drawer(s):           4
NUMA node(s):        1
Vendor ID:           IBM/S390
Machine type:        2964
CPU dynamic MHz:     5000
CPU static MHz:      5000
BogoMIPS:            20325.00
Hypervisor:          PR/SM
Hypervisor vendor:   IBM
Virtualization type: full
Dispatching mode:    horizontal
L1d cache:           128K
L1i cache:           96K
L2d cache:           2048K
L2i cache:           2048K
L3 cache:            65536K
L4 cache:            491520K
NUMA node0 CPU(s):   0-15
Flags:               esan3 zarch stfle msa ldisp eimm dfp edat etf3eh highgprs te vx sie

[-- Attachment #4: lscpu2 --]
[-- Type: text/plain, Size: 1406 bytes --]

CPU NODE DRAWER BOOK SOCKET CORE L1d:L1i:L2d:L2i ONLINE CONFIGURED POLARIZATION ADDRESS
0   0    0      0    0      0    0:0:0:0         yes    yes        horizontal   0
1   0    0      0    0      0    1:1:1:1         yes    yes        horizontal   1
2   0    0      0    0      1    2:2:2:2         yes    yes        horizontal   2
3   0    0      0    0      1    3:3:3:3         yes    yes        horizontal   3
4   0    0      0    0      2    4:4:4:4         yes    yes        horizontal   4
5   0    0      0    0      2    5:5:5:5         yes    yes        horizontal   5
6   0    0      0    0      3    6:6:6:6         yes    yes        horizontal   6
7   0    0      0    0      3    7:7:7:7         yes    yes        horizontal   7
8   0    0      0    1      4    8:8:8:8         yes    yes        horizontal   8
9   0    0      0    1      4    9:9:9:9         yes    yes        horizontal   9
10  0    0      0    1      5    10:10:10:10     yes    yes        horizontal   10
11  0    0      0    1      5    11:11:11:11     yes    yes        horizontal   11
12  0    0      0    1      6    12:12:12:12     yes    yes        horizontal   12
13  0    0      0    1      6    13:13:13:13     yes    yes        horizontal   13
14  0    0      0    1      7    14:14:14:14     yes    yes        horizontal   14
15  0    0      0    1      7    15:15:15:15     yes    yes        horizontal   15

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues()
  2018-04-05 17:39                                   ` Christian Borntraeger
@ 2018-04-05 17:43                                     ` Christian Borntraeger
  2018-04-06  8:41                                     ` Ming Lei
  1 sibling, 0 replies; 40+ messages in thread
From: Christian Borntraeger @ 2018-04-05 17:43 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Stefan Haberland,
	Christoph Hellwig



On 04/05/2018 07:39 PM, Christian Borntraeger wrote:
> 
> 
> On 04/05/2018 06:11 PM, Ming Lei wrote:
>>>
>>> Could you please apply the following patch and provide the dmesg boot log?
>>
>> And please post out the 'lscpu' log together from the test machine too.
> 
> attached.
> 
> As I said before this seems to go way with CONFIG_NR_CPUS=64 or smaller.
> We have 282 nr_cpu_ids here (max 141CPUs on that z13 with SMT2) but only 8 Cores
> == 16 threads.

To say it differently:
The whole system has up to 141 CPUs, but this LPAR has only 8 CPUs assigned. So we
have 16 CPUs (SMT), but this could become up to 282 if I were to do CPU hotplug. (But
that is not used here.)

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues()
  2018-04-05 16:05                               ` Ming Lei
  2018-04-05 16:11                                 ` Ming Lei
@ 2018-04-06  8:35                                 ` Christian Borntraeger
  1 sibling, 0 replies; 40+ messages in thread
From: Christian Borntraeger @ 2018-04-06  8:35 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Stefan Haberland,
	Christoph Hellwig



On 04/05/2018 06:05 PM, Ming Lei wrote:
[...]
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 90838e998f66..996f8a963026 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -1324,9 +1324,18 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
>  	 */
>  	if (!cpumask_test_cpu(raw_smp_processor_id(), hctx->cpumask) &&
>  		cpu_online(hctx->next_cpu)) {
> -		printk(KERN_WARNING "run queue from wrong CPU %d, hctx %s\n",
> -			raw_smp_processor_id(),
> +		int cpu;
> +		printk(KERN_WARNING "run queue from wrong CPU %d/%d, hctx-%d %s\n",
> +			raw_smp_processor_id(), hctx->next_cpu,
> +			hctx->queue_num,
>  			cpumask_empty(hctx->cpumask) ? "inactive": "active");
> +		printk("dump CPUs mapped to this hctx:\n");
> +		for_each_cpu(cpu, hctx->cpumask)
> +			printk("%d ", cpu);
> +		printk("\n");
> +		printk("nr_cpu_ids is %d, and dump online cpus:\n", nr_cpu_ids);
> +		for_each_cpu(cpu, cpu_online_mask)
> +			printk("%d ", cpu);
>  		dump_stack();
>  	}
> 

FWIW, with things like

[    4.049828] dump CPUs mapped to this hctx:
[    4.049829] 18 
[    4.049829] 82 
[    4.049830] 146 
[    4.049830] 210 
[    4.049831] 274 

[    4.049832] nr_cpu_ids is 282, and dump online cpus:
[    4.049833] 0 
[    4.049833] 1 
[    4.049834] 2 
[    4.049834] 3 
[    4.049835] 4 
[    4.049835] 5 
[    4.049836] 6 
[    4.049836] 7 
[    4.049837] 8 
[    4.049837] 9 
[    4.049838] 10 
[    4.049839] 11 
[    4.049839] 12 
[    4.049840] 13 
[    4.049840] 14 
[    4.049841] 15 

So the hctx has only "possible CPUs", but all are offline.

Doesn't that always make this run unbound? See blk_mq_hctx_next_cpu() below.

/*
 * It'd be great if the workqueue API had a way to pass
 * in a mask and had some smarts for more clever placement.
 * For now we just round-robin here, switching for every
 * BLK_MQ_CPU_WORK_BATCH queued items.
 */
static int blk_mq_hctx_next_cpu(struct blk_mq_hw_ctx *hctx)
{
        bool tried = false;

        if (hctx->queue->nr_hw_queues == 1)
                return WORK_CPU_UNBOUND;

        if (--hctx->next_cpu_batch <= 0) {
                int next_cpu;
select_cpu:     
                next_cpu = cpumask_next_and(hctx->next_cpu, hctx->cpumask,
                                cpu_online_mask);
                if (next_cpu >= nr_cpu_ids)
                        next_cpu = cpumask_first_and(hctx->cpumask,cpu_online_mask);

                /*
                 * No online CPU is found, so have to make sure hctx->next_cpu
                 * is set correctly for not breaking workqueue.
                 */
                if (next_cpu >= nr_cpu_ids)
                        hctx->next_cpu = cpumask_first(hctx->cpumask);
                else
                        hctx->next_cpu = next_cpu;
                hctx->next_cpu_batch = BLK_MQ_CPU_WORK_BATCH;
        }
        
        /*
         * Do unbound schedule if we can't find a online CPU for this hctx,
         * and it should only happen in the path of handling CPU DEAD.
         */
        if (!cpu_online(hctx->next_cpu)) {
                if (!tried) {
                        tried = true;
                        goto select_cpu;
                }

                /*
                 * Make sure to re-select CPU next time once after CPUs
                 * in hctx->cpumask become online again.
                 */
                hctx->next_cpu_batch = 1;
                return WORK_CPU_UNBOUND;
        }
        return hctx->next_cpu;
}
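
For what it's worth, a condensed standalone model of that selection (a sketch, not the kernel code: plain bitmaps, and -1 standing in for WORK_CPU_UNBOUND) suggests the answer is yes: for an hctx whose mapped CPUs (18, 82, 146, 210, 274 in the dump above) are all offline, every lookup fails and the work ends up unbound.

#include <stdbool.h>
#include <stdio.h>

#define NBITS   512		/* illustrative bitmap size */
#define UNBOUND (-1)		/* stand-in for WORK_CPU_UNBOUND */

static bool cpu_set(const unsigned long *m, int cpu)
{
	return m[cpu / 64] >> (cpu % 64) & 1;
}

/* next bit set in both masks after 'prev'; NBITS means "none found" */
static int next_and(const unsigned long *a, const unsigned long *b, int prev)
{
	for (int cpu = prev + 1; cpu < NBITS; cpu++)
		if (cpu_set(a, cpu) && cpu_set(b, cpu))
			return cpu;
	return NBITS;
}

int main(void)
{
	unsigned long hctx[NBITS / 64] = { 0 }, online[NBITS / 64] = { 0 };
	int mapped[] = { 18, 82, 146, 210, 274 };
	int next;

	for (int i = 0; i < 5; i++)
		hctx[mapped[i] / 64] |= 1UL << (mapped[i] % 64);
	online[0] = 0xffffUL;			/* CPUs 0-15 online */

	next = next_and(hctx, online, 18);	/* round-robin step from CPU 18 */
	if (next >= NBITS)
		next = next_and(hctx, online, -1);	/* wrap: like first_and */
	if (next >= NBITS)
		next = mapped[0];		/* like cpumask_first(): CPU 18, still offline */
	if (!cpu_set(online, next))
		next = UNBOUND;			/* no online CPU mapped: run unbound */
	printf("picked %d\n", next);		/* prints -1 */
	return 0;
}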

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues()
  2018-04-05 17:39                                   ` Christian Borntraeger
  2018-04-05 17:43                                     ` Christian Borntraeger
@ 2018-04-06  8:41                                     ` Ming Lei
  2018-04-06  8:51                                       ` Christian Borntraeger
  1 sibling, 1 reply; 40+ messages in thread
From: Ming Lei @ 2018-04-06  8:41 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Stefan Haberland,
	Christoph Hellwig

On Thu, Apr 05, 2018 at 07:39:56PM +0200, Christian Borntraeger wrote:
> 
> 
> On 04/05/2018 06:11 PM, Ming Lei wrote:
> >>
> >> Could you please apply the following patch and provide the dmesg boot log?
> > 
> > And please post out the 'lscpu' log together from the test machine too.
> 
> attached.
> 
> As I said before this seems to go way with CONFIG_NR_CPUS=64 or smaller.
> We have 282 nr_cpu_ids here (max 141CPUs on that z13 with SMT2) but only 8 Cores
> == 16 threads.

OK, thanks!

The weirdest thing is that hctx->next_cpu is computed as 512 when
nr_cpu_ids is only 282; hctx->next_cpu should have pointed to one of the
possible CPUs.

Looks like it is an s390-specific issue, since I can set up a queue
which has the same mapping as yours:

	- nr_cpu_ids is 282
	- CPUs 0~15 are online
	- 64-queue null_blk
	- still run all hw queues in the .complete handler

But I can't reproduce this issue at all.

So please test the following patch, which may tell us why hctx->next_cpu
is computed wrong:

---
diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c
index 9f8cffc8a701..638ab5c11b3c 100644
--- a/block/blk-mq-cpumap.c
+++ b/block/blk-mq-cpumap.c
@@ -14,13 +14,12 @@
 #include "blk.h"
 #include "blk-mq.h"
 
+/*
+ * Given there isn't CPU hotplug handler in blk-mq, map all CPUs to
+ * queues even it isn't present yet.
+ */
 static int cpu_to_queue_index(unsigned int nr_queues, const int cpu)
 {
-	/*
-	 * Non present CPU will be mapped to queue index 0.
-	 */
-	if (!cpu_present(cpu))
-		return 0;
 	return cpu % nr_queues;
 }
 
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 90838e998f66..9b130e4b87df 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1343,6 +1343,13 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
 	hctx_unlock(hctx, srcu_idx);
 }
 
+static void check_next_cpu(int next_cpu, const char *str1, const char *str2)
+{
+	if (next_cpu > nr_cpu_ids)
+		printk_ratelimited("wrong next_cpu %d, %s, %s\n",
+				next_cpu, str1, str2);
+}
+
 /*
  * It'd be great if the workqueue API had a way to pass
  * in a mask and had some smarts for more clever placement.
@@ -1352,26 +1359,29 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
 static int blk_mq_hctx_next_cpu(struct blk_mq_hw_ctx *hctx)
 {
 	bool tried = false;
+	int next_cpu = hctx->next_cpu;
 
 	if (hctx->queue->nr_hw_queues == 1)
 		return WORK_CPU_UNBOUND;
 
 	if (--hctx->next_cpu_batch <= 0) {
-		int next_cpu;
 select_cpu:
-		next_cpu = cpumask_next_and(hctx->next_cpu, hctx->cpumask,
+		next_cpu = cpumask_next_and(next_cpu, hctx->cpumask,
 				cpu_online_mask);
-		if (next_cpu >= nr_cpu_ids)
+		check_next_cpu(next_cpu, __func__, "next_and");
+		if (next_cpu >= nr_cpu_ids) {
 			next_cpu = cpumask_first_and(hctx->cpumask,cpu_online_mask);
+			check_next_cpu(next_cpu, __func__, "first_and");
+		}
 
 		/*
 		 * No online CPU is found, so have to make sure hctx->next_cpu
 		 * is set correctly for not breaking workqueue.
 		 */
-		if (next_cpu >= nr_cpu_ids)
-			hctx->next_cpu = cpumask_first(hctx->cpumask);
-		else
-			hctx->next_cpu = next_cpu;
+		if (next_cpu >= nr_cpu_ids) {
+			next_cpu = cpumask_first(hctx->cpumask);
+			check_next_cpu(next_cpu, __func__, "first");
+		}
 		hctx->next_cpu_batch = BLK_MQ_CPU_WORK_BATCH;
 	}
 
@@ -1379,7 +1389,7 @@ static int blk_mq_hctx_next_cpu(struct blk_mq_hw_ctx *hctx)
 	 * Do unbound schedule if we can't find a online CPU for this hctx,
 	 * and it should only happen in the path of handling CPU DEAD.
 	 */
-	if (!cpu_online(hctx->next_cpu)) {
+	if (!cpu_online(next_cpu)) {
 		if (!tried) {
 			tried = true;
 			goto select_cpu;
@@ -1392,7 +1402,9 @@ static int blk_mq_hctx_next_cpu(struct blk_mq_hw_ctx *hctx)
 		hctx->next_cpu_batch = 1;
 		return WORK_CPU_UNBOUND;
 	}
-	return hctx->next_cpu;
+
+	hctx->next_cpu = next_cpu;
+	return next_cpu;
 }
 
 static void __blk_mq_delay_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async,
@@ -2408,6 +2420,8 @@ static void blk_mq_map_swqueue(struct request_queue *q)
 	mutex_unlock(&q->sysfs_lock);
 
 	queue_for_each_hw_ctx(q, hctx, i) {
+		int next_cpu;
+
 		/*
 		 * If no software queues are mapped to this hardware queue,
 		 * disable it and free the request entries.
@@ -2437,8 +2451,10 @@ static void blk_mq_map_swqueue(struct request_queue *q)
 		/*
 		 * Initialize batch roundrobin counts
 		 */
-		hctx->next_cpu = cpumask_first_and(hctx->cpumask,
+		next_cpu = cpumask_first_and(hctx->cpumask,
 				cpu_online_mask);
+		check_next_cpu(next_cpu, __func__, "first_and");
+		hctx->next_cpu = next_cpu;
 		hctx->next_cpu_batch = BLK_MQ_CPU_WORK_BATCH;
 	}
 }


Thanks,
Ming

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues()
  2018-04-06  8:41                                     ` Ming Lei
@ 2018-04-06  8:51                                       ` Christian Borntraeger
  2018-04-06  8:53                                         ` Christian Borntraeger
  2018-04-06  9:23                                         ` Ming Lei
  0 siblings, 2 replies; 40+ messages in thread
From: Christian Borntraeger @ 2018-04-06  8:51 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Stefan Haberland,
	Christoph Hellwig



On 04/06/2018 10:41 AM, Ming Lei wrote:
> On Thu, Apr 05, 2018 at 07:39:56PM +0200, Christian Borntraeger wrote:
>>
>>
>> On 04/05/2018 06:11 PM, Ming Lei wrote:
>>>>
>>>> Could you please apply the following patch and provide the dmesg boot log?
>>>
>>> And please post out the 'lscpu' log together from the test machine too.
>>
>> attached.
>>
>> As I said before this seems to go way with CONFIG_NR_CPUS=64 or smaller.
>> We have 282 nr_cpu_ids here (max 141CPUs on that z13 with SMT2) but only 8 Cores
>> == 16 threads.
> 
> OK, thanks!
> 
> The most weird thing is that hctx->next_cpu is computed as 512 since
> nr_cpu_id is 282, and hctx->next_cpu should have pointed to one of
> possible CPU.
> 
> Looks like it is a s390 specific issue, since I can setup one queue
> which has same mapping with yours:
> 
> 	- nr_cpu_id is 282
> 	- CPU 0~15 is online
> 	- 64 queues null_blk
> 	- still run all hw queues in .complete handler
> 
> But can't reproduce this issue at all.
> 
> So please test the following patch, which may tell us why hctx->next_cpu
> is computed wrong:

I see things like

[    8.196907] wrong next_cpu 512, blk_mq_map_swqueue, first_and
[    8.196910] wrong next_cpu 512, blk_mq_map_swqueue, first_and
[    8.196912] wrong next_cpu 512, blk_mq_map_swqueue, first_and
[    8.196913] wrong next_cpu 512, blk_mq_map_swqueue, first_and
[    8.196914] wrong next_cpu 512, blk_mq_map_swqueue, first_and
[    8.196915] wrong next_cpu 512, blk_mq_map_swqueue, first_and
[    8.196916] wrong next_cpu 512, blk_mq_map_swqueue, first_and
[    8.196916] wrong next_cpu 512, blk_mq_map_swqueue, first_and
[    8.196917] wrong next_cpu 512, blk_mq_map_swqueue, first_and
[    8.196918] wrong next_cpu 512, blk_mq_map_swqueue, first_and

which is exactly what happens when the find-first-and operation fails (it returns the size of the bitmap).
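
To make that concrete, here is a minimal sketch (not kernel code; the 512-bit bitmap width is an assumption that matches the value in the logs) of why "not found" shows up as 512 rather than anything below nr_cpu_ids = 282:

#include <stdio.h>

#define BITMAP_BITS 512		/* bitmap sized for CONFIG_NR_CPUS (assumed) */
#define NR_CPU_IDS  282		/* nr_cpu_ids reported on this machine */

/* "first bit set in a & b", with "no bit" reported as the bitmap size */
static int find_first_and(unsigned long a, unsigned long b)
{
	unsigned long both = a & b;

	for (int bit = 0; bit < 64; bit++)
		if (both >> bit & 1)
			return bit;
	return BITMAP_BITS;
}

int main(void)
{
	/* hctx->cpumask has a single offline CPU (say 16); online mask is CPUs 0-15 */
	int ret = find_first_and(0x10000UL, 0xffffUL);

	printf("ret=%d, nr_cpu_ids=%d\n", ret, NR_CPU_IDS);	/* 512 > 282 */
	return 0;
}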

FWIW, I added a dump_stack() for the case where we run unbound, before I tested your patch:

Apr 06 10:47:41 s38lp39 kernel: CPU: 15 PID: 86 Comm: ksoftirqd/15 Not tainted 4.16.0-07249-g864f9fc031e4-dirty #2
Apr 06 10:47:41 s38lp39 kernel: Hardware name: IBM 2964 NC9 704 (LPAR)
Apr 06 10:47:41 s38lp39 kernel: Call Trace:
Apr 06 10:47:41 s38lp39 kernel: ([<0000000000113946>] show_stack+0x56/0x80)
Apr 06 10:47:41 s38lp39 kernel:  [<00000000009d8132>] dump_stack+0x82/0xb0 
Apr 06 10:47:41 s38lp39 kernel:  [<00000000006a05de>] blk_mq_hctx_next_cpu+0x12e/0x138 
Apr 06 10:47:41 s38lp39 kernel:  [<00000000006a084c>] __blk_mq_delay_run_hw_queue+0x94/0xd8 
Apr 06 10:47:41 s38lp39 kernel:  [<00000000006a097a>] blk_mq_run_hw_queue+0x82/0x180 
Apr 06 10:47:41 s38lp39 kernel:  [<00000000006a0ae0>] blk_mq_run_hw_queues+0x68/0x88 
Apr 06 10:47:41 s38lp39 kernel:  [<000000000069fc4e>] __blk_mq_complete_request+0x11e/0x1d8 
Apr 06 10:47:41 s38lp39 kernel:  [<000000000069fd94>] blk_mq_complete_request+0x8c/0xc8 
Apr 06 10:47:41 s38lp39 kernel:  [<0000000000824c50>] dasd_block_tasklet+0x158/0x490 
Apr 06 10:47:41 s38lp39 kernel:  [<000000000014a952>] tasklet_action_common.isra.5+0x7a/0x100 
Apr 06 10:47:41 s38lp39 kernel:  [<00000000009f8248>] __do_softirq+0x98/0x368 
Apr 06 10:47:41 s38lp39 kernel:  [<000000000014a322>] run_ksoftirqd+0x4a/0x68 
Apr 06 10:47:41 s38lp39 kernel:  [<000000000016dc20>] smpboot_thread_fn+0x108/0x1b0 
Apr 06 10:47:41 s38lp39 kernel:  [<0000000000168e70>] kthread+0x148/0x160 
Apr 06 10:47:41 s38lp39 kernel:  [<00000000009f727a>] kernel_thread_starter+0x6/0xc 
Apr 06 10:47:41 s38lp39 kernel:  [<00000000009f7274>] kernel_thread_starter+0x0/0xc

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues()
  2018-04-06  8:51                                       ` Christian Borntraeger
@ 2018-04-06  8:53                                         ` Christian Borntraeger
  2018-04-06  9:23                                         ` Ming Lei
  1 sibling, 0 replies; 40+ messages in thread
From: Christian Borntraeger @ 2018-04-06  8:53 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Stefan Haberland,
	Christoph Hellwig



On 04/06/2018 10:51 AM, Christian Borntraeger wrote:
> 
> 
> On 04/06/2018 10:41 AM, Ming Lei wrote:
>> On Thu, Apr 05, 2018 at 07:39:56PM +0200, Christian Borntraeger wrote:
>>>
>>>
>>> On 04/05/2018 06:11 PM, Ming Lei wrote:
>>>>>
>>>>> Could you please apply the following patch and provide the dmesg boot log?
>>>>
>>>> And please post out the 'lscpu' log together from the test machine too.
>>>
>>> attached.
>>>
>>> As I said before this seems to go way with CONFIG_NR_CPUS=64 or smaller.
>>> We have 282 nr_cpu_ids here (max 141CPUs on that z13 with SMT2) but only 8 Cores
>>> == 16 threads.
>>
>> OK, thanks!
>>
>> The most weird thing is that hctx->next_cpu is computed as 512 since
>> nr_cpu_id is 282, and hctx->next_cpu should have pointed to one of
>> possible CPU.
>>
>> Looks like it is a s390 specific issue, since I can setup one queue
>> which has same mapping with yours:
>>
>> 	- nr_cpu_id is 282
>> 	- CPU 0~15 is online
>> 	- 64 queues null_blk
>> 	- still run all hw queues in .complete handler
>>
>> But can't reproduce this issue at all.
>>
>> So please test the following patch, which may tell us why hctx->next_cpu
>> is computed wrong:
> 
> I see things like
> 
> [    8.196907] wrong next_cpu 512, blk_mq_map_swqueue, first_and
> [    8.196910] wrong next_cpu 512, blk_mq_map_swqueue, first_and
> [    8.196912] wrong next_cpu 512, blk_mq_map_swqueue, first_and
> [    8.196913] wrong next_cpu 512, blk_mq_map_swqueue, first_and
> [    8.196914] wrong next_cpu 512, blk_mq_map_swqueue, first_and
> [    8.196915] wrong next_cpu 512, blk_mq_map_swqueue, first_and
> [    8.196916] wrong next_cpu 512, blk_mq_map_swqueue, first_and
> [    8.196916] wrong next_cpu 512, blk_mq_map_swqueue, first_and
> [    8.196917] wrong next_cpu 512, blk_mq_map_swqueue, first_and
> [    8.196918] wrong next_cpu 512, blk_mq_map_swqueue, first_and

There are more:

# dmesg  | grep "wrong next"  | cut -d "]" -f 2- | uniq -c
     10  wrong next_cpu 512, blk_mq_map_swqueue, first_and
     72  wrong next_cpu 512, blk_mq_hctx_next_cpu, next_and
      1  wrong next_cpu 512, blk_mq_hctx_next_cpu, first_and
      1  wrong next_cpu 512, blk_mq_hctx_next_cpu, next_and
      1  wrong next_cpu 512, blk_mq_hctx_next_cpu, first_and
      1  wrong next_cpu 512, blk_mq_hctx_next_cpu, next_and
      1  wrong next_cpu 512, blk_mq_hctx_next_cpu, first_and
      1  wrong next_cpu 512, blk_mq_hctx_next_cpu, next_and
      1  wrong next_cpu 512, blk_mq_hctx_next_cpu, first_and
      1  wrong next_cpu 512, blk_mq_hctx_next_cpu, next_and
      1  wrong next_cpu 512, blk_mq_hctx_next_cpu, first_and
      1  wrong next_cpu 512, blk_mq_hctx_next_cpu, next_and
      1  wrong next_cpu 512, blk_mq_hctx_next_cpu, first_and
      7  wrong next_cpu 512, blk_mq_hctx_next_cpu, next_and
      1  wrong next_cpu 512, blk_mq_hctx_next_cpu, first_and
      1  wrong next_cpu 512, blk_mq_hctx_next_cpu, next_and
      1  wrong next_cpu 512, blk_mq_hctx_next_cpu, first_and
     10  wrong next_cpu 512, blk_mq_hctx_next_cpu, next_and


> 
> which is exactly what happens if the find and and operation fails (returns size of bitmap).
> 
> FWIW, I added a dump stack for the case when we run unbound before I tested your patch:
> 
> Apr 06 10:47:41 s38lp39 kernel: CPU: 15 PID: 86 Comm: ksoftirqd/15 Not tainted 4.16.0-07249-g864f9fc031e4-dirty #2
> Apr 06 10:47:41 s38lp39 kernel: Hardware name: IBM 2964 NC9 704 (LPAR)
> Apr 06 10:47:41 s38lp39 kernel: Call Trace:
> Apr 06 10:47:41 s38lp39 kernel: ([<0000000000113946>] show_stack+0x56/0x80)
> Apr 06 10:47:41 s38lp39 kernel:  [<00000000009d8132>] dump_stack+0x82/0xb0 
> Apr 06 10:47:41 s38lp39 kernel:  [<00000000006a05de>] blk_mq_hctx_next_cpu+0x12e/0x138 
> Apr 06 10:47:41 s38lp39 kernel:  [<00000000006a084c>] __blk_mq_delay_run_hw_queue+0x94/0xd8 
> Apr 06 10:47:41 s38lp39 kernel:  [<00000000006a097a>] blk_mq_run_hw_queue+0x82/0x180 
> Apr 06 10:47:41 s38lp39 kernel:  [<00000000006a0ae0>] blk_mq_run_hw_queues+0x68/0x88 
> Apr 06 10:47:41 s38lp39 kernel:  [<000000000069fc4e>] __blk_mq_complete_request+0x11e/0x1d8 
> Apr 06 10:47:41 s38lp39 kernel:  [<000000000069fd94>] blk_mq_complete_request+0x8c/0xc8 
> Apr 06 10:47:41 s38lp39 kernel:  [<0000000000824c50>] dasd_block_tasklet+0x158/0x490 
> Apr 06 10:47:41 s38lp39 kernel:  [<000000000014a952>] tasklet_action_common.isra.5+0x7a/0x100 
> Apr 06 10:47:41 s38lp39 kernel:  [<00000000009f8248>] __do_softirq+0x98/0x368 
> Apr 06 10:47:41 s38lp39 kernel:  [<000000000014a322>] run_ksoftirqd+0x4a/0x68 
> Apr 06 10:47:41 s38lp39 kernel:  [<000000000016dc20>] smpboot_thread_fn+0x108/0x1b0 
> Apr 06 10:47:41 s38lp39 kernel:  [<0000000000168e70>] kthread+0x148/0x160 
> Apr 06 10:47:41 s38lp39 kernel:  [<00000000009f727a>] kernel_thread_starter+0x6/0xc 
> Apr 06 10:47:41 s38lp39 kernel:  [<00000000009f7274>] kernel_thread_starter+0x0/0xc
> 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues()
  2018-04-06  8:51                                       ` Christian Borntraeger
  2018-04-06  8:53                                         ` Christian Borntraeger
@ 2018-04-06  9:23                                         ` Ming Lei
  2018-04-06 10:19                                           ` Christian Borntraeger
  2018-04-06 11:37                                           ` Christian Borntraeger
  1 sibling, 2 replies; 40+ messages in thread
From: Ming Lei @ 2018-04-06  9:23 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Stefan Haberland,
	Christoph Hellwig

On Fri, Apr 06, 2018 at 10:51:28AM +0200, Christian Borntraeger wrote:
> 
> 
> On 04/06/2018 10:41 AM, Ming Lei wrote:
> > On Thu, Apr 05, 2018 at 07:39:56PM +0200, Christian Borntraeger wrote:
> >>
> >>
> >> On 04/05/2018 06:11 PM, Ming Lei wrote:
> >>>>
> >>>> Could you please apply the following patch and provide the dmesg boot log?
> >>>
> >>> And please post out the 'lscpu' log together from the test machine too.
> >>
> >> attached.
> >>
> >> As I said before this seems to go way with CONFIG_NR_CPUS=64 or smaller.
> >> We have 282 nr_cpu_ids here (max 141CPUs on that z13 with SMT2) but only 8 Cores
> >> == 16 threads.
> > 
> > OK, thanks!
> > 
> > The most weird thing is that hctx->next_cpu is computed as 512 since
> > nr_cpu_id is 282, and hctx->next_cpu should have pointed to one of
> > possible CPU.
> > 
> > Looks like it is a s390 specific issue, since I can setup one queue
> > which has same mapping with yours:
> > 
> > 	- nr_cpu_id is 282
> > 	- CPU 0~15 is online
> > 	- 64 queues null_blk
> > 	- still run all hw queues in .complete handler
> > 
> > But can't reproduce this issue at all.
> > 
> > So please test the following patch, which may tell us why hctx->next_cpu
> > is computed wrong:
> 
> I see things like
> 
> [    8.196907] wrong next_cpu 512, blk_mq_map_swqueue, first_and
> [    8.196910] wrong next_cpu 512, blk_mq_map_swqueue, first_and
> [    8.196912] wrong next_cpu 512, blk_mq_map_swqueue, first_and
> [    8.196913] wrong next_cpu 512, blk_mq_map_swqueue, first_and
> [    8.196914] wrong next_cpu 512, blk_mq_map_swqueue, first_and
> [    8.196915] wrong next_cpu 512, blk_mq_map_swqueue, first_and
> [    8.196916] wrong next_cpu 512, blk_mq_map_swqueue, first_and
> [    8.196916] wrong next_cpu 512, blk_mq_map_swqueue, first_and
> [    8.196917] wrong next_cpu 512, blk_mq_map_swqueue, first_and
> [    8.196918] wrong next_cpu 512, blk_mq_map_swqueue, first_and
> 
> which is exactly what happens if the find and and operation fails (returns size of bitmap).

Given that both 'cpu_online_mask' and 'hctx->cpumask' are shown as correct
in your previous debug log, it means the following function returns a
totally wrong result on s390:

	cpumask_first_and(hctx->cpumask, cpu_online_mask);

The debugfs log shows that each hctx->cpumask includes one online
CPU (0~15).

So it looks like this isn't an issue in the blk-mq core.

Thanks,
Ming

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues()
  2018-04-06  9:23                                         ` Ming Lei
@ 2018-04-06 10:19                                           ` Christian Borntraeger
  2018-04-06 13:41                                             ` Ming Lei
  2018-04-06 11:37                                           ` Christian Borntraeger
  1 sibling, 1 reply; 40+ messages in thread
From: Christian Borntraeger @ 2018-04-06 10:19 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Stefan Haberland,
	Christoph Hellwig



On 04/06/2018 11:23 AM, Ming Lei wrote:
> On Fri, Apr 06, 2018 at 10:51:28AM +0200, Christian Borntraeger wrote:
>>
>>
>> On 04/06/2018 10:41 AM, Ming Lei wrote:
>>> On Thu, Apr 05, 2018 at 07:39:56PM +0200, Christian Borntraeger wrote:
>>>>
>>>>
>>>> On 04/05/2018 06:11 PM, Ming Lei wrote:
>>>>>>
>>>>>> Could you please apply the following patch and provide the dmesg boot log?
>>>>>
>>>>> And please post out the 'lscpu' log together from the test machine too.
>>>>
>>>> attached.
>>>>
>>>> As I said before this seems to go way with CONFIG_NR_CPUS=64 or smaller.
>>>> We have 282 nr_cpu_ids here (max 141CPUs on that z13 with SMT2) but only 8 Cores
>>>> == 16 threads.
>>>
>>> OK, thanks!
>>>
>>> The most weird thing is that hctx->next_cpu is computed as 512 since
>>> nr_cpu_id is 282, and hctx->next_cpu should have pointed to one of
>>> possible CPU.
>>>
>>> Looks like it is a s390 specific issue, since I can setup one queue
>>> which has same mapping with yours:
>>>
>>> 	- nr_cpu_id is 282
>>> 	- CPU 0~15 is online
>>> 	- 64 queues null_blk
>>> 	- still run all hw queues in .complete handler
>>>
>>> But can't reproduce this issue at all.
>>>
>>> So please test the following patch, which may tell us why hctx->next_cpu
>>> is computed wrong:
>>
>> I see things like
>>
>> [    8.196907] wrong next_cpu 512, blk_mq_map_swqueue, first_and
>> [    8.196910] wrong next_cpu 512, blk_mq_map_swqueue, first_and
>> [    8.196912] wrong next_cpu 512, blk_mq_map_swqueue, first_and
>> [    8.196913] wrong next_cpu 512, blk_mq_map_swqueue, first_and
>> [    8.196914] wrong next_cpu 512, blk_mq_map_swqueue, first_and
>> [    8.196915] wrong next_cpu 512, blk_mq_map_swqueue, first_and
>> [    8.196916] wrong next_cpu 512, blk_mq_map_swqueue, first_and
>> [    8.196916] wrong next_cpu 512, blk_mq_map_swqueue, first_and
>> [    8.196917] wrong next_cpu 512, blk_mq_map_swqueue, first_and
>> [    8.196918] wrong next_cpu 512, blk_mq_map_swqueue, first_and
>>
>> which is exactly what happens if the find and and operation fails (returns size of bitmap).
> 
> Given both 'cpu_online_mask' and 'hctx->cpumask' are shown as correct
> in your previous debug log, it means the following function returns
> totally wrong result on S390.
> 
> 	cpumask_first_and(hctx->cpumask, cpu_online_mask);
> 
> The debugfs log shows that each hctx->cpumask includes one online
> CPU(0~15).

Really? The last log (with the latest patch applied) shows a lot of contexts
that do not have CPUs in 0-15:

e.g. 
[    4.049828] dump CPUs mapped to this hctx:
[    4.049829] 18 
[    4.049829] 82 
[    4.049830] 146 
[    4.049830] 210 
[    4.049831] 274 

> 
> So looks it isn't one issue in block MQ core.
> 
> Thanks,
> Ming
> 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues()
  2018-04-06  9:23                                         ` Ming Lei
  2018-04-06 10:19                                           ` Christian Borntraeger
@ 2018-04-06 11:37                                           ` Christian Borntraeger
  1 sibling, 0 replies; 40+ messages in thread
From: Christian Borntraeger @ 2018-04-06 11:37 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Stefan Haberland,
	Christoph Hellwig



On 04/06/2018 11:23 AM, Ming Lei wrote:
> On Fri, Apr 06, 2018 at 10:51:28AM +0200, Christian Borntraeger wrote:
>>
>>
>> On 04/06/2018 10:41 AM, Ming Lei wrote:
>>> On Thu, Apr 05, 2018 at 07:39:56PM +0200, Christian Borntraeger wrote:
>>>>
>>>>
>>>> On 04/05/2018 06:11 PM, Ming Lei wrote:
>>>>>>
>>>>>> Could you please apply the following patch and provide the dmesg boot log?
>>>>>
>>>>> And please post out the 'lscpu' log together from the test machine too.
>>>>
>>>> attached.
>>>>
>>>> As I said before this seems to go way with CONFIG_NR_CPUS=64 or smaller.
>>>> We have 282 nr_cpu_ids here (max 141CPUs on that z13 with SMT2) but only 8 Cores
>>>> == 16 threads.
>>>
>>> OK, thanks!
>>>
>>> The most weird thing is that hctx->next_cpu is computed as 512 since
>>> nr_cpu_id is 282, and hctx->next_cpu should have pointed to one of
>>> possible CPU.
>>>
>>> Looks like it is a s390 specific issue, since I can setup one queue
>>> which has same mapping with yours:
>>>
>>> 	- nr_cpu_id is 282
>>> 	- CPU 0~15 is online
>>> 	- 64 queues null_blk
>>> 	- still run all hw queues in .complete handler
>>>
>>> But can't reproduce this issue at all.
>>>
>>> So please test the following patch, which may tell us why hctx->next_cpu
>>> is computed wrong:
>>
>> I see things like
>>
>> [    8.196907] wrong next_cpu 512, blk_mq_map_swqueue, first_and
>> [    8.196910] wrong next_cpu 512, blk_mq_map_swqueue, first_and
>> [    8.196912] wrong next_cpu 512, blk_mq_map_swqueue, first_and
>> [    8.196913] wrong next_cpu 512, blk_mq_map_swqueue, first_and
>> [    8.196914] wrong next_cpu 512, blk_mq_map_swqueue, first_and
>> [    8.196915] wrong next_cpu 512, blk_mq_map_swqueue, first_and
>> [    8.196916] wrong next_cpu 512, blk_mq_map_swqueue, first_and
>> [    8.196916] wrong next_cpu 512, blk_mq_map_swqueue, first_and
>> [    8.196917] wrong next_cpu 512, blk_mq_map_swqueue, first_and
>> [    8.196918] wrong next_cpu 512, blk_mq_map_swqueue, first_and
>>
>> which is exactly what happens if the find and and operation fails (returns size of bitmap).
> 
> Given both 'cpu_online_mask' and 'hctx->cpumask' are shown as correct
> in your previous debug log, it means the following function returns
> totally wrong result on S390.
> 
> 	cpumask_first_and(hctx->cpumask, cpu_online_mask);
> 
> The debugfs log shows that each hctx->cpumask includes one online
> CPU(0~15).
> 
> So looks it isn't one issue in block MQ core.

So I checked further and printed the masks. I think I can ignore the next_and cases: it is totally
valid to get 512 there (we might start with an offset that is already the last CPU and need
to wrap around with first_and).

So the first_and and plain first cases are really the interesting ones. And I think the code is perfectly
right; there is no bit left after the AND for these cases:


[    3.220021] wrong next_cpu 512, blk_mq_map_swqueue, first_and
[    3.220023] 1: 0000000000010000 0000000000010000 0000000000010000 0000000000010000 0000000000010000 0000000000000000 0000000000000000 0000000000000000
[    3.220025] 2: 000000000000ffff 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[    3.220027] wrong next_cpu 512, blk_mq_map_swqueue, first_and
[    3.220028] 1: 0000000000020000 0000000000020000 0000000000020000 0000000000020000 0000000000020000 0000000000000000 0000000000000000 0000000000000000
[    3.220030] wrong next_cpu 512, blk_mq_map_swqueue, first_and
[    3.220032] 1: 0000000000010000 0000000000010000 0000000000010000 0000000000010000 0000000000010000 0000000000000000 0000000000000000 0000000000000000
[    3.220033] 2: 000000000000ffff 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[    3.220035] wrong next_cpu 512, blk_mq_map_swqueue, first_and
[    3.220036] 2: 000000000000ffff 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[    3.220037] wrong next_cpu 512, blk_mq_map_swqueue, first_and
[    3.220039] 1: 0000000000040000 0000000000040000 0000000000040000 0000000000040000 0000000000040000 0000000000000000 0000000000000000 0000000000000000
[    3.220040] 2: 000000000000ffff 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[    3.220042] wrong next_cpu 512, blk_mq_map_swqueue, first_and
[    3.220062] 1: 0000000000020000 0000000000020000 0000000000020000 0000000000020000 0000000000020000 0000000000000000 0000000000000000 0000000000000000
[    3.220063] 2: 000000000000ffff 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[    3.220064] wrong next_cpu 512, blk_mq_map_swqueue, first_and
[    3.220066] 1: 0000000000080000 0000000000080000 0000000000080000 0000000000080000 0000000000080000 0000000000000000 0000000000000000 0000000000000000
[    3.220067] 2: 000000000000ffff 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues()
  2018-04-06 10:19                                           ` Christian Borntraeger
@ 2018-04-06 13:41                                             ` Ming Lei
  2018-04-06 14:26                                               ` Christian Borntraeger
  0 siblings, 1 reply; 40+ messages in thread
From: Ming Lei @ 2018-04-06 13:41 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Stefan Haberland,
	Christoph Hellwig

On Fri, Apr 06, 2018 at 12:19:19PM +0200, Christian Borntraeger wrote:
> 
> 
> On 04/06/2018 11:23 AM, Ming Lei wrote:
> > On Fri, Apr 06, 2018 at 10:51:28AM +0200, Christian Borntraeger wrote:
> >>
> >>
> >> On 04/06/2018 10:41 AM, Ming Lei wrote:
> >>> On Thu, Apr 05, 2018 at 07:39:56PM +0200, Christian Borntraeger wrote:
> >>>>
> >>>>
> >>>> On 04/05/2018 06:11 PM, Ming Lei wrote:
> >>>>>>
> >>>>>> Could you please apply the following patch and provide the dmesg boot log?
> >>>>>
> >>>>> And please post out the 'lscpu' log together from the test machine too.
> >>>>
> >>>> attached.
> >>>>
> >>>> As I said before this seems to go way with CONFIG_NR_CPUS=64 or smaller.
> >>>> We have 282 nr_cpu_ids here (max 141CPUs on that z13 with SMT2) but only 8 Cores
> >>>> == 16 threads.
> >>>
> >>> OK, thanks!
> >>>
> >>> The most weird thing is that hctx->next_cpu is computed as 512 since
> >>> nr_cpu_id is 282, and hctx->next_cpu should have pointed to one of
> >>> possible CPU.
> >>>
> >>> Looks like it is a s390 specific issue, since I can setup one queue
> >>> which has same mapping with yours:
> >>>
> >>> 	- nr_cpu_id is 282
> >>> 	- CPU 0~15 is online
> >>> 	- 64 queues null_blk
> >>> 	- still run all hw queues in .complete handler
> >>>
> >>> But can't reproduce this issue at all.
> >>>
> >>> So please test the following patch, which may tell us why hctx->next_cpu
> >>> is computed wrong:
> >>
> >> I see things like
> >>
> >> [    8.196907] wrong next_cpu 512, blk_mq_map_swqueue, first_and
> >> [    8.196910] wrong next_cpu 512, blk_mq_map_swqueue, first_and
> >> [    8.196912] wrong next_cpu 512, blk_mq_map_swqueue, first_and
> >> [    8.196913] wrong next_cpu 512, blk_mq_map_swqueue, first_and
> >> [    8.196914] wrong next_cpu 512, blk_mq_map_swqueue, first_and
> >> [    8.196915] wrong next_cpu 512, blk_mq_map_swqueue, first_and
> >> [    8.196916] wrong next_cpu 512, blk_mq_map_swqueue, first_and
> >> [    8.196916] wrong next_cpu 512, blk_mq_map_swqueue, first_and
> >> [    8.196917] wrong next_cpu 512, blk_mq_map_swqueue, first_and
> >> [    8.196918] wrong next_cpu 512, blk_mq_map_swqueue, first_and
> >>
> >> which is exactly what happens if the find and and operation fails (returns size of bitmap).
> > 
> > Given both 'cpu_online_mask' and 'hctx->cpumask' are shown as correct
> > in your previous debug log, it means the following function returns
> > totally wrong result on S390.
> > 
> > 	cpumask_first_and(hctx->cpumask, cpu_online_mask);
> > 
> > The debugfs log shows that each hctx->cpumask includes one online
> > CPU(0~15).
> 
> Really? the last log (with the latest patch applied  shows a lot of contexts
> that do not have CPUs in 0-15:
> 
> e.g. 
> [    4.049828] dump CPUs mapped to this hctx:
> [    4.049829] 18 
> [    4.049829] 82 
> [    4.049830] 146 
> [    4.049830] 210 
> [    4.049831] 274 

That won't be an issue: since no IO can be submitted from these offline
CPUs, these hctxs shouldn't have been run at all.

But hctx->next_cpu can still be set to 512 for these inactive hctxs in
blk_mq_map_swqueue(), so please test the attached patch; if
hctx->next_cpu is still set to 512, something is still wrong.

---

diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c
index 9f8cffc8a701..638ab5c11b3c 100644
--- a/block/blk-mq-cpumap.c
+++ b/block/blk-mq-cpumap.c
@@ -14,13 +14,12 @@
 #include "blk.h"
 #include "blk-mq.h"
 
+/*
+ * Given there isn't CPU hotplug handler in blk-mq, map all CPUs to
+ * queues even it isn't present yet.
+ */
 static int cpu_to_queue_index(unsigned int nr_queues, const int cpu)
 {
-	/*
-	 * Non present CPU will be mapped to queue index 0.
-	 */
-	if (!cpu_present(cpu))
-		return 0;
 	return cpu % nr_queues;
 }
 
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 90838e998f66..1a834d96a718 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1343,6 +1343,13 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
 	hctx_unlock(hctx, srcu_idx);
 }
 
+static void check_next_cpu(int next_cpu, const char *str1, const char *str2)
+{
+	if (next_cpu > nr_cpu_ids)
+		printk_ratelimited("wrong next_cpu %d, %s, %s\n",
+				next_cpu, str1, str2);
+}
+
 /*
  * It'd be great if the workqueue API had a way to pass
  * in a mask and had some smarts for more clever placement.
@@ -1352,26 +1359,29 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
 static int blk_mq_hctx_next_cpu(struct blk_mq_hw_ctx *hctx)
 {
 	bool tried = false;
+	int next_cpu = hctx->next_cpu;
 
 	if (hctx->queue->nr_hw_queues == 1)
 		return WORK_CPU_UNBOUND;
 
 	if (--hctx->next_cpu_batch <= 0) {
-		int next_cpu;
 select_cpu:
-		next_cpu = cpumask_next_and(hctx->next_cpu, hctx->cpumask,
+		next_cpu = cpumask_next_and(next_cpu, hctx->cpumask,
 				cpu_online_mask);
-		if (next_cpu >= nr_cpu_ids)
+		check_next_cpu(next_cpu, __func__, "next_and");
+		if (next_cpu >= nr_cpu_ids) {
 			next_cpu = cpumask_first_and(hctx->cpumask,cpu_online_mask);
+			check_next_cpu(next_cpu, __func__, "first_and");
+		}
 
 		/*
 		 * No online CPU is found, so have to make sure hctx->next_cpu
 		 * is set correctly for not breaking workqueue.
 		 */
-		if (next_cpu >= nr_cpu_ids)
-			hctx->next_cpu = cpumask_first(hctx->cpumask);
-		else
-			hctx->next_cpu = next_cpu;
+		if (next_cpu >= nr_cpu_ids) {
+			next_cpu = cpumask_first(hctx->cpumask);
+			check_next_cpu(next_cpu, __func__, "first");
+		}
 		hctx->next_cpu_batch = BLK_MQ_CPU_WORK_BATCH;
 	}
 
@@ -1379,7 +1389,7 @@ static int blk_mq_hctx_next_cpu(struct blk_mq_hw_ctx *hctx)
 	 * Do unbound schedule if we can't find a online CPU for this hctx,
 	 * and it should only happen in the path of handling CPU DEAD.
 	 */
-	if (!cpu_online(hctx->next_cpu)) {
+	if (!cpu_online(next_cpu)) {
 		if (!tried) {
 			tried = true;
 			goto select_cpu;
@@ -1392,7 +1402,9 @@ static int blk_mq_hctx_next_cpu(struct blk_mq_hw_ctx *hctx)
 		hctx->next_cpu_batch = 1;
 		return WORK_CPU_UNBOUND;
 	}
-	return hctx->next_cpu;
+
+	hctx->next_cpu = next_cpu;
+	return next_cpu;
 }
 
 static void __blk_mq_delay_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async,
@@ -2408,6 +2420,8 @@ static void blk_mq_map_swqueue(struct request_queue *q)
 	mutex_unlock(&q->sysfs_lock);
 
 	queue_for_each_hw_ctx(q, hctx, i) {
+		int next_cpu;
+
 		/*
 		 * If no software queues are mapped to this hardware queue,
 		 * disable it and free the request entries.
@@ -2437,8 +2451,12 @@ static void blk_mq_map_swqueue(struct request_queue *q)
 		/*
 		 * Initialize batch roundrobin counts
 		 */
-		hctx->next_cpu = cpumask_first_and(hctx->cpumask,
+		next_cpu = cpumask_first_and(hctx->cpumask,
 				cpu_online_mask);
+		if (next_cpu >= nr_cpu_ids)
+			next_cpu = cpumask_first(hctx->cpumask);
+		check_next_cpu(next_cpu, __func__, "first_and");
+		hctx->next_cpu = next_cpu;
 		hctx->next_cpu_batch = BLK_MQ_CPU_WORK_BATCH;
 	}
 }
Thanks,
Ming

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues()
  2018-04-06 13:41                                             ` Ming Lei
@ 2018-04-06 14:26                                               ` Christian Borntraeger
  2018-04-06 14:58                                                 ` Ming Lei
  0 siblings, 1 reply; 40+ messages in thread
From: Christian Borntraeger @ 2018-04-06 14:26 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Stefan Haberland,
	Christoph Hellwig



On 04/06/2018 03:41 PM, Ming Lei wrote:
> On Fri, Apr 06, 2018 at 12:19:19PM +0200, Christian Borntraeger wrote:
>>
>>
>> On 04/06/2018 11:23 AM, Ming Lei wrote:
>>> On Fri, Apr 06, 2018 at 10:51:28AM +0200, Christian Borntraeger wrote:
>>>>
>>>>
>>>> On 04/06/2018 10:41 AM, Ming Lei wrote:
>>>>> On Thu, Apr 05, 2018 at 07:39:56PM +0200, Christian Borntraeger wrote:
>>>>>>
>>>>>>
>>>>>> On 04/05/2018 06:11 PM, Ming Lei wrote:
>>>>>>>>
>>>>>>>> Could you please apply the following patch and provide the dmesg boot log?
>>>>>>>
>>>>>>> And please post out the 'lscpu' log together from the test machine too.
>>>>>>
>>>>>> attached.
>>>>>>
>>>>>> As I said before this seems to go way with CONFIG_NR_CPUS=64 or smaller.
>>>>>> We have 282 nr_cpu_ids here (max 141CPUs on that z13 with SMT2) but only 8 Cores
>>>>>> == 16 threads.
>>>>>
>>>>> OK, thanks!
>>>>>
>>>>> The most weird thing is that hctx->next_cpu is computed as 512 since
>>>>> nr_cpu_id is 282, and hctx->next_cpu should have pointed to one of
>>>>> possible CPU.
>>>>>
>>>>> Looks like it is a s390 specific issue, since I can setup one queue
>>>>> which has same mapping with yours:
>>>>>
>>>>> 	- nr_cpu_id is 282
>>>>> 	- CPU 0~15 is online
>>>>> 	- 64 queues null_blk
>>>>> 	- still run all hw queues in .complete handler
>>>>>
>>>>> But can't reproduce this issue at all.
>>>>>
>>>>> So please test the following patch, which may tell us why hctx->next_cpu
>>>>> is computed wrong:
>>>>
>>>> I see things like
>>>>
>>>> [    8.196907] wrong next_cpu 512, blk_mq_map_swqueue, first_and
>>>> [    8.196910] wrong next_cpu 512, blk_mq_map_swqueue, first_and
>>>> [    8.196912] wrong next_cpu 512, blk_mq_map_swqueue, first_and
>>>> [    8.196913] wrong next_cpu 512, blk_mq_map_swqueue, first_and
>>>> [    8.196914] wrong next_cpu 512, blk_mq_map_swqueue, first_and
>>>> [    8.196915] wrong next_cpu 512, blk_mq_map_swqueue, first_and
>>>> [    8.196916] wrong next_cpu 512, blk_mq_map_swqueue, first_and
>>>> [    8.196916] wrong next_cpu 512, blk_mq_map_swqueue, first_and
>>>> [    8.196917] wrong next_cpu 512, blk_mq_map_swqueue, first_and
>>>> [    8.196918] wrong next_cpu 512, blk_mq_map_swqueue, first_and
>>>>
>>>> which is exactly what happens if the find and and operation fails (returns size of bitmap).
>>>
>>> Given both 'cpu_online_mask' and 'hctx->cpumask' are shown as correct
>>> in your previous debug log, it means the following function returns
>>> totally wrong result on S390.
>>>
>>> 	cpumask_first_and(hctx->cpumask, cpu_online_mask);
>>>
>>> The debugfs log shows that each hctx->cpumask includes one online
>>> CPU(0~15).
>>
>> Really? the last log (with the latest patch applied  shows a lot of contexts
>> that do not have CPUs in 0-15:
>>
>> e.g. 
>> [    4.049828] dump CPUs mapped to this hctx:
>> [    4.049829] 18 
>> [    4.049829] 82 
>> [    4.049830] 146 
>> [    4.049830] 210 
>> [    4.049831] 274 
> 
> That won't be an issue, since no IO can be submitted from these offline
> CPUs, then these hctx shouldn't have been run at all.
> 
> But hctx->next_cpu can be set as 512 for these inactive hctx in
> blk_mq_map_swqueue(), then please test the attached patch, and if
> hctx->next_cpu is still set as 512, something is still wrong.


WIth this patch I no longer see the "run queue from wrong CPU x, hctx active" messages.
your debug code still triggers, though.

wrong next_cpu 512, blk_mq_hctx_next_cpu, first_and
wrong next_cpu 512, blk_mq_hctx_next_cpu, next_and

If we removed the debug code, then dmesg would be clean, it seems.


> ---
> 
> diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c
> index 9f8cffc8a701..638ab5c11b3c 100644
> --- a/block/blk-mq-cpumap.c
> +++ b/block/blk-mq-cpumap.c
> @@ -14,13 +14,12 @@
>  #include "blk.h"
>  #include "blk-mq.h"
> 
> +/*
> + * Given there isn't CPU hotplug handler in blk-mq, map all CPUs to
> + * queues even it isn't present yet.
> + */
>  static int cpu_to_queue_index(unsigned int nr_queues, const int cpu)
>  {
> -	/*
> -	 * Non present CPU will be mapped to queue index 0.
> -	 */
> -	if (!cpu_present(cpu))
> -		return 0;
>  	return cpu % nr_queues;
>  }
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 90838e998f66..1a834d96a718 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -1343,6 +1343,13 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
>  	hctx_unlock(hctx, srcu_idx);
>  }
> 
> +static void check_next_cpu(int next_cpu, const char *str1, const char *str2)
> +{
> +	if (next_cpu > nr_cpu_ids)
> +		printk_ratelimited("wrong next_cpu %d, %s, %s\n",
> +				next_cpu, str1, str2);
> +}
> +
>  /*
>   * It'd be great if the workqueue API had a way to pass
>   * in a mask and had some smarts for more clever placement.
> @@ -1352,26 +1359,29 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
>  static int blk_mq_hctx_next_cpu(struct blk_mq_hw_ctx *hctx)
>  {
>  	bool tried = false;
> +	int next_cpu = hctx->next_cpu;
> 
>  	if (hctx->queue->nr_hw_queues == 1)
>  		return WORK_CPU_UNBOUND;
> 
>  	if (--hctx->next_cpu_batch <= 0) {
> -		int next_cpu;
>  select_cpu:
> -		next_cpu = cpumask_next_and(hctx->next_cpu, hctx->cpumask,
> +		next_cpu = cpumask_next_and(next_cpu, hctx->cpumask,
>  				cpu_online_mask);
> -		if (next_cpu >= nr_cpu_ids)
> +		check_next_cpu(next_cpu, __func__, "next_and");
> +		if (next_cpu >= nr_cpu_ids) {
>  			next_cpu = cpumask_first_and(hctx->cpumask,cpu_online_mask);
> +			check_next_cpu(next_cpu, __func__, "first_and");
> +		}
> 
>  		/*
>  		 * No online CPU is found, so have to make sure hctx->next_cpu
>  		 * is set correctly for not breaking workqueue.
>  		 */
> -		if (next_cpu >= nr_cpu_ids)
> -			hctx->next_cpu = cpumask_first(hctx->cpumask);
> -		else
> -			hctx->next_cpu = next_cpu;
> +		if (next_cpu >= nr_cpu_ids) {
> +			next_cpu = cpumask_first(hctx->cpumask);
> +			check_next_cpu(next_cpu, __func__, "first");
> +		}
>  		hctx->next_cpu_batch = BLK_MQ_CPU_WORK_BATCH;
>  	}
> 
> @@ -1379,7 +1389,7 @@ static int blk_mq_hctx_next_cpu(struct blk_mq_hw_ctx *hctx)
>  	 * Do unbound schedule if we can't find a online CPU for this hctx,
>  	 * and it should only happen in the path of handling CPU DEAD.
>  	 */
> -	if (!cpu_online(hctx->next_cpu)) {
> +	if (!cpu_online(next_cpu)) {
>  		if (!tried) {
>  			tried = true;
>  			goto select_cpu;
> @@ -1392,7 +1402,9 @@ static int blk_mq_hctx_next_cpu(struct blk_mq_hw_ctx *hctx)
>  		hctx->next_cpu_batch = 1;
>  		return WORK_CPU_UNBOUND;
>  	}
> -	return hctx->next_cpu;
> +
> +	hctx->next_cpu = next_cpu;
> +	return next_cpu;
>  }
> 
>  static void __blk_mq_delay_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async,
> @@ -2408,6 +2420,8 @@ static void blk_mq_map_swqueue(struct request_queue *q)
>  	mutex_unlock(&q->sysfs_lock);
> 
>  	queue_for_each_hw_ctx(q, hctx, i) {
> +		int next_cpu;
> +
>  		/*
>  		 * If no software queues are mapped to this hardware queue,
>  		 * disable it and free the request entries.
> @@ -2437,8 +2451,12 @@ static void blk_mq_map_swqueue(struct request_queue *q)
>  		/*
>  		 * Initialize batch roundrobin counts
>  		 */
> -		hctx->next_cpu = cpumask_first_and(hctx->cpumask,
> +		next_cpu = cpumask_first_and(hctx->cpumask,
>  				cpu_online_mask);
> +		if (next_cpu >= nr_cpu_ids)
> +			next_cpu = cpumask_first(hctx->cpumask);
> +		check_next_cpu(next_cpu, __func__, "first_and");
> +		hctx->next_cpu = next_cpu;
>  		hctx->next_cpu_batch = BLK_MQ_CPU_WORK_BATCH;
>  	}
>  }
> Thanks,
> Ming
> 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues()
  2018-04-06 14:26                                               ` Christian Borntraeger
@ 2018-04-06 14:58                                                 ` Ming Lei
  2018-04-06 15:11                                                   ` Christian Borntraeger
  0 siblings, 1 reply; 40+ messages in thread
From: Ming Lei @ 2018-04-06 14:58 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Stefan Haberland,
	Christoph Hellwig

On Fri, Apr 06, 2018 at 04:26:49PM +0200, Christian Borntraeger wrote:
> 
> 
> On 04/06/2018 03:41 PM, Ming Lei wrote:
> > On Fri, Apr 06, 2018 at 12:19:19PM +0200, Christian Borntraeger wrote:
> >>
> >>
> >> On 04/06/2018 11:23 AM, Ming Lei wrote:
> >>> On Fri, Apr 06, 2018 at 10:51:28AM +0200, Christian Borntraeger wrote:
> >>>>
> >>>>
> >>>> On 04/06/2018 10:41 AM, Ming Lei wrote:
> >>>>> On Thu, Apr 05, 2018 at 07:39:56PM +0200, Christian Borntraeger wrote:
> >>>>>>
> >>>>>>
> >>>>>> On 04/05/2018 06:11 PM, Ming Lei wrote:
> >>>>>>>>
> >>>>>>>> Could you please apply the following patch and provide the dmesg boot log?
> >>>>>>>
> >>>>>>> And please post out the 'lscpu' log together from the test machine too.
> >>>>>>
> >>>>>> attached.
> >>>>>>
> >>>>>> As I said before this seems to go way with CONFIG_NR_CPUS=64 or smaller.
> >>>>>> We have 282 nr_cpu_ids here (max 141CPUs on that z13 with SMT2) but only 8 Cores
> >>>>>> == 16 threads.
> >>>>>
> >>>>> OK, thanks!
> >>>>>
> >>>>> The most weird thing is that hctx->next_cpu is computed as 512 since
> >>>>> nr_cpu_id is 282, and hctx->next_cpu should have pointed to one of
> >>>>> possible CPU.
> >>>>>
> >>>>> Looks like it is a s390 specific issue, since I can setup one queue
> >>>>> which has same mapping with yours:
> >>>>>
> >>>>> 	- nr_cpu_id is 282
> >>>>> 	- CPU 0~15 is online
> >>>>> 	- 64 queues null_blk
> >>>>> 	- still run all hw queues in .complete handler
> >>>>>
> >>>>> But can't reproduce this issue at all.
> >>>>>
> >>>>> So please test the following patch, which may tell us why hctx->next_cpu
> >>>>> is computed wrong:
> >>>>
> >>>> I see things like
> >>>>
> >>>> [    8.196907] wrong next_cpu 512, blk_mq_map_swqueue, first_and
> >>>> [    8.196910] wrong next_cpu 512, blk_mq_map_swqueue, first_and
> >>>> [    8.196912] wrong next_cpu 512, blk_mq_map_swqueue, first_and
> >>>> [    8.196913] wrong next_cpu 512, blk_mq_map_swqueue, first_and
> >>>> [    8.196914] wrong next_cpu 512, blk_mq_map_swqueue, first_and
> >>>> [    8.196915] wrong next_cpu 512, blk_mq_map_swqueue, first_and
> >>>> [    8.196916] wrong next_cpu 512, blk_mq_map_swqueue, first_and
> >>>> [    8.196916] wrong next_cpu 512, blk_mq_map_swqueue, first_and
> >>>> [    8.196917] wrong next_cpu 512, blk_mq_map_swqueue, first_and
> >>>> [    8.196918] wrong next_cpu 512, blk_mq_map_swqueue, first_and
> >>>>
> >>>> which is exactly what happens if the find-and operation fails (returns the size of the bitmap).
> >>>
> >>> Given both 'cpu_online_mask' and 'hctx->cpumask' are shown as correct
> >>> in your previous debug log, it means the following function returns
> >>> totally wrong result on S390.
> >>>
> >>> 	cpumask_first_and(hctx->cpumask, cpu_online_mask);
> >>>
> >>> The debugfs log shows that each hctx->cpumask includes one online
> >>> CPU(0~15).
> >>
> >> Really? The last log (with the latest patch applied) shows a lot of contexts
> >> that do not have CPUs in 0-15:
> >>
> >> e.g. 
> >> [    4.049828] dump CPUs mapped to this hctx:
> >> [    4.049829] 18 
> >> [    4.049829] 82 
> >> [    4.049830] 146 
> >> [    4.049830] 210 
> >> [    4.049831] 274 
> > 
> > That won't be an issue, since no IO can be submitted from these offline
> > CPUs, then these hctx shouldn't have been run at all.
> > 
> > But hctx->next_cpu can be set as 512 for these inactive hctx in
> > blk_mq_map_swqueue(), then please test the attached patch, and if
> > hctx->next_cpu is still set as 512, something is still wrong.
> 
> 
> WIth this patch I no longer see the "run queue from wrong CPU x, hctx active" messages.
> your debug code still triggers, though.
> 
> wrong next_cpu 512, blk_mq_hctx_next_cpu, first_and
> wrong next_cpu 512, blk_mq_hctx_next_cpu, next_and
> 
> If we would remove the debug code then dmesg would be clean it seems.

That is still a bit strange, since for any inactive hctx (one without an
online CPU mapped), blk_mq_run_hw_queue() checks blk_mq_hctx_has_pending()
first. And there shouldn't be any pending IO for these inactive hctxs in
your case, so it looks like blk_mq_hctx_next_cpu() shouldn't be called for
an inactive hctx at all.
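
For reference, the ordering I mean is roughly the following -- a simplified
sketch of blk_mq_run_hw_queue() (details may differ a bit from the exact
tree you are testing):

	/*
	 * Sketch: the hctx is only actually run, and blk_mq_hctx_next_cpu()
	 * only reached, when the hctx really has pending work.
	 */
	bool blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async)
	{
		bool need_run;

		/* pending == ctx_map bits, dispatch list or scheduler work */
		need_run = !blk_queue_quiesced(hctx->queue) &&
			   blk_mq_hctx_has_pending(hctx);

		if (need_run) {
			__blk_mq_delay_run_hw_queue(hctx, async, 0);
			return true;
		}

		return false;
	}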

I will prepare a patchset and post it out soon; hopefully it will cover all
of these issues.

Thanks,
Ming

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues()
  2018-04-06 14:58                                                 ` Ming Lei
@ 2018-04-06 15:11                                                   ` Christian Borntraeger
  2018-04-06 15:40                                                     ` Ming Lei
  0 siblings, 1 reply; 40+ messages in thread
From: Christian Borntraeger @ 2018-04-06 15:11 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Stefan Haberland,
	Christoph Hellwig



On 04/06/2018 04:58 PM, Ming Lei wrote:
> On Fri, Apr 06, 2018 at 04:26:49PM +0200, Christian Borntraeger wrote:
>> WIth this patch I no longer see the "run queue from wrong CPU x, hctx active" messages.
>> your debug code still triggers, though.
>>
>> wrong next_cpu 512, blk_mq_hctx_next_cpu, first_and
>> wrong next_cpu 512, blk_mq_hctx_next_cpu, next_and
>>
>> If we would remove the debug code then dmesg would be clean it seems.
> 
> That is still a bit strange, since for any inactive hctx(without online
> CPU mapped), blk_mq_run_hw_queue() will check blk_mq_hctx_has_pending()

I think it is reasonable to see this for next_and, since next_and will return
512 after we have used the last one. In fact the code calls first_and in
that case for exactly this reason, no?
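
Roughly what I have in mind -- paraphrased from blk_mq_hctx_next_cpu() with
your debug hook applied, so not the literal code:

	next_cpu = cpumask_next_and(hctx->next_cpu, hctx->cpumask,
				    cpu_online_mask);
	/* reports >= nr_cpu_ids once per round, before the wrap-around */
	check_next_cpu(next_cpu, __func__, "next_and");

	if (next_cpu >= nr_cpu_ids) {
		/* next_and ran off the end of the mask, so wrap around */
		next_cpu = cpumask_first_and(hctx->cpumask, cpu_online_mask);
		check_next_cpu(next_cpu, __func__, "first_and");
	}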


> first. And there shouldn't be any pending IO for all inactive hctx
> in your case, so looks blk_mq_hctx_next_cpu() shouldn't be called for
> inactive hctx.
> 
> I will prepare one patchset and post out soon, and hope all these issues
> can be covered.
> 
> Thanks,
> Ming
> 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues()
  2018-04-06 15:11                                                   ` Christian Borntraeger
@ 2018-04-06 15:40                                                     ` Ming Lei
  0 siblings, 0 replies; 40+ messages in thread
From: Ming Lei @ 2018-04-06 15:40 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Stefan Haberland,
	Christoph Hellwig

On Fri, Apr 06, 2018 at 05:11:53PM +0200, Christian Borntraeger wrote:
> 
> 
> On 04/06/2018 04:58 PM, Ming Lei wrote:
> > On Fri, Apr 06, 2018 at 04:26:49PM +0200, Christian Borntraeger wrote:
> >> WIth this patch I no longer see the "run queue from wrong CPU x, hctx active" messages.
> >> your debug code still triggers, though.
> >>
> >> wrong next_cpu 512, blk_mq_hctx_next_cpu, first_and
> >> wrong next_cpu 512, blk_mq_hctx_next_cpu, next_and
> >>
> >> If we would remove the debug code then dmesg would be clean it seems.
> > 
> > That is still a bit strange, since for any inactive hctx(without online
> > CPU mapped), blk_mq_run_hw_queue() will check blk_mq_hctx_has_pending()
> 
> I think for next_and it is reasonable to see this, as the next_and will return
> 512 after we have used the last one. In fact the code does call first_and in
> that case for a reason, no?

It is possible to see the 'first_and' dump when there aren't any online CPUs
mapped to this hctx.

But my point is that in this case there shouldn't be any IO queued for the
hctx, and blk_mq_hctx_has_pending() is called to check exactly that, so
blk_mq_hctx_next_cpu() should only have been called when
blk_mq_hctx_has_pending() in blk_mq_run_hw_queue() returned true.
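
For reference, the check I mean is roughly this -- a sketch of
blk_mq_hctx_has_pending(), details may differ slightly in your tree:

	/* an hctx only counts as "pending" if it actually has work queued */
	static bool blk_mq_hctx_has_pending(struct blk_mq_hw_ctx *hctx)
	{
		return !list_empty_careful(&hctx->dispatch) ||
			sbitmap_any_bit_set(&hctx->ctx_map) ||
			blk_mq_sched_has_work(hctx);
	}

So if an inactive hctx never has pending work, we should never even reach
the next_cpu computation for it.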


Thanks,
Ming

^ permalink raw reply	[flat|nested] 40+ messages in thread

end of thread, other threads:[~2018-04-06 15:40 UTC | newest]

Thread overview: 40+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-03-28  1:20 [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues() Ming Lei
2018-03-28  3:22 ` Jens Axboe
2018-03-28  7:45   ` Christian Borntraeger
2018-03-28 14:38     ` Jens Axboe
2018-03-28 14:53       ` Jens Axboe
2018-03-28 15:38         ` Christian Borntraeger
2018-03-28 15:26     ` Ming Lei
2018-03-28 15:36       ` Christian Borntraeger
2018-03-28 15:44         ` Christian Borntraeger
2018-03-29  2:00         ` Ming Lei
2018-03-29  7:23           ` Christian Borntraeger
2018-03-29  9:09             ` Christian Borntraeger
2018-03-29  9:40               ` Ming Lei
2018-03-29 10:10                 ` Christian Borntraeger
2018-03-29 10:48                   ` Ming Lei
2018-03-29 10:49                     ` Christian Borntraeger
2018-03-29 11:43                       ` Ming Lei
2018-03-29 11:49                         ` Christian Borntraeger
2018-03-30  2:53                           ` Ming Lei
2018-04-04  8:18                             ` Christian Borntraeger
2018-04-05 16:05                               ` Ming Lei
2018-04-05 16:11                                 ` Ming Lei
2018-04-05 17:39                                   ` Christian Borntraeger
2018-04-05 17:43                                     ` Christian Borntraeger
2018-04-06  8:41                                     ` Ming Lei
2018-04-06  8:51                                       ` Christian Borntraeger
2018-04-06  8:53                                         ` Christian Borntraeger
2018-04-06  9:23                                         ` Ming Lei
2018-04-06 10:19                                           ` Christian Borntraeger
2018-04-06 13:41                                             ` Ming Lei
2018-04-06 14:26                                               ` Christian Borntraeger
2018-04-06 14:58                                                 ` Ming Lei
2018-04-06 15:11                                                   ` Christian Borntraeger
2018-04-06 15:40                                                     ` Ming Lei
2018-04-06 11:37                                           ` Christian Borntraeger
2018-04-06  8:35                                 ` Christian Borntraeger
2018-03-29  9:52             ` Ming Lei
2018-03-29 10:11               ` Christian Borntraeger
2018-03-29 10:12                 ` Christian Borntraeger
2018-03-29 10:13               ` Ming Lei
