* [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues()
  From: Ming Lei @ 2018-03-28  1:20 UTC
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Stefan Haberland, Christian Borntraeger, Ming Lei, Christoph Hellwig

From commit 20e4d813931961fe ("blk-mq: simplify queue mapping & schedule
with each possisble CPU") on, it has become easier to end up with an
unmapped hctx in some CPU topologies, i.e. a hctx may not be mapped to
any CPU.

This patch avoids the warning in __blk_mq_delay_run_hw_queue() by
checking if the hctx is mapped in blk_mq_run_hw_queues().

blk_mq_run_hw_queues() is often run in SCSI's or some other driver's
completion path, so this warning has to be addressed.

Reported-by: Stefan Haberland <sth@linux.vnet.ibm.com>
Cc: Christoph Hellwig <hch@lst.de>
Fixes: 20e4d813931961fe ("blk-mq: simplify queue mapping & schedule with each possisble CPU")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 16e83e6df404..48f25a63833b 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1459,7 +1459,7 @@ void blk_mq_run_hw_queues(struct request_queue *q, bool async)
 	int i;
 
 	queue_for_each_hw_ctx(q, hctx, i) {
-		if (blk_mq_hctx_stopped(hctx))
+		if (blk_mq_hctx_stopped(hctx) || !blk_mq_hw_queue_mapped(hctx))
 			continue;
 
 		blk_mq_run_hw_queue(hctx, async);
-- 
2.9.5

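For reference, the "mapped" test the patch adds is a cheap inline helper
from block/blk-mq.h. A sketch of the v4.16-era definition, quoted from
memory rather than verified line-for-line against this exact tree:

	/*
	 * Sketch: a hctx counts as mapped when at least one software
	 * context (ctx) points at it and it has a tag set attached.
	 */
	static inline bool blk_mq_hw_queue_mapped(struct blk_mq_hw_ctx *hctx)
	{
		return hctx->nr_ctx && hctx->tags;
	}

So the patch simply skips hardware queues that no software queue feeds,
instead of letting them reach the warning in __blk_mq_delay_run_hw_queue().
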
* Re: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues()
  From: Jens Axboe @ 2018-03-28  3:22 UTC
  To: Ming Lei
  Cc: linux-block, Christoph Hellwig, Stefan Haberland, Christian Borntraeger, Christoph Hellwig

On 3/27/18 7:20 PM, Ming Lei wrote:
> From commit 20e4d813931961fe ("blk-mq: simplify queue mapping & schedule
> with each possisble CPU") on, it has become easier to end up with an
> unmapped hctx in some CPU topologies, i.e. a hctx may not be mapped to
> any CPU.
>
> This patch avoids the warning in __blk_mq_delay_run_hw_queue() by
> checking if the hctx is mapped in blk_mq_run_hw_queues().
>
> blk_mq_run_hw_queues() is often run in SCSI's or some other driver's
> completion path, so this warning has to be addressed.

I don't like this very much. You're catching just one particular case,
and if the hw queue has pending IO (for instance), then it's just wrong.

How about something like the below? Totally untested...

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 16e83e6df404..4c04ac124e5d 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1307,6 +1307,14 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
 	int srcu_idx;
 
 	/*
+	 * Warn if the queue isn't mapped AND we have pending IO. Not being
+	 * mapped isn't necessarily a huge issue, if we don't have pending IO.
+	 */
+	if (!blk_mq_hw_queue_mapped(hctx) &&
+	    !WARN_ON_ONCE(blk_mq_hctx_has_pending(hctx)))
+		return;
+
+	/*
 	 * We should be running this queue from one of the CPUs that
 	 * are mapped to it.
 	 *

-- 
Jens Axboe

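The double negation in Jens's check packs three cases into one branch: a
mapped hctx runs as before; an unmapped hctx with nothing pending returns
quietly; an unmapped hctx that does have pending IO trips WARN_ON_ONCE()
and then still falls through and runs, so the IO is not silently dropped.
The pending-work test is existing blk-mq code; a sketch of the v4.16-era
helper, not checked character-for-character:

	/*
	 * Sketch: a hctx has pending work if any software-queue bit is
	 * set, its dispatch list is non-empty, or the attached I/O
	 * scheduler has queued work.
	 */
	static bool blk_mq_hctx_has_pending(struct blk_mq_hw_ctx *hctx)
	{
		return sbitmap_any_bit_set(&hctx->ctx_map) ||
			!list_empty_careful(&hctx->dispatch) ||
			blk_mq_sched_has_work(hctx);
	}
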
* Re: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues()
  From: Christian Borntraeger @ 2018-03-28  7:45 UTC
  To: Jens Axboe, Ming Lei
  Cc: linux-block, Christoph Hellwig, Stefan Haberland, Christoph Hellwig

FWIW, this patch does not fix the issue for me:

ostname=? addr=? terminal=? res=success'
[   21.454961] WARNING: CPU: 3 PID: 1882 at block/blk-mq.c:1410 __blk_mq_delay_run_hw_queue+0xbe/0xd8
[   21.454968] Modules linked in: scsi_dh_rdac scsi_dh_emc scsi_dh_alua dm_mirror dm_region_hash dm_log dm_multipath dm_mod autofs4
[   21.454984] CPU: 3 PID: 1882 Comm: dasdconf.sh Not tainted 4.16.0-rc7+ #26
[   21.454987] Hardware name: IBM 2964 NC9 704 (LPAR)
[   21.454990] Krnl PSW : 00000000c0131ea3 000000003ea2f7bf (__blk_mq_delay_run_hw_queue+0xbe/0xd8)
[   21.454996]            R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
[   21.455005] Krnl GPRS: 0000013abb69a000 0000013a00000000 0000013ac6c0dc00 0000000000000001
[   21.455008]            0000000000000000 0000013abb69a710 0000013a00000000 00000001b691fd98
[   21.455011]            00000001b691fd98 0000013ace4775c8 0000000000000001 0000000000000000
[   21.455014]            0000013ac6c0dc00 0000000000b47238 00000001b691fc08 00000001b691fbd0
[   21.455032] Krnl Code: 000000000069c596: ebaff0a00004   lmg   %r10,%r15,160(%r15)
                          000000000069c59c: c0f4ffff7a5e   brcl  15,68ba58
                         #000000000069c5a2: a7f40001       brc   15,69c5a4
                         >000000000069c5a6: e340f0c00004   lg    %r4,192(%r15)
                          000000000069c5ac: ebaff0a00004   lmg   %r10,%r15,160(%r15)
                          000000000069c5b2: 07f4           bcr   15,%r4
                          000000000069c5b4: c0e5fffffeea   brasl %r14,69c388
                          000000000069c5ba: a7f4fff6       brc   15,69c5a6
[   21.455067] Call Trace:
[   21.455072] ([<00000001b691fd98>] 0x1b691fd98)
[   21.455079]  [<000000000069c692>] blk_mq_run_hw_queue+0xba/0x100
[   21.455083]  [<000000000069c740>] blk_mq_run_hw_queues+0x68/0x88
[   21.455089]  [<000000000069b956>] __blk_mq_complete_request+0x11e/0x1d8
[   21.455091]  [<000000000069ba9c>] blk_mq_complete_request+0x8c/0xc8
[   21.455103]  [<00000000008aa250>] dasd_block_tasklet+0x158/0x490
[   21.455110]  [<000000000014c742>] tasklet_hi_action+0x92/0x120
[   21.455118]  [<0000000000a7cfc0>] __do_softirq+0x120/0x348
[   21.455122]  [<000000000014c212>] irq_exit+0xba/0xd0
[   21.455130]  [<000000000010bf92>] do_IRQ+0x8a/0xb8
[   21.455133]  [<0000000000a7c298>] io_int_handler+0x130/0x298
[   21.455136] Last Breaking-Event-Address:
[   21.455138]  [<000000000069c5a2>] __blk_mq_delay_run_hw_queue+0xba/0xd8
[   21.455140] ---[ end trace be43f99a5d1e553e ]---
[   21.510046] dasdconf.sh Warning: 0.0.241e is already online, not configuring

On 03/28/2018 05:22 AM, Jens Axboe wrote:
> On 3/27/18 7:20 PM, Ming Lei wrote:
>> From commit 20e4d813931961fe ("blk-mq: simplify queue mapping & schedule
>> with each possisble CPU") on, it has become easier to end up with an
>> unmapped hctx in some CPU topologies, i.e. a hctx may not be mapped to
>> any CPU.
>>
>> This patch avoids the warning in __blk_mq_delay_run_hw_queue() by
>> checking if the hctx is mapped in blk_mq_run_hw_queues().
>>
>> blk_mq_run_hw_queues() is often run in SCSI's or some other driver's
>> completion path, so this warning has to be addressed.
>
> I don't like this very much. You're catching just one particular case,
> and if the hw queue has pending IO (for instance), then it's just wrong.
>
> How about something like the below? Totally untested...
>
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 16e83e6df404..4c04ac124e5d 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -1307,6 +1307,14 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
>  	int srcu_idx;
>  
>  	/*
> +	 * Warn if the queue isn't mapped AND we have pending IO. Not being
> +	 * mapped isn't necessarily a huge issue, if we don't have pending IO.
> +	 */
> +	if (!blk_mq_hw_queue_mapped(hctx) &&
> +	    !WARN_ON_ONCE(blk_mq_hctx_has_pending(hctx)))
> +		return;
> +
> +	/*
>  	 * We should be running this queue from one of the CPUs that
>  	 * are mapped to it.
>  	 *

* Re: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues()
  From: Jens Axboe @ 2018-03-28 14:38 UTC
  To: Christian Borntraeger, Ming Lei
  Cc: linux-block, Christoph Hellwig, Stefan Haberland, Christoph Hellwig

On 3/28/18 1:45 AM, Christian Borntraeger wrote:
> FWIW, this patch does not fix the issue for me:

Looks like I didn't do the delayed path. How about the below?

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 16e83e6df404..fd663ae1094c 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1302,10 +1302,23 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list,
 	return (queued + errors) != 0;
 }
 
+static bool blk_mq_bail_unmapped(struct blk_mq_hw_ctx *hctx)
+{
+	/*
+	 * Warn if the queue isn't mapped AND we have pending IO. Not being
+	 * mapped isn't necessarily a huge issue, if we don't have pending IO.
+	 */
+	return !blk_mq_hw_queue_mapped(hctx) &&
+		!WARN_ON_ONCE(blk_mq_hctx_has_pending(hctx));
+}
+
 static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
 {
 	int srcu_idx;
 
+	if (blk_mq_bail_unmapped(hctx))
+		return;
+
 	/*
 	 * We should be running this queue from one of the CPUs that
 	 * are mapped to it.
@@ -1399,9 +1412,8 @@ static int blk_mq_hctx_next_cpu(struct blk_mq_hw_ctx *hctx)
 static void __blk_mq_delay_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async,
 					unsigned long msecs)
 {
-	if (WARN_ON_ONCE(!blk_mq_hw_queue_mapped(hctx)))
+	if (blk_mq_bail_unmapped(hctx))
 		return;
-
 	if (unlikely(blk_mq_hctx_stopped(hctx)))
 		return;
 
-- 
Jens Axboe

* Re: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues()
  From: Jens Axboe @ 2018-03-28 14:53 UTC
  To: Christian Borntraeger, Ming Lei
  Cc: linux-block, Christoph Hellwig, Stefan Haberland, Christoph Hellwig

On 3/28/18 8:38 AM, Jens Axboe wrote:
> On 3/28/18 1:45 AM, Christian Borntraeger wrote:
>> FWIW, this patch does not fix the issue for me:
>
> Looks like I didn't do the delayed path. How about the below?

OK, final version... This is more in line with what I originally
suggested.

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 16e83e6df404..c90016c36a70 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1306,6 +1306,10 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
 {
 	int srcu_idx;
 
+	if (!blk_mq_hw_queue_mapped(hctx) &&
+	    !WARN_ON_ONCE(blk_mq_hctx_has_pending(hctx)))
+		return;
+
 	/*
 	 * We should be running this queue from one of the CPUs that
 	 * are mapped to it.
@@ -1399,9 +1403,6 @@ static int blk_mq_hctx_next_cpu(struct blk_mq_hw_ctx *hctx)
 static void __blk_mq_delay_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async,
 					unsigned long msecs)
 {
-	if (WARN_ON_ONCE(!blk_mq_hw_queue_mapped(hctx)))
-		return;
-
 	if (unlikely(blk_mq_hctx_stopped(hctx)))
 		return;
 
@@ -1586,9 +1587,6 @@ static void blk_mq_run_work_fn(struct work_struct *work)
 
 void blk_mq_delay_queue(struct blk_mq_hw_ctx *hctx, unsigned long msecs)
 {
-	if (WARN_ON_ONCE(!blk_mq_hw_queue_mapped(hctx)))
-		return;
-
 	/*
 	 * Stop the hw queue, then modify currently delayed work.
 	 * This should prevent us from running the queue prematurely.
-- 
Jens Axboe

* Re: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues()
  From: Christian Borntraeger @ 2018-03-28 15:38 UTC
  To: Jens Axboe, Ming Lei
  Cc: linux-block, Christoph Hellwig, Stefan Haberland, Christoph Hellwig

With that patch I now get:

[   40.620619] virbr0: port 1(virbr0-nic) entered disabled state
[   47.418592] run queue from wrong CPU 3, hctx inactive
[   47.418602] CPU: 3 PID: 2153 Comm: kworker/3:1H Tainted: G        W        4.16.0-rc7+ #27
[   47.418604] Hardware name: IBM 2964 NC9 704 (LPAR)
[   47.418613] Workqueue: kblockd blk_mq_run_work_fn
[   47.418615] Call Trace:
[   47.418621] ([<0000000000113b86>] show_stack+0x56/0x80)
[   47.418626]  [<0000000000a5cd9a>] dump_stack+0x82/0xb0
[   47.418627]  [<000000000069c4be>] __blk_mq_run_hw_queue+0x136/0x160
[   47.418631]  [<0000000000163906>] process_one_work+0x1be/0x420
[   47.418633]  [<0000000000163bc0>] worker_thread+0x58/0x458
[   47.418635]  [<000000000016a9d0>] kthread+0x148/0x160
[   47.418639]  [<0000000000a7bf3a>] kernel_thread_starter+0x6/0xc
[   47.418640]  [<0000000000a7bf34>] kernel_thread_starter+0x0/0xc
[   77.670407] run queue from wrong CPU 4, hctx inactive
[   77.670416] CPU: 4 PID: 2155 Comm: kworker/4:1H Tainted: G        W        4.16.0-rc7+ #27
[   77.670418] Hardware name: IBM 2964 NC9 704 (LPAR)
[   77.670428] Workqueue: kblockd blk_mq_run_work_fn
[   77.670430] Call Trace:
[   77.670436] ([<0000000000113b86>] show_stack+0x56/0x80)
[   77.670441]  [<0000000000a5cd9a>] dump_stack+0x82/0xb0
[   77.670442]  [<000000000069c4be>] __blk_mq_run_hw_queue+0x136/0x160
[   77.670446]  [<0000000000163906>] process_one_work+0x1be/0x420
[   77.670448]  [<0000000000163bc0>] worker_thread+0x58/0x458
[   77.670450]  [<000000000016a9d0>] kthread+0x148/0x160
[   77.670454]  [<0000000000a7bf3a>] kernel_thread_starter+0x6/0xc
[   77.670455]  [<0000000000a7bf34>] kernel_thread_starter+0x0/0xc

On 03/28/2018 04:53 PM, Jens Axboe wrote:
> On 3/28/18 8:38 AM, Jens Axboe wrote:
>> On 3/28/18 1:45 AM, Christian Borntraeger wrote:
>>> FWIW, this patch does not fix the issue for me:
>>
>> Looks like I didn't do the delayed path. How about the below?
>
> OK, final version... This is more in line with what I originally
> suggested.
>
> [... patch trimmed, quoted in full in the previous message ...]

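The "run queue from wrong CPU ..., hctx inactive" message comes from the
sanity check at the top of __blk_mq_run_hw_queue(), which around this
time had been turned from a WARN_ON() into a printk plus dump_stack().
A sketch of the v4.16-era check (exact wording may differ by revision);
note that "inactive" means hctx->cpumask is empty, i.e. no online CPU
maps to that hctx at all:

	if (!cpumask_test_cpu(raw_smp_processor_id(), hctx->cpumask) &&
	    cpu_online(hctx->next_cpu)) {
		printk(KERN_WARNING "run queue from wrong CPU %d, hctx %s\n",
		       raw_smp_processor_id(),
		       cpumask_empty(hctx->cpumask) ? "inactive" : "active");
		dump_stack();
	}
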
* Re: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues()
  From: Ming Lei @ 2018-03-28 15:26 UTC
  To: Christian Borntraeger
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Stefan Haberland, Christoph Hellwig

Hi Christian,

On Wed, Mar 28, 2018 at 09:45:10AM +0200, Christian Borntraeger wrote:
> FWIW, this patch does not fix the issue for me:
>
> ostname=? addr=? terminal=? res=success'
> [   21.454961] WARNING: CPU: 3 PID: 1882 at block/blk-mq.c:1410 __blk_mq_delay_run_hw_queue+0xbe/0xd8
> [... remainder of the trace trimmed, quoted in full earlier in the thread ...]
> [   21.510046] dasdconf.sh Warning: 0.0.241e is already online, not configuring

Thinking about this issue further, I can't understand the root cause of
it.

After commit 20e4d813931961fe ("blk-mq: simplify queue mapping &
schedule with each possisble CPU"), each hw queue should be mapped to at
least one CPU, which means this issue shouldn't happen. Maybe
blk_mq_map_queues() is doing something wrong?

Could you dump 'lscpu' and provide the blk-mq debugfs state for your
DASD via the following command?

(cd /sys/kernel/debug/block/$DASD && find . -type f -exec grep -aH . {} \;)

Thanks,
Ming

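For context, blk_mq_map_queues() is the default CPU-to-queue mapper that
the commit above switched to covering every possible CPU, not just the
online ones. A condensed sketch of the v4.16-era logic in
block/blk-mq-cpumap.c (simplified, with cpu_to_queue_index() inlined as
cpu % nr_queues; not verified line-for-line):

	int blk_mq_map_queues(struct blk_mq_tag_set *set)
	{
		unsigned int *map = set->mq_map;
		unsigned int nr_queues = set->nr_hw_queues;
		unsigned int cpu, first_sibling;

		for_each_possible_cpu(cpu) {
			/* sequential mapping first, then follow siblings */
			if (cpu < nr_queues) {
				map[cpu] = cpu % nr_queues;
			} else {
				first_sibling = get_first_sibling(cpu);
				if (first_sibling == cpu)
					map[cpu] = cpu % nr_queues;
				else
					map[cpu] = map[first_sibling];
			}
		}
		return 0;
	}

Since the loop covers every possible CPU, each of the first nr_queues
hardware queues should end up with at least one CPU in its mask, which is
why an unmapped hctx is surprising here.
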
* Re: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues()
  From: Christian Borntraeger @ 2018-03-28 15:36 UTC
  To: Ming Lei
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Stefan Haberland, Christoph Hellwig

[-- Attachment #1: Type: text/plain, Size: 4631 bytes --]

On 03/28/2018 05:26 PM, Ming Lei wrote:
> Hi Christian,
>
> On Wed, Mar 28, 2018 at 09:45:10AM +0200, Christian Borntraeger wrote:
>> FWIW, this patch does not fix the issue for me:
>>
>> [... WARNING trace at block/blk-mq.c:1410 trimmed, quoted in full earlier in the thread ...]
>
> Thinking about this issue further, I can't understand the root cause of
> it.
>
> After commit 20e4d813931961fe ("blk-mq: simplify queue mapping &
> schedule with each possisble CPU"), each hw queue should be mapped to at
> least one CPU, which means this issue shouldn't happen. Maybe
> blk_mq_map_queues() is doing something wrong?
>
> Could you dump 'lscpu' and provide the blk-mq debugfs state for your
> DASD via the following command?

# lscpu
Architecture:          s390x
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Big Endian
CPU(s):                16
On-line CPU(s) list:   0-15
Thread(s) per core:    2
Core(s) per socket:    8
Socket(s) per book:    3
Book(s) per drawer:    2
Drawer(s):             4
NUMA node(s):          1
Vendor ID:             IBM/S390
Machine type:          2964
CPU dynamic MHz:       5000
CPU static MHz:        5000
BogoMIPS:              20325.00
Hypervisor:            PR/SM
Hypervisor vendor:     IBM
Virtualization type:   full
Dispatching mode:      horizontal
L1d cache:             128K
L1i cache:             96K
L2d cache:             2048K
L2i cache:             2048K
L3 cache:              65536K
L4 cache:              491520K
NUMA node0 CPU(s):     0-15
Flags:                 esan3 zarch stfle msa ldisp eimm dfp edat etf3eh highgprs te vx sie

# lsdasd
Bus-ID     Status    Name      Device  Type  BlkSz  Size      Blocks
==============================================================================
0.0.3f75   active    dasda     94:0    ECKD  4096   21129MB   5409180
0.0.3f76   active    dasdb     94:4    ECKD  4096   21129MB   5409180
0.0.3f77   active    dasdc     94:8    ECKD  4096   21129MB   5409180
0.0.3f74   active    dasdd     94:12   ECKD  4096   21129MB   5409180

>
> (cd /sys/kernel/debug/block/$DASD && find . -type f -exec grep -aH . {} \;)

see attachment:

[-- Attachment #2: log --]
[-- Type: text/plain, Size: 21552 bytes --]

dasda/range:4 dasda/capability:10 dasda/inflight: 0 0 dasda/ext_range:4 dasda/power/runtime_suspended_time:0 dasda/power/runtime_active_time:0 dasda/power/control:auto dasda/power/runtime_status:unsupported dasda/dev:94:0 dasda/hidden:0 dasda/ro:0 dasda/mq/7/nr_tags:1024 dasda/mq/7/nr_reserved_tags:0 dasda/mq/7/cpu_list:7 dasda/mq/15/nr_tags:1024 dasda/mq/15/nr_reserved_tags:0 dasda/mq/15/cpu_list:15 dasda/mq/5/nr_tags:1024 dasda/mq/5/nr_reserved_tags:0 dasda/mq/5/cpu_list:5 dasda/mq/13/nr_tags:1024 dasda/mq/13/nr_reserved_tags:0 dasda/mq/13/cpu_list:13 dasda/mq/3/nr_tags:1024 dasda/mq/3/nr_reserved_tags:0 dasda/mq/3/cpu_list:3 dasda/mq/11/nr_tags:1024 dasda/mq/11/nr_reserved_tags:0 dasda/mq/11/cpu_list:11 dasda/mq/1/nr_tags:1024 dasda/mq/1/nr_reserved_tags:0 dasda/mq/1/cpu_list:1 dasda/mq/8/nr_tags:1024 dasda/mq/8/nr_reserved_tags:0 dasda/mq/8/cpu_list:8 dasda/mq/6/nr_tags:1024 dasda/mq/6/nr_reserved_tags:0 dasda/mq/6/cpu_list:6 dasda/mq/14/nr_tags:1024 dasda/mq/14/nr_reserved_tags:0 dasda/mq/14/cpu_list:14 dasda/mq/4/nr_tags:1024 dasda/mq/4/nr_reserved_tags:0 dasda/mq/4/cpu_list:4 dasda/mq/12/nr_tags:1024 dasda/mq/12/nr_reserved_tags:0 dasda/mq/12/cpu_list:12 dasda/mq/2/nr_tags:1024 dasda/mq/2/nr_reserved_tags:0 dasda/mq/2/cpu_list:2 dasda/mq/10/nr_tags:1024 dasda/mq/10/nr_reserved_tags:0 dasda/mq/10/cpu_list:10 dasda/mq/0/nr_tags:1024 dasda/mq/0/nr_reserved_tags:0 dasda/mq/0/cpu_list:0, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230,
# lscpu Architecture: s390x CPU op-mode(s): 32-bit, 64-bit Byte Order: Big Endian CPU(s): 16 On-line CPU(s) list: 0-15 Thread(s) per core: 2 Core(s) per socket: 8 Socket(s) per book: 3 Book(s) per drawer: 2 Drawer(s): 4 NUMA node(s): 1 Vendor ID: IBM/S390 Machine type: 2964 CPU dynamic MHz: 5000 CPU static MHz: 5000 BogoMIPS: 20325.00 Hypervisor: PR/SM Hypervisor vendor: IBM Virtualization type: full Dispatching mode: horizontal L1d cache: 128K L1i cache: 96K L2d cache: 2048K L2i cache: 2048K L3 cache: 65536K L4 cache: 491520K NUMA node0 CPU(s): 0-15 Flags: esan3 zarch stfle msa ldisp eimm dfp edat etf3eh highgprs te vx sie # lsdasd Bus-ID Status Name Device Type BlkSz Size Blocks ============================================================================== 0.0.3f75 active dasda 94:0 ECKD 4096 21129MB 5409180 0.0.3f76 active dasdb 94:4 ECKD 4096 21129MB 5409180 0.0.3f77 active dasdc 94:8 ECKD 4096 21129MB 5409180 0.0.3f74 active dasdd 94:12 ECKD 4096 21129MB 5409180 > > (cd /sys/kernel/debug/block/$DASD && find . -type f -exec grep -aH . {} \;) see attachement: [-- Attachment #2: log --] [-- Type: text/plain, Size: 21552 bytes --] dasda/range:4 dasda/capability:10 dasda/inflight: 0 0 dasda/ext_range:4 dasda/power/runtime_suspended_time:0 dasda/power/runtime_active_time:0 dasda/power/control:auto dasda/power/runtime_status:unsupported dasda/dev:94:0 dasda/hidden:0 dasda/ro:0 dasda/mq/7/nr_tags:1024 dasda/mq/7/nr_reserved_tags:0 dasda/mq/7/cpu_list:7 dasda/mq/15/nr_tags:1024 dasda/mq/15/nr_reserved_tags:0 dasda/mq/15/cpu_list:15 dasda/mq/5/nr_tags:1024 dasda/mq/5/nr_reserved_tags:0 dasda/mq/5/cpu_list:5 dasda/mq/13/nr_tags:1024 dasda/mq/13/nr_reserved_tags:0 dasda/mq/13/cpu_list:13 dasda/mq/3/nr_tags:1024 dasda/mq/3/nr_reserved_tags:0 dasda/mq/3/cpu_list:3 dasda/mq/11/nr_tags:1024 dasda/mq/11/nr_reserved_tags:0 dasda/mq/11/cpu_list:11 dasda/mq/1/nr_tags:1024 dasda/mq/1/nr_reserved_tags:0 dasda/mq/1/cpu_list:1 dasda/mq/8/nr_tags:1024 dasda/mq/8/nr_reserved_tags:0 dasda/mq/8/cpu_list:8 dasda/mq/6/nr_tags:1024 dasda/mq/6/nr_reserved_tags:0 dasda/mq/6/cpu_list:6 dasda/mq/14/nr_tags:1024 dasda/mq/14/nr_reserved_tags:0 dasda/mq/14/cpu_list:14 dasda/mq/4/nr_tags:1024 dasda/mq/4/nr_reserved_tags:0 dasda/mq/4/cpu_list:4 dasda/mq/12/nr_tags:1024 dasda/mq/12/nr_reserved_tags:0 dasda/mq/12/cpu_list:12 dasda/mq/2/nr_tags:1024 dasda/mq/2/nr_reserved_tags:0 dasda/mq/2/cpu_list:2 dasda/mq/10/nr_tags:1024 dasda/mq/10/nr_reserved_tags:0 dasda/mq/10/cpu_list:10 dasda/mq/0/nr_tags:1024 dasda/mq/0/nr_reserved_tags:0 dasda/mq/0/cpu_list:0, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 
231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281 dasda/mq/9/nr_tags:1024 dasda/mq/9/nr_reserved_tags:0 dasda/mq/9/cpu_list:9 dasda/stat: 126 0 10728 60 0 0 0 0 0 10 20 dasda/removable:0 dasda/size:43273440 dasda/alignment_offset:0 dasda/queue/hw_sector_size:4096 dasda/queue/max_discard_segments:1 dasda/queue/max_segment_size:4096 dasda/queue/physical_block_size:4096 dasda/queue/discard_max_bytes:0 dasda/queue/rotational:0 dasda/queue/iosched/fifo_batch:16 dasda/queue/iosched/read_expire:500 dasda/queue/iosched/writes_starved:2 dasda/queue/iosched/write_expire:5000 dasda/queue/iosched/front_merges:1 dasda/queue/write_same_max_bytes:0 dasda/queue/zoned:none dasda/queue/max_sectors_kb:760 dasda/queue/discard_zeroes_data:0 dasda/queue/read_ahead_kb:128 dasda/queue/discard_max_hw_bytes:0 dasda/queue/wbt_lat_usec:75000 dasda/queue/nomerges:0 dasda/queue/max_segments:65535 dasda/queue/rq_affinity:1 dasda/queue/iostats:1 dasda/queue/dax:0 dasda/queue/minimum_io_size:4096 dasda/queue/chunk_sectors:0 dasda/queue/io_poll:1 dasda/queue/write_zeroes_max_bytes:0 dasda/queue/max_hw_sectors_kb:760 dasda/queue/add_random:0 dasda/queue/optimal_io_size:0 dasda/queue/nr_requests:256 dasda/queue/scheduler:[mq-deadline] kyber none dasda/queue/discard_granularity:0 dasda/queue/logical_block_size:4096 dasda/queue/io_poll_delay:-1 dasda/queue/max_integrity_segments:0 dasda/queue/write_cache:write through dasda/trace/end_lba:disabled dasda/trace/act_mask:disabled dasda/trace/start_lba:disabled dasda/trace/enable:0 dasda/trace/pid:disabled dasda/uevent:MAJOR=94 dasda/uevent:MINOR=0 dasda/uevent:DEVNAME=dasda dasda/uevent:DEVTYPE=disk dasda/integrity/write_generate:0 dasda/integrity/device_is_integrity_capable:0 dasda/integrity/tag_size:0 dasda/integrity/read_verify:0 dasda/integrity/protection_interval_bytes:0 dasda/integrity/format:none dasda/discard_alignment:0 dasda/dasda1/start:192 dasda/dasda1/inflight: 0 0 dasda/dasda1/power/runtime_suspended_time:0 dasda/dasda1/power/runtime_active_time:0 dasda/dasda1/power/control:auto dasda/dasda1/power/runtime_status:unsupported dasda/dasda1/dev:94:1 dasda/dasda1/ro:0 dasda/dasda1/partition:1 dasda/dasda1/stat: 115 0 10216 60 0 0 0 0 0 10 20 dasda/dasda1/size:43273248 dasda/dasda1/alignment_offset:0 dasda/dasda1/trace/end_lba:disabled dasda/dasda1/trace/act_mask:disabled dasda/dasda1/trace/start_lba:disabled dasda/dasda1/trace/enable:0 dasda/dasda1/trace/pid:disabled dasda/dasda1/uevent:MAJOR=94 dasda/dasda1/uevent:MINOR=1 dasda/dasda1/uevent:DEVNAME=dasda1 dasda/dasda1/uevent:DEVTYPE=partition dasda/dasda1/uevent:PARTN=1 dasda/dasda1/discard_alignment:0 dasdb/range:4 dasdb/capability:10 dasdb/inflight: 0 0 dasdb/ext_range:4 dasdb/power/runtime_suspended_time:0 dasdb/power/runtime_active_time:0 dasdb/power/control:auto dasdb/power/runtime_status:unsupported dasdb/dev:94:4 dasdb/hidden:0 dasdb/ro:0 dasdb/mq/7/nr_tags:1024 dasdb/mq/7/nr_reserved_tags:0 dasdb/mq/7/cpu_list:7 dasdb/mq/15/nr_tags:1024 dasdb/mq/15/nr_reserved_tags:0 dasdb/mq/15/cpu_list:15 dasdb/mq/5/nr_tags:1024 dasdb/mq/5/nr_reserved_tags:0 dasdb/mq/5/cpu_list:5 dasdb/mq/13/nr_tags:1024 dasdb/mq/13/nr_reserved_tags:0 dasdb/mq/13/cpu_list:13 dasdb/mq/3/nr_tags:1024 dasdb/mq/3/nr_reserved_tags:0 dasdb/mq/3/cpu_list:3 dasdb/mq/11/nr_tags:1024 dasdb/mq/11/nr_reserved_tags:0 dasdb/mq/11/cpu_list:11 
dasdb/mq/1/nr_tags:1024 dasdb/mq/1/nr_reserved_tags:0 dasdb/mq/1/cpu_list:1 dasdb/mq/8/nr_tags:1024 dasdb/mq/8/nr_reserved_tags:0 dasdb/mq/8/cpu_list:8 dasdb/mq/6/nr_tags:1024 dasdb/mq/6/nr_reserved_tags:0 dasdb/mq/6/cpu_list:6 dasdb/mq/14/nr_tags:1024 dasdb/mq/14/nr_reserved_tags:0 dasdb/mq/14/cpu_list:14 dasdb/mq/4/nr_tags:1024 dasdb/mq/4/nr_reserved_tags:0 dasdb/mq/4/cpu_list:4 dasdb/mq/12/nr_tags:1024 dasdb/mq/12/nr_reserved_tags:0 dasdb/mq/12/cpu_list:12 dasdb/mq/2/nr_tags:1024 dasdb/mq/2/nr_reserved_tags:0 dasdb/mq/2/cpu_list:2 dasdb/mq/10/nr_tags:1024 dasdb/mq/10/nr_reserved_tags:0 dasdb/mq/10/cpu_list:10 dasdb/mq/0/nr_tags:1024 dasdb/mq/0/nr_reserved_tags:0 dasdb/mq/0/cpu_list:0, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281 dasdb/mq/9/nr_tags:1024 dasdb/mq/9/nr_reserved_tags:0 dasdb/mq/9/cpu_list:9 dasdb/stat: 129 0 10504 50 1 0 8 0 0 10 10 dasdb/removable:0 dasdb/size:43273440 dasdb/alignment_offset:0 dasdb/queue/hw_sector_size:4096 dasdb/queue/max_discard_segments:1 dasdb/queue/max_segment_size:4096 dasdb/queue/physical_block_size:4096 dasdb/queue/discard_max_bytes:0 dasdb/queue/rotational:0 dasdb/queue/iosched/fifo_batch:16 dasdb/queue/iosched/read_expire:500 dasdb/queue/iosched/writes_starved:2 dasdb/queue/iosched/write_expire:5000 dasdb/queue/iosched/front_merges:1 dasdb/queue/write_same_max_bytes:0 dasdb/queue/zoned:none dasdb/queue/max_sectors_kb:760 dasdb/queue/discard_zeroes_data:0 dasdb/queue/read_ahead_kb:128 dasdb/queue/discard_max_hw_bytes:0 dasdb/queue/wbt_lat_usec:75000 dasdb/queue/nomerges:0 dasdb/queue/max_segments:65535 dasdb/queue/rq_affinity:1 dasdb/queue/iostats:1 dasdb/queue/dax:0 dasdb/queue/minimum_io_size:4096 dasdb/queue/chunk_sectors:0 dasdb/queue/io_poll:1 dasdb/queue/write_zeroes_max_bytes:0 dasdb/queue/max_hw_sectors_kb:760 dasdb/queue/add_random:0 dasdb/queue/optimal_io_size:0 dasdb/queue/nr_requests:256 dasdb/queue/scheduler:[mq-deadline] kyber none dasdb/queue/discard_granularity:0 dasdb/queue/logical_block_size:4096 dasdb/queue/io_poll_delay:-1 dasdb/queue/max_integrity_segments:0 dasdb/queue/write_cache:write through dasdb/trace/end_lba:disabled dasdb/trace/act_mask:disabled dasdb/trace/start_lba:disabled dasdb/trace/enable:0 dasdb/trace/pid:disabled dasdb/uevent:MAJOR=94 dasdb/uevent:MINOR=4 dasdb/uevent:DEVNAME=dasdb dasdb/uevent:DEVTYPE=disk 
dasdb/integrity/write_generate:0 dasdb/integrity/device_is_integrity_capable:0 dasdb/integrity/tag_size:0 dasdb/integrity/read_verify:0 dasdb/integrity/protection_interval_bytes:0 dasdb/integrity/format:none dasdb/dasdb1/start:192 dasdb/dasdb1/inflight: 0 0 dasdb/dasdb1/power/runtime_suspended_time:0 dasdb/dasdb1/power/runtime_active_time:0 dasdb/dasdb1/power/control:auto dasdb/dasdb1/power/runtime_status:unsupported dasdb/dasdb1/dev:94:5 dasdb/dasdb1/ro:0 dasdb/dasdb1/partition:1 dasdb/dasdb1/stat: 118 0 9992 30 1 0 8 0 0 0 0 dasdb/dasdb1/size:43273248 dasdb/dasdb1/alignment_offset:0 dasdb/dasdb1/trace/end_lba:disabled dasdb/dasdb1/trace/act_mask:disabled dasdb/dasdb1/trace/start_lba:disabled dasdb/dasdb1/trace/enable:0 dasdb/dasdb1/trace/pid:disabled dasdb/dasdb1/uevent:MAJOR=94 dasdb/dasdb1/uevent:MINOR=5 dasdb/dasdb1/uevent:DEVNAME=dasdb1 dasdb/dasdb1/uevent:DEVTYPE=partition dasdb/dasdb1/uevent:PARTN=1 dasdb/dasdb1/discard_alignment:0 dasdb/discard_alignment:0 dasdc/range:4 dasdc/capability:10 dasdc/inflight: 0 0 dasdc/ext_range:4 dasdc/power/runtime_suspended_time:0 dasdc/power/runtime_active_time:0 dasdc/power/control:auto dasdc/power/runtime_status:unsupported dasdc/dev:94:8 dasdc/hidden:0 dasdc/ro:0 dasdc/mq/7/nr_tags:1024 dasdc/mq/7/nr_reserved_tags:0 dasdc/mq/7/cpu_list:7 dasdc/mq/15/nr_tags:1024 dasdc/mq/15/nr_reserved_tags:0 dasdc/mq/15/cpu_list:15 dasdc/mq/5/nr_tags:1024 dasdc/mq/5/nr_reserved_tags:0 dasdc/mq/5/cpu_list:5 dasdc/mq/13/nr_tags:1024 dasdc/mq/13/nr_reserved_tags:0 dasdc/mq/13/cpu_list:13 dasdc/mq/3/nr_tags:1024 dasdc/mq/3/nr_reserved_tags:0 dasdc/mq/3/cpu_list:3 dasdc/mq/11/nr_tags:1024 dasdc/mq/11/nr_reserved_tags:0 dasdc/mq/11/cpu_list:11 dasdc/mq/1/nr_tags:1024 dasdc/mq/1/nr_reserved_tags:0 dasdc/mq/1/cpu_list:1 dasdc/mq/8/nr_tags:1024 dasdc/mq/8/nr_reserved_tags:0 dasdc/mq/8/cpu_list:8 dasdc/mq/6/nr_tags:1024 dasdc/mq/6/nr_reserved_tags:0 dasdc/mq/6/cpu_list:6 dasdc/mq/14/nr_tags:1024 dasdc/mq/14/nr_reserved_tags:0 dasdc/mq/14/cpu_list:14 dasdc/mq/4/nr_tags:1024 dasdc/mq/4/nr_reserved_tags:0 dasdc/mq/4/cpu_list:4 dasdc/mq/12/nr_tags:1024 dasdc/mq/12/nr_reserved_tags:0 dasdc/mq/12/cpu_list:12 dasdc/mq/2/nr_tags:1024 dasdc/mq/2/nr_reserved_tags:0 dasdc/mq/2/cpu_list:2 dasdc/mq/10/nr_tags:1024 dasdc/mq/10/nr_reserved_tags:0 dasdc/mq/10/cpu_list:10 dasdc/mq/0/nr_tags:1024 dasdc/mq/0/nr_reserved_tags:0 dasdc/mq/0/cpu_list:0, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 
265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281 dasdc/mq/9/nr_tags:1024 dasdc/mq/9/nr_reserved_tags:0 dasdc/mq/9/cpu_list:9 dasdc/stat: 129 0 10504 50 1 0 8 0 0 0 0 dasdc/removable:0 dasdc/size:43273440 dasdc/alignment_offset:0 dasdc/queue/hw_sector_size:4096 dasdc/queue/max_discard_segments:1 dasdc/queue/max_segment_size:4096 dasdc/queue/physical_block_size:4096 dasdc/queue/discard_max_bytes:0 dasdc/queue/rotational:0 dasdc/queue/iosched/fifo_batch:16 dasdc/queue/iosched/read_expire:500 dasdc/queue/iosched/writes_starved:2 dasdc/queue/iosched/write_expire:5000 dasdc/queue/iosched/front_merges:1 dasdc/queue/write_same_max_bytes:0 dasdc/queue/zoned:none dasdc/queue/max_sectors_kb:760 dasdc/queue/discard_zeroes_data:0 dasdc/queue/read_ahead_kb:128 dasdc/queue/discard_max_hw_bytes:0 dasdc/queue/wbt_lat_usec:75000 dasdc/queue/nomerges:0 dasdc/queue/max_segments:65535 dasdc/queue/rq_affinity:1 dasdc/queue/iostats:1 dasdc/queue/dax:0 dasdc/queue/minimum_io_size:4096 dasdc/queue/chunk_sectors:0 dasdc/queue/io_poll:1 dasdc/queue/write_zeroes_max_bytes:0 dasdc/queue/max_hw_sectors_kb:760 dasdc/queue/add_random:0 dasdc/queue/optimal_io_size:0 dasdc/queue/nr_requests:256 dasdc/queue/scheduler:[mq-deadline] kyber none dasdc/queue/discard_granularity:0 dasdc/queue/logical_block_size:4096 dasdc/queue/io_poll_delay:-1 dasdc/queue/max_integrity_segments:0 dasdc/queue/write_cache:write through dasdc/trace/end_lba:disabled dasdc/trace/act_mask:disabled dasdc/trace/start_lba:disabled dasdc/trace/enable:0 dasdc/trace/pid:disabled dasdc/uevent:MAJOR=94 dasdc/uevent:MINOR=8 dasdc/uevent:DEVNAME=dasdc dasdc/uevent:DEVTYPE=disk dasdc/dasdc1/start:192 dasdc/dasdc1/inflight: 0 0 dasdc/dasdc1/power/runtime_suspended_time:0 dasdc/dasdc1/power/runtime_active_time:0 dasdc/dasdc1/power/control:auto dasdc/dasdc1/power/runtime_status:unsupported dasdc/dasdc1/dev:94:9 dasdc/dasdc1/ro:0 dasdc/dasdc1/partition:1 dasdc/dasdc1/stat: 118 0 9992 50 1 0 8 0 0 0 0 dasdc/dasdc1/size:43273248 dasdc/dasdc1/alignment_offset:0 dasdc/dasdc1/trace/end_lba:disabled dasdc/dasdc1/trace/act_mask:disabled dasdc/dasdc1/trace/start_lba:disabled dasdc/dasdc1/trace/enable:0 dasdc/dasdc1/trace/pid:disabled dasdc/dasdc1/uevent:MAJOR=94 dasdc/dasdc1/uevent:MINOR=9 dasdc/dasdc1/uevent:DEVNAME=dasdc1 dasdc/dasdc1/uevent:DEVTYPE=partition dasdc/dasdc1/uevent:PARTN=1 dasdc/dasdc1/discard_alignment:0 dasdc/integrity/write_generate:0 dasdc/integrity/device_is_integrity_capable:0 dasdc/integrity/tag_size:0 dasdc/integrity/read_verify:0 dasdc/integrity/protection_interval_bytes:0 dasdc/integrity/format:none dasdc/discard_alignment:0 dasdd/range:4 dasdd/capability:10 dasdd/inflight: 0 0 dasdd/ext_range:4 dasdd/power/runtime_suspended_time:0 dasdd/power/runtime_active_time:0 dasdd/power/control:auto dasdd/power/runtime_status:unsupported dasdd/dev:94:12 dasdd/hidden:0 dasdd/ro:0 dasdd/mq/7/nr_tags:1024 dasdd/mq/7/nr_reserved_tags:0 dasdd/mq/7/cpu_list:7 dasdd/mq/15/nr_tags:1024 dasdd/mq/15/nr_reserved_tags:0 dasdd/mq/15/cpu_list:15 dasdd/mq/5/nr_tags:1024 dasdd/mq/5/nr_reserved_tags:0 dasdd/mq/5/cpu_list:5 dasdd/mq/13/nr_tags:1024 dasdd/mq/13/nr_reserved_tags:0 dasdd/mq/13/cpu_list:13 dasdd/mq/3/nr_tags:1024 dasdd/mq/3/nr_reserved_tags:0 dasdd/mq/3/cpu_list:3 dasdd/mq/11/nr_tags:1024 dasdd/mq/11/nr_reserved_tags:0 dasdd/mq/11/cpu_list:11 dasdd/mq/1/nr_tags:1024 dasdd/mq/1/nr_reserved_tags:0 dasdd/mq/1/cpu_list:1 dasdd/mq/8/nr_tags:1024 dasdd/mq/8/nr_reserved_tags:0 dasdd/mq/8/cpu_list:8 dasdd/mq/6/nr_tags:1024 
dasdd/mq/6/nr_reserved_tags:0 dasdd/mq/6/cpu_list:6 dasdd/mq/14/nr_tags:1024 dasdd/mq/14/nr_reserved_tags:0 dasdd/mq/14/cpu_list:14 dasdd/mq/4/nr_tags:1024 dasdd/mq/4/nr_reserved_tags:0 dasdd/mq/4/cpu_list:4 dasdd/mq/12/nr_tags:1024 dasdd/mq/12/nr_reserved_tags:0 dasdd/mq/12/cpu_list:12 dasdd/mq/2/nr_tags:1024 dasdd/mq/2/nr_reserved_tags:0 dasdd/mq/2/cpu_list:2 dasdd/mq/10/nr_tags:1024 dasdd/mq/10/nr_reserved_tags:0 dasdd/mq/10/cpu_list:10 dasdd/mq/0/nr_tags:1024 dasdd/mq/0/nr_reserved_tags:0 dasdd/mq/0/cpu_list:0, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281 dasdd/mq/9/nr_tags:1024 dasdd/mq/9/nr_reserved_tags:0 dasdd/mq/9/cpu_list:9 dasdd/stat: 6591 1270 412856 4720 24963 7143 743920 30530 0 3670 8400 dasdd/removable:0 dasdd/size:43273440 dasdd/alignment_offset:0 dasdd/queue/hw_sector_size:4096 dasdd/queue/max_discard_segments:1 dasdd/queue/max_segment_size:4096 dasdd/queue/physical_block_size:4096 dasdd/queue/discard_max_bytes:0 dasdd/queue/rotational:0 dasdd/queue/iosched/fifo_batch:16 dasdd/queue/iosched/read_expire:500 dasdd/queue/iosched/writes_starved:2 dasdd/queue/iosched/write_expire:5000 dasdd/queue/iosched/front_merges:1 dasdd/queue/write_same_max_bytes:0 dasdd/queue/zoned:none dasdd/queue/max_sectors_kb:760 dasdd/queue/discard_zeroes_data:0 dasdd/queue/read_ahead_kb:128 dasdd/queue/discard_max_hw_bytes:0 dasdd/queue/wbt_lat_usec:75000 dasdd/queue/nomerges:0 dasdd/queue/max_segments:65535 dasdd/queue/rq_affinity:1 dasdd/queue/iostats:1 dasdd/queue/dax:0 dasdd/queue/minimum_io_size:4096 dasdd/queue/chunk_sectors:0 dasdd/queue/io_poll:1 dasdd/queue/write_zeroes_max_bytes:0 dasdd/queue/max_hw_sectors_kb:760 dasdd/queue/add_random:0 dasdd/queue/optimal_io_size:0 dasdd/queue/nr_requests:256 dasdd/queue/scheduler:[mq-deadline] kyber none dasdd/queue/discard_granularity:0 dasdd/queue/logical_block_size:4096 dasdd/queue/io_poll_delay:-1 dasdd/queue/max_integrity_segments:0 dasdd/queue/write_cache:write through dasdd/trace/end_lba:disabled dasdd/trace/act_mask:disabled dasdd/trace/start_lba:disabled dasdd/trace/enable:0 dasdd/trace/pid:disabled dasdd/uevent:MAJOR=94 dasdd/uevent:MINOR=12 dasdd/uevent:DEVNAME=dasdd dasdd/uevent:DEVTYPE=disk dasdd/dasdd1/start:192 dasdd/dasdd1/inflight: 0 0 dasdd/dasdd1/power/runtime_suspended_time:0 dasdd/dasdd1/power/runtime_active_time:0 dasdd/dasdd1/power/control:auto 
dasdd/dasdd1/power/runtime_status:unsupported dasdd/dasdd1/dev:94:13 dasdd/dasdd1/ro:0 dasdd/dasdd1/partition:1 dasdd/dasdd1/stat: 6580 1270 412344 4720 24963 7143 743920 30530 0 3670 8400 dasdd/dasdd1/size:43273248 dasdd/dasdd1/alignment_offset:0 dasdd/dasdd1/trace/end_lba:disabled dasdd/dasdd1/trace/act_mask:disabled dasdd/dasdd1/trace/start_lba:disabled dasdd/dasdd1/trace/enable:0 dasdd/dasdd1/trace/pid:disabled dasdd/dasdd1/uevent:MAJOR=94 dasdd/dasdd1/uevent:MINOR=13 dasdd/dasdd1/uevent:DEVNAME=dasdd1 dasdd/dasdd1/uevent:DEVTYPE=partition dasdd/dasdd1/uevent:PARTN=1 dasdd/dasdd1/discard_alignment:0 dasdd/integrity/write_generate:0 dasdd/integrity/device_is_integrity_capable:0 dasdd/integrity/tag_size:0 dasdd/integrity/read_verify:0 dasdd/integrity/protection_interval_bytes:0 dasdd/integrity/format:none dasdd/discard_alignment:0

* Re: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues()
  From: Christian Borntraeger @ 2018-03-28 15:44 UTC
  To: Ming Lei
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Stefan Haberland, Christoph Hellwig

FWIW, these logs were from a different system (with fewer disks and
CPUs). The related log is:

[    4.114191] dasd-eckd.2aa01a: 0.0.3f77: New DASD 3390/0C (CU 3990/01) with 30051 cylinders, 15 heads, 224 sectors
[    4.114852] dasd-eckd.2aa01a: 0.0.3f74: New DASD 3390/0C (CU 3990/01) with 30051 cylinders, 15 heads, 224 sectors
[    4.122361] dasd-eckd.412b53: 0.0.3f77: DASD with 4 KB/block, 21636720 KB total size, 48 KB/track, compatible disk layout
[    4.122811] dasd-eckd.412b53: 0.0.3f74: DASD with 4 KB/block, 21636720 KB total size, 48 KB/track, compatible disk layout
[    4.123568]  dasdc:VOL1/  0X3F77: dasdc1
[    4.124092]  dasdd:VOL1/  0X3F74: dasdd1
[    4.286220] WARNING: CPU: 1 PID: 1262 at block/blk-mq.c:1402 __blk_mq_delay_run_hw_queue+0xbe/0xd8
[    4.286225] Modules linked in: autofs4
[    4.286231] CPU: 1 PID: 1262 Comm: dasdconf.sh Not tainted 4.16.0-20180323.rc6.git0.792f5024dd01.300.fc27.s390x #1
[    4.286232] Hardware name: IBM 2964 NC9 704 (LPAR)
[    4.286236] Krnl PSW : 0000000053ccfc28 00000000c4b59c51 (__blk_mq_delay_run_hw_queue+0xbe/0xd8)
[    4.286239]            R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
[    4.286242] Krnl GPRS: 00000003da4eb000 0000000300000000 00000003dae67000 0000000000000001
[    4.286243]            0000000000000000 00000003da4eb710 0000000300000000 000000010dbafd98
[    4.286245]            000000010dbafd98 0000000000000001 0000000000000001 0000000000000000
[    4.286248]            00000003dae67000 0000000000aaa750 000000010dbafc00 000000010dbafbc8
[    4.286256] Krnl Code: 0000000000698e46: ebaff0a00004   lmg   %r10,%r15,160(%r15)
                          0000000000698e4c: c0f4ffff7aca   brcl  15,6883e0
                         #0000000000698e52: a7f40001       brc   15,698e54
                         >0000000000698e56: e340f0c00004   lg    %r4,192(%r15)
                          0000000000698e5c: ebaff0a00004   lmg   %r10,%r15,160(%r15)
                          0000000000698e62: 07f4           bcr   15,%r4
                          0000000000698e64: c0e5ffffff02   brasl %r14,698c68
                          0000000000698e6a: a7f4fff6       brc   15,698e56
[    4.286301] Call Trace:
[    4.286304] ([<000000010dbafc08>] 0x10dbafc08)
[    4.286306]  [<0000000000698f5a>] blk_mq_run_hw_queue+0x82/0x180
[    4.286308]  [<00000000006990c0>] blk_mq_run_hw_queues+0x68/0x88
[    4.286310]  [<00000000006982de>] __blk_mq_complete_request+0x11e/0x1d8
[    4.286313]  [<0000000000698424>] blk_mq_complete_request+0x8c/0xc8
[    4.286319]  [<000000000082c5d0>] dasd_block_tasklet+0x158/0x490
[    4.286325]  [<000000000014bc9a>] tasklet_hi_action+0x92/0x120
[    4.286329]  [<00000000009feeb0>] __do_softirq+0x120/0x348
[    4.286331]  [<000000000014b76a>] irq_exit+0xba/0xd0
[    4.286335]  [<000000000010bf92>] do_IRQ+0x8a/0xb8
[    4.286337]  [<00000000009fe180>] io_int_handler+0x130/0x298
[    4.286338] Last Breaking-Event-Address:
[    4.286340]  [<0000000000698e52>] __blk_mq_delay_run_hw_queue+0xba/0xd8
[    4.286342] ---[ end trace 0d746eb6f9348354 ]---

On 03/28/2018 05:36 PM, Christian Borntraeger wrote:
> On 03/28/2018 05:26 PM, Ming Lei wrote:
>> Hi Christian,
>>
>> On Wed, Mar 28, 2018 at 09:45:10AM +0200, Christian Borntraeger wrote:
>>> FWIW, this patch does not fix the issue for me:
>>>
>>> [... WARNING trace trimmed, quoted in full earlier in the thread ...]
>>
>> Thinking about this issue further, I can't understand the root cause of
>> it.
>>
>> After commit 20e4d813931961fe ("blk-mq: simplify queue mapping &
>> schedule with each possisble CPU"), each hw queue should be mapped to at
>> least one CPU, which means this issue shouldn't happen. Maybe
>> blk_mq_map_queues() is doing something wrong?
>>
>> Could you dump 'lscpu' and provide the blk-mq debugfs state for your
>> DASD via the following command?
>
> [... lscpu and lsdasd output trimmed, quoted in full in the previous message ...]
>
>> (cd /sys/kernel/debug/block/$DASD && find . -type f -exec grep -aH . {} \;)
>
> see attachment:

* Re: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues()
  From: Ming Lei @ 2018-03-29  2:00 UTC
  To: Christian Borntraeger
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Stefan Haberland, Christoph Hellwig

On Wed, Mar 28, 2018 at 05:36:53PM +0200, Christian Borntraeger wrote:
> On 03/28/2018 05:26 PM, Ming Lei wrote:
>> Hi Christian,
>>
>> On Wed, Mar 28, 2018 at 09:45:10AM +0200, Christian Borntraeger wrote:
>>> FWIW, this patch does not fix the issue for me:
>>>
>>> [... WARNING trace trimmed, quoted in full earlier in the thread ...]
>>
>> Thinking about this issue further, I can't understand the root cause of
>> it.
>>
>> After commit 20e4d813931961fe ("blk-mq: simplify queue mapping &
>> schedule with each possisble CPU"), each hw queue should be mapped to at
>> least one CPU, which means this issue shouldn't happen. Maybe
>> blk_mq_map_queues() is doing something wrong?
>>
>> Could you dump 'lscpu' and provide the blk-mq debugfs state for your
>> DASD via the following command?
>
> [... lscpu output trimmed, quoted in full earlier in the thread ...]
>
> # lsdasd
> Bus-ID     Status    Name      Device  Type  BlkSz  Size      Blocks
> ==============================================================================
> 0.0.3f75   active    dasda     94:0    ECKD  4096   21129MB   5409180
> 0.0.3f76   active    dasdb     94:4    ECKD  4096   21129MB   5409180
> 0.0.3f77   active    dasdc     94:8    ECKD  4096   21129MB   5409180
> 0.0.3f74   active    dasdd     94:12   ECKD  4096   21129MB   5409180

I have tried to emulate your CPU topology via a VM, and the blk-mq
mapping of null_blk is basically similar to your DASD mapping, but I
still can't reproduce your issue.

BTW, do you need to do CPU hotplug or any other action to trigger this
warning?

>> (cd /sys/kernel/debug/block/$DASD && find . -type f -exec grep -aH . {} \;)
>
> see attachment:

> dasda/range:4
> dasda/capability:10
> dasda/inflight: 0 0
> dasda/ext_range:4
> dasda/power/runtime_suspended_time:0
> dasda/power/runtime_active_time:0
> dasda/power/control:auto
> dasda/power/runtime_status:unsupported
> dasda/dev:94:0

No, it is from sysfs instead of debugfs, so please run the following
command:

(cd /sys/kernel/debug/block/$DASD && find . -type f -exec grep -aH . {} \;)

Thanks,
Ming

* Re: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues() 2018-03-29 2:00 ` Ming Lei @ 2018-03-29 7:23 ` Christian Borntraeger 2018-03-29 9:09 ` Christian Borntraeger 2018-03-29 9:52 ` Ming Lei 0 siblings, 2 replies; 40+ messages in thread From: Christian Borntraeger @ 2018-03-29 7:23 UTC (permalink / raw) To: Ming Lei Cc: Jens Axboe, linux-block, Christoph Hellwig, Stefan Haberland, Christoph Hellwig [-- Attachment #1: Type: text/plain, Size: 5626 bytes --] On 03/29/2018 04:00 AM, Ming Lei wrote: > On Wed, Mar 28, 2018 at 05:36:53PM +0200, Christian Borntraeger wrote: >> >> >> On 03/28/2018 05:26 PM, Ming Lei wrote: >>> Hi Christian, >>> >>> On Wed, Mar 28, 2018 at 09:45:10AM +0200, Christian Borntraeger wrote: >>>> FWIW, this patch does not fix the issue for me: >>>> >>>> ostname=? addr=? terminal=? res=success' >>>> [ 21.454961] WARNING: CPU: 3 PID: 1882 at block/blk-mq.c:1410 __blk_mq_delay_run_hw_queue+0xbe/0xd8 >>>> [ 21.454968] Modules linked in: scsi_dh_rdac scsi_dh_emc scsi_dh_alua dm_mirror dm_region_hash dm_log dm_multipath dm_mod autofs4 >>>> [ 21.454984] CPU: 3 PID: 1882 Comm: dasdconf.sh Not tainted 4.16.0-rc7+ #26 >>>> [ 21.454987] Hardware name: IBM 2964 NC9 704 (LPAR) >>>> [ 21.454990] Krnl PSW : 00000000c0131ea3 000000003ea2f7bf (__blk_mq_delay_run_hw_queue+0xbe/0xd8) >>>> [ 21.454996] R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3 >>>> [ 21.455005] Krnl GPRS: 0000013abb69a000 0000013a00000000 0000013ac6c0dc00 0000000000000001 >>>> [ 21.455008] 0000000000000000 0000013abb69a710 0000013a00000000 00000001b691fd98 >>>> [ 21.455011] 00000001b691fd98 0000013ace4775c8 0000000000000001 0000000000000000 >>>> [ 21.455014] 0000013ac6c0dc00 0000000000b47238 00000001b691fc08 00000001b691fbd0 >>>> [ 21.455032] Krnl Code: 000000000069c596: ebaff0a00004 lmg %r10,%r15,160(%r15) >>>> 000000000069c59c: c0f4ffff7a5e brcl 15,68ba58 >>>> #000000000069c5a2: a7f40001 brc 15,69c5a4 >>>> >000000000069c5a6: e340f0c00004 lg %r4,192(%r15) >>>> 000000000069c5ac: ebaff0a00004 lmg %r10,%r15,160(%r15) >>>> 000000000069c5b2: 07f4 bcr 15,%r4 >>>> 000000000069c5b4: c0e5fffffeea brasl %r14,69c388 >>>> 000000000069c5ba: a7f4fff6 brc 15,69c5a6 >>>> [ 21.455067] Call Trace: >>>> [ 21.455072] ([<00000001b691fd98>] 0x1b691fd98) >>>> [ 21.455079] [<000000000069c692>] blk_mq_run_hw_queue+0xba/0x100 >>>> [ 21.455083] [<000000000069c740>] blk_mq_run_hw_queues+0x68/0x88 >>>> [ 21.455089] [<000000000069b956>] __blk_mq_complete_request+0x11e/0x1d8 >>>> [ 21.455091] [<000000000069ba9c>] blk_mq_complete_request+0x8c/0xc8 >>>> [ 21.455103] [<00000000008aa250>] dasd_block_tasklet+0x158/0x490 >>>> [ 21.455110] [<000000000014c742>] tasklet_hi_action+0x92/0x120 >>>> [ 21.455118] [<0000000000a7cfc0>] __do_softirq+0x120/0x348 >>>> [ 21.455122] [<000000000014c212>] irq_exit+0xba/0xd0 >>>> [ 21.455130] [<000000000010bf92>] do_IRQ+0x8a/0xb8 >>>> [ 21.455133] [<0000000000a7c298>] io_int_handler+0x130/0x298 >>>> [ 21.455136] Last Breaking-Event-Address: >>>> [ 21.455138] [<000000000069c5a2>] __blk_mq_delay_run_hw_queue+0xba/0xd8 >>>> [ 21.455140] ---[ end trace be43f99a5d1e553e ]--- >>>> [ 21.510046] dasdconf.sh Warning: 0.0.241e is already online, not configuring >>> >>> Thinking about this issue further, I can't understand the root cause for >>> this issue. >>> >>> After commit 20e4d813931961fe ("blk-mq: simplify queue mapping & schedule with >>> each possisble CPU"), each hw queue should be mapped to at least one CPU, that >>> means this issue shouldn't happen. Maybe blk_mq_map_queues() works wrong? 
>>> >>> Could you dump 'lscpu' and provide blk-mq debugfs for your DASD via the >>> following command? >> >> # lscpu >> Architecture: s390x >> CPU op-mode(s): 32-bit, 64-bit >> Byte Order: Big Endian >> CPU(s): 16 >> On-line CPU(s) list: 0-15 >> Thread(s) per core: 2 >> Core(s) per socket: 8 >> Socket(s) per book: 3 >> Book(s) per drawer: 2 >> Drawer(s): 4 >> NUMA node(s): 1 >> Vendor ID: IBM/S390 >> Machine type: 2964 >> CPU dynamic MHz: 5000 >> CPU static MHz: 5000 >> BogoMIPS: 20325.00 >> Hypervisor: PR/SM >> Hypervisor vendor: IBM >> Virtualization type: full >> Dispatching mode: horizontal >> L1d cache: 128K >> L1i cache: 96K >> L2d cache: 2048K >> L2i cache: 2048K >> L3 cache: 65536K >> L4 cache: 491520K >> NUMA node0 CPU(s): 0-15 >> Flags: esan3 zarch stfle msa ldisp eimm dfp edat etf3eh highgprs te vx sie >> >> # lsdasd >> Bus-ID Status Name Device Type BlkSz Size Blocks >> ============================================================================== >> 0.0.3f75 active dasda 94:0 ECKD 4096 21129MB 5409180 >> 0.0.3f76 active dasdb 94:4 ECKD 4096 21129MB 5409180 >> 0.0.3f77 active dasdc 94:8 ECKD 4096 21129MB 5409180 >> 0.0.3f74 active dasdd 94:12 ECKD 4096 21129MB 5409180 > > I have tried to emulate your CPU topo via VM and the blk-mq mapping of > null_blk is basically similar with your DASD mapping, but still can't > reproduce your issue. > > BTW, do you need to do cpu hotplug or other actions for triggering this warning? No, without hotplug. > >> >>> >>> (cd /sys/kernel/debug/block/$DASD && find . -type f -exec grep -aH . {} \;) >> >> >> see attachement: > >> dasda/range:4 >> dasda/capability:10 >> dasda/inflight: 0 0 >> dasda/ext_range:4 >> dasda/power/runtime_suspended_time:0 >> dasda/power/runtime_active_time:0 >> dasda/power/control:auto >> dasda/power/runtime_status:unsupported >> dasda/dev:94:0 > > No, it is from sysfs instead of debugfs, so please run the following Eeks. Yes sorry. New version. [-- Attachment #2: log.gz --] [-- Type: application/gzip, Size: 204740 bytes --] ^ permalink raw reply [flat|nested] 40+ messages in thread
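To make the mapping question concrete, here is a small user-space model of
the helper under discussion. The body of cpu_to_queue_index() is the
pre-patch kernel code (quoted in full in Ming's patch below); the harness,
the topology constants and the assumption that present CPUs are numbered
0..15 are made up for illustration, and the thread-sibling handling in
blk_mq_map_queues() is ignored.

#include <stdbool.h>
#include <stdio.h>

#define NR_POSSIBLE 256	/* stand-in for CONFIG_NR_CPUS possible CPUs */
#define NR_PRESENT  16	/* assume present CPUs are 0..15, as in lscpu above */
#define NR_QUEUES   4	/* assumed nr_hw_queues */

static bool cpu_present(int cpu)
{
	return cpu < NR_PRESENT;
}

static int cpu_to_queue_index(unsigned int nr_queues, const int cpu)
{
	/* Non present CPU will be mapped to queue index 0. */
	if (!cpu_present(cpu))
		return 0;
	return cpu % nr_queues;
}

int main(void)
{
	bool covered[NR_QUEUES] = { false };
	int cpu, q;

	for (cpu = 0; cpu < NR_POSSIBLE; cpu++)
		if (cpu_present(cpu))
			covered[cpu_to_queue_index(NR_QUEUES, cpu)] = true;

	for (q = 0; q < NR_QUEUES; q++)
		printf("hctx %d: %s\n", q,
		       covered[q] ? "has a present CPU" : "no present CPU mapped");
	return 0;
}

With 16 contiguous present CPUs every queue ends up covered, which is why
the warning is surprising here: under this model a queue only loses all of
its CPUs when nr_queues exceeds the number of present CPUs or the present
CPUs are numbered sparsely.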
* Re: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues() 2018-03-29 7:23 ` Christian Borntraeger @ 2018-03-29 9:09 ` Christian Borntraeger 2018-03-29 9:40 ` Ming Lei 2018-03-29 9:52 ` Ming Lei 1 sibling, 1 reply; 40+ messages in thread From: Christian Borntraeger @ 2018-03-29 9:09 UTC (permalink / raw) To: Ming Lei Cc: Jens Axboe, linux-block, Christoph Hellwig, Stefan Haberland, Christoph Hellwig On 03/29/2018 09:23 AM, Christian Borntraeger wrote: > > > On 03/29/2018 04:00 AM, Ming Lei wrote: >> On Wed, Mar 28, 2018 at 05:36:53PM +0200, Christian Borntraeger wrote: >>> >>> >>> On 03/28/2018 05:26 PM, Ming Lei wrote: >>>> Hi Christian, >>>> >>>> On Wed, Mar 28, 2018 at 09:45:10AM +0200, Christian Borntraeger wrote: >>>>> FWIW, this patch does not fix the issue for me: >>>>> >>>>> ostname=? addr=? terminal=? res=success' >>>>> [ 21.454961] WARNING: CPU: 3 PID: 1882 at block/blk-mq.c:1410 __blk_mq_delay_run_hw_queue+0xbe/0xd8 >>>>> [ 21.454968] Modules linked in: scsi_dh_rdac scsi_dh_emc scsi_dh_alua dm_mirror dm_region_hash dm_log dm_multipath dm_mod autofs4 >>>>> [ 21.454984] CPU: 3 PID: 1882 Comm: dasdconf.sh Not tainted 4.16.0-rc7+ #26 >>>>> [ 21.454987] Hardware name: IBM 2964 NC9 704 (LPAR) >>>>> [ 21.454990] Krnl PSW : 00000000c0131ea3 000000003ea2f7bf (__blk_mq_delay_run_hw_queue+0xbe/0xd8) >>>>> [ 21.454996] R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3 >>>>> [ 21.455005] Krnl GPRS: 0000013abb69a000 0000013a00000000 0000013ac6c0dc00 0000000000000001 >>>>> [ 21.455008] 0000000000000000 0000013abb69a710 0000013a00000000 00000001b691fd98 >>>>> [ 21.455011] 00000001b691fd98 0000013ace4775c8 0000000000000001 0000000000000000 >>>>> [ 21.455014] 0000013ac6c0dc00 0000000000b47238 00000001b691fc08 00000001b691fbd0 >>>>> [ 21.455032] Krnl Code: 000000000069c596: ebaff0a00004 lmg %r10,%r15,160(%r15) >>>>> 000000000069c59c: c0f4ffff7a5e brcl 15,68ba58 >>>>> #000000000069c5a2: a7f40001 brc 15,69c5a4 >>>>> >000000000069c5a6: e340f0c00004 lg %r4,192(%r15) >>>>> 000000000069c5ac: ebaff0a00004 lmg %r10,%r15,160(%r15) >>>>> 000000000069c5b2: 07f4 bcr 15,%r4 >>>>> 000000000069c5b4: c0e5fffffeea brasl %r14,69c388 >>>>> 000000000069c5ba: a7f4fff6 brc 15,69c5a6 >>>>> [ 21.455067] Call Trace: >>>>> [ 21.455072] ([<00000001b691fd98>] 0x1b691fd98) >>>>> [ 21.455079] [<000000000069c692>] blk_mq_run_hw_queue+0xba/0x100 >>>>> [ 21.455083] [<000000000069c740>] blk_mq_run_hw_queues+0x68/0x88 >>>>> [ 21.455089] [<000000000069b956>] __blk_mq_complete_request+0x11e/0x1d8 >>>>> [ 21.455091] [<000000000069ba9c>] blk_mq_complete_request+0x8c/0xc8 >>>>> [ 21.455103] [<00000000008aa250>] dasd_block_tasklet+0x158/0x490 >>>>> [ 21.455110] [<000000000014c742>] tasklet_hi_action+0x92/0x120 >>>>> [ 21.455118] [<0000000000a7cfc0>] __do_softirq+0x120/0x348 >>>>> [ 21.455122] [<000000000014c212>] irq_exit+0xba/0xd0 >>>>> [ 21.455130] [<000000000010bf92>] do_IRQ+0x8a/0xb8 >>>>> [ 21.455133] [<0000000000a7c298>] io_int_handler+0x130/0x298 >>>>> [ 21.455136] Last Breaking-Event-Address: >>>>> [ 21.455138] [<000000000069c5a2>] __blk_mq_delay_run_hw_queue+0xba/0xd8 >>>>> [ 21.455140] ---[ end trace be43f99a5d1e553e ]--- >>>>> [ 21.510046] dasdconf.sh Warning: 0.0.241e is already online, not configuring >>>> >>>> Thinking about this issue further, I can't understand the root cause for >>>> this issue. FWIW, Limiting CONFIG_NR_CPUS to 64 seems to make the problem go away. ^ permalink raw reply [flat|nested] 40+ messages in thread
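The CONFIG_NR_CPUS observation presumably points at the possible-vs-online
distinction: the blk-mq map is built over all possible CPUs (nr_cpu_ids),
while lscpu above only shows the online ones. A quick way to see the gap on
an affected machine is to compare the standard sysfs CPU masks; the little
user-space reader below is only an illustration, the sysfs files themselves
are standard.

#include <stdio.h>

static void show(const char *name)
{
	char path[128], buf[256];
	FILE *f;

	snprintf(path, sizeof(path), "/sys/devices/system/cpu/%s", name);
	f = fopen(path, "r");
	if (f && fgets(buf, sizeof(buf), f))
		printf("%-8s %s", name, buf);	/* buf keeps its newline */
	else
		printf("%-8s <unreadable>\n", name);
	if (f)
		fclose(f);
}

int main(void)
{
	/*
	 * If "possible" is much wider than "online", hctxs can end up
	 * mapped only to CPUs that will never come up on this machine.
	 */
	show("possible");
	show("present");
	show("online");
	return 0;
}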
* Re: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues() 2018-03-29 9:09 ` Christian Borntraeger @ 2018-03-29 9:40 ` Ming Lei 2018-03-29 10:10 ` Christian Borntraeger 0 siblings, 1 reply; 40+ messages in thread From: Ming Lei @ 2018-03-29 9:40 UTC (permalink / raw) To: Christian Borntraeger Cc: Jens Axboe, linux-block, Christoph Hellwig, Stefan Haberland, Christoph Hellwig On Thu, Mar 29, 2018 at 11:09:08AM +0200, Christian Borntraeger wrote: > > > On 03/29/2018 09:23 AM, Christian Borntraeger wrote: > > > > > > On 03/29/2018 04:00 AM, Ming Lei wrote: > >> On Wed, Mar 28, 2018 at 05:36:53PM +0200, Christian Borntraeger wrote: > >>> > >>> > >>> On 03/28/2018 05:26 PM, Ming Lei wrote: > >>>> Hi Christian, > >>>> > >>>> On Wed, Mar 28, 2018 at 09:45:10AM +0200, Christian Borntraeger wrote: > >>>>> FWIW, this patch does not fix the issue for me: > >>>>> > >>>>> ostname=? addr=? terminal=? res=success' > >>>>> [ 21.454961] WARNING: CPU: 3 PID: 1882 at block/blk-mq.c:1410 __blk_mq_delay_run_hw_queue+0xbe/0xd8 > >>>>> [ 21.454968] Modules linked in: scsi_dh_rdac scsi_dh_emc scsi_dh_alua dm_mirror dm_region_hash dm_log dm_multipath dm_mod autofs4 > >>>>> [ 21.454984] CPU: 3 PID: 1882 Comm: dasdconf.sh Not tainted 4.16.0-rc7+ #26 > >>>>> [ 21.454987] Hardware name: IBM 2964 NC9 704 (LPAR) > >>>>> [ 21.454990] Krnl PSW : 00000000c0131ea3 000000003ea2f7bf (__blk_mq_delay_run_hw_queue+0xbe/0xd8) > >>>>> [ 21.454996] R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3 > >>>>> [ 21.455005] Krnl GPRS: 0000013abb69a000 0000013a00000000 0000013ac6c0dc00 0000000000000001 > >>>>> [ 21.455008] 0000000000000000 0000013abb69a710 0000013a00000000 00000001b691fd98 > >>>>> [ 21.455011] 00000001b691fd98 0000013ace4775c8 0000000000000001 0000000000000000 > >>>>> [ 21.455014] 0000013ac6c0dc00 0000000000b47238 00000001b691fc08 00000001b691fbd0 > >>>>> [ 21.455032] Krnl Code: 000000000069c596: ebaff0a00004 lmg %r10,%r15,160(%r15) > >>>>> 000000000069c59c: c0f4ffff7a5e brcl 15,68ba58 > >>>>> #000000000069c5a2: a7f40001 brc 15,69c5a4 > >>>>> >000000000069c5a6: e340f0c00004 lg %r4,192(%r15) > >>>>> 000000000069c5ac: ebaff0a00004 lmg %r10,%r15,160(%r15) > >>>>> 000000000069c5b2: 07f4 bcr 15,%r4 > >>>>> 000000000069c5b4: c0e5fffffeea brasl %r14,69c388 > >>>>> 000000000069c5ba: a7f4fff6 brc 15,69c5a6 > >>>>> [ 21.455067] Call Trace: > >>>>> [ 21.455072] ([<00000001b691fd98>] 0x1b691fd98) > >>>>> [ 21.455079] [<000000000069c692>] blk_mq_run_hw_queue+0xba/0x100 > >>>>> [ 21.455083] [<000000000069c740>] blk_mq_run_hw_queues+0x68/0x88 > >>>>> [ 21.455089] [<000000000069b956>] __blk_mq_complete_request+0x11e/0x1d8 > >>>>> [ 21.455091] [<000000000069ba9c>] blk_mq_complete_request+0x8c/0xc8 > >>>>> [ 21.455103] [<00000000008aa250>] dasd_block_tasklet+0x158/0x490 > >>>>> [ 21.455110] [<000000000014c742>] tasklet_hi_action+0x92/0x120 > >>>>> [ 21.455118] [<0000000000a7cfc0>] __do_softirq+0x120/0x348 > >>>>> [ 21.455122] [<000000000014c212>] irq_exit+0xba/0xd0 > >>>>> [ 21.455130] [<000000000010bf92>] do_IRQ+0x8a/0xb8 > >>>>> [ 21.455133] [<0000000000a7c298>] io_int_handler+0x130/0x298 > >>>>> [ 21.455136] Last Breaking-Event-Address: > >>>>> [ 21.455138] [<000000000069c5a2>] __blk_mq_delay_run_hw_queue+0xba/0xd8 > >>>>> [ 21.455140] ---[ end trace be43f99a5d1e553e ]--- > >>>>> [ 21.510046] dasdconf.sh Warning: 0.0.241e is already online, not configuring > >>>> > >>>> Thinking about this issue further, I can't understand the root cause for > >>>> this issue. 
> 
> FWIW, Limiting CONFIG_NR_CPUS to 64 seems to make the problem go away.

I think the following patch is needed, and this way aligns with the mapping
created via managed IRQs, at least.

diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c
index 9f8cffc8a701..638ab5c11b3c 100644
--- a/block/blk-mq-cpumap.c
+++ b/block/blk-mq-cpumap.c
@@ -14,13 +14,12 @@
 #include "blk.h"
 #include "blk-mq.h"
 
+/*
+ * Given there isn't a CPU hotplug handler in blk-mq, map all possible CPUs
+ * to queues even if they aren't present yet.
+ */
 static int cpu_to_queue_index(unsigned int nr_queues, const int cpu)
 {
-	/*
-	 * Non present CPU will be mapped to queue index 0.
-	 */
-	if (!cpu_present(cpu))
-		return 0;
 	return cpu % nr_queues;
 }

Thanks,
Ming

^ permalink raw reply related	[flat|nested] 40+ messages in thread
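For what the patched helper produces, a user-space sketch along the same
lines (the topology constants are assumptions picked to mirror the debugfs
dump discussed later in the thread -- 64 possible CPUs, 16 online, 16
queues -- and are not taken from the DASD driver):

#include <stdio.h>

#define NR_POSSIBLE 64
#define NR_ONLINE   16	/* assume online CPUs are 0..15 */
#define NR_QUEUES   16

/* the patched helper: spread every possible CPU round-robin */
static int cpu_to_queue_index(unsigned int nr_queues, const int cpu)
{
	return cpu % nr_queues;
}

int main(void)
{
	int q, cpu;

	for (q = 0; q < NR_QUEUES; q++) {
		int mapped = 0, online = 0;

		for (cpu = 0; cpu < NR_POSSIBLE; cpu++) {
			if (cpu_to_queue_index(NR_QUEUES, cpu) != q)
				continue;
			mapped++;
			if (cpu < NR_ONLINE)
				online++;
		}
		printf("hctx %2d: %d mapped CPUs, %d online\n",
		       q, mapped, online);
	}
	return 0;
}

Every hctx now owns some possible CPUs, so the unmapped-hctx warning should
no longer trigger, but most CPUs in each hctx->cpumask may never come
online -- which matters for how the queue picks a CPU to run on, as the
next messages show.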
* Re: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues() 2018-03-29 9:40 ` Ming Lei @ 2018-03-29 10:10 ` Christian Borntraeger 2018-03-29 10:48 ` Ming Lei 0 siblings, 1 reply; 40+ messages in thread From: Christian Borntraeger @ 2018-03-29 10:10 UTC (permalink / raw) To: Ming Lei Cc: Jens Axboe, linux-block, Christoph Hellwig, Stefan Haberland, Christoph Hellwig On 03/29/2018 11:40 AM, Ming Lei wrote: > On Thu, Mar 29, 2018 at 11:09:08AM +0200, Christian Borntraeger wrote: >> >> >> On 03/29/2018 09:23 AM, Christian Borntraeger wrote: >>> >>> >>> On 03/29/2018 04:00 AM, Ming Lei wrote: >>>> On Wed, Mar 28, 2018 at 05:36:53PM +0200, Christian Borntraeger wrote: >>>>> >>>>> >>>>> On 03/28/2018 05:26 PM, Ming Lei wrote: >>>>>> Hi Christian, >>>>>> >>>>>> On Wed, Mar 28, 2018 at 09:45:10AM +0200, Christian Borntraeger wrote: >>>>>>> FWIW, this patch does not fix the issue for me: >>>>>>> >>>>>>> ostname=? addr=? terminal=? res=success' >>>>>>> [ 21.454961] WARNING: CPU: 3 PID: 1882 at block/blk-mq.c:1410 __blk_mq_delay_run_hw_queue+0xbe/0xd8 >>>>>>> [ 21.454968] Modules linked in: scsi_dh_rdac scsi_dh_emc scsi_dh_alua dm_mirror dm_region_hash dm_log dm_multipath dm_mod autofs4 >>>>>>> [ 21.454984] CPU: 3 PID: 1882 Comm: dasdconf.sh Not tainted 4.16.0-rc7+ #26 >>>>>>> [ 21.454987] Hardware name: IBM 2964 NC9 704 (LPAR) >>>>>>> [ 21.454990] Krnl PSW : 00000000c0131ea3 000000003ea2f7bf (__blk_mq_delay_run_hw_queue+0xbe/0xd8) >>>>>>> [ 21.454996] R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3 >>>>>>> [ 21.455005] Krnl GPRS: 0000013abb69a000 0000013a00000000 0000013ac6c0dc00 0000000000000001 >>>>>>> [ 21.455008] 0000000000000000 0000013abb69a710 0000013a00000000 00000001b691fd98 >>>>>>> [ 21.455011] 00000001b691fd98 0000013ace4775c8 0000000000000001 0000000000000000 >>>>>>> [ 21.455014] 0000013ac6c0dc00 0000000000b47238 00000001b691fc08 00000001b691fbd0 >>>>>>> [ 21.455032] Krnl Code: 000000000069c596: ebaff0a00004 lmg %r10,%r15,160(%r15) >>>>>>> 000000000069c59c: c0f4ffff7a5e brcl 15,68ba58 >>>>>>> #000000000069c5a2: a7f40001 brc 15,69c5a4 >>>>>>> >000000000069c5a6: e340f0c00004 lg %r4,192(%r15) >>>>>>> 000000000069c5ac: ebaff0a00004 lmg %r10,%r15,160(%r15) >>>>>>> 000000000069c5b2: 07f4 bcr 15,%r4 >>>>>>> 000000000069c5b4: c0e5fffffeea brasl %r14,69c388 >>>>>>> 000000000069c5ba: a7f4fff6 brc 15,69c5a6 >>>>>>> [ 21.455067] Call Trace: >>>>>>> [ 21.455072] ([<00000001b691fd98>] 0x1b691fd98) >>>>>>> [ 21.455079] [<000000000069c692>] blk_mq_run_hw_queue+0xba/0x100 >>>>>>> [ 21.455083] [<000000000069c740>] blk_mq_run_hw_queues+0x68/0x88 >>>>>>> [ 21.455089] [<000000000069b956>] __blk_mq_complete_request+0x11e/0x1d8 >>>>>>> [ 21.455091] [<000000000069ba9c>] blk_mq_complete_request+0x8c/0xc8 >>>>>>> [ 21.455103] [<00000000008aa250>] dasd_block_tasklet+0x158/0x490 >>>>>>> [ 21.455110] [<000000000014c742>] tasklet_hi_action+0x92/0x120 >>>>>>> [ 21.455118] [<0000000000a7cfc0>] __do_softirq+0x120/0x348 >>>>>>> [ 21.455122] [<000000000014c212>] irq_exit+0xba/0xd0 >>>>>>> [ 21.455130] [<000000000010bf92>] do_IRQ+0x8a/0xb8 >>>>>>> [ 21.455133] [<0000000000a7c298>] io_int_handler+0x130/0x298 >>>>>>> [ 21.455136] Last Breaking-Event-Address: >>>>>>> [ 21.455138] [<000000000069c5a2>] __blk_mq_delay_run_hw_queue+0xba/0xd8 >>>>>>> [ 21.455140] ---[ end trace be43f99a5d1e553e ]--- >>>>>>> [ 21.510046] dasdconf.sh Warning: 0.0.241e is already online, not configuring >>>>>> >>>>>> Thinking about this issue further, I can't understand the root cause for >>>>>> this issue. 
>> >> FWIW, Limiting CONFIG_NR_CPUS to 64 seems to make the problem go away. > > I think the following patch is needed, and this way aligns to the mapping > created via managed IRQ at least. > > diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c > index 9f8cffc8a701..638ab5c11b3c 100644 > --- a/block/blk-mq-cpumap.c > +++ b/block/blk-mq-cpumap.c > @@ -14,13 +14,12 @@ > #include "blk.h" > #include "blk-mq.h" > > +/* > + * Given there isn't CPU hotplug handler in blk-mq, map all possible CPUs to > + * queues even it isn't present yet. > + */ > static int cpu_to_queue_index(unsigned int nr_queues, const int cpu) > { > - /* > - * Non present CPU will be mapped to queue index 0. > - */ > - if (!cpu_present(cpu)) > - return 0; > return cpu % nr_queues; > } > > Thanks, > Ming > With that I no longer see the WARN_ON but the other warning instead: [ 31.903096] audit: type=1130 audit(1522318064.439:41): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-tmpfiles-setup-dev comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success' [ 31.903100] audit: type=1131 audit(1522318064.439:42): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-tmpfiles-setup-dev comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success' [ 31.985756] systemd-journald[379]: Received SIGTERM from PID 1 (systemd). [ 32.000543] systemd: 18 output lines suppressed due to ratelimiting [ 32.209496] EXT4-fs (dasdc1): re-mounted. Opts: (null) [ 32.234808] systemd-journald[2490]: Received request to flush runtime journal from PID 1 [ 32.359832] tun: Universal TUN/TAP device driver, 1.6 [ 32.470841] run queue from wrong CPU 18, hctx active [ 32.470845] CPU: 18 PID: 2131 Comm: kworker/18:1H Not tainted 4.16.0-rc7+ #31 [ 32.470847] Hardware name: IBM 2964 NC9 704 (LPAR) [ 32.470856] Workqueue: kblockd blk_mq_run_work_fn [ 32.470857] Call Trace: [ 32.470862] ([<0000000000113b86>] show_stack+0x56/0x80) [ 32.470867] [<0000000000a5cd02>] dump_stack+0x82/0xb0 [ 32.470869] [<000000000069c40a>] __blk_mq_run_hw_queue+0x12a/0x130 [ 32.470873] [<0000000000163906>] process_one_work+0x1be/0x420 [ 32.470875] [<0000000000163bc0>] worker_thread+0x58/0x458 [ 32.470877] [<000000000016a9d0>] kthread+0x148/0x160 [ 32.470880] [<0000000000a7bea2>] kernel_thread_starter+0x6/0xc [ 32.470882] [<0000000000a7be9c>] kernel_thread_starter+0x0/0xc [ 32.470889] run queue from wrong CPU 18, hctx active [ 32.470891] CPU: 18 PID: 2131 Comm: kworker/18:1H Not tainted 4.16.0-rc7+ #31 [ 32.470892] Hardware name: IBM 2964 NC9 704 (LPAR) [ 32.470894] Workqueue: kblockd blk_mq_run_work_fn [ 32.470895] Call Trace: [ 32.470897] ([<0000000000113b86>] show_stack+0x56/0x80) [ 32.470898] [<0000000000a5cd02>] dump_stack+0x82/0xb0 [ 32.470900] [<000000000069c40a>] __blk_mq_run_hw_queue+0x12a/0x130 [ 32.470902] [<0000000000163906>] process_one_work+0x1be/0x420 [ 32.470903] [<0000000000163bc0>] worker_thread+0x58/0x458 [ 32.470905] [<000000000016a9d0>] kthread+0x148/0x160 [ 32.470906] [<0000000000a7bea2>] kernel_thread_starter+0x6/0xc [ 32.470908] [<0000000000a7be9c>] kernel_thread_starter+0x0/0xc [ 32.470910] run queue from wrong CPU 18, hctx active [ 32.470911] CPU: 18 PID: 2131 Comm: kworker/18:1H Not tainted 4.16.0-rc7+ #31 [ 32.470913] Hardware name: IBM 2964 NC9 704 (LPAR) [ 32.470914] Workqueue: kblockd blk_mq_run_work_fn [ 32.470916] Call Trace: [ 32.470918] ([<0000000000113b86>] show_stack+0x56/0x80) [ 32.470919] [<0000000000a5cd02>] dump_stack+0x82/0xb0 [ 32.470921] [<000000000069c40a>] 
__blk_mq_run_hw_queue+0x12a/0x130 [ 32.470922] [<0000000000163906>] process_one_work+0x1be/0x420 [ 32.470924] [<0000000000163bc0>] worker_thread+0x58/0x458 [ 32.470925] [<000000000016a9d0>] kthread+0x148/0x160 [ 32.470927] [<0000000000a7bea2>] kernel_thread_starter+0x6/0xc [ 32.470929] [<0000000000a7be9c>] kernel_thread_starter+0x0/0xc [ 32.470930] run queue from wrong CPU 18, hctx active [ 32.470932] CPU: 18 PID: 2131 Comm: kworker/18:1H Not tainted 4.16.0-rc7+ #31 [ 32.470933] Hardware name: IBM 2964 NC9 704 (LPAR) [ 32.470935] Workqueue: kblockd blk_mq_run_work_fn [ 32.470936] Call Trace: [ 32.470938] ([<0000000000113b86>] show_stack+0x56/0x80) [ 32.470939] [<0000000000a5cd02>] dump_stack+0x82/0xb0 [ 32.470941] [<000000000069c40a>] __blk_mq_run_hw_queue+0x12a/0x130 [ 32.470943] [<0000000000163906>] process_one_work+0x1be/0x420 [ 32.470944] [<0000000000163bc0>] worker_thread+0x58/0x458 [ 32.470946] [<000000000016a9d0>] kthread+0x148/0x160 [ 32.470947] [<0000000000a7bea2>] kernel_thread_starter+0x6/0xc [ 32.470949] [<0000000000a7be9c>] kernel_thread_starter+0x0/0xc [ 32.470950] run queue from wrong CPU 18, hctx active [ 32.470952] CPU: 18 PID: 2131 Comm: kworker/18:1H Not tainted 4.16.0-rc7+ #31 [ 32.470953] Hardware name: IBM 2964 NC9 704 (LPAR) [ 32.470955] Workqueue: kblockd blk_mq_run_work_fn [ 32.470956] Call Trace: [ 32.470958] ([<0000000000113b86>] show_stack+0x56/0x80) [ 32.470959] [<0000000000a5cd02>] dump_stack+0x82/0xb0 [ 32.470961] [<000000000069c40a>] __blk_mq_run_hw_queue+0x12a/0x130 [ 32.470963] [<0000000000163906>] process_one_work+0x1be/0x420 [ 32.470964] [<0000000000163bc0>] worker_thread+0x58/0x458 [ 32.470966] [<000000000016a9d0>] kthread+0x148/0x160 [ 32.470967] [<0000000000a7bea2>] kernel_thread_starter+0x6/0xc [ 32.470969] [<0000000000a7be9c>] kernel_thread_starter+0x0/0xc [ 32.470971] run queue from wrong CPU 18, hctx active [ 32.470972] CPU: 18 PID: 2131 Comm: kworker/18:1H Not tainted 4.16.0-rc7+ #31 [ 32.470973] Hardware name: IBM 2964 NC9 704 (LPAR) [ 32.470975] Workqueue: kblockd blk_mq_run_work_fn [ 32.470976] Call Trace: [ 32.470978] ([<0000000000113b86>] show_stack+0x56/0x80) [ 32.470979] [<0000000000a5cd02>] dump_stack+0x82/0xb0 [ 32.470981] [<000000000069c40a>] __blk_mq_run_hw_queue+0x12a/0x130 [ 32.470983] [<0000000000163906>] process_one_work+0x1be/0x420 [ 32.470985] [<0000000000163bc0>] worker_thread+0x58/0x458 [ 32.470986] [<000000000016a9d0>] kthread+0x148/0x160 [ 32.470988] [<0000000000a7bea2>] kernel_thread_starter+0x6/0xc [ 32.470989] [<0000000000a7be9c>] kernel_thread_starter+0x0/0xc [ 32.470991] run queue from wrong CPU 18, hctx active [ 32.470992] CPU: 18 PID: 2131 Comm: kworker/18:1H Not tainted 4.16.0-rc7+ #31 [ 32.470993] Hardware name: IBM 2964 NC9 704 (LPAR) [ 32.470995] Workqueue: kblockd blk_mq_run_work_fn ^ permalink raw reply [flat|nested] 40+ messages in thread
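For reference, the "run queue from wrong CPU" line comes from the sanity
check at the top of __blk_mq_run_hw_queue() -- the same lines Ming's debug
patch further down extends; only the comment here is added as explanation:

	/*
	 * hctx->cpumask holds the CPUs this hw queue is mapped to; the
	 * message fires when the queue is actually run on a CPU outside
	 * that mask while the preferred next_cpu is online.
	 */
	if (!cpumask_test_cpu(raw_smp_processor_id(), hctx->cpumask) &&
	    cpu_online(hctx->next_cpu)) {
		printk(KERN_WARNING "run queue from wrong CPU %d, hctx %s\n",
			raw_smp_processor_id(),
			cpumask_empty(hctx->cpumask) ? "inactive": "active");
		dump_stack();
	}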
* Re: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues() 2018-03-29 10:10 ` Christian Borntraeger @ 2018-03-29 10:48 ` Ming Lei 2018-03-29 10:49 ` Christian Borntraeger 0 siblings, 1 reply; 40+ messages in thread From: Ming Lei @ 2018-03-29 10:48 UTC (permalink / raw) To: Christian Borntraeger Cc: Jens Axboe, linux-block, Christoph Hellwig, Stefan Haberland, Christoph Hellwig On Thu, Mar 29, 2018 at 12:10:11PM +0200, Christian Borntraeger wrote: > > > On 03/29/2018 11:40 AM, Ming Lei wrote: > > On Thu, Mar 29, 2018 at 11:09:08AM +0200, Christian Borntraeger wrote: > >> > >> > >> On 03/29/2018 09:23 AM, Christian Borntraeger wrote: > >>> > >>> > >>> On 03/29/2018 04:00 AM, Ming Lei wrote: > >>>> On Wed, Mar 28, 2018 at 05:36:53PM +0200, Christian Borntraeger wrote: > >>>>> > >>>>> > >>>>> On 03/28/2018 05:26 PM, Ming Lei wrote: > >>>>>> Hi Christian, > >>>>>> > >>>>>> On Wed, Mar 28, 2018 at 09:45:10AM +0200, Christian Borntraeger wrote: > >>>>>>> FWIW, this patch does not fix the issue for me: > >>>>>>> > >>>>>>> ostname=? addr=? terminal=? res=success' > >>>>>>> [ 21.454961] WARNING: CPU: 3 PID: 1882 at block/blk-mq.c:1410 __blk_mq_delay_run_hw_queue+0xbe/0xd8 > >>>>>>> [ 21.454968] Modules linked in: scsi_dh_rdac scsi_dh_emc scsi_dh_alua dm_mirror dm_region_hash dm_log dm_multipath dm_mod autofs4 > >>>>>>> [ 21.454984] CPU: 3 PID: 1882 Comm: dasdconf.sh Not tainted 4.16.0-rc7+ #26 > >>>>>>> [ 21.454987] Hardware name: IBM 2964 NC9 704 (LPAR) > >>>>>>> [ 21.454990] Krnl PSW : 00000000c0131ea3 000000003ea2f7bf (__blk_mq_delay_run_hw_queue+0xbe/0xd8) > >>>>>>> [ 21.454996] R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3 > >>>>>>> [ 21.455005] Krnl GPRS: 0000013abb69a000 0000013a00000000 0000013ac6c0dc00 0000000000000001 > >>>>>>> [ 21.455008] 0000000000000000 0000013abb69a710 0000013a00000000 00000001b691fd98 > >>>>>>> [ 21.455011] 00000001b691fd98 0000013ace4775c8 0000000000000001 0000000000000000 > >>>>>>> [ 21.455014] 0000013ac6c0dc00 0000000000b47238 00000001b691fc08 00000001b691fbd0 > >>>>>>> [ 21.455032] Krnl Code: 000000000069c596: ebaff0a00004 lmg %r10,%r15,160(%r15) > >>>>>>> 000000000069c59c: c0f4ffff7a5e brcl 15,68ba58 > >>>>>>> #000000000069c5a2: a7f40001 brc 15,69c5a4 > >>>>>>> >000000000069c5a6: e340f0c00004 lg %r4,192(%r15) > >>>>>>> 000000000069c5ac: ebaff0a00004 lmg %r10,%r15,160(%r15) > >>>>>>> 000000000069c5b2: 07f4 bcr 15,%r4 > >>>>>>> 000000000069c5b4: c0e5fffffeea brasl %r14,69c388 > >>>>>>> 000000000069c5ba: a7f4fff6 brc 15,69c5a6 > >>>>>>> [ 21.455067] Call Trace: > >>>>>>> [ 21.455072] ([<00000001b691fd98>] 0x1b691fd98) > >>>>>>> [ 21.455079] [<000000000069c692>] blk_mq_run_hw_queue+0xba/0x100 > >>>>>>> [ 21.455083] [<000000000069c740>] blk_mq_run_hw_queues+0x68/0x88 > >>>>>>> [ 21.455089] [<000000000069b956>] __blk_mq_complete_request+0x11e/0x1d8 > >>>>>>> [ 21.455091] [<000000000069ba9c>] blk_mq_complete_request+0x8c/0xc8 > >>>>>>> [ 21.455103] [<00000000008aa250>] dasd_block_tasklet+0x158/0x490 > >>>>>>> [ 21.455110] [<000000000014c742>] tasklet_hi_action+0x92/0x120 > >>>>>>> [ 21.455118] [<0000000000a7cfc0>] __do_softirq+0x120/0x348 > >>>>>>> [ 21.455122] [<000000000014c212>] irq_exit+0xba/0xd0 > >>>>>>> [ 21.455130] [<000000000010bf92>] do_IRQ+0x8a/0xb8 > >>>>>>> [ 21.455133] [<0000000000a7c298>] io_int_handler+0x130/0x298 > >>>>>>> [ 21.455136] Last Breaking-Event-Address: > >>>>>>> [ 21.455138] [<000000000069c5a2>] __blk_mq_delay_run_hw_queue+0xba/0xd8 > >>>>>>> [ 21.455140] ---[ end trace be43f99a5d1e553e ]--- > >>>>>>> [ 
21.510046] dasdconf.sh Warning: 0.0.241e is already online, not configuring > >>>>>> > >>>>>> Thinking about this issue further, I can't understand the root cause for > >>>>>> this issue. > >> > >> FWIW, Limiting CONFIG_NR_CPUS to 64 seems to make the problem go away. > > > > I think the following patch is needed, and this way aligns to the mapping > > created via managed IRQ at least. > > > > diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c > > index 9f8cffc8a701..638ab5c11b3c 100644 > > --- a/block/blk-mq-cpumap.c > > +++ b/block/blk-mq-cpumap.c > > @@ -14,13 +14,12 @@ > > #include "blk.h" > > #include "blk-mq.h" > > > > +/* > > + * Given there isn't CPU hotplug handler in blk-mq, map all possible CPUs to > > + * queues even it isn't present yet. > > + */ > > static int cpu_to_queue_index(unsigned int nr_queues, const int cpu) > > { > > - /* > > - * Non present CPU will be mapped to queue index 0. > > - */ > > - if (!cpu_present(cpu)) > > - return 0; > > return cpu % nr_queues; > > } > > > > Thanks, > > Ming > > > > With that I no longer see the WARN_ON but the other warning instead: > > [ 31.903096] audit: type=1130 audit(1522318064.439:41): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-tmpfiles-setup-dev comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success' > [ 31.903100] audit: type=1131 audit(1522318064.439:42): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-tmpfiles-setup-dev comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success' > [ 31.985756] systemd-journald[379]: Received SIGTERM from PID 1 (systemd). > [ 32.000543] systemd: 18 output lines suppressed due to ratelimiting > [ 32.209496] EXT4-fs (dasdc1): re-mounted. Opts: (null) > [ 32.234808] systemd-journald[2490]: Received request to flush runtime journal from PID 1 > [ 32.359832] tun: Universal TUN/TAP device driver, 1.6 > [ 32.470841] run queue from wrong CPU 18, hctx active But your 'lscpu' log showed that you only have 16 CPUs online(0~15) and you also said CPU hotplug isn't involved in your test, so I am just wondering where the CPU 18 is from? Thanks, Ming ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues() 2018-03-29 10:48 ` Ming Lei @ 2018-03-29 10:49 ` Christian Borntraeger 2018-03-29 11:43 ` Ming Lei 0 siblings, 1 reply; 40+ messages in thread From: Christian Borntraeger @ 2018-03-29 10:49 UTC (permalink / raw) To: Ming Lei Cc: Jens Axboe, linux-block, Christoph Hellwig, Stefan Haberland, Christoph Hellwig On 03/29/2018 12:48 PM, Ming Lei wrote: > On Thu, Mar 29, 2018 at 12:10:11PM +0200, Christian Borntraeger wrote: >> >> >> On 03/29/2018 11:40 AM, Ming Lei wrote: >>> On Thu, Mar 29, 2018 at 11:09:08AM +0200, Christian Borntraeger wrote: >>>> >>>> >>>> On 03/29/2018 09:23 AM, Christian Borntraeger wrote: >>>>> >>>>> >>>>> On 03/29/2018 04:00 AM, Ming Lei wrote: >>>>>> On Wed, Mar 28, 2018 at 05:36:53PM +0200, Christian Borntraeger wrote: >>>>>>> >>>>>>> >>>>>>> On 03/28/2018 05:26 PM, Ming Lei wrote: >>>>>>>> Hi Christian, >>>>>>>> >>>>>>>> On Wed, Mar 28, 2018 at 09:45:10AM +0200, Christian Borntraeger wrote: >>>>>>>>> FWIW, this patch does not fix the issue for me: >>>>>>>>> >>>>>>>>> ostname=? addr=? terminal=? res=success' >>>>>>>>> [ 21.454961] WARNING: CPU: 3 PID: 1882 at block/blk-mq.c:1410 __blk_mq_delay_run_hw_queue+0xbe/0xd8 >>>>>>>>> [ 21.454968] Modules linked in: scsi_dh_rdac scsi_dh_emc scsi_dh_alua dm_mirror dm_region_hash dm_log dm_multipath dm_mod autofs4 >>>>>>>>> [ 21.454984] CPU: 3 PID: 1882 Comm: dasdconf.sh Not tainted 4.16.0-rc7+ #26 >>>>>>>>> [ 21.454987] Hardware name: IBM 2964 NC9 704 (LPAR) >>>>>>>>> [ 21.454990] Krnl PSW : 00000000c0131ea3 000000003ea2f7bf (__blk_mq_delay_run_hw_queue+0xbe/0xd8) >>>>>>>>> [ 21.454996] R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3 >>>>>>>>> [ 21.455005] Krnl GPRS: 0000013abb69a000 0000013a00000000 0000013ac6c0dc00 0000000000000001 >>>>>>>>> [ 21.455008] 0000000000000000 0000013abb69a710 0000013a00000000 00000001b691fd98 >>>>>>>>> [ 21.455011] 00000001b691fd98 0000013ace4775c8 0000000000000001 0000000000000000 >>>>>>>>> [ 21.455014] 0000013ac6c0dc00 0000000000b47238 00000001b691fc08 00000001b691fbd0 >>>>>>>>> [ 21.455032] Krnl Code: 000000000069c596: ebaff0a00004 lmg %r10,%r15,160(%r15) >>>>>>>>> 000000000069c59c: c0f4ffff7a5e brcl 15,68ba58 >>>>>>>>> #000000000069c5a2: a7f40001 brc 15,69c5a4 >>>>>>>>> >000000000069c5a6: e340f0c00004 lg %r4,192(%r15) >>>>>>>>> 000000000069c5ac: ebaff0a00004 lmg %r10,%r15,160(%r15) >>>>>>>>> 000000000069c5b2: 07f4 bcr 15,%r4 >>>>>>>>> 000000000069c5b4: c0e5fffffeea brasl %r14,69c388 >>>>>>>>> 000000000069c5ba: a7f4fff6 brc 15,69c5a6 >>>>>>>>> [ 21.455067] Call Trace: >>>>>>>>> [ 21.455072] ([<00000001b691fd98>] 0x1b691fd98) >>>>>>>>> [ 21.455079] [<000000000069c692>] blk_mq_run_hw_queue+0xba/0x100 >>>>>>>>> [ 21.455083] [<000000000069c740>] blk_mq_run_hw_queues+0x68/0x88 >>>>>>>>> [ 21.455089] [<000000000069b956>] __blk_mq_complete_request+0x11e/0x1d8 >>>>>>>>> [ 21.455091] [<000000000069ba9c>] blk_mq_complete_request+0x8c/0xc8 >>>>>>>>> [ 21.455103] [<00000000008aa250>] dasd_block_tasklet+0x158/0x490 >>>>>>>>> [ 21.455110] [<000000000014c742>] tasklet_hi_action+0x92/0x120 >>>>>>>>> [ 21.455118] [<0000000000a7cfc0>] __do_softirq+0x120/0x348 >>>>>>>>> [ 21.455122] [<000000000014c212>] irq_exit+0xba/0xd0 >>>>>>>>> [ 21.455130] [<000000000010bf92>] do_IRQ+0x8a/0xb8 >>>>>>>>> [ 21.455133] [<0000000000a7c298>] io_int_handler+0x130/0x298 >>>>>>>>> [ 21.455136] Last Breaking-Event-Address: >>>>>>>>> [ 21.455138] [<000000000069c5a2>] __blk_mq_delay_run_hw_queue+0xba/0xd8 >>>>>>>>> [ 21.455140] ---[ end trace 
be43f99a5d1e553e ]--- >>>>>>>>> [ 21.510046] dasdconf.sh Warning: 0.0.241e is already online, not configuring >>>>>>>> >>>>>>>> Thinking about this issue further, I can't understand the root cause for >>>>>>>> this issue. >>>> >>>> FWIW, Limiting CONFIG_NR_CPUS to 64 seems to make the problem go away. >>> >>> I think the following patch is needed, and this way aligns to the mapping >>> created via managed IRQ at least. >>> >>> diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c >>> index 9f8cffc8a701..638ab5c11b3c 100644 >>> --- a/block/blk-mq-cpumap.c >>> +++ b/block/blk-mq-cpumap.c >>> @@ -14,13 +14,12 @@ >>> #include "blk.h" >>> #include "blk-mq.h" >>> >>> +/* >>> + * Given there isn't CPU hotplug handler in blk-mq, map all possible CPUs to >>> + * queues even it isn't present yet. >>> + */ >>> static int cpu_to_queue_index(unsigned int nr_queues, const int cpu) >>> { >>> - /* >>> - * Non present CPU will be mapped to queue index 0. >>> - */ >>> - if (!cpu_present(cpu)) >>> - return 0; >>> return cpu % nr_queues; >>> } >>> >>> Thanks, >>> Ming >>> >> >> With that I no longer see the WARN_ON but the other warning instead: >> >> [ 31.903096] audit: type=1130 audit(1522318064.439:41): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-tmpfiles-setup-dev comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success' >> [ 31.903100] audit: type=1131 audit(1522318064.439:42): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-tmpfiles-setup-dev comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success' >> [ 31.985756] systemd-journald[379]: Received SIGTERM from PID 1 (systemd). >> [ 32.000543] systemd: 18 output lines suppressed due to ratelimiting >> [ 32.209496] EXT4-fs (dasdc1): re-mounted. Opts: (null) >> [ 32.234808] systemd-journald[2490]: Received request to flush runtime journal from PID 1 >> [ 32.359832] tun: Universal TUN/TAP device driver, 1.6 >> [ 32.470841] run queue from wrong CPU 18, hctx active > > But your 'lscpu' log showed that you only have 16 CPUs online(0~15) and > you also said CPU hotplug isn't involved in your test, so I am just > wondering where the CPU 18 is from? I have 2 test systems. One with 44CPU the other with 16. 
The one with 16 now has the same flood of messages: 4.454510] dasdc:VOL1/ 0X3F77: dasdc1 [ 4.454592] dasdd:VOL1/ 0X3F74: dasdd1 [ 4.593711] run queue from wrong CPU 0, hctx active [ 4.593715] CPU: 0 PID: 4 Comm: kworker/0:0H Not tainted 4.16.0-rc7+ #1 [ 4.593717] Hardware name: IBM 2964 NC9 704 (LPAR) [ 4.593724] Workqueue: kblockd blk_mq_run_work_fn [ 4.593726] Call Trace: [ 4.593731] ([<0000000000113b86>] show_stack+0x56/0x80) [ 4.593735] [<0000000000a5acd2>] dump_stack+0x82/0xb0 [ 4.593737] [<000000000069a3ea>] __blk_mq_run_hw_queue+0x12a/0x130 [ 4.593741] [<0000000000162ede>] process_one_work+0x1be/0x420 [ 4.593742] [<0000000000163198>] worker_thread+0x58/0x458 [ 4.593745] [<0000000000169fa8>] kthread+0x148/0x160 [ 4.593748] [<0000000000a79e72>] kernel_thread_starter+0x6/0xc [ 4.593749] [<0000000000a79e6c>] kernel_thread_starter+0x0/0xc [ 4.611606] dasdconf.sh Warning: 0.0.3f75 is already online, not configuring [ 4.626454] run queue from wrong CPU 10, hctx active [ 4.626456] CPU: 10 PID: 62 Comm: kworker/10:0H Not tainted 4.16.0-rc7+ #1 [ 4.626458] Hardware name: IBM 2964 NC9 704 (LPAR) [ 4.626462] Workqueue: kblockd blk_mq_run_work_fn [ 4.626463] Call Trace: [ 4.626466] ([<0000000000113b86>] show_stack+0x56/0x80) [ 4.626468] [<0000000000a5acd2>] dump_stack+0x82/0xb0 [ 4.626469] [<000000000069a3ea>] __blk_mq_run_hw_queue+0x12a/0x130 [ 4.626471] [<0000000000162ede>] process_one_work+0x1be/0x420 [ 4.626473] [<0000000000163198>] worker_thread+0x58/0x458 [ 4.626474] [<0000000000169fa8>] kthread+0x148/0x160 [ 4.626476] [<0000000000a79e72>] kernel_thread_starter+0x6/0xc [ 4.626477] [<0000000000a79e6c>] kernel_thread_starter+0x0/0xc [ 4.699514] dasdconf.sh Warning: 0.0.3f76 is already online, not configuring [ 4.709725] random: crng init done [ 4.711200] dasdconf.sh Warning: 0.0.3f74 is already online, not configuring [ 4.718452] dasdconf.sh Warning: 0.0.3f77 is already online, not configuring [ 4.726455] EXT4-fs (dasdd1): mounted filesystem with ordered data mode. Opts: (null) [ 5.075280] systemd-journald[208]: Received SIGTERM from PID 1 (systemd). 
[ 5.114536] run queue from wrong CPU 8, hctx active [ 5.114539] CPU: 8 PID: 1542 Comm: kworker/8:1H Not tainted 4.16.0-rc7+ #1 [ 5.114541] Hardware name: IBM 2964 NC9 704 (LPAR) [ 5.114544] Workqueue: kblockd blk_mq_run_work_fn [ 5.114545] Call Trace: [ 5.114548] ([<0000000000113b86>] show_stack+0x56/0x80) [ 5.114550] [<0000000000a5acd2>] dump_stack+0x82/0xb0 [ 5.114551] [<000000000069a3ea>] __blk_mq_run_hw_queue+0x12a/0x130 [ 5.114553] [<0000000000162ede>] process_one_work+0x1be/0x420 [ 5.114555] [<0000000000163198>] worker_thread+0x58/0x458 [ 5.114556] [<0000000000169fa8>] kthread+0x148/0x160 [ 5.114558] [<0000000000a79e72>] kernel_thread_starter+0x6/0xc [ 5.114559] [<0000000000a79e6c>] kernel_thread_starter+0x0/0xc [ 5.137222] systemd: 16 output lines suppressed due to ratelimiting [ 5.663932] run queue from wrong CPU 7, hctx active [ 5.663959] CPU: 7 PID: 1574 Comm: kworker/7:1H Not tainted 4.16.0-rc7+ #1 [ 5.663972] Hardware name: IBM 2964 NC9 704 (LPAR) [ 5.663999] Workqueue: kblockd blk_mq_run_work_fn [ 5.664012] Call Trace: [ 5.664034] ([<0000000000113b86>] show_stack+0x56/0x80) [ 5.664053] [<0000000000a5acd2>] dump_stack+0x82/0xb0 [ 5.664064] [<000000000069a3ea>] __blk_mq_run_hw_queue+0x12a/0x130 [ 5.664082] [<0000000000162ede>] process_one_work+0x1be/0x420 [ 5.664097] [<0000000000163198>] worker_thread+0x58/0x458 [ 5.664110] [<0000000000169fa8>] kthread+0x148/0x160 [ 5.664123] [<0000000000a79e72>] kernel_thread_starter+0x6/0xc [ 5.664136] [<0000000000a79e6c>] kernel_thread_starter+0x0/0xc [ 5.796783] run queue from wrong CPU 7, hctx active [ 5.796811] CPU: 7 PID: 1574 Comm: kworker/7:1H Not tainted 4.16.0-rc7+ #1 [ 5.796828] Hardware name: IBM 2964 NC9 704 (LPAR) [ 5.796850] Workqueue: kblockd blk_mq_run_work_fn [ 5.796866] Call Trace: [ 5.796874] ([<0000000000113b86>] show_stack+0x56/0x80) [ 5.796878] [<0000000000a5acd2>] dump_stack+0x82/0xb0 [ 5.796888] [<000000000069a3ea>] __blk_mq_run_hw_queue+0x12a/0x130 [ 5.796902] [<0000000000162ede>] process_one_work+0x1be/0x420 [ 5.796917] [<0000000000163198>] worker_thread+0x58/0x458 [ 5.796931] [<0000000000169fa8>] kthread+0x148/0x160 [ 5.796944] [<0000000000a79e72>] kernel_thread_starter+0x6/0xc [ 5.796957] [<0000000000a79e6c>] kernel_thread_starter+0x0/0xc [ 5.824996] run queue from wrong CPU 7, hctx active [ 5.825017] CPU: 7 PID: 1574 Comm: kworker/7:1H Not tainted 4.16.0-rc7+ #1 [ 5.825028] Hardware name: IBM 2964 NC9 704 (LPAR) [ 5.825046] Workqueue: kblockd blk_mq_run_work_fn [ 5.825061] Call Trace: [ 5.825076] ([<0000000000113b86>] show_stack+0x56/0x80) [ 5.825089] [<0000000000a5acd2>] dump_stack+0x82/0xb0 [ 5.825105] [<000000000069a3ea>] __blk_mq_run_hw_queue+0x12a/0x130 [ 5.825119] [<0000000000162ede>] process_one_work+0x1be/0x420 [ 5.825133] [<0000000000163198>] worker_thread+0x58/0x458 [ 5.825147] [<0000000000169fa8>] kthread+0x148/0x160 [ 5.825160] [<0000000000a79e72>] kernel_thread_starter+0x6/0xc [ 5.825176] [<0000000000a79e6c>] kernel_thread_starter+0x0/0xc [ 5.900186] run queue from wrong CPU 7, hctx active [ 5.900211] CPU: 7 PID: 1574 Comm: kworker/7:1H Not tainted 4.16.0-rc7+ #1 [ 5.900246] Hardware name: IBM 2964 NC9 704 (LPAR) [ 5.900269] Workqueue: kblockd blk_mq_run_work_fn [ 5.900280] Call Trace: [ 5.900298] ([<0000000000113b86>] show_stack+0x56/0x80) [ 5.900314] [<0000000000a5acd2>] dump_stack+0x82/0xb0 [ 5.900318] [<000000000069a3ea>] __blk_mq_run_hw_queue+0x12a/0x130 [ 5.900322] [<0000000000162ede>] process_one_work+0x1be/0x420 [ 5.900338] [<0000000000163198>] worker_thread+0x58/0x458 [ 5.900351] 
[<0000000000169fa8>] kthread+0x148/0x160 [ 5.900365] [<0000000000a79e72>] kernel_thread_starter+0x6/0xc [ 5.900379] [<0000000000a79e6c>] kernel_thread_starter+0x0/0xc [ 6.221462] EXT4-fs (dasdd1): re-mounted. Opts: (null) [ 6.249875] run queue from wrong CPU 4, hctx active [ 6.249883] CPU: 4 PID: 1515 Comm: kworker/4:1H Not tainted 4.16.0-rc7+ #1 [ 6.249886] Hardware name: IBM 2964 NC9 704 (LPAR) [ 6.249892] Workqueue: kblockd blk_mq_run_work_fn [ 6.249895] Call Trace: [ 6.249899] ([<0000000000113b86>] sho ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues() 2018-03-29 10:49 ` Christian Borntraeger @ 2018-03-29 11:43 ` Ming Lei 2018-03-29 11:49 ` Christian Borntraeger 0 siblings, 1 reply; 40+ messages in thread From: Ming Lei @ 2018-03-29 11:43 UTC (permalink / raw) To: Christian Borntraeger Cc: Jens Axboe, linux-block, Christoph Hellwig, Stefan Haberland, Christoph Hellwig On Thu, Mar 29, 2018 at 12:49:55PM +0200, Christian Borntraeger wrote: > > > On 03/29/2018 12:48 PM, Ming Lei wrote: > > On Thu, Mar 29, 2018 at 12:10:11PM +0200, Christian Borntraeger wrote: > >> > >> > >> On 03/29/2018 11:40 AM, Ming Lei wrote: > >>> On Thu, Mar 29, 2018 at 11:09:08AM +0200, Christian Borntraeger wrote: > >>>> > >>>> > >>>> On 03/29/2018 09:23 AM, Christian Borntraeger wrote: > >>>>> > >>>>> > >>>>> On 03/29/2018 04:00 AM, Ming Lei wrote: > >>>>>> On Wed, Mar 28, 2018 at 05:36:53PM +0200, Christian Borntraeger wrote: > >>>>>>> > >>>>>>> > >>>>>>> On 03/28/2018 05:26 PM, Ming Lei wrote: > >>>>>>>> Hi Christian, > >>>>>>>> > >>>>>>>> On Wed, Mar 28, 2018 at 09:45:10AM +0200, Christian Borntraeger wrote: > >>>>>>>>> FWIW, this patch does not fix the issue for me: > >>>>>>>>> > >>>>>>>>> ostname=? addr=? terminal=? res=success' > >>>>>>>>> [ 21.454961] WARNING: CPU: 3 PID: 1882 at block/blk-mq.c:1410 __blk_mq_delay_run_hw_queue+0xbe/0xd8 > >>>>>>>>> [ 21.454968] Modules linked in: scsi_dh_rdac scsi_dh_emc scsi_dh_alua dm_mirror dm_region_hash dm_log dm_multipath dm_mod autofs4 > >>>>>>>>> [ 21.454984] CPU: 3 PID: 1882 Comm: dasdconf.sh Not tainted 4.16.0-rc7+ #26 > >>>>>>>>> [ 21.454987] Hardware name: IBM 2964 NC9 704 (LPAR) > >>>>>>>>> [ 21.454990] Krnl PSW : 00000000c0131ea3 000000003ea2f7bf (__blk_mq_delay_run_hw_queue+0xbe/0xd8) > >>>>>>>>> [ 21.454996] R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3 > >>>>>>>>> [ 21.455005] Krnl GPRS: 0000013abb69a000 0000013a00000000 0000013ac6c0dc00 0000000000000001 > >>>>>>>>> [ 21.455008] 0000000000000000 0000013abb69a710 0000013a00000000 00000001b691fd98 > >>>>>>>>> [ 21.455011] 00000001b691fd98 0000013ace4775c8 0000000000000001 0000000000000000 > >>>>>>>>> [ 21.455014] 0000013ac6c0dc00 0000000000b47238 00000001b691fc08 00000001b691fbd0 > >>>>>>>>> [ 21.455032] Krnl Code: 000000000069c596: ebaff0a00004 lmg %r10,%r15,160(%r15) > >>>>>>>>> 000000000069c59c: c0f4ffff7a5e brcl 15,68ba58 > >>>>>>>>> #000000000069c5a2: a7f40001 brc 15,69c5a4 > >>>>>>>>> >000000000069c5a6: e340f0c00004 lg %r4,192(%r15) > >>>>>>>>> 000000000069c5ac: ebaff0a00004 lmg %r10,%r15,160(%r15) > >>>>>>>>> 000000000069c5b2: 07f4 bcr 15,%r4 > >>>>>>>>> 000000000069c5b4: c0e5fffffeea brasl %r14,69c388 > >>>>>>>>> 000000000069c5ba: a7f4fff6 brc 15,69c5a6 > >>>>>>>>> [ 21.455067] Call Trace: > >>>>>>>>> [ 21.455072] ([<00000001b691fd98>] 0x1b691fd98) > >>>>>>>>> [ 21.455079] [<000000000069c692>] blk_mq_run_hw_queue+0xba/0x100 > >>>>>>>>> [ 21.455083] [<000000000069c740>] blk_mq_run_hw_queues+0x68/0x88 > >>>>>>>>> [ 21.455089] [<000000000069b956>] __blk_mq_complete_request+0x11e/0x1d8 > >>>>>>>>> [ 21.455091] [<000000000069ba9c>] blk_mq_complete_request+0x8c/0xc8 > >>>>>>>>> [ 21.455103] [<00000000008aa250>] dasd_block_tasklet+0x158/0x490 > >>>>>>>>> [ 21.455110] [<000000000014c742>] tasklet_hi_action+0x92/0x120 > >>>>>>>>> [ 21.455118] [<0000000000a7cfc0>] __do_softirq+0x120/0x348 > >>>>>>>>> [ 21.455122] [<000000000014c212>] irq_exit+0xba/0xd0 > >>>>>>>>> [ 21.455130] [<000000000010bf92>] do_IRQ+0x8a/0xb8 > >>>>>>>>> [ 21.455133] [<0000000000a7c298>] 
io_int_handler+0x130/0x298 > >>>>>>>>> [ 21.455136] Last Breaking-Event-Address: > >>>>>>>>> [ 21.455138] [<000000000069c5a2>] __blk_mq_delay_run_hw_queue+0xba/0xd8 > >>>>>>>>> [ 21.455140] ---[ end trace be43f99a5d1e553e ]--- > >>>>>>>>> [ 21.510046] dasdconf.sh Warning: 0.0.241e is already online, not configuring > >>>>>>>> > >>>>>>>> Thinking about this issue further, I can't understand the root cause for > >>>>>>>> this issue. > >>>> > >>>> FWIW, Limiting CONFIG_NR_CPUS to 64 seems to make the problem go away. > >>> > >>> I think the following patch is needed, and this way aligns to the mapping > >>> created via managed IRQ at least. > >>> > >>> diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c > >>> index 9f8cffc8a701..638ab5c11b3c 100644 > >>> --- a/block/blk-mq-cpumap.c > >>> +++ b/block/blk-mq-cpumap.c > >>> @@ -14,13 +14,12 @@ > >>> #include "blk.h" > >>> #include "blk-mq.h" > >>> > >>> +/* > >>> + * Given there isn't CPU hotplug handler in blk-mq, map all possible CPUs to > >>> + * queues even it isn't present yet. > >>> + */ > >>> static int cpu_to_queue_index(unsigned int nr_queues, const int cpu) > >>> { > >>> - /* > >>> - * Non present CPU will be mapped to queue index 0. > >>> - */ > >>> - if (!cpu_present(cpu)) > >>> - return 0; > >>> return cpu % nr_queues; > >>> } > >>> > >>> Thanks, > >>> Ming > >>> > >> > >> With that I no longer see the WARN_ON but the other warning instead: > >> > >> [ 31.903096] audit: type=1130 audit(1522318064.439:41): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-tmpfiles-setup-dev comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success' > >> [ 31.903100] audit: type=1131 audit(1522318064.439:42): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-tmpfiles-setup-dev comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success' > >> [ 31.985756] systemd-journald[379]: Received SIGTERM from PID 1 (systemd). > >> [ 32.000543] systemd: 18 output lines suppressed due to ratelimiting > >> [ 32.209496] EXT4-fs (dasdc1): re-mounted. Opts: (null) > >> [ 32.234808] systemd-journald[2490]: Received request to flush runtime journal from PID 1 > >> [ 32.359832] tun: Universal TUN/TAP device driver, 1.6 > >> [ 32.470841] run queue from wrong CPU 18, hctx active > > > > But your 'lscpu' log showed that you only have 16 CPUs online(0~15) and > > you also said CPU hotplug isn't involved in your test, so I am just > > wondering where the CPU 18 is from? > > > I have 2 test systems. One with 44CPU the other with 16. > The one with 16 now has the same flood of messages: > > > 4.454510] dasdc:VOL1/ 0X3F77: dasdc1 > [ 4.454592] dasdd:VOL1/ 0X3F74: dasdd1 > [ 4.593711] run queue from wrong CPU 0, hctx active Still can't reproduce your issue, could you please collect debugfs log again on your 16-core system after applying the patch on blk_mq_cpumap.c? Thanks, Ming ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues() 2018-03-29 11:43 ` Ming Lei @ 2018-03-29 11:49 ` Christian Borntraeger 2018-03-30 2:53 ` Ming Lei 0 siblings, 1 reply; 40+ messages in thread From: Christian Borntraeger @ 2018-03-29 11:49 UTC (permalink / raw) To: Ming Lei Cc: Jens Axboe, linux-block, Christoph Hellwig, Stefan Haberland, Christoph Hellwig [-- Attachment #1: Type: text/plain, Size: 6559 bytes --] On 03/29/2018 01:43 PM, Ming Lei wrote: > On Thu, Mar 29, 2018 at 12:49:55PM +0200, Christian Borntraeger wrote: >> >> >> On 03/29/2018 12:48 PM, Ming Lei wrote: >>> On Thu, Mar 29, 2018 at 12:10:11PM +0200, Christian Borntraeger wrote: >>>> >>>> >>>> On 03/29/2018 11:40 AM, Ming Lei wrote: >>>>> On Thu, Mar 29, 2018 at 11:09:08AM +0200, Christian Borntraeger wrote: >>>>>> >>>>>> >>>>>> On 03/29/2018 09:23 AM, Christian Borntraeger wrote: >>>>>>> >>>>>>> >>>>>>> On 03/29/2018 04:00 AM, Ming Lei wrote: >>>>>>>> On Wed, Mar 28, 2018 at 05:36:53PM +0200, Christian Borntraeger wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>> On 03/28/2018 05:26 PM, Ming Lei wrote: >>>>>>>>>> Hi Christian, >>>>>>>>>> >>>>>>>>>> On Wed, Mar 28, 2018 at 09:45:10AM +0200, Christian Borntraeger wrote: >>>>>>>>>>> FWIW, this patch does not fix the issue for me: >>>>>>>>>>> >>>>>>>>>>> ostname=? addr=? terminal=? res=success' >>>>>>>>>>> [ 21.454961] WARNING: CPU: 3 PID: 1882 at block/blk-mq.c:1410 __blk_mq_delay_run_hw_queue+0xbe/0xd8 >>>>>>>>>>> [ 21.454968] Modules linked in: scsi_dh_rdac scsi_dh_emc scsi_dh_alua dm_mirror dm_region_hash dm_log dm_multipath dm_mod autofs4 >>>>>>>>>>> [ 21.454984] CPU: 3 PID: 1882 Comm: dasdconf.sh Not tainted 4.16.0-rc7+ #26 >>>>>>>>>>> [ 21.454987] Hardware name: IBM 2964 NC9 704 (LPAR) >>>>>>>>>>> [ 21.454990] Krnl PSW : 00000000c0131ea3 000000003ea2f7bf (__blk_mq_delay_run_hw_queue+0xbe/0xd8) >>>>>>>>>>> [ 21.454996] R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3 >>>>>>>>>>> [ 21.455005] Krnl GPRS: 0000013abb69a000 0000013a00000000 0000013ac6c0dc00 0000000000000001 >>>>>>>>>>> [ 21.455008] 0000000000000000 0000013abb69a710 0000013a00000000 00000001b691fd98 >>>>>>>>>>> [ 21.455011] 00000001b691fd98 0000013ace4775c8 0000000000000001 0000000000000000 >>>>>>>>>>> [ 21.455014] 0000013ac6c0dc00 0000000000b47238 00000001b691fc08 00000001b691fbd0 >>>>>>>>>>> [ 21.455032] Krnl Code: 000000000069c596: ebaff0a00004 lmg %r10,%r15,160(%r15) >>>>>>>>>>> 000000000069c59c: c0f4ffff7a5e brcl 15,68ba58 >>>>>>>>>>> #000000000069c5a2: a7f40001 brc 15,69c5a4 >>>>>>>>>>> >000000000069c5a6: e340f0c00004 lg %r4,192(%r15) >>>>>>>>>>> 000000000069c5ac: ebaff0a00004 lmg %r10,%r15,160(%r15) >>>>>>>>>>> 000000000069c5b2: 07f4 bcr 15,%r4 >>>>>>>>>>> 000000000069c5b4: c0e5fffffeea brasl %r14,69c388 >>>>>>>>>>> 000000000069c5ba: a7f4fff6 brc 15,69c5a6 >>>>>>>>>>> [ 21.455067] Call Trace: >>>>>>>>>>> [ 21.455072] ([<00000001b691fd98>] 0x1b691fd98) >>>>>>>>>>> [ 21.455079] [<000000000069c692>] blk_mq_run_hw_queue+0xba/0x100 >>>>>>>>>>> [ 21.455083] [<000000000069c740>] blk_mq_run_hw_queues+0x68/0x88 >>>>>>>>>>> [ 21.455089] [<000000000069b956>] __blk_mq_complete_request+0x11e/0x1d8 >>>>>>>>>>> [ 21.455091] [<000000000069ba9c>] blk_mq_complete_request+0x8c/0xc8 >>>>>>>>>>> [ 21.455103] [<00000000008aa250>] dasd_block_tasklet+0x158/0x490 >>>>>>>>>>> [ 21.455110] [<000000000014c742>] tasklet_hi_action+0x92/0x120 >>>>>>>>>>> [ 21.455118] [<0000000000a7cfc0>] __do_softirq+0x120/0x348 >>>>>>>>>>> [ 21.455122] [<000000000014c212>] irq_exit+0xba/0xd0 >>>>>>>>>>> [ 21.455130] 
[<000000000010bf92>] do_IRQ+0x8a/0xb8 >>>>>>>>>>> [ 21.455133] [<0000000000a7c298>] io_int_handler+0x130/0x298 >>>>>>>>>>> [ 21.455136] Last Breaking-Event-Address: >>>>>>>>>>> [ 21.455138] [<000000000069c5a2>] __blk_mq_delay_run_hw_queue+0xba/0xd8 >>>>>>>>>>> [ 21.455140] ---[ end trace be43f99a5d1e553e ]--- >>>>>>>>>>> [ 21.510046] dasdconf.sh Warning: 0.0.241e is already online, not configuring >>>>>>>>>> >>>>>>>>>> Thinking about this issue further, I can't understand the root cause for >>>>>>>>>> this issue. >>>>>> >>>>>> FWIW, Limiting CONFIG_NR_CPUS to 64 seems to make the problem go away. >>>>> >>>>> I think the following patch is needed, and this way aligns to the mapping >>>>> created via managed IRQ at least. >>>>> >>>>> diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c >>>>> index 9f8cffc8a701..638ab5c11b3c 100644 >>>>> --- a/block/blk-mq-cpumap.c >>>>> +++ b/block/blk-mq-cpumap.c >>>>> @@ -14,13 +14,12 @@ >>>>> #include "blk.h" >>>>> #include "blk-mq.h" >>>>> >>>>> +/* >>>>> + * Given there isn't CPU hotplug handler in blk-mq, map all possible CPUs to >>>>> + * queues even it isn't present yet. >>>>> + */ >>>>> static int cpu_to_queue_index(unsigned int nr_queues, const int cpu) >>>>> { >>>>> - /* >>>>> - * Non present CPU will be mapped to queue index 0. >>>>> - */ >>>>> - if (!cpu_present(cpu)) >>>>> - return 0; >>>>> return cpu % nr_queues; >>>>> } >>>>> >>>>> Thanks, >>>>> Ming >>>>> >>>> >>>> With that I no longer see the WARN_ON but the other warning instead: >>>> >>>> [ 31.903096] audit: type=1130 audit(1522318064.439:41): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-tmpfiles-setup-dev comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success' >>>> [ 31.903100] audit: type=1131 audit(1522318064.439:42): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-tmpfiles-setup-dev comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success' >>>> [ 31.985756] systemd-journald[379]: Received SIGTERM from PID 1 (systemd). >>>> [ 32.000543] systemd: 18 output lines suppressed due to ratelimiting >>>> [ 32.209496] EXT4-fs (dasdc1): re-mounted. Opts: (null) >>>> [ 32.234808] systemd-journald[2490]: Received request to flush runtime journal from PID 1 >>>> [ 32.359832] tun: Universal TUN/TAP device driver, 1.6 >>>> [ 32.470841] run queue from wrong CPU 18, hctx active >>> >>> But your 'lscpu' log showed that you only have 16 CPUs online(0~15) and >>> you also said CPU hotplug isn't involved in your test, so I am just >>> wondering where the CPU 18 is from? >> >> >> I have 2 test systems. One with 44CPU the other with 16. >> The one with 16 now has the same flood of messages: >> >> >> 4.454510] dasdc:VOL1/ 0X3F77: dasdc1 >> [ 4.454592] dasdd:VOL1/ 0X3F74: dasdd1 >> [ 4.593711] run queue from wrong CPU 0, hctx active > > Still can't reproduce your issue, could you please collect debugfs > log again on your 16-core system after applying the patch on blk_mq_cpumap.c? done. If you need anything quick, you can reach mit via irc freenode or oftc (user borntraeger) [-- Attachment #2: dmesg.gz --] [-- Type: application/gzip, Size: 10778 bytes --] [-- Attachment #3: log.gz --] [-- Type: application/gzip, Size: 338323 bytes --] ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues() 2018-03-29 11:49 ` Christian Borntraeger @ 2018-03-30 2:53 ` Ming Lei 2018-04-04 8:18 ` Christian Borntraeger 0 siblings, 1 reply; 40+ messages in thread From: Ming Lei @ 2018-03-30 2:53 UTC (permalink / raw) To: Christian Borntraeger Cc: Jens Axboe, linux-block, Christoph Hellwig, Stefan Haberland, Christoph Hellwig On Thu, Mar 29, 2018 at 01:49:29PM +0200, Christian Borntraeger wrote: > > > On 03/29/2018 01:43 PM, Ming Lei wrote: > > On Thu, Mar 29, 2018 at 12:49:55PM +0200, Christian Borntraeger wrote: > >> > >> > >> On 03/29/2018 12:48 PM, Ming Lei wrote: > >>> On Thu, Mar 29, 2018 at 12:10:11PM +0200, Christian Borntraeger wrote: > >>>> > >>>> > >>>> On 03/29/2018 11:40 AM, Ming Lei wrote: > >>>>> On Thu, Mar 29, 2018 at 11:09:08AM +0200, Christian Borntraeger wrote: > >>>>>> > >>>>>> > >>>>>> On 03/29/2018 09:23 AM, Christian Borntraeger wrote: > >>>>>>> > >>>>>>> > >>>>>>> On 03/29/2018 04:00 AM, Ming Lei wrote: > >>>>>>>> On Wed, Mar 28, 2018 at 05:36:53PM +0200, Christian Borntraeger wrote: > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> On 03/28/2018 05:26 PM, Ming Lei wrote: > >>>>>>>>>> Hi Christian, > >>>>>>>>>> > >>>>>>>>>> On Wed, Mar 28, 2018 at 09:45:10AM +0200, Christian Borntraeger wrote: > >>>>>>>>>>> FWIW, this patch does not fix the issue for me: > >>>>>>>>>>> > >>>>>>>>>>> ostname=? addr=? terminal=? res=success' > >>>>>>>>>>> [ 21.454961] WARNING: CPU: 3 PID: 1882 at block/blk-mq.c:1410 __blk_mq_delay_run_hw_queue+0xbe/0xd8 > >>>>>>>>>>> [ 21.454968] Modules linked in: scsi_dh_rdac scsi_dh_emc scsi_dh_alua dm_mirror dm_region_hash dm_log dm_multipath dm_mod autofs4 > >>>>>>>>>>> [ 21.454984] CPU: 3 PID: 1882 Comm: dasdconf.sh Not tainted 4.16.0-rc7+ #26 > >>>>>>>>>>> [ 21.454987] Hardware name: IBM 2964 NC9 704 (LPAR) > >>>>>>>>>>> [ 21.454990] Krnl PSW : 00000000c0131ea3 000000003ea2f7bf (__blk_mq_delay_run_hw_queue+0xbe/0xd8) > >>>>>>>>>>> [ 21.454996] R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3 > >>>>>>>>>>> [ 21.455005] Krnl GPRS: 0000013abb69a000 0000013a00000000 0000013ac6c0dc00 0000000000000001 > >>>>>>>>>>> [ 21.455008] 0000000000000000 0000013abb69a710 0000013a00000000 00000001b691fd98 > >>>>>>>>>>> [ 21.455011] 00000001b691fd98 0000013ace4775c8 0000000000000001 0000000000000000 > >>>>>>>>>>> [ 21.455014] 0000013ac6c0dc00 0000000000b47238 00000001b691fc08 00000001b691fbd0 > >>>>>>>>>>> [ 21.455032] Krnl Code: 000000000069c596: ebaff0a00004 lmg %r10,%r15,160(%r15) > >>>>>>>>>>> 000000000069c59c: c0f4ffff7a5e brcl 15,68ba58 > >>>>>>>>>>> #000000000069c5a2: a7f40001 brc 15,69c5a4 > >>>>>>>>>>> >000000000069c5a6: e340f0c00004 lg %r4,192(%r15) > >>>>>>>>>>> 000000000069c5ac: ebaff0a00004 lmg %r10,%r15,160(%r15) > >>>>>>>>>>> 000000000069c5b2: 07f4 bcr 15,%r4 > >>>>>>>>>>> 000000000069c5b4: c0e5fffffeea brasl %r14,69c388 > >>>>>>>>>>> 000000000069c5ba: a7f4fff6 brc 15,69c5a6 > >>>>>>>>>>> [ 21.455067] Call Trace: > >>>>>>>>>>> [ 21.455072] ([<00000001b691fd98>] 0x1b691fd98) > >>>>>>>>>>> [ 21.455079] [<000000000069c692>] blk_mq_run_hw_queue+0xba/0x100 > >>>>>>>>>>> [ 21.455083] [<000000000069c740>] blk_mq_run_hw_queues+0x68/0x88 > >>>>>>>>>>> [ 21.455089] [<000000000069b956>] __blk_mq_complete_request+0x11e/0x1d8 > >>>>>>>>>>> [ 21.455091] [<000000000069ba9c>] blk_mq_complete_request+0x8c/0xc8 > >>>>>>>>>>> [ 21.455103] [<00000000008aa250>] dasd_block_tasklet+0x158/0x490 > >>>>>>>>>>> [ 21.455110] [<000000000014c742>] tasklet_hi_action+0x92/0x120 > >>>>>>>>>>> [ 21.455118] 
[<0000000000a7cfc0>] __do_softirq+0x120/0x348 > >>>>>>>>>>> [ 21.455122] [<000000000014c212>] irq_exit+0xba/0xd0 > >>>>>>>>>>> [ 21.455130] [<000000000010bf92>] do_IRQ+0x8a/0xb8 > >>>>>>>>>>> [ 21.455133] [<0000000000a7c298>] io_int_handler+0x130/0x298 > >>>>>>>>>>> [ 21.455136] Last Breaking-Event-Address: > >>>>>>>>>>> [ 21.455138] [<000000000069c5a2>] __blk_mq_delay_run_hw_queue+0xba/0xd8 > >>>>>>>>>>> [ 21.455140] ---[ end trace be43f99a5d1e553e ]--- > >>>>>>>>>>> [ 21.510046] dasdconf.sh Warning: 0.0.241e is already online, not configuring > >>>>>>>>>> > >>>>>>>>>> Thinking about this issue further, I can't understand the root cause for > >>>>>>>>>> this issue. > >>>>>> > >>>>>> FWIW, Limiting CONFIG_NR_CPUS to 64 seems to make the problem go away. > >>>>> > >>>>> I think the following patch is needed, and this way aligns to the mapping > >>>>> created via managed IRQ at least. > >>>>> > >>>>> diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c > >>>>> index 9f8cffc8a701..638ab5c11b3c 100644 > >>>>> --- a/block/blk-mq-cpumap.c > >>>>> +++ b/block/blk-mq-cpumap.c > >>>>> @@ -14,13 +14,12 @@ > >>>>> #include "blk.h" > >>>>> #include "blk-mq.h" > >>>>> > >>>>> +/* > >>>>> + * Given there isn't CPU hotplug handler in blk-mq, map all possible CPUs to > >>>>> + * queues even it isn't present yet. > >>>>> + */ > >>>>> static int cpu_to_queue_index(unsigned int nr_queues, const int cpu) > >>>>> { > >>>>> - /* > >>>>> - * Non present CPU will be mapped to queue index 0. > >>>>> - */ > >>>>> - if (!cpu_present(cpu)) > >>>>> - return 0; > >>>>> return cpu % nr_queues; > >>>>> } > >>>>> > >>>>> Thanks, > >>>>> Ming > >>>>> > >>>> > >>>> With that I no longer see the WARN_ON but the other warning instead: > >>>> > >>>> [ 31.903096] audit: type=1130 audit(1522318064.439:41): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-tmpfiles-setup-dev comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success' > >>>> [ 31.903100] audit: type=1131 audit(1522318064.439:42): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-tmpfiles-setup-dev comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success' > >>>> [ 31.985756] systemd-journald[379]: Received SIGTERM from PID 1 (systemd). > >>>> [ 32.000543] systemd: 18 output lines suppressed due to ratelimiting > >>>> [ 32.209496] EXT4-fs (dasdc1): re-mounted. Opts: (null) > >>>> [ 32.234808] systemd-journald[2490]: Received request to flush runtime journal from PID 1 > >>>> [ 32.359832] tun: Universal TUN/TAP device driver, 1.6 > >>>> [ 32.470841] run queue from wrong CPU 18, hctx active > >>> > >>> But your 'lscpu' log showed that you only have 16 CPUs online(0~15) and > >>> you also said CPU hotplug isn't involved in your test, so I am just > >>> wondering where the CPU 18 is from? > >> > >> > >> I have 2 test systems. One with 44CPU the other with 16. > >> The one with 16 now has the same flood of messages: > >> > >> > >> 4.454510] dasdc:VOL1/ 0X3F77: dasdc1 > >> [ 4.454592] dasdd:VOL1/ 0X3F74: dasdd1 > >> [ 4.593711] run queue from wrong CPU 0, hctx active > > > > Still can't reproduce your issue, could you please collect debugfs > > log again on your 16-core system after applying the patch on blk_mq_cpumap.c? > > done. OK, thanks, from the dumped mapping, looks everything is fine, each hctx is mapped to 4 CPUs, and only one of them is online. 
But I still don't know why the hctx is run from the wrong CPU; it looks like we need more info to dump. Could you apply the following debug patch and post the log? --- diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c index 9f8cffc8a701..638ab5c11b3c 100644 --- a/block/blk-mq-cpumap.c +++ b/block/blk-mq-cpumap.c @@ -14,13 +14,12 @@ #include "blk.h" #include "blk-mq.h" +/* + * Given there is no CPU hotplug handler in blk-mq, map all CPUs to + * queues even if they aren't present yet. + */ static int cpu_to_queue_index(unsigned int nr_queues, const int cpu) { - /* - * Non present CPU will be mapped to queue index 0. - */ - if (!cpu_present(cpu)) - return 0; return cpu % nr_queues; } diff --git a/block/blk-mq.c b/block/blk-mq.c index 16e83e6df404..65767be7927d 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -1325,9 +1325,14 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx) */ if (!cpumask_test_cpu(raw_smp_processor_id(), hctx->cpumask) && cpu_online(hctx->next_cpu)) { - printk(KERN_WARNING "run queue from wrong CPU %d, hctx %s\n", - raw_smp_processor_id(), + int cpu; + printk(KERN_WARNING "run queue from wrong CPU %d/%d, hctx-%d %s\n", + raw_smp_processor_id(), hctx->next_cpu, + hctx->queue_num, cpumask_empty(hctx->cpumask) ? "inactive": "active"); + for_each_cpu(cpu, hctx->cpumask) + printk("%d ", cpu); + printk("\n"); dump_stack(); } > > If you need anything quick, you can reach me via irc freenode or oftc (user borntraeger) OK, that is cool! Thanks, Ming ^ permalink raw reply related [flat|nested] 40+ messages in thread
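A note for readers following the numbers in this thread: with the cpu_to_queue_index() change above (queue = cpu % nr_queues) and the figures reported here (282 possible CPUs, 64 hw queues), each hw queue is mapped to 4 or 5 possible CPUs (282 = 4 * 64 + 26, so queue indexes 0-25 get five and 26-63 get four), and with only CPUs 0-15 online at most one CPU per queue can be online. The small user-space program below is an editor's sketch, not kernel code; it just evaluates that mapping for one queue index:

#include <stdio.h>

/*
 * Editor's sketch (plain user-space C, not the kernel implementation):
 * model the round-robin CPU -> hw queue mapping "queue = cpu % nr_queues"
 * from the patch above, using the numbers reported in this thread
 * (nr_cpu_ids = 282 possible CPUs, 64 hw queues).
 */
int main(void)
{
        const int nr_cpu_ids = 282, nr_queues = 64, hctx = 18;
        int cpu;

        printf("possible CPUs mapped to hctx-%d:", hctx);
        for (cpu = 0; cpu < nr_cpu_ids; cpu++)
                if (cpu % nr_queues == hctx)
                        printf(" %d", cpu);
        printf("\n");   /* prints: 18 82 146 210 274 */
        return 0;
}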
* Re: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues() 2018-03-30 2:53 ` Ming Lei @ 2018-04-04 8:18 ` Christian Borntraeger 2018-04-05 16:05 ` Ming Lei 0 siblings, 1 reply; 40+ messages in thread From: Christian Borntraeger @ 2018-04-04 8:18 UTC (permalink / raw) To: Ming Lei Cc: Jens Axboe, linux-block, Christoph Hellwig, Stefan Haberland, Christoph Hellwig [-- Attachment #1: Type: text/plain, Size: 8698 bytes --] On 03/30/2018 04:53 AM, Ming Lei wrote: > On Thu, Mar 29, 2018 at 01:49:29PM +0200, Christian Borntraeger wrote: >> >> >> On 03/29/2018 01:43 PM, Ming Lei wrote: >>> On Thu, Mar 29, 2018 at 12:49:55PM +0200, Christian Borntraeger wrote: >>>> >>>> >>>> On 03/29/2018 12:48 PM, Ming Lei wrote: >>>>> On Thu, Mar 29, 2018 at 12:10:11PM +0200, Christian Borntraeger wrote: >>>>>> >>>>>> >>>>>> On 03/29/2018 11:40 AM, Ming Lei wrote: >>>>>>> On Thu, Mar 29, 2018 at 11:09:08AM +0200, Christian Borntraeger wrote: >>>>>>>> >>>>>>>> >>>>>>>> On 03/29/2018 09:23 AM, Christian Borntraeger wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>> On 03/29/2018 04:00 AM, Ming Lei wrote: >>>>>>>>>> On Wed, Mar 28, 2018 at 05:36:53PM +0200, Christian Borntraeger wrote: >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On 03/28/2018 05:26 PM, Ming Lei wrote: >>>>>>>>>>>> Hi Christian, >>>>>>>>>>>> >>>>>>>>>>>> On Wed, Mar 28, 2018 at 09:45:10AM +0200, Christian Borntraeger wrote: >>>>>>>>>>>>> FWIW, this patch does not fix the issue for me: >>>>>>>>>>>>> >>>>>>>>>>>>> ostname=? addr=? terminal=? res=success' >>>>>>>>>>>>> [ 21.454961] WARNING: CPU: 3 PID: 1882 at block/blk-mq.c:1410 __blk_mq_delay_run_hw_queue+0xbe/0xd8 >>>>>>>>>>>>> [ 21.454968] Modules linked in: scsi_dh_rdac scsi_dh_emc scsi_dh_alua dm_mirror dm_region_hash dm_log dm_multipath dm_mod autofs4 >>>>>>>>>>>>> [ 21.454984] CPU: 3 PID: 1882 Comm: dasdconf.sh Not tainted 4.16.0-rc7+ #26 >>>>>>>>>>>>> [ 21.454987] Hardware name: IBM 2964 NC9 704 (LPAR) >>>>>>>>>>>>> [ 21.454990] Krnl PSW : 00000000c0131ea3 000000003ea2f7bf (__blk_mq_delay_run_hw_queue+0xbe/0xd8) >>>>>>>>>>>>> [ 21.454996] R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3 >>>>>>>>>>>>> [ 21.455005] Krnl GPRS: 0000013abb69a000 0000013a00000000 0000013ac6c0dc00 0000000000000001 >>>>>>>>>>>>> [ 21.455008] 0000000000000000 0000013abb69a710 0000013a00000000 00000001b691fd98 >>>>>>>>>>>>> [ 21.455011] 00000001b691fd98 0000013ace4775c8 0000000000000001 0000000000000000 >>>>>>>>>>>>> [ 21.455014] 0000013ac6c0dc00 0000000000b47238 00000001b691fc08 00000001b691fbd0 >>>>>>>>>>>>> [ 21.455032] Krnl Code: 000000000069c596: ebaff0a00004 lmg %r10,%r15,160(%r15) >>>>>>>>>>>>> 000000000069c59c: c0f4ffff7a5e brcl 15,68ba58 >>>>>>>>>>>>> #000000000069c5a2: a7f40001 brc 15,69c5a4 >>>>>>>>>>>>> >000000000069c5a6: e340f0c00004 lg %r4,192(%r15) >>>>>>>>>>>>> 000000000069c5ac: ebaff0a00004 lmg %r10,%r15,160(%r15) >>>>>>>>>>>>> 000000000069c5b2: 07f4 bcr 15,%r4 >>>>>>>>>>>>> 000000000069c5b4: c0e5fffffeea brasl %r14,69c388 >>>>>>>>>>>>> 000000000069c5ba: a7f4fff6 brc 15,69c5a6 >>>>>>>>>>>>> [ 21.455067] Call Trace: >>>>>>>>>>>>> [ 21.455072] ([<00000001b691fd98>] 0x1b691fd98) >>>>>>>>>>>>> [ 21.455079] [<000000000069c692>] blk_mq_run_hw_queue+0xba/0x100 >>>>>>>>>>>>> [ 21.455083] [<000000000069c740>] blk_mq_run_hw_queues+0x68/0x88 >>>>>>>>>>>>> [ 21.455089] [<000000000069b956>] __blk_mq_complete_request+0x11e/0x1d8 >>>>>>>>>>>>> [ 21.455091] [<000000000069ba9c>] blk_mq_complete_request+0x8c/0xc8 >>>>>>>>>>>>> [ 21.455103] [<00000000008aa250>] dasd_block_tasklet+0x158/0x490 >>>>>>>>>>>>> [ 21.455110] 
[<000000000014c742>] tasklet_hi_action+0x92/0x120 >>>>>>>>>>>>> [ 21.455118] [<0000000000a7cfc0>] __do_softirq+0x120/0x348 >>>>>>>>>>>>> [ 21.455122] [<000000000014c212>] irq_exit+0xba/0xd0 >>>>>>>>>>>>> [ 21.455130] [<000000000010bf92>] do_IRQ+0x8a/0xb8 >>>>>>>>>>>>> [ 21.455133] [<0000000000a7c298>] io_int_handler+0x130/0x298 >>>>>>>>>>>>> [ 21.455136] Last Breaking-Event-Address: >>>>>>>>>>>>> [ 21.455138] [<000000000069c5a2>] __blk_mq_delay_run_hw_queue+0xba/0xd8 >>>>>>>>>>>>> [ 21.455140] ---[ end trace be43f99a5d1e553e ]--- >>>>>>>>>>>>> [ 21.510046] dasdconf.sh Warning: 0.0.241e is already online, not configuring >>>>>>>>>>>> >>>>>>>>>>>> Thinking about this issue further, I can't understand the root cause for >>>>>>>>>>>> this issue. >>>>>>>> >>>>>>>> FWIW, Limiting CONFIG_NR_CPUS to 64 seems to make the problem go away. >>>>>>> >>>>>>> I think the following patch is needed, and this way aligns to the mapping >>>>>>> created via managed IRQ at least. >>>>>>> >>>>>>> diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c >>>>>>> index 9f8cffc8a701..638ab5c11b3c 100644 >>>>>>> --- a/block/blk-mq-cpumap.c >>>>>>> +++ b/block/blk-mq-cpumap.c >>>>>>> @@ -14,13 +14,12 @@ >>>>>>> #include "blk.h" >>>>>>> #include "blk-mq.h" >>>>>>> >>>>>>> +/* >>>>>>> + * Given there isn't CPU hotplug handler in blk-mq, map all possible CPUs to >>>>>>> + * queues even it isn't present yet. >>>>>>> + */ >>>>>>> static int cpu_to_queue_index(unsigned int nr_queues, const int cpu) >>>>>>> { >>>>>>> - /* >>>>>>> - * Non present CPU will be mapped to queue index 0. >>>>>>> - */ >>>>>>> - if (!cpu_present(cpu)) >>>>>>> - return 0; >>>>>>> return cpu % nr_queues; >>>>>>> } >>>>>>> >>>>>>> Thanks, >>>>>>> Ming >>>>>>> >>>>>> >>>>>> With that I no longer see the WARN_ON but the other warning instead: >>>>>> >>>>>> [ 31.903096] audit: type=1130 audit(1522318064.439:41): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-tmpfiles-setup-dev comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success' >>>>>> [ 31.903100] audit: type=1131 audit(1522318064.439:42): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-tmpfiles-setup-dev comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success' >>>>>> [ 31.985756] systemd-journald[379]: Received SIGTERM from PID 1 (systemd). >>>>>> [ 32.000543] systemd: 18 output lines suppressed due to ratelimiting >>>>>> [ 32.209496] EXT4-fs (dasdc1): re-mounted. Opts: (null) >>>>>> [ 32.234808] systemd-journald[2490]: Received request to flush runtime journal from PID 1 >>>>>> [ 32.359832] tun: Universal TUN/TAP device driver, 1.6 >>>>>> [ 32.470841] run queue from wrong CPU 18, hctx active >>>>> >>>>> But your 'lscpu' log showed that you only have 16 CPUs online(0~15) and >>>>> you also said CPU hotplug isn't involved in your test, so I am just >>>>> wondering where the CPU 18 is from? >>>> >>>> >>>> I have 2 test systems. One with 44CPU the other with 16. >>>> The one with 16 now has the same flood of messages: >>>> >>>> >>>> 4.454510] dasdc:VOL1/ 0X3F77: dasdc1 >>>> [ 4.454592] dasdd:VOL1/ 0X3F74: dasdd1 >>>> [ 4.593711] run queue from wrong CPU 0, hctx active >>> >>> Still can't reproduce your issue, could you please collect debugfs >>> log again on your 16-core system after applying the patch on blk_mq_cpumap.c? >> >> done. > > OK, thanks, from the dumped mapping, looks everything is fine, each hctx > is mapped to 4 CPUs, and only one of them is online. 
> > But I still don't know why the hctx is run from the wrong CPU; it looks like we need more > > info to dump. Could you apply the following debug patch and post the > > log? > > > > --- > > diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c > > index 9f8cffc8a701..638ab5c11b3c 100644 > > --- a/block/blk-mq-cpumap.c > > +++ b/block/blk-mq-cpumap.c > > @@ -14,13 +14,12 @@ > > #include "blk.h" > > #include "blk-mq.h" > > > > +/* > > + * Given there is no CPU hotplug handler in blk-mq, map all CPUs to > > + * queues even if they aren't present yet. > > + */ > > static int cpu_to_queue_index(unsigned int nr_queues, const int cpu) > > { > > - /* > > - * Non present CPU will be mapped to queue index 0. > > - */ > > - if (!cpu_present(cpu)) > > - return 0; > > return cpu % nr_queues; > > } > > > > diff --git a/block/blk-mq.c b/block/blk-mq.c > > index 16e83e6df404..65767be7927d 100644 > > --- a/block/blk-mq.c > > +++ b/block/blk-mq.c > > @@ -1325,9 +1325,14 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx) > > */ > > if (!cpumask_test_cpu(raw_smp_processor_id(), hctx->cpumask) && > > cpu_online(hctx->next_cpu)) { > > - printk(KERN_WARNING "run queue from wrong CPU %d, hctx %s\n", > > - raw_smp_processor_id(), > > + int cpu; > > + printk(KERN_WARNING "run queue from wrong CPU %d/%d, hctx-%d %s\n", > > + raw_smp_processor_id(), hctx->next_cpu, > > + hctx->queue_num, > > cpumask_empty(hctx->cpumask) ? "inactive": "active"); > > + for_each_cpu(cpu, hctx->cpumask) > > + printk("%d ", cpu); > > + printk("\n"); > > dump_stack(); > > } > > > attached. FWIW, it looks like these messages happen mostly during boot and come less and less often the longer the system runs. Could it be that the workqueue is misplaced before it runs for the first time, but then it is ok? [-- Attachment #2: dmesg.gz --] [-- Type: application/gzip, Size: 22606 bytes --] ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues() 2018-04-04 8:18 ` Christian Borntraeger @ 2018-04-05 16:05 ` Ming Lei 2018-04-05 16:11 ` Ming Lei 2018-04-06 8:35 ` Christian Borntraeger 0 siblings, 2 replies; 40+ messages in thread From: Ming Lei @ 2018-04-05 16:05 UTC (permalink / raw) To: Christian Borntraeger Cc: Jens Axboe, linux-block, Christoph Hellwig, Stefan Haberland, Christoph Hellwig On Wed, Apr 04, 2018 at 10:18:13AM +0200, Christian Borntraeger wrote: > > > On 03/30/2018 04:53 AM, Ming Lei wrote: > > On Thu, Mar 29, 2018 at 01:49:29PM +0200, Christian Borntraeger wrote: > >> > >> > >> On 03/29/2018 01:43 PM, Ming Lei wrote: > >>> On Thu, Mar 29, 2018 at 12:49:55PM +0200, Christian Borntraeger wrote: > >>>> > >>>> > >>>> On 03/29/2018 12:48 PM, Ming Lei wrote: > >>>>> On Thu, Mar 29, 2018 at 12:10:11PM +0200, Christian Borntraeger wrote: > >>>>>> > >>>>>> > >>>>>> On 03/29/2018 11:40 AM, Ming Lei wrote: > >>>>>>> On Thu, Mar 29, 2018 at 11:09:08AM +0200, Christian Borntraeger wrote: > >>>>>>>> > >>>>>>>> > >>>>>>>> On 03/29/2018 09:23 AM, Christian Borntraeger wrote: > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> On 03/29/2018 04:00 AM, Ming Lei wrote: > >>>>>>>>>> On Wed, Mar 28, 2018 at 05:36:53PM +0200, Christian Borntraeger wrote: > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> On 03/28/2018 05:26 PM, Ming Lei wrote: > >>>>>>>>>>>> Hi Christian, > >>>>>>>>>>>> > >>>>>>>>>>>> On Wed, Mar 28, 2018 at 09:45:10AM +0200, Christian Borntraeger wrote: > >>>>>>>>>>>>> FWIW, this patch does not fix the issue for me: > >>>>>>>>>>>>> > >>>>>>>>>>>>> ostname=? addr=? terminal=? res=success' > >>>>>>>>>>>>> [ 21.454961] WARNING: CPU: 3 PID: 1882 at block/blk-mq.c:1410 __blk_mq_delay_run_hw_queue+0xbe/0xd8 > >>>>>>>>>>>>> [ 21.454968] Modules linked in: scsi_dh_rdac scsi_dh_emc scsi_dh_alua dm_mirror dm_region_hash dm_log dm_multipath dm_mod autofs4 > >>>>>>>>>>>>> [ 21.454984] CPU: 3 PID: 1882 Comm: dasdconf.sh Not tainted 4.16.0-rc7+ #26 > >>>>>>>>>>>>> [ 21.454987] Hardware name: IBM 2964 NC9 704 (LPAR) > >>>>>>>>>>>>> [ 21.454990] Krnl PSW : 00000000c0131ea3 000000003ea2f7bf (__blk_mq_delay_run_hw_queue+0xbe/0xd8) > >>>>>>>>>>>>> [ 21.454996] R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3 > >>>>>>>>>>>>> [ 21.455005] Krnl GPRS: 0000013abb69a000 0000013a00000000 0000013ac6c0dc00 0000000000000001 > >>>>>>>>>>>>> [ 21.455008] 0000000000000000 0000013abb69a710 0000013a00000000 00000001b691fd98 > >>>>>>>>>>>>> [ 21.455011] 00000001b691fd98 0000013ace4775c8 0000000000000001 0000000000000000 > >>>>>>>>>>>>> [ 21.455014] 0000013ac6c0dc00 0000000000b47238 00000001b691fc08 00000001b691fbd0 > >>>>>>>>>>>>> [ 21.455032] Krnl Code: 000000000069c596: ebaff0a00004 lmg %r10,%r15,160(%r15) > >>>>>>>>>>>>> 000000000069c59c: c0f4ffff7a5e brcl 15,68ba58 > >>>>>>>>>>>>> #000000000069c5a2: a7f40001 brc 15,69c5a4 > >>>>>>>>>>>>> >000000000069c5a6: e340f0c00004 lg %r4,192(%r15) > >>>>>>>>>>>>> 000000000069c5ac: ebaff0a00004 lmg %r10,%r15,160(%r15) > >>>>>>>>>>>>> 000000000069c5b2: 07f4 bcr 15,%r4 > >>>>>>>>>>>>> 000000000069c5b4: c0e5fffffeea brasl %r14,69c388 > >>>>>>>>>>>>> 000000000069c5ba: a7f4fff6 brc 15,69c5a6 > >>>>>>>>>>>>> [ 21.455067] Call Trace: > >>>>>>>>>>>>> [ 21.455072] ([<00000001b691fd98>] 0x1b691fd98) > >>>>>>>>>>>>> [ 21.455079] [<000000000069c692>] blk_mq_run_hw_queue+0xba/0x100 > >>>>>>>>>>>>> [ 21.455083] [<000000000069c740>] blk_mq_run_hw_queues+0x68/0x88 > >>>>>>>>>>>>> [ 21.455089] [<000000000069b956>] __blk_mq_complete_request+0x11e/0x1d8 > >>>>>>>>>>>>> [ 
21.455091] [<000000000069ba9c>] blk_mq_complete_request+0x8c/0xc8 > >>>>>>>>>>>>> [ 21.455103] [<00000000008aa250>] dasd_block_tasklet+0x158/0x490 > >>>>>>>>>>>>> [ 21.455110] [<000000000014c742>] tasklet_hi_action+0x92/0x120 > >>>>>>>>>>>>> [ 21.455118] [<0000000000a7cfc0>] __do_softirq+0x120/0x348 > >>>>>>>>>>>>> [ 21.455122] [<000000000014c212>] irq_exit+0xba/0xd0 > >>>>>>>>>>>>> [ 21.455130] [<000000000010bf92>] do_IRQ+0x8a/0xb8 > >>>>>>>>>>>>> [ 21.455133] [<0000000000a7c298>] io_int_handler+0x130/0x298 > >>>>>>>>>>>>> [ 21.455136] Last Breaking-Event-Address: > >>>>>>>>>>>>> [ 21.455138] [<000000000069c5a2>] __blk_mq_delay_run_hw_queue+0xba/0xd8 > >>>>>>>>>>>>> [ 21.455140] ---[ end trace be43f99a5d1e553e ]--- > >>>>>>>>>>>>> [ 21.510046] dasdconf.sh Warning: 0.0.241e is already online, not configuring > >>>>>>>>>>>> > >>>>>>>>>>>> Thinking about this issue further, I can't understand the root cause for > >>>>>>>>>>>> this issue. > >>>>>>>> > >>>>>>>> FWIW, Limiting CONFIG_NR_CPUS to 64 seems to make the problem go away. > >>>>>>> > >>>>>>> I think the following patch is needed, and this way aligns to the mapping > >>>>>>> created via managed IRQ at least. > >>>>>>> > >>>>>>> diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c > >>>>>>> index 9f8cffc8a701..638ab5c11b3c 100644 > >>>>>>> --- a/block/blk-mq-cpumap.c > >>>>>>> +++ b/block/blk-mq-cpumap.c > >>>>>>> @@ -14,13 +14,12 @@ > >>>>>>> #include "blk.h" > >>>>>>> #include "blk-mq.h" > >>>>>>> > >>>>>>> +/* > >>>>>>> + * Given there isn't CPU hotplug handler in blk-mq, map all possible CPUs to > >>>>>>> + * queues even it isn't present yet. > >>>>>>> + */ > >>>>>>> static int cpu_to_queue_index(unsigned int nr_queues, const int cpu) > >>>>>>> { > >>>>>>> - /* > >>>>>>> - * Non present CPU will be mapped to queue index 0. > >>>>>>> - */ > >>>>>>> - if (!cpu_present(cpu)) > >>>>>>> - return 0; > >>>>>>> return cpu % nr_queues; > >>>>>>> } > >>>>>>> > >>>>>>> Thanks, > >>>>>>> Ming > >>>>>>> > >>>>>> > >>>>>> With that I no longer see the WARN_ON but the other warning instead: > >>>>>> > >>>>>> [ 31.903096] audit: type=1130 audit(1522318064.439:41): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-tmpfiles-setup-dev comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success' > >>>>>> [ 31.903100] audit: type=1131 audit(1522318064.439:42): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-tmpfiles-setup-dev comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success' > >>>>>> [ 31.985756] systemd-journald[379]: Received SIGTERM from PID 1 (systemd). > >>>>>> [ 32.000543] systemd: 18 output lines suppressed due to ratelimiting > >>>>>> [ 32.209496] EXT4-fs (dasdc1): re-mounted. Opts: (null) > >>>>>> [ 32.234808] systemd-journald[2490]: Received request to flush runtime journal from PID 1 > >>>>>> [ 32.359832] tun: Universal TUN/TAP device driver, 1.6 > >>>>>> [ 32.470841] run queue from wrong CPU 18, hctx active > >>>>> > >>>>> But your 'lscpu' log showed that you only have 16 CPUs online(0~15) and > >>>>> you also said CPU hotplug isn't involved in your test, so I am just > >>>>> wondering where the CPU 18 is from? > >>>> > >>>> > >>>> I have 2 test systems. One with 44CPU the other with 16. 
> >>>> The one with 16 now has the same flood of messages: > >>>> > >>>> > >>>> 4.454510] dasdc:VOL1/ 0X3F77: dasdc1 > >>>> [ 4.454592] dasdd:VOL1/ 0X3F74: dasdd1 > >>>> [ 4.593711] run queue from wrong CPU 0, hctx active > >>> > >>> Still can't reproduce your issue, could you please collect debugfs > >>> log again on your 16-core system after applying the patch on blk_mq_cpumap.c? > >> > >> done. > > > > OK, thanks, from the dumped mapping, looks everything is fine, each hctx > > is mapped to 4 CPUs, and only one of them is online. > > > > But still don't know why hctx is run from wrong CPU, looks we need more > > info to dump, could you apply the following debug patch and post the > > log? > > > > --- > > diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c > > index 9f8cffc8a701..638ab5c11b3c 100644 > > --- a/block/blk-mq-cpumap.c > > +++ b/block/blk-mq-cpumap.c > > @@ -14,13 +14,12 @@ > > #include "blk.h" > > #include "blk-mq.h" > > > > +/* > > + * Given there isn't CPU hotplug handler in blk-mq, map all CPUs to > > + * queues even it isn't present yet. > > + */ > > static int cpu_to_queue_index(unsigned int nr_queues, const int cpu) > > { > > - /* > > - * Non present CPU will be mapped to queue index 0. > > - */ > > - if (!cpu_present(cpu)) > > - return 0; > > return cpu % nr_queues; > > } > > > > diff --git a/block/blk-mq.c b/block/blk-mq.c > > index 16e83e6df404..65767be7927d 100644 > > --- a/block/blk-mq.c > > +++ b/block/blk-mq.c > > @@ -1325,9 +1325,14 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx) > > */ > > if (!cpumask_test_cpu(raw_smp_processor_id(), hctx->cpumask) && > > cpu_online(hctx->next_cpu)) { > > - printk(KERN_WARNING "run queue from wrong CPU %d, hctx %s\n", > > - raw_smp_processor_id(), > > + int cpu; > > + printk(KERN_WARNING "run queue from wrong CPU %d/%d, hctx-%d %s\n", > > + raw_smp_processor_id(), hctx->next_cpu, > > + hctx->queue_num, > > cpumask_empty(hctx->cpumask) ? "inactive": "active"); > > + for_each_cpu(cpu, hctx->cpumask) > > + printk("%d ", cpu); > > + printk("\n"); > > dump_stack(); > > } > > > > > attached. FWIW, it looks like these messages happen mostly during boot and come less > and less often the longer the system runs. Could it be that the workqeue is misplaced > before it runs for the first time, but then it is ok? Looks not workqueue's issue, and it shows that hctx->next_cpu is figured out as wrong by blk_mq_hctx_next_cpu(), maybe on your ARCH, either 'nr_cpu_ids' or 'cpu_online_mask' is too strange during booting and breaks blk_mq_hctx_next_cpu(). Could you please apply the following patch and provide the dmesg boot log? --- diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c index 9f8cffc8a701..638ab5c11b3c 100644 --- a/block/blk-mq-cpumap.c +++ b/block/blk-mq-cpumap.c @@ -14,13 +14,12 @@ #include "blk.h" #include "blk-mq.h" +/* + * Given there isn't CPU hotplug handler in blk-mq, map all CPUs to + * queues even it isn't present yet. + */ static int cpu_to_queue_index(unsigned int nr_queues, const int cpu) { - /* - * Non present CPU will be mapped to queue index 0. 
- */ - if (!cpu_present(cpu)) - return 0; return cpu % nr_queues; } diff --git a/block/blk-mq.c b/block/blk-mq.c index 90838e998f66..996f8a963026 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -1324,9 +1324,18 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx) */ if (!cpumask_test_cpu(raw_smp_processor_id(), hctx->cpumask) && cpu_online(hctx->next_cpu)) { - printk(KERN_WARNING "run queue from wrong CPU %d, hctx %s\n", - raw_smp_processor_id(), + int cpu; + printk(KERN_WARNING "run queue from wrong CPU %d/%d, hctx-%d %s\n", + raw_smp_processor_id(), hctx->next_cpu, + hctx->queue_num, cpumask_empty(hctx->cpumask) ? "inactive": "active"); + printk("dump CPUs mapped to this hctx:\n"); + for_each_cpu(cpu, hctx->cpumask) + printk("%d ", cpu); + printk("\n"); + printk("nr_cpu_ids is %d, and dump online cpus:\n", nr_cpu_ids); + for_each_cpu(cpu, cpu_online_mask) + printk("%d ", cpu); dump_stack(); } Thanks, Ming ^ permalink raw reply related [flat|nested] 40+ messages in thread
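For readers tracking the suspicion above: the kernel's cpumask search helpers (cpumask_next_and(), cpumask_first_and()) report "no bit found" by returning a value greater than or equal to the number of bits they scanned, so callers such as blk_mq_hctx_next_cpu() must range-check every result against nr_cpu_ids. The following user-space program is an editor's toy analogue of that contract, using a 32-bit mask instead of a real cpumask_t:

#include <stdio.h>

/*
 * Editor's sketch: a toy analogue of cpumask_next_and(). Look for the
 * next bit after 'start' that is set in both masks; when none exists,
 * return 'size' as the "not found" sentinel, which is why the kernel
 * callers must check for a result >= nr_cpu_ids.
 */
static int next_and_bit(unsigned long a, unsigned long b, int size, int start)
{
        int i;

        for (i = start + 1; i < size; i++)
                if ((a & b) & (1UL << i))
                        return i;
        return size;    /* sentinel: no common set bit */
}

int main(void)
{
        unsigned long hctx_mask = 1UL << 18;    /* hctx mapped to CPU 18 only */
        unsigned long online = 0xffffUL;        /* CPUs 0..15 online */

        /* No online CPU in the hctx mask, so the sentinel comes back. */
        printf("%d\n", next_and_bit(hctx_mask, online, 32, 18));       /* prints 32 */
        return 0;
}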
* Re: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues() 2018-04-05 16:05 ` Ming Lei @ 2018-04-05 16:11 ` Ming Lei 2018-04-05 17:39 ` Christian Borntraeger 2018-04-06 8:35 ` Christian Borntraeger 1 sibling, 1 reply; 40+ messages in thread From: Ming Lei @ 2018-04-05 16:11 UTC (permalink / raw) To: Christian Borntraeger Cc: Jens Axboe, linux-block, Christoph Hellwig, Stefan Haberland, Christoph Hellwig On Fri, Apr 06, 2018 at 12:05:03AM +0800, Ming Lei wrote: > On Wed, Apr 04, 2018 at 10:18:13AM +0200, Christian Borntraeger wrote: > > > > > > On 03/30/2018 04:53 AM, Ming Lei wrote: > > > On Thu, Mar 29, 2018 at 01:49:29PM +0200, Christian Borntraeger wrote: > > >> > > >> > > >> On 03/29/2018 01:43 PM, Ming Lei wrote: > > >>> On Thu, Mar 29, 2018 at 12:49:55PM +0200, Christian Borntraeger wrote: > > >>>> > > >>>> > > >>>> On 03/29/2018 12:48 PM, Ming Lei wrote: > > >>>>> On Thu, Mar 29, 2018 at 12:10:11PM +0200, Christian Borntraeger wrote: > > >>>>>> > > >>>>>> > > >>>>>> On 03/29/2018 11:40 AM, Ming Lei wrote: > > >>>>>>> On Thu, Mar 29, 2018 at 11:09:08AM +0200, Christian Borntraeger wrote: > > >>>>>>>> > > >>>>>>>> > > >>>>>>>> On 03/29/2018 09:23 AM, Christian Borntraeger wrote: > > >>>>>>>>> > > >>>>>>>>> > > >>>>>>>>> On 03/29/2018 04:00 AM, Ming Lei wrote: > > >>>>>>>>>> On Wed, Mar 28, 2018 at 05:36:53PM +0200, Christian Borntraeger wrote: > > >>>>>>>>>>> > > >>>>>>>>>>> > > >>>>>>>>>>> On 03/28/2018 05:26 PM, Ming Lei wrote: > > >>>>>>>>>>>> Hi Christian, > > >>>>>>>>>>>> > > >>>>>>>>>>>> On Wed, Mar 28, 2018 at 09:45:10AM +0200, Christian Borntraeger wrote: > > >>>>>>>>>>>>> FWIW, this patch does not fix the issue for me: > > >>>>>>>>>>>>> > > >>>>>>>>>>>>> ostname=? addr=? terminal=? res=success' > > >>>>>>>>>>>>> [ 21.454961] WARNING: CPU: 3 PID: 1882 at block/blk-mq.c:1410 __blk_mq_delay_run_hw_queue+0xbe/0xd8 > > >>>>>>>>>>>>> [ 21.454968] Modules linked in: scsi_dh_rdac scsi_dh_emc scsi_dh_alua dm_mirror dm_region_hash dm_log dm_multipath dm_mod autofs4 > > >>>>>>>>>>>>> [ 21.454984] CPU: 3 PID: 1882 Comm: dasdconf.sh Not tainted 4.16.0-rc7+ #26 > > >>>>>>>>>>>>> [ 21.454987] Hardware name: IBM 2964 NC9 704 (LPAR) > > >>>>>>>>>>>>> [ 21.454990] Krnl PSW : 00000000c0131ea3 000000003ea2f7bf (__blk_mq_delay_run_hw_queue+0xbe/0xd8) > > >>>>>>>>>>>>> [ 21.454996] R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3 > > >>>>>>>>>>>>> [ 21.455005] Krnl GPRS: 0000013abb69a000 0000013a00000000 0000013ac6c0dc00 0000000000000001 > > >>>>>>>>>>>>> [ 21.455008] 0000000000000000 0000013abb69a710 0000013a00000000 00000001b691fd98 > > >>>>>>>>>>>>> [ 21.455011] 00000001b691fd98 0000013ace4775c8 0000000000000001 0000000000000000 > > >>>>>>>>>>>>> [ 21.455014] 0000013ac6c0dc00 0000000000b47238 00000001b691fc08 00000001b691fbd0 > > >>>>>>>>>>>>> [ 21.455032] Krnl Code: 000000000069c596: ebaff0a00004 lmg %r10,%r15,160(%r15) > > >>>>>>>>>>>>> 000000000069c59c: c0f4ffff7a5e brcl 15,68ba58 > > >>>>>>>>>>>>> #000000000069c5a2: a7f40001 brc 15,69c5a4 > > >>>>>>>>>>>>> >000000000069c5a6: e340f0c00004 lg %r4,192(%r15) > > >>>>>>>>>>>>> 000000000069c5ac: ebaff0a00004 lmg %r10,%r15,160(%r15) > > >>>>>>>>>>>>> 000000000069c5b2: 07f4 bcr 15,%r4 > > >>>>>>>>>>>>> 000000000069c5b4: c0e5fffffeea brasl %r14,69c388 > > >>>>>>>>>>>>> 000000000069c5ba: a7f4fff6 brc 15,69c5a6 > > >>>>>>>>>>>>> [ 21.455067] Call Trace: > > >>>>>>>>>>>>> [ 21.455072] ([<00000001b691fd98>] 0x1b691fd98) > > >>>>>>>>>>>>> [ 21.455079] [<000000000069c692>] blk_mq_run_hw_queue+0xba/0x100 > > >>>>>>>>>>>>> [ 
21.455083] [<000000000069c740>] blk_mq_run_hw_queues+0x68/0x88 > > >>>>>>>>>>>>> [ 21.455089] [<000000000069b956>] __blk_mq_complete_request+0x11e/0x1d8 > > >>>>>>>>>>>>> [ 21.455091] [<000000000069ba9c>] blk_mq_complete_request+0x8c/0xc8 > > >>>>>>>>>>>>> [ 21.455103] [<00000000008aa250>] dasd_block_tasklet+0x158/0x490 > > >>>>>>>>>>>>> [ 21.455110] [<000000000014c742>] tasklet_hi_action+0x92/0x120 > > >>>>>>>>>>>>> [ 21.455118] [<0000000000a7cfc0>] __do_softirq+0x120/0x348 > > >>>>>>>>>>>>> [ 21.455122] [<000000000014c212>] irq_exit+0xba/0xd0 > > >>>>>>>>>>>>> [ 21.455130] [<000000000010bf92>] do_IRQ+0x8a/0xb8 > > >>>>>>>>>>>>> [ 21.455133] [<0000000000a7c298>] io_int_handler+0x130/0x298 > > >>>>>>>>>>>>> [ 21.455136] Last Breaking-Event-Address: > > >>>>>>>>>>>>> [ 21.455138] [<000000000069c5a2>] __blk_mq_delay_run_hw_queue+0xba/0xd8 > > >>>>>>>>>>>>> [ 21.455140] ---[ end trace be43f99a5d1e553e ]--- > > >>>>>>>>>>>>> [ 21.510046] dasdconf.sh Warning: 0.0.241e is already online, not configuring > > >>>>>>>>>>>> > > >>>>>>>>>>>> Thinking about this issue further, I can't understand the root cause for > > >>>>>>>>>>>> this issue. > > >>>>>>>> > > >>>>>>>> FWIW, Limiting CONFIG_NR_CPUS to 64 seems to make the problem go away. > > >>>>>>> > > >>>>>>> I think the following patch is needed, and this way aligns to the mapping > > >>>>>>> created via managed IRQ at least. > > >>>>>>> > > >>>>>>> diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c > > >>>>>>> index 9f8cffc8a701..638ab5c11b3c 100644 > > >>>>>>> --- a/block/blk-mq-cpumap.c > > >>>>>>> +++ b/block/blk-mq-cpumap.c > > >>>>>>> @@ -14,13 +14,12 @@ > > >>>>>>> #include "blk.h" > > >>>>>>> #include "blk-mq.h" > > >>>>>>> > > >>>>>>> +/* > > >>>>>>> + * Given there isn't CPU hotplug handler in blk-mq, map all possible CPUs to > > >>>>>>> + * queues even it isn't present yet. > > >>>>>>> + */ > > >>>>>>> static int cpu_to_queue_index(unsigned int nr_queues, const int cpu) > > >>>>>>> { > > >>>>>>> - /* > > >>>>>>> - * Non present CPU will be mapped to queue index 0. > > >>>>>>> - */ > > >>>>>>> - if (!cpu_present(cpu)) > > >>>>>>> - return 0; > > >>>>>>> return cpu % nr_queues; > > >>>>>>> } > > >>>>>>> > > >>>>>>> Thanks, > > >>>>>>> Ming > > >>>>>>> > > >>>>>> > > >>>>>> With that I no longer see the WARN_ON but the other warning instead: > > >>>>>> > > >>>>>> [ 31.903096] audit: type=1130 audit(1522318064.439:41): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-tmpfiles-setup-dev comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success' > > >>>>>> [ 31.903100] audit: type=1131 audit(1522318064.439:42): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-tmpfiles-setup-dev comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success' > > >>>>>> [ 31.985756] systemd-journald[379]: Received SIGTERM from PID 1 (systemd). > > >>>>>> [ 32.000543] systemd: 18 output lines suppressed due to ratelimiting > > >>>>>> [ 32.209496] EXT4-fs (dasdc1): re-mounted. Opts: (null) > > >>>>>> [ 32.234808] systemd-journald[2490]: Received request to flush runtime journal from PID 1 > > >>>>>> [ 32.359832] tun: Universal TUN/TAP device driver, 1.6 > > >>>>>> [ 32.470841] run queue from wrong CPU 18, hctx active > > >>>>> > > >>>>> But your 'lscpu' log showed that you only have 16 CPUs online(0~15) and > > >>>>> you also said CPU hotplug isn't involved in your test, so I am just > > >>>>> wondering where the CPU 18 is from? > > >>>> > > >>>> > > >>>> I have 2 test systems. 
One with 44CPU the other with 16. > > >>>> The one with 16 now has the same flood of messages: > > >>>> > > >>>> > > >>>> 4.454510] dasdc:VOL1/ 0X3F77: dasdc1 > > >>>> [ 4.454592] dasdd:VOL1/ 0X3F74: dasdd1 > > >>>> [ 4.593711] run queue from wrong CPU 0, hctx active > > >>> > > >>> Still can't reproduce your issue, could you please collect debugfs > > >>> log again on your 16-core system after applying the patch on blk_mq_cpumap.c? > > >> > > >> done. > > > > > > OK, thanks, from the dumped mapping, looks everything is fine, each hctx > > > is mapped to 4 CPUs, and only one of them is online. > > > > > > But still don't know why hctx is run from wrong CPU, looks we need more > > > info to dump, could you apply the following debug patch and post the > > > log? > > > > > > --- > > > diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c > > > index 9f8cffc8a701..638ab5c11b3c 100644 > > > --- a/block/blk-mq-cpumap.c > > > +++ b/block/blk-mq-cpumap.c > > > @@ -14,13 +14,12 @@ > > > #include "blk.h" > > > #include "blk-mq.h" > > > > > > +/* > > > + * Given there isn't CPU hotplug handler in blk-mq, map all CPUs to > > > + * queues even it isn't present yet. > > > + */ > > > static int cpu_to_queue_index(unsigned int nr_queues, const int cpu) > > > { > > > - /* > > > - * Non present CPU will be mapped to queue index 0. > > > - */ > > > - if (!cpu_present(cpu)) > > > - return 0; > > > return cpu % nr_queues; > > > } > > > > > > diff --git a/block/blk-mq.c b/block/blk-mq.c > > > index 16e83e6df404..65767be7927d 100644 > > > --- a/block/blk-mq.c > > > +++ b/block/blk-mq.c > > > @@ -1325,9 +1325,14 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx) > > > */ > > > if (!cpumask_test_cpu(raw_smp_processor_id(), hctx->cpumask) && > > > cpu_online(hctx->next_cpu)) { > > > - printk(KERN_WARNING "run queue from wrong CPU %d, hctx %s\n", > > > - raw_smp_processor_id(), > > > + int cpu; > > > + printk(KERN_WARNING "run queue from wrong CPU %d/%d, hctx-%d %s\n", > > > + raw_smp_processor_id(), hctx->next_cpu, > > > + hctx->queue_num, > > > cpumask_empty(hctx->cpumask) ? "inactive": "active"); > > > + for_each_cpu(cpu, hctx->cpumask) > > > + printk("%d ", cpu); > > > + printk("\n"); > > > dump_stack(); > > > } > > > > > > > > > attached. FWIW, it looks like these messages happen mostly during boot and come less > > and less often the longer the system runs. Could it be that the workqeue is misplaced > > before it runs for the first time, but then it is ok? > > Looks not workqueue's issue, and it shows that hctx->next_cpu is figured > out as wrong by blk_mq_hctx_next_cpu(), maybe on your ARCH, either 'nr_cpu_ids' > or 'cpu_online_mask' is too strange during booting and breaks blk_mq_hctx_next_cpu(). > > Could you please apply the following patch and provide the dmesg boot log? And please post out the 'lscpu' log together from the test machine too. Thanks, Ming ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues() 2018-04-05 16:11 ` Ming Lei @ 2018-04-05 17:39 ` Christian Borntraeger 2018-04-05 17:43 ` Christian Borntraeger 2018-04-06 8:41 ` Ming Lei 0 siblings, 2 replies; 40+ messages in thread From: Christian Borntraeger @ 2018-04-05 17:39 UTC (permalink / raw) To: Ming Lei Cc: Jens Axboe, linux-block, Christoph Hellwig, Stefan Haberland, Christoph Hellwig [-- Attachment #1: Type: text/plain, Size: 384 bytes --] On 04/05/2018 06:11 PM, Ming Lei wrote: >> >> Could you please apply the following patch and provide the dmesg boot log? > > And please post out the 'lscpu' log together from the test machine too. attached. As I said before this seems to go away with CONFIG_NR_CPUS=64 or smaller. We have 282 nr_cpu_ids here (max 141 CPUs on that z13 with SMT2) but only 8 Cores == 16 threads. [-- Attachment #2: dmesg.gz --] [-- Type: application/gzip, Size: 22264 bytes --] [-- Attachment #3: lscpu --] [-- Type: text/plain, Size: 808 bytes --] Architecture: s390x CPU op-mode(s): 32-bit, 64-bit Byte Order: Big Endian CPU(s): 16 On-line CPU(s) list: 0-15 Thread(s) per core: 2 Core(s) per socket: 8 Socket(s) per book: 3 Book(s) per drawer: 2 Drawer(s): 4 NUMA node(s): 1 Vendor ID: IBM/S390 Machine type: 2964 CPU dynamic MHz: 5000 CPU static MHz: 5000 BogoMIPS: 20325.00 Hypervisor: PR/SM Hypervisor vendor: IBM Virtualization type: full Dispatching mode: horizontal L1d cache: 128K L1i cache: 96K L2d cache: 2048K L2i cache: 2048K L3 cache: 65536K L4 cache: 491520K NUMA node0 CPU(s): 0-15 Flags: esan3 zarch stfle msa ldisp eimm dfp edat etf3eh highgprs te vx sie [-- Attachment #4: lscpu2 --] [-- Type: text/plain, Size: 1406 bytes --] CPU NODE DRAWER BOOK SOCKET CORE L1d:L1i:L2d:L2i ONLINE CONFIGURED POLARIZATION ADDRESS 0 0 0 0 0 0 0:0:0:0 yes yes horizontal 0 1 0 0 0 0 0 1:1:1:1 yes yes horizontal 1 2 0 0 0 0 1 2:2:2:2 yes yes horizontal 2 3 0 0 0 0 1 3:3:3:3 yes yes horizontal 3 4 0 0 0 0 2 4:4:4:4 yes yes horizontal 4 5 0 0 0 0 2 5:5:5:5 yes yes horizontal 5 6 0 0 0 0 3 6:6:6:6 yes yes horizontal 6 7 0 0 0 0 3 7:7:7:7 yes yes horizontal 7 8 0 0 0 1 4 8:8:8:8 yes yes horizontal 8 9 0 0 0 1 4 9:9:9:9 yes yes horizontal 9 10 0 0 0 1 5 10:10:10:10 yes yes horizontal 10 11 0 0 0 1 5 11:11:11:11 yes yes horizontal 11 12 0 0 0 1 6 12:12:12:12 yes yes horizontal 12 13 0 0 0 1 6 13:13:13:13 yes yes horizontal 13 14 0 0 0 1 7 14:14:14:14 yes yes horizontal 14 15 0 0 0 1 7 15:15:15:15 yes yes horizontal 15 ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues() 2018-04-05 17:39 ` Christian Borntraeger @ 2018-04-05 17:43 ` Christian Borntraeger 0 siblings, 0 replies; 40+ messages in thread From: Christian Borntraeger @ 2018-04-05 17:43 UTC (permalink / raw) To: Ming Lei Cc: Jens Axboe, linux-block, Christoph Hellwig, Stefan Haberland, Christoph Hellwig On 04/05/2018 07:39 PM, Christian Borntraeger wrote: > > > On 04/05/2018 06:11 PM, Ming Lei wrote: >>> >>> Could you please apply the following patch and provide the dmesg boot log? >> >> And please post out the 'lscpu' log together from the test machine too. > > attached. > > As I said before this seems to go away with CONFIG_NR_CPUS=64 or smaller. > We have 282 nr_cpu_ids here (max 141 CPUs on that z13 with SMT2) but only 8 Cores > == 16 threads. To say it differently: the whole system has up to 141 CPUs, but this LPAR has only 8 CPUs assigned. So we have 16 CPUs (SMT2), but this could become up to 282 if I were to do CPU hotplug. (But this is not used here.) ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues() 2018-04-05 17:39 ` Christian Borntraeger 2018-04-05 17:43 ` Christian Borntraeger @ 2018-04-06 8:41 ` Ming Lei 2018-04-06 8:51 ` Christian Borntraeger 1 sibling, 1 reply; 40+ messages in thread From: Ming Lei @ 2018-04-06 8:41 UTC (permalink / raw) To: Christian Borntraeger Cc: Jens Axboe, linux-block, Christoph Hellwig, Stefan Haberland, Christoph Hellwig On Thu, Apr 05, 2018 at 07:39:56PM +0200, Christian Borntraeger wrote: > > > On 04/05/2018 06:11 PM, Ming Lei wrote: > >> > >> Could you please apply the following patch and provide the dmesg boot log? > > > > And please post out the 'lscpu' log together from the test machine too. > > attached. > > As I said before this seems to go way with CONFIG_NR_CPUS=64 or smaller. > We have 282 nr_cpu_ids here (max 141CPUs on that z13 with SMT2) but only 8 Cores > == 16 threads. OK, thanks! The most weird thing is that hctx->next_cpu is computed as 512 since nr_cpu_id is 282, and hctx->next_cpu should have pointed to one of possible CPU. Looks like it is a s390 specific issue, since I can setup one queue which has same mapping with yours: - nr_cpu_id is 282 - CPU 0~15 is online - 64 queues null_blk - still run all hw queues in .complete handler But can't reproduce this issue at all. So please test the following patch, which may tell us why hctx->next_cpu is computed wrong: --- diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c index 9f8cffc8a701..638ab5c11b3c 100644 --- a/block/blk-mq-cpumap.c +++ b/block/blk-mq-cpumap.c @@ -14,13 +14,12 @@ #include "blk.h" #include "blk-mq.h" +/* + * Given there isn't CPU hotplug handler in blk-mq, map all CPUs to + * queues even it isn't present yet. + */ static int cpu_to_queue_index(unsigned int nr_queues, const int cpu) { - /* - * Non present CPU will be mapped to queue index 0. - */ - if (!cpu_present(cpu)) - return 0; return cpu % nr_queues; } diff --git a/block/blk-mq.c b/block/blk-mq.c index 90838e998f66..9b130e4b87df 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -1343,6 +1343,13 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx) hctx_unlock(hctx, srcu_idx); } +static void check_next_cpu(int next_cpu, const char *str1, const char *str2) +{ + if (next_cpu > nr_cpu_ids) + printk_ratelimited("wrong next_cpu %d, %s, %s\n", + next_cpu, str1, str2); +} + /* * It'd be great if the workqueue API had a way to pass * in a mask and had some smarts for more clever placement. @@ -1352,26 +1359,29 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx) static int blk_mq_hctx_next_cpu(struct blk_mq_hw_ctx *hctx) { bool tried = false; + int next_cpu = hctx->next_cpu; if (hctx->queue->nr_hw_queues == 1) return WORK_CPU_UNBOUND; if (--hctx->next_cpu_batch <= 0) { - int next_cpu; select_cpu: - next_cpu = cpumask_next_and(hctx->next_cpu, hctx->cpumask, + next_cpu = cpumask_next_and(next_cpu, hctx->cpumask, cpu_online_mask); - if (next_cpu >= nr_cpu_ids) + check_next_cpu(next_cpu, __func__, "next_and"); + if (next_cpu >= nr_cpu_ids) { next_cpu = cpumask_first_and(hctx->cpumask,cpu_online_mask); + check_next_cpu(next_cpu, __func__, "first_and"); + } /* * No online CPU is found, so have to make sure hctx->next_cpu * is set correctly for not breaking workqueue. 
*/ - if (next_cpu >= nr_cpu_ids) - hctx->next_cpu = cpumask_first(hctx->cpumask); - else - hctx->next_cpu = next_cpu; + if (next_cpu >= nr_cpu_ids) { + next_cpu = cpumask_first(hctx->cpumask); + check_next_cpu(next_cpu, __func__, "first"); + } hctx->next_cpu_batch = BLK_MQ_CPU_WORK_BATCH; } @@ -1379,7 +1389,7 @@ static int blk_mq_hctx_next_cpu(struct blk_mq_hw_ctx *hctx) * Do unbound schedule if we can't find a online CPU for this hctx, * and it should only happen in the path of handling CPU DEAD. */ - if (!cpu_online(hctx->next_cpu)) { + if (!cpu_online(next_cpu)) { if (!tried) { tried = true; goto select_cpu; @@ -1392,7 +1402,9 @@ static int blk_mq_hctx_next_cpu(struct blk_mq_hw_ctx *hctx) hctx->next_cpu_batch = 1; return WORK_CPU_UNBOUND; } - return hctx->next_cpu; + + hctx->next_cpu = next_cpu; + return next_cpu; } static void __blk_mq_delay_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async, @@ -2408,6 +2420,8 @@ static void blk_mq_map_swqueue(struct request_queue *q) mutex_unlock(&q->sysfs_lock); queue_for_each_hw_ctx(q, hctx, i) { + int next_cpu; + /* * If no software queues are mapped to this hardware queue, * disable it and free the request entries. @@ -2437,8 +2451,10 @@ static void blk_mq_map_swqueue(struct request_queue *q) /* * Initialize batch roundrobin counts */ - hctx->next_cpu = cpumask_first_and(hctx->cpumask, + next_cpu = cpumask_first_and(hctx->cpumask, cpu_online_mask); + check_next_cpu(next_cpu, __func__, "first_and"); + hctx->next_cpu = next_cpu; hctx->next_cpu_batch = BLK_MQ_CPU_WORK_BATCH; } } Thanks, Ming ^ permalink raw reply related [flat|nested] 40+ messages in thread
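One structural detail of the patch above is easy to miss: the search now works on a local next_cpu and writes it back to hctx->next_cpu only once, after the result has been checked. The sketch below is an editor's simplification of that "compute locally, publish once" pattern; the names and the stand-in validation are invented for illustration and are not the real blk-mq code.

#include <stdio.h>

/*
 * Editor's sketch: compute the candidate CPU in a local variable and
 * store it to the shared field only after validation, so a concurrent
 * reader of h->next_cpu never observes an out-of-range index.
 * select_candidate() is a stand-in for the real cpumask search.
 */
struct hctx {
        int next_cpu;
};

static int select_candidate(int prev, int nr_cpu_ids)
{
        int candidate = prev + 64;      /* stand-in for the search step */

        return candidate < nr_cpu_ids ? candidate : 0;  /* stand-in check */
}

static int pick_next_cpu(struct hctx *h, int nr_cpu_ids)
{
        int next_cpu = h->next_cpu;     /* work on a local copy */

        next_cpu = select_candidate(next_cpu, nr_cpu_ids);
        h->next_cpu = next_cpu;         /* publish only the validated value */
        return next_cpu;
}

int main(void)
{
        struct hctx h = { .next_cpu = 3 };

        printf("%d\n", pick_next_cpu(&h, 282));        /* prints 67 */
        return 0;
}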
* Re: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues() 2018-04-06 8:41 ` Ming Lei @ 2018-04-06 8:51 ` Christian Borntraeger 2018-04-06 8:53 ` Christian Borntraeger 2018-04-06 9:23 ` Ming Lei 0 siblings, 2 replies; 40+ messages in thread From: Christian Borntraeger @ 2018-04-06 8:51 UTC (permalink / raw) To: Ming Lei Cc: Jens Axboe, linux-block, Christoph Hellwig, Stefan Haberland, Christoph Hellwig On 04/06/2018 10:41 AM, Ming Lei wrote: > On Thu, Apr 05, 2018 at 07:39:56PM +0200, Christian Borntraeger wrote: >> >> >> On 04/05/2018 06:11 PM, Ming Lei wrote: >>>> >>>> Could you please apply the following patch and provide the dmesg boot log? >>> >>> And please post out the 'lscpu' log together from the test machine too. >> >> attached. >> >> As I said before this seems to go way with CONFIG_NR_CPUS=64 or smaller. >> We have 282 nr_cpu_ids here (max 141CPUs on that z13 with SMT2) but only 8 Cores >> == 16 threads. > > OK, thanks! > > The most weird thing is that hctx->next_cpu is computed as 512 since > nr_cpu_id is 282, and hctx->next_cpu should have pointed to one of > possible CPU. > > Looks like it is a s390 specific issue, since I can setup one queue > which has same mapping with yours: > > - nr_cpu_id is 282 > - CPU 0~15 is online > - 64 queues null_blk > - still run all hw queues in .complete handler > > But can't reproduce this issue at all. > > So please test the following patch, which may tell us why hctx->next_cpu > is computed wrong: I see things like [ 8.196907] wrong next_cpu 512, blk_mq_map_swqueue, first_and [ 8.196910] wrong next_cpu 512, blk_mq_map_swqueue, first_and [ 8.196912] wrong next_cpu 512, blk_mq_map_swqueue, first_and [ 8.196913] wrong next_cpu 512, blk_mq_map_swqueue, first_and [ 8.196914] wrong next_cpu 512, blk_mq_map_swqueue, first_and [ 8.196915] wrong next_cpu 512, blk_mq_map_swqueue, first_and [ 8.196916] wrong next_cpu 512, blk_mq_map_swqueue, first_and [ 8.196916] wrong next_cpu 512, blk_mq_map_swqueue, first_and [ 8.196917] wrong next_cpu 512, blk_mq_map_swqueue, first_and [ 8.196918] wrong next_cpu 512, blk_mq_map_swqueue, first_and which is exactly what happens if the find and and operation fails (returns size of bitmap). 
FWIW, I added a dump stack for the case when we run unbound before I tested your patch: Apr 06 10:47:41 s38lp39 kernel: CPU: 15 PID: 86 Comm: ksoftirqd/15 Not tainted 4.16.0-07249-g864f9fc031e4-dirty #2 Apr 06 10:47:41 s38lp39 kernel: Hardware name: IBM 2964 NC9 704 (LPAR) Apr 06 10:47:41 s38lp39 kernel: Call Trace: Apr 06 10:47:41 s38lp39 kernel: ([<0000000000113946>] show_stack+0x56/0x80) Apr 06 10:47:41 s38lp39 kernel: [<00000000009d8132>] dump_stack+0x82/0xb0 Apr 06 10:47:41 s38lp39 kernel: [<00000000006a05de>] blk_mq_hctx_next_cpu+0x12e/0x138 Apr 06 10:47:41 s38lp39 kernel: [<00000000006a084c>] __blk_mq_delay_run_hw_queue+0x94/0xd8 Apr 06 10:47:41 s38lp39 kernel: [<00000000006a097a>] blk_mq_run_hw_queue+0x82/0x180 Apr 06 10:47:41 s38lp39 kernel: [<00000000006a0ae0>] blk_mq_run_hw_queues+0x68/0x88 Apr 06 10:47:41 s38lp39 kernel: [<000000000069fc4e>] __blk_mq_complete_request+0x11e/0x1d8 Apr 06 10:47:41 s38lp39 kernel: [<000000000069fd94>] blk_mq_complete_request+0x8c/0xc8 Apr 06 10:47:41 s38lp39 kernel: [<0000000000824c50>] dasd_block_tasklet+0x158/0x490 Apr 06 10:47:41 s38lp39 kernel: [<000000000014a952>] tasklet_action_common.isra.5+0x7a/0x100 Apr 06 10:47:41 s38lp39 kernel: [<00000000009f8248>] __do_softirq+0x98/0x368 Apr 06 10:47:41 s38lp39 kernel: [<000000000014a322>] run_ksoftirqd+0x4a/0x68 Apr 06 10:47:41 s38lp39 kernel: [<000000000016dc20>] smpboot_thread_fn+0x108/0x1b0 Apr 06 10:47:41 s38lp39 kernel: [<0000000000168e70>] kthread+0x148/0x160 Apr 06 10:47:41 s38lp39 kernel: [<00000000009f727a>] kernel_thread_starter+0x6/0xc Apr 06 10:47:41 s38lp39 kernel: [<00000000009f7274>] kernel_thread_starter+0x0/0xc ^ permalink raw reply [flat|nested] 40+ messages in thread
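Why the sentinel shows up as 512 rather than as nr_cpu_ids (282) is not spelled out in the thread, so the following is an editor's assumption: with CONFIG_CPUMASK_OFFSTACK=n the cpumask helpers scan a fixed NR_CPUS-bit map (nr_cpumask_bits is NR_CPUS, not nr_cpu_ids), so a failed search returns NR_CPUS — 512 if that is what this kernel was built with — which satisfies ">= nr_cpu_ids" without being equal to it. A toy model of such a fixed-size scan:

#include <stdio.h>

/*
 * Editor's sketch: a failed search over a fixed 512-bit map returns
 * 512, matching the "wrong next_cpu 512" lines above even though
 * nr_cpu_ids is only 282. NR_CPUS=512 is an assumption about the
 * reporter's kernel config, not a fact from the thread.
 */
#define NR_CPUS 512

static int first_and_bit(const unsigned char *a, const unsigned char *b)
{
        int i;

        for (i = 0; i < NR_CPUS; i++)
                if ((a[i / 8] & b[i / 8]) & (1 << (i % 8)))
                        return i;
        return NR_CPUS; /* not found: 512, not nr_cpu_ids */
}

int main(void)
{
        unsigned char hctx_mask[NR_CPUS / 8] = { 0 };
        unsigned char online[NR_CPUS / 8] = { 0 };
        int i;

        hctx_mask[18 / 8] |= 1 << (18 % 8);     /* hctx mapped to CPU 18 only */
        for (i = 0; i < 16; i++)                /* CPUs 0..15 online */
                online[i / 8] |= 1 << (i % 8);

        printf("%d\n", first_and_bit(hctx_mask, online));       /* prints 512 */
        return 0;
}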
* Re: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues() 2018-04-06 8:51 ` Christian Borntraeger @ 2018-04-06 8:53 ` Christian Borntraeger 2018-04-06 9:23 ` Ming Lei 1 sibling, 0 replies; 40+ messages in thread From: Christian Borntraeger @ 2018-04-06 8:53 UTC (permalink / raw) To: Ming Lei Cc: Jens Axboe, linux-block, Christoph Hellwig, Stefan Haberland, Christoph Hellwig On 04/06/2018 10:51 AM, Christian Borntraeger wrote: > > > On 04/06/2018 10:41 AM, Ming Lei wrote: >> On Thu, Apr 05, 2018 at 07:39:56PM +0200, Christian Borntraeger wrote: >>> >>> >>> On 04/05/2018 06:11 PM, Ming Lei wrote: >>>>> >>>>> Could you please apply the following patch and provide the dmesg boot log? >>>> >>>> And please post out the 'lscpu' log together from the test machine too. >>> >>> attached. >>> >>> As I said before this seems to go way with CONFIG_NR_CPUS=64 or smaller. >>> We have 282 nr_cpu_ids here (max 141CPUs on that z13 with SMT2) but only 8 Cores >>> == 16 threads. >> >> OK, thanks! >> >> The most weird thing is that hctx->next_cpu is computed as 512 since >> nr_cpu_id is 282, and hctx->next_cpu should have pointed to one of >> possible CPU. >> >> Looks like it is a s390 specific issue, since I can setup one queue >> which has same mapping with yours: >> >> - nr_cpu_id is 282 >> - CPU 0~15 is online >> - 64 queues null_blk >> - still run all hw queues in .complete handler >> >> But can't reproduce this issue at all. >> >> So please test the following patch, which may tell us why hctx->next_cpu >> is computed wrong: > > I see things like > > [ 8.196907] wrong next_cpu 512, blk_mq_map_swqueue, first_and > [ 8.196910] wrong next_cpu 512, blk_mq_map_swqueue, first_and > [ 8.196912] wrong next_cpu 512, blk_mq_map_swqueue, first_and > [ 8.196913] wrong next_cpu 512, blk_mq_map_swqueue, first_and > [ 8.196914] wrong next_cpu 512, blk_mq_map_swqueue, first_and > [ 8.196915] wrong next_cpu 512, blk_mq_map_swqueue, first_and > [ 8.196916] wrong next_cpu 512, blk_mq_map_swqueue, first_and > [ 8.196916] wrong next_cpu 512, blk_mq_map_swqueue, first_and > [ 8.196917] wrong next_cpu 512, blk_mq_map_swqueue, first_and > [ 8.196918] wrong next_cpu 512, blk_mq_map_swqueue, first_and There are more # dmesg | grep "wrong next" | cut -d "]" -f 2- | uniq -c 10 wrong next_cpu 512, blk_mq_map_swqueue, first_and 72 wrong next_cpu 512, blk_mq_hctx_next_cpu, next_and 1 wrong next_cpu 512, blk_mq_hctx_next_cpu, first_and 1 wrong next_cpu 512, blk_mq_hctx_next_cpu, next_and 1 wrong next_cpu 512, blk_mq_hctx_next_cpu, first_and 1 wrong next_cpu 512, blk_mq_hctx_next_cpu, next_and 1 wrong next_cpu 512, blk_mq_hctx_next_cpu, first_and 1 wrong next_cpu 512, blk_mq_hctx_next_cpu, next_and 1 wrong next_cpu 512, blk_mq_hctx_next_cpu, first_and 1 wrong next_cpu 512, blk_mq_hctx_next_cpu, next_and 1 wrong next_cpu 512, blk_mq_hctx_next_cpu, first_and 1 wrong next_cpu 512, blk_mq_hctx_next_cpu, next_and 1 wrong next_cpu 512, blk_mq_hctx_next_cpu, first_and 7 wrong next_cpu 512, blk_mq_hctx_next_cpu, next_and 1 wrong next_cpu 512, blk_mq_hctx_next_cpu, first_and 1 wrong next_cpu 512, blk_mq_hctx_next_cpu, next_and 1 wrong next_cpu 512, blk_mq_hctx_next_cpu, first_and 10 wrong next_cpu 512, blk_mq_hctx_next_cpu, next_and > > which is exactly what happens if the find and and operation fails (returns size of bitmap). 
> > FWIW, I added a dump stack for the case when we run unbound before I tested your patch: > > Apr 06 10:47:41 s38lp39 kernel: CPU: 15 PID: 86 Comm: ksoftirqd/15 Not tainted 4.16.0-07249-g864f9fc031e4-dirty #2 > Apr 06 10:47:41 s38lp39 kernel: Hardware name: IBM 2964 NC9 704 (LPAR) > Apr 06 10:47:41 s38lp39 kernel: Call Trace: > Apr 06 10:47:41 s38lp39 kernel: ([<0000000000113946>] show_stack+0x56/0x80) > Apr 06 10:47:41 s38lp39 kernel: [<00000000009d8132>] dump_stack+0x82/0xb0 > Apr 06 10:47:41 s38lp39 kernel: [<00000000006a05de>] blk_mq_hctx_next_cpu+0x12e/0x138 > Apr 06 10:47:41 s38lp39 kernel: [<00000000006a084c>] __blk_mq_delay_run_hw_queue+0x94/0xd8 > Apr 06 10:47:41 s38lp39 kernel: [<00000000006a097a>] blk_mq_run_hw_queue+0x82/0x180 > Apr 06 10:47:41 s38lp39 kernel: [<00000000006a0ae0>] blk_mq_run_hw_queues+0x68/0x88 > Apr 06 10:47:41 s38lp39 kernel: [<000000000069fc4e>] __blk_mq_complete_request+0x11e/0x1d8 > Apr 06 10:47:41 s38lp39 kernel: [<000000000069fd94>] blk_mq_complete_request+0x8c/0xc8 > Apr 06 10:47:41 s38lp39 kernel: [<0000000000824c50>] dasd_block_tasklet+0x158/0x490 > Apr 06 10:47:41 s38lp39 kernel: [<000000000014a952>] tasklet_action_common.isra.5+0x7a/0x100 > Apr 06 10:47:41 s38lp39 kernel: [<00000000009f8248>] __do_softirq+0x98/0x368 > Apr 06 10:47:41 s38lp39 kernel: [<000000000014a322>] run_ksoftirqd+0x4a/0x68 > Apr 06 10:47:41 s38lp39 kernel: [<000000000016dc20>] smpboot_thread_fn+0x108/0x1b0 > Apr 06 10:47:41 s38lp39 kernel: [<0000000000168e70>] kthread+0x148/0x160 > Apr 06 10:47:41 s38lp39 kernel: [<00000000009f727a>] kernel_thread_starter+0x6/0xc > Apr 06 10:47:41 s38lp39 kernel: [<00000000009f7274>] kernel_thread_starter+0x0/0xc > ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues() 2018-04-06 8:51 ` Christian Borntraeger 2018-04-06 8:53 ` Christian Borntraeger @ 2018-04-06 9:23 ` Ming Lei 2018-04-06 10:19 ` Christian Borntraeger 2018-04-06 11:37 ` Christian Borntraeger 1 sibling, 2 replies; 40+ messages in thread From: Ming Lei @ 2018-04-06 9:23 UTC (permalink / raw) To: Christian Borntraeger Cc: Jens Axboe, linux-block, Christoph Hellwig, Stefan Haberland, Christoph Hellwig On Fri, Apr 06, 2018 at 10:51:28AM +0200, Christian Borntraeger wrote: > > > On 04/06/2018 10:41 AM, Ming Lei wrote: > > On Thu, Apr 05, 2018 at 07:39:56PM +0200, Christian Borntraeger wrote: > >> > >> > >> On 04/05/2018 06:11 PM, Ming Lei wrote: > >>>> > >>>> Could you please apply the following patch and provide the dmesg boot log? > >>> > >>> And please post out the 'lscpu' log together from the test machine too. > >> > >> attached. > >> > >> As I said before this seems to go away with CONFIG_NR_CPUS=64 or smaller. > >> We have 282 nr_cpu_ids here (max 141 CPUs on that z13 with SMT2) but only 8 Cores > >> == 16 threads. > > > > OK, thanks! > > > > The most weird thing is that hctx->next_cpu is computed as 512 since > > nr_cpu_id is 282, and hctx->next_cpu should have pointed to one of > > possible CPU. > > > > Looks like it is a s390 specific issue, since I can setup one queue > > which has same mapping with yours: > > > > - nr_cpu_id is 282 > > - CPU 0~15 is online > > - 64 queues null_blk > > - still run all hw queues in .complete handler > > > > But can't reproduce this issue at all. > > > > So please test the following patch, which may tell us why hctx->next_cpu > > is computed wrong: > > I see things like > > [ 8.196907] wrong next_cpu 512, blk_mq_map_swqueue, first_and > [ 8.196910] wrong next_cpu 512, blk_mq_map_swqueue, first_and > [ 8.196912] wrong next_cpu 512, blk_mq_map_swqueue, first_and > [ 8.196913] wrong next_cpu 512, blk_mq_map_swqueue, first_and > [ 8.196914] wrong next_cpu 512, blk_mq_map_swqueue, first_and > [ 8.196915] wrong next_cpu 512, blk_mq_map_swqueue, first_and > [ 8.196916] wrong next_cpu 512, blk_mq_map_swqueue, first_and > [ 8.196916] wrong next_cpu 512, blk_mq_map_swqueue, first_and > [ 8.196917] wrong next_cpu 512, blk_mq_map_swqueue, first_and > [ 8.196918] wrong next_cpu 512, blk_mq_map_swqueue, first_and > > which is exactly what happens if the find and and operation fails (returns size of bitmap). Given both 'cpu_online_mask' and 'hctx->cpumask' are shown as correct in your previous debug log, it means the following function returns a totally wrong result on S390: cpumask_first_and(hctx->cpumask, cpu_online_mask); The debugfs log shows that each hctx->cpumask includes one online CPU (0~15). So it looks like this isn't an issue in the blk-mq core. Thanks, Ming ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues() 2018-04-06 9:23 ` Ming Lei @ 2018-04-06 10:19 ` Christian Borntraeger 2018-04-06 13:41 ` Ming Lei 2018-04-06 11:37 ` Christian Borntraeger 1 sibling, 1 reply; 40+ messages in thread From: Christian Borntraeger @ 2018-04-06 10:19 UTC (permalink / raw) To: Ming Lei Cc: Jens Axboe, linux-block, Christoph Hellwig, Stefan Haberland, Christoph Hellwig On 04/06/2018 11:23 AM, Ming Lei wrote: > On Fri, Apr 06, 2018 at 10:51:28AM +0200, Christian Borntraeger wrote: >> >> >> On 04/06/2018 10:41 AM, Ming Lei wrote: >>> On Thu, Apr 05, 2018 at 07:39:56PM +0200, Christian Borntraeger wrote: >>>> >>>> >>>> On 04/05/2018 06:11 PM, Ming Lei wrote: >>>>>> >>>>>> Could you please apply the following patch and provide the dmesg boot log? >>>>> >>>>> And please post out the 'lscpu' log together from the test machine too. >>>> >>>> attached. >>>> >>>> As I said before this seems to go way with CONFIG_NR_CPUS=64 or smaller. >>>> We have 282 nr_cpu_ids here (max 141CPUs on that z13 with SMT2) but only 8 Cores >>>> == 16 threads. >>> >>> OK, thanks! >>> >>> The most weird thing is that hctx->next_cpu is computed as 512 since >>> nr_cpu_id is 282, and hctx->next_cpu should have pointed to one of >>> possible CPU. >>> >>> Looks like it is a s390 specific issue, since I can setup one queue >>> which has same mapping with yours: >>> >>> - nr_cpu_id is 282 >>> - CPU 0~15 is online >>> - 64 queues null_blk >>> - still run all hw queues in .complete handler >>> >>> But can't reproduce this issue at all. >>> >>> So please test the following patch, which may tell us why hctx->next_cpu >>> is computed wrong: >> >> I see things like >> >> [ 8.196907] wrong next_cpu 512, blk_mq_map_swqueue, first_and >> [ 8.196910] wrong next_cpu 512, blk_mq_map_swqueue, first_and >> [ 8.196912] wrong next_cpu 512, blk_mq_map_swqueue, first_and >> [ 8.196913] wrong next_cpu 512, blk_mq_map_swqueue, first_and >> [ 8.196914] wrong next_cpu 512, blk_mq_map_swqueue, first_and >> [ 8.196915] wrong next_cpu 512, blk_mq_map_swqueue, first_and >> [ 8.196916] wrong next_cpu 512, blk_mq_map_swqueue, first_and >> [ 8.196916] wrong next_cpu 512, blk_mq_map_swqueue, first_and >> [ 8.196917] wrong next_cpu 512, blk_mq_map_swqueue, first_and >> [ 8.196918] wrong next_cpu 512, blk_mq_map_swqueue, first_and >> >> which is exactly what happens if the find and and operation fails (returns size of bitmap). > > Given both 'cpu_online_mask' and 'hctx->cpumask' are shown as correct > in your previous debug log, it means the following function returns > totally wrong result on S390. > > cpumask_first_and(hctx->cpumask, cpu_online_mask); > > The debugfs log shows that each hctx->cpumask includes one online > CPU(0~15). Really? the last log (with the latest patch applied shows a lot of contexts that do not have CPUs in 0-15: e.g. [ 4.049828] dump CPUs mapped to this hctx: [ 4.049829] 18 [ 4.049829] 82 [ 4.049830] 146 [ 4.049830] 210 [ 4.049831] 274 > > So looks it isn't one issue in block MQ core. > > Thanks, > Ming > ^ permalink raw reply [flat|nested] 40+ messages in thread
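Christian's dump is consistent with the cpu % nr_queues mapping rather than with a corrupted mask: 18, 82, 146, 210 and 274 are exactly the possible CPUs congruent to 18 mod 64, and all of them happen to be offline on this machine. An editor's sketch with the thread's numbers shows how many hw queues end up in that state:

#include <stdio.h>

/*
 * Editor's sketch: with 282 possible CPUs, only CPUs 0..15 online and
 * the "queue = cpu % 64" mapping, count the hw queues whose cpumask
 * contains no online CPU at all. Those are the inactive hctxs for
 * which only the cpumask_first() fallback can produce a next_cpu.
 */
int main(void)
{
        const int nr_cpu_ids = 282, nr_online = 16, nr_queues = 64;
        int q, cpu, inactive = 0;

        for (q = 0; q < nr_queues; q++) {
                int online_in_q = 0;

                for (cpu = 0; cpu < nr_cpu_ids; cpu++)
                        if (cpu % nr_queues == q && cpu < nr_online)
                                online_in_q++;
                if (!online_in_q)
                        inactive++;
        }
        printf("%d of %d hctxs have no online CPU\n", inactive, nr_queues);
        /* prints: 48 of 64 hctxs have no online CPU */
        return 0;
}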
* Re: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues() 2018-04-06 10:19 ` Christian Borntraeger @ 2018-04-06 13:41 ` Ming Lei 2018-04-06 14:26 ` Christian Borntraeger 0 siblings, 1 reply; 40+ messages in thread From: Ming Lei @ 2018-04-06 13:41 UTC (permalink / raw) To: Christian Borntraeger Cc: Jens Axboe, linux-block, Christoph Hellwig, Stefan Haberland, Christoph Hellwig On Fri, Apr 06, 2018 at 12:19:19PM +0200, Christian Borntraeger wrote: > > > On 04/06/2018 11:23 AM, Ming Lei wrote: > > On Fri, Apr 06, 2018 at 10:51:28AM +0200, Christian Borntraeger wrote: > >> > >> > >> On 04/06/2018 10:41 AM, Ming Lei wrote: > >>> On Thu, Apr 05, 2018 at 07:39:56PM +0200, Christian Borntraeger wrote: > >>>> > >>>> > >>>> On 04/05/2018 06:11 PM, Ming Lei wrote: > >>>>>> > >>>>>> Could you please apply the following patch and provide the dmesg boot log? > >>>>> > >>>>> And please post out the 'lscpu' log together from the test machine too. > >>>> > >>>> attached. > >>>> > >>>> As I said before this seems to go way with CONFIG_NR_CPUS=64 or smaller. > >>>> We have 282 nr_cpu_ids here (max 141CPUs on that z13 with SMT2) but only 8 Cores > >>>> == 16 threads. > >>> > >>> OK, thanks! > >>> > >>> The most weird thing is that hctx->next_cpu is computed as 512 since > >>> nr_cpu_id is 282, and hctx->next_cpu should have pointed to one of > >>> possible CPU. > >>> > >>> Looks like it is a s390 specific issue, since I can setup one queue > >>> which has same mapping with yours: > >>> > >>> - nr_cpu_id is 282 > >>> - CPU 0~15 is online > >>> - 64 queues null_blk > >>> - still run all hw queues in .complete handler > >>> > >>> But can't reproduce this issue at all. > >>> > >>> So please test the following patch, which may tell us why hctx->next_cpu > >>> is computed wrong: > >> > >> I see things like > >> > >> [ 8.196907] wrong next_cpu 512, blk_mq_map_swqueue, first_and > >> [ 8.196910] wrong next_cpu 512, blk_mq_map_swqueue, first_and > >> [ 8.196912] wrong next_cpu 512, blk_mq_map_swqueue, first_and > >> [ 8.196913] wrong next_cpu 512, blk_mq_map_swqueue, first_and > >> [ 8.196914] wrong next_cpu 512, blk_mq_map_swqueue, first_and > >> [ 8.196915] wrong next_cpu 512, blk_mq_map_swqueue, first_and > >> [ 8.196916] wrong next_cpu 512, blk_mq_map_swqueue, first_and > >> [ 8.196916] wrong next_cpu 512, blk_mq_map_swqueue, first_and > >> [ 8.196917] wrong next_cpu 512, blk_mq_map_swqueue, first_and > >> [ 8.196918] wrong next_cpu 512, blk_mq_map_swqueue, first_and > >> > >> which is exactly what happens if the find and and operation fails (returns size of bitmap). > > > > Given both 'cpu_online_mask' and 'hctx->cpumask' are shown as correct > > in your previous debug log, it means the following function returns > > totally wrong result on S390. > > > > cpumask_first_and(hctx->cpumask, cpu_online_mask); > > > > The debugfs log shows that each hctx->cpumask includes one online > > CPU(0~15). > > Really? the last log (with the latest patch applied shows a lot of contexts > that do not have CPUs in 0-15: > > e.g. > [ 4.049828] dump CPUs mapped to this hctx: > [ 4.049829] 18 > [ 4.049829] 82 > [ 4.049830] 146 > [ 4.049830] 210 > [ 4.049831] 274 That won't be an issue, since no IO can be submitted from these offline CPUs, then these hctx shouldn't have been run at all. But hctx->next_cpu can be set as 512 for these inactive hctx in blk_mq_map_swqueue(), then please test the attached patch, and if hctx->next_cpu is still set as 512, something is still wrong. 
--- diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c index 9f8cffc8a701..638ab5c11b3c 100644 --- a/block/blk-mq-cpumap.c +++ b/block/blk-mq-cpumap.c @@ -14,13 +14,12 @@ #include "blk.h" #include "blk-mq.h" +/* + * Given there isn't CPU hotplug handler in blk-mq, map all CPUs to + * queues even it isn't present yet. + */ static int cpu_to_queue_index(unsigned int nr_queues, const int cpu) { - /* - * Non present CPU will be mapped to queue index 0. - */ - if (!cpu_present(cpu)) - return 0; return cpu % nr_queues; } diff --git a/block/blk-mq.c b/block/blk-mq.c index 90838e998f66..1a834d96a718 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -1343,6 +1343,13 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx) hctx_unlock(hctx, srcu_idx); } +static void check_next_cpu(int next_cpu, const char *str1, const char *str2) +{ + if (next_cpu > nr_cpu_ids) + printk_ratelimited("wrong next_cpu %d, %s, %s\n", + next_cpu, str1, str2); +} + /* * It'd be great if the workqueue API had a way to pass * in a mask and had some smarts for more clever placement. @@ -1352,26 +1359,29 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx) static int blk_mq_hctx_next_cpu(struct blk_mq_hw_ctx *hctx) { bool tried = false; + int next_cpu = hctx->next_cpu; if (hctx->queue->nr_hw_queues == 1) return WORK_CPU_UNBOUND; if (--hctx->next_cpu_batch <= 0) { - int next_cpu; select_cpu: - next_cpu = cpumask_next_and(hctx->next_cpu, hctx->cpumask, + next_cpu = cpumask_next_and(next_cpu, hctx->cpumask, cpu_online_mask); - if (next_cpu >= nr_cpu_ids) + check_next_cpu(next_cpu, __func__, "next_and"); + if (next_cpu >= nr_cpu_ids) { next_cpu = cpumask_first_and(hctx->cpumask,cpu_online_mask); + check_next_cpu(next_cpu, __func__, "first_and"); + } /* * No online CPU is found, so have to make sure hctx->next_cpu * is set correctly for not breaking workqueue. */ - if (next_cpu >= nr_cpu_ids) - hctx->next_cpu = cpumask_first(hctx->cpumask); - else - hctx->next_cpu = next_cpu; + if (next_cpu >= nr_cpu_ids) { + next_cpu = cpumask_first(hctx->cpumask); + check_next_cpu(next_cpu, __func__, "first"); + } hctx->next_cpu_batch = BLK_MQ_CPU_WORK_BATCH; } @@ -1379,7 +1389,7 @@ static int blk_mq_hctx_next_cpu(struct blk_mq_hw_ctx *hctx) * Do unbound schedule if we can't find a online CPU for this hctx, * and it should only happen in the path of handling CPU DEAD. */ - if (!cpu_online(hctx->next_cpu)) { + if (!cpu_online(next_cpu)) { if (!tried) { tried = true; goto select_cpu; @@ -1392,7 +1402,9 @@ static int blk_mq_hctx_next_cpu(struct blk_mq_hw_ctx *hctx) hctx->next_cpu_batch = 1; return WORK_CPU_UNBOUND; } - return hctx->next_cpu; + + hctx->next_cpu = next_cpu; + return next_cpu; } static void __blk_mq_delay_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async, @@ -2408,6 +2420,8 @@ static void blk_mq_map_swqueue(struct request_queue *q) mutex_unlock(&q->sysfs_lock); queue_for_each_hw_ctx(q, hctx, i) { + int next_cpu; + /* * If no software queues are mapped to this hardware queue, * disable it and free the request entries. 
@@ -2437,8 +2451,12 @@ static void blk_mq_map_swqueue(struct request_queue *q) /* * Initialize batch roundrobin counts */ - hctx->next_cpu = cpumask_first_and(hctx->cpumask, + next_cpu = cpumask_first_and(hctx->cpumask, cpu_online_mask); + if (next_cpu >= nr_cpu_ids) + next_cpu = cpumask_first(hctx->cpumask); + check_next_cpu(next_cpu, __func__, "first_and"); + hctx->next_cpu = next_cpu; hctx->next_cpu_batch = BLK_MQ_CPU_WORK_BATCH; } } Thanks, Ming ^ permalink raw reply related [flat|nested] 40+ messages in thread
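To follow what the patch above changes, here is a compressed userspace model of the new selection order: next_and, then first_and as the wrap-around, then a plain first as the last resort, with the candidate kept in a local variable so that an invalid value (>= nr_cpu_ids) is never written back. It is only a sketch with invented helper names; the batch counter, the WORK_CPU_UNBOUND retry, and the single-queue shortcut of the real blk_mq_hctx_next_cpu() are left out, and the topology (512-bit map, online CPUs 0-15, an hctx mapped only to offline CPUs 18 and 82) is assumed from this thread.

#include <stdio.h>
#include <stdbool.h>

#define NR_CPUS	512

static bool hctx_cpu[NR_CPUS], online[NR_CPUS];

/* next CPU set in both hctx mask and online mask after 'prev', or NR_CPUS */
static int next_and(int prev)
{
	for (int cpu = prev + 1; cpu < NR_CPUS; cpu++)
		if (hctx_cpu[cpu] && online[cpu])
			return cpu;
	return NR_CPUS;
}

static int first_and(void)
{
	return next_and(-1);
}

/* first possible CPU in the hctx mask, online or not */
static int first(void)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (hctx_cpu[cpu])
			return cpu;
	return NR_CPUS;
}

/* patched flow: the candidate stays in a local until it is known-good */
static int pick_next_cpu(int cur)
{
	int next = next_and(cur);

	if (next >= NR_CPUS)
		next = first_and();
	if (next >= NR_CPUS)	/* no online CPU mapped at all */
		next = first();	/* keep the stored value valid for the workqueue */
	return next;
}

int main(void)
{
	/* assumed topology from the thread: only offline possible CPUs mapped */
	hctx_cpu[18] = hctx_cpu[82] = true;
	for (int cpu = 0; cpu <= 15; cpu++)
		online[cpu] = true;

	/* the old code could store 512 here; the patched flow yields 18 */
	printf("next_cpu = %d\n", pick_next_cpu(18));
	return 0;
}

The design point of the patch is visible in the last fallback: even when no online CPU exists in the mask, the stored next_cpu must remain a valid CPU index, because it is later handed to the workqueue machinery.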
* Re: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues() 2018-04-06 13:41 ` Ming Lei @ 2018-04-06 14:26 ` Christian Borntraeger 2018-04-06 14:58 ` Ming Lei 0 siblings, 1 reply; 40+ messages in thread From: Christian Borntraeger @ 2018-04-06 14:26 UTC (permalink / raw) To: Ming Lei Cc: Jens Axboe, linux-block, Christoph Hellwig, Stefan Haberland, Christoph Hellwig On 04/06/2018 03:41 PM, Ming Lei wrote: > On Fri, Apr 06, 2018 at 12:19:19PM +0200, Christian Borntraeger wrote: >> >> >> On 04/06/2018 11:23 AM, Ming Lei wrote: >>> On Fri, Apr 06, 2018 at 10:51:28AM +0200, Christian Borntraeger wrote: >>>> >>>> >>>> On 04/06/2018 10:41 AM, Ming Lei wrote: >>>>> On Thu, Apr 05, 2018 at 07:39:56PM +0200, Christian Borntraeger wrote: >>>>>> >>>>>> >>>>>> On 04/05/2018 06:11 PM, Ming Lei wrote: >>>>>>>> >>>>>>>> Could you please apply the following patch and provide the dmesg boot log? >>>>>>> >>>>>>> And please post out the 'lscpu' log together from the test machine too. >>>>>> >>>>>> attached. >>>>>> >>>>>> As I said before this seems to go way with CONFIG_NR_CPUS=64 or smaller. >>>>>> We have 282 nr_cpu_ids here (max 141CPUs on that z13 with SMT2) but only 8 Cores >>>>>> == 16 threads. >>>>> >>>>> OK, thanks! >>>>> >>>>> The most weird thing is that hctx->next_cpu is computed as 512 since >>>>> nr_cpu_id is 282, and hctx->next_cpu should have pointed to one of >>>>> possible CPU. >>>>> >>>>> Looks like it is a s390 specific issue, since I can setup one queue >>>>> which has same mapping with yours: >>>>> >>>>> - nr_cpu_id is 282 >>>>> - CPU 0~15 is online >>>>> - 64 queues null_blk >>>>> - still run all hw queues in .complete handler >>>>> >>>>> But can't reproduce this issue at all. >>>>> >>>>> So please test the following patch, which may tell us why hctx->next_cpu >>>>> is computed wrong: >>>> >>>> I see things like >>>> >>>> [ 8.196907] wrong next_cpu 512, blk_mq_map_swqueue, first_and >>>> [ 8.196910] wrong next_cpu 512, blk_mq_map_swqueue, first_and >>>> [ 8.196912] wrong next_cpu 512, blk_mq_map_swqueue, first_and >>>> [ 8.196913] wrong next_cpu 512, blk_mq_map_swqueue, first_and >>>> [ 8.196914] wrong next_cpu 512, blk_mq_map_swqueue, first_and >>>> [ 8.196915] wrong next_cpu 512, blk_mq_map_swqueue, first_and >>>> [ 8.196916] wrong next_cpu 512, blk_mq_map_swqueue, first_and >>>> [ 8.196916] wrong next_cpu 512, blk_mq_map_swqueue, first_and >>>> [ 8.196917] wrong next_cpu 512, blk_mq_map_swqueue, first_and >>>> [ 8.196918] wrong next_cpu 512, blk_mq_map_swqueue, first_and >>>> >>>> which is exactly what happens if the find and and operation fails (returns size of bitmap). >>> >>> Given both 'cpu_online_mask' and 'hctx->cpumask' are shown as correct >>> in your previous debug log, it means the following function returns >>> totally wrong result on S390. >>> >>> cpumask_first_and(hctx->cpumask, cpu_online_mask); >>> >>> The debugfs log shows that each hctx->cpumask includes one online >>> CPU(0~15). >> >> Really? the last log (with the latest patch applied shows a lot of contexts >> that do not have CPUs in 0-15: >> >> e.g. >> [ 4.049828] dump CPUs mapped to this hctx: >> [ 4.049829] 18 >> [ 4.049829] 82 >> [ 4.049830] 146 >> [ 4.049830] 210 >> [ 4.049831] 274 > > That won't be an issue, since no IO can be submitted from these offline > CPUs, then these hctx shouldn't have been run at all. 
> > But hctx->next_cpu can be set as 512 for these inactive hctx in > blk_mq_map_swqueue(), then please test the attached patch, and if > hctx->next_cpu is still set as 512, something is still wrong. WIth this patch I no longer see the "run queue from wrong CPU x, hctx active" messages. your debug code still triggers, though. wrong next_cpu 512, blk_mq_hctx_next_cpu, first_and wrong next_cpu 512, blk_mq_hctx_next_cpu, next_and If we would remove the debug code then dmesg would be clean it seems. > --- > > diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c > index 9f8cffc8a701..638ab5c11b3c 100644 > --- a/block/blk-mq-cpumap.c > +++ b/block/blk-mq-cpumap.c > @@ -14,13 +14,12 @@ > #include "blk.h" > #include "blk-mq.h" > > +/* > + * Given there isn't CPU hotplug handler in blk-mq, map all CPUs to > + * queues even it isn't present yet. > + */ > static int cpu_to_queue_index(unsigned int nr_queues, const int cpu) > { > - /* > - * Non present CPU will be mapped to queue index 0. > - */ > - if (!cpu_present(cpu)) > - return 0; > return cpu % nr_queues; > } > > diff --git a/block/blk-mq.c b/block/blk-mq.c > index 90838e998f66..1a834d96a718 100644 > --- a/block/blk-mq.c > +++ b/block/blk-mq.c > @@ -1343,6 +1343,13 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx) > hctx_unlock(hctx, srcu_idx); > } > > +static void check_next_cpu(int next_cpu, const char *str1, const char *str2) > +{ > + if (next_cpu > nr_cpu_ids) > + printk_ratelimited("wrong next_cpu %d, %s, %s\n", > + next_cpu, str1, str2); > +} > + > /* > * It'd be great if the workqueue API had a way to pass > * in a mask and had some smarts for more clever placement. > @@ -1352,26 +1359,29 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx) > static int blk_mq_hctx_next_cpu(struct blk_mq_hw_ctx *hctx) > { > bool tried = false; > + int next_cpu = hctx->next_cpu; > > if (hctx->queue->nr_hw_queues == 1) > return WORK_CPU_UNBOUND; > > if (--hctx->next_cpu_batch <= 0) { > - int next_cpu; > select_cpu: > - next_cpu = cpumask_next_and(hctx->next_cpu, hctx->cpumask, > + next_cpu = cpumask_next_and(next_cpu, hctx->cpumask, > cpu_online_mask); > - if (next_cpu >= nr_cpu_ids) > + check_next_cpu(next_cpu, __func__, "next_and"); > + if (next_cpu >= nr_cpu_ids) { > next_cpu = cpumask_first_and(hctx->cpumask,cpu_online_mask); > + check_next_cpu(next_cpu, __func__, "first_and"); > + } > > /* > * No online CPU is found, so have to make sure hctx->next_cpu > * is set correctly for not breaking workqueue. > */ > - if (next_cpu >= nr_cpu_ids) > - hctx->next_cpu = cpumask_first(hctx->cpumask); > - else > - hctx->next_cpu = next_cpu; > + if (next_cpu >= nr_cpu_ids) { > + next_cpu = cpumask_first(hctx->cpumask); > + check_next_cpu(next_cpu, __func__, "first"); > + } > hctx->next_cpu_batch = BLK_MQ_CPU_WORK_BATCH; > } > > @@ -1379,7 +1389,7 @@ static int blk_mq_hctx_next_cpu(struct blk_mq_hw_ctx *hctx) > * Do unbound schedule if we can't find a online CPU for this hctx, > * and it should only happen in the path of handling CPU DEAD. 
> */ > - if (!cpu_online(hctx->next_cpu)) { > + if (!cpu_online(next_cpu)) { > if (!tried) { > tried = true; > goto select_cpu; > @@ -1392,7 +1402,9 @@ static int blk_mq_hctx_next_cpu(struct blk_mq_hw_ctx *hctx) > hctx->next_cpu_batch = 1; > return WORK_CPU_UNBOUND; > } > - return hctx->next_cpu; > + > + hctx->next_cpu = next_cpu; > + return next_cpu; > } > > static void __blk_mq_delay_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async, > @@ -2408,6 +2420,8 @@ static void blk_mq_map_swqueue(struct request_queue *q) > mutex_unlock(&q->sysfs_lock); > > queue_for_each_hw_ctx(q, hctx, i) { > + int next_cpu; > + > /* > * If no software queues are mapped to this hardware queue, > * disable it and free the request entries. > @@ -2437,8 +2451,12 @@ static void blk_mq_map_swqueue(struct request_queue *q) > /* > * Initialize batch roundrobin counts > */ > - hctx->next_cpu = cpumask_first_and(hctx->cpumask, > + next_cpu = cpumask_first_and(hctx->cpumask, > cpu_online_mask); > + if (next_cpu >= nr_cpu_ids) > + next_cpu = cpumask_first(hctx->cpumask); > + check_next_cpu(next_cpu, __func__, "first_and"); > + hctx->next_cpu = next_cpu; > hctx->next_cpu_batch = BLK_MQ_CPU_WORK_BATCH; > } > } > Thanks, > Ming > ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues() 2018-04-06 14:26 ` Christian Borntraeger @ 2018-04-06 14:58 ` Ming Lei 2018-04-06 15:11 ` Christian Borntraeger 0 siblings, 1 reply; 40+ messages in thread From: Ming Lei @ 2018-04-06 14:58 UTC (permalink / raw) To: Christian Borntraeger Cc: Jens Axboe, linux-block, Christoph Hellwig, Stefan Haberland, Christoph Hellwig On Fri, Apr 06, 2018 at 04:26:49PM +0200, Christian Borntraeger wrote: > > > On 04/06/2018 03:41 PM, Ming Lei wrote: > > On Fri, Apr 06, 2018 at 12:19:19PM +0200, Christian Borntraeger wrote: > >> > >> > >> On 04/06/2018 11:23 AM, Ming Lei wrote: > >>> On Fri, Apr 06, 2018 at 10:51:28AM +0200, Christian Borntraeger wrote: > >>>> > >>>> > >>>> On 04/06/2018 10:41 AM, Ming Lei wrote: > >>>>> On Thu, Apr 05, 2018 at 07:39:56PM +0200, Christian Borntraeger wrote: > >>>>>> > >>>>>> > >>>>>> On 04/05/2018 06:11 PM, Ming Lei wrote: > >>>>>>>> > >>>>>>>> Could you please apply the following patch and provide the dmesg boot log? > >>>>>>> > >>>>>>> And please post out the 'lscpu' log together from the test machine too. > >>>>>> > >>>>>> attached. > >>>>>> > >>>>>> As I said before this seems to go way with CONFIG_NR_CPUS=64 or smaller. > >>>>>> We have 282 nr_cpu_ids here (max 141CPUs on that z13 with SMT2) but only 8 Cores > >>>>>> == 16 threads. > >>>>> > >>>>> OK, thanks! > >>>>> > >>>>> The most weird thing is that hctx->next_cpu is computed as 512 since > >>>>> nr_cpu_id is 282, and hctx->next_cpu should have pointed to one of > >>>>> possible CPU. > >>>>> > >>>>> Looks like it is a s390 specific issue, since I can setup one queue > >>>>> which has same mapping with yours: > >>>>> > >>>>> - nr_cpu_id is 282 > >>>>> - CPU 0~15 is online > >>>>> - 64 queues null_blk > >>>>> - still run all hw queues in .complete handler > >>>>> > >>>>> But can't reproduce this issue at all. > >>>>> > >>>>> So please test the following patch, which may tell us why hctx->next_cpu > >>>>> is computed wrong: > >>>> > >>>> I see things like > >>>> > >>>> [ 8.196907] wrong next_cpu 512, blk_mq_map_swqueue, first_and > >>>> [ 8.196910] wrong next_cpu 512, blk_mq_map_swqueue, first_and > >>>> [ 8.196912] wrong next_cpu 512, blk_mq_map_swqueue, first_and > >>>> [ 8.196913] wrong next_cpu 512, blk_mq_map_swqueue, first_and > >>>> [ 8.196914] wrong next_cpu 512, blk_mq_map_swqueue, first_and > >>>> [ 8.196915] wrong next_cpu 512, blk_mq_map_swqueue, first_and > >>>> [ 8.196916] wrong next_cpu 512, blk_mq_map_swqueue, first_and > >>>> [ 8.196916] wrong next_cpu 512, blk_mq_map_swqueue, first_and > >>>> [ 8.196917] wrong next_cpu 512, blk_mq_map_swqueue, first_and > >>>> [ 8.196918] wrong next_cpu 512, blk_mq_map_swqueue, first_and > >>>> > >>>> which is exactly what happens if the find and and operation fails (returns size of bitmap). > >>> > >>> Given both 'cpu_online_mask' and 'hctx->cpumask' are shown as correct > >>> in your previous debug log, it means the following function returns > >>> totally wrong result on S390. > >>> > >>> cpumask_first_and(hctx->cpumask, cpu_online_mask); > >>> > >>> The debugfs log shows that each hctx->cpumask includes one online > >>> CPU(0~15). > >> > >> Really? the last log (with the latest patch applied shows a lot of contexts > >> that do not have CPUs in 0-15: > >> > >> e.g. 
> >> [ 4.049828] dump CPUs mapped to this hctx: > >> [ 4.049829] 18 > >> [ 4.049829] 82 > >> [ 4.049830] 146 > >> [ 4.049830] 210 > >> [ 4.049831] 274 > > > > That won't be an issue, since no IO can be submitted from these offline > > CPUs, then these hctx shouldn't have been run at all. > > > > But hctx->next_cpu can be set as 512 for these inactive hctx in > > blk_mq_map_swqueue(), then please test the attached patch, and if > > hctx->next_cpu is still set as 512, something is still wrong. > > > WIth this patch I no longer see the "run queue from wrong CPU x, hctx active" messages. > your debug code still triggers, though. > > wrong next_cpu 512, blk_mq_hctx_next_cpu, first_and > wrong next_cpu 512, blk_mq_hctx_next_cpu, next_and > > If we would remove the debug code then dmesg would be clean it seems. That is still a bit strange: for any inactive hctx (one without an online CPU mapped), blk_mq_run_hw_queue() checks blk_mq_hctx_has_pending() first. And there shouldn't be any pending IO on any inactive hctx in your case, so it looks like blk_mq_hctx_next_cpu() shouldn't be called for an inactive hctx at all. I will prepare a patchset and post it out soon; hopefully it will cover all of these issues. Thanks, Ming ^ permalink raw reply [flat|nested] 40+ messages in thread
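Ming's argument here is about ordering: the pending check is supposed to run before any CPU selection, so an hctx with no queued IO should never reach the next-CPU code. A toy model of that guard (the struct and field names are invented for illustration, not the real struct blk_mq_hw_ctx):

#include <stdio.h>
#include <stdbool.h>

struct toy_hctx {
	int  queue_num;
	bool has_pending;	/* stand-in for blk_mq_hctx_has_pending() */
	bool has_online_cpu;
};

/* the pending check runs first; CPU selection never happens without it */
static void run_hw_queue(const struct toy_hctx *hctx)
{
	if (!hctx->has_pending) {
		printf("hctx%d: nothing pending, not run\n", hctx->queue_num);
		return;
	}
	/* only now would blk_mq_hctx_next_cpu() be consulted */
	printf("hctx%d: dispatch on %s CPU\n", hctx->queue_num,
	       hctx->has_online_cpu ? "a mapped online" : "an unbound");
}

int main(void)
{
	const struct toy_hctx active = {
		.queue_num = 0, .has_pending = true, .has_online_cpu = true,
	};
	const struct toy_hctx inactive = {
		.queue_num = 17, .has_pending = false, .has_online_cpu = false,
	};

	run_hw_queue(&active);		/* dispatches */
	run_hw_queue(&inactive);	/* filtered out by the pending check */
	return 0;
}

If the debug printk fires anyway, either some IO really was queued on an "inactive" hctx, or the selection code is being reached from a path that bypasses this guard, which is what the thread is trying to pin down.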
* Re: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues() 2018-04-06 14:58 ` Ming Lei @ 2018-04-06 15:11 ` Christian Borntraeger 2018-04-06 15:40 ` Ming Lei 0 siblings, 1 reply; 40+ messages in thread From: Christian Borntraeger @ 2018-04-06 15:11 UTC (permalink / raw) To: Ming Lei Cc: Jens Axboe, linux-block, Christoph Hellwig, Stefan Haberland, Christoph Hellwig On 04/06/2018 04:58 PM, Ming Lei wrote: > On Fri, Apr 06, 2018 at 04:26:49PM +0200, Christian Borntraeger wrote: >> >> >> On 04/06/2018 03:41 PM, Ming Lei wrote: >>> On Fri, Apr 06, 2018 at 12:19:19PM +0200, Christian Borntraeger wrote: >>>> >>>> >>>> On 04/06/2018 11:23 AM, Ming Lei wrote: >>>>> On Fri, Apr 06, 2018 at 10:51:28AM +0200, Christian Borntraeger wrote: >>>>>> >>>>>> >>>>>> On 04/06/2018 10:41 AM, Ming Lei wrote: >>>>>>> On Thu, Apr 05, 2018 at 07:39:56PM +0200, Christian Borntraeger wrote: >>>>>>>> >>>>>>>> >>>>>>>> On 04/05/2018 06:11 PM, Ming Lei wrote: >>>>>>>>>> >>>>>>>>>> Could you please apply the following patch and provide the dmesg boot log? >>>>>>>>> >>>>>>>>> And please post out the 'lscpu' log together from the test machine too. >>>>>>>> >>>>>>>> attached. >>>>>>>> >>>>>>>> As I said before this seems to go way with CONFIG_NR_CPUS=64 or smaller. >>>>>>>> We have 282 nr_cpu_ids here (max 141CPUs on that z13 with SMT2) but only 8 Cores >>>>>>>> == 16 threads. >>>>>>> >>>>>>> OK, thanks! >>>>>>> >>>>>>> The most weird thing is that hctx->next_cpu is computed as 512 since >>>>>>> nr_cpu_id is 282, and hctx->next_cpu should have pointed to one of >>>>>>> possible CPU. >>>>>>> >>>>>>> Looks like it is a s390 specific issue, since I can setup one queue >>>>>>> which has same mapping with yours: >>>>>>> >>>>>>> - nr_cpu_id is 282 >>>>>>> - CPU 0~15 is online >>>>>>> - 64 queues null_blk >>>>>>> - still run all hw queues in .complete handler >>>>>>> >>>>>>> But can't reproduce this issue at all. >>>>>>> >>>>>>> So please test the following patch, which may tell us why hctx->next_cpu >>>>>>> is computed wrong: >>>>>> >>>>>> I see things like >>>>>> >>>>>> [ 8.196907] wrong next_cpu 512, blk_mq_map_swqueue, first_and >>>>>> [ 8.196910] wrong next_cpu 512, blk_mq_map_swqueue, first_and >>>>>> [ 8.196912] wrong next_cpu 512, blk_mq_map_swqueue, first_and >>>>>> [ 8.196913] wrong next_cpu 512, blk_mq_map_swqueue, first_and >>>>>> [ 8.196914] wrong next_cpu 512, blk_mq_map_swqueue, first_and >>>>>> [ 8.196915] wrong next_cpu 512, blk_mq_map_swqueue, first_and >>>>>> [ 8.196916] wrong next_cpu 512, blk_mq_map_swqueue, first_and >>>>>> [ 8.196916] wrong next_cpu 512, blk_mq_map_swqueue, first_and >>>>>> [ 8.196917] wrong next_cpu 512, blk_mq_map_swqueue, first_and >>>>>> [ 8.196918] wrong next_cpu 512, blk_mq_map_swqueue, first_and >>>>>> >>>>>> which is exactly what happens if the find and and operation fails (returns size of bitmap). >>>>> >>>>> Given both 'cpu_online_mask' and 'hctx->cpumask' are shown as correct >>>>> in your previous debug log, it means the following function returns >>>>> totally wrong result on S390. >>>>> >>>>> cpumask_first_and(hctx->cpumask, cpu_online_mask); >>>>> >>>>> The debugfs log shows that each hctx->cpumask includes one online >>>>> CPU(0~15). >>>> >>>> Really? the last log (with the latest patch applied shows a lot of contexts >>>> that do not have CPUs in 0-15: >>>> >>>> e.g. 
>>>> [ 4.049828] dump CPUs mapped to this hctx: >>>> [ 4.049829] 18 >>>> [ 4.049829] 82 >>>> [ 4.049830] 146 >>>> [ 4.049830] 210 >>>> [ 4.049831] 274 >>> >>> That won't be an issue, since no IO can be submitted from these offline >>> CPUs, then these hctx shouldn't have been run at all. >>> >>> But hctx->next_cpu can be set as 512 for these inactive hctx in >>> blk_mq_map_swqueue(), then please test the attached patch, and if >>> hctx->next_cpu is still set as 512, something is still wrong. >> >> >> WIth this patch I no longer see the "run queue from wrong CPU x, hctx active" messages. >> your debug code still triggers, though. >> >> wrong next_cpu 512, blk_mq_hctx_next_cpu, first_and >> wrong next_cpu 512, blk_mq_hctx_next_cpu, next_and >> >> If we would remove the debug code then dmesg would be clean it seems. > > That is still a bit strange, since for any inactive hctx(without online > CPU mapped), blk_mq_run_hw_queue() will check blk_mq_hctx_has_pending() I think for next_and it is reasonable to see this, as the next_and will return 512 after we have used the last one. In fact the code does call first_and in that case for a reason, no? > first. And there shouldn't be any pending IO for all inactive hctx > in your case, so looks blk_mq_hctx_next_cpu() shouldn't be called for > inactive hctx. > > I will prepare one patchset and post out soon, and hope all these issues > can be covered. > > Thanks, > Ming > ^ permalink raw reply [flat|nested] 40+ messages in thread
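Christian's point can be shown in a few lines: once the round-robin iterator sits on the last online CPU, next_and legitimately runs off the end of the bitmap (512 here, again assuming CONFIG_NR_CPUS=512), and first_and is the intended wrap-around. A minimal sketch with online CPUs 0-15 all present in the hctx mask:

#include <stdio.h>

#define NR_CPUS		512
#define LAST_ONLINE	15

/* next CPU after 'prev' that is online and in the hctx mask, or NR_CPUS */
static int next_and(int prev)
{
	for (int cpu = prev + 1; cpu < NR_CPUS; cpu++)
		if (cpu <= LAST_ONLINE)
			return cpu;
	return NR_CPUS;
}

int main(void)
{
	int cur = LAST_ONLINE;	/* round-robin just used the last online CPU */
	int next = next_and(cur);

	printf("next_and(%d) = %d\n", cur, next);	/* 512: off the end */
	if (next >= NR_CPUS)
		next = next_and(-1);			/* first_and: the wrap */
	printf("after wrap: %d\n", next);		/* 0 */
	return 0;
}

So a "wrong next_cpu 512, next_and" report is expected behavior; only the first_and and first reports indicate a mask with no online CPU at all.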
* Re: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues() 2018-04-06 15:11 ` Christian Borntraeger @ 2018-04-06 15:40 ` Ming Lei 0 siblings, 0 replies; 40+ messages in thread From: Ming Lei @ 2018-04-06 15:40 UTC (permalink / raw) To: Christian Borntraeger Cc: Jens Axboe, linux-block, Christoph Hellwig, Stefan Haberland, Christoph Hellwig On Fri, Apr 06, 2018 at 05:11:53PM +0200, Christian Borntraeger wrote: > > > On 04/06/2018 04:58 PM, Ming Lei wrote: > > On Fri, Apr 06, 2018 at 04:26:49PM +0200, Christian Borntraeger wrote: > >> > >> > >> On 04/06/2018 03:41 PM, Ming Lei wrote: > >>> On Fri, Apr 06, 2018 at 12:19:19PM +0200, Christian Borntraeger wrote: > >>>> > >>>> > >>>> On 04/06/2018 11:23 AM, Ming Lei wrote: > >>>>> On Fri, Apr 06, 2018 at 10:51:28AM +0200, Christian Borntraeger wrote: > >>>>>> > >>>>>> > >>>>>> On 04/06/2018 10:41 AM, Ming Lei wrote: > >>>>>>> On Thu, Apr 05, 2018 at 07:39:56PM +0200, Christian Borntraeger wrote: > >>>>>>>> > >>>>>>>> > >>>>>>>> On 04/05/2018 06:11 PM, Ming Lei wrote: > >>>>>>>>>> > >>>>>>>>>> Could you please apply the following patch and provide the dmesg boot log? > >>>>>>>>> > >>>>>>>>> And please post out the 'lscpu' log together from the test machine too. > >>>>>>>> > >>>>>>>> attached. > >>>>>>>> > >>>>>>>> As I said before this seems to go way with CONFIG_NR_CPUS=64 or smaller. > >>>>>>>> We have 282 nr_cpu_ids here (max 141CPUs on that z13 with SMT2) but only 8 Cores > >>>>>>>> == 16 threads. > >>>>>>> > >>>>>>> OK, thanks! > >>>>>>> > >>>>>>> The most weird thing is that hctx->next_cpu is computed as 512 since > >>>>>>> nr_cpu_id is 282, and hctx->next_cpu should have pointed to one of > >>>>>>> possible CPU. > >>>>>>> > >>>>>>> Looks like it is a s390 specific issue, since I can setup one queue > >>>>>>> which has same mapping with yours: > >>>>>>> > >>>>>>> - nr_cpu_id is 282 > >>>>>>> - CPU 0~15 is online > >>>>>>> - 64 queues null_blk > >>>>>>> - still run all hw queues in .complete handler > >>>>>>> > >>>>>>> But can't reproduce this issue at all. > >>>>>>> > >>>>>>> So please test the following patch, which may tell us why hctx->next_cpu > >>>>>>> is computed wrong: > >>>>>> > >>>>>> I see things like > >>>>>> > >>>>>> [ 8.196907] wrong next_cpu 512, blk_mq_map_swqueue, first_and > >>>>>> [ 8.196910] wrong next_cpu 512, blk_mq_map_swqueue, first_and > >>>>>> [ 8.196912] wrong next_cpu 512, blk_mq_map_swqueue, first_and > >>>>>> [ 8.196913] wrong next_cpu 512, blk_mq_map_swqueue, first_and > >>>>>> [ 8.196914] wrong next_cpu 512, blk_mq_map_swqueue, first_and > >>>>>> [ 8.196915] wrong next_cpu 512, blk_mq_map_swqueue, first_and > >>>>>> [ 8.196916] wrong next_cpu 512, blk_mq_map_swqueue, first_and > >>>>>> [ 8.196916] wrong next_cpu 512, blk_mq_map_swqueue, first_and > >>>>>> [ 8.196917] wrong next_cpu 512, blk_mq_map_swqueue, first_and > >>>>>> [ 8.196918] wrong next_cpu 512, blk_mq_map_swqueue, first_and > >>>>>> > >>>>>> which is exactly what happens if the find and and operation fails (returns size of bitmap). > >>>>> > >>>>> Given both 'cpu_online_mask' and 'hctx->cpumask' are shown as correct > >>>>> in your previous debug log, it means the following function returns > >>>>> totally wrong result on S390. > >>>>> > >>>>> cpumask_first_and(hctx->cpumask, cpu_online_mask); > >>>>> > >>>>> The debugfs log shows that each hctx->cpumask includes one online > >>>>> CPU(0~15). > >>>> > >>>> Really? 
the last log (with the latest patch applied shows a lot of contexts > >>>> that do not have CPUs in 0-15: > >>>> > >>>> e.g. > >>>> [ 4.049828] dump CPUs mapped to this hctx: > >>>> [ 4.049829] 18 > >>>> [ 4.049829] 82 > >>>> [ 4.049830] 146 > >>>> [ 4.049830] 210 > >>>> [ 4.049831] 274 > >>> > >>> That won't be an issue, since no IO can be submitted from these offline > >>> CPUs, then these hctx shouldn't have been run at all. > >>> > >>> But hctx->next_cpu can be set as 512 for these inactive hctx in > >>> blk_mq_map_swqueue(), then please test the attached patch, and if > >>> hctx->next_cpu is still set as 512, something is still wrong. > >> > >> > >> WIth this patch I no longer see the "run queue from wrong CPU x, hctx active" messages. > >> your debug code still triggers, though. > >> > >> wrong next_cpu 512, blk_mq_hctx_next_cpu, first_and > >> wrong next_cpu 512, blk_mq_hctx_next_cpu, next_and > >> > >> If we would remove the debug code then dmesg would be clean it seems. > > > > That is still a bit strange, since for any inactive hctx(without online > > CPU mapped), blk_mq_run_hw_queue() will check blk_mq_hctx_has_pending() > > I think for next_and it is reasonable to see this, as the next_and will return > 512 after we have used the last one. In fact the code does call first_and in > that case for a reason, no? It is possible to hit the 'first_and' dump when there aren't any online CPUs mapped to this hctx. But my question is that in this case there shouldn't be any IO queued on this hctx, and blk_mq_hctx_has_pending() is called to check exactly that, so blk_mq_hctx_next_cpu() should only have been called when blk_mq_hctx_has_pending() in blk_mq_run_hw_queue() returned true. Thanks, Ming ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues() 2018-04-06 9:23 ` Ming Lei 2018-04-06 10:19 ` Christian Borntraeger @ 2018-04-06 11:37 ` Christian Borntraeger 1 sibling, 0 replies; 40+ messages in thread From: Christian Borntraeger @ 2018-04-06 11:37 UTC (permalink / raw) To: Ming Lei Cc: Jens Axboe, linux-block, Christoph Hellwig, Stefan Haberland, Christoph Hellwig On 04/06/2018 11:23 AM, Ming Lei wrote: > On Fri, Apr 06, 2018 at 10:51:28AM +0200, Christian Borntraeger wrote: >> >> >> On 04/06/2018 10:41 AM, Ming Lei wrote: >>> On Thu, Apr 05, 2018 at 07:39:56PM +0200, Christian Borntraeger wrote: >>>> >>>> >>>> On 04/05/2018 06:11 PM, Ming Lei wrote: >>>>>> >>>>>> Could you please apply the following patch and provide the dmesg boot log? >>>>> >>>>> And please post out the 'lscpu' log together from the test machine too. >>>> >>>> attached. >>>> >>>> As I said before this seems to go way with CONFIG_NR_CPUS=64 or smaller. >>>> We have 282 nr_cpu_ids here (max 141CPUs on that z13 with SMT2) but only 8 Cores >>>> == 16 threads. >>> >>> OK, thanks! >>> >>> The most weird thing is that hctx->next_cpu is computed as 512 since >>> nr_cpu_id is 282, and hctx->next_cpu should have pointed to one of >>> possible CPU. >>> >>> Looks like it is a s390 specific issue, since I can setup one queue >>> which has same mapping with yours: >>> >>> - nr_cpu_id is 282 >>> - CPU 0~15 is online >>> - 64 queues null_blk >>> - still run all hw queues in .complete handler >>> >>> But can't reproduce this issue at all. >>> >>> So please test the following patch, which may tell us why hctx->next_cpu >>> is computed wrong: >> >> I see things like >> >> [ 8.196907] wrong next_cpu 512, blk_mq_map_swqueue, first_and >> [ 8.196910] wrong next_cpu 512, blk_mq_map_swqueue, first_and >> [ 8.196912] wrong next_cpu 512, blk_mq_map_swqueue, first_and >> [ 8.196913] wrong next_cpu 512, blk_mq_map_swqueue, first_and >> [ 8.196914] wrong next_cpu 512, blk_mq_map_swqueue, first_and >> [ 8.196915] wrong next_cpu 512, blk_mq_map_swqueue, first_and >> [ 8.196916] wrong next_cpu 512, blk_mq_map_swqueue, first_and >> [ 8.196916] wrong next_cpu 512, blk_mq_map_swqueue, first_and >> [ 8.196917] wrong next_cpu 512, blk_mq_map_swqueue, first_and >> [ 8.196918] wrong next_cpu 512, blk_mq_map_swqueue, first_and >> >> which is exactly what happens if the find and and operation fails (returns size of bitmap). > > Given both 'cpu_online_mask' and 'hctx->cpumask' are shown as correct > in your previous debug log, it means the following function returns > totally wrong result on S390. > > cpumask_first_and(hctx->cpumask, cpu_online_mask); > > The debugfs log shows that each hctx->cpumask includes one online > CPU(0~15). > > So looks it isn't one issue in block MQ core. So I checked further and printed the mask I think I can ignore the next_and cases. It is totally valid to get 512 here (as we might start with an offset that is already the last cpu and we need to wrap with first_and)). So the first_and and the first cases are really the interesting one. 
And I think the code is perfectly right, there is no bit after and for these cases: [ 3.220021] wrong next_cpu 512, blk_mq_map_swqueue, first_and [ 3.220023] 1: 0000000000010000 0000000000010000 0000000000010000 0000000000010000 0000000000010000 0000000000000000 0000000000000000 0000000000000000 [ 3.220025] 2: 000000000000ffff 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 [ 3.220027] wrong next_cpu 512, blk_mq_map_swqueue, first_and [ 3.220028] 1: 0000000000020000 0000000000020000 0000000000020000 0000000000020000 0000000000020000 0000000000000000 0000000000000000 0000000000000000 [ 3.220030] wrong next_cpu 512, blk_mq_map_swqueue, first_and [ 3.220032] 1: 0000000000010000 0000000000010000 0000000000010000 0000000000010000 0000000000010000 0000000000000000 0000000000000000 0000000000000000 [ 3.220033] 2: 000000000000ffff 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 [ 3.220035] wrong next_cpu 512, blk_mq_map_swqueue, first_and [ 3.220036] 2: 000000000000ffff 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 [ 3.220037] wrong next_cpu 512, blk_mq_map_swqueue, first_and [ 3.220039] 1: 0000000000040000 0000000000040000 0000000000040000 0000000000040000 0000000000040000 0000000000000000 0000000000000000 0000000000000000 [ 3.220040] 2: 000000000000ffff 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 [ 3.220042] wrong next_cpu 512, blk_mq_map_swqueue, first_and [ 3.220062] 1: 0000000000020000 0000000000020000 0000000000020000 0000000000020000 0000000000020000 0000000000000000 0000000000000000 0000000000000000 [ 3.220063] 2: 000000000000ffff 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 [ 3.220064] wrong next_cpu 512, blk_mq_map_swqueue, first_and [ 3.220066] 1: 0000000000080000 0000000000080000 0000000000080000 0000000000080000 0000000000080000 0000000000000000 0000000000000000 0000000000000000 [ 3.220067] 2: 000000000000ffff 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 ^ permalink raw reply [flat|nested] 40+ messages in thread
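Decoding the dump above supports that conclusion, assuming each 16-hex-digit group is one 64-bit word of the cpumask, with word k covering CPUs 64*k .. 64*k+63 (an assumption about the ad-hoc print format, not something the thread states). Under that reading, mask "1:" has only bit 16 of each of the first five words set, i.e. CPUs 16, 80, 144, 208 and 272, none of which appear in the online mask "2:" (CPUs 0-15):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
	/* "1:" line from the dump: hctx->cpumask, bit 16 of the first five
	 * words; "2:" line: cpu_online_mask, bits 0-15 of word 0 */
	const uint64_t hctx[8]   = { 0x10000, 0x10000, 0x10000, 0x10000,
				     0x10000, 0, 0, 0 };
	const uint64_t online[8] = { 0xffff, 0, 0, 0, 0, 0, 0, 0 };

	for (int w = 0; w < 8; w++)
		for (int b = 0; b < 64; b++)
			if (hctx[w] & (1ULL << b))
				printf("hctx CPU %d, online: %s\n", w * 64 + b,
				       (online[w] >> b) & 1 ? "yes" : "no");
	/* every AND word is zero, so a first_and style search has nothing
	 * to find and must report the bitmap size */
	return 0;
}

That matches the stride-64 pattern of the decimal dumps elsewhere in the thread (18/82/146/210/274), so the masks really are disjoint and cpumask_first_and() is behaving as documented.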
* Re: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues() 2018-04-05 16:05 ` Ming Lei 2018-04-05 16:11 ` Ming Lei @ 2018-04-06 8:35 ` Christian Borntraeger 1 sibling, 0 replies; 40+ messages in thread From: Christian Borntraeger @ 2018-04-06 8:35 UTC (permalink / raw) To: Ming Lei Cc: Jens Axboe, linux-block, Christoph Hellwig, Stefan Haberland, Christoph Hellwig On 04/05/2018 06:05 PM, Ming Lei wrote: [...] > diff --git a/block/blk-mq.c b/block/blk-mq.c > index 90838e998f66..996f8a963026 100644 > --- a/block/blk-mq.c > +++ b/block/blk-mq.c > @@ -1324,9 +1324,18 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx) > */ > if (!cpumask_test_cpu(raw_smp_processor_id(), hctx->cpumask) && > cpu_online(hctx->next_cpu)) { > - printk(KERN_WARNING "run queue from wrong CPU %d, hctx %s\n", > - raw_smp_processor_id(), > + int cpu; > + printk(KERN_WARNING "run queue from wrong CPU %d/%d, hctx-%d %s\n", > + raw_smp_processor_id(), hctx->next_cpu, > + hctx->queue_num, > cpumask_empty(hctx->cpumask) ? "inactive": "active"); > + printk("dump CPUs mapped to this hctx:\n"); > + for_each_cpu(cpu, hctx->cpumask) > + printk("%d ", cpu); > + printk("\n"); > + printk("nr_cpu_ids is %d, and dump online cpus:\n", nr_cpu_ids); > + for_each_cpu(cpu, cpu_online_mask) > + printk("%d ", cpu); > dump_stack(); > } > FWIW, with things like [ 4.049828] dump CPUs mapped to this hctx: [ 4.049829] 18 [ 4.049829] 82 [ 4.049830] 146 [ 4.049830] 210 [ 4.049831] 274 [ 4.049832] nr_cpu_ids is 282, and dump online cpus: [ 4.049833] 0 [ 4.049833] 1 [ 4.049834] 2 [ 4.049834] 3 [ 4.049835] 4 [ 4.049835] 5 [ 4.049836] 6 [ 4.049836] 7 [ 4.049837] 8 [ 4.049837] 9 [ 4.049838] 10 [ 4.049839] 11 [ 4.049839] 12 [ 4.049840] 13 [ 4.049840] 14 [ 4.049841] 15 So the hctx has only "possible CPUs", but all are offline. Doesnt that always make this run unbound? See blk_mq_hctx_next_cpu below. /* * It'd be great if the workqueue API had a way to pass * in a mask and had some smarts for more clever placement. * For now we just round-robin here, switching for every * BLK_MQ_CPU_WORK_BATCH queued items. */ static int blk_mq_hctx_next_cpu(struct blk_mq_hw_ctx *hctx) { bool tried = false; if (hctx->queue->nr_hw_queues == 1) return WORK_CPU_UNBOUND; if (--hctx->next_cpu_batch <= 0) { int next_cpu; select_cpu: next_cpu = cpumask_next_and(hctx->next_cpu, hctx->cpumask, cpu_online_mask); if (next_cpu >= nr_cpu_ids) next_cpu = cpumask_first_and(hctx->cpumask,cpu_online_mask); /* * No online CPU is found, so have to make sure hctx->next_cpu * is set correctly for not breaking workqueue. */ if (next_cpu >= nr_cpu_ids) hctx->next_cpu = cpumask_first(hctx->cpumask); else hctx->next_cpu = next_cpu; hctx->next_cpu_batch = BLK_MQ_CPU_WORK_BATCH; } /* * Do unbound schedule if we can't find a online CPU for this hctx, * and it should only happen in the path of handling CPU DEAD. */ if (!cpu_online(hctx->next_cpu)) { if (!tried) { tried = true; goto select_cpu; } /* * Make sure to re-select CPU next time once after CPUs * in hctx->cpumask become online again. */ hctx->next_cpu_batch = 1; return WORK_CPU_UNBOUND; } return hctx->next_cpu; } ^ permalink raw reply [flat|nested] 40+ messages in thread
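For an hctx whose possible CPUs are all offline, the answer to Christian's question appears to be yes: the select_cpu retry finds nothing online and the function ends at WORK_CPU_UNBOUND. The model below mirrors the quoted function but drops the batch counter and the nr_hw_queues == 1 shortcut; the masks, constants and helper names are illustrative, taken from this thread's topology.

#include <stdio.h>
#include <stdbool.h>

#define NR_CPUS			512
#define WORK_CPU_UNBOUND	(-1)

/* +1: sentinel slot keeps an empty-mask lookup in bounds */
static bool in_hctx[NR_CPUS + 1], online[NR_CPUS + 1];

static int next_and(int prev)
{
	for (int cpu = prev + 1; cpu < NR_CPUS; cpu++)
		if (in_hctx[cpu] && online[cpu])
			return cpu;
	return NR_CPUS;
}

static int first(void)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (in_hctx[cpu])
			return cpu;
	return NR_CPUS;
}

static int hctx_next_cpu(int *next_cpu)
{
	bool tried = false;
	int next;

select_cpu:
	next = next_and(*next_cpu);
	if (next >= NR_CPUS)
		next = next_and(-1);	/* first_and */
	*next_cpu = (next >= NR_CPUS) ? first() : next;

	if (!online[*next_cpu]) {
		if (!tried) {
			tried = true;
			goto select_cpu;
		}
		return WORK_CPU_UNBOUND;	/* all-offline mask always ends here */
	}
	return *next_cpu;
}

int main(void)
{
	int next_cpu = 18;

	in_hctx[18] = in_hctx[82] = in_hctx[146] = true;
	for (int cpu = 0; cpu <= 15; cpu++)
		online[cpu] = true;

	printf("picked: %d (WORK_CPU_UNBOUND = %d)\n",
	       hctx_next_cpu(&next_cpu), WORK_CPU_UNBOUND);
	return 0;
}

Running unbound is harmless in itself; the thread's open question is why such an hctx is being run at all when nothing should be pending on it.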
* Re: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues() 2018-03-29 7:23 ` Christian Borntraeger 2018-03-29 9:09 ` Christian Borntraeger @ 2018-03-29 9:52 ` Ming Lei 2018-03-29 10:11 ` Christian Borntraeger 2018-03-29 10:13 ` Ming Lei 1 sibling, 2 replies; 40+ messages in thread From: Ming Lei @ 2018-03-29 9:52 UTC (permalink / raw) To: Christian Borntraeger Cc: Jens Axboe, linux-block, Christoph Hellwig, Stefan Haberland, Christoph Hellwig On Thu, Mar 29, 2018 at 09:23:10AM +0200, Christian Borntraeger wrote: > > > On 03/29/2018 04:00 AM, Ming Lei wrote: > > On Wed, Mar 28, 2018 at 05:36:53PM +0200, Christian Borntraeger wrote: > >> > >> > >> On 03/28/2018 05:26 PM, Ming Lei wrote: > >>> Hi Christian, > >>> > >>> On Wed, Mar 28, 2018 at 09:45:10AM +0200, Christian Borntraeger wrote: > >>>> FWIW, this patch does not fix the issue for me: > >>>> > >>>> ostname=? addr=? terminal=? res=success' > >>>> [ 21.454961] WARNING: CPU: 3 PID: 1882 at block/blk-mq.c:1410 __blk_mq_delay_run_hw_queue+0xbe/0xd8 > >>>> [ 21.454968] Modules linked in: scsi_dh_rdac scsi_dh_emc scsi_dh_alua dm_mirror dm_region_hash dm_log dm_multipath dm_mod autofs4 > >>>> [ 21.454984] CPU: 3 PID: 1882 Comm: dasdconf.sh Not tainted 4.16.0-rc7+ #26 > >>>> [ 21.454987] Hardware name: IBM 2964 NC9 704 (LPAR) > >>>> [ 21.454990] Krnl PSW : 00000000c0131ea3 000000003ea2f7bf (__blk_mq_delay_run_hw_queue+0xbe/0xd8) > >>>> [ 21.454996] R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3 > >>>> [ 21.455005] Krnl GPRS: 0000013abb69a000 0000013a00000000 0000013ac6c0dc00 0000000000000001 > >>>> [ 21.455008] 0000000000000000 0000013abb69a710 0000013a00000000 00000001b691fd98 > >>>> [ 21.455011] 00000001b691fd98 0000013ace4775c8 0000000000000001 0000000000000000 > >>>> [ 21.455014] 0000013ac6c0dc00 0000000000b47238 00000001b691fc08 00000001b691fbd0 > >>>> [ 21.455032] Krnl Code: 000000000069c596: ebaff0a00004 lmg %r10,%r15,160(%r15) > >>>> 000000000069c59c: c0f4ffff7a5e brcl 15,68ba58 > >>>> #000000000069c5a2: a7f40001 brc 15,69c5a4 > >>>> >000000000069c5a6: e340f0c00004 lg %r4,192(%r15) > >>>> 000000000069c5ac: ebaff0a00004 lmg %r10,%r15,160(%r15) > >>>> 000000000069c5b2: 07f4 bcr 15,%r4 > >>>> 000000000069c5b4: c0e5fffffeea brasl %r14,69c388 > >>>> 000000000069c5ba: a7f4fff6 brc 15,69c5a6 > >>>> [ 21.455067] Call Trace: > >>>> [ 21.455072] ([<00000001b691fd98>] 0x1b691fd98) > >>>> [ 21.455079] [<000000000069c692>] blk_mq_run_hw_queue+0xba/0x100 > >>>> [ 21.455083] [<000000000069c740>] blk_mq_run_hw_queues+0x68/0x88 > >>>> [ 21.455089] [<000000000069b956>] __blk_mq_complete_request+0x11e/0x1d8 > >>>> [ 21.455091] [<000000000069ba9c>] blk_mq_complete_request+0x8c/0xc8 > >>>> [ 21.455103] [<00000000008aa250>] dasd_block_tasklet+0x158/0x490 > >>>> [ 21.455110] [<000000000014c742>] tasklet_hi_action+0x92/0x120 > >>>> [ 21.455118] [<0000000000a7cfc0>] __do_softirq+0x120/0x348 > >>>> [ 21.455122] [<000000000014c212>] irq_exit+0xba/0xd0 > >>>> [ 21.455130] [<000000000010bf92>] do_IRQ+0x8a/0xb8 > >>>> [ 21.455133] [<0000000000a7c298>] io_int_handler+0x130/0x298 > >>>> [ 21.455136] Last Breaking-Event-Address: > >>>> [ 21.455138] [<000000000069c5a2>] __blk_mq_delay_run_hw_queue+0xba/0xd8 > >>>> [ 21.455140] ---[ end trace be43f99a5d1e553e ]--- > >>>> [ 21.510046] dasdconf.sh Warning: 0.0.241e is already online, not configuring > >>> > >>> Thinking about this issue further, I can't understand the root cause for > >>> this issue. 
> >>> > >>> After commit 20e4d813931961fe ("blk-mq: simplify queue mapping & schedule with > >>> each possisble CPU"), each hw queue should be mapped to at least one CPU, that > >>> means this issue shouldn't happen. Maybe blk_mq_map_queues() works wrong? > >>> > >>> Could you dump 'lscpu' and provide blk-mq debugfs for your DASD via the > >>> following command? > >> > >> # lscpu > >> Architecture: s390x > >> CPU op-mode(s): 32-bit, 64-bit > >> Byte Order: Big Endian > >> CPU(s): 16 > >> On-line CPU(s) list: 0-15 > >> Thread(s) per core: 2 > >> Core(s) per socket: 8 > >> Socket(s) per book: 3 > >> Book(s) per drawer: 2 > >> Drawer(s): 4 > >> NUMA node(s): 1 > >> Vendor ID: IBM/S390 > >> Machine type: 2964 > >> CPU dynamic MHz: 5000 > >> CPU static MHz: 5000 > >> BogoMIPS: 20325.00 > >> Hypervisor: PR/SM > >> Hypervisor vendor: IBM > >> Virtualization type: full > >> Dispatching mode: horizontal > >> L1d cache: 128K > >> L1i cache: 96K > >> L2d cache: 2048K > >> L2i cache: 2048K > >> L3 cache: 65536K > >> L4 cache: 491520K > >> NUMA node0 CPU(s): 0-15 > >> Flags: esan3 zarch stfle msa ldisp eimm dfp edat etf3eh highgprs te vx sie > >> > >> # lsdasd > >> Bus-ID Status Name Device Type BlkSz Size Blocks > >> ============================================================================== > >> 0.0.3f75 active dasda 94:0 ECKD 4096 21129MB 5409180 > >> 0.0.3f76 active dasdb 94:4 ECKD 4096 21129MB 5409180 > >> 0.0.3f77 active dasdc 94:8 ECKD 4096 21129MB 5409180 > >> 0.0.3f74 active dasdd 94:12 ECKD 4096 21129MB 5409180 > > > > I have tried to emulate your CPU topo via VM and the blk-mq mapping of > > null_blk is basically similar with your DASD mapping, but still can't > > reproduce your issue. > > > > BTW, do you need to do cpu hotplug or other actions for triggering this warning? > > No, without hotplug. >From the debugfs log, hctx0 is mapped to lots of CPU, so it shouldn't be unmapped, could you check if it is hctx0 which is unmapped when the warning is triggered? If not, what is the unmapped hctx? And you can do that by adding one extra line: printk("unmapped hctx %d", hctx->queue_num); Thanks, Ming ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues() 2018-03-29 9:52 ` Ming Lei @ 2018-03-29 10:11 ` Christian Borntraeger 2018-03-29 10:12 ` Christian Borntraeger 2018-03-29 10:13 ` Ming Lei 1 sibling, 1 reply; 40+ messages in thread From: Christian Borntraeger @ 2018-03-29 10:11 UTC (permalink / raw) To: Ming Lei Cc: Jens Axboe, linux-block, Christoph Hellwig, Stefan Haberland, Christoph Hellwig On 03/29/2018 11:52 AM, Ming Lei wrote: > From the debugfs log, hctx0 is mapped to lots of CPU, so it shouldn't be > unmapped, could you check if it is hctx0 which is unmapped when the > warning is triggered? If not, what is the unmapped hctx? And you can do > that by adding one extra line: > > printk("unmapped hctx %d", hctx->queue_num); Where do you want that printk? > > Thanks, > Ming ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues() 2018-03-29 10:11 ` Christian Borntraeger @ 2018-03-29 10:12 ` Christian Borntraeger 0 siblings, 0 replies; 40+ messages in thread From: Christian Borntraeger @ 2018-03-29 10:12 UTC (permalink / raw) To: Ming Lei Cc: Jens Axboe, linux-block, Christoph Hellwig, Stefan Haberland, Christoph Hellwig On 03/29/2018 12:11 PM, Christian Borntraeger wrote: > > > On 03/29/2018 11:52 AM, Ming Lei wrote: >> From the debugfs log, hctx0 is mapped to lots of CPU, so it shouldn't be >> unmapped, could you check if it is hctx0 which is unmapped when the >> warning is triggered? If not, what is the unmapped hctx? And you can do >> that by adding one extra line: >> >> printk("unmapped hctx %d", hctx->queue_num); > > Where do you want that printk? And do you want it with or without the other patch that you have just sent? ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues() 2018-03-29 9:52 ` Ming Lei 2018-03-29 10:11 ` Christian Borntraeger @ 2018-03-29 10:13 ` Ming Lei 1 sibling, 0 replies; 40+ messages in thread From: Ming Lei @ 2018-03-29 10:13 UTC (permalink / raw) To: Christian Borntraeger Cc: Jens Axboe, linux-block, Christoph Hellwig, Stefan Haberland, Christoph Hellwig On Thu, Mar 29, 2018 at 05:52:16PM +0800, Ming Lei wrote: > On Thu, Mar 29, 2018 at 09:23:10AM +0200, Christian Borntraeger wrote: > > > > > > On 03/29/2018 04:00 AM, Ming Lei wrote: > > > On Wed, Mar 28, 2018 at 05:36:53PM +0200, Christian Borntraeger wrote: > > >> > > >> > > >> On 03/28/2018 05:26 PM, Ming Lei wrote: > > >>> Hi Christian, > > >>> > > >>> On Wed, Mar 28, 2018 at 09:45:10AM +0200, Christian Borntraeger wrote: > > >>>> FWIW, this patch does not fix the issue for me: > > >>>> > > >>>> ostname=? addr=? terminal=? res=success' > > >>>> [ 21.454961] WARNING: CPU: 3 PID: 1882 at block/blk-mq.c:1410 __blk_mq_delay_run_hw_queue+0xbe/0xd8 > > >>>> [ 21.454968] Modules linked in: scsi_dh_rdac scsi_dh_emc scsi_dh_alua dm_mirror dm_region_hash dm_log dm_multipath dm_mod autofs4 > > >>>> [ 21.454984] CPU: 3 PID: 1882 Comm: dasdconf.sh Not tainted 4.16.0-rc7+ #26 > > >>>> [ 21.454987] Hardware name: IBM 2964 NC9 704 (LPAR) > > >>>> [ 21.454990] Krnl PSW : 00000000c0131ea3 000000003ea2f7bf (__blk_mq_delay_run_hw_queue+0xbe/0xd8) > > >>>> [ 21.454996] R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3 > > >>>> [ 21.455005] Krnl GPRS: 0000013abb69a000 0000013a00000000 0000013ac6c0dc00 0000000000000001 > > >>>> [ 21.455008] 0000000000000000 0000013abb69a710 0000013a00000000 00000001b691fd98 > > >>>> [ 21.455011] 00000001b691fd98 0000013ace4775c8 0000000000000001 0000000000000000 > > >>>> [ 21.455014] 0000013ac6c0dc00 0000000000b47238 00000001b691fc08 00000001b691fbd0 > > >>>> [ 21.455032] Krnl Code: 000000000069c596: ebaff0a00004 lmg %r10,%r15,160(%r15) > > >>>> 000000000069c59c: c0f4ffff7a5e brcl 15,68ba58 > > >>>> #000000000069c5a2: a7f40001 brc 15,69c5a4 > > >>>> >000000000069c5a6: e340f0c00004 lg %r4,192(%r15) > > >>>> 000000000069c5ac: ebaff0a00004 lmg %r10,%r15,160(%r15) > > >>>> 000000000069c5b2: 07f4 bcr 15,%r4 > > >>>> 000000000069c5b4: c0e5fffffeea brasl %r14,69c388 > > >>>> 000000000069c5ba: a7f4fff6 brc 15,69c5a6 > > >>>> [ 21.455067] Call Trace: > > >>>> [ 21.455072] ([<00000001b691fd98>] 0x1b691fd98) > > >>>> [ 21.455079] [<000000000069c692>] blk_mq_run_hw_queue+0xba/0x100 > > >>>> [ 21.455083] [<000000000069c740>] blk_mq_run_hw_queues+0x68/0x88 > > >>>> [ 21.455089] [<000000000069b956>] __blk_mq_complete_request+0x11e/0x1d8 > > >>>> [ 21.455091] [<000000000069ba9c>] blk_mq_complete_request+0x8c/0xc8 > > >>>> [ 21.455103] [<00000000008aa250>] dasd_block_tasklet+0x158/0x490 > > >>>> [ 21.455110] [<000000000014c742>] tasklet_hi_action+0x92/0x120 > > >>>> [ 21.455118] [<0000000000a7cfc0>] __do_softirq+0x120/0x348 > > >>>> [ 21.455122] [<000000000014c212>] irq_exit+0xba/0xd0 > > >>>> [ 21.455130] [<000000000010bf92>] do_IRQ+0x8a/0xb8 > > >>>> [ 21.455133] [<0000000000a7c298>] io_int_handler+0x130/0x298 > > >>>> [ 21.455136] Last Breaking-Event-Address: > > >>>> [ 21.455138] [<000000000069c5a2>] __blk_mq_delay_run_hw_queue+0xba/0xd8 > > >>>> [ 21.455140] ---[ end trace be43f99a5d1e553e ]--- > > >>>> [ 21.510046] dasdconf.sh Warning: 0.0.241e is already online, not configuring > > >>> > > >>> Thinking about this issue further, I can't understand the root cause for > > >>> this issue. 
> > >>> > > >>> After commit 20e4d813931961fe ("blk-mq: simplify queue mapping & schedule with > > >>> each possisble CPU"), each hw queue should be mapped to at least one CPU, that > > >>> means this issue shouldn't happen. Maybe blk_mq_map_queues() works wrong? > > >>> > > >>> Could you dump 'lscpu' and provide blk-mq debugfs for your DASD via the > > >>> following command? > > >> > > >> # lscpu > > >> Architecture: s390x > > >> CPU op-mode(s): 32-bit, 64-bit > > >> Byte Order: Big Endian > > >> CPU(s): 16 > > >> On-line CPU(s) list: 0-15 > > >> Thread(s) per core: 2 > > >> Core(s) per socket: 8 > > >> Socket(s) per book: 3 > > >> Book(s) per drawer: 2 > > >> Drawer(s): 4 > > >> NUMA node(s): 1 > > >> Vendor ID: IBM/S390 > > >> Machine type: 2964 > > >> CPU dynamic MHz: 5000 > > >> CPU static MHz: 5000 > > >> BogoMIPS: 20325.00 > > >> Hypervisor: PR/SM > > >> Hypervisor vendor: IBM > > >> Virtualization type: full > > >> Dispatching mode: horizontal > > >> L1d cache: 128K > > >> L1i cache: 96K > > >> L2d cache: 2048K > > >> L2i cache: 2048K > > >> L3 cache: 65536K > > >> L4 cache: 491520K > > >> NUMA node0 CPU(s): 0-15 > > >> Flags: esan3 zarch stfle msa ldisp eimm dfp edat etf3eh highgprs te vx sie > > >> > > >> # lsdasd > > >> Bus-ID Status Name Device Type BlkSz Size Blocks > > >> ============================================================================== > > >> 0.0.3f75 active dasda 94:0 ECKD 4096 21129MB 5409180 > > >> 0.0.3f76 active dasdb 94:4 ECKD 4096 21129MB 5409180 > > >> 0.0.3f77 active dasdc 94:8 ECKD 4096 21129MB 5409180 > > >> 0.0.3f74 active dasdd 94:12 ECKD 4096 21129MB 5409180 > > > > > > I have tried to emulate your CPU topo via VM and the blk-mq mapping of > > > null_blk is basically similar with your DASD mapping, but still can't > > > reproduce your issue. > > > > > > BTW, do you need to do cpu hotplug or other actions for triggering this warning? > > > > No, without hotplug. > > From the debugfs log, hctx0 is mapped to lots of CPU, so it shouldn't be > unmapped, could you check if it is hctx0 which is unmapped when the > warning is triggered? If not, what is the unmapped hctx? And you can do > that by adding one extra line: > > printk("unmapped hctx %d", hctx->queue_num); It should be triggered when running any hctx from 16 to 63, instead of 0. I see why I didn't trigger it via null_blk, because null_blk won't run all hw queues, and I should have used scsi_debug to do that. Then the patch of touching blk-mq-cpumap.c I sent before should address this issue. Thanks, Ming ^ permalink raw reply [flat|nested] 40+ messages in thread
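The numbers in this thread make the effect of that blk-mq-cpumap.c change easy to check: with nr_cpu_ids = 282, 16 present CPUs (assumed here to be 0-15) and 64 hw queues, the old cpu_to_queue_index() sent every non-present CPU to queue 0, leaving queues 16-63 with no CPU at all, which is exactly the "hctx from 16 to 63" range Ming names. A small sketch mirroring the old and new mapping (old_map/new_map are illustrative names):

#include <stdio.h>
#include <stdbool.h>

#define NR_CPU_IDS	282
#define NR_QUEUES	64
#define NR_PRESENT	16	/* assumption: present CPUs are 0-15 */

/* old cpu_to_queue_index(): non-present CPUs all land on queue 0 */
static int old_map(int cpu)
{
	return cpu < NR_PRESENT ? cpu % NR_QUEUES : 0;
}

/* new cpu_to_queue_index(): every possible CPU is spread over the queues */
static int new_map(int cpu)
{
	return cpu % NR_QUEUES;
}

static int count_empty(int (*map)(int))
{
	bool has_cpu[NR_QUEUES] = { false };
	int empty = 0;

	for (int cpu = 0; cpu < NR_CPU_IDS; cpu++)
		has_cpu[map(cpu)] = true;
	for (int q = 0; q < NR_QUEUES; q++)
		if (!has_cpu[q])
			empty++;
	return empty;
}

int main(void)
{
	/* old: 48 queues (16-63) have no CPU mapped; new: 0 */
	printf("old mapping: %d queues with no CPU\n", count_empty(old_map));
	printf("new mapping: %d queues with no CPU\n", count_empty(new_map));
	return 0;
}

With the new mapping every hw queue owns at least one possible CPU, so blk_mq_map_swqueue() no longer produces the completely unmapped queues that triggered the warning.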
Thread overview: 40+ messages
2018-03-28 1:20 [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues() Ming Lei
2018-03-28 3:22 ` Jens Axboe
2018-03-28 7:45 ` Christian Borntraeger
2018-03-28 14:38 ` Jens Axboe
2018-03-28 14:53 ` Jens Axboe
2018-03-28 15:38 ` Christian Borntraeger
2018-03-28 15:26 ` Ming Lei
2018-03-28 15:36 ` Christian Borntraeger
2018-03-28 15:44 ` Christian Borntraeger
2018-03-29 2:00 ` Ming Lei
2018-03-29 7:23 ` Christian Borntraeger
2018-03-29 9:09 ` Christian Borntraeger
2018-03-29 9:40 ` Ming Lei
2018-03-29 10:10 ` Christian Borntraeger
2018-03-29 10:48 ` Ming Lei
2018-03-29 10:49 ` Christian Borntraeger
2018-03-29 11:43 ` Ming Lei
2018-03-29 11:49 ` Christian Borntraeger
2018-03-30 2:53 ` Ming Lei
2018-04-04 8:18 ` Christian Borntraeger
2018-04-05 16:05 ` Ming Lei
2018-04-05 16:11 ` Ming Lei
2018-04-05 17:39 ` Christian Borntraeger
2018-04-05 17:43 ` Christian Borntraeger
2018-04-06 8:41 ` Ming Lei
2018-04-06 8:51 ` Christian Borntraeger
2018-04-06 8:53 ` Christian Borntraeger
2018-04-06 9:23 ` Ming Lei
2018-04-06 10:19 ` Christian Borntraeger
2018-04-06 13:41 ` Ming Lei
2018-04-06 14:26 ` Christian Borntraeger
2018-04-06 14:58 ` Ming Lei
2018-04-06 15:11 ` Christian Borntraeger
2018-04-06 15:40 ` Ming Lei
2018-04-06 11:37 ` Christian Borntraeger
2018-04-06 8:35 ` Christian Borntraeger
2018-03-29 9:52 ` Ming Lei
2018-03-29 10:11 ` Christian Borntraeger
2018-03-29 10:12 ` Christian Borntraeger
2018-03-29 10:13 ` Ming Lei