All of lore.kernel.org
 help / color / mirror / Atom feed
* 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk
@ 2017-11-17 14:42 Christian Borntraeger
  2017-11-20 19:20 ` Bart Van Assche
  2017-11-20 19:20 ` 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk Bart Van Assche
  0 siblings, 2 replies; 96+ messages in thread
From: Christian Borntraeger @ 2017-11-17 14:42 UTC (permalink / raw)
  To: Jens Axboe, Michael S. Tsirkin, Jason Wang, linux-block, virtualization
  Cc: Bart Van Assche

When doing CPU hotplug in a KVM guest with virtio-blk I get warnings like:
[  747.652408] ------------[ cut here ]------------
[  747.652410] WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 __blk_mq_run_hw_queue+0xd4/0x100
[  747.652410] Modules linked in: dm_multipath
[  747.652412] CPU: 4 PID: 2895 Comm: kworker/4:1H Tainted: G        W       4.14.0+ #191
[  747.652412] Hardware name: IBM 2964 NC9 704 (KVM/Linux)
[  747.652414] Workqueue: kblockd blk_mq_run_work_fn
[  747.652414] task: 0000000060680000 task.stack: 000000005ea30000
[  747.652415] Krnl PSW : 0704f00180000000 0000000000505864 (__blk_mq_run_hw_queue+0xd4/0x100)
[  747.652417]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:3 PM:0 RI:0 EA:3
[  747.652417] Krnl GPRS: 0000000000000010 00000000000000ff 000000005cbec400 0000000000000000
[  747.652418]            0000000063709120 0000000000000000 0000000063709500 0000000059fa44b0
[  747.652418]            0000000059fa4480 0000000000000000 000000006370f700 0000000063709100
[  747.652419]            000000005cbec500 0000000000970948 000000005ea33d80 000000005ea33d48
[  747.652423] Krnl Code: 0000000000505854: ebaff0a00004        lmg     %r10,%r15,160(%r15)
           000000000050585a: c0f4ffe690d3       brcl    15,1d7a00
          #0000000000505860: a7f40001           brc     15,505862
          >0000000000505864: 581003b0           l       %r1,944
           0000000000505868: c01b001fff00       nilf    %r1,2096896
           000000000050586e: a784ffdb           brc     8,505824
           0000000000505872: a7f40001           brc     15,505874
           0000000000505876: 9120218f           tm      399(%r2),32
[  747.652435] Call Trace:
[  747.652435] ([<0000000063709600>] 0x63709600)
[  747.652436]  [<0000000000187bcc>] process_one_work+0x264/0x4b8 
[  747.652438]  [<0000000000187e78>] worker_thread+0x58/0x4f8 
[  747.652439]  [<000000000018ee94>] kthread+0x144/0x168 
[  747.652439]  [<00000000008f8a62>] kernel_thread_starter+0x6/0xc 
[  747.652440]  [<00000000008f8a5c>] kernel_thread_starter+0x0/0xc 
[  747.652440] Last Breaking-Event-Address:
[  747.652441]  [<0000000000505860>] __blk_mq_run_hw_queue+0xd0/0x100
[  747.652442] ---[ end trace 4a001a80379b18ba ]---
[  747.652450] ------------[ cut here ]------------


This is 

b7a71e66d (Jens Axboe                2017-08-01 09:28:24 -0600 1141)     * are mapped to it.
b7a71e66d (Jens Axboe                2017-08-01 09:28:24 -0600 1142)     */
6a83e74d2 (Bart Van Assche           2016-11-02 10:09:51 -0600 1143)    WARN_ON(!cpumask_test_cpu(raw_smp_processor_id(), hctx->cpumask) &&
6a83e74d2 (Bart Van Assche           2016-11-02 10:09:51 -0600 1144)            cpu_online(hctx->next_cpu));
6a83e74d2 (Bart Van Assche           2016-11-02 10:09:51 -0600 1145) 
b7a71e66d (Jens Axboe                2017-08-01 09:28:24 -0600 1146)    /*
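
(The lines above are git blame output; a command along these lines should
reproduce it, with the -L line range adjusted to the tree at hand:)

  git blame -L 1141,1146 block/blk-mq.c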


Is this a known issue?

Christian

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk
  2017-11-17 14:42 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk Christian Borntraeger
@ 2017-11-20 19:20 ` Bart Van Assche
  2017-11-20 19:29   ` Christian Borntraeger
  2017-11-20 19:29   ` Christian Borntraeger
  2017-11-20 19:20 ` 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk Bart Van Assche
  1 sibling, 2 replies; 96+ messages in thread
From: Bart Van Assche @ 2017-11-20 19:20 UTC (permalink / raw)
  To: virtualization, linux-block, mst, borntraeger, axboe, jasowang

On Fri, 2017-11-17 at 15:42 +0100, Christian Borntraeger wrote:
> This is 
> 
> b7a71e66d (Jens Axboe                2017-08-01 09:28:24 -0600 1141)     * are mapped to it.
> b7a71e66d (Jens Axboe                2017-08-01 09:28:24 -0600 1142)     */
> 6a83e74d2 (Bart Van Assche           2016-11-02 10:09:51 -0600 1143)    WARN_ON(!cpumask_test_cpu(raw_smp_processor_id(), hctx->cpumask) &&
> 6a83e74d2 (Bart Van Assche           2016-11-02 10:09:51 -0600 1144)            cpu_online(hctx->next_cpu));
> 6a83e74d2 (Bart Van Assche           2016-11-02 10:09:51 -0600 1145) 
> b7a71e66d (Jens Axboe                2017-08-01 09:28:24 -0600 1146)    /*

Did you really try to figure out when the code that reported the warning
was introduced? I think that warning was introduced through the following
commit:

commit fd1270d5df6a005e1248e87042159a799cc4b2c9
Date:   Wed Apr 16 09:23:48 2014 -0600

    blk-mq: don't use preempt_count() to check for right CPU
     
    UP or CONFIG_PREEMPT_NONE will return 0, and what we really
    want to check is whether or not we are on the right CPU.
    So don't make PREEMPT part of this, just test the CPU in
    the mask directly.

Anyway, I think that warning is appropriate and useful. So the next step
is to figure out what work item was involved and why that work item got
executed on the wrong CPU.

Bart.

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk
  2017-11-20 19:20 ` Bart Van Assche
  2017-11-20 19:29   ` Christian Borntraeger
@ 2017-11-20 19:29   ` Christian Borntraeger
  2017-11-20 19:42     ` Jens Axboe
  2017-11-20 19:42     ` Jens Axboe
  1 sibling, 2 replies; 96+ messages in thread
From: Christian Borntraeger @ 2017-11-20 19:29 UTC (permalink / raw)
  To: Bart Van Assche, virtualization, linux-block, mst, axboe, jasowang



On 11/20/2017 08:20 PM, Bart Van Assche wrote:
> On Fri, 2017-11-17 at 15:42 +0100, Christian Borntraeger wrote:
>> This is 
>>
>> b7a71e66d (Jens Axboe                2017-08-01 09:28:24 -0600 1141)     * are mapped to it.
>> b7a71e66d (Jens Axboe                2017-08-01 09:28:24 -0600 1142)     */
>> 6a83e74d2 (Bart Van Assche           2016-11-02 10:09:51 -0600 1143)    WARN_ON(!cpumask_test_cpu(raw_smp_processor_id(), hctx->cpumask) &&
>> 6a83e74d2 (Bart Van Assche           2016-11-02 10:09:51 -0600 1144)            cpu_online(hctx->next_cpu));
>> 6a83e74d2 (Bart Van Assche           2016-11-02 10:09:51 -0600 1145) 
>> b7a71e66d (Jens Axboe                2017-08-01 09:28:24 -0600 1146)    /*
> 
> Did you really try to figure out when the code that reported the warning
> was introduced? I think that warning was introduced through the following
> commit:

This was more a cut'n'paste to show which warning triggered since line numbers are somewhat volatile.

> 
> commit fd1270d5df6a005e1248e87042159a799cc4b2c9
> Date:   Wed Apr 16 09:23:48 2014 -0600
> 
>     blk-mq: don't use preempt_count() to check for right CPU
>      
>     UP or CONFIG_PREEMPT_NONE will return 0, and what we really
>     want to check is whether or not we are on the right CPU.
>     So don't make PREEMPT part of this, just test the CPU in
>     the mask directly.
> 
> Anyway, I think that warning is appropriate and useful. So the next step
> is to figure out what work item was involved and why that work item got
> executed on the wrong CPU.

It seems to be related to virtio-blk (it is triggered by fio on such disks). Your comment basically
says "no, this is not a known issue" then :-)
I will try to take a dump to find out the work item.
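
Another thing that might help is the workqueue tracepoints, to see which CPU the
kblockd work was queued for and where it actually ran; roughly (paths assume
debugfs/tracefs is mounted at /sys/kernel/debug):

  # enable the queue/execute events for all workqueues
  echo 1 > /sys/kernel/debug/tracing/events/workqueue/workqueue_queue_work/enable
  echo 1 > /sys/kernel/debug/tracing/events/workqueue/workqueue_execute_start/enable
  # the trace lines carry the executing CPU; queue_work also prints req_cpu/cpu
  cat /sys/kernel/debug/tracing/trace_pipe | grep blk_mq_run_work_fn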

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk
  2017-11-20 19:29   ` Christian Borntraeger
  2017-11-20 19:42     ` Jens Axboe
@ 2017-11-20 19:42     ` Jens Axboe
  2017-11-20 20:49       ` Christian Borntraeger
  2017-11-20 20:49         ` Christian Borntraeger
  1 sibling, 2 replies; 96+ messages in thread
From: Jens Axboe @ 2017-11-20 19:42 UTC (permalink / raw)
  To: Christian Borntraeger, Bart Van Assche, virtualization,
	linux-block, mst, jasowang

On 11/20/2017 12:29 PM, Christian Borntraeger wrote:
> 
> 
> On 11/20/2017 08:20 PM, Bart Van Assche wrote:
>> On Fri, 2017-11-17 at 15:42 +0100, Christian Borntraeger wrote:
>>> This is 
>>>
>>> b7a71e66d (Jens Axboe                2017-08-01 09:28:24 -0600 1141)     * are mapped to it.
>>> b7a71e66d (Jens Axboe                2017-08-01 09:28:24 -0600 1142)     */
>>> 6a83e74d2 (Bart Van Assche           2016-11-02 10:09:51 -0600 1143)    WARN_ON(!cpumask_test_cpu(raw_smp_processor_id(), hctx->cpumask) &&
>>> 6a83e74d2 (Bart Van Assche           2016-11-02 10:09:51 -0600 1144)            cpu_online(hctx->next_cpu));
>>> 6a83e74d2 (Bart Van Assche           2016-11-02 10:09:51 -0600 1145) 
>>> b7a71e66d (Jens Axboe                2017-08-01 09:28:24 -0600 1146)    /*
>>
>> Did you really try to figure out when the code that reported the warning
>> was introduced? I think that warning was introduced through the following
>> commit:
> 
> This was more a cut'n'paste to show which warning triggered since line numbers are somewhat volatile.
> 
>>
>> commit fd1270d5df6a005e1248e87042159a799cc4b2c9
>> Date:   Wed Apr 16 09:23:48 2014 -0600
>>
>>     blk-mq: don't use preempt_count() to check for right CPU
>>      
>>     UP or CONFIG_PREEMPT_NONE will return 0, and what we really
>>     want to check is whether or not we are on the right CPU.
>>     So don't make PREEMPT part of this, just test the CPU in
>>     the mask directly.
>>
>> Anyway, I think that warning is appropriate and useful. So the next step
>> is to figure out what work item was involved and why that work item got
>> executed on the wrong CPU.
> 
> It seems to be related to virtio-blk (is triggered by fio on such disks). Your comment basically
> says: "no this is not a known issue" then :-)
> I will try to take a dump to find out the work item

blk-mq does not attempt to freeze/sync existing work if a CPU goes away,
and we reconfigure the mappings. So I don't think the above is unexpected,
if you are doing CPU hot unplug while running a fio job.

While it's a bit annoying that we trigger the WARN_ON() for a condition
that can happen, we're basically interested in it if it triggers for
normal operations.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk
  2017-11-20 19:42     ` Jens Axboe
@ 2017-11-20 20:49         ` Christian Borntraeger
  2017-11-20 20:49         ` Christian Borntraeger
  1 sibling, 0 replies; 96+ messages in thread
From: Christian Borntraeger @ 2017-11-20 20:49 UTC (permalink / raw)
  To: Jens Axboe, Bart Van Assche, virtualization, linux-block, mst,
	jasowang, linux-kernel



On 11/20/2017 08:42 PM, Jens Axboe wrote:
> On 11/20/2017 12:29 PM, Christian Borntraeger wrote:
>>
>>
>> On 11/20/2017 08:20 PM, Bart Van Assche wrote:
>>> On Fri, 2017-11-17 at 15:42 +0100, Christian Borntraeger wrote:
>>>> This is 
>>>>
>>>> b7a71e66d (Jens Axboe                2017-08-01 09:28:24 -0600 1141)     * are mapped to it.
>>>> b7a71e66d (Jens Axboe                2017-08-01 09:28:24 -0600 1142)     */
>>>> 6a83e74d2 (Bart Van Assche           2016-11-02 10:09:51 -0600 1143)    WARN_ON(!cpumask_test_cpu(raw_smp_processor_id(), hctx->cpumask) &&
>>>> 6a83e74d2 (Bart Van Assche           2016-11-02 10:09:51 -0600 1144)            cpu_online(hctx->next_cpu));
>>>> 6a83e74d2 (Bart Van Assche           2016-11-02 10:09:51 -0600 1145) 
>>>> b7a71e66d (Jens Axboe                2017-08-01 09:28:24 -0600 1146)    /*
>>>
>>> Did you really try to figure out when the code that reported the warning
>>> was introduced? I think that warning was introduced through the following
>>> commit:
>>
>> This was more a cut'n'paste to show which warning triggered since line numbers are somewhat volatile.
>>
>>>
>>> commit fd1270d5df6a005e1248e87042159a799cc4b2c9
>>> Date:   Wed Apr 16 09:23:48 2014 -0600
>>>
>>>     blk-mq: don't use preempt_count() to check for right CPU
>>>      
>>>     UP or CONFIG_PREEMPT_NONE will return 0, and what we really
>>>     want to check is whether or not we are on the right CPU.
>>>     So don't make PREEMPT part of this, just test the CPU in
>>>     the mask directly.
>>>
>>> Anyway, I think that warning is appropriate and useful. So the next step
>>> is to figure out what work item was involved and why that work item got
>>> executed on the wrong CPU.
>>
>> It seems to be related to virtio-blk (is triggered by fio on such disks). Your comment basically
>> says: "no this is not a known issue" then :-)
>> I will try to take a dump to find out the work item
> 
> blk-mq does not attempt to freeze/sync existing work if a CPU goes away,
> and we reconfigure the mappings. So I don't think the above is unexpected,
> if you are doing CPU hot unplug while running a fio job.

I did a CPU hotplug (adding a CPU) and I started fio AFTER that.
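
For reference, the sequence is roughly the following (device name and fio
parameters are only illustrative, not my exact job file):

  # host: make the additional vCPU available to the guest
  virsh setvcpu zhyp137 2
  # guest: bring the new CPU online first, then start fio on a virtio-blk disk
  chcpu -e 1
  fio --name=test --filename=/dev/vdb --rw=randread --bs=4k \
      --ioengine=libaio --iodepth=32 --runtime=30 --time_based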

 
> While it's a bit annoying that we trigger the WARN_ON() for a condition
> that can happen, we're basically interested in it if it triggers for
> normal operations.

I think we should never trigger a WARN_ON for conditions that can happen. I know some
folks enable panic_on_warn to detect/avoid data integrity issues. FWIW, this also seems
to happen with 4.13 and 4.12.

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk
  2017-11-20 20:49         ` Christian Borntraeger
@ 2017-11-20 20:52           ` Jens Axboe
  -1 siblings, 0 replies; 96+ messages in thread
From: Jens Axboe @ 2017-11-20 20:52 UTC (permalink / raw)
  To: Christian Borntraeger, Bart Van Assche, virtualization,
	linux-block, mst, jasowang, linux-kernel

On 11/20/2017 01:49 PM, Christian Borntraeger wrote:
> 
> 
> On 11/20/2017 08:42 PM, Jens Axboe wrote:
>> On 11/20/2017 12:29 PM, Christian Borntraeger wrote:
>>>
>>>
>>> On 11/20/2017 08:20 PM, Bart Van Assche wrote:
>>>> On Fri, 2017-11-17 at 15:42 +0100, Christian Borntraeger wrote:
>>>>> This is 
>>>>>
>>>>> b7a71e66d (Jens Axboe                2017-08-01 09:28:24 -0600 1141)     * are mapped to it.
>>>>> b7a71e66d (Jens Axboe                2017-08-01 09:28:24 -0600 1142)     */
>>>>> 6a83e74d2 (Bart Van Assche           2016-11-02 10:09:51 -0600 1143)    WARN_ON(!cpumask_test_cpu(raw_smp_processor_id(), hctx->cpumask) &&
>>>>> 6a83e74d2 (Bart Van Assche           2016-11-02 10:09:51 -0600 1144)            cpu_online(hctx->next_cpu));
>>>>> 6a83e74d2 (Bart Van Assche           2016-11-02 10:09:51 -0600 1145) 
>>>>> b7a71e66d (Jens Axboe                2017-08-01 09:28:24 -0600 1146)    /*
>>>>
>>>> Did you really try to figure out when the code that reported the warning
>>>> was introduced? I think that warning was introduced through the following
>>>> commit:
>>>
>>> This was more a cut'n'paste to show which warning triggered since line numbers are somewhat volatile.
>>>
>>>>
>>>> commit fd1270d5df6a005e1248e87042159a799cc4b2c9
>>>> Date:   Wed Apr 16 09:23:48 2014 -0600
>>>>
>>>>     blk-mq: don't use preempt_count() to check for right CPU
>>>>      
>>>>     UP or CONFIG_PREEMPT_NONE will return 0, and what we really
>>>>     want to check is whether or not we are on the right CPU.
>>>>     So don't make PREEMPT part of this, just test the CPU in
>>>>     the mask directly.
>>>>
>>>> Anyway, I think that warning is appropriate and useful. So the next step
>>>> is to figure out what work item was involved and why that work item got
>>>> executed on the wrong CPU.
>>>
>>> It seems to be related to virtio-blk (is triggered by fio on such disks). Your comment basically
>>> says: "no this is not a known issue" then :-)
>>> I will try to take a dump to find out the work item
>>
>> blk-mq does not attempt to freeze/sync existing work if a CPU goes away,
>> and we reconfigure the mappings. So I don't think the above is unexpected,
>> if you are doing CPU hot unplug while running a fio job.
> 
> I did a cpu hot plug (adding a CPU) and I started fio AFTER that.

OK, that's different, we should not be triggering a warning for that.
What does your machine/virtblk topology look like in terms of CPUs,
nr of queues for virtblk, etc.?

You can probably get this info the easiest by just doing a:

# find /sys/kernel/debug/block/virtX

replace virtX with your virtblk device name. Generate this info both
before and after the hotplug event.
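
Something along these lines should capture it (assuming the device is vda and
debugfs is mounted at /sys/kernel/debug):

  mount -t debugfs none /sys/kernel/debug 2>/dev/null
  find /sys/kernel/debug/block/vda | sort > before.txt
  chcpu -e 1        # or however the CPU gets hot-added in the guest
  find /sys/kernel/debug/block/vda | sort > after.txt
  diff -u before.txt after.txt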

>> While it's a bit annoying that we trigger the WARN_ON() for a condition
>> that can happen, we're basically interested in it if it triggers for
>> normal operations.
> 
> I think we should never trigger a WARN_ON on conditions that can
> happen. I know some folks enabling panic_on_warn to detect/avoid data
> integrity issues. FWIW, this also seems to happen wit 4.13 and 4.12

It's not supposed to happen for your case, so I'd say it's been useful.
It's not a critical thing, but it is something that should not trigger
and we need to look into why it did and fix it up.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk
  2017-11-20 20:52           ` Jens Axboe
@ 2017-11-21  8:35             ` Christian Borntraeger
  -1 siblings, 0 replies; 96+ messages in thread
From: Christian Borntraeger @ 2017-11-21  8:35 UTC (permalink / raw)
  To: Jens Axboe, Bart Van Assche, virtualization, linux-block, mst,
	jasowang, linux-kernel



On 11/20/2017 09:52 PM, Jens Axboe wrote:
> On 11/20/2017 01:49 PM, Christian Borntraeger wrote:
>>
>>
>> On 11/20/2017 08:42 PM, Jens Axboe wrote:
>>> On 11/20/2017 12:29 PM, Christian Borntraeger wrote:
>>>>
>>>>
>>>> On 11/20/2017 08:20 PM, Bart Van Assche wrote:
>>>>> On Fri, 2017-11-17 at 15:42 +0100, Christian Borntraeger wrote:
>>>>>> This is 
>>>>>>
>>>>>> b7a71e66d (Jens Axboe                2017-08-01 09:28:24 -0600 1141)     * are mapped to it.
>>>>>> b7a71e66d (Jens Axboe                2017-08-01 09:28:24 -0600 1142)     */
>>>>>> 6a83e74d2 (Bart Van Assche           2016-11-02 10:09:51 -0600 1143)    WARN_ON(!cpumask_test_cpu(raw_smp_processor_id(), hctx->cpumask) &&
>>>>>> 6a83e74d2 (Bart Van Assche           2016-11-02 10:09:51 -0600 1144)            cpu_online(hctx->next_cpu));
>>>>>> 6a83e74d2 (Bart Van Assche           2016-11-02 10:09:51 -0600 1145) 
>>>>>> b7a71e66d (Jens Axboe                2017-08-01 09:28:24 -0600 1146)    /*
>>>>>
>>>>> Did you really try to figure out when the code that reported the warning
>>>>> was introduced? I think that warning was introduced through the following
>>>>> commit:
>>>>
>>>> This was more a cut'n'paste to show which warning triggered since line numbers are somewhat volatile.
>>>>
>>>>>
>>>>> commit fd1270d5df6a005e1248e87042159a799cc4b2c9
>>>>> Date:   Wed Apr 16 09:23:48 2014 -0600
>>>>>
>>>>>     blk-mq: don't use preempt_count() to check for right CPU
>>>>>      
>>>>>     UP or CONFIG_PREEMPT_NONE will return 0, and what we really
>>>>>     want to check is whether or not we are on the right CPU.
>>>>>     So don't make PREEMPT part of this, just test the CPU in
>>>>>     the mask directly.
>>>>>
>>>>> Anyway, I think that warning is appropriate and useful. So the next step
>>>>> is to figure out what work item was involved and why that work item got
>>>>> executed on the wrong CPU.
>>>>
>>>> It seems to be related to virtio-blk (is triggered by fio on such disks). Your comment basically
>>>> says: "no this is not a known issue" then :-)
>>>> I will try to take a dump to find out the work item
>>>
>>> blk-mq does not attempt to freeze/sync existing work if a CPU goes away,
>>> and we reconfigure the mappings. So I don't think the above is unexpected,
>>> if you are doing CPU hot unplug while running a fio job.
>>
>> I did a cpu hot plug (adding a CPU) and I started fio AFTER that.
> 
> OK, that's different, we should not be triggering a warning for that.
> What does your machine/virtblk topology look like in terms of CPUS,
> nr of queues for virtblk, etc?

FWIW, 4.11 does work; 4.12 and later are broken.

> 
> You can probably get this info the easiest by just doing a:
> 
> # find /sys/kernel/debug/block/virtX
> 
> replace virtX with your virtblk device name. Generate this info both
> before and after the hotplug event.

It happens in all variants (1 CPU to 2, or 16 to 17, and independent of the
number of disks).

What I can see is that the block layer does not yet see the new CPU:

[root@zhyp137 ~]# find /sys/kernel/debug/block/vd* 
/sys/kernel/debug/block/vda
/sys/kernel/debug/block/vda/hctx0
/sys/kernel/debug/block/vda/hctx0/cpu0
/sys/kernel/debug/block/vda/hctx0/cpu0/completed
/sys/kernel/debug/block/vda/hctx0/cpu0/merged
/sys/kernel/debug/block/vda/hctx0/cpu0/dispatched
/sys/kernel/debug/block/vda/hctx0/cpu0/rq_list
/sys/kernel/debug/block/vda/hctx0/active
/sys/kernel/debug/block/vda/hctx0/run
/sys/kernel/debug/block/vda/hctx0/queued
/sys/kernel/debug/block/vda/hctx0/dispatched
/sys/kernel/debug/block/vda/hctx0/io_poll
/sys/kernel/debug/block/vda/hctx0/sched_tags_bitmap
/sys/kernel/debug/block/vda/hctx0/sched_tags
/sys/kernel/debug/block/vda/hctx0/tags_bitmap
/sys/kernel/debug/block/vda/hctx0/tags
/sys/kernel/debug/block/vda/hctx0/ctx_map
/sys/kernel/debug/block/vda/hctx0/busy
/sys/kernel/debug/block/vda/hctx0/dispatch
/sys/kernel/debug/block/vda/hctx0/flags
/sys/kernel/debug/block/vda/hctx0/state
/sys/kernel/debug/block/vda/sched
/sys/kernel/debug/block/vda/sched/dispatch
/sys/kernel/debug/block/vda/sched/starved
/sys/kernel/debug/block/vda/sched/batching
/sys/kernel/debug/block/vda/sched/write_next_rq
/sys/kernel/debug/block/vda/sched/write_fifo_list
/sys/kernel/debug/block/vda/sched/read_next_rq
/sys/kernel/debug/block/vda/sched/read_fifo_list
/sys/kernel/debug/block/vda/write_hints
/sys/kernel/debug/block/vda/state
/sys/kernel/debug/block/vda/requeue_list
/sys/kernel/debug/block/vda/poll_stat

--> in host virsh setvcpu zhyp137 2

[root@zhyp137 ~]# chcpu -e 1
CPU 1 enabled
[root@zhyp137 ~]# find /sys/kernel/debug/block/vd* 
/sys/kernel/debug/block/vda
/sys/kernel/debug/block/vda/hctx0
/sys/kernel/debug/block/vda/hctx0/cpu0
/sys/kernel/debug/block/vda/hctx0/cpu0/completed
/sys/kernel/debug/block/vda/hctx0/cpu0/merged
/sys/kernel/debug/block/vda/hctx0/cpu0/dispatched
/sys/kernel/debug/block/vda/hctx0/cpu0/rq_list
/sys/kernel/debug/block/vda/hctx0/active
/sys/kernel/debug/block/vda/hctx0/run
/sys/kernel/debug/block/vda/hctx0/queued
/sys/kernel/debug/block/vda/hctx0/dispatched
/sys/kernel/debug/block/vda/hctx0/io_poll
/sys/kernel/debug/block/vda/hctx0/sched_tags_bitmap
/sys/kernel/debug/block/vda/hctx0/sched_tags
/sys/kernel/debug/block/vda/hctx0/tags_bitmap
/sys/kernel/debug/block/vda/hctx0/tags
/sys/kernel/debug/block/vda/hctx0/ctx_map
/sys/kernel/debug/block/vda/hctx0/busy
/sys/kernel/debug/block/vda/hctx0/dispatch
/sys/kernel/debug/block/vda/hctx0/flags
/sys/kernel/debug/block/vda/hctx0/state
/sys/kernel/debug/block/vda/sched
/sys/kernel/debug/block/vda/sched/dispatch
/sys/kernel/debug/block/vda/sched/starved
/sys/kernel/debug/block/vda/sched/batching
/sys/kernel/debug/block/vda/sched/write_next_rq
/sys/kernel/debug/block/vda/sched/write_fifo_list
/sys/kernel/debug/block/vda/sched/read_next_rq
/sys/kernel/debug/block/vda/sched/read_fifo_list
/sys/kernel/debug/block/vda/write_hints
/sys/kernel/debug/block/vda/state
/sys/kernel/debug/block/vda/requeue_list
/sys/kernel/debug/block/vda/poll_stat



If I already start with 2 CPUs, it looks like the following (all cpu1 entries are new):

[root@zhyp137 ~]# find /sys/kernel/debug/block/vd* 
/sys/kernel/debug/block/vda
/sys/kernel/debug/block/vda/hctx0
/sys/kernel/debug/block/vda/hctx0/cpu1
/sys/kernel/debug/block/vda/hctx0/cpu1/completed
/sys/kernel/debug/block/vda/hctx0/cpu1/merged
/sys/kernel/debug/block/vda/hctx0/cpu1/dispatched
/sys/kernel/debug/block/vda/hctx0/cpu1/rq_list
/sys/kernel/debug/block/vda/hctx0/cpu0
/sys/kernel/debug/block/vda/hctx0/cpu0/completed
/sys/kernel/debug/block/vda/hctx0/cpu0/merged
/sys/kernel/debug/block/vda/hctx0/cpu0/dispatched
/sys/kernel/debug/block/vda/hctx0/cpu0/rq_list
/sys/kernel/debug/block/vda/hctx0/active
/sys/kernel/debug/block/vda/hctx0/run
/sys/kernel/debug/block/vda/hctx0/queued
/sys/kernel/debug/block/vda/hctx0/dispatched
/sys/kernel/debug/block/vda/hctx0/io_poll
/sys/kernel/debug/block/vda/hctx0/sched_tags_bitmap
/sys/kernel/debug/block/vda/hctx0/sched_tags
/sys/kernel/debug/block/vda/hctx0/tags_bitmap
/sys/kernel/debug/block/vda/hctx0/tags
/sys/kernel/debug/block/vda/hctx0/ctx_map
/sys/kernel/debug/block/vda/hctx0/busy
/sys/kernel/debug/block/vda/hctx0/dispatch
/sys/kernel/debug/block/vda/hctx0/flags
/sys/kernel/debug/block/vda/hctx0/state
/sys/kernel/debug/block/vda/sched
/sys/kernel/debug/block/vda/sched/dispatch
/sys/kernel/debug/block/vda/sched/starved
/sys/kernel/debug/block/vda/sched/batching
/sys/kernel/debug/block/vda/sched/write_next_rq
/sys/kernel/debug/block/vda/sched/write_fifo_list
/sys/kernel/debug/block/vda/sched/read_next_rq
/sys/kernel/debug/block/vda/sched/read_fifo_list
/sys/kernel/debug/block/vda/write_hints
/sys/kernel/debug/block/vda/state
/sys/kernel/debug/block/vda/requeue_list
/sys/kernel/debug/block/vda/poll_stat

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk
  2017-11-21  8:35             ` Christian Borntraeger
@ 2017-11-21  9:50               ` Christian Borntraeger
  -1 siblings, 0 replies; 96+ messages in thread
From: Christian Borntraeger @ 2017-11-21  9:50 UTC (permalink / raw)
  To: Jens Axboe, Bart Van Assche, virtualization, linux-block, mst,
	jasowang, linux-kernel



On 11/21/2017 09:35 AM, Christian Borntraeger wrote:
> 
> 
> On 11/20/2017 09:52 PM, Jens Axboe wrote:
>> On 11/20/2017 01:49 PM, Christian Borntraeger wrote:
>>>
>>>
>>> On 11/20/2017 08:42 PM, Jens Axboe wrote:
>>>> On 11/20/2017 12:29 PM, Christian Borntraeger wrote:
>>>>>
>>>>>
>>>>> On 11/20/2017 08:20 PM, Bart Van Assche wrote:
>>>>>> On Fri, 2017-11-17 at 15:42 +0100, Christian Borntraeger wrote:
>>>>>>> This is 
>>>>>>>
>>>>>>> b7a71e66d (Jens Axboe                2017-08-01 09:28:24 -0600 1141)     * are mapped to it.
>>>>>>> b7a71e66d (Jens Axboe                2017-08-01 09:28:24 -0600 1142)     */
>>>>>>> 6a83e74d2 (Bart Van Assche           2016-11-02 10:09:51 -0600 1143)    WARN_ON(!cpumask_test_cpu(raw_smp_processor_id(), hctx->cpumask) &&
>>>>>>> 6a83e74d2 (Bart Van Assche           2016-11-02 10:09:51 -0600 1144)            cpu_online(hctx->next_cpu));
>>>>>>> 6a83e74d2 (Bart Van Assche           2016-11-02 10:09:51 -0600 1145) 
>>>>>>> b7a71e66d (Jens Axboe                2017-08-01 09:28:24 -0600 1146)    /*
>>>>>>
>>>>>> Did you really try to figure out when the code that reported the warning
>>>>>> was introduced? I think that warning was introduced through the following
>>>>>> commit:
>>>>>
>>>>> This was more a cut'n'paste to show which warning triggered since line numbers are somewhat volatile.
>>>>>
>>>>>>
>>>>>> commit fd1270d5df6a005e1248e87042159a799cc4b2c9
>>>>>> Date:   Wed Apr 16 09:23:48 2014 -0600
>>>>>>
>>>>>>     blk-mq: don't use preempt_count() to check for right CPU
>>>>>>      
>>>>>>     UP or CONFIG_PREEMPT_NONE will return 0, and what we really
>>>>>>     want to check is whether or not we are on the right CPU.
>>>>>>     So don't make PREEMPT part of this, just test the CPU in
>>>>>>     the mask directly.
>>>>>>
>>>>>> Anyway, I think that warning is appropriate and useful. So the next step
>>>>>> is to figure out what work item was involved and why that work item got
>>>>>> executed on the wrong CPU.
>>>>>
>>>>> It seems to be related to virtio-blk (is triggered by fio on such disks). Your comment basically
>>>>> says: "no this is not a known issue" then :-)
>>>>> I will try to take a dump to find out the work item
>>>>
>>>> blk-mq does not attempt to freeze/sync existing work if a CPU goes away,
>>>> and we reconfigure the mappings. So I don't think the above is unexpected,
>>>> if you are doing CPU hot unplug while running a fio job.
>>>
>>> I did a cpu hot plug (adding a CPU) and I started fio AFTER that.
>>
>> OK, that's different, we should not be triggering a warning for that.
>> What does your machine/virtblk topology look like in terms of CPUS,
>> nr of queues for virtblk, etc?
> 
> FWIW, 4.11 does work, 4.12 and later is broken.

In fact: 4.12 is fine, 4.12.14 is broken.
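
That narrows it down to the 4.12 stable branch, so a bisect between those two tags
should pinpoint the offending backport; roughly:

  # in a linux-stable tree: v4.12.14 is bad, v4.12 is good
  git bisect start v4.12.14 v4.12
  # at each step: build and boot, hot-add a CPU, run fio, check dmesg for the
  # blk-mq WARNING, then mark the kernel accordingly
  git bisect good    # or: git bisect bad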

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk
@ 2017-11-21  9:50               ` Christian Borntraeger
  0 siblings, 0 replies; 96+ messages in thread
From: Christian Borntraeger @ 2017-11-21  9:50 UTC (permalink / raw)
  To: Jens Axboe, Bart Van Assche, virtualization, linux-block, mst,
	jasowang, linux-kernel



On 11/21/2017 09:35 AM, Christian Borntraeger wrote:
> 
> 
> On 11/20/2017 09:52 PM, Jens Axboe wrote:
>> On 11/20/2017 01:49 PM, Christian Borntraeger wrote:
>>>
>>>
>>> On 11/20/2017 08:42 PM, Jens Axboe wrote:
>>>> On 11/20/2017 12:29 PM, Christian Borntraeger wrote:
>>>>>
>>>>>
>>>>> On 11/20/2017 08:20 PM, Bart Van Assche wrote:
>>>>>> On Fri, 2017-11-17 at 15:42 +0100, Christian Borntraeger wrote:
>>>>>>> This is 
>>>>>>>
>>>>>>> b7a71e66d (Jens Axboe                2017-08-01 09:28:24 -0600 1141)     * are mapped to it.
>>>>>>> b7a71e66d (Jens Axboe                2017-08-01 09:28:24 -0600 1142)     */
>>>>>>> 6a83e74d2 (Bart Van Assche           2016-11-02 10:09:51 -0600 1143)    WARN_ON(!cpumask_test_cpu(raw_smp_processor_id(), hctx->cpumask) &&
>>>>>>> 6a83e74d2 (Bart Van Assche           2016-11-02 10:09:51 -0600 1144)            cpu_online(hctx->next_cpu));
>>>>>>> 6a83e74d2 (Bart Van Assche           2016-11-02 10:09:51 -0600 1145) 
>>>>>>> b7a71e66d (Jens Axboe                2017-08-01 09:28:24 -0600 1146)    /*
>>>>>>
>>>>>> Did you really try to figure out when the code that reported the warning
>>>>>> was introduced? I think that warning was introduced through the following
>>>>>> commit:
>>>>>
>>>>> This was more a cut'n'paste to show which warning triggered since line numbers are somewhat volatile.
>>>>>
>>>>>>
>>>>>> commit fd1270d5df6a005e1248e87042159a799cc4b2c9
>>>>>> Date:   Wed Apr 16 09:23:48 2014 -0600
>>>>>>
>>>>>>     blk-mq: don't use preempt_count() to check for right CPU
>>>>>>      
>>>>>>     UP or CONFIG_PREEMPT_NONE will return 0, and what we really
>>>>>>     want to check is whether or not we are on the right CPU.
>>>>>>     So don't make PREEMPT part of this, just test the CPU in
>>>>>>     the mask directly.
>>>>>>
>>>>>> Anyway, I think that warning is appropriate and useful. So the next step
>>>>>> is to figure out what work item was involved and why that work item got
>>>>>> executed on the wrong CPU.
>>>>>
>>>>> It seems to be related to virtio-blk (is triggered by fio on such disks). Your comment basically
>>>>> says: "no this is not a known issue" then :-)
>>>>> I will try to take a dump to find out the work item
>>>>
>>>> blk-mq does not attempt to freeze/sync existing work if a CPU goes away,
>>>> and we reconfigure the mappings. So I don't think the above is unexpected,
>>>> if you are doing CPU hot unplug while running a fio job.
>>>
>>> I did a cpu hot plug (adding a CPU) and I started fio AFTER that.
>>
>> OK, that's different, we should not be triggering a warning for that.
>> What does your machine/virtblk topology look like in terms of CPUS,
>> nr of queues for virtblk, etc?
> 
> FWIW, 4.11 does work, 4.12 and later is broken.

In fact: 4.12 is fine, 4.12.14 is broken.

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
  2017-11-21  9:50               ` Christian Borntraeger
@ 2017-11-21 10:14                 ` Christian Borntraeger
  -1 siblings, 0 replies; 96+ messages in thread
From: Christian Borntraeger @ 2017-11-21 10:14 UTC (permalink / raw)
  To: Jens Axboe, Bart Van Assche, virtualization, linux-block, mst,
	jasowang, linux-kernel, Christoph Hellwig



On 11/21/2017 10:50 AM, Christian Borntraeger wrote:
> 
> 
> On 11/21/2017 09:35 AM, Christian Borntraeger wrote:
>>
>>
>> On 11/20/2017 09:52 PM, Jens Axboe wrote:
>>> On 11/20/2017 01:49 PM, Christian Borntraeger wrote:
>>>>
>>>>
>>>> On 11/20/2017 08:42 PM, Jens Axboe wrote:
>>>>> On 11/20/2017 12:29 PM, Christian Borntraeger wrote:
>>>>>>
>>>>>>
>>>>>> On 11/20/2017 08:20 PM, Bart Van Assche wrote:
>>>>>>> On Fri, 2017-11-17 at 15:42 +0100, Christian Borntraeger wrote:
>>>>>>>> This is 
>>>>>>>>
>>>>>>>> b7a71e66d (Jens Axboe                2017-08-01 09:28:24 -0600 1141)     * are mapped to it.
>>>>>>>> b7a71e66d (Jens Axboe                2017-08-01 09:28:24 -0600 1142)     */
>>>>>>>> 6a83e74d2 (Bart Van Assche           2016-11-02 10:09:51 -0600 1143)    WARN_ON(!cpumask_test_cpu(raw_smp_processor_id(), hctx->cpumask) &&
>>>>>>>> 6a83e74d2 (Bart Van Assche           2016-11-02 10:09:51 -0600 1144)            cpu_online(hctx->next_cpu));
>>>>>>>> 6a83e74d2 (Bart Van Assche           2016-11-02 10:09:51 -0600 1145) 
>>>>>>>> b7a71e66d (Jens Axboe                2017-08-01 09:28:24 -0600 1146)    /*
>>>>>>>
>>>>>>> Did you really try to figure out when the code that reported the warning
>>>>>>> was introduced? I think that warning was introduced through the following
>>>>>>> commit:
>>>>>>
>>>>>> This was more a cut'n'paste to show which warning triggered since line numbers are somewhat volatile.
>>>>>>
>>>>>>>
>>>>>>> commit fd1270d5df6a005e1248e87042159a799cc4b2c9
>>>>>>> Date:   Wed Apr 16 09:23:48 2014 -0600
>>>>>>>
>>>>>>>     blk-mq: don't use preempt_count() to check for right CPU
>>>>>>>      
>>>>>>>     UP or CONFIG_PREEMPT_NONE will return 0, and what we really
>>>>>>>     want to check is whether or not we are on the right CPU.
>>>>>>>     So don't make PREEMPT part of this, just test the CPU in
>>>>>>>     the mask directly.
>>>>>>>
>>>>>>> Anyway, I think that warning is appropriate and useful. So the next step
>>>>>>> is to figure out what work item was involved and why that work item got
>>>>>>> executed on the wrong CPU.
>>>>>>
>>>>>> It seems to be related to virtio-blk (is triggered by fio on such disks). Your comment basically
>>>>>> says: "no this is not a known issue" then :-)
>>>>>> I will try to take a dump to find out the work item
>>>>>
>>>>> blk-mq does not attempt to freeze/sync existing work if a CPU goes away,
>>>>> and we reconfigure the mappings. So I don't think the above is unexpected,
>>>>> if you are doing CPU hot unplug while running a fio job.
>>>>
>>>> I did a cpu hot plug (adding a CPU) and I started fio AFTER that.
>>>
>>> OK, that's different, we should not be triggering a warning for that.
>>> What does your machine/virtblk topology look like in terms of CPUS,
>>> nr of queues for virtblk, etc?
>>
>> FWIW, 4.11 does work, 4.12 and later is broken.
> 
> In fact: 4.12 is fine, 4.12.14 is broken.


Bisect points to

1b5a7455d345b223d3a4658a9e5fce985b7998c1 is the first bad commit
commit 1b5a7455d345b223d3a4658a9e5fce985b7998c1
Author: Christoph Hellwig <hch@lst.de>
Date:   Mon Jun 26 12:20:57 2017 +0200

    blk-mq: Create hctx for each present CPU
    
    commit 4b855ad37194f7bdbb200ce7a1c7051fecb56a08 upstream.
    
    Currently we only create hctx for online CPUs, which can lead to a lot
    of churn due to frequent soft offline / online operations.  Instead
    allocate one for each present CPU to avoid this and dramatically simplify
    the code.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Jens Axboe <axboe@kernel.dk>
    Cc: Keith Busch <keith.busch@intel.com>
    Cc: linux-block@vger.kernel.org
    Cc: linux-nvme@lists.infradead.org
    Link: http://lkml.kernel.org/r/20170626102058.10200-3-hch@lst.de
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Cc: Oleksandr Natalenko <oleksandr@natalenko.name>
    Cc: Mike Galbraith <efault@gmx.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

:040000 040000 a61cb023014a7b7a6b9f24ea04fe8ab22299e706 059ba6dc3290c74e0468937348e580cd53f963e7 M	block
:040000 040000 432e719d7e738ffcddfb8fc964544d3b3e0a68f7 f4572aa21b249a851a1b604c148eea109e93b30d M	include





Adding Christoph. FWIW, your patch triggers the following on 4.14 when doing a CPU hotplug (adding a
CPU) and then accessing a virtio-blk device.


[  747.652408] ------------[ cut here ]------------
[  747.652410] WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 __blk_mq_run_hw_queue+0xd4/0x100
[  747.652410] Modules linked in: dm_multipath
[  747.652412] CPU: 4 PID: 2895 Comm: kworker/4:1H Tainted: G        W       4.14.0+ #191
[  747.652412] Hardware name: IBM 2964 NC9 704 (KVM/Linux)
[  747.652414] Workqueue: kblockd blk_mq_run_work_fn
[  747.652414] task: 0000000060680000 task.stack: 000000005ea30000
[  747.652415] Krnl PSW : 0704f00180000000 0000000000505864 (__blk_mq_run_hw_queue+0xd4/0x100)
[  747.652417]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:3 PM:0 RI:0 EA:3
[  747.652417] Krnl GPRS: 0000000000000010 00000000000000ff 000000005cbec400 0000000000000000
[  747.652418]            0000000063709120 0000000000000000 0000000063709500 0000000059fa44b0
[  747.652418]            0000000059fa4480 0000000000000000 000000006370f700 0000000063709100
[  747.652419]            000000005cbec500 0000000000970948 000000005ea33d80 000000005ea33d48
[  747.652423] Krnl Code: 0000000000505854: ebaff0a00004        lmg     %r10,%r15,160(%r15)
           000000000050585a: c0f4ffe690d3       brcl    15,1d7a00
          #0000000000505860: a7f40001           brc     15,505862
          >0000000000505864: 581003b0           l       %r1,944
           0000000000505868: c01b001fff00       nilf    %r1,2096896
           000000000050586e: a784ffdb           brc     8,505824
           0000000000505872: a7f40001           brc     15,505874
           0000000000505876: 9120218f           tm      399(%r2),32
[  747.652435] Call Trace:
[  747.652435] ([<0000000063709600>] 0x63709600)
[  747.652436]  [<0000000000187bcc>] process_one_work+0x264/0x4b8 
[  747.652438]  [<0000000000187e78>] worker_thread+0x58/0x4f8 
[  747.652439]  [<000000000018ee94>] kthread+0x144/0x168 
[  747.652439]  [<00000000008f8a62>] kernel_thread_starter+0x6/0xc 
[  747.652440]  [<00000000008f8a5c>] kernel_thread_starter+0x0/0xc 
[  747.652440] Last Breaking-Event-Address:
[  747.652441]  [<0000000000505860>] __blk_mq_run_hw_queue+0xd0/0x100
[  747.652442] ---[ end trace 4a001a80379b18ba ]---
[  747.652450] ------------[ cut here ]------------

^ permalink raw reply	[flat|nested] 96+ messages in thread
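
(For orientation, a rough sketch of what the bisected commit changes; this is an
illustration only, not the actual blk-mq code. for_each_online_cpu() and
for_each_present_cpu() are real kernel iterators, while map_queue_for_cpu() is a
made-up placeholder for the per-CPU queue mapping blk-mq maintains. A CPU that is
neither online nor present at probe time is covered by neither variant, which is
the situation the warning later exposes.)

#include <linux/cpumask.h>

/* Placeholder: record which hardware queue 'cpu' should use. */
static void map_queue_for_cpu(unsigned int cpu, unsigned int nr_queues)
{
}

/* Roughly the old policy: only CPUs online at init get a mapping. */
static void map_online_only(unsigned int nr_queues)
{
	unsigned int cpu;

	for_each_online_cpu(cpu)
		map_queue_for_cpu(cpu, nr_queues);
}

/* Roughly the policy after the bisected commit: every present CPU. */
static void map_all_present(unsigned int nr_queues)
{
	unsigned int cpu;

	for_each_present_cpu(cpu)
		map_queue_for_cpu(cpu, nr_queues);
}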

* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
  2017-11-21 10:14                 ` Christian Borntraeger
@ 2017-11-21 17:27                   ` Jens Axboe
  -1 siblings, 0 replies; 96+ messages in thread
From: Jens Axboe @ 2017-11-21 17:27 UTC (permalink / raw)
  To: Christian Borntraeger, Bart Van Assche, virtualization,
	linux-block, mst, jasowang, linux-kernel, Christoph Hellwig

On 11/21/2017 03:14 AM, Christian Borntraeger wrote:
> Bisect points to
> 
> 1b5a7455d345b223d3a4658a9e5fce985b7998c1 is the first bad commit
> commit 1b5a7455d345b223d3a4658a9e5fce985b7998c1
> Author: Christoph Hellwig <hch@lst.de>
> Date:   Mon Jun 26 12:20:57 2017 +0200
> 
>     blk-mq: Create hctx for each present CPU
>     
>     commit 4b855ad37194f7bdbb200ce7a1c7051fecb56a08 upstream.
>     
>     Currently we only create hctx for online CPUs, which can lead to a lot
>     of churn due to frequent soft offline / online operations.  Instead
>     allocate one for each present CPU to avoid this and dramatically simplify
>     the code.
>     
>     Signed-off-by: Christoph Hellwig <hch@lst.de>
>     Reviewed-by: Jens Axboe <axboe@kernel.dk>
>     Cc: Keith Busch <keith.busch@intel.com>
>     Cc: linux-block@vger.kernel.org
>     Cc: linux-nvme@lists.infradead.org
>     Link: http://lkml.kernel.org/r/20170626102058.10200-3-hch@lst.de
>     Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
>     Cc: Oleksandr Natalenko <oleksandr@natalenko.name>
>     Cc: Mike Galbraith <efault@gmx.de>
>     Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

I wonder if we're simply not getting the masks updated correctly. I'll
take a look.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
  2017-11-21 17:27                   ` Jens Axboe
@ 2017-11-21 18:09                     ` Jens Axboe
  -1 siblings, 0 replies; 96+ messages in thread
From: Jens Axboe @ 2017-11-21 18:09 UTC (permalink / raw)
  To: Christian Borntraeger, Bart Van Assche, virtualization,
	linux-block, mst, jasowang, linux-kernel, Christoph Hellwig

On 11/21/2017 10:27 AM, Jens Axboe wrote:
> On 11/21/2017 03:14 AM, Christian Borntraeger wrote:
>> Bisect points to
>>
>> 1b5a7455d345b223d3a4658a9e5fce985b7998c1 is the first bad commit
>> commit 1b5a7455d345b223d3a4658a9e5fce985b7998c1
>> Author: Christoph Hellwig <hch@lst.de>
>> Date:   Mon Jun 26 12:20:57 2017 +0200
>>
>>     blk-mq: Create hctx for each present CPU
>>     
>>     commit 4b855ad37194f7bdbb200ce7a1c7051fecb56a08 upstream.
>>     
>>     Currently we only create hctx for online CPUs, which can lead to a lot
>>     of churn due to frequent soft offline / online operations.  Instead
>>     allocate one for each present CPU to avoid this and dramatically simplify
>>     the code.
>>     
>>     Signed-off-by: Christoph Hellwig <hch@lst.de>
>>     Reviewed-by: Jens Axboe <axboe@kernel.dk>
>>     Cc: Keith Busch <keith.busch@intel.com>
>>     Cc: linux-block@vger.kernel.org
>>     Cc: linux-nvme@lists.infradead.org
>>     Link: http://lkml.kernel.org/r/20170626102058.10200-3-hch@lst.de
>>     Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
>>     Cc: Oleksandr Natalenko <oleksandr@natalenko.name>
>>     Cc: Mike Galbraith <efault@gmx.de>
>>     Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> 
> I wonder if we're simply not getting the masks updated correctly. I'll
> take a look.

Can't make it trigger here. We do init for each present CPU, which means
that if I offline a few CPUs here and register a queue, those still show
up as present (just offline) and get mapped accordingly.

From the looks of it, your setup is different. If the CPU doesn't show
up as present and it gets hotplugged, then I can see how this condition
would trigger. What environment are you running this in? We might have
to re-introduce the cpu hotplug notifier, right now we just monitor
for a dead cpu and handle that.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 96+ messages in thread
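
(A hedged aside on the distinction being drawn here: Jens is offlining CPUs that
apparently stay in the present mask, while in Christian's setup the extra vCPU
seems not to be present at all until the hypervisor adds it. cpu_online() and
cpu_present() are the real kernel predicates; the classification below is just an
illustration, not code from the thread.)

#include <linux/cpumask.h>
#include <linux/printk.h>

static void classify_cpu(unsigned int cpu)
{
	if (cpu_online(cpu))
		pr_info("cpu%u: online\n", cpu);
	else if (cpu_present(cpu))
		pr_info("cpu%u: present but offline (the offline/online test case)\n", cpu);
	else
		pr_info("cpu%u: not present yet (hotplug of a brand-new CPU)\n", cpu);
}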

* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
@ 2017-11-21 18:09                     ` Jens Axboe
  0 siblings, 0 replies; 96+ messages in thread
From: Jens Axboe @ 2017-11-21 18:09 UTC (permalink / raw)
  To: Christian Borntraeger, Bart Van Assche, virtualization,
	linux-block, mst, jasowang, linux-kernel, Christoph Hellwig

On 11/21/2017 10:27 AM, Jens Axboe wrote:
> On 11/21/2017 03:14 AM, Christian Borntraeger wrote:
>> Bisect points to
>>
>> 1b5a7455d345b223d3a4658a9e5fce985b7998c1 is the first bad commit
>> commit 1b5a7455d345b223d3a4658a9e5fce985b7998c1
>> Author: Christoph Hellwig <hch@lst.de>
>> Date:   Mon Jun 26 12:20:57 2017 +0200
>>
>>     blk-mq: Create hctx for each present CPU
>>     
>>     commit 4b855ad37194f7bdbb200ce7a1c7051fecb56a08 upstream.
>>     
>>     Currently we only create hctx for online CPUs, which can lead to a lot
>>     of churn due to frequent soft offline / online operations.  Instead
>>     allocate one for each present CPU to avoid this and dramatically simplify
>>     the code.
>>     
>>     Signed-off-by: Christoph Hellwig <hch@lst.de>
>>     Reviewed-by: Jens Axboe <axboe@kernel.dk>
>>     Cc: Keith Busch <keith.busch@intel.com>
>>     Cc: linux-block@vger.kernel.org
>>     Cc: linux-nvme@lists.infradead.org
>>     Link: http://lkml.kernel.org/r/20170626102058.10200-3-hch@lst.de
>>     Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
>>     Cc: Oleksandr Natalenko <oleksandr@natalenko.name>
>>     Cc: Mike Galbraith <efault@gmx.de>
>>     Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> 
> I wonder if we're simply not getting the masks updated correctly. I'll
> take a look.

Can't make it trigger here. We do init for each present CPU, which means
that if I offline a few CPUs here and register a queue, those still show
up as present (just offline) and get mapped accordingly.

>From the looks of it, your setup is different. If the CPU doesn't show
up as present and it gets hotplugged, then I can see how this condition
would trigger. What environment are you running this in? We might have
to re-introduce the cpu hotplug notifier, right now we just monitor
for a dead cpu and handle that.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
@ 2017-11-21 18:09                     ` Jens Axboe
  0 siblings, 0 replies; 96+ messages in thread
From: Jens Axboe @ 2017-11-21 18:09 UTC (permalink / raw)
  To: Christian Borntraeger, Bart Van Assche, virtualization,
	linux-block, mst, jasowang, linux-kernel, Christoph Hellwig

On 11/21/2017 10:27 AM, Jens Axboe wrote:
> On 11/21/2017 03:14 AM, Christian Borntraeger wrote:
>> Bisect points to
>>
>> 1b5a7455d345b223d3a4658a9e5fce985b7998c1 is the first bad commit
>> commit 1b5a7455d345b223d3a4658a9e5fce985b7998c1
>> Author: Christoph Hellwig <hch@lst.de>
>> Date:   Mon Jun 26 12:20:57 2017 +0200
>>
>>     blk-mq: Create hctx for each present CPU
>>     
>>     commit 4b855ad37194f7bdbb200ce7a1c7051fecb56a08 upstream.
>>     
>>     Currently we only create hctx for online CPUs, which can lead to a lot
>>     of churn due to frequent soft offline / online operations.  Instead
>>     allocate one for each present CPU to avoid this and dramatically simplify
>>     the code.
>>     
>>     Signed-off-by: Christoph Hellwig <hch@lst.de>
>>     Reviewed-by: Jens Axboe <axboe@kernel.dk>
>>     Cc: Keith Busch <keith.busch@intel.com>
>>     Cc: linux-block@vger.kernel.org
>>     Cc: linux-nvme@lists.infradead.org
>>     Link: http://lkml.kernel.org/r/20170626102058.10200-3-hch@lst.de
>>     Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
>>     Cc: Oleksandr Natalenko <oleksandr@natalenko.name>
>>     Cc: Mike Galbraith <efault@gmx.de>
>>     Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> 
> I wonder if we're simply not getting the masks updated correctly. I'll
> take a look.

Can't make it trigger here. We do init for each present CPU, which means
that if I offline a few CPUs here and register a queue, those still show
up as present (just offline) and get mapped accordingly.

From the looks of it, your setup is different. If the CPU doesn't show
up as present and it gets hotplugged, then I can see how this condition
would trigger. What environment are you running this in? We might have
to re-introduce the cpu hotplug notifier, right now we just monitor
for a dead cpu and handle that.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
  2017-11-21 18:09                     ` Jens Axboe
@ 2017-11-21 18:12                       ` Christian Borntraeger
  -1 siblings, 0 replies; 96+ messages in thread
From: Christian Borntraeger @ 2017-11-21 18:12 UTC (permalink / raw)
  To: Jens Axboe, Bart Van Assche, virtualization, linux-block, mst,
	jasowang, linux-kernel, Christoph Hellwig



On 11/21/2017 07:09 PM, Jens Axboe wrote:
> On 11/21/2017 10:27 AM, Jens Axboe wrote:
>> On 11/21/2017 03:14 AM, Christian Borntraeger wrote:
>>> Bisect points to
>>>
>>> 1b5a7455d345b223d3a4658a9e5fce985b7998c1 is the first bad commit
>>> commit 1b5a7455d345b223d3a4658a9e5fce985b7998c1
>>> Author: Christoph Hellwig <hch@lst.de>
>>> Date:   Mon Jun 26 12:20:57 2017 +0200
>>>
>>>     blk-mq: Create hctx for each present CPU
>>>     
>>>     commit 4b855ad37194f7bdbb200ce7a1c7051fecb56a08 upstream.
>>>     
>>>     Currently we only create hctx for online CPUs, which can lead to a lot
>>>     of churn due to frequent soft offline / online operations.  Instead
>>>     allocate one for each present CPU to avoid this and dramatically simplify
>>>     the code.
>>>     
>>>     Signed-off-by: Christoph Hellwig <hch@lst.de>
>>>     Reviewed-by: Jens Axboe <axboe@kernel.dk>
>>>     Cc: Keith Busch <keith.busch@intel.com>
>>>     Cc: linux-block@vger.kernel.org
>>>     Cc: linux-nvme@lists.infradead.org
>>>     Link: http://lkml.kernel.org/r/20170626102058.10200-3-hch@lst.de
>>>     Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
>>>     Cc: Oleksandr Natalenko <oleksandr@natalenko.name>
>>>     Cc: Mike Galbraith <efault@gmx.de>
>>>     Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>>
>> I wonder if we're simply not getting the masks updated correctly. I'll
>> take a look.
> 
> Can't make it trigger here. We do init for each present CPU, which means
> that if I offline a few CPUs here and register a queue, those still show
> up as present (just offline) and get mapped accordingly.
> 
> From the looks of it, your setup is different. If the CPU doesn't show
> up as present and it gets hotplugged, then I can see how this condition
> would trigger. What environment are you running this in? We might have
> to re-introduce the cpu hotplug notifier, right now we just monitor
> for a dead cpu and handle that.

I am not doing a hot unplug and the replug, I use KVM and add a previously
not available CPU.

in libvirt/virsh speak:
  <vcpu placement='static' current='1'>4</vcpu>

^ permalink raw reply	[flat|nested] 96+ messages in thread
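
(From inside the guest, the difference between the two scenarios can be checked
via the standard sysfs CPU masks. A minimal userspace sketch, assuming only the
usual /sys/devices/system/cpu layout; nothing here is taken from the thread
itself.)

#include <stdio.h>

static void show(const char *name)
{
	char path[128], buf[256];
	FILE *f;

	snprintf(path, sizeof(path), "/sys/devices/system/cpu/%s", name);
	f = fopen(path, "r");
	if (!f)
		return;
	if (fgets(buf, sizeof(buf), f))
		printf("%-8s %s", name, buf);	/* e.g. "online   0\n" */
	fclose(f);
}

int main(void)
{
	show("possible");
	show("present");
	show("online");
	return 0;
}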

* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
  2017-11-21 18:12                       ` Christian Borntraeger
@ 2017-11-21 18:27                         ` Jens Axboe
  -1 siblings, 0 replies; 96+ messages in thread
From: Jens Axboe @ 2017-11-21 18:27 UTC (permalink / raw)
  To: Christian Borntraeger, Bart Van Assche, virtualization,
	linux-block, mst, jasowang, linux-kernel, Christoph Hellwig

On 11/21/2017 11:12 AM, Christian Borntraeger wrote:
> 
> 
> On 11/21/2017 07:09 PM, Jens Axboe wrote:
>> On 11/21/2017 10:27 AM, Jens Axboe wrote:
>>> On 11/21/2017 03:14 AM, Christian Borntraeger wrote:
>>>> Bisect points to
>>>>
>>>> 1b5a7455d345b223d3a4658a9e5fce985b7998c1 is the first bad commit
>>>> commit 1b5a7455d345b223d3a4658a9e5fce985b7998c1
>>>> Author: Christoph Hellwig <hch@lst.de>
>>>> Date:   Mon Jun 26 12:20:57 2017 +0200
>>>>
>>>>     blk-mq: Create hctx for each present CPU
>>>>     
>>>>     commit 4b855ad37194f7bdbb200ce7a1c7051fecb56a08 upstream.
>>>>     
>>>>     Currently we only create hctx for online CPUs, which can lead to a lot
>>>>     of churn due to frequent soft offline / online operations.  Instead
>>>>     allocate one for each present CPU to avoid this and dramatically simplify
>>>>     the code.
>>>>     
>>>>     Signed-off-by: Christoph Hellwig <hch@lst.de>
>>>>     Reviewed-by: Jens Axboe <axboe@kernel.dk>
>>>>     Cc: Keith Busch <keith.busch@intel.com>
>>>>     Cc: linux-block@vger.kernel.org
>>>>     Cc: linux-nvme@lists.infradead.org
>>>>     Link: http://lkml.kernel.org/r/20170626102058.10200-3-hch@lst.de
>>>>     Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
>>>>     Cc: Oleksandr Natalenko <oleksandr@natalenko.name>
>>>>     Cc: Mike Galbraith <efault@gmx.de>
>>>>     Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>>>
>>> I wonder if we're simply not getting the masks updated correctly. I'll
>>> take a look.
>>
>> Can't make it trigger here. We do init for each present CPU, which means
>> that if I offline a few CPUs here and register a queue, those still show
>> up as present (just offline) and get mapped accordingly.
>>
>> From the looks of it, your setup is different. If the CPU doesn't show
>> up as present and it gets hotplugged, then I can see how this condition
>> would trigger. What environment are you running this in? We might have
>> to re-introduce the cpu hotplug notifier, right now we just monitor
>> for a dead cpu and handle that.
> 
> I am not doing a hot unplug and the replug, I use KVM and add a previously
> not available CPU.
> 
> in libvirt/virsh speak:
>   <vcpu placement='static' current='1'>4</vcpu>

So that's why we run into problems. It's not present when we load the device,
but becomes present and online afterwards.

Christoph, we used to handle this just fine, your patch broke it.

I'll see if I can come up with an appropriate fix.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
  2017-11-21 18:27                         ` Jens Axboe
@ 2017-11-21 18:39                           ` Jens Axboe
  -1 siblings, 0 replies; 96+ messages in thread
From: Jens Axboe @ 2017-11-21 18:39 UTC (permalink / raw)
  To: Christian Borntraeger, Bart Van Assche, virtualization,
	linux-block, mst, jasowang, linux-kernel, Christoph Hellwig

On 11/21/2017 11:27 AM, Jens Axboe wrote:
> On 11/21/2017 11:12 AM, Christian Borntraeger wrote:
>>
>>
>> On 11/21/2017 07:09 PM, Jens Axboe wrote:
>>> On 11/21/2017 10:27 AM, Jens Axboe wrote:
>>>> On 11/21/2017 03:14 AM, Christian Borntraeger wrote:
>>>>> Bisect points to
>>>>>
>>>>> 1b5a7455d345b223d3a4658a9e5fce985b7998c1 is the first bad commit
>>>>> commit 1b5a7455d345b223d3a4658a9e5fce985b7998c1
>>>>> Author: Christoph Hellwig <hch@lst.de>
>>>>> Date:   Mon Jun 26 12:20:57 2017 +0200
>>>>>
>>>>>     blk-mq: Create hctx for each present CPU
>>>>>     
>>>>>     commit 4b855ad37194f7bdbb200ce7a1c7051fecb56a08 upstream.
>>>>>     
>>>>>     Currently we only create hctx for online CPUs, which can lead to a lot
>>>>>     of churn due to frequent soft offline / online operations.  Instead
>>>>>     allocate one for each present CPU to avoid this and dramatically simplify
>>>>>     the code.
>>>>>     
>>>>>     Signed-off-by: Christoph Hellwig <hch@lst.de>
>>>>>     Reviewed-by: Jens Axboe <axboe@kernel.dk>
>>>>>     Cc: Keith Busch <keith.busch@intel.com>
>>>>>     Cc: linux-block@vger.kernel.org
>>>>>     Cc: linux-nvme@lists.infradead.org
>>>>>     Link: http://lkml.kernel.org/r/20170626102058.10200-3-hch@lst.de
>>>>>     Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
>>>>>     Cc: Oleksandr Natalenko <oleksandr@natalenko.name>
>>>>>     Cc: Mike Galbraith <efault@gmx.de>
>>>>>     Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>>>>
>>>> I wonder if we're simply not getting the masks updated correctly. I'll
>>>> take a look.
>>>
>>> Can't make it trigger here. We do init for each present CPU, which means
>>> that if I offline a few CPUs here and register a queue, those still show
>>> up as present (just offline) and get mapped accordingly.
>>>
>>> From the looks of it, your setup is different. If the CPU doesn't show
>>> up as present and it gets hotplugged, then I can see how this condition
>>> would trigger. What environment are you running this in? We might have
>>> to re-introduce the cpu hotplug notifier, right now we just monitor
>>> for a dead cpu and handle that.
>>
>> I am not doing a hot unplug and the replug, I use KVM and add a previously
>> not available CPU.
>>
>> in libvirt/virsh speak:
>>   <vcpu placement='static' current='1'>4</vcpu>
> 
> So that's why we run into problems. It's not present when we load the device,
> but becomes present and online afterwards.
> 
> Christoph, we used to handle this just fine, your patch broke it.
> 
> I'll see if I can come up with an appropriate fix.

Can you try the below?


diff --git a/block/blk-mq.c b/block/blk-mq.c
index b600463791ec..ab3a66e7bd03 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -40,6 +40,7 @@
 static bool blk_mq_poll(struct request_queue *q, blk_qc_t cookie);
 static void blk_mq_poll_stats_start(struct request_queue *q);
 static void blk_mq_poll_stats_fn(struct blk_stat_callback *cb);
+static void blk_mq_map_swqueue(struct request_queue *q);
 
 static int blk_mq_poll_stats_bkt(const struct request *rq)
 {
@@ -1947,6 +1950,15 @@ int blk_mq_alloc_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
 	return -ENOMEM;
 }
 
+static int blk_mq_hctx_notify_prepare(unsigned int cpu, struct hlist_node *node)
+{
+	struct blk_mq_hw_ctx *hctx;
+
+	hctx = hlist_entry_safe(node, struct blk_mq_hw_ctx, cpuhp);
+	blk_mq_map_swqueue(hctx->queue);
+	return 0;
+}
+
 /*
  * 'cpu' is going away. splice any existing rq_list entries from this
  * software queue to the hw queue dispatch list, and ensure that it
@@ -1958,7 +1970,7 @@ static int blk_mq_hctx_notify_dead(unsigned int cpu, struct hlist_node *node)
 	struct blk_mq_ctx *ctx;
 	LIST_HEAD(tmp);
 
-	hctx = hlist_entry_safe(node, struct blk_mq_hw_ctx, cpuhp_dead);
+	hctx = hlist_entry_safe(node, struct blk_mq_hw_ctx, cpuhp);
 	ctx = __blk_mq_get_ctx(hctx->queue, cpu);
 
 	spin_lock(&ctx->lock);
@@ -1981,8 +1993,7 @@ static int blk_mq_hctx_notify_dead(unsigned int cpu, struct hlist_node *node)
 
 static void blk_mq_remove_cpuhp(struct blk_mq_hw_ctx *hctx)
 {
-	cpuhp_state_remove_instance_nocalls(CPUHP_BLK_MQ_DEAD,
-					    &hctx->cpuhp_dead);
+	cpuhp_state_remove_instance_nocalls(CPUHP_BLK_MQ_PREPARE, &hctx->cpuhp);
 }
 
 /* hctx->ctxs will be freed in queue's release handler */
@@ -2039,7 +2050,7 @@ static int blk_mq_init_hctx(struct request_queue *q,
 	hctx->queue = q;
 	hctx->flags = set->flags & ~BLK_MQ_F_TAG_SHARED;
 
-	cpuhp_state_add_instance_nocalls(CPUHP_BLK_MQ_DEAD, &hctx->cpuhp_dead);
+	cpuhp_state_add_instance_nocalls(CPUHP_BLK_MQ_PREPARE, &hctx->cpuhp);
 
 	hctx->tags = set->tags[hctx_idx];
 
@@ -2974,7 +2987,8 @@ static int __init blk_mq_init(void)
 	BUILD_BUG_ON((REQ_ATOM_STARTED / BITS_PER_BYTE) !=
 			(REQ_ATOM_COMPLETE / BITS_PER_BYTE));
 
-	cpuhp_setup_state_multi(CPUHP_BLK_MQ_DEAD, "block/mq:dead", NULL,
+	cpuhp_setup_state_multi(CPUHP_BLK_MQ_PREPARE, "block/mq:prepare",
+				blk_mq_hctx_notify_prepare,
 				blk_mq_hctx_notify_dead);
 	return 0;
 }
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 95c9a5c862e2..a6f03e9464fb 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -52,7 +52,7 @@ struct blk_mq_hw_ctx {
 
 	atomic_t		nr_active;
 
-	struct hlist_node	cpuhp_dead;
+	struct hlist_node	cpuhp;
 	struct kobject		kobj;
 
 	unsigned long		poll_considered;
diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
index ec32c4c5eb30..28b0fc9229c8 100644
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -48,7 +48,7 @@ enum cpuhp_state {
 	CPUHP_BLOCK_SOFTIRQ_DEAD,
 	CPUHP_ACPI_CPUDRV_DEAD,
 	CPUHP_S390_PFAULT_DEAD,
-	CPUHP_BLK_MQ_DEAD,
+	CPUHP_BLK_MQ_PREPARE,
 	CPUHP_FS_BUFF_DEAD,
 	CPUHP_PRINTK_DEAD,
 	CPUHP_MM_MEMCQ_DEAD,

-- 
Jens Axboe

^ permalink raw reply related	[flat|nested] 96+ messages in thread
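
(For readers unfamiliar with the mechanism the patch builds on: blk-mq registers
one hlist_node per hardware context with a multi-instance CPU hotplug state, so
the new prepare callback runs once per hctx whenever a CPU comes up. Below is a
generic, hedged sketch of that pattern, not the blk-mq code itself;
cpuhp_setup_state_multi(), cpuhp_state_add_instance_nocalls() and
hlist_entry_safe() are real APIs, while my_dev and the callback bodies are
placeholders.)

#include <linux/cpuhotplug.h>
#include <linux/list.h>
#include <linux/init.h>

struct my_dev {
	struct hlist_node cpuhp;	/* links this instance into the hotplug state */
};

static enum cpuhp_state my_state;

/* Invoked once per registered instance when a CPU is brought up. */
static int my_cpu_up(unsigned int cpu, struct hlist_node *node)
{
	struct my_dev *dev = hlist_entry_safe(node, struct my_dev, cpuhp);

	/* refresh per-CPU mappings for 'cpu' on behalf of 'dev' here */
	(void)dev;
	return 0;
}

/* Invoked once per registered instance after a CPU has gone down. */
static int my_cpu_dead(unsigned int cpu, struct hlist_node *node)
{
	return 0;
}

static int __init my_init(void)
{
	int ret;

	/*
	 * The patch uses a fixed CPUHP_BLK_MQ_PREPARE slot; a dynamically
	 * allocated slot behaves the same way for illustration purposes.
	 */
	ret = cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN, "example:online",
				      my_cpu_up, my_cpu_dead);
	if (ret < 0)
		return ret;
	my_state = ret;
	return 0;
}

/*
 * Each device/queue then adds itself with
 *	cpuhp_state_add_instance_nocalls(my_state, &dev->cpuhp);
 * and drops out again via cpuhp_state_remove_instance_nocalls().
 */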

* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
  2017-11-21 18:39                           ` Jens Axboe
@ 2017-11-21 19:15                             ` Christian Borntraeger
  -1 siblings, 0 replies; 96+ messages in thread
From: Christian Borntraeger @ 2017-11-21 19:15 UTC (permalink / raw)
  To: Jens Axboe, Bart Van Assche, virtualization, linux-block, mst,
	jasowang, linux-kernel, Christoph Hellwig



On 11/21/2017 07:39 PM, Jens Axboe wrote:
> On 11/21/2017 11:27 AM, Jens Axboe wrote:
>> On 11/21/2017 11:12 AM, Christian Borntraeger wrote:
>>>
>>>
>>> On 11/21/2017 07:09 PM, Jens Axboe wrote:
>>>> On 11/21/2017 10:27 AM, Jens Axboe wrote:
>>>>> On 11/21/2017 03:14 AM, Christian Borntraeger wrote:
>>>>>> Bisect points to
>>>>>>
>>>>>> 1b5a7455d345b223d3a4658a9e5fce985b7998c1 is the first bad commit
>>>>>> commit 1b5a7455d345b223d3a4658a9e5fce985b7998c1
>>>>>> Author: Christoph Hellwig <hch@lst.de>
>>>>>> Date:   Mon Jun 26 12:20:57 2017 +0200
>>>>>>
>>>>>>     blk-mq: Create hctx for each present CPU
>>>>>>     
>>>>>>     commit 4b855ad37194f7bdbb200ce7a1c7051fecb56a08 upstream.
>>>>>>     
>>>>>>     Currently we only create hctx for online CPUs, which can lead to a lot
>>>>>>     of churn due to frequent soft offline / online operations.  Instead
>>>>>>     allocate one for each present CPU to avoid this and dramatically simplify
>>>>>>     the code.
>>>>>>     
>>>>>>     Signed-off-by: Christoph Hellwig <hch@lst.de>
>>>>>>     Reviewed-by: Jens Axboe <axboe@kernel.dk>
>>>>>>     Cc: Keith Busch <keith.busch@intel.com>
>>>>>>     Cc: linux-block@vger.kernel.org
>>>>>>     Cc: linux-nvme@lists.infradead.org
>>>>>>     Link: http://lkml.kernel.org/r/20170626102058.10200-3-hch@lst.de
>>>>>>     Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
>>>>>>     Cc: Oleksandr Natalenko <oleksandr@natalenko.name>
>>>>>>     Cc: Mike Galbraith <efault@gmx.de>
>>>>>>     Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>>>>>
>>>>> I wonder if we're simply not getting the masks updated correctly. I'll
>>>>> take a look.
>>>>
>>>> Can't make it trigger here. We do init for each present CPU, which means
>>>> that if I offline a few CPUs here and register a queue, those still show
>>>> up as present (just offline) and get mapped accordingly.
>>>>
>>>> From the looks of it, your setup is different. If the CPU doesn't show
>>>> up as present and it gets hotplugged, then I can see how this condition
>>>> would trigger. What environment are you running this in? We might have
>>>> to re-introduce the cpu hotplug notifier, right now we just monitor
>>>> for a dead cpu and handle that.
>>>
>>> I am not doing a hot unplug and the replug, I use KVM and add a previously
>>> not available CPU.
>>>
>>> in libvirt/virsh speak:
>>>   <vcpu placement='static' current='1'>4</vcpu>
>>
>> So that's why we run into problems. It's not present when we load the device,
>> but becomes present and online afterwards.
>>
>> Christoph, we used to handle this just fine, your patch broke it.
>>
>> I'll see if I can come up with an appropriate fix.
> 
> Can you try the below?


It does prevent the crash, but it seems that the new CPU is not "used" for mq after the hotplug:


debugfs output with 2 CPUs:
/sys/kernel/debug/block/vda
/sys/kernel/debug/block/vda/hctx0
/sys/kernel/debug/block/vda/hctx0/cpu0
/sys/kernel/debug/block/vda/hctx0/cpu0/completed
/sys/kernel/debug/block/vda/hctx0/cpu0/merged
/sys/kernel/debug/block/vda/hctx0/cpu0/dispatched
/sys/kernel/debug/block/vda/hctx0/cpu0/rq_list
/sys/kernel/debug/block/vda/hctx0/active
/sys/kernel/debug/block/vda/hctx0/run
/sys/kernel/debug/block/vda/hctx0/queued
/sys/kernel/debug/block/vda/hctx0/dispatched
/sys/kernel/debug/block/vda/hctx0/io_poll
/sys/kernel/debug/block/vda/hctx0/sched_tags_bitmap
/sys/kernel/debug/block/vda/hctx0/sched_tags
/sys/kernel/debug/block/vda/hctx0/tags_bitmap
/sys/kernel/debug/block/vda/hctx0/tags
/sys/kernel/debug/block/vda/hctx0/ctx_map
/sys/kernel/debug/block/vda/hctx0/busy
/sys/kernel/debug/block/vda/hctx0/dispatch
/sys/kernel/debug/block/vda/hctx0/flags
/sys/kernel/debug/block/vda/hctx0/state
/sys/kernel/debug/block/vda/sched
/sys/kernel/debug/block/vda/sched/dispatch
/sys/kernel/debug/block/vda/sched/starved
/sys/kernel/debug/block/vda/sched/batching
/sys/kernel/debug/block/vda/sched/write_next_rq
/sys/kernel/debug/block/vda/sched/write_fifo_list
/sys/kernel/debug/block/vda/sched/read_next_rq
/sys/kernel/debug/block/vda/sched/read_fifo_list
/sys/kernel/debug/block/vda/write_hints
/sys/kernel/debug/block/vda/state
/sys/kernel/debug/block/vda/requeue_list
/sys/kernel/debug/block/vda/poll_stat
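
Note that only hctx0/cpu0 shows up above: the hot-added CPU never gets a
software-queue (ctx) entry mapped to the hardware queue, which matches the
"not used for mq" observation. A hypothetical helper to confirm the mapping
from the kernel side (illustration only, not something used in this thread)
could look like:

#include <linux/blkdev.h>
#include <linux/blk-mq.h>
#include <linux/printk.h>

/* Dump which CPUs each hardware queue of a request_queue serves; after a
 * successful remap the hot-added CPU should appear in one hctx's mask. */
static void dump_hctx_cpus(struct request_queue *q)
{
	struct blk_mq_hw_ctx *hctx;
	unsigned int i;

	queue_for_each_hw_ctx(q, hctx, i)
		pr_info("hctx%u: cpus %*pbl\n", i,
			cpumask_pr_args(hctx->cpumask));
}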

> 
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index b600463791ec..ab3a66e7bd03 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -40,6 +40,7 @@
>  static bool blk_mq_poll(struct request_queue *q, blk_qc_t cookie);
>  static void blk_mq_poll_stats_start(struct request_queue *q);
>  static void blk_mq_poll_stats_fn(struct blk_stat_callback *cb);
> +static void blk_mq_map_swqueue(struct request_queue *q);
> 
>  static int blk_mq_poll_stats_bkt(const struct request *rq)
>  {
> @@ -1947,6 +1950,15 @@ int blk_mq_alloc_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
>  	return -ENOMEM;
>  }
> 
> +static int blk_mq_hctx_notify_prepare(unsigned int cpu, struct hlist_node *node)
> +{
> +	struct blk_mq_hw_ctx *hctx;
> +
> +	hctx = hlist_entry_safe(node, struct blk_mq_hw_ctx, cpuhp);
> +	blk_mq_map_swqueue(hctx->queue);
> +	return 0;
> +}
> +
>  /*
>   * 'cpu' is going away. splice any existing rq_list entries from this
>   * software queue to the hw queue dispatch list, and ensure that it
> @@ -1958,7 +1970,7 @@ static int blk_mq_hctx_notify_dead(unsigned int cpu, struct hlist_node *node)
>  	struct blk_mq_ctx *ctx;
>  	LIST_HEAD(tmp);
> 
> -	hctx = hlist_entry_safe(node, struct blk_mq_hw_ctx, cpuhp_dead);
> +	hctx = hlist_entry_safe(node, struct blk_mq_hw_ctx, cpuhp);
>  	ctx = __blk_mq_get_ctx(hctx->queue, cpu);
> 
>  	spin_lock(&ctx->lock);
> @@ -1981,8 +1993,7 @@ static int blk_mq_hctx_notify_dead(unsigned int cpu, struct hlist_node *node)
> 
>  static void blk_mq_remove_cpuhp(struct blk_mq_hw_ctx *hctx)
>  {
> -	cpuhp_state_remove_instance_nocalls(CPUHP_BLK_MQ_DEAD,
> -					    &hctx->cpuhp_dead);
> +	cpuhp_state_remove_instance_nocalls(CPUHP_BLK_MQ_PREPARE, &hctx->cpuhp);
>  }
> 
>  /* hctx->ctxs will be freed in queue's release handler */
> @@ -2039,7 +2050,7 @@ static int blk_mq_init_hctx(struct request_queue *q,
>  	hctx->queue = q;
>  	hctx->flags = set->flags & ~BLK_MQ_F_TAG_SHARED;
> 
> -	cpuhp_state_add_instance_nocalls(CPUHP_BLK_MQ_DEAD, &hctx->cpuhp_dead);
> +	cpuhp_state_add_instance_nocalls(CPUHP_BLK_MQ_PREPARE, &hctx->cpuhp);
> 
>  	hctx->tags = set->tags[hctx_idx];
> 
> @@ -2974,7 +2987,8 @@ static int __init blk_mq_init(void)
>  	BUILD_BUG_ON((REQ_ATOM_STARTED / BITS_PER_BYTE) !=
>  			(REQ_ATOM_COMPLETE / BITS_PER_BYTE));
> 
> -	cpuhp_setup_state_multi(CPUHP_BLK_MQ_DEAD, "block/mq:dead", NULL,
> +	cpuhp_setup_state_multi(CPUHP_BLK_MQ_PREPARE, "block/mq:prepare",
> +				blk_mq_hctx_notify_prepare,
>  				blk_mq_hctx_notify_dead);
>  	return 0;
>  }
> diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
> index 95c9a5c862e2..a6f03e9464fb 100644
> --- a/include/linux/blk-mq.h
> +++ b/include/linux/blk-mq.h
> @@ -52,7 +52,7 @@ struct blk_mq_hw_ctx {
> 
>  	atomic_t		nr_active;
> 
> -	struct hlist_node	cpuhp_dead;
> +	struct hlist_node	cpuhp;
>  	struct kobject		kobj;
> 
>  	unsigned long		poll_considered;
> diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
> index ec32c4c5eb30..28b0fc9229c8 100644
> --- a/include/linux/cpuhotplug.h
> +++ b/include/linux/cpuhotplug.h
> @@ -48,7 +48,7 @@ enum cpuhp_state {
>  	CPUHP_BLOCK_SOFTIRQ_DEAD,
>  	CPUHP_ACPI_CPUDRV_DEAD,
>  	CPUHP_S390_PFAULT_DEAD,
> -	CPUHP_BLK_MQ_DEAD,
> +	CPUHP_BLK_MQ_PREPARE,
>  	CPUHP_FS_BUFF_DEAD,
>  	CPUHP_PRINTK_DEAD,
>  	CPUHP_MM_MEMCQ_DEAD,
> 

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
  2017-11-21 19:15                             ` Christian Borntraeger
@ 2017-11-21 19:30                               ` Jens Axboe
  -1 siblings, 0 replies; 96+ messages in thread
From: Jens Axboe @ 2017-11-21 19:30 UTC (permalink / raw)
  To: Christian Borntraeger, Bart Van Assche, virtualization,
	linux-block, mst, jasowang, linux-kernel, Christoph Hellwig

On 11/21/2017 12:15 PM, Christian Borntraeger wrote:
> 
> 
> On 11/21/2017 07:39 PM, Jens Axboe wrote:
>> On 11/21/2017 11:27 AM, Jens Axboe wrote:
>>> On 11/21/2017 11:12 AM, Christian Borntraeger wrote:
>>>>
>>>>
>>>> On 11/21/2017 07:09 PM, Jens Axboe wrote:
>>>>> On 11/21/2017 10:27 AM, Jens Axboe wrote:
>>>>>> On 11/21/2017 03:14 AM, Christian Borntraeger wrote:
>>>>>>> Bisect points to
>>>>>>>
>>>>>>> 1b5a7455d345b223d3a4658a9e5fce985b7998c1 is the first bad commit
>>>>>>> commit 1b5a7455d345b223d3a4658a9e5fce985b7998c1
>>>>>>> Author: Christoph Hellwig <hch@lst.de>
>>>>>>> Date:   Mon Jun 26 12:20:57 2017 +0200
>>>>>>>
>>>>>>>     blk-mq: Create hctx for each present CPU
>>>>>>>     
>>>>>>>     commit 4b855ad37194f7bdbb200ce7a1c7051fecb56a08 upstream.
>>>>>>>     
>>>>>>>     Currently we only create hctx for online CPUs, which can lead to a lot
>>>>>>>     of churn due to frequent soft offline / online operations.  Instead
>>>>>>>     allocate one for each present CPU to avoid this and dramatically simplify
>>>>>>>     the code.
>>>>>>>     
>>>>>>>     Signed-off-by: Christoph Hellwig <hch@lst.de>
>>>>>>>     Reviewed-by: Jens Axboe <axboe@kernel.dk>
>>>>>>>     Cc: Keith Busch <keith.busch@intel.com>
>>>>>>>     Cc: linux-block@vger.kernel.org
>>>>>>>     Cc: linux-nvme@lists.infradead.org
>>>>>>>     Link: http://lkml.kernel.org/r/20170626102058.10200-3-hch@lst.de
>>>>>>>     Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
>>>>>>>     Cc: Oleksandr Natalenko <oleksandr@natalenko.name>
>>>>>>>     Cc: Mike Galbraith <efault@gmx.de>
>>>>>>>     Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>>>>>>
>>>>>> I wonder if we're simply not getting the masks updated correctly. I'll
>>>>>> take a look.
>>>>>
>>>>> Can't make it trigger here. We do init for each present CPU, which means
>>>>> that if I offline a few CPUs here and register a queue, those still show
>>>>> up as present (just offline) and get mapped accordingly.
>>>>>
>>>>> From the looks of it, your setup is different. If the CPU doesn't show
>>>>> up as present and it gets hotplugged, then I can see how this condition
>>>>> would trigger. What environment are you running this in? We might have
>>>>> to re-introduce the cpu hotplug notifier, right now we just monitor
>>>>> for a dead cpu and handle that.
>>>>
>>>> I am not doing a hot unplug and the replug, I use KVM and add a previously
>>>> not available CPU.
>>>>
>>>> in libvirt/virsh speak:
>>>>   <vcpu placement='static' current='1'>4</vcpu>
>>>
>>> So that's why we run into problems. It's not present when we load the device,
>>> but becomes present and online afterwards.
>>>
>>> Christoph, we used to handle this just fine, your patch broke it.
>>>
>>> I'll see if I can come up with an appropriate fix.
>>
>> Can you try the below?
> 
> 
> It does prevent the crash, but it seems that the new CPU is not "used" for mq after the hotplug:
> 
> 
> debugfs output with 2 CPUs:
> /sys/kernel/debug/block/vda
> /sys/kernel/debug/block/vda/hctx0
> /sys/kernel/debug/block/vda/hctx0/cpu0
> /sys/kernel/debug/block/vda/hctx0/cpu0/completed
> /sys/kernel/debug/block/vda/hctx0/cpu0/merged
> /sys/kernel/debug/block/vda/hctx0/cpu0/dispatched
> /sys/kernel/debug/block/vda/hctx0/cpu0/rq_list
> /sys/kernel/debug/block/vda/hctx0/active
> /sys/kernel/debug/block/vda/hctx0/run
> /sys/kernel/debug/block/vda/hctx0/queued
> /sys/kernel/debug/block/vda/hctx0/dispatched
> /sys/kernel/debug/block/vda/hctx0/io_poll
> /sys/kernel/debug/block/vda/hctx0/sched_tags_bitmap
> /sys/kernel/debug/block/vda/hctx0/sched_tags
> /sys/kernel/debug/block/vda/hctx0/tags_bitmap
> /sys/kernel/debug/block/vda/hctx0/tags
> /sys/kernel/debug/block/vda/hctx0/ctx_map
> /sys/kernel/debug/block/vda/hctx0/busy
> /sys/kernel/debug/block/vda/hctx0/dispatch
> /sys/kernel/debug/block/vda/hctx0/flags
> /sys/kernel/debug/block/vda/hctx0/state
> /sys/kernel/debug/block/vda/sched
> /sys/kernel/debug/block/vda/sched/dispatch
> /sys/kernel/debug/block/vda/sched/starved
> /sys/kernel/debug/block/vda/sched/batching
> /sys/kernel/debug/block/vda/sched/write_next_rq
> /sys/kernel/debug/block/vda/sched/write_fifo_list
> /sys/kernel/debug/block/vda/sched/read_next_rq
> /sys/kernel/debug/block/vda/sched/read_fifo_list
> /sys/kernel/debug/block/vda/write_hints
> /sys/kernel/debug/block/vda/state
> /sys/kernel/debug/block/vda/requeue_list
> /sys/kernel/debug/block/vda/poll_stat

Try this, basically just a revert.


diff --git a/block/blk-mq.c b/block/blk-mq.c
index 11097477eeab..bc1950fa9ef6 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -37,6 +37,9 @@
 #include "blk-wbt.h"
 #include "blk-mq-sched.h"
 
+static DEFINE_MUTEX(all_q_mutex);
+static LIST_HEAD(all_q_list);
+
 static bool blk_mq_poll(struct request_queue *q, blk_qc_t cookie);
 static void blk_mq_poll_stats_start(struct request_queue *q);
 static void blk_mq_poll_stats_fn(struct blk_stat_callback *cb);
@@ -2114,8 +2117,8 @@ static void blk_mq_init_cpu_queues(struct request_queue *q,
 		INIT_LIST_HEAD(&__ctx->rq_list);
 		__ctx->queue = q;
 
-		/* If the cpu isn't present, the cpu is mapped to first hctx */
-		if (!cpu_present(i))
+		/* If the cpu isn't online, the cpu is mapped to first hctx */
+		if (!cpu_online(i))
 			continue;
 
 		hctx = blk_mq_map_queue(q, i);
@@ -2158,7 +2161,8 @@ static void blk_mq_free_map_and_requests(struct blk_mq_tag_set *set,
 	}
 }
 
-static void blk_mq_map_swqueue(struct request_queue *q)
+static void blk_mq_map_swqueue(struct request_queue *q,
+			       const struct cpumask *online_mask)
 {
 	unsigned int i, hctx_idx;
 	struct blk_mq_hw_ctx *hctx;
@@ -2176,11 +2180,13 @@ static void blk_mq_map_swqueue(struct request_queue *q)
 	}
 
 	/*
-	 * Map software to hardware queues.
-	 *
-	 * If the cpu isn't present, the cpu is mapped to first hctx.
+	 * Map software to hardware queues
 	 */
-	for_each_present_cpu(i) {
+	for_each_possible_cpu(i) {
+		/* If the cpu isn't online, the cpu is mapped to first hctx */
+		if (!cpumask_test_cpu(i, online_mask))
+			continue;
+
 		hctx_idx = q->mq_map[i];
 		/* unmapped hw queue can be remapped after CPU topo changed */
 		if (!set->tags[hctx_idx] &&
@@ -2495,8 +2501,16 @@ struct request_queue *blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
 		blk_queue_softirq_done(q, set->ops->complete);
 
 	blk_mq_init_cpu_queues(q, set->nr_hw_queues);
+
+	get_online_cpus();
+	mutex_lock(&all_q_mutex);
+
+	list_add_tail(&q->all_q_node, &all_q_list);
 	blk_mq_add_queue_tag_set(set, q);
-	blk_mq_map_swqueue(q);
+	blk_mq_map_swqueue(q, cpu_online_mask);
+
+	mutex_unlock(&all_q_mutex);
+	put_online_cpus();
 
 	if (!(set->flags & BLK_MQ_F_NO_SCHED)) {
 		int ret;
@@ -2522,12 +2536,18 @@ void blk_mq_free_queue(struct request_queue *q)
 {
 	struct blk_mq_tag_set	*set = q->tag_set;
 
+	mutex_lock(&all_q_mutex);
+	list_del_init(&q->all_q_node);
+	mutex_unlock(&all_q_mutex);
+
 	blk_mq_del_queue_tag_set(q);
+
 	blk_mq_exit_hw_queues(q, set, set->nr_hw_queues);
 }
 
 /* Basically redo blk_mq_init_queue with queue frozen */
-static void blk_mq_queue_reinit(struct request_queue *q)
+static void blk_mq_queue_reinit(struct request_queue *q,
+				const struct cpumask *online_mask)
 {
 	WARN_ON_ONCE(!atomic_read(&q->mq_freeze_depth));
 
@@ -2539,12 +2559,76 @@ static void blk_mq_queue_reinit(struct request_queue *q)
 	 * we should change hctx numa_node according to the new topology (this
 	 * involves freeing and re-allocating memory, worth doing?)
 	 */
-	blk_mq_map_swqueue(q);
+	blk_mq_map_swqueue(q, online_mask);
 
 	blk_mq_sysfs_register(q);
 	blk_mq_debugfs_register_hctxs(q);
 }
 
+/*
+ * New online cpumask which is going to be set in this hotplug event.
+ * Declare this cpumasks as global as cpu-hotplug operation is invoked
+ * one-by-one and dynamically allocating this could result in a failure.
+ */
+static struct cpumask cpuhp_online_new;
+
+static void blk_mq_queue_reinit_work(void)
+{
+	struct request_queue *q;
+
+	mutex_lock(&all_q_mutex);
+	/*
+	 * We need to freeze and reinit all existing queues.  Freezing
+	 * involves synchronous wait for an RCU grace period and doing it
+	 * one by one may take a long time.  Start freezing all queues in
+	 * one swoop and then wait for the completions so that freezing can
+	 * take place in parallel.
+	 */
+	list_for_each_entry(q, &all_q_list, all_q_node)
+		blk_freeze_queue_start(q);
+	list_for_each_entry(q, &all_q_list, all_q_node)
+		blk_mq_freeze_queue_wait(q);
+
+	list_for_each_entry(q, &all_q_list, all_q_node)
+		blk_mq_queue_reinit(q, &cpuhp_online_new);
+
+	list_for_each_entry(q, &all_q_list, all_q_node)
+		blk_mq_unfreeze_queue(q);
+
+	mutex_unlock(&all_q_mutex);
+}
+
+static int blk_mq_queue_reinit_dead(unsigned int cpu)
+{
+	cpumask_copy(&cpuhp_online_new, cpu_online_mask);
+	blk_mq_queue_reinit_work();
+	return 0;
+}
+
+/*
+ * Before hotadded cpu starts handling requests, new mappings must be
+ * established.  Otherwise, these requests in hw queue might never be
+ * dispatched.
+ *
+ * For example, there is a single hw queue (hctx) and two CPU queues (ctx0
+ * for CPU0, and ctx1 for CPU1).
+ *
+ * Now CPU1 is just onlined and a request is inserted into ctx1->rq_list
+ * and set bit0 in pending bitmap as ctx1->index_hw is still zero.
+ *
+ * And then while running hw queue, blk_mq_flush_busy_ctxs() finds bit0 is set
+ * in pending bitmap and tries to retrieve requests in hctx->ctxs[0]->rq_list.
+ * But htx->ctxs[0] is a pointer to ctx0, so the request in ctx1->rq_list is
+ * ignored.
+ */
+static int blk_mq_queue_reinit_prepare(unsigned int cpu)
+{
+	cpumask_copy(&cpuhp_online_new, cpu_online_mask);
+	cpumask_set_cpu(cpu, &cpuhp_online_new);
+	blk_mq_queue_reinit_work();
+	return 0;
+}
+
 static int __blk_mq_alloc_rq_maps(struct blk_mq_tag_set *set)
 {
 	int i;
@@ -2757,7 +2841,7 @@ static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set,
 	blk_mq_update_queue_map(set);
 	list_for_each_entry(q, &set->tag_list, tag_set_list) {
 		blk_mq_realloc_hw_ctxs(set, q);
-		blk_mq_queue_reinit(q);
+		blk_mq_queue_reinit(q, cpu_online_mask);
 	}
 
 	list_for_each_entry(q, &set->tag_list, tag_set_list)
@@ -2966,6 +3050,16 @@ static bool blk_mq_poll(struct request_queue *q, blk_qc_t cookie)
 	return __blk_mq_poll(hctx, rq);
 }
 
+void blk_mq_disable_hotplug(void)
+{
+	mutex_lock(&all_q_mutex);
+}
+
+void blk_mq_enable_hotplug(void)
+{
+	mutex_unlock(&all_q_mutex);
+}
+
 static int __init blk_mq_init(void)
 {
 	/*
@@ -2976,6 +3070,10 @@ static int __init blk_mq_init(void)
 
 	cpuhp_setup_state_multi(CPUHP_BLK_MQ_DEAD, "block/mq:dead", NULL,
 				blk_mq_hctx_notify_dead);
+
+	cpuhp_setup_state_nocalls(CPUHP_BLK_MQ_PREPARE, "block/mq:prepare",
+				  blk_mq_queue_reinit_prepare,
+				  blk_mq_queue_reinit_dead);
 	return 0;
 }
 subsys_initcall(blk_mq_init);
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 6c7c3ff5bf62..83b13ef1915e 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -59,6 +59,11 @@ void __blk_mq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
 void blk_mq_request_bypass_insert(struct request *rq, bool run_queue);
 void blk_mq_insert_requests(struct blk_mq_hw_ctx *hctx, struct blk_mq_ctx *ctx,
 				struct list_head *list);
+/*
+ * CPU hotplug helpers
+ */
+void blk_mq_enable_hotplug(void);
+void blk_mq_disable_hotplug(void);
 
 /*
  * CPU -> queue mappings
diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
index 201ab7267986..c31d4e3bf6d0 100644
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -76,6 +76,7 @@ enum cpuhp_state {
 	CPUHP_XEN_EVTCHN_PREPARE,
 	CPUHP_ARM_SHMOBILE_SCU_PREPARE,
 	CPUHP_SH_SH3X_PREPARE,
+	CPUHP_BLK_MQ_PREPARE,
 	CPUHP_NET_FLOW_PREPARE,
 	CPUHP_TOPOLOGY_PREPARE,
 	CPUHP_NET_IUCV_PREPARE,
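
This revert brings back the global prepare/dead pair registered with
cpuhp_setup_state_nocalls(): on every CPU coming up (prepare) or going away
(dead), all blk-mq queues are frozen, remapped against the new online mask,
and unfrozen. A minimal, generic sketch of that callback style (hypothetical
code, not part of the patch; it uses a dynamically allocated state instead of
a fixed enum entry):

#include <linux/cpuhotplug.h>

static int my_cpu_prepare(unsigned int cpu)
{
	/* 'cpu' is about to come up: recompute CPU-to-queue mappings so
	 * requests submitted from it have somewhere to go once it runs. */
	return 0;
}

static int my_cpu_dead(unsigned int cpu)
{
	/* 'cpu' is gone: recompute the mappings without it. */
	return 0;
}

static int __init my_subsys_init(void)
{
	int ret;

	ret = cpuhp_setup_state_nocalls(CPUHP_BP_PREPARE_DYN, "my/subsys:prepare",
					my_cpu_prepare, my_cpu_dead);
	/* For a dynamic state, a positive return value is the allocated state. */
	return ret < 0 ? ret : 0;
}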

-- 
Jens Axboe

^ permalink raw reply related	[flat|nested] 96+ messages in thread

* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
@ 2017-11-21 19:30                               ` Jens Axboe
  0 siblings, 0 replies; 96+ messages in thread
From: Jens Axboe @ 2017-11-21 19:30 UTC (permalink / raw)
  To: Christian Borntraeger, Bart Van Assche, virtualization,
	linux-block, mst, jasowang, linux-kernel, Christoph Hellwig

On 11/21/2017 12:15 PM, Christian Borntraeger wrote:
> 
> 
> On 11/21/2017 07:39 PM, Jens Axboe wrote:
>> On 11/21/2017 11:27 AM, Jens Axboe wrote:
>>> On 11/21/2017 11:12 AM, Christian Borntraeger wrote:
>>>>
>>>>
>>>> On 11/21/2017 07:09 PM, Jens Axboe wrote:
>>>>> On 11/21/2017 10:27 AM, Jens Axboe wrote:
>>>>>> On 11/21/2017 03:14 AM, Christian Borntraeger wrote:
>>>>>>> Bisect points to
>>>>>>>
>>>>>>> 1b5a7455d345b223d3a4658a9e5fce985b7998c1 is the first bad commit
>>>>>>> commit 1b5a7455d345b223d3a4658a9e5fce985b7998c1
>>>>>>> Author: Christoph Hellwig <hch@lst.de>
>>>>>>> Date:   Mon Jun 26 12:20:57 2017 +0200
>>>>>>>
>>>>>>>     blk-mq: Create hctx for each present CPU
>>>>>>>     
>>>>>>>     commit 4b855ad37194f7bdbb200ce7a1c7051fecb56a08 upstream.
>>>>>>>     
>>>>>>>     Currently we only create hctx for online CPUs, which can lead to a lot
>>>>>>>     of churn due to frequent soft offline / online operations.  Instead
>>>>>>>     allocate one for each present CPU to avoid this and dramatically simplify
>>>>>>>     the code.
>>>>>>>     
>>>>>>>     Signed-off-by: Christoph Hellwig <hch@lst.de>
>>>>>>>     Reviewed-by: Jens Axboe <axboe@kernel.dk>
>>>>>>>     Cc: Keith Busch <keith.busch@intel.com>
>>>>>>>     Cc: linux-block@vger.kernel.org
>>>>>>>     Cc: linux-nvme@lists.infradead.org
>>>>>>>     Link: http://lkml.kernel.org/r/20170626102058.10200-3-hch@lst.de
>>>>>>>     Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
>>>>>>>     Cc: Oleksandr Natalenko <oleksandr@natalenko.name>
>>>>>>>     Cc: Mike Galbraith <efault@gmx.de>
>>>>>>>     Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>>>>>>
>>>>>> I wonder if we're simply not getting the masks updated correctly. I'll
>>>>>> take a look.
>>>>>
>>>>> Can't make it trigger here. We do init for each present CPU, which means
>>>>> that if I offline a few CPUs here and register a queue, those still show
>>>>> up as present (just offline) and get mapped accordingly.
>>>>>
>>>>> From the looks of it, your setup is different. If the CPU doesn't show
>>>>> up as present and it gets hotplugged, then I can see how this condition
>>>>> would trigger. What environment are you running this in? We might have
>>>>> to re-introduce the cpu hotplug notifier, right now we just monitor
>>>>> for a dead cpu and handle that.
>>>>
>>>> I am not doing a hot unplug and the replug, I use KVM and add a previously
>>>> not available CPU.
>>>>
>>>> in libvirt/virsh speak:
>>>>   <vcpu placement='static' current='1'>4</vcpu>
>>>
>>> So that's why we run into problems. It's not present when we load the device,
>>> but becomes present and online afterwards.
>>>
>>> Christoph, we used to handle this just fine, your patch broke it.
>>>
>>> I'll see if I can come up with an appropriate fix.
>>
>> Can you try the below?
> 
> 
> It does prevent the crash but it seems that the new CPU is not "used " after the hotplug for mq:
> 
> 
> output with 2 cpus:
> /sys/kernel/debug/block/vda
> /sys/kernel/debug/block/vda/hctx0
> /sys/kernel/debug/block/vda/hctx0/cpu0
> /sys/kernel/debug/block/vda/hctx0/cpu0/completed
> /sys/kernel/debug/block/vda/hctx0/cpu0/merged
> /sys/kernel/debug/block/vda/hctx0/cpu0/dispatched
> /sys/kernel/debug/block/vda/hctx0/cpu0/rq_list
> /sys/kernel/debug/block/vda/hctx0/active
> /sys/kernel/debug/block/vda/hctx0/run
> /sys/kernel/debug/block/vda/hctx0/queued
> /sys/kernel/debug/block/vda/hctx0/dispatched
> /sys/kernel/debug/block/vda/hctx0/io_poll
> /sys/kernel/debug/block/vda/hctx0/sched_tags_bitmap
> /sys/kernel/debug/block/vda/hctx0/sched_tags
> /sys/kernel/debug/block/vda/hctx0/tags_bitmap
> /sys/kernel/debug/block/vda/hctx0/tags
> /sys/kernel/debug/block/vda/hctx0/ctx_map
> /sys/kernel/debug/block/vda/hctx0/busy
> /sys/kernel/debug/block/vda/hctx0/dispatch
> /sys/kernel/debug/block/vda/hctx0/flags
> /sys/kernel/debug/block/vda/hctx0/state
> /sys/kernel/debug/block/vda/sched
> /sys/kernel/debug/block/vda/sched/dispatch
> /sys/kernel/debug/block/vda/sched/starved
> /sys/kernel/debug/block/vda/sched/batching
> /sys/kernel/debug/block/vda/sched/write_next_rq
> /sys/kernel/debug/block/vda/sched/write_fifo_list
> /sys/kernel/debug/block/vda/sched/read_next_rq
> /sys/kernel/debug/block/vda/sched/read_fifo_list
> /sys/kernel/debug/block/vda/write_hints
> /sys/kernel/debug/block/vda/state
> /sys/kernel/debug/block/vda/requeue_list
> /sys/kernel/debug/block/vda/poll_stat

Try this, basically just a revert.


diff --git a/block/blk-mq.c b/block/blk-mq.c
index 11097477eeab..bc1950fa9ef6 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -37,6 +37,9 @@
 #include "blk-wbt.h"
 #include "blk-mq-sched.h"
 
+static DEFINE_MUTEX(all_q_mutex);
+static LIST_HEAD(all_q_list);
+
 static bool blk_mq_poll(struct request_queue *q, blk_qc_t cookie);
 static void blk_mq_poll_stats_start(struct request_queue *q);
 static void blk_mq_poll_stats_fn(struct blk_stat_callback *cb);
@@ -2114,8 +2117,8 @@ static void blk_mq_init_cpu_queues(struct request_queue *q,
 		INIT_LIST_HEAD(&__ctx->rq_list);
 		__ctx->queue = q;
 
-		/* If the cpu isn't present, the cpu is mapped to first hctx */
-		if (!cpu_present(i))
+		/* If the cpu isn't online, the cpu is mapped to first hctx */
+		if (!cpu_online(i))
 			continue;
 
 		hctx = blk_mq_map_queue(q, i);
@@ -2158,7 +2161,8 @@ static void blk_mq_free_map_and_requests(struct blk_mq_tag_set *set,
 	}
 }
 
-static void blk_mq_map_swqueue(struct request_queue *q)
+static void blk_mq_map_swqueue(struct request_queue *q,
+			       const struct cpumask *online_mask)
 {
 	unsigned int i, hctx_idx;
 	struct blk_mq_hw_ctx *hctx;
@@ -2176,11 +2180,13 @@ static void blk_mq_map_swqueue(struct request_queue *q)
 	}
 
 	/*
-	 * Map software to hardware queues.
-	 *
-	 * If the cpu isn't present, the cpu is mapped to first hctx.
+	 * Map software to hardware queues
 	 */
-	for_each_present_cpu(i) {
+	for_each_possible_cpu(i) {
+		/* If the cpu isn't online, the cpu is mapped to first hctx */
+		if (!cpumask_test_cpu(i, online_mask))
+			continue;
+
 		hctx_idx = q->mq_map[i];
 		/* unmapped hw queue can be remapped after CPU topo changed */
 		if (!set->tags[hctx_idx] &&
@@ -2495,8 +2501,16 @@ struct request_queue *blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
 		blk_queue_softirq_done(q, set->ops->complete);
 
 	blk_mq_init_cpu_queues(q, set->nr_hw_queues);
+
+	get_online_cpus();
+	mutex_lock(&all_q_mutex);
+
+	list_add_tail(&q->all_q_node, &all_q_list);
 	blk_mq_add_queue_tag_set(set, q);
-	blk_mq_map_swqueue(q);
+	blk_mq_map_swqueue(q, cpu_online_mask);
+
+	mutex_unlock(&all_q_mutex);
+	put_online_cpus();
 
 	if (!(set->flags & BLK_MQ_F_NO_SCHED)) {
 		int ret;
@@ -2522,12 +2536,18 @@ void blk_mq_free_queue(struct request_queue *q)
 {
 	struct blk_mq_tag_set	*set = q->tag_set;
 
+	mutex_lock(&all_q_mutex);
+	list_del_init(&q->all_q_node);
+	mutex_unlock(&all_q_mutex);
+
 	blk_mq_del_queue_tag_set(q);
+
 	blk_mq_exit_hw_queues(q, set, set->nr_hw_queues);
 }
 
 /* Basically redo blk_mq_init_queue with queue frozen */
-static void blk_mq_queue_reinit(struct request_queue *q)
+static void blk_mq_queue_reinit(struct request_queue *q,
+				const struct cpumask *online_mask)
 {
 	WARN_ON_ONCE(!atomic_read(&q->mq_freeze_depth));
 
@@ -2539,12 +2559,76 @@ static void blk_mq_queue_reinit(struct request_queue *q)
 	 * we should change hctx numa_node according to the new topology (this
 	 * involves freeing and re-allocating memory, worth doing?)
 	 */
-	blk_mq_map_swqueue(q);
+	blk_mq_map_swqueue(q, online_mask);
 
 	blk_mq_sysfs_register(q);
 	blk_mq_debugfs_register_hctxs(q);
 }
 
+/*
+ * New online cpumask which is going to be set in this hotplug event.
+ * Declare this cpumasks as global as cpu-hotplug operation is invoked
+ * one-by-one and dynamically allocating this could result in a failure.
+ */
+static struct cpumask cpuhp_online_new;
+
+static void blk_mq_queue_reinit_work(void)
+{
+	struct request_queue *q;
+
+	mutex_lock(&all_q_mutex);
+	/*
+	 * We need to freeze and reinit all existing queues.  Freezing
+	 * involves synchronous wait for an RCU grace period and doing it
+	 * one by one may take a long time.  Start freezing all queues in
+	 * one swoop and then wait for the completions so that freezing can
+	 * take place in parallel.
+	 */
+	list_for_each_entry(q, &all_q_list, all_q_node)
+		blk_freeze_queue_start(q);
+	list_for_each_entry(q, &all_q_list, all_q_node)
+		blk_mq_freeze_queue_wait(q);
+
+	list_for_each_entry(q, &all_q_list, all_q_node)
+		blk_mq_queue_reinit(q, &cpuhp_online_new);
+
+	list_for_each_entry(q, &all_q_list, all_q_node)
+		blk_mq_unfreeze_queue(q);
+
+	mutex_unlock(&all_q_mutex);
+}
+
+static int blk_mq_queue_reinit_dead(unsigned int cpu)
+{
+	cpumask_copy(&cpuhp_online_new, cpu_online_mask);
+	blk_mq_queue_reinit_work();
+	return 0;
+}
+
+/*
+ * Before hotadded cpu starts handling requests, new mappings must be
+ * established.  Otherwise, these requests in hw queue might never be
+ * dispatched.
+ *
+ * For example, there is a single hw queue (hctx) and two CPU queues (ctx0
+ * for CPU0, and ctx1 for CPU1).
+ *
+ * Now CPU1 is just onlined and a request is inserted into ctx1->rq_list
+ * and set bit0 in pending bitmap as ctx1->index_hw is still zero.
+ *
+ * And then while running hw queue, blk_mq_flush_busy_ctxs() finds bit0 is set
+ * in pending bitmap and tries to retrieve requests in hctx->ctxs[0]->rq_list.
+ * But htx->ctxs[0] is a pointer to ctx0, so the request in ctx1->rq_list is
+ * ignored.
+ */
+static int blk_mq_queue_reinit_prepare(unsigned int cpu)
+{
+	cpumask_copy(&cpuhp_online_new, cpu_online_mask);
+	cpumask_set_cpu(cpu, &cpuhp_online_new);
+	blk_mq_queue_reinit_work();
+	return 0;
+}
+
 static int __blk_mq_alloc_rq_maps(struct blk_mq_tag_set *set)
 {
 	int i;
@@ -2757,7 +2841,7 @@ static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set,
 	blk_mq_update_queue_map(set);
 	list_for_each_entry(q, &set->tag_list, tag_set_list) {
 		blk_mq_realloc_hw_ctxs(set, q);
-		blk_mq_queue_reinit(q);
+		blk_mq_queue_reinit(q, cpu_online_mask);
 	}
 
 	list_for_each_entry(q, &set->tag_list, tag_set_list)
@@ -2966,6 +3050,16 @@ static bool blk_mq_poll(struct request_queue *q, blk_qc_t cookie)
 	return __blk_mq_poll(hctx, rq);
 }
 
+void blk_mq_disable_hotplug(void)
+{
+	mutex_lock(&all_q_mutex);
+}
+
+void blk_mq_enable_hotplug(void)
+{
+	mutex_unlock(&all_q_mutex);
+}
+
 static int __init blk_mq_init(void)
 {
 	/*
@@ -2976,6 +3070,10 @@ static int __init blk_mq_init(void)
 
 	cpuhp_setup_state_multi(CPUHP_BLK_MQ_DEAD, "block/mq:dead", NULL,
 				blk_mq_hctx_notify_dead);
+
+	cpuhp_setup_state_nocalls(CPUHP_BLK_MQ_PREPARE, "block/mq:prepare",
+				  blk_mq_queue_reinit_prepare,
+				  blk_mq_queue_reinit_dead);
 	return 0;
 }
 subsys_initcall(blk_mq_init);
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 6c7c3ff5bf62..83b13ef1915e 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -59,6 +59,11 @@ void __blk_mq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
 void blk_mq_request_bypass_insert(struct request *rq, bool run_queue);
 void blk_mq_insert_requests(struct blk_mq_hw_ctx *hctx, struct blk_mq_ctx *ctx,
 				struct list_head *list);
+/*
+ * CPU hotplug helpers
+ */
+void blk_mq_enable_hotplug(void);
+void blk_mq_disable_hotplug(void);
 
 /*
  * CPU -> queue mappings
diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
index 201ab7267986..c31d4e3bf6d0 100644
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -76,6 +76,7 @@ enum cpuhp_state {
 	CPUHP_XEN_EVTCHN_PREPARE,
 	CPUHP_ARM_SHMOBILE_SCU_PREPARE,
 	CPUHP_SH_SH3X_PREPARE,
+	CPUHP_BLK_MQ_PREPARE,
 	CPUHP_NET_FLOW_PREPARE,
 	CPUHP_TOPOLOGY_PREPARE,
 	CPUHP_NET_IUCV_PREPARE,

-- 
Jens Axboe

^ permalink raw reply related	[flat|nested] 96+ messages in thread

* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
  2017-11-21 19:15                             ` Christian Borntraeger
  (?)
@ 2017-11-21 19:30                             ` Jens Axboe
  -1 siblings, 0 replies; 96+ messages in thread
From: Jens Axboe @ 2017-11-21 19:30 UTC (permalink / raw)
  To: Christian Borntraeger, Bart Van Assche, virtualization,
	linux-block, mst, jasowang, linux-kernel, Christoph Hellwig

On 11/21/2017 12:15 PM, Christian Borntraeger wrote:
> 
> 
> On 11/21/2017 07:39 PM, Jens Axboe wrote:
>> On 11/21/2017 11:27 AM, Jens Axboe wrote:
>>> On 11/21/2017 11:12 AM, Christian Borntraeger wrote:
>>>>
>>>>
>>>> On 11/21/2017 07:09 PM, Jens Axboe wrote:
>>>>> On 11/21/2017 10:27 AM, Jens Axboe wrote:
>>>>>> On 11/21/2017 03:14 AM, Christian Borntraeger wrote:
>>>>>>> Bisect points to
>>>>>>>
>>>>>>> 1b5a7455d345b223d3a4658a9e5fce985b7998c1 is the first bad commit
>>>>>>> commit 1b5a7455d345b223d3a4658a9e5fce985b7998c1
>>>>>>> Author: Christoph Hellwig <hch@lst.de>
>>>>>>> Date:   Mon Jun 26 12:20:57 2017 +0200
>>>>>>>
>>>>>>>     blk-mq: Create hctx for each present CPU
>>>>>>>     
>>>>>>>     commit 4b855ad37194f7bdbb200ce7a1c7051fecb56a08 upstream.
>>>>>>>     
>>>>>>>     Currently we only create hctx for online CPUs, which can lead to a lot
>>>>>>>     of churn due to frequent soft offline / online operations.  Instead
>>>>>>>     allocate one for each present CPU to avoid this and dramatically simplify
>>>>>>>     the code.
>>>>>>>     
>>>>>>>     Signed-off-by: Christoph Hellwig <hch@lst.de>
>>>>>>>     Reviewed-by: Jens Axboe <axboe@kernel.dk>
>>>>>>>     Cc: Keith Busch <keith.busch@intel.com>
>>>>>>>     Cc: linux-block@vger.kernel.org
>>>>>>>     Cc: linux-nvme@lists.infradead.org
>>>>>>>     Link: http://lkml.kernel.org/r/20170626102058.10200-3-hch@lst.de
>>>>>>>     Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
>>>>>>>     Cc: Oleksandr Natalenko <oleksandr@natalenko.name>
>>>>>>>     Cc: Mike Galbraith <efault@gmx.de>
>>>>>>>     Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>>>>>>
>>>>>> I wonder if we're simply not getting the masks updated correctly. I'll
>>>>>> take a look.
>>>>>
>>>>> Can't make it trigger here. We do init for each present CPU, which means
>>>>> that if I offline a few CPUs here and register a queue, those still show
>>>>> up as present (just offline) and get mapped accordingly.
>>>>>
>>>>> From the looks of it, your setup is different. If the CPU doesn't show
>>>>> up as present and it gets hotplugged, then I can see how this condition
>>>>> would trigger. What environment are you running this in? We might have
>>>>> to re-introduce the cpu hotplug notifier, right now we just monitor
>>>>> for a dead cpu and handle that.
>>>>
>>>> I am not doing a hot unplug and the replug, I use KVM and add a previously
>>>> not available CPU.
>>>>
>>>> in libvirt/virsh speak:
>>>>   <vcpu placement='static' current='1'>4</vcpu>
>>>
>>> So that's why we run into problems. It's not present when we load the device,
>>> but becomes present and online afterwards.
>>>
>>> Christoph, we used to handle this just fine, your patch broke it.
>>>
>>> I'll see if I can come up with an appropriate fix.
>>
>> Can you try the below?
> 
> 
> It does prevent the crash but it seems that the new CPU is not "used" after the hotplug for mq:
> 
> 
> output with 2 cpus:
> /sys/kernel/debug/block/vda
> /sys/kernel/debug/block/vda/hctx0
> /sys/kernel/debug/block/vda/hctx0/cpu0
> /sys/kernel/debug/block/vda/hctx0/cpu0/completed
> /sys/kernel/debug/block/vda/hctx0/cpu0/merged
> /sys/kernel/debug/block/vda/hctx0/cpu0/dispatched
> /sys/kernel/debug/block/vda/hctx0/cpu0/rq_list
> /sys/kernel/debug/block/vda/hctx0/active
> /sys/kernel/debug/block/vda/hctx0/run
> /sys/kernel/debug/block/vda/hctx0/queued
> /sys/kernel/debug/block/vda/hctx0/dispatched
> /sys/kernel/debug/block/vda/hctx0/io_poll
> /sys/kernel/debug/block/vda/hctx0/sched_tags_bitmap
> /sys/kernel/debug/block/vda/hctx0/sched_tags
> /sys/kernel/debug/block/vda/hctx0/tags_bitmap
> /sys/kernel/debug/block/vda/hctx0/tags
> /sys/kernel/debug/block/vda/hctx0/ctx_map
> /sys/kernel/debug/block/vda/hctx0/busy
> /sys/kernel/debug/block/vda/hctx0/dispatch
> /sys/kernel/debug/block/vda/hctx0/flags
> /sys/kernel/debug/block/vda/hctx0/state
> /sys/kernel/debug/block/vda/sched
> /sys/kernel/debug/block/vda/sched/dispatch
> /sys/kernel/debug/block/vda/sched/starved
> /sys/kernel/debug/block/vda/sched/batching
> /sys/kernel/debug/block/vda/sched/write_next_rq
> /sys/kernel/debug/block/vda/sched/write_fifo_list
> /sys/kernel/debug/block/vda/sched/read_next_rq
> /sys/kernel/debug/block/vda/sched/read_fifo_list
> /sys/kernel/debug/block/vda/write_hints
> /sys/kernel/debug/block/vda/state
> /sys/kernel/debug/block/vda/requeue_list
> /sys/kernel/debug/block/vda/poll_stat

Try this, basically just a revert.


diff --git a/block/blk-mq.c b/block/blk-mq.c
index 11097477eeab..bc1950fa9ef6 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -37,6 +37,9 @@
 #include "blk-wbt.h"
 #include "blk-mq-sched.h"
 
+static DEFINE_MUTEX(all_q_mutex);
+static LIST_HEAD(all_q_list);
+
 static bool blk_mq_poll(struct request_queue *q, blk_qc_t cookie);
 static void blk_mq_poll_stats_start(struct request_queue *q);
 static void blk_mq_poll_stats_fn(struct blk_stat_callback *cb);
@@ -2114,8 +2117,8 @@ static void blk_mq_init_cpu_queues(struct request_queue *q,
 		INIT_LIST_HEAD(&__ctx->rq_list);
 		__ctx->queue = q;
 
-		/* If the cpu isn't present, the cpu is mapped to first hctx */
-		if (!cpu_present(i))
+		/* If the cpu isn't online, the cpu is mapped to first hctx */
+		if (!cpu_online(i))
 			continue;
 
 		hctx = blk_mq_map_queue(q, i);
@@ -2158,7 +2161,8 @@ static void blk_mq_free_map_and_requests(struct blk_mq_tag_set *set,
 	}
 }
 
-static void blk_mq_map_swqueue(struct request_queue *q)
+static void blk_mq_map_swqueue(struct request_queue *q,
+			       const struct cpumask *online_mask)
 {
 	unsigned int i, hctx_idx;
 	struct blk_mq_hw_ctx *hctx;
@@ -2176,11 +2180,13 @@ static void blk_mq_map_swqueue(struct request_queue *q)
 	}
 
 	/*
-	 * Map software to hardware queues.
-	 *
-	 * If the cpu isn't present, the cpu is mapped to first hctx.
+	 * Map software to hardware queues
 	 */
-	for_each_present_cpu(i) {
+	for_each_possible_cpu(i) {
+		/* If the cpu isn't online, the cpu is mapped to first hctx */
+		if (!cpumask_test_cpu(i, online_mask))
+			continue;
+
 		hctx_idx = q->mq_map[i];
 		/* unmapped hw queue can be remapped after CPU topo changed */
 		if (!set->tags[hctx_idx] &&
@@ -2495,8 +2501,16 @@ struct request_queue *blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
 		blk_queue_softirq_done(q, set->ops->complete);
 
 	blk_mq_init_cpu_queues(q, set->nr_hw_queues);
+
+	get_online_cpus();
+	mutex_lock(&all_q_mutex);
+
+	list_add_tail(&q->all_q_node, &all_q_list);
 	blk_mq_add_queue_tag_set(set, q);
-	blk_mq_map_swqueue(q);
+	blk_mq_map_swqueue(q, cpu_online_mask);
+
+	mutex_unlock(&all_q_mutex);
+	put_online_cpus();
 
 	if (!(set->flags & BLK_MQ_F_NO_SCHED)) {
 		int ret;
@@ -2522,12 +2536,18 @@ void blk_mq_free_queue(struct request_queue *q)
 {
 	struct blk_mq_tag_set	*set = q->tag_set;
 
+	mutex_lock(&all_q_mutex);
+	list_del_init(&q->all_q_node);
+	mutex_unlock(&all_q_mutex);
+
 	blk_mq_del_queue_tag_set(q);
+
 	blk_mq_exit_hw_queues(q, set, set->nr_hw_queues);
 }
 
 /* Basically redo blk_mq_init_queue with queue frozen */
-static void blk_mq_queue_reinit(struct request_queue *q)
+static void blk_mq_queue_reinit(struct request_queue *q,
+				const struct cpumask *online_mask)
 {
 	WARN_ON_ONCE(!atomic_read(&q->mq_freeze_depth));
 
@@ -2539,12 +2559,76 @@ static void blk_mq_queue_reinit(struct request_queue *q)
 	 * we should change hctx numa_node according to the new topology (this
 	 * involves freeing and re-allocating memory, worth doing?)
 	 */
-	blk_mq_map_swqueue(q);
+	blk_mq_map_swqueue(q, online_mask);
 
 	blk_mq_sysfs_register(q);
 	blk_mq_debugfs_register_hctxs(q);
 }
 
+/*
+ * New online cpumask which is going to be set in this hotplug event.
+ * Declare this cpumasks as global as cpu-hotplug operation is invoked
+ * one-by-one and dynamically allocating this could result in a failure.
+ */
+static struct cpumask cpuhp_online_new;
+
+static void blk_mq_queue_reinit_work(void)
+{
+	struct request_queue *q;
+
+	mutex_lock(&all_q_mutex);
+	/*
+	 * We need to freeze and reinit all existing queues.  Freezing
+	 * involves synchronous wait for an RCU grace period and doing it
+	 * one by one may take a long time.  Start freezing all queues in
+	 * one swoop and then wait for the completions so that freezing can
+	 * take place in parallel.
+	 */
+	list_for_each_entry(q, &all_q_list, all_q_node)
+		blk_freeze_queue_start(q);
+	list_for_each_entry(q, &all_q_list, all_q_node)
+		blk_mq_freeze_queue_wait(q);
+
+	list_for_each_entry(q, &all_q_list, all_q_node)
+		blk_mq_queue_reinit(q, &cpuhp_online_new);
+
+	list_for_each_entry(q, &all_q_list, all_q_node)
+		blk_mq_unfreeze_queue(q);
+
+	mutex_unlock(&all_q_mutex);
+}
+
+static int blk_mq_queue_reinit_dead(unsigned int cpu)
+{
+	cpumask_copy(&cpuhp_online_new, cpu_online_mask);
+	blk_mq_queue_reinit_work();
+	return 0;
+}
+
+/*
+ * Before hotadded cpu starts handling requests, new mappings must be
+ * established.  Otherwise, these requests in hw queue might never be
+ * dispatched.
+ *
+ * For example, there is a single hw queue (hctx) and two CPU queues (ctx0
+ * for CPU0, and ctx1 for CPU1).
+ *
+ * Now CPU1 is just onlined and a request is inserted into ctx1->rq_list
+ * and set bit0 in pending bitmap as ctx1->index_hw is still zero.
+ *
+ * And then while running hw queue, blk_mq_flush_busy_ctxs() finds bit0 is set
+ * in pending bitmap and tries to retrieve requests in hctx->ctxs[0]->rq_list.
+ * But htx->ctxs[0] is a pointer to ctx0, so the request in ctx1->rq_list is
+ * ignored.
+ */
+static int blk_mq_queue_reinit_prepare(unsigned int cpu)
+{
+	cpumask_copy(&cpuhp_online_new, cpu_online_mask);
+	cpumask_set_cpu(cpu, &cpuhp_online_new);
+	blk_mq_queue_reinit_work();
+	return 0;
+}
+
 static int __blk_mq_alloc_rq_maps(struct blk_mq_tag_set *set)
 {
 	int i;
@@ -2757,7 +2841,7 @@ static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set,
 	blk_mq_update_queue_map(set);
 	list_for_each_entry(q, &set->tag_list, tag_set_list) {
 		blk_mq_realloc_hw_ctxs(set, q);
-		blk_mq_queue_reinit(q);
+		blk_mq_queue_reinit(q, cpu_online_mask);
 	}
 
 	list_for_each_entry(q, &set->tag_list, tag_set_list)
@@ -2966,6 +3050,16 @@ static bool blk_mq_poll(struct request_queue *q, blk_qc_t cookie)
 	return __blk_mq_poll(hctx, rq);
 }
 
+void blk_mq_disable_hotplug(void)
+{
+	mutex_lock(&all_q_mutex);
+}
+
+void blk_mq_enable_hotplug(void)
+{
+	mutex_unlock(&all_q_mutex);
+}
+
 static int __init blk_mq_init(void)
 {
 	/*
@@ -2976,6 +3070,10 @@ static int __init blk_mq_init(void)
 
 	cpuhp_setup_state_multi(CPUHP_BLK_MQ_DEAD, "block/mq:dead", NULL,
 				blk_mq_hctx_notify_dead);
+
+	cpuhp_setup_state_nocalls(CPUHP_BLK_MQ_PREPARE, "block/mq:prepare",
+				  blk_mq_queue_reinit_prepare,
+				  blk_mq_queue_reinit_dead);
 	return 0;
 }
 subsys_initcall(blk_mq_init);
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 6c7c3ff5bf62..83b13ef1915e 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -59,6 +59,11 @@ void __blk_mq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
 void blk_mq_request_bypass_insert(struct request *rq, bool run_queue);
 void blk_mq_insert_requests(struct blk_mq_hw_ctx *hctx, struct blk_mq_ctx *ctx,
 				struct list_head *list);
+/*
+ * CPU hotplug helpers
+ */
+void blk_mq_enable_hotplug(void);
+void blk_mq_disable_hotplug(void);
 
 /*
  * CPU -> queue mappings
diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
index 201ab7267986..c31d4e3bf6d0 100644
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -76,6 +76,7 @@ enum cpuhp_state {
 	CPUHP_XEN_EVTCHN_PREPARE,
 	CPUHP_ARM_SHMOBILE_SCU_PREPARE,
 	CPUHP_SH_SH3X_PREPARE,
+	CPUHP_BLK_MQ_PREPARE,
 	CPUHP_NET_FLOW_PREPARE,
 	CPUHP_TOPOLOGY_PREPARE,
 	CPUHP_NET_IUCV_PREPARE,
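
For readers skimming the diff, what the revert brings back boils down to the
following (same function and variable names as in the hunks above; the
blk_mq_queue_reinit_work() freeze/locking details and the blk_mq_init()
registration are shown only as comments, so treat this as a condensed sketch
rather than the literal code):

/*
 * blk_mq_queue_reinit_work(): freeze every queue on all_q_list, redo the
 * software ctx -> hardware ctx mapping against cpuhp_online_new, then
 * unfreeze.  The two callbacks below are registered for
 * CPUHP_BLK_MQ_PREPARE via cpuhp_setup_state_nocalls() and only set up
 * that mask before kicking the reinit work.
 */
static int blk_mq_queue_reinit_prepare(unsigned int cpu)
{
	/*
	 * CPU is about to come online: include it in the mapping now,
	 * before it can insert requests into its software queue.
	 */
	cpumask_copy(&cpuhp_online_new, cpu_online_mask);
	cpumask_set_cpu(cpu, &cpuhp_online_new);
	blk_mq_queue_reinit_work();
	return 0;
}

static int blk_mq_queue_reinit_dead(unsigned int cpu)
{
	/* CPU has gone away: rebuild the mapping without it. */
	cpumask_copy(&cpuhp_online_new, cpu_online_mask);
	blk_mq_queue_reinit_work();
	return 0;
}

With this in place, the hot-added vCPU in Christian's scenario should get its
ctx wired to a real hctx when it comes online, and it should show up again in
the debugfs listing next to cpu0 under /sys/kernel/debug/block/vda/hctx0/.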

-- 
Jens Axboe

^ permalink raw reply related	[flat|nested] 96+ messages in thread

* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
  2017-11-21 19:30                               ` Jens Axboe
@ 2017-11-21 20:12                                 ` Christian Borntraeger
  -1 siblings, 0 replies; 96+ messages in thread
From: Christian Borntraeger @ 2017-11-21 20:12 UTC (permalink / raw)
  To: Jens Axboe, Bart Van Assche, virtualization, linux-block, mst,
	jasowang, linux-kernel, Christoph Hellwig



On 11/21/2017 08:30 PM, Jens Axboe wrote:
> On 11/21/2017 12:15 PM, Christian Borntraeger wrote:
>>
>>
>> On 11/21/2017 07:39 PM, Jens Axboe wrote:
>>> On 11/21/2017 11:27 AM, Jens Axboe wrote:
>>>> On 11/21/2017 11:12 AM, Christian Borntraeger wrote:
>>>>>
>>>>>
>>>>> On 11/21/2017 07:09 PM, Jens Axboe wrote:
>>>>>> On 11/21/2017 10:27 AM, Jens Axboe wrote:
>>>>>>> On 11/21/2017 03:14 AM, Christian Borntraeger wrote:
>>>>>>>> Bisect points to
>>>>>>>>
>>>>>>>> 1b5a7455d345b223d3a4658a9e5fce985b7998c1 is the first bad commit
>>>>>>>> commit 1b5a7455d345b223d3a4658a9e5fce985b7998c1
>>>>>>>> Author: Christoph Hellwig <hch@lst.de>
>>>>>>>> Date:   Mon Jun 26 12:20:57 2017 +0200
>>>>>>>>
>>>>>>>>     blk-mq: Create hctx for each present CPU
>>>>>>>>     
>>>>>>>>     commit 4b855ad37194f7bdbb200ce7a1c7051fecb56a08 upstream.
>>>>>>>>     
>>>>>>>>     Currently we only create hctx for online CPUs, which can lead to a lot
>>>>>>>>     of churn due to frequent soft offline / online operations.  Instead
>>>>>>>>     allocate one for each present CPU to avoid this and dramatically simplify
>>>>>>>>     the code.
>>>>>>>>     
>>>>>>>>     Signed-off-by: Christoph Hellwig <hch@lst.de>
>>>>>>>>     Reviewed-by: Jens Axboe <axboe@kernel.dk>
>>>>>>>>     Cc: Keith Busch <keith.busch@intel.com>
>>>>>>>>     Cc: linux-block@vger.kernel.org
>>>>>>>>     Cc: linux-nvme@lists.infradead.org
>>>>>>>>     Link: http://lkml.kernel.org/r/20170626102058.10200-3-hch@lst.de
>>>>>>>>     Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
>>>>>>>>     Cc: Oleksandr Natalenko <oleksandr@natalenko.name>
>>>>>>>>     Cc: Mike Galbraith <efault@gmx.de>
>>>>>>>>     Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>>>>>>>
>>>>>>> I wonder if we're simply not getting the masks updated correctly. I'll
>>>>>>> take a look.
>>>>>>
>>>>>> Can't make it trigger here. We do init for each present CPU, which means
>>>>>> that if I offline a few CPUs here and register a queue, those still show
>>>>>> up as present (just offline) and get mapped accordingly.
>>>>>>
>>>>>> From the looks of it, your setup is different. If the CPU doesn't show
>>>>>> up as present and it gets hotplugged, then I can see how this condition
>>>>>> would trigger. What environment are you running this in? We might have
>>>>>> to re-introduce the cpu hotplug notifier, right now we just monitor
>>>>>> for a dead cpu and handle that.
>>>>>
>>>>> I am not doing a hot unplug and the replug, I use KVM and add a previously
>>>>> not available CPU.
>>>>>
>>>>> in libvirt/virsh speak:
>>>>>   <vcpu placement='static' current='1'>4</vcpu>
>>>>
>>>> So that's why we run into problems. It's not present when we load the device,
>>>> but becomes present and online afterwards.
>>>>
>>>> Christoph, we used to handle this just fine, your patch broke it.
>>>>
>>>> I'll see if I can come up with an appropriate fix.
>>>
>>> Can you try the below?
>>
>>
>> It does prevent the crash but it seems that the new CPU is not "used" after the hotplug for mq:
>>
>>
>> output with 2 cpus:
>> /sys/kernel/debug/block/vda
>> /sys/kernel/debug/block/vda/hctx0
>> /sys/kernel/debug/block/vda/hctx0/cpu0
>> /sys/kernel/debug/block/vda/hctx0/cpu0/completed
>> /sys/kernel/debug/block/vda/hctx0/cpu0/merged
>> /sys/kernel/debug/block/vda/hctx0/cpu0/dispatched
>> /sys/kernel/debug/block/vda/hctx0/cpu0/rq_list
>> /sys/kernel/debug/block/vda/hctx0/active
>> /sys/kernel/debug/block/vda/hctx0/run
>> /sys/kernel/debug/block/vda/hctx0/queued
>> /sys/kernel/debug/block/vda/hctx0/dispatched
>> /sys/kernel/debug/block/vda/hctx0/io_poll
>> /sys/kernel/debug/block/vda/hctx0/sched_tags_bitmap
>> /sys/kernel/debug/block/vda/hctx0/sched_tags
>> /sys/kernel/debug/block/vda/hctx0/tags_bitmap
>> /sys/kernel/debug/block/vda/hctx0/tags
>> /sys/kernel/debug/block/vda/hctx0/ctx_map
>> /sys/kernel/debug/block/vda/hctx0/busy
>> /sys/kernel/debug/block/vda/hctx0/dispatch
>> /sys/kernel/debug/block/vda/hctx0/flags
>> /sys/kernel/debug/block/vda/hctx0/state
>> /sys/kernel/debug/block/vda/sched
>> /sys/kernel/debug/block/vda/sched/dispatch
>> /sys/kernel/debug/block/vda/sched/starved
>> /sys/kernel/debug/block/vda/sched/batching
>> /sys/kernel/debug/block/vda/sched/write_next_rq
>> /sys/kernel/debug/block/vda/sched/write_fifo_list
>> /sys/kernel/debug/block/vda/sched/read_next_rq
>> /sys/kernel/debug/block/vda/sched/read_fifo_list
>> /sys/kernel/debug/block/vda/write_hints
>> /sys/kernel/debug/block/vda/state
>> /sys/kernel/debug/block/vda/requeue_list
>> /sys/kernel/debug/block/vda/poll_stat
> 
> Try this, basically just a revert.

Yes, seems to work.

Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>

Do you know why the original commit made it into 4.12 stable? After all
it has no Fixes tag and no cc stable.


> 
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 11097477eeab..bc1950fa9ef6 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -37,6 +37,9 @@
>  #include "blk-wbt.h"
>  #include "blk-mq-sched.h"
> 
> +static DEFINE_MUTEX(all_q_mutex);
> +static LIST_HEAD(all_q_list);
> +
>  static bool blk_mq_poll(struct request_queue *q, blk_qc_t cookie);
>  static void blk_mq_poll_stats_start(struct request_queue *q);
>  static void blk_mq_poll_stats_fn(struct blk_stat_callback *cb);
> @@ -2114,8 +2117,8 @@ static void blk_mq_init_cpu_queues(struct request_queue *q,
>  		INIT_LIST_HEAD(&__ctx->rq_list);
>  		__ctx->queue = q;
> 
> -		/* If the cpu isn't present, the cpu is mapped to first hctx */
> -		if (!cpu_present(i))
> +		/* If the cpu isn't online, the cpu is mapped to first hctx */
> +		if (!cpu_online(i))
>  			continue;
> 
>  		hctx = blk_mq_map_queue(q, i);
> @@ -2158,7 +2161,8 @@ static void blk_mq_free_map_and_requests(struct blk_mq_tag_set *set,
>  	}
>  }
> 
> -static void blk_mq_map_swqueue(struct request_queue *q)
> +static void blk_mq_map_swqueue(struct request_queue *q,
> +			       const struct cpumask *online_mask)
>  {
>  	unsigned int i, hctx_idx;
>  	struct blk_mq_hw_ctx *hctx;
> @@ -2176,11 +2180,13 @@ static void blk_mq_map_swqueue(struct request_queue *q)
>  	}
> 
>  	/*
> -	 * Map software to hardware queues.
> -	 *
> -	 * If the cpu isn't present, the cpu is mapped to first hctx.
> +	 * Map software to hardware queues
>  	 */
> -	for_each_present_cpu(i) {
> +	for_each_possible_cpu(i) {
> +		/* If the cpu isn't online, the cpu is mapped to first hctx */
> +		if (!cpumask_test_cpu(i, online_mask))
> +			continue;
> +
>  		hctx_idx = q->mq_map[i];
>  		/* unmapped hw queue can be remapped after CPU topo changed */
>  		if (!set->tags[hctx_idx] &&
> @@ -2495,8 +2501,16 @@ struct request_queue *blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
>  		blk_queue_softirq_done(q, set->ops->complete);
> 
>  	blk_mq_init_cpu_queues(q, set->nr_hw_queues);
> +
> +	get_online_cpus();
> +	mutex_lock(&all_q_mutex);
> +
> +	list_add_tail(&q->all_q_node, &all_q_list);
>  	blk_mq_add_queue_tag_set(set, q);
> -	blk_mq_map_swqueue(q);
> +	blk_mq_map_swqueue(q, cpu_online_mask);
> +
> +	mutex_unlock(&all_q_mutex);
> +	put_online_cpus();
> 
>  	if (!(set->flags & BLK_MQ_F_NO_SCHED)) {
>  		int ret;
> @@ -2522,12 +2536,18 @@ void blk_mq_free_queue(struct request_queue *q)
>  {
>  	struct blk_mq_tag_set	*set = q->tag_set;
> 
> +	mutex_lock(&all_q_mutex);
> +	list_del_init(&q->all_q_node);
> +	mutex_unlock(&all_q_mutex);
> +
>  	blk_mq_del_queue_tag_set(q);
> +
>  	blk_mq_exit_hw_queues(q, set, set->nr_hw_queues);
>  }
> 
>  /* Basically redo blk_mq_init_queue with queue frozen */
> -static void blk_mq_queue_reinit(struct request_queue *q)
> +static void blk_mq_queue_reinit(struct request_queue *q,
> +				const struct cpumask *online_mask)
>  {
>  	WARN_ON_ONCE(!atomic_read(&q->mq_freeze_depth));
> 
> @@ -2539,12 +2559,76 @@ static void blk_mq_queue_reinit(struct request_queue *q)
>  	 * we should change hctx numa_node according to the new topology (this
>  	 * involves freeing and re-allocating memory, worth doing?)
>  	 */
> -	blk_mq_map_swqueue(q);
> +	blk_mq_map_swqueue(q, online_mask);
> 
>  	blk_mq_sysfs_register(q);
>  	blk_mq_debugfs_register_hctxs(q);
>  }
> 
> +/*
> + * New online cpumask which is going to be set in this hotplug event.
> + * Declare this cpumasks as global as cpu-hotplug operation is invoked
> + * one-by-one and dynamically allocating this could result in a failure.
> + */
> +static struct cpumask cpuhp_online_new;
> +
> +static void blk_mq_queue_reinit_work(void)
> +{
> +	struct request_queue *q;
> +
> +	mutex_lock(&all_q_mutex);
> +	/*
> +	 * We need to freeze and reinit all existing queues.  Freezing
> +	 * involves synchronous wait for an RCU grace period and doing it
> +	 * one by one may take a long time.  Start freezing all queues in
> +	 * one swoop and then wait for the completions so that freezing can
> +	 * take place in parallel.
> +	 */
> +	list_for_each_entry(q, &all_q_list, all_q_node)
> +		blk_freeze_queue_start(q);
> +	list_for_each_entry(q, &all_q_list, all_q_node)
> +		blk_mq_freeze_queue_wait(q);
> +
> +	list_for_each_entry(q, &all_q_list, all_q_node)
> +		blk_mq_queue_reinit(q, &cpuhp_online_new);
> +
> +	list_for_each_entry(q, &all_q_list, all_q_node)
> +		blk_mq_unfreeze_queue(q);
> +
> +	mutex_unlock(&all_q_mutex);
> +}
> +
> +static int blk_mq_queue_reinit_dead(unsigned int cpu)
> +{
> +	cpumask_copy(&cpuhp_online_new, cpu_online_mask);
> +	blk_mq_queue_reinit_work();
> +	return 0;
> +}
> +
> +/*
> + * Before hotadded cpu starts handling requests, new mappings must be
> + * established.  Otherwise, these requests in hw queue might never be
> + * dispatched.
> + *
> + * For example, there is a single hw queue (hctx) and two CPU queues (ctx0
> + * for CPU0, and ctx1 for CPU1).
> + *
> + * Now CPU1 is just onlined and a request is inserted into ctx1->rq_list
> + * and set bit0 in pending bitmap as ctx1->index_hw is still zero.
> + *
> + * And then while running hw queue, blk_mq_flush_busy_ctxs() finds bit0 is set
> + * in pending bitmap and tries to retrieve requests in hctx->ctxs[0]->rq_list.
> + * But htx->ctxs[0] is a pointer to ctx0, so the request in ctx1->rq_list is
> + * ignored.
> + */
> +static int blk_mq_queue_reinit_prepare(unsigned int cpu)
> +{
> +	cpumask_copy(&cpuhp_online_new, cpu_online_mask);
> +	cpumask_set_cpu(cpu, &cpuhp_online_new);
> +	blk_mq_queue_reinit_work();
> +	return 0;
> +}
> +
>  static int __blk_mq_alloc_rq_maps(struct blk_mq_tag_set *set)
>  {
>  	int i;
> @@ -2757,7 +2841,7 @@ static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set,
>  	blk_mq_update_queue_map(set);
>  	list_for_each_entry(q, &set->tag_list, tag_set_list) {
>  		blk_mq_realloc_hw_ctxs(set, q);
> -		blk_mq_queue_reinit(q);
> +		blk_mq_queue_reinit(q, cpu_online_mask);
>  	}
> 
>  	list_for_each_entry(q, &set->tag_list, tag_set_list)
> @@ -2966,6 +3050,16 @@ static bool blk_mq_poll(struct request_queue *q, blk_qc_t cookie)
>  	return __blk_mq_poll(hctx, rq);
>  }
> 
> +void blk_mq_disable_hotplug(void)
> +{
> +	mutex_lock(&all_q_mutex);
> +}
> +
> +void blk_mq_enable_hotplug(void)
> +{
> +	mutex_unlock(&all_q_mutex);
> +}
> +
>  static int __init blk_mq_init(void)
>  {
>  	/*
> @@ -2976,6 +3070,10 @@ static int __init blk_mq_init(void)
> 
>  	cpuhp_setup_state_multi(CPUHP_BLK_MQ_DEAD, "block/mq:dead", NULL,
>  				blk_mq_hctx_notify_dead);
> +
> +	cpuhp_setup_state_nocalls(CPUHP_BLK_MQ_PREPARE, "block/mq:prepare",
> +				  blk_mq_queue_reinit_prepare,
> +				  blk_mq_queue_reinit_dead);
>  	return 0;
>  }
>  subsys_initcall(blk_mq_init);
> diff --git a/block/blk-mq.h b/block/blk-mq.h
> index 6c7c3ff5bf62..83b13ef1915e 100644
> --- a/block/blk-mq.h
> +++ b/block/blk-mq.h
> @@ -59,6 +59,11 @@ void __blk_mq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
>  void blk_mq_request_bypass_insert(struct request *rq, bool run_queue);
>  void blk_mq_insert_requests(struct blk_mq_hw_ctx *hctx, struct blk_mq_ctx *ctx,
>  				struct list_head *list);
> +/*
> + * CPU hotplug helpers
> + */
> +void blk_mq_enable_hotplug(void);
> +void blk_mq_disable_hotplug(void);
> 
>  /*
>   * CPU -> queue mappings
> diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
> index 201ab7267986..c31d4e3bf6d0 100644
> --- a/include/linux/cpuhotplug.h
> +++ b/include/linux/cpuhotplug.h
> @@ -76,6 +76,7 @@ enum cpuhp_state {
>  	CPUHP_XEN_EVTCHN_PREPARE,
>  	CPUHP_ARM_SHMOBILE_SCU_PREPARE,
>  	CPUHP_SH_SH3X_PREPARE,
> +	CPUHP_BLK_MQ_PREPARE,
>  	CPUHP_NET_FLOW_PREPARE,
>  	CPUHP_TOPOLOGY_PREPARE,
>  	CPUHP_NET_IUCV_PREPARE,
> 

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
  2017-11-21 20:12                                 ` Christian Borntraeger
@ 2017-11-21 20:14                                   ` Jens Axboe
  -1 siblings, 0 replies; 96+ messages in thread
From: Jens Axboe @ 2017-11-21 20:14 UTC (permalink / raw)
  To: Christian Borntraeger, Bart Van Assche, virtualization,
	linux-block, mst, jasowang, linux-kernel, Christoph Hellwig

On 11/21/2017 01:12 PM, Christian Borntraeger wrote:
> 
> 
> On 11/21/2017 08:30 PM, Jens Axboe wrote:
>> On 11/21/2017 12:15 PM, Christian Borntraeger wrote:
>>>
>>>
>>> On 11/21/2017 07:39 PM, Jens Axboe wrote:
>>>> On 11/21/2017 11:27 AM, Jens Axboe wrote:
>>>>> On 11/21/2017 11:12 AM, Christian Borntraeger wrote:
>>>>>>
>>>>>>
>>>>>> On 11/21/2017 07:09 PM, Jens Axboe wrote:
>>>>>>> On 11/21/2017 10:27 AM, Jens Axboe wrote:
>>>>>>>> On 11/21/2017 03:14 AM, Christian Borntraeger wrote:
>>>>>>>>> Bisect points to
>>>>>>>>>
>>>>>>>>> 1b5a7455d345b223d3a4658a9e5fce985b7998c1 is the first bad commit
>>>>>>>>> commit 1b5a7455d345b223d3a4658a9e5fce985b7998c1
>>>>>>>>> Author: Christoph Hellwig <hch@lst.de>
>>>>>>>>> Date:   Mon Jun 26 12:20:57 2017 +0200
>>>>>>>>>
>>>>>>>>>     blk-mq: Create hctx for each present CPU
>>>>>>>>>     
>>>>>>>>>     commit 4b855ad37194f7bdbb200ce7a1c7051fecb56a08 upstream.
>>>>>>>>>     
>>>>>>>>>     Currently we only create hctx for online CPUs, which can lead to a lot
>>>>>>>>>     of churn due to frequent soft offline / online operations.  Instead
>>>>>>>>>     allocate one for each present CPU to avoid this and dramatically simplify
>>>>>>>>>     the code.
>>>>>>>>>     
>>>>>>>>>     Signed-off-by: Christoph Hellwig <hch@lst.de>
>>>>>>>>>     Reviewed-by: Jens Axboe <axboe@kernel.dk>
>>>>>>>>>     Cc: Keith Busch <keith.busch@intel.com>
>>>>>>>>>     Cc: linux-block@vger.kernel.org
>>>>>>>>>     Cc: linux-nvme@lists.infradead.org
>>>>>>>>>     Link: http://lkml.kernel.org/r/20170626102058.10200-3-hch@lst.de
>>>>>>>>>     Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
>>>>>>>>>     Cc: Oleksandr Natalenko <oleksandr@natalenko.name>
>>>>>>>>>     Cc: Mike Galbraith <efault@gmx.de>
>>>>>>>>>     Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>>>>>>>>
>>>>>>>> I wonder if we're simply not getting the masks updated correctly. I'll
>>>>>>>> take a look.
>>>>>>>
>>>>>>> Can't make it trigger here. We do init for each present CPU, which means
>>>>>>> that if I offline a few CPUs here and register a queue, those still show
>>>>>>> up as present (just offline) and get mapped accordingly.
>>>>>>>
>>>>>>> From the looks of it, your setup is different. If the CPU doesn't show
>>>>>>> up as present and it gets hotplugged, then I can see how this condition
>>>>>>> would trigger. What environment are you running this in? We might have
>>>>>>> to re-introduce the cpu hotplug notifier, right now we just monitor
>>>>>>> for a dead cpu and handle that.
>>>>>>
>>>>>> I am not doing a hot unplug and the replug, I use KVM and add a previously
>>>>>> not available CPU.
>>>>>>
>>>>>> in libvirt/virsh speak:
>>>>>>   <vcpu placement='static' current='1'>4</vcpu>
>>>>>
>>>>> So that's why we run into problems. It's not present when we load the device,
>>>>> but becomes present and online afterwards.
>>>>>
>>>>> Christoph, we used to handle this just fine, your patch broke it.
>>>>>
>>>>> I'll see if I can come up with an appropriate fix.
>>>>
>>>> Can you try the below?
>>>
>>>
>>> It does prevent the crash but it seems that the new CPU is not "used" after the hotplug for mq:
>>>
>>>
>>> output with 2 cpus:
>>> /sys/kernel/debug/block/vda
>>> /sys/kernel/debug/block/vda/hctx0
>>> /sys/kernel/debug/block/vda/hctx0/cpu0
>>> /sys/kernel/debug/block/vda/hctx0/cpu0/completed
>>> /sys/kernel/debug/block/vda/hctx0/cpu0/merged
>>> /sys/kernel/debug/block/vda/hctx0/cpu0/dispatched
>>> /sys/kernel/debug/block/vda/hctx0/cpu0/rq_list
>>> /sys/kernel/debug/block/vda/hctx0/active
>>> /sys/kernel/debug/block/vda/hctx0/run
>>> /sys/kernel/debug/block/vda/hctx0/queued
>>> /sys/kernel/debug/block/vda/hctx0/dispatched
>>> /sys/kernel/debug/block/vda/hctx0/io_poll
>>> /sys/kernel/debug/block/vda/hctx0/sched_tags_bitmap
>>> /sys/kernel/debug/block/vda/hctx0/sched_tags
>>> /sys/kernel/debug/block/vda/hctx0/tags_bitmap
>>> /sys/kernel/debug/block/vda/hctx0/tags
>>> /sys/kernel/debug/block/vda/hctx0/ctx_map
>>> /sys/kernel/debug/block/vda/hctx0/busy
>>> /sys/kernel/debug/block/vda/hctx0/dispatch
>>> /sys/kernel/debug/block/vda/hctx0/flags
>>> /sys/kernel/debug/block/vda/hctx0/state
>>> /sys/kernel/debug/block/vda/sched
>>> /sys/kernel/debug/block/vda/sched/dispatch
>>> /sys/kernel/debug/block/vda/sched/starved
>>> /sys/kernel/debug/block/vda/sched/batching
>>> /sys/kernel/debug/block/vda/sched/write_next_rq
>>> /sys/kernel/debug/block/vda/sched/write_fifo_list
>>> /sys/kernel/debug/block/vda/sched/read_next_rq
>>> /sys/kernel/debug/block/vda/sched/read_fifo_list
>>> /sys/kernel/debug/block/vda/write_hints
>>> /sys/kernel/debug/block/vda/state
>>> /sys/kernel/debug/block/vda/requeue_list
>>> /sys/kernel/debug/block/vda/poll_stat
>>
>> Try this, basically just a revert.
> 
> Yes, seems to work.
> 
> Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>

Great, thanks for testing.

> Do you know why the original commit made it into 4.12 stable? After all
> it has no Fixes tag and no cc stable.

I was wondering the same thing when you said it was in 4.12.stable and
not in 4.12 release. That patch should absolutely not have gone into
stable, it's not marked as such and it's not fixing a problem that is
stable worthy. In fact, it's causing a regression...

Greg? Upstream commit is mentioned higher up, start of the email.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
  2017-11-21 20:14                                   ` Jens Axboe
@ 2017-11-21 20:19                                     ` Christian Borntraeger
  -1 siblings, 0 replies; 96+ messages in thread
From: Christian Borntraeger @ 2017-11-21 20:19 UTC (permalink / raw)
  To: Jens Axboe, Bart Van Assche, virtualization, linux-block, mst,
	jasowang, linux-kernel, Christoph Hellwig, Greg Kroah-Hartman,
	stable


On 11/21/2017 09:14 PM, Jens Axboe wrote:
> On 11/21/2017 01:12 PM, Christian Borntraeger wrote:
>>
>>
>> On 11/21/2017 08:30 PM, Jens Axboe wrote:
>>> On 11/21/2017 12:15 PM, Christian Borntraeger wrote:
>>>>
>>>>
>>>> On 11/21/2017 07:39 PM, Jens Axboe wrote:
>>>>> On 11/21/2017 11:27 AM, Jens Axboe wrote:
>>>>>> On 11/21/2017 11:12 AM, Christian Borntraeger wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 11/21/2017 07:09 PM, Jens Axboe wrote:
>>>>>>>> On 11/21/2017 10:27 AM, Jens Axboe wrote:
>>>>>>>>> On 11/21/2017 03:14 AM, Christian Borntraeger wrote:
>>>>>>>>>> Bisect points to
>>>>>>>>>>
>>>>>>>>>> 1b5a7455d345b223d3a4658a9e5fce985b7998c1 is the first bad commit
>>>>>>>>>> commit 1b5a7455d345b223d3a4658a9e5fce985b7998c1
>>>>>>>>>> Author: Christoph Hellwig <hch@lst.de>
>>>>>>>>>> Date:   Mon Jun 26 12:20:57 2017 +0200
>>>>>>>>>>
>>>>>>>>>>     blk-mq: Create hctx for each present CPU
>>>>>>>>>>     
>>>>>>>>>>     commit 4b855ad37194f7bdbb200ce7a1c7051fecb56a08 upstream.
>>>>>>>>>>     
>>>>>>>>>>     Currently we only create hctx for online CPUs, which can lead to a lot
>>>>>>>>>>     of churn due to frequent soft offline / online operations.  Instead
>>>>>>>>>>     allocate one for each present CPU to avoid this and dramatically simplify
>>>>>>>>>>     the code.
>>>>>>>>>>     
>>>>>>>>>>     Signed-off-by: Christoph Hellwig <hch@lst.de>
>>>>>>>>>>     Reviewed-by: Jens Axboe <axboe@kernel.dk>
>>>>>>>>>>     Cc: Keith Busch <keith.busch@intel.com>
>>>>>>>>>>     Cc: linux-block@vger.kernel.org
>>>>>>>>>>     Cc: linux-nvme@lists.infradead.org
>>>>>>>>>>     Link: http://lkml.kernel.org/r/20170626102058.10200-3-hch@lst.de
>>>>>>>>>>     Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
>>>>>>>>>>     Cc: Oleksandr Natalenko <oleksandr@natalenko.name>
>>>>>>>>>>     Cc: Mike Galbraith <efault@gmx.de>
>>>>>>>>>>     Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>>>>>>>>>
>>>>>>>>> I wonder if we're simply not getting the masks updated correctly. I'll
>>>>>>>>> take a look.
>>>>>>>>
>>>>>>>> Can't make it trigger here. We do init for each present CPU, which means
>>>>>>>> that if I offline a few CPUs here and register a queue, those still show
>>>>>>>> up as present (just offline) and get mapped accordingly.
>>>>>>>>
>>>>>>>> From the looks of it, your setup is different. If the CPU doesn't show
>>>>>>>> up as present and it gets hotplugged, then I can see how this condition
>>>>>>>> would trigger. What environment are you running this in? We might have
>>>>>>>> to re-introduce the cpu hotplug notifier, right now we just monitor
>>>>>>>> for a dead cpu and handle that.
>>>>>>>
>>>>>>> I am not doing a hot unplug and the replug, I use KVM and add a previously
>>>>>>> not available CPU.
>>>>>>>
>>>>>>> in libvirt/virsh speak:
>>>>>>>   <vcpu placement='static' current='1'>4</vcpu>
>>>>>>
>>>>>> So that's why we run into problems. It's not present when we load the device,
>>>>>> but becomes present and online afterwards.
>>>>>>
>>>>>> Christoph, we used to handle this just fine, your patch broke it.
>>>>>>
>>>>>> I'll see if I can come up with an appropriate fix.
>>>>>
>>>>> Can you try the below?
>>>>
>>>>
>>>> It does prevent the crash but it seems that the new CPU is not "used" after the hotplug for mq:
>>>>
>>>>
>>>> output with 2 cpus:
>>>> /sys/kernel/debug/block/vda
>>>> /sys/kernel/debug/block/vda/hctx0
>>>> /sys/kernel/debug/block/vda/hctx0/cpu0
>>>> /sys/kernel/debug/block/vda/hctx0/cpu0/completed
>>>> /sys/kernel/debug/block/vda/hctx0/cpu0/merged
>>>> /sys/kernel/debug/block/vda/hctx0/cpu0/dispatched
>>>> /sys/kernel/debug/block/vda/hctx0/cpu0/rq_list
>>>> /sys/kernel/debug/block/vda/hctx0/active
>>>> /sys/kernel/debug/block/vda/hctx0/run
>>>> /sys/kernel/debug/block/vda/hctx0/queued
>>>> /sys/kernel/debug/block/vda/hctx0/dispatched
>>>> /sys/kernel/debug/block/vda/hctx0/io_poll
>>>> /sys/kernel/debug/block/vda/hctx0/sched_tags_bitmap
>>>> /sys/kernel/debug/block/vda/hctx0/sched_tags
>>>> /sys/kernel/debug/block/vda/hctx0/tags_bitmap
>>>> /sys/kernel/debug/block/vda/hctx0/tags
>>>> /sys/kernel/debug/block/vda/hctx0/ctx_map
>>>> /sys/kernel/debug/block/vda/hctx0/busy
>>>> /sys/kernel/debug/block/vda/hctx0/dispatch
>>>> /sys/kernel/debug/block/vda/hctx0/flags
>>>> /sys/kernel/debug/block/vda/hctx0/state
>>>> /sys/kernel/debug/block/vda/sched
>>>> /sys/kernel/debug/block/vda/sched/dispatch
>>>> /sys/kernel/debug/block/vda/sched/starved
>>>> /sys/kernel/debug/block/vda/sched/batching
>>>> /sys/kernel/debug/block/vda/sched/write_next_rq
>>>> /sys/kernel/debug/block/vda/sched/write_fifo_list
>>>> /sys/kernel/debug/block/vda/sched/read_next_rq
>>>> /sys/kernel/debug/block/vda/sched/read_fifo_list
>>>> /sys/kernel/debug/block/vda/write_hints
>>>> /sys/kernel/debug/block/vda/state
>>>> /sys/kernel/debug/block/vda/requeue_list
>>>> /sys/kernel/debug/block/vda/poll_stat
>>>
>>> Try this, basically just a revert.
>>
>> Yes, seems to work.
>>
>> Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
> 
> Great, thanks for testing.
> 
>> Do you know why the original commit made it into 4.12 stable? After all
>> it has no Fixes tag and no cc stable.
> 
> I was wondering the same thing when you said it was in 4.12.stable and
> not in 4.12 release. That patch should absolutely not have gone into
> stable, it's not marked as such and it's not fixing a problem that is
> stable worthy. In fact, it's causing a regression...
> 
> Greg? Upstream commit is mentioned higher up, start of the email.
> 


Forgot to cc Greg?

^ permalink raw reply	[flat|nested] 96+ messages in thread
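
The hotplug scenario described in the quoted mail (a guest defined with
four vCPUs but started with only one, then given another at runtime) can
be driven from the host roughly as follows; the domain name and CPU
numbers are illustrative:

  # host: raise the active vCPU count of the running guest
  virsh setvcpus guest1 2 --live
  # guest: the new CPU becomes present; online it if nothing does so
  # automatically
  cat /sys/devices/system/cpu/present /sys/devices/system/cpu/online
  echo 1 > /sys/devices/system/cpu/cpu1/online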

* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
  2017-11-21 20:19                                     ` Christian Borntraeger
@ 2017-11-21 20:21                                       ` Jens Axboe
  -1 siblings, 0 replies; 96+ messages in thread
From: Jens Axboe @ 2017-11-21 20:21 UTC (permalink / raw)
  To: Christian Borntraeger, Bart Van Assche, virtualization,
	linux-block, mst, jasowang, linux-kernel, Christoph Hellwig,
	Greg Kroah-Hartman, stable

On 11/21/2017 01:19 PM, Christian Borntraeger wrote:
> 
> On 11/21/2017 09:14 PM, Jens Axboe wrote:
>> On 11/21/2017 01:12 PM, Christian Borntraeger wrote:
>>>
>>>
>>> On 11/21/2017 08:30 PM, Jens Axboe wrote:
>>>> On 11/21/2017 12:15 PM, Christian Borntraeger wrote:
>>>>>
>>>>>
>>>>> On 11/21/2017 07:39 PM, Jens Axboe wrote:
>>>>>> On 11/21/2017 11:27 AM, Jens Axboe wrote:
>>>>>>> On 11/21/2017 11:12 AM, Christian Borntraeger wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 11/21/2017 07:09 PM, Jens Axboe wrote:
>>>>>>>>> On 11/21/2017 10:27 AM, Jens Axboe wrote:
>>>>>>>>>> On 11/21/2017 03:14 AM, Christian Borntraeger wrote:
>>>>>>>>>>> Bisect points to
>>>>>>>>>>>
>>>>>>>>>>> 1b5a7455d345b223d3a4658a9e5fce985b7998c1 is the first bad commit
>>>>>>>>>>> commit 1b5a7455d345b223d3a4658a9e5fce985b7998c1
>>>>>>>>>>> Author: Christoph Hellwig <hch@lst.de>
>>>>>>>>>>> Date:   Mon Jun 26 12:20:57 2017 +0200
>>>>>>>>>>>
>>>>>>>>>>>     blk-mq: Create hctx for each present CPU
>>>>>>>>>>>     
>>>>>>>>>>>     commit 4b855ad37194f7bdbb200ce7a1c7051fecb56a08 upstream.
>>>>>>>>>>>     
>>>>>>>>>>>     Currently we only create hctx for online CPUs, which can lead to a lot
>>>>>>>>>>>     of churn due to frequent soft offline / online operations.  Instead
>>>>>>>>>>>     allocate one for each present CPU to avoid this and dramatically simplify
>>>>>>>>>>>     the code.
>>>>>>>>>>>     
>>>>>>>>>>>     Signed-off-by: Christoph Hellwig <hch@lst.de>
>>>>>>>>>>>     Reviewed-by: Jens Axboe <axboe@kernel.dk>
>>>>>>>>>>>     Cc: Keith Busch <keith.busch@intel.com>
>>>>>>>>>>>     Cc: linux-block@vger.kernel.org
>>>>>>>>>>>     Cc: linux-nvme@lists.infradead.org
>>>>>>>>>>>     Link: http://lkml.kernel.org/r/20170626102058.10200-3-hch@lst.de
>>>>>>>>>>>     Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
>>>>>>>>>>>     Cc: Oleksandr Natalenko <oleksandr@natalenko.name>
>>>>>>>>>>>     Cc: Mike Galbraith <efault@gmx.de>
>>>>>>>>>>>     Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>>>>>>>>>>
>>>>>>>>>> I wonder if we're simply not getting the masks updated correctly. I'll
>>>>>>>>>> take a look.
>>>>>>>>>
>>>>>>>>> Can't make it trigger here. We do init for each present CPU, which means
>>>>>>>>> that if I offline a few CPUs here and register a queue, those still show
>>>>>>>>> up as present (just offline) and get mapped accordingly.
>>>>>>>>>
>>>>>>>>> From the looks of it, your setup is different. If the CPU doesn't show
>>>>>>>>> up as present and it gets hotplugged, then I can see how this condition
>>>>>>>>> would trigger. What environment are you running this in? We might have
>>>>>>>>> to re-introduce the cpu hotplug notifier, right now we just monitor
>>>>>>>>> for a dead cpu and handle that.
>>>>>>>>
>>>>>>>> I am not doing a hot unplug and the replug, I use KVM and add a previously
>>>>>>>> not available CPU.
>>>>>>>>
>>>>>>>> in libvirt/virsh speak:
>>>>>>>>   <vcpu placement='static' current='1'>4</vcpu>
>>>>>>>
>>>>>>> So that's why we run into problems. It's not present when we load the device,
>>>>>>> but becomes present and online afterwards.
>>>>>>>
>>>>>>> Christoph, we used to handle this just fine, your patch broke it.
>>>>>>>
>>>>>>> I'll see if I can come up with an appropriate fix.
>>>>>>
>>>>>> Can you try the below?
>>>>>
>>>>>
>>>>> It does prevent the crash but it seems that the new CPU is not "used" after the hotplug for mq:
>>>>>
>>>>>
>>>>> output with 2 cpus:
>>>>> /sys/kernel/debug/block/vda
>>>>> /sys/kernel/debug/block/vda/hctx0
>>>>> /sys/kernel/debug/block/vda/hctx0/cpu0
>>>>> /sys/kernel/debug/block/vda/hctx0/cpu0/completed
>>>>> /sys/kernel/debug/block/vda/hctx0/cpu0/merged
>>>>> /sys/kernel/debug/block/vda/hctx0/cpu0/dispatched
>>>>> /sys/kernel/debug/block/vda/hctx0/cpu0/rq_list
>>>>> /sys/kernel/debug/block/vda/hctx0/active
>>>>> /sys/kernel/debug/block/vda/hctx0/run
>>>>> /sys/kernel/debug/block/vda/hctx0/queued
>>>>> /sys/kernel/debug/block/vda/hctx0/dispatched
>>>>> /sys/kernel/debug/block/vda/hctx0/io_poll
>>>>> /sys/kernel/debug/block/vda/hctx0/sched_tags_bitmap
>>>>> /sys/kernel/debug/block/vda/hctx0/sched_tags
>>>>> /sys/kernel/debug/block/vda/hctx0/tags_bitmap
>>>>> /sys/kernel/debug/block/vda/hctx0/tags
>>>>> /sys/kernel/debug/block/vda/hctx0/ctx_map
>>>>> /sys/kernel/debug/block/vda/hctx0/busy
>>>>> /sys/kernel/debug/block/vda/hctx0/dispatch
>>>>> /sys/kernel/debug/block/vda/hctx0/flags
>>>>> /sys/kernel/debug/block/vda/hctx0/state
>>>>> /sys/kernel/debug/block/vda/sched
>>>>> /sys/kernel/debug/block/vda/sched/dispatch
>>>>> /sys/kernel/debug/block/vda/sched/starved
>>>>> /sys/kernel/debug/block/vda/sched/batching
>>>>> /sys/kernel/debug/block/vda/sched/write_next_rq
>>>>> /sys/kernel/debug/block/vda/sched/write_fifo_list
>>>>> /sys/kernel/debug/block/vda/sched/read_next_rq
>>>>> /sys/kernel/debug/block/vda/sched/read_fifo_list
>>>>> /sys/kernel/debug/block/vda/write_hints
>>>>> /sys/kernel/debug/block/vda/state
>>>>> /sys/kernel/debug/block/vda/requeue_list
>>>>> /sys/kernel/debug/block/vda/poll_stat
>>>>
>>>> Try this, basically just a revert.
>>>
>>> Yes, seems to work.
>>>
>>> Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
>>
>> Great, thanks for testing.
>>
>>> Do you know why the original commit made it into 4.12 stable? After all
>>> it has no Fixes tag and no cc stable.
>>
>> I was wondering the same thing when you said it was in 4.12.stable and
>> not in 4.12 release. That patch should absolutely not have gone into
>> stable, it's not marked as such and it's not fixing a problem that is
>> stable worthy. In fact, it's causing a regression...
>>
>> Greg? Upstream commit is mentioned higher up, start of the email.
>>
> 
> 
> Forgot to cc Greg?

I did, thanks for doing that. Now I wonder how to mark this patch,
as we should revert it from kernels that have the bad commit. 4.12
itself is fine, but the later 4.12-stable releases are not.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 96+ messages in thread
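
As the follow-up below suggests, the usual way to route such a fix only
to the stable trees that carry the bad commit is a Fixes: tag, optionally
combined with a versioned stable Cc. A sketch of how the trailers might
look (exact wording is illustrative):

  Fixes: 4b855ad37194 ("blk-mq: Create hctx for each present CPU")
  Cc: stable@vger.kernel.org # 4.13+

That lets the stable maintainers match the fix to any tree that contains
the tagged commit.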

* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
  2017-11-21 20:21                                       ` Jens Axboe
@ 2017-11-21 20:31                                       ` Christian Borntraeger
  -1 siblings, 0 replies; 96+ messages in thread
From: Christian Borntraeger @ 2017-11-21 20:31 UTC (permalink / raw)
  To: Jens Axboe, Bart Van Assche, virtualization, linux-block, mst,
	jasowang, linux-kernel, Christoph Hellwig, Greg Kroah-Hartman,
	stable



On 11/21/2017 09:21 PM, Jens Axboe wrote:
> On 11/21/2017 01:19 PM, Christian Borntraeger wrote:
>>
>> On 11/21/2017 09:14 PM, Jens Axboe wrote:
>>> On 11/21/2017 01:12 PM, Christian Borntraeger wrote:
>>>>
>>>>
>>>> On 11/21/2017 08:30 PM, Jens Axboe wrote:
>>>>> On 11/21/2017 12:15 PM, Christian Borntraeger wrote:
>>>>>>
>>>>>>
>>>>>> On 11/21/2017 07:39 PM, Jens Axboe wrote:
>>>>>>> On 11/21/2017 11:27 AM, Jens Axboe wrote:
>>>>>>>> On 11/21/2017 11:12 AM, Christian Borntraeger wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 11/21/2017 07:09 PM, Jens Axboe wrote:
>>>>>>>>>> On 11/21/2017 10:27 AM, Jens Axboe wrote:
>>>>>>>>>>> On 11/21/2017 03:14 AM, Christian Borntraeger wrote:
>>>>>>>>>>>> Bisect points to
>>>>>>>>>>>>
>>>>>>>>>>>> 1b5a7455d345b223d3a4658a9e5fce985b7998c1 is the first bad commit
>>>>>>>>>>>> commit 1b5a7455d345b223d3a4658a9e5fce985b7998c1
>>>>>>>>>>>> Author: Christoph Hellwig <hch@lst.de>
>>>>>>>>>>>> Date:   Mon Jun 26 12:20:57 2017 +0200
>>>>>>>>>>>>
>>>>>>>>>>>>     blk-mq: Create hctx for each present CPU
>>>>>>>>>>>>     
>>>>>>>>>>>>     commit 4b855ad37194f7bdbb200ce7a1c7051fecb56a08 upstream.
>>>>>>>>>>>>     
>>>>>>>>>>>>     Currently we only create hctx for online CPUs, which can lead to a lot
>>>>>>>>>>>>     of churn due to frequent soft offline / online operations.  Instead
>>>>>>>>>>>>     allocate one for each present CPU to avoid this and dramatically simplify
>>>>>>>>>>>>     the code.
>>>>>>>>>>>>     
>>>>>>>>>>>>     Signed-off-by: Christoph Hellwig <hch@lst.de>
>>>>>>>>>>>>     Reviewed-by: Jens Axboe <axboe@kernel.dk>
>>>>>>>>>>>>     Cc: Keith Busch <keith.busch@intel.com>
>>>>>>>>>>>>     Cc: linux-block@vger.kernel.org
>>>>>>>>>>>>     Cc: linux-nvme@lists.infradead.org
>>>>>>>>>>>>     Link: http://lkml.kernel.org/r/20170626102058.10200-3-hch@lst.de
>>>>>>>>>>>>     Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
>>>>>>>>>>>>     Cc: Oleksandr Natalenko <oleksandr@natalenko.name>
>>>>>>>>>>>>     Cc: Mike Galbraith <efault@gmx.de>
>>>>>>>>>>>>     Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>>>>>>>>>>>
>>>>>>>>>>> I wonder if we're simply not getting the masks updated correctly. I'll
>>>>>>>>>>> take a look.
>>>>>>>>>>
>>>>>>>>>> Can't make it trigger here. We do init for each present CPU, which means
>>>>>>>>>> that if I offline a few CPUs here and register a queue, those still show
>>>>>>>>>> up as present (just offline) and get mapped accordingly.
>>>>>>>>>>
>>>>>>>>>> From the looks of it, your setup is different. If the CPU doesn't show
>>>>>>>>>> up as present and it gets hotplugged, then I can see how this condition
>>>>>>>>>> would trigger. What environment are you running this in? We might have
>>>>>>>>>> to re-introduce the cpu hotplug notifier, right now we just monitor
>>>>>>>>>> for a dead cpu and handle that.
>>>>>>>>>
>>>>>>>>> I am not doing a hot unplug and the replug, I use KVM and add a previously
>>>>>>>>> not available CPU.
>>>>>>>>>
>>>>>>>>> in libvirt/virsh speak:
>>>>>>>>>   <vcpu placement='static' current='1'>4</vcpu>
>>>>>>>>
>>>>>>>> So that's why we run into problems. It's not present when we load the device,
>>>>>>>> but becomes present and online afterwards.
>>>>>>>>
>>>>>>>> Christoph, we used to handle this just fine, your patch broke it.
>>>>>>>>
>>>>>>>> I'll see if I can come up with an appropriate fix.
>>>>>>>
>>>>>>> Can you try the below?
>>>>>>
>>>>>>
>>>>>> It does prevent the crash but it seems that the new CPU is not "used " after the hotplug for mq:
>>>>>>
>>>>>>
>>>>>> output with 2 cpus:
>>>>>> /sys/kernel/debug/block/vda
>>>>>> /sys/kernel/debug/block/vda/hctx0
>>>>>> /sys/kernel/debug/block/vda/hctx0/cpu0
>>>>>> /sys/kernel/debug/block/vda/hctx0/cpu0/completed
>>>>>> /sys/kernel/debug/block/vda/hctx0/cpu0/merged
>>>>>> /sys/kernel/debug/block/vda/hctx0/cpu0/dispatched
>>>>>> /sys/kernel/debug/block/vda/hctx0/cpu0/rq_list
>>>>>> /sys/kernel/debug/block/vda/hctx0/active
>>>>>> /sys/kernel/debug/block/vda/hctx0/run
>>>>>> /sys/kernel/debug/block/vda/hctx0/queued
>>>>>> /sys/kernel/debug/block/vda/hctx0/dispatched
>>>>>> /sys/kernel/debug/block/vda/hctx0/io_poll
>>>>>> /sys/kernel/debug/block/vda/hctx0/sched_tags_bitmap
>>>>>> /sys/kernel/debug/block/vda/hctx0/sched_tags
>>>>>> /sys/kernel/debug/block/vda/hctx0/tags_bitmap
>>>>>> /sys/kernel/debug/block/vda/hctx0/tags
>>>>>> /sys/kernel/debug/block/vda/hctx0/ctx_map
>>>>>> /sys/kernel/debug/block/vda/hctx0/busy
>>>>>> /sys/kernel/debug/block/vda/hctx0/dispatch
>>>>>> /sys/kernel/debug/block/vda/hctx0/flags
>>>>>> /sys/kernel/debug/block/vda/hctx0/state
>>>>>> /sys/kernel/debug/block/vda/sched
>>>>>> /sys/kernel/debug/block/vda/sched/dispatch
>>>>>> /sys/kernel/debug/block/vda/sched/starved
>>>>>> /sys/kernel/debug/block/vda/sched/batching
>>>>>> /sys/kernel/debug/block/vda/sched/write_next_rq
>>>>>> /sys/kernel/debug/block/vda/sched/write_fifo_list
>>>>>> /sys/kernel/debug/block/vda/sched/read_next_rq
>>>>>> /sys/kernel/debug/block/vda/sched/read_fifo_list
>>>>>> /sys/kernel/debug/block/vda/write_hints
>>>>>> /sys/kernel/debug/block/vda/state
>>>>>> /sys/kernel/debug/block/vda/requeue_list
>>>>>> /sys/kernel/debug/block/vda/poll_stat
>>>>>
>>>>> Try this, basically just a revert.
>>>>
>>>> Yes, seems to work.
>>>>
>>>> Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
>>>
>>> Great, thanks for testing.
>>>
>>>> Do you know why the original commit made it into 4.12 stable? After all
>>>> it has no Fixes tag and no cc stable-
>>>
>>> I was wondering the same thing when you said it was in 4.12.stable and
>>> not in 4.12 release. That patch should absolutely not have gone into
>>> stable, it's not marked as such and it's not fixing a problem that is
>>> stable worthy. In fact, it's causing a regression...
>>>
>>> Greg? Upstream commit is mentioned higher up, start of the email.
>>>
>>
>>
>> Forgot to cc Greg?
> 
> I did, thanks for doing that. Now I wonder how to mark this patch,
> as we should revert it from kernels that have the bad commit. 4.12
> is fine, 4.12.later-stable is not.
> 

I think we should tag it with:

Fixes: 4b855ad37194 ("blk-mq: Create hctx for each present CPU")

which should bring it into 4.13 stable and 4.14 stable. 4.12 stable seems EOL anyway.

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
  2017-11-21 20:31                                         ` Christian Borntraeger
@ 2017-11-21 20:39                                           ` Jens Axboe
  -1 siblings, 0 replies; 96+ messages in thread
From: Jens Axboe @ 2017-11-21 20:39 UTC (permalink / raw)
  To: Christian Borntraeger, Bart Van Assche, virtualization,
	linux-block, mst, jasowang, linux-kernel, Christoph Hellwig,
	Greg Kroah-Hartman, stable

On 11/21/2017 01:31 PM, Christian Borntraeger wrote:
> 
> 
> On 11/21/2017 09:21 PM, Jens Axboe wrote:
>> On 11/21/2017 01:19 PM, Christian Borntraeger wrote:
>>>
>>> On 11/21/2017 09:14 PM, Jens Axboe wrote:
>>>> On 11/21/2017 01:12 PM, Christian Borntraeger wrote:
>>>>>
>>>>>
>>>>> On 11/21/2017 08:30 PM, Jens Axboe wrote:
>>>>>> On 11/21/2017 12:15 PM, Christian Borntraeger wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 11/21/2017 07:39 PM, Jens Axboe wrote:
>>>>>>>> On 11/21/2017 11:27 AM, Jens Axboe wrote:
>>>>>>>>> On 11/21/2017 11:12 AM, Christian Borntraeger wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 11/21/2017 07:09 PM, Jens Axboe wrote:
>>>>>>>>>>> On 11/21/2017 10:27 AM, Jens Axboe wrote:
>>>>>>>>>>>> On 11/21/2017 03:14 AM, Christian Borntraeger wrote:
>>>>>>>>>>>>> Bisect points to
>>>>>>>>>>>>>
>>>>>>>>>>>>> 1b5a7455d345b223d3a4658a9e5fce985b7998c1 is the first bad commit
>>>>>>>>>>>>> commit 1b5a7455d345b223d3a4658a9e5fce985b7998c1
>>>>>>>>>>>>> Author: Christoph Hellwig <hch@lst.de>
>>>>>>>>>>>>> Date:   Mon Jun 26 12:20:57 2017 +0200
>>>>>>>>>>>>>
>>>>>>>>>>>>>     blk-mq: Create hctx for each present CPU
>>>>>>>>>>>>>     
>>>>>>>>>>>>>     commit 4b855ad37194f7bdbb200ce7a1c7051fecb56a08 upstream.
>>>>>>>>>>>>>     
>>>>>>>>>>>>>     Currently we only create hctx for online CPUs, which can lead to a lot
>>>>>>>>>>>>>     of churn due to frequent soft offline / online operations.  Instead
>>>>>>>>>>>>>     allocate one for each present CPU to avoid this and dramatically simplify
>>>>>>>>>>>>>     the code.
>>>>>>>>>>>>>     
>>>>>>>>>>>>>     Signed-off-by: Christoph Hellwig <hch@lst.de>
>>>>>>>>>>>>>     Reviewed-by: Jens Axboe <axboe@kernel.dk>
>>>>>>>>>>>>>     Cc: Keith Busch <keith.busch@intel.com>
>>>>>>>>>>>>>     Cc: linux-block@vger.kernel.org
>>>>>>>>>>>>>     Cc: linux-nvme@lists.infradead.org
>>>>>>>>>>>>>     Link: http://lkml.kernel.org/r/20170626102058.10200-3-hch@lst.de
>>>>>>>>>>>>>     Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
>>>>>>>>>>>>>     Cc: Oleksandr Natalenko <oleksandr@natalenko.name>
>>>>>>>>>>>>>     Cc: Mike Galbraith <efault@gmx.de>
>>>>>>>>>>>>>     Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>>>>>>>>>>>>
>>>>>>>>>>>> I wonder if we're simply not getting the masks updated correctly. I'll
>>>>>>>>>>>> take a look.
>>>>>>>>>>>
>>>>>>>>>>> Can't make it trigger here. We do init for each present CPU, which means
>>>>>>>>>>> that if I offline a few CPUs here and register a queue, those still show
>>>>>>>>>>> up as present (just offline) and get mapped accordingly.
>>>>>>>>>>>
>>>>>>>>>>> From the looks of it, your setup is different. If the CPU doesn't show
>>>>>>>>>>> up as present and it gets hotplugged, then I can see how this condition
>>>>>>>>>>> would trigger. What environment are you running this in? We might have
>>>>>>>>>>> to re-introduce the cpu hotplug notifier, right now we just monitor
>>>>>>>>>>> for a dead cpu and handle that.
>>>>>>>>>>
>>>>>>>>>> I am not doing a hot unplug and the replug, I use KVM and add a previously
>>>>>>>>>> not available CPU.
>>>>>>>>>>
>>>>>>>>>> in libvirt/virsh speak:
>>>>>>>>>>   <vcpu placement='static' current='1'>4</vcpu>
>>>>>>>>>
>>>>>>>>> So that's why we run into problems. It's not present when we load the device,
>>>>>>>>> but becomes present and online afterwards.
>>>>>>>>>
>>>>>>>>> Christoph, we used to handle this just fine, your patch broke it.
>>>>>>>>>
>>>>>>>>> I'll see if I can come up with an appropriate fix.
>>>>>>>>
>>>>>>>> Can you try the below?
>>>>>>>
>>>>>>>
>>>>>>> It does prevent the crash but it seems that the new CPU is not "used " after the hotplug for mq:
>>>>>>>
>>>>>>>
>>>>>>> output with 2 cpus:
>>>>>>> /sys/kernel/debug/block/vda
>>>>>>> /sys/kernel/debug/block/vda/hctx0
>>>>>>> /sys/kernel/debug/block/vda/hctx0/cpu0
>>>>>>> /sys/kernel/debug/block/vda/hctx0/cpu0/completed
>>>>>>> /sys/kernel/debug/block/vda/hctx0/cpu0/merged
>>>>>>> /sys/kernel/debug/block/vda/hctx0/cpu0/dispatched
>>>>>>> /sys/kernel/debug/block/vda/hctx0/cpu0/rq_list
>>>>>>> /sys/kernel/debug/block/vda/hctx0/active
>>>>>>> /sys/kernel/debug/block/vda/hctx0/run
>>>>>>> /sys/kernel/debug/block/vda/hctx0/queued
>>>>>>> /sys/kernel/debug/block/vda/hctx0/dispatched
>>>>>>> /sys/kernel/debug/block/vda/hctx0/io_poll
>>>>>>> /sys/kernel/debug/block/vda/hctx0/sched_tags_bitmap
>>>>>>> /sys/kernel/debug/block/vda/hctx0/sched_tags
>>>>>>> /sys/kernel/debug/block/vda/hctx0/tags_bitmap
>>>>>>> /sys/kernel/debug/block/vda/hctx0/tags
>>>>>>> /sys/kernel/debug/block/vda/hctx0/ctx_map
>>>>>>> /sys/kernel/debug/block/vda/hctx0/busy
>>>>>>> /sys/kernel/debug/block/vda/hctx0/dispatch
>>>>>>> /sys/kernel/debug/block/vda/hctx0/flags
>>>>>>> /sys/kernel/debug/block/vda/hctx0/state
>>>>>>> /sys/kernel/debug/block/vda/sched
>>>>>>> /sys/kernel/debug/block/vda/sched/dispatch
>>>>>>> /sys/kernel/debug/block/vda/sched/starved
>>>>>>> /sys/kernel/debug/block/vda/sched/batching
>>>>>>> /sys/kernel/debug/block/vda/sched/write_next_rq
>>>>>>> /sys/kernel/debug/block/vda/sched/write_fifo_list
>>>>>>> /sys/kernel/debug/block/vda/sched/read_next_rq
>>>>>>> /sys/kernel/debug/block/vda/sched/read_fifo_list
>>>>>>> /sys/kernel/debug/block/vda/write_hints
>>>>>>> /sys/kernel/debug/block/vda/state
>>>>>>> /sys/kernel/debug/block/vda/requeue_list
>>>>>>> /sys/kernel/debug/block/vda/poll_stat
>>>>>>
>>>>>> Try this, basically just a revert.
>>>>>
>>>>> Yes, seems to work.
>>>>>
>>>>> Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
>>>>
>>>> Great, thanks for testing.
>>>>
>>>>> Do you know why the original commit made it into 4.12 stable? After all
>>>>> it has no Fixes tag and no cc stable-
>>>>
>>>> I was wondering the same thing when you said it was in 4.12.stable and
>>>> not in 4.12 release. That patch should absolutely not have gone into
>>>> stable, it's not marked as such and it's not fixing a problem that is
>>>> stable worthy. In fact, it's causing a regression...
>>>>
>>>> Greg? Upstream commit is mentioned higher up, start of the email.
>>>>
>>>
>>>
>>> Forgot to cc Greg?
>>
>> I did, thanks for doing that. Now I wonder how to mark this patch,
>> as we should revert it from kernels that have the bad commit. 4.12
>> is fine, 4.12.later-stable is not.
>>
> 
> I think we should tag it with:
> 
> Fixes: 4b855ad37194 ("blk-mq: Create hctx for each present CPU")
> 
> which should bring it into 4.13 stable and 4.14 stable. 4.12 stable seems EOL anyway.

Yeah, I think so too. But thinking more about this, I'm pretty sure this
adds a bad lock dependency with hotplug. I need to verify that we don't
introduce a potential deadlock here...

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
  2017-11-21 20:39                                           ` Jens Axboe
@ 2017-11-22  7:28                                             ` Christoph Hellwig
  -1 siblings, 0 replies; 96+ messages in thread
From: Christoph Hellwig @ 2017-11-22  7:28 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Christian Borntraeger, Bart Van Assche, virtualization,
	linux-block, mst, jasowang, linux-kernel, Christoph Hellwig,
	Greg Kroah-Hartman, stable

Jens, please don't just revert the commit in your for-linus tree.

On its own this will totally mess up the interrupt assignments.  Give
me a bit of time to sort this out properly.

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
  2017-11-22  7:28                                             ` Christoph Hellwig
@ 2017-11-22 14:46                                               ` Jens Axboe
  -1 siblings, 0 replies; 96+ messages in thread
From: Jens Axboe @ 2017-11-22 14:46 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Christian Borntraeger, Bart Van Assche, virtualization,
	linux-block, mst, jasowang, linux-kernel, Greg Kroah-Hartman,
	stable

On 11/22/2017 12:28 AM, Christoph Hellwig wrote:
> Jens, please don't just revert the commit in your for-linus tree.
> 
> On its own this will totally mess up the interrupt assignments.  Give
> me a bit of time to sort this out properly.

I wasn't going to push it until I heard otherwise. I'll just pop it
off; for-linus isn't a stable branch.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
  2017-11-21 18:12                       ` Christian Borntraeger
@ 2017-11-23 14:02                         ` Christoph Hellwig
  -1 siblings, 0 replies; 96+ messages in thread
From: Christoph Hellwig @ 2017-11-23 14:02 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Jens Axboe, Bart Van Assche, virtualization, linux-block, mst,
	jasowang, linux-kernel, Christoph Hellwig

I can't reproduce it in my VM by adding a new CPU.  Do you have
any interesting blk-mq setup, like actually using multiple queues?  I'll
give that a spin next.

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
  2017-11-23 14:02                         ` Christoph Hellwig
@ 2017-11-23 14:08                           ` Christoph Hellwig
  -1 siblings, 0 replies; 96+ messages in thread
From: Christoph Hellwig @ 2017-11-23 14:08 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Jens Axboe, Bart Van Assche, virtualization, linux-block, mst,
	jasowang, linux-kernel, Christoph Hellwig

Ok, it helps to make sure we're actually doing I/O from the newly added CPU;
I've reproduced it now.
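
For illustration only, here is a minimal user-space sketch of what "doing
I/O from the newly added CPU" can look like: pin the task to that CPU and
issue a direct read, so the blk-mq submission path runs there.  The device
path /dev/vda, the 4k read size and CPU 1 are assumptions, not details
taken from this thread.

#define _GNU_SOURCE
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	int cpu = argc > 1 ? atoi(argv[1]) : 1;		/* the hotplugged CPU */
	const char *dev = argc > 2 ? argv[2] : "/dev/vda";
	cpu_set_t set;
	void *buf;
	int fd;

	/* run the rest of the program on the chosen CPU only */
	CPU_ZERO(&set);
	CPU_SET(cpu, &set);
	if (sched_setaffinity(0, sizeof(set), &set)) {
		perror("sched_setaffinity");
		return 1;
	}

	if (posix_memalign(&buf, 4096, 4096)) {
		fprintf(stderr, "posix_memalign failed\n");
		return 1;
	}

	fd = open(dev, O_RDONLY | O_DIRECT);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	/* O_DIRECT bypasses the page cache, so this really reaches blk-mq */
	if (pread(fd, buf, 4096, 0) < 0)
		perror("pread");

	close(fd);
	free(buf);
	return 0;
}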

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
  2017-11-22 14:46                                               ` Jens Axboe
  (?)
@ 2017-11-23 14:34                                               ` Christoph Hellwig
  2017-11-23 14:42                                                 ` Hannes Reinecke
                                                                   ` (2 more replies)
  -1 siblings, 3 replies; 96+ messages in thread
From: Christoph Hellwig @ 2017-11-23 14:34 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Christoph Hellwig, Christian Borntraeger, Bart Van Assche,
	linux-block, linux-kernel, Thomas Gleixner

FYI, the patch below changes both the irq and block mappings to
always use the cpu possible map (should be split in two in due time).

I think this is the right way forward.  For every normal machine
those two are the same, but for VMs with maxcpus above their normal
count or some big iron that can grow more cpus it means we waste
a few more resources for the not present but reserved cpus.  It
fixes the reported issue for me:

diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c
index 9f8cffc8a701..3eb169f15842 100644
--- a/block/blk-mq-cpumap.c
+++ b/block/blk-mq-cpumap.c
@@ -16,11 +16,6 @@
 
 static int cpu_to_queue_index(unsigned int nr_queues, const int cpu)
 {
-	/*
-	 * Non present CPU will be mapped to queue index 0.
-	 */
-	if (!cpu_present(cpu))
-		return 0;
 	return cpu % nr_queues;
 }
 
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 11097477eeab..612ce1fb7c4e 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2114,16 +2114,11 @@ static void blk_mq_init_cpu_queues(struct request_queue *q,
 		INIT_LIST_HEAD(&__ctx->rq_list);
 		__ctx->queue = q;
 
-		/* If the cpu isn't present, the cpu is mapped to first hctx */
-		if (!cpu_present(i))
-			continue;
-
-		hctx = blk_mq_map_queue(q, i);
-
 		/*
 		 * Set local node, IFF we have more than one hw queue. If
 		 * not, we remain on the home node of the device
 		 */
+		hctx = blk_mq_map_queue(q, i);
 		if (nr_hw_queues > 1 && hctx->numa_node == NUMA_NO_NODE)
 			hctx->numa_node = local_memory_node(cpu_to_node(i));
 	}
@@ -2180,7 +2175,7 @@ static void blk_mq_map_swqueue(struct request_queue *q)
 	 *
 	 * If the cpu isn't present, the cpu is mapped to first hctx.
 	 */
-	for_each_present_cpu(i) {
+	for_each_possible_cpu(i) {
 		hctx_idx = q->mq_map[i];
 		/* unmapped hw queue can be remapped after CPU topo changed */
 		if (!set->tags[hctx_idx] &&
diff --git a/kernel/irq/affinity.c b/kernel/irq/affinity.c
index e12d35108225..a37a3b4b6342 100644
--- a/kernel/irq/affinity.c
+++ b/kernel/irq/affinity.c
@@ -39,7 +39,7 @@ static void irq_spread_init_one(struct cpumask *irqmsk, struct cpumask *nmsk,
 	}
 }
 
-static cpumask_var_t *alloc_node_to_present_cpumask(void)
+static cpumask_var_t *alloc_node_to_possible_cpumask(void)
 {
 	cpumask_var_t *masks;
 	int node;
@@ -62,7 +62,7 @@ static cpumask_var_t *alloc_node_to_present_cpumask(void)
 	return NULL;
 }
 
-static void free_node_to_present_cpumask(cpumask_var_t *masks)
+static void free_node_to_possible_cpumask(cpumask_var_t *masks)
 {
 	int node;
 
@@ -71,22 +71,22 @@ static void free_node_to_present_cpumask(cpumask_var_t *masks)
 	kfree(masks);
 }
 
-static void build_node_to_present_cpumask(cpumask_var_t *masks)
+static void build_node_to_possible_cpumask(cpumask_var_t *masks)
 {
 	int cpu;
 
-	for_each_present_cpu(cpu)
+	for_each_possible_cpu(cpu)
 		cpumask_set_cpu(cpu, masks[cpu_to_node(cpu)]);
 }
 
-static int get_nodes_in_cpumask(cpumask_var_t *node_to_present_cpumask,
+static int get_nodes_in_cpumask(cpumask_var_t *node_to_possible_cpumask,
 				const struct cpumask *mask, nodemask_t *nodemsk)
 {
 	int n, nodes = 0;
 
 	/* Calculate the number of nodes in the supplied affinity mask */
 	for_each_node(n) {
-		if (cpumask_intersects(mask, node_to_present_cpumask[n])) {
+		if (cpumask_intersects(mask, node_to_possible_cpumask[n])) {
 			node_set(n, *nodemsk);
 			nodes++;
 		}
@@ -109,7 +109,7 @@ irq_create_affinity_masks(int nvecs, const struct irq_affinity *affd)
 	int last_affv = affv + affd->pre_vectors;
 	nodemask_t nodemsk = NODE_MASK_NONE;
 	struct cpumask *masks;
-	cpumask_var_t nmsk, *node_to_present_cpumask;
+	cpumask_var_t nmsk, *node_to_possible_cpumask;
 
 	/*
 	 * If there aren't any vectors left after applying the pre/post
@@ -125,8 +125,8 @@ irq_create_affinity_masks(int nvecs, const struct irq_affinity *affd)
 	if (!masks)
 		goto out;
 
-	node_to_present_cpumask = alloc_node_to_present_cpumask();
-	if (!node_to_present_cpumask)
+	node_to_possible_cpumask = alloc_node_to_possible_cpumask();
+	if (!node_to_possible_cpumask)
 		goto out;
 
 	/* Fill out vectors at the beginning that don't need affinity */
@@ -135,8 +135,8 @@ irq_create_affinity_masks(int nvecs, const struct irq_affinity *affd)
 
 	/* Stabilize the cpumasks */
 	get_online_cpus();
-	build_node_to_present_cpumask(node_to_present_cpumask);
-	nodes = get_nodes_in_cpumask(node_to_present_cpumask, cpu_present_mask,
+	build_node_to_possible_cpumask(node_to_possible_cpumask);
+	nodes = get_nodes_in_cpumask(node_to_possible_cpumask, cpu_possible_mask,
 				     &nodemsk);
 
 	/*
@@ -146,7 +146,7 @@ irq_create_affinity_masks(int nvecs, const struct irq_affinity *affd)
 	if (affv <= nodes) {
 		for_each_node_mask(n, nodemsk) {
 			cpumask_copy(masks + curvec,
-				     node_to_present_cpumask[n]);
+				     node_to_possible_cpumask[n]);
 			if (++curvec == last_affv)
 				break;
 		}
@@ -160,7 +160,7 @@ irq_create_affinity_masks(int nvecs, const struct irq_affinity *affd)
 		vecs_per_node = (affv - (curvec - affd->pre_vectors)) / nodes;
 
 		/* Get the cpus on this node which are in the mask */
-		cpumask_and(nmsk, cpu_present_mask, node_to_present_cpumask[n]);
+		cpumask_and(nmsk, cpu_possible_mask, node_to_possible_cpumask[n]);
 
 		/* Calculate the number of cpus per vector */
 		ncpus = cpumask_weight(nmsk);
@@ -192,7 +192,7 @@ irq_create_affinity_masks(int nvecs, const struct irq_affinity *affd)
 	/* Fill out vectors at the end that don't need affinity */
 	for (; curvec < nvecs; curvec++)
 		cpumask_copy(masks + curvec, irq_default_affinity);
-	free_node_to_present_cpumask(node_to_present_cpumask);
+	free_node_to_possible_cpumask(node_to_possible_cpumask);
 out:
 	free_cpumask_var(nmsk);
 	return masks;
@@ -214,7 +214,7 @@ int irq_calc_affinity_vectors(int minvec, int maxvec, const struct irq_affinity
 		return 0;
 
 	get_online_cpus();
-	ret = min_t(int, cpumask_weight(cpu_present_mask), vecs) + resv;
+	ret = min_t(int, cpumask_weight(cpu_possible_mask), vecs) + resv;
 	put_online_cpus();
 	return ret;
 }
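
To see the practical effect of dropping the cpu_present() special case,
here is a tiny stand-alone sketch (not kernel code; the CPU and queue
counts are made up for a guest with maxcpus=4 but only one vCPU present at
boot).  Every possible CPU gets a real queue mapping up front, so a CPU
that is onlined later never ends up on an unmapped hctx:

#include <stdio.h>

/* same formula as the patched cpu_to_queue_index() above */
static int cpu_to_queue_index(unsigned int nr_queues, int cpu)
{
	return cpu % nr_queues;
}

int main(void)
{
	const int nr_possible_cpus = 4;		/* maxcpus in the guest config */
	const unsigned int nr_queues = 2;	/* number of hardware queues */
	int cpu;

	for (cpu = 0; cpu < nr_possible_cpus; cpu++)
		printf("cpu%d -> hctx%d\n", cpu,
		       cpu_to_queue_index(nr_queues, cpu));
	return 0;
}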

^ permalink raw reply related	[flat|nested] 96+ messages in thread

* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
  2017-11-23 14:34                                               ` Christoph Hellwig
@ 2017-11-23 14:42                                                 ` Hannes Reinecke
  2017-11-23 14:47                                                   ` Christoph Hellwig
  2017-11-23 15:05                                                 ` Christian Borntraeger
  2017-11-23 18:17                                                 ` Christian Borntraeger
  2 siblings, 1 reply; 96+ messages in thread
From: Hannes Reinecke @ 2017-11-23 14:42 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe
  Cc: Christian Borntraeger, Bart Van Assche, linux-block,
	linux-kernel, Thomas Gleixner

On 11/23/2017 03:34 PM, Christoph Hellwig wrote:
> FYI, the patch below changes both the irq and block mappings to
> always use the cpu possible map (should be split in two in due time).
> 
> I think this is the right way forward.  For every normal machine
> those two are the same, but for VMs with maxcpus above their normal
> count or some big iron that can grow more cpus it means we waster
> a few more resources for the not present but reserved cpus.  It
> fixes the reported issue for me:
> 
> diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c
> index 9f8cffc8a701..3eb169f15842 100644
> --- a/block/blk-mq-cpumap.c
> +++ b/block/blk-mq-cpumap.c
> @@ -16,11 +16,6 @@
>  
>  static int cpu_to_queue_index(unsigned int nr_queues, const int cpu)
>  {
> -	/*
> -	 * Non present CPU will be mapped to queue index 0.
> -	 */
> -	if (!cpu_present(cpu))
> -		return 0;
>  	return cpu % nr_queues;
>  }
>  
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 11097477eeab..612ce1fb7c4e 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -2114,16 +2114,11 @@ static void blk_mq_init_cpu_queues(struct request_queue *q,
>  		INIT_LIST_HEAD(&__ctx->rq_list);
>  		__ctx->queue = q;
>  
> -		/* If the cpu isn't present, the cpu is mapped to first hctx */
> -		if (!cpu_present(i))
> -			continue;
> -
> -		hctx = blk_mq_map_queue(q, i);
> -
>  		/*
>  		 * Set local node, IFF we have more than one hw queue. If
>  		 * not, we remain on the home node of the device
>  		 */
> +		hctx = blk_mq_map_queue(q, i);
>  		if (nr_hw_queues > 1 && hctx->numa_node == NUMA_NO_NODE)
>  			hctx->numa_node = local_memory_node(cpu_to_node(i));
>  	}
> @@ -2180,7 +2175,7 @@ static void blk_mq_map_swqueue(struct request_queue *q)
>  	 *
>  	 * If the cpu isn't present, the cpu is mapped to first hctx.
>  	 */
> -	for_each_present_cpu(i) {
> +	for_each_possible_cpu(i) {
>  		hctx_idx = q->mq_map[i];
>  		/* unmapped hw queue can be remapped after CPU topo changed */
>  		if (!set->tags[hctx_idx] &&
> diff --git a/kernel/irq/affinity.c b/kernel/irq/affinity.c
> index e12d35108225..a37a3b4b6342 100644
> --- a/kernel/irq/affinity.c
> +++ b/kernel/irq/affinity.c
> @@ -39,7 +39,7 @@ static void irq_spread_init_one(struct cpumask *irqmsk, struct cpumask *nmsk,
>  	}
>  }
>  
> -static cpumask_var_t *alloc_node_to_present_cpumask(void)
> +static cpumask_var_t *alloc_node_to_possible_cpumask(void)
>  {
>  	cpumask_var_t *masks;
>  	int node;
> @@ -62,7 +62,7 @@ static cpumask_var_t *alloc_node_to_present_cpumask(void)
>  	return NULL;
>  }
>  
> -static void free_node_to_present_cpumask(cpumask_var_t *masks)
> +static void free_node_to_possible_cpumask(cpumask_var_t *masks)
>  {
>  	int node;
>  
> @@ -71,22 +71,22 @@ static void free_node_to_present_cpumask(cpumask_var_t *masks)
>  	kfree(masks);
>  }
>  
> -static void build_node_to_present_cpumask(cpumask_var_t *masks)
> +static void build_node_to_possible_cpumask(cpumask_var_t *masks)
>  {
>  	int cpu;
>  
> -	for_each_present_cpu(cpu)
> +	for_each_possible_cpu(cpu)
>  		cpumask_set_cpu(cpu, masks[cpu_to_node(cpu)]);
>  }
>  
> -static int get_nodes_in_cpumask(cpumask_var_t *node_to_present_cpumask,
> +static int get_nodes_in_cpumask(cpumask_var_t *node_to_possible_cpumask,
>  				const struct cpumask *mask, nodemask_t *nodemsk)
>  {
>  	int n, nodes = 0;
>  
>  	/* Calculate the number of nodes in the supplied affinity mask */
>  	for_each_node(n) {
> -		if (cpumask_intersects(mask, node_to_present_cpumask[n])) {
> +		if (cpumask_intersects(mask, node_to_possible_cpumask[n])) {
>  			node_set(n, *nodemsk);
>  			nodes++;
>  		}
> @@ -109,7 +109,7 @@ irq_create_affinity_masks(int nvecs, const struct irq_affinity *affd)
>  	int last_affv = affv + affd->pre_vectors;
>  	nodemask_t nodemsk = NODE_MASK_NONE;
>  	struct cpumask *masks;
> -	cpumask_var_t nmsk, *node_to_present_cpumask;
> +	cpumask_var_t nmsk, *node_to_possible_cpumask;
>  
>  	/*
>  	 * If there aren't any vectors left after applying the pre/post
> @@ -125,8 +125,8 @@ irq_create_affinity_masks(int nvecs, const struct irq_affinity *affd)
>  	if (!masks)
>  		goto out;
>  
> -	node_to_present_cpumask = alloc_node_to_present_cpumask();
> -	if (!node_to_present_cpumask)
> +	node_to_possible_cpumask = alloc_node_to_possible_cpumask();
> +	if (!node_to_possible_cpumask)
>  		goto out;
>  
>  	/* Fill out vectors at the beginning that don't need affinity */
> @@ -135,8 +135,8 @@ irq_create_affinity_masks(int nvecs, const struct irq_affinity *affd)
>  
>  	/* Stabilize the cpumasks */
>  	get_online_cpus();
> -	build_node_to_present_cpumask(node_to_present_cpumask);
> -	nodes = get_nodes_in_cpumask(node_to_present_cpumask, cpu_present_mask,
> +	build_node_to_possible_cpumask(node_to_possible_cpumask);
> +	nodes = get_nodes_in_cpumask(node_to_possible_cpumask, cpu_possible_mask,
>  				     &nodemsk);
>  
>  	/*
> @@ -146,7 +146,7 @@ irq_create_affinity_masks(int nvecs, const struct irq_affinity *affd)
>  	if (affv <= nodes) {
>  		for_each_node_mask(n, nodemsk) {
>  			cpumask_copy(masks + curvec,
> -				     node_to_present_cpumask[n]);
> +				     node_to_possible_cpumask[n]);
>  			if (++curvec == last_affv)
>  				break;
>  		}
> @@ -160,7 +160,7 @@ irq_create_affinity_masks(int nvecs, const struct irq_affinity *affd)
>  		vecs_per_node = (affv - (curvec - affd->pre_vectors)) / nodes;
>  
>  		/* Get the cpus on this node which are in the mask */
> -		cpumask_and(nmsk, cpu_present_mask, node_to_present_cpumask[n]);
> +		cpumask_and(nmsk, cpu_possible_mask, node_to_possible_cpumask[n]);
>  
>  		/* Calculate the number of cpus per vector */
>  		ncpus = cpumask_weight(nmsk);
> @@ -192,7 +192,7 @@ irq_create_affinity_masks(int nvecs, const struct irq_affinity *affd)
>  	/* Fill out vectors at the end that don't need affinity */
>  	for (; curvec < nvecs; curvec++)
>  		cpumask_copy(masks + curvec, irq_default_affinity);
> -	free_node_to_present_cpumask(node_to_present_cpumask);
> +	free_node_to_possible_cpumask(node_to_possible_cpumask);
>  out:
>  	free_cpumask_var(nmsk);
>  	return masks;
> @@ -214,7 +214,7 @@ int irq_calc_affinity_vectors(int minvec, int maxvec, const struct irq_affinity
>  		return 0;
>  
>  	get_online_cpus();
> -	ret = min_t(int, cpumask_weight(cpu_present_mask), vecs) + resv;
> +	ret = min_t(int, cpumask_weight(cpu_possible_mask), vecs) + resv;
>  	put_online_cpus();
>  	return ret;
>  }
> 
What will happen for the CPU hotplug case?
Wouldn't we route I/O to a disabled CPU with this patch?

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
  2017-11-23 14:42                                                 ` Hannes Reinecke
@ 2017-11-23 14:47                                                   ` Christoph Hellwig
  0 siblings, 0 replies; 96+ messages in thread
From: Christoph Hellwig @ 2017-11-23 14:47 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Christoph Hellwig, Jens Axboe, Christian Borntraeger,
	Bart Van Assche, linux-block, linux-kernel, Thomas Gleixner

[fullquote deleted]

> What will happen for the CPU hotplug case?
> Wouldn't we route I/O to a disabled CPU with this patch?

Why would we route I/O to a disabled CPU?  We generally route
I/O to devices to start with.  How would including possible
but not present cpus change anything?
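
As a quick way to see how the two masks differ in such a guest (started
with fewer vCPUs than maxcpus), a small sketch that just prints the
standard sysfs CPU masks; the sysfs paths are the usual ones, nothing
specific to this thread:

#include <stdio.h>

static void show(const char *name)
{
	char path[128], buf[128];
	FILE *f;

	snprintf(path, sizeof(path), "/sys/devices/system/cpu/%s", name);
	f = fopen(path, "r");
	if (!f) {
		perror(path);
		return;
	}
	if (fgets(buf, sizeof(buf), f))
		printf("%-9s %s", name, buf);	/* e.g. "possible  0-3" */
	fclose(f);
}

int main(void)
{
	/* with <vcpu current='1'>4</vcpu>, present/online start as "0",
	 * while possible is already "0-3" */
	show("possible");
	show("present");
	show("online");
	return 0;
}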

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
  2017-11-23 14:34                                               ` Christoph Hellwig
  2017-11-23 14:42                                                 ` Hannes Reinecke
@ 2017-11-23 15:05                                                 ` Christian Borntraeger
  2017-11-23 18:17                                                 ` Christian Borntraeger
  2 siblings, 0 replies; 96+ messages in thread
From: Christian Borntraeger @ 2017-11-23 15:05 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe
  Cc: Bart Van Assche, linux-block, linux-kernel, Thomas Gleixner

Yes, it seems to fix the bug.

On 11/23/2017 03:34 PM, Christoph Hellwig wrote:
> FYI, the patch below changes both the irq and block mappings to
> always use the cpu possible map (should be split in two in due time).
> 
> I think this is the right way forward.  For every normal machine
> those two are the same, but for VMs with maxcpus above their normal
> count or some big iron that can grow more cpus it means we waster
> a few more resources for the not present but reserved cpus.  It
> fixes the reported issue for me:
> 
> diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c
> index 9f8cffc8a701..3eb169f15842 100644
> --- a/block/blk-mq-cpumap.c
> +++ b/block/blk-mq-cpumap.c
> @@ -16,11 +16,6 @@
> 
>  static int cpu_to_queue_index(unsigned int nr_queues, const int cpu)
>  {
> -	/*
> -	 * Non present CPU will be mapped to queue index 0.
> -	 */
> -	if (!cpu_present(cpu))
> -		return 0;
>  	return cpu % nr_queues;
>  }
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 11097477eeab..612ce1fb7c4e 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -2114,16 +2114,11 @@ static void blk_mq_init_cpu_queues(struct request_queue *q,
>  		INIT_LIST_HEAD(&__ctx->rq_list);
>  		__ctx->queue = q;
> 
> -		/* If the cpu isn't present, the cpu is mapped to first hctx */
> -		if (!cpu_present(i))
> -			continue;
> -
> -		hctx = blk_mq_map_queue(q, i);
> -
>  		/*
>  		 * Set local node, IFF we have more than one hw queue. If
>  		 * not, we remain on the home node of the device
>  		 */
> +		hctx = blk_mq_map_queue(q, i);
>  		if (nr_hw_queues > 1 && hctx->numa_node == NUMA_NO_NODE)
>  			hctx->numa_node = local_memory_node(cpu_to_node(i));
>  	}
> @@ -2180,7 +2175,7 @@ static void blk_mq_map_swqueue(struct request_queue *q)
>  	 *
>  	 * If the cpu isn't present, the cpu is mapped to first hctx.
>  	 */
> -	for_each_present_cpu(i) {
> +	for_each_possible_cpu(i) {
>  		hctx_idx = q->mq_map[i];
>  		/* unmapped hw queue can be remapped after CPU topo changed */
>  		if (!set->tags[hctx_idx] &&
> diff --git a/kernel/irq/affinity.c b/kernel/irq/affinity.c
> index e12d35108225..a37a3b4b6342 100644
> --- a/kernel/irq/affinity.c
> +++ b/kernel/irq/affinity.c
> @@ -39,7 +39,7 @@ static void irq_spread_init_one(struct cpumask *irqmsk, struct cpumask *nmsk,
>  	}
>  }
> 
> -static cpumask_var_t *alloc_node_to_present_cpumask(void)
> +static cpumask_var_t *alloc_node_to_possible_cpumask(void)
>  {
>  	cpumask_var_t *masks;
>  	int node;
> @@ -62,7 +62,7 @@ static cpumask_var_t *alloc_node_to_present_cpumask(void)
>  	return NULL;
>  }
> 
> -static void free_node_to_present_cpumask(cpumask_var_t *masks)
> +static void free_node_to_possible_cpumask(cpumask_var_t *masks)
>  {
>  	int node;
> 
> @@ -71,22 +71,22 @@ static void free_node_to_present_cpumask(cpumask_var_t *masks)
>  	kfree(masks);
>  }
> 
> -static void build_node_to_present_cpumask(cpumask_var_t *masks)
> +static void build_node_to_possible_cpumask(cpumask_var_t *masks)
>  {
>  	int cpu;
> 
> -	for_each_present_cpu(cpu)
> +	for_each_possible_cpu(cpu)
>  		cpumask_set_cpu(cpu, masks[cpu_to_node(cpu)]);
>  }
> 
> -static int get_nodes_in_cpumask(cpumask_var_t *node_to_present_cpumask,
> +static int get_nodes_in_cpumask(cpumask_var_t *node_to_possible_cpumask,
>  				const struct cpumask *mask, nodemask_t *nodemsk)
>  {
>  	int n, nodes = 0;
> 
>  	/* Calculate the number of nodes in the supplied affinity mask */
>  	for_each_node(n) {
> -		if (cpumask_intersects(mask, node_to_present_cpumask[n])) {
> +		if (cpumask_intersects(mask, node_to_possible_cpumask[n])) {
>  			node_set(n, *nodemsk);
>  			nodes++;
>  		}
> @@ -109,7 +109,7 @@ irq_create_affinity_masks(int nvecs, const struct irq_affinity *affd)
>  	int last_affv = affv + affd->pre_vectors;
>  	nodemask_t nodemsk = NODE_MASK_NONE;
>  	struct cpumask *masks;
> -	cpumask_var_t nmsk, *node_to_present_cpumask;
> +	cpumask_var_t nmsk, *node_to_possible_cpumask;
> 
>  	/*
>  	 * If there aren't any vectors left after applying the pre/post
> @@ -125,8 +125,8 @@ irq_create_affinity_masks(int nvecs, const struct irq_affinity *affd)
>  	if (!masks)
>  		goto out;
> 
> -	node_to_present_cpumask = alloc_node_to_present_cpumask();
> -	if (!node_to_present_cpumask)
> +	node_to_possible_cpumask = alloc_node_to_possible_cpumask();
> +	if (!node_to_possible_cpumask)
>  		goto out;
> 
>  	/* Fill out vectors at the beginning that don't need affinity */
> @@ -135,8 +135,8 @@ irq_create_affinity_masks(int nvecs, const struct irq_affinity *affd)
> 
>  	/* Stabilize the cpumasks */
>  	get_online_cpus();
> -	build_node_to_present_cpumask(node_to_present_cpumask);
> -	nodes = get_nodes_in_cpumask(node_to_present_cpumask, cpu_present_mask,
> +	build_node_to_possible_cpumask(node_to_possible_cpumask);
> +	nodes = get_nodes_in_cpumask(node_to_possible_cpumask, cpu_possible_mask,
>  				     &nodemsk);
> 
>  	/*
> @@ -146,7 +146,7 @@ irq_create_affinity_masks(int nvecs, const struct irq_affinity *affd)
>  	if (affv <= nodes) {
>  		for_each_node_mask(n, nodemsk) {
>  			cpumask_copy(masks + curvec,
> -				     node_to_present_cpumask[n]);
> +				     node_to_possible_cpumask[n]);
>  			if (++curvec == last_affv)
>  				break;
>  		}
> @@ -160,7 +160,7 @@ irq_create_affinity_masks(int nvecs, const struct irq_affinity *affd)
>  		vecs_per_node = (affv - (curvec - affd->pre_vectors)) / nodes;
> 
>  		/* Get the cpus on this node which are in the mask */
> -		cpumask_and(nmsk, cpu_present_mask, node_to_present_cpumask[n]);
> +		cpumask_and(nmsk, cpu_possible_mask, node_to_possible_cpumask[n]);
> 
>  		/* Calculate the number of cpus per vector */
>  		ncpus = cpumask_weight(nmsk);
> @@ -192,7 +192,7 @@ irq_create_affinity_masks(int nvecs, const struct irq_affinity *affd)
>  	/* Fill out vectors at the end that don't need affinity */
>  	for (; curvec < nvecs; curvec++)
>  		cpumask_copy(masks + curvec, irq_default_affinity);
> -	free_node_to_present_cpumask(node_to_present_cpumask);
> +	free_node_to_possible_cpumask(node_to_possible_cpumask);
>  out:
>  	free_cpumask_var(nmsk);
>  	return masks;
> @@ -214,7 +214,7 @@ int irq_calc_affinity_vectors(int minvec, int maxvec, const struct irq_affinity
>  		return 0;
> 
>  	get_online_cpus();
> -	ret = min_t(int, cpumask_weight(cpu_present_mask), vecs) + resv;
> +	ret = min_t(int, cpumask_weight(cpu_possible_mask), vecs) + resv;
>  	put_online_cpus();
>  	return ret;
>  }
> 

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
  2017-11-23 14:34                                               ` Christoph Hellwig
  2017-11-23 14:42                                                 ` Hannes Reinecke
  2017-11-23 15:05                                                 ` Christian Borntraeger
@ 2017-11-23 18:17                                                 ` Christian Borntraeger
  2017-11-23 18:25                                                   ` Christoph Hellwig
  2 siblings, 1 reply; 96+ messages in thread
From: Christian Borntraeger @ 2017-11-23 18:17 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe
  Cc: Bart Van Assche, linux-block, linux-kernel, Thomas Gleixner



On 11/23/2017 03:34 PM, Christoph Hellwig wrote:
> FYI, the patch below changes both the irq and block mappings to
> always use the cpu possible map (should be split in two in due time).
> 
> I think this is the right way forward.  For every normal machine
> those two are the same, but for VMs with maxcpus above their normal
> count or some big iron that can grow more cpus it means we waste
> a few more resources for the not present but reserved cpus.  It
> fixes the reported issue for me:


While it fixes the hotplug issue under KVM, the same kernel no longer boots in the host;
it seems stuck early in boot, just before detecting the SCSI disks. I have not yet looked
into that.

Christian
> 
> diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c
> index 9f8cffc8a701..3eb169f15842 100644
> --- a/block/blk-mq-cpumap.c
> +++ b/block/blk-mq-cpumap.c
> @@ -16,11 +16,6 @@
> 
>  static int cpu_to_queue_index(unsigned int nr_queues, const int cpu)
>  {
> -	/*
> -	 * Non present CPU will be mapped to queue index 0.
> -	 */
> -	if (!cpu_present(cpu))
> -		return 0;
>  	return cpu % nr_queues;
>  }
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 11097477eeab..612ce1fb7c4e 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -2114,16 +2114,11 @@ static void blk_mq_init_cpu_queues(struct request_queue *q,
>  		INIT_LIST_HEAD(&__ctx->rq_list);
>  		__ctx->queue = q;
> 
> -		/* If the cpu isn't present, the cpu is mapped to first hctx */
> -		if (!cpu_present(i))
> -			continue;
> -
> -		hctx = blk_mq_map_queue(q, i);
> -
>  		/*
>  		 * Set local node, IFF we have more than one hw queue. If
>  		 * not, we remain on the home node of the device
>  		 */
> +		hctx = blk_mq_map_queue(q, i);
>  		if (nr_hw_queues > 1 && hctx->numa_node == NUMA_NO_NODE)
>  			hctx->numa_node = local_memory_node(cpu_to_node(i));
>  	}
> @@ -2180,7 +2175,7 @@ static void blk_mq_map_swqueue(struct request_queue *q)
>  	 *
>  	 * If the cpu isn't present, the cpu is mapped to first hctx.
>  	 */
> -	for_each_present_cpu(i) {
> +	for_each_possible_cpu(i) {
>  		hctx_idx = q->mq_map[i];
>  		/* unmapped hw queue can be remapped after CPU topo changed */
>  		if (!set->tags[hctx_idx] &&
> diff --git a/kernel/irq/affinity.c b/kernel/irq/affinity.c
> index e12d35108225..a37a3b4b6342 100644
> --- a/kernel/irq/affinity.c
> +++ b/kernel/irq/affinity.c
> @@ -39,7 +39,7 @@ static void irq_spread_init_one(struct cpumask *irqmsk, struct cpumask *nmsk,
>  	}
>  }
> 
> -static cpumask_var_t *alloc_node_to_present_cpumask(void)
> +static cpumask_var_t *alloc_node_to_possible_cpumask(void)
>  {
>  	cpumask_var_t *masks;
>  	int node;
> @@ -62,7 +62,7 @@ static cpumask_var_t *alloc_node_to_present_cpumask(void)
>  	return NULL;
>  }
> 
> -static void free_node_to_present_cpumask(cpumask_var_t *masks)
> +static void free_node_to_possible_cpumask(cpumask_var_t *masks)
>  {
>  	int node;
> 
> @@ -71,22 +71,22 @@ static void free_node_to_present_cpumask(cpumask_var_t *masks)
>  	kfree(masks);
>  }
> 
> -static void build_node_to_present_cpumask(cpumask_var_t *masks)
> +static void build_node_to_possible_cpumask(cpumask_var_t *masks)
>  {
>  	int cpu;
> 
> -	for_each_present_cpu(cpu)
> +	for_each_possible_cpu(cpu)
>  		cpumask_set_cpu(cpu, masks[cpu_to_node(cpu)]);
>  }
> 
> -static int get_nodes_in_cpumask(cpumask_var_t *node_to_present_cpumask,
> +static int get_nodes_in_cpumask(cpumask_var_t *node_to_possible_cpumask,
>  				const struct cpumask *mask, nodemask_t *nodemsk)
>  {
>  	int n, nodes = 0;
> 
>  	/* Calculate the number of nodes in the supplied affinity mask */
>  	for_each_node(n) {
> -		if (cpumask_intersects(mask, node_to_present_cpumask[n])) {
> +		if (cpumask_intersects(mask, node_to_possible_cpumask[n])) {
>  			node_set(n, *nodemsk);
>  			nodes++;
>  		}
> @@ -109,7 +109,7 @@ irq_create_affinity_masks(int nvecs, const struct irq_affinity *affd)
>  	int last_affv = affv + affd->pre_vectors;
>  	nodemask_t nodemsk = NODE_MASK_NONE;
>  	struct cpumask *masks;
> -	cpumask_var_t nmsk, *node_to_present_cpumask;
> +	cpumask_var_t nmsk, *node_to_possible_cpumask;
> 
>  	/*
>  	 * If there aren't any vectors left after applying the pre/post
> @@ -125,8 +125,8 @@ irq_create_affinity_masks(int nvecs, const struct irq_affinity *affd)
>  	if (!masks)
>  		goto out;
> 
> -	node_to_present_cpumask = alloc_node_to_present_cpumask();
> -	if (!node_to_present_cpumask)
> +	node_to_possible_cpumask = alloc_node_to_possible_cpumask();
> +	if (!node_to_possible_cpumask)
>  		goto out;
> 
>  	/* Fill out vectors at the beginning that don't need affinity */
> @@ -135,8 +135,8 @@ irq_create_affinity_masks(int nvecs, const struct irq_affinity *affd)
> 
>  	/* Stabilize the cpumasks */
>  	get_online_cpus();
> -	build_node_to_present_cpumask(node_to_present_cpumask);
> -	nodes = get_nodes_in_cpumask(node_to_present_cpumask, cpu_present_mask,
> +	build_node_to_possible_cpumask(node_to_possible_cpumask);
> +	nodes = get_nodes_in_cpumask(node_to_possible_cpumask, cpu_possible_mask,
>  				     &nodemsk);
> 
>  	/*
> @@ -146,7 +146,7 @@ irq_create_affinity_masks(int nvecs, const struct irq_affinity *affd)
>  	if (affv <= nodes) {
>  		for_each_node_mask(n, nodemsk) {
>  			cpumask_copy(masks + curvec,
> -				     node_to_present_cpumask[n]);
> +				     node_to_possible_cpumask[n]);
>  			if (++curvec == last_affv)
>  				break;
>  		}
> @@ -160,7 +160,7 @@ irq_create_affinity_masks(int nvecs, const struct irq_affinity *affd)
>  		vecs_per_node = (affv - (curvec - affd->pre_vectors)) / nodes;
> 
>  		/* Get the cpus on this node which are in the mask */
> -		cpumask_and(nmsk, cpu_present_mask, node_to_present_cpumask[n]);
> +		cpumask_and(nmsk, cpu_possible_mask, node_to_possible_cpumask[n]);
> 
>  		/* Calculate the number of cpus per vector */
>  		ncpus = cpumask_weight(nmsk);
> @@ -192,7 +192,7 @@ irq_create_affinity_masks(int nvecs, const struct irq_affinity *affd)
>  	/* Fill out vectors at the end that don't need affinity */
>  	for (; curvec < nvecs; curvec++)
>  		cpumask_copy(masks + curvec, irq_default_affinity);
> -	free_node_to_present_cpumask(node_to_present_cpumask);
> +	free_node_to_possible_cpumask(node_to_possible_cpumask);
>  out:
>  	free_cpumask_var(nmsk);
>  	return masks;
> @@ -214,7 +214,7 @@ int irq_calc_affinity_vectors(int minvec, int maxvec, const struct irq_affinity
>  		return 0;
> 
>  	get_online_cpus();
> -	ret = min_t(int, cpumask_weight(cpu_present_mask), vecs) + resv;
> +	ret = min_t(int, cpumask_weight(cpu_possible_mask), vecs) + resv;
>  	put_online_cpus();
>  	return ret;
>  }
> 
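
(Editorial aside on the distinction the patch relies on: cpu_possible_mask is fixed after boot
and already contains CPUs that may be hot-added later, while cpu_present_mask only grows once a
CPU actually shows up, so resources sized by the possible mask never need to be remapped on
hotplug. The sketch below is only an illustration using the standard cpumask helpers; it is not
part of Christoph's patch.)

#include <linux/cpumask.h>

/* Illustration only: count CPUs that are reserved for hotplug but not
 * present yet, i.e. exactly the CPUs the patch now provisions up front. */
static unsigned int count_hotplug_reserved_cpus(void)
{
	unsigned int cpu, n = 0;

	for_each_possible_cpu(cpu)
		if (!cpu_present(cpu))
			n++;	/* possible but not (yet) present */
	return n;
}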

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
  2017-11-23 18:17                                                 ` Christian Borntraeger
@ 2017-11-23 18:25                                                   ` Christoph Hellwig
  2017-11-23 18:28                                                     ` Christian Borntraeger
  0 siblings, 1 reply; 96+ messages in thread
From: Christoph Hellwig @ 2017-11-23 18:25 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Christoph Hellwig, Jens Axboe, Bart Van Assche, linux-block,
	linux-kernel, Thomas Gleixner

What HBA driver do you use in the host?

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
  2017-11-23 18:25                                                   ` Christoph Hellwig
@ 2017-11-23 18:28                                                     ` Christian Borntraeger
  2017-11-23 18:32                                                       ` Christoph Hellwig
  0 siblings, 1 reply; 96+ messages in thread
From: Christian Borntraeger @ 2017-11-23 18:28 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Bart Van Assche, linux-block, linux-kernel, Thomas Gleixner

zfcp on s390.

On 11/23/2017 07:25 PM, Christoph Hellwig wrote:
> What HBA driver do you use in the host?
> 

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
  2017-11-23 18:28                                                     ` Christian Borntraeger
@ 2017-11-23 18:32                                                       ` Christoph Hellwig
  2017-11-23 18:59                                                         ` Christian Borntraeger
  0 siblings, 1 reply; 96+ messages in thread
From: Christoph Hellwig @ 2017-11-23 18:32 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Christoph Hellwig, Jens Axboe, Bart Van Assche, linux-block,
	linux-kernel, Thomas Gleixner

On Thu, Nov 23, 2017 at 07:28:31PM +0100, Christian Borntraeger wrote:
> zfcp on s390.

Ok, so it can't be the interrupt code; it is probably the blk-mq-cpumap.c
changes.  Can you try to revert just those for a quick test?

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
  2017-11-23 18:32                                                       ` Christoph Hellwig
@ 2017-11-23 18:59                                                         ` Christian Borntraeger
  2017-11-24 13:09                                                           ` Christian Borntraeger
  0 siblings, 1 reply; 96+ messages in thread
From: Christian Borntraeger @ 2017-11-23 18:59 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Bart Van Assche, linux-block, linux-kernel, Thomas Gleixner



On 11/23/2017 07:32 PM, Christoph Hellwig wrote:
> On Thu, Nov 23, 2017 at 07:28:31PM +0100, Christian Borntraeger wrote:
>> zfcp on s390.
> 
> Ok, so it can't be the interrupt code, but probably is the blk-mq-cpumap.c
> changes.  Can you try to revert just those for a quick test?


Hmm, I get further in the boot, but the system seems very sluggish and it does not
seem to be able to access the SCSI disks (get data from them).

 

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
  2017-11-23 18:59                                                         ` Christian Borntraeger
@ 2017-11-24 13:09                                                           ` Christian Borntraeger
  2017-11-27 15:54                                                             ` Christoph Hellwig
  0 siblings, 1 reply; 96+ messages in thread
From: Christian Borntraeger @ 2017-11-24 13:09 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Bart Van Assche, linux-block, linux-kernel, Thomas Gleixner



On 11/23/2017 07:59 PM, Christian Borntraeger wrote:
> 
> 
> On 11/23/2017 07:32 PM, Christoph Hellwig wrote:
>> On Thu, Nov 23, 2017 at 07:28:31PM +0100, Christian Borntraeger wrote:
>>> zfcp on s390.
>>
>> Ok, so it can't be the interrupt code, but probably is the blk-mq-cpumap.c
>> changes.  Can you try to revert just those for a quick test?
> 
> 
> Hmm, I get further in boot, but the system seems very sluggish and it does not
> seem to be able to access the scsi disks (get data from them)
> 

FWIW, just having the changes in kernel/irq/affinity.c is indeed fine.

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
  2017-11-24 13:09                                                           ` Christian Borntraeger
@ 2017-11-27 15:54                                                             ` Christoph Hellwig
  2017-11-29 19:18                                                               ` Christian Borntraeger
  0 siblings, 1 reply; 96+ messages in thread
From: Christoph Hellwig @ 2017-11-27 15:54 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Christoph Hellwig, Jens Axboe, Bart Van Assche, linux-block,
	linux-kernel, Thomas Gleixner

Can you try this git branch:

    git://git.infradead.org/users/hch/block.git blk-mq-hotplug-fix

Gitweb:

     http://git.infradead.org/users/hch/block.git/shortlog/refs/heads/blk-mq-hotplug-fix

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
  2017-11-27 15:54                                                             ` Christoph Hellwig
@ 2017-11-29 19:18                                                               ` Christian Borntraeger
  2017-11-29 19:36                                                                 ` Christian Borntraeger
  2017-12-04 16:21                                                                 ` Christoph Hellwig
  0 siblings, 2 replies; 96+ messages in thread
From: Christian Borntraeger @ 2017-11-29 19:18 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Bart Van Assche, linux-block, linux-kernel, Thomas Gleixner

Works fine under KVM with virtio-blk, but still hangs during boot in an LPAR.
FWIW, the system not only has SCSI disks via FCP but also DASDs, including the boot disk.
This seems to be the place where the system stops (see the sysrq-t output
at the bottom).


[    0.247484] Linux version 4.15.0-rc1+ (cborntra@s38lp08) (gcc version 6.3.1 20161221 (Red Hat 6.3.1-1.0.ibm) (GCC)) #229 SMP Wed Nov 29 20:05:35 CET 2017
[    0.247489] setup: Linux is running natively in 64-bit mode
[    0.247661] setup: The maximum memory size is 1048576MB
[    0.247670] setup: Reserving 1024MB of memory at 1047552MB for crashkernel (System RAM: 1047552MB)
[    0.247688] numa: NUMA mode: plain
[    0.247794] cpu: 64 configured CPUs, 0 standby CPUs
[    0.247834] cpu: The CPU configuration topology of the machine is: 0 0 4 2 3 8 / 4
[    0.248279] Write protected kernel read-only data: 12456k
[    0.265131] Zone ranges:
[    0.265134]   DMA      [mem 0x0000000000000000-0x000000007fffffff]
[    0.265136]   Normal   [mem 0x0000000080000000-0x000000ffffffffff]
[    0.265137] Movable zone start for each node
[    0.265138] Early memory node ranges
[    0.265139]   node   0: [mem 0x0000000000000000-0x000000ffffffffff]
[    0.265141] Initmem setup node 0 [mem 0x0000000000000000-0x000000ffffffffff]
[    7.445561] random: fast init done
[    7.449194] percpu: Embedded 23 pages/cpu @000000fbbe600000 s56064 r8192 d29952 u94208
[    7.449380] Built 1 zonelists, mobility grouping on.  Total pages: 264241152
[    7.449381] Policy zone: Normal
[    7.449384] Kernel command line: elevator=deadline audit_enable=0 audit=0 audit_debug=0 selinux=0 crashkernel=1024M printk.time=1 zfcp.dbfsize=100 dasd=241c,241d,241e,241f root=/dev/dasda1 kvm.nested=1  BOOT_IMAGE=0
[    7.449420] audit: disabled (until reboot)
[    7.450513] log_buf_len individual max cpu contribution: 4096 bytes
[    7.450514] log_buf_len total cpu_extra contributions: 1044480 bytes
[    7.450515] log_buf_len min size: 131072 bytes
[    7.450788] log_buf_len: 2097152 bytes
[    7.450789] early log buf free: 125076(95%)
[   11.040620] Memory: 1055873868K/1073741824K available (8248K kernel code, 1078K rwdata, 4204K rodata, 812K init, 700K bss, 17867956K reserved, 0K cma-reserved)
[   11.040938] SLUB: HWalign=256, Order=0-3, MinObjects=0, CPUs=256, Nodes=1
[   11.040969] ftrace: allocating 26506 entries in 104 pages
[   11.051476] Hierarchical RCU implementation.
[   11.051476]  RCU event tracing is enabled.
[   11.051478]  RCU debug extended QS entry/exit.
[   11.053263] NR_IRQS: 3, nr_irqs: 3, preallocated irqs: 3
[   11.053444] clocksource: tod: mask: 0xffffffffffffffff max_cycles: 0x3b0a9be803b0a9, max_idle_ns: 1805497147909793 ns
[   11.160192] console [ttyS0] enabled
[   11.308228] pid_max: default: 262144 minimum: 2048
[   11.308298] Security Framework initialized
[   11.308300] SELinux:  Disabled at boot.
[   11.354028] Dentry cache hash table entries: 33554432 (order: 16, 268435456 bytes)
[   11.376945] Inode-cache hash table entries: 16777216 (order: 15, 134217728 bytes)
[   11.377685] Mount-cache hash table entries: 524288 (order: 10, 4194304 bytes)
[   11.378401] Mountpoint-cache hash table entries: 524288 (order: 10, 4194304 bytes)
[   11.378984] Hierarchical SRCU implementation.
[   11.380032] smp: Bringing up secondary CPUs ...
[   11.393634] smp: Brought up 1 node, 64 CPUs
[   11.585458] devtmpfs: initialized
[   11.588589] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604462750000 ns
[   11.588998] futex hash table entries: 65536 (order: 12, 16777216 bytes)
[   11.591926] NET: Registered protocol family 16
[   11.596413] HugeTLB registered 1.00 MiB page size, pre-allocated 0 pages
[   11.597604] SCSI subsystem initialized
[   11.597611] pps_core: LinuxPPS API ver. 1 registered
[   11.597612] pps_core: Software ver. 5.3.6 - Copyright 2005-2007 Rodolfo Giometti <giometti@linux.it>
[   11.597614] PTP clock support registered
[   11.599088] NetLabel: Initializing
[   11.599089] NetLabel:  domain hash size = 128
[   11.599090] NetLabel:  protocols = UNLABELED CIPSOv4 CALIPSO
[   11.599101] NetLabel:  unlabeled traffic allowed by default
[   11.612542] PCI host bridge to bus 0000:00
[   11.612546] pci_bus 0000:00: root bus resource [mem 0x8000000000000000-0x80000000007fffff 64bit pref]
[   11.612548] pci_bus 0000:00: No busn resource found for root bus, will use [bus 00-ff]
[   11.616458] iommu: Adding device 0000:00:00.0 to group 0
[   12.291894] VFS: Disk quotas dquot_6.6.0
[   12.291942] VFS: Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
[   12.292226] NET: Registered protocol family 2
[   12.292662] TCP established hash table entries: 524288 (order: 10, 4194304 bytes)
[   12.294559] TCP bind hash table entries: 65536 (order: 8, 1048576 bytes)
[   12.295008] TCP: Hash tables configured (established 524288 bind 65536)
[   12.295229] UDP hash table entries: 65536 (order: 9, 2097152 bytes)
[   12.296173] UDP-Lite hash table entries: 65536 (order: 9, 2097152 bytes)
[   12.297343] NET: Registered protocol family 1
[   12.301053] workingset: timestamp_bits=42 max_order=28 bucket_order=0
[   12.304670] NET: Registered protocol family 38
[   12.304694] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 250)
[   12.304939] io scheduler noop registered
[   12.304940] io scheduler deadline registered (default)
[   12.304975] io scheduler cfq registered
[   12.304977] io scheduler mq-deadline registered (default)
[   12.304977] io scheduler kyber registered
[   12.305949] atomic64_test: passed
[   12.305985] hvc_iucv: The z/VM IUCV HVC device driver cannot be used without z/VM
[   12.317868] loop: module loaded
[   12.318153] tun: Universal TUN/TAP device driver, 1.6
[   12.318240] mlx4_core: Mellanox ConnectX core driver v4.0-0
[   12.318251] mlx4_core: Initializing 0000:00:00.0
[   12.318324] mlx4_core 0000:00:00.0: enabling device (0000 -> 0002)
[   12.319389] mlx4_core 0000:00:00.0: Detected virtual function - running in slave mode
[   12.319448] mlx4_core 0000:00:00.0: Sending reset
[   12.320791] mlx4_core 0000:00:00.0: Sending vhcr0
[   12.326014] mlx4_core 0000:00:00.0: Requested number of MACs is too much for port 1, reducing to 64
[   12.326016] mlx4_core 0000:00:00.0: Requested number of VLANs is too much for port 1, reducing to 1
[   12.326240] mlx4_core 0000:00:00.0: HCA minimum page size:512
[   12.327666] mlx4_core 0000:00:00.0: Timestamping is not supported in slave mode
[   12.537132] mlx4_en: Mellanox ConnectX HCA Ethernet driver v4.0-0
[   12.537646] mlx4_en 0000:00:00.0: Activating port:1
[   12.578197] mlx4_en: 0000:00:00.0: Port 1: Using 32 TX rings
[   12.578201] mlx4_en: 0000:00:00.0: Port 1: Using 8 RX rings
[   12.578676] mlx4_en: 0000:00:00.0: Port 1: Initializing port
[   12.579202] mlx4_en 0000:00:00.0: Activating port:2
[   12.613180] mlx4_en: 0000:00:00.0: Port 2: Using 32 TX rings
[   12.613182] mlx4_en: 0000:00:00.0: Port 2: Using 8 RX rings
[   12.613592] mlx4_en: 0000:00:00.0: Port 2: Initializing port
[   12.614141] VFIO - User Level meta-driver version: 0.3
[   12.614231] mousedev: PS/2 mouse device common for all mice
[   12.614247] IR NEC protocol handler initialized
[   12.614248] IR RC5(x/sz) protocol handler initialized
[   12.614249] IR RC6 protocol handler initialized
[   12.614250] IR JVC protocol handler initialized
[   12.614251] IR Sony protocol handler initialized
[   12.614252] IR SANYO protocol handler initialized
[   12.614253] IR Sharp protocol handler initialized
[   12.614254] IR MCE Keyboard/mouse protocol handler initialized
[   12.614255] IR XMP protocol handler initialized
[   12.614318] device-mapper: uevent: version 1.0.3
[   12.614376] device-mapper: ioctl: 4.37.0-ioctl (2017-09-20) initialised: dm-devel@redhat.com
[   12.614856] cio: Channel measurement facility initialized using format extended (mode autodetected)
[   12.615194] Discipline DIAG cannot be used without z/VM
[   12.619692] dasd-eckd 0.0.241c: A channel path to the device has become operational
[   12.619847] dasd-eckd 0.0.241e: A channel path to the device has become operational
[   12.619992] dasd-eckd 0.0.241d: A channel path to the device has become operational
[   12.620344] dasd-eckd 0.0.241f: A channel path to the device has become operational
[   12.621880] dasd-eckd 0.0.241e: New DASD 3390/0C (CU 3990/01) with 30051 cylinders, 15 heads, 224 sectors
[   12.622097] dasd-eckd 0.0.241d: New DASD 3390/0C (CU 3990/01) with 30051 cylinders, 15 heads, 224 sectors
[   12.622286] dasd-eckd 0.0.241f: New DASD 3390/0C (CU 3990/01) with 30051 cylinders, 15 heads, 224 sectors
[   12.622519] dasd-eckd 0.0.241c: New DASD 3390/0C (CU 3990/01) with 30051 cylinders, 15 heads, 224 sectors
[   12.637350] dasd-eckd 0.0.241c: DASD with 4 KB/block, 21636720 KB total size, 48 KB/track, compatible disk layout
[   12.642780] dasd-eckd 0.0.241d: DASD with 4 KB/block, 21636720 KB total size, 48 KB/track, compatible disk layout
[   12.644616]  dasdb:VOL1/  0X241D: dasdb1
[   12.644943] dasd-eckd 0.0.241e: DASD with 4 KB/block, 21636720 KB total size, 48 KB/track, compatible disk layout
[   12.645439] dasd-eckd 0.0.241f: DASD with 4 KB/block, 21636720 KB total size, 48 KB/track, compatible disk layout
[   12.647222]  dasda:VOL1/  0X241C: dasda1
[   12.651236]  dasdc:VOL1/  0X241E: dasdc1
[   12.651704]  dasdd:VOL1/  0X241F: dasdd1
[   13.171832] drop_monitor: Initializing network drop monitor service
[   13.172016] Initializing XFRM netlink socket
[   13.172105] NET: Registered protocol family 10
[   13.172902] Segment Routing with IPv6
[   13.172915] mip6: Mobile IPv6
[   13.172916] NET: Registered protocol family 17
[   13.172923] Key type dns_resolver registered
[   13.173033] registered taskstats version 1
[   13.173394] Key type encrypted registered
[   13.173665] md: Waiting for all devices to be available before autodetect
[   13.173667] md: If you don't use raid, use raid=noautodetect
[   13.173894] md: Autodetecting RAID arrays.
[   13.173896] md: autorun ...
[   13.173896] md: ... autorun DONE.
[   13.174405] EXT4-fs (dasda1): couldn't mount as ext3 due to feature incompatibilities
[   13.174647] EXT4-fs (dasda1): couldn't mount as ext2 due to feature incompatibilities
[   13.199229] EXT4-fs (dasda1): mounted filesystem with ordered data mode. Opts: (null)
[   13.199233] VFS: Mounted root (ext4 filesystem) readonly on device 94:1.
[   16.773545] random: crng init done
[  112.413804] sysrq: SysRq : Show State
[  112.413809]   task                        PC stack   pid father
[  112.413811] swapper/0       D    0     1      0 0x00000000
[  112.413814] Call Trace:
[  112.413820] ([<0000000000905458>] __schedule+0x398/0x850)
[  112.413821]  [<000000000090595a>] schedule+0x4a/0xb8
[  112.413824]  [<000000000019e8c4>] io_schedule+0x34/0x58
[  112.413826]  [<00000000009064f6>] bit_wait_io+0x2e/0x90
[  112.413827]  [<0000000000905fe8>] __wait_on_bit+0xb8/0x110
[  112.413828]  [<00000000009060de>] out_of_line_wait_on_bit+0x9e/0xb0
[  112.413833]  [<000000000041e5ba>] __ext4_get_inode_loc+0x52a/0x570
[  112.413836]  [<0000000000422664>] ext4_iget+0x7c/0xc28
[  112.413839]  [<000000000043cf5a>] ext4_lookup+0x12a/0x238
[  112.413843]  [<00000000003620b6>] lookup_slow+0xae/0x198
[  112.413844]  [<00000000003657a0>] walk_component+0x210/0x358
[  112.413845]  [<000000000036629a>] path_lookupat+0xe2/0x278
[  112.413847]  [<0000000000368174>] filename_lookup+0x9c/0x160
[  112.413848]  [<000000000036835c>] user_path_at_empty+0x5c/0x70
[  112.413851]  [<0000000000380b94>] do_mount+0x74/0xd10
[  112.413853]  [<0000000000381c1c>] SyS_mount+0xa4/0x108
[  112.413857]  [<00000000005c02f8>] devtmpfs_mount+0x60/0xc0
[  112.413860]  [<0000000000e395c6>] prepare_namespace+0x18e/0x1c0
[  112.413861]  [<0000000000e38e46>] kernel_init_freeable+0x26e/0x288
[  112.413865]  [<0000000000900572>] kernel_init+0x2a/0x150
[  112.413868]  [<000000000090a9c2>] kernel_thread_starter+0x6/0xc
[  112.413869]  [<000000000090a9bc>] kernel_thread_starter+0x0/0xc
[  112.413870] kthreadd        S    0     2      0 0x00000000
[  112.418205]  [<000000000018ee66>] kthread+0x13e/0x160
[  112.418206]  [<000000000090a9c2>] kernel_thread_starter+0x6/0xc
[  112.418207]  [<000000000090a9bc>] kernel_thread_starter+0x0/0xc
[  112.418208] kworker/51:1    I    0   441      2 0x00000000
[  112.418210] Call Trace:
[  112.418211] ([<0000000000905458>] __schedule+0x398/0x850)
[  112.418213]  [<000000000090595a>] schedule+0x4a/0xb8
[  112.418214]  [<00000000001882ec>] worker_thread+0xe4/0x4f8
[  112.418215]  [<000000000018ee66>] kthread+0x13e/0x160
[  112.418217]  [<000000000090a9c2>] kernel_thread_starter+0x6/0xc
[  112.418218]  [<000000000090a9bc>] kernel_thread_starter+0x0/0xc
[  112.418219] kworker/53:1    I    0   442      2 0x00000000
[  112.418221] Call Trace:
[  112.418222] ([<0000000000905458>] __schedule+0x398/0x850)
[  112.418223]  [<000000000090595a>] schedule+0x4a/0xb8
[  112.418225]  [<00000000001882ec>] worker_thread+0xe4/0x4f8
[  112.418226]  [<000000000018ee66>] kthread+0x13e/0x160
[  112.418227]  [<000000000090a9c2>] kernel_thread_starter+0x6/0xc
[  112.418229]  [<000000000090a9bc>] kernel_thread_starter+0x0/0xc
[  112.418230] kworker/52:1    I    0   443      2 0x00000000
[  112.418231] Call Trace:
[  112.418233] ([<0000000000905458>] __schedule+0x398/0x850)
[  112.418234]  [<000000000090595a>] schedule+0x4a/0xb8
[  112.418235]  [<00000000001882ec>] worker_thread+0xe4/0x4f8
[  112.418237]  [<000000000018ee66>] kthread+0x13e/0x160
[  112.418238]  [<000000000090a9c2>] kernel_thread_starter+0x6/0xc
[  112.418239]  [<000000000090a9bc>] kernel_thread_starter+0x0/0xc
[  112.418240] kworker/54:1    I    0   444      2 0x00000000
[  112.418242] Call Trace:
[  112.418243] ([<0000000000905458>] __schedule+0x398/0x850)
[  112.418244]  [<000000000090595a>] schedule+0x4a/0xb8
[  112.418246]  [<00000000001882ec>] worker_thread+0xe4/0x4f8
[  112.418247]  [<000000000018ee66>] kthread+0x13e/0x160
[  112.418249]  [<000000000090a9c2>] kernel_thread_starter+0x6/0xc
[  112.418250]  [<000000000090a9bc>] kernel_thread_starter+0x0/0xc
[  112.418251] kworker/55:1    I    0   445      2 0x00000000
[  112.418253] Call Trace:
[  112.418254] ([<0000000000905458>] __schedule+0x398/0x850)
[  112.418255]  [<000000000090595a>] schedule+0x4a/0xb8
[  112.418256]  [<00000000001882ec>] worker_thread+0xe4/0x4f8
[  112.418258]  [<000000000018ee66>] kthread+0x13e/0x160
[  112.418259]  [<000000000090a9c2>] kernel_thread_starter+0x6/0xc
[  112.418260]  [<000000000090a9bc>] kernel_thread_starter+0x0/0xc
[  112.418262] kworker/57:1    I    0   446      2 0x00000000
[  112.418263] Call Trace:
[  112.418265] ([<0000000000905458>] __schedule+0x398/0x850)
[  112.418266]  [<000000000090595a>] schedule+0x4a/0xb8
[  112.418267]  [<00000000001882ec>] worker_thread+0xe4/0x4f8
[  112.418268]  [<000000000018ee66>] kthread+0x13e/0x160
[  112.418270]  [<000000000090a9c2>] kernel_thread_starter+0x6/0xc
[  112.418271]  [<000000000090a9bc>] kernel_thread_starter+0x0/0xc
[  112.418272] kworker/56:1    I    0   447      2 0x00000000
[  112.418274] Call Trace:
[  112.418275] ([<0000000000905458>] __schedule+0x398/0x850)
[  112.418276]  [<000000000090595a>] schedule+0x4a/0xb8
[  112.418278]  [<00000000001882ec>] worker_thread+0xe4/0x4f8
[  112.418279]  [<000000000018ee66>] kthread+0x13e/0x160
[  112.418280]  [<000000000090a9c2>] kernel_thread_starter+0x6/0xc
[  112.418282]  [<000000000090a9bc>] kernel_thread_starter+0x0/0xc
[  112.418283] kworker/59:1    I    0   448      2 0x00000000
[  112.418284] Call Trace:
[  112.418286] ([<0000000000905458>] __schedule+0x398/0x850)
[  112.418287]  [<000000000090595a>] schedule+0x4a/0xb8
[  112.418288]  [<00000000001882ec>] worker_thread+0xe4/0x4f8
[  112.418290]  [<000000000018ee66>] kthread+0x13e/0x160
[  112.418291]  [<000000000090a9c2>] kernel_thread_starter+0x6/0xc
[  112.418292]  [<000000000090a9bc>] kernel_thread_starter+0x0/0xc
[  112.418293] kworker/58:1    I    0   449      2 0x00000000
[  112.418295] Call Trace:
[  112.418296] ([<0000000000905458>] __schedule+0x398/0x850)
[  112.418297]  [<000000000090595a>] schedule+0x4a/0xb8
[  112.418299]  [<00000000001882ec>] worker_thread+0xe4/0x4f8
[  112.418300]  [<000000000018ee66>] kthread+0x13e/0x160
[  112.418301]  [<000000000090a9c2>] kernel_thread_starter+0x6/0xc
[  112.418303]  [<000000000090a9bc>] kernel_thread_starter+0x0/0xc
[  112.418304] kworker/61:1    I    0   450      2 0x00000000
[  112.418305] Call Trace:
[  112.418306] ([<0000000000905458>] __schedule+0x398/0x850)
[  112.418308]  [<000000000090595a>] schedule+0x4a/0xb8
[  112.418309]  [<00000000001882ec>] worker_thread+0xe4/0x4f8
[  112.418311]  [<000000000018ee66>] kthread+0x13e/0x160
[  112.418312]  [<000000000090a9c2>] kernel_thread_starter+0x6/0xc
[  112.418313]  [<000000000090a9bc>] kernel_thread_starter+0x0/0xc
[  112.418315] kworker/63:1    I    0   451      2 0x00000000
[  112.418316] Call Trace:
[  112.418318] ([<0000000000905458>] __schedule+0x398/0x850)
[  112.418319]  [<000000000090595a>] schedule+0x4a/0xb8
[  112.418320]  [<00000000001882ec>] worker_thread+0xe4/0x4f8
[  112.418322]  [<000000000018ee66>] kthread+0x13e/0x160
[  112.418323]  [<000000000090a9c2>] kernel_thread_starter+0x6/0xc
[  112.418324]  [<000000000090a9bc>] kernel_thread_starter+0x0/0xc
[  112.418325] kworker/0:1H    I    0   452      2 0x00000000
[  112.418327] Call Trace:
[  112.418328] ([<0000000000905458>] __schedule+0x398/0x850)
[  112.418329]  [<000000000090595a>] schedule+0x4a/0xb8
[  112.418330]  [<00000000001882ec>] worker_thread+0xe4/0x4f8
[  112.418332]  [<000000000018ee66>] kthread+0x13e/0x160
[  112.418333]  [<000000000090a9c2>] kernel_thread_starter+0x6/0xc
[  112.418335]  [<000000000090a9bc>] kernel_thread_starter+0x0/0xc
[  112.418336] jbd2/dasda1-8   S    0   453      2 0x00000000
[  112.418337] Call Trace:
[  112.418338] ([<0000000000905458>] __schedule+0x398/0x850)
[  112.418339]  [<000000000090595a>] schedule+0x4a/0xb8
[  112.418343]  [<00000000004725be>] kjournald2+0x386/0x3c8
[  112.418345]  [<000000000018ee66>] kthread+0x13e/0x160
[  112.418346]  [<000000000090a9c2>] kernel_thread_starter+0x6/0xc
[  112.418348]  [<000000000090a9bc>] kernel_thread_starter+0x0/0xc
[  112.418349] ext4-rsv-conver I    0   454      2 0x00000000
[  112.418350] Call Trace:
[  112.418352] ([<0000000000905458>] __schedule+0x398/0x850)
[  112.418353]  [<000000000090595a>] schedule+0x4a/0xb8
[  112.418354]  [<0000000000189008>] rescuer_thread+0x3f8/0x460
[  112.418356]  [<000000000018ee66>] kthread+0x13e/0x160
[  112.418357]  [<000000000090a9c2>] kernel_thread_starter+0x6/0xc
[  112.418358]  [<000000000090a9bc>] kernel_thread_starter+0x0/0xc
[  112.418360] kworker/62:1H   I    0   455      2 0x00000000
[  112.418361] Call Trace:
[  112.418362] ([<0000000000905458>] __schedule+0x398/0x850)
[  112.418364]  [<000000000090595a>] schedule+0x4a/0xb8
[  112.418365]  [<00000000001882ec>] worker_thread+0xe4/0x4f8
[  112.418366]  [<000000000018ee66>] kthread+0x13e/0x160
[  112.418368]  [<000000000090a9c2>] kernel_thread_starter+0x6/0xc
[  112.418369]  [<000000000090a9bc>] kernel_thread_starter+0x0/0xc
[  112.418370] Showing busy workqueues and worker pools:
[  112.418407] workqueue events: flags=0x0
[  112.418426]   pwq 4: cpus=2 node=0 flags=0x0 nice=0 active=1/256
[  112.418429]     in-flight: 343:ctrlchar_handle_sysrq
[  112.418837] workqueue kblockd: flags=0x18
[  112.418855]   pwq 131: cpus=65 node=0 flags=0x4 nice=-20 active=1/256
[  112.418858]     pending: blk_mq_run_work_fn
[  112.419188] pool 4: cpus=2 node=0 flags=0x0 nice=0 hung=0s workers=2 idle: 20


On 11/27/2017 04:54 PM, Christoph Hellwig wrote:
> Can you try this git branch:
> 
>     git://git.infradead.org/users/hch/block.git blk-mq-hotplug-fix
> 
> Gitweb:
> 
>      http://git.infradead.org/users/hch/block.git/shortlog/refs/heads/blk-mq-hotplug-fix
> 

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
  2017-11-29 19:18                                                               ` Christian Borntraeger
@ 2017-11-29 19:36                                                                 ` Christian Borntraeger
  2017-12-04 16:21                                                                 ` Christoph Hellwig
  1 sibling, 0 replies; 96+ messages in thread
From: Christian Borntraeger @ 2017-11-29 19:36 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Bart Van Assche, linux-block, linux-kernel, Thomas Gleixner


On 11/29/2017 08:18 PM, Christian Borntraeger wrote:
> Works fine under KVM with virtio-blk, but still hangs during boot in an LPAR.
> FWIW, the system not only has scsi disks via fcp but also DASDs as a boot disk.
> Seems that this is the place where the system stops. (see the sysrq-t output
> at the bottom).

FWIW, the failing kernel had CONFIG_NR_CPUS=256 and 32 CPUs (with SMT2), i.e. 64 threads;
with CONFIG_NR_CPUS=16 the system booted fine.

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
  2017-11-29 19:18                                                               ` Christian Borntraeger
  2017-11-29 19:36                                                                 ` Christian Borntraeger
@ 2017-12-04 16:21                                                                 ` Christoph Hellwig
  2017-12-06 12:25                                                                     ` Christian Borntraeger
  1 sibling, 1 reply; 96+ messages in thread
From: Christoph Hellwig @ 2017-12-04 16:21 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Christoph Hellwig, Jens Axboe, Bart Van Assche, linux-block,
	linux-kernel, Thomas Gleixner

On Wed, Nov 29, 2017 at 08:18:09PM +0100, Christian Borntraeger wrote:
> Works fine under KVM with virtio-blk, but still hangs during boot in an LPAR.
> FWIW, the system not only has scsi disks via fcp but also DASDs as a boot disk.
> Seems that this is the place where the system stops. (see the sysrq-t output
> at the bottom).

Can you check which of the patches in the tree is the culprit?

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
  2017-12-04 16:21                                                                 ` Christoph Hellwig
@ 2017-12-06 12:25                                                                     ` Christian Borntraeger
  0 siblings, 0 replies; 96+ messages in thread
From: Christian Borntraeger @ 2017-12-06 12:25 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Bart Van Assche, linux-block, linux-kernel,
	Thomas Gleixner, Stefan Haberland, linux-s390,
	Martin Schwidefsky

On 12/04/2017 05:21 PM, Christoph Hellwig wrote:
> On Wed, Nov 29, 2017 at 08:18:09PM +0100, Christian Borntraeger wrote:
>> Works fine under KVM with virtio-blk, but still hangs during boot in an LPAR.
>> FWIW, the system not only has scsi disks via fcp but also DASDs as a boot disk.
>> Seems that this is the place where the system stops. (see the sysrq-t output
>> at the bottom).
> 
> Can you check which of the patches in the tree is the culprit?


From this branch

    git://git.infradead.org/users/hch/block.git blk-mq-hotplug-fix

commit 11b2025c3326f7096ceb588c3117c7883850c068    -> bad
    blk-mq: create a blk_mq_ctx for each possible CPU
does not boot on DASD and 
commit 9c6ae239e01ae9a9f8657f05c55c4372e9fc8bcc    -> good
   genirq/affinity: assign vectors to all possible CPUs
does boot with DASD disks.

Also adding Stefan Haberland if he has an idea why this fails on DASD and adding Martin (for the
s390 irq handling code).


Some history:
I got this warning
"WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)"
since 4.13 (and also in 4.12 stable)
on CPU hotplug of previously unavailable CPUs (real hotplug, not offline/online).

This was introduced with 

 blk-mq: Create hctx for each present CPU
    commit 4b855ad37194f7bdbb200ce7a1c7051fecb56a08 

And Christoph is currently working on a fix. The fixed kernel does boot with virtio-blk and
it fixes the warning but it hangs (outstanding I/O) with dasd disks.

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
@ 2017-12-06 12:25                                                                     ` Christian Borntraeger
  0 siblings, 0 replies; 96+ messages in thread
From: Christian Borntraeger @ 2017-12-06 12:25 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Bart Van Assche, linux-block, linux-kernel,
	Thomas Gleixner, Stefan Haberland, linux-s390,
	Martin Schwidefsky

On 12/04/2017 05:21 PM, Christoph Hellwig wrote:
> On Wed, Nov 29, 2017 at 08:18:09PM +0100, Christian Borntraeger wrote:
>> Works fine under KVM with virtio-blk, but still hangs during boot in an LPAR.
>> FWIW, the system not only has scsi disks via fcp but also DASDs as a boot disk.
>> Seems that this is the place where the system stops. (see the sysrq-t output
>> at the bottom).
> 
> Can you check which of the patches in the tree is the culprit?


From this branch

    git://git.infradead.org/users/hch/block.git blk-mq-hotplug-fix

commit 11b2025c3326f7096ceb588c3117c7883850c068    -> bad
    blk-mq: create a blk_mq_ctx for each possible CPU
does not boot on DASD and 
commit 9c6ae239e01ae9a9f8657f05c55c4372e9fc8bcc    -> good
   genirq/affinity: assign vectors to all possible CPUs
does boot with DASD disks.

Also adding Stefan Haberland if he has an idea why this fails on DASD and adding Martin (for the
s390 irq handling code).


Some history:
I got this warning
"WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)"
since 4.13 (and also in 4.12 stable)
on CPU hotplug of previously unavailable CPUs (real hotplug, no offline/online)

This was introduced with 

 blk-mq: Create hctx for each present CPU
    commit 4b855ad37194f7bdbb200ce7a1c7051fecb56a08 

And Christoph is currently working on a fix. The fixed kernel does boot with virtio-blk and
it fixes the warning but it hangs (outstanding I/O) with dasd disks.

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
  2017-12-06 12:25                                                                     ` Christian Borntraeger
  (?)
@ 2017-12-06 23:29                                                                     ` Christoph Hellwig
  2017-12-07  9:20                                                                       ` Christian Borntraeger
  2017-12-18 13:56                                                                       ` Stefan Haberland
  -1 siblings, 2 replies; 96+ messages in thread
From: Christoph Hellwig @ 2017-12-06 23:29 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Christoph Hellwig, Jens Axboe, Bart Van Assche, linux-block,
	linux-kernel, Thomas Gleixner, Stefan Haberland, linux-s390,
	Martin Schwidefsky

On Wed, Dec 06, 2017 at 01:25:11PM +0100, Christian Borntraeger wrote:
> commit 11b2025c3326f7096ceb588c3117c7883850c068    -> bad
>     blk-mq: create a blk_mq_ctx for each possible CPU
> does not boot on DASD and 
> commit 9c6ae239e01ae9a9f8657f05c55c4372e9fc8bcc    -> good
>    genirq/affinity: assign vectors to all possible CPUs
> does boot with DASD disks.
> 
> Also adding Stefan Haberland if he has an idea why this fails on DASD and adding Martin (for the
> s390 irq handling code).

That is interesting as it really isn't related to interrupts at all,
it just ensures that possible CPUs are set in ->cpumask.

I guess we'd really want:

e005655c389e3d25bf3e43f71611ec12f3012de0
"blk-mq: only select online CPUs in blk_mq_hctx_next_cpu"

before this commit, but it seems like the whole stack didn't work for
you either.

I wonder if there is some weird thing about nr_cpu_ids in s390?
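
(A quick way to rule that out, as an editorial sketch rather than something posted in this
thread, would be a throwaway pr_info() early in blk-mq or driver init; nr_cpu_ids and the
num_*_cpus() helpers are standard kernel symbols:)

	pr_info("nr_cpu_ids=%u possible=%u present=%u online=%u\n",
		nr_cpu_ids, num_possible_cpus(), num_present_cpus(),
		num_online_cpus());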

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
  2017-12-06 23:29                                                                     ` Christoph Hellwig
@ 2017-12-07  9:20                                                                       ` Christian Borntraeger
  2017-12-14 17:32                                                                         ` Christian Borntraeger
  2017-12-18 13:56                                                                       ` Stefan Haberland
  1 sibling, 1 reply; 96+ messages in thread
From: Christian Borntraeger @ 2017-12-07  9:20 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Bart Van Assche, linux-block, linux-kernel,
	Thomas Gleixner, Stefan Haberland, linux-s390,
	Martin Schwidefsky



On 12/07/2017 12:29 AM, Christoph Hellwig wrote:
> On Wed, Dec 06, 2017 at 01:25:11PM +0100, Christian Borntraeger wrote:
> t > commit 11b2025c3326f7096ceb588c3117c7883850c068    -> bad
>>     blk-mq: create a blk_mq_ctx for each possible CPU
>> does not boot on DASD and 
>> commit 9c6ae239e01ae9a9f8657f05c55c4372e9fc8bcc    -> good
>>    genirq/affinity: assign vectors to all possible CPUs
>> does boot with DASD disks.
>>
>> Also adding Stefan Haberland if he has an idea why this fails on DASD and adding Martin (for the
>> s390 irq handling code).
> 
> That is interesting as it really isn't related to interrupts at all,
> it just ensures that possible CPUs are set in ->cpumask.
> 
> I guess we'd really want:
> 
> e005655c389e3d25bf3e43f71611ec12f3012de0
> "blk-mq: only select online CPUs in blk_mq_hctx_next_cpu"
> 
> before this commit, but it seems like the whole stack didn't work for
> your either.
> 
> I wonder if there is some weird thing about nr_cpu_ids in s390?

The problem starts as soon as NR_CPUS is larger than the number
of real CPUs.

A question: wouldn't your change in blk_mq_hctx_next_cpu fail if there is more than 1 non-online cpu?

E.g. don't we need something like this (whitespace and indent damaged)?

@@ -1241,11 +1241,11 @@ static int blk_mq_hctx_next_cpu(struct blk_mq_hw_ctx *hctx)
        if (--hctx->next_cpu_batch <= 0) {
                int next_cpu;
 
+               do  {
                next_cpu = cpumask_next(hctx->next_cpu, hctx->cpumask);
-               if (!cpu_online(next_cpu))
-                       next_cpu = cpumask_next(next_cpu, hctx->cpumask);
                if (next_cpu >= nr_cpu_ids)
                        next_cpu = cpumask_first(hctx->cpumask);
+               } while (!cpu_online(next_cpu));
 
                hctx->next_cpu = next_cpu;
                hctx->next_cpu_batch = BLK_MQ_CPU_WORK_BATCH;

it does not fix the issue, though (and it would be pretty inefficient for large NR_CPUS)
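
(For comparison, here is a sketch that avoids probing each candidate CPU by intersecting
hctx->cpumask with cpu_online_mask up front. This is only an editorial illustration assuming
the standard cpumask_next_and()/cpumask_first_and() helpers and at least one online CPU in the
mask; it is not a patch from this thread and would not by itself address the DASD hang.)

/* Would live in block/blk-mq.c, where struct blk_mq_hw_ctx and
 * BLK_MQ_CPU_WORK_BATCH are visible. */
static int blk_mq_hctx_next_cpu_sketch(struct blk_mq_hw_ctx *hctx)
{
	if (--hctx->next_cpu_batch <= 0) {
		int next_cpu;

		/* advance to the next CPU that is both mapped and online */
		next_cpu = cpumask_next_and(hctx->next_cpu, hctx->cpumask,
					    cpu_online_mask);
		if (next_cpu >= nr_cpu_ids)
			next_cpu = cpumask_first_and(hctx->cpumask,
						     cpu_online_mask);

		hctx->next_cpu = next_cpu;
		hctx->next_cpu_batch = BLK_MQ_CPU_WORK_BATCH;
	}
	return hctx->next_cpu;
}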

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
  2017-12-07  9:20                                                                       ` Christian Borntraeger
@ 2017-12-14 17:32                                                                         ` Christian Borntraeger
  0 siblings, 0 replies; 96+ messages in thread
From: Christian Borntraeger @ 2017-12-14 17:32 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Bart Van Assche, linux-block, linux-kernel,
	Thomas Gleixner, Stefan Haberland, linux-s390,
	Martin Schwidefsky

Independent of the issues with the DASD disks, this also seems not to enable
additional hardware queues.

With cpus 0,1 (and 248 cpus max)
I get cpus 0 and 2-247 attached to hardware context 0, and
cpu 1 attached to hardware context 1.

If I now add a cpu, this does not change anything: hardware contexts 2,3,4
etc. all have no CPU and hardware context 0 keeps sitting on all cpus (except 1).




On 12/07/2017 10:20 AM, Christian Borntraeger wrote:
> 
> 
> On 12/07/2017 12:29 AM, Christoph Hellwig wrote:
>> On Wed, Dec 06, 2017 at 01:25:11PM +0100, Christian Borntraeger wrote:
>> t > commit 11b2025c3326f7096ceb588c3117c7883850c068    -> bad
>>>     blk-mq: create a blk_mq_ctx for each possible CPU
>>> does not boot on DASD and 
>>> commit 9c6ae239e01ae9a9f8657f05c55c4372e9fc8bcc    -> good
>>>    genirq/affinity: assign vectors to all possible CPUs
>>> does boot with DASD disks.
>>>
>>> Also adding Stefan Haberland if he has an idea why this fails on DASD and adding Martin (for the
>>> s390 irq handling code).
>>
>> That is interesting as it really isn't related to interrupts at all,
>> it just ensures that possible CPUs are set in ->cpumask.
>>
>> I guess we'd really want:
>>
>> e005655c389e3d25bf3e43f71611ec12f3012de0
>> "blk-mq: only select online CPUs in blk_mq_hctx_next_cpu"
>>
>> before this commit, but it seems like the whole stack didn't work for
>> your either.
>>
>> I wonder if there is some weird thing about nr_cpu_ids in s390?
> 
> The problem starts as soon as NR_CPUS is larger than the number
> of real CPUs.
> 
> Aquestions Wouldnt your change in blk_mq_hctx_next_cpu fail if there is more than 1 non-online cpu:
> 
> e.g. dont we need something like (whitespace and indent damaged)
> 
> @@ -1241,11 +1241,11 @@ static int blk_mq_hctx_next_cpu(struct blk_mq_hw_ctx *hctx)
>         if (--hctx->next_cpu_batch <= 0) {
>                 int next_cpu;
>  
> +               do  {
>                 next_cpu = cpumask_next(hctx->next_cpu, hctx->cpumask);
> -               if (!cpu_online(next_cpu))
> -                       next_cpu = cpumask_next(next_cpu, hctx->cpumask);
>                 if (next_cpu >= nr_cpu_ids)
>                         next_cpu = cpumask_first(hctx->cpumask);
> +               } while (!cpu_online(next_cpu));
>  
>                 hctx->next_cpu = next_cpu;
>                 hctx->next_cpu_batch = BLK_MQ_CPU_WORK_BATCH;
> 
> it does not fix the issue, though (and it would be pretty inefficient for large NR_CPUS)
> 
> 

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
  2017-12-06 23:29                                                                     ` Christoph Hellwig
  2017-12-07  9:20                                                                       ` Christian Borntraeger
@ 2017-12-18 13:56                                                                       ` Stefan Haberland
  2017-12-20 15:47                                                                         ` Christian Borntraeger
  1 sibling, 1 reply; 96+ messages in thread
From: Stefan Haberland @ 2017-12-18 13:56 UTC (permalink / raw)
  To: Christoph Hellwig, Christian Borntraeger
  Cc: Jens Axboe, Bart Van Assche, linux-block, linux-kernel,
	Thomas Gleixner, linux-s390, Martin Schwidefsky

On 07.12.2017 00:29, Christoph Hellwig wrote:
> On Wed, Dec 06, 2017 at 01:25:11PM +0100, Christian Borntraeger wrote:
> t > commit 11b2025c3326f7096ceb588c3117c7883850c068    -> bad
>>      blk-mq: create a blk_mq_ctx for each possible CPU
>> does not boot on DASD and
>> commit 9c6ae239e01ae9a9f8657f05c55c4372e9fc8bcc    -> good
>>     genirq/affinity: assign vectors to all possible CPUs
>> does boot with DASD disks.
>>
>> Also adding Stefan Haberland if he has an idea why this fails on DASD and adding Martin (for the
>> s390 irq handling code).
> That is interesting as it really isn't related to interrupts at all,
> it just ensures that possible CPUs are set in ->cpumask.
>
> I guess we'd really want:
>
> e005655c389e3d25bf3e43f71611ec12f3012de0
> "blk-mq: only select online CPUs in blk_mq_hctx_next_cpu"
>
> before this commit, but it seems like the whole stack didn't work for
> your either.
>
> I wonder if there is some weird thing about nr_cpu_ids in s390?
> --
> To unsubscribe from this list: send the line "unsubscribe linux-s390" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

I tried this on my system and the blk-mq-hotplug-fix branch does not
boot for me either.
The disks come up and I/O works fine at first. At least the partition
detection and the EXT4-fs mount work.

But at some point in time the disks do not get any requests anymore.

I currently have no clue why.
I took a dump and had a look at the disk states and they are fine. No
errors in the logs or in our debug entries. Just idle DASD devices
waiting to be called for I/O requests.

Do you have any idea what I could have a look at?

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
  2017-12-18 13:56                                                                       ` Stefan Haberland
@ 2017-12-20 15:47                                                                         ` Christian Borntraeger
  2018-01-11  9:13                                                                             ` Ming Lei
  0 siblings, 1 reply; 96+ messages in thread
From: Christian Borntraeger @ 2017-12-20 15:47 UTC (permalink / raw)
  To: Stefan Haberland, Christoph Hellwig, Jens Axboe
  Cc: Bart Van Assche, linux-block, linux-kernel, Thomas Gleixner,
	linux-s390, Martin Schwidefsky

On 12/18/2017 02:56 PM, Stefan Haberland wrote:
> On 07.12.2017 00:29, Christoph Hellwig wrote:
>> On Wed, Dec 06, 2017 at 01:25:11PM +0100, Christian Borntraeger wrote:
>> t > commit 11b2025c3326f7096ceb588c3117c7883850c068    -> bad
>>>      blk-mq: create a blk_mq_ctx for each possible CPU
>>> does not boot on DASD and
>>> commit 9c6ae239e01ae9a9f8657f05c55c4372e9fc8bcc    -> good
>>>     genirq/affinity: assign vectors to all possible CPUs
>>> does boot with DASD disks.
>>>
>>> Also adding Stefan Haberland if he has an idea why this fails on DASD and adding Martin (for the
>>> s390 irq handling code).
>> That is interesting as it really isn't related to interrupts at all,
>> it just ensures that possible CPUs are set in ->cpumask.
>>
>> I guess we'd really want:
>>
>> e005655c389e3d25bf3e43f71611ec12f3012de0
>> "blk-mq: only select online CPUs in blk_mq_hctx_next_cpu"
>>
>> before this commit, but it seems like the whole stack didn't work for
>> your either.
>>
>> I wonder if there is some weird thing about nr_cpu_ids in s390?
>> -- 
>> To unsubscribe from this list: send the line "unsubscribe linux-s390" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
> 
> I tried this on my system and the blk-mq-hotplug-fix branch does not boot for me as well.
> The disks get up and running and I/O works fine. At least the partition detection and EXT4-fs mount works.
> 
> But at some point in time the disk do not get any requests.
> 
> I currently have no clue why.
> I took a dump and had a look at the disk states and they are fine. No error in the logs or in our debug entrys. Just empty DASD devices waiting to be called for I/O requests.
> 
> Do you have anything I could have a look at?

Jens, Christoph, so what do we do about this?
To summarize:
- commit 4b855ad37194f7 ("blk-mq: Create hctx for each present CPU") broke CPU hotplug.
- Jens' quick revert did fix the issue and did not break DASD support, but has some issues
with interrupt affinity.
- Christoph's patch set fixes the hotplug issue for virtio-blk but causes I/O hangs on DASDs (even
without hotplug).

Christian

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
  2017-12-20 15:47                                                                         ` Christian Borntraeger
@ 2018-01-11  9:13                                                                             ` Ming Lei
  0 siblings, 0 replies; 96+ messages in thread
From: Ming Lei @ 2018-01-11  9:13 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Stefan Haberland, Christoph Hellwig, Jens Axboe, Bart Van Assche,
	linux-block, linux-kernel, Thomas Gleixner, linux-s390,
	Martin Schwidefsky

On Wed, Dec 20, 2017 at 04:47:21PM +0100, Christian Borntraeger wrote:
> On 12/18/2017 02:56 PM, Stefan Haberland wrote:
> > On 07.12.2017 00:29, Christoph Hellwig wrote:
> >> On Wed, Dec 06, 2017 at 01:25:11PM +0100, Christian Borntraeger wrote:
> >> t > commit 11b2025c3326f7096ceb588c3117c7883850c068    -> bad
> >>>      blk-mq: create a blk_mq_ctx for each possible CPU
> >>> does not boot on DASD and
> >>> commit 9c6ae239e01ae9a9f8657f05c55c4372e9fc8bcc    -> good
> >>>     genirq/affinity: assign vectors to all possible CPUs
> >>> does boot with DASD disks.
> >>>
> >>> Also adding Stefan Haberland if he has an idea why this fails on DASD and adding Martin (for the
> >>> s390 irq handling code).
> >> That is interesting as it really isn't related to interrupts at all,
> >> it just ensures that possible CPUs are set in ->cpumask.
> >>
> >> I guess we'd really want:
> >>
> >> e005655c389e3d25bf3e43f71611ec12f3012de0
> >> "blk-mq: only select online CPUs in blk_mq_hctx_next_cpu"
> >>
> >> before this commit, but it seems like the whole stack didn't work for
> >> your either.
> >>
> >> I wonder if there is some weird thing about nr_cpu_ids in s390?
> > 
> > I tried this on my system and the blk-mq-hotplug-fix branch does not boot for me either.
> > The disks get up and running and I/O works fine. At least partition detection and the EXT4-fs mount work.
> >
> > But at some point in time the disks no longer get any requests.
> >
> > I currently have no clue why.
> > I took a dump and had a look at the disk states and they are fine. No error in the logs or in our debug entries. Just empty DASD devices waiting to be called for I/O requests.
> > 
> > Do you have anything I could have a look at?
> 
> Jens, Christoph, so what do we do about this?
> To summarize:
> - commit 4b855ad37194f7 ("blk-mq: Create hctx for each present CPU") broke CPU hotplug.
> - Jens' quick revert did fix the issue and did not break DASD support, but has some issues
> with interrupt affinity.
> - Christoph's patch set fixes the hotplug issue for virtio-blk but causes I/O hangs on DASDs (even
> without hotplug).

Hello,

This one is a valid use case for VMs, so I think we need to fix it.

Looks like there is an issue in the fourth patch ("blk-mq: only select online
CPUs in blk_mq_hctx_next_cpu"). I fixed it in the following tree; the other
three patches are the same as Christoph's:

	https://github.com/ming1/linux.git  v4.15-rc-block-for-next-cpuhot-fix

gitweb:
	https://github.com/ming1/linux/commits/v4.15-rc-block-for-next-cpuhot-fix

Could you test it and provide feedback?

BTW, if it doesn't help with this issue, could you boot from a normal disk first
and dump the blk-mq debugfs state of the DASD later?

Thanks, 
Ming
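
For reference, what that fourth patch is about, as a minimal userspace sketch
(the real code walks hctx->cpumask together with the online mask using the
kernel's cpumask helpers; the masks and queue layout below are invented for
illustration):

#include <stdio.h>
#include <stdbool.h>

#define NR_CPUS	8

/* possible CPUs mapped to this hw queue, and which of them are online */
static bool hctx_mask[NR_CPUS]   = { 1, 1, 1, 1, 1, 1, 1, 1 };
static bool online_mask[NR_CPUS] = { 1, 1, 0, 0, 1, 1, 0, 0 };

static int next_cpu;	/* round-robin cursor, like hctx->next_cpu */

/* pick the next CPU that is both mapped to the queue and currently online */
static int hctx_next_cpu(void)
{
	int i, cpu;

	for (i = 0; i < NR_CPUS; i++) {
		cpu = (next_cpu + i) % NR_CPUS;		/* wrap around the mask */
		if (hctx_mask[cpu] && online_mask[cpu]) {
			next_cpu = (cpu + 1) % NR_CPUS;
			return cpu;
		}
	}
	return -1;	/* no online CPU left in the mask */
}

int main(void)
{
	int i;

	for (i = 0; i < 6; i++)
		printf("run hw queue on cpu %d\n", hctx_next_cpu());
	return 0;
}

The delicate part is the wrap-around and the "nothing in the mask is online"
case; if the selection gets stuck there, a queue is simply never run again,
which would look exactly like the idle DASD devices Stefan describes.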

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
  2018-01-11  9:13                                                                             ` Ming Lei
  (?)
@ 2018-01-11  9:26                                                                             ` Stefan Haberland
  -1 siblings, 0 replies; 96+ messages in thread
From: Stefan Haberland @ 2018-01-11  9:26 UTC (permalink / raw)
  To: Ming Lei, Christian Borntraeger
  Cc: Christoph Hellwig, Jens Axboe, Bart Van Assche, linux-block,
	linux-kernel, Thomas Gleixner, linux-s390, Martin Schwidefsky

On 11.01.2018 10:13, Ming Lei wrote:
> On Wed, Dec 20, 2017 at 04:47:21PM +0100, Christian Borntraeger wrote:
>> On 12/18/2017 02:56 PM, Stefan Haberland wrote:
>>> On 07.12.2017 00:29, Christoph Hellwig wrote:
>>>> On Wed, Dec 06, 2017 at 01:25:11PM +0100, Christian Borntraeger wrote:
>>>> t > commit 11b2025c3326f7096ceb588c3117c7883850c068    -> bad
>>>>>       blk-mq: create a blk_mq_ctx for each possible CPU
>>>>> does not boot on DASD and
>>>>> commit 9c6ae239e01ae9a9f8657f05c55c4372e9fc8bcc    -> good
>>>>>      genirq/affinity: assign vectors to all possible CPUs
>>>>> does boot with DASD disks.
>>>>>
>>>>> Also adding Stefan Haberland if he has an idea why this fails on DASD and adding Martin (for the
>>>>> s390 irq handling code).
>>>> That is interesting as it really isn't related to interrupts at all,
>>>> it just ensures that possible CPUs are set in ->cpumask.
>>>>
>>>> I guess we'd really want:
>>>>
>>>> e005655c389e3d25bf3e43f71611ec12f3012de0
>>>> "blk-mq: only select online CPUs in blk_mq_hctx_next_cpu"
>>>>
>>>> before this commit, but it seems like the whole stack didn't work for
>>>> you either.
>>>>
>>>> I wonder if there is some weird thing about nr_cpu_ids in s390?
>>> I tried this on my system and the blk-mq-hotplug-fix branch does not boot for me either.
>>> The disks get up and running and I/O works fine. At least partition detection and the EXT4-fs mount work.
>>>
>>> But at some point in time the disks no longer get any requests.
>>>
>>> I currently have no clue why.
>>> I took a dump and had a look at the disk states and they are fine. No error in the logs or in our debug entries. Just empty DASD devices waiting to be called for I/O requests.
>>>
>>> Do you have anything I could have a look at?
>> Jens, Christoph, so what do we do about this?
>> To summarize:
>> - commit 4b855ad37194f7 ("blk-mq: Create hctx for each present CPU") broke CPU hotplug.
>> - Jens' quick revert did fix the issue and did not break DASD support, but has some issues
>> with interrupt affinity.
>> - Christoph's patch set fixes the hotplug issue for virtio-blk but causes I/O hangs on DASDs (even
>> without hotplug).
> Hello,
>
> This one is a valid use case for VMs, so I think we need to fix it.
>
> Looks like there is an issue in the fourth patch ("blk-mq: only select online
> CPUs in blk_mq_hctx_next_cpu"). I fixed it in the following tree; the other
> three patches are the same as Christoph's:
>
> 	https://github.com/ming1/linux.git  v4.15-rc-block-for-next-cpuhot-fix
>
> gitweb:
> 	https://github.com/ming1/linux/commits/v4.15-rc-block-for-next-cpuhot-fix
>
> Could you test it and provide feedback?
>
> BTW, if it doesn't help with this issue, could you boot from a normal disk first
> and dump the blk-mq debugfs state of the DASD later?
>
> Thanks,
> Ming
>

Hi,

Thanks for the patch. I suspected pretty much the same place.
I will test it ASAP.

Regards,
Stefan

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
  2018-01-11  9:13                                                                             ` Ming Lei
  (?)
  (?)
@ 2018-01-11 11:44                                                                             ` Christian Borntraeger
  2018-01-11 13:17                                                                               ` Stefan Haberland
  -1 siblings, 1 reply; 96+ messages in thread
From: Christian Borntraeger @ 2018-01-11 11:44 UTC (permalink / raw)
  To: Ming Lei
  Cc: Stefan Haberland, Christoph Hellwig, Jens Axboe, Bart Van Assche,
	linux-block, linux-kernel, Thomas Gleixner, linux-s390,
	Martin Schwidefsky



On 01/11/2018 10:13 AM, Ming Lei wrote:
> On Wed, Dec 20, 2017 at 04:47:21PM +0100, Christian Borntraeger wrote:
>> On 12/18/2017 02:56 PM, Stefan Haberland wrote:
>>> On 07.12.2017 00:29, Christoph Hellwig wrote:
>>>> On Wed, Dec 06, 2017 at 01:25:11PM +0100, Christian Borntraeger wrote:
>>>> t > commit 11b2025c3326f7096ceb588c3117c7883850c068    -> bad
>>>>>      blk-mq: create a blk_mq_ctx for each possible CPU
>>>>> does not boot on DASD and
>>>>> commit 9c6ae239e01ae9a9f8657f05c55c4372e9fc8bcc    -> good
>>>>>     genirq/affinity: assign vectors to all possible CPUs
>>>>> does boot with DASD disks.
>>>>>
>>>>> Also adding Stefan Haberland if he has an idea why this fails on DASD and adding Martin (for the
>>>>> s390 irq handling code).
>>>> That is interesting as it really isn't related to interrupts at all,
>>>> it just ensures that possible CPUs are set in ->cpumask.
>>>>
>>>> I guess we'd really want:
>>>>
>>>> e005655c389e3d25bf3e43f71611ec12f3012de0
>>>> "blk-mq: only select online CPUs in blk_mq_hctx_next_cpu"
>>>>
>>>> before this commit, but it seems like the whole stack didn't work for
>>>> you either.
>>>>
>>>> I wonder if there is some weird thing about nr_cpu_ids in s390?
>>>
>>> I tried this on my system and the blk-mq-hotplug-fix branch does not boot for me either.
>>> The disks get up and running and I/O works fine. At least partition detection and the EXT4-fs mount work.
>>>
>>> But at some point in time the disks no longer get any requests.
>>>
>>> I currently have no clue why.
>>> I took a dump and had a look at the disk states and they are fine. No error in the logs or in our debug entries. Just empty DASD devices waiting to be called for I/O requests.
>>>
>>> Do you have anything I could have a look at?
>>
>> Jens, Christoph, so what do we do about this?
>> To summarize:
>> - commit 4b855ad37194f7 ("blk-mq: Create hctx for each present CPU") broke CPU hotplug.
>> - Jens' quick revert did fix the issue and did not break DASD support, but has some issues
>> with interrupt affinity.
>> - Christoph's patch set fixes the hotplug issue for virtio-blk but causes I/O hangs on DASDs (even
>> without hotplug).
> 
> Hello,
> 
> This one is a valid use case for VMs, so I think we need to fix it.
>
> Looks like there is an issue in the fourth patch ("blk-mq: only select online
> CPUs in blk_mq_hctx_next_cpu"). I fixed it in the following tree; the other
> three patches are the same as Christoph's:
> 
> 	https://github.com/ming1/linux.git  v4.15-rc-block-for-next-cpuhot-fix
> 
> gitweb:
> 	https://github.com/ming1/linux/commits/v4.15-rc-block-for-next-cpuhot-fix
> 
> Could you test it and provide feedback?
>
> BTW, if it doesn't help with this issue, could you boot from a normal disk first
> and dump the blk-mq debugfs state of the DASD later?

That kernel seems to boot fine on my system with DASD disks.

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
  2018-01-11 11:44                                                                             ` Christian Borntraeger
@ 2018-01-11 13:17                                                                               ` Stefan Haberland
  0 siblings, 0 replies; 96+ messages in thread
From: Stefan Haberland @ 2018-01-11 13:17 UTC (permalink / raw)
  To: Christian Borntraeger, Ming Lei
  Cc: Christoph Hellwig, Jens Axboe, Bart Van Assche, linux-block,
	linux-kernel, Thomas Gleixner, linux-s390, Martin Schwidefsky

On 11.01.2018 12:44, Christian Borntraeger wrote:
>
> On 01/11/2018 10:13 AM, Ming Lei wrote:
>> On Wed, Dec 20, 2017 at 04:47:21PM +0100, Christian Borntraeger wrote:
>>> On 12/18/2017 02:56 PM, Stefan Haberland wrote:
>>>> On 07.12.2017 00:29, Christoph Hellwig wrote:
>>>>> On Wed, Dec 06, 2017 at 01:25:11PM +0100, Christian Borntraeger wrote:
>>>>> t > commit 11b2025c3326f7096ceb588c3117c7883850c068    -> bad
>>>>>>       blk-mq: create a blk_mq_ctx for each possible CPU
>>>>>> does not boot on DASD and
>>>>>> commit 9c6ae239e01ae9a9f8657f05c55c4372e9fc8bcc    -> good
>>>>>>      genirq/affinity: assign vectors to all possible CPUs
>>>>>> does boot with DASD disks.
>>>>>>
>>>>>> Also adding Stefan Haberland if he has an idea why this fails on DASD and adding Martin (for the
>>>>>> s390 irq handling code).
>>>>> That is interesting as it really isn't related to interrupts at all,
>>>>> it just ensures that possible CPUs are set in ->cpumask.
>>>>>
>>>>> I guess we'd really want:
>>>>>
>>>>> e005655c389e3d25bf3e43f71611ec12f3012de0
>>>>> "blk-mq: only select online CPUs in blk_mq_hctx_next_cpu"
>>>>>
>>>>> before this commit, but it seems like the whole stack didn't work for
>>>>> you either.
>>>>>
>>>>> I wonder if there is some weird thing about nr_cpu_ids in s390?
>>>> I tried this on my system and the blk-mq-hotplug-fix branch does not boot for me either.
>>>> The disks get up and running and I/O works fine. At least partition detection and the EXT4-fs mount work.
>>>>
>>>> But at some point in time the disks no longer get any requests.
>>>>
>>>> I currently have no clue why.
>>>> I took a dump and had a look at the disk states and they are fine. No error in the logs or in our debug entries. Just empty DASD devices waiting to be called for I/O requests.
>>>>
>>>> Do you have anything I could have a look at?
>>> Jens, Christoph, so what do we do about this?
>>> To summarize:
>>> - commit 4b855ad37194f7 ("blk-mq: Create hctx for each present CPU") broke CPU hotplug.
>>> - Jens' quick revert did fix the issue and did not break DASD support, but has some issues
>>> with interrupt affinity.
>>> - Christoph's patch set fixes the hotplug issue for virtio-blk but causes I/O hangs on DASDs (even
>>> without hotplug).
>> Hello,
>>
>> This one is a valid use case for VMs, so I think we need to fix it.
>>
>> Looks like there is an issue in the fourth patch ("blk-mq: only select online
>> CPUs in blk_mq_hctx_next_cpu"). I fixed it in the following tree; the other
>> three patches are the same as Christoph's:
>>
>> 	https://github.com/ming1/linux.git  v4.15-rc-block-for-next-cpuhot-fix
>>
>> gitweb:
>> 	https://github.com/ming1/linux/commits/v4.15-rc-block-for-next-cpuhot-fix
>>
>> Could you test it and provide feedback?
>>
>> BTW, if it doesn't help with this issue, could you boot from a normal disk first
>> and dump the blk-mq debugfs state of the DASD later?
> That kernel seems to boot fine on my system with DASD disks.
>

I did some regression testing and it works quite well. Boot works, and
attaching CPUs at runtime on z/VM and enabling them in Linux works as well.
I also ran some DASD online/offline and CPU enable/disable loops.

Regards,
Stefan
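
For anyone who wants to reproduce that kind of loop, a small sketch of toggling
one CPU via sysfs (needs root; the CPU number is only an example, and DASDs can
be cycled the same way through their /sys/bus/ccw/devices/<busid>/online
attribute):

#include <stdio.h>
#include <unistd.h>

int main(void)
{
	const char *path = "/sys/devices/system/cpu/cpu2/online";
	int i;

	for (i = 0; i < 10; i++) {
		FILE *f = fopen(path, "w");

		if (!f) {
			perror(path);
			return 1;
		}
		fputs(i % 2 ? "1" : "0", f);	/* offline, then online again */
		fclose(f);
		sleep(1);	/* give I/O a chance to hit the changed topology */
	}
	return 0;
}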

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
  2018-01-11  9:13                                                                             ` Ming Lei
                                                                                               ` (2 preceding siblings ...)
  (?)
@ 2018-01-11 17:46                                                                             ` Christoph Hellwig
  2018-01-12  1:16                                                                               ` Ming Lei
  -1 siblings, 1 reply; 96+ messages in thread
From: Christoph Hellwig @ 2018-01-11 17:46 UTC (permalink / raw)
  To: Ming Lei
  Cc: Christian Borntraeger, Stefan Haberland, Christoph Hellwig,
	Jens Axboe, Bart Van Assche, linux-block, linux-kernel,
	Thomas Gleixner, linux-s390, Martin Schwidefsky

Thanks for looking into this, Ming, I had missed it in my current
work overload.  Can you send the updated series to Jens?

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
  2018-01-11 17:46                                                                             ` Christoph Hellwig
@ 2018-01-12  1:16                                                                               ` Ming Lei
  0 siblings, 0 replies; 96+ messages in thread
From: Ming Lei @ 2018-01-12  1:16 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Christian Borntraeger, Stefan Haberland, Jens Axboe,
	Bart Van Assche, linux-block, linux-kernel, Thomas Gleixner,
	linux-s390, Martin Schwidefsky

On Thu, Jan 11, 2018 at 06:46:54PM +0100, Christoph Hellwig wrote:
> Thanks for looking into this, Ming, I had missed it in my current
> work overload.  Can you send the updated series to Jens?

OK, I will post it out soon.

Thanks,
Ming

^ permalink raw reply	[flat|nested] 96+ messages in thread

end of thread, other threads:[~2018-01-12  1:16 UTC | newest]

Thread overview: 96+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-11-17 14:42 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk Christian Borntraeger
2017-11-20 19:20 ` Bart Van Assche
2017-11-20 19:29   ` Christian Borntraeger
2017-11-20 19:29   ` Christian Borntraeger
2017-11-20 19:42     ` Jens Axboe
2017-11-20 19:42     ` Jens Axboe
2017-11-20 20:49       ` Christian Borntraeger
2017-11-20 20:49       ` Christian Borntraeger
2017-11-20 20:49         ` Christian Borntraeger
2017-11-20 20:52         ` Jens Axboe
2017-11-20 20:52           ` Jens Axboe
2017-11-21  8:35           ` Christian Borntraeger
2017-11-21  8:35             ` Christian Borntraeger
2017-11-21  9:50             ` Christian Borntraeger
2017-11-21  9:50             ` Christian Borntraeger
2017-11-21  9:50               ` Christian Borntraeger
2017-11-21 10:14               ` 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable) Christian Borntraeger
2017-11-21 10:14               ` Christian Borntraeger
2017-11-21 10:14                 ` Christian Borntraeger
2017-11-21 17:27                 ` Jens Axboe
2017-11-21 17:27                 ` Jens Axboe
2017-11-21 17:27                   ` Jens Axboe
2017-11-21 18:09                   ` Jens Axboe
2017-11-21 18:09                     ` Jens Axboe
2017-11-21 18:09                     ` Jens Axboe
2017-11-21 18:12                     ` Christian Borntraeger
2017-11-21 18:12                     ` Christian Borntraeger
2017-11-21 18:12                       ` Christian Borntraeger
2017-11-21 18:27                       ` Jens Axboe
2017-11-21 18:27                         ` Jens Axboe
2017-11-21 18:39                         ` Jens Axboe
2017-11-21 18:39                           ` Jens Axboe
2017-11-21 19:15                           ` Christian Borntraeger
2017-11-21 19:15                             ` Christian Borntraeger
2017-11-21 19:30                             ` Jens Axboe
2017-11-21 19:30                             ` Jens Axboe
2017-11-21 19:30                               ` Jens Axboe
2017-11-21 20:12                               ` Christian Borntraeger
2017-11-21 20:12                               ` Christian Borntraeger
2017-11-21 20:12                                 ` Christian Borntraeger
2017-11-21 20:14                                 ` Jens Axboe
2017-11-21 20:14                                   ` Jens Axboe
2017-11-21 20:19                                   ` Christian Borntraeger
2017-11-21 20:19                                     ` Christian Borntraeger
2017-11-21 20:21                                     ` Jens Axboe
2017-11-21 20:21                                       ` Jens Axboe
2017-11-21 20:31                                       ` Christian Borntraeger
2017-11-21 20:31                                       ` Christian Borntraeger
2017-11-21 20:31                                         ` Christian Borntraeger
2017-11-21 20:39                                         ` Jens Axboe
2017-11-21 20:39                                           ` Jens Axboe
2017-11-22  7:28                                           ` Christoph Hellwig
2017-11-22  7:28                                           ` Christoph Hellwig
2017-11-22  7:28                                             ` Christoph Hellwig
2017-11-22 14:46                                             ` Jens Axboe
2017-11-22 14:46                                               ` Jens Axboe
2017-11-23 14:34                                               ` Christoph Hellwig
2017-11-23 14:42                                                 ` Hannes Reinecke
2017-11-23 14:47                                                   ` Christoph Hellwig
2017-11-23 15:05                                                 ` Christian Borntraeger
2017-11-23 18:17                                                 ` Christian Borntraeger
2017-11-23 18:25                                                   ` Christoph Hellwig
2017-11-23 18:28                                                     ` Christian Borntraeger
2017-11-23 18:32                                                       ` Christoph Hellwig
2017-11-23 18:59                                                         ` Christian Borntraeger
2017-11-24 13:09                                                           ` Christian Borntraeger
2017-11-27 15:54                                                             ` Christoph Hellwig
2017-11-29 19:18                                                               ` Christian Borntraeger
2017-11-29 19:36                                                                 ` Christian Borntraeger
2017-12-04 16:21                                                                 ` Christoph Hellwig
2017-12-06 12:25                                                                   ` Christian Borntraeger
2017-12-06 12:25                                                                     ` Christian Borntraeger
2017-12-06 23:29                                                                     ` Christoph Hellwig
2017-12-07  9:20                                                                       ` Christian Borntraeger
2017-12-14 17:32                                                                         ` Christian Borntraeger
2017-12-18 13:56                                                                       ` Stefan Haberland
2017-12-20 15:47                                                                         ` Christian Borntraeger
2018-01-11  9:13                                                                           ` Ming Lei
2018-01-11  9:13                                                                             ` Ming Lei
2018-01-11  9:26                                                                             ` Stefan Haberland
2018-01-11 11:44                                                                             ` Christian Borntraeger
2018-01-11 13:17                                                                               ` Stefan Haberland
2018-01-11 17:46                                                                             ` Christoph Hellwig
2018-01-12  1:16                                                                               ` Ming Lei
2017-11-22 14:46                                             ` Jens Axboe
2017-11-21 20:39                                         ` Jens Axboe
2017-11-21 20:21                                     ` Jens Axboe
2017-11-21 20:19                                   ` Christian Borntraeger
2017-11-21 19:15                           ` Christian Borntraeger
2017-11-23 14:02                       ` Christoph Hellwig
2017-11-23 14:02                       ` Christoph Hellwig
2017-11-23 14:02                         ` Christoph Hellwig
2017-11-23 14:08                         ` Christoph Hellwig
2017-11-23 14:08                           ` Christoph Hellwig
2017-11-23 14:08                         ` Christoph Hellwig
2017-11-20 19:20 ` 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk Bart Van Assche
