* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
@ 2017-11-21 19:30 ` Jens Axboe
From: Jens Axboe @ 2017-11-21 19:30 UTC
To: Christian Borntraeger, Bart Van Assche, virtualization,
linux-block, mst, jasowang, linux-kernel, Christoph Hellwig
On 11/21/2017 12:15 PM, Christian Borntraeger wrote:
>
>
> On 11/21/2017 07:39 PM, Jens Axboe wrote:
>> On 11/21/2017 11:27 AM, Jens Axboe wrote:
>>> On 11/21/2017 11:12 AM, Christian Borntraeger wrote:
>>>>
>>>>
>>>> On 11/21/2017 07:09 PM, Jens Axboe wrote:
>>>>> On 11/21/2017 10:27 AM, Jens Axboe wrote:
>>>>>> On 11/21/2017 03:14 AM, Christian Borntraeger wrote:
>>>>>>> Bisect points to
>>>>>>>
>>>>>>> 1b5a7455d345b223d3a4658a9e5fce985b7998c1 is the first bad commit
>>>>>>> commit 1b5a7455d345b223d3a4658a9e5fce985b7998c1
>>>>>>> Author: Christoph Hellwig <hch@lst.de>
>>>>>>> Date: Mon Jun 26 12:20:57 2017 +0200
>>>>>>>
>>>>>>> blk-mq: Create hctx for each present CPU
>>>>>>>
>>>>>>> commit 4b855ad37194f7bdbb200ce7a1c7051fecb56a08 upstream.
>>>>>>>
>>>>>>> Currently we only create hctx for online CPUs, which can lead to a lot
>>>>>>> of churn due to frequent soft offline / online operations. Instead
>>>>>>> allocate one for each present CPU to avoid this and dramatically simplify
>>>>>>> the code.
>>>>>>>
>>>>>>> Signed-off-by: Christoph Hellwig <hch@lst.de>
>>>>>>> Reviewed-by: Jens Axboe <axboe@kernel.dk>
>>>>>>> Cc: Keith Busch <keith.busch@intel.com>
>>>>>>> Cc: linux-block@vger.kernel.org
>>>>>>> Cc: linux-nvme@lists.infradead.org
>>>>>>> Link: http://lkml.kernel.org/r/20170626102058.10200-3-hch@lst.de
>>>>>>> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
>>>>>>> Cc: Oleksandr Natalenko <oleksandr@natalenko.name>
>>>>>>> Cc: Mike Galbraith <efault@gmx.de>
>>>>>>> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>>>>>>
>>>>>> I wonder if we're simply not getting the masks updated correctly. I'll
>>>>>> take a look.
>>>>>
>>>>> Can't make it trigger here. We do init for each present CPU, which means
>>>>> that if I offline a few CPUs here and register a queue, those still show
>>>>> up as present (just offline) and get mapped accordingly.
>>>>>
>>>>> From the looks of it, your setup is different. If the CPU doesn't show
>>>>> up as present and it gets hotplugged, then I can see how this condition
>>>>> would trigger. What environment are you running this in? We might have
>>>>> to re-introduce the cpu hotplug notifier, right now we just monitor
>>>>> for a dead cpu and handle that.
>>>>
>>>> I am not doing a hot unplug and then a replug; I use KVM and add a
>>>> previously unavailable CPU.
>>>>
>>>> in libvirt/virsh speak:
>>>> <vcpu placement='static' current='1'>4</vcpu>
>>>
>>> So that's why we run into problems. It's not present when we load the device,
>>> but becomes present and online afterwards.
>>>
>>> Christoph, we used to handle this just fine, your patch broke it.
>>>
>>> I'll see if I can come up with an appropriate fix.
>>
>> Can you try the below?
>
>
> It does prevent the crash, but it seems that the new CPU is not "used" after the hotplug for mq:
>
>
> output with 2 cpus:
> /sys/kernel/debug/block/vda
> /sys/kernel/debug/block/vda/hctx0
> /sys/kernel/debug/block/vda/hctx0/cpu0
> /sys/kernel/debug/block/vda/hctx0/cpu0/completed
> /sys/kernel/debug/block/vda/hctx0/cpu0/merged
> /sys/kernel/debug/block/vda/hctx0/cpu0/dispatched
> /sys/kernel/debug/block/vda/hctx0/cpu0/rq_list
> /sys/kernel/debug/block/vda/hctx0/active
> /sys/kernel/debug/block/vda/hctx0/run
> /sys/kernel/debug/block/vda/hctx0/queued
> /sys/kernel/debug/block/vda/hctx0/dispatched
> /sys/kernel/debug/block/vda/hctx0/io_poll
> /sys/kernel/debug/block/vda/hctx0/sched_tags_bitmap
> /sys/kernel/debug/block/vda/hctx0/sched_tags
> /sys/kernel/debug/block/vda/hctx0/tags_bitmap
> /sys/kernel/debug/block/vda/hctx0/tags
> /sys/kernel/debug/block/vda/hctx0/ctx_map
> /sys/kernel/debug/block/vda/hctx0/busy
> /sys/kernel/debug/block/vda/hctx0/dispatch
> /sys/kernel/debug/block/vda/hctx0/flags
> /sys/kernel/debug/block/vda/hctx0/state
> /sys/kernel/debug/block/vda/sched
> /sys/kernel/debug/block/vda/sched/dispatch
> /sys/kernel/debug/block/vda/sched/starved
> /sys/kernel/debug/block/vda/sched/batching
> /sys/kernel/debug/block/vda/sched/write_next_rq
> /sys/kernel/debug/block/vda/sched/write_fifo_list
> /sys/kernel/debug/block/vda/sched/read_next_rq
> /sys/kernel/debug/block/vda/sched/read_fifo_list
> /sys/kernel/debug/block/vda/write_hints
> /sys/kernel/debug/block/vda/state
> /sys/kernel/debug/block/vda/requeue_list
> /sys/kernel/debug/block/vda/poll_stat
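[Editorial aside: the debugfs listing above can be checked mechanically. The helper below is purely illustrative (its name and logic are not from the kernel or this thread); it extracts the hctx-to-CPU mapping from such paths, which makes it easy to see that with two online CPUs only cpu0 shows up under hctx0, i.e. the hot-added CPU was never mapped.]

```python
# Hypothetical helper, not kernel code: derive which CPUs are mapped to
# which hardware queue from a blk-mq debugfs listing.
import re

def hctx_cpu_map(paths):
    """Return {hctx_index: sorted list of cpu indices} from debugfs paths."""
    mapping = {}
    for path in paths:
        m = re.search(r"/hctx(\d+)/cpu(\d+)(?:/|$)", path)
        if m:
            hctx, cpu = int(m.group(1)), int(m.group(2))
            mapping.setdefault(hctx, set()).add(cpu)
    return {h: sorted(c) for h, c in mapping.items()}

listing = [
    "/sys/kernel/debug/block/vda/hctx0/cpu0",
    "/sys/kernel/debug/block/vda/hctx0/cpu0/completed",
    "/sys/kernel/debug/block/vda/hctx0/active",
]
# With two online CPUs, only cpu0 appears under hctx0: cpu1 has no mapping.
print(hctx_cpu_map(listing))  # {0: [0]}
```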
Try this, basically just a revert.
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 11097477eeab..bc1950fa9ef6 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -37,6 +37,9 @@
#include "blk-wbt.h"
#include "blk-mq-sched.h"
+static DEFINE_MUTEX(all_q_mutex);
+static LIST_HEAD(all_q_list);
+
static bool blk_mq_poll(struct request_queue *q, blk_qc_t cookie);
static void blk_mq_poll_stats_start(struct request_queue *q);
static void blk_mq_poll_stats_fn(struct blk_stat_callback *cb);
@@ -2114,8 +2117,8 @@ static void blk_mq_init_cpu_queues(struct request_queue *q,
INIT_LIST_HEAD(&__ctx->rq_list);
__ctx->queue = q;
- /* If the cpu isn't present, the cpu is mapped to first hctx */
- if (!cpu_present(i))
+ /* If the cpu isn't online, the cpu is mapped to first hctx */
+ if (!cpu_online(i))
continue;
hctx = blk_mq_map_queue(q, i);
@@ -2158,7 +2161,8 @@ static void blk_mq_free_map_and_requests(struct blk_mq_tag_set *set,
}
}
-static void blk_mq_map_swqueue(struct request_queue *q)
+static void blk_mq_map_swqueue(struct request_queue *q,
+ const struct cpumask *online_mask)
{
unsigned int i, hctx_idx;
struct blk_mq_hw_ctx *hctx;
@@ -2176,11 +2180,13 @@ static void blk_mq_map_swqueue(struct request_queue *q)
}
/*
- * Map software to hardware queues.
- *
- * If the cpu isn't present, the cpu is mapped to first hctx.
+ * Map software to hardware queues
*/
- for_each_present_cpu(i) {
+ for_each_possible_cpu(i) {
+ /* If the cpu isn't online, the cpu is mapped to first hctx */
+ if (!cpumask_test_cpu(i, online_mask))
+ continue;
+
hctx_idx = q->mq_map[i];
/* unmapped hw queue can be remapped after CPU topo changed */
if (!set->tags[hctx_idx] &&
@@ -2495,8 +2501,16 @@ struct request_queue *blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
blk_queue_softirq_done(q, set->ops->complete);
blk_mq_init_cpu_queues(q, set->nr_hw_queues);
+
+ get_online_cpus();
+ mutex_lock(&all_q_mutex);
+
+ list_add_tail(&q->all_q_node, &all_q_list);
blk_mq_add_queue_tag_set(set, q);
- blk_mq_map_swqueue(q);
+ blk_mq_map_swqueue(q, cpu_online_mask);
+
+ mutex_unlock(&all_q_mutex);
+ put_online_cpus();
if (!(set->flags & BLK_MQ_F_NO_SCHED)) {
int ret;
@@ -2522,12 +2536,18 @@ void blk_mq_free_queue(struct request_queue *q)
{
struct blk_mq_tag_set *set = q->tag_set;
+ mutex_lock(&all_q_mutex);
+ list_del_init(&q->all_q_node);
+ mutex_unlock(&all_q_mutex);
+
blk_mq_del_queue_tag_set(q);
+
blk_mq_exit_hw_queues(q, set, set->nr_hw_queues);
}
/* Basically redo blk_mq_init_queue with queue frozen */
-static void blk_mq_queue_reinit(struct request_queue *q)
+static void blk_mq_queue_reinit(struct request_queue *q,
+ const struct cpumask *online_mask)
{
WARN_ON_ONCE(!atomic_read(&q->mq_freeze_depth));
@@ -2539,12 +2559,76 @@ static void blk_mq_queue_reinit(struct request_queue *q)
* we should change hctx numa_node according to the new topology (this
* involves freeing and re-allocating memory, worth doing?)
*/
- blk_mq_map_swqueue(q);
+ blk_mq_map_swqueue(q, online_mask);
blk_mq_sysfs_register(q);
blk_mq_debugfs_register_hctxs(q);
}
+/*
+ * New online cpumask which is going to be set in this hotplug event.
+ * Declare this cpumasks as global as cpu-hotplug operation is invoked
+ * one-by-one and dynamically allocating this could result in a failure.
+ */
+static struct cpumask cpuhp_online_new;
+
+static void blk_mq_queue_reinit_work(void)
+{
+ struct request_queue *q;
+
+ mutex_lock(&all_q_mutex);
+ /*
+ * We need to freeze and reinit all existing queues. Freezing
+ * involves synchronous wait for an RCU grace period and doing it
+ * one by one may take a long time. Start freezing all queues in
+ * one swoop and then wait for the completions so that freezing can
+ * take place in parallel.
+ */
+ list_for_each_entry(q, &all_q_list, all_q_node)
+ blk_freeze_queue_start(q);
+ list_for_each_entry(q, &all_q_list, all_q_node)
+ blk_mq_freeze_queue_wait(q);
+
+ list_for_each_entry(q, &all_q_list, all_q_node)
+ blk_mq_queue_reinit(q, &cpuhp_online_new);
+
+ list_for_each_entry(q, &all_q_list, all_q_node)
+ blk_mq_unfreeze_queue(q);
+
+ mutex_unlock(&all_q_mutex);
+}
+
+static int blk_mq_queue_reinit_dead(unsigned int cpu)
+{
+ cpumask_copy(&cpuhp_online_new, cpu_online_mask);
+ blk_mq_queue_reinit_work();
+ return 0;
+}
+
+/*
+ * Before hotadded cpu starts handling requests, new mappings must be
+ * established. Otherwise, these requests in hw queue might never be
+ * dispatched.
+ *
+ * For example, there is a single hw queue (hctx) and two CPU queues (ctx0
+ * for CPU0, and ctx1 for CPU1).
+ *
+ * Now CPU1 is just onlined and a request is inserted into ctx1->rq_list
+ * and set bit0 in pending bitmap as ctx1->index_hw is still zero.
+ *
+ * And then while running hw queue, blk_mq_flush_busy_ctxs() finds bit0 is set
+ * in pending bitmap and tries to retrieve requests in hctx->ctxs[0]->rq_list.
+ * But htx->ctxs[0] is a pointer to ctx0, so the request in ctx1->rq_list is
+ * ignored.
+ */
+static int blk_mq_queue_reinit_prepare(unsigned int cpu)
+{
+ cpumask_copy(&cpuhp_online_new, cpu_online_mask);
+ cpumask_set_cpu(cpu, &cpuhp_online_new);
+ blk_mq_queue_reinit_work();
+ return 0;
+}
+
static int __blk_mq_alloc_rq_maps(struct blk_mq_tag_set *set)
{
int i;
@@ -2757,7 +2841,7 @@ static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set,
blk_mq_update_queue_map(set);
list_for_each_entry(q, &set->tag_list, tag_set_list) {
blk_mq_realloc_hw_ctxs(set, q);
- blk_mq_queue_reinit(q);
+ blk_mq_queue_reinit(q, cpu_online_mask);
}
list_for_each_entry(q, &set->tag_list, tag_set_list)
@@ -2966,6 +3050,16 @@ static bool blk_mq_poll(struct request_queue *q, blk_qc_t cookie)
return __blk_mq_poll(hctx, rq);
}
+void blk_mq_disable_hotplug(void)
+{
+ mutex_lock(&all_q_mutex);
+}
+
+void blk_mq_enable_hotplug(void)
+{
+ mutex_unlock(&all_q_mutex);
+}
+
static int __init blk_mq_init(void)
{
/*
@@ -2976,6 +3070,10 @@ static int __init blk_mq_init(void)
cpuhp_setup_state_multi(CPUHP_BLK_MQ_DEAD, "block/mq:dead", NULL,
blk_mq_hctx_notify_dead);
+
+ cpuhp_setup_state_nocalls(CPUHP_BLK_MQ_PREPARE, "block/mq:prepare",
+ blk_mq_queue_reinit_prepare,
+ blk_mq_queue_reinit_dead);
return 0;
}
subsys_initcall(blk_mq_init);
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 6c7c3ff5bf62..83b13ef1915e 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -59,6 +59,11 @@ void __blk_mq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
void blk_mq_request_bypass_insert(struct request *rq, bool run_queue);
void blk_mq_insert_requests(struct blk_mq_hw_ctx *hctx, struct blk_mq_ctx *ctx,
struct list_head *list);
+/*
+ * CPU hotplug helpers
+ */
+void blk_mq_enable_hotplug(void);
+void blk_mq_disable_hotplug(void);
/*
* CPU -> queue mappings
diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
index 201ab7267986..c31d4e3bf6d0 100644
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -76,6 +76,7 @@ enum cpuhp_state {
CPUHP_XEN_EVTCHN_PREPARE,
CPUHP_ARM_SHMOBILE_SCU_PREPARE,
CPUHP_SH_SH3X_PREPARE,
+ CPUHP_BLK_MQ_PREPARE,
CPUHP_NET_FLOW_PREPARE,
CPUHP_TOPOLOGY_PREPARE,
CPUHP_NET_IUCV_PREPARE,
--
Jens Axboe
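[Editorial aside: the root cause discussed in this message can be sketched with a toy model. This is plain Python with illustrative names, not kernel code or data structures; it only models the idea that commit 4b855ad3 builds the software-to-hardware queue map once, from the CPUs present at init, so a vCPU that KVM adds later never gets a mapping, while the reverted scheme rebuilds the map from the current online mask on each hotplug event.]

```python
# Toy model of the mapping bug; names are illustrative, not kernel symbols.

def map_swqueues(cpus_to_map, nr_hctx):
    """Assign each CPU in cpus_to_map to a hardware queue index."""
    return {cpu: cpu % nr_hctx for cpu in cpus_to_map}

# Post-4b855ad3 scheme: map once, for the CPUs present when the queue is set
# up. The guest boots with a single present vCPU (current='1' in libvirt).
mq_map = map_swqueues(cpus_to_map={0}, nr_hctx=1)

# KVM then makes cpu1 present and online; this scheme never remaps.
new_cpu = 1
assert new_cpu not in mq_map   # unmapped software queue: the WARN path

# The revert re-runs the mapping from a cpuhp "prepare" callback using the
# new online mask, so the fresh CPU is routed to a hctx before it does I/O.
mq_map = map_swqueues(cpus_to_map={0, 1}, nr_hctx=1)
assert mq_map[new_cpu] == 0    # cpu1 now reaches hctx0
```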
* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
2017-11-21 19:30 ` Jens Axboe
@ 2017-11-21 20:12 ` Christian Borntraeger
From: Christian Borntraeger @ 2017-11-21 20:12 UTC
To: Jens Axboe, Bart Van Assche, virtualization, linux-block, mst,
jasowang, linux-kernel, Christoph Hellwig
On 11/21/2017 08:30 PM, Jens Axboe wrote:
> On 11/21/2017 12:15 PM, Christian Borntraeger wrote:
>>
>>
>> On 11/21/2017 07:39 PM, Jens Axboe wrote:
>>> On 11/21/2017 11:27 AM, Jens Axboe wrote:
>>>> On 11/21/2017 11:12 AM, Christian Borntraeger wrote:
>>>>>
>>>>>
>>>>> On 11/21/2017 07:09 PM, Jens Axboe wrote:
>>>>>> On 11/21/2017 10:27 AM, Jens Axboe wrote:
>>>>>>> On 11/21/2017 03:14 AM, Christian Borntraeger wrote:
>>>>>>>> Bisect points to
>>>>>>>>
>>>>>>>> 1b5a7455d345b223d3a4658a9e5fce985b7998c1 is the first bad commit
>>>>>>>> commit 1b5a7455d345b223d3a4658a9e5fce985b7998c1
>>>>>>>> Author: Christoph Hellwig <hch@lst.de>
>>>>>>>> Date: Mon Jun 26 12:20:57 2017 +0200
>>>>>>>>
>>>>>>>> blk-mq: Create hctx for each present CPU
>>>>>>>>
>>>>>>>> commit 4b855ad37194f7bdbb200ce7a1c7051fecb56a08 upstream.
>>>>>>>>
>>>>>>>> Currently we only create hctx for online CPUs, which can lead to a lot
>>>>>>>> of churn due to frequent soft offline / online operations. Instead
>>>>>>>> allocate one for each present CPU to avoid this and dramatically simplify
>>>>>>>> the code.
>>>>>>>>
>>>>>>>> Signed-off-by: Christoph Hellwig <hch@lst.de>
>>>>>>>> Reviewed-by: Jens Axboe <axboe@kernel.dk>
>>>>>>>> Cc: Keith Busch <keith.busch@intel.com>
>>>>>>>> Cc: linux-block@vger.kernel.org
>>>>>>>> Cc: linux-nvme@lists.infradead.org
>>>>>>>> Link: http://lkml.kernel.org/r/20170626102058.10200-3-hch@lst.de
>>>>>>>> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
>>>>>>>> Cc: Oleksandr Natalenko <oleksandr@natalenko.name>
>>>>>>>> Cc: Mike Galbraith <efault@gmx.de>
>>>>>>>> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>>>>>>>
>>>>>>> I wonder if we're simply not getting the masks updated correctly. I'll
>>>>>>> take a look.
>>>>>>
>>>>>> Can't make it trigger here. We do init for each present CPU, which means
>>>>>> that if I offline a few CPUs here and register a queue, those still show
>>>>>> up as present (just offline) and get mapped accordingly.
>>>>>>
>>>>>> From the looks of it, your setup is different. If the CPU doesn't show
>>>>>> up as present and it gets hotplugged, then I can see how this condition
>>>>>> would trigger. What environment are you running this in? We might have
>>>>>> to re-introduce the cpu hotplug notifier, right now we just monitor
>>>>>> for a dead cpu and handle that.
>>>>>
>>>>> I am not doing a hot unplug and then a replug; I use KVM and add a
>>>>> previously unavailable CPU.
>>>>>
>>>>> in libvirt/virsh speak:
>>>>> <vcpu placement='static' current='1'>4</vcpu>
>>>>
>>>> So that's why we run into problems. It's not present when we load the device,
>>>> but becomes present and online afterwards.
>>>>
>>>> Christoph, we used to handle this just fine, your patch broke it.
>>>>
>>>> I'll see if I can come up with an appropriate fix.
>>>
>>> Can you try the below?
>>
>>
>> It does prevent the crash, but it seems that the new CPU is not "used" after the hotplug for mq:
>>
>>
>> output with 2 cpus:
>> /sys/kernel/debug/block/vda
>> /sys/kernel/debug/block/vda/hctx0
>> /sys/kernel/debug/block/vda/hctx0/cpu0
>> /sys/kernel/debug/block/vda/hctx0/cpu0/completed
>> /sys/kernel/debug/block/vda/hctx0/cpu0/merged
>> /sys/kernel/debug/block/vda/hctx0/cpu0/dispatched
>> /sys/kernel/debug/block/vda/hctx0/cpu0/rq_list
>> /sys/kernel/debug/block/vda/hctx0/active
>> /sys/kernel/debug/block/vda/hctx0/run
>> /sys/kernel/debug/block/vda/hctx0/queued
>> /sys/kernel/debug/block/vda/hctx0/dispatched
>> /sys/kernel/debug/block/vda/hctx0/io_poll
>> /sys/kernel/debug/block/vda/hctx0/sched_tags_bitmap
>> /sys/kernel/debug/block/vda/hctx0/sched_tags
>> /sys/kernel/debug/block/vda/hctx0/tags_bitmap
>> /sys/kernel/debug/block/vda/hctx0/tags
>> /sys/kernel/debug/block/vda/hctx0/ctx_map
>> /sys/kernel/debug/block/vda/hctx0/busy
>> /sys/kernel/debug/block/vda/hctx0/dispatch
>> /sys/kernel/debug/block/vda/hctx0/flags
>> /sys/kernel/debug/block/vda/hctx0/state
>> /sys/kernel/debug/block/vda/sched
>> /sys/kernel/debug/block/vda/sched/dispatch
>> /sys/kernel/debug/block/vda/sched/starved
>> /sys/kernel/debug/block/vda/sched/batching
>> /sys/kernel/debug/block/vda/sched/write_next_rq
>> /sys/kernel/debug/block/vda/sched/write_fifo_list
>> /sys/kernel/debug/block/vda/sched/read_next_rq
>> /sys/kernel/debug/block/vda/sched/read_fifo_list
>> /sys/kernel/debug/block/vda/write_hints
>> /sys/kernel/debug/block/vda/state
>> /sys/kernel/debug/block/vda/requeue_list
>> /sys/kernel/debug/block/vda/poll_stat
>
> Try this, basically just a revert.
Yes, seems to work.
Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
Do you know why the original commit made it into 4.12 stable? After all,
it has no Fixes tag and no Cc: stable.
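[Editorial aside: the race that the reverted code's blk_mq_queue_reinit_prepare() comment describes (a just-onlined CPU's ctx with a stale index_hw of zero, so its pending bit points at the wrong software queue) can be re-enacted in miniature. This is a toy Python model with made-up class names, not kernel code:]

```python
# Toy re-enactment of the stale-index_hw race from the patch comment.

class Ctx:
    """A per-CPU software queue: its slot inside hctx->ctxs plus a rq list."""
    def __init__(self, index_hw):
        self.index_hw = index_hw
        self.rq_list = []

def insert(ctx, pending, rq):
    ctx.rq_list.append(rq)
    pending.add(ctx.index_hw)   # mark this sw queue busy in the hctx bitmap

def flush(hctx_ctxs, pending):
    """Like blk_mq_flush_busy_ctxs(): drain every sw queue whose bit is set."""
    out = []
    for bit in sorted(pending):
        out += hctx_ctxs[bit].rq_list
        hctx_ctxs[bit].rq_list = []
    pending.clear()
    return out

ctx0, ctx1 = Ctx(0), Ctx(0)     # cpu1 just onlined, not remapped: stale 0
hctx_ctxs, pending = [ctx0], set()

insert(ctx1, pending, "rq-from-cpu1")   # sets bit 0 via the stale index_hw
assert flush(hctx_ctxs, pending) == []  # bit 0 resolves to ctx0: rq is missed

# After the remap done by the prepare callback, ctx1 gets a real slot and
# even the stranded request is finally found and dispatched.
ctx1.index_hw = 1
hctx_ctxs = [ctx0, ctx1]
insert(ctx1, pending, "rq2")
assert flush(hctx_ctxs, pending) == ["rq-from-cpu1", "rq2"]
```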
>
>
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 11097477eeab..bc1950fa9ef6 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -37,6 +37,9 @@
> #include "blk-wbt.h"
> #include "blk-mq-sched.h"
>
> +static DEFINE_MUTEX(all_q_mutex);
> +static LIST_HEAD(all_q_list);
> +
> static bool blk_mq_poll(struct request_queue *q, blk_qc_t cookie);
> static void blk_mq_poll_stats_start(struct request_queue *q);
> static void blk_mq_poll_stats_fn(struct blk_stat_callback *cb);
> @@ -2114,8 +2117,8 @@ static void blk_mq_init_cpu_queues(struct request_queue *q,
> INIT_LIST_HEAD(&__ctx->rq_list);
> __ctx->queue = q;
>
> - /* If the cpu isn't present, the cpu is mapped to first hctx */
> - if (!cpu_present(i))
> + /* If the cpu isn't online, the cpu is mapped to first hctx */
> + if (!cpu_online(i))
> continue;
>
> hctx = blk_mq_map_queue(q, i);
> @@ -2158,7 +2161,8 @@ static void blk_mq_free_map_and_requests(struct blk_mq_tag_set *set,
> }
> }
>
> -static void blk_mq_map_swqueue(struct request_queue *q)
> +static void blk_mq_map_swqueue(struct request_queue *q,
> + const struct cpumask *online_mask)
> {
> unsigned int i, hctx_idx;
> struct blk_mq_hw_ctx *hctx;
> @@ -2176,11 +2180,13 @@ static void blk_mq_map_swqueue(struct request_queue *q)
> }
>
> /*
> - * Map software to hardware queues.
> - *
> - * If the cpu isn't present, the cpu is mapped to first hctx.
> + * Map software to hardware queues
> */
> - for_each_present_cpu(i) {
> + for_each_possible_cpu(i) {
> + /* If the cpu isn't online, the cpu is mapped to first hctx */
> + if (!cpumask_test_cpu(i, online_mask))
> + continue;
> +
> hctx_idx = q->mq_map[i];
> /* unmapped hw queue can be remapped after CPU topo changed */
> if (!set->tags[hctx_idx] &&
> @@ -2495,8 +2501,16 @@ struct request_queue *blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
> blk_queue_softirq_done(q, set->ops->complete);
>
> blk_mq_init_cpu_queues(q, set->nr_hw_queues);
> +
> + get_online_cpus();
> + mutex_lock(&all_q_mutex);
> +
> + list_add_tail(&q->all_q_node, &all_q_list);
> blk_mq_add_queue_tag_set(set, q);
> - blk_mq_map_swqueue(q);
> + blk_mq_map_swqueue(q, cpu_online_mask);
> +
> + mutex_unlock(&all_q_mutex);
> + put_online_cpus();
>
> if (!(set->flags & BLK_MQ_F_NO_SCHED)) {
> int ret;
> @@ -2522,12 +2536,18 @@ void blk_mq_free_queue(struct request_queue *q)
> {
> struct blk_mq_tag_set *set = q->tag_set;
>
> + mutex_lock(&all_q_mutex);
> + list_del_init(&q->all_q_node);
> + mutex_unlock(&all_q_mutex);
> +
> blk_mq_del_queue_tag_set(q);
> +
> blk_mq_exit_hw_queues(q, set, set->nr_hw_queues);
> }
>
> /* Basically redo blk_mq_init_queue with queue frozen */
> -static void blk_mq_queue_reinit(struct request_queue *q)
> +static void blk_mq_queue_reinit(struct request_queue *q,
> + const struct cpumask *online_mask)
> {
> WARN_ON_ONCE(!atomic_read(&q->mq_freeze_depth));
>
> @@ -2539,12 +2559,76 @@ static void blk_mq_queue_reinit(struct request_queue *q)
> * we should change hctx numa_node according to the new topology (this
> * involves freeing and re-allocating memory, worth doing?)
> */
> - blk_mq_map_swqueue(q);
> + blk_mq_map_swqueue(q, online_mask);
>
> blk_mq_sysfs_register(q);
> blk_mq_debugfs_register_hctxs(q);
> }
>
> +/*
> + * New online cpumask which is going to be set in this hotplug event.
> + * Declare this cpumasks as global as cpu-hotplug operation is invoked
> + * one-by-one and dynamically allocating this could result in a failure.
> + */
> +static struct cpumask cpuhp_online_new;
> +
> +static void blk_mq_queue_reinit_work(void)
> +{
> + struct request_queue *q;
> +
> + mutex_lock(&all_q_mutex);
> + /*
> + * We need to freeze and reinit all existing queues. Freezing
> + * involves synchronous wait for an RCU grace period and doing it
> + * one by one may take a long time. Start freezing all queues in
> + * one swoop and then wait for the completions so that freezing can
> + * take place in parallel.
> + */
> + list_for_each_entry(q, &all_q_list, all_q_node)
> + blk_freeze_queue_start(q);
> + list_for_each_entry(q, &all_q_list, all_q_node)
> + blk_mq_freeze_queue_wait(q);
> +
> + list_for_each_entry(q, &all_q_list, all_q_node)
> + blk_mq_queue_reinit(q, &cpuhp_online_new);
> +
> + list_for_each_entry(q, &all_q_list, all_q_node)
> + blk_mq_unfreeze_queue(q);
> +
> + mutex_unlock(&all_q_mutex);
> +}
> +
> +static int blk_mq_queue_reinit_dead(unsigned int cpu)
> +{
> + cpumask_copy(&cpuhp_online_new, cpu_online_mask);
> + blk_mq_queue_reinit_work();
> + return 0;
> +}
> +
> +/*
> + * Before hotadded cpu starts handling requests, new mappings must be
> + * established. Otherwise, these requests in hw queue might never be
> + * dispatched.
> + *
> + * For example, there is a single hw queue (hctx) and two CPU queues (ctx0
> + * for CPU0, and ctx1 for CPU1).
> + *
> + * Now CPU1 is just onlined and a request is inserted into ctx1->rq_list
> + * and set bit0 in pending bitmap as ctx1->index_hw is still zero.
> + *
> + * And then while running hw queue, blk_mq_flush_busy_ctxs() finds bit0 is set
> + * in pending bitmap and tries to retrieve requests in hctx->ctxs[0]->rq_list.
> + * But htx->ctxs[0] is a pointer to ctx0, so the request in ctx1->rq_list is
> + * ignored.
> + */
> +static int blk_mq_queue_reinit_prepare(unsigned int cpu)
> +{
> + cpumask_copy(&cpuhp_online_new, cpu_online_mask);
> + cpumask_set_cpu(cpu, &cpuhp_online_new);
> + blk_mq_queue_reinit_work();
> + return 0;
> +}
> +
> static int __blk_mq_alloc_rq_maps(struct blk_mq_tag_set *set)
> {
> int i;
> @@ -2757,7 +2841,7 @@ static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set,
> blk_mq_update_queue_map(set);
> list_for_each_entry(q, &set->tag_list, tag_set_list) {
> blk_mq_realloc_hw_ctxs(set, q);
> - blk_mq_queue_reinit(q);
> + blk_mq_queue_reinit(q, cpu_online_mask);
> }
>
> list_for_each_entry(q, &set->tag_list, tag_set_list)
> @@ -2966,6 +3050,16 @@ static bool blk_mq_poll(struct request_queue *q, blk_qc_t cookie)
> return __blk_mq_poll(hctx, rq);
> }
>
> +void blk_mq_disable_hotplug(void)
> +{
> + mutex_lock(&all_q_mutex);
> +}
> +
> +void blk_mq_enable_hotplug(void)
> +{
> + mutex_unlock(&all_q_mutex);
> +}
> +
> static int __init blk_mq_init(void)
> {
> /*
> @@ -2976,6 +3070,10 @@ static int __init blk_mq_init(void)
>
> cpuhp_setup_state_multi(CPUHP_BLK_MQ_DEAD, "block/mq:dead", NULL,
> blk_mq_hctx_notify_dead);
> +
> + cpuhp_setup_state_nocalls(CPUHP_BLK_MQ_PREPARE, "block/mq:prepare",
> + blk_mq_queue_reinit_prepare,
> + blk_mq_queue_reinit_dead);
> return 0;
> }
> subsys_initcall(blk_mq_init);
> diff --git a/block/blk-mq.h b/block/blk-mq.h
> index 6c7c3ff5bf62..83b13ef1915e 100644
> --- a/block/blk-mq.h
> +++ b/block/blk-mq.h
> @@ -59,6 +59,11 @@ void __blk_mq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
> void blk_mq_request_bypass_insert(struct request *rq, bool run_queue);
> void blk_mq_insert_requests(struct blk_mq_hw_ctx *hctx, struct blk_mq_ctx *ctx,
> struct list_head *list);
> +/*
> + * CPU hotplug helpers
> + */
> +void blk_mq_enable_hotplug(void);
> +void blk_mq_disable_hotplug(void);
>
> /*
> * CPU -> queue mappings
> diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
> index 201ab7267986..c31d4e3bf6d0 100644
> --- a/include/linux/cpuhotplug.h
> +++ b/include/linux/cpuhotplug.h
> @@ -76,6 +76,7 @@ enum cpuhp_state {
> CPUHP_XEN_EVTCHN_PREPARE,
> CPUHP_ARM_SHMOBILE_SCU_PREPARE,
> CPUHP_SH_SH3X_PREPARE,
> + CPUHP_BLK_MQ_PREPARE,
> CPUHP_NET_FLOW_PREPARE,
> CPUHP_TOPOLOGY_PREPARE,
> CPUHP_NET_IUCV_PREPARE,
>
* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
2017-11-21 19:30 ` Jens Axboe
@ 2017-11-21 20:12 ` Christian Borntraeger
-1 siblings, 0 replies; 96+ messages in thread
From: Christian Borntraeger @ 2017-11-21 20:12 UTC (permalink / raw)
To: Jens Axboe, Bart Van Assche, virtualization, linux-block, mst,
jasowang, linux-kernel, Christoph Hellwig
On 11/21/2017 08:30 PM, Jens Axboe wrote:
> On 11/21/2017 12:15 PM, Christian Borntraeger wrote:
>>
>>
>> On 11/21/2017 07:39 PM, Jens Axboe wrote:
>>> On 11/21/2017 11:27 AM, Jens Axboe wrote:
>>>> On 11/21/2017 11:12 AM, Christian Borntraeger wrote:
>>>>>
>>>>>
>>>>> On 11/21/2017 07:09 PM, Jens Axboe wrote:
>>>>>> On 11/21/2017 10:27 AM, Jens Axboe wrote:
>>>>>>> On 11/21/2017 03:14 AM, Christian Borntraeger wrote:
>>>>>>>> Bisect points to
>>>>>>>>
>>>>>>>> 1b5a7455d345b223d3a4658a9e5fce985b7998c1 is the first bad commit
>>>>>>>> commit 1b5a7455d345b223d3a4658a9e5fce985b7998c1
>>>>>>>> Author: Christoph Hellwig <hch@lst.de>
>>>>>>>> Date: Mon Jun 26 12:20:57 2017 +0200
>>>>>>>>
>>>>>>>> blk-mq: Create hctx for each present CPU
>>>>>>>>
>>>>>>>> commit 4b855ad37194f7bdbb200ce7a1c7051fecb56a08 upstream.
>>>>>>>>
>>>>>>>> Currently we only create hctx for online CPUs, which can lead to a lot
>>>>>>>> of churn due to frequent soft offline / online operations. Instead
>>>>>>>> allocate one for each present CPU to avoid this and dramatically simplify
>>>>>>>> the code.
>>>>>>>>
>>>>>>>> Signed-off-by: Christoph Hellwig <hch@lst.de>
>>>>>>>> Reviewed-by: Jens Axboe <axboe@kernel.dk>
>>>>>>>> Cc: Keith Busch <keith.busch@intel.com>
>>>>>>>> Cc: linux-block@vger.kernel.org
>>>>>>>> Cc: linux-nvme@lists.infradead.org
>>>>>>>> Link: http://lkml.kernel.org/r/20170626102058.10200-3-hch@lst.de
>>>>>>>> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
>>>>>>>> Cc: Oleksandr Natalenko <oleksandr@natalenko.name>
>>>>>>>> Cc: Mike Galbraith <efault@gmx.de>
>>>>>>>> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>>>>>>>
>>>>>>> I wonder if we're simply not getting the masks updated correctly. I'll
>>>>>>> take a look.
>>>>>>
>>>>>> Can't make it trigger here. We do init for each present CPU, which means
>>>>>> that if I offline a few CPUs here and register a queue, those still show
>>>>>> up as present (just offline) and get mapped accordingly.
>>>>>>
>>>>>> From the looks of it, your setup is different. If the CPU doesn't show
>>>>>> up as present and it gets hotplugged, then I can see how this condition
>>>>>> would trigger. What environment are you running this in? We might have
>>>>>> to re-introduce the cpu hotplug notifier, right now we just monitor
>>>>>> for a dead cpu and handle that.
>>>>>
>>>>> I am not doing a hot unplug and the replug, I use KVM and add a previously
>>>>> not available CPU.
>>>>>
>>>>> in libvirt/virsh speak:
>>>>> <vcpu placement='static' current='1'>4</vcpu>
>>>>
>>>> So that's why we run into problems. It's not present when we load the device,
>>>> but becomes present and online afterwards.
>>>>
>>>> Christoph, we used to handle this just fine, your patch broke it.
>>>>
>>>> I'll see if I can come up with an appropriate fix.
>>>
>>> Can you try the below?
>>
>>
>> It does prevent the crash but it seems that the new CPU is not "used " after the hotplug for mq:
>>
>>
>> output with 2 cpus:
>> /sys/kernel/debug/block/vda
>> /sys/kernel/debug/block/vda/hctx0
>> /sys/kernel/debug/block/vda/hctx0/cpu0
>> /sys/kernel/debug/block/vda/hctx0/cpu0/completed
>> /sys/kernel/debug/block/vda/hctx0/cpu0/merged
>> /sys/kernel/debug/block/vda/hctx0/cpu0/dispatched
>> /sys/kernel/debug/block/vda/hctx0/cpu0/rq_list
>> /sys/kernel/debug/block/vda/hctx0/active
>> /sys/kernel/debug/block/vda/hctx0/run
>> /sys/kernel/debug/block/vda/hctx0/queued
>> /sys/kernel/debug/block/vda/hctx0/dispatched
>> /sys/kernel/debug/block/vda/hctx0/io_poll
>> /sys/kernel/debug/block/vda/hctx0/sched_tags_bitmap
>> /sys/kernel/debug/block/vda/hctx0/sched_tags
>> /sys/kernel/debug/block/vda/hctx0/tags_bitmap
>> /sys/kernel/debug/block/vda/hctx0/tags
>> /sys/kernel/debug/block/vda/hctx0/ctx_map
>> /sys/kernel/debug/block/vda/hctx0/busy
>> /sys/kernel/debug/block/vda/hctx0/dispatch
>> /sys/kernel/debug/block/vda/hctx0/flags
>> /sys/kernel/debug/block/vda/hctx0/state
>> /sys/kernel/debug/block/vda/sched
>> /sys/kernel/debug/block/vda/sched/dispatch
>> /sys/kernel/debug/block/vda/sched/starved
>> /sys/kernel/debug/block/vda/sched/batching
>> /sys/kernel/debug/block/vda/sched/write_next_rq
>> /sys/kernel/debug/block/vda/sched/write_fifo_list
>> /sys/kernel/debug/block/vda/sched/read_next_rq
>> /sys/kernel/debug/block/vda/sched/read_fifo_list
>> /sys/kernel/debug/block/vda/write_hints
>> /sys/kernel/debug/block/vda/state
>> /sys/kernel/debug/block/vda/requeue_list
>> /sys/kernel/debug/block/vda/poll_stat
>
> Try this, basically just a revert.
Yes, seems to work.
Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
Do you know why the original commit made it into 4.12 stable? After all
it has no Fixes tag and no cc stable-
>
>
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 11097477eeab..bc1950fa9ef6 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -37,6 +37,9 @@
> #include "blk-wbt.h"
> #include "blk-mq-sched.h"
>
> +static DEFINE_MUTEX(all_q_mutex);
> +static LIST_HEAD(all_q_list);
> +
> static bool blk_mq_poll(struct request_queue *q, blk_qc_t cookie);
> static void blk_mq_poll_stats_start(struct request_queue *q);
> static void blk_mq_poll_stats_fn(struct blk_stat_callback *cb);
> @@ -2114,8 +2117,8 @@ static void blk_mq_init_cpu_queues(struct request_queue *q,
> INIT_LIST_HEAD(&__ctx->rq_list);
> __ctx->queue = q;
>
> - /* If the cpu isn't present, the cpu is mapped to first hctx */
> - if (!cpu_present(i))
> + /* If the cpu isn't online, the cpu is mapped to first hctx */
> + if (!cpu_online(i))
> continue;
>
> hctx = blk_mq_map_queue(q, i);
> @@ -2158,7 +2161,8 @@ static void blk_mq_free_map_and_requests(struct blk_mq_tag_set *set,
> }
> }
>
> -static void blk_mq_map_swqueue(struct request_queue *q)
> +static void blk_mq_map_swqueue(struct request_queue *q,
> + const struct cpumask *online_mask)
> {
> unsigned int i, hctx_idx;
> struct blk_mq_hw_ctx *hctx;
> @@ -2176,11 +2180,13 @@ static void blk_mq_map_swqueue(struct request_queue *q)
> }
>
> /*
> - * Map software to hardware queues.
> - *
> - * If the cpu isn't present, the cpu is mapped to first hctx.
> + * Map software to hardware queues
> */
> - for_each_present_cpu(i) {
> + for_each_possible_cpu(i) {
> + /* If the cpu isn't online, the cpu is mapped to first hctx */
> + if (!cpumask_test_cpu(i, online_mask))
> + continue;
> +
> hctx_idx = q->mq_map[i];
> /* unmapped hw queue can be remapped after CPU topo changed */
> if (!set->tags[hctx_idx] &&
> @@ -2495,8 +2501,16 @@ struct request_queue *blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
> blk_queue_softirq_done(q, set->ops->complete);
>
> blk_mq_init_cpu_queues(q, set->nr_hw_queues);
> +
> + get_online_cpus();
> + mutex_lock(&all_q_mutex);
> +
> + list_add_tail(&q->all_q_node, &all_q_list);
> blk_mq_add_queue_tag_set(set, q);
> - blk_mq_map_swqueue(q);
> + blk_mq_map_swqueue(q, cpu_online_mask);
> +
> + mutex_unlock(&all_q_mutex);
> + put_online_cpus();
>
> if (!(set->flags & BLK_MQ_F_NO_SCHED)) {
> int ret;
> @@ -2522,12 +2536,18 @@ void blk_mq_free_queue(struct request_queue *q)
> {
> struct blk_mq_tag_set *set = q->tag_set;
>
> + mutex_lock(&all_q_mutex);
> + list_del_init(&q->all_q_node);
> + mutex_unlock(&all_q_mutex);
> +
> blk_mq_del_queue_tag_set(q);
> +
> blk_mq_exit_hw_queues(q, set, set->nr_hw_queues);
> }
>
> /* Basically redo blk_mq_init_queue with queue frozen */
> -static void blk_mq_queue_reinit(struct request_queue *q)
> +static void blk_mq_queue_reinit(struct request_queue *q,
> + const struct cpumask *online_mask)
> {
> WARN_ON_ONCE(!atomic_read(&q->mq_freeze_depth));
>
> @@ -2539,12 +2559,76 @@ static void blk_mq_queue_reinit(struct request_queue *q)
> * we should change hctx numa_node according to the new topology (this
> * involves freeing and re-allocating memory, worth doing?)
> */
> - blk_mq_map_swqueue(q);
> + blk_mq_map_swqueue(q, online_mask);
>
> blk_mq_sysfs_register(q);
> blk_mq_debugfs_register_hctxs(q);
> }
>
> +/*
> + * New online cpumask which is going to be set in this hotplug event.
> + * Declare this cpumask as global as cpu-hotplug operation is invoked
> + * one-by-one and dynamically allocating this could result in a failure.
> + */
> +static struct cpumask cpuhp_online_new;
> +
> +static void blk_mq_queue_reinit_work(void)
> +{
> + struct request_queue *q;
> +
> + mutex_lock(&all_q_mutex);
> + /*
> + * We need to freeze and reinit all existing queues. Freezing
> + * involves synchronous wait for an RCU grace period and doing it
> + * one by one may take a long time. Start freezing all queues in
> + * one swoop and then wait for the completions so that freezing can
> + * take place in parallel.
> + */
> + list_for_each_entry(q, &all_q_list, all_q_node)
> + blk_freeze_queue_start(q);
> + list_for_each_entry(q, &all_q_list, all_q_node)
> + blk_mq_freeze_queue_wait(q);
> +
> + list_for_each_entry(q, &all_q_list, all_q_node)
> + blk_mq_queue_reinit(q, &cpuhp_online_new);
> +
> + list_for_each_entry(q, &all_q_list, all_q_node)
> + blk_mq_unfreeze_queue(q);
> +
> + mutex_unlock(&all_q_mutex);
> +}
> +
> +static int blk_mq_queue_reinit_dead(unsigned int cpu)
> +{
> + cpumask_copy(&cpuhp_online_new, cpu_online_mask);
> + blk_mq_queue_reinit_work();
> + return 0;
> +}
> +
> +/*
> + * Before hotadded cpu starts handling requests, new mappings must be
> + * established. Otherwise, these requests in hw queue might never be
> + * dispatched.
> + *
> + * For example, there is a single hw queue (hctx) and two CPU queues (ctx0
> + * for CPU0, and ctx1 for CPU1).
> + *
> + * Now CPU1 is just onlined and a request is inserted into ctx1->rq_list
> + * and set bit0 in pending bitmap as ctx1->index_hw is still zero.
> + *
> + * And then while running hw queue, blk_mq_flush_busy_ctxs() finds bit0 is set
> + * in pending bitmap and tries to retrieve requests in hctx->ctxs[0]->rq_list.
> + * But hctx->ctxs[0] is a pointer to ctx0, so the request in ctx1->rq_list is
> + * ignored.
> + */
> +static int blk_mq_queue_reinit_prepare(unsigned int cpu)
> +{
> + cpumask_copy(&cpuhp_online_new, cpu_online_mask);
> + cpumask_set_cpu(cpu, &cpuhp_online_new);
> + blk_mq_queue_reinit_work();
> + return 0;
> +}
> +
> static int __blk_mq_alloc_rq_maps(struct blk_mq_tag_set *set)
> {
> int i;
> @@ -2757,7 +2841,7 @@ static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set,
> blk_mq_update_queue_map(set);
> list_for_each_entry(q, &set->tag_list, tag_set_list) {
> blk_mq_realloc_hw_ctxs(set, q);
> - blk_mq_queue_reinit(q);
> + blk_mq_queue_reinit(q, cpu_online_mask);
> }
>
> list_for_each_entry(q, &set->tag_list, tag_set_list)
> @@ -2966,6 +3050,16 @@ static bool blk_mq_poll(struct request_queue *q, blk_qc_t cookie)
> return __blk_mq_poll(hctx, rq);
> }
>
> +void blk_mq_disable_hotplug(void)
> +{
> + mutex_lock(&all_q_mutex);
> +}
> +
> +void blk_mq_enable_hotplug(void)
> +{
> + mutex_unlock(&all_q_mutex);
> +}
> +
> static int __init blk_mq_init(void)
> {
> /*
> @@ -2976,6 +3070,10 @@ static int __init blk_mq_init(void)
>
> cpuhp_setup_state_multi(CPUHP_BLK_MQ_DEAD, "block/mq:dead", NULL,
> blk_mq_hctx_notify_dead);
> +
> + cpuhp_setup_state_nocalls(CPUHP_BLK_MQ_PREPARE, "block/mq:prepare",
> + blk_mq_queue_reinit_prepare,
> + blk_mq_queue_reinit_dead);
> return 0;
> }
> subsys_initcall(blk_mq_init);
> diff --git a/block/blk-mq.h b/block/blk-mq.h
> index 6c7c3ff5bf62..83b13ef1915e 100644
> --- a/block/blk-mq.h
> +++ b/block/blk-mq.h
> @@ -59,6 +59,11 @@ void __blk_mq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
> void blk_mq_request_bypass_insert(struct request *rq, bool run_queue);
> void blk_mq_insert_requests(struct blk_mq_hw_ctx *hctx, struct blk_mq_ctx *ctx,
> struct list_head *list);
> +/*
> + * CPU hotplug helpers
> + */
> +void blk_mq_enable_hotplug(void);
> +void blk_mq_disable_hotplug(void);
>
> /*
> * CPU -> queue mappings
> diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
> index 201ab7267986..c31d4e3bf6d0 100644
> --- a/include/linux/cpuhotplug.h
> +++ b/include/linux/cpuhotplug.h
> @@ -76,6 +76,7 @@ enum cpuhp_state {
> CPUHP_XEN_EVTCHN_PREPARE,
> CPUHP_ARM_SHMOBILE_SCU_PREPARE,
> CPUHP_SH_SH3X_PREPARE,
> + CPUHP_BLK_MQ_PREPARE,
> CPUHP_NET_FLOW_PREPARE,
> CPUHP_TOPOLOGY_PREPARE,
> CPUHP_NET_IUCV_PREPARE,
>
^ permalink raw reply [flat|nested] 96+ messages in thread
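[Editor's illustration] The race described in the `blk_mq_queue_reinit_prepare()` comment of the patch above can be sketched with a toy model. This is plain Python, not kernel code; the `Ctx`/`Hctx` classes and their fields are simplified stand-ins for `struct blk_mq_ctx` / `struct blk_mq_hw_ctx`:

```python
# Toy model of the blk-mq pending-bitmap race described in the patch:
# a hw queue (hctx) tracks its mapped sw queues in hctx.ctxs, and each
# ctx caches its slot as ctx.index_hw.  If a newly onlined CPU's ctx
# still has a stale index_hw, its pending bit resolves to the wrong
# ctx and its requests are never flushed.

class Ctx:
    def __init__(self, cpu):
        self.cpu = cpu
        self.rq_list = []
        self.index_hw = 0          # stale until a remap runs

class Hctx:
    def __init__(self):
        self.ctxs = []             # sw queues mapped to this hw queue
        self.pending = set()       # "bitmap" of ctx indices with work

    def insert(self, ctx, rq):     # like inserting into ctx->rq_list
        ctx.rq_list.append(rq)
        self.pending.add(ctx.index_hw)

    def flush(self):               # like blk_mq_flush_busy_ctxs()
        done = []
        for idx in sorted(self.pending):
            done += self.ctxs[idx].rq_list
            self.ctxs[idx].rq_list = []
        self.pending.clear()
        return done

def remap(hctx, ctxs):             # what blk_mq_map_swqueue() does
    hctx.ctxs = list(ctxs)
    for i, c in enumerate(ctxs):
        c.index_hw = i

ctx0, ctx1 = Ctx(0), Ctx(1)
hctx = Hctx()
remap(hctx, [ctx0])                # only CPU0 online at init time

# CPU1 comes online; without a remap its index_hw is still 0, so the
# insert sets bit 0, which flush() resolves to ctx0 -- and finds empty.
hctx.insert(ctx1, "rq-from-cpu1")
assert hctx.flush() == []                        # request is stuck
assert ctx1.rq_list == ["rq-from-cpu1"]

# With the CPUHP_BLK_MQ_PREPARE callback the remap runs first, and the
# stuck request is finally dispatched too:
remap(hctx, [ctx0, ctx1])
hctx.insert(ctx1, "rq2")
assert hctx.flush() == ["rq-from-cpu1", "rq2"]
```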
* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
@ 2017-11-21 20:12 ` Christian Borntraeger
0 siblings, 0 replies; 96+ messages in thread
From: Christian Borntraeger @ 2017-11-21 20:12 UTC (permalink / raw)
To: Jens Axboe, Bart Van Assche, virtualization, linux-block, mst,
jasowang, linux-kernel, Christoph Hellwig
On 11/21/2017 08:30 PM, Jens Axboe wrote:
> On 11/21/2017 12:15 PM, Christian Borntraeger wrote:
>>
>>
>> On 11/21/2017 07:39 PM, Jens Axboe wrote:
>>> On 11/21/2017 11:27 AM, Jens Axboe wrote:
>>>> On 11/21/2017 11:12 AM, Christian Borntraeger wrote:
>>>>>
>>>>>
>>>>> On 11/21/2017 07:09 PM, Jens Axboe wrote:
>>>>>> On 11/21/2017 10:27 AM, Jens Axboe wrote:
>>>>>>> On 11/21/2017 03:14 AM, Christian Borntraeger wrote:
>>>>>>>> Bisect points to
>>>>>>>>
>>>>>>>> 1b5a7455d345b223d3a4658a9e5fce985b7998c1 is the first bad commit
>>>>>>>> commit 1b5a7455d345b223d3a4658a9e5fce985b7998c1
>>>>>>>> Author: Christoph Hellwig <hch@lst.de>
>>>>>>>> Date: Mon Jun 26 12:20:57 2017 +0200
>>>>>>>>
>>>>>>>> blk-mq: Create hctx for each present CPU
>>>>>>>>
>>>>>>>> commit 4b855ad37194f7bdbb200ce7a1c7051fecb56a08 upstream.
>>>>>>>>
>>>>>>>> Currently we only create hctx for online CPUs, which can lead to a lot
>>>>>>>> of churn due to frequent soft offline / online operations. Instead
>>>>>>>> allocate one for each present CPU to avoid this and dramatically simplify
>>>>>>>> the code.
>>>>>>>>
>>>>>>>> Signed-off-by: Christoph Hellwig <hch@lst.de>
>>>>>>>> Reviewed-by: Jens Axboe <axboe@kernel.dk>
>>>>>>>> Cc: Keith Busch <keith.busch@intel.com>
>>>>>>>> Cc: linux-block@vger.kernel.org
>>>>>>>> Cc: linux-nvme@lists.infradead.org
>>>>>>>> Link: http://lkml.kernel.org/r/20170626102058.10200-3-hch@lst.de
>>>>>>>> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
>>>>>>>> Cc: Oleksandr Natalenko <oleksandr@natalenko.name>
>>>>>>>> Cc: Mike Galbraith <efault@gmx.de>
>>>>>>>> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>>>>>>>
>>>>>>> I wonder if we're simply not getting the masks updated correctly. I'll
>>>>>>> take a look.
>>>>>>
>>>>>> Can't make it trigger here. We do init for each present CPU, which means
>>>>>> that if I offline a few CPUs here and register a queue, those still show
>>>>>> up as present (just offline) and get mapped accordingly.
>>>>>>
>>>>>> From the looks of it, your setup is different. If the CPU doesn't show
>>>>>> up as present and it gets hotplugged, then I can see how this condition
>>>>>> would trigger. What environment are you running this in? We might have
>>>>>> to re-introduce the cpu hotplug notifier, right now we just monitor
>>>>>> for a dead cpu and handle that.
>>>>>
>>>>> I am not doing a hot unplug and the replug, I use KVM and add a previously
>>>>> not available CPU.
>>>>>
>>>>> in libvirt/virsh speak:
>>>>> <vcpu placement='static' current='1'>4</vcpu>
>>>>
>>>> So that's why we run into problems. It's not present when we load the device,
>>>> but becomes present and online afterwards.
>>>>
>>>> Christoph, we used to handle this just fine, your patch broke it.
>>>>
>>>> I'll see if I can come up with an appropriate fix.
>>>
>>> Can you try the below?
>>
>>
>> It does prevent the crash but it seems that the new CPU is not "used" after the hotplug for mq:
>>
>>
>> output with 2 cpus:
>> /sys/kernel/debug/block/vda
>> /sys/kernel/debug/block/vda/hctx0
>> /sys/kernel/debug/block/vda/hctx0/cpu0
>> /sys/kernel/debug/block/vda/hctx0/cpu0/completed
>> /sys/kernel/debug/block/vda/hctx0/cpu0/merged
>> /sys/kernel/debug/block/vda/hctx0/cpu0/dispatched
>> /sys/kernel/debug/block/vda/hctx0/cpu0/rq_list
>> /sys/kernel/debug/block/vda/hctx0/active
>> /sys/kernel/debug/block/vda/hctx0/run
>> /sys/kernel/debug/block/vda/hctx0/queued
>> /sys/kernel/debug/block/vda/hctx0/dispatched
>> /sys/kernel/debug/block/vda/hctx0/io_poll
>> /sys/kernel/debug/block/vda/hctx0/sched_tags_bitmap
>> /sys/kernel/debug/block/vda/hctx0/sched_tags
>> /sys/kernel/debug/block/vda/hctx0/tags_bitmap
>> /sys/kernel/debug/block/vda/hctx0/tags
>> /sys/kernel/debug/block/vda/hctx0/ctx_map
>> /sys/kernel/debug/block/vda/hctx0/busy
>> /sys/kernel/debug/block/vda/hctx0/dispatch
>> /sys/kernel/debug/block/vda/hctx0/flags
>> /sys/kernel/debug/block/vda/hctx0/state
>> /sys/kernel/debug/block/vda/sched
>> /sys/kernel/debug/block/vda/sched/dispatch
>> /sys/kernel/debug/block/vda/sched/starved
>> /sys/kernel/debug/block/vda/sched/batching
>> /sys/kernel/debug/block/vda/sched/write_next_rq
>> /sys/kernel/debug/block/vda/sched/write_fifo_list
>> /sys/kernel/debug/block/vda/sched/read_next_rq
>> /sys/kernel/debug/block/vda/sched/read_fifo_list
>> /sys/kernel/debug/block/vda/write_hints
>> /sys/kernel/debug/block/vda/state
>> /sys/kernel/debug/block/vda/requeue_list
>> /sys/kernel/debug/block/vda/poll_stat
>
> Try this, basically just a revert.
Yes, seems to work.
Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
Do you know why the original commit made it into 4.12 stable? After all
it has no Fixes tag and no Cc: stable.
>
>
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 11097477eeab..bc1950fa9ef6 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -37,6 +37,9 @@
> #include "blk-wbt.h"
> #include "blk-mq-sched.h"
>
> +static DEFINE_MUTEX(all_q_mutex);
> +static LIST_HEAD(all_q_list);
> +
> static bool blk_mq_poll(struct request_queue *q, blk_qc_t cookie);
> static void blk_mq_poll_stats_start(struct request_queue *q);
> static void blk_mq_poll_stats_fn(struct blk_stat_callback *cb);
> @@ -2114,8 +2117,8 @@ static void blk_mq_init_cpu_queues(struct request_queue *q,
> INIT_LIST_HEAD(&__ctx->rq_list);
> __ctx->queue = q;
>
> - /* If the cpu isn't present, the cpu is mapped to first hctx */
> - if (!cpu_present(i))
> + /* If the cpu isn't online, the cpu is mapped to first hctx */
> + if (!cpu_online(i))
> continue;
>
> hctx = blk_mq_map_queue(q, i);
> @@ -2158,7 +2161,8 @@ static void blk_mq_free_map_and_requests(struct blk_mq_tag_set *set,
> }
> }
>
> -static void blk_mq_map_swqueue(struct request_queue *q)
> +static void blk_mq_map_swqueue(struct request_queue *q,
> + const struct cpumask *online_mask)
> {
> unsigned int i, hctx_idx;
> struct blk_mq_hw_ctx *hctx;
> @@ -2176,11 +2180,13 @@ static void blk_mq_map_swqueue(struct request_queue *q)
> }
>
> /*
> - * Map software to hardware queues.
> - *
> - * If the cpu isn't present, the cpu is mapped to first hctx.
> + * Map software to hardware queues
> */
> - for_each_present_cpu(i) {
> + for_each_possible_cpu(i) {
> + /* If the cpu isn't online, the cpu is mapped to first hctx */
> + if (!cpumask_test_cpu(i, online_mask))
> + continue;
> +
> hctx_idx = q->mq_map[i];
> /* unmapped hw queue can be remapped after CPU topo changed */
> if (!set->tags[hctx_idx] &&
> @@ -2495,8 +2501,16 @@ struct request_queue *blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
> blk_queue_softirq_done(q, set->ops->complete);
>
> blk_mq_init_cpu_queues(q, set->nr_hw_queues);
> +
> + get_online_cpus();
> + mutex_lock(&all_q_mutex);
> +
> + list_add_tail(&q->all_q_node, &all_q_list);
> blk_mq_add_queue_tag_set(set, q);
> - blk_mq_map_swqueue(q);
> + blk_mq_map_swqueue(q, cpu_online_mask);
> +
> + mutex_unlock(&all_q_mutex);
> + put_online_cpus();
>
> if (!(set->flags & BLK_MQ_F_NO_SCHED)) {
> int ret;
> @@ -2522,12 +2536,18 @@ void blk_mq_free_queue(struct request_queue *q)
> {
> struct blk_mq_tag_set *set = q->tag_set;
>
> + mutex_lock(&all_q_mutex);
> + list_del_init(&q->all_q_node);
> + mutex_unlock(&all_q_mutex);
> +
> blk_mq_del_queue_tag_set(q);
> +
> blk_mq_exit_hw_queues(q, set, set->nr_hw_queues);
> }
>
> /* Basically redo blk_mq_init_queue with queue frozen */
> -static void blk_mq_queue_reinit(struct request_queue *q)
> +static void blk_mq_queue_reinit(struct request_queue *q,
> + const struct cpumask *online_mask)
> {
> WARN_ON_ONCE(!atomic_read(&q->mq_freeze_depth));
>
> @@ -2539,12 +2559,76 @@ static void blk_mq_queue_reinit(struct request_queue *q)
> * we should change hctx numa_node according to the new topology (this
> * involves freeing and re-allocating memory, worth doing?)
> */
> - blk_mq_map_swqueue(q);
> + blk_mq_map_swqueue(q, online_mask);
>
> blk_mq_sysfs_register(q);
> blk_mq_debugfs_register_hctxs(q);
> }
>
> +/*
> + * New online cpumask which is going to be set in this hotplug event.
> + * Declare this cpumask as global as cpu-hotplug operation is invoked
> + * one-by-one and dynamically allocating this could result in a failure.
> + */
> +static struct cpumask cpuhp_online_new;
> +
> +static void blk_mq_queue_reinit_work(void)
> +{
> + struct request_queue *q;
> +
> + mutex_lock(&all_q_mutex);
> + /*
> + * We need to freeze and reinit all existing queues. Freezing
> + * involves synchronous wait for an RCU grace period and doing it
> + * one by one may take a long time. Start freezing all queues in
> + * one swoop and then wait for the completions so that freezing can
> + * take place in parallel.
> + */
> + list_for_each_entry(q, &all_q_list, all_q_node)
> + blk_freeze_queue_start(q);
> + list_for_each_entry(q, &all_q_list, all_q_node)
> + blk_mq_freeze_queue_wait(q);
> +
> + list_for_each_entry(q, &all_q_list, all_q_node)
> + blk_mq_queue_reinit(q, &cpuhp_online_new);
> +
> + list_for_each_entry(q, &all_q_list, all_q_node)
> + blk_mq_unfreeze_queue(q);
> +
> + mutex_unlock(&all_q_mutex);
> +}
> +
> +static int blk_mq_queue_reinit_dead(unsigned int cpu)
> +{
> + cpumask_copy(&cpuhp_online_new, cpu_online_mask);
> + blk_mq_queue_reinit_work();
> + return 0;
> +}
> +
> +/*
> + * Before hotadded cpu starts handling requests, new mappings must be
> + * established. Otherwise, these requests in hw queue might never be
> + * dispatched.
> + *
> + * For example, there is a single hw queue (hctx) and two CPU queues (ctx0
> + * for CPU0, and ctx1 for CPU1).
> + *
> + * Now CPU1 is just onlined and a request is inserted into ctx1->rq_list
> + * and set bit0 in pending bitmap as ctx1->index_hw is still zero.
> + *
> + * And then while running hw queue, blk_mq_flush_busy_ctxs() finds bit0 is set
> + * in pending bitmap and tries to retrieve requests in hctx->ctxs[0]->rq_list.
> + * But hctx->ctxs[0] is a pointer to ctx0, so the request in ctx1->rq_list is
> + * ignored.
> + */
> +static int blk_mq_queue_reinit_prepare(unsigned int cpu)
> +{
> + cpumask_copy(&cpuhp_online_new, cpu_online_mask);
> + cpumask_set_cpu(cpu, &cpuhp_online_new);
> + blk_mq_queue_reinit_work();
> + return 0;
> +}
> +
> static int __blk_mq_alloc_rq_maps(struct blk_mq_tag_set *set)
> {
> int i;
> @@ -2757,7 +2841,7 @@ static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set,
> blk_mq_update_queue_map(set);
> list_for_each_entry(q, &set->tag_list, tag_set_list) {
> blk_mq_realloc_hw_ctxs(set, q);
> - blk_mq_queue_reinit(q);
> + blk_mq_queue_reinit(q, cpu_online_mask);
> }
>
> list_for_each_entry(q, &set->tag_list, tag_set_list)
> @@ -2966,6 +3050,16 @@ static bool blk_mq_poll(struct request_queue *q, blk_qc_t cookie)
> return __blk_mq_poll(hctx, rq);
> }
>
> +void blk_mq_disable_hotplug(void)
> +{
> + mutex_lock(&all_q_mutex);
> +}
> +
> +void blk_mq_enable_hotplug(void)
> +{
> + mutex_unlock(&all_q_mutex);
> +}
> +
> static int __init blk_mq_init(void)
> {
> /*
> @@ -2976,6 +3070,10 @@ static int __init blk_mq_init(void)
>
> cpuhp_setup_state_multi(CPUHP_BLK_MQ_DEAD, "block/mq:dead", NULL,
> blk_mq_hctx_notify_dead);
> +
> + cpuhp_setup_state_nocalls(CPUHP_BLK_MQ_PREPARE, "block/mq:prepare",
> + blk_mq_queue_reinit_prepare,
> + blk_mq_queue_reinit_dead);
> return 0;
> }
> subsys_initcall(blk_mq_init);
> diff --git a/block/blk-mq.h b/block/blk-mq.h
> index 6c7c3ff5bf62..83b13ef1915e 100644
> --- a/block/blk-mq.h
> +++ b/block/blk-mq.h
> @@ -59,6 +59,11 @@ void __blk_mq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
> void blk_mq_request_bypass_insert(struct request *rq, bool run_queue);
> void blk_mq_insert_requests(struct blk_mq_hw_ctx *hctx, struct blk_mq_ctx *ctx,
> struct list_head *list);
> +/*
> + * CPU hotplug helpers
> + */
> +void blk_mq_enable_hotplug(void);
> +void blk_mq_disable_hotplug(void);
>
> /*
> * CPU -> queue mappings
> diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
> index 201ab7267986..c31d4e3bf6d0 100644
> --- a/include/linux/cpuhotplug.h
> +++ b/include/linux/cpuhotplug.h
> @@ -76,6 +76,7 @@ enum cpuhp_state {
> CPUHP_XEN_EVTCHN_PREPARE,
> CPUHP_ARM_SHMOBILE_SCU_PREPARE,
> CPUHP_SH_SH3X_PREPARE,
> + CPUHP_BLK_MQ_PREPARE,
> CPUHP_NET_FLOW_PREPARE,
> CPUHP_TOPOLOGY_PREPARE,
> CPUHP_NET_IUCV_PREPARE,
>
^ permalink raw reply [flat|nested] 96+ messages in thread
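[Editor's illustration] The comment in `blk_mq_queue_reinit_work()` above (start freezing every queue, then wait for all of them) can be modeled by counting RCU grace periods. This is a sketch with invented helper names, only to show why batching costs one grace period instead of one per queue:

```python
# Toy model: a queue freeze completes one RCU grace period after it
# was started.  Starting every freeze before waiting lets all queues
# ride out the same grace period.

def sequential_freeze(nqueues):
    gp = 0
    for _ in range(nqueues):
        started = gp          # blk_freeze_queue_start()
        gp = started + 1      # blk_mq_freeze_queue_wait() blocks 1 gp
    return gp                 # grace periods elapsed in total

def batched_freeze(nqueues):
    gp = 0
    started = [gp] * nqueues  # start them all first...
    for s in started:         # ...then wait; all share gp 0 -> 1
        gp = max(gp, s + 1)
    return gp

assert sequential_freeze(8) == 8   # one grace period per queue
assert batched_freeze(8) == 1      # every queue shares a single one
```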
* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
2017-11-21 20:12 ` Christian Borntraeger
@ 2017-11-21 20:14 ` Jens Axboe
1 sibling, 0 replies; 96+ messages in thread
From: Jens Axboe @ 2017-11-21 20:14 UTC (permalink / raw)
To: Christian Borntraeger, Bart Van Assche, virtualization,
linux-block, mst, jasowang, linux-kernel, Christoph Hellwig
On 11/21/2017 01:12 PM, Christian Borntraeger wrote:
>
>
> On 11/21/2017 08:30 PM, Jens Axboe wrote:
>> On 11/21/2017 12:15 PM, Christian Borntraeger wrote:
>>>
>>>
>>> On 11/21/2017 07:39 PM, Jens Axboe wrote:
>>>> On 11/21/2017 11:27 AM, Jens Axboe wrote:
>>>>> On 11/21/2017 11:12 AM, Christian Borntraeger wrote:
>>>>>>
>>>>>>
>>>>>> On 11/21/2017 07:09 PM, Jens Axboe wrote:
>>>>>>> On 11/21/2017 10:27 AM, Jens Axboe wrote:
>>>>>>>> On 11/21/2017 03:14 AM, Christian Borntraeger wrote:
>>>>>>>>> Bisect points to
>>>>>>>>>
>>>>>>>>> 1b5a7455d345b223d3a4658a9e5fce985b7998c1 is the first bad commit
>>>>>>>>> commit 1b5a7455d345b223d3a4658a9e5fce985b7998c1
>>>>>>>>> Author: Christoph Hellwig <hch@lst.de>
>>>>>>>>> Date: Mon Jun 26 12:20:57 2017 +0200
>>>>>>>>>
>>>>>>>>> blk-mq: Create hctx for each present CPU
>>>>>>>>>
>>>>>>>>> commit 4b855ad37194f7bdbb200ce7a1c7051fecb56a08 upstream.
>>>>>>>>>
>>>>>>>>> Currently we only create hctx for online CPUs, which can lead to a lot
>>>>>>>>> of churn due to frequent soft offline / online operations. Instead
>>>>>>>>> allocate one for each present CPU to avoid this and dramatically simplify
>>>>>>>>> the code.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Christoph Hellwig <hch@lst.de>
>>>>>>>>> Reviewed-by: Jens Axboe <axboe@kernel.dk>
>>>>>>>>> Cc: Keith Busch <keith.busch@intel.com>
>>>>>>>>> Cc: linux-block@vger.kernel.org
>>>>>>>>> Cc: linux-nvme@lists.infradead.org
>>>>>>>>> Link: http://lkml.kernel.org/r/20170626102058.10200-3-hch@lst.de
>>>>>>>>> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
>>>>>>>>> Cc: Oleksandr Natalenko <oleksandr@natalenko.name>
>>>>>>>>> Cc: Mike Galbraith <efault@gmx.de>
>>>>>>>>> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>>>>>>>>
>>>>>>>> I wonder if we're simply not getting the masks updated correctly. I'll
>>>>>>>> take a look.
>>>>>>>
>>>>>>> Can't make it trigger here. We do init for each present CPU, which means
>>>>>>> that if I offline a few CPUs here and register a queue, those still show
>>>>>>> up as present (just offline) and get mapped accordingly.
>>>>>>>
>>>>>>> From the looks of it, your setup is different. If the CPU doesn't show
>>>>>>> up as present and it gets hotplugged, then I can see how this condition
>>>>>>> would trigger. What environment are you running this in? We might have
>>>>>>> to re-introduce the cpu hotplug notifier, right now we just monitor
>>>>>>> for a dead cpu and handle that.
>>>>>>
>>>>>> I am not doing a hot unplug and the replug, I use KVM and add a previously
>>>>>> not available CPU.
>>>>>>
>>>>>> in libvirt/virsh speak:
>>>>>> <vcpu placement='static' current='1'>4</vcpu>
>>>>>
>>>>> So that's why we run into problems. It's not present when we load the device,
>>>>> but becomes present and online afterwards.
>>>>>
>>>>> Christoph, we used to handle this just fine, your patch broke it.
>>>>>
>>>>> I'll see if I can come up with an appropriate fix.
>>>>
>>>> Can you try the below?
>>>
>>>
>>> It does prevent the crash but it seems that the new CPU is not "used" after the hotplug for mq:
>>>
>>>
>>> output with 2 cpus:
>>> /sys/kernel/debug/block/vda
>>> /sys/kernel/debug/block/vda/hctx0
>>> /sys/kernel/debug/block/vda/hctx0/cpu0
>>> /sys/kernel/debug/block/vda/hctx0/cpu0/completed
>>> /sys/kernel/debug/block/vda/hctx0/cpu0/merged
>>> /sys/kernel/debug/block/vda/hctx0/cpu0/dispatched
>>> /sys/kernel/debug/block/vda/hctx0/cpu0/rq_list
>>> /sys/kernel/debug/block/vda/hctx0/active
>>> /sys/kernel/debug/block/vda/hctx0/run
>>> /sys/kernel/debug/block/vda/hctx0/queued
>>> /sys/kernel/debug/block/vda/hctx0/dispatched
>>> /sys/kernel/debug/block/vda/hctx0/io_poll
>>> /sys/kernel/debug/block/vda/hctx0/sched_tags_bitmap
>>> /sys/kernel/debug/block/vda/hctx0/sched_tags
>>> /sys/kernel/debug/block/vda/hctx0/tags_bitmap
>>> /sys/kernel/debug/block/vda/hctx0/tags
>>> /sys/kernel/debug/block/vda/hctx0/ctx_map
>>> /sys/kernel/debug/block/vda/hctx0/busy
>>> /sys/kernel/debug/block/vda/hctx0/dispatch
>>> /sys/kernel/debug/block/vda/hctx0/flags
>>> /sys/kernel/debug/block/vda/hctx0/state
>>> /sys/kernel/debug/block/vda/sched
>>> /sys/kernel/debug/block/vda/sched/dispatch
>>> /sys/kernel/debug/block/vda/sched/starved
>>> /sys/kernel/debug/block/vda/sched/batching
>>> /sys/kernel/debug/block/vda/sched/write_next_rq
>>> /sys/kernel/debug/block/vda/sched/write_fifo_list
>>> /sys/kernel/debug/block/vda/sched/read_next_rq
>>> /sys/kernel/debug/block/vda/sched/read_fifo_list
>>> /sys/kernel/debug/block/vda/write_hints
>>> /sys/kernel/debug/block/vda/state
>>> /sys/kernel/debug/block/vda/requeue_list
>>> /sys/kernel/debug/block/vda/poll_stat
>>
>> Try this, basically just a revert.
>
> Yes, seems to work.
>
> Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
Great, thanks for testing.
> Do you know why the original commit made it into 4.12 stable? After all
> it has no Fixes tag and no Cc: stable.
I was wondering the same thing when you said it was in 4.12.stable and
not in 4.12 release. That patch should absolutely not have gone into
stable, it's not marked as such and it's not fixing a problem that is
stable worthy. In fact, it's causing a regression...
Greg? Upstream commit is mentioned higher up, start of the email.
--
Jens Axboe
^ permalink raw reply [flat|nested] 96+ messages in thread
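[Editor's illustration] The mapping behavior restored by the revert quoted above (offline CPUs fall back to the first hctx; a hotplug event reruns the mapping with the new online mask) can be sketched as follows. Plain Python, with `mq_map` as an illustrative stand-in for `q->mq_map`:

```python
# Sketch of the sw->hw queue mapping in the reverted code path:
# possible CPUs absent from the online mask stay on hctx 0, and the
# prepare callback reruns the mapping once a CPU is hot-added.

def map_swqueue(possible_cpus, online_mask, mq_map):
    mapping = {}
    for cpu in possible_cpus:
        if cpu not in online_mask:
            mapping[cpu] = 0          # offline -> first hctx
        else:
            mapping[cpu] = mq_map[cpu]
    return mapping

possible = [0, 1, 2, 3]
mq_map = {0: 0, 1: 1, 2: 0, 3: 1}     # two hw queues

# Boot with only CPU0 online (the libvirt current='1' case):
assert map_swqueue(possible, {0}, mq_map) == {0: 0, 1: 0, 2: 0, 3: 0}

# After hot-adding CPU1, the remap against the new mask picks up the
# real hctx for CPU1 while the still-offline CPUs remain on hctx 0:
assert map_swqueue(possible, {0, 1}, mq_map) == {0: 0, 1: 1, 2: 0, 3: 0}
```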
* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
2017-11-21 20:14 ` Jens Axboe
@ 2017-11-21 20:19 ` Christian Borntraeger
1 sibling, 0 replies; 96+ messages in thread
From: Christian Borntraeger @ 2017-11-21 20:19 UTC (permalink / raw)
To: Jens Axboe, Bart Van Assche, virtualization, linux-block, mst,
jasowang, linux-kernel, Christoph Hellwig, Greg Kroah-Hartman,
stable
On 11/21/2017 09:14 PM, Jens Axboe wrote:
> On 11/21/2017 01:12 PM, Christian Borntraeger wrote:
>>
>>
>> On 11/21/2017 08:30 PM, Jens Axboe wrote:
>>> On 11/21/2017 12:15 PM, Christian Borntraeger wrote:
>>>>
>>>>
>>>> On 11/21/2017 07:39 PM, Jens Axboe wrote:
>>>>> On 11/21/2017 11:27 AM, Jens Axboe wrote:
>>>>>> On 11/21/2017 11:12 AM, Christian Borntraeger wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 11/21/2017 07:09 PM, Jens Axboe wrote:
>>>>>>>> On 11/21/2017 10:27 AM, Jens Axboe wrote:
>>>>>>>>> On 11/21/2017 03:14 AM, Christian Borntraeger wrote:
>>>>>>>>>> Bisect points to
>>>>>>>>>>
>>>>>>>>>> 1b5a7455d345b223d3a4658a9e5fce985b7998c1 is the first bad commit
>>>>>>>>>> commit 1b5a7455d345b223d3a4658a9e5fce985b7998c1
>>>>>>>>>> Author: Christoph Hellwig <hch@lst.de>
>>>>>>>>>> Date: Mon Jun 26 12:20:57 2017 +0200
>>>>>>>>>>
>>>>>>>>>> blk-mq: Create hctx for each present CPU
>>>>>>>>>>
>>>>>>>>>> commit 4b855ad37194f7bdbb200ce7a1c7051fecb56a08 upstream.
>>>>>>>>>>
>>>>>>>>>> Currently we only create hctx for online CPUs, which can lead to a lot
>>>>>>>>>> of churn due to frequent soft offline / online operations. Instead
>>>>>>>>>> allocate one for each present CPU to avoid this and dramatically simplify
>>>>>>>>>> the code.
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Christoph Hellwig <hch@lst.de>
>>>>>>>>>> Reviewed-by: Jens Axboe <axboe@kernel.dk>
>>>>>>>>>> Cc: Keith Busch <keith.busch@intel.com>
>>>>>>>>>> Cc: linux-block@vger.kernel.org
>>>>>>>>>> Cc: linux-nvme@lists.infradead.org
>>>>>>>>>> Link: http://lkml.kernel.org/r/20170626102058.10200-3-hch@lst.de
>>>>>>>>>> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
>>>>>>>>>> Cc: Oleksandr Natalenko <oleksandr@natalenko.name>
>>>>>>>>>> Cc: Mike Galbraith <efault@gmx.de>
>>>>>>>>>> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>>>>>>>>>
>>>>>>>>> I wonder if we're simply not getting the masks updated correctly. I'll
>>>>>>>>> take a look.
>>>>>>>>
>>>>>>>> Can't make it trigger here. We do init for each present CPU, which means
>>>>>>>> that if I offline a few CPUs here and register a queue, those still show
>>>>>>>> up as present (just offline) and get mapped accordingly.
>>>>>>>>
>>>>>>>> From the looks of it, your setup is different. If the CPU doesn't show
>>>>>>>> up as present and it gets hotplugged, then I can see how this condition
>>>>>>>> would trigger. What environment are you running this in? We might have
>>>>>>>> to re-introduce the cpu hotplug notifier; right now we just monitor
>>>>>>>> for a dead cpu and handle that.
>>>>>>>
>>>>>>> I am not doing a hot unplug and then a replug; I use KVM to add a previously
>>>>>>> unavailable CPU.
>>>>>>>
>>>>>>> in libvirt/virsh speak:
>>>>>>> <vcpu placement='static' current='1'>4</vcpu>
>>>>>>
>>>>>> So that's why we run into problems. It's not present when we load the device,
>>>>>> but becomes present and online afterwards.
>>>>>>
>>>>>> Christoph, we used to handle this just fine, your patch broke it.
>>>>>>
>>>>>> I'll see if I can come up with an appropriate fix.
>>>>>
>>>>> Can you try the below?
>>>>
>>>>
>>>> It does prevent the crash, but it seems that the new CPU is not "used" after the hotplug for mq:
>>>>
>>>>
>>>> output with 2 cpus:
>>>> /sys/kernel/debug/block/vda
>>>> /sys/kernel/debug/block/vda/hctx0
>>>> /sys/kernel/debug/block/vda/hctx0/cpu0
>>>> /sys/kernel/debug/block/vda/hctx0/cpu0/completed
>>>> /sys/kernel/debug/block/vda/hctx0/cpu0/merged
>>>> /sys/kernel/debug/block/vda/hctx0/cpu0/dispatched
>>>> /sys/kernel/debug/block/vda/hctx0/cpu0/rq_list
>>>> /sys/kernel/debug/block/vda/hctx0/active
>>>> /sys/kernel/debug/block/vda/hctx0/run
>>>> /sys/kernel/debug/block/vda/hctx0/queued
>>>> /sys/kernel/debug/block/vda/hctx0/dispatched
>>>> /sys/kernel/debug/block/vda/hctx0/io_poll
>>>> /sys/kernel/debug/block/vda/hctx0/sched_tags_bitmap
>>>> /sys/kernel/debug/block/vda/hctx0/sched_tags
>>>> /sys/kernel/debug/block/vda/hctx0/tags_bitmap
>>>> /sys/kernel/debug/block/vda/hctx0/tags
>>>> /sys/kernel/debug/block/vda/hctx0/ctx_map
>>>> /sys/kernel/debug/block/vda/hctx0/busy
>>>> /sys/kernel/debug/block/vda/hctx0/dispatch
>>>> /sys/kernel/debug/block/vda/hctx0/flags
>>>> /sys/kernel/debug/block/vda/hctx0/state
>>>> /sys/kernel/debug/block/vda/sched
>>>> /sys/kernel/debug/block/vda/sched/dispatch
>>>> /sys/kernel/debug/block/vda/sched/starved
>>>> /sys/kernel/debug/block/vda/sched/batching
>>>> /sys/kernel/debug/block/vda/sched/write_next_rq
>>>> /sys/kernel/debug/block/vda/sched/write_fifo_list
>>>> /sys/kernel/debug/block/vda/sched/read_next_rq
>>>> /sys/kernel/debug/block/vda/sched/read_fifo_list
>>>> /sys/kernel/debug/block/vda/write_hints
>>>> /sys/kernel/debug/block/vda/state
>>>> /sys/kernel/debug/block/vda/requeue_list
>>>> /sys/kernel/debug/block/vda/poll_stat
>>>
>>> Try this, basically just a revert.
>>
>> Yes, seems to work.
>>
>> Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
>
> Great, thanks for testing.
>
>> Do you know why the original commit made it into 4.12 stable? After all
>> it has no Fixes tag and no Cc: stable.
>
> I was wondering the same thing when you said it was in 4.12.stable and
> not in 4.12 release. That patch should absolutely not have gone into
> stable, it's not marked as such and it's not fixing a problem that is
> stable worthy. In fact, it's causing a regression...
>
> Greg? Upstream commit is mentioned higher up, start of the email.
>
Forgot to cc Greg?
^ permalink raw reply [flat|nested] 96+ messages in thread
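The failure mode discussed above can be sketched as a toy model: blk-mq builds its ctx-to-hctx mapping from the CPUs present when the queue is registered, so a vCPU that only becomes present afterwards has no mapping. This is illustrative Python, not kernel code; `build_cpu_map` is a made-up stand-in for the real mapping logic:

```python
# Toy model of the regression: a static cpu->hctx map is built once,
# from the CPUs present at queue-registration time.  A CPU that becomes
# present only after a later hotplug has no entry, and I/O submitted
# from it trips the warning quoted in the subject line.

def build_cpu_map(present_cpus, nr_hctx):
    # Stand-in for the real mapping: round-robin CPUs onto hctxs.
    return {cpu: cpu % nr_hctx for cpu in sorted(present_cpus)}

# KVM guest boots with current='1': only CPU 0 is present at probe time.
cpu_map = build_cpu_map(present_cpus={0}, nr_hctx=1)

# libvirt later makes CPU 1 present and online; the map is never rebuilt.
print(1 in cpu_map)  # False -> the hotplugged CPU has no hctx mapping
```

With the pre-4b855ad37194 behavior (and the eventual revert), the map is rebuilt on CPU online events, so the new CPU picks up a mapping instead.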
* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
@ 2017-11-21 20:19 ` Christian Borntraeger
0 siblings, 0 replies; 96+ messages in thread
From: Christian Borntraeger @ 2017-11-21 20:19 UTC (permalink / raw)
To: Jens Axboe, Bart Van Assche, virtualization, linux-block, mst,
jasowang, linux-kernel, Christoph Hellwig, Greg Kroah-Hartman,
stable
On 11/21/2017 09:14 PM, Jens Axboe wrote:
> On 11/21/2017 01:12 PM, Christian Borntraeger wrote:
>>
>>
>> On 11/21/2017 08:30 PM, Jens Axboe wrote:
>>> On 11/21/2017 12:15 PM, Christian Borntraeger wrote:
>>>>
>>>>
>>>> On 11/21/2017 07:39 PM, Jens Axboe wrote:
>>>>> On 11/21/2017 11:27 AM, Jens Axboe wrote:
>>>>>> On 11/21/2017 11:12 AM, Christian Borntraeger wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 11/21/2017 07:09 PM, Jens Axboe wrote:
>>>>>>>> On 11/21/2017 10:27 AM, Jens Axboe wrote:
>>>>>>>>> On 11/21/2017 03:14 AM, Christian Borntraeger wrote:
>>>>>>>>>> Bisect points to
>>>>>>>>>>
>>>>>>>>>> 1b5a7455d345b223d3a4658a9e5fce985b7998c1 is the first bad commit
>>>>>>>>>> commit 1b5a7455d345b223d3a4658a9e5fce985b7998c1
>>>>>>>>>> Author: Christoph Hellwig <hch@lst.de>
>>>>>>>>>> Date: Mon Jun 26 12:20:57 2017 +0200
>>>>>>>>>>
>>>>>>>>>> blk-mq: Create hctx for each present CPU
>>>>>>>>>>
>>>>>>>>>> commit 4b855ad37194f7bdbb200ce7a1c7051fecb56a08 upstream.
>>>>>>>>>>
>>>>>>>>>> Currently we only create hctx for online CPUs, which can lead to a lot
>>>>>>>>>> of churn due to frequent soft offline / online operations. Instead
>>>>>>>>>> allocate one for each present CPU to avoid this and dramatically simplify
>>>>>>>>>> the code.
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Christoph Hellwig <hch@lst.de>
>>>>>>>>>> Reviewed-by: Jens Axboe <axboe@kernel.dk>
>>>>>>>>>> Cc: Keith Busch <keith.busch@intel.com>
>>>>>>>>>> Cc: linux-block@vger.kernel.org
>>>>>>>>>> Cc: linux-nvme@lists.infradead.org
>>>>>>>>>> Link: http://lkml.kernel.org/r/20170626102058.10200-3-hch@lst.de
>>>>>>>>>> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
>>>>>>>>>> Cc: Oleksandr Natalenko <oleksandr@natalenko.name>
>>>>>>>>>> Cc: Mike Galbraith <efault@gmx.de>
>>>>>>>>>> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>>>>>>>>>
>>>>>>>>> I wonder if we're simply not getting the masks updated correctly. I'll
>>>>>>>>> take a look.
>>>>>>>>
>>>>>>>> Can't make it trigger here. We do init for each present CPU, which means
>>>>>>>> that if I offline a few CPUs here and register a queue, those still show
>>>>>>>> up as present (just offline) and get mapped accordingly.
>>>>>>>>
>>>>>>>> From the looks of it, your setup is different. If the CPU doesn't show
>>>>>>>> up as present and it gets hotplugged, then I can see how this condition
>>>>>>>> would trigger. What environment are you running this in? We might have
>>>>>>>> to re-introduce the cpu hotplug notifier; right now we just monitor
>>>>>>>> for a dead cpu and handle that.
>>>>>>>
>>>>>>> I am not doing a hot unplug and then a replug; I use KVM to add a previously
>>>>>>> unavailable CPU.
>>>>>>>
>>>>>>> in libvirt/virsh speak:
>>>>>>> <vcpu placement='static' current='1'>4</vcpu>
>>>>>>
>>>>>> So that's why we run into problems. It's not present when we load the device,
>>>>>> but becomes present and online afterwards.
>>>>>>
>>>>>> Christoph, we used to handle this just fine, your patch broke it.
>>>>>>
>>>>>> I'll see if I can come up with an appropriate fix.
>>>>>
>>>>> Can you try the below?
>>>>
>>>>
>>>> It does prevent the crash, but it seems that the new CPU is not "used" after the hotplug for mq:
>>>>
>>>>
>>>> output with 2 cpus:
>>>> /sys/kernel/debug/block/vda
>>>> /sys/kernel/debug/block/vda/hctx0
>>>> /sys/kernel/debug/block/vda/hctx0/cpu0
>>>> /sys/kernel/debug/block/vda/hctx0/cpu0/completed
>>>> /sys/kernel/debug/block/vda/hctx0/cpu0/merged
>>>> /sys/kernel/debug/block/vda/hctx0/cpu0/dispatched
>>>> /sys/kernel/debug/block/vda/hctx0/cpu0/rq_list
>>>> /sys/kernel/debug/block/vda/hctx0/active
>>>> /sys/kernel/debug/block/vda/hctx0/run
>>>> /sys/kernel/debug/block/vda/hctx0/queued
>>>> /sys/kernel/debug/block/vda/hctx0/dispatched
>>>> /sys/kernel/debug/block/vda/hctx0/io_poll
>>>> /sys/kernel/debug/block/vda/hctx0/sched_tags_bitmap
>>>> /sys/kernel/debug/block/vda/hctx0/sched_tags
>>>> /sys/kernel/debug/block/vda/hctx0/tags_bitmap
>>>> /sys/kernel/debug/block/vda/hctx0/tags
>>>> /sys/kernel/debug/block/vda/hctx0/ctx_map
>>>> /sys/kernel/debug/block/vda/hctx0/busy
>>>> /sys/kernel/debug/block/vda/hctx0/dispatch
>>>> /sys/kernel/debug/block/vda/hctx0/flags
>>>> /sys/kernel/debug/block/vda/hctx0/state
>>>> /sys/kernel/debug/block/vda/sched
>>>> /sys/kernel/debug/block/vda/sched/dispatch
>>>> /sys/kernel/debug/block/vda/sched/starved
>>>> /sys/kernel/debug/block/vda/sched/batching
>>>> /sys/kernel/debug/block/vda/sched/write_next_rq
>>>> /sys/kernel/debug/block/vda/sched/write_fifo_list
>>>> /sys/kernel/debug/block/vda/sched/read_next_rq
>>>> /sys/kernel/debug/block/vda/sched/read_fifo_list
>>>> /sys/kernel/debug/block/vda/write_hints
>>>> /sys/kernel/debug/block/vda/state
>>>> /sys/kernel/debug/block/vda/requeue_list
>>>> /sys/kernel/debug/block/vda/poll_stat
>>>
>>> Try this, basically just a revert.
>>
>> Yes, seems to work.
>>
>> Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
>
> Great, thanks for testing.
>
>> Do you know why the original commit made it into 4.12 stable? After all
>> it has no Fixes tag and no Cc: stable.
>
> I was wondering the same thing when you said it was in 4.12.stable and
> not in 4.12 release. That patch should absolutely not have gone into
> stable, it's not marked as such and it's not fixing a problem that is
> stable worthy. In fact, it's causing a regression...
>
> Greg? Upstream commit is mentioned higher up, start of the email.
>
Forgot to cc Greg?
^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
2017-11-21 20:19 ` Christian Borntraeger
@ 2017-11-21 20:21 ` Jens Axboe
-1 siblings, 0 replies; 96+ messages in thread
From: Jens Axboe @ 2017-11-21 20:21 UTC (permalink / raw)
To: Christian Borntraeger, Bart Van Assche, virtualization,
linux-block, mst, jasowang, linux-kernel, Christoph Hellwig,
Greg Kroah-Hartman, stable
On 11/21/2017 01:19 PM, Christian Borntraeger wrote:
>
> On 11/21/2017 09:14 PM, Jens Axboe wrote:
>> On 11/21/2017 01:12 PM, Christian Borntraeger wrote:
>>>
>>>
>>> On 11/21/2017 08:30 PM, Jens Axboe wrote:
>>>> On 11/21/2017 12:15 PM, Christian Borntraeger wrote:
>>>>>
>>>>>
>>>>> On 11/21/2017 07:39 PM, Jens Axboe wrote:
>>>>>> On 11/21/2017 11:27 AM, Jens Axboe wrote:
>>>>>>> On 11/21/2017 11:12 AM, Christian Borntraeger wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 11/21/2017 07:09 PM, Jens Axboe wrote:
>>>>>>>>> On 11/21/2017 10:27 AM, Jens Axboe wrote:
>>>>>>>>>> On 11/21/2017 03:14 AM, Christian Borntraeger wrote:
>>>>>>>>>>> Bisect points to
>>>>>>>>>>>
>>>>>>>>>>> 1b5a7455d345b223d3a4658a9e5fce985b7998c1 is the first bad commit
>>>>>>>>>>> commit 1b5a7455d345b223d3a4658a9e5fce985b7998c1
>>>>>>>>>>> Author: Christoph Hellwig <hch@lst.de>
>>>>>>>>>>> Date: Mon Jun 26 12:20:57 2017 +0200
>>>>>>>>>>>
>>>>>>>>>>> blk-mq: Create hctx for each present CPU
>>>>>>>>>>>
>>>>>>>>>>> commit 4b855ad37194f7bdbb200ce7a1c7051fecb56a08 upstream.
>>>>>>>>>>>
>>>>>>>>>>> Currently we only create hctx for online CPUs, which can lead to a lot
>>>>>>>>>>> of churn due to frequent soft offline / online operations. Instead
>>>>>>>>>>> allocate one for each present CPU to avoid this and dramatically simplify
>>>>>>>>>>> the code.
>>>>>>>>>>>
>>>>>>>>>>> Signed-off-by: Christoph Hellwig <hch@lst.de>
>>>>>>>>>>> Reviewed-by: Jens Axboe <axboe@kernel.dk>
>>>>>>>>>>> Cc: Keith Busch <keith.busch@intel.com>
>>>>>>>>>>> Cc: linux-block@vger.kernel.org
>>>>>>>>>>> Cc: linux-nvme@lists.infradead.org
>>>>>>>>>>> Link: http://lkml.kernel.org/r/20170626102058.10200-3-hch@lst.de
>>>>>>>>>>> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
>>>>>>>>>>> Cc: Oleksandr Natalenko <oleksandr@natalenko.name>
>>>>>>>>>>> Cc: Mike Galbraith <efault@gmx.de>
>>>>>>>>>>> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>>>>>>>>>>
>>>>>>>>>> I wonder if we're simply not getting the masks updated correctly. I'll
>>>>>>>>>> take a look.
>>>>>>>>>
>>>>>>>>> Can't make it trigger here. We do init for each present CPU, which means
>>>>>>>>> that if I offline a few CPUs here and register a queue, those still show
>>>>>>>>> up as present (just offline) and get mapped accordingly.
>>>>>>>>>
>>>>>>>>> From the looks of it, your setup is different. If the CPU doesn't show
>>>>>>>>> up as present and it gets hotplugged, then I can see how this condition
>>>>>>>>> would trigger. What environment are you running this in? We might have
>>>>>>>>> to re-introduce the cpu hotplug notifier; right now we just monitor
>>>>>>>>> for a dead cpu and handle that.
>>>>>>>>
>>>>>>>> I am not doing a hot unplug and then a replug; I use KVM to add a previously
>>>>>>>> unavailable CPU.
>>>>>>>>
>>>>>>>> in libvirt/virsh speak:
>>>>>>>> <vcpu placement='static' current='1'>4</vcpu>
>>>>>>>
>>>>>>> So that's why we run into problems. It's not present when we load the device,
>>>>>>> but becomes present and online afterwards.
>>>>>>>
>>>>>>> Christoph, we used to handle this just fine, your patch broke it.
>>>>>>>
>>>>>>> I'll see if I can come up with an appropriate fix.
>>>>>>
>>>>>> Can you try the below?
>>>>>
>>>>>
>>>>> It does prevent the crash, but it seems that the new CPU is not "used" after the hotplug for mq:
>>>>>
>>>>>
>>>>> output with 2 cpus:
>>>>> /sys/kernel/debug/block/vda
>>>>> /sys/kernel/debug/block/vda/hctx0
>>>>> /sys/kernel/debug/block/vda/hctx0/cpu0
>>>>> /sys/kernel/debug/block/vda/hctx0/cpu0/completed
>>>>> /sys/kernel/debug/block/vda/hctx0/cpu0/merged
>>>>> /sys/kernel/debug/block/vda/hctx0/cpu0/dispatched
>>>>> /sys/kernel/debug/block/vda/hctx0/cpu0/rq_list
>>>>> /sys/kernel/debug/block/vda/hctx0/active
>>>>> /sys/kernel/debug/block/vda/hctx0/run
>>>>> /sys/kernel/debug/block/vda/hctx0/queued
>>>>> /sys/kernel/debug/block/vda/hctx0/dispatched
>>>>> /sys/kernel/debug/block/vda/hctx0/io_poll
>>>>> /sys/kernel/debug/block/vda/hctx0/sched_tags_bitmap
>>>>> /sys/kernel/debug/block/vda/hctx0/sched_tags
>>>>> /sys/kernel/debug/block/vda/hctx0/tags_bitmap
>>>>> /sys/kernel/debug/block/vda/hctx0/tags
>>>>> /sys/kernel/debug/block/vda/hctx0/ctx_map
>>>>> /sys/kernel/debug/block/vda/hctx0/busy
>>>>> /sys/kernel/debug/block/vda/hctx0/dispatch
>>>>> /sys/kernel/debug/block/vda/hctx0/flags
>>>>> /sys/kernel/debug/block/vda/hctx0/state
>>>>> /sys/kernel/debug/block/vda/sched
>>>>> /sys/kernel/debug/block/vda/sched/dispatch
>>>>> /sys/kernel/debug/block/vda/sched/starved
>>>>> /sys/kernel/debug/block/vda/sched/batching
>>>>> /sys/kernel/debug/block/vda/sched/write_next_rq
>>>>> /sys/kernel/debug/block/vda/sched/write_fifo_list
>>>>> /sys/kernel/debug/block/vda/sched/read_next_rq
>>>>> /sys/kernel/debug/block/vda/sched/read_fifo_list
>>>>> /sys/kernel/debug/block/vda/write_hints
>>>>> /sys/kernel/debug/block/vda/state
>>>>> /sys/kernel/debug/block/vda/requeue_list
>>>>> /sys/kernel/debug/block/vda/poll_stat
>>>>
>>>> Try this, basically just a revert.
>>>
>>> Yes, seems to work.
>>>
>>> Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
>>
>> Great, thanks for testing.
>>
>>> Do you know why the original commit made it into 4.12 stable? After all
>>> it has no Fixes tag and no Cc: stable.
>>
>> I was wondering the same thing when you said it was in 4.12.stable and
>> not in 4.12 release. That patch should absolutely not have gone into
>> stable, it's not marked as such and it's not fixing a problem that is
>> stable worthy. In fact, it's causing a regression...
>>
>> Greg? Upstream commit is mentioned higher up, start of the email.
>>
>
>
> Forgot to cc Greg?
I did, thanks for doing that. Now I wonder how to mark this patch,
as we should revert it from kernels that picked up the bad commit. 4.12
itself is fine; the later 4.12-stable releases are not.
--
Jens Axboe
^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
2017-11-21 20:21 ` Jens Axboe
@ 2017-11-21 20:31 ` Christian Borntraeger
-1 siblings, 0 replies; 96+ messages in thread
From: Christian Borntraeger @ 2017-11-21 20:31 UTC (permalink / raw)
To: Jens Axboe, Bart Van Assche, virtualization, linux-block, mst,
jasowang, linux-kernel, Christoph Hellwig, Greg Kroah-Hartman,
stable
On 11/21/2017 09:21 PM, Jens Axboe wrote:
> On 11/21/2017 01:19 PM, Christian Borntraeger wrote:
>>
>> On 11/21/2017 09:14 PM, Jens Axboe wrote:
>>> On 11/21/2017 01:12 PM, Christian Borntraeger wrote:
>>>>
>>>>
>>>> On 11/21/2017 08:30 PM, Jens Axboe wrote:
>>>>> On 11/21/2017 12:15 PM, Christian Borntraeger wrote:
>>>>>>
>>>>>>
>>>>>> On 11/21/2017 07:39 PM, Jens Axboe wrote:
>>>>>>> On 11/21/2017 11:27 AM, Jens Axboe wrote:
>>>>>>>> On 11/21/2017 11:12 AM, Christian Borntraeger wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 11/21/2017 07:09 PM, Jens Axboe wrote:
>>>>>>>>>> On 11/21/2017 10:27 AM, Jens Axboe wrote:
>>>>>>>>>>> On 11/21/2017 03:14 AM, Christian Borntraeger wrote:
>>>>>>>>>>>> Bisect points to
>>>>>>>>>>>>
>>>>>>>>>>>> 1b5a7455d345b223d3a4658a9e5fce985b7998c1 is the first bad commit
>>>>>>>>>>>> commit 1b5a7455d345b223d3a4658a9e5fce985b7998c1
>>>>>>>>>>>> Author: Christoph Hellwig <hch@lst.de>
>>>>>>>>>>>> Date: Mon Jun 26 12:20:57 2017 +0200
>>>>>>>>>>>>
>>>>>>>>>>>> blk-mq: Create hctx for each present CPU
>>>>>>>>>>>>
>>>>>>>>>>>> commit 4b855ad37194f7bdbb200ce7a1c7051fecb56a08 upstream.
>>>>>>>>>>>>
>>>>>>>>>>>> Currently we only create hctx for online CPUs, which can lead to a lot
>>>>>>>>>>>> of churn due to frequent soft offline / online operations. Instead
>>>>>>>>>>>> allocate one for each present CPU to avoid this and dramatically simplify
>>>>>>>>>>>> the code.
>>>>>>>>>>>>
>>>>>>>>>>>> Signed-off-by: Christoph Hellwig <hch@lst.de>
>>>>>>>>>>>> Reviewed-by: Jens Axboe <axboe@kernel.dk>
>>>>>>>>>>>> Cc: Keith Busch <keith.busch@intel.com>
>>>>>>>>>>>> Cc: linux-block@vger.kernel.org
>>>>>>>>>>>> Cc: linux-nvme@lists.infradead.org
>>>>>>>>>>>> Link: http://lkml.kernel.org/r/20170626102058.10200-3-hch@lst.de
>>>>>>>>>>>> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
>>>>>>>>>>>> Cc: Oleksandr Natalenko <oleksandr@natalenko.name>
>>>>>>>>>>>> Cc: Mike Galbraith <efault@gmx.de>
>>>>>>>>>>>> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>>>>>>>>>>>
>>>>>>>>>>> I wonder if we're simply not getting the masks updated correctly. I'll
>>>>>>>>>>> take a look.
>>>>>>>>>>
>>>>>>>>>> Can't make it trigger here. We do init for each present CPU, which means
>>>>>>>>>> that if I offline a few CPUs here and register a queue, those still show
>>>>>>>>>> up as present (just offline) and get mapped accordingly.
>>>>>>>>>>
>>>>>>>>>> From the looks of it, your setup is different. If the CPU doesn't show
>>>>>>>>>> up as present and it gets hotplugged, then I can see how this condition
>>>>>>>>>> would trigger. What environment are you running this in? We might have
>>>>>>>>>> to re-introduce the cpu hotplug notifier; right now we just monitor
>>>>>>>>>> for a dead cpu and handle that.
>>>>>>>>>
>>>>>>>>> I am not doing a hot unplug and then a replug; I use KVM to add a previously
>>>>>>>>> unavailable CPU.
>>>>>>>>>
>>>>>>>>> in libvirt/virsh speak:
>>>>>>>>> <vcpu placement='static' current='1'>4</vcpu>
>>>>>>>>
>>>>>>>> So that's why we run into problems. It's not present when we load the device,
>>>>>>>> but becomes present and online afterwards.
>>>>>>>>
>>>>>>>> Christoph, we used to handle this just fine, your patch broke it.
>>>>>>>>
>>>>>>>> I'll see if I can come up with an appropriate fix.
>>>>>>>
>>>>>>> Can you try the below?
>>>>>>
>>>>>>
>>>>>> It does prevent the crash, but it seems that the new CPU is not "used" after the hotplug for mq:
>>>>>>
>>>>>>
>>>>>> output with 2 cpus:
>>>>>> /sys/kernel/debug/block/vda
>>>>>> /sys/kernel/debug/block/vda/hctx0
>>>>>> /sys/kernel/debug/block/vda/hctx0/cpu0
>>>>>> /sys/kernel/debug/block/vda/hctx0/cpu0/completed
>>>>>> /sys/kernel/debug/block/vda/hctx0/cpu0/merged
>>>>>> /sys/kernel/debug/block/vda/hctx0/cpu0/dispatched
>>>>>> /sys/kernel/debug/block/vda/hctx0/cpu0/rq_list
>>>>>> /sys/kernel/debug/block/vda/hctx0/active
>>>>>> /sys/kernel/debug/block/vda/hctx0/run
>>>>>> /sys/kernel/debug/block/vda/hctx0/queued
>>>>>> /sys/kernel/debug/block/vda/hctx0/dispatched
>>>>>> /sys/kernel/debug/block/vda/hctx0/io_poll
>>>>>> /sys/kernel/debug/block/vda/hctx0/sched_tags_bitmap
>>>>>> /sys/kernel/debug/block/vda/hctx0/sched_tags
>>>>>> /sys/kernel/debug/block/vda/hctx0/tags_bitmap
>>>>>> /sys/kernel/debug/block/vda/hctx0/tags
>>>>>> /sys/kernel/debug/block/vda/hctx0/ctx_map
>>>>>> /sys/kernel/debug/block/vda/hctx0/busy
>>>>>> /sys/kernel/debug/block/vda/hctx0/dispatch
>>>>>> /sys/kernel/debug/block/vda/hctx0/flags
>>>>>> /sys/kernel/debug/block/vda/hctx0/state
>>>>>> /sys/kernel/debug/block/vda/sched
>>>>>> /sys/kernel/debug/block/vda/sched/dispatch
>>>>>> /sys/kernel/debug/block/vda/sched/starved
>>>>>> /sys/kernel/debug/block/vda/sched/batching
>>>>>> /sys/kernel/debug/block/vda/sched/write_next_rq
>>>>>> /sys/kernel/debug/block/vda/sched/write_fifo_list
>>>>>> /sys/kernel/debug/block/vda/sched/read_next_rq
>>>>>> /sys/kernel/debug/block/vda/sched/read_fifo_list
>>>>>> /sys/kernel/debug/block/vda/write_hints
>>>>>> /sys/kernel/debug/block/vda/state
>>>>>> /sys/kernel/debug/block/vda/requeue_list
>>>>>> /sys/kernel/debug/block/vda/poll_stat
>>>>>
>>>>> Try this, basically just a revert.
>>>>
>>>> Yes, seems to work.
>>>>
>>>> Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
>>>
>>> Great, thanks for testing.
>>>
>>>> Do you know why the original commit made it into 4.12 stable? After all
>>>> it has no Fixes tag and no Cc: stable.
>>>
>>> I was wondering the same thing when you said it was in 4.12.stable and
>>> not in 4.12 release. That patch should absolutely not have gone into
>>> stable, it's not marked as such and it's not fixing a problem that is
>>> stable worthy. In fact, it's causing a regression...
>>>
>>> Greg? Upstream commit is mentioned higher up, start of the email.
>>>
>>
>>
>> Forgot to cc Greg?
>
> I did, thanks for doing that. Now I wonder how to mark this patch,
> as we should revert it from kernels that have the bad commit. 4.12
> is fine, 4.12.later-stable is not.
>
I think we should tag it with:
Fixes: 4b855ad37194 ("blk-mq: Create hctx for each present CPU")
which should bring it into 4.13 stable and 4.14 stable. 4.12 stable seems EOL anyway.
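The proposed tag follows the usual kernel convention for Fixes: lines: a 12-character abbreviated commit SHA followed by the commit subject in quotes. A minimal sketch of the format (the `fixes_tag` helper is made up here for illustration):

```python
def fixes_tag(sha: str, subject: str) -> str:
    """Format a kernel-style Fixes: tag: 12-hex-digit SHA, quoted subject."""
    return f'Fixes: {sha[:12]} ("{subject}")'

print(fixes_tag("4b855ad37194f7bdbb200ce7a1c7051fecb56a08",
                "blk-mq: Create hctx for each present CPU"))
# Fixes: 4b855ad37194 ("blk-mq: Create hctx for each present CPU")
```

A Fixes: tag like this is what lets the stable maintainers' tooling pick the revert up for every stable tree that carried the offending commit.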
^ permalink raw reply [flat|nested] 96+ messages in thread
>>> not in 4.12 release. That patch should absolutely not have gone into
>>> stable, it's not marked as such and it's not fixing a problem that is
>>> stable worthy. In fact, it's causing a regression...
>>>
>>> Greg? Upstream commit is mentioned higher up, start of the email.
>>>
>>
>>
>> Forgot to cc Greg?
>
> I did, thanks for doing that. Now I wonder how to mark this patch,
> as we should revert it from kernels that have the bad commit. 4.12
> is fine, 4.12.later-stable is not.
>
I think we should tag it with:
Fixes: 4b855ad37194 ("blk-mq: Create hctx for each present CPU")
which should bring it into 4.13 stable and 4.14 stable. 4.12 stable seems EOL anyway.
^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
@ 2017-11-21 20:31 ` Christian Borntraeger
0 siblings, 0 replies; 96+ messages in thread
From: Christian Borntraeger @ 2017-11-21 20:31 UTC (permalink / raw)
To: Jens Axboe, Bart Van Assche, virtualization, linux-block, mst,
jasowang, linux-kernel, Christoph Hellwig, Greg Kroah-Hartman,
stable
On 11/21/2017 09:21 PM, Jens Axboe wrote:
> On 11/21/2017 01:19 PM, Christian Borntraeger wrote:
>>
>> On 11/21/2017 09:14 PM, Jens Axboe wrote:
>>> On 11/21/2017 01:12 PM, Christian Borntraeger wrote:
>>>>
>>>>
>>>> On 11/21/2017 08:30 PM, Jens Axboe wrote:
>>>>> On 11/21/2017 12:15 PM, Christian Borntraeger wrote:
>>>>>>
>>>>>>
>>>>>> On 11/21/2017 07:39 PM, Jens Axboe wrote:
>>>>>>> On 11/21/2017 11:27 AM, Jens Axboe wrote:
>>>>>>>> On 11/21/2017 11:12 AM, Christian Borntraeger wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 11/21/2017 07:09 PM, Jens Axboe wrote:
>>>>>>>>>> On 11/21/2017 10:27 AM, Jens Axboe wrote:
>>>>>>>>>>> On 11/21/2017 03:14 AM, Christian Borntraeger wrote:
>>>>>>>>>>>> Bisect points to
>>>>>>>>>>>>
>>>>>>>>>>>> 1b5a7455d345b223d3a4658a9e5fce985b7998c1 is the first bad commit
>>>>>>>>>>>> commit 1b5a7455d345b223d3a4658a9e5fce985b7998c1
>>>>>>>>>>>> Author: Christoph Hellwig <hch@lst.de>
>>>>>>>>>>>> Date: Mon Jun 26 12:20:57 2017 +0200
>>>>>>>>>>>>
>>>>>>>>>>>> blk-mq: Create hctx for each present CPU
>>>>>>>>>>>>
>>>>>>>>>>>> commit 4b855ad37194f7bdbb200ce7a1c7051fecb56a08 upstream.
>>>>>>>>>>>>
>>>>>>>>>>>> Currently we only create hctx for online CPUs, which can lead to a lot
>>>>>>>>>>>> of churn due to frequent soft offline / online operations. Instead
>>>>>>>>>>>> allocate one for each present CPU to avoid this and dramatically simplify
>>>>>>>>>>>> the code.
>>>>>>>>>>>>
>>>>>>>>>>>> Signed-off-by: Christoph Hellwig <hch@lst.de>
>>>>>>>>>>>> Reviewed-by: Jens Axboe <axboe@kernel.dk>
>>>>>>>>>>>> Cc: Keith Busch <keith.busch@intel.com>
>>>>>>>>>>>> Cc: linux-block@vger.kernel.org
>>>>>>>>>>>> Cc: linux-nvme@lists.infradead.org
>>>>>>>>>>>> Link: http://lkml.kernel.org/r/20170626102058.10200-3-hch@lst.de
>>>>>>>>>>>> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
>>>>>>>>>>>> Cc: Oleksandr Natalenko <oleksandr@natalenko.name>
>>>>>>>>>>>> Cc: Mike Galbraith <efault@gmx.de>
>>>>>>>>>>>> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>>>>>>>>>>>
>>>>>>>>>>> I wonder if we're simply not getting the masks updated correctly. I'll
>>>>>>>>>>> take a look.
>>>>>>>>>>
>>>>>>>>>> Can't make it trigger here. We do init for each present CPU, which means
>>>>>>>>>> that if I offline a few CPUs here and register a queue, those still show
>>>>>>>>>> up as present (just offline) and get mapped accordingly.
>>>>>>>>>>
>>>>>>>>>> From the looks of it, your setup is different. If the CPU doesn't show
>>>>>>>>>> up as present and it gets hotplugged, then I can see how this condition
>>>>>>>>>> would trigger. What environment are you running this in? We might have
>>>>>>>>>> to re-introduce the cpu hotplug notifier, right now we just monitor
>>>>>>>>>> for a dead cpu and handle that.
>>>>>>>>>
>>>>>>>>> I am not doing a hot unplug and the replug, I use KVM and add a previously
>>>>>>>>> not available CPU.
>>>>>>>>>
>>>>>>>>> in libvirt/virsh speak:
>>>>>>>>> <vcpu placement='static' current='1'>4</vcpu>
>>>>>>>>
>>>>>>>> So that's why we run into problems. It's not present when we load the device,
>>>>>>>> but becomes present and online afterwards.
>>>>>>>>
>>>>>>>> Christoph, we used to handle this just fine, your patch broke it.
>>>>>>>>
>>>>>>>> I'll see if I can come up with an appropriate fix.
>>>>>>>
>>>>>>> Can you try the below?
>>>>>>
>>>>>>
>>>>>> It does prevent the crash but it seems that the new CPU is not "used" after the hotplug for mq:
>>>>>>
>>>>>>
>>>>>> output with 2 cpus:
>>>>>> /sys/kernel/debug/block/vda
>>>>>> /sys/kernel/debug/block/vda/hctx0
>>>>>> /sys/kernel/debug/block/vda/hctx0/cpu0
>>>>>> /sys/kernel/debug/block/vda/hctx0/cpu0/completed
>>>>>> /sys/kernel/debug/block/vda/hctx0/cpu0/merged
>>>>>> /sys/kernel/debug/block/vda/hctx0/cpu0/dispatched
>>>>>> /sys/kernel/debug/block/vda/hctx0/cpu0/rq_list
>>>>>> /sys/kernel/debug/block/vda/hctx0/active
>>>>>> /sys/kernel/debug/block/vda/hctx0/run
>>>>>> /sys/kernel/debug/block/vda/hctx0/queued
>>>>>> /sys/kernel/debug/block/vda/hctx0/dispatched
>>>>>> /sys/kernel/debug/block/vda/hctx0/io_poll
>>>>>> /sys/kernel/debug/block/vda/hctx0/sched_tags_bitmap
>>>>>> /sys/kernel/debug/block/vda/hctx0/sched_tags
>>>>>> /sys/kernel/debug/block/vda/hctx0/tags_bitmap
>>>>>> /sys/kernel/debug/block/vda/hctx0/tags
>>>>>> /sys/kernel/debug/block/vda/hctx0/ctx_map
>>>>>> /sys/kernel/debug/block/vda/hctx0/busy
>>>>>> /sys/kernel/debug/block/vda/hctx0/dispatch
>>>>>> /sys/kernel/debug/block/vda/hctx0/flags
>>>>>> /sys/kernel/debug/block/vda/hctx0/state
>>>>>> /sys/kernel/debug/block/vda/sched
>>>>>> /sys/kernel/debug/block/vda/sched/dispatch
>>>>>> /sys/kernel/debug/block/vda/sched/starved
>>>>>> /sys/kernel/debug/block/vda/sched/batching
>>>>>> /sys/kernel/debug/block/vda/sched/write_next_rq
>>>>>> /sys/kernel/debug/block/vda/sched/write_fifo_list
>>>>>> /sys/kernel/debug/block/vda/sched/read_next_rq
>>>>>> /sys/kernel/debug/block/vda/sched/read_fifo_list
>>>>>> /sys/kernel/debug/block/vda/write_hints
>>>>>> /sys/kernel/debug/block/vda/state
>>>>>> /sys/kernel/debug/block/vda/requeue_list
>>>>>> /sys/kernel/debug/block/vda/poll_stat
>>>>>
>>>>> Try this, basically just a revert.
>>>>
>>>> Yes, seems to work.
>>>>
>>>> Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
>>>
>>> Great, thanks for testing.
>>>
>>>> Do you know why the original commit made it into 4.12 stable? After all
>>>> it has no Fixes tag and no cc stable.
>>>
>>> I was wondering the same thing when you said it was in 4.12.stable and
>>> not in 4.12 release. That patch should absolutely not have gone into
>>> stable, it's not marked as such and it's not fixing a problem that is
>>> stable worthy. In fact, it's causing a regression...
>>>
>>> Greg? Upstream commit is mentioned higher up, start of the email.
>>>
>>
>>
>> Forgot to cc Greg?
>
> I did, thanks for doing that. Now I wonder how to mark this patch,
> as we should revert it from kernels that have the bad commit. 4.12
> is fine, 4.12.later-stable is not.
>
I think we should tag it with:
Fixes: 4b855ad37194 ("blk-mq: Create hctx for each present CPU")
which should bring it into 4.13 stable and 4.14 stable. 4.12 stable seems EOL anyway.
^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
2017-11-21 20:31 ` Christian Borntraeger
@ 2017-11-21 20:39 ` Jens Axboe
-1 siblings, 0 replies; 96+ messages in thread
From: Jens Axboe @ 2017-11-21 20:39 UTC (permalink / raw)
To: Christian Borntraeger, Bart Van Assche, virtualization,
linux-block, mst, jasowang, linux-kernel, Christoph Hellwig,
Greg Kroah-Hartman, stable
On 11/21/2017 01:31 PM, Christian Borntraeger wrote:
>
>
> On 11/21/2017 09:21 PM, Jens Axboe wrote:
>> On 11/21/2017 01:19 PM, Christian Borntraeger wrote:
>>>
>>> On 11/21/2017 09:14 PM, Jens Axboe wrote:
>>>> On 11/21/2017 01:12 PM, Christian Borntraeger wrote:
>>>>>
>>>>>
>>>>> On 11/21/2017 08:30 PM, Jens Axboe wrote:
>>>>>> On 11/21/2017 12:15 PM, Christian Borntraeger wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 11/21/2017 07:39 PM, Jens Axboe wrote:
>>>>>>>> On 11/21/2017 11:27 AM, Jens Axboe wrote:
>>>>>>>>> On 11/21/2017 11:12 AM, Christian Borntraeger wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 11/21/2017 07:09 PM, Jens Axboe wrote:
>>>>>>>>>>> On 11/21/2017 10:27 AM, Jens Axboe wrote:
>>>>>>>>>>>> On 11/21/2017 03:14 AM, Christian Borntraeger wrote:
>>>>>>>>>>>>> Bisect points to
>>>>>>>>>>>>>
>>>>>>>>>>>>> 1b5a7455d345b223d3a4658a9e5fce985b7998c1 is the first bad commit
>>>>>>>>>>>>> commit 1b5a7455d345b223d3a4658a9e5fce985b7998c1
>>>>>>>>>>>>> Author: Christoph Hellwig <hch@lst.de>
>>>>>>>>>>>>> Date: Mon Jun 26 12:20:57 2017 +0200
>>>>>>>>>>>>>
>>>>>>>>>>>>> blk-mq: Create hctx for each present CPU
>>>>>>>>>>>>>
>>>>>>>>>>>>> commit 4b855ad37194f7bdbb200ce7a1c7051fecb56a08 upstream.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Currently we only create hctx for online CPUs, which can lead to a lot
>>>>>>>>>>>>> of churn due to frequent soft offline / online operations. Instead
>>>>>>>>>>>>> allocate one for each present CPU to avoid this and dramatically simplify
>>>>>>>>>>>>> the code.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Signed-off-by: Christoph Hellwig <hch@lst.de>
>>>>>>>>>>>>> Reviewed-by: Jens Axboe <axboe@kernel.dk>
>>>>>>>>>>>>> Cc: Keith Busch <keith.busch@intel.com>
>>>>>>>>>>>>> Cc: linux-block@vger.kernel.org
>>>>>>>>>>>>> Cc: linux-nvme@lists.infradead.org
>>>>>>>>>>>>> Link: http://lkml.kernel.org/r/20170626102058.10200-3-hch@lst.de
>>>>>>>>>>>>> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
>>>>>>>>>>>>> Cc: Oleksandr Natalenko <oleksandr@natalenko.name>
>>>>>>>>>>>>> Cc: Mike Galbraith <efault@gmx.de>
>>>>>>>>>>>>> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>>>>>>>>>>>>
>>>>>>>>>>>> I wonder if we're simply not getting the masks updated correctly. I'll
>>>>>>>>>>>> take a look.
>>>>>>>>>>>
>>>>>>>>>>> Can't make it trigger here. We do init for each present CPU, which means
>>>>>>>>>>> that if I offline a few CPUs here and register a queue, those still show
>>>>>>>>>>> up as present (just offline) and get mapped accordingly.
>>>>>>>>>>>
>>>>>>>>>>> From the looks of it, your setup is different. If the CPU doesn't show
>>>>>>>>>>> up as present and it gets hotplugged, then I can see how this condition
>>>>>>>>>>> would trigger. What environment are you running this in? We might have
>>>>>>>>>>> to re-introduce the cpu hotplug notifier, right now we just monitor
>>>>>>>>>>> for a dead cpu and handle that.
>>>>>>>>>>
>>>>>>>>>> I am not doing a hot unplug and the replug, I use KVM and add a previously
>>>>>>>>>> not available CPU.
>>>>>>>>>>
>>>>>>>>>> in libvirt/virsh speak:
>>>>>>>>>> <vcpu placement='static' current='1'>4</vcpu>
>>>>>>>>>
>>>>>>>>> So that's why we run into problems. It's not present when we load the device,
>>>>>>>>> but becomes present and online afterwards.
>>>>>>>>>
>>>>>>>>> Christoph, we used to handle this just fine, your patch broke it.
>>>>>>>>>
>>>>>>>>> I'll see if I can come up with an appropriate fix.
>>>>>>>>
>>>>>>>> Can you try the below?
>>>>>>>
>>>>>>>
>>>>>>> It does prevent the crash but it seems that the new CPU is not "used" after the hotplug for mq:
>>>>>>>
>>>>>>>
>>>>>>> output with 2 cpus:
>>>>>>> /sys/kernel/debug/block/vda
>>>>>>> /sys/kernel/debug/block/vda/hctx0
>>>>>>> /sys/kernel/debug/block/vda/hctx0/cpu0
>>>>>>> /sys/kernel/debug/block/vda/hctx0/cpu0/completed
>>>>>>> /sys/kernel/debug/block/vda/hctx0/cpu0/merged
>>>>>>> /sys/kernel/debug/block/vda/hctx0/cpu0/dispatched
>>>>>>> /sys/kernel/debug/block/vda/hctx0/cpu0/rq_list
>>>>>>> /sys/kernel/debug/block/vda/hctx0/active
>>>>>>> /sys/kernel/debug/block/vda/hctx0/run
>>>>>>> /sys/kernel/debug/block/vda/hctx0/queued
>>>>>>> /sys/kernel/debug/block/vda/hctx0/dispatched
>>>>>>> /sys/kernel/debug/block/vda/hctx0/io_poll
>>>>>>> /sys/kernel/debug/block/vda/hctx0/sched_tags_bitmap
>>>>>>> /sys/kernel/debug/block/vda/hctx0/sched_tags
>>>>>>> /sys/kernel/debug/block/vda/hctx0/tags_bitmap
>>>>>>> /sys/kernel/debug/block/vda/hctx0/tags
>>>>>>> /sys/kernel/debug/block/vda/hctx0/ctx_map
>>>>>>> /sys/kernel/debug/block/vda/hctx0/busy
>>>>>>> /sys/kernel/debug/block/vda/hctx0/dispatch
>>>>>>> /sys/kernel/debug/block/vda/hctx0/flags
>>>>>>> /sys/kernel/debug/block/vda/hctx0/state
>>>>>>> /sys/kernel/debug/block/vda/sched
>>>>>>> /sys/kernel/debug/block/vda/sched/dispatch
>>>>>>> /sys/kernel/debug/block/vda/sched/starved
>>>>>>> /sys/kernel/debug/block/vda/sched/batching
>>>>>>> /sys/kernel/debug/block/vda/sched/write_next_rq
>>>>>>> /sys/kernel/debug/block/vda/sched/write_fifo_list
>>>>>>> /sys/kernel/debug/block/vda/sched/read_next_rq
>>>>>>> /sys/kernel/debug/block/vda/sched/read_fifo_list
>>>>>>> /sys/kernel/debug/block/vda/write_hints
>>>>>>> /sys/kernel/debug/block/vda/state
>>>>>>> /sys/kernel/debug/block/vda/requeue_list
>>>>>>> /sys/kernel/debug/block/vda/poll_stat
>>>>>>
>>>>>> Try this, basically just a revert.
>>>>>
>>>>> Yes, seems to work.
>>>>>
>>>>> Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
>>>>
>>>> Great, thanks for testing.
>>>>
>>>>> Do you know why the original commit made it into 4.12 stable? After all
>>>>> it has no Fixes tag and no cc stable.
>>>>
>>>> I was wondering the same thing when you said it was in 4.12.stable and
>>>> not in 4.12 release. That patch should absolutely not have gone into
>>>> stable, it's not marked as such and it's not fixing a problem that is
>>>> stable worthy. In fact, it's causing a regression...
>>>>
>>>> Greg? Upstream commit is mentioned higher up, start of the email.
>>>>
>>>
>>>
>>> Forgot to cc Greg?
>>
>> I did, thanks for doing that. Now I wonder how to mark this patch,
>> as we should revert it from kernels that have the bad commit. 4.12
>> is fine, 4.12.later-stable is not.
>>
>
> I think we should tag it with:
>
> Fixes: 4b855ad37194 ("blk-mq: Create hctx for each present CPU")
>
> which should bring it into 4.13 stable and 4.14 stable. 4.12 stable seems EOL anyway.
Yeah, I think so too. But thinking more about this, I'm pretty sure this
adds a bad lock dependency with hotplug. Need to verify so we ensure we
don't introduce a potential deadlock here...
--
Jens Axboe
* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
2017-11-21 20:39 ` Jens Axboe
@ 2017-11-22 7:28 ` Christoph Hellwig
-1 siblings, 0 replies; 96+ messages in thread
From: Christoph Hellwig @ 2017-11-22 7:28 UTC (permalink / raw)
To: Jens Axboe
Cc: mst, Greg Kroah-Hartman, linux-kernel, stable, virtualization,
linux-block, Bart Van Assche, Christoph Hellwig
Jens, please don't just revert the commit in your for-linus tree.
On its own this will totally mess up the interrupt assignments. Give
me a bit of time to sort this out properly.
* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
2017-11-22 7:28 ` Christoph Hellwig
@ 2017-11-22 14:46 ` Jens Axboe
-1 siblings, 0 replies; 96+ messages in thread
From: Jens Axboe @ 2017-11-22 14:46 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Christian Borntraeger, Bart Van Assche, virtualization,
linux-block, mst, jasowang, linux-kernel, Greg Kroah-Hartman,
stable
On 11/22/2017 12:28 AM, Christoph Hellwig wrote:
> Jens, please don't just revert the commit in your for-linus tree.
>
> On its own this will totally mess up the interrupt assignments. Give
> me a bit of time to sort this out properly.
I wasn't going to push it until I heard otherwise. I'll just pop it
off, for-linus isn't a stable branch.
--
Jens Axboe
* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
2017-11-22 14:46 ` Jens Axboe
@ 2017-11-23 14:34 ` Christoph Hellwig
2017-11-23 14:42 ` Hannes Reinecke
` (2 more replies)
-1 siblings, 3 replies; 96+ messages in thread
From: Christoph Hellwig @ 2017-11-23 14:34 UTC (permalink / raw)
To: Jens Axboe
Cc: Christoph Hellwig, Christian Borntraeger, Bart Van Assche,
linux-block, linux-kernel, Thomas Gleixner
FYI, the patch below changes both the irq and block mappings to
always use the cpu possible map (should be split in two in due time).
I think this is the right way forward. For every normal machine
those two are the same, but for VMs with maxcpus above their normal
count or some big iron that can grow more cpus, it means we waste
a few more resources for the not present but reserved cpus. It
fixes the reported issue for me:
diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c
index 9f8cffc8a701..3eb169f15842 100644
--- a/block/blk-mq-cpumap.c
+++ b/block/blk-mq-cpumap.c
@@ -16,11 +16,6 @@
static int cpu_to_queue_index(unsigned int nr_queues, const int cpu)
{
- /*
- * Non present CPU will be mapped to queue index 0.
- */
- if (!cpu_present(cpu))
- return 0;
return cpu % nr_queues;
}
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 11097477eeab..612ce1fb7c4e 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2114,16 +2114,11 @@ static void blk_mq_init_cpu_queues(struct request_queue *q,
INIT_LIST_HEAD(&__ctx->rq_list);
__ctx->queue = q;
- /* If the cpu isn't present, the cpu is mapped to first hctx */
- if (!cpu_present(i))
- continue;
-
- hctx = blk_mq_map_queue(q, i);
-
/*
* Set local node, IFF we have more than one hw queue. If
* not, we remain on the home node of the device
*/
+ hctx = blk_mq_map_queue(q, i);
if (nr_hw_queues > 1 && hctx->numa_node == NUMA_NO_NODE)
hctx->numa_node = local_memory_node(cpu_to_node(i));
}
@@ -2180,7 +2175,7 @@ static void blk_mq_map_swqueue(struct request_queue *q)
*
* If the cpu isn't present, the cpu is mapped to first hctx.
*/
- for_each_present_cpu(i) {
+ for_each_possible_cpu(i) {
hctx_idx = q->mq_map[i];
/* unmapped hw queue can be remapped after CPU topo changed */
if (!set->tags[hctx_idx] &&
diff --git a/kernel/irq/affinity.c b/kernel/irq/affinity.c
index e12d35108225..a37a3b4b6342 100644
--- a/kernel/irq/affinity.c
+++ b/kernel/irq/affinity.c
@@ -39,7 +39,7 @@ static void irq_spread_init_one(struct cpumask *irqmsk, struct cpumask *nmsk,
}
}
-static cpumask_var_t *alloc_node_to_present_cpumask(void)
+static cpumask_var_t *alloc_node_to_possible_cpumask(void)
{
cpumask_var_t *masks;
int node;
@@ -62,7 +62,7 @@ static cpumask_var_t *alloc_node_to_present_cpumask(void)
return NULL;
}
-static void free_node_to_present_cpumask(cpumask_var_t *masks)
+static void free_node_to_possible_cpumask(cpumask_var_t *masks)
{
int node;
@@ -71,22 +71,22 @@ static void free_node_to_present_cpumask(cpumask_var_t *masks)
kfree(masks);
}
-static void build_node_to_present_cpumask(cpumask_var_t *masks)
+static void build_node_to_possible_cpumask(cpumask_var_t *masks)
{
int cpu;
- for_each_present_cpu(cpu)
+ for_each_possible_cpu(cpu)
cpumask_set_cpu(cpu, masks[cpu_to_node(cpu)]);
}
-static int get_nodes_in_cpumask(cpumask_var_t *node_to_present_cpumask,
+static int get_nodes_in_cpumask(cpumask_var_t *node_to_possible_cpumask,
const struct cpumask *mask, nodemask_t *nodemsk)
{
int n, nodes = 0;
/* Calculate the number of nodes in the supplied affinity mask */
for_each_node(n) {
- if (cpumask_intersects(mask, node_to_present_cpumask[n])) {
+ if (cpumask_intersects(mask, node_to_possible_cpumask[n])) {
node_set(n, *nodemsk);
nodes++;
}
@@ -109,7 +109,7 @@ irq_create_affinity_masks(int nvecs, const struct irq_affinity *affd)
int last_affv = affv + affd->pre_vectors;
nodemask_t nodemsk = NODE_MASK_NONE;
struct cpumask *masks;
- cpumask_var_t nmsk, *node_to_present_cpumask;
+ cpumask_var_t nmsk, *node_to_possible_cpumask;
/*
* If there aren't any vectors left after applying the pre/post
@@ -125,8 +125,8 @@ irq_create_affinity_masks(int nvecs, const struct irq_affinity *affd)
if (!masks)
goto out;
- node_to_present_cpumask = alloc_node_to_present_cpumask();
- if (!node_to_present_cpumask)
+ node_to_possible_cpumask = alloc_node_to_possible_cpumask();
+ if (!node_to_possible_cpumask)
goto out;
/* Fill out vectors at the beginning that don't need affinity */
@@ -135,8 +135,8 @@ irq_create_affinity_masks(int nvecs, const struct irq_affinity *affd)
/* Stabilize the cpumasks */
get_online_cpus();
- build_node_to_present_cpumask(node_to_present_cpumask);
- nodes = get_nodes_in_cpumask(node_to_present_cpumask, cpu_present_mask,
+ build_node_to_possible_cpumask(node_to_possible_cpumask);
+ nodes = get_nodes_in_cpumask(node_to_possible_cpumask, cpu_possible_mask,
&nodemsk);
/*
@@ -146,7 +146,7 @@ irq_create_affinity_masks(int nvecs, const struct irq_affinity *affd)
if (affv <= nodes) {
for_each_node_mask(n, nodemsk) {
cpumask_copy(masks + curvec,
- node_to_present_cpumask[n]);
+ node_to_possible_cpumask[n]);
if (++curvec == last_affv)
break;
}
@@ -160,7 +160,7 @@ irq_create_affinity_masks(int nvecs, const struct irq_affinity *affd)
vecs_per_node = (affv - (curvec - affd->pre_vectors)) / nodes;
/* Get the cpus on this node which are in the mask */
- cpumask_and(nmsk, cpu_present_mask, node_to_present_cpumask[n]);
+ cpumask_and(nmsk, cpu_possible_mask, node_to_possible_cpumask[n]);
/* Calculate the number of cpus per vector */
ncpus = cpumask_weight(nmsk);
@@ -192,7 +192,7 @@ irq_create_affinity_masks(int nvecs, const struct irq_affinity *affd)
/* Fill out vectors at the end that don't need affinity */
for (; curvec < nvecs; curvec++)
cpumask_copy(masks + curvec, irq_default_affinity);
- free_node_to_present_cpumask(node_to_present_cpumask);
+ free_node_to_possible_cpumask(node_to_possible_cpumask);
out:
free_cpumask_var(nmsk);
return masks;
@@ -214,7 +214,7 @@ int irq_calc_affinity_vectors(int minvec, int maxvec, const struct irq_affinity
return 0;
get_online_cpus();
- ret = min_t(int, cpumask_weight(cpu_present_mask), vecs) + resv;
+ ret = min_t(int, cpumask_weight(cpu_possible_mask), vecs) + resv;
put_online_cpus();
return ret;
}
^ permalink raw reply related [flat|nested] 96+ messages in thread
* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
2017-11-23 14:34 ` Christoph Hellwig
@ 2017-11-23 14:42 ` Hannes Reinecke
2017-11-23 14:47 ` Christoph Hellwig
2017-11-23 15:05 ` Christian Borntraeger
2017-11-23 18:17 ` Christian Borntraeger
2 siblings, 1 reply; 96+ messages in thread
From: Hannes Reinecke @ 2017-11-23 14:42 UTC (permalink / raw)
To: Christoph Hellwig, Jens Axboe
Cc: Christian Borntraeger, Bart Van Assche, linux-block,
linux-kernel, Thomas Gleixner
On 11/23/2017 03:34 PM, Christoph Hellwig wrote:
> FYI, the patch below changes both the irq and block mappings to
> always use the cpu possible map (should be split in two in due time).
>
> I think this is the right way forward. For every normal machine
> those two are the same, but for VMs with maxcpus above their normal
> count or some big iron that can grow more cpus it means we waste
> a few more resources for the not present but reserved cpus. It
> fixes the reported issue for me:
>
> diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c
> index 9f8cffc8a701..3eb169f15842 100644
> --- a/block/blk-mq-cpumap.c
> +++ b/block/blk-mq-cpumap.c
> @@ -16,11 +16,6 @@
>
> static int cpu_to_queue_index(unsigned int nr_queues, const int cpu)
> {
> - /*
> - * Non present CPU will be mapped to queue index 0.
> - */
> - if (!cpu_present(cpu))
> - return 0;
> return cpu % nr_queues;
> }
>
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 11097477eeab..612ce1fb7c4e 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -2114,16 +2114,11 @@ static void blk_mq_init_cpu_queues(struct request_queue *q,
> INIT_LIST_HEAD(&__ctx->rq_list);
> __ctx->queue = q;
>
> - /* If the cpu isn't present, the cpu is mapped to first hctx */
> - if (!cpu_present(i))
> - continue;
> -
> - hctx = blk_mq_map_queue(q, i);
> -
> /*
> * Set local node, IFF we have more than one hw queue. If
> * not, we remain on the home node of the device
> */
> + hctx = blk_mq_map_queue(q, i);
> if (nr_hw_queues > 1 && hctx->numa_node == NUMA_NO_NODE)
> hctx->numa_node = local_memory_node(cpu_to_node(i));
> }
> @@ -2180,7 +2175,7 @@ static void blk_mq_map_swqueue(struct request_queue *q)
> *
> * If the cpu isn't present, the cpu is mapped to first hctx.
> */
> - for_each_present_cpu(i) {
> + for_each_possible_cpu(i) {
> hctx_idx = q->mq_map[i];
> /* unmapped hw queue can be remapped after CPU topo changed */
> if (!set->tags[hctx_idx] &&
> diff --git a/kernel/irq/affinity.c b/kernel/irq/affinity.c
> index e12d35108225..a37a3b4b6342 100644
> --- a/kernel/irq/affinity.c
> +++ b/kernel/irq/affinity.c
> @@ -39,7 +39,7 @@ static void irq_spread_init_one(struct cpumask *irqmsk, struct cpumask *nmsk,
> }
> }
>
> -static cpumask_var_t *alloc_node_to_present_cpumask(void)
> +static cpumask_var_t *alloc_node_to_possible_cpumask(void)
> {
> cpumask_var_t *masks;
> int node;
> @@ -62,7 +62,7 @@ static cpumask_var_t *alloc_node_to_present_cpumask(void)
> return NULL;
> }
>
> -static void free_node_to_present_cpumask(cpumask_var_t *masks)
> +static void free_node_to_possible_cpumask(cpumask_var_t *masks)
> {
> int node;
>
> @@ -71,22 +71,22 @@ static void free_node_to_present_cpumask(cpumask_var_t *masks)
> kfree(masks);
> }
>
> -static void build_node_to_present_cpumask(cpumask_var_t *masks)
> +static void build_node_to_possible_cpumask(cpumask_var_t *masks)
> {
> int cpu;
>
> - for_each_present_cpu(cpu)
> + for_each_possible_cpu(cpu)
> cpumask_set_cpu(cpu, masks[cpu_to_node(cpu)]);
> }
>
> -static int get_nodes_in_cpumask(cpumask_var_t *node_to_present_cpumask,
> +static int get_nodes_in_cpumask(cpumask_var_t *node_to_possible_cpumask,
> const struct cpumask *mask, nodemask_t *nodemsk)
> {
> int n, nodes = 0;
>
> /* Calculate the number of nodes in the supplied affinity mask */
> for_each_node(n) {
> - if (cpumask_intersects(mask, node_to_present_cpumask[n])) {
> + if (cpumask_intersects(mask, node_to_possible_cpumask[n])) {
> node_set(n, *nodemsk);
> nodes++;
> }
> @@ -109,7 +109,7 @@ irq_create_affinity_masks(int nvecs, const struct irq_affinity *affd)
> int last_affv = affv + affd->pre_vectors;
> nodemask_t nodemsk = NODE_MASK_NONE;
> struct cpumask *masks;
> - cpumask_var_t nmsk, *node_to_present_cpumask;
> + cpumask_var_t nmsk, *node_to_possible_cpumask;
>
> /*
> * If there aren't any vectors left after applying the pre/post
> @@ -125,8 +125,8 @@ irq_create_affinity_masks(int nvecs, const struct irq_affinity *affd)
> if (!masks)
> goto out;
>
> - node_to_present_cpumask = alloc_node_to_present_cpumask();
> - if (!node_to_present_cpumask)
> + node_to_possible_cpumask = alloc_node_to_possible_cpumask();
> + if (!node_to_possible_cpumask)
> goto out;
>
> /* Fill out vectors at the beginning that don't need affinity */
> @@ -135,8 +135,8 @@ irq_create_affinity_masks(int nvecs, const struct irq_affinity *affd)
>
> /* Stabilize the cpumasks */
> get_online_cpus();
> - build_node_to_present_cpumask(node_to_present_cpumask);
> - nodes = get_nodes_in_cpumask(node_to_present_cpumask, cpu_present_mask,
> + build_node_to_possible_cpumask(node_to_possible_cpumask);
> + nodes = get_nodes_in_cpumask(node_to_possible_cpumask, cpu_possible_mask,
> &nodemsk);
>
> /*
> @@ -146,7 +146,7 @@ irq_create_affinity_masks(int nvecs, const struct irq_affinity *affd)
> if (affv <= nodes) {
> for_each_node_mask(n, nodemsk) {
> cpumask_copy(masks + curvec,
> - node_to_present_cpumask[n]);
> + node_to_possible_cpumask[n]);
> if (++curvec == last_affv)
> break;
> }
> @@ -160,7 +160,7 @@ irq_create_affinity_masks(int nvecs, const struct irq_affinity *affd)
> vecs_per_node = (affv - (curvec - affd->pre_vectors)) / nodes;
>
> /* Get the cpus on this node which are in the mask */
> - cpumask_and(nmsk, cpu_present_mask, node_to_present_cpumask[n]);
> + cpumask_and(nmsk, cpu_possible_mask, node_to_possible_cpumask[n]);
>
> /* Calculate the number of cpus per vector */
> ncpus = cpumask_weight(nmsk);
> @@ -192,7 +192,7 @@ irq_create_affinity_masks(int nvecs, const struct irq_affinity *affd)
> /* Fill out vectors at the end that don't need affinity */
> for (; curvec < nvecs; curvec++)
> cpumask_copy(masks + curvec, irq_default_affinity);
> - free_node_to_present_cpumask(node_to_present_cpumask);
> + free_node_to_possible_cpumask(node_to_possible_cpumask);
> out:
> free_cpumask_var(nmsk);
> return masks;
> @@ -214,7 +214,7 @@ int irq_calc_affinity_vectors(int minvec, int maxvec, const struct irq_affinity
> return 0;
>
> get_online_cpus();
> - ret = min_t(int, cpumask_weight(cpu_present_mask), vecs) + resv;
> + ret = min_t(int, cpumask_weight(cpu_possible_mask), vecs) + resv;
> put_online_cpus();
> return ret;
> }
>
What will happen for the CPU hotplug case?
Wouldn't we route I/O to a disabled CPU with this patch?
Cheers,
Hannes
--
Dr. Hannes Reinecke Teamlead Storage & Networking
hare@suse.de +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
2017-11-23 14:42 ` Hannes Reinecke
@ 2017-11-23 14:47 ` Christoph Hellwig
0 siblings, 0 replies; 96+ messages in thread
From: Christoph Hellwig @ 2017-11-23 14:47 UTC (permalink / raw)
To: Hannes Reinecke
Cc: Christoph Hellwig, Jens Axboe, Christian Borntraeger,
Bart Van Assche, linux-block, linux-kernel, Thomas Gleixner
[fullquote deleted]
> What will happen for the CPU hotplug case?
> Wouldn't we route I/O to a disabled CPU with this patch?
Why would we route I/O to a disabled CPU? (We generally route
I/O to devices to start with.) How would including possible
but not present CPUs change anything?
^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
2017-11-23 14:34 ` Christoph Hellwig
2017-11-23 14:42 ` Hannes Reinecke
@ 2017-11-23 15:05 ` Christian Borntraeger
2017-11-23 18:17 ` Christian Borntraeger
2 siblings, 0 replies; 96+ messages in thread
From: Christian Borntraeger @ 2017-11-23 15:05 UTC (permalink / raw)
To: Christoph Hellwig, Jens Axboe
Cc: Bart Van Assche, linux-block, linux-kernel, Thomas Gleixner
Yes it seems to fix the bug.
On 11/23/2017 03:34 PM, Christoph Hellwig wrote:
> FYI, the patch below changes both the irq and block mappings to
> always use the cpu possible map (should be split in two in due time).
>
> I think this is the right way forward. For every normal machine
> those two are the same, but for VMs with maxcpus above their normal
> count or some big iron that can grow more cpus it means we waste
> a few more resources for the not present but reserved cpus. It
> fixes the reported issue for me:
>
> [patch fullquote snipped; see the patch in Christoph's mail above]
^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
2017-11-23 14:34 ` Christoph Hellwig
2017-11-23 14:42 ` Hannes Reinecke
2017-11-23 15:05 ` Christian Borntraeger
@ 2017-11-23 18:17 ` Christian Borntraeger
2017-11-23 18:25 ` Christoph Hellwig
2 siblings, 1 reply; 96+ messages in thread
From: Christian Borntraeger @ 2017-11-23 18:17 UTC (permalink / raw)
To: Christoph Hellwig, Jens Axboe
Cc: Bart Van Assche, linux-block, linux-kernel, Thomas Gleixner
On 11/23/2017 03:34 PM, Christoph Hellwig wrote:
> FYI, the patch below changes both the irq and block mappings to
> always use the cpu possible map (should be split in two in due time).
>
> I think this is the right way forward. For every normal machine
> those two are the same, but for VMs with maxcpus above their normal
> count or some big iron that can grow more cpus it means we waste
> a few more resources for the not present but reserved cpus. It
> fixes the reported issue for me:
While it fixes the hotplug issue under KVM, the same kernel no longer boots in the host;
it seems stuck early in boot, just before detecting the SCSI disks. I have not yet looked
into that.
Christian
> [patch fullquote snipped; see the patch in Christoph's mail above]
^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
2017-11-23 18:17 ` Christian Borntraeger
@ 2017-11-23 18:25 ` Christoph Hellwig
2017-11-23 18:28 ` Christian Borntraeger
0 siblings, 1 reply; 96+ messages in thread
From: Christoph Hellwig @ 2017-11-23 18:25 UTC (permalink / raw)
To: Christian Borntraeger
Cc: Christoph Hellwig, Jens Axboe, Bart Van Assche, linux-block,
linux-kernel, Thomas Gleixner
What HBA driver do you use in the host?
^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
2017-11-23 18:25 ` Christoph Hellwig
@ 2017-11-23 18:28 ` Christian Borntraeger
2017-11-23 18:32 ` Christoph Hellwig
0 siblings, 1 reply; 96+ messages in thread
From: Christian Borntraeger @ 2017-11-23 18:28 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Jens Axboe, Bart Van Assche, linux-block, linux-kernel, Thomas Gleixner
zfcp on s390.
On 11/23/2017 07:25 PM, Christoph Hellwig wrote:
> What HBA driver do you use in the host?
>
^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
2017-11-23 18:28 ` Christian Borntraeger
@ 2017-11-23 18:32 ` Christoph Hellwig
2017-11-23 18:59 ` Christian Borntraeger
0 siblings, 1 reply; 96+ messages in thread
From: Christoph Hellwig @ 2017-11-23 18:32 UTC (permalink / raw)
To: Christian Borntraeger
Cc: Christoph Hellwig, Jens Axboe, Bart Van Assche, linux-block,
linux-kernel, Thomas Gleixner
On Thu, Nov 23, 2017 at 07:28:31PM +0100, Christian Borntraeger wrote:
> zfcp on s390.
Ok, so it can't be the interrupt code, but probably is the blk-mq-cpumap.c
changes. Can you try to revert just those for a quick test?
^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
2017-11-23 18:32 ` Christoph Hellwig
@ 2017-11-23 18:59 ` Christian Borntraeger
2017-11-24 13:09 ` Christian Borntraeger
0 siblings, 1 reply; 96+ messages in thread
From: Christian Borntraeger @ 2017-11-23 18:59 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Jens Axboe, Bart Van Assche, linux-block, linux-kernel, Thomas Gleixner
On 11/23/2017 07:32 PM, Christoph Hellwig wrote:
> On Thu, Nov 23, 2017 at 07:28:31PM +0100, Christian Borntraeger wrote:
>> zfcp on s390.
>
> Ok, so it can't be the interrupt code, but probably is the blk-mq-cpumap.c
> changes. Can you try to revert just those for a quick test?
Hmm, I get further in boot, but the system seems very sluggish, and it does not
seem to be able to access the SCSI disks (get data from them).
^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
2017-11-23 18:59 ` Christian Borntraeger
@ 2017-11-24 13:09 ` Christian Borntraeger
2017-11-27 15:54 ` Christoph Hellwig
0 siblings, 1 reply; 96+ messages in thread
From: Christian Borntraeger @ 2017-11-24 13:09 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Jens Axboe, Bart Van Assche, linux-block, linux-kernel, Thomas Gleixner
On 11/23/2017 07:59 PM, Christian Borntraeger wrote:
>
>
> On 11/23/2017 07:32 PM, Christoph Hellwig wrote:
>> On Thu, Nov 23, 2017 at 07:28:31PM +0100, Christian Borntraeger wrote:
>>> zfcp on s390.
>>
>> Ok, so it can't be the interrupt code, but probably is the blk-mq-cpumap.c
>> changes. Can you try to revert just those for a quick test?
>
>
> Hmm, I get further in boot, but the system seems very sluggish and it does not
> seem to be able to access the scsi disks (get data from them)
>
FWIW, just having the changes in irq_affinity.c is indeed fine.
^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
2017-11-24 13:09 ` Christian Borntraeger
@ 2017-11-27 15:54 ` Christoph Hellwig
2017-11-29 19:18 ` Christian Borntraeger
0 siblings, 1 reply; 96+ messages in thread
From: Christoph Hellwig @ 2017-11-27 15:54 UTC (permalink / raw)
To: Christian Borntraeger
Cc: Christoph Hellwig, Jens Axboe, Bart Van Assche, linux-block,
linux-kernel, Thomas Gleixner
Can you try this git branch:
git://git.infradead.org/users/hch/block.git blk-mq-hotplug-fix
Gitweb:
http://git.infradead.org/users/hch/block.git/shortlog/refs/heads/blk-mq-hotplug-fix
^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
2017-11-27 15:54 ` Christoph Hellwig
@ 2017-11-29 19:18 ` Christian Borntraeger
2017-11-29 19:36 ` Christian Borntraeger
2017-12-04 16:21 ` Christoph Hellwig
0 siblings, 2 replies; 96+ messages in thread
From: Christian Borntraeger @ 2017-11-29 19:18 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Jens Axboe, Bart Van Assche, linux-block, linux-kernel, Thomas Gleixner
Works fine under KVM with virtio-blk, but still hangs during boot in an LPAR.
FWIW, the system not only has scsi disks via fcp but also DASDs as a boot disk.
Seems that this is the place where the system stops. (see the sysrq-t output
at the bottom).
Message
"[ 0.247484] Linux version 4.15.0-rc1+ (cborntra@s38lp08) (gcc version 6.3.1 2
"
"0161221 (Red Hat 6.3.1-1.0.ibm) (GCC)) #229 SMP Wed Nov 29 20:05:35 CET 2017
"
"[ 0.247489] setup: Linux is running natively in 64-bit mode
"
"[ 0.247661] setup: The maximum memory size is 1048576MB
"
"[ 0.247670] setup: Reserving 1024MB of memory at 1047552MB for crashkernel (S
"
"ystem RAM: 1047552MB)
"
"[ 0.247688] numa: NUMA mode: plain
"
"[ 0.247794] cpu: 64 configured CPUs, 0 standby CPUs
"
"[ 0.247834] cpu: The CPU configuration topology of the machine is: 0 0 4 2 3
"
"8 / 4
"
"[ 0.248279] Write protected kernel read-only data: 12456k
"
"[ 0.265131] Zone ranges:
"
"[ 0.265134] DMA [mem 0x0000000000000000-0x000000007fffffff]
"
"[ 0.265136] Normal [mem 0x0000000080000000-0x000000ffffffffff]
"
"[ 0.265137] Movable zone start for each node
"
"[ 0.265138] Early memory node ranges
"
"[ 0.265139] node 0: [mem 0x0000000000000000-0x000000ffffffffff]
"
"[ 0.265141] Initmem setup node 0 [mem 0x0000000000000000-0x000000ffffffffff]
"
"[ 7.445561] random: fast init done
"
"[ 7.449194] percpu: Embedded 23 pages/cpu @000000fbbe600000 s56064 r8192 d299
"
"52 u94208
"
"[ 7.449380] Built 1 zonelists, mobility grouping on. Total pages: 264241152
"
"[ 7.449381] Policy zone: Normal
"
"[ 7.449384] Kernel command line: elevator=deadline audit_enable=0 audit=0 aud
"
"it_debug=0 selinux=0 crashkernel=1024M printk.time=1 zfcp.dbfsize=100 dasd=241c,
"
"241d,241e,241f root=/dev/dasda1 kvm.nested=1 BOOT_IMAGE=0
"
"[ 7.449420] audit: disabled (until reboot)
"
"[ 7.450513] log_buf_len individual max cpu contribution: 4096 bytes
"
"[ 7.450514] log_buf_len total cpu_extra contributions: 1044480 bytes
"
"[ 7.450515] log_buf_len min size: 131072 bytes
"
"[ 7.450788] log_buf_len: 2097152 bytes
"
"[ 7.450789] early log buf free: 125076(95%)
"
"[ 11.040620] Memory: 1055873868K/1073741824K available (8248K kernel code, 107
"
"8K rwdata, 4204K rodata, 812K init, 700K bss, 17867956K reserved, 0K cma-reserve
"
"d)
"
"[ 11.040938] SLUB: HWalign=256, Order=0-3, MinObjects=0, CPUs=256, Nodes=1
"
[ 11.040969] ftrace: allocating 26506 entries in 104 pages
[ 11.051476] Hierarchical RCU implementation.
[ 11.051476] RCU event tracing is enabled.
[ 11.051478] RCU debug extended QS entry/exit.
[ 11.053263] NR_IRQS: 3, nr_irqs: 3, preallocated irqs: 3
[ 11.053444] clocksource: tod: mask: 0xffffffffffffffff max_cycles: 0x3b0a9be803b0a9, max_idle_ns: 1805497147909793 ns
[ 11.160192] console [ttyS0] enabled
[ 11.308228] pid_max: default: 262144 minimum: 2048
[ 11.308298] Security Framework initialized
[ 11.308300] SELinux: Disabled at boot.
[ 11.354028] Dentry cache hash table entries: 33554432 (order: 16, 268435456 bytes)
[ 11.376945] Inode-cache hash table entries: 16777216 (order: 15, 134217728 bytes)
[ 11.377685] Mount-cache hash table entries: 524288 (order: 10, 4194304 bytes)
[ 11.378401] Mountpoint-cache hash table entries: 524288 (order: 10, 4194304 bytes)
[ 11.378984] Hierarchical SRCU implementation.
[ 11.380032] smp: Bringing up secondary CPUs ...
[ 11.393634] smp: Brought up 1 node, 64 CPUs
[ 11.585458] devtmpfs: initialized
[ 11.588589] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604462750000 ns
[ 11.588998] futex hash table entries: 65536 (order: 12, 16777216 bytes)
[ 11.591926] NET: Registered protocol family 16
[ 11.596413] HugeTLB registered 1.00 MiB page size, pre-allocated 0 pages
[ 11.597604] SCSI subsystem initialized
[ 11.597611] pps_core: LinuxPPS API ver. 1 registered
[ 11.597612] pps_core: Software ver. 5.3.6 - Copyright 2005-2007 Rodolfo Giometti <giometti@linux.it>
[ 11.597614] PTP clock support registered
[ 11.599088] NetLabel: Initializing
[ 11.599089] NetLabel: domain hash size = 128
[ 11.599090] NetLabel: protocols = UNLABELED CIPSOv4 CALIPSO
[ 11.599101] NetLabel: unlabeled traffic allowed by default
[ 11.612542] PCI host bridge to bus 0000:00
[ 11.612546] pci_bus 0000:00: root bus resource [mem 0x8000000000000000-0x80000000007fffff 64bit pref]
[ 11.612548] pci_bus 0000:00: No busn resource found for root bus, will use [bus 00-ff]
[ 11.616458] iommu: Adding device 0000:00:00.0 to group 0
[ 12.291894] VFS: Disk quotas dquot_6.6.0
[ 12.291942] VFS: Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
[ 12.292226] NET: Registered protocol family 2
[ 12.292662] TCP established hash table entries: 524288 (order: 10, 4194304 bytes)
[ 12.294559] TCP bind hash table entries: 65536 (order: 8, 1048576 bytes)
[ 12.295008] TCP: Hash tables configured (established 524288 bind 65536)
[ 12.295229] UDP hash table entries: 65536 (order: 9, 2097152 bytes)
[ 12.296173] UDP-Lite hash table entries: 65536 (order: 9, 2097152 bytes)
[ 12.297343] NET: Registered protocol family 1
[ 12.301053] workingset: timestamp_bits=42 max_order=28 bucket_order=0
[ 12.304670] NET: Registered protocol family 38
[ 12.304694] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 250)
[ 12.304939] io scheduler noop registered
[ 12.304940] io scheduler deadline registered (default)
[ 12.304975] io scheduler cfq registered
[ 12.304977] io scheduler mq-deadline registered (default)
[ 12.304977] io scheduler kyber registered
[ 12.305949] atomic64_test: passed
[ 12.305985] hvc_iucv: The z/VM IUCV HVC device driver cannot be used without z/VM
[ 12.317868] loop: module loaded
[ 12.318153] tun: Universal TUN/TAP device driver, 1.6
[ 12.318240] mlx4_core: Mellanox ConnectX core driver v4.0-0
[ 12.318251] mlx4_core: Initializing 0000:00:00.0
[ 12.318324] mlx4_core 0000:00:00.0: enabling device (0000 -> 0002)
[ 12.319389] mlx4_core 0000:00:00.0: Detected virtual function - running in slave mode
[ 12.319448] mlx4_core 0000:00:00.0: Sending reset
[ 12.320791] mlx4_core 0000:00:00.0: Sending vhcr0
[ 12.326014] mlx4_core 0000:00:00.0: Requested number of MACs is too much for port 1, reducing to 64
[ 12.326016] mlx4_core 0000:00:00.0: Requested number of VLANs is too much for port 1, reducing to 1
[ 12.326240] mlx4_core 0000:00:00.0: HCA minimum page size:512
[ 12.327666] mlx4_core 0000:00:00.0: Timestamping is not supported in slave mode
[ 12.537132] mlx4_en: Mellanox ConnectX HCA Ethernet driver v4.0-0
[ 12.537646] mlx4_en 0000:00:00.0: Activating port:1
[ 12.578197] mlx4_en: 0000:00:00.0: Port 1: Using 32 TX rings
[ 12.578201] mlx4_en: 0000:00:00.0: Port 1: Using 8 RX rings
[ 12.578676] mlx4_en: 0000:00:00.0: Port 1: Initializing port
[ 12.579202] mlx4_en 0000:00:00.0: Activating port:2
[ 12.613180] mlx4_en: 0000:00:00.0: Port 2: Using 32 TX rings
[ 12.613182] mlx4_en: 0000:00:00.0: Port 2: Using 8 RX rings
[ 12.613592] mlx4_en: 0000:00:00.0: Port 2: Initializing port
[ 12.614141] VFIO - User Level meta-driver version: 0.3
[ 12.614231] mousedev: PS/2 mouse device common for all mice
[ 12.614247] IR NEC protocol handler initialized
[ 12.614248] IR RC5(x/sz) protocol handler initialized
[ 12.614249] IR RC6 protocol handler initialized
[ 12.614250] IR JVC protocol handler initialized
[ 12.614251] IR Sony protocol handler initialized
[ 12.614252] IR SANYO protocol handler initialized
[ 12.614253] IR Sharp protocol handler initialized
[ 12.614254] IR MCE Keyboard/mouse protocol handler initialized
[ 12.614255] IR XMP protocol handler initialized
[ 12.614318] device-mapper: uevent: version 1.0.3
[ 12.614376] device-mapper: ioctl: 4.37.0-ioctl (2017-09-20) initialised: dm-devel@redhat.com
[ 12.614856] cio: Channel measurement facility initialized using format extended (mode autodetected)
[ 12.615194] Discipline DIAG cannot be used without z/VM
[ 12.619692] dasd-eckd 0.0.241c: A channel path to the device has become operational
[ 12.619847] dasd-eckd 0.0.241e: A channel path to the device has become operational
[ 12.619992] dasd-eckd 0.0.241d: A channel path to the device has become operational
[ 12.620344] dasd-eckd 0.0.241f: A channel path to the device has become operational
[ 12.621880] dasd-eckd 0.0.241e: New DASD 3390/0C (CU 3990/01) with 30051 cylinders, 15 heads, 224 sectors
[ 12.622097] dasd-eckd 0.0.241d: New DASD 3390/0C (CU 3990/01) with 30051 cylinders, 15 heads, 224 sectors
[ 12.622286] dasd-eckd 0.0.241f: New DASD 3390/0C (CU 3990/01) with 30051 cylinders, 15 heads, 224 sectors
[ 12.622519] dasd-eckd 0.0.241c: New DASD 3390/0C (CU 3990/01) with 30051 cylinders, 15 heads, 224 sectors
[ 12.637350] dasd-eckd 0.0.241c: DASD with 4 KB/block, 21636720 KB total size, 48 KB/track, compatible disk layout
[ 12.642780] dasd-eckd 0.0.241d: DASD with 4 KB/block, 21636720 KB total size, 48 KB/track, compatible disk layout
[ 12.644616] dasdb:VOL1/ 0X241D: dasdb1
[ 12.644943] dasd-eckd 0.0.241e: DASD with 4 KB/block, 21636720 KB total size, 48 KB/track, compatible disk layout
[ 12.645439] dasd-eckd 0.0.241f: DASD with 4 KB/block, 21636720 KB total size, 48 KB/track, compatible disk layout
[ 12.647222] dasda:VOL1/ 0X241C: dasda1
[ 12.651236] dasdc:VOL1/ 0X241E: dasdc1
[ 12.651704] dasdd:VOL1/ 0X241F: dasdd1
[ 13.171832] drop_monitor: Initializing network drop monitor service
[ 13.172016] Initializing XFRM netlink socket
[ 13.172105] NET: Registered protocol family 10
[ 13.172902] Segment Routing with IPv6
[ 13.172915] mip6: Mobile IPv6
[ 13.172916] NET: Registered protocol family 17
[ 13.172923] Key type dns_resolver registered
[ 13.173033] registered taskstats version 1
[ 13.173394] Key type encrypted registered
[ 13.173665] md: Waiting for all devices to be available before autodetect
[ 13.173667] md: If you don't use raid, use raid=noautodetect
[ 13.173894] md: Autodetecting RAID arrays.
[ 13.173896] md: autorun ...
[ 13.173896] md: ... autorun DONE.
[ 13.174405] EXT4-fs (dasda1): couldn't mount as ext3 due to feature incompatibilities
[ 13.174647] EXT4-fs (dasda1): couldn't mount as ext2 due to feature incompatibilities
[ 13.199229] EXT4-fs (dasda1): mounted filesystem with ordered data mode. Opts: (null)
[ 13.199233] VFS: Mounted root (ext4 filesystem) readonly on device 94:1.
[ 16.773545] random: crng init done
[ 112.413804] sysrq: SysRq : Show State
[ 112.413809] task PC stack pid father
[ 112.413811] swapper/0 D 0 1 0 0x00000000
[ 112.413814] Call Trace:
[ 112.413820] ([<0000000000905458>] __schedule+0x398/0x850)
[ 112.413821] [<000000000090595a>] schedule+0x4a/0xb8
[ 112.413824] [<000000000019e8c4>] io_schedule+0x34/0x58
[ 112.413826] [<00000000009064f6>] bit_wait_io+0x2e/0x90
[ 112.413827] [<0000000000905fe8>] __wait_on_bit+0xb8/0x110
[ 112.413828] [<00000000009060de>] out_of_line_wait_on_bit+0x9e/0xb0
[ 112.413833] [<000000000041e5ba>] __ext4_get_inode_loc+0x52a/0x570
[ 112.413836] [<0000000000422664>] ext4_iget+0x7c/0xc28
[ 112.413839] [<000000000043cf5a>] ext4_lookup+0x12a/0x238
[ 112.413843] [<00000000003620b6>] lookup_slow+0xae/0x198
[ 112.413844] [<00000000003657a0>] walk_component+0x210/0x358
[ 112.413845] [<000000000036629a>] path_lookupat+0xe2/0x278
[ 112.413847] [<0000000000368174>] filename_lookup+0x9c/0x160
[ 112.413848] [<000000000036835c>] user_path_at_empty+0x5c/0x70
[ 112.413851] [<0000000000380b94>] do_mount+0x74/0xd10
[ 112.413853] [<0000000000381c1c>] SyS_mount+0xa4/0x108
[ 112.413857] [<00000000005c02f8>] devtmpfs_mount+0x60/0xc0
[ 112.413860] [<0000000000e395c6>] prepare_namespace+0x18e/0x1c0
[ 112.413861] [<0000000000e38e46>] kernel_init_freeable+0x26e/0x288
[ 112.413865] [<0000000000900572>] kernel_init+0x2a/0x150
[ 112.413868] [<000000000090a9c2>] kernel_thread_starter+0x6/0xc
[ 112.413869] [<000000000090a9bc>] kernel_thread_starter+0x0/0xc
[ 112.413870] kthreadd S 0 2 0 0x00000000
[ 112.418205] [<000000000018ee66>] kthread+0x13e/0x160
[ 112.418206] [<000000000090a9c2>] kernel_thread_starter+0x6/0xc
[ 112.418207] [<000000000090a9bc>] kernel_thread_starter+0x0/0xc
[ 112.418208] kworker/51:1 I 0 441 2 0x00000000
[ 112.418210] Call Trace:
[ 112.418211] ([<0000000000905458>] __schedule+0x398/0x850)
[ 112.418213] [<000000000090595a>] schedule+0x4a/0xb8
[ 112.418214] [<00000000001882ec>] worker_thread+0xe4/0x4f8
[ 112.418215] [<000000000018ee66>] kthread+0x13e/0x160
[ 112.418217] [<000000000090a9c2>] kernel_thread_starter+0x6/0xc
[ 112.418218] [<000000000090a9bc>] kernel_thread_starter+0x0/0xc
[ 112.418219] kworker/53:1 I 0 442 2 0x00000000
[ 112.418221] Call Trace:
[ 112.418222] ([<0000000000905458>] __schedule+0x398/0x850)
[ 112.418223] [<000000000090595a>] schedule+0x4a/0xb8
[ 112.418225] [<00000000001882ec>] worker_thread+0xe4/0x4f8
[ 112.418226] [<000000000018ee66>] kthread+0x13e/0x160
[ 112.418227] [<000000000090a9c2>] kernel_thread_starter+0x6/0xc
[ 112.418229] [<000000000090a9bc>] kernel_thread_starter+0x0/0xc
[ 112.418230] kworker/52:1 I 0 443 2 0x00000000
[ 112.418231] Call Trace:
[ 112.418233] ([<0000000000905458>] __schedule+0x398/0x850)
[ 112.418234] [<000000000090595a>] schedule+0x4a/0xb8
[ 112.418235] [<00000000001882ec>] worker_thread+0xe4/0x4f8
[ 112.418237] [<000000000018ee66>] kthread+0x13e/0x160
[ 112.418238] [<000000000090a9c2>] kernel_thread_starter+0x6/0xc
[ 112.418239] [<000000000090a9bc>] kernel_thread_starter+0x0/0xc
[ 112.418240] kworker/54:1 I 0 444 2 0x00000000
[ 112.418242] Call Trace:
[ 112.418243] ([<0000000000905458>] __schedule+0x398/0x850)
[ 112.418244] [<000000000090595a>] schedule+0x4a/0xb8
[ 112.418246] [<00000000001882ec>] worker_thread+0xe4/0x4f8
[ 112.418247] [<000000000018ee66>] kthread+0x13e/0x160
[ 112.418249] [<000000000090a9c2>] kernel_thread_starter+0x6/0xc
[ 112.418250] [<000000000090a9bc>] kernel_thread_starter+0x0/0xc
[ 112.418251] kworker/55:1 I 0 445 2 0x00000000
[ 112.418253] Call Trace:
[ 112.418254] ([<0000000000905458>] __schedule+0x398/0x850)
[ 112.418255] [<000000000090595a>] schedule+0x4a/0xb8
[ 112.418256] [<00000000001882ec>] worker_thread+0xe4/0x4f8
[ 112.418258] [<000000000018ee66>] kthread+0x13e/0x160
[ 112.418259] [<000000000090a9c2>] kernel_thread_starter+0x6/0xc
[ 112.418260] [<000000000090a9bc>] kernel_thread_starter+0x0/0xc
[ 112.418262] kworker/57:1 I 0 446 2 0x00000000
[ 112.418263] Call Trace:
[ 112.418265] ([<0000000000905458>] __schedule+0x398/0x850)
[ 112.418266] [<000000000090595a>] schedule+0x4a/0xb8
[ 112.418267] [<00000000001882ec>] worker_thread+0xe4/0x4f8
[ 112.418268] [<000000000018ee66>] kthread+0x13e/0x160
[ 112.418270] [<000000000090a9c2>] kernel_thread_starter+0x6/0xc
[ 112.418271] [<000000000090a9bc>] kernel_thread_starter+0x0/0xc
[ 112.418272] kworker/56:1 I 0 447 2 0x00000000
[ 112.418274] Call Trace:
[ 112.418275] ([<0000000000905458>] __schedule+0x398/0x850)
[ 112.418276] [<000000000090595a>] schedule+0x4a/0xb8
[ 112.418278] [<00000000001882ec>] worker_thread+0xe4/0x4f8
[ 112.418279] [<000000000018ee66>] kthread+0x13e/0x160
[ 112.418280] [<000000000090a9c2>] kernel_thread_starter+0x6/0xc
[ 112.418282] [<000000000090a9bc>] kernel_thread_starter+0x0/0xc
[ 112.418283] kworker/59:1 I 0 448 2 0x00000000
[ 112.418284] Call Trace:
[ 112.418286] ([<0000000000905458>] __schedule+0x398/0x850)
[ 112.418287] [<000000000090595a>] schedule+0x4a/0xb8
[ 112.418288] [<00000000001882ec>] worker_thread+0xe4/0x4f8
[ 112.418290] [<000000000018ee66>] kthread+0x13e/0x160
[ 112.418291] [<000000000090a9c2>] kernel_thread_starter+0x6/0xc
[ 112.418292] [<000000000090a9bc>] kernel_thread_starter+0x0/0xc
[ 112.418293] kworker/58:1 I 0 449 2 0x00000000
[ 112.418295] Call Trace:
[ 112.418296] ([<0000000000905458>] __schedule+0x398/0x850)
[ 112.418297] [<000000000090595a>] schedule+0x4a/0xb8
[ 112.418299] [<00000000001882ec>] worker_thread+0xe4/0x4f8
[ 112.418300] [<000000000018ee66>] kthread+0x13e/0x160
[ 112.418301] [<000000000090a9c2>] kernel_thread_starter+0x6/0xc
[ 112.418303] [<000000000090a9bc>] kernel_thread_starter+0x0/0xc
[ 112.418304] kworker/61:1 I 0 450 2 0x00000000
[ 112.418305] Call Trace:
[ 112.418306] ([<0000000000905458>] __schedule+0x398/0x850)
[ 112.418308] [<000000000090595a>] schedule+0x4a/0xb8
[ 112.418309] [<00000000001882ec>] worker_thread+0xe4/0x4f8
[ 112.418311] [<000000000018ee66>] kthread+0x13e/0x160
[ 112.418312] [<000000000090a9c2>] kernel_thread_starter+0x6/0xc
[ 112.418313] [<000000000090a9bc>] kernel_thread_starter+0x0/0xc
[ 112.418315] kworker/63:1 I 0 451 2 0x00000000
[ 112.418316] Call Trace:
[ 112.418318] ([<0000000000905458>] __schedule+0x398/0x850)
[ 112.418319] [<000000000090595a>] schedule+0x4a/0xb8
[ 112.418320] [<00000000001882ec>] worker_thread+0xe4/0x4f8
[ 112.418322] [<000000000018ee66>] kthread+0x13e/0x160
[ 112.418323] [<000000000090a9c2>] kernel_thread_starter+0x6/0xc
[ 112.418324] [<000000000090a9bc>] kernel_thread_starter+0x0/0xc
[ 112.418325] kworker/0:1H I 0 452 2 0x00000000
[ 112.418327] Call Trace:
[ 112.418328] ([<0000000000905458>] __schedule+0x398/0x850)
[ 112.418329] [<000000000090595a>] schedule+0x4a/0xb8
[ 112.418330] [<00000000001882ec>] worker_thread+0xe4/0x4f8
[ 112.418332] [<000000000018ee66>] kthread+0x13e/0x160
[ 112.418333] [<000000000090a9c2>] kernel_thread_starter+0x6/0xc
[ 112.418335] [<000000000090a9bc>] kernel_thread_starter+0x0/0xc
[ 112.418336] jbd2/dasda1-8 S 0 453 2 0x00000000
[ 112.418337] Call Trace:
[ 112.418338] ([<0000000000905458>] __schedule+0x398/0x850)
[ 112.418339] [<000000000090595a>] schedule+0x4a/0xb8
[ 112.418343] [<00000000004725be>] kjournald2+0x386/0x3c8
[ 112.418345] [<000000000018ee66>] kthread+0x13e/0x160
[ 112.418346] [<000000000090a9c2>] kernel_thread_starter+0x6/0xc
[ 112.418348] [<000000000090a9bc>] kernel_thread_starter+0x0/0xc
[ 112.418349] ext4-rsv-conver I 0 454 2 0x00000000
[ 112.418350] Call Trace:
[ 112.418352] ([<0000000000905458>] __schedule+0x398/0x850)
[ 112.418353] [<000000000090595a>] schedule+0x4a/0xb8
[ 112.418354] [<0000000000189008>] rescuer_thread+0x3f8/0x460
[ 112.418356] [<000000000018ee66>] kthread+0x13e/0x160
[ 112.418357] [<000000000090a9c2>] kernel_thread_starter+0x6/0xc
[ 112.418358] [<000000000090a9bc>] kernel_thread_starter+0x0/0xc
[ 112.418360] kworker/62:1H I 0 455 2 0x00000000
[ 112.418361] Call Trace:
[ 112.418362] ([<0000000000905458>] __schedule+0x398/0x850)
[ 112.418364] [<000000000090595a>] schedule+0x4a/0xb8
[ 112.418365] [<00000000001882ec>] worker_thread+0xe4/0x4f8
[ 112.418366] [<000000000018ee66>] kthread+0x13e/0x160
[ 112.418368] [<000000000090a9c2>] kernel_thread_starter+0x6/0xc
[ 112.418369] [<000000000090a9bc>] kernel_thread_starter+0x0/0xc
[ 112.418370] Showing busy workqueues and worker pools:
[ 112.418407] workqueue events: flags=0x0
[ 112.418426] pwq 4: cpus=2 node=0 flags=0x0 nice=0 active=1/256
[ 112.418429] in-flight: 343:ctrlchar_handle_sysrq
[ 112.418837] workqueue kblockd: flags=0x18
[ 112.418855] pwq 131: cpus=65 node=0 flags=0x4 nice=-20 active=1/256
[ 112.418858] pending: blk_mq_run_work_fn
[ 112.419188] pool 4: cpus=2 node=0 flags=0x0 nice=0 hung=0s workers=2 idle: 20
On 11/27/2017 04:54 PM, Christoph Hellwig wrote:
> Can you try this git branch:
>
> git://git.infradead.org/users/hch/block.git blk-mq-hotplug-fix
>
> Gitweb:
>
> http://git.infradead.org/users/hch/block.git/shortlog/refs/heads/blk-mq-hotplug-fix
>
^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
2017-11-29 19:18 ` Christian Borntraeger
@ 2017-11-29 19:36 ` Christian Borntraeger
2017-12-04 16:21 ` Christoph Hellwig
1 sibling, 0 replies; 96+ messages in thread
From: Christian Borntraeger @ 2017-11-29 19:36 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Jens Axboe, Bart Van Assche, linux-block, linux-kernel, Thomas Gleixner
On 11/29/2017 08:18 PM, Christian Borntraeger wrote:
> Works fine under KVM with virtio-blk, but still hangs during boot in an LPAR.
> FWIW, the system not only has scsi disks via fcp but also DASDs as a boot disk.
> Seems that this is the place where the system stops. (see the sysrq-t output
> at the bottom).
FWIW, the failing kernel had CONFIG_NR_CPUS=256 and 32 CPUs (with SMT2) == 64 threads.
With CONFIG_NR_CPUS=16 the system booted fine.
^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
2017-11-29 19:18 ` Christian Borntraeger
2017-11-29 19:36 ` Christian Borntraeger
@ 2017-12-04 16:21 ` Christoph Hellwig
2017-12-06 12:25 ` Christian Borntraeger
1 sibling, 1 reply; 96+ messages in thread
From: Christoph Hellwig @ 2017-12-04 16:21 UTC (permalink / raw)
To: Christian Borntraeger
Cc: Christoph Hellwig, Jens Axboe, Bart Van Assche, linux-block,
linux-kernel, Thomas Gleixner
On Wed, Nov 29, 2017 at 08:18:09PM +0100, Christian Borntraeger wrote:
> Works fine under KVM with virtio-blk, but still hangs during boot in an LPAR.
> FWIW, the system not only has scsi disks via fcp but also DASDs as a boot disk.
> Seems that this is the place where the system stops. (see the sysrq-t output
> at the bottom).
Can you check which of the patches in the tree is the culprit?
^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
2017-12-04 16:21 ` Christoph Hellwig
@ 2017-12-06 12:25 ` Christian Borntraeger
0 siblings, 0 replies; 96+ messages in thread
From: Christian Borntraeger @ 2017-12-06 12:25 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Jens Axboe, Bart Van Assche, linux-block, linux-kernel,
Thomas Gleixner, Stefan Haberland, linux-s390,
Martin Schwidefsky
On 12/04/2017 05:21 PM, Christoph Hellwig wrote:
> On Wed, Nov 29, 2017 at 08:18:09PM +0100, Christian Borntraeger wrote:
>> Works fine under KVM with virtio-blk, but still hangs during boot in an LPAR.
>> FWIW, the system not only has scsi disks via fcp but also DASDs as a boot disk.
>> Seems that this is the place where the system stops. (see the sysrq-t output
>> at the bottom).
>
> Can you check which of the patches in the tree is the culprit?
From this branch
git://git.infradead.org/users/hch/block.git blk-mq-hotplug-fix
commit 11b2025c3326f7096ceb588c3117c7883850c068 -> bad
blk-mq: create a blk_mq_ctx for each possible CPU
does not boot on DASD and
commit 9c6ae239e01ae9a9f8657f05c55c4372e9fc8bcc -> good
genirq/affinity: assign vectors to all possible CPUs
does boot with DASD disks.
Also adding Stefan Haberland if he has an idea why this fails on DASD and adding Martin (for the
s390 irq handling code).
Some history:
I got this warning
"WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)"
since 4.13 (and also in 4.12 stable)
on CPU hotplug of previously unavailable CPUs (real hotplug, no offline/online)
This was introduced with
blk-mq: Create hctx for each present CPU
commit 4b855ad37194f7bdbb200ce7a1c7051fecb56a08
And Christoph is currently working on a fix. The fixed kernel does boot with virtio-blk and
it fixes the warning but it hangs (outstanding I/O) with dasd disks.
^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
@ 2017-12-06 12:25 ` Christian Borntraeger
0 siblings, 0 replies; 96+ messages in thread
From: Christian Borntraeger @ 2017-12-06 12:25 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Jens Axboe, Bart Van Assche, linux-block, linux-kernel,
Thomas Gleixner, Stefan Haberland, linux-s390,
Martin Schwidefsky
On 12/04/2017 05:21 PM, Christoph Hellwig wrote:
> On Wed, Nov 29, 2017 at 08:18:09PM +0100, Christian Borntraeger wrote:
>> Works fine under KVM with virtio-blk, but still hangs during boot in an LPAR.
>> FWIW, the system not only has scsi disks via fcp but also DASDs as a boot disk.
>> Seems that this is the place where the system stops. (see the sysrq-t output
>> at the bottom).
>
> Can you check which of the patches in the tree is the culprit?
From this branch
git://git.infradead.org/users/hch/block.git blk-mq-hotplug-fix
commit 11b2025c3326f7096ceb588c3117c7883850c068 -> bad
blk-mq: create a blk_mq_ctx for each possible CPU
does not boot on DASD and
commit 9c6ae239e01ae9a9f8657f05c55c4372e9fc8bcc -> good
genirq/affinity: assign vectors to all possible CPUs
does boot with DASD disks.
Also adding Stefan Haberland if he has an idea why this fails on DASD and adding Martin (for the
s390 irq handling code).
Some history:
I got this warning
"WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)"
since 4.13 (and also in 4.12 stable)
on CPU hotplug of previously unavailable CPUs (real hotplug, no offline/online)
This was introduced with
blk-mq: Create hctx for each present CPU
commit 4b855ad37194f7bdbb200ce7a1c7051fecb56a08
And Christoph is currently working on a fix. The fixed kernel does boot with virtio-blk and
it fixes the warning but it hangs (outstanding I/O) with dasd disks.
^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
2017-12-06 12:25 ` Christian Borntraeger
(?)
@ 2017-12-06 23:29 ` Christoph Hellwig
2017-12-07 9:20 ` Christian Borntraeger
2017-12-18 13:56 ` Stefan Haberland
-1 siblings, 2 replies; 96+ messages in thread
From: Christoph Hellwig @ 2017-12-06 23:29 UTC (permalink / raw)
To: Christian Borntraeger
Cc: Christoph Hellwig, Jens Axboe, Bart Van Assche, linux-block,
linux-kernel, Thomas Gleixner, Stefan Haberland, linux-s390,
Martin Schwidefsky
On Wed, Dec 06, 2017 at 01:25:11PM +0100, Christian Borntraeger wrote:
> commit 11b2025c3326f7096ceb588c3117c7883850c068 -> bad
> blk-mq: create a blk_mq_ctx for each possible CPU
> does not boot on DASD and
> commit 9c6ae239e01ae9a9f8657f05c55c4372e9fc8bcc -> good
> genirq/affinity: assign vectors to all possible CPUs
> does boot with DASD disks.
>
> Also adding Stefan Haberland if he has an idea why this fails on DASD and adding Martin (for the
> s390 irq handling code).
That is interesting as it really isn't related to interrupts at all,
it just ensures that possible CPUs are set in ->cpumask.
I guess we'd really want:
e005655c389e3d25bf3e43f71611ec12f3012de0
"blk-mq: only select online CPUs in blk_mq_hctx_next_cpu"
before this commit, but it seems like the whole stack didn't work for
you either.
I wonder if there is some weird thing about nr_cpu_ids in s390?
^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
2017-12-06 23:29 ` Christoph Hellwig
@ 2017-12-07 9:20 ` Christian Borntraeger
2017-12-14 17:32 ` Christian Borntraeger
2017-12-18 13:56 ` Stefan Haberland
1 sibling, 1 reply; 96+ messages in thread
From: Christian Borntraeger @ 2017-12-07 9:20 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Jens Axboe, Bart Van Assche, linux-block, linux-kernel,
Thomas Gleixner, Stefan Haberland, linux-s390,
Martin Schwidefsky
On 12/07/2017 12:29 AM, Christoph Hellwig wrote:
> On Wed, Dec 06, 2017 at 01:25:11PM +0100, Christian Borntraeger wrote:
> > commit 11b2025c3326f7096ceb588c3117c7883850c068 -> bad
>> blk-mq: create a blk_mq_ctx for each possible CPU
>> does not boot on DASD and
>> commit 9c6ae239e01ae9a9f8657f05c55c4372e9fc8bcc -> good
>> genirq/affinity: assign vectors to all possible CPUs
>> does boot with DASD disks.
>>
>> Also adding Stefan Haberland if he has an idea why this fails on DASD and adding Martin (for the
>> s390 irq handling code).
>
> That is interesting as it really isn't related to interrupts at all,
> it just ensures that possible CPUs are set in ->cpumask.
>
> I guess we'd really want:
>
> e005655c389e3d25bf3e43f71611ec12f3012de0
> "blk-mq: only select online CPUs in blk_mq_hctx_next_cpu"
>
> before this commit, but it seems like the whole stack didn't work for
> you either.
>
> I wonder if there is some weird thing about nr_cpu_ids in s390?
The problem starts as soon as NR_CPUS is larger than the number
of real CPUs.
A question: wouldn't your change in blk_mq_hctx_next_cpu fail if there is more than one non-online CPU?
E.g. don't we need something like:
@@ -1241,11 +1241,11 @@ static int blk_mq_hctx_next_cpu(struct blk_mq_hw_ctx *hctx)
 	if (--hctx->next_cpu_batch <= 0) {
 		int next_cpu;
 
+		do {
 		next_cpu = cpumask_next(hctx->next_cpu, hctx->cpumask);
-		if (!cpu_online(next_cpu))
-			next_cpu = cpumask_next(next_cpu, hctx->cpumask);
 		if (next_cpu >= nr_cpu_ids)
 			next_cpu = cpumask_first(hctx->cpumask);
+		} while (!cpu_online(next_cpu));
 
 		hctx->next_cpu = next_cpu;
 		hctx->next_cpu_batch = BLK_MQ_CPU_WORK_BATCH;
It does not fix the issue, though (and it would be pretty inefficient for large NR_CPUS).
^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
2017-12-07 9:20 ` Christian Borntraeger
@ 2017-12-14 17:32 ` Christian Borntraeger
0 siblings, 0 replies; 96+ messages in thread
From: Christian Borntraeger @ 2017-12-14 17:32 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Jens Axboe, Bart Van Assche, linux-block, linux-kernel,
Thomas Gleixner, Stefan Haberland, linux-s390,
Martin Schwidefsky
Independent of the issues with the DASD disks, this also seems to not enable
additional hardware queues.
With CPUs 0,1 (and 248 CPUs max)
I get CPUs 0 and 2-247 attached to hardware context 0, and
CPU 1 attached to hardware context 1.
If I now add a CPU this does not change anything: hardware contexts 2, 3, 4
etc. all have no CPU, and hardware context 0 keeps sitting on all CPUs (except 1).
On 12/07/2017 10:20 AM, Christian Borntraeger wrote:
>
>
> On 12/07/2017 12:29 AM, Christoph Hellwig wrote:
>> On Wed, Dec 06, 2017 at 01:25:11PM +0100, Christian Borntraeger wrote:
>> > commit 11b2025c3326f7096ceb588c3117c7883850c068 -> bad
>>> blk-mq: create a blk_mq_ctx for each possible CPU
>>> does not boot on DASD and
>>> commit 9c6ae239e01ae9a9f8657f05c55c4372e9fc8bcc -> good
>>> genirq/affinity: assign vectors to all possible CPUs
>>> does boot with DASD disks.
>>>
>>> Also adding Stefan Haberland if he has an idea why this fails on DASD and adding Martin (for the
>>> s390 irq handling code).
>>
>> That is interesting as it really isn't related to interrupts at all,
>> it just ensures that possible CPUs are set in ->cpumask.
>>
>> I guess we'd really want:
>>
>> e005655c389e3d25bf3e43f71611ec12f3012de0
>> "blk-mq: only select online CPUs in blk_mq_hctx_next_cpu"
>>
>> before this commit, but it seems like the whole stack didn't work for
>> you either.
>>
>> I wonder if there is some weird thing about nr_cpu_ids in s390?
>
> The problem starts as soon as NR_CPUS is larger than the number
> of real CPUs.
>
> A question: wouldn't your change in blk_mq_hctx_next_cpu fail if there is more than one non-online CPU?
>
> E.g. don't we need something like:
>
> @@ -1241,11 +1241,11 @@ static int blk_mq_hctx_next_cpu(struct blk_mq_hw_ctx *hctx)
>  	if (--hctx->next_cpu_batch <= 0) {
>  		int next_cpu;
> 
> +		do {
>  		next_cpu = cpumask_next(hctx->next_cpu, hctx->cpumask);
> -		if (!cpu_online(next_cpu))
> -			next_cpu = cpumask_next(next_cpu, hctx->cpumask);
>  		if (next_cpu >= nr_cpu_ids)
>  			next_cpu = cpumask_first(hctx->cpumask);
> +		} while (!cpu_online(next_cpu));
> 
>  		hctx->next_cpu = next_cpu;
>  		hctx->next_cpu_batch = BLK_MQ_CPU_WORK_BATCH;
>
> It does not fix the issue, though (and it would be pretty inefficient for large NR_CPUS).
>
>
^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
2017-12-06 23:29 ` Christoph Hellwig
2017-12-07 9:20 ` Christian Borntraeger
@ 2017-12-18 13:56 ` Stefan Haberland
2017-12-20 15:47 ` Christian Borntraeger
1 sibling, 1 reply; 96+ messages in thread
From: Stefan Haberland @ 2017-12-18 13:56 UTC (permalink / raw)
To: Christoph Hellwig, Christian Borntraeger
Cc: Jens Axboe, Bart Van Assche, linux-block, linux-kernel,
Thomas Gleixner, linux-s390, Martin Schwidefsky
On 07.12.2017 00:29, Christoph Hellwig wrote:
> On Wed, Dec 06, 2017 at 01:25:11PM +0100, Christian Borntraeger wrote:
> > commit 11b2025c3326f7096ceb588c3117c7883850c068 -> bad
>> blk-mq: create a blk_mq_ctx for each possible CPU
>> does not boot on DASD and
>> commit 9c6ae239e01ae9a9f8657f05c55c4372e9fc8bcc -> good
>> genirq/affinity: assign vectors to all possible CPUs
>> does boot with DASD disks.
>>
>> Also adding Stefan Haberland if he has an idea why this fails on DASD and adding Martin (for the
>> s390 irq handling code).
> That is interesting as it really isn't related to interrupts at all,
> it just ensures that possible CPUs are set in ->cpumask.
>
> I guess we'd really want:
>
> e005655c389e3d25bf3e43f71611ec12f3012de0
> "blk-mq: only select online CPUs in blk_mq_hctx_next_cpu"
>
> before this commit, but it seems like the whole stack didn't work for
> you either.
>
> I wonder if there is some weird thing about nr_cpu_ids in s390?
> --
> To unsubscribe from this list: send the line "unsubscribe linux-s390" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
I tried this on my system and the blk-mq-hotplug-fix branch does not
boot for me either.
The disks come up and I/O works fine; at least the partition
detection and the EXT4-fs mount work.
But at some point the disks stop getting any requests.
I currently have no clue why.
I took a dump and had a look at the disk states and they are fine. No
errors in the logs or in our debug entries, just idle DASD devices
waiting to be called for I/O requests.
Do you have anything I could have a look at?
^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
2017-12-18 13:56 ` Stefan Haberland
@ 2017-12-20 15:47 ` Christian Borntraeger
2018-01-11 9:13 ` Ming Lei
0 siblings, 1 reply; 96+ messages in thread
From: Christian Borntraeger @ 2017-12-20 15:47 UTC (permalink / raw)
To: Stefan Haberland, Christoph Hellwig, Jens Axboe
Cc: Bart Van Assche, linux-block, linux-kernel, Thomas Gleixner,
linux-s390, Martin Schwidefsky
On 12/18/2017 02:56 PM, Stefan Haberland wrote:
> On 07.12.2017 00:29, Christoph Hellwig wrote:
>> On Wed, Dec 06, 2017 at 01:25:11PM +0100, Christian Borntraeger wrote:
>> > commit 11b2025c3326f7096ceb588c3117c7883850c068 -> bad
>>> blk-mq: create a blk_mq_ctx for each possible CPU
>>> does not boot on DASD and
>>> commit 9c6ae239e01ae9a9f8657f05c55c4372e9fc8bcc -> good
>>> genirq/affinity: assign vectors to all possible CPUs
>>> does boot with DASD disks.
>>>
>>> Also adding Stefan Haberland if he has an idea why this fails on DASD and adding Martin (for the
>>> s390 irq handling code).
>> That is interesting as it really isn't related to interrupts at all,
>> it just ensures that possible CPUs are set in ->cpumask.
>>
>> I guess we'd really want:
>>
>> e005655c389e3d25bf3e43f71611ec12f3012de0
>> "blk-mq: only select online CPUs in blk_mq_hctx_next_cpu"
>>
>> before this commit, but it seems like the whole stack didn't work for
>> you either.
>>
>> I wonder if there is some weird thing about nr_cpu_ids in s390?
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-s390" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>
>
> I tried this on my system and the blk-mq-hotplug-fix branch does not boot for me as well.
> The disks get up and running and I/O works fine. At least the partition detection and EXT4-fs mount works.
>
> But at some point in time the disk do not get any requests.
>
> I currently have no clue why.
> I took a dump and had a look at the disk states and they are fine. No error in the logs or in our debug entrys. Just empty DASD devices waiting to be called for I/O requests.
>
> Do you have anything I could have a look at?
Jens, Christoph, so what do we do about this?
To summarize:
- commit 4b855ad37194f7 ("blk-mq: Create hctx for each present CPU") broke CPU hotplug.
- Jens' quick revert did fix the issue and did not break DASD support, but has some
issues with interrupt affinity.
- Christoph's patch set fixes the hotplug issue for virtio-blk but causes I/O hangs on
DASDs (even without hotplug).
Christian
^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
2017-12-20 15:47 ` Christian Borntraeger
@ 2018-01-11 9:13 ` Ming Lei
0 siblings, 0 replies; 96+ messages in thread
From: Ming Lei @ 2018-01-11 9:13 UTC (permalink / raw)
To: Christian Borntraeger
Cc: Stefan Haberland, Christoph Hellwig, Jens Axboe, Bart Van Assche,
linux-block, linux-kernel, Thomas Gleixner, linux-s390,
Martin Schwidefsky
On Wed, Dec 20, 2017 at 04:47:21PM +0100, Christian Borntraeger wrote:
> On 12/18/2017 02:56 PM, Stefan Haberland wrote:
> > On 07.12.2017 00:29, Christoph Hellwig wrote:
> >> On Wed, Dec 06, 2017 at 01:25:11PM +0100, Christian Borntraeger wrote:
> >> t > commit 11b2025c3326f7096ceb588c3117c7883850c068 -> bad
> >>> blk-mq: create a blk_mq_ctx for each possible CPU
> >>> does not boot on DASD and
> >>> commit 9c6ae239e01ae9a9f8657f05c55c4372e9fc8bcc -> good
> >>> genirq/affinity: assign vectors to all possible CPUs
> >>> does boot with DASD disks.
> >>>
> >>> Also adding Stefan Haberland if he has an idea why this fails on DASD and adding Martin (for the
> >>> s390 irq handling code).
> >> That is interesting as it really isn't related to interrupts at all,
> >> it just ensures that possible CPUs are set in ->cpumask.
> >>
> >> I guess we'd really want:
> >>
> >> e005655c389e3d25bf3e43f71611ec12f3012de0
> >> "blk-mq: only select online CPUs in blk_mq_hctx_next_cpu"
> >>
> >> before this commit, but it seems like the whole stack didn't work for
> >> your either.
> >>
> >> I wonder if there is some weird thing about nr_cpu_ids in s390?
> >>
> >
> > I tried this on my system and the blk-mq-hotplug-fix branch does not boot for me as well.
> > The disks get up and running and I/O works fine. At least the partition detection and EXT4-fs mount works.
> >
> > But at some point in time the disk do not get any requests.
> >
> > I currently have no clue why.
> > I took a dump and had a look at the disk states and they are fine. No error in the logs or in our debug entrys. Just empty DASD devices waiting to be called for I/O requests.
> >
> > Do you have anything I could have a look at?
>
> Jens, Christoph, so what do we do about this?
> To summarize:
> - commit 4b855ad37194f7 ("blk-mq: Create hctx for each present CPU") broke CPU hotplug.
> - Jens' quick revert did fix the issue and did not broke DASD support but has some issues
> with interrupt affinity.
> - Christoph patch set fixes the hotplug issue for virtio blk but causes I/O hangs on DASDs (even
> without hotplug).
Hello,
This is a valid use case for VMs, so I think we need to fix it.
It looks like there is an issue in the fourth patch ("blk-mq: only select
online CPUs in blk_mq_hctx_next_cpu"). I fixed it in the following tree;
the other three patches are the same as Christoph's:
https://github.com/ming1/linux.git v4.15-rc-block-for-next-cpuhot-fix
gitweb:
https://github.com/ming1/linux/commits/v4.15-rc-block-for-next-cpuhot-fix
Could you test it and provide feedback?
BTW, if it doesn't help with this issue, could you boot from a normal disk
first and dump the blk-mq debugfs state of the DASD afterwards?
Thanks,
Ming
^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
2018-01-11 9:13 ` Ming Lei
(?)
@ 2018-01-11 9:26 ` Stefan Haberland
-1 siblings, 0 replies; 96+ messages in thread
From: Stefan Haberland @ 2018-01-11 9:26 UTC (permalink / raw)
To: Ming Lei, Christian Borntraeger
Cc: Christoph Hellwig, Jens Axboe, Bart Van Assche, linux-block,
linux-kernel, Thomas Gleixner, linux-s390, Martin Schwidefsky
On 11.01.2018 10:13, Ming Lei wrote:
> On Wed, Dec 20, 2017 at 04:47:21PM +0100, Christian Borntraeger wrote:
>> On 12/18/2017 02:56 PM, Stefan Haberland wrote:
>>> On 07.12.2017 00:29, Christoph Hellwig wrote:
>>>> On Wed, Dec 06, 2017 at 01:25:11PM +0100, Christian Borntraeger wrote:
>>>> t > commit 11b2025c3326f7096ceb588c3117c7883850c068 -> bad
>>>>> blk-mq: create a blk_mq_ctx for each possible CPU
>>>>> does not boot on DASD and
>>>>> commit 9c6ae239e01ae9a9f8657f05c55c4372e9fc8bcc -> good
>>>>> genirq/affinity: assign vectors to all possible CPUs
>>>>> does boot with DASD disks.
>>>>>
>>>>> Also adding Stefan Haberland if he has an idea why this fails on DASD and adding Martin (for the
>>>>> s390 irq handling code).
>>>> That is interesting as it really isn't related to interrupts at all,
>>>> it just ensures that possible CPUs are set in ->cpumask.
>>>>
>>>> I guess we'd really want:
>>>>
>>>> e005655c389e3d25bf3e43f71611ec12f3012de0
>>>> "blk-mq: only select online CPUs in blk_mq_hctx_next_cpu"
>>>>
>>>> before this commit, but it seems like the whole stack didn't work for
>>>> your either.
>>>>
>>>> I wonder if there is some weird thing about nr_cpu_ids in s390?
>>>>
>>> I tried this on my system and the blk-mq-hotplug-fix branch does not boot for me as well.
>>> The disks get up and running and I/O works fine. At least the partition detection and EXT4-fs mount works.
>>>
>>> But at some point in time the disk do not get any requests.
>>>
>>> I currently have no clue why.
>>> I took a dump and had a look at the disk states and they are fine. No error in the logs or in our debug entrys. Just empty DASD devices waiting to be called for I/O requests.
>>>
>>> Do you have anything I could have a look at?
>> Jens, Christoph, so what do we do about this?
>> To summarize:
>> - commit 4b855ad37194f7 ("blk-mq: Create hctx for each present CPU") broke CPU hotplug.
>> - Jens' quick revert did fix the issue and did not broke DASD support but has some issues
>> with interrupt affinity.
>> - Christoph patch set fixes the hotplug issue for virtio blk but causes I/O hangs on DASDs (even
>> without hotplug).
> Hello,
>
> This one is a valid use case for VM, I think we need to fix that.
>
> Looks there is issue on the fouth patch("blk-mq: only select online
> CPUs in blk_mq_hctx_next_cpu"), I fixed it in the following tree, and
> the other 3 patches are same with Christoph's:
>
> https://github.com/ming1/linux.git v4.15-rc-block-for-next-cpuhot-fix
>
> gitweb:
> https://github.com/ming1/linux/commits/v4.15-rc-block-for-next-cpuhot-fix
>
> Could you test it and provide the feedback?
>
> BTW, if it can't help this issue, could you boot from a normal disk first
> and dump blk-mq debugfs of DASD later?
>
> Thanks,
> Ming
>
Hi,
thanks for the patch. I suspected pretty much the same place.
I will test it asap.
Regards,
Stefan
^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
2018-01-11 9:13 ` Ming Lei
(?)
(?)
@ 2018-01-11 11:44 ` Christian Borntraeger
2018-01-11 13:17 ` Stefan Haberland
-1 siblings, 1 reply; 96+ messages in thread
From: Christian Borntraeger @ 2018-01-11 11:44 UTC (permalink / raw)
To: Ming Lei
Cc: Stefan Haberland, Christoph Hellwig, Jens Axboe, Bart Van Assche,
linux-block, linux-kernel, Thomas Gleixner, linux-s390,
Martin Schwidefsky
On 01/11/2018 10:13 AM, Ming Lei wrote:
> On Wed, Dec 20, 2017 at 04:47:21PM +0100, Christian Borntraeger wrote:
>> On 12/18/2017 02:56 PM, Stefan Haberland wrote:
>>> On 07.12.2017 00:29, Christoph Hellwig wrote:
>>>> On Wed, Dec 06, 2017 at 01:25:11PM +0100, Christian Borntraeger wrote:
>>>> t > commit 11b2025c3326f7096ceb588c3117c7883850c068 -> bad
>>>>> blk-mq: create a blk_mq_ctx for each possible CPU
>>>>> does not boot on DASD and
>>>>> commit 9c6ae239e01ae9a9f8657f05c55c4372e9fc8bcc -> good
>>>>> genirq/affinity: assign vectors to all possible CPUs
>>>>> does boot with DASD disks.
>>>>>
>>>>> Also adding Stefan Haberland if he has an idea why this fails on DASD and adding Martin (for the
>>>>> s390 irq handling code).
>>>> That is interesting as it really isn't related to interrupts at all,
>>>> it just ensures that possible CPUs are set in ->cpumask.
>>>>
>>>> I guess we'd really want:
>>>>
>>>> e005655c389e3d25bf3e43f71611ec12f3012de0
>>>> "blk-mq: only select online CPUs in blk_mq_hctx_next_cpu"
>>>>
>>>> before this commit, but it seems like the whole stack didn't work for
>>>> your either.
>>>>
>>>> I wonder if there is some weird thing about nr_cpu_ids in s390?
>>>>
>>>
>>> I tried this on my system and the blk-mq-hotplug-fix branch does not boot for me as well.
>>> The disks get up and running and I/O works fine. At least the partition detection and EXT4-fs mount works.
>>>
>>> But at some point in time the disk do not get any requests.
>>>
>>> I currently have no clue why.
>>> I took a dump and had a look at the disk states and they are fine. No error in the logs or in our debug entrys. Just empty DASD devices waiting to be called for I/O requests.
>>>
>>> Do you have anything I could have a look at?
>>
>> Jens, Christoph, so what do we do about this?
>> To summarize:
>> - commit 4b855ad37194f7 ("blk-mq: Create hctx for each present CPU") broke CPU hotplug.
>> - Jens' quick revert did fix the issue and did not broke DASD support but has some issues
>> with interrupt affinity.
>> - Christoph patch set fixes the hotplug issue for virtio blk but causes I/O hangs on DASDs (even
>> without hotplug).
>
> Hello,
>
> This one is a valid use case for VM, I think we need to fix that.
>
> Looks there is issue on the fouth patch("blk-mq: only select online
> CPUs in blk_mq_hctx_next_cpu"), I fixed it in the following tree, and
> the other 3 patches are same with Christoph's:
>
> https://github.com/ming1/linux.git v4.15-rc-block-for-next-cpuhot-fix
>
> gitweb:
> https://github.com/ming1/linux/commits/v4.15-rc-block-for-next-cpuhot-fix
>
> Could you test it and provide the feedback?
>
> BTW, if it can't help this issue, could you boot from a normal disk first
> and dump blk-mq debugfs of DASD later?
That kernel seems to boot fine on my system with DASD disks.
^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
2018-01-11 11:44 ` Christian Borntraeger
@ 2018-01-11 13:17 ` Stefan Haberland
0 siblings, 0 replies; 96+ messages in thread
From: Stefan Haberland @ 2018-01-11 13:17 UTC (permalink / raw)
To: Christian Borntraeger, Ming Lei
Cc: Christoph Hellwig, Jens Axboe, Bart Van Assche, linux-block,
linux-kernel, Thomas Gleixner, linux-s390, Martin Schwidefsky
On 11.01.2018 12:44, Christian Borntraeger wrote:
>
> On 01/11/2018 10:13 AM, Ming Lei wrote:
>> On Wed, Dec 20, 2017 at 04:47:21PM +0100, Christian Borntraeger wrote:
>>> On 12/18/2017 02:56 PM, Stefan Haberland wrote:
>>>> On 07.12.2017 00:29, Christoph Hellwig wrote:
>>>>> On Wed, Dec 06, 2017 at 01:25:11PM +0100, Christian Borntraeger wrote:
>>>>> t > commit 11b2025c3326f7096ceb588c3117c7883850c068 -> bad
>>>>>> blk-mq: create a blk_mq_ctx for each possible CPU
>>>>>> does not boot on DASD and
>>>>>> commit 9c6ae239e01ae9a9f8657f05c55c4372e9fc8bcc -> good
>>>>>> genirq/affinity: assign vectors to all possible CPUs
>>>>>> does boot with DASD disks.
>>>>>>
>>>>>> Also adding Stefan Haberland if he has an idea why this fails on DASD and adding Martin (for the
>>>>>> s390 irq handling code).
>>>>> That is interesting as it really isn't related to interrupts at all,
>>>>> it just ensures that possible CPUs are set in ->cpumask.
>>>>>
>>>>> I guess we'd really want:
>>>>>
>>>>> e005655c389e3d25bf3e43f71611ec12f3012de0
>>>>> "blk-mq: only select online CPUs in blk_mq_hctx_next_cpu"
>>>>>
>>>>> before this commit, but it seems like the whole stack didn't work for
>>>>> your either.
>>>>>
>>>>> I wonder if there is some weird thing about nr_cpu_ids in s390?
>>>>>
>>>> I tried this on my system and the blk-mq-hotplug-fix branch does not boot for me as well.
>>>> The disks get up and running and I/O works fine. At least the partition detection and EXT4-fs mount works.
>>>>
>>>> But at some point in time the disk do not get any requests.
>>>>
>>>> I currently have no clue why.
>>>> I took a dump and had a look at the disk states and they are fine. No error in the logs or in our debug entrys. Just empty DASD devices waiting to be called for I/O requests.
>>>>
>>>> Do you have anything I could have a look at?
>>> Jens, Christoph, so what do we do about this?
>>> To summarize:
>>> - commit 4b855ad37194f7 ("blk-mq: Create hctx for each present CPU") broke CPU hotplug.
>>> - Jens' quick revert did fix the issue and did not broke DASD support but has some issues
>>> with interrupt affinity.
>>> - Christoph patch set fixes the hotplug issue for virtio blk but causes I/O hangs on DASDs (even
>>> without hotplug).
>> Hello,
>>
>> This one is a valid use case for VM, I think we need to fix that.
>>
>> Looks there is issue on the fouth patch("blk-mq: only select online
>> CPUs in blk_mq_hctx_next_cpu"), I fixed it in the following tree, and
>> the other 3 patches are same with Christoph's:
>>
>> https://github.com/ming1/linux.git v4.15-rc-block-for-next-cpuhot-fix
>>
>> gitweb:
>> https://github.com/ming1/linux/commits/v4.15-rc-block-for-next-cpuhot-fix
>>
>> Could you test it and provide the feedback?
>>
>> BTW, if it can't help this issue, could you boot from a normal disk first
>> and dump blk-mq debugfs of DASD later?
> That kernel seems to boot fine on my system with DASD disks.
>
>
I did some regression testing and it works quite well. Boot works,
attaching CPUs during runtime on z/VM and enabling them in Linux works
as well.
I also did some DASD online/offline CPU enable/disable loops.
Regards,
Stefan
^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
2018-01-11 9:13 ` Ming Lei
` (2 preceding siblings ...)
(?)
@ 2018-01-11 17:46 ` Christoph Hellwig
2018-01-12 1:16 ` Ming Lei
-1 siblings, 1 reply; 96+ messages in thread
From: Christoph Hellwig @ 2018-01-11 17:46 UTC (permalink / raw)
To: Ming Lei
Cc: Christian Borntraeger, Stefan Haberland, Christoph Hellwig,
Jens Axboe, Bart Van Assche, linux-block, linux-kernel,
Thomas Gleixner, linux-s390, Martin Schwidefsky
Thanks for looking into this, Ming; I had missed it in my current
work overload. Can you send the updated series to Jens?
^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
2018-01-11 17:46 ` Christoph Hellwig
@ 2018-01-12 1:16 ` Ming Lei
0 siblings, 0 replies; 96+ messages in thread
From: Ming Lei @ 2018-01-12 1:16 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Christian Borntraeger, Stefan Haberland, Jens Axboe,
Bart Van Assche, linux-block, linux-kernel, Thomas Gleixner,
linux-s390, Martin Schwidefsky
On Thu, Jan 11, 2018 at 06:46:54PM +0100, Christoph Hellwig wrote:
> Thanks for looking into this Ming, I had missed it in the my current
> work overload. Can you send the updated series to Jens?
OK, I will post it out soon.
Thanks,
Ming
^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
2017-11-22 7:28 ` Christoph Hellwig
(?)
(?)
@ 2017-11-22 14:46 ` Jens Axboe
-1 siblings, 0 replies; 96+ messages in thread
From: Jens Axboe @ 2017-11-22 14:46 UTC (permalink / raw)
To: Christoph Hellwig
Cc: mst, Greg Kroah-Hartman, linux-kernel, stable, virtualization,
linux-block, Bart Van Assche
On 11/22/2017 12:28 AM, Christoph Hellwig wrote:
> Jens, please don't just revert the commit in your for-linus tree.
>
> On its own this will totally mess up the interrupt assignments. Give
> me a bit of time to sort this out properly.
I wasn't going to push it until I heard otherwise. I'll just pop it
off, for-linus isn't a stable branch.
--
Jens Axboe
^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
2017-11-21 20:31 ` Christian Borntraeger
(?)
(?)
@ 2017-11-21 20:39 ` Jens Axboe
-1 siblings, 0 replies; 96+ messages in thread
From: Jens Axboe @ 2017-11-21 20:39 UTC (permalink / raw)
To: Christian Borntraeger, Bart Van Assche, virtualization,
linux-block, mst, jasowang, linux-kernel, Christoph Hellwig,
Greg Kroah-Hartman, stable
On 11/21/2017 01:31 PM, Christian Borntraeger wrote:
>
>
> On 11/21/2017 09:21 PM, Jens Axboe wrote:
>> On 11/21/2017 01:19 PM, Christian Borntraeger wrote:
>>>
>>> On 11/21/2017 09:14 PM, Jens Axboe wrote:
>>>> On 11/21/2017 01:12 PM, Christian Borntraeger wrote:
>>>>>
>>>>>
>>>>> On 11/21/2017 08:30 PM, Jens Axboe wrote:
>>>>>> On 11/21/2017 12:15 PM, Christian Borntraeger wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 11/21/2017 07:39 PM, Jens Axboe wrote:
>>>>>>>> On 11/21/2017 11:27 AM, Jens Axboe wrote:
>>>>>>>>> On 11/21/2017 11:12 AM, Christian Borntraeger wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 11/21/2017 07:09 PM, Jens Axboe wrote:
>>>>>>>>>>> On 11/21/2017 10:27 AM, Jens Axboe wrote:
>>>>>>>>>>>> On 11/21/2017 03:14 AM, Christian Borntraeger wrote:
>>>>>>>>>>>>> Bisect points to
>>>>>>>>>>>>>
>>>>>>>>>>>>> 1b5a7455d345b223d3a4658a9e5fce985b7998c1 is the first bad commit
>>>>>>>>>>>>> commit 1b5a7455d345b223d3a4658a9e5fce985b7998c1
>>>>>>>>>>>>> Author: Christoph Hellwig <hch@lst.de>
>>>>>>>>>>>>> Date: Mon Jun 26 12:20:57 2017 +0200
>>>>>>>>>>>>>
>>>>>>>>>>>>> blk-mq: Create hctx for each present CPU
>>>>>>>>>>>>>
>>>>>>>>>>>>> commit 4b855ad37194f7bdbb200ce7a1c7051fecb56a08 upstream.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Currently we only create hctx for online CPUs, which can lead to a lot
>>>>>>>>>>>>> of churn due to frequent soft offline / online operations. Instead
>>>>>>>>>>>>> allocate one for each present CPU to avoid this and dramatically simplify
>>>>>>>>>>>>> the code.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Signed-off-by: Christoph Hellwig <hch@lst.de>
>>>>>>>>>>>>> Reviewed-by: Jens Axboe <axboe@kernel.dk>
>>>>>>>>>>>>> Cc: Keith Busch <keith.busch@intel.com>
>>>>>>>>>>>>> Cc: linux-block@vger.kernel.org
>>>>>>>>>>>>> Cc: linux-nvme@lists.infradead.org
>>>>>>>>>>>>> Link: http://lkml.kernel.org/r/20170626102058.10200-3-hch@lst.de
>>>>>>>>>>>>> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
>>>>>>>>>>>>> Cc: Oleksandr Natalenko <oleksandr@natalenko.name>
>>>>>>>>>>>>> Cc: Mike Galbraith <efault@gmx.de>
>>>>>>>>>>>>> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>>>>>>>>>>>>
>>>>>>>>>>>> I wonder if we're simply not getting the masks updated correctly. I'll
>>>>>>>>>>>> take a look.
>>>>>>>>>>>
>>>>>>>>>>> Can't make it trigger here. We do init for each present CPU, which means
>>>>>>>>>>> that if I offline a few CPUs here and register a queue, those still show
>>>>>>>>>>> up as present (just offline) and get mapped accordingly.
>>>>>>>>>>>
>>>>>>>>>>> From the looks of it, your setup is different. If the CPU doesn't show
>>>>>>>>>>> up as present and it gets hotplugged, then I can see how this condition
>>>>>>>>>>> would trigger. What environment are you running this in? We might have
>>>>>>>>>>> to re-introduce the cpu hotplug notifier, right now we just monitor
>>>>>>>>>>> for a dead cpu and handle that.
>>>>>>>>>>
>>>>>>>>>> I am not doing a hot unplug and the replug, I use KVM and add a previously
>>>>>>>>>> not available CPU.
>>>>>>>>>>
>>>>>>>>>> in libvirt/virsh speak:
>>>>>>>>>> <vcpu placement='static' current='1'>4</vcpu>
>>>>>>>>>
>>>>>>>>> So that's why we run into problems. It's not present when we load the device,
>>>>>>>>> but becomes present and online afterwards.
>>>>>>>>>
>>>>>>>>> Christoph, we used to handle this just fine, your patch broke it.
>>>>>>>>>
>>>>>>>>> I'll see if I can come up with an appropriate fix.
>>>>>>>>
>>>>>>>> Can you try the below?
>>>>>>>
>>>>>>>
>>>>>>> It does prevent the crash but it seems that the new CPU is not "used " after the hotplug for mq:
>>>>>>>
>>>>>>>
>>>>>>> output with 2 cpus:
>>>>>>> /sys/kernel/debug/block/vda
>>>>>>> /sys/kernel/debug/block/vda/hctx0
>>>>>>> /sys/kernel/debug/block/vda/hctx0/cpu0
>>>>>>> /sys/kernel/debug/block/vda/hctx0/cpu0/completed
>>>>>>> /sys/kernel/debug/block/vda/hctx0/cpu0/merged
>>>>>>> /sys/kernel/debug/block/vda/hctx0/cpu0/dispatched
>>>>>>> /sys/kernel/debug/block/vda/hctx0/cpu0/rq_list
>>>>>>> /sys/kernel/debug/block/vda/hctx0/active
>>>>>>> /sys/kernel/debug/block/vda/hctx0/run
>>>>>>> /sys/kernel/debug/block/vda/hctx0/queued
>>>>>>> /sys/kernel/debug/block/vda/hctx0/dispatched
>>>>>>> /sys/kernel/debug/block/vda/hctx0/io_poll
>>>>>>> /sys/kernel/debug/block/vda/hctx0/sched_tags_bitmap
>>>>>>> /sys/kernel/debug/block/vda/hctx0/sched_tags
>>>>>>> /sys/kernel/debug/block/vda/hctx0/tags_bitmap
>>>>>>> /sys/kernel/debug/block/vda/hctx0/tags
>>>>>>> /sys/kernel/debug/block/vda/hctx0/ctx_map
>>>>>>> /sys/kernel/debug/block/vda/hctx0/busy
>>>>>>> /sys/kernel/debug/block/vda/hctx0/dispatch
>>>>>>> /sys/kernel/debug/block/vda/hctx0/flags
>>>>>>> /sys/kernel/debug/block/vda/hctx0/state
>>>>>>> /sys/kernel/debug/block/vda/sched
>>>>>>> /sys/kernel/debug/block/vda/sched/dispatch
>>>>>>> /sys/kernel/debug/block/vda/sched/starved
>>>>>>> /sys/kernel/debug/block/vda/sched/batching
>>>>>>> /sys/kernel/debug/block/vda/sched/write_next_rq
>>>>>>> /sys/kernel/debug/block/vda/sched/write_fifo_list
>>>>>>> /sys/kernel/debug/block/vda/sched/read_next_rq
>>>>>>> /sys/kernel/debug/block/vda/sched/read_fifo_list
>>>>>>> /sys/kernel/debug/block/vda/write_hints
>>>>>>> /sys/kernel/debug/block/vda/state
>>>>>>> /sys/kernel/debug/block/vda/requeue_list
>>>>>>> /sys/kernel/debug/block/vda/poll_stat
>>>>>>
>>>>>> Try this, basically just a revert.
>>>>>
>>>>> Yes, seems to work.
>>>>>
>>>>> Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
>>>>
>>>> Great, thanks for testing.
>>>>
>>>>> Do you know why the original commit made it into 4.12 stable? After all
>>>>> it has no Fixes tag and no cc stable-
>>>>
>>>> I was wondering the same thing when you said it was in 4.12.stable and
>>>> not in 4.12 release. That patch should absolutely not have gone into
>>>> stable, it's not marked as such and it's not fixing a problem that is
>>>> stable worthy. In fact, it's causing a regression...
>>>>
>>>> Greg? Upstream commit is mentioned higher up, start of the email.
>>>>
>>>
>>>
>>> Forgot to cc Greg?
>>
>> I did, thanks for doing that. Now I wonder how to mark this patch,
>> as we should revert it from kernels that have the bad commit. 4.12
>> is fine, 4.12.later-stable is not.
>>
>
> I think we should tag it with:
>
> Fixes: 4b855ad37194 ("blk-mq: Create hctx for each present CPU")
>
> which should bring it into 4.13 stable and 4.14 stable. 4.12 stable seems EOL anyway.
Yeah, I think so too. But thinking more about this, I'm pretty sure this
adds a bad lock dependency with hotplug. Need to verify that we don't
introduce a potential deadlock here...
--
Jens Axboe
^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
2017-11-21 20:19 ` Christian Borntraeger
(?)
(?)
@ 2017-11-21 20:21 ` Jens Axboe
-1 siblings, 0 replies; 96+ messages in thread
From: Jens Axboe @ 2017-11-21 20:21 UTC (permalink / raw)
To: Christian Borntraeger, Bart Van Assche, virtualization,
linux-block, mst, jasowang, linux-kernel, Christoph Hellwig,
Greg Kroah-Hartman, stable
On 11/21/2017 01:19 PM, Christian Borntraeger wrote:
>
> On 11/21/2017 09:14 PM, Jens Axboe wrote:
>> On 11/21/2017 01:12 PM, Christian Borntraeger wrote:
>>>
>>>
>>> On 11/21/2017 08:30 PM, Jens Axboe wrote:
>>>> On 11/21/2017 12:15 PM, Christian Borntraeger wrote:
>>>>>
>>>>>
>>>>> On 11/21/2017 07:39 PM, Jens Axboe wrote:
>>>>>> On 11/21/2017 11:27 AM, Jens Axboe wrote:
>>>>>>> On 11/21/2017 11:12 AM, Christian Borntraeger wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 11/21/2017 07:09 PM, Jens Axboe wrote:
>>>>>>>>> On 11/21/2017 10:27 AM, Jens Axboe wrote:
>>>>>>>>>> On 11/21/2017 03:14 AM, Christian Borntraeger wrote:
>>>>>>>>>>> Bisect points to
>>>>>>>>>>>
>>>>>>>>>>> 1b5a7455d345b223d3a4658a9e5fce985b7998c1 is the first bad commit
>>>>>>>>>>> commit 1b5a7455d345b223d3a4658a9e5fce985b7998c1
>>>>>>>>>>> Author: Christoph Hellwig <hch@lst.de>
>>>>>>>>>>> Date: Mon Jun 26 12:20:57 2017 +0200
>>>>>>>>>>>
>>>>>>>>>>> blk-mq: Create hctx for each present CPU
>>>>>>>>>>>
>>>>>>>>>>> commit 4b855ad37194f7bdbb200ce7a1c7051fecb56a08 upstream.
>>>>>>>>>>>
>>>>>>>>>>> Currently we only create hctx for online CPUs, which can lead to a lot
>>>>>>>>>>> of churn due to frequent soft offline / online operations. Instead
>>>>>>>>>>> allocate one for each present CPU to avoid this and dramatically simplify
>>>>>>>>>>> the code.
>>>>>>>>>>>
>>>>>>>>>>> Signed-off-by: Christoph Hellwig <hch@lst.de>
>>>>>>>>>>> Reviewed-by: Jens Axboe <axboe@kernel.dk>
>>>>>>>>>>> Cc: Keith Busch <keith.busch@intel.com>
>>>>>>>>>>> Cc: linux-block@vger.kernel.org
>>>>>>>>>>> Cc: linux-nvme@lists.infradead.org
>>>>>>>>>>> Link: http://lkml.kernel.org/r/20170626102058.10200-3-hch@lst.de
>>>>>>>>>>> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
>>>>>>>>>>> Cc: Oleksandr Natalenko <oleksandr@natalenko.name>
>>>>>>>>>>> Cc: Mike Galbraith <efault@gmx.de>
>>>>>>>>>>> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>>>>>>>>>>
>>>>>>>>>> I wonder if we're simply not getting the masks updated correctly. I'll
>>>>>>>>>> take a look.
>>>>>>>>>
>>>>>>>>> Can't make it trigger here. We do init for each present CPU, which means
>>>>>>>>> that if I offline a few CPUs here and register a queue, those still show
>>>>>>>>> up as present (just offline) and get mapped accordingly.
>>>>>>>>>
>>>>>>>>> From the looks of it, your setup is different. If the CPU doesn't show
>>>>>>>>> up as present and it gets hotplugged, then I can see how this condition
>>>>>>>>> would trigger. What environment are you running this in? We might have
>>>>>>>>> to re-introduce the cpu hotplug notifier, right now we just monitor
>>>>>>>>> for a dead cpu and handle that.
>>>>>>>>
>>>>>>>> I am not doing a hot unplug and the replug, I use KVM and add a previously
>>>>>>>> not available CPU.
>>>>>>>>
>>>>>>>> in libvirt/virsh speak:
>>>>>>>> <vcpu placement='static' current='1'>4</vcpu>
>>>>>>>
>>>>>>> So that's why we run into problems. It's not present when we load the device,
>>>>>>> but becomes present and online afterwards.
>>>>>>>
>>>>>>> Christoph, we used to handle this just fine, your patch broke it.
>>>>>>>
>>>>>>> I'll see if I can come up with an appropriate fix.
>>>>>>
>>>>>> Can you try the below?
>>>>>
>>>>>
>>>>> It does prevent the crash but it seems that the new CPU is not "used " after the hotplug for mq:
>>>>>
>>>>>
>>>>> output with 2 cpus:
>>>>> /sys/kernel/debug/block/vda
>>>>> /sys/kernel/debug/block/vda/hctx0
>>>>> /sys/kernel/debug/block/vda/hctx0/cpu0
>>>>> /sys/kernel/debug/block/vda/hctx0/cpu0/completed
>>>>> /sys/kernel/debug/block/vda/hctx0/cpu0/merged
>>>>> /sys/kernel/debug/block/vda/hctx0/cpu0/dispatched
>>>>> /sys/kernel/debug/block/vda/hctx0/cpu0/rq_list
>>>>> /sys/kernel/debug/block/vda/hctx0/active
>>>>> /sys/kernel/debug/block/vda/hctx0/run
>>>>> /sys/kernel/debug/block/vda/hctx0/queued
>>>>> /sys/kernel/debug/block/vda/hctx0/dispatched
>>>>> /sys/kernel/debug/block/vda/hctx0/io_poll
>>>>> /sys/kernel/debug/block/vda/hctx0/sched_tags_bitmap
>>>>> /sys/kernel/debug/block/vda/hctx0/sched_tags
>>>>> /sys/kernel/debug/block/vda/hctx0/tags_bitmap
>>>>> /sys/kernel/debug/block/vda/hctx0/tags
>>>>> /sys/kernel/debug/block/vda/hctx0/ctx_map
>>>>> /sys/kernel/debug/block/vda/hctx0/busy
>>>>> /sys/kernel/debug/block/vda/hctx0/dispatch
>>>>> /sys/kernel/debug/block/vda/hctx0/flags
>>>>> /sys/kernel/debug/block/vda/hctx0/state
>>>>> /sys/kernel/debug/block/vda/sched
>>>>> /sys/kernel/debug/block/vda/sched/dispatch
>>>>> /sys/kernel/debug/block/vda/sched/starved
>>>>> /sys/kernel/debug/block/vda/sched/batching
>>>>> /sys/kernel/debug/block/vda/sched/write_next_rq
>>>>> /sys/kernel/debug/block/vda/sched/write_fifo_list
>>>>> /sys/kernel/debug/block/vda/sched/read_next_rq
>>>>> /sys/kernel/debug/block/vda/sched/read_fifo_list
>>>>> /sys/kernel/debug/block/vda/write_hints
>>>>> /sys/kernel/debug/block/vda/state
>>>>> /sys/kernel/debug/block/vda/requeue_list
>>>>> /sys/kernel/debug/block/vda/poll_stat
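[With two CPUs online, the listing above still shows only a single hctx0/cpu0 entry, i.e. the hotplugged CPU never got mapped to a hardware context. For illustration only, a rough round-robin sketch of CPU-to-hctx spreading in the spirit of blk_mq_map_queues(); the real kernel code also accounts for thread siblings, so treat this as a hypothetical model, not the kernel algorithm:]

```python
def map_queues(nr_cpus, nr_hw_queues):
    """Hypothetical round-robin CPU -> hardware-queue map.

    Loosely modeled on what blk_mq_map_queues() achieves; greatly
    simplified (no sibling or NUMA awareness).
    """
    return {cpu: cpu % nr_hw_queues for cpu in range(nr_cpus)}

# With one hardware queue, every CPU, hotplugged or not, should land
# on hctx0. The symptom above is that the hotplugged CPU was missing
# from the map entirely, so no hctxN/cpu1 entry ever appears.
single_queue = map_queues(2, 1)  # both CPUs map to hctx0
```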
>>>>
>>>> Try this, basically just a revert.
>>>
>>> Yes, seems to work.
>>>
>>> Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
>>
>> Great, thanks for testing.
>>
>>> Do you know why the original commit made it into 4.12 stable? After all,
>>> it has no Fixes tag and no stable Cc.
>>
>> I was wondering the same thing when you said it was in 4.12.stable and
>> not in 4.12 release. That patch should absolutely not have gone into
>> stable, it's not marked as such and it's not fixing a problem that is
>> stable worthy. In fact, it's causing a regression...
>>
>> Greg? Upstream commit is mentioned higher up, start of the email.
>>
>
>
> Forgot to cc Greg?
I did, thanks for doing that. Now I wonder how to mark this patch,
as we need to revert it from any kernel that picked up the bad commit:
the 4.12 release is fine, but the later 4.12.x stable releases are not.
--
Jens Axboe
^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
2017-11-21 20:14 ` Jens Axboe
@ 2017-11-21 20:19 ` Christian Borntraeger
-1 siblings, 0 replies; 96+ messages in thread
From: Christian Borntraeger @ 2017-11-21 20:19 UTC (permalink / raw)
To: Jens Axboe, Bart Van Assche, virtualization, linux-block, mst,
jasowang, linux-kernel, Christoph Hellwig, Greg Kroah-Hartman,
stable
On 11/21/2017 09:14 PM, Jens Axboe wrote:
> On 11/21/2017 01:12 PM, Christian Borntraeger wrote:
>> [...]
>>
>> Yes, seems to work.
>>
>> Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
>
> Great, thanks for testing.
>
>> Do you know why the original commit made it into 4.12 stable? After all,
>> it has no Fixes tag and no stable Cc.
>
> I was wondering the same thing when you said it was in 4.12.stable and
> not in 4.12 release. That patch should absolutely not have gone into
> stable, it's not marked as such and it's not fixing a problem that is
> stable worthy. In fact, it's causing a regression...
>
> Greg? Upstream commit is mentioned higher up, start of the email.
>
Forgot to cc Greg?
^ permalink raw reply [flat|nested] 96+ messages in thread