* [PATCH 0/6] blk-mq: improvement CPU hotplug(simplified version)
@ 2020-05-15  1:41 Ming Lei
  2020-05-15  1:41 ` [PATCH 1/6] blk-mq: allocate request on cpu in hctx->cpumask for blk_mq_alloc_request_hctx Ming Lei
                   ` (7 more replies)
  0 siblings, 8 replies; 17+ messages in thread
From: Ming Lei @ 2020-05-15  1:41 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Ming Lei, John Garry, Bart Van Assche,
	Hannes Reinecke, Christoph Hellwig, Thomas Gleixner

Hi,

Thomas mentioned:
    "
     That was the constraint of managed interrupts from the very beginning:
    
      The driver/subsystem has to quiesce the interrupt line and the associated
      queue _before_ it gets shutdown in CPU unplug and not fiddle with it
      until it's restarted by the core when the CPU is plugged in again.
    "

However, neither drivers nor blk-mq do that before one hctx becomes
inactive (i.e. all CPUs mapped to that hctx are offline). Even worse,
blk-mq still tries to run the hw queue after the hctx is dead, see
blk_mq_hctx_notify_dead().

This patchset tries to address the issue by the following approach:

- before the last CPU in hctx->cpumask goes offline, mark this
hctx as inactive

- disable preemption while allocating a tag for a request, and after the
tag is allocated, check whether this hctx has become inactive. If it has,
give up the allocation and retry it remotely on an online CPU

- before the hctx becomes inactive, drain all requests allocated on this
hctx (a simplified sketch of the overall flow follows below)
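
To make the intended interplay concrete, here is a minimal user-space
sketch of the idea (plain C11; struct fake_hctx and the get_tag(),
put_tag() and offline_drain() helpers are invented for this sketch only,
with a simple counter standing in for the tag bitmap; this is not the
kernel code or its APIs):

#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy stand-in for a hardware queue context. */
struct fake_hctx {
	atomic_bool inactive;	/* models the "hctx is inactive" flag */
	atomic_int  in_flight;	/* models allocated-but-not-completed tags */
	int         nr_tags;
};

/* Allocation side: take a tag, then re-check the inactive flag. */
static int get_tag(struct fake_hctx *h)
{
	int prev = atomic_fetch_add(&h->in_flight, 1);

	if (prev >= h->nr_tags) {		/* tag space exhausted */
		atomic_fetch_sub(&h->in_flight, 1);
		return -1;
	}
	if (atomic_load(&h->inactive)) {	/* hctx went inactive: give up */
		atomic_fetch_sub(&h->in_flight, 1);
		return -1;			/* caller retries on an online CPU */
	}
	return prev;				/* any free tag index would do */
}

/* Completion side: a request finishes and releases its tag. */
static void put_tag(struct fake_hctx *h)
{
	atomic_fetch_sub(&h->in_flight, 1);
}

/* Hotplug side: mark the hctx inactive, then wait until it is drained. */
static void offline_drain(struct fake_hctx *h)
{
	atomic_store(&h->inactive, true);
	while (atomic_load(&h->in_flight) > 0)
		;	/* the real code sleeps between checks */
}

int main(void)
{
	struct fake_hctx h = { .nr_tags = 4 };

	printf("first allocation: %d\n", get_tag(&h));	/* succeeds while active */
	put_tag(&h);					/* request completes */

	offline_drain(&h);				/* last CPU going offline */
	printf("after offline: %d\n", get_tag(&h));	/* fails: retry remotely */
	return 0;
}

In the real patches below, the drain loop walks the tag bitmaps via
blk_mq_all_tag_iter() and sleeps between checks, and a failed allocation
is retried via smp_call_function_any() on an online CPU.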

Thanks to John Garry for running lots of tests on arm64 with previous
versions of these patches and for helping to investigate all kinds of
issues.

Thanks to Christoph for reviewing older versions of these patches.

https://github.com/ming1/linux/commits/v5.7-rc-blk-mq-improve-cpu-hotplug

Ming Lei (6):
  blk-mq: allocate request on cpu in hctx->cpumask for
    blk_mq_alloc_request_hctx
  blk-mq: set data->ctx and data->hctx explicitly in blk_mq_get_request
  blk-mq: add blk_mq_all_tag_iter
  blk-mq: prepare for draining IO when hctx's all CPUs are offline
  blk-mq: disable preempt during allocating request tag
  blk-mq: stop to allocate new requst and drain request before hctx
    becomes inactive

 block/blk-mq-debugfs.c     |   2 +
 block/blk-mq-tag.c         |  47 +++++++++-
 block/blk-mq-tag.h         |   2 +
 block/blk-mq.c             | 184 ++++++++++++++++++++++++++++++++-----
 block/blk-mq.h             |   1 +
 drivers/block/loop.c       |   2 +-
 drivers/md/dm-rq.c         |   2 +-
 include/linux/blk-mq.h     |  10 ++
 include/linux/cpuhotplug.h |   1 +
 9 files changed, 221 insertions(+), 30 deletions(-)

Cc: John Garry <john.garry@huawei.com>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
-- 
2.25.2



* [PATCH 1/6] blk-mq: allocate request on cpu in hctx->cpumask for blk_mq_alloc_request_hctx
  2020-05-15  1:41 [PATCH 0/6] blk-mq: improvement CPU hotplug(simplified version) Ming Lei
@ 2020-05-15  1:41 ` Ming Lei
  2020-05-15 15:32   ` Christoph Hellwig
  2020-05-15  1:41 ` [PATCH 2/6] blk-mq: set data->ctx and data->hctx explicitly in blk_mq_get_request Ming Lei
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 17+ messages in thread
From: Ming Lei @ 2020-05-15  1:41 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Ming Lei, Bart Van Assche, Hannes Reinecke,
	Christoph Hellwig, Thomas Gleixner, John Garry

blk_mq_alloc_request_hctx() asks blk-mq to allocate a request from a
specified hctx, which is usually bound to a fixed CPU mapping, and the
request is supposed to be allocated on a CPU in hctx->cpumask.

So use smp_call_function_any() to allocate the request on a CPU in
hctx->cpumask for blk_mq_alloc_request_hctx().

Declare blk_mq_get_request() beforehand because the following patches
reuse __blk_mq_alloc_request() for blk_mq_get_request().

Prepare for improving CPU hotplug support.

Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: John Garry <john.garry@huawei.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq.c | 35 ++++++++++++++++++++++++++---------
 1 file changed, 26 insertions(+), 9 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 9ee695bdf873..e2e1b6808b32 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -40,6 +40,10 @@
 #include "blk-mq-sched.h"
 #include "blk-rq-qos.h"
 
+static struct request *blk_mq_get_request(struct request_queue *q,
+					  struct bio *bio,
+					  struct blk_mq_alloc_data *data);
+
 static void blk_mq_poll_stats_start(struct request_queue *q);
 static void blk_mq_poll_stats_fn(struct blk_stat_callback *cb);
 
@@ -330,6 +334,19 @@ static struct request *blk_mq_rq_ctx_init(struct blk_mq_alloc_data *data,
 	return rq;
 }
 
+struct blk_mq_smp_call_info {
+	struct request_queue *q;
+	struct blk_mq_alloc_data *data;
+	struct request *rq;
+};
+
+static void __blk_mq_alloc_request(void *alloc_info)
+{
+	struct blk_mq_smp_call_info *info = alloc_info;
+
+	info->rq = blk_mq_get_request(info->q, NULL, info->data);
+}
+
 static struct request *blk_mq_get_request(struct request_queue *q,
 					  struct bio *bio,
 					  struct blk_mq_alloc_data *data)
@@ -424,8 +441,7 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q,
 	unsigned int op, blk_mq_req_flags_t flags, unsigned int hctx_idx)
 {
 	struct blk_mq_alloc_data alloc_data = { .flags = flags, .cmd_flags = op };
-	struct request *rq;
-	unsigned int cpu;
+	struct blk_mq_smp_call_info info = {.q = q, .data = &alloc_data};
 	int ret;
 
 	/*
@@ -448,21 +464,22 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q,
 	 * Check if the hardware context is actually mapped to anything.
 	 * If not tell the caller that it should skip this queue.
 	 */
-	alloc_data.hctx = q->queue_hw_ctx[hctx_idx];
-	if (!blk_mq_hw_queue_mapped(alloc_data.hctx)) {
+	if (!blk_mq_hw_queue_mapped(q->queue_hw_ctx[hctx_idx])) {
 		blk_queue_exit(q);
 		return ERR_PTR(-EXDEV);
 	}
-	cpu = cpumask_first_and(alloc_data.hctx->cpumask, cpu_online_mask);
-	alloc_data.ctx = __blk_mq_get_ctx(q, cpu);
 
-	rq = blk_mq_get_request(q, NULL, &alloc_data);
+	ret = smp_call_function_any(alloc_data.hctx->cpumask,
+			__blk_mq_alloc_request, &info, 1);
 	blk_queue_exit(q);
 
-	if (!rq)
+	if (ret)
+		return ERR_PTR(ret);
+
+	if (!info.rq)
 		return ERR_PTR(-EWOULDBLOCK);
 
-	return rq;
+	return info.rq;
 }
 EXPORT_SYMBOL_GPL(blk_mq_alloc_request_hctx);
 
-- 
2.25.2



* [PATCH 2/6] blk-mq: set data->ctx and data->hctx explicitly in blk_mq_get_request
  2020-05-15  1:41 [PATCH 0/6] blk-mq: improvement CPU hotplug(simplified version) Ming Lei
  2020-05-15  1:41 ` [PATCH 1/6] blk-mq: allocate request on cpu in hctx->cpumask for blk_mq_alloc_request_hctx Ming Lei
@ 2020-05-15  1:41 ` Ming Lei
  2020-05-15  1:41 ` [PATCH 3/6] blk-mq: add blk_mq_all_tag_iter Ming Lei
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 17+ messages in thread
From: Ming Lei @ 2020-05-15  1:41 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Ming Lei, Bart Van Assche, Hannes Reinecke,
	Christoph Hellwig, Thomas Gleixner, John Garry

The only request allocation user that provided data->ctx and data->hctx
was blk_mq_alloc_request_hctx(). Now that this usage has been removed,
set data->ctx and data->hctx explicitly in blk_mq_get_request().

Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: John Garry <john.garry@huawei.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq.c | 16 ++++++----------
 1 file changed, 6 insertions(+), 10 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index e2e1b6808b32..35966af878c6 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -354,7 +354,6 @@ static struct request *blk_mq_get_request(struct request_queue *q,
 	struct elevator_queue *e = q->elevator;
 	struct request *rq;
 	unsigned int tag;
-	bool clear_ctx_on_error = false;
 	u64 alloc_time_ns = 0;
 
 	blk_queue_enter_live(q);
@@ -364,13 +363,10 @@ static struct request *blk_mq_get_request(struct request_queue *q,
 		alloc_time_ns = ktime_get_ns();
 
 	data->q = q;
-	if (likely(!data->ctx)) {
-		data->ctx = blk_mq_get_ctx(q);
-		clear_ctx_on_error = true;
-	}
-	if (likely(!data->hctx))
-		data->hctx = blk_mq_map_queue(q, data->cmd_flags,
-						data->ctx);
+
+	WARN_ON_ONCE(data->ctx || data->hctx);
+	data->ctx = blk_mq_get_ctx(q);
+	data->hctx = blk_mq_map_queue(q, data->cmd_flags, data->ctx);
 	if (data->cmd_flags & REQ_NOWAIT)
 		data->flags |= BLK_MQ_REQ_NOWAIT;
 
@@ -392,8 +388,8 @@ static struct request *blk_mq_get_request(struct request_queue *q,
 
 	tag = blk_mq_get_tag(data);
 	if (tag == BLK_MQ_TAG_FAIL) {
-		if (clear_ctx_on_error)
-			data->ctx = NULL;
+		data->ctx = NULL;
+		data->hctx = NULL;
 		blk_queue_exit(q);
 		return NULL;
 	}
-- 
2.25.2



* [PATCH 3/6] blk-mq: add blk_mq_all_tag_iter
  2020-05-15  1:41 [PATCH 0/6] blk-mq: improvement CPU hotplug(simplified version) Ming Lei
  2020-05-15  1:41 ` [PATCH 1/6] blk-mq: allocate request on cpu in hctx->cpumask for blk_mq_alloc_request_hctx Ming Lei
  2020-05-15  1:41 ` [PATCH 2/6] blk-mq: set data->ctx and data->hctx explicitly in blk_mq_get_request Ming Lei
@ 2020-05-15  1:41 ` Ming Lei
  2020-05-15  1:41 ` [PATCH 4/6] blk-mq: prepare for draining IO when hctx's all CPUs are offline Ming Lei
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 17+ messages in thread
From: Ming Lei @ 2020-05-15  1:41 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Ming Lei, Bart Van Assche, Hannes Reinecke,
	Christoph Hellwig, Thomas Gleixner, John Garry

Add a new function, blk_mq_all_tag_iter(), so that we can iterate over
every allocated request in either the io scheduler tags or the driver
tags. This is more flexible since it allows callers to do whatever they
want with each allocated request.

It will be used in this patchset to implement draining of the requests
allocated on a specified hctx.

Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: John Garry <john.garry@huawei.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq-tag.c | 33 +++++++++++++++++++++++++++++----
 block/blk-mq-tag.h |  2 ++
 2 files changed, 31 insertions(+), 4 deletions(-)

diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index 586c9d6e904a..c27c6dfc7d36 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -257,6 +257,7 @@ struct bt_tags_iter_data {
 	busy_tag_iter_fn *fn;
 	void *data;
 	bool reserved;
+	bool iterate_all;
 };
 
 static bool bt_tags_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
@@ -274,7 +275,7 @@ static bool bt_tags_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
 	 * test and set the bit before assining ->rqs[].
 	 */
 	rq = tags->rqs[bitnr];
-	if (rq && blk_mq_request_started(rq))
+	if (rq && (iter_data->iterate_all || blk_mq_request_started(rq)))
 		return iter_data->fn(rq, iter_data->data, reserved);
 
 	return true;
@@ -294,13 +295,15 @@ static bool bt_tags_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
  *		bitmap_tags member of struct blk_mq_tags.
  */
 static void bt_tags_for_each(struct blk_mq_tags *tags, struct sbitmap_queue *bt,
-			     busy_tag_iter_fn *fn, void *data, bool reserved)
+			     busy_tag_iter_fn *fn, void *data, bool reserved,
+			     bool iterate_all)
 {
 	struct bt_tags_iter_data iter_data = {
 		.tags = tags,
 		.fn = fn,
 		.data = data,
 		.reserved = reserved,
+		.iterate_all = iterate_all,
 	};
 
 	if (tags->rqs)
@@ -321,8 +324,30 @@ static void blk_mq_all_tag_busy_iter(struct blk_mq_tags *tags,
 		busy_tag_iter_fn *fn, void *priv)
 {
 	if (tags->nr_reserved_tags)
-		bt_tags_for_each(tags, &tags->breserved_tags, fn, priv, true);
-	bt_tags_for_each(tags, &tags->bitmap_tags, fn, priv, false);
+		bt_tags_for_each(tags, &tags->breserved_tags, fn, priv, true,
+				 false);
+	bt_tags_for_each(tags, &tags->bitmap_tags, fn, priv, false, false);
+}
+
+/**
+ * blk_mq_all_tag_iter - iterate over all requests in a tag map
+ * @tags:	Tag map to iterate over.
+ * @fn:		Pointer to the function that will be called for each
+ *		request. @fn will be called as follows: @fn(rq, @priv,
+ *		reserved) where rq is a pointer to a request. 'reserved'
+ *		indicates whether or not @rq is a reserved request. Return
+ *		true to continue iterating tags, false to stop.
+ * @priv:	Will be passed as second argument to @fn.
+ *
+ * It is the caller's responsibility to check rq's state in @fn.
+ */
+void blk_mq_all_tag_iter(struct blk_mq_tags *tags, busy_tag_iter_fn *fn,
+		void *priv)
+{
+	if (tags->nr_reserved_tags)
+		bt_tags_for_each(tags, &tags->breserved_tags, fn, priv, true,
+				 true);
+	bt_tags_for_each(tags, &tags->bitmap_tags, fn, priv, false, true);
 }
 
 /**
diff --git a/block/blk-mq-tag.h b/block/blk-mq-tag.h
index 2b8321efb682..d19546e8246b 100644
--- a/block/blk-mq-tag.h
+++ b/block/blk-mq-tag.h
@@ -34,6 +34,8 @@ extern int blk_mq_tag_update_depth(struct blk_mq_hw_ctx *hctx,
 extern void blk_mq_tag_wakeup_all(struct blk_mq_tags *tags, bool);
 void blk_mq_queue_tag_busy_iter(struct request_queue *q, busy_iter_fn *fn,
 		void *priv);
+void blk_mq_all_tag_iter(struct blk_mq_tags *tags, busy_tag_iter_fn *fn,
+		void *priv);
 
 static inline struct sbq_wait_state *bt_wait_ptr(struct sbitmap_queue *bt,
 						 struct blk_mq_hw_ctx *hctx)
-- 
2.25.2



* [PATCH 4/6] blk-mq: prepare for draining IO when hctx's all CPUs are offline
  2020-05-15  1:41 [PATCH 0/6] blk-mq: improvement CPU hotplug(simplified version) Ming Lei
                   ` (2 preceding siblings ...)
  2020-05-15  1:41 ` [PATCH 3/6] blk-mq: add blk_mq_all_tag_iter Ming Lei
@ 2020-05-15  1:41 ` Ming Lei
  2020-05-15  1:41 ` [PATCH 5/6] blk-mq: disable preempt during allocating request tag Ming Lei
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 17+ messages in thread
From: Ming Lei @ 2020-05-15  1:41 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Ming Lei, John Garry, Bart Van Assche,
	Hannes Reinecke, Christoph Hellwig, Thomas Gleixner

Most blk-mq drivers depend on managed IRQs' automatic affinity to set up
their queue mapping. Thomas mentioned the following point[1]:

"
 That was the constraint of managed interrupts from the very beginning:

  The driver/subsystem has to quiesce the interrupt line and the associated
  queue _before_ it gets shutdown in CPU unplug and not fiddle with it
  until it's restarted by the core when the CPU is plugged in again.
"

However, the current blk-mq implementation doesn't quiesce the hw queue
before the last CPU of the hctx is shut down. Even worse,
CPUHP_BLK_MQ_DEAD is a cpuhp state handled after the CPU is already down,
so there is no chance for blk-mq to quiesce the hctx with respect to CPU
hotplug.

Add a new cpuhp state, CPUHP_AP_BLK_MQ_ONLINE, for blk-mq to stop queues
and wait for completion of in-flight requests.

The following patch will stop request allocation on any CPU in
hctx->cpumask and wait for completion of all allocated requests when a
hctx is becoming inactive. Doing so may cause a deadlock for stacking
blk-mq drivers such as dm-rq and loop.

So add the blk-mq flag BLK_MQ_F_STACKING and set it for dm-rq and loop,
so that we don't need to wait for completion of their in-flight requests
and the potential deadlock is avoided. This is safe for dm-rq and loop:
neither uses managed IRQs, and their IO request completion is always
triggered by the underlying disk's IO completion.

[1] https://lore.kernel.org/linux-block/alpine.DEB.2.21.1904051331270.1802@nanos.tec.linutronix.de/

Cc: John Garry <john.garry@huawei.com>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq-debugfs.c     |  1 +
 block/blk-mq.c             | 19 +++++++++++++++++++
 drivers/block/loop.c       |  2 +-
 drivers/md/dm-rq.c         |  2 +-
 include/linux/blk-mq.h     |  7 +++++++
 include/linux/cpuhotplug.h |  1 +
 6 files changed, 30 insertions(+), 2 deletions(-)

diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index 96b7a35c898a..6d40eff04524 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -239,6 +239,7 @@ static const char *const hctx_flag_name[] = {
 	HCTX_FLAG_NAME(TAG_SHARED),
 	HCTX_FLAG_NAME(BLOCKING),
 	HCTX_FLAG_NAME(NO_SCHED),
+	HCTX_FLAG_NAME(STACKING),
 };
 #undef HCTX_FLAG_NAME
 
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 35966af878c6..151b63b1f88b 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2312,6 +2312,16 @@ int blk_mq_alloc_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
 	return -ENOMEM;
 }
 
+static int blk_mq_hctx_notify_online(unsigned int cpu, struct hlist_node *node)
+{
+	return 0;
+}
+
+static int blk_mq_hctx_notify_offline(unsigned int cpu, struct hlist_node *node)
+{
+	return 0;
+}
+
 /*
  * 'cpu' is going away. splice any existing rq_list entries from this
  * software queue to the hw queue dispatch list, and ensure that it
@@ -2348,6 +2358,9 @@ static int blk_mq_hctx_notify_dead(unsigned int cpu, struct hlist_node *node)
 
 static void blk_mq_remove_cpuhp(struct blk_mq_hw_ctx *hctx)
 {
+	if (!(hctx->flags & BLK_MQ_F_STACKING))
+		cpuhp_state_remove_instance_nocalls(CPUHP_AP_BLK_MQ_ONLINE,
+						    &hctx->cpuhp_online);
 	cpuhp_state_remove_instance_nocalls(CPUHP_BLK_MQ_DEAD,
 					    &hctx->cpuhp_dead);
 }
@@ -2407,6 +2420,9 @@ static int blk_mq_init_hctx(struct request_queue *q,
 {
 	hctx->queue_num = hctx_idx;
 
+	if (!(hctx->flags & BLK_MQ_F_STACKING))
+		cpuhp_state_add_instance_nocalls(CPUHP_AP_BLK_MQ_ONLINE,
+				&hctx->cpuhp_online);
 	cpuhp_state_add_instance_nocalls(CPUHP_BLK_MQ_DEAD, &hctx->cpuhp_dead);
 
 	hctx->tags = set->tags[hctx_idx];
@@ -3661,6 +3677,9 @@ static int __init blk_mq_init(void)
 {
 	cpuhp_setup_state_multi(CPUHP_BLK_MQ_DEAD, "block/mq:dead", NULL,
 				blk_mq_hctx_notify_dead);
+	cpuhp_setup_state_multi(CPUHP_AP_BLK_MQ_ONLINE, "block/mq:online",
+				blk_mq_hctx_notify_online,
+				blk_mq_hctx_notify_offline);
 	return 0;
 }
 subsys_initcall(blk_mq_init);
diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index da693e6a834e..d7904b4d8d12 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -2037,7 +2037,7 @@ static int loop_add(struct loop_device **l, int i)
 	lo->tag_set.queue_depth = 128;
 	lo->tag_set.numa_node = NUMA_NO_NODE;
 	lo->tag_set.cmd_size = sizeof(struct loop_cmd);
-	lo->tag_set.flags = BLK_MQ_F_SHOULD_MERGE;
+	lo->tag_set.flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_STACKING;
 	lo->tag_set.driver_data = lo;
 
 	err = blk_mq_alloc_tag_set(&lo->tag_set);
diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c
index 3f8577e2c13b..f60c02512121 100644
--- a/drivers/md/dm-rq.c
+++ b/drivers/md/dm-rq.c
@@ -547,7 +547,7 @@ int dm_mq_init_request_queue(struct mapped_device *md, struct dm_table *t)
 	md->tag_set->ops = &dm_mq_ops;
 	md->tag_set->queue_depth = dm_get_blk_mq_queue_depth();
 	md->tag_set->numa_node = md->numa_node_id;
-	md->tag_set->flags = BLK_MQ_F_SHOULD_MERGE;
+	md->tag_set->flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_STACKING;
 	md->tag_set->nr_hw_queues = dm_get_blk_mq_nr_hw_queues();
 	md->tag_set->driver_data = md;
 
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index d7307795439a..682ce79e8973 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -140,6 +140,8 @@ struct blk_mq_hw_ctx {
 	 */
 	atomic_t		nr_active;
 
+	/** @cpuhp_online: List to store request if CPU is going to die */
+	struct hlist_node	cpuhp_online;
 	/** @cpuhp_dead: List to store request if some CPU die. */
 	struct hlist_node	cpuhp_dead;
 	/** @kobj: Kernel object for sysfs. */
@@ -391,6 +393,11 @@ struct blk_mq_ops {
 enum {
 	BLK_MQ_F_SHOULD_MERGE	= 1 << 0,
 	BLK_MQ_F_TAG_SHARED	= 1 << 1,
+	/*
+	 * set when this device requires underlying blk-mq device for
+	 * completing IO
+	 */
+	BLK_MQ_F_STACKING	= 1 << 2,
 	BLK_MQ_F_BLOCKING	= 1 << 5,
 	BLK_MQ_F_NO_SCHED	= 1 << 6,
 	BLK_MQ_F_ALLOC_POLICY_START_BIT = 8,
diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
index 77d70b633531..24b3a77810b6 100644
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -152,6 +152,7 @@ enum cpuhp_state {
 	CPUHP_AP_SMPBOOT_THREADS,
 	CPUHP_AP_X86_VDSO_VMA_ONLINE,
 	CPUHP_AP_IRQ_AFFINITY_ONLINE,
+	CPUHP_AP_BLK_MQ_ONLINE,
 	CPUHP_AP_ARM_MVEBU_SYNC_CLOCKS,
 	CPUHP_AP_X86_INTEL_EPB_ONLINE,
 	CPUHP_AP_PERF_ONLINE,
-- 
2.25.2



* [PATCH 5/6] blk-mq: disable preempt during allocating request tag
  2020-05-15  1:41 [PATCH 0/6] blk-mq: improvement CPU hotplug(simplified version) Ming Lei
                   ` (3 preceding siblings ...)
  2020-05-15  1:41 ` [PATCH 4/6] blk-mq: prepare for draining IO when hctx's all CPUs are offline Ming Lei
@ 2020-05-15  1:41 ` Ming Lei
  2020-05-15 15:38   ` Christoph Hellwig
  2020-05-15  1:41 ` [PATCH 6/6] blk-mq: stop to allocate new requst and drain request before hctx becomes inactive Ming Lei
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 17+ messages in thread
From: Ming Lei @ 2020-05-15  1:41 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Ming Lei, Bart Van Assche, Hannes Reinecke,
	Christoph Hellwig, Thomas Gleixner, John Garry

Disable preemption for a little while during request tag allocation, so
the request's tag (internal tag) is always allocated on the CPU of
data->ctx, to prepare for improved handling of CPU hotplug during
request allocation.

In the following patch, hctx->state will be checked to see whether the
hctx has become inactive; the inactive state is always set on the last
online CPU of hctx->cpumask.

Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: John Garry <john.garry@huawei.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq-tag.c | 4 ++++
 block/blk-mq.c     | 4 ++++
 block/blk-mq.h     | 1 +
 3 files changed, 9 insertions(+)

diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index c27c6dfc7d36..0ae79ca6e2da 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -152,11 +152,15 @@ unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
 		if (tag != -1)
 			break;
 
+		if (data->preempt_disabled)
+			preempt_enable();
 		bt_prev = bt;
 		io_schedule();
 
 		sbitmap_finish_wait(bt, ws, &wait);
 
+		if (data->preempt_disabled)
+			preempt_disable();
 		data->ctx = blk_mq_get_ctx(data->q);
 		data->hctx = blk_mq_map_queue(data->q, data->cmd_flags,
 						data->ctx);
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 151b63b1f88b..18999198a37f 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -365,6 +365,9 @@ static struct request *blk_mq_get_request(struct request_queue *q,
 	data->q = q;
 
 	WARN_ON_ONCE(data->ctx || data->hctx);
+
+	preempt_disable();
+	data->preempt_disabled = true;
 	data->ctx = blk_mq_get_ctx(q);
 	data->hctx = blk_mq_map_queue(q, data->cmd_flags, data->ctx);
 	if (data->cmd_flags & REQ_NOWAIT)
@@ -387,6 +390,7 @@ static struct request *blk_mq_get_request(struct request_queue *q,
 	}
 
 	tag = blk_mq_get_tag(data);
+	preempt_enable();
 	if (tag == BLK_MQ_TAG_FAIL) {
 		data->ctx = NULL;
 		data->hctx = NULL;
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 10bfdfb494fa..5ebc8d8d5a54 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -152,6 +152,7 @@ struct blk_mq_alloc_data {
 	blk_mq_req_flags_t flags;
 	unsigned int shallow_depth;
 	unsigned int cmd_flags;
+	bool preempt_disabled;
 
 	/* input & output parameter */
 	struct blk_mq_ctx *ctx;
-- 
2.25.2



* [PATCH 6/6] blk-mq: stop to allocate new requst and drain request before hctx becomes inactive
  2020-05-15  1:41 [PATCH 0/6] blk-mq: improvement CPU hotplug(simplified version) Ming Lei
                   ` (4 preceding siblings ...)
  2020-05-15  1:41 ` [PATCH 5/6] blk-mq: disable preempt during allocating request tag Ming Lei
@ 2020-05-15  1:41 ` Ming Lei
  2020-05-15  3:55   ` Bart Van Assche
  2020-05-15 15:45 ` [PATCH 0/6] blk-mq: improvement CPU hotplug(simplified version) Christoph Hellwig
  2020-05-16 12:35 ` Christoph Hellwig
  7 siblings, 1 reply; 17+ messages in thread
From: Ming Lei @ 2020-05-15  1:41 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Ming Lei, John Garry, Bart Van Assche,
	Hannes Reinecke, Christoph Hellwig, Thomas Gleixner

Before a CPU goes offline, check whether it is the last online CPU of the
hctx. If so, mark this hctx as inactive and wait for completion of all
allocated IO requests originating from this hctx. Also check whether the
hctx has become inactive while allocating a request; if so, give up the
allocation and retry from a new online CPU.

This guarantees that there is no in-flight IO before the managed IRQ line
is shut down once all CPUs of that IRQ line are offline.

Cc: John Garry <john.garry@huawei.com>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq-debugfs.c |   1 +
 block/blk-mq-tag.c     |  10 ++++
 block/blk-mq.c         | 122 +++++++++++++++++++++++++++++++++++++----
 include/linux/blk-mq.h |   3 +
 4 files changed, 125 insertions(+), 11 deletions(-)

diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index 6d40eff04524..15df3a36e9fa 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -213,6 +213,7 @@ static const char *const hctx_state_name[] = {
 	HCTX_STATE_NAME(STOPPED),
 	HCTX_STATE_NAME(TAG_ACTIVE),
 	HCTX_STATE_NAME(SCHED_RESTART),
+	HCTX_STATE_NAME(INACTIVE),
 };
 #undef HCTX_STATE_NAME
 
diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index 0ae79ca6e2da..c8e34c1f547f 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -184,6 +184,16 @@ unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
 	sbitmap_finish_wait(bt, ws, &wait);
 
 found_tag:
+	/*
+	 * we are in request allocation, check if the current hctx is inactive.
+	 * If yes, give up this allocation. We will retry on another new online
+	 * CPU.
+	 */
+	if (data->preempt_disabled && unlikely(test_bit(BLK_MQ_S_INACTIVE,
+					&data->hctx->state))) {
+		blk_mq_put_tag(tags, data->ctx, tag + tag_offset);
+		return BLK_MQ_TAG_FAIL;
+	}
 	return tag + tag_offset;
 }
 
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 18999198a37f..0551aa1740ec 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -40,9 +40,9 @@
 #include "blk-mq-sched.h"
 #include "blk-rq-qos.h"
 
-static struct request *blk_mq_get_request(struct request_queue *q,
-					  struct bio *bio,
-					  struct blk_mq_alloc_data *data);
+static struct request *__blk_mq_get_request(struct request_queue *q,
+					    struct bio *bio,
+					    struct blk_mq_alloc_data *data);
 
 static void blk_mq_poll_stats_start(struct request_queue *q);
 static void blk_mq_poll_stats_fn(struct blk_stat_callback *cb);
@@ -338,18 +338,19 @@ struct blk_mq_smp_call_info {
 	struct request_queue *q;
 	struct blk_mq_alloc_data *data;
 	struct request *rq;
+	struct bio *bio;
 };
 
 static void __blk_mq_alloc_request(void *alloc_info)
 {
 	struct blk_mq_smp_call_info *info = alloc_info;
 
-	info->rq = blk_mq_get_request(info->q, NULL, info->data);
+	info->rq = __blk_mq_get_request(info->q, info->bio, info->data);
 }
 
-static struct request *blk_mq_get_request(struct request_queue *q,
-					  struct bio *bio,
-					  struct blk_mq_alloc_data *data)
+static struct request *__blk_mq_get_request(struct request_queue *q,
+					    struct bio *bio,
+					    struct blk_mq_alloc_data *data)
 {
 	struct elevator_queue *e = q->elevator;
 	struct request *rq;
@@ -413,6 +414,34 @@ static struct request *blk_mq_get_request(struct request_queue *q,
 	return rq;
 }
 
+static struct request *blk_mq_get_request(struct request_queue *q,
+					  struct bio *bio,
+					  struct blk_mq_alloc_data *data)
+{
+	int ret;
+	struct blk_mq_smp_call_info info;
+	struct request *rq = __blk_mq_get_request(q, bio, data);
+
+	if (rq)
+		return rq;
+
+	if (data->flags & BLK_MQ_REQ_NOWAIT)
+		return NULL;
+
+	/*
+	 * The allocation fails because of inactive hctx, so retry from
+	 * online cpus.
+	 */
+	info.q = q;
+	info.bio = bio;
+	info.data = data;
+
+	ret = smp_call_function_any(cpu_online_mask, __blk_mq_alloc_request, &info, 1);
+	if (ret)
+		return ERR_PTR(ret);
+	return info.rq;
+}
+
 struct request *blk_mq_alloc_request(struct request_queue *q, unsigned int op,
 		blk_mq_req_flags_t flags)
 {
@@ -2316,13 +2345,81 @@ int blk_mq_alloc_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
 	return -ENOMEM;
 }
 
-static int blk_mq_hctx_notify_online(unsigned int cpu, struct hlist_node *node)
+struct rq_iter_data {
+	bool has_rq;
+	struct blk_mq_hw_ctx *hctx;
+};
+
+static bool blk_mq_has_request(struct request *rq, void *data, bool reserved)
 {
-	return 0;
+	struct rq_iter_data *iter_data = data;
+
+	if (rq->mq_hctx == iter_data->hctx) {
+		iter_data->has_rq = true;
+		return false;
+	}
+
+	return true;
+}
+
+static bool blk_mq_tags_has_request(struct blk_mq_hw_ctx *hctx)
+{
+	struct blk_mq_tags *tags = hctx->sched_tags ?: hctx->tags;
+	struct rq_iter_data data = {
+		.hctx	= hctx,
+	};
+
+	blk_mq_all_tag_iter(tags, blk_mq_has_request, &data);
+	return data.has_rq;
+}
+
+static inline bool blk_mq_last_cpu_in_hctx(unsigned int cpu,
+		struct blk_mq_hw_ctx *hctx)
+{
+	if (cpumask_next_and(-1, hctx->cpumask, cpu_online_mask) != cpu)
+		return false;
+	if (cpumask_next_and(cpu, hctx->cpumask, cpu_online_mask) < nr_cpu_ids)
+		return false;
+	return true;
 }
 
 static int blk_mq_hctx_notify_offline(unsigned int cpu, struct hlist_node *node)
 {
+	struct blk_mq_hw_ctx *hctx = hlist_entry_safe(node,
+			struct blk_mq_hw_ctx, cpuhp_online);
+	struct request_queue *q = hctx->queue;
+
+	if (!cpumask_test_cpu(cpu, hctx->cpumask))
+		return 0;
+
+	if (!blk_mq_last_cpu_in_hctx(cpu, hctx))
+		return 0;
+
+	/* Prevent new request from being allocated on the current hctx/cpu */
+	set_bit(BLK_MQ_S_INACTIVE, &hctx->state);
+
+	/*
+	 * Grab one refcount for avoiding scheduler switch, and
+	 * return immediately if queue has been frozen.
+	 */
+	if (!percpu_ref_tryget(&q->q_usage_counter))
+		return 0;
+
+	/* wait until all requests in this hctx are gone */
+	while (blk_mq_tags_has_request(hctx))
+		msleep(5);
+
+	percpu_ref_put(&q->q_usage_counter);
+	return 0;
+}
+
+static int blk_mq_hctx_notify_online(unsigned int cpu, struct hlist_node *node)
+{
+	struct blk_mq_hw_ctx *hctx = hlist_entry_safe(node,
+			struct blk_mq_hw_ctx, cpuhp_online);
+
+	if (cpumask_test_cpu(cpu, hctx->cpumask))
+		clear_bit(BLK_MQ_S_INACTIVE, &hctx->state);
 	return 0;
 }
 
@@ -2333,12 +2430,15 @@ static int blk_mq_hctx_notify_offline(unsigned int cpu, struct hlist_node *node)
  */
 static int blk_mq_hctx_notify_dead(unsigned int cpu, struct hlist_node *node)
 {
-	struct blk_mq_hw_ctx *hctx;
+	struct blk_mq_hw_ctx *hctx = hlist_entry_safe(node,
+			struct blk_mq_hw_ctx, cpuhp_dead);
 	struct blk_mq_ctx *ctx;
 	LIST_HEAD(tmp);
 	enum hctx_type type;
 
-	hctx = hlist_entry_safe(node, struct blk_mq_hw_ctx, cpuhp_dead);
+	if (!cpumask_test_cpu(cpu, hctx->cpumask))
+		return 0;
+
 	ctx = __blk_mq_get_ctx(hctx->queue, cpu);
 	type = hctx->type;
 
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 682ce79e8973..e3f6608346b5 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -407,6 +407,9 @@ enum {
 	BLK_MQ_S_TAG_ACTIVE	= 1,
 	BLK_MQ_S_SCHED_RESTART	= 2,
 
+	/* hw queue is inactive after all its CPUs become offline */
+	BLK_MQ_S_INACTIVE	= 3,
+
 	BLK_MQ_MAX_DEPTH	= 10240,
 
 	BLK_MQ_CPU_WORK_BATCH	= 8,
-- 
2.25.2



* Re: [PATCH 6/6] blk-mq: stop to allocate new requst and drain request before hctx becomes inactive
  2020-05-15  1:41 ` [PATCH 6/6] blk-mq: stop to allocate new requst and drain request before hctx becomes inactive Ming Lei
@ 2020-05-15  3:55   ` Bart Van Assche
  2020-05-15  5:20     ` Ming Lei
  0 siblings, 1 reply; 17+ messages in thread
From: Bart Van Assche @ 2020-05-15  3:55 UTC (permalink / raw)
  To: Ming Lei, Jens Axboe
  Cc: linux-block, John Garry, Hannes Reinecke, Christoph Hellwig,
	Thomas Gleixner

On 2020-05-14 18:41, Ming Lei wrote:
> +	/* Prevent new request from being allocated on the current hctx/cpu */
> +	set_bit(BLK_MQ_S_INACTIVE, &hctx->state);

What guarantees that this change has been observed by all CPUs before
the blk_mq_tags_has_request() loop finishes?

> +	/*
> +	 * Grab one refcount for avoiding scheduler switch, and
> +	 * return immediately if queue has been frozen.
> +	 */
> +	if (!percpu_ref_tryget(&q->q_usage_counter))
> +		return 0;

If percpu_ref_tryget(&q->q_usage_counter) fails that means either that
request queue freezing is in progress or that a request queue has been
frozen. I don't think that it is safe to return immediately if request
queue freezing is still in progress.

> +	/* wait until all requests in this hctx are gone */
> +	while (blk_mq_tags_has_request(hctx))
> +		msleep(5);
> +
> +	percpu_ref_put(&q->q_usage_counter);
> +	return 0;
> +}

Thanks,

Bart.


* Re: [PATCH 6/6] blk-mq: stop to allocate new requst and drain request before hctx becomes inactive
  2020-05-15  3:55   ` Bart Van Assche
@ 2020-05-15  5:20     ` Ming Lei
  0 siblings, 0 replies; 17+ messages in thread
From: Ming Lei @ 2020-05-15  5:20 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, linux-block, John Garry, Hannes Reinecke,
	Christoph Hellwig, Thomas Gleixner

On Thu, May 14, 2020 at 08:55:34PM -0700, Bart Van Assche wrote:
> On 2020-05-14 18:41, Ming Lei wrote:
> > +	/* Prevent new request from being allocated on the current hctx/cpu */
> > +	set_bit(BLK_MQ_S_INACTIVE, &hctx->state);
> 
> What guarantees that this change has been observed by all CPUs before
> the blk_mq_tags_has_request() loop finishes?

We don't need all CPUs to observe this INACTIVE flag; only allocation
from the current CPU needs to observe it.

The current CPU is the last online CPU of this hctx, so:

1) any requests allocated from this CPU will either be drained by
blk_mq_tags_has_request() or allocated remotely, see blk_mq_get_request().

2) for any requests allocated from other, already offlined CPUs of this
hctx, their tag bits were committed to memory before those CPUs went
offline, so the current CPU (in blk_mq_hctx_notify_offline()) can observe
their tag bits reliably; also, each CPU's offline handling is strictly
serialized.

> 
> > +	/*
> > +	 * Grab one refcount for avoiding scheduler switch, and
> > +	 * return immediately if queue has been frozen.
> > +	 */
> > +	if (!percpu_ref_tryget(&q->q_usage_counter))
> > +		return 0;
> 
> If percpu_ref_tryget(&q->q_usage_counter) fails that means either that
> request queue freezing is in progress or that a request queue has been
> frozen. I don't think that it is safe to return immediately if request
> queue freezing is still in progress.

static inline bool percpu_ref_tryget(struct percpu_ref *ref)
{
        unsigned long __percpu *percpu_count;
        bool ret;

        rcu_read_lock();

        if (__ref_is_percpu(ref, &percpu_count)) {
                this_cpu_inc(*percpu_count);
                ret = true;
        } else {
                ret = atomic_long_inc_not_zero(&ref->count);
        }       
        
        rcu_read_unlock();

        return ret;
}

If percpu_ref_tryget() returns false, that means the atomic refcount has
become zero, so the request queue has been frozen and we are safe to
return. Or is my understanding of atomic_long_inc_not_zero() wrong? :-)


Thanks,
Ming



* Re: [PATCH 1/6] blk-mq: allocate request on cpu in hctx->cpumask for blk_mq_alloc_request_hctx
  2020-05-15  1:41 ` [PATCH 1/6] blk-mq: allocate request on cpu in hctx->cpumask for blk_mq_alloc_request_hctx Ming Lei
@ 2020-05-15 15:32   ` Christoph Hellwig
  0 siblings, 0 replies; 17+ messages in thread
From: Christoph Hellwig @ 2020-05-15 15:32 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-block, Bart Van Assche, Hannes Reinecke,
	Christoph Hellwig, Thomas Gleixner, John Garry

On Fri, May 15, 2020 at 09:41:48AM +0800, Ming Lei wrote:
> blk_mq_alloc_request_hctx() asks blk-mq to allocate a request from a
> specified hctx, which is usually bound to a fixed CPU mapping, and the
> request is supposed to be allocated on a CPU in hctx->cpumask.
> 
> So use smp_call_function_any() to allocate the request on a CPU in
> hctx->cpumask for blk_mq_alloc_request_hctx().
> 
> Declare blk_mq_get_request() beforehand because the following patches
> reuse __blk_mq_alloc_request() for blk_mq_get_request().
> 
> Prepare for improving CPU hotplug support.

With your series applied the kernel instantly panics  when creating a
nvme-loop controller:

[   27.189993] nvmet: creating controller 1 for subsystem testnqn for NQN
hostnqn.
[   27.199370] nvme nvme0: creating 4 I/O queues.
[   27.202650] BUG: kernel NULL pointer dereference, address: 0000000000000128
[   27.205004] #PF: supervisor read access in kernel mode
[   27.206382] #PF: error_code(0x0000) - not-present page
[   27.207741] PGD 800000012dfc9067 P4D 800000012dfc9067 PUD 12dfae067 PMD 0 
[   27.209326] Oops: 0000 [#1] PREEMPT SMP PTI
[   27.210214] CPU: 2 PID: 3786 Comm: bash Not tainted 5.7.0-rc2+ #44
[   27.211511] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
1.13.0-1 04/01/2014
[   27.213626] RIP: 0010:smp_call_function_any+0x34/0xf0
[   27.214736] Code: 41 54 49 89 f4 55 48 89 fd bf 01 00 00 00 e8 33 23 f8 ff 48 c7 c7 3b 0b 12 d
[   27.218079] RSP: 0018:ffffc900003d3b90 EFLAGS: 00010202
[   27.219340] RAX: 0000000000000002 RBX: 0000000000000000 RCX: 0000000000000001
[   27.221199] RDX: 0000000000000000 RSI: ffffffff8198c920 RDI: ffffffff83120b3b
[   27.222517] RBP: 0000000000000128 R08: 0000000000000002 R09: 0000000000020022
[   27.223780] R10: 0000000000000001 R11: 0000000000000000 R12: ffffffff8198c920
[   27.225002] R13: ffffc900003d3bb8 R14: 0000000000000001 R15: ffff88812b488008
[   27.226195] FS:  00007fca23cae740(0000) GS:ffff88813bd00000(0000) knlGS:0000000000000000
[   27.227520] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   27.228378] CR2: 0000000000000128 CR3: 000000012eba6000 CR4: 00000000000006e0
[   27.229444] Call Trace:
[   27.229810]  blk_mq_alloc_request_hctx+0xe7/0x140
[   27.230487]  nvme_alloc_request+0x2d/0x70
[   27.231053]  __nvme_submit_sync_cmd+0x4a/0x1f0
[   27.231720]  ? mark_held_locks+0x49/0x70
[   27.232279]  ? __slab_alloc.isra.0.constprop.0+0x63/0x80
[   27.233038]  ? nvmf_connect_io_queue+0x85/0x180
[   27.233687]  nvmf_connect_io_queue+0x12d/0x180
[   27.234296]  ? cpumask_next_and+0x19/0x20
[   27.234848]  ? nvme_loop_connect_io_queues+0x4c/0x60
[   27.235718]  ? blk_mq_init_queue_data+0x36/0x60
[   27.236598]  nvme_loop_connect_io_queues+0x4c/0x60
[   27.237379]  nvme_loop_create_ctrl+0x2f0/0x450
[   27.238016]  nvmf_dev_write+0x7e3/0xb2f
[   27.238541]  ? find_held_lock+0x2b/0x80
[   27.239080]  ? do_user_addr_fault+0x205/0x480
[   27.239763]  vfs_write+0xb4/0x1a0
[   27.240302]  ksys_write+0x63/0xe0
[   27.240845]  do_syscall_64+0x4b/0x1e0
[   27.241446]  entry_SYSCALL_64_after_hwframe+0x49/0xb3
[   27.242253] RIP: 0033:0x7fca233a2134
[   27.242821] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 5
[   27.245798] RSP: 002b:00007ffe430d43e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[   27.247255] RAX: ffffffffffffffda RBX: 000000000000002b RCX: 00007fca233a2134
[   27.248525] RDX: 000000000000002b RSI: 000000000119f408 RDI: 0000000000000001
[   27.249806] RBP: 000000000119f408 R08: 000000000000000a R09: 00000000011d8988
[   27.251238] R10: 000000000000000a R11: 0000000000000246 R12: 00007fca2366f760
[   27.252195] R13: 000000000000002b R14: 00007fca2366a760 R15: 000000000000002b
[   27.253157] Modules linked in:
[   27.253603] CR2: 0000000000000128
[   27.254056] ---[ end trace 75ba575e2625a1c6 ]---
[   27.254684] RIP: 0010:smp_call_function_any+0x34/0xf0
[   27.255398] Code: 41 54 49 89 f4 55 48 89 fd bf 01 00 00 00 e8 33 23 f8 ff 48 c7 c7 3b 0b 12 d
[   27.258140] RSP: 0018:ffffc900003d3b90 EFLAGS: 00010202
[   27.258966] RAX: 0000000000000002 RBX: 0000000000000000 RCX: 0000000000000001
[   27.260075] RDX: 0000000000000000 RSI: ffffffff8198c920 RDI: ffffffff83120b3b
[   27.261166] RBP: 0000000000000128 R08: 0000000000000002 R09: 0000000000020022
[   27.262278] R10: 0000000000000001 R11: 0000000000000000 R12: ffffffff8198c920
[   27.263219] R13: ffffc900003d3bb8 R14: 0000000000000001 R15: ffff88812b488008
[   27.264139] FS:  00007fca23cae740(0000) GS:ffff88813bd00000(0000) knlGS:0000000000000000
[   27.265175] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   27.265893] CR2: 0000000000000128 CR3: 000000012eba6000 CR4: 00000000000006e0
[   27.266805] Kernel panic - not syncing: Fatal exception
[   27.267640] Kernel Offset: disabled
[   27.268087] ---[ end Kernel panic - not syncing: Fatal exception ]---



* Re: [PATCH 5/6] blk-mq: disable preempt during allocating request tag
  2020-05-15  1:41 ` [PATCH 5/6] blk-mq: disable preempt during allocating request tag Ming Lei
@ 2020-05-15 15:38   ` Christoph Hellwig
  2020-05-15 16:10     ` Christoph Hellwig
  0 siblings, 1 reply; 17+ messages in thread
From: Christoph Hellwig @ 2020-05-15 15:38 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-block, Bart Van Assche, Hannes Reinecke,
	Christoph Hellwig, Thomas Gleixner, John Garry

On Fri, May 15, 2020 at 09:41:52AM +0800, Ming Lei wrote:
> Disable preemption for a little while during request tag allocation, so
> the request's tag (internal tag) is always allocated on the CPU of
> data->ctx, to prepare for improved handling of CPU hotplug during
> request allocation.
> 
> In the following patch, hctx->state will be checked to see whether the
> hctx has become inactive; the inactive state is always set on the last
> online CPU of hctx->cpumask.

I like the idea, but I hate the interface.  I really think we need
to move assigning the ctx and hctx entirely into blk_mq_get_tag,
and then just unconditionally disable preemption in there.


* Re: [PATCH 0/6] blk-mq: improvement CPU hotplug(simplified version)
  2020-05-15  1:41 [PATCH 0/6] blk-mq: improvement CPU hotplug(simplified version) Ming Lei
                   ` (5 preceding siblings ...)
  2020-05-15  1:41 ` [PATCH 6/6] blk-mq: stop to allocate new requst and drain request before hctx becomes inactive Ming Lei
@ 2020-05-15 15:45 ` Christoph Hellwig
  2020-05-16 12:35 ` Christoph Hellwig
  7 siblings, 0 replies; 17+ messages in thread
From: Christoph Hellwig @ 2020-05-15 15:45 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-block, John Garry, Bart Van Assche,
	Hannes Reinecke, Christoph Hellwig, Thomas Gleixner

FYI, I think this could be significantly simplified by stashing
the input bio and output rq into struct blk_mq_alloc_data instead of
creating another structure.  Something like this untested patch on
top of the whole series shows where I'd like to get to:


diff --git a/block/blk-mq.c b/block/blk-mq.c
index bcc894faf4e36..52b8917ea7cb6 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -41,10 +41,6 @@
 #include "blk-mq-sched.h"
 #include "blk-rq-qos.h"
 
-static struct request *__blk_mq_get_request(struct request_queue *q,
-					    struct bio *bio,
-					    struct blk_mq_alloc_data *data);
-
 static void blk_mq_poll_stats_start(struct request_queue *q);
 static void blk_mq_poll_stats_fn(struct blk_stat_callback *cb);
 
@@ -274,8 +270,8 @@ static inline bool blk_mq_need_time_stamp(struct request *rq)
 	return (rq->rq_flags & (RQF_IO_STAT | RQF_STATS)) || rq->q->elevator;
 }
 
-static struct request *blk_mq_rq_ctx_init(struct blk_mq_alloc_data *data,
-		unsigned int tag, unsigned int op, u64 alloc_time_ns)
+static void blk_mq_rq_ctx_init(struct blk_mq_alloc_data *data, unsigned int tag,
+		unsigned int op, u64 alloc_time_ns)
 {
 	struct blk_mq_tags *tags = blk_mq_tags_from_data(data);
 	struct request *rq = tags->static_rqs[tag];
@@ -333,29 +329,13 @@ static struct request *blk_mq_rq_ctx_init(struct blk_mq_alloc_data *data,
 
 	data->ctx->rq_dispatched[op_is_sync(op)]++;
 	refcount_set(&rq->ref, 1);
-	return rq;
-}
-
-struct blk_mq_smp_call_info {
-	struct request_queue *q;
-	struct blk_mq_alloc_data *data;
-	struct request *rq;
-	struct bio *bio;
-};
-
-static void __blk_mq_alloc_request(void *alloc_info)
-{
-	struct blk_mq_smp_call_info *info = alloc_info;
-
-	info->rq = __blk_mq_get_request(info->q, info->bio, info->data);
+	data->rq = rq;
 }
 
-static struct request *__blk_mq_get_request(struct request_queue *q,
-					    struct bio *bio,
-					    struct blk_mq_alloc_data *data)
+static void __blk_mq_get_request(struct blk_mq_alloc_data *data)
 {
+	struct request_queue *q = data->q;
 	struct elevator_queue *e = q->elevator;
-	struct request *rq;
 	unsigned int tag;
 	u64 alloc_time_ns = 0;
 
@@ -365,8 +345,6 @@ static struct request *__blk_mq_get_request(struct request_queue *q,
 	if (blk_queue_rq_alloc_time(q))
 		alloc_time_ns = ktime_get_ns();
 
-	data->q = q;
-
 	WARN_ON_ONCE(data->ctx || data->hctx);
 
 	preempt_disable();
@@ -398,64 +376,61 @@ static struct request *__blk_mq_get_request(struct request_queue *q,
 		data->ctx = NULL;
 		data->hctx = NULL;
 		blk_queue_exit(q);
-		return NULL;
+		return;
 	}
 
-	rq = blk_mq_rq_ctx_init(data, tag, data->cmd_flags, alloc_time_ns);
+	blk_mq_rq_ctx_init(data, tag, data->cmd_flags, alloc_time_ns);
 	if (!op_is_flush(data->cmd_flags)) {
-		rq->elv.icq = NULL;
+		data->rq->elv.icq = NULL;
 		if (e && e->type->ops.prepare_request) {
 			if (e->type->icq_cache)
-				blk_mq_sched_assign_ioc(rq);
+				blk_mq_sched_assign_ioc(data->rq);
 
-			e->type->ops.prepare_request(rq, bio);
-			rq->rq_flags |= RQF_ELVPRIV;
+			e->type->ops.prepare_request(data->rq, data->bio);
+			data->rq->rq_flags |= RQF_ELVPRIV;
 		}
 	}
 	data->hctx->queued++;
-	return rq;
 }
 
-static struct request *blk_mq_get_request(struct request_queue *q,
-					  struct bio *bio,
-					  struct blk_mq_alloc_data *data)
+static void blk_mq_get_request_cb(void *data)
 {
-	int ret;
-	struct blk_mq_smp_call_info info;
-	struct request *rq = __blk_mq_get_request(q, bio, data);
-
-	if (rq)
-		return rq;
+	__blk_mq_get_request(data);
+}
 
-	if (data->flags & BLK_MQ_REQ_NOWAIT)
-		return NULL;
+static int blk_mq_get_request_cpumask(struct blk_mq_alloc_data *data,
+		const cpumask_t *mask)
+{
+	return smp_call_function_any(mask, blk_mq_get_request_cb, data, 1);
+}
 
+static struct request *blk_mq_get_request(struct blk_mq_alloc_data *data)
+{
 	/*
-	 * The allocation fails because of inactive hctx, so retry from
-	 * online cpus.
+	 * For !BLK_MQ_REQ_NOWAIT the allocation only fails due to an inactive
+	 * hctx, so retry from an online CPU.
 	 */
-	info.q = q;
-	info.bio = bio;
-	info.data = data;
-
-	ret = smp_call_function_any(cpu_online_mask, __blk_mq_alloc_request, &info, 1);
-	if (ret)
-		return ERR_PTR(ret);
-	return info.rq;
+	__blk_mq_get_request(data);
+	if (!data->rq && !(data->flags & BLK_MQ_REQ_NOWAIT))
+		blk_mq_get_request_cpumask(data, cpu_online_mask);
+	return data->rq;
 }
 
 struct request *blk_mq_alloc_request(struct request_queue *q, unsigned int op,
 		blk_mq_req_flags_t flags)
 {
-	struct blk_mq_alloc_data alloc_data = { .flags = flags, .cmd_flags = op };
+	struct blk_mq_alloc_data alloc_data = {
+		.q		= q,
+		.flags		= flags,
+		.cmd_flags	= op,
+	};
 	struct request *rq;
 	int ret;
 
 	ret = blk_queue_enter(q, flags);
 	if (ret)
 		return ERR_PTR(ret);
-
-	rq = blk_mq_get_request(q, NULL, &alloc_data);
+	rq = blk_mq_get_request(&alloc_data);
 	blk_queue_exit(q);
 
 	if (!rq)
@@ -469,10 +444,14 @@ struct request *blk_mq_alloc_request(struct request_queue *q, unsigned int op,
 EXPORT_SYMBOL(blk_mq_alloc_request);
 
 struct request *blk_mq_alloc_request_hctx(struct request_queue *q,
-	unsigned int op, blk_mq_req_flags_t flags, unsigned int hctx_idx)
+		unsigned int op, blk_mq_req_flags_t flags,
+		unsigned int hctx_idx)
 {
-	struct blk_mq_alloc_data alloc_data = { .flags = flags, .cmd_flags = op };
-	struct blk_mq_smp_call_info info = {.q = q, .data = &alloc_data};
+	struct blk_mq_alloc_data alloc_data = {
+		.q		= q,
+		.flags		= flags,
+		.cmd_flags	= op,
+	};
 	int ret;
 
 	/*
@@ -500,17 +479,9 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q,
 		return ERR_PTR(-EXDEV);
 	}
 
-	ret = smp_call_function_any(alloc_data.hctx->cpumask,
-			__blk_mq_alloc_request, &info, 1);
+	blk_mq_get_request_cpumask(&alloc_data, alloc_data.hctx->cpumask);
 	blk_queue_exit(q);
-
-	if (ret)
-		return ERR_PTR(ret);
-
-	if (!info.rq)
-		return ERR_PTR(-EWOULDBLOCK);
-
-	return info.rq;
+	return alloc_data.rq;
 }
 EXPORT_SYMBOL_GPL(blk_mq_alloc_request_hctx);
 
@@ -2065,7 +2036,9 @@ blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
 {
 	const int is_sync = op_is_sync(bio->bi_opf);
 	const int is_flush_fua = op_is_flush(bio->bi_opf);
-	struct blk_mq_alloc_data data = { .flags = 0};
+	struct blk_mq_alloc_data data = {
+		.q		= q,
+	};
 	struct request *rq;
 	struct blk_plug *plug;
 	struct request *same_queue_rq = NULL;
@@ -2088,8 +2061,9 @@ blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
 
 	rq_qos_throttle(q, bio);
 
+	data.bio = bio;
 	data.cmd_flags = bio->bi_opf;
-	rq = blk_mq_get_request(q, bio, &data);
+	rq = blk_mq_get_request(&data);
 	if (unlikely(!rq)) {
 		rq_qos_cleanup(q, bio);
 		if (bio->bi_opf & REQ_NOWAIT)
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 5ebc8d8d5a540..e07be009831e9 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -149,6 +149,7 @@ static inline struct blk_mq_ctx *blk_mq_get_ctx(struct request_queue *q)
 struct blk_mq_alloc_data {
 	/* input parameter */
 	struct request_queue *q;
+	struct bio *bio;
 	blk_mq_req_flags_t flags;
 	unsigned int shallow_depth;
 	unsigned int cmd_flags;
@@ -157,6 +158,9 @@ struct blk_mq_alloc_data {
 	/* input & output parameter */
 	struct blk_mq_ctx *ctx;
 	struct blk_mq_hw_ctx *hctx;
+
+	/* output parameter for __blk_mq_get_request */
+	struct request *rq;
 };
 
 static inline struct blk_mq_tags *blk_mq_tags_from_data(struct blk_mq_alloc_data *data)


* Re: [PATCH 5/6] blk-mq: disable preempt during allocating request tag
  2020-05-15 15:38   ` Christoph Hellwig
@ 2020-05-15 16:10     ` Christoph Hellwig
  0 siblings, 0 replies; 17+ messages in thread
From: Christoph Hellwig @ 2020-05-15 16:10 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-block, Bart Van Assche, Hannes Reinecke,
	Christoph Hellwig, Thomas Gleixner, John Garry

On Fri, May 15, 2020 at 05:38:47PM +0200, Christoph Hellwig wrote:
> On Fri, May 15, 2020 at 09:41:52AM +0800, Ming Lei wrote:
> > Disable preemption for a little while during request tag allocation, so
> > the request's tag (internal tag) is always allocated on the CPU of
> > data->ctx, to prepare for improved handling of CPU hotplug during
> > request allocation.
> > 
> > In the following patch, hctx->state will be checked to see whether the
> > hctx has become inactive; the inactive state is always set on the last
> > online CPU of hctx->cpumask.
> 
> I like the idea, but I hate the interface.  I really think we need
> to move assigning the ctx and hctx entirely into blk_mq_get_tag,
> and then just unconditionally disable preemption in there.

Maybe something vaguely like this (on top of the previous patch):


diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index c8e34c1f547fe..f75be6c431ac6 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -111,7 +111,7 @@ unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
 	if (data->flags & BLK_MQ_REQ_RESERVED) {
 		if (unlikely(!tags->nr_reserved_tags)) {
 			WARN_ON_ONCE(1);
-			return BLK_MQ_TAG_FAIL;
+			goto fail;
 		}
 		bt = &tags->breserved_tags;
 		tag_offset = 0;
@@ -120,12 +120,22 @@ unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
 		tag_offset = tags->nr_reserved_tags;
 	}
 
+	if (!data->driver_tag) {
+		preempt_disable();
+		data->ctx = blk_mq_get_ctx(data->q);
+		data->hctx = blk_mq_map_queue(data->q, data->cmd_flags,
+				data->ctx);
+	}
+
+	if (!(data->flags & BLK_MQ_REQ_INTERNAL))
+		data->shared = blk_mq_tag_busy(data->hctx);
+
 	tag = __blk_mq_get_tag(data, bt);
 	if (tag != -1)
 		goto found_tag;
 
 	if (data->flags & BLK_MQ_REQ_NOWAIT)
-		return BLK_MQ_TAG_FAIL;
+		goto fail;
 
 	ws = bt_wait_ptr(bt, data->hctx);
 	do {
@@ -152,14 +162,14 @@ unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
 		if (tag != -1)
 			break;
 
-		if (data->preempt_disabled)
+		if (!data->driver_tag)
 			preempt_enable();
 		bt_prev = bt;
 		io_schedule();
 
 		sbitmap_finish_wait(bt, ws, &wait);
 
-		if (data->preempt_disabled)
+		if (!data->driver_tag)
 			preempt_disable();
 		data->ctx = blk_mq_get_ctx(data->q);
 		data->hctx = blk_mq_map_queue(data->q, data->cmd_flags,
@@ -184,17 +194,20 @@ unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
 	sbitmap_finish_wait(bt, ws, &wait);
 
 found_tag:
-	/*
-	 * we are in request allocation, check if the current hctx is inactive.
-	 * If yes, give up this allocation. We will retry on another new online
-	 * CPU.
-	 */
-	if (data->preempt_disabled && unlikely(test_bit(BLK_MQ_S_INACTIVE,
-					&data->hctx->state))) {
-		blk_mq_put_tag(tags, data->ctx, tag + tag_offset);
-		return BLK_MQ_TAG_FAIL;
+	if (!data->driver_tag) {
+		/* fail allocations for inactive contexts. */
+		if (unlikely(test_bit(BLK_MQ_S_INACTIVE, &data->hctx->state)))
+			goto fail_free_tag;
+		preempt_enable();
 	}
 	return tag + tag_offset;
+
+fail_free_tag:
+	blk_mq_put_tag(tags, data->ctx, tag + tag_offset);
+fail:
+	data->ctx = NULL;
+	data->hctx = NULL;
+	return BLK_MQ_TAG_FAIL;
 }
 
 void blk_mq_put_tag(struct blk_mq_tags *tags, struct blk_mq_ctx *ctx,
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 52b8917ea7cb6..298557d00e081 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -347,10 +347,6 @@ static void __blk_mq_get_request(struct blk_mq_alloc_data *data)
 
 	WARN_ON_ONCE(data->ctx || data->hctx);
 
-	preempt_disable();
-	data->preempt_disabled = true;
-	data->ctx = blk_mq_get_ctx(q);
-	data->hctx = blk_mq_map_queue(q, data->cmd_flags, data->ctx);
 	if (data->cmd_flags & REQ_NOWAIT)
 		data->flags |= BLK_MQ_REQ_NOWAIT;
 
@@ -366,15 +362,10 @@ static void __blk_mq_get_request(struct blk_mq_alloc_data *data)
 		    e->type->ops.limit_depth &&
 		    !(data->flags & BLK_MQ_REQ_RESERVED))
 			e->type->ops.limit_depth(data->cmd_flags, data);
-	} else {
-		blk_mq_tag_busy(data->hctx);
 	}
 
 	tag = blk_mq_get_tag(data);
-	preempt_enable();
 	if (tag == BLK_MQ_TAG_FAIL) {
-		data->ctx = NULL;
-		data->hctx = NULL;
 		blk_queue_exit(q);
 		return;
 	}
@@ -1042,8 +1033,8 @@ bool blk_mq_get_driver_tag(struct request *rq)
 		.hctx = rq->mq_hctx,
 		.flags = BLK_MQ_REQ_NOWAIT,
 		.cmd_flags = rq->cmd_flags,
+		.driver_tag = true,
 	};
-	bool shared;
 
 	if (rq->tag != -1)
 		return true;
@@ -1051,10 +1042,9 @@ bool blk_mq_get_driver_tag(struct request *rq)
 	if (blk_mq_tag_is_reserved(data.hctx->sched_tags, rq->internal_tag))
 		data.flags |= BLK_MQ_REQ_RESERVED;
 
-	shared = blk_mq_tag_busy(data.hctx);
 	rq->tag = blk_mq_get_tag(&data);
 	if (rq->tag >= 0) {
-		if (shared) {
+		if (data.shared) {
 			rq->rq_flags |= RQF_MQ_INFLIGHT;
 			atomic_inc(&data.hctx->nr_active);
 		}
diff --git a/block/blk-mq.h b/block/blk-mq.h
index e07be009831e9..563d096ea0cf4 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -153,14 +153,15 @@ struct blk_mq_alloc_data {
 	blk_mq_req_flags_t flags;
 	unsigned int shallow_depth;
 	unsigned int cmd_flags;
-	bool preempt_disabled;
+	bool driver_tag;
 
 	/* input & output parameter */
 	struct blk_mq_ctx *ctx;
 	struct blk_mq_hw_ctx *hctx;
 
-	/* output parameter for __blk_mq_get_request */
+	/* output parameters */
 	struct request *rq;
+	bool shared;
 };
 
 static inline struct blk_mq_tags *blk_mq_tags_from_data(struct blk_mq_alloc_data *data)


* Re: [PATCH 0/6] blk-mq: improvement CPU hotplug(simplified version)
  2020-05-15  1:41 [PATCH 0/6] blk-mq: improvement CPU hotplug(simplified version) Ming Lei
                   ` (6 preceding siblings ...)
  2020-05-15 15:45 ` [PATCH 0/6] blk-mq: improvement CPU hotplug(simplified version) Christoph Hellwig
@ 2020-05-16 12:35 ` Christoph Hellwig
  2020-05-17  4:08   ` Ming Lei
  7 siblings, 1 reply; 17+ messages in thread
From: Christoph Hellwig @ 2020-05-16 12:35 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-block, John Garry, Bart Van Assche,
	Hannes Reinecke, Christoph Hellwig, Thomas Gleixner

I took a stab at the series this morning, and fixed the fabrics
crash (blk_mq_alloc_request_hctx passed the cpumask of a NULL hctx),
and pre-loaded a bunch of cleanups to let your changes fit in better.

Let me know what you think, the git branch is here:

    git://git.infradead.org/users/hch/block.git blk-mq-hotplug

Gitweb:

    http://git.infradead.org/users/hch/block.git/shortlog/refs/heads/blk-mq-hotplug



* Re: [PATCH 0/6] blk-mq: improvement CPU hotplug(simplified version)
  2020-05-16 12:35 ` Christoph Hellwig
@ 2020-05-17  4:08   ` Ming Lei
  2020-05-17  7:08     ` Christoph Hellwig
  0 siblings, 1 reply; 17+ messages in thread
From: Ming Lei @ 2020-05-17  4:08 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Ming Lei, Jens Axboe, linux-block, John Garry, Bart Van Assche,
	Hannes Reinecke, Thomas Gleixner

On Sat, May 16, 2020 at 8:37 PM Christoph Hellwig <hch@lst.de> wrote:
>
> I took a stab at the series this morning, and fixed the fabrics
> crash (blk_mq_alloc_request_hctx passed the cpumask of a NULL hctx),
> and pre-loaded a bunch of cleanups to let your changes fit in better.
>
> Let me know what you think, the git branch is here:
>
>     git://git.infradead.org/users/hch/block.git blk-mq-hotplug
>
> Gitweb:
>
>     http://git.infradead.org/users/hch/block.git/shortlog/refs/heads/blk-mq-hotplug
>

I think the approach is fine, especially the driver tag related
change, that was in my mind too, :-)

I guess you will post them all soon, and I will take a close look
after they are out.

Thanks,
Ming Lei


* Re: [PATCH 0/6] blk-mq: improvement CPU hotplug(simplified version)
  2020-05-17  4:08   ` Ming Lei
@ 2020-05-17  7:08     ` Christoph Hellwig
  2020-05-18  4:07       ` Ming Lei
  0 siblings, 1 reply; 17+ messages in thread
From: Christoph Hellwig @ 2020-05-17  7:08 UTC (permalink / raw)
  To: Ming Lei
  Cc: Christoph Hellwig, Ming Lei, Jens Axboe, linux-block, John Garry,
	Bart Van Assche, Hannes Reinecke, Thomas Gleixner

On Sun, May 17, 2020 at 12:08:40PM +0800, Ming Lei wrote:
> On Sat, May 16, 2020 at 8:37 PM Christoph Hellwig <hch@lst.de> wrote:
> >
> > I took a stab at the series this morning, and fixed the fabrics
> > crash (blk_mq_alloc_request_hctx passed the cpumask of a NULL hctx),
> > and pre-loaded a bunch of cleanups to let your changes fit in better.
> >
> > Let me know what you think, the git branch is here:
> >
> >     git://git.infradead.org/users/hch/block.git blk-mq-hotplug
> >
> > Gitweb:
> >
> >     http://git.infradead.org/users/hch/block.git/shortlog/refs/heads/blk-mq-hotplug
> >
> 
> I think the approach is fine, especially the driver tag related
> change, that was in my mind too, :-)
> 
> I guess you will post them all soon, and I will take a close look
> after they are out.

Let me know what you prefer - either I can send it out or you can
pick the series up.


* Re: [PATCH 0/6] blk-mq: improvement CPU hotplug(simplified version)
  2020-05-17  7:08     ` Christoph Hellwig
@ 2020-05-18  4:07       ` Ming Lei
  0 siblings, 0 replies; 17+ messages in thread
From: Ming Lei @ 2020-05-18  4:07 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Ming Lei, Jens Axboe, linux-block, John Garry, Bart Van Assche,
	Hannes Reinecke, Thomas Gleixner

On Sun, May 17, 2020 at 09:08:53AM +0200, Christoph Hellwig wrote:
> On Sun, May 17, 2020 at 12:08:40PM +0800, Ming Lei wrote:
> > On Sat, May 16, 2020 at 8:37 PM Christoph Hellwig <hch@lst.de> wrote:
> > >
> > > I took a stab at the series this morning, and fixed the fabrics
> > > crash (blk_mq_alloc_request_hctx passed the cpumask of a NULL hctx),
> > > and pre-loaded a bunch of cleanups to let your changes fit in better.
> > >
> > > Let me know what you think, the git branch is here:
> > >
> > >     git://git.infradead.org/users/hch/block.git blk-mq-hotplug
> > >
> > > Gitweb:
> > >
> > >     http://git.infradead.org/users/hch/block.git/shortlog/refs/heads/blk-mq-hotplug
> > >
> > 
> > I think the approach is fine, especially the driver tag related
> > change, that was in my mind too, :-)
> > 
> > I guess you will post them all soon, and I will take a close look
> > after they are out.
> 
> Let me know what you prefer - either I can send it out or you can
> pick the series up.
> 

I prefer that you send it out, because they depend on your other 5 posted
cleanup patches.

thanks,
Ming


