From: Ming Lei <ming.lei@redhat.com>
To: Jens Axboe <axboe@kernel.dk>, Christoph Hellwig <hch@lst.de>,
	"Martin K . Petersen" <martin.petersen@oracle.com>,
	linux-block@vger.kernel.org, linux-nvme@lists.infradead.org,
	linux-scsi@vger.kernel.org
Cc: Sagi Grimberg <sagi@grimberg.me>, Daniel Wagner <dwagner@suse.de>,
	Wen Xiong <wenxiong@us.ibm.com>,
	John Garry <john.garry@huawei.com>,
	Hannes Reinecke <hare@suse.de>, Keith Busch <kbusch@kernel.org>,
	Damien Le Moal <damien.lemoal@wdc.com>,
	Ming Lei <ming.lei@redhat.com>
Subject: [PATCH V2 1/6] blk-mq: prepare for not deactivating hctx if managed irq isn't used
Date: Fri,  2 Jul 2021 23:05:50 +0800
Message-ID: <20210702150555.2401722-2-ming.lei@redhat.com>
In-Reply-To: <20210702150555.2401722-1-ming.lei@redhat.com>

blk-mq deactivates a hctx when the last CPU in hctx->cpumask goes
offline, by draining all requests originating from this hctx and moving
new allocations to other active hctxs. This is done to avoid in-flight IO
when managed irqs are used, because a managed irq is shut down once the
last CPU in the irq's affinity mask goes offline.
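
For reference, the existing hotplug path only drains a hctx when the CPU
going offline is the last online CPU in its cpumask. A simplified sketch
of that check (abbreviated from the blk_mq_last_cpu_in_hctx() helper in
block/blk-mq.c; treat the details as illustrative):

	static bool blk_mq_last_cpu_in_hctx(unsigned int cpu,
					    struct blk_mq_hw_ctx *hctx)
	{
		/* Is there an online CPU with a smaller index in the mask? */
		if (cpumask_next_and(-1, hctx->cpumask, cpu_online_mask) != cpu)
			return false;
		/* No online CPU with a larger index may remain either. */
		return cpumask_next_and(cpu, hctx->cpumask,
					cpu_online_mask) >= nr_cpu_ids;
	}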

However, many drivers (nvme fc, rdma, tcp, loop, ...) don't use managed
irqs, so they don't need to deactivate a hctx when its last CPU goes
offline. Some of them are also the only users of
blk_mq_alloc_request_hctx(), which is used for connecting an IO queue.
Their requirement is that the connect request must be submitted
successfully via one specific hctx even though all CPUs in that
hctx->cpumask have gone offline.
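
For illustration, a minimal sketch of such a caller, loosely modeled on
the nvme fabrics connect path (the queue/qid names here are assumptions,
not the exact driver code):

	struct request *req;

	/*
	 * Allocate the connect command on hw queue 'qid - 1'.  This has
	 * to succeed even if every CPU in that hctx's cpumask is offline,
	 * which is only sane because these drivers don't use managed irqs.
	 */
	req = blk_mq_alloc_request_hctx(ctrl->connect_q, REQ_OP_DRV_OUT,
					BLK_MQ_REQ_NOWAIT, qid - 1);
	if (IS_ERR(req))
		return PTR_ERR(req);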

Prepare for addressing this requirement for nvme fc/rdma/loop by adding
BLK_MQ_F_MANAGED_IRQ, so that hctxs are only deactivated when managed
irqs are used. Consequently, any driver that uses managed irqs has to
tell blk-mq by setting BLK_MQ_F_MANAGED_IRQ.
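
A driver backed by managed irqs would then advertise that while setting
up its tag set, along the lines of the following sketch (patch 2/6 does
this for nvme-pci; 'set' is a hypothetical tag set pointer):

	/*
	 * Tell blk-mq that this driver's hw queues are backed by managed
	 * irqs, so hctxs still have to be drained when their last CPU
	 * goes offline.
	 */
	set->flags |= BLK_MQ_F_MANAGED_IRQ;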

Meanwhile, blk-mq's CPU hotplug handling can be optimized a bit when
managed irqs aren't used.
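
A rough sketch of that optimization, which patch 6/6 of this series
implements (simplified here, so take the exact shape as an assumption):

	static int blk_mq_hctx_notify_offline(unsigned int cpu,
					      struct hlist_node *node)
	{
		struct blk_mq_hw_ctx *hctx = hlist_entry_safe(node,
				struct blk_mq_hw_ctx, cpuhp_online);

		/*
		 * Without a managed irq, the queue keeps working after
		 * the last CPU in hctx->cpumask goes offline, so there
		 * is nothing to drain.
		 */
		if (!(hctx->flags & BLK_MQ_F_MANAGED_IRQ))
			return 0;

		/* ... existing request draining logic ... */
		return 0;
	}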

Given blk_mq_alloc_request_hctx() is only called by drivers that don't
set BLK_MQ_F_MANAGED_IRQ, it is safe to fall back to an offline CPU when
picking the sw context.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq-debugfs.c |  1 +
 block/blk-mq.c         | 23 +++++++++++++----------
 include/linux/blk-mq.h |  1 +
 3 files changed, 15 insertions(+), 10 deletions(-)

diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index 4b66d2776eda..17f57af3a4d6 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -247,6 +247,7 @@ static const char *const hctx_flag_name[] = {
 	HCTX_FLAG_NAME(NO_SCHED),
 	HCTX_FLAG_NAME(STACKING),
 	HCTX_FLAG_NAME(TAG_HCTX_SHARED),
+	HCTX_FLAG_NAME(MANAGED_IRQ),
 };
 #undef HCTX_FLAG_NAME
 
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 2e9fd0ec63d7..1d45d2922ca7 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -427,6 +427,15 @@ struct request *blk_mq_alloc_request(struct request_queue *q, unsigned int op,
 }
 EXPORT_SYMBOL(blk_mq_alloc_request);
 
+static inline int blk_mq_first_mapped_cpu(struct blk_mq_hw_ctx *hctx)
+{
+	int cpu = cpumask_first_and(hctx->cpumask, cpu_online_mask);
+
+	if (cpu >= nr_cpu_ids)
+		cpu = cpumask_first(hctx->cpumask);
+	return cpu;
+}
+
 struct request *blk_mq_alloc_request_hctx(struct request_queue *q,
 	unsigned int op, blk_mq_req_flags_t flags, unsigned int hctx_idx)
 {
@@ -468,7 +477,10 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q,
 	data.hctx = q->queue_hw_ctx[hctx_idx];
 	if (!blk_mq_hw_queue_mapped(data.hctx))
 		goto out_queue_exit;
-	cpu = cpumask_first_and(data.hctx->cpumask, cpu_online_mask);
+
+	WARN_ON_ONCE(data.hctx->flags & BLK_MQ_F_MANAGED_IRQ);
+
+	cpu = blk_mq_first_mapped_cpu(data.hctx);
 	data.ctx = __blk_mq_get_ctx(q, cpu);
 
 	if (!q->elevator)
@@ -1501,15 +1513,6 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
 	hctx_unlock(hctx, srcu_idx);
 }
 
-static inline int blk_mq_first_mapped_cpu(struct blk_mq_hw_ctx *hctx)
-{
-	int cpu = cpumask_first_and(hctx->cpumask, cpu_online_mask);
-
-	if (cpu >= nr_cpu_ids)
-		cpu = cpumask_first(hctx->cpumask);
-	return cpu;
-}
-
 /*
  * It'd be great if the workqueue API had a way to pass
  * in a mask and had some smarts for more clever placement.
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index fd2de2b422ed..62fc0393cc3a 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -403,6 +403,7 @@ enum {
 	 */
 	BLK_MQ_F_STACKING	= 1 << 2,
 	BLK_MQ_F_TAG_HCTX_SHARED = 1 << 3,
+	BLK_MQ_F_MANAGED_IRQ	= 1 << 4,
 	BLK_MQ_F_BLOCKING	= 1 << 5,
 	BLK_MQ_F_NO_SCHED	= 1 << 6,
 	BLK_MQ_F_ALLOC_POLICY_START_BIT = 8,
-- 
2.31.1


Thread overview: 58+ messages
2021-07-02 15:05 [PATCH V2 0/6] blk-mq: fix blk_mq_alloc_request_hctx Ming Lei
2021-07-02 15:05 ` [PATCH V2 1/6] blk-mq: prepare for not deactivating hctx if managed irq isn't used Ming Lei [this message]
2021-07-02 15:05 ` [PATCH V2 2/6] nvme: pci: pass BLK_MQ_F_MANAGED_IRQ to blk-mq Ming Lei
2021-07-02 15:05 ` [PATCH V2 3/6] scsi: add flag of .use_managed_irq to 'struct Scsi_Host' Ming Lei
2021-07-05  9:25   ` John Garry
2021-07-05  9:55     ` Ming Lei
2021-07-06  5:37       ` Christoph Hellwig
2021-07-06  7:41         ` Ming Lei
2021-07-06 10:32           ` Hannes Reinecke
2021-07-07 10:53             ` Ming Lei
2021-07-02 15:05 ` [PATCH V2 4/6] scsi: set shost->use_managed_irq if driver uses managed irq Ming Lei
2021-07-05  7:35   ` John Garry
2021-07-06  5:38   ` Christoph Hellwig
2021-07-02 15:05 ` [PATCH V2 5/6] virtio: add one field into virtio_device for recording if device uses managed irq Ming Lei
2021-07-02 15:55   ` Michael S. Tsirkin
2021-07-05  2:48     ` Ming Lei
2021-07-05  3:59   ` Jason Wang
2021-07-06  5:42   ` Christoph Hellwig
2021-07-06  7:53     ` Ming Lei
2021-07-07  9:06     ` Thomas Gleixner
2021-07-07  9:42       ` Ming Lei
2021-07-07 14:05         ` Christoph Hellwig
2021-07-08  6:34           ` Ming Lei
2021-07-02 15:05 ` [PATCH V2 6/6] blk-mq: don't deactivate hctx if managed irq isn't used Ming Lei