* [PATCH V5 1/3] driver core: add device_has_managed_msi_irq
2021-07-21 9:17 [PATCH V5 0/3] blk-mq: fix blk_mq_alloc_request_hctx Ming Lei
@ 2021-07-21 9:17 ` Ming Lei
2021-07-21 13:48 ` Greg Kroah-Hartman
2021-07-21 14:36 ` Christoph Hellwig
2021-07-21 9:17 ` [PATCH V5 2/3] blk-mq: mark if one queue map uses managed irq Ming Lei
2021-07-21 9:17 ` [PATCH V5 3/3] blk-mq: don't deactivate hctx if managed irq isn't used Ming Lei
2 siblings, 2 replies; 11+ messages in thread
From: Ming Lei @ 2021-07-21 9:17 UTC (permalink / raw)
To: Jens Axboe, Christoph Hellwig, linux-block, Greg Kroah-Hartman
Cc: Thomas Gleixner, John Garry, Sagi Grimberg, Daniel Wagner,
Wen Xiong, Hannes Reinecke, Ming Lei
Drivers may allocate irq vectors with managed affinity, and blk-mq needs
this information for draining a queue because the genirq core shuts down
a managed irq when all CPUs in its affinity mask become offline.
The information about managed irq use is produced by drivers and consumed
by blk-mq, so different subsystems are involved in this flow.
Address this by adding a helper, device_has_managed_msi_irq(), as
suggested by John Garry.
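[Editor's note: for readers outside the kernel tree, the check the helper
performs can be modeled in plain userspace C. This is a sketch with
simplified stand-in structs, not the kernel's real msi_desc/device
definitions; only the loop's logic mirrors the patch.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures (sketch only). */
struct irq_affinity_desc {
	bool is_managed;                     /* set when genirq manages the affinity */
};

struct msi_desc {
	struct irq_affinity_desc *affinity;  /* NULL when no affinity assigned */
	struct msi_desc *next;               /* next entry in the device's MSI list */
};

struct device {
	struct msi_desc *msi_list;           /* head of the MSI descriptor list */
};

/* Mirrors the helper's loop: true iff any MSI descriptor of the
 * device carries managed affinity. */
static bool device_has_managed_msi_irq(struct device *dev)
{
	struct msi_desc *desc;

	for (desc = dev->msi_list; desc; desc = desc->next) {
		if (desc->affinity && desc->affinity->is_managed)
			return true;
	}
	return false;
}
```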
Suggested-by: John Garry <john.garry@huawei.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
drivers/base/core.c | 15 +++++++++++++++
include/linux/device.h | 2 ++
2 files changed, 17 insertions(+)
diff --git a/drivers/base/core.c b/drivers/base/core.c
index cadcade65825..41daf9fabdfb 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -29,6 +29,7 @@
#include <linux/sched/mm.h>
#include <linux/sysfs.h>
#include <linux/dma-map-ops.h> /* for dma_default_coherent */
+#include <linux/msi.h> /* for device_has_managed_msi_irq */
#include "base.h"
#include "power/power.h"
@@ -4797,3 +4798,17 @@ int device_match_any(struct device *dev, const void *unused)
return 1;
}
EXPORT_SYMBOL_GPL(device_match_any);
+
+bool device_has_managed_msi_irq(struct device *dev)
+{
+ struct msi_desc *desc;
+
+ if (IS_ENABLED(CONFIG_GENERIC_MSI_IRQ)) {
+ for_each_msi_entry(desc, dev) {
+ if (desc->affinity && desc->affinity->is_managed)
+ return true;
+ }
+ }
+ return false;
+}
+EXPORT_SYMBOL_GPL(device_has_managed_msi_irq);
diff --git a/include/linux/device.h b/include/linux/device.h
index 59940f1744c1..b1255524ce8b 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -981,4 +981,6 @@ extern long sysfs_deprecated;
#define sysfs_deprecated 0
#endif
+bool device_has_managed_msi_irq(struct device *dev);
+
#endif /* _DEVICE_H_ */
--
2.31.1
^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH V5 1/3] driver core: add device_has_managed_msi_irq
2021-07-21 9:17 ` [PATCH V5 1/3] driver core: add device_has_managed_msi_irq Ming Lei
@ 2021-07-21 13:48 ` Greg Kroah-Hartman
2021-07-21 14:36 ` Christoph Hellwig
1 sibling, 0 replies; 11+ messages in thread
From: Greg Kroah-Hartman @ 2021-07-21 13:48 UTC (permalink / raw)
To: Ming Lei
Cc: Jens Axboe, Christoph Hellwig, linux-block, Thomas Gleixner,
John Garry, Sagi Grimberg, Daniel Wagner, Wen Xiong,
Hannes Reinecke
On Wed, Jul 21, 2021 at 05:17:21PM +0800, Ming Lei wrote:
> Drivers may allocate irq vectors with managed affinity, and blk-mq needs
> this information for draining a queue because the genirq core shuts down
> a managed irq when all CPUs in its affinity mask become offline.
>
> The information about managed irq use is produced by drivers and consumed
> by blk-mq, so different subsystems are involved in this flow.
>
> Address this by adding a helper, device_has_managed_msi_irq(), as
> suggested by John Garry.
>
> Suggested-by: John Garry <john.garry@huawei.com>
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
> drivers/base/core.c | 15 +++++++++++++++
> include/linux/device.h | 2 ++
> 2 files changed, 17 insertions(+)
>
> diff --git a/drivers/base/core.c b/drivers/base/core.c
> index cadcade65825..41daf9fabdfb 100644
> --- a/drivers/base/core.c
> +++ b/drivers/base/core.c
> @@ -29,6 +29,7 @@
> #include <linux/sched/mm.h>
> #include <linux/sysfs.h>
> #include <linux/dma-map-ops.h> /* for dma_default_coherent */
> +#include <linux/msi.h> /* for device_has_managed_msi_irq */
>
> #include "base.h"
> #include "power/power.h"
> @@ -4797,3 +4798,17 @@ int device_match_any(struct device *dev, const void *unused)
> return 1;
> }
> EXPORT_SYMBOL_GPL(device_match_any);
> +
> +bool device_has_managed_msi_irq(struct device *dev)
> +{
> + struct msi_desc *desc;
> +
> + if (IS_ENABLED(CONFIG_GENERIC_MSI_IRQ)) {
> + for_each_msi_entry(desc, dev) {
> + if (desc->affinity && desc->affinity->is_managed)
> + return true;
> + }
> + }
> + return false;
> +}
> +EXPORT_SYMBOL_GPL(device_has_managed_msi_irq);
> diff --git a/include/linux/device.h b/include/linux/device.h
> index 59940f1744c1..b1255524ce8b 100644
> --- a/include/linux/device.h
> +++ b/include/linux/device.h
> @@ -981,4 +981,6 @@ extern long sysfs_deprecated;
> #define sysfs_deprecated 0
> #endif
>
> +bool device_has_managed_msi_irq(struct device *dev);
This really belongs in msi.h, not device.h, as it is very MSI specific.
Same for the .c change.
thanks,
greg k-h
* Re: [PATCH V5 1/3] driver core: add device_has_managed_msi_irq
2021-07-21 9:17 ` [PATCH V5 1/3] driver core: add device_has_managed_msi_irq Ming Lei
2021-07-21 13:48 ` Greg Kroah-Hartman
@ 2021-07-21 14:36 ` Christoph Hellwig
1 sibling, 0 replies; 11+ messages in thread
From: Christoph Hellwig @ 2021-07-21 14:36 UTC (permalink / raw)
To: Ming Lei
Cc: Jens Axboe, Christoph Hellwig, linux-block, Greg Kroah-Hartman,
Thomas Gleixner, John Garry, Sagi Grimberg, Daniel Wagner,
Wen Xiong, Hannes Reinecke
On Wed, Jul 21, 2021 at 05:17:21PM +0800, Ming Lei wrote:
> +bool device_has_managed_msi_irq(struct device *dev)
> +{
> + struct msi_desc *desc;
> +
> + if (IS_ENABLED(CONFIG_GENERIC_MSI_IRQ)) {
An inline stub for the !CONFIG_GENERIC_MSI_IRQ case would seem nicer
so that we don't even need the call.
Also please add a kerneldoc comment, and as Greg said this probably
doesn't belong in drivers/base and device.h but somewhere in the
irq layer. Maybe Thomas has a preference.
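[Editor's note: the stub being suggested follows the usual kernel pattern
of pairing a real declaration with a static inline fallback behind a
config option. A sketch follows; the header placement and CONFIG layout
are illustrative, not the final patch.]

```c
#include <assert.h>
#include <stdbool.h>

struct device;	/* opaque here; the real definition lives in <linux/device.h> */

#ifdef CONFIG_GENERIC_MSI_IRQ
bool device_has_managed_msi_irq(struct device *dev);
#else
/* Without generic MSI support there can be no managed MSI irqs, so
 * callers get a compile-time constant and no out-of-line call. */
static inline bool device_has_managed_msi_irq(struct device *dev)
{
	return false;
}
#endif
```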
* [PATCH V5 2/3] blk-mq: mark if one queue map uses managed irq
2021-07-21 9:17 [PATCH V5 0/3] blk-mq: fix blk_mq_alloc_request_hctx Ming Lei
2021-07-21 9:17 ` [PATCH V5 1/3] driver core: add device_has_managed_msi_irq Ming Lei
@ 2021-07-21 9:17 ` Ming Lei
2021-07-21 14:37 ` Christoph Hellwig
2021-07-21 18:53 ` John Garry
2021-07-21 9:17 ` [PATCH V5 3/3] blk-mq: don't deactivate hctx if managed irq isn't used Ming Lei
2 siblings, 2 replies; 11+ messages in thread
From: Ming Lei @ 2021-07-21 9:17 UTC (permalink / raw)
To: Jens Axboe, Christoph Hellwig, linux-block, Greg Kroah-Hartman
Cc: Thomas Gleixner, John Garry, Sagi Grimberg, Daniel Wagner,
Wen Xiong, Hannes Reinecke, Ming Lei
Retrieve this information via the newly added helper
device_has_managed_msi_irq(), so that blk-mq can decide whether a hctx
needs to be drained before all of its CPUs become offline.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
block/blk-mq-pci.c | 1 +
block/blk-mq-rdma.c | 3 +++
block/blk-mq-virtio.c | 1 +
drivers/scsi/hisi_sas/hisi_sas_v2_hw.c | 1 +
include/linux/blk-mq.h | 3 ++-
5 files changed, 8 insertions(+), 1 deletion(-)
diff --git a/block/blk-mq-pci.c b/block/blk-mq-pci.c
index b595a94c4d16..7011854a562a 100644
--- a/block/blk-mq-pci.c
+++ b/block/blk-mq-pci.c
@@ -37,6 +37,7 @@ int blk_mq_pci_map_queues(struct blk_mq_queue_map *qmap, struct pci_dev *pdev,
for_each_cpu(cpu, mask)
qmap->mq_map[cpu] = qmap->queue_offset + queue;
}
+ qmap->use_managed_irq = device_has_managed_msi_irq(&pdev->dev);
return 0;
diff --git a/block/blk-mq-rdma.c b/block/blk-mq-rdma.c
index 14f968e58b8f..7b10d8bd2a37 100644
--- a/block/blk-mq-rdma.c
+++ b/block/blk-mq-rdma.c
@@ -36,6 +36,9 @@ int blk_mq_rdma_map_queues(struct blk_mq_queue_map *map,
map->mq_map[cpu] = map->queue_offset + queue;
}
+ /* So far RDMA doesn't use managed irq */
+ map->use_managed_irq = false;
+
return 0;
fallback:
diff --git a/block/blk-mq-virtio.c b/block/blk-mq-virtio.c
index 7b8a42c35102..bea91ac5996d 100644
--- a/block/blk-mq-virtio.c
+++ b/block/blk-mq-virtio.c
@@ -38,6 +38,7 @@ int blk_mq_virtio_map_queues(struct blk_mq_queue_map *qmap,
for_each_cpu(cpu, mask)
qmap->mq_map[cpu] = qmap->queue_offset + queue;
}
+ qmap->use_managed_irq = device_has_managed_msi_irq(&vdev->dev);
return 0;
fallback:
diff --git a/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c b/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c
index b0b2361e63fe..7d7df261d346 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c
@@ -3562,6 +3562,7 @@ static int map_queues_v2_hw(struct Scsi_Host *shost)
for_each_cpu(cpu, mask)
qmap->mq_map[cpu] = qmap->queue_offset + queue;
}
+ qmap->use_managed_irq = true;
return 0;
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 1d18447ebebc..d54a795ec971 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -192,7 +192,8 @@ struct blk_mq_hw_ctx {
struct blk_mq_queue_map {
unsigned int *mq_map;
unsigned int nr_queues;
- unsigned int queue_offset;
+ unsigned int queue_offset:31;
+ unsigned int use_managed_irq:1;
};
/**
--
2.31.1
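[Editor's note: the blk-mq.h hunk above steals one bit from queue_offset
rather than growing the struct; on common ABIs the two adjacent unsigned
int bitfields pack into a single storage unit, so the struct size is
unchanged. A userspace sketch, with the field types copied from the hunk:]

```c
#include <assert.h>

/* Layout before the patch. */
struct queue_map_old {
	unsigned int *mq_map;
	unsigned int nr_queues;
	unsigned int queue_offset;
};

/* Layout after the patch: queue_offset gives up its top bit to the
 * new flag, and both bitfields share one unsigned int. */
struct queue_map_new {
	unsigned int *mq_map;
	unsigned int nr_queues;
	unsigned int queue_offset:31;
	unsigned int use_managed_irq:1;
};
```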
* Re: [PATCH V5 2/3] blk-mq: mark if one queue map uses managed irq
2021-07-21 9:17 ` [PATCH V5 2/3] blk-mq: mark if one queue map uses managed irq Ming Lei
@ 2021-07-21 14:37 ` Christoph Hellwig
2021-07-22 1:24 ` Ming Lei
2021-07-21 18:53 ` John Garry
1 sibling, 1 reply; 11+ messages in thread
From: Christoph Hellwig @ 2021-07-21 14:37 UTC (permalink / raw)
To: Ming Lei
Cc: Jens Axboe, Christoph Hellwig, linux-block, Greg Kroah-Hartman,
Thomas Gleixner, John Garry, Sagi Grimberg, Daniel Wagner,
Wen Xiong, Hannes Reinecke
On Wed, Jul 21, 2021 at 05:17:22PM +0800, Ming Lei wrote:
> + /* So far RDMA doesn't use managed irq */
> + map->use_managed_irq = false;
It certainly did when I wrote this code. The comment should be
something like "sigh, someone fucked up the rdma queue mapping. No
managed irqs for now".
Otherwise looks fine:
Reviewed-by: Christoph Hellwig <hch@lst.de>
* Re: [PATCH V5 2/3] blk-mq: mark if one queue map uses managed irq
2021-07-21 14:37 ` Christoph Hellwig
@ 2021-07-22 1:24 ` Ming Lei
0 siblings, 0 replies; 11+ messages in thread
From: Ming Lei @ 2021-07-22 1:24 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Jens Axboe, linux-block, Greg Kroah-Hartman, Thomas Gleixner,
John Garry, Sagi Grimberg, Daniel Wagner, Wen Xiong,
Hannes Reinecke
On Wed, Jul 21, 2021 at 04:37:16PM +0200, Christoph Hellwig wrote:
> On Wed, Jul 21, 2021 at 05:17:22PM +0800, Ming Lei wrote:
> > + /* So far RDMA doesn't use managed irq */
> > + map->use_managed_irq = false;
>
> It certainly did when I wrote this code. The comment should be
> something "sigh, someone fucked up the rdma queue mapping. No managed
> irqs for now".
But if the rdma queue mapping uses managed irqs, the issue in
blk_mq_alloc_request_hctx() can be hard to fix, :-(
Thanks,
Ming
* Re: [PATCH V5 2/3] blk-mq: mark if one queue map uses managed irq
2021-07-21 9:17 ` [PATCH V5 2/3] blk-mq: mark if one queue map uses managed irq Ming Lei
2021-07-21 14:37 ` Christoph Hellwig
@ 2021-07-21 18:53 ` John Garry
2021-07-22 1:22 ` Ming Lei
1 sibling, 1 reply; 11+ messages in thread
From: John Garry @ 2021-07-21 18:53 UTC (permalink / raw)
To: Ming Lei, Jens Axboe, Christoph Hellwig, linux-block, Greg Kroah-Hartman
Cc: Thomas Gleixner, Sagi Grimberg, Daniel Wagner, Wen Xiong,
Hannes Reinecke
On 21/07/2021 10:17, Ming Lei wrote:
FWIW,
Reviewed-by: John Garry <john.garry@huawei.com>
> diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
> index 1d18447ebebc..d54a795ec971 100644
> --- a/include/linux/blk-mq.h
> +++ b/include/linux/blk-mq.h
> @@ -192,7 +192,8 @@ struct blk_mq_hw_ctx {
> struct blk_mq_queue_map {
> unsigned int *mq_map;
> unsigned int nr_queues;
> - unsigned int queue_offset;
> + unsigned int queue_offset:31;
> + unsigned int use_managed_irq:1;
late nit: I'd be inclined to call this "drain_hw_queue" or similar,
which is what it means to blk-mq
> };
Thanks
* Re: [PATCH V5 2/3] blk-mq: mark if one queue map uses managed irq
2021-07-21 18:53 ` John Garry
@ 2021-07-22 1:22 ` Ming Lei
0 siblings, 0 replies; 11+ messages in thread
From: Ming Lei @ 2021-07-22 1:22 UTC (permalink / raw)
To: John Garry
Cc: Jens Axboe, Christoph Hellwig, linux-block, Greg Kroah-Hartman,
Thomas Gleixner, Sagi Grimberg, Daniel Wagner, Wen Xiong,
Hannes Reinecke
On Wed, Jul 21, 2021 at 07:53:22PM +0100, John Garry wrote:
> On 21/07/2021 10:17, Ming Lei wrote:
>
> FWIW,
>
> Reviewed-by: John Garry <john.garry@huawei.com>
>
> > diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
> > index 1d18447ebebc..d54a795ec971 100644
> > --- a/include/linux/blk-mq.h
> > +++ b/include/linux/blk-mq.h
> > @@ -192,7 +192,8 @@ struct blk_mq_hw_ctx {
> > struct blk_mq_queue_map {
> > unsigned int *mq_map;
> > unsigned int nr_queues;
> > - unsigned int queue_offset;
> > + unsigned int queue_offset:31;
> > + unsigned int use_managed_irq:1;
>
> late nit: I'd be inclined to call this "drain_hw_queue" or similar, which is
> what it means to blk-mq
But 'drain_hw_queue' isn't straightforward when it is checked in
blk_mq_alloc_request_hctx().
Thanks,
Ming
* [PATCH V5 3/3] blk-mq: don't deactivate hctx if managed irq isn't used
2021-07-21 9:17 [PATCH V5 0/3] blk-mq: fix blk_mq_alloc_request_hctx Ming Lei
2021-07-21 9:17 ` [PATCH V5 1/3] driver core: add device_has_managed_msi_irq Ming Lei
2021-07-21 9:17 ` [PATCH V5 2/3] blk-mq: mark if one queue map uses managed irq Ming Lei
@ 2021-07-21 9:17 ` Ming Lei
2021-07-21 14:37 ` Christoph Hellwig
2 siblings, 1 reply; 11+ messages in thread
From: Ming Lei @ 2021-07-21 9:17 UTC (permalink / raw)
To: Jens Axboe, Christoph Hellwig, linux-block, Greg Kroah-Hartman
Cc: Thomas Gleixner, John Garry, Sagi Grimberg, Daniel Wagner,
Wen Xiong, Hannes Reinecke, Ming Lei
blk-mq deactivates a hctx when the last CPU in hctx->cpumask becomes
offline, by draining all requests originating from this hctx and moving
new allocations to other active hctxs. This avoids inflight IO when
managed irqs are used, because a managed irq is shut down when the last
CPU in the irq's affinity mask becomes offline.
However, lots of drivers (nvme fc, rdma, tcp, loop, ...) don't use
managed irqs, so they needn't deactivate a hctx when its last CPU
becomes offline.
Also, some of them are the only users of blk_mq_alloc_request_hctx(),
which is used for connecting io queues. Their requirement is that the
connect request must be submitted successfully via one specified hctx
even though all CPUs in that hctx->cpumask have become offline.
Address the requirement for nvme fc/rdma/loop by allowing a request to
be allocated from a hctx when all CPUs in this hctx are offline, since
these drivers don't use managed irqs.
Finally, don't deactivate a hctx when it doesn't use a managed irq.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
block/blk-mq.c | 27 +++++++++++++++++----------
block/blk-mq.h | 8 ++++++++
2 files changed, 25 insertions(+), 10 deletions(-)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 2c4ac51e54eb..591ab07c64d8 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -427,6 +427,15 @@ struct request *blk_mq_alloc_request(struct request_queue *q, unsigned int op,
}
EXPORT_SYMBOL(blk_mq_alloc_request);
+static inline int blk_mq_first_mapped_cpu(struct blk_mq_hw_ctx *hctx)
+{
+ int cpu = cpumask_first_and(hctx->cpumask, cpu_online_mask);
+
+ if (cpu >= nr_cpu_ids)
+ cpu = cpumask_first(hctx->cpumask);
+ return cpu;
+}
+
struct request *blk_mq_alloc_request_hctx(struct request_queue *q,
unsigned int op, blk_mq_req_flags_t flags, unsigned int hctx_idx)
{
@@ -468,7 +477,10 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q,
data.hctx = q->queue_hw_ctx[hctx_idx];
if (!blk_mq_hw_queue_mapped(data.hctx))
goto out_queue_exit;
- cpu = cpumask_first_and(data.hctx->cpumask, cpu_online_mask);
+
+ WARN_ON_ONCE(blk_mq_hctx_use_managed_irq(data.hctx));
+
+ cpu = blk_mq_first_mapped_cpu(data.hctx);
data.ctx = __blk_mq_get_ctx(q, cpu);
if (!q->elevator)
@@ -1501,15 +1513,6 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
hctx_unlock(hctx, srcu_idx);
}
-static inline int blk_mq_first_mapped_cpu(struct blk_mq_hw_ctx *hctx)
-{
- int cpu = cpumask_first_and(hctx->cpumask, cpu_online_mask);
-
- if (cpu >= nr_cpu_ids)
- cpu = cpumask_first(hctx->cpumask);
- return cpu;
-}
-
/*
* It'd be great if the workqueue API had a way to pass
* in a mask and had some smarts for more clever placement.
@@ -2556,6 +2559,10 @@ static int blk_mq_hctx_notify_offline(unsigned int cpu, struct hlist_node *node)
struct blk_mq_hw_ctx *hctx = hlist_entry_safe(node,
struct blk_mq_hw_ctx, cpuhp_online);
+	/* hctx needn't be deactivated when managed irqs aren't used */
+ if (!blk_mq_hctx_use_managed_irq(hctx))
+ return 0;
+
if (!cpumask_test_cpu(cpu, hctx->cpumask) ||
!blk_mq_last_cpu_in_hctx(cpu, hctx))
return 0;
diff --git a/block/blk-mq.h b/block/blk-mq.h
index d08779f77a26..7333b659d8f5 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -119,6 +119,14 @@ static inline struct blk_mq_hw_ctx *blk_mq_map_queue(struct request_queue *q,
return ctx->hctxs[type];
}
+static inline bool blk_mq_hctx_use_managed_irq(struct blk_mq_hw_ctx *hctx)
+{
+ if (hctx->type == HCTX_TYPE_POLL)
+ return false;
+
+ return hctx->queue->tag_set->map[hctx->type].use_managed_irq;
+}
+
/*
* sysfs helpers
*/
--
2.31.1
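[Editor's note: the blk_mq_first_mapped_cpu() move above preserves its
fallback semantics: prefer an online CPU in the hctx mask, but fall back
to an offline one so blk_mq_alloc_request_hctx() can still pick a ctx
when the whole mask is offline. A userspace model follows, using plain
bitmasks in place of cpumasks; the names and NR_CPUS value are
illustrative, not kernel API.]

```c
#include <assert.h>

#define NR_CPUS 8	/* stand-in for nr_cpu_ids */

/* First set bit of (a & b), or NR_CPUS if the intersection is empty;
 * mirrors cpumask_first_and() returning >= nr_cpu_ids. */
static int first_and(unsigned long a, unsigned long b)
{
	unsigned long m = a & b;
	int cpu;

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		if (m & (1UL << cpu))
			return cpu;
	return NR_CPUS;
}

/* First set bit of a, or NR_CPUS; mirrors cpumask_first(). */
static int first(unsigned long a)
{
	int cpu;

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		if (a & (1UL << cpu))
			return cpu;
	return NR_CPUS;
}

/* Mirrors blk_mq_first_mapped_cpu(): prefer an online CPU in the hctx
 * mask, but fall back to an offline one so allocation can proceed. */
static int first_mapped_cpu(unsigned long hctx_mask, unsigned long online_mask)
{
	int cpu = first_and(hctx_mask, online_mask);

	if (cpu >= NR_CPUS)
		cpu = first(hctx_mask);
	return cpu;
}
```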
* Re: [PATCH V5 3/3] blk-mq: don't deactivate hctx if managed irq isn't used
2021-07-21 9:17 ` [PATCH V5 3/3] blk-mq: don't deactivate hctx if managed irq isn't used Ming Lei
@ 2021-07-21 14:37 ` Christoph Hellwig
0 siblings, 0 replies; 11+ messages in thread
From: Christoph Hellwig @ 2021-07-21 14:37 UTC (permalink / raw)
To: Ming Lei
Cc: Jens Axboe, Christoph Hellwig, linux-block, Greg Kroah-Hartman,
Thomas Gleixner, John Garry, Sagi Grimberg, Daniel Wagner,
Wen Xiong, Hannes Reinecke
Looks good,
Reviewed-by: Christoph Hellwig <hch@lst.de>