* blk-mq: allow to defer ->queue_rq invocations to workqueue
@ 2014-11-03 8:23 Christoph Hellwig
2014-11-03 8:23 ` [PATCH 1/2] blk-mq: handle single queue case in blk_mq_hctx_next_cpu Christoph Hellwig
2014-11-03 8:23 ` [PATCH 2/2] blk-mq: allow direct dispatch to a driver specific workqueue Christoph Hellwig
0 siblings, 2 replies; 6+ messages in thread
From: Christoph Hellwig @ 2014-11-03 8:23 UTC
To: Jens Axboe; +Cc: Richard Weinberger, Ming Lei, ceph-devel, linux-kernel
Drivers that need to perform synchronous, blocking operations to do I/O generally
want to defer all I/O to a driver-private workqueue. Examples are
the loop driver, rbd, and the ubi block driver, and probably many more that haven't
been evaluated yet.
* [PATCH 1/2] blk-mq: handle single queue case in blk_mq_hctx_next_cpu
2014-11-03 8:23 blk-mq: allow to defer ->queue_rq invocations to workqueue Christoph Hellwig
@ 2014-11-03 8:23 ` Christoph Hellwig
2014-11-03 8:23 ` [PATCH 2/2] blk-mq: allow direct dispatch to a driver specific workqueue Christoph Hellwig
1 sibling, 0 replies; 6+ messages in thread
From: Christoph Hellwig @ 2014-11-03 8:23 UTC
To: Jens Axboe; +Cc: Richard Weinberger, Ming Lei, ceph-devel, linux-kernel
Don't duplicate the code that handles the non-CPU-bound (single hardware
queue) case in the callers; handle it inside blk_mq_hctx_next_cpu instead.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
block/blk-mq.c | 34 +++++++++++++---------------------
1 file changed, 13 insertions(+), 21 deletions(-)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index b355b59..22e50a5 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -780,10 +780,11 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
*/
static int blk_mq_hctx_next_cpu(struct blk_mq_hw_ctx *hctx)
{
- int cpu = hctx->next_cpu;
+ if (hctx->queue->nr_hw_queues == 1)
+ return WORK_CPU_UNBOUND;
if (--hctx->next_cpu_batch <= 0) {
- int next_cpu;
+ int cpu = hctx->next_cpu, next_cpu;
next_cpu = cpumask_next(hctx->next_cpu, hctx->cpumask);
if (next_cpu >= nr_cpu_ids)
@@ -791,9 +792,11 @@ static int blk_mq_hctx_next_cpu(struct blk_mq_hw_ctx *hctx)
hctx->next_cpu = next_cpu;
hctx->next_cpu_batch = BLK_MQ_CPU_WORK_BATCH;
+
+ return cpu;
}
- return cpu;
+ return hctx->next_cpu;
}
void blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async)
@@ -801,16 +804,13 @@ void blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async)
if (unlikely(test_bit(BLK_MQ_S_STOPPED, &hctx->state)))
return;
- if (!async && cpumask_test_cpu(smp_processor_id(), hctx->cpumask))
+ if (!async && cpumask_test_cpu(smp_processor_id(), hctx->cpumask)) {
__blk_mq_run_hw_queue(hctx);
- else if (hctx->queue->nr_hw_queues == 1)
- kblockd_schedule_delayed_work(&hctx->run_work, 0);
- else {
- unsigned int cpu;
-
- cpu = blk_mq_hctx_next_cpu(hctx);
- kblockd_schedule_delayed_work_on(cpu, &hctx->run_work, 0);
+ return;
}
+
+ kblockd_schedule_delayed_work_on(blk_mq_hctx_next_cpu(hctx),
+ &hctx->run_work, 0);
}
void blk_mq_run_queues(struct request_queue *q, bool async)
@@ -908,16 +908,8 @@ static void blk_mq_delay_work_fn(struct work_struct *work)
void blk_mq_delay_queue(struct blk_mq_hw_ctx *hctx, unsigned long msecs)
{
- unsigned long tmo = msecs_to_jiffies(msecs);
-
- if (hctx->queue->nr_hw_queues == 1)
- kblockd_schedule_delayed_work(&hctx->delay_work, tmo);
- else {
- unsigned int cpu;
-
- cpu = blk_mq_hctx_next_cpu(hctx);
- kblockd_schedule_delayed_work_on(cpu, &hctx->delay_work, tmo);
- }
+ kblockd_schedule_delayed_work_on(blk_mq_hctx_next_cpu(hctx),
+ &hctx->delay_work, msecs_to_jiffies(msecs));
}
EXPORT_SYMBOL(blk_mq_delay_queue);
--
1.9.1
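The refactored helper from patch 1 can be exercised outside the kernel. The block below is a userspace model, not kernel code: the cpumask is a plain bitmask, the `MODEL_*` constants are invented stand-ins for `nr_cpu_ids`, `BLK_MQ_CPU_WORK_BATCH` and `WORK_CPU_UNBOUND`, and the wrap-around to the first mapped CPU is an assumption, because that line falls outside the hunk quoted above.

```c
#include <assert.h>

/* Userspace stand-ins; these values are assumptions, not kernel constants. */
#define MODEL_NR_CPU_IDS       8                /* like nr_cpu_ids */
#define MODEL_CPU_WORK_BATCH   8                /* like BLK_MQ_CPU_WORK_BATCH */
#define MODEL_WORK_CPU_UNBOUND MODEL_NR_CPU_IDS /* sentinel like WORK_CPU_UNBOUND */

struct model_hctx {
	unsigned int nr_hw_queues; /* hctx->queue->nr_hw_queues */
	unsigned long cpumask;     /* bit n set: CPU n is mapped to this hctx */
	int next_cpu;
	int next_cpu_batch;
};

/* Like cpumask_next(): first set bit after n, or MODEL_NR_CPU_IDS if none. */
static int model_cpumask_next(int n, unsigned long mask)
{
	for (int cpu = n + 1; cpu < MODEL_NR_CPU_IDS; cpu++)
		if (mask & (1UL << cpu))
			return cpu;
	return MODEL_NR_CPU_IDS;
}

static int model_hctx_next_cpu(struct model_hctx *hctx)
{
	/* Single hardware queue: let the workqueue pick any CPU. */
	if (hctx->nr_hw_queues == 1)
		return MODEL_WORK_CPU_UNBOUND;

	assert(hctx->cpumask != 0);

	/* Stay on next_cpu for a batch of calls, then advance round-robin. */
	if (--hctx->next_cpu_batch <= 0) {
		int cpu = hctx->next_cpu, next_cpu;

		next_cpu = model_cpumask_next(hctx->next_cpu, hctx->cpumask);
		if (next_cpu >= MODEL_NR_CPU_IDS) /* wrap around (assumed) */
			next_cpu = model_cpumask_next(-1, hctx->cpumask);

		hctx->next_cpu = next_cpu;
		hctx->next_cpu_batch = MODEL_CPU_WORK_BATCH;

		return cpu;
	}

	return hctx->next_cpu;
}
```

With CPUs 0 and 2 mapped and a batch of 8, the first 8 calls return 0, the next 8 return 2, and the rotation then wraps back to 0. A single-queue hctx always gets the unbound sentinel, which is why the callers no longer need their own `nr_hw_queues == 1` branch.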
* [PATCH 2/2] blk-mq: allow direct dispatch to a driver specific workqueue
2014-11-03 8:23 blk-mq: allow to defer ->queue_rq invocations to workqueue Christoph Hellwig
2014-11-03 8:23 ` [PATCH 1/2] blk-mq: handle single queue case in blk_mq_hctx_next_cpu Christoph Hellwig
@ 2014-11-03 8:23 ` Christoph Hellwig
2014-11-03 8:40 ` Ming Lei
1 sibling, 1 reply; 6+ messages in thread
From: Christoph Hellwig @ 2014-11-03 8:23 UTC
To: Jens Axboe; +Cc: Richard Weinberger, Ming Lei, ceph-devel, linux-kernel
We have various block drivers that need to execute long term blocking
operations during I/O submission like file system or network I/O.
Currently these drivers just queue up work to an internal workqueue
from their request_fn. With blk-mq we can make sure they always get
called on their own workqueue directly for I/O submission by:
1) adding a flag to prevent inline submission of I/O, and
2) allowing the driver to pass in a workqueue in the tag_set that
will be used instead of kblockd.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
block/blk-core.c | 2 +-
block/blk-mq.c | 12 +++++++++---
block/blk.h | 1 +
include/linux/blk-mq.h | 4 ++++
4 files changed, 15 insertions(+), 4 deletions(-)
diff --git a/block/blk-core.c b/block/blk-core.c
index 0421b53..7f7249f 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -61,7 +61,7 @@ struct kmem_cache *blk_requestq_cachep;
/*
* Controlling structure to kblockd
*/
-static struct workqueue_struct *kblockd_workqueue;
+struct workqueue_struct *kblockd_workqueue;
void blk_queue_congestion_threshold(struct request_queue *q)
{
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 22e50a5..3d27d22 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -804,12 +804,13 @@ void blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async)
if (unlikely(test_bit(BLK_MQ_S_STOPPED, &hctx->state)))
return;
- if (!async && cpumask_test_cpu(smp_processor_id(), hctx->cpumask)) {
+ if (!async && !(hctx->flags & BLK_MQ_F_WORKQUEUE) &&
+ cpumask_test_cpu(smp_processor_id(), hctx->cpumask)) {
__blk_mq_run_hw_queue(hctx);
return;
}
- kblockd_schedule_delayed_work_on(blk_mq_hctx_next_cpu(hctx),
+ queue_delayed_work_on(blk_mq_hctx_next_cpu(hctx), hctx->wq,
&hctx->run_work, 0);
}
@@ -908,7 +909,7 @@ static void blk_mq_delay_work_fn(struct work_struct *work)
void blk_mq_delay_queue(struct blk_mq_hw_ctx *hctx, unsigned long msecs)
{
- kblockd_schedule_delayed_work_on(blk_mq_hctx_next_cpu(hctx),
+ queue_delayed_work_on(blk_mq_hctx_next_cpu(hctx), hctx->wq,
&hctx->delay_work, msecs_to_jiffies(msecs));
}
EXPORT_SYMBOL(blk_mq_delay_queue);
@@ -1581,6 +1582,11 @@ static int blk_mq_init_hctx(struct request_queue *q,
hctx->flags = set->flags;
hctx->cmd_size = set->cmd_size;
+ if (set->wq)
+ hctx->wq = set->wq;
+ else
+ hctx->wq = kblockd_workqueue;
+
blk_mq_init_cpu_notifier(&hctx->cpu_notifier,
blk_mq_hctx_notify, hctx);
blk_mq_register_cpu_notifier(&hctx->cpu_notifier);
diff --git a/block/blk.h b/block/blk.h
index 43b0361..fb46ad0 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -25,6 +25,7 @@ struct blk_flush_queue {
spinlock_t mq_flush_lock;
};
+extern struct workqueue_struct *kblockd_workqueue;
extern struct kmem_cache *blk_requestq_cachep;
extern struct kmem_cache *request_cachep;
extern struct kobj_type blk_queue_ktype;
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 5a901d0..ebe4699 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -37,6 +37,8 @@ struct blk_mq_hw_ctx {
unsigned int queue_num;
struct blk_flush_queue *fq;
+ struct workqueue_struct *wq;
+
void *driver_data;
struct blk_mq_ctxmap ctx_map;
@@ -64,6 +66,7 @@ struct blk_mq_hw_ctx {
struct blk_mq_tag_set {
struct blk_mq_ops *ops;
+ struct workqueue_struct *wq;
unsigned int nr_hw_queues;
unsigned int queue_depth; /* max hw supported */
unsigned int reserved_tags;
@@ -156,6 +159,7 @@ enum {
BLK_MQ_F_SG_MERGE = 1 << 2,
BLK_MQ_F_SYSFS_UP = 1 << 3,
BLK_MQ_F_DEFER_ISSUE = 1 << 4,
+ BLK_MQ_F_WORKQUEUE = 1 << 5,
BLK_MQ_S_STOPPED = 0,
BLK_MQ_S_TAG_ACTIVE = 1,
--
1.9.1
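The dispatch decision added by patch 2 can be sketched as a userspace model. This is an illustration under stated assumptions, not the kernel code: `on_mapped_cpu` stands in for `cpumask_test_cpu(smp_processor_id(), hctx->cpumask)`, bumping a counter stands in for `queue_delayed_work_on()`, and every `model_*` name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_F_WORKQUEUE (1u << 5) /* same bit as BLK_MQ_F_WORKQUEUE */

/* A "workqueue" that only counts how many times work was queued to it. */
struct model_wq {
	int queued;
};

static struct model_wq model_kblockd; /* stands in for kblockd_workqueue */

struct model_tag_set {
	unsigned int flags;
	struct model_wq *wq; /* NULL: fall back to kblockd */
};

struct model_hctx {
	unsigned int flags;
	struct model_wq *wq;
	int inline_runs; /* times the queue was run in the caller's context */
};

/* Mirrors the lines added to blk_mq_init_hctx(). */
static void model_init_hctx(struct model_hctx *hctx,
			    const struct model_tag_set *set)
{
	hctx->flags = set->flags;
	hctx->wq = set->wq ? set->wq : &model_kblockd;
	hctx->inline_runs = 0;
}

/* Mirrors the patched blk_mq_run_hw_queue(). */
static void model_run_hw_queue(struct model_hctx *hctx, bool async,
			       bool on_mapped_cpu)
{
	if (!async && !(hctx->flags & MODEL_F_WORKQUEUE) && on_mapped_cpu) {
		hctx->inline_runs++; /* __blk_mq_run_hw_queue(hctx) */
		return;
	}

	assert(hctx->wq != NULL);
	hctx->wq->queued++; /* queue_delayed_work_on(..., hctx->wq, ...) */
}
```

A set without the flag still runs inline on a mapped CPU and otherwise uses kblockd, while a set that passes its own workqueue and the flag always ends up on that workqueue, even for synchronous runs.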
* Re: [PATCH 2/2] blk-mq: allow direct dispatch to a driver specific workqueue
2014-11-03 8:23 ` [PATCH 2/2] blk-mq: allow direct dispatch to a driver specific workqueue Christoph Hellwig
@ 2014-11-03 8:40 ` Ming Lei
2014-11-03 10:10 ` Christoph Hellwig
0 siblings, 1 reply; 6+ messages in thread
From: Ming Lei @ 2014-11-03 8:40 UTC
To: Christoph Hellwig
Cc: Jens Axboe, Richard Weinberger, ceph-devel, Linux Kernel Mailing List
Hi Christoph,
On Mon, Nov 3, 2014 at 4:23 PM, Christoph Hellwig <hch@lst.de> wrote:
> We have various block drivers that need to execute long term blocking
> operations during I/O submission like file system or network I/O.
>
> Currently these drivers just queue up work to an internal workqueue
> from their request_fn. With blk-mq we can make sure they always get
> called on their own workqueue directly for I/O submission by:
>
> 1) adding a flag to prevent inline submission of I/O, and
> 2) allowing the driver to pass in a workqueue in the tag_set that
> will be used instead of kblockd.
The above two aren't enough because the big problem is that
drivers need a per-request work structure instead of 'hctx->run_work',
otherwise there are at most NR_CPUS concurrent submissions.
So the per-request work structure should be exposed to blk-mq
too for this kind of usage, such as a .blk_mq_req_work(req) callback
in case of BLK_MQ_F_WORKQUEUE.
Thanks,
--
Ming Lei
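Ming's concurrency argument rests on the fact that a work item that is already pending is not queued a second time (`queue_work()` returns false in that case). The toy model below, with hypothetical `model_*` names and no worker threads at all, counts how many distinct work items are pending when every request kicks one shared per-hctx work item, compared with giving each request its own work item.

```c
#include <assert.h>
#include <stdbool.h>

/* A work item can be pending at most once, as with queue_work(). */
struct model_work {
	bool pending;
};

/* Returns true if newly queued, false if it was already pending. */
static bool model_queue_work(struct model_work *work, int *nr_pending)
{
	if (work->pending)
		return false;
	work->pending = true;
	(*nr_pending)++;
	return true;
}

/* Every request on a hctx kicks that hctx's single run_work. */
static int model_items_shared(int nr_hctx, int reqs_per_hctx)
{
	struct model_work run_work[16] = {{false}};
	int nr_pending = 0;

	assert(nr_hctx <= 16);
	for (int h = 0; h < nr_hctx; h++)
		for (int r = 0; r < reqs_per_hctx; r++)
			model_queue_work(&run_work[h], &nr_pending);
	return nr_pending;
}

/* Every request carries its own work item. */
static int model_items_per_request(int nr_reqs)
{
	struct model_work req_work[64] = {{false}};
	int nr_pending = 0;

	assert(nr_reqs <= 64);
	for (int r = 0; r < nr_reqs; r++)
		model_queue_work(&req_work[r], &nr_pending);
	return nr_pending;
}
```

With 4 hardware contexts and 10 requests each, the shared scheme leaves only 4 items pending, one per hctx, while per-request work leaves 40, which a workqueue is free to run in parallel subject to its own concurrency limits.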
* Re: [PATCH 2/2] blk-mq: allow direct dispatch to a driver specific workqueue
2014-11-03 8:40 ` Ming Lei
@ 2014-11-03 10:10 ` Christoph Hellwig
2014-11-03 11:54 ` Ming Lei
0 siblings, 1 reply; 6+ messages in thread
From: Christoph Hellwig @ 2014-11-03 10:10 UTC
To: Ming Lei
Cc: Christoph Hellwig, Jens Axboe, Richard Weinberger, ceph-devel,
Linux Kernel Mailing List
On Mon, Nov 03, 2014 at 04:40:47PM +0800, Ming Lei wrote:
> The above two aren't enough because the big problem is that
> drivers need a per-request work structure instead of 'hctx->run_work',
> otherwise there are at most NR_CPUS concurrent submissions.
>
> So the per-request work structure should be exposed to blk-mq
> too for this kind of usage, such as a .blk_mq_req_work(req) callback
> in case of BLK_MQ_F_WORKQUEUE.
Hmm. Maybe a better option is to just add a flag to never defer
->queue_rq to a workqueue and let drivers handle it?
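One reading of Christoph's alternative is that blk-mq always calls ->queue_rq directly and the driver hands each request to its own workqueue through a work item embedded in its per-request data. The single-threaded userspace sketch below shows that pattern; the FIFO replaces a real workqueue, and every `model_*` name, including the `container_of` stand-in, is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace equivalent of the kernel's container_of() idea. */
#define model_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct model_work {
	void (*fn)(struct model_work *work);
};

/* Hypothetical driver per-request data with an embedded work item. */
struct model_drv_cmd {
	int tag;
	int done;
	struct model_work work;
};

/* Tiny FIFO standing in for the driver-private workqueue. */
#define MODEL_WQ_DEPTH 32
struct model_wq {
	struct model_work *items[MODEL_WQ_DEPTH];
	int head, tail;
};

static void model_wq_push(struct model_wq *wq, struct model_work *work)
{
	assert(wq->tail - wq->head < MODEL_WQ_DEPTH);
	wq->items[wq->tail++ % MODEL_WQ_DEPTH] = work;
}

/* Runs queued items in order; a real workqueue would use worker threads. */
static int model_wq_drain(struct model_wq *wq)
{
	int ran = 0;

	while (wq->head != wq->tail) {
		struct model_work *work = wq->items[wq->head++ % MODEL_WQ_DEPTH];

		work->fn(work);
		ran++;
	}
	return ran;
}

/* A real driver would do its blocking file or network I/O here. */
static void model_cmd_work_fn(struct model_work *work)
{
	struct model_drv_cmd *cmd =
		model_container_of(work, struct model_drv_cmd, work);

	cmd->done = 1;
}

/* ->queue_rq runs inline and only hands the request to the driver's queue. */
static void model_queue_rq(struct model_wq *wq, struct model_drv_cmd *cmd)
{
	cmd->work.fn = model_cmd_work_fn;
	model_wq_push(wq, &cmd->work);
}
```

Because each request owns its work item, nothing coalesces the way a shared per-hctx work item does, which addresses the concurrency concern raised above.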
* Re: [PATCH 2/2] blk-mq: allow direct dispatch to a driver specific workqueue
2014-11-03 10:10 ` Christoph Hellwig
@ 2014-11-03 11:54 ` Ming Lei
0 siblings, 0 replies; 6+ messages in thread
From: Ming Lei @ 2014-11-03 11:54 UTC
To: Christoph Hellwig
Cc: Jens Axboe, Richard Weinberger, ceph-devel, Linux Kernel Mailing List
On Mon, Nov 3, 2014 at 6:10 PM, Christoph Hellwig <hch@lst.de> wrote:
> On Mon, Nov 03, 2014 at 04:40:47PM +0800, Ming Lei wrote:
>> The above two aren't enough because the big problem is that
>> drivers need a per-request work structure instead of 'hctx->run_work',
>> otherwise there are at most NR_CPUS concurrent submissions.
>>
>> So the per-request work structure should be exposed to blk-mq
>> too for this kind of usage, such as a .blk_mq_req_work(req) callback
>> in case of BLK_MQ_F_WORKQUEUE.
>
> Hmm. Maybe a better option is to just add a flag to never defer
> ->queue_rq to a workqueue and let drivers handle it?
That should work, but it might lose the potential merge benefit of deferring.
Thanks,
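The merge benefit Ming mentions comes from I/O waiting in the software queue long enough for adjacent bios to be merged before dispatch. The toy model below assumes only back-merges of contiguous sector ranges and uses hypothetical `model_*` names; it compares inline dispatch (a batch of 1) with deferred dispatch of larger batches.

```c
#include <assert.h>

/* Hypothetical bio: a contiguous range of sectors. */
struct model_bio {
	unsigned long sector;
	unsigned int nr_sectors;
};

/*
 * Counts how many requests reach the driver if dispatch happens every
 * `batch` bios (batch == 1 models inline dispatch), back-merging each bio
 * into the previous request of the same batch when the sectors are contiguous.
 */
static int model_count_dispatches(const struct model_bio *bios, int n, int batch)
{
	int dispatched = 0;

	assert(batch > 0);
	for (int start = 0; start < n; start += batch) {
		int end = start + batch < n ? start + batch : n;
		unsigned long next = 0; /* sector just past the current request */

		for (int i = start; i < end; i++) {
			if (i > start && bios[i].sector == next) {
				next += bios[i].nr_sectors; /* back-merge */
				continue;
			}
			dispatched++;
			next = bios[i].sector + bios[i].nr_sectors;
		}
	}
	return dispatched;
}
```

Four contiguous 8-sector bios reach the driver as 4 requests when dispatched inline, but as a single request when all four are deferred together.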