* [PATCH 0/2]: Add option for async ->queue_rq
@ 2016-09-22 14:52 Jens Axboe
  2016-09-22 14:52 ` [PATCH 1/2] blk-mq: get rid of manual run of queue with __blk_mq_run_hw_queue() Jens Axboe
                   ` (3 more replies)
  0 siblings, 4 replies; 19+ messages in thread
From: Jens Axboe @ 2016-09-22 14:52 UTC (permalink / raw)
  To: axboe, linux-block; +Cc: jbacik

Two patches that add the ability for a driver to flag itself
as wanting the ->queue_rq() invoked in a manner that allows
it to block. We'll need that for the nbd conversion, to avoid
having to add workqueue offload. We can use this in the current
loop mq path as well.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 1/2] blk-mq: get rid of manual run of queue with __blk_mq_run_hw_queue()
  2016-09-22 14:52 [PATCH 0/2]: Add option for async ->queue_rq Jens Axboe
@ 2016-09-22 14:52 ` Jens Axboe
  2016-09-22 14:56   ` Christoph Hellwig
                     ` (2 more replies)
  2016-09-22 14:53 ` [PATCH 2/2] blk-mq: add flag for drivers wanting blocking ->queue_rq() Jens Axboe
                   ` (2 subsequent siblings)
  3 siblings, 3 replies; 19+ messages in thread
From: Jens Axboe @ 2016-09-22 14:52 UTC (permalink / raw)
  To: axboe, linux-block; +Cc: jbacik, Jens Axboe

Two cases:

1) blk_mq_alloc_request() needlessly re-runs the queue, after
   calling into the tag allocation without NOWAIT set. We don't
   need to do that.

2) blk_mq_map_request() should just use blk_mq_run_hw_queue() with
   the async flag set to false.

Signed-off-by: Jens Axboe <axboe@fb.com>
---
 block/blk-mq.c | 16 ++--------------
 1 file changed, 2 insertions(+), 14 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index e0a69daddbd8..c29700010b5c 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -34,8 +34,6 @@
 static DEFINE_MUTEX(all_q_mutex);
 static LIST_HEAD(all_q_list);
 
-static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx);
-
 /*
  * Check if any of the ctx's have pending work in this hardware queue
  */
@@ -228,19 +226,9 @@ struct request *blk_mq_alloc_request(struct request_queue *q, int rw,
 	ctx = blk_mq_get_ctx(q);
 	hctx = q->mq_ops->map_queue(q, ctx->cpu);
 	blk_mq_set_alloc_data(&alloc_data, q, flags, ctx, hctx);
-
 	rq = __blk_mq_alloc_request(&alloc_data, rw, 0);
-	if (!rq && !(flags & BLK_MQ_REQ_NOWAIT)) {
-		__blk_mq_run_hw_queue(hctx);
-		blk_mq_put_ctx(ctx);
-
-		ctx = blk_mq_get_ctx(q);
-		hctx = q->mq_ops->map_queue(q, ctx->cpu);
-		blk_mq_set_alloc_data(&alloc_data, q, flags, ctx, hctx);
-		rq =  __blk_mq_alloc_request(&alloc_data, rw, 0);
-		ctx = alloc_data.ctx;
-	}
 	blk_mq_put_ctx(ctx);
+
 	if (!rq) {
 		blk_queue_exit(q);
 		return ERR_PTR(-EWOULDBLOCK);
@@ -1225,7 +1213,7 @@ static struct request *blk_mq_map_request(struct request_queue *q,
 	blk_mq_set_alloc_data(&alloc_data, q, BLK_MQ_REQ_NOWAIT, ctx, hctx);
 	rq = __blk_mq_alloc_request(&alloc_data, op, op_flags);
 	if (unlikely(!rq)) {
-		__blk_mq_run_hw_queue(hctx);
+		blk_mq_run_hw_queue(hctx, false);
 		blk_mq_put_ctx(ctx);
 		trace_block_sleeprq(q, bio, op);
 
-- 
2.7.4


* [PATCH 2/2] blk-mq: add flag for drivers wanting blocking ->queue_rq()
  2016-09-22 14:52 [PATCH 0/2]: Add option for async ->queue_rq Jens Axboe
  2016-09-22 14:52 ` [PATCH 1/2] blk-mq: get rid of manual run of queue with __blk_mq_run_hw_queue() Jens Axboe
@ 2016-09-22 14:53 ` Jens Axboe
  2016-09-22 14:59   ` Christoph Hellwig
  2016-09-22 18:01 ` [PATCH 0/2]: Add option for async ->queue_rq Josef Bacik
  2016-09-28  0:25 ` Bart Van Assche
  3 siblings, 1 reply; 19+ messages in thread
From: Jens Axboe @ 2016-09-22 14:53 UTC (permalink / raw)
  To: axboe, linux-block; +Cc: jbacik, Jens Axboe

If a driver sets BLK_MQ_F_BLOCKING, it is allowed to block in its
->queue_rq() handler. For that case, blk-mq ensures that we always
call it from a safe context.

Signed-off-by: Jens Axboe <axboe@fb.com>
---
 block/blk-mq.c         | 2 +-
 include/linux/blk-mq.h | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index c29700010b5c..eae2f12f011f 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -908,7 +908,7 @@ void blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async)
 	    !blk_mq_hw_queue_mapped(hctx)))
 		return;
 
-	if (!async) {
+	if (!async && !(hctx->flags & BLK_MQ_F_BLOCKING)) {
 		int cpu = get_cpu();
 		if (cpumask_test_cpu(cpu, hctx->cpumask)) {
 			__blk_mq_run_hw_queue(hctx);
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index fbcfdf323243..5daa0ef756dd 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -155,6 +155,7 @@ enum {
 	BLK_MQ_F_TAG_SHARED	= 1 << 1,
 	BLK_MQ_F_SG_MERGE	= 1 << 2,
 	BLK_MQ_F_DEFER_ISSUE	= 1 << 4,
+	BLK_MQ_F_BLOCKING	= 1 << 5,
 	BLK_MQ_F_ALLOC_POLICY_START_BIT = 8,
 	BLK_MQ_F_ALLOC_POLICY_BITS = 1,
 
-- 
2.7.4


* Re: [PATCH 1/2] blk-mq: get rid of manual run of queue with __blk_mq_run_hw_queue()
  2016-09-22 14:52 ` [PATCH 1/2] blk-mq: get rid of manual run of queue with __blk_mq_run_hw_queue() Jens Axboe
@ 2016-09-22 14:56   ` Christoph Hellwig
  2016-09-22 18:56   ` Christoph Hellwig
  2016-09-23 21:56   ` Sagi Grimberg
  2 siblings, 0 replies; 19+ messages in thread
From: Christoph Hellwig @ 2016-09-22 14:56 UTC (permalink / raw)
  To: Jens Axboe; +Cc: axboe, linux-block, jbacik

On Thu, Sep 22, 2016 at 08:52:59AM -0600, Jens Axboe wrote:
> Two cases:
> 
> 1) blk_mq_alloc_request() needlessly re-runs the queue, after
>    calling into the tag allocation without NOWAIT set. We don't
>    need to do that.
> 
> 2) blk_mq_map_request() should just use blk_mq_run_hw_queue() with
>    the async flag set to false.

I had some very similar patches in my queue but never got around to
benchmarking them in enough setups to feel safe enough to post them.

Assuming you found no reason to keep the odd old scheme either:

Reviewed-by: Christoph Hellwig <hch@lst.de>


* Re: [PATCH 2/2] blk-mq: add flag for drivers wanting blocking ->queue_rq()
  2016-09-22 14:53 ` [PATCH 2/2] blk-mq: add flag for drivers wanting blocking ->queue_rq() Jens Axboe
@ 2016-09-22 14:59   ` Christoph Hellwig
  2016-09-22 15:03     ` Jens Axboe
  2016-09-22 15:04     ` Josef Bacik
  0 siblings, 2 replies; 19+ messages in thread
From: Christoph Hellwig @ 2016-09-22 14:59 UTC (permalink / raw)
  To: Jens Axboe; +Cc: axboe, linux-block, jbacik, Ming Lei

On Thu, Sep 22, 2016 at 08:53:00AM -0600, Jens Axboe wrote:
> If a driver sets BLK_MQ_F_BLOCKING, it is allowed to block in its
> ->queue_rq() handler. For that case, blk-mq ensures that we always
> call it from a safe context.

First, can you provide a more useful definition of blocking?  Lots of current
drivers will already block in ->queue_rq, mostly to allocate memory.

Second, we looked at something similar a few times before, most notably
when converting loop and rbd, and came to the conclusion that
performance sucks when we only have that single per-hctx work struct
for a busy device.  I think Ming has done a lot of the benchmarking,
so I've added him to Cc.


* Re: [PATCH 2/2] blk-mq: add flag for drivers wanting blocking ->queue_rq()
  2016-09-22 14:59   ` Christoph Hellwig
@ 2016-09-22 15:03     ` Jens Axboe
  2016-09-22 15:12       ` Christoph Hellwig
  2016-09-22 15:04     ` Josef Bacik
  1 sibling, 1 reply; 19+ messages in thread
From: Jens Axboe @ 2016-09-22 15:03 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: axboe, linux-block, jbacik, Ming Lei

On 09/22/2016 08:59 AM, Christoph Hellwig wrote:
> On Thu, Sep 22, 2016 at 08:53:00AM -0600, Jens Axboe wrote:
>> If a driver sets BLK_MQ_F_BLOCKING, it is allowed to block in its
>> ->queue_rq() handler. For that case, blk-mq ensures that we always
>> call it from a safe context.
>
> First, can you provide a more useful definition of blocking?  Lots of current
> drivers will already block in ->queue_rq, mostly to allocate memory.

Having to grab a mutex, for instance. We invoke ->queue_rq() with
preemption disabled, so I'd hope that would not be the case... What
drivers block in ->queue_rq?

> Second, we looked at something similar a few times before, most notably
> when converting loop and rbd, and came to the conclusion that
> performance sucks when we only have that single per-hctx work struct
> for a busy device.  I think Ming has done a lot of the benchmarking,
> so I've added him to Cc.

Loop was another case that was on my radar to get rid of the queue_work
it currently has to do. Josef is currently testing the nbd driver using
this approach, so we should get some numbers there too. If we have to,
we can always bump up the concurrency to mimic more of the behavior of
having multiple workers running on the hardware queue. I'd prefer to
still do that in blk-mq, instead of having drivers reinventing their own
work offload functionality.

-- 
Jens Axboe


* Re: [PATCH 2/2] blk-mq: add flag for drivers wanting blocking ->queue_rq()
  2016-09-22 14:59   ` Christoph Hellwig
  2016-09-22 15:03     ` Jens Axboe
@ 2016-09-22 15:04     ` Josef Bacik
  1 sibling, 0 replies; 19+ messages in thread
From: Josef Bacik @ 2016-09-22 15:04 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe; +Cc: axboe, linux-block, Ming Lei

On 09/22/2016 10:59 AM, Christoph Hellwig wrote:
> On Thu, Sep 22, 2016 at 08:53:00AM -0600, Jens Axboe wrote:
>> If a driver sets BLK_MQ_F_BLOCKING, it is allowed to block in its
>> ->queue_rq() handler. For that case, blk-mq ensures that we always
>> call it from a safe context.
>
> First, can you provide a more useful definition of blocking?  Lots of current
> drivers will already block in ->queue_rq, mostly to allocate memory.
>

NBD needs to take a mutex before it sends.  This patch was born out of a kbuild
error I got because of a schedule while atomic; this was the backtrace:

[   17.698207] BUG: sleeping function called from invalid context at 
kernel/locking/mutex.c:620
[   17.700903] in_atomic(): 1, irqs_disabled(): 0, pid: 1050, name: mount
[   17.702350] 1 lock held by mount/1050:
[   17.703395]  #0:  (&type->s_umount_key#20/1){+.+.+.}, at: 
[<ffffffff8122627b>] sget_userns+0x2f7/0x4b0
[   17.706064] CPU: 0 PID: 1050 Comm: mount Not tainted 4.8.0-rc4-00007-gfd8383fd #1
[   17.708149] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
Debian-1.8.2-1 04/01/2014
[   17.710523]  0000000000000000 ffff880034dd39f0 ffffffff8178e23d ffff8800350e4d00
[   17.712773]  000000000000041a ffff880034dd3a18 ffffffff81108c39 ffffffff839f1476
[   17.714978]  000000000000026c 0000000000000000 ffff880034dd3a40 ffffffff81108cb5
[   17.717224] Call Trace:
[   17.718130]  [<ffffffff8178e23d>] dump_stack+0x82/0xb8
[   17.719474]  [<ffffffff81108c39>] ___might_sleep+0x1bd/0x1c4
[   17.720813]  [<ffffffff81108cb5>] __might_sleep+0x75/0x7c
[   17.722100]  [<ffffffff82ef53e2>] mutex_lock_nested+0x3e/0x396
[   17.723449]  [<ffffffff8114aace>] ? mod_timer+0x10/0x12
[   17.724705]  [<ffffffff81770002>] ? blk_add_timer+0xe0/0xe5
[   17.726088]  [<ffffffff81c2a35c>] nbd_queue_rq+0x7b/0x14b
[   17.727367]  [<ffffffff81c2a35c>] ? nbd_queue_rq+0x7b/0x14b
[   17.728653]  [<ffffffff81772b66>] __blk_mq_run_hw_queue+0x1c7/0x2c8
[   17.730118]  [<ffffffff81772942>] blk_mq_run_hw_queue+0x5e/0xbb
[   17.731454]  [<ffffffff81773d53>] blk_sq_make_request+0x1a1/0x1ba
[   17.732806]  [<ffffffff81768e24>] generic_make_request+0xbd/0x160
[   17.734153]  [<ffffffff81768fbd>] submit_bio+0xf6/0xff
[   17.735365]  [<ffffffff81252806>] submit_bh_wbc+0x136/0x143
[   17.736719]  [<ffffffff81252c00>] submit_bh+0x10/0x12
[   17.737888]  [<ffffffff81252c52>] __bread_gfp+0x50/0x6f
[   17.739212]  [<ffffffff812f290a>] ext4_fill_super+0x1f4/0x27ec
[   17.740492]  [<ffffffff81798a59>] ? vsnprintf+0x22d/0x3b7
[   17.741685]  [<ffffffff812265e3>] mount_bdev+0x144/0x197
[   17.742928]  [<ffffffff812f2716>] ? ext4_calculate_overhead+0x2bd/0x2bd
[   17.744275]  [<ffffffff812ede93>] ext4_mount+0x15/0x17
[   17.745406]  [<ffffffff81227049>] mount_fs+0x67/0x131
[   17.746518]  [<ffffffff8123ee2f>] vfs_kern_mount+0x6b/0xdb
[   17.747675]  [<ffffffff81241759>] do_mount+0x708/0x97d
[   17.748833]  [<ffffffff811e87f0>] ? __might_fault+0x7e/0x84
[   17.750074]  [<ffffffff811da511>] ? strndup_user+0x3f/0x53
[   17.751198]  [<ffffffff81268eb6>] compat_SyS_mount+0x185/0x1ae
[   17.752433]  [<ffffffff81003b6f>] do_int80_syscall_32+0x5c/0x6b
[   17.753592]  [<ffffffff82efa6d8>] entry_INT80_compat+0x38/0x50
[   17.754952] block nbd11: Attempted send on closed socket

> Second, we looked at something similar a few times before, most notably
> when converting loop and rbd, and came to the conclusion that
> performance sucks when we only have that single per-hctx work struct
> for a busy device.  I think Ming has done a lot of the benchmarking,
> so I've added him to Cc.
>

Yeah, this looks to be about a 5% hit on my microbenchmarks for NBD, but nobody
accuses NBD of being particularly fast either, so I wasn't too worried about it.
If we want to say that we can block in ->queue_rq, that would make me happy too.
Thanks,

Josef


* Re: [PATCH 2/2] blk-mq: add flag for drivers wanting blocking ->queue_rq()
  2016-09-22 15:03     ` Jens Axboe
@ 2016-09-22 15:12       ` Christoph Hellwig
  2016-09-22 15:17         ` Jens Axboe
  2016-09-23  1:35         ` Ming Lei
  0 siblings, 2 replies; 19+ messages in thread
From: Christoph Hellwig @ 2016-09-22 15:12 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Christoph Hellwig, axboe, linux-block, jbacik, Ming Lei

On Thu, Sep 22, 2016 at 09:03:56AM -0600, Jens Axboe wrote:
> Having to grab a mutex, for instance. We invoke ->queue_rq() with
> preemption disabled, so I'd hope that would not be the case... What
> drivers block in ->queue_rq?

I thought I had converted a lot of them to GFP_NOIO instead of GFP_ATOMIC
allocations, but I can't find any evidence of that.  Maybe it was just
my imagination, or an unsubmitted patch series.  Sorry for the
confusion.

> Loop was another case that was on my radar to get rid of the queue_work
> it currently has to do. Josef is currently testing the nbd driver using
> this approach, so we should get some numbers there too. If we have to,
> we can always bump up the concurrency to mimic more of the behavior of
> having multiple workers running on the hardware queue. I'd prefer to
> still do that in blk-mq, instead of having drivers reinventing their own
> work offload functionality.

There should be a lot of numbers in the list archives from the time
that Ming did the loop conversion, as I've been trying to steer him
that way, and he actually implemented and benchmarked it.

We can't just increase the concurrency as a single work_struct item
can't be queued multiple times even on a high concurrency workqueue.


* Re: [PATCH 2/2] blk-mq: add flag for drivers wanting blocking ->queue_rq()
  2016-09-22 15:12       ` Christoph Hellwig
@ 2016-09-22 15:17         ` Jens Axboe
  2016-09-23  1:43           ` Ming Lei
  2016-09-23  1:35         ` Ming Lei
  1 sibling, 1 reply; 19+ messages in thread
From: Jens Axboe @ 2016-09-22 15:17 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: axboe, linux-block, jbacik, Ming Lei

On 09/22/2016 09:12 AM, Christoph Hellwig wrote:
> On Thu, Sep 22, 2016 at 09:03:56AM -0600, Jens Axboe wrote:
>> Having to grab a mutex, for instance. We invoke ->queue_rq() with
>> preemption disabled, so I'd hope that would not be the case... What
>> drivers block in ->queue_rq?
>
> I thought I had converted a lot of them to GFP_NOIO instead of GFP_ATOMIC
> allocations, but I can't find any evidence of that.  Maybe it was just
> my imagination, or an unsubmitted patch series.  Sorry for the
> confusion.

OK, that makes more sense. Pretty sure we would have had complaints!

>> Loop was another case that was on my radar to get rid of the queue_work
>> it currently has to do. Josef is currently testing the nbd driver using
>> this approach, so we should get some numbers there too. If we have to,
>> we can always bump up the concurrency to mimic more of the behavior of
>> having multiple workers running on the hardware queue. I'd prefer to
>> still do that in blk-mq, instead of having drivers reinventing their own
>> work offload functionality.
>
> There should be a lot of numbers in the list archives from the time
> that Ming did the loop conversion, as I've been trying to steer him
> that way, and he actually implemented and benchmarked it.
>
> We can't just increase the concurrency as a single work_struct item
> can't be queued multiple times even on a high concurrency workqueue.

But we could have more work items, if we had to. Even if loop isn't a
drop-in replacement for this simpler approach, I think it'll work well
enough for nbd. The 5% number from Josef is comparing to not having any
offload at all, I suspect the number from just converting to queue_work
in the driver would be similar.

-- 
Jens Axboe


* Re: [PATCH 0/2]: Add option for async ->queue_rq
  2016-09-22 14:52 [PATCH 0/2]: Add option for async ->queue_rq Jens Axboe
  2016-09-22 14:52 ` [PATCH 1/2] blk-mq: get rid of manual run of queue with __blk_mq_run_hw_queue() Jens Axboe
  2016-09-22 14:53 ` [PATCH 2/2] blk-mq: add flag for drivers wanting blocking ->queue_rq() Jens Axboe
@ 2016-09-22 18:01 ` Josef Bacik
  2016-09-28  0:25 ` Bart Van Assche
  3 siblings, 0 replies; 19+ messages in thread
From: Josef Bacik @ 2016-09-22 18:01 UTC (permalink / raw)
  To: Jens Axboe, axboe, linux-block

On 09/22/2016 10:52 AM, Jens Axboe wrote:
> Two patches that add the ability for a driver to flag itself
> as wanting the ->queue_rq() invoked in a manner that allows
> it to block. We'll need that for the nbd conversion, to avoid
> having to add workqueue offload. We can use this in the current
> loop mq path as well.
>
>

You can add my

Tested-by: Josef Bacik <jbacik@fb.com>

to these.  Thanks,

Josef


* Re: [PATCH 1/2] blk-mq: get rid of manual run of queue with __blk_mq_run_hw_queue()
  2016-09-22 14:52 ` [PATCH 1/2] blk-mq: get rid of manual run of queue with __blk_mq_run_hw_queue() Jens Axboe
  2016-09-22 14:56   ` Christoph Hellwig
@ 2016-09-22 18:56   ` Christoph Hellwig
  2016-09-22 20:29     ` Jens Axboe
  2016-09-23 21:59     ` Sagi Grimberg
  2016-09-23 21:56   ` Sagi Grimberg
  2 siblings, 2 replies; 19+ messages in thread
From: Christoph Hellwig @ 2016-09-22 18:56 UTC (permalink / raw)
  To: Jens Axboe; +Cc: axboe, linux-block, jbacik

Ok, I looked into this a bit more, and while I'm still fine with the
patch, I think it's only half of what we should do here.  There really
is no point in doing the first non-blocking pass in blk_mq_map_request
either, as bt_get itself already does the non-blocking pass, and also
runs the queue when scheduling in the later loop as well.  So to get
towards what I had in my tree, we also need something like this:

---
From c69bf02929d9c37d193b004a4c3c85c1142fa996 Mon Sep 17 00:00:00 2001
From: Christoph Hellwig <hch@lst.de>
Date: Thu, 22 Sep 2016 11:38:23 -0700
Subject: blk-mq: remove non-blocking pass in blk_mq_map_request

bt_get already does a non-blocking pass as well as running the queue
when scheduling internally, no need to duplicate it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 block/blk-mq.c | 14 +-------------
 1 file changed, 1 insertion(+), 13 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index eae2f12..e9ebe98 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1210,20 +1210,8 @@ static struct request *blk_mq_map_request(struct request_queue *q,
 		op_flags |= REQ_SYNC;
 
 	trace_block_getrq(q, bio, op);
-	blk_mq_set_alloc_data(&alloc_data, q, BLK_MQ_REQ_NOWAIT, ctx, hctx);
+	blk_mq_set_alloc_data(&alloc_data, q, 0, ctx, hctx);
 	rq = __blk_mq_alloc_request(&alloc_data, op, op_flags);
-	if (unlikely(!rq)) {
-		blk_mq_run_hw_queue(hctx, false);
-		blk_mq_put_ctx(ctx);
-		trace_block_sleeprq(q, bio, op);
-
-		ctx = blk_mq_get_ctx(q);
-		hctx = q->mq_ops->map_queue(q, ctx->cpu);
-		blk_mq_set_alloc_data(&alloc_data, q, 0, ctx, hctx);
-		rq = __blk_mq_alloc_request(&alloc_data, op, op_flags);
-		ctx = alloc_data.ctx;
-		hctx = alloc_data.hctx;
-	}
 
 	hctx->queued++;
 	data->hctx = hctx;
-- 
2.1.4


* Re: [PATCH 1/2] blk-mq: get rid of manual run of queue with __blk_mq_run_hw_queue()
  2016-09-22 18:56   ` Christoph Hellwig
@ 2016-09-22 20:29     ` Jens Axboe
  2016-09-23 21:59     ` Sagi Grimberg
  1 sibling, 0 replies; 19+ messages in thread
From: Jens Axboe @ 2016-09-22 20:29 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: axboe, linux-block, jbacik

On 09/22/2016 12:56 PM, Christoph Hellwig wrote:
> Ok, I looked into this a bit more, and while I'm still fine with the
> patch, I think it's only half of what we should do here.  There really
> is no point in doing the first non-blocking pass in blk_mq_map_request
> either, as bt_get itself already does the non-blocking pass, and also
> runs the queue when scheduling in the later loop as well.  So to get
> towards what I had in my tree, we also need something like this:

Good point, I'll apply this one as well.

-- 
Jens Axboe


* Re: [PATCH 2/2] blk-mq: add flag for drivers wanting blocking ->queue_rq()
  2016-09-22 15:12       ` Christoph Hellwig
  2016-09-22 15:17         ` Jens Axboe
@ 2016-09-23  1:35         ` Ming Lei
  1 sibling, 0 replies; 19+ messages in thread
From: Ming Lei @ 2016-09-23  1:35 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Jens Axboe, Jens Axboe, linux-block, jbacik

On Thu, Sep 22, 2016 at 11:12 PM, Christoph Hellwig <hch@infradead.org> wrote:
> On Thu, Sep 22, 2016 at 09:03:56AM -0600, Jens Axboe wrote:
>> Having to grab a mutex, for instance. We invoke ->queue_rq() with
>> preemption disabled, so I'd hope that would not be the case... What
>> drivers block in ->queue_rq?
>
> I thought I had converted a lot of them to GFP_NOIO instead of GFP_ATOMIC
> allocations, but I can't find any evidence of that.  Maybe it was just
> my imagination, or an unsubmitted patch series.  Sorry for the
> confusion.
>
>> Loop was another case that was on my radar to get rid of the queue_work
>> it currently has to do. Josef is currently testing the nbd driver using
>> this approach, so we should get some numbers there too. If we have to,
>> we can always bump up the concurrency to mimic more of the behavior of
>> having multiple workers running on the hardware queue. I'd prefer to
>> still do that in blk-mq, instead of having drivers reinventing their own
>> work offload functionality.
>
> There should be a lot of numbers in the list archives from the time
> that Ming did the loop conversion, as I've been trying to steer him
> that way, and he actually implemented and benchmarked it.
>
> We can't just increase the concurrency as a single work_struct item
> can't be queued multiple times even on a high concurrency workqueue.

Yes, that is it, and that is why last time we can't convert to this way.

Thanks,


* Re: [PATCH 2/2] blk-mq: add flag for drivers wanting blocking ->queue_rq()
  2016-09-22 15:17         ` Jens Axboe
@ 2016-09-23  1:43           ` Ming Lei
  2016-09-23  2:44             ` Josef Bacik
  0 siblings, 1 reply; 19+ messages in thread
From: Ming Lei @ 2016-09-23  1:43 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Christoph Hellwig, Jens Axboe, linux-block, jbacik

On Thu, Sep 22, 2016 at 11:17 PM, Jens Axboe <axboe@fb.com> wrote:
> On 09/22/2016 09:12 AM, Christoph Hellwig wrote:
>>
>> On Thu, Sep 22, 2016 at 09:03:56AM -0600, Jens Axboe wrote:
>>>
>>> Having to grab a mutex, for instance. We invoke ->queue_rq() with
>>> preemption disabled, so I'd hope that would not be the case... What
>>> drivers block in ->queue_rq?
>>
>>
>> I thought I had converted a lot of them to GFP_NOIO instead of GFP_ATOMIC
>> allocations, but I can't find any evidence of that.  Maybe it was just
>> my imagination, or an unsubmitted patch series.  Sorry for the
>> confusion.
>
>
> OK, that makes more sense. Pretty sure we would have had complaints!
>
>>> Loop was another case that was on my radar to get rid of the queue_work
>>> it currently has to do. Josef is currently testing the nbd driver using
>>> this approach, so we should get some numbers there too. If we have to,
>>> we can always bump up the concurrency to mimic more of the behavior of
>>> having multiple workers running on the hardware queue. I'd prefer to
>>> still do that in blk-mq, instead of having drivers reinventing their own
>>> work offload functionality.
>>
>>
>> There should be a lot of numbers in the list archives from the time
>> that Ming did the loop conversion, as I've been trying to steer him
>> that way, and he actually implemented and benchmarked it.
>>
>> We can't just increase the concurrency as a single work_struct item
>> can't be queued multiple times even on a high concurrency workqueue.
>
>
> But we could have more work items, if we had to. Even if loop isn't a
> drop-in replacement for this simpler approach, I think it'll work well
> enough for nbd. The 5% number from Josef is comparing to not having any
> offload at all, I suspect the number from just converting to queue_work
> in the driver would be similar.

5% might be noise.

From nbd's .queue_rq(), kthread_work should match its case because
one per-nbd mutex is required and all cmds sent to the nbd are serialized,
so using common work items may hurt performance due to unnecessary
context switches.

thanks,


* Re: [PATCH 2/2] blk-mq: add flag for drivers wanting blocking ->queue_rq()
  2016-09-23  1:43           ` Ming Lei
@ 2016-09-23  2:44             ` Josef Bacik
  0 siblings, 0 replies; 19+ messages in thread
From: Josef Bacik @ 2016-09-23  2:44 UTC (permalink / raw)
  To: Ming Lei; +Cc: Jens Axboe, Christoph Hellwig, Jens Axboe, linux-block


> On Sep 22, 2016, at 9:44 PM, Ming Lei <ming.lei@canonical.com> wrote:
>
>> On Thu, Sep 22, 2016 at 11:17 PM, Jens Axboe <axboe@fb.com> wrote:
>>> On 09/22/2016 09:12 AM, Christoph Hellwig wrote:
>>>
>>>> On Thu, Sep 22, 2016 at 09:03:56AM -0600, Jens Axboe wrote:
>>>>
>>>> Having to grab a mutex, for instance. We invoke ->queue_rq() with
>>>> preemption disabled, so I'd hope that would not be the case... What
>>>> drivers block in ->queue_rq?
>>>
>>>
>>> I thought I had converted a lot of them to GFP_NOIO instead of GFP_ATOMIC
>>> allocations, but I can't find any evidence of that.  Maybe it was just
>>> my imagination, or an unsubmitted patch series.  Sorry for the
>>> confusion.
>>
>>
>> OK, that makes more sense. Pretty sure we would have had complaints!
>>
>>>> Loop was another case that was on my radar to get rid of the queue_work
>>>> it currently has to do. Josef is currently testing the nbd driver using
>>>> this approach, so we should get some numbers there too. If we have to,
>>>> we can always bump up the concurrency to mimic more of the behavior of
>>>> having multiple workers running on the hardware queue. I'd prefer to
>>>> still do that in blk-mq, instead of having drivers reinventing their own
>>>> work offload functionality.
>>>
>>>
>>> There should be a lot of numbers in the list archives from the time
>>> that Ming did the loop conversion, as I've been trying to steer him
>>> that way, and he actually implemented and benchmarked it.
>>>
>>> We can't just increase the concurrency as a single work_struct item
>>> can't be queued multiple times even on a high concurrency workqueue.
>>
>>
>> But we could have more work items, if we had to. Even if loop isn't a
>> drop-in replacement for this simpler approach, I think it'll work well
>> enough for nbd. The 5% number from Josef is comparing to not having any
>> offload at all, I suspect the number from just converting to queue_work
>> in the driver would be similar.
>
> 5% might be noise.
>
> From nbd's .queue_rq(), kthread_work should match its case because
> one per-nbd mutex is required and all cmds sent to the nbd are serialized,
> so using common work items may hurt performance due to unnecessary
> context switches.

This is with my multi-connection patch with 4 connections per device (so 4
hw queues).  The 5% is real, but I don't think it'll get better by punting
to a worker per command, so this approach is fine for NBD.  Thanks,

Josef


* Re: [PATCH 1/2] blk-mq: get rid of manual run of queue with __blk_mq_run_hw_queue()
  2016-09-22 14:52 ` [PATCH 1/2] blk-mq: get rid of manual run of queue with __blk_mq_run_hw_queue() Jens Axboe
  2016-09-22 14:56   ` Christoph Hellwig
  2016-09-22 18:56   ` Christoph Hellwig
@ 2016-09-23 21:56   ` Sagi Grimberg
  2 siblings, 0 replies; 19+ messages in thread
From: Sagi Grimberg @ 2016-09-23 21:56 UTC (permalink / raw)
  To: Jens Axboe, axboe, linux-block; +Cc: jbacik

Looks good,

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>


* Re: [PATCH 1/2] blk-mq: get rid of manual run of queue with __blk_mq_run_hw_queue()
  2016-09-22 18:56   ` Christoph Hellwig
  2016-09-22 20:29     ` Jens Axboe
@ 2016-09-23 21:59     ` Sagi Grimberg
  1 sibling, 0 replies; 19+ messages in thread
From: Sagi Grimberg @ 2016-09-23 21:59 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe; +Cc: axboe, linux-block, jbacik


> ---
> From c69bf02929d9c37d193b004a4c3c85c1142fa996 Mon Sep 17 00:00:00 2001
> From: Christoph Hellwig <hch@lst.de>
> Date: Thu, 22 Sep 2016 11:38:23 -0700
> Subject: blk-mq: remove non-blocking pass in blk_mq_map_request
>
> bt_get already does a non-blocking pass as well as running the queue
> when scheduling internally, no need to duplicate it.

Looks good too,

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>

Question (while we're on the subject):
Do consumers have a way to prevent blk-mq from blocking on
lack of tags? I'm thinking in the context of nvme-target,
which can do more useful things than waiting for a tag...


* Re: [PATCH 0/2]: Add option for async ->queue_rq
  2016-09-22 14:52 [PATCH 0/2]: Add option for async ->queue_rq Jens Axboe
                   ` (2 preceding siblings ...)
  2016-09-22 18:01 ` [PATCH 0/2]: Add option for async ->queue_rq Josef Bacik
@ 2016-09-28  0:25 ` Bart Van Assche
  2016-09-28 16:43   ` Omar Sandoval
  3 siblings, 1 reply; 19+ messages in thread
From: Bart Van Assche @ 2016-09-28  0:25 UTC (permalink / raw)
  To: Jens Axboe, axboe, linux-block; +Cc: jbacik

On 09/22/16 07:52, Jens Axboe wrote:
> Two patches that add the ability for a driver to flag itself
> as wanting the ->queue_rq() invoked in a manner that allows
> it to block. We'll need that for the nbd conversion, to avoid
> having to add workqueue offload. We can use this in the current
> loop mq path as well.

Hello Jens,

How can this work since there is code that calls blk_mq_run_hw_queue() 
with a spin lock held? From mq_flush_data_end_io():

	spin_lock_irqsave(&fq->mq_flush_lock, flags);
	if (blk_flush_complete_seq(rq, fq, REQ_FSEQ_DATA, error))
		blk_mq_run_hw_queue(hctx, true);
	spin_unlock_irqrestore(&fq->mq_flush_lock, flags);

Sorry that I hadn't noticed this before.

Bart.


* Re: [PATCH 0/2]: Add option for async ->queue_rq
  2016-09-28  0:25 ` Bart Van Assche
@ 2016-09-28 16:43   ` Omar Sandoval
  0 siblings, 0 replies; 19+ messages in thread
From: Omar Sandoval @ 2016-09-28 16:43 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: Jens Axboe, axboe, linux-block, jbacik

On Tue, Sep 27, 2016 at 05:25:36PM -0700, Bart Van Assche wrote:
> On 09/22/16 07:52, Jens Axboe wrote:
> > Two patches that add the ability for a driver to flag itself
> > as wanting the ->queue_rq() invoked in a manner that allows
> > it to block. We'll need that for the nbd conversion, to avoid
> > having to add workqueue offload. We can use this in the current
> > loop mq path as well.
> 
> Hello Jens,
> 
> How can this work since there is code that calls blk_mq_run_hw_queue() with
> a spin lock held? From mq_flush_data_end_io():
> 
> 	spin_lock_irqsave(&fq->mq_flush_lock, flags);
> 	if (blk_flush_complete_seq(rq, fq, REQ_FSEQ_DATA, error))
> 		blk_mq_run_hw_queue(hctx, true);
> 	spin_unlock_irqrestore(&fq->mq_flush_lock, flags);
> 
> Sorry that I hadn't noticed this before.
> 
> Bart.

Hey, Bart,

In this particular case, we're passing async=true to
blk_mq_run_hw_queue(), so we do

	kblockd_schedule_delayed_work_on(blk_mq_hctx_next_cpu(hctx),
			&hctx->run_work, 0);

So ->queue_rq() is being called from a worker thread, not by the calling
thread. Jens' patch 2 makes it so that a driver can request that the
behavior is always as if async were true.

-- 
Omar

