* [RFC] block: enqueue splitted bios into same cpu
@ 2020-09-11  3:29 Jeffle Xu
  2020-09-11 11:01 ` Ming Lei
  0 siblings, 1 reply; 8+ messages in thread
From: Jeffle Xu @ 2020-09-11  3:29 UTC (permalink / raw)
  To: axboe; +Cc: linux-block, Jeffle Xu

Split bios of one source bio can be enqueued into different CPUs, since
the submit_bio() routine can be preempted or sleep. However, this
behaviour does not work well with iopolling.

Currently block iopolling only polls the hardware queue of the input bio.
If one bio is split into several bios, one of which (bio 1) is enqueued
on CPU A while the others are enqueued on CPU B, then polling bio 1 will
continuously poll the hardware queue of CPU A, even though the other
split bios may sit in other hardware queues.

The iopolling logic has no idea whether the input bio is a split bio, or
whether it has other split siblings. Thus ensure that all split bios are
enqueued on one CPU from the beginning.

This is only an RFC patch and it is not complete, since dm and the mq
scheduler have not been considered yet. Please let me know whether it is
going in the right direction.

Besides, I have one question on the split routine. Why is the split
routine implemented in a recursive style? Why can't we split the bio once
and then submit the *already split* bios one by one?

Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
---
 block/bio.c               | 5 +++++
 block/blk-core.c          | 4 +++-
 block/blk-mq-sched.c      | 4 +++-
 block/blk-mq.c            | 6 ++++--
 block/blk-mq.h            | 1 +
 include/linux/blk_types.h | 1 +
 6 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index a9931f23d933..4580cb23ab7b 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -281,6 +281,7 @@ void bio_init(struct bio *bio, struct bio_vec *table,
 	memset(bio, 0, sizeof(*bio));
 	atomic_set(&bio->__bi_remaining, 1);
 	atomic_set(&bio->__bi_cnt, 1);
+	bio->bi_cpu_hint = -1;
 
 	bio->bi_io_vec = table;
 	bio->bi_max_vecs = max_vecs;
@@ -687,6 +688,7 @@ void __bio_clone_fast(struct bio *bio, struct bio *bio_src)
 	bio->bi_opf = bio_src->bi_opf;
 	bio->bi_ioprio = bio_src->bi_ioprio;
 	bio->bi_write_hint = bio_src->bi_write_hint;
+	bio->bi_cpu_hint = bio_src->bi_cpu_hint;
 	bio->bi_iter = bio_src->bi_iter;
 	bio->bi_io_vec = bio_src->bi_io_vec;
 
@@ -1474,6 +1476,9 @@ struct bio *bio_split(struct bio *bio, int sectors,
 	if (WARN_ON_ONCE(bio_op(bio) == REQ_OP_ZONE_APPEND))
 		return NULL;
 
+	if (bio->bi_cpu_hint == -1)
+		bio->bi_cpu_hint = raw_smp_processor_id();
+
 	split = bio_clone_fast(bio, gfp, bs);
 	if (!split)
 		return NULL;
diff --git a/block/blk-core.c b/block/blk-core.c
index 10c08ac50697..f9d1950678d8 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -775,7 +775,9 @@ bool blk_attempt_plug_merge(struct request_queue *q, struct bio *bio,
 			*same_queue_rq = rq;
 		}
 
-		if (rq->q != q || !blk_rq_merge_ok(rq, bio))
+		if (rq->q != q ||
+		   ((bio->bi_cpu_hint != -1) && (rq->mq_ctx->cpu != bio->bi_cpu_hint)) ||
+		   !blk_rq_merge_ok(rq, bio))
 			continue;
 
 		switch (blk_try_merge(rq, bio)) {
diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index d2790e5b06d1..c2d6f06792a8 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -461,7 +461,9 @@ bool __blk_mq_sched_bio_merge(struct request_queue *q, struct bio *bio,
 		unsigned int nr_segs)
 {
 	struct elevator_queue *e = q->elevator;
-	struct blk_mq_ctx *ctx = blk_mq_get_ctx(q);
+	unsigned int cpu = bio->bi_cpu_hint != -1 ?
+				bio->bi_cpu_hint : raw_smp_processor_id();
+	struct blk_mq_ctx *ctx = __blk_mq_get_ctx(q, cpu);
 	struct blk_mq_hw_ctx *hctx = blk_mq_map_queue(q, bio->bi_opf, ctx);
 	bool ret = false;
 	enum hctx_type type;
diff --git a/block/blk-mq.c b/block/blk-mq.c
index b3d2785eefe9..05716a5195d4 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -348,7 +348,7 @@ static struct request *__blk_mq_alloc_request(struct blk_mq_alloc_data *data)
 	struct request_queue *q = data->q;
 	struct elevator_queue *e = q->elevator;
 	u64 alloc_time_ns = 0;
-	unsigned int tag;
+	unsigned int tag, cpu;
 
 	/* alloc_time includes depth and tag waits */
 	if (blk_queue_rq_alloc_time(q))
@@ -370,7 +370,8 @@ static struct request *__blk_mq_alloc_request(struct blk_mq_alloc_data *data)
 	}
 
 retry:
-	data->ctx = blk_mq_get_ctx(q);
+	cpu = (data->cpu_hint != -1) ? data->cpu_hint : raw_smp_processor_id();
+	data->ctx = __blk_mq_get_ctx(q, cpu);
 	data->hctx = blk_mq_map_queue(q, data->cmd_flags, data->ctx);
 	if (!e)
 		blk_mq_tag_busy(data->hctx);
@@ -2168,6 +2169,7 @@ blk_qc_t blk_mq_submit_bio(struct bio *bio)
 	rq_qos_throttle(q, bio);
 
 	data.cmd_flags = bio->bi_opf;
+	data.cpu_hint = bio->bi_cpu_hint;
 	rq = __blk_mq_alloc_request(&data);
 	if (unlikely(!rq)) {
 		rq_qos_cleanup(q, bio);
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 863a2f3346d4..89c4fd34c0bd 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -152,6 +152,7 @@ struct blk_mq_alloc_data {
 	blk_mq_req_flags_t flags;
 	unsigned int shallow_depth;
 	unsigned int cmd_flags;
+	int	     cpu_hint;
 
 	/* input & output parameter */
 	struct blk_mq_ctx *ctx;
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 4ecf4fed171f..f81dea9ceb56 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -191,6 +191,7 @@ struct bio {
 	unsigned short		bi_flags;	/* status, etc and bvec pool number */
 	unsigned short		bi_ioprio;
 	unsigned short		bi_write_hint;
+	int			bi_cpu_hint;
 	blk_status_t		bi_status;
 	u8			bi_partno;
 	atomic_t		__bi_remaining;
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [RFC] block: enqueue splitted bios into same cpu
  2020-09-11  3:29 [RFC] block: enqueue splitted bios into same cpu Jeffle Xu
@ 2020-09-11 11:01 ` Ming Lei
  2020-09-11 11:46   ` JeffleXu
       [not found]   ` <e787faa8-d31f-04e7-f722-5013a52dc8ab@linux.alibaba.com>
  0 siblings, 2 replies; 8+ messages in thread
From: Ming Lei @ 2020-09-11 11:01 UTC (permalink / raw)
  To: Jeffle Xu; +Cc: axboe, linux-block

On Fri, Sep 11, 2020 at 11:29:58AM +0800, Jeffle Xu wrote:
> Splitted bios of one source bio can be enqueued into different CPU since
> the submit_bio() routine can be preempted or fall asleep. However this
> behaviour can't work well with iopolling.

Do you have a user-visible problem wrt. io polling? If yes, can you
provide more details?

> 
> Currently block iopolling only polls the hardwar queue of the input bio.
> If one bio is splitted to several bios, one (bio 1) of which is enqueued
> into CPU A, while the others enqueued into CPU B, then the polling of bio 1
> will cotinuously poll the hardware queue of CPU A, though the other
> splitted bios may be in other hardware queues.

If it is guaranteed that the returned cookie is from bio 1, poll is
supposed to work as expected, since bio 1 is the chained head of these
bios, and the whole fs bio can be thought of as done when bio 1's
->bi_end_io is called.

> 
> The iopolling logic has no idea if the input bio is splitted bio, or if
> it has other splitted siblings. Thus ensure that all splitted bios are
> enqueued into one CPU at the beginning.

Yeah, that is why io poll can't work on DM.

> 
> This is only one RFC patch and it is not complete since dm/mq-scheduler
> have not been considered yet. Please let me know if it is on the correct
> direction or not.
> 
> Besides I have one question on the split routine. Why the split routine
> is implemented in a recursive style? Why we can't split the bio one time
> and then submit the *already splitted* bios one by one?

Forward progress has to be guaranteed for each new split bio allocation,
since it comes from the same bio_set.


Thanks, 
Ming



* Re: [RFC] block: enqueue splitted bios into same cpu
  2020-09-11 11:01 ` Ming Lei
@ 2020-09-11 11:46   ` JeffleXu
       [not found]   ` <e787faa8-d31f-04e7-f722-5013a52dc8ab@linux.alibaba.com>
  1 sibling, 0 replies; 8+ messages in thread
From: JeffleXu @ 2020-09-11 11:46 UTC (permalink / raw)
  To: Ming Lei; +Cc: axboe, linux-block

Thanks for replying. ;)


On 9/11/20 7:01 PM, Ming Lei wrote:
> On Fri, Sep 11, 2020 at 11:29:58AM +0800, Jeffle Xu wrote:
>> Splitted bios of one source bio can be enqueued into different CPU since
>> the submit_bio() routine can be preempted or fall asleep. However this
>> behaviour can't work well with iopolling.
> Do you have user visible problem wrt. io polling? If yes, can you
> provide more details?

No, there's no practical example yet. It's only a hint from the code base.


>
>> Currently block iopolling only polls the hardwar queue of the input bio.
>> If one bio is splitted to several bios, one (bio 1) of which is enqueued
>> into CPU A, while the others enqueued into CPU B, then the polling of bio 1
>> will cotinuously poll the hardware queue of CPU A, though the other
>> splitted bios may be in other hardware queues.
> If it is guaranteed that the returned cookie is from bio 1, poll is
> supposed to work as expected, since bio 1 is the chained head of these
> bios, and the whole fs bio can be thought as done when bio1 .end_bio
> is called.
Yes, it is, thanks for your explanation. But besides polling for whether
the input bio has completed, one important job of the polling logic is to
reap the completion queue. Let's say one bio is split into two bios, bio 1
and bio 2, both of which are enqueued into the same hardware queue. When
polling bio 1, though we have no idea about bio 2 at all, the polling
logic is still reaping the completion queue of this hardware queue
repeatedly, so it effectively reaps bio 2 as well.

Then what if the two split bios are enqueued into two different hardware
queues? Let's say bio 1 is enqueued into hardware queue A, while bio 2 is
enqueued into hardware queue B. When polling bio 1, though the polling
logic repeatedly reaps the completion queue of hardware queue A, it does
not help reap bio 2; bio 2 is reaped by IRQ as usual. This certainly
works currently, but the behaviour may deviate from the polling design;
I'm not sure.

In other words, if we can ensure that all split bios are enqueued into
the same hardware queue, then the polling logic *may* be faster.


>
>> The iopolling logic has no idea if the input bio is splitted bio, or if
>> it has other splitted siblings. Thus ensure that all splitted bios are
>> enqueued into one CPU at the beginning.
> Yeah, that is why io poll can't work on DM.

Exactly, I'm interested in dm polling. Polling a bio submitted to a dm
device can be mapped onto polling the several underlying devices. Except
for the design of the cookie, I have not found other technical blocking
points so far. Please let me know if I missed something.
>
>> This is only one RFC patch and it is not complete since dm/mq-scheduler
>> have not been considered yet. Please let me know if it is on the correct
>> direction or not.
>>
>> Besides I have one question on the split routine. Why the split routine
>> is implemented in a recursive style? Why we can't split the bio one time
>> and then submit the *already splitted* bios one by one?
> Forward progress has to be provided on new splitted bio allocation which
> is from same bio_set.
Sorry, I can't understand this. Is this a suggestion on how to improve
this patch, or a reply to the question of why the split routine is
implemented in a recursive style? Would you please provide more details?
>
>
> Thanks,
> Ming


Thanks,

Jeffle



* Re: [RFC] block: enqueue splitted bios into same cpu
       [not found]   ` <e787faa8-d31f-04e7-f722-5013a52dc8ab@linux.alibaba.com>
@ 2020-09-13 14:00     ` Ming Lei
  2020-09-22  4:43       ` JeffleXu
  0 siblings, 1 reply; 8+ messages in thread
From: Ming Lei @ 2020-09-13 14:00 UTC (permalink / raw)
  To: JeffleXu; +Cc: axboe, linux-block

On Fri, Sep 11, 2020 at 07:40:14PM +0800, JeffleXu wrote:
> 
> Thanks for replying ;)
> 
> 
> On 9/11/20 7:01 PM, Ming Lei wrote:
> > On Fri, Sep 11, 2020 at 11:29:58AM +0800, Jeffle Xu wrote:
> > > Splitted bios of one source bio can be enqueued into different CPU since
> > > the submit_bio() routine can be preempted or fall asleep. However this
> > > behaviour can't work well with iopolling.
> > Do you have user visible problem wrt. io polling? If yes, can you
> > provide more details?
> 
> No, there's no practical example yet. It's only a hint from the code base.
> 
> 
> > > Currently block iopolling only polls the hardwar queue of the input bio.
> > > If one bio is splitted to several bios, one (bio 1) of which is enqueued
> > > into CPU A, while the others enqueued into CPU B, then the polling of bio 1
> > > will cotinuously poll the hardware queue of CPU A, though the other
> > > splitted bios may be in other hardware queues.
> > If it is guaranteed that the returned cookie is from bio 1, poll is
> > supposed to work as expected, since bio 1 is the chained head of these
> > bios, and the whole fs bio can be thought as done when bio1 .end_bio
> > is called.
> 
> Yes, it is, thanks for your explanation. But except for polling if the input
> bio has completed, one of the
> 
> important work of polling logic is to reap the completion queue. Let's say
> one bio is split into
> 
> two bios, bio 1 and bio 2, both of which are enqueued into the same hardware
> queue.When polling bio1,
> 
> though we have no idea about bio2 at all, the polling logic itself is still
> reaping the completion queue of
> 
> this hardware queue repeatedly, in which case the polling logic still
> stimulates reaping bio2.
> 
> 
> Then what if these two split bios enqueued into two different hardware
> queue? Let's say bio1 is enqueued
> 
> into hardware queue A, while bio2 is enqueued into hardware queue B. When
> polling bio1, though the polling
> 
> logic is repeatedly reaping the completion queue of hardware queue A, it
> doesn't help reap bio2. bio2 is reaped
> 
> by IRQ as usual. This certainly works currently, but this behavior may
> deviate the polling design? I'm not sure.
> 
> 
> In other words, if we can ensure that all split bios are enqueued into the
> same hardware queue, then the polling
> 
> logic *may* be faster.

__submit_bio_noacct_mq() returns the cookie from the last bio in
current->bio_list, and this bio should be the bio passed to
__submit_bio_noacct_mq() when bio splitting happens.

Suppose CPU migration happens during bio splitting: the last bio should
be submitted to the LLD much later than the other bios, so when
blk_poll() finds a completion on the hw queue of the last bio, the other
bios should usually be completed already.

Also, CPU migration itself causes much bigger latency, so it is
reasonable not to expect good IO performance when CPU migration is
involved. And CPU migration of an IO task shouldn't happen frequently.
That said, it should be fine to miss the poll in this situation.

Also the following part of your patch may not work reliably:

@@ -370,7 +370,8 @@ static struct request *__blk_mq_alloc_request(struct blk_mq_alloc_data *data)
        }

 retry:
-       data->ctx = blk_mq_get_ctx(q);
+       cpu = (data->cpu_hint != -1) ? data->cpu_hint : raw_smp_processor_id();
+       data->ctx = __blk_mq_get_ctx(q, cpu);
        data->hctx = blk_mq_map_queue(q, data->cmd_flags, data->ctx);
        if (!e)
                blk_mq_tag_busy(data->hctx);

If the cpu in data->cpu_hint is going offline, the above retry may
never finish. Also, I really don't like this way of allocating a request
on a specified cpu or ctx.

> 
> 
> > > The iopolling logic has no idea if the input bio is splitted bio, or if
> > > it has other splitted siblings. Thus ensure that all splitted bios are
> > > enqueued into one CPU at the beginning.
> > Yeah, that is why io poll can't work on DM.
> 
> Exactly I'm interested in dm polling. The polling of bio to dm device can be
> mapped into the polling of the
> 
> several underlying device. Except for the the design of the cookie,
> currently I have not found other blocking
> 
> points technically. Please let me know if I missed something.

At least dm (except for dm-mpath) doesn't use blk-mq, and so far io poll
is based on blk-mq. Not to mention it could be hard to return the
expected cookie.

> 
> 
> > 
> > > This is only one RFC patch and it is not complete since dm/mq-scheduler
> > > have not been considered yet. Please let me know if it is on the correct
> > > direction or not.
> > > 
> > > Besides I have one question on the split routine. Why the split routine
> > > is implemented in a recursive style? Why we can't split the bio one time
> > > and then submit the *already splitted* bios one by one?
> > Forward progress has to be provided on new splitted bio allocation which
> > is from same bio_set.
> 
> Sorry I can't understand this. Is this a suggestion on how to improving this
> patch, or a reply to the question
> 
> why the split routine is implemented in a recursive style? Would you please
> provide more details?

It is for preventing stack overflows.

Please take a close look at bio_alloc_bioset()'s comment and understand
why 'callers must never allocate more than 1 bio at a time from this pool'.


Thanks,
Ming



* Re: [RFC] block: enqueue splitted bios into same cpu
  2020-09-13 14:00     ` Ming Lei
@ 2020-09-22  4:43       ` JeffleXu
  2020-09-22 11:56         ` Ming Lei
  0 siblings, 1 reply; 8+ messages in thread
From: JeffleXu @ 2020-09-22  4:43 UTC (permalink / raw)
  To: Ming Lei; +Cc: axboe, linux-block

Thanks for replying. Comments embedded below.


On 9/13/20 10:00 PM, Ming Lei wrote:
> On Fri, Sep 11, 2020 at 07:40:14PM +0800, JeffleXu wrote:
>> Thanks for replying ;)
>>
>>
>> On 9/11/20 7:01 PM, Ming Lei wrote:
>>> On Fri, Sep 11, 2020 at 11:29:58AM +0800, Jeffle Xu wrote:
>>>> Splitted bios of one source bio can be enqueued into different CPU since
>>>> the submit_bio() routine can be preempted or fall asleep. However this
>>>> behaviour can't work well with iopolling.
>>> Do you have user visible problem wrt. io polling? If yes, can you
>>> provide more details?
>> No, there's no practical example yet. It's only a hint from the code base.
>>
>>
>>>> Currently block iopolling only polls the hardwar queue of the input bio.
>>>> If one bio is splitted to several bios, one (bio 1) of which is enqueued
>>>> into CPU A, while the others enqueued into CPU B, then the polling of bio 1
>>>> will cotinuously poll the hardware queue of CPU A, though the other
>>>> splitted bios may be in other hardware queues.
>>> If it is guaranteed that the returned cookie is from bio 1, poll is
>>> supposed to work as expected, since bio 1 is the chained head of these
>>> bios, and the whole fs bio can be thought as done when bio1 .end_bio
>>> is called.
>> Yes, it is, thanks for your explanation. But except for polling if the input
>> bio has completed, one of the
>>
>> important work of polling logic is to reap the completion queue. Let's say
>> one bio is split into
>>
>> two bios, bio 1 and bio 2, both of which are enqueued into the same hardware
>> queue.When polling bio1,
>>
>> though we have no idea about bio2 at all, the polling logic itself is still
>> reaping the completion queue of
>>
>> this hardware queue repeatedly, in which case the polling logic still
>> stimulates reaping bio2.
>>
>>
>> Then what if these two split bios enqueued into two different hardware
>> queue? Let's say bio1 is enqueued
>>
>> into hardware queue A, while bio2 is enqueued into hardware queue B. When
>> polling bio1, though the polling
>>
>> logic is repeatedly reaping the completion queue of hardware queue A, it
>> doesn't help reap bio2. bio2 is reaped
>>
>> by IRQ as usual. This certainly works currently, but this behavior may
>> deviate the polling design? I'm not sure.
>>
>>
>> In other words, if we can ensure that all split bios are enqueued into the
>> same hardware queue, then the polling
>>
>> logic *may* be faster.
> __submit_bio_noacct_mq() returns cookie from the last bio in current->bio_list, and
> this bio should be the bio passed to __submit_bio_noacct_mq() when bio splitting happens.
>
> Suppose CPU migration happens during bio splitting, the last bio should be
> submitted to LLD much late than other bios, so when blk_poll() finds
> completion on the hw queue of the last bio, usually other bios should
> be completed already most of times.
>
> Also CPU migration itself causes much bigger latency, so it is reasonable to
> not expect good IO performance when CPU migration is involved. And CPU migration
> on IO task shouldn't have been done frequently. That said it should be
> fine to miss the poll in this situation.

Yes, you're right. After diving into the code of the nvme driver,
currently the nvme driver indeed allocates an interrupt for polling
queues, that is, reusing the interrupt used by the admin queue.

Jens once said that the interrupt may someday be disabled for queues
working in polling mode (I heard this from my colleague). If that is
true, then this may become an issue. But at least for now this indeed
works.


>
> Also the following part of your patch may not work reliably:
>
> @@ -370,7 +370,8 @@ static struct request *__blk_mq_alloc_request(struct blk_mq_alloc_data *data)
>          }
>
>   retry:
> -       data->ctx = blk_mq_get_ctx(q);
> +       cpu = (data->cpu_hint != -1) ? data->cpu_hint : raw_smp_processor_id();
> +       data->ctx = __blk_mq_get_ctx(q, cpu);
>          data->hctx = blk_mq_map_queue(q, data->cmd_flags, data->ctx);
>          if (!e)
>                  blk_mq_tag_busy(data->hctx);
>
> If the cpu of data->cpu_hint is becoming offline, the above retry may
> never be finished. Also I really don't like this way to allocate request
> on specified cpu or ctx.
>
>>
>>>> The iopolling logic has no idea if the input bio is splitted bio, or if
>>>> it has other splitted siblings. Thus ensure that all splitted bios are
>>>> enqueued into one CPU at the beginning.
>>> Yeah, that is why io poll can't work on DM.
>> Exactly I'm interested in dm polling. The polling of bio to dm device can be
>> mapped into the polling of the
>>
>> several underlying device. Except for the the design of the cookie,
>> currently I have not found other blocking
>>
>> points technically. Please let me know if I missed something.
> At least dm(except for dm-mpath) doesn't use blk-mq , so far io poll is
> based on blk-mq. Not mention it could be hard to return the expected
> cookie.

Polling mode is important if we want to use io_uring on dm.

I want to refactor the cookie from unsigned int to u64, so that the u64
cookie can be a pointer to some structure for device mapper, e.g. struct
dm_io, and all cookies from the underlying devices can be stored in this
structure (struct dm_io).

This needs some refactoring of the io polling framework, and I'm not
sure whether the community likes this idea.


>
>>
>>>> This is only one RFC patch and it is not complete since dm/mq-scheduler
>>>> have not been considered yet. Please let me know if it is on the correct
>>>> direction or not.
>>>>
>>>> Besides I have one question on the split routine. Why the split routine
>>>> is implemented in a recursive style? Why we can't split the bio one time
>>>> and then submit the *already splitted* bios one by one?
>>> Forward progress has to be provided on new splitted bio allocation which
>>> is from same bio_set.
>> Sorry I can't understand this. Is this a suggestion on how to improving this
>> patch, or a reply to the question
>>
>> why the split routine is implemented in a recursive style? Would you please
>> provide more details?
> It is for preventing stack overflows.
>
> Please take a close look at bio_alloc_bioset's comment and understand
> why 'callers must never allocate more than 1 bio at a time from this pool'

Thanks.

Jeffle



* Re: [RFC] block: enqueue splitted bios into same cpu
  2020-09-22  4:43       ` JeffleXu
@ 2020-09-22 11:56         ` Ming Lei
  2020-09-22 12:19           ` JeffleXu
  0 siblings, 1 reply; 8+ messages in thread
From: Ming Lei @ 2020-09-22 11:56 UTC (permalink / raw)
  To: JeffleXu; +Cc: axboe, linux-block

On Tue, Sep 22, 2020 at 12:43:37PM +0800, JeffleXu wrote:
> Thanks for replying. Comments embedded below.
> 
> 
> On 9/13/20 10:00 PM, Ming Lei wrote:
> > On Fri, Sep 11, 2020 at 07:40:14PM +0800, JeffleXu wrote:
> > > Thanks for replying ;)
> > > 
> > > 
> > > On 9/11/20 7:01 PM, Ming Lei wrote:
> > > > On Fri, Sep 11, 2020 at 11:29:58AM +0800, Jeffle Xu wrote:
> > > > > Splitted bios of one source bio can be enqueued into different CPU since
> > > > > the submit_bio() routine can be preempted or fall asleep. However this
> > > > > behaviour can't work well with iopolling.
> > > > Do you have user visible problem wrt. io polling? If yes, can you
> > > > provide more details?
> > > No, there's no practical example yet. It's only a hint from the code base.
> > > 
> > > 
> > > > > Currently block iopolling only polls the hardwar queue of the input bio.
> > > > > If one bio is splitted to several bios, one (bio 1) of which is enqueued
> > > > > into CPU A, while the others enqueued into CPU B, then the polling of bio 1
> > > > > will cotinuously poll the hardware queue of CPU A, though the other
> > > > > splitted bios may be in other hardware queues.
> > > > If it is guaranteed that the returned cookie is from bio 1, poll is
> > > > supposed to work as expected, since bio 1 is the chained head of these
> > > > bios, and the whole fs bio can be thought as done when bio1 .end_bio
> > > > is called.
> > > Yes, it is, thanks for your explanation. But except for polling if the input
> > > bio has completed, one of the
> > > 
> > > important work of polling logic is to reap the completion queue. Let's say
> > > one bio is split into
> > > 
> > > two bios, bio 1 and bio 2, both of which are enqueued into the same hardware
> > > queue.When polling bio1,
> > > 
> > > though we have no idea about bio2 at all, the polling logic itself is still
> > > reaping the completion queue of
> > > 
> > > this hardware queue repeatedly, in which case the polling logic still
> > > stimulates reaping bio2.
> > > 
> > > 
> > > Then what if these two split bios enqueued into two different hardware
> > > queue? Let's say bio1 is enqueued
> > > 
> > > into hardware queue A, while bio2 is enqueued into hardware queue B. When
> > > polling bio1, though the polling
> > > 
> > > logic is repeatedly reaping the completion queue of hardware queue A, it
> > > doesn't help reap bio2. bio2 is reaped
> > > 
> > > by IRQ as usual. This certainly works currently, but this behavior may
> > > deviate the polling design? I'm not sure.
> > > 
> > > 
> > > In other words, if we can ensure that all split bios are enqueued into the
> > > same hardware queue, then the polling
> > > 
> > > logic *may* be faster.
> > __submit_bio_noacct_mq() returns cookie from the last bio in current->bio_list, and
> > this bio should be the bio passed to __submit_bio_noacct_mq() when bio splitting happens.
> > 
> > Suppose CPU migration happens during bio splitting, the last bio should be
> > submitted to LLD much late than other bios, so when blk_poll() finds
> > completion on the hw queue of the last bio, usually other bios should
> > be completed already most of times.
> > 
> > Also CPU migration itself causes much bigger latency, so it is reasonable to
> > not expect good IO performance when CPU migration is involved. And CPU migration
> > on IO task shouldn't have been done frequently. That said it should be
> > fine to miss the poll in this situation.
> 
> Yes you're right. After diving into the code of nvme driver, currently nvme
> driver indeed allocate interrupt for polling queues,

No, the nvme driver doesn't allocate interrupts for poll queues; please
see nvme_setup_irqs().

> 
> that is, reusing the interrupt used by admin queue.
> 
> Jens had ever said that the interrupt may be disabled for queues working in
> polling mode someday (from my colleague). If
> 
> that is true, then this may become an issue. But at least now this indeed
> works.

What is the issue?


Thanks, 
Ming



* Re: [RFC] block: enqueue splitted bios into same cpu
  2020-09-22 11:56         ` Ming Lei
@ 2020-09-22 12:19           ` JeffleXu
  2020-09-23  7:15             ` Ming Lei
  0 siblings, 1 reply; 8+ messages in thread
From: JeffleXu @ 2020-09-22 12:19 UTC (permalink / raw)
  To: Ming Lei; +Cc: axboe, linux-block


On 9/22/20 7:56 PM, Ming Lei wrote:
> On Tue, Sep 22, 2020 at 12:43:37PM +0800, JeffleXu wrote:
>> Thanks for replying. Comments embedded below.
>>
>>
>> On 9/13/20 10:00 PM, Ming Lei wrote:
>>> On Fri, Sep 11, 2020 at 07:40:14PM +0800, JeffleXu wrote:
>>>> Thanks for replying ;)
>>>>
>>>>
>>>> On 9/11/20 7:01 PM, Ming Lei wrote:
>>>>> On Fri, Sep 11, 2020 at 11:29:58AM +0800, Jeffle Xu wrote:
>>>>>> Splitted bios of one source bio can be enqueued into different CPU since
>>>>>> the submit_bio() routine can be preempted or fall asleep. However this
>>>>>> behaviour can't work well with iopolling.
>>>>> Do you have user visible problem wrt. io polling? If yes, can you
>>>>> provide more details?
>>>> No, there's no practical example yet. It's only a hint from the code base.
>>>>
>>>>
>>>>>> Currently block iopolling only polls the hardwar queue of the input bio.
>>>>>> If one bio is splitted to several bios, one (bio 1) of which is enqueued
>>>>>> into CPU A, while the others enqueued into CPU B, then the polling of bio 1
>>>>>> will cotinuously poll the hardware queue of CPU A, though the other
>>>>>> splitted bios may be in other hardware queues.
>>>>> If it is guaranteed that the returned cookie is from bio 1, poll is
>>>>> supposed to work as expected, since bio 1 is the chained head of these
>>>>> bios, and the whole fs bio can be thought as done when bio1 .end_bio
>>>>> is called.
>>>> Yes, it is, thanks for your explanation. But except for polling if the input
>>>> bio has completed, one of the
>>>>
>>>> important work of polling logic is to reap the completion queue. Let's say
>>>> one bio is split into
>>>>
>>>> two bios, bio 1 and bio 2, both of which are enqueued into the same hardware
>>>> queue.When polling bio1,
>>>>
>>>> though we have no idea about bio2 at all, the polling logic itself is still
>>>> reaping the completion queue of
>>>>
>>>> this hardware queue repeatedly, in which case the polling logic still
>>>> stimulates reaping bio2.
>>>>
>>>>
>>>> Then what if these two split bios enqueued into two different hardware
>>>> queue? Let's say bio1 is enqueued
>>>>
>>>> into hardware queue A, while bio2 is enqueued into hardware queue B. When
>>>> polling bio1, though the polling
>>>>
>>>> logic is repeatedly reaping the completion queue of hardware queue A, it
>>>> doesn't help reap bio2. bio2 is reaped
>>>>
>>>> by IRQ as usual. This certainly works currently, but this behavior may
>>>> deviate the polling design? I'm not sure.
>>>>
>>>>
>>>> In other words, if we can ensure that all split bios are enqueued into the
>>>> same hardware queue, then the polling
>>>>
>>>> logic *may* be faster.
>>> __submit_bio_noacct_mq() returns the cookie from the last bio in
>>> current->bio_list, and this bio should be the one passed to
>>> __submit_bio_noacct_mq() when bio splitting happens.
>>>
>>> Suppose CPU migration happens during bio splitting; the last bio would
>>> be submitted to the LLD much later than the other bios, so by the time
>>> blk_poll() finds a completion on the hw queue of the last bio, the
>>> other bios will usually have completed already.
>>>
>>> Also, CPU migration itself causes much bigger latency, so it is
>>> reasonable not to expect good IO performance when CPU migration is
>>> involved. And CPU migration of an IO task shouldn't happen frequently.
>>> That said, it should be fine to miss the poll in this situation.
>> Yes, you're right. After diving into the nvme driver code: currently the
>> nvme driver does indeed allocate an interrupt for polling queues,
> No, nvme driver doesn't allocate interrupt for poll queues, please see
> nvme_setup_irqs().

Sorry, I was wrong here. Indeed interrupts are disabled for IO queues in
polling mode. Then this can be a problem.

If CPU migration happens, the split bios can be enqueued into different
polling hardware queues (e.g. queue 1 and queue 2). The caller continuously
polls one of the polling hardware queues (e.g. queue 1), as indicated by
the returned cookie. If there's no other thread polling on the other
hardware queue (e.g. queue 2), the split bio on queue 2 will never be
reaped, since the interrupt of queue 2 is disabled. Finally the completion
of this bio (on queue 2) relies on the timeout mechanism.

>
>> that is, reusing the interrupt used by the admin queue.
>>
>> Jens once said that the interrupt may someday be disabled for queues
>> working in polling mode (I heard this from my colleague). If that is
>> true, then this may become an issue. But at least for now this indeed
>> works.
> What is the issue?

Same issue described above.


>
>
> Thanks,
> Ming

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC] block: enqueue splitted bios into same cpu
  2020-09-22 12:19           ` JeffleXu
@ 2020-09-23  7:15             ` Ming Lei
  0 siblings, 0 replies; 8+ messages in thread
From: Ming Lei @ 2020-09-23  7:15 UTC (permalink / raw)
  To: JeffleXu; +Cc: axboe, linux-block

On Tue, Sep 22, 2020 at 08:19:00PM +0800, JeffleXu wrote:
> 
> On 9/22/20 7:56 PM, Ming Lei wrote:
> > On Tue, Sep 22, 2020 at 12:43:37PM +0800, JeffleXu wrote:
> > > Thanks for replying. Comments embedded below.
> > > 
> > > 
> > > On 9/13/20 10:00 PM, Ming Lei wrote:
> > > > On Fri, Sep 11, 2020 at 07:40:14PM +0800, JeffleXu wrote:
> > > > > Thanks for replying ;)
> > > > > 
> > > > > 
> > > > > On 9/11/20 7:01 PM, Ming Lei wrote:
> > > > > > On Fri, Sep 11, 2020 at 11:29:58AM +0800, Jeffle Xu wrote:
> > > > > > > Split bios of one source bio can be enqueued into different CPUs, since
> > > > > > > the submit_bio() routine can be preempted or fall asleep. However this
> > > > > > > behaviour can't work well with iopolling.
> > > > > > Do you have user visible problem wrt. io polling? If yes, can you
> > > > > > provide more details?
> > > > > No, there's no practical example yet. It's only a hint from the code base.
> > > > > 
> > > > > 
> > > > > > > Currently block iopolling only polls the hardware queue of the input bio.
> > > > > > > If one bio is split into several bios, one of which (bio 1) is enqueued
> > > > > > > into CPU A while the others are enqueued into CPU B, then polling bio 1
> > > > > > > will continuously poll the hardware queue of CPU A, though the other
> > > > > > > split bios may be in other hardware queues.
> > > > > > If it is guaranteed that the returned cookie is from bio 1, poll is
> > > > > > supposed to work as expected, since bio 1 is the chained head of these
> > > > > > bios, and the whole fs bio can be thought of as done when bio 1's
> > > > > > .end_bio is called.
> > > > > Yes, it is, thanks for your explanation. But besides polling whether
> > > > > the input bio has completed, one important job of the polling logic
> > > > > is to reap the completion queue. Let's say one bio is split into two
> > > > > bios, bio 1 and bio 2, both of which are enqueued into the same
> > > > > hardware queue. When polling bio 1, though we have no idea about
> > > > > bio 2 at all, the polling logic is still reaping the completion
> > > > > queue of this hardware queue repeatedly, in which case polling
> > > > > still ends up reaping bio 2.
> > > > >
> > > > > Then what if these two split bios are enqueued into two different
> > > > > hardware queues? Let's say bio 1 is enqueued into hardware queue A,
> > > > > while bio 2 is enqueued into hardware queue B. When polling bio 1,
> > > > > though the polling logic is repeatedly reaping the completion queue
> > > > > of hardware queue A, it doesn't help reap bio 2; bio 2 is reaped by
> > > > > IRQ as usual. This certainly works currently, but the behavior may
> > > > > deviate from the polling design? I'm not sure.
> > > > >
> > > > > In other words, if we can ensure that all split bios are enqueued
> > > > > into the same hardware queue, then the polling logic *may* be
> > > > > faster.
> > > > __submit_bio_noacct_mq() returns the cookie from the last bio in
> > > > current->bio_list, and this bio should be the one passed to
> > > > __submit_bio_noacct_mq() when bio splitting happens.
> > > >
> > > > Suppose CPU migration happens during bio splitting; the last bio
> > > > would be submitted to the LLD much later than the other bios, so by
> > > > the time blk_poll() finds a completion on the hw queue of the last
> > > > bio, the other bios will usually have completed already.
> > > >
> > > > Also, CPU migration itself causes much bigger latency, so it is
> > > > reasonable not to expect good IO performance when CPU migration is
> > > > involved. And CPU migration of an IO task shouldn't happen
> > > > frequently. That said, it should be fine to miss the poll in this
> > > > situation.
> > > Yes, you're right. After diving into the nvme driver code: currently
> > > the nvme driver does indeed allocate an interrupt for polling queues,
> > No, nvme driver doesn't allocate interrupt for poll queues, please see
> > nvme_setup_irqs().
> 
> Sorry, I was wrong here. Indeed interrupts are disabled for IO queues in
> polling mode. Then this can be a problem.
> 
> If CPU migration happens, the split bios can be enqueued into different
> polling hardware queues (e.g. queue 1 and queue 2). The caller
> continuously polls one of the polling hardware queues (e.g. queue 1), as
> indicated by the returned cookie. If there's no other thread polling on
> the other hardware queue (e.g. queue 2), the split bio on queue 2 will
> never be reaped, since the interrupt of queue 2 is disabled. Finally the
> completion of this bio (on queue 2) relies on the timeout mechanism.

OK, this looks like a real issue. I just found that requests can only be
completed explicitly in nvme_mq_ops.poll(). Without calling ->poll() on the
specified poll hw queue, requests can't be completed at all.


Thanks,
Ming


^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2020-09-23  7:16 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-11  3:29 [RFC] block: enqueue splitted bios into same cpu Jeffle Xu
2020-09-11 11:01 ` Ming Lei
2020-09-11 11:46   ` JeffleXu
     [not found]   ` <e787faa8-d31f-04e7-f722-5013a52dc8ab@linux.alibaba.com>
2020-09-13 14:00     ` Ming Lei
2020-09-22  4:43       ` JeffleXu
2020-09-22 11:56         ` Ming Lei
2020-09-22 12:19           ` JeffleXu
2020-09-23  7:15             ` Ming Lei
