* [PATCH for-4.4] block: split bios to max possible length
@ 2016-01-04 18:24 Keith Busch
  2016-01-05  4:54 ` Ming Lei
  0 siblings, 1 reply; 15+ messages in thread
From: Keith Busch @ 2016-01-04 18:24 UTC (permalink / raw)


This allows bio splits in the middle of a vector to form the largest
possible bio at the h/w's desired alignment, and guarantees the bio being
split will have some data. Previously, if the first vector's length was
greater than the allowable amount, the bio would split at a zero length
and hit a kernel BUG.

The length check is moved after the SG_GAPS check so that check doesn't
need to be duplicated in the split case.

Fixes: d3805611130af9b911e908af9f67a3f64f4f0914
bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=110231

Signed-off-by: Keith Busch <keith.busch@intel.com>
---
 block/blk-merge.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/block/blk-merge.c b/block/blk-merge.c
index e73846a..e886a7d 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -81,9 +81,6 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
 	struct bio *new = NULL;
 
 	bio_for_each_segment(bv, bio, iter) {
-		if (sectors + (bv.bv_len >> 9) > blk_max_size_offset(q, bio->bi_iter.bi_sector))
-			goto split;
-
 		/*
 		 * If the queue doesn't support SG gaps and adding this
 		 * offset would create a gap, disallow it.
@@ -91,6 +88,17 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
 		if (bvprvp && bvec_gap_to_prev(q, bvprvp, bv.bv_offset))
 			goto split;
 
+		if (sectors + (bv.bv_len >> 9) > blk_max_size_offset(q, bio->bi_iter.bi_sector)) {
+			/*
+			 * Consider this a new segment if we're taking any part
+			 * of this vector.
+			 */
+			if (sectors < blk_max_size_offset(q, bio->bi_iter.bi_sector))
+				++nsegs;
+			sectors = blk_max_size_offset(q, bio->bi_iter.bi_sector);
+			goto split;
+		}
+
 		if (bvprvp && blk_queue_cluster(q)) {
 			if (seg_size + bv.bv_len > queue_max_segment_size(q))
 				goto new_segment;
-- 
2.6.2.307.g37023ba

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH for-4.4] block: split bios to max possible length
  2016-01-04 18:24 [PATCH for-4.4] block: split bios to max possible length Keith Busch
@ 2016-01-05  4:54 ` Ming Lei
  2016-01-05 15:09   ` Keith Busch
  0 siblings, 1 reply; 15+ messages in thread
From: Ming Lei @ 2016-01-05  4:54 UTC (permalink / raw)


On Tue, Jan 5, 2016 at 2:24 AM, Keith Busch <keith.busch@intel.com> wrote:
> This allows bio splits in the middle of a vector to form the largest

Wrt. the current block stack, one segment always holds one or more whole
bvecs, never part of a bvec, so it's better to be careful with this handling.

> possible bio at the h/w's desired alignment, and guarantees the bio being
> split will have some data. Previously, if the first vector's length was
> greater than the allowable amount, the bio would split at a zero length
> and hit a kernel BUG.

That was introduced by d3805611130a; previously a zero-length split couldn't
happen because queue_max_sectors() is at least one PAGE_SIZE.

>
> The length check is moved after the SG_GAPS check so that check doesn't
> need to be duplicated in the split case.
>
> Fixes: d3805611130af9b911e908af9f67a3f64f4f0914
> bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=110231
>
> Signed-off-by: Keith Busch <keith.busch@intel.com>
> ---
>  block/blk-merge.c | 14 +++++++++++---
>  1 file changed, 11 insertions(+), 3 deletions(-)
>
> diff --git a/block/blk-merge.c b/block/blk-merge.c
> index e73846a..e886a7d 100644
> --- a/block/blk-merge.c
> +++ b/block/blk-merge.c
> @@ -81,9 +81,6 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
>         struct bio *new = NULL;
>
>         bio_for_each_segment(bv, bio, iter) {
> -               if (sectors + (bv.bv_len >> 9) > blk_max_size_offset(q, bio->bi_iter.bi_sector))
> -                       goto split;
> -
>                 /*
>                  * If the queue doesn't support SG gaps and adding this
>                  * offset would create a gap, disallow it.
> @@ -91,6 +88,17 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
>                 if (bvprvp && bvec_gap_to_prev(q, bvprvp, bv.bv_offset))
>                         goto split;
>
> +               if (sectors + (bv.bv_len >> 9) > blk_max_size_offset(q, bio->bi_iter.bi_sector)) {
> +                       /*
> +                        * Consider this a new segment if we're taking any part
> +                        * of this vector.
> +                        */
> +                       if (sectors < blk_max_size_offset(q, bio->bi_iter.bi_sector))
> +                               ++nsegs;

If the segment count is increased, 'goto new_segment' should have been
taken to update the other variables (seg_size, bvprvp, ...).

> +                       sectors = blk_max_size_offset(q, bio->bi_iter.bi_sector);
> +                       goto split;
> +               }
> +
>                 if (bvprvp && blk_queue_cluster(q)) {
>                         if (seg_size + bv.bv_len > queue_max_segment_size(q))
>                                 goto new_segment;
> --
> 2.6.2.307.g37023ba
>



-- 
Ming Lei


* [PATCH for-4.4] block: split bios to max possible length
  2016-01-05  4:54 ` Ming Lei
@ 2016-01-05 15:09   ` Keith Busch
  2016-01-06  2:17     ` Ming Lei
  0 siblings, 1 reply; 15+ messages in thread
From: Keith Busch @ 2016-01-05 15:09 UTC (permalink / raw)


On Tue, Jan 05, 2016 at 12:54:53PM +0800, Ming Lei wrote:
> On Tue, Jan 5, 2016 at 2:24 AM, Keith Busch <keith.busch@intel.com> wrote:
> > This allows bio splits in the middle of a vector to form the largest
> 
> Wrt. the current block stack, one segment always hold one or more bvec,
> instead of part of bvec, so better to be careful about this handling.

Hi Ming,

Could you help me understand your concern here? If we split a vector
somewhere in the middle, it becomes two different bvecs. The first is
the last segment in the first bio, the second is the first segment in
the split bio, right?

It's not necessarily a new segment if it is physically contiguous with
the previous (if it exists at all), but duplicating the logic to coalesce
addresses doesn't seem to be worth that optimization.
 
> > possible bio at the h/w's desired alignment, and guarantees the bio being
> > split will have some data. Previously, if the first vector's length was
> > greater than the allowable amount, the bio would split at a zero length
> > and hit a kernel BUG.
> 
> That is introduced by d3805611130a, and zero length can't be splitted
> previously because queue_max_sectors() is at least one PAGE_SIZE.

Can a bvec's length exceed a PAGE_SIZE? They point to pages, so I
suppose not.

But it should be more efficient to split to the largest allowed by the
hardware. We can contrive a scenario where a bio would be split many
times more than necessary without this patch. Let's say queue_max_sectors
is a PAGE_SIZE, and we want to submit '2 * PAGE_SIZE' worth of data
addressed in 3 bvecs. Previously that would split three times; now it
will split only twice.


* [PATCH for-4.4] block: split bios to max possible length
  2016-01-05 15:09   ` Keith Busch
@ 2016-01-06  2:17     ` Ming Lei
  2016-01-06  5:51       ` Keith Busch
  0 siblings, 1 reply; 15+ messages in thread
From: Ming Lei @ 2016-01-06  2:17 UTC (permalink / raw)


Hi Keith,

On Tue, Jan 5, 2016 at 11:09 PM, Keith Busch <keith.busch@intel.com> wrote:
> On Tue, Jan 05, 2016 at 12:54:53PM +0800, Ming Lei wrote:
>> On Tue, Jan 5, 2016 at 2:24 AM, Keith Busch <keith.busch@intel.com> wrote:
>> > This allows bio splits in the middle of a vector to form the largest
>>
>> Wrt. the current block stack, one segment always hold one or more bvec,
>> instead of part of bvec, so better to be careful about this handling.
>
> Hi Ming,
>
> Could you help me understand your concern here? If we split a vector
> somewhere in the middle, it becomes two different bvecs. The first is
> the last segment in the first bio, the second is the first segment in
> the split bio, right?

Firstly, we never split a single bio vector before bio splitting was
introduced.

Secondly, the current bio split still doesn't support splitting one single
bvec into two; it just makes the two bios share the original bvec table.
Please see bio_split(), which calls bio_clone_fast() to do that, and note
that the bvec table is immutable at that point.

>
> It's not necessarily a new segment if it is physically contiguous with
> the previous (if it exists at all), but duplicating the logic to coalesce
> addresses doesn't seem to be worth that optimization.

I understand your motivation in the two patches. Actually, before bio
splitting we didn't do SG merge for nvme because of the NO_SG_MERGE flag,
which has been ignored since bio splitting was introduced. So could you
check whether nvme performance is still good with NO_SG_MERGE put back
in blk_bio_segment_split()? The change should be simple, like the
attached patch.

>
>> > possible bio at the h/w's desired alignment, and guarantees the bio being
>> > split will have some data. Previously, if the first vector's length was
>> > greater than the allowable amount, the bio would split at a zero length
>> > and hit a kernel BUG.
>>
>> That is introduced by d3805611130a, and zero length can't be splitted
>> previously because queue_max_sectors() is at least one PAGE_SIZE.
>
> Can a bvec's length exceed a PAGE_SIZE? They point to pages, so I
> suppose not.

No, it can't, but blk_max_size_offset() may be less than PAGE_SIZE, and
then a zero-length split is triggered.

>
> But it should be more efficient to split to the largest allowed by the
> hardware. We can contrive a scenario where a bio would be split many

Previously Jens took the opposite approach, making each bvec one segment,
and he mentioned that performance increased.

> times more than necessary without this patch. Let's say queue_max_sectors
> is a PAGE_SIZE, and we want to submit '2 * PAGE_SIZE' worth of data
> addressed in 3 bvecs. Previously that would split three times; now it
> will split only twice.

IMO, splitting is quite cheap; alternatively, we could still increase the
queue_max_sectors() limit to the hardware-allowed value.


-- 
Ming Lei
-------------- next part --------------
A non-text attachment was scrubbed...
Name: blk-no-sg-merge.patch
Type: text/x-patch
Size: 642 bytes
Desc: not available
URL: <http://lists.infradead.org/pipermail/linux-nvme/attachments/20160106/a03321cd/attachment.bin>


* [PATCH for-4.4] block: split bios to max possible length
  2016-01-06  2:17     ` Ming Lei
@ 2016-01-06  5:51       ` Keith Busch
  2016-01-06  6:29         ` Ming Lei
  2016-01-06  9:46         ` Ming Lei
  0 siblings, 2 replies; 15+ messages in thread
From: Keith Busch @ 2016-01-06  5:51 UTC (permalink / raw)


On Wed, Jan 06, 2016 at 10:17:51AM +0800, Ming Lei wrote:
> Firstly we didn't split one single bio vector before bio splitting.
>
> Secondly, current bio split still doesn't support to split one single
> bvec into two, and it just makes the two bios shared the original
> bvec table, please see bio_split(), which calls bio_clone_fast()
> to do that, and the bvec table has been immutable at that time.

You are saying we can't split a bio in the middle of a vector?
bvec_iter_advance() says we can split anywhere.
 
> I understand your motivation in the two patches, actually before bio splitting,
> we don't do sg merge for nvme because of the flag of NO_SG_MERGE,
> which is ignored after bio splitting is introduced. So could you check if
> the nvme performance can be good by putting NO_SG_MERGE back
> in blk_bio_segment_split()? And the change should be simple, like the
> attached patch.

The nvme driver is okay taking physically merged pages. It splits them
into PRPs accordingly, and it's faster to DMA map physically contiguous
ranges simply because there are fewer segments to iterate, so NVMe would
prefer to let them coalesce.
 
> IMO, splitting is quite cheap,

Splitting is absolutely not cheap if your media is fast enough. I measure
every % of increased CPU instructions as the same % decrease in IOPs
with 3DXpoint.

But this patch was to split on stripe boundaries, which is an even worse
penalty if we get the split wrong.

> +	bool no_sg_merge = !!test_bit(QUEUE_FLAG_NO_SG_MERGE, &q->queue_flags);
>  
>  	bio_for_each_segment(bv, bio, iter) {
> -		if (sectors + (bv.bv_len >> 9) > blk_max_size_offset(q, bio->bi_iter.bi_sector))
> +		if (no_sg_merge)
> +			goto new_segment;

Bad idea for NVMe. We need to split on SG gaps, which you've skipped,
or you will hit a BUG_ON in the NVMe driver.


* [PATCH for-4.4] block: split bios to max possible length
  2016-01-06  5:51       ` Keith Busch
@ 2016-01-06  6:29         ` Ming Lei
  2016-01-06  6:53           ` Ming Lei
  2016-01-06 15:29           ` Keith Busch
  2016-01-06  9:46         ` Ming Lei
  1 sibling, 2 replies; 15+ messages in thread
From: Ming Lei @ 2016-01-06  6:29 UTC (permalink / raw)


On Wed, Jan 6, 2016 at 1:51 PM, Keith Busch <keith.busch@intel.com> wrote:
> On Wed, Jan 06, 2016 at 10:17:51AM +0800, Ming Lei wrote:
>> Firstly we didn't split one single bio vector before bio splitting.
>>
>> Secondly, current bio split still doesn't support to split one single
>> bvec into two, and it just makes the two bios shared the original
>> bvec table, please see bio_split(), which calls bio_clone_fast()
>> to do that, and the bvec table has been immutable at that time.
>
> You are saying we can't split a bio in the middle of a vector?

I mean the current block stack may not be ready for that.

> bvec_iter_advance() says we can split anywhere.

bvec_iter_advance() can split anywhere, but the split bios may then share
the same bvec, which may cause trouble; for example,
BIOVEC_PHYS_MERGEABLE() may not work well, and
blk_phys_contig_segment() too.

Not to mention that your patch doesn't update front_seg_size/back_seg_size/
seg_size correctly.

That is why I said we should be careful about splitting a bvec: it was
never used or supported before.

>
>> I understand your motivation in the two patches, actually before bio splitting,
>> we don't do sg merge for nvme because of the flag of NO_SG_MERGE,
>> which is ignored after bio splitting is introduced. So could you check if
>> the nvme performance can be good by putting NO_SG_MERGE back
>> in blk_bio_segment_split()? And the change should be simple, like the
>> attached patch.
>
> The nvme driver is okay to take physically merged pages. It splits them
> into PRPs accordingly, and it's faster to DMA map physically contiguous
> just because there are fewer segments to iterate, so NVMe would prefer
> to let them coalesce.

If so, NO_SG_MERGE should be removed now.

>
>> IMO, splitting is quite cheap,
>
> Splitting is absolutely not cheap if your media is fast enough. I measure
> every % of increased CPU instructions as the same % decrease in IOPs
> with 3DXpoint.

Could you share your test case? Last time I used null_blk to observe
the performance difference between NO_SG_MERGE and non-NO_SG_MERGE,
and basically no difference was observed.

>
> But this patch was to split on stripe boundaries, which is an even worse
> penalty if we get the split wrong.
>
>> +     bool no_sg_merge = !!test_bit(QUEUE_FLAG_NO_SG_MERGE, &q->queue_flags);
>>
>>       bio_for_each_segment(bv, bio, iter) {
>> -             if (sectors + (bv.bv_len >> 9) > blk_max_size_offset(q, bio->bi_iter.bi_sector))
>> +             if (no_sg_merge)
>> +                     goto new_segment;
>
> Bad idea for NVMe. We need to split on SG gaps, which you've skipped,
> or you will BUG_ON in the NVMe driver.

I don't object to the idea of splitting on an SG gap; I mean we should
make sure the implementation is correct.

Thanks,
Ming Lei


* [PATCH for-4.4] block: split bios to max possible length
  2016-01-06  6:29         ` Ming Lei
@ 2016-01-06  6:53           ` Ming Lei
  2016-01-06 15:18             ` Keith Busch
  2016-01-06 15:29           ` Keith Busch
  1 sibling, 1 reply; 15+ messages in thread
From: Ming Lei @ 2016-01-06  6:53 UTC (permalink / raw)


On Wed, Jan 6, 2016 at 2:29 PM, Ming Lei <tom.leiming@gmail.com> wrote:
> On Wed, Jan 6, 2016 at 1:51 PM, Keith Busch <keith.busch@intel.com> wrote:
>> On Wed, Jan 06, 2016 at 10:17:51AM +0800, Ming Lei wrote:
>>> Firstly we didn't split one single bio vector before bio splitting.
>>>
>>> Secondly, current bio split still doesn't support to split one single
>>> bvec into two, and it just makes the two bios shared the original
>>> bvec table, please see bio_split(), which calls bio_clone_fast()
>>> to do that, and the bvec table has been immutable at that time.
>>
>> You are saying we can't split a bio in the middle of a vector?
>
> I mean the current block stack may not be ready for that.
>
>> bvec_iter_advance() says we can split anywhere.
>
> bvec_iter_advance() can split anywhere, but the splitted bios may
> cross one same bvec, which may cause trouble, for example,
> BIOVEC_PHYS_MERGEABLE() may not work well and
> blk_phys_contig_segment() too.

blk_rq_map_sg() isn't ready for splitting in the middle of a bvec either.

Thanks,


* [PATCH for-4.4] block: split bios to max possible length
  2016-01-06  5:51       ` Keith Busch
  2016-01-06  6:29         ` Ming Lei
@ 2016-01-06  9:46         ` Ming Lei
  1 sibling, 0 replies; 15+ messages in thread
From: Ming Lei @ 2016-01-06  9:46 UTC (permalink / raw)


On Wed, 6 Jan 2016 05:51:34 +0000
Keith Busch <keith.busch@intel.com> wrote:

> On Wed, Jan 06, 2016@10:17:51AM +0800, Ming Lei wrote:
> 
> But this patch was to split on stripe boundaries, which is an even worse
> penalty if we get the split wrong.
> 
> > +	bool no_sg_merge = !!test_bit(QUEUE_FLAG_NO_SG_MERGE, &q->queue_flags);
> >  
> >  	bio_for_each_segment(bv, bio, iter) {
> > -		if (sectors + (bv.bv_len >> 9) > blk_max_size_offset(q, bio->bi_iter.bi_sector))
> > +		if (no_sg_merge)
> > +			goto new_segment;
> 
> Bad idea for NVMe. We need to split on SG gaps, which you've skipped,
> or you will BUG_ON in the NVMe driver.

Given that it is close to the v4.4 release, I propose the following fix
instead of the bvec splitting approach, which the current block stack is
not ready for.

-----
 block/blk-merge.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/block/blk-merge.c b/block/blk-merge.c
index e73846a..9ffe431 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -81,8 +81,16 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
 	struct bio *new = NULL;
 
 	bio_for_each_segment(bv, bio, iter) {
-		if (sectors + (bv.bv_len >> 9) > blk_max_size_offset(q, bio->bi_iter.bi_sector))
+		if (sectors + (bv.bv_len >> 9) > blk_max_size_offset(q,
+					bio->bi_iter.bi_sector)) {
+			/*
+			 * avoid to split out zero length bio if size to
+			 * chunk boundary is too small
+			 */
+			if (!sectors)
+				goto new_segment;
 			goto split;
+		}
 
 		/*
 		 * If the queue doesn't support SG gaps and adding this
-- 
1.9.1


* [PATCH for-4.4] block: split bios to max possible length
  2016-01-06  6:53           ` Ming Lei
@ 2016-01-06 15:18             ` Keith Busch
  2016-01-06 15:43               ` Ming Lei
  0 siblings, 1 reply; 15+ messages in thread
From: Keith Busch @ 2016-01-06 15:18 UTC (permalink / raw)


On Wed, Jan 06, 2016 at 02:53:22PM +0800, Ming Lei wrote:
> On Wed, Jan 6, 2016 at 2:29 PM, Ming Lei <tom.leiming@gmail.com> wrote:
> >
> > bvec_iter_advance() can split anywhere, but the splitted bios may
> > cross one same bvec, which may cause trouble, 

Cross? One ends where the other begins. Could you please explain what's
wrong?

> > for example,
> > BIOVEC_PHYS_MERGEABLE() may not work well and
> > blk_phys_contig_segment() too.

Could you please explain why it may not work well? I have no idea what
concern you have with BIOVEC_PHYS_MERGEABLE in this case.

The only place blk_phys_contig_segment is called from is
ll_merge_requests, and the first req of the split bio wouldn't even go
through here because it's marked REQ_NOMERGE.

> blk_rq_map_sg() isn't ready for splitting in the middle of bvec too.

Could you please explain? It just uses the bio iterator, which I'm pretty
sure is not broken.


* [PATCH for-4.4] block: split bios to max possible length
  2016-01-06  6:29         ` Ming Lei
  2016-01-06  6:53           ` Ming Lei
@ 2016-01-06 15:29           ` Keith Busch
  1 sibling, 0 replies; 15+ messages in thread
From: Keith Busch @ 2016-01-06 15:29 UTC (permalink / raw)


On Wed, Jan 06, 2016 at 02:29:16PM +0800, Ming Lei wrote:
> Also not mention your patch doesn't update front_seg_size/back_seg_size/
> seg_size correctly.
> 
> That is why I said we should be careful about splitting bvec because it
> is never used or supported before.

Ok, that's an easy fix, but it's not really a functional difference. The
request is not mergeable after this, which is the only use for back and
front seg sizes.

> > The nvme driver is okay to take physically merged pages. It splits them
> > into PRPs accordingly, and it's faster to DMA map physically contiguous
> > just because there are fewer segments to iterate, so NVMe would prefer
> > to let them coalesce.
> 
> If so, NO_SG_MERGE should be removed now.
 
Sounds good.

> Could you share your test case? Last time I use null_blk to observe
> the performance difference between NO_SG_MERGE vs. non-NO_SG_MERGE,
> and basically no difference is observed.

null_blk doesn't do anything, so it's probably difficult to measure
that. Use a driver that DMA maps the data, uses MMIO to submit to h/w
and takes an interrupt. You can measure all these when you create 50%
more commands than necessary with unneeded splits.


* [PATCH for-4.4] block: split bios to max possible length
  2016-01-06 15:18             ` Keith Busch
@ 2016-01-06 15:43               ` Ming Lei
  2016-01-06 16:21                 ` Keith Busch
  0 siblings, 1 reply; 15+ messages in thread
From: Ming Lei @ 2016-01-06 15:43 UTC (permalink / raw)


On Wed, Jan 6, 2016 at 11:18 PM, Keith Busch <keith.busch@intel.com> wrote:
> On Wed, Jan 06, 2016 at 02:53:22PM +0800, Ming Lei wrote:
>> On Wed, Jan 6, 2016 at 2:29 PM, Ming Lei <tom.leiming@gmail.com> wrote:
>> >
>> > bvec_iter_advance() can split anywhere, but the splitted bios may
>> > cross one same bvec, which may cause trouble,
>
> Cross? One starts where the other starts. Could you please explain what's
> wrong?

Suppose a bio is split into bio1 and bio2; then the last bvec of bio1 is
the same as the first bvec of bio2.

>
>> > for example,
>> > BIOVEC_PHYS_MERGEABLE() may not work well and
>> > blk_phys_contig_segment() too.
>
> Could you please explain why it may not work well? I have no idea what
> concern you have with BIOVEC_PHYS_MERGEABLE in this case.
>
> The only place blk_phys_contig_segment is called from is
> ll_merge_requests, and the first req of the split bio wouldn't even go
> through here because it's marked REQ_NOMERGE.

OK, looks like I forgot this change.

>
>> blk_rq_map_sg() isn't ready for splitting in the middle of bvec too.
>
> Could you please explain? It just uses the bio iterator, which I'm pretty
> sure is not broken.

Please see the first line of code in __blk_segment_map_sg(), in which only
one whole bvec is handled; a partial bvec can't be figured out there.

Thinking about it further, drivers often use bv.bv_len directly in the
iterator, for example:

        bio_for_each_segment(bvec, bio, iter)
                memcpy(page_address(bvec.bv_page) + bvec.bv_offset,
                       addr + offset, bvec.bv_len);

So your patch will break these drivers, won't it?

-- 
Ming Lei


* [PATCH for-4.4] block: split bios to max possible length
  2016-01-06 15:43               ` Ming Lei
@ 2016-01-06 16:21                 ` Keith Busch
  2016-01-07  0:14                   ` Ming Lei
  2016-01-07 10:46                   ` Kent Overstreet
  0 siblings, 2 replies; 15+ messages in thread
From: Keith Busch @ 2016-01-06 16:21 UTC (permalink / raw)


On Wed, Jan 06, 2016 at 11:43:45PM +0800, Ming Lei wrote:
> Please see the 1st line code of __blk_segment_map_sg(), in which only
> one whole bvec is handled, and partial bvec can't be figured out there.
> 
> Think of it further, drivers often use bv.bv_len directly in the
> iterator, for example:
> 
>         bio_for_each_segment(bvec, bio, iter)
>                 memcpy(page_address(bvec.bv_page) +
>                                             bvec.bv_offset,  addr +
> offset, bvec.bv_len);
> 
> So your patch will break these drivers, won't it?

CC'ing Kent in hopes he will clarify what happens on a split.

The bio_advance() code comments say it's handled:

"
 * This updates bi_sector, bi_size and bi_idx; if the number of bytes to
 * complete doesn't align with a bvec boundary, then bv_len and bv_offset will
 * be updated on the last bvec as well.
"

I admit I'm having a hard time seeing where bv_len and bv_offset are
updated in this path. It was obviously handled after 054bdf646e, then
changed with 4550dd6c6b.

If I follow correctly, 4550dd6c6b will implicitly update the bvec's offset
and length during the split here, since bio_iter_iovec resets the bvec's
length and offset:
---
#define __bio_for_each_segment(bvl, bio, iter, start)			\
	for (iter = (start);						\
	     (iter).bi_size &&						\
		((bvl = bio_iter_iovec((bio), (iter))), 1);		\
	     bio_advance_iter((bio), &(iter), (bvl).bv_len))
--


* [PATCH for-4.4] block: split bios to max possible length
  2016-01-06 16:21                 ` Keith Busch
@ 2016-01-07  0:14                   ` Ming Lei
  2016-01-07 10:46                   ` Kent Overstreet
  1 sibling, 0 replies; 15+ messages in thread
From: Ming Lei @ 2016-01-07  0:14 UTC (permalink / raw)


On Thu, Jan 7, 2016 at 12:21 AM, Keith Busch <keith.busch@intel.com> wrote:
> On Wed, Jan 06, 2016 at 11:43:45PM +0800, Ming Lei wrote:
>> Please see the 1st line code of __blk_segment_map_sg(), in which only
>> one whole bvec is handled, and partial bvec can't be figured out there.
>>
>> Think of it further, drivers often use bv.bv_len directly in the
>> iterator, for example:
>>
>>         bio_for_each_segment(bvec, bio, iter)
>>                 memcpy(page_address(bvec.bv_page) +
>>                                             bvec.bv_offset,  addr +
>> offset, bvec.bv_len);
>>
>> So your patch will break these drivers, won't it?
>
> CC'ing Kent in hopes he will clarify what happens on a split.
>
> The bio_advance() code comments say it's handled:
>
> "
>  * This updates bi_sector, bi_size and bi_idx; if the number of bytes to
>  * complete doesn't align with a bvec boundary, then bv_len and bv_offset will
>  * be updated on the last bvec as well.
> "

One exception is that both can be reset in bio_clone_bioset().

>
> I admit I'm having a hard time seeing where bv_len and bv_offset updated
> in this path. It was obviously handled after 054bdf646e then changed
> with 4550dd6c6b.
>
> If I follow correctly, 4550dd6c6b will implicity update the bvec's offset
> and length during the split here since bio_iter_iovec resets the bvec's
> length and offset:
> ---
> #define __bio_for_each_segment(bvl, bio, iter, start)                   \
>         for (iter = (start);                                            \
>              (iter).bi_size &&                                          \
>                 ((bvl = bio_iter_iovec((bio), (iter))), 1);             \
>              bio_advance_iter((bio), &(iter), (bvl).bv_len))
> --



-- 
Ming Lei


* [PATCH for-4.4] block: split bios to max possible length
  2016-01-06 16:21                 ` Keith Busch
  2016-01-07  0:14                   ` Ming Lei
@ 2016-01-07 10:46                   ` Kent Overstreet
  2016-01-09 11:10                     ` Ming Lei
  1 sibling, 1 reply; 15+ messages in thread
From: Kent Overstreet @ 2016-01-07 10:46 UTC (permalink / raw)


On Wed, Jan 06, 2016 at 04:21:17PM +0000, Keith Busch wrote:
> On Wed, Jan 06, 2016 at 11:43:45PM +0800, Ming Lei wrote:
> > Please see the 1st line code of __blk_segment_map_sg(), in which only
> > one whole bvec is handled, and partial bvec can't be figured out there.
> > 
> > Think of it further, drivers often use bv.bv_len directly in the
> > iterator, for example:
> > 
> >         bio_for_each_segment(bvec, bio, iter)
> >                 memcpy(page_address(bvec.bv_page) +
> >                                             bvec.bv_offset,  addr +
> > offset, bvec.bv_len);
> > 
> > So your patch will break these drivers, won't it?
> 
> CC'ing Kent in hopes he will clarify what happens on a split.
> 
> The bio_advance() code comments say it's handled:
> 
> "
>  * This updates bi_sector, bi_size and bi_idx; if the number of bytes to
>  * complete doesn't align with a bvec boundary, then bv_len and bv_offset will
>  * be updated on the last bvec as well.
> "
> 
> I admit I'm having a hard time seeing where bv_len and bv_offset updated
> in this path. It was obviously handled after 054bdf646e then changed
> with 4550dd6c6b.
> 
> If I follow correctly, 4550dd6c6b will implicity update the bvec's offset
> and length during the split here since bio_iter_iovec resets the bvec's
> length and offset:
> ---
> #define __bio_for_each_segment(bvl, bio, iter, start)			\
> 	for (iter = (start);						\
> 	     (iter).bi_size &&						\
> 		((bvl = bio_iter_iovec((bio), (iter))), 1);		\
> 	     bio_advance_iter((bio), &(iter), (bvl).bv_len))
> --

Yes, splitting in the middle of a bvec is perfectly fine. The reason
bio_for_each_segment takes a struct bvec and not a struct bvec * is that
it computes what bv_len should be (taking the min of bv_len and bi_size,
roughly).

See include/linux/bio.h:

bio_for_each_segment()
  bio_iter_iovec()
    bvec_iter_bvec()
      bvec_iter_len()

which does the actual bv_len computation.


* [PATCH for-4.4] block: split bios to max possible length
  2016-01-07 10:46                   ` Kent Overstreet
@ 2016-01-09 11:10                     ` Ming Lei
  0 siblings, 0 replies; 15+ messages in thread
From: Ming Lei @ 2016-01-09 11:10 UTC (permalink / raw)


On Thu, Jan 7, 2016 at 6:46 PM, Kent Overstreet
<kent.overstreet@gmail.com> wrote:
> On Wed, Jan 06, 2016@04:21:17PM +0000, Keith Busch wrote:
...
> Yes, splitting in the middle of a bvec is perfectly fine. The reason
> bio_for_each_segment takes a struct bvec and not a struct bvec * is because it's
> computing what bv_len should be (taking the min of bv_len and bi_size, roughly).
>
> See include/linux/bio.h:
>
> bio_for_each_segment()
>   bio_iter_iovec()
>     bvec_iter_bvec()
>       bvec_iter_len()
>
> which does the actual bv_len computation.

Looks like my understanding was wrong and Keith's patch is correct; sorry
for the noise.

-- 
Ming Lei

