* [LSF/MM ATTEND] block: multipage bvecs
@ 2016-02-26 16:33 Ming Lei
  2016-02-28 11:17 ` Boaz Harrosh
                   ` (3 more replies)
  0 siblings, 4 replies; 24+ messages in thread
From: Ming Lei @ 2016-02-26 16:33 UTC (permalink / raw)
  To: lsf-pc; +Cc: linux-block, Linux FS Devel

Hi,

I'd like to participate in LSF/MM and discuss multipage bvecs.

Kent posted the idea[1] before, but never pushed it out.
I have studied multipage bvecs for a while and think
it is a good way to improve the block subsystem.

Multipage bvecs means that one 'struct bio_vec' can hold
multiple physically contiguous pages, instead of the single
page it is limited to in the current kernel.
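
For reference, this is the structure in question as it stands today,
with a comment sketching what a multipage entry could look like (the
8-page figure is only an example):

struct bio_vec {
        struct page     *bv_page;
        unsigned int    bv_len;
        unsigned int    bv_offset;
};

/*
 * Today bv_len is always <= PAGE_SIZE, so 8 physically contiguous
 * pages need 8 vectors.  With multipage bvecs the same 8 pages could
 * be described by a single vector:
 *
 *      bv.bv_page   = first_page;      (first of the 8 contiguous pages)
 *      bv.bv_offset = 0;
 *      bv.bv_len    = 8 * PAGE_SIZE;
 */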

IMO, there are several advantages to supporting multipage bvecs:

- currently one bio from bio_alloc() can hold at most 256
vectors, which means one bio can transfer at most
1Mbytes (256 * 4K). With multipage bvecs a filesystem can submit
a bigger chunk via a single bio, because big physically contiguous
segments are very common.

- the CPU time spent iterating over the bvec table should decrease

- block merging gets simplified a lot: segments can be merged inside
bio_add_page(), so singlepage bvecs needn't be stored in the bvec
table, and the segment can then be split out to the driver with the
proper size. blk_bio_map_sg() gets simplified too. Recently, block
merging has become a bit complicated, and we have seen quite a few
bug reports/fixes in that area.

I'd like to hear opinions from fs folks about multipage-bvec based
bios, because this will bring some change to the bio interface (one
bio will represent bigger I/O than before).

I also hope to discuss the implementation with people working on fs,
dm, md, bcache, ... because this feature will bring changes to those
subsystems. So far, I have the following ideas:

1) change to bio_for_each_segment()

The bvec returned from this iterator helper needs to stay a singlepage
vector as before, so most users of the bio iterator don't need to change.

2) change to bio_for_each_segment_all()

bio_for_each_segment_all() has to be changed because callers may
modify the bvec, and today they assume it is always singlepage.

I think bio_for_each_segment_all() needs to be split into
bio_for_each_segment_all_rd() and bio_for_each_segment_all_wt().

Both new helpers return a pointer to a bio_vec like before.

*_rd() is used to iterate over each vector for reading the pointed-to
bvec; the caller must not write to this vector. This helper can still
return a singlepage bvec like before, so one extra local/temporary
'bio_vec' variable has to be added for the conversion from multipage
bvec to singlepage bvec.

*_wt() is used to iterate over each vector for changing the bvec, and
is only allowed on bios with singlepage bvecs; there are just a few
such cases, such as bio bounce, bio_alloc_pages(), raid1 and raid10.
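
To show the intent, a rough usage sketch (the names, the exact
parameters and the extra iteration variable are all still up for
discussion, so treat this as illustration only):

struct bio_vec *bvec;   /* points at the temporary below */
struct bio_vec sp;      /* extra local singlepage bvec */
int i;

/* read side: each step exposes one singlepage view of the bio */
bio_for_each_segment_all_rd(bvec, bio, i, sp) {
        flush_dcache_page(bvec->bv_page);       /* reading is fine */
}

/* write side: only allowed on bios built from singlepage bvecs */
bio_for_each_segment_all_wt(bvec, bio, i) {
        bvec->bv_page = alloc_page(GFP_NOIO);   /* like what bio_alloc_pages() does */
}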

3) changing bvecs of a cloned bio

In cases such as bio bounce and raid1, one bio is cloned from the
incoming bio, and each bvec of the cloned bio may be updated. We have
to introduce a singlepage version of bio_clone() so that the cloned
bio only includes singlepage bvecs; then the bvecs can be updated like
before.

One problem is that the cloned bio may not be able to hold all the
singlepage bvecs converted from the multipage bvecs in the source bio;
one simple solution is to split the source bio and make sure its size
can't be bigger than 1Mbytes (256 single-page vectors).
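
In the callers that need such a singlepage clone, that could look
roughly like this (sketch only; 'bs' stands for whatever bio_set the
caller already uses, and the size check ignores bvec offsets):

unsigned int max_sectors = (BIO_MAX_PAGES * PAGE_SIZE) >> 9;
struct bio *split = NULL;

/* cap the source so its singlepage clone fits in one bvec table */
if (bio_sectors(bio) > max_sectors)
        split = bio_split(bio, max_sectors, GFP_NOIO, bs);

/* clone 'split' (or 'bio' if no split was needed) with the singlepage
 * bio_clone() variant, and resubmit the remainder afterwards */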

4) introduce bio_for_each_mp_segment()

The bvec returned from this iterator helper will be a multipage bvec,
i.e. the actual/real segment, so drivers may switch to this helper if
they can handle multipage segments directly, which should be the
common case.
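
For a driver, the usage would look much like today's iterator, just
with whole segments coming out of it. A sketch (the final name and
details may differ; 'sglist' stands for the driver's sg table):

struct bio_vec bvec;
struct bvec_iter iter;
struct scatterlist *sg = sglist;

bio_for_each_mp_segment(bvec, bio, iter) {
        /* bvec describes one whole physically contiguous segment */
        sg_set_page(sg, bvec.bv_page, bvec.bv_len, bvec.bv_offset);
        sg = sg_next(sg);
}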


[1] http://marc.info/?l=linux-kernel&m=141680246629547&w=2

Thanks,
Ming Lei


* Re: [LSF/MM ATTEND] block: multipage bvecs
  2016-02-26 16:33 [LSF/MM ATTEND] block: multipage bvecs Ming Lei
@ 2016-02-28 11:17 ` Boaz Harrosh
  2016-02-28 14:34   ` Ming Lei
                     ` (2 more replies)
  2016-02-28 16:07 ` Christoph Hellwig
                   ` (2 subsequent siblings)
  3 siblings, 3 replies; 24+ messages in thread
From: Boaz Harrosh @ 2016-02-28 11:17 UTC (permalink / raw)
  To: Ming Lei, lsf-pc; +Cc: linux-block, Linux FS Devel

On 02/26/2016 06:33 PM, Ming Lei wrote:
> Hi,
> 
> I'd like to participate in LSF/MM and discuss multipage bvecs.
> 
> Kent posted the idea[1] before, but never pushed out.
> I have studied multipage bvecs for a while, and think
> it is a good idea to improve block subsystem.
> 
> Multipage bvecs means that one 'struct bio_bvec' can hold
> multiple pages which are physically contiguous instead
> of one single page used in current kernel.
> 

Hi Ming Lei

This is an interesting talk for me.

I don't know if you ever tried it but I did. If I take a regular
SSD disk or a PCIE flash card that I have in my machine and
I stick a pointer to a page and bv_len = PAGE_SIZE * 8 and call
submit_bio, I get 8 pages worth of IO with a single bvec and it
all just works.
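
Roughly what I did, from memory (hand-rolled on purpose, error
handling skipped and bio_add_page() deliberately bypassed, so take it
as a toy; 'bdev', 'sector', 'page' and 'my_end_io' are whatever I had
in my test module):

struct bio *bio = bio_alloc(GFP_KERNEL, 1);

bio->bi_bdev = bdev;
bio->bi_iter.bi_sector = sector;
bio->bi_io_vec[0].bv_page = page;       /* first of 8 contiguous pages */
bio->bi_io_vec[0].bv_offset = 0;
bio->bi_io_vec[0].bv_len = 8 * PAGE_SIZE;
bio->bi_vcnt = 1;
bio->bi_iter.bi_size = 8 * PAGE_SIZE;
bio->bi_end_io = my_end_io;

submit_bio(READ, bio);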

Yes, yes, I know it would break a bunch of other places; probably the
single-bvec case works better. But just to say that the current code
is not that picky about assuming a single page size.

I would like to see an audit and test cases done in this regard,
but keeping the current API and making this transparent. I think
that all the places you mention below can be made transparent to
"big bvecs" if coded carefully, and there need not be a separate
API for multi-page / single-page bvecs. It should all just work.
I might be wrong, since I have not looked at this deeply, but my
gut feeling is that it should be possible.

Thanks for bringing up the issue
Boaz


* Re: [LSF/MM ATTEND] block: multipage bvecs
  2016-02-28 11:17 ` Boaz Harrosh
@ 2016-02-28 14:34   ` Ming Lei
  2016-02-28 14:41     ` Ming Lei
  2016-02-28 16:01   ` [Lsf-pc] " James Bottomley
  2016-02-28 16:08   ` Christoph Hellwig
  2 siblings, 1 reply; 24+ messages in thread
From: Ming Lei @ 2016-02-28 14:34 UTC (permalink / raw)
  To: Boaz Harrosh; +Cc: lsf-pc, linux-block, Linux FS Devel

Hi Boaz,

It is nice to see you are interested in this topic, :-)

On Sun, Feb 28, 2016 at 7:17 PM, Boaz Harrosh <boaz@plexistor.com> wrote:
> On 02/26/2016 06:33 PM, Ming Lei wrote:
>> Hi,
>>
>> I'd like to participate in LSF/MM and discuss multipage bvecs.
>>
>> Kent posted the idea[1] before, but never pushed out.
>> I have studied multipage bvecs for a while, and think
>> it is a good idea to improve block subsystem.
>>
>> Multipage bvecs means that one 'struct bio_bvec' can hold
>> multiple pages which are physically contiguous instead
>> of one single page used in current kernel.
>>
>
> Hi Ming Lei
>
> This is an interesting talk for me.
>
> I don't know if you ever tried it but I did. If I take a regular
> SSD disk or a PCIE flash card that I have in my machine and
> I stick a pointer to a page and bv_len = PAGE_SIZE * 8 and call
> submit_bio, I get 8 pages worth of IO with a single bvec and it
> all just works.

I think bio_for_each_segment() isn't ready for that yet; at least
.bv_page isn't updated during the iteration, but that isn't difficult
to support.

Also, the thing is that there are lots of singlepage assumptions in
the kernel, such as updating .bv_page of the 'bvec' returned from
bio_for_each_segment_all().

>
> Yes Yes I know it would break bunch of other places, probably
> the single bvec case works better. But just to say that current
> code is not that picky in assuming a single page size.

I agree; at least that is how struct bio_vec is defined.

>
> I would like to see an audit and test cases done in this regard
> but to keep current API and make this transparent. I think

Yeah, that is one of my goals: to keep this transparent wrt. the API
and to introduce as few changes to drivers & fs as possible.

But in the bio_clone() & bvec-updating situation, the cloned bio may
have to be split into singlepage bvecs, so the source bio can't be
very big. The callers have to be aware of this.

> that all the below places you mentioned can be made transparent
> to "big bvec" if coded carefully, and there need not be a separate
> API for multi-page / single-page bvecs. It should all just work.
> I might be wrong, have not looked at this deeply, but is my gut
> feeling, that it can be possible.

Yeah, so far, I think we can do that, :-)

This work may bring changes to fs, dm, bcache and raid code, and that
is why I proposed the topic and hope we can talk with the people
working on these subsystems.

>
> Thanks for bringing up the issue

You are welcome!

Thanks,
Ming

-- 
Ming Lei


* Re: [LSF/MM ATTEND] block: multipage bvecs
  2016-02-28 14:34   ` Ming Lei
@ 2016-02-28 14:41     ` Ming Lei
  0 siblings, 0 replies; 24+ messages in thread
From: Ming Lei @ 2016-02-28 14:41 UTC (permalink / raw)
  To: Boaz Harrosh; +Cc: lsf-pc, linux-block, Linux FS Devel

On Sun, Feb 28, 2016 at 10:34 PM, Ming Lei <tom.leiming@gmail.com> wrote:
>> I would like to see an audit and test cases done in this regard
>> but to keep current API and make this transparent. I think
>
> Yeah, that is one of my goal  to make this transparent wrt. API,
> and introduce as few as change to drivers & fs.

Forgot to mention: bio_for_each_segment_all() has to be changed, but
IMO this helper should be used as little as possible by fs and
drivers.

Thanks,
Ming


* Re: [Lsf-pc] [LSF/MM ATTEND] block: multipage bvecs
  2016-02-28 11:17 ` Boaz Harrosh
  2016-02-28 14:34   ` Ming Lei
@ 2016-02-28 16:01   ` James Bottomley
  2016-02-29  9:41     ` Boaz Harrosh
  2016-02-28 16:08   ` Christoph Hellwig
  2 siblings, 1 reply; 24+ messages in thread
From: James Bottomley @ 2016-02-28 16:01 UTC (permalink / raw)
  To: Boaz Harrosh, Ming Lei, lsf-pc; +Cc: linux-block, Linux FS Devel

On Sun, 2016-02-28 at 13:17 +0200, Boaz Harrosh wrote:
> On 02/26/2016 06:33 PM, Ming Lei wrote:
> > Hi,
> > 
> > I'd like to participate in LSF/MM and discuss multipage bvecs.
> > 
> > Kent posted the idea[1] before, but never pushed out.
> > I have studied multipage bvecs for a while, and think
> > it is a good idea to improve block subsystem.
> > 
> > Multipage bvecs means that one 'struct bio_bvec' can hold
> > multiple pages which are physically contiguous instead
> > of one single page used in current kernel.
> > 
> 
> Hi Ming Lei
> 
> This is an interesting talk for me.
> 
> I don't know if you ever tried it but I did. If I take a regular
> SSD disk or a PCIE flash card that I have in my machine and
> I stick a pointer to a page and bv_len = PAGE_SIZE * 8 and call
> submit_bio, I get 8 pages worth of IO with a single bvec and it
> all just works.

No it wouldn't.  There's no check anywhere that a single bv entry is
under the queue max segment size because the assumption is bv_len <=
page size.  If you start sending multi-page single bio vec entries,
you'll violate those assumptions and eventually get an unmappable bio.

James




* Re: [Lsf-pc] [LSF/MM ATTEND] block: multipage bvecs
  2016-02-26 16:33 [LSF/MM ATTEND] block: multipage bvecs Ming Lei
  2016-02-28 11:17 ` Boaz Harrosh
@ 2016-02-28 16:07 ` Christoph Hellwig
  2016-02-28 16:26   ` James Bottomley
  2016-03-03  8:58 ` Christoph Hellwig
  2016-03-05  8:35 ` Ming Lei
  3 siblings, 1 reply; 24+ messages in thread
From: Christoph Hellwig @ 2016-02-28 16:07 UTC (permalink / raw)
  To: Ming Lei; +Cc: lsf-pc, linux-block, Linux FS Devel

I've been playing around with Kent's old patches a bit as well,
and I think building up larger bvecs from the start would be
very useful, and it might make sense to get some broader exposure
of the issues around it.


* Re: [Lsf-pc] [LSF/MM ATTEND] block: multipage bvecs
  2016-02-28 11:17 ` Boaz Harrosh
  2016-02-28 14:34   ` Ming Lei
  2016-02-28 16:01   ` [Lsf-pc] " James Bottomley
@ 2016-02-28 16:08   ` Christoph Hellwig
  2016-02-29 10:16     ` Boaz Harrosh
  2 siblings, 1 reply; 24+ messages in thread
From: Christoph Hellwig @ 2016-02-28 16:08 UTC (permalink / raw)
  To: Boaz Harrosh; +Cc: Ming Lei, lsf-pc, linux-block, Linux FS Devel

On Sun, Feb 28, 2016 at 01:17:43PM +0200, Boaz Harrosh wrote:
> I don't know if you ever tried it but I did. If I take a regular
> SSD disk or a PCIE flash card that I have in my machine and
> I stick a pointer to a page and bv_len = PAGE_SIZE * 8 and call
> submit_bio, I get 8 pages worth of IO with a single bvec and it
> all just works.

No, it will break in all kinds of places.  Also you really should
never just setup bvecs yourself, please always use bio_add_page!


* Re: [Lsf-pc] [LSF/MM ATTEND] block: multipage bvecs
  2016-02-28 16:07 ` Christoph Hellwig
@ 2016-02-28 16:26   ` James Bottomley
  2016-02-28 16:29     ` Christoph Hellwig
  0 siblings, 1 reply; 24+ messages in thread
From: James Bottomley @ 2016-02-28 16:26 UTC (permalink / raw)
  To: Christoph Hellwig, Ming Lei; +Cc: lsf-pc, linux-block, Linux FS Devel

On Sun, 2016-02-28 at 08:07 -0800, Christoph Hellwig wrote:
> I've been playing around with Kent's old patches a bit as well,
> and I think building up larger bvecs from the start would be
> very useful, and it might make sense to get some broader exposure
> of the issues around it.

You mean in bio_add_page() the code which currently aggregates chunks
within a page could build a bio vec entry up to the max segment size? 
 I think that is reasonable, especially now the bio splitting code can
actually split inside a bio vec entry.

James




* Re: [Lsf-pc] [LSF/MM ATTEND] block: multipage bvecs
  2016-02-28 16:26   ` James Bottomley
@ 2016-02-28 16:29     ` Christoph Hellwig
  2016-02-28 16:45       ` James Bottomley
  0 siblings, 1 reply; 24+ messages in thread
From: Christoph Hellwig @ 2016-02-28 16:29 UTC (permalink / raw)
  To: James Bottomley
  Cc: Christoph Hellwig, Ming Lei, lsf-pc, linux-block, Linux FS Devel

On Sun, Feb 28, 2016 at 08:26:46AM -0800, James Bottomley wrote:
> You mean in bio_add_page() the code which currently aggregates chunks
> within a page could build a bio vec entry up to the max segment size? 
>  I think that is reasonable, especially now the bio splitting code can
> actually split inside a bio vec entry.

Yes.  Kent has an old prototype that did this at:

https://evilpiepirate.org/git/linux-bcache.git/log/?h=block_stuff

I don't think any of that is reusable as-is, but the basic idea is
sound and very useful.


* Re: [Lsf-pc] [LSF/MM ATTEND] block: multipage bvecs
  2016-02-28 16:29     ` Christoph Hellwig
@ 2016-02-28 16:45       ` James Bottomley
  2016-02-28 16:59         ` Ming Lei
  0 siblings, 1 reply; 24+ messages in thread
From: James Bottomley @ 2016-02-28 16:45 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Ming Lei, Linux FS Devel, linux-block, lsf-pc

On Sun, 2016-02-28 at 08:29 -0800, Christoph Hellwig wrote:
> On Sun, Feb 28, 2016 at 08:26:46AM -0800, James Bottomley wrote:
> > You mean in bio_add_page() the code which currently aggregates 
> > chunks within a page could build a bio vec entry up to the max 
> > segment size?  I think that is reasonable, especially now the bio 
> > splitting code can actually split inside a bio vec entry.
> 
> Yes.  Kent has an old prototype that did this at:
> 
> https://evilpiepirate.org/git/linux-bcache.git/log/?h=block_stuff
> 
> I don't think any of that is reusable as-is, but the basic idea is
> sounds and very useful.

The basic idea, yes, but the actual code in that tree would still have
built up bv entries that are too big.  We have to thread bio_add_page()
with knowledge of the queue limits, which is somewhat hard since
they're deliberately queue agnostic.  Perhaps some global minimum queue
segment size would work?

James



* Re: [Lsf-pc] [LSF/MM ATTEND] block: multipage bvecs
  2016-02-28 16:45       ` James Bottomley
@ 2016-02-28 16:59         ` Ming Lei
  2016-02-28 17:09           ` James Bottomley
  0 siblings, 1 reply; 24+ messages in thread
From: Ming Lei @ 2016-02-28 16:59 UTC (permalink / raw)
  To: James Bottomley; +Cc: Christoph Hellwig, Linux FS Devel, linux-block, lsf-pc

On Mon, Feb 29, 2016 at 12:45 AM, James Bottomley
<James.Bottomley@hansenpartnership.com> wrote:
> On Sun, 2016-02-28 at 08:29 -0800, Christoph Hellwig wrote:
>> On Sun, Feb 28, 2016 at 08:26:46AM -0800, James Bottomley wrote:
>> > You mean in bio_add_page() the code which currently aggregates
>> > chunks within a page could build a bio vec entry up to the max
>> > segment size?  I think that is reasonable, especially now the bio
>> > splitting code can actually split inside a bio vec entry.
>>
>> Yes.  Kent has an old prototype that did this at:
>>
>> https://evilpiepirate.org/git/linux-bcache.git/log/?h=block_stuff
>>
>> I don't think any of that is reusable as-is, but the basic idea is
>> sounds and very useful.
>
> The basic idea, yes, but the actual code in that tree would still have
> built up bv entries that are too big.  We have to thread bio_add_page()
> with knowledge of the queue limits, which is somewhat hard since
> they're deliberately queue agnostic.  Perhaps some global minimum queue
> segment size would work?

IMO, we can simply merge each physically contiguous range into one
vector, because bio_add_page() is in the hot path, and then compute
segments during bio splitting in the submit_bio() path by applying
all kinds of queue limits, just like the current way.
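
Inside bio_add_page() that could look roughly like this (hand-waving
sketch only, not the actual patch):

if (bio->bi_vcnt) {
        struct bio_vec *prev = &bio->bi_io_vec[bio->bi_vcnt - 1];

        /* physically contiguous with the previous vector? */
        if (page_to_phys(prev->bv_page) + prev->bv_offset + prev->bv_len ==
            page_to_phys(page) + offset) {
                prev->bv_len += len;    /* grow into a multipage bvec */
                bio->bi_iter.bi_size += len;
                return len;
        }
}
/* otherwise start a new bvec, as bio_add_page() does today */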

Thanks
Ming

>
> James
>



-- 
Ming Lei


* Re: [Lsf-pc] [LSF/MM ATTEND] block: multipage bvecs
  2016-02-28 16:59         ` Ming Lei
@ 2016-02-28 17:09           ` James Bottomley
  2016-02-28 18:49             ` Ming Lei
  0 siblings, 1 reply; 24+ messages in thread
From: James Bottomley @ 2016-02-28 17:09 UTC (permalink / raw)
  To: Ming Lei; +Cc: Christoph Hellwig, Linux FS Devel, linux-block, lsf-pc

On Mon, 2016-02-29 at 00:59 +0800, Ming Lei wrote:
> On Mon, Feb 29, 2016 at 12:45 AM, James Bottomley
> <James.Bottomley@hansenpartnership.com> wrote:
> > On Sun, 2016-02-28 at 08:29 -0800, Christoph Hellwig wrote:
> > > On Sun, Feb 28, 2016 at 08:26:46AM -0800, James Bottomley wrote:
> > > > You mean in bio_add_page() the code which currently aggregates
> > > > chunks within a page could build a bio vec entry up to the max
> > > > segment size?  I think that is reasonable, especially now the
> > > > bio
> > > > splitting code can actually split inside a bio vec entry.
> > > 
> > > Yes.  Kent has an old prototype that did this at:
> > > 
> > > https://evilpiepirate.org/git/linux-bcache.git/log/?h=block_stuff
> > > 
> > > I don't think any of that is reusable as-is, but the basic idea
> > > is
> > > sounds and very useful.
> > 
> > The basic idea, yes, but the actual code in that tree would still
> > have
> > built up bv entries that are too big.  We have to thread
> > bio_add_page()
> > with knowledge of the queue limits, which is somewhat hard since
> > they're deliberately queue agnostic.  Perhaps some global minimum
> > queue
> > segment size would work?
> 
> IMO, we can just build contiguous segment simply into one vector 
> because bio_add_page() in hot path, then compute segments during
> bio splitting from submit_bio() path by applying all kinds of queue 
> limit just like current way.

We can debate this, but I'm dubious about the effectiveness.  The
reason we have biovecs and don't use one bio per page is efficiency.
 On large memory machines, most large IO transfers tend to be
physically contiguous because the allocators make it so.  The splitting
code splits into bios not biovecs, so we'll likely end up with one bio
per segment.  Is that better than one page per large biovec?  Not sure,
someone will have to do careful benchmarking.

I'm not saying lets not do this, just that it's not an obvious win.

James




* Re: [Lsf-pc] [LSF/MM ATTEND] block: multipage bvecs
  2016-02-28 17:09           ` James Bottomley
@ 2016-02-28 18:49             ` Ming Lei
  0 siblings, 0 replies; 24+ messages in thread
From: Ming Lei @ 2016-02-28 18:49 UTC (permalink / raw)
  To: James Bottomley; +Cc: Christoph Hellwig, Linux FS Devel, linux-block, lsf-pc

On Mon, Feb 29, 2016 at 1:09 AM, James Bottomley
<James.Bottomley@hansenpartnership.com> wrote:
> On Mon, 2016-02-29 at 00:59 +0800, Ming Lei wrote:
>> On Mon, Feb 29, 2016 at 12:45 AM, James Bottomley
>> <James.Bottomley@hansenpartnership.com> wrote:
>> > On Sun, 2016-02-28 at 08:29 -0800, Christoph Hellwig wrote:
>> > > On Sun, Feb 28, 2016 at 08:26:46AM -0800, James Bottomley wrote:
>> > > > You mean in bio_add_page() the code which currently aggregates
>> > > > chunks within a page could build a bio vec entry up to the max
>> > > > segment size?  I think that is reasonable, especially now the
>> > > > bio
>> > > > splitting code can actually split inside a bio vec entry.
>> > >
>> > > Yes.  Kent has an old prototype that did this at:
>> > >
>> > > https://evilpiepirate.org/git/linux-bcache.git/log/?h=block_stuff
>> > >
>> > > I don't think any of that is reusable as-is, but the basic idea
>> > > is
>> > > sounds and very useful.
>> >
>> > The basic idea, yes, but the actual code in that tree would still
>> > have
>> > built up bv entries that are too big.  We have to thread
>> > bio_add_page()
>> > with knowledge of the queue limits, which is somewhat hard since
>> > they're deliberately queue agnostic.  Perhaps some global minimum
>> > queue
>> > segment size would work?
>>
>> IMO, we can just build contiguous segment simply into one vector
>> because bio_add_page() in hot path, then compute segments during
>> bio splitting from submit_bio() path by applying all kinds of queue
>> limit just like current way.
>
> We can debate this, but I'm dubious about the effectiveness.  the

When bio_add_page() is called, the fs is still preparing data, and it
is reasonable to figure out segments & splitting only after the bio is
filled up and ready (in the submit_bio() path).

If the fs knows the pages are physically contiguous, it may be better
to introduce a bio_add_pages() and add all of those pages in one
batch, which looks more efficient.

> reason we have biovecs and don't use one bio per page is efficiency.
>  On large memory machines, most large IO transfers tend to be
> physically contiguous because the allocators make it so.  The splitting
> code splits into bios not biovecs, so we'll likely end up with one bio
> per segment.  Is that better than one page per large biovec?  Not sure,

Firstly, multipage bvecs doesn't mean a bio ends up with a single
vector, because most current users keep calling bio_add_page() while
the bio isn't full, and the fs often has more data to transfer.

Secondly, the current blk_bio_segment_split() only splits bios into
bios because each bvec includes just one page. Once multipage bvecs
are introduced, one multipage bvec may need to be split into different
segments because of the queue's limits, while both of those segments
may still be fine to handle in one bio.
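
For example, when mapping to an sg list, one big bvec could be chopped
against the queue's max segment size without touching the bio itself.
A rough sketch (illustrative only; 'q', 'sg' and 'bvec' come from a
blk_bio_map_sg()-style loop, and other limits such as the segment
boundary are ignored here):

unsigned int done = 0;

while (done < bvec.bv_len) {
        unsigned int off = bvec.bv_offset + done;
        unsigned int seg = min_t(unsigned int, bvec.bv_len - done,
                                 queue_max_segment_size(q));

        sg_set_page(sg, nth_page(bvec.bv_page, off >> PAGE_SHIFT),
                    seg, off & (PAGE_SIZE - 1));
        sg = sg_next(sg);
        done += seg;
}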

> someone will have to do careful benchmarking.

Yes, we need to do that carefully, :-)

In theory, for both of the above cases multipage bvecs may improve
performance, and block merging can be simplified a lot too.


Thanks,
Ming Lei


* Re: [Lsf-pc] [LSF/MM ATTEND] block: multipage bvecs
  2016-02-28 16:01   ` [Lsf-pc] " James Bottomley
@ 2016-02-29  9:41     ` Boaz Harrosh
  0 siblings, 0 replies; 24+ messages in thread
From: Boaz Harrosh @ 2016-02-29  9:41 UTC (permalink / raw)
  To: James Bottomley, Ming Lei, lsf-pc; +Cc: linux-block, Linux FS Devel

On 02/28/2016 06:01 PM, James Bottomley wrote:
> On Sun, 2016-02-28 at 13:17 +0200, Boaz Harrosh wrote:
>> On 02/26/2016 06:33 PM, Ming Lei wrote:
>>> Hi,
>>>
>>> I'd like to participate in LSF/MM and discuss multipage bvecs.
>>>
>>> Kent posted the idea[1] before, but never pushed out.
>>> I have studied multipage bvecs for a while, and think
>>> it is a good idea to improve block subsystem.
>>>
>>> Multipage bvecs means that one 'struct bio_bvec' can hold
>>> multiple pages which are physically contiguous instead
>>> of one single page used in current kernel.
>>>
>>
>> Hi Ming Lei
>>
>> This is an interesting talk for me.
>>
>> I don't know if you ever tried it but I did. If I take a regular
>> SSD disk or a PCIE flash card that I have in my machine and
>> I stick a pointer to a page and bv_len = PAGE_SIZE * 8 and call
>> submit_bio, I get 8 pages worth of IO with a single bvec and it
>> all just works.
> 
> No it wouldn't.  There's no check anywhere that a single bv entry is
> under the queue max segment size because the assumption is bv_len <=
> page size.  If you start sending multi-page single bio vec entries,
> you'll violate those assumptions and eventually get an unmappable bio.
> 

I thought so too. Imagine my surprise, but it just works, limits aside
for a second. I followed the code path and everywhere we use bv_len;
so does the sg-mapping DMA code. And it all comes out just fine. So as
long as I'm not crossing any boundaries, it works.

I'd imagine that any SW driver that actually accesses the page as a
page*, for say kmap(), would crash terribly. But the virtual-to-physical
mapping does work (tested 64bit only). I know, big surprise.

Thanks
Boaz

> James
> 
> 



* Re: [Lsf-pc] [LSF/MM ATTEND] block: multipage bvecs
  2016-02-28 16:08   ` Christoph Hellwig
@ 2016-02-29 10:16     ` Boaz Harrosh
  2016-02-29 15:46       ` James Bottomley
  0 siblings, 1 reply; 24+ messages in thread
From: Boaz Harrosh @ 2016-02-29 10:16 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Ming Lei, lsf-pc, linux-block, Linux FS Devel

On 02/28/2016 06:08 PM, Christoph Hellwig wrote:
> On Sun, Feb 28, 2016 at 01:17:43PM +0200, Boaz Harrosh wrote:
>> I don't know if you ever tried it but I did. If I take a regular
>> SSD disk or a PCIE flash card that I have in my machine and
>> I stick a pointer to a page and bv_len = PAGE_SIZE * 8 and call
>> submit_bio, I get 8 pages worth of IO with a single bvec and it
>> all just works.
> 
> No, it will break in all kinds of places.  Also you really should
> never just setup bvecs yourself, please always use bio_add_page!
> 

Guys, when did you ever stop playing and become so serious? Of course
I never do that in submitted code. But I do like to experiment and
play around from time to time; I like it when my VM crashes ;-)

That said, when did you last look at bio_add_page()? It will just work
as well. (Especially lately, since the limits are checked later now
that bios can be split.)

So if you take a really hard stare you'll see that we honour bv_len
everywhere, and the PAGE_SIZE assumption shows up more when allocating
array sizes and things like that. Again, it will break on SW drivers
like brd and scsi_debug, but it will currently work on anything going
through sg-lists and DMA mapping.

This is not the first time in the kernel that a page* plus a large
size denotes a set of contiguous pages, BTW.

Cheers
Boaz



* Re: [Lsf-pc] [LSF/MM ATTEND] block: multipage bvecs
  2016-02-29 10:16     ` Boaz Harrosh
@ 2016-02-29 15:46       ` James Bottomley
  0 siblings, 0 replies; 24+ messages in thread
From: James Bottomley @ 2016-02-29 15:46 UTC (permalink / raw)
  To: Boaz Harrosh, Christoph Hellwig
  Cc: linux-block, Ming Lei, Linux FS Devel, lsf-pc

On Mon, 2016-02-29 at 12:16 +0200, Boaz Harrosh wrote:
> On 02/28/2016 06:08 PM, Christoph Hellwig wrote:
> > On Sun, Feb 28, 2016 at 01:17:43PM +0200, Boaz Harrosh wrote:
> > > I don't know if you ever tried it but I did. If I take a regular
> > > SSD disk or a PCIE flash card that I have in my machine and
> > > I stick a pointer to a page and bv_len = PAGE_SIZE * 8 and call
> > > submit_bio, I get 8 pages worth of IO with a single bvec and it
> > > all just works.
> > 
> > No, it will break in all kinds of places.  Also you really should
> > never just setup bvecs yourself, please always use bio_add_page!
> > 
> 
> Guys when did you ever stop playing and became so serious? Of course
> I never do that in submitted code. But I do like to experiment from
> time to time and play around, I like it when my VM crashes ;-)
> 
> That said when did you last look at bio_add_page() it will just work
> as well. (Specially lately since the limits are checked later ever
> since bios can split)
> 
> So if you have a real hard stair you'll see that we consider bv_len
> everywhere and the PAGE_SIZE assumption is more when allocating array
> sizes and things like that. Again it will break on SW drivers like
> brd and scsi_debug. But will currently work on anything going through
> sg-lists and DMA mapping.

No, it won't.  The segment mappers won't break up anything with
bv_len > max segment size, so it gets incorrectly mapped as an sglist
entry which is too long.  The max segment size can come from two
places: it can be a driver limitation, like IDE (the segment length
register only has 16 bits in older cards), or it can come from the
platform IOMMU descriptors.  Running on virtual hardware with no IOMMU
isn't testing any of this.

James




* Re: [Lsf-pc] [LSF/MM ATTEND] block: multipage bvecs
  2016-02-26 16:33 [LSF/MM ATTEND] block: multipage bvecs Ming Lei
  2016-02-28 11:17 ` Boaz Harrosh
  2016-02-28 16:07 ` Christoph Hellwig
@ 2016-03-03  8:58 ` Christoph Hellwig
  2016-03-03 11:04   ` Ming Lei
  2016-03-05  8:35 ` Ming Lei
  3 siblings, 1 reply; 24+ messages in thread
From: Christoph Hellwig @ 2016-03-03  8:58 UTC (permalink / raw)
  To: Ming Lei; +Cc: lsf-pc, linux-block, Linux FS Devel

On Sat, Feb 27, 2016 at 12:33:00AM +0800, Ming Lei wrote:
> Hi,
> 
> I'd like to participate in LSF/MM and discuss multipage bvecs.
> 
> Kent posted the idea[1] before, but never pushed out.
> I have studied multipage bvecs for a while, and think
> it is a good idea to improve block subsystem.
> 
> Multipage bvecs means that one 'struct bio_bvec' can hold
> multiple pages which are physically contiguous instead
> of one single page used in current kernel.

Btw, do you have a current version of the multipage biovec patches
somewhere?


* Re: [Lsf-pc] [LSF/MM ATTEND] block: multipage bvecs
  2016-03-03  8:58 ` Christoph Hellwig
@ 2016-03-03 11:04   ` Ming Lei
  2016-03-03 12:11     ` Christoph Hellwig
  0 siblings, 1 reply; 24+ messages in thread
From: Ming Lei @ 2016-03-03 11:04 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: lsf-pc, linux-block, Linux FS Devel

On Thu, Mar 3, 2016 at 4:58 PM, Christoph Hellwig <hch@infradead.org> wrote:
> On Sat, Feb 27, 2016 at 12:33:00AM +0800, Ming Lei wrote:
>> Hi,
>>
>> I'd like to participate in LSF/MM and discuss multipage bvecs.
>>
>> Kent posted the idea[1] before, but never pushed out.
>> I have studied multipage bvecs for a while, and think
>> it is a good idea to improve block subsystem.
>>
>> Multipage bvecs means that one 'struct bio_bvec' can hold
>> multiple pages which are physically contiguous instead
>> of one single page used in current kernel.
>
> Btw, do you have a current version of the multipage biovec patches
> somewhere?

The helper changes are basically ready; one new parameter is added to
bio_for_each_segment_all(), so all of its callers need that kind of
change.

The bio_clone() & bvec-updating part isn't started yet, but that work
should not be very difficult to complete.

So the work is in progress, and I can set up a tree and make it public
if anyone is interested.


Thanks,
Ming Lei


* Re: [Lsf-pc] [LSF/MM ATTEND] block: multipage bvecs
  2016-03-03 11:04   ` Ming Lei
@ 2016-03-03 12:11     ` Christoph Hellwig
  2016-03-03 23:49       ` Ming Lin
  2016-03-07  8:44       ` Ming Lei
  0 siblings, 2 replies; 24+ messages in thread
From: Christoph Hellwig @ 2016-03-03 12:11 UTC (permalink / raw)
  To: Ming Lei; +Cc: Christoph Hellwig, linux-block, Linux FS Devel, lsf-pc

On Thu, Mar 03, 2016 at 07:04:20PM +0800, Ming Lei wrote:
> The helpers change are basically ready, and one new parameter is added
> to bio_for_each_segment_all(), so its all callers need this kind of change,
> 
> Also bio_clone() & bvec updating isn't started, but this kind of work
> should not be very difficult to complete.
> 
> So the work is in progressing, but I can setup a tree and make it public
> if anyone is intereseted in.

Having a git tree available would be very useful, but probably only once
the scheme starts working..


* Re: [Lsf-pc] [LSF/MM ATTEND] block: multipage bvecs
  2016-03-03 12:11     ` Christoph Hellwig
@ 2016-03-03 23:49       ` Ming Lin
  2016-03-07  8:44       ` Ming Lei
  1 sibling, 0 replies; 24+ messages in thread
From: Ming Lin @ 2016-03-03 23:49 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Ming Lei, linux-block, Linux FS Devel, lsf-pc, Dongsu Park

On Thu, Mar 3, 2016 at 4:11 AM, Christoph Hellwig <hch@infradead.org> wrote:
> On Thu, Mar 03, 2016 at 07:04:20PM +0800, Ming Lei wrote:
>> The helpers change are basically ready, and one new parameter is added
>> to bio_for_each_segment_all(), so its all callers need this kind of change,
>>
>> Also bio_clone() & bvec updating isn't started, but this kind of work
>> should not be very difficult to complete.
>>
>> So the work is in progressing, but I can setup a tree and make it public
>> if anyone is intereseted in.
>
> Having a git tree available would be very useful, but probably only once
> the scheme starts working..

Dongsu and I have some fixes for Kent's original code.
https://github.com/dongsupark/linux/commits/zz_block-mpage-bvecs-for-3.18


* Re: [LSF/MM ATTEND] block: multipage bvecs
  2016-02-26 16:33 [LSF/MM ATTEND] block: multipage bvecs Ming Lei
                   ` (2 preceding siblings ...)
  2016-03-03  8:58 ` Christoph Hellwig
@ 2016-03-05  8:35 ` Ming Lei
  3 siblings, 0 replies; 24+ messages in thread
From: Ming Lei @ 2016-03-05  8:35 UTC (permalink / raw)
  To: lsf-pc; +Cc: linux-block, Linux FS Devel


One more item to add to the list in my first mail:

5) remove most of the direct access to bio->bi_io_vec & bio->bi_vcnt
in other subsystems

Most of this usage is in btrfs, raid1, raid10 and bcache.  Direct
access to .bi_io_vec and .bi_vcnt may cause a mess once multipage
bvecs are introduced, except in the following cases:

- direct access before calling bio_add_page()
- adding a single-page vector
- adding a page directly instead of using bio_add_page()
- the REQ_PC bio case
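
To illustrate the kind of mechanical conversion I mean (the helper
name 'handle_page' is made up):

/* open-coded access like this breaks once a bvec can be multipage: */
for (i = 0; i < bio->bi_vcnt; i++)
        handle_page(bio->bi_io_vec[i].bv_page);

/* going through the iterator keeps the caller independent of how the
 * vectors are stored: */
bio_for_each_segment_all(bvec, bio, i)
        handle_page(bvec->bv_page);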

Thanks,
Ming Lei


* Re: [Lsf-pc] [LSF/MM ATTEND] block: multipage bvecs
  2016-03-03 12:11     ` Christoph Hellwig
  2016-03-03 23:49       ` Ming Lin
@ 2016-03-07  8:44       ` Ming Lei
  2016-03-21 15:55         ` Christoph Hellwig
  1 sibling, 1 reply; 24+ messages in thread
From: Ming Lei @ 2016-03-07  8:44 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-block, Linux FS Devel, lsf-pc

On Thu, Mar 3, 2016 at 8:11 PM, Christoph Hellwig <hch@infradead.org> wrote:
> On Thu, Mar 03, 2016 at 07:04:20PM +0800, Ming Lei wrote:
>> The helpers change are basically ready, and one new parameter is added
>> to bio_for_each_segment_all(), so its all callers need this kind of change,
>>
>> Also bio_clone() & bvec updating isn't started, but this kind of work
>> should not be very difficult to complete.
>>
>> So the work is in progressing, but I can setup a tree and make it public
>> if anyone is intereseted in.
>
> Having a git tree available would be very useful, but probably only once
> the scheme starts working..

I just created a branch for multipage bvecs:

http://kernel.ubuntu.com/git/ming/linux.git/log/?h=v4.5-rc-multipage-bvecs-early

At least two kinds of stuff aren't completed yet:

1) converting bio_for_each_segment() to *_rd() and *_wt()

- this change should be a mechanical thing

2) cleanup on direct access to .bi_io_vec and .bi_vcnt from btrfs,
dm, bcache, ...

- this kind of work may take some time to do
- most of them are in btrfs

3) others?

With the patches in the above tree, I can run I/O tests over null_blk
under both direct I/O and buffered I/O with multipage bvecs enabled,
and it looks like it is starting to work.

When bs in my fio test (write) is set to 10M, the average bio size can
be more than 5MB with multipage bvecs, but only 1MB without them.

Any comments are welcome, :-)

thanks,
Ming Lei


* Re: [Lsf-pc] [LSF/MM ATTEND] block: multipage bvecs
  2016-03-07  8:44       ` Ming Lei
@ 2016-03-21 15:55         ` Christoph Hellwig
  2016-03-22  0:12           ` Ming Lei
  0 siblings, 1 reply; 24+ messages in thread
From: Christoph Hellwig @ 2016-03-21 15:55 UTC (permalink / raw)
  To: Ming Lei; +Cc: Christoph Hellwig, linux-block, Linux FS Devel, lsf-pc

On Mon, Mar 07, 2016 at 04:44:49PM +0800, Ming Lei wrote:
> Just create one branch for multipage bvecs:
> 
> http://kernel.ubuntu.com/git/ming/linux.git/log/?h=v4.5-rc-multipage-bvecs-early

Can you send out a few of the prep patches ASAP?  E.g. the
BIO_MAX_SECTORS and BIO_MAX_SIZE removal.

I really don't like the _mp / _sp naming for the iterators,
I think they should have a _page and _segment in the name instead,
and preferably done in a way that breaks compilation for old code,
e.g. never reuse the old names.

I also think it would be really useful to convert bio bounce and raid1
(and generally everything bio based) to deal with multi-page biovecs
instead of making it opt-in.


> 1) converting bio_for_each_segment() to *_rd() and *_wt()
> 
> - this change should be a mechanical thing

I don't think the short prefix naming is very useful.  I'd also
like to understand where the need for these comes from, as this
didn't appear in Kent's patches.  Better documentation of them
would be very useful.

> 
> 2) cleanup on direct access to .bi_io_vec and .bi_vcnt from btrfs,
> dm, bcache, ...
> 
> - this kind of work may take some time to do
> - most of them are in btrfs

Removing access to these from outside the core block code would be
a very useful patchset on its own, so I'd suggest finishing that
before continuing on the rest of the series.


* Re: [Lsf-pc] [LSF/MM ATTEND] block: multipage bvecs
  2016-03-21 15:55         ` Christoph Hellwig
@ 2016-03-22  0:12           ` Ming Lei
  0 siblings, 0 replies; 24+ messages in thread
From: Ming Lei @ 2016-03-22  0:12 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-block, Linux FS Devel, lsf-pc, Boaz Harrosh, Jens Axboe, Ming Lin

Hi,

On Mon, Mar 21, 2016 at 11:55 PM, Christoph Hellwig <hch@infradead.org> wrote:
> On Mon, Mar 07, 2016 at 04:44:49PM +0800, Ming Lei wrote:
>> Just create one branch for multipage bvecs:
>>
>> http://kernel.ubuntu.com/git/ming/linux.git/log/?h=v4.5-rc-multipage-bvecs-early
>

BTW, two bugs are fixed in the following branch:

http://kernel.ubuntu.com/git/ming/linux.git/log/?h=v4.5-rc-block-multipage-bvecs-early.1-ext4-scsi

and ext4 over virtio-blk/virtio-scsi now works with multipage bvecs.

> Can you send out a few of the prep patches ASAP?  E.g. the
> BIO_MAX_SECTORS and BIO_MAX_SIZE removal.

OK.

>
> I really don't like the _mp / _sp naming for the iteratators,
> I think they should have a _page and _segment in the name instead,

Currently I don't want to introduce extra changes; that is why I use
the _mp / _sp naming. For example, bio_for_each_segment() actually
returns a 'page', so if we introduce _page & _segment all the current
callers would have to be changed. Maybe we can delay that to a later
stage.

> and preferably done in a way that breaks compilation for old code,
> e.g. never reuse the old names.
>
> I also think it would be really useful to convert bio bounce and raid1
> (and generally everything bio based) to deal with multi-page biovecs
> instead of making it opt-in.

For bio bounce, dealing with multipage bvecs isn't doable, because
allocating multi-page buffers for bouncing often fails.

>
>
>> 1) converting bio_for_each_segment() to *_rd() and *_wt()
>>
>> - this change should be a mechanical thing
>
> I don't think the short prefix naming is very useful.  I'd also
> like to understand where the need for thee comes from, as this
> didn't appear in Kent's patches.  Better documentation of them
> would be very useful.

I still hate the short prefix naming, but:

bio_for_each_segment_all() returns a pointer into .bi_io_vec directly,
which looks ugly, :-(

Most users only read the bvec that way, but some old users still need
to change the bvec. For users who want to change the bvec, the
standard bio iterator helpers can't work because the bvec is always
copied into a local variable. That is why I introduced the prefix
naming.

But this change is just a mechanical conversion. With a new iterator
variable, it can be very flexible.
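
To spell out the difference with today's mainline helpers ('new_page'
is just a stand-in):

/* the standard iterator hands out a copy, so the stored bvec cannot
 * be changed through it: */
struct bio_vec bv;
struct bvec_iter iter;
unsigned int total = 0;

bio_for_each_segment(bv, bio, iter)
        total += bv.bv_len;             /* read-only use is fine */

/* bio_for_each_segment_all() hands out a pointer straight into
 * bi_io_vec, which the few write-side users rely on: */
struct bio_vec *bvp;
int i;

bio_for_each_segment_all(bvp, bio, i)
        bvp->bv_page = new_page;        /* e.g. what bounce does */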

>
>>
>> 2) cleanup on direct access to .bi_io_vec and .bi_vcnt from btrfs,
>> dm, bcache, ...
>>
>> - this kind of work may take some time to do
>> - most of them are in btrfs
>
> Removing access to these from outside the core block code would be
> a very useful patchset on it's own, so I' dsuggest to finish that
> before continuing on the rest of the series.

Yeah, most of the changes should be in btrfs, and those need to be
finished first. Other block drivers' changes (raid, ...) can be
handled via QUEUE_FLAG_NO_MP at first.

Thanks,
Ming Lei

