All of lore.kernel.org
 help / color / mirror / Atom feed
* RFA (Request for Advice): block/bio: get_user_pages() --> pin_user_pages()
@ 2022-01-24  7:52 John Hubbard
  2022-01-24 10:05 ` Jan Kara
  2022-01-24 13:18 ` Matthew Wilcox
  0 siblings, 2 replies; 11+ messages in thread
From: John Hubbard @ 2022-01-24  7:52 UTC (permalink / raw)
  To: Christoph Hellwig, Jan Kara, linux-fsdevel, linux-block

Hi,

Background: despite having very little experience in the block and bio
layers, I am attempting to convert the Direct IO parts of them from
using get_user_pages_fast(), to pin_user_pages_fast(). This requires the
use of a corresponding special release call: unpin_user_pages(), instead
of the generic put_page().

Fortunately, Christoph Hellwig has observed [1] (more than once [2]) that
only "a little" refactoring is required, because it is *almost* true
that bio_release_pages() could just be switched over from calling
put_page(), to unpin_user_page(). The "not quite" part is mainly due to
the zero page. There are a few write paths that pad zeroes, and they use
the zero page.

That's where I'd like some advice. How to refactor things, so that the
zero page does not end up in the list of pages that bio_release_pages()
acts upon?

To ground this in reality, one of the partial call stacks is:

do_direct_IO()
     dio_zero_block()
         page = ZERO_PAGE(0); <-- This is a problem

I'm not sure what to use, instead of that zero page! The zero page
doesn't need to be allocated nor tracked, and so any replacement
approaches would need either other storage, or some horrid scheme that I
won't go so far as to write on the screen. :)


[1] https://lore.kernel.org/all/20220120141219.GB11707@lst.de/ (2022)
[2] https://lore.kernel.org/kvm/20190724053053.GA18330@infradead.org/
(2019)


thanks in advance,
-- 
John Hubbard
NVIDIA

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: RFA (Request for Advice): block/bio: get_user_pages() --> pin_user_pages()
  2022-01-24  7:52 RFA (Request for Advice): block/bio: get_user_pages() --> pin_user_pages() John Hubbard
@ 2022-01-24 10:05 ` Jan Kara
  2022-01-24 10:38   ` Chaitanya Kulkarni
  2022-01-24 21:06   ` John Hubbard
  2022-01-24 13:18 ` Matthew Wilcox
  1 sibling, 2 replies; 11+ messages in thread
From: Jan Kara @ 2022-01-24 10:05 UTC (permalink / raw)
  To: John Hubbard; +Cc: Christoph Hellwig, Jan Kara, linux-fsdevel, linux-block

Hello,

On Sun 23-01-22 23:52:07, John Hubbard wrote:
> Background: despite having very little experience in the block and bio
> layers, I am attempting to convert the Direct IO parts of them from
> using get_user_pages_fast(), to pin_user_pages_fast(). This requires the
> use of a corresponding special release call: unpin_user_pages(), instead
> of the generic put_page().
> 
> Fortunately, Christoph Hellwig has observed [1] (more than once [2]) that
> only "a little" refactoring is required, because it is *almost* true
> that bio_release_pages() could just be switched over from calling
> put_page(), to unpin_user_page(). The "not quite" part is mainly due to
> the zero page. There are a few write paths that pad zeroes, and they use
> the zero page.
> 
> That's where I'd like some advice. How to refactor things, so that the
> zero page does not end up in the list of pages that bio_release_pages()
> acts upon?
> 
> To ground this in reality, one of the partial call stacks is:
> 
> do_direct_IO()
>     dio_zero_block()
>         page = ZERO_PAGE(0); <-- This is a problem
> 
> I'm not sure what to use, instead of that zero page! The zero page
> doesn't need to be allocated nor tracked, and so any replacement
> approaches would need either other storage, or some horrid scheme that I
> won't go so far as to write on the screen. :)

Well, I'm not sure if you consider this ugly but currently we use
get_page() in that path exactly so that bio_release_pages() does not have
to care about zero page. So now we could grab pin on the zero page instead
through try_grab_page() or something like that...

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: RFA (Request for Advice): block/bio: get_user_pages() --> pin_user_pages()
  2022-01-24 10:05 ` Jan Kara
@ 2022-01-24 10:38   ` Chaitanya Kulkarni
  2022-01-24 12:19     ` Jan Kara
  2022-01-24 21:06   ` John Hubbard
  1 sibling, 1 reply; 11+ messages in thread
From: Chaitanya Kulkarni @ 2022-01-24 10:38 UTC (permalink / raw)
  To: John Hubbard; +Cc: Jan Kara, Christoph Hellwig, linux-fsdevel, linux-block

On 1/24/22 2:05 AM, Jan Kara wrote:
> External email: Use caution opening links or attachments
> 
> 
> Hello,
> 
> On Sun 23-01-22 23:52:07, John Hubbard wrote:
>> Background: despite having very little experience in the block and bio
>> layers, I am attempting to convert the Direct IO parts of them from
>> using get_user_pages_fast(), to pin_user_pages_fast(). This requires the
>> use of a corresponding special release call: unpin_user_pages(), instead
>> of the generic put_page().
>>
>> Fortunately, Christoph Hellwig has observed [1] (more than once [2]) that
>> only "a little" refactoring is required, because it is *almost* true
>> that bio_release_pages() could just be switched over from calling
>> put_page(), to unpin_user_page(). The "not quite" part is mainly due to
>> the zero page. There are a few write paths that pad zeroes, and they use
>> the zero page.
>>
>> That's where I'd like some advice. How to refactor things, so that the
>> zero page does not end up in the list of pages that bio_release_pages()
>> acts upon?

this maybe wrong but thinking out loudly, have you consider adding a 
ZERO_PAGE() address check since it should have a unique same
address for each ZERO_PAGE() (unless I'm totally wrong here) and
using this check you can distinguish between ZERO_PAGE() and
non ZERO_PAGE() on the bio list in bio_release_pages().

>>
>> To ground this in reality, one of the partial call stacks is:
>>
>> do_direct_IO()
>>      dio_zero_block()
>>          page = ZERO_PAGE(0); <-- This is a problem
>>
>> I'm not sure what to use, instead of that zero page! The zero page
>> doesn't need to be allocated nor tracked, and so any replacement
>> approaches would need either other storage, or some horrid scheme that I
>> won't go so far as to write on the screen. :)
> 
> Well, I'm not sure if you consider this ugly but currently we use
> get_page() in that path exactly so that bio_release_pages() does not have
> to care about zero page. So now we could grab pin on the zero page instead
> through try_grab_page() or something like that...
> 
>        

submit_page_section() does call get_page() in that same path 
irrespective of whether it is ZERO_PAGE() or not, this actually
makes accounting much easier and we also avoid  any special case
for ZERO_PAGE().

dio_zero_block()
  submit_page_section()
    get_page()


That also means that on completion of dio for each bio we also call
put_page() from bio_release_page() path.

                                                           Honza
> --
> Jan Kara <jack@suse.com>
> SUSE Labs, CR
> 


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: RFA (Request for Advice): block/bio: get_user_pages() --> pin_user_pages()
  2022-01-24 10:38   ` Chaitanya Kulkarni
@ 2022-01-24 12:19     ` Jan Kara
  2022-01-24 19:34       ` John Hubbard
  0 siblings, 1 reply; 11+ messages in thread
From: Jan Kara @ 2022-01-24 12:19 UTC (permalink / raw)
  To: Chaitanya Kulkarni
  Cc: John Hubbard, Jan Kara, Christoph Hellwig, linux-fsdevel, linux-block

On Mon 24-01-22 10:38:13, Chaitanya Kulkarni wrote:
> On 1/24/22 2:05 AM, Jan Kara wrote:
> > External email: Use caution opening links or attachments
> > 
> > Hello,
> > 
> > On Sun 23-01-22 23:52:07, John Hubbard wrote:
> >> Background: despite having very little experience in the block and bio
> >> layers, I am attempting to convert the Direct IO parts of them from
> >> using get_user_pages_fast(), to pin_user_pages_fast(). This requires the
> >> use of a corresponding special release call: unpin_user_pages(), instead
> >> of the generic put_page().
> >>
> >> Fortunately, Christoph Hellwig has observed [1] (more than once [2]) that
> >> only "a little" refactoring is required, because it is *almost* true
> >> that bio_release_pages() could just be switched over from calling
> >> put_page(), to unpin_user_page(). The "not quite" part is mainly due to
> >> the zero page. There are a few write paths that pad zeroes, and they use
> >> the zero page.
> >>
> >> That's where I'd like some advice. How to refactor things, so that the
> >> zero page does not end up in the list of pages that bio_release_pages()
> >> acts upon?
> 
> this maybe wrong but thinking out loudly, have you consider adding a 
> ZERO_PAGE() address check since it should have a unique same
> address for each ZERO_PAGE() (unless I'm totally wrong here) and
> using this check you can distinguish between ZERO_PAGE() and
> non ZERO_PAGE() on the bio list in bio_release_pages().

Well, that is another option but it seems a bit ugly and also on some
architectures (e.g. s390 AFAICS) there can be multiple zero pages (due to
coloring) so the test for zero page is not completely trivial (probably we
would have to grow some is_zero_page() checking function implemented
separately for each arch).

> >> To ground this in reality, one of the partial call stacks is:
> >>
> >> do_direct_IO()
> >>      dio_zero_block()
> >>          page = ZERO_PAGE(0); <-- This is a problem
> >>
> >> I'm not sure what to use, instead of that zero page! The zero page
> >> doesn't need to be allocated nor tracked, and so any replacement
> >> approaches would need either other storage, or some horrid scheme that I
> >> won't go so far as to write on the screen. :)
> > 
> > Well, I'm not sure if you consider this ugly but currently we use
> > get_page() in that path exactly so that bio_release_pages() does not have
> > to care about zero page. So now we could grab pin on the zero page instead
> > through try_grab_page() or something like that...
> > 
> >        
> 
> submit_page_section() does call get_page() in that same path 
> irrespective of whether it is ZERO_PAGE() or not, this actually
> makes accounting much easier and we also avoid  any special case
> for ZERO_PAGE().
> 
> dio_zero_block()
>   submit_page_section()
>     get_page()
> 
> 
> That also means that on completion of dio for each bio we also call
> put_page() from bio_release_page() path.

Well, yes. fs/direct-io.c grabs two page references in fact. One is passed
along with the bio and released by bio_release_pages(), the other one is
released directly in fs/direct-io.c after IO is submitted. We need to
somewhat reorganize the code to better define which is which because with
pinning they would be of different kind.

For zero page we currently grab the reference in dio_refill_pages() (this
needs to be converted to try_grab_page()) and we don't do anything in
dio_zero_block() - that code needs to be cleaned up to also call
try_grab_page() and we need to somehow deal with the ref currently grabbed
in submit_page_section()...

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: RFA (Request for Advice): block/bio: get_user_pages() --> pin_user_pages()
  2022-01-24  7:52 RFA (Request for Advice): block/bio: get_user_pages() --> pin_user_pages() John Hubbard
  2022-01-24 10:05 ` Jan Kara
@ 2022-01-24 13:18 ` Matthew Wilcox
  2022-01-24 19:48   ` John Hubbard
  1 sibling, 1 reply; 11+ messages in thread
From: Matthew Wilcox @ 2022-01-24 13:18 UTC (permalink / raw)
  To: John Hubbard; +Cc: Christoph Hellwig, Jan Kara, linux-fsdevel, linux-block

On Sun, Jan 23, 2022 at 11:52:07PM -0800, John Hubbard wrote:
> To ground this in reality, one of the partial call stacks is:
> 
> do_direct_IO()
>     dio_zero_block()
>         page = ZERO_PAGE(0); <-- This is a problem
> 
> I'm not sure what to use, instead of that zero page! The zero page
> doesn't need to be allocated nor tracked, and so any replacement
> approaches would need either other storage, or some horrid scheme that I
> won't go so far as to write on the screen. :)

I'm not really sure what the problem is.

include/linux/mm.h:             is_zero_pfn(page_to_pfn(page));

and release_pages() already contains:
                if (is_huge_zero_page(page))
                        continue;

Why can't the BIO release function contain an is_zero_pfn() check?

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: RFA (Request for Advice): block/bio: get_user_pages() --> pin_user_pages()
  2022-01-24 12:19     ` Jan Kara
@ 2022-01-24 19:34       ` John Hubbard
  2022-01-24 21:06         ` Jan Kara
  0 siblings, 1 reply; 11+ messages in thread
From: John Hubbard @ 2022-01-24 19:34 UTC (permalink / raw)
  To: Jan Kara, Chaitanya Kulkarni
  Cc: Christoph Hellwig, linux-fsdevel, linux-block

On 1/24/22 04:19, Jan Kara wrote:
...
>> this maybe wrong but thinking out loudly, have you consider adding a
>> ZERO_PAGE() address check since it should have a unique same

This crossed my mind, but I thought I might be missing something better
on the submission side, such as some way to do all of this without a
zero page.

>> address for each ZERO_PAGE() (unless I'm totally wrong here) and
>> using this check you can distinguish between ZERO_PAGE() and
>> non ZERO_PAGE() on the bio list in bio_release_pages().
> 
> Well, that is another option but it seems a bit ugly and also on some
> architectures (e.g. s390 AFAICS) there can be multiple zero pages (due to
> coloring) so the test for zero page is not completely trivial (probably we
> would have to grow some is_zero_page() checking function implemented
> separately for each arch).

Good point. And adding an is_zero_page() function would also make some
of these invocations correct across all architectures:

     is_zero_pfn(page_to_pfn(page))

...so it would also be a fix or at least an upgrade.

I had also wondered why there is no is_zero_page() wrapper function for
the above invocation. Maybe because there are only four call sites and
no one saw it as worthwhile yet.

Anyway, this looks viable, thanks for the quick responses!


thanks,
-- 
John Hubbard
NVIDIA

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: RFA (Request for Advice): block/bio: get_user_pages() --> pin_user_pages()
  2022-01-24 13:18 ` Matthew Wilcox
@ 2022-01-24 19:48   ` John Hubbard
  2022-01-26  5:42     ` Chaitanya Kulkarni
  0 siblings, 1 reply; 11+ messages in thread
From: John Hubbard @ 2022-01-24 19:48 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: Christoph Hellwig, Jan Kara, linux-fsdevel, linux-block

On 1/24/22 05:18, Matthew Wilcox wrote:
> On Sun, Jan 23, 2022 at 11:52:07PM -0800, John Hubbard wrote:
>> To ground this in reality, one of the partial call stacks is:
>>
>> do_direct_IO()
>>      dio_zero_block()
>>          page = ZERO_PAGE(0); <-- This is a problem
>>
>> I'm not sure what to use, instead of that zero page! The zero page
>> doesn't need to be allocated nor tracked, and so any replacement
>> approaches would need either other storage, or some horrid scheme that I
>> won't go so far as to write on the screen. :)
> 
> I'm not really sure what the problem is.
> 
> include/linux/mm.h:             is_zero_pfn(page_to_pfn(page));
> 
> and release_pages() already contains:
>                  if (is_huge_zero_page(page))
>                          continue;
> 
> Why can't the BIO release function contain an is_zero_pfn() check?

"OK." (With potential modifications as noted in the other thread with
Jan Kara and Chaitanya, we'll see.)

Thanks for responding with that answer so quickly, much appreciated!


thanks,
-- 
John Hubbard
NVIDIA

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: RFA (Request for Advice): block/bio: get_user_pages() --> pin_user_pages()
  2022-01-24 10:05 ` Jan Kara
  2022-01-24 10:38   ` Chaitanya Kulkarni
@ 2022-01-24 21:06   ` John Hubbard
  2022-01-24 22:17     ` Jan Kara
  1 sibling, 1 reply; 11+ messages in thread
From: John Hubbard @ 2022-01-24 21:06 UTC (permalink / raw)
  To: Jan Kara; +Cc: Christoph Hellwig, linux-fsdevel, linux-block

On 1/24/22 02:05, Jan Kara wrote:
...
>> do_direct_IO()
>>      dio_zero_block()
>>          page = ZERO_PAGE(0); <-- This is a problem
>>
>> I'm not sure what to use, instead of that zero page! The zero page
>> doesn't need to be allocated nor tracked, and so any replacement
>> approaches would need either other storage, or some horrid scheme that I
>> won't go so far as to write on the screen. :)
> 
> Well, I'm not sure if you consider this ugly but currently we use
> get_page() in that path exactly so that bio_release_pages() does not have
> to care about zero page. So now we could grab pin on the zero page instead
> through try_grab_page() or something like that...
> 
> 								Honza

So it sounds like you prefer this over checking for the zero page in
bio_release_pages(). I'll take a look at both ideas, then, and see what
it looks like.


thanks,
-- 
John Hubbard
NVIDIA

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: RFA (Request for Advice): block/bio: get_user_pages() --> pin_user_pages()
  2022-01-24 19:34       ` John Hubbard
@ 2022-01-24 21:06         ` Jan Kara
  0 siblings, 0 replies; 11+ messages in thread
From: Jan Kara @ 2022-01-24 21:06 UTC (permalink / raw)
  To: John Hubbard
  Cc: Jan Kara, Chaitanya Kulkarni, Christoph Hellwig, linux-fsdevel,
	linux-block

On Mon 24-01-22 11:34:25, John Hubbard wrote:
> On 1/24/22 04:19, Jan Kara wrote:
> > > address for each ZERO_PAGE() (unless I'm totally wrong here) and
> > > using this check you can distinguish between ZERO_PAGE() and
> > > non ZERO_PAGE() on the bio list in bio_release_pages().
> > 
> > Well, that is another option but it seems a bit ugly and also on some
> > architectures (e.g. s390 AFAICS) there can be multiple zero pages (due to
> > coloring) so the test for zero page is not completely trivial (probably we
> > would have to grow some is_zero_page() checking function implemented
> > separately for each arch).
> 
> Good point. And adding an is_zero_page() function would also make some
> of these invocations correct across all architectures:
> 
>     is_zero_pfn(page_to_pfn(page))
> 
> ...so it would also be a fix or at least an upgrade.

As I'm checking the is_zero_pfn() implementation, it should be working as it
should for all architectures. I just forgot we already have a function for
this.

> I had also wondered why there is no is_zero_page() wrapper function for
> the above invocation. Maybe because there are only four call sites and
> no one saw it as worthwhile yet.

Yeah, perhaps.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: RFA (Request for Advice): block/bio: get_user_pages() --> pin_user_pages()
  2022-01-24 21:06   ` John Hubbard
@ 2022-01-24 22:17     ` Jan Kara
  0 siblings, 0 replies; 11+ messages in thread
From: Jan Kara @ 2022-01-24 22:17 UTC (permalink / raw)
  To: John Hubbard; +Cc: Jan Kara, Christoph Hellwig, linux-fsdevel, linux-block

On Mon 24-01-22 13:06:03, John Hubbard wrote:
> On 1/24/22 02:05, Jan Kara wrote:
> ...
> > > do_direct_IO()
> > >      dio_zero_block()
> > >          page = ZERO_PAGE(0); <-- This is a problem
> > > 
> > > I'm not sure what to use, instead of that zero page! The zero page
> > > doesn't need to be allocated nor tracked, and so any replacement
> > > approaches would need either other storage, or some horrid scheme that I
> > > won't go so far as to write on the screen. :)
> > 
> > Well, I'm not sure if you consider this ugly but currently we use
> > get_page() in that path exactly so that bio_release_pages() does not have
> > to care about zero page. So now we could grab pin on the zero page instead
> > through try_grab_page() or something like that...
> > 
> > 								Honza
> 
> So it sounds like you prefer this over checking for the zero page in
> bio_release_pages(). I'll take a look at both ideas, then, and see what
> it looks like.

Yes, I somewhat prefer this because it seems more transparent to me.
Furthermore if e.g. we can have zero page mapped to userspace (not sure if
we can for normal mappings but at least for DAX mapping we can), and userspace
provides such mapping as a buffer for direct IO write, then we'll get zero
page attached to bio through iov_iter_get_pages() and we'd have to be very
careful to special-case zero page in iov_iter_get_pages() as well. Overall
it seems fragile to me... So it seems more robust to make sure all pages we
attach to bio are pinned.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: RFA (Request for Advice): block/bio: get_user_pages() --> pin_user_pages()
  2022-01-24 19:48   ` John Hubbard
@ 2022-01-26  5:42     ` Chaitanya Kulkarni
  0 siblings, 0 replies; 11+ messages in thread
From: Chaitanya Kulkarni @ 2022-01-26  5:42 UTC (permalink / raw)
  To: John Hubbard
  Cc: Christoph Hellwig, Jan Kara, linux-fsdevel, linux-block, Matthew Wilcox

On 1/24/22 11:48 AM, John Hubbard wrote:
> External email: Use caution opening links or attachments
> 
> 
> On 1/24/22 05:18, Matthew Wilcox wrote:
>> On Sun, Jan 23, 2022 at 11:52:07PM -0800, John Hubbard wrote:
>>> To ground this in reality, one of the partial call stacks is:
>>>
>>> do_direct_IO()
>>>      dio_zero_block()
>>>          page = ZERO_PAGE(0); <-- This is a problem
>>>
>>> I'm not sure what to use, instead of that zero page! The zero page
>>> doesn't need to be allocated nor tracked, and so any replacement
>>> approaches would need either other storage, or some horrid scheme that I
>>> won't go so far as to write on the screen. :)
>>
>> I'm not really sure what the problem is.
>>
>> include/linux/mm.h:             is_zero_pfn(page_to_pfn(page));
>>
>> and release_pages() already contains:
>>                  if (is_huge_zero_page(page))
>>                          continue;
>>
>> Why can't the BIO release function contain an is_zero_pfn() check?
> 
> "OK." (With potential modifications as noted in the other thread with
> Jan Kara and Chaitanya, we'll see.)
> 
> Thanks for responding with that answer so quickly, much appreciated!
> 
> 

Just one more thing, sine we will be modifying the code for
allocation and de-allocation either way, it will be worth
documenting the performance impact on dio code with and
without the solutioneven if that is negligible in the cover-letter.

> thanks,
> -- 
> John Hubbard
> NVIDIA


^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2022-01-26  5:43 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-01-24  7:52 RFA (Request for Advice): block/bio: get_user_pages() --> pin_user_pages() John Hubbard
2022-01-24 10:05 ` Jan Kara
2022-01-24 10:38   ` Chaitanya Kulkarni
2022-01-24 12:19     ` Jan Kara
2022-01-24 19:34       ` John Hubbard
2022-01-24 21:06         ` Jan Kara
2022-01-24 21:06   ` John Hubbard
2022-01-24 22:17     ` Jan Kara
2022-01-24 13:18 ` Matthew Wilcox
2022-01-24 19:48   ` John Hubbard
2022-01-26  5:42     ` Chaitanya Kulkarni

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.