Linux-Fsdevel Archive on lore.kernel.org
 help / color / Atom feed
* [LSF/MM TOPIC] get_user_pages() pins in file mappings
@ 2019-01-24  9:04 Jan Kara
  2019-01-26  2:58 ` John Hubbard
  2019-02-04 23:46 ` John Hubbard
  0 siblings, 2 replies; 5+ messages in thread
From: Jan Kara @ 2019-01-24  9:04 UTC (permalink / raw)
  To: lsf-pc; +Cc: linux-fsdevel, linux-mm, Dan Williams, Jerome Glisse, John Hubbard

This is a joint proposal with Dan Williams, John Hubbard, and Jérôme
Glisse.

Last year we've talked with Dan about issues we have with filesystems and
GUP [1]. The crux of the problem lies in the fact that there is no
coordination (or even awareness) between filesystem working on a page (such
as doing writeback) and GUP user modifying page contents and setting it
dirty. This can (and we have user reports of this) lead to data corruption,
kernel crashes, and other fun.

Since last year we have worked together on solving these problems and we
have explored couple dead ends as well as hopefully found solutions to some
of the partial problems. So I'd like to give some overview of where we
stand and what remains to be solved and get thoughts from wider community
about proposed solutions / problems to be solved.

In particular we hope to have reasonably robust mechanism of identifying
pages pinned by GUP (patches will be posted soon) - I'd like to run that by
MM folks (unless discussion happens on mailing lists before LSF/MM). We
also have ideas how filesystems should react to pinned page in their
writepages methods - there will be some changes needed in some filesystems
to bounce the page if they need stable page contents. So I'd like to
explain why we chose to do bouncing to fs people (i.e., why we cannot just
wait, skip the page, do something else etc.) to save us from the same
discussion with each fs separately and also hash out what the API for
filesystems to do this should look like. Finally we plan to keep pinned
page permanently dirty - again something I'd like to explain why we do this
and gather input from other people.

This should be ideally shared MM + FS session.

[1] https://lwn.net/Articles/753027/

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [LSF/MM TOPIC] get_user_pages() pins in file mappings
  2019-01-24  9:04 [LSF/MM TOPIC] get_user_pages() pins in file mappings Jan Kara
@ 2019-01-26  2:58 ` John Hubbard
  2019-02-04 23:46 ` John Hubbard
  1 sibling, 0 replies; 5+ messages in thread
From: John Hubbard @ 2019-01-26  2:58 UTC (permalink / raw)
  To: Jan Kara, lsf-pc; +Cc: linux-fsdevel, linux-mm, Dan Williams, Jerome Glisse

On 1/24/19 1:04 AM, Jan Kara wrote:
> This is a joint proposal with Dan Williams, John Hubbard, and Jérôme
> Glisse.
> 
> Last year we've talked with Dan about issues we have with filesystems and
> GUP [1]. The crux of the problem lies in the fact that there is no
> coordination (or even awareness) between filesystem working on a page (such
> as doing writeback) and GUP user modifying page contents and setting it
> dirty. This can (and we have user reports of this) lead to data corruption,
> kernel crashes, and other fun.
> 
> Since last year we have worked together on solving these problems and we
> have explored couple dead ends as well as hopefully found solutions to some
> of the partial problems. So I'd like to give some overview of where we
> stand and what remains to be solved and get thoughts from wider community
> about proposed solutions / problems to be solved.
> 
> In particular we hope to have reasonably robust mechanism of identifying
> pages pinned by GUP (patches will be posted soon) - I'd like to run that by
> MM folks (unless discussion happens on mailing lists before LSF/MM). We
> also have ideas how filesystems should react to pinned page in their
> writepages methods - there will be some changes needed in some filesystems
> to bounce the page if they need stable page contents. So I'd like to
> explain why we chose to do bouncing to fs people (i.e., why we cannot just
> wait, skip the page, do something else etc.) to save us from the same
> discussion with each fs separately and also hash out what the API for
> filesystems to do this should look like. Finally we plan to keep pinned
> page permanently dirty - again something I'd like to explain why we do this
> and gather input from other people.
> 
> This should be ideally shared MM + FS session.
> 
> [1] https://lwn.net/Articles/753027/
> 

Yes! I'd like to attend and discuss this, for sure. 

Meanwhile, as usual, I'm a bit late on posting an updated RFC for the page
identification part, but that's coming very soon.


thanks,
-- 
John Hubbard
NVIDIA

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [LSF/MM TOPIC] get_user_pages() pins in file mappings
  2019-01-24  9:04 [LSF/MM TOPIC] get_user_pages() pins in file mappings Jan Kara
  2019-01-26  2:58 ` John Hubbard
@ 2019-02-04 23:46 ` John Hubbard
  2019-02-05 11:21   ` Jan Kara
  1 sibling, 1 reply; 5+ messages in thread
From: John Hubbard @ 2019-02-04 23:46 UTC (permalink / raw)
  To: Jan Kara, lsf-pc; +Cc: linux-fsdevel, linux-mm, Dan Williams, Jerome Glisse

On 1/24/19 1:04 AM, Jan Kara wrote:

> In particular we hope to have reasonably robust mechanism of identifying
> pages pinned by GUP (patches will be posted soon) - I'd like to run that by
> MM folks (unless discussion happens on mailing lists before LSF/MM). We
> also have ideas how filesystems should react to pinned page in their
> writepages methods - there will be some changes needed in some filesystems
> to bounce the page if they need stable page contents. So I'd like to
> explain why we chose to do bouncing to fs people (i.e., why we cannot just
> wait, skip the page, do something else etc.) to save us from the same
> discussion with each fs separately and also hash out what the API for
> filesystems to do this should look like. Finally we plan to keep pinned
> page permanently dirty - again something I'd like to explain why we do this
> and gather input from other people.

Hi Jan,

Say, I was just talking through this point with someone on our driver team, 
and suddenly realized that I'm now slightly confused on one point. If we end
up keeping the gup-pinned pages effectively permanently dirty while pinned,
then maybe the call sites no longer need to specify "dirty" (or not) when
they call put_user_page*()?

In other words, the RFC [1] has this API:

    void put_user_page(struct page *page);
    void put_user_pages_dirty(struct page **pages, unsigned long npages);
    void put_user_pages_dirty_lock(struct page **pages, unsigned long npages);
    void put_user_pages(struct page **pages, unsigned long npages);

But maybe we only really need this:

    void put_user_page(struct page *page);
    void put_user_pages(struct page **pages, unsigned long npages);

?

[1] https://lkml.kernel.org/r/20190204052135.25784-1-jhubbard@nvidia.com

thanks,
-- 
John Hubbard
NVIDIA

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [LSF/MM TOPIC] get_user_pages() pins in file mappings
  2019-02-04 23:46 ` John Hubbard
@ 2019-02-05 11:21   ` Jan Kara
  2019-02-06  2:10     ` John Hubbard
  0 siblings, 1 reply; 5+ messages in thread
From: Jan Kara @ 2019-02-05 11:21 UTC (permalink / raw)
  To: John Hubbard
  Cc: Jan Kara, lsf-pc, linux-fsdevel, linux-mm, Dan Williams, Jerome Glisse

Hi John,

On Mon 04-02-19 15:46:10, John Hubbard wrote:
> On 1/24/19 1:04 AM, Jan Kara wrote:
> 
> > In particular we hope to have reasonably robust mechanism of identifying
> > pages pinned by GUP (patches will be posted soon) - I'd like to run that by
> > MM folks (unless discussion happens on mailing lists before LSF/MM). We
> > also have ideas how filesystems should react to pinned page in their
> > writepages methods - there will be some changes needed in some filesystems
> > to bounce the page if they need stable page contents. So I'd like to
> > explain why we chose to do bouncing to fs people (i.e., why we cannot just
> > wait, skip the page, do something else etc.) to save us from the same
> > discussion with each fs separately and also hash out what the API for
> > filesystems to do this should look like. Finally we plan to keep pinned
> > page permanently dirty - again something I'd like to explain why we do this
> > and gather input from other people.
> 
> Hi Jan,
> 
> Say, I was just talking through this point with someone on our driver team, 
> and suddenly realized that I'm now slightly confused on one point. If we end
> up keeping the gup-pinned pages effectively permanently dirty while pinned,
> then maybe the call sites no longer need to specify "dirty" (or not) when
> they call put_user_page*()?
> 
> In other words, the RFC [1] has this API:
> 
>     void put_user_page(struct page *page);
>     void put_user_pages_dirty(struct page **pages, unsigned long npages);
>     void put_user_pages_dirty_lock(struct page **pages, unsigned long npages);
>     void put_user_pages(struct page **pages, unsigned long npages);
> 
> But maybe we only really need this:
> 
>     void put_user_page(struct page *page);
>     void put_user_pages(struct page **pages, unsigned long npages);
> 
> ?
> 
> [1] https://lkml.kernel.org/r/20190204052135.25784-1-jhubbard@nvidia.com

So you are right that if we keep gup-pinned pages dirty, drivers could get
away without marking them as such. However I view "keep pages dirty" as an
implementation detail, rather than a promise of the API. So I'd like to
leave us the flexibility of choosing a different implementation in the
future. And as such I'd just leave the put_user_pages_dirty() variants in
place.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [LSF/MM TOPIC] get_user_pages() pins in file mappings
  2019-02-05 11:21   ` Jan Kara
@ 2019-02-06  2:10     ` John Hubbard
  0 siblings, 0 replies; 5+ messages in thread
From: John Hubbard @ 2019-02-06  2:10 UTC (permalink / raw)
  To: Jan Kara; +Cc: lsf-pc, linux-fsdevel, linux-mm, Dan Williams, Jerome Glisse

On 2/5/19 3:21 AM, Jan Kara wrote:
> Hi John,
> 
> On Mon 04-02-19 15:46:10, John Hubbard wrote:
>> On 1/24/19 1:04 AM, Jan Kara wrote:
>>
>>> In particular we hope to have reasonably robust mechanism of identifying
>>> pages pinned by GUP (patches will be posted soon) - I'd like to run that by
>>> MM folks (unless discussion happens on mailing lists before LSF/MM). We
>>> also have ideas how filesystems should react to pinned page in their
>>> writepages methods - there will be some changes needed in some filesystems
>>> to bounce the page if they need stable page contents. So I'd like to
>>> explain why we chose to do bouncing to fs people (i.e., why we cannot just
>>> wait, skip the page, do something else etc.) to save us from the same
>>> discussion with each fs separately and also hash out what the API for
>>> filesystems to do this should look like. Finally we plan to keep pinned
>>> page permanently dirty - again something I'd like to explain why we do this
>>> and gather input from other people.
>>
>> Hi Jan,
>>
>> Say, I was just talking through this point with someone on our driver team, 
>> and suddenly realized that I'm now slightly confused on one point. If we end
>> up keeping the gup-pinned pages effectively permanently dirty while pinned,
>> then maybe the call sites no longer need to specify "dirty" (or not) when
>> they call put_user_page*()?
>>
>> In other words, the RFC [1] has this API:
>>
>>     void put_user_page(struct page *page);
>>     void put_user_pages_dirty(struct page **pages, unsigned long npages);
>>     void put_user_pages_dirty_lock(struct page **pages, unsigned long npages);
>>     void put_user_pages(struct page **pages, unsigned long npages);
>>
>> But maybe we only really need this:
>>
>>     void put_user_page(struct page *page);
>>     void put_user_pages(struct page **pages, unsigned long npages);
>>
>> ?
>>
>> [1] https://lkml.kernel.org/r/20190204052135.25784-1-jhubbard@nvidia.com
> 
> So you are right that if we keep gup-pinned pages dirty, drivers could get
> away without marking them as such. However I view "keep pages dirty" as an
> implementation detail, rather than a promise of the API. So I'd like to
> leave us the flexibility of choosing a different implementation in the
> future. And as such I'd just leave the put_user_pages_dirty() variants in
> place.
> 
> 								Honza

OK, sounds good. And after all, removing that information from the call sites 
is easy, but adding it back in would be hell, so leaving it in definitely
sounds better. I'm just looking for anything that says "this API isn't ready
yet", but I guess we're still OK there.


thanks,
-- 
John Hubbard
NVIDIA

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, back to index

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-01-24  9:04 [LSF/MM TOPIC] get_user_pages() pins in file mappings Jan Kara
2019-01-26  2:58 ` John Hubbard
2019-02-04 23:46 ` John Hubbard
2019-02-05 11:21   ` Jan Kara
2019-02-06  2:10     ` John Hubbard

Linux-Fsdevel Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-fsdevel/0 linux-fsdevel/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-fsdevel linux-fsdevel/ https://lore.kernel.org/linux-fsdevel \
		linux-fsdevel@vger.kernel.org linux-fsdevel@archiver.kernel.org
	public-inbox-index linux-fsdevel


Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-fsdevel


AGPL code for this site: git clone https://public-inbox.org/ public-inbox