* [LSF/MM TOPIC] get_user_pages() pins in file mappings @ 2019-01-24 9:04 Jan Kara 2019-01-26 2:58 ` John Hubbard 2019-02-04 23:46 ` John Hubbard 0 siblings, 2 replies; 5+ messages in thread From: Jan Kara @ 2019-01-24 9:04 UTC (permalink / raw) To: lsf-pc; +Cc: linux-fsdevel, linux-mm, Dan Williams, Jerome Glisse, John Hubbard This is a joint proposal with Dan Williams, John Hubbard, and Jérôme Glisse. Last year we've talked with Dan about issues we have with filesystems and GUP [1]. The crux of the problem lies in the fact that there is no coordination (or even awareness) between filesystem working on a page (such as doing writeback) and GUP user modifying page contents and setting it dirty. This can (and we have user reports of this) lead to data corruption, kernel crashes, and other fun. Since last year we have worked together on solving these problems and we have explored couple dead ends as well as hopefully found solutions to some of the partial problems. So I'd like to give some overview of where we stand and what remains to be solved and get thoughts from wider community about proposed solutions / problems to be solved. In particular we hope to have reasonably robust mechanism of identifying pages pinned by GUP (patches will be posted soon) - I'd like to run that by MM folks (unless discussion happens on mailing lists before LSF/MM). We also have ideas how filesystems should react to pinned page in their writepages methods - there will be some changes needed in some filesystems to bounce the page if they need stable page contents. So I'd like to explain why we chose to do bouncing to fs people (i.e., why we cannot just wait, skip the page, do something else etc.) to save us from the same discussion with each fs separately and also hash out what the API for filesystems to do this should look like. Finally we plan to keep pinned page permanently dirty - again something I'd like to explain why we do this and gather input from other people. This should be ideally shared MM + FS session. [1] https://lwn.net/Articles/753027/ Honza -- Jan Kara <jack@suse.com> SUSE Labs, CR ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [LSF/MM TOPIC] get_user_pages() pins in file mappings 2019-01-24 9:04 [LSF/MM TOPIC] get_user_pages() pins in file mappings Jan Kara @ 2019-01-26 2:58 ` John Hubbard 2019-02-04 23:46 ` John Hubbard 1 sibling, 0 replies; 5+ messages in thread From: John Hubbard @ 2019-01-26 2:58 UTC (permalink / raw) To: Jan Kara, lsf-pc; +Cc: linux-fsdevel, linux-mm, Dan Williams, Jerome Glisse On 1/24/19 1:04 AM, Jan Kara wrote: > This is a joint proposal with Dan Williams, John Hubbard, and Jérôme > Glisse. > > Last year we've talked with Dan about issues we have with filesystems and > GUP [1]. The crux of the problem lies in the fact that there is no > coordination (or even awareness) between filesystem working on a page (such > as doing writeback) and GUP user modifying page contents and setting it > dirty. This can (and we have user reports of this) lead to data corruption, > kernel crashes, and other fun. > > Since last year we have worked together on solving these problems and we > have explored couple dead ends as well as hopefully found solutions to some > of the partial problems. So I'd like to give some overview of where we > stand and what remains to be solved and get thoughts from wider community > about proposed solutions / problems to be solved. > > In particular we hope to have reasonably robust mechanism of identifying > pages pinned by GUP (patches will be posted soon) - I'd like to run that by > MM folks (unless discussion happens on mailing lists before LSF/MM). We > also have ideas how filesystems should react to pinned page in their > writepages methods - there will be some changes needed in some filesystems > to bounce the page if they need stable page contents. So I'd like to > explain why we chose to do bouncing to fs people (i.e., why we cannot just > wait, skip the page, do something else etc.) to save us from the same > discussion with each fs separately and also hash out what the API for > filesystems to do this should look like. Finally we plan to keep pinned > page permanently dirty - again something I'd like to explain why we do this > and gather input from other people. > > This should be ideally shared MM + FS session. > > [1] https://lwn.net/Articles/753027/ > Yes! I'd like to attend and discuss this, for sure. Meanwhile, as usual, I'm a bit late on posting an updated RFC for the page identification part, but that's coming very soon. thanks, -- John Hubbard NVIDIA ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [LSF/MM TOPIC] get_user_pages() pins in file mappings 2019-01-24 9:04 [LSF/MM TOPIC] get_user_pages() pins in file mappings Jan Kara 2019-01-26 2:58 ` John Hubbard @ 2019-02-04 23:46 ` John Hubbard 2019-02-05 11:21 ` Jan Kara 1 sibling, 1 reply; 5+ messages in thread From: John Hubbard @ 2019-02-04 23:46 UTC (permalink / raw) To: Jan Kara, lsf-pc; +Cc: linux-fsdevel, linux-mm, Dan Williams, Jerome Glisse On 1/24/19 1:04 AM, Jan Kara wrote: > In particular we hope to have reasonably robust mechanism of identifying > pages pinned by GUP (patches will be posted soon) - I'd like to run that by > MM folks (unless discussion happens on mailing lists before LSF/MM). We > also have ideas how filesystems should react to pinned page in their > writepages methods - there will be some changes needed in some filesystems > to bounce the page if they need stable page contents. So I'd like to > explain why we chose to do bouncing to fs people (i.e., why we cannot just > wait, skip the page, do something else etc.) to save us from the same > discussion with each fs separately and also hash out what the API for > filesystems to do this should look like. Finally we plan to keep pinned > page permanently dirty - again something I'd like to explain why we do this > and gather input from other people. Hi Jan, Say, I was just talking through this point with someone on our driver team, and suddenly realized that I'm now slightly confused on one point. If we end up keeping the gup-pinned pages effectively permanently dirty while pinned, then maybe the call sites no longer need to specify "dirty" (or not) when they call put_user_page*()? In other words, the RFC [1] has this API: void put_user_page(struct page *page); void put_user_pages_dirty(struct page **pages, unsigned long npages); void put_user_pages_dirty_lock(struct page **pages, unsigned long npages); void put_user_pages(struct page **pages, unsigned long npages); But maybe we only really need this: void put_user_page(struct page *page); void put_user_pages(struct page **pages, unsigned long npages); ? [1] https://lkml.kernel.org/r/20190204052135.25784-1-jhubbard@nvidia.com thanks, -- John Hubbard NVIDIA ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [LSF/MM TOPIC] get_user_pages() pins in file mappings 2019-02-04 23:46 ` John Hubbard @ 2019-02-05 11:21 ` Jan Kara 2019-02-06 2:10 ` John Hubbard 0 siblings, 1 reply; 5+ messages in thread From: Jan Kara @ 2019-02-05 11:21 UTC (permalink / raw) To: John Hubbard Cc: Jan Kara, lsf-pc, linux-fsdevel, linux-mm, Dan Williams, Jerome Glisse Hi John, On Mon 04-02-19 15:46:10, John Hubbard wrote: > On 1/24/19 1:04 AM, Jan Kara wrote: > > > In particular we hope to have reasonably robust mechanism of identifying > > pages pinned by GUP (patches will be posted soon) - I'd like to run that by > > MM folks (unless discussion happens on mailing lists before LSF/MM). We > > also have ideas how filesystems should react to pinned page in their > > writepages methods - there will be some changes needed in some filesystems > > to bounce the page if they need stable page contents. So I'd like to > > explain why we chose to do bouncing to fs people (i.e., why we cannot just > > wait, skip the page, do something else etc.) to save us from the same > > discussion with each fs separately and also hash out what the API for > > filesystems to do this should look like. Finally we plan to keep pinned > > page permanently dirty - again something I'd like to explain why we do this > > and gather input from other people. > > Hi Jan, > > Say, I was just talking through this point with someone on our driver team, > and suddenly realized that I'm now slightly confused on one point. If we end > up keeping the gup-pinned pages effectively permanently dirty while pinned, > then maybe the call sites no longer need to specify "dirty" (or not) when > they call put_user_page*()? > > In other words, the RFC [1] has this API: > > void put_user_page(struct page *page); > void put_user_pages_dirty(struct page **pages, unsigned long npages); > void put_user_pages_dirty_lock(struct page **pages, unsigned long npages); > void put_user_pages(struct page **pages, unsigned long npages); > > But maybe we only really need this: > > void put_user_page(struct page *page); > void put_user_pages(struct page **pages, unsigned long npages); > > ? > > [1] https://lkml.kernel.org/r/20190204052135.25784-1-jhubbard@nvidia.com So you are right that if we keep gup-pinned pages dirty, drivers could get away without marking them as such. However I view "keep pages dirty" as an implementation detail, rather than a promise of the API. So I'd like to leave us the flexibility of choosing a different implementation in the future. And as such I'd just leave the put_user_pages_dirty() variants in place. Honza -- Jan Kara <jack@suse.com> SUSE Labs, CR ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [LSF/MM TOPIC] get_user_pages() pins in file mappings 2019-02-05 11:21 ` Jan Kara @ 2019-02-06 2:10 ` John Hubbard 0 siblings, 0 replies; 5+ messages in thread From: John Hubbard @ 2019-02-06 2:10 UTC (permalink / raw) To: Jan Kara; +Cc: lsf-pc, linux-fsdevel, linux-mm, Dan Williams, Jerome Glisse On 2/5/19 3:21 AM, Jan Kara wrote: > Hi John, > > On Mon 04-02-19 15:46:10, John Hubbard wrote: >> On 1/24/19 1:04 AM, Jan Kara wrote: >> >>> In particular we hope to have reasonably robust mechanism of identifying >>> pages pinned by GUP (patches will be posted soon) - I'd like to run that by >>> MM folks (unless discussion happens on mailing lists before LSF/MM). We >>> also have ideas how filesystems should react to pinned page in their >>> writepages methods - there will be some changes needed in some filesystems >>> to bounce the page if they need stable page contents. So I'd like to >>> explain why we chose to do bouncing to fs people (i.e., why we cannot just >>> wait, skip the page, do something else etc.) to save us from the same >>> discussion with each fs separately and also hash out what the API for >>> filesystems to do this should look like. Finally we plan to keep pinned >>> page permanently dirty - again something I'd like to explain why we do this >>> and gather input from other people. >> >> Hi Jan, >> >> Say, I was just talking through this point with someone on our driver team, >> and suddenly realized that I'm now slightly confused on one point. If we end >> up keeping the gup-pinned pages effectively permanently dirty while pinned, >> then maybe the call sites no longer need to specify "dirty" (or not) when >> they call put_user_page*()? >> >> In other words, the RFC [1] has this API: >> >> void put_user_page(struct page *page); >> void put_user_pages_dirty(struct page **pages, unsigned long npages); >> void put_user_pages_dirty_lock(struct page **pages, unsigned long npages); >> void put_user_pages(struct page **pages, unsigned long npages); >> >> But maybe we only really need this: >> >> void put_user_page(struct page *page); >> void put_user_pages(struct page **pages, unsigned long npages); >> >> ? >> >> [1] https://lkml.kernel.org/r/20190204052135.25784-1-jhubbard@nvidia.com > > So you are right that if we keep gup-pinned pages dirty, drivers could get > away without marking them as such. However I view "keep pages dirty" as an > implementation detail, rather than a promise of the API. So I'd like to > leave us the flexibility of choosing a different implementation in the > future. And as such I'd just leave the put_user_pages_dirty() variants in > place. > > Honza OK, sounds good. And after all, removing that information from the call sites is easy, but adding it back in would be hell, so leaving it in definitely sounds better. I'm just looking for anything that says "this API isn't ready yet", but I guess we're still OK there. thanks, -- John Hubbard NVIDIA ^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2019-02-06 2:10 UTC | newest] Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2019-01-24 9:04 [LSF/MM TOPIC] get_user_pages() pins in file mappings Jan Kara 2019-01-26 2:58 ` John Hubbard 2019-02-04 23:46 ` John Hubbard 2019-02-05 11:21 ` Jan Kara 2019-02-06 2:10 ` John Hubbard
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).