From: John Hubbard <jhubbard@nvidia.com> To: "Dan Williams" <dan.j.williams@intel.com>, "Jérôme Glisse" <jglisse@redhat.com> Cc: John Hubbard <john.hubbard@gmail.com>, Andrew Morton <akpm@linux-foundation.org>, Linux MM <linux-mm@kvack.org>, Jan Kara <jack@suse.cz>, <tom@talpey.com>, Al Viro <viro@zeniv.linux.org.uk>, <benve@cisco.com>, Christoph Hellwig <hch@infradead.org>, Christopher Lameter <cl@linux.com>, "Dalessandro, Dennis" <dennis.dalessandro@intel.com>, Doug Ledford <dledford@redhat.com>, Jason Gunthorpe <jgg@ziepe.ca>, Matthew Wilcox <willy@infradead.org>, Michal Hocko <mhocko@kernel.org>, <mike.marciniszyn@intel.com>, <rcampbell@nvidia.com>, Linux Kernel Mailing List <linux-kernel@vger.kernel.org>, linux-fsdevel <linux-fsdevel@vger.kernel.org> Subject: Re: [PATCH 1/2] mm: introduce put_user_page*(), placeholder versions Date: Tue, 4 Dec 2018 16:59:55 -0800 Message-ID: <6455f657-708d-5b7f-00bf-89ca8a226c8e@nvidia.com> (raw) In-Reply-To: <CAPcyv4hhkB09eBzPErVntZrNYE0RP5+6Z6+sP0kq6ng9ZOzTfg@mail.gmail.com> On 12/4/18 4:40 PM, Dan Williams wrote: > On Tue, Dec 4, 2018 at 4:37 PM Jerome Glisse <jglisse@redhat.com> wrote: >> >> On Tue, Dec 04, 2018 at 03:03:02PM -0800, Dan Williams wrote: >>> On Tue, Dec 4, 2018 at 1:56 PM John Hubbard <jhubbard@nvidia.com> wrote: >>>> >>>> On 12/4/18 12:28 PM, Dan Williams wrote: >>>>> On Mon, Dec 3, 2018 at 4:17 PM <john.hubbard@gmail.com> wrote: >>>>>> >>>>>> From: John Hubbard <jhubbard@nvidia.com> >>>>>> >>>>>> Introduces put_user_page(), which simply calls put_page(). >>>>>> This provides a way to update all get_user_pages*() callers, >>>>>> so that they call put_user_page(), instead of put_page(). >>>>>> >>>>>> Also introduces put_user_pages(), and a few dirty/locked variations, >>>>>> as a replacement for release_pages(), and also as a replacement >>>>>> for open-coded loops that release multiple pages. >>>>>> These may be used for subsequent performance improvements, >>>>>> via batching of pages to be released. >>>>>> >>>>>> This is the first step of fixing the problem described in [1]. The steps >>>>>> are: >>>>>> >>>>>> 1) (This patch): provide put_user_page*() routines, intended to be used >>>>>> for releasing pages that were pinned via get_user_pages*(). >>>>>> >>>>>> 2) Convert all of the call sites for get_user_pages*(), to >>>>>> invoke put_user_page*(), instead of put_page(). This involves dozens of >>>>>> call sites, and will take some time. >>>>>> >>>>>> 3) After (2) is complete, use get_user_pages*() and put_user_page*() to >>>>>> implement tracking of these pages. This tracking will be separate from >>>>>> the existing struct page refcounting. >>>>>> >>>>>> 4) Use the tracking and identification of these pages, to implement >>>>>> special handling (especially in writeback paths) when the pages are >>>>>> backed by a filesystem. Again, [1] provides details as to why that is >>>>>> desirable. >>>>> >>>>> I thought at Plumbers we talked about using a page bit to tag pages >>>>> that have had their reference count elevated by get_user_pages()? That >>>>> way there is no need to distinguish put_page() from put_user_page() it >>>>> just happens internally to put_page(). At the conference Matthew was >>>>> offering to free up a page bit for this purpose. >>>>> >>>> >>>> ...but then, upon further discussion in that same session, we realized that >>>> that doesn't help. You need a reference count. Otherwise a random put_page >>>> could affect your dma-pinned pages, etc, etc. >>> >>> Ok, sorry, I mis-remembered. So, you're effectively trying to capture >>> the end of the page pin event separate from the final 'put' of the >>> page? Makes sense. >>> >>>> I was not able to actually find any place where a single additional page >>>> bit would help our situation, which is why this still uses LRU fields for >>>> both the two bits required (the RFC [1] still applies), and the dma_pinned_count. >>> >>> Except the LRU fields are already in use for ZONE_DEVICE pages... how >>> does this proposal interact with those? >>> >>>> [1] https://lore.kernel.org/r/20181110085041.10071-7-jhubbard@nvidia.com >>>> >>>>>> [1] https://lwn.net/Articles/753027/ : "The Trouble with get_user_pages()" >>>>>> >>>>>> Reviewed-by: Jan Kara <jack@suse.cz> >>>>> >>>>> Wish, you could have been there Jan. I'm missing why it's safe to >>>>> assume that a single put_user_page() is paired with a get_user_page()? >>>>> >>>> >>>> A put_user_page() per page, or a put_user_pages() for an array of pages. See >>>> patch 0002 for several examples. >>> >>> Yes, however I was more concerned about validation and trying to >>> locate missed places where put_page() is used instead of >>> put_user_page(). >>> >>> It would be interesting to see if we could have a debug mode where >>> get_user_pages() returned dynamically allocated pages from a known >>> address range and catch drivers that operate on a user-pinned page >>> without using the proper helper to 'put' it. I think we might also >>> need a ref_user_page() for drivers that may do their own get_page() >>> and expect the dma_pinned_count to also increase. Good idea about a new ref_user_page() call. It's going to hard to find those places at all of the call sites, btw. >> >> Total crazy idea for this, but this is the right time of day >> for this (for me at least it is beer time :)) What about mapping >> all struct page in two different range of kernel virtual address >> and when get user space is use it returns a pointer from the second >> range of kernel virtual address to the struct page. Then in put_page >> you know for sure if the code putting the page got it from GUP or >> from somewhere else. page_to_pfn() would need some trickery to >> handle that. > > Yes, exactly what I was thinking, if only as a debug mode since > instrumenting every pfn/page translation would be expensive. > That does sound viable as a debug mode. I'll try it out. A reliable way (in both directions) of sorting out put_page() vs. put_user_page() would be a huge improvement, even if just in debug mode. >> Dunno if we are running out of kernel virtual address (outside >> 32bits that i believe we are trying to shot down quietly behind >> the bar). > > There's room, KASAN is in a roughly similar place. > Looks like I'd better post a new version of the entire RFC, rather than just these two patches. It's still less fully-baked than I'd hoped. :) thanks, -- John Hubbard NVIDIA
next prev parent reply index Thread overview: 213+ messages / expand[flat|nested] mbox.gz Atom feed top 2018-12-04 0:17 [PATCH 0/2] put_user_page*(): start converting the call sites john.hubbard 2018-12-04 0:17 ` [PATCH 1/2] mm: introduce put_user_page*(), placeholder versions john.hubbard 2018-12-04 7:53 ` Mike Rapoport 2018-12-05 1:40 ` John Hubbard 2018-12-04 20:28 ` Dan Williams 2018-12-04 21:56 ` John Hubbard 2018-12-04 23:03 ` Dan Williams 2018-12-05 0:36 ` Jerome Glisse 2018-12-05 0:40 ` Dan Williams 2018-12-05 0:59 ` John Hubbard [this message] 2018-12-05 0:58 ` John Hubbard 2018-12-05 1:00 ` Dan Williams 2018-12-05 1:15 ` Matthew Wilcox 2018-12-05 1:44 ` Jerome Glisse 2018-12-05 1:57 ` John Hubbard 2018-12-07 2:45 ` John Hubbard 2018-12-07 19:16 ` Jerome Glisse 2018-12-07 19:26 ` Dan Williams 2018-12-07 19:40 ` Jerome Glisse 2018-12-08 0:52 ` John Hubbard 2018-12-08 2:24 ` Jerome Glisse 2018-12-10 10:28 ` Jan Kara 2018-12-12 15:03 ` Jerome Glisse 2018-12-12 16:27 ` Dan Williams 2018-12-12 17:02 ` Jerome Glisse 2018-12-12 17:49 ` Dan Williams 2018-12-12 19:07 ` John Hubbard 2018-12-12 21:30 ` Jerome Glisse 2018-12-12 21:40 ` Dan Williams 2018-12-12 21:53 ` Jerome Glisse 2018-12-12 22:11 ` Matthew Wilcox 2018-12-12 22:16 ` Jerome Glisse 2018-12-12 23:37 ` Jason Gunthorpe 2018-12-12 23:46 ` John Hubbard 2018-12-12 23:54 ` Dan Williams 2018-12-13 0:01 ` Jerome Glisse 2018-12-13 0:18 ` Dan Williams 2018-12-13 0:44 ` Jerome Glisse 2018-12-13 3:26 ` Jason Gunthorpe 2018-12-13 3:20 ` Jason Gunthorpe 2018-12-13 12:43 ` Jerome Glisse 2018-12-13 13:40 ` Tom Talpey 2018-12-13 14:18 ` Jerome Glisse 2018-12-13 14:51 ` Tom Talpey 2018-12-13 15:18 ` Jerome Glisse 2018-12-13 18:12 ` Tom Talpey 2018-12-13 19:18 ` Jerome Glisse 2018-12-14 10:41 ` Jan Kara 2018-12-14 15:25 ` Jerome Glisse 2018-12-12 21:56 ` John Hubbard 2018-12-12 22:04 ` Jerome Glisse 2018-12-12 22:11 ` John Hubbard 2018-12-12 22:14 ` Jerome Glisse 2018-12-12 22:17 ` John Hubbard 2018-12-12 21:46 ` Dave Chinner 2018-12-12 21:59 ` Jerome Glisse 2018-12-13 0:51 ` Dave Chinner 2018-12-13 2:02 ` Jerome Glisse 2018-12-13 15:56 ` Christopher Lameter 2018-12-13 16:02 ` Jerome Glisse 2018-12-14 6:00 ` Dave Chinner 2018-12-14 15:13 ` Jerome Glisse 2018-12-14 3:52 ` John Hubbard 2018-12-14 5:21 ` Dan Williams 2018-12-14 6:11 ` John Hubbard 2018-12-14 15:20 ` Jerome Glisse 2018-12-14 19:38 ` Dan Williams 2018-12-14 19:48 ` Matthew Wilcox 2018-12-14 19:53 ` Dave Hansen 2018-12-14 20:03 ` Matthew Wilcox 2018-12-14 20:17 ` Dan Williams 2018-12-14 20:29 ` Matthew Wilcox 2018-12-15 0:41 ` John Hubbard 2018-12-17 8:56 ` Jan Kara 2018-12-17 18:28 ` Dan Williams 2018-12-14 15:43 ` Jan Kara 2018-12-16 21:58 ` Dave Chinner 2018-12-17 18:11 ` Jerome Glisse 2018-12-17 18:34 ` Matthew Wilcox 2018-12-17 19:48 ` Jerome Glisse 2018-12-17 19:51 ` Matthew Wilcox 2018-12-17 19:54 ` Jerome Glisse 2018-12-17 19:59 ` Matthew Wilcox 2018-12-17 20:55 ` Jerome Glisse 2018-12-17 21:03 ` Matthew Wilcox 2018-12-17 21:15 ` Jerome Glisse 2018-12-18 1:09 ` Dave Chinner 2018-12-18 6:12 ` Darrick J. Wong 2018-12-18 9:30 ` Jan Kara 2018-12-18 23:29 ` John Hubbard 2018-12-19 2:07 ` Jerome Glisse 2018-12-19 11:08 ` Jan Kara 2018-12-20 10:54 ` John Hubbard 2018-12-20 16:50 ` Jerome Glisse 2018-12-20 16:57 ` Dan Williams 2018-12-20 16:49 ` Jerome Glisse 2019-01-03 1:55 ` Jerome Glisse 2019-01-03 3:27 ` John Hubbard 2019-01-03 14:57 ` Jerome Glisse 2019-01-03 9:26 ` Jan Kara 2019-01-03 14:44 ` Jerome Glisse 2019-01-11 2:59 ` John Hubbard 2019-01-11 2:59 ` John Hubbard 2019-01-11 16:51 ` Jerome Glisse 2019-01-11 16:51 ` Jerome Glisse 2019-01-12 1:04 ` John Hubbard 2019-01-12 1:04 ` John Hubbard 2019-01-12 2:02 ` Jerome Glisse 2019-01-12 2:02 ` Jerome Glisse 2019-01-12 2:38 ` John Hubbard 2019-01-12 2:38 ` John Hubbard 2019-01-12 2:46 ` Jerome Glisse 2019-01-12 2:46 ` Jerome Glisse 2019-01-12 3:06 ` John Hubbard 2019-01-12 3:06 ` John Hubbard 2019-01-12 3:25 ` Jerome Glisse 2019-01-12 3:25 ` Jerome Glisse 2019-01-12 20:46 ` John Hubbard 2019-01-12 20:46 ` John Hubbard 2019-01-14 14:54 ` Jan Kara 2019-01-14 14:54 ` Jan Kara 2019-01-14 17:21 ` Jerome Glisse 2019-01-14 17:21 ` Jerome Glisse 2019-01-14 19:09 ` John Hubbard 2019-01-14 19:09 ` John Hubbard 2019-01-15 8:34 ` Jan Kara 2019-01-15 8:34 ` Jan Kara 2019-01-15 21:39 ` John Hubbard 2019-01-15 21:39 ` John Hubbard 2019-01-15 8:07 ` Jan Kara 2019-01-15 8:07 ` Jan Kara 2019-01-15 17:15 ` Jerome Glisse 2019-01-15 17:15 ` Jerome Glisse 2019-01-15 21:56 ` John Hubbard 2019-01-15 21:56 ` John Hubbard 2019-01-15 22:12 ` Jerome Glisse 2019-01-15 22:12 ` Jerome Glisse 2019-01-16 0:44 ` John Hubbard 2019-01-16 0:44 ` John Hubbard 2019-01-16 1:56 ` Jerome Glisse 2019-01-16 1:56 ` Jerome Glisse 2019-01-16 2:01 ` Dan Williams 2019-01-16 2:01 ` Dan Williams 2019-01-16 2:23 ` Jerome Glisse 2019-01-16 2:23 ` Jerome Glisse 2019-01-16 4:34 ` Dave Chinner 2019-01-16 4:34 ` Dave Chinner 2019-01-16 14:50 ` Jerome Glisse 2019-01-16 14:50 ` Jerome Glisse 2019-01-16 22:51 ` Dave Chinner 2019-01-16 22:51 ` Dave Chinner 2019-01-16 11:38 ` Jan Kara 2019-01-16 11:38 ` Jan Kara 2019-01-16 13:08 ` Jerome Glisse 2019-01-16 13:08 ` Jerome Glisse 2019-01-17 5:42 ` John Hubbard 2019-01-17 5:42 ` John Hubbard 2019-01-17 15:21 ` Jerome Glisse 2019-01-17 15:21 ` Jerome Glisse 2019-01-18 0:16 ` Dave Chinner 2019-01-18 1:59 ` Jerome Glisse 2019-01-17 9:30 ` Jan Kara 2019-01-17 9:30 ` Jan Kara 2019-01-17 15:17 ` Jerome Glisse 2019-01-17 15:17 ` Jerome Glisse 2019-01-22 15:24 ` Jan Kara 2019-01-22 16:46 ` Jerome Glisse 2019-01-23 18:02 ` Jan Kara 2019-01-23 19:04 ` Jerome Glisse 2019-01-29 0:22 ` John Hubbard 2019-01-29 1:23 ` Jerome Glisse 2019-01-29 6:41 ` John Hubbard 2019-01-29 10:12 ` Jan Kara 2019-01-30 2:21 ` John Hubbard 2019-01-17 5:25 ` John Hubbard 2019-01-17 5:25 ` John Hubbard 2019-01-17 9:04 ` Jan Kara 2019-01-17 9:04 ` Jan Kara 2019-01-12 3:14 ` Jerome Glisse 2019-01-12 3:14 ` Jerome Glisse 2018-12-18 10:33 ` Jan Kara 2018-12-18 23:42 ` Dave Chinner 2018-12-19 3:03 ` Jason Gunthorpe 2018-12-19 5:26 ` Dan Williams 2018-12-19 11:19 ` Jan Kara 2018-12-19 10:28 ` Dave Chinner 2018-12-19 11:35 ` Jan Kara 2018-12-19 16:56 ` Jason Gunthorpe 2018-12-19 22:33 ` Dave Chinner 2018-12-20 9:07 ` Jan Kara 2018-12-20 16:54 ` Jerome Glisse 2018-12-19 13:24 ` Jan Kara 2018-12-08 5:18 ` Matthew Wilcox 2018-12-12 19:13 ` John Hubbard 2018-12-08 7:16 ` Dan Williams 2018-12-08 16:33 ` Jerome Glisse 2018-12-08 16:48 ` Christoph Hellwig 2018-12-08 17:47 ` Jerome Glisse 2018-12-08 18:26 ` Christoph Hellwig 2018-12-08 18:45 ` Jerome Glisse 2018-12-08 18:09 ` Dan Williams 2018-12-08 18:12 ` Christoph Hellwig 2018-12-11 6:18 ` Dave Chinner 2018-12-05 5:52 ` Dan Williams 2018-12-05 11:16 ` Jan Kara 2018-12-04 0:17 ` [PATCH 2/2] infiniband/mm: convert put_page() to put_user_page*() john.hubbard 2018-12-04 17:10 ` [PATCH 0/2] put_user_page*(): start converting the call sites David Laight 2018-12-05 1:05 ` John Hubbard 2018-12-05 14:08 ` David Laight 2018-12-28 8:37 ` Pavel Machek 2019-02-08 7:56 [PATCH 0/2] mm: put_user_page() call site conversion first john.hubbard 2019-02-08 7:56 ` [PATCH 1/2] mm: introduce put_user_page*(), placeholder versions john.hubbard 2019-02-08 10:32 ` Mike Rapoport 2019-02-08 20:44 ` John Hubbard
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=6455f657-708d-5b7f-00bf-89ca8a226c8e@nvidia.com \ --to=jhubbard@nvidia.com \ --cc=akpm@linux-foundation.org \ --cc=benve@cisco.com \ --cc=cl@linux.com \ --cc=dan.j.williams@intel.com \ --cc=dennis.dalessandro@intel.com \ --cc=dledford@redhat.com \ --cc=hch@infradead.org \ --cc=jack@suse.cz \ --cc=jgg@ziepe.ca \ --cc=jglisse@redhat.com \ --cc=john.hubbard@gmail.com \ --cc=linux-fsdevel@vger.kernel.org \ --cc=linux-kernel@vger.kernel.org \ --cc=linux-mm@kvack.org \ --cc=mhocko@kernel.org \ --cc=mike.marciniszyn@intel.com \ --cc=rcampbell@nvidia.com \ --cc=tom@talpey.com \ --cc=viro@zeniv.linux.org.uk \ --cc=willy@infradead.org \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
Linux-Fsdevel Archive on lore.kernel.org Archives are clonable: git clone --mirror https://lore.kernel.org/linux-fsdevel/0 linux-fsdevel/git/0.git # If you have public-inbox 1.1+ installed, you may # initialize and index your mirror using the following commands: public-inbox-init -V2 linux-fsdevel linux-fsdevel/ https://lore.kernel.org/linux-fsdevel \ linux-fsdevel@vger.kernel.org public-inbox-index linux-fsdevel Example config snippet for mirrors Newsgroup available over NNTP: nntp://nntp.lore.kernel.org/org.kernel.vger.linux-fsdevel AGPL code for this site: git clone https://public-inbox.org/public-inbox.git