From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Subject: Re: [PATCH 1/2] mm: introduce put_user_page*(), placeholder versions
To: Dan Williams
CC: John Hubbard, Andrew Morton, Linux MM, Jan Kara, Al Viro,
 Christoph Hellwig, Christopher Lameter, "Dalessandro, Dennis",
 Doug Ledford, Jason Gunthorpe, Jérôme Glisse, Matthew Wilcox,
 Michal Hocko, Linux Kernel Mailing List, linux-fsdevel
References: <20181204001720.26138-1-jhubbard@nvidia.com>
 <20181204001720.26138-2-jhubbard@nvidia.com>
From: John Hubbard
Message-ID: <3c91d335-921c-4704-d159-2975ff3a5f20@nvidia.com>
Date: Tue, 4 Dec 2018 16:58:01 -0800
MIME-Version: 1.0
In-Reply-To:
Content-Type: text/plain; charset="utf-8"
Content-Language: en-US-large
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
List-ID:

On 12/4/18 3:03 PM, Dan Williams wrote:
> On Tue, Dec 4, 2018 at 1:56 PM John Hubbard wrote:
>>
>> On 12/4/18 12:28 PM, Dan Williams wrote:
>>> On Mon, Dec 3, 2018 at 4:17 PM wrote:
>>>>
>>>> From: John Hubbard
>>>>
>>>> Introduces put_user_page(), which simply calls put_page().
>>>> This provides a way to update all get_user_pages*() callers,
>>>> so that they call put_user_page(), instead of put_page().
>>>>
>>>> Also introduces put_user_pages(), and a few dirty/locked variations,
>>>> as a replacement for release_pages(), and also as a replacement
>>>> for open-coded loops that release multiple pages.
>>>> These may be used for subsequent performance improvements,
>>>> via batching of pages to be released.
>>>>
>>>> This is the first step of fixing the problem described in [1]. The steps
>>>> are:
>>>>
>>>> 1) (This patch): provide put_user_page*() routines, intended to be used
>>>> for releasing pages that were pinned via get_user_pages*().
>>>>
>>>> 2) Convert all of the call sites for get_user_pages*(), to
>>>> invoke put_user_page*(), instead of put_page(). This involves dozens of
>>>> call sites, and will take some time.
>>>>
>>>> 3) After (2) is complete, use get_user_pages*() and put_user_page*() to
>>>> implement tracking of these pages. This tracking will be separate from
>>>> the existing struct page refcounting.
>>>>
>>>> 4) Use the tracking and identification of these pages, to implement
>>>> special handling (especially in writeback paths) when the pages are
>>>> backed by a filesystem. Again, [1] provides details as to why that is
>>>> desirable.
>>>
>>> I thought at Plumbers we talked about using a page bit to tag pages
>>> that have had their reference count elevated by get_user_pages()? That
>>> way there is no need to distinguish put_page() from put_user_page() it
>>> just happens internally to put_page(). At the conference Matthew was
>>> offering to free up a page bit for this purpose.
>>>
>>
>> ...but then, upon further discussion in that same session, we realized that
>> that doesn't help. You need a reference count. Otherwise a random put_page
>> could affect your dma-pinned pages, etc, etc.
>
> Ok, sorry, I mis-remembered. So, you're effectively trying to capture
> the end of the page pin event separate from the final 'put' of the
> page? Makes sense.
>

Yes, that's it exactly.

>> I was not able to actually find any place where a single additional page
>> bit would help our situation, which is why this still uses LRU fields for
>> both the two bits required (the RFC [1] still applies), and the dma_pinned_count.
>
> Except the LRU fields are already in use for ZONE_DEVICE pages... how
> does this proposal interact with those?

Very badly: page->pgmap and page->hmm_data both get corrupted. Is there an
entire use case I'm missing: calling get_user_pages() on ZONE_DEVICE pages?
Said another way: is it reasonable to disallow calling get_user_pages() on
ZONE_DEVICE pages?

If we have to support get_user_pages() on ZONE_DEVICE pages, then the whole
LRU field approach is unusable.

thanks,
-- 
John Hubbard
NVIDIA
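
For readers following the thread, here is a rough userspace sketch of the distinction discussed above: a pin count kept separately from the ordinary refcount, so that the end of a pin is a different event from the final put. This is only a model under stated assumptions, not the code from the patch (where put_user_page() is still just a placeholder that calls put_page()). Every name below (struct model_page, the model_*() helpers) is hypothetical, and the count lives in a plain struct field rather than in the LRU fields the thread is debating.

```c
/* Userspace model only: not kernel code. All names here are hypothetical. */
#include <assert.h>
#include <stdbool.h>

struct model_page {
	int refcount;          /* stands in for the ordinary struct page refcount */
	int dma_pinned_count;  /* separate count of get_user_pages()-style pins */
};

/* Ordinary reference, as any unrelated code path might take. */
static void model_get_page(struct model_page *page)
{
	page->refcount++;
}

/* Ordinary release: it knows nothing about pins. */
static void model_put_page(struct model_page *page)
{
	assert(page->refcount > 0);
	page->refcount--;
}

/* Pin: takes a normal reference and also records the pin. */
static void model_get_user_page(struct model_page *page)
{
	model_get_page(page);
	page->dma_pinned_count++;
}

/* Unpin: the "end of the page pin" event, distinct from the final put. */
static void model_put_user_page(struct model_page *page)
{
	assert(page->dma_pinned_count > 0);
	page->dma_pinned_count--;
	model_put_page(page);
}

/* Batch variant, standing in for an open-coded release loop. */
static void model_put_user_pages(struct model_page **pages,
				 unsigned long npages)
{
	for (unsigned long i = 0; i < npages; i++)
		model_put_user_page(pages[i]);
}

/* What a writeback path could ask before touching the page. */
static bool model_page_dma_pinned(const struct model_page *page)
{
	return page->dma_pinned_count > 0;
}
```

In this model a stray model_put_page() from an unrelated path lowers refcount but leaves dma_pinned_count alone, so the page still reports as pinned until the matching model_put_user_page() runs. A single flag bit could not express that once more than one pin is outstanding, which is the point made above about needing a count.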