From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Subject: Re: [PATCH 1/2] mm: introduce put_user_page*(), placeholder versions
From: John Hubbard
To: Jerome Glisse , Matthew Wilcox
CC: Dan Williams , John Hubbard , Andrew Morton , Linux MM , Jan Kara , ,
 Al Viro , , Christoph Hellwig , Christopher Lameter ,
 "Dalessandro, Dennis" , Doug Ledford , Jason Gunthorpe , Michal Hocko , , ,
 Linux Kernel Mailing List , linux-fsdevel
References: <20181204001720.26138-1-jhubbard@nvidia.com>
 <20181204001720.26138-2-jhubbard@nvidia.com>
 <3c91d335-921c-4704-d159-2975ff3a5f20@nvidia.com>
 <20181205011519.GV10377@bombadil.infradead.org>
 <20181205014441.GA3045@redhat.com>
 <59ca5c4b-fd5b-1fc6-f891-c7986d91908e@nvidia.com>
Message-ID: <7b4733be-13d3-c790-ff1b-ac51b505e9a6@nvidia.com>
Date: Thu, 6 Dec 2018 18:45:49 -0800
MIME-Version: 1.0
In-Reply-To: <59ca5c4b-fd5b-1fc6-f891-c7986d91908e@nvidia.com>
Content-Type: text/plain; charset="utf-8"
Content-Language: en-US-large
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
List-ID:

On 12/4/18 5:57 PM, John Hubbard wrote:
> On 12/4/18 5:44 PM, Jerome Glisse wrote:
>> On Tue, Dec 04, 2018 at 05:15:19PM -0800, Matthew Wilcox wrote:
>>> On Tue, Dec 04, 2018 at 04:58:01PM -0800, John Hubbard wrote:
>>>> On 12/4/18 3:03 PM, Dan Williams wrote:
>>>>> Except the LRU fields are already in use for ZONE_DEVICE pages... how
>>>>> does this proposal interact with those?
>>>>
>>>> Very badly: page->pgmap and page->hmm_data both get corrupted. Is there an entire
>>>> use case I'm missing: calling get_user_pages() on ZONE_DEVICE pages? Said another
>>>> way: is it reasonable to disallow calling get_user_pages() on ZONE_DEVICE pages?
>>>>
>>>> If we have to support get_user_pages() on ZONE_DEVICE pages, then the whole
>>>> LRU field approach is unusable.
>>>
>>> We just need to rearrange ZONE_DEVICE pages.
>>> Please excuse the whitespace
>>> damage:
>>>
>>> +++ b/include/linux/mm_types.h
>>> @@ -151,10 +151,12 @@ struct page {
>>>  #endif
>>>                 };
>>>                 struct {        /* ZONE_DEVICE pages */
>>> +                       unsigned long _zd_pad_2;        /* LRU */
>>> +                       unsigned long _zd_pad_3;        /* LRU */
>>> +                       unsigned long _zd_pad_1;        /* uses mapping */
>>>                         /** @pgmap: Points to the hosting device page map. */
>>>                         struct dev_pagemap *pgmap;
>>>                         unsigned long hmm_data;
>>> -                       unsigned long _zd_pad_1;        /* uses mapping */
>>>                 };
>>>
>>>                 /** @rcu_head: You can use this to free a page by RCU. */
>>>
>>> You don't use page->private or page->index, do you Dan?
>>
>> page->private and page->index are use by HMM DEVICE page.
>>
>
> OK, so for the ZONE_DEVICE + HMM case, that leaves just one field remaining for
> dma-pinned information. Which might work. To recap, we need:
>
> -- 1 bit for PageDmaPinned
> -- 1 bit, if using LRU field(s), for PageDmaPinnedWasLru.
> -- N bits for a reference count
>
> Those *could* be packed into a single 64-bit field, if really necessary.
>

...actually, this needs to work on 32-bit systems, as well. And HMM is using a lot.
However, it is still possible for this to work.

Matthew, can I have that bit now please? I'm about out of options, and now it will actually
solve the problem here.

Given:

1) It's cheap to know if a page is ZONE_DEVICE, and ZONE_DEVICE means not on the LRU.
That, in turn, means only 1 bit instead of 2 bits (in addition to a counter) is required,
for that case.

2) There is an independent bit available (according to Matthew).

3) HMM uses 4 of the 5 struct page fields, so only one field is available for a counter
in that case.

4) get_user_pages() must work on ZONE_DEVICE and HMM pages.

5) For a proper atomic counter for both 32- and 64-bit, we really do need a complete
unsigned long field.

So that leads to the following approach:

-- Use a single unsigned long field for an atomic reference count for the DMA pinned count.
For normal pages, this will be the *second* field of the LRU (in order to avoid the
PageTail bit). For ZONE_DEVICE pages, we can also line up the fields so that the
second LRU field is available and reserved for this DMA pinned count. Basically
_zd_pad_1 gets moved up and optionally renamed:

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 017ab82e36ca..b5dcd9398cae 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -90,8 +90,8 @@ struct page {
 				 * are in use.
 				 */
 				struct {
-					unsigned long dma_pinned_flags;
-					atomic_t      dma_pinned_count;
+					unsigned long dma_pinned_flags; /* LRU.next */
+					atomic_t      dma_pinned_count; /* LRU.prev */
 				};
 			};
 			/* See page-flags.h for PAGE_MAPPING_FLAGS */
@@ -161,9 +161,9 @@ struct page {
 		};
 		struct {	/* ZONE_DEVICE pages */
 			/** @pgmap: Points to the hosting device page map. */
-			struct dev_pagemap *pgmap;
-			unsigned long hmm_data;
-			unsigned long _zd_pad_1;	/* uses mapping */
+			struct dev_pagemap *pgmap;	/* LRU.next */
+			unsigned long _zd_pad_1;	/* LRU.prev or dma_pinned_count */
+			unsigned long hmm_data;		/* uses mapping */
 		};
 
 		/** @rcu_head: You can use this to free a page by RCU. */

-- Use an additional, fully independent page bit (from Matthew) for PageDmaPinned.

thanks,
-- 
John Hubbard
NVIDIA
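
For illustration only, here is a rough userspace sketch of the two-part approach described above: one full unsigned long used as an atomic pin count, placed in the second LRU word, plus one independent flag bit for PageDmaPinned. Everything in it is an assumption made for the example: struct mock_page, MOCK_PG_DMA_PINNED, and the mock_* helpers are hypothetical names, not kernel APIs, and C11 <stdatomic.h> stands in for the kernel's atomics. It also ignores the ordering between the counter and the flag that a real implementation would have to get right under concurrency.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit position; the real bit would come from page-flags.h. */
#define MOCK_PG_DMA_PINNED (1UL << 0)

/* Userspace stand-in for the relevant words of struct page (not the real layout). */
struct mock_page {
	atomic_ulong flags;            /* holds the independent PageDmaPinned bit */
	unsigned long lru_next;        /* first LRU word: left alone (PageTail bit lives here) */
	atomic_ulong dma_pinned_count; /* second LRU word, or _zd_pad_1 for ZONE_DEVICE */
};

/* Take one DMA pin: bump the count, then mark the page as pinned. */
void mock_pin_page(struct mock_page *page)
{
	atomic_fetch_add(&page->dma_pinned_count, 1);
	atomic_fetch_or(&page->flags, MOCK_PG_DMA_PINNED);
}

/* Drop one DMA pin; the last unpin clears the flag bit. */
void mock_unpin_page(struct mock_page *page)
{
	/* atomic_fetch_sub returns the old value: 1 means this was the last pin. */
	if (atomic_fetch_sub(&page->dma_pinned_count, 1) == 1)
		atomic_fetch_and(&page->flags, ~MOCK_PG_DMA_PINNED);
}

/* Cheap test of the flag bit, without touching the counter. */
bool mock_page_dma_pinned(struct mock_page *page)
{
	return (atomic_load(&page->flags) & MOCK_PG_DMA_PINNED) != 0;
}
```

Because the count occupies a whole unsigned long, the same layout works on 32- and 64-bit, and the ZONE_DEVICE case needs no second bit since those pages are never on the LRU.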