On 8 May 2020, at 16:06, Ralph Campbell wrote:

> On 5/8/20 12:51 PM, Christoph Hellwig wrote:
>> On Fri, May 08, 2020 at 12:20:07PM -0700, Ralph Campbell wrote:
>>> hmm_range_fault() returns an array of page frame numbers and flags for
>>> how the pages are mapped in the requested process' page tables. The PFN
>>> can be used to get the struct page with hmm_pfn_to_page() and the page size
>>> order can be determined with compound_order(page), but if the page is larger
>>> than order 0 (PAGE_SIZE), there is no indication that the page is mapped
>>> using a larger page size. To be fully general, hmm_range_fault() would need
>>> to return the mapping size to handle cases like a 1GB compound page being
>>> mapped with 2MB PMD entries. However, the most common case is for the mapping
>>> size to be the same as the underlying compound page size.
>>> Add a new output flag to indicate this so that callers know it is safe to
>>> use a large device page table mapping if one is available.
>>
>> Why do you need the flag? The caller should be able to just use
>> page_size() (or willy's new thp_size helper).
>>
>
> The question is whether or not a large page can be mapped with smaller
> page table entries with different permissions. If one process has a 2MB
> page mapped with 4K PTEs with different read/write permissions, I don't think
> it would be OK for a device to map the whole 2MB with write access enabled.
> The flag is supposed to indicate that the whole page can be mapped by the
> device with the indicated read/write permissions.

If hmm_range_fault() only walked one VMA at a time, you would not have this
permission issue, right? All pages in one VMA should have the same
permissions. But it seems that hmm_range_fault() handles pages across
multiple VMAs. Maybe we should make hmm_range_fault() bail out early when
it encounters a VMA with different permissions from the ones already seen?

--
Best Regards,
Yan Zi