All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v1 0/3] split vm_normal_pages for LRU and non-LRU handling
@ 2022-03-10 17:26 ` Alex Sierra
  0 siblings, 0 replies; 24+ messages in thread
From: Alex Sierra @ 2022-03-10 17:26 UTC (permalink / raw)
  To: jgg
  Cc: david, Felix.Kuehling, linux-mm, rcampbell, linux-ext4,
	linux-xfs, amd-gfx, dri-devel, hch, jglisse, apopple, willy,
	akpm

DEVICE_COHERENT pages introduce a subtle distinction in the way
"normal" pages can be used by various callers throughout the kernel.
They behave like normal pages for purposes of mapping in CPU page
tables, and for COW. But they do not support LRU lists, NUMA
migration or THP. Therefore we split vm_normal_page into two
functions vm_normal_any_page and vm_normal_lru_page. The latter will
only return pages that can be put on an LRU list and that support
NUMA migration, KSM and THP.

HMM tests were added to selftest to excercise these changes with
device coherent pages. New test called hmm_cow_in_device, will test
pages marked as COW, allocated in device zone. Also, more
configurations were added into hmm_gup_test to test basic get
user pages and get user pages fast paths in device zone pages.

Alex Sierra (3):
  mm: split vm_normal_pages for LRU and non-LRU handling
  tools: add more gup configs to hmm_gup selftests
  tools: add selftests to hmm for COW in device memory

 fs/proc/task_mmu.c                     |  12 +--
 include/linux/mm.h                     |  11 +-
 mm/gup.c                               |  10 +-
 mm/hmm.c                               |   2 +-
 mm/huge_memory.c                       |   2 +-
 mm/khugepaged.c                        |   8 +-
 mm/ksm.c                               |   4 +-
 mm/madvise.c                           |   4 +-
 mm/memcontrol.c                        |   2 +-
 mm/memory.c                            |  38 ++++---
 mm/mempolicy.c                         |   4 +-
 mm/migrate.c                           |   2 +-
 mm/migrate_device.c                    |   2 +-
 mm/mlock.c                             |   6 +-
 mm/mprotect.c                          |   2 +-
 tools/testing/selftests/vm/hmm-tests.c | 139 +++++++++++++++++++++----
 16 files changed, 185 insertions(+), 63 deletions(-)

-- 
2.32.0


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v1 0/3] split vm_normal_pages for LRU and non-LRU handling
@ 2022-03-10 17:26 ` Alex Sierra
  0 siblings, 0 replies; 24+ messages in thread
From: Alex Sierra @ 2022-03-10 17:26 UTC (permalink / raw)
  To: jgg
  Cc: rcampbell, willy, david, Felix.Kuehling, apopple, amd-gfx,
	linux-xfs, linux-mm, jglisse, dri-devel, akpm, linux-ext4, hch

DEVICE_COHERENT pages introduce a subtle distinction in the way
"normal" pages can be used by various callers throughout the kernel.
They behave like normal pages for purposes of mapping in CPU page
tables, and for COW. But they do not support LRU lists, NUMA
migration or THP. Therefore we split vm_normal_page into two
functions vm_normal_any_page and vm_normal_lru_page. The latter will
only return pages that can be put on an LRU list and that support
NUMA migration, KSM and THP.

HMM tests were added to selftest to excercise these changes with
device coherent pages. New test called hmm_cow_in_device, will test
pages marked as COW, allocated in device zone. Also, more
configurations were added into hmm_gup_test to test basic get
user pages and get user pages fast paths in device zone pages.

Alex Sierra (3):
  mm: split vm_normal_pages for LRU and non-LRU handling
  tools: add more gup configs to hmm_gup selftests
  tools: add selftests to hmm for COW in device memory

 fs/proc/task_mmu.c                     |  12 +--
 include/linux/mm.h                     |  11 +-
 mm/gup.c                               |  10 +-
 mm/hmm.c                               |   2 +-
 mm/huge_memory.c                       |   2 +-
 mm/khugepaged.c                        |   8 +-
 mm/ksm.c                               |   4 +-
 mm/madvise.c                           |   4 +-
 mm/memcontrol.c                        |   2 +-
 mm/memory.c                            |  38 ++++---
 mm/mempolicy.c                         |   4 +-
 mm/migrate.c                           |   2 +-
 mm/migrate_device.c                    |   2 +-
 mm/mlock.c                             |   6 +-
 mm/mprotect.c                          |   2 +-
 tools/testing/selftests/vm/hmm-tests.c | 139 +++++++++++++++++++++----
 16 files changed, 185 insertions(+), 63 deletions(-)

-- 
2.32.0


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v1 1/3] mm: split vm_normal_pages for LRU and non-LRU handling
  2022-03-10 17:26 ` Alex Sierra
@ 2022-03-10 17:26   ` Alex Sierra
  -1 siblings, 0 replies; 24+ messages in thread
From: Alex Sierra @ 2022-03-10 17:26 UTC (permalink / raw)
  To: jgg
  Cc: david, Felix.Kuehling, linux-mm, rcampbell, linux-ext4,
	linux-xfs, amd-gfx, dri-devel, hch, jglisse, apopple, willy,
	akpm

DEVICE_COHERENT pages introduce a subtle distinction in the way
"normal" pages can be used by various callers throughout the kernel.
They behave like normal pages for purposes of mapping in CPU page
tables, and for COW. But they do not support LRU lists, NUMA
migration or THP. Therefore we split vm_normal_page into two
functions vm_normal_any_page and vm_normal_lru_page. The latter will
only return pages that can be put on an LRU list and that support
NUMA migration, KSM and THP.

We also introduced a FOLL_LRU flag that adds the same behaviour to
follow_page and related APIs, to allow callers to specify that they
expect to put pages on an LRU list.

Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 fs/proc/task_mmu.c  | 12 ++++++------
 include/linux/mm.h  | 11 +++++++----
 mm/gup.c            | 10 ++++++----
 mm/hmm.c            |  2 +-
 mm/huge_memory.c    |  2 +-
 mm/khugepaged.c     |  8 ++++----
 mm/ksm.c            |  4 ++--
 mm/madvise.c        |  4 ++--
 mm/memcontrol.c     |  2 +-
 mm/memory.c         | 38 ++++++++++++++++++++++++++------------
 mm/mempolicy.c      |  4 ++--
 mm/migrate.c        |  2 +-
 mm/migrate_device.c |  2 +-
 mm/mlock.c          |  6 +++---
 mm/mprotect.c       |  2 +-
 15 files changed, 64 insertions(+), 45 deletions(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index f46060eb91b5..cada3e702693 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -528,7 +528,7 @@ static void smaps_pte_entry(pte_t *pte, unsigned long addr,
 	bool migration = false;
 
 	if (pte_present(*pte)) {
-		page = vm_normal_page(vma, addr, *pte);
+		page = vm_normal_any_page(vma, addr, *pte);
 	} else if (is_swap_pte(*pte)) {
 		swp_entry_t swpent = pte_to_swp_entry(*pte);
 
@@ -723,7 +723,7 @@ static int smaps_hugetlb_range(pte_t *pte, unsigned long hmask,
 	struct page *page = NULL;
 
 	if (pte_present(*pte)) {
-		page = vm_normal_page(vma, addr, *pte);
+		page = vm_normal_any_page(vma, addr, *pte);
 	} else if (is_swap_pte(*pte)) {
 		swp_entry_t swpent = pte_to_swp_entry(*pte);
 
@@ -1077,7 +1077,7 @@ static inline bool pte_is_pinned(struct vm_area_struct *vma, unsigned long addr,
 		return false;
 	if (likely(!test_bit(MMF_HAS_PINNED, &vma->vm_mm->flags)))
 		return false;
-	page = vm_normal_page(vma, addr, pte);
+	page = vm_normal_any_page(vma, addr, pte);
 	if (!page)
 		return false;
 	return page_maybe_dma_pinned(page);
@@ -1190,7 +1190,7 @@ static int clear_refs_pte_range(pmd_t *pmd, unsigned long addr,
 		if (!pte_present(ptent))
 			continue;
 
-		page = vm_normal_page(vma, addr, ptent);
+		page = vm_normal_any_page(vma, addr, ptent);
 		if (!page)
 			continue;
 
@@ -1402,7 +1402,7 @@ static pagemap_entry_t pte_to_pagemap_entry(struct pagemapread *pm,
 		if (pm->show_pfn)
 			frame = pte_pfn(pte);
 		flags |= PM_PRESENT;
-		page = vm_normal_page(vma, addr, pte);
+		page = vm_normal_any_page(vma, addr, pte);
 		if (pte_soft_dirty(pte))
 			flags |= PM_SOFT_DIRTY;
 		if (pte_uffd_wp(pte))
@@ -1784,7 +1784,7 @@ static struct page *can_gather_numa_stats(pte_t pte, struct vm_area_struct *vma,
 	if (!pte_present(pte))
 		return NULL;
 
-	page = vm_normal_page(vma, addr, pte);
+	page = vm_normal_lru_page(vma, addr, pte);
 	if (!page)
 		return NULL;
 
diff --git a/include/linux/mm.h b/include/linux/mm.h
index d507c32724c0..46b0bb43cef3 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -593,8 +593,8 @@ struct vm_operations_struct {
 					unsigned long addr);
 #endif
 	/*
-	 * Called by vm_normal_page() for special PTEs to find the
-	 * page for @addr.  This is useful if the default behavior
+	 * Called by vm_normal_*_page() for special PTEs to find the
+	 * page for @addr. This is useful if the default behavior
 	 * (using pte_page()) would not find the correct page.
 	 */
 	struct page *(*find_special_page)(struct vm_area_struct *vma,
@@ -1781,7 +1781,9 @@ static inline bool can_do_mlock(void) { return false; }
 extern int user_shm_lock(size_t, struct ucounts *);
 extern void user_shm_unlock(size_t, struct ucounts *);
 
-struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
+struct page *vm_normal_any_page(struct vm_area_struct *vma, unsigned long addr,
+			     pte_t pte);
+struct page *vm_normal_lru_page(struct vm_area_struct *vma, unsigned long addr,
 			     pte_t pte);
 struct page *vm_normal_page_pmd(struct vm_area_struct *vma, unsigned long addr,
 				pmd_t pmd);
@@ -2901,6 +2903,7 @@ struct page *follow_page(struct vm_area_struct *vma, unsigned long address,
 #define FOLL_SPLIT_PMD	0x20000	/* split huge pmd before returning */
 #define FOLL_PIN	0x40000	/* pages must be released via unpin_user_page */
 #define FOLL_FAST_ONLY	0x80000	/* gup_fast: prevent fall-back to slow gup */
+#define FOLL_LRU	0x100000 /* return only LRU (anon or page cache) */
 
 /*
  * FOLL_PIN and FOLL_LONGTERM may be used in various combinations with each
@@ -3227,7 +3230,7 @@ extern long copy_huge_page_from_user(struct page *dst_page,
  * @vma: Pointer to the struct vm_area_struct to consider
  *
  * Whether transhuge page-table entries are considered "special" following
- * the definition in vm_normal_page().
+ * the definition in vm_normal_*_page().
  *
  * Return: true if transhuge page-table entries should be considered special,
  * false otherwise.
diff --git a/mm/gup.c b/mm/gup.c
index 41349b685eaf..0ed597143d80 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -539,8 +539,10 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
 		pte_unmap_unlock(ptep, ptl);
 		return NULL;
 	}
-
-	page = vm_normal_page(vma, address, pte);
+	if (flags & (FOLL_MLOCK | FOLL_LRU))
+		page = vm_normal_lru_page(vma, address, pte);
+	else
+		page = vm_normal_any_page(vma, address, pte);
 	if (!page && pte_devmap(pte) && (flags & (FOLL_GET | FOLL_PIN))) {
 		/*
 		 * Only return device mapping pages in the FOLL_GET or FOLL_PIN
@@ -824,7 +826,7 @@ static struct page *follow_p4d_mask(struct vm_area_struct *vma,
  *
  * Return: the mapped (struct page *), %NULL if no mapping exists, or
  * an error pointer if there is a mapping to something not represented
- * by a page descriptor (see also vm_normal_page()).
+ * by a page descriptor (see also vm_normal_*_page()).
  */
 static struct page *follow_page_mask(struct vm_area_struct *vma,
 			      unsigned long address, unsigned int flags,
@@ -917,7 +919,7 @@ static int get_gate_page(struct mm_struct *mm, unsigned long address,
 	*vma = get_gate_vma(mm);
 	if (!page)
 		goto out;
-	*page = vm_normal_page(*vma, address, *pte);
+	*page = vm_normal_any_page(*vma, address, *pte);
 	if (!*page) {
 		if ((gup_flags & FOLL_DUMP) || !is_zero_pfn(pte_pfn(*pte)))
 			goto unmap;
diff --git a/mm/hmm.c b/mm/hmm.c
index bd56641c79d4..90c949d66712 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -300,7 +300,7 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, unsigned long addr,
 	 * Since each architecture defines a struct page for the zero page, just
 	 * fall through and treat it like a normal page.
 	 */
-	if (!vm_normal_page(walk->vma, addr, pte) &&
+	if (!vm_normal_any_page(walk->vma, addr, pte) &&
 	    !pte_devmap(pte) &&
 	    !is_zero_pfn(pte_pfn(pte))) {
 		if (hmm_pte_need_fault(hmm_vma_walk, pfn_req_flags, 0)) {
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 406a3c28c026..ea1efc825774 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2966,7 +2966,7 @@ static int split_huge_pages_pid(int pid, unsigned long vaddr_start,
 		}
 
 		/* FOLL_DUMP to ignore special (like zero) pages */
-		follflags = FOLL_GET | FOLL_DUMP;
+		follflags = FOLL_GET | FOLL_DUMP | FOLL_LRU;
 		page = follow_page(vma, addr, follflags);
 
 		if (IS_ERR(page))
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 131492fd1148..a7153db09afa 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -627,7 +627,7 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
 			result = SCAN_PTE_NON_PRESENT;
 			goto out;
 		}
-		page = vm_normal_page(vma, address, pteval);
+		page = vm_normal_lru_page(vma, address, pteval);
 		if (unlikely(!page)) {
 			result = SCAN_PAGE_NULL;
 			goto out;
@@ -1286,7 +1286,7 @@ static int khugepaged_scan_pmd(struct mm_struct *mm,
 		if (pte_write(pteval))
 			writable = true;
 
-		page = vm_normal_page(vma, _address, pteval);
+		page = vm_normal_lru_page(vma, _address, pteval);
 		if (unlikely(!page)) {
 			result = SCAN_PAGE_NULL;
 			goto out_unmap;
@@ -1494,7 +1494,7 @@ void collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr)
 		if (!pte_present(*pte))
 			goto abort;
 
-		page = vm_normal_page(vma, addr, *pte);
+		page = vm_normal_lru_page(vma, addr, *pte);
 
 		/*
 		 * Note that uprobe, debugger, or MAP_PRIVATE may change the
@@ -1512,7 +1512,7 @@ void collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr)
 
 		if (pte_none(*pte))
 			continue;
-		page = vm_normal_page(vma, addr, *pte);
+		page = vm_normal_lru_page(vma, addr, *pte);
 		page_remove_rmap(page, false);
 	}
 
diff --git a/mm/ksm.c b/mm/ksm.c
index c20bd4d9a0d9..352d37e44694 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -474,7 +474,7 @@ static int break_ksm(struct vm_area_struct *vma, unsigned long addr)
 	do {
 		cond_resched();
 		page = follow_page(vma, addr,
-				FOLL_GET | FOLL_MIGRATION | FOLL_REMOTE);
+				FOLL_GET | FOLL_MIGRATION | FOLL_REMOTE | FOLL_LRU);
 		if (IS_ERR_OR_NULL(page))
 			break;
 		if (PageKsm(page))
@@ -559,7 +559,7 @@ static struct page *get_mergeable_page(struct rmap_item *rmap_item)
 	if (!vma)
 		goto out;
 
-	page = follow_page(vma, addr, FOLL_GET);
+	page = follow_page(vma, addr, FOLL_GET | FOLL_LRU);
 	if (IS_ERR_OR_NULL(page))
 		goto out;
 	if (PageAnon(page)) {
diff --git a/mm/madvise.c b/mm/madvise.c
index 38d0f515d548..8dcfb93fe0d5 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -411,7 +411,7 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
 		if (!pte_present(ptent))
 			continue;
 
-		page = vm_normal_page(vma, addr, ptent);
+		page = vm_normal_lru_page(vma, addr, ptent);
 		if (!page)
 			continue;
 
@@ -621,7 +621,7 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
 			continue;
 		}
 
-		page = vm_normal_page(vma, addr, ptent);
+		page = vm_normal_lru_page(vma, addr, ptent);
 		if (!page)
 			continue;
 
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index d9cd8f2afa3c..1d791760608b 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5476,7 +5476,7 @@ enum mc_target_type {
 static struct page *mc_handle_present_pte(struct vm_area_struct *vma,
 						unsigned long addr, pte_t ptent)
 {
-	struct page *page = vm_normal_page(vma, addr, ptent);
+	struct page *page = vm_normal_any_page(vma, addr, ptent);
 
 	if (!page || !page_mapped(page))
 		return NULL;
diff --git a/mm/memory.c b/mm/memory.c
index c125c4969913..1422b1f71f0e 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -565,7 +565,7 @@ static void print_bad_pte(struct vm_area_struct *vma, unsigned long addr,
 }
 
 /*
- * vm_normal_page -- This function gets the "struct page" associated with a pte.
+ * vm_normal_any_page -- This function gets the "struct page" associated with a pte.
  *
  * "Special" mappings do not wish to be associated with a "struct page" (either
  * it doesn't exist, or it exists but they don't want to touch it). In this
@@ -606,7 +606,7 @@ static void print_bad_pte(struct vm_area_struct *vma, unsigned long addr,
  * PFNMAP mappings in order to support COWable mappings.
  *
  */
-struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
+struct page *vm_normal_any_page(struct vm_area_struct *vma, unsigned long addr,
 			    pte_t pte)
 {
 	unsigned long pfn = pte_pfn(pte);
@@ -620,8 +620,6 @@ struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
 			return NULL;
 		if (is_zero_pfn(pfn))
 			return NULL;
-		if (pte_devmap(pte))
-			return NULL;
 
 		print_bad_pte(vma, addr, pte, NULL);
 		return NULL;
@@ -661,6 +659,22 @@ struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
 	return pfn_to_page(pfn);
 }
 
+/*
+ * vm_normal_lru_page -- This function gets the "struct page" associated
+ * with a pte only for page cache and anon page. These pages are LRU handled.
+ */
+struct page *vm_normal_lru_page(struct vm_area_struct *vma, unsigned long addr,
+			    pte_t pte)
+{
+	struct page *page;
+
+	page = vm_normal_any_page(vma, addr, pte);
+	if (page && is_zone_device_page(page))
+		return NULL;
+
+	return page;
+}
+
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 struct page *vm_normal_page_pmd(struct vm_area_struct *vma, unsigned long addr,
 				pmd_t pmd)
@@ -670,7 +684,7 @@ struct page *vm_normal_page_pmd(struct vm_area_struct *vma, unsigned long addr,
 	/*
 	 * There is no pmd_special() but there may be special pmds, e.g.
 	 * in a direct-access (dax) mapping, so let's just replicate the
-	 * !CONFIG_ARCH_HAS_PTE_SPECIAL case from vm_normal_page() here.
+	 * !CONFIG_ARCH_HAS_PTE_SPECIAL case from vm_normal_any_page() here.
 	 */
 	if (unlikely(vma->vm_flags & (VM_PFNMAP|VM_MIXEDMAP))) {
 		if (vma->vm_flags & VM_MIXEDMAP) {
@@ -946,7 +960,7 @@ copy_present_pte(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma,
 	pte_t pte = *src_pte;
 	struct page *page;
 
-	page = vm_normal_page(src_vma, addr, pte);
+	page = vm_normal_any_page(src_vma, addr, pte);
 	if (page) {
 		int retval;
 
@@ -1358,7 +1372,7 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
 		if (pte_present(ptent)) {
 			struct page *page;
 
-			page = vm_normal_page(vma, addr, ptent);
+			page = vm_normal_any_page(vma, addr, ptent);
 			if (unlikely(zap_skip_check_mapping(details, page)))
 				continue;
 			ptent = ptep_get_and_clear_full(mm, addr, pte,
@@ -2168,7 +2182,7 @@ EXPORT_SYMBOL(vmf_insert_pfn);
 
 static bool vm_mixed_ok(struct vm_area_struct *vma, pfn_t pfn)
 {
-	/* these checks mirror the abort conditions in vm_normal_page */
+	/* these checks mirror the abort conditions in vm_normal_lru_page */
 	if (vma->vm_flags & VM_MIXEDMAP)
 		return true;
 	if (pfn_t_devmap(pfn))
@@ -2198,7 +2212,7 @@ static vm_fault_t __vm_insert_mixed(struct vm_area_struct *vma,
 
 	/*
 	 * If we don't have pte special, then we have to use the pfn_valid()
-	 * based VM_MIXEDMAP scheme (see vm_normal_page), and thus we *must*
+	 * based VM_MIXEDMAP scheme (see vm_normal_any_page), and thus we *must*
 	 * refcount the page if pfn_valid is true (hence insert_page rather
 	 * than insert_pfn).  If a zero_pfn were inserted into a VM_MIXEDMAP
 	 * without pte special, it would there be refcounted as a normal page.
@@ -2408,7 +2422,7 @@ int remap_pfn_range_notrack(struct vm_area_struct *vma, unsigned long addr,
 	 * There's a horrible special case to handle copy-on-write
 	 * behaviour that some programs depend on. We mark the "original"
 	 * un-COW'ed pages by matching them up with "vma->vm_pgoff".
-	 * See vm_normal_page() for details.
+	 * See vm_normal_any_page() for details.
 	 */
 	if (is_cow_mapping(vma->vm_flags)) {
 		if (addr != vma->vm_start || end != vma->vm_end)
@@ -3267,7 +3281,7 @@ static vm_fault_t do_wp_page(struct vm_fault *vmf)
 		     mm_tlb_flush_pending(vmf->vma->vm_mm)))
 		flush_tlb_page(vmf->vma, vmf->address);
 
-	vmf->page = vm_normal_page(vma, vmf->address, vmf->orig_pte);
+	vmf->page = vm_normal_any_page(vma, vmf->address, vmf->orig_pte);
 	if (!vmf->page) {
 		/*
 		 * VM_MIXEDMAP !pfn_valid() case, or VM_SOFTDIRTY clear on a
@@ -4364,7 +4378,7 @@ static vm_fault_t do_numa_page(struct vm_fault *vmf)
 	old_pte = ptep_get(vmf->pte);
 	pte = pte_modify(old_pte, vma->vm_page_prot);
 
-	page = vm_normal_page(vma, vmf->address, pte);
+	page = vm_normal_lru_page(vma, vmf->address, pte);
 	if (!page)
 		goto out_map;
 
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 69284d3b5e53..651408f14b3e 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -527,11 +527,11 @@ static int queue_pages_pte_range(pmd_t *pmd, unsigned long addr,
 	for (; addr != end; pte++, addr += PAGE_SIZE) {
 		if (!pte_present(*pte))
 			continue;
-		page = vm_normal_page(vma, addr, *pte);
+		page = vm_normal_lru_page(vma, addr, *pte);
 		if (!page)
 			continue;
 		/*
-		 * vm_normal_page() filters out zero pages, but there might
+		 * vm_normal_lru_page() filters out zero pages, but there might
 		 * still be PageReserved pages to skip, perhaps in a VDSO.
 		 */
 		if (PageReserved(page))
diff --git a/mm/migrate.c b/mm/migrate.c
index c31d04b46a5e..17d049311b78 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1614,7 +1614,7 @@ static int add_page_for_migration(struct mm_struct *mm, unsigned long addr,
 		goto out;
 
 	/* FOLL_DUMP to ignore special (like zero) pages */
-	follflags = FOLL_GET | FOLL_DUMP;
+	follflags = FOLL_GET | FOLL_DUMP | FOLL_LRU;
 	page = follow_page(vma, addr, follflags);
 
 	err = PTR_ERR(page);
diff --git a/mm/migrate_device.c b/mm/migrate_device.c
index 3373b535d5c9..fac1b6978361 100644
--- a/mm/migrate_device.c
+++ b/mm/migrate_device.c
@@ -154,7 +154,7 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
 				migrate->cpages++;
 				goto next;
 			}
-			page = vm_normal_page(migrate->vma, addr, pte);
+			page = vm_normal_any_page(migrate->vma, addr, pte);
 			if (page && !is_zone_device_page(page) &&
 			    !(migrate->flags & MIGRATE_VMA_SELECT_SYSTEM))
 				goto next;
diff --git a/mm/mlock.c b/mm/mlock.c
index 25934e7db3e1..bb09926aeee7 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -342,7 +342,7 @@ static void __munlock_pagevec(struct pagevec *pvec, struct zone *zone)
  * a non-TPH page already pinned and in the @pvec, and that it belongs to @zone.
  *
  * The rest of @pvec is filled by subsequent pages within the same pmd and same
- * zone, as long as the pte's are present and vm_normal_page() succeeds. These
+ * zone, as long as the pte's are present and vm_normal_lru_page() succeeds. These
  * pages also get pinned.
  *
  * Returns the address of the next page that should be scanned. This equals
@@ -373,7 +373,7 @@ static unsigned long __munlock_pagevec_fill(struct pagevec *pvec,
 		struct page *page = NULL;
 		pte++;
 		if (pte_present(*pte))
-			page = vm_normal_page(vma, start, *pte);
+			page = vm_normal_lru_page(vma, start, *pte);
 		/*
 		 * Break if page could not be obtained or the page's node+zone does not
 		 * match
@@ -439,7 +439,7 @@ void munlock_vma_pages_range(struct vm_area_struct *vma,
 		 * suits munlock very well (and if somehow an abnormal page
 		 * has sneaked into the range, we won't oops here: great).
 		 */
-		page = follow_page(vma, start, FOLL_GET | FOLL_DUMP);
+		page = follow_page(vma, start, FOLL_GET | FOLL_DUMP | FOLL_LRU);
 
 		if (page && !IS_ERR(page)) {
 			if (PageTransTail(page)) {
diff --git a/mm/mprotect.c b/mm/mprotect.c
index 2887644fd150..bc3e75334aeb 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -88,7 +88,7 @@ static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
 				if (pte_protnone(oldpte))
 					continue;
 
-				page = vm_normal_page(vma, addr, oldpte);
+				page = vm_normal_lru_page(vma, addr, oldpte);
 				if (!page || PageKsm(page))
 					continue;
 
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v1 1/3] mm: split vm_normal_pages for LRU and non-LRU handling
@ 2022-03-10 17:26   ` Alex Sierra
  0 siblings, 0 replies; 24+ messages in thread
From: Alex Sierra @ 2022-03-10 17:26 UTC (permalink / raw)
  To: jgg
  Cc: rcampbell, willy, david, Felix.Kuehling, apopple, amd-gfx,
	linux-xfs, linux-mm, jglisse, dri-devel, akpm, linux-ext4, hch

DEVICE_COHERENT pages introduce a subtle distinction in the way
"normal" pages can be used by various callers throughout the kernel.
They behave like normal pages for purposes of mapping in CPU page
tables, and for COW. But they do not support LRU lists, NUMA
migration or THP. Therefore we split vm_normal_page into two
functions vm_normal_any_page and vm_normal_lru_page. The latter will
only return pages that can be put on an LRU list and that support
NUMA migration, KSM and THP.

We also introduced a FOLL_LRU flag that adds the same behaviour to
follow_page and related APIs, to allow callers to specify that they
expect to put pages on an LRU list.

Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 fs/proc/task_mmu.c  | 12 ++++++------
 include/linux/mm.h  | 11 +++++++----
 mm/gup.c            | 10 ++++++----
 mm/hmm.c            |  2 +-
 mm/huge_memory.c    |  2 +-
 mm/khugepaged.c     |  8 ++++----
 mm/ksm.c            |  4 ++--
 mm/madvise.c        |  4 ++--
 mm/memcontrol.c     |  2 +-
 mm/memory.c         | 38 ++++++++++++++++++++++++++------------
 mm/mempolicy.c      |  4 ++--
 mm/migrate.c        |  2 +-
 mm/migrate_device.c |  2 +-
 mm/mlock.c          |  6 +++---
 mm/mprotect.c       |  2 +-
 15 files changed, 64 insertions(+), 45 deletions(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index f46060eb91b5..cada3e702693 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -528,7 +528,7 @@ static void smaps_pte_entry(pte_t *pte, unsigned long addr,
 	bool migration = false;
 
 	if (pte_present(*pte)) {
-		page = vm_normal_page(vma, addr, *pte);
+		page = vm_normal_any_page(vma, addr, *pte);
 	} else if (is_swap_pte(*pte)) {
 		swp_entry_t swpent = pte_to_swp_entry(*pte);
 
@@ -723,7 +723,7 @@ static int smaps_hugetlb_range(pte_t *pte, unsigned long hmask,
 	struct page *page = NULL;
 
 	if (pte_present(*pte)) {
-		page = vm_normal_page(vma, addr, *pte);
+		page = vm_normal_any_page(vma, addr, *pte);
 	} else if (is_swap_pte(*pte)) {
 		swp_entry_t swpent = pte_to_swp_entry(*pte);
 
@@ -1077,7 +1077,7 @@ static inline bool pte_is_pinned(struct vm_area_struct *vma, unsigned long addr,
 		return false;
 	if (likely(!test_bit(MMF_HAS_PINNED, &vma->vm_mm->flags)))
 		return false;
-	page = vm_normal_page(vma, addr, pte);
+	page = vm_normal_any_page(vma, addr, pte);
 	if (!page)
 		return false;
 	return page_maybe_dma_pinned(page);
@@ -1190,7 +1190,7 @@ static int clear_refs_pte_range(pmd_t *pmd, unsigned long addr,
 		if (!pte_present(ptent))
 			continue;
 
-		page = vm_normal_page(vma, addr, ptent);
+		page = vm_normal_any_page(vma, addr, ptent);
 		if (!page)
 			continue;
 
@@ -1402,7 +1402,7 @@ static pagemap_entry_t pte_to_pagemap_entry(struct pagemapread *pm,
 		if (pm->show_pfn)
 			frame = pte_pfn(pte);
 		flags |= PM_PRESENT;
-		page = vm_normal_page(vma, addr, pte);
+		page = vm_normal_any_page(vma, addr, pte);
 		if (pte_soft_dirty(pte))
 			flags |= PM_SOFT_DIRTY;
 		if (pte_uffd_wp(pte))
@@ -1784,7 +1784,7 @@ static struct page *can_gather_numa_stats(pte_t pte, struct vm_area_struct *vma,
 	if (!pte_present(pte))
 		return NULL;
 
-	page = vm_normal_page(vma, addr, pte);
+	page = vm_normal_lru_page(vma, addr, pte);
 	if (!page)
 		return NULL;
 
diff --git a/include/linux/mm.h b/include/linux/mm.h
index d507c32724c0..46b0bb43cef3 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -593,8 +593,8 @@ struct vm_operations_struct {
 					unsigned long addr);
 #endif
 	/*
-	 * Called by vm_normal_page() for special PTEs to find the
-	 * page for @addr.  This is useful if the default behavior
+	 * Called by vm_normal_*_page() for special PTEs to find the
+	 * page for @addr. This is useful if the default behavior
 	 * (using pte_page()) would not find the correct page.
 	 */
 	struct page *(*find_special_page)(struct vm_area_struct *vma,
@@ -1781,7 +1781,9 @@ static inline bool can_do_mlock(void) { return false; }
 extern int user_shm_lock(size_t, struct ucounts *);
 extern void user_shm_unlock(size_t, struct ucounts *);
 
-struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
+struct page *vm_normal_any_page(struct vm_area_struct *vma, unsigned long addr,
+			     pte_t pte);
+struct page *vm_normal_lru_page(struct vm_area_struct *vma, unsigned long addr,
 			     pte_t pte);
 struct page *vm_normal_page_pmd(struct vm_area_struct *vma, unsigned long addr,
 				pmd_t pmd);
@@ -2901,6 +2903,7 @@ struct page *follow_page(struct vm_area_struct *vma, unsigned long address,
 #define FOLL_SPLIT_PMD	0x20000	/* split huge pmd before returning */
 #define FOLL_PIN	0x40000	/* pages must be released via unpin_user_page */
 #define FOLL_FAST_ONLY	0x80000	/* gup_fast: prevent fall-back to slow gup */
+#define FOLL_LRU	0x100000 /* return only LRU (anon or page cache) */
 
 /*
  * FOLL_PIN and FOLL_LONGTERM may be used in various combinations with each
@@ -3227,7 +3230,7 @@ extern long copy_huge_page_from_user(struct page *dst_page,
  * @vma: Pointer to the struct vm_area_struct to consider
  *
  * Whether transhuge page-table entries are considered "special" following
- * the definition in vm_normal_page().
+ * the definition in vm_normal_*_page().
  *
  * Return: true if transhuge page-table entries should be considered special,
  * false otherwise.
diff --git a/mm/gup.c b/mm/gup.c
index 41349b685eaf..0ed597143d80 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -539,8 +539,10 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
 		pte_unmap_unlock(ptep, ptl);
 		return NULL;
 	}
-
-	page = vm_normal_page(vma, address, pte);
+	if (flags & (FOLL_MLOCK | FOLL_LRU))
+		page = vm_normal_lru_page(vma, address, pte);
+	else
+		page = vm_normal_any_page(vma, address, pte);
 	if (!page && pte_devmap(pte) && (flags & (FOLL_GET | FOLL_PIN))) {
 		/*
 		 * Only return device mapping pages in the FOLL_GET or FOLL_PIN
@@ -824,7 +826,7 @@ static struct page *follow_p4d_mask(struct vm_area_struct *vma,
  *
  * Return: the mapped (struct page *), %NULL if no mapping exists, or
  * an error pointer if there is a mapping to something not represented
- * by a page descriptor (see also vm_normal_page()).
+ * by a page descriptor (see also vm_normal_*_page()).
  */
 static struct page *follow_page_mask(struct vm_area_struct *vma,
 			      unsigned long address, unsigned int flags,
@@ -917,7 +919,7 @@ static int get_gate_page(struct mm_struct *mm, unsigned long address,
 	*vma = get_gate_vma(mm);
 	if (!page)
 		goto out;
-	*page = vm_normal_page(*vma, address, *pte);
+	*page = vm_normal_any_page(*vma, address, *pte);
 	if (!*page) {
 		if ((gup_flags & FOLL_DUMP) || !is_zero_pfn(pte_pfn(*pte)))
 			goto unmap;
diff --git a/mm/hmm.c b/mm/hmm.c
index bd56641c79d4..90c949d66712 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -300,7 +300,7 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, unsigned long addr,
 	 * Since each architecture defines a struct page for the zero page, just
 	 * fall through and treat it like a normal page.
 	 */
-	if (!vm_normal_page(walk->vma, addr, pte) &&
+	if (!vm_normal_any_page(walk->vma, addr, pte) &&
 	    !pte_devmap(pte) &&
 	    !is_zero_pfn(pte_pfn(pte))) {
 		if (hmm_pte_need_fault(hmm_vma_walk, pfn_req_flags, 0)) {
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 406a3c28c026..ea1efc825774 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2966,7 +2966,7 @@ static int split_huge_pages_pid(int pid, unsigned long vaddr_start,
 		}
 
 		/* FOLL_DUMP to ignore special (like zero) pages */
-		follflags = FOLL_GET | FOLL_DUMP;
+		follflags = FOLL_GET | FOLL_DUMP | FOLL_LRU;
 		page = follow_page(vma, addr, follflags);
 
 		if (IS_ERR(page))
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 131492fd1148..a7153db09afa 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -627,7 +627,7 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
 			result = SCAN_PTE_NON_PRESENT;
 			goto out;
 		}
-		page = vm_normal_page(vma, address, pteval);
+		page = vm_normal_lru_page(vma, address, pteval);
 		if (unlikely(!page)) {
 			result = SCAN_PAGE_NULL;
 			goto out;
@@ -1286,7 +1286,7 @@ static int khugepaged_scan_pmd(struct mm_struct *mm,
 		if (pte_write(pteval))
 			writable = true;
 
-		page = vm_normal_page(vma, _address, pteval);
+		page = vm_normal_lru_page(vma, _address, pteval);
 		if (unlikely(!page)) {
 			result = SCAN_PAGE_NULL;
 			goto out_unmap;
@@ -1494,7 +1494,7 @@ void collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr)
 		if (!pte_present(*pte))
 			goto abort;
 
-		page = vm_normal_page(vma, addr, *pte);
+		page = vm_normal_lru_page(vma, addr, *pte);
 
 		/*
 		 * Note that uprobe, debugger, or MAP_PRIVATE may change the
@@ -1512,7 +1512,7 @@ void collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr)
 
 		if (pte_none(*pte))
 			continue;
-		page = vm_normal_page(vma, addr, *pte);
+		page = vm_normal_lru_page(vma, addr, *pte);
 		page_remove_rmap(page, false);
 	}
 
diff --git a/mm/ksm.c b/mm/ksm.c
index c20bd4d9a0d9..352d37e44694 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -474,7 +474,7 @@ static int break_ksm(struct vm_area_struct *vma, unsigned long addr)
 	do {
 		cond_resched();
 		page = follow_page(vma, addr,
-				FOLL_GET | FOLL_MIGRATION | FOLL_REMOTE);
+				FOLL_GET | FOLL_MIGRATION | FOLL_REMOTE | FOLL_LRU);
 		if (IS_ERR_OR_NULL(page))
 			break;
 		if (PageKsm(page))
@@ -559,7 +559,7 @@ static struct page *get_mergeable_page(struct rmap_item *rmap_item)
 	if (!vma)
 		goto out;
 
-	page = follow_page(vma, addr, FOLL_GET);
+	page = follow_page(vma, addr, FOLL_GET | FOLL_LRU);
 	if (IS_ERR_OR_NULL(page))
 		goto out;
 	if (PageAnon(page)) {
diff --git a/mm/madvise.c b/mm/madvise.c
index 38d0f515d548..8dcfb93fe0d5 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -411,7 +411,7 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
 		if (!pte_present(ptent))
 			continue;
 
-		page = vm_normal_page(vma, addr, ptent);
+		page = vm_normal_lru_page(vma, addr, ptent);
 		if (!page)
 			continue;
 
@@ -621,7 +621,7 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
 			continue;
 		}
 
-		page = vm_normal_page(vma, addr, ptent);
+		page = vm_normal_lru_page(vma, addr, ptent);
 		if (!page)
 			continue;
 
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index d9cd8f2afa3c..1d791760608b 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5476,7 +5476,7 @@ enum mc_target_type {
 static struct page *mc_handle_present_pte(struct vm_area_struct *vma,
 						unsigned long addr, pte_t ptent)
 {
-	struct page *page = vm_normal_page(vma, addr, ptent);
+	struct page *page = vm_normal_any_page(vma, addr, ptent);
 
 	if (!page || !page_mapped(page))
 		return NULL;
diff --git a/mm/memory.c b/mm/memory.c
index c125c4969913..1422b1f71f0e 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -565,7 +565,7 @@ static void print_bad_pte(struct vm_area_struct *vma, unsigned long addr,
 }
 
 /*
- * vm_normal_page -- This function gets the "struct page" associated with a pte.
+ * vm_normal_any_page -- This function gets the "struct page" associated with a pte.
  *
  * "Special" mappings do not wish to be associated with a "struct page" (either
  * it doesn't exist, or it exists but they don't want to touch it). In this
@@ -606,7 +606,7 @@ static void print_bad_pte(struct vm_area_struct *vma, unsigned long addr,
  * PFNMAP mappings in order to support COWable mappings.
  *
  */
-struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
+struct page *vm_normal_any_page(struct vm_area_struct *vma, unsigned long addr,
 			    pte_t pte)
 {
 	unsigned long pfn = pte_pfn(pte);
@@ -620,8 +620,6 @@ struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
 			return NULL;
 		if (is_zero_pfn(pfn))
 			return NULL;
-		if (pte_devmap(pte))
-			return NULL;
 
 		print_bad_pte(vma, addr, pte, NULL);
 		return NULL;
@@ -661,6 +659,22 @@ struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
 	return pfn_to_page(pfn);
 }
 
+/*
+ * vm_normal_lru_page -- This function gets the "struct page" associated
+ * with a pte only for page cache and anon page. These pages are LRU handled.
+ */
+struct page *vm_normal_lru_page(struct vm_area_struct *vma, unsigned long addr,
+			    pte_t pte)
+{
+	struct page *page;
+
+	page = vm_normal_any_page(vma, addr, pte);
+	if (page && is_zone_device_page(page))
+		return NULL;
+
+	return page;
+}
+
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 struct page *vm_normal_page_pmd(struct vm_area_struct *vma, unsigned long addr,
 				pmd_t pmd)
@@ -670,7 +684,7 @@ struct page *vm_normal_page_pmd(struct vm_area_struct *vma, unsigned long addr,
 	/*
 	 * There is no pmd_special() but there may be special pmds, e.g.
 	 * in a direct-access (dax) mapping, so let's just replicate the
-	 * !CONFIG_ARCH_HAS_PTE_SPECIAL case from vm_normal_page() here.
+	 * !CONFIG_ARCH_HAS_PTE_SPECIAL case from vm_normal_any_page() here.
 	 */
 	if (unlikely(vma->vm_flags & (VM_PFNMAP|VM_MIXEDMAP))) {
 		if (vma->vm_flags & VM_MIXEDMAP) {
@@ -946,7 +960,7 @@ copy_present_pte(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma,
 	pte_t pte = *src_pte;
 	struct page *page;
 
-	page = vm_normal_page(src_vma, addr, pte);
+	page = vm_normal_any_page(src_vma, addr, pte);
 	if (page) {
 		int retval;
 
@@ -1358,7 +1372,7 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
 		if (pte_present(ptent)) {
 			struct page *page;
 
-			page = vm_normal_page(vma, addr, ptent);
+			page = vm_normal_any_page(vma, addr, ptent);
 			if (unlikely(zap_skip_check_mapping(details, page)))
 				continue;
 			ptent = ptep_get_and_clear_full(mm, addr, pte,
@@ -2168,7 +2182,7 @@ EXPORT_SYMBOL(vmf_insert_pfn);
 
 static bool vm_mixed_ok(struct vm_area_struct *vma, pfn_t pfn)
 {
-	/* these checks mirror the abort conditions in vm_normal_page */
+	/* these checks mirror the abort conditions in vm_normal_lru_page */
 	if (vma->vm_flags & VM_MIXEDMAP)
 		return true;
 	if (pfn_t_devmap(pfn))
@@ -2198,7 +2212,7 @@ static vm_fault_t __vm_insert_mixed(struct vm_area_struct *vma,
 
 	/*
 	 * If we don't have pte special, then we have to use the pfn_valid()
-	 * based VM_MIXEDMAP scheme (see vm_normal_page), and thus we *must*
+	 * based VM_MIXEDMAP scheme (see vm_normal_any_page), and thus we *must*
 	 * refcount the page if pfn_valid is true (hence insert_page rather
 	 * than insert_pfn).  If a zero_pfn were inserted into a VM_MIXEDMAP
 	 * without pte special, it would there be refcounted as a normal page.
@@ -2408,7 +2422,7 @@ int remap_pfn_range_notrack(struct vm_area_struct *vma, unsigned long addr,
 	 * There's a horrible special case to handle copy-on-write
 	 * behaviour that some programs depend on. We mark the "original"
 	 * un-COW'ed pages by matching them up with "vma->vm_pgoff".
-	 * See vm_normal_page() for details.
+	 * See vm_normal_any_page() for details.
 	 */
 	if (is_cow_mapping(vma->vm_flags)) {
 		if (addr != vma->vm_start || end != vma->vm_end)
@@ -3267,7 +3281,7 @@ static vm_fault_t do_wp_page(struct vm_fault *vmf)
 		     mm_tlb_flush_pending(vmf->vma->vm_mm)))
 		flush_tlb_page(vmf->vma, vmf->address);
 
-	vmf->page = vm_normal_page(vma, vmf->address, vmf->orig_pte);
+	vmf->page = vm_normal_any_page(vma, vmf->address, vmf->orig_pte);
 	if (!vmf->page) {
 		/*
 		 * VM_MIXEDMAP !pfn_valid() case, or VM_SOFTDIRTY clear on a
@@ -4364,7 +4378,7 @@ static vm_fault_t do_numa_page(struct vm_fault *vmf)
 	old_pte = ptep_get(vmf->pte);
 	pte = pte_modify(old_pte, vma->vm_page_prot);
 
-	page = vm_normal_page(vma, vmf->address, pte);
+	page = vm_normal_lru_page(vma, vmf->address, pte);
 	if (!page)
 		goto out_map;
 
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 69284d3b5e53..651408f14b3e 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -527,11 +527,11 @@ static int queue_pages_pte_range(pmd_t *pmd, unsigned long addr,
 	for (; addr != end; pte++, addr += PAGE_SIZE) {
 		if (!pte_present(*pte))
 			continue;
-		page = vm_normal_page(vma, addr, *pte);
+		page = vm_normal_lru_page(vma, addr, *pte);
 		if (!page)
 			continue;
 		/*
-		 * vm_normal_page() filters out zero pages, but there might
+		 * vm_normal_lru_page() filters out zero pages, but there might
 		 * still be PageReserved pages to skip, perhaps in a VDSO.
 		 */
 		if (PageReserved(page))
diff --git a/mm/migrate.c b/mm/migrate.c
index c31d04b46a5e..17d049311b78 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1614,7 +1614,7 @@ static int add_page_for_migration(struct mm_struct *mm, unsigned long addr,
 		goto out;
 
 	/* FOLL_DUMP to ignore special (like zero) pages */
-	follflags = FOLL_GET | FOLL_DUMP;
+	follflags = FOLL_GET | FOLL_DUMP | FOLL_LRU;
 	page = follow_page(vma, addr, follflags);
 
 	err = PTR_ERR(page);
diff --git a/mm/migrate_device.c b/mm/migrate_device.c
index 3373b535d5c9..fac1b6978361 100644
--- a/mm/migrate_device.c
+++ b/mm/migrate_device.c
@@ -154,7 +154,7 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
 				migrate->cpages++;
 				goto next;
 			}
-			page = vm_normal_page(migrate->vma, addr, pte);
+			page = vm_normal_any_page(migrate->vma, addr, pte);
 			if (page && !is_zone_device_page(page) &&
 			    !(migrate->flags & MIGRATE_VMA_SELECT_SYSTEM))
 				goto next;
diff --git a/mm/mlock.c b/mm/mlock.c
index 25934e7db3e1..bb09926aeee7 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -342,7 +342,7 @@ static void __munlock_pagevec(struct pagevec *pvec, struct zone *zone)
  * a non-TPH page already pinned and in the @pvec, and that it belongs to @zone.
  *
  * The rest of @pvec is filled by subsequent pages within the same pmd and same
- * zone, as long as the pte's are present and vm_normal_page() succeeds. These
+ * zone, as long as the pte's are present and vm_normal_lru_page() succeeds. These
  * pages also get pinned.
  *
  * Returns the address of the next page that should be scanned. This equals
@@ -373,7 +373,7 @@ static unsigned long __munlock_pagevec_fill(struct pagevec *pvec,
 		struct page *page = NULL;
 		pte++;
 		if (pte_present(*pte))
-			page = vm_normal_page(vma, start, *pte);
+			page = vm_normal_lru_page(vma, start, *pte);
 		/*
 		 * Break if page could not be obtained or the page's node+zone does not
 		 * match
@@ -439,7 +439,7 @@ void munlock_vma_pages_range(struct vm_area_struct *vma,
 		 * suits munlock very well (and if somehow an abnormal page
 		 * has sneaked into the range, we won't oops here: great).
 		 */
-		page = follow_page(vma, start, FOLL_GET | FOLL_DUMP);
+		page = follow_page(vma, start, FOLL_GET | FOLL_DUMP | FOLL_LRU);
 
 		if (page && !IS_ERR(page)) {
 			if (PageTransTail(page)) {
diff --git a/mm/mprotect.c b/mm/mprotect.c
index 2887644fd150..bc3e75334aeb 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -88,7 +88,7 @@ static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
 				if (pte_protnone(oldpte))
 					continue;
 
-				page = vm_normal_page(vma, addr, oldpte);
+				page = vm_normal_lru_page(vma, addr, oldpte);
 				if (!page || PageKsm(page))
 					continue;
 
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v1 2/3] tools: add more gup configs to hmm_gup selftests
  2022-03-10 17:26 ` Alex Sierra
@ 2022-03-10 17:26   ` Alex Sierra
  -1 siblings, 0 replies; 24+ messages in thread
From: Alex Sierra @ 2022-03-10 17:26 UTC (permalink / raw)
  To: jgg
  Cc: david, Felix.Kuehling, linux-mm, rcampbell, linux-ext4,
	linux-xfs, amd-gfx, dri-devel, hch, jglisse, apopple, willy,
	akpm

Test device pages with get_user_pages and get_user_pages_fast.
The motivation is to test device coherent type pages in the gup and
gup fast paths, after vm_normal_pages was split into LRU and non-LRU
handled.

Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 tools/testing/selftests/vm/hmm-tests.c | 65 +++++++++++++++++---------
 1 file changed, 44 insertions(+), 21 deletions(-)

diff --git a/tools/testing/selftests/vm/hmm-tests.c b/tools/testing/selftests/vm/hmm-tests.c
index 11b83a8084fe..65e30ab6494c 100644
--- a/tools/testing/selftests/vm/hmm-tests.c
+++ b/tools/testing/selftests/vm/hmm-tests.c
@@ -1769,6 +1769,24 @@ TEST_F(hmm, exclusive_cow)
 	hmm_buffer_free(buffer);
 }
 
+static int gup_test_exec(int gup_fd, unsigned long addr,
+			 int cmd, int npages, int size)
+{
+	struct gup_test gup = {
+		.nr_pages_per_call	= npages,
+		.addr			= addr,
+		.gup_flags		= FOLL_WRITE,
+		.size			= size,
+	};
+
+	if (ioctl(gup_fd, cmd, &gup)) {
+		perror("ioctl on error\n");
+		return errno;
+	}
+
+	return 0;
+}
+
 /*
  * Test get user device pages through gup_test. Setting PIN_LONGTERM flag.
  * This should trigger a migration back to system memory for both, private
@@ -1779,7 +1797,6 @@ TEST_F(hmm, exclusive_cow)
 TEST_F(hmm, hmm_gup_test)
 {
 	struct hmm_buffer *buffer;
-	struct gup_test gup;
 	int gup_fd;
 	unsigned long npages;
 	unsigned long size;
@@ -1792,8 +1809,7 @@ TEST_F(hmm, hmm_gup_test)
 	if (gup_fd == -1)
 		SKIP(return, "Skipping test, could not find gup_test driver");
 
-	npages = 4;
-	ASSERT_NE(npages, 0);
+	npages = 3;
 	size = npages << self->page_shift;
 
 	buffer = malloc(sizeof(*buffer));
@@ -1822,28 +1838,35 @@ TEST_F(hmm, hmm_gup_test)
 	for (i = 0, ptr = buffer->mirror; i < size / sizeof(*ptr); ++i)
 		ASSERT_EQ(ptr[i], i);
 
-	gup.nr_pages_per_call = npages;
-	gup.addr = (unsigned long)buffer->ptr;
-	gup.gup_flags = FOLL_WRITE;
-	gup.size = size;
-	/*
-	 * Calling gup_test ioctl. It will try to PIN_LONGTERM these device pages
-	 * causing a migration back to system memory for both, private and coherent
-	 * type pages.
-	 */
-	if (ioctl(gup_fd, PIN_LONGTERM_BENCHMARK, &gup)) {
-		perror("ioctl on PIN_LONGTERM_BENCHMARK\n");
-		goto out_test;
-	}
-
-	/* Take snapshot to make sure pages have been migrated to sys memory */
+	ASSERT_EQ(gup_test_exec(gup_fd,
+				(unsigned long)buffer->ptr,
+				GUP_BASIC_TEST, 1, self->page_size), 0);
+	ASSERT_EQ(gup_test_exec(gup_fd,
+				(unsigned long)buffer->ptr + 1 * self->page_size,
+				GUP_FAST_BENCHMARK, 1, self->page_size), 0);
+	ASSERT_EQ(gup_test_exec(gup_fd,
+				(unsigned long)buffer->ptr + 2 * self->page_size,
+				PIN_LONGTERM_BENCHMARK, 1, self->page_size), 0);
+
+	/* Take snapshot to CPU pagetables */
 	ret = hmm_dmirror_cmd(self->fd, HMM_DMIRROR_SNAPSHOT, buffer, npages);
 	ASSERT_EQ(ret, 0);
 	ASSERT_EQ(buffer->cpages, npages);
 	m = buffer->mirror;
-	for (i = 0; i < npages; i++)
-		ASSERT_EQ(m[i], HMM_DMIRROR_PROT_WRITE);
-out_test:
+	if (hmm_is_coherent_type(variant->device_number)) {
+		ASSERT_EQ(HMM_DMIRROR_PROT_DEV_COHERENT_LOCAL | HMM_DMIRROR_PROT_WRITE, m[0]);
+		ASSERT_EQ(HMM_DMIRROR_PROT_DEV_COHERENT_LOCAL | HMM_DMIRROR_PROT_WRITE, m[1]);
+	} else {
+		ASSERT_EQ(HMM_DMIRROR_PROT_WRITE, m[0]);
+		ASSERT_EQ(HMM_DMIRROR_PROT_WRITE, m[1]);
+	}
+	ASSERT_EQ(HMM_DMIRROR_PROT_WRITE, m[2]);
+	/* Check again the content on the pages. Make sure there's no
+	 * corrupted data.
+	 */
+	for (i = 0, ptr = buffer->ptr; i < size / sizeof(*ptr); ++i)
+		ASSERT_EQ(ptr[i], i);
+
 	close(gup_fd);
 	hmm_buffer_free(buffer);
 }
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v1 2/3] tools: add more gup configs to hmm_gup selftests
@ 2022-03-10 17:26   ` Alex Sierra
  0 siblings, 0 replies; 24+ messages in thread
From: Alex Sierra @ 2022-03-10 17:26 UTC (permalink / raw)
  To: jgg
  Cc: rcampbell, willy, david, Felix.Kuehling, apopple, amd-gfx,
	linux-xfs, linux-mm, jglisse, dri-devel, akpm, linux-ext4, hch

Test device pages with get_user_pages and get_user_pages_fast.
The motivation is to test device coherent type pages in the gup and
gup fast paths, after vm_normal_pages was split into LRU and non-LRU
handled.

Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 tools/testing/selftests/vm/hmm-tests.c | 65 +++++++++++++++++---------
 1 file changed, 44 insertions(+), 21 deletions(-)

diff --git a/tools/testing/selftests/vm/hmm-tests.c b/tools/testing/selftests/vm/hmm-tests.c
index 11b83a8084fe..65e30ab6494c 100644
--- a/tools/testing/selftests/vm/hmm-tests.c
+++ b/tools/testing/selftests/vm/hmm-tests.c
@@ -1769,6 +1769,24 @@ TEST_F(hmm, exclusive_cow)
 	hmm_buffer_free(buffer);
 }
 
+static int gup_test_exec(int gup_fd, unsigned long addr,
+			 int cmd, int npages, int size)
+{
+	struct gup_test gup = {
+		.nr_pages_per_call	= npages,
+		.addr			= addr,
+		.gup_flags		= FOLL_WRITE,
+		.size			= size,
+	};
+
+	if (ioctl(gup_fd, cmd, &gup)) {
+		perror("ioctl on error\n");
+		return errno;
+	}
+
+	return 0;
+}
+
 /*
  * Test get user device pages through gup_test. Setting PIN_LONGTERM flag.
  * This should trigger a migration back to system memory for both, private
@@ -1779,7 +1797,6 @@ TEST_F(hmm, exclusive_cow)
 TEST_F(hmm, hmm_gup_test)
 {
 	struct hmm_buffer *buffer;
-	struct gup_test gup;
 	int gup_fd;
 	unsigned long npages;
 	unsigned long size;
@@ -1792,8 +1809,7 @@ TEST_F(hmm, hmm_gup_test)
 	if (gup_fd == -1)
 		SKIP(return, "Skipping test, could not find gup_test driver");
 
-	npages = 4;
-	ASSERT_NE(npages, 0);
+	npages = 3;
 	size = npages << self->page_shift;
 
 	buffer = malloc(sizeof(*buffer));
@@ -1822,28 +1838,35 @@ TEST_F(hmm, hmm_gup_test)
 	for (i = 0, ptr = buffer->mirror; i < size / sizeof(*ptr); ++i)
 		ASSERT_EQ(ptr[i], i);
 
-	gup.nr_pages_per_call = npages;
-	gup.addr = (unsigned long)buffer->ptr;
-	gup.gup_flags = FOLL_WRITE;
-	gup.size = size;
-	/*
-	 * Calling gup_test ioctl. It will try to PIN_LONGTERM these device pages
-	 * causing a migration back to system memory for both, private and coherent
-	 * type pages.
-	 */
-	if (ioctl(gup_fd, PIN_LONGTERM_BENCHMARK, &gup)) {
-		perror("ioctl on PIN_LONGTERM_BENCHMARK\n");
-		goto out_test;
-	}
-
-	/* Take snapshot to make sure pages have been migrated to sys memory */
+	ASSERT_EQ(gup_test_exec(gup_fd,
+				(unsigned long)buffer->ptr,
+				GUP_BASIC_TEST, 1, self->page_size), 0);
+	ASSERT_EQ(gup_test_exec(gup_fd,
+				(unsigned long)buffer->ptr + 1 * self->page_size,
+				GUP_FAST_BENCHMARK, 1, self->page_size), 0);
+	ASSERT_EQ(gup_test_exec(gup_fd,
+				(unsigned long)buffer->ptr + 2 * self->page_size,
+				PIN_LONGTERM_BENCHMARK, 1, self->page_size), 0);
+
+	/* Take snapshot to CPU pagetables */
 	ret = hmm_dmirror_cmd(self->fd, HMM_DMIRROR_SNAPSHOT, buffer, npages);
 	ASSERT_EQ(ret, 0);
 	ASSERT_EQ(buffer->cpages, npages);
 	m = buffer->mirror;
-	for (i = 0; i < npages; i++)
-		ASSERT_EQ(m[i], HMM_DMIRROR_PROT_WRITE);
-out_test:
+	if (hmm_is_coherent_type(variant->device_number)) {
+		ASSERT_EQ(HMM_DMIRROR_PROT_DEV_COHERENT_LOCAL | HMM_DMIRROR_PROT_WRITE, m[0]);
+		ASSERT_EQ(HMM_DMIRROR_PROT_DEV_COHERENT_LOCAL | HMM_DMIRROR_PROT_WRITE, m[1]);
+	} else {
+		ASSERT_EQ(HMM_DMIRROR_PROT_WRITE, m[0]);
+		ASSERT_EQ(HMM_DMIRROR_PROT_WRITE, m[1]);
+	}
+	ASSERT_EQ(HMM_DMIRROR_PROT_WRITE, m[2]);
+	/* Check again the content on the pages. Make sure there's no
+	 * corrupted data.
+	 */
+	for (i = 0, ptr = buffer->ptr; i < size / sizeof(*ptr); ++i)
+		ASSERT_EQ(ptr[i], i);
+
 	close(gup_fd);
 	hmm_buffer_free(buffer);
 }
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v1 3/3] tools: add selftests to hmm for COW in device memory
  2022-03-10 17:26 ` Alex Sierra
@ 2022-03-10 17:26   ` Alex Sierra
  -1 siblings, 0 replies; 24+ messages in thread
From: Alex Sierra @ 2022-03-10 17:26 UTC (permalink / raw)
  To: jgg
  Cc: david, Felix.Kuehling, linux-mm, rcampbell, linux-ext4,
	linux-xfs, amd-gfx, dri-devel, hch, jglisse, apopple, willy,
	akpm

The objective is to test device migration mechanism in pages marked
as COW, for private and coherent device type. In case of writing to
COW private page(s), a page fault will migrate pages back to system
memory first. Then, these pages will be duplicated. In case of COW
device coherent type, pages are duplicated directly from device
memory.

Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 tools/testing/selftests/vm/hmm-tests.c | 80 ++++++++++++++++++++++++++
 1 file changed, 80 insertions(+)

diff --git a/tools/testing/selftests/vm/hmm-tests.c b/tools/testing/selftests/vm/hmm-tests.c
index 65e30ab6494c..d70b780df877 100644
--- a/tools/testing/selftests/vm/hmm-tests.c
+++ b/tools/testing/selftests/vm/hmm-tests.c
@@ -1870,4 +1870,84 @@ TEST_F(hmm, hmm_gup_test)
 	close(gup_fd);
 	hmm_buffer_free(buffer);
 }
+
+/*
+ * Test copy-on-write in device pages.
+ * In case of writing to COW private page(s), a page fault will migrate pages
+ * back to system memory first. Then, these pages will be duplicated. In case
+ * of COW device coherent type, pages are duplicated directly from device
+ * memory.
+ */
+TEST_F(hmm, hmm_cow_in_device)
+{
+	struct hmm_buffer *buffer;
+	unsigned long npages;
+	unsigned long size;
+	unsigned long i;
+	int *ptr;
+	int ret;
+	unsigned char *m;
+	pid_t pid;
+	int status;
+
+	npages = 4;
+	size = npages << self->page_shift;
+
+	buffer = malloc(sizeof(*buffer));
+	ASSERT_NE(buffer, NULL);
+
+	buffer->fd = -1;
+	buffer->size = size;
+	buffer->mirror = malloc(size);
+	ASSERT_NE(buffer->mirror, NULL);
+
+	buffer->ptr = mmap(NULL, size,
+			   PROT_READ | PROT_WRITE,
+			   MAP_PRIVATE | MAP_ANONYMOUS,
+			   buffer->fd, 0);
+	ASSERT_NE(buffer->ptr, MAP_FAILED);
+
+	/* Initialize buffer in system memory. */
+	for (i = 0, ptr = buffer->ptr; i < size / sizeof(*ptr); ++i)
+		ptr[i] = i;
+
+	/* Migrate memory to device. */
+
+	ret = hmm_migrate_sys_to_dev(self->fd, buffer, npages);
+	ASSERT_EQ(ret, 0);
+	ASSERT_EQ(buffer->cpages, npages);
+
+	pid = fork();
+	if (pid == -1)
+		ASSERT_EQ(pid, 0);
+	if (!pid) {
+		/* Child process waitd for SIGTERM from the parent. */
+		while (1) {
+		}
+		perror("Should not reach this\n");
+		exit(0);
+	}
+	/* Parent process writes to COW pages(s) and gets a
+	 * new copy in system. In case of device private pages,
+	 * this write causes a migration to system mem first.
+	 */
+	for (i = 0, ptr = buffer->ptr; i < size / sizeof(*ptr); ++i)
+		ptr[i] = i;
+
+	/* Terminate child and wait */
+	EXPECT_EQ(0, kill(pid, SIGTERM));
+	EXPECT_EQ(pid, waitpid(pid, &status, 0));
+	EXPECT_NE(0, WIFSIGNALED(status));
+	EXPECT_EQ(SIGTERM, WTERMSIG(status));
+
+	/* Take snapshot to CPU pagetables */
+	ret = hmm_dmirror_cmd(self->fd, HMM_DMIRROR_SNAPSHOT, buffer, npages);
+	ASSERT_EQ(ret, 0);
+	ASSERT_EQ(buffer->cpages, npages);
+	m = buffer->mirror;
+	for (i = 0; i < npages; i++)
+		ASSERT_EQ(HMM_DMIRROR_PROT_WRITE, m[i]);
+
+	hmm_buffer_free(buffer);
+}
 TEST_HARNESS_MAIN
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v1 3/3] tools: add selftests to hmm for COW in device memory
@ 2022-03-10 17:26   ` Alex Sierra
  0 siblings, 0 replies; 24+ messages in thread
From: Alex Sierra @ 2022-03-10 17:26 UTC (permalink / raw)
  To: jgg
  Cc: rcampbell, willy, david, Felix.Kuehling, apopple, amd-gfx,
	linux-xfs, linux-mm, jglisse, dri-devel, akpm, linux-ext4, hch

The objective is to test device migration mechanism in pages marked
as COW, for private and coherent device type. In case of writing to
COW private page(s), a page fault will migrate pages back to system
memory first. Then, these pages will be duplicated. In case of COW
device coherent type, pages are duplicated directly from device
memory.

Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 tools/testing/selftests/vm/hmm-tests.c | 80 ++++++++++++++++++++++++++
 1 file changed, 80 insertions(+)

diff --git a/tools/testing/selftests/vm/hmm-tests.c b/tools/testing/selftests/vm/hmm-tests.c
index 65e30ab6494c..d70b780df877 100644
--- a/tools/testing/selftests/vm/hmm-tests.c
+++ b/tools/testing/selftests/vm/hmm-tests.c
@@ -1870,4 +1870,84 @@ TEST_F(hmm, hmm_gup_test)
 	close(gup_fd);
 	hmm_buffer_free(buffer);
 }
+
+/*
+ * Test copy-on-write in device pages.
+ * In case of writing to COW private page(s), a page fault will migrate pages
+ * back to system memory first. Then, these pages will be duplicated. In case
+ * of COW device coherent type, pages are duplicated directly from device
+ * memory.
+ */
+TEST_F(hmm, hmm_cow_in_device)
+{
+	struct hmm_buffer *buffer;
+	unsigned long npages;
+	unsigned long size;
+	unsigned long i;
+	int *ptr;
+	int ret;
+	unsigned char *m;
+	pid_t pid;
+	int status;
+
+	npages = 4;
+	size = npages << self->page_shift;
+
+	buffer = malloc(sizeof(*buffer));
+	ASSERT_NE(buffer, NULL);
+
+	buffer->fd = -1;
+	buffer->size = size;
+	buffer->mirror = malloc(size);
+	ASSERT_NE(buffer->mirror, NULL);
+
+	buffer->ptr = mmap(NULL, size,
+			   PROT_READ | PROT_WRITE,
+			   MAP_PRIVATE | MAP_ANONYMOUS,
+			   buffer->fd, 0);
+	ASSERT_NE(buffer->ptr, MAP_FAILED);
+
+	/* Initialize buffer in system memory. */
+	for (i = 0, ptr = buffer->ptr; i < size / sizeof(*ptr); ++i)
+		ptr[i] = i;
+
+	/* Migrate memory to device. */
+
+	ret = hmm_migrate_sys_to_dev(self->fd, buffer, npages);
+	ASSERT_EQ(ret, 0);
+	ASSERT_EQ(buffer->cpages, npages);
+
+	pid = fork();
+	if (pid == -1)
+		ASSERT_EQ(pid, 0);
+	if (!pid) {
+		/* Child process waitd for SIGTERM from the parent. */
+		while (1) {
+		}
+		perror("Should not reach this\n");
+		exit(0);
+	}
+	/* Parent process writes to COW pages(s) and gets a
+	 * new copy in system. In case of device private pages,
+	 * this write causes a migration to system mem first.
+	 */
+	for (i = 0, ptr = buffer->ptr; i < size / sizeof(*ptr); ++i)
+		ptr[i] = i;
+
+	/* Terminate child and wait */
+	EXPECT_EQ(0, kill(pid, SIGTERM));
+	EXPECT_EQ(pid, waitpid(pid, &status, 0));
+	EXPECT_NE(0, WIFSIGNALED(status));
+	EXPECT_EQ(SIGTERM, WTERMSIG(status));
+
+	/* Take snapshot to CPU pagetables */
+	ret = hmm_dmirror_cmd(self->fd, HMM_DMIRROR_SNAPSHOT, buffer, npages);
+	ASSERT_EQ(ret, 0);
+	ASSERT_EQ(buffer->cpages, npages);
+	m = buffer->mirror;
+	for (i = 0; i < npages; i++)
+		ASSERT_EQ(HMM_DMIRROR_PROT_WRITE, m[i]);
+
+	hmm_buffer_free(buffer);
+}
 TEST_HARNESS_MAIN
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH v1 1/3] mm: split vm_normal_pages for LRU and non-LRU handling
  2022-03-10 17:26   ` Alex Sierra
@ 2022-03-10 19:25     ` Matthew Wilcox
  -1 siblings, 0 replies; 24+ messages in thread
From: Matthew Wilcox @ 2022-03-10 19:25 UTC (permalink / raw)
  To: Alex Sierra
  Cc: jgg, david, Felix.Kuehling, linux-mm, rcampbell, linux-ext4,
	linux-xfs, amd-gfx, dri-devel, hch, jglisse, apopple, akpm

On Thu, Mar 10, 2022 at 11:26:31AM -0600, Alex Sierra wrote:
> @@ -606,7 +606,7 @@ static void print_bad_pte(struct vm_area_struct *vma, unsigned long addr,
>   * PFNMAP mappings in order to support COWable mappings.
>   *
>   */
> -struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
> +struct page *vm_normal_any_page(struct vm_area_struct *vma, unsigned long addr,
>  			    pte_t pte)
>  {
>  	unsigned long pfn = pte_pfn(pte);
> @@ -620,8 +620,6 @@ struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
>  			return NULL;
>  		if (is_zero_pfn(pfn))
>  			return NULL;
> -		if (pte_devmap(pte))
> -			return NULL;
>  
>  		print_bad_pte(vma, addr, pte, NULL);
>  		return NULL;

... what?

Haven't you just made it so that a devmap page always prints a bad PTE
message, and then returns NULL anyway?

Surely this should be:

		if (pte_devmap(pte))
-			return NULL;
+			return pfn_to_page(pfn);

or maybe

+			goto check_pfn;

But I don't know about that highest_memmap_pfn check.

> @@ -661,6 +659,22 @@ struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
>  	return pfn_to_page(pfn);
>  }
>  
> +/*
> + * vm_normal_lru_page -- This function gets the "struct page" associated
> + * with a pte only for page cache and anon page. These pages are LRU handled.
> + */
> +struct page *vm_normal_lru_page(struct vm_area_struct *vma, unsigned long addr,
> +			    pte_t pte)

It seems a shame to add a new function without proper kernel-doc.


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v1 1/3] mm: split vm_normal_pages for LRU and non-LRU handling
@ 2022-03-10 19:25     ` Matthew Wilcox
  0 siblings, 0 replies; 24+ messages in thread
From: Matthew Wilcox @ 2022-03-10 19:25 UTC (permalink / raw)
  To: Alex Sierra
  Cc: rcampbell, david, Felix.Kuehling, apopple, amd-gfx, linux-xfs,
	linux-mm, jglisse, dri-devel, jgg, akpm, linux-ext4, hch

On Thu, Mar 10, 2022 at 11:26:31AM -0600, Alex Sierra wrote:
> @@ -606,7 +606,7 @@ static void print_bad_pte(struct vm_area_struct *vma, unsigned long addr,
>   * PFNMAP mappings in order to support COWable mappings.
>   *
>   */
> -struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
> +struct page *vm_normal_any_page(struct vm_area_struct *vma, unsigned long addr,
>  			    pte_t pte)
>  {
>  	unsigned long pfn = pte_pfn(pte);
> @@ -620,8 +620,6 @@ struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
>  			return NULL;
>  		if (is_zero_pfn(pfn))
>  			return NULL;
> -		if (pte_devmap(pte))
> -			return NULL;
>  
>  		print_bad_pte(vma, addr, pte, NULL);
>  		return NULL;

... what?

Haven't you just made it so that a devmap page always prints a bad PTE
message, and then returns NULL anyway?

Surely this should be:

		if (pte_devmap(pte))
-			return NULL;
+			return pfn_to_page(pfn);

or maybe

+			goto check_pfn;

But I don't know about that highest_memmap_pfn check.

> @@ -661,6 +659,22 @@ struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
>  	return pfn_to_page(pfn);
>  }
>  
> +/*
> + * vm_normal_lru_page -- This function gets the "struct page" associated
> + * with a pte only for page cache and anon page. These pages are LRU handled.
> + */
> +struct page *vm_normal_lru_page(struct vm_area_struct *vma, unsigned long addr,
> +			    pte_t pte)

It seems a shame to add a new function without proper kernel-doc.


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v1 1/3] mm: split vm_normal_pages for LRU and non-LRU handling
  2022-03-10 19:25     ` Matthew Wilcox
@ 2022-03-10 21:58       ` Felix Kuehling
  -1 siblings, 0 replies; 24+ messages in thread
From: Felix Kuehling @ 2022-03-10 21:58 UTC (permalink / raw)
  To: Matthew Wilcox, Alex Sierra
  Cc: jgg, david, linux-mm, rcampbell, linux-ext4, linux-xfs, amd-gfx,
	dri-devel, hch, jglisse, apopple, akpm


Am 2022-03-10 um 14:25 schrieb Matthew Wilcox:
> On Thu, Mar 10, 2022 at 11:26:31AM -0600, Alex Sierra wrote:
>> @@ -606,7 +606,7 @@ static void print_bad_pte(struct vm_area_struct *vma, unsigned long addr,
>>    * PFNMAP mappings in order to support COWable mappings.
>>    *
>>    */
>> -struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
>> +struct page *vm_normal_any_page(struct vm_area_struct *vma, unsigned long addr,
>>   			    pte_t pte)
>>   {
>>   	unsigned long pfn = pte_pfn(pte);
>> @@ -620,8 +620,6 @@ struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
>>   			return NULL;
>>   		if (is_zero_pfn(pfn))
>>   			return NULL;
>> -		if (pte_devmap(pte))
>> -			return NULL;
>>   
>>   		print_bad_pte(vma, addr, pte, NULL);
>>   		return NULL;
> ... what?
>
> Haven't you just made it so that a devmap page always prints a bad PTE
> message, and then returns NULL anyway?

Yeah, that was stupid. :/  I think the long-term goal was to get rid of 
pte_devmap. But for now, as long as we have pte_special with pte_devmap, 
we'll need a special case to handle that like a normal page.

I only see the PFN_DEV|PFN_MAP flags set in a few places: 
drivers/dax/device.c, drivers/nvdimm/pmem.c, fs/fuse/virtio_fs.c. I 
guess we need to test at least one of them for this patch series to make 
sure we're not breaking them.


>
> Surely this should be:
>
> 		if (pte_devmap(pte))
> -			return NULL;
> +			return pfn_to_page(pfn);
>
> or maybe
>
> +			goto check_pfn;
>
> But I don't know about that highest_memmap_pfn check.

Looks to me like it should work. highest_memmap_pfn gets updated in 
memremap_pages -> pagemap_range -> move_pfn_range_to_zone -> 
memmap_init_range.

Regards,
   Felix


>
>> @@ -661,6 +659,22 @@ struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
>>   	return pfn_to_page(pfn);
>>   }
>>   
>> +/*
>> + * vm_normal_lru_page -- This function gets the "struct page" associated
>> + * with a pte only for page cache and anon page. These pages are LRU handled.
>> + */
>> +struct page *vm_normal_lru_page(struct vm_area_struct *vma, unsigned long addr,
>> +			    pte_t pte)
> It seems a shame to add a new function without proper kernel-doc.
>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v1 1/3] mm: split vm_normal_pages for LRU and non-LRU handling
@ 2022-03-10 21:58       ` Felix Kuehling
  0 siblings, 0 replies; 24+ messages in thread
From: Felix Kuehling @ 2022-03-10 21:58 UTC (permalink / raw)
  To: Matthew Wilcox, Alex Sierra
  Cc: rcampbell, david, apopple, amd-gfx, linux-xfs, linux-mm, jglisse,
	dri-devel, jgg, akpm, linux-ext4, hch


Am 2022-03-10 um 14:25 schrieb Matthew Wilcox:
> On Thu, Mar 10, 2022 at 11:26:31AM -0600, Alex Sierra wrote:
>> @@ -606,7 +606,7 @@ static void print_bad_pte(struct vm_area_struct *vma, unsigned long addr,
>>    * PFNMAP mappings in order to support COWable mappings.
>>    *
>>    */
>> -struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
>> +struct page *vm_normal_any_page(struct vm_area_struct *vma, unsigned long addr,
>>   			    pte_t pte)
>>   {
>>   	unsigned long pfn = pte_pfn(pte);
>> @@ -620,8 +620,6 @@ struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
>>   			return NULL;
>>   		if (is_zero_pfn(pfn))
>>   			return NULL;
>> -		if (pte_devmap(pte))
>> -			return NULL;
>>   
>>   		print_bad_pte(vma, addr, pte, NULL);
>>   		return NULL;
> ... what?
>
> Haven't you just made it so that a devmap page always prints a bad PTE
> message, and then returns NULL anyway?

Yeah, that was stupid. :/  I think the long-term goal was to get rid of 
pte_devmap. But for now, as long as we have pte_special with pte_devmap, 
we'll need a special case to handle that like a normal page.

I only see the PFN_DEV|PFN_MAP flags set in a few places: 
drivers/dax/device.c, drivers/nvdimm/pmem.c, fs/fuse/virtio_fs.c. I 
guess we need to test at least one of them for this patch series to make 
sure we're not breaking them.


>
> Surely this should be:
>
> 		if (pte_devmap(pte))
> -			return NULL;
> +			return pfn_to_page(pfn);
>
> or maybe
>
> +			goto check_pfn;
>
> But I don't know about that highest_memmap_pfn check.

Looks to me like it should work. highest_memmap_pfn gets updated in 
memremap_pages -> pagemap_range -> move_pfn_range_to_zone -> 
memmap_init_range.

Regards,
   Felix


>
>> @@ -661,6 +659,22 @@ struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
>>   	return pfn_to_page(pfn);
>>   }
>>   
>> +/*
>> + * vm_normal_lru_page -- This function gets the "struct page" associated
>> + * with a pte only for page cache and anon page. These pages are LRU handled.
>> + */
>> +struct page *vm_normal_lru_page(struct vm_area_struct *vma, unsigned long addr,
>> +			    pte_t pte)
> It seems a shame to add a new function without proper kernel-doc.
>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v1 1/3] mm: split vm_normal_pages for LRU and non-LRU handling
  2022-03-10 17:26   ` Alex Sierra
@ 2022-03-11  9:16     ` David Hildenbrand
  -1 siblings, 0 replies; 24+ messages in thread
From: David Hildenbrand @ 2022-03-11  9:16 UTC (permalink / raw)
  To: Alex Sierra, jgg
  Cc: Felix.Kuehling, linux-mm, rcampbell, linux-ext4, linux-xfs,
	amd-gfx, dri-devel, hch, jglisse, apopple, willy, akpm

On 10.03.22 18:26, Alex Sierra wrote:
> DEVICE_COHERENT pages introduce a subtle distinction in the way
> "normal" pages can be used by various callers throughout the kernel.
> They behave like normal pages for purposes of mapping in CPU page
> tables, and for COW. But they do not support LRU lists, NUMA
> migration or THP. Therefore we split vm_normal_page into two
> functions vm_normal_any_page and vm_normal_lru_page. The latter will
> only return pages that can be put on an LRU list and that support
> NUMA migration, KSM and THP.
> 
> We also introduced a FOLL_LRU flag that adds the same behaviour to
> follow_page and related APIs, to allow callers to specify that they
> expect to put pages on an LRU list.
> 

I still don't see the need for s/vm_normal_page/vm_normal_any_page/. And
as this patch is dominated by that change, I'd suggest (again) to just
drop it as I don't see any value of that renaming. No specifier implies any.

The general idea of this change LGTM.


I wonder how this interacts with the actual DEVICE_COHERENT coherent
series. Is this a preparation? Should it be part of the DEVICE_COHERENT
series?

IOW, should this patch start with

"With DEVICE_COHERENT, we'll soon have vm_normal_pages() return
device-managed anonymous pages that are not LRU pages. Although they
behave like normal pages for purposes of mapping in CPU page, and for
COW, they do not support LRU lists, NUMA migration or THP. [...]"

But then, I'm confused by patch 2 and 3, because it feels more like we'd
already have DEVICE_COHERENT then ("hmm_is_coherent_type").


-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v1 1/3] mm: split vm_normal_pages for LRU and non-LRU handling
@ 2022-03-11  9:16     ` David Hildenbrand
  0 siblings, 0 replies; 24+ messages in thread
From: David Hildenbrand @ 2022-03-11  9:16 UTC (permalink / raw)
  To: Alex Sierra, jgg
  Cc: rcampbell, willy, Felix.Kuehling, apopple, amd-gfx, linux-xfs,
	linux-mm, jglisse, dri-devel, akpm, linux-ext4, hch

On 10.03.22 18:26, Alex Sierra wrote:
> DEVICE_COHERENT pages introduce a subtle distinction in the way
> "normal" pages can be used by various callers throughout the kernel.
> They behave like normal pages for purposes of mapping in CPU page
> tables, and for COW. But they do not support LRU lists, NUMA
> migration or THP. Therefore we split vm_normal_page into two
> functions vm_normal_any_page and vm_normal_lru_page. The latter will
> only return pages that can be put on an LRU list and that support
> NUMA migration, KSM and THP.
> 
> We also introduced a FOLL_LRU flag that adds the same behaviour to
> follow_page and related APIs, to allow callers to specify that they
> expect to put pages on an LRU list.
> 

I still don't see the need for s/vm_normal_page/vm_normal_any_page/. And
as this patch is dominated by that change, I'd suggest (again) to just
drop it as I don't see any value of that renaming. No specifier implies any.

The general idea of this change LGTM.


I wonder how this interacts with the actual DEVICE_COHERENT coherent
series. Is this a preparation? Should it be part of the DEVICE_COHERENT
series?

IOW, should this patch start with

"With DEVICE_COHERENT, we'll soon have vm_normal_pages() return
device-managed anonymous pages that are not LRU pages. Although they
behave like normal pages for purposes of mapping in CPU page, and for
COW, they do not support LRU lists, NUMA migration or THP. [...]"

But then, I'm confused by patch 2 and 3, because it feels more like we'd
already have DEVICE_COHERENT then ("hmm_is_coherent_type").


-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v1 1/3] mm: split vm_normal_pages for LRU and non-LRU handling
  2022-03-11  9:16     ` David Hildenbrand
@ 2022-03-11 17:08       ` Felix Kuehling
  -1 siblings, 0 replies; 24+ messages in thread
From: Felix Kuehling @ 2022-03-11 17:08 UTC (permalink / raw)
  To: David Hildenbrand, Alex Sierra, jgg
  Cc: linux-mm, rcampbell, linux-ext4, linux-xfs, amd-gfx, dri-devel,
	hch, jglisse, apopple, willy, akpm

On 2022-03-11 04:16, David Hildenbrand wrote:
> On 10.03.22 18:26, Alex Sierra wrote:
>> DEVICE_COHERENT pages introduce a subtle distinction in the way
>> "normal" pages can be used by various callers throughout the kernel.
>> They behave like normal pages for purposes of mapping in CPU page
>> tables, and for COW. But they do not support LRU lists, NUMA
>> migration or THP. Therefore we split vm_normal_page into two
>> functions vm_normal_any_page and vm_normal_lru_page. The latter will
>> only return pages that can be put on an LRU list and that support
>> NUMA migration, KSM and THP.
>>
>> We also introduced a FOLL_LRU flag that adds the same behaviour to
>> follow_page and related APIs, to allow callers to specify that they
>> expect to put pages on an LRU list.
>>
> I still don't see the need for s/vm_normal_page/vm_normal_any_page/. And
> as this patch is dominated by that change, I'd suggest (again) to just
> drop it as I don't see any value of that renaming. No specifier implies any.

OK. If nobody objects, we can adopts that naming convention.


>
> The general idea of this change LGTM.
>
>
> I wonder how this interacts with the actual DEVICE_COHERENT coherent
> series. Is this a preparation? Should it be part of the DEVICE_COHERENT
> series?

Yes, it should be part of that series. Alex developed it on top of the 
series for now. But I think eventually it would need to be spliced into it.

Patch1 would need to go somewhere before the other DEVICE_COHERENT 
patches (with minor modifications). Patch 2 could be squashed into 
"tools: add hmm gup test for long term pinned device pages" or go next 
to it. Patch 3 doesn't have a direct dependency on device-coherent 
pages. It only mentions them in comments.


>
> IOW, should this patch start with
>
> "With DEVICE_COHERENT, we'll soon have vm_normal_pages() return
> device-managed anonymous pages that are not LRU pages. Although they
> behave like normal pages for purposes of mapping in CPU page, and for
> COW, they do not support LRU lists, NUMA migration or THP. [...]"

Yes, that makes sense.

Regards,
   Felix


>
> But then, I'm confused by patch 2 and 3, because it feels more like we'd
> already have DEVICE_COHERENT then ("hmm_is_coherent_type").
>
>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v1 1/3] mm: split vm_normal_pages for LRU and non-LRU handling
@ 2022-03-11 17:08       ` Felix Kuehling
  0 siblings, 0 replies; 24+ messages in thread
From: Felix Kuehling @ 2022-03-11 17:08 UTC (permalink / raw)
  To: David Hildenbrand, Alex Sierra, jgg
  Cc: rcampbell, willy, apopple, dri-devel, linux-xfs, linux-mm,
	jglisse, amd-gfx, akpm, linux-ext4, hch

On 2022-03-11 04:16, David Hildenbrand wrote:
> On 10.03.22 18:26, Alex Sierra wrote:
>> DEVICE_COHERENT pages introduce a subtle distinction in the way
>> "normal" pages can be used by various callers throughout the kernel.
>> They behave like normal pages for purposes of mapping in CPU page
>> tables, and for COW. But they do not support LRU lists, NUMA
>> migration or THP. Therefore we split vm_normal_page into two
>> functions vm_normal_any_page and vm_normal_lru_page. The latter will
>> only return pages that can be put on an LRU list and that support
>> NUMA migration, KSM and THP.
>>
>> We also introduced a FOLL_LRU flag that adds the same behaviour to
>> follow_page and related APIs, to allow callers to specify that they
>> expect to put pages on an LRU list.
>>
> I still don't see the need for s/vm_normal_page/vm_normal_any_page/. And
> as this patch is dominated by that change, I'd suggest (again) to just
> drop it as I don't see any value of that renaming. No specifier implies any.

OK. If nobody objects, we can adopts that naming convention.


>
> The general idea of this change LGTM.
>
>
> I wonder how this interacts with the actual DEVICE_COHERENT coherent
> series. Is this a preparation? Should it be part of the DEVICE_COHERENT
> series?

Yes, it should be part of that series. Alex developed it on top of the 
series for now. But I think eventually it would need to be spliced into it.

Patch1 would need to go somewhere before the other DEVICE_COHERENT 
patches (with minor modifications). Patch 2 could be squashed into 
"tools: add hmm gup test for long term pinned device pages" or go next 
to it. Patch 3 doesn't have a direct dependency on device-coherent 
pages. It only mentions them in comments.


>
> IOW, should this patch start with
>
> "With DEVICE_COHERENT, we'll soon have vm_normal_pages() return
> device-managed anonymous pages that are not LRU pages. Although they
> behave like normal pages for purposes of mapping in CPU page, and for
> COW, they do not support LRU lists, NUMA migration or THP. [...]"

Yes, that makes sense.

Regards,
   Felix


>
> But then, I'm confused by patch 2 and 3, because it feels more like we'd
> already have DEVICE_COHERENT then ("hmm_is_coherent_type").
>
>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v1 1/3] mm: split vm_normal_pages for LRU and non-LRU handling
  2022-03-10 21:58       ` Felix Kuehling
@ 2022-03-17  2:50         ` Alistair Popple
  -1 siblings, 0 replies; 24+ messages in thread
From: Alistair Popple @ 2022-03-17  2:50 UTC (permalink / raw)
  To: Felix Kuehling
  Cc: Matthew Wilcox, Alex Sierra, jgg, david, linux-mm, rcampbell,
	linux-ext4, linux-xfs, amd-gfx, dri-devel, hch, jglisse, akpm

[-- Attachment #1: Type: text/plain, Size: 2861 bytes --]

Felix Kuehling <felix.kuehling@amd.com> writes:

> Am 2022-03-10 um 14:25 schrieb Matthew Wilcox:
>> On Thu, Mar 10, 2022 at 11:26:31AM -0600, Alex Sierra wrote:
>>> @@ -606,7 +606,7 @@ static void print_bad_pte(struct vm_area_struct *vma, unsigned long addr,
>>>    * PFNMAP mappings in order to support COWable mappings.
>>>    *
>>>    */
>>> -struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
>>> +struct page *vm_normal_any_page(struct vm_area_struct *vma, unsigned long addr,
>>>   			    pte_t pte)
>>>   {
>>>   	unsigned long pfn = pte_pfn(pte);
>>> @@ -620,8 +620,6 @@ struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
>>>   			return NULL;
>>>   		if (is_zero_pfn(pfn))
>>>   			return NULL;
>>> -		if (pte_devmap(pte))
>>> -			return NULL;
>>>     		print_bad_pte(vma, addr, pte, NULL);
>>>   		return NULL;
>> ... what?
>>
>> Haven't you just made it so that a devmap page always prints a bad PTE
>> message, and then returns NULL anyway?
>
> Yeah, that was stupid. :/  I think the long-term goal was to get rid of
> pte_devmap. But for now, as long as we have pte_special with pte_devmap,
> we'll need a special case to handle that like a normal page.
>
> I only see the PFN_DEV|PFN_MAP flags set in a few places: drivers/dax/device.c,
> drivers/nvdimm/pmem.c, fs/fuse/virtio_fs.c. I guess we need to test at least one
> of them for this patch series to make sure we're not breaking them.
>
>
>>
>> Surely this should be:
>>
>> 		if (pte_devmap(pte))
>> -			return NULL;
>> +			return pfn_to_page(pfn);
>>
>> or maybe
>>
>> +			goto check_pfn;
>>
>> But I don't know about that highest_memmap_pfn check.
>
> Looks to me like it should work. highest_memmap_pfn gets updated in
> memremap_pages -> pagemap_range -> move_pfn_range_to_zone ->
> memmap_init_range.

FWIW the previous version of this feature which was removed in 25b2995a35b6
("mm: remove MEMORY_DEVICE_PUBLIC support") had a similar comparison with
highest_memmap_pfn:

if (likely(pfn <= highest_memmap_pfn)) {
        struct page *page = pfn_to_page(pfn);

        if (is_device_public_page(page)) {
                if (with_public_device)
                        return page;
                return NULL;
        }
}

> Regards,
>   Felix
>
>
>>
>>> @@ -661,6 +659,22 @@ struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
>>>   	return pfn_to_page(pfn);
>>>   }
>>>   +/*
>>> + * vm_normal_lru_page -- This function gets the "struct page" associated
>>> + * with a pte only for page cache and anon page. These pages are LRU handled.
>>> + */
>>> +struct page *vm_normal_lru_page(struct vm_area_struct *vma, unsigned long addr,
>>> +			    pte_t pte)
>> It seems a shame to add a new function without proper kernel-doc.
>>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v1 1/3] mm: split vm_normal_pages for LRU and non-LRU handling
@ 2022-03-17  2:50         ` Alistair Popple
  0 siblings, 0 replies; 24+ messages in thread
From: Alistair Popple @ 2022-03-17  2:50 UTC (permalink / raw)
  To: Felix Kuehling
  Cc: Alex Sierra, rcampbell, david, dri-devel, Matthew Wilcox,
	linux-xfs, linux-mm, jglisse, amd-gfx, jgg, akpm, linux-ext4,
	hch

[-- Attachment #1: Type: text/plain, Size: 2861 bytes --]

Felix Kuehling <felix.kuehling@amd.com> writes:

> Am 2022-03-10 um 14:25 schrieb Matthew Wilcox:
>> On Thu, Mar 10, 2022 at 11:26:31AM -0600, Alex Sierra wrote:
>>> @@ -606,7 +606,7 @@ static void print_bad_pte(struct vm_area_struct *vma, unsigned long addr,
>>>    * PFNMAP mappings in order to support COWable mappings.
>>>    *
>>>    */
>>> -struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
>>> +struct page *vm_normal_any_page(struct vm_area_struct *vma, unsigned long addr,
>>>   			    pte_t pte)
>>>   {
>>>   	unsigned long pfn = pte_pfn(pte);
>>> @@ -620,8 +620,6 @@ struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
>>>   			return NULL;
>>>   		if (is_zero_pfn(pfn))
>>>   			return NULL;
>>> -		if (pte_devmap(pte))
>>> -			return NULL;
>>>     		print_bad_pte(vma, addr, pte, NULL);
>>>   		return NULL;
>> ... what?
>>
>> Haven't you just made it so that a devmap page always prints a bad PTE
>> message, and then returns NULL anyway?
>
> Yeah, that was stupid. :/  I think the long-term goal was to get rid of
> pte_devmap. But for now, as long as we have pte_special with pte_devmap,
> we'll need a special case to handle that like a normal page.
>
> I only see the PFN_DEV|PFN_MAP flags set in a few places: drivers/dax/device.c,
> drivers/nvdimm/pmem.c, fs/fuse/virtio_fs.c. I guess we need to test at least one
> of them for this patch series to make sure we're not breaking them.
>
>
>>
>> Surely this should be:
>>
>> 		if (pte_devmap(pte))
>> -			return NULL;
>> +			return pfn_to_page(pfn);
>>
>> or maybe
>>
>> +			goto check_pfn;
>>
>> But I don't know about that highest_memmap_pfn check.
>
> Looks to me like it should work. highest_memmap_pfn gets updated in
> memremap_pages -> pagemap_range -> move_pfn_range_to_zone ->
> memmap_init_range.

FWIW the previous version of this feature which was removed in 25b2995a35b6
("mm: remove MEMORY_DEVICE_PUBLIC support") had a similar comparison with
highest_memmap_pfn:

if (likely(pfn <= highest_memmap_pfn)) {
        struct page *page = pfn_to_page(pfn);

        if (is_device_public_page(page)) {
                if (with_public_device)
                        return page;
                return NULL;
        }
}

> Regards,
>   Felix
>
>
>>
>>> @@ -661,6 +659,22 @@ struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
>>>   	return pfn_to_page(pfn);
>>>   }
>>>   +/*
>>> + * vm_normal_lru_page -- This function gets the "struct page" associated
>>> + * with a pte only for page cache and anon page. These pages are LRU handled.
>>> + */
>>> +struct page *vm_normal_lru_page(struct vm_area_struct *vma, unsigned long addr,
>>> +			    pte_t pte)
>> It seems a shame to add a new function without proper kernel-doc.
>>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v1 1/3] mm: split vm_normal_pages for LRU and non-LRU handling
  2022-03-11 17:08       ` Felix Kuehling
@ 2022-03-17  2:54         ` Alistair Popple
  -1 siblings, 0 replies; 24+ messages in thread
From: Alistair Popple @ 2022-03-17  2:54 UTC (permalink / raw)
  To: Felix Kuehling
  Cc: David Hildenbrand, Alex Sierra, jgg, linux-mm, rcampbell,
	linux-ext4, linux-xfs, amd-gfx, dri-devel, hch, jglisse, willy,
	akpm

[-- Attachment #1: Type: text/plain, Size: 3378 bytes --]

Felix Kuehling <felix.kuehling@amd.com> writes:

> On 2022-03-11 04:16, David Hildenbrand wrote:
>> On 10.03.22 18:26, Alex Sierra wrote:
>>> DEVICE_COHERENT pages introduce a subtle distinction in the way
>>> "normal" pages can be used by various callers throughout the kernel.
>>> They behave like normal pages for purposes of mapping in CPU page
>>> tables, and for COW. But they do not support LRU lists, NUMA
>>> migration or THP. Therefore we split vm_normal_page into two
>>> functions vm_normal_any_page and vm_normal_lru_page. The latter will
>>> only return pages that can be put on an LRU list and that support
>>> NUMA migration, KSM and THP.
>>>
>>> We also introduced a FOLL_LRU flag that adds the same behaviour to
>>> follow_page and related APIs, to allow callers to specify that they
>>> expect to put pages on an LRU list.
>>>
>> I still don't see the need for s/vm_normal_page/vm_normal_any_page/. And
>> as this patch is dominated by that change, I'd suggest (again) to just
>> drop it as I don't see any value of that renaming. No specifier implies any.
>
> OK. If nobody objects, we can adopts that naming convention.

I'd prefer we avoid the churn too, but I don't think we should make
vm_normal_page() the equivalent of vm_normal_any_page(). It would mean
vm_normal_page() would return non-LRU device coherent pages, but to me at least
device coherent pages seem special and not what I'd expect from a function with
"normal" in the name.

So I think it would be better to s/vm_normal_lru_page/vm_normal_page/ and keep
vm_normal_any_page() (or perhaps call it vm_any_page?). This is basically what
the previous incarnation of this feature did:

struct page *_vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
                            pte_t pte, bool with_public_device);
#define vm_normal_page(vma, addr, pte) _vm_normal_page(vma, addr, pte, false)

Except we should add:

#define vm_normal_any_page(vma, addr, pte) _vm_normal_page(vma, addr, pte, true)

>> The general idea of this change LGTM.
>>
>>
>> I wonder how this interacts with the actual DEVICE_COHERENT coherent
>> series. Is this a preparation? Should it be part of the DEVICE_COHERENT
>> series?
>
> Yes, it should be part of that series. Alex developed it on top of the series
> for now. But I think eventually it would need to be spliced into it.

Agreed, this needs to go at the start of the DEVICE_COHERENT series.

Thanks.

Alistair

> Patch1 would need to go somewhere before the other DEVICE_COHERENT patches (with
> minor modifications). Patch 2 could be squashed into "tools: add hmm gup test
> for long term pinned device pages" or go next to it. Patch 3 doesn't have a
> direct dependency on device-coherent pages. It only mentions them in comments.
>
>
>>
>> IOW, should this patch start with
>>
>> "With DEVICE_COHERENT, we'll soon have vm_normal_pages() return
>> device-managed anonymous pages that are not LRU pages. Although they
>> behave like normal pages for purposes of mapping in CPU page, and for
>> COW, they do not support LRU lists, NUMA migration or THP. [...]"
>
> Yes, that makes sense.
>
> Regards,
>   Felix
>
>
>>
>> But then, I'm confused by patch 2 and 3, because it feels more like we'd
>> already have DEVICE_COHERENT then ("hmm_is_coherent_type").
>>
>>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v1 1/3] mm: split vm_normal_pages for LRU and non-LRU handling
@ 2022-03-17  2:54         ` Alistair Popple
  0 siblings, 0 replies; 24+ messages in thread
From: Alistair Popple @ 2022-03-17  2:54 UTC (permalink / raw)
  To: Felix Kuehling
  Cc: Alex Sierra, rcampbell, willy, David Hildenbrand, amd-gfx,
	linux-xfs, linux-mm, jglisse, dri-devel, jgg, akpm, linux-ext4,
	hch

[-- Attachment #1: Type: text/plain, Size: 3378 bytes --]

Felix Kuehling <felix.kuehling@amd.com> writes:

> On 2022-03-11 04:16, David Hildenbrand wrote:
>> On 10.03.22 18:26, Alex Sierra wrote:
>>> DEVICE_COHERENT pages introduce a subtle distinction in the way
>>> "normal" pages can be used by various callers throughout the kernel.
>>> They behave like normal pages for purposes of mapping in CPU page
>>> tables, and for COW. But they do not support LRU lists, NUMA
>>> migration or THP. Therefore we split vm_normal_page into two
>>> functions vm_normal_any_page and vm_normal_lru_page. The latter will
>>> only return pages that can be put on an LRU list and that support
>>> NUMA migration, KSM and THP.
>>>
>>> We also introduced a FOLL_LRU flag that adds the same behaviour to
>>> follow_page and related APIs, to allow callers to specify that they
>>> expect to put pages on an LRU list.
>>>
>> I still don't see the need for s/vm_normal_page/vm_normal_any_page/. And
>> as this patch is dominated by that change, I'd suggest (again) to just
>> drop it as I don't see any value of that renaming. No specifier implies any.
>
> OK. If nobody objects, we can adopts that naming convention.

I'd prefer we avoid the churn too, but I don't think we should make
vm_normal_page() the equivalent of vm_normal_any_page(). It would mean
vm_normal_page() would return non-LRU device coherent pages, but to me at least
device coherent pages seem special and not what I'd expect from a function with
"normal" in the name.

So I think it would be better to s/vm_normal_lru_page/vm_normal_page/ and keep
vm_normal_any_page() (or perhaps call it vm_any_page?). This is basically what
the previous incarnation of this feature did:

struct page *_vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
                            pte_t pte, bool with_public_device);
#define vm_normal_page(vma, addr, pte) _vm_normal_page(vma, addr, pte, false)

Except we should add:

#define vm_normal_any_page(vma, addr, pte) _vm_normal_page(vma, addr, pte, true)

>> The general idea of this change LGTM.
>>
>>
>> I wonder how this interacts with the actual DEVICE_COHERENT coherent
>> series. Is this a preparation? Should it be part of the DEVICE_COHERENT
>> series?
>
> Yes, it should be part of that series. Alex developed it on top of the series
> for now. But I think eventually it would need to be spliced into it.

Agreed, this needs to go at the start of the DEVICE_COHERENT series.

Thanks.

Alistair

> Patch1 would need to go somewhere before the other DEVICE_COHERENT patches (with
> minor modifications). Patch 2 could be squashed into "tools: add hmm gup test
> for long term pinned device pages" or go next to it. Patch 3 doesn't have a
> direct dependency on device-coherent pages. It only mentions them in comments.
>
>
>>
>> IOW, should this patch start with
>>
>> "With DEVICE_COHERENT, we'll soon have vm_normal_pages() return
>> device-managed anonymous pages that are not LRU pages. Although they
>> behave like normal pages for purposes of mapping in CPU page, and for
>> COW, they do not support LRU lists, NUMA migration or THP. [...]"
>
> Yes, that makes sense.
>
> Regards,
>   Felix
>
>
>>
>> But then, I'm confused by patch 2 and 3, because it feels more like we'd
>> already have DEVICE_COHERENT then ("hmm_is_coherent_type").
>>
>>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v1 1/3] mm: split vm_normal_pages for LRU and non-LRU handling
  2022-03-17  2:54         ` Alistair Popple
@ 2022-03-17  8:13           ` David Hildenbrand
  -1 siblings, 0 replies; 24+ messages in thread
From: David Hildenbrand @ 2022-03-17  8:13 UTC (permalink / raw)
  To: Alistair Popple, Felix Kuehling
  Cc: Alex Sierra, jgg, linux-mm, rcampbell, linux-ext4, linux-xfs,
	amd-gfx, dri-devel, hch, jglisse, willy, akpm

On 17.03.22 03:54, Alistair Popple wrote:
> Felix Kuehling <felix.kuehling@amd.com> writes:
> 
>> On 2022-03-11 04:16, David Hildenbrand wrote:
>>> On 10.03.22 18:26, Alex Sierra wrote:
>>>> DEVICE_COHERENT pages introduce a subtle distinction in the way
>>>> "normal" pages can be used by various callers throughout the kernel.
>>>> They behave like normal pages for purposes of mapping in CPU page
>>>> tables, and for COW. But they do not support LRU lists, NUMA
>>>> migration or THP. Therefore we split vm_normal_page into two
>>>> functions vm_normal_any_page and vm_normal_lru_page. The latter will
>>>> only return pages that can be put on an LRU list and that support
>>>> NUMA migration, KSM and THP.
>>>>
>>>> We also introduced a FOLL_LRU flag that adds the same behaviour to
>>>> follow_page and related APIs, to allow callers to specify that they
>>>> expect to put pages on an LRU list.
>>>>
>>> I still don't see the need for s/vm_normal_page/vm_normal_any_page/. And
>>> as this patch is dominated by that change, I'd suggest (again) to just
>>> drop it as I don't see any value of that renaming. No specifier implies any.
>>
>> OK. If nobody objects, we can adopts that naming convention.
> 
> I'd prefer we avoid the churn too, but I don't think we should make
> vm_normal_page() the equivalent of vm_normal_any_page(). It would mean
> vm_normal_page() would return non-LRU device coherent pages, but to me at least
> device coherent pages seem special and not what I'd expect from a function with
> "normal" in the name.
> 
> So I think it would be better to s/vm_normal_lru_page/vm_normal_page/ and keep
> vm_normal_any_page() (or perhaps call it vm_any_page?). This is basically what
> the previous incarnation of this feature did:
> 
> struct page *_vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
>                             pte_t pte, bool with_public_device);
> #define vm_normal_page(vma, addr, pte) _vm_normal_page(vma, addr, pte, false)
> 
> Except we should add:
> 
> #define vm_normal_any_page(vma, addr, pte) _vm_normal_page(vma, addr, pte, true)
> 

"normal" simply tells us that this is not a special mapping -- IOW, we
want the VM to take a look at the memmap and not treat it like a PFN
map. What we're changing is that we're now also returning non-lru pages.
Fair enough, that's why we introduce vm_normal_lru_page() as a
replacement where we really can only deal with lru pages.

vm_normal_page vs vm_normal_lru_page is good enough. "lru" further
limits what we get via vm_normal_page, that's even how it's implemented.

vm_normal_page vs vm_normal_any_page is confusing IMHO.


-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v1 1/3] mm: split vm_normal_pages for LRU and non-LRU handling
@ 2022-03-17  8:13           ` David Hildenbrand
  0 siblings, 0 replies; 24+ messages in thread
From: David Hildenbrand @ 2022-03-17  8:13 UTC (permalink / raw)
  To: Alistair Popple, Felix Kuehling
  Cc: Alex Sierra, rcampbell, willy, amd-gfx, linux-xfs, linux-mm,
	jglisse, dri-devel, jgg, akpm, linux-ext4, hch

On 17.03.22 03:54, Alistair Popple wrote:
> Felix Kuehling <felix.kuehling@amd.com> writes:
> 
>> On 2022-03-11 04:16, David Hildenbrand wrote:
>>> On 10.03.22 18:26, Alex Sierra wrote:
>>>> DEVICE_COHERENT pages introduce a subtle distinction in the way
>>>> "normal" pages can be used by various callers throughout the kernel.
>>>> They behave like normal pages for purposes of mapping in CPU page
>>>> tables, and for COW. But they do not support LRU lists, NUMA
>>>> migration or THP. Therefore we split vm_normal_page into two
>>>> functions vm_normal_any_page and vm_normal_lru_page. The latter will
>>>> only return pages that can be put on an LRU list and that support
>>>> NUMA migration, KSM and THP.
>>>>
>>>> We also introduced a FOLL_LRU flag that adds the same behaviour to
>>>> follow_page and related APIs, to allow callers to specify that they
>>>> expect to put pages on an LRU list.
>>>>
>>> I still don't see the need for s/vm_normal_page/vm_normal_any_page/. And
>>> as this patch is dominated by that change, I'd suggest (again) to just
>>> drop it as I don't see any value of that renaming. No specifier implies any.
>>
>> OK. If nobody objects, we can adopts that naming convention.
> 
> I'd prefer we avoid the churn too, but I don't think we should make
> vm_normal_page() the equivalent of vm_normal_any_page(). It would mean
> vm_normal_page() would return non-LRU device coherent pages, but to me at least
> device coherent pages seem special and not what I'd expect from a function with
> "normal" in the name.
> 
> So I think it would be better to s/vm_normal_lru_page/vm_normal_page/ and keep
> vm_normal_any_page() (or perhaps call it vm_any_page?). This is basically what
> the previous incarnation of this feature did:
> 
> struct page *_vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
>                             pte_t pte, bool with_public_device);
> #define vm_normal_page(vma, addr, pte) _vm_normal_page(vma, addr, pte, false)
> 
> Except we should add:
> 
> #define vm_normal_any_page(vma, addr, pte) _vm_normal_page(vma, addr, pte, true)
> 

"normal" simply tells us that this is not a special mapping -- IOW, we
want the VM to take a look at the memmap and not treat it like a PFN
map. What we're changing is that we're now also returning non-lru pages.
Fair enough, that's why we introduce vm_normal_lru_page() as a
replacement where we really can only deal with lru pages.

vm_normal_page vs vm_normal_lru_page is good enough. "lru" further
limits what we get via vm_normal_page, that's even how it's implemented.

vm_normal_page vs vm_normal_any_page is confusing IMHO.


-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v1 1/3] mm: split vm_normal_pages for LRU and non-LRU handling
  2022-03-17  8:13           ` David Hildenbrand
@ 2022-03-17 13:25             ` Jason Gunthorpe
  -1 siblings, 0 replies; 24+ messages in thread
From: Jason Gunthorpe @ 2022-03-17 13:25 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Alistair Popple, Felix Kuehling, Alex Sierra, linux-mm,
	rcampbell, linux-ext4, linux-xfs, amd-gfx, dri-devel, hch,
	jglisse, willy, akpm

On Thu, Mar 17, 2022 at 09:13:50AM +0100, David Hildenbrand wrote:
> On 17.03.22 03:54, Alistair Popple wrote:
> > Felix Kuehling <felix.kuehling@amd.com> writes:
> > 
> >> On 2022-03-11 04:16, David Hildenbrand wrote:
> >>> On 10.03.22 18:26, Alex Sierra wrote:
> >>>> DEVICE_COHERENT pages introduce a subtle distinction in the way
> >>>> "normal" pages can be used by various callers throughout the kernel.
> >>>> They behave like normal pages for purposes of mapping in CPU page
> >>>> tables, and for COW. But they do not support LRU lists, NUMA
> >>>> migration or THP. Therefore we split vm_normal_page into two
> >>>> functions vm_normal_any_page and vm_normal_lru_page. The latter will
> >>>> only return pages that can be put on an LRU list and that support
> >>>> NUMA migration, KSM and THP.
> >>>>
> >>>> We also introduced a FOLL_LRU flag that adds the same behaviour to
> >>>> follow_page and related APIs, to allow callers to specify that they
> >>>> expect to put pages on an LRU list.
> >>>>
> >>> I still don't see the need for s/vm_normal_page/vm_normal_any_page/. And
> >>> as this patch is dominated by that change, I'd suggest (again) to just
> >>> drop it as I don't see any value of that renaming. No specifier implies any.
> >>
> >> OK. If nobody objects, we can adopts that naming convention.
> > 
> > I'd prefer we avoid the churn too, but I don't think we should make
> > vm_normal_page() the equivalent of vm_normal_any_page(). It would mean
> > vm_normal_page() would return non-LRU device coherent pages, but to me at least
> > device coherent pages seem special and not what I'd expect from a function with
> > "normal" in the name.
> > 
> > So I think it would be better to s/vm_normal_lru_page/vm_normal_page/ and keep
> > vm_normal_any_page() (or perhaps call it vm_any_page?). This is basically what
> > the previous incarnation of this feature did:
> > 
> > struct page *_vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
> >                             pte_t pte, bool with_public_device);
> > #define vm_normal_page(vma, addr, pte) _vm_normal_page(vma, addr, pte, false)
> > 
> > Except we should add:
> > 
> > #define vm_normal_any_page(vma, addr, pte) _vm_normal_page(vma, addr, pte, true)
> > 
> 
> "normal" simply tells us that this is not a special mapping -- IOW, we
> want the VM to take a look at the memmap and not treat it like a PFN
> map. What we're changing is that we're now also returning non-lru pages.
> Fair enough, that's why we introduce vm_normal_lru_page() as a
> replacement where we really can only deal with lru pages.
> 
> vm_normal_page vs vm_normal_lru_page is good enough. "lru" further
> limits what we get via vm_normal_page, that's even how it's implemented.

This naming makes sense to me.

Jason

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v1 1/3] mm: split vm_normal_pages for LRU and non-LRU handling
@ 2022-03-17 13:25             ` Jason Gunthorpe
  0 siblings, 0 replies; 24+ messages in thread
From: Jason Gunthorpe @ 2022-03-17 13:25 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Alex Sierra, rcampbell, willy, Felix Kuehling, Alistair Popple,
	amd-gfx, linux-xfs, linux-mm, jglisse, dri-devel, akpm,
	linux-ext4, hch

On Thu, Mar 17, 2022 at 09:13:50AM +0100, David Hildenbrand wrote:
> On 17.03.22 03:54, Alistair Popple wrote:
> > Felix Kuehling <felix.kuehling@amd.com> writes:
> > 
> >> On 2022-03-11 04:16, David Hildenbrand wrote:
> >>> On 10.03.22 18:26, Alex Sierra wrote:
> >>>> DEVICE_COHERENT pages introduce a subtle distinction in the way
> >>>> "normal" pages can be used by various callers throughout the kernel.
> >>>> They behave like normal pages for purposes of mapping in CPU page
> >>>> tables, and for COW. But they do not support LRU lists, NUMA
> >>>> migration or THP. Therefore we split vm_normal_page into two
> >>>> functions vm_normal_any_page and vm_normal_lru_page. The latter will
> >>>> only return pages that can be put on an LRU list and that support
> >>>> NUMA migration, KSM and THP.
> >>>>
> >>>> We also introduced a FOLL_LRU flag that adds the same behaviour to
> >>>> follow_page and related APIs, to allow callers to specify that they
> >>>> expect to put pages on an LRU list.
> >>>>
> >>> I still don't see the need for s/vm_normal_page/vm_normal_any_page/. And
> >>> as this patch is dominated by that change, I'd suggest (again) to just
> >>> drop it as I don't see any value of that renaming. No specifier implies any.
> >>
> >> OK. If nobody objects, we can adopts that naming convention.
> > 
> > I'd prefer we avoid the churn too, but I don't think we should make
> > vm_normal_page() the equivalent of vm_normal_any_page(). It would mean
> > vm_normal_page() would return non-LRU device coherent pages, but to me at least
> > device coherent pages seem special and not what I'd expect from a function with
> > "normal" in the name.
> > 
> > So I think it would be better to s/vm_normal_lru_page/vm_normal_page/ and keep
> > vm_normal_any_page() (or perhaps call it vm_any_page?). This is basically what
> > the previous incarnation of this feature did:
> > 
> > struct page *_vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
> >                             pte_t pte, bool with_public_device);
> > #define vm_normal_page(vma, addr, pte) _vm_normal_page(vma, addr, pte, false)
> > 
> > Except we should add:
> > 
> > #define vm_normal_any_page(vma, addr, pte) _vm_normal_page(vma, addr, pte, true)
> > 
> 
> "normal" simply tells us that this is not a special mapping -- IOW, we
> want the VM to take a look at the memmap and not treat it like a PFN
> map. What we're changing is that we're now also returning non-lru pages.
> Fair enough, that's why we introduce vm_normal_lru_page() as a
> replacement where we really can only deal with lru pages.
> 
> vm_normal_page vs vm_normal_lru_page is good enough. "lru" further
> limits what we get via vm_normal_page, that's even how it's implemented.

This naming makes sense to me.

Jason

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2022-03-17 13:25 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-03-10 17:26 [PATCH v1 0/3] split vm_normal_pages for LRU and non-LRU handling Alex Sierra
2022-03-10 17:26 ` Alex Sierra
2022-03-10 17:26 ` [PATCH v1 1/3] mm: " Alex Sierra
2022-03-10 17:26   ` Alex Sierra
2022-03-10 19:25   ` Matthew Wilcox
2022-03-10 19:25     ` Matthew Wilcox
2022-03-10 21:58     ` Felix Kuehling
2022-03-10 21:58       ` Felix Kuehling
2022-03-17  2:50       ` Alistair Popple
2022-03-17  2:50         ` Alistair Popple
2022-03-11  9:16   ` David Hildenbrand
2022-03-11  9:16     ` David Hildenbrand
2022-03-11 17:08     ` Felix Kuehling
2022-03-11 17:08       ` Felix Kuehling
2022-03-17  2:54       ` Alistair Popple
2022-03-17  2:54         ` Alistair Popple
2022-03-17  8:13         ` David Hildenbrand
2022-03-17  8:13           ` David Hildenbrand
2022-03-17 13:25           ` Jason Gunthorpe
2022-03-17 13:25             ` Jason Gunthorpe
2022-03-10 17:26 ` [PATCH v1 2/3] tools: add more gup configs to hmm_gup selftests Alex Sierra
2022-03-10 17:26   ` Alex Sierra
2022-03-10 17:26 ` [PATCH v1 3/3] tools: add selftests to hmm for COW in device memory Alex Sierra
2022-03-10 17:26   ` Alex Sierra

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.