* [mm-unstable v3 PATCH 0/7] Cleanup transhuge_xxx helpers
@ 2022-06-06 21:44 Yang Shi
  2022-06-06 21:44 ` [v3 PATCH 1/7] mm: khugepaged: check THP flag in hugepage_vma_check() Yang Shi
                   ` (8 more replies)
  0 siblings, 9 replies; 40+ messages in thread
From: Yang Shi @ 2022-06-06 21:44 UTC (permalink / raw)
  To: vbabka, kirill.shutemov, willy, akpm; +Cc: shy828301, linux-mm, linux-kernel


v3: * Fixed the comment from Willy
v2: * Rebased to the latest mm-unstable
    * Fixed potential regression for smaps's THPeligible

This series is the follow-up of the discussion about cleaning up transhuge_xxx
helpers at https://lore.kernel.org/linux-mm/627a71f8-e879-69a5-ceb3-fc8d29d2f7f1@suse.cz/.

THP has a bunch of helpers that do VMA sanity checks for different paths.
They perform similar checks for most callsites and carry a lot of
duplicated code, and it is confusing which helper should be used under
which conditions.

This series reorganized and cleaned up the code so that we could consolidate
all the checks into hugepage_vma_check().

The transhuge_vma_enabled(), transparent_hugepage_active() and
__transparent_hugepage_enabled() are killed by this series.

Added the transhuge_vma_size_ok() helper to remove some duplicated code.
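
For reference, after the whole series the main call sites end up looking
roughly like the sketch below.  This is just assembled from the diffs in
patches 5-7 to illustrate the intended end state; the exact code is in
the patches:

    /* one consolidated helper (patch 6 adds the smaps and in_pf flags) */
    bool hugepage_vma_check(struct vm_area_struct *vma,
                            unsigned long vm_flags,
                            bool smaps, bool in_pf);

    /* fs/proc/task_mmu.c: smaps THPeligible */
    seq_printf(m, "THPeligible:    %d\n",
               hugepage_vma_check(vma, vma->vm_flags, true, false));

    /* include/linux/khugepaged.h and mm/khugepaged.c: enter/scan */
    if (hugepage_vma_check(vma, vm_flags, false, false))
            __khugepaged_enter(vma->vm_mm);

    /* mm/memory.c: huge PUD/PMD page fault */
    if (pmd_none(*vmf.pmd) &&
        hugepage_vma_check(vma, vm_flags, false, true))
            ret = create_huge_pmd(&vmf);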


Yang Shi (7):
      mm: khugepaged: check THP flag in hugepage_vma_check()
      mm: thp: introduce transhuge_vma_size_ok() helper
      mm: khugepaged: remove the redundant anon vma check
      mm: khugepaged: use transhuge_vma_suitable replace open-code
      mm: thp: kill transparent_hugepage_active()
      mm: thp: kill __transhuge_page_enabled()
      mm: khugepaged: reorg some khugepaged helpers

 fs/proc/task_mmu.c         |  2 +-
 include/linux/huge_mm.h    | 84 ++++++++++++++++++++++++++++------------------------------------------
 include/linux/khugepaged.h | 21 ++----------------
 mm/huge_memory.c           | 64 +++++++++++++++++++++++++++++++++++++++++++++--------
 mm/khugepaged.c            | 78 +++++++++++++++--------------------------------------------------
 mm/memory.c                |  7 ++++--
 6 files changed, 114 insertions(+), 142 deletions(-)




* [v3 PATCH 1/7] mm: khugepaged: check THP flag in hugepage_vma_check()
  2022-06-06 21:44 [mm-unstable v3 PATCH 0/7] Cleanup transhuge_xxx helpers Yang Shi
@ 2022-06-06 21:44 ` Yang Shi
  2022-06-09 17:49   ` Zach O'Keefe
  2022-06-10  7:09   ` Miaohe Lin
  2022-06-06 21:44 ` [v3 PATCH 2/7] mm: thp: introduce transhuge_vma_size_ok() helper Yang Shi
                   ` (7 subsequent siblings)
  8 siblings, 2 replies; 40+ messages in thread
From: Yang Shi @ 2022-06-06 21:44 UTC (permalink / raw)
  To: vbabka, kirill.shutemov, willy, akpm; +Cc: shy828301, linux-mm, linux-kernel

Currently the THP flag check in hugepage_vma_check() will fall through if
the flag is NEVER and VM_HUGEPAGE is set.  This is not a problem for now
since all the callers have the flag checked beforehand or can't be invoked
if the flag is NEVER.

However, a following patch will call hugepage_vma_check() in more places,
for example, the page fault path, so this flag must be checked in
hugepage_vma_check() itself.

Signed-off-by: Yang Shi <shy828301@gmail.com>
---
 mm/khugepaged.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 671ac7800e53..84b9cf4b9be9 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -458,6 +458,9 @@ bool hugepage_vma_check(struct vm_area_struct *vma,
 	if (shmem_file(vma->vm_file))
 		return shmem_huge_enabled(vma);
 
+	if (!khugepaged_enabled())
+		return false;
+
 	/* THP settings require madvise. */
 	if (!(vm_flags & VM_HUGEPAGE) && !khugepaged_always())
 		return false;
-- 
2.26.3



* [v3 PATCH 2/7] mm: thp: introduce transhuge_vma_size_ok() helper
  2022-06-06 21:44 [mm-unstable v3 PATCH 0/7] Cleanup transhuge_xxx helpers Yang Shi
  2022-06-06 21:44 ` [v3 PATCH 1/7] mm: khugepaged: check THP flag in hugepage_vma_check() Yang Shi
@ 2022-06-06 21:44 ` Yang Shi
  2022-06-09 22:21   ` Zach O'Keefe
  2022-06-10  7:20   ` Miaohe Lin
  2022-06-06 21:44 ` [v3 PATCH 3/7] mm: khugepaged: remove the redundant anon vma check Yang Shi
                   ` (6 subsequent siblings)
  8 siblings, 2 replies; 40+ messages in thread
From: Yang Shi @ 2022-06-06 21:44 UTC (permalink / raw)
  To: vbabka, kirill.shutemov, willy, akpm; +Cc: shy828301, linux-mm, linux-kernel

There are a couple of places that check whether the vma size is large
enough for THP; they are open coded and duplicated.  Introduce the
transhuge_vma_size_ok() helper to do the job.

Signed-off-by: Yang Shi <shy828301@gmail.com>
---
 include/linux/huge_mm.h | 17 +++++++++++++++++
 mm/huge_memory.c        |  5 +----
 mm/khugepaged.c         | 12 ++++++------
 3 files changed, 24 insertions(+), 10 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 648cb3ce7099..a8f61db47f2a 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -116,6 +116,18 @@ extern struct kobj_attribute shmem_enabled_attr;
 
 extern unsigned long transparent_hugepage_flags;
 
+/*
+ * The vma size has to be large enough to hold an aligned HPAGE_PMD_SIZE area.
+ */
+static inline bool transhuge_vma_size_ok(struct vm_area_struct *vma)
+{
+	if (round_up(vma->vm_start, HPAGE_PMD_SIZE) <
+	    (vma->vm_end & HPAGE_PMD_MASK))
+		return true;
+
+	return false;
+}
+
 static inline bool transhuge_vma_suitable(struct vm_area_struct *vma,
 		unsigned long addr)
 {
@@ -345,6 +357,11 @@ static inline bool transparent_hugepage_active(struct vm_area_struct *vma)
 	return false;
 }
 
+static inline bool transhuge_vma_size_ok(struct vm_area_struct *vma)
+{
+	return false;
+}
+
 static inline bool transhuge_vma_suitable(struct vm_area_struct *vma,
 		unsigned long addr)
 {
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 48182c8fe151..36ada544e494 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -71,10 +71,7 @@ unsigned long huge_zero_pfn __read_mostly = ~0UL;
 
 bool transparent_hugepage_active(struct vm_area_struct *vma)
 {
-	/* The addr is used to check if the vma size fits */
-	unsigned long addr = (vma->vm_end & HPAGE_PMD_MASK) - HPAGE_PMD_SIZE;
-
-	if (!transhuge_vma_suitable(vma, addr))
+	if (!transhuge_vma_size_ok(vma))
 		return false;
 	if (vma_is_anonymous(vma))
 		return __transparent_hugepage_enabled(vma);
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 84b9cf4b9be9..d0f8020164fc 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -454,6 +454,9 @@ bool hugepage_vma_check(struct vm_area_struct *vma,
 				vma->vm_pgoff, HPAGE_PMD_NR))
 		return false;
 
+	if (!transhuge_vma_size_ok(vma))
+		return false;
+
 	/* Enabled via shmem mount options or sysfs settings. */
 	if (shmem_file(vma->vm_file))
 		return shmem_huge_enabled(vma);
@@ -512,9 +515,7 @@ void khugepaged_enter_vma(struct vm_area_struct *vma,
 			  unsigned long vm_flags)
 {
 	if (!test_bit(MMF_VM_HUGEPAGE, &vma->vm_mm->flags) &&
-	    khugepaged_enabled() &&
-	    (((vma->vm_start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK) <
-	     (vma->vm_end & HPAGE_PMD_MASK))) {
+	    khugepaged_enabled()) {
 		if (hugepage_vma_check(vma, vm_flags))
 			__khugepaged_enter(vma->vm_mm);
 	}
@@ -2142,10 +2143,9 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages,
 			progress++;
 			continue;
 		}
-		hstart = (vma->vm_start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK;
+
+		hstart = round_up(vma->vm_start, HPAGE_PMD_SIZE);
 		hend = vma->vm_end & HPAGE_PMD_MASK;
-		if (hstart >= hend)
-			goto skip;
 		if (khugepaged_scan.address > hend)
 			goto skip;
 		if (khugepaged_scan.address < hstart)
-- 
2.26.3



* [v3 PATCH 3/7] mm: khugepaged: remove the redundant anon vma check
  2022-06-06 21:44 [mm-unstable v3 PATCH 0/7] Cleanup transhuge_xxx helpers Yang Shi
  2022-06-06 21:44 ` [v3 PATCH 1/7] mm: khugepaged: check THP flag in hugepage_vma_check() Yang Shi
  2022-06-06 21:44 ` [v3 PATCH 2/7] mm: thp: introduce transhuge_vma_size_ok() helper Yang Shi
@ 2022-06-06 21:44 ` Yang Shi
  2022-06-09 23:23   ` Zach O'Keefe
  2022-06-10  7:23   ` Miaohe Lin
  2022-06-06 21:44 ` [v3 PATCH 4/7] mm: khugepaged: use transhuge_vma_suitable replace open-code Yang Shi
                   ` (5 subsequent siblings)
  8 siblings, 2 replies; 40+ messages in thread
From: Yang Shi @ 2022-06-06 21:44 UTC (permalink / raw)
  To: vbabka, kirill.shutemov, willy, akpm; +Cc: shy828301, linux-mm, linux-kernel

hugepage_vma_check() already does the anon vma check, so remove the
redundant check from hugepage_vma_revalidate().

Signed-off-by: Yang Shi <shy828301@gmail.com>
---
 mm/khugepaged.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index d0f8020164fc..7a5d1c1a1833 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -966,9 +966,6 @@ static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address,
 		return SCAN_ADDRESS_RANGE;
 	if (!hugepage_vma_check(vma, vma->vm_flags))
 		return SCAN_VMA_CHECK;
-	/* Anon VMA expected */
-	if (!vma->anon_vma || !vma_is_anonymous(vma))
-		return SCAN_VMA_CHECK;
 	return 0;
 }
 
-- 
2.26.3



* [v3 PATCH 4/7] mm: khugepaged: use transhuge_vma_suitable replace open-code
  2022-06-06 21:44 [mm-unstable v3 PATCH 0/7] Cleanup transhuge_xxx helpers Yang Shi
                   ` (2 preceding siblings ...)
  2022-06-06 21:44 ` [v3 PATCH 3/7] mm: khugepaged: remove the redundant anon vma check Yang Shi
@ 2022-06-06 21:44 ` Yang Shi
  2022-06-10  1:51   ` Zach O'Keefe
  2022-06-06 21:44 ` [v3 PATCH 5/7] mm: thp: kill transparent_hugepage_active() Yang Shi
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 40+ messages in thread
From: Yang Shi @ 2022-06-06 21:44 UTC (permalink / raw)
  To: vbabka, kirill.shutemov, willy, akpm; +Cc: shy828301, linux-mm, linux-kernel

hugepage_vma_revalidate() needs to check whether the address is still in
an aligned HPAGE_PMD_SIZE area of the vma after reacquiring mmap_lock,
but the check was open coded.  Use transhuge_vma_suitable() to do the
job, and add proper comments for transhuge_vma_suitable().

Signed-off-by: Yang Shi <shy828301@gmail.com>
---
 include/linux/huge_mm.h | 6 ++++++
 mm/khugepaged.c         | 5 +----
 2 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index a8f61db47f2a..79d5919beb83 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -128,6 +128,12 @@ static inline bool transhuge_vma_size_ok(struct vm_area_struct *vma)
 	return false;
 }
 
+/*
+ * Do the below checks:
+ *   - For non-anon vma, check if the vm_pgoff is HPAGE_PMD_NR aligned.
+ *   - For all vmas, check if the haddr is in an aligned HPAGE_PMD_SIZE
+ *     area.
+ */
 static inline bool transhuge_vma_suitable(struct vm_area_struct *vma,
 		unsigned long addr)
 {
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 7a5d1c1a1833..ca1754d3a827 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -951,7 +951,6 @@ static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address,
 		struct vm_area_struct **vmap)
 {
 	struct vm_area_struct *vma;
-	unsigned long hstart, hend;
 
 	if (unlikely(khugepaged_test_exit(mm)))
 		return SCAN_ANY_PROCESS;
@@ -960,9 +959,7 @@ static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address,
 	if (!vma)
 		return SCAN_VMA_NULL;
 
-	hstart = (vma->vm_start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK;
-	hend = vma->vm_end & HPAGE_PMD_MASK;
-	if (address < hstart || address + HPAGE_PMD_SIZE > hend)
+	if (!transhuge_vma_suitable(vma, address))
 		return SCAN_ADDRESS_RANGE;
 	if (!hugepage_vma_check(vma, vma->vm_flags))
 		return SCAN_VMA_CHECK;
-- 
2.26.3



* [v3 PATCH 5/7] mm: thp: kill transparent_hugepage_active()
  2022-06-06 21:44 [mm-unstable v3 PATCH 0/7] Cleanup transhuge_xxx helpers Yang Shi
                   ` (3 preceding siblings ...)
  2022-06-06 21:44 ` [v3 PATCH 4/7] mm: khugepaged: use transhuge_vma_suitable replace open-code Yang Shi
@ 2022-06-06 21:44 ` Yang Shi
  2022-06-10  1:02   ` Zach O'Keefe
  2022-06-06 21:44 ` [v3 PATCH 6/7] mm: thp: kill __transhuge_page_enabled() Yang Shi
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 40+ messages in thread
From: Yang Shi @ 2022-06-06 21:44 UTC (permalink / raw)
  To: vbabka, kirill.shutemov, willy, akpm; +Cc: shy828301, linux-mm, linux-kernel

transparent_hugepage_active() was introduced to show the THP eligibility
bit in smaps in proc, and smaps is its only user.  But it actually does a
similar check to hugepage_vma_check(), which is used by khugepaged.  We
definitely don't have to maintain two similar checks, so kill
transparent_hugepage_active().

Also move hugepage_vma_check() to huge_memory.c and huge_mm.h since it
is not only for khugepaged anymore.

Signed-off-by: Yang Shi <shy828301@gmail.com>
---
 fs/proc/task_mmu.c         |  2 +-
 include/linux/huge_mm.h    | 16 +++++++-----
 include/linux/khugepaged.h |  4 +--
 mm/huge_memory.c           | 50 ++++++++++++++++++++++++++++++++-----
 mm/khugepaged.c            | 51 +++-----------------------------------
 5 files changed, 60 insertions(+), 63 deletions(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 2dd8c8a66924..fd79566e204c 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -860,7 +860,7 @@ static int show_smap(struct seq_file *m, void *v)
 	__show_smap(m, &mss, false);
 
 	seq_printf(m, "THPeligible:    %d\n",
-		   transparent_hugepage_active(vma));
+		   hugepage_vma_check(vma, vma->vm_flags, true));
 
 	if (arch_pkeys_enabled())
 		seq_printf(m, "ProtectionKey:  %8u\n", vma_pkey(vma));
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 79d5919beb83..f561c3e16def 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -209,7 +209,9 @@ static inline bool file_thp_enabled(struct vm_area_struct *vma)
 	       !inode_is_open_for_write(inode) && S_ISREG(inode->i_mode);
 }
 
-bool transparent_hugepage_active(struct vm_area_struct *vma);
+bool hugepage_vma_check(struct vm_area_struct *vma,
+			unsigned long vm_flags,
+			bool smaps);
 
 #define transparent_hugepage_use_zero_page()				\
 	(transparent_hugepage_flags &					\
@@ -358,11 +360,6 @@ static inline bool __transparent_hugepage_enabled(struct vm_area_struct *vma)
 	return false;
 }
 
-static inline bool transparent_hugepage_active(struct vm_area_struct *vma)
-{
-	return false;
-}
-
 static inline bool transhuge_vma_size_ok(struct vm_area_struct *vma)
 {
 	return false;
@@ -380,6 +377,13 @@ static inline bool transhuge_vma_enabled(struct vm_area_struct *vma,
 	return false;
 }
 
+static inline bool hugepage_vma_check(struct vm_area_struct *vma,
+				       unsigned long vm_flags,
+				       bool smaps)
+{
+	return false;
+}
+
 static inline void prep_transhuge_page(struct page *page) {}
 
 #define transparent_hugepage_flags 0UL
diff --git a/include/linux/khugepaged.h b/include/linux/khugepaged.h
index 392d34c3c59a..8a6452e089ca 100644
--- a/include/linux/khugepaged.h
+++ b/include/linux/khugepaged.h
@@ -10,8 +10,6 @@ extern struct attribute_group khugepaged_attr_group;
 extern int khugepaged_init(void);
 extern void khugepaged_destroy(void);
 extern int start_stop_khugepaged(void);
-extern bool hugepage_vma_check(struct vm_area_struct *vma,
-			       unsigned long vm_flags);
 extern void __khugepaged_enter(struct mm_struct *mm);
 extern void __khugepaged_exit(struct mm_struct *mm);
 extern void khugepaged_enter_vma(struct vm_area_struct *vma,
@@ -57,7 +55,7 @@ static inline void khugepaged_enter(struct vm_area_struct *vma,
 {
 	if (!test_bit(MMF_VM_HUGEPAGE, &vma->vm_mm->flags) &&
 	    khugepaged_enabled()) {
-		if (hugepage_vma_check(vma, vm_flags))
+		if (hugepage_vma_check(vma, vm_flags, false))
 			__khugepaged_enter(vma->vm_mm);
 	}
 }
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 36ada544e494..bc8370856e85 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -69,18 +69,56 @@ static atomic_t huge_zero_refcount;
 struct page *huge_zero_page __read_mostly;
 unsigned long huge_zero_pfn __read_mostly = ~0UL;
 
-bool transparent_hugepage_active(struct vm_area_struct *vma)
+bool hugepage_vma_check(struct vm_area_struct *vma,
+			unsigned long vm_flags,
+			bool smaps)
 {
+	if (!transhuge_vma_enabled(vma, vm_flags))
+		return false;
+
+	if (vm_flags & VM_NO_KHUGEPAGED)
+		return false;
+
+	/* Don't run khugepaged against DAX vma */
+	if (vma_is_dax(vma))
+		return false;
+
+	if (vma->vm_file && !IS_ALIGNED((vma->vm_start >> PAGE_SHIFT) -
+				vma->vm_pgoff, HPAGE_PMD_NR))
+		return false;
+
 	if (!transhuge_vma_size_ok(vma))
 		return false;
-	if (vma_is_anonymous(vma))
-		return __transparent_hugepage_enabled(vma);
-	if (vma_is_shmem(vma))
+
+	/* Enabled via shmem mount options or sysfs settings. */
+	if (shmem_file(vma->vm_file))
 		return shmem_huge_enabled(vma);
-	if (transhuge_vma_enabled(vma, vma->vm_flags) && file_thp_enabled(vma))
+
+	if (!khugepaged_enabled())
+		return false;
+
+	/* THP settings require madvise. */
+	if (!(vm_flags & VM_HUGEPAGE) && !khugepaged_always())
+		return false;
+
+	/* Only regular file is valid */
+	if (file_thp_enabled(vma))
 		return true;
 
-	return false;
+	if (!vma_is_anonymous(vma))
+		return false;
+
+	if (vma_is_temporary_stack(vma))
+		return false;
+
+	/*
+	 * THPeligible bit of smaps should show 1 for proper VMAs even
+	 * though anon_vma is not initialized yet.
+	 */
+	if (!vma->anon_vma)
+		return smaps;
+
+	return true;
 }
 
 static bool get_huge_zero_page(void)
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index ca1754d3a827..aa0769e3b0d9 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -437,49 +437,6 @@ static inline int khugepaged_test_exit(struct mm_struct *mm)
 	return atomic_read(&mm->mm_users) == 0;
 }
 
-bool hugepage_vma_check(struct vm_area_struct *vma,
-			unsigned long vm_flags)
-{
-	if (!transhuge_vma_enabled(vma, vm_flags))
-		return false;
-
-	if (vm_flags & VM_NO_KHUGEPAGED)
-		return false;
-
-	/* Don't run khugepaged against DAX vma */
-	if (vma_is_dax(vma))
-		return false;
-
-	if (vma->vm_file && !IS_ALIGNED((vma->vm_start >> PAGE_SHIFT) -
-				vma->vm_pgoff, HPAGE_PMD_NR))
-		return false;
-
-	if (!transhuge_vma_size_ok(vma))
-		return false;
-
-	/* Enabled via shmem mount options or sysfs settings. */
-	if (shmem_file(vma->vm_file))
-		return shmem_huge_enabled(vma);
-
-	if (!khugepaged_enabled())
-		return false;
-
-	/* THP settings require madvise. */
-	if (!(vm_flags & VM_HUGEPAGE) && !khugepaged_always())
-		return false;
-
-	/* Only regular file is valid */
-	if (file_thp_enabled(vma))
-		return true;
-
-	if (!vma->anon_vma || !vma_is_anonymous(vma))
-		return false;
-	if (vma_is_temporary_stack(vma))
-		return false;
-
-	return true;
-}
-
 void __khugepaged_enter(struct mm_struct *mm)
 {
 	struct mm_slot *mm_slot;
@@ -516,7 +473,7 @@ void khugepaged_enter_vma(struct vm_area_struct *vma,
 {
 	if (!test_bit(MMF_VM_HUGEPAGE, &vma->vm_mm->flags) &&
 	    khugepaged_enabled()) {
-		if (hugepage_vma_check(vma, vm_flags))
+		if (hugepage_vma_check(vma, vm_flags, false))
 			__khugepaged_enter(vma->vm_mm);
 	}
 }
@@ -961,7 +918,7 @@ static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address,
 
 	if (!transhuge_vma_suitable(vma, address))
 		return SCAN_ADDRESS_RANGE;
-	if (!hugepage_vma_check(vma, vma->vm_flags))
+	if (!hugepage_vma_check(vma, vma->vm_flags, false))
 		return SCAN_VMA_CHECK;
 	return 0;
 }
@@ -1442,7 +1399,7 @@ void collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr)
 	 * the valid THP. Add extra VM_HUGEPAGE so hugepage_vma_check()
 	 * will not fail the vma for missing VM_HUGEPAGE
 	 */
-	if (!hugepage_vma_check(vma, vma->vm_flags | VM_HUGEPAGE))
+	if (!hugepage_vma_check(vma, vma->vm_flags | VM_HUGEPAGE, false))
 		return;
 
 	/* Keep pmd pgtable for uffd-wp; see comment in retract_page_tables() */
@@ -2132,7 +2089,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages,
 			progress++;
 			break;
 		}
-		if (!hugepage_vma_check(vma, vma->vm_flags)) {
+		if (!hugepage_vma_check(vma, vma->vm_flags, false)) {
 skip:
 			progress++;
 			continue;
-- 
2.26.3



* [v3 PATCH 6/7] mm: thp: kill __transhuge_page_enabled()
  2022-06-06 21:44 [mm-unstable v3 PATCH 0/7] Cleanup transhuge_xxx helpers Yang Shi
                   ` (4 preceding siblings ...)
  2022-06-06 21:44 ` [v3 PATCH 5/7] mm: thp: kill transparent_hugepage_active() Yang Shi
@ 2022-06-06 21:44 ` Yang Shi
  2022-06-10  2:22   ` Zach O'Keefe
  2022-06-06 21:44 ` [v3 PATCH 7/7] mm: khugepaged: reorg some khugepaged helpers Yang Shi
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 40+ messages in thread
From: Yang Shi @ 2022-06-06 21:44 UTC (permalink / raw)
  To: vbabka, kirill.shutemov, willy, akpm; +Cc: shy828301, linux-mm, linux-kernel

The page fault path checks THP eligibility with
__transparent_hugepage_enabled(), which does much the same thing as
hugepage_vma_check(), so use hugepage_vma_check() instead.

However, the page fault path allows DAX and !anon_vma cases, so add a new
flag, in_pf, to hugepage_vma_check() to make the page fault path work
correctly.

The in_pf flag is also used to skip shmem and file THP for page fault
since shmem handles THP in its own shmem_fault() and file THP allocation
on fault is not supported yet.

Also remove transhuge_vma_enabled() since hugepage_vma_check() is its only
caller now; it is not necessary to keep a separate helper function.

Signed-off-by: Yang Shi <shy828301@gmail.com>
---
 fs/proc/task_mmu.c         |  2 +-
 include/linux/huge_mm.h    | 57 ++------------------------------------
 include/linux/khugepaged.h |  2 +-
 mm/huge_memory.c           | 25 ++++++++++++-----
 mm/khugepaged.c            |  8 +++---
 mm/memory.c                |  7 +++--
 6 files changed, 31 insertions(+), 70 deletions(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index fd79566e204c..a0850303baec 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -860,7 +860,7 @@ static int show_smap(struct seq_file *m, void *v)
 	__show_smap(m, &mss, false);
 
 	seq_printf(m, "THPeligible:    %d\n",
-		   hugepage_vma_check(vma, vma->vm_flags, true));
+		   hugepage_vma_check(vma, vma->vm_flags, true, false));
 
 	if (arch_pkeys_enabled())
 		seq_printf(m, "ProtectionKey:  %8u\n", vma_pkey(vma));
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index f561c3e16def..d478e8875023 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -153,48 +153,6 @@ static inline bool transhuge_vma_suitable(struct vm_area_struct *vma,
 	return true;
 }
 
-static inline bool transhuge_vma_enabled(struct vm_area_struct *vma,
-					  unsigned long vm_flags)
-{
-	/* Explicitly disabled through madvise. */
-	if ((vm_flags & VM_NOHUGEPAGE) ||
-	    test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags))
-		return false;
-	return true;
-}
-
-/*
- * to be used on vmas which are known to support THP.
- * Use transparent_hugepage_active otherwise
- */
-static inline bool __transparent_hugepage_enabled(struct vm_area_struct *vma)
-{
-
-	/*
-	 * If the hardware/firmware marked hugepage support disabled.
-	 */
-	if (transparent_hugepage_flags & (1 << TRANSPARENT_HUGEPAGE_NEVER_DAX))
-		return false;
-
-	if (!transhuge_vma_enabled(vma, vma->vm_flags))
-		return false;
-
-	if (vma_is_temporary_stack(vma))
-		return false;
-
-	if (transparent_hugepage_flags & (1 << TRANSPARENT_HUGEPAGE_FLAG))
-		return true;
-
-	if (vma_is_dax(vma))
-		return true;
-
-	if (transparent_hugepage_flags &
-				(1 << TRANSPARENT_HUGEPAGE_REQ_MADV_FLAG))
-		return !!(vma->vm_flags & VM_HUGEPAGE);
-
-	return false;
-}
-
 static inline bool file_thp_enabled(struct vm_area_struct *vma)
 {
 	struct inode *inode;
@@ -211,7 +169,7 @@ static inline bool file_thp_enabled(struct vm_area_struct *vma)
 
 bool hugepage_vma_check(struct vm_area_struct *vma,
 			unsigned long vm_flags,
-			bool smaps);
+			bool smaps, bool in_pf);
 
 #define transparent_hugepage_use_zero_page()				\
 	(transparent_hugepage_flags &					\
@@ -355,11 +313,6 @@ static inline bool folio_test_pmd_mappable(struct folio *folio)
 	return false;
 }
 
-static inline bool __transparent_hugepage_enabled(struct vm_area_struct *vma)
-{
-	return false;
-}
-
 static inline bool transhuge_vma_size_ok(struct vm_area_struct *vma)
 {
 	return false;
@@ -371,15 +324,9 @@ static inline bool transhuge_vma_suitable(struct vm_area_struct *vma,
 	return false;
 }
 
-static inline bool transhuge_vma_enabled(struct vm_area_struct *vma,
-					  unsigned long vm_flags)
-{
-	return false;
-}
-
 static inline bool hugepage_vma_check(struct vm_area_struct *vma,
 				       unsigned long vm_flags,
-				       bool smaps)
+				       bool smaps, bool in_pf)
 {
 	return false;
 }
diff --git a/include/linux/khugepaged.h b/include/linux/khugepaged.h
index 8a6452e089ca..e047be601268 100644
--- a/include/linux/khugepaged.h
+++ b/include/linux/khugepaged.h
@@ -55,7 +55,7 @@ static inline void khugepaged_enter(struct vm_area_struct *vma,
 {
 	if (!test_bit(MMF_VM_HUGEPAGE, &vma->vm_mm->flags) &&
 	    khugepaged_enabled()) {
-		if (hugepage_vma_check(vma, vm_flags, false))
+		if (hugepage_vma_check(vma, vm_flags, false, false))
 			__khugepaged_enter(vma->vm_mm);
 	}
 }
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index bc8370856e85..b95786ada466 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -71,17 +71,25 @@ unsigned long huge_zero_pfn __read_mostly = ~0UL;
 
 bool hugepage_vma_check(struct vm_area_struct *vma,
 			unsigned long vm_flags,
-			bool smaps)
+			bool smaps, bool in_pf)
 {
-	if (!transhuge_vma_enabled(vma, vm_flags))
+	/* Explicitly disabled through madvise or prctl. */
+	if ((vm_flags & VM_NOHUGEPAGE) ||
+	    test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags))
+		return false;
+	/*
+	 * If the hardware/firmware marked hugepage support disabled.
+	 */
+	if (transparent_hugepage_flags & (1 << TRANSPARENT_HUGEPAGE_NEVER_DAX))
 		return false;
 
+	/* Special VMA and hugetlb VMA */
 	if (vm_flags & VM_NO_KHUGEPAGED)
 		return false;
 
-	/* Don't run khugepaged against DAX vma */
+	/* khugepaged doesn't collapse DAX vma, but page fault is fine. */
 	if (vma_is_dax(vma))
-		return false;
+		return in_pf;
 
 	if (vma->vm_file && !IS_ALIGNED((vma->vm_start >> PAGE_SHIFT) -
 				vma->vm_pgoff, HPAGE_PMD_NR))
@@ -91,7 +99,7 @@ bool hugepage_vma_check(struct vm_area_struct *vma,
 		return false;
 
 	/* Enabled via shmem mount options or sysfs settings. */
-	if (shmem_file(vma->vm_file))
+	if (!in_pf && shmem_file(vma->vm_file))
 		return shmem_huge_enabled(vma);
 
 	if (!khugepaged_enabled())
@@ -102,7 +110,7 @@ bool hugepage_vma_check(struct vm_area_struct *vma,
 		return false;
 
 	/* Only regular file is valid */
-	if (file_thp_enabled(vma))
+	if (!in_pf && file_thp_enabled(vma))
 		return true;
 
 	if (!vma_is_anonymous(vma))
@@ -114,9 +122,12 @@ bool hugepage_vma_check(struct vm_area_struct *vma,
 	/*
 	 * THPeligible bit of smaps should show 1 for proper VMAs even
 	 * though anon_vma is not initialized yet.
+	 *
+	 * Allow page fault since anon_vma may be not initialized until
+	 * the first page fault.
 	 */
 	if (!vma->anon_vma)
-		return smaps;
+		return (smaps || in_pf);
 
 	return true;
 }
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index aa0769e3b0d9..ab6183c5489f 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -473,7 +473,7 @@ void khugepaged_enter_vma(struct vm_area_struct *vma,
 {
 	if (!test_bit(MMF_VM_HUGEPAGE, &vma->vm_mm->flags) &&
 	    khugepaged_enabled()) {
-		if (hugepage_vma_check(vma, vm_flags, false))
+		if (hugepage_vma_check(vma, vm_flags, false, false))
 			__khugepaged_enter(vma->vm_mm);
 	}
 }
@@ -918,7 +918,7 @@ static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address,
 
 	if (!transhuge_vma_suitable(vma, address))
 		return SCAN_ADDRESS_RANGE;
-	if (!hugepage_vma_check(vma, vma->vm_flags, false))
+	if (!hugepage_vma_check(vma, vma->vm_flags, false, false))
 		return SCAN_VMA_CHECK;
 	return 0;
 }
@@ -1399,7 +1399,7 @@ void collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr)
 	 * the valid THP. Add extra VM_HUGEPAGE so hugepage_vma_check()
 	 * will not fail the vma for missing VM_HUGEPAGE
 	 */
-	if (!hugepage_vma_check(vma, vma->vm_flags | VM_HUGEPAGE, false))
+	if (!hugepage_vma_check(vma, vma->vm_flags | VM_HUGEPAGE, false, false))
 		return;
 
 	/* Keep pmd pgtable for uffd-wp; see comment in retract_page_tables() */
@@ -2089,7 +2089,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages,
 			progress++;
 			break;
 		}
-		if (!hugepage_vma_check(vma, vma->vm_flags, false)) {
+		if (!hugepage_vma_check(vma, vma->vm_flags, false, false)) {
 skip:
 			progress++;
 			continue;
diff --git a/mm/memory.c b/mm/memory.c
index bc5d40eec5d5..673f7561a30a 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4962,6 +4962,7 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
 		.gfp_mask = __get_fault_gfp_mask(vma),
 	};
 	struct mm_struct *mm = vma->vm_mm;
+	unsigned long vm_flags = vma->vm_flags;
 	pgd_t *pgd;
 	p4d_t *p4d;
 	vm_fault_t ret;
@@ -4975,7 +4976,8 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
 	if (!vmf.pud)
 		return VM_FAULT_OOM;
 retry_pud:
-	if (pud_none(*vmf.pud) && __transparent_hugepage_enabled(vma)) {
+	if (pud_none(*vmf.pud) &&
+	    hugepage_vma_check(vma, vm_flags, false, true)) {
 		ret = create_huge_pud(&vmf);
 		if (!(ret & VM_FAULT_FALLBACK))
 			return ret;
@@ -5008,7 +5010,8 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
 	if (pud_trans_unstable(vmf.pud))
 		goto retry_pud;
 
-	if (pmd_none(*vmf.pmd) && __transparent_hugepage_enabled(vma)) {
+	if (pmd_none(*vmf.pmd) &&
+	    hugepage_vma_check(vma, vm_flags, false, true)) {
 		ret = create_huge_pmd(&vmf);
 		if (!(ret & VM_FAULT_FALLBACK))
 			return ret;
-- 
2.26.3



* [v3 PATCH 7/7] mm: khugepaged: reorg some khugepaged helpers
  2022-06-06 21:44 [mm-unstable v3 PATCH 0/7] Cleanup transhuge_xxx helpers Yang Shi
                   ` (5 preceding siblings ...)
  2022-06-06 21:44 ` [v3 PATCH 6/7] mm: thp: kill __transhuge_page_enabled() Yang Shi
@ 2022-06-06 21:44 ` Yang Shi
  2022-06-09 23:32 ` [mm-unstable v3 PATCH 0/7] Cleanup transhuge_xxx helpers Zach O'Keefe
  2022-06-10  7:08 ` Miaohe Lin
  8 siblings, 0 replies; 40+ messages in thread
From: Yang Shi @ 2022-06-06 21:44 UTC (permalink / raw)
  To: vbabka, kirill.shutemov, willy, akpm; +Cc: shy828301, linux-mm, linux-kernel

The khugepaged_{enabled|always|req_madv} macros are not khugepaged-only
anymore, so move them to huge_mm.h and rename them to hugepage_flags_xxx,
and remove khugepaged_req_madv since it has no users.

Also move khugepaged_defrag() to khugepaged.c since its only caller is in
that file; it doesn't have to live in a header file.

Signed-off-by: Yang Shi <shy828301@gmail.com>
---
 include/linux/huge_mm.h    |  8 ++++++++
 include/linux/khugepaged.h | 17 +----------------
 mm/huge_memory.c           |  4 ++--
 mm/khugepaged.c            | 18 +++++++++++-------
 4 files changed, 22 insertions(+), 25 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index d478e8875023..ce2d05ee4816 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -116,6 +116,14 @@ extern struct kobj_attribute shmem_enabled_attr;
 
 extern unsigned long transparent_hugepage_flags;
 
+#define hugepage_flags_enabled()					       \
+	(transparent_hugepage_flags &				       \
+	 ((1<<TRANSPARENT_HUGEPAGE_FLAG) |		       \
+	  (1<<TRANSPARENT_HUGEPAGE_REQ_MADV_FLAG)))
+#define hugepage_flags_always()				\
+	(transparent_hugepage_flags &			\
+	 (1<<TRANSPARENT_HUGEPAGE_FLAG))
+
 /*
  * The vma size has to be large enough to hold an aligned HPAGE_PMD_SIZE area.
  */
diff --git a/include/linux/khugepaged.h b/include/linux/khugepaged.h
index e047be601268..9c3b56132eba 100644
--- a/include/linux/khugepaged.h
+++ b/include/linux/khugepaged.h
@@ -24,20 +24,6 @@ static inline void collapse_pte_mapped_thp(struct mm_struct *mm,
 }
 #endif
 
-#define khugepaged_enabled()					       \
-	(transparent_hugepage_flags &				       \
-	 ((1<<TRANSPARENT_HUGEPAGE_FLAG) |		       \
-	  (1<<TRANSPARENT_HUGEPAGE_REQ_MADV_FLAG)))
-#define khugepaged_always()				\
-	(transparent_hugepage_flags &			\
-	 (1<<TRANSPARENT_HUGEPAGE_FLAG))
-#define khugepaged_req_madv()					\
-	(transparent_hugepage_flags &				\
-	 (1<<TRANSPARENT_HUGEPAGE_REQ_MADV_FLAG))
-#define khugepaged_defrag()					\
-	(transparent_hugepage_flags &				\
-	 (1<<TRANSPARENT_HUGEPAGE_DEFRAG_KHUGEPAGED_FLAG))
-
 static inline void khugepaged_fork(struct mm_struct *mm, struct mm_struct *oldmm)
 {
 	if (test_bit(MMF_VM_HUGEPAGE, &oldmm->flags))
@@ -53,8 +39,7 @@ static inline void khugepaged_exit(struct mm_struct *mm)
 static inline void khugepaged_enter(struct vm_area_struct *vma,
 				   unsigned long vm_flags)
 {
-	if (!test_bit(MMF_VM_HUGEPAGE, &vma->vm_mm->flags) &&
-	    khugepaged_enabled()) {
+	if (!test_bit(MMF_VM_HUGEPAGE, &vma->vm_mm->flags)) {
 		if (hugepage_vma_check(vma, vm_flags, false, false))
 			__khugepaged_enter(vma->vm_mm);
 	}
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index b95786ada466..866b98a39496 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -102,11 +102,11 @@ bool hugepage_vma_check(struct vm_area_struct *vma,
 	if (!in_pf && shmem_file(vma->vm_file))
 		return shmem_huge_enabled(vma);
 
-	if (!khugepaged_enabled())
+	if (!hugepage_flags_enabled())
 		return false;
 
 	/* THP settings require madvise. */
-	if (!(vm_flags & VM_HUGEPAGE) && !khugepaged_always())
+	if (!(vm_flags & VM_HUGEPAGE) && !hugepage_flags_always())
 		return false;
 
 	/* Only regular file is valid */
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index ab6183c5489f..2523c085625a 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -472,7 +472,7 @@ void khugepaged_enter_vma(struct vm_area_struct *vma,
 			  unsigned long vm_flags)
 {
 	if (!test_bit(MMF_VM_HUGEPAGE, &vma->vm_mm->flags) &&
-	    khugepaged_enabled()) {
+	    hugepage_flags_enabled()) {
 		if (hugepage_vma_check(vma, vm_flags, false, false))
 			__khugepaged_enter(vma->vm_mm);
 	}
@@ -763,6 +763,10 @@ static bool khugepaged_scan_abort(int nid)
 	return false;
 }
 
+#define khugepaged_defrag()					\
+	(transparent_hugepage_flags &				\
+	 (1<<TRANSPARENT_HUGEPAGE_DEFRAG_KHUGEPAGED_FLAG))
+
 /* Defrag for khugepaged will enter direct reclaim/compaction if necessary */
 static inline gfp_t alloc_hugepage_khugepaged_gfpmask(void)
 {
@@ -860,7 +864,7 @@ static struct page *khugepaged_alloc_hugepage(bool *wait)
 			khugepaged_alloc_sleep();
 		} else
 			count_vm_event(THP_COLLAPSE_ALLOC);
-	} while (unlikely(!hpage) && likely(khugepaged_enabled()));
+	} while (unlikely(!hpage) && likely(hugepage_flags_enabled()));
 
 	return hpage;
 }
@@ -2173,7 +2177,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages,
 static int khugepaged_has_work(void)
 {
 	return !list_empty(&khugepaged_scan.mm_head) &&
-		khugepaged_enabled();
+		hugepage_flags_enabled();
 }
 
 static int khugepaged_wait_event(void)
@@ -2238,7 +2242,7 @@ static void khugepaged_wait_work(void)
 		return;
 	}
 
-	if (khugepaged_enabled())
+	if (hugepage_flags_enabled())
 		wait_event_freezable(khugepaged_wait, khugepaged_wait_event());
 }
 
@@ -2269,7 +2273,7 @@ static void set_recommended_min_free_kbytes(void)
 	int nr_zones = 0;
 	unsigned long recommended_min;
 
-	if (!khugepaged_enabled()) {
+	if (!hugepage_flags_enabled()) {
 		calculate_min_free_kbytes();
 		goto update_wmarks;
 	}
@@ -2319,7 +2323,7 @@ int start_stop_khugepaged(void)
 	int err = 0;
 
 	mutex_lock(&khugepaged_mutex);
-	if (khugepaged_enabled()) {
+	if (hugepage_flags_enabled()) {
 		if (!khugepaged_thread)
 			khugepaged_thread = kthread_run(khugepaged, NULL,
 							"khugepaged");
@@ -2345,7 +2349,7 @@ int start_stop_khugepaged(void)
 void khugepaged_min_free_kbytes_update(void)
 {
 	mutex_lock(&khugepaged_mutex);
-	if (khugepaged_enabled() && khugepaged_thread)
+	if (hugepage_flags_enabled() && khugepaged_thread)
 		set_recommended_min_free_kbytes();
 	mutex_unlock(&khugepaged_mutex);
 }
-- 
2.26.3



* Re: [v3 PATCH 1/7] mm: khugepaged: check THP flag in hugepage_vma_check()
  2022-06-06 21:44 ` [v3 PATCH 1/7] mm: khugepaged: check THP flag in hugepage_vma_check() Yang Shi
@ 2022-06-09 17:49   ` Zach O'Keefe
  2022-06-10  7:09   ` Miaohe Lin
  1 sibling, 0 replies; 40+ messages in thread
From: Zach O'Keefe @ 2022-06-09 17:49 UTC (permalink / raw)
  To: Yang Shi; +Cc: vbabka, kirill.shutemov, willy, akpm, linux-mm, linux-kernel

Reviewed-by: Zach O'Keefe <zokeefe@google.com>

On Mon, Jun 6, 2022 at 2:44 PM Yang Shi <shy828301@gmail.com> wrote:
>
> Currently the THP flag check in hugepage_vma_check() will fallthrough if
> the flag is NEVER and VM_HUGEPAGE is set.  This is not a problem for now
> since all the callers have the flag checked before or can't be invoked if
> the flag is NEVER.
>
> However, the following patch will call hugepage_vma_check() in more
> places, for example, page fault, so this flag must be checked in
> hugepge_vma_check().
>
> Signed-off-by: Yang Shi <shy828301@gmail.com>
> ---
>  mm/khugepaged.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 671ac7800e53..84b9cf4b9be9 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -458,6 +458,9 @@ bool hugepage_vma_check(struct vm_area_struct *vma,
>         if (shmem_file(vma->vm_file))
>                 return shmem_huge_enabled(vma);
>
> +       if (!khugepaged_enabled())
> +               return false;
> +
>         /* THP settings require madvise. */
>         if (!(vm_flags & VM_HUGEPAGE) && !khugepaged_always())
>                 return false;
> --
> 2.26.3
>
>


* Re: [v3 PATCH 2/7] mm: thp: introduce transhuge_vma_size_ok() helper
  2022-06-06 21:44 ` [v3 PATCH 2/7] mm: thp: introduce transhuge_vma_size_ok() helper Yang Shi
@ 2022-06-09 22:21   ` Zach O'Keefe
  2022-06-10  0:08     ` Yang Shi
  2022-06-10  7:20   ` Miaohe Lin
  1 sibling, 1 reply; 40+ messages in thread
From: Zach O'Keefe @ 2022-06-09 22:21 UTC (permalink / raw)
  To: Yang Shi; +Cc: vbabka, kirill.shutemov, willy, akpm, linux-mm, linux-kernel

On Mon, Jun 6, 2022 at 2:44 PM Yang Shi <shy828301@gmail.com> wrote:
>
> There are couple of places that check whether the vma size is ok for
> THP or not, they are open coded and duplicate, introduce
> transhuge_vma_size_ok() helper to do the job.
>
> Signed-off-by: Yang Shi <shy828301@gmail.com>
> ---
>  include/linux/huge_mm.h | 17 +++++++++++++++++
>  mm/huge_memory.c        |  5 +----
>  mm/khugepaged.c         | 12 ++++++------
>  3 files changed, 24 insertions(+), 10 deletions(-)
>
> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> index 648cb3ce7099..a8f61db47f2a 100644
> --- a/include/linux/huge_mm.h
> +++ b/include/linux/huge_mm.h
> @@ -116,6 +116,18 @@ extern struct kobj_attribute shmem_enabled_attr;
>
>  extern unsigned long transparent_hugepage_flags;
>
> +/*
> + * The vma size has to be large enough to hold an aligned HPAGE_PMD_SIZE area.
> + */
> +static inline bool transhuge_vma_size_ok(struct vm_area_struct *vma)
> +{
> +       if (round_up(vma->vm_start, HPAGE_PMD_SIZE) <
> +           (vma->vm_end & HPAGE_PMD_MASK))
> +               return true;
> +
> +       return false;
> +}

First time coming across round_up() - thanks for that - but for
symmetry, maybe also use round_down() for the end? No strong opinion -
just a suggestion given I've just discovered it.
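
(If I'm reading the macros right, for a power-of-two HPAGE_PMD_SIZE the
round_up()/round_down() forms should just be different spellings of the
existing open-coded masks, roughly:

    /* HPAGE_PMD_MASK == ~(HPAGE_PMD_SIZE - 1) */
    hstart = (vma->vm_start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK;
    hstart = round_up(vma->vm_start, HPAGE_PMD_SIZE);    /* same value */

    hend = vma->vm_end & HPAGE_PMD_MASK;
    hend = round_down(vma->vm_end, HPAGE_PMD_SIZE);      /* same value */

so using round_down() for the end would be purely cosmetic.)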

>  static inline bool transhuge_vma_suitable(struct vm_area_struct *vma,
>                 unsigned long addr)
>  {
> @@ -345,6 +357,11 @@ static inline bool transparent_hugepage_active(struct vm_area_struct *vma)
>         return false;
>  }
>
> +static inline bool transhuge_vma_size_ok(struct vm_area_struct *vma)
> +{
> +       return false;
> +}
> +
>  static inline bool transhuge_vma_suitable(struct vm_area_struct *vma,
>                 unsigned long addr)
>  {
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 48182c8fe151..36ada544e494 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -71,10 +71,7 @@ unsigned long huge_zero_pfn __read_mostly = ~0UL;
>
>  bool transparent_hugepage_active(struct vm_area_struct *vma)
>  {
> -       /* The addr is used to check if the vma size fits */
> -       unsigned long addr = (vma->vm_end & HPAGE_PMD_MASK) - HPAGE_PMD_SIZE;
> -
> -       if (!transhuge_vma_suitable(vma, addr))
> +       if (!transhuge_vma_size_ok(vma))
>                 return false;
>         if (vma_is_anonymous(vma))
>                 return __transparent_hugepage_enabled(vma);

Do we need a check for vma->vm_pgoff alignment here, after
!vma_is_anonymous(), and now that we don't call
transhuge_vma_suitable()?

> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 84b9cf4b9be9..d0f8020164fc 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -454,6 +454,9 @@ bool hugepage_vma_check(struct vm_area_struct *vma,
>                                 vma->vm_pgoff, HPAGE_PMD_NR))
>                 return false;
>
> +       if (!transhuge_vma_size_ok(vma))
> +               return false;
> +
>         /* Enabled via shmem mount options or sysfs settings. */
>         if (shmem_file(vma->vm_file))
>                 return shmem_huge_enabled(vma);
> @@ -512,9 +515,7 @@ void khugepaged_enter_vma(struct vm_area_struct *vma,
>                           unsigned long vm_flags)
>  {
>         if (!test_bit(MMF_VM_HUGEPAGE, &vma->vm_mm->flags) &&
> -           khugepaged_enabled() &&
> -           (((vma->vm_start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK) <
> -            (vma->vm_end & HPAGE_PMD_MASK))) {
> +           khugepaged_enabled()) {
>                 if (hugepage_vma_check(vma, vm_flags))
>                         __khugepaged_enter(vma->vm_mm);
>         }
> @@ -2142,10 +2143,9 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages,
>                         progress++;
>                         continue;
>                 }
> -               hstart = (vma->vm_start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK;
> +
> +               hstart = round_up(vma->vm_start, HPAGE_PMD_SIZE);
>                 hend = vma->vm_end & HPAGE_PMD_MASK;
> -               if (hstart >= hend)
> -                       goto skip;
>                 if (khugepaged_scan.address > hend)
>                         goto skip;
>                 if (khugepaged_scan.address < hstart)

Likewise, could do round_down() here (just a suggestion)

> --
> 2.26.3
>
>


* Re: [v3 PATCH 3/7] mm: khugepaged: remove the redundant anon vma check
  2022-06-06 21:44 ` [v3 PATCH 3/7] mm: khugepaged: remove the redundant anon vma check Yang Shi
@ 2022-06-09 23:23   ` Zach O'Keefe
  2022-06-10  0:01     ` Yang Shi
  2022-06-10  7:23   ` Miaohe Lin
  1 sibling, 1 reply; 40+ messages in thread
From: Zach O'Keefe @ 2022-06-09 23:23 UTC (permalink / raw)
  To: Yang Shi; +Cc: vbabka, kirill.shutemov, willy, akpm, linux-mm, linux-kernel

On Mon, Jun 6, 2022 at 2:44 PM Yang Shi <shy828301@gmail.com> wrote:
>
> The hugepage_vma_check() already checked it, so remove the redundant
> check.
>
> Signed-off-by: Yang Shi <shy828301@gmail.com>
> ---
>  mm/khugepaged.c | 3 ---
>  1 file changed, 3 deletions(-)
>
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index d0f8020164fc..7a5d1c1a1833 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -966,9 +966,6 @@ static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address,
>                 return SCAN_ADDRESS_RANGE;
>         if (!hugepage_vma_check(vma, vma->vm_flags))
>                 return SCAN_VMA_CHECK;
> -       /* Anon VMA expected */
> -       if (!vma->anon_vma || !vma_is_anonymous(vma))
> -               return SCAN_VMA_CHECK;
>         return 0;
>  }
>
> --
> 2.26.3
>
>

So, I don't know if this is possible, but I wonder if there is a race here:

hugepage_vma_revalidate() is called in the anon path after mmap_lock has
been dropped and reacquired, and we want to refind / revalidate the vma,
since it might have changed.

There is the possibility that the memory was unmapped, then remapped
as file or shmem. If so, hugepage_vma_check() could return true
without actually checking vma->anon_vma || !vma_is_anonymous(vma) -
and we probably do want to (re)validate that this is indeed still an
anon vma.
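
Just to sketch what I mean (not a concrete proposal), I'd have expected
hugepage_vma_revalidate() to keep an explicit recheck along the lines of
the code this patch removes:

    if (!hugepage_vma_check(vma, vma->vm_flags))
            return SCAN_VMA_CHECK;
    /* vma may have been unmapped and remapped as file/shmem meanwhile */
    if (!vma->anon_vma || !vma_is_anonymous(vma))
            return SCAN_VMA_CHECK;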


* Re: [mm-unstable v3 PATCH 0/7] Cleanup transhuge_xxx helpers
  2022-06-06 21:44 [mm-unstable v3 PATCH 0/7] Cleanup transhuge_xxx helpers Yang Shi
                   ` (6 preceding siblings ...)
  2022-06-06 21:44 ` [v3 PATCH 7/7] mm: khugepaged: reorg some khugepaged helpers Yang Shi
@ 2022-06-09 23:32 ` Zach O'Keefe
  2022-06-10  7:08 ` Miaohe Lin
  8 siblings, 0 replies; 40+ messages in thread
From: Zach O'Keefe @ 2022-06-09 23:32 UTC (permalink / raw)
  To: Yang Shi; +Cc: vbabka, kirill.shutemov, willy, akpm, linux-mm, linux-kernel

On Mon, Jun 6, 2022 at 2:44 PM Yang Shi <shy828301@gmail.com> wrote:
>
>
> v3: * Fixed the comment from Willy
> v2: * Rebased to the latest mm-unstable
>     * Fixed potential regression for smaps's THPeligible
>
> This series is the follow-up of the discussion about cleaning up transhuge_xxx
> helpers at https://lore.kernel.org/linux-mm/627a71f8-e879-69a5-ceb3-fc8d29d2f7f1@suse.cz/.
>
> THP has a bunch of helpers that do VMA sanity check for different paths, they
> do the similar checks for the most callsites and have a lot duplicate codes.
> And it is confusing what helpers should be used at what conditions.
>
> This series reorganized and cleaned up the code so that we could consolidate
> all the checks into hugepage_vma_check().

By the way, thanks for doing this work. I know I personally was quite
confused about which vma checking function does what / which I should
be using. I briefly tried sketching out how to do something like this
as well - but the various corner cases where e.g. hugepage_vma_check()
and transparent_hugepage_active() differed got confusing. Thanks for
figuring this all out.

> The transhuge_vma_enabled(), transparent_hugepage_active() and
> __transparent_hugepage_enabled() are killed by this series.
>
> Added transhuge_vma_size_ok() helper to remove some duplicate code.
>
>
> Yang Shi (7):
>       mm: khugepaged: check THP flag in hugepage_vma_check()
>       mm: thp: introduce transhuge_vma_size_ok() helper
>       mm: khugepaged: remove the redundant anon vma check
>       mm: khugepaged: use transhuge_vma_suitable replace open-code
>       mm: thp: kill transparent_hugepage_active()
>       mm: thp: kill __transhuge_page_enabled()
>       mm: khugepaged: reorg some khugepaged helpers
>
>  fs/proc/task_mmu.c         |  2 +-
>  include/linux/huge_mm.h    | 84 ++++++++++++++++++++++++++++------------------------------------------
>  include/linux/khugepaged.h | 21 ++----------------
>  mm/huge_memory.c           | 64 +++++++++++++++++++++++++++++++++++++++++++++--------
>  mm/khugepaged.c            | 78 +++++++++++++++--------------------------------------------------
>  mm/memory.c                |  7 ++++--
>  6 files changed, 114 insertions(+), 142 deletions(-)
>
>
>


* Re: [v3 PATCH 3/7] mm: khugepaged: remove the redundant anon vma check
  2022-06-09 23:23   ` Zach O'Keefe
@ 2022-06-10  0:01     ` Yang Shi
  0 siblings, 0 replies; 40+ messages in thread
From: Yang Shi @ 2022-06-10  0:01 UTC (permalink / raw)
  To: Zach O'Keefe
  Cc: Vlastimil Babka, Kirill A. Shutemov, Matthew Wilcox,
	Andrew Morton, Linux MM, Linux Kernel Mailing List

On Thu, Jun 9, 2022 at 4:24 PM Zach O'Keefe <zokeefe@google.com> wrote:
>
> On Mon, Jun 6, 2022 at 2:44 PM Yang Shi <shy828301@gmail.com> wrote:
> >
> > The hugepage_vma_check() already checked it, so remove the redundant
> > check.
> >
> > Signed-off-by: Yang Shi <shy828301@gmail.com>
> > ---
> >  mm/khugepaged.c | 3 ---
> >  1 file changed, 3 deletions(-)
> >
> > diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> > index d0f8020164fc..7a5d1c1a1833 100644
> > --- a/mm/khugepaged.c
> > +++ b/mm/khugepaged.c
> > @@ -966,9 +966,6 @@ static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address,
> >                 return SCAN_ADDRESS_RANGE;
> >         if (!hugepage_vma_check(vma, vma->vm_flags))
> >                 return SCAN_VMA_CHECK;
> > -       /* Anon VMA expected */
> > -       if (!vma->anon_vma || !vma_is_anonymous(vma))
> > -               return SCAN_VMA_CHECK;
> >         return 0;
> >  }
> >
> > --
> > 2.26.3
> >
> >
>
> So, I don't know if this is possible, but I wonder if there is a race here:
>
> hugepage_vma_revalidate() is called in the anon path when mmap_lock
> after dropped + reacquired, and we want to refind / revalidate the
> vma, since it might have changed.
>
> There is the possibility that the memory was unmapped, then remapped
> as file or shmem. If so, hugepage_vma_check() could return true
> without actually checking vma->anon_vma || !vma_is_anonymous(vma) -
> and we probably do want to (re)validate that this is indeed still an
> anon vma.

Nice catch! Totally possible. I did overlook this. I will drop this
patch in the next version or maybe making the comment clearer is a
better choice.


* Re: [v3 PATCH 2/7] mm: thp: introduce transhuge_vma_size_ok() helper
  2022-06-09 22:21   ` Zach O'Keefe
@ 2022-06-10  0:08     ` Yang Shi
  2022-06-10  0:51       ` Zach O'Keefe
  0 siblings, 1 reply; 40+ messages in thread
From: Yang Shi @ 2022-06-10  0:08 UTC (permalink / raw)
  To: Zach O'Keefe
  Cc: Vlastimil Babka, Kirill A. Shutemov, Matthew Wilcox,
	Andrew Morton, Linux MM, Linux Kernel Mailing List

On Thu, Jun 9, 2022 at 3:21 PM Zach O'Keefe <zokeefe@google.com> wrote:
>
> On Mon, Jun 6, 2022 at 2:44 PM Yang Shi <shy828301@gmail.com> wrote:
> >
> > There are couple of places that check whether the vma size is ok for
> > THP or not, they are open coded and duplicate, introduce
> > transhuge_vma_size_ok() helper to do the job.
> >
> > Signed-off-by: Yang Shi <shy828301@gmail.com>
> > ---
> >  include/linux/huge_mm.h | 17 +++++++++++++++++
> >  mm/huge_memory.c        |  5 +----
> >  mm/khugepaged.c         | 12 ++++++------
> >  3 files changed, 24 insertions(+), 10 deletions(-)
> >
> > diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> > index 648cb3ce7099..a8f61db47f2a 100644
> > --- a/include/linux/huge_mm.h
> > +++ b/include/linux/huge_mm.h
> > @@ -116,6 +116,18 @@ extern struct kobj_attribute shmem_enabled_attr;
> >
> >  extern unsigned long transparent_hugepage_flags;
> >
> > +/*
> > + * The vma size has to be large enough to hold an aligned HPAGE_PMD_SIZE area.
> > + */
> > +static inline bool transhuge_vma_size_ok(struct vm_area_struct *vma)
> > +{
> > +       if (round_up(vma->vm_start, HPAGE_PMD_SIZE) <
> > +           (vma->vm_end & HPAGE_PMD_MASK))
> > +               return true;
> > +
> > +       return false;
> > +}
>
> First time coming across round_up() - thanks for that - but for
> symmetry, maybe also use round_down() for the end? No strong opinion -
> just a suggestion given I've just discovered it.

Yeah, round_down is fine too.

>
> >  static inline bool transhuge_vma_suitable(struct vm_area_struct *vma,
> >                 unsigned long addr)
> >  {
> > @@ -345,6 +357,11 @@ static inline bool transparent_hugepage_active(struct vm_area_struct *vma)
> >         return false;
> >  }
> >
> > +static inline bool transhuge_vma_size_ok(struct vm_area_struct *vma)
> > +{
> > +       return false;
> > +}
> > +
> >  static inline bool transhuge_vma_suitable(struct vm_area_struct *vma,
> >                 unsigned long addr)
> >  {
> > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > index 48182c8fe151..36ada544e494 100644
> > --- a/mm/huge_memory.c
> > +++ b/mm/huge_memory.c
> > @@ -71,10 +71,7 @@ unsigned long huge_zero_pfn __read_mostly = ~0UL;
> >
> >  bool transparent_hugepage_active(struct vm_area_struct *vma)
> >  {
> > -       /* The addr is used to check if the vma size fits */
> > -       unsigned long addr = (vma->vm_end & HPAGE_PMD_MASK) - HPAGE_PMD_SIZE;
> > -
> > -       if (!transhuge_vma_suitable(vma, addr))
> > +       if (!transhuge_vma_size_ok(vma))
> >                 return false;
> >         if (vma_is_anonymous(vma))
> >                 return __transparent_hugepage_enabled(vma);
>
> Do we need a check for vma->vm_pgoff alignment here, after
> !vma_is_anonymous(), and now that we don't call
> transhuge_vma_suitable()?

Actually I was thinking about this too. But the THPeligible bit shown
by smaps is a little bit ambiguous for file vma. The document says:
"THPeligible" indicates whether the mapping is eligible for allocating
THP pages - 1 if true, 0 otherwise.

Even though the vma doesn't fulfill the alignment, it is still possible
to get a THP allocated; it just can't be PMD mapped. So the old behavior
of THPeligible for file vmas seems problematic, or at least doesn't
match the documentation.
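
For example (made-up numbers, just to illustrate, assuming 4K pages and
2M PMDs): a file vma with vm_start = 0x200000 but vm_pgoff = 1 fails the

    IS_ALIGNED((vma->vm_start >> PAGE_SHIFT) - vma->vm_pgoff, HPAGE_PMD_NR)

check in hugepage_vma_check(), so it can never be PMD mapped, but the
page cache may still hold THPs for that file.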

I should elaborate this in the commit log.

>
> > diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> > index 84b9cf4b9be9..d0f8020164fc 100644
> > --- a/mm/khugepaged.c
> > +++ b/mm/khugepaged.c
> > @@ -454,6 +454,9 @@ bool hugepage_vma_check(struct vm_area_struct *vma,
> >                                 vma->vm_pgoff, HPAGE_PMD_NR))
> >                 return false;
> >
> > +       if (!transhuge_vma_size_ok(vma))
> > +               return false;
> > +
> >         /* Enabled via shmem mount options or sysfs settings. */
> >         if (shmem_file(vma->vm_file))
> >                 return shmem_huge_enabled(vma);
> > @@ -512,9 +515,7 @@ void khugepaged_enter_vma(struct vm_area_struct *vma,
> >                           unsigned long vm_flags)
> >  {
> >         if (!test_bit(MMF_VM_HUGEPAGE, &vma->vm_mm->flags) &&
> > -           khugepaged_enabled() &&
> > -           (((vma->vm_start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK) <
> > -            (vma->vm_end & HPAGE_PMD_MASK))) {
> > +           khugepaged_enabled()) {
> >                 if (hugepage_vma_check(vma, vm_flags))
> >                         __khugepaged_enter(vma->vm_mm);
> >         }
> > @@ -2142,10 +2143,9 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages,
> >                         progress++;
> >                         continue;
> >                 }
> > -               hstart = (vma->vm_start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK;
> > +
> > +               hstart = round_up(vma->vm_start, HPAGE_PMD_SIZE);
> >                 hend = vma->vm_end & HPAGE_PMD_MASK;
> > -               if (hstart >= hend)
> > -                       goto skip;
> >                 if (khugepaged_scan.address > hend)
> >                         goto skip;
> >                 if (khugepaged_scan.address < hstart)
>
> Likewise, could do round_down() here (just a suggestion)

Fine to me.

>
> > --
> > 2.26.3
> >
> >


* Re: [v3 PATCH 2/7] mm: thp: introduce transhuge_vma_size_ok() helper
  2022-06-10  0:08     ` Yang Shi
@ 2022-06-10  0:51       ` Zach O'Keefe
  2022-06-10 16:38         ` Yang Shi
  0 siblings, 1 reply; 40+ messages in thread
From: Zach O'Keefe @ 2022-06-10  0:51 UTC (permalink / raw)
  To: Yang Shi
  Cc: Vlastimil Babka, Kirill A. Shutemov, Matthew Wilcox,
	Andrew Morton, Linux MM, Linux Kernel Mailing List

On Thu, Jun 9, 2022 at 5:08 PM Yang Shi <shy828301@gmail.com> wrote:
>
> On Thu, Jun 9, 2022 at 3:21 PM Zach O'Keefe <zokeefe@google.com> wrote:
> >
> > On Mon, Jun 6, 2022 at 2:44 PM Yang Shi <shy828301@gmail.com> wrote:
> > >
> > > There are couple of places that check whether the vma size is ok for
> > > THP or not, they are open coded and duplicate, introduce
> > > transhuge_vma_size_ok() helper to do the job.
> > >
> > > Signed-off-by: Yang Shi <shy828301@gmail.com>
> > > ---
> > >  include/linux/huge_mm.h | 17 +++++++++++++++++
> > >  mm/huge_memory.c        |  5 +----
> > >  mm/khugepaged.c         | 12 ++++++------
> > >  3 files changed, 24 insertions(+), 10 deletions(-)
> > >
> > > diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> > > index 648cb3ce7099..a8f61db47f2a 100644
> > > --- a/include/linux/huge_mm.h
> > > +++ b/include/linux/huge_mm.h
> > > @@ -116,6 +116,18 @@ extern struct kobj_attribute shmem_enabled_attr;
> > >
> > >  extern unsigned long transparent_hugepage_flags;
> > >
> > > +/*
> > > + * The vma size has to be large enough to hold an aligned HPAGE_PMD_SIZE area.
> > > + */
> > > +static inline bool transhuge_vma_size_ok(struct vm_area_struct *vma)
> > > +{
> > > +       if (round_up(vma->vm_start, HPAGE_PMD_SIZE) <
> > > +           (vma->vm_end & HPAGE_PMD_MASK))
> > > +               return true;
> > > +
> > > +       return false;
> > > +}
> >
> > First time coming across round_up() - thanks for that - but for
> > symmetry, maybe also use round_down() for the end? No strong opinion -
> > just a suggestion given I've just discovered it.
>
> Yeah, round_down is fine too.
>
> >
> > >  static inline bool transhuge_vma_suitable(struct vm_area_struct *vma,
> > >                 unsigned long addr)
> > >  {
> > > @@ -345,6 +357,11 @@ static inline bool transparent_hugepage_active(struct vm_area_struct *vma)
> > >         return false;
> > >  }
> > >
> > > +static inline bool transhuge_vma_size_ok(struct vm_area_struct *vma)
> > > +{
> > > +       return false;
> > > +}
> > > +
> > >  static inline bool transhuge_vma_suitable(struct vm_area_struct *vma,
> > >                 unsigned long addr)
> > >  {
> > > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > > index 48182c8fe151..36ada544e494 100644
> > > --- a/mm/huge_memory.c
> > > +++ b/mm/huge_memory.c
> > > @@ -71,10 +71,7 @@ unsigned long huge_zero_pfn __read_mostly = ~0UL;
> > >
> > >  bool transparent_hugepage_active(struct vm_area_struct *vma)
> > >  {
> > > -       /* The addr is used to check if the vma size fits */
> > > -       unsigned long addr = (vma->vm_end & HPAGE_PMD_MASK) - HPAGE_PMD_SIZE;
> > > -
> > > -       if (!transhuge_vma_suitable(vma, addr))
> > > +       if (!transhuge_vma_size_ok(vma))
> > >                 return false;
> > >         if (vma_is_anonymous(vma))
> > >                 return __transparent_hugepage_enabled(vma);
> >
> > Do we need a check for vma->vm_pgoff alignment here, after
> > !vma_is_anonymous(), and now that we don't call
> > transhuge_vma_suitable()?
>
> Actually I was thinking about this too. But the THPeligible bit shown
> by smaps is a little bit ambiguous for file vma. The document says:
> "THPeligible" indicates whether the mapping is eligible for allocating
> THP pages - 1 if true, 0 otherwise.
>
> Even though it doesn't fulfill the alignment, it is still possible to
> get THP allocated, but just can't be PMD mapped. So the old behavior
> of THPeligible for file vma seems problematic, or at least doesn't
> match the document.

I think the term "THP" is used ambiguously. Often, but not always, in
the code, folks will go out of their way to specify "hugepage-sized"
page vs "pmd-mapped hugepage" - but at least from my experience,
external documentation doesn't. Given that THP as a concept doesn't
make much sense without the possibility of pmd-mapping, I think
"THPeligible here means "pmd mappable". For example, AnonHugePages in
smaps means  pmd-mapped anon hugepages.

That all said - the following patches will delete
transparent_hugepage_active() anyways.

> I should elaborate this in the commit log.
>
> >
> > > diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> > > index 84b9cf4b9be9..d0f8020164fc 100644
> > > --- a/mm/khugepaged.c
> > > +++ b/mm/khugepaged.c
> > > @@ -454,6 +454,9 @@ bool hugepage_vma_check(struct vm_area_struct *vma,
> > >                                 vma->vm_pgoff, HPAGE_PMD_NR))
> > >                 return false;
> > >
> > > +       if (!transhuge_vma_size_ok(vma))
> > > +               return false;
> > > +
> > >         /* Enabled via shmem mount options or sysfs settings. */
> > >         if (shmem_file(vma->vm_file))
> > >                 return shmem_huge_enabled(vma);
> > > @@ -512,9 +515,7 @@ void khugepaged_enter_vma(struct vm_area_struct *vma,
> > >                           unsigned long vm_flags)
> > >  {
> > >         if (!test_bit(MMF_VM_HUGEPAGE, &vma->vm_mm->flags) &&
> > > -           khugepaged_enabled() &&
> > > -           (((vma->vm_start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK) <
> > > -            (vma->vm_end & HPAGE_PMD_MASK))) {
> > > +           khugepaged_enabled()) {
> > >                 if (hugepage_vma_check(vma, vm_flags))
> > >                         __khugepaged_enter(vma->vm_mm);
> > >         }
> > > @@ -2142,10 +2143,9 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages,
> > >                         progress++;
> > >                         continue;
> > >                 }
> > > -               hstart = (vma->vm_start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK;
> > > +
> > > +               hstart = round_up(vma->vm_start, HPAGE_PMD_SIZE);
> > >                 hend = vma->vm_end & HPAGE_PMD_MASK;
> > > -               if (hstart >= hend)
> > > -                       goto skip;
> > >                 if (khugepaged_scan.address > hend)
> > >                         goto skip;
> > >                 if (khugepaged_scan.address < hstart)
> >
> > Likewise, could do round_down() here (just a suggestion)
>
> Fine to me.
>
> >
> > > --
> > > 2.26.3
> > >
> > >

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [v3 PATCH 5/7] mm: thp: kill transparent_hugepage_active()
  2022-06-06 21:44 ` [v3 PATCH 5/7] mm: thp: kill transparent_hugepage_active() Yang Shi
@ 2022-06-10  1:02   ` Zach O'Keefe
  2022-06-10 17:02     ` Yang Shi
  0 siblings, 1 reply; 40+ messages in thread
From: Zach O'Keefe @ 2022-06-10  1:02 UTC (permalink / raw)
  To: Yang Shi; +Cc: vbabka, kirill.shutemov, willy, akpm, linux-mm, linux-kernel

On Mon, Jun 6, 2022 at 2:44 PM Yang Shi <shy828301@gmail.com> wrote:
>
> The transparent_hugepage_active() was introduced to show THP eligibility
> bit in smaps in proc, smaps is the only user.  But it actually does the
> similar check as hugepage_vma_check() which is used by khugepaged.  We
> definitely don't have to maintain two similar checks, so kill
> transparent_hugepage_active().

I never realized smaps was the only user! Great!

> Also move hugepage_vma_check() to huge_memory.c and huge_mm.h since it
> is not only for khugepaged anymore.
>
> Signed-off-by: Yang Shi <shy828301@gmail.com>
> ---
>  fs/proc/task_mmu.c         |  2 +-
>  include/linux/huge_mm.h    | 16 +++++++-----
>  include/linux/khugepaged.h |  4 +--
>  mm/huge_memory.c           | 50 ++++++++++++++++++++++++++++++++-----
>  mm/khugepaged.c            | 51 +++-----------------------------------
>  5 files changed, 60 insertions(+), 63 deletions(-)
>
> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> index 2dd8c8a66924..fd79566e204c 100644
> --- a/fs/proc/task_mmu.c
> +++ b/fs/proc/task_mmu.c
> @@ -860,7 +860,7 @@ static int show_smap(struct seq_file *m, void *v)
>         __show_smap(m, &mss, false);
>
>         seq_printf(m, "THPeligible:    %d\n",
> -                  transparent_hugepage_active(vma));
> +                  hugepage_vma_check(vma, vma->vm_flags, true));
>
>         if (arch_pkeys_enabled())
>                 seq_printf(m, "ProtectionKey:  %8u\n", vma_pkey(vma));
> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> index 79d5919beb83..f561c3e16def 100644
> --- a/include/linux/huge_mm.h
> +++ b/include/linux/huge_mm.h
> @@ -209,7 +209,9 @@ static inline bool file_thp_enabled(struct vm_area_struct *vma)
>                !inode_is_open_for_write(inode) && S_ISREG(inode->i_mode);
>  }
>
> -bool transparent_hugepage_active(struct vm_area_struct *vma);
> +bool hugepage_vma_check(struct vm_area_struct *vma,
> +                       unsigned long vm_flags,
> +                       bool smaps);
>
>  #define transparent_hugepage_use_zero_page()                           \
>         (transparent_hugepage_flags &                                   \
> @@ -358,11 +360,6 @@ static inline bool __transparent_hugepage_enabled(struct vm_area_struct *vma)
>         return false;
>  }
>
> -static inline bool transparent_hugepage_active(struct vm_area_struct *vma)
> -{
> -       return false;
> -}
> -
>  static inline bool transhuge_vma_size_ok(struct vm_area_struct *vma)
>  {
>         return false;
> @@ -380,6 +377,13 @@ static inline bool transhuge_vma_enabled(struct vm_area_struct *vma,
>         return false;
>  }
>
> +static inline bool hugepage_vma_check(struct vm_area_struct *vma,
> +                                      unsigned long vm_flags,
> +                                      bool smaps)
> +{
> +       return false;
> +}
> +
>  static inline void prep_transhuge_page(struct page *page) {}
>
>  #define transparent_hugepage_flags 0UL
> diff --git a/include/linux/khugepaged.h b/include/linux/khugepaged.h
> index 392d34c3c59a..8a6452e089ca 100644
> --- a/include/linux/khugepaged.h
> +++ b/include/linux/khugepaged.h
> @@ -10,8 +10,6 @@ extern struct attribute_group khugepaged_attr_group;
>  extern int khugepaged_init(void);
>  extern void khugepaged_destroy(void);
>  extern int start_stop_khugepaged(void);
> -extern bool hugepage_vma_check(struct vm_area_struct *vma,
> -                              unsigned long vm_flags);
>  extern void __khugepaged_enter(struct mm_struct *mm);
>  extern void __khugepaged_exit(struct mm_struct *mm);
>  extern void khugepaged_enter_vma(struct vm_area_struct *vma,
> @@ -57,7 +55,7 @@ static inline void khugepaged_enter(struct vm_area_struct *vma,
>  {
>         if (!test_bit(MMF_VM_HUGEPAGE, &vma->vm_mm->flags) &&
>             khugepaged_enabled()) {
> -               if (hugepage_vma_check(vma, vm_flags))
> +               if (hugepage_vma_check(vma, vm_flags, false))
>                         __khugepaged_enter(vma->vm_mm);
>         }
>  }
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 36ada544e494..bc8370856e85 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -69,18 +69,56 @@ static atomic_t huge_zero_refcount;
>  struct page *huge_zero_page __read_mostly;
>  unsigned long huge_zero_pfn __read_mostly = ~0UL;
>
> -bool transparent_hugepage_active(struct vm_area_struct *vma)
> +bool hugepage_vma_check(struct vm_area_struct *vma,
> +                       unsigned long vm_flags,
> +                       bool smaps)
>  {
> +       if (!transhuge_vma_enabled(vma, vm_flags))
> +               return false;
> +
> +       if (vm_flags & VM_NO_KHUGEPAGED)
> +               return false;
> +
> +       /* Don't run khugepaged against DAX vma */
> +       if (vma_is_dax(vma))
> +               return false;
> +
> +       if (vma->vm_file && !IS_ALIGNED((vma->vm_start >> PAGE_SHIFT) -
> +                               vma->vm_pgoff, HPAGE_PMD_NR))
> +               return false;
> +
>         if (!transhuge_vma_size_ok(vma))
>                 return false;
> -       if (vma_is_anonymous(vma))
> -               return __transparent_hugepage_enabled(vma);
> -       if (vma_is_shmem(vma))
> +
> +       /* Enabled via shmem mount options or sysfs settings. */
> +       if (shmem_file(vma->vm_file))
>                 return shmem_huge_enabled(vma);
> -       if (transhuge_vma_enabled(vma, vma->vm_flags) && file_thp_enabled(vma))
> +
> +       if (!khugepaged_enabled())
> +               return false;
> +
> +       /* THP settings require madvise. */
> +       if (!(vm_flags & VM_HUGEPAGE) && !khugepaged_always())
> +               return false;
> +
> +       /* Only regular file is valid */
> +       if (file_thp_enabled(vma))
>                 return true;
>
> -       return false;
> +       if (!vma_is_anonymous(vma))
> +               return false;
> +
> +       if (vma_is_temporary_stack(vma))
> +               return false;
> +
> +       /*
> +        * THPeligible bit of smaps should show 1 for proper VMAs even
> +        * though anon_vma is not initialized yet.
> +        */
> +       if (!vma->anon_vma)
> +               return smaps;
> +
> +       return true;
>  }

There are a few cases where the return value for smaps will be
different from before. I presume this won't be an issue, and that any
difference resulting from this change is actually a positive
difference, given it more accurately reflects the thp eligibility of
the vma? For example, a VM_NO_KHUGEPAGED-marked vma might now show 0
where it otherwise showed 1.

>  static bool get_huge_zero_page(void)
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index ca1754d3a827..aa0769e3b0d9 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -437,49 +437,6 @@ static inline int khugepaged_test_exit(struct mm_struct *mm)
>         return atomic_read(&mm->mm_users) == 0;
>  }
>
> -bool hugepage_vma_check(struct vm_area_struct *vma,
> -                       unsigned long vm_flags)
> -{
> -       if (!transhuge_vma_enabled(vma, vm_flags))
> -               return false;
> -
> -       if (vm_flags & VM_NO_KHUGEPAGED)
> -               return false;
> -
> -       /* Don't run khugepaged against DAX vma */
> -       if (vma_is_dax(vma))
> -               return false;
> -
> -       if (vma->vm_file && !IS_ALIGNED((vma->vm_start >> PAGE_SHIFT) -
> -                               vma->vm_pgoff, HPAGE_PMD_NR))
> -               return false;
> -
> -       if (!transhuge_vma_size_ok(vma))
> -               return false;
> -
> -       /* Enabled via shmem mount options or sysfs settings. */
> -       if (shmem_file(vma->vm_file))
> -               return shmem_huge_enabled(vma);
> -
> -       if (!khugepaged_enabled())
> -               return false;
> -
> -       /* THP settings require madvise. */
> -       if (!(vm_flags & VM_HUGEPAGE) && !khugepaged_always())
> -               return false;
> -
> -       /* Only regular file is valid */
> -       if (file_thp_enabled(vma))
> -               return true;
> -
> -       if (!vma->anon_vma || !vma_is_anonymous(vma))
> -               return false;
> -       if (vma_is_temporary_stack(vma))
> -               return false;
> -
> -       return true;
> -}
> -
>  void __khugepaged_enter(struct mm_struct *mm)
>  {
>         struct mm_slot *mm_slot;
> @@ -516,7 +473,7 @@ void khugepaged_enter_vma(struct vm_area_struct *vma,
>  {
>         if (!test_bit(MMF_VM_HUGEPAGE, &vma->vm_mm->flags) &&
>             khugepaged_enabled()) {
> -               if (hugepage_vma_check(vma, vm_flags))
> +               if (hugepage_vma_check(vma, vm_flags, false))
>                         __khugepaged_enter(vma->vm_mm);
>         }
>  }
> @@ -961,7 +918,7 @@ static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address,
>
>         if (!transhuge_vma_suitable(vma, address))
>                 return SCAN_ADDRESS_RANGE;
> -       if (!hugepage_vma_check(vma, vma->vm_flags))
> +       if (!hugepage_vma_check(vma, vma->vm_flags, false))
>                 return SCAN_VMA_CHECK;
>         return 0;
>  }
> @@ -1442,7 +1399,7 @@ void collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr)
>          * the valid THP. Add extra VM_HUGEPAGE so hugepage_vma_check()
>          * will not fail the vma for missing VM_HUGEPAGE
>          */
> -       if (!hugepage_vma_check(vma, vma->vm_flags | VM_HUGEPAGE))
> +       if (!hugepage_vma_check(vma, vma->vm_flags | VM_HUGEPAGE, false))
>                 return;
>
>         /* Keep pmd pgtable for uffd-wp; see comment in retract_page_tables() */
> @@ -2132,7 +2089,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages,
>                         progress++;
>                         break;
>                 }
> -               if (!hugepage_vma_check(vma, vma->vm_flags)) {
> +               if (!hugepage_vma_check(vma, vma->vm_flags, false)) {
>  skip:
>                         progress++;
>                         continue;
> --
> 2.26.3
>
>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [v3 PATCH 4/7] mm: khugepaged: use transhuge_vma_suitable replace open-code
  2022-06-06 21:44 ` [v3 PATCH 4/7] mm: khugepaged: use transhuge_vma_suitable replace open-code Yang Shi
@ 2022-06-10  1:51   ` Zach O'Keefe
  2022-06-10 16:59     ` Yang Shi
  0 siblings, 1 reply; 40+ messages in thread
From: Zach O'Keefe @ 2022-06-10  1:51 UTC (permalink / raw)
  To: Yang Shi; +Cc: vbabka, kirill.shutemov, willy, akpm, linux-mm, linux-kernel

On Mon, Jun 6, 2022 at 2:44 PM Yang Shi <shy828301@gmail.com> wrote:
>
> The hugepage_vma_revalidate() needs to check if the address is still in
> the aligned HPAGE_PMD_SIZE area of the vma when reacquiring mmap_lock,
> but it was open-coded, use transhuge_vma_suitable() to do the job.  And
> add proper comments for transhuge_vma_suitable().
>
> Signed-off-by: Yang Shi <shy828301@gmail.com>
> ---
>  include/linux/huge_mm.h | 6 ++++++
>  mm/khugepaged.c         | 5 +----
>  2 files changed, 7 insertions(+), 4 deletions(-)
>
> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> index a8f61db47f2a..79d5919beb83 100644
> --- a/include/linux/huge_mm.h
> +++ b/include/linux/huge_mm.h
> @@ -128,6 +128,12 @@ static inline bool transhuge_vma_size_ok(struct vm_area_struct *vma)
>         return false;
>  }
>
> +/*
> + * Do the below checks:
> + *   - For non-anon vma, check if the vm_pgoff is HPAGE_PMD_NR aligned.
> + *   - For all vmas, check if the haddr is in an aligned HPAGE_PMD_SIZE
> + *     area.
> + */

AFAIK we aren't checking if vm_pgoff is HPAGE_PMD_NR aligned, but
rather that linear_page_index(vma, round_up(vma->vm_start,
HPAGE_PMD_SIZE)) is HPAGE_PMD_NR aligned within vma->vm_file. I was
pretty confused about this (hopefully I have it right now - if not -
case and point :) ), so it might be a good opportunity to add some
extra commentary to help future travelers understand why this
constraint exists.
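
In case it helps, a rough sketch of that reading (the helper name is
made up, and this is just my understanding, not something the patch
adds): linear_page_index(vma, addr) is
((addr - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff, and for a
PMD-aligned addr the first term is a multiple of HPAGE_PMD_NR, so the
existing IS_ALIGNED() check boils down to:

	/*
	 * Illustrative only: true iff PMD-aligned addresses in a
	 * file-backed vma map HPAGE_PMD_NR aligned page cache indices.
	 */
	static inline bool hugepage_index_aligned(struct vm_area_struct *vma)
	{
		unsigned long haddr = round_up(vma->vm_start, HPAGE_PMD_SIZE);

		return IS_ALIGNED(linear_page_index(vma, haddr), HPAGE_PMD_NR);
	}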

Also I wonder while we're at it if we can rename this to
transhuge_addr_aligned() or transhuge_addr_suitable() or something.

Otherwise I think the change is a nice cleanup.

>  static inline bool transhuge_vma_suitable(struct vm_area_struct *vma,
>                 unsigned long addr)
>  {
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 7a5d1c1a1833..ca1754d3a827 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -951,7 +951,6 @@ static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address,
>                 struct vm_area_struct **vmap)
>  {
>         struct vm_area_struct *vma;
> -       unsigned long hstart, hend;
>
>         if (unlikely(khugepaged_test_exit(mm)))
>                 return SCAN_ANY_PROCESS;
> @@ -960,9 +959,7 @@ static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address,
>         if (!vma)
>                 return SCAN_VMA_NULL;
>
> -       hstart = (vma->vm_start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK;
> -       hend = vma->vm_end & HPAGE_PMD_MASK;
> -       if (address < hstart || address + HPAGE_PMD_SIZE > hend)
> +       if (!transhuge_vma_suitable(vma, address))
>                 return SCAN_ADDRESS_RANGE;
>         if (!hugepage_vma_check(vma, vma->vm_flags))
>                 return SCAN_VMA_CHECK;
> --
> 2.26.3
>
>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [v3 PATCH 6/7] mm: thp: kill __transhuge_page_enabled()
  2022-06-06 21:44 ` [v3 PATCH 6/7] mm: thp: kill __transhuge_page_enabled() Yang Shi
@ 2022-06-10  2:22   ` Zach O'Keefe
  2022-06-10 17:24     ` Yang Shi
  0 siblings, 1 reply; 40+ messages in thread
From: Zach O'Keefe @ 2022-06-10  2:22 UTC (permalink / raw)
  To: Yang Shi; +Cc: vbabka, kirill.shutemov, willy, akpm, linux-mm, linux-kernel

On Mon, Jun 6, 2022 at 2:44 PM Yang Shi <shy828301@gmail.com> wrote:
>
> The page fault path checks THP eligibility with
> __transhuge_page_enabled() which does the similar thing as
> hugepage_vma_check(), so use hugepage_vma_check() instead.
>
> However page fault allows DAX and !anon_vma cases, so added a new flag,
> in_pf, to hugepage_vma_check() to make page fault work correctly.
>
> The in_pf flag is also used to skip shmem and file THP for page fault
> since shmem handles THP in its own shmem_fault() and file THP allocation
> on fault is not supported yet.
>
> Also remove hugepage_vma_enabled() since hugepage_vma_check() is the
> only caller now, it is not necessary to have a helper function.
>
> Signed-off-by: Yang Shi <shy828301@gmail.com>
> ---
>  fs/proc/task_mmu.c         |  2 +-
>  include/linux/huge_mm.h    | 57 ++------------------------------------
>  include/linux/khugepaged.h |  2 +-
>  mm/huge_memory.c           | 25 ++++++++++++-----
>  mm/khugepaged.c            |  8 +++---
>  mm/memory.c                |  7 +++--
>  6 files changed, 31 insertions(+), 70 deletions(-)
>
> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> index fd79566e204c..a0850303baec 100644
> --- a/fs/proc/task_mmu.c
> +++ b/fs/proc/task_mmu.c
> @@ -860,7 +860,7 @@ static int show_smap(struct seq_file *m, void *v)
>         __show_smap(m, &mss, false);
>
>         seq_printf(m, "THPeligible:    %d\n",
> -                  hugepage_vma_check(vma, vma->vm_flags, true));
> +                  hugepage_vma_check(vma, vma->vm_flags, true, false));
>
>         if (arch_pkeys_enabled())
>                 seq_printf(m, "ProtectionKey:  %8u\n", vma_pkey(vma));
> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> index f561c3e16def..d478e8875023 100644
> --- a/include/linux/huge_mm.h
> +++ b/include/linux/huge_mm.h
> @@ -153,48 +153,6 @@ static inline bool transhuge_vma_suitable(struct vm_area_struct *vma,
>         return true;
>  }
>
> -static inline bool transhuge_vma_enabled(struct vm_area_struct *vma,
> -                                         unsigned long vm_flags)
> -{
> -       /* Explicitly disabled through madvise. */
> -       if ((vm_flags & VM_NOHUGEPAGE) ||
> -           test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags))
> -               return false;
> -       return true;
> -}
> -
> -/*
> - * to be used on vmas which are known to support THP.
> - * Use transparent_hugepage_active otherwise
> - */
> -static inline bool __transparent_hugepage_enabled(struct vm_area_struct *vma)
> -{
> -
> -       /*
> -        * If the hardware/firmware marked hugepage support disabled.
> -        */
> -       if (transparent_hugepage_flags & (1 << TRANSPARENT_HUGEPAGE_NEVER_DAX))
> -               return false;
> -
> -       if (!transhuge_vma_enabled(vma, vma->vm_flags))
> -               return false;
> -
> -       if (vma_is_temporary_stack(vma))
> -               return false;
> -
> -       if (transparent_hugepage_flags & (1 << TRANSPARENT_HUGEPAGE_FLAG))
> -               return true;
> -
> -       if (vma_is_dax(vma))
> -               return true;
> -
> -       if (transparent_hugepage_flags &
> -                               (1 << TRANSPARENT_HUGEPAGE_REQ_MADV_FLAG))
> -               return !!(vma->vm_flags & VM_HUGEPAGE);
> -
> -       return false;
> -}
> -
>  static inline bool file_thp_enabled(struct vm_area_struct *vma)
>  {
>         struct inode *inode;
> @@ -211,7 +169,7 @@ static inline bool file_thp_enabled(struct vm_area_struct *vma)
>
>  bool hugepage_vma_check(struct vm_area_struct *vma,
>                         unsigned long vm_flags,
> -                       bool smaps);
> +                       bool smaps, bool in_pf);
>
>  #define transparent_hugepage_use_zero_page()                           \
>         (transparent_hugepage_flags &                                   \
> @@ -355,11 +313,6 @@ static inline bool folio_test_pmd_mappable(struct folio *folio)
>         return false;
>  }
>
> -static inline bool __transparent_hugepage_enabled(struct vm_area_struct *vma)
> -{
> -       return false;
> -}
> -
>  static inline bool transhuge_vma_size_ok(struct vm_area_struct *vma)
>  {
>         return false;
> @@ -371,15 +324,9 @@ static inline bool transhuge_vma_suitable(struct vm_area_struct *vma,
>         return false;
>  }
>
> -static inline bool transhuge_vma_enabled(struct vm_area_struct *vma,
> -                                         unsigned long vm_flags)
> -{
> -       return false;
> -}
> -
>  static inline bool hugepage_vma_check(struct vm_area_struct *vma,
>                                        unsigned long vm_flags,
> -                                      bool smaps)
> +                                      bool smaps, bool in_pf)
>  {
>         return false;
>  }
> diff --git a/include/linux/khugepaged.h b/include/linux/khugepaged.h
> index 8a6452e089ca..e047be601268 100644
> --- a/include/linux/khugepaged.h
> +++ b/include/linux/khugepaged.h
> @@ -55,7 +55,7 @@ static inline void khugepaged_enter(struct vm_area_struct *vma,
>  {
>         if (!test_bit(MMF_VM_HUGEPAGE, &vma->vm_mm->flags) &&
>             khugepaged_enabled()) {
> -               if (hugepage_vma_check(vma, vm_flags, false))
> +               if (hugepage_vma_check(vma, vm_flags, false, false))
>                         __khugepaged_enter(vma->vm_mm);
>         }
>  }
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index bc8370856e85..b95786ada466 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -71,17 +71,25 @@ unsigned long huge_zero_pfn __read_mostly = ~0UL;
>
>  bool hugepage_vma_check(struct vm_area_struct *vma,
>                         unsigned long vm_flags,
> -                       bool smaps)
> +                       bool smaps, bool in_pf)
>  {
> -       if (!transhuge_vma_enabled(vma, vm_flags))
> +       /* Explicitly disabled through madvise or prctl. */

Or s390 kvm (not that this has to be exhaustively maintained).

> +       if ((vm_flags & VM_NOHUGEPAGE) ||
> +           test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags))
> +               return false;
> +       /*
> +        * If the hardware/firmware marked hugepage support disabled.
> +        */
> +       if (transparent_hugepage_flags & (1 << TRANSPARENT_HUGEPAGE_NEVER_DAX))
>                 return false;

This introduces an extra check for khugepaged path. I don't know
enough about TRANSPARENT_HUGEPAGE_NEVER_DAX, but I assume this is ok?
What would have happened previously if khugepaged tried to collapse
this memory?

> +       /* Special VMA and hugetlb VMA */
>         if (vm_flags & VM_NO_KHUGEPAGED)
>                 return false;

This adds an extra check along the fault path. Is it also safe to add?

> -       /* Don't run khugepaged against DAX vma */
> +       /* khugepaged doesn't collapse DAX vma, but page fault is fine. */
>         if (vma_is_dax(vma))
> -               return false;
> +               return in_pf;

I assume vma_is_temporary_stack() and vma_is_dax() are mutually exclusive.

>         if (vma->vm_file && !IS_ALIGNED((vma->vm_start >> PAGE_SHIFT) -
>                                 vma->vm_pgoff, HPAGE_PMD_NR))
> @@ -91,7 +99,7 @@ bool hugepage_vma_check(struct vm_area_struct *vma,
>                 return false;
>
>         /* Enabled via shmem mount options or sysfs settings. */
> -       if (shmem_file(vma->vm_file))
> +       if (!in_pf && shmem_file(vma->vm_file))
>                 return shmem_huge_enabled(vma);

Will shmem_file() ever be true in the fault path? Or is this just an
optimization?

>         if (!khugepaged_enabled())
> @@ -102,7 +110,7 @@ bool hugepage_vma_check(struct vm_area_struct *vma,
>                 return false;
>
>         /* Only regular file is valid */
> -       if (file_thp_enabled(vma))
> +       if (!in_pf && file_thp_enabled(vma))
>                 return true;

Likewise for file_thp_enabled()

>         if (!vma_is_anonymous(vma))
> @@ -114,9 +122,12 @@ bool hugepage_vma_check(struct vm_area_struct *vma,
>         /*
>          * THPeligible bit of smaps should show 1 for proper VMAs even
>          * though anon_vma is not initialized yet.
> +        *
> +        * Allow page fault since anon_vma may be not initialized until
> +        * the first page fault.
>          */
>         if (!vma->anon_vma)
> -               return smaps;
> +               return (smaps || in_pf);
>
>         return true;
>  }
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index aa0769e3b0d9..ab6183c5489f 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -473,7 +473,7 @@ void khugepaged_enter_vma(struct vm_area_struct *vma,
>  {
>         if (!test_bit(MMF_VM_HUGEPAGE, &vma->vm_mm->flags) &&
>             khugepaged_enabled()) {
> -               if (hugepage_vma_check(vma, vm_flags, false))
> +               if (hugepage_vma_check(vma, vm_flags, false, false))
>                         __khugepaged_enter(vma->vm_mm);
>         }
>  }
> @@ -918,7 +918,7 @@ static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address,
>
>         if (!transhuge_vma_suitable(vma, address))
>                 return SCAN_ADDRESS_RANGE;
> -       if (!hugepage_vma_check(vma, vma->vm_flags, false))
> +       if (!hugepage_vma_check(vma, vma->vm_flags, false, false))
>                 return SCAN_VMA_CHECK;
>         return 0;
>  }
> @@ -1399,7 +1399,7 @@ void collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr)
>          * the valid THP. Add extra VM_HUGEPAGE so hugepage_vma_check()
>          * will not fail the vma for missing VM_HUGEPAGE
>          */
> -       if (!hugepage_vma_check(vma, vma->vm_flags | VM_HUGEPAGE, false))
> +       if (!hugepage_vma_check(vma, vma->vm_flags | VM_HUGEPAGE, false, false))
>                 return;
>
>         /* Keep pmd pgtable for uffd-wp; see comment in retract_page_tables() */
> @@ -2089,7 +2089,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages,
>                         progress++;
>                         break;
>                 }
> -               if (!hugepage_vma_check(vma, vma->vm_flags, false)) {
> +               if (!hugepage_vma_check(vma, vma->vm_flags, false, false)) {
>  skip:
>                         progress++;
>                         continue;
> diff --git a/mm/memory.c b/mm/memory.c
> index bc5d40eec5d5..673f7561a30a 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -4962,6 +4962,7 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
>                 .gfp_mask = __get_fault_gfp_mask(vma),
>         };
>         struct mm_struct *mm = vma->vm_mm;
> +       unsigned long vm_flags = vma->vm_flags;
>         pgd_t *pgd;
>         p4d_t *p4d;
>         vm_fault_t ret;
> @@ -4975,7 +4976,8 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
>         if (!vmf.pud)
>                 return VM_FAULT_OOM;
>  retry_pud:
> -       if (pud_none(*vmf.pud) && __transparent_hugepage_enabled(vma)) {
> +       if (pud_none(*vmf.pud) &&
> +           hugepage_vma_check(vma, vm_flags, false, true)) {
>                 ret = create_huge_pud(&vmf);
>                 if (!(ret & VM_FAULT_FALLBACK))
>                         return ret;
> @@ -5008,7 +5010,8 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
>         if (pud_trans_unstable(vmf.pud))
>                 goto retry_pud;
>
> -       if (pmd_none(*vmf.pmd) && __transparent_hugepage_enabled(vma)) {
> +       if (pmd_none(*vmf.pmd) &&
> +           hugepage_vma_check(vma, vm_flags, false, true)) {
>                 ret = create_huge_pmd(&vmf);
>                 if (!(ret & VM_FAULT_FALLBACK))
>                         return ret;
> --
> 2.26.3
>
>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [mm-unstable v3 PATCH 0/7] Cleanup transhuge_xxx helpers
  2022-06-06 21:44 [mm-unstable v3 PATCH 0/7] Cleanup transhuge_xxx helpers Yang Shi
                   ` (7 preceding siblings ...)
  2022-06-09 23:32 ` [mm-unstable v3 PATCH 0/7] Cleanup transhuge_xxx helpers Zach O'Keefe
@ 2022-06-10  7:08 ` Miaohe Lin
  8 siblings, 0 replies; 40+ messages in thread
From: Miaohe Lin @ 2022-06-10  7:08 UTC (permalink / raw)
  To: Yang Shi; +Cc: linux-mm, linux-kernel, vbabka, kirill.shutemov, willy, akpm

On 2022/6/7 5:44, Yang Shi wrote:
> 
> v3: * Fixed the comment from Willy
> v2: * Rebased to the latest mm-unstable
>     * Fixed potential regression for smaps's THPeligible
> 
> This series is the follow-up of the discussion about cleaning up transhuge_xxx
> helpers at https://lore.kernel.org/linux-mm/627a71f8-e879-69a5-ceb3-fc8d29d2f7f1@suse.cz/.
> 
> THP has a bunch of helpers that do VMA sanity check for different paths, they
> do the similar checks for the most callsites and have a lot duplicate codes.
> And it is confusing what helpers should be used at what conditions.

Yes, these helpers really confused me when I read the code. Thanks for doing this!

> 
> This series reorganized and cleaned up the code so that we could consolidate
> all the checks into hugepage_vma_check().
> 
> The transhuge_vma_enabled(), transparent_hugepage_active() and
> __transparent_hugepage_enabled() are killed by this series.
> 
> Added transhuge_vma_size_ok() helper to remove some duplicate code.
> 
> 
> Yang Shi (7):
>       mm: khugepaged: check THP flag in hugepage_vma_check()
>       mm: thp: introduce transhuge_vma_size_ok() helper
>       mm: khugepaged: remove the redundant anon vma check
>       mm: khugepaged: use transhuge_vma_suitable replace open-code
>       mm: thp: kill transparent_hugepage_active()
>       mm: thp: kill __transhuge_page_enabled()
>       mm: khugepaged: reorg some khugepaged helpers
> 
>  fs/proc/task_mmu.c         |  2 +-
>  include/linux/huge_mm.h    | 84 ++++++++++++++++++++++++++++------------------------------------------
>  include/linux/khugepaged.h | 21 ++----------------
>  mm/huge_memory.c           | 64 +++++++++++++++++++++++++++++++++++++++++++++--------
>  mm/khugepaged.c            | 78 +++++++++++++++--------------------------------------------------
>  mm/memory.c                |  7 ++++--
>  6 files changed, 114 insertions(+), 142 deletions(-)
> 
> 
> 
> .
> 


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [v3 PATCH 1/7] mm: khugepaged: check THP flag in hugepage_vma_check()
  2022-06-06 21:44 ` [v3 PATCH 1/7] mm: khugepaged: check THP flag in hugepage_vma_check() Yang Shi
  2022-06-09 17:49   ` Zach O'Keefe
@ 2022-06-10  7:09   ` Miaohe Lin
  1 sibling, 0 replies; 40+ messages in thread
From: Miaohe Lin @ 2022-06-10  7:09 UTC (permalink / raw)
  To: Yang Shi; +Cc: linux-mm, linux-kernel, vbabka, kirill.shutemov, willy, akpm

On 2022/6/7 5:44, Yang Shi wrote:
> Currently the THP flag check in hugepage_vma_check() will fallthrough if
> the flag is NEVER and VM_HUGEPAGE is set.  This is not a problem for now
> since all the callers have the flag checked before or can't be invoked if
> the flag is NEVER.
> 
> However, the following patch will call hugepage_vma_check() in more
> places, for example, page fault, so this flag must be checked in
> hugepge_vma_check().
> 
> Signed-off-by: Yang Shi <shy828301@gmail.com>

Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>

Thanks!

> ---
>  mm/khugepaged.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 671ac7800e53..84b9cf4b9be9 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -458,6 +458,9 @@ bool hugepage_vma_check(struct vm_area_struct *vma,
>  	if (shmem_file(vma->vm_file))
>  		return shmem_huge_enabled(vma);
>  
> +	if (!khugepaged_enabled())
> +		return false;
> +
>  	/* THP settings require madvise. */
>  	if (!(vm_flags & VM_HUGEPAGE) && !khugepaged_always())
>  		return false;
> 


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [v3 PATCH 2/7] mm: thp: introduce transhuge_vma_size_ok() helper
  2022-06-06 21:44 ` [v3 PATCH 2/7] mm: thp: introduce transhuge_vma_size_ok() helper Yang Shi
  2022-06-09 22:21   ` Zach O'Keefe
@ 2022-06-10  7:20   ` Miaohe Lin
  2022-06-10 16:47     ` Yang Shi
  1 sibling, 1 reply; 40+ messages in thread
From: Miaohe Lin @ 2022-06-10  7:20 UTC (permalink / raw)
  To: Yang Shi; +Cc: linux-mm, linux-kernel, vbabka, kirill.shutemov, willy, akpm

On 2022/6/7 5:44, Yang Shi wrote:
> There are couple of places that check whether the vma size is ok for
> THP or not, they are open coded and duplicate, introduce
> transhuge_vma_size_ok() helper to do the job.
> 
> Signed-off-by: Yang Shi <shy828301@gmail.com>
> ---
>  include/linux/huge_mm.h | 17 +++++++++++++++++
>  mm/huge_memory.c        |  5 +----
>  mm/khugepaged.c         | 12 ++++++------
>  3 files changed, 24 insertions(+), 10 deletions(-)
> 
> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> index 648cb3ce7099..a8f61db47f2a 100644
> --- a/include/linux/huge_mm.h
> +++ b/include/linux/huge_mm.h
> @@ -116,6 +116,18 @@ extern struct kobj_attribute shmem_enabled_attr;
>  
>  extern unsigned long transparent_hugepage_flags;
>  
> +/*
> + * The vma size has to be large enough to hold an aligned HPAGE_PMD_SIZE area.
> + */
> +static inline bool transhuge_vma_size_ok(struct vm_area_struct *vma)
> +{
> +	if (round_up(vma->vm_start, HPAGE_PMD_SIZE) <
> +	    (vma->vm_end & HPAGE_PMD_MASK))
> +		return true;
> +
> +	return false;
> +}
> +
>  static inline bool transhuge_vma_suitable(struct vm_area_struct *vma,
>  		unsigned long addr)
>  {
> @@ -345,6 +357,11 @@ static inline bool transparent_hugepage_active(struct vm_area_struct *vma)
>  	return false;
>  }
>  
> +static inline bool transhuge_vma_size_ok(struct vm_area_struct *vma)
> +{
> +	return false;
> +}
> +
>  static inline bool transhuge_vma_suitable(struct vm_area_struct *vma,
>  		unsigned long addr)
>  {
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 48182c8fe151..36ada544e494 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -71,10 +71,7 @@ unsigned long huge_zero_pfn __read_mostly = ~0UL;
>  
>  bool transparent_hugepage_active(struct vm_area_struct *vma)
>  {
> -	/* The addr is used to check if the vma size fits */
> -	unsigned long addr = (vma->vm_end & HPAGE_PMD_MASK) - HPAGE_PMD_SIZE;
> -
> -	if (!transhuge_vma_suitable(vma, addr))

There is also a pgoff check for file pages in transhuge_vma_suitable(). Is it ignored
deliberately?

> +	if (!transhuge_vma_size_ok(vma))
>  		return false;
>  	if (vma_is_anonymous(vma))
>  		return __transparent_hugepage_enabled(vma);
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 84b9cf4b9be9..d0f8020164fc 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -454,6 +454,9 @@ bool hugepage_vma_check(struct vm_area_struct *vma,
>  				vma->vm_pgoff, HPAGE_PMD_NR))
>  		return false;
>  
> +	if (!transhuge_vma_size_ok(vma))
> +		return false;
> +
>  	/* Enabled via shmem mount options or sysfs settings. */
>  	if (shmem_file(vma->vm_file))
>  		return shmem_huge_enabled(vma);
> @@ -512,9 +515,7 @@ void khugepaged_enter_vma(struct vm_area_struct *vma,
>  			  unsigned long vm_flags)
>  {
>  	if (!test_bit(MMF_VM_HUGEPAGE, &vma->vm_mm->flags) &&
> -	    khugepaged_enabled() &&
> -	    (((vma->vm_start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK) <
> -	     (vma->vm_end & HPAGE_PMD_MASK))) {
> +	    khugepaged_enabled()) {
>  		if (hugepage_vma_check(vma, vm_flags))
>  			__khugepaged_enter(vma->vm_mm);
>  	}

After this change, khugepaged_enter_vma is identical to khugepaged_enter. Should one of
them be removed?

Thanks!

> @@ -2142,10 +2143,9 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages,
>  			progress++;
>  			continue;
>  		}
> -		hstart = (vma->vm_start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK;
> +
> +		hstart = round_up(vma->vm_start, HPAGE_PMD_SIZE);
>  		hend = vma->vm_end & HPAGE_PMD_MASK;
> -		if (hstart >= hend)
> -			goto skip;
>  		if (khugepaged_scan.address > hend)
>  			goto skip;
>  		if (khugepaged_scan.address < hstart)
> 


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [v3 PATCH 3/7] mm: khugepaged: remove the redundant anon vma check
  2022-06-06 21:44 ` [v3 PATCH 3/7] mm: khugepaged: remove the redundant anon vma check Yang Shi
  2022-06-09 23:23   ` Zach O'Keefe
@ 2022-06-10  7:23   ` Miaohe Lin
  2022-06-10  7:28     ` Miaohe Lin
  1 sibling, 1 reply; 40+ messages in thread
From: Miaohe Lin @ 2022-06-10  7:23 UTC (permalink / raw)
  To: Yang Shi; +Cc: linux-mm, linux-kernel, vbabka, kirill.shutemov, willy, akpm

On 2022/6/7 5:44, Yang Shi wrote:
> The hugepage_vma_check() already checked it, so remove the redundant
> check.
> 
> Signed-off-by: Yang Shi <shy828301@gmail.com>
> ---
>  mm/khugepaged.c | 3 ---
>  1 file changed, 3 deletions(-)
> 
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index d0f8020164fc..7a5d1c1a1833 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -966,9 +966,6 @@ static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address,
>  		return SCAN_ADDRESS_RANGE;
>  	if (!hugepage_vma_check(vma, vma->vm_flags))
>  		return SCAN_VMA_CHECK;
> -	/* Anon VMA expected */
> -	if (!vma->anon_vma || !vma_is_anonymous(vma))
> -		return SCAN_VMA_CHECK;

Is it possible that hugepage_vma_check() returns true due to the shmem check or the file THP
check since we dropped mmap_lock? So the anon vma is explicitly checked again here?

Thanks!

>  	return 0;
>  }
>  
> 


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [v3 PATCH 3/7] mm: khugepaged: remove the redundant anon vma check
  2022-06-10  7:23   ` Miaohe Lin
@ 2022-06-10  7:28     ` Miaohe Lin
  0 siblings, 0 replies; 40+ messages in thread
From: Miaohe Lin @ 2022-06-10  7:28 UTC (permalink / raw)
  To: Yang Shi; +Cc: linux-mm, linux-kernel, vbabka, kirill.shutemov, willy, akpm

On 2022/6/10 15:23, Miaohe Lin wrote:
> On 2022/6/7 5:44, Yang Shi wrote:
>> The hugepage_vma_check() already checked it, so remove the redundant
>> check.
>>
>> Signed-off-by: Yang Shi <shy828301@gmail.com>
>> ---
>>  mm/khugepaged.c | 3 ---
>>  1 file changed, 3 deletions(-)
>>
>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>> index d0f8020164fc..7a5d1c1a1833 100644
>> --- a/mm/khugepaged.c
>> +++ b/mm/khugepaged.c
>> @@ -966,9 +966,6 @@ static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address,
>>  		return SCAN_ADDRESS_RANGE;
>>  	if (!hugepage_vma_check(vma, vma->vm_flags))
>>  		return SCAN_VMA_CHECK;
>> -	/* Anon VMA expected */
>> -	if (!vma->anon_vma || !vma_is_anonymous(vma))
>> -		return SCAN_VMA_CHECK;
> 
> Is it possible that hugepage_vma_check() returns true due to the shmem check or the file THP
> check since we dropped mmap_lock? So the anon vma is explicitly checked again here?

I just saw your discussion of a similar problem. Sorry for the noise.

> 
> Thanks!
> 
>>  	return 0;
>>  }
>>  
>>
> 


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [v3 PATCH 2/7] mm: thp: introduce transhuge_vma_size_ok() helper
  2022-06-10  0:51       ` Zach O'Keefe
@ 2022-06-10 16:38         ` Yang Shi
  2022-06-10 21:24           ` Yang Shi
  0 siblings, 1 reply; 40+ messages in thread
From: Yang Shi @ 2022-06-10 16:38 UTC (permalink / raw)
  To: Zach O'Keefe
  Cc: Vlastimil Babka, Kirill A. Shutemov, Matthew Wilcox,
	Andrew Morton, Linux MM, Linux Kernel Mailing List

On Thu, Jun 9, 2022 at 5:52 PM Zach O'Keefe <zokeefe@google.com> wrote:
>
> On Thu, Jun 9, 2022 at 5:08 PM Yang Shi <shy828301@gmail.com> wrote:
> >
> > On Thu, Jun 9, 2022 at 3:21 PM Zach O'Keefe <zokeefe@google.com> wrote:
> > >
> > > On Mon, Jun 6, 2022 at 2:44 PM Yang Shi <shy828301@gmail.com> wrote:
> > > >
> > > > There are couple of places that check whether the vma size is ok for
> > > > THP or not, they are open coded and duplicate, introduce
> > > > transhuge_vma_size_ok() helper to do the job.
> > > >
> > > > Signed-off-by: Yang Shi <shy828301@gmail.com>
> > > > ---
> > > >  include/linux/huge_mm.h | 17 +++++++++++++++++
> > > >  mm/huge_memory.c        |  5 +----
> > > >  mm/khugepaged.c         | 12 ++++++------
> > > >  3 files changed, 24 insertions(+), 10 deletions(-)
> > > >
> > > > diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> > > > index 648cb3ce7099..a8f61db47f2a 100644
> > > > --- a/include/linux/huge_mm.h
> > > > +++ b/include/linux/huge_mm.h
> > > > @@ -116,6 +116,18 @@ extern struct kobj_attribute shmem_enabled_attr;
> > > >
> > > >  extern unsigned long transparent_hugepage_flags;
> > > >
> > > > +/*
> > > > + * The vma size has to be large enough to hold an aligned HPAGE_PMD_SIZE area.
> > > > + */
> > > > +static inline bool transhuge_vma_size_ok(struct vm_area_struct *vma)
> > > > +{
> > > > +       if (round_up(vma->vm_start, HPAGE_PMD_SIZE) <
> > > > +           (vma->vm_end & HPAGE_PMD_MASK))
> > > > +               return true;
> > > > +
> > > > +       return false;
> > > > +}
> > >
> > > First time coming across round_up() - thanks for that - but for
> > > symmetry, maybe also use round_down() for the end? No strong opinion -
> > > just a suggestion given I've just discovered it.
> >
> > Yeah, round_down is fine too.
> >
> > >
> > > >  static inline bool transhuge_vma_suitable(struct vm_area_struct *vma,
> > > >                 unsigned long addr)
> > > >  {
> > > > @@ -345,6 +357,11 @@ static inline bool transparent_hugepage_active(struct vm_area_struct *vma)
> > > >         return false;
> > > >  }
> > > >
> > > > +static inline bool transhuge_vma_size_ok(struct vm_area_struct *vma)
> > > > +{
> > > > +       return false;
> > > > +}
> > > > +
> > > >  static inline bool transhuge_vma_suitable(struct vm_area_struct *vma,
> > > >                 unsigned long addr)
> > > >  {
> > > > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > > > index 48182c8fe151..36ada544e494 100644
> > > > --- a/mm/huge_memory.c
> > > > +++ b/mm/huge_memory.c
> > > > @@ -71,10 +71,7 @@ unsigned long huge_zero_pfn __read_mostly = ~0UL;
> > > >
> > > >  bool transparent_hugepage_active(struct vm_area_struct *vma)
> > > >  {
> > > > -       /* The addr is used to check if the vma size fits */
> > > > -       unsigned long addr = (vma->vm_end & HPAGE_PMD_MASK) - HPAGE_PMD_SIZE;
> > > > -
> > > > -       if (!transhuge_vma_suitable(vma, addr))
> > > > +       if (!transhuge_vma_size_ok(vma))
> > > >                 return false;
> > > >         if (vma_is_anonymous(vma))
> > > >                 return __transparent_hugepage_enabled(vma);
> > >
> > > Do we need a check for vma->vm_pgoff alignment here, after
> > > !vma_is_anonymous(), and now that we don't call
> > > transhuge_vma_suitable()?
> >
> > Actually I was thinking about this too. But the THPeligible bit shown
> > by smaps is a little bit ambiguous for file vma. The document says:
> > "THPeligible" indicates whether the mapping is eligible for allocating
> > THP pages - 1 if true, 0 otherwise.
> >
> > Even though it doesn't fulfill the alignment, it is still possible to
> > get THP allocated, but just can't be PMD mapped. So the old behavior
> > of THPeligible for file vma seems problematic, or at least doesn't
> > match the document.
>
> I think the term "THP" is used ambiguously. Often, but not always, in
> the code, folks will go out of their way to specify "hugepage-sized"
> page vs "pmd-mapped hugepage" - but at least from my experience,
> external documentation doesn't. Given that THP as a concept doesn't
> make much sense without the possibility of pmd-mapping, I think
> "THPeligible here means "pmd mappable". For example, AnonHugePages in
> smaps means  pmd-mapped anon hugepages.

Yeah, depends on the expectation.

>
> That all said - the following patches will delete
> transparent_hugepage_active() anyways.

Yes, how could I forget this :-( The following removal of
transparent_hugepage_active() will restore the old behavior.

>
> > I should elaborate this in the commit log.
> >
> > >
> > > > diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> > > > index 84b9cf4b9be9..d0f8020164fc 100644
> > > > --- a/mm/khugepaged.c
> > > > +++ b/mm/khugepaged.c
> > > > @@ -454,6 +454,9 @@ bool hugepage_vma_check(struct vm_area_struct *vma,
> > > >                                 vma->vm_pgoff, HPAGE_PMD_NR))
> > > >                 return false;
> > > >
> > > > +       if (!transhuge_vma_size_ok(vma))
> > > > +               return false;
> > > > +
> > > >         /* Enabled via shmem mount options or sysfs settings. */
> > > >         if (shmem_file(vma->vm_file))
> > > >                 return shmem_huge_enabled(vma);
> > > > @@ -512,9 +515,7 @@ void khugepaged_enter_vma(struct vm_area_struct *vma,
> > > >                           unsigned long vm_flags)
> > > >  {
> > > >         if (!test_bit(MMF_VM_HUGEPAGE, &vma->vm_mm->flags) &&
> > > > -           khugepaged_enabled() &&
> > > > -           (((vma->vm_start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK) <
> > > > -            (vma->vm_end & HPAGE_PMD_MASK))) {
> > > > +           khugepaged_enabled()) {
> > > >                 if (hugepage_vma_check(vma, vm_flags))
> > > >                         __khugepaged_enter(vma->vm_mm);
> > > >         }
> > > > @@ -2142,10 +2143,9 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages,
> > > >                         progress++;
> > > >                         continue;
> > > >                 }
> > > > -               hstart = (vma->vm_start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK;
> > > > +
> > > > +               hstart = round_up(vma->vm_start, HPAGE_PMD_SIZE);
> > > >                 hend = vma->vm_end & HPAGE_PMD_MASK;
> > > > -               if (hstart >= hend)
> > > > -                       goto skip;
> > > >                 if (khugepaged_scan.address > hend)
> > > >                         goto skip;
> > > >                 if (khugepaged_scan.address < hstart)
> > >
> > > Likewise, could do round_down() here (just a suggestion)
> >
> > Fine to me.
> >
> > >
> > > > --
> > > > 2.26.3
> > > >
> > > >

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [v3 PATCH 2/7] mm: thp: introduce transhuge_vma_size_ok() helper
  2022-06-10  7:20   ` Miaohe Lin
@ 2022-06-10 16:47     ` Yang Shi
  0 siblings, 0 replies; 40+ messages in thread
From: Yang Shi @ 2022-06-10 16:47 UTC (permalink / raw)
  To: Miaohe Lin
  Cc: Linux MM, Linux Kernel Mailing List, Vlastimil Babka,
	Kirill A. Shutemov, Matthew Wilcox, Andrew Morton

On Fri, Jun 10, 2022 at 12:20 AM Miaohe Lin <linmiaohe@huawei.com> wrote:
>
> On 2022/6/7 5:44, Yang Shi wrote:
> > There are couple of places that check whether the vma size is ok for
> > THP or not, they are open coded and duplicate, introduce
> > transhuge_vma_size_ok() helper to do the job.
> >
> > Signed-off-by: Yang Shi <shy828301@gmail.com>
> > ---
> >  include/linux/huge_mm.h | 17 +++++++++++++++++
> >  mm/huge_memory.c        |  5 +----
> >  mm/khugepaged.c         | 12 ++++++------
> >  3 files changed, 24 insertions(+), 10 deletions(-)
> >
> > diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> > index 648cb3ce7099..a8f61db47f2a 100644
> > --- a/include/linux/huge_mm.h
> > +++ b/include/linux/huge_mm.h
> > @@ -116,6 +116,18 @@ extern struct kobj_attribute shmem_enabled_attr;
> >
> >  extern unsigned long transparent_hugepage_flags;
> >
> > +/*
> > + * The vma size has to be large enough to hold an aligned HPAGE_PMD_SIZE area.
> > + */
> > +static inline bool transhuge_vma_size_ok(struct vm_area_struct *vma)
> > +{
> > +     if (round_up(vma->vm_start, HPAGE_PMD_SIZE) <
> > +         (vma->vm_end & HPAGE_PMD_MASK))
> > +             return true;
> > +
> > +     return false;
> > +}
> > +
> >  static inline bool transhuge_vma_suitable(struct vm_area_struct *vma,
> >               unsigned long addr)
> >  {
> > @@ -345,6 +357,11 @@ static inline bool transparent_hugepage_active(struct vm_area_struct *vma)
> >       return false;
> >  }
> >
> > +static inline bool transhuge_vma_size_ok(struct vm_area_struct *vma)
> > +{
> > +     return false;
> > +}
> > +
> >  static inline bool transhuge_vma_suitable(struct vm_area_struct *vma,
> >               unsigned long addr)
> >  {
> > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > index 48182c8fe151..36ada544e494 100644
> > --- a/mm/huge_memory.c
> > +++ b/mm/huge_memory.c
> > @@ -71,10 +71,7 @@ unsigned long huge_zero_pfn __read_mostly = ~0UL;
> >
> >  bool transparent_hugepage_active(struct vm_area_struct *vma)
> >  {
> > -     /* The addr is used to check if the vma size fits */
> > -     unsigned long addr = (vma->vm_end & HPAGE_PMD_MASK) - HPAGE_PMD_SIZE;
> > -
> > -     if (!transhuge_vma_suitable(vma, addr))
>
> There is also a pgoff check for file pages in transhuge_vma_suitable(). Is it ignored
> deliberately?

This has been discussed in the previous threads. The following removal
of transparent_hugepage_active() will restore the behavior.

>
> > +     if (!transhuge_vma_size_ok(vma))
> >               return false;
> >       if (vma_is_anonymous(vma))
> >               return __transparent_hugepage_enabled(vma);
> > diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> > index 84b9cf4b9be9..d0f8020164fc 100644
> > --- a/mm/khugepaged.c
> > +++ b/mm/khugepaged.c
> > @@ -454,6 +454,9 @@ bool hugepage_vma_check(struct vm_area_struct *vma,
> >                               vma->vm_pgoff, HPAGE_PMD_NR))
> >               return false;
> >
> > +     if (!transhuge_vma_size_ok(vma))
> > +             return false;
> > +
> >       /* Enabled via shmem mount options or sysfs settings. */
> >       if (shmem_file(vma->vm_file))
> >               return shmem_huge_enabled(vma);
> > @@ -512,9 +515,7 @@ void khugepaged_enter_vma(struct vm_area_struct *vma,
> >                         unsigned long vm_flags)
> >  {
> >       if (!test_bit(MMF_VM_HUGEPAGE, &vma->vm_mm->flags) &&
> > -         khugepaged_enabled() &&
> > -         (((vma->vm_start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK) <
> > -          (vma->vm_end & HPAGE_PMD_MASK))) {
> > +         khugepaged_enabled()) {
> >               if (hugepage_vma_check(vma, vm_flags))
> >                       __khugepaged_enter(vma->vm_mm);
> >       }
>
> After this change, khugepaged_enter_vma is identical to khugepaged_enter. Should one of
> them be removed?

Thanks for catching this. The later patch will make them slightly
different (khugepaged_enter() won't check the hugepage flag anymore),
but the only user of khugepaged_enter() is the page fault path, and it
doesn't seem worth keeping both. Will remove khugepaged_enter() in the
next version.

>
> Thanks!
>
> > @@ -2142,10 +2143,9 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages,
> >                       progress++;
> >                       continue;
> >               }
> > -             hstart = (vma->vm_start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK;
> > +
> > +             hstart = round_up(vma->vm_start, HPAGE_PMD_SIZE);
> >               hend = vma->vm_end & HPAGE_PMD_MASK;
> > -             if (hstart >= hend)
> > -                     goto skip;
> >               if (khugepaged_scan.address > hend)
> >                       goto skip;
> >               if (khugepaged_scan.address < hstart)
> >
>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [v3 PATCH 4/7] mm: khugepaged: use transhuge_vma_suitable replace open-code
  2022-06-10  1:51   ` Zach O'Keefe
@ 2022-06-10 16:59     ` Yang Shi
  2022-06-10 22:03       ` Yang Shi
  0 siblings, 1 reply; 40+ messages in thread
From: Yang Shi @ 2022-06-10 16:59 UTC (permalink / raw)
  To: Zach O'Keefe
  Cc: Vlastimil Babka, Kirill A. Shutemov, Matthew Wilcox,
	Andrew Morton, Linux MM, Linux Kernel Mailing List

On Thu, Jun 9, 2022 at 6:52 PM Zach O'Keefe <zokeefe@google.com> wrote:
>
> On Mon, Jun 6, 2022 at 2:44 PM Yang Shi <shy828301@gmail.com> wrote:
> >
> > The hugepage_vma_revalidate() needs to check if the address is still in
> > the aligned HPAGE_PMD_SIZE area of the vma when reacquiring mmap_lock,
> > but it was open-coded, use transhuge_vma_suitable() to do the job.  And
> > add proper comments for transhuge_vma_suitable().
> >
> > Signed-off-by: Yang Shi <shy828301@gmail.com>
> > ---
> >  include/linux/huge_mm.h | 6 ++++++
> >  mm/khugepaged.c         | 5 +----
> >  2 files changed, 7 insertions(+), 4 deletions(-)
> >
> > diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> > index a8f61db47f2a..79d5919beb83 100644
> > --- a/include/linux/huge_mm.h
> > +++ b/include/linux/huge_mm.h
> > @@ -128,6 +128,12 @@ static inline bool transhuge_vma_size_ok(struct vm_area_struct *vma)
> >         return false;
> >  }
> >
> > +/*
> > + * Do the below checks:
> > + *   - For non-anon vma, check if the vm_pgoff is HPAGE_PMD_NR aligned.
> > + *   - For all vmas, check if the haddr is in an aligned HPAGE_PMD_SIZE
> > + *     area.
> > + */
>
> AFAIK we aren't checking if vm_pgoff is HPAGE_PMD_NR aligned, but
> rather that linear_page_index(vma, round_up(vma->vm_start,
> HPAGE_PMD_SIZE)) is HPAGE_PMD_NR aligned within vma->vm_file. I was

Yeah, you are right.

> pretty confused about this (hopefully I have it right now - if not -
> case in point :) ), so it might be a good opportunity to add some
> extra commentary to help future travelers understand why this
> constraint exists.

I'm not sure I understand this 100% myself. I think it is related to
how the page cache is structured. I will try to add more comments.
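
Roughly, my current understanding (take it as a sketch, it may need
correcting) is something along these lines, which is the sort of
comment I have in mind:

	/*
	 * A PMD-sized folio in the page cache is naturally aligned to
	 * HPAGE_PMD_NR in the file, so it can only be PMD-mapped at a
	 * virtual address where PMD-aligned addresses correspond to
	 * HPAGE_PMD_NR aligned page cache indices.  That is what the
	 * (vm_start >> PAGE_SHIFT) - vm_pgoff alignment check enforces
	 * for file vmas.
	 */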

>
> Also I wonder while we're at it if we can rename this to
> transhuge_addr_aligned() or transhuge_addr_suitable() or something.

I think it is still actually used to check vma.

>
> Otherwise I think the change is a nice cleanup.
>
> >  static inline bool transhuge_vma_suitable(struct vm_area_struct *vma,
> >                 unsigned long addr)
> >  {
> > diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> > index 7a5d1c1a1833..ca1754d3a827 100644
> > --- a/mm/khugepaged.c
> > +++ b/mm/khugepaged.c
> > @@ -951,7 +951,6 @@ static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address,
> >                 struct vm_area_struct **vmap)
> >  {
> >         struct vm_area_struct *vma;
> > -       unsigned long hstart, hend;
> >
> >         if (unlikely(khugepaged_test_exit(mm)))
> >                 return SCAN_ANY_PROCESS;
> > @@ -960,9 +959,7 @@ static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address,
> >         if (!vma)
> >                 return SCAN_VMA_NULL;
> >
> > -       hstart = (vma->vm_start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK;
> > -       hend = vma->vm_end & HPAGE_PMD_MASK;
> > -       if (address < hstart || address + HPAGE_PMD_SIZE > hend)
> > +       if (!transhuge_vma_suitable(vma, address))
> >                 return SCAN_ADDRESS_RANGE;
> >         if (!hugepage_vma_check(vma, vma->vm_flags))
> >                 return SCAN_VMA_CHECK;
> > --
> > 2.26.3
> >
> >

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [v3 PATCH 5/7] mm: thp: kill transparent_hugepage_active()
  2022-06-10  1:02   ` Zach O'Keefe
@ 2022-06-10 17:02     ` Yang Shi
  2022-06-13 15:06       ` Zach O'Keefe
  0 siblings, 1 reply; 40+ messages in thread
From: Yang Shi @ 2022-06-10 17:02 UTC (permalink / raw)
  To: Zach O'Keefe
  Cc: Vlastimil Babka, Kirill A. Shutemov, Matthew Wilcox,
	Andrew Morton, Linux MM, Linux Kernel Mailing List

On Thu, Jun 9, 2022 at 6:03 PM Zach O'Keefe <zokeefe@google.com> wrote:
>
> On Mon, Jun 6, 2022 at 2:44 PM Yang Shi <shy828301@gmail.com> wrote:
> >
> > The transparent_hugepage_active() was introduced to show THP eligibility
> > bit in smaps in proc, smaps is the only user.  But it actually does the
> > similar check as hugepage_vma_check() which is used by khugepaged.  We
> > definitely don't have to maintain two similar checks, so kill
> > transparent_hugepage_active().
>
> I never realized smaps was the only user! Great!
>
> > Also move hugepage_vma_check() to huge_memory.c and huge_mm.h since it
> > is not only for khugepaged anymore.
> >
> > Signed-off-by: Yang Shi <shy828301@gmail.com>
> > ---
> >  fs/proc/task_mmu.c         |  2 +-
> >  include/linux/huge_mm.h    | 16 +++++++-----
> >  include/linux/khugepaged.h |  4 +--
> >  mm/huge_memory.c           | 50 ++++++++++++++++++++++++++++++++-----
> >  mm/khugepaged.c            | 51 +++-----------------------------------
> >  5 files changed, 60 insertions(+), 63 deletions(-)
> >
> > diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> > index 2dd8c8a66924..fd79566e204c 100644
> > --- a/fs/proc/task_mmu.c
> > +++ b/fs/proc/task_mmu.c
> > @@ -860,7 +860,7 @@ static int show_smap(struct seq_file *m, void *v)
> >         __show_smap(m, &mss, false);
> >
> >         seq_printf(m, "THPeligible:    %d\n",
> > -                  transparent_hugepage_active(vma));
> > +                  hugepage_vma_check(vma, vma->vm_flags, true));
> >
> >         if (arch_pkeys_enabled())
> >                 seq_printf(m, "ProtectionKey:  %8u\n", vma_pkey(vma));
> > diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> > index 79d5919beb83..f561c3e16def 100644
> > --- a/include/linux/huge_mm.h
> > +++ b/include/linux/huge_mm.h
> > @@ -209,7 +209,9 @@ static inline bool file_thp_enabled(struct vm_area_struct *vma)
> >                !inode_is_open_for_write(inode) && S_ISREG(inode->i_mode);
> >  }
> >
> > -bool transparent_hugepage_active(struct vm_area_struct *vma);
> > +bool hugepage_vma_check(struct vm_area_struct *vma,
> > +                       unsigned long vm_flags,
> > +                       bool smaps);
> >
> >  #define transparent_hugepage_use_zero_page()                           \
> >         (transparent_hugepage_flags &                                   \
> > @@ -358,11 +360,6 @@ static inline bool __transparent_hugepage_enabled(struct vm_area_struct *vma)
> >         return false;
> >  }
> >
> > -static inline bool transparent_hugepage_active(struct vm_area_struct *vma)
> > -{
> > -       return false;
> > -}
> > -
> >  static inline bool transhuge_vma_size_ok(struct vm_area_struct *vma)
> >  {
> >         return false;
> > @@ -380,6 +377,13 @@ static inline bool transhuge_vma_enabled(struct vm_area_struct *vma,
> >         return false;
> >  }
> >
> > +static inline bool hugepage_vma_check(struct vm_area_struct *vma,
> > +                                      unsigned long vm_flags,
> > +                                      bool smaps)
> > +{
> > +       return false;
> > +}
> > +
> >  static inline void prep_transhuge_page(struct page *page) {}
> >
> >  #define transparent_hugepage_flags 0UL
> > diff --git a/include/linux/khugepaged.h b/include/linux/khugepaged.h
> > index 392d34c3c59a..8a6452e089ca 100644
> > --- a/include/linux/khugepaged.h
> > +++ b/include/linux/khugepaged.h
> > @@ -10,8 +10,6 @@ extern struct attribute_group khugepaged_attr_group;
> >  extern int khugepaged_init(void);
> >  extern void khugepaged_destroy(void);
> >  extern int start_stop_khugepaged(void);
> > -extern bool hugepage_vma_check(struct vm_area_struct *vma,
> > -                              unsigned long vm_flags);
> >  extern void __khugepaged_enter(struct mm_struct *mm);
> >  extern void __khugepaged_exit(struct mm_struct *mm);
> >  extern void khugepaged_enter_vma(struct vm_area_struct *vma,
> > @@ -57,7 +55,7 @@ static inline void khugepaged_enter(struct vm_area_struct *vma,
> >  {
> >         if (!test_bit(MMF_VM_HUGEPAGE, &vma->vm_mm->flags) &&
> >             khugepaged_enabled()) {
> > -               if (hugepage_vma_check(vma, vm_flags))
> > +               if (hugepage_vma_check(vma, vm_flags, false))
> >                         __khugepaged_enter(vma->vm_mm);
> >         }
> >  }
> > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > index 36ada544e494..bc8370856e85 100644
> > --- a/mm/huge_memory.c
> > +++ b/mm/huge_memory.c
> > @@ -69,18 +69,56 @@ static atomic_t huge_zero_refcount;
> >  struct page *huge_zero_page __read_mostly;
> >  unsigned long huge_zero_pfn __read_mostly = ~0UL;
> >
> > -bool transparent_hugepage_active(struct vm_area_struct *vma)
> > +bool hugepage_vma_check(struct vm_area_struct *vma,
> > +                       unsigned long vm_flags,
> > +                       bool smaps)
> >  {
> > +       if (!transhuge_vma_enabled(vma, vm_flags))
> > +               return false;
> > +
> > +       if (vm_flags & VM_NO_KHUGEPAGED)
> > +               return false;
> > +
> > +       /* Don't run khugepaged against DAX vma */
> > +       if (vma_is_dax(vma))
> > +               return false;
> > +
> > +       if (vma->vm_file && !IS_ALIGNED((vma->vm_start >> PAGE_SHIFT) -
> > +                               vma->vm_pgoff, HPAGE_PMD_NR))
> > +               return false;
> > +
> >         if (!transhuge_vma_size_ok(vma))
> >                 return false;
> > -       if (vma_is_anonymous(vma))
> > -               return __transparent_hugepage_enabled(vma);
> > -       if (vma_is_shmem(vma))
> > +
> > +       /* Enabled via shmem mount options or sysfs settings. */
> > +       if (shmem_file(vma->vm_file))
> >                 return shmem_huge_enabled(vma);
> > -       if (transhuge_vma_enabled(vma, vma->vm_flags) && file_thp_enabled(vma))
> > +
> > +       if (!khugepaged_enabled())
> > +               return false;
> > +
> > +       /* THP settings require madvise. */
> > +       if (!(vm_flags & VM_HUGEPAGE) && !khugepaged_always())
> > +               return false;
> > +
> > +       /* Only regular file is valid */
> > +       if (file_thp_enabled(vma))
> >                 return true;
> >
> > -       return false;
> > +       if (!vma_is_anonymous(vma))
> > +               return false;
> > +
> > +       if (vma_is_temporary_stack(vma))
> > +               return false;
> > +
> > +       /*
> > +        * THPeligible bit of smaps should show 1 for proper VMAs even
> > +        * though anon_vma is not initialized yet.
> > +        */
> > +       if (!vma->anon_vma)
> > +               return smaps;
> > +
> > +       return true;
> >  }
>
> There are a few cases where the return value for smaps will be
> different from before. I presume this won't be an issue, and that any
> difference resulting from this change is actually a positive
> difference, given it more accurately reflects the thp eligibility of
> the vma? For example, a VM_NO_KHUGEPAGED-marked vma might now show 0
> where it otherwise showed 1.

Yes, returning 1 for VM_NO_KHUGEPAGED vmas is wrong. Actually, TBH I
suspect very few people actually use this bit. Anyway, I will
elaborate on this in the commit log.

>
> >  static bool get_huge_zero_page(void)
> > diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> > index ca1754d3a827..aa0769e3b0d9 100644
> > --- a/mm/khugepaged.c
> > +++ b/mm/khugepaged.c
> > @@ -437,49 +437,6 @@ static inline int khugepaged_test_exit(struct mm_struct *mm)
> >         return atomic_read(&mm->mm_users) == 0;
> >  }
> >
> > -bool hugepage_vma_check(struct vm_area_struct *vma,
> > -                       unsigned long vm_flags)
> > -{
> > -       if (!transhuge_vma_enabled(vma, vm_flags))
> > -               return false;
> > -
> > -       if (vm_flags & VM_NO_KHUGEPAGED)
> > -               return false;
> > -
> > -       /* Don't run khugepaged against DAX vma */
> > -       if (vma_is_dax(vma))
> > -               return false;
> > -
> > -       if (vma->vm_file && !IS_ALIGNED((vma->vm_start >> PAGE_SHIFT) -
> > -                               vma->vm_pgoff, HPAGE_PMD_NR))
> > -               return false;
> > -
> > -       if (!transhuge_vma_size_ok(vma))
> > -               return false;
> > -
> > -       /* Enabled via shmem mount options or sysfs settings. */
> > -       if (shmem_file(vma->vm_file))
> > -               return shmem_huge_enabled(vma);
> > -
> > -       if (!khugepaged_enabled())
> > -               return false;
> > -
> > -       /* THP settings require madvise. */
> > -       if (!(vm_flags & VM_HUGEPAGE) && !khugepaged_always())
> > -               return false;
> > -
> > -       /* Only regular file is valid */
> > -       if (file_thp_enabled(vma))
> > -               return true;
> > -
> > -       if (!vma->anon_vma || !vma_is_anonymous(vma))
> > -               return false;
> > -       if (vma_is_temporary_stack(vma))
> > -               return false;
> > -
> > -       return true;
> > -}
> > -
> >  void __khugepaged_enter(struct mm_struct *mm)
> >  {
> >         struct mm_slot *mm_slot;
> > @@ -516,7 +473,7 @@ void khugepaged_enter_vma(struct vm_area_struct *vma,
> >  {
> >         if (!test_bit(MMF_VM_HUGEPAGE, &vma->vm_mm->flags) &&
> >             khugepaged_enabled()) {
> > -               if (hugepage_vma_check(vma, vm_flags))
> > +               if (hugepage_vma_check(vma, vm_flags, false))
> >                         __khugepaged_enter(vma->vm_mm);
> >         }
> >  }
> > @@ -961,7 +918,7 @@ static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address,
> >
> >         if (!transhuge_vma_suitable(vma, address))
> >                 return SCAN_ADDRESS_RANGE;
> > -       if (!hugepage_vma_check(vma, vma->vm_flags))
> > +       if (!hugepage_vma_check(vma, vma->vm_flags, false))
> >                 return SCAN_VMA_CHECK;
> >         return 0;
> >  }
> > @@ -1442,7 +1399,7 @@ void collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr)
> >          * the valid THP. Add extra VM_HUGEPAGE so hugepage_vma_check()
> >          * will not fail the vma for missing VM_HUGEPAGE
> >          */
> > -       if (!hugepage_vma_check(vma, vma->vm_flags | VM_HUGEPAGE))
> > +       if (!hugepage_vma_check(vma, vma->vm_flags | VM_HUGEPAGE, false))
> >                 return;
> >
> >         /* Keep pmd pgtable for uffd-wp; see comment in retract_page_tables() */
> > @@ -2132,7 +2089,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages,
> >                         progress++;
> >                         break;
> >                 }
> > -               if (!hugepage_vma_check(vma, vma->vm_flags)) {
> > +               if (!hugepage_vma_check(vma, vma->vm_flags, false)) {
> >  skip:
> >                         progress++;
> >                         continue;
> > --
> > 2.26.3
> >
> >

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [v3 PATCH 6/7] mm: thp: kill __transhuge_page_enabled()
  2022-06-10  2:22   ` Zach O'Keefe
@ 2022-06-10 17:24     ` Yang Shi
  2022-06-10 21:07       ` Yang Shi
  0 siblings, 1 reply; 40+ messages in thread
From: Yang Shi @ 2022-06-10 17:24 UTC (permalink / raw)
  To: Zach O'Keefe
  Cc: Vlastimil Babka, Kirill A. Shutemov, Matthew Wilcox,
	Andrew Morton, Linux MM, Linux Kernel Mailing List

On Thu, Jun 9, 2022 at 7:22 PM Zach O'Keefe <zokeefe@google.com> wrote:
>
> On Mon, Jun 6, 2022 at 2:44 PM Yang Shi <shy828301@gmail.com> wrote:
> >
> > The page fault path checks THP eligibility with
> > __transhuge_page_enabled() which does the similar thing as
> > hugepage_vma_check(), so use hugepage_vma_check() instead.
> >
> > However page fault allows DAX and !anon_vma cases, so added a new flag,
> > in_pf, to hugepage_vma_check() to make page fault work correctly.
> >
> > The in_pf flag is also used to skip shmem and file THP for page fault
> > since shmem handles THP in its own shmem_fault() and file THP allocation
> > on fault is not supported yet.
> >
> > Also remove hugepage_vma_enabled() since hugepage_vma_check() is the
> > only caller now, it is not necessary to have a helper function.
> >
> > Signed-off-by: Yang Shi <shy828301@gmail.com>
> > ---
> >  fs/proc/task_mmu.c         |  2 +-
> >  include/linux/huge_mm.h    | 57 ++------------------------------------
> >  include/linux/khugepaged.h |  2 +-
> >  mm/huge_memory.c           | 25 ++++++++++++-----
> >  mm/khugepaged.c            |  8 +++---
> >  mm/memory.c                |  7 +++--
> >  6 files changed, 31 insertions(+), 70 deletions(-)
> >
> > diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> > index fd79566e204c..a0850303baec 100644
> > --- a/fs/proc/task_mmu.c
> > +++ b/fs/proc/task_mmu.c
> > @@ -860,7 +860,7 @@ static int show_smap(struct seq_file *m, void *v)
> >         __show_smap(m, &mss, false);
> >
> >         seq_printf(m, "THPeligible:    %d\n",
> > -                  hugepage_vma_check(vma, vma->vm_flags, true));
> > +                  hugepage_vma_check(vma, vma->vm_flags, true, false));
> >
> >         if (arch_pkeys_enabled())
> >                 seq_printf(m, "ProtectionKey:  %8u\n", vma_pkey(vma));
> > diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> > index f561c3e16def..d478e8875023 100644
> > --- a/include/linux/huge_mm.h
> > +++ b/include/linux/huge_mm.h
> > @@ -153,48 +153,6 @@ static inline bool transhuge_vma_suitable(struct vm_area_struct *vma,
> >         return true;
> >  }
> >
> > -static inline bool transhuge_vma_enabled(struct vm_area_struct *vma,
> > -                                         unsigned long vm_flags)
> > -{
> > -       /* Explicitly disabled through madvise. */
> > -       if ((vm_flags & VM_NOHUGEPAGE) ||
> > -           test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags))
> > -               return false;
> > -       return true;
> > -}
> > -
> > -/*
> > - * to be used on vmas which are known to support THP.
> > - * Use transparent_hugepage_active otherwise
> > - */
> > -static inline bool __transparent_hugepage_enabled(struct vm_area_struct *vma)
> > -{
> > -
> > -       /*
> > -        * If the hardware/firmware marked hugepage support disabled.
> > -        */
> > -       if (transparent_hugepage_flags & (1 << TRANSPARENT_HUGEPAGE_NEVER_DAX))
> > -               return false;
> > -
> > -       if (!transhuge_vma_enabled(vma, vma->vm_flags))
> > -               return false;
> > -
> > -       if (vma_is_temporary_stack(vma))
> > -               return false;
> > -
> > -       if (transparent_hugepage_flags & (1 << TRANSPARENT_HUGEPAGE_FLAG))
> > -               return true;
> > -
> > -       if (vma_is_dax(vma))
> > -               return true;
> > -
> > -       if (transparent_hugepage_flags &
> > -                               (1 << TRANSPARENT_HUGEPAGE_REQ_MADV_FLAG))
> > -               return !!(vma->vm_flags & VM_HUGEPAGE);
> > -
> > -       return false;
> > -}
> > -
> >  static inline bool file_thp_enabled(struct vm_area_struct *vma)
> >  {
> >         struct inode *inode;
> > @@ -211,7 +169,7 @@ static inline bool file_thp_enabled(struct vm_area_struct *vma)
> >
> >  bool hugepage_vma_check(struct vm_area_struct *vma,
> >                         unsigned long vm_flags,
> > -                       bool smaps);
> > +                       bool smaps, bool in_pf);
> >
> >  #define transparent_hugepage_use_zero_page()                           \
> >         (transparent_hugepage_flags &                                   \
> > @@ -355,11 +313,6 @@ static inline bool folio_test_pmd_mappable(struct folio *folio)
> >         return false;
> >  }
> >
> > -static inline bool __transparent_hugepage_enabled(struct vm_area_struct *vma)
> > -{
> > -       return false;
> > -}
> > -
> >  static inline bool transhuge_vma_size_ok(struct vm_area_struct *vma)
> >  {
> >         return false;
> > @@ -371,15 +324,9 @@ static inline bool transhuge_vma_suitable(struct vm_area_struct *vma,
> >         return false;
> >  }
> >
> > -static inline bool transhuge_vma_enabled(struct vm_area_struct *vma,
> > -                                         unsigned long vm_flags)
> > -{
> > -       return false;
> > -}
> > -
> >  static inline bool hugepage_vma_check(struct vm_area_struct *vma,
> >                                        unsigned long vm_flags,
> > -                                      bool smaps)
> > +                                      bool smaps, bool in_pf)
> >  {
> >         return false;
> >  }
> > diff --git a/include/linux/khugepaged.h b/include/linux/khugepaged.h
> > index 8a6452e089ca..e047be601268 100644
> > --- a/include/linux/khugepaged.h
> > +++ b/include/linux/khugepaged.h
> > @@ -55,7 +55,7 @@ static inline void khugepaged_enter(struct vm_area_struct *vma,
> >  {
> >         if (!test_bit(MMF_VM_HUGEPAGE, &vma->vm_mm->flags) &&
> >             khugepaged_enabled()) {
> > -               if (hugepage_vma_check(vma, vm_flags, false))
> > +               if (hugepage_vma_check(vma, vm_flags, false, false))
> >                         __khugepaged_enter(vma->vm_mm);
> >         }
> >  }
> > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > index bc8370856e85..b95786ada466 100644
> > --- a/mm/huge_memory.c
> > +++ b/mm/huge_memory.c
> > @@ -71,17 +71,25 @@ unsigned long huge_zero_pfn __read_mostly = ~0UL;
> >
> >  bool hugepage_vma_check(struct vm_area_struct *vma,
> >                         unsigned long vm_flags,
> > -                       bool smaps)
> > +                       bool smaps, bool in_pf)
> >  {
> > -       if (!transhuge_vma_enabled(vma, vm_flags))
> > +       /* Explicitly disabled through madvise or prctl. */
>
> Or s390 kvm (not that this has to be exhaustively maintained).
>
> > +       if ((vm_flags & VM_NOHUGEPAGE) ||
> > +           test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags))
> > +               return false;
> > +       /*
> > +        * If the hardware/firmware marked hugepage support disabled.
> > +        */
> > +       if (transparent_hugepage_flags & (1 << TRANSPARENT_HUGEPAGE_NEVER_DAX))
> >                 return false;
>
> This introduces an extra check for khugepaged path. I don't know
> enough about TRANSPARENT_HUGEPAGE_NEVER_DAX, but I assume this is ok?
> What would have happened previously if khugepaged tried to collapse
> this memory?

Please refer to commit bae849538157 ("mm/pmem: avoid inserting
hugepage PTE entry with fsdax if hugepage support is disabled") for
why this flag was introduced.

It is set if the hardware doesn't support hugepages, and khugepaged
doesn't collapse anything in that case since khugepaged won't be
started at all.

But this flag needs to be checked in the page fault path.
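
(For reference, IIRC that commit sets the flag at init time when the
hardware has no THP support, roughly like the below - paraphrased from
memory, so please double check the actual code:

	static int __init hugepage_init(void)
	{
		if (!has_transparent_hugepage()) {
			/*
			 * Hardware doesn't support hugepages, hence disable
			 * DAX PMD support.
			 */
			transparent_hugepage_flags = 1 << TRANSPARENT_HUGEPAGE_NEVER_DAX;
			return -EINVAL;
		}
		/* ... rest of init, including starting khugepaged ... */
	}

so khugepaged never gets started on such machines, but a DAX fault can
still reach the huge fault path and has to see this flag.)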

>
> > +       /* Special VMA and hugetlb VMA */
> >         if (vm_flags & VM_NO_KHUGEPAGED)
> >                 return false;
>
> This adds an extra check along the fault path. Is it also safe to add?

I think it is safe since hugepage_vma_check() is just used by THP.
Hugetlb has its own page fault handler.
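
(For reference, hugetlb vmas are dispatched before __handle_mm_fault()
is ever reached - roughly, paraphrasing handle_mm_fault() from memory:

	if (unlikely(is_vm_hugetlb_page(vma)))
		ret = hugetlb_fault(vma->vm_mm, vma, address, flags);
	else
		ret = __handle_mm_fault(vma, address, flags);

so the VM_NO_KHUGEPAGED check never sees a hugetlb vma from the fault
path anyway.)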

>
> > -       /* Don't run khugepaged against DAX vma */
> > +       /* khugepaged doesn't collapse DAX vma, but page fault is fine. */
> >         if (vma_is_dax(vma))
> > -               return false;
> > +               return in_pf;
>
> I assume vma_is_temporary_stack() and vma_is_dax() is mutually exclusive.

I think so.

>
> >         if (vma->vm_file && !IS_ALIGNED((vma->vm_start >> PAGE_SHIFT) -
> >                                 vma->vm_pgoff, HPAGE_PMD_NR))
> > @@ -91,7 +99,7 @@ bool hugepage_vma_check(struct vm_area_struct *vma,
> >                 return false;
> >
> >         /* Enabled via shmem mount options or sysfs settings. */
> > -       if (shmem_file(vma->vm_file))
> > +       if (!in_pf && shmem_file(vma->vm_file))
> >                 return shmem_huge_enabled(vma);
>
> Will shmem_file() ever be true in the fault path? Or is this just an
> optimization?

It could be true. But shmem has its own implementation for huge page
fault and doesn't implement huge_fault() in its vm_operations, so it
will fall back even though "in_pf" is not checked.

But xfs does have huge_fault() implemented, so it may try to allocate
THP for non-DAX xfs files. So the "in_pf" flag is introduced to handle
this. Since we need this flag anyway, why not use it to return earlier
for shmem instead of relying on the fallback.

Anyway this is all because __transparent_hugepage_enabled() is replaced
by hugepage_vma_check().
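
(The fallback I mean is in the fault path itself - roughly,
paraphrasing create_huge_pmd() in mm/memory.c from memory:

	static vm_fault_t create_huge_pmd(struct vm_fault *vmf)
	{
		if (vma_is_anonymous(vmf->vma))
			return do_huge_pmd_anonymous_page(vmf);
		if (vmf->vma->vm_ops->huge_fault)
			return vmf->vma->vm_ops->huge_fault(vmf, PE_SIZE_PMD);
		return VM_FAULT_FALLBACK;
	}

shmem has no ->huge_fault, so it returns VM_FAULT_FALLBACK here and the
huge page is handled by shmem_fault() later, while xfs does have
->huge_fault, hence the need for the "in_pf" check.)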

> >         if (!khugepaged_enabled())
> > @@ -102,7 +110,7 @@ bool hugepage_vma_check(struct vm_area_struct *vma,
> >                 return false;
> >
> >         /* Only regular file is valid */
> > -       if (file_thp_enabled(vma))
> > +       if (!in_pf && file_thp_enabled(vma))
> >                 return true;
>
> Likewise for file_thp_enabled()

Yes, same as the above.

>
> >         if (!vma_is_anonymous(vma))
> > @@ -114,9 +122,12 @@ bool hugepage_vma_check(struct vm_area_struct *vma,
> >         /*
> >          * THPeligible bit of smaps should show 1 for proper VMAs even
> >          * though anon_vma is not initialized yet.
> > +        *
> > +        * Allow page fault since anon_vma may be not initialized until
> > +        * the first page fault.
> >          */
> >         if (!vma->anon_vma)
> > -               return smaps;
> > +               return (smaps || in_pf);
> >
> >         return true;
> >  }
> > diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> > index aa0769e3b0d9..ab6183c5489f 100644
> > --- a/mm/khugepaged.c
> > +++ b/mm/khugepaged.c
> > @@ -473,7 +473,7 @@ void khugepaged_enter_vma(struct vm_area_struct *vma,
> >  {
> >         if (!test_bit(MMF_VM_HUGEPAGE, &vma->vm_mm->flags) &&
> >             khugepaged_enabled()) {
> > -               if (hugepage_vma_check(vma, vm_flags, false))
> > +               if (hugepage_vma_check(vma, vm_flags, false, false))
> >                         __khugepaged_enter(vma->vm_mm);
> >         }
> >  }
> > @@ -918,7 +918,7 @@ static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address,
> >
> >         if (!transhuge_vma_suitable(vma, address))
> >                 return SCAN_ADDRESS_RANGE;
> > -       if (!hugepage_vma_check(vma, vma->vm_flags, false))
> > +       if (!hugepage_vma_check(vma, vma->vm_flags, false, false))
> >                 return SCAN_VMA_CHECK;
> >         return 0;
> >  }
> > @@ -1399,7 +1399,7 @@ void collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr)
> >          * the valid THP. Add extra VM_HUGEPAGE so hugepage_vma_check()
> >          * will not fail the vma for missing VM_HUGEPAGE
> >          */
> > -       if (!hugepage_vma_check(vma, vma->vm_flags | VM_HUGEPAGE, false))
> > +       if (!hugepage_vma_check(vma, vma->vm_flags | VM_HUGEPAGE, false, false))
> >                 return;
> >
> >         /* Keep pmd pgtable for uffd-wp; see comment in retract_page_tables() */
> > @@ -2089,7 +2089,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages,
> >                         progress++;
> >                         break;
> >                 }
> > -               if (!hugepage_vma_check(vma, vma->vm_flags, false)) {
> > +               if (!hugepage_vma_check(vma, vma->vm_flags, false, false)) {
> >  skip:
> >                         progress++;
> >                         continue;
> > diff --git a/mm/memory.c b/mm/memory.c
> > index bc5d40eec5d5..673f7561a30a 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -4962,6 +4962,7 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
> >                 .gfp_mask = __get_fault_gfp_mask(vma),
> >         };
> >         struct mm_struct *mm = vma->vm_mm;
> > +       unsigned long vm_flags = vma->vm_flags;
> >         pgd_t *pgd;
> >         p4d_t *p4d;
> >         vm_fault_t ret;
> > @@ -4975,7 +4976,8 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
> >         if (!vmf.pud)
> >                 return VM_FAULT_OOM;
> >  retry_pud:
> > -       if (pud_none(*vmf.pud) && __transparent_hugepage_enabled(vma)) {
> > +       if (pud_none(*vmf.pud) &&
> > +           hugepage_vma_check(vma, vm_flags, false, true)) {
> >                 ret = create_huge_pud(&vmf);
> >                 if (!(ret & VM_FAULT_FALLBACK))
> >                         return ret;
> > @@ -5008,7 +5010,8 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
> >         if (pud_trans_unstable(vmf.pud))
> >                 goto retry_pud;
> >
> > -       if (pmd_none(*vmf.pmd) && __transparent_hugepage_enabled(vma)) {
> > +       if (pmd_none(*vmf.pmd) &&
> > +           hugepage_vma_check(vma, vm_flags, false, true)) {
> >                 ret = create_huge_pmd(&vmf);
> >                 if (!(ret & VM_FAULT_FALLBACK))
> >                         return ret;
> > --
> > 2.26.3
> >
> >

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [v3 PATCH 6/7] mm: thp: kill __transhuge_page_enabled()
  2022-06-10 17:24     ` Yang Shi
@ 2022-06-10 21:07       ` Yang Shi
  2022-06-13 14:54         ` Zach O'Keefe
  0 siblings, 1 reply; 40+ messages in thread
From: Yang Shi @ 2022-06-10 21:07 UTC (permalink / raw)
  To: Zach O'Keefe
  Cc: Vlastimil Babka, Kirill A. Shutemov, Matthew Wilcox,
	Andrew Morton, Linux MM, Linux Kernel Mailing List

On Fri, Jun 10, 2022 at 10:24 AM Yang Shi <shy828301@gmail.com> wrote:
>
> On Thu, Jun 9, 2022 at 7:22 PM Zach O'Keefe <zokeefe@google.com> wrote:
> >
> > On Mon, Jun 6, 2022 at 2:44 PM Yang Shi <shy828301@gmail.com> wrote:
> > >
> > > The page fault path checks THP eligibility with
> > > __transhuge_page_enabled() which does the similar thing as
> > > hugepage_vma_check(), so use hugepage_vma_check() instead.
> > >
> > > However page fault allows DAX and !anon_vma cases, so added a new flag,
> > > in_pf, to hugepage_vma_check() to make page fault work correctly.
> > >
> > > The in_pf flag is also used to skip shmem and file THP for page fault
> > > since shmem handles THP in its own shmem_fault() and file THP allocation
> > > on fault is not supported yet.
> > >
> > > Also remove hugepage_vma_enabled() since hugepage_vma_check() is the
> > > only caller now, it is not necessary to have a helper function.
> > >
> > > Signed-off-by: Yang Shi <shy828301@gmail.com>
> > > ---
> > >  fs/proc/task_mmu.c         |  2 +-
> > >  include/linux/huge_mm.h    | 57 ++------------------------------------
> > >  include/linux/khugepaged.h |  2 +-
> > >  mm/huge_memory.c           | 25 ++++++++++++-----
> > >  mm/khugepaged.c            |  8 +++---
> > >  mm/memory.c                |  7 +++--
> > >  6 files changed, 31 insertions(+), 70 deletions(-)
> > >
> > > diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> > > index fd79566e204c..a0850303baec 100644
> > > --- a/fs/proc/task_mmu.c
> > > +++ b/fs/proc/task_mmu.c
> > > @@ -860,7 +860,7 @@ static int show_smap(struct seq_file *m, void *v)
> > >         __show_smap(m, &mss, false);
> > >
> > >         seq_printf(m, "THPeligible:    %d\n",
> > > -                  hugepage_vma_check(vma, vma->vm_flags, true));
> > > +                  hugepage_vma_check(vma, vma->vm_flags, true, false));
> > >
> > >         if (arch_pkeys_enabled())
> > >                 seq_printf(m, "ProtectionKey:  %8u\n", vma_pkey(vma));
> > > diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> > > index f561c3e16def..d478e8875023 100644
> > > --- a/include/linux/huge_mm.h
> > > +++ b/include/linux/huge_mm.h
> > > @@ -153,48 +153,6 @@ static inline bool transhuge_vma_suitable(struct vm_area_struct *vma,
> > >         return true;
> > >  }
> > >
> > > -static inline bool transhuge_vma_enabled(struct vm_area_struct *vma,
> > > -                                         unsigned long vm_flags)
> > > -{
> > > -       /* Explicitly disabled through madvise. */
> > > -       if ((vm_flags & VM_NOHUGEPAGE) ||
> > > -           test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags))
> > > -               return false;
> > > -       return true;
> > > -}
> > > -
> > > -/*
> > > - * to be used on vmas which are known to support THP.
> > > - * Use transparent_hugepage_active otherwise
> > > - */
> > > -static inline bool __transparent_hugepage_enabled(struct vm_area_struct *vma)
> > > -{
> > > -
> > > -       /*
> > > -        * If the hardware/firmware marked hugepage support disabled.
> > > -        */
> > > -       if (transparent_hugepage_flags & (1 << TRANSPARENT_HUGEPAGE_NEVER_DAX))
> > > -               return false;
> > > -
> > > -       if (!transhuge_vma_enabled(vma, vma->vm_flags))
> > > -               return false;
> > > -
> > > -       if (vma_is_temporary_stack(vma))
> > > -               return false;
> > > -
> > > -       if (transparent_hugepage_flags & (1 << TRANSPARENT_HUGEPAGE_FLAG))
> > > -               return true;
> > > -
> > > -       if (vma_is_dax(vma))
> > > -               return true;
> > > -
> > > -       if (transparent_hugepage_flags &
> > > -                               (1 << TRANSPARENT_HUGEPAGE_REQ_MADV_FLAG))
> > > -               return !!(vma->vm_flags & VM_HUGEPAGE);
> > > -
> > > -       return false;
> > > -}
> > > -
> > >  static inline bool file_thp_enabled(struct vm_area_struct *vma)
> > >  {
> > >         struct inode *inode;
> > > @@ -211,7 +169,7 @@ static inline bool file_thp_enabled(struct vm_area_struct *vma)
> > >
> > >  bool hugepage_vma_check(struct vm_area_struct *vma,
> > >                         unsigned long vm_flags,
> > > -                       bool smaps);
> > > +                       bool smaps, bool in_pf);
> > >
> > >  #define transparent_hugepage_use_zero_page()                           \
> > >         (transparent_hugepage_flags &                                   \
> > > @@ -355,11 +313,6 @@ static inline bool folio_test_pmd_mappable(struct folio *folio)
> > >         return false;
> > >  }
> > >
> > > -static inline bool __transparent_hugepage_enabled(struct vm_area_struct *vma)
> > > -{
> > > -       return false;
> > > -}
> > > -
> > >  static inline bool transhuge_vma_size_ok(struct vm_area_struct *vma)
> > >  {
> > >         return false;
> > > @@ -371,15 +324,9 @@ static inline bool transhuge_vma_suitable(struct vm_area_struct *vma,
> > >         return false;
> > >  }
> > >
> > > -static inline bool transhuge_vma_enabled(struct vm_area_struct *vma,
> > > -                                         unsigned long vm_flags)
> > > -{
> > > -       return false;
> > > -}
> > > -
> > >  static inline bool hugepage_vma_check(struct vm_area_struct *vma,
> > >                                        unsigned long vm_flags,
> > > -                                      bool smaps)
> > > +                                      bool smaps, bool in_pf)
> > >  {
> > >         return false;
> > >  }
> > > diff --git a/include/linux/khugepaged.h b/include/linux/khugepaged.h
> > > index 8a6452e089ca..e047be601268 100644
> > > --- a/include/linux/khugepaged.h
> > > +++ b/include/linux/khugepaged.h
> > > @@ -55,7 +55,7 @@ static inline void khugepaged_enter(struct vm_area_struct *vma,
> > >  {
> > >         if (!test_bit(MMF_VM_HUGEPAGE, &vma->vm_mm->flags) &&
> > >             khugepaged_enabled()) {
> > > -               if (hugepage_vma_check(vma, vm_flags, false))
> > > +               if (hugepage_vma_check(vma, vm_flags, false, false))
> > >                         __khugepaged_enter(vma->vm_mm);
> > >         }
> > >  }
> > > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > > index bc8370856e85..b95786ada466 100644
> > > --- a/mm/huge_memory.c
> > > +++ b/mm/huge_memory.c
> > > @@ -71,17 +71,25 @@ unsigned long huge_zero_pfn __read_mostly = ~0UL;
> > >
> > >  bool hugepage_vma_check(struct vm_area_struct *vma,
> > >                         unsigned long vm_flags,
> > > -                       bool smaps)
> > > +                       bool smaps, bool in_pf)
> > >  {
> > > -       if (!transhuge_vma_enabled(vma, vm_flags))
> > > +       /* Explicitly disabled through madvise or prctl. */
> >
> > Or s390 kvm (not that this has to be exhaustively maintained).
> >
> > > +       if ((vm_flags & VM_NOHUGEPAGE) ||
> > > +           test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags))
> > > +               return false;
> > > +       /*
> > > +        * If the hardware/firmware marked hugepage support disabled.
> > > +        */
> > > +       if (transparent_hugepage_flags & (1 << TRANSPARENT_HUGEPAGE_NEVER_DAX))
> > >                 return false;
> >
> > This introduces an extra check for khugepaged path. I don't know
> > enough about TRANSPARENT_HUGEPAGE_NEVER_DAX, but I assume this is ok?
> > What would have happened previously if khugepaged tried to collapse
> > this memory?
>
> Please refer to commit bae849538157 ("mm/pmem: avoid inserting
> hugepage PTE entry with fsdax if hugepage support is disabled") for
> why this flag was introduced.
>
> It is set if hardware doesn't support hugepages, and khugepaged
> doesn't collapse since khugepaged won't be started at all.
>
> But this flag needs to be checked in the page fault path.
>
> >
> > > +       /* Special VMA and hugetlb VMA */
> > >         if (vm_flags & VM_NO_KHUGEPAGED)
> > >                 return false;
> >
> > This adds an extra check along the fault path. Is it also safe to add?
>
> I think it is safe since hugepage_vma_check() is just used by THP.
> Hugetlb has its own page fault handler.

I just found one exception. Fuse dax has VM_MIXEDMAP set for its
vmas, so this check should be moved after the vma_is_dax() check.

AFAICT, only dax supports huge_fault(), and no dax vmas other than
fuse have any VM_SPECIAL flags set.
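
Something like the below ordering, i.e. just moving the existing checks
around (a sketch, not a tested patch):

	/* khugepaged doesn't collapse DAX vma, but page fault is fine. */
	if (vma_is_dax(vma))
		return in_pf;

	/*
	 * Special VMA and hugetlb VMA.
	 * Must be checked after dax since some dax mappings (fuse dax)
	 * have VM_MIXEDMAP set, which is covered by VM_NO_KHUGEPAGED.
	 */
	if (vm_flags & VM_NO_KHUGEPAGED)
		return false;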

>
> >
> > > -       /* Don't run khugepaged against DAX vma */
> > > +       /* khugepaged doesn't collapse DAX vma, but page fault is fine. */
> > >         if (vma_is_dax(vma))
> > > -               return false;
> > > +               return in_pf;
> >
> > I assume vma_is_temporary_stack() and vma_is_dax() is mutually exclusive.
>
> I think so.
>
> >
> > >         if (vma->vm_file && !IS_ALIGNED((vma->vm_start >> PAGE_SHIFT) -
> > >                                 vma->vm_pgoff, HPAGE_PMD_NR))
> > > @@ -91,7 +99,7 @@ bool hugepage_vma_check(struct vm_area_struct *vma,
> > >                 return false;
> > >
> > >         /* Enabled via shmem mount options or sysfs settings. */
> > > -       if (shmem_file(vma->vm_file))
> > > +       if (!in_pf && shmem_file(vma->vm_file))
> > >                 return shmem_huge_enabled(vma);
> >
> > Will shmem_file() ever be true in the fault path? Or is this just an
> > optimization?
>
> It could be true. But shmem has its own implementation for huge page
> fault and doesn't implement huge_fault() in its vm_operations, so it
> will fallback even though "in_pf" is not checked.
>
> But xfs does have huge_fault() implemented, so it may try to allocate
> THP for non-DAX xfs files. So the "in_pf" flag is introduced to handle
> this. Since we need this flag anyway, why not use it to return earlier
> for shmem instead of relying on fallback.
>
> Anyway this is all because __transparent_huge_enabled() is replaced by
> hugepage_vma_check().
>
> > >         if (!khugepaged_enabled())
> > > @@ -102,7 +110,7 @@ bool hugepage_vma_check(struct vm_area_struct *vma,
> > >                 return false;
> > >
> > >         /* Only regular file is valid */
> > > -       if (file_thp_enabled(vma))
> > > +       if (!in_pf && file_thp_enabled(vma))
> > >                 return true;
> >
> > Likewise for file_thp_enabled()
>
> Yes, same as the above.
>
> >
> > >         if (!vma_is_anonymous(vma))
> > > @@ -114,9 +122,12 @@ bool hugepage_vma_check(struct vm_area_struct *vma,
> > >         /*
> > >          * THPeligible bit of smaps should show 1 for proper VMAs even
> > >          * though anon_vma is not initialized yet.
> > > +        *
> > > +        * Allow page fault since anon_vma may be not initialized until
> > > +        * the first page fault.
> > >          */
> > >         if (!vma->anon_vma)
> > > -               return smaps;
> > > +               return (smaps || in_pf);
> > >
> > >         return true;
> > >  }
> > > diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> > > index aa0769e3b0d9..ab6183c5489f 100644
> > > --- a/mm/khugepaged.c
> > > +++ b/mm/khugepaged.c
> > > @@ -473,7 +473,7 @@ void khugepaged_enter_vma(struct vm_area_struct *vma,
> > >  {
> > >         if (!test_bit(MMF_VM_HUGEPAGE, &vma->vm_mm->flags) &&
> > >             khugepaged_enabled()) {
> > > -               if (hugepage_vma_check(vma, vm_flags, false))
> > > +               if (hugepage_vma_check(vma, vm_flags, false, false))
> > >                         __khugepaged_enter(vma->vm_mm);
> > >         }
> > >  }
> > > @@ -918,7 +918,7 @@ static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address,
> > >
> > >         if (!transhuge_vma_suitable(vma, address))
> > >                 return SCAN_ADDRESS_RANGE;
> > > -       if (!hugepage_vma_check(vma, vma->vm_flags, false))
> > > +       if (!hugepage_vma_check(vma, vma->vm_flags, false, false))
> > >                 return SCAN_VMA_CHECK;
> > >         return 0;
> > >  }
> > > @@ -1399,7 +1399,7 @@ void collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr)
> > >          * the valid THP. Add extra VM_HUGEPAGE so hugepage_vma_check()
> > >          * will not fail the vma for missing VM_HUGEPAGE
> > >          */
> > > -       if (!hugepage_vma_check(vma, vma->vm_flags | VM_HUGEPAGE, false))
> > > +       if (!hugepage_vma_check(vma, vma->vm_flags | VM_HUGEPAGE, false, false))
> > >                 return;
> > >
> > >         /* Keep pmd pgtable for uffd-wp; see comment in retract_page_tables() */
> > > @@ -2089,7 +2089,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages,
> > >                         progress++;
> > >                         break;
> > >                 }
> > > -               if (!hugepage_vma_check(vma, vma->vm_flags, false)) {
> > > +               if (!hugepage_vma_check(vma, vma->vm_flags, false, false)) {
> > >  skip:
> > >                         progress++;
> > >                         continue;
> > > diff --git a/mm/memory.c b/mm/memory.c
> > > index bc5d40eec5d5..673f7561a30a 100644
> > > --- a/mm/memory.c
> > > +++ b/mm/memory.c
> > > @@ -4962,6 +4962,7 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
> > >                 .gfp_mask = __get_fault_gfp_mask(vma),
> > >         };
> > >         struct mm_struct *mm = vma->vm_mm;
> > > +       unsigned long vm_flags = vma->vm_flags;
> > >         pgd_t *pgd;
> > >         p4d_t *p4d;
> > >         vm_fault_t ret;
> > > @@ -4975,7 +4976,8 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
> > >         if (!vmf.pud)
> > >                 return VM_FAULT_OOM;
> > >  retry_pud:
> > > -       if (pud_none(*vmf.pud) && __transparent_hugepage_enabled(vma)) {
> > > +       if (pud_none(*vmf.pud) &&
> > > +           hugepage_vma_check(vma, vm_flags, false, true)) {
> > >                 ret = create_huge_pud(&vmf);
> > >                 if (!(ret & VM_FAULT_FALLBACK))
> > >                         return ret;
> > > @@ -5008,7 +5010,8 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
> > >         if (pud_trans_unstable(vmf.pud))
> > >                 goto retry_pud;
> > >
> > > -       if (pmd_none(*vmf.pmd) && __transparent_hugepage_enabled(vma)) {
> > > +       if (pmd_none(*vmf.pmd) &&
> > > +           hugepage_vma_check(vma, vm_flags, false, true)) {
> > >                 ret = create_huge_pmd(&vmf);
> > >                 if (!(ret & VM_FAULT_FALLBACK))
> > >                         return ret;
> > > --
> > > 2.26.3
> > >
> > >

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [v3 PATCH 2/7] mm: thp: introduce transhuge_vma_size_ok() helper
  2022-06-10 16:38         ` Yang Shi
@ 2022-06-10 21:24           ` Yang Shi
  0 siblings, 0 replies; 40+ messages in thread
From: Yang Shi @ 2022-06-10 21:24 UTC (permalink / raw)
  To: Zach O'Keefe
  Cc: Vlastimil Babka, Kirill A. Shutemov, Matthew Wilcox,
	Andrew Morton, Linux MM, Linux Kernel Mailing List

On Fri, Jun 10, 2022 at 9:38 AM Yang Shi <shy828301@gmail.com> wrote:
>
> On Thu, Jun 9, 2022 at 5:52 PM Zach O'Keefe <zokeefe@google.com> wrote:
> >
> > On Thu, Jun 9, 2022 at 5:08 PM Yang Shi <shy828301@gmail.com> wrote:
> > >
> > > On Thu, Jun 9, 2022 at 3:21 PM Zach O'Keefe <zokeefe@google.com> wrote:
> > > >
> > > > On Mon, Jun 6, 2022 at 2:44 PM Yang Shi <shy828301@gmail.com> wrote:
> > > > >
> > > > > There are couple of places that check whether the vma size is ok for
> > > > > THP or not, they are open coded and duplicate, introduce
> > > > > transhuge_vma_size_ok() helper to do the job.
> > > > >
> > > > > Signed-off-by: Yang Shi <shy828301@gmail.com>
> > > > > ---
> > > > >  include/linux/huge_mm.h | 17 +++++++++++++++++
> > > > >  mm/huge_memory.c        |  5 +----
> > > > >  mm/khugepaged.c         | 12 ++++++------
> > > > >  3 files changed, 24 insertions(+), 10 deletions(-)
> > > > >
> > > > > diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> > > > > index 648cb3ce7099..a8f61db47f2a 100644
> > > > > --- a/include/linux/huge_mm.h
> > > > > +++ b/include/linux/huge_mm.h
> > > > > @@ -116,6 +116,18 @@ extern struct kobj_attribute shmem_enabled_attr;
> > > > >
> > > > >  extern unsigned long transparent_hugepage_flags;
> > > > >
> > > > > +/*
> > > > > + * The vma size has to be large enough to hold an aligned HPAGE_PMD_SIZE area.
> > > > > + */
> > > > > +static inline bool transhuge_vma_size_ok(struct vm_area_struct *vma)
> > > > > +{
> > > > > +       if (round_up(vma->vm_start, HPAGE_PMD_SIZE) <
> > > > > +           (vma->vm_end & HPAGE_PMD_MASK))
> > > > > +               return true;
> > > > > +
> > > > > +       return false;
> > > > > +}
> > > >
> > > > First time coming across round_up() - thanks for that - but for
> > > > symmetry, maybe also use round_down() for the end? No strong opinion -
> > > > just a suggestion given I've just discovered it.
> > >
> > > Yeah, round_down is fine too.
> > >
> > > >
> > > > >  static inline bool transhuge_vma_suitable(struct vm_area_struct *vma,
> > > > >                 unsigned long addr)
> > > > >  {
> > > > > @@ -345,6 +357,11 @@ static inline bool transparent_hugepage_active(struct vm_area_struct *vma)
> > > > >         return false;
> > > > >  }
> > > > >
> > > > > +static inline bool transhuge_vma_size_ok(struct vm_area_struct *vma)
> > > > > +{
> > > > > +       return false;
> > > > > +}
> > > > > +
> > > > >  static inline bool transhuge_vma_suitable(struct vm_area_struct *vma,
> > > > >                 unsigned long addr)
> > > > >  {
> > > > > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > > > > index 48182c8fe151..36ada544e494 100644
> > > > > --- a/mm/huge_memory.c
> > > > > +++ b/mm/huge_memory.c
> > > > > @@ -71,10 +71,7 @@ unsigned long huge_zero_pfn __read_mostly = ~0UL;
> > > > >
> > > > >  bool transparent_hugepage_active(struct vm_area_struct *vma)
> > > > >  {
> > > > > -       /* The addr is used to check if the vma size fits */
> > > > > -       unsigned long addr = (vma->vm_end & HPAGE_PMD_MASK) - HPAGE_PMD_SIZE;
> > > > > -
> > > > > -       if (!transhuge_vma_suitable(vma, addr))
> > > > > +       if (!transhuge_vma_size_ok(vma))
> > > > >                 return false;
> > > > >         if (vma_is_anonymous(vma))
> > > > >                 return __transparent_hugepage_enabled(vma);
> > > >
> > > > Do we need a check for vma->vm_pgoff alignment here, after
> > > > !vma_is_anonymous(), and now that we don't call
> > > > transhuge_vma_suitable()?
> > >
> > > Actually I was thinking about this too. But the THPeligible bit shown
> > > by smaps is a little bit ambiguous for file vma. The document says:
> > > "THPeligible" indicates whether the mapping is eligible for allocating
> > > THP pages - 1 if true, 0 otherwise.
> > >
> > > Even though it doesn't fulfill the alignment, it is still possible to
> > > get THP allocated, but just can't be PMD mapped. So the old behavior
> > > of THPeligible for file vma seems problematic, or at least doesn't
> > > match the document.
> >
> > I think the term "THP" is used ambiguously. Often, but not always, in
> > the code, folks will go out of their way to specify "hugepage-sized"
> > page vs "pmd-mapped hugepage" - but at least from my experience,
> > external documentation doesn't. Given that THP as a concept doesn't
> > make much sense without the possibility of pmd-mapping, I think
> > "THPeligible here means "pmd mappable". For example, AnonHugePages in
> > smaps means  pmd-mapped anon hugepages.
>
> Yeah, depends on the expectation.

The funny thing is I was the last one who touched THPeligible. It
seems the document needs to be updated too to make "pmd mappable"
more explicit.

>
> >
> > That all said - the following patches will delete
> > transparent_hugepage_active() anyways.
>
> Yes, how I could forget this :-( The following removal of
> transparent_hugepage_active() will restore the old behavior.
>
> >
> > > I should elaborate this in the commit log.
> > >
> > > >
> > > > > diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> > > > > index 84b9cf4b9be9..d0f8020164fc 100644
> > > > > --- a/mm/khugepaged.c
> > > > > +++ b/mm/khugepaged.c
> > > > > @@ -454,6 +454,9 @@ bool hugepage_vma_check(struct vm_area_struct *vma,
> > > > >                                 vma->vm_pgoff, HPAGE_PMD_NR))
> > > > >                 return false;
> > > > >
> > > > > +       if (!transhuge_vma_size_ok(vma))
> > > > > +               return false;
> > > > > +
> > > > >         /* Enabled via shmem mount options or sysfs settings. */
> > > > >         if (shmem_file(vma->vm_file))
> > > > >                 return shmem_huge_enabled(vma);
> > > > > @@ -512,9 +515,7 @@ void khugepaged_enter_vma(struct vm_area_struct *vma,
> > > > >                           unsigned long vm_flags)
> > > > >  {
> > > > >         if (!test_bit(MMF_VM_HUGEPAGE, &vma->vm_mm->flags) &&
> > > > > -           khugepaged_enabled() &&
> > > > > -           (((vma->vm_start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK) <
> > > > > -            (vma->vm_end & HPAGE_PMD_MASK))) {
> > > > > +           khugepaged_enabled()) {
> > > > >                 if (hugepage_vma_check(vma, vm_flags))
> > > > >                         __khugepaged_enter(vma->vm_mm);
> > > > >         }
> > > > > @@ -2142,10 +2143,9 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages,
> > > > >                         progress++;
> > > > >                         continue;
> > > > >                 }
> > > > > -               hstart = (vma->vm_start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK;
> > > > > +
> > > > > +               hstart = round_up(vma->vm_start, HPAGE_PMD_SIZE);
> > > > >                 hend = vma->vm_end & HPAGE_PMD_MASK;
> > > > > -               if (hstart >= hend)
> > > > > -                       goto skip;
> > > > >                 if (khugepaged_scan.address > hend)
> > > > >                         goto skip;
> > > > >                 if (khugepaged_scan.address < hstart)
> > > >
> > > > Likewise, could do round_down() here (just a suggestion)
> > >
> > > Fine to me.
> > >
> > > >
> > > > > --
> > > > > 2.26.3
> > > > >
> > > > >

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [v3 PATCH 4/7] mm: khugepaged: use transhuge_vma_suitable replace open-code
  2022-06-10 16:59     ` Yang Shi
@ 2022-06-10 22:03       ` Yang Shi
  2022-06-11  0:27         ` Zach O'Keefe
  0 siblings, 1 reply; 40+ messages in thread
From: Yang Shi @ 2022-06-10 22:03 UTC (permalink / raw)
  To: Zach O'Keefe
  Cc: Vlastimil Babka, Kirill A. Shutemov, Matthew Wilcox,
	Andrew Morton, Linux MM, Linux Kernel Mailing List

On Fri, Jun 10, 2022 at 9:59 AM Yang Shi <shy828301@gmail.com> wrote:
>
> On Thu, Jun 9, 2022 at 6:52 PM Zach O'Keefe <zokeefe@google.com> wrote:
> >
> > On Mon, Jun 6, 2022 at 2:44 PM Yang Shi <shy828301@gmail.com> wrote:
> > >
> > > The hugepage_vma_revalidate() needs to check if the address is still in
> > > the aligned HPAGE_PMD_SIZE area of the vma when reacquiring mmap_lock,
> > > but it was open-coded, use transhuge_vma_suitable() to do the job.  And
> > > add proper comments for transhuge_vma_suitable().
> > >
> > > Signed-off-by: Yang Shi <shy828301@gmail.com>
> > > ---
> > >  include/linux/huge_mm.h | 6 ++++++
> > >  mm/khugepaged.c         | 5 +----
> > >  2 files changed, 7 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> > > index a8f61db47f2a..79d5919beb83 100644
> > > --- a/include/linux/huge_mm.h
> > > +++ b/include/linux/huge_mm.h
> > > @@ -128,6 +128,12 @@ static inline bool transhuge_vma_size_ok(struct vm_area_struct *vma)
> > >         return false;
> > >  }
> > >
> > > +/*
> > > + * Do the below checks:
> > > + *   - For non-anon vma, check if the vm_pgoff is HPAGE_PMD_NR aligned.
> > > + *   - For all vmas, check if the haddr is in an aligned HPAGE_PMD_SIZE
> > > + *     area.
> > > + */
> >
> > AFAIK we aren't checking if vm_pgoff is HPAGE_PMD_NR aligned, but
> > rather that linear_page_index(vma, round_up(vma->vm_start,
> > HPAGE_PMD_SIZE)) is HPAGE_PMD_NR aligned within vma->vm_file. I was
>
> Yeah, you are right.
>
> > pretty confused about this (hopefully I have it right now - if not -
> > case and point :) ), so it might be a good opportunity to add some
> > extra commentary to help future travelers understand why this
> > constraint exists.
>
> I'm not fully sure I understand this 100%. I think this is related to
> how page cache is structured. I will try to add more comments.

How's about "The underlying THP is always properly aligned in page
cache, but it may be across the boundary of VMA if the VMA is
misaligned, so the THP can't be PMD mapped for this case."

>
> >
> > Also I wonder while we're at it if we can rename this to
> > transhuge_addr_aligned() or transhuge_addr_suitable() or something.
>
> I think it is still actually used to check vma.
>
> >
> > Otherwise I think the change is a nice cleanup.
> >
> > >  static inline bool transhuge_vma_suitable(struct vm_area_struct *vma,
> > >                 unsigned long addr)
> > >  {
> > > diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> > > index 7a5d1c1a1833..ca1754d3a827 100644
> > > --- a/mm/khugepaged.c
> > > +++ b/mm/khugepaged.c
> > > @@ -951,7 +951,6 @@ static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address,
> > >                 struct vm_area_struct **vmap)
> > >  {
> > >         struct vm_area_struct *vma;
> > > -       unsigned long hstart, hend;
> > >
> > >         if (unlikely(khugepaged_test_exit(mm)))
> > >                 return SCAN_ANY_PROCESS;
> > > @@ -960,9 +959,7 @@ static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address,
> > >         if (!vma)
> > >                 return SCAN_VMA_NULL;
> > >
> > > -       hstart = (vma->vm_start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK;
> > > -       hend = vma->vm_end & HPAGE_PMD_MASK;
> > > -       if (address < hstart || address + HPAGE_PMD_SIZE > hend)
> > > +       if (!transhuge_vma_suitable(vma, address))
> > >                 return SCAN_ADDRESS_RANGE;
> > >         if (!hugepage_vma_check(vma, vma->vm_flags))
> > >                 return SCAN_VMA_CHECK;
> > > --
> > > 2.26.3
> > >
> > >

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [v3 PATCH 4/7] mm: khugepaged: use transhuge_vma_suitable replace open-code
  2022-06-10 22:03       ` Yang Shi
@ 2022-06-11  0:27         ` Zach O'Keefe
  2022-06-11  3:25           ` Yang Shi
  0 siblings, 1 reply; 40+ messages in thread
From: Zach O'Keefe @ 2022-06-11  0:27 UTC (permalink / raw)
  To: Yang Shi
  Cc: Vlastimil Babka, Kirill A. Shutemov, Matthew Wilcox,
	Andrew Morton, Linux MM, Linux Kernel Mailing List

On Fri, Jun 10, 2022 at 3:04 PM Yang Shi <shy828301@gmail.com> wrote:
>
> On Fri, Jun 10, 2022 at 9:59 AM Yang Shi <shy828301@gmail.com> wrote:
> >
> > On Thu, Jun 9, 2022 at 6:52 PM Zach O'Keefe <zokeefe@google.com> wrote:
> > >
> > > On Mon, Jun 6, 2022 at 2:44 PM Yang Shi <shy828301@gmail.com> wrote:
> > > >
> > > > The hugepage_vma_revalidate() needs to check if the address is still in
> > > > the aligned HPAGE_PMD_SIZE area of the vma when reacquiring mmap_lock,
> > > > but it was open-coded, use transhuge_vma_suitable() to do the job.  And
> > > > add proper comments for transhuge_vma_suitable().
> > > >
> > > > Signed-off-by: Yang Shi <shy828301@gmail.com>
> > > > ---
> > > >  include/linux/huge_mm.h | 6 ++++++
> > > >  mm/khugepaged.c         | 5 +----
> > > >  2 files changed, 7 insertions(+), 4 deletions(-)
> > > >
> > > > diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> > > > index a8f61db47f2a..79d5919beb83 100644
> > > > --- a/include/linux/huge_mm.h
> > > > +++ b/include/linux/huge_mm.h
> > > > @@ -128,6 +128,12 @@ static inline bool transhuge_vma_size_ok(struct vm_area_struct *vma)
> > > >         return false;
> > > >  }
> > > >
> > > > +/*
> > > > + * Do the below checks:
> > > > + *   - For non-anon vma, check if the vm_pgoff is HPAGE_PMD_NR aligned.
> > > > + *   - For all vmas, check if the haddr is in an aligned HPAGE_PMD_SIZE
> > > > + *     area.
> > > > + */
> > >
> > > AFAIK we aren't checking if vm_pgoff is HPAGE_PMD_NR aligned, but
> > > rather that linear_page_index(vma, round_up(vma->vm_start,
> > > HPAGE_PMD_SIZE)) is HPAGE_PMD_NR aligned within vma->vm_file. I was
> >
> > Yeah, you are right.
> >
> > > pretty confused about this (hopefully I have it right now - if not -
> > > case and point :) ), so it might be a good opportunity to add some
> > > extra commentary to help future travelers understand why this
> > > constraint exists.
> >
> > I'm not fully sure I understand this 100%. I think this is related to
> > how page cache is structured. I will try to add more comments.
>
> How's about "The underlying THP is always properly aligned in page
> cache, but it may be across the boundary of VMA if the VMA is
> misaligned, so the THP can't be PMD mapped for this case."

I could certainly still be wrong / am learning here - but I *thought*
the reason for this check was to make sure that the hugepage
to-be-collapsed is naturally aligned within the file (since, AFAIK,
without this constraint, different mm's might have different ideas
about where hugepages in the file should be).
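
(A concrete example with made-up numbers, in case it helps: take
vm_pgoff = 1 and vm_start = 2M.  round_up(vm_start, HPAGE_PMD_SIZE) is
2M, whose page cache index is 1 + (2M - 2M) / 4K = 1, which is not
HPAGE_PMD_NR aligned, so the naturally aligned file THP covering
indices [0, 512) can never be PMD mapped through this vma, and
IS_ALIGNED((vm_start >> PAGE_SHIFT) - vm_pgoff, HPAGE_PMD_NR) =
IS_ALIGNED(511, 512) correctly fails.)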

> >
> > >
> > > Also I wonder while we're at it if we can rename this to
> > > transhuge_addr_aligned() or transhuge_addr_suitable() or something.
> >
> > I think it is still actually used to check vma.
> >
> > >
> > > Otherwise I think the change is a nice cleanup.
> > >
> > > >  static inline bool transhuge_vma_suitable(struct vm_area_struct *vma,
> > > >                 unsigned long addr)
> > > >  {
> > > > diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> > > > index 7a5d1c1a1833..ca1754d3a827 100644
> > > > --- a/mm/khugepaged.c
> > > > +++ b/mm/khugepaged.c
> > > > @@ -951,7 +951,6 @@ static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address,
> > > >                 struct vm_area_struct **vmap)
> > > >  {
> > > >         struct vm_area_struct *vma;
> > > > -       unsigned long hstart, hend;
> > > >
> > > >         if (unlikely(khugepaged_test_exit(mm)))
> > > >                 return SCAN_ANY_PROCESS;
> > > > @@ -960,9 +959,7 @@ static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address,
> > > >         if (!vma)
> > > >                 return SCAN_VMA_NULL;
> > > >
> > > > -       hstart = (vma->vm_start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK;
> > > > -       hend = vma->vm_end & HPAGE_PMD_MASK;
> > > > -       if (address < hstart || address + HPAGE_PMD_SIZE > hend)
> > > > +       if (!transhuge_vma_suitable(vma, address))
> > > >                 return SCAN_ADDRESS_RANGE;
> > > >         if (!hugepage_vma_check(vma, vma->vm_flags))
> > > >                 return SCAN_VMA_CHECK;
> > > > --
> > > > 2.26.3
> > > >
> > > >

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [v3 PATCH 4/7] mm: khugepaged: use transhuge_vma_suitable replace open-code
  2022-06-11  0:27         ` Zach O'Keefe
@ 2022-06-11  3:25           ` Yang Shi
  2022-06-11 21:43             ` Zach O'Keefe
  0 siblings, 1 reply; 40+ messages in thread
From: Yang Shi @ 2022-06-11  3:25 UTC (permalink / raw)
  To: Zach O'Keefe
  Cc: Vlastimil Babka, Kirill A. Shutemov, Matthew Wilcox,
	Andrew Morton, Linux MM, Linux Kernel Mailing List

On Fri, Jun 10, 2022 at 5:28 PM Zach O'Keefe <zokeefe@google.com> wrote:
>
> On Fri, Jun 10, 2022 at 3:04 PM Yang Shi <shy828301@gmail.com> wrote:
> >
> > On Fri, Jun 10, 2022 at 9:59 AM Yang Shi <shy828301@gmail.com> wrote:
> > >
> > > On Thu, Jun 9, 2022 at 6:52 PM Zach O'Keefe <zokeefe@google.com> wrote:
> > > >
> > > > On Mon, Jun 6, 2022 at 2:44 PM Yang Shi <shy828301@gmail.com> wrote:
> > > > >
> > > > > The hugepage_vma_revalidate() needs to check if the address is still in
> > > > > the aligned HPAGE_PMD_SIZE area of the vma when reacquiring mmap_lock,
> > > > > but it was open-coded, use transhuge_vma_suitable() to do the job.  And
> > > > > add proper comments for transhuge_vma_suitable().
> > > > >
> > > > > Signed-off-by: Yang Shi <shy828301@gmail.com>
> > > > > ---
> > > > >  include/linux/huge_mm.h | 6 ++++++
> > > > >  mm/khugepaged.c         | 5 +----
> > > > >  2 files changed, 7 insertions(+), 4 deletions(-)
> > > > >
> > > > > diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> > > > > index a8f61db47f2a..79d5919beb83 100644
> > > > > --- a/include/linux/huge_mm.h
> > > > > +++ b/include/linux/huge_mm.h
> > > > > @@ -128,6 +128,12 @@ static inline bool transhuge_vma_size_ok(struct vm_area_struct *vma)
> > > > >         return false;
> > > > >  }
> > > > >
> > > > > +/*
> > > > > + * Do the below checks:
> > > > > + *   - For non-anon vma, check if the vm_pgoff is HPAGE_PMD_NR aligned.
> > > > > + *   - For all vmas, check if the haddr is in an aligned HPAGE_PMD_SIZE
> > > > > + *     area.
> > > > > + */
> > > >
> > > > AFAIK we aren't checking if vm_pgoff is HPAGE_PMD_NR aligned, but
> > > > rather that linear_page_index(vma, round_up(vma->vm_start,
> > > > HPAGE_PMD_SIZE)) is HPAGE_PMD_NR aligned within vma->vm_file. I was
> > >
> > > Yeah, you are right.
> > >
> > > > pretty confused about this (hopefully I have it right now - if not -
> > > > case and point :) ), so it might be a good opportunity to add some
> > > > extra commentary to help future travelers understand why this
> > > > constraint exists.
> > >
> > > I'm not fully sure I understand this 100%. I think this is related to
> > > how page cache is structured. I will try to add more comments.
> >
> > How's about "The underlying THP is always properly aligned in page
> > cache, but it may be across the boundary of VMA if the VMA is
> > misaligned, so the THP can't be PMD mapped for this case."
>
> I could certainly still be wrong / am learning here - but I *thought*
> the reason for this check was to make sure that the hugepage
> to-be-collapsed is naturally aligned within the file (since, AFAIK,
> without this constraint, different mm's might have different ideas
> about where hugepages in the file should be).

The hugepage is definitely naturally aligned within the file; that is
guaranteed by how the page cache is organized. You can find example
code in the shmem fault path, for instance the snippet below:

hindex = round_down(index, folio_nr_pages(folio));
error = shmem_add_to_page_cache(folio, mapping, hindex, NULL,
                                gfp & GFP_RECLAIM_MASK, charge_mm);

The index is actually rounded down to an HPAGE_PMD_NR-aligned value.

The check in hugepage_vma_check() is used to guarantee there is a
PMD-aligned area in the vma that exactly overlaps a PMD range in the
page cache. For example, if you have a vma starting at 0x1000 that maps
the file's page offset 0, then even though you get a THP for the file,
it can not be PMD mapped into the vma. But if the vma maps the file's
page offset 1, then starting from 0x200000 (assuming the vma is big
enough) it can PMD map the second THP in the page cache. Does that make
sense?
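
To make that concrete, here is a small userspace sketch (illustration
only, not kernel code; all names are local to the example) of the
alignment constraint. It mirrors the IS_ALIGNED((vma->vm_start >>
PAGE_SHIFT) - vma->vm_pgoff, HPAGE_PMD_NR) test used for file-backed
vmas:

#include <stdbool.h>
#include <stdio.h>

#define EX_PAGE_SHIFT   12
#define EX_HPAGE_PMD_NR 512     /* 2MB huge page / 4KB base page */

/* vm_start in bytes, vm_pgoff in base pages, as in struct vm_area_struct */
static bool ex_pmd_mappable_offset(unsigned long vm_start,
                                   unsigned long vm_pgoff)
{
        /*
         * PMD-aligned virtual addresses in the vma must correspond to
         * HPAGE_PMD_NR-aligned page offsets in the file (whether the
         * vma is big enough is checked separately).
         */
        return (((vm_start >> EX_PAGE_SHIFT) - vm_pgoff) %
                EX_HPAGE_PMD_NR) == 0;
}

int main(void)
{
        /* vma at 0x1000 mapping file page offset 0: can't be PMD mapped */
        printf("%d\n", ex_pmd_mappable_offset(0x1000, 0));      /* 0 */
        /* vma at 0x1000 mapping file page offset 1: 0x200000 hits THP #2 */
        printf("%d\n", ex_pmd_mappable_offset(0x1000, 1));      /* 1 */
        return 0;
}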

>
> > >
> > > >
> > > > Also I wonder while we're at it if we can rename this to
> > > > transhuge_addr_aligned() or transhuge_addr_suitable() or something.
> > >
> > > I think it is still actually used to check vma.
> > >
> > > >
> > > > Otherwise I think the change is a nice cleanup.
> > > >
> > > > >  static inline bool transhuge_vma_suitable(struct vm_area_struct *vma,
> > > > >                 unsigned long addr)
> > > > >  {
> > > > > diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> > > > > index 7a5d1c1a1833..ca1754d3a827 100644
> > > > > --- a/mm/khugepaged.c
> > > > > +++ b/mm/khugepaged.c
> > > > > @@ -951,7 +951,6 @@ static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address,
> > > > >                 struct vm_area_struct **vmap)
> > > > >  {
> > > > >         struct vm_area_struct *vma;
> > > > > -       unsigned long hstart, hend;
> > > > >
> > > > >         if (unlikely(khugepaged_test_exit(mm)))
> > > > >                 return SCAN_ANY_PROCESS;
> > > > > @@ -960,9 +959,7 @@ static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address,
> > > > >         if (!vma)
> > > > >                 return SCAN_VMA_NULL;
> > > > >
> > > > > -       hstart = (vma->vm_start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK;
> > > > > -       hend = vma->vm_end & HPAGE_PMD_MASK;
> > > > > -       if (address < hstart || address + HPAGE_PMD_SIZE > hend)
> > > > > +       if (!transhuge_vma_suitable(vma, address))
> > > > >                 return SCAN_ADDRESS_RANGE;
> > > > >         if (!hugepage_vma_check(vma, vma->vm_flags))
> > > > >                 return SCAN_VMA_CHECK;
> > > > > --
> > > > > 2.26.3
> > > > >
> > > > >

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [v3 PATCH 4/7] mm: khugepaged: use transhuge_vma_suitable replace open-code
  2022-06-11  3:25           ` Yang Shi
@ 2022-06-11 21:43             ` Zach O'Keefe
  2022-06-14 17:40               ` Yang Shi
  0 siblings, 1 reply; 40+ messages in thread
From: Zach O'Keefe @ 2022-06-11 21:43 UTC (permalink / raw)
  To: Yang Shi
  Cc: Vlastimil Babka, Kirill A. Shutemov, Matthew Wilcox,
	Andrew Morton, Linux MM, Linux Kernel Mailing List

On 10 Jun 20:25, Yang Shi wrote:
> On Fri, Jun 10, 2022 at 5:28 PM Zach O'Keefe <zokeefe@google.com> wrote:
> >
> > On Fri, Jun 10, 2022 at 3:04 PM Yang Shi <shy828301@gmail.com> wrote:
> > >
> > > On Fri, Jun 10, 2022 at 9:59 AM Yang Shi <shy828301@gmail.com> wrote:
> > > >
> > > > On Thu, Jun 9, 2022 at 6:52 PM Zach O'Keefe <zokeefe@google.com> wrote:
> > > > >
> > > > > On Mon, Jun 6, 2022 at 2:44 PM Yang Shi <shy828301@gmail.com> wrote:
> > > > > >
> > > > > > The hugepage_vma_revalidate() needs to check if the address is still in
> > > > > > the aligned HPAGE_PMD_SIZE area of the vma when reacquiring mmap_lock,
> > > > > > but it was open-coded, use transhuge_vma_suitable() to do the job.  And
> > > > > > add proper comments for transhuge_vma_suitable().
> > > > > >
> > > > > > Signed-off-by: Yang Shi <shy828301@gmail.com>
> > > > > > ---
> > > > > >  include/linux/huge_mm.h | 6 ++++++
> > > > > >  mm/khugepaged.c         | 5 +----
> > > > > >  2 files changed, 7 insertions(+), 4 deletions(-)
> > > > > >
> > > > > > diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> > > > > > index a8f61db47f2a..79d5919beb83 100644
> > > > > > --- a/include/linux/huge_mm.h
> > > > > > +++ b/include/linux/huge_mm.h
> > > > > > @@ -128,6 +128,12 @@ static inline bool transhuge_vma_size_ok(struct vm_area_struct *vma)
> > > > > >         return false;
> > > > > >  }
> > > > > >
> > > > > > +/*
> > > > > > + * Do the below checks:
> > > > > > + *   - For non-anon vma, check if the vm_pgoff is HPAGE_PMD_NR aligned.
> > > > > > + *   - For all vmas, check if the haddr is in an aligned HPAGE_PMD_SIZE
> > > > > > + *     area.
> > > > > > + */
> > > > >
> > > > > AFAIK we aren't checking if vm_pgoff is HPAGE_PMD_NR aligned, but
> > > > > rather that linear_page_index(vma, round_up(vma->vm_start,
> > > > > HPAGE_PMD_SIZE)) is HPAGE_PMD_NR aligned within vma->vm_file. I was
> > > >
> > > > Yeah, you are right.
> > > >
> > > > > pretty confused about this (hopefully I have it right now - if not -
> > > > > case and point :) ), so it might be a good opportunity to add some
> > > > > extra commentary to help future travelers understand why this
> > > > > constraint exists.
> > > >
> > > > I'm not fully sure I understand this 100%. I think this is related to
> > > > how page cache is structured. I will try to add more comments.
> > >
> > > How's about "The underlying THP is always properly aligned in page
> > > cache, but it may be across the boundary of VMA if the VMA is
> > > misaligned, so the THP can't be PMD mapped for this case."
> >
> > I could certainly still be wrong / am learning here - but I *thought*
> > the reason for this check was to make sure that the hugepage
> > to-be-collapsed is naturally aligned within the file (since, AFAIK,
> > without this constraint, different mm's might have different ideas
> > about where hugepages in the file should be).
> 
> The hugepage is definitely naturally aligned within the file, this is
> guaranteed by how page cache is organized, you could find some example
> code from shmem fault, for example, the below code snippet:
> 
> hindex = round_down(index, folio_nr_pages(folio));
> error = shmem_add_to_page_cache(folio, mapping, hindex, NULL, gfp &
> GFP_RECLAIM_MASK, charge_mm);
> 
> The index is actually rounded down to HPAGE_PMD_NR aligned.

Thanks for the reference here.

> The check in hugepage_vma_check() is used to guarantee there is an PMD
> aligned area in the vma exactly overlapping with a PMD range in the
> page cache. For example, you have a vma starting from 0x1000 maps to
> the file's page offset of 0, even though you get THP for the file, it
> can not be PMD mapped to the vma. But if it maps to the file's page
> offset of 1, then starting from 0x200000 (assuming the vma is big
> enough) it can PMD map the second THP in the page cache. Does it make
> sense?
>

Yes, this makes sense - thanks for providing your insight. I think I was
basically thinking the same thing; except your description is more accurate
(namely, that it is *some* PMD-aligned range covered by the vma that maps to
a hugepage-aligned offset in the file; I mistakenly took this to be the
*first* PMD-aligned address >= vma->vm_start).

Also, with this in mind, your previously suggested comment makes sense. If I
had to take a stab at it, I would say something like:

"The hugepage is guaranteed to be hugepage-aligned within the file, but we must
check that the PMD-aligned addresses in the VMA map to PMD-aligned offsets
within the file, else the hugepage will not be PMD-mappable".

WDYT?

> >
> > > >
> > > > >
> > > > > Also I wonder while we're at it if we can rename this to
> > > > > transhuge_addr_aligned() or transhuge_addr_suitable() or something.
> > > >
> > > > I think it is still actually used to check vma.
> > > >
> > > > >
> > > > > Otherwise I think the change is a nice cleanup.
> > > > >
> > > > > >  static inline bool transhuge_vma_suitable(struct vm_area_struct *vma,
> > > > > >                 unsigned long addr)
> > > > > >  {
> > > > > > diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> > > > > > index 7a5d1c1a1833..ca1754d3a827 100644
> > > > > > --- a/mm/khugepaged.c
> > > > > > +++ b/mm/khugepaged.c
> > > > > > @@ -951,7 +951,6 @@ static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address,
> > > > > >                 struct vm_area_struct **vmap)
> > > > > >  {
> > > > > >         struct vm_area_struct *vma;
> > > > > > -       unsigned long hstart, hend;
> > > > > >
> > > > > >         if (unlikely(khugepaged_test_exit(mm)))
> > > > > >                 return SCAN_ANY_PROCESS;
> > > > > > @@ -960,9 +959,7 @@ static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address,
> > > > > >         if (!vma)
> > > > > >                 return SCAN_VMA_NULL;
> > > > > >
> > > > > > -       hstart = (vma->vm_start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK;
> > > > > > -       hend = vma->vm_end & HPAGE_PMD_MASK;
> > > > > > -       if (address < hstart || address + HPAGE_PMD_SIZE > hend)
> > > > > > +       if (!transhuge_vma_suitable(vma, address))
> > > > > >                 return SCAN_ADDRESS_RANGE;
> > > > > >         if (!hugepage_vma_check(vma, vma->vm_flags))
> > > > > >                 return SCAN_VMA_CHECK;
> > > > > > --
> > > > > > 2.26.3
> > > > > >
> > > > > >

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [v3 PATCH 6/7] mm: thp: kill __transhuge_page_enabled()
  2022-06-10 21:07       ` Yang Shi
@ 2022-06-13 14:54         ` Zach O'Keefe
  2022-06-14 18:51           ` Yang Shi
  0 siblings, 1 reply; 40+ messages in thread
From: Zach O'Keefe @ 2022-06-13 14:54 UTC (permalink / raw)
  To: Yang Shi
  Cc: Vlastimil Babka, Kirill A. Shutemov, Matthew Wilcox,
	Andrew Morton, Linux MM, Linux Kernel Mailing List

On 10 Jun 14:07, Yang Shi wrote:
> On Fri, Jun 10, 2022 at 10:24 AM Yang Shi <shy828301@gmail.com> wrote:
> >
> > On Thu, Jun 9, 2022 at 7:22 PM Zach O'Keefe <zokeefe@google.com> wrote:
> > >
> > > On Mon, Jun 6, 2022 at 2:44 PM Yang Shi <shy828301@gmail.com> wrote:
> > > >
> > > > The page fault path checks THP eligibility with
> > > > __transhuge_page_enabled() which does the similar thing as
> > > > hugepage_vma_check(), so use hugepage_vma_check() instead.
> > > >
> > > > However page fault allows DAX and !anon_vma cases, so added a new flag,
> > > > in_pf, to hugepage_vma_check() to make page fault work correctly.
> > > >
> > > > The in_pf flag is also used to skip shmem and file THP for page fault
> > > > since shmem handles THP in its own shmem_fault() and file THP allocation
> > > > on fault is not supported yet.
> > > >
> > > > Also remove hugepage_vma_enabled() since hugepage_vma_check() is the
> > > > only caller now, it is not necessary to have a helper function.
> > > >
> > > > Signed-off-by: Yang Shi <shy828301@gmail.com>
> > > > ---
> > > >  fs/proc/task_mmu.c         |  2 +-
> > > >  include/linux/huge_mm.h    | 57 ++------------------------------------
> > > >  include/linux/khugepaged.h |  2 +-
> > > >  mm/huge_memory.c           | 25 ++++++++++++-----
> > > >  mm/khugepaged.c            |  8 +++---
> > > >  mm/memory.c                |  7 +++--
> > > >  6 files changed, 31 insertions(+), 70 deletions(-)
> > > >
> > > > diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> > > > index fd79566e204c..a0850303baec 100644
> > > > --- a/fs/proc/task_mmu.c
> > > > +++ b/fs/proc/task_mmu.c
> > > > @@ -860,7 +860,7 @@ static int show_smap(struct seq_file *m, void *v)
> > > >         __show_smap(m, &mss, false);
> > > >
> > > >         seq_printf(m, "THPeligible:    %d\n",
> > > > -                  hugepage_vma_check(vma, vma->vm_flags, true));
> > > > +                  hugepage_vma_check(vma, vma->vm_flags, true, false));
> > > >
> > > >         if (arch_pkeys_enabled())
> > > >                 seq_printf(m, "ProtectionKey:  %8u\n", vma_pkey(vma));
> > > > diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> > > > index f561c3e16def..d478e8875023 100644
> > > > --- a/include/linux/huge_mm.h
> > > > +++ b/include/linux/huge_mm.h
> > > > @@ -153,48 +153,6 @@ static inline bool transhuge_vma_suitable(struct vm_area_struct *vma,
> > > >         return true;
> > > >  }
> > > >
> > > > -static inline bool transhuge_vma_enabled(struct vm_area_struct *vma,
> > > > -                                         unsigned long vm_flags)
> > > > -{
> > > > -       /* Explicitly disabled through madvise. */
> > > > -       if ((vm_flags & VM_NOHUGEPAGE) ||
> > > > -           test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags))
> > > > -               return false;
> > > > -       return true;
> > > > -}
> > > > -
> > > > -/*
> > > > - * to be used on vmas which are known to support THP.
> > > > - * Use transparent_hugepage_active otherwise
> > > > - */
> > > > -static inline bool __transparent_hugepage_enabled(struct vm_area_struct *vma)
> > > > -{
> > > > -
> > > > -       /*
> > > > -        * If the hardware/firmware marked hugepage support disabled.
> > > > -        */
> > > > -       if (transparent_hugepage_flags & (1 << TRANSPARENT_HUGEPAGE_NEVER_DAX))
> > > > -               return false;
> > > > -
> > > > -       if (!transhuge_vma_enabled(vma, vma->vm_flags))
> > > > -               return false;
> > > > -
> > > > -       if (vma_is_temporary_stack(vma))
> > > > -               return false;
> > > > -
> > > > -       if (transparent_hugepage_flags & (1 << TRANSPARENT_HUGEPAGE_FLAG))
> > > > -               return true;
> > > > -
> > > > -       if (vma_is_dax(vma))
> > > > -               return true;
> > > > -
> > > > -       if (transparent_hugepage_flags &
> > > > -                               (1 << TRANSPARENT_HUGEPAGE_REQ_MADV_FLAG))
> > > > -               return !!(vma->vm_flags & VM_HUGEPAGE);
> > > > -
> > > > -       return false;
> > > > -}
> > > > -
> > > >  static inline bool file_thp_enabled(struct vm_area_struct *vma)
> > > >  {
> > > >         struct inode *inode;
> > > > @@ -211,7 +169,7 @@ static inline bool file_thp_enabled(struct vm_area_struct *vma)
> > > >
> > > >  bool hugepage_vma_check(struct vm_area_struct *vma,
> > > >                         unsigned long vm_flags,
> > > > -                       bool smaps);
> > > > +                       bool smaps, bool in_pf);
> > > >
> > > >  #define transparent_hugepage_use_zero_page()                           \
> > > >         (transparent_hugepage_flags &                                   \
> > > > @@ -355,11 +313,6 @@ static inline bool folio_test_pmd_mappable(struct folio *folio)
> > > >         return false;
> > > >  }
> > > >
> > > > -static inline bool __transparent_hugepage_enabled(struct vm_area_struct *vma)
> > > > -{
> > > > -       return false;
> > > > -}
> > > > -
> > > >  static inline bool transhuge_vma_size_ok(struct vm_area_struct *vma)
> > > >  {
> > > >         return false;
> > > > @@ -371,15 +324,9 @@ static inline bool transhuge_vma_suitable(struct vm_area_struct *vma,
> > > >         return false;
> > > >  }
> > > >
> > > > -static inline bool transhuge_vma_enabled(struct vm_area_struct *vma,
> > > > -                                         unsigned long vm_flags)
> > > > -{
> > > > -       return false;
> > > > -}
> > > > -
> > > >  static inline bool hugepage_vma_check(struct vm_area_struct *vma,
> > > >                                        unsigned long vm_flags,
> > > > -                                      bool smaps)
> > > > +                                      bool smaps, bool in_pf)
> > > >  {
> > > >         return false;
> > > >  }
> > > > diff --git a/include/linux/khugepaged.h b/include/linux/khugepaged.h
> > > > index 8a6452e089ca..e047be601268 100644
> > > > --- a/include/linux/khugepaged.h
> > > > +++ b/include/linux/khugepaged.h
> > > > @@ -55,7 +55,7 @@ static inline void khugepaged_enter(struct vm_area_struct *vma,
> > > >  {
> > > >         if (!test_bit(MMF_VM_HUGEPAGE, &vma->vm_mm->flags) &&
> > > >             khugepaged_enabled()) {
> > > > -               if (hugepage_vma_check(vma, vm_flags, false))
> > > > +               if (hugepage_vma_check(vma, vm_flags, false, false))
> > > >                         __khugepaged_enter(vma->vm_mm);
> > > >         }
> > > >  }
> > > > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > > > index bc8370856e85..b95786ada466 100644
> > > > --- a/mm/huge_memory.c
> > > > +++ b/mm/huge_memory.c
> > > > @@ -71,17 +71,25 @@ unsigned long huge_zero_pfn __read_mostly = ~0UL;
> > > >
> > > >  bool hugepage_vma_check(struct vm_area_struct *vma,
> > > >                         unsigned long vm_flags,
> > > > -                       bool smaps)
> > > > +                       bool smaps, bool in_pf)
> > > >  {
> > > > -       if (!transhuge_vma_enabled(vma, vm_flags))
> > > > +       /* Explicitly disabled through madvise or prctl. */
> > >
> > > Or s390 kvm (not that this has to be exhaustively maintained).
> > >
> > > > +       if ((vm_flags & VM_NOHUGEPAGE) ||
> > > > +           test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags))
> > > > +               return false;
> > > > +       /*
> > > > +        * If the hardware/firmware marked hugepage support disabled.
> > > > +        */
> > > > +       if (transparent_hugepage_flags & (1 << TRANSPARENT_HUGEPAGE_NEVER_DAX))
> > > >                 return false;
> > >
> > > This introduces an extra check for khugepaged path. I don't know
> > > enough about TRANSPARENT_HUGEPAGE_NEVER_DAX, but I assume this is ok?
> > > What would have happened previously if khugepaged tried to collapse
> > > this memory?
> >
> > Please refer to commit bae849538157 ("mm/pmem: avoid inserting
> > hugepage PTE entry with fsdax if hugepage support is disabled") for
> > why this flag was introduced.
> >
> > It is set if hardware doesn't support hugepages, and khugepaged
> > doesn't collapse since khugepaged won't be started at all.
> >
> > But this flag needs to be checked in the page fault path.
> >

Thanks for the ref to the commit. I'm not sure I understand it in its entirety,
but at least I can tell khugepaged won't be started :)

> > >
> > > > +       /* Special VMA and hugetlb VMA */
> > > >         if (vm_flags & VM_NO_KHUGEPAGED)
> > > >                 return false;
> > >
> > > This adds an extra check along the fault path. Is it also safe to add?
> >
> > I think it is safe since hugepage_vma_check() is just used by THP.
> > Hugetlb has its own page fault handler.
> 
> I just found one exception. The fuse dax has VM_MIXEDMAP set for its
> vmas, so this check should be moved after vma_is_dax() check.
> 
> AFAICT, only dax supports huge_fault() and dax vmas don't have any
> VM_SPECIAL flags set other than fuse.
>

Ordering wrt the VM_NO_KHUGEPAGED check seems fine. We could always use in_pf
to opt out of this check, but I think itemizing where the collapse and fault
paths differ would be good.

> >
> > >
> > > > -       /* Don't run khugepaged against DAX vma */
> > > > +       /* khugepaged doesn't collapse DAX vma, but page fault is fine. */
> > > >         if (vma_is_dax(vma))
> > > > -               return false;
> > > > +               return in_pf;
> > >
> > > I assume vma_is_temporary_stack() and vma_is_dax() is mutually exclusive.
> >
> > I think so.
> >
> > >
> > > >         if (vma->vm_file && !IS_ALIGNED((vma->vm_start >> PAGE_SHIFT) -
> > > >                                 vma->vm_pgoff, HPAGE_PMD_NR))
> > > > @@ -91,7 +99,7 @@ bool hugepage_vma_check(struct vm_area_struct *vma,
> > > >                 return false;
> > > >
> > > >         /* Enabled via shmem mount options or sysfs settings. */
> > > > -       if (shmem_file(vma->vm_file))
> > > > +       if (!in_pf && shmem_file(vma->vm_file))
> > > >                 return shmem_huge_enabled(vma);
> > >
> > > Will shmem_file() ever be true in the fault path? Or is this just an
> > > optimization?
> >
> > It could be true. But shmem has its own implementation for huge page
> > fault and doesn't implement huge_fault() in its vm_operations, so it
> > will fallback even though "in_pf" is not checked.
> >
> > But xfs does have huge_fault() implemented, so it may try to allocate
> > THP for non-DAX xfs files. So the "in_pf" flag is introduced to handle
> > this. Since we need this flag anyway, why not use it to return earlier
> > for shmem instead of relying on fallback.
> >
> > Anyway this is all because __transparent_huge_enabled() is replaced by
> > hugepage_vma_check().
> >

Thanks for the explanation. Admittedly I don't fully understand the involvement
of xfs in the shmem case, but the general "fail early" logic seems fine to me.

> > > >         if (!khugepaged_enabled())
> > > > @@ -102,7 +110,7 @@ bool hugepage_vma_check(struct vm_area_struct *vma,
> > > >                 return false;
> > > >
> > > >         /* Only regular file is valid */
> > > > -       if (file_thp_enabled(vma))
> > > > +       if (!in_pf && file_thp_enabled(vma))
> > > >                 return true;
> > >
> > > Likewise for file_thp_enabled()
> >
> > Yes, same as the above.

Ditto.

> > >
> > > >         if (!vma_is_anonymous(vma))
> > > > @@ -114,9 +122,12 @@ bool hugepage_vma_check(struct vm_area_struct *vma,
> > > >         /*
> > > >          * THPeligible bit of smaps should show 1 for proper VMAs even
> > > >          * though anon_vma is not initialized yet.
> > > > +        *
> > > > +        * Allow page fault since anon_vma may be not initialized until
> > > > +        * the first page fault.
> > > >          */
> > > >         if (!vma->anon_vma)
> > > > -               return smaps;
> > > > +               return (smaps || in_pf);
> > > >
> > > >         return true;
> > > >  }
> > > > diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> > > > index aa0769e3b0d9..ab6183c5489f 100644
> > > > --- a/mm/khugepaged.c
> > > > +++ b/mm/khugepaged.c
> > > > @@ -473,7 +473,7 @@ void khugepaged_enter_vma(struct vm_area_struct *vma,
> > > >  {
> > > >         if (!test_bit(MMF_VM_HUGEPAGE, &vma->vm_mm->flags) &&
> > > >             khugepaged_enabled()) {
> > > > -               if (hugepage_vma_check(vma, vm_flags, false))
> > > > +               if (hugepage_vma_check(vma, vm_flags, false, false))
> > > >                         __khugepaged_enter(vma->vm_mm);
> > > >         }
> > > >  }
> > > > @@ -918,7 +918,7 @@ static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address,
> > > >
> > > >         if (!transhuge_vma_suitable(vma, address))
> > > >                 return SCAN_ADDRESS_RANGE;
> > > > -       if (!hugepage_vma_check(vma, vma->vm_flags, false))
> > > > +       if (!hugepage_vma_check(vma, vma->vm_flags, false, false))
> > > >                 return SCAN_VMA_CHECK;
> > > >         return 0;
> > > >  }
> > > > @@ -1399,7 +1399,7 @@ void collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr)
> > > >          * the valid THP. Add extra VM_HUGEPAGE so hugepage_vma_check()
> > > >          * will not fail the vma for missing VM_HUGEPAGE
> > > >          */
> > > > -       if (!hugepage_vma_check(vma, vma->vm_flags | VM_HUGEPAGE, false))
> > > > +       if (!hugepage_vma_check(vma, vma->vm_flags | VM_HUGEPAGE, false, false))
> > > >                 return;
> > > >
> > > >         /* Keep pmd pgtable for uffd-wp; see comment in retract_page_tables() */
> > > > @@ -2089,7 +2089,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages,
> > > >                         progress++;
> > > >                         break;
> > > >                 }
> > > > -               if (!hugepage_vma_check(vma, vma->vm_flags, false)) {
> > > > +               if (!hugepage_vma_check(vma, vma->vm_flags, false, false)) {
> > > >  skip:
> > > >                         progress++;
> > > >                         continue;
> > > > diff --git a/mm/memory.c b/mm/memory.c
> > > > index bc5d40eec5d5..673f7561a30a 100644
> > > > --- a/mm/memory.c
> > > > +++ b/mm/memory.c
> > > > @@ -4962,6 +4962,7 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
> > > >                 .gfp_mask = __get_fault_gfp_mask(vma),
> > > >         };
> > > >         struct mm_struct *mm = vma->vm_mm;
> > > > +       unsigned long vm_flags = vma->vm_flags;
> > > >         pgd_t *pgd;
> > > >         p4d_t *p4d;
> > > >         vm_fault_t ret;
> > > > @@ -4975,7 +4976,8 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
> > > >         if (!vmf.pud)
> > > >                 return VM_FAULT_OOM;
> > > >  retry_pud:
> > > > -       if (pud_none(*vmf.pud) && __transparent_hugepage_enabled(vma)) {
> > > > +       if (pud_none(*vmf.pud) &&
> > > > +           hugepage_vma_check(vma, vm_flags, false, true)) {
> > > >                 ret = create_huge_pud(&vmf);
> > > >                 if (!(ret & VM_FAULT_FALLBACK))
> > > >                         return ret;
> > > > @@ -5008,7 +5010,8 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
> > > >         if (pud_trans_unstable(vmf.pud))
> > > >                 goto retry_pud;
> > > >
> > > > -       if (pmd_none(*vmf.pmd) && __transparent_hugepage_enabled(vma)) {
> > > > +       if (pmd_none(*vmf.pmd) &&
> > > > +           hugepage_vma_check(vma, vm_flags, false, true)) {
> > > >                 ret = create_huge_pmd(&vmf);
> > > >                 if (!(ret & VM_FAULT_FALLBACK))
> > > >                         return ret;
> > > > --
> > > > 2.26.3
> > > >
> > > >

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [v3 PATCH 5/7] mm: thp: kill transparent_hugepage_active()
  2022-06-10 17:02     ` Yang Shi
@ 2022-06-13 15:06       ` Zach O'Keefe
  2022-06-14 19:16         ` Yang Shi
  0 siblings, 1 reply; 40+ messages in thread
From: Zach O'Keefe @ 2022-06-13 15:06 UTC (permalink / raw)
  To: Yang Shi
  Cc: Vlastimil Babka, Kirill A. Shutemov, Matthew Wilcox,
	Andrew Morton, Linux MM, Linux Kernel Mailing List

On 10 Jun 10:02, Yang Shi wrote:
> On Thu, Jun 9, 2022 at 6:03 PM Zach O'Keefe <zokeefe@google.com> wrote:
> >
> > On Mon, Jun 6, 2022 at 2:44 PM Yang Shi <shy828301@gmail.com> wrote:
> > >
> > > The transparent_hugepage_active() was introduced to show THP eligibility
> > > bit in smaps in proc, smaps is the only user.  But it actually does the
> > > similar check as hugepage_vma_check() which is used by khugepaged.  We
> > > definitely don't have to maintain two similar checks, so kill
> > > transparent_hugepage_active().
> >
> > I never realized smaps was the only user! Great!
> >
> > > Also move hugepage_vma_check() to huge_memory.c and huge_mm.h since it
> > > is not only for khugepaged anymore.
> > >
> > > Signed-off-by: Yang Shi <shy828301@gmail.com>
> > > ---
> > >  fs/proc/task_mmu.c         |  2 +-
> > >  include/linux/huge_mm.h    | 16 +++++++-----
> > >  include/linux/khugepaged.h |  4 +--
> > >  mm/huge_memory.c           | 50 ++++++++++++++++++++++++++++++++-----
> > >  mm/khugepaged.c            | 51 +++-----------------------------------
> > >  5 files changed, 60 insertions(+), 63 deletions(-)
> > >
> > > diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> > > index 2dd8c8a66924..fd79566e204c 100644
> > > --- a/fs/proc/task_mmu.c
> > > +++ b/fs/proc/task_mmu.c
> > > @@ -860,7 +860,7 @@ static int show_smap(struct seq_file *m, void *v)
> > >         __show_smap(m, &mss, false);
> > >
> > >         seq_printf(m, "THPeligible:    %d\n",
> > > -                  transparent_hugepage_active(vma));
> > > +                  hugepage_vma_check(vma, vma->vm_flags, true));
> > >
> > >         if (arch_pkeys_enabled())
> > >                 seq_printf(m, "ProtectionKey:  %8u\n", vma_pkey(vma));
> > > diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> > > index 79d5919beb83..f561c3e16def 100644
> > > --- a/include/linux/huge_mm.h
> > > +++ b/include/linux/huge_mm.h
> > > @@ -209,7 +209,9 @@ static inline bool file_thp_enabled(struct vm_area_struct *vma)
> > >                !inode_is_open_for_write(inode) && S_ISREG(inode->i_mode);
> > >  }
> > >
> > > -bool transparent_hugepage_active(struct vm_area_struct *vma);
> > > +bool hugepage_vma_check(struct vm_area_struct *vma,
> > > +                       unsigned long vm_flags,
> > > +                       bool smaps);
> > >
> > >  #define transparent_hugepage_use_zero_page()                           \
> > >         (transparent_hugepage_flags &                                   \
> > > @@ -358,11 +360,6 @@ static inline bool __transparent_hugepage_enabled(struct vm_area_struct *vma)
> > >         return false;
> > >  }
> > >
> > > -static inline bool transparent_hugepage_active(struct vm_area_struct *vma)
> > > -{
> > > -       return false;
> > > -}
> > > -
> > >  static inline bool transhuge_vma_size_ok(struct vm_area_struct *vma)
> > >  {
> > >         return false;
> > > @@ -380,6 +377,13 @@ static inline bool transhuge_vma_enabled(struct vm_area_struct *vma,
> > >         return false;
> > >  }
> > >
> > > +static inline bool hugepage_vma_check(struct vm_area_struct *vma,
> > > +                                      unsigned long vm_flags,
> > > +                                      bool smaps)
> > > +{
> > > +       return false;
> > > +}
> > > +
> > >  static inline void prep_transhuge_page(struct page *page) {}
> > >
> > >  #define transparent_hugepage_flags 0UL
> > > diff --git a/include/linux/khugepaged.h b/include/linux/khugepaged.h
> > > index 392d34c3c59a..8a6452e089ca 100644
> > > --- a/include/linux/khugepaged.h
> > > +++ b/include/linux/khugepaged.h
> > > @@ -10,8 +10,6 @@ extern struct attribute_group khugepaged_attr_group;
> > >  extern int khugepaged_init(void);
> > >  extern void khugepaged_destroy(void);
> > >  extern int start_stop_khugepaged(void);
> > > -extern bool hugepage_vma_check(struct vm_area_struct *vma,
> > > -                              unsigned long vm_flags);
> > >  extern void __khugepaged_enter(struct mm_struct *mm);
> > >  extern void __khugepaged_exit(struct mm_struct *mm);
> > >  extern void khugepaged_enter_vma(struct vm_area_struct *vma,
> > > @@ -57,7 +55,7 @@ static inline void khugepaged_enter(struct vm_area_struct *vma,
> > >  {
> > >         if (!test_bit(MMF_VM_HUGEPAGE, &vma->vm_mm->flags) &&
> > >             khugepaged_enabled()) {
> > > -               if (hugepage_vma_check(vma, vm_flags))
> > > +               if (hugepage_vma_check(vma, vm_flags, false))
> > >                         __khugepaged_enter(vma->vm_mm);
> > >         }
> > >  }
> > > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > > index 36ada544e494..bc8370856e85 100644
> > > --- a/mm/huge_memory.c
> > > +++ b/mm/huge_memory.c
> > > @@ -69,18 +69,56 @@ static atomic_t huge_zero_refcount;
> > >  struct page *huge_zero_page __read_mostly;
> > >  unsigned long huge_zero_pfn __read_mostly = ~0UL;
> > >
> > > -bool transparent_hugepage_active(struct vm_area_struct *vma)
> > > +bool hugepage_vma_check(struct vm_area_struct *vma,
> > > +                       unsigned long vm_flags,
> > > +                       bool smaps)
> > >  {
> > > +       if (!transhuge_vma_enabled(vma, vm_flags))
> > > +               return false;
> > > +
> > > +       if (vm_flags & VM_NO_KHUGEPAGED)
> > > +               return false;
> > > +
> > > +       /* Don't run khugepaged against DAX vma */
> > > +       if (vma_is_dax(vma))
> > > +               return false;
> > > +
> > > +       if (vma->vm_file && !IS_ALIGNED((vma->vm_start >> PAGE_SHIFT) -
> > > +                               vma->vm_pgoff, HPAGE_PMD_NR))
> > > +               return false;
> > > +
> > >         if (!transhuge_vma_size_ok(vma))
> > >                 return false;

I know we just introduced transhuge_vma_size_ok(), but is there a way to
consolidate the above two checks into a single transhuge_vma_suitable() call,
the same way it used to be done in transparent_hugepage_active()? I.e.

transhuge_vma_suitable(vma, vma->vm_end - HPAGE_PMD_SIZE)

which checks whether the vma can hold an aligned hugepage and also centralizes
the (in my view) complicated file mapping check.
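
Roughly, I'm imagining something like the sketch below - a userspace
model only (not the kernel helper; names are local to the example) -
where one predicate, probed at the last PMD-sized slot the vma could
hold, covers both the size check and the file-offset alignment check:

#include <stdbool.h>
#include <stdio.h>

#define EX_PAGE_SHIFT   12
#define EX_HPAGE_PMD_NR 512
#define EX_HPAGE_SIZE   ((unsigned long)EX_HPAGE_PMD_NR << EX_PAGE_SHIFT)
#define EX_HPAGE_MASK   (~(EX_HPAGE_SIZE - 1))

struct ex_vma {
        unsigned long vm_start, vm_end; /* bytes */
        unsigned long vm_pgoff;         /* base pages; ignored when anon */
        bool anon;
};

/* Loosely modeled on transhuge_vma_suitable(); the align step is folded in. */
static bool ex_vma_suitable(const struct ex_vma *v, unsigned long addr)
{
        unsigned long haddr = addr & EX_HPAGE_MASK;

        if (!v->anon &&
            (((v->vm_start >> EX_PAGE_SHIFT) - v->vm_pgoff) % EX_HPAGE_PMD_NR))
                return false;
        return haddr >= v->vm_start && haddr + EX_HPAGE_SIZE <= v->vm_end;
}

int main(void)
{
        struct ex_vma big   = { 0x100000, 0x500000, 0, true }; /* 4MB anon */
        struct ex_vma small = { 0x100000, 0x200000, 0, true }; /* 1MB anon */

        /* Probe the last possible PMD slot: fits (1) vs. too small (0) */
        printf("%d %d\n",
               ex_vma_suitable(&big, big.vm_end - EX_HPAGE_SIZE),
               ex_vma_suitable(&small, small.vm_end - EX_HPAGE_SIZE));
        return 0;
}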

> > > -       if (vma_is_anonymous(vma))
> > > -               return __transparent_hugepage_enabled(vma);
> > > -       if (vma_is_shmem(vma))
> > > +
> > > +       /* Enabled via shmem mount options or sysfs settings. */
> > > +       if (shmem_file(vma->vm_file))
> > >                 return shmem_huge_enabled(vma);
> > > -       if (transhuge_vma_enabled(vma, vma->vm_flags) && file_thp_enabled(vma))
> > > +
> > > +       if (!khugepaged_enabled())
> > > +               return false;
> > > +
> > > +       /* THP settings require madvise. */
> > > +       if (!(vm_flags & VM_HUGEPAGE) && !khugepaged_always())
> > > +               return false;
> > > +
> > > +       /* Only regular file is valid */
> > > +       if (file_thp_enabled(vma))
> > >                 return true;
> > >
> > > -       return false;
> > > +       if (!vma_is_anonymous(vma))
> > > +               return false;
> > > +
> > > +       if (vma_is_temporary_stack(vma))
> > > +               return false;
> > > +
> > > +       /*
> > > +        * THPeligible bit of smaps should show 1 for proper VMAs even
> > > +        * though anon_vma is not initialized yet.
> > > +        */
> > > +       if (!vma->anon_vma)
> > > +               return smaps;
> > > +
> > > +       return true;
> > >  }
> >
> > There are a few cases where the return value for smaps will be
> > different from before. I presume this won't be an issue, and that any
> > difference resulting from this change is actually a positive
> > difference, given it more accurately reflects the thp eligibility of
> > the vma? For example, a VM_NO_KHUGEPAGED-marked vma might now show 0
> > where it otherwise showed 1.
> 
> Yes, returning 1 for VM_NO_KHUGEPAGED vmas is wrong. Actually TBH I
> suspect very few people actually use this bit. Anyway I will elaborate
> this in the commit log.
> 
> >
> > >  static bool get_huge_zero_page(void)
> > > diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> > > index ca1754d3a827..aa0769e3b0d9 100644
> > > --- a/mm/khugepaged.c
> > > +++ b/mm/khugepaged.c
> > > @@ -437,49 +437,6 @@ static inline int khugepaged_test_exit(struct mm_struct *mm)
> > >         return atomic_read(&mm->mm_users) == 0;
> > >  }
> > >
> > > -bool hugepage_vma_check(struct vm_area_struct *vma,
> > > -                       unsigned long vm_flags)
> > > -{
> > > -       if (!transhuge_vma_enabled(vma, vm_flags))
> > > -               return false;
> > > -
> > > -       if (vm_flags & VM_NO_KHUGEPAGED)
> > > -               return false;
> > > -
> > > -       /* Don't run khugepaged against DAX vma */
> > > -       if (vma_is_dax(vma))
> > > -               return false;
> > > -
> > > -       if (vma->vm_file && !IS_ALIGNED((vma->vm_start >> PAGE_SHIFT) -
> > > -                               vma->vm_pgoff, HPAGE_PMD_NR))
> > > -               return false;
> > > -
> > > -       if (!transhuge_vma_size_ok(vma))
> > > -               return false;
> > > -
> > > -       /* Enabled via shmem mount options or sysfs settings. */
> > > -       if (shmem_file(vma->vm_file))
> > > -               return shmem_huge_enabled(vma);
> > > -
> > > -       if (!khugepaged_enabled())
> > > -               return false;
> > > -
> > > -       /* THP settings require madvise. */
> > > -       if (!(vm_flags & VM_HUGEPAGE) && !khugepaged_always())
> > > -               return false;
> > > -
> > > -       /* Only regular file is valid */
> > > -       if (file_thp_enabled(vma))
> > > -               return true;
> > > -
> > > -       if (!vma->anon_vma || !vma_is_anonymous(vma))
> > > -               return false;
> > > -       if (vma_is_temporary_stack(vma))
> > > -               return false;
> > > -
> > > -       return true;
> > > -}
> > > -
> > >  void __khugepaged_enter(struct mm_struct *mm)
> > >  {
> > >         struct mm_slot *mm_slot;
> > > @@ -516,7 +473,7 @@ void khugepaged_enter_vma(struct vm_area_struct *vma,
> > >  {
> > >         if (!test_bit(MMF_VM_HUGEPAGE, &vma->vm_mm->flags) &&
> > >             khugepaged_enabled()) {
> > > -               if (hugepage_vma_check(vma, vm_flags))
> > > +               if (hugepage_vma_check(vma, vm_flags, false))
> > >                         __khugepaged_enter(vma->vm_mm);
> > >         }
> > >  }
> > > @@ -961,7 +918,7 @@ static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address,
> > >
> > >         if (!transhuge_vma_suitable(vma, address))
> > >                 return SCAN_ADDRESS_RANGE;
> > > -       if (!hugepage_vma_check(vma, vma->vm_flags))
> > > +       if (!hugepage_vma_check(vma, vma->vm_flags, false))
> > >                 return SCAN_VMA_CHECK;
> > >         return 0;
> > >  }
> > > @@ -1442,7 +1399,7 @@ void collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr)
> > >          * the valid THP. Add extra VM_HUGEPAGE so hugepage_vma_check()
> > >          * will not fail the vma for missing VM_HUGEPAGE
> > >          */
> > > -       if (!hugepage_vma_check(vma, vma->vm_flags | VM_HUGEPAGE))
> > > +       if (!hugepage_vma_check(vma, vma->vm_flags | VM_HUGEPAGE, false))
> > >                 return;
> > >
> > >         /* Keep pmd pgtable for uffd-wp; see comment in retract_page_tables() */
> > > @@ -2132,7 +2089,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages,
> > >                         progress++;
> > >                         break;
> > >                 }
> > > -               if (!hugepage_vma_check(vma, vma->vm_flags)) {
> > > +               if (!hugepage_vma_check(vma, vma->vm_flags, false)) {
> > >  skip:
> > >                         progress++;
> > >                         continue;
> > > --
> > > 2.26.3
> > >
> > >

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [v3 PATCH 4/7] mm: khugepaged: use transhuge_vma_suitable replace open-code
  2022-06-11 21:43             ` Zach O'Keefe
@ 2022-06-14 17:40               ` Yang Shi
  0 siblings, 0 replies; 40+ messages in thread
From: Yang Shi @ 2022-06-14 17:40 UTC (permalink / raw)
  To: Zach O'Keefe
  Cc: Vlastimil Babka, Kirill A. Shutemov, Matthew Wilcox,
	Andrew Morton, Linux MM, Linux Kernel Mailing List

On Sat, Jun 11, 2022 at 2:43 PM Zach O'Keefe <zokeefe@google.com> wrote:
>
> On 10 Jun 20:25, Yang Shi wrote:
> > On Fri, Jun 10, 2022 at 5:28 PM Zach O'Keefe <zokeefe@google.com> wrote:
> > >
> > > On Fri, Jun 10, 2022 at 3:04 PM Yang Shi <shy828301@gmail.com> wrote:
> > > >
> > > > On Fri, Jun 10, 2022 at 9:59 AM Yang Shi <shy828301@gmail.com> wrote:
> > > > >
> > > > > On Thu, Jun 9, 2022 at 6:52 PM Zach O'Keefe <zokeefe@google.com> wrote:
> > > > > >
> > > > > > On Mon, Jun 6, 2022 at 2:44 PM Yang Shi <shy828301@gmail.com> wrote:
> > > > > > >
> > > > > > > The hugepage_vma_revalidate() needs to check if the address is still in
> > > > > > > the aligned HPAGE_PMD_SIZE area of the vma when reacquiring mmap_lock,
> > > > > > > but it was open-coded, use transhuge_vma_suitable() to do the job.  And
> > > > > > > add proper comments for transhuge_vma_suitable().
> > > > > > >
> > > > > > > Signed-off-by: Yang Shi <shy828301@gmail.com>
> > > > > > > ---
> > > > > > >  include/linux/huge_mm.h | 6 ++++++
> > > > > > >  mm/khugepaged.c         | 5 +----
> > > > > > >  2 files changed, 7 insertions(+), 4 deletions(-)
> > > > > > >
> > > > > > > diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> > > > > > > index a8f61db47f2a..79d5919beb83 100644
> > > > > > > --- a/include/linux/huge_mm.h
> > > > > > > +++ b/include/linux/huge_mm.h
> > > > > > > @@ -128,6 +128,12 @@ static inline bool transhuge_vma_size_ok(struct vm_area_struct *vma)
> > > > > > >         return false;
> > > > > > >  }
> > > > > > >
> > > > > > > +/*
> > > > > > > + * Do the below checks:
> > > > > > > + *   - For non-anon vma, check if the vm_pgoff is HPAGE_PMD_NR aligned.
> > > > > > > + *   - For all vmas, check if the haddr is in an aligned HPAGE_PMD_SIZE
> > > > > > > + *     area.
> > > > > > > + */
> > > > > >
> > > > > > AFAIK we aren't checking if vm_pgoff is HPAGE_PMD_NR aligned, but
> > > > > > rather that linear_page_index(vma, round_up(vma->vm_start,
> > > > > > HPAGE_PMD_SIZE)) is HPAGE_PMD_NR aligned within vma->vm_file. I was
> > > > >
> > > > > Yeah, you are right.
> > > > >
> > > > > > pretty confused about this (hopefully I have it right now - if not -
> > > > > > case and point :) ), so it might be a good opportunity to add some
> > > > > > extra commentary to help future travelers understand why this
> > > > > > constraint exists.
> > > > >
> > > > > I'm not fully sure I understand this 100%. I think this is related to
> > > > > how page cache is structured. I will try to add more comments.
> > > >
> > > > How's about "The underlying THP is always properly aligned in page
> > > > cache, but it may be across the boundary of VMA if the VMA is
> > > > misaligned, so the THP can't be PMD mapped for this case."
> > >
> > > I could certainly still be wrong / am learning here - but I *thought*
> > > the reason for this check was to make sure that the hugepage
> > > to-be-collapsed is naturally aligned within the file (since, AFAIK,
> > > without this constraint, different mm's might have different ideas
> > > about where hugepages in the file should be).
> >
> > The hugepage is definitely naturally aligned within the file, this is
> > guaranteed by how page cache is organized, you could find some example
> > code from shmem fault, for example, the below code snippet:
> >
> > hindex = round_down(index, folio_nr_pages(folio));
> > error = shmem_add_to_page_cache(folio, mapping, hindex, NULL, gfp &
> > GFP_RECLAIM_MASK, charge_mm);
> >
> > The index is actually rounded down to HPAGE_PMD_NR aligned.
>
> Thanks for the reference here.
>
> > The check in hugepage_vma_check() is used to guarantee there is an PMD
> > aligned area in the vma exactly overlapping with a PMD range in the
> > page cache. For example, you have a vma starting from 0x1000 maps to
> > the file's page offset of 0, even though you get THP for the file, it
> > can not be PMD mapped to the vma. But if it maps to the file's page
> > offset of 1, then starting from 0x200000 (assuming the vma is big
> > enough) it can PMD map the second THP in the page cache. Does it make
> > sense?
> >
>
> Yes, this makes sense - thanks for providing your insight. I think I was
> basically thinking the same thing ; except your description is more accurate
> (namely, that is *some* pmd-aligned range covered by the vma that maps to a
> hugepage-aligned offset in the file (I mistakenly took this to be the *first*
> pmd-aligned address >= vma->vm_start)).
>
> Also, with this in mind, your previous suggested comment makes sense. If I had
> to take a stab at it, I would say something like:
>
> "The hugepage is guaranteed to be hugepage-aligned within the file, but we must
> check that the PMD-aligned addresses in the VMA map to PMD-aligned offsets
> within the file, else the hugepage will not be PMD-mappable".
>
> WDYT?

Looks good to me. Thanks for the wording.

>
> > >
> > > > >
> > > > > >
> > > > > > Also I wonder while we're at it if we can rename this to
> > > > > > transhuge_addr_aligned() or transhuge_addr_suitable() or something.
> > > > >
> > > > > I think it is still actually used to check vma.
> > > > >
> > > > > >
> > > > > > Otherwise I think the change is a nice cleanup.
> > > > > >
> > > > > > >  static inline bool transhuge_vma_suitable(struct vm_area_struct *vma,
> > > > > > >                 unsigned long addr)
> > > > > > >  {
> > > > > > > diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> > > > > > > index 7a5d1c1a1833..ca1754d3a827 100644
> > > > > > > --- a/mm/khugepaged.c
> > > > > > > +++ b/mm/khugepaged.c
> > > > > > > @@ -951,7 +951,6 @@ static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address,
> > > > > > >                 struct vm_area_struct **vmap)
> > > > > > >  {
> > > > > > >         struct vm_area_struct *vma;
> > > > > > > -       unsigned long hstart, hend;
> > > > > > >
> > > > > > >         if (unlikely(khugepaged_test_exit(mm)))
> > > > > > >                 return SCAN_ANY_PROCESS;
> > > > > > > @@ -960,9 +959,7 @@ static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address,
> > > > > > >         if (!vma)
> > > > > > >                 return SCAN_VMA_NULL;
> > > > > > >
> > > > > > > -       hstart = (vma->vm_start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK;
> > > > > > > -       hend = vma->vm_end & HPAGE_PMD_MASK;
> > > > > > > -       if (address < hstart || address + HPAGE_PMD_SIZE > hend)
> > > > > > > +       if (!transhuge_vma_suitable(vma, address))
> > > > > > >                 return SCAN_ADDRESS_RANGE;
> > > > > > >         if (!hugepage_vma_check(vma, vma->vm_flags))
> > > > > > >                 return SCAN_VMA_CHECK;
> > > > > > > --
> > > > > > > 2.26.3
> > > > > > >
> > > > > > >

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [v3 PATCH 6/7] mm: thp: kill __transhuge_page_enabled()
  2022-06-13 14:54         ` Zach O'Keefe
@ 2022-06-14 18:51           ` Yang Shi
  2022-06-14 23:55             ` Zach O'Keefe
  0 siblings, 1 reply; 40+ messages in thread
From: Yang Shi @ 2022-06-14 18:51 UTC (permalink / raw)
  To: Zach O'Keefe
  Cc: Vlastimil Babka, Kirill A. Shutemov, Matthew Wilcox,
	Andrew Morton, Linux MM, Linux Kernel Mailing List

On Mon, Jun 13, 2022 at 7:54 AM Zach O'Keefe <zokeefe@google.com> wrote:
>
> On 10 Jun 14:07, Yang Shi wrote:
> > On Fri, Jun 10, 2022 at 10:24 AM Yang Shi <shy828301@gmail.com> wrote:
> > >
> > > On Thu, Jun 9, 2022 at 7:22 PM Zach O'Keefe <zokeefe@google.com> wrote:
> > > >
> > > > On Mon, Jun 6, 2022 at 2:44 PM Yang Shi <shy828301@gmail.com> wrote:
> > > > >
> > > > > The page fault path checks THP eligibility with
> > > > > __transhuge_page_enabled() which does the similar thing as
> > > > > hugepage_vma_check(), so use hugepage_vma_check() instead.
> > > > >
> > > > > However page fault allows DAX and !anon_vma cases, so added a new flag,
> > > > > in_pf, to hugepage_vma_check() to make page fault work correctly.
> > > > >
> > > > > The in_pf flag is also used to skip shmem and file THP for page fault
> > > > > since shmem handles THP in its own shmem_fault() and file THP allocation
> > > > > on fault is not supported yet.
> > > > >
> > > > > Also remove hugepage_vma_enabled() since hugepage_vma_check() is the
> > > > > only caller now, it is not necessary to have a helper function.
> > > > >
> > > > > Signed-off-by: Yang Shi <shy828301@gmail.com>
> > > > > ---
> > > > >  fs/proc/task_mmu.c         |  2 +-
> > > > >  include/linux/huge_mm.h    | 57 ++------------------------------------
> > > > >  include/linux/khugepaged.h |  2 +-
> > > > >  mm/huge_memory.c           | 25 ++++++++++++-----
> > > > >  mm/khugepaged.c            |  8 +++---
> > > > >  mm/memory.c                |  7 +++--
> > > > >  6 files changed, 31 insertions(+), 70 deletions(-)
> > > > >
> > > > > diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> > > > > index fd79566e204c..a0850303baec 100644
> > > > > --- a/fs/proc/task_mmu.c
> > > > > +++ b/fs/proc/task_mmu.c
> > > > > @@ -860,7 +860,7 @@ static int show_smap(struct seq_file *m, void *v)
> > > > >         __show_smap(m, &mss, false);
> > > > >
> > > > >         seq_printf(m, "THPeligible:    %d\n",
> > > > > -                  hugepage_vma_check(vma, vma->vm_flags, true));
> > > > > +                  hugepage_vma_check(vma, vma->vm_flags, true, false));
> > > > >
> > > > >         if (arch_pkeys_enabled())
> > > > >                 seq_printf(m, "ProtectionKey:  %8u\n", vma_pkey(vma));
> > > > > diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> > > > > index f561c3e16def..d478e8875023 100644
> > > > > --- a/include/linux/huge_mm.h
> > > > > +++ b/include/linux/huge_mm.h
> > > > > @@ -153,48 +153,6 @@ static inline bool transhuge_vma_suitable(struct vm_area_struct *vma,
> > > > >         return true;
> > > > >  }
> > > > >
> > > > > -static inline bool transhuge_vma_enabled(struct vm_area_struct *vma,
> > > > > -                                         unsigned long vm_flags)
> > > > > -{
> > > > > -       /* Explicitly disabled through madvise. */
> > > > > -       if ((vm_flags & VM_NOHUGEPAGE) ||
> > > > > -           test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags))
> > > > > -               return false;
> > > > > -       return true;
> > > > > -}
> > > > > -
> > > > > -/*
> > > > > - * to be used on vmas which are known to support THP.
> > > > > - * Use transparent_hugepage_active otherwise
> > > > > - */
> > > > > -static inline bool __transparent_hugepage_enabled(struct vm_area_struct *vma)
> > > > > -{
> > > > > -
> > > > > -       /*
> > > > > -        * If the hardware/firmware marked hugepage support disabled.
> > > > > -        */
> > > > > -       if (transparent_hugepage_flags & (1 << TRANSPARENT_HUGEPAGE_NEVER_DAX))
> > > > > -               return false;
> > > > > -
> > > > > -       if (!transhuge_vma_enabled(vma, vma->vm_flags))
> > > > > -               return false;
> > > > > -
> > > > > -       if (vma_is_temporary_stack(vma))
> > > > > -               return false;
> > > > > -
> > > > > -       if (transparent_hugepage_flags & (1 << TRANSPARENT_HUGEPAGE_FLAG))
> > > > > -               return true;
> > > > > -
> > > > > -       if (vma_is_dax(vma))
> > > > > -               return true;
> > > > > -
> > > > > -       if (transparent_hugepage_flags &
> > > > > -                               (1 << TRANSPARENT_HUGEPAGE_REQ_MADV_FLAG))
> > > > > -               return !!(vma->vm_flags & VM_HUGEPAGE);
> > > > > -
> > > > > -       return false;
> > > > > -}
> > > > > -
> > > > >  static inline bool file_thp_enabled(struct vm_area_struct *vma)
> > > > >  {
> > > > >         struct inode *inode;
> > > > > @@ -211,7 +169,7 @@ static inline bool file_thp_enabled(struct vm_area_struct *vma)
> > > > >
> > > > >  bool hugepage_vma_check(struct vm_area_struct *vma,
> > > > >                         unsigned long vm_flags,
> > > > > -                       bool smaps);
> > > > > +                       bool smaps, bool in_pf);
> > > > >
> > > > >  #define transparent_hugepage_use_zero_page()                           \
> > > > >         (transparent_hugepage_flags &                                   \
> > > > > @@ -355,11 +313,6 @@ static inline bool folio_test_pmd_mappable(struct folio *folio)
> > > > >         return false;
> > > > >  }
> > > > >
> > > > > -static inline bool __transparent_hugepage_enabled(struct vm_area_struct *vma)
> > > > > -{
> > > > > -       return false;
> > > > > -}
> > > > > -
> > > > >  static inline bool transhuge_vma_size_ok(struct vm_area_struct *vma)
> > > > >  {
> > > > >         return false;
> > > > > @@ -371,15 +324,9 @@ static inline bool transhuge_vma_suitable(struct vm_area_struct *vma,
> > > > >         return false;
> > > > >  }
> > > > >
> > > > > -static inline bool transhuge_vma_enabled(struct vm_area_struct *vma,
> > > > > -                                         unsigned long vm_flags)
> > > > > -{
> > > > > -       return false;
> > > > > -}
> > > > > -
> > > > >  static inline bool hugepage_vma_check(struct vm_area_struct *vma,
> > > > >                                        unsigned long vm_flags,
> > > > > -                                      bool smaps)
> > > > > +                                      bool smaps, bool in_pf)
> > > > >  {
> > > > >         return false;
> > > > >  }
> > > > > diff --git a/include/linux/khugepaged.h b/include/linux/khugepaged.h
> > > > > index 8a6452e089ca..e047be601268 100644
> > > > > --- a/include/linux/khugepaged.h
> > > > > +++ b/include/linux/khugepaged.h
> > > > > @@ -55,7 +55,7 @@ static inline void khugepaged_enter(struct vm_area_struct *vma,
> > > > >  {
> > > > >         if (!test_bit(MMF_VM_HUGEPAGE, &vma->vm_mm->flags) &&
> > > > >             khugepaged_enabled()) {
> > > > > -               if (hugepage_vma_check(vma, vm_flags, false))
> > > > > +               if (hugepage_vma_check(vma, vm_flags, false, false))
> > > > >                         __khugepaged_enter(vma->vm_mm);
> > > > >         }
> > > > >  }
> > > > > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > > > > index bc8370856e85..b95786ada466 100644
> > > > > --- a/mm/huge_memory.c
> > > > > +++ b/mm/huge_memory.c
> > > > > @@ -71,17 +71,25 @@ unsigned long huge_zero_pfn __read_mostly = ~0UL;
> > > > >
> > > > >  bool hugepage_vma_check(struct vm_area_struct *vma,
> > > > >                         unsigned long vm_flags,
> > > > > -                       bool smaps)
> > > > > +                       bool smaps, bool in_pf)
> > > > >  {
> > > > > -       if (!transhuge_vma_enabled(vma, vm_flags))
> > > > > +       /* Explicitly disabled through madvise or prctl. */
> > > >
> > > > Or s390 kvm (not that this has to be exhaustively maintained).
> > > >
> > > > > +       if ((vm_flags & VM_NOHUGEPAGE) ||
> > > > > +           test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags))
> > > > > +               return false;
> > > > > +       /*
> > > > > +        * If the hardware/firmware marked hugepage support disabled.
> > > > > +        */
> > > > > +       if (transparent_hugepage_flags & (1 << TRANSPARENT_HUGEPAGE_NEVER_DAX))
> > > > >                 return false;
> > > >
> > > > This introduces an extra check for khugepaged path. I don't know
> > > > enough about TRANSPARENT_HUGEPAGE_NEVER_DAX, but I assume this is ok?
> > > > What would have happened previously if khugepaged tried to collapse
> > > > this memory?
> > >
> > > Please refer to commit bae849538157 ("mm/pmem: avoid inserting
> > > hugepage PTE entry with fsdax if hugepage support is disabled") for
> > > why this flag was introduced.
> > >
> > > It is set if hardware doesn't support hugepages, and khugepaged
> > > doesn't collapse since khugepaged won't be started at all.
> > >
> > > But this flag needs to be checked in the page fault path.
> > >
>
> Thanks for the ref to the commit. I'm not sure I understand it in its entirety,
> but at least I can tell khugepaged won't be started :)
>
> > > >
> > > > > +       /* Special VMA and hugetlb VMA */
> > > > >         if (vm_flags & VM_NO_KHUGEPAGED)
> > > > >                 return false;
> > > >
> > > > This adds an extra check along the fault path. Is it also safe to add?
> > >
> > > I think it is safe since hugepage_vma_check() is just used by THP.
> > > Hugetlb has its own page fault handler.
> >
> > I just found one exception. The fuse dax has VM_MIXEDMAP set for its
> > vmas, so this check should be moved after vma_is_dax() check.
> >
> > AFAICT, only dax supports huge_fault() and dax vmas don't have any
> > VM_SPECIAL flags set other than fuse.
> >
>
> Ordering wrt VM_NO_KHUGEPAGED check seems fine. We could always use in_pf to opt
> out of this check, but I think itemizing where collapse and fault paths are
> different would be good.

Maybe using "in_pf" is easier to follow? Relying on the order of the
checks seems subtle, although we already do so for shmem (the shmem
check must be done before the hugepage flags check).
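
Just to make the two options concrete, a rough, untested sketch (not a
real patch) of how that part of hugepage_vma_check() could read:

        /* Option 1: rely on ordering -- test DAX before the special-VMA
         * check so fuse dax (which carries VM_MIXEDMAP) is not rejected
         * on the page fault path. */
        if (vma_is_dax(vma))
                return in_pf;
        if (vm_flags & VM_NO_KHUGEPAGED)
                return false;

        /* Option 2: gate the special-VMA check with in_pf explicitly, so
         * the result does not depend on where the DAX check sits. */
        if (!in_pf && (vm_flags & VM_NO_KHUGEPAGED))
                return false;
        if (vma_is_dax(vma))
                return in_pf;

Either way the fuse dax case keeps working on the fault path; the
difference is only how obvious the dependency is to a reader.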

>
> > >
> > > >
> > > > > -       /* Don't run khugepaged against DAX vma */
> > > > > +       /* khugepaged doesn't collapse DAX vma, but page fault is fine. */
> > > > >         if (vma_is_dax(vma))
> > > > > -               return false;
> > > > > +               return in_pf;
> > > >
> > > > I assume vma_is_temporary_stack() and vma_is_dax() is mutually exclusive.
> > >
> > > I think so.
> > >
> > > >
> > > > >         if (vma->vm_file && !IS_ALIGNED((vma->vm_start >> PAGE_SHIFT) -
> > > > >                                 vma->vm_pgoff, HPAGE_PMD_NR))
> > > > > @@ -91,7 +99,7 @@ bool hugepage_vma_check(struct vm_area_struct *vma,
> > > > >                 return false;
> > > > >
> > > > >         /* Enabled via shmem mount options or sysfs settings. */
> > > > > -       if (shmem_file(vma->vm_file))
> > > > > +       if (!in_pf && shmem_file(vma->vm_file))
> > > > >                 return shmem_huge_enabled(vma);
> > > >
> > > > Will shmem_file() ever be true in the fault path? Or is this just an
> > > > optimization?
> > >
> > > It could be true. But shmem has its own implementation for huge page
> > > fault and doesn't implement huge_fault() in its vm_operations, so it
> > > will fallback even though "in_pf" is not checked.
> > >
> > > But xfs does have huge_fault() implemented, so it may try to allocate
> > > THP for non-DAX xfs files. So the "in_pf" flag is introduced to handle
> > > this. Since we need this flag anyway, why not use it to return earlier
> > > for shmem instead of relying on fallback.
> > >
> > > Anyway this is all because __transparent_huge_enabled() is replaced by
> > > hugepage_vma_check().
> > >
>
> Thanks for the explanation. Admittedly I don't fully understand the involvement
> of xfs in the shmem case, but general "fail early" logic seems fine to me.
> > > > >         if (!khugepaged_enabled())
> > > > > @@ -102,7 +110,7 @@ bool hugepage_vma_check(struct vm_area_struct *vma,
> > > > >                 return false;
> > > > >
> > > > >         /* Only regular file is valid */
> > > > > -       if (file_thp_enabled(vma))
> > > > > +       if (!in_pf && file_thp_enabled(vma))
> > > > >                 return true;
> > > >
> > > > Likewise for file_thp_enabled()
> > >
> > > Yes, same as the above.
>
> Ditto.
>
> > > >
> > > > >         if (!vma_is_anonymous(vma))
> > > > > @@ -114,9 +122,12 @@ bool hugepage_vma_check(struct vm_area_struct *vma,
> > > > >         /*
> > > > >          * THPeligible bit of smaps should show 1 for proper VMAs even
> > > > >          * though anon_vma is not initialized yet.
> > > > > +        *
> > > > > +        * Allow page fault since anon_vma may be not initialized until
> > > > > +        * the first page fault.
> > > > >          */
> > > > >         if (!vma->anon_vma)
> > > > > -               return smaps;
> > > > > +               return (smaps || in_pf);
> > > > >
> > > > >         return true;
> > > > >  }
> > > > > diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> > > > > index aa0769e3b0d9..ab6183c5489f 100644
> > > > > --- a/mm/khugepaged.c
> > > > > +++ b/mm/khugepaged.c
> > > > > @@ -473,7 +473,7 @@ void khugepaged_enter_vma(struct vm_area_struct *vma,
> > > > >  {
> > > > >         if (!test_bit(MMF_VM_HUGEPAGE, &vma->vm_mm->flags) &&
> > > > >             khugepaged_enabled()) {
> > > > > -               if (hugepage_vma_check(vma, vm_flags, false))
> > > > > +               if (hugepage_vma_check(vma, vm_flags, false, false))
> > > > >                         __khugepaged_enter(vma->vm_mm);
> > > > >         }
> > > > >  }
> > > > > @@ -918,7 +918,7 @@ static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address,
> > > > >
> > > > >         if (!transhuge_vma_suitable(vma, address))
> > > > >                 return SCAN_ADDRESS_RANGE;
> > > > > -       if (!hugepage_vma_check(vma, vma->vm_flags, false))
> > > > > +       if (!hugepage_vma_check(vma, vma->vm_flags, false, false))
> > > > >                 return SCAN_VMA_CHECK;
> > > > >         return 0;
> > > > >  }
> > > > > @@ -1399,7 +1399,7 @@ void collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr)
> > > > >          * the valid THP. Add extra VM_HUGEPAGE so hugepage_vma_check()
> > > > >          * will not fail the vma for missing VM_HUGEPAGE
> > > > >          */
> > > > > -       if (!hugepage_vma_check(vma, vma->vm_flags | VM_HUGEPAGE, false))
> > > > > +       if (!hugepage_vma_check(vma, vma->vm_flags | VM_HUGEPAGE, false, false))
> > > > >                 return;
> > > > >
> > > > >         /* Keep pmd pgtable for uffd-wp; see comment in retract_page_tables() */
> > > > > @@ -2089,7 +2089,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages,
> > > > >                         progress++;
> > > > >                         break;
> > > > >                 }
> > > > > -               if (!hugepage_vma_check(vma, vma->vm_flags, false)) {
> > > > > +               if (!hugepage_vma_check(vma, vma->vm_flags, false, false)) {
> > > > >  skip:
> > > > >                         progress++;
> > > > >                         continue;
> > > > > diff --git a/mm/memory.c b/mm/memory.c
> > > > > index bc5d40eec5d5..673f7561a30a 100644
> > > > > --- a/mm/memory.c
> > > > > +++ b/mm/memory.c
> > > > > @@ -4962,6 +4962,7 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
> > > > >                 .gfp_mask = __get_fault_gfp_mask(vma),
> > > > >         };
> > > > >         struct mm_struct *mm = vma->vm_mm;
> > > > > +       unsigned long vm_flags = vma->vm_flags;
> > > > >         pgd_t *pgd;
> > > > >         p4d_t *p4d;
> > > > >         vm_fault_t ret;
> > > > > @@ -4975,7 +4976,8 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
> > > > >         if (!vmf.pud)
> > > > >                 return VM_FAULT_OOM;
> > > > >  retry_pud:
> > > > > -       if (pud_none(*vmf.pud) && __transparent_hugepage_enabled(vma)) {
> > > > > +       if (pud_none(*vmf.pud) &&
> > > > > +           hugepage_vma_check(vma, vm_flags, false, true)) {
> > > > >                 ret = create_huge_pud(&vmf);
> > > > >                 if (!(ret & VM_FAULT_FALLBACK))
> > > > >                         return ret;
> > > > > @@ -5008,7 +5010,8 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
> > > > >         if (pud_trans_unstable(vmf.pud))
> > > > >                 goto retry_pud;
> > > > >
> > > > > -       if (pmd_none(*vmf.pmd) && __transparent_hugepage_enabled(vma)) {
> > > > > +       if (pmd_none(*vmf.pmd) &&
> > > > > +           hugepage_vma_check(vma, vm_flags, false, true)) {
> > > > >                 ret = create_huge_pmd(&vmf);
> > > > >                 if (!(ret & VM_FAULT_FALLBACK))
> > > > >                         return ret;
> > > > > --
> > > > > 2.26.3
> > > > >
> > > > >

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [v3 PATCH 5/7] mm: thp: kill transparent_hugepage_active()
  2022-06-13 15:06       ` Zach O'Keefe
@ 2022-06-14 19:16         ` Yang Shi
  0 siblings, 0 replies; 40+ messages in thread
From: Yang Shi @ 2022-06-14 19:16 UTC (permalink / raw)
  To: Zach O'Keefe
  Cc: Vlastimil Babka, Kirill A. Shutemov, Matthew Wilcox,
	Andrew Morton, Linux MM, Linux Kernel Mailing List

On Mon, Jun 13, 2022 at 8:06 AM Zach O'Keefe <zokeefe@google.com> wrote:
>
> On 10 Jun 10:02, Yang Shi wrote:
> > On Thu, Jun 9, 2022 at 6:03 PM Zach O'Keefe <zokeefe@google.com> wrote:
> > >
> > > On Mon, Jun 6, 2022 at 2:44 PM Yang Shi <shy828301@gmail.com> wrote:
> > > >
> > > > The transparent_hugepage_active() was introduced to show THP eligibility
> > > > bit in smaps in proc, smaps is the only user.  But it actually does the
> > > > similar check as hugepage_vma_check() which is used by khugepaged.  We
> > > > definitely don't have to maintain two similar checks, so kill
> > > > transparent_hugepage_active().
> > >
> > > I never realized smaps was the only user! Great!
> > >
> > > > Also move hugepage_vma_check() to huge_memory.c and huge_mm.h since it
> > > > is not only for khugepaged anymore.
> > > >
> > > > Signed-off-by: Yang Shi <shy828301@gmail.com>
> > > > ---
> > > >  fs/proc/task_mmu.c         |  2 +-
> > > >  include/linux/huge_mm.h    | 16 +++++++-----
> > > >  include/linux/khugepaged.h |  4 +--
> > > >  mm/huge_memory.c           | 50 ++++++++++++++++++++++++++++++++-----
> > > >  mm/khugepaged.c            | 51 +++-----------------------------------
> > > >  5 files changed, 60 insertions(+), 63 deletions(-)
> > > >
> > > > diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> > > > index 2dd8c8a66924..fd79566e204c 100644
> > > > --- a/fs/proc/task_mmu.c
> > > > +++ b/fs/proc/task_mmu.c
> > > > @@ -860,7 +860,7 @@ static int show_smap(struct seq_file *m, void *v)
> > > >         __show_smap(m, &mss, false);
> > > >
> > > >         seq_printf(m, "THPeligible:    %d\n",
> > > > -                  transparent_hugepage_active(vma));
> > > > +                  hugepage_vma_check(vma, vma->vm_flags, true));
> > > >
> > > >         if (arch_pkeys_enabled())
> > > >                 seq_printf(m, "ProtectionKey:  %8u\n", vma_pkey(vma));
> > > > diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> > > > index 79d5919beb83..f561c3e16def 100644
> > > > --- a/include/linux/huge_mm.h
> > > > +++ b/include/linux/huge_mm.h
> > > > @@ -209,7 +209,9 @@ static inline bool file_thp_enabled(struct vm_area_struct *vma)
> > > >                !inode_is_open_for_write(inode) && S_ISREG(inode->i_mode);
> > > >  }
> > > >
> > > > -bool transparent_hugepage_active(struct vm_area_struct *vma);
> > > > +bool hugepage_vma_check(struct vm_area_struct *vma,
> > > > +                       unsigned long vm_flags,
> > > > +                       bool smaps);
> > > >
> > > >  #define transparent_hugepage_use_zero_page()                           \
> > > >         (transparent_hugepage_flags &                                   \
> > > > @@ -358,11 +360,6 @@ static inline bool __transparent_hugepage_enabled(struct vm_area_struct *vma)
> > > >         return false;
> > > >  }
> > > >
> > > > -static inline bool transparent_hugepage_active(struct vm_area_struct *vma)
> > > > -{
> > > > -       return false;
> > > > -}
> > > > -
> > > >  static inline bool transhuge_vma_size_ok(struct vm_area_struct *vma)
> > > >  {
> > > >         return false;
> > > > @@ -380,6 +377,13 @@ static inline bool transhuge_vma_enabled(struct vm_area_struct *vma,
> > > >         return false;
> > > >  }
> > > >
> > > > +static inline bool hugepage_vma_check(struct vm_area_struct *vma,
> > > > +                                      unsigned long vm_flags,
> > > > +                                      bool smaps)
> > > > +{
> > > > +       return false;
> > > > +}
> > > > +
> > > >  static inline void prep_transhuge_page(struct page *page) {}
> > > >
> > > >  #define transparent_hugepage_flags 0UL
> > > > diff --git a/include/linux/khugepaged.h b/include/linux/khugepaged.h
> > > > index 392d34c3c59a..8a6452e089ca 100644
> > > > --- a/include/linux/khugepaged.h
> > > > +++ b/include/linux/khugepaged.h
> > > > @@ -10,8 +10,6 @@ extern struct attribute_group khugepaged_attr_group;
> > > >  extern int khugepaged_init(void);
> > > >  extern void khugepaged_destroy(void);
> > > >  extern int start_stop_khugepaged(void);
> > > > -extern bool hugepage_vma_check(struct vm_area_struct *vma,
> > > > -                              unsigned long vm_flags);
> > > >  extern void __khugepaged_enter(struct mm_struct *mm);
> > > >  extern void __khugepaged_exit(struct mm_struct *mm);
> > > >  extern void khugepaged_enter_vma(struct vm_area_struct *vma,
> > > > @@ -57,7 +55,7 @@ static inline void khugepaged_enter(struct vm_area_struct *vma,
> > > >  {
> > > >         if (!test_bit(MMF_VM_HUGEPAGE, &vma->vm_mm->flags) &&
> > > >             khugepaged_enabled()) {
> > > > -               if (hugepage_vma_check(vma, vm_flags))
> > > > +               if (hugepage_vma_check(vma, vm_flags, false))
> > > >                         __khugepaged_enter(vma->vm_mm);
> > > >         }
> > > >  }
> > > > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > > > index 36ada544e494..bc8370856e85 100644
> > > > --- a/mm/huge_memory.c
> > > > +++ b/mm/huge_memory.c
> > > > @@ -69,18 +69,56 @@ static atomic_t huge_zero_refcount;
> > > >  struct page *huge_zero_page __read_mostly;
> > > >  unsigned long huge_zero_pfn __read_mostly = ~0UL;
> > > >
> > > > -bool transparent_hugepage_active(struct vm_area_struct *vma)
> > > > +bool hugepage_vma_check(struct vm_area_struct *vma,
> > > > +                       unsigned long vm_flags,
> > > > +                       bool smaps)
> > > >  {
> > > > +       if (!transhuge_vma_enabled(vma, vm_flags))
> > > > +               return false;
> > > > +
> > > > +       if (vm_flags & VM_NO_KHUGEPAGED)
> > > > +               return false;
> > > > +
> > > > +       /* Don't run khugepaged against DAX vma */
> > > > +       if (vma_is_dax(vma))
> > > > +               return false;
> > > > +
> > > > +       if (vma->vm_file && !IS_ALIGNED((vma->vm_start >> PAGE_SHIFT) -
> > > > +                               vma->vm_pgoff, HPAGE_PMD_NR))
> > > > +               return false;
> > > > +
> > > >         if (!transhuge_vma_size_ok(vma))
> > > >                 return false;
>
> I know we just introduced transhuge_vma_size_ok(), but is there a way to
> consolidate the above two checks into a single transhuge_vma_suitable(), the
> same way it used to be done in transparent_hugepage_active()? I.e.
>
> transhuge_vma_suitable(vma, vma->vm_end - HPAGE_PMD_SIZE).
>
> Which checks if the vma can hold an aligned hugepage, as well as centralizes
> the (what I think to be) complicated file mapping check.

Good point. Thanks for the suggestion. Actually
transhuge_vma_size_ok() is only called by hugepage_vma_check(), and
hugepage_vma_check() already checks the alignment for file vmas too, so
I think they could be consolidated into one function as you suggested.
That should also help keep the THPeligible bit the same throughout the
series.
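
For reference, an untested sketch of the consolidation (the real
transhuge_vma_suitable() may differ in details; this just folds the two
checks quoted above into one place):

static inline bool transhuge_vma_suitable(struct vm_area_struct *vma,
                                          unsigned long addr)
{
        unsigned long haddr = addr & HPAGE_PMD_MASK;

        /* File-backed vmas must keep pgoff aligned to HPAGE_PMD_NR. */
        if (vma->vm_file && !IS_ALIGNED((vma->vm_start >> PAGE_SHIFT) -
                                vma->vm_pgoff, HPAGE_PMD_NR))
                return false;

        /* The aligned hugepage around addr must fit inside the vma. */
        return haddr >= vma->vm_start && haddr + HPAGE_PMD_SIZE <= vma->vm_end;
}

so hugepage_vma_check() would just do:

        if (!transhuge_vma_suitable(vma, vma->vm_end - HPAGE_PMD_SIZE))
                return false;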

>
> > > > -       if (vma_is_anonymous(vma))
> > > > -               return __transparent_hugepage_enabled(vma);
> > > > -       if (vma_is_shmem(vma))
> > > > +
> > > > +       /* Enabled via shmem mount options or sysfs settings. */
> > > > +       if (shmem_file(vma->vm_file))
> > > >                 return shmem_huge_enabled(vma);
> > > > -       if (transhuge_vma_enabled(vma, vma->vm_flags) && file_thp_enabled(vma))
> > > > +
> > > > +       if (!khugepaged_enabled())
> > > > +               return false;
> > > > +
> > > > +       /* THP settings require madvise. */
> > > > +       if (!(vm_flags & VM_HUGEPAGE) && !khugepaged_always())
> > > > +               return false;
> > > > +
> > > > +       /* Only regular file is valid */
> > > > +       if (file_thp_enabled(vma))
> > > >                 return true;
> > > >
> > > > -       return false;
> > > > +       if (!vma_is_anonymous(vma))
> > > > +               return false;
> > > > +
> > > > +       if (vma_is_temporary_stack(vma))
> > > > +               return false;
> > > > +
> > > > +       /*
> > > > +        * THPeligible bit of smaps should show 1 for proper VMAs even
> > > > +        * though anon_vma is not initialized yet.
> > > > +        */
> > > > +       if (!vma->anon_vma)
> > > > +               return smaps;
> > > > +
> > > > +       return true;
> > > >  }
> > >
> > > There are a few cases where the return value for smaps will be
> > > different from before. I presume this won't be an issue, and that any
> > > difference resulting from this change is actually a positive
> > > difference, given it more accurately reflects the thp eligibility of
> > > the vma? For example, a VM_NO_KHUGEPAGED-marked vma might now show 0
> > > where it otherwise showed 1.
> >
> > Yes, returning 1 for VM_NO_KHUGEPAGED vmas is wrong. Actually TBH I
> > suspect very few people actually use this bit. Anyway I will elaborate
> > this in the commit log.
> >
> > >
> > > >  static bool get_huge_zero_page(void)
> > > > diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> > > > index ca1754d3a827..aa0769e3b0d9 100644
> > > > --- a/mm/khugepaged.c
> > > > +++ b/mm/khugepaged.c
> > > > @@ -437,49 +437,6 @@ static inline int khugepaged_test_exit(struct mm_struct *mm)
> > > >         return atomic_read(&mm->mm_users) == 0;
> > > >  }
> > > >
> > > > -bool hugepage_vma_check(struct vm_area_struct *vma,
> > > > -                       unsigned long vm_flags)
> > > > -{
> > > > -       if (!transhuge_vma_enabled(vma, vm_flags))
> > > > -               return false;
> > > > -
> > > > -       if (vm_flags & VM_NO_KHUGEPAGED)
> > > > -               return false;
> > > > -
> > > > -       /* Don't run khugepaged against DAX vma */
> > > > -       if (vma_is_dax(vma))
> > > > -               return false;
> > > > -
> > > > -       if (vma->vm_file && !IS_ALIGNED((vma->vm_start >> PAGE_SHIFT) -
> > > > -                               vma->vm_pgoff, HPAGE_PMD_NR))
> > > > -               return false;
> > > > -
> > > > -       if (!transhuge_vma_size_ok(vma))
> > > > -               return false;
> > > > -
> > > > -       /* Enabled via shmem mount options or sysfs settings. */
> > > > -       if (shmem_file(vma->vm_file))
> > > > -               return shmem_huge_enabled(vma);
> > > > -
> > > > -       if (!khugepaged_enabled())
> > > > -               return false;
> > > > -
> > > > -       /* THP settings require madvise. */
> > > > -       if (!(vm_flags & VM_HUGEPAGE) && !khugepaged_always())
> > > > -               return false;
> > > > -
> > > > -       /* Only regular file is valid */
> > > > -       if (file_thp_enabled(vma))
> > > > -               return true;
> > > > -
> > > > -       if (!vma->anon_vma || !vma_is_anonymous(vma))
> > > > -               return false;
> > > > -       if (vma_is_temporary_stack(vma))
> > > > -               return false;
> > > > -
> > > > -       return true;
> > > > -}
> > > > -
> > > >  void __khugepaged_enter(struct mm_struct *mm)
> > > >  {
> > > >         struct mm_slot *mm_slot;
> > > > @@ -516,7 +473,7 @@ void khugepaged_enter_vma(struct vm_area_struct *vma,
> > > >  {
> > > >         if (!test_bit(MMF_VM_HUGEPAGE, &vma->vm_mm->flags) &&
> > > >             khugepaged_enabled()) {
> > > > -               if (hugepage_vma_check(vma, vm_flags))
> > > > +               if (hugepage_vma_check(vma, vm_flags, false))
> > > >                         __khugepaged_enter(vma->vm_mm);
> > > >         }
> > > >  }
> > > > @@ -961,7 +918,7 @@ static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address,
> > > >
> > > >         if (!transhuge_vma_suitable(vma, address))
> > > >                 return SCAN_ADDRESS_RANGE;
> > > > -       if (!hugepage_vma_check(vma, vma->vm_flags))
> > > > +       if (!hugepage_vma_check(vma, vma->vm_flags, false))
> > > >                 return SCAN_VMA_CHECK;
> > > >         return 0;
> > > >  }
> > > > @@ -1442,7 +1399,7 @@ void collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr)
> > > >          * the valid THP. Add extra VM_HUGEPAGE so hugepage_vma_check()
> > > >          * will not fail the vma for missing VM_HUGEPAGE
> > > >          */
> > > > -       if (!hugepage_vma_check(vma, vma->vm_flags | VM_HUGEPAGE))
> > > > +       if (!hugepage_vma_check(vma, vma->vm_flags | VM_HUGEPAGE, false))
> > > >                 return;
> > > >
> > > >         /* Keep pmd pgtable for uffd-wp; see comment in retract_page_tables() */
> > > > @@ -2132,7 +2089,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages,
> > > >                         progress++;
> > > >                         break;
> > > >                 }
> > > > -               if (!hugepage_vma_check(vma, vma->vm_flags)) {
> > > > +               if (!hugepage_vma_check(vma, vma->vm_flags, false)) {
> > > >  skip:
> > > >                         progress++;
> > > >                         continue;
> > > > --
> > > > 2.26.3
> > > >
> > > >

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [v3 PATCH 6/7] mm: thp: kill __transhuge_page_enabled()
  2022-06-14 18:51           ` Yang Shi
@ 2022-06-14 23:55             ` Zach O'Keefe
  0 siblings, 0 replies; 40+ messages in thread
From: Zach O'Keefe @ 2022-06-14 23:55 UTC (permalink / raw)
  To: Yang Shi
  Cc: Vlastimil Babka, Kirill A. Shutemov, Matthew Wilcox,
	Andrew Morton, Linux MM, Linux Kernel Mailing List

On 14 Jun 11:51, Yang Shi wrote:
> On Mon, Jun 13, 2022 at 7:54 AM Zach O'Keefe <zokeefe@google.com> wrote:
> >
> > On 10 Jun 14:07, Yang Shi wrote:
> > > On Fri, Jun 10, 2022 at 10:24 AM Yang Shi <shy828301@gmail.com> wrote:
> > > >
> > > > On Thu, Jun 9, 2022 at 7:22 PM Zach O'Keefe <zokeefe@google.com> wrote:
> > > > >
> > > > > On Mon, Jun 6, 2022 at 2:44 PM Yang Shi <shy828301@gmail.com> wrote:
> > > > > >
> > > > > > The page fault path checks THP eligibility with
> > > > > > __transhuge_page_enabled() which does the similar thing as
> > > > > > hugepage_vma_check(), so use hugepage_vma_check() instead.
> > > > > >
> > > > > > However page fault allows DAX and !anon_vma cases, so added a new flag,
> > > > > > in_pf, to hugepage_vma_check() to make page fault work correctly.
> > > > > >
> > > > > > The in_pf flag is also used to skip shmem and file THP for page fault
> > > > > > since shmem handles THP in its own shmem_fault() and file THP allocation
> > > > > > on fault is not supported yet.
> > > > > >
> > > > > > Also remove hugepage_vma_enabled() since hugepage_vma_check() is the
> > > > > > only caller now, it is not necessary to have a helper function.
> > > > > >
> > > > > > Signed-off-by: Yang Shi <shy828301@gmail.com>
> > > > > > ---
> > > > > >  fs/proc/task_mmu.c         |  2 +-
> > > > > >  include/linux/huge_mm.h    | 57 ++------------------------------------
> > > > > >  include/linux/khugepaged.h |  2 +-
> > > > > >  mm/huge_memory.c           | 25 ++++++++++++-----
> > > > > >  mm/khugepaged.c            |  8 +++---
> > > > > >  mm/memory.c                |  7 +++--
> > > > > >  6 files changed, 31 insertions(+), 70 deletions(-)
> > > > > >
> > > > > > diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> > > > > > index fd79566e204c..a0850303baec 100644
> > > > > > --- a/fs/proc/task_mmu.c
> > > > > > +++ b/fs/proc/task_mmu.c
> > > > > > @@ -860,7 +860,7 @@ static int show_smap(struct seq_file *m, void *v)
> > > > > >         __show_smap(m, &mss, false);
> > > > > >
> > > > > >         seq_printf(m, "THPeligible:    %d\n",
> > > > > > -                  hugepage_vma_check(vma, vma->vm_flags, true));
> > > > > > +                  hugepage_vma_check(vma, vma->vm_flags, true, false));
> > > > > >
> > > > > >         if (arch_pkeys_enabled())
> > > > > >                 seq_printf(m, "ProtectionKey:  %8u\n", vma_pkey(vma));
> > > > > > diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> > > > > > index f561c3e16def..d478e8875023 100644
> > > > > > --- a/include/linux/huge_mm.h
> > > > > > +++ b/include/linux/huge_mm.h
> > > > > > @@ -153,48 +153,6 @@ static inline bool transhuge_vma_suitable(struct vm_area_struct *vma,
> > > > > >         return true;
> > > > > >  }
> > > > > >
> > > > > > -static inline bool transhuge_vma_enabled(struct vm_area_struct *vma,
> > > > > > -                                         unsigned long vm_flags)
> > > > > > -{
> > > > > > -       /* Explicitly disabled through madvise. */
> > > > > > -       if ((vm_flags & VM_NOHUGEPAGE) ||
> > > > > > -           test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags))
> > > > > > -               return false;
> > > > > > -       return true;
> > > > > > -}
> > > > > > -
> > > > > > -/*
> > > > > > - * to be used on vmas which are known to support THP.
> > > > > > - * Use transparent_hugepage_active otherwise
> > > > > > - */
> > > > > > -static inline bool __transparent_hugepage_enabled(struct vm_area_struct *vma)
> > > > > > -{
> > > > > > -
> > > > > > -       /*
> > > > > > -        * If the hardware/firmware marked hugepage support disabled.
> > > > > > -        */
> > > > > > -       if (transparent_hugepage_flags & (1 << TRANSPARENT_HUGEPAGE_NEVER_DAX))
> > > > > > -               return false;
> > > > > > -
> > > > > > -       if (!transhuge_vma_enabled(vma, vma->vm_flags))
> > > > > > -               return false;
> > > > > > -
> > > > > > -       if (vma_is_temporary_stack(vma))
> > > > > > -               return false;
> > > > > > -
> > > > > > -       if (transparent_hugepage_flags & (1 << TRANSPARENT_HUGEPAGE_FLAG))
> > > > > > -               return true;
> > > > > > -
> > > > > > -       if (vma_is_dax(vma))
> > > > > > -               return true;
> > > > > > -
> > > > > > -       if (transparent_hugepage_flags &
> > > > > > -                               (1 << TRANSPARENT_HUGEPAGE_REQ_MADV_FLAG))
> > > > > > -               return !!(vma->vm_flags & VM_HUGEPAGE);
> > > > > > -
> > > > > > -       return false;
> > > > > > -}
> > > > > > -
> > > > > >  static inline bool file_thp_enabled(struct vm_area_struct *vma)
> > > > > >  {
> > > > > >         struct inode *inode;
> > > > > > @@ -211,7 +169,7 @@ static inline bool file_thp_enabled(struct vm_area_struct *vma)
> > > > > >
> > > > > >  bool hugepage_vma_check(struct vm_area_struct *vma,
> > > > > >                         unsigned long vm_flags,
> > > > > > -                       bool smaps);
> > > > > > +                       bool smaps, bool in_pf);
> > > > > >
> > > > > >  #define transparent_hugepage_use_zero_page()                           \
> > > > > >         (transparent_hugepage_flags &                                   \
> > > > > > @@ -355,11 +313,6 @@ static inline bool folio_test_pmd_mappable(struct folio *folio)
> > > > > >         return false;
> > > > > >  }
> > > > > >
> > > > > > -static inline bool __transparent_hugepage_enabled(struct vm_area_struct *vma)
> > > > > > -{
> > > > > > -       return false;
> > > > > > -}
> > > > > > -
> > > > > >  static inline bool transhuge_vma_size_ok(struct vm_area_struct *vma)
> > > > > >  {
> > > > > >         return false;
> > > > > > @@ -371,15 +324,9 @@ static inline bool transhuge_vma_suitable(struct vm_area_struct *vma,
> > > > > >         return false;
> > > > > >  }
> > > > > >
> > > > > > -static inline bool transhuge_vma_enabled(struct vm_area_struct *vma,
> > > > > > -                                         unsigned long vm_flags)
> > > > > > -{
> > > > > > -       return false;
> > > > > > -}
> > > > > > -
> > > > > >  static inline bool hugepage_vma_check(struct vm_area_struct *vma,
> > > > > >                                        unsigned long vm_flags,
> > > > > > -                                      bool smaps)
> > > > > > +                                      bool smaps, bool in_pf)
> > > > > >  {
> > > > > >         return false;
> > > > > >  }
> > > > > > diff --git a/include/linux/khugepaged.h b/include/linux/khugepaged.h
> > > > > > index 8a6452e089ca..e047be601268 100644
> > > > > > --- a/include/linux/khugepaged.h
> > > > > > +++ b/include/linux/khugepaged.h
> > > > > > @@ -55,7 +55,7 @@ static inline void khugepaged_enter(struct vm_area_struct *vma,
> > > > > >  {
> > > > > >         if (!test_bit(MMF_VM_HUGEPAGE, &vma->vm_mm->flags) &&
> > > > > >             khugepaged_enabled()) {
> > > > > > -               if (hugepage_vma_check(vma, vm_flags, false))
> > > > > > +               if (hugepage_vma_check(vma, vm_flags, false, false))
> > > > > >                         __khugepaged_enter(vma->vm_mm);
> > > > > >         }
> > > > > >  }
> > > > > > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > > > > > index bc8370856e85..b95786ada466 100644
> > > > > > --- a/mm/huge_memory.c
> > > > > > +++ b/mm/huge_memory.c
> > > > > > @@ -71,17 +71,25 @@ unsigned long huge_zero_pfn __read_mostly = ~0UL;
> > > > > >
> > > > > >  bool hugepage_vma_check(struct vm_area_struct *vma,
> > > > > >                         unsigned long vm_flags,
> > > > > > -                       bool smaps)
> > > > > > +                       bool smaps, bool in_pf)
> > > > > >  {
> > > > > > -       if (!transhuge_vma_enabled(vma, vm_flags))
> > > > > > +       /* Explicitly disabled through madvise or prctl. */
> > > > >
> > > > > Or s390 kvm (not that this has to be exhaustively maintained).
> > > > >
> > > > > > +       if ((vm_flags & VM_NOHUGEPAGE) ||
> > > > > > +           test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags))
> > > > > > +               return false;
> > > > > > +       /*
> > > > > > +        * If the hardware/firmware marked hugepage support disabled.
> > > > > > +        */
> > > > > > +       if (transparent_hugepage_flags & (1 << TRANSPARENT_HUGEPAGE_NEVER_DAX))
> > > > > >                 return false;
> > > > >
> > > > > This introduces an extra check for khugepaged path. I don't know
> > > > > enough about TRANSPARENT_HUGEPAGE_NEVER_DAX, but I assume this is ok?
> > > > > What would have happened previously if khugepaged tried to collapse
> > > > > this memory?
> > > >
> > > > Please refer to commit bae849538157 ("mm/pmem: avoid inserting
> > > > hugepage PTE entry with fsdax if hugepage support is disabled") for
> > > > why this flag was introduced.
> > > >
> > > > It is set if hardware doesn't support hugepages, and khugepaged
> > > > doesn't collapse since khugepaged won't be started at all.
> > > >
> > > > But this flag needs to be checked in the page fault path.
> > > >
> >
> > Thanks for the ref to the commit. I'm not sure I understand it in its entirety,
> > but at least I can tell khugepaged won't be started :)
> >
> > > > >
> > > > > > +       /* Special VMA and hugetlb VMA */
> > > > > >         if (vm_flags & VM_NO_KHUGEPAGED)
> > > > > >                 return false;
> > > > >
> > > > > This adds an extra check along the fault path. Is it also safe to add?
> > > >
> > > > I think it is safe since hugepage_vma_check() is just used by THP.
> > > > Hugetlb has its own page fault handler.
> > >
> > > I just found one exception. The fuse dax has VM_MIXEDMAP set for its
> > > vmas, so this check should be moved after vma_is_dax() check.
> > >
> > > AFAICT, only dax supports huge_fault() and dax vmas don't have any
> > > VM_SPECIAL flags set other than fuse.
> > >
> >
> > Ordering wrt VM_NO_KHUGEPAGED check seems fine. We could always use in_pf to opt
> > out of this check, but I think itemizing where collapse and fault paths are
> > different would be good.
> 
> Maybe using "in_pf" is easier to follow? Relying on the order of the
> checks seems subtle, although we already do so for shmem (the shmem
> check must be done before the hugepage flags check).

Ya, I think the subtleties wrt ordering are getting tricky. I'm OK either way -
but we should document some of these ordering subtleties lest someone
inadvertently changes something in the future and has to rediscover e.g. that
particular VM_MIXEDMAP case.
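
For example, a sketch of how such a comment could read at the top of the
checks (wording obviously up for debate):

        /*
         * Order of checks matters here:
         *
         * - vma_is_dax() must come before the VM_NO_KHUGEPAGED test,
         *   since fuse dax vmas carry VM_MIXEDMAP (part of VM_SPECIAL)
         *   and would otherwise be rejected on the page fault path.
         * - shmem_file() must come before the khugepaged_enabled() and
         *   hugepage flag checks so shmem honours its own mount options.
         */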

> >
> > > >
> > > > >
> > > > > > -       /* Don't run khugepaged against DAX vma */
> > > > > > +       /* khugepaged doesn't collapse DAX vma, but page fault is fine. */
> > > > > >         if (vma_is_dax(vma))
> > > > > > -               return false;
> > > > > > +               return in_pf;
> > > > >
> > > > > I assume vma_is_temporary_stack() and vma_is_dax() is mutually exclusive.
> > > >
> > > > I think so.
> > > >
> > > > >
> > > > > >         if (vma->vm_file && !IS_ALIGNED((vma->vm_start >> PAGE_SHIFT) -
> > > > > >                                 vma->vm_pgoff, HPAGE_PMD_NR))
> > > > > > @@ -91,7 +99,7 @@ bool hugepage_vma_check(struct vm_area_struct *vma,
> > > > > >                 return false;
> > > > > >
> > > > > >         /* Enabled via shmem mount options or sysfs settings. */
> > > > > > -       if (shmem_file(vma->vm_file))
> > > > > > +       if (!in_pf && shmem_file(vma->vm_file))
> > > > > >                 return shmem_huge_enabled(vma);
> > > > >
> > > > > Will shmem_file() ever be true in the fault path? Or is this just an
> > > > > optimization?
> > > >
> > > > It could be true. But shmem has its own implementation for huge page
> > > > fault and doesn't implement huge_fault() in its vm_operations, so it
> > > > will fallback even though "in_pf" is not checked.
> > > >
> > > > But xfs does have huge_fault() implemented, so it may try to allocate
> > > > THP for non-DAX xfs files. So the "in_pf" flag is introduced to handle
> > > > this. Since we need this flag anyway, why not use it to return earlier
> > > > for shmem instead of relying on fallback.
> > > >
> > > > Anyway this is all because __transparent_huge_enabled() is replaced by
> > > > hugepage_vma_check().
> > > >
> >
> > Thanks for the explanation. Admittedly I don't fully understand the involvement
> > of xfs in the shmem case, but general "fail early" logic seems fine to me.
> > > > > >         if (!khugepaged_enabled())
> > > > > > @@ -102,7 +110,7 @@ bool hugepage_vma_check(struct vm_area_struct *vma,
> > > > > >                 return false;
> > > > > >
> > > > > >         /* Only regular file is valid */
> > > > > > -       if (file_thp_enabled(vma))
> > > > > > +       if (!in_pf && file_thp_enabled(vma))
> > > > > >                 return true;
> > > > >
> > > > > Likewise for file_thp_enabled()
> > > >
> > > > Yes, same as the above.
> >
> > Ditto.
> >
> > > > >
> > > > > >         if (!vma_is_anonymous(vma))
> > > > > > @@ -114,9 +122,12 @@ bool hugepage_vma_check(struct vm_area_struct *vma,
> > > > > >         /*
> > > > > >          * THPeligible bit of smaps should show 1 for proper VMAs even
> > > > > >          * though anon_vma is not initialized yet.
> > > > > > +        *
> > > > > > +        * Allow page fault since anon_vma may be not initialized until
> > > > > > +        * the first page fault.
> > > > > >          */
> > > > > >         if (!vma->anon_vma)
> > > > > > -               return smaps;
> > > > > > +               return (smaps || in_pf);
> > > > > >
> > > > > >         return true;
> > > > > >  }
> > > > > > diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> > > > > > index aa0769e3b0d9..ab6183c5489f 100644
> > > > > > --- a/mm/khugepaged.c
> > > > > > +++ b/mm/khugepaged.c
> > > > > > @@ -473,7 +473,7 @@ void khugepaged_enter_vma(struct vm_area_struct *vma,
> > > > > >  {
> > > > > >         if (!test_bit(MMF_VM_HUGEPAGE, &vma->vm_mm->flags) &&
> > > > > >             khugepaged_enabled()) {
> > > > > > -               if (hugepage_vma_check(vma, vm_flags, false))
> > > > > > +               if (hugepage_vma_check(vma, vm_flags, false, false))
> > > > > >                         __khugepaged_enter(vma->vm_mm);
> > > > > >         }
> > > > > >  }
> > > > > > @@ -918,7 +918,7 @@ static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address,
> > > > > >
> > > > > >         if (!transhuge_vma_suitable(vma, address))
> > > > > >                 return SCAN_ADDRESS_RANGE;
> > > > > > -       if (!hugepage_vma_check(vma, vma->vm_flags, false))
> > > > > > +       if (!hugepage_vma_check(vma, vma->vm_flags, false, false))
> > > > > >                 return SCAN_VMA_CHECK;
> > > > > >         return 0;
> > > > > >  }
> > > > > > @@ -1399,7 +1399,7 @@ void collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr)
> > > > > >          * the valid THP. Add extra VM_HUGEPAGE so hugepage_vma_check()
> > > > > >          * will not fail the vma for missing VM_HUGEPAGE
> > > > > >          */
> > > > > > -       if (!hugepage_vma_check(vma, vma->vm_flags | VM_HUGEPAGE, false))
> > > > > > +       if (!hugepage_vma_check(vma, vma->vm_flags | VM_HUGEPAGE, false, false))
> > > > > >                 return;
> > > > > >
> > > > > >         /* Keep pmd pgtable for uffd-wp; see comment in retract_page_tables() */
> > > > > > @@ -2089,7 +2089,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages,
> > > > > >                         progress++;
> > > > > >                         break;
> > > > > >                 }
> > > > > > -               if (!hugepage_vma_check(vma, vma->vm_flags, false)) {
> > > > > > +               if (!hugepage_vma_check(vma, vma->vm_flags, false, false)) {
> > > > > >  skip:
> > > > > >                         progress++;
> > > > > >                         continue;
> > > > > > diff --git a/mm/memory.c b/mm/memory.c
> > > > > > index bc5d40eec5d5..673f7561a30a 100644
> > > > > > --- a/mm/memory.c
> > > > > > +++ b/mm/memory.c
> > > > > > @@ -4962,6 +4962,7 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
> > > > > >                 .gfp_mask = __get_fault_gfp_mask(vma),
> > > > > >         };
> > > > > >         struct mm_struct *mm = vma->vm_mm;
> > > > > > +       unsigned long vm_flags = vma->vm_flags;
> > > > > >         pgd_t *pgd;
> > > > > >         p4d_t *p4d;
> > > > > >         vm_fault_t ret;
> > > > > > @@ -4975,7 +4976,8 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
> > > > > >         if (!vmf.pud)
> > > > > >                 return VM_FAULT_OOM;
> > > > > >  retry_pud:
> > > > > > -       if (pud_none(*vmf.pud) && __transparent_hugepage_enabled(vma)) {
> > > > > > +       if (pud_none(*vmf.pud) &&
> > > > > > +           hugepage_vma_check(vma, vm_flags, false, true)) {
> > > > > >                 ret = create_huge_pud(&vmf);
> > > > > >                 if (!(ret & VM_FAULT_FALLBACK))
> > > > > >                         return ret;
> > > > > > @@ -5008,7 +5010,8 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
> > > > > >         if (pud_trans_unstable(vmf.pud))
> > > > > >                 goto retry_pud;
> > > > > >
> > > > > > -       if (pmd_none(*vmf.pmd) && __transparent_hugepage_enabled(vma)) {
> > > > > > +       if (pmd_none(*vmf.pmd) &&
> > > > > > +           hugepage_vma_check(vma, vm_flags, false, true)) {
> > > > > >                 ret = create_huge_pmd(&vmf);
> > > > > >                 if (!(ret & VM_FAULT_FALLBACK))
> > > > > >                         return ret;
> > > > > > --
> > > > > > 2.26.3
> > > > > >
> > > > > >

^ permalink raw reply	[flat|nested] 40+ messages in thread

end of thread, other threads:[~2022-06-14 23:55 UTC | newest]

Thread overview: 40+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-06-06 21:44 [mm-unstable v3 PATCH 0/7] Cleanup transhuge_xxx helpers Yang Shi
2022-06-06 21:44 ` [v3 PATCH 1/7] mm: khugepaged: check THP flag in hugepage_vma_check() Yang Shi
2022-06-09 17:49   ` Zach O'Keefe
2022-06-10  7:09   ` Miaohe Lin
2022-06-06 21:44 ` [v3 PATCH 2/7] mm: thp: introduce transhuge_vma_size_ok() helper Yang Shi
2022-06-09 22:21   ` Zach O'Keefe
2022-06-10  0:08     ` Yang Shi
2022-06-10  0:51       ` Zach O'Keefe
2022-06-10 16:38         ` Yang Shi
2022-06-10 21:24           ` Yang Shi
2022-06-10  7:20   ` Miaohe Lin
2022-06-10 16:47     ` Yang Shi
2022-06-06 21:44 ` [v3 PATCH 3/7] mm: khugepaged: remove the redundant anon vma check Yang Shi
2022-06-09 23:23   ` Zach O'Keefe
2022-06-10  0:01     ` Yang Shi
2022-06-10  7:23   ` Miaohe Lin
2022-06-10  7:28     ` Miaohe Lin
2022-06-06 21:44 ` [v3 PATCH 4/7] mm: khugepaged: use transhuge_vma_suitable replace open-code Yang Shi
2022-06-10  1:51   ` Zach O'Keefe
2022-06-10 16:59     ` Yang Shi
2022-06-10 22:03       ` Yang Shi
2022-06-11  0:27         ` Zach O'Keefe
2022-06-11  3:25           ` Yang Shi
2022-06-11 21:43             ` Zach O'Keefe
2022-06-14 17:40               ` Yang Shi
2022-06-06 21:44 ` [v3 PATCH 5/7] mm: thp: kill transparent_hugepage_active() Yang Shi
2022-06-10  1:02   ` Zach O'Keefe
2022-06-10 17:02     ` Yang Shi
2022-06-13 15:06       ` Zach O'Keefe
2022-06-14 19:16         ` Yang Shi
2022-06-06 21:44 ` [v3 PATCH 6/7] mm: thp: kill __transhuge_page_enabled() Yang Shi
2022-06-10  2:22   ` Zach O'Keefe
2022-06-10 17:24     ` Yang Shi
2022-06-10 21:07       ` Yang Shi
2022-06-13 14:54         ` Zach O'Keefe
2022-06-14 18:51           ` Yang Shi
2022-06-14 23:55             ` Zach O'Keefe
2022-06-06 21:44 ` [v3 PATCH 7/7] mm: khugepaged: reorg some khugepaged helpers Yang Shi
2022-06-09 23:32 ` [mm-unstable v3 PATCH 0/7] Cleanup transhuge_xxx helpers Zach O'Keefe
2022-06-10  7:08 ` Miaohe Lin
