* [v3 PATCH 0/8] Make khugepaged collapse readonly FS THP more consistent
@ 2022-04-04 20:02 Yang Shi
  2022-04-04 20:02 ` [v3 PATCH 1/8] sched: coredump.h: clarify the use of MMF_VM_HUGEPAGE Yang Shi
                   ` (9 more replies)
  0 siblings, 10 replies; 27+ messages in thread
From: Yang Shi @ 2022-04-04 20:02 UTC (permalink / raw)
  To: vbabka, kirill.shutemov, linmiaohe, songliubraving, riel, willy,
	ziy, tytso, akpm
  Cc: shy828301, linux-mm, linux-fsdevel, linux-kernel


Changelog
v3: * Register mm to khugepaged in the common mmap path instead of
      touching filesystem code (patch 8/8), as suggested by Ted.
    * New patch 7/8 cleans up and renames khugepaged_enter_vma_merge()
      to khugepaged_enter_vma().
    * Collected acked-by from Song Liu for patches 1 ~ 6.
    * Rebased on top of 5.18-rc1.
    * Excluded the linux-xfs and linux-ext4 lists since the series doesn't
      touch fs code anymore, but kept linux-fsdevel posted.
v2: * Collected reviewed-by tags from Miaohe Lin.
    * Fixed build error for patch 4/8.

The readonly FS THP relies on khugepaged to collapse THP for suitable
vmas.  But it is kind of "random luck" for khugepaged to see the
readonly FS vmas (see report: https://lore.kernel.org/linux-mm/00f195d4-d039-3cf2-d3a1-a2c88de397a0@suse.cz/) since currently the vmas are registered to khugepaged when:
  - Anon huge pmd page fault
  - VMA merge
  - MADV_HUGEPAGE
  - Shmem mmap

If the above conditions are not met, even though khugepaged is enabled
it won't see readonly FS vmas at all.  MADV_HUGEPAGE could be specified
explicitly to tell khugepaged to collapse this area, but when khugepaged
mode is "always" it should scan suitable vmas as long as VM_NOHUGEPAGE
is not set.

So make sure readonly FS vmas are registered to khugepaged to make the
behavior more consistent.

Registering suitable vmas in the common mmap path covers both readonly
FS vmas and shmem vmas, so the khugepaged calls in shmem.c are removed.

Patches 1 ~ 7 are minor bug fixes, cleanups and preparation patches.
Patch 8 is the real meat.


Tested with the khugepaged test in selftests and the testcase provided by
Vlastimil Babka in https://lore.kernel.org/lkml/df3b5d1c-a36b-2c73-3e27-99e74983de3a@suse.cz/
with the MADV_HUGEPAGE call commented out.
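
For reference, a rough userspace sketch of the scenario being exercised
(this is not the actual selftest; the file path is just a placeholder and
it assumes a regular file, ideally a few MB large, that is not open for
write anywhere): map it read-only with PROT_EXEC, deliberately skip
MADV_HUGEPAGE, fault it in, and let khugepaged (mode "always") find the
vma.

/* rothp-scenario.c -- illustrative only, not the real testcase */
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	const char *path = argc > 1 ? argv[1] : "/tmp/testfile";
	size_t pagesz = (size_t)getpagesize();
	struct stat st;
	int fd;
	char *p;

	fd = open(path, O_RDONLY);
	if (fd < 0 || fstat(fd, &st) < 0) {
		perror(path);
		return 1;
	}

	/* map whole pages only so touching the mapping can't SIGBUS */
	size_t len = (size_t)st.st_size & ~(pagesz - 1);
	if (!len) {
		fprintf(stderr, "%s: file too small\n", path);
		return 1;
	}

	/* read-only executable file mapping: the READ_ONLY_THP_FOR_FS case */
	p = mmap(NULL, len, PROT_READ | PROT_EXEC, MAP_PRIVATE, fd, 0);
	if (p == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	/* intentionally no madvise(p, len, MADV_HUGEPAGE) here */
	volatile char sum = 0;
	for (size_t off = 0; off < len; off += pagesz)
		sum += p[off];		/* fault the pages in */
	(void)sum;

	sleep(60);	/* give khugepaged time; check FilePmdMapped in smaps */
	munmap(p, len);
	close(fd);
	return 0;
}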


Yang Shi (8):
      sched: coredump.h: clarify the use of MMF_VM_HUGEPAGE
      mm: khugepaged: remove redundant check for VM_NO_KHUGEPAGED
      mm: khugepaged: skip DAX vma
      mm: thp: only regular file could be THP eligible
      mm: khugepaged: make khugepaged_enter() void function
      mm: khugepaged: move some khugepaged_* functions to khugepaged.c
      mm: khugepaged: introduce khugepaged_enter_vma() helper
      mm: mmap: register suitable readonly file vmas for khugepaged

 include/linux/huge_mm.h        | 14 ++++++++++++
 include/linux/khugepaged.h     | 59 ++++++++++++---------------------------------------
 include/linux/sched/coredump.h |  3 ++-
 kernel/fork.c                  |  4 +---
 mm/huge_memory.c               | 15 ++++---------
 mm/khugepaged.c                | 76 +++++++++++++++++++++++++++++++++++++-----------------------------
 mm/mmap.c                      | 14 ++++++++----
 mm/shmem.c                     | 12 -----------
 8 files changed, 88 insertions(+), 109 deletions(-)





* [v3 PATCH 1/8] sched: coredump.h: clarify the use of MMF_VM_HUGEPAGE
  2022-04-04 20:02 [v3 PATCH 0/8] Make khugepaged collapse readonly FS THP more consistent Yang Shi
@ 2022-04-04 20:02 ` Yang Shi
  2022-05-09 12:25   ` Vlastimil Babka
  2022-04-04 20:02 ` [v3 PATCH 2/8] mm: khugepaged: remove redundant check for VM_NO_KHUGEPAGED Yang Shi
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 27+ messages in thread
From: Yang Shi @ 2022-04-04 20:02 UTC (permalink / raw)
  To: vbabka, kirill.shutemov, linmiaohe, songliubraving, riel, willy,
	ziy, tytso, akpm
  Cc: shy828301, linux-mm, linux-fsdevel, linux-kernel

MMF_VM_HUGEPAGE is set as long as the mm is available for khugepaged by
khugepaged_enter(), not only when VM_HUGEPAGE is set on vma.  Correct
the comment to avoid confusion.

Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Acked-by: Song Liu <song@kernel.org>
Signed-off-by: Yang Shi <shy828301@gmail.com>
---
 include/linux/sched/coredump.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/linux/sched/coredump.h b/include/linux/sched/coredump.h
index 4d9e3a656875..4d0a5be28b70 100644
--- a/include/linux/sched/coredump.h
+++ b/include/linux/sched/coredump.h
@@ -57,7 +57,8 @@ static inline int get_dumpable(struct mm_struct *mm)
 #endif
 					/* leave room for more dump flags */
 #define MMF_VM_MERGEABLE	16	/* KSM may merge identical pages */
-#define MMF_VM_HUGEPAGE		17	/* set when VM_HUGEPAGE is set on vma */
+#define MMF_VM_HUGEPAGE		17	/* set when mm is available for
+					   khugepaged */
 /*
  * This one-shot flag is dropped due to necessity of changing exe once again
  * on NFS restore
-- 
2.26.3




* [v3 PATCH 2/8] mm: khugepaged: remove redundant check for VM_NO_KHUGEPAGED
  2022-04-04 20:02 [v3 PATCH 0/8] Make khugepaged collapse readonly FS THP more consistent Yang Shi
  2022-04-04 20:02 ` [v3 PATCH 1/8] sched: coredump.h: clarify the use of MMF_VM_HUGEPAGE Yang Shi
@ 2022-04-04 20:02 ` Yang Shi
  2022-05-09 12:45   ` Vlastimil Babka
  2022-04-04 20:02 ` [v3 PATCH 3/8] mm: khugepaged: skip DAX vma Yang Shi
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 27+ messages in thread
From: Yang Shi @ 2022-04-04 20:02 UTC (permalink / raw)
  To: vbabka, kirill.shutemov, linmiaohe, songliubraving, riel, willy,
	ziy, tytso, akpm
  Cc: shy828301, linux-mm, linux-fsdevel, linux-kernel

The hugepage_vma_check() called by khugepaged_enter_vma_merge() already
checks VM_NO_KHUGEPAGED.  Remove the check from the caller and move the
check in hugepage_vma_check() up.

A few more checks may now run for VM_NO_KHUGEPAGED vmas, but
MADV_HUGEPAGE is definitely not a hot path, so the cleaner code
outweighs that.

Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Acked-by: Song Liu <song@kernel.org>
Signed-off-by: Yang Shi <shy828301@gmail.com>
---
 mm/khugepaged.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index a4e5eaf3eb01..7d197d9e3258 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -365,8 +365,7 @@ int hugepage_madvise(struct vm_area_struct *vma,
 		 * register it here without waiting a page fault that
 		 * may not happen any time soon.
 		 */
-		if (!(*vm_flags & VM_NO_KHUGEPAGED) &&
-				khugepaged_enter_vma_merge(vma, *vm_flags))
+		if (khugepaged_enter_vma_merge(vma, *vm_flags))
 			return -ENOMEM;
 		break;
 	case MADV_NOHUGEPAGE:
@@ -445,6 +444,9 @@ static bool hugepage_vma_check(struct vm_area_struct *vma,
 	if (!transhuge_vma_enabled(vma, vm_flags))
 		return false;
 
+	if (vm_flags & VM_NO_KHUGEPAGED)
+		return false;
+
 	if (vma->vm_file && !IS_ALIGNED((vma->vm_start >> PAGE_SHIFT) -
 				vma->vm_pgoff, HPAGE_PMD_NR))
 		return false;
@@ -470,7 +472,8 @@ static bool hugepage_vma_check(struct vm_area_struct *vma,
 		return false;
 	if (vma_is_temporary_stack(vma))
 		return false;
-	return !(vm_flags & VM_NO_KHUGEPAGED);
+
+	return true;
 }
 
 int __khugepaged_enter(struct mm_struct *mm)
-- 
2.26.3




* [v3 PATCH 3/8] mm: khugepaged: skip DAX vma
  2022-04-04 20:02 [v3 PATCH 0/8] Make khugepaged collapse readonly FS THP more consistent Yang Shi
  2022-04-04 20:02 ` [v3 PATCH 1/8] sched: coredump.h: clarify the use of MMF_VM_HUGEPAGE Yang Shi
  2022-04-04 20:02 ` [v3 PATCH 2/8] mm: khugepaged: remove redundant check for VM_NO_KHUGEPAGED Yang Shi
@ 2022-04-04 20:02 ` Yang Shi
  2022-05-09 12:46   ` Vlastimil Babka
  2022-04-04 20:02 ` [v3 PATCH 4/8] mm: thp: only regular file could be THP eligible Yang Shi
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 27+ messages in thread
From: Yang Shi @ 2022-04-04 20:02 UTC (permalink / raw)
  To: vbabka, kirill.shutemov, linmiaohe, songliubraving, riel, willy,
	ziy, tytso, akpm
  Cc: shy828301, linux-mm, linux-fsdevel, linux-kernel

The DAX vma may be seen by khugepaged when the mm has other khugepaged
suitable vmas.  So khugepaged may try to collapse THP for DAX vma, but
it will fail due to page sanity check, for example, page is not
on LRU.

So it is not harmful, but it is definitely pointless to run khugepaged
against DAX vma, so skip it in early check.

Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Acked-by: Song Liu <song@kernel.org>
Signed-off-by: Yang Shi <shy828301@gmail.com>
---
 mm/khugepaged.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 7d197d9e3258..964a4d2c942a 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -447,6 +447,10 @@ static bool hugepage_vma_check(struct vm_area_struct *vma,
 	if (vm_flags & VM_NO_KHUGEPAGED)
 		return false;
 
+	/* Don't run khugepaged against DAX vma */
+	if (vma_is_dax(vma))
+		return false;
+
 	if (vma->vm_file && !IS_ALIGNED((vma->vm_start >> PAGE_SHIFT) -
 				vma->vm_pgoff, HPAGE_PMD_NR))
 		return false;
-- 
2.26.3




* [v3 PATCH 4/8] mm: thp: only regular file could be THP eligible
  2022-04-04 20:02 [v3 PATCH 0/8] Make khugepaged collapse readonly FS THP more consistent Yang Shi
                   ` (2 preceding siblings ...)
  2022-04-04 20:02 ` [v3 PATCH 3/8] mm: khugepaged: skip DAX vma Yang Shi
@ 2022-04-04 20:02 ` Yang Shi
  2022-05-09 13:41   ` Vlastimil Babka
  2022-04-04 20:02 ` [v3 PATCH 5/8] mm: khugepaged: make khugepaged_enter() void function Yang Shi
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 27+ messages in thread
From: Yang Shi @ 2022-04-04 20:02 UTC (permalink / raw)
  To: vbabka, kirill.shutemov, linmiaohe, songliubraving, riel, willy,
	ziy, tytso, akpm
  Cc: shy828301, linux-mm, linux-fsdevel, linux-kernel

Since commit a4aeaa06d45e ("mm: khugepaged: skip huge page collapse for
special files"), khugepaged just collapses THP for regular file which is
the intended usecase for readonly fs THP.  Only show regular file as THP
eligible accordingly.

And make file_thp_enabled() available for khugepaged too in order to remove
duplicate code.

Acked-by: Song Liu <song@kernel.org>
Signed-off-by: Yang Shi <shy828301@gmail.com>
---
 include/linux/huge_mm.h | 14 ++++++++++++++
 mm/huge_memory.c        | 11 ++---------
 mm/khugepaged.c         |  9 ++-------
 3 files changed, 18 insertions(+), 16 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 2999190adc22..62a6f667850d 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -172,6 +172,20 @@ static inline bool __transparent_hugepage_enabled(struct vm_area_struct *vma)
 	return false;
 }
 
+static inline bool file_thp_enabled(struct vm_area_struct *vma)
+{
+	struct inode *inode;
+
+	if (!vma->vm_file)
+		return false;
+
+	inode = vma->vm_file->f_inode;
+
+	return (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS)) &&
+	       (vma->vm_flags & VM_EXEC) &&
+	       !inode_is_open_for_write(inode) && S_ISREG(inode->i_mode);
+}
+
 bool transparent_hugepage_active(struct vm_area_struct *vma);
 
 #define transparent_hugepage_use_zero_page()				\
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 2fe38212e07c..183b793fd28e 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -68,13 +68,6 @@ static atomic_t huge_zero_refcount;
 struct page *huge_zero_page __read_mostly;
 unsigned long huge_zero_pfn __read_mostly = ~0UL;
 
-static inline bool file_thp_enabled(struct vm_area_struct *vma)
-{
-	return transhuge_vma_enabled(vma, vma->vm_flags) && vma->vm_file &&
-	       !inode_is_open_for_write(vma->vm_file->f_inode) &&
-	       (vma->vm_flags & VM_EXEC);
-}
-
 bool transparent_hugepage_active(struct vm_area_struct *vma)
 {
 	/* The addr is used to check if the vma size fits */
@@ -86,8 +79,8 @@ bool transparent_hugepage_active(struct vm_area_struct *vma)
 		return __transparent_hugepage_enabled(vma);
 	if (vma_is_shmem(vma))
 		return shmem_huge_enabled(vma);
-	if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS))
-		return file_thp_enabled(vma);
+	if (transhuge_vma_enabled(vma, vma->vm_flags) && file_thp_enabled(vma))
+		return true;
 
 	return false;
 }
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 964a4d2c942a..609c1bc0a027 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -464,13 +464,8 @@ static bool hugepage_vma_check(struct vm_area_struct *vma,
 		return false;
 
 	/* Only regular file is valid */
-	if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && vma->vm_file &&
-	    (vm_flags & VM_EXEC)) {
-		struct inode *inode = vma->vm_file->f_inode;
-
-		return !inode_is_open_for_write(inode) &&
-			S_ISREG(inode->i_mode);
-	}
+	if (file_thp_enabled(vma))
+		return true;
 
 	if (!vma->anon_vma || vma->vm_ops)
 		return false;
-- 
2.26.3




* [v3 PATCH 5/8] mm: khugepaged: make khugepaged_enter() void function
  2022-04-04 20:02 [v3 PATCH 0/8] Make khugepaged collapse readonly FS THP more consistent Yang Shi
                   ` (3 preceding siblings ...)
  2022-04-04 20:02 ` [v3 PATCH 4/8] mm: thp: only regular file could be THP eligible Yang Shi
@ 2022-04-04 20:02 ` Yang Shi
  2022-05-09 13:46   ` Vlastimil Babka
  2022-04-04 20:02 ` [v3 PATCH 6/8] mm: khugepaged: move some khugepaged_* functions to khugepaged.c Yang Shi
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 27+ messages in thread
From: Yang Shi @ 2022-04-04 20:02 UTC (permalink / raw)
  To: vbabka, kirill.shutemov, linmiaohe, songliubraving, riel, willy,
	ziy, tytso, akpm
  Cc: shy828301, linux-mm, linux-fsdevel, linux-kernel

Most callers of khugepaged_enter() don't care about the return value.
Only dup_mmap(), anonymous THP page fault and MADV_HUGEPAGE handle the
error by returning -ENOMEM.  Actually it is not harmful for them to
ignore the error case either.  It also sounds like overkill to fail
fork() and page fault early due to a khugepaged_enter() error, and
MADV_HUGEPAGE does set the VM_HUGEPAGE flag regardless of the error.

Acked-by: Song Liu <song@kernel.org>
Signed-off-by: Yang Shi <shy828301@gmail.com>
---
 include/linux/khugepaged.h | 30 ++++++++++++------------------
 kernel/fork.c              |  4 +---
 mm/huge_memory.c           |  4 ++--
 mm/khugepaged.c            | 18 +++++++-----------
 4 files changed, 22 insertions(+), 34 deletions(-)

diff --git a/include/linux/khugepaged.h b/include/linux/khugepaged.h
index 2fcc01891b47..0423d3619f26 100644
--- a/include/linux/khugepaged.h
+++ b/include/linux/khugepaged.h
@@ -12,10 +12,10 @@ extern struct attribute_group khugepaged_attr_group;
 extern int khugepaged_init(void);
 extern void khugepaged_destroy(void);
 extern int start_stop_khugepaged(void);
-extern int __khugepaged_enter(struct mm_struct *mm);
+extern void __khugepaged_enter(struct mm_struct *mm);
 extern void __khugepaged_exit(struct mm_struct *mm);
-extern int khugepaged_enter_vma_merge(struct vm_area_struct *vma,
-				      unsigned long vm_flags);
+extern void khugepaged_enter_vma_merge(struct vm_area_struct *vma,
+				       unsigned long vm_flags);
 extern void khugepaged_min_free_kbytes_update(void);
 #ifdef CONFIG_SHMEM
 extern void collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr);
@@ -40,11 +40,10 @@ static inline void collapse_pte_mapped_thp(struct mm_struct *mm,
 	(transparent_hugepage_flags &				\
 	 (1<<TRANSPARENT_HUGEPAGE_DEFRAG_KHUGEPAGED_FLAG))
 
-static inline int khugepaged_fork(struct mm_struct *mm, struct mm_struct *oldmm)
+static inline void khugepaged_fork(struct mm_struct *mm, struct mm_struct *oldmm)
 {
 	if (test_bit(MMF_VM_HUGEPAGE, &oldmm->flags))
-		return __khugepaged_enter(mm);
-	return 0;
+		__khugepaged_enter(mm);
 }
 
 static inline void khugepaged_exit(struct mm_struct *mm)
@@ -53,7 +52,7 @@ static inline void khugepaged_exit(struct mm_struct *mm)
 		__khugepaged_exit(mm);
 }
 
-static inline int khugepaged_enter(struct vm_area_struct *vma,
+static inline void khugepaged_enter(struct vm_area_struct *vma,
 				   unsigned long vm_flags)
 {
 	if (!test_bit(MMF_VM_HUGEPAGE, &vma->vm_mm->flags))
@@ -62,27 +61,22 @@ static inline int khugepaged_enter(struct vm_area_struct *vma,
 		     (khugepaged_req_madv() && (vm_flags & VM_HUGEPAGE))) &&
 		    !(vm_flags & VM_NOHUGEPAGE) &&
 		    !test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags))
-			if (__khugepaged_enter(vma->vm_mm))
-				return -ENOMEM;
-	return 0;
+			__khugepaged_enter(vma->vm_mm);
 }
 #else /* CONFIG_TRANSPARENT_HUGEPAGE */
-static inline int khugepaged_fork(struct mm_struct *mm, struct mm_struct *oldmm)
+static inline void khugepaged_fork(struct mm_struct *mm, struct mm_struct *oldmm)
 {
-	return 0;
 }
 static inline void khugepaged_exit(struct mm_struct *mm)
 {
 }
-static inline int khugepaged_enter(struct vm_area_struct *vma,
-				   unsigned long vm_flags)
+static inline void khugepaged_enter(struct vm_area_struct *vma,
+				    unsigned long vm_flags)
 {
-	return 0;
 }
-static inline int khugepaged_enter_vma_merge(struct vm_area_struct *vma,
-					     unsigned long vm_flags)
+static inline void khugepaged_enter_vma_merge(struct vm_area_struct *vma,
+					      unsigned long vm_flags)
 {
-	return 0;
 }
 static inline void collapse_pte_mapped_thp(struct mm_struct *mm,
 					   unsigned long addr)
diff --git a/kernel/fork.c b/kernel/fork.c
index 9796897560ab..0d13baf86650 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -612,9 +612,7 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
 	retval = ksm_fork(mm, oldmm);
 	if (retval)
 		goto out;
-	retval = khugepaged_fork(mm, oldmm);
-	if (retval)
-		goto out;
+	khugepaged_fork(mm, oldmm);
 
 	prev = NULL;
 	for (mpnt = oldmm->mmap; mpnt; mpnt = mpnt->vm_next) {
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 183b793fd28e..4fd5a6a79d44 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -725,8 +725,8 @@ vm_fault_t do_huge_pmd_anonymous_page(struct vm_fault *vmf)
 		return VM_FAULT_FALLBACK;
 	if (unlikely(anon_vma_prepare(vma)))
 		return VM_FAULT_OOM;
-	if (unlikely(khugepaged_enter(vma, vma->vm_flags)))
-		return VM_FAULT_OOM;
+	khugepaged_enter(vma, vma->vm_flags);
+
 	if (!(vmf->flags & FAULT_FLAG_WRITE) &&
 			!mm_forbids_zeropage(vma->vm_mm) &&
 			transparent_hugepage_use_zero_page()) {
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 609c1bc0a027..b69eda934d70 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -365,8 +365,7 @@ int hugepage_madvise(struct vm_area_struct *vma,
 		 * register it here without waiting a page fault that
 		 * may not happen any time soon.
 		 */
-		if (khugepaged_enter_vma_merge(vma, *vm_flags))
-			return -ENOMEM;
+		khugepaged_enter_vma_merge(vma, *vm_flags);
 		break;
 	case MADV_NOHUGEPAGE:
 		*vm_flags &= ~VM_HUGEPAGE;
@@ -475,20 +474,20 @@ static bool hugepage_vma_check(struct vm_area_struct *vma,
 	return true;
 }
 
-int __khugepaged_enter(struct mm_struct *mm)
+void __khugepaged_enter(struct mm_struct *mm)
 {
 	struct mm_slot *mm_slot;
 	int wakeup;
 
 	mm_slot = alloc_mm_slot();
 	if (!mm_slot)
-		return -ENOMEM;
+		return;
 
 	/* __khugepaged_exit() must not run from under us */
 	VM_BUG_ON_MM(khugepaged_test_exit(mm), mm);
 	if (unlikely(test_and_set_bit(MMF_VM_HUGEPAGE, &mm->flags))) {
 		free_mm_slot(mm_slot);
-		return 0;
+		return;
 	}
 
 	spin_lock(&khugepaged_mm_lock);
@@ -504,11 +503,9 @@ int __khugepaged_enter(struct mm_struct *mm)
 	mmgrab(mm);
 	if (wakeup)
 		wake_up_interruptible(&khugepaged_wait);
-
-	return 0;
 }
 
-int khugepaged_enter_vma_merge(struct vm_area_struct *vma,
+void khugepaged_enter_vma_merge(struct vm_area_struct *vma,
 			       unsigned long vm_flags)
 {
 	unsigned long hstart, hend;
@@ -519,13 +516,12 @@ int khugepaged_enter_vma_merge(struct vm_area_struct *vma,
 	 * file-private shmem THP is not supported.
 	 */
 	if (!hugepage_vma_check(vma, vm_flags))
-		return 0;
+		return;
 
 	hstart = (vma->vm_start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK;
 	hend = vma->vm_end & HPAGE_PMD_MASK;
 	if (hstart < hend)
-		return khugepaged_enter(vma, vm_flags);
-	return 0;
+		khugepaged_enter(vma, vm_flags);
 }
 
 void __khugepaged_exit(struct mm_struct *mm)
-- 
2.26.3




* [v3 PATCH 6/8] mm: khugepaged: move some khugepaged_* functions to khugepaged.c
  2022-04-04 20:02 [v3 PATCH 0/8] Make khugepaged collapse readonly FS THP more consistent Yang Shi
                   ` (4 preceding siblings ...)
  2022-04-04 20:02 ` [v3 PATCH 5/8] mm: khugepaged: make khugepaged_enter() void function Yang Shi
@ 2022-04-04 20:02 ` Yang Shi
  2022-05-09 15:31   ` Vlastimil Babka
  2022-04-04 20:02 ` [v3 PATCH 7/8] mm: khugepaged: introduce khugepaged_enter_vma() helper Yang Shi
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 27+ messages in thread
From: Yang Shi @ 2022-04-04 20:02 UTC (permalink / raw)
  To: vbabka, kirill.shutemov, linmiaohe, songliubraving, riel, willy,
	ziy, tytso, akpm
  Cc: shy828301, linux-mm, linux-fsdevel, linux-kernel

We want to reuse hugepage_vma_check() for khugepaged_enter() so that
some duplicate code can be removed.  But moving hugepage_vma_check()
to khugepaged.h would need huge_mm.h included there, and it seems not
optimal to bloat khugepaged.h.

And the khugepaged_* functions actually are just wrappers for some
non-inline functions, so the benefit of keeping them inline seems
limited.

So move the khugepaged_* functions to khugepaged.c; any caller just
needs to include khugepaged.h, which is quite small.  For example, the
following patches will call khugepaged_enter() in the filemap page
fault path for regular filesystems to make readonly FS THP collapse
more consistent.  filemap.c just needs to include khugepaged.h.

Acked-by: Song Liu <song@kernel.org>
Signed-off-by: Yang Shi <shy828301@gmail.com>
---
 include/linux/khugepaged.h | 37 ++++++-------------------------------
 mm/khugepaged.c            | 20 ++++++++++++++++++++
 2 files changed, 26 insertions(+), 31 deletions(-)

diff --git a/include/linux/khugepaged.h b/include/linux/khugepaged.h
index 0423d3619f26..6acf9701151e 100644
--- a/include/linux/khugepaged.h
+++ b/include/linux/khugepaged.h
@@ -2,10 +2,6 @@
 #ifndef _LINUX_KHUGEPAGED_H
 #define _LINUX_KHUGEPAGED_H
 
-#include <linux/sched/coredump.h> /* MMF_VM_HUGEPAGE */
-#include <linux/shmem_fs.h>
-
-
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 extern struct attribute_group khugepaged_attr_group;
 
@@ -16,6 +12,12 @@ extern void __khugepaged_enter(struct mm_struct *mm);
 extern void __khugepaged_exit(struct mm_struct *mm);
 extern void khugepaged_enter_vma_merge(struct vm_area_struct *vma,
 				       unsigned long vm_flags);
+extern void khugepaged_fork(struct mm_struct *mm,
+			    struct mm_struct *oldmm);
+extern void khugepaged_exit(struct mm_struct *mm);
+extern void khugepaged_enter(struct vm_area_struct *vma,
+			     unsigned long vm_flags);
+
 extern void khugepaged_min_free_kbytes_update(void);
 #ifdef CONFIG_SHMEM
 extern void collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr);
@@ -33,36 +35,9 @@ static inline void collapse_pte_mapped_thp(struct mm_struct *mm,
 #define khugepaged_always()				\
 	(transparent_hugepage_flags &			\
 	 (1<<TRANSPARENT_HUGEPAGE_FLAG))
-#define khugepaged_req_madv()					\
-	(transparent_hugepage_flags &				\
-	 (1<<TRANSPARENT_HUGEPAGE_REQ_MADV_FLAG))
 #define khugepaged_defrag()					\
 	(transparent_hugepage_flags &				\
 	 (1<<TRANSPARENT_HUGEPAGE_DEFRAG_KHUGEPAGED_FLAG))
-
-static inline void khugepaged_fork(struct mm_struct *mm, struct mm_struct *oldmm)
-{
-	if (test_bit(MMF_VM_HUGEPAGE, &oldmm->flags))
-		__khugepaged_enter(mm);
-}
-
-static inline void khugepaged_exit(struct mm_struct *mm)
-{
-	if (test_bit(MMF_VM_HUGEPAGE, &mm->flags))
-		__khugepaged_exit(mm);
-}
-
-static inline void khugepaged_enter(struct vm_area_struct *vma,
-				   unsigned long vm_flags)
-{
-	if (!test_bit(MMF_VM_HUGEPAGE, &vma->vm_mm->flags))
-		if ((khugepaged_always() ||
-		     (shmem_file(vma->vm_file) && shmem_huge_enabled(vma)) ||
-		     (khugepaged_req_madv() && (vm_flags & VM_HUGEPAGE))) &&
-		    !(vm_flags & VM_NOHUGEPAGE) &&
-		    !test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags))
-			__khugepaged_enter(vma->vm_mm);
-}
 #else /* CONFIG_TRANSPARENT_HUGEPAGE */
 static inline void khugepaged_fork(struct mm_struct *mm, struct mm_struct *oldmm)
 {
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index b69eda934d70..ec5b0a691d87 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -556,6 +556,26 @@ void __khugepaged_exit(struct mm_struct *mm)
 	}
 }
 
+void khugepaged_fork(struct mm_struct *mm, struct mm_struct *oldmm)
+{
+	if (test_bit(MMF_VM_HUGEPAGE, &oldmm->flags))
+		__khugepaged_enter(mm);
+}
+
+void khugepaged_exit(struct mm_struct *mm)
+{
+	if (test_bit(MMF_VM_HUGEPAGE, &mm->flags))
+		__khugepaged_exit(mm);
+}
+
+void khugepaged_enter(struct vm_area_struct *vma, unsigned long vm_flags)
+{
+	if (!test_bit(MMF_VM_HUGEPAGE, &vma->vm_mm->flags) &&
+	    khugepaged_enabled())
+		if (hugepage_vma_check(vma, vm_flags))
+			__khugepaged_enter(vma->vm_mm);
+}
+
 static void release_pte_page(struct page *page)
 {
 	mod_node_page_state(page_pgdat(page),
-- 
2.26.3




* [v3 PATCH 7/8] mm: khugepaged: introduce khugepaged_enter_vma() helper
  2022-04-04 20:02 [v3 PATCH 0/8] Make khugepaged collapse readonly FS THP more consistent Yang Shi
                   ` (5 preceding siblings ...)
  2022-04-04 20:02 ` [v3 PATCH 6/8] mm: khugepaged: move some khugepaged_* functions to khugepaged.c Yang Shi
@ 2022-04-04 20:02 ` Yang Shi
  2022-05-09 15:39   ` Vlastimil Babka
  2022-04-04 20:02 ` [v3 PATCH 8/8] mm: mmap: register suitable readonly file vmas for khugepaged Yang Shi
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 27+ messages in thread
From: Yang Shi @ 2022-04-04 20:02 UTC (permalink / raw)
  To: vbabka, kirill.shutemov, linmiaohe, songliubraving, riel, willy,
	ziy, tytso, akpm
  Cc: shy828301, linux-mm, linux-fsdevel, linux-kernel

khugepaged_enter_vma_merge() actually does the same thing as the
khugepaged_enter() call in shmem_mmap(), so consolidate them into one
helper and rename it to khugepaged_enter_vma().

Signed-off-by: Yang Shi <shy828301@gmail.com>
---
 include/linux/khugepaged.h |  8 ++++----
 mm/khugepaged.c            | 26 +++++++++-----------------
 mm/mmap.c                  |  8 ++++----
 mm/shmem.c                 | 12 ++----------
 4 files changed, 19 insertions(+), 35 deletions(-)

diff --git a/include/linux/khugepaged.h b/include/linux/khugepaged.h
index 6acf9701151e..f4b12be155ab 100644
--- a/include/linux/khugepaged.h
+++ b/include/linux/khugepaged.h
@@ -10,8 +10,8 @@ extern void khugepaged_destroy(void);
 extern int start_stop_khugepaged(void);
 extern void __khugepaged_enter(struct mm_struct *mm);
 extern void __khugepaged_exit(struct mm_struct *mm);
-extern void khugepaged_enter_vma_merge(struct vm_area_struct *vma,
-				       unsigned long vm_flags);
+extern void khugepaged_enter_vma(struct vm_area_struct *vma,
+				 unsigned long vm_flags);
 extern void khugepaged_fork(struct mm_struct *mm,
 			    struct mm_struct *oldmm);
 extern void khugepaged_exit(struct mm_struct *mm);
@@ -49,8 +49,8 @@ static inline void khugepaged_enter(struct vm_area_struct *vma,
 				    unsigned long vm_flags)
 {
 }
-static inline void khugepaged_enter_vma_merge(struct vm_area_struct *vma,
-					      unsigned long vm_flags)
+static inline void khugepaged_enter_vma(struct vm_area_struct *vma,
+					unsigned long vm_flags)
 {
 }
 static inline void collapse_pte_mapped_thp(struct mm_struct *mm,
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index ec5b0a691d87..c5c3202d7401 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -365,7 +365,7 @@ int hugepage_madvise(struct vm_area_struct *vma,
 		 * register it here without waiting a page fault that
 		 * may not happen any time soon.
 		 */
-		khugepaged_enter_vma_merge(vma, *vm_flags);
+		khugepaged_enter_vma(vma, *vm_flags);
 		break;
 	case MADV_NOHUGEPAGE:
 		*vm_flags &= ~VM_HUGEPAGE;
@@ -505,23 +505,15 @@ void __khugepaged_enter(struct mm_struct *mm)
 		wake_up_interruptible(&khugepaged_wait);
 }
 
-void khugepaged_enter_vma_merge(struct vm_area_struct *vma,
-			       unsigned long vm_flags)
+void khugepaged_enter_vma(struct vm_area_struct *vma,
+			  unsigned long vm_flags)
 {
-	unsigned long hstart, hend;
-
-	/*
-	 * khugepaged only supports read-only files for non-shmem files.
-	 * khugepaged does not yet work on special mappings. And
-	 * file-private shmem THP is not supported.
-	 */
-	if (!hugepage_vma_check(vma, vm_flags))
-		return;
-
-	hstart = (vma->vm_start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK;
-	hend = vma->vm_end & HPAGE_PMD_MASK;
-	if (hstart < hend)
-		khugepaged_enter(vma, vm_flags);
+	if (!test_bit(MMF_VM_HUGEPAGE, &vma->vm_mm->flags) &&
+	    khugepaged_enabled() &&
+	    (((vma->vm_start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK) <
+	     (vma->vm_end & HPAGE_PMD_MASK)))
+		if (hugepage_vma_check(vma, vm_flags))
+			__khugepaged_enter(vma->vm_mm);
 }
 
 void __khugepaged_exit(struct mm_struct *mm)
diff --git a/mm/mmap.c b/mm/mmap.c
index 3aa839f81e63..604c8dece5dd 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1218,7 +1218,7 @@ struct vm_area_struct *vma_merge(struct mm_struct *mm,
 					 end, prev->vm_pgoff, NULL, prev);
 		if (err)
 			return NULL;
-		khugepaged_enter_vma_merge(prev, vm_flags);
+		khugepaged_enter_vma(prev, vm_flags);
 		return prev;
 	}
 
@@ -1245,7 +1245,7 @@ struct vm_area_struct *vma_merge(struct mm_struct *mm,
 		}
 		if (err)
 			return NULL;
-		khugepaged_enter_vma_merge(area, vm_flags);
+		khugepaged_enter_vma(area, vm_flags);
 		return area;
 	}
 
@@ -2460,7 +2460,7 @@ int expand_upwards(struct vm_area_struct *vma, unsigned long address)
 		}
 	}
 	anon_vma_unlock_write(vma->anon_vma);
-	khugepaged_enter_vma_merge(vma, vma->vm_flags);
+	khugepaged_enter_vma(vma, vma->vm_flags);
 	validate_mm(mm);
 	return error;
 }
@@ -2538,7 +2538,7 @@ int expand_downwards(struct vm_area_struct *vma,
 		}
 	}
 	anon_vma_unlock_write(vma->anon_vma);
-	khugepaged_enter_vma_merge(vma, vma->vm_flags);
+	khugepaged_enter_vma(vma, vma->vm_flags);
 	validate_mm(mm);
 	return error;
 }
diff --git a/mm/shmem.c b/mm/shmem.c
index 529c9ad3e926..92eca974771d 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2239,11 +2239,7 @@ static int shmem_mmap(struct file *file, struct vm_area_struct *vma)
 
 	file_accessed(file);
 	vma->vm_ops = &shmem_vm_ops;
-	if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) &&
-			((vma->vm_start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK) <
-			(vma->vm_end & HPAGE_PMD_MASK)) {
-		khugepaged_enter(vma, vma->vm_flags);
-	}
+	khugepaged_enter_vma(vma, vma->vm_flags);
 	return 0;
 }
 
@@ -4136,11 +4132,7 @@ int shmem_zero_setup(struct vm_area_struct *vma)
 	vma->vm_file = file;
 	vma->vm_ops = &shmem_vm_ops;
 
-	if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) &&
-			((vma->vm_start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK) <
-			(vma->vm_end & HPAGE_PMD_MASK)) {
-		khugepaged_enter(vma, vma->vm_flags);
-	}
+	khugepaged_enter_vma(vma, vma->vm_flags);
 
 	return 0;
 }
-- 
2.26.3




* [v3 PATCH 8/8] mm: mmap: register suitable readonly file vmas for khugepaged
  2022-04-04 20:02 [v3 PATCH 0/8] Make khugepaged collapse readonly FS THP more consistent Yang Shi
                   ` (6 preceding siblings ...)
  2022-04-04 20:02 ` [v3 PATCH 7/8] mm: khugepaged: introduce khugepaged_enter_vma() helper Yang Shi
@ 2022-04-04 20:02 ` Yang Shi
  2022-05-09 15:43   ` Vlastimil Babka
  2022-04-05  0:16 ` [v3 PATCH 0/8] Make khugepaged collapse readonly FS THP more consistent Matthew Wilcox
  2022-05-09 16:05 ` Vlastimil Babka
  9 siblings, 1 reply; 27+ messages in thread
From: Yang Shi @ 2022-04-04 20:02 UTC (permalink / raw)
  To: vbabka, kirill.shutemov, linmiaohe, songliubraving, riel, willy,
	ziy, tytso, akpm
  Cc: shy828301, linux-mm, linux-fsdevel, linux-kernel

The readonly FS THP relies on khugepaged to collapse THP for suitable
vmas.  But it is kind of "random luck" for khugepaged to see the
readonly FS vmas (https://lore.kernel.org/linux-mm/00f195d4-d039-3cf2-d3a1-a2c88de397a0@suse.cz/)
since currently the vmas are registered to khugepaged when:
  - Anon huge pmd page fault
  - VMA merge
  - MADV_HUGEPAGE
  - Shmem mmap

If the above conditions are not met, even though khugepaged is enabled
it won't see readonly FS vmas at all.  MADV_HUGEPAGE could be specified
explicitly to tell khugepaged to collapse this area, but when khugepaged
mode is "always" it should scan suitable vmas as long as VM_NOHUGEPAGE
is not set.

So make sure readonly FS vmas are registered to khugepaged to make the
behavior more consistent.

Registering suitable vmas in the common mmap path covers both readonly
FS vmas and shmem vmas, so the khugepaged calls in shmem.c are removed.

We still need to keep the khugepaged call in vma_merge() since
vma_merge() is called in a lot of places, for example, madvise,
mprotect, etc.

Reported-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Yang Shi <shy828301@gmail.com>
---
 mm/mmap.c  | 6 ++++++
 mm/shmem.c | 4 ----
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/mm/mmap.c b/mm/mmap.c
index 604c8dece5dd..616ebbc2d052 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1842,6 +1842,12 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
 	}
 
 	vma_link(mm, vma, prev, rb_link, rb_parent);
+
+	/*
+	 * vma_merge() calls khugepaged_enter_vma() as well, so the below
+	 * call covers the non-merge case.
+	 */
+	khugepaged_enter_vma(vma, vma->vm_flags);
 	/* Once vma denies write, undo our temporary denial count */
 unmap_writable:
 	if (file && vm_flags & VM_SHARED)
diff --git a/mm/shmem.c b/mm/shmem.c
index 92eca974771d..0c448080d210 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -34,7 +34,6 @@
 #include <linux/export.h>
 #include <linux/swap.h>
 #include <linux/uio.h>
-#include <linux/khugepaged.h>
 #include <linux/hugetlb.h>
 #include <linux/fs_parser.h>
 #include <linux/swapfile.h>
@@ -2239,7 +2238,6 @@ static int shmem_mmap(struct file *file, struct vm_area_struct *vma)
 
 	file_accessed(file);
 	vma->vm_ops = &shmem_vm_ops;
-	khugepaged_enter_vma(vma, vma->vm_flags);
 	return 0;
 }
 
@@ -4132,8 +4130,6 @@ int shmem_zero_setup(struct vm_area_struct *vma)
 	vma->vm_file = file;
 	vma->vm_ops = &shmem_vm_ops;
 
-	khugepaged_enter_vma(vma, vma->vm_flags);
-
 	return 0;
 }
 
-- 
2.26.3




* Re: [v3 PATCH 0/8] Make khugepaged collapse readonly FS THP more consistent
  2022-04-04 20:02 [v3 PATCH 0/8] Make khugepaged collapse readonly FS THP more consistent Yang Shi
                   ` (7 preceding siblings ...)
  2022-04-04 20:02 ` [v3 PATCH 8/8] mm: mmap: register suitable readonly file vmas for khugepaged Yang Shi
@ 2022-04-05  0:16 ` Matthew Wilcox
  2022-04-05  0:48   ` Yang Shi
  2022-05-09 16:05 ` Vlastimil Babka
  9 siblings, 1 reply; 27+ messages in thread
From: Matthew Wilcox @ 2022-04-05  0:16 UTC (permalink / raw)
  To: Yang Shi
  Cc: vbabka, kirill.shutemov, linmiaohe, songliubraving, riel, ziy,
	tytso, akpm, linux-mm, linux-fsdevel, linux-kernel

On Mon, Apr 04, 2022 at 01:02:42PM -0700, Yang Shi wrote:
> The readonly FS THP relies on khugepaged to collapse THP for suitable
> vmas.  But it is kind of "random luck" for khugepaged to see the
> readonly FS vmas (see report: https://lore.kernel.org/linux-mm/00f195d4-d039-3cf2-d3a1-a2c88de397a0@suse.cz/) since currently the vmas are registered to khugepaged when:

I still don't see the point.  The effort should be put into
supporting large folios, not in making this hack work better.



* Re: [v3 PATCH 0/8] Make khugepaged collapse readonly FS THP more consistent
  2022-04-05  0:16 ` [v3 PATCH 0/8] Make khugepaged collapse readonly FS THP more consistent Matthew Wilcox
@ 2022-04-05  0:48   ` Yang Shi
  2022-04-27 20:58     ` Matthew Wilcox
  0 siblings, 1 reply; 27+ messages in thread
From: Yang Shi @ 2022-04-05  0:48 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Vlastimil Babka, Kirill A. Shutemov, Miaohe Lin, Song Liu,
	Rik van Riel, Zi Yan, Theodore Ts'o, Andrew Morton, Linux MM,
	Linux FS-devel Mailing List, Linux Kernel Mailing List

On Mon, Apr 4, 2022 at 5:16 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Mon, Apr 04, 2022 at 01:02:42PM -0700, Yang Shi wrote:
> > The readonly FS THP relies on khugepaged to collapse THP for suitable
> > vmas.  But it is kind of "random luck" for khugepaged to see the
> > readonly FS vmas (see report: https://lore.kernel.org/linux-mm/00f195d4-d039-3cf2-d3a1-a2c88de397a0@suse.cz/) since currently the vmas are registered to khugepaged when:
>
> I still don't see the point.  The effort should be put into
> supporting large folios, not in making this hack work better.

The series makes sense even if the hack is replaced by large folios
IMHO.  The problem is that file VMAs may not be registered with
khugepaged consistently for some THP modes, for example "always",
regardless of whether they are readonly or whether the hack is gone.
IIUC even if the hack is replaced by large folios, we still have
khugepaged to collapse pmd-mappable huge pages for both anonymous vmas
and file vmas, right?  Or are you thinking about killing khugepaged
soon when supporting large folios?

Anyway it may make things clearer if the cover letter is rephrased to:

When khugepaged collapses file THPs, its behavior is not consistent.
It is kind of "random luck" for khugepaged to see the file vmas (see
report: https://lore.kernel.org/linux-mm/00f195d4-d039-3cf2-d3a1-a2c88de397a0@suse.cz/)
since currently the vmas are registered to khugepaged when:
  - Anon huge pmd page fault
  - VMA merge
  - MADV_HUGEPAGE
  - Shmem mmap

If the above conditions are not met, even though khugepaged is enabled
it won't see any file vma at all.  MADV_HUGEPAGE could be specified
explicitly to tell khugepaged to collapse this area, but when
khugepaged mode is "always" it should scan suitable vmas as long as
VM_NOHUGEPAGE is not set.

So make sure file vmas are registered to khugepaged to make the
behavior more consistent.



* Re: [v3 PATCH 0/8] Make khugepaged collapse readonly FS THP more consistent
  2022-04-05  0:48   ` Yang Shi
@ 2022-04-27 20:58     ` Matthew Wilcox
  2022-04-27 22:38       ` Yang Shi
  2022-04-27 23:16       ` Yang Shi
  0 siblings, 2 replies; 27+ messages in thread
From: Matthew Wilcox @ 2022-04-27 20:58 UTC (permalink / raw)
  To: Yang Shi
  Cc: Vlastimil Babka, Kirill A. Shutemov, Miaohe Lin, Song Liu,
	Rik van Riel, Zi Yan, Theodore Ts'o, Andrew Morton, Linux MM,
	Linux FS-devel Mailing List, Linux Kernel Mailing List

On Mon, Apr 04, 2022 at 05:48:49PM -0700, Yang Shi wrote:
> When khugepaged collapses file THPs, its behavior is not consistent.
> It is kind of "random luck" for khugepaged to see the file vmas (see
> report: https://lore.kernel.org/linux-mm/00f195d4-d039-3cf2-d3a1-a2c88de397a0@suse.cz/)
> since currently the vmas are registered to khugepaged when:
>   - Anon huge pmd page fault
>   - VMA merge
>   - MADV_HUGEPAGE
>   - Shmem mmap
> 
> If the above conditions are not met, even though khugepaged is enabled
> it won't see any file vma at all.  MADV_HUGEPAGE could be specified
> explicitly to tell khugepaged to collapse this area, but when
> khugepaged mode is "always" it should scan suitable vmas as long as
> VM_NOHUGEPAGE is not set.

I don't see that as being true at all.  The point of this hack was that
applications which really knew what they were doing could enable it.
It makes no sense to me that setting "always" by the sysadmin for shmem
also force-enables ROTHP, even for applications which aren't aware of it.

Most telling, I think, is that Song Liu hasn't weighed in on this at
all.  It's clearly not important to the original author.



* Re: [v3 PATCH 0/8] Make khugepaged collapse readonly FS THP more consistent
  2022-04-27 20:58     ` Matthew Wilcox
@ 2022-04-27 22:38       ` Yang Shi
  2022-04-27 23:16       ` Yang Shi
  1 sibling, 0 replies; 27+ messages in thread
From: Yang Shi @ 2022-04-27 22:38 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Vlastimil Babka, Kirill A. Shutemov, Miaohe Lin, Song Liu,
	Rik van Riel, Zi Yan, Theodore Ts'o, Andrew Morton, Linux MM,
	Linux FS-devel Mailing List, Linux Kernel Mailing List

On Wed, Apr 27, 2022 at 1:59 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Mon, Apr 04, 2022 at 05:48:49PM -0700, Yang Shi wrote:
> > When khugepaged collapses file THPs, its behavior is not consistent.
> > It is kind of "random luck" for khugepaged to see the file vmas (see
> > report: https://lore.kernel.org/linux-mm/00f195d4-d039-3cf2-d3a1-a2c88de397a0@suse.cz/)
> > since currently the vmas are registered to khugepaged when:
> >   - Anon huge pmd page fault
> >   - VMA merge
> >   - MADV_HUGEPAGE
> >   - Shmem mmap
> >
> > If the above conditions are not met, even though khugepaged is enabled
> > it won't see any file vma at all.  MADV_HUGEPAGE could be specified
> > explicitly to tell khugepaged to collapse this area, but when
> > khugepaged mode is "always" it should scan suitable vmas as long as
> > VM_NOHUGEPAGE is not set.
>
> I don't see that as being true at all.  The point of this hack was that
> applications which really knew what they were doing could enable it.
> It makes no sense to me that setting "always" by the sysadmin for shmem
> also force-enables ROTHP, even for applications which aren't aware of it.
>
> Most telling, I think, is that Song Liu hasn't weighed in on this at
> all.  It's clearly not important to the original author.

I tend to agree that MADV_HUGEPAGE was the preferred way in the
original author's mind when this feature (or hack) was designed in the
first place.  And "madvise" is definitely the recommended way to use
THP, but I don't think it means we should not care about "always" and
assume nobody uses it, otherwise the issue would not have been
reported.



* Re: [v3 PATCH 0/8] Make khugepaged collapse readonly FS THP more consistent
  2022-04-27 20:58     ` Matthew Wilcox
  2022-04-27 22:38       ` Yang Shi
@ 2022-04-27 23:16       ` Yang Shi
  1 sibling, 0 replies; 27+ messages in thread
From: Yang Shi @ 2022-04-27 23:16 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Vlastimil Babka, Kirill A. Shutemov, Miaohe Lin, Song Liu,
	Rik van Riel, Zi Yan, Theodore Ts'o, Andrew Morton, Linux MM,
	Linux FS-devel Mailing List, Linux Kernel Mailing List

On Wed, Apr 27, 2022 at 1:59 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Mon, Apr 04, 2022 at 05:48:49PM -0700, Yang Shi wrote:
> > When khugepaged collapses file THPs, its behavior is not consistent.
> > It is kind of "random luck" for khugepaged to see the file vmas (see
> > report: https://lore.kernel.org/linux-mm/00f195d4-d039-3cf2-d3a1-a2c88de397a0@suse.cz/)
> > since currently the vmas are registered to khugepaged when:
> >   - Anon huge pmd page fault
> >   - VMA merge
> >   - MADV_HUGEPAGE
> >   - Shmem mmap
> >
> > If the above conditions are not met, even though khugepaged is enabled
> > it won't see any file vma at all.  MADV_HUGEPAGE could be specified
> > explicitly to tell khugepaged to collapse this area, but when
> > khugepaged mode is "always" it should scan suitable vmas as long as
> > VM_NOHUGEPAGE is not set.
>
> I don't see that as being true at all.  The point of this hack was that
> applications which really knew what they were doing could enable it.
> It makes no sense to me that setting "always" by the sysadmin for shmem
> also force-enables ROTHP, even for applications which aren't aware of it.
>
> Most telling, I think, is that Song Liu hasn't weighed in on this at
> all.  It's clearly not important to the original author.

Song Liu already acked the series, please see
https://lore.kernel.org/linux-mm/96F2D93B-2043-44C3-8062-C639372A0212@fb.com/



* Re: [v3 PATCH 1/8] sched: coredump.h: clarify the use of MMF_VM_HUGEPAGE
  2022-04-04 20:02 ` [v3 PATCH 1/8] sched: coredump.h: clarify the use of MMF_VM_HUGEPAGE Yang Shi
@ 2022-05-09 12:25   ` Vlastimil Babka
  0 siblings, 0 replies; 27+ messages in thread
From: Vlastimil Babka @ 2022-05-09 12:25 UTC (permalink / raw)
  To: Yang Shi, kirill.shutemov, linmiaohe, songliubraving, riel,
	willy, ziy, tytso, akpm
  Cc: linux-mm, linux-fsdevel, linux-kernel

On 4/4/22 22:02, Yang Shi wrote:
> MMF_VM_HUGEPAGE is set as long as the mm is available for khugepaged by
> khugepaged_enter(), not only when VM_HUGEPAGE is set on vma.  Correct
> the comment to avoid confusion.
> 
> Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
> Acked-by: Song Liu <song@kernel.org>
> Signed-off-by: Yang Shi <shy828301@gmail.com>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

> ---
>  include/linux/sched/coredump.h | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/sched/coredump.h b/include/linux/sched/coredump.h
> index 4d9e3a656875..4d0a5be28b70 100644
> --- a/include/linux/sched/coredump.h
> +++ b/include/linux/sched/coredump.h
> @@ -57,7 +57,8 @@ static inline int get_dumpable(struct mm_struct *mm)
>  #endif
>  					/* leave room for more dump flags */
>  #define MMF_VM_MERGEABLE	16	/* KSM may merge identical pages */
> -#define MMF_VM_HUGEPAGE		17	/* set when VM_HUGEPAGE is set on vma */
> +#define MMF_VM_HUGEPAGE		17	/* set when mm is available for
> +					   khugepaged */
>  /*
>   * This one-shot flag is dropped due to necessity of changing exe once again
>   * on NFS restore




* Re: [v3 PATCH 2/8] mm: khugepaged: remove redundant check for VM_NO_KHUGEPAGED
  2022-04-04 20:02 ` [v3 PATCH 2/8] mm: khugepaged: remove redundant check for VM_NO_KHUGEPAGED Yang Shi
@ 2022-05-09 12:45   ` Vlastimil Babka
  0 siblings, 0 replies; 27+ messages in thread
From: Vlastimil Babka @ 2022-05-09 12:45 UTC (permalink / raw)
  To: Yang Shi, kirill.shutemov, linmiaohe, songliubraving, riel,
	willy, ziy, tytso, akpm
  Cc: linux-mm, linux-fsdevel, linux-kernel

On 4/4/22 22:02, Yang Shi wrote:
> The hugepage_vma_check() called by khugepaged_enter_vma_merge() already
> checks VM_NO_KHUGEPAGED.  Remove the check from the caller and move the
> check in hugepage_vma_check() up.
> 
> A few more checks may now run for VM_NO_KHUGEPAGED vmas, but
> MADV_HUGEPAGE is definitely not a hot path, so the cleaner code
> outweighs that.
> 
> Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
> Acked-by: Song Liu <song@kernel.org>
> Signed-off-by: Yang Shi <shy828301@gmail.com>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

> ---
>  mm/khugepaged.c | 9 ++++++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index a4e5eaf3eb01..7d197d9e3258 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -365,8 +365,7 @@ int hugepage_madvise(struct vm_area_struct *vma,
>  		 * register it here without waiting a page fault that
>  		 * may not happen any time soon.
>  		 */
> -		if (!(*vm_flags & VM_NO_KHUGEPAGED) &&
> -				khugepaged_enter_vma_merge(vma, *vm_flags))
> +		if (khugepaged_enter_vma_merge(vma, *vm_flags))
>  			return -ENOMEM;
>  		break;
>  	case MADV_NOHUGEPAGE:
> @@ -445,6 +444,9 @@ static bool hugepage_vma_check(struct vm_area_struct *vma,
>  	if (!transhuge_vma_enabled(vma, vm_flags))
>  		return false;
>  
> +	if (vm_flags & VM_NO_KHUGEPAGED)
> +		return false;
> +
>  	if (vma->vm_file && !IS_ALIGNED((vma->vm_start >> PAGE_SHIFT) -
>  				vma->vm_pgoff, HPAGE_PMD_NR))
>  		return false;
> @@ -470,7 +472,8 @@ static bool hugepage_vma_check(struct vm_area_struct *vma,
>  		return false;
>  	if (vma_is_temporary_stack(vma))
>  		return false;
> -	return !(vm_flags & VM_NO_KHUGEPAGED);
> +
> +	return true;
>  }
>  
>  int __khugepaged_enter(struct mm_struct *mm)




* Re: [v3 PATCH 3/8] mm: khugepaged: skip DAX vma
  2022-04-04 20:02 ` [v3 PATCH 3/8] mm: khugepaged: skip DAX vma Yang Shi
@ 2022-05-09 12:46   ` Vlastimil Babka
  0 siblings, 0 replies; 27+ messages in thread
From: Vlastimil Babka @ 2022-05-09 12:46 UTC (permalink / raw)
  To: Yang Shi, kirill.shutemov, linmiaohe, songliubraving, riel,
	willy, ziy, tytso, akpm
  Cc: linux-mm, linux-fsdevel, linux-kernel

On 4/4/22 22:02, Yang Shi wrote:
> The DAX vma may be seen by khugepaged when the mm has other khugepaged
> suitable vmas.  So khugepaged may try to collapse THP for DAX vma, but
> it will fail due to page sanity check, for example, page is not
> on LRU.
> 
> So it is not harmful, but it is definitely pointless to run khugepaged
> against DAX vma, so skip it in early check.
> 
> Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
> Acked-by: Song Liu <song@kernel.org>
> Signed-off-by: Yang Shi <shy828301@gmail.com>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

> ---
>  mm/khugepaged.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 7d197d9e3258..964a4d2c942a 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -447,6 +447,10 @@ static bool hugepage_vma_check(struct vm_area_struct *vma,
>  	if (vm_flags & VM_NO_KHUGEPAGED)
>  		return false;
>  
> +	/* Don't run khugepaged against DAX vma */
> +	if (vma_is_dax(vma))
> +		return false;
> +
>  	if (vma->vm_file && !IS_ALIGNED((vma->vm_start >> PAGE_SHIFT) -
>  				vma->vm_pgoff, HPAGE_PMD_NR))
>  		return false;




* Re: [v3 PATCH 4/8] mm: thp: only regular file could be THP eligible
  2022-04-04 20:02 ` [v3 PATCH 4/8] mm: thp: only regular file could be THP eligible Yang Shi
@ 2022-05-09 13:41   ` Vlastimil Babka
  0 siblings, 0 replies; 27+ messages in thread
From: Vlastimil Babka @ 2022-05-09 13:41 UTC (permalink / raw)
  To: Yang Shi, kirill.shutemov, linmiaohe, songliubraving, riel,
	willy, ziy, tytso, akpm
  Cc: linux-mm, linux-fsdevel, linux-kernel

On 4/4/22 22:02, Yang Shi wrote:
> Since commit a4aeaa06d45e ("mm: khugepaged: skip huge page collapse for
> special files"), khugepaged just collapses THP for regular file which is
> the intended usecase for readonly fs THP.  Only show regular file as THP
> eligible accordingly.
> 
> And make file_thp_enabled() available for khugepaged too in order to remove
> duplicate code.
> 
> Acked-by: Song Liu <song@kernel.org>
> Signed-off-by: Yang Shi <shy828301@gmail.com>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

> ---
>  include/linux/huge_mm.h | 14 ++++++++++++++
>  mm/huge_memory.c        | 11 ++---------
>  mm/khugepaged.c         |  9 ++-------
>  3 files changed, 18 insertions(+), 16 deletions(-)
> 
> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> index 2999190adc22..62a6f667850d 100644
> --- a/include/linux/huge_mm.h
> +++ b/include/linux/huge_mm.h
> @@ -172,6 +172,20 @@ static inline bool __transparent_hugepage_enabled(struct vm_area_struct *vma)
>  	return false;
>  }
>  
> +static inline bool file_thp_enabled(struct vm_area_struct *vma)
> +{
> +	struct inode *inode;
> +
> +	if (!vma->vm_file)
> +		return false;
> +
> +	inode = vma->vm_file->f_inode;
> +
> +	return (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS)) &&
> +	       (vma->vm_flags & VM_EXEC) &&
> +	       !inode_is_open_for_write(inode) && S_ISREG(inode->i_mode);
> +}
> +
>  bool transparent_hugepage_active(struct vm_area_struct *vma);
>  
>  #define transparent_hugepage_use_zero_page()				\
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 2fe38212e07c..183b793fd28e 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -68,13 +68,6 @@ static atomic_t huge_zero_refcount;
>  struct page *huge_zero_page __read_mostly;
>  unsigned long huge_zero_pfn __read_mostly = ~0UL;
>  
> -static inline bool file_thp_enabled(struct vm_area_struct *vma)
> -{
> -	return transhuge_vma_enabled(vma, vma->vm_flags) && vma->vm_file &&
> -	       !inode_is_open_for_write(vma->vm_file->f_inode) &&
> -	       (vma->vm_flags & VM_EXEC);
> -}
> -
>  bool transparent_hugepage_active(struct vm_area_struct *vma)
>  {
>  	/* The addr is used to check if the vma size fits */
> @@ -86,8 +79,8 @@ bool transparent_hugepage_active(struct vm_area_struct *vma)
>  		return __transparent_hugepage_enabled(vma);
>  	if (vma_is_shmem(vma))
>  		return shmem_huge_enabled(vma);
> -	if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS))
> -		return file_thp_enabled(vma);
> +	if (transhuge_vma_enabled(vma, vma->vm_flags) && file_thp_enabled(vma))
> +		return true;
>  
>  	return false;
>  }
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 964a4d2c942a..609c1bc0a027 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -464,13 +464,8 @@ static bool hugepage_vma_check(struct vm_area_struct *vma,
>  		return false;
>  
>  	/* Only regular file is valid */
> -	if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && vma->vm_file &&
> -	    (vm_flags & VM_EXEC)) {
> -		struct inode *inode = vma->vm_file->f_inode;
> -
> -		return !inode_is_open_for_write(inode) &&
> -			S_ISREG(inode->i_mode);
> -	}
> +	if (file_thp_enabled(vma))
> +		return true;
>  
>  	if (!vma->anon_vma || vma->vm_ops)
>  		return false;




* Re: [v3 PATCH 5/8] mm: khugepaged: make khugepaged_enter() void function
  2022-04-04 20:02 ` [v3 PATCH 5/8] mm: khugepaged: make khugepaged_enter() void function Yang Shi
@ 2022-05-09 13:46   ` Vlastimil Babka
  0 siblings, 0 replies; 27+ messages in thread
From: Vlastimil Babka @ 2022-05-09 13:46 UTC (permalink / raw)
  To: Yang Shi, kirill.shutemov, linmiaohe, songliubraving, riel,
	willy, ziy, tytso, akpm
  Cc: linux-mm, linux-fsdevel, linux-kernel

On 4/4/22 22:02, Yang Shi wrote:
> Most callers of khugepaged_enter() don't care about the return value.
> Only dup_mmap(), anonymous THP page fault and MADV_HUGEPAGE handle the
> error by returning -ENOMEM.  Actually it is not harmful for them to
> ignore the error case either.  It also sounds like overkill to fail
> fork() and page fault early due to a khugepaged_enter() error, and
> MADV_HUGEPAGE does set the VM_HUGEPAGE flag regardless of the error.
> 
> Acked-by: Song Liu <song@kernel.org>
> Signed-off-by: Yang Shi <shy828301@gmail.com>

Acked-by: Vlastimil Babka <vbabka@suse.cz>



* Re: [v3 PATCH 6/8] mm: khugepaged: move some khugepaged_* functions to khugepaged.c
  2022-04-04 20:02 ` [v3 PATCH 6/8] mm: khugepaged: move some khugepaged_* functions to khugepaged.c Yang Shi
@ 2022-05-09 15:31   ` Vlastimil Babka
  2022-05-09 23:00     ` Yang Shi
  0 siblings, 1 reply; 27+ messages in thread
From: Vlastimil Babka @ 2022-05-09 15:31 UTC (permalink / raw)
  To: Yang Shi, kirill.shutemov, linmiaohe, songliubraving, riel,
	willy, ziy, tytso, akpm
  Cc: linux-mm, linux-fsdevel, linux-kernel

On 4/4/22 22:02, Yang Shi wrote:
> We want to reuse hugepage_vma_check() for khugepaged_enter() so that
> some duplicate code can be removed.  But moving hugepage_vma_check()
> to khugepaged.h would need huge_mm.h included there, and it seems not
> optimal to bloat khugepaged.h.
> 
> And the khugepaged_* functions actually are just wrappers for some
> non-inline functions, so the benefit of keeping them inline seems
> limited.
> 
> So move the khugepaged_* functions to khugepaged.c; any caller just
> needs to include khugepaged.h, which is quite small.  For example, the
> following patches will call khugepaged_enter() in the filemap page
> fault path for regular filesystems to make readonly FS THP collapse
> more consistent.  filemap.c just needs to include khugepaged.h.

This last part is inaccurate in v3?

> Acked-by: Song Liu <song@kernel.org>
> Signed-off-by: Yang Shi <shy828301@gmail.com>

I think moving the tiny wrappers is unnecessary.

How about just making hugepage_vma_check() non-static and declaring it in
khugepaged.h, so it can be used from khugepaged_enter() in the same file?
AFAICS there's no need to include huge_mm.h there.
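
Roughly something like this (completely untested sketch just to
illustrate the idea, reusing the khugepaged_enter() body from this
patch and assuming khugepaged_enabled() and the MMF_VM_HUGEPAGE test
can stay available in the header):

/* include/linux/khugepaged.h */
extern bool hugepage_vma_check(struct vm_area_struct *vma,
			       unsigned long vm_flags);

static inline void khugepaged_enter(struct vm_area_struct *vma,
				    unsigned long vm_flags)
{
	if (!test_bit(MMF_VM_HUGEPAGE, &vma->vm_mm->flags) &&
	    khugepaged_enabled() &&
	    hugepage_vma_check(vma, vm_flags))
		__khugepaged_enter(vma->vm_mm);
}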

> ---
>  include/linux/khugepaged.h | 37 ++++++-------------------------------
>  mm/khugepaged.c            | 20 ++++++++++++++++++++
>  2 files changed, 26 insertions(+), 31 deletions(-)
> 
> diff --git a/include/linux/khugepaged.h b/include/linux/khugepaged.h
> index 0423d3619f26..6acf9701151e 100644
> --- a/include/linux/khugepaged.h
> +++ b/include/linux/khugepaged.h
> @@ -2,10 +2,6 @@
>  #ifndef _LINUX_KHUGEPAGED_H
>  #define _LINUX_KHUGEPAGED_H
>  
> -#include <linux/sched/coredump.h> /* MMF_VM_HUGEPAGE */
> -#include <linux/shmem_fs.h>
> -
> -
>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
>  extern struct attribute_group khugepaged_attr_group;
>  
> @@ -16,6 +12,12 @@ extern void __khugepaged_enter(struct mm_struct *mm);
>  extern void __khugepaged_exit(struct mm_struct *mm);
>  extern void khugepaged_enter_vma_merge(struct vm_area_struct *vma,
>  				       unsigned long vm_flags);
> +extern void khugepaged_fork(struct mm_struct *mm,
> +			    struct mm_struct *oldmm);
> +extern void khugepaged_exit(struct mm_struct *mm);
> +extern void khugepaged_enter(struct vm_area_struct *vma,
> +			     unsigned long vm_flags);
> +
>  extern void khugepaged_min_free_kbytes_update(void);
>  #ifdef CONFIG_SHMEM
>  extern void collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr);
> @@ -33,36 +35,9 @@ static inline void collapse_pte_mapped_thp(struct mm_struct *mm,
>  #define khugepaged_always()				\
>  	(transparent_hugepage_flags &			\
>  	 (1<<TRANSPARENT_HUGEPAGE_FLAG))
> -#define khugepaged_req_madv()					\
> -	(transparent_hugepage_flags &				\
> -	 (1<<TRANSPARENT_HUGEPAGE_REQ_MADV_FLAG))
>  #define khugepaged_defrag()					\
>  	(transparent_hugepage_flags &				\
>  	 (1<<TRANSPARENT_HUGEPAGE_DEFRAG_KHUGEPAGED_FLAG))
> -
> -static inline void khugepaged_fork(struct mm_struct *mm, struct mm_struct *oldmm)
> -{
> -	if (test_bit(MMF_VM_HUGEPAGE, &oldmm->flags))
> -		__khugepaged_enter(mm);
> -}
> -
> -static inline void khugepaged_exit(struct mm_struct *mm)
> -{
> -	if (test_bit(MMF_VM_HUGEPAGE, &mm->flags))
> -		__khugepaged_exit(mm);
> -}
> -
> -static inline void khugepaged_enter(struct vm_area_struct *vma,
> -				   unsigned long vm_flags)
> -{
> -	if (!test_bit(MMF_VM_HUGEPAGE, &vma->vm_mm->flags))
> -		if ((khugepaged_always() ||
> -		     (shmem_file(vma->vm_file) && shmem_huge_enabled(vma)) ||
> -		     (khugepaged_req_madv() && (vm_flags & VM_HUGEPAGE))) &&
> -		    !(vm_flags & VM_NOHUGEPAGE) &&
> -		    !test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags))
> -			__khugepaged_enter(vma->vm_mm);
> -}
>  #else /* CONFIG_TRANSPARENT_HUGEPAGE */
>  static inline void khugepaged_fork(struct mm_struct *mm, struct mm_struct *oldmm)
>  {
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index b69eda934d70..ec5b0a691d87 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -556,6 +556,26 @@ void __khugepaged_exit(struct mm_struct *mm)
>  	}
>  }
>  
> +void khugepaged_fork(struct mm_struct *mm, struct mm_struct *oldmm)
> +{
> +	if (test_bit(MMF_VM_HUGEPAGE, &oldmm->flags))
> +		__khugepaged_enter(mm);
> +}
> +
> +void khugepaged_exit(struct mm_struct *mm)
> +{
> +	if (test_bit(MMF_VM_HUGEPAGE, &mm->flags))
> +		__khugepaged_exit(mm);
> +}
> +
> +void khugepaged_enter(struct vm_area_struct *vma, unsigned long vm_flags)
> +{
> +	if (!test_bit(MMF_VM_HUGEPAGE, &vma->vm_mm->flags) &&
> +	    khugepaged_enabled())
> +		if (hugepage_vma_check(vma, vm_flags))
> +			__khugepaged_enter(vma->vm_mm);
> +}
> +
>  static void release_pte_page(struct page *page)
>  {
>  	mod_node_page_state(page_pgdat(page),



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [v3 PATCH 7/8] mm: khugepaged: introduce khugepaged_enter_vma() helper
  2022-04-04 20:02 ` [v3 PATCH 7/8] mm: khugepaged: introduce khugepaged_enter_vma() helper Yang Shi
@ 2022-05-09 15:39   ` Vlastimil Babka
  0 siblings, 0 replies; 27+ messages in thread
From: Vlastimil Babka @ 2022-05-09 15:39 UTC (permalink / raw)
  To: Yang Shi, kirill.shutemov, linmiaohe, songliubraving, riel,
	willy, ziy, tytso, akpm
  Cc: linux-mm, linux-fsdevel, linux-kernel

On 4/4/22 22:02, Yang Shi wrote:
> khugepaged_enter_vma_merge() actually does the same thing as the
> khugepaged_enter() call in shmem_mmap(), so consolidate them
> into one helper and rename it to khugepaged_enter_vma().
> 
> Signed-off-by: Yang Shi <shy828301@gmail.com>

Acked-by: Vlastimil Babka <vbabka@suse.cz>


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [v3 PATCH 8/8] mm: mmap: register suitable readonly file vmas for khugepaged
  2022-04-04 20:02 ` [v3 PATCH 8/8] mm: mmap: register suitable readonly file vmas for khugepaged Yang Shi
@ 2022-05-09 15:43   ` Vlastimil Babka
  0 siblings, 0 replies; 27+ messages in thread
From: Vlastimil Babka @ 2022-05-09 15:43 UTC (permalink / raw)
  To: Yang Shi, kirill.shutemov, linmiaohe, songliubraving, riel,
	willy, ziy, tytso, akpm
  Cc: linux-mm, linux-fsdevel, linux-kernel

On 4/4/22 22:02, Yang Shi wrote:
> The readonly FS THP relies on khugepaged to collapse THP for suitable
> vmas.  But it is kind of "random luck" for khugepaged to see the
> readonly FS vmas (https://lore.kernel.org/linux-mm/00f195d4-d039-3cf2-d3a1-a2c88de397a0@suse.cz/)
> since currently the vmas are registered to khugepaged when:
>   - Anon huge pmd page fault
>   - VMA merge
>   - MADV_HUGEPAGE
>   - Shmem mmap
> 
> If the above conditions are not met, even though khugepaged is enabled
> it won't see readonly FS vmas at all.  MADV_HUGEPAGE could be specified
> explicitly to tell khugepaged to collapse this area, but when khugepaged
> mode is "always" it should scan suitable vmas as long as VM_NOHUGEPAGE
> is not set.
> 
> So make sure readonly FS vmas are registered to khugepaged to make the
> behavior more consistent.
> 
> Registering suitable vmas in common mmap path, that could cover both
> readonly FS vmas and shmem vmas, so removed the khugepaged calls in
> shmem.c.
> 
> We still need to keep the khugepaged call in vma_merge() since vma_merge()
> is called from a lot of places, for example, madvise, mprotect, etc.
> 
> Reported-by: Vlastimil Babka <vbabka@suse.cz>
> Signed-off-by: Yang Shi <shy828301@gmail.com>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

> ---
>  mm/mmap.c  | 6 ++++++
>  mm/shmem.c | 4 ----
>  2 files changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 604c8dece5dd..616ebbc2d052 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -1842,6 +1842,12 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
>  	}
>  
>  	vma_link(mm, vma, prev, rb_link, rb_parent);
> +
> +	/*
> +	 * vma_merge() calls khugepaged_enter_vma() too; the below
> +	 * call covers the non-merge case.
> +	 */
> +	khugepaged_enter_vma(vma, vma->vm_flags);
>  	/* Once vma denies write, undo our temporary denial count */
>  unmap_writable:
>  	if (file && vm_flags & VM_SHARED)
> diff --git a/mm/shmem.c b/mm/shmem.c
> index 92eca974771d..0c448080d210 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -34,7 +34,6 @@
>  #include <linux/export.h>
>  #include <linux/swap.h>
>  #include <linux/uio.h>
> -#include <linux/khugepaged.h>
>  #include <linux/hugetlb.h>
>  #include <linux/fs_parser.h>
>  #include <linux/swapfile.h>
> @@ -2239,7 +2238,6 @@ static int shmem_mmap(struct file *file, struct vm_area_struct *vma)
>  
>  	file_accessed(file);
>  	vma->vm_ops = &shmem_vm_ops;
> -	khugepaged_enter_vma(vma, vma->vm_flags);
>  	return 0;
>  }
>  
> @@ -4132,8 +4130,6 @@ int shmem_zero_setup(struct vm_area_struct *vma)
>  	vma->vm_file = file;
>  	vma->vm_ops = &shmem_vm_ops;
>  
> -	khugepaged_enter_vma(vma, vma->vm_flags);
> -
>  	return 0;
>  }
>  



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [v3 PATCH 0/8] Make khugepaged collapse readonly FS THP more consistent
  2022-04-04 20:02 [v3 PATCH 0/8] Make khugepaged collapse readonly FS THP more consistent Yang Shi
                   ` (8 preceding siblings ...)
  2022-04-05  0:16 ` [v3 PATCH 0/8] Make khugepaged collapse readonly FS THP more consistent Matthew Wilcox
@ 2022-05-09 16:05 ` Vlastimil Babka
  2022-05-09 20:34   ` Yang Shi
  9 siblings, 1 reply; 27+ messages in thread
From: Vlastimil Babka @ 2022-05-09 16:05 UTC (permalink / raw)
  To: Yang Shi, kirill.shutemov, linmiaohe, songliubraving, riel,
	willy, ziy, tytso, akpm
  Cc: linux-mm, linux-fsdevel, linux-kernel

On 4/4/22 22:02, Yang Shi wrote:
>  include/linux/huge_mm.h        | 14 ++++++++++++
>  include/linux/khugepaged.h     | 59 ++++++++++++---------------------------------------
>  include/linux/sched/coredump.h |  3 ++-
>  kernel/fork.c                  |  4 +---
>  mm/huge_memory.c               | 15 ++++---------
>  mm/khugepaged.c                | 76 +++++++++++++++++++++++++++++++++++++-----------------------------
>  mm/mmap.c                      | 14 ++++++++----
>  mm/shmem.c                     | 12 -----------
>  8 files changed, 88 insertions(+), 109 deletions(-)
 
Resending my general feedback from mm-commits thread to include the
public ML's:

There are modestly fewer lines in the end, some duplicate code removed,
special casing in shmem.c removed, that's all good as it is. Also patch 8/8
became quite boring in v3: no need to change individual filesystems and also
no hook in the fault path, just the common mmap path. So I would just handle
patch 6 differently, as I just replied to it, and acked the rest.

That said, it's still unfortunately rather a mess of functions that have
similar names: transhuge_vma_enabled(vma), hugepage_vma_check(vma),
transparent_hugepage_active(vma), transhuge_vma_suitable(vma, addr)?
So maybe there is still some space for further cleanups. But the series is fine
as it is, so we don't have to wait for that now.

We could also consider that the tracking of which mm is to be scanned is
modelled after ksm, which has its own madvise flag but no "always"
mode. What if for THP we only tracked actual THP-madvised mm's, and in
"always" mode just scanned all mm's; would that allow ripping out some code
perhaps, while not adding too many unnecessary scans? If some processes are
being scanned without any effect, maybe track success separately and scan
them less frequently etc. That could ultimately be more efficient than
painfully tracking just *eligibility* for scanning in "always" mode?

An even more radical thing to consider (maybe that's a LSF/MM level topic, too
bad :) is that we scan pagetables in ksm, khugepaged, numa balancing, and soon
in MGLRU, and I probably forgot something else. Maybe it's time to think about
unifying those scanners?
 



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [v3 PATCH 0/8] Make khugepaged collapse readonly FS THP more consistent
  2022-05-09 16:05 ` Vlastimil Babka
@ 2022-05-09 20:34   ` Yang Shi
  2022-05-10  7:35     ` Vlastimil Babka
  0 siblings, 1 reply; 27+ messages in thread
From: Yang Shi @ 2022-05-09 20:34 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Kirill A. Shutemov, Miaohe Lin, Song Liu, Rik van Riel,
	Matthew Wilcox, Zi Yan, Theodore Ts'o, Andrew Morton,
	Linux MM, Linux FS-devel Mailing List, Linux Kernel Mailing List

On Mon, May 9, 2022 at 9:05 AM Vlastimil Babka <vbabka@suse.cz> wrote:
>
> On 4/4/22 22:02, Yang Shi wrote:
> >  include/linux/huge_mm.h        | 14 ++++++++++++
> >  include/linux/khugepaged.h     | 59 ++++++++++++---------------------------------------
> >  include/linux/sched/coredump.h |  3 ++-
> >  kernel/fork.c                  |  4 +---
> >  mm/huge_memory.c               | 15 ++++---------
> >  mm/khugepaged.c                | 76 +++++++++++++++++++++++++++++++++++++-----------------------------
> >  mm/mmap.c                      | 14 ++++++++----
> >  mm/shmem.c                     | 12 -----------
> >  8 files changed, 88 insertions(+), 109 deletions(-)
>
> Resending my general feedback from mm-commits thread to include the
> public ML's:
>
> There's modestly less lines in the end, some duplicate code removed,
> special casing in shmem.c removed, that's all good as it is. Also patch 8/8
> become quite boring in v3, no need to change individual filesystems and also
> no hook in fault path, just the common mmap path. So I would just handle
> patch 6 differently as I just replied to it, and acked the rest.
>
> That said it's still unfortunately rather a mess of functions that have
> similar names. transhuge_vma_enabled(vma). hugepage_vma_check(vma),
> transparent_hugepage_active(vma), transhuge_vma_suitable(vma, addr)?
> So maybe still some space for further cleanups. But the series is fine as it
> is so we don't have to wait for it now.

Yeah, I agree that we do have a lot of THP checks. Will find some time to
look into it deeper later.

>
> We could also consider that the tracking of which mm is to be scanned is
> modelled after ksm which has its own madvise flag, but also no "always"
> mode. What if for THP we only tracked actual THP madvised mm's, and in
> "always" mode just scanned all vm's, would that allow ripping out some code
> perhaps, while not adding too many unnecessary scans? If some processes are

Do you mean add all mm(s) to the scan list unconditionally? I don't
think it will scale.

> being scanned without any effect, maybe track success separately, and scan
> them less frequently etc. That could be ultimately more efficient than
> painfully tracking just *eligibility* for scanning in "always" mode?

Sounds like we need a couple of different lists, for example, inactive
and active? And promote or demote mm(s) between the two lists? TBH I
don't see too many benefits at the moment. Or did I misunderstand you?

>
> Even more radical thing to consider (maybe that's a LSF/MM level topic, too
> bad :) is that we scan pagetables in ksm, khugepaged, numa balancing, soon
> in MGLRU, and I probably forgot something else. Maybe time to think about
> unifying those scanners?

We do have pagewalk (walk_page_range()), which is used by a couple of
mm features, for example, mlock, mempolicy, mprotect, etc. I'm not sure
whether it is feasible for khugepaged, ksm, etc. or not since I didn't
look that hard. But I agree it should be worth looking at.

>
>


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [v3 PATCH 6/8] mm: khugepaged: move some khugepaged_* functions to khugepaged.c
  2022-05-09 15:31   ` Vlastimil Babka
@ 2022-05-09 23:00     ` Yang Shi
  0 siblings, 0 replies; 27+ messages in thread
From: Yang Shi @ 2022-05-09 23:00 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Kirill A. Shutemov, Miaohe Lin, Song Liu, Rik van Riel,
	Matthew Wilcox, Zi Yan, Theodore Ts'o, Andrew Morton,
	Linux MM, Linux FS-devel Mailing List, Linux Kernel Mailing List

On Mon, May 9, 2022 at 8:31 AM Vlastimil Babka <vbabka@suse.cz> wrote:
>
> On 4/4/22 22:02, Yang Shi wrote:
> > Reuse hugepage_vma_check() for khugepaged_enter() so that we can
> > remove some duplicate code.  But moving hugepage_vma_check() to
> > khugepaged.h would require including huge_mm.h there, and it seems
> > suboptimal to bloat khugepaged.h.
> >
> > And the khugepaged_* functions actually are just wrappers around some
> > non-inline functions, so there seems to be little benefit in keeping
> > them inline.
> >
> > So move the khugepaged_* functions to khugepaged.c; any caller just
> > needs to include khugepaged.h, which is quite small.  For example, the
> > following patches will call khugepaged_enter() in the filemap page fault
> > path for regular filesystems to make readonly FS THP collapse more
> > consistent.  filemap.c just needs to include khugepaged.h.
>
> This last part is inaccurate in v3?

Yeah, thanks for catching this. Since the patch will be reworked, the
commit log will be reworked as well.

>
> > Acked-by: Song Liu <song@kernel.org>
> > Signed-off-by: Yang Shi <shy828301@gmail.com>
>
> I think moving the tiny wrappers is unnecessary.
>
> How about just making hugepage_vma_check() not static and declare it in
> khugepaged.h, then it can be used from khugepaged_enter() in the same file
> and AFAICS no need to include huge_mm.h there?

Sounds good to me, will fix it in v4. Thanks for reviewing and acking
the series.

>
> > ---
> >  include/linux/khugepaged.h | 37 ++++++-------------------------------
> >  mm/khugepaged.c            | 20 ++++++++++++++++++++
> >  2 files changed, 26 insertions(+), 31 deletions(-)
> >
> > diff --git a/include/linux/khugepaged.h b/include/linux/khugepaged.h
> > index 0423d3619f26..6acf9701151e 100644
> > --- a/include/linux/khugepaged.h
> > +++ b/include/linux/khugepaged.h
> > @@ -2,10 +2,6 @@
> >  #ifndef _LINUX_KHUGEPAGED_H
> >  #define _LINUX_KHUGEPAGED_H
> >
> > -#include <linux/sched/coredump.h> /* MMF_VM_HUGEPAGE */
> > -#include <linux/shmem_fs.h>
> > -
> > -
> >  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
> >  extern struct attribute_group khugepaged_attr_group;
> >
> > @@ -16,6 +12,12 @@ extern void __khugepaged_enter(struct mm_struct *mm);
> >  extern void __khugepaged_exit(struct mm_struct *mm);
> >  extern void khugepaged_enter_vma_merge(struct vm_area_struct *vma,
> >                                      unsigned long vm_flags);
> > +extern void khugepaged_fork(struct mm_struct *mm,
> > +                         struct mm_struct *oldmm);
> > +extern void khugepaged_exit(struct mm_struct *mm);
> > +extern void khugepaged_enter(struct vm_area_struct *vma,
> > +                          unsigned long vm_flags);
> > +
> >  extern void khugepaged_min_free_kbytes_update(void);
> >  #ifdef CONFIG_SHMEM
> >  extern void collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr);
> > @@ -33,36 +35,9 @@ static inline void collapse_pte_mapped_thp(struct mm_struct *mm,
> >  #define khugepaged_always()                          \
> >       (transparent_hugepage_flags &                   \
> >        (1<<TRANSPARENT_HUGEPAGE_FLAG))
> > -#define khugepaged_req_madv()                                        \
> > -     (transparent_hugepage_flags &                           \
> > -      (1<<TRANSPARENT_HUGEPAGE_REQ_MADV_FLAG))
> >  #define khugepaged_defrag()                                  \
> >       (transparent_hugepage_flags &                           \
> >        (1<<TRANSPARENT_HUGEPAGE_DEFRAG_KHUGEPAGED_FLAG))
> > -
> > -static inline void khugepaged_fork(struct mm_struct *mm, struct mm_struct *oldmm)
> > -{
> > -     if (test_bit(MMF_VM_HUGEPAGE, &oldmm->flags))
> > -             __khugepaged_enter(mm);
> > -}
> > -
> > -static inline void khugepaged_exit(struct mm_struct *mm)
> > -{
> > -     if (test_bit(MMF_VM_HUGEPAGE, &mm->flags))
> > -             __khugepaged_exit(mm);
> > -}
> > -
> > -static inline void khugepaged_enter(struct vm_area_struct *vma,
> > -                                unsigned long vm_flags)
> > -{
> > -     if (!test_bit(MMF_VM_HUGEPAGE, &vma->vm_mm->flags))
> > -             if ((khugepaged_always() ||
> > -                  (shmem_file(vma->vm_file) && shmem_huge_enabled(vma)) ||
> > -                  (khugepaged_req_madv() && (vm_flags & VM_HUGEPAGE))) &&
> > -                 !(vm_flags & VM_NOHUGEPAGE) &&
> > -                 !test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags))
> > -                     __khugepaged_enter(vma->vm_mm);
> > -}
> >  #else /* CONFIG_TRANSPARENT_HUGEPAGE */
> >  static inline void khugepaged_fork(struct mm_struct *mm, struct mm_struct *oldmm)
> >  {
> > diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> > index b69eda934d70..ec5b0a691d87 100644
> > --- a/mm/khugepaged.c
> > +++ b/mm/khugepaged.c
> > @@ -556,6 +556,26 @@ void __khugepaged_exit(struct mm_struct *mm)
> >       }
> >  }
> >
> > +void khugepaged_fork(struct mm_struct *mm, struct mm_struct *oldmm)
> > +{
> > +     if (test_bit(MMF_VM_HUGEPAGE, &oldmm->flags))
> > +             __khugepaged_enter(mm);
> > +}
> > +
> > +void khugepaged_exit(struct mm_struct *mm)
> > +{
> > +     if (test_bit(MMF_VM_HUGEPAGE, &mm->flags))
> > +             __khugepaged_exit(mm);
> > +}
> > +
> > +void khugepaged_enter(struct vm_area_struct *vma, unsigned long vm_flags)
> > +{
> > +     if (!test_bit(MMF_VM_HUGEPAGE, &vma->vm_mm->flags) &&
> > +         khugepaged_enabled())
> > +             if (hugepage_vma_check(vma, vm_flags))
> > +                     __khugepaged_enter(vma->vm_mm);
> > +}
> > +
> >  static void release_pte_page(struct page *page)
> >  {
> >       mod_node_page_state(page_pgdat(page),
>


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [v3 PATCH 0/8] Make khugepaged collapse readonly FS THP more consistent
  2022-05-09 20:34   ` Yang Shi
@ 2022-05-10  7:35     ` Vlastimil Babka
  2022-05-10 19:25       ` Yang Shi
  0 siblings, 1 reply; 27+ messages in thread
From: Vlastimil Babka @ 2022-05-10  7:35 UTC (permalink / raw)
  To: Yang Shi
  Cc: Kirill A. Shutemov, Miaohe Lin, Song Liu, Rik van Riel,
	Matthew Wilcox, Zi Yan, Theodore Ts'o, Andrew Morton,
	Linux MM, Linux FS-devel Mailing List, Linux Kernel Mailing List

On 5/9/22 22:34, Yang Shi wrote:
> On Mon, May 9, 2022 at 9:05 AM Vlastimil Babka <vbabka@suse.cz> wrote:
>>
>> On 4/4/22 22:02, Yang Shi wrote:
>> >  include/linux/huge_mm.h        | 14 ++++++++++++
>> >  include/linux/khugepaged.h     | 59 ++++++++++++---------------------------------------
>> >  include/linux/sched/coredump.h |  3 ++-
>> >  kernel/fork.c                  |  4 +---
>> >  mm/huge_memory.c               | 15 ++++---------
>> >  mm/khugepaged.c                | 76 +++++++++++++++++++++++++++++++++++++-----------------------------
>> >  mm/mmap.c                      | 14 ++++++++----
>> >  mm/shmem.c                     | 12 -----------
>> >  8 files changed, 88 insertions(+), 109 deletions(-)
>>
>> Resending my general feedback from mm-commits thread to include the
>> public ML's:
>>
>> There's modestly less lines in the end, some duplicate code removed,
>> special casing in shmem.c removed, that's all good as it is. Also patch 8/8
>> become quite boring in v3, no need to change individual filesystems and also
>> no hook in fault path, just the common mmap path. So I would just handle
>> patch 6 differently as I just replied to it, and acked the rest.
>>
>> That said it's still unfortunately rather a mess of functions that have
>> similar names. transhuge_vma_enabled(vma). hugepage_vma_check(vma),
>> transparent_hugepage_active(vma), transhuge_vma_suitable(vma, addr)?
>> So maybe still some space for further cleanups. But the series is fine as it
>> is so we don't have to wait for it now.
> 
> > Yeah, I agree that we do have a lot of THP checks. Will find some time to
> look into it deeper later.

Thanks.

>>
>> We could also consider that the tracking of which mm is to be scanned is
>> modelled after ksm which has its own madvise flag, but also no "always"
>> mode. What if for THP we only tracked actual THP madvised mm's, and in
>> "always" mode just scanned all vm's, would that allow ripping out some code
>> perhaps, while not adding too many unnecessary scans? If some processes are
> 
> Do you mean add all mm(s) to the scan list unconditionally? I don't
> think it will scale.

It might be interesting to find out how many mm's (percentage of all mm's)
are typically in the list with "always" enabled. I wouldn't be surprised if
it was nearly all of them. Having at least one large enough anonymous area
sounds like something all processes would have these days?

>> being scanned without any effect, maybe track success separately, and scan
> >> them less frequently etc. That could be ultimately more efficient than
>> painfully tracking just *eligibility* for scanning in "always" mode?
> 
> Sounds like we need a couple of different lists, for example, inactive
> and active? And promote or demote mm(s) between the two lists? TBH I
> don't see too many benefits at the moment. Or I misunderstood you?

Yeah, something like that. It would of course require finding out whether
khugepaged is consuming too much cpu uselessly these days while not
processing fast enough mm's where it succeeds more.

>>
>> Even more radical thing to consider (maybe that's a LSF/MM level topic, too
>> bad :) is that we scan pagetables in ksm, khugepaged, numa balancing, soon
>> in MGLRU, and I probably forgot something else. Maybe time to think about
>> unifying those scanners?
> 
> We do have pagewalk (walk_page_range()) which is used by a couple of
> mm stuff, for example, mlock, mempolicy, mprotect, etc. I'm not sure
> whether it is feasible for khugepaged, ksm, etc, or not since I didn't
> look that hard. But I agree it should be worth looking at.

pagewalk is a framework to simplify writing code that processes page tables
for a given one-off task, yeah. But this would be something a bit different,
e.g. a kernel thread that does the sum of what khugepaged/ksm/etc do. Numa
balancing uses task_work instead of kthread so that would require
consideration on which mechanism the unified daemon would use.
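
For illustration, a minimal sketch of a walk_page_range() user, against the in-kernel
pagewalk API of this era (include/linux/pagewalk.h; untested, and the signatures may
differ on other kernel versions).  It just counts present PTEs in a range of an mm:

    #include <linux/pagewalk.h>
    #include <linux/mm.h>

    static int count_pte_entry(pte_t *pte, unsigned long addr,
                               unsigned long next, struct mm_walk *walk)
    {
            unsigned long *nr_present = walk->private;

            /* Called for each PTE in the walked range. */
            if (pte_present(*pte))
                    (*nr_present)++;
            return 0;
    }

    static const struct mm_walk_ops count_present_ops = {
            .pte_entry = count_pte_entry,
    };

    static unsigned long count_present_ptes(struct mm_struct *mm,
                                            unsigned long start,
                                            unsigned long end)
    {
            unsigned long nr_present = 0;

            /* walk_page_range() expects mmap_lock to be held. */
            mmap_read_lock(mm);
            walk_page_range(mm, start, end, &count_present_ops, &nr_present);
            mmap_read_unlock(mm);

            return nr_present;
    }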

>>
>>



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [v3 PATCH 0/8] Make khugepaged collapse readonly FS THP more consistent
  2022-05-10  7:35     ` Vlastimil Babka
@ 2022-05-10 19:25       ` Yang Shi
  0 siblings, 0 replies; 27+ messages in thread
From: Yang Shi @ 2022-05-10 19:25 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Kirill A. Shutemov, Miaohe Lin, Song Liu, Rik van Riel,
	Matthew Wilcox, Zi Yan, Theodore Ts'o, Andrew Morton,
	Linux MM, Linux FS-devel Mailing List, Linux Kernel Mailing List

On Tue, May 10, 2022 at 12:35 AM Vlastimil Babka <vbabka@suse.cz> wrote:
>
> On 5/9/22 22:34, Yang Shi wrote:
> > On Mon, May 9, 2022 at 9:05 AM Vlastimil Babka <vbabka@suse.cz> wrote:
> >>
> >> On 4/4/22 22:02, Yang Shi wrote:
> >> >  include/linux/huge_mm.h        | 14 ++++++++++++
> >> >  include/linux/khugepaged.h     | 59 ++++++++++++---------------------------------------
> >> >  include/linux/sched/coredump.h |  3 ++-
> >> >  kernel/fork.c                  |  4 +---
> >> >  mm/huge_memory.c               | 15 ++++---------
> >> >  mm/khugepaged.c                | 76 +++++++++++++++++++++++++++++++++++++-----------------------------
> >> >  mm/mmap.c                      | 14 ++++++++----
> >> >  mm/shmem.c                     | 12 -----------
> >> >  8 files changed, 88 insertions(+), 109 deletions(-)
> >>
> >> Resending my general feedback from mm-commits thread to include the
> >> public ML's:
> >>
> >> There's modestly less lines in the end, some duplicate code removed,
> >> special casing in shmem.c removed, that's all good as it is. Also patch 8/8
> >> become quite boring in v3, no need to change individual filesystems and also
> >> no hook in fault path, just the common mmap path. So I would just handle
> >> patch 6 differently as I just replied to it, and acked the rest.
> >>
> >> That said it's still unfortunately rather a mess of functions that have
> >> similar names. transhuge_vma_enabled(vma). hugepage_vma_check(vma),
> >> transparent_hugepage_active(vma), transhuge_vma_suitable(vma, addr)?
> >> So maybe still some space for further cleanups. But the series is fine as it
> >> is so we don't have to wait for it now.
> >
> > Yeah, I agree that we do have a lot of THP checks. Will find some time to
> > look into it deeper later.
>
> Thanks.
>
> >>
> >> We could also consider that the tracking of which mm is to be scanned is
> >> modelled after ksm which has its own madvise flag, but also no "always"
> >> mode. What if for THP we only tracked actual THP madvised mm's, and in
> >> "always" mode just scanned all vm's, would that allow ripping out some code
> >> perhaps, while not adding too many unnecessary scans? If some processes are
> >
> > Do you mean add all mm(s) to the scan list unconditionally? I don't
> > think it will scale.
>
> It might be interesting to find out how many mm's (percentage of all mm's)
> are typically in the list with "always" enabled. I wouldn't be surprised if
> it was nearly all of them. Having at least one large enough anonymous area
> sounds like something all processes would have these days?

Just did a simple test with "always" on my Fedora VM, which is almost idle.

The number of user processes (excluding kernel threads) is:
[yangs@localhost ~]$ ps -aux --no-headers | awk '{if ($5 > 0) print $5}' | wc -l
113

The number of mm's on the khugepaged scan list, counted by a simple drgn
script, is 56. Below is the drgn script:
>>> i = 0
>>> mm_list = prog['khugepaged_scan'].mm_head.address_of_()
>>> for mm in list_for_each(mm_list):
...     i = i + 1
...
>>> print(i)

So about 50% of the processes are on the list. For busy machines, the
percentage may be higher. And most of the big enough processes (with large
anon mappings) should be on the list.

>
> >> being scanned without any effect, maybe track success separately, and scan
> > >> them less frequently etc. That could be ultimately more efficient than
> >> painfully tracking just *eligibility* for scanning in "always" mode?
> >
> > Sounds like we need a couple of different lists, for example, inactive
> > and active? And promote or demote mm(s) between the two lists? TBH I
> > don't see too many benefits at the moment. Or I misunderstood you?
>
> Yeah, something like that. It would of course require finding out whether
> khugepaged is consuming too much cpu uselessly these days while not
> processing fast enough mm's where it succeeds more.

Yeah, currently we don't have such statistics at mm or vma
granularity. But we may be able to get some stats by some
post-processing with trace events.

>
> >>
> >> Even more radical thing to consider (maybe that's a LSF/MM level topic, too
> >> bad :) is that we scan pagetables in ksm, khugepaged, numa balancing, soon
> >> in MGLRU, and I probably forgot something else. Maybe time to think about
> >> unifying those scanners?
> >
> > We do have pagewalk (walk_page_range()) which is used by a couple of
> > mm stuff, for example, mlock, mempolicy, mprotect, etc. I'm not sure
> > whether it is feasible for khugepaged, ksm, etc, or not since I didn't
> > look that hard. But I agree it should be worth looking at.
>
> pagewalk is a framework to simplify writing code that processes page tables
> for a given one-off task, yeah. But this would be something a bit different,
> e.g. a kernel thread that does the sum of what khugepaged/ksm/etc do. Numa
> balancing uses task_work instead of kthread so that would require
> consideration on which mechanism the unified daemon would use.

Aha, OK, you mean consolidating khugepaged, ksmd, etc. into one kthread.
TBH I don't know whether that will work out or not. Maybe the first
step is to use the same page table walk framework for all of them?

>
> >>
> >>
>


^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2022-05-10 19:25 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-04-04 20:02 [v3 PATCH 0/8] Make khugepaged collapse readonly FS THP more consistent Yang Shi
2022-04-04 20:02 ` [v3 PATCH 1/8] sched: coredump.h: clarify the use of MMF_VM_HUGEPAGE Yang Shi
2022-05-09 12:25   ` Vlastimil Babka
2022-04-04 20:02 ` [v3 PATCH 2/8] mm: khugepaged: remove redundant check for VM_NO_KHUGEPAGED Yang Shi
2022-05-09 12:45   ` Vlastimil Babka
2022-04-04 20:02 ` [v3 PATCH 3/8] mm: khugepaged: skip DAX vma Yang Shi
2022-05-09 12:46   ` Vlastimil Babka
2022-04-04 20:02 ` [v3 PATCH 4/8] mm: thp: only regular file could be THP eligible Yang Shi
2022-05-09 13:41   ` Vlastimil Babka
2022-04-04 20:02 ` [v3 PATCH 5/8] mm: khugepaged: make khugepaged_enter() void function Yang Shi
2022-05-09 13:46   ` Vlastimil Babka
2022-04-04 20:02 ` [v3 PATCH 6/8] mm: khugepaged: move some khugepaged_* functions to khugepaged.c Yang Shi
2022-05-09 15:31   ` Vlastimil Babka
2022-05-09 23:00     ` Yang Shi
2022-04-04 20:02 ` [v3 PATCH 7/8] mm: khugepaged: introduce khugepaged_enter_vma() helper Yang Shi
2022-05-09 15:39   ` Vlastimil Babka
2022-04-04 20:02 ` [v3 PATCH 8/8] mm: mmap: register suitable readonly file vmas for khugepaged Yang Shi
2022-05-09 15:43   ` Vlastimil Babka
2022-04-05  0:16 ` [v3 PATCH 0/8] Make khugepaged collapse readonly FS THP more consistent Matthew Wilcox
2022-04-05  0:48   ` Yang Shi
2022-04-27 20:58     ` Matthew Wilcox
2022-04-27 22:38       ` Yang Shi
2022-04-27 23:16       ` Yang Shi
2022-05-09 16:05 ` Vlastimil Babka
2022-05-09 20:34   ` Yang Shi
2022-05-10  7:35     ` Vlastimil Babka
2022-05-10 19:25       ` Yang Shi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).