* [PATCH mm-unstable 0/4] mm: fixes for userspace hugepage collapse, v7
@ 2022-07-20 14:05 Zach O'Keefe
  2022-07-20 14:06 ` [PATCH mm-unstable 1/4] mm/khugepaged: Use minimal bits to store num page < HPAGE_PMD_NR Zach O'Keefe
                   ` (3 more replies)
  0 siblings, 4 replies; 14+ messages in thread
From: Zach O'Keefe @ 2022-07-20 14:05 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, Hugh Dickins, Yang Shi, Miaohe Lin, Zach O'Keefe

Hey Andrew,

These are a few requested cleanups for the "mm: userspace hugepage collapse, v7"
series[1] currently in mm-unstable.  Please consider squashing them into the
v7 series.  Note that https://lkml.kernel.org/r/Ys4aTRqWIbjNs1mI@google.com is
still outstanding, and that the series is incomplete until a suitable resolution
is reached there.

Thanks, and apologies for the multiple fixes / adjustments required here.
Zach

[1] https://lore.kernel.org/linux-mm/20220706235936.2197195-1-zokeefe@google.com/


Zach O'Keefe (4):
  mm/khugepaged: Use minimal bits to store num page < HPAGE_PMD_NR
  mm/khugepaged: consistently order cc->is_khugepaged and pte_* checks
  mm/khugepaged: delay computation of hpage boundaries until use
  Revert "mm/madvise: add huge_memory:mm_madvise_collapse tracepoint"

 include/trace/events/huge_memory.h | 22 ----------
 mm/khugepaged.c                    | 64 +++++++++++++++++-------------
 2 files changed, 37 insertions(+), 49 deletions(-)

-- 
2.37.0.170.g444d1eabd0-goog




* [PATCH mm-unstable 1/4] mm/khugepaged: Use minimal bits to store num page < HPAGE_PMD_NR
  2022-07-20 14:05 [PATCH mm-unstable 0/4] mm: fixes for userspace hugepage collapse, v7 Zach O'Keefe
@ 2022-07-20 14:06 ` Zach O'Keefe
  2022-07-20 17:26   ` Yang Shi
  2022-07-21  0:41   ` David Rientjes
  2022-07-20 14:06 ` [PATCH mm-unstable 2/4] mm/khugepaged: consistently order cc->is_khugepaged and pte_* checks Zach O'Keefe
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 14+ messages in thread
From: Zach O'Keefe @ 2022-07-20 14:06 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, Hugh Dickins, Yang Shi, Miaohe Lin, Zach O'Keefe

Minimally, node_load[] entries just need to be able to hold the maximum
value of HPAGE_PMD_NR, which is compile-time defined per-arch based on
PMD_SHIFT and PAGE_SHIFT.  node_load[] is only written via memset() or
via post-increment.  struct collapse_control may be allocated via
kmalloc() in other collapse contexts, and MAX_NUMNODES may be
arbitrarily large.  Select the underlying type of node_load[] based on
HPAGE_PMD_NR to avoid allocating excessive memory for this struct.
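
As a rough, purely illustrative userspace sketch of the sizing argument
above (not kernel code; HPAGE_PMD_ORDER=9 and MAX_NUMNODES=1024 are
assumed example values for a 4K-page, 2M-PMD, CONFIG_NODES_SHIFT=10
build):

    #include <stdint.h>
    #include <stdio.h>

    #define HPAGE_PMD_ORDER 9                      /* assumed: 2M PMD / 4K base page */
    #define HPAGE_PMD_NR    (1 << HPAGE_PMD_ORDER) /* 512 entries per PMD */
    #define MAX_NUMNODES    1024                   /* assumed CONFIG_NODES_SHIFT=10 */

    /* mirrors the #if selection in the patch: u16 suffices while HPAGE_PMD_NR < 65536 */
    #if HPAGE_PMD_ORDER < 16
    typedef uint16_t node_load_t;
    #else
    typedef uint32_t node_load_t;
    #endif

    int main(void)
    {
        printf("int node_load[]:         %zu bytes\n", MAX_NUMNODES * sizeof(int));         /* 4096 */
        printf("node_load_t node_load[]: %zu bytes\n", MAX_NUMNODES * sizeof(node_load_t)); /* 2048 */
        return 0;
    }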

Fixes: 3b07f3bb225a ("mm/khugepaged: add struct collapse_control")
Link: https://lore.kernel.org/linux-mm/Ys2CeIm%2FQmQwWh9a@google.com/
Signed-off-by: Zach O'Keefe <zokeefe@google.com>
---
 mm/khugepaged.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 69990dacde14..ecd28bfeab60 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -92,8 +92,11 @@ struct collapse_control {
 	bool is_khugepaged;
 
 	/* Num pages scanned per node */
-	int node_load[MAX_NUMNODES];
-
+#if HPAGE_PMD_ORDER < 16
+	u16 node_load[MAX_NUMNODES];
+#else
+	u32 node_load[MAX_NUMNODES];
+#endif
 	/* Last target selected in hpage_collapse_find_target_node() */
 	int last_target_node;
 };
-- 
2.37.0.170.g444d1eabd0-goog




* [PATCH mm-unstable 2/4] mm/khugepaged: consistently order cc->is_khugepaged and pte_* checks
  2022-07-20 14:05 [PATCH mm-unstable 0/4] mm: fixes for userspace hugepage collapse, v7 Zach O'Keefe
  2022-07-20 14:06 ` [PATCH mm-unstable 1/4] mm/khugepaged: Use minimal bits to store num page < HPAGE_PMD_NR Zach O'Keefe
@ 2022-07-20 14:06 ` Zach O'Keefe
  2022-07-20 17:27   ` Yang Shi
  2022-07-21  0:42   ` David Rientjes
  2022-07-20 14:06 ` [PATCH mm-unstable 3/4] mm/khugepaged: delay computation of hpage boundaries until use Zach O'Keefe
  2022-07-20 14:06 ` [PATCH mm-unstable 4/4] Revert "mm/madvise: add huge_memory:mm_madvise_collapse tracepoint" Zach O'Keefe
  3 siblings, 2 replies; 14+ messages in thread
From: Zach O'Keefe @ 2022-07-20 14:06 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, Hugh Dickins, Yang Shi, Miaohe Lin, Zach O'Keefe

cc->is_khugepaged is used to predicate the khugepaged-only behavior
of enforcing khugepaged heuristics limited by the sysfs knobs
khugepaged_max_ptes_[none|swap|shared].

In branches where khugepaged_max_ptes_* is checked, consistently check
cc->is_khugepaged first.  Also, local counters (for comparison vs
khugepaged_max_ptes_* limits) were previously incremented in the
comparison expression.  Some of these counters (unmapped) are
additionally used outside of khugepaged_max_ptes_* enforcement, and
all counters are communicated in tracepoints.  Move the correct
accounting of these counters before branching statements to avoid future
errors due to C's short-circuiting evaluation.
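
As a standalone illustration of the short-circuit hazard being avoided
(a minimal userspace sketch; is_khugepaged, limit, and the loop bounds
are made-up stand-ins, not the actual khugepaged code):

    #include <stdbool.h>
    #include <stdio.h>

    int main(void)
    {
        bool is_khugepaged = false;     /* e.g. a MADV_COLLAPSE-style context */
        int limit = 2;
        int counter = 0;
        int i;

        for (i = 0; i < 5; i++) {
            /* increment buried in the condition: never evaluated when the
             * left-hand operand is false, so the count is silently lost */
            if (is_khugepaged && ++counter > limit)
                break;
        }
        printf("increment in condition:  counter=%d\n", counter);  /* prints 0 */

        counter = 0;
        for (i = 0; i < 5; i++) {
            ++counter;                  /* account first, unconditionally */
            if (is_khugepaged && counter > limit)
                break;
        }
        printf("increment hoisted first: counter=%d\n", counter);  /* prints 5 */
        return 0;
    }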

Fixes: 9fab4752a181 ("mm/khugepaged: add flag to predicate khugepaged-only behavior")
Link: https://lore.kernel.org/linux-mm/Ys2qJm6FaOQcxkha@google.com/
Signed-off-by: Zach O'Keefe <zokeefe@google.com>
---
 mm/khugepaged.c | 49 +++++++++++++++++++++++++++++--------------------
 1 file changed, 29 insertions(+), 20 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index ecd28bfeab60..290422577172 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -574,9 +574,10 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
 		pte_t pteval = *_pte;
 		if (pte_none(pteval) || (pte_present(pteval) &&
 				is_zero_pfn(pte_pfn(pteval)))) {
+			++none_or_zero;
 			if (!userfaultfd_armed(vma) &&
-			    (++none_or_zero <= khugepaged_max_ptes_none ||
-			     !cc->is_khugepaged)) {
+			    (!cc->is_khugepaged ||
+			     none_or_zero <= khugepaged_max_ptes_none)) {
 				continue;
 			} else {
 				result = SCAN_EXCEED_NONE_PTE;
@@ -596,11 +597,14 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
 
 		VM_BUG_ON_PAGE(!PageAnon(page), page);
 
-		if (cc->is_khugepaged && page_mapcount(page) > 1 &&
-		    ++shared > khugepaged_max_ptes_shared) {
-			result = SCAN_EXCEED_SHARED_PTE;
-			count_vm_event(THP_SCAN_EXCEED_SHARED_PTE);
-			goto out;
+		if (page_mapcount(page) > 1) {
+			++shared;
+			if (cc->is_khugepaged &&
+			    shared > khugepaged_max_ptes_shared) {
+				result = SCAN_EXCEED_SHARED_PTE;
+				count_vm_event(THP_SCAN_EXCEED_SHARED_PTE);
+				goto out;
+			}
 		}
 
 		if (PageCompound(page)) {
@@ -1170,8 +1174,9 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
 	     _pte++, _address += PAGE_SIZE) {
 		pte_t pteval = *_pte;
 		if (is_swap_pte(pteval)) {
-			if (++unmapped <= khugepaged_max_ptes_swap ||
-			    !cc->is_khugepaged) {
+			++unmapped;
+			if (!cc->is_khugepaged ||
+			    unmapped <= khugepaged_max_ptes_swap) {
 				/*
 				 * Always be strict with uffd-wp
 				 * enabled swap entries.  Please see
@@ -1189,9 +1194,10 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
 			}
 		}
 		if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) {
+			++none_or_zero;
 			if (!userfaultfd_armed(vma) &&
-			    (++none_or_zero <= khugepaged_max_ptes_none ||
-			     !cc->is_khugepaged)) {
+			    (!cc->is_khugepaged ||
+			     none_or_zero <= khugepaged_max_ptes_none)) {
 				continue;
 			} else {
 				result = SCAN_EXCEED_NONE_PTE;
@@ -1221,12 +1227,14 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
 			goto out_unmap;
 		}
 
-		if (cc->is_khugepaged &&
-		    page_mapcount(page) > 1 &&
-		    ++shared > khugepaged_max_ptes_shared) {
-			result = SCAN_EXCEED_SHARED_PTE;
-			count_vm_event(THP_SCAN_EXCEED_SHARED_PTE);
-			goto out_unmap;
+		if (page_mapcount(page) > 1) {
+			++shared;
+			if (cc->is_khugepaged &&
+			    shared > khugepaged_max_ptes_shared) {
+				result = SCAN_EXCEED_SHARED_PTE;
+				count_vm_event(THP_SCAN_EXCEED_SHARED_PTE);
+				goto out_unmap;
+			}
 		}
 
 		page = compound_head(page);
@@ -1961,8 +1969,9 @@ static int khugepaged_scan_file(struct mm_struct *mm, struct file *file,
 			continue;
 
 		if (xa_is_value(page)) {
+			++swap;
 			if (cc->is_khugepaged &&
-			    ++swap > khugepaged_max_ptes_swap) {
+			    swap > khugepaged_max_ptes_swap) {
 				result = SCAN_EXCEED_SWAP_PTE;
 				count_vm_event(THP_SCAN_EXCEED_SWAP_PTE);
 				break;
@@ -2013,8 +2022,8 @@ static int khugepaged_scan_file(struct mm_struct *mm, struct file *file,
 	rcu_read_unlock();
 
 	if (result == SCAN_SUCCEED) {
-		if (present < HPAGE_PMD_NR - khugepaged_max_ptes_none &&
-		    cc->is_khugepaged) {
+		if (cc->is_khugepaged &&
+		    present < HPAGE_PMD_NR - khugepaged_max_ptes_none) {
 			result = SCAN_EXCEED_NONE_PTE;
 			count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
 		} else {
-- 
2.37.0.170.g444d1eabd0-goog




* [PATCH mm-unstable 3/4] mm/khugepaged: delay computation of hpage boundaries until use
  2022-07-20 14:05 [PATCH mm-unstable 0/4] mm: fixes for userspace hugepage collapse, v7 Zach O'Keefe
  2022-07-20 14:06 ` [PATCH mm-unstable 1/4] mm/khugepaged: Use minimal bits to store num page < HPAGE_PMD_NR Zach O'Keefe
  2022-07-20 14:06 ` [PATCH mm-unstable 2/4] mm/khugepaged: consistently order cc->is_khugepaged and pte_* checks Zach O'Keefe
@ 2022-07-20 14:06 ` Zach O'Keefe
  2022-07-20 17:30   ` Yang Shi
  2022-07-21  0:43   ` David Rientjes
  2022-07-20 14:06 ` [PATCH mm-unstable 4/4] Revert "mm/madvise: add huge_memory:mm_madvise_collapse tracepoint" Zach O'Keefe
  3 siblings, 2 replies; 14+ messages in thread
From: Zach O'Keefe @ 2022-07-20 14:06 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, Hugh Dickins, Yang Shi, Miaohe Lin, Zach O'Keefe

Only compute hstart/hend once we've passed all checks that would
cause early return in madvise_collapse().
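
For reference, a small userspace sketch of the hstart/hend arithmetic
that is now computed later (the addresses and the 2M HPAGE_PMD_SIZE are
assumed example values, not taken from the patch):

    #include <stdio.h>

    #define HPAGE_PMD_SIZE (1UL << 21)              /* assumed 2M PMD huge page */
    #define HPAGE_PMD_MASK (~(HPAGE_PMD_SIZE - 1))

    int main(void)
    {
        unsigned long start = 0x7f0000123000UL;     /* hypothetical madvise() range */
        unsigned long end   = 0x7f0000789000UL;

        /* round start up and end down to PMD-aligned boundaries */
        unsigned long hstart = (start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK;
        unsigned long hend   = end & HPAGE_PMD_MASK;

        printf("hstart=%#lx hend=%#lx\n", hstart, hend);  /* 0x7f0000200000 0x7f0000600000 */
        return 0;
    }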

Fixes: c9d968ffd9ba ("mm/madvise: introduce MADV_COLLAPSE sync hugepage collapse")
Signed-off-by: Zach O'Keefe <zokeefe@google.com>
---
 mm/khugepaged.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 290422577172..70e9d9950415 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -2417,9 +2417,6 @@ int madvise_collapse(struct vm_area_struct *vma, struct vm_area_struct **prev,
 	if (!vma->anon_vma || !vma_is_anonymous(vma))
 		return -EINVAL;
 
-	hstart = (start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK;
-	hend = end & HPAGE_PMD_MASK;
-
 	if (!hugepage_vma_check(vma, vma->vm_flags, false, false, false))
 		return -EINVAL;
 
@@ -2432,6 +2429,9 @@ int madvise_collapse(struct vm_area_struct *vma, struct vm_area_struct **prev,
 	mmgrab(mm);
 	lru_add_drain_all();
 
+	hstart = (start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK;
+	hend = end & HPAGE_PMD_MASK;
+
 	for (addr = hstart; addr < hend; addr += HPAGE_PMD_SIZE) {
 		int result = SCAN_FAIL;
 
-- 
2.37.0.170.g444d1eabd0-goog




* [PATCH mm-unstable 4/4] Revert "mm/madvise: add huge_memory:mm_madvise_collapse tracepoint"
  2022-07-20 14:05 [PATCH mm-unstable 0/4] mm: fixes for userspace hugepage collapse, v7 Zach O'Keefe
                   ` (2 preceding siblings ...)
  2022-07-20 14:06 ` [PATCH mm-unstable 3/4] mm/khugepaged: delay computation of hpage boundaries until use Zach O'Keefe
@ 2022-07-20 14:06 ` Zach O'Keefe
  2022-07-20 17:30   ` Yang Shi
  2022-07-21  0:43   ` David Rientjes
  3 siblings, 2 replies; 14+ messages in thread
From: Zach O'Keefe @ 2022-07-20 14:06 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, Hugh Dickins, Yang Shi, Miaohe Lin, Zach O'Keefe

In the anonymous collapse path, huge_memory:mm_khugepaged_scan_pmd can
be used to get roughly the same information as this proposed tracepoint.
Remove it.

Fixes: 0fff8a0de881 ("mm/madvise: add huge_memory:mm_madvise_collapse tracepoint")
Link: https://lore.kernel.org/linux-mm/Ys2vzYyVFmljt+B8@google.com/
Signed-off-by: Zach O'Keefe <zokeefe@google.com>
---
 include/trace/events/huge_memory.h | 22 ----------------------
 mm/khugepaged.c                    |  2 --
 2 files changed, 24 deletions(-)

diff --git a/include/trace/events/huge_memory.h b/include/trace/events/huge_memory.h
index 38d339ffdb16..55392bf30a03 100644
--- a/include/trace/events/huge_memory.h
+++ b/include/trace/events/huge_memory.h
@@ -167,27 +167,5 @@ TRACE_EVENT(mm_collapse_huge_page_swapin,
 		__entry->ret)
 );
 
-TRACE_EVENT(mm_madvise_collapse,
-
-	TP_PROTO(struct mm_struct *mm, unsigned long addr, int result),
-
-	TP_ARGS(mm, addr, result),
-
-	TP_STRUCT__entry(__field(struct mm_struct *, mm)
-			 __field(unsigned long, addr)
-			 __field(int, result)
-	),
-
-	TP_fast_assign(__entry->mm = mm;
-		       __entry->addr = addr;
-		       __entry->result = result;
-	),
-
-	TP_printk("mm=%p addr=%#lx result=%s",
-		  __entry->mm,
-		  __entry->addr,
-		  __print_symbolic(__entry->result, SCAN_STATUS))
-);
-
 #endif /* __HUGE_MEMORY_H */
 #include <trace/define_trace.h>
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 70e9d9950415..28cb8429dad4 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -2452,8 +2452,6 @@ int madvise_collapse(struct vm_area_struct *vma, struct vm_area_struct **prev,
 		if (!mmap_locked)
 			*prev = NULL;  /* Tell caller we dropped mmap_lock */
 
-		trace_mm_madvise_collapse(mm, addr, result);
-
 		switch (result) {
 		case SCAN_SUCCEED:
 		case SCAN_PMD_MAPPED:
-- 
2.37.0.170.g444d1eabd0-goog




* Re: [PATCH mm-unstable 1/4] mm/khugepaged: Use minimal bits to store num page < HPAGE_PMD_NR
  2022-07-20 14:06 ` [PATCH mm-unstable 1/4] mm/khugepaged: Use minimal bits to store num page < HPAGE_PMD_NR Zach O'Keefe
@ 2022-07-20 17:26   ` Yang Shi
  2022-07-21  0:41   ` David Rientjes
  1 sibling, 0 replies; 14+ messages in thread
From: Yang Shi @ 2022-07-20 17:26 UTC (permalink / raw)
  To: Zach O'Keefe; +Cc: Andrew Morton, Linux MM, Hugh Dickins, Miaohe Lin

On Wed, Jul 20, 2022 at 7:06 AM Zach O'Keefe <zokeefe@google.com> wrote:
>
> Minimally, node_load[] entries just need to be able to hold the maximum
> value of HPAGE_PMD_NR, which is compile-time defined per-arch based on
> PMD_SHIFT and PAGE_SHIFT.  node_load[] is only written via memset() or
> via post-increment.  struct collapse_control may be allocated via
> kmalloc() in other collapse contexts, and MAX_NUMNODES may be
> arbitrarily large.  Select the underlying type of node_load[] based on
> HPAGE_PMD_NR to avoid allocating excessive memory for this struct.
>
> Fixes: 3b07f3bb225a ("mm/khugepaged: add struct collapse_control")
> Link: https://lore.kernel.org/linux-mm/Ys2CeIm%2FQmQwWh9a@google.com/
> Signed-off-by: Zach O'Keefe <zokeefe@google.com>

Reviewed-by: Yang Shi <shy828301@gmail.com>

> ---
>  mm/khugepaged.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 69990dacde14..ecd28bfeab60 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -92,8 +92,11 @@ struct collapse_control {
>         bool is_khugepaged;
>
>         /* Num pages scanned per node */
> -       int node_load[MAX_NUMNODES];
> -
> +#if HPAGE_PMD_ORDER < 16
> +       u16 node_load[MAX_NUMNODES];
> +#else
> +       u32 node_load[MAX_NUMNODES];
> +#endif
>         /* Last target selected in hpage_collapse_find_target_node() */
>         int last_target_node;
>  };
> --
> 2.37.0.170.g444d1eabd0-goog
>



* Re: [PATCH mm-unstable 2/4] mm/khugepaged: consistently order cc->is_khugepaged and pte_* checks
  2022-07-20 14:06 ` [PATCH mm-unstable 2/4] mm/khugepaged: consistently order cc->is_khugepaged and pte_* checks Zach O'Keefe
@ 2022-07-20 17:27   ` Yang Shi
  2022-07-20 19:09     ` Zach O'Keefe
  2022-07-21  0:42   ` David Rientjes
  1 sibling, 1 reply; 14+ messages in thread
From: Yang Shi @ 2022-07-20 17:27 UTC (permalink / raw)
  To: Zach O'Keefe; +Cc: Andrew Morton, Linux MM, Hugh Dickins, Miaohe Lin

On Wed, Jul 20, 2022 at 7:06 AM Zach O'Keefe <zokeefe@google.com> wrote:
>
> cc->is_khugepaged is used to predicate the khugepaged-only behavior
> of enforcing khugepaged heuristics limited by the sysfs knobs
> khugepaged_max_ptes_[none|swap|shared].
>
> In branches where khugepaged_max_ptes_* is checked, consistently check
> cc->is_khugepaged first.  Also, local counters (for comparison vs
> khugepaged_max_ptes_* limits) were previously incremented in the
> comparison expression.  Some of these counters (unmapped) are
> additionally used outside of khugepaged_max_ptes_* enforcement, and
> all counters are communicated in tracepoints.  Move the correct
> accounting of these counters before branching statements to avoid future
> errors due to C's short-circuiting evaluation.

Yeah, it is safer to not depend on the order of branch statements to
inc the counter.

Reviewed-by: Yang Shi <shy828301@gmail.com>

>
> Fixes: 9fab4752a181 ("mm/khugepaged: add flag to predicate khugepaged-only behavior")
> Link: https://lore.kernel.org/linux-mm/Ys2qJm6FaOQcxkha@google.com/
> Signed-off-by: Zach O'Keefe <zokeefe@google.com>
> ---
>  mm/khugepaged.c | 49 +++++++++++++++++++++++++++++--------------------
>  1 file changed, 29 insertions(+), 20 deletions(-)
>
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index ecd28bfeab60..290422577172 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -574,9 +574,10 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
>                 pte_t pteval = *_pte;
>                 if (pte_none(pteval) || (pte_present(pteval) &&
>                                 is_zero_pfn(pte_pfn(pteval)))) {
> +                       ++none_or_zero;
>                         if (!userfaultfd_armed(vma) &&
> -                           (++none_or_zero <= khugepaged_max_ptes_none ||
> -                            !cc->is_khugepaged)) {
> +                           (!cc->is_khugepaged ||
> +                            none_or_zero <= khugepaged_max_ptes_none)) {
>                                 continue;
>                         } else {
>                                 result = SCAN_EXCEED_NONE_PTE;
> @@ -596,11 +597,14 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
>
>                 VM_BUG_ON_PAGE(!PageAnon(page), page);
>
> -               if (cc->is_khugepaged && page_mapcount(page) > 1 &&
> -                   ++shared > khugepaged_max_ptes_shared) {
> -                       result = SCAN_EXCEED_SHARED_PTE;
> -                       count_vm_event(THP_SCAN_EXCEED_SHARED_PTE);
> -                       goto out;
> +               if (page_mapcount(page) > 1) {
> +                       ++shared;
> +                       if (cc->is_khugepaged &&
> +                           shared > khugepaged_max_ptes_shared) {
> +                               result = SCAN_EXCEED_SHARED_PTE;
> +                               count_vm_event(THP_SCAN_EXCEED_SHARED_PTE);
> +                               goto out;
> +                       }
>                 }
>
>                 if (PageCompound(page)) {
> @@ -1170,8 +1174,9 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
>              _pte++, _address += PAGE_SIZE) {
>                 pte_t pteval = *_pte;
>                 if (is_swap_pte(pteval)) {
> -                       if (++unmapped <= khugepaged_max_ptes_swap ||
> -                           !cc->is_khugepaged) {
> +                       ++unmapped;
> +                       if (!cc->is_khugepaged ||
> +                           unmapped <= khugepaged_max_ptes_swap) {
>                                 /*
>                                  * Always be strict with uffd-wp
>                                  * enabled swap entries.  Please see
> @@ -1189,9 +1194,10 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
>                         }
>                 }
>                 if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) {
> +                       ++none_or_zero;
>                         if (!userfaultfd_armed(vma) &&
> -                           (++none_or_zero <= khugepaged_max_ptes_none ||
> -                            !cc->is_khugepaged)) {
> +                           (!cc->is_khugepaged ||
> +                            none_or_zero <= khugepaged_max_ptes_none)) {
>                                 continue;
>                         } else {
>                                 result = SCAN_EXCEED_NONE_PTE;
> @@ -1221,12 +1227,14 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
>                         goto out_unmap;
>                 }
>
> -               if (cc->is_khugepaged &&
> -                   page_mapcount(page) > 1 &&
> -                   ++shared > khugepaged_max_ptes_shared) {
> -                       result = SCAN_EXCEED_SHARED_PTE;
> -                       count_vm_event(THP_SCAN_EXCEED_SHARED_PTE);
> -                       goto out_unmap;
> +               if (page_mapcount(page) > 1) {
> +                       ++shared;
> +                       if (cc->is_khugepaged &&
> +                           shared > khugepaged_max_ptes_shared) {
> +                               result = SCAN_EXCEED_SHARED_PTE;
> +                               count_vm_event(THP_SCAN_EXCEED_SHARED_PTE);
> +                               goto out_unmap;
> +                       }
>                 }
>
>                 page = compound_head(page);
> @@ -1961,8 +1969,9 @@ static int khugepaged_scan_file(struct mm_struct *mm, struct file *file,
>                         continue;
>
>                 if (xa_is_value(page)) {
> +                       ++swap;
>                         if (cc->is_khugepaged &&
> -                           ++swap > khugepaged_max_ptes_swap) {
> +                           swap > khugepaged_max_ptes_swap) {
>                                 result = SCAN_EXCEED_SWAP_PTE;
>                                 count_vm_event(THP_SCAN_EXCEED_SWAP_PTE);
>                                 break;
> @@ -2013,8 +2022,8 @@ static int khugepaged_scan_file(struct mm_struct *mm, struct file *file,
>         rcu_read_unlock();
>
>         if (result == SCAN_SUCCEED) {
> -               if (present < HPAGE_PMD_NR - khugepaged_max_ptes_none &&
> -                   cc->is_khugepaged) {
> +               if (cc->is_khugepaged &&
> +                   present < HPAGE_PMD_NR - khugepaged_max_ptes_none) {
>                         result = SCAN_EXCEED_NONE_PTE;
>                         count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
>                 } else {
> --
> 2.37.0.170.g444d1eabd0-goog
>



* Re: [PATCH mm-unstable 3/4] mm/khugepaged: delay computation of hpage boundaries until use
  2022-07-20 14:06 ` [PATCH mm-unstable 3/4] mm/khugepaged: delay computation of hpage boundaries until use Zach O'Keefe
@ 2022-07-20 17:30   ` Yang Shi
  2022-07-21  0:43   ` David Rientjes
  1 sibling, 0 replies; 14+ messages in thread
From: Yang Shi @ 2022-07-20 17:30 UTC (permalink / raw)
  To: Zach O'Keefe; +Cc: Andrew Morton, Linux MM, Hugh Dickins, Miaohe Lin

On Wed, Jul 20, 2022 at 7:06 AM Zach O'Keefe <zokeefe@google.com> wrote:
>
> Only compute hstart/hend once we've passed all checks that would
> cause early return in madvise_collapse().
>
> Fixes: c9d968ffd9ba ("mm/madvise: introduce MADV_COLLAPSE sync hugepage collapse")
> Signed-off-by: Zach O'Keefe <zokeefe@google.com>

Reviewed-by: Yang Shi <shy828301@gmail.com>


> ---
>  mm/khugepaged.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 290422577172..70e9d9950415 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -2417,9 +2417,6 @@ int madvise_collapse(struct vm_area_struct *vma, struct vm_area_struct **prev,
>         if (!vma->anon_vma || !vma_is_anonymous(vma))
>                 return -EINVAL;
>
> -       hstart = (start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK;
> -       hend = end & HPAGE_PMD_MASK;
> -
>         if (!hugepage_vma_check(vma, vma->vm_flags, false, false, false))
>                 return -EINVAL;
>
> @@ -2432,6 +2429,9 @@ int madvise_collapse(struct vm_area_struct *vma, struct vm_area_struct **prev,
>         mmgrab(mm);
>         lru_add_drain_all();
>
> +       hstart = (start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK;
> +       hend = end & HPAGE_PMD_MASK;
> +
>         for (addr = hstart; addr < hend; addr += HPAGE_PMD_SIZE) {
>                 int result = SCAN_FAIL;
>
> --
> 2.37.0.170.g444d1eabd0-goog
>



* Re: [PATCH mm-unstable 4/4] Revert "mm/madvise: add huge_memory:mm_madvise_collapse tracepoint"
  2022-07-20 14:06 ` [PATCH mm-unstable 4/4] Revert "mm/madvise: add huge_memory:mm_madvise_collapse tracepoint" Zach O'Keefe
@ 2022-07-20 17:30   ` Yang Shi
  2022-07-21  0:43   ` David Rientjes
  1 sibling, 0 replies; 14+ messages in thread
From: Yang Shi @ 2022-07-20 17:30 UTC (permalink / raw)
  To: Zach O'Keefe; +Cc: Andrew Morton, Linux MM, Hugh Dickins, Miaohe Lin

On Wed, Jul 20, 2022 at 7:06 AM Zach O'Keefe <zokeefe@google.com> wrote:
>
> In the anonymous collapse path, huge_memory:mm_khugepaged_scan_pmd can
> be used to get roughly the same information as this proposed tracepoint.
> Remove it.
>
> Fixes: 0fff8a0de881 ("mm/madvise: add huge_memory:mm_madvise_collapse tracepoint")
> Link: https://lore.kernel.org/linux-mm/Ys2vzYyVFmljt+B8@google.com/
> Signed-off-by: Zach O'Keefe <zokeefe@google.com>

Reviewed-by: Yang Shi <shy828301@gmail.com>

> ---
>  include/trace/events/huge_memory.h | 22 ----------------------
>  mm/khugepaged.c                    |  2 --
>  2 files changed, 24 deletions(-)
>
> diff --git a/include/trace/events/huge_memory.h b/include/trace/events/huge_memory.h
> index 38d339ffdb16..55392bf30a03 100644
> --- a/include/trace/events/huge_memory.h
> +++ b/include/trace/events/huge_memory.h
> @@ -167,27 +167,5 @@ TRACE_EVENT(mm_collapse_huge_page_swapin,
>                 __entry->ret)
>  );
>
> -TRACE_EVENT(mm_madvise_collapse,
> -
> -       TP_PROTO(struct mm_struct *mm, unsigned long addr, int result),
> -
> -       TP_ARGS(mm, addr, result),
> -
> -       TP_STRUCT__entry(__field(struct mm_struct *, mm)
> -                        __field(unsigned long, addr)
> -                        __field(int, result)
> -       ),
> -
> -       TP_fast_assign(__entry->mm = mm;
> -                      __entry->addr = addr;
> -                      __entry->result = result;
> -       ),
> -
> -       TP_printk("mm=%p addr=%#lx result=%s",
> -                 __entry->mm,
> -                 __entry->addr,
> -                 __print_symbolic(__entry->result, SCAN_STATUS))
> -);
> -
>  #endif /* __HUGE_MEMORY_H */
>  #include <trace/define_trace.h>
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 70e9d9950415..28cb8429dad4 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -2452,8 +2452,6 @@ int madvise_collapse(struct vm_area_struct *vma, struct vm_area_struct **prev,
>                 if (!mmap_locked)
>                         *prev = NULL;  /* Tell caller we dropped mmap_lock */
>
> -               trace_mm_madvise_collapse(mm, addr, result);
> -
>                 switch (result) {
>                 case SCAN_SUCCEED:
>                 case SCAN_PMD_MAPPED:
> --
> 2.37.0.170.g444d1eabd0-goog
>



* Re: [PATCH mm-unstable 2/4] mm/khugepaged: consistently order cc->is_khugepaged and pte_* checks
  2022-07-20 17:27   ` Yang Shi
@ 2022-07-20 19:09     ` Zach O'Keefe
  0 siblings, 0 replies; 14+ messages in thread
From: Zach O'Keefe @ 2022-07-20 19:09 UTC (permalink / raw)
  To: Yang Shi; +Cc: Andrew Morton, Linux MM, Hugh Dickins, Miaohe Lin

On Jul 20 10:27, Yang Shi wrote:
> On Wed, Jul 20, 2022 at 7:06 AM Zach O'Keefe <zokeefe@google.com> wrote:
> >
> > cc->is_khugepaged is used to predicate the khugepaged-only behavior
> > of enforcing khugepaged heuristics limited by the sysfs knobs
> > khugepaged_max_ptes_[none|swap|shared].
> >
> > In branches where khugepaged_max_ptes_* is checked, consistently check
> > cc->is_khugepaged first.  Also, local counters (for comparison vs
> > khugepaged_max_ptes_* limits) were previously incremented in the
> > comparison expression.  Some of these counters (unmapped) are
> > additionally used outside of khugepaged_max_ptes_* enforcement, and
> > all counters are communicated in tracepoints.  Move the correct
> > accounting of these counters before branching statements to avoid future
> > errors due to C's short-circuiting evaluation.
> 
> Yeah, it is safer to not depend on the order of branch statements to
> inc the counter.
> 

Only cost me a couple hours when I got bit by this after naively moving checks
around :) Hopefully I can save the next person.

Also, thanks for the reviews, Yang!

> Reviewed-by: Yang Shi <shy828301@gmail.com>
> 
> >
> > Fixes: 9fab4752a181 ("mm/khugepaged: add flag to predicate khugepaged-only behavior")
> > Link: https://lore.kernel.org/linux-mm/Ys2qJm6FaOQcxkha@google.com/
> > Signed-off-by: Zach O'Keefe <zokeefe@google.com>
> > ---
> >  mm/khugepaged.c | 49 +++++++++++++++++++++++++++++--------------------
> >  1 file changed, 29 insertions(+), 20 deletions(-)
> >
> > diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> > index ecd28bfeab60..290422577172 100644
> > --- a/mm/khugepaged.c
> > +++ b/mm/khugepaged.c
> > @@ -574,9 +574,10 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
> >                 pte_t pteval = *_pte;
> >                 if (pte_none(pteval) || (pte_present(pteval) &&
> >                                 is_zero_pfn(pte_pfn(pteval)))) {
> > +                       ++none_or_zero;
> >                         if (!userfaultfd_armed(vma) &&
> > -                           (++none_or_zero <= khugepaged_max_ptes_none ||
> > -                            !cc->is_khugepaged)) {
> > +                           (!cc->is_khugepaged ||
> > +                            none_or_zero <= khugepaged_max_ptes_none)) {
> >                                 continue;
> >                         } else {
> >                                 result = SCAN_EXCEED_NONE_PTE;
> > @@ -596,11 +597,14 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
> >
> >                 VM_BUG_ON_PAGE(!PageAnon(page), page);
> >
> > -               if (cc->is_khugepaged && page_mapcount(page) > 1 &&
> > -                   ++shared > khugepaged_max_ptes_shared) {
> > -                       result = SCAN_EXCEED_SHARED_PTE;
> > -                       count_vm_event(THP_SCAN_EXCEED_SHARED_PTE);
> > -                       goto out;
> > +               if (page_mapcount(page) > 1) {
> > +                       ++shared;
> > +                       if (cc->is_khugepaged &&
> > +                           shared > khugepaged_max_ptes_shared) {
> > +                               result = SCAN_EXCEED_SHARED_PTE;
> > +                               count_vm_event(THP_SCAN_EXCEED_SHARED_PTE);
> > +                               goto out;
> > +                       }
> >                 }
> >
> >                 if (PageCompound(page)) {
> > @@ -1170,8 +1174,9 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
> >              _pte++, _address += PAGE_SIZE) {
> >                 pte_t pteval = *_pte;
> >                 if (is_swap_pte(pteval)) {
> > -                       if (++unmapped <= khugepaged_max_ptes_swap ||
> > -                           !cc->is_khugepaged) {
> > +                       ++unmapped;
> > +                       if (!cc->is_khugepaged ||
> > +                           unmapped <= khugepaged_max_ptes_swap) {
> >                                 /*
> >                                  * Always be strict with uffd-wp
> >                                  * enabled swap entries.  Please see
> > @@ -1189,9 +1194,10 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
> >                         }
> >                 }
> >                 if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) {
> > +                       ++none_or_zero;
> >                         if (!userfaultfd_armed(vma) &&
> > -                           (++none_or_zero <= khugepaged_max_ptes_none ||
> > -                            !cc->is_khugepaged)) {
> > +                           (!cc->is_khugepaged ||
> > +                            none_or_zero <= khugepaged_max_ptes_none)) {
> >                                 continue;
> >                         } else {
> >                                 result = SCAN_EXCEED_NONE_PTE;
> > @@ -1221,12 +1227,14 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
> >                         goto out_unmap;
> >                 }
> >
> > -               if (cc->is_khugepaged &&
> > -                   page_mapcount(page) > 1 &&
> > -                   ++shared > khugepaged_max_ptes_shared) {
> > -                       result = SCAN_EXCEED_SHARED_PTE;
> > -                       count_vm_event(THP_SCAN_EXCEED_SHARED_PTE);
> > -                       goto out_unmap;
> > +               if (page_mapcount(page) > 1) {
> > +                       ++shared;
> > +                       if (cc->is_khugepaged &&
> > +                           shared > khugepaged_max_ptes_shared) {
> > +                               result = SCAN_EXCEED_SHARED_PTE;
> > +                               count_vm_event(THP_SCAN_EXCEED_SHARED_PTE);
> > +                               goto out_unmap;
> > +                       }
> >                 }
> >
> >                 page = compound_head(page);
> > @@ -1961,8 +1969,9 @@ static int khugepaged_scan_file(struct mm_struct *mm, struct file *file,
> >                         continue;
> >
> >                 if (xa_is_value(page)) {
> > +                       ++swap;
> >                         if (cc->is_khugepaged &&
> > -                           ++swap > khugepaged_max_ptes_swap) {
> > +                           swap > khugepaged_max_ptes_swap) {
> >                                 result = SCAN_EXCEED_SWAP_PTE;
> >                                 count_vm_event(THP_SCAN_EXCEED_SWAP_PTE);
> >                                 break;
> > @@ -2013,8 +2022,8 @@ static int khugepaged_scan_file(struct mm_struct *mm, struct file *file,
> >         rcu_read_unlock();
> >
> >         if (result == SCAN_SUCCEED) {
> > -               if (present < HPAGE_PMD_NR - khugepaged_max_ptes_none &&
> > -                   cc->is_khugepaged) {
> > +               if (cc->is_khugepaged &&
> > +                   present < HPAGE_PMD_NR - khugepaged_max_ptes_none) {
> >                         result = SCAN_EXCEED_NONE_PTE;
> >                         count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
> >                 } else {
> > --
> > 2.37.0.170.g444d1eabd0-goog
> >



* Re: [PATCH mm-unstable 1/4] mm/khugepaged: Use minimal bits to store num page < HPAGE_PMD_NR
  2022-07-20 14:06 ` [PATCH mm-unstable 1/4] mm/khugepaged: Use minimal bits to store num page < HPAGE_PMD_NR Zach O'Keefe
  2022-07-20 17:26   ` Yang Shi
@ 2022-07-21  0:41   ` David Rientjes
  1 sibling, 0 replies; 14+ messages in thread
From: David Rientjes @ 2022-07-21  0:41 UTC (permalink / raw)
  To: Zach O'Keefe
  Cc: Andrew Morton, linux-mm, Hugh Dickins, Yang Shi, Miaohe Lin

On Wed, 20 Jul 2022, Zach O'Keefe wrote:

> Minimally, node_load[] entries just need to be able to hold the maximum
> value of HPAGE_PMD_NR, which is compile-time defined per-arch based on
> PMD_SHIFT and PAGE_SHIFT.  node_load[] is only written via memset() or
> via post-increment.  struct collapse_control may be allocated via
> kmalloc() in other collapse contexts, and MAX_NUMNODES may be
> arbitrarily large.  Select the underlying type of node_load[] based on
> HPAGE_PMD_NR to avoid allocating excessive memory for this struct.
> 
> Fixes: 3b07f3bb225a ("mm/khugepaged: add struct collapse_control")
> Link: https://lore.kernel.org/linux-mm/Ys2CeIm%2FQmQwWh9a@google.com/
> Signed-off-by: Zach O'Keefe <zokeefe@google.com>

Acked-by: David Rientjes <rientjes@google.com>



* Re: [PATCH mm-unstable 2/4] mm/khugepaged: consistently order cc->is_khugepaged and pte_* checks
  2022-07-20 14:06 ` [PATCH mm-unstable 2/4] mm/khugepaged: consistently order cc->is_khugepaged and pte_* checks Zach O'Keefe
  2022-07-20 17:27   ` Yang Shi
@ 2022-07-21  0:42   ` David Rientjes
  1 sibling, 0 replies; 14+ messages in thread
From: David Rientjes @ 2022-07-21  0:42 UTC (permalink / raw)
  To: Zach O'Keefe
  Cc: Andrew Morton, linux-mm, Hugh Dickins, Yang Shi, Miaohe Lin

On Wed, 20 Jul 2022, Zach O'Keefe wrote:

> cc->is_khugepaged is used to predicate the khugepaged-only behavior
> of enforcing khugepaged heuristics limited by the sysfs knobs
> khugepaged_max_ptes_[none|swap|shared].
> 
> In branches where khugepaged_max_ptes_* is checked, consistently check
> cc->is_khugepaged first.  Also, local counters (for comparison vs
> khugepaged_max_ptes_* limits) were previously incremented in the
> comparison expression.  Some of these counters (unmapped) are
> additionally used outside of khugepaged_max_ptes_* enforcement, and
> all counters are communicated in tracepoints.  Move the correct
> accounting of these counters before branching statements to avoid future
> errors due to C's short-circuiting evaluation.
> 
> Fixes: 9fab4752a181 ("mm/khugepaged: add flag to predicate khugepaged-only behavior")
> Link: https://lore.kernel.org/linux-mm/Ys2qJm6FaOQcxkha@google.com/
> Signed-off-by: Zach O'Keefe <zokeefe@google.com>

Acked-by: David Rientjes <rientjes@google.com>



* Re: [PATCH mm-unstable 3/4] mm/khugepaged: delay computation of hpage boundaries until use
  2022-07-20 14:06 ` [PATCH mm-unstable 3/4] mm/khugepaged: delay computation of hpage boundaries until use Zach O'Keefe
  2022-07-20 17:30   ` Yang Shi
@ 2022-07-21  0:43   ` David Rientjes
  1 sibling, 0 replies; 14+ messages in thread
From: David Rientjes @ 2022-07-21  0:43 UTC (permalink / raw)
  To: Zach O'Keefe
  Cc: Andrew Morton, linux-mm, Hugh Dickins, Yang Shi, Miaohe Lin

On Wed, 20 Jul 2022, Zach O'Keefe wrote:

> Only compute hstart/hend once we've passed all checks that would
> cause early return in madvise_collapse().
> 
> Fixes: c9d968ffd9ba ("mm/madvise: introduce MADV_COLLAPSE sync hugepage collapse")
> Signed-off-by: Zach O'Keefe <zokeefe@google.com>

Acked-by: David Rientjes <rientjes@google.com>



* Re: [PATCH mm-unstable 4/4] Revert "mm/madvise: add huge_memory:mm_madvise_collapse tracepoint"
  2022-07-20 14:06 ` [PATCH mm-unstable 4/4] Revert "mm/madvise: add huge_memory:mm_madvise_collapse tracepoint" Zach O'Keefe
  2022-07-20 17:30   ` Yang Shi
@ 2022-07-21  0:43   ` David Rientjes
  1 sibling, 0 replies; 14+ messages in thread
From: David Rientjes @ 2022-07-21  0:43 UTC (permalink / raw)
  To: Zach O'Keefe
  Cc: Andrew Morton, linux-mm, Hugh Dickins, Yang Shi, Miaohe Lin

On Wed, 20 Jul 2022, Zach O'Keefe wrote:

> In the anonymous collapse path, huge_memory:mm_khugepaged_scan_pmd can
> be used to get roughly the same information as this proposed tracepoint.
> Remove it.
> 
> Fixes: 0fff8a0de881 ("mm/madvise: add huge_memory:mm_madvise_collapse tracepoint")
> Link: https://lore.kernel.org/linux-mm/Ys2vzYyVFmljt+B8@google.com/
> Signed-off-by: Zach O'Keefe <zokeefe@google.com>

Acked-by: David Rientjes <rientjes@google.com>



end of thread

Thread overview: 14+ messages
2022-07-20 14:05 [PATCH mm-unstable 0/4] mm: fixes for userspace hugepage collapse, v7 Zach O'Keefe
2022-07-20 14:06 ` [PATCH mm-unstable 1/4] mm/khugepaged: Use minimal bits to store num page < HPAGE_PMD_NR Zach O'Keefe
2022-07-20 17:26   ` Yang Shi
2022-07-21  0:41   ` David Rientjes
2022-07-20 14:06 ` [PATCH mm-unstable 2/4] mm/khugepaged: consistently order cc->is_khugepaged and pte_* checks Zach O'Keefe
2022-07-20 17:27   ` Yang Shi
2022-07-20 19:09     ` Zach O'Keefe
2022-07-21  0:42   ` David Rientjes
2022-07-20 14:06 ` [PATCH mm-unstable 3/4] mm/khugepaged: delay computation of hpage boundaries until use Zach O'Keefe
2022-07-20 17:30   ` Yang Shi
2022-07-21  0:43   ` David Rientjes
2022-07-20 14:06 ` [PATCH mm-unstable 4/4] Revert "mm/madvise: add huge_memory:mm_madvise_collapse tracepoint" Zach O'Keefe
2022-07-20 17:30   ` Yang Shi
2022-07-21  0:43   ` David Rientjes
