* [PATCH mm-unstable 0/4] mm: fixes for userspace hugepage collapse, v7
@ 2022-07-20 14:05 Zach O'Keefe
2022-07-20 14:06 ` [PATCH mm-unstable 1/4] mm/khugepaged: Use minimal bits to store num page < HPAGE_PMD_NR Zach O'Keefe
` (3 more replies)
0 siblings, 4 replies; 14+ messages in thread
From: Zach O'Keefe @ 2022-07-20 14:05 UTC (permalink / raw)
To: Andrew Morton
Cc: linux-mm, Hugh Dickins, Yang Shi, Miaohe Lin, Zach O'Keefe
Hey Andrew,
These are a few requested cleanups for the "mm: userspace hugepage collapse, v7"
series[1] currently in mm-unstable. Please consider squashing them into the
v7 series. Note that https://lkml.kernel.org/r/Ys4aTRqWIbjNs1mI@google.com is
still outstanding, and that the series is incomplete until a suitable resolution
is reached there.
Thanks, and apologies for the multiple fixes / adjustments required here.
Zach
[1] https://lore.kernel.org/linux-mm/20220706235936.2197195-1-zokeefe@google.com/
Zach O'Keefe (4):
mm/khugepaged: Use minimal bits to store num page < HPAGE_PMD_NR
mm/khugepaged: consistently order cc->is_khugepaged and pte_* checks
mm/khugepaged: delay computation of hpage boundaries until use
Revert "mm/madvise: add huge_memory:mm_madvise_collapse tracepoint"
include/trace/events/huge_memory.h | 22 ----------
mm/khugepaged.c | 64 +++++++++++++++++-------------
2 files changed, 37 insertions(+), 49 deletions(-)
--
2.37.0.170.g444d1eabd0-goog
* [PATCH mm-unstable 1/4] mm/khugepaged: Use minimal bits to store num page < HPAGE_PMD_NR
2022-07-20 14:05 [PATCH mm-unstable 0/4] mm: fixes for userspace hugepage collapse, v7 Zach O'Keefe
@ 2022-07-20 14:06 ` Zach O'Keefe
2022-07-20 17:26 ` Yang Shi
2022-07-21 0:41 ` David Rientjes
2022-07-20 14:06 ` [PATCH mm-unstable 2/4] mm/khugepaged: consistently order cc->is_khugepaged and pte_* checks Zach O'Keefe
` (2 subsequent siblings)
3 siblings, 2 replies; 14+ messages in thread
From: Zach O'Keefe @ 2022-07-20 14:06 UTC (permalink / raw)
To: Andrew Morton
Cc: linux-mm, Hugh Dickins, Yang Shi, Miaohe Lin, Zach O'Keefe
Minimally, node_load[] entries just need to be able to hold the maximum
value of HPAGE_PMD_NR, which is compile-time defined per-arch based on
PMD_SHIFT and PAGE_SHIFT. node_load[] is only written either via memset()
or via post-increment. struct collapse_control may be allocated
via kmalloc() in other collapse contexts, and MAX_NUMNODES may be
arbitrarily large. #define the underlying type of node_load[] based on
HPAGE_PMD_NR to avoid allocating excessive memory for this struct.
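(Illustrative only: a minimal userspace sketch of the sizing math, not
kernel code. The x86_64 4K-page values and the MAX_NUMNODES of 1024
below are assumptions for the example; the HPAGE_PMD_ORDER < 16 cutoff
works because it guarantees HPAGE_PMD_NR <= 1 << 15, which fits a u16.)
  #include <stdio.h>
  /* Assumed example values: x86_64 with 4K pages, CONFIG_NODES_SHIFT=10 */
  #define EX_PAGE_SHIFT       12
  #define EX_PMD_SHIFT        21
  #define EX_HPAGE_PMD_ORDER  (EX_PMD_SHIFT - EX_PAGE_SHIFT)    /* 9 */
  #define EX_HPAGE_PMD_NR     (1 << EX_HPAGE_PMD_ORDER)         /* 512 */
  #define EX_MAX_NUMNODES     1024
  int main(void)
  {
          /* Per-node counters top out at EX_HPAGE_PMD_NR, well within a u16 */
          printf("max per-node count: %d\n", EX_HPAGE_PMD_NR);
          printf("int node_load[]: %zu bytes\n",
                 sizeof(int) * EX_MAX_NUMNODES);                /* 4096 */
          printf("u16 node_load[]: %zu bytes\n",
                 sizeof(unsigned short) * EX_MAX_NUMNODES);     /* 2048 */
          return 0;
  }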
Fixes: 3b07f3bb225a ("mm/khugepaged: add struct collapse_control")
Link: https://lore.kernel.org/linux-mm/Ys2CeIm%2FQmQwWh9a@google.com/
Signed-off-by: Zach O'Keefe <zokeefe@google.com>
---
mm/khugepaged.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 69990dacde14..ecd28bfeab60 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -92,8 +92,11 @@ struct collapse_control {
bool is_khugepaged;
/* Num pages scanned per node */
- int node_load[MAX_NUMNODES];
-
+#if HPAGE_PMD_ORDER < 16
+ u16 node_load[MAX_NUMNODES];
+#else
+ u32 node_load[MAX_NUMNODES];
+#endif
/* Last target selected in hpage_collapse_find_target_node() */
int last_target_node;
};
--
2.37.0.170.g444d1eabd0-goog
* [PATCH mm-unstable 2/4] mm/khugepaged: consistently order cc->is_khugepaged and pte_* checks
2022-07-20 14:05 [PATCH mm-unstable 0/4] mm: fixes for userspace hugepage collapse, v7 Zach O'Keefe
2022-07-20 14:06 ` [PATCH mm-unstable 1/4] mm/khugepaged: Use minimal bits to store num page < HPAGE_PMD_NR Zach O'Keefe
@ 2022-07-20 14:06 ` Zach O'Keefe
2022-07-20 17:27 ` Yang Shi
2022-07-21 0:42 ` David Rientjes
2022-07-20 14:06 ` [PATCH mm-unstable 3/4] mm/khugepaged: delay computation of hpage boundaries until use Zach O'Keefe
2022-07-20 14:06 ` [PATCH mm-unstable 4/4] Revert "mm/madvise: add huge_memory:mm_madvise_collapse tracepoint" Zach O'Keefe
3 siblings, 2 replies; 14+ messages in thread
From: Zach O'Keefe @ 2022-07-20 14:06 UTC (permalink / raw)
To: Andrew Morton
Cc: linux-mm, Hugh Dickins, Yang Shi, Miaohe Lin, Zach O'Keefe
cc->is_khugepaged is used to predicate the khugepaged-only behavior
of enforcing khugepaged heuristics limited by the sysfs knobs
khugepaged_max_ptes_[none|swap|shared].
In branches where khugepaged_max_ptes_* is checked, consistently check
cc->is_khugepaged first. Also, local counters (for comparison vs
khugepaged_max_ptes_* limits) were previously incremented in the
comparison expression. Some of these counters (unmapped) are
additionally used outside of khugepaged_max_ptes_* enforcement, and
all counters are communicated in tracepoints. Move the accounting of
these counters ahead of the branching statements to avoid future errors
due to C's short-circuit evaluation.
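(Illustrative only: a standalone userspace sketch of the short-circuit
hazard; the names are hypothetical and only loosely mirror the
khugepaged logic.)
  #include <assert.h>
  #include <stdbool.h>
  int main(void)
  {
          bool is_khugepaged = false;     /* e.g. a MADV_COLLAPSE context */
          int max_ptes_none = 64;
          int old_style = 0, new_style = 0;
          /*
           * Old style: the increment sits inside the short-circuited
           * expression, so once is_khugepaged is checked first and is
           * false, the counter is never bumped.
           */
          if (is_khugepaged && ++old_style > max_ptes_none) {
                  /* exceed-limit handling would go here */
          }
          /* New style: account unconditionally, then branch */
          ++new_style;
          if (is_khugepaged && new_style > max_ptes_none) {
                  /* exceed-limit handling would go here */
          }
          /* old_style was silently skipped by short-circuit evaluation */
          assert(old_style == 0 && new_style == 1);
          return 0;
  }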
Fixes: 9fab4752a181 ("mm/khugepaged: add flag to predicate khugepaged-only behavior")
Link: https://lore.kernel.org/linux-mm/Ys2qJm6FaOQcxkha@google.com/
Signed-off-by: Zach O'Keefe <zokeefe@google.com>
---
mm/khugepaged.c | 49 +++++++++++++++++++++++++++++--------------------
1 file changed, 29 insertions(+), 20 deletions(-)
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index ecd28bfeab60..290422577172 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -574,9 +574,10 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
pte_t pteval = *_pte;
if (pte_none(pteval) || (pte_present(pteval) &&
is_zero_pfn(pte_pfn(pteval)))) {
+ ++none_or_zero;
if (!userfaultfd_armed(vma) &&
- (++none_or_zero <= khugepaged_max_ptes_none ||
- !cc->is_khugepaged)) {
+ (!cc->is_khugepaged ||
+ none_or_zero <= khugepaged_max_ptes_none)) {
continue;
} else {
result = SCAN_EXCEED_NONE_PTE;
@@ -596,11 +597,14 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
VM_BUG_ON_PAGE(!PageAnon(page), page);
- if (cc->is_khugepaged && page_mapcount(page) > 1 &&
- ++shared > khugepaged_max_ptes_shared) {
- result = SCAN_EXCEED_SHARED_PTE;
- count_vm_event(THP_SCAN_EXCEED_SHARED_PTE);
- goto out;
+ if (page_mapcount(page) > 1) {
+ ++shared;
+ if (cc->is_khugepaged &&
+ shared > khugepaged_max_ptes_shared) {
+ result = SCAN_EXCEED_SHARED_PTE;
+ count_vm_event(THP_SCAN_EXCEED_SHARED_PTE);
+ goto out;
+ }
}
if (PageCompound(page)) {
@@ -1170,8 +1174,9 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
_pte++, _address += PAGE_SIZE) {
pte_t pteval = *_pte;
if (is_swap_pte(pteval)) {
- if (++unmapped <= khugepaged_max_ptes_swap ||
- !cc->is_khugepaged) {
+ ++unmapped;
+ if (!cc->is_khugepaged ||
+ unmapped <= khugepaged_max_ptes_swap) {
/*
* Always be strict with uffd-wp
* enabled swap entries. Please see
@@ -1189,9 +1194,10 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
}
}
if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) {
+ ++none_or_zero;
if (!userfaultfd_armed(vma) &&
- (++none_or_zero <= khugepaged_max_ptes_none ||
- !cc->is_khugepaged)) {
+ (!cc->is_khugepaged ||
+ none_or_zero <= khugepaged_max_ptes_none)) {
continue;
} else {
result = SCAN_EXCEED_NONE_PTE;
@@ -1221,12 +1227,14 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
goto out_unmap;
}
- if (cc->is_khugepaged &&
- page_mapcount(page) > 1 &&
- ++shared > khugepaged_max_ptes_shared) {
- result = SCAN_EXCEED_SHARED_PTE;
- count_vm_event(THP_SCAN_EXCEED_SHARED_PTE);
- goto out_unmap;
+ if (page_mapcount(page) > 1) {
+ ++shared;
+ if (cc->is_khugepaged &&
+ shared > khugepaged_max_ptes_shared) {
+ result = SCAN_EXCEED_SHARED_PTE;
+ count_vm_event(THP_SCAN_EXCEED_SHARED_PTE);
+ goto out_unmap;
+ }
}
page = compound_head(page);
@@ -1961,8 +1969,9 @@ static int khugepaged_scan_file(struct mm_struct *mm, struct file *file,
continue;
if (xa_is_value(page)) {
+ ++swap;
if (cc->is_khugepaged &&
- ++swap > khugepaged_max_ptes_swap) {
+ swap > khugepaged_max_ptes_swap) {
result = SCAN_EXCEED_SWAP_PTE;
count_vm_event(THP_SCAN_EXCEED_SWAP_PTE);
break;
@@ -2013,8 +2022,8 @@ static int khugepaged_scan_file(struct mm_struct *mm, struct file *file,
rcu_read_unlock();
if (result == SCAN_SUCCEED) {
- if (present < HPAGE_PMD_NR - khugepaged_max_ptes_none &&
- cc->is_khugepaged) {
+ if (cc->is_khugepaged &&
+ present < HPAGE_PMD_NR - khugepaged_max_ptes_none) {
result = SCAN_EXCEED_NONE_PTE;
count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
} else {
--
2.37.0.170.g444d1eabd0-goog
* [PATCH mm-unstable 3/4] mm/khugepaged: delay computation of hpage boundaries until use
2022-07-20 14:05 [PATCH mm-unstable 0/4] mm: fixes for userspace hugepage collapse, v7 Zach O'Keefe
2022-07-20 14:06 ` [PATCH mm-unstable 1/4] mm/khugepaged: Use minimal bits to store num page < HPAGE_PMD_NR Zach O'Keefe
2022-07-20 14:06 ` [PATCH mm-unstable 2/4] mm/khugepaged: consistently order cc->is_khugepaged and pte_* checks Zach O'Keefe
@ 2022-07-20 14:06 ` Zach O'Keefe
2022-07-20 17:30 ` Yang Shi
2022-07-21 0:43 ` David Rientjes
2022-07-20 14:06 ` [PATCH mm-unstable 4/4] Revert "mm/madvise: add huge_memory:mm_madvise_collapse tracepoint" Zach O'Keefe
3 siblings, 2 replies; 14+ messages in thread
From: Zach O'Keefe @ 2022-07-20 14:06 UTC (permalink / raw)
To: Andrew Morton
Cc: linux-mm, Hugh Dickins, Yang Shi, Miaohe Lin, Zach O'Keefe
Only compute hstart/hend once we've passed all checks that would
cause early return in madvise_collapse().
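(Illustrative only: a userspace sketch of what the unchanged boundary
math computes, assuming 2M huge pages and made-up example addresses.)
  #include <stdio.h>
  #define EX_HPAGE_PMD_SIZE (1UL << 21)                 /* assumed: 2M */
  #define EX_HPAGE_PMD_MASK (~(EX_HPAGE_PMD_SIZE - 1))
  int main(void)
  {
          unsigned long start = 0x201000, end = 0x800000;   /* example range */
          /* Round start up and end down to HPAGE_PMD_SIZE alignment */
          unsigned long hstart = (start + ~EX_HPAGE_PMD_MASK) & EX_HPAGE_PMD_MASK;
          unsigned long hend = end & EX_HPAGE_PMD_MASK;
          /* prints hstart=0x400000 hend=0x800000 */
          printf("hstart=%#lx hend=%#lx\n", hstart, hend);
          return 0;
  }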
Fixes: c9d968ffd9ba ("mm/madvise: introduce MADV_COLLAPSE sync hugepage collapse")
Signed-off-by: Zach O'Keefe <zokeefe@google.com>
---
mm/khugepaged.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 290422577172..70e9d9950415 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -2417,9 +2417,6 @@ int madvise_collapse(struct vm_area_struct *vma, struct vm_area_struct **prev,
if (!vma->anon_vma || !vma_is_anonymous(vma))
return -EINVAL;
- hstart = (start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK;
- hend = end & HPAGE_PMD_MASK;
-
if (!hugepage_vma_check(vma, vma->vm_flags, false, false, false))
return -EINVAL;
@@ -2432,6 +2429,9 @@ int madvise_collapse(struct vm_area_struct *vma, struct vm_area_struct **prev,
mmgrab(mm);
lru_add_drain_all();
+ hstart = (start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK;
+ hend = end & HPAGE_PMD_MASK;
+
for (addr = hstart; addr < hend; addr += HPAGE_PMD_SIZE) {
int result = SCAN_FAIL;
--
2.37.0.170.g444d1eabd0-goog
* [PATCH mm-unstable 4/4] Revert "mm/madvise: add huge_memory:mm_madvise_collapse tracepoint"
2022-07-20 14:05 [PATCH mm-unstable 0/4] mm: fixes for userspace hugepage collapse, v7 Zach O'Keefe
` (2 preceding siblings ...)
2022-07-20 14:06 ` [PATCH mm-unstable 3/4] mm/khugepaged: delay computation of hpage boundaries until use Zach O'Keefe
@ 2022-07-20 14:06 ` Zach O'Keefe
2022-07-20 17:30 ` Yang Shi
2022-07-21 0:43 ` David Rientjes
3 siblings, 2 replies; 14+ messages in thread
From: Zach O'Keefe @ 2022-07-20 14:06 UTC (permalink / raw)
To: Andrew Morton
Cc: linux-mm, Hugh Dickins, Yang Shi, Miaohe Lin, Zach O'Keefe
In the anonymous collapse path, huge_memory:mm_khugepaged_scan_pmd can
be used to get roughly the same information as this proposed tracepoint.
Remove it.
Fixes: 0fff8a0de881 ("mm/madvise: add huge_memory:mm_madvise_collapse tracepoint")
Link: https://lore.kernel.org/linux-mm/Ys2vzYyVFmljt+B8@google.com/
Signed-off-by: Zach O'Keefe <zokeefe@google.com>
---
include/trace/events/huge_memory.h | 22 ----------------------
mm/khugepaged.c | 2 --
2 files changed, 24 deletions(-)
diff --git a/include/trace/events/huge_memory.h b/include/trace/events/huge_memory.h
index 38d339ffdb16..55392bf30a03 100644
--- a/include/trace/events/huge_memory.h
+++ b/include/trace/events/huge_memory.h
@@ -167,27 +167,5 @@ TRACE_EVENT(mm_collapse_huge_page_swapin,
__entry->ret)
);
-TRACE_EVENT(mm_madvise_collapse,
-
- TP_PROTO(struct mm_struct *mm, unsigned long addr, int result),
-
- TP_ARGS(mm, addr, result),
-
- TP_STRUCT__entry(__field(struct mm_struct *, mm)
- __field(unsigned long, addr)
- __field(int, result)
- ),
-
- TP_fast_assign(__entry->mm = mm;
- __entry->addr = addr;
- __entry->result = result;
- ),
-
- TP_printk("mm=%p addr=%#lx result=%s",
- __entry->mm,
- __entry->addr,
- __print_symbolic(__entry->result, SCAN_STATUS))
-);
-
#endif /* __HUGE_MEMORY_H */
#include <trace/define_trace.h>
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 70e9d9950415..28cb8429dad4 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -2452,8 +2452,6 @@ int madvise_collapse(struct vm_area_struct *vma, struct vm_area_struct **prev,
if (!mmap_locked)
*prev = NULL; /* Tell caller we dropped mmap_lock */
- trace_mm_madvise_collapse(mm, addr, result);
-
switch (result) {
case SCAN_SUCCEED:
case SCAN_PMD_MAPPED:
--
2.37.0.170.g444d1eabd0-goog
* Re: [PATCH mm-unstable 1/4] mm/khugepaged: Use minimal bits to store num page < HPAGE_PMD_NR
2022-07-20 14:06 ` [PATCH mm-unstable 1/4] mm/khugepaged: Use minimal bits to store num page < HPAGE_PMD_NR Zach O'Keefe
@ 2022-07-20 17:26 ` Yang Shi
2022-07-21 0:41 ` David Rientjes
1 sibling, 0 replies; 14+ messages in thread
From: Yang Shi @ 2022-07-20 17:26 UTC (permalink / raw)
To: Zach O'Keefe; +Cc: Andrew Morton, Linux MM, Hugh Dickins, Miaohe Lin
On Wed, Jul 20, 2022 at 7:06 AM Zach O'Keefe <zokeefe@google.com> wrote:
>
> Minimally, node_load[] entries just need to be able to hold the maximum
> value of HPAGE_PMD_NR, which is compile-time defined per-arch based on
> PMD_SHIFT and PAGE_SHIFT. node_load[] is only written either via memset()
> or via post-increment. struct collapse_control may be allocated
> via kmalloc() in other collapse contexts, and MAX_NUMNODES may be
> arbitrarily large. #define the underlying type of node_load[] based on
> HPAGE_PMD_NR to avoid allocating excessive memory for this struct.
>
> Fixes: 3b07f3bb225a ("mm/khugepaged: add struct collapse_control")
> Link: https://lore.kernel.org/linux-mm/Ys2CeIm%2FQmQwWh9a@google.com/
> Signed-off-by: Zach O'Keefe <zokeefe@google.com>
Reviewed-by: Yang Shi <shy828301@gmail.com>
> ---
> mm/khugepaged.c | 7 +++++--
> 1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 69990dacde14..ecd28bfeab60 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -92,8 +92,11 @@ struct collapse_control {
> bool is_khugepaged;
>
> /* Num pages scanned per node */
> - int node_load[MAX_NUMNODES];
> -
> +#if HPAGE_PMD_ORDER < 16
> + u16 node_load[MAX_NUMNODES];
> +#else
> + u32 node_load[MAX_NUMNODES];
> +#endif
> /* Last target selected in hpage_collapse_find_target_node() */
> int last_target_node;
> };
> --
> 2.37.0.170.g444d1eabd0-goog
>
* Re: [PATCH mm-unstable 2/4] mm/khugepaged: consistently order cc->is_khugepaged and pte_* checks
2022-07-20 14:06 ` [PATCH mm-unstable 2/4] mm/khugepaged: consistently order cc->is_khugepaged and pte_* checks Zach O'Keefe
@ 2022-07-20 17:27 ` Yang Shi
2022-07-20 19:09 ` Zach O'Keefe
2022-07-21 0:42 ` David Rientjes
1 sibling, 1 reply; 14+ messages in thread
From: Yang Shi @ 2022-07-20 17:27 UTC (permalink / raw)
To: Zach O'Keefe; +Cc: Andrew Morton, Linux MM, Hugh Dickins, Miaohe Lin
On Wed, Jul 20, 2022 at 7:06 AM Zach O'Keefe <zokeefe@google.com> wrote:
>
> cc->is_khugepaged is used to predicate the khugepaged-only behavior
> of enforcing khugepaged heuristics limited by the sysfs knobs
> khugepaged_max_ptes_[none|swap|shared].
>
> In branches where khugepaged_max_ptes_* is checked, consistently check
> cc->is_khugepaged first. Also, local counters (for comparison vs
> khugepaged_max_ptes_* limits) were previously incremented in the
> comparison expression. Some of these counters (unmapped) are
> additionally used outside of khugepaged_max_ptes_* enforcement, and
> all counters are communicated in tracepoints. Move the accounting of
> these counters ahead of the branching statements to avoid future errors
> due to C's short-circuit evaluation.
Yeah, it is safer not to depend on the evaluation order of the branch
conditions to increment the counter.
Reviewed-by: Yang Shi <shy828301@gmail.com>
>
> Fixes: 9fab4752a181 ("mm/khugepaged: add flag to predicate khugepaged-only behavior")
> Link: https://lore.kernel.org/linux-mm/Ys2qJm6FaOQcxkha@google.com/
> Signed-off-by: Zach O'Keefe <zokeefe@google.com>
> ---
> mm/khugepaged.c | 49 +++++++++++++++++++++++++++++--------------------
> 1 file changed, 29 insertions(+), 20 deletions(-)
>
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index ecd28bfeab60..290422577172 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -574,9 +574,10 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
> pte_t pteval = *_pte;
> if (pte_none(pteval) || (pte_present(pteval) &&
> is_zero_pfn(pte_pfn(pteval)))) {
> + ++none_or_zero;
> if (!userfaultfd_armed(vma) &&
> - (++none_or_zero <= khugepaged_max_ptes_none ||
> - !cc->is_khugepaged)) {
> + (!cc->is_khugepaged ||
> + none_or_zero <= khugepaged_max_ptes_none)) {
> continue;
> } else {
> result = SCAN_EXCEED_NONE_PTE;
> @@ -596,11 +597,14 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
>
> VM_BUG_ON_PAGE(!PageAnon(page), page);
>
> - if (cc->is_khugepaged && page_mapcount(page) > 1 &&
> - ++shared > khugepaged_max_ptes_shared) {
> - result = SCAN_EXCEED_SHARED_PTE;
> - count_vm_event(THP_SCAN_EXCEED_SHARED_PTE);
> - goto out;
> + if (page_mapcount(page) > 1) {
> + ++shared;
> + if (cc->is_khugepaged &&
> + shared > khugepaged_max_ptes_shared) {
> + result = SCAN_EXCEED_SHARED_PTE;
> + count_vm_event(THP_SCAN_EXCEED_SHARED_PTE);
> + goto out;
> + }
> }
>
> if (PageCompound(page)) {
> @@ -1170,8 +1174,9 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
> _pte++, _address += PAGE_SIZE) {
> pte_t pteval = *_pte;
> if (is_swap_pte(pteval)) {
> - if (++unmapped <= khugepaged_max_ptes_swap ||
> - !cc->is_khugepaged) {
> + ++unmapped;
> + if (!cc->is_khugepaged ||
> + unmapped <= khugepaged_max_ptes_swap) {
> /*
> * Always be strict with uffd-wp
> * enabled swap entries. Please see
> @@ -1189,9 +1194,10 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
> }
> }
> if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) {
> + ++none_or_zero;
> if (!userfaultfd_armed(vma) &&
> - (++none_or_zero <= khugepaged_max_ptes_none ||
> - !cc->is_khugepaged)) {
> + (!cc->is_khugepaged ||
> + none_or_zero <= khugepaged_max_ptes_none)) {
> continue;
> } else {
> result = SCAN_EXCEED_NONE_PTE;
> @@ -1221,12 +1227,14 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
> goto out_unmap;
> }
>
> - if (cc->is_khugepaged &&
> - page_mapcount(page) > 1 &&
> - ++shared > khugepaged_max_ptes_shared) {
> - result = SCAN_EXCEED_SHARED_PTE;
> - count_vm_event(THP_SCAN_EXCEED_SHARED_PTE);
> - goto out_unmap;
> + if (page_mapcount(page) > 1) {
> + ++shared;
> + if (cc->is_khugepaged &&
> + shared > khugepaged_max_ptes_shared) {
> + result = SCAN_EXCEED_SHARED_PTE;
> + count_vm_event(THP_SCAN_EXCEED_SHARED_PTE);
> + goto out_unmap;
> + }
> }
>
> page = compound_head(page);
> @@ -1961,8 +1969,9 @@ static int khugepaged_scan_file(struct mm_struct *mm, struct file *file,
> continue;
>
> if (xa_is_value(page)) {
> + ++swap;
> if (cc->is_khugepaged &&
> - ++swap > khugepaged_max_ptes_swap) {
> + swap > khugepaged_max_ptes_swap) {
> result = SCAN_EXCEED_SWAP_PTE;
> count_vm_event(THP_SCAN_EXCEED_SWAP_PTE);
> break;
> @@ -2013,8 +2022,8 @@ static int khugepaged_scan_file(struct mm_struct *mm, struct file *file,
> rcu_read_unlock();
>
> if (result == SCAN_SUCCEED) {
> - if (present < HPAGE_PMD_NR - khugepaged_max_ptes_none &&
> - cc->is_khugepaged) {
> + if (cc->is_khugepaged &&
> + present < HPAGE_PMD_NR - khugepaged_max_ptes_none) {
> result = SCAN_EXCEED_NONE_PTE;
> count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
> } else {
> --
> 2.37.0.170.g444d1eabd0-goog
>
* Re: [PATCH mm-unstable 3/4] mm/khugepaged: delay computation of hpage boundaries until use
2022-07-20 14:06 ` [PATCH mm-unstable 3/4] mm/khugepaged: delay computation of hpage boundaries until use Zach O'Keefe
@ 2022-07-20 17:30 ` Yang Shi
2022-07-21 0:43 ` David Rientjes
1 sibling, 0 replies; 14+ messages in thread
From: Yang Shi @ 2022-07-20 17:30 UTC (permalink / raw)
To: Zach O'Keefe; +Cc: Andrew Morton, Linux MM, Hugh Dickins, Miaohe Lin
On Wed, Jul 20, 2022 at 7:06 AM Zach O'Keefe <zokeefe@google.com> wrote:
>
> Only compute hstart/hend once we've passed all checks that would
> cause early return in madvise_collapse().
>
> Fixes: c9d968ffd9ba ("mm/madvise: introduce MADV_COLLAPSE sync hugepage collapse")
> Signed-off-by: Zach O'Keefe <zokeefe@google.com>
Reviewed-by: Yang Shi <shy828301@gmail.com>
> ---
> mm/khugepaged.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 290422577172..70e9d9950415 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -2417,9 +2417,6 @@ int madvise_collapse(struct vm_area_struct *vma, struct vm_area_struct **prev,
> if (!vma->anon_vma || !vma_is_anonymous(vma))
> return -EINVAL;
>
> - hstart = (start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK;
> - hend = end & HPAGE_PMD_MASK;
> -
> if (!hugepage_vma_check(vma, vma->vm_flags, false, false, false))
> return -EINVAL;
>
> @@ -2432,6 +2429,9 @@ int madvise_collapse(struct vm_area_struct *vma, struct vm_area_struct **prev,
> mmgrab(mm);
> lru_add_drain_all();
>
> + hstart = (start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK;
> + hend = end & HPAGE_PMD_MASK;
> +
> for (addr = hstart; addr < hend; addr += HPAGE_PMD_SIZE) {
> int result = SCAN_FAIL;
>
> --
> 2.37.0.170.g444d1eabd0-goog
>
* Re: [PATCH mm-unstable 4/4] Revert "mm/madvise: add huge_memory:mm_madvise_collapse tracepoint"
2022-07-20 14:06 ` [PATCH mm-unstable 4/4] Revert "mm/madvise: add huge_memory:mm_madvise_collapse tracepoint" Zach O'Keefe
@ 2022-07-20 17:30 ` Yang Shi
2022-07-21 0:43 ` David Rientjes
1 sibling, 0 replies; 14+ messages in thread
From: Yang Shi @ 2022-07-20 17:30 UTC (permalink / raw)
To: Zach O'Keefe; +Cc: Andrew Morton, Linux MM, Hugh Dickins, Miaohe Lin
On Wed, Jul 20, 2022 at 7:06 AM Zach O'Keefe <zokeefe@google.com> wrote:
>
> In the anonymous collapse path, huge_memory:mm_khugepaged_scan_pmd can
> be used to get roughly the same information as this proposed tracepoint.
> Remove it.
>
> Fixes: 0fff8a0de881 ("mm/madvise: add huge_memory:mm_madvise_collapse tracepoint")
> Link: https://lore.kernel.org/linux-mm/Ys2vzYyVFmljt+B8@google.com/
> Signed-off-by: Zach O'Keefe <zokeefe@google.com>
Reviewed-by: Yang Shi <shy828301@gmail.com>
> ---
> include/trace/events/huge_memory.h | 22 ----------------------
> mm/khugepaged.c | 2 --
> 2 files changed, 24 deletions(-)
>
> diff --git a/include/trace/events/huge_memory.h b/include/trace/events/huge_memory.h
> index 38d339ffdb16..55392bf30a03 100644
> --- a/include/trace/events/huge_memory.h
> +++ b/include/trace/events/huge_memory.h
> @@ -167,27 +167,5 @@ TRACE_EVENT(mm_collapse_huge_page_swapin,
> __entry->ret)
> );
>
> -TRACE_EVENT(mm_madvise_collapse,
> -
> - TP_PROTO(struct mm_struct *mm, unsigned long addr, int result),
> -
> - TP_ARGS(mm, addr, result),
> -
> - TP_STRUCT__entry(__field(struct mm_struct *, mm)
> - __field(unsigned long, addr)
> - __field(int, result)
> - ),
> -
> - TP_fast_assign(__entry->mm = mm;
> - __entry->addr = addr;
> - __entry->result = result;
> - ),
> -
> - TP_printk("mm=%p addr=%#lx result=%s",
> - __entry->mm,
> - __entry->addr,
> - __print_symbolic(__entry->result, SCAN_STATUS))
> -);
> -
> #endif /* __HUGE_MEMORY_H */
> #include <trace/define_trace.h>
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 70e9d9950415..28cb8429dad4 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -2452,8 +2452,6 @@ int madvise_collapse(struct vm_area_struct *vma, struct vm_area_struct **prev,
> if (!mmap_locked)
> *prev = NULL; /* Tell caller we dropped mmap_lock */
>
> - trace_mm_madvise_collapse(mm, addr, result);
> -
> switch (result) {
> case SCAN_SUCCEED:
> case SCAN_PMD_MAPPED:
> --
> 2.37.0.170.g444d1eabd0-goog
>
* Re: [PATCH mm-unstable 2/4] mm/khugepaged: consistently order cc->is_khugepaged and pte_* checks
2022-07-20 17:27 ` Yang Shi
@ 2022-07-20 19:09 ` Zach O'Keefe
0 siblings, 0 replies; 14+ messages in thread
From: Zach O'Keefe @ 2022-07-20 19:09 UTC (permalink / raw)
To: Yang Shi; +Cc: Andrew Morton, Linux MM, Hugh Dickins, Miaohe Lin
On Jul 20 10:27, Yang Shi wrote:
> On Wed, Jul 20, 2022 at 7:06 AM Zach O'Keefe <zokeefe@google.com> wrote:
> >
> > cc->is_khugepaged is used to predicate the khugepaged-only behavior
> > of enforcing khugepaged heuristics limited by the sysfs knobs
> > khugepaged_max_ptes_[none|swap|shared].
> >
> > In branches where khugepaged_max_ptes_* is checked, consistently check
> > cc->is_khugepaged first. Also, local counters (for comparison vs
> > khugepaged_max_ptes_* limits) were previously incremented in the
> > comparison expression. Some of these counters (unmapped) are
> > additionally used outside of khugepaged_max_ptes_* enforcement, and
> > all counters are communicated in tracepoints. Move the accounting of
> > these counters ahead of the branching statements to avoid future errors
> > due to C's short-circuit evaluation.
>
> Yeah, it is safer not to depend on the evaluation order of the branch
> conditions to increment the counter.
>
Only cost me a couple hours when I got bit by this after naively moving checks
around :) Hopefully I can save the next person.
Also, thanks for the reviews, Yang!
> Reviewed-by: Yang Shi <shy828301@gmail.com>
>
> >
> > Fixes: 9fab4752a181 ("mm/khugepaged: add flag to predicate khugepaged-only behavior")
> > Link: https://lore.kernel.org/linux-mm/Ys2qJm6FaOQcxkha@google.com/
> > Signed-off-by: Zach O'Keefe <zokeefe@google.com>
> > ---
> > mm/khugepaged.c | 49 +++++++++++++++++++++++++++++--------------------
> > 1 file changed, 29 insertions(+), 20 deletions(-)
> >
> > diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> > index ecd28bfeab60..290422577172 100644
> > --- a/mm/khugepaged.c
> > +++ b/mm/khugepaged.c
> > @@ -574,9 +574,10 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
> > pte_t pteval = *_pte;
> > if (pte_none(pteval) || (pte_present(pteval) &&
> > is_zero_pfn(pte_pfn(pteval)))) {
> > + ++none_or_zero;
> > if (!userfaultfd_armed(vma) &&
> > - (++none_or_zero <= khugepaged_max_ptes_none ||
> > - !cc->is_khugepaged)) {
> > + (!cc->is_khugepaged ||
> > + none_or_zero <= khugepaged_max_ptes_none)) {
> > continue;
> > } else {
> > result = SCAN_EXCEED_NONE_PTE;
> > @@ -596,11 +597,14 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
> >
> > VM_BUG_ON_PAGE(!PageAnon(page), page);
> >
> > - if (cc->is_khugepaged && page_mapcount(page) > 1 &&
> > - ++shared > khugepaged_max_ptes_shared) {
> > - result = SCAN_EXCEED_SHARED_PTE;
> > - count_vm_event(THP_SCAN_EXCEED_SHARED_PTE);
> > - goto out;
> > + if (page_mapcount(page) > 1) {
> > + ++shared;
> > + if (cc->is_khugepaged &&
> > + shared > khugepaged_max_ptes_shared) {
> > + result = SCAN_EXCEED_SHARED_PTE;
> > + count_vm_event(THP_SCAN_EXCEED_SHARED_PTE);
> > + goto out;
> > + }
> > }
> >
> > if (PageCompound(page)) {
> > @@ -1170,8 +1174,9 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
> > _pte++, _address += PAGE_SIZE) {
> > pte_t pteval = *_pte;
> > if (is_swap_pte(pteval)) {
> > - if (++unmapped <= khugepaged_max_ptes_swap ||
> > - !cc->is_khugepaged) {
> > + ++unmapped;
> > + if (!cc->is_khugepaged ||
> > + unmapped <= khugepaged_max_ptes_swap) {
> > /*
> > * Always be strict with uffd-wp
> > * enabled swap entries. Please see
> > @@ -1189,9 +1194,10 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
> > }
> > }
> > if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) {
> > + ++none_or_zero;
> > if (!userfaultfd_armed(vma) &&
> > - (++none_or_zero <= khugepaged_max_ptes_none ||
> > - !cc->is_khugepaged)) {
> > + (!cc->is_khugepaged ||
> > + none_or_zero <= khugepaged_max_ptes_none)) {
> > continue;
> > } else {
> > result = SCAN_EXCEED_NONE_PTE;
> > @@ -1221,12 +1227,14 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
> > goto out_unmap;
> > }
> >
> > - if (cc->is_khugepaged &&
> > - page_mapcount(page) > 1 &&
> > - ++shared > khugepaged_max_ptes_shared) {
> > - result = SCAN_EXCEED_SHARED_PTE;
> > - count_vm_event(THP_SCAN_EXCEED_SHARED_PTE);
> > - goto out_unmap;
> > + if (page_mapcount(page) > 1) {
> > + ++shared;
> > + if (cc->is_khugepaged &&
> > + shared > khugepaged_max_ptes_shared) {
> > + result = SCAN_EXCEED_SHARED_PTE;
> > + count_vm_event(THP_SCAN_EXCEED_SHARED_PTE);
> > + goto out_unmap;
> > + }
> > }
> >
> > page = compound_head(page);
> > @@ -1961,8 +1969,9 @@ static int khugepaged_scan_file(struct mm_struct *mm, struct file *file,
> > continue;
> >
> > if (xa_is_value(page)) {
> > + ++swap;
> > if (cc->is_khugepaged &&
> > - ++swap > khugepaged_max_ptes_swap) {
> > + swap > khugepaged_max_ptes_swap) {
> > result = SCAN_EXCEED_SWAP_PTE;
> > count_vm_event(THP_SCAN_EXCEED_SWAP_PTE);
> > break;
> > @@ -2013,8 +2022,8 @@ static int khugepaged_scan_file(struct mm_struct *mm, struct file *file,
> > rcu_read_unlock();
> >
> > if (result == SCAN_SUCCEED) {
> > - if (present < HPAGE_PMD_NR - khugepaged_max_ptes_none &&
> > - cc->is_khugepaged) {
> > + if (cc->is_khugepaged &&
> > + present < HPAGE_PMD_NR - khugepaged_max_ptes_none) {
> > result = SCAN_EXCEED_NONE_PTE;
> > count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
> > } else {
> > --
> > 2.37.0.170.g444d1eabd0-goog
> >
* Re: [PATCH mm-unstable 1/4] mm/khugepaged: Use minimal bits to store num page < HPAGE_PMD_NR
2022-07-20 14:06 ` [PATCH mm-unstable 1/4] mm/khugepaged: Use minimal bits to store num page < HPAGE_PMD_NR Zach O'Keefe
2022-07-20 17:26 ` Yang Shi
@ 2022-07-21 0:41 ` David Rientjes
1 sibling, 0 replies; 14+ messages in thread
From: David Rientjes @ 2022-07-21 0:41 UTC (permalink / raw)
To: Zach O'Keefe
Cc: Andrew Morton, linux-mm, Hugh Dickins, Yang Shi, Miaohe Lin
On Wed, 20 Jul 2022, Zach O'Keefe wrote:
> Minimally, node_load[] entries just need to be able to hold the maximum
> value of HPAGE_PMD_NR, which is compile-time defined per-arch based on
> PMD_SHIFT and PAGE_SHIFT. node_load[] is only written either via memset()
> or via post-increment. struct collapse_control may be allocated
> via kmalloc() in other collapse contexts, and MAX_NUMNODES may be
> arbitrarily large. #define the underlying type of node_load[] based on
> HPAGE_PMD_NR to avoid allocating excessive memory for this struct.
>
> Fixes: 3b07f3bb225a ("mm/khugepaged: add struct collapse_control")
> Link: https://lore.kernel.org/linux-mm/Ys2CeIm%2FQmQwWh9a@google.com/
> Signed-off-by: Zach O'Keefe <zokeefe@google.com>
Acked-by: David Rientjes <rientjes@google.com>
* Re: [PATCH mm-unstable 2/4] mm/khugepaged: consistently order cc->is_khugepaged and pte_* checks
2022-07-20 14:06 ` [PATCH mm-unstable 2/4] mm/khugepaged: consistently order cc->is_khugepaged and pte_* checks Zach O'Keefe
2022-07-20 17:27 ` Yang Shi
@ 2022-07-21 0:42 ` David Rientjes
1 sibling, 0 replies; 14+ messages in thread
From: David Rientjes @ 2022-07-21 0:42 UTC (permalink / raw)
To: Zach O'Keefe
Cc: Andrew Morton, linux-mm, Hugh Dickins, Yang Shi, Miaohe Lin
On Wed, 20 Jul 2022, Zach O'Keefe wrote:
> cc->is_khugepaged is used to predicate the khugepaged-only behavior
> of enforcing khugepaged heuristics limited by the sysfs knobs
> khugepaged_max_ptes_[none|swap|shared].
>
> In branches where khugepaged_max_ptes_* is checked, consistently check
> cc->is_khugepaged first. Also, local counters (for comparison vs
> khugepaged_max_ptes_* limits) were previously incremented in the
> comparison expression. Some of these counters (unmapped) are
> additionally used outside of khugepaged_max_ptes_* enforcement, and
> all counters are communicated in tracepoints. Move the accounting of
> these counters ahead of the branching statements to avoid future errors
> due to C's short-circuit evaluation.
>
> Fixes: 9fab4752a181 ("mm/khugepaged: add flag to predicate khugepaged-only behavior")
> Link: https://lore.kernel.org/linux-mm/Ys2qJm6FaOQcxkha@google.com/
> Signed-off-by: Zach O'Keefe <zokeefe@google.com>
Acked-by: David Rientjes <rientjes@google.com>
* Re: [PATCH mm-unstable 3/4] mm/khugepaged: delay computation of hpage boundaries until use
2022-07-20 14:06 ` [PATCH mm-unstable 3/4] mm/khugepaged: delay computation of hpage boundaries until use Zach O'Keefe
2022-07-20 17:30 ` Yang Shi
@ 2022-07-21 0:43 ` David Rientjes
1 sibling, 0 replies; 14+ messages in thread
From: David Rientjes @ 2022-07-21 0:43 UTC (permalink / raw)
To: Zach O'Keefe
Cc: Andrew Morton, linux-mm, Hugh Dickins, Yang Shi, Miaohe Lin
On Wed, 20 Jul 2022, Zach O'Keefe wrote:
> Only compute hstart/hend once we've passed all checks that would
> cause early return in madvise_collapse().
>
> Fixes: c9d968ffd9ba ("mm/madvise: introduce MADV_COLLAPSE sync hugepage collapse")
> Signed-off-by: Zach O'Keefe <zokeefe@google.com>
Acked-by: David Rientjes <rientjes@google.com>
* Re: [PATCH mm-unstable 4/4] Revert "mm/madvise: add huge_memory:mm_madvise_collapse tracepoint"
2022-07-20 14:06 ` [PATCH mm-unstable 4/4] Revert "mm/madvise: add huge_memory:mm_madvise_collapse tracepoint" Zach O'Keefe
2022-07-20 17:30 ` Yang Shi
@ 2022-07-21 0:43 ` David Rientjes
1 sibling, 0 replies; 14+ messages in thread
From: David Rientjes @ 2022-07-21 0:43 UTC (permalink / raw)
To: Zach O'Keefe
Cc: Andrew Morton, linux-mm, Hugh Dickins, Yang Shi, Miaohe Lin
On Wed, 20 Jul 2022, Zach O'Keefe wrote:
> In the anonymous collapse path, huge_memory:mm_khugepaged_scan_pmd can
> be used to get roughly the same information as this proposed tracepoint.
> Remove it.
>
> Fixes: 0fff8a0de881 ("mm/madvise: add huge_memory:mm_madvise_collapse tracepoint")
> Link: https://lore.kernel.org/linux-mm/Ys2vzYyVFmljt+B8@google.com/
> Signed-off-by: Zach O'Keefe <zokeefe@google.com>
Acked-by: David Rientjes <rientjes@google.com>