From: Aneesh Kumar K V <aneesh.kumar@linux.ibm.com>
To: Yu Zhao <yuzhao@google.com>
Cc: linux-mm@kvack.org, akpm@linux-foundation.org
Subject: Re: [PATCH v2 0/5] Avoid building lrugen page table walk code
Date: Fri, 7 Jul 2023 18:54:43 +0530 [thread overview]
Message-ID: <47066176-bd93-55dd-c2fa-002299d9e034@linux.ibm.com> (raw)
In-Reply-To: <CAOUHufY-g7ZXymsJ53L9KCqD3nLoG0qrCpF884Tbo9snw-aSYQ@mail.gmail.com>
On 7/7/23 1:27 PM, Yu Zhao wrote:
> On Thu, Jul 6, 2023 at 12:21 AM Aneesh Kumar K.V
> <aneesh.kumar@linux.ibm.com> wrote:
>>
>> This patchset avoids building changes added by commit bd74fdaea146 ("mm:
>> multi-gen LRU: support page table walks") on platforms that don't support
>> hardware atomic updates of access bits.
>>
>> Aneesh Kumar K.V (5):
>> mm/mglru: Create a new helper iterate_mm_list_walk
>> mm/mglru: Move Bloom filter code around
>> mm/mglru: Move code around to make future patch easy
>> mm/mglru: move iterate_mm_list_walk Helper
>> mm/mglru: Don't build multi-gen LRU page table walk code on
>> architecture not supported
>>
>> arch/Kconfig | 3 +
>> arch/arm64/Kconfig | 1 +
>> arch/x86/Kconfig | 1 +
>> include/linux/memcontrol.h | 2 +-
>> include/linux/mm_types.h | 10 +-
>> include/linux/mmzone.h | 12 +-
>> kernel/fork.c | 2 +-
>> mm/memcontrol.c | 2 +-
>> mm/vmscan.c | 955 +++++++++++++++++++------------------
>> 9 files changed, 528 insertions(+), 460 deletions(-)
>
> 1. There is no need for a new Kconfig -- the condition is simply
> defined(CONFIG_LRU_GEN) && !defined(arch_has_hw_pte_young)
>
> 2. The best practice to disable static functions is not by macros but:
>
> static int const_cond(void)
> {
> return 1;
> }
>
> int main(void)
> {
> int a = const_cond();
>
> if (a)
> return 0;
>
> /* the compiler doesn't generate code for static funcs below */
> static_func_1();
> ...
> static_func_N();
> }
>
> LTO also optimizes external functions. But not everyone uses it. So we
> still need macros for them, and of course data structures.
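A minimal standalone sketch of the pattern described above (hypothetical names, not the actual kernel code): when the guard is a static function returning a compile-time constant, the compiler proves the guarded calls unreachable and emits no code for the static functions, with no #ifdef needed.

```c
#include <stdio.h>

/* Compile-time constant condition: stands in for an arch
 * capability check such as arch_has_hw_pte_young(). */
static int arch_has_feature(void)
{
	return 0;	/* this arch lacks the feature */
}

/* Only reachable under a constant-false condition, so the
 * compiler generates no code for it. */
static void walk_page_tables(void)
{
	puts("walking page tables");
}

int run_aging(void)
{
	if (!arch_has_feature())
		return 0;	/* fallback path */

	walk_page_tables();	/* dead code on this arch */
	return 1;
}
```

Building with optimization and inspecting the object file shows no symbol for walk_page_tables, which is the effect the bloat-o-meter numbers below demonstrate on the real code.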
>
> 3. In 4/5, you have:
>
> @@ -461,6 +461,7 @@ enum {
> struct lru_gen_mm_state {
> /* set to max_seq after each iteration */
> unsigned long seq;
> +#ifdef CONFIG_LRU_TASK_PAGE_AGING
> /* where the current iteration continues after */
> struct list_head *head;
> /* where the last iteration ended before */
> @@ -469,6 +470,11 @@ struct lru_gen_mm_state {
> unsigned long *filters[NR_BLOOM_FILTERS];
> /* the mm stats for debugging */
> unsigned long stats[NR_HIST_GENS][NR_MM_STATS];
> +#else
> + /* protect the seq update above */
> + /* May be we can use lruvec->lock? */
> + spinlock_t lock;
> +#endif
> };
>
> The answer is yes, and not only that, we don't need lru_gen_mm_state at all.
>
> I'm attaching a patch that fixes all above. If you want to post it,
> please feel free -- fully test it please, since I didn't. Otherwise I
> can ask TJ to help make this work for you.
>
> $ git diff --stat
> include/linux/memcontrol.h | 2 +-
> include/linux/mm_types.h | 12 +-
> include/linux/mmzone.h | 2 +
> kernel/bounds.c | 6 +-
> kernel/fork.c | 2 +-
> mm/vmscan.c | 169 +++++++++++++++++++--------
> 6 files changed, 137 insertions(+), 56 deletions(-)
>
> On x86:
>
> $ ./scripts/bloat-o-meter mm/vmscan.o.old mm/vmscan.o
> add/remove: 24/34 grow/shrink: 2/7 up/down: 966/-8716 (-7750)
> Function old new delta
> ...
> should_skip_vma 206 - -206
> get_pte_pfn 261 - -261
> lru_gen_add_mm 323 - -323
> lru_gen_seq_show 1710 1370 -340
> lru_gen_del_mm 432 - -432
> reset_batch_size 572 - -572
> try_to_inc_max_seq 2947 1635 -1312
> walk_pmd_range_locked 1508 - -1508
> walk_pud_range 3238 - -3238
> Total: Before=99449, After=91699, chg -7.79%
>
> $ objdump -S mm/vmscan.o | grep -A 20 "<try_to_inc_max_seq>:"
> 000000000000a350 <try_to_inc_max_seq>:
> {
> a350: e8 00 00 00 00 call a355 <try_to_inc_max_seq+0x5>
> a355: 55 push %rbp
> a356: 48 89 e5 mov %rsp,%rbp
> a359: 41 57 push %r15
> a35b: 41 56 push %r14
> a35d: 41 55 push %r13
> a35f: 41 54 push %r12
> a361: 53 push %rbx
> a362: 48 83 ec 70 sub $0x70,%rsp
> a366: 41 89 d4 mov %edx,%r12d
> a369: 49 89 f6 mov %rsi,%r14
> a36c: 49 89 ff mov %rdi,%r15
> spin_lock_irq(&lruvec->lru_lock);
> a36f: 48 8d 5f 50 lea 0x50(%rdi),%rbx
> a373: 48 89 df mov %rbx,%rdi
> a376: e8 00 00 00 00 call a37b <try_to_inc_max_seq+0x2b>
> success = max_seq == lrugen->max_seq;
> a37b: 49 8b 87 88 00 00 00 mov 0x88(%r15),%rax
> a382: 4c 39 f0 cmp %r14,%rax
For the below diff:
@@ -4497,14 +4547,16 @@ static bool try_to_inc_max_seq(struct lruvec *lruvec, unsigned long max_seq,
struct lru_gen_mm_walk *walk;
struct mm_struct *mm = NULL;
struct lru_gen_folio *lrugen = &lruvec->lrugen;
+ struct lru_gen_mm_state *mm_state = get_mm_state(lruvec);
VM_WARN_ON_ONCE(max_seq > READ_ONCE(lrugen->max_seq));
+ if (!mm_state)
+ return inc_max_seq(lruvec, max_seq, can_swap, force_scan);
+
/* see the comment in iterate_mm_list() */
- if (max_seq <= READ_ONCE(lruvec->mm_state.seq)) {
- success = false;
- goto done;
- }
+ if (max_seq <= READ_ONCE(mm_state->seq))
+ return false;
/*
* If the hardware doesn't automatically set the accessed bit, fallback
@@ -4534,8 +4586,10 @@ static bool try_to_inc_max_seq(struct lruvec *lruvec, unsigned long max_seq,
walk_mm(lruvec, mm, walk);
} while (mm);
done:
- if (success)
- inc_max_seq(lruvec, can_swap, force_scan);
+ if (success) {
+ success = inc_max_seq(lruvec, max_seq, can_swap, force_scan);
+ WARN_ON_ONCE(!success);
+ }
return success;
}
We did discuss a possible race that can happen if we allow multiple callers to hit inc_max_seq at the same time:
inc_max_seq drops the lru_lock and then resumes the loop at the previous value of type. If we want to do the above,
we might also need the below?
modified mm/vmscan.c
@@ -4368,6 +4368,7 @@ void inc_max_seq(struct lruvec *lruvec, bool can_swap, bool force_scan)
int type, zone;
struct lru_gen_struct *lrugen = &lruvec->lrugen;
+retry:
spin_lock_irq(&lruvec->lru_lock);
VM_WARN_ON_ONCE(!seq_is_valid(lruvec));
@@ -4381,7 +4382,7 @@ void inc_max_seq(struct lruvec *lruvec, bool can_swap, bool force_scan)
while (!inc_min_seq(lruvec, type, can_swap)) {
spin_unlock_irq(&lruvec->lru_lock);
cond_resched();
- spin_lock_irq(&lruvec->lru_lock);
+ goto retry;
}
}
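The shape of that fix, sketched standalone with pthreads (hypothetical names, not the kernel locking API): once the lock is dropped mid-loop, everything validated so far is stale, so the loop restarts from the top instead of reacquiring the lock and resuming at the previous index.

```c
#include <pthread.h>
#include <sched.h>

#define NR_TYPES 4

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static int state[NR_TYPES];

/* Stand-in for inc_min_seq(): may fail, forcing the caller to
 * drop the lock and revalidate. Always succeeds in this sketch. */
static int try_step(int type)
{
	return state[type] >= 0;
}

void inc_all(void)
{
retry:
	pthread_mutex_lock(&lock);

	for (int type = 0; type < NR_TYPES; type++) {
		while (!try_step(type)) {
			/* Dropping the lock invalidates what was checked
			 * so far; restart from the top rather than
			 * resuming mid-loop with stale state. */
			pthread_mutex_unlock(&lock);
			sched_yield();
			goto retry;
		}
		state[type]++;
	}

	pthread_mutex_unlock(&lock);
}
```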
I also found that allowing only one CPU to increment the max seq value, and making other requests with the
same max_seq return false, is useful in performance runs. I.e., we need an equivalent of this?
+ if (max_seq <= READ_ONCE(mm_state->seq))
+ return false;
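The effect of that check is that only one caller per generation does the expensive work, while concurrent callers targeting the same max_seq bail out immediately. A minimal sketch of that gating with C11 atomics (hypothetical names; the kernel code uses READ_ONCE under the lru_lock rather than a cmpxchg):

```c
#include <stdatomic.h>
#include <stdbool.h>

static atomic_ulong mm_seq;

/* Only the caller that advances mm_seq past max_seq proceeds;
 * racing callers with the same max_seq return false. */
bool try_to_inc(unsigned long max_seq)
{
	unsigned long seq = atomic_load(&mm_seq);

	if (max_seq <= seq)
		return false;	/* this seq was already handled */

	if (!atomic_compare_exchange_strong(&mm_seq, &seq, max_seq))
		return false;	/* lost the race to another caller */

	/* ... expensive aging work runs here, once per seq ... */
	return true;
}
```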