From: Aneesh Kumar K V <aneesh.kumar@linux.ibm.com>
To: Yu Zhao <yuzhao@google.com>
Cc: linux-mm@kvack.org, akpm@linux-foundation.org
Subject: Re: [PATCH v2 0/5] Avoid building lrugen page table walk code
Date: Fri, 7 Jul 2023 18:54:43 +0530
Message-ID: <47066176-bd93-55dd-c2fa-002299d9e034@linux.ibm.com>
In-Reply-To: <CAOUHufY-g7ZXymsJ53L9KCqD3nLoG0qrCpF884Tbo9snw-aSYQ@mail.gmail.com>

On 7/7/23 1:27 PM, Yu Zhao wrote:
> On Thu, Jul 6, 2023 at 12:21 AM Aneesh Kumar K.V
> <aneesh.kumar@linux.ibm.com> wrote:
>>
>> This patchset avoids building changes added by commit bd74fdaea146 ("mm:
>> multi-gen LRU: support page table walks") on platforms that don't support
>> hardware atomic updates of access bits.
>>
>> Aneesh Kumar K.V (5):
>>   mm/mglru: Create a new helper iterate_mm_list_walk
>>   mm/mglru: Move Bloom filter code around
>>   mm/mglru: Move code around to make future patch easy
>>   mm/mglru: move iterate_mm_list_walk Helper
>>   mm/mglru: Don't build multi-gen LRU page table walk code on
>>     architecture not supported
>>
>>  arch/Kconfig               |   3 +
>>  arch/arm64/Kconfig         |   1 +
>>  arch/x86/Kconfig           |   1 +
>>  include/linux/memcontrol.h |   2 +-
>>  include/linux/mm_types.h   |  10 +-
>>  include/linux/mmzone.h     |  12 +-
>>  kernel/fork.c              |   2 +-
>>  mm/memcontrol.c            |   2 +-
>>  mm/vmscan.c                | 955 +++++++++++++++++++------------------
>>  9 files changed, 528 insertions(+), 460 deletions(-)
> 
> 1. There is no need for a new Kconfig -- the condition is simply
>    defined(CONFIG_LRU_GEN) && !defined(arch_has_hw_pte_young)
> 
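For 1): just so I follow, the guard would then be a cpp-only check along
these lines? (my sketch, untested; it relies on arm64/x86 overriding
arch_has_hw_pte_young via #define in their pgtable.h)

	#if defined(CONFIG_LRU_GEN) && !defined(arch_has_hw_pte_young)
	/* don't build the page table walk code */
	#endif
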
> 2. The best practice for disabling static functions is not macros but:
> 
>     static int const_cond(void)
>     {
>         return 1;
>     }
> 
>     int main(void)
>     {
>         int a = const_cond();
> 
>         if (a)
>                 return 0;
> 
>         /* the compiler doesn't generate code for static funcs below */
>         static_func_1();
>         ...
>         static_func_N();
>     }
> 
> LTO also optimizes external functions. But not everyone uses it. So we
> still need macros for them, and of course data structures.
> 
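For the external functions, I assume you mean cpp guards with the usual
static inline stubs, e.g. (sketch only, reusing the config name and a
function from my series):

	#ifdef CONFIG_LRU_TASK_PAGE_AGING
	void lru_gen_add_mm(struct mm_struct *mm);
	#else
	static inline void lru_gen_add_mm(struct mm_struct *mm)
	{
	}
	#endif
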
> 3. In 4/5, you have:
> 
>     @@ -461,6 +461,7 @@ enum {
>      struct lru_gen_mm_state {
>         /* set to max_seq after each iteration */
>         unsigned long seq;
>     +#ifdef CONFIG_LRU_TASK_PAGE_AGING
>         /* where the current iteration continues after */
>         struct list_head *head;
>         /* where the last iteration ended before */
>     @@ -469,6 +470,11 @@ struct lru_gen_mm_state {
>         unsigned long *filters[NR_BLOOM_FILTERS];
>         /* the mm stats for debugging */
>         unsigned long stats[NR_HIST_GENS][NR_MM_STATS];
>     +#else
>     +   /* protect the seq update above */
>     +   /* May be we can use lruvec->lock?  */
>     +   spinlock_t lock;
>     +#endif
>      };
> 
> The answer is yes, and not only that, we don't need lru_gen_mm_state at all.
> 
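If lru_gen_mm_state goes away entirely on these arches, I am guessing the
accessor in your patch looks something like the below? (my guess at the
shape only, untested)

	static struct lru_gen_mm_state *get_mm_state(struct lruvec *lruvec)
	{
	#if defined(CONFIG_LRU_GEN) && defined(arch_has_hw_pte_young)
		return &lruvec->mm_state;
	#else
		return NULL;	/* no mm list walk on this arch */
	#endif
	}
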
> I'm attaching a patch that fixes all above. If you want to post it,
> please feel free -- fully test it please, since I didn't. Otherwise I
> can ask TJ to help make this work for you.
> 
> $ git diff --stat
>  include/linux/memcontrol.h |   2 +-
>  include/linux/mm_types.h   |  12 +-
>  include/linux/mmzone.h     |   2 +
>  kernel/bounds.c            |   6 +-
>  kernel/fork.c              |   2 +-
>  mm/vmscan.c                | 169 +++++++++++++++++++--------
>  6 files changed, 137 insertions(+), 56 deletions(-)
> 
> On x86:
> 
> $ ./scripts/bloat-o-meter mm/vmscan.o.old mm/vmscan.o
> add/remove: 24/34 grow/shrink: 2/7 up/down: 966/-8716 (-7750)
> Function                                     old     new   delta
> ...
> should_skip_vma                              206       -    -206
> get_pte_pfn                                  261       -    -261
> lru_gen_add_mm                               323       -    -323
> lru_gen_seq_show                            1710    1370    -340
> lru_gen_del_mm                               432       -    -432
> reset_batch_size                             572       -    -572
> try_to_inc_max_seq                          2947    1635   -1312
> walk_pmd_range_locked                       1508       -   -1508
> walk_pud_range                              3238       -   -3238
> Total: Before=99449, After=91699, chg -7.79%
> 
> $ objdump -S mm/vmscan.o | grep -A 20 "<try_to_inc_max_seq>:"
> 000000000000a350 <try_to_inc_max_seq>:
> {
>     a350: e8 00 00 00 00        call   a355 <try_to_inc_max_seq+0x5>
>     a355: 55                    push   %rbp
>     a356: 48 89 e5              mov    %rsp,%rbp
>     a359: 41 57                push   %r15
>     a35b: 41 56                push   %r14
>     a35d: 41 55                push   %r13
>     a35f: 41 54                push   %r12
>     a361: 53                    push   %rbx
>     a362: 48 83 ec 70          sub    $0x70,%rsp
>     a366: 41 89 d4              mov    %edx,%r12d
>     a369: 49 89 f6              mov    %rsi,%r14
>     a36c: 49 89 ff              mov    %rdi,%r15
> spin_lock_irq(&lruvec->lru_lock);
>     a36f: 48 8d 5f 50          lea    0x50(%rdi),%rbx
>     a373: 48 89 df              mov    %rbx,%rdi
>     a376: e8 00 00 00 00        call   a37b <try_to_inc_max_seq+0x2b>
> success = max_seq == lrugen->max_seq;
>     a37b: 49 8b 87 88 00 00 00 mov    0x88(%r15),%rax
>     a382: 4c 39 f0              cmp    %r14,%rax

For the below diff:

@@ -4497,14 +4547,16 @@ static bool try_to_inc_max_seq(struct lruvec *lruvec, unsigned long max_seq,
 	struct lru_gen_mm_walk *walk;
 	struct mm_struct *mm = NULL;
 	struct lru_gen_folio *lrugen = &lruvec->lrugen;
+	struct lru_gen_mm_state *mm_state = get_mm_state(lruvec);
 
 	VM_WARN_ON_ONCE(max_seq > READ_ONCE(lrugen->max_seq));
 
+	if (!mm_state)
+		return inc_max_seq(lruvec, max_seq, can_swap, force_scan);
+
 	/* see the comment in iterate_mm_list() */
-	if (max_seq <= READ_ONCE(lruvec->mm_state.seq)) {
-		success = false;
-		goto done;
-	}
+	if (max_seq <= READ_ONCE(mm_state->seq))
+		return false;
 
 	/*
 	 * If the hardware doesn't automatically set the accessed bit, fallback
@@ -4534,8 +4586,10 @@ static bool try_to_inc_max_seq(struct lruvec *lruvec, unsigned long max_seq,
 			walk_mm(lruvec, mm, walk);
 	} while (mm);
 done:
-	if (success)
-		inc_max_seq(lruvec, can_swap, force_scan);
+	if (success) {
+		success = inc_max_seq(lruvec, max_seq, can_swap, force_scan);
+		WARN_ON_ONCE(!success);
+	}
 
 	return success;
 }

We did discuss a possible race that can happen if we allow multiple callers
to hit inc_max_seq() at the same time: inc_max_seq() drops the lru_lock and
then resumes the loop at the previous value of 'type', so another caller can
change the generation state while the lock is dropped. I.e. if we want to do
the above, we might also need the below?


modified   mm/vmscan.c
@@ -4368,6 +4368,7 @@ void inc_max_seq(struct lruvec *lruvec, bool can_swap, bool force_scan)
 	int type, zone;
 	struct lru_gen_struct *lrugen = &lruvec->lrugen;
 
+retry:
 	spin_lock_irq(&lruvec->lru_lock);
 
 	VM_WARN_ON_ONCE(!seq_is_valid(lruvec));
@@ -4381,7 +4382,7 @@ void inc_max_seq(struct lruvec *lruvec, bool can_swap, bool force_scan)
 		while (!inc_min_seq(lruvec, type, can_swap)) {
 			spin_unlock_irq(&lruvec->lru_lock);
 			cond_resched();
-			spin_lock_irq(&lruvec->lru_lock);
+			goto retry;
 		}
 	}
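
The interleaving I am worried about is roughly:

	/*
	 * CPU A				CPU B
	 * spin_lock_irq(&lru_lock)
	 * inc_min_seq() fails for 'type'
	 * spin_unlock_irq(&lru_lock)
	 * cond_resched()			spin_lock_irq(&lru_lock)
	 *					updates min_seq/gen state
	 *					spin_unlock_irq(&lru_lock)
	 * spin_lock_irq(&lru_lock)
	 * resumes with a stale 'type'
	 */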
 
I also found that allowing only one CPU to increment the max_seq value, and
making other requests with the same max_seq return false, is useful in
performance runs. I.e. we need an equivalent of this?


+	if (max_seq <= READ_ONCE(mm_state->seq))
+		return false;
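
Plus the under-lock recheck that your patch already generates (visible in
the objdump above):

	spin_lock_irq(&lruvec->lru_lock);
	/* only the caller whose max_seq is still current proceeds */
	success = max_seq == lrugen->max_seq;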


