From: Nicholas Piggin <npiggin@gmail.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Nicholas Piggin <npiggin@gmail.com>, Randy Dunlap <rdunlap@infradead.org>,
	linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org,
	linuxppc-dev@lists.ozlabs.org, linux-mm@kvack.org,
	Anton Blanchard <anton@ozlabs.org>, Andy Lutomirski <luto@kernel.org>
Subject: [PATCH v3 3/4] lazy tlb: shoot lazies, a non-refcounting lazy tlb option
Date: Tue, 1 Jun 2021 16:23:02 +1000
Message-ID: <20210601062303.3932513-4-npiggin@gmail.com>
In-Reply-To: <20210601062303.3932513-1-npiggin@gmail.com>

On big systems, the mm refcount can become highly contended when doing
a lot of context switching with threaded applications (particularly
switching between the idle thread and an application thread).

Abandoning lazy tlb slows switching down quite a bit in the important
user->idle->user cases, so instead implement a non-refcounted scheme
that causes __mmdrop() to IPI all CPUs in the mm_cpumask and shoot down
any remaining lazy ones.

Shootdown IPIs are some concern, but they have not been observed to be
a big problem with this scheme (the powerpc implementation generated
314 additional interrupts on a 144 CPU system during a kernel compile).
There are a number of strategies that could be employed to reduce IPIs
if they turn out to be a problem for some workload.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/Kconfig  | 14 +++++++++++++-
 kernel/fork.c | 52 +++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 65 insertions(+), 1 deletion(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index 276e1c1c0219..91e1882e3284 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -439,11 +439,23 @@ config NO_MMU_LAZY_TLB
 	def_bool n
 
 # Use normal mm refcounting for MMU_LAZY_TLB kernel thread references.
-# For now, this must be enabled if MMU_LAZY_TLB is enabled.
 config MMU_LAZY_TLB_REFCOUNT
 	def_bool y
 	depends on MMU_LAZY_TLB
 
+# Instead of refcounting the lazy mm struct for kernel thread references
+# (which can cause contention with multi-threaded apps on large multiprocessor
+# systems), this option causes __mmdrop to IPI all CPUs in the mm_cpumask and
+# switch to init_mm if they were using the to-be-freed mm as the lazy tlb. To
+# implement this, architectures must use _lazy_tlb variants of mm refcounting
+# when releasing kernel thread mm references, and mm_cpumask must include at
+# least all possible CPUs in which the mm might be lazy, at the time of the
+# final mmdrop. mmgrab/mmdrop in arch/ code must be switched to _lazy_tlb
+# postfix as necessary.
+config MMU_LAZY_TLB_SHOOTDOWN
+	bool
+	depends on MMU_LAZY_TLB
+
 config ARCH_HAVE_NMI_SAFE_CMPXCHG
 	bool
 
diff --git a/kernel/fork.c b/kernel/fork.c
index dc06afd725cb..d485c24426a0 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -674,6 +674,53 @@ static void check_mm(struct mm_struct *mm)
 #define allocate_mm()	(kmem_cache_alloc(mm_cachep, GFP_KERNEL))
 #define free_mm(mm)	(kmem_cache_free(mm_cachep, (mm)))
 
+static void do_shoot_lazy_tlb(void *arg)
+{
+	struct mm_struct *mm = arg;
+
+	if (current->active_mm == mm) {
+		WARN_ON_ONCE(current->mm);
+		current->active_mm = &init_mm;
+		switch_mm(mm, &init_mm, current);
+	}
+}
+
+static void do_check_lazy_tlb(void *arg)
+{
+	struct mm_struct *mm = arg;
+
+	WARN_ON_ONCE(current->active_mm == mm);
+}
+
+static void shoot_lazy_tlbs(struct mm_struct *mm)
+{
+	if (IS_ENABLED(CONFIG_MMU_LAZY_TLB_SHOOTDOWN)) {
+		/*
+		 * IPI overheads have not been found to be expensive, but they
+		 * could be reduced in a number of possible ways, for example
+		 * (in roughly increasing order of complexity):
+		 * - A batch of mms requiring IPIs could be gathered and freed
+		 *   at once.
+		 * - CPUs could store their active mm somewhere that can be
+		 *   remotely checked without a lock, to filter out
+		 *   false-positives in the cpumask.
+		 * - After mm_users or mm_count reaches zero, switching away
+		 *   from the mm could clear mm_cpumask to reduce some IPIs
+		 *   (some batching or delaying would help).
+		 * - A delayed freeing and RCU-like quiescing sequence based on
+		 *   mm switching to avoid IPIs completely.
+		 */
+		on_each_cpu_mask(mm_cpumask(mm), do_shoot_lazy_tlb, (void *)mm, 1);
+		if (IS_ENABLED(CONFIG_DEBUG_VM))
+			on_each_cpu(do_check_lazy_tlb, (void *)mm, 1);
+	} else {
+		/*
+		 * In this case, lazy tlb mms are refcounted and would not reach
+		 * __mmdrop until all CPUs have switched away and mmdrop()ed.
+		 */
+	}
+}
+
 /*
  * Called when the last reference to the mm
  * is dropped: either by a lazy thread or by
@@ -683,7 +730,12 @@ void __mmdrop(struct mm_struct *mm)
 {
 	BUG_ON(mm == &init_mm);
 	WARN_ON_ONCE(mm == current->mm);
+
+	/* Ensure no CPUs are using this as their lazy tlb mm */
+	shoot_lazy_tlbs(mm);
+
 	WARN_ON_ONCE(mm == current->active_mm);
+
 	mm_free_pgd(mm);
 	destroy_context(mm);
 	mmu_notifier_subscriptions_destroy(mm);
-- 
2.23.0
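For readers who want to poke at the scheme outside the kernel, here is a rough userspace model of what the patch does at the final mmdrop. It is only a sketch under stated assumptions, not kernel code: struct mm, struct cpu, the 32-bit cpumask, NR_CPUS, nr_ipis and every helper name below are hypothetical stand-ins, and the "IPI" is an ordinary function call. The loop in shoot_lazy() stands in for on_each_cpu_mask(), and shoot_one() mirrors the shape of do_shoot_lazy_tlb().

```c
/*
 * Userspace model only. None of these types or helpers are the kernel's;
 * they are illustrative stand-ins for mm_struct, mm_cpumask and IPIs.
 */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NR_CPUS 8

struct mm {
	uint32_t cpumask;	/* bit n set: CPU n may still have this mm loaded */
};

struct cpu {
	struct mm *mm;		/* user mm of the running task, NULL for a kthread */
	struct mm *active_mm;	/* address space actually loaded on this CPU */
};

static struct mm init_mm;
static struct cpu cpus[NR_CPUS];
static unsigned int nr_ipis;	/* how many CPUs the model "interrupted" */

/* Put every CPU back on a kernel thread running on init_mm. */
static void model_reset(void)
{
	for (int i = 0; i < NR_CPUS; i++) {
		cpus[i].mm = NULL;
		cpus[i].active_mm = &init_mm;
	}
	nr_ipis = 0;
}

/* Schedule a user task owning @mm onto @cpu and record it in the mask. */
static void run_user(int cpu, struct mm *mm)
{
	cpus[cpu].mm = mm;
	cpus[cpu].active_mm = mm;
	mm->cpumask |= UINT32_C(1) << cpu;
}

/* Switch @cpu to a kernel thread; the old mm stays loaded, no refcount. */
static void run_kthread_lazy(int cpu)
{
	cpus[cpu].mm = NULL;
}

/* Per-CPU handler, modelled on do_shoot_lazy_tlb(). */
static void shoot_one(int cpu, struct mm *mm)
{
	nr_ipis++;
	if (cpus[cpu].active_mm == mm) {
		assert(cpus[cpu].mm == NULL);	/* only lazy users may remain */
		cpus[cpu].active_mm = &init_mm;
	}
}

/* Modelled on shoot_lazy_tlbs(): visit only the CPUs set in the mask. */
static void shoot_lazy(struct mm *mm)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (mm->cpumask & (UINT32_C(1) << cpu))
			shoot_one(cpu, mm);
}

/* Modelled on the DEBUG_VM check: 1 if no CPU still has @mm loaded. */
static int mm_unused(const struct mm *mm)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (cpus[cpu].active_mm == mm)
			return 0;
	return 1;
}
```

To drive it, call model_reset(), run a task on a couple of CPUs with run_user(), let them go lazy with run_kthread_lazy(), and then call shoot_lazy() where the kernel would reach __mmdrop(). A CPU that has since moved on to another mm but is still set in the mask is counted in nr_ipis without being switched, which is the false-positive case the comment in shoot_lazy_tlbs() mentions.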