From: Nicholas Piggin <npiggin@gmail.com>
To: Andy Lutomirski <luto@kernel.org>, x86@kernel.org
Cc: Andrew Morton <akpm@linux-foundation.org>,
Dave Hansen <dave.hansen@intel.com>,
LKML <linux-kernel@vger.kernel.org>,
linux-mm@kvack.org,
Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
Peter Zijlstra <peterz@infradead.org>
Subject: Re: [PATCH 2/8] x86/mm: Handle unlazying membarrier core sync in the arch code
Date: Wed, 16 Jun 2021 14:25:23 +1000
Message-ID: <1623817261.p1mg6dm2ud.astroid@bobo.none>
In-Reply-To: <571b7e6b6a907e8a1ffc541c3f0005d347406fd0.1623813516.git.luto@kernel.org>
Excerpts from Andy Lutomirski's message of June 16, 2021 1:21 pm:
> The core scheduler isn't a great place for
> membarrier_mm_sync_core_before_usermode() -- the core scheduler
> doesn't actually know whether we are lazy. With the old code, if a
> CPU is running a membarrier-registered task, goes idle, gets unlazied
> via a TLB shootdown IPI, and switches back to the
> membarrier-registered task, it will do an unnecessary core sync.
I don't really mind, but ARM64 has at least hinted that they might need
it at some point. They can always add it back then, but let's check.
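As a rough illustration of the scenario quoted above, here is a toy
model (plain C, not kernel code) that counts serializing operations on
one x86 CPU when a SYNC_CORE-registered task resumes after a
user -> idle -> user round trip. The function and parameter names are
hypothetical. "Old policy" means the sync done in finish_task_switch()
before this patch; "new policy" means the explicit sync this patch adds
to switch_mm_irqs_off().

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model, not kernel code. Returns how many serializing operations
 * (CR3 writes plus explicit sync_core_before_usermode() calls) happen
 * before the registered task returns to user mode.
 */
int serializations(bool old_policy, bool unlazied_by_ipi,
                   bool tlb_stale, bool sync_core_registered)
{
    int n = 0;

    /* switch_mm_irqs_off() */
    if (unlazied_by_ipi) {
        /* CPU was moved to init_mm, so switching back writes CR3. */
        n++;
    } else if (tlb_stale) {
        /* Still lazy on the same mm, but the TLB must be flushed: CR3 write. */
        n++;
    } else if (!old_policy && sync_core_registered) {
        /* New code: no CR3 write, so sync explicitly. */
        n++;
    }

    /* finish_task_switch(): old code synced whenever current->mm == prev_mm. */
    if (old_policy && sync_core_registered)
        n++;

    return n;
}
```

With the old policy and an IPI-driven unlazy, the model reports two
operations: the CR3 write on switching back plus the redundant sync in
finish_task_switch(). The new policy reports exactly one whenever
SYNC_CORE is registered.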
> Conveniently, x86 is the only architecture that does anything in this
> sync_core_before_usermode(), so membarrier_mm_sync_core_before_usermode()
> is a no-op on all other architectures and we can just move the code.
If ARM64 does want it (now, or adds it back later), x86 can always make
membarrier_mm_sync_core_before_usermode() a no-op, with a comment
explaining where it executes the serializing instruction.
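A minimal sketch of that alternative, assuming a hypothetical
TOY_ARCH_X86 switch and stub types so it builds in user space: the
generic hook keeps the logic of the helper this patch removes, and x86
replaces it with a commented no-op. TOY_STATE_SYNC_CORE, toy_current_mm
and toy_core_syncs are illustrative stand-ins, not kernel symbols.

```c
#include <assert.h>

/* Minimal stand-in so the sketch compiles outside the kernel. */
struct mm_struct {
    int membarrier_state;
};

/* Hypothetical stand-in for MEMBARRIER_STATE_PRIVATE_EXPEDITED_SYNC_CORE. */
#define TOY_STATE_SYNC_CORE 0x1

struct mm_struct *toy_current_mm; /* stands in for current->mm */
int toy_core_syncs;               /* counts calls, for illustration only */

static void sync_core_before_usermode(void)
{
    toy_core_syncs++;
}

#ifdef TOY_ARCH_X86
/*
 * x86: deliberately a no-op. switch_mm_irqs_off() either writes CR3,
 * which serializes, or calls sync_core_before_usermode() itself when
 * it unlazies without flushing.
 */
static inline void membarrier_mm_sync_core_before_usermode(struct mm_struct *mm)
{
    (void)mm;
}
#else
/* Generic hook, modelled on the helper this patch removes. */
static inline void membarrier_mm_sync_core_before_usermode(struct mm_struct *mm)
{
    if (toy_current_mm != mm)
        return;
    if (!(mm->membarrier_state & TOY_STATE_SYNC_CORE))
        return;
    sync_core_before_usermode();
}
#endif
```

The scheduler would keep calling the hook unconditionally; only the
architecture decides whether it does anything.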
I'm fine with the patch, though, except that I would keep the comment
in the core sched code saying that an arch-specific sequence to deal
with SYNC_CORE is required for that case.
Thanks,
Nick
>
> (I am not claiming that the SYNC_CORE code was correct before or after this
> change on any non-x86 architecture. I merely claim that this change
> improves readability, is correct on x86, and makes no change on any other
> architecture.)
>
> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> Cc: Nicholas Piggin <npiggin@gmail.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Signed-off-by: Andy Lutomirski <luto@kernel.org>
> ---
> arch/x86/mm/tlb.c | 53 +++++++++++++++++++++++++++++++---------
> include/linux/sched/mm.h | 13 ----------
> kernel/sched/core.c | 13 ++++------
> 3 files changed, 46 insertions(+), 33 deletions(-)
>
> diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
> index 78804680e923..59488d663e68 100644
> --- a/arch/x86/mm/tlb.c
> +++ b/arch/x86/mm/tlb.c
> @@ -8,6 +8,7 @@
> #include <linux/export.h>
> #include <linux/cpu.h>
> #include <linux/debugfs.h>
> +#include <linux/sched/mm.h>
>
> #include <asm/tlbflush.h>
> #include <asm/mmu_context.h>
> @@ -473,16 +474,24 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
> this_cpu_write(cpu_tlbstate_shared.is_lazy, false);
>
> /*
> - * The membarrier system call requires a full memory barrier and
> - * core serialization before returning to user-space, after
> - * storing to rq->curr, when changing mm. This is because
> - * membarrier() sends IPIs to all CPUs that are in the target mm
> - * to make them issue memory barriers. However, if another CPU
> - * switches to/from the target mm concurrently with
> - * membarrier(), it can cause that CPU not to receive an IPI
> - * when it really should issue a memory barrier. Writing to CR3
> - * provides that full memory barrier and core serializing
> - * instruction.
> + * membarrier() support requires that, when we change rq->curr->mm:
> + *
> + * - If next->mm has membarrier registered, a full memory barrier
> + * after writing rq->curr (or rq->curr->mm if we switched the mm
> + * without switching tasks) and before returning to user mode.
> + *
> + * - If next->mm has SYNC_CORE registered, then we sync core before
> + * returning to user mode.
> + *
> + * In the case where prev->mm == next->mm, membarrier() uses an IPI
> + * instead, and no particular barriers are needed while context
> + * switching.
> + *
> + * x86 gets all of this as a side-effect of writing to CR3 except
> + * in the case where we unlazy without flushing.
> + *
> + * All other architectures are civilized and do all of this implicitly
> + * when transitioning from kernel to user mode.
> */
> if (real_prev == next) {
> VM_WARN_ON(this_cpu_read(cpu_tlbstate.ctxs[prev_asid].ctx_id) !=
> @@ -500,7 +509,8 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
> /*
> * If the CPU is not in lazy TLB mode, we are just switching
> * from one thread in a process to another thread in the same
> - * process. No TLB flush required.
> + * process. No TLB flush or membarrier() synchronization
> + * is required.
> */
> if (!was_lazy)
> return;
> @@ -510,16 +520,35 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
> * If the TLB is up to date, just use it.
> * The barrier synchronizes with the tlb_gen increment in
> * the TLB shootdown code.
> + *
> + * As a future optimization opportunity, it's plausible
> + * that the x86 memory model is strong enough that this
> + * smp_mb() isn't needed.
> */
> smp_mb();
> next_tlb_gen = atomic64_read(&next->context.tlb_gen);
> if (this_cpu_read(cpu_tlbstate.ctxs[prev_asid].tlb_gen) ==
> - next_tlb_gen)
> + next_tlb_gen) {
> +#ifdef CONFIG_MEMBARRIER
> + /*
> + * We switched logical mm but we're not going to
> + * write to CR3. We already did smp_mb() above,
> + * but membarrier() might require a sync_core()
> + * as well.
> + */
> + if (unlikely(atomic_read(&next->membarrier_state) &
> + MEMBARRIER_STATE_PRIVATE_EXPEDITED_SYNC_CORE))
> + sync_core_before_usermode();
> +#endif
> +
> return;
> + }
>
> /*
> * TLB contents went out of date while we were in lazy
> * mode. Fall through to the TLB switching code below.
> + * No need for an explicit membarrier invocation -- the CR3
> + * write will serialize.
> */
> new_asid = prev_asid;
> need_flush = true;
> diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
> index e24b1fe348e3..24d97d1b6252 100644
> --- a/include/linux/sched/mm.h
> +++ b/include/linux/sched/mm.h
> @@ -345,16 +345,6 @@ enum {
> #include <asm/membarrier.h>
> #endif
>
> -static inline void membarrier_mm_sync_core_before_usermode(struct mm_struct *mm)
> -{
> - if (current->mm != mm)
> - return;
> - if (likely(!(atomic_read(&mm->membarrier_state) &
> - MEMBARRIER_STATE_PRIVATE_EXPEDITED_SYNC_CORE)))
> - return;
> - sync_core_before_usermode();
> -}
> -
> extern void membarrier_exec_mmap(struct mm_struct *mm);
>
> extern void membarrier_update_current_mm(struct mm_struct *next_mm);
> @@ -370,9 +360,6 @@ static inline void membarrier_arch_switch_mm(struct mm_struct *prev,
> static inline void membarrier_exec_mmap(struct mm_struct *mm)
> {
> }
> -static inline void membarrier_mm_sync_core_before_usermode(struct mm_struct *mm)
> -{
> -}
> static inline void membarrier_update_current_mm(struct mm_struct *next_mm)
> {
> }
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 5226cc26a095..e4c122f8bf21 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -4220,22 +4220,19 @@ static struct rq *finish_task_switch(struct task_struct *prev)
> kmap_local_sched_in();
>
> fire_sched_in_preempt_notifiers(current);
> +
> /*
> * When switching through a kernel thread, the loop in
> * membarrier_{private,global}_expedited() may have observed that
> * kernel thread and not issued an IPI. It is therefore possible to
> * schedule between user->kernel->user threads without passing though
> * switch_mm(). Membarrier requires a barrier after storing to
> - * rq->curr, before returning to userspace, so provide them here:
> - *
> - * - a full memory barrier for {PRIVATE,GLOBAL}_EXPEDITED, implicitly
> - * provided by mmdrop(),
> - * - a sync_core for SYNC_CORE.
> + * rq->curr, before returning to userspace, and mmdrop() provides
> + * this barrier.
> */
> - if (mm) {
> - membarrier_mm_sync_core_before_usermode(mm);
> + if (mm)
> mmdrop(mm);
> - }
> +
> if (unlikely(prev_state == TASK_DEAD)) {
> if (prev->sched_class->task_dead)
> prev->sched_class->task_dead(prev);
> --
> 2.31.1
>
>
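The paths through switch_mm_irqs_off() after this patch, and what
satisfies the membarrier() requirements on each, can be summarised as a
small decision function. This is a hedged sketch derived from the
comments in the diff; enum mb_path and classify() are hypothetical
names, and the parameters abstract over real_prev == next, was_lazy,
the tlb_gen comparison and the SYNC_CORE bit in next->membarrier_state.

```c
#include <assert.h>
#include <stdbool.h>

/* What covers the membarrier() requirements on each path (toy labels). */
enum mb_path {
    MB_NOTHING_NEEDED, /* same mm, not lazy: membarrier() IPIs instead */
    MB_SMP_MB_ONLY,    /* unlazy without flush: smp_mb() is enough */
    MB_EXPLICIT_SYNC,  /* unlazy without flush, SYNC_CORE: smp_mb() + sync */
    MB_CR3_WRITE,      /* new mm or stale TLB: the CR3 write serializes */
};

enum mb_path classify(bool loaded_mm_is_next, bool was_lazy,
                      bool tlb_up_to_date, bool sync_core_registered)
{
    if (loaded_mm_is_next) {
        if (!was_lazy)
            return MB_NOTHING_NEEDED;
        if (tlb_up_to_date)
            return sync_core_registered ? MB_EXPLICIT_SYNC : MB_SMP_MB_ONLY;
        /* Stale TLB: fall through to the flush, which writes CR3. */
    }
    return MB_CR3_WRITE;
}
```

Only one of the four paths needs explicit SYNC_CORE code; the others are
covered by the CR3 write, by the existing smp_mb(), or, for a plain
thread switch within one mm, by the IPI that membarrier() sends instead.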