linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Ingo Molnar <mingo@kernel.org>
To: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel@vger.kernel.org, "levi . yun" <yeoreum.yun@arm.com>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	stable@vger.kernel.org, Steven Rostedt <rostedt@goodmis.org>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Juri Lelli <juri.lelli@redhat.com>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
	Daniel Bristot de Oliveira <bristot@redhat.com>,
	Valentin Schneider <vschneid@redhat.com>,
	Mark Rutland <mark.rutland@arm.com>,
	Will Deacon <will@kernel.org>, Aaron Lu <aaron.lu@intel.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Borislav Petkov <bp@alien8.de>, "H. Peter Anvin" <hpa@zytor.com>,
	Arnd Bergmann <arnd@arndb.de>,
	linux-arch@vger.kernel.org, linux-mm@kvack.org, x86@kernel.org
Subject: Re: [PATCH] sched: Add missing memory barrier in switch_mm_cid
Date: Fri, 12 Apr 2024 12:22:56 +0200	[thread overview]
Message-ID: <ZhkLgJ2ZkI3JO0m/@gmail.com> (raw)
In-Reply-To: <20240411174302.353889-1-mathieu.desnoyers@efficios.com>


* Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:

> Many architectures' switch_mm() (e.g. arm64) do not have an smp_mb()
> which the core scheduler code has depended upon since commit:
> 
>     commit 223baf9d17f25 ("sched: Fix performance regression introduced by mm_cid")
> 
> If switch_mm() doesn't call smp_mb(), sched_mm_cid_remote_clear() can
> unset the actively used cid when it fails to observe active task after it
> sets lazy_put.
> 
> There *is* a memory barrier between storing to rq->curr and _return to
> userspace_ (as required by membarrier), but the rseq mm_cid has stricter
> requirements: the barrier needs to be issued between store to rq->curr
> and switch_mm_cid(), which happens earlier than:
> 
> - spin_unlock(),
> - switch_to().
> 
> So it's fine when the architecture switch_mm() happens to have that
> barrier already, but less so when the architecture only provides the
> full barrier in switch_to() or spin_unlock().
> 
> It is a bug in the rseq switch_mm_cid() implementation. All architectures
> that don't have memory barriers in switch_mm(), but rather have the full
> barrier either in finish_lock_switch() or switch_to() have them too late
> for the needs of switch_mm_cid().
> 
> Introduce a new smp_mb__after_switch_mm(), defined as smp_mb() in the
> generic barrier.h header, and use it in switch_mm_cid() for scheduler
> transitions where switch_mm() is expected to provide a memory barrier.
> 
> Architectures can override smp_mb__after_switch_mm() if their
> switch_mm() implementation provides an implicit memory barrier.
> Override it with a no-op on x86 which implicitly provide this memory
> barrier by writing to CR3.
> 
> Link: https://lore.kernel.org/lkml/20240305145335.2696125-1-yeoreum.yun@arm.com/
> Reported-by: levi.yun <yeoreum.yun@arm.com>
> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> # for arm64
> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> # for x86
> Fixes: 223baf9d17f2 ("sched: Fix performance regression introduced by mm_cid")
> Cc: <stable@vger.kernel.org> # 6.4.x
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Vincent Guittot <vincent.guittot@linaro.org>
> Cc: Juri Lelli <juri.lelli@redhat.com>
> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
> Cc: Ben Segall <bsegall@google.com>
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
> Cc: Valentin Schneider <vschneid@redhat.com>
> Cc: levi.yun <yeoreum.yun@arm.com>
> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Will Deacon <will@kernel.org>
> Cc: Aaron Lu <aaron.lu@intel.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Arnd Bergmann <arnd@arndb.de>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: linux-arch@vger.kernel.org
> Cc: linux-mm@kvack.org
> Cc: x86@kernel.org
> ---
>  arch/x86/include/asm/barrier.h |  3 +++
>  include/asm-generic/barrier.h  |  8 ++++++++
>  kernel/sched/sched.h           | 20 ++++++++++++++------
>  3 files changed, 25 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/x86/include/asm/barrier.h b/arch/x86/include/asm/barrier.h
> index 0216f63a366b..d0795b5fab46 100644
> --- a/arch/x86/include/asm/barrier.h
> +++ b/arch/x86/include/asm/barrier.h
> @@ -79,6 +79,9 @@ do {									\
>  #define __smp_mb__before_atomic()	do { } while (0)
>  #define __smp_mb__after_atomic()	do { } while (0)
>  
> +/* Writing to CR3 provides a full memory barrier in switch_mm(). */
> +#define smp_mb__after_switch_mm()	do { } while (0)
> +
>  #include <asm-generic/barrier.h>
>  
>  #endif /* _ASM_X86_BARRIER_H */
> diff --git a/include/asm-generic/barrier.h b/include/asm-generic/barrier.h
> index 961f4d88f9ef..5a6c94d7a598 100644
> --- a/include/asm-generic/barrier.h
> +++ b/include/asm-generic/barrier.h
> @@ -296,5 +296,13 @@ do {									\
>  #define io_stop_wc() do { } while (0)
>  #endif
>  
> +/*
> + * Architectures that guarantee an implicit smp_mb() in switch_mm()
> + * can override smp_mb__after_switch_mm.
> + */
> +#ifndef smp_mb__after_switch_mm
> +#define smp_mb__after_switch_mm()	smp_mb()
> +#endif
> +
>  #endif /* !__ASSEMBLY__ */
>  #endif /* __ASM_GENERIC_BARRIER_H */
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index 001fe047bd5d..35717359d3ca 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -79,6 +79,8 @@
>  # include <asm/paravirt_api_clock.h>
>  #endif
>  
> +#include <asm/barrier.h>
> +
>  #include "cpupri.h"
>  #include "cpudeadline.h"
>  
> @@ -3445,13 +3447,19 @@ static inline void switch_mm_cid(struct rq *rq,
>  		 * between rq->curr store and load of {prev,next}->mm->pcpu_cid[cpu].
>  		 * Provide it here.
>  		 */
> -		if (!prev->mm)                          // from kernel
> +		if (!prev->mm) {                        // from kernel
>  			smp_mb();
> -		/*
> -		 * user -> user transition guarantees a memory barrier through
> -		 * switch_mm() when current->mm changes. If current->mm is
> -		 * unchanged, no barrier is needed.
> -		 */
> +		} else {				// from user
> +			/*
> +			 * user -> user transition relies on an implicit
> +			 * memory barrier in switch_mm() when
> +			 * current->mm changes. If the architecture
> +			 * switch_mm() does not have an implicit memory
> +			 * barrier, it is emitted here.  If current->mm
> +			 * is unchanged, no barrier is needed.
> +			 */
> +			smp_mb__after_switch_mm();
> +		}
>  	}
>  	if (prev->mm_cid_active) {
>  		mm_cid_snapshot_time(rq, prev->mm);

Please move switch_mm_cid() from sched.h to core.c, where its only user 
resides.

Thanks,

	Ingo

  reply	other threads:[~2024-04-12 10:23 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-04-11 17:43 [PATCH] sched: Add missing memory barrier in switch_mm_cid Mathieu Desnoyers
2024-04-12 10:22 ` Ingo Molnar [this message]
2024-04-12 14:38   ` Mathieu Desnoyers
  -- strict thread matches above, loose matches on Subject: below --
2024-03-08 15:07 Mathieu Desnoyers
2024-03-19  9:20 ` Yeo Reum Yun
2024-04-08  9:38   ` Yeo Reum Yun
2024-04-10 15:22     ` Mathieu Desnoyers
2024-04-09 10:17 ` Catalin Marinas
2024-04-10 17:18 ` Mathieu Desnoyers
2024-04-11 14:29   ` Dave Hansen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZhkLgJ2ZkI3JO0m/@gmail.com \
    --to=mingo@kernel.org \
    --cc=aaron.lu@intel.com \
    --cc=akpm@linux-foundation.org \
    --cc=arnd@arndb.de \
    --cc=bp@alien8.de \
    --cc=bristot@redhat.com \
    --cc=bsegall@google.com \
    --cc=catalin.marinas@arm.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=hpa@zytor.com \
    --cc=juri.lelli@redhat.com \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mark.rutland@arm.com \
    --cc=mathieu.desnoyers@efficios.com \
    --cc=mgorman@suse.de \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=stable@vger.kernel.org \
    --cc=tglx@linutronix.de \
    --cc=vincent.guittot@linaro.org \
    --cc=vschneid@redhat.com \
    --cc=will@kernel.org \
    --cc=x86@kernel.org \
    --cc=yeoreum.yun@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).