linux-kernel.vger.kernel.org archive mirror
* [PATCH v2 0/4] membarrier fixes
@ 2020-12-02 15:35 Andy Lutomirski
  2020-12-02 15:35 ` [PATCH v2 1/4] x86/membarrier: Get rid of a dubious optimization Andy Lutomirski
                   ` (3 more replies)
  0 siblings, 4 replies; 8+ messages in thread
From: Andy Lutomirski @ 2020-12-02 15:35 UTC (permalink / raw)
  To: x86, Mathieu Desnoyers
  Cc: LKML, Nicholas Piggin, Arnd Bergmann, Anton Blanchard, Andy Lutomirski

Hi all-

This is v2, and this time around everything is tagged for -stable.

Changes from v1:
 - patch 1: comment fixes from Mathieu
 - patch 2: improved comments
 - patch 3: split out as a separate patch
 - patch 4: now has a proper explanation

Mathieu, I think we have to make sync_core sync the caller.  See patch 4.

Andy Lutomirski (4):
  x86/membarrier: Get rid of a dubious optimization
  membarrier: Add an actual barrier before rseq_preempt()
  membarrier: Explicitly sync remote cores when SYNC_CORE is requested
  membarrier: Execute SYNC_CORE on the calling thread

 arch/x86/include/asm/sync_core.h |  9 ++--
 arch/x86/mm/tlb.c                | 10 ++++-
 kernel/sched/membarrier.c        | 75 ++++++++++++++++++++++++--------
 3 files changed, 71 insertions(+), 23 deletions(-)

-- 
2.28.0


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH v2 1/4] x86/membarrier: Get rid of a dubious optimization
  2020-12-02 15:35 [PATCH v2 0/4] membarrier fixes Andy Lutomirski
@ 2020-12-02 15:35 ` Andy Lutomirski
  2020-12-02 15:35 ` [PATCH v2 2/4] membarrier: Add an actual barrier before rseq_preempt() Andy Lutomirski
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 8+ messages in thread
From: Andy Lutomirski @ 2020-12-02 15:35 UTC (permalink / raw)
  To: x86, Mathieu Desnoyers
  Cc: LKML, Nicholas Piggin, Arnd Bergmann, Anton Blanchard,
	Andy Lutomirski, stable

sync_core_before_usermode() had an incorrect optimization.  If we're
in an IRQ, we can get to usermode without IRET -- we just have to
schedule to a different task in the same mm and do SYSRET.
Fortunately, there were no callers of sync_core_before_usermode()
that could have had in_irq() or in_nmi() equal to true, because it's
only ever called from the scheduler.
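
(For reference, the lone call site is reached from the scheduler's
task-switch path via a helper that looks roughly like the sketch below;
paraphrased for illustration, not part of this patch.)

        /* Paraphrase of membarrier_mm_sync_core_before_usermode(). */
        static inline void membarrier_mm_sync_core_before_usermode(struct mm_struct *mm)
        {
                if (current->mm != mm)
                        return;
                if (likely(!(atomic_read(&mm->membarrier_state) &
                             MEMBARRIER_STATE_PRIVATE_EXPEDITED_SYNC_CORE)))
                        return;
                sync_core_before_usermode();
        }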

While we're at it, clarify a related comment.

Cc: stable@vger.kernel.org
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/include/asm/sync_core.h |  9 +++++----
 arch/x86/mm/tlb.c                | 10 ++++++++--
 2 files changed, 13 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/sync_core.h b/arch/x86/include/asm/sync_core.h
index 0fd4a9dfb29c..ab7382f92aff 100644
--- a/arch/x86/include/asm/sync_core.h
+++ b/arch/x86/include/asm/sync_core.h
@@ -98,12 +98,13 @@ static inline void sync_core_before_usermode(void)
 	/* With PTI, we unconditionally serialize before running user code. */
 	if (static_cpu_has(X86_FEATURE_PTI))
 		return;
+
 	/*
-	 * Return from interrupt and NMI is done through iret, which is core
-	 * serializing.
+	 * Even if we're in an interrupt, we might reschedule before returning,
+	 * in which case we could switch to a different thread in the same mm
+	 * and return using SYSRET or SYSEXIT.  Instead of trying to keep
+	 * track of our need to sync the core, just sync right away.
 	 */
-	if (in_irq() || in_nmi())
-		return;
 	sync_core();
 }
 
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 11666ba19b62..569ac1d57f55 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -474,8 +474,14 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
 	/*
 	 * The membarrier system call requires a full memory barrier and
 	 * core serialization before returning to user-space, after
-	 * storing to rq->curr. Writing to CR3 provides that full
-	 * memory barrier and core serializing instruction.
+	 * storing to rq->curr, when changing mm.  This is because
+	 * membarrier() sends IPIs to all CPUs that are in the target mm
+	 * to make them issue memory barriers.  However, if another CPU
+	 * switches to/from the target mm concurrently with
+	 * membarrier(), it can cause that CPU not to receive an IPI
+	 * when it really should issue a memory barrier.  Writing to CR3
+	 * provides that full memory barrier and core serializing
+	 * instruction.
 	 */
 	if (real_prev == next) {
 		VM_WARN_ON(this_cpu_read(cpu_tlbstate.ctxs[prev_asid].ctx_id) !=
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH v2 2/4] membarrier: Add an actual barrier before rseq_preempt()
  2020-12-02 15:35 [PATCH v2 0/4] membarrier fixes Andy Lutomirski
  2020-12-02 15:35 ` [PATCH v2 1/4] x86/membarrier: Get rid of a dubious optimization Andy Lutomirski
@ 2020-12-02 15:35 ` Andy Lutomirski
  2020-12-02 19:40   ` Mathieu Desnoyers
  2020-12-02 15:35 ` [PATCH v2 3/4] membarrier: Explicitly sync remote cores when SYNC_CORE is requested Andy Lutomirski
  2020-12-02 15:35 ` [PATCH v2 4/4] membarrier: Execute SYNC_CORE on the calling thread Andy Lutomirski
  3 siblings, 1 reply; 8+ messages in thread
From: Andy Lutomirski @ 2020-12-02 15:35 UTC (permalink / raw)
  To: x86, Mathieu Desnoyers
  Cc: LKML, Nicholas Piggin, Arnd Bergmann, Anton Blanchard,
	Andy Lutomirski, stable

It seems to me that most RSEQ membarrier users will expect any
stores done before the membarrier() syscall to be visible to the
target task(s).  While this is extremely likely to be true in
practice, nothing actually guarantees it by a strict reading of the
x86 manuals.  Rather than providing this guarantee by accident and
potentially causing a problem down the road, just add an explicit
barrier.
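
As an illustration of that expectation, here is a userspace sketch (not
part of this patch; it assumes uapi headers new enough to define the
RSEQ command and that the process has already registered with
MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_RSEQ):

        #include <linux/membarrier.h>
        #include <stdatomic.h>
        #include <sys/syscall.h>
        #include <unistd.h>

        static atomic_int shared_flag;

        static void publish_then_fence(void)
        {
                /* A store done before the membarrier() syscall... */
                atomic_store_explicit(&shared_flag, 1, memory_order_relaxed);

                /*
                 * ...which the caller expects to be visible to any rseq
                 * critical section that the resulting IPI restarts on a
                 * remote CPU.  The smp_mb() added in ipi_rseq() makes
                 * that expectation hold by construction.
                 */
                syscall(__NR_membarrier,
                        MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ, 0, 0);
        }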

Cc: stable@vger.kernel.org
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 kernel/sched/membarrier.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c
index 5a40b3828ff2..6251d3d12abe 100644
--- a/kernel/sched/membarrier.c
+++ b/kernel/sched/membarrier.c
@@ -168,6 +168,14 @@ static void ipi_mb(void *info)
 
 static void ipi_rseq(void *info)
 {
+	/*
+	 * Ensure that all stores done by the calling thread are visible
+	 * to the current task before the current task resumes.  We could
+	 * probably optimize this away on most architectures, but by the
+	 * time we've already sent an IPI, the cost of the extra smp_mb()
+	 * is negligible.
+	 */
+	smp_mb();
 	rseq_preempt(current);
 }
 
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH v2 3/4] membarrier: Explicitly sync remote cores when SYNC_CORE is requested
  2020-12-02 15:35 [PATCH v2 0/4] membarrier fixes Andy Lutomirski
  2020-12-02 15:35 ` [PATCH v2 1/4] x86/membarrier: Get rid of a dubious optimization Andy Lutomirski
  2020-12-02 15:35 ` [PATCH v2 2/4] membarrier: Add an actual barrier before rseq_preempt() Andy Lutomirski
@ 2020-12-02 15:35 ` Andy Lutomirski
  2020-12-02 19:43   ` Mathieu Desnoyers
  2020-12-02 15:35 ` [PATCH v2 4/4] membarrier: Execute SYNC_CORE on the calling thread Andy Lutomirski
  3 siblings, 1 reply; 8+ messages in thread
From: Andy Lutomirski @ 2020-12-02 15:35 UTC (permalink / raw)
  To: x86, Mathieu Desnoyers
  Cc: LKML, Nicholas Piggin, Arnd Bergmann, Anton Blanchard,
	Andy Lutomirski, stable

membarrier() does not explicitly sync_core() remote CPUs; instead, it
relies on the assumption that an IPI will result in a core sync.  On
x86, I think this may be true in practice, but it's not architecturally
reliable.  In particular, the SDM and APM do not appear to guarantee
that interrupt delivery is serializing.  While IRET does serialize, IPI
return can schedule, thereby switching to another task in the same mm
that was sleeping in a syscall.  The new task could then SYSRET back to
usermode without ever executing IRET.

Make this more robust by explicitly calling sync_core_before_usermode()
on remote cores.  (This also helps people who search the kernel tree for
instances of sync_core() and sync_core_before_usermode() -- one might be
surprised that the core membarrier code doesn't currently show up in
such a search.)
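
(For anyone doing that search: on architectures that do not select
ARCH_HAS_SYNC_CORE_BEFORE_USERMODE, the call resolves to the generic
stub, which reads roughly as follows -- paraphrased, not part of this
patch.)

        /* Paraphrase of include/linux/sync_core.h */
        #ifdef CONFIG_ARCH_HAS_SYNC_CORE_BEFORE_USERMODE
        #include <asm/sync_core.h>
        #else
        /*
         * Architectures that always return to usermode through a
         * core-serializing instruction can use this empty stub.
         */
        static inline void sync_core_before_usermode(void)
        {
        }
        #endif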

Cc: stable@vger.kernel.org
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 kernel/sched/membarrier.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c
index 6251d3d12abe..01538b31f27e 100644
--- a/kernel/sched/membarrier.c
+++ b/kernel/sched/membarrier.c
@@ -166,6 +166,23 @@ static void ipi_mb(void *info)
 	smp_mb();	/* IPIs should be serializing but paranoid. */
 }
 
+static void ipi_sync_core(void *info)
+{
+	/*
+	 * The smp_mb() in membarrier after all the IPIs is supposed to
+	 * ensure that memory accesses on remote CPUs that occur before the IPI
+	 * become visible to membarrier()'s caller -- see scenario B in
+	 * the big comment at the top of this file.
+	 *
+	 * A sync_core() would provide this guarantee, but
+	 * sync_core_before_usermode() might end up being deferred until
+	 * after membarrier()'s smp_mb().
+	 */
+	smp_mb();	/* IPIs should be serializing but paranoid. */
+
+	sync_core_before_usermode();
+}
+
 static void ipi_rseq(void *info)
 {
 	/*
@@ -301,6 +318,7 @@ static int membarrier_private_expedited(int flags, int cpu_id)
 		if (!(atomic_read(&mm->membarrier_state) &
 		      MEMBARRIER_STATE_PRIVATE_EXPEDITED_SYNC_CORE_READY))
 			return -EPERM;
+		ipi_func = ipi_sync_core;
 	} else if (flags == MEMBARRIER_FLAG_RSEQ) {
 		if (!IS_ENABLED(CONFIG_RSEQ))
 			return -EINVAL;
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH v2 4/4] membarrier: Execute SYNC_CORE on the calling thread
  2020-12-02 15:35 [PATCH v2 0/4] membarrier fixes Andy Lutomirski
                   ` (2 preceding siblings ...)
  2020-12-02 15:35 ` [PATCH v2 3/4] membarrier: Explicitly sync remote cores when SYNC_CORE is requested Andy Lutomirski
@ 2020-12-02 15:35 ` Andy Lutomirski
  2020-12-02 19:39   ` Mathieu Desnoyers
  3 siblings, 1 reply; 8+ messages in thread
From: Andy Lutomirski @ 2020-12-02 15:35 UTC (permalink / raw)
  To: x86, Mathieu Desnoyers
  Cc: LKML, Nicholas Piggin, Arnd Bergmann, Anton Blanchard,
	Andy Lutomirski, stable

membarrier()'s MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE is documented
as syncing the core on all sibling threads but not necessarily the
calling thread.  This behavior is fundamentally buggy and cannot be used
safely.  Suppose a user program has two threads.  Thread A is on CPU 0
and thread B is on CPU 1.  Thread A modifies some text and calls
membarrier(MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE).  Then thread B
executes the modified code.  At any point after membarrier() decides
which CPUs to target, thread A could be preempted and replaced by thread
B on CPU 0.  This could even happen on exit from the membarrier()
syscall.  If this happens, thread B will end up running on CPU 0 without
having synced.

In principle, this could be fixed by arranging for the scheduler to
sync_core_before_usermode() whenever switching between two threads in
the same mm if there is any possibility of a concurrent membarrier()
call, but this would have considerable overhead.  Instead, make
membarrier() sync the calling CPU as well.

As an optimization, this avoids an extra smp_mb() in the default
barrier-only mode.
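
For illustration, the userspace pattern that the current behavior breaks
looks roughly like this (a sketch only, assuming the process registered
with MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_SYNC_CORE beforehand; not
part of this patch):

        #include <linux/membarrier.h>
        #include <string.h>
        #include <sys/syscall.h>
        #include <unistd.h>

        /*
         * Thread A: patch executable memory, then ask the kernel to
         * serialize every core running this mm before the new code is
         * used.  With this change that includes the CPU thread A is on,
         * so a sibling thread migrated there cannot run unsynced.
         */
        static void publish_new_code(void *text, const void *insns, size_t len)
        {
                memcpy(text, insns, len);
                syscall(__NR_membarrier,
                        MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE, 0, 0);
                /* Any thread of this process may now jump into 'text'. */
        }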

Cc: stable@vger.kernel.org
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 kernel/sched/membarrier.c | 49 +++++++++++++++++++++++++--------------
 1 file changed, 32 insertions(+), 17 deletions(-)

diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c
index 01538b31f27e..7df7c0e60647 100644
--- a/kernel/sched/membarrier.c
+++ b/kernel/sched/membarrier.c
@@ -352,8 +352,6 @@ static int membarrier_private_expedited(int flags, int cpu_id)
 
 		if (cpu_id >= nr_cpu_ids || !cpu_online(cpu_id))
 			goto out;
-		if (cpu_id == raw_smp_processor_id())
-			goto out;
 		rcu_read_lock();
 		p = rcu_dereference(cpu_rq(cpu_id)->curr);
 		if (!p || p->mm != mm) {
@@ -368,16 +366,6 @@ static int membarrier_private_expedited(int flags, int cpu_id)
 		for_each_online_cpu(cpu) {
 			struct task_struct *p;
 
-			/*
-			 * Skipping the current CPU is OK even through we can be
-			 * migrated at any point. The current CPU, at the point
-			 * where we read raw_smp_processor_id(), is ensured to
-			 * be in program order with respect to the caller
-			 * thread. Therefore, we can skip this CPU from the
-			 * iteration.
-			 */
-			if (cpu == raw_smp_processor_id())
-				continue;
 			p = rcu_dereference(cpu_rq(cpu)->curr);
 			if (p && p->mm == mm)
 				__cpumask_set_cpu(cpu, tmpmask);
@@ -385,12 +373,39 @@ static int membarrier_private_expedited(int flags, int cpu_id)
 		rcu_read_unlock();
 	}
 
-	preempt_disable();
-	if (cpu_id >= 0)
+	if (cpu_id >= 0) {
+		/*
+		 * smp_call_function_single() will call ipi_func() if cpu_id
+		 * is the calling CPU.
+		 */
 		smp_call_function_single(cpu_id, ipi_func, NULL, 1);
-	else
-		smp_call_function_many(tmpmask, ipi_func, NULL, 1);
-	preempt_enable();
+	} else {
+		/*
+		 * For regular membarrier, we can save a few cycles by
+		 * skipping the current cpu -- we're about to do smp_mb()
+		 * below, and if we migrate to a different cpu, this cpu
+		 * and the new cpu will execute a full barrier in the
+		 * scheduler.
+		 *
+		 * For SYNC_CORE, we do need a barrier on the current cpu --
+		 * otherwise, if we are migrated and replaced by a different
+		 * task in the same mm just before, during, or after
+		 * membarrier, we will end up with some thread in the mm
+		 * running without a core sync.
+		 *
+		 * For RSEQ, it seems polite to target the calling thread
+		 * as well, although it's not clear it makes much difference
+		 * either way.  Users aren't supposed to run syscalls in an
+		 * rseq critical section.
+		 */
+		if (ipi_func == ipi_mb) {
+			preempt_disable();
+			smp_call_function_many(tmpmask, ipi_func, NULL, true);
+			preempt_enable();
+		} else {
+			on_each_cpu_mask(tmpmask, ipi_func, NULL, true);
+		}
+	}
 
 out:
 	if (cpu_id < 0)
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 4/4] membarrier: Execute SYNC_CORE on the calling thread
  2020-12-02 15:35 ` [PATCH v2 4/4] membarrier: Execute SYNC_CORE on the calling thread Andy Lutomirski
@ 2020-12-02 19:39   ` Mathieu Desnoyers
  0 siblings, 0 replies; 8+ messages in thread
From: Mathieu Desnoyers @ 2020-12-02 19:39 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: x86, linux-kernel, Nicholas Piggin, Arnd Bergmann,
	Anton Blanchard, stable

----- On Dec 2, 2020, at 10:35 AM, Andy Lutomirski luto@kernel.org wrote:

> membarrier()'s MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE is documented
> as syncing the core on all sibling threads but not necessarily the
> calling thread.  This behavior is fundamentally buggy and cannot be used
> safely.  Suppose a user program has two threads.  Thread A is on CPU 0
> and thread B is on CPU 1.  Thread A modifies some text and calls
> membarrier(MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE).  Then thread B
> executes the modified code.  At any point after membarrier() decides
> which CPUs to target, thread A could be preempted and replaced by thread
> B on CPU 0.  This could even happen on exit from the membarrier()
> syscall.  If this happens, thread B will end up running on CPU 0 without
> having synced.

Indeed, good catch! We only have sync core in the scheduler when switching
between mm, so indeed we cannot rely on the scheduler to issue a sync core
for us when switching between threads with the same mm.

> In principle, this could be fixed by arranging for the scheduler to
> sync_core_before_usermode() whenever switching between two threads in
> the same mm if there is any possibility of a concurrent membarrier()
> call, but this would have considerable overhead.  Instead, make
> membarrier() sync the calling CPU as well.

Yes, I agree that sync core on self is the right approach here.

> As an optimization, this avoids an extra smp_mb() in the default
> barrier-only mode.
> 
> Cc: stable@vger.kernel.org
> Signed-off-by: Andy Lutomirski <luto@kernel.org>
> ---
> kernel/sched/membarrier.c | 49 +++++++++++++++++++++++++--------------
> 1 file changed, 32 insertions(+), 17 deletions(-)
> 
> diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c
> index 01538b31f27e..7df7c0e60647 100644
> --- a/kernel/sched/membarrier.c
> +++ b/kernel/sched/membarrier.c
> @@ -352,8 +352,6 @@ static int membarrier_private_expedited(int flags, int
> cpu_id)
> 

There is one small optimization you will want to adapt here:

        if (atomic_read(&mm->mm_users) == 1 || num_online_cpus() == 1)
                return 0;

should become:

        if (flags != MEMBARRIER_FLAG_SYNC_CORE &&
            (atomic_read(&mm->mm_users) == 1 || num_online_cpus() == 1))
                return 0;

So we issue a core sync for self in single-threaded applications,
to make things consistent. We can then document that membarrier sync core
issues a core sync on all thread siblings including the caller thread.

> 		if (cpu_id >= nr_cpu_ids || !cpu_online(cpu_id))
> 			goto out;
> -		if (cpu_id == raw_smp_processor_id())
> -			goto out;
> 		rcu_read_lock();
> 		p = rcu_dereference(cpu_rq(cpu_id)->curr);
> 		if (!p || p->mm != mm) {
> @@ -368,16 +366,6 @@ static int membarrier_private_expedited(int flags, int
> cpu_id)
> 		for_each_online_cpu(cpu) {
> 			struct task_struct *p;
> 
> -			/*
> -			 * Skipping the current CPU is OK even through we can be
> -			 * migrated at any point. The current CPU, at the point
> -			 * where we read raw_smp_processor_id(), is ensured to
> -			 * be in program order with respect to the caller
> -			 * thread. Therefore, we can skip this CPU from the
> -			 * iteration.
> -			 */
> -			if (cpu == raw_smp_processor_id())
> -				continue;
> 			p = rcu_dereference(cpu_rq(cpu)->curr);
> 			if (p && p->mm == mm)
> 				__cpumask_set_cpu(cpu, tmpmask);
> @@ -385,12 +373,39 @@ static int membarrier_private_expedited(int flags, int
> cpu_id)
> 		rcu_read_unlock();
> 	}
> 
> -	preempt_disable();
> -	if (cpu_id >= 0)
> +	if (cpu_id >= 0) {
> +		/*
> +		 * smp_call_function_single() will call ipi_func() if cpu_id
> +		 * is the calling CPU.
> +		 */
> 		smp_call_function_single(cpu_id, ipi_func, NULL, 1);
> -	else
> -		smp_call_function_many(tmpmask, ipi_func, NULL, 1);
> -	preempt_enable();
> +	} else {
> +		/*
> +		 * For regular membarrier, we can save a few cycles by
> +		 * skipping the current cpu -- we're about to do smp_mb()
> +		 * below, and if we migrate to a different cpu, this cpu
> +		 * and the new cpu will execute a full barrier in the
> +		 * scheduler.
> +		 *
> +		 * For SYNC_CORE, we do need a barrier on the current cpu --
> +		 * otherwise, if we are migrated and replaced by a different
> +		 * task in the same mm just before, during, or after
> +		 * membarrier, we will end up with some thread in the mm
> +		 * running without a core sync.
> +		 *
> +		 * For RSEQ, it seems polite to target the calling thread
> +		 * as well, although it's not clear it makes much difference
> +		 * either way.  Users aren't supposed to run syscalls in an
> +		 * rseq critical section.

Considering that we want a consistent behavior between single and multi-threaded
programs (as I pointed out above wrt the optimization change), I think it would
be better to skip self for the rseq ipi, in the same way we'd want to return
early for a membarrier-rseq-private on a single-threaded mm. Users are _really_
not supposed to run system calls in rseq critical sections. The kernel even kills
the offending applications when run on kernels with CONFIG_DEBUG_RSEQ=y. So it seems
rather pointless to waste cycles doing a rseq fence on self considering this.
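
Concretely, an untested sketch of what I have in mind: only the
sync-core IPI would go through on_each_cpu_mask() (which includes the
calling CPU), while the plain-barrier and rseq IPIs keep using
smp_call_function_many(), which never targets self:

        if (ipi_func == ipi_sync_core) {
                /* Include the calling CPU. */
                on_each_cpu_mask(tmpmask, ipi_func, NULL, true);
        } else {
                /* smp_call_function_many() skips the calling CPU. */
                preempt_disable();
                smp_call_function_many(tmpmask, ipi_func, NULL, true);
                preempt_enable();
        }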

Thanks,

Mathieu

> +		 */
> +		if (ipi_func == ipi_mb) {
> +			preempt_disable();
> +			smp_call_function_many(tmpmask, ipi_func, NULL, true);
> +			preempt_enable();
> +		} else {
> +			on_each_cpu_mask(tmpmask, ipi_func, NULL, true);
> +		}
> +	}
> 
> out:
> 	if (cpu_id < 0)
> --
> 2.28.0

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 2/4] membarrier: Add an actual barrier before rseq_preempt()
  2020-12-02 15:35 ` [PATCH v2 2/4] membarrier: Add an actual barrier before rseq_preempt() Andy Lutomirski
@ 2020-12-02 19:40   ` Mathieu Desnoyers
  0 siblings, 0 replies; 8+ messages in thread
From: Mathieu Desnoyers @ 2020-12-02 19:40 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: x86, linux-kernel, Nicholas Piggin, Arnd Bergmann,
	Anton Blanchard, stable

----- On Dec 2, 2020, at 10:35 AM, Andy Lutomirski luto@kernel.org wrote:

> It seems to me that most RSEQ membarrier users will expect any
> stores done before the membarrier() syscall to be visible to the
> target task(s).  While this is extremely likely to be true in
> practice, nothing actually guarantees it by a strict reading of the
> x86 manuals.  Rather than providing this guarantee by accident and
> potentially causing a problem down the road, just add an explicit
> barrier.
> 
> Cc: stable@vger.kernel.org
> Signed-off-by: Andy Lutomirski <luto@kernel.org>

Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>

> ---
> kernel/sched/membarrier.c | 8 ++++++++
> 1 file changed, 8 insertions(+)
> 
> diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c
> index 5a40b3828ff2..6251d3d12abe 100644
> --- a/kernel/sched/membarrier.c
> +++ b/kernel/sched/membarrier.c
> @@ -168,6 +168,14 @@ static void ipi_mb(void *info)
> 
> static void ipi_rseq(void *info)
> {
> +	/*
> +	 * Ensure that all stores done by the calling thread are visible
> +	 * to the current task before the current task resumes.  We could
> +	 * probably optimize this away on most architectures, but by the
> +	 * time we've already sent an IPI, the cost of the extra smp_mb()
> +	 * is negligible.
> +	 */
> +	smp_mb();
> 	rseq_preempt(current);
> }
> 
> --
> 2.28.0

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 3/4] membarrier: Explicitly sync remote cores when SYNC_CORE is requested
  2020-12-02 15:35 ` [PATCH v2 3/4] membarrier: Explicitly sync remote cores when SYNC_CORE is requested Andy Lutomirski
@ 2020-12-02 19:43   ` Mathieu Desnoyers
  0 siblings, 0 replies; 8+ messages in thread
From: Mathieu Desnoyers @ 2020-12-02 19:43 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: x86, linux-kernel, Nicholas Piggin, Arnd Bergmann,
	Anton Blanchard, stable

----- On Dec 2, 2020, at 10:35 AM, Andy Lutomirski luto@kernel.org wrote:

> membarrier() does not explicitly sync_core() remote CPUs; instead, it
> relies on the assumption that an IPI will result in a core sync.  On
> x86, I think this may be true in practice, but it's not architecturally
> reliable.  In particular, the SDM and APM do not appear to guarantee
> that interrupt delivery is serializing.  While IRET does serialize, IPI
> return can schedule, thereby switching to another task in the same mm
> that was sleeping in a syscall.  The new task could then SYSRET back to
> usermode without ever executing IRET.
> 
> Make this more robust by explicitly calling sync_core_before_usermode()
> on remote cores.  (This also helps people who search the kernel tree for
> instances of sync_core() and sync_core_before_usermode() -- one might be
> surprised that the core membarrier code doesn't currently show up in
> such a search.)
> 
> Cc: stable@vger.kernel.org
> Signed-off-by: Andy Lutomirski <luto@kernel.org>

Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>

> ---
> kernel/sched/membarrier.c | 18 ++++++++++++++++++
> 1 file changed, 18 insertions(+)
> 
> diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c
> index 6251d3d12abe..01538b31f27e 100644
> --- a/kernel/sched/membarrier.c
> +++ b/kernel/sched/membarrier.c
> @@ -166,6 +166,23 @@ static void ipi_mb(void *info)
> 	smp_mb();	/* IPIs should be serializing but paranoid. */
> }
> 
> +static void ipi_sync_core(void *info)
> +{
> +	/*
> +	 * The smp_mb() in membarrier after all the IPIs is supposed to
> +	 * ensure that memory accesses on remote CPUs that occur before the IPI
> +	 * become visible to membarrier()'s caller -- see scenario B in
> +	 * the big comment at the top of this file.
> +	 *
> +	 * A sync_core() would provide this guarantee, but
> +	 * sync_core_before_usermode() might end up being deferred until
> +	 * after membarrier()'s smp_mb().
> +	 */
> +	smp_mb();	/* IPIs should be serializing but paranoid. */
> +
> +	sync_core_before_usermode();
> +}
> +
> static void ipi_rseq(void *info)
> {
> 	/*
> @@ -301,6 +318,7 @@ static int membarrier_private_expedited(int flags, int
> cpu_id)
> 		if (!(atomic_read(&mm->membarrier_state) &
> 		      MEMBARRIER_STATE_PRIVATE_EXPEDITED_SYNC_CORE_READY))
> 			return -EPERM;
> +		ipi_func = ipi_sync_core;
> 	} else if (flags == MEMBARRIER_FLAG_RSEQ) {
> 		if (!IS_ENABLED(CONFIG_RSEQ))
> 			return -EINVAL;
> --
> 2.28.0

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread

Thread overview: 8+ messages
2020-12-02 15:35 [PATCH v2 0/4] membarrier fixes Andy Lutomirski
2020-12-02 15:35 ` [PATCH v2 1/4] x86/membarrier: Get rid of a dubious optimization Andy Lutomirski
2020-12-02 15:35 ` [PATCH v2 2/4] membarrier: Add an actual barrier before rseq_preempt() Andy Lutomirski
2020-12-02 19:40   ` Mathieu Desnoyers
2020-12-02 15:35 ` [PATCH v2 3/4] membarrier: Explicitly sync remote cores when SYNC_CORE is requested Andy Lutomirski
2020-12-02 19:43   ` Mathieu Desnoyers
2020-12-02 15:35 ` [PATCH v2 4/4] membarrier: Execute SYNC_CORE on the calling thread Andy Lutomirski
2020-12-02 19:39   ` Mathieu Desnoyers
