All of lore.kernel.org
 help / color / mirror / Atom feed
From: Peter Zijlstra <peterz@infradead.org>
To: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	paulmck <paulmck@linux.ibm.com>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	Oleg Nesterov <oleg@redhat.com>,
	"Eric W. Biederman" <ebiederm@xmission.com>,
	"Russell King, ARM Linux" <linux@armlinux.org.uk>,
	Chris Metcalf <cmetcalf@ezchip.com>, Chris Lameter <cl@linux.com>,
	Kirill Tkhai <tkhai@yandex.ru>, Mike Galbraith <efault@gmx.de>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@kernel.org>
Subject: Re: [RFC PATCH 1/2] Fix: sched/membarrier: p->mm->membarrier_state racy load
Date: Wed, 4 Sep 2019 20:26:07 +0200	[thread overview]
Message-ID: <20190904182607.GG17205@worktop.programming.kicks-ass.net> (raw)
In-Reply-To: <1825272223.1740.1567617173011.JavaMail.zimbra@efficios.com>

On Wed, Sep 04, 2019 at 01:12:53PM -0400, Mathieu Desnoyers wrote:
> ----- On Sep 4, 2019, at 12:09 PM, Peter Zijlstra peterz@infradead.org wrote:
> 
> > On Wed, Sep 04, 2019 at 11:19:00AM -0400, Mathieu Desnoyers wrote:
> >> ----- On Sep 3, 2019, at 4:36 PM, Linus Torvalds torvalds@linux-foundation.org
> >> wrote:
> > 
> >> > I wonder if the easiest model might be to just use a percpu variable
> >> > instead for the membarrier stuff? It's not like it has to be in
> >> > 'struct task_struct' at all, I think. We only care about the current
> >> > runqueues, and those are percpu anyway.
> >> 
> >> One issue here is that membarrier iterates over all runqueues without
> >> grabbing any runqueue lock. If we copy that state from mm to rq on
> >> sched switch prepare, we would need to ensure we have the proper
> >> memory barriers between:
> >> 
> >> prior user-space memory accesses  /  setting the runqueue membarrier state
> >> 
> >> and
> >> 
> >> setting the runqueue membarrier state / following user-space memory accesses
> >> 
> >> Copying the membarrier state into the task struct leverages the fact that
> >> we have documented and guaranteed those barriers around the rq->curr update
> >> in the scheduler.
> > 
> > Should be the same as the barriers we already rely on for rq->curr, no?
> > That is, if we put this before switch_mm() then we have
> > smp_mb__after_spinlock() and switch_mm() itself.
> 
> Yes, I think we can piggy-back on the already documented barriers documented around
> rq->curr store.
> 
> > Also, if we place mm->membarrier_state in the same cacheline as mm->pgd
> > (which switch_mm() is bound to load) then we should be fine, I think.
> 
> Yes, if we make sure membarrier_prepare_task_switch only updates the
> rq->membarrier_state if prev->mm != next->mm, we should be able to avoid
> loading next->mm->membarrier_state when switch_mm() is not invoked.
> 
> I'll prepare RFC patch implementing this approach.

Thinking about this a bit; switching it 'on' still requires some
thinking. Consider register on an already threaded process of which
multiple threads are 'current' on multiple CPUs. In that case none of
the rq bits will be set.

Not even synchronize_rcu() is sufficient to force it either, since we
only update on switch_mm() and nothing guarantees we pass that.

One possible approach would be to IPI broadcast (after setting the
->mm->membarrier_State) and having the IPI update the rq state from
'current->mm'.

But possible I'm just confusing evryone again. I'm not having a good day
today.

  reply	other threads:[~2019-09-04 18:26 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-09-03 20:11 [RFC PATCH 1/2] Fix: sched/membarrier: p->mm->membarrier_state racy load Mathieu Desnoyers
2019-09-03 20:11 ` [RFC PATCH 2/2] Fix: sched/membarrier: private expedited registration check Mathieu Desnoyers
2019-09-03 20:24 ` [RFC PATCH 1/2] Fix: sched/membarrier: p->mm->membarrier_state racy load Peter Zijlstra
2019-09-03 20:36   ` Linus Torvalds
2019-09-04 15:19     ` Mathieu Desnoyers
2019-09-04 16:09       ` Peter Zijlstra
2019-09-04 17:12         ` Mathieu Desnoyers
2019-09-04 18:26           ` Peter Zijlstra [this message]
2019-09-06  0:51             ` Mathieu Desnoyers
2019-09-03 20:41   ` Mathieu Desnoyers
2019-09-04 11:28     ` Peter Zijlstra
2019-09-04 11:49       ` Peter Zijlstra
2019-09-04 15:26         ` Mathieu Desnoyers
2019-09-04 12:03       ` Oleg Nesterov
2019-09-04 12:43         ` Peter Zijlstra
2019-09-04 13:17           ` Oleg Nesterov
2019-09-03 20:27 ` Linus Torvalds
2019-09-03 20:53   ` Mathieu Desnoyers
2019-09-04 10:53 ` Oleg Nesterov
2019-09-04 11:39   ` Peter Zijlstra
2019-09-04 15:24   ` Mathieu Desnoyers
2019-09-04 11:11 ` Oleg Nesterov
2019-09-04 16:11   ` Mathieu Desnoyers
2019-09-08 13:46   ` Mathieu Desnoyers

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190904182607.GG17205@worktop.programming.kicks-ass.net \
    --to=peterz@infradead.org \
    --cc=cl@linux.com \
    --cc=cmetcalf@ezchip.com \
    --cc=ebiederm@xmission.com \
    --cc=efault@gmx.de \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux@armlinux.org.uk \
    --cc=mathieu.desnoyers@efficios.com \
    --cc=mingo@kernel.org \
    --cc=oleg@redhat.com \
    --cc=paulmck@linux.ibm.com \
    --cc=tglx@linutronix.de \
    --cc=tkhai@yandex.ru \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.