From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
To: Boqun Feng <boqun.feng@gmail.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	Peter Zijlstra <peterz@infradead.org>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	Andrew Hunter <ahh@google.com>,
	maged michael <maged.michael@gmail.com>,
	gromer <gromer@google.com>, Avi Kivity <avi@scylladb.com>,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	Paul Mackerras <paulus@samba.org>,
	Michael Ellerman <mpe@ellerman.id.au>,
	Dave Watson <davejwatson@fb.com>,
	Alan Stern <stern@rowland.harvard.edu>,
	Will Deacon <will.deacon@arm.com>,
	Andy Lutomirski <luto@kernel.org>,
	linux-arch <linux-arch@vger.kernel.org>
Subject: Re: [RFC PATCH v3 1/2] membarrier: Provide register expedited private command
Date: Sun, 24 Sep 2017 14:23:04 +0000 (UTC)
Message-ID: <1879888051.17397.1506262984228.JavaMail.zimbra@efficios.com>
In-Reply-To: <20170924133038.GA8673@tardis>

----- On Sep 24, 2017, at 9:30 AM, Boqun Feng boqun.feng@gmail.com wrote:

> On Fri, Sep 22, 2017 at 03:10:10PM +0000, Mathieu Desnoyers wrote:
>> ----- On Sep 22, 2017, at 4:59 AM, Boqun Feng boqun.feng@gmail.com wrote:
>> 
>> > On Tue, Sep 19, 2017 at 06:13:41PM -0400, Mathieu Desnoyers wrote:
>> > [...]
>> >> +static inline void membarrier_arch_sched_in(struct task_struct *prev,
>> >> +		struct task_struct *next)
>> >> +{
>> >> +	/*
>> >> +	 * Only need the full barrier when switching between processes.
>> >> +	 */
>> >> +	if (likely(!test_ti_thread_flag(task_thread_info(next),
>> >> +			TIF_MEMBARRIER_PRIVATE_EXPEDITED)
>> >> +				|| prev->mm == next->mm))
>> > 
>> > And we also don't need the smp_mb() if !prev->mm, because switching from
>> > kernel to user will have a smp_mb() implied by mmdrop()?
>> 
>> Right. And we also don't need it when switching from userspace to kernel
> 
> Yep, but this case is covered already, as I think we don't allow kernel
> threads to have TIF_MEMBARRIER_PRIVATE_EXPEDITED set, right?

Good point.

> 
>> thread either. Something like this:
>> 
>> static inline void membarrier_arch_sched_in(struct task_struct *prev,
>>                 struct task_struct *next)
>> {
>>         /*
>>          * Only need the full barrier when switching between processes.
>>          * Barrier when switching from kernel to userspace is not
>>          * required here, given that it is implied by mmdrop(). Barrier
>>          * when switching from userspace to kernel is not needed after
>>          * store to rq->curr.
>>          */
>>         if (likely(!test_ti_thread_flag(task_thread_info(next),
>>                         TIF_MEMBARRIER_PRIVATE_EXPEDITED)
>>                         || !prev->mm || !next->mm || prev->mm == next->mm))
> 
> , so no need to test next->mm here.
> 

Right, it's redundant wrt testing the thread flag.

>>                 return;
>> 
>>         /*
>>          * The membarrier system call requires a full memory barrier
>>          * after storing to rq->curr, before going back to user-space.
>>          */
>>         smp_mb();
>> }
>> 
>> > 
>> >> +		return;
>> >> +
>> >> +	/*
>> >> +	 * The membarrier system call requires a full memory barrier
>> >> +	 * after storing to rq->curr, before going back to user-space.
>> >> +	 */
>> >> +	smp_mb();
>> >> +}
>> > 
>> > [...]
>> > 
>> >> +static inline void membarrier_fork(struct task_struct *t,
>> >> +		unsigned long clone_flags)
>> >> +{
>> >> +	if (!current->mm || !t->mm)
>> >> +		return;
>> >> +	t->mm->membarrier_private_expedited =
>> >> +		current->mm->membarrier_private_expedited;
>> > 
>> > Have we already done the copy of ->membarrier_private_expedited in
>> > copy_mm()?
>> 
>> copy_mm() is performed without holding current->sighand->siglock, so
>> it could race with a concurrent membarrier register command.
> 
> Speaking of racing, I think we currently have a problem if we do a
> register_private_expedited in one thread and do a
> membarrier_private_expedited in another thread (sharing the same mm),
> as follows:
> 
>	{t1,t2,t3 sharing the same ->mm}
>	CPU 0				CPU 1				CPU2
>	====================		===================		============
>	{in thread t1}
>	membarrier_register_private_expedited():
>	  ...
>	  WRITE_ONCE(->mm->membarrier_private_expedited, 1);
>	  membarrier_arch_register_private_expedited():
>	    ...
>	    <haven't set the TIF for t3 yet>
> 
>	    				{in thread t2}
>					membarrier_private_expedited():
>					  READ_ONCE(->mm->membarrier_private_expedited); // == 1
>					  ...
>					  for_each_online_cpu()
>					    ...
>					    <p is cpu_rq(CPU2)->curr>
>					    if (p && p->mm == current->mm) // false
>					    <so no ipi sent to CPU2>
>					    				
>									{about to switch to t3}
>									rq->curr = t3;
>									....
>									context_switch():
>									  ...
>									  finish_task_switch():
>									    membarrier_sched_in():
>									    <TIF is not set>
>									    // no smp_mb() here.
> 
> , and we will miss the smp_mb() on CPU2, right? And this could even
> happen if t2 has a membarrier_register_private_expedited() preceding the
> membarrier_private_expedited().
>					
> Am I missing something subtle here?

I think the problem sits in this function:

static void membarrier_register_private_expedited(void)
{
        struct task_struct *p = current;

        if (READ_ONCE(p->mm->membarrier_private_expedited))
                return;
        WRITE_ONCE(p->mm->membarrier_private_expedited, 1);
        membarrier_arch_register_private_expedited(p);
}

I need to change the order between the WRITE_ONCE() and the call to
membarrier_arch_register_private_expedited(). If I issue the
WRITE_ONCE() after the arch code (which sets the TIF flags), then
concurrent membarrier private expedited commands will simply return
an -EPERM error:

static void membarrier_register_private_expedited(void)
{
        struct task_struct *p = current;

        if (READ_ONCE(p->mm->membarrier_private_expedited))
                return;
        membarrier_arch_register_private_expedited(p);
        WRITE_ONCE(p->mm->membarrier_private_expedited, 1);
}
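
For reference, here is a minimal sketch of how the command side would
then behave (an assumption based on the snippets above, not the exact
patch code): a private expedited command issued before the flag is
published simply returns -EPERM rather than racing with the TIF setup.

static int membarrier_private_expedited(void)
{
        /*
         * Registration publishes mm->membarrier_private_expedited only
         * after membarrier_arch_register_private_expedited() has set
         * the TIF flags, so observing the flag here implies the TIF
         * setup is complete.
         */
        if (!READ_ONCE(current->mm->membarrier_private_expedited))
                return -EPERM;
        /* ... send IPIs to CPUs running threads of current->mm ... */
        return 0;
}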

Do you agree that this would fix the race you identified?

Thanks,

Mathieu


> 
> Regards,
> Boqun
> 
> 
>> However, given that it is a single flag updated with WRITE_ONCE()
>> and read with READ_ONCE(), it might be OK to rely on copy_mm there.
>> If userspace runs registration concurrently with fork, they should
>> not expect the child to be specifically registered or unregistered.
>> 
>> So yes, I think you are right about removing this copy and relying on
>> copy_mm() instead. I also think we can improve membarrier_arch_fork()
>> on powerpc to test the current thread flag rather than using current->mm.
>> 
>> Which leads to these two changes:
>> 
>> static inline void membarrier_fork(struct task_struct *t,
>>                 unsigned long clone_flags)
>> {
>>         /*
>>          * Prior copy_mm() copies the membarrier_private_expedited field
>>          * from current->mm to t->mm.
>>          */
>>         membarrier_arch_fork(t, clone_flags);
>> }
>> 
>> And on PowerPC:
>> 
>> static inline void membarrier_arch_fork(struct task_struct *t,
>>                 unsigned long clone_flags)
>> {
>>         /*
>>          * Coherence of TIF_MEMBARRIER_PRIVATE_EXPEDITED against thread
>>          * fork is protected by siglock. membarrier_arch_fork is called
>>          * with siglock held.
>>          */
>>         if (test_thread_flag(TIF_MEMBARRIER_PRIVATE_EXPEDITED))
>>                 set_ti_thread_flag(task_thread_info(t),
>>                                 TIF_MEMBARRIER_PRIVATE_EXPEDITED);
>> }
>> 
>> Thanks,
>> 
>> Mathieu
>> 
>> 
>> > 
>> > Regards,
>> > Boqun
>> > 
>> >> +	membarrier_arch_fork(t, clone_flags);
>> >> +}
>> >> +static inline void membarrier_execve(struct task_struct *t)
>> >> +{
>> >> +	t->mm->membarrier_private_expedited = 0;
>> >> +	membarrier_arch_execve(t);
>> >> +}
>> >> +#else
>> >> +static inline void membarrier_sched_in(struct task_struct *prev,
>> >> +		struct task_struct *next)
>> >> +{
>> >> +}
>> >> +static inline void membarrier_fork(struct task_struct *t,
>> >> +		unsigned long clone_flags)
>> >> +{
>> >> +}
>> >> +static inline void membarrier_execve(struct task_struct *t)
>> >> +{
>> >> +}
>> >> +#endif
>> >> +
>> > [...]
>> 
>> --
>> Mathieu Desnoyers
>> EfficiOS Inc.
>> http://www.efficios.com

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

