From: Andy Lutomirski <luto@amacapital.net>
To: Peter Zijlstra <peterz@infradead.org>
Cc: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Rik van Riel <riel@redhat.com>, Brian Gerst <brgerst@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Ingo Molnar <mingo@kernel.org>,
	Denys Vlasenko <dvlasenk@redhat.com>,
	"H. Peter Anvin" <hpa@zytor.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Borislav Petkov <bp@alien8.de>,
	Andrew Lutomirski <luto@kernel.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	"linux-tip-commits@vger.kernel.org" 
	<linux-tip-commits@vger.kernel.org>
Subject: Re: [tip:x86/urgent] x86/mm: Add barriers and document switch_mm() -vs-flush synchronization
Date: Mon, 11 Jan 2016 13:50:24 -0800	[thread overview]
Message-ID: <CALCETrXLUewtYNQOdiGpKLvp=RL0eMLSj+v7_J0G1a_do+6G8Q@mail.gmail.com> (raw)
In-Reply-To: <20160111182548.GF6344@twins.programming.kicks-ass.net>

On Mon, Jan 11, 2016 at 10:25 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Mon, Jan 11, 2016 at 03:42:40AM -0800, tip-bot for Andy Lutomirski wrote:
>> --- a/arch/x86/include/asm/mmu_context.h
>> +++ b/arch/x86/include/asm/mmu_context.h
>> @@ -116,8 +116,34 @@ static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next,
>>  #endif
>>               cpumask_set_cpu(cpu, mm_cpumask(next));
>>
>> -             /* Re-load page tables */
>> +             /*
>> +              * Re-load page tables.
>> +              *
>> +              * This logic has an ordering constraint:
>> +              *
>> +              *  CPU 0: Write to a PTE for 'next'
>> +              *  CPU 0: load bit 1 in mm_cpumask.  if nonzero, send IPI.
>> +              *  CPU 1: set bit 1 in next's mm_cpumask
>> +              *  CPU 1: load from the PTE that CPU 0 writes (implicit)
>> +              *
>> +              * We need to prevent an outcome in which CPU 1 observes
>> +              * the new PTE value and CPU 0 observes bit 1 clear in
>> +              * mm_cpumask.  (If that occurs, then the IPI will never
>> +              * be sent, and CPU 0's TLB will contain a stale entry.)
>> +              *
>> +              * The bad outcome can occur if either CPU's load is
>> +              * reordered before that CPU's store, so both CPUs much
>
> s/much/must/ ?

Indeed.  Is this worth a follow-up patch?

>
>> +              * execute full barriers to prevent this from happening.
>> +              *
>> +              * Thus, switch_mm needs a full barrier between the
>> +              * store to mm_cpumask and any operation that could load
>> +              * from next->pgd.  This barrier synchronizes with
>> +              * remote TLB flushers.  Fortunately, load_cr3 is
>> +              * serializing and thus acts as a full barrier.
>> +              *
>> +              */
>>               load_cr3(next->pgd);
>> +
>>               trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
>>
>>               /* Stop flush ipis for the previous mm */
>> @@ -156,10 +182,15 @@ static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next,
>>                        * schedule, protecting us from simultaneous changes.
>>                        */
>>                       cpumask_set_cpu(cpu, mm_cpumask(next));
>> +
>>                       /*
>>                        * We were in lazy tlb mode and leave_mm disabled
>>                        * tlb flush IPI delivery. We must reload CR3
>>                        * to make sure to use no freed page tables.
>> +                      *
>> +                      * As above, this is a barrier that forces
>> +                      * TLB repopulation to be ordered after the
>> +                      * store to mm_cpumask.
>
> somewhat confused by this comment, cpumask_set_cpu() is a LOCK BTS, that
> is already fully ordered.

There are more than enough barriers here.  v1 had cpumask_set_cpu();
smp_mb__after_atomic(), which is more portable and generates identical
code.  I don't have a real preference for which barrier we should
consider to be the important one.
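To spell out the two equivalent orderings being compared (a sketch only,
not the merged code):

	/* Variant merged in this patch: rely on the serializing CR3 write. */
	cpumask_set_cpu(cpu, mm_cpumask(next));
	load_cr3(next->pgd);		/* serializing, acts as a full barrier */

	/* Variant from v1: make the barrier explicit and portable. */
	cpumask_set_cpu(cpu, mm_cpumask(next));
	smp_mb__after_atomic();		/* no-op on x86, real barrier elsewhere */
	load_cr3(next->pgd);

On x86 the LOCK-prefixed RMW inside cpumask_set_cpu() is itself fully
ordered, which is why both variants generate identical code there.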

>
>>                        */
>>                       load_cr3(next->pgd);
>>                       trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
>> diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
>> index 8ddb5d0..8f4cc3d 100644
>
>
>> --- a/arch/x86/mm/tlb.c
>> +++ b/arch/x86/mm/tlb.c
>
>> @@ -188,17 +191,29 @@ void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
>
>>       if (!current->mm) {
>>               leave_mm(smp_processor_id());
>> +
>> +             /* Synchronize with switch_mm. */
>> +             smp_mb();
>> +
>>               goto out;
>>       }
>
>> +             } else {
>>                       leave_mm(smp_processor_id());
>> +
>> +                     /* Synchronize with switch_mm. */
>> +                     smp_mb();
>> +             }
>>       }
>
> The alternative is making leave_mm() unconditionally imply a full
> barrier. I've not looked at other sites using it though.

For a quick fix, I preferred the more self-contained change.
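To illustrate the trade-off (a rough sketch, not a posted patch): the
merged fix keeps the barrier at the flush_tlb_mm_range() call sites,

	if (!current->mm) {
		leave_mm(smp_processor_id());
		smp_mb();	/* synchronize with switch_mm() */
		goto out;
	}

whereas the alternative would fold an unconditional smp_mb() (or a
guaranteed serializing load_cr3()) into leave_mm() itself, so every
caller is covered.  That would mean auditing the other leave_mm()
users first, which is why the self-contained version went in as the
urgent fix.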

--Andy

Thread overview: 9+ messages
     [not found] <tip-71b3c126e61177eb693423f2e18a1914205b165e@git.kernel.org>
2016-01-11 18:25 ` [tip:x86/urgent] x86/mm: Add barriers and document switch_mm() -vs-flush synchronization Peter Zijlstra
2016-01-11 21:50   ` Andy Lutomirski [this message]
2016-01-11 22:22     ` Peter Zijlstra
2016-01-12 10:21     ` Ingo Molnar
2016-01-12 20:11       ` [PATCH] x86/mm: Improve switch_mm barrier comments Andy Lutomirski
2016-01-12 20:11         ` Andy Lutomirski
2016-01-12 20:47       ` [PATCH v2] " Andy Lutomirski
2016-01-12 20:47         ` Andy Lutomirski
2016-01-14  9:06         ` [tip:x86/urgent] x86/mm: Improve switch_mm() " tip-bot for Andy Lutomirski
