* Re: [tip:x86/urgent] x86/mm: Add barriers and document switch_mm() -vs-flush synchronization
       [not found] <tip-71b3c126e61177eb693423f2e18a1914205b165e@git.kernel.org>
@ 2016-01-11 18:25 ` Peter Zijlstra
  2016-01-11 21:50   ` Andy Lutomirski
  0 siblings, 1 reply; 9+ messages in thread
From: Peter Zijlstra @ 2016-01-11 18:25 UTC (permalink / raw)
  To: linux-kernel, dave.hansen, riel, brgerst, akpm, luto, mingo,
	dvlasenk, hpa, tglx, bp, luto, torvalds
  Cc: linux-tip-commits

On Mon, Jan 11, 2016 at 03:42:40AM -0800, tip-bot for Andy Lutomirski wrote:
> --- a/arch/x86/include/asm/mmu_context.h
> +++ b/arch/x86/include/asm/mmu_context.h
> @@ -116,8 +116,34 @@ static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next,
>  #endif
>  		cpumask_set_cpu(cpu, mm_cpumask(next));
>  
> -		/* Re-load page tables */
> +		/*
> +		 * Re-load page tables.
> +		 *
> +		 * This logic has an ordering constraint:
> +		 *
> +		 *  CPU 0: Write to a PTE for 'next'
> +		 *  CPU 0: load bit 1 in mm_cpumask.  if nonzero, send IPI.
> +		 *  CPU 1: set bit 1 in next's mm_cpumask
> +		 *  CPU 1: load from the PTE that CPU 0 writes (implicit)
> +		 *
> +		 * We need to prevent an outcome in which CPU 1 observes
> +		 * the new PTE value and CPU 0 observes bit 1 clear in
> +		 * mm_cpumask.  (If that occurs, then the IPI will never
> +		 * be sent, and CPU 0's TLB will contain a stale entry.)
> +		 *
> +		 * The bad outcome can occur if either CPU's load is
> +		 * reordered before that CPU's store, so both CPUs much

s/much/must/ ?

> +		 * execute full barriers to prevent this from happening.
> +		 *
> +		 * Thus, switch_mm needs a full barrier between the
> +		 * store to mm_cpumask and any operation that could load
> +		 * from next->pgd.  This barrier synchronizes with
> +		 * remote TLB flushers.  Fortunately, load_cr3 is
> +		 * serializing and thus acts as a full barrier.
> +		 *
> +		 */
>  		load_cr3(next->pgd);
> +
>  		trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
>  
>  		/* Stop flush ipis for the previous mm */
> @@ -156,10 +182,15 @@ static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next,
>  			 * schedule, protecting us from simultaneous changes.
>  			 */
>  			cpumask_set_cpu(cpu, mm_cpumask(next));
> +
>  			/*
>  			 * We were in lazy tlb mode and leave_mm disabled
>  			 * tlb flush IPI delivery. We must reload CR3
>  			 * to make sure to use no freed page tables.
> +			 *
> +			 * As above, this is a barrier that forces
> +			 * TLB repopulation to be ordered after the
> +			 * store to mm_cpumask.

somewhat confused by this comment, cpumask_set_cpu() is a LOCK BTS, that
is already fully ordered.

>  			 */
>  			load_cr3(next->pgd);
>  			trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
> diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
> index 8ddb5d0..8f4cc3d 100644


> --- a/arch/x86/mm/tlb.c
> +++ b/arch/x86/mm/tlb.c

> @@ -188,17 +191,29 @@ void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,

>  	if (!current->mm) {
>  		leave_mm(smp_processor_id());
> +
> +		/* Synchronize with switch_mm. */
> +		smp_mb();
> +
>  		goto out;
>  	}

> +		} else {
>  			leave_mm(smp_processor_id());
> +
> +			/* Synchronize with switch_mm. */
> +			smp_mb();
> +		}
>  	}

The alternative is making leave_mm() unconditionally imply a full
barrier. I've not looked at other sites using it though.


* Re: [tip:x86/urgent] x86/mm: Add barriers and document switch_mm() -vs-flush synchronization
  2016-01-11 18:25 ` [tip:x86/urgent] x86/mm: Add barriers and document switch_mm() -vs-flush synchronization Peter Zijlstra
@ 2016-01-11 21:50   ` Andy Lutomirski
  2016-01-11 22:22     ` Peter Zijlstra
  2016-01-12 10:21     ` Ingo Molnar
  0 siblings, 2 replies; 9+ messages in thread
From: Andy Lutomirski @ 2016-01-11 21:50 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, Dave Hansen, Rik van Riel, Brian Gerst,
	Andrew Morton, Ingo Molnar, Denys Vlasenko, H. Peter Anvin,
	Thomas Gleixner, Borislav Petkov, Andrew Lutomirski,
	Linus Torvalds, linux-tip-commits

On Mon, Jan 11, 2016 at 10:25 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Mon, Jan 11, 2016 at 03:42:40AM -0800, tip-bot for Andy Lutomirski wrote:
>> --- a/arch/x86/include/asm/mmu_context.h
>> +++ b/arch/x86/include/asm/mmu_context.h
>> @@ -116,8 +116,34 @@ static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next,
>>  #endif
>>               cpumask_set_cpu(cpu, mm_cpumask(next));
>>
>> -             /* Re-load page tables */
>> +             /*
>> +              * Re-load page tables.
>> +              *
>> +              * This logic has an ordering constraint:
>> +              *
>> +              *  CPU 0: Write to a PTE for 'next'
>> +              *  CPU 0: load bit 1 in mm_cpumask.  if nonzero, send IPI.
>> +              *  CPU 1: set bit 1 in next's mm_cpumask
>> +              *  CPU 1: load from the PTE that CPU 0 writes (implicit)
>> +              *
>> +              * We need to prevent an outcome in which CPU 1 observes
>> +              * the new PTE value and CPU 0 observes bit 1 clear in
>> +              * mm_cpumask.  (If that occurs, then the IPI will never
>> +              * be sent, and CPU 0's TLB will contain a stale entry.)
>> +              *
>> +              * The bad outcome can occur if either CPU's load is
>> +              * reordered before that CPU's store, so both CPUs much
>
> s/much/must/ ?

Indeed.  Is this worth a follow-up patch?

>
>> +              * execute full barriers to prevent this from happening.
>> +              *
>> +              * Thus, switch_mm needs a full barrier between the
>> +              * store to mm_cpumask and any operation that could load
>> +              * from next->pgd.  This barrier synchronizes with
>> +              * remote TLB flushers.  Fortunately, load_cr3 is
>> +              * serializing and thus acts as a full barrier.
>> +              *
>> +              */
>>               load_cr3(next->pgd);
>> +
>>               trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
>>
>>               /* Stop flush ipis for the previous mm */
>> @@ -156,10 +182,15 @@ static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next,
>>                        * schedule, protecting us from simultaneous changes.
>>                        */
>>                       cpumask_set_cpu(cpu, mm_cpumask(next));
>> +
>>                       /*
>>                        * We were in lazy tlb mode and leave_mm disabled
>>                        * tlb flush IPI delivery. We must reload CR3
>>                        * to make sure to use no freed page tables.
>> +                      *
>> +                      * As above, this is a barrier that forces
>> +                      * TLB repopulation to be ordered after the
>> +                      * store to mm_cpumask.
>
> somewhat confused by this comment, cpumask_set_cpu() is a LOCK BTS, that
> is already fully ordered.

There are more than enough barriers here.  v1 had cpumask_set_cpu;
smp_mb__after_atomic, which is more portable and generates identical
code.  I don't have a real preference for which barrier we should
consider to be the important one.

>
>>                        */
>>                       load_cr3(next->pgd);
>>                       trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
>> diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
>> index 8ddb5d0..8f4cc3d 100644
>
>
>> --- a/arch/x86/mm/tlb.c
>> +++ b/arch/x86/mm/tlb.c
>
>> @@ -188,17 +191,29 @@ void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
>
>>       if (!current->mm) {
>>               leave_mm(smp_processor_id());
>> +
>> +             /* Synchronize with switch_mm. */
>> +             smp_mb();
>> +
>>               goto out;
>>       }
>
>> +             } else {
>>                       leave_mm(smp_processor_id());
>> +
>> +                     /* Synchronize with switch_mm. */
>> +                     smp_mb();
>> +             }
>>       }
>
> The alternative is making leave_mm() unconditionally imply a full
> barrier. I've not looked at other sites using it though.

For a quick fix, I preferred the more self-contained change.

--Andy


* Re: [tip:x86/urgent] x86/mm: Add barriers and document switch_mm() -vs-flush synchronization
  2016-01-11 21:50   ` Andy Lutomirski
@ 2016-01-11 22:22     ` Peter Zijlstra
  2016-01-12 10:21     ` Ingo Molnar
  1 sibling, 0 replies; 9+ messages in thread
From: Peter Zijlstra @ 2016-01-11 22:22 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: linux-kernel, Dave Hansen, Rik van Riel, Brian Gerst,
	Andrew Morton, Ingo Molnar, Denys Vlasenko, H. Peter Anvin,
	Thomas Gleixner, Borislav Petkov, Andrew Lutomirski,
	Linus Torvalds, linux-tip-commits

On Mon, Jan 11, 2016 at 01:50:24PM -0800, Andy Lutomirski wrote:
> On Mon, Jan 11, 2016 at 10:25 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> > On Mon, Jan 11, 2016 at 03:42:40AM -0800, tip-bot for Andy Lutomirski wrote:

> >> +              * The bad outcome can occur if either CPU's load is
> >> +              * reordered before that CPU's store, so both CPUs much
> >
> > s/much/must/ ?
> 
> Indeed.  Is this worth a follow-up patch?

Dunno, I didn't even spot the typo the first time I read it.. :-)


* Re: [tip:x86/urgent] x86/mm: Add barriers and document switch_mm() -vs-flush synchronization
  2016-01-11 21:50   ` Andy Lutomirski
  2016-01-11 22:22     ` Peter Zijlstra
@ 2016-01-12 10:21     ` Ingo Molnar
  2016-01-12 20:11         ` Andy Lutomirski
  2016-01-12 20:47         ` Andy Lutomirski
  1 sibling, 2 replies; 9+ messages in thread
From: Ingo Molnar @ 2016-01-12 10:21 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Peter Zijlstra, linux-kernel, Dave Hansen, Rik van Riel,
	Brian Gerst, Andrew Morton, Denys Vlasenko, H. Peter Anvin,
	Thomas Gleixner, Borislav Petkov, Andrew Lutomirski,
	Linus Torvalds, linux-tip-commits


* Andy Lutomirski <luto@amacapital.net> wrote:

> On Mon, Jan 11, 2016 at 10:25 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> > On Mon, Jan 11, 2016 at 03:42:40AM -0800, tip-bot for Andy Lutomirski wrote:
> >> --- a/arch/x86/include/asm/mmu_context.h
> >> +++ b/arch/x86/include/asm/mmu_context.h
> >> @@ -116,8 +116,34 @@ static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next,
> >>  #endif
> >>               cpumask_set_cpu(cpu, mm_cpumask(next));
> >>
> >> -             /* Re-load page tables */
> >> +             /*
> >> +              * Re-load page tables.
> >> +              *
> >> +              * This logic has an ordering constraint:
> >> +              *
> >> +              *  CPU 0: Write to a PTE for 'next'
> >> +              *  CPU 0: load bit 1 in mm_cpumask.  if nonzero, send IPI.
> >> +              *  CPU 1: set bit 1 in next's mm_cpumask
> >> +              *  CPU 1: load from the PTE that CPU 0 writes (implicit)
> >> +              *
> >> +              * We need to prevent an outcome in which CPU 1 observes
> >> +              * the new PTE value and CPU 0 observes bit 1 clear in
> >> +              * mm_cpumask.  (If that occurs, then the IPI will never
> >> +              * be sent, and CPU 0's TLB will contain a stale entry.)
> >> +              *
> >> +              * The bad outcome can occur if either CPU's load is
> >> +              * reordered before that CPU's store, so both CPUs much
> >
> > s/much/must/ ?
> 
> Indeed.  Is this worth a follow-up patch?

Absolutely! Any typos in code noticed by humans are worth fixing, especially when 
it's comments around tricky code. Could be done together with improving this part 
of the comments:

> > +
> >                       /*
> >                        * We were in lazy tlb mode and leave_mm disabled
> >                        * tlb flush IPI delivery. We must reload CR3
> >                        * to make sure to use no freed page tables.
> > +                      *
> > +                      * As above, this is a barrier that forces
> > +                      * TLB repopulation to be ordered after the
> > +                      * store to mm_cpumask.
>
> somewhat confused by this comment, cpumask_set_cpu() is a LOCK BTS, that is 
> already fully ordered.

... as pretty much any barriers related comment that confuses Peter needs to be 
improved, by definition. ;-)

Thanks,

	Ingo


* [PATCH] x86/mm: Improve switch_mm barrier comments
  2016-01-12 10:21     ` Ingo Molnar
@ 2016-01-12 20:11         ` Andy Lutomirski
  2016-01-12 20:47         ` Andy Lutomirski
  1 sibling, 0 replies; 9+ messages in thread
From: Andy Lutomirski @ 2016-01-12 20:11 UTC (permalink / raw)
  To: peterz, x86
  Cc: linux-kernel, Dave Hansen, Rik van Riel, Brian Gerst,
	Andrew Morton, Denys Vlasenko, Borislav Petkov, Linus Torvalds,
	linux-tip-commits, Andy Lutomirski, stable

My previous comments were still a bit confusing and there was a
typo.  Fix it up.

Reported-by: Peter Zijlstra <peterz@infradead.org>
Fixes: 71b3c126e611 ("x86/mm: Add barriers and document switch_mm()-vs-flush synchronization")
Cc: stable@vger.kernel.org
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/include/asm/mmu_context.h | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index 1edc9cd198b8..e08d27369fce 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -132,14 +132,14 @@ static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next,
 		 * be sent, and CPU 0's TLB will contain a stale entry.)
 		 *
 		 * The bad outcome can occur if either CPU's load is
-		 * reordered before that CPU's store, so both CPUs much
+		 * reordered before that CPU's store, so both CPUs must
 		 * execute full barriers to prevent this from happening.
 		 *
 		 * Thus, switch_mm needs a full barrier between the
 		 * store to mm_cpumask and any operation that could load
 		 * from next->pgd.  This barrier synchronizes with
-		 * remote TLB flushers.  Fortunately, load_cr3 is
-		 * serializing and thus acts as a full barrier.
+		 * remote TLB flushers.  Fortunately, cpumask_set_cpu is
+		 * (on x86) a full barrier, and load_cr3 is serializing.
 		 *
 		 */
 		load_cr3(next->pgd);
@@ -188,9 +188,10 @@ static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next,
 			 * tlb flush IPI delivery. We must reload CR3
 			 * to make sure to use no freed page tables.
 			 *
-			 * As above, this is a barrier that forces
-			 * TLB repopulation to be ordered after the
-			 * store to mm_cpumask.
+			 * As above, either of cpumask_set_cpu and
+			 * load_cr3 is a sufficient barrier to force TLB
+			 * repopulation to be ordered after the store to
+			 * mm_cpumask.
 			 */
 			load_cr3(next->pgd);
 			trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
-- 
2.5.0



* [PATCH v2] x86/mm: Improve switch_mm barrier comments
  2016-01-12 10:21     ` Ingo Molnar
@ 2016-01-12 20:47         ` Andy Lutomirski
  2016-01-12 20:47         ` Andy Lutomirski
  1 sibling, 0 replies; 9+ messages in thread
From: Andy Lutomirski @ 2016-01-12 20:47 UTC (permalink / raw)
  To: peterz, x86
  Cc: linux-kernel, Dave Hansen, Rik van Riel, Brian Gerst,
	Andrew Morton, Denys Vlasenko, Borislav Petkov, Linus Torvalds,
	linux-tip-commits, Andy Lutomirski, stable

My previous comments were still a bit confusing and there was a
typo.  Fix it up.

Reported-by: Peter Zijlstra <peterz@infradead.org>
Fixes: 71b3c126e611 ("x86/mm: Add barriers and document switch_mm()-vs-flush synchronization")
Cc: stable@vger.kernel.org
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---

Changes from v1: Totally different.

arch/x86/include/asm/mmu_context.h | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index 1edc9cd198b8..4fcae1e066f3 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -132,14 +132,16 @@ static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next,
 		 * be sent, and CPU 0's TLB will contain a stale entry.)
 		 *
 		 * The bad outcome can occur if either CPU's load is
-		 * reordered before that CPU's store, so both CPUs much
+		 * reordered before that CPU's store, so both CPUs must
 		 * execute full barriers to prevent this from happening.
 		 *
 		 * Thus, switch_mm needs a full barrier between the
 		 * store to mm_cpumask and any operation that could load
-		 * from next->pgd.  This barrier synchronizes with
-		 * remote TLB flushers.  Fortunately, load_cr3 is
-		 * serializing and thus acts as a full barrier.
+		 * from next->pgd.  TLB fills are special and can happen
+		 * due to instruction fetches or for no reason at all,
+		 * and neither LOCK nor MFENCE orders them.
+		 * Fortunately, load_cr3 is serializing and gives the
+		 * ordering guarantee we need.
 		 *
 		 */
 		load_cr3(next->pgd);
@@ -188,9 +190,8 @@ static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next,
 			 * tlb flush IPI delivery. We must reload CR3
 			 * to make sure to use no freed page tables.
 			 *
-			 * As above, this is a barrier that forces
-			 * TLB repopulation to be ordered after the
-			 * store to mm_cpumask.
+			 * As above, load_cr3 is serializing and orders TLB
+			 * fills with respect to the mm_cpumask write.
 			 */
 			load_cr3(next->pgd);
 			trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
-- 
2.5.0



* [tip:x86/urgent] x86/mm: Improve switch_mm() barrier comments
  2016-01-12 20:47         ` Andy Lutomirski
  (?)
@ 2016-01-14  9:06         ` tip-bot for Andy Lutomirski
  -1 siblings, 0 replies; 9+ messages in thread
From: tip-bot for Andy Lutomirski @ 2016-01-14  9:06 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: hpa, riel, luto, tglx, luto, dvlasenk, linux-kernel, peterz,
	torvalds, mingo, bp, brgerst, dave.hansen

Commit-ID:  4eaffdd5a5fe6ff9f95e1ab4de1ac904d5e0fa8b
Gitweb:     http://git.kernel.org/tip/4eaffdd5a5fe6ff9f95e1ab4de1ac904d5e0fa8b
Author:     Andy Lutomirski <luto@kernel.org>
AuthorDate: Tue, 12 Jan 2016 12:47:40 -0800
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 13 Jan 2016 10:42:49 +0100

x86/mm: Improve switch_mm() barrier comments

My previous comments were still a bit confusing and there was a
typo. Fix it up.

Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Fixes: 71b3c126e611 ("x86/mm: Add barriers and document switch_mm()-vs-flush synchronization")
Link: http://lkml.kernel.org/r/0a0b43cdcdd241c5faaaecfbcc91a155ddedc9a1.1452631609.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/include/asm/mmu_context.h | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index 1edc9cd..bfd9b2a 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -132,14 +132,16 @@ static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next,
 		 * be sent, and CPU 0's TLB will contain a stale entry.)
 		 *
 		 * The bad outcome can occur if either CPU's load is
-		 * reordered before that CPU's store, so both CPUs much
+		 * reordered before that CPU's store, so both CPUs must
 		 * execute full barriers to prevent this from happening.
 		 *
 		 * Thus, switch_mm needs a full barrier between the
 		 * store to mm_cpumask and any operation that could load
-		 * from next->pgd.  This barrier synchronizes with
-		 * remote TLB flushers.  Fortunately, load_cr3 is
-		 * serializing and thus acts as a full barrier.
+		 * from next->pgd.  TLB fills are special and can happen
+		 * due to instruction fetches or for no reason at all,
+		 * and neither LOCK nor MFENCE orders them.
+		 * Fortunately, load_cr3() is serializing and gives the
+		 * ordering guarantee we need.
 		 *
 		 */
 		load_cr3(next->pgd);
@@ -188,9 +190,8 @@ static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next,
 			 * tlb flush IPI delivery. We must reload CR3
 			 * to make sure to use no freed page tables.
 			 *
-			 * As above, this is a barrier that forces
-			 * TLB repopulation to be ordered after the
-			 * store to mm_cpumask.
+			 * As above, load_cr3() is serializing and orders TLB
+			 * fills with respect to the mm_cpumask write.
 			 */
 			load_cr3(next->pgd);
 			trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);

