linux-mm.kvack.org archive mirror
* [PATCH 3.10.y 04/12] x86/mm: Add barriers and document switch_mm()-vs-flush synchronization
       [not found] <1468607194-3879-1-git-send-email-ciwillia@brocade.com>
@ 2016-07-15 18:26 ` Charles (Chas) Williams
  2016-07-16  9:15   ` Willy Tarreau
  2016-07-15 19:08 ` Charles (Chas) Williams
  1 sibling, 1 reply; 4+ messages in thread
From: Charles (Chas) Williams @ 2016-07-15 18:26 UTC (permalink / raw)
  To: stable
  Cc: Andy Lutomirski, Andrew Morton, Andy Lutomirski, Borislav Petkov,
	Brian Gerst, Dave Hansen, Denys Vlasenko, H. Peter Anvin,
	Linus Torvalds, Peter Zijlstra, Rik van Riel, Thomas Gleixner,
	linux-mm, Ingo Molnar, Luis Henriques, Charles (Chas) Williams

From: Andy Lutomirski <luto@kernel.org>

commit 71b3c126e61177eb693423f2e18a1914205b165e upstream.

When switch_mm() activates a new PGD, it also sets a bit that
tells other CPUs that the PGD is in use so that TLB flush IPIs
will be sent.  In order for that to work correctly, the bit
needs to be visible prior to loading the PGD and therefore
starting to fill the local TLB.

Document all the barriers that make this work correctly and add
a couple that were missing.

CVE-2016-2069

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
[ luis: backported to 3.16:
  - dropped N/A comment in flush_tlb_mm_range()
  - adjusted context ]
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
[ciwillia@brocade.com: backported to 3.10: adjusted context]
Signed-off-by: Charles (Chas) Williams <ciwillia@brocade.com>
---
 arch/x86/include/asm/mmu_context.h | 32 +++++++++++++++++++++++++++++++-
 arch/x86/mm/tlb.c                  | 24 +++++++++++++++++++++---
 2 files changed, 52 insertions(+), 4 deletions(-)
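
A quick aside before the diff: the race being closed here is the classic
"store-buffering" pattern. The sketch below is an editorial illustration
only, not code from this patch; pte_write_for(), cpu_bit_visible(),
send_flush_ipi() and the global next_mm are made-up names, while
smp_mb(), cpumask_set_cpu(), mm_cpumask() and load_cr3() are the real
kernel helpers:

    /* CPU 0: a remote flusher (compare the flush_tlb_* paths below). */
    void flusher(void)
    {
            pte_write_for(next_mm);         /* (1) store: update a PTE */
            smp_mb();                       /* orders (1) before (2); in the
                                             * real paths the flush itself
                                             * provides this (INVLPG and CR3
                                             * writes are serializing) */
            if (cpu_bit_visible(next_mm))   /* (2) load: peer's cpumask bit */
                    send_flush_ipi();
    }

    /* CPU 1: switch_mm() switching to next_mm. */
    void switcher(int cpu)
    {
            cpumask_set_cpu(cpu, mm_cpumask(next_mm));      /* (3) store */
            load_cr3(next_mm->pgd);         /* (4) load: the TLB fill reads
                                             * page-table entries; load_cr3()
                                             * is serializing, so it doubles
                                             * as the barrier ordering (3)
                                             * before (4) */
    }

    /*
     * If either CPU's load is reordered before its own store, CPU 0 can
     * observe the cpumask bit clear (so it sends no IPI) while CPU 1 has
     * already filled its TLB from the stale PTE; that is the outcome the
     * barriers forbid.
     */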

diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index be12c53..c0d2f6b 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -42,7 +42,32 @@ static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next,
 #endif
 		cpumask_set_cpu(cpu, mm_cpumask(next));
 
-		/* Re-load page tables */
+		/*
+		 * Re-load page tables.
+		 *
+		 * This logic has an ordering constraint:
+		 *
+		 *  CPU 0: Write to a PTE for 'next'
+		 *  CPU 0: load bit 1 in mm_cpumask.  if nonzero, send IPI.
+		 *  CPU 1: set bit 1 in next's mm_cpumask
+		 *  CPU 1: load from the PTE that CPU 0 writes (implicit)
+		 *
+		 * We need to prevent an outcome in which CPU 1 observes
+		 * the old PTE value and CPU 0 observes bit 1 clear in
+		 * mm_cpumask.  (If that occurs, then the IPI will never
+		 * be sent, and CPU 1's TLB will contain a stale entry.)
+		 *
+		 * The bad outcome can occur if either CPU's load is
+		 * reordered before that CPU's store, so both CPUs must
+		 * execute full barriers to prevent this from happening.
+		 *
+		 * Thus, switch_mm needs a full barrier between the
+		 * store to mm_cpumask and any operation that could load
+		 * from next->pgd.  This barrier synchronizes with
+		 * remote TLB flushers.  Fortunately, load_cr3 is
+		 * serializing and thus acts as a full barrier.
+		 *
+		 */
 		load_cr3(next->pgd);
 
 		/* Stop flush ipis for the previous mm */
@@ -65,10 +90,15 @@ static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next,
 			 * schedule, protecting us from simultaneous changes.
 			 */
 			cpumask_set_cpu(cpu, mm_cpumask(next));
+
 			/*
 			 * We were in lazy tlb mode and leave_mm disabled
 			 * tlb flush IPI delivery. We must reload CR3
 			 * to make sure to use no freed page tables.
+			 *
+			 * As above, this is a barrier that forces
+			 * TLB repopulation to be ordered after the
+			 * store to mm_cpumask.
 			 */
 			load_cr3(next->pgd);
 			load_LDT_nolock(&next->context);
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 282375f..c26b610 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -149,7 +149,9 @@ void flush_tlb_current_task(void)
 
 	preempt_disable();
 
+	/* This is an implicit full barrier that synchronizes with switch_mm. */
 	local_flush_tlb();
+
 	if (cpumask_any_but(mm_cpumask(mm), smp_processor_id()) < nr_cpu_ids)
 		flush_tlb_others(mm_cpumask(mm), mm, 0UL, TLB_FLUSH_ALL);
 	preempt_enable();
@@ -188,11 +190,19 @@ void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
 	unsigned act_entries, tlb_entries = 0;
 
 	preempt_disable();
-	if (current->active_mm != mm)
+	if (current->active_mm != mm) {
+		/* Synchronize with switch_mm. */
+		smp_mb();
+
 		goto flush_all;
+	}
 
 	if (!current->mm) {
 		leave_mm(smp_processor_id());
+
+		/* Synchronize with switch_mm. */
+		smp_mb();
+
 		goto flush_all;
 	}
 
@@ -242,10 +252,18 @@ void flush_tlb_page(struct vm_area_struct *vma, unsigned long start)
 	preempt_disable();
 
 	if (current->active_mm == mm) {
-		if (current->mm)
+		if (current->mm) {
+			/*
+			 * Implicit full barrier (INVLPG) that synchronizes
+			 * with switch_mm.
+			 */
 			__flush_tlb_one(start);
-		else
+		} else {
 			leave_mm(smp_processor_id());
+
+			/* Synchronize with switch_mm. */
+			smp_mb();
+		}
 	}
 
 	if (cpumask_any_but(mm_cpumask(mm), smp_processor_id()) < nr_cpu_ids)
-- 
2.5.5
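
For reference, the shape shared by all three tlb.c flush paths above can
be condensed as follows. This is an editorial summary sketch, not a
function in the tree; flush_path_shape() is a made-up name, and the body
paraphrases the real code in the diff:

    void flush_path_shape(struct mm_struct *mm)
    {
            /* ... the caller has already stored the new PTE ... */

            if (current->active_mm == mm && current->mm) {
                    /* Running in mm: the local flush (CR3 write or INVLPG)
                     * is itself a serializing, full-barrier operation. */
                    local_flush_tlb();
            } else {
                    /* Lazy-TLB or foreign mm: no implicit barrier is
                     * guaranteed on this path, so pair explicitly with
                     * the barrier in switch_mm(). */
                    smp_mb();
            }

            /* Only now is mm_cpumask(mm) trustworthy: any CPU whose bit is
             * still clear here is guaranteed, by switch_mm()'s own barrier,
             * to read the new PTE when it next fills its TLB. */
            if (cpumask_any_but(mm_cpumask(mm), smp_processor_id()) < nr_cpu_ids)
                    flush_tlb_others(mm_cpumask(mm), mm, 0UL, TLB_FLUSH_ALL);
    }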


* Re: [PATCH 3.10.y 04/12] x86/mm: Add barriers and document switch_mm()-vs-flush synchronization
  2016-07-15 18:26 ` [PATCH 3.10.y 04/12] x86/mm: Add barriers and document switch_mm()-vs-flush synchronization Charles (Chas) Williams
@ 2016-07-16  9:15   ` Willy Tarreau
  2016-07-16 17:04     ` Charles (Chas) Williams
  0 siblings, 1 reply; 4+ messages in thread
From: Willy Tarreau @ 2016-07-16  9:15 UTC (permalink / raw)
  To: Charles (Chas) Williams
  Cc: stable, Andy Lutomirski, Andrew Morton, Andy Lutomirski,
	Borislav Petkov, Brian Gerst, Dave Hansen, Denys Vlasenko,
	H. Peter Anvin, Linus Torvalds, Peter Zijlstra, Rik van Riel,
	Thomas Gleixner, linux-mm, Ingo Molnar, Luis Henriques

Hi Chas,

On Fri, Jul 15, 2016 at 02:26:26PM -0400, Charles (Chas) Williams wrote:
> From: Andy Lutomirski <luto@kernel.org>
> 
> commit 71b3c126e61177eb693423f2e18a1914205b165e upstream.
> 
> When switch_mm() activates a new PGD, it also sets a bit that
> tells other CPUs that the PGD is in use so that TLB flush IPIs
> will be sent.  In order for that to work correctly, the bit
> needs to be visible prior to loading the PGD and therefore
> starting to fill the local TLB.
> 
> Document all the barriers that make this work correctly and add
> a couple that were missing.
> 
> CVE-2016-2069

I'm fine with queuing these patches for 3.10, but patches 4, 9 and 12
of your series are not in 3.14, and I only apply patches to 3.10 if
they are already present in 3.14 (or if there's a good reason of course).
Could you please check whether you have already submitted them? If so,
I'll just wait for them to pop up there. It's important for us to ensure
that users
upgrading from extended LTS kernels to normal LTS kernels are never hit
by a bug that was previously fixed in the older one and not yet in the
newer one.

Thanks,
Willy


* Re: [PATCH 3.10.y 04/12] x86/mm: Add barriers and document switch_mm()-vs-flush synchronization
  2016-07-16  9:15   ` Willy Tarreau
@ 2016-07-16 17:04     ` Charles (Chas) Williams
  0 siblings, 0 replies; 4+ messages in thread
From: Charles (Chas) Williams @ 2016-07-16 17:04 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: stable, Andy Lutomirski, Andrew Morton, Andy Lutomirski,
	Borislav Petkov, Brian Gerst, Dave Hansen, Denys Vlasenko,
	H. Peter Anvin, Linus Torvalds, Peter Zijlstra, Rik van Riel,
	Thomas Gleixner, linux-mm, Ingo Molnar, Luis Henriques

I didn't submit these for 3.14; I will do so on Monday.

On 07/16/2016 05:15 AM, Willy Tarreau wrote:
> Hi Chas,
>
> On Fri, Jul 15, 2016 at 02:26:26PM -0400, Charles (Chas) Williams wrote:
>> From: Andy Lutomirski <luto@kernel.org>
>>
>> commit 71b3c126e61177eb693423f2e18a1914205b165e upstream.
>>
>> When switch_mm() activates a new PGD, it also sets a bit that
>> tells other CPUs that the PGD is in use so that TLB flush IPIs
>> will be sent.  In order for that to work correctly, the bit
>> needs to be visible prior to loading the PGD and therefore
>> starting to fill the local TLB.
>>
>> Document all the barriers that make this work correctly and add
>> a couple that were missing.
>>
>> CVE-2016-2069
>
> I'm fine with queuing these patches for 3.10, but patches 4, 9 and 12
> of your series are not in 3.14, and I only apply patches to 3.10 if
> they are already present in 3.14 (or if there's a good reason of course).
> Could you please check whether you have already submitted them? If so,
> I'll just wait for them to pop up there. It's important for us to ensure
> that users
> upgrading from extended LTS kernels to normal LTS kernels are never hit
> by a bug that was previously fixed in the older one and not yet in the
> newer one.
>
> Thanks,
> Willy
>
