From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Andy Lutomirski <luto@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Andy Lutomirski <luto@amacapital.net>,
	Borislav Petkov <bp@alien8.de>, Brian Gerst <brgerst@gmail.com>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Denys Vlasenko <dvlasenk@redhat.com>,
	"H. Peter Anvin" <hpa@zytor.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Rik van Riel <riel@redhat.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	linux-mm@kvack.org, Ingo Molnar <mingo@kernel.org>,
	Luis Henriques <luis.henriques@canonical.com>,
	"Charles (Chas) Williams" <ciwillia@brocade.com>
Subject: [PATCH 3.14 04/29] x86/mm: Add barriers and document switch_mm()-vs-flush synchronization
Date: Sun, 14 Aug 2016 22:07:32 +0200
Message-ID: <20160814200731.625036088@linuxfoundation.org>
In-Reply-To: <20160814200731.375346059@linuxfoundation.org>

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Andy Lutomirski <luto@kernel.org>

commit 71b3c126e61177eb693423f2e18a1914205b165e upstream.

When switch_mm() activates a new PGD, it also sets a bit that
tells other CPUs that the PGD is in use so that TLB flush IPIs
will be sent.  In order for that to work correctly, the bit
needs to be visible prior to loading the PGD and therefore
starting to fill the local TLB.

Document all the barriers that make this work correctly and add
a couple that were missing.
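
The ordering problem is the classic store-buffering pattern: each CPU
stores to one location and then loads the other, and a store followed by
a load from a different address is the one reordering that x86's memory
model permits.  As a reading aid, here is a minimal sketch of the two
sides (illustrative only; send_ipi() and use_pte() are placeholder
names, not kernel APIs):

  /* CPU 0: the flusher, e.g. flush_tlb_mm_range() */
  WRITE_ONCE(pte, new_pte);          /* store: publish the PTE change    */
  smp_mb();                          /* keep the store before the load   */
  if (READ_ONCE(mm_cpumask_bit))     /* load: is the mm active on CPU 1? */
          send_ipi();                /* if so, tell CPU 1 to flush       */

  /* CPU 1: switch_mm() */
  WRITE_ONCE(mm_cpumask_bit, 1);     /* store: cpumask_set_cpu()         */
  smp_mb();                          /* supplied by load_cr3() below     */
  use_pte(READ_ONCE(pte));           /* load: the TLB fill               */

If either barrier is missing, that CPU's load can complete before its
store becomes visible; both loads may then miss both stores, the IPI is
skipped, and a stale translation survives in a TLB.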

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
[ luis: backported to 3.16:
  - dropped N/A comment in flush_tlb_mm_range()
  - adjusted context ]
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
[ciwillia@brocade.com: backported to 3.14: adjusted context]
Signed-off-by: Charles (Chas) Williams <ciwillia@brocade.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/include/asm/mmu_context.h |   32 +++++++++++++++++++++++++++++++-
 arch/x86/mm/tlb.c                  |   25 ++++++++++++++++++++++---
 2 files changed, 53 insertions(+), 4 deletions(-)

--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -42,7 +42,32 @@ static inline void switch_mm(struct mm_s
 #endif
 		cpumask_set_cpu(cpu, mm_cpumask(next));
 
-		/* Re-load page tables */
+		/*
+		 * Re-load page tables.
+		 *
+		 * This logic has an ordering constraint:
+		 *
+		 *  CPU 0: Write to a PTE for 'next'
+		 *  CPU 0: load bit 1 in mm_cpumask.  If nonzero, send IPI.
+		 *  CPU 1: set bit 1 in next's mm_cpumask
+		 *  CPU 1: load from the PTE that CPU 0 writes (implicit)
+		 *
+		 * We need to prevent an outcome in which CPU 1 observes
+		 * the new PTE value and CPU 0 observes bit 1 clear in
+		 * mm_cpumask.  (If that occurs, then the IPI will never
+		 * be sent, and CPU 0's TLB will contain a stale entry.)
+		 *
+		 * The bad outcome can occur if either CPU's load is
+		 * reordered before that CPU's store, so both CPUs must
+		 * execute full barriers to prevent this from happening.
+		 *
+		 * Thus, switch_mm needs a full barrier between the
+		 * store to mm_cpumask and any operation that could load
+		 * from next->pgd.  This barrier synchronizes with
+		 * remote TLB flushers.  Fortunately, load_cr3 is
+		 * serializing and thus acts as a full barrier.
+		 *
+		 */
 		load_cr3(next->pgd);
 
 		/* Stop flush ipis for the previous mm */
@@ -65,10 +90,15 @@ static inline void switch_mm(struct mm_s
 			 * schedule, protecting us from simultaneous changes.
 			 */
 			cpumask_set_cpu(cpu, mm_cpumask(next));
+
 			/*
 			 * We were in lazy tlb mode and leave_mm disabled
 			 * tlb flush IPI delivery. We must reload CR3
 			 * to make sure to use no freed page tables.
+			 *
+			 * As above, this is a barrier that forces
+			 * TLB repopulation to be ordered after the
+			 * store to mm_cpumask.
 			 */
 			load_cr3(next->pgd);
 			load_LDT_nolock(&next->context);
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -152,7 +152,10 @@ void flush_tlb_current_task(void)
 	preempt_disable();
 
 	count_vm_tlb_event(NR_TLB_LOCAL_FLUSH_ALL);
+
+	/* This is an implicit full barrier that synchronizes with switch_mm. */
 	local_flush_tlb();
+
 	if (cpumask_any_but(mm_cpumask(mm), smp_processor_id()) < nr_cpu_ids)
 		flush_tlb_others(mm_cpumask(mm), mm, 0UL, TLB_FLUSH_ALL);
 	preempt_enable();
@@ -166,11 +169,19 @@ void flush_tlb_mm_range(struct mm_struct
 	unsigned long nr_base_pages;
 
 	preempt_disable();
-	if (current->active_mm != mm)
+	if (current->active_mm != mm) {
+		/* Synchronize with switch_mm. */
+		smp_mb();
+
 		goto flush_all;
+	}
 
 	if (!current->mm) {
 		leave_mm(smp_processor_id());
+
+		/* Synchronize with switch_mm. */
+		smp_mb();
+
 		goto flush_all;
 	}
 
@@ -222,10 +233,18 @@ void flush_tlb_page(struct vm_area_struc
 	preempt_disable();
 
 	if (current->active_mm == mm) {
-		if (current->mm)
+		if (current->mm) {
+			/*
+			 * Implicit full barrier (INVLPG) that synchronizes
+			 * with switch_mm.
+			 */
 			__flush_tlb_one(start);
-		else
+		} else {
 			leave_mm(smp_processor_id());
+
+			/* Synchronize with switch_mm. */
+			smp_mb();
+		}
 	}
 
 	if (cpumask_any_but(mm_cpumask(mm), smp_processor_id()) < nr_cpu_ids)

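Because store-buffering is a reordering that real x86 hardware performs,
the race can even be demonstrated from user space.  The following
self-contained program is a teaching aid under stated assumptions (C11
atomics and pthreads; it is not kernel code and not part of this patch).
With relaxed atomics, the "both loads saw zero" outcome (no IPI sent,
stale TLB fill) can appear; upgrading the marked relaxed operations to
memory_order_seq_cst plays the role of the smp_mb()/load_cr3() barriers
above and forbids it:

  #include <pthread.h>
  #include <stdatomic.h>
  #include <stdio.h>

  /* Stand-ins for the PTE write and the mm_cpumask bit. */
  static atomic_int pte, bit, go;
  static int r0, r1;                   /* what each "CPU" observed */

  static void *flusher(void *unused)   /* plays the CPU 0 role */
  {
          while (!atomic_load(&go))
                  ;
          atomic_store_explicit(&pte, 1, memory_order_relaxed);
          /* smp_mb() would go here */
          r0 = atomic_load_explicit(&bit, memory_order_relaxed);
          return NULL;
  }

  static void *switcher(void *unused)  /* plays the switch_mm() role */
  {
          while (!atomic_load(&go))
                  ;
          atomic_store_explicit(&bit, 1, memory_order_relaxed);
          /* load_cr3()'s implied barrier would go here */
          r1 = atomic_load_explicit(&pte, memory_order_relaxed);
          return NULL;
  }

  int main(void)
  {
          long lost_flushes = 0;

          for (int i = 0; i < 100000; i++) {
                  pthread_t a, b;

                  atomic_store(&pte, 0);
                  atomic_store(&bit, 0);
                  atomic_store(&go, 0);
                  pthread_create(&a, NULL, flusher, NULL);
                  pthread_create(&b, NULL, switcher, NULL);
                  atomic_store(&go, 1);    /* release both threads */
                  pthread_join(a, NULL);
                  pthread_join(b, NULL);
                  if (r0 == 0 && r1 == 0)  /* no IPI *and* stale TLB fill */
                          lost_flushes++;
          }
          printf("store-buffering outcome: %ld of 100000 runs\n",
                 lost_flushes);
          return 0;
  }

Build with something like "cc -O2 -pthread sb_demo.c" (the file name is
arbitrary).  On x86 a nonzero count is common, because the plain MOV
store and the later MOV load may be reordered; a seq_cst store compiles
to an mfence or xchg, which prevents exactly that.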