* Re: [tip:x86/urgent] x86/mm: Add barriers and document switch_mm() -vs-flush synchronization
From: Peter Zijlstra @ 2016-01-11 18:25 UTC
To: linux-kernel, dave.hansen, riel, brgerst, akpm, luto, mingo, dvlasenk, hpa, tglx, bp, luto, torvalds
Cc: linux-tip-commits

On Mon, Jan 11, 2016 at 03:42:40AM -0800, tip-bot for Andy Lutomirski wrote:
> --- a/arch/x86/include/asm/mmu_context.h
> +++ b/arch/x86/include/asm/mmu_context.h
> @@ -116,8 +116,34 @@ static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next,
>  #endif
>  	cpumask_set_cpu(cpu, mm_cpumask(next));
>
> -	/* Re-load page tables */
> +	/*
> +	 * Re-load page tables.
> +	 *
> +	 * This logic has an ordering constraint:
> +	 *
> +	 *  CPU 0: Write to a PTE for 'next'
> +	 *  CPU 0: load bit 1 in mm_cpumask.  if nonzero, send IPI.
> +	 *  CPU 1: set bit 1 in next's mm_cpumask
> +	 *  CPU 1: load from the PTE that CPU 0 writes (implicit)
> +	 *
> +	 * We need to prevent an outcome in which CPU 1 observes
> +	 * the new PTE value and CPU 0 observes bit 1 clear in
> +	 * mm_cpumask.  (If that occurs, then the IPI will never
> +	 * be sent, and CPU 0's TLB will contain a stale entry.)
> +	 *
> +	 * The bad outcome can occur if either CPU's load is
> +	 * reordered before that CPU's store, so both CPUs much

s/much/must/ ?

> +	 * execute full barriers to prevent this from happening.
> +	 *
> +	 * Thus, switch_mm needs a full barrier between the
> +	 * store to mm_cpumask and any operation that could load
> +	 * from next->pgd.  This barrier synchronizes with
> +	 * remote TLB flushers.  Fortunately, load_cr3 is
> +	 * serializing and thus acts as a full barrier.
> +	 *
> +	 */
>  	load_cr3(next->pgd);
> +
>  	trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
>
>  	/* Stop flush ipis for the previous mm */
> @@ -156,10 +182,15 @@ static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next,
>  		 * schedule, protecting us from simultaneous changes.
>  		 */
>  		cpumask_set_cpu(cpu, mm_cpumask(next));
> +
>  		/*
>  		 * We were in lazy tlb mode and leave_mm disabled
>  		 * tlb flush IPI delivery. We must reload CR3
>  		 * to make sure to use no freed page tables.
> +		 *
> +		 * As above, this is a barrier that forces
> +		 * TLB repopulation to be ordered after the
> +		 * store to mm_cpumask.

somewhat confused by this comment, cpumask_set_cpu() is a LOCK BTS, that
is already fully ordered.

>  		 */
>  		load_cr3(next->pgd);
>  		trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
> diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
> index 8ddb5d0..8f4cc3d 100644
> --- a/arch/x86/mm/tlb.c
> +++ b/arch/x86/mm/tlb.c
> @@ -188,17 +191,29 @@ void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
>  		if (!current->mm) {
>  			leave_mm(smp_processor_id());
> +
> +			/* Synchronize with switch_mm. */
> +			smp_mb();
> +
>  			goto out;
>  		}
> +	} else {
>  		leave_mm(smp_processor_id());
> +
> +		/* Synchronize with switch_mm. */
> +		smp_mb();
> +	}
>  }

The alternative is making leave_mm() unconditionally imply a full
barrier. I've not looked at other sites using it though.
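To make the ordering requirement in the quoted comment concrete, here is a
minimal C-style sketch of the two racing paths: CPU 0 acts as a remote TLB
flusher and CPU 1 runs switch_mm().  The helpers set_new_pte() and
send_flush_ipi() are illustrative stand-ins rather than the real kernel entry
points; only the store/barrier/load shape matters.

    /* CPU 0: publish a new PTE for 'next', then decide whether to IPI CPU 1 */
    set_new_pte(ptep, new_pte);                   /* store                          */
    smp_mb();                                     /* order the store vs. the load below */
    if (cpumask_test_cpu(1, mm_cpumask(next)))    /* load                           */
            send_flush_ipi(1);                    /* bit clear => no IPI is sent    */

    /* CPU 1: switch_mm() switching to 'next' */
    cpumask_set_cpu(1, mm_cpumask(next));         /* store                          */
    /* a full barrier is needed here; the serializing load_cr3() provides it        */
    load_cr3(next->pgd);                          /* loads from next->pgd, TLB fills */

This is the classic store-buffering pattern: neither CPU's load may be
satisfied before its own store is visible to the other CPU, which is why the
comment asks for a full barrier on both sides.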
* Re: [tip:x86/urgent] x86/mm: Add barriers and document switch_mm() -vs-flush synchronization
From: Andy Lutomirski @ 2016-01-11 21:50 UTC
To: Peter Zijlstra
Cc: linux-kernel, Dave Hansen, Rik van Riel, Brian Gerst, Andrew Morton, Ingo Molnar,
    Denys Vlasenko, H. Peter Anvin, Thomas Gleixner, Borislav Petkov, Andrew Lutomirski,
    Linus Torvalds, linux-tip-commits

On Mon, Jan 11, 2016 at 10:25 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Mon, Jan 11, 2016 at 03:42:40AM -0800, tip-bot for Andy Lutomirski wrote:
>> --- a/arch/x86/include/asm/mmu_context.h
>> +++ b/arch/x86/include/asm/mmu_context.h
>> @@ -116,8 +116,34 @@ static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next,
>>  #endif
>>  	cpumask_set_cpu(cpu, mm_cpumask(next));
>>
>> -	/* Re-load page tables */
>> +	/*
>> +	 * Re-load page tables.
>> +	 *
>> +	 * This logic has an ordering constraint:
>> +	 *
>> +	 *  CPU 0: Write to a PTE for 'next'
>> +	 *  CPU 0: load bit 1 in mm_cpumask.  if nonzero, send IPI.
>> +	 *  CPU 1: set bit 1 in next's mm_cpumask
>> +	 *  CPU 1: load from the PTE that CPU 0 writes (implicit)
>> +	 *
>> +	 * We need to prevent an outcome in which CPU 1 observes
>> +	 * the new PTE value and CPU 0 observes bit 1 clear in
>> +	 * mm_cpumask.  (If that occurs, then the IPI will never
>> +	 * be sent, and CPU 0's TLB will contain a stale entry.)
>> +	 *
>> +	 * The bad outcome can occur if either CPU's load is
>> +	 * reordered before that CPU's store, so both CPUs much
>
> s/much/must/ ?

Indeed.  Is this worth a follow-up patch?

>
>> +	 * execute full barriers to prevent this from happening.
>> +	 *
>> +	 * Thus, switch_mm needs a full barrier between the
>> +	 * store to mm_cpumask and any operation that could load
>> +	 * from next->pgd.  This barrier synchronizes with
>> +	 * remote TLB flushers.  Fortunately, load_cr3 is
>> +	 * serializing and thus acts as a full barrier.
>> +	 *
>> +	 */
>>  	load_cr3(next->pgd);
>> +
>>  	trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
>>
>>  	/* Stop flush ipis for the previous mm */
>> @@ -156,10 +182,15 @@ static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next,
>>  		 * schedule, protecting us from simultaneous changes.
>>  		 */
>>  		cpumask_set_cpu(cpu, mm_cpumask(next));
>> +
>>  		/*
>>  		 * We were in lazy tlb mode and leave_mm disabled
>>  		 * tlb flush IPI delivery. We must reload CR3
>>  		 * to make sure to use no freed page tables.
>> +		 *
>> +		 * As above, this is a barrier that forces
>> +		 * TLB repopulation to be ordered after the
>> +		 * store to mm_cpumask.
>
> somewhat confused by this comment, cpumask_set_cpu() is a LOCK BTS, that
> is already fully ordered.

There are more than enough barriers here.  v1 had cpumask_set_cpu;
smp_mb__after_atomic, which is more portable and generates identical
code.  I don't have a real preference for which barrier we should
consider to be the important one.

>
>>  		 */
>>  		load_cr3(next->pgd);
>>  		trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
>> diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
>> index 8ddb5d0..8f4cc3d 100644
>
>
>> --- a/arch/x86/mm/tlb.c
>> +++ b/arch/x86/mm/tlb.c
>
>> @@ -188,17 +191,29 @@ void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
>
>>  		if (!current->mm) {
>>  			leave_mm(smp_processor_id());
>> +
>> +			/* Synchronize with switch_mm. */
>> +			smp_mb();
>> +
>>  			goto out;
>>  		}
>
>> +	} else {
>>  		leave_mm(smp_processor_id());
>> +
>> +		/* Synchronize with switch_mm. */
>> +		smp_mb();
>> +	}
>>  }
>
> The alternative is making leave_mm() unconditionally imply a full
> barrier. I've not looked at other sites using it though.

For a quick fix, I preferred the more self-contained change.

--Andy
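For reference, a rough sketch of the alternative Peter mentions -- having
leave_mm() itself provide the full barrier so that every caller inherits it --
modelled loosely on the arch/x86/mm/tlb.c code of that era.  This is only an
illustration of the idea, not a patch that was posted in this thread:

    void leave_mm(int cpu)
    {
            struct mm_struct *active_mm = this_cpu_read(cpu_tlbstate.active_mm);

            if (this_cpu_read(cpu_tlbstate.state) == TLBSTATE_OK)
                    BUG();
            if (cpumask_test_cpu(cpu, mm_cpumask(active_mm))) {
                    cpumask_clear_cpu(cpu, mm_cpumask(active_mm));
                    load_cr3(swapper_pg_dir);
                    trace_tlb_flush_rcuidle(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
            }
            /*
             * Unconditionally pair with switch_mm(): callers such as
             * flush_tlb_mm_range() would then not need their own smp_mb()
             * after calling leave_mm().
             */
            smp_mb();
    }

The fix that was actually merged kept the smp_mb() calls at the
flush_tlb_mm_range() call sites instead, as Andy explains above.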
* Re: [tip:x86/urgent] x86/mm: Add barriers and document switch_mm() -vs-flush synchronization
From: Peter Zijlstra @ 2016-01-11 22:22 UTC
To: Andy Lutomirski
Cc: linux-kernel, Dave Hansen, Rik van Riel, Brian Gerst, Andrew Morton, Ingo Molnar,
    Denys Vlasenko, H. Peter Anvin, Thomas Gleixner, Borislav Petkov, Andrew Lutomirski,
    Linus Torvalds, linux-tip-commits

On Mon, Jan 11, 2016 at 01:50:24PM -0800, Andy Lutomirski wrote:
> On Mon, Jan 11, 2016 at 10:25 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> > On Mon, Jan 11, 2016 at 03:42:40AM -0800, tip-bot for Andy Lutomirski wrote:
> >> +	 * The bad outcome can occur if either CPU's load is
> >> +	 * reordered before that CPU's store, so both CPUs much
> >
> > s/much/must/ ?
>
> Indeed.  Is this worth a follow-up patch?

Dunno, I didn't even spot the typo the first time I read it.. :-)
* Re: [tip:x86/urgent] x86/mm: Add barriers and document switch_mm() -vs-flush synchronization
From: Ingo Molnar @ 2016-01-12 10:21 UTC
To: Andy Lutomirski
Cc: Peter Zijlstra, linux-kernel, Dave Hansen, Rik van Riel, Brian Gerst, Andrew Morton,
    Denys Vlasenko, H. Peter Anvin, Thomas Gleixner, Borislav Petkov, Andrew Lutomirski,
    Linus Torvalds, linux-tip-commits

* Andy Lutomirski <luto@amacapital.net> wrote:

> On Mon, Jan 11, 2016 at 10:25 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> > On Mon, Jan 11, 2016 at 03:42:40AM -0800, tip-bot for Andy Lutomirski wrote:
> >> --- a/arch/x86/include/asm/mmu_context.h
> >> +++ b/arch/x86/include/asm/mmu_context.h
> >> @@ -116,8 +116,34 @@ static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next,
> >>  #endif
> >>  	cpumask_set_cpu(cpu, mm_cpumask(next));
> >>
> >> -	/* Re-load page tables */
> >> +	/*
> >> +	 * Re-load page tables.
> >> +	 *
> >> +	 * This logic has an ordering constraint:
> >> +	 *
> >> +	 *  CPU 0: Write to a PTE for 'next'
> >> +	 *  CPU 0: load bit 1 in mm_cpumask.  if nonzero, send IPI.
> >> +	 *  CPU 1: set bit 1 in next's mm_cpumask
> >> +	 *  CPU 1: load from the PTE that CPU 0 writes (implicit)
> >> +	 *
> >> +	 * We need to prevent an outcome in which CPU 1 observes
> >> +	 * the new PTE value and CPU 0 observes bit 1 clear in
> >> +	 * mm_cpumask.  (If that occurs, then the IPI will never
> >> +	 * be sent, and CPU 0's TLB will contain a stale entry.)
> >> +	 *
> >> +	 * The bad outcome can occur if either CPU's load is
> >> +	 * reordered before that CPU's store, so both CPUs much
> >
> > s/much/must/ ?
>
> Indeed.  Is this worth a follow-up patch?

Absolutely! Any typos in code noticed by humans are worth fixing, especially
when it's comments around tricky code.

Could be done together with improving this part of the comments:

> > +
> >  	/*
> >  	 * We were in lazy tlb mode and leave_mm disabled
> >  	 * tlb flush IPI delivery. We must reload CR3
> >  	 * to make sure to use no freed page tables.
> > +	 *
> > +	 * As above, this is a barrier that forces
> > +	 * TLB repopulation to be ordered after the
> > +	 * store to mm_cpumask.
>
> somewhat confused by this comment, cpumask_set_cpu() is a LOCK BTS, that is
> already fully ordered.

... as pretty much any barrier-related comment that confuses Peter needs to be
improved, by definition. ;-)

Thanks,

	Ingo
* [PATCH] x86/mm: Improve switch_mm barrier comments
From: Andy Lutomirski @ 2016-01-12 20:11 UTC
To: peterz, x86
Cc: linux-kernel, Dave Hansen, Rik van Riel, Brian Gerst, Andrew Morton, Denys Vlasenko,
    Borislav Petkov, Linus Torvalds, linux-tip-commits, Andy Lutomirski, stable

My previous comments were still a bit confusing and there was a typo.
Fix it up.

Reported-by: Peter Zijlstra <peterz@infradead.org>
Fixes: 71b3c126e611 ("x86/mm: Add barriers and document switch_mm()-vs-flush synchronization")
Cc: stable@vger.kernel.org
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/include/asm/mmu_context.h | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index 1edc9cd198b8..e08d27369fce 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -132,14 +132,14 @@ static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next,
 	 * be sent, and CPU 0's TLB will contain a stale entry.)
 	 *
 	 * The bad outcome can occur if either CPU's load is
-	 * reordered before that CPU's store, so both CPUs much
+	 * reordered before that CPU's store, so both CPUs must
 	 * execute full barriers to prevent this from happening.
 	 *
 	 * Thus, switch_mm needs a full barrier between the
 	 * store to mm_cpumask and any operation that could load
 	 * from next->pgd.  This barrier synchronizes with
-	 * remote TLB flushers.  Fortunately, load_cr3 is
-	 * serializing and thus acts as a full barrier.
+	 * remote TLB flushers.  Fortunately, cpumask_set_cpu is
+	 * (on x86) a full barrier, and load_cr3 is serializing.
 	 *
 	 */
 	load_cr3(next->pgd);
@@ -188,9 +188,10 @@ static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next,
 	 * tlb flush IPI delivery. We must reload CR3
 	 * to make sure to use no freed page tables.
 	 *
-	 * As above, this is a barrier that forces
-	 * TLB repopulation to be ordered after the
-	 * store to mm_cpumask.
+	 * As above, either of cpumask_set_cpu and
+	 * load_cr3 is a sufficient barrier to force TLB
+	 * repopulation to be ordered after the store to
+	 * mm_cpumask.
 	 */
 	load_cr3(next->pgd);
 	trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
--
2.5.0
* [PATCH v2] x86/mm: Improve switch_mm barrier comments
From: Andy Lutomirski @ 2016-01-12 20:47 UTC
To: peterz, x86
Cc: linux-kernel, Dave Hansen, Rik van Riel, Brian Gerst, Andrew Morton, Denys Vlasenko,
    Borislav Petkov, Linus Torvalds, linux-tip-commits, Andy Lutomirski, stable

My previous comments were still a bit confusing and there was a typo.
Fix it up.

Reported-by: Peter Zijlstra <peterz@infradead.org>
Fixes: 71b3c126e611 ("x86/mm: Add barriers and document switch_mm()-vs-flush synchronization")
Cc: stable@vger.kernel.org
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---

Changes from v1: Totally different.

 arch/x86/include/asm/mmu_context.h | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index 1edc9cd198b8..4fcae1e066f3 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -132,14 +132,16 @@ static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next,
 	 * be sent, and CPU 0's TLB will contain a stale entry.)
 	 *
 	 * The bad outcome can occur if either CPU's load is
-	 * reordered before that CPU's store, so both CPUs much
+	 * reordered before that CPU's store, so both CPUs must
 	 * execute full barriers to prevent this from happening.
 	 *
 	 * Thus, switch_mm needs a full barrier between the
 	 * store to mm_cpumask and any operation that could load
-	 * from next->pgd.  This barrier synchronizes with
-	 * remote TLB flushers.  Fortunately, load_cr3 is
-	 * serializing and thus acts as a full barrier.
+	 * from next->pgd.  TLB fills are special and can happen
+	 * due to instruction fetches or for no reason at all,
+	 * and neither LOCK nor MFENCE orders them.
+	 * Fortunately, load_cr3 is serializing and gives the
+	 * ordering guarantee we need.
 	 *
 	 */
 	load_cr3(next->pgd);
@@ -188,9 +190,8 @@ static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next,
 	 * tlb flush IPI delivery. We must reload CR3
 	 * to make sure to use no freed page tables.
 	 *
-	 * As above, this is a barrier that forces
-	 * TLB repopulation to be ordered after the
-	 * store to mm_cpumask.
+	 * As above, load_cr3 is serializing and orders TLB
+	 * fills with respect to the mm_cpumask write.
 	 */
 	load_cr3(next->pgd);
 	trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
--
2.5.0
* [tip:x86/urgent] x86/mm: Improve switch_mm() barrier comments
From: tip-bot for Andy Lutomirski @ 2016-01-14 9:06 UTC
To: linux-tip-commits
Cc: hpa, riel, luto, tglx, luto, dvlasenk, linux-kernel, peterz, torvalds, mingo, bp, brgerst, dave.hansen

Commit-ID:  4eaffdd5a5fe6ff9f95e1ab4de1ac904d5e0fa8b
Gitweb:     http://git.kernel.org/tip/4eaffdd5a5fe6ff9f95e1ab4de1ac904d5e0fa8b
Author:     Andy Lutomirski <luto@kernel.org>
AuthorDate: Tue, 12 Jan 2016 12:47:40 -0800
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 13 Jan 2016 10:42:49 +0100

x86/mm: Improve switch_mm() barrier comments

My previous comments were still a bit confusing and there was a typo.
Fix it up.

Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Fixes: 71b3c126e611 ("x86/mm: Add barriers and document switch_mm()-vs-flush synchronization")
Link: http://lkml.kernel.org/r/0a0b43cdcdd241c5faaaecfbcc91a155ddedc9a1.1452631609.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/include/asm/mmu_context.h | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index 1edc9cd..bfd9b2a 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -132,14 +132,16 @@ static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next,
 	 * be sent, and CPU 0's TLB will contain a stale entry.)
 	 *
 	 * The bad outcome can occur if either CPU's load is
-	 * reordered before that CPU's store, so both CPUs much
+	 * reordered before that CPU's store, so both CPUs must
 	 * execute full barriers to prevent this from happening.
 	 *
 	 * Thus, switch_mm needs a full barrier between the
 	 * store to mm_cpumask and any operation that could load
-	 * from next->pgd.  This barrier synchronizes with
-	 * remote TLB flushers.  Fortunately, load_cr3 is
-	 * serializing and thus acts as a full barrier.
+	 * from next->pgd.  TLB fills are special and can happen
+	 * due to instruction fetches or for no reason at all,
+	 * and neither LOCK nor MFENCE orders them.
+	 * Fortunately, load_cr3() is serializing and gives the
+	 * ordering guarantee we need.
 	 *
 	 */
 	load_cr3(next->pgd);
@@ -188,9 +190,8 @@ static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next,
 	 * tlb flush IPI delivery. We must reload CR3
 	 * to make sure to use no freed page tables.
 	 *
-	 * As above, this is a barrier that forces
-	 * TLB repopulation to be ordered after the
-	 * store to mm_cpumask.
+	 * As above, load_cr3() is serializing and orders TLB
+	 * fills with respect to the mm_cpumask write.
 	 */
 	load_cr3(next->pgd);
 	trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);