From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-oi0-f72.google.com (mail-oi0-f72.google.com [209.85.218.72]) by kanga.kvack.org (Postfix) with ESMTP id D9CC66B0292 for ; Tue, 25 Jul 2017 00:41:42 -0400 (EDT) Received: by mail-oi0-f72.google.com with SMTP id z19so5607969oia.13 for ; Mon, 24 Jul 2017 21:41:42 -0700 (PDT) Received: from mail.kernel.org (mail.kernel.org. [198.145.29.99]) by mx.google.com with ESMTPS id l197si1889831oib.40.2017.07.24.21.41.41 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 24 Jul 2017 21:41:41 -0700 (PDT) From: Andy Lutomirski Subject: [PATCH v5 0/2] x86/mm: PCID Date: Mon, 24 Jul 2017 21:41:37 -0700 Message-Id: MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Sender: owner-linux-mm@kvack.org List-ID: To: x86@kernel.org Cc: linux-kernel@vger.kernel.org, Borislav Petkov , Linus Torvalds , Andrew Morton , Mel Gorman , "linux-mm@kvack.org" , Nadav Amit , Rik van Riel , Dave Hansen , Arjan van de Ven , Peter Zijlstra , Andy Lutomirski Here's PCID v5. Changes from v4: - Remove smp_mb__after_atomic() (Peterz) - Rebase, which involved tiny fixups due to SME - Add the doc patch, as promised Andy Lutomirski (2): x86/mm: Try to preserve old TLB entries using PCID x86/mm: Improve TLB flush documentation arch/x86/include/asm/mmu_context.h | 3 + arch/x86/include/asm/processor-flags.h | 2 + arch/x86/include/asm/tlbflush.h | 18 ++++- arch/x86/mm/init.c | 1 + arch/x86/mm/tlb.c | 123 ++++++++++++++++++++++++++------- 5 files changed, 119 insertions(+), 28 deletions(-) -- 2.9.4 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-oi0-f70.google.com (mail-oi0-f70.google.com [209.85.218.70]) by kanga.kvack.org (Postfix) with ESMTP id 64A1C6B02B4 for ; Tue, 25 Jul 2017 00:41:44 -0400 (EDT) Received: by mail-oi0-f70.google.com with SMTP id p62so10234446oih.12 for ; Mon, 24 Jul 2017 21:41:44 -0700 (PDT) Received: from mail.kernel.org (mail.kernel.org. [198.145.29.99]) by mx.google.com with ESMTPS id j7si6123661oif.197.2017.07.24.21.41.43 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 24 Jul 2017 21:41:43 -0700 (PDT) From: Andy Lutomirski Subject: [PATCH v5 2/2] x86/mm: Improve TLB flush documentation Date: Mon, 24 Jul 2017 21:41:39 -0700 Message-Id: <695299daa67239284e8db5a60d4d7eb88c914e0a.1500957502.git.luto@kernel.org> In-Reply-To: References: In-Reply-To: References: Sender: owner-linux-mm@kvack.org List-ID: To: x86@kernel.org Cc: linux-kernel@vger.kernel.org, Borislav Petkov , Linus Torvalds , Andrew Morton , Mel Gorman , "linux-mm@kvack.org" , Nadav Amit , Rik van Riel , Dave Hansen , Arjan van de Ven , Peter Zijlstra , Andy Lutomirski Improve comments as requested by PeterZ and also add some documentation at the top of the file. Signed-off-by: Andy Lutomirski --- arch/x86/mm/tlb.c | 43 +++++++++++++++++++++++++++++++++---------- 1 file changed, 33 insertions(+), 10 deletions(-) diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c index ce104b962a17..d4ee781ca656 100644 --- a/arch/x86/mm/tlb.c +++ b/arch/x86/mm/tlb.c @@ -15,17 +15,24 @@ #include /* - * TLB flushing, formerly SMP-only - * c/o Linus Torvalds. + * The code in this file handles mm switches and TLB flushes. 
* - * These mean you can really definitely utterly forget about - * writing to user space from interrupts. (Its not allowed anyway). + * An mm's TLB state is logically represented by a totally ordered sequence + * of TLB flushes. Each flush increments the mm's tlb_gen. * - * Optimizations Manfred Spraul + * Each CPU that might have an mm in its TLB (and that might ever use + * those TLB entries) will have an entry for it in its cpu_tlbstate.ctxs + * array. The kernel maintains the following invariant: for each CPU and + * for each mm in its cpu_tlbstate.ctxs array, the CPU has performed all + * flushes in that mms history up to the tlb_gen in cpu_tlbstate.ctxs + * or the CPU has performed an equivalent set of flushes. * - * More scalable flush, from Andi Kleen - * - * Implement flush IPI by CALL_FUNCTION_VECTOR, Alex Shi + * For this purpose, an equivalent set is a set that is at least as strong. + * So, for example, if the flush history is a full flush at time 1, + * a full flush after time 1 is sufficient, but a full flush before time 1 + * is not. Similarly, any number of flushes can be replaced by a single + * full flush so long as that replacement flush is after all the flushes + * that it's replacing. */ atomic64_t last_mm_ctx_id = ATOMIC64_INIT(1); @@ -138,7 +145,16 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next, return; } - /* Resume remote flushes and then read tlb_gen. */ + /* + * Resume remote flushes and then read tlb_gen. The + * implied barrier in atomic64_read() synchronizes + * with inc_mm_tlb_gen() like this: + * + * switch_mm_irqs_off(): flush request: + * cpumask_set_cpu(...); inc_mm_tlb_gen(); + * MB MB + * atomic64_read(.tlb_gen); flush_tlb_others(mm_cpumask()); + */ cpumask_set_cpu(cpu, mm_cpumask(next)); next_tlb_gen = atomic64_read(&next->context.tlb_gen); @@ -186,7 +202,14 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next, VM_WARN_ON_ONCE(cpumask_test_cpu(cpu, mm_cpumask(next))); /* - * Start remote flushes and then read tlb_gen. + * Start remote flushes and then read tlb_gen. As + * above, the implied barrier in atomic64_read() + * synchronizes with inc_mm_tlb_gen() like this: + * + * switch_mm_irqs_off(): flush request: + * cpumask_set_cpu(...); inc_mm_tlb_gen(); + * MB MB + * atomic64_read(.tlb_gen); flush_tlb_others(mm_cpumask()); */ cpumask_set_cpu(cpu, mm_cpumask(next)); next_tlb_gen = atomic64_read(&next->context.tlb_gen); -- 2.9.4 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-oi0-f70.google.com (mail-oi0-f70.google.com [209.85.218.70]) by kanga.kvack.org (Postfix) with ESMTP id 6E79F6B02C3 for ; Tue, 25 Jul 2017 00:41:44 -0400 (EDT) Received: by mail-oi0-f70.google.com with SMTP id t18so10687065oih.11 for ; Mon, 24 Jul 2017 21:41:44 -0700 (PDT) Received: from mail.kernel.org (mail.kernel.org. 
[198.145.29.99]) by mx.google.com with ESMTPS id p2si6040195oig.167.2017.07.24.21.41.42 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 24 Jul 2017 21:41:42 -0700 (PDT) From: Andy Lutomirski Subject: [PATCH v5 1/2] x86/mm: Try to preserve old TLB entries using PCID Date: Mon, 24 Jul 2017 21:41:38 -0700 Message-Id: <9ee75f17a81770feed616358e6860d98a2a5b1e7.1500957502.git.luto@kernel.org> In-Reply-To: References: In-Reply-To: References: MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Sender: owner-linux-mm@kvack.org List-ID: To: x86@kernel.org Cc: linux-kernel@vger.kernel.org, Borislav Petkov , Linus Torvalds , Andrew Morton , Mel Gorman , "linux-mm@kvack.org" , Nadav Amit , Rik van Riel , Dave Hansen , Arjan van de Ven , Peter Zijlstra , Andy Lutomirski PCID is a "process context ID" -- it's what other architectures call an address space ID. Every non-global TLB entry is tagged with a PCID, only TLB entries that match the currently selected PCID are used, and we can switch PGDs without flushing the TLB. x86's PCID is 12 bits. This is an unorthodox approach to using PCID. x86's PCID is far too short to uniquely identify a process, and we can't even really uniquely identify a running process because there are monster systems with over 4096 CPUs. To make matters worse, past attempts to use all 12 PCID bits have resulted in slowdowns instead of speedups. This patch uses PCID differently. We use a PCID to identify a recently-used mm on a per-cpu basis. An mm has no fixed PCID binding at all; instead, we give it a fresh PCID each time it's loaded except in cases where we want to preserve the TLB, in which case we reuse a recent value. Here are some benchmark results, done on a Skylake laptop at 2.3 GHz (turbo off, intel_pstate requesting max performance) under KVM with the guest using idle=poll (to avoid artifacts when bouncing between CPUs). I haven't done any real statistics here -- I just ran them in a loop and picked the fastest results that didn't look like outliers. Unpatched means commit a4eb8b993554, so all the bookkeeping overhead is gone. ping-pong between two mms on the same CPU using eventfd: patched: 1.22µs patched, nopcid: 1.33µs unpatched: 1.34µs Same ping-pong, but now touch 512 pages (all zero-page to minimize cache misses) each iteration. dTLB misses are measured by dtlb_load_misses.miss_causes_a_walk: patched: 1.8µs 11M dTLB misses patched, nopcid: 6.2µs, 207M dTLB misses unpatched: 6.1µs, 190M dTLB misses Reviewed-by: Nadav Amit Signed-off-by: Andy Lutomirski --- arch/x86/include/asm/mmu_context.h | 3 ++ arch/x86/include/asm/processor-flags.h | 2 + arch/x86/include/asm/tlbflush.h | 18 +++++++- arch/x86/mm/init.c | 1 + arch/x86/mm/tlb.c | 80 +++++++++++++++++++++++++++------- 5 files changed, 86 insertions(+), 18 deletions(-) diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h index 85f6b5575aad..14b3cdccf4f9 100644 --- a/arch/x86/include/asm/mmu_context.h +++ b/arch/x86/include/asm/mmu_context.h @@ -300,6 +300,9 @@ static inline unsigned long __get_current_cr3_fast(void) { unsigned long cr3 = __pa(this_cpu_read(cpu_tlbstate.loaded_mm)->pgd); + if (static_cpu_has(X86_FEATURE_PCID)) + cr3 |= this_cpu_read(cpu_tlbstate.loaded_mm_asid); + /* For now, be very restrictive about when this can be called. 
*/ VM_WARN_ON(in_nmi() || !in_atomic()); diff --git a/arch/x86/include/asm/processor-flags.h b/arch/x86/include/asm/processor-flags.h index f5d3e50af98c..8a6d89fc9a79 100644 --- a/arch/x86/include/asm/processor-flags.h +++ b/arch/x86/include/asm/processor-flags.h @@ -36,6 +36,7 @@ /* Mask off the address space ID and SME encryption bits. */ #define CR3_ADDR_MASK __sme_clr(0x7FFFFFFFFFFFF000ull) #define CR3_PCID_MASK 0xFFFull +#define CR3_NOFLUSH (1UL << 63) #else /* * CR3_ADDR_MASK needs at least bits 31:5 set on PAE systems, and we save @@ -43,6 +44,7 @@ */ #define CR3_ADDR_MASK 0xFFFFFFFFull #define CR3_PCID_MASK 0ull +#define CR3_NOFLUSH 0 #endif #endif /* _ASM_X86_PROCESSOR_FLAGS_H */ diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h index 6397275008db..d23e61dc0640 100644 --- a/arch/x86/include/asm/tlbflush.h +++ b/arch/x86/include/asm/tlbflush.h @@ -82,6 +82,12 @@ static inline u64 inc_mm_tlb_gen(struct mm_struct *mm) #define __flush_tlb_single(addr) __native_flush_tlb_single(addr) #endif +/* + * 6 because 6 should be plenty and struct tlb_state will fit in + * two cache lines. + */ +#define TLB_NR_DYN_ASIDS 6 + struct tlb_context { u64 ctx_id; u64 tlb_gen; @@ -95,6 +101,8 @@ struct tlb_state { * mode even if we've already switched back to swapper_pg_dir. */ struct mm_struct *loaded_mm; + u16 loaded_mm_asid; + u16 next_asid; /* * Access to this CR4 shadow and to H/W CR4 is protected by @@ -104,7 +112,8 @@ struct tlb_state { /* * This is a list of all contexts that might exist in the TLB. - * Since we don't yet use PCID, there is only one context. + * There is one per ASID that we use, and the ASID (what the + * CPU calls PCID) is the index into ctxts. * * For each context, ctx_id indicates which mm the TLB's user * entries came from. As an invariant, the TLB will never @@ -114,8 +123,13 @@ struct tlb_state { * To be clear, this means that it's legal for the TLB code to * flush the TLB without updating tlb_gen. This can happen * (for now, at least) due to paravirt remote flushes. + * + * NB: context 0 is a bit special, since it's also used by + * various bits of init code. This is fine -- code that + * isn't aware of PCID will end up harmlessly flushing + * context 0. */ - struct tlb_context ctxs[1]; + struct tlb_context ctxs[TLB_NR_DYN_ASIDS]; }; DECLARE_PER_CPU_SHARED_ALIGNED(struct tlb_state, cpu_tlbstate); diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c index 4d353efb2838..65ae17d45c4a 100644 --- a/arch/x86/mm/init.c +++ b/arch/x86/mm/init.c @@ -812,6 +812,7 @@ void __init zone_sizes_init(void) DEFINE_PER_CPU_SHARED_ALIGNED(struct tlb_state, cpu_tlbstate) = { .loaded_mm = &init_mm, + .next_asid = 1, .cr4 = ~0UL, /* fail hard if we screw up cr4 shadow initialization */ }; EXPORT_SYMBOL_GPL(cpu_tlbstate); diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c index 593d2f76a54c..ce104b962a17 100644 --- a/arch/x86/mm/tlb.c +++ b/arch/x86/mm/tlb.c @@ -30,6 +30,40 @@ atomic64_t last_mm_ctx_id = ATOMIC64_INIT(1); +static void choose_new_asid(struct mm_struct *next, u64 next_tlb_gen, + u16 *new_asid, bool *need_flush) +{ + u16 asid; + + if (!static_cpu_has(X86_FEATURE_PCID)) { + *new_asid = 0; + *need_flush = true; + return; + } + + for (asid = 0; asid < TLB_NR_DYN_ASIDS; asid++) { + if (this_cpu_read(cpu_tlbstate.ctxs[asid].ctx_id) != + next->context.ctx_id) + continue; + + *new_asid = asid; + *need_flush = (this_cpu_read(cpu_tlbstate.ctxs[asid].tlb_gen) < + next_tlb_gen); + return; + } + + /* + * We don't currently own an ASID slot on this CPU. 
+ * Allocate a slot. + */ + *new_asid = this_cpu_add_return(cpu_tlbstate.next_asid, 1) - 1; + if (*new_asid >= TLB_NR_DYN_ASIDS) { + *new_asid = 0; + this_cpu_write(cpu_tlbstate.next_asid, 1); + } + *need_flush = true; +} + void leave_mm(int cpu) { struct mm_struct *loaded_mm = this_cpu_read(cpu_tlbstate.loaded_mm); @@ -65,6 +99,7 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next, struct task_struct *tsk) { struct mm_struct *real_prev = this_cpu_read(cpu_tlbstate.loaded_mm); + u16 prev_asid = this_cpu_read(cpu_tlbstate.loaded_mm_asid); unsigned cpu = smp_processor_id(); u64 next_tlb_gen; @@ -84,12 +119,13 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next, /* * Verify that CR3 is what we think it is. This will catch * hypothetical buggy code that directly switches to swapper_pg_dir - * without going through leave_mm() / switch_mm_irqs_off(). + * without going through leave_mm() / switch_mm_irqs_off() or that + * does something like write_cr3(read_cr3_pa()). */ - VM_BUG_ON(read_cr3_pa() != __pa(real_prev->pgd)); + VM_BUG_ON(__read_cr3() != (__sme_pa(real_prev->pgd) | prev_asid)); if (real_prev == next) { - VM_BUG_ON(this_cpu_read(cpu_tlbstate.ctxs[0].ctx_id) != + VM_BUG_ON(this_cpu_read(cpu_tlbstate.ctxs[prev_asid].ctx_id) != next->context.ctx_id); if (cpumask_test_cpu(cpu, mm_cpumask(next))) { @@ -106,16 +142,17 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next, cpumask_set_cpu(cpu, mm_cpumask(next)); next_tlb_gen = atomic64_read(&next->context.tlb_gen); - if (this_cpu_read(cpu_tlbstate.ctxs[0].tlb_gen) < next_tlb_gen) { + if (this_cpu_read(cpu_tlbstate.ctxs[prev_asid].tlb_gen) < + next_tlb_gen) { /* * Ideally, we'd have a flush_tlb() variant that * takes the known CR3 value as input. This would * be faster on Xen PV and on hypothetical CPUs * on which INVPCID is fast. */ - this_cpu_write(cpu_tlbstate.ctxs[0].tlb_gen, + this_cpu_write(cpu_tlbstate.ctxs[prev_asid].tlb_gen, next_tlb_gen); - write_cr3(__sme_pa(next->pgd)); + write_cr3(__sme_pa(next->pgd) | prev_asid); trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL); } @@ -126,8 +163,8 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next, * are not reflected in tlb_gen.) */ } else { - VM_BUG_ON(this_cpu_read(cpu_tlbstate.ctxs[0].ctx_id) == - next->context.ctx_id); + u16 new_asid; + bool need_flush; if (IS_ENABLED(CONFIG_VMAP_STACK)) { /* @@ -154,12 +191,22 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next, cpumask_set_cpu(cpu, mm_cpumask(next)); next_tlb_gen = atomic64_read(&next->context.tlb_gen); - this_cpu_write(cpu_tlbstate.ctxs[0].ctx_id, next->context.ctx_id); - this_cpu_write(cpu_tlbstate.ctxs[0].tlb_gen, next_tlb_gen); - this_cpu_write(cpu_tlbstate.loaded_mm, next); - write_cr3(__sme_pa(next->pgd)); + choose_new_asid(next, next_tlb_gen, &new_asid, &need_flush); - trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL); + if (need_flush) { + this_cpu_write(cpu_tlbstate.ctxs[new_asid].ctx_id, next->context.ctx_id); + this_cpu_write(cpu_tlbstate.ctxs[new_asid].tlb_gen, next_tlb_gen); + write_cr3(__sme_pa(next->pgd) | new_asid); + trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, + TLB_FLUSH_ALL); + } else { + /* The new ASID is already up to date. 
*/ + write_cr3(__sme_pa(next->pgd) | new_asid | CR3_NOFLUSH); + trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, 0); + } + + this_cpu_write(cpu_tlbstate.loaded_mm, next); + this_cpu_write(cpu_tlbstate.loaded_mm_asid, new_asid); } load_mm_cr4(next); @@ -186,13 +233,14 @@ static void flush_tlb_func_common(const struct flush_tlb_info *f, * wants us to catch up to. */ struct mm_struct *loaded_mm = this_cpu_read(cpu_tlbstate.loaded_mm); + u32 loaded_mm_asid = this_cpu_read(cpu_tlbstate.loaded_mm_asid); u64 mm_tlb_gen = atomic64_read(&loaded_mm->context.tlb_gen); - u64 local_tlb_gen = this_cpu_read(cpu_tlbstate.ctxs[0].tlb_gen); + u64 local_tlb_gen = this_cpu_read(cpu_tlbstate.ctxs[loaded_mm_asid].tlb_gen); /* This code cannot presently handle being reentered. */ VM_WARN_ON(!irqs_disabled()); - VM_WARN_ON(this_cpu_read(cpu_tlbstate.ctxs[0].ctx_id) != + VM_WARN_ON(this_cpu_read(cpu_tlbstate.ctxs[loaded_mm_asid].ctx_id) != loaded_mm->context.ctx_id); if (!cpumask_test_cpu(smp_processor_id(), mm_cpumask(loaded_mm))) { @@ -280,7 +328,7 @@ static void flush_tlb_func_common(const struct flush_tlb_info *f, } /* Both paths above update our state to mm_tlb_gen. */ - this_cpu_write(cpu_tlbstate.ctxs[0].tlb_gen, mm_tlb_gen); + this_cpu_write(cpu_tlbstate.ctxs[loaded_mm_asid].tlb_gen, mm_tlb_gen); } static void flush_tlb_func_local(void *info, enum tlb_flush_reason reason) -- 2.9.4 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pf0-f199.google.com (mail-pf0-f199.google.com [209.85.192.199]) by kanga.kvack.org (Postfix) with ESMTP id 669B66B0292 for ; Tue, 25 Jul 2017 00:47:31 -0400 (EDT) Received: by mail-pf0-f199.google.com with SMTP id r63so21505076pfb.7 for ; Mon, 24 Jul 2017 21:47:31 -0700 (PDT) Received: from mail-pf0-x241.google.com (mail-pf0-x241.google.com. [2607:f8b0:400e:c00::241]) by mx.google.com with ESMTPS id n189si7859445pgn.110.2017.07.24.21.47.29 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 24 Jul 2017 21:47:29 -0700 (PDT) Received: by mail-pf0-x241.google.com with SMTP id g69so3080614pfe.1 for ; Mon, 24 Jul 2017 21:47:29 -0700 (PDT) Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 (Mac OS X Mail 10.3 \(3273\)) Subject: Re: [PATCH v5 2/2] x86/mm: Improve TLB flush documentation From: Nadav Amit In-Reply-To: <695299daa67239284e8db5a60d4d7eb88c914e0a.1500957502.git.luto@kernel.org> Date: Mon, 24 Jul 2017 21:47:25 -0700 Content-Transfer-Encoding: quoted-printable Message-Id: <231630A0-21DB-4347-B126-F49AFD32B851@gmail.com> References: <695299daa67239284e8db5a60d4d7eb88c914e0a.1500957502.git.luto@kernel.org> Sender: owner-linux-mm@kvack.org List-ID: To: Andy Lutomirski Cc: X86 ML , LKML , Borislav Petkov , Linus Torvalds , Andrew Morton , Mel Gorman , "linux-mm@kvack.org" , Rik van Riel , Dave Hansen , Arjan van de Ven , Peter Zijlstra Andy Lutomirski wrote: > Improve comments as requested by PeterZ and also add some > documentation at the top of the file. 
> > Signed-off-by: Andy Lutomirski > --- > arch/x86/mm/tlb.c | 43 +++++++++++++++++++++++++++++++++---------- > 1 file changed, 33 insertions(+), 10 deletions(-) > > diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c > index ce104b962a17..d4ee781ca656 100644 > --- a/arch/x86/mm/tlb.c > +++ b/arch/x86/mm/tlb.c > @@ -15,17 +15,24 @@ > #include > > /* > - * TLB flushing, formerly SMP-only > - * c/o Linus Torvalds. > + * The code in this file handles mm switches and TLB flushes. > * > - * These mean you can really definitely utterly forget about > - * writing to user space from interrupts. (Its not allowed anyway). > + * An mm's TLB state is logically represented by a totally ordered sequence > + * of TLB flushes. Each flush increments the mm's tlb_gen. > * > - * Optimizations Manfred Spraul > + * Each CPU that might have an mm in its TLB (and that might ever use > + * those TLB entries) will have an entry for it in its cpu_tlbstate.ctxs > + * array. The kernel maintains the following invariant: for each CPU and > + * for each mm in its cpu_tlbstate.ctxs array, the CPU has performed all > + * flushes in that mms history up to the tlb_gen in cpu_tlbstate.ctxs > + * or the CPU has performed an equivalent set of flushes. > * > - * More scalable flush, from Andi Kleen > - * > - * Implement flush IPI by CALL_FUNCTION_VECTOR, Alex Shi > + * For this purpose, an equivalent set is a set that is at least as strong. > + * So, for example, if the flush history is a full flush at time 1, > + * a full flush after time 1 is sufficient, but a full flush before time 1 > + * is not. Similarly, any number of flushes can be replaced by a single > + * full flush so long as that replacement flush is after all the flushes > + * that it's replacing. > */ > > atomic64_t last_mm_ctx_id = ATOMIC64_INIT(1); > @@ -138,7 +145,16 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next, > return; > } > > - /* Resume remote flushes and then read tlb_gen. */ > + /* > + * Resume remote flushes and then read tlb_gen. The > + * implied barrier in atomic64_read() synchronizes > + * with inc_mm_tlb_gen() like this: You mean the implied memory barrier in cpumask_set_cpu(), no? -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-oi0-f69.google.com (mail-oi0-f69.google.com [209.85.218.69]) by kanga.kvack.org (Postfix) with ESMTP id CDCB46B0292 for ; Tue, 25 Jul 2017 01:43:51 -0400 (EDT) Received: by mail-oi0-f69.google.com with SMTP id b130so8014185oii.4 for ; Mon, 24 Jul 2017 22:43:51 -0700 (PDT) Received: from mail.kernel.org (mail.kernel.org. 
[198.145.29.99]) by mx.google.com with ESMTPS id j6si6340683oia.128.2017.07.24.22.43.50 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 24 Jul 2017 22:43:50 -0700 (PDT) Received: from mail-ua0-f182.google.com (mail-ua0-f182.google.com [209.85.217.182]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id 9BF7822CAA for ; Tue, 25 Jul 2017 05:43:49 +0000 (UTC) Received: by mail-ua0-f182.google.com with SMTP id f9so95087300uaf.4 for ; Mon, 24 Jul 2017 22:43:49 -0700 (PDT) MIME-Version: 1.0 In-Reply-To: <231630A0-21DB-4347-B126-F49AFD32B851@gmail.com> References: <695299daa67239284e8db5a60d4d7eb88c914e0a.1500957502.git.luto@kernel.org> <231630A0-21DB-4347-B126-F49AFD32B851@gmail.com> From: Andy Lutomirski Date: Mon, 24 Jul 2017 22:43:28 -0700 Message-ID: Subject: Re: [PATCH v5 2/2] x86/mm: Improve TLB flush documentation Content-Type: text/plain; charset="UTF-8" Sender: owner-linux-mm@kvack.org List-ID: To: Nadav Amit Cc: Andy Lutomirski , X86 ML , LKML , Borislav Petkov , Linus Torvalds , Andrew Morton , Mel Gorman , "linux-mm@kvack.org" , Rik van Riel , Dave Hansen , Arjan van de Ven , Peter Zijlstra On Mon, Jul 24, 2017 at 9:47 PM, Nadav Amit wrote: > Andy Lutomirski wrote: > >> Improve comments as requested by PeterZ and also add some >> documentation at the top of the file. >> >> Signed-off-by: Andy Lutomirski >> --- >> arch/x86/mm/tlb.c | 43 +++++++++++++++++++++++++++++++++---------- >> 1 file changed, 33 insertions(+), 10 deletions(-) >> >> diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c >> index ce104b962a17..d4ee781ca656 100644 >> --- a/arch/x86/mm/tlb.c >> +++ b/arch/x86/mm/tlb.c >> @@ -15,17 +15,24 @@ >> #include >> >> /* >> - * TLB flushing, formerly SMP-only >> - * c/o Linus Torvalds. >> + * The code in this file handles mm switches and TLB flushes. >> * >> - * These mean you can really definitely utterly forget about >> - * writing to user space from interrupts. (Its not allowed anyway). >> + * An mm's TLB state is logically represented by a totally ordered sequence >> + * of TLB flushes. Each flush increments the mm's tlb_gen. >> * >> - * Optimizations Manfred Spraul >> + * Each CPU that might have an mm in its TLB (and that might ever use >> + * those TLB entries) will have an entry for it in its cpu_tlbstate.ctxs >> + * array. The kernel maintains the following invariant: for each CPU and >> + * for each mm in its cpu_tlbstate.ctxs array, the CPU has performed all >> + * flushes in that mms history up to the tlb_gen in cpu_tlbstate.ctxs >> + * or the CPU has performed an equivalent set of flushes. >> * >> - * More scalable flush, from Andi Kleen >> - * >> - * Implement flush IPI by CALL_FUNCTION_VECTOR, Alex Shi >> + * For this purpose, an equivalent set is a set that is at least as strong. >> + * So, for example, if the flush history is a full flush at time 1, >> + * a full flush after time 1 is sufficient, but a full flush before time 1 >> + * is not. Similarly, any number of flushes can be replaced by a single >> + * full flush so long as that replacement flush is after all the flushes >> + * that it's replacing. >> */ >> >> atomic64_t last_mm_ctx_id = ATOMIC64_INIT(1); >> @@ -138,7 +145,16 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next, >> return; >> } >> >> - /* Resume remote flushes and then read tlb_gen. */ >> + /* >> + * Resume remote flushes and then read tlb_gen. 
The >> + * implied barrier in atomic64_read() synchronizes >> + * with inc_mm_tlb_gen() like this: > > You mean the implied memory barrier in cpumask_set_cpu(), no? > Ugh, yes. And I misread PeterZ's email and incorrectly removed the smp_mb__after_atomic(). I'll respin this patch. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-it0-f69.google.com (mail-it0-f69.google.com [209.85.214.69]) by kanga.kvack.org (Postfix) with ESMTP id 7327D6B0292 for ; Tue, 25 Jul 2017 06:02:49 -0400 (EDT) Received: by mail-it0-f69.google.com with SMTP id c196so62153937itc.2 for ; Tue, 25 Jul 2017 03:02:49 -0700 (PDT) Received: from merlin.infradead.org (merlin.infradead.org. [2001:8b0:10b:1231::1]) by mx.google.com with ESMTPS id k82si1477406itb.186.2017.07.25.03.02.48 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 25 Jul 2017 03:02:48 -0700 (PDT) Date: Tue, 25 Jul 2017 12:02:34 +0200 From: Peter Zijlstra Subject: Re: [PATCH v5 2/2] x86/mm: Improve TLB flush documentation Message-ID: <20170725100234.qbsuphozotivan3c@hirez.programming.kicks-ass.net> References: <695299daa67239284e8db5a60d4d7eb88c914e0a.1500957502.git.luto@kernel.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <695299daa67239284e8db5a60d4d7eb88c914e0a.1500957502.git.luto@kernel.org> Sender: owner-linux-mm@kvack.org List-ID: To: Andy Lutomirski Cc: x86@kernel.org, linux-kernel@vger.kernel.org, Borislav Petkov , Linus Torvalds , Andrew Morton , Mel Gorman , "linux-mm@kvack.org" , Nadav Amit , Rik van Riel , Dave Hansen , Arjan van de Ven On Mon, Jul 24, 2017 at 09:41:39PM -0700, Andy Lutomirski wrote: > + /* > + * Resume remote flushes and then read tlb_gen. The > + * implied barrier in atomic64_read() synchronizes There is no barrier in atomic64_read(). > + * with inc_mm_tlb_gen() like this: > + * > + * switch_mm_irqs_off(): flush request: > + * cpumask_set_cpu(...); inc_mm_tlb_gen(); > + * MB MB > + * atomic64_read(.tlb_gen); flush_tlb_others(mm_cpumask()); > + */ > cpumask_set_cpu(cpu, mm_cpumask(next)); > next_tlb_gen = atomic64_read(&next->context.tlb_gen); > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org
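
To make the ASID-reuse policy discussed in this thread concrete, here is a minimal stand-alone C model of the per-CPU bookkeeping that PATCH 1/2 adds. It is an illustrative sketch only, not code from the patches: plain globals stand in for per-CPU variables and atomics, and pick_asid(), struct ctx_slot and NR_DYN_ASIDS are hypothetical names standing in for the kernel's choose_new_asid(), struct tlb_context and TLB_NR_DYN_ASIDS.

/*
 * Sketch of the ASID-reuse decision from PATCH 1/2, modeled in user space.
 * Plain globals replace per-CPU data and atomics; names are illustrative.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define NR_DYN_ASIDS 6			/* mirrors TLB_NR_DYN_ASIDS */

struct ctx_slot {
	uint64_t ctx_id;		/* which mm last owned this ASID */
	uint64_t tlb_gen;		/* flush generation already applied */
};

static struct ctx_slot ctxs[NR_DYN_ASIDS];	/* one CPU's cpu_tlbstate.ctxs */
static uint16_t next_asid = 1;			/* slot 0 doubles as the init context */

/*
 * Pick an ASID for an mm identified by ctx_id whose flush count has reached
 * tlb_gen.  Reuse the mm's old slot without flushing when that slot has
 * already seen every flush in the mm's history; otherwise recycle a slot
 * round-robin and request a full flush.  (The kernel updates the slot in
 * switch_mm_irqs_off(); the sketch folds that update into the picker.)
 */
static uint16_t pick_asid(uint64_t ctx_id, uint64_t tlb_gen, bool *need_flush)
{
	for (uint16_t asid = 0; asid < NR_DYN_ASIDS; asid++) {
		if (ctxs[asid].ctx_id != ctx_id)
			continue;
		*need_flush = ctxs[asid].tlb_gen < tlb_gen;
		ctxs[asid].tlb_gen = tlb_gen;
		return asid;
	}

	uint16_t asid = next_asid++;		/* no slot owned: recycle one */
	if (asid >= NR_DYN_ASIDS) {
		asid = 0;
		next_asid = 1;
	}
	ctxs[asid].ctx_id = ctx_id;
	ctxs[asid].tlb_gen = tlb_gen;
	*need_flush = true;
	return asid;
}

int main(void)
{
	/* Two "mms" with ctx_ids 1 and 2, each starting at tlb_gen 1. */
	uint64_t gen[3] = { 0, 1, 1 };
	bool flush;
	uint16_t asid;

	/* First use of each mm allocates a slot and flushes. */
	asid = pick_asid(1, gen[1], &flush);
	printf("mm1 -> asid %u, flush=%d\n", (unsigned)asid, flush);
	asid = pick_asid(2, gen[2], &flush);
	printf("mm2 -> asid %u, flush=%d\n", (unsigned)asid, flush);

	/* Switching back to mm1 reuses its slot with no flush. */
	asid = pick_asid(1, gen[1], &flush);
	printf("mm1 -> asid %u, flush=%d\n", (unsigned)asid, flush);

	/* A remote flush request bumps mm1's generation (inc_mm_tlb_gen). */
	gen[1]++;
	asid = pick_asid(1, gen[1], &flush);
	printf("mm1 -> asid %u, flush=%d\n", (unsigned)asid, flush);
	return 0;
}

Built with any C99 compiler, the demo prints a flush on the first use of each mm, no flush when switching back to an mm whose slot is already up to date, and a flush again once the mm's tlb_gen has been bumped by a simulated remote flush request, which is the trade-off the ping-pong benchmarks in PATCH 1/2 measure.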