* [PATCH] x86,switch_mm: skip atomic operations for init_mm  (18+ messages in thread)

From: Rik van Riel <riel@surriel.com> @ 2018-06-01 12:28 UTC (permalink / raw)
To: linux-kernel
Cc: Song Liu, kernel-team, mingo, luto, tglx, x86

Song noticed switch_mm_irqs_off taking a lot of CPU time in recent
kernels, using 2.4% of a 48 CPU system during a netperf to localhost
run.  Digging into the profile, we noticed that cpumask_clear_cpu and
cpumask_set_cpu together take about half of the CPU time used by
switch_mm_irqs_off.

However, the CPUs running netperf end up switching back and forth
between netperf and the idle task, which does not require changes to
the mm_cpumask.  Furthermore, the init_mm cpumask ends up being the
most heavily contended one in the system.

Skipping cpumask_clear_cpu and cpumask_set_cpu for init_mm (mostly the
idle task) reduced CPU use of switch_mm_irqs_off from 2.4% of the CPU
to 1.9% of the CPU, with the following netperf command line:

  ./super_netperf 300 -P 0 -t TCP_RR -p 8888 -H kerneltest008.09.atn1 -l 30 \
      -- -r 300,300 -o -s 1M,1M -S 1M,1M

perf output w/o this patch:
  1.26%  netserver  [kernel.vmlinux]  [k] switch_mm_irqs_off
  1.17%  swapper    [kernel.vmlinux]  [k] switch_mm_irqs_off

perf output w/ this patch:
  1.01%  swapper    [kernel.vmlinux]  [k] switch_mm_irqs_off
  0.88%  netserver  [kernel.vmlinux]  [k] switch_mm_irqs_off

Netperf throughput is about the same before and after.
Signed-off-by: Rik van Riel <riel@surriel.com>
Reported-and-tested-by: Song Liu <songliubraving@fb.com>
---
 arch/x86/mm/tlb.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index e055d1a06699..c8f9c550f7ec 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -288,12 +288,14 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
 		/* Stop remote flushes for the previous mm */
 		VM_WARN_ON_ONCE(!cpumask_test_cpu(cpu, mm_cpumask(real_prev)) &&
 				real_prev != &init_mm);
-		cpumask_clear_cpu(cpu, mm_cpumask(real_prev));
+		if (real_prev != &init_mm)
+			cpumask_clear_cpu(cpu, mm_cpumask(real_prev));
 
 		/*
 		 * Start remote flushes and then read tlb_gen.
 		 */
-		cpumask_set_cpu(cpu, mm_cpumask(next));
+		if (next != &init_mm)
+			cpumask_set_cpu(cpu, mm_cpumask(next));
 
 		next_tlb_gen = atomic64_read(&next->context.tlb_gen);
 
 		choose_new_asid(next, next_tlb_gen, &new_asid, &need_flush);
* Re: [PATCH] x86,switch_mm: skip atomic operations for init_mm

From: Andy Lutomirski @ 2018-06-01 15:11 UTC (permalink / raw)
To: riel
Cc: LKML, songliubraving, kernel-team, Ingo Molnar, Andrew Lutomirski,
    Thomas Gleixner, X86 ML, Peter Zijlstra, Mike Galbraith

On Fri, Jun 1, 2018 at 5:28 AM Rik van Riel <riel@surriel.com> wrote:
>
> [...]
>
> Skipping cpumask_clear_cpu and cpumask_set_cpu for init_mm
> (mostly the idle task) reduced CPU use of switch_mm_irqs_off
> from 2.4% of the CPU to 1.9% of the CPU, with the following
> netperf command line:

I'm conceptually fine with this change.  Does mm_cpumask(&init_mm) end
up in a deterministic state?

Mike, depending on exactly what's going on with your benchmark, this
might help recover a bit of your performance, too.

--Andy
* Re: [PATCH] x86,switch_mm: skip atomic operations for init_mm

From: Rik van Riel @ 2018-06-01 18:22 UTC (permalink / raw)
To: Andy Lutomirski
Cc: LKML, songliubraving, kernel-team, Ingo Molnar, Thomas Gleixner,
    X86 ML, Peter Zijlstra, Mike Galbraith

On Fri, 2018-06-01 at 08:11 -0700, Andy Lutomirski wrote:
> [...]
>
> I'm conceptually fine with this change.  Does mm_cpumask(&init_mm)
> end up in a deterministic state?

Given that we do not touch mm_cpumask(&init_mm) any more, and that
bitmask never appears to be used for things like TLB shootdowns (kernel
TLB shootdowns simply go to everybody), I suspect it ends up in
whatever state it is initialized to on startup.

I had not looked into this much, because it does not appear to be used
for anything.

> Mike, depending on exactly what's going on with your benchmark, this
> might help recover a bit of your performance, too.

It will be interesting to know how this change impacts others.

-- 
All Rights Reversed.
* Re: [PATCH] x86,switch_mm: skip atomic operations for init_mm

From: Mike Galbraith @ 2018-06-01 18:48 UTC (permalink / raw)
To: Rik van Riel, Andy Lutomirski
Cc: LKML, songliubraving, kernel-team, Ingo Molnar, Thomas Gleixner,
    X86 ML, Peter Zijlstra

On Fri, 2018-06-01 at 14:22 -0400, Rik van Riel wrote:
> [...]
>
> It will be interesting to know how this change impacts others.

previous pipe-test numbers
4.13.16          2.024978 usecs/loop -- avg 2.045250  977.9 KHz
4.14.47          2.234518 usecs/loop -- avg 2.227716  897.8 KHz
4.15.18          2.287815 usecs/loop -- avg 2.295858  871.1 KHz
4.16.13          2.286036 usecs/loop -- avg 2.279057  877.6 KHz
4.17.0.g88a8676  2.288231 usecs/loop -- avg 2.288917  873.8 KHz

new numbers
4.17.0.g0512e01  2.268629 usecs/loop -- avg 2.269493  881.3 KHz
4.17.0.g0512e01  2.035401 usecs/loop -- avg 2.038341  981.2 KHz  +andy
4.17.0.g0512e01  2.238701 usecs/loop -- avg 2.231828  896.1 KHz  -andy+rik

There might be something there with your change, Rik, but it's small
enough to be wary of variance.  Andy's "invert the return of
tlb_defer_switch_to_init_mm()" is OTOH pretty clear.

	-Mike
* Re: [PATCH] x86,switch_mm: skip atomic operations for init_mm

From: Rik van Riel @ 2018-06-01 19:43 UTC (permalink / raw)
To: Mike Galbraith, Andy Lutomirski
Cc: LKML, songliubraving, kernel-team, Ingo Molnar, Thomas Gleixner,
    X86 ML, Peter Zijlstra

On Fri, 2018-06-01 at 20:48 +0200, Mike Galbraith wrote:
> [...]
>
> There might be something there with your change Rik, but it's small
> enough to be wary of variance.  Andy's "invert the return of
> tlb_defer_switch_to_init_mm()" is OTOH pretty clear.

If inverting the return value of that function helps some systems,
chances are the other value might help other systems.

That makes you wonder whether it might make sense to always switch to
lazy TLB mode, and only call switch_mm at TLB flush time, regardless of
whether the CPU supports PCID...

-- 
All Rights Reversed.
* Re: [PATCH] x86,switch_mm: skip atomic operations for init_mm

From: Andy Lutomirski @ 2018-06-01 20:03 UTC (permalink / raw)
To: riel
Cc: Mike Galbraith, Andrew Lutomirski, LKML, songliubraving,
    kernel-team, Ingo Molnar, Thomas Gleixner, X86 ML, Peter Zijlstra

On Fri, Jun 1, 2018 at 12:43 PM Rik van Riel <riel@surriel.com> wrote:
> [...]
>
> That makes you wonder whether it might make sense to always switch to
> lazy TLB mode, and only call switch_mm at TLB flush time, regardless
> of whether the CPU supports PCID...

Mike, you never did say: do you have PCID on your CPU?  Also, what is
your workload doing to cause so many switches back and forth between
init_mm and a task?

The point of the optimization is that switching to init_mm should be
fairly fast on a PCID system, whereas an IPI to do the deferred flush
is very expensive regardless of PCID.

I wonder if we could do something fancy where we stay on the task mm
for short idles but switch to init_mm before going deeply idle.  We'd
also want to switch to init_mm when we go idle due to migrating the
previously running task, I think.  Previously, it was hard to make any
decisions based on the target idle state, because the idle code chose
its target state so late in the idling process that the CPU was halfway
shut down already.  But I think this is fixed now.

--Andy
* Re: [PATCH] x86,switch_mm: skip atomic operations for init_mm

From: Rik van Riel @ 2018-06-01 20:35 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Mike Galbraith, LKML, songliubraving, kernel-team, Ingo Molnar,
    Thomas Gleixner, X86 ML, Peter Zijlstra

On Fri, 2018-06-01 at 13:03 -0700, Andy Lutomirski wrote:
> [...]
>
> The point of the optimization is that switching to init_mm should be
> fairly fast on a PCID system, whereas an IPI to do the deferred flush
> is very expensive regardless of PCID.

While I am sure that bit is true, Song and I observed about 4x as much
CPU use in the atomic operations in cpumask_clear_cpu and
cpumask_set_cpu (inside switch_mm_irqs_off) as we saw CPU used in the
%cr3 reload itself.

Given how expensive those cpumask updates are, lazy TLB mode might
always be worth it, especially on larger systems.

-- 
All Rights Reversed.
* Re: [PATCH] x86,switch_mm: skip atomic operations for init_mm

From: Andy Lutomirski @ 2018-06-01 21:21 UTC (permalink / raw)
To: riel
Cc: Andrew Lutomirski, Mike Galbraith, LKML, songliubraving,
    kernel-team, Ingo Molnar, Thomas Gleixner, X86 ML, Peter Zijlstra

On Fri, Jun 1, 2018 at 1:35 PM Rik van Riel <riel@surriel.com> wrote:
> [...]
>
> Given how expensive those cpumask updates are, lazy TLB mode might
> always be worth it, especially on larger systems.

Hmm.  I wonder if there's a more clever data structure than a bitmap
that we could be using here.  Each CPU only ever needs to be in one
mm's cpumask, and each CPU only ever changes its own state in the
bitmask.  And writes are much less common than reads for most
workloads.
* Re: [PATCH] x86,switch_mm: skip atomic operations for init_mm

From: Rik van Riel @ 2018-06-01 22:13 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Mike Galbraith, LKML, songliubraving, kernel-team, Ingo Molnar,
    Thomas Gleixner, X86 ML, Peter Zijlstra

On Fri, 1 Jun 2018 14:21:58 -0700
Andy Lutomirski <luto@kernel.org> wrote:

> Hmm.  I wonder if there's a more clever data structure than a bitmap
> that we could be using here.  Each CPU only ever needs to be in one
> mm's cpumask, and each cpu only ever changes its own state in the
> bitmask.  And writes are much less common than reads for most
> workloads.

It would be easy enough to add an mm_struct pointer to the per-cpu
tlbstate struct, and iterate over those.

However, that would be an orthogonal change to optimizing lazy TLB
mode.

Does the (untested) patch below make sense as a potential improvement
to the lazy TLB heuristic?

---8<---
Subject: x86,tlb: workload dependent per CPU lazy TLB switch

Lazy TLB mode is a tradeoff between flushing the TLB and touching the
mm_cpumask(&init_mm) at context switch time, versus potentially
incurring a remote TLB flush IPI while in lazy TLB mode.

Whether this pays off is likely to be workload dependent more than
anything else.  However, the current heuristic keys off hardware type.

This patch changes the lazy TLB mode heuristic to a dynamic, per-CPU
decision, dependent on whether we recently received a remote TLB
shootdown while in lazy TLB mode.

This is a very simple heuristic.  When a CPU receives a remote TLB
shootdown IPI while in lazy TLB mode, a counter in the same cache line
is set to 16.  Every time we skip lazy TLB mode, the counter is
decremented.  While the counter is zero (no recent TLB flush IPIs),
allow lazy TLB mode.
Signed-off-by: Rik van Riel <riel@surriel.com>
---
 arch/x86/include/asm/tlbflush.h | 32 ++++++++++++++++----------------
 arch/x86/mm/tlb.c               |  4 ++++
 2 files changed, 20 insertions(+), 16 deletions(-)

diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
index 6690cd3fc8b1..f06a934e317d 100644
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -148,22 +148,6 @@ static inline unsigned long build_cr3_noflush(pgd_t *pgd, u16 asid)
 #define __flush_tlb_one_user(addr) __native_flush_tlb_one_user(addr)
 #endif
 
-static inline bool tlb_defer_switch_to_init_mm(void)
-{
-	/*
-	 * If we have PCID, then switching to init_mm is reasonably
-	 * fast.  If we don't have PCID, then switching to init_mm is
-	 * quite slow, so we try to defer it in the hopes that we can
-	 * avoid it entirely.  The latter approach runs the risk of
-	 * receiving otherwise unnecessary IPIs.
-	 *
-	 * This choice is just a heuristic.  The tlb code can handle this
-	 * function returning true or false regardless of whether we have
-	 * PCID.
-	 */
-	return !static_cpu_has(X86_FEATURE_PCID);
-}
-
 struct tlb_context {
 	u64 ctx_id;
 	u64 tlb_gen;
@@ -179,6 +163,7 @@ struct tlb_state {
 	struct mm_struct *loaded_mm;
 	u16 loaded_mm_asid;
 	u16 next_asid;
+	u16 flushed_while_lazy;
 
 	/* last user mm's ctx id */
 	u64 last_ctx_id;
@@ -246,6 +231,21 @@ struct tlb_state {
 };
 DECLARE_PER_CPU_SHARED_ALIGNED(struct tlb_state, cpu_tlbstate);
 
+static inline bool tlb_defer_switch_to_init_mm(void)
+{
+	/*
+	 * If the CPU recently received a TLB flush IPI while in lazy
+	 * TLB mode, do a straight switch to the idle task, and skip
+	 * lazy TLB mode for now.
+	 */
+	if (this_cpu_read(cpu_tlbstate.flushed_while_lazy)) {
+		this_cpu_dec(cpu_tlbstate.flushed_while_lazy);
+		return false;
+	}
+
+	return true;
+}
+
 /* Initialize cr4 shadow for this CPU. */
 static inline void cr4_init_shadow(void)
 {
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index e055d1a06699..d8b0b7b236f3 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -454,7 +454,11 @@ static void flush_tlb_func_common(const struct flush_tlb_info *f,
 		 * paging-structure cache to avoid speculatively reading
 		 * garbage into our TLB.  Since switching to init_mm is barely
 		 * slower than a minimal flush, just switch to init_mm.
+		 *
+		 * Skip lazy TLB mode for the next 16 context switches,
+		 * in case more TLB flush IPIs are coming.
 		 */
+		this_cpu_write(cpu_tlbstate.flushed_while_lazy, 16);
 		switch_mm_irqs_off(NULL, &init_mm, NULL);
 		return;
 	}
* Re: [PATCH] x86,switch_mm: skip atomic operations for init_mm

From: Andy Lutomirski @ 2018-06-02 3:35 UTC (permalink / raw)
To: riel
Cc: Andrew Lutomirski, Mike Galbraith, LKML, songliubraving,
    kernel-team, Ingo Molnar, Thomas Gleixner, X86 ML, Peter Zijlstra

On Fri, Jun 1, 2018 at 3:13 PM Rik van Riel <riel@surriel.com> wrote:
> [...]
>
> This is a very simple heuristic.  When a CPU receives a remote TLB
> shootdown IPI while in lazy TLB mode, a counter in the same cache
> line is set to 16.  Every time we skip lazy TLB mode, the counter
> is decremented.
>
> While the counter is zero (no recent TLB flush IPIs), allow lazy TLB
> mode.

Hmm, cute.  That's not a bad idea at all.  It would be nice to get some
kind of real benchmark on both PCID and !PCID.  If nothing else, I
would expect the threshold (16 in your patch) to want to be lower on
PCID systems.

--Andy
* Re: [PATCH] x86,switch_mm: skip atomic operations for init_mm

From: Rik van Riel @ 2018-06-02 5:04 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Mike Galbraith, LKML, songliubraving, kernel-team, Ingo Molnar,
    Thomas Gleixner, X86 ML, Peter Zijlstra

On Fri, 2018-06-01 at 20:35 -0700, Andy Lutomirski wrote:
> [...]
>
> Hmm, cute.  That's not a bad idea at all.  It would be nice to get
> some kind of real benchmark on both PCID and !PCID.  If nothing else,
> I would expect the threshold (16 in your patch) to want to be lower
> on PCID systems.

That depends on how well we manage to get rid of the cpumask
manipulation overhead.  On the PCID system where we first found this
issue, the atomic accesses to the mm_cpumask took about 4x as much CPU
time as the TLB invalidation itself.

That kinda limits how much cheaper TLB flushes actually help :)

I agree this code should get some testing.

-- 
All Rights Reversed.
* Re: [PATCH] x86,switch_mm: skip atomic operations for init_mm

From: Andy Lutomirski @ 2018-06-02 20:14 UTC (permalink / raw)
To: riel
Cc: Andrew Lutomirski, Mike Galbraith, LKML, songliubraving,
    kernel-team, Ingo Molnar, Thomas Gleixner, X86 ML, Peter Zijlstra

On Fri, Jun 1, 2018 at 10:04 PM Rik van Riel <riel@surriel.com> wrote:
> [...]
>
> That depends on how well we manage to get rid of the cpumask
> manipulation overhead.  On the PCID system where we first found this
> issue, the atomic accesses to the mm_cpumask took about 4x as much
> CPU time as the TLB invalidation itself.

Just to check: in the workload where you're seeing this problem, are
you using an mm with many threads?  I would imagine that, if you only
have one or two threads, the bit operations aren't so bad.

I wonder if just having a whole cacheline per node for the cpumask
would solve the problem.  I don't love the idea of having every flush
operation scan cpu_tlbstate for every single CPU -- we'll end up with
nasty contention on the cpu_tlbstate cache lines on some workloads.
* Re: [PATCH] x86,switch_mm: skip atomic operations for init_mm 2018-06-02 20:14 ` Andy Lutomirski @ 2018-06-03 0:51 ` Song Liu 2018-06-03 1:38 ` Rik van Riel 0 siblings, 1 reply; 18+ messages in thread From: Song Liu @ 2018-06-03 0:51 UTC (permalink / raw) To: Andy Lutomirski Cc: riel, Mike Galbraith, LKML, Kernel Team, Ingo Molnar, Thomas Gleixner, X86 ML, Peter Zijlstra > On Jun 2, 2018, at 1:14 PM, Andy Lutomirski <luto@kernel.org> wrote: > > On Fri, Jun 1, 2018 at 10:04 PM Rik van Riel <riel@surriel.com> wrote: >> >> On Fri, 2018-06-01 at 20:35 -0700, Andy Lutomirski wrote: >>> On Fri, Jun 1, 2018 at 3:13 PM Rik van Riel <riel@surriel.com> wrote: >>>> >>>> On Fri, 1 Jun 2018 14:21:58 -0700 >>>> Andy Lutomirski <luto@kernel.org> wrote: >>>> >>>>> Hmm. I wonder if there's a more clever data structure than a >>>>> bitmap >>>>> that we could be using here. Each CPU only ever needs to be in >>>>> one >>>>> mm's cpumask, and each cpu only ever changes its own state in the >>>>> bitmask. And writes are much less common than reads for most >>>>> workloads. >>>> >>>> It would be easy enough to add an mm_struct pointer to the >>>> per-cpu tlbstate struct, and iterate over those. >>>> >>>> However, that would be an orthogonal change to optimizing >>>> lazy TLB mode. >>>> >>>> Does the (untested) patch below make sense as a potential >>>> improvement to the lazy TLB heuristic? >>>> >>>> ---8<--- >>>> Subject: x86,tlb: workload dependent per CPU lazy TLB switch >>>> >>>> Lazy TLB mode is a tradeoff between flushing the TLB and touching >>>> the mm_cpumask(&init_mm) at context switch time, versus potentially >>>> incurring a remote TLB flush IPI while in lazy TLB mode. >>>> >>>> Whether this pays off is likely to be workload dependent more than >>>> anything else. However, the current heuristic keys off hardware >>>> type. 
>>>> >>>> This patch changes the lazy TLB mode heuristic to a dynamic, per- >>>> CPU >>>> decision, dependent on whether we recently received a remote TLB >>>> shootdown while in lazy TLB mode. >>>> >>>> This is a very simple heuristic. When a CPU receives a remote TLB >>>> shootdown IPI while in lazy TLB mode, a counter in the same cache >>>> line is set to 16. Every time we skip lazy TLB mode, the counter >>>> is decremented. >>>> >>>> While the counter is zero (no recent TLB flush IPIs), allow lazy >>>> TLB mode. >>> >>> Hmm, cute. That's not a bad idea at all. It would be nice to get >>> some kind of real benchmark on both PCID and !PCID. If nothing else, >>> I would expect the threshold (16 in your patch) to want to be lower >>> on >>> PCID systems. >> >> That depends on how well we manage to get rid of >> the cpumask manipulation overhead. On the PCID >> system we first found this issue, the atomic >> accesses to the mm_cpumask took about 4x as much >> CPU time as the TLB invalidation itself. >> >> That kinda limits how much the cost of cheaper >> TLB flushes actually help :) >> >> I agree this code should get some testing. >> > > Just to check: in the workload where you're seeing this problem, are > you using an mm with many threads? I would imagine that, if you only > have one or two threads, the bit operations aren't so bad. Yes, we are running netperf/netserver with 300 threads. We don't see this much overhead in with real workload. Here are some test results. The tests are on 2-socket system with E5-2678 v3, so 48 logical cores with PCID. 
I tested 3 kernels (tuning flushed_while_lazy):

Baseline:                0512e01345 in upstream
Baseline + lazy-tlb:     0512e01345 + "x86,tlb: workload dependent per CPU lazy TLB switch"
Baseline + both patches: 0512e01345 + "x86,tlb: workload dependent per CPU lazy TLB switch"
                                    + "x86,switch_mm: skip atomic operations for init_mm"

The tests are run with the following options:

./super_netperf 300 -P 0 -t TCP_RR -p 8888 -H <target_host> -l 60 -- -r 100,100 -o -s 1M,1M -S 1M,1M

Here are the results:

                        flushed_while_lazy  Throughput   %-cpu-on-switch_mm_irqs_off()
Baseline                N/A                 2.02756e+06  2.82% (1.54% in ctx of netserver + 1.28% in ctx of swapper)
Baseline + lazy-tlb     16                  1.92706e+06  1.19% (only in ctx of swapper, same for all cases below)
Baseline + lazy-tlb     4                   2.03863e+06  1.12%   Good option 1
Baseline + lazy-tlb     2                   1.93847e+06  1.22%
Baseline + both patches 64                  1.92219e+06  1.09%
Baseline + both patches 16                  1.93597e+06  1.15%
Baseline + both patches 8                   1.92175e+06  1.11%
Baseline + both patches 4                   2.03465e+06  1.09%   Good option 2
Baseline + both patches 2                   2.03603e+06  1.10%   Good option 3
Baseline + both patches 1                   1.9241e+06   1.06%

I highlighted 3 good options above. They are about the same for this test case.

Thanks,
Song

^ permalink raw reply [flat|nested] 18+ messages in thread
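[The flushed_while_lazy values Song sweeps above are the reset value of the counter in the quoted lazy-TLB heuristic: a remote shootdown received while lazy arms the counter, lazy mode is skipped while it is nonzero, and it decays by one per context switch. A small user-space model of that decision logic — field and function names are illustrative, not the kernel's actual cpu_tlbstate members:]

```c
#include <assert.h>
#include <stdbool.h>

/* Model of the per-CPU heuristic from the quoted patch.  The threshold
 * (16 in the original patch) is what the flushed_while_lazy column in
 * the benchmark table sweeps. */
struct lazy_model {
	int flushed_while_lazy;		/* decays toward 0 */
	int threshold;			/* reset value on a remote flush */
};

/* A remote TLB shootdown IPI arrives while this CPU is in lazy mode. */
static void model_remote_flush(struct lazy_model *m)
{
	m->flushed_while_lazy = m->threshold;
}

/* At context switch time: may this CPU enter lazy TLB mode? */
static bool model_lazy_allowed(struct lazy_model *m)
{
	if (m->flushed_while_lazy > 0) {
		m->flushed_while_lazy--;  /* skip lazy mode, decay counter */
		return false;
	}
	return true;
}
```

[With a threshold of N, one shootdown disables lazy mode for exactly the next N context switches, which is why smaller thresholds (2–4 in the table) re-enable the cheap lazy path sooner on this workload.]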
* Re: [PATCH] x86,switch_mm: skip atomic operations for init_mm 2018-06-03 0:51 ` Song Liu @ 2018-06-03 1:38 ` Rik van Riel 2018-06-06 18:17 ` Andy Lutomirski 0 siblings, 1 reply; 18+ messages in thread From: Rik van Riel @ 2018-06-03 1:38 UTC (permalink / raw) To: Song Liu, Andy Lutomirski Cc: Mike Galbraith, LKML, Kernel Team, Ingo Molnar, Thomas Gleixner, X86 ML, Peter Zijlstra [-- Attachment #1: Type: text/plain, Size: 877 bytes --] On Sun, 2018-06-03 at 00:51 +0000, Song Liu wrote: > > Just to check: in the workload where you're seeing this problem, > > are > > you using an mm with many threads? I would imagine that, if you > > only > > have one or two threads, the bit operations aren't so bad. > > Yes, we are running netperf/netserver with 300 threads. We don't see > this much overhead in with real workload. We may not, but there are some crazy workloads out there in the world. Think of some Java programs with thousands of threads, causing a million context switches a second on a large system. I like Andy's idea of having one cache line with a cpumask per node. That seems like it will have fewer downsides for tasks with fewer threads running on giant systems. I'll throw out the code I was working on, and look into implementing that :) -- All Rights Reversed. [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 488 bytes --] ^ permalink raw reply [flat|nested] 18+ messages in thread
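[The one-cache-line-per-node idea could be laid out roughly as below: each node's chunk of the cpumask is padded to its own cache line, so context switches on different nodes stop bouncing a shared line. This is only an illustration of the layout under assumed sizes (64-byte lines, 64 CPUs per node); a real implementation would need atomic bit operations and the kernel's CPU-to-node mapping:]

```c
#include <assert.h>
#include <stdbool.h>

#define CACHELINE_BYTES	64
#define CPUS_PER_NODE	64	/* illustrative: one long of bits per node */
#define NR_NODES	4

/* One cache-line-aligned chunk of the cpumask per NUMA node, so CPUs
 * on different nodes update different cache lines. */
struct node_mask {
	unsigned long bits;
	char pad[CACHELINE_BYTES - sizeof(unsigned long)];
} __attribute__((aligned(CACHELINE_BYTES)));

struct mm_mask {
	struct node_mask node[NR_NODES];
};

static void mm_mask_set_cpu(struct mm_mask *m, int cpu)
{
	m->node[cpu / CPUS_PER_NODE].bits |= 1UL << (cpu % CPUS_PER_NODE);
}

static void mm_mask_clear_cpu(struct mm_mask *m, int cpu)
{
	m->node[cpu / CPUS_PER_NODE].bits &= ~(1UL << (cpu % CPUS_PER_NODE));
}

static bool mm_mask_test_cpu(const struct mm_mask *m, int cpu)
{
	return m->node[cpu / CPUS_PER_NODE].bits & (1UL << (cpu % CPUS_PER_NODE));
}
```

[The flush side still reads all NR_NODES lines, but the hot write path — set/clear at context switch — only dirties the line belonging to the switching CPU's node.]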
* Re: [PATCH] x86,switch_mm: skip atomic operations for init_mm 2018-06-03 1:38 ` Rik van Riel @ 2018-06-06 18:17 ` Andy Lutomirski 2018-06-06 19:00 ` Rik van Riel 0 siblings, 1 reply; 18+ messages in thread From: Andy Lutomirski @ 2018-06-06 18:17 UTC (permalink / raw) To: Rik van Riel Cc: songliubraving, Andrew Lutomirski, Mike Galbraith, LKML, kernel-team, Ingo Molnar, Thomas Gleixner, X86 ML, Peter Zijlstra On Sat, Jun 2, 2018 at 6:38 PM Rik van Riel <riel@surriel.com> wrote: > > On Sun, 2018-06-03 at 00:51 +0000, Song Liu wrote: > > > > Just to check: in the workload where you're seeing this problem, > > > are > > > you using an mm with many threads? I would imagine that, if you > > > only > > > have one or two threads, the bit operations aren't so bad. > > > > Yes, we are running netperf/netserver with 300 threads. We don't see > > this much overhead in with real workload. > > We may not, but there are some crazy workloads out > there in the world. Think of some Java programs with > thousands of threads, causing a million context > switches a second on a large system. > > I like Andy's idea of having one cache line with > a cpumask per node. That seems like it will have > fewer downsides for tasks with fewer threads running > on giant systems. > > I'll throw out the code I was working on, and look > into implementing that :) > I'm not sure you should throw your patch out. It's a decent idea, too. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] x86,switch_mm: skip atomic operations for init_mm 2018-06-06 18:17 ` Andy Lutomirski @ 2018-06-06 19:00 ` Rik van Riel 2018-06-06 19:23 ` Andy Lutomirski 0 siblings, 1 reply; 18+ messages in thread From: Rik van Riel @ 2018-06-06 19:00 UTC (permalink / raw) To: Andy Lutomirski Cc: songliubraving, Mike Galbraith, LKML, kernel-team, Ingo Molnar, Thomas Gleixner, X86 ML, Peter Zijlstra [-- Attachment #1: Type: text/plain, Size: 1357 bytes --] On Wed, 2018-06-06 at 11:17 -0700, Andy Lutomirski wrote: > On Sat, Jun 2, 2018 at 6:38 PM Rik van Riel <riel@surriel.com> wrote: > > > > On Sun, 2018-06-03 at 00:51 +0000, Song Liu wrote: > > > > > > Just to check: in the workload where you're seeing this > > > > problem, > > > > are > > > > you using an mm with many threads? I would imagine that, if > > > > you > > > > only > > > > have one or two threads, the bit operations aren't so bad. > > > > > > Yes, we are running netperf/netserver with 300 threads. We don't > > > see > > > this much overhead in with real workload. > > > > We may not, but there are some crazy workloads out > > there in the world. Think of some Java programs with > > thousands of threads, causing a million context > > switches a second on a large system. > > > > I like Andy's idea of having one cache line with > > a cpumask per node. That seems like it will have > > fewer downsides for tasks with fewer threads running > > on giant systems. > > > > I'll throw out the code I was working on, and look > > into implementing that :) > > > > I'm not sure you should throw your patch out. It's a decent idea, > too. Oh, I still have it saved, but the cpumask per NUMA node looks like it could have a big impact, with less guesswork or side effects. -- All Rights Reversed. [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 488 bytes --] ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] x86,switch_mm: skip atomic operations for init_mm 2018-06-06 19:00 ` Rik van Riel @ 2018-06-06 19:23 ` Andy Lutomirski 0 siblings, 0 replies; 18+ messages in thread From: Andy Lutomirski @ 2018-06-06 19:23 UTC (permalink / raw) To: Rik van Riel Cc: Andrew Lutomirski, songliubraving, Mike Galbraith, LKML, kernel-team, Ingo Molnar, Thomas Gleixner, X86 ML, Peter Zijlstra On Wed, Jun 6, 2018 at 12:00 PM Rik van Riel <riel@surriel.com> wrote: > > On Wed, 2018-06-06 at 11:17 -0700, Andy Lutomirski wrote: > > On Sat, Jun 2, 2018 at 6:38 PM Rik van Riel <riel@surriel.com> wrote: > > > > > > On Sun, 2018-06-03 at 00:51 +0000, Song Liu wrote: > > > > > > > > Just to check: in the workload where you're seeing this > > > > > problem, > > > > > are > > > > > you using an mm with many threads? I would imagine that, if > > > > > you > > > > > only > > > > > have one or two threads, the bit operations aren't so bad. > > > > > > > > Yes, we are running netperf/netserver with 300 threads. We don't > > > > see > > > > this much overhead in with real workload. > > > > > > We may not, but there are some crazy workloads out > > > there in the world. Think of some Java programs with > > > thousands of threads, causing a million context > > > switches a second on a large system. > > > > > > I like Andy's idea of having one cache line with > > > a cpumask per node. That seems like it will have > > > fewer downsides for tasks with fewer threads running > > > on giant systems. > > > > > > I'll throw out the code I was working on, and look > > > into implementing that :) > > > > > > > I'm not sure you should throw your patch out. It's a decent idea, > > too. > > Oh, I still have it saved, but the cpumask per > NUMA node looks like it could have a big impact, > with less guesswork or side effects. 
> Also, even with your other patch, we'd still have a win from the improved data structure -- switching back and forth between init_mm and something else is definitely not the only time we hammer the cpumask cache lines. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] x86,switch_mm: skip atomic operations for init_mm 2018-06-01 20:03 ` Andy Lutomirski 2018-06-01 20:35 ` Rik van Riel @ 2018-06-02 3:39 ` Mike Galbraith 1 sibling, 0 replies; 18+ messages in thread From: Mike Galbraith @ 2018-06-02 3:39 UTC (permalink / raw) To: Andy Lutomirski, riel Cc: LKML, songliubraving, kernel-team, Ingo Molnar, Thomas Gleixner, X86 ML, Peter Zijlstra On Fri, 2018-06-01 at 13:03 -0700, Andy Lutomirski wrote: > > Mike, you never did say: do you have PCID on your CPU? Yes. > Also, what is > your workload doing to cause so many switches back and forth between > init_mm and a task. pipe-test measures pipe round trip, does nearly nothing but schedule. -Mike ^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2018-06-06 19:23 UTC | newest] Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2018-06-01 12:28 [PATCH] x86,switch_mm: skip atomic operations for init_mm Rik van Riel 2018-06-01 15:11 ` Andy Lutomirski 2018-06-01 18:22 ` Rik van Riel 2018-06-01 18:48 ` Mike Galbraith 2018-06-01 19:43 ` Rik van Riel 2018-06-01 20:03 ` Andy Lutomirski 2018-06-01 20:35 ` Rik van Riel 2018-06-01 21:21 ` Andy Lutomirski 2018-06-01 22:13 ` Rik van Riel 2018-06-02 3:35 ` Andy Lutomirski 2018-06-02 5:04 ` Rik van Riel 2018-06-02 20:14 ` Andy Lutomirski 2018-06-03 0:51 ` Song Liu 2018-06-03 1:38 ` Rik van Riel 2018-06-06 18:17 ` Andy Lutomirski 2018-06-06 19:00 ` Rik van Riel 2018-06-06 19:23 ` Andy Lutomirski 2018-06-02 3:39 ` Mike Galbraith