From: Andy Lutomirski <luto@kernel.org>
To: x86@kernel.org
Cc: linux-kernel@vger.kernel.org, Borislav Petkov <bp@alien8.de>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Mel Gorman <mgorman@suse.de>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	Nadav Amit <nadav.amit@gmail.com>, Rik van Riel <riel@redhat.com>,
	Dave Hansen <dave.hansen@intel.com>,
	Arjan van de Ven <arjan@linux.intel.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Andy Lutomirski <luto@kernel.org>
Subject: [PATCH v4 00/10] PCID and improved laziness
Date: Thu, 29 Jun 2017 08:53:12 -0700
Message-ID: <cover.1498751203.git.luto@kernel.org>

*** Ingo, even if this misses 4.13, please apply the first patch before
*** the merge window.

There are three performance benefits here:

1. TLB flushing is slow.  (I.e. the flush itself takes a while.)
   This avoids many of them when switching tasks by using PCID.  In
   a stupid little benchmark I did, it saves about 100ns on my laptop
   per context switch.  I'll try to improve that benchmark.

2. Mms that have been used recently on a given CPU might get to keep
   their TLB entries alive across process switches with this patch
   set.  TLB fills are pretty fast on modern CPUs, but they're even
   faster when they don't happen.

3. Lazy TLB is way better.  We used to do two stupid things when we
   ran kernel threads: we'd send IPIs to flush user contexts on their
   CPUs and then we'd write to CR3 for no particular reason as an excuse
   to stop further IPIs.  With this patch, we do neither.

This will, in general, perform suboptimally if paravirt TLB flushing
is in use (currently just Xen, I think, but Hyper-V is in the works).
The code is structured so we could fix it in one of two ways: we
could take a spinlock when touching the percpu state so we can update
it remotely after a paravirt flush, or we could be more careful about
exactly how we access the state and use cmpxchg16b to do atomic
remote updates.
(On SMP systems without cmpxchg16b, we'd just skip the
optimization entirely.)

This is still missing a final comment-only patch to add overall
documentation for the whole thing, but I didn't want to block sending
the maybe-hopefully-final code on that.

This is based on tip:x86/mm.  The branch is here if you want to play:
https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/log/?h=x86/pcid

In general, performance seems to exceed my expectations.  Here are
some performance numbers copy-and-pasted from the changelogs for
"Rework lazy TLB mode and TLB freshness" and "Try to preserve old
TLB entries using PCID":

MADV_DONTNEED; touch the page; switch CPUs using sched_setaffinity.  In
an unpatched kernel, MADV_DONTNEED will send an IPI to the previous CPU.
This is intended to be a nearly worst-case test.

  patched:         13.4µs
  unpatched:       21.6µs

Vitaly's pthread_mmap microbenchmark with 8 threads (on four cores),
nrounds = 100, 256M data

  patched:         1.1 seconds or so
  unpatched:       1.9 seconds or so

ping-pong between two mms on the same CPU using eventfd:

  patched:         1.22µs
  patched, nopcid: 1.33µs
  unpatched:       1.34µs

Same ping-pong, but now touch 512 pages (all zero-page to minimize
cache misses) each iteration.  dTLB misses are measured by
dtlb_load_misses.miss_causes_a_walk:

  patched:         1.8µs, 11M dTLB misses
  patched, nopcid: 6.2µs, 207M dTLB misses
  unpatched:       6.1µs, 190M dTLB misses

Changes from v3:
 - Lots more acks.
 - Move comment deletion to the beginning.
 - Misc cleanups from lots of reviewers.

Changes from v2:
 - Add some Acks
 - Move the reentrancy issue to the beginning.
   (I also sent the same patch as a standalone fix -- it's just in here
    so that this series applies to x86/mm.)
 - Fix some comments.

Changes from RFC:
 - flush_tlb_func_common() no longer gets reentered (Nadav)
 - Fix ASID corruption on unlazying (kbuild bot)
 - Move Xen init to the right place
 - Misc cleanups

Andy Lutomirski (10):
  x86/mm: Don't reenter flush_tlb_func_common()
  x86/mm: Delete a big outdated comment about TLB flushing
  x86/mm: Give each mm TLB flush generation a unique ID
  x86/mm: Track the TLB's tlb_gen and update the flushing algorithm
  x86/mm: Rework lazy TLB mode and TLB freshness tracking
  x86/mm: Stop calling leave_mm() in idle code
  x86/mm: Disable PCID on 32-bit kernels
  x86/mm: Add nopcid to turn off PCID
  x86/mm: Enable CR4.PCIDE on supported systems
  x86/mm: Try to preserve old TLB entries using PCID

 Documentation/admin-guide/kernel-parameters.txt |   2 +
 arch/ia64/include/asm/acpi.h                    |   2 -
 arch/x86/include/asm/acpi.h                     |   2 -
 arch/x86/include/asm/disabled-features.h        |   4 +-
 arch/x86/include/asm/mmu.h                      |  25 +-
 arch/x86/include/asm/mmu_context.h              |  15 +-
 arch/x86/include/asm/processor-flags.h          |   2 +
 arch/x86/include/asm/tlbflush.h                 |  87 +++++-
 arch/x86/kernel/cpu/bugs.c                      |   8 +
 arch/x86/kernel/cpu/common.c                    |  40 +++
 arch/x86/mm/init.c                              |   2 +-
 arch/x86/mm/tlb.c                               | 382 ++++++++++++++++--------
 arch/x86/xen/enlighten_pv.c                     |   6 +
 arch/x86/xen/mmu_pv.c                           |   5 +-
 drivers/acpi/processor_idle.c                   |   2 -
 drivers/idle/intel_idle.c                       |   9 +-
 16 files changed, 446 insertions(+), 147 deletions(-)

-- 
2.9.4
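
For readers who want a concrete picture of benefits 1 and 2 above, here
is a minimal user-space sketch of the idea, not the code from the
series.  It assumes only two architectural facts: with CR4.PCIDE set,
CR3 bits 11:0 carry the PCID, and setting bit 63 on a CR3 write asks the
CPU to keep the TLB entries tagged with that PCID.  Everything else
(struct mm_model, struct cpu_model, NR_SLOTS, the round-robin eviction,
switch_to()) is a hypothetical model made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural: PCID is CR3[11:0]; CR3 bit 63 means "do not flush". */
#define CR3_PCID_MASK   0xFFFULL
#define CR3_NOFLUSH_BIT (1ULL << 63)

/* Hypothetical: number of PCID slots this model tracks per CPU. */
#define NR_SLOTS 6

/* Hypothetical model of an mm: unique ID, flush generation, page table. */
struct mm_model {
	uint64_t ctx_id;   /* unique per mm, never 0 in this model */
	uint64_t tlb_gen;  /* bumped whenever the mm needs a flush */
	uint64_t pgd_pa;   /* page-aligned physical address of the PGD */
};

/* What one CPU remembers about the mm whose entries sit in a slot. */
struct slot {
	uint64_t ctx_id;   /* 0 means the slot is empty */
	uint64_t tlb_gen;  /* generation the TLB was last synced to */
};

struct cpu_model {
	struct slot slots[NR_SLOTS];
	unsigned int next; /* round-robin victim (an assumption) */
};

/* Compose the value that would be written to CR3. */
static uint64_t build_cr3(uint64_t pgd_pa, unsigned int pcid, bool noflush)
{
	uint64_t cr3 = (pgd_pa & ~CR3_PCID_MASK) | (pcid & CR3_PCID_MASK);

	return noflush ? (cr3 | CR3_NOFLUSH_BIT) : cr3;
}

/*
 * Pick a PCID for mm on this CPU.  If the CPU already holds entries for
 * this mm and they are up to date, the flush can be skipped entirely.
 */
static uint64_t switch_to(struct cpu_model *cpu, const struct mm_model *mm)
{
	unsigned int i, victim;

	for (i = 0; i < NR_SLOTS; i++) {
		struct slot *s = &cpu->slots[i];
		bool fresh;

		if (s->ctx_id != mm->ctx_id)
			continue;

		/* Same mm: stale only if a flush was requested meanwhile. */
		fresh = (s->tlb_gen == mm->tlb_gen);
		s->tlb_gen = mm->tlb_gen;
		return build_cr3(mm->pgd_pa, i, fresh);
	}

	/* Not cached here: recycle a slot and flush whatever it held. */
	victim = cpu->next;
	cpu->next = (cpu->next + 1) % NR_SLOTS;
	cpu->slots[victim].ctx_id = mm->ctx_id;
	cpu->slots[victim].tlb_gen = mm->tlb_gen;
	return build_cr3(mm->pgd_pa, victim, false);
}
```

In this model, switching A -> B -> A returns a CR3 value with bit 63 set
on the second switch to A, unless A's tlb_gen was bumped in between, in
which case the slot is reused but flushed.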