* [RFC PATCH 0/6] x86/mm: Flush remote and local TLBs concurrently
From: Nadav Amit @ 2019-05-25  8:21 UTC
  To: Ingo Molnar, Peter Zijlstra, Andy Lutomirski
  Cc: Borislav Petkov, linux-kernel, Nadav Amit

Currently, local and remote TLB flushes are not performed concurrently,
which introduces unnecessary overhead: each INVLPG can take hundreds of
cycles. This patch-set allows TLB flushes to run concurrently: first
request the remote CPUs to initiate the flush, then run it locally, and
finally wait for the remote CPUs to finish their work.
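
The ordering described above can be illustrated with a small user-space
sketch. This is not the kernel implementation: threads stand in for
remote CPUs, pthread_create() stands in for sending the IPI, and
fake_tlb_flush(), remote_cpu(), flush_concurrently() and MAX_REMOTE are
hypothetical names used only for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define MAX_REMOTE 8	/* hypothetical upper bound on remote "CPUs" */

static atomic_int flushes_done;

/* Stand-in for the real flush work (e.g. a series of INVLPGs). */
static void fake_tlb_flush(void)
{
	atomic_fetch_add(&flushes_done, 1);
}

/* Body run by each simulated remote CPU. */
static void *remote_cpu(void *unused)
{
	(void)unused;
	fake_tlb_flush();
	return NULL;
}

/*
 * Returns the total number of flushes performed (remote + local),
 * or -1 if a remote thread could not be started.
 */
int flush_concurrently(int n_remote)
{
	pthread_t remote[MAX_REMOTE];
	int i, started = 0;

	assert(n_remote >= 0 && n_remote <= MAX_REMOTE);
	atomic_store(&flushes_done, 0);

	/* 1. Ask the remote CPUs to start flushing (stand-in for IPIs). */
	for (i = 0; i < n_remote; i++) {
		if (pthread_create(&remote[i], NULL, remote_cpu, NULL) != 0)
			break;
		started++;
	}

	/* 2. Flush locally while the remote flushes are in flight. */
	fake_tlb_flush();

	/* 3. Only now wait for the remote CPUs to finish their work. */
	for (i = 0; i < started; i++)
		pthread_join(remote[i], NULL);

	return started == n_remote ? atomic_load(&flushes_done) : -1;
}
```

The point of the sketch is only the order of the three steps: the local
flush sits between the request and the wait, so its cost overlaps with
the remote work instead of being added to it.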

The proposed changes should also improve the performance of other
invocations of on_each_cpu(). Hopefully, no caller relies on the current
behavior of on_each_cpu(), in which the function is first executed
remotely and only then locally.

On my Haswell machine (bare-metal), a TLB flush microbenchmark
(MADV_DONTNEED/touch for a single page on one thread) takes the
following time (ns):

	n_threads	before		after
	---------	------		-----
	1		661		663
	2		1436		1225 (-14%)
	4		1571		1421 (-10%)

Note that since the benchmark also causes page-faults, the actual
speedup of TLB shootdowns is greater than these numbers suggest. Also
note the higher improvement in performance with 2 threads (a single
remote TLB flush target). This seems to be a side-effect of holding
synchronization data-structures (csd) off the stack, unlike the way it
is currently done (in smp_call_function_single()).
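
For readers who want to reproduce something similar, a minimal sketch of
the MADV_DONTNEED/touch loop follows. It is not the benchmark used for
the numbers above: it covers only the single-threaded case (the extra
threads that share the address space and make the flush remote are
omitted), and madvise_touch_ns() is a hypothetical helper name.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>

/*
 * Returns the average time in ns of one MADV_DONTNEED + touch
 * iteration on a single anonymous page, or -1 on error.
 */
long long madvise_touch_ns(long iterations)
{
	long page = sysconf(_SC_PAGESIZE);
	struct timespec start, end;
	volatile char *p;
	long long total;
	void *map;
	long i;

	if (iterations <= 0 || page <= 0)
		return -1;

	map = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (map == MAP_FAILED)
		return -1;
	p = map;

	clock_gettime(CLOCK_MONOTONIC, &start);
	for (i = 0; i < iterations; i++) {
		/* Drop the page; the kernel must flush its TLB entry. */
		if (madvise(map, (size_t)page, MADV_DONTNEED) != 0) {
			munmap(map, (size_t)page);
			return -1;
		}
		/* Touch the page again, which faults it back in. */
		p[0] = 1;
	}
	clock_gettime(CLOCK_MONOTONIC, &end);

	munmap(map, (size_t)page);

	total = (long long)(end.tv_sec - start.tv_sec) * 1000000000LL +
		(end.tv_nsec - start.tv_nsec);
	return total / iterations;
}
```

Because every iteration includes a page-fault, the measured time is an
upper bound on the cost of the flush itself, which matches the caveat in
the note above.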

Patches 1-2 are small cleanups. Patches 3-5 implement the concurrent
execution of TLB flushes. Patch 6 restores the performance of local TLB
flushes, which was hurt by the optimization, to its level before these
changes by introducing a fast-path for this specific case.

Nadav Amit (6):
  smp: Remove smp_call_function() and on_each_cpu() return values
  cpumask: Purify cpumask_next()
  smp: Run functions concurrently in smp_call_function_many()
  x86/mm/tlb: Refactor common code into flush_tlb_on_cpus()
  x86/mm/tlb: Flush remote and local TLBs concurrently
  x86/mm/tlb: Optimize local TLB flushes

 arch/alpha/kernel/smp.c               |  19 +---
 arch/alpha/oprofile/common.c          |   6 +-
 arch/ia64/kernel/perfmon.c            |  12 +--
 arch/ia64/kernel/uncached.c           |   8 +-
 arch/x86/hyperv/mmu.c                 |   2 +
 arch/x86/include/asm/paravirt.h       |   8 ++
 arch/x86/include/asm/paravirt_types.h |   6 ++
 arch/x86/include/asm/tlbflush.h       |   6 ++
 arch/x86/kernel/kvm.c                 |   1 +
 arch/x86/kernel/paravirt.c            |   3 +
 arch/x86/lib/cache-smp.c              |   3 +-
 arch/x86/mm/tlb.c                     | 137 +++++++++++++++++--------
 arch/x86/xen/mmu_pv.c                 |   2 +
 drivers/char/agp/generic.c            |   3 +-
 include/linux/cpumask.h               |   2 +-
 include/linux/smp.h                   |  32 ++++--
 kernel/smp.c                          | 139 ++++++++++++--------------
 kernel/up.c                           |   3 +-
 18 files changed, 230 insertions(+), 162 deletions(-)

-- 
2.20.1

