From: Nadav Amit <namit@vmware.com>
To: Peter Zijlstra <peterz@infradead.org>, Andy Lutomirski <luto@kernel.org>
Cc: linux-kernel@vger.kernel.org, Ingo Molnar <mingo@redhat.com>,
	Borislav Petkov <bp@alien8.de>,
	x86@kernel.org, Thomas Gleixner <tglx@linutronix.de>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Nadav Amit <namit@vmware.com>,
	Richard Henderson <rth@twiddle.net>,
	Ivan Kokshaysky <ink@jurassic.park.msu.ru>,
	Matt Turner <mattst88@gmail.com>, Tony Luck <tony.luck@intel.com>,
	Fenghua Yu <fenghua.yu@intel.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Rik van Riel <riel@surriel.com>,
	Josh Poimboeuf <jpoimboe@redhat.com>,
	Paolo Bonzini <pbonzini@redhat.com>
Subject: [PATCH 0/9] x86: Concurrent TLB flushes and other improvements
Date: Wed, 12 Jun 2019 23:48:04 -0700
Message-ID: <20190613064813.8102-1-namit@vmware.com>

Currently, local and remote TLB flushes are not performed concurrently,
which introduces unnecessary overhead: each INVLPG can take hundreds of
cycles. This patch-set allows TLB flushes to run concurrently. The
initiating CPU first requests the remote CPUs to start the flush, then
flushes locally, and finally waits for the remote CPUs to finish their
work.
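
The ordering described above can be illustrated with a minimal userspace
sketch. This is not the kernel implementation: threads stand in for
remote CPUs, pthread_create() stands in for sending the IPIs, and every
name (model_cpu, model_flush, model_remote_cpu, model_flush_concurrent,
NR_REMOTE) is hypothetical. Build with -pthread.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define NR_REMOTE 3	/* hypothetical number of remote "CPUs" */

/* Per-"CPU" state of this userspace model; not a kernel structure. */
struct model_cpu {
	atomic_int flushed;	/* becomes 1 once this CPU has flushed */
};

/* Stand-in for the real invalidation work (e.g. a series of INVLPGs). */
static void model_flush(struct model_cpu *cpu)
{
	atomic_store(&cpu->flushed, 1);
}

/* Thread body that plays the role of a remote CPU handling the request. */
static void *model_remote_cpu(void *arg)
{
	model_flush(arg);
	return NULL;
}

/* Returns how many modelled CPUs (local + remote) completed their flush. */
static int model_flush_concurrent(void)
{
	struct model_cpu local, remote[NR_REMOTE];
	pthread_t tid[NR_REMOTE];
	int i, rc, done;

	atomic_init(&local.flushed, 0);

	/* 1. Ask the remote CPUs to start flushing (stands in for the IPIs). */
	for (i = 0; i < NR_REMOTE; i++) {
		atomic_init(&remote[i].flushed, 0);
		rc = pthread_create(&tid[i], NULL, model_remote_cpu, &remote[i]);
		assert(rc == 0);
	}

	/* 2. Flush locally while the remote flushes are still in flight. */
	model_flush(&local);
	done = atomic_load(&local.flushed);

	/* 3. Only now wait for the remote CPUs to finish their work. */
	for (i = 0; i < NR_REMOTE; i++) {
		rc = pthread_join(tid[i], NULL);
		assert(rc == 0);
		done += atomic_load(&remote[i].flushed);
	}
	return done;
}
```

Calling model_flush_concurrent() returns NR_REMOTE + 1 once every
modelled CPU has flushed. The point of the sketch is only the order of
the three steps: the wait comes after the local flush rather than
before it.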

In addition, there are various small optimizations to avoid unwarranted
false-sharing and atomic operations.

The proposed changes should also improve the performance of other
invocations of on_each_cpu(). Hopefully, no one relies on the current
behavior of on_each_cpu(), which invokes the function first remotely and
only then locally [Peter says he remembers that someone might, but
without further information it is hard to know how to address it].

Results of running sysbench on DAX with emulated pmem, with the
write-cache and various mitigations (PTI, Spectre, MDS) disabled, on
Haswell:

 sysbench fileio --file-total-size=3G --file-test-mode=rndwr \
  --file-io-mode=mmap --threads=4 --file-fsync-mode=fdatasync run

			events (avg/stddev)
			-------------------
  5.2-rc3:		1247669.0000/16075.39
  +patchset:		1290607.0000/13617.56 (+3.4%)
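
For reference, the relative improvement can be recomputed from the two
averages in the table. The helper below is only an illustrative sketch
and its name (relative_gain_percent) is hypothetical.

```c
#include <assert.h>

/* Relative change of `after` over `before`, in percent. */
static double relative_gain_percent(double before, double after)
{
	assert(before > 0.0);	/* avoid dividing by zero */
	return (after - before) / before * 100.0;
}
```

With the averages above, relative_gain_percent(1247669.0, 1290607.0)
evaluates to roughly 3.44, which rounds to the reported +3.4%.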

Patch 1 is a small cleanup. Patches 2-5 implement the concurrent
execution of TLB flushes, and patch 6 provides the paravirtualized
flush_tlb_multi() for KVM. Patches 7-9 deal with false-sharing and
unnecessary atomic operations. Xen and Hyper-V do not yet have an
implementation that uses the concurrent TLB flushes.

There are various additional possible optimizations that were sent or
are in development (async flushes, x2apic shorthands, fewer mm_tlb_gen
accesses, etc.), but based on Andy's feedback, they will be sent later.

RFCv2 -> v1:
* Fix comment on flush_tlb_multi [Juergen]
* Remove async invalidation optimizations [Andy]
* Add KVM support [Paolo]

Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>

Nadav Amit (9):
  smp: Remove smp_call_function() and on_each_cpu() return values
  smp: Run functions concurrently in smp_call_function_many()
  x86/mm/tlb: Refactor common code into flush_tlb_on_cpus()
  x86/mm/tlb: Flush remote and local TLBs concurrently
  x86/mm/tlb: Optimize local TLB flushes
  KVM: x86: Provide paravirtualized flush_tlb_multi()
  smp: Do not mark call_function_data as shared
  x86/tlb: Privatize cpu_tlbstate
  x86/apic: Use non-atomic operations when possible

 arch/alpha/kernel/smp.c               |  19 +---
 arch/alpha/oprofile/common.c          |   6 +-
 arch/ia64/kernel/perfmon.c            |  12 +--
 arch/ia64/kernel/uncached.c           |   8 +-
 arch/x86/hyperv/mmu.c                 |   2 +
 arch/x86/include/asm/paravirt.h       |   8 ++
 arch/x86/include/asm/paravirt_types.h |   6 ++
 arch/x86/include/asm/tlbflush.h       |  46 ++++----
 arch/x86/kernel/apic/apic_flat_64.c   |   4 +-
 arch/x86/kernel/apic/x2apic_cluster.c |   2 +-
 arch/x86/kernel/kvm.c                 |  11 +-
 arch/x86/kernel/paravirt.c            |   3 +
 arch/x86/kernel/smp.c                 |   2 +-
 arch/x86/lib/cache-smp.c              |   3 +-
 arch/x86/mm/init.c                    |   2 +-
 arch/x86/mm/tlb.c                     | 150 ++++++++++++++++++--------
 arch/x86/xen/mmu_pv.c                 |   2 +
 drivers/char/agp/generic.c            |   3 +-
 include/linux/smp.h                   |  32 ++++--
 kernel/smp.c                          | 141 +++++++++++-------------
 kernel/up.c                           |   3 +-
 21 files changed, 272 insertions(+), 193 deletions(-)

-- 
2.20.1



