[RFC PATCH 0/3] x86/mm/tlb: Defer TLB flushes with PTI

* [RFC PATCH 0/3] x86/mm/tlb: Defer TLB flushes with PTI
@ 2019-08-23 22:46 Nadav Amit
  2019-08-23 22:46 ` [RFC PATCH 1/3] x86/mm/tlb: Defer PTI flushes Nadav Amit
                   ` (4 more replies)
  0 siblings, 5 replies; 10+ messages in thread
From: Nadav Amit @ 2019-08-23 22:46 UTC (permalink / raw)
  To: Andy Lutomirski, Dave Hansen
  Cc: x86, linux-kernel, Peter Zijlstra, Thomas Gleixner, Ingo Molnar,
	Nadav Amit

INVPCID is considerably slower than INVLPG of a single PTE, but it is
currently used to flush PTEs in the user page-table when PTI is used.

Instead, it is possible to defer TLB flushes until after the user
page-tables are loaded. Preventing speculation over the TLB flushes
should keep the whole thing safe. In some cases, deferring TLB flushes
in such a way can result in more full TLB flushes, but arguably this
behavior is oftentimes beneficial.

These patches are based and evaluated on top of the concurrent
TLB-flushes v4 patch-set.

I will provide more results later, but it might be easier to look at the
time an isolated TLB flush takes. These numbers are from skylake,
showing the number of cycles that running madvise(DONTNEED) which
results in local TLB flushes takes:

n_pages		concurrent	+deferred-pti		change
-------		----------	-------------		------
 1		2119		1986 			-6.7%
 10		6791		5417 			 -20%

Please let me know if I missed something that affects security or
performance.

[ Yes, I know there is another pending RFC for async TLB flushes, but I
  think it might be easier to merge this one first ]

Nadav Amit (3):
  x86/mm/tlb: Defer PTI flushes
  x86/mm/tlb: Avoid deferring PTI flushes on shootdown
  x86/mm/tlb: Use lockdep irq assertions

 arch/x86/entry/calling.h        | 52 +++++++++++++++++++--
 arch/x86/include/asm/tlbflush.h | 31 ++++++++++--
 arch/x86/kernel/asm-offsets.c   |  3 ++
 arch/x86/mm/tlb.c               | 83 +++++++++++++++++++++++++++++++--
 4 files changed, 158 insertions(+), 11 deletions(-)

-- 
2.17.1

^ permalink raw reply	[flat|nested] 10+ messages in thread