From: Khalid Aziz <khalid.aziz@oracle.com>
To: juergh@gmail.com, tycho@tycho.ws, jsteckli@amazon.de,
	ak@linux.intel.com,  torvalds@linux-foundation.org,
	liran.alon@oracle.com, keescook@google.com,
	akpm@linux-foundation.org, mhocko@suse.com,
	catalin.marinas@arm.com, will.deacon@arm.com, jmorris@namei.org,
	konrad.wilk@oracle.com
Cc: kernel-hardening@lists.openwall.com, peterz@infradead.org,
	dave.hansen@intel.com, Khalid Aziz <khalid.aziz@oracle.com>,
	deepa.srinivasan@oracle.com, steven.sistare@oracle.com,
	hch@lst.de, x86@kernel.org, kanth.ghatraju@oracle.com,
	labbott@redhat.com, pradeep.vincent@oracle.com, jcm@redhat.com,
	luto@kernel.org, boris.ostrovsky@oracle.com,
	chris.hyser@oracle.com, linux-arm-kernel@lists.infradead.org,
	jmattson@google.com, linux-mm@kvack.org,
	andrew.cooper3@citrix.com, linux-kernel@vger.kernel.org,
	tyhicks@canonical.com, john.haxby@oracle.com, tglx@linutronix.de,
	joao.m.martins@oracle.com, dwmw@amazon.co.uk,
	kirill.shutemov@linux.intel.com
Subject: [RFC PATCH v8 13/14] xpfo, mm: Defer TLB flushes for non-current CPUs (x86 only)
Date: Wed, 13 Feb 2019 17:01:36 -0700	[thread overview]
Message-ID: <98134cb73e911b2f0b59ffb76243a7777963d218.1550088114.git.khalid.aziz@oracle.com> (raw)
In-Reply-To: <cover.1550088114.git.khalid.aziz@oracle.com>

XPFO flushes kernel space TLB entries for pages that are now mapped
in userspace, not only on the current CPU but synchronously on all
other CPUs as well. Processes allocating pages on each core thus
cause a flood of IPI messages to all other cores to flush TLB
entries. Many of these messages ask the receiving core to flush its
entire TLB, because the number of entries being flushed on the local
core exceeds tlb_single_page_flush_ceiling. The cost of the TLB
flushes caused by unmapping pages from the physmap goes up
dramatically on machines with high core counts.
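
For context, the path XPFO takes today is x86's
flush_tlb_kernel_range(). In 4.20 it looks roughly like the sketch
below (illustrative; see the unmodified hunk context further down).
The final argument of 1 to on_each_cpu() makes each call a
synchronous IPI broadcast to every online CPU, which is the flood
described above:

void flush_tlb_kernel_range(unsigned long start, unsigned long end)
{
	/* Balance as user space task's flush, a bit conservative */
	if (end == TLB_FLUSH_ALL ||
	    (end - start) > tlb_single_page_flush_ceiling << PAGE_SHIFT) {
		/* IPI every CPU and wait for each to flush its whole TLB */
		on_each_cpu(do_flush_tlb_all, NULL, 1);
	} else {
		struct flush_tlb_info info;

		info.start = start;
		info.end = end;
		/* IPI every CPU and wait for each to flush the range */
		on_each_cpu(do_kernel_range_flush, &info, 1);
	}
}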

When a page is unmapped from kernel space and mapped into userspace,
this patch flushes the relevant TLB entries (or the entire TLB,
depending upon the number of entries) on the current CPU only, and
posts a pending TLB flush on all other CPUs. Each core checks its
pending TLB flush flag on every context switch and, if the flag is
set, flushes its TLB and clears the flag. This potentially
aggregates multiple TLB flushes into one.
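
In outline, the two halves of the scheme fit together as sketched
below (condensed from the diff further down, which is authoritative;
the local-flush body is elided here):

/* CPUs that still owe a full TLB flush forced by XPFO */
static cpumask_t pending_xpfo_flush;

/* Unmap path: flush this CPU now, mark all other CPUs pending */
void xpfo_flush_tlb_kernel_range(unsigned long start, unsigned long end)
{
	struct cpumask tmp_mask;

	/* ... flush the local TLB, full or ranged (see diff below) ... */

	cpumask_setall(&tmp_mask);
	cpumask_clear_cpu(smp_processor_id(), &tmp_mask);
	cpumask_or(&pending_xpfo_flush, &pending_xpfo_flush, &tmp_mask);
}

/* Context-switch path, in switch_mm_irqs_off(): settle the debt */
if (cpumask_test_and_clear_cpu(cpu, &pending_xpfo_flush))
	__flush_tlb_all();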
This has a very significant impact, especially on machines with
large core counts. To illustrate, the kernel was compiled with
make -j on two classes of machine - a server with a high core count
and a large amount of memory, and a desktop-class machine with more
modest specs. System times for "make -j" on a vanilla 4.20 kernel,
on 4.20 with the XPFO patches before this patch, and with this patch
applied are below:

Hardware: 96-core Intel Xeon Platinum 8160 CPU @ 2.10GHz, 768 GB RAM
make -j60 all

4.20                            950.966s
4.20+XPFO                       25073.169s      26.366x
4.20+XPFO+Deferred flush        1372.874s        1.44x

Hardware: 4-core Intel Core i5-3550 CPU @ 3.30GHz, 8 GB RAM
make -j4 all

4.20                            607.671s
4.20+XPFO                       1588.646s       2.614x
4.20+XPFO+Deferred flush        803.989s        1.32x

This patch could use more optimization. Batching more TLB entry
flushes, as was suggested for an earlier version of these patches,
can further reduce the number of full flushes. Once finalized, the
same scheme should be implemented for other architectures as well.

Signed-off-by: Khalid Aziz <khalid.aziz@oracle.com>
---
 arch/x86/include/asm/tlbflush.h |  1 +
 arch/x86/mm/tlb.c               | 38 +++++++++++++++++++++++++++++++++
 arch/x86/mm/xpfo.c              |  2 +-
 3 files changed, 40 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
index f4204bf377fc..92d23629d01d 100644
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -561,6 +561,7 @@ extern void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
 				unsigned long end, unsigned int stride_shift,
 				bool freed_tables);
 extern void flush_tlb_kernel_range(unsigned long start, unsigned long end);
+extern void xpfo_flush_tlb_kernel_range(unsigned long start, unsigned long end);
 
 static inline void flush_tlb_page(struct vm_area_struct *vma, unsigned long a)
 {
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 03b6b4c2238d..c907b643eecb 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -35,6 +35,15 @@
  */
 #define LAST_USER_MM_IBPB	0x1UL
 
+/*
+ * When a full TLB flush is needed to flush stale TLB entries
+ * for pages that have been mapped into userspace and unmapped
+ * from kernel space, this TLB flush will be delayed until the
+ * task is scheduled on that CPU. Keep track of CPUs with
+ * pending full TLB flush forced by xpfo.
+ */
+static cpumask_t pending_xpfo_flush;
+
 /*
  * We get here when we do something requiring a TLB invalidation
  * but could not go invalidate all of the contexts.  We do the
@@ -319,6 +328,15 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
 		__flush_tlb_all();
 	}
 #endif
+
+	/* If there is a pending TLB flush for this CPU due to XPFO
+	 * flush, do it now.
+	 */
+	if (cpumask_test_and_clear_cpu(cpu, &pending_xpfo_flush)) {
+		count_vm_tlb_event(NR_TLB_REMOTE_FLUSH_RECEIVED);
+		__flush_tlb_all();
+	}
+
 	this_cpu_write(cpu_tlbstate.is_lazy, false);
 
 	/*
@@ -801,6 +819,26 @@ void flush_tlb_kernel_range(unsigned long start, unsigned long end)
 	}
 }
 
+void xpfo_flush_tlb_kernel_range(unsigned long start, unsigned long end)
+{
+	struct cpumask tmp_mask;
+
+	/* Balance as user space task's flush, a bit conservative */
+	if (end == TLB_FLUSH_ALL ||
+	    (end - start) > tlb_single_page_flush_ceiling << PAGE_SHIFT) {
+		do_flush_tlb_all(NULL);
+	} else {
+		struct flush_tlb_info info;
+
+		info.start = start;
+		info.end = end;
+		do_kernel_range_flush(&info);
+	}
+	cpumask_setall(&tmp_mask);
+	cpumask_clear_cpu(smp_processor_id(), &tmp_mask);
+	cpumask_or(&pending_xpfo_flush, &pending_xpfo_flush, &tmp_mask);
+}
+
 void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch)
 {
 	struct flush_tlb_info info = {
diff --git a/arch/x86/mm/xpfo.c b/arch/x86/mm/xpfo.c
index e13b99019c47..d3833532bfdc 100644
--- a/arch/x86/mm/xpfo.c
+++ b/arch/x86/mm/xpfo.c
@@ -115,7 +115,7 @@ inline void xpfo_flush_kernel_tlb(struct page *page, int order)
 		return;
 	}
 
-	flush_tlb_kernel_range(kaddr, kaddr + (1 << order) * size);
+	xpfo_flush_tlb_kernel_range(kaddr, kaddr + (1 << order) * size);
 }
 
 /* Convert a user space virtual address to a physical address.
-- 
2.17.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

Thread overview: 32+ messages
2019-02-14  0:01 [RFC PATCH v8 00/14] Add support for eXclusive Page Frame Ownership Khalid Aziz
2019-02-14  0:01 ` [RFC PATCH v8 01/14] mm: add MAP_HUGETLB support to vm_mmap Khalid Aziz
2019-02-14  0:01 ` [RFC PATCH v8 02/14] x86: always set IF before oopsing from page fault Khalid Aziz
2019-02-14  0:01 ` [RFC PATCH v8 03/14] mm, x86: Add support for eXclusive Page Frame Ownership (XPFO) Khalid Aziz
2019-02-14 10:56   ` Peter Zijlstra
2019-02-14 16:15     ` Borislav Petkov
2019-02-14 17:19       ` Khalid Aziz
2019-02-14 17:13     ` Khalid Aziz
2019-02-14 19:08       ` Peter Zijlstra
2019-02-14 19:58         ` Khalid Aziz
2019-02-14  0:01 ` [RFC PATCH v8 04/14] swiotlb: Map the buffer if it was unmapped by XPFO Khalid Aziz
2019-02-14  7:47   ` Christoph Hellwig
2019-02-14 16:56     ` Khalid Aziz
2019-02-14 17:44       ` Christoph Hellwig
2019-02-14 19:48         ` Khalid Aziz
2019-02-14  0:01 ` [RFC PATCH v8 05/14] arm64/mm: Add support for XPFO Khalid Aziz
2019-02-14  0:01 ` [RFC PATCH v8 06/14] xpfo: add primitives for mapping underlying memory Khalid Aziz
2019-02-14  0:01 ` [RFC PATCH v8 07/14] arm64/mm, xpfo: temporarily map dcache regions Khalid Aziz
2019-02-14 15:54   ` Tycho Andersen
2019-02-14 17:29     ` Khalid Aziz
2019-02-14 23:49       ` Tycho Andersen
2019-02-14  0:01 ` [RFC PATCH v8 08/14] arm64/mm: disable section/contiguous mappings if XPFO is enabled Khalid Aziz
2019-02-15 13:09   ` Mark Rutland
2019-02-15 14:47     ` Khalid Aziz
2019-02-14  0:01 ` [RFC PATCH v8 09/14] mm: add a user_virt_to_phys symbol Khalid Aziz
2019-02-14  0:01 ` [RFC PATCH v8 10/14] lkdtm: Add test for XPFO Khalid Aziz
2019-02-14  0:01 ` [RFC PATCH v8 11/14] xpfo, mm: remove dependency on CONFIG_PAGE_EXTENSION Khalid Aziz
2019-02-14  0:01 ` [RFC PATCH v8 12/14] xpfo, mm: optimize spinlock usage in xpfo_kunmap Khalid Aziz
2019-02-14  0:01 ` Khalid Aziz [this message]
2019-02-14 17:42   ` [RFC PATCH v8 13/14] xpfo, mm: Defer TLB flushes for non-current CPUs (x86 only) Dave Hansen
2019-02-14 19:57     ` Khalid Aziz
2019-02-14  0:01 ` [RFC PATCH v8 14/14] xpfo, mm: Optimize XPFO TLB flushes by batching them together Khalid Aziz
