From: Peter Zijlstra <peterz@infradead.org>
To: Khalid Aziz <khalid.aziz@oracle.com>
Cc: juergh@gmail.com, tycho@tycho.ws, jsteckli@amazon.de,
ak@linux.intel.com, liran.alon@oracle.com, keescook@google.com,
konrad.wilk@oracle.com, deepa.srinivasan@oracle.com,
chris.hyser@oracle.com, tyhicks@canonical.com, dwmw@amazon.co.uk,
andrew.cooper3@citrix.com, jcm@redhat.com,
boris.ostrovsky@oracle.com, kanth.ghatraju@oracle.com,
joao.m.martins@oracle.com, jmattson@google.com,
pradeep.vincent@oracle.com, john.haxby@oracle.com,
tglx@linutronix.de, kirill.shutemov@linux.intel.com, hch@lst.de,
steven.sistare@oracle.com, labbott@redhat.com, luto@kernel.org,
dave.hansen@intel.com, aaron.lu@intel.com,
akpm@linux-foundation.org, alexander.h.duyck@linux.intel.com,
amir73il@gmail.com, andreyknvl@google.com,
aneesh.kumar@linux.ibm.com, anthony.yznaga@oracle.com,
ard.biesheuvel@linaro.org, arnd@arndb.de, arunks@codeaurora.org,
ben@decadent.org.uk, bigeasy@linutronix.de, bp@alien8.de,
brgl@bgdev.pl, catalin.marinas@arm.com, corbet@lwn.net,
cpandya@codeaurora.org, daniel.vetter@ffwll.ch,
dan.j.williams@intel.com, gregkh@linuxfoundation.org,
guro@fb.com, hannes@cmpxchg.org, hpa@zytor.com,
iamjoonsoo.kim@lge.com, james.morse@arm.com, jannh@google.com,
jgross@suse.com, jkosina@suse.cz, jmorris@namei.org,
joe@perches.com, jrdr.linux@gmail.com, jroedel@suse.de,
keith.busch@intel.com, khlebnikov@yandex-team.ru,
logang@deltatee.com, marco.antonio.780@gmail.com,
mark.rutland@arm.com, mgorman@techsingularity.net,
mhocko@suse.com, mhocko@suse.cz, mike.kravetz@oracle.com,
mingo@redhat.com, mst@redhat.com, m.szyprowski@samsung.com,
npiggin@gmail.com, osalvador@suse.de, paulmck@linux.vnet.ibm.com,
pavel.tatashin@microsoft.com, rdunlap@infradead.org,
richard.weiyang@gmail.com, riel@surriel.com, rientjes@google.com,
robin.murphy@arm.com, rostedt@goodmis.org,
rppt@linux.vnet.ibm.com, sai.praneeth.prakhya@intel.com,
serge@hallyn.com, steve.capper@arm.com, thymovanbeers@gmail.com,
vbabka@suse.cz, will.deacon@arm.com, willy@infradead.org,
yang.shi@linux.alibaba.com, yaojun8558363@gmail.com,
ying.huang@intel.com, zhangshaokun@hisilicon.com,
iommu@lists.linux-foundation.org, x86@kernel.org,
linux-arm-kernel@lists.infradead.org, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
linux-security-module@vger.kernel.org,
Khalid Aziz <khalid@gonehiking.org>
Subject: Re: [RFC PATCH v9 12/13] xpfo, mm: Defer TLB flushes for non-current CPUs (x86 only)
Date: Thu, 4 Apr 2019 10:18:47 +0200 [thread overview]
Message-ID: <20190404081847.GR4038@hirez.programming.kicks-ass.net> (raw)
In-Reply-To: <4495dda4bfc4a06b3312cc4063915b306ecfaecb.1554248002.git.khalid.aziz@oracle.com>
On Wed, Apr 03, 2019 at 11:34:13AM -0600, Khalid Aziz wrote:
> diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
> index 999d6d8f0bef..cc806a01a0eb 100644
> --- a/arch/x86/mm/tlb.c
> +++ b/arch/x86/mm/tlb.c
> @@ -37,6 +37,20 @@
> */
> #define LAST_USER_MM_IBPB 0x1UL
>
> +/*
> + * A TLB flush may be needed to flush stale TLB entries
> + * for pages that have been mapped into userspace and unmapped
> + * from kernel space. This TLB flush needs to be propagated to
> + * all CPUs. Asynchronous flush requests to all CPUs can cause
> + * significant performance impact. Queue a pending flush for
> + * a CPU instead. Multiple such requests can then be handled
> + * by a CPU at a less disruptive time, like context switch, in
> + * one go, reducing the performance impact significantly. The
> + * following data structure is used to keep track of CPUs with
> + * pending full TLB flushes forced by xpfo.
> + */
> +static cpumask_t pending_xpfo_flush;
> +
> /*
> * We get here when we do something requiring a TLB invalidation
> * but could not go invalidate all of the contexts. We do the
> @@ -321,6 +335,16 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
> __flush_tlb_all();
> }
> #endif
> +
> + /*
> + * If there is a pending TLB flush for this CPU due to XPFO
> + * flush, do it now.
> + */
> + if (cpumask_test_and_clear_cpu(cpu, &pending_xpfo_flush)) {
> + count_vm_tlb_event(NR_TLB_REMOTE_FLUSH_RECEIVED);
> + __flush_tlb_all();
> + }
That really should be:

	if (cpumask_test_cpu(cpu, &pending_xpfo_flush)) {
		cpumask_clear_cpu(cpu, &pending_xpfo_flush);
		count_vm_tlb_event(NR_TLB_REMOTE_FLUSH_RECEIVED);
		__flush_tlb_all();
	}
test_and_clear is an unconditional RmW and can cause cacheline
contention between adjacent CPUs even if none of the bits are set.
> +
> this_cpu_write(cpu_tlbstate.is_lazy, false);
>
> /*
> @@ -803,6 +827,34 @@ void flush_tlb_kernel_range(unsigned long start, unsigned long end)
> }
> }
>
> +void xpfo_flush_tlb_kernel_range(unsigned long start, unsigned long end)
> +{
> + struct cpumask tmp_mask;
> +
> + /*
> + * Balance as user space task's flush, a bit conservative.
> + * Do a local flush immediately and post a pending flush on all
> + * other CPUs. Local flush can be a range flush or full flush
> + * depending upon the number of entries to be flushed. Remote
> + * flushes will be done by individual processors at the time of
> + * context switch and this allows multiple flush requests from
> + * other CPUs to be batched together.
> + */
> + if (end == TLB_FLUSH_ALL ||
> + (end - start) > tlb_single_page_flush_ceiling << PAGE_SHIFT) {
> + do_flush_tlb_all(NULL);
> + } else {
> + struct flush_tlb_info info;
> +
> + info.start = start;
> + info.end = end;
> + do_kernel_range_flush(&info);
> + }
> + cpumask_setall(&tmp_mask);
> + __cpumask_clear_cpu(smp_processor_id(), &tmp_mask);
> + cpumask_or(&pending_xpfo_flush, &pending_xpfo_flush, &tmp_mask);
> +}
> +
> void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch)
> {
> struct flush_tlb_info info = {
> diff --git a/arch/x86/mm/xpfo.c b/arch/x86/mm/xpfo.c
> index b42513347865..638eee5b1f09 100644
> --- a/arch/x86/mm/xpfo.c
> +++ b/arch/x86/mm/xpfo.c
> @@ -118,7 +118,7 @@ inline void xpfo_flush_kernel_tlb(struct page *page, int order)
> return;
> }
>
> - flush_tlb_kernel_range(kaddr, kaddr + (1 << order) * size);
> + xpfo_flush_tlb_kernel_range(kaddr, kaddr + (1 << order) * size);
> }
> EXPORT_SYMBOL_GPL(xpfo_flush_kernel_tlb);
So this patch is the one that makes it 'work', but I'm with Andy on
hating it something fierce.
Up until this point x86_64 is completely buggered in this series; after
this it sorta works, but *urgh*, what crap.
All in all your changelog is complete and utter garbage; this is _NOT_ a
performance issue. It is very much a correctness issue.

Also, I distinctly dislike the inconsistent TLB states this generates.
It makes it very hard to argue for its correctness.