iommu.lists.linux-foundation.org archive mirror
From: Khalid Aziz <khalid.aziz-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
To: Andy Lutomirski <luto-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Cc: Catalin Marinas <catalin.marinas-5wv7dgnIgG8@public.gmane.org>,
	Will Deacon <will.deacon-5wv7dgnIgG8@public.gmane.org>,
	steven.sistare-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org,
	Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>,
	Tycho Andersen <tycho-E0fblnxP3wo@public.gmane.org>,
	Andi Kleen <ak-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>,
	aneesh.kumar-tEXmvtCZX7AybS5Ee8rs3A@public.gmane.org,
	James Morris <jmorris-gx6/JNMH7DfYtjvyW6yDsg@public.gmane.org>,
	David Rientjes <rientjes-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>,
	anthony.yznaga-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org,
	Rik van Riel <riel-ebMLmSuQjDVBDgjK7y7TUQ@public.gmane.org>,
	Nicholas Piggin <npiggin-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>,
	Mike Rapoport
	<rppt-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>,
	Thomas Gleixner <tglx-hfZtesqFncYOwBW4kG4KsQ@public.gmane.org>,
	Greg KH
	<gregkh-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r@public.gmane.org>,
	Randy Dunlap <rdunlap-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>,
	LKML <linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	Souptick Joarder
	<jrdr.linux-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>,
	Jiri Kosina <jkosina-AlSwsSmVLrQ@public.gmane.org>,
	Joe Perches <joe-6d6DIl74uiNBDgjK7y7TUQ@public.gmane.org>,
	arunks-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org,
	Andrew Morton
	<akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>,
	"Woodhouse, David" <dwmw-vV1OtcyAfmbQXOPxS62xeg@public.gmane.org>,
	Mark Rutland <mark.rutland-5wv7dgnIgG8@public.gmane.org>,
	"open list:DOCUMENTATION" <linux-doc@vger.k>
Subject: Re: [RFC PATCH v9 12/13] xpfo, mm: Defer TLB flushes for non-current CPUs (x86 only)
Date: Thu, 4 Apr 2019 16:55:48 -0600
Message-ID: <91f1dbce-332e-25d1-15f6-0e9cfc8b797b@oracle.com>
In-Reply-To: <CALCETrXMXxnWqN94d83UvGWhkD1BNWiwvH2vsUth1w0T3=0ywQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

On 4/3/19 10:10 PM, Andy Lutomirski wrote:
> On Wed, Apr 3, 2019 at 10:36 AM Khalid Aziz <khalid.aziz-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> wrote:
>>
>> XPFO flushes kernel space TLB entries for pages that are now mapped
>> in userspace on not only the current CPU but also all other CPUs
>> synchronously. Processes on each core allocating pages causes a
>> flood of IPI messages to all other cores to flush TLB entries.
>> Many of these messages are to flush the entire TLB on the core if
>> the number of entries being flushed from local core exceeds
>> tlb_single_page_flush_ceiling. The cost of TLB flush caused by
>> unmapping pages from physmap goes up dramatically on machines with
>> high core count.
>>
>> This patch flushes relevant TLB entries for current process or
>> entire TLB depending upon number of entries for the current CPU
>> and posts a pending TLB flush on all other CPUs when a page is
>> unmapped from kernel space and mapped in userspace. Each core
>> checks the pending TLB flush flag for itself on every context
>> switch, flushes its TLB if the flag is set and clears it.
>> This patch potentially aggregates multiple TLB flushes into one.
>> This has very significant impact especially on machines with large
>> core counts.
> 
> Why is this a reasonable strategy?

Ideally, when pages are unmapped from physmap, all CPUs would be sent an
IPI synchronously to flush the TLB entries for those pages immediately.
This may be ideal from a correctness and consistency point of view, but
it also results in an IPI storm and repeated TLB flushes on all
processors. Any time a page is allocated to userspace, we go through
this, and it is very expensive. On a 96-core server, performance
degradation is 26x!!

When xpfo unmaps a page from physmap only (after mapping the page in
userspace in response to an allocation request from userspace) on one
processor, there is a small window of opportunity for a ret2dir attack
on other cpus until the TLB entry in physmap for the unmapped pages on
other cpus is cleared. Forcing that to happen synchronously is the
expensive part. Multiple such requests can come in over a very short
time across multiple processors, resulting in every cpu asking every
other cpu to flush its TLB just to close this small window of
vulnerability in the kernel. If each request is processed synchronously,
each CPU will do multiple TLB flushes in short order. If we could
consolidate these TLB flush requests instead and do one TLB flush on
each cpu at the time of context switch, we can reduce the performance
impact significantly. This is borne out in practice when measuring
system time for a parallel kernel build on a large server. Without this
optimization, system time on a 96-core server when doing "make -j60 all"
went up 26x. With it, the impact went down to 1.44x.
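
To make the consolidation argument concrete, here is a rough userspace
model of the two strategies. It is only a sketch and not the code from
the patch: the names (struct tlb_sim, sim_unmap_sync, sim_unmap_deferred,
sim_context_switch, SIM_NR_CPUS) are made up for illustration, and it
counts every flush as one unit regardless of whether it would be a range
flush or a full flush.

```c
#include <assert.h>
#include <stdbool.h>

#define SIM_NR_CPUS 96	/* hypothetical core count, matching the test box */

struct tlb_sim {
	bool pending[SIM_NR_CPUS];		/* deferred-flush flag per cpu */
	unsigned long flushes[SIM_NR_CPUS];	/* flushes done by each cpu */
};

/* Synchronous model: each unmap makes every cpu flush right away. */
static void sim_unmap_sync(struct tlb_sim *s, int cpu)
{
	assert(cpu >= 0 && cpu < SIM_NR_CPUS);
	for (int i = 0; i < SIM_NR_CPUS; i++)
		s->flushes[i]++;
}

/* Deferred model: flush locally now, only mark the other cpus. */
static void sim_unmap_deferred(struct tlb_sim *s, int cpu)
{
	assert(cpu >= 0 && cpu < SIM_NR_CPUS);
	s->flushes[cpu]++;
	for (int i = 0; i < SIM_NR_CPUS; i++)
		if (i != cpu)
			s->pending[i] = true;
}

/* At context switch, one flush covers all requests posted so far. */
static void sim_context_switch(struct tlb_sim *s, int cpu)
{
	assert(cpu >= 0 && cpu < SIM_NR_CPUS);
	if (s->pending[cpu]) {
		s->pending[cpu] = false;
		s->flushes[cpu]++;
	}
}

/* Total number of flushes performed across all cpus. */
static unsigned long sim_total_flushes(const struct tlb_sim *s)
{
	unsigned long total = 0;

	for (int i = 0; i < SIM_NR_CPUS; i++)
		total += s->flushes[i];
	return total;
}
```

In this model, N unmaps between context switches cost N * SIM_NR_CPUS
flushes when done synchronously, but at most N local flushes plus one
flush per cpu when deferred. The model says nothing about the window
during which a remote cpu still holds a stale physmap entry.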

The trade-off with this strategy is that the kernel on a cpu is
vulnerable for a short time if the process currently running on that cpu
is the malicious process. Is that an acceptable trade-off?

I am open to other ideas on reducing the performance impact due to xpfo.

> 
>> +void xpfo_flush_tlb_kernel_range(unsigned long start, unsigned long end)
>> +{
>> +       struct cpumask tmp_mask;
>> +
>> +       /*
>> +        * Balance as user space task's flush, a bit conservative.
>> +        * Do a local flush immediately and post a pending flush on all
>> +        * other CPUs. Local flush can be a range flush or full flush
>> +        * depending upon the number of entries to be flushed. Remote
>> +        * flushes will be done by individual processors at the time of
>> +        * context switch and this allows multiple flush requests from
>> +        * other CPUs to be batched together.
>> +        */
> 
> I don't like this function at all.  A core function like this is a
> contract of sorts between the caller and the implementation.  There is
> no such thing as an "xpfo" flush, and this function's behavior isn't
> at all well defined.  For flush_tlb_kernel_range(), I can tell you
> exactly what that function does, and the implementation is either
> correct or incorrect.  With this function, I have no idea what is
> actually required, and I can't possibly tell whether it's correct.
> 
> As far as I can see, xpfo_flush_tlb_kernel_range() actually means
> "flush this range on this CPU right now, and flush it on remote CPUs
> eventually".  It would be valid, but probably silly, to flush locally
> and to never flush at all on remote CPUs.  This makes me wonder what
> the point is.
> 

I would restate that as "flush this range on this cpu right now, and
flush it on remote cpus at the next context switch". A better name for
the routine and a better description is a reasonable change to make.

Thanks,
Khalid
