From: Jon Masters <jcm@redhat.com>
To: Khalid Aziz <khalid.aziz@oracle.com>
Cc: juergh@gmail.com, tycho@tycho.ws, jsteckli@amazon.de,
	ak@linux.intel.com, liran.alon@oracle.com, keescook@google.com,
	akpm@linux-foundation.org, konrad.wilk@oracle.com,
	deepa.srinivasan@oracle.com, chris.hyser@oracle.com,
	tyhicks@canonical.com, dwmw@amazon.co.uk,
	andrew.cooper3@citrix.com, boris.ostrovsky@oracle.com,
	kanth.ghatraju@oracle.com, joao.m.martins@oracle.com,
	jmattson@google.com, pradeep.vincent@oracle.com,
	john.haxby@oracle.com, tglx@linutronix.de,
	kirill.shutemov@linux.intel.com, hch@lst.de,
	steven.sistare@oracle.com, labbott@redhat.com, luto@kernel.org,
	dave.hansen@intel.com, peterz@infradead.org, aaron.lu@intel.com,
	alexander.h.duyck@linux.intel.com, amir73il@gmail.com,
	andreyknvl@google.com, aneesh.kumar@linux.ibm.com,
	anthony.yznaga@oracle.com, ard.biesheuvel@linaro.org,
	arnd@arndb.de, arunks@codeaurora.org, ben@decadent.org.uk,
	bigeasy@linutronix.de, bp@alien8.de, brgl@bgdev.pl,
	catalin.marinas@arm.com, corbet@lwn.net, cpandya@codeaurora.org,
	daniel.vetter@ffwll.ch, dan.j.williams@intel.com,
	gregkh@linuxfoundation.org, guro@fb.com, hannes@cmpxchg.org,
	hpa@zytor.com, iamjoonsoo.kim@lge.com, james.morse@arm.com,
	jannh@google.com, jgross@suse.com, jkosina@suse.cz,
	jmorris@namei.org, joe@perches.com, jrdr.linux@gmail.com,
	jroedel@suse.de, keith.busch@intel.com,
	khlebnikov@yandex-team.ru, logang@deltatee.com,
	marco.antonio.780@gmail.com, mark.rutland@arm.com,
	mgorman@techsingularity.net, mhocko@suse.com, mhocko@suse.cz,
	mike.kravetz@oracle.com, mingo@redhat.com, mst@redhat.com,
	m.szyprowski@samsung.com, npiggin@gmail.com, osalvador@suse.de,
	paulmck@linux.vnet.ibm.com, pavel.tatashin@microsoft.com,
	rdunlap@infradead.org, richard.weiyang@gmail.com,
	riel@surriel.com, rientjes@google.com, robin.murphy@arm.com,
	rostedt@goodmis.org, rppt@linux.vnet.ibm.com,
	sai.praneeth.prakhya@intel.com, serge@hallyn.com,
	steve.capper@arm.com, thymovanbeers@gmail.com, vbabka@suse.cz,
	will.deacon@arm.com, willy@infradead.org,
	yang.shi@linux.alibaba.com, yaojun8558363@gmail.com,
	ying.huang@intel.com, zhangshaokun@hisilicon.com,
	khalid@gonehiking.org, iommu@lists.linux-foundation.org,
	x86@kernel.org, linux-arm-kernel@lists.infradead.org,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, linux-security-module@vger.kernel.org
Subject: Re: [RFC PATCH v9 00/13] Add support for eXclusive Page Frame Ownership
Date: Sat, 6 Apr 2019 02:40:39 -0400 (EDT)
Message-ID: <80680B91-4EB8-4F23-B8CE-0156BC2C7DCA@redhat.com>
In-Reply-To: <cover.1554248001.git.khalid.aziz@oracle.com>

Khalid,

Thanks for these patches. We will test them on x86 and investigate the Arm pieces highlighted.

Jon.

-- 
Computer Architect


> On Apr 4, 2019, at 00:37, Khalid Aziz <khalid.aziz@oracle.com> wrote:
> 
> This is another update to the work Juerg, Tycho and Julian have
> done on XPFO. After the last round of updates, we were seeing very
> significant performance penalties when stale TLB entries were
> flushed actively after an XPFO TLB update.  Benchmark for measuring
> performance is kernel build using parallel make. To get full
> protection from ret2dir attacks, we must flush stale TLB entries.
> Performance penalty from flushing stale TLB entries goes up as the
> number of cores goes up. On a desktop class machine with only 4
> cores, enabling TLB flush for stale entries causes system time for
> "make -j4" to go up by a factor of 2.61x but on a larger machine
> with 96 cores, system time with "make -j60" goes up by a factor of
> 26.37x!  I have been working on reducing this performance penalty.
> 
> I implemented two solutions to reduce performance penalty and that
> has had large impact. XPFO code flushes TLB every time a page is
> allocated to userspace. It does so by sending IPIs to all processors
> to flush TLB. Back to back allocations of pages to userspace on
> multiple processors results in a storm of IPIs.  Each one of these
> incoming IPIs is handled by a processor by flushing its TLB. To
> reduce this IPI storm, I have added a per CPU flag that can be set
> to tell a processor to flush its TLB. A processor checks this flag
> on every context switch. If the flag is set, it flushes its TLB and
> clears the flag. This allows for multiple TLB flush requests to a
> single CPU to be combined into a single request. A kernel TLB entry
> for a page that has been allocated to userspace is flushed on all
> processors unlike the previous version of this patch. A processor
> could hold a stale kernel TLB entry that was removed on another
> processor until the next context switch. A local userspace page
> allocation by the currently running process could force the TLB
> flush earlier for such entries.
> 
> The other solution reduces the number of TLB flushes required, by
> performing TLB flush for multiple pages at one time when pages are
> refilled on the per-cpu freelist. If the pages being added to
> per-cpu freelist are marked for userspace allocation, TLB entries
> for these pages can be flushed upfront and pages tagged as currently
> unmapped. When any such page is allocated to userspace, there is no
> need to perform a TLB flush at that time any more. This batching of
> TLB flushes reduces performance impact further. Similarly when
> these user pages are freed by userspace and added back to per-cpu
> free list, they are left unmapped and tagged so. This further
> optimization reduced performance impact from 1.32x to 1.28x for
> 96-core server and from 1.31x to 1.27x for a 4-core desktop.
> 
> I measured system time for parallel make with unmodified 4.20
> kernel, 4.20 with XPFO patches before these patches and then again
> after applying each of these patches. Here are the results:
> 
> Hardware: 96-core Intel Xeon Platinum 8160 CPU @ 2.10GHz, 768 GB RAM
> make -j60 all
> 
> 5.0                      913.862s
> 5.0+this patch series    1165.259s    1.28x
> 
> 
> Hardware: 4-core Intel Core i5-3550 CPU @ 3.30GHz, 8G RAM
> make -j4 all
> 
> 5.0                      610.642s
> 5.0+this patch series    773.075s     1.27x
> 
> Performance with this patch set is good enough to use these as
> starting point for further refinement before we merge it into main
> kernel, hence RFC.
> 
> I have restructured the patches in this version to separate out
> architecture independent code. I folded much of the code
> improvement by Julian to not use page extension into patch 3.
> 
> What remains to be done beyond this patch series:
> 
> 1. Performance improvements: Ideas to explore - (1) kernel mappings
>   private to an mm, (2) Any others??
> 2. Re-evaluate the patch "arm64/mm: Add support for XPFO to swiotlb"
>   from Juerg. I dropped it for now since swiotlb code for ARM has
>   changed a lot since this patch was written. I could use help
>   from ARM experts on this.
> 3. Extend the patch "xpfo, mm: Defer TLB flushes for non-current
>   CPUs" to other architectures besides x86.
> 4. Change kmap to not map the page back to physmap, instead map it
>   to a new va similar to what kmap_high does. Mapping page back
>   into physmap re-opens the ret2dir security hole for the duration of
>   kmap. All of the kmap_high and related code can be reused for this
>   but that will require restructuring that code so it can be built for
>   64-bits as well. Any objections to that?
> 
> ---------------------------------------------------------
> 
> Juerg Haefliger (6):
>  mm: Add support for eXclusive Page Frame Ownership (XPFO)
>  xpfo, x86: Add support for XPFO for x86-64
>  lkdtm: Add test for XPFO
>  arm64/mm: Add support for XPFO
>  swiotlb: Map the buffer if it was unmapped by XPFO
>  arm64/mm, xpfo: temporarily map dcache regions
> 
> Julian Stecklina (1):
>  xpfo, mm: optimize spinlock usage in xpfo_kunmap
> 
> Khalid Aziz (2):
>  xpfo, mm: Defer TLB flushes for non-current CPUs (x86 only)
>  xpfo, mm: Optimize XPFO TLB flushes by batching them together
> 
> Tycho Andersen (4):
>  mm: add MAP_HUGETLB support to vm_mmap
>  x86: always set IF before oopsing from page fault
>  mm: add a user_virt_to_phys symbol
>  xpfo: add primitives for mapping underlying memory
> 
> .../admin-guide/kernel-parameters.txt         |   6 +
> arch/arm64/Kconfig                            |   1 +
> arch/arm64/mm/Makefile                        |   2 +
> arch/arm64/mm/flush.c                         |   7 +
> arch/arm64/mm/mmu.c                           |   2 +-
> arch/arm64/mm/xpfo.c                          |  66 ++++++
> arch/x86/Kconfig                              |   1 +
> arch/x86/include/asm/pgtable.h                |  26 +++
> arch/x86/include/asm/tlbflush.h               |   1 +
> arch/x86/mm/Makefile                          |   2 +
> arch/x86/mm/fault.c                           |   6 +
> arch/x86/mm/pageattr.c                        |  32 +--
> arch/x86/mm/tlb.c                             |  39 ++++
> arch/x86/mm/xpfo.c                            | 185 +++++++++++++++++
> drivers/misc/lkdtm/Makefile                   |   1 +
> drivers/misc/lkdtm/core.c                     |   3 +
> drivers/misc/lkdtm/lkdtm.h                    |   5 +
> drivers/misc/lkdtm/xpfo.c                     | 196 ++++++++++++++++++
> include/linux/highmem.h                       |  34 +--
> include/linux/mm.h                            |   2 +
> include/linux/mm_types.h                      |   8 +
> include/linux/page-flags.h                    |  23 +-
> include/linux/xpfo.h                          | 191 +++++++++++++++++
> include/trace/events/mmflags.h                |  10 +-
> kernel/dma/swiotlb.c                          |   3 +-
> mm/Makefile                                   |   1 +
> mm/compaction.c                               |   2 +-
> mm/internal.h                                 |   2 +-
> mm/mmap.c                                     |  19 +-
> mm/page_alloc.c                               |  19 +-
> mm/page_isolation.c                           |   2 +-
> mm/util.c                                     |  32 +++
> mm/xpfo.c                                     | 170 +++++++++++++++
> security/Kconfig                              |  27 +++
> 34 files changed, 1047 insertions(+), 79 deletions(-)
> create mode 100644 arch/arm64/mm/xpfo.c
> create mode 100644 arch/x86/mm/xpfo.c
> create mode 100644 drivers/misc/lkdtm/xpfo.c
> create mode 100644 include/linux/xpfo.h
> create mode 100644 mm/xpfo.c
> 
> -- 
> 2.17.1
> 
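
The deferred-flush scheme described in the quoted cover letter can be
illustrated with a toy model. This is only a sketch under stated
assumptions, not the code from patch 12: the names Cpu, flush_pending,
allocate_user_page and simulate are hypothetical, and the model assumes
that the allocating CPU flushes its own TLB immediately while every other
CPU only has its per-CPU flag set and flushes at its next context switch.

```python
class Cpu:
    """Toy CPU with a per-CPU 'flush your TLB' flag (hypothetical model)."""

    def __init__(self, cpu_id):
        self.cpu_id = cpu_id
        self.flush_pending = False  # set by other CPUs instead of sending an IPI
        self.flush_count = 0        # number of full TLB flushes performed

    def flush_tlb(self):
        self.flush_count += 1
        self.flush_pending = False

    def context_switch(self):
        # The flag is checked on every context switch; many requests
        # collapse into a single flush here.
        if self.flush_pending:
            self.flush_tlb()


def allocate_user_page(cpus, current):
    """Model one page being handed to userspace on CPU `current`."""
    for cpu in cpus:
        if cpu is current:
            cpu.flush_tlb()           # assumption: local flush is immediate
        else:
            cpu.flush_pending = True  # remote flush is deferred, no IPI


def simulate(n_cpus, n_allocs):
    """Run n_allocs allocations on CPU 0, then one context switch per CPU.

    Returns the number of TLB flushes each CPU performed.
    """
    cpus = [Cpu(i) for i in range(n_cpus)]
    for _ in range(n_allocs):
        allocate_user_page(cpus, cpus[0])
    for cpu in cpus:
        cpu.context_switch()
    return [cpu.flush_count for cpu in cpus]


if __name__ == "__main__":
    counts = simulate(4, 10)
    print("flushes per CPU:", counts, "total:", sum(counts))
```

In this model, 10 back-to-back allocations on one of 4 CPUs cost 13
flushes in total (10 local, 1 on each remote CPU), where flushing every
CPU on every allocation would cost 40. The price is the window mentioned
in the cover letter: a remote CPU keeps its stale entry until its next
context switch.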

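
The slowdown factors in the quoted results follow directly from the
system times. A minimal check, using only the four timings given in the
cover letter (the helper name slowdown is hypothetical):

```python
def slowdown(baseline_s, patched_s):
    """Ratio of patched system time to baseline system time."""
    return patched_s / baseline_s


# System times in seconds, copied from the quoted results.
RESULTS = {
    "96-core Xeon, make -j60": (913.862, 1165.259),
    "4-core i5, make -j4": (610.642, 773.075),
}

if __name__ == "__main__":
    for machine, (baseline, patched) in RESULTS.items():
        print(f"{machine}: {slowdown(baseline, patched):.2f}x")
```

Rounded to two decimals, the ratios match the 1.28x and 1.27x figures
reported in the tables.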
