From: Nadav Amit <nadav.amit@gmail.com>
To: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>,
	Nadav Amit <namit@vmware.com>
Subject: [RESEND PATCH v3 0/5] mm/mprotect: avoid unnecessary TLB flushes
Date: Fri, 11 Mar 2022 11:07:44 -0800
Message-ID: <20220311190749.338281-1-namit@vmware.com>

From: Nadav Amit <namit@vmware.com>

This patch-set is intended to remove unnecessary TLB flushes during
mprotect() syscalls. Once this patch-set makes it through, similar
and further optimizations for MADV_COLD and userfaultfd would be
possible.

Sorry for the time it took me to get to v3.

Basically, there are 3 optimizations in this patch-set:
1. Use TLB batching infrastructure to batch flushes across VMAs and
   do better/fewer flushes. This would also be handy for later
   userfaultfd enhancements.
2. Avoid TLB flushes on permission promotion. This optimization is
   the one that provides most of the performance benefits. Note that
   the batching infrastructure changes from (1) are needed for that to
   happen.
3. Avoid TLB flushes on change_huge_pmd() that are only needed to
   prevent the A/D bits from changing.

Andrew asked for some benchmark numbers. I do not have a deterministic
macrobenchmark in which it is easy to show the benefit. I therefore
ran a microbenchmark: a loop that does the following on anonymous
memory, just as a sanity check to see that time is saved by avoiding TLB
flushes. The loop goes:

	mprotect(p, PAGE_SIZE, PROT_READ)
	mprotect(p, PAGE_SIZE, PROT_READ|PROT_WRITE)
	*p = 0; // make the page writable

The test was run in a KVM guest with 1 or 2 threads (the second thread
was busy-looping). I measured the time (cycles) of each operation:

		1 thread		2 threads
		mmots	+patch		mmots	+patch
PROT_READ	3494	2725 (-22%)	8630	7788 (-10%)
PROT_READ|WRITE	3952	2724 (-31%)	9075	2865 (-68%)

[ mmots = v5.17-rc6-mmots-2022-03-06-20-38 ]

The exact numbers are really meaningless, but the benefit is clear.
There are 2 interesting results though.

(1) PROT_READ is cheaper, while one could expect it not to be affected.
This is presumably due to a TLB miss that is saved.

(2) Without memory access (*p = 0), the speedup of the patch is even
greater. In that scenario mprotect(PROT_READ) also avoids the TLB flush.
As a result both operations on the patched kernel take roughly 1500
cycles (with either 1 or 2 threads), whereas on mmots their cost is as
high as presented in the table.

--

v2 -> v3:
* Fix order of patches (the previous order could lead to breakage)
* Better comments
* Clearer KNL detection [Dave]
* Assertion on PF error-code [Dave]
* Comments, code, function names improvements [PeterZ]
* Flush on access-bit clearing on PMD changes to follow the way
  flushing on x86 is done today in the kernel.

v1 -> v2:
* Wrong detection of permission demotion [Andrea]
* Better comments [Andrea]
* Handle THP [Andrea]
* Batching across VMAs [Peter Xu]
* Avoid open-coding PTE analysis
* Fix wrong use of the mmu_gather()


Nadav Amit (5):
  x86: Detection of Knights Landing A/D leak
  x86/mm: check exec permissions on fault
  mm/mprotect: use mmu_gather
  mm/mprotect: do not flush on permission promotion
  mm: avoid unnecessary flush on change_huge_pmd()

 arch/x86/include/asm/cpufeatures.h   |  1 +
 arch/x86/include/asm/pgtable.h       |  5 ++
 arch/x86/include/asm/pgtable_types.h |  2 +
 arch/x86/include/asm/tlbflush.h      | 82 ++++++++++++++++++++++++
 arch/x86/kernel/cpu/intel.c          |  5 ++
 arch/x86/mm/fault.c                  | 22 ++++++-
 arch/x86/mm/pgtable.c                | 10 +++
 fs/exec.c                            |  6 +-
 include/asm-generic/tlb.h            | 14 +++++
 include/linux/huge_mm.h              |  5 +-
 include/linux/mm.h                   |  5 +-
 include/linux/pgtable.h              | 20 ++++++
 mm/huge_memory.c                     | 19 ++++--
 mm/mprotect.c                        | 94 +++++++++++++++-------------
 mm/pgtable-generic.c                 |  8 +++
 mm/userfaultfd.c                     |  6 +-
 16 files changed, 248 insertions(+), 56 deletions(-)

-- 
2.25.1

