* [PATCH] mm: improve mprotect(R|W) efficiency on pages referenced once
@ 2020-12-12  5:31 Peter Collingbourne
  2020-12-23 22:34 ` Andrew Morton
  0 siblings, 1 reply; 5+ messages in thread
From: Peter Collingbourne @ 2020-12-12  5:31 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Peter Collingbourne, Kostya Kortchinsky, linux-mm

In the Scudo memory allocator [1] we would like to be able to
detect use-after-free vulnerabilities involving large allocations
by issuing mprotect(PROT_NONE) on the memory region used for the
allocation when it is deallocated. Later on, after the memory
region has been "quarantined" for a sufficient period of time we
would like to be able to use it for another allocation by issuing
mprotect(PROT_READ|PROT_WRITE).

Before this patch, after removing the write protection, any writes
to the memory region would result in page faults and entering
the copy-on-write code path, even in the usual case where the
pages are only referenced by a single PTE, harming performance
unnecessarily. Make it so that any pages in anonymous mappings that
are only referenced by a single PTE are immediately made writable
during the mprotect so that we can avoid the page faults.

This program shows the critical syscall sequence that we intend to
use in the allocator:

  #include <string.h>
  #include <sys/mman.h>

  enum { kSize = 131072 };

  int main(int argc, char **argv) {
    char *addr = (char *)mmap(0, kSize, PROT_READ | PROT_WRITE,
                              MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
    for (int i = 0; i != 100000; ++i) {
      memset(addr, i, kSize);
      mprotect((void *)addr, kSize, PROT_NONE);
      mprotect((void *)addr, kSize, PROT_READ | PROT_WRITE);
    }
  }

The effect of this patch on the above program was measured on a
DragonBoard 845c by taking the median wall-clock execution time of
10 runs.

Before: 3.19s
After:  0.79s

The effect was also measured using one of the microbenchmarks that
we normally use to benchmark the allocator [2], after modifying it
to make the appropriate mprotect calls [3]. With an allocation size
of 131072 bytes to trigger the allocator's "large allocation" code
path the per-iteration time was measured as follows:

Before: 33364ns
After:   6886ns

Signed-off-by: Peter Collingbourne <pcc@google.com>
Link: https://linux-review.googlesource.com/id/I98d75ef90e20330c578871c87494d64b1df3f1b8
Link: [1] https://source.android.com/devices/tech/debug/scudo
Link: [2] https://cs.android.com/android/platform/superproject/+/master:bionic/benchmarks/stdlib_benchmark.cpp;l=53;drc=e8693e78711e8f45ccd2b610e4dbe0b94d551cc9
Link: [3] https://github.com/pcc/llvm-project/commit/scudo-mprotect-secondary
---
 mm/mprotect.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/mm/mprotect.c b/mm/mprotect.c
index 56c02beb6041..6f5313d66d00 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -47,6 +47,8 @@ static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
 	bool prot_numa = cp_flags & MM_CP_PROT_NUMA;
 	bool uffd_wp = cp_flags & MM_CP_UFFD_WP;
 	bool uffd_wp_resolve = cp_flags & MM_CP_UFFD_WP_RESOLVE;
+	bool anon_writable =
+		vma_is_anonymous(vma) && (vma->vm_flags & VM_WRITE);
 
 	/*
 	 * Can be called with only the mmap_lock for reading by
@@ -136,7 +138,11 @@ static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
 					(pte_soft_dirty(ptent) ||
 					 !(vma->vm_flags & VM_SOFTDIRTY))) {
 				ptent = pte_mkwrite(ptent);
+			} else if (anon_writable &&
+				   page_mapcount(pte_page(ptent)) == 1) {
+				ptent = pte_mkwrite(ptent);
 			}
+
 			ptep_modify_prot_commit(vma, addr, pte, oldpte, ptent);
 			pages++;
 		} else if (is_swap_pte(oldpte)) {
-- 
2.29.2.576.ga3fc446d84-goog




* Re: [PATCH] mm: improve mprotect(R|W) efficiency on pages referenced once
  2020-12-12  5:31 [PATCH] mm: improve mprotect(R|W) efficiency on pages referenced once Peter Collingbourne
@ 2020-12-23 22:34 ` Andrew Morton
  2020-12-29  2:09   ` Peter Collingbourne
  0 siblings, 1 reply; 5+ messages in thread
From: Andrew Morton @ 2020-12-23 22:34 UTC (permalink / raw)
  To: Peter Collingbourne; +Cc: Kostya Kortchinsky, linux-mm

On Fri, 11 Dec 2020 21:31:52 -0800 Peter Collingbourne <pcc@google.com> wrote:

> In the Scudo memory allocator [1] we would like to be able to
> detect use-after-free vulnerabilities involving large allocations
> by issuing mprotect(PROT_NONE) on the memory region used for the
> allocation when it is deallocated. Later on, after the memory
> region has been "quarantined" for a sufficient period of time we
> would like to be able to use it for another allocation by issuing
> mprotect(PROT_READ|PROT_WRITE).
> 
> Before this patch, after removing the write protection, any writes
> to the memory region would result in page faults and entering
> the copy-on-write code path, even in the usual case where the
> pages are only referenced by a single PTE, harming performance
> unnecessarily. Make it so that any pages in anonymous mappings that
> are only referenced by a single PTE are immediately made writable
> during the mprotect so that we can avoid the page faults.
> 

I worry that some other application out there does a similar thing, but
only expects to very sparsely write to the area.  It will see a big increase
in mprotect() latency.

Would it be better to implement this as a separate operation, rather
than unconditionally tying it into mprotect()?  Say, a new madvise()
operation?



* Re: [PATCH] mm: improve mprotect(R|W) efficiency on pages referenced once
  2020-12-23 22:34 ` Andrew Morton
@ 2020-12-29  2:09   ` Peter Collingbourne
  2020-12-29 19:25     ` Andrew Morton
  0 siblings, 1 reply; 5+ messages in thread
From: Peter Collingbourne @ 2020-12-29  2:09 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Kostya Kortchinsky, linux-mm

On Wed, Dec 23, 2020 at 2:34 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Fri, 11 Dec 2020 21:31:52 -0800 Peter Collingbourne <pcc@google.com> wrote:
>
> > In the Scudo memory allocator [1] we would like to be able to
> > detect use-after-free vulnerabilities involving large allocations
> > by issuing mprotect(PROT_NONE) on the memory region used for the
> > allocation when it is deallocated. Later on, after the memory
> > region has been "quarantined" for a sufficient period of time we
> > would like to be able to use it for another allocation by issuing
> > mprotect(PROT_READ|PROT_WRITE).
> >
> > Before this patch, after removing the write protection, any writes
> > to the memory region would result in page faults and entering
> > the copy-on-write code path, even in the usual case where the
> > pages are only referenced by a single PTE, harming performance
> > unnecessarily. Make it so that any pages in anonymous mappings that
> > are only referenced by a single PTE are immediately made writable
> > during the mprotect so that we can avoid the page faults.
> >
>
> I worry that some other application out there does a similar thing, but
> only expects to very sparsely write to the area.  It will see a big increase
> in mprotect() latency.
>
> Would it be better to implement this as a separate operation, rather
> than unconditionally tying it into mprotect()?  Say, a new madvise()
> operation?

So the case that you're concerned about would be highlighted by this
program, correct?

#include <string.h>
#include <sys/mman.h>

enum { kSize = 131072 };

int main(int argc, char **argv) {
  char *addr = (char *)mmap(0, kSize, PROT_READ | PROT_WRITE,
                            MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
  memset(addr, 1, kSize);
  for (int i = 0; i != 100000; ++i) {
    mprotect((void *)addr, kSize, PROT_NONE);
    mprotect((void *)addr, kSize, PROT_READ | PROT_WRITE);
  }
}

I measured the real (wall-clock) execution time of this program
before and after the patch (median of 100 runs) on the DragonBoard
and the results were:

Before: 0.325928s
After: 0.365493s

So there's an increase of around 12%. It doesn't seem very large to me
when compared to the roughly 4-5x improvement that we get in the case
where every page is touched.

I also tried this program to measure the impact in the case where a
single page is touched after the mprotect:

#include <string.h>
#include <sys/mman.h>

enum { kSize = 131072 };

int main(int argc, char **argv) {
  char *addr = (char *)mmap(0, kSize, PROT_READ | PROT_WRITE,
                            MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
  memset(addr, 1, kSize);
  for (int i = 0; i != 100000; ++i) {
    memset(addr + (i * 4096) % kSize, i, 4096);
    mprotect((void *)addr, kSize, PROT_NONE);
    mprotect((void *)addr, kSize, PROT_READ | PROT_WRITE);
  }
}

With this program the numbers were:

Before: 0.441516s
After: 0.380251s

So it seems that even with a single page fault the new approach is faster.

I saw similar results if I adjusted the programs to use a larger
mapping size. With kSize = 1048576 I get these numbers for the first
program:

Before: 1.563078s
After: 1.607476s

i.e. around 3%.

And these for the second:

Before: 1.684663s
After: 1.683272s

i.e. about the same.

What I think we may conclude from these results is that for smaller
mappings the advantage of the previous approach, although measurable,
is wiped out by a single page fault. I think we may expect that there
should be at least one access resulting in a page fault (under the
previous approach) after making the pages writable, since the program
presumably made the pages writable for a reason.

For larger mappings we may guesstimate that the new approach wins if
the density of future page faults is > 0.4%. But for the mappings that
are large enough for density to matter (not just the absolute number
of page faults) it doesn't seem like the increase in mprotect latency
would be very large relative to the total mprotect execution time.

So my inclination is that I don't really see a strong need for it to
be toggleable, but if we do, the behavior implemented by this patch
should be the default. Perhaps something like madvise(MADV_COLD) could
be used to get the previous behavior but I wouldn't implement that
straight away.

Peter



* Re: [PATCH] mm: improve mprotect(R|W) efficiency on pages referenced once
  2020-12-29  2:09   ` Peter Collingbourne
@ 2020-12-29 19:25     ` Andrew Morton
  2020-12-30  0:43       ` Peter Collingbourne
  0 siblings, 1 reply; 5+ messages in thread
From: Andrew Morton @ 2020-12-29 19:25 UTC (permalink / raw)
  To: Peter Collingbourne; +Cc: Kostya Kortchinsky, linux-mm

On Mon, 28 Dec 2020 18:09:28 -0800 Peter Collingbourne <pcc@google.com> wrote:

> On Wed, Dec 23, 2020 at 2:34 PM Andrew Morton <akpm@linux-foundation.org> wrote:
> >
> > On Fri, 11 Dec 2020 21:31:52 -0800 Peter Collingbourne <pcc@google.com> wrote:
> >
> > > In the Scudo memory allocator [1] we would like to be able to
> > > detect use-after-free vulnerabilities involving large allocations
> > > by issuing mprotect(PROT_NONE) on the memory region used for the
> > > allocation when it is deallocated. Later on, after the memory
> > > region has been "quarantined" for a sufficient period of time we
> > > would like to be able to use it for another allocation by issuing
> > > mprotect(PROT_READ|PROT_WRITE).
> > >
> > > Before this patch, after removing the write protection, any writes
> > > to the memory region would result in page faults and entering
> > > the copy-on-write code path, even in the usual case where the
> > > pages are only referenced by a single PTE, harming performance
> > > unnecessarily. Make it so that any pages in anonymous mappings that
> > > are only referenced by a single PTE are immediately made writable
> > > during the mprotect so that we can avoid the page faults.
> > >
> >
> > I worry that some other application out there does a similar thing, but
> > only expects to very sparsely write to the area.  It will see a big increase
> > in mprotect() latency.
> >
> > Would it be better to implement this as a separate operation, rather
> > than unconditionally tying it into mprotect()?  Say, a new madvise()
> > operation?
> 
> So the case that you're concerned about would be highlighted by this
> program, correct?
> 
> ...
>
> So it seems that even with a single page fault the new approach is faster.
> 

Cool, thanks.  Can you please roll this new info into the changelog and
resend?



* Re: [PATCH] mm: improve mprotect(R|W) efficiency on pages referenced once
  2020-12-29 19:25     ` Andrew Morton
@ 2020-12-30  0:43       ` Peter Collingbourne
  0 siblings, 0 replies; 5+ messages in thread
From: Peter Collingbourne @ 2020-12-30  0:43 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Kostya Kortchinsky, linux-mm

On Tue, Dec 29, 2020 at 11:25 AM Andrew Morton
<akpm@linux-foundation.org> wrote:
>
> On Mon, 28 Dec 2020 18:09:28 -0800 Peter Collingbourne <pcc@google.com> wrote:
>
> > On Wed, Dec 23, 2020 at 2:34 PM Andrew Morton <akpm@linux-foundation.org> wrote:
> > >
> > > On Fri, 11 Dec 2020 21:31:52 -0800 Peter Collingbourne <pcc@google.com> wrote:
> > >
> > > > In the Scudo memory allocator [1] we would like to be able to
> > > > detect use-after-free vulnerabilities involving large allocations
> > > > by issuing mprotect(PROT_NONE) on the memory region used for the
> > > > allocation when it is deallocated. Later on, after the memory
> > > > region has been "quarantined" for a sufficient period of time we
> > > > would like to be able to use it for another allocation by issuing
> > > > mprotect(PROT_READ|PROT_WRITE).
> > > >
> > > > Before this patch, after removing the write protection, any writes
> > > > to the memory region would result in page faults and entering
> > > > the copy-on-write code path, even in the usual case where the
> > > > pages are only referenced by a single PTE, harming performance
> > > > unnecessarily. Make it so that any pages in anonymous mappings that
> > > > are only referenced by a single PTE are immediately made writable
> > > > during the mprotect so that we can avoid the page faults.
> > > >
> > >
> > > I worry that some other application out there does a similar thing, but
> > > only expects to very sparsely write to the area.  It will see a big increase
> > > in mprotect() latency.
> > >
> > > Would it be better to implement this as a separate operation, rather
> > > than unconditionally tying it into mprotect()?  Say, a new madvise()
> > > operation?
> >
> > So the case that you're concerned about would be highlighted by this
> > program, correct?
> >
> > ...
> >
> > So it seems that even with a single page fault the new approach is faster.
> >
>
> Cool, thanks.  Can you please roll this new info into the changelog and
> resend?

Sent a v2 with the new info added to the commit message.

Peter


