From: Peter Xu <peterx@redhat.com>
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>,
	Mike Kravetz <mike.kravetz@oracle.com>,
	peterx@redhat.com, Jerome Glisse <jglisse@redhat.com>,
	"Kirill A . Shutemov" <kirill@shutemov.name>,
	Hugh Dickins <hughd@google.com>,
	Axel Rasmussen <axelrasmussen@google.com>,
	Matthew Wilcox <willy@infradead.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Nadav Amit <nadav.amit@gmail.com>
Subject: [PATCH RFC 00/30] userfaultfd-wp: Support shmem and hugetlbfs
Date: Fri, 15 Jan 2021 12:08:37 -0500
Message-ID: <20210115170907.24498-1-peterx@redhat.com>

This is an RFC series to support userfaultfd write protection (uffd-wp) on
shmem and hugetlbfs.

PS. Note that there's a known TLB issue [0] affecting uffd-wp/soft-dirty in
general, and Nadav is working on it.  It may or may not directly affect
shmem/hugetlbfs, since there is normally no COW on shared mappings.  Private
shmem could hit it, but that's another problem to solve in general, and this
RFC is mainly to see whether there's any objection to the concept of uffd-wp on
shmem/hugetlbfs.

The whole series can also be found online [1].

The major comment I'd like to get is on the new idea of the special swap pte.
That comes from suggestions from both Hugh and Andrea, and I appreciate those
discussions a lot.

In short, it's a new type of pte that didn't exist before.  It is used in
file-backed memory to persist information after a pte is erased (the page cache
could still exist, for example, so on the next page fault we can reload the
page from the page cache with that specific information when necessary).
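
The persistence described above can be pictured with a toy userspace model in
C.  This is only a sketch: every name in it (toy_pte, toy_zap_file_pte,
toy_fault_in) is hypothetical and made up for illustration, it is not the code
in this series, and it assumes the page cache survives the zap:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: none of these names exist in the kernel. */
enum toy_pte_state {
	TOY_PTE_NONE,            /* empty pte, nothing remembered */
	TOY_PTE_PRESENT,         /* page is mapped */
	TOY_PTE_UFFD_WP_SPECIAL, /* no page mapped, but "was wr-protected" is kept */
};

struct toy_pte {
	enum toy_pte_state state;
	bool uffd_wp;  /* only meaningful while TOY_PTE_PRESENT */
	bool writable; /* only meaningful while TOY_PTE_PRESENT */
};

/*
 * Erase a present file-backed pte.  Instead of always leaving an empty pte,
 * leave a marker behind when the pte was uffd write-protected.
 */
static void toy_zap_file_pte(struct toy_pte *pte)
{
	bool keep_wp;

	if (pte->state != TOY_PTE_PRESENT)
		return; /* assumption: non-present ptes are left untouched here */

	keep_wp = pte->uffd_wp;
	pte->state = keep_wp ? TOY_PTE_UFFD_WP_SPECIAL : TOY_PTE_NONE;
	pte->uffd_wp = false;
	pte->writable = false;
}

/* Map the page again from the (assumed still present) page cache. */
static void toy_fault_in(struct toy_pte *pte)
{
	bool restore_wp;

	assert(pte->state != TOY_PTE_PRESENT);

	restore_wp = (pte->state == TOY_PTE_UFFD_WP_SPECIAL);
	pte->state = TOY_PTE_PRESENT;
	pte->uffd_wp = restore_wp;
	pte->writable = !restore_wp; /* the protection survives the zap */
}
```

In this model, zapping a write-protected pte and faulting it back in yields a
pte that is still write-protected, while an unprotected pte simply comes back
writable.  When the marker itself should be dropped is deliberately left out of
the sketch.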

I'm copy-pasting part of the commit message from the patch "mm/swap: Introduce
the idea of special swap ptes", where uffd-wp becomes its first user:

    We already have special swap entries, like migration entries, hw-poison
    entries, device private entries, etc.

    Those "special swap entries" must be swap entries in the first place, and
    their types are decided by swp_type(entry).

    This patch introduces another idea called "special swap ptes".

    It's very easy to confuse them with "special swap entries", but a special
    swap pte should never contain a swap entry at all.  That means it's illegal
    to call pte_to_swp_entry() on a special swap pte.

    Make the uffd-wp special pte the first special swap pte.

    Before this patch, is_swap_pte()==true means one of the below:

       (a.1) The pte has a normal swap entry (non_swap_entry()==false).  For
             example, when an anonymous page got swapped out.

       (a.2) The pte has a special swap entry (non_swap_entry()==true).  For
             example, a migration entry, a hw-poison entry, etc.

    After this patch, is_swap_pte()==true means one of the below, where case (b) is
    added:

     (a) The pte contains a swap entry.

       (a.1) The pte has a normal swap entry (non_swap_entry()==false).  For
             example, when an anonymous page got swapped out.

       (a.2) The pte has a special swap entry (non_swap_entry()==true).  For
             example, a migration entry, a hw-poison entry, etc.

     (b) The pte does not contain a swap entry at all (so it cannot be passed
         into pte_to_swp_entry()).  For example, uffd-wp special swap pte.
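
The case split above can be written down as a small userspace model in C.  This
is a sketch under stated assumptions, not the kernel implementation: the
model_* names, the pte layout and the value of MODEL_FIRST_SPECIAL_SWP_TYPE are
all hypothetical.  The point it illustrates is the ordering: a caller has to
rule out case (b) before it decodes a swap entry.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy userspace model; none of these names are kernel APIs. */
enum model_pte_kind {
	MODEL_PTE_NONE,         /* empty pte */
	MODEL_PTE_PRESENT,      /* maps a page */
	MODEL_PTE_SWAP_ENTRY,   /* case (a): carries a swap entry */
	MODEL_PTE_SWAP_SPECIAL, /* case (b): e.g. the uffd-wp special swap pte */
};

struct model_pte {
	enum model_pte_kind kind;
	unsigned int swp_type; /* valid only for MODEL_PTE_SWAP_ENTRY */
};

/* Hypothetical boundary: swap types at or above it are "special" entries. */
#define MODEL_FIRST_SPECIAL_SWP_TYPE 28u

enum model_case { MODEL_NOT_SWAP, MODEL_CASE_A1, MODEL_CASE_A2, MODEL_CASE_B };

/* True for both case (a) and case (b). */
static bool model_is_swap_pte(struct model_pte pte)
{
	return pte.kind == MODEL_PTE_SWAP_ENTRY ||
	       pte.kind == MODEL_PTE_SWAP_SPECIAL;
}

static bool model_is_swap_special_pte(struct model_pte pte)
{
	return pte.kind == MODEL_PTE_SWAP_SPECIAL;
}

/* Decoding a swap entry out of a special swap pte is illegal. */
static unsigned int model_pte_to_swp_type(struct model_pte pte)
{
	assert(pte.kind == MODEL_PTE_SWAP_ENTRY);
	return pte.swp_type;
}

static bool model_non_swap_entry(unsigned int swp_type)
{
	return swp_type >= MODEL_FIRST_SPECIAL_SWP_TYPE;
}

static enum model_case model_classify(struct model_pte pte)
{
	if (!model_is_swap_pte(pte))
		return MODEL_NOT_SWAP;
	if (model_is_swap_special_pte(pte))
		return MODEL_CASE_B; /* must be checked before decoding */
	return model_non_swap_entry(model_pte_to_swp_type(pte)) ?
	       MODEL_CASE_A2 : MODEL_CASE_A1;
}
```

A pte of kind MODEL_PTE_SWAP_SPECIAL still satisfies model_is_swap_pte(), which
mirrors why existing is_swap_pte() users need to be aware of case (b).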

Hugetlbfs needs a similar thing because it's also file-backed.  I directly
reused the same special pte there, though the shmem and hugetlb changes to
support this new pte are different since they don't share much of the code path.

Patch layout
============

Part (1): some fixes for issues I observed while working on this; feel free to
skip them for now because I think they're corner cases and unrelated to the
major change:

  mm/thp: Simplify copying of huge zero page pmd when fork
  mm/userfaultfd: Fix uffd-wp special cases for fork()
  mm/userfaultfd: Fix a few thp pmd missing uffd-wp bit

Part (2): Shmem support; this is where the special swap pte is introduced.
Some zap rework is needed in the process:

  shmem/userfaultfd: Take care of UFFDIO_COPY_MODE_WP
  mm: Clear vmf->pte after pte_unmap_same() returns
  mm/userfaultfd: Introduce special pte for unmapped file-backed mem
  mm/swap: Introduce the idea of special swap ptes
  shmem/userfaultfd: Handle uffd-wp special pte in page fault handler
  mm: Drop first_index/last_index in zap_details
  mm: Introduce zap_details.zap_flags
  mm: Introduce ZAP_FLAG_SKIP_SWAP
  mm: Pass zap_flags into unmap_mapping_pages()
  shmem/userfaultfd: Persist uffd-wp bit across zapping for file-backed
  shmem/userfaultfd: Allow wr-protect none pte for file-backed mem
  shmem/userfaultfd: Allows file-back mem to be uffd wr-protected on thps
  shmem/userfaultfd: Handle the left-overed special swap ptes
  shmem/userfaultfd: Pass over uffd-wp special swap pte when fork()

Part (3): Hugetlb support.  We need to disable huge pmd sharing for uffd-wp
because the two are not compatible, just like with uffd minor mode.  The rest
is the changes required to teach hugetlbfs to understand the special swap pte
that is introduced with the uffd-wp change:

  hugetlb/userfaultfd: Hook page faults for uffd write protection
  hugetlb/userfaultfd: Take care of UFFDIO_COPY_MODE_WP
  hugetlb/userfaultfd: Handle UFFDIO_WRITEPROTECT
  hugetlb: Pass vma into huge_pte_alloc()
  hugetlb/userfaultfd: Forbid huge pmd sharing when uffd enabled
  mm/hugetlb: Introduce huge version of special swap pte helpers
  mm/hugetlb: Move flush_hugetlb_tlb_range() into hugetlb.h
  hugetlb/userfaultfd: Unshare all pmds for hugetlbfs when register wp
  hugetlb/userfaultfd: Handle uffd-wp special pte in hugetlb pf handler
  hugetlb/userfaultfd: Allow wr-protect none ptes
  hugetlb/userfaultfd: Only drop uffd-wp special pte if required
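
As a rough illustration of the pmd sharing rule in Part (3), here is a minimal
userspace sketch in C.  The toy_* names and flag values are hypothetical and do
not match the kernel's definitions, and the other conditions the kernel checks
before sharing a pmd (such as range alignment) are omitted.  The presumed
reason for the rule is that a shared pmd means several processes use the same
page table page, so per-process uffd-wp state kept in those ptes could not be
tracked separately.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits for a toy vma; not the kernel's VM_* values. */
#define TOY_VM_SHARED     0x1u
#define TOY_VM_UFFD_WP    0x2u
#define TOY_VM_UFFD_MINOR 0x4u

struct toy_vma {
	unsigned int flags;
};

/* Either uffd mode rules out huge pmd sharing in this model. */
static bool toy_uffd_disables_pmd_share(const struct toy_vma *vma)
{
	return (vma->flags & (TOY_VM_UFFD_WP | TOY_VM_UFFD_MINOR)) != 0;
}

/* Share a pmd only for shared mappings with no uffd-wp/minor registered. */
static bool toy_want_pmd_share(const struct toy_vma *vma)
{
	assert(vma != NULL);

	if (toy_uffd_disables_pmd_share(vma))
		return false;
	return (vma->flags & TOY_VM_SHARED) != 0;
}
```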

Part (4): Enable both features in code and test:

  userfaultfd: Enable write protection for shmem & hugetlbfs
  userfaultfd/selftests: Enable uffd-wp for shmem/hugetlbfs

Tests
=====

I've tested it with the userfaultfd kselftest program, and also with umapsort
[2], which should be even stricter.  No complicated mm setup has been tested
yet besides page swapping in/out; in any case we need more tests when it
becomes non-RFC.

If anyone would like to try umapsort, you need to use an extremely hacked
version of the umap library [3], because by default umap only supports
anonymous memory.  So to test it we need to build [3] then [2].

Any comments would be greatly welcomed.  Thanks,

[0] https://lore.kernel.org/lkml/20201225092529.3228466-1-namit@vmware.com/
[1] https://github.com/xzpeter/linux/tree/uffd-wp-shmem-hugetlbfs
[2] https://github.com/LLNL/umap-apps
[3] https://github.com/xzpeter/umap/tree/peter-shmem-hugetlbfs

Peter Xu (30):
  mm/thp: Simplify copying of huge zero page pmd when fork
  mm/userfaultfd: Fix uffd-wp special cases for fork()
  mm/userfaultfd: Fix a few thp pmd missing uffd-wp bit
  shmem/userfaultfd: Take care of UFFDIO_COPY_MODE_WP
  mm: Clear vmf->pte after pte_unmap_same() returns
  mm/userfaultfd: Introduce special pte for unmapped file-backed mem
  mm/swap: Introduce the idea of special swap ptes
  shmem/userfaultfd: Handle uffd-wp special pte in page fault handler
  mm: Drop first_index/last_index in zap_details
  mm: Introduce zap_details.zap_flags
  mm: Introduce ZAP_FLAG_SKIP_SWAP
  mm: Pass zap_flags into unmap_mapping_pages()
  shmem/userfaultfd: Persist uffd-wp bit across zapping for file-backed
  shmem/userfaultfd: Allow wr-protect none pte for file-backed mem
  shmem/userfaultfd: Allows file-back mem to be uffd wr-protected on
    thps
  shmem/userfaultfd: Handle the left-overed special swap ptes
  shmem/userfaultfd: Pass over uffd-wp special swap pte when fork()
  hugetlb/userfaultfd: Hook page faults for uffd write protection
  hugetlb/userfaultfd: Take care of UFFDIO_COPY_MODE_WP
  hugetlb/userfaultfd: Handle UFFDIO_WRITEPROTECT
  hugetlb: Pass vma into huge_pte_alloc()
  hugetlb/userfaultfd: Forbid huge pmd sharing when uffd enabled
  mm/hugetlb: Introduce huge version of special swap pte helpers
  mm/hugetlb: Move flush_hugetlb_tlb_range() into hugetlb.h
  hugetlb/userfaultfd: Unshare all pmds for hugetlbfs when register wp
  hugetlb/userfaultfd: Handle uffd-wp special pte in hugetlb pf handler
  hugetlb/userfaultfd: Allow wr-protect none ptes
  hugetlb/userfaultfd: Only drop uffd-wp special pte if required
  userfaultfd: Enable write protection for shmem & hugetlbfs
  userfaultfd/selftests: Enable uffd-wp for shmem/hugetlbfs

 arch/arm64/mm/hugetlbpage.c              |   5 +-
 arch/ia64/mm/hugetlbpage.c               |   3 +-
 arch/mips/mm/hugetlbpage.c               |   4 +-
 arch/parisc/mm/hugetlbpage.c             |   2 +-
 arch/powerpc/mm/hugetlbpage.c            |   3 +-
 arch/s390/mm/hugetlbpage.c               |   2 +-
 arch/sh/mm/hugetlbpage.c                 |   2 +-
 arch/sparc/mm/hugetlbpage.c              |   2 +-
 arch/x86/include/asm/pgtable.h           |  28 +++
 fs/dax.c                                 |  10 +-
 fs/hugetlbfs/inode.c                     |  15 +-
 fs/proc/task_mmu.c                       |  14 +-
 fs/userfaultfd.c                         |  80 +++++--
 include/asm-generic/hugetlb.h            |  10 +
 include/asm-generic/pgtable_uffd.h       |   3 +
 include/linux/huge_mm.h                  |   3 +-
 include/linux/hugetlb.h                  |  47 +++-
 include/linux/mm.h                       |  50 +++-
 include/linux/mm_inline.h                |  43 ++++
 include/linux/mmu_notifier.h             |   1 +
 include/linux/shmem_fs.h                 |   5 +-
 include/linux/swapops.h                  |  41 +++-
 include/linux/userfaultfd_k.h            |  37 +++
 include/uapi/linux/userfaultfd.h         |   3 +-
 mm/huge_memory.c                         |  36 ++-
 mm/hugetlb.c                             | 174 +++++++++++---
 mm/khugepaged.c                          |  14 +-
 mm/memcontrol.c                          |   2 +-
 mm/memory.c                              | 277 ++++++++++++++++++-----
 mm/migrate.c                             |   2 +-
 mm/mprotect.c                            |  63 +++++-
 mm/mremap.c                              |   2 +-
 mm/page_vma_mapped.c                     |   6 +-
 mm/rmap.c                                |   8 +
 mm/shmem.c                               |  39 +++-
 mm/truncate.c                            |  17 +-
 mm/userfaultfd.c                         |  37 +--
 tools/testing/selftests/vm/userfaultfd.c |  14 +-
 38 files changed, 881 insertions(+), 223 deletions(-)

-- 
2.26.2



