From: Peter Xu <peterx@redhat.com>
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Axel Rasmussen <axelrasmussen@google.com>,
	Nadav Amit <nadav.amit@gmail.com>,
	Mike Rapoport <rppt@linux.vnet.ibm.com>,
	Hugh Dickins <hughd@google.com>,
	Mike Kravetz <mike.kravetz@oracle.com>,
	"Kirill A . Shutemov" <kirill@shutemov.name>,
	Alistair Popple <apopple@nvidia.com>,
	Jerome Glisse <jglisse@redhat.com>,
	Matthew Wilcox <willy@infradead.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	peterx@redhat.com, David Hildenbrand <david@redhat.com>,
	Andrea Arcangeli <aarcange@redhat.com>
Subject: [PATCH v6 00/23] userfaultfd-wp: Support shmem and hugetlbfs
Date: Mon, 15 Nov 2021 15:54:59 +0800
Message-ID: <20211115075522.73795-1-peterx@redhat.com>

This is v6 of the series to add shmem+hugetlbfs support for userfaultfd write
protection.  It is based on v5.16-rc1 (fa55b7dcdc43), with the two patches
below applied first:

  Subject: [PATCH RFC 0/2] mm: Rework zap ptes on swap entries
  https://lore.kernel.org/lkml/20211110082952.19266-1-peterx@redhat.com/

The whole tree can be found here for testing:

  https://github.com/xzpeter/linux/tree/uffd-wp-shmem-hugetlbfs

Previous versions:

  RFC: https://lore.kernel.org/lkml/20210115170907.24498-1-peterx@redhat.com/
  v1:  https://lore.kernel.org/lkml/20210323004912.35132-1-peterx@redhat.com/
  v2:  https://lore.kernel.org/lkml/20210427161317.50682-1-peterx@redhat.com/
  v3:  https://lore.kernel.org/lkml/20210527201927.29586-1-peterx@redhat.com/
  v4:  https://lore.kernel.org/lkml/20210714222117.47648-1-peterx@redhat.com/
  v5:  https://lore.kernel.org/lkml/20210715201422.211004-1-peterx@redhat.com/

Overview
========

This is the first version of this work to rebase the uffd-wp logic onto PTE
markers.  The major logic is the same as in v5, but since there are quite a few
minor changes here and there, I decided not to provide a change log at all, as
it would no longer be helpful.  I should have addressed all the comments raised
by reviewers; please shout if I missed something.  I kept many of Mike's
Reviewed-by tags where there is essentially no change to the patch content (I
touched up quite a few commit messages), but it would be nice if Mike could
still go over the patches even where R-bs are standing.

PTE marker is a new type of swap entry that is only applicable to file-backed
memory like shmem and hugetlbfs.  It is used to persist some pte-level
information even if the original present ptes in the pgtable are zapped.  This
information could be one of:

  (1) Userfaultfd wr-protect information
  (2) PTE soft-dirty information
  (3) Or others

This series only uses the marker to store uffd-wp information across temporary
zappings of shmem/hugetlbfs pgtables, for example, when a shmem thp is split.
So even if ptes are temporarily zapped, the wr-protect information can still be
kept within the pgtables.  Then when the page fault triggers again, we'll know
this pte is wr-protected so we can treat the pte the same as a normal uffd
wr-protected pte.
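
The zap-then-fault behaviour described above can be modelled in a few lines of
userspace C.  This is only a toy sketch under stated assumptions, not kernel
code: struct toy_pte, toy_zap_file_pte() and toy_fault_in() are hypothetical
names made up for illustration, and the model only covers a file-backed
mapping.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a pte slot; all names here are hypothetical. */
enum toy_pte_kind { TOY_PTE_NONE, TOY_PTE_PRESENT, TOY_PTE_MARKER };

struct toy_pte {
    enum toy_pte_kind kind;
    bool uffd_wp;           /* wr-protect bit, valid for PRESENT and MARKER */
};

/*
 * Zap a present file-backed pte.  If it was uffd wr-protected, leave a
 * marker behind so the wr-protect information survives in the pgtable;
 * otherwise the slot simply becomes none.
 */
static inline struct toy_pte toy_zap_file_pte(struct toy_pte pte)
{
    assert(pte.kind == TOY_PTE_PRESENT);
    if (pte.uffd_wp)
        return (struct toy_pte){ TOY_PTE_MARKER, true };
    return (struct toy_pte){ TOY_PTE_NONE, false };
}

/*
 * Service a fault on a non-present slot.  A marker carrying the uffd-wp
 * bit results in a present pte that is wr-protected again; a none pte
 * results in a plain present pte.
 */
static inline struct toy_pte toy_fault_in(struct toy_pte pte)
{
    assert(pte.kind != TOY_PTE_PRESENT);
    bool wp = (pte.kind == TOY_PTE_MARKER) && pte.uffd_wp;
    return (struct toy_pte){ TOY_PTE_PRESENT, wp };
}
```

In this model a wr-protected pte goes PRESENT -> MARKER -> PRESENT and keeps
its uffd_wp bit, while an unprotected pte goes PRESENT -> NONE -> PRESENT.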

The extra information is encoded into the swap entry, or swp_offset to be
explicit, with the swp_type being PTE_MARKER.  So far uffd-wp only uses one bit
out of the swap entry; the remaining bits of swp_offset are still reserved for
other purposes.
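
As a rough illustration of that encoding, the sketch below packs a marker
bitmask into the offset part of a 64-bit swap entry.  The bit layout
(SWP_TYPE_SHIFT), the type number used for SWP_PTE_MARKER, and the helper
names are assumptions chosen for this example only; the real swap entry layout
is arch-specific and is not described in this letter.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout for illustration: top 6 bits are the type, rest is offset. */
#define SWP_TYPE_SHIFT      58u
#define SWP_OFFSET_MASK     ((UINT64_C(1) << SWP_TYPE_SHIFT) - 1)
#define SWP_PTE_MARKER      UINT64_C(30)    /* assumed type number */
#define PTE_MARKER_UFFD_WP  UINT64_C(1)     /* the one bit uffd-wp uses */

typedef struct { uint64_t val; } swp_entry_t;

/* Build a marker entry: type is PTE_MARKER, offset carries the marker bits. */
static inline swp_entry_t make_pte_marker_entry(uint64_t marker)
{
    assert((marker & ~SWP_OFFSET_MASK) == 0);   /* must fit in the offset */
    return (swp_entry_t){ (SWP_PTE_MARKER << SWP_TYPE_SHIFT) | marker };
}

static inline uint64_t swp_type(swp_entry_t e)
{
    return e.val >> SWP_TYPE_SHIFT;
}

static inline uint64_t swp_offset(swp_entry_t e)
{
    return e.val & SWP_OFFSET_MASK;
}

static inline int is_pte_marker_entry(swp_entry_t e)
{
    return swp_type(e) == SWP_PTE_MARKER;
}

/* The marker bits are simply the offset; no PFN or block offset is stored. */
static inline uint64_t pte_marker_get(swp_entry_t e)
{
    return swp_offset(e);
}
```

Only bit 0 of the offset is consumed here, which leaves the other offset bits
free for future marker types.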

There are two config options to enable/disable PTE markers:

  CONFIG_PTE_MARKER
  CONFIG_PTE_MARKER_UFFD_WP

We can set !PTE_MARKER to completely disable all PTE markers, along with
uffd-wp support.  I made two configs so that we can also enable PTE markers for
other purposes while disabling file-backed uffd-wp.  At the end of the current
series I enable CONFIG_PTE_MARKER by default, but that patch is standalone, and
if anyone worries about having it on by default, we can consider turning it off
by dropping that one-liner patch.  So far I don't see a huge risk in doing so,
so I kept that patch.

In most cases, PTE markers should be treated as none ptes.  This is because,
unlike most other swap entry types, PTE markers encode no PFN or block offset
information, only some extra well-defined bits showing the status of the pte.
These bits should only be used as extra data when servicing an upcoming page
fault, and that should be it.

I spent a lot of time auditing all the pte_none() users this time.  It is
indeed a challenge because there are a lot of them, and I hope I didn't miss
any that need to take care of pte markers.  Luckily, I don't think markers
need to be considered in many cases, for example: boot code, arch code
(especially non-x86), kernel-only page handling (e.g. CPA), or device driver
code that deals with pure PFN mappings.

I introduced pte_none_mostly() in this series for the places where we need to
handle pte markers the same as none ptes; the "mostly" is another way of
writing "either a none pte or a pte marker".
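
The semantics of that helper can be sketched with a small userspace model.
The model_* names and the four-state enum are hypothetical and exist only to
show the predicate; they are not the kernel's pte representation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical four-state model of what a pte slot can hold. */
enum model_pte_kind { MODEL_NONE, MODEL_PRESENT, MODEL_SWAP, MODEL_MARKER };

struct model_pte {
    enum model_pte_kind kind;
};

/* Strict check: only a truly empty slot is "none". */
static inline bool model_pte_none(struct model_pte p)
{
    return p.kind == MODEL_NONE;
}

static inline bool model_is_pte_marker(struct model_pte p)
{
    return p.kind == MODEL_MARKER;
}

/* Relaxed check: either a none pte or a pte marker. */
static inline bool model_pte_none_mostly(struct model_pte p)
{
    return model_pte_none(p) || model_is_pte_marker(p);
}

/* Spot checks: a marker is "mostly none" but not "none". */
static inline void model_check_examples(void)
{
    struct model_pte marker = { MODEL_MARKER };
    struct model_pte swap = { MODEL_SWAP };

    assert(!model_pte_none(marker));
    assert(model_pte_none_mostly(marker));
    assert(!model_pte_none_mostly(swap));
}
```

Callers that do not care about markers keep using the strict check unchanged,
and only the call sites that must see through markers switch to the relaxed
one.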

I didn't change pte_none() itself to cover pte markers, for the reasons below:

  - Very few pte_none() callers need to handle pte markers.  E.g., kernel
    pages do not require any knowledge of pte markers.  So we don't pollute
    the major use cases.

  - Unconditionally changing pte_none() semantics could confuse people,
    because pte_none() has existed for so long.

  - Unconditionally changing pte_none() semantics could make pte_none() slower
    even though in many cases pte markers do not exist.

  - There are cases where we'd like to handle pte markers differently from
    pte_none(), so a full replacement is also impossible.  E.g. khugepaged
    should still treat pte markers as normal swap ptes rather than none ptes,
    because pte markers always need a fault-in to merge the marker with a
    valid pte.  Likewise, the smap code needs to parse PTE markers rather than
    treat them as none ptes.

Patch Layout
============

Introducing PTE marker and uffd-wp bit in PTE marker:

  mm: Introduce PTE_MARKER swap entry
  mm: Teach core mm about pte markers
  mm: Check against orig_pte for finish_fault()
  mm/uffd: PTE_MARKER_UFFD_WP

Adding support for shmem uffd-wp:

  mm/shmem: Take care of UFFDIO_COPY_MODE_WP
  mm/shmem: Handle uffd-wp special pte in page fault handler
  mm/shmem: Persist uffd-wp bit across zapping for file-backed
  mm/shmem: Allow uffd wr-protect none pte for file-backed mem
  mm/shmem: Allows file-back mem to be uffd wr-protected on thps
  mm/shmem: Handle uffd-wp during fork()

Adding support for hugetlbfs uffd-wp:

  mm/hugetlb: Introduce huge pte version of uffd-wp helpers
  mm/hugetlb: Hook page faults for uffd write protection
  mm/hugetlb: Take care of UFFDIO_COPY_MODE_WP
  mm/hugetlb: Handle UFFDIO_WRITEPROTECT
  mm/hugetlb: Handle pte markers in page faults
  mm/hugetlb: Allow uffd wr-protect none ptes
  mm/hugetlb: Only drop uffd-wp special pte if required
  mm/hugetlb: Handle uffd-wp during fork()

Misc handling in the rest of mm for file-backed uffd-wp:

  mm/khugepaged: Don't recycle vma pgtable if uffd-wp registered
  mm/pagemap: Recognize uffd-wp bit for shmem/hugetlbfs

Enabling of uffd-wp on file-backed memory:

  mm/uffd: Enable write protection for shmem & hugetlbfs
  mm: Enable PTE markers by default
  selftests/uffd: Enable uffd-wp for shmem/hugetlbfs

Tests
=====

- x86_64
  - Compile tested on:
    - PTE_MARKER && PTE_MARKER_UFFD_WP
    - PTE_MARKER && !PTE_MARKER_UFFD_WP
    - !PTE_MARKER
    - !USERFAULTFD
  - Kernel userfaultfd selftests for shmem/hugetlb/hugetlb_shared
  - Umapsort [1,2] test for shmem/hugetlb, with swap on/off
- aarch64
  - Compile and smoke tested with !PTE_MARKER

[1] https://github.com/xzpeter/umap-apps/tree/peter
[2] https://github.com/xzpeter/umap/tree/peter-shmem-hugetlbfs

Peter Xu (23):
  mm: Introduce PTE_MARKER swap entry
  mm: Teach core mm about pte markers
  mm: Check against orig_pte for finish_fault()
  mm/uffd: PTE_MARKER_UFFD_WP
  mm/shmem: Take care of UFFDIO_COPY_MODE_WP
  mm/shmem: Handle uffd-wp special pte in page fault handler
  mm/shmem: Persist uffd-wp bit across zapping for file-backed
  mm/shmem: Allow uffd wr-protect none pte for file-backed mem
  mm/shmem: Allows file-back mem to be uffd wr-protected on thps
  mm/shmem: Handle uffd-wp during fork()
  mm/hugetlb: Introduce huge pte version of uffd-wp helpers
  mm/hugetlb: Hook page faults for uffd write protection
  mm/hugetlb: Take care of UFFDIO_COPY_MODE_WP
  mm/hugetlb: Handle UFFDIO_WRITEPROTECT
  mm/hugetlb: Handle pte markers in page faults
  mm/hugetlb: Allow uffd wr-protect none ptes
  mm/hugetlb: Only drop uffd-wp special pte if required
  mm/hugetlb: Handle uffd-wp during fork()
  mm/khugepaged: Don't recycle vma pgtable if uffd-wp registered
  mm/pagemap: Recognize uffd-wp bit for shmem/hugetlbfs
  mm/uffd: Enable write protection for shmem & hugetlbfs
  mm: Enable PTE markers by default
  selftests/uffd: Enable uffd-wp for shmem/hugetlbfs

 arch/s390/include/asm/hugetlb.h          |  15 ++
 fs/hugetlbfs/inode.c                     |  15 +-
 fs/proc/task_mmu.c                       |  11 ++
 fs/userfaultfd.c                         |  31 +---
 include/asm-generic/hugetlb.h            |  24 +++
 include/linux/hugetlb.h                  |  27 ++--
 include/linux/mm.h                       |  20 +++
 include/linux/mm_inline.h                |  45 ++++++
 include/linux/shmem_fs.h                 |   4 +-
 include/linux/swap.h                     |  15 +-
 include/linux/swapops.h                  |  79 ++++++++++
 include/linux/userfaultfd_k.h            |  67 +++++++++
 include/uapi/linux/userfaultfd.h         |  10 +-
 mm/Kconfig                               |  16 ++
 mm/filemap.c                             |   5 +
 mm/hmm.c                                 |   2 +-
 mm/hugetlb.c                             | 181 +++++++++++++++++-----
 mm/khugepaged.c                          |  14 +-
 mm/memcontrol.c                          |   8 +-
 mm/memory.c                              | 184 ++++++++++++++++++++---
 mm/mincore.c                             |   3 +-
 mm/mprotect.c                            |  76 +++++++++-
 mm/rmap.c                                |   8 +
 mm/shmem.c                               |   4 +-
 mm/userfaultfd.c                         |  61 +++++---
 tools/testing/selftests/vm/userfaultfd.c |   4 +-
 26 files changed, 798 insertions(+), 131 deletions(-)

-- 
2.32.0

