From: Suren Baghdasaryan <surenb@google.com>
To: akpm@linux-foundation.org
Cc: viro@zeniv.linux.org.uk, brauner@kernel.org, shuah@kernel.org,
	aarcange@redhat.com, lokeshgidra@google.com, peterx@redhat.com,
	david@redhat.com, hughd@google.com, mhocko@suse.com,
	axelrasmussen@google.com, rppt@kernel.org, willy@infradead.org,
	Liam.Howlett@oracle.com, jannh@google.com,
	zhangpeng362@huawei.com, bgeffon@google.com,
	kaleshsingh@google.com, ngeoffray@google.com, jdduke@google.com,
	surenb@google.com, linux-mm@kvack.org,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-kselftest@vger.kernel.org, kernel-team@android.com
Subject: [PATCH v2 0/3] userfaultfd remap option
Date: Fri, 22 Sep 2023 18:31:43 -0700
Message-ID: <20230923013148.1390521-1-surenb@google.com>

This patch series introduces the UFFDIO_REMAP feature to userfaultfd. It
has long been implemented and maintained by Andrea in his local tree [1],
but was not upstreamed due to a lack of use cases where this approach would
be better than allocating a new page and copying the contents.

UFFDIO_COPY performs ~20% better than UFFDIO_REMAP when the application
needs pages to be allocated [2]. However, with UFFDIO_REMAP, if pages are
available (in userspace) for recycling, as is usually the case in heap
compaction algorithms, then we can avoid the page allocation and memcpy
(done by UFFDIO_COPY). Also, since the pages are recycled in the
userspace, we avoid the need to release (via madvise) the pages back to
the kernel [3].
We see over 40% reduction (on a Google Pixel 6 device) in the compacting
thread’s completion time by using UFFDIO_REMAP vs. UFFDIO_COPY. This was
measured using a benchmark that emulates a heap compaction implementation
using userfaultfd (to allow concurrent accesses by application threads).
More details of the use case are explained in [3].
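
For reference, the allocate-and-copy baseline discussed above can be
sketched with the existing UFFDIO_COPY ioctl. This is only an illustration
under assumptions: the function name uffd_copy_demo() and its return
convention are made up for this example, and it fills a registered page
proactively instead of from a fault-handling thread. UFFDIO_REMAP itself is
not shown, because its uABI is defined by patch 2/3 and not by this cover
letter.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef UFFD_USER_MODE_ONLY
#define UFFD_USER_MODE_ONLY 1 /* uapi value; only present in newer headers */
#endif

/*
 * Hypothetical helper, not part of the series.
 * Returns 1 if UFFDIO_COPY populated the page and the contents match,
 * 0 if userfaultfd cannot be opened here (no permission, sandbox, ...),
 * -1 on any other failure.
 */
int uffd_copy_demo(void)
{
	long page = sysconf(_SC_PAGESIZE);
	char *dst = MAP_FAILED, *src = MAP_FAILED;
	int ret = -1;

	/* Try the unprivileged-friendly flag first, then without it. */
	int uffd = (int)syscall(SYS_userfaultfd, O_CLOEXEC | UFFD_USER_MODE_ONLY);
	if (uffd < 0)
		uffd = (int)syscall(SYS_userfaultfd, O_CLOEXEC);
	if (uffd < 0)
		return 0;

	do {
		struct uffdio_api api = { .api = UFFD_API };
		if (ioctl(uffd, UFFDIO_API, &api) < 0)
			break;

		dst = mmap(NULL, page, PROT_READ | PROT_WRITE,
			   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		src = mmap(NULL, page, PROT_READ | PROT_WRITE,
			   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		if (dst == MAP_FAILED || src == MAP_FAILED)
			break;

		/* The source buffer is what the kernel will memcpy from. */
		memset(src, 0x5a, page);

		/* dst is never touched, so its page is still missing. */
		struct uffdio_register reg = {
			.range = { .start = (unsigned long)dst, .len = page },
			.mode = UFFDIO_REGISTER_MODE_MISSING,
		};
		if (ioctl(uffd, UFFDIO_REGISTER, &reg) < 0)
			break;

		/* The kernel allocates a new page at dst and copies src into it. */
		struct uffdio_copy copy = {
			.dst = (unsigned long)dst,
			.src = (unsigned long)src,
			.len = page,
			.mode = 0,
		};
		if (ioctl(uffd, UFFDIO_COPY, &copy) < 0 || copy.copy != page)
			break;

		ret = memcmp(dst, src, page) == 0 ? 1 : -1;
	} while (0);

	if (dst != MAP_FAILED)
		munmap(dst, page);
	if (src != MAP_FAILED)
		munmap(src, page);
	close(uffd);
	return ret;
}
```

After the ioctl returns, the source page still belongs to the caller, who
has to reuse it or release it. That allocation, the memcpy and the later
release are the costs the text above attributes to UFFDIO_COPY.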

Furthermore, UFFDIO_REMAP enables remapping swapped-out pages within the
same vma without touching them. Today this can only be done with mremap,
which forces splitting the vma.
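
A minimal sketch of the mremap alternative mentioned above, assuming a
three-page anonymous mapping from which only the middle page is moved. The
function name mremap_move_demo() is hypothetical. The page here is resident
and was written to, so the sketch shows the address-space effect only, not
the swapped-out case.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper for illustration.
 * Returns 0 if the page contents arrived at the destination, -1 otherwise.
 */
int mremap_move_demo(void)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	int ret = -1;

	/* Three-page source mapping. */
	char *src = mmap(NULL, 3 * page, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (src == MAP_FAILED)
		return -1;

	/* Reserve a destination address; MREMAP_FIXED replaces this mapping. */
	char *dst = mmap(NULL, page, PROT_NONE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (dst == MAP_FAILED) {
		munmap(src, 3 * page);
		return -1;
	}

	/* Give the middle page recognizable contents. */
	memset(src + page, 0x5a, page);

	/*
	 * Move only the middle page. It is unmapped from the source range,
	 * which leaves a hole there, so the source vma ends up split in two.
	 */
	void *moved = mremap(src + page, page, page,
			     MREMAP_MAYMOVE | MREMAP_FIXED, dst);
	if (moved == (void *)dst) {
		ret = 0;
		for (size_t i = 0; i < page; i++) {
			if (dst[i] != 0x5a) {
				ret = -1;
				break;
			}
		}
	}

	/* munmap tolerates the hole left in the source range. */
	munmap(src, 3 * page);
	munmap(dst, page);
	return ret;
}
```

No data is copied: the existing page is moved into the new address range.
The price is the change to the vma layout on the source side.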

Main changes since Andrea's last version [1]:
- Trivial translations from page to folio, mmap_sem to mmap_lock
- Replace pmd_trans_unstable() with pte_offset_map_nolock() and handle its
possible failure
- Move pte mapping into remap_pages_pte to allow for retries when source
page or anon_vma is contended. Since pte_offset_map_nolock() starts an RCU
read section, we can't block anymore after mapping a pte, so we have to
unmap the ptes, do the locking and retry.
- Add and use anon_vma_trylock_write() to avoid blocking while in RCU
read section.
- Accommodate changes in mmu_notifier_range_init() API, switch to
mmu_notifier_invalidate_range_start_nonblock() to avoid blocking while in
RCU read section.
- Open-code the now-removed __swp_swapcount()
- Replace pmd_read_atomic() with pmdp_get_lockless()
- Add new selftest for UFFDIO_REMAP

Changes since v1 [4]:
- add mmget_not_zero in userfaultfd_remap, per Jann Horn
- removed extern from function definitions, per Matthew Wilcox
- converted to folios in remap_pages_huge_pmd, per Matthew Wilcox
- use PageAnonExclusive in remap_pages_huge_pmd, per David Hildenbrand
- handle pgtable transfers between MMs, per Jann Horn
- ignore concurrent A/D pte bit changes, per Jann Horn
- split functions into smaller units, per David Hildenbrand
- test for folio_test_large in remap_anon_pte, per Matthew Wilcox
- use pte_swp_exclusive for swapcount check, per David Hildenbrand
- eliminated use of mmu_notifier_invalidate_range_start_nonblock,
per Jann Horn
- simplified THP alignment checks, per Jann Horn
- refactored the loop inside remap_pages, per Jann Horn
- additional clarifying comments, per Jann Horn

[1] https://gitlab.com/aarcange/aa/-/commit/2aec7aea56b10438a3881a20a411aa4b1fc19e92
[2] https://lore.kernel.org/all/1425575884-2574-1-git-send-email-aarcange@redhat.com/
[3] https://lore.kernel.org/linux-mm/CA+EESO4uO84SSnBhArH4HvLNhaUQ5nZKNKXqxRCyjniNVjp0Aw@mail.gmail.com/
[4] https://lore.kernel.org/all/20230914152620.2743033-1-surenb@google.com/

Andrea Arcangeli (2):
  userfaultfd: UFFDIO_REMAP: rmap preparation
  userfaultfd: UFFDIO_REMAP uABI

Suren Baghdasaryan (1):
  selftests/mm: add UFFDIO_REMAP ioctl test

 fs/userfaultfd.c                             |  63 ++
 include/linux/rmap.h                         |   5 +
 include/linux/userfaultfd_k.h                |  12 +
 include/uapi/linux/userfaultfd.h             |  22 +
 mm/huge_memory.c                             | 130 ++++
 mm/khugepaged.c                              |   3 +
 mm/rmap.c                                    |  13 +
 mm/userfaultfd.c                             | 590 +++++++++++++++++++
 tools/testing/selftests/mm/uffd-common.c     |  41 +-
 tools/testing/selftests/mm/uffd-common.h     |   1 +
 tools/testing/selftests/mm/uffd-unit-tests.c |  62 ++
 11 files changed, 940 insertions(+), 2 deletions(-)

-- 
2.42.0.515.g380fc7ccd1-goog

