All of lore.kernel.org
 help / color / mirror / Atom feed
From: Nadav Amit <namit@vmware.com>
To: David Hildenbrand <david@redhat.com>
Cc: Linux MM <linux-mm@kvack.org>,
	Mike Kravetz <mike.kravetz@oracle.com>,
	Hugh Dickins <hughd@google.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Axel Rasmussen <axelrasmussen@google.com>,
	Peter Xu <peterx@redhat.com>, Mike Rapoport <rppt@linux.ibm.com>
Subject: Re: [RFC PATCH v2 2/5] userfaultfd: introduce access-likely mode for copy/wp operations
Date: Tue, 21 Jun 2022 17:27:40 +0000	[thread overview]
Message-ID: <6DCD53B1-E764-4E29-8144-896032756F11@vmware.com> (raw)
In-Reply-To: <506888c0-c257-e2a8-9540-823acdd403db@redhat.com>

On Jun 21, 2022, at 1:48 AM, David Hildenbrand <david@redhat.com> wrote:

> ⚠ External Email
> 
> On 20.06.22 01:34, Nadav Amit wrote:
>> From: Nadav Amit <namit@vmware.com>
>> 
>> Using a PTE on x86 with cleared access-bit (aka young-bit)
>> takes ~600 cycles more than when the access bit is set. At the same
>> time, setting the access-bit for memory that is not used (e.g.,
>> prefetched) can introduce greater overheads, as the prefetched memory is
>> reclaimed later than it should be.
>> 
>> Userfaultfd currently does not set the access-bit (excluding the
>> huge-pages case). Arguably, it is best to let the user control whether
>> the access bit should be set or not. The expected use is to request
>> userfaultfd to set the access-bit when the copy/wp operation is done to
>> resolve a page-fault, and not to set the access-bit when the memory is
>> prefetched.
>> 
>> Introduce UFFDIO_COPY_MODE_ACCESS_LIKELY and
>> UFFDIO_WRITEPROTECT_MODE_ACCESS_LIKELY to enable userspace to request
>> the young bit to be set. Set for UFFDIO_CONTINUE and UFFDIO_ZEROPAGE the
>> bit unconditionally since the former is only used to resolve page-faults
>> and the latter would not benefit from not setting the access-bit.
>> 
>> Cc: Mike Kravetz <mike.kravetz@oracle.com>
>> Cc: Hugh Dickins <hughd@google.com>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Cc: Axel Rasmussen <axelrasmussen@google.com>
>> Cc: Peter Xu <peterx@redhat.com>
>> Cc: David Hildenbrand <david@redhat.com>
>> Cc: Mike Rapoport <rppt@linux.ibm.com>
>> Signed-off-by: Nadav Amit <namit@vmware.com>
>> ---
>> fs/userfaultfd.c                 | 23 ++++++++++++++++-------
>> include/linux/userfaultfd_k.h    |  1 +
>> include/uapi/linux/userfaultfd.h | 20 +++++++++++++++++++-
>> mm/userfaultfd.c                 | 18 ++++++++++++++++--
>> 4 files changed, 52 insertions(+), 10 deletions(-)
>> 
>> diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
>> index 5daafa54eb3f..35a8c4347c54 100644
>> --- a/fs/userfaultfd.c
>> +++ b/fs/userfaultfd.c
>> @@ -1700,7 +1700,7 @@ static int userfaultfd_copy(struct userfaultfd_ctx *ctx,
>>      struct uffdio_copy uffdio_copy;
>>      struct uffdio_copy __user *user_uffdio_copy;
>>      struct userfaultfd_wake_range range;
>> -     bool mode_wp;
>> +     bool mode_wp, mode_access_likely;
>>      uffd_flags_t uffd_flags;
>> 
>>      user_uffdio_copy = (struct uffdio_copy __user *) arg;
>> @@ -1726,12 +1726,15 @@ static int userfaultfd_copy(struct userfaultfd_ctx *ctx,
>>      ret = -EINVAL;
>>      if (uffdio_copy.src + uffdio_copy.len <= uffdio_copy.src)
>>              goto out;
>> -     if (uffdio_copy.mode & ~(UFFDIO_COPY_MODE_DONTWAKE|UFFDIO_COPY_MODE_WP))
>> +     if (uffdio_copy.mode & ~(UFFDIO_COPY_MODE_DONTWAKE|UFFDIO_COPY_MODE_WP|
>> +                              UFFDIO_COPY_MODE_ACCESS_LIKELY))
>>              goto out;
>> 
>>      mode_wp = uffdio_copy.mode & UFFDIO_COPY_MODE_WP;
>> +     mode_access_likely = uffdio_copy.mode & UFFDIO_COPY_MODE_ACCESS_LIKELY;
> 
> I *relly* prefer just
> 
> if (uffdio_copy.mode & UFFDIO_COPY_MODE_ACCESS_LIKELY)
>        uffd_flags |= UFFD_FLAGS_ACCESS_LIKELY
> [...]
> 
>> -     uffd_flags = (mode_wp ? UFFD_FLAGS_WP : 0);
>> +     uffd_flags = (mode_wp ? UFFD_FLAGS_WP : 0) |
>> +                  (mode_access_likely ? UFFD_FLAGS_ACCESS_LIKELY : 0);
> 
> Dito.
> 
>>      if (mmget_not_zero(ctx->mm)) {
>>              ret = mwriteprotect_range(ctx->mm, uffdio_wp.range.start,
>> @@ -1871,6 +1877,7 @@ static int userfaultfd_continue(struct userfaultfd_ctx *ctx, unsigned long arg)
>>      struct uffdio_continue uffdio_continue;
>>      struct uffdio_continue __user *user_uffdio_continue;
>>      struct userfaultfd_wake_range range;
>> +     uffd_flags_t uffd_flags;
>> 
>>      user_uffdio_continue = (struct uffdio_continue __user *)arg;
>> 
>> @@ -1898,10 +1905,12 @@ static int userfaultfd_continue(struct userfaultfd_ctx *ctx, unsigned long arg)
>>      if (uffdio_continue.mode & ~UFFDIO_CONTINUE_MODE_DONTWAKE)
>>              goto out;
>> 
>> +     uffd_flags = UFFD_FLAGS_ACCESS_LIKELY;
> 
> Can we add a comment why that makes sense? I think I know why -- someone
> is stuck waiting for that continue to happen :)
> 
>> +
>>      if (mmget_not_zero(ctx->mm)) {
>>              ret = mcopy_continue(ctx->mm, uffdio_continue.range.start,
>>                                   uffdio_continue.range.len,
>> -                                  &ctx->mmap_changing, 0);
>> +                                  &ctx->mmap_changing, uffd_flags);
>>              mmput(ctx->mm);
>>      } else {
>>              return -ESRCH;
>> diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h
>> index 6331148023c1..e6ac165ec044 100644
>> --- a/include/linux/userfaultfd_k.h
>> +++ b/include/linux/userfaultfd_k.h
>> @@ -58,6 +58,7 @@ enum mcopy_atomic_mode {
>> typedef unsigned int __bitwise uffd_flags_t;
>> 
>> #define UFFD_FLAGS_WP                        ((__force uffd_flags_t)BIT(0))
>> +#define UFFD_FLAGS_ACCESS_LIKELY     ((__force uffd_flags_t)BIT(1))
>> 
>> extern int mfill_atomic_install_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
>>                                  struct vm_area_struct *dst_vma,
>> diff --git a/include/uapi/linux/userfaultfd.h b/include/uapi/linux/userfaultfd.h
>> index 005e5e306266..d9c8ce9ba777 100644
>> --- a/include/uapi/linux/userfaultfd.h
>> +++ b/include/uapi/linux/userfaultfd.h
>> @@ -38,7 +38,8 @@
>>                         UFFD_FEATURE_MINOR_HUGETLBFS |       \
>>                         UFFD_FEATURE_MINOR_SHMEM |           \
>>                         UFFD_FEATURE_EXACT_ADDRESS |         \
>> -                        UFFD_FEATURE_WP_HUGETLBFS_SHMEM)
>> +                        UFFD_FEATURE_WP_HUGETLBFS_SHMEM |    \
>> +                        UFFD_FEATURE_ACCESS_HINTS)
>> #define UFFD_API_IOCTLS                              \
>>      ((__u64)1 << _UFFDIO_REGISTER |         \
>>       (__u64)1 << _UFFDIO_UNREGISTER |       \
>> @@ -203,6 +204,10 @@ struct uffdio_api {
>>       *
>>       * UFFD_FEATURE_WP_HUGETLBFS_SHMEM indicates that userfaultfd
>>       * write-protection mode is supported on both shmem and hugetlbfs.
>> +      *
>> +      * UFFD_FEATURE_ACCESS_HINTS indicates that the copy supports
>> +      * UFFDIO_COPY_MODE_ACCESS_LIKELY supports
>> +      * UFFDIO_WRITEPROTECT_MODE_ACCESS_LIKELY.
> 
> I think that sentence doesn't make sense.

English… :)

How about

         * UFFD_FEATURE_ACCESS_HINTS indicates that the ioctl operations
         * supports the UFFDIO_*_MODE_[ACCESS|WRITE]_LIKELY and
         * UFFDIO_*_MODE_[ACCESS|WRITE]_LIKELY hints.

But that would mean that for consistency, I would need to provide
zero/continue hints (which might be disregarded by the kernel)?

>>       */
>> #define UFFD_FEATURE_PAGEFAULT_FLAG_WP               (1<<0)
>> #define UFFD_FEATURE_EVENT_FORK                      (1<<1)
>> @@ -217,6 +222,7 @@ struct uffdio_api {
>> #define UFFD_FEATURE_MINOR_SHMEM             (1<<10)
>> #define UFFD_FEATURE_EXACT_ADDRESS           (1<<11)
>> #define UFFD_FEATURE_WP_HUGETLBFS_SHMEM              (1<<12)
>> +#define UFFD_FEATURE_ACCESS_HINTS            (1<<13)
>>      __u64 features;
>> 
>>      __u64 ioctls;
>> @@ -260,6 +266,13 @@ struct uffdio_copy {
>>       * copy_from_user will not read the last 8 bytes.
>>       */
>>      __s64 copy;
>> +     /*
>> +      * UFFDIO_COPY_MODE_ACCESS_LIKELY will set the mapped page as young.
> 
> Setting the page young is an implementation detail. Can you phrase it
> more generically what the effect of that hint might be?

Err. I forgot to fix it before sending. How about:

         * UFFDIO_COPY_MODE_ACCESS_LIKELY provides a hint to the kernel
         * that the page is likely to be access in the near future. Providing
         * the hint properly can improve performance.


?
> 
>> @@ -691,6 +699,9 @@ ssize_t mfill_zeropage(struct mm_struct *dst_mm, unsigned long start,
>>                     unsigned long len, atomic_t *mmap_changing,
>>                     uffd_flags_t uffd_flags)
>> {
>> +     /* There is no cost for setting the access bit of a zeropage */
>> +     uffd_flags |= UFFD_FLAGS_ACCESS_LIKELY;
>> +
>>      return __mcopy_atomic(dst_mm, start, 0, len, MCOPY_ATOMIC_ZEROPAGE,
>>                            mmap_changing, 0);
>> }
>> @@ -699,6 +710,9 @@ ssize_t mcopy_continue(struct mm_struct *dst_mm, unsigned long start,
>>                     unsigned long len, atomic_t *mmap_changing,
>>                     uffd_flags_t uffd_flags)
>> {
>> +     /* The page is likely to be accessed */
>> +     uffd_flags |= UFFD_FLAGS_ACCESS_LIKELY;
> 
> Shoouldn't that be set by the caller already?

I thought that it belongs conceptually to mm/userfaultfd and not
fs/userfaultfd.

I will wait for Axel input as to how to handle the CONTINUE case and fix it
accordingly.

  parent reply	other threads:[~2022-06-21 17:27 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-06-19 23:34 [RFC PATCH v2 0/5] userfaultfd: support access/write hints Nadav Amit
2022-06-19 23:34 ` [RFC PATCH v2 1/5] userfaultfd: introduce uffd_flags Nadav Amit
2022-06-21  8:41   ` David Hildenbrand
2022-06-21 15:31     ` Peter Xu
2022-06-21 15:29   ` Peter Xu
2022-06-21 17:41     ` Nadav Amit
2022-06-19 23:34 ` [RFC PATCH v2 2/5] userfaultfd: introduce access-likely mode for copy/wp operations Nadav Amit
2022-06-20 10:33   ` kernel test robot
2022-06-21  8:48   ` David Hildenbrand
2022-06-21 15:42     ` Peter Xu
2022-06-21 17:27     ` Nadav Amit [this message]
2022-06-19 23:34 ` [RFC PATCH v2 3/5] userfaultfd: introduce write-likely " Nadav Amit
2022-06-21 16:38   ` Peter Xu
2022-06-21 17:14     ` Nadav Amit
2022-06-21 18:10       ` Peter Xu
2022-06-21 18:30         ` Nadav Amit
2022-06-21 18:43           ` Peter Xu
2022-06-19 23:34 ` [RFC PATCH v2 4/5] userfaultfd: zero access/write hints Nadav Amit
2022-06-20 18:06   ` kernel test robot
2022-06-21 17:04   ` Peter Xu
2022-06-21 17:17     ` Nadav Amit
2022-06-21 17:56       ` Peter Xu
2022-06-21 17:58         ` Nadav Amit
2022-06-19 23:34 ` [RFC PATCH v2 5/5] selftest/userfaultfd: test read/write hints Nadav Amit

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6DCD53B1-E764-4E29-8144-896032756F11@vmware.com \
    --to=namit@vmware.com \
    --cc=akpm@linux-foundation.org \
    --cc=axelrasmussen@google.com \
    --cc=david@redhat.com \
    --cc=hughd@google.com \
    --cc=linux-mm@kvack.org \
    --cc=mike.kravetz@oracle.com \
    --cc=peterx@redhat.com \
    --cc=rppt@linux.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.