linux-mm.kvack.org archive mirror
From: Lokesh Gidra <lokeshgidra@google.com>
To: David Hildenbrand <david@redhat.com>
Cc: Suren Baghdasaryan <surenb@google.com>,
	akpm@linux-foundation.org, viro@zeniv.linux.org.uk,
	 brauner@kernel.org, shuah@kernel.org, aarcange@redhat.com,
	peterx@redhat.com,  hughd@google.com, mhocko@suse.com,
	axelrasmussen@google.com, rppt@kernel.org,  willy@infradead.org,
	Liam.Howlett@oracle.com, jannh@google.com,
	 zhangpeng362@huawei.com, bgeffon@google.com,
	kaleshsingh@google.com,  ngeoffray@google.com, jdduke@google.com,
	linux-mm@kvack.org,  linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org,  linux-kselftest@vger.kernel.org,
	kernel-team@android.com
Subject: Re: [PATCH v3 2/3] userfaultfd: UFFDIO_MOVE uABI
Date: Mon, 9 Oct 2023 10:56:50 -0700
Message-ID: <CA+EESO47LqwMwGgkHQdx1cBdcn_+FWqda8OPcBU-skk9yML_qA@mail.gmail.com>
In-Reply-To: <CA+EESO5nvzka0KzFGzdGgiCWPLg7XD-8jA9=NTUOKFy-56orUg@mail.gmail.com>

On Mon, Oct 9, 2023 at 9:29 AM Lokesh Gidra <lokeshgidra@google.com> wrote:
>
> On Mon, Oct 9, 2023 at 5:24 PM David Hildenbrand <david@redhat.com> wrote:
> >
> > On 09.10.23 18:21, Suren Baghdasaryan wrote:
> > > On Mon, Oct 9, 2023 at 7:38 AM David Hildenbrand <david@redhat.com> wrote:
> > >>
> > >> On 09.10.23 08:42, Suren Baghdasaryan wrote:
> > >>> From: Andrea Arcangeli <aarcange@redhat.com>
> > >>>
> > >>> Implement the uABI of UFFDIO_MOVE ioctl.
> > >>> UFFDIO_COPY performs ~20% better than UFFDIO_MOVE when the application
> > >>> needs pages to be allocated [1]. However, with UFFDIO_MOVE, if pages are
> > >>> available (in userspace) for recycling, as is usually the case in heap
> > >>> compaction algorithms, then we can avoid the page allocation and memcpy
> > >>> (done by UFFDIO_COPY). Also, since the pages are recycled in the
> > >>> userspace, we avoid the need to release (via madvise) the pages back to
> > >>> the kernel [2].
> > >>> We see over 40% reduction (on a Google pixel 6 device) in the compacting
> > >>> thread’s completion time by using UFFDIO_MOVE vs. UFFDIO_COPY. This was
> > >>> measured using a benchmark that emulates a heap compaction implementation
> > >>> using userfaultfd (to allow concurrent accesses by application threads).
> > >>> More details of the usecase are explained in [2].
> > >>> Furthermore, UFFDIO_MOVE enables moving swapped-out pages without
> > >>> touching them within the same vma. Today, it can only be done by mremap,
> > >>> however it forces splitting the vma.
> > >>>
> > >>> [1] https://lore.kernel.org/all/1425575884-2574-1-git-send-email-aarcange@redhat.com/
> > >>> [2] https://lore.kernel.org/linux-mm/CA+EESO4uO84SSnBhArH4HvLNhaUQ5nZKNKXqxRCyjniNVjp0Aw@mail.gmail.com/
> > >>>
> > >>> Update for the ioctl_userfaultfd(2)  manpage:
> > >>>
> > >>>      UFFDIO_MOVE
> > >>>          (Since Linux xxx)  Move a contiguous memory chunk into the
> > >>>          userfault registered range and optionally wake up the blocked
> > >>>          thread. The source and destination addresses and the number of
> > >>>          bytes to move are specified by the src, dst, and len fields of
> > >>>          the uffdio_move structure pointed to by argp:
> > >>>
> > >>>              struct uffdio_move {
> > >>>                  __u64 dst;    /* Destination of move */
> > >>>                  __u64 src;    /* Source of move */
> > >>>                  __u64 len;    /* Number of bytes to move */
> > >>>                  __u64 mode;   /* Flags controlling behavior of move */
> > >>>                  __s64 move;   /* Number of bytes moved, or negated error */
> > >>>              };
> > >>>
> > >>>          The following value may be bitwise ORed in mode to change the
> > >>>          behavior of the UFFDIO_MOVE operation:
> > >>>
> > >>>          UFFDIO_MOVE_MODE_DONTWAKE
> > >>>                 Do not wake up the thread that waits for page-fault
> > >>>                 resolution
> > >>>
> > >>>          UFFDIO_MOVE_MODE_ALLOW_SRC_HOLES
> > >>>                 Allow holes in the source virtual range that is being moved.
> > >>>                 When not specified, the holes will result in ENOENT error.
> > >>>                 When specified, the holes will be accounted as successfully
> > >>>                 moved memory. This is mostly useful to move hugepage aligned
> > >>>                 virtual regions without knowing if there are transparent
> > >>>                 hugepages in the regions or not, but preventing the risk of
> > >>>                 having to split the hugepage during the operation.
> > >>>
> > >>>          The move field is used by the kernel to return the number of
> > >>>          bytes that was actually moved, or an error (a negated errno-
> > >>>          style value).  If the value returned in move doesn't match the
> > >>>          value that was specified in len, the operation fails with the
> > >>>          error EAGAIN.  The move field is output-only; it is not read by
> > >>>          the UFFDIO_MOVE operation.
> > >>>
> > >>>          The operation may fail for various reasons. Usually, remapping of
> > >>>          pages that are not exclusive to the given process fails; once KSM
> > >>>          deduplicates pages or fork() COW-shares pages with child
> > >>>          processes, they are no longer exclusive. Further, the
> > >>>          kernel might only perform lightweight checks for detecting whether
> > >>>          the pages are exclusive, and return -EBUSY in case that check fails.
> > >>>          To make the operation more likely to succeed, KSM should be
> > >>>          disabled, fork() should be avoided or MADV_DONTFORK should be
> > >>>          configured for the source VMA before fork().
> > >>>
> > >>>          This ioctl(2) operation returns 0 on success.  In this case, the
> > >>>          entire area was moved.  On error, -1 is returned and errno is
> > >>>          set to indicate the error.  Possible errors include:
> > >>>
> > >>>          EAGAIN The number of bytes moved (i.e., the value returned in
> > >>>                 the move field) does not equal the value that was
> > >>>                 specified in the len field.
> > >>>
> > >>>          EINVAL Either dst or len was not a multiple of the system page
> > >>>                 size, or the range specified by src and len or dst and len
> > >>>                 was invalid.
> > >>>
> > >>>          EINVAL An invalid bit was specified in the mode field.
> > >>>
> > >>>          ENOENT
> > >>>                 The source virtual memory range has unmapped holes and
> > >>>                 UFFDIO_MOVE_MODE_ALLOW_SRC_HOLES is not set.
> > >>>
> > >>>          EEXIST
> > >>>                 The destination virtual memory range is fully or partially
> > >>>                 mapped.
> > >>>
> > >>>          EBUSY
> > >>>                 The pages in the source virtual memory range are not
> > >>>                 exclusive to the process. The kernel might only perform
> > >>>                 lightweight checks for detecting whether the pages are
> > >>>                 exclusive. To make the operation more likely to succeed,
> > >>>                 KSM should be disabled, fork() should be avoided or
> > >>>                 MADV_DONTFORK should be configured for the source virtual
> > >>>                 memory area before fork().
> > >>>
> > >>>          ENOMEM Allocating memory needed for the operation failed.
> > >>>
> > >>>          ESRCH
> > >>>                 The faulting process has exited at the time of a
> > >>>                 UFFDIO_MOVE operation.
> > >>>
> > >>
> > >> A general comment simply because I realized that just now: does anything
> > >> speak against limiting the operations now to a single MM?
> > >>
> > >> The use cases I heard so far don't need it. If ever required, we could
> > >> consider extending it.
> > >>
> > >> Let's reduce complexity and KIS unless really required.
> > >
> > > Let me check if there are use cases that require moves between MMs.
> > > Andrea seems to have put considerable effort into making it work between
> > > MMs and it would be a pity to lose that. I can send a follow-up patch
> > > to recover that functionality and even if it does not get merged, it
> > > can be used in the future as a reference. But first let me check if we
> > > can drop it.
>
> For the compaction use case that we have, it's fine to limit it to a
> single MM. However, for general use I think Peter will have a better
> idea.
> >
> > Yes, that sounds reasonable. Unless the big, important use cases require
> > moving pages between processes, let's leave that as future work for now.
> >
> > --
> > Cheers,
> >
> > David / dhildenb
> >

While going through mremap's move_page_tables code, which is pretty
similar to what we do here, I noticed that the cache is flushed as well,
whereas we are not doing that here. Is that OK? I'm not an MM expert by
any means, so it's a question rather than a comment :)

