From: David Hildenbrand <david@redhat.com>
To: Suren Baghdasaryan <surenb@google.com>
Cc: akpm@linux-foundation.org, viro@zeniv.linux.org.uk,
brauner@kernel.org, shuah@kernel.org, aarcange@redhat.com,
lokeshgidra@google.com, peterx@redhat.com, hughd@google.com,
mhocko@suse.com, axelrasmussen@google.com, rppt@kernel.org,
willy@infradead.org, Liam.Howlett@oracle.com, jannh@google.com,
zhangpeng362@huawei.com, bgeffon@google.com,
kaleshsingh@google.com, ngeoffray@google.com, jdduke@google.com,
linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org,
kernel-team@android.com
Subject: Re: [PATCH v3 2/3] userfaultfd: UFFDIO_MOVE uABI
Date: Mon, 9 Oct 2023 18:23:56 +0200
Message-ID: <478697aa-f55c-375a-6888-3abb343c6d9d@redhat.com>
In-Reply-To: <CAJuCfpHzSm+z9b6uxyYFeqr5b5=6LehE9O0g192DZdJnZqmQEw@mail.gmail.com>
On 09.10.23 18:21, Suren Baghdasaryan wrote:
> On Mon, Oct 9, 2023 at 7:38 AM David Hildenbrand <david@redhat.com> wrote:
>>
>> On 09.10.23 08:42, Suren Baghdasaryan wrote:
>>> From: Andrea Arcangeli <aarcange@redhat.com>
>>>
>>> Implement the uABI of UFFDIO_MOVE ioctl.
>>> UFFDIO_COPY performs ~20% better than UFFDIO_MOVE when the application
>>> needs pages to be allocated [1]. However, with UFFDIO_MOVE, if pages are
>>> available (in userspace) for recycling, as is usually the case in heap
>>> compaction algorithms, then we can avoid the page allocation and memcpy
>>> (done by UFFDIO_COPY). Also, since the pages are recycled in the
>>> userspace, we avoid the need to release (via madvise) the pages back to
>>> the kernel [2].
>>> We see over 40% reduction (on a Google Pixel 6 device) in the compacting
>>> thread’s completion time by using UFFDIO_MOVE vs. UFFDIO_COPY. This was
>>> measured using a benchmark that emulates a heap compaction implementation
>>> using userfaultfd (to allow concurrent accesses by application threads).
>>> More details of the use case are explained in [2].
>>> Furthermore, UFFDIO_MOVE enables moving swapped-out pages without
>>> touching them within the same vma. Today, this can only be done with
>>> mremap, which forces splitting the vma.
>>>
>>> [1] https://lore.kernel.org/all/1425575884-2574-1-git-send-email-aarcange@redhat.com/
>>> [2] https://lore.kernel.org/linux-mm/CA+EESO4uO84SSnBhArH4HvLNhaUQ5nZKNKXqxRCyjniNVjp0Aw@mail.gmail.com/
>>>
>>> Update for the ioctl_userfaultfd(2) manpage:
>>>
>>> UFFDIO_MOVE
>>> (Since Linux xxx) Move a contiguous memory chunk into the
>>> userfault registered range and optionally wake up the blocked
>>> thread. The source and destination addresses and the number of
>>> bytes to move are specified by the src, dst, and len fields of
>>> the uffdio_move structure pointed to by argp:
>>>
>>> struct uffdio_move {
>>> __u64 dst; /* Destination of move */
>>> __u64 src; /* Source of move */
>>> __u64 len; /* Number of bytes to move */
>>> __u64 mode; /* Flags controlling behavior of move */
>>> __s64 move; /* Number of bytes moved, or negated error */
>>> };
>>>
>>> The following value may be bitwise ORed in mode to change the
>>> behavior of the UFFDIO_MOVE operation:
>>>
>>> UFFDIO_MOVE_MODE_DONTWAKE
>>> Do not wake up the thread that waits for page-fault
>>> resolution
>>>
>>> UFFDIO_MOVE_MODE_ALLOW_SRC_HOLES
>>> Allow holes in the source virtual range that is being moved.
>>> When not specified, the holes will result in ENOENT error.
>>> When specified, the holes will be accounted as successfully
>>> moved memory. This is mostly useful to move hugepage aligned
>>> virtual regions without knowing if there are transparent
>>> hugepages in the regions or not, but preventing the risk of
>>> having to split the hugepage during the operation.
>>>
>>> The move field is used by the kernel to return the number of
>>> bytes that was actually moved, or an error (a negated errno-
>>> style value). If the value returned in move doesn't match the
>>> value that was specified in len, the operation fails with the
>>> error EAGAIN. The move field is output-only; it is not read by
>>> the UFFDIO_MOVE operation.
>>>
>>> The operation may fail for various reasons. Usually, remapping of
>>> pages that are not exclusive to the given process fails; once KSM
>>> deduplicates pages or fork() COW-shares pages with child
>>> processes, they are no longer exclusive. Further, the
>>> kernel might only perform lightweight checks for detecting whether
>>> the pages are exclusive, and return -EBUSY in case that check fails.
>>> To make the operation more likely to succeed, KSM should be
>>> disabled, fork() should be avoided or MADV_DONTFORK should be
>>> configured for the source VMA before fork().
>>>
>>> This ioctl(2) operation returns 0 on success. In this case, the
>>> entire area was moved. On error, -1 is returned and errno is
>>> set to indicate the error. Possible errors include:
>>>
>>> EAGAIN The number of bytes moved (i.e., the value returned in
>>> the move field) does not equal the value that was
>>> specified in the len field.
>>>
>>> EINVAL Either dst or len was not a multiple of the system page
>>> size, or the range specified by src and len or dst and len
>>> was invalid.
>>>
>>> EINVAL An invalid bit was specified in the mode field.
>>>
>>> ENOENT
>>> The source virtual memory range has unmapped holes and
>>> UFFDIO_MOVE_MODE_ALLOW_SRC_HOLES is not set.
>>>
>>> EEXIST
>>> The destination virtual memory range is fully or partially
>>> mapped.
>>>
>>> EBUSY
>>> The pages in the source virtual memory range are not
>>> exclusive to the process. The kernel might only perform
>>> lightweight checks for detecting whether the pages are
>>> exclusive. To make the operation more likely to succeed,
>>> KSM should be disabled, fork() should be avoided or
>>> MADV_DONTFORK should be configured for the source virtual
>>> memory area before fork().
>>>
>>> ENOMEM Allocating memory needed for the operation failed.
>>>
>>> ESRCH
>>> The faulting process has exited at the time of a
>>> UFFDIO_MOVE operation.
>>>
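To make the EAGAIN / move-field semantics in the manpage text above concrete, here is a minimal sketch of how a caller might interpret the result of one attempt. It is only an illustration under stated assumptions: struct move_args is a hypothetical local mirror of the proposed struct uffdio_move (so the sketch builds without the new uapi header), and classify_move() and the move_outcome enum are names made up for this example. Advancing by the reported byte count on EAGAIN is an assumed retry policy, not something the patch mandates.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical local mirror of the proposed struct uffdio_move.
 * Field order follows the manpage text quoted above.
 */
struct move_args {
	uint64_t dst;   /* Destination of move */
	uint64_t src;   /* Source of move */
	uint64_t len;   /* Number of bytes to move */
	uint64_t mode;  /* Flags controlling behavior of move */
	int64_t  move;  /* Bytes moved, or negated error (output-only) */
};

/* Hypothetical outcome classes for one UFFDIO_MOVE attempt. */
enum move_outcome { MOVE_DONE, MOVE_RETRY_REST, MOVE_FAILED };

/*
 * Interpret the result of one attempt.
 *   ret: return value of the ioctl (0 on success, -1 on error)
 *   err: errno captured right after the ioctl
 *   a:   the argument struct, with a->move as filled in by the kernel
 *
 * Assumed policy: on EAGAIN with partial progress, advance src/dst past
 * the bytes already moved so the caller can retry the remainder.
 * Zero progress is treated as a failure in this sketch.
 */
static enum move_outcome classify_move(int ret, int err, struct move_args *a)
{
	assert(a != NULL);

	if (ret == 0)
		return MOVE_DONE;       /* the entire area was moved */

	if (err == EAGAIN && a->move > 0 && (uint64_t)a->move < a->len) {
		uint64_t done = (uint64_t)a->move;

		a->dst += done;
		a->src += done;
		a->len -= done;
		a->move = 0;            /* output-only; reset before retrying */
		return MOVE_RETRY_REST;
	}

	/* EBUSY, ENOENT, EEXIST, EINVAL, ENOMEM, ESRCH, ...: give up here. */
	return MOVE_FAILED;
}
```

A caller would presumably loop: pass the structure as argp to the ioctl, hand the return value and errno to classify_move(), and reissue the ioctl while it returns MOVE_RETRY_REST.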
>>
>> A general comment simply because I realized that just now: does anything
>> speak against limiting the operations now to a single MM?
>>
>> The use cases I heard so far don't need it. If ever required, we could
>> consider extending it.
>>
>> Let's reduce complexity and KIS unless really required.
>
> Let me check if there are use cases that require moves between MMs.
> Andrea seems to have put considerable effort to make it work between
> MMs and it would be a pity to lose that. I can send a follow-up patch
> to recover that functionality and even if it does not get merged, it
> can be used in the future as a reference. But first let me check if we
> can drop it.
Yes, that sounds reasonable. Unless the big, important use cases require
moving pages between processes, let's leave that as future work for now.
--
Cheers,
David / dhildenb