From: Nadav Amit <nadav.amit@gmail.com>
To: Peter Xu <peterx@redhat.com>
Cc: Mike Rapoport <rppt@kernel.org>,
David Hildenbrand <david@redhat.com>,
Mike Rapoport <rppt@linux.vnet.ibm.com>,
Andrea Arcangeli <aarcange@redhat.com>,
Linux-MM <linux-mm@kvack.org>
Subject: Re: userfaultfd: usability issue due to lack of UFFD events ordering
Date: Tue, 15 Feb 2022 14:35:09 -0800
Message-ID: <F195F8B6-05C4-45BC-BA10-632CA3699941@gmail.com>
In-Reply-To: <YgnUVqKfkYTjz3Gx@xz-m1.local>

> On Feb 13, 2022, at 8:02 PM, Peter Xu <peterx@redhat.com> wrote:
>
> Thanks for explaining.
>
> I also dug out the discussion threads between you and Mike and that's a good
> one too summarizing the problems:
>
> https://lore.kernel.org/all/5921BA80-F263-4F8D-B7E6-316CEB602B51@gmail.com/
>
> Scenario 4 is kind of special imho among all those, because that's the only one
> that can be worked around by user application by only copying pages one by one.
> I know you were even leveraging iouring in your local tree, so that's probably
> not a solution at all for you. But I'm just trying to start thinking without
> that scenario for now.
>
> Per my understanding, a major issue regarding the rest of the scenarios is
> ordering of uffd messages may not match with how things are happening. This
> actually contains two problems.
>
> First of all, mmap_sem is mostly held read for all page faults and most of the
> mm changes except e.g. fork, then we can never serialize them. Not to mention
> uffd events releases mmap_sem within prep and completion. Let's call it
> problem 1.
>
> The other problem 2 is we can never serialize faults against events.
>
> For problem 1, I do sense something that mmap_sem is just not suitable for uffd
> scenario. Say, we grant concurrent with most of the events like dontneed and
> mremap, but when uffd ordering is a concern we may not want to grant that
> concurrency. I'm wondering whether it means uffd may need its own semaphore to
> achieve this. So for all events that uffd cares we take write lock on a new
> uffd_sem after mmap_sem, meanwhile we don't release that uffd_sem after prep of
> events, not until completion (the message is read). It'll slow down uffd
> tracked systems but guarantees ordering.

Peter,

Thanks for finding the time and looking into the issues that I encountered.

Your approach sounds feasible, but acquiring uffd_sem after mmap_lock seems
unsafe to me, since it might cause deadlocks (e.g., if a process uses events
to manage its own memory).

>
> In the meantime, I'm wildly thinking whether we can tackle the other
> problem by merging the page fault queue with the event queue, aka, event_wqh
> and fault_pending_wqh. Obviously we'll need to identify the messages when
> read() and conditionally move them into fault_wqh only if they come from page
> faults, but that seems doable?

This, I guess, is necessary in addition to your aforementioned proposal of a
protecting semaphore. Together, the two can do the trick.

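To make the merged-queue idea concrete, here is a rough userspace model, not
kernel code. It assumes a single FIFO that holds both page-fault and event
messages, so read() returns them in the order they were queued, and faults are
parked on a separate list (standing in for fault_wqh) once they have been read.
The names msg_kind, merged_queue, mq_push and mq_read are hypothetical and
exist only for this sketch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical message tag: did the message come from a fault or an event? */
enum msg_kind { MSG_FAULT, MSG_EVENT };

struct msg {
	enum msg_kind kind;
	unsigned long seq;	/* order in which the message was queued */
};

#define QUEUE_CAP 16

/*
 * One pending FIFO standing in for the merged event_wqh/fault_pending_wqh.
 * For simplicity it is a bounded array that is never recycled.
 */
struct merged_queue {
	struct msg pending[QUEUE_CAP];
	size_t head, tail;			/* FIFO indices into pending[] */
	struct msg fault_wait[QUEUE_CAP];	/* stands in for fault_wqh */
	size_t nr_fault_wait;
};

/* Queue a message at the tail; returns false when the model is full. */
static bool mq_push(struct merged_queue *q, enum msg_kind kind,
		    unsigned long seq)
{
	if (q->tail == QUEUE_CAP)
		return false;
	q->pending[q->tail++] = (struct msg){ .kind = kind, .seq = seq };
	return true;
}

/*
 * Models read(): hand out the oldest message regardless of its kind.
 * Only messages that come from page faults are moved to the waiting list.
 */
static bool mq_read(struct merged_queue *q, struct msg *out)
{
	if (q->head == q->tail)
		return false;
	*out = q->pending[q->head++];
	if (out->kind == MSG_FAULT)
		q->fault_wait[q->nr_fault_wait++] = *out;
	return true;
}
```

Because there is only one queue, a fault queued before an event can never be
read after it, which is the ordering property under discussion. The model says
nothing about locking or about waking the faulting thread.
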
While I have your attention, let me share some other challenges I encountered
using userfaultfd. They might be unrelated, but perhaps you can keep them in
the back of your mind. Nobody should suffer as I did ;-)

1. mmap_changing (i.e., -EAGAIN on ioctls) makes using userfaultfd harder than
   it should be, especially when using io-uring as I wish to do.

   I think it is not too hard to address by changing the API. For instance, if
   uffd-ctx had a uffd-generation that increased on each event, the user
   could provide an ioctl-generation as part of the copy/zero/etc. ioctls, and
   the copy/zero/etc. operation would succeed only if the uffd-generation is
   lower than or equal to the one provided by the user.

2. userfaultfd is separated from other tracing/instrumentation mechanisms in
   the kernel. I, for instance, also wanted to track mmap events (let’s put
   aside for a second why). Tracking these events can be done with ptrace or
   perf_event_open(), but then it is hard to correlate these events with
   userfaultfd. It would have been easier for users, I think, if userfaultfd
   notifications were provided through ptrace/tracepoints mechanisms as well.

3. Nesting/chaining. It is not easy to allow two monitors to use userfaultfd
   concurrently. This seems to be a general problem that I believe ptrace
   suffers from too. I know it might seem far-fetched to have 2 monitors at the
   moment, but I think that any tracking/instrumentation mechanism (e.g.,
   ptrace, software-dirty, not to mention hardware virtualization) should be
   designed from the beginning with such support, as adding it at a later stage
   can be tricky.

4. Missing state. It would be useful to provide the TID of the faulting
   thread. I will send a patch for this one once I get the necessary
   internal approvals.

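The generation check from item 1 can be sketched as a small userspace model.
This is only an illustration of the proposed rule, not an existing kernel API:
the struct and function names (uffd_ctx_model, model_queue_event,
model_read_generation, model_check_ioctl) are hypothetical, and how the
generation would actually be reported to the monitor is left open.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical model of a uffd context carrying an event generation. */
struct uffd_ctx_model {
	uint64_t generation;	/* incremented once for each queued event */
};

/* Called whenever an event (e.g., fork or mremap) is queued. */
static void model_queue_event(struct uffd_ctx_model *ctx)
{
	ctx->generation++;
}

/* The generation the monitor would observe after reading its events. */
static uint64_t model_read_generation(const struct uffd_ctx_model *ctx)
{
	return ctx->generation;
}

/*
 * Check applied to a copy/zero-style request that carries the generation
 * the monitor last observed. The request may proceed only if no event was
 * queued since then, i.e., the context generation is lower than or equal
 * to the one supplied by the caller. Otherwise it fails with -EAGAIN.
 */
static int model_check_ioctl(const struct uffd_ctx_model *ctx,
			     uint64_t ioctl_generation)
{
	if (ctx->generation > ioctl_generation)
		return -EAGAIN;
	return 0;
}
```

With this rule, a monitor that has already consumed every event up to a given
generation is not failed merely because a change was in flight at some point.
It is failed only when an event it has not yet seen was queued.
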
Thanks again,
Nadav