linux-mm.kvack.org archive mirror
From: Nadav Amit <nadav.amit@gmail.com>
To: Mike Rapoport <rppt@kernel.org>
Cc: David Hildenbrand <david@redhat.com>,
	Mike Rapoport <rppt@linux.vnet.ibm.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Peter Xu <peterx@redhat.com>, Linux-MM <linux-mm@kvack.org>
Subject: Re: userfaultfd: usability issue due to lack of UFFD events ordering
Date: Mon, 31 Jan 2022 09:23:54 -0800
Message-ID: <18B50289-223E-4C78-B2D6-8E9F0B9E2387@gmail.com>
In-Reply-To: <Yfe9JS47vCQv6R1l@kernel.org>


> On Jan 31, 2022, at 2:42 AM, Mike Rapoport <rppt@kernel.org> wrote:
> 
> Hi Nadav,
> 
> On Sat, Jan 29, 2022 at 10:23:55PM -0800, Nadav Amit wrote:
>> Using userfaultfd and looking at the kernel code, I encountered a usability
>> issue that complicates userspace UFFD-monitor implementation. I obviously
>> might be wrong, so I would appreciate (polite?) feedback. I do have a
>> userspace workaround, but I thought it was worth sharing, to hear your
>> opinion as well as feedback from other UFFD users.
>> 
>> The issue I encountered concerns the ordering of UFFD events, which might
>> not reflect the actual order in which the events took place.
>> 
>> In more detail, UFFD events (e.g., unmap, fork) are not ordered with
>> respect to each other [*]. The mm-lock is dropped before the userspace
>> UFFD-monitor is notified, and therefore there is no guarantee that the
>> order of the events reflects the order in which they actually took place.
>> This can prevent a UFFD-monitor from using the events to track which
>> ranges are mapped. Specifically, a UFFD_EVENT_FORK message and a
>> UFFD_EVENT_UNMAP message (which reflects an unmap in the parent process)
>> can be reordered if the events are triggered by two different threads.
>> In this case, the UFFD-monitor cannot figure out from the events whether
>> the child process still has the unmapped memory range mapped (because
>> the fork happened first) or not.
> 
> Yeah, it seems that something like this is possible:
> 
> 
> fork()					munmap()
> 	mmap_write_unlock();
> 						mmap_write_lock_killable();
> 						do_things();
> 						mmap_{read,write}_unlock();
> 						userfaultfd_unmap_complete();
> 	dup_userfaultfd_complete();
> 
> A solution could be to split uffd_*_complete() into two parts: one that
> queues up the event message and a second one that waits for it to be read
> by the monitor. The first part can then run before the mm-lock is released.
> 
> If you can think of something nicer, it'll be really great!
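A minimal user-space model of the proposed split may help make it concrete. It assumes a pthread mutex stands in for the mm-lock and a mutex/condvar-protected array stands in for the event wait queue; uffd_event_queue(), uffd_event_wait(), monitor_read(), mm_op() and run_demo() are hypothetical names, not kernel functions. Because the message is queued while the mm-lock is still held, the monitor reads events in the same order in which the lock was taken, and the blocking wait happens only after the lock is dropped.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for UFFD_EVENT_FORK and UFFD_EVENT_UNMAP. */
enum ev_type { EV_FORK = 1, EV_UNMAP = 2 };

#define QCAP 16

static pthread_mutex_t mm_lock = PTHREAD_MUTEX_INITIALIZER; /* models the mm-lock */
static pthread_mutex_t q_lock = PTHREAD_MUTEX_INITIALIZER;  /* protects q_* */
static pthread_cond_t q_avail = PTHREAD_COND_INITIALIZER;   /* an event was queued */
static pthread_cond_t q_read = PTHREAD_COND_INITIALIZER;    /* an event was consumed */
static enum ev_type q_buf[QCAP];
static size_t q_head, q_tail; /* monitor reads at head, producers append at tail */

static enum ev_type lock_order[QCAP]; /* order in which mm_lock was taken */
static size_t lock_n;                 /* protected by mm_lock */

/* Part 1: called with mm_lock held, so queue order follows mm_lock order. */
static size_t uffd_event_queue(enum ev_type t)
{
    pthread_mutex_lock(&q_lock);
    assert(q_tail < QCAP);
    size_t slot = q_tail++;
    q_buf[slot] = t;
    pthread_cond_signal(&q_avail);
    pthread_mutex_unlock(&q_lock);
    return slot;
}

/* Part 2: called after mm_lock is dropped; blocks until @slot was read. */
static void uffd_event_wait(size_t slot)
{
    pthread_mutex_lock(&q_lock);
    while (q_head <= slot)
        pthread_cond_wait(&q_read, &q_lock);
    pthread_mutex_unlock(&q_lock);
}

/* Monitor side: waits for an event, consumes it and wakes the waiters. */
static enum ev_type monitor_read(void)
{
    pthread_mutex_lock(&q_lock);
    while (q_head == q_tail)
        pthread_cond_wait(&q_avail, &q_lock);
    enum ev_type t = q_buf[q_head++];
    pthread_cond_broadcast(&q_read);
    pthread_mutex_unlock(&q_lock);
    return t;
}

/* Models fork() or munmap(): work and queueing both happen under mm_lock. */
static void *mm_op(void *arg)
{
    enum ev_type t = *(enum ev_type *)arg;
    pthread_mutex_lock(&mm_lock);
    lock_order[lock_n++] = t;          /* stands in for do_things() */
    size_t slot = uffd_event_queue(t); /* queued before the lock is dropped */
    pthread_mutex_unlock(&mm_lock);
    uffd_event_wait(slot);             /* block only after unlocking */
    return NULL;
}

/* Runs one fork-like and one munmap-like operation concurrently and
 * stores the events in the order the monitor read them. */
static void run_demo(enum ev_type seen[2])
{
    static enum ev_type kinds[2] = { EV_FORK, EV_UNMAP };
    pthread_t th[2];
    for (int i = 0; i < 2; i++)
        pthread_create(&th[i], NULL, mm_op, &kinds[i]);
    for (int i = 0; i < 2; i++)
        seen[i] = monitor_read(); /* the calling thread acts as the monitor */
    for (int i = 0; i < 2; i++)
        pthread_join(th[i], NULL);
}
```

Whichever operation takes the lock first is also read first, regardless of scheduling after the unlock. Build with -pthread.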

Thanks for the quick response. Your solution is possible, but it would
still not preserve the order between events and page faults. As David
mentioned, regardless of the mm-lock (which is not always taken for
write), events and page faults are kept on two separate lists, and
queued page faults are reported before events.
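To illustrate the point about the two lists, here is a toy C model that loosely follows the description above: page faults and events sit in separate FIFOs, and a read drains the fault FIFO first. struct uffd_model, push(), pop() and model_read() are illustrative assumptions, not the kernel's data structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical message kinds; not the real struct uffd_msg. */
enum msg_kind { MSG_PAGEFAULT, MSG_EVENT };

struct msg {
    enum msg_kind kind;
    int id;             /* lets the example tell messages apart */
};

#define CAP 8

/* A simple bounded FIFO. */
struct fifo {
    struct msg m[CAP];
    size_t head, tail;
};

/* Two independent queues: one for page faults, one for events. */
struct uffd_model {
    struct fifo faults;
    struct fifo events;
};

static void push(struct fifo *f, struct msg m)
{
    assert(f->tail < CAP);
    f->m[f->tail++] = m;
}

static int pop(struct fifo *f, struct msg *out)
{
    if (f->head == f->tail)
        return 0;
    *out = f->m[f->head++];
    return 1;
}

/* Models read(): a pending page fault is returned before any event,
 * regardless of which was queued first. Returns 0 if both are empty. */
static int model_read(struct uffd_model *u, struct msg *out)
{
    return pop(&u->faults, out) || pop(&u->events, out);
}
```

In this model, an event queued before a page fault is still delivered after it, so ordering events among themselves does not by itself order them against faults.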

I am also not sure how simple or performant it would be, since it would
require an additional refcount on userfaultfd_wait_queue to prevent it
from disappearing between the time it is enqueued and the time it blocks.
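One possible shape of that extra refcount, sketched in user-space C11 under the assumption that the wait-queue entry is heap-allocated so it can outlive the enqueue step. struct wait_entry, entry_alloc(), entry_get() and entry_put() are hypothetical names; entries_freed exists only to make the example observable.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical heap-allocated stand-in for struct userfaultfd_wait_queue. */
struct wait_entry {
    atomic_int refcount;
    int msg;            /* placeholder for the queued message */
};

static int entries_freed; /* instrumentation for this example only */

/* The initial reference belongs to the thread that will wait on the entry. */
static struct wait_entry *entry_alloc(int msg)
{
    struct wait_entry *e = malloc(sizeof(*e));
    if (!e)
        return NULL;
    atomic_init(&e->refcount, 1);
    e->msg = msg;
    return e;
}

/* Takes an extra reference, e.g. for the queue while the entry is listed. */
static void entry_get(struct wait_entry *e)
{
    atomic_fetch_add_explicit(&e->refcount, 1, memory_order_relaxed);
}

/* Drops a reference; whoever drops the last one frees the entry. */
static void entry_put(struct wait_entry *e)
{
    if (atomic_fetch_sub_explicit(&e->refcount, 1, memory_order_acq_rel) == 1) {
        free(e);
        entries_freed++;
    }
}
```

In this sketch, neither the waiter nor the reader needs to know whether the other is done with the entry: the last one to call entry_put() frees it.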

Another option is to associate a “generation” or “sequence number”
with every event and to change the API to include it. This still leaves
the problem of ordering MADV_DONTNEED against page faults, though.
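A sketch of how a monitor could consume such a sequence number, assuming (hypothetically) that each event carried a seq field assigned while the mm-lock is held; the current struct uffd_msg has no such field. struct seq_event, struct reorder_buf, reorder_push() and record() are illustrative names. The monitor buffers events that arrive early and applies them strictly in sequence order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical event types. */
enum { EV_FORK = 1, EV_UNMAP = 2 };

/* Hypothetical message with a sequence number. */
struct seq_event {
    uint64_t seq;   /* assumed to be assigned under the mm-lock */
    int type;       /* EV_FORK or EV_UNMAP */
};

#define WINDOW 32   /* max number of events that may be in flight out of order */

struct reorder_buf {
    struct seq_event slot[WINDOW];
    bool present[WINDOW];
    uint64_t next;  /* next sequence number to apply */
};

/* Records applied events; stands in for updating the monitor's view
 * of which ranges are mapped. */
static int applied_log[WINDOW];
static int applied_n;

static void record(const struct seq_event *ev)
{
    assert(applied_n < WINDOW);
    applied_log[applied_n++] = ev->type;
}

/* Buffers @ev, then applies every event that is now in sequence.
 * Returns the number of events applied by this call. */
static int reorder_push(struct reorder_buf *rb, struct seq_event ev,
                        void (*apply)(const struct seq_event *))
{
    assert(ev.seq >= rb->next && ev.seq < rb->next + WINDOW);
    size_t i = ev.seq % WINDOW;
    rb->slot[i] = ev;
    rb->present[i] = true;

    int applied = 0;
    while (rb->present[rb->next % WINDOW]) {
        size_t j = rb->next % WINDOW;
        rb->present[j] = false;
        apply(&rb->slot[j]);
        rb->next++;
        applied++;
    }
    return applied;
}
```

If the UNMAP (seq 1) arrives before the FORK (seq 0), it is held back until the FORK is applied. This only orders events among themselves; page faults and MADV_DONTNEED remain unordered, as noted above.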


Thread overview: 18+ messages
2022-01-30  6:23 userfaultfd: usability issue due to lack of UFFD events ordering Nadav Amit
2022-01-31 10:42 ` Mike Rapoport
2022-01-31 10:48   ` David Hildenbrand
2022-01-31 14:05     ` Mike Rapoport
2022-01-31 14:12       ` David Hildenbrand
2022-01-31 14:28         ` Mike Rapoport
2022-01-31 14:41           ` David Hildenbrand
2022-01-31 18:47             ` Mike Rapoport
2022-01-31 22:39               ` Nadav Amit
2022-02-01  9:10                 ` Mike Rapoport
2022-02-10  7:48                 ` Peter Xu
2022-02-10 18:42                   ` Nadav Amit
2022-02-14  4:02                     ` Peter Xu
2022-02-15 22:35                       ` Nadav Amit
2022-02-16  8:27                         ` Peter Xu
2022-02-17 21:15                         ` Mike Rapoport
2022-01-31 17:23   ` Nadav Amit [this message]
2022-01-31 17:28     ` David Hildenbrand
