linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Nadav Amit <nadav.amit@gmail.com>
To: Mike Rapoport <rppt@linux.ibm.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-mm <linux-mm@kvack.org>,
	lkml <linux-kernel@vger.kernel.org>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Mike Kravetz <mike.kravetz@oracle.com>,
	Pavel Emelyanov <xemul@virtuozzo.com>,
	Andrei Vagin <avagin@virtuozzo.com>
Subject: Re: [PATCH] userfaultfd: prevent non-cooperative events vs mcopy_atomic races
Date: Sun, 6 Dec 2020 20:31:39 -0800	[thread overview]
Message-ID: <5921BA80-F263-4F8D-B7E6-316CEB602B51@gmail.com> (raw)
In-Reply-To: <20201206093703.GY123287@linux.ibm.com>

Thanks for the detailed answer, Mike. Things are clearer in regard to your
intention.

> On Dec 6, 2020, at 1:37 AM, Mike Rapoport <rppt@linux.ibm.com> wrote:
> 
> The uffd monotor should *know* what is the state of child's memory and
> without this patch it could only guess.

I see - so mmap_changing is not just about graceful handling of copy-ioctl’s
(which I think monitors could handle before mmap_changing was introduced)
but to allow the monitor to know which pages are mapped in each process.
Makes sense, but I have strong doubts it really works (see below).

>> 2. How is memory ordering supposed to work here? IIUC, mmap_changing is not
>> protected by any lock and there are no memory barriers that are associated
>> with the assignment. Indeed, the code calls WRITE_ONCE()/READ_ONCE(), but
>> AFAIK this does not guarantee ordering with non-volatile reads/writes.
> 
> There is also mmap_lock involved, so I don't see how copy can start in
> parallel with fork processing. Fork sets mmap_chaning to true while
> holding mmap_lock, so copy cannot start in parallel. When mmap_lock is
> realeased, mmap_chaning remains true until fork event is pushed to
> userspace and when this is done there is no issue with
> userfaultfd_copy.

Whenever I run into a non-standard and non-trivial synchronization algorithm
in the kernel (and elsewhere), I become very confused and concerned. I
raised my question since I wanted to modify the code and could not figure
out how to properly do so. Based on your input that the monitor is expected
to know the child mappings according to userfaultfd events, I now think that
the kernel does not provide this ability and the locking scheme is broken.

Here are some scenarios that I think are broken - please correct me if I am
wrong:

* Scenario 1: MADV_DONTNEED racing with userfaultfd page-faults

userfaultfd_remove() only holds the mmap_lock for read, so these events
cannot be ordered with userfaultfd page-faults.

* Scenario 2: MADV_DONTNEED racing with fork()

As userfaultfd_remove() releases mmap_lock after the user notification and
before the actual unmapping, concurrent fork() might happen before or after
the actual unmapping in MADV_DONTNEED and the user therefore has no way of
knowing whether the actual unmapping took place before or after the fork().

* Scenario 3: Concurrent MADV_DONTNEED can cause userfaultfd_remove() to
clear mmap_changing cleared before all the notifications are completed.

As mmap_lock is only taken for read, the first thread the completed
userfaultfd_remove() would clear the indication that was set by the other
one.

* Scenario 4: Fork starts and ends between copying of two pages.

As mmap_lock might be released during ioctl_copy() (inside
__mcopy_atomic()), some pages might be mapped in the child and others not:


CPU0				CPU1
----				----
ioctl_copy():
 __mcopy_atomic()
  mmap_read_lock()
  !mmap_changing [ok]
  mfill_atomic_pte() == 0 [page0 copied]
  mfill_atomic_pte() == -ENOENT [page1 will be retried]
  mmap_read_unlock()
  goto retry

				fork():
				 dup_userfaultfd()
				 -> mmap_changing=true
				 userfaultfd_event_wait_completion()
				 -> mmap_changing=false

  mmap_read_lock()
  !mmap_changing [ok]
  mfill_atomic_pte() == 0 [page1 copied]
  mmap_read_unlock()
 
 return: 2 pages were mapped, while the first is present in the child and
 the second one is non-present.

Bottom-line: it seems to me that mmap_changing should be a counter (not
boolean) that is protected by mmap_lock. This counter should be kept
elevated throughout the entire operation (in regard to MADV_DONTNEED).
Perhaps mmap_lock does not have to be taken to decrease the counter, but
then an smp_wmb() would be needed before the counter is decreased.

Let me know whether I am completely off or missing something.

Thanks,
Nadav



  reply	other threads:[~2020-12-07  4:31 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-05-23  7:42 [PATCH] userfaultfd: prevent non-cooperative events vs mcopy_atomic races Mike Rapoport
2018-05-24 11:24 ` Pavel Emelyanov
2018-05-24 11:56   ` Mike Rapoport
2018-05-24 16:40     ` Pavel Emelyanov
2018-05-24 19:06       ` Mike Rapoport
2018-05-25 14:05         ` Pavel Emelyanov
2020-12-03 19:57 ` Nadav Amit
2020-12-06  9:37   ` Mike Rapoport
2020-12-07  4:31     ` Nadav Amit [this message]
2020-12-08  8:34       ` Mike Rapoport
2020-12-08  8:57         ` Nadav Amit

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5921BA80-F263-4F8D-B7E6-316CEB602B51@gmail.com \
    --to=nadav.amit@gmail.com \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=avagin@virtuozzo.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mike.kravetz@oracle.com \
    --cc=rppt@linux.ibm.com \
    --cc=rppt@linux.vnet.ibm.com \
    --cc=xemul@virtuozzo.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).