linux-kernel.vger.kernel.org archive mirror
From: Peter Xu <peterx@redhat.com>
To: Mike Kravetz <mike.kravetz@oracle.com>
Cc: "Axel Rasmussen" <axelrasmussen@google.com>,
	"Alexander Viro" <viro@zeniv.linux.org.uk>,
	"Alexey Dobriyan" <adobriyan@gmail.com>,
	"Andrea Arcangeli" <aarcange@redhat.com>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	"Anshuman Khandual" <anshuman.khandual@arm.com>,
	"Catalin Marinas" <catalin.marinas@arm.com>,
	"Chinwen Chang" <chinwen.chang@mediatek.com>,
	"Huang Ying" <ying.huang@intel.com>,
	"Ingo Molnar" <mingo@redhat.com>, "Jann Horn" <jannh@google.com>,
	"Jerome Glisse" <jglisse@redhat.com>,
	"Lokesh Gidra" <lokeshgidra@google.com>,
	"Matthew Wilcox (Oracle)" <willy@infradead.org>,
	"Michael Ellerman" <mpe@ellerman.id.au>,
	"Michal Koutný" <mkoutny@suse.com>,
	"Michel Lespinasse" <walken@google.com>,
	"Mike Rapoport" <rppt@linux.vnet.ibm.com>,
	"Nicholas Piggin" <npiggin@gmail.com>, "Shaohua Li" <shli@fb.com>,
	"Shawn Anastasio" <shawn@anastas.io>,
	"Steven Rostedt" <rostedt@goodmis.org>,
	"Steven Price" <steven.price@arm.com>,
	"Vlastimil Babka" <vbabka@suse.cz>,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-mm@kvack.org, "Adam Ruprecht" <ruprecht@google.com>,
	"Cannon Matthews" <cannonmatthews@google.com>,
	"Dr . David Alan Gilbert" <dgilbert@redhat.com>,
	"David Rientjes" <rientjes@google.com>,
	"Oliver Upton" <oupton@google.com>
Subject: Re: [RFC PATCH 0/2] userfaultfd: handle minor faults, add UFFDIO_CONTINUE
Date: Mon, 11 Jan 2021 18:08:48 -0500
Message-ID: <20210111230848.GA588752@xz-x1>
In-Reply-To: <48f4f43f-eadd-f37d-bd8f-bddba03a7d39@oracle.com>

On Mon, Jan 11, 2021 at 02:42:48PM -0800, Mike Kravetz wrote:
> On 1/7/21 11:04 AM, Axel Rasmussen wrote:
> > Overview
> > ========
> > 
> > This series adds a new userfaultfd registration mode,
> > UFFDIO_REGISTER_MODE_MINOR. This allows userspace to intercept "minor" faults.
> > By "minor" fault, I mean the following situation:
> > 
> > Let there exist two mappings (i.e., VMAs) to the same page(s) (shared memory).
> > One of the mappings is registered with userfaultfd (in minor mode), and the
> > other is not. Via the non-UFFD mapping, the underlying pages have already been
> > allocated & filled with some contents. The UFFD mapping has not yet been
> > faulted in; when it is touched for the first time, this results in what I'm
> > calling a "minor" fault. As a concrete example, when working with hugetlbfs, we
> > have huge_pte_none(), but find_lock_page() finds an existing page.
> > 
> > We also add a new ioctl to resolve such faults: UFFDIO_CONTINUE. The idea is,
> > userspace resolves the fault by either a) doing nothing if the contents are
> > already correct, or b) updating the underlying contents using the second,
> > non-UFFD mapping (via memcpy/memset or similar, or something fancier like RDMA,
> > or etc...). In either case, userspace issues UFFDIO_CONTINUE to tell the kernel
> > "I have ensured the page contents are correct, carry on setting up the mapping".
> > 
> 
> One quick thought.
> 
> This is not going to work as expected with hugetlbfs pmd sharing.  If you
> are not familiar with hugetlbfs pmd sharing, you are not alone. :)
> 
> pmd sharing is enabled for x86 and arm64 architectures.  If there are multiple
> shared mappings of the same underlying hugetlbfs file or shared memory segment
> that are 'suitably aligned', then the PMD pages associated with those regions
> are shared by all the mappings.  Suitably aligned means 'on a 1GB boundary'
> and 1GB in size.
> 
> When pmds are shared, your mappings will never see a 'minor fault'.  This
> is because the PMD (page table entries) is shared.
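
To make the two points above concrete, here is a small userspace model in C.
It is only a sketch under stated assumptions, not kernel code: the names
region_is_shareable(), classify_touch() and touch_after_peer_populated() are
hypothetical, the 1GB figure is the one quoted above, and the "page table
entry" is reduced to a single boolean.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* 1GB: the alignment and size quoted in the message above. */
#define GIB (UINT64_C(1) << 30)

/*
 * Hypothetical helper: true if the 1GB-aligned, 1GB-sized region that
 * contains addr lies entirely inside the mapping [start, end).
 */
bool region_is_shareable(uint64_t start, uint64_t end, uint64_t addr)
{
    assert(start < end);
    uint64_t base = addr & ~(GIB - 1);   /* round down to a 1GB boundary */
    return base >= start && end >= base && end - base >= GIB;
}

enum fault_kind { FAULT_NONE, FAULT_MISSING, FAULT_MINOR };

/*
 * pte_present:   this mapping's page table already has an entry.
 * page_in_cache: the backing file already has a page at this offset.
 */
enum fault_kind classify_touch(bool pte_present, bool page_in_cache)
{
    if (pte_present)
        return FAULT_NONE;               /* nothing to intercept */
    return page_in_cache ? FAULT_MINOR : FAULT_MISSING;
}

/*
 * The peer (non-UFFD) mapping has already populated the page. If the PMD
 * page is shared, the peer's entry is also our entry, so the first touch
 * through the UFFD mapping does not fault at all.
 */
enum fault_kind touch_after_peer_populated(bool pmd_shared)
{
    return classify_touch(/* pte_present */ pmd_shared,
                          /* page_in_cache */ true);
}
```

In this model, touch_after_peer_populated(false) yields FAULT_MINOR, while
touch_after_peer_populated(true) yields FAULT_NONE, which is the case where
the registered mapping never sees a minor fault.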

Thanks for raising this, Mike.

I've got a few patches that disable huge pmd sharing for uffd in general,
e.g.:

https://github.com/xzpeter/linux/commit/f9123e803d9bdd91bf6ef23b028087676bed1540
https://github.com/xzpeter/linux/commit/aa9aeb5c4222a2fdb48793cdbc22902288454a31

I believe we don't want pmd sharing for missing mode either; it's just less
critical there for now, because in missing mode we normally monitor all the
processes that use the registered mm range.  For example, in QEMU postcopy
migration with vhost-user hugetlbfs files as backends, we monitor both the
QEMU process and the DPDK program, so either program will trigger a missing
fault even if the pmd is shared between them.  Still, I think it's not ideal:
uffd (even in missing mode) is pgtable-based, so sharing page tables will
always be tricky.

The patches aren't posted publicly yet, since they're part of the uffd-wp
support for hugetlbfs (along with shmem).  I'm just raising this now to avoid
potential duplicated work before I post the patchset.

(I'll read into the details soon; probably too many things have piled up...)

Thanks,

-- 
Peter Xu

