All of lore.kernel.org
 help / color / mirror / Atom feed
From: Axel Rasmussen <axelrasmussen@google.com>
To: Peter Xu <peterx@redhat.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>,
	Miaohe Lin <linmiaohe@huawei.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Muchun Song <songmuchun@bytedance.com>,
	Linux MM <linux-mm@kvack.org>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] mm/hugetlb: avoid corrupting page->mapping in hugetlb_mcopy_atomic_pte
Date: Wed, 13 Jul 2022 15:46:06 -0700	[thread overview]
Message-ID: <CAJHvVcgqb6R6ePMbgmA8LpMpMgrGWie9ZCTuR4MA77GTvi4XAw@mail.gmail.com> (raw)
In-Reply-To: <Ys7uXHdzzFezUub5@xz-m1.local>

I think there is a small mistake in this patch.

Consider the non-minor-fault case. We have this block:

/* Add shared, newly allocated pages to the page cache. */
if (vm_shared && !is_continue) {
        /* ... */
}

In here, we've added the newly allocated page to the page cache, and
we've set this page_in_pagecache flag to true. But we *do not* setup
rmap for this page in this block. I think in this case, the patch will
cause us to do the wrong thing: we should hugepage_add_new_anon_rmap()
further down, but with this patch we dup instead.

On Wed, Jul 13, 2022 at 9:10 AM Peter Xu <peterx@redhat.com> wrote:
>
> On Wed, Jul 13, 2022 at 10:24:09AM -0400, Peter Xu wrote:
> > On Tue, Jul 12, 2022 at 10:39:20AM -0700, Mike Kravetz wrote:
> > > On 07/12/22 21:05, Miaohe Lin wrote:
> > > > In MCOPY_ATOMIC_CONTINUE case with a non-shared VMA, pages in the page
> > > > cache are installed in the ptes. But hugepage_add_new_anon_rmap is called
> > > > for them mistakenly because they're not vm_shared. This will corrupt the
> > > > page->mapping used by page cache code.
> > > >
> > > > Fixes: f619147104c8 ("userfaultfd: add UFFDIO_CONTINUE ioctl")
> > > > Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
> > > > ---
> > > >  mm/hugetlb.c | 2 +-
> > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > This looks correct to me.
> > >
> > > Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
> > >
> > > However, I am having a hard time wrapping my head around how UFFDIO_CONTINUE
> > > should work on non-anon private mappings.  For example, a private mapping of
> > > a hugetlbfs file.  I think we just map the page in the file/cache and do not
> > > set the write bit in the pte.  So, yes we would want page_dup_file_rmap()
> > > in this case as shown below.
> > >
> > > Adding Axel and Peter on Cc: as they were more involved in adding that code
> > > and the design of UFFDIO_CONTINUE.
> >
> > Yes the change makes sense to me too.  There's just one thing to check on
> > whether minor mode should support private mappings at all as it's probably
> > not in the major goal of when it's proposed.
> >
> > I don't see why it can't logically, but I think we should have failed the
> > uffdio-register already somewhere before when the vma was private and
> > registered with minor mode.  It's just that I cannot quickly find it in the
> > code anywhere..  ideally it should be checked in vma_can_userfault() but it
> > seems not.
> >
> > Axel?
> >
> > PS: the minor mode man page update seems to be still missing.
>
> Oh I should have done a pull first on the man-page repo..
>
> From the man page indeed I didn't see anything mentioned on not allowing
> private mappings.  There's the example given on using two mappings for
> modifying pages but logically that applies to private mappings too - we
> could have mapped the uffd region with private mappings but the other one
> shared, then we can modify page caches but later after pte installed it'll
> trigger cow for writes.
>
> So I think we need to confirm whether private mappings are supported. If
> no, we should be crystal clear in both the code and man page (we probably
> want a follow up patch to man-page to mention that too?).  If yes, we'll
> need Miaohe's patch and also make sure they're enabled in the current code
> path.  We'll also want to set test_uffdio_minor=1 for "hugetlb" test case
> in the userfaultfd kselftest (currently it's not there).

So, originally when I proposed minor fault handling, I was expecting
to only support VM_SHARED VMAs. Indeed, I too have a hard time
imagining how one might benefit from using it with a private mapping.
If my memory serves this restriction was relaxed due to feedback on
the original RFC proposal [1], essentially on the basis of: why make
it more restrictive than it needs to be? Since all we need for a minor
fault to happen is for the pages to be in the page cache, that's the
only restriction we should have.

I don't see why it shouldn't work in principle though. Imagine a
scenario where the VM guest's mapping is private, and the memory
manager's mapping is shared. I guess in this case, say for a write
from the guest:

1. The guest will generate a minor fault
2. The memory manager can modify the page via its shared mapping, and
the guest will see those changes
3. After UFFDIO_CONTINUE resolves the fault, the page is CoW-ed, and
the memory manager can no longer see the guest's version of the page

I'm not really sure *why* you'd want to do this, but it seems like it
should work.

[1]: https://patchwork.kernel.org/project/linux-mm/patch/20210107190453.3051110-2-axelrasmussen@google.com/

>
> --
> Peter Xu
>

  reply	other threads:[~2022-07-13 22:46 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-07-12 13:05 [PATCH] mm/hugetlb: avoid corrupting page->mapping in hugetlb_mcopy_atomic_pte Miaohe Lin
2022-07-12 17:39 ` Mike Kravetz
2022-07-13  2:10   ` Miaohe Lin
2022-07-13 14:24   ` Peter Xu
2022-07-13 16:10     ` Peter Xu
2022-07-13 22:46       ` Axel Rasmussen [this message]
2022-07-13 23:36         ` Mike Kravetz
2022-07-14  0:20           ` Axel Rasmussen
2022-07-14 10:09             ` Miaohe Lin
2022-07-14 15:45               ` Peter Xu
2022-07-15  2:50                 ` Miaohe Lin
2022-07-13 17:23 ` Andrew Morton
2022-07-14  9:59   ` Miaohe Lin
2022-07-14 15:52     ` Peter Xu
2022-07-15  3:56       ` Miaohe Lin
2022-07-15 12:35         ` Peter Xu
2022-07-15 16:45           ` Axel Rasmussen
2022-07-15 17:07             ` Peter Xu
2022-07-15 17:28               ` Axel Rasmussen
2022-07-15 17:39                 ` Peter Xu
2022-07-15 17:51                   ` Axel Rasmussen
2022-07-16  1:32                     ` Miaohe Lin
2022-07-15 17:29               ` Mike Kravetz
2022-07-15 17:38                 ` Peter Xu
2022-07-16 23:06     ` Andrew Morton
2022-07-18  2:25       ` Miaohe Lin
2022-07-18 18:07         ` Axel Rasmussen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAJHvVcgqb6R6ePMbgmA8LpMpMgrGWie9ZCTuR4MA77GTvi4XAw@mail.gmail.com \
    --to=axelrasmussen@google.com \
    --cc=akpm@linux-foundation.org \
    --cc=linmiaohe@huawei.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mike.kravetz@oracle.com \
    --cc=peterx@redhat.com \
    --cc=songmuchun@bytedance.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.