From: Peter Xu <peterx@redhat.com>
To: David Hildenbrand <david@redhat.com>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Andrea Arcangeli <aarcange@redhat.com>,
	Axel Rasmussen <axelrasmussen@google.com>,
	Ives van Hoorne <ives@codesandbox.io>,
	Nadav Amit <nadav.amit@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Mike Rapoport <rppt@linux.vnet.ibm.com>,
	stable@vger.kernel.org
Subject: Re: [PATCH v2 1/2] mm/migrate: Fix read-only page got writable when recover pte
Date: Mon, 14 Nov 2022 15:09:05 -0500
Message-ID: <Y3KgYeMTdTM0FN5W@x1n>
In-Reply-To: <9af36be3-313b-e39c-85bb-bf30011bccb8@redhat.com>

On Mon, Nov 14, 2022 at 05:09:32PM +0100, David Hildenbrand wrote:
> On 10.11.22 21:31, Peter Xu wrote:
> > Ives van Hoorne from codesandbox.io reported an issue regarding possible
> > data loss of uffd-wp when applied to memfds on heavily loaded systems.  The
> > symptom is a data mismatch on some pages read from the snapshot child VMs.
> > 
> > I can also reproduce it with a Rust reproducer provided by Ives that keeps
> > taking snapshots of a 256MB VM.  On a 32G system, starting 80 instances
> > triggers the issue within ten minutes.
> > 
> > It turns out that some pages were written through even though uffd-wp was
> > applied to the pte.
> > 
> > The problem is that, when removing migration entries, we didn't really worry
> > about the write bit as long as we knew it was not a write migration entry.
> > That may not be true: for some memory types (e.g. writable shmem) mk_pte can
> > return a pte with the write bit set, so to recover the migration entry to its
> > original state we need to explicitly wr-protect the pte, or it will have the
> > write bit set even though it is a read migration entry.
> > 
> > For uffd it can cause write-through.  I didn't verify, but I think it'll be
> > the same for mprotect()ed pages and after migration we can miss the sigbus
> > instead.
> 
> I don't think so. mprotect() handling relies on vma->vm_page_prot, which is
> supposed to do the right thing. E.g., map the pte protnone without
> VM_READ/VM_WRITE/....

I removed that example in v3; feel free to have a look.

> 
> > 
> > The relevant code on uffd was introduced in the anon support, which is
> > commit f45ec5ff16a7 ("userfaultfd: wp: support swap and page migration",
> > 2020-04-07).  However anon shouldn't suffer from this problem because anon
> > should already have the write bit cleared always, so that may not be a
> > proper Fixes target.  To satisfy the need on the backport, I'm attaching
> > the Fixes tag to the uffd-wp shmem support.  Since no one has had an issue
> > with mprotect, I assume that's also the kernel version we should start
> > backporting from for stable, and we shouldn't need to worry before that.
> > 
> > Cc: Andrea Arcangeli <aarcange@redhat.com>
> > Cc: stable@vger.kernel.org
> > Fixes: b1f9e876862d ("mm/uffd: enable write protection for shmem & hugetlbfs")
> > Reported-by: Ives van Hoorne <ives@codesandbox.io>
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> >   mm/migrate.c | 8 +++++++-
> >   1 file changed, 7 insertions(+), 1 deletion(-)
> > 
> > diff --git a/mm/migrate.c b/mm/migrate.c
> > index dff333593a8a..8b6351c08c78 100644
> > --- a/mm/migrate.c
> > +++ b/mm/migrate.c
> > @@ -213,8 +213,14 @@ static bool remove_migration_pte(struct folio *folio,
> >   			pte = pte_mkdirty(pte);
> >   		if (is_writable_migration_entry(entry))
> >   			pte = maybe_mkwrite(pte, vma);
> > -		else if (pte_swp_uffd_wp(*pvmw.pte))
> > +		else
> > +			/* NOTE: mk_pte can have write bit set */
> > +			pte = pte_wrprotect(pte);
> 
> 
> Any particular reason why not to simply glue this to pte_swp_uffd_wp(),
> because only that needs special care:
> 
> if (pte_swp_uffd_wp(*pvmw.pte)) {
> 	pte = pte_wrprotect(pte);
> 	pte = pte_mkuffd_wp(pte);
> }
> 
> 
> And that would match what actually should have been done in commit
> f45ec5ff16a7 -- only special-case uffd-wp.
> 
> Note that I think there are cases where we have a PTE that was !writable,
> but after migration we can map it writable.

To me, restoring the pte to its original form is the safest approach, so I
think we need a justification for why it is always safe to set the write bit.

Or do you have solid evidence that it is always safe?
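
To make the difference between the two options concrete, here is a toy
user-space model. It is only a sketch and not kernel code: the toy_* names,
the flag values and the struct are made up for illustration, maybe_mkwrite()
is assumed to set the write bit (i.e. the vma has VM_WRITE), and the uffd-wp
marking that follows the quoted hunk is assumed, since the hunk above is cut
off. toy_restore_alt() is one reading of the suggestion to glue the
wr-protect to pte_swp_uffd_wp().

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy flag bits; arbitrary values, not any real architecture's pte layout. */
#define TOY_PTE_WRITE   (1u << 0)
#define TOY_PTE_UFFD_WP (1u << 1)

typedef uint32_t toy_pte_t;

/* What the migration entry remembers about the original mapping. */
struct toy_migration_entry {
	bool writable;	/* stands in for is_writable_migration_entry() */
	bool uffd_wp;	/* stands in for pte_swp_uffd_wp() */
};

/* Stands in for mk_pte(): writable shmem may hand back the write bit. */
toy_pte_t toy_mk_pte(bool prot_has_write)
{
	return prot_has_write ? TOY_PTE_WRITE : 0;
}

/* Before the patch: a read entry keeps whatever mk_pte() returned. */
toy_pte_t toy_restore_old(struct toy_migration_entry e, bool prot_has_write)
{
	toy_pte_t pte = toy_mk_pte(prot_has_write);

	if (e.writable)
		pte |= TOY_PTE_WRITE;		/* maybe_mkwrite(), VM_WRITE assumed */
	else if (e.uffd_wp)
		pte |= TOY_PTE_UFFD_WP;		/* pte_mkuffd_wp() */
	return pte;
}

/* v2: every non-writable entry is wr-protected, then uffd-wp is applied. */
toy_pte_t toy_restore_v2(struct toy_migration_entry e, bool prot_has_write)
{
	toy_pte_t pte = toy_mk_pte(prot_has_write);

	if (e.writable)
		pte |= TOY_PTE_WRITE;
	else
		pte &= ~TOY_PTE_WRITE;		/* pte_wrprotect() */
	if (e.uffd_wp)
		pte |= TOY_PTE_UFFD_WP;
	return pte;
}

/* Alternative: wr-protect only when the entry carries uffd-wp. */
toy_pte_t toy_restore_alt(struct toy_migration_entry e, bool prot_has_write)
{
	toy_pte_t pte = toy_mk_pte(prot_has_write);

	if (e.writable) {
		pte |= TOY_PTE_WRITE;
	} else if (e.uffd_wp) {
		pte &= ~TOY_PTE_WRITE;		/* pte_wrprotect() */
		pte |= TOY_PTE_UFFD_WP;		/* pte_mkuffd_wp() */
	}
	return pte;
}
```

In this model both variants fix the uffd-wp case (a read entry with uffd-wp
no longer comes back writable). They only differ for a read entry without
uffd-wp on a mapping whose protection has the write bit: v2 restores it
read-only, while the alternative leaves it writable, which is the case that
needs the justification.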

> 
> BTW, does unuse_pte() need similar care?
> 
> new_pte = pte_mkold(mk_pte(page, vma->vm_page_prot));
> if (pte_swp_uffd_wp(*pte))
> 	new_pte = pte_mkuffd_wp(new_pte);
> set_pte_at(vma->vm_mm, addr, pte, new_pte);

I think the unuse path is fine because unuse only applies to private mappings,
so mk_pte() should always return a pte with the W bit cleared there.

Thanks,

-- 
Peter Xu

