From: Mike Kravetz <mike.kravetz@oracle.com>
To: Peter Xu <peterx@redhat.com>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
Andrea Arcangeli <aarcange@redhat.com>,
Mike Rapoport <rppt@linux.vnet.ibm.com>,
David Hildenbrand <david@redhat.com>,
Andrew Morton <akpm@linux-foundation.org>,
Axel Rasmussen <axelrasmussen@google.com>,
Nadav Amit <nadav.amit@gmail.com>
Subject: Re: [PATCH 1/3] mm/hugetlb: Fix race condition of uffd missing/minor handling
Date: Mon, 3 Oct 2022 14:45:47 -0700 [thread overview]
Message-ID: <YztYC+2jalrsxKDm@monkey> (raw)
In-Reply-To: <YztT2OJake65WG3P@x1n>
On 10/03/22 17:27, Peter Xu wrote:
> On Mon, Oct 03, 2022 at 10:20:29AM -0700, Mike Kravetz wrote:
> > On 10/03/22 11:56, Peter Xu wrote:
> > > After the recent rework patchset of hugetlb locking on pmd sharing,
> > > kselftest for userfaultfd sometimes fails on hugetlb private tests with
> > > unexpected write fault checks.
> > >
> > > It turns out there's nothing wrong within the locking series regarding this
> > > matter, but it could have changed the timing of threads so it can trigger
> > > an old bug.
> > >
> > > The real bug is when we call hugetlb_no_page() we're not with the pgtable
> > > lock. It means we're reading the pte values lockless. It's perfectly fine
> > > in most cases because before we do normal page allocations we'll take the
> > > lock and check pte_same() again. However before that, there are actually
> > > two paths on userfaultfd missing/minor handling that may directly move on
> > > with the fault process without checking the pte values.
> > >
> > > It means for these two paths we may be generating an uffd message based on
> > > an unstable pte, while an unstable pte can legally be anything as long as
> > > the modifier holds the pgtable lock.
> > >
> > > One example, which is also what happened in the failing kselftest and
> > > caused the test failure, is that for private mappings CoW can happen on one
> > > page. CoW requires pte being cleared before being replaced with a new page
> > > for TLB coherency, but then there can be a race condition:
> > >
> > > thread 1 thread 2
> > > -------- --------
> > >
> > > hugetlb_fault hugetlb_fault
> > > private pte RO
> > > hugetlb_wp
> > > pgtable_lock()
> > > huge_ptep_clear_flush
> > > pte=NULL
> > > hugetlb_no_page
> > > generate uffd missing event
> > > even if page existed!!
> > > set_huge_pte_at
> > > pgtable_unlock()
> >
> > Thanks for working on this Peter!
> >
> > I agree with this patch, but I suspect the above race is not possible. Why?
> > In both cases, we take the hugetlb fault mutex when processing a huegtlb
> > page fault. This means only one thread can execute the fault code for
> > a specific mapping/index at a time. This is why I was so confused, and may
> > remain a bit confused about how we end up with userfault processing a write
> > fault. It was part of the reason for my 'unclear' wording about this being
> > more about cpus not seeing updated values. Note that we do drop the fault
> > mutex before calling handle_usefault, but by then we have already made the
> > 'missing' determination.
> >
> > Thoughts? Perhaps, I am still confused.
>
> It's my fault to have the commit message wrong, sorry. :) And thanks for
> raising this question, I could have overlooked that.
>
> It turns out it's not the CoW that's clearing the pte... it's the
> wr-protect with huge_ptep_modify_prot_start(). So the race is with
> UFFDIO_WRITEPROTECT, not CoW.
Thank you! Now I understand.
This also explains why the new locking exposes the race.
hugetlb_change_protection needs to take the i_mmap_sema in write mode because
it could unshare pmds. Previously, hugetlb page faults took i_mmap_sema in
read mode so this race could not happen.
--
Mike Kravetz
next prev parent reply other threads:[~2022-10-03 21:45 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-10-03 15:56 [PATCH 0/3] mm/hugetlb: Fix selftest failures with write check Peter Xu
2022-10-03 15:56 ` [PATCH 1/3] mm/hugetlb: Fix race condition of uffd missing/minor handling Peter Xu
2022-10-03 17:20 ` Mike Kravetz
2022-10-03 21:27 ` Peter Xu
2022-10-03 21:45 ` Mike Kravetz [this message]
2022-10-04 0:28 ` Peter Xu
2022-10-03 21:00 ` Nadav Amit
2022-10-03 21:16 ` Peter Xu
2022-10-03 21:32 ` Nadav Amit
2022-10-03 15:56 ` [PATCH 2/3] mm/hugetlb: Use hugetlb_pte_stable in migration race check Peter Xu
2022-10-03 15:56 ` [PATCH 3/3] mm/selftest: uffd: Explain the write missing fault check Peter Xu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=YztYC+2jalrsxKDm@monkey \
--to=mike.kravetz@oracle.com \
--cc=aarcange@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=axelrasmussen@google.com \
--cc=david@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=nadav.amit@gmail.com \
--cc=peterx@redhat.com \
--cc=rppt@linux.vnet.ibm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).