* [PATCH v2 0/3] mm/hugetlb: Fix selftest failures with write check @ 2022-10-04 0:37 Peter Xu 2022-10-04 0:37 ` [PATCH v2 1/3] mm/hugetlb: Fix race condition of uffd missing/minor handling Peter Xu ` (2 more replies) 0 siblings, 3 replies; 12+ messages in thread From: Peter Xu @ 2022-10-04 0:37 UTC (permalink / raw) To: linux-kernel, linux-mm Cc: peterx, Andrew Morton, David Hildenbrand, Nadav Amit, Mike Kravetz, Andrea Arcangeli, Axel Rasmussen, Mike Rapoport v2: - Fix the commit message of patch 1, replacing CoW with wr-unprotect example Currently akpm mm-unstable fails with uffd hugetlb private mapping test randomly on a write check. The initial bisection of that points to the recent pmd unshare series, but it turns out there's no direction relationship with the series but only some timing change caused the race to start trigger. The race should be fixed in patch 1. Patch 2 is a trivial cleanup on the similar race with hugetlb migrations, patch 3 comment on the write check so when anyone read it again it'll be clear why it's there. Thanks, Peter Xu (3): mm/hugetlb: Fix race condition of uffd missing/minor handling mm/hugetlb: Use hugetlb_pte_stable in migration race check mm/selftest: uffd: Explain the write missing fault check mm/hugetlb.c | 62 ++++++++++++++++++++---- tools/testing/selftests/vm/userfaultfd.c | 22 ++++++++- 2 files changed, 73 insertions(+), 11 deletions(-) -- 2.37.3 ^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH v2 1/3] mm/hugetlb: Fix race condition of uffd missing/minor handling 2022-10-04 0:37 [PATCH v2 0/3] mm/hugetlb: Fix selftest failures with write check Peter Xu @ 2022-10-04 0:37 ` Peter Xu 2022-10-04 2:32 ` Mike Kravetz 2022-10-04 12:19 ` David Hildenbrand 2022-10-04 0:37 ` [PATCH v2 2/3] mm/hugetlb: Use hugetlb_pte_stable in migration race check Peter Xu 2022-10-04 0:37 ` [PATCH v2 3/3] mm/selftest: uffd: Explain the write missing fault check Peter Xu 2 siblings, 2 replies; 12+ messages in thread From: Peter Xu @ 2022-10-04 0:37 UTC (permalink / raw) To: linux-kernel, linux-mm Cc: peterx, Andrew Morton, David Hildenbrand, Nadav Amit, Mike Kravetz, Andrea Arcangeli, Axel Rasmussen, Mike Rapoport After the recent rework patchset of hugetlb locking on pmd sharing, kselftest for userfaultfd sometimes fails on hugetlb private tests with unexpected write fault checks. It turns out there's nothing wrong within the locking series regarding this matter, but it could have changed the timing of threads so it can trigger an old bug. The real bug is when we call hugetlb_no_page() we're not with the pgtable lock. It means we're reading the pte values lockless. It's perfectly fine in most cases because before we do normal page allocations we'll take the lock and check pte_same() again. However before that, there are actually two paths on userfaultfd missing/minor handling that may directly move on with the fault process without checking the pte values. It means for these two paths we may be generating an uffd message based on an unstable pte, while an unstable pte can legally be anything as long as the modifier holds the pgtable lock. One example, which is also what happened in the failing kselftest and caused the test failure, is that for private mappings wr-protection changes can happen on one page. While hugetlb_change_protection() generally requires pte being cleared before being changed, then there can be a race condition like: thread 1 thread 2 -------- -------- UFFDIO_WRITEPROTECT hugetlb_fault hugetlb_change_protection pgtable_lock() huge_ptep_modify_prot_start pte==NULL hugetlb_no_page generate uffd missing event even if page existed!! huge_ptep_modify_prot_commit pgtable_unlock() Fix this by recheck the pte after pgtable lock for both userfaultfd missing & minor fault paths. This bug should have been around starting from uffd hugetlb introduced, so attaching a Fixes to the commit. Also attach another Fixes to the minor support commit for easier tracking. Note that userfaultfd is actually fine with false positives (e.g. caused by pte changed), but not wrong logical events (e.g. caused by reading a pte during changing). The latter can confuse the userspace, so the strictness is very much preferred. E.g., MISSING event should never happen on the page after UFFDIO_COPY has correctly installed the page and returned. Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Nadav Amit <nadav.amit@gmail.com> Fixes: 1a1aad8a9b7b ("userfaultfd: hugetlbfs: add userfaultfd hugetlb hook") Fixes: 7677f7fd8be7 ("userfaultfd: add minor fault registration mode") Co-developed-by: Mike Kravetz <mike.kravetz@oracle.com> Signed-off-by: Peter Xu <peterx@redhat.com> --- mm/hugetlb.c | 55 ++++++++++++++++++++++++++++++++++++++++++++++------ 1 file changed, 49 insertions(+), 6 deletions(-) diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 9679fe519b90..fa3fcdb0c4b8 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -5521,6 +5521,23 @@ static inline vm_fault_t hugetlb_handle_userfault(struct vm_area_struct *vma, return ret; } +/* + * Recheck pte with pgtable lock. Returns true if pte didn't change, or + * false if pte changed or is changing. + */ +static bool hugetlb_pte_stable(struct hstate *h, struct mm_struct *mm, + pte_t *ptep, pte_t old_pte) +{ + spinlock_t *ptl; + bool same; + + ptl = huge_pte_lock(h, mm, ptep); + same = pte_same(huge_ptep_get(ptep), old_pte); + spin_unlock(ptl); + + return same; +} + static vm_fault_t hugetlb_no_page(struct mm_struct *mm, struct vm_area_struct *vma, struct address_space *mapping, pgoff_t idx, @@ -5562,9 +5579,30 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm, goto out; /* Check for page in userfault range */ if (userfaultfd_missing(vma)) { - ret = hugetlb_handle_userfault(vma, mapping, idx, - flags, haddr, address, - VM_UFFD_MISSING); + /* + * Since hugetlb_no_page() was examining pte + * without pgtable lock, we need to re-test under + * lock because the pte may not be stable and could + * have changed from under us. Try to detect + * either changed or during-changing ptes and retry + * properly when needed. + * + * Note that userfaultfd is actually fine with + * false positives (e.g. caused by pte changed), + * but not wrong logical events (e.g. caused by + * reading a pte during changing). The latter can + * confuse the userspace, so the strictness is very + * much preferred. E.g., MISSING event should + * never happen on the page after UFFDIO_COPY has + * correctly installed the page and returned. + */ + if (hugetlb_pte_stable(h, mm, ptep, old_pte)) + ret = hugetlb_handle_userfault( + vma, mapping, idx, flags, haddr, + address, VM_UFFD_MISSING); + else + /* Retry the fault */ + ret = 0; goto out; } @@ -5634,9 +5672,14 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm, if (userfaultfd_minor(vma)) { unlock_page(page); put_page(page); - ret = hugetlb_handle_userfault(vma, mapping, idx, - flags, haddr, address, - VM_UFFD_MINOR); + /* See comment in userfaultfd_missing() block above */ + if (hugetlb_pte_stable(h, mm, ptep, old_pte)) + ret = hugetlb_handle_userfault( + vma, mapping, idx, flags, haddr, + address, VM_UFFD_MINOR); + else + /* Retry the fault */ + ret = 0; goto out; } } -- 2.37.3 ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH v2 1/3] mm/hugetlb: Fix race condition of uffd missing/minor handling 2022-10-04 0:37 ` [PATCH v2 1/3] mm/hugetlb: Fix race condition of uffd missing/minor handling Peter Xu @ 2022-10-04 2:32 ` Mike Kravetz 2022-10-04 13:53 ` Peter Xu 2022-10-04 12:19 ` David Hildenbrand 1 sibling, 1 reply; 12+ messages in thread From: Mike Kravetz @ 2022-10-04 2:32 UTC (permalink / raw) To: Peter Xu Cc: linux-kernel, linux-mm, Andrew Morton, David Hildenbrand, Nadav Amit, Andrea Arcangeli, Axel Rasmussen, Mike Rapoport On 10/03/22 20:37, Peter Xu wrote: > After the recent rework patchset of hugetlb locking on pmd sharing, > kselftest for userfaultfd sometimes fails on hugetlb private tests with > unexpected write fault checks. > > It turns out there's nothing wrong within the locking series regarding this > matter, but it could have changed the timing of threads so it can trigger > an old bug. > > The real bug is when we call hugetlb_no_page() we're not with the pgtable > lock. It means we're reading the pte values lockless. It's perfectly fine > in most cases because before we do normal page allocations we'll take the > lock and check pte_same() again. However before that, there are actually > two paths on userfaultfd missing/minor handling that may directly move on > with the fault process without checking the pte values. > > It means for these two paths we may be generating an uffd message based on > an unstable pte, while an unstable pte can legally be anything as long as > the modifier holds the pgtable lock. > > One example, which is also what happened in the failing kselftest and > caused the test failure, is that for private mappings wr-protection changes > can happen on one page. While hugetlb_change_protection() generally > requires pte being cleared before being changed, then there can be a race > condition like: > > thread 1 thread 2 > -------- -------- > > UFFDIO_WRITEPROTECT hugetlb_fault > hugetlb_change_protection > pgtable_lock() > huge_ptep_modify_prot_start > pte==NULL > hugetlb_no_page > generate uffd missing event > even if page existed!! > huge_ptep_modify_prot_commit > pgtable_unlock() > > Fix this by recheck the pte after pgtable lock for both userfaultfd missing > & minor fault paths. > > This bug should have been around starting from uffd hugetlb introduced, so > attaching a Fixes to the commit. Also attach another Fixes to the minor > support commit for easier tracking. > > Note that userfaultfd is actually fine with false positives (e.g. caused by > pte changed), but not wrong logical events (e.g. caused by reading a pte > during changing). The latter can confuse the userspace, so the strictness > is very much preferred. E.g., MISSING event should never happen on the > page after UFFDIO_COPY has correctly installed the page and returned. > > Cc: Andrea Arcangeli <aarcange@redhat.com> > Cc: Mike Kravetz <mike.kravetz@oracle.com> > Cc: Axel Rasmussen <axelrasmussen@google.com> > Cc: Nadav Amit <nadav.amit@gmail.com> > Fixes: 1a1aad8a9b7b ("userfaultfd: hugetlbfs: add userfaultfd hugetlb hook") > Fixes: 7677f7fd8be7 ("userfaultfd: add minor fault registration mode") > Co-developed-by: Mike Kravetz <mike.kravetz@oracle.com> > Signed-off-by: Peter Xu <peterx@redhat.com> > --- > mm/hugetlb.c | 55 ++++++++++++++++++++++++++++++++++++++++++++++------ > 1 file changed, 49 insertions(+), 6 deletions(-) Thanks. Do note that this will not apply on top of "mm: hugetlb: fix UAF in hugetlb_handle_userfault" which is already in Andrew's tree. However, required changes should be simple. Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> -- Mike Kravetz > > diff --git a/mm/hugetlb.c b/mm/hugetlb.c > index 9679fe519b90..fa3fcdb0c4b8 100644 > --- a/mm/hugetlb.c > +++ b/mm/hugetlb.c > @@ -5521,6 +5521,23 @@ static inline vm_fault_t hugetlb_handle_userfault(struct vm_area_struct *vma, > return ret; > } > > +/* > + * Recheck pte with pgtable lock. Returns true if pte didn't change, or > + * false if pte changed or is changing. > + */ > +static bool hugetlb_pte_stable(struct hstate *h, struct mm_struct *mm, > + pte_t *ptep, pte_t old_pte) > +{ > + spinlock_t *ptl; > + bool same; > + > + ptl = huge_pte_lock(h, mm, ptep); > + same = pte_same(huge_ptep_get(ptep), old_pte); > + spin_unlock(ptl); > + > + return same; > +} > + > static vm_fault_t hugetlb_no_page(struct mm_struct *mm, > struct vm_area_struct *vma, > struct address_space *mapping, pgoff_t idx, > @@ -5562,9 +5579,30 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm, > goto out; > /* Check for page in userfault range */ > if (userfaultfd_missing(vma)) { > - ret = hugetlb_handle_userfault(vma, mapping, idx, > - flags, haddr, address, > - VM_UFFD_MISSING); > + /* > + * Since hugetlb_no_page() was examining pte > + * without pgtable lock, we need to re-test under > + * lock because the pte may not be stable and could > + * have changed from under us. Try to detect > + * either changed or during-changing ptes and retry > + * properly when needed. > + * > + * Note that userfaultfd is actually fine with > + * false positives (e.g. caused by pte changed), > + * but not wrong logical events (e.g. caused by > + * reading a pte during changing). The latter can > + * confuse the userspace, so the strictness is very > + * much preferred. E.g., MISSING event should > + * never happen on the page after UFFDIO_COPY has > + * correctly installed the page and returned. > + */ > + if (hugetlb_pte_stable(h, mm, ptep, old_pte)) > + ret = hugetlb_handle_userfault( > + vma, mapping, idx, flags, haddr, > + address, VM_UFFD_MISSING); > + else > + /* Retry the fault */ > + ret = 0; > goto out; > } > > @@ -5634,9 +5672,14 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm, > if (userfaultfd_minor(vma)) { > unlock_page(page); > put_page(page); > - ret = hugetlb_handle_userfault(vma, mapping, idx, > - flags, haddr, address, > - VM_UFFD_MINOR); > + /* See comment in userfaultfd_missing() block above */ > + if (hugetlb_pte_stable(h, mm, ptep, old_pte)) > + ret = hugetlb_handle_userfault( > + vma, mapping, idx, flags, haddr, > + address, VM_UFFD_MINOR); > + else > + /* Retry the fault */ > + ret = 0; > goto out; > } > } > -- > 2.37.3 > ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2 1/3] mm/hugetlb: Fix race condition of uffd missing/minor handling 2022-10-04 2:32 ` Mike Kravetz @ 2022-10-04 13:53 ` Peter Xu 0 siblings, 0 replies; 12+ messages in thread From: Peter Xu @ 2022-10-04 13:53 UTC (permalink / raw) To: Mike Kravetz Cc: linux-kernel, linux-mm, Andrew Morton, David Hildenbrand, Nadav Amit, Andrea Arcangeli, Axel Rasmussen, Mike Rapoport On Mon, Oct 03, 2022 at 07:32:58PM -0700, Mike Kravetz wrote: > Do note that this will not apply on top of "mm: hugetlb: fix UAF in > hugetlb_handle_userfault" which is already in Andrew's tree. However, > required changes should be simple. Thanks for the heads up, I'll rebase. > > Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Thanks, -- Peter Xu ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2 1/3] mm/hugetlb: Fix race condition of uffd missing/minor handling 2022-10-04 0:37 ` [PATCH v2 1/3] mm/hugetlb: Fix race condition of uffd missing/minor handling Peter Xu 2022-10-04 2:32 ` Mike Kravetz @ 2022-10-04 12:19 ` David Hildenbrand 2022-10-04 13:49 ` Peter Xu 1 sibling, 1 reply; 12+ messages in thread From: David Hildenbrand @ 2022-10-04 12:19 UTC (permalink / raw) To: Peter Xu, linux-kernel, linux-mm Cc: Andrew Morton, Nadav Amit, Mike Kravetz, Andrea Arcangeli, Axel Rasmussen, Mike Rapoport On 04.10.22 02:37, Peter Xu wrote: > After the recent rework patchset of hugetlb locking on pmd sharing, > kselftest for userfaultfd sometimes fails on hugetlb private tests with > unexpected write fault checks. > > It turns out there's nothing wrong within the locking series regarding this > matter, but it could have changed the timing of threads so it can trigger > an old bug. > > The real bug is when we call hugetlb_no_page() we're not with the pgtable > lock. It means we're reading the pte values lockless. It's perfectly fine > in most cases because before we do normal page allocations we'll take the > lock and check pte_same() again. However before that, there are actually > two paths on userfaultfd missing/minor handling that may directly move on > with the fault process without checking the pte values. > > It means for these two paths we may be generating an uffd message based on > an unstable pte, while an unstable pte can legally be anything as long as > the modifier holds the pgtable lock. > > One example, which is also what happened in the failing kselftest and > caused the test failure, is that for private mappings wr-protection changes > can happen on one page. While hugetlb_change_protection() generally > requires pte being cleared before being changed, then there can be a race > condition like: > > thread 1 thread 2 > -------- -------- > > UFFDIO_WRITEPROTECT hugetlb_fault > hugetlb_change_protection > pgtable_lock() > huge_ptep_modify_prot_start > pte==NULL > hugetlb_no_page > generate uffd missing event > even if page existed!! > huge_ptep_modify_prot_commit > pgtable_unlock() > > Fix this by recheck the pte after pgtable lock for both userfaultfd missing > & minor fault paths. > > This bug should have been around starting from uffd hugetlb introduced, so > attaching a Fixes to the commit. Also attach another Fixes to the minor > support commit for easier tracking. > > Note that userfaultfd is actually fine with false positives (e.g. caused by > pte changed), but not wrong logical events (e.g. caused by reading a pte > during changing). The latter can confuse the userspace, so the strictness > is very much preferred. E.g., MISSING event should never happen on the > page after UFFDIO_COPY has correctly installed the page and returned. > > Cc: Andrea Arcangeli <aarcange@redhat.com> > Cc: Mike Kravetz <mike.kravetz@oracle.com> > Cc: Axel Rasmussen <axelrasmussen@google.com> > Cc: Nadav Amit <nadav.amit@gmail.com> > Fixes: 1a1aad8a9b7b ("userfaultfd: hugetlbfs: add userfaultfd hugetlb hook") > Fixes: 7677f7fd8be7 ("userfaultfd: add minor fault registration mode") > Co-developed-by: Mike Kravetz <mike.kravetz@oracle.com> > Signed-off-by: Peter Xu <peterx@redhat.com> > --- > mm/hugetlb.c | 55 ++++++++++++++++++++++++++++++++++++++++++++++------ > 1 file changed, 49 insertions(+), 6 deletions(-) > > diff --git a/mm/hugetlb.c b/mm/hugetlb.c > index 9679fe519b90..fa3fcdb0c4b8 100644 > --- a/mm/hugetlb.c > +++ b/mm/hugetlb.c > @@ -5521,6 +5521,23 @@ static inline vm_fault_t hugetlb_handle_userfault(struct vm_area_struct *vma, > return ret; > } > > +/* > + * Recheck pte with pgtable lock. Returns true if pte didn't change, or > + * false if pte changed or is changing. > + */ > +static bool hugetlb_pte_stable(struct hstate *h, struct mm_struct *mm, > + pte_t *ptep, pte_t old_pte) > +{ > + spinlock_t *ptl; > + bool same; > + > + ptl = huge_pte_lock(h, mm, ptep); > + same = pte_same(huge_ptep_get(ptep), old_pte); > + spin_unlock(ptl); > + > + return same; > +} > + > static vm_fault_t hugetlb_no_page(struct mm_struct *mm, > struct vm_area_struct *vma, > struct address_space *mapping, pgoff_t idx, > @@ -5562,9 +5579,30 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm, > goto out; > /* Check for page in userfault range */ > if (userfaultfd_missing(vma)) { > - ret = hugetlb_handle_userfault(vma, mapping, idx, > - flags, haddr, address, > - VM_UFFD_MISSING); > + /* > + * Since hugetlb_no_page() was examining pte > + * without pgtable lock, we need to re-test under > + * lock because the pte may not be stable and could > + * have changed from under us. Try to detect > + * either changed or during-changing ptes and retry > + * properly when needed. > + * > + * Note that userfaultfd is actually fine with > + * false positives (e.g. caused by pte changed), > + * but not wrong logical events (e.g. caused by > + * reading a pte during changing). The latter can > + * confuse the userspace, so the strictness is very > + * much preferred. E.g., MISSING event should > + * never happen on the page after UFFDIO_COPY has > + * correctly installed the page and returned. > + */ > + if (hugetlb_pte_stable(h, mm, ptep, old_pte)) > + ret = hugetlb_handle_userfault( > + vma, mapping, idx, flags, haddr, > + address, VM_UFFD_MISSING); That looks kind-of ugly now. I wonder if it would be worth factoring that handling out into a separate function and reusing it at two places. Would get rid of one level of code indent at least. Apart from that, LGTM. Although the lockless reading of the PTE screams for more trouble in the future :) -- Thanks, David / dhildenb ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2 1/3] mm/hugetlb: Fix race condition of uffd missing/minor handling 2022-10-04 12:19 ` David Hildenbrand @ 2022-10-04 13:49 ` Peter Xu 0 siblings, 0 replies; 12+ messages in thread From: Peter Xu @ 2022-10-04 13:49 UTC (permalink / raw) To: David Hildenbrand Cc: linux-kernel, linux-mm, Andrew Morton, Nadav Amit, Mike Kravetz, Andrea Arcangeli, Axel Rasmussen, Mike Rapoport On Tue, Oct 04, 2022 at 02:19:36PM +0200, David Hildenbrand wrote: > That looks kind-of ugly now. I wonder if it would be worth factoring that > handling out into a separate function and reusing it at two places. Would > get rid of one level of code indent at least. > > Apart from that, LGTM. Although the lockless reading of the PTE screams for > more trouble in the future :) Right there's potential to further rework it, I am just not sure whether that could be common enough so that we can start to take pg lock for the whole region (then we'll need to release for either page lock or alloc). Not really sure whether that'll be worth the effort. However, at least uffd minor doesn't really need the page lock so we can optimize it with a find_get_page() earlier then the missing mode can be moved over too (following a lock_page?). Maybe I should give it a shot. For this one I'll keep it simple since I think we should have it for stable too. Thanks for the review! -- Peter Xu ^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH v2 2/3] mm/hugetlb: Use hugetlb_pte_stable in migration race check 2022-10-04 0:37 [PATCH v2 0/3] mm/hugetlb: Fix selftest failures with write check Peter Xu 2022-10-04 0:37 ` [PATCH v2 1/3] mm/hugetlb: Fix race condition of uffd missing/minor handling Peter Xu @ 2022-10-04 0:37 ` Peter Xu 2022-10-04 2:35 ` Mike Kravetz 2022-10-04 12:20 ` David Hildenbrand 2022-10-04 0:37 ` [PATCH v2 3/3] mm/selftest: uffd: Explain the write missing fault check Peter Xu 2 siblings, 2 replies; 12+ messages in thread From: Peter Xu @ 2022-10-04 0:37 UTC (permalink / raw) To: linux-kernel, linux-mm Cc: peterx, Andrew Morton, David Hildenbrand, Nadav Amit, Mike Kravetz, Andrea Arcangeli, Axel Rasmussen, Mike Rapoport After hugetlb_pte_stable() introduced, we can also rewrite the migration race condition against page allocation to use the new helper too. Signed-off-by: Peter Xu <peterx@redhat.com> --- mm/hugetlb.c | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/mm/hugetlb.c b/mm/hugetlb.c index fa3fcdb0c4b8..e762c5369a6f 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -5620,11 +5620,10 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm, * here. Before returning error, get ptl and make * sure there really is no pte entry. */ - ptl = huge_pte_lock(h, mm, ptep); - ret = 0; - if (huge_pte_none(huge_ptep_get(ptep))) + if (hugetlb_pte_stable(h, mm, ptep, old_pte)) ret = vmf_error(PTR_ERR(page)); - spin_unlock(ptl); + else + ret = 0; goto out; } clear_huge_page(page, address, pages_per_huge_page(h)); -- 2.37.3 ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH v2 2/3] mm/hugetlb: Use hugetlb_pte_stable in migration race check 2022-10-04 0:37 ` [PATCH v2 2/3] mm/hugetlb: Use hugetlb_pte_stable in migration race check Peter Xu @ 2022-10-04 2:35 ` Mike Kravetz 2022-10-04 12:20 ` David Hildenbrand 1 sibling, 0 replies; 12+ messages in thread From: Mike Kravetz @ 2022-10-04 2:35 UTC (permalink / raw) To: Peter Xu Cc: linux-kernel, linux-mm, Andrew Morton, David Hildenbrand, Nadav Amit, Andrea Arcangeli, Axel Rasmussen, Mike Rapoport On 10/03/22 20:37, Peter Xu wrote: > After hugetlb_pte_stable() introduced, we can also rewrite the migration > race condition against page allocation to use the new helper too. > > Signed-off-by: Peter Xu <peterx@redhat.com> > --- > mm/hugetlb.c | 7 +++---- > 1 file changed, 3 insertions(+), 4 deletions(-) Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> -- Mike Kravetz > > diff --git a/mm/hugetlb.c b/mm/hugetlb.c > index fa3fcdb0c4b8..e762c5369a6f 100644 > --- a/mm/hugetlb.c > +++ b/mm/hugetlb.c > @@ -5620,11 +5620,10 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm, > * here. Before returning error, get ptl and make > * sure there really is no pte entry. > */ > - ptl = huge_pte_lock(h, mm, ptep); > - ret = 0; > - if (huge_pte_none(huge_ptep_get(ptep))) > + if (hugetlb_pte_stable(h, mm, ptep, old_pte)) > ret = vmf_error(PTR_ERR(page)); > - spin_unlock(ptl); > + else > + ret = 0; > goto out; > } > clear_huge_page(page, address, pages_per_huge_page(h)); > -- > 2.37.3 > ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2 2/3] mm/hugetlb: Use hugetlb_pte_stable in migration race check 2022-10-04 0:37 ` [PATCH v2 2/3] mm/hugetlb: Use hugetlb_pte_stable in migration race check Peter Xu 2022-10-04 2:35 ` Mike Kravetz @ 2022-10-04 12:20 ` David Hildenbrand 1 sibling, 0 replies; 12+ messages in thread From: David Hildenbrand @ 2022-10-04 12:20 UTC (permalink / raw) To: Peter Xu, linux-kernel, linux-mm Cc: Andrew Morton, Nadav Amit, Mike Kravetz, Andrea Arcangeli, Axel Rasmussen, Mike Rapoport On 04.10.22 02:37, Peter Xu wrote: > After hugetlb_pte_stable() introduced, we can also rewrite the migration > race condition against page allocation to use the new helper too. > > Signed-off-by: Peter Xu <peterx@redhat.com> > --- > mm/hugetlb.c | 7 +++---- > 1 file changed, 3 insertions(+), 4 deletions(-) > > diff --git a/mm/hugetlb.c b/mm/hugetlb.c > index fa3fcdb0c4b8..e762c5369a6f 100644 > --- a/mm/hugetlb.c > +++ b/mm/hugetlb.c > @@ -5620,11 +5620,10 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm, > * here. Before returning error, get ptl and make > * sure there really is no pte entry. > */ > - ptl = huge_pte_lock(h, mm, ptep); > - ret = 0; > - if (huge_pte_none(huge_ptep_get(ptep))) > + if (hugetlb_pte_stable(h, mm, ptep, old_pte)) > ret = vmf_error(PTR_ERR(page)); > - spin_unlock(ptl); > + else > + ret = 0; > goto out; > } > clear_huge_page(page, address, pages_per_huge_page(h)); Reviewed-by: David Hildenbrand <david@redhat.com> -- Thanks, David / dhildenb ^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH v2 3/3] mm/selftest: uffd: Explain the write missing fault check 2022-10-04 0:37 [PATCH v2 0/3] mm/hugetlb: Fix selftest failures with write check Peter Xu 2022-10-04 0:37 ` [PATCH v2 1/3] mm/hugetlb: Fix race condition of uffd missing/minor handling Peter Xu 2022-10-04 0:37 ` [PATCH v2 2/3] mm/hugetlb: Use hugetlb_pte_stable in migration race check Peter Xu @ 2022-10-04 0:37 ` Peter Xu 2022-10-04 2:37 ` Mike Kravetz 2022-10-04 12:20 ` David Hildenbrand 2 siblings, 2 replies; 12+ messages in thread From: Peter Xu @ 2022-10-04 0:37 UTC (permalink / raw) To: linux-kernel, linux-mm Cc: peterx, Andrew Morton, David Hildenbrand, Nadav Amit, Mike Kravetz, Andrea Arcangeli, Axel Rasmussen, Mike Rapoport It's not obvious why we had a write check for each of the missing messages, especially when it should be a locking op. Add a rich comment for that, and also try to explain its good side and limitations, so that if someone hit it again for either a bug or a different glibc impl there'll be some clue to start with. Signed-off-by: Peter Xu <peterx@redhat.com> --- tools/testing/selftests/vm/userfaultfd.c | 22 +++++++++++++++++++++- 1 file changed, 21 insertions(+), 1 deletion(-) diff --git a/tools/testing/selftests/vm/userfaultfd.c b/tools/testing/selftests/vm/userfaultfd.c index 74babdbc02e5..297f250c1d95 100644 --- a/tools/testing/selftests/vm/userfaultfd.c +++ b/tools/testing/selftests/vm/userfaultfd.c @@ -774,7 +774,27 @@ static void uffd_handle_page_fault(struct uffd_msg *msg, continue_range(uffd, msg->arg.pagefault.address, page_size); stats->minor_faults++; } else { - /* Missing page faults */ + /* + * Missing page faults. + * + * Here we force a write check for each of the missing mode + * faults. It's guaranteed because the only threads that + * will trigger uffd faults are the locking threads, and + * their first instruction to touch the missing page will + * always be pthread_mutex_lock(). + * + * Note that here we relied on an NPTL glibc impl detail to + * always read the lock type at the entry of the lock op + * (pthread_mutex_t.__data.__type, offset 0x10) before + * doing any locking operations to guarantee that. It's + * actually not good to rely on this impl detail because + * logically a pthread-compatible lib can implement the + * locks without types and we can fail when linking with + * them. However since we used to find bugs with this + * strict check we still keep it around. Hopefully this + * could be a good hint when it fails again. If one day + * it'll break on some other impl of glibc we'll revisit. + */ if (msg->arg.pagefault.flags & UFFD_PAGEFAULT_FLAG_WRITE) err("unexpected write fault"); -- 2.37.3 ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH v2 3/3] mm/selftest: uffd: Explain the write missing fault check 2022-10-04 0:37 ` [PATCH v2 3/3] mm/selftest: uffd: Explain the write missing fault check Peter Xu @ 2022-10-04 2:37 ` Mike Kravetz 2022-10-04 12:20 ` David Hildenbrand 1 sibling, 0 replies; 12+ messages in thread From: Mike Kravetz @ 2022-10-04 2:37 UTC (permalink / raw) To: Peter Xu Cc: linux-kernel, linux-mm, Andrew Morton, David Hildenbrand, Nadav Amit, Andrea Arcangeli, Axel Rasmussen, Mike Rapoport On 10/03/22 20:37, Peter Xu wrote: > It's not obvious why we had a write check for each of the missing messages, > especially when it should be a locking op. Add a rich comment for that, > and also try to explain its good side and limitations, so that if someone > hit it again for either a bug or a different glibc impl there'll be some > clue to start with. Thanks! It did take a while to understand all this, so the comment is appropriate. > > Signed-off-by: Peter Xu <peterx@redhat.com> > --- > tools/testing/selftests/vm/userfaultfd.c | 22 +++++++++++++++++++++- > 1 file changed, 21 insertions(+), 1 deletion(-) Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> -- Mike Kravetz > > diff --git a/tools/testing/selftests/vm/userfaultfd.c b/tools/testing/selftests/vm/userfaultfd.c > index 74babdbc02e5..297f250c1d95 100644 > --- a/tools/testing/selftests/vm/userfaultfd.c > +++ b/tools/testing/selftests/vm/userfaultfd.c > @@ -774,7 +774,27 @@ static void uffd_handle_page_fault(struct uffd_msg *msg, > continue_range(uffd, msg->arg.pagefault.address, page_size); > stats->minor_faults++; > } else { > - /* Missing page faults */ > + /* > + * Missing page faults. > + * > + * Here we force a write check for each of the missing mode > + * faults. It's guaranteed because the only threads that > + * will trigger uffd faults are the locking threads, and > + * their first instruction to touch the missing page will > + * always be pthread_mutex_lock(). > + * > + * Note that here we relied on an NPTL glibc impl detail to > + * always read the lock type at the entry of the lock op > + * (pthread_mutex_t.__data.__type, offset 0x10) before > + * doing any locking operations to guarantee that. It's > + * actually not good to rely on this impl detail because > + * logically a pthread-compatible lib can implement the > + * locks without types and we can fail when linking with > + * them. However since we used to find bugs with this > + * strict check we still keep it around. Hopefully this > + * could be a good hint when it fails again. If one day > + * it'll break on some other impl of glibc we'll revisit. > + */ > if (msg->arg.pagefault.flags & UFFD_PAGEFAULT_FLAG_WRITE) > err("unexpected write fault"); > > -- > 2.37.3 > ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2 3/3] mm/selftest: uffd: Explain the write missing fault check 2022-10-04 0:37 ` [PATCH v2 3/3] mm/selftest: uffd: Explain the write missing fault check Peter Xu 2022-10-04 2:37 ` Mike Kravetz @ 2022-10-04 12:20 ` David Hildenbrand 1 sibling, 0 replies; 12+ messages in thread From: David Hildenbrand @ 2022-10-04 12:20 UTC (permalink / raw) To: Peter Xu, linux-kernel, linux-mm Cc: Andrew Morton, Nadav Amit, Mike Kravetz, Andrea Arcangeli, Axel Rasmussen, Mike Rapoport On 04.10.22 02:37, Peter Xu wrote: > It's not obvious why we had a write check for each of the missing messages, > especially when it should be a locking op. Add a rich comment for that, > and also try to explain its good side and limitations, so that if someone > hit it again for either a bug or a different glibc impl there'll be some > clue to start with. > > Signed-off-by: Peter Xu <peterx@redhat.com> > --- > tools/testing/selftests/vm/userfaultfd.c | 22 +++++++++++++++++++++- > 1 file changed, 21 insertions(+), 1 deletion(-) > > diff --git a/tools/testing/selftests/vm/userfaultfd.c b/tools/testing/selftests/vm/userfaultfd.c > index 74babdbc02e5..297f250c1d95 100644 > --- a/tools/testing/selftests/vm/userfaultfd.c > +++ b/tools/testing/selftests/vm/userfaultfd.c > @@ -774,7 +774,27 @@ static void uffd_handle_page_fault(struct uffd_msg *msg, > continue_range(uffd, msg->arg.pagefault.address, page_size); > stats->minor_faults++; > } else { > - /* Missing page faults */ > + /* > + * Missing page faults. > + * > + * Here we force a write check for each of the missing mode > + * faults. It's guaranteed because the only threads that > + * will trigger uffd faults are the locking threads, and > + * their first instruction to touch the missing page will > + * always be pthread_mutex_lock(). > + * > + * Note that here we relied on an NPTL glibc impl detail to > + * always read the lock type at the entry of the lock op > + * (pthread_mutex_t.__data.__type, offset 0x10) before > + * doing any locking operations to guarantee that. It's > + * actually not good to rely on this impl detail because > + * logically a pthread-compatible lib can implement the > + * locks without types and we can fail when linking with > + * them. However since we used to find bugs with this > + * strict check we still keep it around. Hopefully this > + * could be a good hint when it fails again. If one day > + * it'll break on some other impl of glibc we'll revisit. > + */ > if (msg->arg.pagefault.flags & UFFD_PAGEFAULT_FLAG_WRITE) > err("unexpected write fault"); > Reviewed-by: David Hildenbrand <david@redhat.com> -- Thanks, David / dhildenb ^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2022-10-04 13:53 UTC | newest] Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2022-10-04 0:37 [PATCH v2 0/3] mm/hugetlb: Fix selftest failures with write check Peter Xu 2022-10-04 0:37 ` [PATCH v2 1/3] mm/hugetlb: Fix race condition of uffd missing/minor handling Peter Xu 2022-10-04 2:32 ` Mike Kravetz 2022-10-04 13:53 ` Peter Xu 2022-10-04 12:19 ` David Hildenbrand 2022-10-04 13:49 ` Peter Xu 2022-10-04 0:37 ` [PATCH v2 2/3] mm/hugetlb: Use hugetlb_pte_stable in migration race check Peter Xu 2022-10-04 2:35 ` Mike Kravetz 2022-10-04 12:20 ` David Hildenbrand 2022-10-04 0:37 ` [PATCH v2 3/3] mm/selftest: uffd: Explain the write missing fault check Peter Xu 2022-10-04 2:37 ` Mike Kravetz 2022-10-04 12:20 ` David Hildenbrand
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).