* Re: [PATCH 4.9 098/128] mm: thp: make the THP mapcount atomic against __split_huge_pmd_locked()
       [not found] ` <20200619141625.314982137@linuxfoundation.org>
@ 2021-02-15 18:37 ` Vlastimil Babka
  2021-02-26 16:22   ` [PATCH 4.9 STABLE] mm, thp: make do_huge_pmd_wp_page() lock page for testing mapcount Vlastimil Babka

From: Vlastimil Babka @ 2021-02-15 18:37 UTC
  To: Andrea Arcangeli
  Cc: stable, Jann Horn, Kirill A. Shutemov, Linus Torvalds,
	Greg Kroah-Hartman, linux-kernel, linux-mm, Jann Horn,
	Nicolai Stange, Michal Hocko

On 6/19/20 4:33 PM, Greg Kroah-Hartman wrote:
> From: Andrea Arcangeli <aarcange@redhat.com>
>
> commit c444eb564fb16645c172d550359cb3d75fe8a040 upstream.
>
> Write protect anon page faults require an accurate mapcount to decide
> whether to break the COW or not. This is implemented in the THP path with
> reuse_swap_page() ->
> page_trans_huge_map_swapcount()/page_trans_huge_mapcount().
>
> If the COW triggers while the other processes sharing the page are
> under a huge pmd split, then to get an accurate reading we must ensure
> the mapcount isn't computed while it's being transferred from the head
> page to the tail pages.
>
> reuse_swap_page() already runs serialized by the page lock, so it's
> enough to add the page lock around __split_huge_pmd_locked() too, in
> order to add the missing serialization.
>
> Note: the commit in "Fixes" is just to facilitate the backporting,
> because the code before that commit didn't try to do an accurate THP
> mapcount calculation and instead used page_count() to decide whether
> to COW or not. Both the page_count and the pin_count are THP-wide
> refcounts, so they're inaccurate if used in reuse_swap_page().
> Reverting that commit (besides the unrelated fix to the local anon_vma
> assignment) would have also opened the window for memory corruption
> side effects in certain workloads, as documented in that commit's
> header.
>
> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
> Suggested-by: Jann Horn <jannh@google.com>
> Reported-by: Jann Horn <jannh@google.com>
> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Fixes: 6d0a07edd17c ("mm: thp: calculate the mapcount correctly for THP pages during WP faults")
> Cc: stable@vger.kernel.org
> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Hi,

when evaluating this backport for our 4.12 based kernel, Nicolai found
out that Jann's POC still triggers, AFAICS because do_huge_pmd_wp_page()
doesn't take the page lock, which was only added by ba3c4ce6def4 ("mm,
THP, swap: make reuse_swap_page() works for THP swapped out") in 4.14.
The upstream stable 4.9 is thus in the same situation (I didn't actually
test the POC there, but it should be obvious), so this is a heads up.

Now just backporting ba3c4ce6def4 to 4.9 stable isn't that simple, as
it's part of a larger series (maybe with even more prerequisites, I
didn't check). I'm considering just taking the part of ba3c4ce6def4 that
wraps page_trans_huge_mapcount() in the page lock (without switching the
call to reuse_swap_page() and changing the latter to deal with
swapped-out THP) and will look at it tomorrow. But suggestions (and/or
later review) from Andrea/Kirill are welcome.

Thanks, Vlastimil

> ---
>  mm/huge_memory.c | 31 ++++++++++++++++++++++++++++---
>  1 file changed, 28 insertions(+), 3 deletions(-)
>
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -1755,6 +1755,8 @@ void __split_huge_pmd(struct vm_area_str
>  	spinlock_t *ptl;
>  	struct mm_struct *mm = vma->vm_mm;
>  	unsigned long haddr = address & HPAGE_PMD_MASK;
> +	bool was_locked = false;
> +	pmd_t _pmd;
>
>  	mmu_notifier_invalidate_range_start(mm, haddr, haddr + HPAGE_PMD_SIZE);
>  	ptl = pmd_lock(mm, pmd);
> @@ -1764,11 +1766,32 @@ void __split_huge_pmd(struct vm_area_str
>  	 * pmd against. Otherwise we can end up replacing wrong page.
>  	 */
>  	VM_BUG_ON(freeze && !page);
> -	if (page && page != pmd_page(*pmd))
> -		goto out;
> +	if (page) {
> +		VM_WARN_ON_ONCE(!PageLocked(page));
> +		was_locked = true;
> +		if (page != pmd_page(*pmd))
> +			goto out;
> +	}
>
> +repeat:
>  	if (pmd_trans_huge(*pmd)) {
> -		page = pmd_page(*pmd);
> +		if (!page) {
> +			page = pmd_page(*pmd);
> +			if (unlikely(!trylock_page(page))) {
> +				get_page(page);
> +				_pmd = *pmd;
> +				spin_unlock(ptl);
> +				lock_page(page);
> +				spin_lock(ptl);
> +				if (unlikely(!pmd_same(*pmd, _pmd))) {
> +					unlock_page(page);
> +					put_page(page);
> +					page = NULL;
> +					goto repeat;
> +				}
> +				put_page(page);
> +			}
> +		}
>  		if (PageMlocked(page))
>  			clear_page_mlock(page);
>  	} else if (!pmd_devmap(*pmd))
> @@ -1776,6 +1799,8 @@ void __split_huge_pmd(struct vm_area_str
>  	__split_huge_pmd_locked(vma, pmd, haddr, freeze);
> out:
>  	spin_unlock(ptl);
> +	if (!was_locked && page)
> +		unlock_page(page);
>  	mmu_notifier_invalidate_range_end(mm, haddr, haddr + HPAGE_PMD_SIZE);
>  }
* [PATCH 4.9 STABLE] mm, thp: make do_huge_pmd_wp_page() lock page for testing mapcount
  2021-02-15 18:37 ` [PATCH 4.9 098/128] mm: thp: make the THP mapcount atomic against __split_huge_pmd_locked() Vlastimil Babka
@ 2021-02-26 16:22 ` Vlastimil Babka
  2021-02-26 19:09   ` Sasha Levin

From: Vlastimil Babka @ 2021-02-26 16:22 UTC
  To: stable
  Cc: linux-kernel, linux-mm, Andrea Arcangeli, Kirill A. Shutemov,
	Jann Horn, Linus Torvalds, Michal Hocko, Hugh Dickins,
	Vlastimil Babka, Nicolai Stange

Jann reported [1] a race between __split_huge_pmd_locked() and
page_trans_huge_map_swapcount() which can result in a page being reused
instead of COWed. This was later assigned CVE-2020-29368.

This was fixed by commit c444eb564fb1 ("mm: thp: make the THP mapcount
atomic against __split_huge_pmd_locked()") by doing the split under the
page lock, while all users of page_trans_huge_map_swapcount() were
already also under page lock. The fix was backported also to the 4.9
stable series.

When testing the backport on a 4.12 based kernel, Nicolai noticed the
POC from [1] still reproduces after backporting c444eb564fb1 and
identified a missing page lock in do_huge_pmd_wp_page() around the call
to page_trans_huge_mapcount(). The page lock was only added in
ba3c4ce6def4 ("mm, THP, swap: make reuse_swap_page() works for THP
swapped out") in 4.14. That commit also wrapped
page_trans_huge_mapcount() into page_trans_huge_map_swapcount() for the
purposes of COW decisions.

I have verified that 4.9.y indeed also reproduces with the POC.
Backporting ba3c4ce6def4 alone is however not possible, as it's part of
a larger effort of optimizing THP swapping, which would be risky to
backport fully.

Therefore this 4.9-stable-only patch just wraps the call to
page_trans_huge_mapcount() in the page lock the same way as
ba3c4ce6def4 does, but without the page_trans_huge_map_swapcount() part.
Other callers of page_trans_huge_mapcount() are all under page lock
already. I have verified the POC no longer reproduces afterwards.

[1] https://bugs.chromium.org/p/project-zero/issues/detail?id=2045

Reported-by: Nicolai Stange <nstange@suse.de>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/huge_memory.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 05ca01ef97f7..14cd0ef33b62 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1022,6 +1022,19 @@ int do_huge_pmd_wp_page(struct fault_env *fe, pmd_t orig_pmd)
 	 * We can only reuse the page if nobody else maps the huge page or it's
 	 * part.
 	 */
+	if (!trylock_page(page)) {
+		get_page(page);
+		spin_unlock(fe->ptl);
+		lock_page(page);
+		spin_lock(fe->ptl);
+		if (unlikely(!pmd_same(*fe->pmd, orig_pmd))) {
+			unlock_page(page);
+			put_page(page);
+			goto out_unlock;
+		}
+		put_page(page);
+	}
+
 	if (page_trans_huge_mapcount(page, NULL) == 1) {
 		pmd_t entry;
 		entry = pmd_mkyoung(orig_pmd);
@@ -1029,8 +1042,10 @@ int do_huge_pmd_wp_page(struct fault_env *fe, pmd_t orig_pmd)
 		if (pmdp_set_access_flags(vma, haddr, fe->pmd, entry, 1))
 			update_mmu_cache_pmd(vma, fe->address, fe->pmd);
 		ret |= VM_FAULT_WRITE;
+		unlock_page(page);
 		goto out_unlock;
 	}
+	unlock_page(page);
 	get_page(page);
 	spin_unlock(fe->ptl);
 alloc:
--
2.30.1
* Re: [PATCH 4.9 STABLE] mm, thp: make do_huge_pmd_wp_page() lock page for testing mapcount
  2021-02-26 16:22 ` [PATCH 4.9 STABLE] mm, thp: make do_huge_pmd_wp_page() lock page for testing mapcount Vlastimil Babka
@ 2021-02-26 19:09 ` Sasha Levin

From: Sasha Levin @ 2021-02-26 19:09 UTC
  To: Vlastimil Babka
  Cc: stable, linux-kernel, linux-mm, Andrea Arcangeli,
	Kirill A. Shutemov, Jann Horn, Linus Torvalds, Michal Hocko,
	Hugh Dickins, Nicolai Stange

On Fri, Feb 26, 2021 at 05:22:00PM +0100, Vlastimil Babka wrote:
>Jann reported [1] a race between __split_huge_pmd_locked() and
>page_trans_huge_map_swapcount() which can result in a page being reused
>instead of COWed. This was later assigned CVE-2020-29368.
>
>This was fixed by commit c444eb564fb1 ("mm: thp: make the THP mapcount
>atomic against __split_huge_pmd_locked()") by doing the split under the
>page lock, while all users of page_trans_huge_map_swapcount() were
>already also under page lock. The fix was backported also to the 4.9
>stable series.
>
>When testing the backport on a 4.12 based kernel, Nicolai noticed the
>POC from [1] still reproduces after backporting c444eb564fb1 and
>identified a missing page lock in do_huge_pmd_wp_page() around the call
>to page_trans_huge_mapcount(). The page lock was only added in
>ba3c4ce6def4 ("mm, THP, swap: make reuse_swap_page() works for THP
>swapped out") in 4.14. That commit also wrapped
>page_trans_huge_mapcount() into page_trans_huge_map_swapcount() for the
>purposes of COW decisions.
>
>I have verified that 4.9.y indeed also reproduces with the POC.
>Backporting ba3c4ce6def4 alone is however not possible, as it's part of
>a larger effort of optimizing THP swapping, which would be risky to
>backport fully.
>
>Therefore this 4.9-stable-only patch just wraps the call to
>page_trans_huge_mapcount() in the page lock the same way as
>ba3c4ce6def4 does, but without the page_trans_huge_map_swapcount()
>part. Other callers of page_trans_huge_mapcount() are all under page
>lock already. I have verified the POC no longer reproduces afterwards.
>
>[1] https://bugs.chromium.org/p/project-zero/issues/detail?id=2045
>
>Reported-by: Nicolai Stange <nstange@suse.de>
>Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Queued up, thanks!

-- 
Thanks,
Sasha