linux-mm.kvack.org archive mirror
* Re: [PATCH 4.9 098/128] mm: thp: make the THP mapcount atomic against __split_huge_pmd_locked()
       [not found] ` <20200619141625.314982137@linuxfoundation.org>
@ 2021-02-15 18:37   ` Vlastimil Babka
  2021-02-26 16:22     ` [PATCH 4.9 STABLE] mm, thp: make do_huge_pmd_wp_page() lock page for testing mapcount Vlastimil Babka
  0 siblings, 1 reply; 3+ messages in thread
From: Vlastimil Babka @ 2021-02-15 18:37 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: stable, Jann Horn, Kirill A. Shutemov, Linus Torvalds,
	Greg Kroah-Hartman, linux-kernel, linux-mm, Jann Horn,
	Nicolai Stange, Michal Hocko

On 6/19/20 4:33 PM, Greg Kroah-Hartman wrote:
> From: Andrea Arcangeli <aarcange@redhat.com>
> 
> commit c444eb564fb16645c172d550359cb3d75fe8a040 upstream.
> 
> Write protect anon page faults require an accurate mapcount to decide
> whether to break COW. This is implemented in the THP path with
> reuse_swap_page() ->
> page_trans_huge_map_swapcount()/page_trans_huge_mapcount().
> 
> If the COW triggers while the other processes sharing the page are
> under a huge pmd split, to do an accurate reading, we must ensure the
> mapcount isn't computed while it's being transferred from the head
> page to the tail pages.
> 
> reuse_swap_page() already runs serialized by the page lock, so it's
> enough to add the page lock around __split_huge_pmd_locked too, in
> order to add the missing serialization.
> 
> Note: the commit in "Fixes" is just to facilitate the backporting,
> because the code before that commit didn't try to do an accurate THP
> mapcount calculation and instead used the page_count() to decide
> whether to COW. Both the page_count and the pin_count are THP-wide
> refcounts, so they're inaccurate if used in
> reuse_swap_page(). Reverting that commit (besides the unrelated fix to
> the local anon_vma assignment) would also have opened the window for
> memory corruption side effects in certain workloads, as documented in
> that commit's header.
> 
> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
> Suggested-by: Jann Horn <jannh@google.com>
> Reported-by: Jann Horn <jannh@google.com>
> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Fixes: 6d0a07edd17c ("mm: thp: calculate the mapcount correctly for THP pages during WP faults")
> Cc: stable@vger.kernel.org
> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Hi, when evaluating this backport for our 4.12 based kernel, Nicolai found
that Jann's POC still triggers, AFAICS because do_huge_pmd_wp_page() doesn't
take the page lock, which was only added by ba3c4ce6def4 ("mm, THP, swap: make
reuse_swap_page() works for THP swapped out") in 4.14. The upstream stable 4.9
series is thus in the same situation (I didn't actually test the POC there, but
it should be obvious), so this is a heads up.

Now just backporting ba3c4ce6def4 to 4.9 stable isn't that simple, as it's
part of a larger series (maybe with even more prerequisites, I didn't check). I'm
considering just taking the part of ba3c4ce6def4 that wraps
page_trans_huge_mapcount() in the page lock (without changing it to
reuse_swap_page() and changing the latter to deal with swapped-out THP) and will
look at it tomorrow. But suggestions (and/or later review) from Andrea/Kirill
are welcome.

Thanks,
Vlastimil

> ---
>  mm/huge_memory.c |   31 ++++++++++++++++++++++++++++---
>  1 file changed, 28 insertions(+), 3 deletions(-)
> 
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -1755,6 +1755,8 @@ void __split_huge_pmd(struct vm_area_str
>  	spinlock_t *ptl;
>  	struct mm_struct *mm = vma->vm_mm;
>  	unsigned long haddr = address & HPAGE_PMD_MASK;
> +	bool was_locked = false;
> +	pmd_t _pmd;
>  
>  	mmu_notifier_invalidate_range_start(mm, haddr, haddr + HPAGE_PMD_SIZE);
>  	ptl = pmd_lock(mm, pmd);
> @@ -1764,11 +1766,32 @@ void __split_huge_pmd(struct vm_area_str
>  	 * pmd against. Otherwise we can end up replacing wrong page.
>  	 */
>  	VM_BUG_ON(freeze && !page);
> -	if (page && page != pmd_page(*pmd))
> -	        goto out;
> +	if (page) {
> +		VM_WARN_ON_ONCE(!PageLocked(page));
> +		was_locked = true;
> +		if (page != pmd_page(*pmd))
> +			goto out;
> +	}
>  
> +repeat:
>  	if (pmd_trans_huge(*pmd)) {
> -		page = pmd_page(*pmd);
> +		if (!page) {
> +			page = pmd_page(*pmd);
> +			if (unlikely(!trylock_page(page))) {
> +				get_page(page);
> +				_pmd = *pmd;
> +				spin_unlock(ptl);
> +				lock_page(page);
> +				spin_lock(ptl);
> +				if (unlikely(!pmd_same(*pmd, _pmd))) {
> +					unlock_page(page);
> +					put_page(page);
> +					page = NULL;
> +					goto repeat;
> +				}
> +				put_page(page);
> +			}
> +		}
>  		if (PageMlocked(page))
>  			clear_page_mlock(page);
>  	} else if (!pmd_devmap(*pmd))
> @@ -1776,6 +1799,8 @@ void __split_huge_pmd(struct vm_area_str
>  	__split_huge_pmd_locked(vma, pmd, haddr, freeze);
>  out:
>  	spin_unlock(ptl);
> +	if (!was_locked && page)
> +		unlock_page(page);
>  	mmu_notifier_invalidate_range_end(mm, haddr, haddr + HPAGE_PMD_SIZE);
>  }
>  
> 
> 
> 
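The trylock/drop/relock dance in the hunk above is a general pattern: a lock that must be taken first in the locking order (the page lock) is needed while a later lock (the pmd spinlock) is already held. Below is a minimal userspace sketch of the same pattern, assuming pthread mutexes as stand-ins for both locks; all names here are illustrative, not kernel APIs:

```c
#include <pthread.h>
#include <stdbool.h>

/* "ptl" plays the pmd spinlock, "page_lock" plays the page lock, and
 * "pmd_val" is the state that must be revalidated after ptl has been
 * temporarily dropped. */
static pthread_mutex_t ptl = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t page_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned long pmd_val = 42;

/* Called with ptl held.  Returns with both ptl and page_lock held and
 * pmd_val revalidated; loops until that state is reached. */
static bool lock_page_and_revalidate(void)
{
    for (;;) {
        /* Fast path: page lock is free, grab it without dropping ptl. */
        if (pthread_mutex_trylock(&page_lock) == 0)
            return true;

        /* Slow path: snapshot the pmd, drop ptl (lock order demands
         * the page lock be taken first), then reacquire both. */
        unsigned long snapshot = pmd_val;
        pthread_mutex_unlock(&ptl);
        pthread_mutex_lock(&page_lock);
        pthread_mutex_lock(&ptl);

        if (pmd_val == snapshot)
            return true;        /* nothing changed while unlocked */

        /* The pmd changed under us: drop the page lock and retry,
         * like the "goto repeat" in the patch. */
        pthread_mutex_unlock(&page_lock);
    }
}
```

The get_page()/put_page() pair in the real patch has no analogue here; it only keeps the struct page alive across the window where no lock is held.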




* [PATCH 4.9 STABLE] mm, thp: make do_huge_pmd_wp_page() lock page for testing mapcount
  2021-02-15 18:37   ` [PATCH 4.9 098/128] mm: thp: make the THP mapcount atomic against __split_huge_pmd_locked() Vlastimil Babka
@ 2021-02-26 16:22     ` Vlastimil Babka
  2021-02-26 19:09       ` Sasha Levin
  0 siblings, 1 reply; 3+ messages in thread
From: Vlastimil Babka @ 2021-02-26 16:22 UTC (permalink / raw)
  To: stable
  Cc: linux-kernel, linux-mm, Andrea Arcangeli, Kirill A. Shutemov,
	Jann Horn, Linus Torvalds, Michal Hocko, Hugh Dickins,
	Vlastimil Babka, Nicolai Stange

Jann reported [1] a race between __split_huge_pmd_locked() and
page_trans_huge_map_swapcount() which can result in a page being reused
instead of COWed. This was later assigned CVE-2020-29368.

This was fixed by commit c444eb564fb1 ("mm: thp: make the THP mapcount atomic
against __split_huge_pmd_locked()") by doing the split under the page lock,
while all users of page_trans_huge_map_swapcount() were already also under page
lock. The fix was also backported to the 4.9 stable series.

When testing the backport on a 4.12 based kernel, Nicolai noticed the POC from
[1] still reproduces after backporting c444eb564fb1 and identified a missing
page lock in do_huge_pmd_wp_page() around the call to
page_trans_huge_mapcount(). The page lock was only added in ba3c4ce6def4 ("mm,
THP, swap: make reuse_swap_page() works for THP swapped out") in 4.14. The
commit also wrapped page_trans_huge_mapcount() into
page_trans_huge_map_swapcount() for the purposes of COW decisions.

I have verified that 4.9.y indeed also reproduces with the POC. Backporting
ba3c4ce6def4 alone however is not possible as it's part of a larger effort of
optimizing THP swapping, which would be risky to backport fully.

Therefore this 4.9-stable-only patch just wraps the page_trans_huge_mapcount()
call in do_huge_pmd_wp_page() under the page lock the same way as ba3c4ce6def4
does, but without the page_trans_huge_map_swapcount() part. Other callers
of page_trans_huge_mapcount() are all under the page lock already. I have
verified that the POC no longer reproduces afterwards.

[1] https://bugs.chromium.org/p/project-zero/issues/detail?id=2045

Reported-by: Nicolai Stange <nstange@suse.de>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/huge_memory.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 05ca01ef97f7..14cd0ef33b62 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1022,6 +1022,19 @@ int do_huge_pmd_wp_page(struct fault_env *fe, pmd_t orig_pmd)
 	 * We can only reuse the page if nobody else maps the huge page or it's
 	 * part.
 	 */
+	if (!trylock_page(page)) {
+		get_page(page);
+		spin_unlock(fe->ptl);
+		lock_page(page);
+		spin_lock(fe->ptl);
+		if (unlikely(!pmd_same(*fe->pmd, orig_pmd))) {
+			unlock_page(page);
+			put_page(page);
+			goto out_unlock;
+		}
+		put_page(page);
+	}
+
 	if (page_trans_huge_mapcount(page, NULL) == 1) {
 		pmd_t entry;
 		entry = pmd_mkyoung(orig_pmd);
@@ -1029,8 +1042,10 @@ int do_huge_pmd_wp_page(struct fault_env *fe, pmd_t orig_pmd)
 		if (pmdp_set_access_flags(vma, haddr, fe->pmd, entry,  1))
 			update_mmu_cache_pmd(vma, fe->address, fe->pmd);
 		ret |= VM_FAULT_WRITE;
+		unlock_page(page);
 		goto out_unlock;
 	}
+	unlock_page(page);
 	get_page(page);
 	spin_unlock(fe->ptl);
 alloc:
-- 
2.30.1
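To illustrate why the mapcount read itself must sit under the page lock, here is a toy userspace model of the invariant the patch restores (illustrative names only, not kernel code or kernel APIs): the mapcount is kept on the head page and transferred to the tail pages during a split, and a reader that runs mid-transfer could see an inconsistent value.

```c
#include <pthread.h>

#define NR_TAILS 8

/* A toy THP: one head count plus per-tail counts, guarded by a single
 * mutex that plays the role of the page lock. */
struct toy_thp {
    pthread_mutex_t lock;
    int head_mapcount;
    int tail_mapcount[NR_TAILS];
};

/* Transfer the head's mapcount to every tail, as a pmd split would. */
static void toy_split(struct toy_thp *p)
{
    pthread_mutex_lock(&p->lock);
    for (int i = 0; i < NR_TAILS; i++)
        p->tail_mapcount[i] = p->head_mapcount;
    p->head_mapcount = 0;
    pthread_mutex_unlock(&p->lock);
}

/* Read the largest per-page mapcount under the same lock, so the read
 * can never interleave with a half-finished toy_split(). */
static int toy_mapcount(struct toy_thp *p)
{
    pthread_mutex_lock(&p->lock);
    int max = p->head_mapcount;
    for (int i = 0; i < NR_TAILS; i++)
        if (p->tail_mapcount[i] > max)
            max = p->tail_mapcount[i];
    pthread_mutex_unlock(&p->lock);
    return max;
}
```

With the lock, the observable mapcount of a twice-mapped page stays 2 across a split; without it, a reader could see the head already zeroed before any tail was updated and wrongly conclude the page is exclusively mapped, which is exactly the COW-reuse bug.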




* Re: [PATCH 4.9 STABLE] mm, thp: make do_huge_pmd_wp_page() lock page for testing mapcount
  2021-02-26 16:22     ` [PATCH 4.9 STABLE] mm, thp: make do_huge_pmd_wp_page() lock page for testing mapcount Vlastimil Babka
@ 2021-02-26 19:09       ` Sasha Levin
  0 siblings, 0 replies; 3+ messages in thread
From: Sasha Levin @ 2021-02-26 19:09 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: stable, linux-kernel, linux-mm, Andrea Arcangeli,
	Kirill A. Shutemov, Jann Horn, Linus Torvalds, Michal Hocko,
	Hugh Dickins, Nicolai Stange

On Fri, Feb 26, 2021 at 05:22:00PM +0100, Vlastimil Babka wrote:
>Jann reported [1] a race between __split_huge_pmd_locked() and
>page_trans_huge_map_swapcount() which can result in a page being reused
>instead of COWed. This was later assigned CVE-2020-29368.
>
>This was fixed by commit c444eb564fb1 ("mm: thp: make the THP mapcount atomic
>against __split_huge_pmd_locked()") by doing the split under the page lock,
>while all users of page_trans_huge_map_swapcount() were already also under page
>lock. The fix was also backported to the 4.9 stable series.
>
>When testing the backport on a 4.12 based kernel, Nicolai noticed the POC from
>[1] still reproduces after backporting c444eb564fb1 and identified a missing
>page lock in do_huge_pmd_wp_page() around the call to
>page_trans_huge_mapcount(). The page lock was only added in ba3c4ce6def4 ("mm,
>THP, swap: make reuse_swap_page() works for THP swapped out") in 4.14. The
>commit also wrapped page_trans_huge_mapcount() into
>page_trans_huge_map_swapcount() for the purposes of COW decisions.
>
>I have verified that 4.9.y indeed also reproduces with the POC. Backporting
>ba3c4ce6def4 alone however is not possible as it's part of a larger effort of
>optimizing THP swapping, which would be risky to backport fully.
>
>Therefore this 4.9-stable-only patch just wraps the page_trans_huge_mapcount()
>call in do_huge_pmd_wp_page() under the page lock the same way as ba3c4ce6def4
>does, but without the page_trans_huge_map_swapcount() part. Other callers
>of page_trans_huge_mapcount() are all under the page lock already. I have
>verified that the POC no longer reproduces afterwards.
>
>[1] https://bugs.chromium.org/p/project-zero/issues/detail?id=2045
>
>Reported-by: Nicolai Stange <nstange@suse.de>
>Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Queued up, thanks!

-- 
Thanks,
Sasha


