Linux-mm Archive on lore.kernel.org
* [PATCH v2 0/8] Backported fixes for 4.4 stable tree
@ 2019-10-09  0:44 Ajay Kaher
  2019-10-09  0:44 ` [PATCH v2 1/8] mm: make page ref count overflow check tighter and more explicit Ajay Kaher
                   ` (7 more replies)
  0 siblings, 8 replies; 11+ messages in thread
From: Ajay Kaher @ 2019-10-09  0:44 UTC (permalink / raw)
  To: gregkh
  Cc: torvalds, punit.agrawal, akpm, kirill.shutemov, willy,
	will.deacon, mszeredi, stable, linux-mm, linux-kernel, srivatsab,
	srivatsa, amakhalov, srinidhir, bvikas, anishs, vsirnapalli,
	srostedt, akaher

These patches include a few backported fixes for the 4.4 stable
tree.
I would appreciate it if you could consider including them in the
next release.

Ajay

---

[Changes from v1]: No code changes; this version only answers Greg's
queries, quoted below:

>> Why are these needed?  From what I remember, the last patch here is only
>> needed for machines that are "HUGE" and for those, you shouldn't be
>> using 4.4.y anymore anyway, right?  You just end up saving so much more
>> speed and energy using a newer kernel, why would you want to waste it
>> using an older one?
>>
>> So I need a really good reason why to accept these :)
>
> It's been a week, so I'm dropping this from my queue now.  Please resend
> with this information if you still want these in the tree.

> thanks,
> greg k-h

Indeed, the machine needs to have about 140 GB of RAM to exploit
this vulnerability (CVE-2019-11487). However, Photon OS doesn't
impose any limits on the amount of RAM that it supports, so we would
like to safeguard the kernel against this CVE. Also, while newer
versions of Photon OS are on more recent kernels, Photon OS 1.0 uses
the 4.4 stable series, so it would be great to get these patches
included in an upcoming 4.4 stable release.
    
In particular, we would like to have the following patches, which matter
on such "huge" machines:
    
Patch 1: Introduces page_ref_zero_or_close_to_overflow(), which checks
for small underflows (or counts _very_ close to overflowing) and ignores
overflows which have strayed into negative territory. It is used in the
VM_BUG_ON_PAGE() checks inside get_page() and get_page_foll() to reduce
the possibility of overflowing.
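
To illustrate the check in Patch 1, here is a minimal userspace sketch of
the same unsigned-arithmetic trick. It assumes a plain int in place of the
kernel's atomic page->_count; ref_zero_or_close_to_overflow() and
check_boundaries() are names made up for this example and are not part of
the patch.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/*
 * Userspace model of page_ref_zero_or_close_to_overflow(): adding 127
 * in unsigned arithmetic maps the counts -127..0 onto 0..127, so one
 * unsigned comparison covers the whole window.
 */
static bool ref_zero_or_close_to_overflow(int count)
{
	return (unsigned int)count + 127u <= 127u;
}

/* Boundary cases of the check (hypothetical helper for this sketch). */
static void check_boundaries(void)
{
	assert(ref_zero_or_close_to_overflow(0));        /* zero */
	assert(ref_zero_or_close_to_overflow(-1));       /* small underflow */
	assert(ref_zero_or_close_to_overflow(-127));     /* edge of the window */
	assert(!ref_zero_or_close_to_overflow(-128));    /* just outside it */
	assert(!ref_zero_or_close_to_overflow(1));       /* normal reference */
	assert(!ref_zero_or_close_to_overflow(INT_MAX)); /* high but positive */
	assert(!ref_zero_or_close_to_overflow(INT_MIN)); /* deep in negative territory */
}
```

Unlike the old "<= 0" test, a count far into negative territory does not
trip this check, which leaves that range to the sign test in the
try_get_page() style helpers.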
    
Patch 6: An attacker could do direct IO on a page multiple times to
trigger an overflow. This patch makes get_user_pages() refuse to take a
new page reference if there are already more than two billion references
to the page.
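
The idea behind Patch 6 can be sketched in userspace as follows. This is
only a model under stated assumptions: struct fake_page,
try_get_fake_page() and unguarded_add() are invented names, and a plain
unsigned int stands in for the atomic page->_count.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct page; holds the refcount bit pattern. */
struct fake_page {
	unsigned int count;
};

/*
 * Take a reference only while the count, viewed as a signed int, is
 * positive.  Values above INT_MAX correspond to a negative signed count.
 */
static bool try_get_fake_page(struct fake_page *p)
{
	if (p->count == 0 || p->count > (unsigned int)INT_MAX)
		return false;
	p->count++;
	return true;
}

/* Without a guard the count simply wraps modulo 2^32. */
static unsigned int unguarded_add(unsigned int count, unsigned int refs)
{
	return count + refs;
}
```

In this model, a page at INT_MAX references accepts one more reference
and then refuses further ones, while the unguarded addition of UINT_MAX
references to a count of 1 wraps to 0, the state in which a real page
would be freed.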
    
Patch 8: This removes another mechanism for overflowing the page refcount
inside pipe_buf_get().
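
A rough sketch of the caller-side pattern behind Patch 8, again as a
userspace model. All names here (fake_buf, fake_buf_ops, fake_buf_get(),
fake_dup_buf()) are hypothetical and only mirror the shape of the change:
the get operation reports failure, and the caller must not copy the
buffer when it does.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

struct fake_buf;

/* Hypothetical counterpart of pipe_buf_operations with a bool-returning get. */
struct fake_buf_ops {
	bool (*get)(struct fake_buf *buf);
};

struct fake_buf {
	const struct fake_buf_ops *ops;
	unsigned int *page_refs;	/* refcount of the backing page, shared by copies */
};

/* Refuse the reference once the signed view of the count is not positive. */
static bool fake_generic_get(struct fake_buf *buf)
{
	if (*buf->page_refs == 0 || *buf->page_refs > (unsigned int)INT_MAX)
		return false;
	(*buf->page_refs)++;
	return true;
}

static const struct fake_buf_ops fake_ops = { .get = fake_generic_get };

static bool fake_buf_get(struct fake_buf *buf)
{
	return buf->ops->get(buf);
}

/* Duplicate a buffer only if the extra reference was actually taken. */
static bool fake_dup_buf(struct fake_buf *in, struct fake_buf *out)
{
	if (!fake_buf_get(in))
		return false;
	*out = *in;
	return true;
}
```

The point of the design is that every caller sees the failure; a void
get operation would leave no way to stop the copy once the count is
saturated.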
    
---

[PATCH v2 1/8]:
Backporting of upstream commit f958d7b528b1:
mm: make page ref count overflow check tighter and more explicit

[PATCH v2 2/8]:
Backporting of upstream commit 88b1a17dfc3e:
mm: add 'try_get_page()' helper function

[PATCH v2 3/8]:
Backporting of upstream commit 7aef4172c795:
mm: handle PTE-mapped tail pages in gerneric fast gup implementaiton

[PATCH v2 4/8]:
Backporting of upstream commit a3e328556d41:
mm, gup: remove broken VM_BUG_ON_PAGE compound check for hugepages

[PATCH v2 5/8]:
Backporting of upstream commit d63206ee32b6:
mm, gup: ensure real head page is ref-counted when using hugepages

[PATCH v2 6/8]:
Backporting of upstream commit 8fde12ca79af:
mm: prevent get_user_pages() from overflowing page refcount

[PATCH v2 7/8]:
Backporting of upstream commit 7bf2d1df8082:
pipe: add pipe_buf_get() helper

[PATCH v2 8/8]:
Backporting of upstream commit 15fab63e1e57:
fs: prevent page refcount overflow in pipe_buf_get




* [PATCH v2 1/8] mm: make page ref count overflow check tighter and more explicit
  2019-10-09  0:44 [PATCH v2 0/8] Backported fixes for 4.4 stable tree Ajay Kaher
@ 2019-10-09  0:44 ` Ajay Kaher
  2019-10-09  0:44 ` [PATCH v2 2/8] mm: add 'try_get_page()' helper function Ajay Kaher
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 11+ messages in thread
From: Ajay Kaher @ 2019-10-09  0:44 UTC (permalink / raw)
  To: gregkh
  Cc: torvalds, punit.agrawal, akpm, kirill.shutemov, willy,
	will.deacon, mszeredi, stable, linux-mm, linux-kernel, srivatsab,
	srivatsa, amakhalov, srinidhir, bvikas, anishs, vsirnapalli,
	srostedt, akaher, Jann Horn, stable

From: Linus Torvalds <torvalds@linux-foundation.org>

commit f958d7b528b1b40c44cfda5eabe2d82760d868c3 upstream.

We have a VM_BUG_ON() to check that the page reference count doesn't
underflow (or get close to overflow) by checking the sign of the count.

That's all fine, but we actually want to allow people to use a "get page
ref unless it's already very high" helper function, and we want that one
to use the sign of the page ref (without triggering this VM_BUG_ON).

Change the VM_BUG_ON to only check for small underflows (or _very_ close
to overflowing), and ignore overflows which have strayed into negative
territory.

Acked-by: Matthew Wilcox <willy@infradead.org>
Cc: Jann Horn <jannh@google.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[ 4.4.y backport notes:
  Ajay: Open-coded atomic refcount access due to missing
  page_ref_count() helper in 4.4.y
  Srivatsa: Added overflow check to get_page_foll() and related code. ]
Signed-off-by: Srivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu>
Signed-off-by: Ajay Kaher <akaher@vmware.com>
---
 include/linux/mm.h | 6 +++++-
 mm/internal.h      | 5 +++--
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index ed653ba..701088e 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -488,6 +488,10 @@ static inline void get_huge_page_tail(struct page *page)
 
 extern bool __get_page_tail(struct page *page);
 
+/* 127: arbitrary random number, small enough to assemble well */
+#define page_ref_zero_or_close_to_overflow(page) \
+	((unsigned int) atomic_read(&page->_count) + 127u <= 127u)
+
 static inline void get_page(struct page *page)
 {
 	if (unlikely(PageTail(page)))
@@ -497,7 +501,7 @@ static inline void get_page(struct page *page)
 	 * Getting a normal page or the head of a compound page
 	 * requires to already have an elevated page->_count.
 	 */
-	VM_BUG_ON_PAGE(atomic_read(&page->_count) <= 0, page);
+	VM_BUG_ON_PAGE(page_ref_zero_or_close_to_overflow(page), page);
 	atomic_inc(&page->_count);
 }
 
diff --git a/mm/internal.h b/mm/internal.h
index f63f439..67015e5 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -81,7 +81,8 @@ static inline void __get_page_tail_foll(struct page *page,
 	 * speculative page access (like in
 	 * page_cache_get_speculative()) on tail pages.
 	 */
-	VM_BUG_ON_PAGE(atomic_read(&compound_head(page)->_count) <= 0, page);
+	VM_BUG_ON_PAGE(page_ref_zero_or_close_to_overflow(compound_head(page)),
+		       page);
 	if (get_page_head)
 		atomic_inc(&compound_head(page)->_count);
 	get_huge_page_tail(page);
@@ -106,7 +107,7 @@ static inline void get_page_foll(struct page *page)
 		 * Getting a normal page or the head of a compound page
 		 * requires to already have an elevated page->_count.
 		 */
-		VM_BUG_ON_PAGE(atomic_read(&page->_count) <= 0, page);
+		VM_BUG_ON_PAGE(page_ref_zero_or_close_to_overflow(page), page);
 		atomic_inc(&page->_count);
 	}
 }
-- 
2.7.4




* [PATCH v2 2/8] mm: add 'try_get_page()' helper function
  2019-10-09  0:44 [PATCH v2 0/8] Backported fixes for 4.4 stable tree Ajay Kaher
  2019-10-09  0:44 ` [PATCH v2 1/8] mm: make page ref count overflow check tighter and more explicit Ajay Kaher
@ 2019-10-09  0:44 ` Ajay Kaher
  2019-10-09  0:44 ` [PATCH v2 3/8] mm: handle PTE-mapped tail pages in gerneric fast gup implementaiton Ajay Kaher
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 11+ messages in thread
From: Ajay Kaher @ 2019-10-09  0:44 UTC (permalink / raw)
  To: gregkh
  Cc: torvalds, punit.agrawal, akpm, kirill.shutemov, willy,
	will.deacon, mszeredi, stable, linux-mm, linux-kernel, srivatsab,
	srivatsa, amakhalov, srinidhir, bvikas, anishs, vsirnapalli,
	srostedt, akaher, Jann Horn, stable

From: Linus Torvalds <torvalds@linux-foundation.org>

commit 88b1a17dfc3ed7728316478fae0f5ad508f50397 upstream.

This is the same as the traditional 'get_page()' function, but instead
of unconditionally incrementing the reference count of the page, it only
does so if the count was "safe".  It returns whether the reference count
was incremented (and is marked __must_check, since the caller obviously
has to be aware of it).

Also like 'get_page()', you can't use this function unless you already
had a reference to the page.  The intent is that you can use this
exactly like get_page(), but in situations where you want to limit the
maximum reference count.

The code currently does an unconditional WARN_ON_ONCE() if we ever hit
the reference count issues (either zero or negative), as a notification
that the conditional non-increment actually happened.

NOTE! The count access for the "safety" check is inherently racy, but
that doesn't matter since the buffer we use is basically half the range
of the reference count (ie we look at the sign of the count).

Acked-by: Matthew Wilcox <willy@infradead.org>
Cc: Jann Horn <jannh@google.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[ 4.4.y backport notes:
  Srivatsa:
  - Adapted try_get_page() to match the get_page()
    implementation in 4.4.y, except for the refcount check.
  - Added try_get_page_foll() which will be needed
    in a subsequent patch. ]
Signed-off-by: Srivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu>
Signed-off-by: Ajay Kaher <akaher@vmware.com>
---
 include/linux/mm.h | 12 ++++++++++++
 mm/internal.h      | 23 +++++++++++++++++++++++
 2 files changed, 35 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 701088e..52edaf1 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -505,6 +505,18 @@ static inline void get_page(struct page *page)
 	atomic_inc(&page->_count);
 }
 
+static inline __must_check bool try_get_page(struct page *page)
+{
+	if (unlikely(PageTail(page)))
+		if (likely(__get_page_tail(page)))
+			return true;
+
+	if (WARN_ON_ONCE(atomic_read(&page->_count) <= 0))
+		return false;
+	atomic_inc(&page->_count);
+	return true;
+}
+
 static inline struct page *virt_to_head_page(const void *x)
 {
 	struct page *page = virt_to_page(x);
diff --git a/mm/internal.h b/mm/internal.h
index 67015e5..d83afc9 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -112,6 +112,29 @@ static inline void get_page_foll(struct page *page)
 	}
 }
 
+static inline __must_check bool try_get_page_foll(struct page *page)
+{
+	if (unlikely(PageTail(page))) {
+		if (WARN_ON_ONCE(atomic_read(&compound_head(page)->_count) <= 0))
+			return false;
+		/*
+		 * This is safe only because
+		 * __split_huge_page_refcount() can't run under
+		 * get_page_foll() because we hold the proper PT lock.
+		 */
+		__get_page_tail_foll(page, true);
+	} else {
+		/*
+		 * Getting a normal page or the head of a compound page
+		 * requires to already have an elevated page->_count.
+		 */
+		if (WARN_ON_ONCE(atomic_read(&page->_count) <= 0))
+			return false;
+		atomic_inc(&page->_count);
+	}
+	return true;
+}
+
 extern unsigned long highest_memmap_pfn;
 
 /*
-- 
2.7.4




* [PATCH v2 3/8] mm: handle PTE-mapped tail pages in gerneric fast gup implementaiton
  2019-10-09  0:44 [PATCH v2 0/8] Backported fixes for 4.4 stable tree Ajay Kaher
  2019-10-09  0:44 ` [PATCH v2 1/8] mm: make page ref count overflow check tighter and more explicit Ajay Kaher
  2019-10-09  0:44 ` [PATCH v2 2/8] mm: add 'try_get_page()' helper function Ajay Kaher
@ 2019-10-09  0:44 ` Ajay Kaher
  2019-10-09  0:44 ` [PATCH v2 4/8] mm, gup: remove broken VM_BUG_ON_PAGE compound check for hugepages Ajay Kaher
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 11+ messages in thread
From: Ajay Kaher @ 2019-10-09  0:44 UTC (permalink / raw)
  To: gregkh
  Cc: torvalds, punit.agrawal, akpm, kirill.shutemov, willy,
	will.deacon, mszeredi, stable, linux-mm, linux-kernel, srivatsab,
	srivatsa, amakhalov, srinidhir, bvikas, anishs, vsirnapalli,
	srostedt, akaher, Andrea Arcangeli, Hugh Dickins, Dave Hansen,
	Mel Gorman, Rik van Riel, Naoya Horiguchi, Steve Capper,
	Johannes Weiner, Michal Hocko, Christoph Lameter, David Rientjes

From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>

commit 7aef4172c7957d7e65fc172be4c99becaef855d4 upstream.

With new refcounting we are going to see THP tail pages mapped with PTE.
Generic fast GUP rely on page_cache_get_speculative() to obtain
reference on page.  page_cache_get_speculative() always fails on tail
pages, because ->_count on tail pages is always zero.

Let's handle tail pages in gup_pte_range().

New split_huge_page() will rely on migration entries to freeze page's
counts.  Recheck PTE value after page_cache_get_speculative() on head
page should be enough to serialize against split.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Tested-by: Sasha Levin <sasha.levin@oracle.com>
Tested-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Jerome Marchand <jmarchan@redhat.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Steve Capper <steve.capper@linaro.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ajay Kaher <akaher@vmware.com>
---
 mm/gup.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index 2cd3b31..45c544b 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1070,7 +1070,7 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
 		 * for an example see gup_get_pte in arch/x86/mm/gup.c
 		 */
 		pte_t pte = READ_ONCE(*ptep);
-		struct page *page;
+		struct page *head, *page;
 
 		/*
 		 * Similar to the PMD case below, NUMA hinting must take slow
@@ -1082,15 +1082,17 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
 
 		VM_BUG_ON(!pfn_valid(pte_pfn(pte)));
 		page = pte_page(pte);
+		head = compound_head(page);
 
-		if (!page_cache_get_speculative(page))
+		if (!page_cache_get_speculative(head))
 			goto pte_unmap;
 
 		if (unlikely(pte_val(pte) != pte_val(*ptep))) {
-			put_page(page);
+			put_page(head);
 			goto pte_unmap;
 		}
 
+		VM_BUG_ON_PAGE(compound_head(page) != head, page);
 		pages[*nr] = page;
 		(*nr)++;
 
-- 
2.7.4




* [PATCH v2 4/8] mm, gup: remove broken VM_BUG_ON_PAGE compound check for hugepages
  2019-10-09  0:44 [PATCH v2 0/8] Backported fixes for 4.4 stable tree Ajay Kaher
                   ` (2 preceding siblings ...)
  2019-10-09  0:44 ` [PATCH v2 3/8] mm: handle PTE-mapped tail pages in gerneric fast gup implementaiton Ajay Kaher
@ 2019-10-09  0:44 ` Ajay Kaher
  2019-10-09  0:44 ` [PATCH v2 5/8] mm, gup: ensure real head page is ref-counted when using hugepages Ajay Kaher
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 11+ messages in thread
From: Ajay Kaher @ 2019-10-09  0:44 UTC (permalink / raw)
  To: gregkh
  Cc: torvalds, punit.agrawal, akpm, kirill.shutemov, willy,
	will.deacon, mszeredi, stable, linux-mm, linux-kernel, srivatsab,
	srivatsa, amakhalov, srinidhir, bvikas, anishs, vsirnapalli,
	srostedt, akaher, Aneesh Kumar K . V, Catalin Marinas,
	Naoya Horiguchi, Mark Rutland, Hillf Danton, Michal Hocko,
	Mike Kravetz

From: Will Deacon <will.deacon@arm.com>

commit a3e328556d41bb61c55f9dfcc62d6a826ea97b85 upstream.

When operating on hugepages with DEBUG_VM enabled, the GUP code checks
the compound head for each tail page prior to calling
page_cache_add_speculative.  This is broken, because on the fast-GUP
path (where we don't hold any page table locks) we can be racing with a
concurrent invocation of split_huge_page_to_list.

split_huge_page_to_list deals with this race by using page_ref_freeze to
freeze the page and force concurrent GUPs to fail whilst the component
pages are modified.  This modification includes clearing the
compound_head field for the tail pages, so checking this prior to a
successful call to page_cache_add_speculative can lead to false
positives: In fact, page_cache_add_speculative *already* has this check
once the page refcount has been successfully updated, so we can simply
remove the broken calls to VM_BUG_ON_PAGE.

Link: http://lkml.kernel.org/r/20170522133604.11392-2-punit.agrawal@arm.com
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
Acked-by: Steve Capper <steve.capper@arm.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Srivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu>
Signed-off-by: Ajay Kaher <akaher@vmware.com>
---
 mm/gup.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index 45c544b..6e7cfaa 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1136,7 +1136,6 @@ static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr,
 	page = head + ((addr & ~PMD_MASK) >> PAGE_SHIFT);
 	tail = page;
 	do {
-		VM_BUG_ON_PAGE(compound_head(page) != head, page);
 		pages[*nr] = page;
 		(*nr)++;
 		page++;
@@ -1183,7 +1182,6 @@ static int gup_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr,
 	page = head + ((addr & ~PUD_MASK) >> PAGE_SHIFT);
 	tail = page;
 	do {
-		VM_BUG_ON_PAGE(compound_head(page) != head, page);
 		pages[*nr] = page;
 		(*nr)++;
 		page++;
@@ -1226,7 +1224,6 @@ static int gup_huge_pgd(pgd_t orig, pgd_t *pgdp, unsigned long addr,
 	page = head + ((addr & ~PGDIR_MASK) >> PAGE_SHIFT);
 	tail = page;
 	do {
-		VM_BUG_ON_PAGE(compound_head(page) != head, page);
 		pages[*nr] = page;
 		(*nr)++;
 		page++;
-- 
2.7.4




* [PATCH v2 5/8] mm, gup: ensure real head page is ref-counted when using hugepages
  2019-10-09  0:44 [PATCH v2 0/8] Backported fixes for 4.4 stable tree Ajay Kaher
                   ` (3 preceding siblings ...)
  2019-10-09  0:44 ` [PATCH v2 4/8] mm, gup: remove broken VM_BUG_ON_PAGE compound check for hugepages Ajay Kaher
@ 2019-10-09  0:44 ` Ajay Kaher
  2019-10-09  0:44 ` [PATCH v2 6/8] mm: prevent get_user_pages() from overflowing page refcount Ajay Kaher
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 11+ messages in thread
From: Ajay Kaher @ 2019-10-09  0:44 UTC (permalink / raw)
  To: gregkh
  Cc: torvalds, punit.agrawal, akpm, kirill.shutemov, willy,
	will.deacon, mszeredi, stable, linux-mm, linux-kernel, srivatsab,
	srivatsa, amakhalov, srinidhir, bvikas, anishs, vsirnapalli,
	srostedt, akaher, Michal Hocko, Aneesh Kumar K . V,
	Catalin Marinas, Naoya Horiguchi, Mark Rutland, Hillf Danton,
	Mike Kravetz

From: Punit Agrawal <punit.agrawal@arm.com>

commit d63206ee32b6e64b0e12d46e5d6004afd9913713 upstream.

When speculatively taking references to a hugepage using
page_cache_add_speculative() in gup_huge_pmd(), it is assumed that the
page returned by pmd_page() is the head page.  Although normally true,
this assumption doesn't hold when the hugepage comprises of successive
page table entries such as when using contiguous bit on arm64 at PTE or
PMD levels.

This can be addressed by ensuring that the page passed to
page_cache_add_speculative() is the real head or by de-referencing the
head page within the function.

We take the first approach to keep the usage pattern aligned with
page_cache_get_speculative() where users already pass the appropriate
page, i.e., the de-referenced head.

Apply the same logic to fix gup_huge_[pud|pgd]() as well.

[punit.agrawal@arm.com: fix arm64 ltp failure]
  Link: http://lkml.kernel.org/r/20170619170145.25577-5-punit.agrawal@arm.com
Link: http://lkml.kernel.org/r/20170522133604.11392-3-punit.agrawal@arm.com
Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
Acked-by: Steve Capper <steve.capper@arm.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ajay Kaher <akaher@vmware.com>
Reviewed-by: Srivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu>
---
 mm/gup.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index 6e7cfaa..fae4d1e 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1132,8 +1132,7 @@ static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr,
 		return 0;
 
 	refs = 0;
-	head = pmd_page(orig);
-	page = head + ((addr & ~PMD_MASK) >> PAGE_SHIFT);
+	page = pmd_page(orig) + ((addr & ~PMD_MASK) >> PAGE_SHIFT);
 	tail = page;
 	do {
 		pages[*nr] = page;
@@ -1142,6 +1141,7 @@ static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr,
 		refs++;
 	} while (addr += PAGE_SIZE, addr != end);
 
+	head = compound_head(pmd_page(orig));
 	if (!page_cache_add_speculative(head, refs)) {
 		*nr -= refs;
 		return 0;
@@ -1178,8 +1178,7 @@ static int gup_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr,
 		return 0;
 
 	refs = 0;
-	head = pud_page(orig);
-	page = head + ((addr & ~PUD_MASK) >> PAGE_SHIFT);
+	page = pud_page(orig) + ((addr & ~PUD_MASK) >> PAGE_SHIFT);
 	tail = page;
 	do {
 		pages[*nr] = page;
@@ -1188,6 +1187,7 @@ static int gup_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr,
 		refs++;
 	} while (addr += PAGE_SIZE, addr != end);
 
+	head = compound_head(pud_page(orig));
 	if (!page_cache_add_speculative(head, refs)) {
 		*nr -= refs;
 		return 0;
@@ -1220,8 +1220,7 @@ static int gup_huge_pgd(pgd_t orig, pgd_t *pgdp, unsigned long addr,
 		return 0;
 
 	refs = 0;
-	head = pgd_page(orig);
-	page = head + ((addr & ~PGDIR_MASK) >> PAGE_SHIFT);
+	page = pgd_page(orig) + ((addr & ~PGDIR_MASK) >> PAGE_SHIFT);
 	tail = page;
 	do {
 		pages[*nr] = page;
@@ -1230,6 +1229,7 @@ static int gup_huge_pgd(pgd_t orig, pgd_t *pgdp, unsigned long addr,
 		refs++;
 	} while (addr += PAGE_SIZE, addr != end);
 
+	head = compound_head(pgd_page(orig));
 	if (!page_cache_add_speculative(head, refs)) {
 		*nr -= refs;
 		return 0;
-- 
2.7.4




* [PATCH v2 6/8] mm: prevent get_user_pages() from overflowing page refcount
  2019-10-09  0:44 [PATCH v2 0/8] Backported fixes for 4.4 stable tree Ajay Kaher
                   ` (4 preceding siblings ...)
  2019-10-09  0:44 ` [PATCH v2 5/8] mm, gup: ensure real head page is ref-counted when using hugepages Ajay Kaher
@ 2019-10-09  0:44 ` Ajay Kaher
  2019-10-09 13:13   ` Vlastimil Babka
  2019-10-09  0:44 ` [PATCH v2 7/8] pipe: add pipe_buf_get() helper Ajay Kaher
  2019-10-09  0:44 ` [PATCH v2 8/8] fs: prevent page refcount overflow in pipe_buf_get Ajay Kaher
  7 siblings, 1 reply; 11+ messages in thread
From: Ajay Kaher @ 2019-10-09  0:44 UTC (permalink / raw)
  To: gregkh
  Cc: torvalds, punit.agrawal, akpm, kirill.shutemov, willy,
	will.deacon, mszeredi, stable, linux-mm, linux-kernel, srivatsab,
	srivatsa, amakhalov, srinidhir, bvikas, anishs, vsirnapalli,
	srostedt, akaher, stable

From: Linus Torvalds <torvalds@linux-foundation.org>

commit 8fde12ca79aff9b5ba951fce1a2641901b8d8e64 upstream.

If the page refcount wraps around past zero, it will be freed while
there are still four billion references to it.  One of the possible
avenues for an attacker to try to make this happen is by doing direct IO
on a page multiple times.  This patch makes get_user_pages() refuse to
take a new page reference if there are already more than two billion
references to the page.

Reported-by: Jann Horn <jannh@google.com>
Acked-by: Matthew Wilcox <willy@infradead.org>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[ 4.4.y backport notes:
  Ajay: Added local variable 'err' within follow_hugetlb_page()
        from 2be7cfed995e, to resolve compilation error
  Srivatsa: Replaced call to get_page_foll() with try_get_page_foll() ]
Signed-off-by: Srivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu>
Signed-off-by: Ajay Kaher <akaher@vmware.com>
---
 mm/gup.c     | 43 ++++++++++++++++++++++++++++++++-----------
 mm/hugetlb.c | 16 +++++++++++++++-
 2 files changed, 47 insertions(+), 12 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index fae4d1e..171b460 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -126,8 +126,12 @@ retry:
 		}
 	}
 
-	if (flags & FOLL_GET)
-		get_page_foll(page);
+	if (flags & FOLL_GET) {
+		if (unlikely(!try_get_page_foll(page))) {
+			page = ERR_PTR(-ENOMEM);
+			goto out;
+		}
+	}
 	if (flags & FOLL_TOUCH) {
 		if ((flags & FOLL_WRITE) &&
 		    !pte_dirty(pte) && !PageDirty(page))
@@ -289,7 +293,10 @@ static int get_gate_page(struct mm_struct *mm, unsigned long address,
 			goto unmap;
 		*page = pte_page(*pte);
 	}
-	get_page(*page);
+	if (unlikely(!try_get_page(*page))) {
+		ret = -ENOMEM;
+		goto unmap;
+	}
 out:
 	ret = 0;
 unmap:
@@ -1053,6 +1060,20 @@ struct page *get_dump_page(unsigned long addr)
  */
 #ifdef CONFIG_HAVE_GENERIC_RCU_GUP
 
+/*
+ * Return the compound head page with ref appropriately incremented,
+ * or NULL if that failed.
+ */
+static inline struct page *try_get_compound_head(struct page *page, int refs)
+{
+	struct page *head = compound_head(page);
+	if (WARN_ON_ONCE(atomic_read(&head->_count) < 0))
+		return NULL;
+	if (unlikely(!page_cache_add_speculative(head, refs)))
+		return NULL;
+	return head;
+}
+
 #ifdef __HAVE_ARCH_PTE_SPECIAL
 static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
 			 int write, struct page **pages, int *nr)
@@ -1082,9 +1103,9 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
 
 		VM_BUG_ON(!pfn_valid(pte_pfn(pte)));
 		page = pte_page(pte);
-		head = compound_head(page);
 
-		if (!page_cache_get_speculative(head))
+		head = try_get_compound_head(page, 1);
+		if (!head)
 			goto pte_unmap;
 
 		if (unlikely(pte_val(pte) != pte_val(*ptep))) {
@@ -1141,8 +1162,8 @@ static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr,
 		refs++;
 	} while (addr += PAGE_SIZE, addr != end);
 
-	head = compound_head(pmd_page(orig));
-	if (!page_cache_add_speculative(head, refs)) {
+	head = try_get_compound_head(pmd_page(orig), refs);
+	if (!head) {
 		*nr -= refs;
 		return 0;
 	}
@@ -1187,8 +1208,8 @@ static int gup_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr,
 		refs++;
 	} while (addr += PAGE_SIZE, addr != end);
 
-	head = compound_head(pud_page(orig));
-	if (!page_cache_add_speculative(head, refs)) {
+	head = try_get_compound_head(pud_page(orig), refs);
+	if (!head) {
 		*nr -= refs;
 		return 0;
 	}
@@ -1229,8 +1250,8 @@ static int gup_huge_pgd(pgd_t orig, pgd_t *pgdp, unsigned long addr,
 		refs++;
 	} while (addr += PAGE_SIZE, addr != end);
 
-	head = compound_head(pgd_page(orig));
-	if (!page_cache_add_speculative(head, refs)) {
+	head = try_get_compound_head(pgd_page(orig), refs);
+	if (!head) {
 		*nr -= refs;
 		return 0;
 	}
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index fd932e7..3a1501e 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3886,6 +3886,7 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
 	unsigned long vaddr = *position;
 	unsigned long remainder = *nr_pages;
 	struct hstate *h = hstate_vma(vma);
+	int err = -EFAULT;
 
 	while (vaddr < vma->vm_end && remainder) {
 		pte_t *pte;
@@ -3957,6 +3958,19 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
 
 		pfn_offset = (vaddr & ~huge_page_mask(h)) >> PAGE_SHIFT;
 		page = pte_page(huge_ptep_get(pte));
+
+		/*
+		 * Instead of doing 'try_get_page_foll()' below in the same_page
+		 * loop, just check the count once here.
+		 */
+		if (unlikely(page_count(page) <= 0)) {
+			if (pages) {
+				spin_unlock(ptl);
+				remainder = 0;
+				err = -ENOMEM;
+				break;
+			}
+		}
 same_page:
 		if (pages) {
 			pages[i] = mem_map_offset(page, pfn_offset);
@@ -3983,7 +3997,7 @@ same_page:
 	*nr_pages = remainder;
 	*position = vaddr;
 
-	return i ? i : -EFAULT;
+	return i ? i : err;
 }
 
 unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
-- 
2.7.4




* [PATCH v2 7/8] pipe: add pipe_buf_get() helper
  2019-10-09  0:44 [PATCH v2 0/8] Backported fixes for 4.4 stable tree Ajay Kaher
                   ` (5 preceding siblings ...)
  2019-10-09  0:44 ` [PATCH v2 6/8] mm: prevent get_user_pages() from overflowing page refcount Ajay Kaher
@ 2019-10-09  0:44 ` Ajay Kaher
  2019-10-09  0:44 ` [PATCH v2 8/8] fs: prevent page refcount overflow in pipe_buf_get Ajay Kaher
  7 siblings, 0 replies; 11+ messages in thread
From: Ajay Kaher @ 2019-10-09  0:44 UTC (permalink / raw)
  To: gregkh
  Cc: torvalds, punit.agrawal, akpm, kirill.shutemov, willy,
	will.deacon, mszeredi, stable, linux-mm, linux-kernel, srivatsab,
	srivatsa, amakhalov, srinidhir, bvikas, anishs, vsirnapalli,
	srostedt, akaher, Al Viro

From: Miklos Szeredi <mszeredi@redhat.com>

commit 7bf2d1df80822ec056363627e2014990f068f7aa upstream.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Ajay Kaher <akaher@vmware.com>
Reviewed-by: Srivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu>
---
 fs/fuse/dev.c             |  2 +-
 fs/splice.c               |  4 ++--
 include/linux/pipe_fs_i.h | 11 +++++++++++
 3 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index f5d2d23..36a5df9 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -2052,7 +2052,7 @@ static ssize_t fuse_dev_splice_write(struct pipe_inode_info *pipe,
 			pipe->curbuf = (pipe->curbuf + 1) & (pipe->buffers - 1);
 			pipe->nrbufs--;
 		} else {
-			ibuf->ops->get(pipe, ibuf);
+			pipe_buf_get(pipe, ibuf);
 			*obuf = *ibuf;
 			obuf->flags &= ~PIPE_BUF_FLAG_GIFT;
 			obuf->len = rem;
diff --git a/fs/splice.c b/fs/splice.c
index 8398974..fde1263 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -1876,7 +1876,7 @@ retry:
 			 * Get a reference to this pipe buffer,
 			 * so we can copy the contents over.
 			 */
-			ibuf->ops->get(ipipe, ibuf);
+			pipe_buf_get(ipipe, ibuf);
 			*obuf = *ibuf;
 
 			/*
@@ -1948,7 +1948,7 @@ static int link_pipe(struct pipe_inode_info *ipipe,
 		 * Get a reference to this pipe buffer,
 		 * so we can copy the contents over.
 		 */
-		ibuf->ops->get(ipipe, ibuf);
+		pipe_buf_get(ipipe, ibuf);
 
 		obuf = opipe->bufs + nbuf;
 		*obuf = *ibuf;
diff --git a/include/linux/pipe_fs_i.h b/include/linux/pipe_fs_i.h
index 24f5470..10876f3 100644
--- a/include/linux/pipe_fs_i.h
+++ b/include/linux/pipe_fs_i.h
@@ -115,6 +115,17 @@ struct pipe_buf_operations {
 	void (*get)(struct pipe_inode_info *, struct pipe_buffer *);
 };
 
+/**
+ * pipe_buf_get - get a reference to a pipe_buffer
+ * @pipe:	the pipe that the buffer belongs to
+ * @buf:	the buffer to get a reference to
+ */
+static inline void pipe_buf_get(struct pipe_inode_info *pipe,
+				struct pipe_buffer *buf)
+{
+	buf->ops->get(pipe, buf);
+}
+
 /* Differs from PIPE_BUF in that PIPE_SIZE is the length of the actual
    memory allocation, whereas PIPE_BUF makes atomicity guarantees.  */
 #define PIPE_SIZE		PAGE_SIZE
-- 
2.7.4




* [PATCH v2 8/8] fs: prevent page refcount overflow in pipe_buf_get
  2019-10-09  0:44 [PATCH v2 0/8] Backported fixes for 4.4 stable tree Ajay Kaher
                   ` (6 preceding siblings ...)
  2019-10-09  0:44 ` [PATCH v2 7/8] pipe: add pipe_buf_get() helper Ajay Kaher
@ 2019-10-09  0:44 ` Ajay Kaher
  7 siblings, 0 replies; 11+ messages in thread
From: Ajay Kaher @ 2019-10-09  0:44 UTC (permalink / raw)
  To: gregkh
  Cc: torvalds, punit.agrawal, akpm, kirill.shutemov, willy,
	will.deacon, mszeredi, stable, linux-mm, linux-kernel, srivatsab,
	srivatsa, amakhalov, srinidhir, bvikas, anishs, vsirnapalli,
	srostedt, akaher, stable

From: Matthew Wilcox <willy@infradead.org>

commit 15fab63e1e57be9fdb5eec1bbc5916e9825e9acb upstream.

Change pipe_buf_get() to return a bool indicating whether it succeeded
in raising the refcount of the page (if the thing in the pipe is a page).
This removes another mechanism for overflowing the page refcount.  All
callers converted to handle a failure.

Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Matthew Wilcox <willy@infradead.org>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[ 4.4.y backport notes:
  Regarding the change in generic_pipe_buf_get(), note that
  page_cache_get() is the same as get_page(). See mainline commit
  09cbfeaf1a5a6 "mm, fs: get rid of PAGE_CACHE_* and
  page_cache_{get,release} macros" for context. ]
Signed-off-by: Ajay Kaher <akaher@vmware.com>
Reviewed-by: Srivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu>
---
 fs/fuse/dev.c             | 12 ++++++------
 fs/pipe.c                 |  4 ++--
 fs/splice.c               | 12 ++++++++++--
 include/linux/pipe_fs_i.h | 10 ++++++----
 kernel/trace/trace.c      |  6 +++++-
 5 files changed, 29 insertions(+), 15 deletions(-)

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 36a5df9..16891f5 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -2031,10 +2031,8 @@ static ssize_t fuse_dev_splice_write(struct pipe_inode_info *pipe,
 		rem += pipe->bufs[(pipe->curbuf + idx) & (pipe->buffers - 1)].len;
 
 	ret = -EINVAL;
-	if (rem < len) {
-		pipe_unlock(pipe);
-		goto out;
-	}
+	if (rem < len)
+		goto out_free;
 
 	rem = len;
 	while (rem) {
@@ -2052,7 +2050,9 @@ static ssize_t fuse_dev_splice_write(struct pipe_inode_info *pipe,
 			pipe->curbuf = (pipe->curbuf + 1) & (pipe->buffers - 1);
 			pipe->nrbufs--;
 		} else {
-			pipe_buf_get(pipe, ibuf);
+			if (!pipe_buf_get(pipe, ibuf))
+				goto out_free;
+
 			*obuf = *ibuf;
 			obuf->flags &= ~PIPE_BUF_FLAG_GIFT;
 			obuf->len = rem;
@@ -2075,13 +2075,13 @@ static ssize_t fuse_dev_splice_write(struct pipe_inode_info *pipe,
 	ret = fuse_dev_do_write(fud, &cs, len);
 
 	pipe_lock(pipe);
+out_free:
 	for (idx = 0; idx < nbuf; idx++) {
 		struct pipe_buffer *buf = &bufs[idx];
 		buf->ops->release(pipe, buf);
 	}
 	pipe_unlock(pipe);
 
-out:
 	kfree(bufs);
 	return ret;
 }
diff --git a/fs/pipe.c b/fs/pipe.c
index 1e7263b..6534470 100644
--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -178,9 +178,9 @@ EXPORT_SYMBOL(generic_pipe_buf_steal);
  *	in the tee() system call, when we duplicate the buffers in one
  *	pipe into another.
  */
-void generic_pipe_buf_get(struct pipe_inode_info *pipe, struct pipe_buffer *buf)
+bool generic_pipe_buf_get(struct pipe_inode_info *pipe, struct pipe_buffer *buf)
 {
-	page_cache_get(buf->page);
+	return try_get_page(buf->page);
 }
 EXPORT_SYMBOL(generic_pipe_buf_get);
 
diff --git a/fs/splice.c b/fs/splice.c
index fde1263..57ccc58 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -1876,7 +1876,11 @@ retry:
 			 * Get a reference to this pipe buffer,
 			 * so we can copy the contents over.
 			 */
-			pipe_buf_get(ipipe, ibuf);
+			if (!pipe_buf_get(ipipe, ibuf)) {
+				if (ret == 0)
+					ret = -EFAULT;
+				break;
+			}
 			*obuf = *ibuf;
 
 			/*
@@ -1948,7 +1952,11 @@ static int link_pipe(struct pipe_inode_info *ipipe,
 		 * Get a reference to this pipe buffer,
 		 * so we can copy the contents over.
 		 */
-		pipe_buf_get(ipipe, ibuf);
+		if (!pipe_buf_get(ipipe, ibuf)) {
+			if (ret == 0)
+				ret = -EFAULT;
+			break;
+		}
 
 		obuf = opipe->bufs + nbuf;
 		*obuf = *ibuf;
diff --git a/include/linux/pipe_fs_i.h b/include/linux/pipe_fs_i.h
index 10876f3..0b28b65 100644
--- a/include/linux/pipe_fs_i.h
+++ b/include/linux/pipe_fs_i.h
@@ -112,18 +112,20 @@ struct pipe_buf_operations {
 	/*
 	 * Get a reference to the pipe buffer.
 	 */
-	void (*get)(struct pipe_inode_info *, struct pipe_buffer *);
+	bool (*get)(struct pipe_inode_info *, struct pipe_buffer *);
 };
 
 /**
  * pipe_buf_get - get a reference to a pipe_buffer
  * @pipe:	the pipe that the buffer belongs to
  * @buf:	the buffer to get a reference to
+ *
+ * Return: %true if the reference was successfully obtained.
  */
-static inline void pipe_buf_get(struct pipe_inode_info *pipe,
+static inline __must_check bool pipe_buf_get(struct pipe_inode_info *pipe,
 				struct pipe_buffer *buf)
 {
-	buf->ops->get(pipe, buf);
+	return buf->ops->get(pipe, buf);
 }
 
 /* Differs from PIPE_BUF in that PIPE_SIZE is the length of the actual
@@ -148,7 +150,7 @@ struct pipe_inode_info *alloc_pipe_info(void);
 void free_pipe_info(struct pipe_inode_info *);
 
 /* Generic pipe buffer ops functions */
-void generic_pipe_buf_get(struct pipe_inode_info *, struct pipe_buffer *);
+bool generic_pipe_buf_get(struct pipe_inode_info *, struct pipe_buffer *);
 int generic_pipe_buf_confirm(struct pipe_inode_info *, struct pipe_buffer *);
 int generic_pipe_buf_steal(struct pipe_inode_info *, struct pipe_buffer *);
 void generic_pipe_buf_release(struct pipe_inode_info *, struct pipe_buffer *);
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index ae00e68..7fe8d04 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -5731,12 +5731,16 @@ static void buffer_pipe_buf_release(struct pipe_inode_info *pipe,
 	buf->private = 0;
 }
 
-static void buffer_pipe_buf_get(struct pipe_inode_info *pipe,
+static bool buffer_pipe_buf_get(struct pipe_inode_info *pipe,
 				struct pipe_buffer *buf)
 {
 	struct buffer_ref *ref = (struct buffer_ref *)buf->private;
 
+	if (ref->ref > INT_MAX/2)
+		return false;
+
 	ref->ref++;
+	return true;
 }
 
 /* Pipe buffer operations for a buffer. */
-- 
2.7.4
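
The caller-side pattern in the patch above (a "get" that can refuse, and a
loop that stops at the first refusal) can be sketched in user space. This is
only a model under stated assumptions, not kernel code: struct model_buf,
model_buf_get() and model_link() are hypothetical names, and the loop counts
duplicated buffers where link_pipe() accumulates byte lengths.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a refcounted pipe buffer. */
struct model_buf {
	int ref;
};

/*
 * Mirrors the shape of buffer_pipe_buf_get() after the patch: refuse to
 * take another reference once the counter is past INT_MAX / 2.
 */
static bool model_buf_get(struct model_buf *buf)
{
	if (buf->ref > INT_MAX / 2)
		return false;

	buf->ref++;
	return true;
}

/*
 * Mirrors the error handling added to link_pipe(): stop at the first
 * buffer whose reference cannot be taken, and report -EFAULT only if
 * nothing was duplicated before the failure.
 */
static int model_link(struct model_buf *bufs, size_t n)
{
	int ret = 0;

	assert(bufs != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (!model_buf_get(&bufs[i])) {
			if (ret == 0)
				ret = -EFAULT;
			break;
		}
		ret++;	/* the kernel adds the buffer length here instead */
	}
	return ret;
}
```

As in the patch, a partial result is preferred over an error: if some buffers
were already duplicated, their count is returned and the failure is left for
the next call to report.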




* Re: [PATCH v2 6/8] mm: prevent get_user_pages() from overflowing page refcount
  2019-10-09  0:44 ` [PATCH v2 6/8] mm: prevent get_user_pages() from overflowing page refcount Ajay Kaher
@ 2019-10-09 13:13   ` Vlastimil Babka
  2019-10-17 16:28     ` Ajay Kaher
  0 siblings, 1 reply; 11+ messages in thread
From: Vlastimil Babka @ 2019-10-09 13:13 UTC (permalink / raw)
  To: Ajay Kaher, gregkh
  Cc: torvalds, punit.agrawal, akpm, kirill.shutemov, willy,
	will.deacon, mszeredi, stable, linux-mm, linux-kernel, srivatsab,
	srivatsa, amakhalov, srinidhir, bvikas, anishs, vsirnapalli,
	srostedt, stable, Ben Hutchings

On 10/9/19 2:44 AM, Ajay Kaher wrote:
> From: Linus Torvalds <torvalds@linux-foundation.org>
> 
> commit 8fde12ca79aff9b5ba951fce1a2641901b8d8e64 upstream.
> 
> If the page refcount wraps around past zero, it will be freed while
> there are still four billion references to it.  One of the possible
> avenues for an attacker to try to make this happen is by doing direct IO
> on a page multiple times.  This patch makes get_user_pages() refuse to
> take a new page reference if there are already more than two billion
> references to the page.
> 
> Reported-by: Jann Horn <jannh@google.com>
> Acked-by: Matthew Wilcox <willy@infradead.org>
> Cc: stable@kernel.org
> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
> [ 4.4.y backport notes:
>   Ajay: Added local variable 'err' with-in follow_hugetlb_page()
>         from 2be7cfed995e, to resolve compilation error
>   Srivatsa: Replaced call to get_page_foll() with try_get_page_foll() ]
> Signed-off-by: Srivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu>
> Signed-off-by: Ajay Kaher <akaher@vmware.com>
> ---
>  mm/gup.c     | 43 ++++++++++++++++++++++++++++++++-----------
>  mm/hugetlb.c | 16 +++++++++++++++-
>  2 files changed, 47 insertions(+), 12 deletions(-)

This seems to have the same issue as the 4.9 stable version [1], in not
touching the arch-specific gup.c variants.

[1]
https://lore.kernel.org/lkml/6650323f-dbc9-f069-000b-f6b0f941a065@suse.cz/

> diff --git a/mm/gup.c b/mm/gup.c
> index fae4d1e..171b460 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -126,8 +126,12 @@ retry:
>  		}
>  	}
>  
> -	if (flags & FOLL_GET)
> -		get_page_foll(page);
> +	if (flags & FOLL_GET) {
> +		if (unlikely(!try_get_page_foll(page))) {
> +			page = ERR_PTR(-ENOMEM);
> +			goto out;
> +		}
> +	}
>  	if (flags & FOLL_TOUCH) {
>  		if ((flags & FOLL_WRITE) &&
>  		    !pte_dirty(pte) && !PageDirty(page))
> @@ -289,7 +293,10 @@ static int get_gate_page(struct mm_struct *mm, unsigned long address,
>  			goto unmap;
>  		*page = pte_page(*pte);
>  	}
> -	get_page(*page);
> +	if (unlikely(!try_get_page(*page))) {
> +		ret = -ENOMEM;
> +		goto unmap;
> +	}
>  out:
>  	ret = 0;
>  unmap:
> @@ -1053,6 +1060,20 @@ struct page *get_dump_page(unsigned long addr)
>   */
>  #ifdef CONFIG_HAVE_GENERIC_RCU_GUP
>  
> +/*
> + * Return the compund head page with ref appropriately incremented,
> + * or NULL if that failed.
> + */
> +static inline struct page *try_get_compound_head(struct page *page, int refs)
> +{
> +	struct page *head = compound_head(page);
> +	if (WARN_ON_ONCE(atomic_read(&head->_count) < 0))
> +		return NULL;
> +	if (unlikely(!page_cache_add_speculative(head, refs)))
> +		return NULL;
> +	return head;
> +}
> +
>  #ifdef __HAVE_ARCH_PTE_SPECIAL
>  static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
>  			 int write, struct page **pages, int *nr)
> @@ -1082,9 +1103,9 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
>  
>  		VM_BUG_ON(!pfn_valid(pte_pfn(pte)));
>  		page = pte_page(pte);
> -		head = compound_head(page);
>  
> -		if (!page_cache_get_speculative(head))
> +		head = try_get_compound_head(page, 1);
> +		if (!head)
>  			goto pte_unmap;
>  
>  		if (unlikely(pte_val(pte) != pte_val(*ptep))) {
> @@ -1141,8 +1162,8 @@ static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr,
>  		refs++;
>  	} while (addr += PAGE_SIZE, addr != end);
>  
> -	head = compound_head(pmd_page(orig));
> -	if (!page_cache_add_speculative(head, refs)) {
> +	head = try_get_compound_head(pmd_page(orig), refs);
> +	if (!head) {
>  		*nr -= refs;
>  		return 0;
>  	}
> @@ -1187,8 +1208,8 @@ static int gup_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr,
>  		refs++;
>  	} while (addr += PAGE_SIZE, addr != end);
>  
> -	head = compound_head(pud_page(orig));
> -	if (!page_cache_add_speculative(head, refs)) {
> +	head = try_get_compound_head(pud_page(orig), refs);
> +	if (!head) {
>  		*nr -= refs;
>  		return 0;
>  	}
> @@ -1229,8 +1250,8 @@ static int gup_huge_pgd(pgd_t orig, pgd_t *pgdp, unsigned long addr,
>  		refs++;
>  	} while (addr += PAGE_SIZE, addr != end);
>  
> -	head = compound_head(pgd_page(orig));
> -	if (!page_cache_add_speculative(head, refs)) {
> +	head = try_get_compound_head(pgd_page(orig), refs);
> +	if (!head) {
>  		*nr -= refs;
>  		return 0;
>  	}
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index fd932e7..3a1501e 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -3886,6 +3886,7 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
>  	unsigned long vaddr = *position;
>  	unsigned long remainder = *nr_pages;
>  	struct hstate *h = hstate_vma(vma);
> +	int err = -EFAULT;
>  
>  	while (vaddr < vma->vm_end && remainder) {
>  		pte_t *pte;
> @@ -3957,6 +3958,19 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
>  
>  		pfn_offset = (vaddr & ~huge_page_mask(h)) >> PAGE_SHIFT;
>  		page = pte_page(huge_ptep_get(pte));
> +
> +		/*
> +		 * Instead of doing 'try_get_page_foll()' below in the same_page
> +		 * loop, just check the count once here.
> +		 */
> +		if (unlikely(page_count(page) <= 0)) {
> +			if (pages) {
> +				spin_unlock(ptl);
> +				remainder = 0;
> +				err = -ENOMEM;
> +				break;
> +			}
> +		}
>  same_page:
>  		if (pages) {
>  			pages[i] = mem_map_offset(page, pfn_offset);
> @@ -3983,7 +3997,7 @@ same_page:
>  	*nr_pages = remainder;
>  	*position = vaddr;
>  
> -	return i ? i : -EFAULT;
> +	return i ? i : err;
>  }
>  
>  unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
> 




* Re: [PATCH v2 6/8] mm: prevent get_user_pages() from overflowing page refcount
  2019-10-09 13:13   ` Vlastimil Babka
@ 2019-10-17 16:28     ` Ajay Kaher
  0 siblings, 0 replies; 11+ messages in thread
From: Ajay Kaher @ 2019-10-17 16:28 UTC (permalink / raw)
  To: Vlastimil Babka, gregkh
  Cc: torvalds, punit.agrawal, akpm, kirill.shutemov, willy,
	will.deacon, mszeredi, stable, linux-mm, linux-kernel,
	Srivatsa Bhat, srivatsa, Alexey Makhalov, Srinidhi Rao,
	Vikash Bansal, Anish Swaminathan, Vasavi Sirnapalli,
	Steven Rostedt, stable, Ben Hutchings


On 09/10/19, 6:43 PM, "Vlastimil Babka" <vbabka@suse.cz> wrote:

>> Reported-by: Jann Horn <jannh@google.com>
>> Acked-by: Matthew Wilcox <willy@infradead.org>
>> Cc: stable@kernel.org
>> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
>> [ 4.4.y backport notes:
>>   Ajay: Added local variable 'err' with-in follow_hugetlb_page()
>>         from 2be7cfed995e, to resolve compilation error
>>   Srivatsa: Replaced call to get_page_foll() with try_get_page_foll() ]
>> Signed-off-by: Srivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu>
>> Signed-off-by: Ajay Kaher <akaher@vmware.com>
>> ---
>>  mm/gup.c     | 43 ++++++++++++++++++++++++++++++++-----------
>>  mm/hugetlb.c | 16 +++++++++++++++-
>>  2 files changed, 47 insertions(+), 12 deletions(-)
>    
> This seems to have the same issue as the 4.9 stable version [1], in not
> touching the arch-specific gup.c variants.
>    
> [1]
> https://lore.kernel.org/lkml/6650323f-dbc9-f069-000b-f6b0f941a065@suse.cz/

Thanks Vlastimil for highlighting this here.

Yes, the arch-specific gup.c variants also need this handling, and not only
in 4.4.y: it is needed in every stable series up to 4.19.y. I believe it is
better to start with 4.19.y and then backport those changes down to 4.4.y.

Affected areas of the arch-specific gup.c files (where the page count is
used) are:
#1: get_page() is used in these files; this is safe because it is
       defined in mm.h, where this is already taken care of.
#2: get_head_page_multiple() contains the following:
               VM_BUG_ON_PAGE(page_count(page) == 0, page);
           This needs to change to:
               VM_BUG_ON_PAGE(page_ref_zero_or_close_to_overflow(page), page);
#3: Some of the files use page_cache_get_speculative() or
       page_cache_add_speculative() in combination with compound_head();
       this case needs to be handled the same way as it was handled here:
           https://lore.kernel.org/stable/1570581863-12090-7-git-send-email-akaher@vmware.com/
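
To make the checks named in #2 and #3 concrete, here is a minimal user-space
sketch. It is a model under stated assumptions, not the kernel
implementation: struct fake_page and the model_*() helpers are hypothetical
names, and the counter is a plain int rather than an atomic. The
"+ 127u <= 127u" expression is, as far as I know, the form used by
page_ref_zero_or_close_to_overflow() in patch 1 of this series.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct page; only the refcount is modelled. */
struct fake_page {
	int count;
};

/*
 * True when the count is zero or has wrapped to just below zero, i.e. it
 * lies in [-127, 0]. The unsigned addition folds that whole range into
 * the values 0..127.
 */
static bool model_ref_zero_or_close_to_overflow(const struct fake_page *p)
{
	return (unsigned int)p->count + 127u <= 127u;
}

/*
 * Models the try_get_page() idea: refuse a new reference once the signed
 * count is <= 0, which covers both a freed page and a count that has
 * already gone past INT_MAX.
 */
static bool model_try_get(struct fake_page *p)
{
	if (p->count <= 0)
		return false;

	/* Unsigned add avoids signed-overflow UB in this model. */
	p->count = (int)((unsigned int)p->count + 1u);
	return true;
}

/*
 * Models the shape of try_get_compound_head(page, refs): take 'refs'
 * references at once, or none at all if the count is already zero or
 * negative.
 */
static bool model_try_get_refs(struct fake_page *p, int refs)
{
	assert(refs > 0);

	if (p->count <= 0)
		return false;

	p->count = (int)((unsigned int)p->count + (unsigned int)refs);
	return true;
}
```

The point of testing "<= 0" rather than "== 0" is that a counter which has
been pushed past INT_MAX reads as negative, so new references are refused
long before the count can wrap all the way around to zero.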

Please share any suggestions or patches with me if you have already
worked on this.

Could we handle the arch-specific gup.c variants in a separate patch set
and let these patches be merged into 4.4.y?

- Ajay




Thread overview: 11+ messages
2019-10-09  0:44 [PATCH v2 0/8] Backported fixes for 4.4 stable tree Ajay Kaher
2019-10-09  0:44 ` [PATCH v2 1/8] mm: make page ref count overflow check tighter and more explicit Ajay Kaher
2019-10-09  0:44 ` [PATCH v2 2/8] mm: add 'try_get_page()' helper function Ajay Kaher
2019-10-09  0:44 ` [PATCH v2 3/8] mm: handle PTE-mapped tail pages in gerneric fast gup implementaiton Ajay Kaher
2019-10-09  0:44 ` [PATCH v2 4/8] mm, gup: remove broken VM_BUG_ON_PAGE compound check for hugepages Ajay Kaher
2019-10-09  0:44 ` [PATCH v2 5/8] mm, gup: ensure real head page is ref-counted when using hugepages Ajay Kaher
2019-10-09  0:44 ` [PATCH v2 6/8] mm: prevent get_user_pages() from overflowing page refcount Ajay Kaher
2019-10-09 13:13   ` Vlastimil Babka
2019-10-17 16:28     ` Ajay Kaher
2019-10-09  0:44 ` [PATCH v2 7/8] pipe: add pipe_buf_get() helper Ajay Kaher
2019-10-09  0:44 ` [PATCH v2 8/8] fs: prevent page refcount overflow in pipe_buf_get Ajay Kaher
