linux-kernel.vger.kernel.org archive mirror
* Re: [PATCH v3 8/8] x86, mm, gup: prevent get_page() race with munmap in paravirt guest
  2019-12-16 20:45 ` [PATCH v3 8/8] x86, mm, gup: prevent get_page() race with munmap in paravirt guest Ajay Kaher
@ 2019-12-16 13:04   ` Peter Zijlstra
  2019-12-16 13:30     ` Vitaly Kuznetsov
  0 siblings, 1 reply; 16+ messages in thread
From: Peter Zijlstra @ 2019-12-16 13:04 UTC (permalink / raw)
  To: Ajay Kaher
  Cc: gregkh, stable, torvalds, punit.agrawal, akpm, kirill.shutemov,
	willy, will.deacon, mszeredi, linux-mm, linux-kernel, srivatsab,
	srivatsa, amakhalov, srinidhir, bvikas, anishs, vsirnapalli,
	srostedt, Vlastimil Babka, Oscar Salvador, Thomas Gleixner,
	Ingo Molnar, Juergen Gross, Vitaly Kuznetsov, Borislav Petkov,
	Dave Hansen, Andy Lutomirski

On Tue, Dec 17, 2019 at 02:15:48AM +0530, Ajay Kaher wrote:
> From: Vlastimil Babka <vbabka@suse.cz>
> 
> The x86 version of get_user_pages_fast() relies on disabled interrupts to
> synchronize gup_pte_range() between gup_get_pte(ptep); and get_page() against
> a parallel munmap. The munmap side nulls the pte, then flushes TLBs, then
> releases the page. As TLB flush is done synchronously via IPI, disabling
> interrupts blocks the page release, and get_page(), which assumes existing
> reference on page, is thus safe.
> However when TLB flush is done by a hypercall, e.g. in a Xen PV guest, there is
> no blocking thanks to disabled interrupts, and get_page() can succeed on a page
> that was already freed or even reused.
> 
> We have recently seen this happen with our 4.4 and 4.12 based kernels, with
> userspace (java) that exits a thread, where mm_release() performs a futex_wake()
> on tsk->clear_child_tid, and another thread in parallel unmaps the page where
> tsk->clear_child_tid points to. The spurious get_page() succeeds, but futex code
> immediately releases the page again, while it's already on a freelist. Symptoms
> include a bad page state warning, general protection faults accessing a poisoned
> list prev/next pointer in the freelist, or free page pcplists of two cpus joined
> together in a single list. Oscar has also reproduced this scenario, with a
> patch inserting delays before the get_page() to make the race window larger.
> 
> Fix this by removing the dependency on TLB flush interrupts the same way as the

This is supposed to be fixed by:

arch/x86/Kconfig:       select HAVE_RCU_TABLE_FREE              if PARAVIRT


* Re: [PATCH v3 8/8] x86, mm, gup: prevent get_page() race with munmap in paravirt guest
  2019-12-16 13:04   ` Peter Zijlstra
@ 2019-12-16 13:30     ` Vitaly Kuznetsov
  2019-12-16 13:47       ` Peter Zijlstra
  0 siblings, 1 reply; 16+ messages in thread
From: Vitaly Kuznetsov @ 2019-12-16 13:30 UTC (permalink / raw)
  To: Peter Zijlstra, Ajay Kaher
  Cc: gregkh, stable, torvalds, punit.agrawal, akpm, kirill.shutemov,
	willy, will.deacon, mszeredi, linux-mm, linux-kernel, srivatsab,
	srivatsa, amakhalov, srinidhir, bvikas, anishs, vsirnapalli,
	srostedt, Vlastimil Babka, Oscar Salvador, Thomas Gleixner,
	Ingo Molnar, Juergen Gross, Borislav Petkov, Dave Hansen,
	Andy Lutomirski

Peter Zijlstra <peterz@infradead.org> writes:

> On Tue, Dec 17, 2019 at 02:15:48AM +0530, Ajay Kaher wrote:
>> From: Vlastimil Babka <vbabka@suse.cz>
>> 
>> The x86 version of get_user_pages_fast() relies on disabled interrupts to
>> synchronize gup_pte_range() between gup_get_pte(ptep); and get_page() against
>> a parallel munmap. The munmap side nulls the pte, then flushes TLBs, then
>> releases the page. As TLB flush is done synchronously via IPI, disabling
>> interrupts blocks the page release, and get_page(), which assumes existing
>> reference on page, is thus safe.
>> However when TLB flush is done by a hypercall, e.g. in a Xen PV guest, there is
>> no blocking thanks to disabled interrupts, and get_page() can succeed on a page
>> that was already freed or even reused.
>> 
>> We have recently seen this happen with our 4.4 and 4.12 based kernels, with
>> userspace (java) that exits a thread, where mm_release() performs a futex_wake()
>> on tsk->clear_child_tid, and another thread in parallel unmaps the page where
>> tsk->clear_child_tid points to. The spurious get_page() succeeds, but futex code
>> immediately releases the page again, while it's already on a freelist. Symptoms
>> include a bad page state warning, general protection faults accessing a poisoned
>> list prev/next pointer in the freelist, or free page pcplists of two cpus joined
>> together in a single list. Oscar has also reproduced this scenario, with a
>> patch inserting delays before the get_page() to make the race window larger.
>> 
>> Fix this by removing the dependency on TLB flush interrupts the same way as the
>
> This is supposed to be fixed by:
>
> arch/x86/Kconfig:       select HAVE_RCU_TABLE_FREE              if PARAVIRT
>

Yes,

but HAVE_RCU_TABLE_FREE was enabled on x86 only in 4.14:

commit 9e52fc2b50de3a1c08b44f94c610fbe998c0031a
Author: Vitaly Kuznetsov <vkuznets@redhat.com>
Date:   Mon Aug 28 10:22:51 2017 +0200

    x86/mm: Enable RCU based page table freeing (CONFIG_HAVE_RCU_TABLE_FREE=y)

and, if I understood correctly, Ajay is suggesting the patch for older
stable kernels (4.9 and 4.4 I would guess).

-- 
Vitaly



* Re: [PATCH v3 8/8] x86, mm, gup: prevent get_page() race with munmap in paravirt guest
  2019-12-16 13:30     ` Vitaly Kuznetsov
@ 2019-12-16 13:47       ` Peter Zijlstra
  2019-12-16 15:11         ` Vlastimil Babka
  0 siblings, 1 reply; 16+ messages in thread
From: Peter Zijlstra @ 2019-12-16 13:47 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: Ajay Kaher, gregkh, stable, torvalds, punit.agrawal, akpm,
	kirill.shutemov, willy, will.deacon, mszeredi, linux-mm,
	linux-kernel, srivatsab, srivatsa, amakhalov, srinidhir, bvikas,
	anishs, vsirnapalli, srostedt, Vlastimil Babka, Oscar Salvador,
	Thomas Gleixner, Ingo Molnar, Juergen Gross, Borislav Petkov,
	Dave Hansen, Andy Lutomirski

On Mon, Dec 16, 2019 at 02:30:44PM +0100, Vitaly Kuznetsov wrote:
> Peter Zijlstra <peterz@infradead.org> writes:
> 
> > On Tue, Dec 17, 2019 at 02:15:48AM +0530, Ajay Kaher wrote:
> >> From: Vlastimil Babka <vbabka@suse.cz>
> >> 
> >> The x86 version of get_user_pages_fast() relies on disabled interrupts to
> >> synchronize gup_pte_range() between gup_get_pte(ptep); and get_page() against
> >> a parallel munmap. The munmap side nulls the pte, then flushes TLBs, then
> >> releases the page. As TLB flush is done synchronously via IPI, disabling
> >> interrupts blocks the page release, and get_page(), which assumes existing
> >> reference on page, is thus safe.
> >> However when TLB flush is done by a hypercall, e.g. in a Xen PV guest, there is
> >> no blocking thanks to disabled interrupts, and get_page() can succeed on a page
> >> that was already freed or even reused.
> >> 
> >> We have recently seen this happen with our 4.4 and 4.12 based kernels, with
> >> userspace (java) that exits a thread, where mm_release() performs a futex_wake()
> >> on tsk->clear_child_tid, and another thread in parallel unmaps the page where
> >> tsk->clear_child_tid points to. The spurious get_page() succeeds, but futex code
> >> immediately releases the page again, while it's already on a freelist. Symptoms
> >> include a bad page state warning, general protection faults accessing a poisoned
> >> list prev/next pointer in the freelist, or free page pcplists of two cpus joined
> >> together in a single list. Oscar has also reproduced this scenario, with a
> >> patch inserting delays before the get_page() to make the race window larger.
> >> 
> >> Fix this by removing the dependency on TLB flush interrupts the same way as the
> >
> > This is supposed to be fixed by:
> >
> > arch/x86/Kconfig:       select HAVE_RCU_TABLE_FREE              if PARAVIRT
> >
> 
> Yes,
> 
> but HAVE_RCU_TABLE_FREE was enabled on x86 only in 4.14:
> 
> commit 9e52fc2b50de3a1c08b44f94c610fbe998c0031a
> Author: Vitaly Kuznetsov <vkuznets@redhat.com>
> Date:   Mon Aug 28 10:22:51 2017 +0200
> 
>     x86/mm: Enable RCU based page table freeing (CONFIG_HAVE_RCU_TABLE_FREE=y)
> 
> and, if I understood correctly, Ajay is suggesting the patch for older
> stable kernels (4.9 and 4.4 I would guess).

It wasn't at all clear this was targeted at old kernels (I only got this
one patch).

And why can't those necro kernels do backports of the upstream solution?


* Re: [PATCH v3 8/8] x86, mm, gup: prevent get_page() race with munmap in paravirt guest
  2019-12-16 13:47       ` Peter Zijlstra
@ 2019-12-16 15:11         ` Vlastimil Babka
  2019-12-16 16:08           ` Vitaly Kuznetsov
  0 siblings, 1 reply; 16+ messages in thread
From: Vlastimil Babka @ 2019-12-16 15:11 UTC (permalink / raw)
  To: Peter Zijlstra, Vitaly Kuznetsov
  Cc: Ajay Kaher, gregkh, stable, torvalds, punit.agrawal, akpm,
	kirill.shutemov, willy, will.deacon, mszeredi, linux-mm,
	linux-kernel, srivatsab, srivatsa, amakhalov, srinidhir, bvikas,
	anishs, vsirnapalli, srostedt, Oscar Salvador, Thomas Gleixner,
	Ingo Molnar, Juergen Gross, Borislav Petkov, Dave Hansen,
	Andy Lutomirski

On 12/16/19 2:47 PM, Peter Zijlstra wrote:
> On Mon, Dec 16, 2019 at 02:30:44PM +0100, Vitaly Kuznetsov wrote:
>> Peter Zijlstra <peterz@infradead.org> writes:
>>
>>> On Tue, Dec 17, 2019 at 02:15:48AM +0530, Ajay Kaher wrote:
>>>> From: Vlastimil Babka <vbabka@suse.cz>
>>>>
>>>> The x86 version of get_user_pages_fast() relies on disabled interrupts to
>>>> synchronize gup_pte_range() between gup_get_pte(ptep); and get_page() against
>>>> a parallel munmap. The munmap side nulls the pte, then flushes TLBs, then
>>>> releases the page. As TLB flush is done synchronously via IPI, disabling
>>>> interrupts blocks the page release, and get_page(), which assumes existing
>>>> reference on page, is thus safe.
>>>> However when TLB flush is done by a hypercall, e.g. in a Xen PV guest, there is
>>>> no blocking thanks to disabled interrupts, and get_page() can succeed on a page
>>>> that was already freed or even reused.
>>>>
>>>> We have recently seen this happen with our 4.4 and 4.12 based kernels, with
>>>> userspace (java) that exits a thread, where mm_release() performs a futex_wake()
>>>> on tsk->clear_child_tid, and another thread in parallel unmaps the page where
>>>> tsk->clear_child_tid points to. The spurious get_page() succeeds, but futex code
>>>> immediately releases the page again, while it's already on a freelist. Symptoms
>>>> include a bad page state warning, general protection faults accessing a poisoned
>>>> list prev/next pointer in the freelist, or free page pcplists of two cpus joined
>>>> together in a single list. Oscar has also reproduced this scenario, with a
>>>> patch inserting delays before the get_page() to make the race window larger.
>>>>
>>>> Fix this by removing the dependency on TLB flush interrupts the same way as the
>>>
>>> This is supposed to be fixed by:
>>>
>>> arch/x86/Kconfig:       select HAVE_RCU_TABLE_FREE              if PARAVIRT
>>>
>>
>> Yes,

Well, that commit fixes the "page table can be freed under us" part. But
this patch is about the "get_page() will succeed on a page that's being
freed" part. Upstream fixed that unknowingly in 4.13 by a gup.c
refactoring that would be too risky to backport fully.
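
The difference between a plain get_page() and a speculative reference
can be illustrated in userspace. What follows is only a minimal sketch,
not kernel code: struct fake_page and fake_get_page_unless_zero() are
hypothetical names, and C11 atomics stand in for the kernel's atomic_t.
It shows the "increment unless zero" idea that helpers such as
page_cache_get_speculative() build on, so a page whose count has
already dropped to zero is never revived.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page; only the refcount matters here. */
struct fake_page {
	atomic_int refcount;
};

/*
 * Take a reference only if the count is currently non-zero.
 * A count of zero means the page is already being freed, so the
 * caller must not touch it and has to back off instead.
 */
static bool fake_get_page_unless_zero(struct fake_page *page)
{
	int old;

	assert(page != NULL);
	old = atomic_load(&page->refcount);
	while (old != 0) {
		/* On failure, old is refreshed with the current value. */
		if (atomic_compare_exchange_weak(&page->refcount, &old, old + 1))
			return true;
	}
	return false;
}
```

An unconditional increment would turn a count of 0 into 1 and the later
put would free the page a second time, which matches the freelist
corruption symptoms described in the patch.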

>> but HAVE_RCU_TABLE_FREE was enabled on x86 only in 4.14:
>>
>> commit 9e52fc2b50de3a1c08b44f94c610fbe998c0031a
>> Author: Vitaly Kuznetsov <vkuznets@redhat.com>
>> Date:   Mon Aug 28 10:22:51 2017 +0200
>>
>>     x86/mm: Enable RCU based page table freeing (CONFIG_HAVE_RCU_TABLE_FREE=y)
>>
>> and, if I understood correctly, Ajay is suggesting the patch for older
>> stable kernels (4.9 and 4.4 I would guess).
> 
> It wasn't at all clear this was targeted at old kernels (I only got this
> one patch).
> 
> And why can't those necro kernels do backports of the upstream solution?
> 



* Re: [PATCH v3 8/8] x86, mm, gup: prevent get_page() race with munmap in paravirt guest
  2019-12-16 15:11         ` Vlastimil Babka
@ 2019-12-16 16:08           ` Vitaly Kuznetsov
  2019-12-17  4:13             ` Ajay Kaher
  2020-01-31 12:51             ` Ajay Kaher
  0 siblings, 2 replies; 16+ messages in thread
From: Vitaly Kuznetsov @ 2019-12-16 16:08 UTC (permalink / raw)
  To: Vlastimil Babka, Ajay Kaher
  Cc: Peter Zijlstra, gregkh, stable, torvalds, punit.agrawal, akpm,
	kirill.shutemov, willy, will.deacon, mszeredi, linux-mm,
	linux-kernel, srivatsab, srivatsa, amakhalov, srinidhir, bvikas,
	anishs, vsirnapalli, srostedt, Oscar Salvador, Thomas Gleixner,
	Ingo Molnar, Juergen Gross, Borislav Petkov, Dave Hansen,
	Andy Lutomirski

Vlastimil Babka <vbabka@suse.cz> writes:

> On 12/16/19 2:47 PM, Peter Zijlstra wrote:
>> On Mon, Dec 16, 2019 at 02:30:44PM +0100, Vitaly Kuznetsov wrote:
>>> Peter Zijlstra <peterz@infradead.org> writes:
>>>
>>>> On Tue, Dec 17, 2019 at 02:15:48AM +0530, Ajay Kaher wrote:
>>>>> From: Vlastimil Babka <vbabka@suse.cz>
>>>>>
>>>>> The x86 version of get_user_pages_fast() relies on disabled interrupts to
>>>>> synchronize gup_pte_range() between gup_get_pte(ptep); and get_page() against
>>>>> a parallel munmap. The munmap side nulls the pte, then flushes TLBs, then
>>>>> releases the page. As TLB flush is done synchronously via IPI, disabling
>>>>> interrupts blocks the page release, and get_page(), which assumes existing
>>>>> reference on page, is thus safe.
>>>>> However when TLB flush is done by a hypercall, e.g. in a Xen PV guest, there is
>>>>> no blocking thanks to disabled interrupts, and get_page() can succeed on a page
>>>>> that was already freed or even reused.
>>>>>
>>>>> We have recently seen this happen with our 4.4 and 4.12 based kernels, with
>>>>> userspace (java) that exits a thread, where mm_release() performs a futex_wake()
>>>>> on tsk->clear_child_tid, and another thread in parallel unmaps the page where
>>>>> tsk->clear_child_tid points to. The spurious get_page() succeeds, but futex code
>>>>> immediately releases the page again, while it's already on a freelist. Symptoms
>>>>> include a bad page state warning, general protection faults accessing a poisoned
>>>>> list prev/next pointer in the freelist, or free page pcplists of two cpus joined
>>>>> together in a single list. Oscar has also reproduced this scenario, with a
>>>>> patch inserting delays before the get_page() to make the race window larger.
>>>>>
>>>>> Fix this by removing the dependency on TLB flush interrupts the same way as the
>>>>
>>>> This is supposed to be fixed by:
>>>>
>>>> arch/x86/Kconfig:       select HAVE_RCU_TABLE_FREE              if PARAVIRT
>>>>
>>>
>>> Yes,
>
> Well, that commit fixes the "page table can be freed under us" part. But
> this patch is about the "get_page() will succeed on a page that's being
> freed" part. Upstream fixed that unknowingly in 4.13 by a gup.c
> refactoring that would be too risky to backport fully.
>

(I also dislike receiving only this patch of the series, next time
please send the whole thing, it's only 8 patches, our mailfolders will
survive that)

When I was adding Hyper-V PV TLB flush to RHEL7 - which is 3.10 based -
in addition to adding page_cache_get_speculative() to
gup_get_pte()/gup_huge_pmd()/gup_huge_pud() I also had to synchronize
huge PMD split against gup_fast with the following hack:

+static void do_nothing(void *unused)
+{
+
+}
+
+static void serialize_against_pte_lookup(struct mm_struct *mm)
+{
+       smp_mb();
+       smp_call_function_many(mm_cpumask(mm), do_nothing, NULL, 1);
+}
+
 void pmdp_splitting_flush(struct vm_area_struct *vma,
                          unsigned long address, pmd_t *pmdp)
 {
@@ -434,9 +473,10 @@ void pmdp_splitting_flush(struct vm_area_struct *vma,
        set = !test_and_set_bit(_PAGE_BIT_SPLITTING,
                                (unsigned long *)pmdp);
        if (set) {
                /* need tlb flush only to serialize against gup-fast */
                flush_tlb_range(vma, address, address + HPAGE_PMD_SIZE);
+               if (pv_mmu_ops.flush_tlb_others != native_flush_tlb_others)
+                       serialize_against_pte_lookup(vma->vm_mm);
        }
 }

I'm not sure which stable kernel you're targeting (and if you addressed this
with other patches in the series, if this is needed,...) so JFYI.

-- 
Vitaly



* [PATCH v3 0/8] Backported fixes for 4.4 stable tree
@ 2019-12-16 20:45 Ajay Kaher
  2019-12-16 20:45 ` [PATCH v3 1/8] mm: make page ref count overflow check tighter and more explicit Ajay Kaher
                   ` (7 more replies)
  0 siblings, 8 replies; 16+ messages in thread
From: Ajay Kaher @ 2019-12-16 20:45 UTC (permalink / raw)
  To: gregkh
  Cc: torvalds, punit.agrawal, akpm, kirill.shutemov, willy,
	will.deacon, mszeredi, stable, linux-mm, linux-kernel, srivatsab,
	srivatsa, amakhalov, srinidhir, bvikas, anishs, vsirnapalli,
	srostedt, akaher

These patches include a few backported fixes for the 4.4 stable
tree.
I would appreciate it if you could kindly consider including them in the
next release.

Ajay

---

[Changes from v2]:
Merged following changes from Vlastimil's series [1]:
- Added page_ref_count() in [Patch v3 5/8]
- Added missing refcount overflow checks on x86 and s390 [Patch v3 5/8]
- Added [Patch v3 8/8]
- Removed 7aef4172c795 i.e. [Patch v2 3/8]

[1] https://lore.kernel.org/stable/20191108093814.16032-1-vbabka@suse.cz/

---

[PATCH v3 1/8]:
Backporting of upstream commit f958d7b528b1:
mm: make page ref count overflow check tighter and more explicit

[PATCH v3 2/8]:
Backporting of upstream commit 88b1a17dfc3e:
mm: add 'try_get_page()' helper function

[PATCH v3 3/8]:
Backporting of upstream commit a3e328556d41:
mm, gup: remove broken VM_BUG_ON_PAGE compound check for hugepages

[PATCH v3 4/8]:
Backporting of upstream commit d63206ee32b6:
mm, gup: ensure real head page is ref-counted when using hugepages

[PATCH v3 5/8]:
Backporting of upstream commit 8fde12ca79af:
mm: prevent get_user_pages() from overflowing page refcount

[PATCH v3 6/8]:
Backporting of upstream commit 7bf2d1df8082:
pipe: add pipe_buf_get() helper

[PATCH v3 7/8]:
Backporting of upstream commit 15fab63e1e57:
fs: prevent page refcount overflow in pipe_buf_get

[PATCH v3 8/8]:
x86, mm, gup: prevent get_page() race with munmap in paravirt guest



* [PATCH v3 1/8] mm: make page ref count overflow check tighter and more explicit
  2019-12-16 20:45 [PATCH v3 0/8] Backported fixes for 4.4 stable tree Ajay Kaher
@ 2019-12-16 20:45 ` Ajay Kaher
  2019-12-16 20:45 ` [PATCH v3 2/8] mm: add 'try_get_page()' helper function Ajay Kaher
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 16+ messages in thread
From: Ajay Kaher @ 2019-12-16 20:45 UTC (permalink / raw)
  To: gregkh
  Cc: torvalds, punit.agrawal, akpm, kirill.shutemov, willy,
	will.deacon, mszeredi, stable, linux-mm, linux-kernel, srivatsab,
	srivatsa, amakhalov, srinidhir, bvikas, anishs, vsirnapalli,
	srostedt, akaher, Jann Horn, stable

From: Linus Torvalds <torvalds@linux-foundation.org>

commit f958d7b528b1b40c44cfda5eabe2d82760d868c3 upstream.

We have a VM_BUG_ON() to check that the page reference count doesn't
underflow (or get close to overflow) by checking the sign of the count.

That's all fine, but we actually want to allow people to use a "get page
ref unless it's already very high" helper function, and we want that one
to use the sign of the page ref (without triggering this VM_BUG_ON).

Change the VM_BUG_ON to only check for small underflows (or _very_ close
to overflowing), and ignore overflows which have strayed into negative
territory.

Acked-by: Matthew Wilcox <willy@infradead.org>
Cc: Jann Horn <jannh@google.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[ 4.4.y backport notes:
  Ajay: Open-coded atomic refcount access due to missing
  page_ref_count() helper in 4.4.y
  Srivatsa: Added overflow check to get_page_foll() and related code. ]
Signed-off-by: Srivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu>
Signed-off-by: Ajay Kaher <akaher@vmware.com>
---
 include/linux/mm.h | 6 +++++-
 mm/internal.h      | 5 +++--
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index ed653ba..701088e 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -488,6 +488,10 @@ static inline void get_huge_page_tail(struct page *page)
 
 extern bool __get_page_tail(struct page *page);
 
+/* 127: arbitrary random number, small enough to assemble well */
+#define page_ref_zero_or_close_to_overflow(page) \
+	((unsigned int) atomic_read(&page->_count) + 127u <= 127u)
+
 static inline void get_page(struct page *page)
 {
 	if (unlikely(PageTail(page)))
@@ -497,7 +501,7 @@ static inline void get_page(struct page *page)
 	 * Getting a normal page or the head of a compound page
 	 * requires to already have an elevated page->_count.
 	 */
-	VM_BUG_ON_PAGE(atomic_read(&page->_count) <= 0, page);
+	VM_BUG_ON_PAGE(page_ref_zero_or_close_to_overflow(page), page);
 	atomic_inc(&page->_count);
 }
 
diff --git a/mm/internal.h b/mm/internal.h
index f63f439..67015e5 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -81,7 +81,8 @@ static inline void __get_page_tail_foll(struct page *page,
 	 * speculative page access (like in
 	 * page_cache_get_speculative()) on tail pages.
 	 */
-	VM_BUG_ON_PAGE(atomic_read(&compound_head(page)->_count) <= 0, page);
+	VM_BUG_ON_PAGE(page_ref_zero_or_close_to_overflow(compound_head(page)),
+		       page);
 	if (get_page_head)
 		atomic_inc(&compound_head(page)->_count);
 	get_huge_page_tail(page);
@@ -106,7 +107,7 @@ static inline void get_page_foll(struct page *page)
 		 * Getting a normal page or the head of a compound page
 		 * requires to already have an elevated page->_count.
 		 */
-		VM_BUG_ON_PAGE(atomic_read(&page->_count) <= 0, page);
+		VM_BUG_ON_PAGE(page_ref_zero_or_close_to_overflow(page), page);
 		atomic_inc(&page->_count);
 	}
 }
-- 
2.7.4
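
The arithmetic behind page_ref_zero_or_close_to_overflow() can be
checked in isolation. The following is a userspace sketch under the
assumption of a 32-bit int; ref_zero_or_close_to_overflow() and
ref_check_boundaries() are hypothetical names that apply the same
expression to a plain int rather than to page->_count. Adding 127 as
an unsigned value makes the counts -127..0 wrap into the range 0..127,
so a single unsigned comparison catches zero and small underflows while
ignoring large negative values.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Same expression as the macro in the patch, applied to a plain int. */
static bool ref_zero_or_close_to_overflow(int count)
{
	return (unsigned int)count + 127u <= 127u;
}

/* Walk the boundaries of the window that the check flags. */
static void ref_check_boundaries(void)
{
	assert(ref_zero_or_close_to_overflow(0));     /* zero refcount */
	assert(ref_zero_or_close_to_overflow(-1));    /* small underflow */
	assert(ref_zero_or_close_to_overflow(-127));  /* last flagged value */
	assert(!ref_zero_or_close_to_overflow(-128)); /* just outside */
	assert(!ref_zero_or_close_to_overflow(1));    /* normal live page */
	assert(!ref_zero_or_close_to_overflow(INT_MAX));
	assert(!ref_zero_or_close_to_overflow(INT_MIN));
}
```

Values such as INT_MIN are deliberately not flagged: a count that has
overflowed far into negative territory is left for the try_get_page()
style helpers to reject by looking at its sign.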



* [PATCH v3 2/8] mm: add 'try_get_page()' helper function
  2019-12-16 20:45 [PATCH v3 0/8] Backported fixes for 4.4 stable tree Ajay Kaher
  2019-12-16 20:45 ` [PATCH v3 1/8] mm: make page ref count overflow check tighter and more explicit Ajay Kaher
@ 2019-12-16 20:45 ` Ajay Kaher
  2019-12-16 20:45 ` [PATCH v3 3/8] mm, gup: remove broken VM_BUG_ON_PAGE compound check for hugepages Ajay Kaher
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 16+ messages in thread
From: Ajay Kaher @ 2019-12-16 20:45 UTC (permalink / raw)
  To: gregkh
  Cc: torvalds, punit.agrawal, akpm, kirill.shutemov, willy,
	will.deacon, mszeredi, stable, linux-mm, linux-kernel, srivatsab,
	srivatsa, amakhalov, srinidhir, bvikas, anishs, vsirnapalli,
	srostedt, akaher, Jann Horn, stable, Vlastimil Babka

From: Linus Torvalds <torvalds@linux-foundation.org>

commit 88b1a17dfc3ed7728316478fae0f5ad508f50397 upstream.

This is the same as the traditional 'get_page()' function, but instead
of unconditionally incrementing the reference count of the page, it only
does so if the count was "safe".  It returns whether the reference count
was incremented (and is marked __must_check, since the caller obviously
has to be aware of it).

Also like 'get_page()', you can't use this function unless you already
had a reference to the page.  The intent is that you can use this
exactly like get_page(), but in situations where you want to limit the
maximum reference count.

The code currently does an unconditional WARN_ON_ONCE() if we ever hit
the reference count issues (either zero or negative), as a notification
that the conditional non-increment actually happened.

NOTE! The count access for the "safety" check is inherently racy, but
that doesn't matter since the buffer we use is basically half the range
of the reference count (ie we look at the sign of the count).

Acked-by: Matthew Wilcox <willy@infradead.org>
Cc: Jann Horn <jannh@google.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[ 4.4.y backport notes:
  Srivatsa:
  - Adapted try_get_page() to match the get_page()
    implementation in 4.4.y, except for the refcount check.
  - Added try_get_page_foll() which will be needed
    in a subsequent patch. ]
Signed-off-by: Srivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu>
Signed-off-by: Ajay Kaher <akaher@vmware.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 include/linux/mm.h | 12 ++++++++++++
 mm/internal.h      | 23 +++++++++++++++++++++++
 2 files changed, 35 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 701088e..52edaf1 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -505,6 +505,18 @@ static inline void get_page(struct page *page)
 	atomic_inc(&page->_count);
 }
 
+static inline __must_check bool try_get_page(struct page *page)
+{
+	if (unlikely(PageTail(page)))
+		if (likely(__get_page_tail(page)))
+			return true;
+
+	if (WARN_ON_ONCE(atomic_read(&page->_count) <= 0))
+		return false;
+	atomic_inc(&page->_count);
+	return true;
+}
+
 static inline struct page *virt_to_head_page(const void *x)
 {
 	struct page *page = virt_to_page(x);
diff --git a/mm/internal.h b/mm/internal.h
index 67015e5..d83afc9 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -112,6 +112,29 @@ static inline void get_page_foll(struct page *page)
 	}
 }
 
+static inline __must_check bool try_get_page_foll(struct page *page)
+{
+	if (unlikely(PageTail(page))) {
+		if (WARN_ON_ONCE(atomic_read(&compound_head(page)->_count) <= 0))
+			return false;
+		/*
+		 * This is safe only because
+		 * __split_huge_page_refcount() can't run under
+		 * get_page_foll() because we hold the proper PT lock.
+		 */
+		__get_page_tail_foll(page, true);
+	} else {
+		/*
+		 * Getting a normal page or the head of a compound page
+		 * requires to already have an elevated page->_count.
+		 */
+		if (WARN_ON_ONCE(atomic_read(&page->_count) <= 0))
+			return false;
+		atomic_inc(&page->_count);
+	}
+	return true;
+}
+
 extern unsigned long highest_memmap_pfn;
 
 /*
-- 
2.7.4
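
The "only increment if the count is safe" behaviour can be sketched in
userspace as follows. This is an illustration only, not the kernel
implementation: struct fake_page and fake_try_get_page() are
hypothetical names, C11 atomics replace atomic_t, and the tail-page
handling and WARN_ON_ONCE() of the real helper are left out.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page; only the refcount matters here. */
struct fake_page {
	atomic_int refcount;
};

/*
 * Take a reference unless the count is zero or negative.
 * The read and the increment are separate steps, so the check is racy,
 * but the rejected range is half of all int values, which leaves a very
 * wide margin before a racing increment could wrap the counter.
 */
static bool fake_try_get_page(struct fake_page *page)
{
	assert(page != NULL);
	if (atomic_load(&page->refcount) <= 0)
		return false;
	atomic_fetch_add(&page->refcount, 1);
	return true;
}
```

A caller has to check the return value and fail the operation when the
reference was not taken, which is why the kernel helper is marked
__must_check.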



* [PATCH v3 3/8] mm, gup: remove broken VM_BUG_ON_PAGE compound check for hugepages
  2019-12-16 20:45 [PATCH v3 0/8] Backported fixes for 4.4 stable tree Ajay Kaher
  2019-12-16 20:45 ` [PATCH v3 1/8] mm: make page ref count overflow check tighter and more explicit Ajay Kaher
  2019-12-16 20:45 ` [PATCH v3 2/8] mm: add 'try_get_page()' helper function Ajay Kaher
@ 2019-12-16 20:45 ` Ajay Kaher
  2019-12-16 20:45 ` [PATCH v3 4/8] mm, gup: ensure real head page is ref-counted when using hugepages Ajay Kaher
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 16+ messages in thread
From: Ajay Kaher @ 2019-12-16 20:45 UTC (permalink / raw)
  To: gregkh
  Cc: torvalds, punit.agrawal, akpm, kirill.shutemov, willy,
	will.deacon, mszeredi, stable, linux-mm, linux-kernel, srivatsab,
	srivatsa, amakhalov, srinidhir, bvikas, anishs, vsirnapalli,
	srostedt, akaher, Aneesh Kumar K . V, Catalin Marinas,
	Naoya Horiguchi, Mark Rutland, Hillf Danton, Michal Hocko,
	Mike Kravetz, Vlastimil Babka

From: Will Deacon <will.deacon@arm.com>

commit a3e328556d41bb61c55f9dfcc62d6a826ea97b85 upstream.

When operating on hugepages with DEBUG_VM enabled, the GUP code checks
the compound head for each tail page prior to calling
page_cache_add_speculative.  This is broken, because on the fast-GUP
path (where we don't hold any page table locks) we can be racing with a
concurrent invocation of split_huge_page_to_list.

split_huge_page_to_list deals with this race by using page_ref_freeze to
freeze the page and force concurrent GUPs to fail whilst the component
pages are modified.  This modification includes clearing the
compound_head field for the tail pages, so checking this prior to a
successful call to page_cache_add_speculative can lead to false
positives: In fact, page_cache_add_speculative *already* has this check
once the page refcount has been successfully updated, so we can simply
remove the broken calls to VM_BUG_ON_PAGE.

Link: http://lkml.kernel.org/r/20170522133604.11392-2-punit.agrawal@arm.com
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
Acked-by: Steve Capper <steve.capper@arm.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Srivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu>
Signed-off-by: Ajay Kaher <akaher@vmware.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/gup.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index 45c544b..6e7cfaa 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1136,7 +1136,6 @@ static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr,
 	page = head + ((addr & ~PMD_MASK) >> PAGE_SHIFT);
 	tail = page;
 	do {
-		VM_BUG_ON_PAGE(compound_head(page) != head, page);
 		pages[*nr] = page;
 		(*nr)++;
 		page++;
@@ -1183,7 +1182,6 @@ static int gup_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr,
 	page = head + ((addr & ~PUD_MASK) >> PAGE_SHIFT);
 	tail = page;
 	do {
-		VM_BUG_ON_PAGE(compound_head(page) != head, page);
 		pages[*nr] = page;
 		(*nr)++;
 		page++;
@@ -1226,7 +1224,6 @@ static int gup_huge_pgd(pgd_t orig, pgd_t *pgdp, unsigned long addr,
 	page = head + ((addr & ~PGDIR_MASK) >> PAGE_SHIFT);
 	tail = page;
 	do {
-		VM_BUG_ON_PAGE(compound_head(page) != head, page);
 		pages[*nr] = page;
 		(*nr)++;
 		page++;
-- 
2.7.4



* [PATCH v3 4/8] mm, gup: ensure real head page is ref-counted when using hugepages
  2019-12-16 20:45 [PATCH v3 0/8] Backported fixes for 4.4 stable tree Ajay Kaher
                   ` (2 preceding siblings ...)
  2019-12-16 20:45 ` [PATCH v3 3/8] mm, gup: remove broken VM_BUG_ON_PAGE compound check for hugepages Ajay Kaher
@ 2019-12-16 20:45 ` Ajay Kaher
  2019-12-16 20:45 ` [PATCH v3 5/8] mm: prevent get_user_pages() from overflowing page refcount Ajay Kaher
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 16+ messages in thread
From: Ajay Kaher @ 2019-12-16 20:45 UTC (permalink / raw)
  To: gregkh
  Cc: torvalds, punit.agrawal, akpm, kirill.shutemov, willy,
	will.deacon, mszeredi, stable, linux-mm, linux-kernel, srivatsab,
	srivatsa, amakhalov, srinidhir, bvikas, anishs, vsirnapalli,
	srostedt, akaher, Michal Hocko, Aneesh Kumar K . V,
	Catalin Marinas, Naoya Horiguchi, Mark Rutland, Hillf Danton,
	Mike Kravetz, Vlastimil Babka

From: Punit Agrawal <punit.agrawal@arm.com>

commit d63206ee32b6e64b0e12d46e5d6004afd9913713 upstream.

When speculatively taking references to a hugepage using
page_cache_add_speculative() in gup_huge_pmd(), it is assumed that the
page returned by pmd_page() is the head page.  Although normally true,
this assumption doesn't hold when the hugepage comprises successive
page table entries such as when using contiguous bit on arm64 at PTE or
PMD levels.

This can be addressed by ensuring that the page passed to
page_cache_add_speculative() is the real head or by de-referencing the
head page within the function.

We take the first approach to keep the usage pattern aligned with
page_cache_get_speculative() where users already pass the appropriate
page, i.e., the de-referenced head.

Apply the same logic to fix gup_huge_[pud|pgd]() as well.
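
As a rough user-space illustration of the point above (not kernel code): the
page found through a table entry may be a tail page, so the references have to
be added to the compound head instead. The names demo_page,
demo_compound_head() and demo_add_refs() are hypothetical stand-ins for
struct page, compound_head() and page_cache_add_speculative().

```c
#include <stddef.h>

/* Hypothetical stand-in for struct page. */
struct demo_page {
	struct demo_page *head;	/* NULL for a head page, else the head */
	int count;		/* reference count, kept on the head only */
};

/* Resolve any page of a compound page to its head. */
static struct demo_page *demo_compound_head(struct demo_page *page)
{
	return page->head ? page->head : page;
}

/*
 * Add refs to the real head, whichever page the table entry pointed at.
 * Returns the head so the caller can later drop the references on it.
 */
static struct demo_page *demo_add_refs(struct demo_page *entry_page, int refs)
{
	struct demo_page *head = demo_compound_head(entry_page);

	head->count += refs;
	return head;
}
```

Calling demo_add_refs() on a tail page leaves that tail's own count untouched
and raises the head's count, which is the behaviour the patch aims for.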

[punit.agrawal@arm.com: fix arm64 ltp failure]
  Link: http://lkml.kernel.org/r/20170619170145.25577-5-punit.agrawal@arm.com
Link: http://lkml.kernel.org/r/20170522133604.11392-3-punit.agrawal@arm.com
Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
Acked-by: Steve Capper <steve.capper@arm.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ajay Kaher <akaher@vmware.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/gup.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index 6e7cfaa..fae4d1e 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1132,8 +1132,7 @@ static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr,
 		return 0;
 
 	refs = 0;
-	head = pmd_page(orig);
-	page = head + ((addr & ~PMD_MASK) >> PAGE_SHIFT);
+	page = pmd_page(orig) + ((addr & ~PMD_MASK) >> PAGE_SHIFT);
 	tail = page;
 	do {
 		pages[*nr] = page;
@@ -1142,6 +1141,7 @@ static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr,
 		refs++;
 	} while (addr += PAGE_SIZE, addr != end);
 
+	head = compound_head(pmd_page(orig));
 	if (!page_cache_add_speculative(head, refs)) {
 		*nr -= refs;
 		return 0;
@@ -1178,8 +1178,7 @@ static int gup_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr,
 		return 0;
 
 	refs = 0;
-	head = pud_page(orig);
-	page = head + ((addr & ~PUD_MASK) >> PAGE_SHIFT);
+	page = pud_page(orig) + ((addr & ~PUD_MASK) >> PAGE_SHIFT);
 	tail = page;
 	do {
 		pages[*nr] = page;
@@ -1188,6 +1187,7 @@ static int gup_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr,
 		refs++;
 	} while (addr += PAGE_SIZE, addr != end);
 
+	head = compound_head(pud_page(orig));
 	if (!page_cache_add_speculative(head, refs)) {
 		*nr -= refs;
 		return 0;
@@ -1220,8 +1220,7 @@ static int gup_huge_pgd(pgd_t orig, pgd_t *pgdp, unsigned long addr,
 		return 0;
 
 	refs = 0;
-	head = pgd_page(orig);
-	page = head + ((addr & ~PGDIR_MASK) >> PAGE_SHIFT);
+	page = pgd_page(orig) + ((addr & ~PGDIR_MASK) >> PAGE_SHIFT);
 	tail = page;
 	do {
 		pages[*nr] = page;
@@ -1230,6 +1229,7 @@ static int gup_huge_pgd(pgd_t orig, pgd_t *pgdp, unsigned long addr,
 		refs++;
 	} while (addr += PAGE_SIZE, addr != end);
 
+	head = compound_head(pgd_page(orig));
 	if (!page_cache_add_speculative(head, refs)) {
 		*nr -= refs;
 		return 0;
-- 
2.7.4



* [PATCH v3 5/8] mm: prevent get_user_pages() from overflowing page refcount
  2019-12-16 20:45 [PATCH v3 0/8] Backported fixes for 4.4 stable tree Ajay Kaher
                   ` (3 preceding siblings ...)
  2019-12-16 20:45 ` [PATCH v3 4/8] mm, gup: ensure real head page is ref-counted when using hugepages Ajay Kaher
@ 2019-12-16 20:45 ` Ajay Kaher
  2019-12-16 20:45 ` [PATCH v3 6/8] pipe: add pipe_buf_get() helper Ajay Kaher
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 16+ messages in thread
From: Ajay Kaher @ 2019-12-16 20:45 UTC (permalink / raw)
  To: gregkh
  Cc: torvalds, punit.agrawal, akpm, kirill.shutemov, willy,
	will.deacon, mszeredi, stable, linux-mm, linux-kernel, srivatsab,
	srivatsa, amakhalov, srinidhir, bvikas, anishs, vsirnapalli,
	srostedt, akaher, stable, Vlastimil Babka

From: Linus Torvalds <torvalds@linux-foundation.org>

commit 8fde12ca79aff9b5ba951fce1a2641901b8d8e64 upstream.

If the page refcount wraps around past zero, it will be freed while
there are still four billion references to it.  One of the possible
avenues for an attacker to try to make this happen is by doing direct IO
on a page multiple times.  This patch makes get_user_pages() refuse to
take a new page reference if there are already more than two billion
references to the page.
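
A minimal user-space sketch of the check described above, assuming a 32-bit
counter. demo_page and demo_try_get_page() are hypothetical names; the real
helper works on an atomic counter and also warns. A count that would read as
zero or negative when viewed as a signed int is treated as "too many
references already".

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct page; only the refcount matters here. */
struct demo_page {
	unsigned int count;
};

/*
 * Refuse a new reference when the count is zero (page already free) or
 * above INT_MAX (more than about two billion references). This matches
 * a "signed value <= 0" test without relying on signed overflow.
 */
static bool demo_try_get_page(struct demo_page *page)
{
	if (page->count == 0 || page->count > (unsigned int)INT_MAX)
		return false;
	page->count++;
	return true;
}
```

Refusing at the two-billion mark leaves another two billion increments of
headroom before the counter could actually wrap to zero.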

Reported-by: Jann Horn <jannh@google.com>
Acked-by: Matthew Wilcox <willy@infradead.org>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[ 4.4.y backport notes:
  Ajay:     - Added local variable 'err' within follow_hugetlb_page()
              from 2be7cfed995e, to resolve compilation error
            - Added page_ref_count()
            - Added missing refcount overflow checks on x86 and s390
              (Vlastimil, thanks for this change)
  Srivatsa: - Replaced call to get_page_foll() with try_get_page_foll() ]
Signed-off-by: Srivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu>
Signed-off-by: Ajay Kaher <akaher@vmware.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 arch/s390/mm/gup.c |  6 ++++--
 arch/x86/mm/gup.c  |  9 ++++++++-
 include/linux/mm.h |  5 +++++
 mm/gup.c           | 42 +++++++++++++++++++++++++++++++++---------
 mm/hugetlb.c       | 16 +++++++++++++++-
 5 files changed, 65 insertions(+), 13 deletions(-)

diff --git a/arch/s390/mm/gup.c b/arch/s390/mm/gup.c
index 7ad41be..4f7dad3 100644
--- a/arch/s390/mm/gup.c
+++ b/arch/s390/mm/gup.c
@@ -37,7 +37,8 @@ static inline int gup_pte_range(pmd_t *pmdp, pmd_t pmd, unsigned long addr,
 			return 0;
 		VM_BUG_ON(!pfn_valid(pte_pfn(pte)));
 		page = pte_page(pte);
-		if (!page_cache_get_speculative(page))
+		if (WARN_ON_ONCE(page_ref_count(page) < 0)
+		    || !page_cache_get_speculative(page))
 			return 0;
 		if (unlikely(pte_val(pte) != pte_val(*ptep))) {
 			put_page(page);
@@ -76,7 +77,8 @@ static inline int gup_huge_pmd(pmd_t *pmdp, pmd_t pmd, unsigned long addr,
 		refs++;
 	} while (addr += PAGE_SIZE, addr != end);
 
-	if (!page_cache_add_speculative(head, refs)) {
+	if (WARN_ON_ONCE(page_ref_count(head) < 0)
+	    || !page_cache_add_speculative(head, refs)) {
 		*nr -= refs;
 		return 0;
 	}
diff --git a/arch/x86/mm/gup.c b/arch/x86/mm/gup.c
index 7d2542a..6612d53 100644
--- a/arch/x86/mm/gup.c
+++ b/arch/x86/mm/gup.c
@@ -95,7 +95,10 @@ static noinline int gup_pte_range(pmd_t pmd, unsigned long addr,
 		}
 		VM_BUG_ON(!pfn_valid(pte_pfn(pte)));
 		page = pte_page(pte);
-		get_page(page);
+		if (unlikely(!try_get_page(page))) {
+			pte_unmap(ptep);
+			return 0;
+		}
 		SetPageReferenced(page);
 		pages[*nr] = page;
 		(*nr)++;
@@ -132,6 +135,8 @@ static noinline int gup_huge_pmd(pmd_t pmd, unsigned long addr,
 
 	refs = 0;
 	head = pmd_page(pmd);
+	if (WARN_ON_ONCE(page_ref_count(head) <= 0))
+		return 0;
 	page = head + ((addr & ~PMD_MASK) >> PAGE_SHIFT);
 	do {
 		VM_BUG_ON_PAGE(compound_head(page) != head, page);
@@ -208,6 +213,8 @@ static noinline int gup_huge_pud(pud_t pud, unsigned long addr,
 
 	refs = 0;
 	head = pud_page(pud);
+	if (WARN_ON_ONCE(page_ref_count(head) <= 0))
+		return 0;
 	page = head + ((addr & ~PUD_MASK) >> PAGE_SHIFT);
 	do {
 		VM_BUG_ON_PAGE(compound_head(page) != head, page);
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 52edaf1..31a4500 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -488,6 +488,11 @@ static inline void get_huge_page_tail(struct page *page)
 
 extern bool __get_page_tail(struct page *page);
 
+static inline int page_ref_count(struct page *page)
+{
+	return atomic_read(&page->_count);
+}
+
 /* 127: arbitrary random number, small enough to assemble well */
 #define page_ref_zero_or_close_to_overflow(page) \
 	((unsigned int) atomic_read(&page->_count) + 127u <= 127u)
diff --git a/mm/gup.c b/mm/gup.c
index 71e9d00..4c58578 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -126,8 +126,12 @@ retry:
 		}
 	}
 
-	if (flags & FOLL_GET)
-		get_page_foll(page);
+	if (flags & FOLL_GET) {
+		if (unlikely(!try_get_page_foll(page))) {
+			page = ERR_PTR(-ENOMEM);
+			goto out;
+		}
+	}
 	if (flags & FOLL_TOUCH) {
 		if ((flags & FOLL_WRITE) &&
 		    !pte_dirty(pte) && !PageDirty(page))
@@ -289,7 +293,10 @@ static int get_gate_page(struct mm_struct *mm, unsigned long address,
 			goto unmap;
 		*page = pte_page(*pte);
 	}
-	get_page(*page);
+	if (unlikely(!try_get_page(*page))) {
+		ret = -ENOMEM;
+		goto unmap;
+	}
 out:
 	ret = 0;
 unmap:
@@ -1053,6 +1060,20 @@ struct page *get_dump_page(unsigned long addr)
  */
 #ifdef CONFIG_HAVE_GENERIC_RCU_GUP
 
+/*
+ * Return the compound head page with ref appropriately incremented,
+ * or NULL if that failed.
+ */
+static inline struct page *try_get_compound_head(struct page *page, int refs)
+{
+	struct page *head = compound_head(page);
+	if (WARN_ON_ONCE(atomic_read(&head->_count) < 0))
+		return NULL;
+	if (unlikely(!page_cache_add_speculative(head, refs)))
+		return NULL;
+	return head;
+}
+
 #ifdef __HAVE_ARCH_PTE_SPECIAL
 static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
 			 int write, struct page **pages, int *nr)
@@ -1083,6 +1104,9 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
 		VM_BUG_ON(!pfn_valid(pte_pfn(pte)));
 		page = pte_page(pte);
 
+		if (WARN_ON_ONCE(page_ref_count(page) < 0))
+			goto pte_unmap;
+
 		if (!page_cache_get_speculative(page))
 			goto pte_unmap;
 
@@ -1139,8 +1163,8 @@ static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr,
 		refs++;
 	} while (addr += PAGE_SIZE, addr != end);
 
-	head = compound_head(pmd_page(orig));
-	if (!page_cache_add_speculative(head, refs)) {
+	head = try_get_compound_head(pmd_page(orig), refs);
+	if (!head) {
 		*nr -= refs;
 		return 0;
 	}
@@ -1185,8 +1209,8 @@ static int gup_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr,
 		refs++;
 	} while (addr += PAGE_SIZE, addr != end);
 
-	head = compound_head(pud_page(orig));
-	if (!page_cache_add_speculative(head, refs)) {
+	head = try_get_compound_head(pud_page(orig), refs);
+	if (!head) {
 		*nr -= refs;
 		return 0;
 	}
@@ -1227,8 +1251,8 @@ static int gup_huge_pgd(pgd_t orig, pgd_t *pgdp, unsigned long addr,
 		refs++;
 	} while (addr += PAGE_SIZE, addr != end);
 
-	head = compound_head(pgd_page(orig));
-	if (!page_cache_add_speculative(head, refs)) {
+	head = try_get_compound_head(pgd_page(orig), refs);
+	if (!head) {
 		*nr -= refs;
 		return 0;
 	}
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index fd932e7..3a1501e 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3886,6 +3886,7 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
 	unsigned long vaddr = *position;
 	unsigned long remainder = *nr_pages;
 	struct hstate *h = hstate_vma(vma);
+	int err = -EFAULT;
 
 	while (vaddr < vma->vm_end && remainder) {
 		pte_t *pte;
@@ -3957,6 +3958,19 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
 
 		pfn_offset = (vaddr & ~huge_page_mask(h)) >> PAGE_SHIFT;
 		page = pte_page(huge_ptep_get(pte));
+
+		/*
+		 * Instead of doing 'try_get_page_foll()' below in the same_page
+		 * loop, just check the count once here.
+		 */
+		if (unlikely(page_count(page) <= 0)) {
+			if (pages) {
+				spin_unlock(ptl);
+				remainder = 0;
+				err = -ENOMEM;
+				break;
+			}
+		}
 same_page:
 		if (pages) {
 			pages[i] = mem_map_offset(page, pfn_offset);
@@ -3983,7 +3997,7 @@ same_page:
 	*nr_pages = remainder;
 	*position = vaddr;
 
-	return i ? i : -EFAULT;
+	return i ? i : err;
 }
 
 unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
-- 
2.7.4



* [PATCH v3 6/8] pipe: add pipe_buf_get() helper
  2019-12-16 20:45 [PATCH v3 0/8] Backported fixes for 4.4 stable tree Ajay Kaher
                   ` (4 preceding siblings ...)
  2019-12-16 20:45 ` [PATCH v3 5/8] mm: prevent get_user_pages() from overflowing page refcount Ajay Kaher
@ 2019-12-16 20:45 ` Ajay Kaher
  2019-12-16 20:45 ` [PATCH v3 7/8] fs: prevent page refcount overflow in pipe_buf_get Ajay Kaher
  2019-12-16 20:45 ` [PATCH v3 8/8] x86, mm, gup: prevent get_page() race with munmap in paravirt guest Ajay Kaher
  7 siblings, 0 replies; 16+ messages in thread
From: Ajay Kaher @ 2019-12-16 20:45 UTC (permalink / raw)
  To: gregkh
  Cc: torvalds, punit.agrawal, akpm, kirill.shutemov, willy,
	will.deacon, mszeredi, stable, linux-mm, linux-kernel, srivatsab,
	srivatsa, amakhalov, srinidhir, bvikas, anishs, vsirnapalli,
	srostedt, akaher, Al Viro, Vlastimil Babka

From: Miklos Szeredi <mszeredi@redhat.com>

commit 7bf2d1df80822ec056363627e2014990f068f7aa upstream.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Ajay Kaher <akaher@vmware.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 fs/fuse/dev.c             |  2 +-
 fs/splice.c               |  4 ++--
 include/linux/pipe_fs_i.h | 11 +++++++++++
 3 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index f5d2d23..36a5df9 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -2052,7 +2052,7 @@ static ssize_t fuse_dev_splice_write(struct pipe_inode_info *pipe,
 			pipe->curbuf = (pipe->curbuf + 1) & (pipe->buffers - 1);
 			pipe->nrbufs--;
 		} else {
-			ibuf->ops->get(pipe, ibuf);
+			pipe_buf_get(pipe, ibuf);
 			*obuf = *ibuf;
 			obuf->flags &= ~PIPE_BUF_FLAG_GIFT;
 			obuf->len = rem;
diff --git a/fs/splice.c b/fs/splice.c
index 8398974..fde1263 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -1876,7 +1876,7 @@ retry:
 			 * Get a reference to this pipe buffer,
 			 * so we can copy the contents over.
 			 */
-			ibuf->ops->get(ipipe, ibuf);
+			pipe_buf_get(ipipe, ibuf);
 			*obuf = *ibuf;
 
 			/*
@@ -1948,7 +1948,7 @@ static int link_pipe(struct pipe_inode_info *ipipe,
 		 * Get a reference to this pipe buffer,
 		 * so we can copy the contents over.
 		 */
-		ibuf->ops->get(ipipe, ibuf);
+		pipe_buf_get(ipipe, ibuf);
 
 		obuf = opipe->bufs + nbuf;
 		*obuf = *ibuf;
diff --git a/include/linux/pipe_fs_i.h b/include/linux/pipe_fs_i.h
index 24f5470..10876f3 100644
--- a/include/linux/pipe_fs_i.h
+++ b/include/linux/pipe_fs_i.h
@@ -115,6 +115,17 @@ struct pipe_buf_operations {
 	void (*get)(struct pipe_inode_info *, struct pipe_buffer *);
 };
 
+/**
+ * pipe_buf_get - get a reference to a pipe_buffer
+ * @pipe:	the pipe that the buffer belongs to
+ * @buf:	the buffer to get a reference to
+ */
+static inline void pipe_buf_get(struct pipe_inode_info *pipe,
+				struct pipe_buffer *buf)
+{
+	buf->ops->get(pipe, buf);
+}
+
 /* Differs from PIPE_BUF in that PIPE_SIZE is the length of the actual
    memory allocation, whereas PIPE_BUF makes atomicity guarantees.  */
 #define PIPE_SIZE		PAGE_SIZE
-- 
2.7.4



* [PATCH v3 7/8] fs: prevent page refcount overflow in pipe_buf_get
  2019-12-16 20:45 [PATCH v3 0/8] Backported fixes for 4.4 stable tree Ajay Kaher
                   ` (5 preceding siblings ...)
  2019-12-16 20:45 ` [PATCH v3 6/8] pipe: add pipe_buf_get() helper Ajay Kaher
@ 2019-12-16 20:45 ` Ajay Kaher
  2019-12-16 20:45 ` [PATCH v3 8/8] x86, mm, gup: prevent get_page() race with munmap in paravirt guest Ajay Kaher
  7 siblings, 0 replies; 16+ messages in thread
From: Ajay Kaher @ 2019-12-16 20:45 UTC (permalink / raw)
  To: gregkh
  Cc: torvalds, punit.agrawal, akpm, kirill.shutemov, willy,
	will.deacon, mszeredi, stable, linux-mm, linux-kernel, srivatsab,
	srivatsa, amakhalov, srinidhir, bvikas, anishs, vsirnapalli,
	srostedt, akaher, stable, Vlastimil Babka

From: Matthew Wilcox <willy@infradead.org>

commit 15fab63e1e57be9fdb5eec1bbc5916e9825e9acb upstream.

Change pipe_buf_get() to return a bool indicating whether it succeeded
in raising the refcount of the page (if the thing in the pipe is a page).
This removes another mechanism for overflowing the page refcount.  All
callers converted to handle a failure.
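
The caller-side pattern can be sketched in user space roughly as follows. All
demo_* names and DEMO_ERR are hypothetical; the kernel callers return -EFAULT
and operate on struct pipe_buffer. The get operation reports failure instead
of silently overflowing, and the copy loop stops at the first buffer it
cannot pin.

```c
#include <limits.h>
#include <stdbool.h>

#define DEMO_ERR (-1)	/* hypothetical error code for this sketch */

struct demo_buf;

struct demo_buf_ops {
	/* Returns true if a reference was taken. */
	bool (*get)(struct demo_buf *buf);
};

struct demo_buf {
	const struct demo_buf_ops *ops;
	int ref;
};

/* Refuse to go past half the counter range. */
static bool demo_counted_get(struct demo_buf *buf)
{
	if (buf->ref > INT_MAX / 2)
		return false;
	buf->ref++;
	return true;
}

static const struct demo_buf_ops demo_counted_ops = {
	.get = demo_counted_get,
};

static bool demo_buf_get(struct demo_buf *buf)
{
	return buf->ops->get(buf);
}

/*
 * Duplicate up to n buffers from in[] to out[]. Stops at the first
 * failed get; returns the number copied, or DEMO_ERR if none were.
 */
static int demo_link(struct demo_buf *in, struct demo_buf *out, int n)
{
	int ret = 0;

	for (int i = 0; i < n; i++) {
		if (!demo_buf_get(&in[i])) {
			if (ret == 0)
				ret = DEMO_ERR;
			break;
		}
		out[i] = in[i];
		ret++;
	}
	return ret;
}
```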

Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Matthew Wilcox <willy@infradead.org>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[ 4.4.y backport notes:
  Regarding the change in generic_pipe_buf_get(), note that
  page_cache_get() is the same as get_page(). See mainline commit
  09cbfeaf1a5a6 "mm, fs: get rid of PAGE_CACHE_* and
  page_cache_{get,release} macros" for context. ]
Signed-off-by: Ajay Kaher <akaher@vmware.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 fs/fuse/dev.c             | 12 ++++++------
 fs/pipe.c                 |  4 ++--
 fs/splice.c               | 12 ++++++++++--
 include/linux/pipe_fs_i.h | 10 ++++++----
 kernel/trace/trace.c      |  6 +++++-
 5 files changed, 29 insertions(+), 15 deletions(-)

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 36a5df9..16891f5 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -2031,10 +2031,8 @@ static ssize_t fuse_dev_splice_write(struct pipe_inode_info *pipe,
 		rem += pipe->bufs[(pipe->curbuf + idx) & (pipe->buffers - 1)].len;
 
 	ret = -EINVAL;
-	if (rem < len) {
-		pipe_unlock(pipe);
-		goto out;
-	}
+	if (rem < len)
+		goto out_free;
 
 	rem = len;
 	while (rem) {
@@ -2052,7 +2050,9 @@ static ssize_t fuse_dev_splice_write(struct pipe_inode_info *pipe,
 			pipe->curbuf = (pipe->curbuf + 1) & (pipe->buffers - 1);
 			pipe->nrbufs--;
 		} else {
-			pipe_buf_get(pipe, ibuf);
+			if (!pipe_buf_get(pipe, ibuf))
+				goto out_free;
+
 			*obuf = *ibuf;
 			obuf->flags &= ~PIPE_BUF_FLAG_GIFT;
 			obuf->len = rem;
@@ -2075,13 +2075,13 @@ static ssize_t fuse_dev_splice_write(struct pipe_inode_info *pipe,
 	ret = fuse_dev_do_write(fud, &cs, len);
 
 	pipe_lock(pipe);
+out_free:
 	for (idx = 0; idx < nbuf; idx++) {
 		struct pipe_buffer *buf = &bufs[idx];
 		buf->ops->release(pipe, buf);
 	}
 	pipe_unlock(pipe);
 
-out:
 	kfree(bufs);
 	return ret;
 }
diff --git a/fs/pipe.c b/fs/pipe.c
index 1e7263b..6534470 100644
--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -178,9 +178,9 @@ EXPORT_SYMBOL(generic_pipe_buf_steal);
  *	in the tee() system call, when we duplicate the buffers in one
  *	pipe into another.
  */
-void generic_pipe_buf_get(struct pipe_inode_info *pipe, struct pipe_buffer *buf)
+bool generic_pipe_buf_get(struct pipe_inode_info *pipe, struct pipe_buffer *buf)
 {
-	page_cache_get(buf->page);
+	return try_get_page(buf->page);
 }
 EXPORT_SYMBOL(generic_pipe_buf_get);
 
diff --git a/fs/splice.c b/fs/splice.c
index fde1263..57ccc58 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -1876,7 +1876,11 @@ retry:
 			 * Get a reference to this pipe buffer,
 			 * so we can copy the contents over.
 			 */
-			pipe_buf_get(ipipe, ibuf);
+			if (!pipe_buf_get(ipipe, ibuf)) {
+				if (ret == 0)
+					ret = -EFAULT;
+				break;
+			}
 			*obuf = *ibuf;
 
 			/*
@@ -1948,7 +1952,11 @@ static int link_pipe(struct pipe_inode_info *ipipe,
 		 * Get a reference to this pipe buffer,
 		 * so we can copy the contents over.
 		 */
-		pipe_buf_get(ipipe, ibuf);
+		if (!pipe_buf_get(ipipe, ibuf)) {
+			if (ret == 0)
+				ret = -EFAULT;
+			break;
+		}
 
 		obuf = opipe->bufs + nbuf;
 		*obuf = *ibuf;
diff --git a/include/linux/pipe_fs_i.h b/include/linux/pipe_fs_i.h
index 10876f3..0b28b65 100644
--- a/include/linux/pipe_fs_i.h
+++ b/include/linux/pipe_fs_i.h
@@ -112,18 +112,20 @@ struct pipe_buf_operations {
 	/*
 	 * Get a reference to the pipe buffer.
 	 */
-	void (*get)(struct pipe_inode_info *, struct pipe_buffer *);
+	bool (*get)(struct pipe_inode_info *, struct pipe_buffer *);
 };
 
 /**
  * pipe_buf_get - get a reference to a pipe_buffer
  * @pipe:	the pipe that the buffer belongs to
  * @buf:	the buffer to get a reference to
+ *
+ * Return: %true if the reference was successfully obtained.
  */
-static inline void pipe_buf_get(struct pipe_inode_info *pipe,
+static inline __must_check bool pipe_buf_get(struct pipe_inode_info *pipe,
 				struct pipe_buffer *buf)
 {
-	buf->ops->get(pipe, buf);
+	return buf->ops->get(pipe, buf);
 }
 
 /* Differs from PIPE_BUF in that PIPE_SIZE is the length of the actual
@@ -148,7 +150,7 @@ struct pipe_inode_info *alloc_pipe_info(void);
 void free_pipe_info(struct pipe_inode_info *);
 
 /* Generic pipe buffer ops functions */
-void generic_pipe_buf_get(struct pipe_inode_info *, struct pipe_buffer *);
+bool generic_pipe_buf_get(struct pipe_inode_info *, struct pipe_buffer *);
 int generic_pipe_buf_confirm(struct pipe_inode_info *, struct pipe_buffer *);
 int generic_pipe_buf_steal(struct pipe_inode_info *, struct pipe_buffer *);
 void generic_pipe_buf_release(struct pipe_inode_info *, struct pipe_buffer *);
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index ae00e68..7fe8d04 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -5731,12 +5731,16 @@ static void buffer_pipe_buf_release(struct pipe_inode_info *pipe,
 	buf->private = 0;
 }
 
-static void buffer_pipe_buf_get(struct pipe_inode_info *pipe,
+static bool buffer_pipe_buf_get(struct pipe_inode_info *pipe,
 				struct pipe_buffer *buf)
 {
 	struct buffer_ref *ref = (struct buffer_ref *)buf->private;
 
+	if (ref->ref > INT_MAX/2)
+		return false;
+
 	ref->ref++;
+	return true;
 }
 
 /* Pipe buffer operations for a buffer. */
-- 
2.7.4



* [PATCH v3 8/8] x86, mm, gup: prevent get_page() race with munmap in paravirt guest
  2019-12-16 20:45 [PATCH v3 0/8] Backported fixes for 4.4 stable tree Ajay Kaher
                   ` (6 preceding siblings ...)
  2019-12-16 20:45 ` [PATCH v3 7/8] fs: prevent page refcount overflow in pipe_buf_get Ajay Kaher
@ 2019-12-16 20:45 ` Ajay Kaher
  2019-12-16 13:04   ` Peter Zijlstra
  7 siblings, 1 reply; 16+ messages in thread
From: Ajay Kaher @ 2019-12-16 20:45 UTC (permalink / raw)
  To: gregkh, stable
  Cc: torvalds, punit.agrawal, akpm, kirill.shutemov, willy,
	will.deacon, mszeredi, linux-mm, linux-kernel, srivatsab,
	srivatsa, amakhalov, srinidhir, bvikas, anishs, vsirnapalli,
	srostedt, akaher, Vlastimil Babka, Oscar Salvador,
	Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Juergen Gross,
	Vitaly Kuznetsov, Borislav Petkov, Dave Hansen, Andy Lutomirski

From: Vlastimil Babka <vbabka@suse.cz>

The x86 version of get_user_pages_fast() relies on disabled interrupts to
synchronize gup_pte_range(), between gup_get_pte(ptep) and get_page(), against
a parallel munmap. The munmap side nulls the pte, then flushes TLBs, then
releases the page. As the TLB flush is done synchronously via IPI, disabling
interrupts blocks the page release, and get_page(), which assumes an existing
reference on the page, is thus safe.
However, when the TLB flush is done by a hypercall, e.g. in a Xen PV guest,
disabled interrupts block nothing, and get_page() can succeed on a page
that was already freed or even reused.

We have recently seen this happen with our 4.4 and 4.12 based kernels, with
userspace (java) that exits a thread, where mm_release() performs a futex_wake()
on tsk->clear_child_tid, and another thread in parallel unmaps the page that
tsk->clear_child_tid points to. The spurious get_page() succeeds, but futex code
immediately releases the page again, while it's already on a freelist. Symptoms
include a bad page state warning, general protection faults accessing a poisoned
list prev/next pointer in the freelist, or free page pcplists of two cpus joined
together in a single list. Oscar has also reproduced this scenario, with a
patch inserting delays before the get_page() to make the race window larger.

Fix this by removing the dependency on TLB flush interrupts the same way as the
generic get_user_pages_fast() code, by using page_cache_get_speculative() and
revalidating the PTE contents after pinning the page. Mainline is safe since
4.13 where the x86 gup code was removed in favor of the common code. Accessing
the page table itself safely also relies on disabled interrupts and TLB flush
IPIs that don't happen with hypercalls, which was acknowledged in commit
9e52fc2b50de ("x86/mm: Enable RCU based page table freeing
(CONFIG_HAVE_RCU_TABLE_FREE=y)"). That commit with follow-ups should also be
backported for full safety, although our reproducer didn't hit a problem
without that backport.
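
The "pin, then revalidate" idea can be modelled in user space with C11
atomics, as a sketch only. demo_page, demo_pte_t and the demo_* functions are
hypothetical, the "PTE" is simply a pointer that becomes NULL when unmapped,
and the default sequentially consistent atomics stand in for the kernel's
ordering rules.

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_page {
	atomic_int count;	/* 0 means the page has been freed */
};

/* Hypothetical PTE: pointer to the mapped page, NULL once unmapped. */
typedef _Atomic(struct demo_page *) demo_pte_t;

/* Take a reference only if the count is still non-zero. */
static bool demo_get_speculative(struct demo_page *page)
{
	int old = atomic_load(&page->count);

	while (old > 0 && old < INT_MAX) {
		/* On failure, old is reloaded and the loop re-checks it. */
		if (atomic_compare_exchange_weak(&page->count, &old, old + 1))
			return true;
	}
	return false;
}

static void demo_put(struct demo_page *page)
{
	atomic_fetch_sub(&page->count, 1);
}

/* Returns the pinned page, or NULL if the fast path must give up. */
static struct demo_page *demo_gup_one(demo_pte_t *ptep)
{
	struct demo_page *page = atomic_load(ptep);

	if (!page)
		return NULL;
	if (!demo_get_speculative(page))
		return NULL;
	/* Re-read the entry: if it changed meanwhile, undo and bail out. */
	if (atomic_load(ptep) != page) {
		demo_put(page);
		return NULL;
	}
	return page;
}
```

Because a freed page has a zero count, the speculative get fails on it
instead of resurrecting it, and the second read of the entry catches an unmap
that raced with the pin.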

Reproduced-by: Oscar Salvador <osalvador@suse.de>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 arch/x86/mm/gup.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/arch/x86/mm/gup.c b/arch/x86/mm/gup.c
index 6612d532e42e..6379a4883c0a 100644
--- a/arch/x86/mm/gup.c
+++ b/arch/x86/mm/gup.c
@@ -9,6 +9,7 @@
 #include <linux/vmstat.h>
 #include <linux/highmem.h>
 #include <linux/swap.h>
+#include <linux/pagemap.h>
 
 #include <asm/pgtable.h>
 
@@ -95,10 +96,23 @@ static noinline int gup_pte_range(pmd_t pmd, unsigned long addr,
 		}
 		VM_BUG_ON(!pfn_valid(pte_pfn(pte)));
 		page = pte_page(pte);
-		if (unlikely(!try_get_page(page))) {
+
+		if (WARN_ON_ONCE(page_ref_count(page) < 0)) {
+			pte_unmap(ptep);
+			return 0;
+		}
+
+		if (!page_cache_get_speculative(page)) {
 			pte_unmap(ptep);
 			return 0;
 		}
+
+		if (unlikely(pte_val(pte) != pte_val(*ptep))) {
+			put_page(page);
+			pte_unmap(ptep);
+			return 0;
+		}
+
 		SetPageReferenced(page);
 		pages[*nr] = page;
 		(*nr)++;
-- 
2.23.0


* Re: [PATCH v3 8/8] x86, mm, gup: prevent get_page() race with munmap in paravirt guest
  2019-12-16 16:08           ` Vitaly Kuznetsov
@ 2019-12-17  4:13             ` Ajay Kaher
  2020-01-31 12:51             ` Ajay Kaher
  1 sibling, 0 replies; 16+ messages in thread
From: Ajay Kaher @ 2019-12-17  4:13 UTC (permalink / raw)
  To: Vitaly Kuznetsov, Vlastimil Babka
  Cc: Peter Zijlstra, gregkh, stable, torvalds, punit.agrawal, akpm,
	kirill.shutemov, willy, will.deacon, mszeredi, linux-mm,
	linux-kernel, Srivatsa Bhat, srivatsa, Alexey Makhalov,
	Srinidhi Rao, Vikash Bansal, Anish Swaminathan,
	Vasavi Sirnapalli, Steven Rostedt, Oscar Salvador,
	Thomas Gleixner, Ingo Molnar, Juergen Gross, Borislav Petkov,
	Dave Hansen, Andy Lutomirski



On 16/12/19, 9:38 PM, "Vitaly Kuznetsov" <vkuznets@redhat.com> wrote:

> (I also dislike receiving only this patch of the series, next time
> please send the whole thing, it's only 8 patches, our mailfolders will
> survive that)

Thanks for pointing this out, I will take care of it.

> I'm not sure which stable kernel you're targeting (and if you addressed this
> with other patches in the series, if this is needed,...) so JFYI.

This series is for 4.4.y, please refer to the following link for the complete series:
https://lore.kernel.org/stable/1576529149-14269-1-git-send-email-akaher@vmware.com/

Yes, this 'Patch v3 8/8' could be merged separately, if it's unsafe to merge at this moment.

- Ajay
    
    



* Re: [PATCH v3 8/8] x86, mm, gup: prevent get_page() race with munmap in paravirt guest
  2019-12-16 16:08           ` Vitaly Kuznetsov
  2019-12-17  4:13             ` Ajay Kaher
@ 2020-01-31 12:51             ` Ajay Kaher
  1 sibling, 0 replies; 16+ messages in thread
From: Ajay Kaher @ 2020-01-31 12:51 UTC (permalink / raw)
  To: Vitaly Kuznetsov, Vlastimil Babka, gregkh
  Cc: Peter Zijlstra, stable, torvalds, punit.agrawal, akpm,
	kirill.shutemov, willy, will.deacon, mszeredi, linux-mm,
	linux-kernel, Srivatsa Bhat, srivatsa, Alexey Makhalov,
	Srinidhi Rao, Vikash Bansal, Anish Swaminathan,
	Vasavi Sirnapalli, Steven Rostedt, Oscar Salvador,
	Thomas Gleixner, Ingo Molnar, Juergen Gross, Borislav Petkov,
	Dave Hansen, Andy Lutomirski



On 16/12/19, 9:38 PM, "Vitaly Kuznetsov" <vkuznets@redhat.com> wrote:

>> (I also dislike receiving only this patch of the series, next time
>> please send the whole thing, it's only 8 patches, our mailfolders will
>> survive that)
>
> Thanks for pointing this out, I will take care of it.
>
>> I'm not sure which stable kernel you're targeting (and if you addressed this
>> with other patches in the series, if this is needed,...) so JFYI.
>
> This series is for 4.4.y, please refer to the following link for the complete series:
> https://lore.kernel.org/stable/1576529149-14269-1-git-send-email-akaher@vmware.com/
>
> Yes, this 'Patch v3 8/8' could be merged separately, if it's unsafe to merge at this moment.
>
> - Ajay

Greg, last month I posted this series [1] for v4.4.y (including Vlastimil's change).

There was some discussion of [PATCH v3 8/8] [2], but none after I specified that it's for 4.4.y.
Please let me know if I need to repost this series.

- Ajay

[1] https://lore.kernel.org/stable/1576529149-14269-1-git-send-email-akaher@vmware.com/
[2] https://lore.kernel.org/stable/1576529149-14269-9-git-send-email-akaher@vmware.com/


    



Thread overview: 16+ messages
2019-12-16 20:45 [PATCH v3 0/8] Backported fixes for 4.4 stable tree Ajay Kaher
2019-12-16 20:45 ` [PATCH v3 1/8] mm: make page ref count overflow check tighter and more explicit Ajay Kaher
2019-12-16 20:45 ` [PATCH v3 2/8] mm: add 'try_get_page()' helper function Ajay Kaher
2019-12-16 20:45 ` [PATCH v3 3/8] mm, gup: remove broken VM_BUG_ON_PAGE compound check for hugepages Ajay Kaher
2019-12-16 20:45 ` [PATCH v3 4/8] mm, gup: ensure real head page is ref-counted when using hugepages Ajay Kaher
2019-12-16 20:45 ` [PATCH v3 5/8] mm: prevent get_user_pages() from overflowing page refcount Ajay Kaher
2019-12-16 20:45 ` [PATCH v3 6/8] pipe: add pipe_buf_get() helper Ajay Kaher
2019-12-16 20:45 ` [PATCH v3 7/8] fs: prevent page refcount overflow in pipe_buf_get Ajay Kaher
2019-12-16 20:45 ` [PATCH v3 8/8] x86, mm, gup: prevent get_page() race with munmap in paravirt guest Ajay Kaher
2019-12-16 13:04   ` Peter Zijlstra
2019-12-16 13:30     ` Vitaly Kuznetsov
2019-12-16 13:47       ` Peter Zijlstra
2019-12-16 15:11         ` Vlastimil Babka
2019-12-16 16:08           ` Vitaly Kuznetsov
2019-12-17  4:13             ` Ajay Kaher
2020-01-31 12:51             ` Ajay Kaher
