All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 1/3] thp: reduce indentation level in change_huge_pmd()
       [not found] <1512642172-7981-1-git-send-email-jinpu.wangl@profitbricks.com>
@ 2017-12-07 10:22 ` Jack Wang
  2017-12-07 12:05   ` Greg Kroah-Hartman
  2017-12-07 10:22 ` [PATCH 2/3] thp: fix MADV_DONTNEED vs. numa balancing race Jack Wang
  2017-12-07 10:22 ` [PATCH 3/3] mm: drop unused pmdp_huge_get_and_clear_notify() Jack Wang
  2 siblings, 1 reply; 6+ messages in thread
From: Jack Wang @ 2017-12-07 10:22 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Kirill A. Shutemov, Andrea Arcangeli, Hillf Danton, stable,
	Andrew Morton, Linus Torvalds, Jack Wang

From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>

commit 0a85e51d37645e9ce57e5e1a30859e07810ed07c upstream.

Patch series "thp: fix few MADV_DONTNEED races"

For MADV_DONTNEED to work properly with huge pages, it's critical to not
clear pmd intermittently unless you hold down_write(mmap_sem).

Otherwise MADV_DONTNEED can miss the THP which can lead to userspace
breakage.

See example of such race in commit message of patch 2/4.

All these races are found by code inspection.  I haven't seen them
triggered.  I don't think it's worth to apply them to stable@.

This patch (of 4):

Restructure code in preparation for a fix.

Link: http://lkml.kernel.org/r/20170302151034.27829-2-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[jwang: adjust context for 4.4 kernel]
Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com>
---
 mm/huge_memory.c | 50 ++++++++++++++++++++++++++------------------------
 1 file changed, 26 insertions(+), 24 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 8f3769e..ea013cb 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1566,35 +1566,37 @@ int change_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
 {
 	struct mm_struct *mm = vma->vm_mm;
 	spinlock_t *ptl;
+	pmd_t entry;
+	bool preserve_write;
+
 	int ret = 0;
 
-	if (__pmd_trans_huge_lock(pmd, vma, &ptl) == 1) {
-		pmd_t entry;
-		bool preserve_write = prot_numa && pmd_write(*pmd);
-		ret = 1;
+	if (__pmd_trans_huge_lock(pmd, vma, &ptl) != 1)
+		return 0;
 
-		/*
-		 * Avoid trapping faults against the zero page. The read-only
-		 * data is likely to be read-cached on the local CPU and
-		 * local/remote hits to the zero page are not interesting.
-		 */
-		if (prot_numa && is_huge_zero_pmd(*pmd)) {
-			spin_unlock(ptl);
-			return ret;
-		}
+	preserve_write = prot_numa && pmd_write(*pmd);
+	ret = 1;
 
-		if (!prot_numa || !pmd_protnone(*pmd)) {
-			entry = pmdp_huge_get_and_clear_notify(mm, addr, pmd);
-			entry = pmd_modify(entry, newprot);
-			if (preserve_write)
-				entry = pmd_mkwrite(entry);
-			ret = HPAGE_PMD_NR;
-			set_pmd_at(mm, addr, pmd, entry);
-			BUG_ON(!preserve_write && pmd_write(entry));
-		}
-		spin_unlock(ptl);
-	}
+	/*
+	 * Avoid trapping faults against the zero page. The read-only
+	 * data is likely to be read-cached on the local CPU and
+	 * local/remote hits to the zero page are not interesting.
+	 */
+	if (prot_numa && is_huge_zero_pmd(*pmd))
+		goto unlock;
+
+	if (prot_numa && pmd_protnone(*pmd))
+		goto unlock;
 
+	entry = pmdp_huge_get_and_clear_notify(mm, addr, pmd);
+	entry = pmd_modify(entry, newprot);
+	if (preserve_write)
+		entry = pmd_mkwrite(entry);
+	ret = HPAGE_PMD_NR;
+	set_pmd_at(mm, addr, pmd, entry);
+	BUG_ON(!preserve_write && pmd_write(entry));
+unlock:
+	spin_unlock(ptl);
 	return ret;
 }
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH 2/3] thp: fix MADV_DONTNEED vs. numa balancing race
       [not found] <1512642172-7981-1-git-send-email-jinpu.wangl@profitbricks.com>
  2017-12-07 10:22 ` [PATCH 1/3] thp: reduce indentation level in change_huge_pmd() Jack Wang
@ 2017-12-07 10:22 ` Jack Wang
  2017-12-07 10:22 ` [PATCH 3/3] mm: drop unused pmdp_huge_get_and_clear_notify() Jack Wang
  2 siblings, 0 replies; 6+ messages in thread
From: Jack Wang @ 2017-12-07 10:22 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Kirill A. Shutemov, Andrea Arcangeli, Hillf Danton, stable,
	Andrew Morton, Linus Torvalds, Jack Wang

From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>

commit ced108037c2aa542b3ed8b7afd1576064ad1362a upstream

In case prot_numa, we are under down_read(mmap_sem).  It's critical to
not clear pmd intermittently to avoid race with MADV_DONTNEED which is
also under down_read(mmap_sem):

	CPU0:				CPU1:
				change_huge_pmd(prot_numa=1)
				 pmdp_huge_get_and_clear_notify()
madvise_dontneed()
 zap_pmd_range()
  pmd_trans_huge(*pmd) == 0 (without ptl)
  // skip the pmd
				 set_pmd_at();
				 // pmd is re-established

The race makes MADV_DONTNEED miss the huge pmd and don't clear it
which may break userspace.

Found by code analysis, never saw triggered.

Link: http://lkml.kernel.org/r/20170302151034.27829-3-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[jwang: adjust context for 4.4]
Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com>
---
 mm/huge_memory.c | 34 +++++++++++++++++++++++++++++++++-
 1 file changed, 33 insertions(+), 1 deletion(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index ea013cb..0127b78 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1588,7 +1588,39 @@ int change_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
 	if (prot_numa && pmd_protnone(*pmd))
 		goto unlock;
 
-	entry = pmdp_huge_get_and_clear_notify(mm, addr, pmd);
+	/*
+	 * In case prot_numa, we are under down_read(mmap_sem). It's critical
+	 * to not clear pmd intermittently to avoid race with MADV_DONTNEED
+	 * which is also under down_read(mmap_sem):
+	 *
+	 *	CPU0:				CPU1:
+	 *				change_huge_pmd(prot_numa=1)
+	 *				 pmdp_huge_get_and_clear_notify()
+	 * madvise_dontneed()
+	 *  zap_pmd_range()
+	 *   pmd_trans_huge(*pmd) == 0 (without ptl)
+	 *   // skip the pmd
+	 *				 set_pmd_at();
+	 *				 // pmd is re-established
+	 *
+	 * The race makes MADV_DONTNEED miss the huge pmd and don't clear it
+	 * which may break userspace.
+	 *
+	 * pmdp_invalidate() is required to make sure we don't miss
+	 * dirty/young flags set by hardware.
+	 */
+	entry = *pmd;
+	pmdp_invalidate(vma, addr, pmd);
+
+	/*
+	 * Recover dirty/young flags.  It relies on pmdp_invalidate to not
+	 * corrupt them.
+	 */
+	if (pmd_dirty(*pmd))
+		entry = pmd_mkdirty(entry);
+	if (pmd_young(*pmd))
+		entry = pmd_mkyoung(entry);
+
 	entry = pmd_modify(entry, newprot);
 	if (preserve_write)
 		entry = pmd_mkwrite(entry);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH 3/3] mm: drop unused pmdp_huge_get_and_clear_notify()
       [not found] <1512642172-7981-1-git-send-email-jinpu.wangl@profitbricks.com>
  2017-12-07 10:22 ` [PATCH 1/3] thp: reduce indentation level in change_huge_pmd() Jack Wang
  2017-12-07 10:22 ` [PATCH 2/3] thp: fix MADV_DONTNEED vs. numa balancing race Jack Wang
@ 2017-12-07 10:22 ` Jack Wang
  2 siblings, 0 replies; 6+ messages in thread
From: Jack Wang @ 2017-12-07 10:22 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Kirill A. Shutemov, Dave Hansen, stable, Andrew Morton,
	Linus Torvalds, Jack Wang

From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>

commit c0c379e2931b05facef538e53bf3b21f283d9a0b upstream

Dave noticed that after fixing MADV_DONTNEED vs numa balancing race the
last pmdp_huge_get_and_clear_notify() user is gone.

Let's drop the helper.

Link: http://lkml.kernel.org/r/20170306112047.24809-1-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[jwang: adjust context for 4.4]
Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com>
---
 include/linux/mmu_notifier.h | 13 -------------
 1 file changed, 13 deletions(-)

diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
index a1a210d..38c5eb2 100644
--- a/include/linux/mmu_notifier.h
+++ b/include/linux/mmu_notifier.h
@@ -381,18 +381,6 @@ static inline void mmu_notifier_mm_destroy(struct mm_struct *mm)
 	___pmd;								\
 })
 
-#define pmdp_huge_get_and_clear_notify(__mm, __haddr, __pmd)		\
-({									\
-	unsigned long ___haddr = __haddr & HPAGE_PMD_MASK;		\
-	pmd_t ___pmd;							\
-									\
-	___pmd = pmdp_huge_get_and_clear(__mm, __haddr, __pmd);		\
-	mmu_notifier_invalidate_range(__mm, ___haddr,			\
-				      ___haddr + HPAGE_PMD_SIZE);	\
-									\
-	___pmd;								\
-})
-
 /*
  * set_pte_at_notify() sets the pte _after_ running the notifier.
  * This is safe to start by updating the secondary MMUs, because the primary MMU
@@ -475,7 +463,6 @@ static inline void mmu_notifier_mm_destroy(struct mm_struct *mm)
 #define pmdp_clear_young_notify pmdp_test_and_clear_young
 #define	ptep_clear_flush_notify ptep_clear_flush
 #define pmdp_huge_clear_flush_notify pmdp_huge_clear_flush
-#define pmdp_huge_get_and_clear_notify pmdp_huge_get_and_clear
 #define set_pte_at_notify set_pte_at
 
 #endif /* CONFIG_MMU_NOTIFIER */
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH 1/3] thp: reduce indentation level in change_huge_pmd()
  2017-12-07 10:22 ` [PATCH 1/3] thp: reduce indentation level in change_huge_pmd() Jack Wang
@ 2017-12-07 12:05   ` Greg Kroah-Hartman
  2017-12-07 12:25     ` Jinpu Wang
  0 siblings, 1 reply; 6+ messages in thread
From: Greg Kroah-Hartman @ 2017-12-07 12:05 UTC (permalink / raw)
  To: Jack Wang
  Cc: Kirill A. Shutemov, Andrea Arcangeli, Hillf Danton, stable,
	Andrew Morton, Linus Torvalds

On Thu, Dec 07, 2017 at 11:22:50AM +0100, Jack Wang wrote:
> From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
> 
> commit 0a85e51d37645e9ce57e5e1a30859e07810ed07c upstream.
> 
> Patch series "thp: fix few MADV_DONTNEED races"
> 
> For MADV_DONTNEED to work properly with huge pages, it's critical to not
> clear pmd intermittently unless you hold down_write(mmap_sem).
> 
> Otherwise MADV_DONTNEED can miss the THP which can lead to userspace
> breakage.
> 
> See example of such race in commit message of patch 2/4.
> 
> All these races are found by code inspection.  I haven't seen them
> triggered.  I don't think it's worth to apply them to stable@.
> 
> This patch (of 4):

What about all 4 of these?

And should this also go into 4.9?  I don't want to include fixes into
4.4-stable without them also being in 4.9-stable, as that would be a
regression for people upgrading.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 1/3] thp: reduce indentation level in change_huge_pmd()
  2017-12-07 12:05   ` Greg Kroah-Hartman
@ 2017-12-07 12:25     ` Jinpu Wang
  2017-12-12  8:32       ` Greg Kroah-Hartman
  0 siblings, 1 reply; 6+ messages in thread
From: Jinpu Wang @ 2017-12-07 12:25 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Kirill A. Shutemov, Andrea Arcangeli, Hillf Danton, stable,
	Andrew Morton, Linus Torvalds

On Thu, Dec 7, 2017 at 1:05 PM, Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
> On Thu, Dec 07, 2017 at 11:22:50AM +0100, Jack Wang wrote:
>> From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
>>
>> commit 0a85e51d37645e9ce57e5e1a30859e07810ed07c upstream.
>>
>> Patch series "thp: fix few MADV_DONTNEED races"
>>
>> For MADV_DONTNEED to work properly with huge pages, it's critical to not
>> clear pmd intermittently unless you hold down_write(mmap_sem).
>>
>> Otherwise MADV_DONTNEED can miss the THP which can lead to userspace
>> breakage.
>>
>> See example of such race in commit message of patch 2/4.
>>
>> All these races are found by code inspection.  I haven't seen them
>> triggered.  I don't think it's worth to apply them to stable@.
>>
>> This patch (of 4):
>
> What about all 4 of these?
The 4 patches were from here:
http://lkml.iu.edu/hypermail/linux/kernel/1703.0/01274.html

thp: reduce indentation level in change_huge_pmd()
thp: fix MADV_DONTNEED vs. numa balancing race
thp: fix MADV_DONTNEED vs. MADV_FREE race (the function exists only since 4.5+ )
thp: fix MADV_DONTNEED vs clear soft dirty race (this one already
included since 4.4.63)

>
> And should this also go into 4.9?  I don't want to include fixes into
> 4.4-stable without them also being in 4.9-stable, as that would be a
> regression for people upgrading.
In 4.9, the last 2 were already applied, I will try to backport these
3 commits also to 4.9 stable.


>
> thanks,
>
> greg k-h

Thanks,

-- 
Jack Wang
Linux Kernel Developer

ProfitBricks GmbH
Greifswalder Str. 207
D - 10405 Berlin

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 1/3] thp: reduce indentation level in change_huge_pmd()
  2017-12-07 12:25     ` Jinpu Wang
@ 2017-12-12  8:32       ` Greg Kroah-Hartman
  0 siblings, 0 replies; 6+ messages in thread
From: Greg Kroah-Hartman @ 2017-12-12  8:32 UTC (permalink / raw)
  To: Jinpu Wang
  Cc: Kirill A. Shutemov, Andrea Arcangeli, Hillf Danton, stable,
	Andrew Morton, Linus Torvalds

On Thu, Dec 07, 2017 at 01:25:19PM +0100, Jinpu Wang wrote:
> On Thu, Dec 7, 2017 at 1:05 PM, Greg Kroah-Hartman
> <gregkh@linuxfoundation.org> wrote:
> > On Thu, Dec 07, 2017 at 11:22:50AM +0100, Jack Wang wrote:
> >> From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
> >>
> >> commit 0a85e51d37645e9ce57e5e1a30859e07810ed07c upstream.
> >>
> >> Patch series "thp: fix few MADV_DONTNEED races"
> >>
> >> For MADV_DONTNEED to work properly with huge pages, it's critical to not
> >> clear pmd intermittently unless you hold down_write(mmap_sem).
> >>
> >> Otherwise MADV_DONTNEED can miss the THP which can lead to userspace
> >> breakage.
> >>
> >> See example of such race in commit message of patch 2/4.
> >>
> >> All these races are found by code inspection.  I haven't seen them
> >> triggered.  I don't think it's worth to apply them to stable@.
> >>
> >> This patch (of 4):
> >
> > What about all 4 of these?
> The 4 patches were from here:
> http://lkml.iu.edu/hypermail/linux/kernel/1703.0/01274.html
> 
> thp: reduce indentation level in change_huge_pmd()
> thp: fix MADV_DONTNEED vs. numa balancing race
> thp: fix MADV_DONTNEED vs. MADV_FREE race (the function exists only since 4.5+ )
> thp: fix MADV_DONTNEED vs clear soft dirty race (this one already
> included since 4.4.63)
> 
> >
> > And should this also go into 4.9?  I don't want to include fixes into
> > 4.4-stable without them also being in 4.9-stable, as that would be a
> > regression for people upgrading.
> In 4.9, the last 2 were already applied, I will try to backport these
> 3 commits also to 4.9 stable.

Thanks for all of these now, all queued up.

greg k-h

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2017-12-12  8:32 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <1512642172-7981-1-git-send-email-jinpu.wangl@profitbricks.com>
2017-12-07 10:22 ` [PATCH 1/3] thp: reduce indentation level in change_huge_pmd() Jack Wang
2017-12-07 12:05   ` Greg Kroah-Hartman
2017-12-07 12:25     ` Jinpu Wang
2017-12-12  8:32       ` Greg Kroah-Hartman
2017-12-07 10:22 ` [PATCH 2/3] thp: fix MADV_DONTNEED vs. numa balancing race Jack Wang
2017-12-07 10:22 ` [PATCH 3/3] mm: drop unused pmdp_huge_get_and_clear_notify() Jack Wang

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.