* [PATCH 1/3] thp: reduce indentation level in change_huge_pmd()
[not found] <1512642172-7981-1-git-send-email-jinpu.wangl@profitbricks.com>
@ 2017-12-07 10:22 ` Jack Wang
2017-12-07 12:05 ` Greg Kroah-Hartman
2017-12-07 10:22 ` [PATCH 2/3] thp: fix MADV_DONTNEED vs. numa balancing race Jack Wang
2017-12-07 10:22 ` [PATCH 3/3] mm: drop unused pmdp_huge_get_and_clear_notify() Jack Wang
2 siblings, 1 reply; 6+ messages in thread
From: Jack Wang @ 2017-12-07 10:22 UTC (permalink / raw)
To: Greg Kroah-Hartman
Cc: Kirill A. Shutemov, Andrea Arcangeli, Hillf Danton, stable,
Andrew Morton, Linus Torvalds, Jack Wang
From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
commit 0a85e51d37645e9ce57e5e1a30859e07810ed07c upstream.
Patch series "thp: fix few MADV_DONTNEED races"
For MADV_DONTNEED to work properly with huge pages, it's critical to not
clear pmd intermittently unless you hold down_write(mmap_sem).
Otherwise MADV_DONTNEED can miss the THP which can lead to userspace
breakage.
See example of such race in commit message of patch 2/4.
All these races are found by code inspection. I haven't seen them
triggered. I don't think it's worth to apply them to stable@.
This patch (of 4):
Restructure code in preparation for a fix.
Link: http://lkml.kernel.org/r/20170302151034.27829-2-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[jwang: adjust context for 4.4 kernel]
Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com>
---
mm/huge_memory.c | 50 ++++++++++++++++++++++++++------------------------
1 file changed, 26 insertions(+), 24 deletions(-)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 8f3769e..ea013cb 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1566,35 +1566,37 @@ int change_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
{
struct mm_struct *mm = vma->vm_mm;
spinlock_t *ptl;
+ pmd_t entry;
+ bool preserve_write;
+
int ret = 0;
- if (__pmd_trans_huge_lock(pmd, vma, &ptl) == 1) {
- pmd_t entry;
- bool preserve_write = prot_numa && pmd_write(*pmd);
- ret = 1;
+ if (__pmd_trans_huge_lock(pmd, vma, &ptl) != 1)
+ return 0;
- /*
- * Avoid trapping faults against the zero page. The read-only
- * data is likely to be read-cached on the local CPU and
- * local/remote hits to the zero page are not interesting.
- */
- if (prot_numa && is_huge_zero_pmd(*pmd)) {
- spin_unlock(ptl);
- return ret;
- }
+ preserve_write = prot_numa && pmd_write(*pmd);
+ ret = 1;
- if (!prot_numa || !pmd_protnone(*pmd)) {
- entry = pmdp_huge_get_and_clear_notify(mm, addr, pmd);
- entry = pmd_modify(entry, newprot);
- if (preserve_write)
- entry = pmd_mkwrite(entry);
- ret = HPAGE_PMD_NR;
- set_pmd_at(mm, addr, pmd, entry);
- BUG_ON(!preserve_write && pmd_write(entry));
- }
- spin_unlock(ptl);
- }
+ /*
+ * Avoid trapping faults against the zero page. The read-only
+ * data is likely to be read-cached on the local CPU and
+ * local/remote hits to the zero page are not interesting.
+ */
+ if (prot_numa && is_huge_zero_pmd(*pmd))
+ goto unlock;
+
+ if (prot_numa && pmd_protnone(*pmd))
+ goto unlock;
+ entry = pmdp_huge_get_and_clear_notify(mm, addr, pmd);
+ entry = pmd_modify(entry, newprot);
+ if (preserve_write)
+ entry = pmd_mkwrite(entry);
+ ret = HPAGE_PMD_NR;
+ set_pmd_at(mm, addr, pmd, entry);
+ BUG_ON(!preserve_write && pmd_write(entry));
+unlock:
+ spin_unlock(ptl);
return ret;
}
--
2.7.4
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PATCH 2/3] thp: fix MADV_DONTNEED vs. numa balancing race
[not found] <1512642172-7981-1-git-send-email-jinpu.wangl@profitbricks.com>
2017-12-07 10:22 ` [PATCH 1/3] thp: reduce indentation level in change_huge_pmd() Jack Wang
@ 2017-12-07 10:22 ` Jack Wang
2017-12-07 10:22 ` [PATCH 3/3] mm: drop unused pmdp_huge_get_and_clear_notify() Jack Wang
2 siblings, 0 replies; 6+ messages in thread
From: Jack Wang @ 2017-12-07 10:22 UTC (permalink / raw)
To: Greg Kroah-Hartman
Cc: Kirill A. Shutemov, Andrea Arcangeli, Hillf Danton, stable,
Andrew Morton, Linus Torvalds, Jack Wang
From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
commit ced108037c2aa542b3ed8b7afd1576064ad1362a upstream
In case prot_numa, we are under down_read(mmap_sem). It's critical to
not clear pmd intermittently to avoid race with MADV_DONTNEED which is
also under down_read(mmap_sem):
CPU0: CPU1:
change_huge_pmd(prot_numa=1)
pmdp_huge_get_and_clear_notify()
madvise_dontneed()
zap_pmd_range()
pmd_trans_huge(*pmd) == 0 (without ptl)
// skip the pmd
set_pmd_at();
// pmd is re-established
The race makes MADV_DONTNEED miss the huge pmd and don't clear it
which may break userspace.
Found by code analysis, never saw triggered.
Link: http://lkml.kernel.org/r/20170302151034.27829-3-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[jwang: adjust context for 4.4]
Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com>
---
mm/huge_memory.c | 34 +++++++++++++++++++++++++++++++++-
1 file changed, 33 insertions(+), 1 deletion(-)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index ea013cb..0127b78 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1588,7 +1588,39 @@ int change_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
if (prot_numa && pmd_protnone(*pmd))
goto unlock;
- entry = pmdp_huge_get_and_clear_notify(mm, addr, pmd);
+ /*
+ * In case prot_numa, we are under down_read(mmap_sem). It's critical
+ * to not clear pmd intermittently to avoid race with MADV_DONTNEED
+ * which is also under down_read(mmap_sem):
+ *
+ * CPU0: CPU1:
+ * change_huge_pmd(prot_numa=1)
+ * pmdp_huge_get_and_clear_notify()
+ * madvise_dontneed()
+ * zap_pmd_range()
+ * pmd_trans_huge(*pmd) == 0 (without ptl)
+ * // skip the pmd
+ * set_pmd_at();
+ * // pmd is re-established
+ *
+ * The race makes MADV_DONTNEED miss the huge pmd and don't clear it
+ * which may break userspace.
+ *
+ * pmdp_invalidate() is required to make sure we don't miss
+ * dirty/young flags set by hardware.
+ */
+ entry = *pmd;
+ pmdp_invalidate(vma, addr, pmd);
+
+ /*
+ * Recover dirty/young flags. It relies on pmdp_invalidate to not
+ * corrupt them.
+ */
+ if (pmd_dirty(*pmd))
+ entry = pmd_mkdirty(entry);
+ if (pmd_young(*pmd))
+ entry = pmd_mkyoung(entry);
+
entry = pmd_modify(entry, newprot);
if (preserve_write)
entry = pmd_mkwrite(entry);
--
2.7.4
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PATCH 3/3] mm: drop unused pmdp_huge_get_and_clear_notify()
[not found] <1512642172-7981-1-git-send-email-jinpu.wangl@profitbricks.com>
2017-12-07 10:22 ` [PATCH 1/3] thp: reduce indentation level in change_huge_pmd() Jack Wang
2017-12-07 10:22 ` [PATCH 2/3] thp: fix MADV_DONTNEED vs. numa balancing race Jack Wang
@ 2017-12-07 10:22 ` Jack Wang
2 siblings, 0 replies; 6+ messages in thread
From: Jack Wang @ 2017-12-07 10:22 UTC (permalink / raw)
To: Greg Kroah-Hartman
Cc: Kirill A. Shutemov, Dave Hansen, stable, Andrew Morton,
Linus Torvalds, Jack Wang
From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
commit c0c379e2931b05facef538e53bf3b21f283d9a0b upstream
Dave noticed that after fixing MADV_DONTNEED vs numa balancing race the
last pmdp_huge_get_and_clear_notify() user is gone.
Let's drop the helper.
Link: http://lkml.kernel.org/r/20170306112047.24809-1-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[jwang: adjust context for 4.4]
Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com>
---
include/linux/mmu_notifier.h | 13 -------------
1 file changed, 13 deletions(-)
diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
index a1a210d..38c5eb2 100644
--- a/include/linux/mmu_notifier.h
+++ b/include/linux/mmu_notifier.h
@@ -381,18 +381,6 @@ static inline void mmu_notifier_mm_destroy(struct mm_struct *mm)
___pmd; \
})
-#define pmdp_huge_get_and_clear_notify(__mm, __haddr, __pmd) \
-({ \
- unsigned long ___haddr = __haddr & HPAGE_PMD_MASK; \
- pmd_t ___pmd; \
- \
- ___pmd = pmdp_huge_get_and_clear(__mm, __haddr, __pmd); \
- mmu_notifier_invalidate_range(__mm, ___haddr, \
- ___haddr + HPAGE_PMD_SIZE); \
- \
- ___pmd; \
-})
-
/*
* set_pte_at_notify() sets the pte _after_ running the notifier.
* This is safe to start by updating the secondary MMUs, because the primary MMU
@@ -475,7 +463,6 @@ static inline void mmu_notifier_mm_destroy(struct mm_struct *mm)
#define pmdp_clear_young_notify pmdp_test_and_clear_young
#define ptep_clear_flush_notify ptep_clear_flush
#define pmdp_huge_clear_flush_notify pmdp_huge_clear_flush
-#define pmdp_huge_get_and_clear_notify pmdp_huge_get_and_clear
#define set_pte_at_notify set_pte_at
#endif /* CONFIG_MMU_NOTIFIER */
--
2.7.4
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH 1/3] thp: reduce indentation level in change_huge_pmd()
2017-12-07 10:22 ` [PATCH 1/3] thp: reduce indentation level in change_huge_pmd() Jack Wang
@ 2017-12-07 12:05 ` Greg Kroah-Hartman
2017-12-07 12:25 ` Jinpu Wang
0 siblings, 1 reply; 6+ messages in thread
From: Greg Kroah-Hartman @ 2017-12-07 12:05 UTC (permalink / raw)
To: Jack Wang
Cc: Kirill A. Shutemov, Andrea Arcangeli, Hillf Danton, stable,
Andrew Morton, Linus Torvalds
On Thu, Dec 07, 2017 at 11:22:50AM +0100, Jack Wang wrote:
> From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
>
> commit 0a85e51d37645e9ce57e5e1a30859e07810ed07c upstream.
>
> Patch series "thp: fix few MADV_DONTNEED races"
>
> For MADV_DONTNEED to work properly with huge pages, it's critical to not
> clear pmd intermittently unless you hold down_write(mmap_sem).
>
> Otherwise MADV_DONTNEED can miss the THP which can lead to userspace
> breakage.
>
> See example of such race in commit message of patch 2/4.
>
> All these races are found by code inspection. I haven't seen them
> triggered. I don't think it's worth to apply them to stable@.
>
> This patch (of 4):
What about all 4 of these?
And should this also go into 4.9? I don't want to include fixes into
4.4-stable without them also being in 4.9-stable, as that would be a
regression for people upgrading.
thanks,
greg k-h
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH 1/3] thp: reduce indentation level in change_huge_pmd()
2017-12-07 12:05 ` Greg Kroah-Hartman
@ 2017-12-07 12:25 ` Jinpu Wang
2017-12-12 8:32 ` Greg Kroah-Hartman
0 siblings, 1 reply; 6+ messages in thread
From: Jinpu Wang @ 2017-12-07 12:25 UTC (permalink / raw)
To: Greg Kroah-Hartman
Cc: Kirill A. Shutemov, Andrea Arcangeli, Hillf Danton, stable,
Andrew Morton, Linus Torvalds
On Thu, Dec 7, 2017 at 1:05 PM, Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
> On Thu, Dec 07, 2017 at 11:22:50AM +0100, Jack Wang wrote:
>> From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
>>
>> commit 0a85e51d37645e9ce57e5e1a30859e07810ed07c upstream.
>>
>> Patch series "thp: fix few MADV_DONTNEED races"
>>
>> For MADV_DONTNEED to work properly with huge pages, it's critical to not
>> clear pmd intermittently unless you hold down_write(mmap_sem).
>>
>> Otherwise MADV_DONTNEED can miss the THP which can lead to userspace
>> breakage.
>>
>> See example of such race in commit message of patch 2/4.
>>
>> All these races are found by code inspection. I haven't seen them
>> triggered. I don't think it's worth to apply them to stable@.
>>
>> This patch (of 4):
>
> What about all 4 of these?
The 4 patches were from here:
http://lkml.iu.edu/hypermail/linux/kernel/1703.0/01274.html
thp: reduce indentation level in change_huge_pmd()
thp: fix MADV_DONTNEED vs. numa balancing race
thp: fix MADV_DONTNEED vs. MADV_FREE race (the function exists only since 4.5+ )
thp: fix MADV_DONTNEED vs clear soft dirty race (this one already
included since 4.4.63)
>
> And should this also go into 4.9? I don't want to include fixes into
> 4.4-stable without them also being in 4.9-stable, as that would be a
> regression for people upgrading.
In 4.9, the last 2 were already applied, I will try to backport these
3 commits also to 4.9 stable.
>
> thanks,
>
> greg k-h
Thanks,
--
Jack Wang
Linux Kernel Developer
ProfitBricks GmbH
Greifswalder Str. 207
D - 10405 Berlin
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH 1/3] thp: reduce indentation level in change_huge_pmd()
2017-12-07 12:25 ` Jinpu Wang
@ 2017-12-12 8:32 ` Greg Kroah-Hartman
0 siblings, 0 replies; 6+ messages in thread
From: Greg Kroah-Hartman @ 2017-12-12 8:32 UTC (permalink / raw)
To: Jinpu Wang
Cc: Kirill A. Shutemov, Andrea Arcangeli, Hillf Danton, stable,
Andrew Morton, Linus Torvalds
On Thu, Dec 07, 2017 at 01:25:19PM +0100, Jinpu Wang wrote:
> On Thu, Dec 7, 2017 at 1:05 PM, Greg Kroah-Hartman
> <gregkh@linuxfoundation.org> wrote:
> > On Thu, Dec 07, 2017 at 11:22:50AM +0100, Jack Wang wrote:
> >> From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
> >>
> >> commit 0a85e51d37645e9ce57e5e1a30859e07810ed07c upstream.
> >>
> >> Patch series "thp: fix few MADV_DONTNEED races"
> >>
> >> For MADV_DONTNEED to work properly with huge pages, it's critical to not
> >> clear pmd intermittently unless you hold down_write(mmap_sem).
> >>
> >> Otherwise MADV_DONTNEED can miss the THP which can lead to userspace
> >> breakage.
> >>
> >> See example of such race in commit message of patch 2/4.
> >>
> >> All these races are found by code inspection. I haven't seen them
> >> triggered. I don't think it's worth to apply them to stable@.
> >>
> >> This patch (of 4):
> >
> > What about all 4 of these?
> The 4 patches were from here:
> http://lkml.iu.edu/hypermail/linux/kernel/1703.0/01274.html
>
> thp: reduce indentation level in change_huge_pmd()
> thp: fix MADV_DONTNEED vs. numa balancing race
> thp: fix MADV_DONTNEED vs. MADV_FREE race (the function exists only since 4.5+ )
> thp: fix MADV_DONTNEED vs clear soft dirty race (this one already
> included since 4.4.63)
>
> >
> > And should this also go into 4.9? I don't want to include fixes into
> > 4.4-stable without them also being in 4.9-stable, as that would be a
> > regression for people upgrading.
> In 4.9, the last 2 were already applied, I will try to backport these
> 3 commits also to 4.9 stable.
Thanks for all of these now, all queued up.
greg k-h
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2017-12-12 8:32 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
[not found] <1512642172-7981-1-git-send-email-jinpu.wangl@profitbricks.com>
2017-12-07 10:22 ` [PATCH 1/3] thp: reduce indentation level in change_huge_pmd() Jack Wang
2017-12-07 12:05 ` Greg Kroah-Hartman
2017-12-07 12:25 ` Jinpu Wang
2017-12-12 8:32 ` Greg Kroah-Hartman
2017-12-07 10:22 ` [PATCH 2/3] thp: fix MADV_DONTNEED vs. numa balancing race Jack Wang
2017-12-07 10:22 ` [PATCH 3/3] mm: drop unused pmdp_huge_get_and_clear_notify() Jack Wang
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.