From: Peter Zijlstra <peterz@infradead.org>
To: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Thomas Gleixner <tglx@linutronix.de>, Ingo Molnar <mingo@elte.hu>,
"H. Peter Anvin" <hpa@zytor.com>,
Linux-Next Mailing List <linux-next@vger.kernel.org>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
Nadav Amit <namit@vmware.com>,
Linus <torvalds@linux-foundation.org>
Subject: Re: linux-next: manual merge of the akpm-current tree with the tip tree
Date: Fri, 11 Aug 2017 11:34:49 +0200
Message-ID: <20170811093449.w5wttpulmwfykjzm@hirez.programming.kicks-ass.net>
In-Reply-To: <20170811175326.36d546dc@canb.auug.org.au>
[-- Attachment #1: Type: text/plain, Size: 1108 bytes --]
On Fri, Aug 11, 2017 at 05:53:26PM +1000, Stephen Rothwell wrote:
> Hi all,
>
> Today's linux-next merge of the akpm-current tree got conflicts in:
>
> include/linux/mm_types.h
> mm/huge_memory.c
>
> between commit:
>
> 8b1b436dd1cc ("mm, locking: Rework {set,clear,mm}_tlb_flush_pending()")
>
> from the tip tree and commits:
>
> 16af97dc5a89 ("mm: migrate: prevent racy access to tlb_flush_pending")
> a9b802500ebb ("Revert "mm: numa: defer TLB flush for THP migration as long as possible"")
>
> from the akpm-current tree.
>
> The latter 2 are now in Linus' tree as well (but were not when I started
> the day).
>
> The only way forward I could see was to revert
>
> 8b1b436dd1cc ("mm, locking: Rework {set,clear,mm}_tlb_flush_pending()")
>
> and the three following commits
>
> ff7a5fb0f1d5 ("overlayfs, locking: Remove smp_mb__before_spinlock() usage")
> d89e588ca408 ("locking: Introduce smp_mb__after_spinlock()")
> a9668cd6ee28 ("locking: Remove smp_mb__before_spinlock()")
>
> before merging the akpm-current tree again.
Here are two patches that apply on top of tip.
[-- Attachment #2: nadav_amit-mm-migrate__prevent_racy_access_to_tlb_flush_pending.patch --]
[-- Type: text/x-diff, Size: 4923 bytes --]
Subject: mm: migrate: prevent racy access to tlb_flush_pending
From: Nadav Amit <nadav.amit@gmail.com>
Date: Tue, 1 Aug 2017 17:08:12 -0700
Setting and clearing mm->tlb_flush_pending can be performed by multiple
threads, since mmap_sem may only be acquired for read in
task_numa_work(). If this happens, tlb_flush_pending might be cleared
while one of the threads still changes PTEs and batches TLB flushes.
This can lead to the same race between migration and
change_protection_range() that led to the introduction of
tlb_flush_pending. The result of this race was data corruption, which
means that this patch also addresses a theoretically possible data
corruption.
An actual data corruption was not observed, yet the race was
confirmed by adding an assertion to check that tlb_flush_pending is not
set by two threads, adding artificial latency in change_protection_range()
and using sysctl to reduce kernel.numa_balancing_scan_delay_ms.
Fixes: 20841405940e ("mm: fix TLB flush race between migration, and
change_protection_range")
Cc: <akpm@linux-foundation.org>
Cc: <nadav.amit@gmail.com>
Cc: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Nadav Amit <namit@vmware.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20170802000818.4760-2-namit@vmware.com
---
include/linux/mm_types.h | 29 +++++++++++++++++++++--------
kernel/fork.c | 2 +-
mm/debug.c | 2 +-
mm/mprotect.c | 4 ++--
4 files changed, 25 insertions(+), 12 deletions(-)
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -493,7 +493,7 @@ struct mm_struct {
* can move process memory needs to flush the TLB when moving a
* PROT_NONE or PROT_NUMA mapped page.
*/
- bool tlb_flush_pending;
+ atomic_t tlb_flush_pending;
#endif
#ifdef CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH
/* See flush_tlb_batched_pending() */
@@ -535,11 +535,17 @@ static inline bool mm_tlb_flush_pending(
* Must be called with PTL held; such that our PTL acquire will have
* observed the store from set_tlb_flush_pending().
*/
- return mm->tlb_flush_pending;
+ return atomic_read(&mm->tlb_flush_pending);
}
-static inline void set_tlb_flush_pending(struct mm_struct *mm)
+
+static inline void init_tlb_flush_pending(struct mm_struct *mm)
{
- mm->tlb_flush_pending = true;
+ atomic_set(&mm->tlb_flush_pending, 0);
+}
+
+static inline void inc_tlb_flush_pending(struct mm_struct *mm)
+{
+ atomic_inc(&mm->tlb_flush_pending);
/*
* The only time this value is relevant is when there are indeed pages
* to flush. And we'll only flush pages after changing them, which
@@ -565,21 +571,28 @@ static inline void set_tlb_flush_pending
* store is constrained by the TLB invalidate.
*/
}
+
/* Clearing is done after a TLB flush, which also provides a barrier. */
-static inline void clear_tlb_flush_pending(struct mm_struct *mm)
+static inline void dec_tlb_flush_pending(struct mm_struct *mm)
{
/* see set_tlb_flush_pending */
- mm->tlb_flush_pending = false;
+ atomic_dec(&mm->tlb_flush_pending);
}
#else
static inline bool mm_tlb_flush_pending(struct mm_struct *mm)
{
return false;
}
-static inline void set_tlb_flush_pending(struct mm_struct *mm)
+
+static inline void init_tlb_flush_pending(struct mm_struct *mm)
{
}
-static inline void clear_tlb_flush_pending(struct mm_struct *mm)
+
+static inline void inc_tlb_flush_pending(struct mm_struct *mm)
+{
+}
+
+static inline void dec_tlb_flush_pending(struct mm_struct *mm)
{
}
#endif
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -809,7 +809,7 @@ static struct mm_struct *mm_init(struct
mm_init_aio(mm);
mm_init_owner(mm, p);
mmu_notifier_mm_init(mm);
- clear_tlb_flush_pending(mm);
+ init_tlb_flush_pending(mm);
#if defined(CONFIG_TRANSPARENT_HUGEPAGE) && !USE_SPLIT_PMD_PTLOCKS
mm->pmd_huge_pte = NULL;
#endif
--- a/mm/debug.c
+++ b/mm/debug.c
@@ -159,7 +159,7 @@ void dump_mm(const struct mm_struct *mm)
mm->numa_next_scan, mm->numa_scan_offset, mm->numa_scan_seq,
#endif
#if defined(CONFIG_NUMA_BALANCING) || defined(CONFIG_COMPACTION)
- mm->tlb_flush_pending,
+ atomic_read(&mm->tlb_flush_pending),
#endif
mm->def_flags, &mm->def_flags
);
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -244,7 +244,7 @@ static unsigned long change_protection_r
BUG_ON(addr >= end);
pgd = pgd_offset(mm, addr);
flush_cache_range(vma, addr, end);
- set_tlb_flush_pending(mm);
+ inc_tlb_flush_pending(mm);
do {
next = pgd_addr_end(addr, end);
if (pgd_none_or_clear_bad(pgd))
@@ -256,7 +256,7 @@ static unsigned long change_protection_r
/* Only flush the TLB if we actually modified any entries: */
if (pages)
flush_tlb_range(vma, start, end);
- clear_tlb_flush_pending(mm);
+ dec_tlb_flush_pending(mm);
return pages;
}
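The reason a plain bool is not enough can be replayed in user space. The following is only a sketch, assuming C11 atomics in place of the kernel's atomic_t; the toy_mm_* structs and both function names are hypothetical and do not exist in the kernel. It replays one fixed interleaving in which thread A finishes its range while thread B is still batching flushes, and returns what a migration path would then observe.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for mm_struct, before and after the patch. */
struct toy_mm_bool  { bool tlb_flush_pending; };
struct toy_mm_count { atomic_int tlb_flush_pending; };

/*
 * Replay a fixed interleaving with the old bool flag:
 * A starts, B starts, A finishes while B is still batching flushes.
 * Returns what a migration path would observe at that point.
 */
static bool bool_flag_seen_pending(void)
{
	struct toy_mm_bool mm = { .tlb_flush_pending = false };

	mm.tlb_flush_pending = true;	/* A: set_tlb_flush_pending() */
	mm.tlb_flush_pending = true;	/* B: set_tlb_flush_pending() */
	mm.tlb_flush_pending = false;	/* A: clear_tlb_flush_pending() */

	return mm.tlb_flush_pending;	/* B's pending flush is forgotten */
}

/* The same interleaving with a counter, as in the patch above. */
static bool counter_seen_pending(void)
{
	struct toy_mm_count mm;
	int prev;

	atomic_init(&mm.tlb_flush_pending, 0);		/* init_tlb_flush_pending() */
	atomic_fetch_add(&mm.tlb_flush_pending, 1);	/* A: inc_tlb_flush_pending() */
	atomic_fetch_add(&mm.tlb_flush_pending, 1);	/* B: inc_tlb_flush_pending() */
	prev = atomic_fetch_sub(&mm.tlb_flush_pending, 1); /* A: dec_tlb_flush_pending() */
	assert(prev > 0);	/* a decrement must pair with an earlier increment */

	return atomic_load(&mm.tlb_flush_pending) != 0;	/* B still counted */
}
```

With this interleaving bool_flag_seen_pending() returns false even though B has not flushed yet, while counter_seen_pending() returns true. The sketch says nothing about memory ordering; it only shows the counting argument.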
[-- Attachment #3: nadav_amit-revert__mm-numa__defer_tlb_flush_for_thp_migration_as_long_as_possible_.patch --]
[-- Type: text/x-diff, Size: 2576 bytes --]
Subject: Revert "mm: numa: defer TLB flush for THP migration as long as possible"
From: Nadav Amit <namit@vmware.com>
Date: Tue, 1 Aug 2017 17:08:14 -0700
While deferring TLB flushes is a good practice, the reverted patch
caused pending TLB flushes to be checked while the page-table lock is
not taken. As a result, on architectures with a weak memory model (PPC),
Linux may miss a memory barrier, fail to notice that TLB flushes are
pending, and (in theory) cause memory corruption.
Since the alternative of using smp_mb__after_unlock_lock() was
considered a bit open-coded, and the performance impact is expected to
be small, the previous patch is reverted.
This reverts commit b0943d61b8fa420180f92f64ef67662b4f6cc493.
Cc: <akpm@linux-foundation.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: <nadav.amit@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Suggested-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Nadav Amit <namit@vmware.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20170802000818.4760-4-namit@vmware.com
---
mm/huge_memory.c | 13 ++++---------
1 file changed, 4 insertions(+), 9 deletions(-)
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1410,7 +1410,6 @@ int do_huge_pmd_numa_page(struct vm_faul
unsigned long haddr = vmf->address & HPAGE_PMD_MASK;
int page_nid = -1, this_nid = numa_node_id();
int target_nid, last_cpupid = -1;
- bool need_flush = false;
bool page_locked;
bool migrated = false;
bool was_writable;
@@ -1503,9 +1502,12 @@ int do_huge_pmd_numa_page(struct vm_faul
*
* Must be done under PTL such that we'll observe the relevant
* set_tlb_flush_pending().
+ *
+ * We are not sure a pending tlb flush here is for a huge page
+ * mapping or not. Hence use the tlb range variant
*/
if (mm_tlb_flush_pending(vma->vm_mm))
- need_flush = true;
+ flush_tlb_range(vma, haddr, haddr + HPAGE_PMD_SIZE);
/*
* Migrate the THP to the requested node, returns with page unlocked
@@ -1513,13 +1515,6 @@ int do_huge_pmd_numa_page(struct vm_faul
*/
spin_unlock(vmf->ptl);
- /*
- * We are not sure a pending tlb flush here is for a huge page
- * mapping or not. Hence use the tlb range variant
- */
- if (need_flush)
- flush_tlb_range(vma, haddr, haddr + HPAGE_PMD_SIZE);
-
migrated = migrate_misplaced_transhuge_page(vma->vm_mm, vma,
vmf->pmd, pmd, vmf->address, page, target_nid);
if (migrated) {
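The ordering this revert restores (look at the pending count and flush while the lock is still held, and only then drop the lock and migrate) can be sketched in user space. This is a toy under stated assumptions: a C11 atomic_flag spinlock stands in for the PTL, a counter stands in for flush_tlb_range(), and toy_mm, TOY_MM_INIT, toy_lock(), toy_unlock() and toy_numa_fault() are all hypothetical names, not kernel APIs.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for the relevant mm state. */
struct toy_mm {
	atomic_flag ptl;		/* stands in for the page-table lock */
	atomic_int tlb_flush_pending;	/* same role as in the first patch */
	int flushes;			/* counts simulated flush_tlb_range() calls */
};

/* Remaining members are zero-initialized. */
#define TOY_MM_INIT { .ptl = ATOMIC_FLAG_INIT }

static void toy_lock(atomic_flag *l)
{
	/* Spin until the flag was previously clear; acquire on success. */
	while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
		;
}

static void toy_unlock(atomic_flag *l)
{
	atomic_flag_clear_explicit(l, memory_order_release);
}

/*
 * Check for pending flushes and flush while the lock is held, then
 * unlock. Returns true if a flush was issued.
 */
static bool toy_numa_fault(struct toy_mm *mm)
{
	bool flushed = false;

	toy_lock(&mm->ptl);
	if (atomic_load(&mm->tlb_flush_pending)) {
		mm->flushes++;		/* flush before dropping the lock */
		flushed = true;
	}
	toy_unlock(&mm->ptl);

	/* The migration step would start here, after the flush. */
	return flushed;
}
```

The lock acquire only orders the check against an earlier increment if the thread that incremented the counter also took and released the same lock afterwards, which is the assumption the "Must be done under PTL" comment in the hunk above relies on.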