* [PATCH v2 0/2] mm: fixes of tlb_flush_pending
@ 2017-07-26 15:02 Nadav Amit
2017-07-26 15:02 ` [PATCH v2 1/2] mm: migrate: prevent racy access to tlb_flush_pending Nadav Amit
2017-07-26 15:02 ` [PATCH v2 2/2] mm: migrate: fix barriers around tlb_flush_pending Nadav Amit
0 siblings, 2 replies; 6+ messages in thread
From: Nadav Amit @ 2017-07-26 15:02 UTC (permalink / raw)
To: linux-mm; +Cc: nadav.amit, mgorman, riel, luto, Nadav Amit
These two patches address tlb_flush_pending issues. The first one addresses a
race when accessing tlb_flush_pending and is the important one.
The second patch addresses Andrew Morton's question regarding the barriers. It
is not really related to the first one: the atomic operations atomic_read()
and atomic_inc() do not act as memory barriers, and replacing the existing
barriers with smp_mb__after_atomic() did not seem beneficial. Yet, while
reviewing the memory barriers around the use of tlb_flush_pending, a few
issues were identified.
v1 -> v2:
- Explain the implications of the race (Andrew)
- Mark the patch that addresses the race for stable (Andrew)
- Add another patch to clean the use of barriers (Andrew)
Nadav Amit (2):
mm: migrate: prevent racy access to tlb_flush_pending
mm: migrate: fix barriers around tlb_flush_pending
include/linux/mm_types.h | 26 ++++++++++++++++----------
kernel/fork.c | 2 +-
mm/debug.c | 2 +-
mm/migrate.c | 9 +++++++++
4 files changed, 27 insertions(+), 12 deletions(-)
--
2.11.0
* [PATCH v2 1/2] mm: migrate: prevent racy access to tlb_flush_pending
2017-07-26 15:02 [PATCH v2 0/2] mm: fixes of tlb_flush_pending Nadav Amit
@ 2017-07-26 15:02 ` Nadav Amit
2017-07-27 6:48 ` Mel Gorman
2017-07-29 23:50 ` kbuild test robot
2017-07-26 15:02 ` [PATCH v2 2/2] mm: migrate: fix barriers around tlb_flush_pending Nadav Amit
1 sibling, 2 replies; 6+ messages in thread
From: Nadav Amit @ 2017-07-26 15:02 UTC (permalink / raw)
To: linux-mm; +Cc: nadav.amit, mgorman, riel, luto, stable, Nadav Amit
From: Nadav Amit <nadav.amit@gmail.com>
Setting and clearing mm->tlb_flush_pending can be performed by multiple
threads, since mmap_sem may only be acquired for read in
task_numa_work(). If this happens, tlb_flush_pending might be cleared
while one of the threads still changes PTEs and batches TLB flushes.
This can lead to the same race between migration and
change_protection_range() that led to the introduction of
tlb_flush_pending. The result of this race was data corruption, which
means that this patch also addresses a theoretically possible data
corruption.
An actual data corruption was not observed, yet the race was confirmed by
adding an assertion to check that tlb_flush_pending is not set by two threads
at the same time, adding artificial latency in change_protection_range() and
using sysctl to reduce kernel.numa_balancing_scan_delay_ms.
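To make the interleaving concrete, here is a userspace sketch (illustration
only: hypothetical names, pthreads standing in for tasks and sleeps standing
in for the PTE/TLB work; not kernel code). With a plain bool, the first
thread to finish clears the indication while the second thread is still
batching its flushes; a counter keeps the indication set until the last
updater is done:

#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

static bool flush_pending_bool;		/* racy: last writer wins */
static atomic_int flush_pending_count;	/* counts concurrent updaters */

static void *protection_changer(void *arg)
{
	long delay_ms = (long)arg;

	/* set_tlb_flush_pending() */
	flush_pending_bool = true;
	atomic_fetch_add(&flush_pending_count, 1);

	usleep(delay_ms * 1000);	/* "changing PTEs, batching TLB flushes" */

	/* clear_tlb_flush_pending() */
	flush_pending_bool = false;
	atomic_fetch_sub(&flush_pending_count, 1);
	return NULL;
}

int main(void)
{
	pthread_t t1, t2;

	pthread_create(&t1, NULL, protection_changer, (void *)1);
	pthread_create(&t2, NULL, protection_changer, (void *)50);

	usleep(10 * 1000);	/* thread 1 is done, thread 2 still "changes PTEs" */

	/* what migration would see at this point */
	printf("bool says pending:    %d  (wrong: thread 2 is still pending)\n",
	       flush_pending_bool);
	printf("counter says pending: %d  (correct)\n",
	       atomic_load(&flush_pending_count) > 0);

	pthread_join(t1, NULL);
	pthread_join(t2, NULL);
	return 0;
}

Build with e.g. "gcc -O2 -pthread sketch.c"; the timing-dependent schedule is
only meant to mimic the assertion/latency experiment described above.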
Fixes: 20841405940e ("mm: fix TLB flush race between migration, and
change_protection_range")
Cc: stable@vger.kernel.org
Signed-off-by: Nadav Amit <namit@vmware.com>
---
include/linux/mm_types.h | 8 ++++----
kernel/fork.c | 2 +-
mm/debug.c | 2 +-
3 files changed, 6 insertions(+), 6 deletions(-)
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 45cdb27791a3..36f4ec589544 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -493,7 +493,7 @@ struct mm_struct {
* can move process memory needs to flush the TLB when moving a
* PROT_NONE or PROT_NUMA mapped page.
*/
- bool tlb_flush_pending;
+ atomic_t tlb_flush_pending;
#endif
struct uprobes_state uprobes_state;
#ifdef CONFIG_HUGETLB_PAGE
@@ -528,11 +528,11 @@ static inline cpumask_t *mm_cpumask(struct mm_struct *mm)
static inline bool mm_tlb_flush_pending(struct mm_struct *mm)
{
barrier();
- return mm->tlb_flush_pending;
+ return atomic_read(&mm->tlb_flush_pending) > 0;
}
static inline void set_tlb_flush_pending(struct mm_struct *mm)
{
- mm->tlb_flush_pending = true;
+ atomic_inc(&mm->tlb_flush_pending);
/*
* Guarantee that the tlb_flush_pending store does not leak into the
@@ -544,7 +544,7 @@ static inline void set_tlb_flush_pending(struct mm_struct *mm)
static inline void clear_tlb_flush_pending(struct mm_struct *mm)
{
barrier();
- mm->tlb_flush_pending = false;
+ atomic_dec(&mm->tlb_flush_pending);
}
#else
static inline bool mm_tlb_flush_pending(struct mm_struct *mm)
diff --git a/kernel/fork.c b/kernel/fork.c
index e53770d2bf95..5a7ecfbb7420 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -809,7 +809,7 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p,
mm_init_aio(mm);
mm_init_owner(mm, p);
mmu_notifier_mm_init(mm);
- clear_tlb_flush_pending(mm);
+ atomic_set(&mm->tlb_flush_pending, 0);
#if defined(CONFIG_TRANSPARENT_HUGEPAGE) && !USE_SPLIT_PMD_PTLOCKS
mm->pmd_huge_pte = NULL;
#endif
diff --git a/mm/debug.c b/mm/debug.c
index db1cd26d8752..d70103bb4731 100644
--- a/mm/debug.c
+++ b/mm/debug.c
@@ -159,7 +159,7 @@ void dump_mm(const struct mm_struct *mm)
mm->numa_next_scan, mm->numa_scan_offset, mm->numa_scan_seq,
#endif
#if defined(CONFIG_NUMA_BALANCING) || defined(CONFIG_COMPACTION)
- mm->tlb_flush_pending,
+ atomic_read(&mm->tlb_flush_pending),
#endif
mm->def_flags, &mm->def_flags
);
--
2.11.0
* [PATCH v2 2/2] mm: migrate: fix barriers around tlb_flush_pending
2017-07-26 15:02 [PATCH v2 0/2] mm: fixes of tlb_flush_pending Nadav Amit
2017-07-26 15:02 ` [PATCH v2 1/2] mm: migrate: prevent racy access to tlb_flush_pending Nadav Amit
@ 2017-07-26 15:02 ` Nadav Amit
2017-07-27 0:04 ` Minchan Kim
1 sibling, 1 reply; 6+ messages in thread
From: Nadav Amit @ 2017-07-26 15:02 UTC (permalink / raw)
To: linux-mm; +Cc: nadav.amit, mgorman, riel, luto, Nadav Amit
Reading tlb_flush_pending while the page-table lock is held does not
require a barrier, since the lock/unlock already acts as a barrier.
Remove the barrier from mm_tlb_flush_pending() accordingly.
However, migrate_misplaced_transhuge_page() calls mm_tlb_flush_pending()
after the page-table lock has already been released, which may be a problem
on architectures with a weak memory model (e.g., PPC). Use
smp_mb__after_unlock_lock() in that case.
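As a rough userspace analogue (a sketch under stated assumptions: a pthread
mutex stands in for the page-table lock, a seq_cst fence stands in for
smp_mb(), and the names are hypothetical), the two reader cases look like
this:

#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

static pthread_mutex_t ptl = PTHREAD_MUTEX_INITIALIZER;	/* "page-table lock" */
static atomic_int tlb_flush_pending;

/* Reader that holds the "PTL": lock/unlock already order the read. */
static bool pending_under_ptl(void)
{
	bool ret;

	pthread_mutex_lock(&ptl);
	ret = atomic_load_explicit(&tlb_flush_pending, memory_order_relaxed) > 0;
	pthread_mutex_unlock(&ptl);
	return ret;
}

/*
 * Reader that runs after the "PTL" was already released (the
 * migrate_misplaced_transhuge_page() case): on a weakly ordered machine
 * the read could otherwise be satisfied too early, so a full barrier is
 * issued first -- the role smp_mb__after_unlock_lock() plays in the patch.
 */
static bool pending_after_unlock(void)
{
	atomic_thread_fence(memory_order_seq_cst);
	return atomic_load_explicit(&tlb_flush_pending, memory_order_relaxed) > 0;
}

int main(void)
{
	pending_under_ptl();
	pending_after_unlock();
	return 0;
}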
Signed-off-by: Nadav Amit <namit@vmware.com>
---
include/linux/mm_types.h | 18 ++++++++++++------
mm/migrate.c | 9 +++++++++
2 files changed, 21 insertions(+), 6 deletions(-)
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 36f4ec589544..312eec5690d4 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -522,12 +522,12 @@ static inline cpumask_t *mm_cpumask(struct mm_struct *mm)
/*
* Memory barriers to keep this state in sync are graciously provided by
* the page table locks, outside of which no page table modifications happen.
- * The barriers below prevent the compiler from re-ordering the instructions
- * around the memory barriers that are already present in the code.
+ * The barriers are used to ensure that the tlb_flush_pending updates, which
+ * happen while the lock is not taken, and the PTE updates, which happen
+ * while the lock is taken, are serialized.
*/
static inline bool mm_tlb_flush_pending(struct mm_struct *mm)
{
- barrier();
return atomic_read(&mm->tlb_flush_pending) > 0;
}
static inline void set_tlb_flush_pending(struct mm_struct *mm)
@@ -535,15 +535,21 @@ static inline void set_tlb_flush_pending(struct mm_struct *mm)
atomic_inc(&mm->tlb_flush_pending);
/*
- * Guarantee that the tlb_flush_pending store does not leak into the
+ * Guarantee that the tlb_flush_pending increase does not leak into the
* critical section updating the page tables
*/
smp_mb__before_spinlock();
}
-/* Clearing is done after a TLB flush, which also provides a barrier. */
+
static inline void clear_tlb_flush_pending(struct mm_struct *mm)
{
- barrier();
+ /*
+ * Guarantee that the tlb_flush_pending decrease does not leak into the
+ * critical section, since we must order the PTE changes and the changes to
+ * the pending TLB flush indication. We could have relied on the TLB flush
+ * as a memory barrier, but this behavior is not clearly documented.
+ */
+ smp_mb__before_atomic();
atomic_dec(&mm->tlb_flush_pending);
}
#else
diff --git a/mm/migrate.c b/mm/migrate.c
index 89a0a1707f4c..85c7134d70cc 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1935,6 +1935,15 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
put_page(new_page);
goto out_fail;
}
+
+ /*
+ * mm_tlb_flush_pending() is safe if it is executed while the page-table
+ * lock is taken. But here, it is executed while the page-table lock is
+ * already released. This requires a full memory barrier on
+ * architectures with weak memory models.
+ */
+ smp_mb__after_unlock_lock();
+
/*
* We are not sure a pending tlb flush here is for a huge page
* mapping or not. Hence use the tlb range variant
--
2.11.0
* Re: [PATCH v2 2/2] mm: migrate: fix barriers around tlb_flush_pending
2017-07-26 15:02 ` [PATCH v2 2/2] mm: migrate: fix barriers around tlb_flush_pending Nadav Amit
@ 2017-07-27 0:04 ` Minchan Kim
0 siblings, 0 replies; 6+ messages in thread
From: Minchan Kim @ 2017-07-27 0:04 UTC (permalink / raw)
To: Nadav Amit; +Cc: linux-mm, nadav.amit, mgorman, riel, luto
On Wed, Jul 26, 2017 at 08:02:14AM -0700, Nadav Amit wrote:
> Reading tlb_flush_pending while the page-table lock is taken does not
> require a barrier, since the lock/unlock already acts as a barrier.
> Removing the barrier in mm_tlb_flush_pending() to address this issue.
>
> However, migrate_misplaced_transhuge_page() calls mm_tlb_flush_pending()
> while the page-table lock is already released, which may present a
> problem on architectures with weak memory model (PPC). Use
> smp_mb__after_unlock_lock() in that case.
>
> Signed-off-by: Nadav Amit <namit@vmware.com>
> ---
> include/linux/mm_types.h | 18 ++++++++++++------
> mm/migrate.c | 9 +++++++++
> 2 files changed, 21 insertions(+), 6 deletions(-)
>
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 36f4ec589544..312eec5690d4 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -522,12 +522,12 @@ static inline cpumask_t *mm_cpumask(struct mm_struct *mm)
> /*
> * Memory barriers to keep this state in sync are graciously provided by
> * the page table locks, outside of which no page table modifications happen.
> - * The barriers below prevent the compiler from re-ordering the instructions
> - * around the memory barriers that are already present in the code.
> + * The barriers are used to ensure the order between tlb_flush_pending updates,
> + * which happen while the lock is not taken, and the PTE updates, which happen
> + * while the lock is taken, are serialized.
> */
> static inline bool mm_tlb_flush_pending(struct mm_struct *mm)
> {
> - barrier();
> return atomic_read(&mm->tlb_flush_pending) > 0;
> }
> static inline void set_tlb_flush_pending(struct mm_struct *mm)
> @@ -535,15 +535,21 @@ static inline void set_tlb_flush_pending(struct mm_struct *mm)
> atomic_inc(&mm->tlb_flush_pending);
>
> /*
> - * Guarantee that the tlb_flush_pending store does not leak into the
> + * Guarantee that the tlb_flush_pending increase does not leak into the
> * critical section updating the page tables
> */
> smp_mb__before_spinlock();
> }
> -/* Clearing is done after a TLB flush, which also provides a barrier. */
> +
> static inline void clear_tlb_flush_pending(struct mm_struct *mm)
> {
> - barrier();
> + /*
> + * Guarantee that the tlb_flush_pending does not not leak into the
> + * critical section, since we must order the PTE change and changes to
> + * the pending TLB flush indication. We could have relied on TLB flush
> + * as a memory barrier, but this behavior is not clearly documented.
> + */
> + smp_mb__before_atomic();
> atomic_dec(&mm->tlb_flush_pending);
> }
> #else
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 89a0a1707f4c..85c7134d70cc 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -1935,6 +1935,15 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
> put_page(new_page);
> goto out_fail;
> }
> +
> + /*
> + * mm_tlb_flush_pending() is safe if it is executed while the page-table
> + * lock is taken. But here, it is executed while the page-table lock is
> + * already released. This requires a full memory barrier on
> + * architectures with weak memory models.
> + */
> + smp_mb__after_unlock_lock();
> +
As you saw in my work, I will use mm_tlb_flush_pending() in tlb_finish_mmu(),
where the page-table lock is already released, so I should use the same
comment/barrier there, too.
Likewise, every mm_tlb_flush_pending() user should be aware of whether the
call is made inside or outside of the PTE lock.
I think it would be better to express that in the function interface, e.g.
something like the sketch below. IOW:
bool mm_tlb_flush_pending(bool pte_locked)
Otherwise, at least, I hope the comment you wrote here will also be placed in
mm_tlb_flush_pending() so that users can pick it up.
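A minimal sketch of that shape (hypothetical code, not part of this series;
it assumes the mm argument stays and simply folds the barrier into the
unlocked case):

static inline bool mm_tlb_flush_pending(struct mm_struct *mm, bool pte_locked)
{
	/*
	 * Callers outside the page-table lock (e.g. the
	 * migrate_misplaced_transhuge_page() path, or the planned
	 * tlb_finish_mmu() user) get the full barrier here instead of
	 * having to remember it at every call site.
	 */
	if (!pte_locked)
		smp_mb__after_unlock_lock();

	return atomic_read(&mm->tlb_flush_pending) > 0;
}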
Thanks.
* Re: [PATCH v2 1/2] mm: migrate: prevent racy access to tlb_flush_pending
2017-07-26 15:02 ` [PATCH v2 1/2] mm: migrate: prevent racy access to tlb_flush_pending Nadav Amit
@ 2017-07-27 6:48 ` Mel Gorman
2017-07-29 23:50 ` kbuild test robot
1 sibling, 0 replies; 6+ messages in thread
From: Mel Gorman @ 2017-07-27 6:48 UTC (permalink / raw)
To: Nadav Amit; +Cc: linux-mm, nadav.amit, riel, luto, stable
On Wed, Jul 26, 2017 at 08:02:13AM -0700, Nadav Amit wrote:
> From: Nadav Amit <nadav.amit@gmail.com>
>
> Setting and clearing mm->tlb_flush_pending can be performed by multiple
> threads, since mmap_sem may only be acquired for read in
> task_numa_work(). If this happens, tlb_flush_pending might be cleared
> while one of the threads still changes PTEs and batches TLB flushes.
>
> This can lead to the same race between migration and
> change_protection_range() that led to the introduction of
> tlb_flush_pending. The result of this race was data corruption, which
> means that this patch also addresses a theoretically possible data
> corruption.
>
> An actual data corruption was not observed, yet the race was
> was confirmed by adding assertion to check tlb_flush_pending is not set
> by two threads, adding artificial latency in change_protection_range()
> and using sysctl to reduce kernel.numa_balancing_scan_delay_ms.
>
> Fixes: 20841405940e ("mm: fix TLB flush race between migration, and
> change_protection_range")
>
> Cc: stable@vger.kernel.org
>
> Signed-off-by: Nadav Amit <namit@vmware.com>
Acked-by: Mel Gorman <mgorman@suse.de>
--
Mel Gorman
SUSE Labs
* Re: [PATCH v2 1/2] mm: migrate: prevent racy access to tlb_flush_pending
2017-07-26 15:02 ` [PATCH v2 1/2] mm: migrate: prevent racy access to tlb_flush_pending Nadav Amit
2017-07-27 6:48 ` Mel Gorman
@ 2017-07-29 23:50 ` kbuild test robot
1 sibling, 0 replies; 6+ messages in thread
From: kbuild test robot @ 2017-07-29 23:50 UTC (permalink / raw)
To: Nadav Amit; +Cc: kbuild-all, linux-mm, nadav.amit, mgorman, riel, luto, stable
[-- Attachment #1: Type: text/plain, Size: 4595 bytes --]
Hi Nadav,
[auto build test ERROR on linus/master]
[also build test ERROR on v4.13-rc2 next-20170728]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
url: https://github.com/0day-ci/linux/commits/Nadav-Amit/mm-fixes-of-tlb_flush_pending/20170728-034608
config: blackfin-BF537-STAMP_defconfig (attached as .config)
compiler: bfin-uclinux-gcc (GCC) 6.2.0
reproduce:
wget https://raw.githubusercontent.com/01org/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=blackfin
All errors (new ones prefixed by >>):
In file included from include/asm-generic/bug.h:4:0,
from arch/blackfin/include/asm/bug.h:71,
from include/linux/bug.h:4,
from include/linux/mmdebug.h:4,
from include/linux/gfp.h:4,
from include/linux/slab.h:14,
from kernel/fork.c:14:
kernel/fork.c: In function 'mm_init':
>> kernel/fork.c:810:16: error: 'struct mm_struct' has no member named 'tlb_flush_pending'
atomic_set(&mm->tlb_flush_pending, 0);
^
include/linux/compiler.h:329:17: note: in definition of macro 'WRITE_ONCE'
union { typeof(x) __val; char __c[1]; } __u = \
^
kernel/fork.c:810:2: note: in expansion of macro 'atomic_set'
atomic_set(&mm->tlb_flush_pending, 0);
^~~~~~~~~~
>> kernel/fork.c:810:16: error: 'struct mm_struct' has no member named 'tlb_flush_pending'
atomic_set(&mm->tlb_flush_pending, 0);
^
include/linux/compiler.h:330:30: note: in definition of macro 'WRITE_ONCE'
{ .__val = (__force typeof(x)) (val) }; \
^
kernel/fork.c:810:2: note: in expansion of macro 'atomic_set'
atomic_set(&mm->tlb_flush_pending, 0);
^~~~~~~~~~
>> kernel/fork.c:810:16: error: 'struct mm_struct' has no member named 'tlb_flush_pending'
atomic_set(&mm->tlb_flush_pending, 0);
^
include/linux/compiler.h:331:22: note: in definition of macro 'WRITE_ONCE'
__write_once_size(&(x), __u.__c, sizeof(x)); \
^
kernel/fork.c:810:2: note: in expansion of macro 'atomic_set'
atomic_set(&mm->tlb_flush_pending, 0);
^~~~~~~~~~
>> kernel/fork.c:810:16: error: 'struct mm_struct' has no member named 'tlb_flush_pending'
atomic_set(&mm->tlb_flush_pending, 0);
^
include/linux/compiler.h:331:42: note: in definition of macro 'WRITE_ONCE'
__write_once_size(&(x), __u.__c, sizeof(x)); \
^
kernel/fork.c:810:2: note: in expansion of macro 'atomic_set'
atomic_set(&mm->tlb_flush_pending, 0);
^~~~~~~~~~
vim +810 kernel/fork.c
787
788 static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p,
789 struct user_namespace *user_ns)
790 {
791 mm->mmap = NULL;
792 mm->mm_rb = RB_ROOT;
793 mm->vmacache_seqnum = 0;
794 atomic_set(&mm->mm_users, 1);
795 atomic_set(&mm->mm_count, 1);
796 init_rwsem(&mm->mmap_sem);
797 INIT_LIST_HEAD(&mm->mmlist);
798 mm->core_state = NULL;
799 atomic_long_set(&mm->nr_ptes, 0);
800 mm_nr_pmds_init(mm);
801 mm->map_count = 0;
802 mm->locked_vm = 0;
803 mm->pinned_vm = 0;
804 memset(&mm->rss_stat, 0, sizeof(mm->rss_stat));
805 spin_lock_init(&mm->page_table_lock);
806 mm_init_cpumask(mm);
807 mm_init_aio(mm);
808 mm_init_owner(mm, p);
809 mmu_notifier_mm_init(mm);
> 810 atomic_set(&mm->tlb_flush_pending, 0);
811 #if defined(CONFIG_TRANSPARENT_HUGEPAGE) && !USE_SPLIT_PMD_PTLOCKS
812 mm->pmd_huge_pte = NULL;
813 #endif
814
815 if (current->mm) {
816 mm->flags = current->mm->flags & MMF_INIT_MASK;
817 mm->def_flags = current->mm->def_flags & VM_INIT_DEF_MASK;
818 } else {
819 mm->flags = default_dump_filter;
820 mm->def_flags = 0;
821 }
822
823 if (mm_alloc_pgd(mm))
824 goto fail_nopgd;
825
826 if (init_new_context(p, mm))
827 goto fail_nocontext;
828
829 mm->user_ns = get_user_ns(user_ns);
830 return mm;
831
832 fail_nocontext:
833 mm_free_pgd(mm);
834 fail_nopgd:
835 free_mm(mm);
836 return NULL;
837 }
838
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 14145 bytes --]