linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 0/2] mm: fixes of tlb_flush_pending races
@ 2017-07-27 11:40 Nadav Amit
  2017-07-27 11:40 ` [PATCH v3 1/2] mm: migrate: prevent racy access to tlb_flush_pending Nadav Amit
  2017-07-27 11:40 ` [PATCH v3 2/2] mm: migrate: fix barriers around tlb_flush_pending Nadav Amit
  0 siblings, 2 replies; 9+ messages in thread
From: Nadav Amit @ 2017-07-27 11:40 UTC (permalink / raw)
  To: linux-mm
  Cc: sergey.senozhatsky, minchan, nadav.amit, mgorman, riel, luto, Nadav Amit

These two patches address tlb_flush_pending issues. The first one address a
race when accessing tlb_flush_pending and is the important one.

The second patch addresses Andrew Morton question regarding the barriers. This
patch is not really related to the first one: the atomic operations
atomic_read() and atomic_inc() do not act as a memory barrier, and replacing
existing barriers with smp_mb__after_atomic() did not seem beneficial. Yet,
while reviewing the memory barriers around the use of tlb_flush_pending, few
issues were identified.


v2 -> v3:
 - Do not init tlb_flush_pending if it is not defined without (Sergey)
 - Internalize memory barriers to mm_tlb_flush_pending (Minchan) 

v1 -> v2:
 - Explain the implications of the implications of the race (Andrew)
 - Mark the patch that address the race as stable (Andrew)
 - Add another patch to clean the use of barriers (Andrew)


Nadav Amit (2):
  mm: migrate: prevent racy access to tlb_flush_pending
  mm: migrate: fix barriers around tlb_flush_pending

 arch/arm/include/asm/pgtable.h   |  3 ++-
 arch/arm64/include/asm/pgtable.h |  3 ++-
 arch/x86/include/asm/pgtable.h   |  2 +-
 include/linux/mm_types.h         | 39 +++++++++++++++++++++++++++------------
 kernel/fork.c                    |  4 +++-
 mm/debug.c                       |  2 +-
 mm/migrate.c                     |  2 +-
 7 files changed, 37 insertions(+), 18 deletions(-)

-- 
2.11.0

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH v3 1/2] mm: migrate: prevent racy access to tlb_flush_pending
  2017-07-27 11:40 [PATCH v3 0/2] mm: fixes of tlb_flush_pending races Nadav Amit
@ 2017-07-27 11:40 ` Nadav Amit
  2017-07-28  1:34   ` Sergey Senozhatsky
                     ` (2 more replies)
  2017-07-27 11:40 ` [PATCH v3 2/2] mm: migrate: fix barriers around tlb_flush_pending Nadav Amit
  1 sibling, 3 replies; 9+ messages in thread
From: Nadav Amit @ 2017-07-27 11:40 UTC (permalink / raw)
  To: linux-mm
  Cc: sergey.senozhatsky, minchan, nadav.amit, mgorman, riel, luto,
	stable, Nadav Amit

From: Nadav Amit <nadav.amit@gmail.com>

Setting and clearing mm->tlb_flush_pending can be performed by multiple
threads, since mmap_sem may only be acquired for read in
task_numa_work(). If this happens, tlb_flush_pending might be cleared
while one of the threads still changes PTEs and batches TLB flushes.

This can lead to the same race between migration and
change_protection_range() that led to the introduction of
tlb_flush_pending. The result of this race was data corruption, which
means that this patch also addresses a theoretically possible data
corruption.

An actual data corruption was not observed, yet the race was
was confirmed by adding assertion to check tlb_flush_pending is not set
by two threads, adding artificial latency in change_protection_range()
and using sysctl to reduce kernel.numa_balancing_scan_delay_ms.

Fixes: 20841405940e ("mm: fix TLB flush race between migration, and
change_protection_range")

Cc: stable@vger.kernel.org

Signed-off-by: Nadav Amit <namit@vmware.com>
Acked-by: Mel Gorman <mgorman@suse.de>
---
 include/linux/mm_types.h | 8 ++++----
 kernel/fork.c            | 4 +++-
 mm/debug.c               | 2 +-
 3 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 45cdb27791a3..36f4ec589544 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -493,7 +493,7 @@ struct mm_struct {
 	 * can move process memory needs to flush the TLB when moving a
 	 * PROT_NONE or PROT_NUMA mapped page.
 	 */
-	bool tlb_flush_pending;
+	atomic_t tlb_flush_pending;
 #endif
 	struct uprobes_state uprobes_state;
 #ifdef CONFIG_HUGETLB_PAGE
@@ -528,11 +528,11 @@ static inline cpumask_t *mm_cpumask(struct mm_struct *mm)
 static inline bool mm_tlb_flush_pending(struct mm_struct *mm)
 {
 	barrier();
-	return mm->tlb_flush_pending;
+	return atomic_read(&mm->tlb_flush_pending) > 0;
 }
 static inline void set_tlb_flush_pending(struct mm_struct *mm)
 {
-	mm->tlb_flush_pending = true;
+	atomic_inc(&mm->tlb_flush_pending);
 
 	/*
 	 * Guarantee that the tlb_flush_pending store does not leak into the
@@ -544,7 +544,7 @@ static inline void set_tlb_flush_pending(struct mm_struct *mm)
 static inline void clear_tlb_flush_pending(struct mm_struct *mm)
 {
 	barrier();
-	mm->tlb_flush_pending = false;
+	atomic_dec(&mm->tlb_flush_pending);
 }
 #else
 static inline bool mm_tlb_flush_pending(struct mm_struct *mm)
diff --git a/kernel/fork.c b/kernel/fork.c
index e53770d2bf95..d6bf35b1cf31 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -809,7 +809,9 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p,
 	mm_init_aio(mm);
 	mm_init_owner(mm, p);
 	mmu_notifier_mm_init(mm);
-	clear_tlb_flush_pending(mm);
+#if defined(CONFIG_NUMA_BALANCING) || defined(CONFIG_COMPACTION)
+	atomic_set(&mm->tlb_flush_pending, 0);
+#endif
 #if defined(CONFIG_TRANSPARENT_HUGEPAGE) && !USE_SPLIT_PMD_PTLOCKS
 	mm->pmd_huge_pte = NULL;
 #endif
diff --git a/mm/debug.c b/mm/debug.c
index db1cd26d8752..d70103bb4731 100644
--- a/mm/debug.c
+++ b/mm/debug.c
@@ -159,7 +159,7 @@ void dump_mm(const struct mm_struct *mm)
 		mm->numa_next_scan, mm->numa_scan_offset, mm->numa_scan_seq,
 #endif
 #if defined(CONFIG_NUMA_BALANCING) || defined(CONFIG_COMPACTION)
-		mm->tlb_flush_pending,
+		atomic_read(&mm->tlb_flush_pending),
 #endif
 		mm->def_flags, &mm->def_flags
 	);
-- 
2.11.0

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v3 2/2] mm: migrate: fix barriers around tlb_flush_pending
  2017-07-27 11:40 [PATCH v3 0/2] mm: fixes of tlb_flush_pending races Nadav Amit
  2017-07-27 11:40 ` [PATCH v3 1/2] mm: migrate: prevent racy access to tlb_flush_pending Nadav Amit
@ 2017-07-27 11:40 ` Nadav Amit
  2017-07-28  7:42   ` Mel Gorman
  1 sibling, 1 reply; 9+ messages in thread
From: Nadav Amit @ 2017-07-27 11:40 UTC (permalink / raw)
  To: linux-mm
  Cc: sergey.senozhatsky, minchan, nadav.amit, mgorman, riel, luto, Nadav Amit

Reading tlb_flush_pending while the page-table lock is taken does not
require a barrier, since the lock/unlock already acts as a barrier.
Removing the barrier in mm_tlb_flush_pending() to address this issue.

However, migrate_misplaced_transhuge_page() calls mm_tlb_flush_pending()
while the page-table lock is already released, which may present a
problem on architectures with weak memory model (PPC). To deal with this
case, a new parameter is added to mm_tlb_flush_pending() to indicate
if it is read without the page-table lock taken, and calling
smp_mb__after_unlock_lock() in this case.

Signed-off-by: Nadav Amit <namit@vmware.com>
---
 arch/arm/include/asm/pgtable.h   |  3 ++-
 arch/arm64/include/asm/pgtable.h |  3 ++-
 arch/x86/include/asm/pgtable.h   |  2 +-
 include/linux/mm_types.h         | 31 +++++++++++++++++++++++--------
 mm/migrate.c                     |  2 +-
 5 files changed, 29 insertions(+), 12 deletions(-)

diff --git a/arch/arm/include/asm/pgtable.h b/arch/arm/include/asm/pgtable.h
index 1c462381c225..2e0608a8049d 100644
--- a/arch/arm/include/asm/pgtable.h
+++ b/arch/arm/include/asm/pgtable.h
@@ -223,7 +223,8 @@ static inline pte_t *pmd_page_vaddr(pmd_t pmd)
 #define pte_none(pte)		(!pte_val(pte))
 #define pte_present(pte)	(pte_isset((pte), L_PTE_PRESENT))
 #define pte_valid(pte)		(pte_isset((pte), L_PTE_VALID))
-#define pte_accessible(mm, pte)	(mm_tlb_flush_pending(mm) ? pte_present(pte) : pte_valid(pte))
+#define pte_accessible(mm, pte)	(mm_tlb_flush_pending(mm, true) ? \
+					pte_present(pte) : pte_valid(pte))
 #define pte_write(pte)		(pte_isclear((pte), L_PTE_RDONLY))
 #define pte_dirty(pte)		(pte_isset((pte), L_PTE_DIRTY))
 #define pte_young(pte)		(pte_isset((pte), L_PTE_YOUNG))
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index c213fdbd056c..47f934d378ca 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -108,7 +108,8 @@ extern unsigned long empty_zero_page[PAGE_SIZE / sizeof(unsigned long)];
  * remapped as PROT_NONE but are yet to be flushed from the TLB.
  */
 #define pte_accessible(mm, pte)	\
-	(mm_tlb_flush_pending(mm) ? pte_present(pte) : pte_valid_young(pte))
+	(mm_tlb_flush_pending(mm, true) ? pte_present(pte) : \
+					  pte_valid_young(pte))
 
 static inline pte_t clear_pte_bit(pte_t pte, pgprot_t prot)
 {
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index f5af95a0c6b8..da16793203dd 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -642,7 +642,7 @@ static inline bool pte_accessible(struct mm_struct *mm, pte_t a)
 		return true;
 
 	if ((pte_flags(a) & _PAGE_PROTNONE) &&
-			mm_tlb_flush_pending(mm))
+			mm_tlb_flush_pending(mm, true))
 		return true;
 
 	return false;
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 36f4ec589544..57ab8061a2c0 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -522,12 +522,21 @@ static inline cpumask_t *mm_cpumask(struct mm_struct *mm)
 /*
  * Memory barriers to keep this state in sync are graciously provided by
  * the page table locks, outside of which no page table modifications happen.
- * The barriers below prevent the compiler from re-ordering the instructions
- * around the memory barriers that are already present in the code.
+ * The barriers are used to ensure the order between tlb_flush_pending updates,
+ * which happen while the lock is not taken, and the PTE updates, which happen
+ * while the lock is taken, are serialized.
  */
-static inline bool mm_tlb_flush_pending(struct mm_struct *mm)
+static inline bool mm_tlb_flush_pending(struct mm_struct *mm, bool pt_locked)
 {
-	barrier();
+	/*
+	 * mm_tlb_flush_pending() is safe if it is executed while the page-table
+	 * lock is taken. But if the lock was already released, there does not
+	 * seem to be a guarantee that a memory barrier. A memory barrier is
+	 * therefore needed on architectures with weak memory models.
+	 */
+	if (!pt_locked)
+		smp_mb__after_unlock_lock();
+
 	return atomic_read(&mm->tlb_flush_pending) > 0;
 }
 static inline void set_tlb_flush_pending(struct mm_struct *mm)
@@ -535,19 +544,25 @@ static inline void set_tlb_flush_pending(struct mm_struct *mm)
 	atomic_inc(&mm->tlb_flush_pending);
 
 	/*
-	 * Guarantee that the tlb_flush_pending store does not leak into the
+	 * Guarantee that the tlb_flush_pending increase does not leak into the
 	 * critical section updating the page tables
 	 */
 	smp_mb__before_spinlock();
 }
-/* Clearing is done after a TLB flush, which also provides a barrier. */
+
 static inline void clear_tlb_flush_pending(struct mm_struct *mm)
 {
-	barrier();
+	/*
+	 * Guarantee that the tlb_flush_pending does not not leak into the
+	 * critical section, since we must order the PTE change and changes to
+	 * the pending TLB flush indication. We could have relied on TLB flush
+	 * as a memory barrier, but this behavior is not clearly documented.
+	 */
+	smp_mb__before_atomic();
 	atomic_dec(&mm->tlb_flush_pending);
 }
 #else
-static inline bool mm_tlb_flush_pending(struct mm_struct *mm)
+static inline bool mm_tlb_flush_pending(struct mm_struct *mm, bool pt_locked)
 {
 	return false;
 }
diff --git a/mm/migrate.c b/mm/migrate.c
index 89a0a1707f4c..169c3165be41 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1939,7 +1939,7 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
 	 * We are not sure a pending tlb flush here is for a huge page
 	 * mapping or not. Hence use the tlb range variant
 	 */
-	if (mm_tlb_flush_pending(mm))
+	if (mm_tlb_flush_pending(mm, false))
 		flush_tlb_range(vma, mmun_start, mmun_end);
 
 	/* Prepare a page as a migration target */
-- 
2.11.0

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH v3 1/2] mm: migrate: prevent racy access to tlb_flush_pending
  2017-07-27 11:40 ` [PATCH v3 1/2] mm: migrate: prevent racy access to tlb_flush_pending Nadav Amit
@ 2017-07-28  1:34   ` Sergey Senozhatsky
  2017-07-28  1:44     ` Nadav Amit
  2017-07-28  1:42   ` Sergey Senozhatsky
  2017-07-28  2:28   ` Rik van Riel
  2 siblings, 1 reply; 9+ messages in thread
From: Sergey Senozhatsky @ 2017-07-28  1:34 UTC (permalink / raw)
  To: Nadav Amit
  Cc: linux-mm, sergey.senozhatsky, minchan, nadav.amit, mgorman, riel,
	luto, stable, Andrew Morton

just my 5 silly cents,

On (07/27/17 04:40), Nadav Amit wrote:
[..]
>  static inline void set_tlb_flush_pending(struct mm_struct *mm)
>  {
> -	mm->tlb_flush_pending = true;
> +	atomic_inc(&mm->tlb_flush_pending);
>  
>  	/*
>  	 * Guarantee that the tlb_flush_pending store does not leak into the
> @@ -544,7 +544,7 @@ static inline void set_tlb_flush_pending(struct mm_struct *mm)
>  static inline void clear_tlb_flush_pending(struct mm_struct *mm)
>  {
>  	barrier();
> -	mm->tlb_flush_pending = false;
> +	atomic_dec(&mm->tlb_flush_pending);
>  }

so, _technically_, set_tlb_flush_pending() can be nested, right? IOW,

	set_tlb_flush_pending()
	set_tlb_flush_pending()
	flush_tlb_range()
	clear_tlb_flush_pending()
	clear_tlb_flush_pending()  // if we miss this one, then
				   // ->tlb_flush_pending is !clear,
				   // even though we called
				   // clear_tlb_flush_pending()

if so then set_ and clear_ are a bit misleading names for something
that does atomic_inc()/atomic_dec() internally.

especially when one sees this part

> -	clear_tlb_flush_pending(mm);
> +#if defined(CONFIG_NUMA_BALANCING) || defined(CONFIG_COMPACTION)
> +	atomic_set(&mm->tlb_flush_pending, 0);
> +#endif

so we have clear_tlb_flush_pending() function which probably should
set it to 0 as the name suggests (I see what you did tho), yet still
do atomic_set() under ifdef-s.

well, just nitpicks.

	-ss

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v3 1/2] mm: migrate: prevent racy access to tlb_flush_pending
  2017-07-27 11:40 ` [PATCH v3 1/2] mm: migrate: prevent racy access to tlb_flush_pending Nadav Amit
  2017-07-28  1:34   ` Sergey Senozhatsky
@ 2017-07-28  1:42   ` Sergey Senozhatsky
  2017-07-28  2:28   ` Rik van Riel
  2 siblings, 0 replies; 9+ messages in thread
From: Sergey Senozhatsky @ 2017-07-28  1:42 UTC (permalink / raw)
  To: Nadav Amit
  Cc: linux-mm, sergey.senozhatsky, minchan, nadav.amit, mgorman, riel,
	luto, stable, Andrew Morton

On (07/27/17 04:40), Nadav Amit wrote:
[..]
> --- a/mm/debug.c
> +++ b/mm/debug.c
> @@ -159,7 +159,7 @@ void dump_mm(const struct mm_struct *mm)
>  		mm->numa_next_scan, mm->numa_scan_offset, mm->numa_scan_seq,
>  #endif
>  #if defined(CONFIG_NUMA_BALANCING) || defined(CONFIG_COMPACTION)
> -		mm->tlb_flush_pending,
> +		atomic_read(&mm->tlb_flush_pending),
>  #endif

can we use mm_tlb_flush_pending() here and get rid of ifdef-s?

/* I understand that this a -stable patch, so we can do it in a
separate patch. */

	-ss

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v3 1/2] mm: migrate: prevent racy access to tlb_flush_pending
  2017-07-28  1:34   ` Sergey Senozhatsky
@ 2017-07-28  1:44     ` Nadav Amit
  0 siblings, 0 replies; 9+ messages in thread
From: Nadav Amit @ 2017-07-28  1:44 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: open list:MEMORY MANAGEMENT, Sergey Senozhatsky, Minchan Kim,
	Mel Gorman, Rik van Riel, Andy Lutomirski, stable, Andrew Morton

Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> wrote:

> just my 5 silly cents,
> 
> On (07/27/17 04:40), Nadav Amit wrote:
> [..]
>> static inline void set_tlb_flush_pending(struct mm_struct *mm)
>> {
>> -	mm->tlb_flush_pending = true;
>> +	atomic_inc(&mm->tlb_flush_pending);
>> 
>> 	/*
>> 	 * Guarantee that the tlb_flush_pending store does not leak into the
>> @@ -544,7 +544,7 @@ static inline void set_tlb_flush_pending(struct mm_struct *mm)
>> static inline void clear_tlb_flush_pending(struct mm_struct *mm)
>> {
>> 	barrier();
>> -	mm->tlb_flush_pending = false;
>> +	atomic_dec(&mm->tlb_flush_pending);
>> }
> 
> so, _technically_, set_tlb_flush_pending() can be nested, right? IOW,
> 
> 	set_tlb_flush_pending()
> 	set_tlb_flush_pending()
> 	flush_tlb_range()
> 	clear_tlb_flush_pending()
> 	clear_tlb_flush_pending()  // if we miss this one, then
> 				   // ->tlb_flush_pending is !clear,
> 				   // even though we called
> 				   // clear_tlb_flush_pending()
> 
> if so then set_ and clear_ are a bit misleading names for something
> that does atomic_inc()/atomic_dec() internally.
> 
> especially when one sees this part
> 
>> -	clear_tlb_flush_pending(mm);
>> +#if defined(CONFIG_NUMA_BALANCING) || defined(CONFIG_COMPACTION)
>> +	atomic_set(&mm->tlb_flush_pending, 0);
>> +#endif
> 
> so we have clear_tlb_flush_pending() function which probably should
> set it to 0 as the name suggests (I see what you did tho), yet still
> do atomic_set() under ifdef-s.
> 
> well, just nitpicks.

I see your point. Initially, I tried to keep exactly the same interface to
reduce the number of changes, but it might be misleading. I will change the
names (inc/dec_tlb_flush_pending). I will create init_tlb_flush_pending()
for initialization since you ask, but Minchan's changes would likely remove
the ifdef’s, making it a bit strange for a single use.

Anyhow, I’ll wait to see if there any other comments and then do it for v4.

Thanks,
Nadav


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v3 1/2] mm: migrate: prevent racy access to tlb_flush_pending
  2017-07-27 11:40 ` [PATCH v3 1/2] mm: migrate: prevent racy access to tlb_flush_pending Nadav Amit
  2017-07-28  1:34   ` Sergey Senozhatsky
  2017-07-28  1:42   ` Sergey Senozhatsky
@ 2017-07-28  2:28   ` Rik van Riel
  2 siblings, 0 replies; 9+ messages in thread
From: Rik van Riel @ 2017-07-28  2:28 UTC (permalink / raw)
  To: Nadav Amit, linux-mm
  Cc: sergey.senozhatsky, minchan, nadav.amit, mgorman, luto, stable

On Thu, 2017-07-27 at 04:40 -0700, Nadav Amit wrote:
> From: Nadav Amit <nadav.amit@gmail.com>
> 
> Setting and clearing mm->tlb_flush_pending can be performed by
> multiple
> threads, since mmap_sem may only be acquired for read in
> task_numa_work(). If this happens, tlb_flush_pending might be cleared
> while one of the threads still changes PTEs and batches TLB flushes.
> 
> This can lead to the same race between migration and
> change_protection_range() that led to the introduction of
> tlb_flush_pending. The result of this race was data corruption, which
> means that this patch also addresses a theoretically possible data
> corruption.
> 
> An actual data corruption was not observed, yet the race was
> was confirmed by adding assertion to check tlb_flush_pending is not
> set
> by two threads, adding artificial latency in
> change_protection_range()
> and using sysctl to reduce kernel.numa_balancing_scan_delay_ms.
> 
> Fixes: 20841405940e ("mm: fix TLB flush race between migration, and
> change_protection_range")
> 
> Cc: stable@vger.kernel.org
> 
> Signed-off-by: Nadav Amit <namit@vmware.com>
> Acked-by: Mel Gorman <mgorman@suse.de>
> 
Acked-by: Rik van Riel <riel@redhat.com>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v3 2/2] mm: migrate: fix barriers around tlb_flush_pending
  2017-07-27 11:40 ` [PATCH v3 2/2] mm: migrate: fix barriers around tlb_flush_pending Nadav Amit
@ 2017-07-28  7:42   ` Mel Gorman
  2017-07-28 16:40     ` Nadav Amit
  0 siblings, 1 reply; 9+ messages in thread
From: Mel Gorman @ 2017-07-28  7:42 UTC (permalink / raw)
  To: Nadav Amit; +Cc: linux-mm, sergey.senozhatsky, minchan, nadav.amit, riel, luto

On Thu, Jul 27, 2017 at 04:40:15AM -0700, Nadav Amit wrote:
> Reading tlb_flush_pending while the page-table lock is taken does not
> require a barrier, since the lock/unlock already acts as a barrier.
> Removing the barrier in mm_tlb_flush_pending() to address this issue.
> 
> However, migrate_misplaced_transhuge_page() calls mm_tlb_flush_pending()
> while the page-table lock is already released, which may present a
> problem on architectures with weak memory model (PPC). To deal with this
> case, a new parameter is added to mm_tlb_flush_pending() to indicate
> if it is read without the page-table lock taken, and calling
> smp_mb__after_unlock_lock() in this case.
> 
> Signed-off-by: Nadav Amit <namit@vmware.com>

Conditional locking based on function arguements are often considered
extremely hazardous. Conditional barriers are even more troublesome because
it's simply too easy to get wrong.

Revert b0943d61b8fa420180f92f64ef67662b4f6cc493 instead of this patch. It's
not a clean revert but conflicts are due to comment changes. It moves
the check back under the PTL and the impact is marginal given that
it a spurious TLB flush will only occur when potentially racing with
change_prot_range. Since that commit went in, a lot of changes have happened
that alter the scan rate of automatic NUMA balancing so it shouldn't be a
serious issue. It's certainly a nicer option than using conditional barriers.

-- 
Mel Gorman
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v3 2/2] mm: migrate: fix barriers around tlb_flush_pending
  2017-07-28  7:42   ` Mel Gorman
@ 2017-07-28 16:40     ` Nadav Amit
  0 siblings, 0 replies; 9+ messages in thread
From: Nadav Amit @ 2017-07-28 16:40 UTC (permalink / raw)
  To: Mel Gorman
  Cc: open list:MEMORY MANAGEMENT, Sergey Senozhatsky, Minchan Kim,
	Rik van Riel, luto

Mel Gorman <mgorman@suse.de> wrote:

> On Thu, Jul 27, 2017 at 04:40:15AM -0700, Nadav Amit wrote:
>> Reading tlb_flush_pending while the page-table lock is taken does not
>> require a barrier, since the lock/unlock already acts as a barrier.
>> Removing the barrier in mm_tlb_flush_pending() to address this issue.
>> 
>> However, migrate_misplaced_transhuge_page() calls mm_tlb_flush_pending()
>> while the page-table lock is already released, which may present a
>> problem on architectures with weak memory model (PPC). To deal with this
>> case, a new parameter is added to mm_tlb_flush_pending() to indicate
>> if it is read without the page-table lock taken, and calling
>> smp_mb__after_unlock_lock() in this case.
>> 
>> Signed-off-by: Nadav Amit <namit@vmware.com>
> 
> Conditional locking based on function arguements are often considered
> extremely hazardous. Conditional barriers are even more troublesome because
> it's simply too easy to get wrong.
> 
> Revert b0943d61b8fa420180f92f64ef67662b4f6cc493 instead of this patch. It's
> not a clean revert but conflicts are due to comment changes. It moves
> the check back under the PTL and the impact is marginal given that
> it a spurious TLB flush will only occur when potentially racing with
> change_prot_range. Since that commit went in, a lot of changes have happened
> that alter the scan rate of automatic NUMA balancing so it shouldn't be a
> serious issue. It's certainly a nicer option than using conditional barriers.

Ok. Initially, I added a memory barrier only in
migrate_misplaced_transhuge_page(), and included a detailed comment about it
- I still think it is better. But since you feel confident the impact will
be relatively small, I will do the revert.


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2017-07-28 16:40 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-07-27 11:40 [PATCH v3 0/2] mm: fixes of tlb_flush_pending races Nadav Amit
2017-07-27 11:40 ` [PATCH v3 1/2] mm: migrate: prevent racy access to tlb_flush_pending Nadav Amit
2017-07-28  1:34   ` Sergey Senozhatsky
2017-07-28  1:44     ` Nadav Amit
2017-07-28  1:42   ` Sergey Senozhatsky
2017-07-28  2:28   ` Rik van Riel
2017-07-27 11:40 ` [PATCH v3 2/2] mm: migrate: fix barriers around tlb_flush_pending Nadav Amit
2017-07-28  7:42   ` Mel Gorman
2017-07-28 16:40     ` Nadav Amit

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).