linux-mm.kvack.org archive mirror
* [PATCH] mm: Prevent racy access to tlb_flush_pending
@ 2017-07-17 18:02 Nadav Amit
  2017-07-18  1:31 ` Andy Lutomirski
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Nadav Amit @ 2017-07-17 18:02 UTC (permalink / raw)
  To: linux-mm; +Cc: nadav.amit, mgorman, riel, luto, Nadav Amit

Setting and clearing mm->tlb_flush_pending can be performed by multiple
threads, since mmap_sem may only be acquired for read in task_numa_work.
If this happens, tlb_flush_pending may be cleared while one of the
threads is still changing PTEs and batching TLB flushes.

As a result, TLB flushes can be skipped because the indication of
pending TLB flushes is lost, for instance due to a race between
migration and change_protection_range() (just as in the scenario that
caused the introduction of tlb_flush_pending).

The feasibility of such a scenario was confirmed by adding an assertion
that tlb_flush_pending is not set by two threads at once, adding
artificial latency in change_protection_range() and using sysctl to
reduce kernel.numa_balancing_scan_delay_ms.
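
A simplified interleaving of the lost update (illustrative; the exact
call paths vary):

    CPU0                                CPU1
    set_tlb_flush_pending()
      tlb_flush_pending = true
                                        set_tlb_flush_pending()
                                          tlb_flush_pending = true
                                        change PTEs, flush TLB
                                        clear_tlb_flush_pending()
                                          tlb_flush_pending = false
    change PTEs, batch flushes
    (a concurrent reader, e.g. migration, now sees
     mm_tlb_flush_pending() == false and skips a needed flush)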

Fixes: 20841405940e ("mm: fix TLB flush race between migration, and change_protection_range")
Signed-off-by: Nadav Amit <namit@vmware.com>
---
 include/linux/mm_types.h | 8 ++++----
 kernel/fork.c            | 2 +-
 mm/debug.c               | 2 +-
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 45cdb27791a3..36f4ec589544 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -493,7 +493,7 @@ struct mm_struct {
 	 * can move process memory needs to flush the TLB when moving a
 	 * PROT_NONE or PROT_NUMA mapped page.
 	 */
-	bool tlb_flush_pending;
+	atomic_t tlb_flush_pending;
 #endif
 	struct uprobes_state uprobes_state;
 #ifdef CONFIG_HUGETLB_PAGE
@@ -528,11 +528,11 @@ static inline cpumask_t *mm_cpumask(struct mm_struct *mm)
 static inline bool mm_tlb_flush_pending(struct mm_struct *mm)
 {
 	barrier();
-	return mm->tlb_flush_pending;
+	return atomic_read(&mm->tlb_flush_pending) > 0;
 }
 static inline void set_tlb_flush_pending(struct mm_struct *mm)
 {
-	mm->tlb_flush_pending = true;
+	atomic_inc(&mm->tlb_flush_pending);
 
 	/*
 	 * Guarantee that the tlb_flush_pending store does not leak into the
@@ -544,7 +544,7 @@ static inline void set_tlb_flush_pending(struct mm_struct *mm)
 static inline void clear_tlb_flush_pending(struct mm_struct *mm)
 {
 	barrier();
-	mm->tlb_flush_pending = false;
+	atomic_dec(&mm->tlb_flush_pending);
 }
 #else
 static inline bool mm_tlb_flush_pending(struct mm_struct *mm)
diff --git a/kernel/fork.c b/kernel/fork.c
index e53770d2bf95..5a7ecfbb7420 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -809,7 +809,7 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p,
 	mm_init_aio(mm);
 	mm_init_owner(mm, p);
 	mmu_notifier_mm_init(mm);
-	clear_tlb_flush_pending(mm);
+	atomic_set(&mm->tlb_flush_pending, 0);
 #if defined(CONFIG_TRANSPARENT_HUGEPAGE) && !USE_SPLIT_PMD_PTLOCKS
 	mm->pmd_huge_pte = NULL;
 #endif
diff --git a/mm/debug.c b/mm/debug.c
index db1cd26d8752..d70103bb4731 100644
--- a/mm/debug.c
+++ b/mm/debug.c
@@ -159,7 +159,7 @@ void dump_mm(const struct mm_struct *mm)
 		mm->numa_next_scan, mm->numa_scan_offset, mm->numa_scan_seq,
 #endif
 #if defined(CONFIG_NUMA_BALANCING) || defined(CONFIG_COMPACTION)
-		mm->tlb_flush_pending,
+		atomic_read(&mm->tlb_flush_pending),
 #endif
 		mm->def_flags, &mm->def_flags
 	);
-- 
2.11.0


* Re: [PATCH] mm: Prevent racy access to tlb_flush_pending
  2017-07-17 18:02 [PATCH] mm: Prevent racy access to tlb_flush_pending Nadav Amit
@ 2017-07-18  1:31 ` Andy Lutomirski
  2017-07-18  1:40   ` Nadav Amit
  2017-07-24 19:50 ` Nadav Amit
  2017-07-24 23:54 ` Andrew Morton
  2 siblings, 1 reply; 10+ messages in thread
From: Andy Lutomirski @ 2017-07-18  1:31 UTC (permalink / raw)
  To: Nadav Amit
  Cc: linux-mm, Nadav Amit, Mel Gorman, Rik van Riel, Andrew Lutomirski

On Mon, Jul 17, 2017 at 11:02 AM, Nadav Amit <namit@vmware.com> wrote:
> Setting and clearing mm->tlb_flush_pending can be performed by multiple
> threads, since mmap_sem may only be acquired for read in task_numa_work.
> If this happens, tlb_flush_pending may be cleared while one of the
> threads is still changing PTEs and batching TLB flushes.
>
> As a result, TLB flushes can be skipped because the indication of
> pending TLB flushes is lost, for instance due to a race between
> migration and change_protection_range() (just as in the scenario that
> caused the introduction of tlb_flush_pending).
>
> The feasibility of such a scenario was confirmed by adding an assertion
> that tlb_flush_pending is not set by two threads at once, adding
> artificial latency in change_protection_range() and using sysctl to
> reduce kernel.numa_balancing_scan_delay_ms.

This thing is logically a refcount.  Should it be refcount_t?

--Andy


* Re: [PATCH] mm: Prevent racy access to tlb_flush_pending
  2017-07-18  1:31 ` Andy Lutomirski
@ 2017-07-18  1:40   ` Nadav Amit
  2017-07-18  4:52     ` Andy Lutomirski
  0 siblings, 1 reply; 10+ messages in thread
From: Nadav Amit @ 2017-07-18  1:40 UTC (permalink / raw)
  To: Andy Lutomirski; +Cc: Nadav Amit, linux-mm, Mel Gorman, Rik van Riel

Andy Lutomirski <luto@kernel.org> wrote:

> On Mon, Jul 17, 2017 at 11:02 AM, Nadav Amit <namit@vmware.com> wrote:
>> ...
> 
> This thing is logically a refcount.  Should it be refcount_t?

I don’t think so. refcount_inc() would WARN_ONCE() if the counter were zero
before the increment, but going from zero to one is a perfectly valid
transition here.
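
E.g. (illustrative fragment, not from the patch; 0 -> 1 is simply the
first set_tlb_flush_pending()):

	atomic_t pending = ATOMIC_INIT(0);
	atomic_inc(&pending);		/* fine: 0 -> 1 */

	refcount_t ref = REFCOUNT_INIT(0);
	refcount_inc(&ref);		/* WARNs: an increment from zero
					 * looks like a use-after-free */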


* Re: [PATCH] mm: Prevent racy access to tlb_flush_pending
  2017-07-18  1:40   ` Nadav Amit
@ 2017-07-18  4:52     ` Andy Lutomirski
  2017-07-18  5:11       ` Nadav Amit
  0 siblings, 1 reply; 10+ messages in thread
From: Andy Lutomirski @ 2017-07-18  4:52 UTC (permalink / raw)
  To: Nadav Amit
  Cc: Andy Lutomirski, Nadav Amit, linux-mm, Mel Gorman, Rik van Riel

On Mon, Jul 17, 2017 at 6:40 PM, Nadav Amit <nadav.amit@gmail.com> wrote:
> Andy Lutomirski <luto@kernel.org> wrote:
>
>> On Mon, Jul 17, 2017 at 11:02 AM, Nadav Amit <namit@vmware.com> wrote:
>>> ...
>>
>> This thing is logically a refcount.  Should it be refcount_t?
>
> I don’t think so. refcount_inc() would WARN_ONCE() if the counter were zero
> before the increment, but going from zero to one is a perfectly valid
> transition here.
>

Hmm.  Maybe a refcount that starts at 1?  My point is that, if someone
could force it to overflow, it would be bad.  Maybe this isn't worth
worrying about.


* Re: [PATCH] mm: Prevent racy access to tlb_flush_pending
  2017-07-18  4:52     ` Andy Lutomirski
@ 2017-07-18  5:11       ` Nadav Amit
  0 siblings, 0 replies; 10+ messages in thread
From: Nadav Amit @ 2017-07-18  5:11 UTC (permalink / raw)
  To: Andy Lutomirski; +Cc: Nadav Amit, linux-mm, Mel Gorman, Rik van Riel

Andy Lutomirski <luto@kernel.org> wrote:

> On Mon, Jul 17, 2017 at 6:40 PM, Nadav Amit <nadav.amit@gmail.com> wrote:
>> Andy Lutomirski <luto@kernel.org> wrote:
>> 
>>> On Mon, Jul 17, 2017 at 11:02 AM, Nadav Amit <namit@vmware.com> wrote:
>>>> ...
>>> 
>>> This thing is logically a refcount.  Should it be refcount_t?
>> 
>> I don’t think so. refcount_inc() would WARN_ONCE() if the counter were zero
>> before the increment, but going from zero to one is a perfectly valid
>> transition here.
> 
> Hmm.  Maybe a refcount that starts at 1?  My point is that, if someone
> could force it to overflow, it would be bad.  Maybe this isn't worth
> worrying about.

I don’t think it is an issue. At most you can have one task_numa_work() per
core running at any given moment, so the count is bounded by the number of
CPUs, nowhere near overflowing an atomic_t.



* Re: [PATCH] mm: Prevent racy access to tlb_flush_pending
  2017-07-17 18:02 [PATCH] mm: Prevent racy access to tlb_flush_pending Nadav Amit
  2017-07-18  1:31 ` Andy Lutomirski
@ 2017-07-24 19:50 ` Nadav Amit
  2017-07-24 23:54 ` Andrew Morton
  2 siblings, 0 replies; 10+ messages in thread
From: Nadav Amit @ 2017-07-24 19:50 UTC (permalink / raw)
  To: Andrew Morton
  Cc: open list:MEMORY MANAGEMENT, Mel Gorman, Rik van Riel,
	Andy Lutomirski, Nadav Amit

Nadav Amit <namit@vmware.com> wrote:

> Setting and clearing mm->tlb_flush_pending can be performed by multiple
> threads, since mmap_sem may only be acquired for read in task_numa_work.
> If this happens, tlb_flush_pending may be cleared while one of the
> threads is still changing PTEs and batching TLB flushes.
>
> As a result, TLB flushes can be skipped because the indication of
> pending TLB flushes is lost, for instance due to a race between
> migration and change_protection_range() (just as in the scenario that
> caused the introduction of tlb_flush_pending).
>
> The feasibility of such a scenario was confirmed by adding an assertion
> that tlb_flush_pending is not set by two threads at once, adding
> artificial latency in change_protection_range() and using sysctl to
> reduce kernel.numa_balancing_scan_delay_ms.
> 
> Fixes: 20841405940e ("mm: fix TLB flush race between migration, and change_protection_range")
> Signed-off-by: Nadav Amit <namit@vmware.com>
> 
> ...

Andrew, are there any reservations regarding this patch (excluding
Andy’s, which I think I addressed)?

Thanks,
Nadav

* Re: [PATCH] mm: Prevent racy access to tlb_flush_pending
  2017-07-17 18:02 [PATCH] mm: Prevent racy access to tlb_flush_pending Nadav Amit
  2017-07-18  1:31 ` Andy Lutomirski
  2017-07-24 19:50 ` Nadav Amit
@ 2017-07-24 23:54 ` Andrew Morton
  2017-07-25  0:27   ` Nadav Amit
  2 siblings, 1 reply; 10+ messages in thread
From: Andrew Morton @ 2017-07-24 23:54 UTC (permalink / raw)
  To: Nadav Amit; +Cc: linux-mm, nadav.amit, mgorman, riel, luto

On Mon, 17 Jul 2017 11:02:46 -0700 Nadav Amit <namit@vmware.com> wrote:

> Setting and clearing mm->tlb_flush_pending can be performed by multiple
> threads, since mmap_sem may only be acquired for read in task_numa_work.
> If this happens, tlb_flush_pending may be cleared while one of the
> threads is still changing PTEs and batching TLB flushes.
>
> As a result, TLB flushes can be skipped because the indication of
> pending TLB flushes is lost, for instance due to a race between
> migration and change_protection_range() (just as in the scenario that
> caused the introduction of tlb_flush_pending).
>
> The feasibility of such a scenario was confirmed by adding an assertion
> that tlb_flush_pending is not set by two threads at once, adding
> artificial latency in change_protection_range() and using sysctl to
> reduce kernel.numa_balancing_scan_delay_ms.
> 
> Fixes: 20841405940e ("mm: fix TLB flush race between migration, and change_protection_range")
> 

The changelog doesn't describe the user-visible effects of the bug (it
should always do so, please).  But it is presumably a data-corruption
bug so I suggest that a -stable backport is warranted?

It has been there for 4 years so I'm thinking we can hold off a
mainline (and hence -stable) merge until 4.13-rc1, yes?


One thought:

> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
>
> ...
>
> @@ -528,11 +528,11 @@ static inline cpumask_t *mm_cpumask(struct mm_struct *mm)
>  static inline bool mm_tlb_flush_pending(struct mm_struct *mm)
>  {
>  	barrier();
> -	return mm->tlb_flush_pending;
> +	return atomic_read(&mm->tlb_flush_pending) > 0;
>  }
>  static inline void set_tlb_flush_pending(struct mm_struct *mm)
>  {
> -	mm->tlb_flush_pending = true;
> +	atomic_inc(&mm->tlb_flush_pending);
>  
>  	/*
>  	 * Guarantee that the tlb_flush_pending store does not leak into the
> @@ -544,7 +544,7 @@ static inline void set_tlb_flush_pending(struct mm_struct *mm)
>  static inline void clear_tlb_flush_pending(struct mm_struct *mm)
>  {
>  	barrier();
> -	mm->tlb_flush_pending = false;
> +	atomic_dec(&mm->tlb_flush_pending);
>  }
>  #else

Do we still need the barrier()s or is it OK to let the atomic op do
that for us (with a suitable code comment).
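
E.g., a rough sketch (untested):

	static inline void set_tlb_flush_pending(struct mm_struct *mm)
	{
		atomic_inc(&mm->tlb_flush_pending);
		/*
		 * atomic_inc() does not imply a memory barrier on all
		 * architectures, so an explicit smp_mb__after_atomic()
		 * would still be needed to order the increment before
		 * the PTE updates that follow.
		 */
		smp_mb__after_atomic();
	}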



* Re: [PATCH] mm: Prevent racy access to tlb_flush_pending
  2017-07-24 23:54 ` Andrew Morton
@ 2017-07-25  0:27   ` Nadav Amit
  2017-07-25  0:33     ` Nadav Amit
  2017-07-25  9:49     ` Mel Gorman
  0 siblings, 2 replies; 10+ messages in thread
From: Nadav Amit @ 2017-07-25  0:27 UTC (permalink / raw)
  To: Andrew Morton, Mel Gorman
  Cc: Nadav Amit, open list:MEMORY MANAGEMENT, Rik van Riel, Andy Lutomirski

Andrew Morton <akpm@linux-foundation.org> wrote:

> On Mon, 17 Jul 2017 11:02:46 -0700 Nadav Amit <namit@vmware.com> wrote:
> 
>> ...
> 
> The changelog doesn't describe the user-visible effects of the bug (it
> should always do so, please).  But it is presumably a data-corruption
> bug so I suggest that a -stable backport is warranted?

Yes, although I did not encounter an actual memory corruption.

> 
> It has been there for 4 years so I'm thinking we can hold off a
> mainline (and hence -stable) merge until 4.13-rc1, yes?
> 
> 
> One thought:
> 
>> --- a/include/linux/mm_types.h
>> +++ b/include/linux/mm_types.h
>> 
>> ...
>> 
>> @@ -528,11 +528,11 @@ static inline cpumask_t *mm_cpumask(struct mm_struct *mm)
>> static inline bool mm_tlb_flush_pending(struct mm_struct *mm)
>> {
>> 	barrier();
>> -	return mm->tlb_flush_pending;
>> +	return atomic_read(&mm->tlb_flush_pending) > 0;
>> }
>> static inline void set_tlb_flush_pending(struct mm_struct *mm)
>> {
>> -	mm->tlb_flush_pending = true;
>> +	atomic_inc(&mm->tlb_flush_pending);
>> 
>> 	/*
>> 	 * Guarantee that the tlb_flush_pending store does not leak into the
>> @@ -544,7 +544,7 @@ static inline void set_tlb_flush_pending(struct mm_struct *mm)
>> static inline void clear_tlb_flush_pending(struct mm_struct *mm)
>> {
>> 	barrier();
>> -	mm->tlb_flush_pending = false;
>> +	atomic_dec(&mm->tlb_flush_pending);
>> }
>> #else
> 
> Do we still need the barrier()s or is it OK to let the atomic op do
> that for us (with a suitable code comment).

I will submit v2. However, I really don’t understand the comment on
mm_tlb_flush_pending():

/*              
 * Memory barriers to keep this state in sync are graciously provided by
 * the page table locks, outside of which no page table modifications happen.
 * The barriers below prevent the compiler from re-ordering the instructions
 * around the memory barriers that are already present in the code.
 */

But IIUC migrate_misplaced_transhuge_page() does not call
mm_tlb_flush_pending() while the ptl is taken.

Mel, can I bother you again? Should I move the flush in
migrate_misplaced_transhuge_page() till after the ptl is taken?

Thanks,
Nadav


* Re: [PATCH] mm: Prevent racy access to tlb_flush_pending
  2017-07-25  0:27   ` Nadav Amit
@ 2017-07-25  0:33     ` Nadav Amit
  2017-07-25  9:49     ` Mel Gorman
  1 sibling, 0 replies; 10+ messages in thread
From: Nadav Amit @ 2017-07-25  0:33 UTC (permalink / raw)
  To: Andrew Morton, Mel Gorman
  Cc: Nadav Amit, open list:MEMORY MANAGEMENT, Rik van Riel, Andy Lutomirski

Nadav Amit <nadav.amit@gmail.com> wrote:

> Andrew Morton <akpm@linux-foundation.org> wrote:
> 
>> ...
>> 
>> Do we still need the barrier()s or is it OK to let the atomic op do
>> that for us (with a suitable code comment).
> 
> I will submit v2. However, I really don’t understand the comment on
> mm_tlb_flush_pending():
> 
> /*              
> * Memory barriers to keep this state in sync are graciously provided by
> * the page table locks, outside of which no page table modifications happen.
> * The barriers below prevent the compiler from re-ordering the instructions
> * around the memory barriers that are already present in the code.
> */
> 
> But IIUC migrate_misplaced_transhuge_page() does not call
> mm_tlb_flush_pending() while the ptl is taken.
> 
> Mel, can I bother you again? Should I move the flush in
> migrate_misplaced_transhuge_page() till after the ptl is taken?

Oops: this would obviously be wrong, since it would move the flush to after
migrate_page_copy() has run. So I do need your advice on whether it is the
comment or the implementation that is wrong.


* Re: [PATCH] mm: Prevent racy access to tlb_flush_pending
  2017-07-25  0:27   ` Nadav Amit
  2017-07-25  0:33     ` Nadav Amit
@ 2017-07-25  9:49     ` Mel Gorman
  1 sibling, 0 replies; 10+ messages in thread
From: Mel Gorman @ 2017-07-25  9:49 UTC (permalink / raw)
  To: Nadav Amit
  Cc: Andrew Morton, Nadav Amit, open list:MEMORY MANAGEMENT,
	Rik van Riel, Andy Lutomirski

On Mon, Jul 24, 2017 at 05:27:47PM -0700, Nadav Amit wrote:
> > Do we still need the barrier()s or is it OK to let the atomic op do
> > that for us (with a suitable code comment).
> 
> I will submit v2. However, I really don't understand the comment on
> mm_tlb_flush_pending():
> 
> /*              
>  * Memory barriers to keep this state in sync are graciously provided by
>  * the page table locks, outside of which no page table modifications happen.
>  * The barriers below prevent the compiler from re-ordering the instructions
>  * around the memory barriers that are already present in the code.
>  */
> 
> But IIUC migrate_misplaced_transhuge_page() does not call
> mm_tlb_flush_pending() while the ptl is taken.
> 
> Mel, can I bother you again? Should I move the flush in
> migrate_misplaced_transhuge_page() till after the ptl is taken?
> 

The flush, if it's necessary, needs to happen before the copy. However,
in this particular context it shouldn't matter: we must be dealing with
a NUMA hinting fault, which means the original PTE is PROT_NONE, already
flushed, and no writes are possible. If a protection update happens
during the copy, it'll be detected by the pmd_same check in
migrate_misplaced_transhuge_page and the page copy was a waste of time
but otherwise harmless.
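
I.e., roughly the existing check (a paraphrase from memory, details
may differ from the actual code):

	/* migrate_misplaced_transhuge_page(), after copying the page: */
	ptl = pmd_lock(mm, pmd);
	if (unlikely(!pmd_same(*pmd, entry))) {
		/*
		 * The PMD changed under us, e.g. due to a concurrent
		 * protection update; abort the migration and discard
		 * the copy.
		 */
		spin_unlock(ptl);
		goto out_fail;
	}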

-- 
Mel Gorman
SUSE Labs

