* [PATCH] mm: Prevent racy access to tlb_flush_pending
From: Nadav Amit @ 2017-07-17 18:02 UTC
To: linux-mm; +Cc: nadav.amit, mgorman, riel, luto, Nadav Amit
Setting and clearing mm->tlb_flush_pending can be performed by multiple
threads, since mmap_sem may only be acquired for read in task_numa_work.
If this happens, tlb_flush_pending may be cleared while one of the
threads is still changing PTEs and batching TLB flushes.
As a result, TLB flushes can be skipped because the indication of
pending TLB flushes is lost, for instance due to a race between
migration and change_protection_range() (just as in the scenario that
caused the introduction of tlb_flush_pending).
The feasibility of such a scenario was confirmed by adding an assertion
to check that tlb_flush_pending is not set by two threads, adding
artificial latency in change_protection_range() and using sysctl to
reduce kernel.numa_balancing_scan_delay_ms.
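For illustration, a minimal userspace C11 analogue of the race (not
kernel code; the thread bodies and sleep-based interleaving are
contrived to force the schedule described above):

#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

static bool flush_pending;	/* analogue of the old bool field */

static void *pte_changer(void *usecs)
{
	flush_pending = true;			/* set_tlb_flush_pending()    */
	usleep((useconds_t)(long)usecs);	/* change PTEs, batch flushes */
	flush_pending = false;			/* clear_tlb_flush_pending()  */
	return NULL;
}

int main(void)
{
	pthread_t slow, fast;

	/* Two overlapping setters: the fast one clears the flag while
	 * the slow one is still mid-update... */
	pthread_create(&slow, NULL, pte_changer, (void *)200000L);
	pthread_create(&fast, NULL, pte_changer, (void *)1000L);
	usleep(100000);

	/* ...so a reader here (think mm_tlb_flush_pending()) sees
	 * "false" although slow's flushes are still pending, and a
	 * needed TLB flush would be skipped. */
	printf("pending seen as: %d\n", flush_pending);

	pthread_join(slow, NULL);
	pthread_join(fast, NULL);
	return 0;
}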
Fixes: 20841405940e ("mm: fix TLB flush race between migration, and change_protection_range")
Signed-off-by: Nadav Amit <namit@vmware.com>
---
include/linux/mm_types.h | 8 ++++----
kernel/fork.c | 2 +-
mm/debug.c | 2 +-
3 files changed, 6 insertions(+), 6 deletions(-)
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 45cdb27791a3..36f4ec589544 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -493,7 +493,7 @@ struct mm_struct {
* can move process memory needs to flush the TLB when moving a
* PROT_NONE or PROT_NUMA mapped page.
*/
- bool tlb_flush_pending;
+ atomic_t tlb_flush_pending;
#endif
struct uprobes_state uprobes_state;
#ifdef CONFIG_HUGETLB_PAGE
@@ -528,11 +528,11 @@ static inline cpumask_t *mm_cpumask(struct mm_struct *mm)
static inline bool mm_tlb_flush_pending(struct mm_struct *mm)
{
barrier();
- return mm->tlb_flush_pending;
+ return atomic_read(&mm->tlb_flush_pending) > 0;
}
static inline void set_tlb_flush_pending(struct mm_struct *mm)
{
- mm->tlb_flush_pending = true;
+ atomic_inc(&mm->tlb_flush_pending);
/*
* Guarantee that the tlb_flush_pending store does not leak into the
@@ -544,7 +544,7 @@ static inline void set_tlb_flush_pending(struct mm_struct *mm)
static inline void clear_tlb_flush_pending(struct mm_struct *mm)
{
barrier();
- mm->tlb_flush_pending = false;
+ atomic_dec(&mm->tlb_flush_pending);
}
#else
static inline bool mm_tlb_flush_pending(struct mm_struct *mm)
diff --git a/kernel/fork.c b/kernel/fork.c
index e53770d2bf95..5a7ecfbb7420 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -809,7 +809,7 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p,
mm_init_aio(mm);
mm_init_owner(mm, p);
mmu_notifier_mm_init(mm);
- clear_tlb_flush_pending(mm);
+ atomic_set(&mm->tlb_flush_pending, 0);
#if defined(CONFIG_TRANSPARENT_HUGEPAGE) && !USE_SPLIT_PMD_PTLOCKS
mm->pmd_huge_pte = NULL;
#endif
diff --git a/mm/debug.c b/mm/debug.c
index db1cd26d8752..d70103bb4731 100644
--- a/mm/debug.c
+++ b/mm/debug.c
@@ -159,7 +159,7 @@ void dump_mm(const struct mm_struct *mm)
mm->numa_next_scan, mm->numa_scan_offset, mm->numa_scan_seq,
#endif
#if defined(CONFIG_NUMA_BALANCING) || defined(CONFIG_COMPACTION)
- mm->tlb_flush_pending,
+ atomic_read(&mm->tlb_flush_pending),
#endif
mm->def_flags, &mm->def_flags
);
--
2.11.0
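With the counter, overlapping set/clear pairs keep the pending
indication visible until the last clearer finishes. A small C11 sketch
of that invariant (again a userspace analogue, not the kernel helpers):

#include <assert.h>
#include <stdatomic.h>

static atomic_int flush_pending;	/* analogue of the atomic_t field */

int main(void)
{
	atomic_fetch_add(&flush_pending, 1);	/* thread A: set   */
	atomic_fetch_add(&flush_pending, 1);	/* thread B: set   */
	atomic_fetch_sub(&flush_pending, 1);	/* thread B: clear */

	/* Unlike the bool, A's pending flushes remain visible. */
	assert(atomic_load(&flush_pending) > 0);

	atomic_fetch_sub(&flush_pending, 1);	/* thread A: clear */
	assert(atomic_load(&flush_pending) == 0);
	return 0;
}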
* Re: [PATCH] mm: Prevent racy access to tlb_flush_pending
From: Andy Lutomirski @ 2017-07-18 1:31 UTC
To: Nadav Amit
Cc: linux-mm, Nadav Amit, Mel Gorman, Rik van Riel, Andrew Lutomirski
On Mon, Jul 17, 2017 at 11:02 AM, Nadav Amit <namit@vmware.com> wrote:
> Setting and clearing mm->tlb_flush_pending can be performed by multiple
> threads, since mmap_sem may only be acquired for read in task_numa_work.
> If this happens, tlb_flush_pending may be cleared while one of the
> threads still changes PTEs and batches TLB flushes.
>
> As a result, TLB flushes can be skipped because the indication of
> pending TLB flushes is lost, for instance due to race between
> migration and change_protection_range (just as in the scenario that
> caused the introduction of tlb_flush_pending).
>
> The feasibility of such a scenario was confirmed by adding assertion to
> check tlb_flush_pending is not set by two threads, adding artificial
> latency in change_protection_range() and using sysctl to reduce
> kernel.numa_balancing_scan_delay_ms.
This thing is logically a refcount. Should it be refcount_t?
--Andy
* Re: [PATCH] mm: Prevent racy access to tlb_flush_pending
From: Nadav Amit @ 2017-07-18 1:40 UTC
To: Andy Lutomirski; +Cc: Nadav Amit, linux-mm, Mel Gorman, Rik van Riel
Andy Lutomirski <luto@kernel.org> wrote:
> On Mon, Jul 17, 2017 at 11:02 AM, Nadav Amit <namit@vmware.com> wrote:
>> Setting and clearing mm->tlb_flush_pending can be performed by multiple
>> threads, since mmap_sem may only be acquired for read in task_numa_work.
>> If this happens, tlb_flush_pending may be cleared while one of the
>> threads still changes PTEs and batches TLB flushes.
>>
>> As a result, TLB flushes can be skipped because the indication of
>> pending TLB flushes is lost, for instance due to race between
>> migration and change_protection_range (just as in the scenario that
>> caused the introduction of tlb_flush_pending).
>>
>> The feasibility of such a scenario was confirmed by adding assertion to
>> check tlb_flush_pending is not set by two threads, adding artificial
>> latency in change_protection_range() and using sysctl to reduce
>> kernel.numa_balancing_scan_delay_ms.
>
> This thing is logically a refcount. Should it be refcount_t?
I don’t think so. refcount_inc() would WARN_ONCE() if the counter is
zero before the increment, even though that is a valid scenario here.
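A rough userspace model of that zero check (the real refcount_inc()
also saturates near overflow; this only illustrates why the 0 -> 1
transition, which is tlb_flush_pending's normal starting case, would
trip the warning):

#include <stdatomic.h>
#include <stdio.h>

static void refcount_inc_model(atomic_int *r)
{
	/* The kernel's refcount_inc() treats an increment from zero
	 * as a likely use-after-free and WARNs once. */
	if (atomic_fetch_add(r, 1) == 0)
		fprintf(stderr, "WARN: refcount_inc from zero\n");
}

int main(void)
{
	atomic_int pending = 0;

	/* The first setter: perfectly legitimate for
	 * tlb_flush_pending, but a "bug" by refcount semantics. */
	refcount_inc_model(&pending);
	return 0;
}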
* Re: [PATCH] mm: Prevent racy access to tlb_flush_pending
From: Andy Lutomirski @ 2017-07-18 4:52 UTC
To: Nadav Amit
Cc: Andy Lutomirski, Nadav Amit, linux-mm, Mel Gorman, Rik van Riel
On Mon, Jul 17, 2017 at 6:40 PM, Nadav Amit <nadav.amit@gmail.com> wrote:
> Andy Lutomirski <luto@kernel.org> wrote:
>
>> On Mon, Jul 17, 2017 at 11:02 AM, Nadav Amit <namit@vmware.com> wrote:
>>> Setting and clearing mm->tlb_flush_pending can be performed by multiple
>>> threads, since mmap_sem may only be acquired for read in task_numa_work.
>>> If this happens, tlb_flush_pending may be cleared while one of the
>>> threads still changes PTEs and batches TLB flushes.
>>>
>>> As a result, TLB flushes can be skipped because the indication of
>>> pending TLB flushes is lost, for instance due to race between
>>> migration and change_protection_range (just as in the scenario that
>>> caused the introduction of tlb_flush_pending).
>>>
>>> The feasibility of such a scenario was confirmed by adding assertion to
>>> check tlb_flush_pending is not set by two threads, adding artificial
>>> latency in change_protection_range() and using sysctl to reduce
>>> kernel.numa_balancing_scan_delay_ms.
>>
>> This thing is logically a refcount. Should it be refcount_t?
>
> I don’t think so. refcount_inc() would WARN_ONCE if the counter is zero
> before the increase, although this is a valid scenario here.
>
Hmm. Maybe a refcount that starts at 1? My point is that, if someone
could force it to overflow, it would be bad. Maybe this isn't worth
worrying about.
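A sketch of that idea, biasing the counter by one so it never
legitimately reaches zero and refcount_t's saturation/overflow checks
become usable (hypothetical; this variant was not adopted):

	/* At mm_init(): start at 1 instead of 0. */
	refcount_set(&mm->tlb_flush_pending, 1);

	/* set_tlb_flush_pending() / clear_tlb_flush_pending(): */
	refcount_inc(&mm->tlb_flush_pending);
	refcount_dec(&mm->tlb_flush_pending);

	/* mm_tlb_flush_pending(): pending means above the bias. */
	return refcount_read(&mm->tlb_flush_pending) > 1;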
* Re: [PATCH] mm: Prevent racy access to tlb_flush_pending
From: Nadav Amit @ 2017-07-18 5:11 UTC
To: Andy Lutomirski; +Cc: Nadav Amit, linux-mm, Mel Gorman, Rik van Riel
Andy Lutomirski <luto@kernel.org> wrote:
> On Mon, Jul 17, 2017 at 6:40 PM, Nadav Amit <nadav.amit@gmail.com> wrote:
>> Andy Lutomirski <luto@kernel.org> wrote:
>>
>>> On Mon, Jul 17, 2017 at 11:02 AM, Nadav Amit <namit@vmware.com> wrote:
>>>> Setting and clearing mm->tlb_flush_pending can be performed by multiple
>>>> threads, since mmap_sem may only be acquired for read in task_numa_work.
>>>> If this happens, tlb_flush_pending may be cleared while one of the
>>>> threads still changes PTEs and batches TLB flushes.
>>>>
>>>> As a result, TLB flushes can be skipped because the indication of
>>>> pending TLB flushes is lost, for instance due to race between
>>>> migration and change_protection_range (just as in the scenario that
>>>> caused the introduction of tlb_flush_pending).
>>>>
>>>> The feasibility of such a scenario was confirmed by adding assertion to
>>>> check tlb_flush_pending is not set by two threads, adding artificial
>>>> latency in change_protection_range() and using sysctl to reduce
>>>> kernel.numa_balancing_scan_delay_ms.
>>>
>>> This thing is logically a refcount. Should it be refcount_t?
>>
>> I don’t think so. refcount_inc() would WARN_ONCE if the counter is zero
>> before the increase, although this is a valid scenario here.
>
> Hmm. Maybe a refcount that starts at 1? My point is that, if someone
> could force it to overflow, it would be bad. Maybe this isn't worth
> worrying about.
I don’t think it is an issue. At most one task_numa_work() per core
can be running at any given moment.
* Re: [PATCH] mm: Prevent racy access to tlb_flush_pending
From: Nadav Amit @ 2017-07-24 19:50 UTC
To: Andrew Morton
Cc: open list:MEMORY MANAGEMENT, Mel Gorman, Rik van Riel,
Andy Lutomirski, Nadav Amit
Nadav Amit <namit@vmware.com> wrote:
> Setting and clearing mm->tlb_flush_pending can be performed by multiple
> threads, since mmap_sem may only be acquired for read in task_numa_work.
> If this happens, tlb_flush_pending may be cleared while one of the
> threads still changes PTEs and batches TLB flushes.
>
> As a result, TLB flushes can be skipped because the indication of
> pending TLB flushes is lost, for instance due to race between
> migration and change_protection_range (just as in the scenario that
> caused the introduction of tlb_flush_pending).
>
> The feasibility of such a scenario was confirmed by adding assertion to
> check tlb_flush_pending is not set by two threads, adding artificial
> latency in change_protection_range() and using sysctl to reduce
> kernel.numa_balancing_scan_delay_ms.
>
> Fixes: 20841405940e ("mm: fix TLB flush race between migration, and
> change_protection_range")
>
> Signed-off-by: Nadav Amit <namit@vmware.com>
> ---
> include/linux/mm_types.h | 8 ++++----
> kernel/fork.c | 2 +-
> mm/debug.c | 2 +-
> 3 files changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 45cdb27791a3..36f4ec589544 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -493,7 +493,7 @@ struct mm_struct {
> * can move process memory needs to flush the TLB when moving a
> * PROT_NONE or PROT_NUMA mapped page.
> */
> - bool tlb_flush_pending;
> + atomic_t tlb_flush_pending;
> #endif
> struct uprobes_state uprobes_state;
> #ifdef CONFIG_HUGETLB_PAGE
> @@ -528,11 +528,11 @@ static inline cpumask_t *mm_cpumask(struct mm_struct *mm)
> static inline bool mm_tlb_flush_pending(struct mm_struct *mm)
> {
> barrier();
> - return mm->tlb_flush_pending;
> + return atomic_read(&mm->tlb_flush_pending) > 0;
> }
> static inline void set_tlb_flush_pending(struct mm_struct *mm)
> {
> - mm->tlb_flush_pending = true;
> + atomic_inc(&mm->tlb_flush_pending);
>
> /*
> * Guarantee that the tlb_flush_pending store does not leak into the
> @@ -544,7 +544,7 @@ static inline void set_tlb_flush_pending(struct mm_struct *mm)
> static inline void clear_tlb_flush_pending(struct mm_struct *mm)
> {
> barrier();
> - mm->tlb_flush_pending = false;
> + atomic_dec(&mm->tlb_flush_pending);
> }
> #else
> static inline bool mm_tlb_flush_pending(struct mm_struct *mm)
> diff --git a/kernel/fork.c b/kernel/fork.c
> index e53770d2bf95..5a7ecfbb7420 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -809,7 +809,7 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p,
> mm_init_aio(mm);
> mm_init_owner(mm, p);
> mmu_notifier_mm_init(mm);
> - clear_tlb_flush_pending(mm);
> + atomic_set(&mm->tlb_flush_pending, 0);
> #if defined(CONFIG_TRANSPARENT_HUGEPAGE) && !USE_SPLIT_PMD_PTLOCKS
> mm->pmd_huge_pte = NULL;
> #endif
> diff --git a/mm/debug.c b/mm/debug.c
> index db1cd26d8752..d70103bb4731 100644
> --- a/mm/debug.c
> +++ b/mm/debug.c
> @@ -159,7 +159,7 @@ void dump_mm(const struct mm_struct *mm)
> mm->numa_next_scan, mm->numa_scan_offset, mm->numa_scan_seq,
> #endif
> #if defined(CONFIG_NUMA_BALANCING) || defined(CONFIG_COMPACTION)
> - mm->tlb_flush_pending,
> + atomic_read(&mm->tlb_flush_pending),
> #endif
> mm->def_flags, &mm->def_flags
> );
> --
> 2.11.0
Andrew, are there any reservations regarding this patch (excluding
Andy’s, which I think I addressed)?
Thanks,
Nadav
* Re: [PATCH] mm: Prevent racy access to tlb_flush_pending
From: Andrew Morton @ 2017-07-24 23:54 UTC
To: Nadav Amit; +Cc: linux-mm, nadav.amit, mgorman, riel, luto
On Mon, 17 Jul 2017 11:02:46 -0700 Nadav Amit <namit@vmware.com> wrote:
> Setting and clearing mm->tlb_flush_pending can be performed by multiple
> threads, since mmap_sem may only be acquired for read in task_numa_work.
> If this happens, tlb_flush_pending may be cleared while one of the
> threads still changes PTEs and batches TLB flushes.
>
> As a result, TLB flushes can be skipped because the indication of
> pending TLB flushes is lost, for instance due to race between
> migration and change_protection_range (just as in the scenario that
> caused the introduction of tlb_flush_pending).
>
> The feasibility of such a scenario was confirmed by adding assertion to
> check tlb_flush_pending is not set by two threads, adding artificial
> latency in change_protection_range() and using sysctl to reduce
> kernel.numa_balancing_scan_delay_ms.
>
> Fixes: 20841405940e ("mm: fix TLB flush race between migration, and
> change_protection_range")
>
The changelog doesn't describe the user-visible effects of the bug (it
should always do so, please). But it is presumably a data-corruption
bug so I suggest that a -stable backport is warranted?
It has been there for 4 years so I'm thinking we can hold off a
mainline (and hence -stable) merge until 4.13-rc1, yes?
One thought:
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
>
> ...
>
> @@ -528,11 +528,11 @@ static inline cpumask_t *mm_cpumask(struct mm_struct *mm)
> static inline bool mm_tlb_flush_pending(struct mm_struct *mm)
> {
> barrier();
> - return mm->tlb_flush_pending;
> + return atomic_read(&mm->tlb_flush_pending) > 0;
> }
> static inline void set_tlb_flush_pending(struct mm_struct *mm)
> {
> - mm->tlb_flush_pending = true;
> + atomic_inc(&mm->tlb_flush_pending);
>
> /*
> * Guarantee that the tlb_flush_pending store does not leak into the
> @@ -544,7 +544,7 @@ static inline void set_tlb_flush_pending(struct mm_struct *mm)
> static inline void clear_tlb_flush_pending(struct mm_struct *mm)
> {
> barrier();
> - mm->tlb_flush_pending = false;
> + atomic_dec(&mm->tlb_flush_pending);
> }
> #else
Do we still need the barrier()s, or is it OK to let the atomic op do
that for us (with a suitable code comment)?
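For reference, atomic_inc() and atomic_dec() are non-value-returning
atomics and imply no memory barriers in the kernel memory model, so
the ordering would have to be made explicit rather than dropped. One
possible shape (a sketch, not the final patch):

static inline void set_tlb_flush_pending(struct mm_struct *mm)
{
	atomic_inc(&mm->tlb_flush_pending);
	/* Order the increment before the PTE updates that follow;
	 * atomic_inc() alone provides no such guarantee. */
	smp_mb__after_atomic();
}

static inline void clear_tlb_flush_pending(struct mm_struct *mm)
{
	/* Order the preceding TLB flush before the decrement. */
	smp_mb__before_atomic();
	atomic_dec(&mm->tlb_flush_pending);
}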
* Re: [PATCH] mm: Prevent racy access to tlb_flush_pending
From: Nadav Amit @ 2017-07-25 0:27 UTC
To: Andrew Morton, Mel Gorman
Cc: Nadav Amit, open list:MEMORY MANAGEMENT, Rik van Riel, Andy Lutomirski
Andrew Morton <akpm@linux-foundation.org> wrote:
> On Mon, 17 Jul 2017 11:02:46 -0700 Nadav Amit <namit@vmware.com> wrote:
>
>> Setting and clearing mm->tlb_flush_pending can be performed by multiple
>> threads, since mmap_sem may only be acquired for read in task_numa_work.
>> If this happens, tlb_flush_pending may be cleared while one of the
>> threads still changes PTEs and batches TLB flushes.
>>
>> As a result, TLB flushes can be skipped because the indication of
>> pending TLB flushes is lost, for instance due to race between
>> migration and change_protection_range (just as in the scenario that
>> caused the introduction of tlb_flush_pending).
>>
>> The feasibility of such a scenario was confirmed by adding assertion to
>> check tlb_flush_pending is not set by two threads, adding artificial
>> latency in change_protection_range() and using sysctl to reduce
>> kernel.numa_balancing_scan_delay_ms.
>>
>> Fixes: 20841405940e ("mm: fix TLB flush race between migration, and
>> change_protection_range")
>
> The changelog doesn't describe the user-visible effects of the bug (it
> should always do so, please). But it is presumably a data-corruption
> bug so I suggest that a -stable backport is warranted?
Yes, although I did not encounter an actual memory corruption.
>
> It has been there for 4 years so I'm thinking we can hold off a
> mainline (and hence -stable) merge until 4.13-rc1, yes?
>
>
> One thought:
>
>> --- a/include/linux/mm_types.h
>> +++ b/include/linux/mm_types.h
>>
>> ...
>>
>> @@ -528,11 +528,11 @@ static inline cpumask_t *mm_cpumask(struct mm_struct *mm)
>> static inline bool mm_tlb_flush_pending(struct mm_struct *mm)
>> {
>> barrier();
>> - return mm->tlb_flush_pending;
>> + return atomic_read(&mm->tlb_flush_pending) > 0;
>> }
>> static inline void set_tlb_flush_pending(struct mm_struct *mm)
>> {
>> - mm->tlb_flush_pending = true;
>> + atomic_inc(&mm->tlb_flush_pending);
>>
>> /*
>> * Guarantee that the tlb_flush_pending store does not leak into the
>> @@ -544,7 +544,7 @@ static inline void set_tlb_flush_pending(struct mm_struct *mm)
>> static inline void clear_tlb_flush_pending(struct mm_struct *mm)
>> {
>> barrier();
>> - mm->tlb_flush_pending = false;
>> + atomic_dec(&mm->tlb_flush_pending);
>> }
>> #else
>
> Do we still need the barrier()s or is it OK to let the atomic op do
> that for us (with a suitable code comment).
I will submit v2. However, I really don’t understand the comment on
mm_tlb_flush_pending():
/*
* Memory barriers to keep this state in sync are graciously provided by
* the page table locks, outside of which no page table modifications happen.
* The barriers below prevent the compiler from re-ordering the instructions
* around the memory barriers that are already present in the code.
*/
But IIUC migrate_misplaced_transhuge_page() does not call
mm_tlb_flush_pending() while the ptl is taken.
Mel, can I bother you again? Should I move the flush in
migrate_misplaced_transhuge_page() till after the ptl is taken?
Thanks,
Nadav
* Re: [PATCH] mm: Prevent racy access to tlb_flush_pending
From: Nadav Amit @ 2017-07-25 0:33 UTC
To: Andrew Morton, Mel Gorman
Cc: Nadav Amit, open list:MEMORY MANAGEMENT, Rik van Riel, Andy Lutomirski
Nadav Amit <nadav.amit@gmail.com> wrote:
> Andrew Morton <akpm@linux-foundation.org> wrote:
>
>> On Mon, 17 Jul 2017 11:02:46 -0700 Nadav Amit <namit@vmware.com> wrote:
>>
>>> Setting and clearing mm->tlb_flush_pending can be performed by multiple
>>> threads, since mmap_sem may only be acquired for read in task_numa_work.
>>> If this happens, tlb_flush_pending may be cleared while one of the
>>> threads still changes PTEs and batches TLB flushes.
>>>
>>> As a result, TLB flushes can be skipped because the indication of
>>> pending TLB flushes is lost, for instance due to race between
>>> migration and change_protection_range (just as in the scenario that
>>> caused the introduction of tlb_flush_pending).
>>>
>>> The feasibility of such a scenario was confirmed by adding assertion to
>>> check tlb_flush_pending is not set by two threads, adding artificial
>>> latency in change_protection_range() and using sysctl to reduce
>>> kernel.numa_balancing_scan_delay_ms.
>>>
>>> Fixes: 20841405940e ("mm: fix TLB flush race between migration, and
>>> change_protection_range")
>>
>> The changelog doesn't describe the user-visible effects of the bug (it
>> should always do so, please). But it is presumably a data-corruption
>> bug so I suggest that a -stable backport is warranted?
>
> Yes, although I did not encounter an actual memory corruption.
>
>> It has been there for 4 years so I'm thinking we can hold off a
>> mainline (and hence -stable) merge until 4.13-rc1, yes?
>>
>>
>> One thought:
>>
>>> --- a/include/linux/mm_types.h
>>> +++ b/include/linux/mm_types.h
>>>
>>> ...
>>>
>>> @@ -528,11 +528,11 @@ static inline cpumask_t *mm_cpumask(struct mm_struct *mm)
>>> static inline bool mm_tlb_flush_pending(struct mm_struct *mm)
>>> {
>>> barrier();
>>> - return mm->tlb_flush_pending;
>>> + return atomic_read(&mm->tlb_flush_pending) > 0;
>>> }
>>> static inline void set_tlb_flush_pending(struct mm_struct *mm)
>>> {
>>> - mm->tlb_flush_pending = true;
>>> + atomic_inc(&mm->tlb_flush_pending);
>>>
>>> /*
>>> * Guarantee that the tlb_flush_pending store does not leak into the
>>> @@ -544,7 +544,7 @@ static inline void set_tlb_flush_pending(struct mm_struct *mm)
>>> static inline void clear_tlb_flush_pending(struct mm_struct *mm)
>>> {
>>> barrier();
>>> - mm->tlb_flush_pending = false;
>>> + atomic_dec(&mm->tlb_flush_pending);
>>> }
>>> #else
>>
>> Do we still need the barrier()s or is it OK to let the atomic op do
>> that for us (with a suitable code comment).
>
> I will submit v2. However, I really don’t understand the comment on
> mm_tlb_flush_pending():
>
> /*
> * Memory barriers to keep this state in sync are graciously provided by
> * the page table locks, outside of which no page table modifications happen.
> * The barriers below prevent the compiler from re-ordering the instructions
> * around the memory barriers that are already present in the code.
> */
>
> But IIUC migrate_misplaced_transhuge_page() does not call
> mm_tlb_flush_pending() while the ptl is taken.
>
> Mel, can I bother you again? Should I move the flush in
> migrate_misplaced_transhuge_page() till after the ptl is taken?
Oops: this would obviously be wrong, since it would move the flush
until after migrate_page_copy() has run. So I do need your advice on
whether it is the comment or the implementation that is wrong.
* Re: [PATCH] mm: Prevent racy access to tlb_flush_pending
From: Mel Gorman @ 2017-07-25 9:49 UTC
To: Nadav Amit
Cc: Andrew Morton, Nadav Amit, open list:MEMORY MANAGEMENT,
Rik van Riel, Andy Lutomirski
On Mon, Jul 24, 2017 at 05:27:47PM -0700, Nadav Amit wrote:
> > Do we still need the barrier()s or is it OK to let the atomic op do
> > that for us (with a suitable code comment).
>
> I will submit v2. However, I really don’t understand the comment on
> mm_tlb_flush_pending():
>
> /*
> * Memory barriers to keep this state in sync are graciously provided by
> * the page table locks, outside of which no page table modifications happen.
> * The barriers below prevent the compiler from re-ordering the instructions
> * around the memory barriers that are already present in the code.
> */
>
> But IIUC migrate_misplaced_transhuge_page() does not call
> mm_tlb_flush_pending() while the ptl is taken.
>
> Mel, can I bother you again? Should I move the flush in
> migrate_misplaced_transhuge_page() till after the ptl is taken?
>
The flush, if it's necessary, needs to happen before the copy.
However, in this particular context it shouldn't matter: we must be
dealing with a NUMA hinting fault, which means the original PTE is
PROT_NONE, already flushed, and no writes are possible. If a
protection update happens during the copy, it'll be detected by the
pmd_same check in migrate_misplaced_transhuge_page() and the page copy
will have been a waste of time, but otherwise harmless.
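A condensed sketch of that recheck (paraphrased from the pattern in
migrate_misplaced_transhuge_page(); the locals are illustrative):

	ptl = pmd_lock(mm, pmd);
	if (unlikely(!pmd_same(*pmd, entry))) {
		/* The PMD changed under us, e.g. a parallel protection
		 * update raced with the copy: abandon the migration.
		 * The already-copied page is discarded -- wasted work,
		 * but harmless. */
		spin_unlock(ptl);
		/* ...undo the isolation and put_page() the new page... */
	}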
--
Mel Gorman
SUSE Labs