* [PATCH v3 0/2] mm: fixes of tlb_flush_pending races @ 2017-07-27 11:40 Nadav Amit
  2017-07-27 11:40 ` [PATCH v3 1/2] mm: migrate: prevent racy access to tlb_flush_pending Nadav Amit
  2017-07-27 11:40 ` [PATCH v3 2/2] mm: migrate: fix barriers around tlb_flush_pending Nadav Amit
  0 siblings, 2 replies; 9+ messages in thread
From: Nadav Amit @ 2017-07-27 11:40 UTC (permalink / raw)
To: linux-mm
Cc: sergey.senozhatsky, minchan, nadav.amit, mgorman, riel, luto, Nadav Amit

These two patches address tlb_flush_pending issues. The first one addresses a race when accessing tlb_flush_pending and is the important one. The second patch addresses Andrew Morton's question regarding the barriers. This patch is not really related to the first one: the atomic operations atomic_read() and atomic_inc() do not act as memory barriers, and replacing the existing barriers with smp_mb__after_atomic() did not seem beneficial. Yet, while reviewing the memory barriers around the use of tlb_flush_pending, a few issues were identified.

v2 -> v3:
- Do not init tlb_flush_pending if it is not defined (Sergey)
- Internalize memory barriers to mm_tlb_flush_pending (Minchan)

v1 -> v2:
- Explain the implications of the race (Andrew)
- Mark the patch that addresses the race as stable (Andrew)
- Add another patch to clean up the use of barriers (Andrew)

Nadav Amit (2):
  mm: migrate: prevent racy access to tlb_flush_pending
  mm: migrate: fix barriers around tlb_flush_pending

 arch/arm/include/asm/pgtable.h   |  3 ++-
 arch/arm64/include/asm/pgtable.h |  3 ++-
 arch/x86/include/asm/pgtable.h   |  2 +-
 include/linux/mm_types.h         | 39 +++++++++++++++++++++++++++------------
 kernel/fork.c                    |  4 +++-
 mm/debug.c                       |  2 +-
 mm/migrate.c                     |  2 +-
 7 files changed, 37 insertions(+), 18 deletions(-)

-- 
2.11.0

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org.
For more info on Linux MM, see: http://www.linux-mm.org/ .
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH v3 1/2] mm: migrate: prevent racy access to tlb_flush_pending
  2017-07-27 11:40 [PATCH v3 0/2] mm: fixes of tlb_flush_pending races Nadav Amit
@ 2017-07-27 11:40 ` Nadav Amit
  2017-07-28  1:34 ` Sergey Senozhatsky ` (2 more replies)
  2017-07-27 11:40 ` [PATCH v3 2/2] mm: migrate: fix barriers around tlb_flush_pending Nadav Amit
  1 sibling, 3 replies; 9+ messages in thread
From: Nadav Amit @ 2017-07-27 11:40 UTC (permalink / raw)
To: linux-mm
Cc: sergey.senozhatsky, minchan, nadav.amit, mgorman, riel, luto, stable, Nadav Amit

From: Nadav Amit <nadav.amit@gmail.com>

Setting and clearing mm->tlb_flush_pending can be performed by multiple threads, since mmap_sem may only be acquired for read in task_numa_work(). If this happens, tlb_flush_pending might be cleared while one of the threads still changes PTEs and batches TLB flushes.

This can lead to the same race between migration and change_protection_range() that led to the introduction of tlb_flush_pending. The result of this race was data corruption, which means that this patch also addresses a theoretically possible data corruption.

An actual data corruption was not observed, yet the race was confirmed by adding an assertion to check that tlb_flush_pending is not set by two threads, adding artificial latency in change_protection_range(), and using sysctl to reduce kernel.numa_balancing_scan_delay_ms.
Fixes: 20841405940e ("mm: fix TLB flush race between migration, and change_protection_range")
Cc: stable@vger.kernel.org
Signed-off-by: Nadav Amit <namit@vmware.com>
Acked-by: Mel Gorman <mgorman@suse.de>
---
 include/linux/mm_types.h | 8 ++++----
 kernel/fork.c            | 4 +++-
 mm/debug.c               | 2 +-
 3 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 45cdb27791a3..36f4ec589544 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -493,7 +493,7 @@ struct mm_struct {
	 * can move process memory needs to flush the TLB when moving a
	 * PROT_NONE or PROT_NUMA mapped page.
	 */
-	bool tlb_flush_pending;
+	atomic_t tlb_flush_pending;
 #endif
 	struct uprobes_state uprobes_state;
 #ifdef CONFIG_HUGETLB_PAGE
@@ -528,11 +528,11 @@ static inline cpumask_t *mm_cpumask(struct mm_struct *mm)
 static inline bool mm_tlb_flush_pending(struct mm_struct *mm)
 {
 	barrier();
-	return mm->tlb_flush_pending;
+	return atomic_read(&mm->tlb_flush_pending) > 0;
 }
 static inline void set_tlb_flush_pending(struct mm_struct *mm)
 {
-	mm->tlb_flush_pending = true;
+	atomic_inc(&mm->tlb_flush_pending);

 	/*
	 * Guarantee that the tlb_flush_pending store does not leak into the
@@ -544,7 +544,7 @@ static inline void set_tlb_flush_pending(struct mm_struct *mm)
 static inline void clear_tlb_flush_pending(struct mm_struct *mm)
 {
 	barrier();
-	mm->tlb_flush_pending = false;
+	atomic_dec(&mm->tlb_flush_pending);
 }
 #else
 static inline bool mm_tlb_flush_pending(struct mm_struct *mm)
diff --git a/kernel/fork.c b/kernel/fork.c
index e53770d2bf95..d6bf35b1cf31 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -809,7 +809,9 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p,
 	mm_init_aio(mm);
 	mm_init_owner(mm, p);
 	mmu_notifier_mm_init(mm);
-	clear_tlb_flush_pending(mm);
+#if defined(CONFIG_NUMA_BALANCING) || defined(CONFIG_COMPACTION)
+	atomic_set(&mm->tlb_flush_pending, 0);
+#endif
 #if defined(CONFIG_TRANSPARENT_HUGEPAGE) && !USE_SPLIT_PMD_PTLOCKS
 	mm->pmd_huge_pte = NULL;
 #endif
diff --git a/mm/debug.c b/mm/debug.c
index db1cd26d8752..d70103bb4731 100644
--- a/mm/debug.c
+++ b/mm/debug.c
@@ -159,7 +159,7 @@ void dump_mm(const struct mm_struct *mm)
 		mm->numa_next_scan, mm->numa_scan_offset, mm->numa_scan_seq,
 #endif
 #if defined(CONFIG_NUMA_BALANCING) || defined(CONFIG_COMPACTION)
-		mm->tlb_flush_pending,
+		atomic_read(&mm->tlb_flush_pending),
 #endif
 		mm->def_flags, &mm->def_flags
 	);
-- 
2.11.0

^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH v3 1/2] mm: migrate: prevent racy access to tlb_flush_pending
  2017-07-27 11:40 ` [PATCH v3 1/2] mm: migrate: prevent racy access to tlb_flush_pending Nadav Amit
@ 2017-07-28  1:34 ` Sergey Senozhatsky
  2017-07-28  1:44   ` Nadav Amit
  2017-07-28  1:42 ` Sergey Senozhatsky
  2017-07-28  2:28 ` Rik van Riel
  2 siblings, 1 reply; 9+ messages in thread
From: Sergey Senozhatsky @ 2017-07-28 1:34 UTC (permalink / raw)
To: Nadav Amit
Cc: linux-mm, sergey.senozhatsky, minchan, nadav.amit, mgorman, riel, luto, stable, Andrew Morton

just my 5 silly cents,

On (07/27/17 04:40), Nadav Amit wrote:
[..]
> static inline void set_tlb_flush_pending(struct mm_struct *mm)
> {
> -	mm->tlb_flush_pending = true;
> +	atomic_inc(&mm->tlb_flush_pending);
>
> 	/*
> 	 * Guarantee that the tlb_flush_pending store does not leak into the
> @@ -544,7 +544,7 @@ static inline void set_tlb_flush_pending(struct mm_struct *mm)
> static inline void clear_tlb_flush_pending(struct mm_struct *mm)
> {
> 	barrier();
> -	mm->tlb_flush_pending = false;
> +	atomic_dec(&mm->tlb_flush_pending);
> }

so, _technically_, set_tlb_flush_pending() can be nested, right? IOW,

	set_tlb_flush_pending()
		set_tlb_flush_pending()
		flush_tlb_range()
		clear_tlb_flush_pending()
	clear_tlb_flush_pending()	// if we miss this one, then
					// ->tlb_flush_pending is !clear,
					// even though we called
					// clear_tlb_flush_pending()

if so, then set_ and clear_ are a bit misleading names for something that does atomic_inc()/atomic_dec() internally.

especially when one sees this part

> -	clear_tlb_flush_pending(mm);
> +#if defined(CONFIG_NUMA_BALANCING) || defined(CONFIG_COMPACTION)
> +	atomic_set(&mm->tlb_flush_pending, 0);
> +#endif

so we have a clear_tlb_flush_pending() function which probably should set it to 0, as the name suggests (I see what you did tho), yet we still do atomic_set() under ifdef-s.

well, just nitpicks.

	-ss
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v3 1/2] mm: migrate: prevent racy access to tlb_flush_pending
  2017-07-28  1:34 ` Sergey Senozhatsky
@ 2017-07-28  1:44 ` Nadav Amit
  0 siblings, 0 replies; 9+ messages in thread
From: Nadav Amit @ 2017-07-28 1:44 UTC (permalink / raw)
To: Sergey Senozhatsky
Cc: open list:MEMORY MANAGEMENT, Sergey Senozhatsky, Minchan Kim, Mel Gorman, Rik van Riel, Andy Lutomirski, stable, Andrew Morton

Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> wrote:

> just my 5 silly cents,
>
> On (07/27/17 04:40), Nadav Amit wrote:
> [..]
>> static inline void set_tlb_flush_pending(struct mm_struct *mm)
>> {
>> -	mm->tlb_flush_pending = true;
>> +	atomic_inc(&mm->tlb_flush_pending);
>>
>> 	/*
>> 	 * Guarantee that the tlb_flush_pending store does not leak into the
>> @@ -544,7 +544,7 @@ static inline void set_tlb_flush_pending(struct mm_struct *mm)
>> static inline void clear_tlb_flush_pending(struct mm_struct *mm)
>> {
>> 	barrier();
>> -	mm->tlb_flush_pending = false;
>> +	atomic_dec(&mm->tlb_flush_pending);
>> }
>
> so, _technically_, set_tlb_flush_pending() can be nested, right? IOW,
>
>	set_tlb_flush_pending()
>		set_tlb_flush_pending()
>		flush_tlb_range()
>		clear_tlb_flush_pending()
>	clear_tlb_flush_pending()	// if we miss this one, then
>					// ->tlb_flush_pending is !clear,
>					// even though we called
>					// clear_tlb_flush_pending()
>
> if so then set_ and clear_ are a bit misleading names for something
> that does atomic_inc()/atomic_dec() internally.
>
> especially when one sees this part
>
>> -	clear_tlb_flush_pending(mm);
>> +#if defined(CONFIG_NUMA_BALANCING) || defined(CONFIG_COMPACTION)
>> +	atomic_set(&mm->tlb_flush_pending, 0);
>> +#endif
>
> so we have clear_tlb_flush_pending() function which probably should
> set it to 0 as the name suggests (I see what you did tho), yet still
> do atomic_set() under ifdef-s.
>
> well, just nitpicks.

I see your point. Initially, I tried to keep exactly the same interface to reduce the number of changes, but it might be misleading.
I will change the names (inc/dec_tlb_flush_pending). I will create init_tlb_flush_pending() for initialization since you asked, but Minchan's changes would likely remove the ifdefs, making it a bit strange for a single use. Anyhow, I'll wait to see if there are any other comments and then do it for v4.

Thanks,
Nadav

^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v3 1/2] mm: migrate: prevent racy access to tlb_flush_pending
  2017-07-27 11:40 ` [PATCH v3 1/2] mm: migrate: prevent racy access to tlb_flush_pending Nadav Amit
  2017-07-28  1:34 ` Sergey Senozhatsky
@ 2017-07-28  1:42 ` Sergey Senozhatsky
  2017-07-28  2:28 ` Rik van Riel
  2 siblings, 0 replies; 9+ messages in thread
From: Sergey Senozhatsky @ 2017-07-28 1:42 UTC (permalink / raw)
To: Nadav Amit
Cc: linux-mm, sergey.senozhatsky, minchan, nadav.amit, mgorman, riel, luto, stable, Andrew Morton

On (07/27/17 04:40), Nadav Amit wrote:
[..]
> --- a/mm/debug.c
> +++ b/mm/debug.c
> @@ -159,7 +159,7 @@ void dump_mm(const struct mm_struct *mm)
> 		mm->numa_next_scan, mm->numa_scan_offset, mm->numa_scan_seq,
> #endif
> #if defined(CONFIG_NUMA_BALANCING) || defined(CONFIG_COMPACTION)
> -		mm->tlb_flush_pending,
> +		atomic_read(&mm->tlb_flush_pending),
> #endif

can we use mm_tlb_flush_pending() here and get rid of the ifdef-s?

/* I understand that this is a -stable patch, so we can do it in a separate patch. */

	-ss

^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v3 1/2] mm: migrate: prevent racy access to tlb_flush_pending
  2017-07-27 11:40 ` [PATCH v3 1/2] mm: migrate: prevent racy access to tlb_flush_pending Nadav Amit
  2017-07-28  1:34 ` Sergey Senozhatsky
  2017-07-28  1:42 ` Sergey Senozhatsky
@ 2017-07-28  2:28 ` Rik van Riel
  2 siblings, 0 replies; 9+ messages in thread
From: Rik van Riel @ 2017-07-28 2:28 UTC (permalink / raw)
To: Nadav Amit, linux-mm
Cc: sergey.senozhatsky, minchan, nadav.amit, mgorman, luto, stable

On Thu, 2017-07-27 at 04:40 -0700, Nadav Amit wrote:
> From: Nadav Amit <nadav.amit@gmail.com>
>
> Setting and clearing mm->tlb_flush_pending can be performed by
> multiple threads, since mmap_sem may only be acquired for read in
> task_numa_work(). If this happens, tlb_flush_pending might be cleared
> while one of the threads still changes PTEs and batches TLB flushes.
>
> This can lead to the same race between migration and
> change_protection_range() that led to the introduction of
> tlb_flush_pending. The result of this race was data corruption, which
> means that this patch also addresses a theoretically possible data
> corruption.
>
> An actual data corruption was not observed, yet the race was
> confirmed by adding an assertion to check tlb_flush_pending is not
> set by two threads, adding artificial latency in
> change_protection_range() and using sysctl to reduce
> kernel.numa_balancing_scan_delay_ms.
>
> Fixes: 20841405940e ("mm: fix TLB flush race between migration, and
> change_protection_range")
>
> Cc: stable@vger.kernel.org
>
> Signed-off-by: Nadav Amit <namit@vmware.com>
> Acked-by: Mel Gorman <mgorman@suse.de>

Acked-by: Rik van Riel <riel@redhat.com>

^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH v3 2/2] mm: migrate: fix barriers around tlb_flush_pending
  2017-07-27 11:40 [PATCH v3 0/2] mm: fixes of tlb_flush_pending races Nadav Amit
  2017-07-27 11:40 ` [PATCH v3 1/2] mm: migrate: prevent racy access to tlb_flush_pending Nadav Amit
@ 2017-07-27 11:40 ` Nadav Amit
  2017-07-28  7:42 ` Mel Gorman
  1 sibling, 1 reply; 9+ messages in thread
From: Nadav Amit @ 2017-07-27 11:40 UTC (permalink / raw)
To: linux-mm
Cc: sergey.senozhatsky, minchan, nadav.amit, mgorman, riel, luto, Nadav Amit

Reading tlb_flush_pending while the page-table lock is taken does not
require a barrier, since the lock/unlock already acts as a barrier.
Removing the barrier in mm_tlb_flush_pending() to address this issue.

However, migrate_misplaced_transhuge_page() calls mm_tlb_flush_pending()
while the page-table lock is already released, which may present a
problem on architectures with weak memory model (PPC). To deal with this
case, a new parameter is added to mm_tlb_flush_pending() to indicate
if it is read without the page-table lock taken, and calling
smp_mb__after_unlock_lock() in this case.

Signed-off-by: Nadav Amit <namit@vmware.com>
---
 arch/arm/include/asm/pgtable.h   |  3 ++-
 arch/arm64/include/asm/pgtable.h |  3 ++-
 arch/x86/include/asm/pgtable.h   |  2 +-
 include/linux/mm_types.h         | 31 +++++++++++++++++++++++--------
 mm/migrate.c                     |  2 +-
 5 files changed, 29 insertions(+), 12 deletions(-)

diff --git a/arch/arm/include/asm/pgtable.h b/arch/arm/include/asm/pgtable.h
index 1c462381c225..2e0608a8049d 100644
--- a/arch/arm/include/asm/pgtable.h
+++ b/arch/arm/include/asm/pgtable.h
@@ -223,7 +223,8 @@ static inline pte_t *pmd_page_vaddr(pmd_t pmd)
 #define pte_none(pte)		(!pte_val(pte))
 #define pte_present(pte)	(pte_isset((pte), L_PTE_PRESENT))
 #define pte_valid(pte)		(pte_isset((pte), L_PTE_VALID))
-#define pte_accessible(mm, pte)	(mm_tlb_flush_pending(mm) ? pte_present(pte) : pte_valid(pte))
+#define pte_accessible(mm, pte)	(mm_tlb_flush_pending(mm, true) ?	\
+				 pte_present(pte) : pte_valid(pte))
 #define pte_write(pte)		(pte_isclear((pte), L_PTE_RDONLY))
 #define pte_dirty(pte)		(pte_isset((pte), L_PTE_DIRTY))
 #define pte_young(pte)		(pte_isset((pte), L_PTE_YOUNG))
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index c213fdbd056c..47f934d378ca 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -108,7 +108,8 @@ extern unsigned long empty_zero_page[PAGE_SIZE / sizeof(unsigned long)];
 * remapped as PROT_NONE but are yet to be flushed from the TLB.
 */
 #define pte_accessible(mm, pte)	\
-	(mm_tlb_flush_pending(mm) ? pte_present(pte) : pte_valid_young(pte))
+	(mm_tlb_flush_pending(mm, true) ? pte_present(pte) : \
+	 pte_valid_young(pte))

 static inline pte_t clear_pte_bit(pte_t pte, pgprot_t prot)
 {
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index f5af95a0c6b8..da16793203dd 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -642,7 +642,7 @@ static inline bool pte_accessible(struct mm_struct *mm, pte_t a)
 		return true;

 	if ((pte_flags(a) & _PAGE_PROTNONE) &&
-			mm_tlb_flush_pending(mm))
+			mm_tlb_flush_pending(mm, true))
 		return true;

 	return false;
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 36f4ec589544..57ab8061a2c0 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -522,12 +522,21 @@ static inline cpumask_t *mm_cpumask(struct mm_struct *mm)
 /*
 * Memory barriers to keep this state in sync are graciously provided by
 * the page table locks, outside of which no page table modifications happen.
- * The barriers below prevent the compiler from re-ordering the instructions
- * around the memory barriers that are already present in the code.
+ * The barriers are used to ensure the order between tlb_flush_pending updates,
+ * which happen while the lock is not taken, and the PTE updates, which happen
+ * while the lock is taken, are serialized.
 */
-static inline bool mm_tlb_flush_pending(struct mm_struct *mm)
+static inline bool mm_tlb_flush_pending(struct mm_struct *mm, bool pt_locked)
 {
-	barrier();
+	/*
+	 * mm_tlb_flush_pending() is safe if it is executed while the page-table
+	 * lock is taken. But if the lock was already released, there does not
+	 * seem to be a guarantee that a memory barrier was issued. A memory
+	 * barrier is therefore needed on architectures with weak memory models.
+	 */
+	if (!pt_locked)
+		smp_mb__after_unlock_lock();
+
	return atomic_read(&mm->tlb_flush_pending) > 0;
 }

 static inline void set_tlb_flush_pending(struct mm_struct *mm)
@@ -535,19 +544,25 @@ static inline void set_tlb_flush_pending(struct mm_struct *mm)
 	atomic_inc(&mm->tlb_flush_pending);

 	/*
-	 * Guarantee that the tlb_flush_pending store does not leak into the
+	 * Guarantee that the tlb_flush_pending increase does not leak into the
 	 * critical section updating the page tables
 	 */
 	smp_mb__before_spinlock();
 }

-/* Clearing is done after a TLB flush, which also provides a barrier. */
 static inline void clear_tlb_flush_pending(struct mm_struct *mm)
 {
-	barrier();
+	/*
+	 * Guarantee that the tlb_flush_pending decrease does not leak into the
+	 * critical section, since we must order the PTE change and changes to
+	 * the pending TLB flush indication. We could have relied on the TLB
+	 * flush as a memory barrier, but this behavior is not clearly
+	 * documented.
+	 */
+	smp_mb__before_atomic();
 	atomic_dec(&mm->tlb_flush_pending);
 }
 #else
-static inline bool mm_tlb_flush_pending(struct mm_struct *mm)
+static inline bool mm_tlb_flush_pending(struct mm_struct *mm, bool pt_locked)
 {
 	return false;
 }
diff --git a/mm/migrate.c b/mm/migrate.c
index 89a0a1707f4c..169c3165be41 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1939,7 +1939,7 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
 	 * We are not sure a pending tlb flush here is for a huge page
 	 * mapping or not. Hence use the tlb range variant
 	 */
-	if (mm_tlb_flush_pending(mm))
+	if (mm_tlb_flush_pending(mm, false))
 		flush_tlb_range(vma, mmun_start, mmun_end);

 	/* Prepare a page as a migration target */
-- 
2.11.0

^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH v3 2/2] mm: migrate: fix barriers around tlb_flush_pending
  2017-07-27 11:40 ` [PATCH v3 2/2] mm: migrate: fix barriers around tlb_flush_pending Nadav Amit
@ 2017-07-28  7:42 ` Mel Gorman
  2017-07-28 16:40   ` Nadav Amit
  0 siblings, 1 reply; 9+ messages in thread
From: Mel Gorman @ 2017-07-28 7:42 UTC (permalink / raw)
To: Nadav Amit; +Cc: linux-mm, sergey.senozhatsky, minchan, nadav.amit, riel, luto

On Thu, Jul 27, 2017 at 04:40:15AM -0700, Nadav Amit wrote:
> Reading tlb_flush_pending while the page-table lock is taken does not
> require a barrier, since the lock/unlock already acts as a barrier.
> Removing the barrier in mm_tlb_flush_pending() to address this issue.
>
> However, migrate_misplaced_transhuge_page() calls mm_tlb_flush_pending()
> while the page-table lock is already released, which may present a
> problem on architectures with weak memory model (PPC). To deal with this
> case, a new parameter is added to mm_tlb_flush_pending() to indicate
> if it is read without the page-table lock taken, and calling
> smp_mb__after_unlock_lock() in this case.
>
> Signed-off-by: Nadav Amit <namit@vmware.com>

Conditional locking based on function arguments is often considered extremely hazardous. Conditional barriers are even more troublesome because it's simply too easy to get them wrong.

Revert b0943d61b8fa420180f92f64ef67662b4f6cc493 instead of this patch. It's not a clean revert, but the conflicts are due to comment changes. It moves the check back under the PTL, and the impact is marginal given that a spurious TLB flush will only occur when potentially racing with change_prot_range. Since that commit went in, a lot of changes have happened that alter the scan rate of automatic NUMA balancing, so it shouldn't be a serious issue. It's certainly a nicer option than using conditional barriers.

-- 
Mel Gorman
SUSE Labs
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v3 2/2] mm: migrate: fix barriers around tlb_flush_pending
  2017-07-28  7:42 ` Mel Gorman
@ 2017-07-28 16:40 ` Nadav Amit
  0 siblings, 0 replies; 9+ messages in thread
From: Nadav Amit @ 2017-07-28 16:40 UTC (permalink / raw)
To: Mel Gorman
Cc: open list:MEMORY MANAGEMENT, Sergey Senozhatsky, Minchan Kim, Rik van Riel, luto

Mel Gorman <mgorman@suse.de> wrote:

> On Thu, Jul 27, 2017 at 04:40:15AM -0700, Nadav Amit wrote:
>> Reading tlb_flush_pending while the page-table lock is taken does not
>> require a barrier, since the lock/unlock already acts as a barrier.
>> Removing the barrier in mm_tlb_flush_pending() to address this issue.
>>
>> However, migrate_misplaced_transhuge_page() calls mm_tlb_flush_pending()
>> while the page-table lock is already released, which may present a
>> problem on architectures with weak memory model (PPC). To deal with this
>> case, a new parameter is added to mm_tlb_flush_pending() to indicate
>> if it is read without the page-table lock taken, and calling
>> smp_mb__after_unlock_lock() in this case.
>>
>> Signed-off-by: Nadav Amit <namit@vmware.com>
>
> Conditional locking based on function arguments is often considered
> extremely hazardous. Conditional barriers are even more troublesome
> because it's simply too easy to get wrong.
>
> Revert b0943d61b8fa420180f92f64ef67662b4f6cc493 instead of this patch. It's
> not a clean revert but conflicts are due to comment changes. It moves
> the check back under the PTL and the impact is marginal given that
> a spurious TLB flush will only occur when potentially racing with
> change_prot_range. Since that commit went in, a lot of changes have happened
> that alter the scan rate of automatic NUMA balancing so it shouldn't be a
> serious issue. It's certainly a nicer option than using conditional barriers.

Ok. Initially, I added a memory barrier only in migrate_misplaced_transhuge_page(), and included a detailed comment about it - I still think it is better.
But since you feel confident the impact will be relatively small, I will do the revert.

^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2017-07-28 16:40 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-07-27 11:40 [PATCH v3 0/2] mm: fixes of tlb_flush_pending races Nadav Amit
2017-07-27 11:40 ` [PATCH v3 1/2] mm: migrate: prevent racy access to tlb_flush_pending Nadav Amit
2017-07-28  1:34   ` Sergey Senozhatsky
2017-07-28  1:44     ` Nadav Amit
2017-07-28  1:42   ` Sergey Senozhatsky
2017-07-28  2:28   ` Rik van Riel
2017-07-27 11:40 ` [PATCH v3 2/2] mm: migrate: fix barriers around tlb_flush_pending Nadav Amit
2017-07-28  7:42   ` Mel Gorman
2017-07-28 16:40     ` Nadav Amit