* [PATCH v3] mm: Fix kthread_use_mm() vs TLB invalidate
From: Peter Zijlstra
Date: 2020-07-21 15:41 UTC
To: Andrew Morton
Cc: linux-kernel, luto, axboe, keescook, torvalds, jannh, will, hch,
    npiggin, mathieu.desnoyers

For SMP systems using IPI based TLB invalidation, looking at
current->active_mm is entirely reasonable. This then presents the
following race condition:


        CPU0                    CPU1

        flush_tlb_mm(mm)        use_mm(mm)
          <send-IPI>
                                  tsk->active_mm = mm;
          <IPI>
            if (tsk->active_mm == mm)
              // flush TLBs
          </IPI>
                                  switch_mm(old_mm,mm,tsk);


Where it is possible the IPI flushed the TLBs for @old_mm, not @mm,
because the IPI lands before we actually switched.

Avoid this by disabling IRQs across changing ->active_mm and
switch_mm().

[ There are all sorts of reasons this might be harmless for various
  architecture specific reasons, but best not leave the door open at
  all. ]

Cc: stable@kernel.org
Reported-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---

Sorry, I dropped the ball on this and only found it because I was
looking at the whole membarrier things vs use_mm().


 kernel/kthread.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/kernel/kthread.c b/kernel/kthread.c
index 1d9e2fdfd67a..7221dcbffef3 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -1241,13 +1241,15 @@ void kthread_use_mm(struct mm_struct *mm)
 	WARN_ON_ONCE(tsk->mm);
 
 	task_lock(tsk);
+	local_irq_disable();
 	active_mm = tsk->active_mm;
 	if (active_mm != mm) {
 		mmgrab(mm);
 		tsk->active_mm = mm;
 	}
 	tsk->mm = mm;
-	switch_mm(active_mm, mm, tsk);
+	switch_mm_irqs_off(active_mm, mm, tsk);
+	local_irq_enable();
 	task_unlock(tsk);
 #ifdef finish_arch_post_lock_switch
 	finish_arch_post_lock_switch();
@@ -1276,9 +1278,11 @@ void kthread_unuse_mm(struct mm_struct *mm)
 
 	task_lock(tsk);
 	sync_mm_rss(mm);
+	local_irq_disable();
 	tsk->mm = NULL;
 	/* active_mm is still 'mm' */
 	enter_lazy_tlb(mm, tsk);
+	local_irq_enable();
 	task_unlock(tsk);
 }
 EXPORT_SYMBOL_GPL(kthread_unuse_mm);
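[To make the window concrete: with IPI-based invalidation, each CPU's
IPI handler typically decides whether to flush by checking what the
interrupted task is running. A minimal sketch of that handler shape --
illustrative only, not any particular architecture's code:

/*
 * Illustrative sketch only -- not any real architecture's handler.
 * flush_tlb_mm(mm) sends this to other CPUs and lets each one decide
 * locally whether it is currently running @mm.
 */
static void flush_tlb_mm_ipi(void *info)
{
	struct mm_struct *mm = info;

	/*
	 * The race: on CPU1, use_mm() may have already stored
	 * tsk->active_mm = mm but not yet called switch_mm(), so this
	 * check passes while the hardware still has @old_mm loaded.
	 * The local flush then scrubs @old_mm's translations and @mm's
	 * stale entries survive on that CPU.
	 */
	if (current->active_mm == mm)
		local_flush_tlb();
}

Disabling IRQs across the ->active_mm store and switch_mm_irqs_off()
closes the window: the IPI is delivered either before the store (while
the old mm is both checked and loaded) or after the switch (while the
new mm is both checked and loaded), so the check and the loaded page
tables can no longer disagree.]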
* Re: [PATCH v3] mm: Fix kthread_use_mm() vs TLB invalidate
From: Andrew Morton
Date: 2020-07-21 21:06 UTC
To: Peter Zijlstra
Cc: linux-kernel, luto, axboe, keescook, torvalds, jannh, will, hch,
    npiggin, mathieu.desnoyers

On Tue, 21 Jul 2020 17:41:06 +0200 Peter Zijlstra <peterz@infradead.org> wrote:

>
> For SMP systems using IPI based TLB invalidation, looking at
> current->active_mm is entirely reasonable. This then presents the
> following race condition:
>
>
>         CPU0                    CPU1
>
>         flush_tlb_mm(mm)        use_mm(mm)
>           <send-IPI>
>                                   tsk->active_mm = mm;
>           <IPI>
>             if (tsk->active_mm == mm)
>               // flush TLBs
>           </IPI>
>                                   switch_mm(old_mm,mm,tsk);
>
>
> Where it is possible the IPI flushed the TLBs for @old_mm, not @mm,
> because the IPI lands before we actually switched.
>
> Avoid this by disabling IRQs across changing ->active_mm and
> switch_mm().
>
> [ There are all sorts of reasons this might be harmless for various
>   architecture specific reasons, but best not leave the door open at
>   all. ]

Can we give the -stable maintainers (and others) more explanation of
why they might choose to merge this?

> ...
>
> --- a/kernel/kthread.c
> +++ b/kernel/kthread.c
> @@ -1241,13 +1241,15 @@ void kthread_use_mm(struct mm_struct *mm)
>  	WARN_ON_ONCE(tsk->mm);
>
>  	task_lock(tsk);
> +	local_irq_disable();

A bare local_irq_disable() is one of those "what the heck is this
protecting" things. It's the new lock_kernel().

So a little comment will help readers to understand why we did it.
Something like this?

--- a/kernel/kthread.c~mm-fix-kthread_use_mm-vs-tlb-invalidate-fix
+++ a/kernel/kthread.c
@@ -1239,6 +1239,7 @@ void kthread_use_mm(struct mm_struct *mm
 	WARN_ON_ONCE(tsk->mm);
 
 	task_lock(tsk);
+	/* Hold off tlb flush IPIs while switching mm's */
 	local_irq_disable();
 	active_mm = tsk->active_mm;
 	if (active_mm != mm) {
_

>  	active_mm = tsk->active_mm;
>  	if (active_mm != mm) {
>  		mmgrab(mm);
>  		tsk->active_mm = mm;
>  	}
>  	tsk->mm = mm;
> -	switch_mm(active_mm, mm, tsk);
> +	switch_mm_irqs_off(active_mm, mm, tsk);
> +	local_irq_enable();
>  	task_unlock(tsk);
>  #ifdef finish_arch_post_lock_switch
>  	finish_arch_post_lock_switch();
>
> ...
>
* Re: [PATCH v3] mm: Fix kthread_use_mm() vs TLB invalidate
From: Peter Zijlstra
Date: 2020-07-22 08:35 UTC
To: Andrew Morton
Cc: linux-kernel, luto, axboe, keescook, torvalds, jannh, will, hch,
    npiggin, mathieu.desnoyers

On Tue, Jul 21, 2020 at 02:06:23PM -0700, Andrew Morton wrote:
> On Tue, 21 Jul 2020 17:41:06 +0200 Peter Zijlstra <peterz@infradead.org> wrote:
>
> >
> > For SMP systems using IPI based TLB invalidation, looking at
> > current->active_mm is entirely reasonable. This then presents the
> > following race condition:
> >
> >
> >         CPU0                    CPU1
> >
> >         flush_tlb_mm(mm)        use_mm(mm)
> >           <send-IPI>
> >                                   tsk->active_mm = mm;
> >           <IPI>
> >             if (tsk->active_mm == mm)
> >               // flush TLBs
> >           </IPI>
> >                                   switch_mm(old_mm,mm,tsk);
> >
> >
> > Where it is possible the IPI flushed the TLBs for @old_mm, not @mm,
> > because the IPI lands before we actually switched.
> >
> > Avoid this by disabling IRQs across changing ->active_mm and
> > switch_mm().
> >
> > [ There are all sorts of reasons this might be harmless for various
> >   architecture specific reasons, but best not leave the door open at
> >   all. ]
>
> Can we give the -stable maintainers (and others) more explanation of
> why they might choose to merge this?

Like so then?

---
Subject: mm: Fix kthread_use_mm() vs TLB invalidate
From: Peter Zijlstra <peterz@infradead.org>
Date: Tue, 11 Feb 2020 10:25:19 +0100

For SMP systems using IPI based TLB invalidation, looking at
current->active_mm is entirely reasonable. This then presents the
following race condition:


        CPU0                    CPU1

        flush_tlb_mm(mm)        use_mm(mm)
          <send-IPI>
                                  tsk->active_mm = mm;
          <IPI>
            if (tsk->active_mm == mm)
              // flush TLBs
          </IPI>
                                  switch_mm(old_mm,mm,tsk);


Where it is possible the IPI flushed the TLBs for @old_mm, not @mm,
because the IPI lands before we actually switched.

Avoid this by disabling IRQs across changing ->active_mm and
switch_mm().

Of the (SMP) architectures that have IPI based TLB invalidate:

  Alpha    - checks active_mm
  ARC      - ASID specific
  IA64     - checks active_mm
  MIPS     - ASID specific flush
  OpenRISC - shoots down world
  PARISC   - shoots down world
  SH       - ASID specific
  SPARC    - ASID specific
  x86      - N/A
  xtensa   - checks active_mm

So at the very least Alpha, IA64 and Xtensa are suspect.

On top of this, for scheduler consistency we need at least preemption
disabled across changing tsk->mm and doing switch_mm(), which is
currently provided by task_lock(), but that's not sufficient for
PREEMPT_RT.

Reported-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@kernel.org
---
 kernel/kthread.c |   11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -1241,13 +1241,20 @@ void kthread_use_mm(struct mm_struct *mm
 	WARN_ON_ONCE(tsk->mm);
 
 	task_lock(tsk);
+	/*
+	 * Serialize the tsk->mm store and switch_mm() against TLB invalidation
+	 * IPIs. Also make sure we're non-preemptible on PREEMPT_RT to not race
+	 * against the scheduler writing to these variables.
+	 */
+	local_irq_disable();
 	active_mm = tsk->active_mm;
 	if (active_mm != mm) {
 		mmgrab(mm);
 		tsk->active_mm = mm;
 	}
 	tsk->mm = mm;
-	switch_mm(active_mm, mm, tsk);
+	switch_mm_irqs_off(active_mm, mm, tsk);
+	local_irq_enable();
 	task_unlock(tsk);
 #ifdef finish_arch_post_lock_switch
 	finish_arch_post_lock_switch();
@@ -1276,9 +1283,11 @@ void kthread_unuse_mm(struct mm_struct *
 
 	task_lock(tsk);
 	sync_mm_rss(mm);
+	local_irq_disable();
 	tsk->mm = NULL;
 	/* active_mm is still 'mm' */
 	enter_lazy_tlb(mm, tsk);
+	local_irq_enable();
 	task_unlock(tsk);
 }
 EXPORT_SYMBOL_GPL(kthread_unuse_mm);
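[A note on the switch_mm() -> switch_mm_irqs_off() change above: where
an architecture provides both, switch_mm() is commonly just an irq-save
wrapper around switch_mm_irqs_off(); x86 has roughly this shape, while
architectures without a separate implementation simply alias
switch_mm_irqs_off to switch_mm via include/linux/mmu_context.h. A
sketch of the common wrapper relationship:

/* Sketch of the usual wrapper; details vary per architecture. */
void switch_mm(struct mm_struct *prev, struct mm_struct *next,
	       struct task_struct *tsk)
{
	unsigned long flags;

	local_irq_save(flags);
	switch_mm_irqs_off(prev, next, tsk);
	local_irq_restore(flags);
}

Since the patch already runs with IRQs hard-disabled, calling the
_irqs_off variant directly avoids a redundant save/restore of the
flags.]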
* Re: [PATCH v3] mm: Fix kthread_use_mm() vs TLB invalidate
From: Nicholas Piggin
Date: 2020-07-23 07:15 UTC
To: Andrew Morton, Peter Zijlstra
Cc: axboe, hch, jannh, keescook, linux-kernel, luto,
    mathieu.desnoyers, torvalds, will

Excerpts from Peter Zijlstra's message of July 22, 2020 6:35 pm:
> On Tue, Jul 21, 2020 at 02:06:23PM -0700, Andrew Morton wrote:
>> On Tue, 21 Jul 2020 17:41:06 +0200 Peter Zijlstra <peterz@infradead.org> wrote:
>>
>> >
>> > For SMP systems using IPI based TLB invalidation, looking at
>> > current->active_mm is entirely reasonable. This then presents the
>> > following race condition:
>> >
>> >
>> >         CPU0                    CPU1
>> >
>> >         flush_tlb_mm(mm)        use_mm(mm)
>> >           <send-IPI>
>> >                                   tsk->active_mm = mm;
>> >           <IPI>
>> >             if (tsk->active_mm == mm)
>> >               // flush TLBs
>> >           </IPI>
>> >                                   switch_mm(old_mm,mm,tsk);
>> >
>> >
>> > Where it is possible the IPI flushed the TLBs for @old_mm, not @mm,
>> > because the IPI lands before we actually switched.
>> >
>> > Avoid this by disabling IRQs across changing ->active_mm and
>> > switch_mm().
>> >
>> > [ There are all sorts of reasons this might be harmless for various
>> >   architecture specific reasons, but best not leave the door open at
>> >   all. ]
>>
>> Can we give the -stable maintainers (and others) more explanation of
>> why they might choose to merge this?
>
> Like so then?
>
> ---
> Subject: mm: Fix kthread_use_mm() vs TLB invalidate
> From: Peter Zijlstra <peterz@infradead.org>
> Date: Tue, 11 Feb 2020 10:25:19 +0100
>
> For SMP systems using IPI based TLB invalidation, looking at
> current->active_mm is entirely reasonable. This then presents the
> following race condition:
>
>
>         CPU0                    CPU1
>
>         flush_tlb_mm(mm)        use_mm(mm)
>           <send-IPI>
>                                   tsk->active_mm = mm;
>           <IPI>
>             if (tsk->active_mm == mm)
>               // flush TLBs
>           </IPI>
>                                   switch_mm(old_mm,mm,tsk);
>
>
> Where it is possible the IPI flushed the TLBs for @old_mm, not @mm,
> because the IPI lands before we actually switched.
>
> Avoid this by disabling IRQs across changing ->active_mm and
> switch_mm().
>
> Of the (SMP) architectures that have IPI based TLB invalidate:
>
>   Alpha    - checks active_mm
>   ARC      - ASID specific
>   IA64     - checks active_mm
>   MIPS     - ASID specific flush
>   OpenRISC - shoots down world
>   PARISC   - shoots down world
>   SH       - ASID specific
>   SPARC    - ASID specific
>   x86      - N/A
>   xtensa   - checks active_mm
>
> So at the very least Alpha, IA64 and Xtensa are suspect.
>
> On top of this, for scheduler consistency we need at least preemption
> disabled across changing tsk->mm and doing switch_mm(), which is
> currently provided by task_lock(), but that's not sufficient for
> PREEMPT_RT.
>
> Reported-by: Andy Lutomirski <luto@amacapital.net>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Cc: stable@kernel.org
> ---
>  kernel/kthread.c |   11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
>
> --- a/kernel/kthread.c
> +++ b/kernel/kthread.c
> @@ -1241,13 +1241,20 @@ void kthread_use_mm(struct mm_struct *mm
>  	WARN_ON_ONCE(tsk->mm);
>  
>  	task_lock(tsk);
> +	/*
> +	 * Serialize the tsk->mm store and switch_mm() against TLB invalidation
> +	 * IPIs. Also make sure we're non-preemptible on PREEMPT_RT to not race
> +	 * against the scheduler writing to these variables.
> +	 */
> +	local_irq_disable();
>  	active_mm = tsk->active_mm;
>  	if (active_mm != mm) {
>  		mmgrab(mm);
>  		tsk->active_mm = mm;
>  	}
>  	tsk->mm = mm;
> -	switch_mm(active_mm, mm, tsk);
> +	switch_mm_irqs_off(active_mm, mm, tsk);
> +	local_irq_enable();
>  	task_unlock(tsk);
>  #ifdef finish_arch_post_lock_switch
>  	finish_arch_post_lock_switch();
> @@ -1276,9 +1283,11 @@ void kthread_unuse_mm(struct mm_struct *
>  
>  	task_lock(tsk);
>  	sync_mm_rss(mm);
> +	local_irq_disable();
>  	tsk->mm = NULL;
>  	/* active_mm is still 'mm' */
>  	enter_lazy_tlb(mm, tsk);
> +	local_irq_enable();
>  	task_unlock(tsk);
>  }
>  EXPORT_SYMBOL_GPL(kthread_unuse_mm);
>

Oh good, this is also needed as part of my preferred fix for the
io_uring mmget_not_zero->use_mm() vs mm_cpumask problem:

https://marc.info/?l=linux-mm&m=159520550112106&w=2

I'll try to do arch fixes on top of this (I have the same hunks
locally!). After that, we should be able to allow mmget_not_zero to be
a first class reference to the mm AFAIKS.

Thanks,
Nick
* Re: [PATCH v3] mm: Fix kthread_use_mm() vs TLB invalidate
From: Aneesh Kumar K.V
Date: 2020-08-21 05:39 UTC
To: Peter Zijlstra, Andrew Morton
Cc: linux-kernel, luto, axboe, keescook, torvalds, jannh, will, hch,
    npiggin, mathieu.desnoyers

Peter Zijlstra <peterz@infradead.org> writes:

> For SMP systems using IPI based TLB invalidation, looking at
> current->active_mm is entirely reasonable. This then presents the
> following race condition:
>
>
>         CPU0                    CPU1
>
>         flush_tlb_mm(mm)        use_mm(mm)
>           <send-IPI>
>                                   tsk->active_mm = mm;
>           <IPI>
>             if (tsk->active_mm == mm)
>               // flush TLBs
>           </IPI>
>                                   switch_mm(old_mm,mm,tsk);
>
>
> Where it is possible the IPI flushed the TLBs for @old_mm, not @mm,
> because the IPI lands before we actually switched.
>
> Avoid this by disabling IRQs across changing ->active_mm and
> switch_mm().
>
> [ There are all sorts of reasons this might be harmless for various
>   architecture specific reasons, but best not leave the door open at
>   all. ]

Do we have a similar race with exec_mmap()? I am looking at exec_mmap()
running parallel to do_exit_flush_lazy_tlb(). We can get

	if (current->active_mm == mm) {

to be true, and if we don't disable irqs around updating
tsk->mm/active_mm we can end up doing mmdrop on the wrong mm?

>
> Cc: stable@kernel.org
> Reported-by: Andy Lutomirski <luto@amacapital.net>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>
> Sorry, I dropped the ball on this and only found it because I was
> looking at the whole membarrier things vs use_mm().
>
>
>  kernel/kthread.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/kthread.c b/kernel/kthread.c
> index 1d9e2fdfd67a..7221dcbffef3 100644
> --- a/kernel/kthread.c
> +++ b/kernel/kthread.c
> @@ -1241,13 +1241,15 @@ void kthread_use_mm(struct mm_struct *mm)
>  	WARN_ON_ONCE(tsk->mm);
>
>  	task_lock(tsk);
> +	local_irq_disable();
>  	active_mm = tsk->active_mm;
>  	if (active_mm != mm) {
>  		mmgrab(mm);
>  		tsk->active_mm = mm;
>  	}
>  	tsk->mm = mm;
> -	switch_mm(active_mm, mm, tsk);
> +	switch_mm_irqs_off(active_mm, mm, tsk);
> +	local_irq_enable();
>  	task_unlock(tsk);
>  #ifdef finish_arch_post_lock_switch
>  	finish_arch_post_lock_switch();
> @@ -1276,9 +1278,11 @@ void kthread_unuse_mm(struct mm_struct *mm)
>
>  	task_lock(tsk);
>  	sync_mm_rss(mm);
> +	local_irq_disable();
>  	tsk->mm = NULL;
>  	/* active_mm is still 'mm' */
>  	enter_lazy_tlb(mm, tsk);
> +	local_irq_enable();
>  	task_unlock(tsk);
>  }
>  EXPORT_SYMBOL_GPL(kthread_unuse_mm);
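[For readers without the powerpc tree at hand, the function Aneesh
refers to has roughly the shape below -- a from-memory sketch of the
radix do_exit_flush_lazy_tlb(), not a verbatim copy. The quoted
active_mm check guards a switch to init_mm plus an mmdrop(), which is
why a racing update of ->mm/->active_mm in exec_mmap() is dangerous:

/*
 * Rough sketch (from memory) of powerpc's do_exit_flush_lazy_tlb(),
 * run on each CPU via IPI: if this CPU is using @mm as its lazy tlb
 * mm, switch away to init_mm and drop the lazy reference before the
 * local flush.
 */
static void do_exit_flush_lazy_tlb(void *arg)
{
	struct mm_struct *mm = arg;

	if (current->active_mm == mm) {
		/* A kernel thread using @mm as its lazy tlb mm. */
		mmgrab(&init_mm);
		current->active_mm = &init_mm;
		switch_mm_irqs_off(mm, &init_mm, current);
		mmdrop(mm);
	}
	/* ... local TLB flush for @mm's PID follows ... */
}

If this IPI interrupts exec_mmap() between its stores to current->mm
and current->active_mm, the check can see a half-updated task, and the
mmdrop() here -- or the later one in exec_mmap() -- targets the wrong
mm.]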
* Re: [PATCH v3] mm: Fix kthread_use_mm() vs TLB invalidate
From: peterz
Date: 2020-08-21 13:04 UTC
To: Aneesh Kumar K.V
Cc: Andrew Morton, linux-kernel, luto, axboe, keescook, torvalds,
    jannh, will, hch, npiggin, mathieu.desnoyers

On Fri, Aug 21, 2020 at 11:09:51AM +0530, Aneesh Kumar K.V wrote:
> Peter Zijlstra <peterz@infradead.org> writes:
>
> > For SMP systems using IPI based TLB invalidation, looking at
> > current->active_mm is entirely reasonable. This then presents the
> > following race condition:
> >
> >
> >         CPU0                    CPU1
> >
> >         flush_tlb_mm(mm)        use_mm(mm)
> >           <send-IPI>
> >                                   tsk->active_mm = mm;
> >           <IPI>
> >             if (tsk->active_mm == mm)
> >               // flush TLBs
> >           </IPI>
> >                                   switch_mm(old_mm,mm,tsk);
> >
> >
> > Where it is possible the IPI flushed the TLBs for @old_mm, not @mm,
> > because the IPI lands before we actually switched.
> >
> > Avoid this by disabling IRQs across changing ->active_mm and
> > switch_mm().
> >
> > [ There are all sorts of reasons this might be harmless for various
> >   architecture specific reasons, but best not leave the door open at
> >   all. ]
>
> Do we have a similar race with exec_mmap()? I am looking at exec_mmap()
> running parallel to do_exit_flush_lazy_tlb(). We can get
>
> 	if (current->active_mm == mm) {
>
> to be true, and if we don't disable irqs around updating
> tsk->mm/active_mm we can end up doing mmdrop on the wrong mm?

exec_mmap() is called after de_thread(), so there should not be any mm
specific invalidations around, I think.

Then again, CLONE_VM without CLONE_THREAD might still be possible, so
yeah, we probably want IRQs disabled there too, just for consistency
and general paranoia if nothing else.
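[For context on the CLONE_VM-without-CLONE_THREAD point: such a child
is a full sibling process (its own thread group and PID) that shares
the parent's mm, so de_thread() during a later exec by one side does
not reap the other, and mm-wide invalidations issued by one task can
still target the other mid-exec. A hypothetical userspace illustration,
just to show the flag combination:

/*
 * Hypothetical illustration: spawn a separate process that shares our
 * address space but is not part of our thread group.
 */
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <sys/types.h>

static char child_stack[64 * 1024];

static int child_fn(void *arg)
{
	/* Runs in the parent's mm, but in its own thread group. */
	return 0;
}

pid_t spawn_vm_sibling(void)
{
	return clone(child_fn, child_stack + sizeof(child_stack),
		     CLONE_VM | SIGCHLD, NULL);
}
]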
* Re: [PATCH v3] mm: Fix kthread_use_mm() vs TLB invalidate
From: Nicholas Piggin
Date: 2020-08-28 03:26 UTC
To: Aneesh Kumar K.V, peterz, linux-arch
Cc: Andrew Morton, axboe, hch, jannh, keescook, linux-kernel, luto,
    mathieu.desnoyers, torvalds, will

Excerpts from peterz@infradead.org's message of August 21, 2020 11:04 pm:
> On Fri, Aug 21, 2020 at 11:09:51AM +0530, Aneesh Kumar K.V wrote:
>> Peter Zijlstra <peterz@infradead.org> writes:
>>
>> > For SMP systems using IPI based TLB invalidation, looking at
>> > current->active_mm is entirely reasonable. This then presents the
>> > following race condition:
>> >
>> >
>> >         CPU0                    CPU1
>> >
>> >         flush_tlb_mm(mm)        use_mm(mm)
>> >           <send-IPI>
>> >                                   tsk->active_mm = mm;
>> >           <IPI>
>> >             if (tsk->active_mm == mm)
>> >               // flush TLBs
>> >           </IPI>
>> >                                   switch_mm(old_mm,mm,tsk);
>> >
>> >
>> > Where it is possible the IPI flushed the TLBs for @old_mm, not @mm,
>> > because the IPI lands before we actually switched.
>> >
>> > Avoid this by disabling IRQs across changing ->active_mm and
>> > switch_mm().
>> >
>> > [ There are all sorts of reasons this might be harmless for various
>> >   architecture specific reasons, but best not leave the door open at
>> >   all. ]
>>
>> Do we have a similar race with exec_mmap()? I am looking at exec_mmap()
>> running parallel to do_exit_flush_lazy_tlb(). We can get
>>
>> 	if (current->active_mm == mm) {
>>
>> to be true, and if we don't disable irqs around updating
>> tsk->mm/active_mm we can end up doing mmdrop on the wrong mm?
>
> exec_mmap() is called after de_thread(), so there should not be any mm
> specific invalidations around, I think.
>
> Then again, CLONE_VM without CLONE_THREAD might still be possible, so
> yeah, we probably want IRQs disabled there too, just for consistency
> and general paranoia if nothing else.

The problem is probably not this TLB flushing race, but I think there
is a lazy tlb race.

  call_usermodehelper()
    kernel_execve()
      old_mm = current->mm;
      active_mm = current->active_mm;
      *** preempt *** -------------------->  schedule()
                                               prev->active_mm = NULL;
                                               mmdrop(prev active mm)
                                             ...
                      <--------------------  schedule()
      current->mm = mm;
      current->active_mm = mm;
      if (!old_mm)
          mmdrop(active_mm);   /* double free! */

There's possibly other problematic interleavings. powerpc also has an
issue with switching away a lazy tlb mm via IPI, which is basically the
same problem, so I just illustrate the more general issue.

I think we just make it a rule that these always get updated under
local_irq_disable(), to be safe.

Trouble is we can't just do it, because some architectures can't do
activate_mm() with irqs disabled. ARM and UM, at least; UM can't even
do it with preemption disabled. We can probably change them to make
them work.

I'm not sure what the best way to go is; my first attempt is to require
activate_mm() to do the mm switching and the irq disable as well, but
I'll need some help from the archs. I'll send out RFCs in a minute.

Thanks,
Nick
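[As a concrete sketch of the rule Nick proposes -- update ->mm and
->active_mm and do the switch under local_irq_disable() -- exec_mmap()'s
mm install could be bracketed much like kthread_use_mm() now is. This
is illustrative only and deliberately ignores the ARM/UM
activate_mm()-with-irqs-disabled wrinkle he raises:

/*
 * Sketch only: an exec_mmap()-style mm install with the same irq-off
 * discipline as kthread_use_mm(). Variable names follow the thread's
 * usage; this is not the upstream code.
 */
local_irq_disable();
active_mm = tsk->active_mm;
tsk->active_mm = mm;
tsk->mm = mm;
/*
 * With IRQs off, the ->mm/->active_mm stores and the page-table switch
 * are atomic w.r.t. TLB shootdown IPIs and the lazy-tlb paths, so a
 * concurrent do_exit_flush_lazy_tlb() or schedule() cannot observe a
 * half-updated task and mmdrop() the wrong mm.
 */
activate_mm(active_mm, mm);
local_irq_enable();
]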
* Re: [PATCH v3] mm: Fix kthread_use_mm() vs TLB invalidate
From: Nicholas Piggin
Date: 2020-08-28 06:55 UTC
To: Aneesh Kumar K.V, linux-arch, peterz
Cc: Andrew Morton, axboe, hch, jannh, keescook, linux-kernel, luto,
    mathieu.desnoyers, torvalds, will

Excerpts from Nicholas Piggin's message of August 28, 2020 1:26 pm:
> Excerpts from peterz@infradead.org's message of August 21, 2020 11:04 pm:
>> On Fri, Aug 21, 2020 at 11:09:51AM +0530, Aneesh Kumar K.V wrote:
>>> Peter Zijlstra <peterz@infradead.org> writes:
>>>
>>> > For SMP systems using IPI based TLB invalidation, looking at
>>> > current->active_mm is entirely reasonable. This then presents the
>>> > following race condition:
>>> >
>>> >
>>> >         CPU0                    CPU1
>>> >
>>> >         flush_tlb_mm(mm)        use_mm(mm)
>>> >           <send-IPI>
>>> >                                   tsk->active_mm = mm;
>>> >           <IPI>
>>> >             if (tsk->active_mm == mm)
>>> >               // flush TLBs
>>> >           </IPI>
>>> >                                   switch_mm(old_mm,mm,tsk);
>>> >
>>> >
>>> > Where it is possible the IPI flushed the TLBs for @old_mm, not @mm,
>>> > because the IPI lands before we actually switched.
>>> >
>>> > Avoid this by disabling IRQs across changing ->active_mm and
>>> > switch_mm().
>>> >
>>> > [ There are all sorts of reasons this might be harmless for various
>>> >   architecture specific reasons, but best not leave the door open at
>>> >   all. ]
>>>
>>> Do we have a similar race with exec_mmap()? I am looking at exec_mmap()
>>> running parallel to do_exit_flush_lazy_tlb(). We can get
>>>
>>> 	if (current->active_mm == mm) {
>>>
>>> to be true, and if we don't disable irqs around updating
>>> tsk->mm/active_mm we can end up doing mmdrop on the wrong mm?
>>
>> exec_mmap() is called after de_thread(), so there should not be any mm
>> specific invalidations around, I think.
>>
>> Then again, CLONE_VM without CLONE_THREAD might still be possible, so
>> yeah, we probably want IRQs disabled there too, just for consistency
>> and general paranoia if nothing else.
>
> The problem is probably not this TLB flushing race, but I think there
> is a lazy tlb race.

Hmm, is it possible for something to be holding the mm_users when we
exec? That could actually make it a problem for TLB flushing too.

Thanks,
Nick
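[To make that closing question concrete: a kernel-side user can pin
mm_users via mmget_not_zero(), keeping the mm fully alive past
de_thread(). Everything below is hypothetical code, shown only to
illustrate the pattern Nick is worried about -- io_uring's old
use_mm() path took references in this style:

/*
 * Hypothetical example of an mm_users pin surviving across exec.
 * The function and variable names are illustrative, not from any
 * real driver.
 */
static void worker_zap_range(struct mm_struct *mm, unsigned long start,
			     unsigned long end)
{
	struct vm_area_struct *vma;

	if (!mmget_not_zero(mm))
		return;	/* mm already dead, nothing to do */

	/*
	 * mm_users is now pinned, so @mm stays a first-class address
	 * space even if its owner has passed de_thread() and is
	 * exec'ing. The unmap below ends in mm-specific TLB
	 * invalidation -- i.e. flush IPIs for @mm can fire while
	 * exec_mmap() is mid-swap on the owner's CPU.
	 */
	mmap_read_lock(mm);
	vma = find_vma(mm, start);
	if (vma)
		zap_page_range(vma, start, end - start);
	mmap_read_unlock(mm);

	mmput(mm);
}
]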