* [PATCH v3] mm: Fix kthread_use_mm() vs TLB invalidate
From: Peter Zijlstra @ 2020-07-21 15:41 UTC
To: Andrew Morton
Cc: linux-kernel, luto, axboe, keescook, torvalds, jannh, will, hch,
npiggin, mathieu.desnoyers
For SMP systems using IPI based TLB invalidation, looking at
current->active_mm is entirely reasonable. This then presents the
following race condition:
  CPU0                     CPU1

  flush_tlb_mm(mm)         use_mm(mm)
    <send-IPI>
                             tsk->active_mm = mm;
                             <IPI>
                               if (tsk->active_mm == mm)
                                 // flush TLBs
                             </IPI>
                             switch_mm(old_mm,mm,tsk);
Where it is possible the IPI flushed the TLBs for @old_mm, not @mm,
because the IPI lands before we actually switched.
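[ For illustration, the receiving side of such a flush IPI typically
  follows the pattern below. This is a generic sketch, not any
  particular architecture's handler; local_flush_tlb_mm() is a
  placeholder name. ]

        /* Sketch of an IPI-based flush_tlb_mm() receiver. */
        static void ipi_flush_tlb_mm(void *info)
        {
                struct mm_struct *mm = info;

                /*
                 * Racy against kthread_use_mm(): tsk->active_mm may already
                 * read @mm while this CPU still runs on @old_mm's page
                 * tables, so this flushes @old_mm's TLB entries, not @mm's.
                 */
                if (current->active_mm == mm)
                        local_flush_tlb_mm(mm);
        }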
Avoid this by disabling IRQs across changing ->active_mm and
switch_mm().
[ There are all sorts of reasons this might be harmless for various
architecture specific reasons, but best not leave the door open at
all. ]
Cc: stable@kernel.org
Reported-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
Sorry, I dropped the ball on this and only found it because I was
looking at the whole membarrier things vs use_mm().
kernel/kthread.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/kernel/kthread.c b/kernel/kthread.c
index 1d9e2fdfd67a..7221dcbffef3 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -1241,13 +1241,15 @@ void kthread_use_mm(struct mm_struct *mm)
         WARN_ON_ONCE(tsk->mm);
 
         task_lock(tsk);
+        local_irq_disable();
         active_mm = tsk->active_mm;
         if (active_mm != mm) {
                 mmgrab(mm);
                 tsk->active_mm = mm;
         }
         tsk->mm = mm;
-        switch_mm(active_mm, mm, tsk);
+        switch_mm_irqs_off(active_mm, mm, tsk);
+        local_irq_enable();
         task_unlock(tsk);
 #ifdef finish_arch_post_lock_switch
         finish_arch_post_lock_switch();
@@ -1276,9 +1278,11 @@ void kthread_unuse_mm(struct mm_struct *mm)
 
         task_lock(tsk);
         sync_mm_rss(mm);
+        local_irq_disable();
         tsk->mm = NULL;
         /* active_mm is still 'mm' */
         enter_lazy_tlb(mm, tsk);
+        local_irq_enable();
         task_unlock(tsk);
 }
 EXPORT_SYMBOL_GPL(kthread_unuse_mm);
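A note on the switch_mm() -> switch_mm_irqs_off() change: with IRQs now
disabled around the switch, the irq-save dance that switch_mm() performs
on some architectures becomes redundant. On x86, for instance, switch_mm()
was just a wrapper (shown roughly as it looked at the time):

        void switch_mm(struct mm_struct *prev, struct mm_struct *next,
                       struct task_struct *tsk)
        {
                unsigned long flags;

                local_irq_save(flags);
                switch_mm_irqs_off(prev, next, tsk);
                local_irq_restore(flags);
        }

Architectures that do not provide switch_mm_irqs_off() fall back to
switch_mm() via the generic header:

        /* include/linux/mmu_context.h */
        #ifndef switch_mm_irqs_off
        # define switch_mm_irqs_off switch_mm
        #endif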
* Re: [PATCH v3] mm: Fix kthread_use_mm() vs TLB invalidate
From: Andrew Morton @ 2020-07-21 21:06 UTC
To: Peter Zijlstra
Cc: linux-kernel, luto, axboe, keescook, torvalds, jannh, will, hch,
npiggin, mathieu.desnoyers
On Tue, 21 Jul 2020 17:41:06 +0200 Peter Zijlstra <peterz@infradead.org> wrote:
>
> For SMP systems using IPI based TLB invalidation, looking at
> current->active_mm is entirely reasonable. This then presents the
> following race condition:
>
>
> CPU0                     CPU1
>
> flush_tlb_mm(mm)         use_mm(mm)
>   <send-IPI>
>                            tsk->active_mm = mm;
>                            <IPI>
>                              if (tsk->active_mm == mm)
>                                // flush TLBs
>                            </IPI>
>                            switch_mm(old_mm,mm,tsk);
>
>
> Where it is possible the IPI flushed the TLBs for @old_mm, not @mm,
> because the IPI lands before we actually switched.
>
> Avoid this by disabling IRQs across changing ->active_mm and
> switch_mm().
>
> [ There are all sorts of reasons this might be harmless for various
> architecture specific reasons, but best not leave the door open at
> all. ]
Can we give the -stable maintainers (and others) more explanation of
why they might choose to merge this?
> ...
>
> --- a/kernel/kthread.c
> +++ b/kernel/kthread.c
> @@ -1241,13 +1241,15 @@ void kthread_use_mm(struct mm_struct *mm)
>          WARN_ON_ONCE(tsk->mm);
>  
>          task_lock(tsk);
> +        local_irq_disable();
A bare local_irq_disable() is one of those "what the heck is this
protecting" things. It's the new lock_kernel().
So a little comment will help readers to understand why we did it.
Something like this?
--- a/kernel/kthread.c~mm-fix-kthread_use_mm-vs-tlb-invalidate-fix
+++ a/kernel/kthread.c
@@ -1239,6 +1239,7 @@ void kthread_use_mm(struct mm_struct *mm
         WARN_ON_ONCE(tsk->mm);
 
         task_lock(tsk);
+        /* Hold off tlb flush IPIs while switching mm's */
         local_irq_disable();
         active_mm = tsk->active_mm;
         if (active_mm != mm) {
_
>          active_mm = tsk->active_mm;
>          if (active_mm != mm) {
>                  mmgrab(mm);
>                  tsk->active_mm = mm;
>          }
>          tsk->mm = mm;
> -        switch_mm(active_mm, mm, tsk);
> +        switch_mm_irqs_off(active_mm, mm, tsk);
> +        local_irq_enable();
>          task_unlock(tsk);
>  #ifdef finish_arch_post_lock_switch
>          finish_arch_post_lock_switch();
>
> ...
>
* Re: [PATCH v3] mm: Fix kthread_use_mm() vs TLB invalidate
From: Peter Zijlstra @ 2020-07-22 8:35 UTC
To: Andrew Morton
Cc: linux-kernel, luto, axboe, keescook, torvalds, jannh, will, hch,
npiggin, mathieu.desnoyers
On Tue, Jul 21, 2020 at 02:06:23PM -0700, Andrew Morton wrote:
> On Tue, 21 Jul 2020 17:41:06 +0200 Peter Zijlstra <peterz@infradead.org> wrote:
>
> >
> > For SMP systems using IPI based TLB invalidation, looking at
> > current->active_mm is entirely reasonable. This then presents the
> > following race condition:
> >
> >
> > CPU0                     CPU1
> >
> > flush_tlb_mm(mm)         use_mm(mm)
> >   <send-IPI>
> >                            tsk->active_mm = mm;
> >                            <IPI>
> >                              if (tsk->active_mm == mm)
> >                                // flush TLBs
> >                            </IPI>
> >                            switch_mm(old_mm,mm,tsk);
> >
> >
> > Where it is possible the IPI flushed the TLBs for @old_mm, not @mm,
> > because the IPI lands before we actually switched.
> >
> > Avoid this by disabling IRQs across changing ->active_mm and
> > switch_mm().
> >
> > [ There are all sorts of reasons this might be harmless for various
> > architecture specific reasons, but best not leave the door open at
> > all. ]
>
> Can we give the -stable maintainers (and others) more explanation of
> why they might choose to merge this?
Like so then?
---
Subject: mm: Fix kthread_use_mm() vs TLB invalidate
From: Peter Zijlstra <peterz@infradead.org>
Date: Tue, 11 Feb 2020 10:25:19 +0100
For SMP systems using IPI based TLB invalidation, looking at
current->active_mm is entirely reasonable. This then presents the
following race condition:
  CPU0                     CPU1

  flush_tlb_mm(mm)         use_mm(mm)
    <send-IPI>
                             tsk->active_mm = mm;
                             <IPI>
                               if (tsk->active_mm == mm)
                                 // flush TLBs
                             </IPI>
                             switch_mm(old_mm,mm,tsk);
Where it is possible the IPI flushed the TLBs for @old_mm, not @mm,
because the IPI lands before we actually switched.
Avoid this by disabling IRQs across changing ->active_mm and
switch_mm().
Of the (SMP) architectures that have IPI based TLB invalidate:
  Alpha     - checks active_mm
  ARC       - ASID specific
  IA64      - checks active_mm
  MIPS      - ASID specific flush
  OpenRISC  - shoots down world
  PARISC    - shoots down world
  SH        - ASID specific
  SPARC     - ASID specific
  x86       - N/A
  xtensa    - checks active_mm
So at the very least Alpha, IA64 and Xtensa are suspect.
On top of this, for scheduler consistency we need at least preemption
disabled across changing tsk->mm and doing switch_mm(), which is
currently provided by task_lock(), but that's not sufficient for
PREEMPT_RT.
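[ For reference on the PREEMPT_RT point: task_lock() is a plain
  spinlock_t acquisition, and spinlock_t is a sleeping, preemptible
  lock on PREEMPT_RT, so task_lock() no longer implies a
  non-preemptible section there:

        /* include/linux/sched/task.h */
        static inline void task_lock(struct task_struct *p)
        {
                /* rtmutex-based and preemptible on PREEMPT_RT */
                spin_lock(&p->alloc_lock);
        }

  local_irq_disable() restores that guarantee on all configurations. ]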
Reported-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@kernel.org
---
kernel/kthread.c | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -1241,13 +1241,20 @@ void kthread_use_mm(struct mm_struct *mm
         WARN_ON_ONCE(tsk->mm);
 
         task_lock(tsk);
+        /*
+         * Serialize the tsk->mm store and switch_mm() against TLB invalidation
+         * IPIs. Also make sure we're non-preemptible on PREEMPT_RT to not race
+         * against the scheduler writing to these variables.
+         */
+        local_irq_disable();
         active_mm = tsk->active_mm;
         if (active_mm != mm) {
                 mmgrab(mm);
                 tsk->active_mm = mm;
         }
         tsk->mm = mm;
-        switch_mm(active_mm, mm, tsk);
+        switch_mm_irqs_off(active_mm, mm, tsk);
+        local_irq_enable();
         task_unlock(tsk);
 #ifdef finish_arch_post_lock_switch
         finish_arch_post_lock_switch();
@@ -1276,9 +1283,11 @@ void kthread_unuse_mm(struct mm_struct *
 
         task_lock(tsk);
         sync_mm_rss(mm);
+        local_irq_disable();
         tsk->mm = NULL;
         /* active_mm is still 'mm' */
         enter_lazy_tlb(mm, tsk);
+        local_irq_enable();
         task_unlock(tsk);
 }
 EXPORT_SYMBOL_GPL(kthread_unuse_mm);
* Re: [PATCH v3] mm: Fix kthread_use_mm() vs TLB invalidate
From: Nicholas Piggin @ 2020-07-23 7:15 UTC
To: Andrew Morton, Peter Zijlstra
Cc: axboe, hch, jannh, keescook, linux-kernel, luto,
mathieu.desnoyers, torvalds, will
Excerpts from Peter Zijlstra's message of July 22, 2020 6:35 pm:
> On Tue, Jul 21, 2020 at 02:06:23PM -0700, Andrew Morton wrote:
>> On Tue, 21 Jul 2020 17:41:06 +0200 Peter Zijlstra <peterz@infradead.org> wrote:
>>
>> >
>> > For SMP systems using IPI based TLB invalidation, looking at
>> > current->active_mm is entirely reasonable. This then presents the
>> > following race condition:
>> >
>> >
>> > CPU0                     CPU1
>> >
>> > flush_tlb_mm(mm)         use_mm(mm)
>> >   <send-IPI>
>> >                            tsk->active_mm = mm;
>> >                            <IPI>
>> >                              if (tsk->active_mm == mm)
>> >                                // flush TLBs
>> >                            </IPI>
>> >                            switch_mm(old_mm,mm,tsk);
>> >
>> >
>> > Where it is possible the IPI flushed the TLBs for @old_mm, not @mm,
>> > because the IPI lands before we actually switched.
>> >
>> > Avoid this by disabling IRQs across changing ->active_mm and
>> > switch_mm().
>> >
>> > [ There are all sorts of reasons this might be harmless for various
>> > architecture specific reasons, but best not leave the door open at
>> > all. ]
>>
>> Can we give the -stable maintainers (and others) more explanation of
>> why they might choose to merge this?
>
> Like so then?
>
> ---
> Subject: mm: Fix kthread_use_mm() vs TLB invalidate
> From: Peter Zijlstra <peterz@infradead.org>
> Date: Tue, 11 Feb 2020 10:25:19 +0100
>
> For SMP systems using IPI based TLB invalidation, looking at
> current->active_mm is entirely reasonable. This then presents the
> following race condition:
>
>
> CPU0                     CPU1
>
> flush_tlb_mm(mm)         use_mm(mm)
>   <send-IPI>
>                            tsk->active_mm = mm;
>                            <IPI>
>                              if (tsk->active_mm == mm)
>                                // flush TLBs
>                            </IPI>
>                            switch_mm(old_mm,mm,tsk);
>
>
> Where it is possible the IPI flushed the TLBs for @old_mm, not @mm,
> because the IPI lands before we actually switched.
>
> Avoid this by disabling IRQs across changing ->active_mm and
> switch_mm().
>
> Of the (SMP) architectures that have IPI based TLB invalidate:
>
>   Alpha     - checks active_mm
>   ARC       - ASID specific
>   IA64      - checks active_mm
>   MIPS      - ASID specific flush
>   OpenRISC  - shoots down world
>   PARISC    - shoots down world
>   SH        - ASID specific
>   SPARC     - ASID specific
>   x86       - N/A
>   xtensa    - checks active_mm
>
> So at the very least Alpha, IA64 and Xtensa are suspect.
>
> On top of this, for scheduler consistency we need at least preemption
> disabled across changing tsk->mm and doing switch_mm(), which is
> currently provided by task_lock(), but that's not sufficient for
> PREEMPT_RT.
>
> Reported-by: Andy Lutomirski <luto@amacapital.net>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Cc: stable@kernel.org
> ---
> kernel/kthread.c | 11 ++++++++++-
> 1 file changed, 10 insertions(+), 1 deletion(-)
>
> --- a/kernel/kthread.c
> +++ b/kernel/kthread.c
> @@ -1241,13 +1241,20 @@ void kthread_use_mm(struct mm_struct *mm
>          WARN_ON_ONCE(tsk->mm);
>  
>          task_lock(tsk);
> +        /*
> +         * Serialize the tsk->mm store and switch_mm() against TLB invalidation
> +         * IPIs. Also make sure we're non-preemptible on PREEMPT_RT to not race
> +         * against the scheduler writing to these variables.
> +         */
> +        local_irq_disable();
>          active_mm = tsk->active_mm;
>          if (active_mm != mm) {
>                  mmgrab(mm);
>                  tsk->active_mm = mm;
>          }
>          tsk->mm = mm;
> -        switch_mm(active_mm, mm, tsk);
> +        switch_mm_irqs_off(active_mm, mm, tsk);
> +        local_irq_enable();
>          task_unlock(tsk);
>  #ifdef finish_arch_post_lock_switch
>          finish_arch_post_lock_switch();
> @@ -1276,9 +1283,11 @@ void kthread_unuse_mm(struct mm_struct *
>  
>          task_lock(tsk);
>          sync_mm_rss(mm);
> +        local_irq_disable();
>          tsk->mm = NULL;
>          /* active_mm is still 'mm' */
>          enter_lazy_tlb(mm, tsk);
> +        local_irq_enable();
>          task_unlock(tsk);
>  }
>  EXPORT_SYMBOL_GPL(kthread_unuse_mm);
>
Oh good, this is also needed as part of my preferred fix for the
io_uring mmget_not_zero->use_mm() vs mm_cpumask problem:

https://marc.info/?l=linux-mm&m=159520550112106&w=2

I'll try to do arch fixes on top of this (I have the same hunks
locally!). After that, we should be able to allow mmget_not_zero() to
provide first-class references to the mm AFAIKS.
Thanks,
Nick
* Re: [PATCH v3] mm: Fix kthread_use_mm() vs TLB invalidate
From: Aneesh Kumar K.V @ 2020-08-21 5:39 UTC
To: Peter Zijlstra, Andrew Morton
Cc: linux-kernel, luto, axboe, keescook, torvalds, jannh, will, hch,
npiggin, mathieu.desnoyers
Peter Zijlstra <peterz@infradead.org> writes:
> For SMP systems using IPI based TLB invalidation, looking at
> current->active_mm is entirely reasonable. This then presents the
> following race condition:
>
>
> CPU0                     CPU1
>
> flush_tlb_mm(mm)         use_mm(mm)
>   <send-IPI>
>                            tsk->active_mm = mm;
>                            <IPI>
>                              if (tsk->active_mm == mm)
>                                // flush TLBs
>                            </IPI>
>                            switch_mm(old_mm,mm,tsk);
>
>
> Where it is possible the IPI flushed the TLBs for @old_mm, not @mm,
> because the IPI lands before we actually switched.
>
> Avoid this by disabling IRQs across changing ->active_mm and
> switch_mm().
>
> [ There are all sorts of reasons this might be harmless for various
> architecture specific reasons, but best not leave the door open at
> all. ]
Do we have a similar race with exec_mmap()? I am looking at exec_mmap()
running in parallel with do_exit_flush_lazy_tlb(). We can get

	if (current->active_mm == mm) {

evaluating true, and if we don't disable IRQs around updating
tsk->mm/active_mm we can end up doing mmdrop() on the wrong mm?
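For reference, the exec_mmap() sequence in question looked roughly like
this at the time (paraphrased from fs/exec.c, details elided):

        static int exec_mmap(struct mm_struct *mm)
        {
                struct task_struct *tsk = current;
                struct mm_struct *old_mm, *active_mm;

                old_mm = current->mm;
                ...
                task_lock(tsk);
                active_mm = tsk->active_mm;
                tsk->mm = mm;           /* no IRQ disabling around these */
                tsk->active_mm = mm;
                activate_mm(active_mm, mm);
                task_unlock(tsk);
                ...
                if (old_mm) {
                        ...
                        mmput(old_mm);
                        return 0;
                }
                mmdrop(active_mm);      /* the mmdrop() in question */
                return 0;
        }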
>
> Cc: stable@kernel.org
> Reported-by: Andy Lutomirski <luto@amacapital.net>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>
> Sorry, I dropped the ball on this and only found it because I was
> looking at the whole membarrier things vs use_mm().
>
>
> kernel/kthread.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/kthread.c b/kernel/kthread.c
> index 1d9e2fdfd67a..7221dcbffef3 100644
> --- a/kernel/kthread.c
> +++ b/kernel/kthread.c
> @@ -1241,13 +1241,15 @@ void kthread_use_mm(struct mm_struct *mm)
>          WARN_ON_ONCE(tsk->mm);
>  
>          task_lock(tsk);
> +        local_irq_disable();
>          active_mm = tsk->active_mm;
>          if (active_mm != mm) {
>                  mmgrab(mm);
>                  tsk->active_mm = mm;
>          }
>          tsk->mm = mm;
> -        switch_mm(active_mm, mm, tsk);
> +        switch_mm_irqs_off(active_mm, mm, tsk);
> +        local_irq_enable();
>          task_unlock(tsk);
>  #ifdef finish_arch_post_lock_switch
>          finish_arch_post_lock_switch();
> @@ -1276,9 +1278,11 @@ void kthread_unuse_mm(struct mm_struct *mm)
>  
>          task_lock(tsk);
>          sync_mm_rss(mm);
> +        local_irq_disable();
>          tsk->mm = NULL;
>          /* active_mm is still 'mm' */
>          enter_lazy_tlb(mm, tsk);
> +        local_irq_enable();
>          task_unlock(tsk);
>  }
>  EXPORT_SYMBOL_GPL(kthread_unuse_mm);
* Re: [PATCH v3] mm: Fix kthread_use_mm() vs TLB invalidate
From: peterz @ 2020-08-21 13:04 UTC
To: Aneesh Kumar K.V
Cc: Andrew Morton, linux-kernel, luto, axboe, keescook, torvalds,
jannh, will, hch, npiggin, mathieu.desnoyers
On Fri, Aug 21, 2020 at 11:09:51AM +0530, Aneesh Kumar K.V wrote:
> Peter Zijlstra <peterz@infradead.org> writes:
>
> > For SMP systems using IPI based TLB invalidation, looking at
> > current->active_mm is entirely reasonable. This then presents the
> > following race condition:
> >
> >
> > CPU0                     CPU1
> >
> > flush_tlb_mm(mm)         use_mm(mm)
> >   <send-IPI>
> >                            tsk->active_mm = mm;
> >                            <IPI>
> >                              if (tsk->active_mm == mm)
> >                                // flush TLBs
> >                            </IPI>
> >                            switch_mm(old_mm,mm,tsk);
> >
> >
> > Where it is possible the IPI flushed the TLBs for @old_mm, not @mm,
> > because the IPI lands before we actually switched.
> >
> > Avoid this by disabling IRQs across changing ->active_mm and
> > switch_mm().
> >
> > [ There are all sorts of reasons this might be harmless for various
> > architecture specific reasons, but best not leave the door open at
> > all. ]
>
>
> Do we have a similar race with exec_mmap()? I am looking at exec_mmap()
> running in parallel with do_exit_flush_lazy_tlb(). We can get
>
> if (current->active_mm == mm) {
>
> evaluating true, and if we don't disable IRQs around updating
> tsk->mm/active_mm we can end up doing mmdrop() on the wrong mm?
exec_mmap() is called after de_thread(), so there should not be any
mm-specific invalidations around, I think.
Then again, CLONE_VM without CLONE_THREAD might still be possible, so
yeah, we probably want IRQs disabled there too, just for consistency and
general paranoia if nothing else.
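(To make the CLONE_VM-without-CLONE_THREAD case concrete, a hypothetical
userspace sketch; the child shares the mm but is not in the execing
task's thread group, so de_thread() does not wait for it:

        #define _GNU_SOURCE
        #include <sched.h>
        #include <signal.h>

        static int sibling_fn(void *arg)
        {
                /* runs on the shared mm, can trigger flush_tlb_mm() IPIs */
                return 0;
        }

        /* stack_top: top of a stack allocated for the child (assumed) */
        pid_t pid = clone(sibling_fn, stack_top, CLONE_VM | SIGCHLD, NULL);
)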
* Re: [PATCH v3] mm: Fix kthread_use_mm() vs TLB invalidate
From: Nicholas Piggin @ 2020-08-28 3:26 UTC
To: Aneesh Kumar K.V, peterz, linux-arch
Cc: Andrew Morton, axboe, hch, jannh, keescook, linux-kernel, luto,
mathieu.desnoyers, torvalds, will
Excerpts from peterz@infradead.org's message of August 21, 2020 11:04 pm:
> On Fri, Aug 21, 2020 at 11:09:51AM +0530, Aneesh Kumar K.V wrote:
>> Peter Zijlstra <peterz@infradead.org> writes:
>>
>> > For SMP systems using IPI based TLB invalidation, looking at
>> > current->active_mm is entirely reasonable. This then presents the
>> > following race condition:
>> >
>> >
>> > CPU0                     CPU1
>> >
>> > flush_tlb_mm(mm)         use_mm(mm)
>> >   <send-IPI>
>> >                            tsk->active_mm = mm;
>> >                            <IPI>
>> >                              if (tsk->active_mm == mm)
>> >                                // flush TLBs
>> >                            </IPI>
>> >                            switch_mm(old_mm,mm,tsk);
>> >
>> >
>> > Where it is possible the IPI flushed the TLBs for @old_mm, not @mm,
>> > because the IPI lands before we actually switched.
>> >
>> > Avoid this by disabling IRQs across changing ->active_mm and
>> > switch_mm().
>> >
>> > [ There are all sorts of reasons this might be harmless for various
>> > architecture specific reasons, but best not leave the door open at
>> > all. ]
>>
>>
>> Do we have a similar race with exec_mmap()? I am looking at exec_mmap()
>> running in parallel with do_exit_flush_lazy_tlb(). We can get
>>
>> if (current->active_mm == mm) {
>>
>> evaluating true, and if we don't disable IRQs around updating
>> tsk->mm/active_mm we can end up doing mmdrop() on the wrong mm?
>
> exec_mmap() is called after de_thread(), so there should not be any
> mm-specific invalidations around, I think.
>
> Then again, CLONE_VM without CLONE_THREAD might still be possible, so
> yeah, we probably want IRQs disabled there too, just for consistency and
> general paranoia if nothing else.
The problem is probably not this TLB flushing race, but I think there
is a lazy tlb race.
  call_usermodehelper()
    kernel_execve()
      old_mm = current->mm;
      active_mm = current->active_mm;
      *** preempt *** ---------------------->schedule()
                                               prev->active_mm = NULL;
                                               mmdrop(prev active mm);
                                               ...
                      <----------------------schedule()
      current->mm = mm;
      current->active_mm = mm;
      if (!old_mm)
          mmdrop(active_mm); /* double free! */
There are possibly other problematic interleavings. powerpc also has an
issue with switching away a lazy tlb mm via IPI, which is basically the
same problem, so I just illustrate the more general issue.
I think we should just make it a rule that these always get updated under
local_irq_disable(), to be safe.
The trouble is we can't just do it, because some architectures can't do
activate_mm() with IRQs disabled. ARM and UM, at least; UM can't even
do it with preemption disabled. We can probably change them to make
them work. I'm not sure what the best way to go is; my first attempt is
to require activate_mm() to do the mm switching and the IRQ disabling
as well, but I'll need some help from the archs.
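A rough sketch of that rule at an exec_mmap()-style site (hypothetical;
it ignores the ARM/UM complication above and assumes activate_mm() can
run with IRQs disabled):

        task_lock(tsk);
        local_irq_disable();
        active_mm = tsk->active_mm;
        tsk->mm = mm;
        tsk->active_mm = mm;
        activate_mm(active_mm, mm);     /* would have to be IRQs-off safe */
        local_irq_enable();
        task_unlock(tsk);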
I'll send out RFCs in a minute.
Thanks,
Nick
* Re: [PATCH v3] mm: Fix kthread_use_mm() vs TLB invalidate
From: Nicholas Piggin @ 2020-08-28 6:55 UTC
To: Aneesh Kumar K.V, linux-arch, peterz
Cc: Andrew Morton, axboe, hch, jannh, keescook, linux-kernel, luto,
mathieu.desnoyers, torvalds, will
Excerpts from Nicholas Piggin's message of August 28, 2020 1:26 pm:
> Excerpts from peterz@infradead.org's message of August 21, 2020 11:04 pm:
>> On Fri, Aug 21, 2020 at 11:09:51AM +0530, Aneesh Kumar K.V wrote:
>>> Peter Zijlstra <peterz@infradead.org> writes:
>>>
>>> > For SMP systems using IPI based TLB invalidation, looking at
>>> > current->active_mm is entirely reasonable. This then presents the
>>> > following race condition:
>>> >
>>> >
>>> > CPU0                     CPU1
>>> >
>>> > flush_tlb_mm(mm)         use_mm(mm)
>>> >   <send-IPI>
>>> >                            tsk->active_mm = mm;
>>> >                            <IPI>
>>> >                              if (tsk->active_mm == mm)
>>> >                                // flush TLBs
>>> >                            </IPI>
>>> >                            switch_mm(old_mm,mm,tsk);
>>> >
>>> >
>>> > Where it is possible the IPI flushed the TLBs for @old_mm, not @mm,
>>> > because the IPI lands before we actually switched.
>>> >
>>> > Avoid this by disabling IRQs across changing ->active_mm and
>>> > switch_mm().
>>> >
>>> > [ There are all sorts of reasons this might be harmless for various
>>> > architecture specific reasons, but best not leave the door open at
>>> > all. ]
>>>
>>>
>>> Do we have a similar race with exec_mmap()? I am looking at exec_mmap()
>>> running in parallel with do_exit_flush_lazy_tlb(). We can get
>>>
>>> if (current->active_mm == mm) {
>>>
>>> evaluating true, and if we don't disable IRQs around updating
>>> tsk->mm/active_mm we can end up doing mmdrop() on the wrong mm?
>>
>> exec_mmap() is called after de_thread(), so there should not be any
>> mm-specific invalidations around, I think.
>>
>> Then again, CLONE_VM without CLONE_THREAD might still be possible, so
>> yeah, we probably want IRQs disabled there too, just for consistency and
>> general paranoia if nothing else.
>
> The problem is probably not this TLB flushing race, but I think there
> is a lazy tlb race.
Hmm, is it possible for something to be holding a reference on
mm_users when we exec? That could actually make it a problem for TLB
flushing too.
Thanks,
Nick