* [PATCH v3] mm: Fix kthread_use_mm() vs TLB invalidate
@ 2020-07-21 15:41 Peter Zijlstra
  2020-07-21 21:06 ` Andrew Morton
  2020-08-21  5:39 ` Aneesh Kumar K.V
  0 siblings, 2 replies; 8+ messages in thread
From: Peter Zijlstra @ 2020-07-21 15:41 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, luto, axboe, keescook, torvalds, jannh, will, hch,
	npiggin, mathieu.desnoyers


For SMP systems using IPI based TLB invalidation, looking at
current->active_mm is entirely reasonable. This then presents the
following race condition:


  CPU0			CPU1

  flush_tlb_mm(mm)	use_mm(mm)
    <send-IPI>
			  tsk->active_mm = mm;
			  <IPI>
			    if (tsk->active_mm == mm)
			      // flush TLBs
			  </IPI>
			  switch_mm(old_mm,mm,tsk);


Where it is possible the IPI flushed the TLBs for @old_mm, not @mm,
because the IPI lands before we actually switched.

Avoid this by disabling IRQs across changing ->active_mm and
switch_mm().

[ This might turn out to be harmless for various architecture-specific
reasons, but best not leave the door open at all. ]

Cc: stable@kernel.org
Reported-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---

Sorry, I dropped the ball on this and only found it because I was
looking at the whole membarrier thing vs use_mm().


 kernel/kthread.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/kernel/kthread.c b/kernel/kthread.c
index 1d9e2fdfd67a..7221dcbffef3 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -1241,13 +1241,15 @@ void kthread_use_mm(struct mm_struct *mm)
 	WARN_ON_ONCE(tsk->mm);
 
 	task_lock(tsk);
+	local_irq_disable();
 	active_mm = tsk->active_mm;
 	if (active_mm != mm) {
 		mmgrab(mm);
 		tsk->active_mm = mm;
 	}
 	tsk->mm = mm;
-	switch_mm(active_mm, mm, tsk);
+	switch_mm_irqs_off(active_mm, mm, tsk);
+	local_irq_enable();
 	task_unlock(tsk);
 #ifdef finish_arch_post_lock_switch
 	finish_arch_post_lock_switch();
@@ -1276,9 +1278,11 @@ void kthread_unuse_mm(struct mm_struct *mm)
 
 	task_lock(tsk);
 	sync_mm_rss(mm);
+	local_irq_disable();
 	tsk->mm = NULL;
 	/* active_mm is still 'mm' */
 	enter_lazy_tlb(mm, tsk);
+	local_irq_enable();
 	task_unlock(tsk);
 }
 EXPORT_SYMBOL_GPL(kthread_unuse_mm);
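
The interleaving in the commit message can be replayed deterministically in a small userspace model. This is only a sketch, not kernel code: `toy_mm`, `toy_cpu`, `deliver_ipi()` and `toy_use_mm_leaves_stale_tlb()` are hypothetical names, and the IPI handler is assumed to behave like the "checks active_mm" architectures, i.e. it flushes whatever context the hardware currently has loaded whenever `active_mm` matches the requested mm.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only; none of these types or helpers exist in the kernel. */
struct toy_mm {
	bool tlb_stale;			/* this CPU holds stale entries for the mm */
};

struct toy_cpu {
	struct toy_mm *active_mm;	/* what the IPI handler inspects */
	struct toy_mm *loaded_mm;	/* what the MMU really has loaded */
	struct toy_mm *pending_ipi;	/* flush request not yet delivered */
	bool irqs_off;
};

/* Assumed handler: "if (tsk->active_mm == mm) flush TLBs". */
static void deliver_ipi(struct toy_cpu *cpu)
{
	struct toy_mm *mm = cpu->pending_ipi;

	if (!mm || cpu->irqs_off)
		return;			/* nothing pending, or masked */
	cpu->pending_ipi = NULL;

	if (cpu->active_mm == mm)
		cpu->loaded_mm->tlb_stale = false; /* flushes the loaded context */
	else
		mm->tlb_stale = false;	/* modelled as: drop the mm's context */
}

/* Returns true if CPU1 ends up running on mm with stale TLB entries. */
bool toy_use_mm_leaves_stale_tlb(bool disable_irqs)
{
	struct toy_mm old_mm = { .tlb_stale = false };
	struct toy_mm mm = { .tlb_stale = false };
	struct toy_cpu cpu = { .active_mm = &old_mm, .loaded_mm = &old_mm };

	/* CPU0: flush_tlb_mm(mm), the IPI is now in flight. */
	mm.tlb_stale = true;
	cpu.pending_ipi = &mm;

	/* CPU1: use_mm(mm). */
	cpu.irqs_off = disable_irqs;	/* local_irq_disable() in the patch */
	cpu.active_mm = &mm;		/* tsk->active_mm = mm */
	deliver_ipi(&cpu);		/* IPI lands in the window */
	cpu.loaded_mm = &mm;		/* switch_mm(old_mm, mm, tsk) */
	cpu.irqs_off = false;		/* local_irq_enable() */
	deliver_ipi(&cpu);		/* a masked IPI is delivered here */

	assert(cpu.loaded_mm == &mm);
	return cpu.loaded_mm->tlb_stale;
}
```

Calling `toy_use_mm_leaves_stale_tlb(false)` replays the racy ordering, where the flush hits `old_mm`; passing `true` models the patched ordering, where the IPI is held off until after the switch and the flush hits `mm`.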


* Re: [PATCH v3] mm: Fix kthread_use_mm() vs TLB invalidate
  2020-07-21 15:41 [PATCH v3] mm: Fix kthread_use_mm() vs TLB invalidate Peter Zijlstra
@ 2020-07-21 21:06 ` Andrew Morton
  2020-07-22  8:35   ` Peter Zijlstra
  2020-08-21  5:39 ` Aneesh Kumar K.V
  1 sibling, 1 reply; 8+ messages in thread
From: Andrew Morton @ 2020-07-21 21:06 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, luto, axboe, keescook, torvalds, jannh, will, hch,
	npiggin, mathieu.desnoyers

On Tue, 21 Jul 2020 17:41:06 +0200 Peter Zijlstra <peterz@infradead.org> wrote:

> 
> For SMP systems using IPI based TLB invalidation, looking at
> current->active_mm is entirely reasonable. This then presents the
> following race condition:
> 
> 
>   CPU0			CPU1
> 
>   flush_tlb_mm(mm)	use_mm(mm)
>     <send-IPI>
> 			  tsk->active_mm = mm;
> 			  <IPI>
> 			    if (tsk->active_mm == mm)
> 			      // flush TLBs
> 			  </IPI>
> 			  switch_mm(old_mm,mm,tsk);
> 
> 
> Where it is possible the IPI flushed the TLBs for @old_mm, not @mm,
> because the IPI lands before we actually switched.
> 
> Avoid this by disabling IRQs across changing ->active_mm and
> switch_mm().
> 
> [ There are all sorts of reasons this might be harmless for various
> architecture specific reasons, but best not leave the door open at
> all. ]

Can we give the -stable maintainers (and others) more explanation of
why they might choose to merge this?

> ...
>
> --- a/kernel/kthread.c
> +++ b/kernel/kthread.c
> @@ -1241,13 +1241,15 @@ void kthread_use_mm(struct mm_struct *mm)
>  	WARN_ON_ONCE(tsk->mm);
>  
>  	task_lock(tsk);
> +	local_irq_disable();

A bare local_irq_disable() is one of those "what the heck is this
protecting" things.  It's the new lock_kernel().

So a little comment will help readers to understand why we did it. 
Something like this?

--- a/kernel/kthread.c~mm-fix-kthread_use_mm-vs-tlb-invalidate-fix
+++ a/kernel/kthread.c
@@ -1239,6 +1239,7 @@ void kthread_use_mm(struct mm_struct *mm
 	WARN_ON_ONCE(tsk->mm);
 
 	task_lock(tsk);
+	/* Hold off tlb flush IPIs while switching mm's */
 	local_irq_disable();
 	active_mm = tsk->active_mm;
 	if (active_mm != mm) {
_

>  	active_mm = tsk->active_mm;
>  	if (active_mm != mm) {
>  		mmgrab(mm);
>  		tsk->active_mm = mm;
>  	}
>  	tsk->mm = mm;
> -	switch_mm(active_mm, mm, tsk);
> +	switch_mm_irqs_off(active_mm, mm, tsk);
> +	local_irq_enable();
>  	task_unlock(tsk);
>  #ifdef finish_arch_post_lock_switch
>  	finish_arch_post_lock_switch();
>
> ...
>


* Re: [PATCH v3] mm: Fix kthread_use_mm() vs TLB invalidate
  2020-07-21 21:06 ` Andrew Morton
@ 2020-07-22  8:35   ` Peter Zijlstra
  2020-07-23  7:15     ` Nicholas Piggin
  0 siblings, 1 reply; 8+ messages in thread
From: Peter Zijlstra @ 2020-07-22  8:35 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, luto, axboe, keescook, torvalds, jannh, will, hch,
	npiggin, mathieu.desnoyers

On Tue, Jul 21, 2020 at 02:06:23PM -0700, Andrew Morton wrote:
> On Tue, 21 Jul 2020 17:41:06 +0200 Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > 
> > For SMP systems using IPI based TLB invalidation, looking at
> > current->active_mm is entirely reasonable. This then presents the
> > following race condition:
> > 
> > 
> >   CPU0			CPU1
> > 
> >   flush_tlb_mm(mm)	use_mm(mm)
> >     <send-IPI>
> > 			  tsk->active_mm = mm;
> > 			  <IPI>
> > 			    if (tsk->active_mm == mm)
> > 			      // flush TLBs
> > 			  </IPI>
> > 			  switch_mm(old_mm,mm,tsk);
> > 
> > 
> > Where it is possible the IPI flushed the TLBs for @old_mm, not @mm,
> > because the IPI lands before we actually switched.
> > 
> > Avoid this by disabling IRQs across changing ->active_mm and
> > switch_mm().
> > 
> > [ There are all sorts of reasons this might be harmless for various
> > architecture specific reasons, but best not leave the door open at
> > all. ]
> 
> Can we give the -stable maintainers (and others) more explanation of
> why they might choose to merge this?

Like so then?

---
Subject: mm: Fix kthread_use_mm() vs TLB invalidate
From: Peter Zijlstra <peterz@infradead.org>
Date: Tue, 11 Feb 2020 10:25:19 +0100

For SMP systems using IPI based TLB invalidation, looking at
current->active_mm is entirely reasonable. This then presents the
following race condition:


  CPU0			CPU1

  flush_tlb_mm(mm)	use_mm(mm)
    <send-IPI>
			  tsk->active_mm = mm;
			  <IPI>
			    if (tsk->active_mm == mm)
			      // flush TLBs
			  </IPI>
			  switch_mm(old_mm,mm,tsk);


Where it is possible the IPI flushed the TLBs for @old_mm, not @mm,
because the IPI lands before we actually switched.

Avoid this by disabling IRQs across changing ->active_mm and
switch_mm().

Of the (SMP) architectures that have IPI based TLB invalidate:

  Alpha    - checks active_mm
  ARC      - ASID specific
  IA64     - checks active_mm
  MIPS     - ASID specific flush
  OpenRISC - shoots down world
  PARISC   - shoots down world
  SH       - ASID specific
  SPARC    - ASID specific
  x86      - N/A
  xtensa   - checks active_mm

So at the very least Alpha, IA64 and Xtensa are suspect.

On top of this, for scheduler consistency we need at least preemption
disabled across changing tsk->mm and doing switch_mm(), which is
currently provided by task_lock(), but that's not sufficient for
PREEMPT_RT.

Reported-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@kernel.org
---
 kernel/kthread.c |   11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -1241,13 +1241,20 @@ void kthread_use_mm(struct mm_struct *mm
 	WARN_ON_ONCE(tsk->mm);
 
 	task_lock(tsk);
+	/*
+	 * Serialize the tsk->mm store and switch_mm() against TLB invalidation
+	 * IPIs. Also make sure we're non-preemptible on PREEMPT_RT to not race
+	 * against the scheduler writing to these variables.
+	 */
+	local_irq_disable();
 	active_mm = tsk->active_mm;
 	if (active_mm != mm) {
 		mmgrab(mm);
 		tsk->active_mm = mm;
 	}
 	tsk->mm = mm;
-	switch_mm(active_mm, mm, tsk);
+	switch_mm_irqs_off(active_mm, mm, tsk);
+	local_irq_enable();
 	task_unlock(tsk);
 #ifdef finish_arch_post_lock_switch
 	finish_arch_post_lock_switch();
@@ -1276,9 +1283,11 @@ void kthread_unuse_mm(struct mm_struct *
 
 	task_lock(tsk);
 	sync_mm_rss(mm);
+	local_irq_disable();
 	tsk->mm = NULL;
 	/* active_mm is still 'mm' */
 	enter_lazy_tlb(mm, tsk);
+	local_irq_enable();
 	task_unlock(tsk);
 }
 EXPORT_SYMBOL_GPL(kthread_unuse_mm);


* Re: [PATCH v3] mm: Fix kthread_use_mm() vs TLB invalidate
  2020-07-22  8:35   ` Peter Zijlstra
@ 2020-07-23  7:15     ` Nicholas Piggin
  0 siblings, 0 replies; 8+ messages in thread
From: Nicholas Piggin @ 2020-07-23  7:15 UTC (permalink / raw)
  To: Andrew Morton, Peter Zijlstra
  Cc: axboe, hch, jannh, keescook, linux-kernel, luto,
	mathieu.desnoyers, torvalds, will

Excerpts from Peter Zijlstra's message of July 22, 2020 6:35 pm:
> On Tue, Jul 21, 2020 at 02:06:23PM -0700, Andrew Morton wrote:
>> On Tue, 21 Jul 2020 17:41:06 +0200 Peter Zijlstra <peterz@infradead.org> wrote:
>> 
>> > 
>> > For SMP systems using IPI based TLB invalidation, looking at
>> > current->active_mm is entirely reasonable. This then presents the
>> > following race condition:
>> > 
>> > 
>> >   CPU0			CPU1
>> > 
>> >   flush_tlb_mm(mm)	use_mm(mm)
>> >     <send-IPI>
>> > 			  tsk->active_mm = mm;
>> > 			  <IPI>
>> > 			    if (tsk->active_mm == mm)
>> > 			      // flush TLBs
>> > 			  </IPI>
>> > 			  switch_mm(old_mm,mm,tsk);
>> > 
>> > 
>> > Where it is possible the IPI flushed the TLBs for @old_mm, not @mm,
>> > because the IPI lands before we actually switched.
>> > 
>> > Avoid this by disabling IRQs across changing ->active_mm and
>> > switch_mm().
>> > 
>> > [ There are all sorts of reasons this might be harmless for various
>> > architecture specific reasons, but best not leave the door open at
>> > all. ]
>> 
>> Can we give the -stable maintainers (and others) more explanation of
>> why they might choose to merge this?
> 
> Like so then?
> 
> ---
> Subject: mm: Fix kthread_use_mm() vs TLB invalidate
> From: Peter Zijlstra <peterz@infradead.org>
> Date: Tue, 11 Feb 2020 10:25:19 +0100
> 
> For SMP systems using IPI based TLB invalidation, looking at
> current->active_mm is entirely reasonable. This then presents the
> following race condition:
> 
> 
>   CPU0			CPU1
> 
>   flush_tlb_mm(mm)	use_mm(mm)
>     <send-IPI>
> 			  tsk->active_mm = mm;
> 			  <IPI>
> 			    if (tsk->active_mm == mm)
> 			      // flush TLBs
> 			  </IPI>
> 			  switch_mm(old_mm,mm,tsk);
> 
> 
> Where it is possible the IPI flushed the TLBs for @old_mm, not @mm,
> because the IPI lands before we actually switched.
> 
> Avoid this by disabling IRQs across changing ->active_mm and
> switch_mm().
> 
> Of the (SMP) architectures that have IPI based TLB invalidate:
> 
>   Alpha    - checks active_mm
>   ARC      - ASID specific
>   IA64     - checks active_mm
>   MIPS     - ASID specific flush
>   OpenRISC - shoots down world
>   PARISC   - shoots down world
>   SH       - ASID specific
>   SPARC    - ASID specific
>   x86      - N/A
>   xtensa   - checks active_mm
> 
> So at the very least Alpha, IA64 and Xtensa are suspect.
> 
> On top of this, for scheduler consistency we need at least preemption
> disabled across changing tsk->mm and doing switch_mm(), which is
> currently provided by task_lock(), but that's not sufficient for
> PREEMPT_RT.
> 
> Reported-by: Andy Lutomirski <luto@amacapital.net>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Cc: stable@kernel.org
> ---
>  kernel/kthread.c |   11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
> 
> --- a/kernel/kthread.c
> +++ b/kernel/kthread.c
> @@ -1241,13 +1241,20 @@ void kthread_use_mm(struct mm_struct *mm
>  	WARN_ON_ONCE(tsk->mm);
>  
>  	task_lock(tsk);
> +	/*
> +	 * Serialize the tsk->mm store and switch_mm() against TLB invalidation
> +	 * IPIs. Also make sure we're non-preemptible on PREEMPT_RT to not race
> +	 * against the scheduler writing to these variables.
> +	 */
> +	local_irq_disable();
>  	active_mm = tsk->active_mm;
>  	if (active_mm != mm) {
>  		mmgrab(mm);
>  		tsk->active_mm = mm;
>  	}
>  	tsk->mm = mm;
> -	switch_mm(active_mm, mm, tsk);
> +	switch_mm_irqs_off(active_mm, mm, tsk);
> +	local_irq_enable();
>  	task_unlock(tsk);
>  #ifdef finish_arch_post_lock_switch
>  	finish_arch_post_lock_switch();
> @@ -1276,9 +1283,11 @@ void kthread_unuse_mm(struct mm_struct *
>  
>  	task_lock(tsk);
>  	sync_mm_rss(mm);
> +	local_irq_disable();
>  	tsk->mm = NULL;
>  	/* active_mm is still 'mm' */
>  	enter_lazy_tlb(mm, tsk);
> +	local_irq_enable();
>  	task_unlock(tsk);
>  }
>  EXPORT_SYMBOL_GPL(kthread_unuse_mm);
> 

Oh good, this is also needed as part of my preferred fix for the
io_uring mmget_not_zero()->use_mm() vs mm_cpumask problem:

https://marc.info/?l=linux-mm&m=159520550112106&w=2

I'll try to do arch fixes on top of this (I have the same hunks
locally!). After that, we should be able to allow mmget_not_zero
to be first class references to mm AFAIKS.

Thanks,
Nick


* Re: [PATCH v3] mm: Fix kthread_use_mm() vs TLB invalidate
  2020-07-21 15:41 [PATCH v3] mm: Fix kthread_use_mm() vs TLB invalidate Peter Zijlstra
  2020-07-21 21:06 ` Andrew Morton
@ 2020-08-21  5:39 ` Aneesh Kumar K.V
  2020-08-21 13:04   ` peterz
  1 sibling, 1 reply; 8+ messages in thread
From: Aneesh Kumar K.V @ 2020-08-21  5:39 UTC (permalink / raw)
  To: Peter Zijlstra, Andrew Morton
  Cc: linux-kernel, luto, axboe, keescook, torvalds, jannh, will, hch,
	npiggin, mathieu.desnoyers

Peter Zijlstra <peterz@infradead.org> writes:

> For SMP systems using IPI based TLB invalidation, looking at
> current->active_mm is entirely reasonable. This then presents the
> following race condition:
>
>
>   CPU0			CPU1
>
>   flush_tlb_mm(mm)	use_mm(mm)
>     <send-IPI>
> 			  tsk->active_mm = mm;
> 			  <IPI>
> 			    if (tsk->active_mm == mm)
> 			      // flush TLBs
> 			  </IPI>
> 			  switch_mm(old_mm,mm,tsk);
>
>
> Where it is possible the IPI flushed the TLBs for @old_mm, not @mm,
> because the IPI lands before we actually switched.
>
> Avoid this by disabling IRQs across changing ->active_mm and
> switch_mm().
>
> [ There are all sorts of reasons this might be harmless for various
> architecture specific reasons, but best not leave the door open at
> all. ]


Do we have a similar race with exec_mmap()? I am looking at exec_mmap()
running in parallel with do_exit_flush_lazy_tlb(). We can get

	if (current->active_mm == mm) {

true, and if we don't disable IRQs around updating tsk->mm/active_mm we
can end up doing mmdrop() on the wrong mm?

>
> Cc: stable@kernel.org
> Reported-by: Andy Lutomirski <luto@amacapital.net>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>
> Sorry, I dropped the ball on this and only found it because I was
> looking at the whole membarrier things vs use_mm().
>
>
>  kernel/kthread.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/kthread.c b/kernel/kthread.c
> index 1d9e2fdfd67a..7221dcbffef3 100644
> --- a/kernel/kthread.c
> +++ b/kernel/kthread.c
> @@ -1241,13 +1241,15 @@ void kthread_use_mm(struct mm_struct *mm)
>  	WARN_ON_ONCE(tsk->mm);
>  
>  	task_lock(tsk);
> +	local_irq_disable();
>  	active_mm = tsk->active_mm;
>  	if (active_mm != mm) {
>  		mmgrab(mm);
>  		tsk->active_mm = mm;
>  	}
>  	tsk->mm = mm;
> -	switch_mm(active_mm, mm, tsk);
> +	switch_mm_irqs_off(active_mm, mm, tsk);
> +	local_irq_enable();
>  	task_unlock(tsk);
>  #ifdef finish_arch_post_lock_switch
>  	finish_arch_post_lock_switch();
> @@ -1276,9 +1278,11 @@ void kthread_unuse_mm(struct mm_struct *mm)
>  
>  	task_lock(tsk);
>  	sync_mm_rss(mm);
> +	local_irq_disable();
>  	tsk->mm = NULL;
>  	/* active_mm is still 'mm' */
>  	enter_lazy_tlb(mm, tsk);
> +	local_irq_enable();
>  	task_unlock(tsk);
>  }
>  EXPORT_SYMBOL_GPL(kthread_unuse_mm);


* Re: [PATCH v3] mm: Fix kthread_use_mm() vs TLB invalidate
  2020-08-21  5:39 ` Aneesh Kumar K.V
@ 2020-08-21 13:04   ` peterz
  2020-08-28  3:26     ` Nicholas Piggin
  0 siblings, 1 reply; 8+ messages in thread
From: peterz @ 2020-08-21 13:04 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: Andrew Morton, linux-kernel, luto, axboe, keescook, torvalds,
	jannh, will, hch, npiggin, mathieu.desnoyers

On Fri, Aug 21, 2020 at 11:09:51AM +0530, Aneesh Kumar K.V wrote:
> Peter Zijlstra <peterz@infradead.org> writes:
> 
> > For SMP systems using IPI based TLB invalidation, looking at
> > current->active_mm is entirely reasonable. This then presents the
> > following race condition:
> >
> >
> >   CPU0			CPU1
> >
> >   flush_tlb_mm(mm)	use_mm(mm)
> >     <send-IPI>
> > 			  tsk->active_mm = mm;
> > 			  <IPI>
> > 			    if (tsk->active_mm == mm)
> > 			      // flush TLBs
> > 			  </IPI>
> > 			  switch_mm(old_mm,mm,tsk);
> >
> >
> > Where it is possible the IPI flushed the TLBs for @old_mm, not @mm,
> > because the IPI lands before we actually switched.
> >
> > Avoid this by disabling IRQs across changing ->active_mm and
> > switch_mm().
> >
> > [ There are all sorts of reasons this might be harmless for various
> > architecture specific reasons, but best not leave the door open at
> > all. ]
> 
> 
> Do we have a similar race with exec_mmap()? I am looking at exec_mmap()
> running in parallel with do_exit_flush_lazy_tlb(). We can get
> 
> 	if (current->active_mm == mm) {
> 
> true, and if we don't disable IRQs around updating tsk->mm/active_mm we
> can end up doing mmdrop() on the wrong mm?

exec_mmap() is called after de_thread(), there should not be any mm
specific invalidations around I think.

Then again, CLONE_VM without CLONE_THREAD might still be possible, so
yeah, we probably want IRQs disabled there too, just for consistency and
general paranoia if nothing else.



* Re: [PATCH v3] mm: Fix kthread_use_mm() vs TLB invalidate
  2020-08-21 13:04   ` peterz
@ 2020-08-28  3:26     ` Nicholas Piggin
  2020-08-28  6:55       ` Nicholas Piggin
  0 siblings, 1 reply; 8+ messages in thread
From: Nicholas Piggin @ 2020-08-28  3:26 UTC (permalink / raw)
  To: Aneesh Kumar K.V, peterz, linux-arch
  Cc: Andrew Morton, axboe, hch, jannh, keescook, linux-kernel, luto,
	mathieu.desnoyers, torvalds, will

Excerpts from peterz@infradead.org's message of August 21, 2020 11:04 pm:
> On Fri, Aug 21, 2020 at 11:09:51AM +0530, Aneesh Kumar K.V wrote:
>> Peter Zijlstra <peterz@infradead.org> writes:
>> 
>> > For SMP systems using IPI based TLB invalidation, looking at
>> > current->active_mm is entirely reasonable. This then presents the
>> > following race condition:
>> >
>> >
>> >   CPU0			CPU1
>> >
>> >   flush_tlb_mm(mm)	use_mm(mm)
>> >     <send-IPI>
>> > 			  tsk->active_mm = mm;
>> > 			  <IPI>
>> > 			    if (tsk->active_mm == mm)
>> > 			      // flush TLBs
>> > 			  </IPI>
>> > 			  switch_mm(old_mm,mm,tsk);
>> >
>> >
>> > Where it is possible the IPI flushed the TLBs for @old_mm, not @mm,
>> > because the IPI lands before we actually switched.
>> >
>> > Avoid this by disabling IRQs across changing ->active_mm and
>> > switch_mm().
>> >
>> > [ There are all sorts of reasons this might be harmless for various
>> > architecture specific reasons, but best not leave the door open at
>> > all. ]
>> 
>> 
>> Do we have a similar race with exec_mmap()? I am looking at exec_mmap()
>> running in parallel with do_exit_flush_lazy_tlb(). We can get
>> 
>> 	if (current->active_mm == mm) {
>> 
>> true, and if we don't disable IRQs around updating tsk->mm/active_mm we
>> can end up doing mmdrop() on the wrong mm?
> 
> exec_mmap() is called after de_thread(), there should not be any mm
> specific invalidations around I think.
> 
> Then again, CLONE_VM without CLONE_THREAD might still be possible, so
> yeah, we probably want IRQs disabled there too, just for consistency and
> general paranoia if nothing else.

The problem is probably not this TLB flushing race, but I think there
is a lazy tlb race.

  call_usermodehelper()
    kernel_execve()
      old_mm = current->mm;
      active_mm = current->active_mm;
      *** preempt *** ---------------------->schedule()
                                               prev->active_mm = NULL;
					       mmdrop(prev active mm)
					     ... 
                      <----------------------schedule()
      current->mm = mm;
      current->active_mm = mm;
      if (!old_mm)
          mmdrop(active_mm); /* double free! */

There are possibly other problematic interleavings. powerpc also has an
issue with switching away a lazy tlb mm via IPI, which is basically the
same problem, so I just illustrate the more general issue.

I think we should just make it a rule that these always get updated
under local_irq_disable(), to be safe.
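
The lazy tlb interleaving above can be replayed in a small userspace model of the reference counting. This is a sketch under stated assumptions, not the kernel's code: `toy_mm`, `toy_task`, `toy_mmdrop()`, `toy_preempt()` and `toy_exec_mmap_bad_drops()` are hypothetical names, the counter stands in for `mm_count`, and a drop on a counter that has already reached zero is recorded instead of freeing anything.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model only; none of these types or helpers exist in the kernel. */
struct toy_mm {
	int mm_count;		/* stand-in for the mm_count reference count */
	int drops_after_free;	/* drops seen after the count reached zero */
};

struct toy_task {
	struct toy_mm *mm;
	struct toy_mm *active_mm;
};

static void toy_mmdrop(struct toy_mm *mm)
{
	if (mm->mm_count <= 0) {
		mm->drops_after_free++;	/* would be a double free */
		return;
	}
	mm->mm_count--;
}

/* A lazy (mm == NULL) task is switched out and later switched back in. */
static void toy_preempt(struct toy_task *tsk, struct toy_mm *other)
{
	if (tsk->mm)
		return;			/* only lazy users borrow an active_mm */
	toy_mmdrop(tsk->active_mm);	/* switched out: lazy reference released */
	other->mm_count++;		/* switched in: borrow a different mm */
	tsk->active_mm = other;
}

/* Returns how many times the old lazy mm was dropped after being freed. */
int toy_exec_mmap_bad_drops(bool preempt_in_window)
{
	struct toy_mm lazy = { .mm_count = 1 };
	struct toy_mm other = { .mm_count = 1 };
	struct toy_mm new_mm = { .mm_count = 1 };
	struct toy_task tsk = { .mm = NULL, .active_mm = &lazy };

	struct toy_mm *old_mm = tsk.mm;			/* old_mm = current->mm */
	struct toy_mm *active_mm = tsk.active_mm;	/* snapshot, may go stale */

	if (preempt_in_window)
		toy_preempt(&tsk, &other);	/* *** preempt *** */

	tsk.mm = &new_mm;			/* current->mm = mm */
	tsk.active_mm = &new_mm;		/* current->active_mm = mm */
	if (!old_mm)
		toy_mmdrop(active_mm);		/* uses the snapshot */

	return lazy.drops_after_free;
}
```

With `preempt_in_window` set, the snapshot taken before the preemption is dropped a second time and the reference borrowed on `other` is never released. With it clear, which models IRQs (and therefore preemption) being disabled across the snapshot and the update, the lazy mm is dropped exactly once.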

The trouble is we can't just do it, because some architectures can't do
activate_mm() with IRQs disabled: ARM and UM, at least. UM can't even
do it with preemption disabled. We can probably change them to make them
work, but I'm not sure what the best way to go is. My first attempt is
to require activate_mm() to do the mm switching and the IRQ disabling as
well, but I'll need some help from the archs.

I'll send out RFCs in a minute.

Thanks,
Nick


* Re: [PATCH v3] mm: Fix kthread_use_mm() vs TLB invalidate
  2020-08-28  3:26     ` Nicholas Piggin
@ 2020-08-28  6:55       ` Nicholas Piggin
  0 siblings, 0 replies; 8+ messages in thread
From: Nicholas Piggin @ 2020-08-28  6:55 UTC (permalink / raw)
  To: Aneesh Kumar K.V, linux-arch, peterz
  Cc: Andrew Morton, axboe, hch, jannh, keescook, linux-kernel, luto,
	mathieu.desnoyers, torvalds, will

Excerpts from Nicholas Piggin's message of August 28, 2020 1:26 pm:
> Excerpts from peterz@infradead.org's message of August 21, 2020 11:04 pm:
>> On Fri, Aug 21, 2020 at 11:09:51AM +0530, Aneesh Kumar K.V wrote:
>>> Peter Zijlstra <peterz@infradead.org> writes:
>>> 
>>> > For SMP systems using IPI based TLB invalidation, looking at
>>> > current->active_mm is entirely reasonable. This then presents the
>>> > following race condition:
>>> >
>>> >
>>> >   CPU0			CPU1
>>> >
>>> >   flush_tlb_mm(mm)	use_mm(mm)
>>> >     <send-IPI>
>>> > 			  tsk->active_mm = mm;
>>> > 			  <IPI>
>>> > 			    if (tsk->active_mm == mm)
>>> > 			      // flush TLBs
>>> > 			  </IPI>
>>> > 			  switch_mm(old_mm,mm,tsk);
>>> >
>>> >
>>> > Where it is possible the IPI flushed the TLBs for @old_mm, not @mm,
>>> > because the IPI lands before we actually switched.
>>> >
>>> > Avoid this by disabling IRQs across changing ->active_mm and
>>> > switch_mm().
>>> >
>>> > [ There are all sorts of reasons this might be harmless for various
>>> > architecture specific reasons, but best not leave the door open at
>>> > all. ]
>>> 
>>> 
>>> Do we have a similar race with exec_mmap()? I am looking at exec_mmap()
>>> running in parallel with do_exit_flush_lazy_tlb(). We can get
>>> 
>>> 	if (current->active_mm == mm) {
>>> 
>>> true, and if we don't disable IRQs around updating tsk->mm/active_mm we
>>> can end up doing mmdrop() on the wrong mm?
>> 
>> exec_mmap() is called after de_thread(), there should not be any mm
>> specific invalidations around I think.
>> 
>> Then again, CLONE_VM without CLONE_THREAD might still be possible, so
>> yeah, we probably want IRQs disabled there too, just for consistency and
>> general paranoia if nothing else.
> 
> The problem is probably not this TLB flushing race, but I think there
> is a lazy tlb race.

Hmm, is it possible for something to be holding the mm_users when we
exec? That could actually make it a problem for TLB flushing too.

Thanks,
Nick

