LKML Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH] mm/tlb: Fix use_mm() vs TLB invalidate
@ 2020-02-21 11:11 Peter Zijlstra
  2020-02-21 19:19 ` Linus Torvalds
  0 siblings, 1 reply; 7+ messages in thread
From: Peter Zijlstra @ 2020-02-21 11:11 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, luto, axboe, keescook, torvalds, jannh, will


For SMP systems using IPI based TLB invalidation, looking at
current->active_mm is entirely reasonable. This then presents the
following race condition:


  CPU0			CPU1

  flush_tlb_mm(mm)	use_mm(mm)
    <send-IPI>
			  tsk->active_mm = mm;
			  <IPI>
			    if (tsk->active_mm == mm)
			      // flush TLBs
			  </IPI>
			  switch_mm(old_mm,mm,tsk);


Where it is possible the IPI flushed the TLBs for @old_mm, not @mm,
because the IPI lands before we actually switched.

Avoid this by disabling IRQs across changing ->active_mm and
switch_mm().

[ There are all sorts of reasons this might be harmless for various
architecture specific reasons, but best not leave the door open at
all. ]

Cc: stable@kernel.org
Reported-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
Index: linux-2.6/mm/mmu_context.c
===================================================================
--- linux-2.6.orig/mm/mmu_context.c
+++ linux-2.6/mm/mmu_context.c
@@ -24,14 +24,19 @@ void use_mm(struct mm_struct *mm)
 	struct mm_struct *active_mm;
 	struct task_struct *tsk = current;
 
+	BUG_ON(!(tsk->flags & PF_KTHREAD));
+	BUG_ON(tsk->mm != NULL);
+
 	task_lock(tsk);
+	local_irq_disable();
 	active_mm = tsk->active_mm;
 	if (active_mm != mm) {
 		mmgrab(mm);
 		tsk->active_mm = mm;
 	}
 	tsk->mm = mm;
-	switch_mm(active_mm, mm, tsk);
+	switch_mm_irqs_off(active_mm, mm, tsk);
+	local_irq_enable();
 	task_unlock(tsk);
 #ifdef finish_arch_post_lock_switch
 	finish_arch_post_lock_switch();
@@ -54,11 +59,15 @@ void unuse_mm(struct mm_struct *mm)
 {
 	struct task_struct *tsk = current;
 
+	BUG_ON(!(tsk->flags & PF_KTHREAD));
+
 	task_lock(tsk);
 	sync_mm_rss(mm);
+	local_irq_disable();
 	tsk->mm = NULL;
 	/* active_mm is still 'mm' */
 	enter_lazy_tlb(mm, tsk);
+	local_irq_enable();
 	task_unlock(tsk);
 }
 EXPORT_SYMBOL_GPL(unuse_mm);

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] mm/tlb: Fix use_mm() vs TLB invalidate
  2020-02-21 11:11 [PATCH] mm/tlb: Fix use_mm() vs TLB invalidate Peter Zijlstra
@ 2020-02-21 19:19 ` Linus Torvalds
  2020-02-21 19:22   ` Andy Lutomirski
  0 siblings, 1 reply; 7+ messages in thread
From: Linus Torvalds @ 2020-02-21 19:19 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andrew Morton, Linux Kernel Mailing List, Andy Lutomirski,
	Jens Axboe, Kees Cook, Jann Horn, Will Deacon

On Fri, Feb 21, 2020 at 3:11 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> +       BUG_ON(!(tsk->flags & PF_KTHREAD));
> +       BUG_ON(tsk->mm != NULL);

Stop this craziness.

There is absolutely ZERO excuse for this kind of garbage.

Making this a BUG_ON() will just cause all the possible debugging info
to be thrown away and lost, and you often have a dead machine.

For absolutely no good reason.

Make it a WARN_ON_ONCE(). If it triggers, everything works the way it
always did, but we get notified.

Stop with the stupid crazy BUG_ON() crap already. It is actively _bad_
for debugging.

              Linus

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] mm/tlb: Fix use_mm() vs TLB invalidate
  2020-02-21 19:19 ` Linus Torvalds
@ 2020-02-21 19:22   ` Andy Lutomirski
  2020-02-21 19:26     ` Linus Torvalds
  2020-02-21 23:10     ` Kees Cook
  0 siblings, 2 replies; 7+ messages in thread
From: Andy Lutomirski @ 2020-02-21 19:22 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Peter Zijlstra, Andrew Morton, Linux Kernel Mailing List,
	Jens Axboe, Kees Cook, Jann Horn, Will Deacon



> On Feb 21, 2020, at 11:19 AM, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> 
> On Fri, Feb 21, 2020 at 3:11 AM Peter Zijlstra <peterz@infradead.org> wrote:
>> 
>> +       BUG_ON(!(tsk->flags & PF_KTHREAD));
>> +       BUG_ON(tsk->mm != NULL);
> 
> Stop this craziness.
> 
> There is absolutely ZERO excuse for this kind of garbage.
> 
> Making this a BUG_ON() will just cause all the possible debugging info
> to be thrown away and lost, and you often have a dead machine.
> 
> For absolutely no good reason.
> 
> Make it a WARN_ON_ONCE(). If it triggers, everything works the way it
> always did, but we get notified.
> 
> Stop with the stupid crazy BUG_ON() crap already. It is actively _bad_
> for debugging.
> 
>  

In this particular case, if we actually flub this, we are very likely to cause data corruption — we’re about to do user access with the wrong mm.

So I suppose we could switch to init_mm and carry on. *Something* will crash, but it probably won’t corrupt data or take down the machine.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] mm/tlb: Fix use_mm() vs TLB invalidate
  2020-02-21 19:22   ` Andy Lutomirski
@ 2020-02-21 19:26     ` Linus Torvalds
  2020-02-21 23:10     ` Kees Cook
  1 sibling, 0 replies; 7+ messages in thread
From: Linus Torvalds @ 2020-02-21 19:26 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Peter Zijlstra, Andrew Morton, Linux Kernel Mailing List,
	Jens Axboe, Kees Cook, Jann Horn, Will Deacon

On Fri, Feb 21, 2020 at 11:22 AM Andy Lutomirski <luto@amacapital.net> wrote:
>
> In this particular case, if we actually flub this, we are very likely to cause data corruption — we’re about to do user access with the wrong mm.

So?

IF this ever triggers, it has presumably been around for decades.

Nobody noticed.

Adding a WARN_ON_ONCE() means that somebody will now notice. Good.

Adding a BUG_ON() will now possibly mean that the machine is dead, and
is not sending the bug-report to anybody.

It's not going to hit, of course. But if you are 100% sure of that,
then it shouldn't exist at all.

And if you're not 100% sure of it, then it's going to be in some very
unusual circumstances (possibly some odd driver subsystem that does
something completely wrong - wouldn't be the first time), and the last
thing you want to do is lose the information about it.

So BUG_ON() is basically ALWAYS 100% the wrong thing to do. The
argument that "there could be memory corruption" is complete and utter
garbage. See above why.

Stop it. Seriously.

                Linus

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] mm/tlb: Fix use_mm() vs TLB invalidate
  2020-02-21 19:22   ` Andy Lutomirski
  2020-02-21 19:26     ` Linus Torvalds
@ 2020-02-21 23:10     ` Kees Cook
  2020-02-21 23:57       ` Linus Torvalds
  1 sibling, 1 reply; 7+ messages in thread
From: Kees Cook @ 2020-02-21 23:10 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linus Torvalds, Peter Zijlstra, Andrew Morton,
	Linux Kernel Mailing List, Jens Axboe, Jann Horn, Will Deacon

On Fri, Feb 21, 2020 at 11:22:16AM -0800, Andy Lutomirski wrote:
> 
> 
> > On Feb 21, 2020, at 11:19 AM, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> > 
> > On Fri, Feb 21, 2020 at 3:11 AM Peter Zijlstra <peterz@infradead.org> wrote:
> >> 
> >> +       BUG_ON(!(tsk->flags & PF_KTHREAD));
> >> +       BUG_ON(tsk->mm != NULL);
> > 
> > Stop this craziness.
> > 
> > There is absolutely ZERO excuse for this kind of garbage.
> > 
> > Making this a BUG_ON() will just cause all the possible debugging info
> > to be thrown away and lost, and you often have a dead machine.
> > 
> > For absolutely no good reason.
> > 
> > Make it a WARN_ON_ONCE(). If it triggers, everything works the way it
> > always did, but we get notified.
> > 
> > Stop with the stupid crazy BUG_ON() crap already. It is actively _bad_
> > for debugging.
> > 
> >  
> 
> In this particular case, if we actually flub this, we are very likely to cause data corruption — we’re about to do user access with the wrong mm.
> 
> So I suppose we could switch to init_mm and carry on. *Something* will crash, but it probably won’t corrupt data or take down the machine.

Why not just fail after the WARN -- I wrote the patch for the (very few)
callers to handle the errors, clean up, and carry on.

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] mm/tlb: Fix use_mm() vs TLB invalidate
  2020-02-21 23:10     ` Kees Cook
@ 2020-02-21 23:57       ` Linus Torvalds
  2020-02-22  0:29         ` Kees Cook
  0 siblings, 1 reply; 7+ messages in thread
From: Linus Torvalds @ 2020-02-21 23:57 UTC (permalink / raw)
  To: Kees Cook
  Cc: Andy Lutomirski, Peter Zijlstra, Andrew Morton,
	Linux Kernel Mailing List, Jens Axboe, Jann Horn, Will Deacon

On Fri, Feb 21, 2020 at 3:10 PM Kees Cook <keescook@chromium.org> wrote:
>
> Why not just fail after the WARN -- I wrote the patch for the (very few)
> callers to handle the errors, clean up, and carry on.

Can it actually fail? Or is this all just "let's add new error
conditions that make the code harder to read because they make no
actual sense"?

It's not clear that it's worth handling "cannot happen" situations. It
might be worth warning about them (that's questionable too, but at
least there's an argument that you "verify that it can't happen").

But having the error condition of a "cannot happen" situation then
percolate down and make other code more complex seems to be only a
downside.

It's not even security or "avoid data corruption" at that point. At
the point where you start adding more complexity for things that
aren't supposed to be possible in the first place, you're only going
to cause bugs.

Maybe not immediately. But "illegible code" ends up being "buggy code"
a decade later.

               Linus

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] mm/tlb: Fix use_mm() vs TLB invalidate
  2020-02-21 23:57       ` Linus Torvalds
@ 2020-02-22  0:29         ` Kees Cook
  0 siblings, 0 replies; 7+ messages in thread
From: Kees Cook @ 2020-02-22  0:29 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andy Lutomirski, Peter Zijlstra, Andrew Morton,
	Linux Kernel Mailing List, Jens Axboe, Jann Horn, Will Deacon

On Fri, Feb 21, 2020 at 03:57:21PM -0800, Linus Torvalds wrote:
> On Fri, Feb 21, 2020 at 3:10 PM Kees Cook <keescook@chromium.org> wrote:
> > Why not just fail after the WARN -- I wrote the patch for the (very few)
> > callers to handle the errors, clean up, and carry on.
> 
> Can it actually fail? Or is this all just "let's add new error
> conditions that make the code harder to read because they make no
> actual sense"?

I was just trying to see if there was a reasonable "do not continue and
do stuff we just tested for". If this should just be WARN_ON_ONCE() and
continue anyway, so be it. My general guideline is to avoid continuing a
known-bad path.

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, back to index

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-02-21 11:11 [PATCH] mm/tlb: Fix use_mm() vs TLB invalidate Peter Zijlstra
2020-02-21 19:19 ` Linus Torvalds
2020-02-21 19:22   ` Andy Lutomirski
2020-02-21 19:26     ` Linus Torvalds
2020-02-21 23:10     ` Kees Cook
2020-02-21 23:57       ` Linus Torvalds
2020-02-22  0:29         ` Kees Cook

LKML Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/lkml/0 lkml/git/0.git
	git clone --mirror https://lore.kernel.org/lkml/1 lkml/git/1.git
	git clone --mirror https://lore.kernel.org/lkml/2 lkml/git/2.git
	git clone --mirror https://lore.kernel.org/lkml/3 lkml/git/3.git
	git clone --mirror https://lore.kernel.org/lkml/4 lkml/git/4.git
	git clone --mirror https://lore.kernel.org/lkml/5 lkml/git/5.git
	git clone --mirror https://lore.kernel.org/lkml/6 lkml/git/6.git
	git clone --mirror https://lore.kernel.org/lkml/7 lkml/git/7.git
	git clone --mirror https://lore.kernel.org/lkml/8 lkml/git/8.git
	git clone --mirror https://lore.kernel.org/lkml/9 lkml/git/9.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 lkml lkml/ https://lore.kernel.org/lkml \
		linux-kernel@vger.kernel.org
	public-inbox-index lkml

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-kernel


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git