linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Nicholas Piggin <npiggin@gmail.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Andrew Lutomirski <luto@kernel.org>,
	"the arch/x86 maintainers" <x86@kernel.org>,
	Borislav Petkov <bp@alien8.de>, Will Deacon <will.deacon@arm.com>,
	Rik van Riel <riel@surriel.com>, Jann Horn <jannh@google.com>,
	Adin Scannell <ascannell@google.com>,
	Dave Hansen <dave.hansen@intel.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	linux-mm <linux-mm@kvack.org>, David Miller <davem@davemloft.net>,
	Martin Schwidefsky <schwidefsky@de.ibm.com>,
	Michael Ellerman <mpe@ellerman.id.au>
Subject: Re: [PATCH 2/4] mm/tlb: Remove tlb_remove_table() non-concurrent condition
Date: Thu, 23 Aug 2018 15:15:16 +1000	[thread overview]
Message-ID: <20180823151516.5dc87dbf@roar.ozlabs.ibm.com> (raw)
In-Reply-To: <CA+55aFzfnWv3JoB0mR7iCX322KsiE+uRq3HcmOpciEAiTw-oLw@mail.gmail.com>

On Wed, 22 Aug 2018 21:54:48 -0700
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Wed, Aug 22, 2018 at 9:16 PM Nicholas Piggin <npiggin@gmail.com> wrote:
> >
> > On Wed, 22 Aug 2018 20:35:16 -0700
> > Linus Torvalds <torvalds@linux-foundation.org> wrote:  
> > >
> > > And yes, those lazy tlbs are all kernel threads, but they can still
> > > speculatively load user addresses.  
> >
> > So?
> >
> > If the arch does not shoot those all down after the user page tables
> > are removed then it's buggy regardless of this short cut.  
> 
> Read the code.
> 
> That shortcut frees the pages *WITHOUT* doing that TLB flush.

I've read the code and I understand that. Hence I'm asking because I
did't think the changelog matched the change. But possibly it was
because I actually didn't read the changelog enough -- I guess it does
say the TLB operation is the problem. I may have got side tracked by
the word speculativee.

> It just
> does __tlb_remove_table(), which does *not* do that whole page
> queueing so that we can batch flush and then free later.
> 
> So the fast-case it's buggy, exactly because of the reason you state.

Okay sure, thanks for confirming it for me. I would ask for changelog
to be slightly expanded on, but maybe it's just my reading
comprehension needs improvement...

> > The only real problem I could see would be if a page walk cache still
> > points to the freed table, then the table gets re-allocated and used
> > elsewhere, and meanwhile a speculative access tries to load an entry
> > from the page that is an invalid form of page table that might cause
> > a machine check or something. That would be (u)arch specific, but if
> > that's what we're concerned with here it's a different issue and needs
> > to be documented as such.  
> 
> We've *seen* that case, we had exactly that when we were being
> aggressive about trying to avoid flushes for the lazy mmu case
> entirely, because "we can just flush when we activate the lazy mm
> state instead".
> 
> The logic was actually solid from a TLB case - who cares what garbage
> TLB entries we might speculatively fill, when we're going to flush
> them before they can possibly be used.
> 
> It turns out that that logic is solid, but hits another problem: at
> least some AMD CPU's will cause machine checks when the TLB entries
> are inconsistent with the machine setup. That can't happen with a
> *good* page table, but when you speculatively walk already freed
> tables, you can get any garbage.

Yeah that does make sense.

> 
> I forget what the exact trigger was, but this was actually reported.
> So you can't free page directory pages without flushing the tlb first
> (to make that internal tlb node caches are flushed).
> 
> So the logic for freeing leaf pages and freeing middle nodes has to be
> exactly the same: you make the modification to the page table to
> remove the node/leaf, you queue up the memory for freeing, you flush
> the tlb, and you then free the page.
> 
> That's the ordering that tlb_remove_page() has always honored, but
> that tlb_remove_tabl() didn't.
> 
> It honored it for the *normal* case, which is why it took so long to
> notice that the TLB shootdown had been broken on x86 when it moved to
> the "generic" code. The *normal* case does this all right, and batches
> things up, and then when the batch fills up it does a
> tlb_table_flush() which does the TLB flush and schedules the actual
> freeing.
> 
> But there were two cases that *didn't* do that. The special "I'm the
> only thread" fast case, and the "oops I ran out of memory, so now I'll
> fake it, and just synchronize with page twalkers manually, and then do
> that special direct remove without flushing the tlb".
> 
> NOTE! Jann triggered that one by
>  (a) forcing the machine low on memory
>  (b) force-poisoning the page tables immediately after free
> 
> I suspect it's really hard to trigger under normal loads, exactly
> because the *normal* case gets it right. It's literally the "oops, I
> can't batch up because I ran out of memory" case that gets it wrong.
> 
> (And the special single-thread + lazy or use_mm() case, but that's
> going to be entirely impossible to trigger, because in practice it's
> just a single thread, and you won't ever hit the magic timing needed
> that frees the page in the single thread at exactly the same time that
> some optimistic lazy mm on another cpu happens to speculatively load
> that address).
> 
> So the "atomic_read(&mm_users)" case is likely entirely impossible to
> trigger any sane way. But because Jann found the problem with the 'ran
> out of memory" case, we started looking at the more theoretical cases
> that matched the same kind of "no tlb flush before free" pattern.

Thanks for giving that background. In that case I'm happy with this fix.

Thanks,
Nick

  reply	other threads:[~2018-08-23  5:15 UTC|newest]

Thread overview: 96+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-08-22 15:30 [PATCH 0/4] x86: TLB invalidate fixes Peter Zijlstra
2018-08-22 15:30 ` [PATCH 1/4] x86/mm/tlb: Revert the recent lazy TLB patches Peter Zijlstra
2018-08-22 21:37   ` Linus Torvalds
2018-08-22 22:11     ` Rik van Riel
2018-08-22 15:30 ` [PATCH 2/4] mm/tlb: Remove tlb_remove_table() non-concurrent condition Peter Zijlstra
2018-08-23  3:31   ` Nicholas Piggin
2018-08-23  3:35     ` Linus Torvalds
2018-08-23  3:44       ` Linus Torvalds
2018-08-23  4:16       ` Nicholas Piggin
2018-08-23  4:54         ` Linus Torvalds
2018-08-23  5:15           ` Nicholas Piggin [this message]
2018-08-24  8:42           ` Peter Zijlstra
2018-08-23 13:40   ` Will Deacon
2018-08-22 15:30 ` [PATCH 3/4] mm/tlb, x86/mm: Support invalidating TLB caches for RCU_TABLE_FREE Peter Zijlstra
2018-08-22 15:55   ` Peter Zijlstra
2018-08-23  3:45     ` Nicholas Piggin
2018-08-23  3:59       ` Linus Torvalds
2018-08-23  4:33         ` Nicholas Piggin
2018-08-23  5:03           ` Linus Torvalds
2018-08-23  5:58             ` Nicholas Piggin
2018-08-23  4:54         ` Benjamin Herrenschmidt
2018-08-23  5:11           ` Linus Torvalds
2018-08-23  5:20             ` Linus Torvalds
2018-08-23  6:48               ` Martin Schwidefsky
2018-08-23  5:21             ` Benjamin Herrenschmidt
2018-08-23  6:15               ` Nicholas Piggin
2018-08-23 13:39             ` Will Deacon
2018-08-24  8:47               ` Peter Zijlstra
2018-08-24 11:32                 ` Peter Zijlstra
2018-08-24 11:39                   ` Peter Zijlstra
2018-08-27  5:00                     ` Nicholas Piggin
2018-08-27  7:47                       ` Peter Zijlstra
2018-08-27  8:04                         ` Nicholas Piggin
2018-08-27  8:09                           ` Benjamin Herrenschmidt
2018-08-27  8:20                             ` Peter Zijlstra
2018-08-27  8:54                               ` Nicholas Piggin
2018-08-27  9:02                             ` Nicholas Piggin
2018-08-27 22:13                               ` Benjamin Herrenschmidt
2018-08-27 13:36                           ` Rik van Riel
2018-08-27 14:29                             ` Nicholas Piggin
2018-08-27  8:57                         ` removig ia64, was: " Christoph Hellwig
2018-08-27 11:28                           ` Peter Zijlstra
2018-08-27 11:45                           ` Jason Duerstock
2018-08-27 11:00                         ` Peter Zijlstra
2018-08-30  0:13                           ` Vineet Gupta
2018-08-30 10:23                             ` Peter Zijlstra
2018-08-24 17:26                 ` Nadav Amit
2018-08-24 18:04                   ` Peter Zijlstra
2018-08-24 18:35                     ` TLB flushes on fixmap changes Nadav Amit
2018-08-24 19:31                       ` Linus Torvalds
2018-08-24 20:24                         ` Nadav Amit
2018-08-25  0:58                           ` Linus Torvalds
2018-08-25  2:29                             ` nadav.amit
2018-08-25  4:23                               ` Andy Lutomirski
2018-08-26  2:23                                 ` Masami Hiramatsu
2018-08-26  4:21                                   ` Andy Lutomirski
2018-08-26  4:43                                     ` Kees Cook
2018-08-26  5:53                                       ` Nadav Amit
2018-08-26 14:20                                       ` Andy Lutomirski
2018-08-26 16:47                                         ` Kees Cook
2018-08-26 17:25                                           ` Andy Lutomirski
2018-08-26 20:15                                             ` Thomas Gleixner
2018-08-26 22:03                                               ` Kees Cook
2018-08-26 22:15                                                 ` Matthew Wilcox
2018-08-26 22:29                                                 ` Jann Horn
2018-08-26  9:09                                     ` Peter Zijlstra
2018-08-27  3:03                                       ` Masami Hiramatsu
2018-08-27  3:26                                         ` Nadav Amit
2018-08-27  8:05                                           ` Masami Hiramatsu
2018-08-27 17:34                                             ` Nadav Amit
2018-08-27 18:45                                               ` Andy Lutomirski
2018-08-27 18:54                                                 ` Nadav Amit
2018-08-27 18:58                                                   ` Andy Lutomirski
2018-08-27 19:10                                                     ` Nadav Amit
2018-08-27 19:43                                                       ` Nadav Amit
2018-08-27 19:58                                                         ` Andy Lutomirski
2018-08-27 20:16                                                           ` Nadav Amit
2018-08-27 21:55                                                             ` Nadav Amit
2018-08-27 22:32                                                               ` Andy Lutomirski
2018-08-27 22:54                                                                 ` Nadav Amit
2018-08-27 23:01                                                                   ` Andy Lutomirski
2018-08-28  8:49                                                                     ` Masami Hiramatsu
2018-08-28 17:33                                                                       ` Nadav Amit
2018-08-27  8:13                                         ` Peter Zijlstra
2018-08-27  9:39                                           ` Masami Hiramatsu
2018-08-27  9:55                                           ` Jann Horn
2018-08-26 22:48                                     ` Jann Horn
2018-08-24  8:35           ` [PATCH 3/4] mm/tlb, x86/mm: Support invalidating TLB caches for RCU_TABLE_FREE Peter Zijlstra
2018-08-24 13:13             ` Peter Zijlstra
2018-08-24 13:14               ` Peter Zijlstra
2018-08-24 15:49               ` Will Deacon
2018-08-23 23:31     ` Will Deacon
2018-08-22 21:34   ` Linus Torvalds
2018-08-23  8:46   ` Nicholas Piggin
2018-08-22 15:30 ` [PATCH 4/4] x86/mm: Only use tlb_remove_table() for paravirt Peter Zijlstra
2018-08-22 22:12   ` Eduardo Valentin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180823151516.5dc87dbf@roar.ozlabs.ibm.com \
    --to=npiggin@gmail.com \
    --cc=ascannell@google.com \
    --cc=bp@alien8.de \
    --cc=dave.hansen@intel.com \
    --cc=davem@davemloft.net \
    --cc=jannh@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=luto@kernel.org \
    --cc=mpe@ellerman.id.au \
    --cc=peterz@infradead.org \
    --cc=riel@surriel.com \
    --cc=schwidefsky@de.ibm.com \
    --cc=torvalds@linux-foundation.org \
    --cc=will.deacon@arm.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).