linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Andy Lutomirski <luto@kernel.org>
To: Dave Hansen <dave.hansen@intel.com>
Cc: Nadav Amit <namit@vmware.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Andy Lutomirski <luto@kernel.org>, Borislav Petkov <bp@alien8.de>,
	Ingo Molnar <mingo@redhat.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	"x86@kernel.org" <x86@kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Dave Hansen <dave.hansen@linux.intel.com>
Subject: Re: [RFC PATCH v2 11/12] x86/mm/tlb: Use async and inline messages for flushing
Date: Fri, 31 May 2019 13:37:38 -0700	[thread overview]
Message-ID: <CALCETrVf9dh4GxEXsHbP65x6YuzOBf+7HWqOgBBjUma+7nB6Nw@mail.gmail.com> (raw)
In-Reply-To: <4e0ed5a5-0e5e-3481-e646-3f032f17ac60@intel.com>

On Fri, May 31, 2019 at 1:13 PM Dave Hansen <dave.hansen@intel.com> wrote:
>
> On 5/31/19 12:31 PM, Nadav Amit wrote:
> >> On May 31, 2019, at 11:44 AM, Andy Lutomirski <luto@amacapital.net> wrote:
> >>
> >>
> >>
> >>> On May 31, 2019, at 3:57 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> >>>
> >>>> On Thu, May 30, 2019 at 11:36:44PM -0700, Nadav Amit wrote:
> >>>> When we flush userspace mappings, we can defer the TLB flushes, as long
> >>>> the following conditions are met:
> >>>>
> >>>> 1. No tables are freed, since otherwise speculative page walks might
> >>>>  cause machine-checks.
> >>>>
> >>>> 2. No one would access userspace before flush takes place. Specifically,
> >>>>  NMI handlers and kprobes would avoid accessing userspace.
> >>>>
> >>>> Use the new SMP support to execute remote function calls with inlined
> >>>> data for the matter. The function remote TLB flushing function would be
> >>>> executed asynchronously and the local CPU would continue execution as
> >>>> soon as the IPI was delivered, before the function was actually
> >>>> executed. Since tlb_flush_info is copied, there is no risk it would
> >>>> change before the TLB flush is actually executed.
> >>>>
> >>>> Change nmi_uaccess_okay() to check whether a remote TLB flush is
> >>>> currently in progress on this CPU by checking whether the asynchronously
> >>>> called function is the remote TLB flushing function. The current
> >>>> implementation disallows access in such cases, but it is also possible
> >>>> to flush the entire TLB in such case and allow access.
> >>>
> >>> ARGGH, brain hurt. I'm not sure I fully understand this one. How is it
> >>> different from today, where the NMI can hit in the middle of the TLB
> >>> invalidation?
> >>>
> >>> Also; since we're not waiting on the IPI, what prevents us from freeing
> >>> the user pages before the remote CPU is 'done' with them? Currently the
> >>> synchronous IPI is like a sync point where we *know* the remote CPU is
> >>> completely done accessing the page.
> >>>
> >>> Where getting an IPI stops speculation, speculation again restarts
> >>> inside the interrupt handler, and until we've passed the INVLPG/MOV CR3,
> >>> speculation can happen on that TLB entry, even though we've already
> >>> freed and re-used the user-page.
> >>>
> >>> Also, what happens if the TLB invalidation IPI is stuck behind another
> >>> smp_function_call IPI that is doing user-access?
> >>>
> >>> As said,.. brain hurts.
> >>
> >> Speculation aside, any code doing dirty tracking needs the flush to happen
> >> for real before it reads the dirty bit.
> >>
> >> How does this patch guarantee that the flush is really done before someone
> >> depends on it?
> >
> > I was always under the impression that the dirty-bit is pass-through - the
> > A/D-assist walks the tables and sets the dirty bit upon access. Otherwise,
> > what happens when you invalidate the PTE, and have already marked the PTE as
> > non-present? Would the CPU set the dirty-bit at this point?
>
> Modulo bugs^Werrata...  No.  What actually happens is that a
> try-to-set-dirty-bit page table walk acts just like a TLB miss.  The old
> contents of the TLB are discarded and only the in-memory contents matter
> for forward progress.  If Present=0 when the PTE is reached, you'll get
> a normal Present=0 page fault.

Wait, does that mean that you can do a lock cmpxchg or similar to
clear the dirty and writable bits together and, if the dirty bit was
clear, skip the TLB flush?  If so, nifty!  Modulo errata, of course.
And I seem to remember some exceptions relating to CET shadow stack
involving the dirty bit being set on not-present pages.

>
> > In this regard, I remember this thread of Dave Hansen [1], which also seems
> > to me as supporting the notion the dirty-bit is set on write and not on
> > INVLPG.
>
> ... and that's the erratum I was hoping you wouldn't mention. :)
>
> But, yeah, I don't think it's possible to set the Dirty bit on INVLPG.
> The bits are set on establishing TLB entries, not on evicting or
> flushing them.
>
> I hope that clears it up.

  reply	other threads:[~2019-05-31 20:37 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-05-31  6:36 [RFC PATCH v2 00/12] x86: Flush remote TLBs concurrently and async Nadav Amit
2019-05-31  6:36 ` [RFC PATCH v2 01/12] smp: Remove smp_call_function() and on_each_cpu() return values Nadav Amit
2019-05-31  6:36 ` [RFC PATCH v2 02/12] smp: Run functions concurrently in smp_call_function_many() Nadav Amit
2019-05-31  6:36 ` [RFC PATCH v2 03/12] x86/mm/tlb: Refactor common code into flush_tlb_on_cpus() Nadav Amit
2019-05-31  6:36 ` [RFC PATCH v2 04/12] x86/mm/tlb: Flush remote and local TLBs concurrently Nadav Amit
2019-05-31 11:48   ` Juergen Gross
2019-05-31 19:44     ` Nadav Amit
2019-05-31  6:36 ` [RFC PATCH v2 05/12] x86/mm/tlb: Optimize local TLB flushes Nadav Amit
2019-05-31  6:36 ` [RFC PATCH v2 06/12] KVM: x86: Provide paravirtualized flush_tlb_multi() Nadav Amit
2019-05-31  6:36 ` [RFC PATCH v2 07/12] smp: Do not mark call_function_data as shared Nadav Amit
2019-05-31 10:17   ` Peter Zijlstra
2019-05-31 17:50     ` Nadav Amit
2019-05-31  6:36 ` [RFC PATCH v2 08/12] x86/tlb: Privatize cpu_tlbstate Nadav Amit
2019-05-31 18:48   ` Andy Lutomirski
2019-05-31 19:42     ` Nadav Amit
2019-05-31  6:36 ` [RFC PATCH v2 09/12] x86/apic: Use non-atomic operations when possible Nadav Amit
2019-05-31  6:36 ` [RFC PATCH v2 10/12] smp: Enable data inlining for inter-processor function call Nadav Amit
2019-05-31  6:36 ` [RFC PATCH v2 11/12] x86/mm/tlb: Use async and inline messages for flushing Nadav Amit
2019-05-31 10:57   ` Peter Zijlstra
2019-05-31 18:29     ` Nadav Amit
2019-05-31 19:20       ` Jann Horn
2019-05-31 20:04         ` Nadav Amit
2019-05-31 20:37           ` Jann Horn
2019-05-31 18:44     ` Andy Lutomirski
2019-05-31 19:31       ` Nadav Amit
2019-05-31 20:13         ` Dave Hansen
2019-05-31 20:37           ` Andy Lutomirski [this message]
2019-05-31 20:42             ` Nadav Amit
2019-05-31 21:06             ` Dave Hansen
2019-05-31 21:14   ` Andy Lutomirski
2019-05-31 21:33     ` Nadav Amit
2019-05-31 21:47       ` Andy Lutomirski
2019-05-31 22:07         ` Nadav Amit
2019-06-07  5:28           ` Nadav Amit
2019-06-07 16:42             ` Andy Lutomirski
2019-05-31  6:36 ` [RFC PATCH v2 12/12] x86/mm/tlb: Reverting the removal of flush_tlb_info from stack Nadav Amit

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CALCETrVf9dh4GxEXsHbP65x6YuzOBf+7HWqOgBBjUma+7nB6Nw@mail.gmail.com \
    --to=luto@kernel.org \
    --cc=bp@alien8.de \
    --cc=dave.hansen@intel.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=namit@vmware.com \
    --cc=peterz@infradead.org \
    --cc=tglx@linutronix.de \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).