linux-kernel.vger.kernel.org archive mirror
From: Nadav Amit <namit@vmware.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>, Borislav Petkov <bp@alien8.de>,
	Dave Hansen <dave.hansen@intel.com>,
	Ingo Molnar <mingo@redhat.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	the arch/x86 maintainers <x86@kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Jann Horn <jannh@google.com>
Subject: Re: [RFC PATCH v2 11/12] x86/mm/tlb: Use async and inline messages for flushing
Date: Fri, 31 May 2019 18:29:33 +0000	[thread overview]
Message-ID: <82791CFA-3748-4881-9F01-53F677108FC3@vmware.com> (raw)
In-Reply-To: <20190531105758.GO2623@hirez.programming.kicks-ass.net>

[ +Jann Horn ]

> On May 31, 2019, at 3:57 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> 
> On Thu, May 30, 2019 at 11:36:44PM -0700, Nadav Amit wrote:
>> When we flush userspace mappings, we can defer the TLB flushes, as long
>> as the following conditions are met:
>> 
>> 1. No tables are freed, since otherwise speculative page walks might
>>   cause machine-checks.
>> 
>> 2. No one would access userspace before flush takes place. Specifically,
>>   NMI handlers and kprobes would avoid accessing userspace.
>> 
>> Use the new SMP support to execute remote function calls with inlined
>> data for this purpose. The remote TLB flushing function would be
>> executed asynchronously, and the local CPU would continue execution as
>> soon as the IPI was delivered, before the function was actually
>> executed. Since tlb_flush_info is copied, there is no risk it would
>> change before the TLB flush is actually executed.
>> 
>> Change nmi_uaccess_okay() to check whether a remote TLB flush is
>> currently in progress on this CPU by checking whether the asynchronously
>> called function is the remote TLB flushing function. The current
>> implementation disallows access in such cases, but it is also possible
>> to flush the entire TLB in such case and allow access.
> 
> ARGGH, brain hurt. I'm not sure I fully understand this one. How is it
> different from today, where the NMI can hit in the middle of the TLB
> invalidation?

Let’s assume that a page mapping is removed so a PTE is changed from present
to non-present. Today, the page would only be recycled after all the cores
flushed their TLB. So NMI handlers accessing the page are safe - they might
either access the previously mapped page (which still has not been reused)
or fault, but they cannot access the wrong data.

Following this change it might happen that one core would still have stale
mappings for a page that has already been recycled for a new use (e.g., it is
allocated to a different process). The core that has the stale mapping would
be inside the interrupt handler at that point, and we must ensure that an
NMI handler interrupting it does not access this page.

> Also; since we're not waiting on the IPI, what prevents us from freeing
> the user pages before the remote CPU is 'done' with them? Currently the
> synchronous IPI is like a sync point where we *know* the remote CPU is
> completely done accessing the page.

We might free them before the remote TLB flush is done, but only after the
IPI has been acknowledged and right before the actual TLB flush is about to
start. See flush_smp_call_function_queue().

                if (csd->flags & CSD_FLAG_SYNCHRONOUS) {
                        func(info);
                        csd_unlock(csd);
                } else {
                        this_cpu_write(async_func_in_progress, csd->func);
                        csd_unlock(csd);        /* (*) */
                        func(info);
                }

We do wait for the IPI to be received and the entry of the specific TLB
invalidation to be processed. Following the instruction that is marked with
(*) the initiator to the TLB shootdown would be able to continue, but not
before. We just do not wait for the actual flush (called by func()) to take
place.

> Where getting an IPI stops speculation, speculation again restarts
> inside the interrupt handler, and until we've passed the INVLPG/MOV CR3,
> speculation can happen on that TLB entry, even though we've already
> freed and re-used the user-page.

Let’s consider what the impact of such speculation might be. Virtually
indexed caches might have the wrong data. But this data should not be used
before the flush, since we are already in the interrupt handler, and we
should be able to prevent accesses to userspace.

A new vector for Spectre-type attacks might be opened, but I don’t see how.

A #MC might be caused. I tried to avoid it by not allowing page-tables to be
freed in such a way. Did I miss something else? Some interaction with
MTRR changes? I’ll think about it some more, but I don’t see how.


> Also, what happens if the TLB invalidation IPI is stuck behind another
> smp_function_call IPI that is doing user-access?

It’s not exactly the IPI but the call_single_data that matters. In such a
case, as shown above, csd_unlock() is only called once the CPU is about to
start handling the specific TLB flush.

> 
> As said,.. brain hurts.

I understand. Mine too. The textbook says that we need to flush all the TLBs
before we can recycle the page, be guaranteed that a write-protected page will
not change, and so on. I think that since we should be able to guarantee
that the userspace addresses which are invalidated would never be used by
the kernel before the flush, we should be safe. After the flush, all the
artifacts of speculative caching should disappear.

I might be wrong, and I do question myself, but I don’t see how.


