linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Alexander Graf <graf@amazon.com>
To: Andy Lutomirski <luto@kernel.org>,
	Alexandre Chartre <alexandre.chartre@oracle.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Dave Hansen <dave.hansen@intel.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Radim Krcmar <rkrcmar@redhat.com>, Ingo Molnar <mingo@redhat.com>,
	Borislav Petkov <bp@alien8.de>, "H. Peter Anvin" <hpa@zytor.com>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	kvm list <kvm@vger.kernel.org>, X86 ML <x86@kernel.org>,
	Linux-MM <linux-mm@kvack.org>,
	LKML <linux-kernel@vger.kernel.org>,
	"Konrad Rzeszutek Wilk" <konrad.wilk@oracle.com>,
	<jan.setjeeilers@oracle.com>, Liran Alon <liran.alon@oracle.com>,
	Jonathan Adams <jwadams@google.com>,
	Mike Rapoport <rppt@linux.vnet.ibm.com>,
	Paul Turner <pjt@google.com>
Subject: Re: [RFC v2 00/27] Kernel Address Space Isolation
Date: Sun, 14 Jul 2019 20:17:30 +0200	[thread overview]
Message-ID: <849b74ba-2ce4-04e9-557c-6d8c8ec29e16@amazon.com> (raw)
In-Reply-To: <CALCETrVcM-SpEqLMJSOdyGuN0gjr+97+cpu2KYneuTv1fJDoog@mail.gmail.com>



On 12.07.19 16:36, Andy Lutomirski wrote:
> On Fri, Jul 12, 2019 at 6:45 AM Alexandre Chartre
> <alexandre.chartre@oracle.com> wrote:
>>
>>
>> On 7/12/19 2:50 PM, Peter Zijlstra wrote:
>>> On Fri, Jul 12, 2019 at 01:56:44PM +0200, Alexandre Chartre wrote:
>>>
>>>> I think that's precisely what makes ASI and PTI different and independent.
>>>> PTI is just about switching between userland and kernel page-tables, while
>>>> ASI is about switching page-table inside the kernel. You can have ASI without
>>>> having PTI. You can also use ASI for kernel threads so for code that won't
>>>> be triggered from userland and so which won't involve PTI.
>>>
>>> PTI is not mapping         kernel space to avoid             speculation crap (meltdown).
>>> ASI is not mapping part of kernel space to avoid (different) speculation crap (MDS).
>>>
>>> See how very similar they are?
>>>
>>>
>>> Furthermore, to recover SMT for userspace (under MDS) we not only need
>>> core-scheduling but core-scheduling per address space. And ASI was
>>> specifically designed to help mitigate the trainwreck just described.
>>>
>>> By explicitly exposing (hopefully harmless) part of the kernel to MDS,
>>> we reduce the part that needs core-scheduling and thus reduce the rate
>>> the SMT siblngs need to sync up/schedule.
>>>
>>> But looking at it that way, it makes no sense to retain 3 address
>>> spaces, namely:
>>>
>>>     user / kernel exposed / kernel private.
>>>
>>> Specifically, it makes no sense to expose part of the kernel through MDS
>>> but not through Meltdow. Therefore we can merge the user and kernel
>>> exposed address spaces.
>>
>> The goal of ASI is to provide a reduced address space which exclude sensitive
>> data. A user process (for example a database daemon, a web server, or a vmm
>> like qemu) will likely have sensitive data mapped in its user address space.
>> Such data shouldn't be mapped with ASI because it can potentially leak to the
>> sibling hyperthread. For example, if an hyperthread is running a VM then the
>> VM could potentially access user sensitive data if they are mapped on the
>> sibling hyperthread with ASI.
> 
> So I've proposed the following slightly hackish thing:
> 
> Add a mechanism (call it /dev/xpfo).  When you open /dev/xpfo and
> fallocate it to some size, you allocate that amount of memory and kick
> it out of the kernel direct map.  (And pay the IPI cost unless there
> were already cached non-direct-mapped pages ready.)  Then you map
> *that* into your VMs.  Now, for a dedicated VM host, you map *all* the
> VM private memory from /dev/xpfo.  Pretend it's SEV if you want to
> determine which pages can be set up like this.
> 
> Does this get enough of the benefit at a negligible fraction of the
> code complexity cost?  (This plus core scheduling, anyway.)

The problem with that approach is that you lose the ability to run 
legacy workloads that do not support an SEV like model of "guest owned" 
and "host visible" pages, but instead assume you can DMA anywhere.

Without that, your host will have visibility into guest pages via user 
space (QEMU) pages which again are mapped in the kernel direct map, so 
can be exposed via a spectre gadget into a malicious guest.

Also, please keep in mind that even register state of other VMs may be a 
secret that we do not want to leak into other guests.


Alex

  reply	other threads:[~2019-07-14 18:17 UTC|newest]

Thread overview: 71+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-07-11 14:25 [RFC v2 00/27] Kernel Address Space Isolation Alexandre Chartre
2019-07-11 14:25 ` [RFC v2 01/26] mm/x86: Introduce kernel address space isolation Alexandre Chartre
2019-07-11 21:33   ` Thomas Gleixner
2019-07-12  7:43     ` Alexandre Chartre
2019-07-11 14:25 ` [RFC v2 02/26] mm/asi: Abort isolation on interrupt, exception and context switch Alexandre Chartre
2019-07-11 20:11   ` Andi Kleen
2019-07-11 20:17     ` Mike Rapoport
2019-07-11 20:41       ` Alexandre Chartre
2019-07-12  0:05   ` Andy Lutomirski
2019-07-12  7:50     ` Alexandre Chartre
2019-07-11 14:25 ` [RFC v2 03/26] mm/asi: Handle page fault due to address space isolation Alexandre Chartre
2019-07-11 14:25 ` [RFC v2 04/26] mm/asi: Functions to track buffers allocated for an ASI page-table Alexandre Chartre
2019-07-11 14:25 ` [RFC v2 05/26] mm/asi: Add ASI page-table entry offset functions Alexandre Chartre
2019-07-11 14:25 ` [RFC v2 06/26] mm/asi: Add ASI page-table entry allocation functions Alexandre Chartre
2019-07-11 14:25 ` [RFC v2 07/26] mm/asi: Add ASI page-table entry set functions Alexandre Chartre
2019-07-11 14:25 ` [RFC v2 08/26] mm/asi: Functions to populate an ASI page-table from a VA range Alexandre Chartre
2019-07-11 14:25 ` [RFC v2 09/26] mm/asi: Helper functions to map module into ASI Alexandre Chartre
2019-07-11 14:25 ` [RFC v2 10/26] mm/asi: Keep track of VA ranges mapped in ASI page-table Alexandre Chartre
2019-07-11 14:25 ` [RFC v2 11/26] mm/asi: Functions to clear ASI page-table entries for a VA range Alexandre Chartre
2019-07-11 14:25 ` [RFC v2 12/26] mm/asi: Function to copy page-table entries for percpu buffer Alexandre Chartre
2019-07-11 14:25 ` [RFC v2 13/26] mm/asi: Add asi_remap() function Alexandre Chartre
2019-07-11 14:25 ` [RFC v2 14/26] mm/asi: Handle ASI mapped range leaks and overlaps Alexandre Chartre
2019-07-11 14:25 ` [RFC v2 15/26] mm/asi: Initialize the ASI page-table with core mappings Alexandre Chartre
2019-07-11 14:25 ` [RFC v2 16/26] mm/asi: Option to map current task into ASI Alexandre Chartre
2019-07-11 14:25 ` [RFC v2 17/26] rcu: Move tree.h static forward declarations to tree.c Alexandre Chartre
2019-07-11 14:25 ` [RFC v2 18/26] rcu: Make percpu rcu_data non-static Alexandre Chartre
2019-07-11 14:25 ` [RFC v2 19/26] mm/asi: Add option to map RCU data Alexandre Chartre
2019-07-11 14:25 ` [RFC v2 20/26] mm/asi: Add option to map cpu_hw_events Alexandre Chartre
2019-07-11 14:25 ` [RFC v2 21/26] mm/asi: Make functions to read cr3/cr4 ASI aware Alexandre Chartre
2019-07-11 14:25 ` [RFC v2 22/26] KVM: x86/asi: Introduce address_space_isolation module parameter Alexandre Chartre
2019-07-11 14:25 ` [RFC v2 23/26] KVM: x86/asi: Introduce KVM address space isolation Alexandre Chartre
2019-07-11 14:25 ` [RFC v2 24/26] KVM: x86/asi: Populate the KVM ASI page-table Alexandre Chartre
2019-07-11 14:25 ` [RFC v2 25/26] KVM: x86/asi: Switch to KVM address space on entry to guest Alexandre Chartre
2019-07-11 14:25 ` [RFC v2 26/26] KVM: x86/asi: Map KVM memslots and IO buses into KVM ASI Alexandre Chartre
2019-07-11 14:40 ` [RFC v2 00/27] Kernel Address Space Isolation Alexandre Chartre
2019-07-11 22:38 ` Dave Hansen
2019-07-12  8:09   ` Alexandre Chartre
2019-07-12 13:51     ` Dave Hansen
2019-07-12 14:06       ` Alexandre Chartre
2019-07-12 15:23         ` Thomas Gleixner
2019-07-12 10:44   ` Thomas Gleixner
2019-07-12 11:56     ` Alexandre Chartre
2019-07-12 12:50       ` Peter Zijlstra
2019-07-12 13:43         ` Alexandre Chartre
2019-07-12 13:58           ` Dave Hansen
2019-07-12 14:36           ` Andy Lutomirski
2019-07-14 18:17             ` Alexander Graf [this message]
2019-07-12 13:54         ` Dave Hansen
2019-07-12 15:20           ` Peter Zijlstra
2019-07-12 15:16         ` Thomas Gleixner
2019-07-12 16:37           ` Alexandre Chartre
2019-07-12 16:45             ` Andy Lutomirski
2019-07-14 17:11               ` Mike Rapoport
2019-07-12 19:06             ` Peter Zijlstra
2019-07-14 15:06               ` Andy Lutomirski
2019-07-15 10:33                 ` Peter Zijlstra
2019-07-12 19:48             ` Thomas Gleixner
2019-07-15  8:23               ` Alexandre Chartre
2019-07-15  8:28                 ` Thomas Gleixner
2019-07-12 16:00       ` Thomas Gleixner
2019-07-12 11:44 ` Peter Zijlstra
2019-07-12 12:17   ` Alexandre Chartre
2019-07-12 12:36     ` Peter Zijlstra
2019-07-12 12:47       ` Alexandre Chartre
2019-07-12 13:07         ` Peter Zijlstra
2019-07-12 13:46           ` Alexandre Chartre
2019-07-31 16:31             ` Dario Faggioli
2019-08-22 12:31               ` Alexandre Chartre
2020-07-01 13:55 hackapple
2020-07-01 14:00 黄金海
2020-07-01 14:02 黄金海

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=849b74ba-2ce4-04e9-557c-6d8c8ec29e16@amazon.com \
    --to=graf@amazon.com \
    --cc=alexandre.chartre@oracle.com \
    --cc=bp@alien8.de \
    --cc=dave.hansen@intel.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=hpa@zytor.com \
    --cc=jan.setjeeilers@oracle.com \
    --cc=jwadams@google.com \
    --cc=konrad.wilk@oracle.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=liran.alon@oracle.com \
    --cc=luto@kernel.org \
    --cc=mingo@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=peterz@infradead.org \
    --cc=pjt@google.com \
    --cc=rkrcmar@redhat.com \
    --cc=rppt@linux.vnet.ibm.com \
    --cc=tglx@linutronix.de \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).