linux-kernel.vger.kernel.org archive mirror
From: Liran Alon <liran.alon@oracle.com>
To: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Marius Hillenbrand <mhillenb@amazon.de>,
	kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	kernel-hardening@lists.openwall.com, linux-mm@kvack.org,
	Alexander Graf <graf@amazon.de>,
	David Woodhouse <dwmw@amazon.co.uk>
Subject: Re: [RFC 00/10] Process-local memory allocations for hiding KVM secrets
Date: Thu, 13 Jun 2019 13:54:51 +0300
Message-ID: <65D4DBEB-5A9A-457D-909B-2D31A3031607@oracle.com>
In-Reply-To: <20190612182550.GI20308@linux.intel.com>



> On 12 Jun 2019, at 21:25, Sean Christopherson <sean.j.christopherson@intel.com> wrote:
> 
> On Wed, Jun 12, 2019 at 07:08:24PM +0200, Marius Hillenbrand wrote:
>> The Linux kernel has a global address space that is the same for any
>> kernel code. This address space becomes a liability in a world with
>> processor information leak vulnerabilities, such as L1TF. With the right
>> cache load gadget, an attacker-controlled hyperthread pair can leak
>> arbitrary data via L1TF. Disabling hyperthreading is one recommended
>> mitigation, but it comes with a large performance hit for a wide range
>> of workloads.
>> 
>> An alternative mitigation is to not make certain data in the kernel
>> globally visible, but only when the kernel executes in the context of
>> the process where this data belongs to.
>> 
>> This patch series proposes to introduce a region for what we call
>> process-local memory into the kernel's virtual address space. Page
>> tables and mappings in that region will be exclusive to one address
>> space, instead of implicitly shared between all kernel address spaces.
>> Any data placed in that region will be out of reach of cache load
>> gadgets that execute in different address spaces. To implement
>> process-local memory, we introduce a new interface kmalloc_proclocal() /
>> kfree_proclocal() that allocates and maps pages exclusively into the
>> current kernel address space. As a first use case, we move architectural
>> state of guest CPUs in KVM out of reach of other kernel address spaces.
> 
> Can you briefly describe what types of attacks this is intended to
> mitigate?  E.g. guest-guest, userspace-guest, etc...  I don't want to
> make comments based on my potentially bad assumptions.

I think I can assist in the explanation.

Consider the following scenario:
1) Hyperthread A on a CPU core runs in guest mode and triggers a VMExit, which is handled by the host kernel.
While hyperthread A runs the VMExit handler, it populates the CPU core's caches / internal resources (e.g. MDS buffers)
with sensitive data that it can access speculatively or architecturally.
2) While hyperthread A is running in the host kernel, hyperthread B on the same CPU core runs in guest mode and uses
a CPU speculative execution vulnerability to leak the sensitive host data that hyperthread A placed
in the CPU core's caches / internal resources.

Current CPU microcode mitigations (L1D/MDS flush) only handle the case of a single hyperthread and don’t
provide a mechanism to mitigate this hyperthreading attack scenario.

Assuming there is some guest-triggerable speculative load gadget in some VMExit path,
it can be used to force any data that is mapped into the kernel address space to be loaded into a CPU resource that is subject to leaking.
Therefore, there have been multiple attempts to reduce the amount of sensitive information mapped into the kernel address space
that is accessible from this VMExit path.

One attempt was XPFO, which removes from the kernel direct map any page that is currently used only by userspace.
Unfortunately, XPFO exhibits multiple performance issues that *currently* make it impractical, as far as I know.

Another attempt is this patch series, which removes the state of other guests' vCPUs from the host kernel address space of a given vCPU thread.
That is very specific, but I personally have additional ideas on how this patch series can be used further.
For example, vhost-net needs to kmap the entire guest memory into kernel space to write ingress packet data into guest memory.
Thus, a vCPU thread's kernel address space now maps the entire memory of other guests, which can be leaked using the technique described above.
Therefore, it should be useful to also make this kmap() happen in the process-local kernel virtual address region.
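
As an illustration only, the visibility argument above can be written as a toy model. It is not kernel code and not part of the patch series: the class and attribute names are hypothetical, and each "address space" is just a pair of dictionaries, one shared by all address spaces and one private to a single address space.

```python
class KernelAddressSpace:
    """Toy model of one kernel address space (one per process / mm)."""

    # Mappings visible from every kernel address space (e.g. the direct map).
    shared = {}

    def __init__(self, name):
        self.name = name
        # Mappings that exist only in this address space (process-local).
        self.proclocal = {}

    def reachable(self, addr):
        """True if a (speculative) load in this address space finds a mapping."""
        return addr in self.shared or addr in self.proclocal


vm_a = KernelAddressSpace("vcpu-thread-of-guest-A")
vm_b = KernelAddressSpace("vcpu-thread-of-guest-B")

# Hypothetical addresses: one globally mapped object, one process-local object.
KernelAddressSpace.shared[0x1000] = "globally mapped kernel data"
vm_a.proclocal[0x2000] = "guest A vCPU register state"
```

In this model a load gadget running in `vm_b` can reach `0x1000` but finds no mapping for `0x2000`, which is the property the process-local region is meant to provide. Data left in the shared dictionary stays reachable from everywhere, which is why a shared kmap() of guest memory would undo the benefit.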

One could argue, however, that there is still a much bigger issue because of the kernel direct map, which maps all physical pages that the kernel
manages (i.e. that have a struct page) into the kernel virtual address space. All of those pages can theoretically be leaked.
However, this could be handled by complementary techniques such as booting the host kernel with “mem=X” and mapping guest memory
by directly mmap()ing the relevant portion of /dev/mem.
This is probably what AWS does, given these upstream KVM patches they have contributed:
bd53cb35a3e9 X86/KVM: Handle PFNs outside of kernel reach when touching GPTEs
e45adf665a53 KVM: Introduce a new guest mapping API
0c55671f84ff kvm, x86: Properly check whether a pfn is an MMIO or not
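
A minimal userspace sketch of the “mem=X” idea, under stated assumptions: the host was booted with “mem=X”, so physical memory above X has no struct page and is absent from the kernel direct map, and a VMM maps a slice of it through /dev/mem. The helper name map_phys_range() is hypothetical, the physical base and length are left to the caller, and kernel configuration and privileges may restrict access to /dev/mem. This is not taken from any existing VMM.

```python
import mmap
import os


def map_phys_range(dev_path, phys_base, length):
    """Map `length` bytes at byte offset `phys_base` of `dev_path` (e.g. /dev/mem).

    Hypothetical helper for illustration. For /dev/mem, the file offset is the
    physical address, so `phys_base` must be page-aligned.
    """
    if phys_base % mmap.ALLOCATIONGRANULARITY != 0 or length <= 0:
        raise ValueError("phys_base must be page-aligned and length positive")
    fd = os.open(dev_path, os.O_RDWR)
    try:
        # MAP_SHARED so that stores go to the underlying memory, not a copy.
        return mmap.mmap(
            fd,
            length,
            flags=mmap.MAP_SHARED,
            prot=mmap.PROT_READ | mmap.PROT_WRITE,
            offset=phys_base,
        )
    finally:
        # The mmap object keeps its own duplicate of the descriptor.
        os.close(fd)
```

A VMM would call this with a base above the “mem=X” limit and then hand the resulting userspace mapping to KVM as guest memory. The concrete base address and size depend on the host's physical memory layout and are deliberately not shown here.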

Also note that when using such a “mem=X” technique, you can also avoid performance penalties introduced by CPU microcode mitigations.
E.g. you can avoid doing an L1D flush on VMEntry if the VMExit handler ran only in the kernel and didn’t context-switch, as you assume the kernel address
space doesn’t map any host-sensitive data.

It’s also worth mentioning another alternative to this “mem=X” technique that I have attempted:
creating an isolated address space that is only used when running KVM VMExit handlers.
For more information, refer to:
https://lkml.org/lkml/2019/5/13/515
(See some of my comments on that thread)

This is my 2 cents on this, at least.

-Liran


