KVM Archive on lore.kernel.org
From: "Nakajima, Jun" <jun.nakajima@intel.com>
To: Will Deacon <will@kernel.org>
Cc: "Christopherson, Sean J" <sean.j.christopherson@intel.com>,
	Marc Zyngier <maz@kernel.org>,
	"Kirill A. Shutemov" <kirill@shutemov.name>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Andy Lutomirski <luto@kernel.org>,
	"Peter Zijlstra" <peterz@infradead.org>,
	Paolo Bonzini <pbonzini@redhat.com>,
	"Vitaly Kuznetsov" <vkuznets@redhat.com>,
	Wanpeng Li <wanpengli@tencent.com>,
	"Jim Mattson" <jmattson@google.com>,
	Joerg Roedel <joro@8bytes.org>,
	"David Rientjes" <rientjes@google.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Kees Cook <keescook@chromium.org>, Will Drewry <wad@chromium.org>,
	"Edgecombe, Rick P" <rick.p.edgecombe@intel.com>,
	"Kleen, Andi" <andi.kleen@intel.com>,
	"x86@kernel.org" <x86@kernel.org>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	"kernel-team@android.com" <kernel-team@android.com>
Subject: Re: [RFC 00/16] KVM protected memory extension
Date: Thu, 4 Jun 2020 19:09:19 +0000
Message-ID: <6DBAB6A4-A1F9-40E9-B81B-74182DDCF939@intel.com>
In-Reply-To: <20200604163532.GE3650@willie-the-truck>

> 
> On Jun 4, 2020, at 9:35 AM, Will Deacon <will@kernel.org> wrote:
> 
> Hi Sean,
> 
> On Thu, Jun 04, 2020 at 08:48:35AM -0700, Sean Christopherson wrote:
>> On Thu, Jun 04, 2020 at 04:15:23PM +0100, Marc Zyngier wrote:
>>> On Fri, 22 May 2020 15:51:58 +0300
>>> "Kirill A. Shutemov" <kirill@shutemov.name> wrote:
>>> 
>>>> == Background / Problem ==
>>>> 
>>>> There are a number of hardware features (MKTME, SEV) which protect guest
>>>> memory from some unauthorized host access. The patchset proposes a purely
>>>> software feature that mitigates some of the same host-side read-only
>>>> attacks.
>>>> 
>>>> 
>>>> == What does this set mitigate? ==
>>>> 
>>>> - Host kernel "accidental" access to guest data (think speculation)
>>>> 
>>>> - Host kernel induced access to guest data (write(fd, &guest_data_ptr, len))
>>>> 
>>>> - Host userspace access to guest data (compromised qemu)
>>>> 
>>>> == What does this set NOT mitigate? ==
>>>> 
>>>> - Full host kernel compromise.  Kernel will just map the pages again.
>>>> 
>>>> - Hardware attacks
>>> 
>>> Just as a heads up, we (the Android kernel team) are currently
>>> involved in something pretty similar for KVM/arm64 in order to bring
>>> some level of confidentiality to guests.
>>> 
>>> The main idea is to de-privilege the host kernel by wrapping it in its
>>> own nested set of page tables which allows us to remove memory
>>> allocated to guests on a per-page basis. The core hypervisor runs more
>>> or less independently at its own privilege level. It still is KVM
>>> though, as we don't intend to reinvent the wheel.
>>> 
>>> Will has written a much more lingo-heavy description here:
>>> https://lore.kernel.org/kvmarm/20200327165935.GA8048@willie-the-truck/
>> 

We (the Intel virtualization team) are also working on something similar, prototyping to meet such requirements, i.e. "some level of confidentiality to guests". Linux/KVM is the host, and Kirill's patches are helpful for removing the mappings from the host to achieve memory isolation of a guest. But it's not easy to prove that there are no other mappings.

To raise the level of security, our idea is to de-privilege the host kernel solely to enforce memory isolation, using EPT (Extended Page Tables) to virtualize the physical memory of the guest (the host kernel in this case); almost everything else is passed through. The EPT for the host kernel excludes the memory of any guest that holds confidential information. So the host kernel shouldn't cause VM exits as long as it is behaving well (CPUID still causes a VM exit, though).
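
As a rough illustration of the paragraph above, here is a toy model in Python of a host EPT that starts as an identity (passthrough) mapping and loses the frames assigned to a confidential guest. It is only a sketch: the names HostEpt, donate_to_guest and host_access are hypothetical and do not correspond to anything in KVM or in the prototype, and real EPT works on page-table entries with permissions rather than a set of frame numbers.

```python
class EptViolation(Exception):
    """Raised when the de-privileged host touches a frame missing from its EPT."""


class HostEpt:
    """Toy host EPT: identity-mapped, minus frames given to guests (hypothetical)."""

    def __init__(self, total_frames):
        # Passthrough by default: every physical frame is visible to the host.
        self.mapped = set(range(total_frames))

    def donate_to_guest(self, frames):
        """Remove frames from the host's view before a guest uses them."""
        frames = set(frames)
        missing = frames - self.mapped
        if missing:
            raise ValueError(f"frames not owned by host: {sorted(missing)}")
        self.mapped -= frames

    def reclaim_from_guest(self, frames):
        """Give frames back to the host (scrubbing is assumed, not modelled)."""
        self.mapped |= set(frames)

    def host_access(self, frame):
        """Identity translation if mapped; otherwise this would be a VM exit."""
        if frame not in self.mapped:
            raise EptViolation(frame)
        return frame


ept = HostEpt(total_frames=8)
ept.donate_to_guest([2, 3])
ept.host_access(1)          # well-behaved access: no exit
try:
    ept.host_access(2)      # guest-owned frame: exits to the hypervisor
except EptViolation as exc:
    print("EPT violation on frame", exc.args[0])
```

The point of the model is that a well-behaved host never hits the violation path, so the cost of isolation is paid only on misbehaviour (and on unconditional exits such as CPUID, which are not modelled here).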

When control enters KVM, we go back to privileged (hypervisor or root) mode, and it works as it does today. Once a VM exit happens, we stay in root mode as long as the exit can be handled within KVM. If we need to depend on the host kernel, we de-privilege the host kernel (i.e. VM enter). Yes, it sounds ugly.
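
The mode switching described above can be sketched as a small trace function. This is a simplified model under stated assumptions, not the prototype's code: the Mode enum, trace_exits and the exit names are hypothetical, and which exits KVM can actually handle on its own is left as an input rather than claimed.

```python
from enum import Enum


class Mode(Enum):
    ROOT = "root"          # privileged: KVM itself
    NON_ROOT = "non-root"  # de-privileged host kernel


def trace_exits(exits, handled_in_kvm):
    """Return (exit, mode) pairs visited while servicing each VM exit.

    exits          -- iterable of exit names (hypothetical labels)
    handled_in_kvm -- set of exits KVM can service without the host kernel
    """
    trace = []
    for reason in exits:
        # Every VM exit lands in root mode first.
        trace.append((reason, Mode.ROOT))
        if reason not in handled_in_kvm:
            # KVM needs the host kernel: VM enter into the de-privileged host.
            trace.append((reason, Mode.NON_ROOT))
    return trace


for reason, mode in trace_exits(["exit_a", "exit_b"], handled_in_kvm={"exit_a"}):
    print(reason, mode.value)
```

Only exits outside handled_in_kvm pay for the extra transition, which is why the size of that set matters for the cost of this approach.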

There are cleaner (but more expensive) approaches, and we are collecting data at this point. For example, we could run the host kernel (like Xen dom0) on top of a thin(?) hypervisor that consists of KVM and a minimally configured Linux.

> 
>> IIUC, in this mode, the host kernel runs at EL1?  And to switch to a guest
>> it has to bounce through EL2, which is KVM, or at least a chunk of KVM?
>> I assume the EL1->EL2->EL1 switch is done by trapping an exception of some
>> form?
> 
> Yes, and this is actually the way that KVM works on some Arm CPUs today,
> as the original virtualisation extensions in the Armv8 architecture do
> not make it possible to run the kernel directly at EL2 (for example, there
> is only one page-table base register). This was later addressed in the
> architecture by the "Virtualisation Host Extensions (VHE)", and so KVM
> supports both options.
> 
> With non-VHE today, there is a small amount of "world switch" code at
> EL2 which is installed by the host kernel and provides a way to transition
> between the host and the guest. If the host needs to do something at EL2
> (e.g. privileged TLB invalidation), then it makes a hypercall (HVC instruction)
> via the kvm_call_hyp() macro (and this ends up just being a function call
> for VHE).
> 
>> If all of the above are "yes", does KVM already have the necessary logic to
>> perform the EL1->EL2->EL1 switches, or is that being added as part of the
>> de-privileging effort?
> 
> The logic is there as part of the non-VHE support code, but it's not great
> from a security angle. For example, the guest stage-2 page-tables are still
> allocated by the host, the host has complete access to guest and hypervisor
> memory (including hypervisor text) and things like kvm_call_hyp() are a bit
> of an open door. We're working on making the EL2 code more self contained,
> so that after the host has initialised KVM, it can shut the door and the
> hypervisor can install a stage-2 translation over the host, which limits its
> access to hypervisor and guest memory. There will clearly be IOMMU work as
> well to prevent DMA attacks.

Sounds interesting. 

--- 
Jun
Intel Open Source Technology Center





