From: "Kirill A. Shutemov" <kirill@shutemov.name>
To: Dave Hansen <dave.hansen@linux.intel.com>,
	Andy Lutomirski <luto@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Sean Christopherson <sean.j.christopherson@intel.com>,
	Vitaly Kuznetsov <vkuznets@redhat.com>,
	Wanpeng Li <wanpengli@tencent.com>,
	Jim Mattson <jmattson@google.com>, Joerg Roedel <joro@8bytes.org>
Cc: David Rientjes <rientjes@google.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Kees Cook <keescook@chromium.org>, Will Drewry <wad@chromium.org>,
	"Edgecombe, Rick P" <rick.p.edgecombe@intel.com>,
	"Kleen, Andi" <andi.kleen@intel.com>,
	x86@kernel.org, kvm@vger.kernel.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Subject: [RFC 00/16] KVM protected memory extension
Date: Fri, 22 May 2020 15:51:58 +0300	[thread overview]
Message-ID: <20200522125214.31348-1-kirill.shutemov@linux.intel.com> (raw)

== Background / Problem ==

There are a number of hardware features (MKTME, SEV) which protect guest
memory from some unauthorized host access. The patchset proposes a purely
software feature that mitigates some of the same host-side read-only
attacks.


== What does this set mitigate? ==

 - Host kernel "accidental" access to guest data (think speculation)

 - Host kernel induced access to guest data (write(fd, &guest_data_ptr, len))

 - Host userspace access to guest data (compromised qemu)

== What does this set NOT mitigate? ==

 - Full host kernel compromise.  Kernel will just map the pages again.

 - Hardware attacks


The patchset is RFC-quality: it works but has known issues that must be
addressed before it can be considered for merging.

We are looking for high-level feedback on the concept.  Some open
questions:

 - This protects from some kernel and host userspace read-only attacks,
   but does not place the host kernel outside the trust boundary. Is it
   still valuable?

 - Can this approach be used to avoid cache-coherency problems with
   hardware encryption schemes that repurpose physical bits?

 - The guest kernel must be modified for this to work.  Is that a deal
   breaker, especially for public clouds?

 - Are the costs of removing pages from the direct map too high to be
   feasible?

== Series Overview ==

The hardware features protect guest data by encrypting it and then
ensuring that only the right guest can decrypt it.  This has the
side-effect of making the kernel direct map and userspace mapping
(QEMU et al) useless.  But this teaches us something very useful:
neither the kernel nor userspace mappings are really necessary for normal
guest operation.

Instead of using encryption, this series simply unmaps the memory. One
advantage compared to allowing access to ciphertext is that it allows bad
accesses to be caught instead of simply reading garbage.

Protection from physical attacks needs to be provided by some other means.
On Intel platforms, (single-key) Total Memory Encryption (TME) provides
mitigation against physical attacks, such as DIMM interposers sniffing
memory bus traffic.

The patchset modifies both the host and the guest kernel. The guest OS must
enable the feature via hypercall and mark any memory range that has to be
shared with the host: DMA regions, bounce buffers, etc. SEV does this marking
via a bit in the guest’s page table, while this approach uses a hypercall.
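
As a rough illustration of the guest side (the hypercall numbers and helper
names below are placeholders for illustration only, not the ABI the series
defines):

#include <linux/mm.h>
#include <asm/kvm_para.h>

/* Illustrative placeholders; the series defines the real hypercall ABI. */
#define KVM_HC_ENABLE_MEM_PROTECTED	100	/* placeholder number */
#define KVM_HC_MEM_SHARE		101	/* placeholder number */

/* Guest asks the host to protect all of its memory. */
static int kvm_protect_all_memory(void)
{
	return kvm_hypercall0(KVM_HC_ENABLE_MEM_PROTECTED);
}

/* Guest marks a range (e.g. a bounce buffer) as shared with the host. */
static int kvm_share_range(phys_addr_t start, size_t size)
{
	return kvm_hypercall2(KVM_HC_MEM_SHARE, start >> PAGE_SHIFT,
			      size >> PAGE_SHIFT);
}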

To remove the userspace mapping, the series uses a trick similar to what NUMA
balancing does: memory that belongs to KVM memory slots is converted to
PROT_NONE: all existing entries are converted to PROT_NONE with mprotect()
and newly faulted-in pages get PROT_NONE from the updated vm_page_prot.
A new VMA flag -- VM_KVM_PROTECTED -- indicates that the pages in the
VMA must be treated in a special way in the GUP and fault paths. The flag
allows GUP to return the page even though it is mapped with PROT_NONE, but
only if the new GUP flag -- FOLL_KVM -- is specified. Any userspace access
to the memory results in SIGBUS. Any GUP access without FOLL_KVM results
in -EFAULT.
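
Roughly, the access policy in the GUP path looks like this (a sketch only;
the exact placement of the check in mm/gup.c is an assumption):

#include <linux/mm.h>

/* VMAs marked VM_KVM_PROTECTED are only accessible to GUP callers that
 * pass FOLL_KVM (both flags are introduced by this series); everyone
 * else gets -EFAULT from the GUP path. */
static bool kvm_protected_gup_allowed(struct vm_area_struct *vma,
				      unsigned int gup_flags)
{
	if (!(vma->vm_flags & VM_KVM_PROTECTED))
		return true;		/* ordinary VMA, no restriction */
	return gup_flags & FOLL_KVM;	/* KVM-internal callers only */
}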

Any anonymous page faulted into a VM_KVM_PROTECTED VMA gets removed from
the direct mapping with kernel_map_pages(). Note that kernel_map_pages() only
flushes the local TLB. I think it's a reasonable compromise between security
and performance.
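
In sketch form (the series' actual hooks into the fault path and the
huge-page handling are omitted):

#include <linux/mm.h>

/* Drop a just-faulted anonymous page from the direct mapping, and restore
 * it before the page goes back to the allocator.  Built on the existing
 * kernel_map_pages() interface; only the local TLB is flushed, as noted. */
static void protect_guest_page(struct page *page)
{
	kernel_map_pages(page, 1, 0);	/* unmap from direct mapping */
}

static void unprotect_guest_page(struct page *page)
{
	kernel_map_pages(page, 1, 1);	/* map back before reuse */
}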

Zapping the PTE brings the page back to the direct mapping after it is
cleared. At least for now, we don't remove file-backed pages from the direct
mapping: file-backed pages can still be accessed via read/write syscalls, and
handling that adds complexity.

Occasionally, the host kernel has to access guest memory that was not made
shared by the guest; this happens, for instance, during instruction
emulation. Normally that is done via copy_to/from_user(), which would now
fail with -EFAULT. We introduced a new pair of helpers: copy_to/from_guest().
The new helpers acquire the page via GUP, map it into the kernel address
space with a kmap_atomic()-style mechanism and only then copy the data.
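
A minimal sketch of the copy_from_guest() side (error handling and copies
that cross a page boundary are omitted; FOLL_KVM is the new GUP flag from
this series):

#include <linux/highmem.h>
#include <linux/mm.h>
#include <linux/string.h>

static int copy_from_guest_sketch(void *to, unsigned long from, int len)
{
	int offset = offset_in_page(from);
	struct page *page;
	void *vaddr;

	/* Pin the page despite PROT_NONE; only FOLL_KVM callers may. */
	if (get_user_pages_unlocked(from, 1, &page, FOLL_KVM) != 1)
		return -EFAULT;

	vaddr = kmap_atomic(page);	/* temporary kernel mapping */
	memcpy(to, vaddr + offset, len);
	kunmap_atomic(vaddr);
	put_page(page);

	return 0;
}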

For some instruction emulation, copying is not good enough: cmpxchg
emulation has to have direct access to the guest memory. __kvm_map_gfn()
is modified to accommodate this case.

The patchset is on top of v5.7-rc6 plus this patch:

https://lkml.kernel.org/r/20200402172507.2786-1-jimmyassarsson@gmail.com

== Open Issues ==

Unmapping the pages from the direct mapping brings a few issues that have
not been rectified yet:

 - Touching the direct mapping leads to fragmentation. We need to be able to
   recover from it. I have a buggy patch that aims at recovering 2M/1G pages.
   It has to be fixed and tested properly.

 - Page migration and KSM are not supported yet.

 - Live migration of a guest would require a new flow. It is not yet clear
   what it would look like.

 - The feature interferes with NUMA balancing. It is not yet clear whether
   it's possible to make them work together.

 - Guests have no mechanism to ensure that even a well-behaving host has
   unmapped its private data.  With SEV, for instance, the guest only has
   to trust the hardware to encrypt a page after the C bit is set in a
   guest PTE.  A mechanism for a guest to query the host mapping state, or
   to constantly assert the intent for a page to be Private would be
   valuable.

Kirill A. Shutemov (16):
  x86/mm: Move force_dma_unencrypted() to common code
  x86/kvm: Introduce KVM memory protection feature
  x86/kvm: Make DMA pages shared
  x86/kvm: Use bounce buffers for KVM memory protection
  x86/kvm: Make VirtIO use DMA API in KVM guest
  KVM: Use GUP instead of copy_from/to_user() to access guest memory
  KVM: mm: Introduce VM_KVM_PROTECTED
  KVM: x86: Use GUP for page walk instead of __get_user()
  KVM: Protected memory extension
  KVM: x86: Enabled protected memory extension
  KVM: Rework copy_to/from_guest() to avoid direct mapping
  x86/kvm: Share steal time page with host
  x86/kvmclock: Share hvclock memory with the host
  KVM: Introduce gfn_to_pfn_memslot_protected()
  KVM: Handle protected memory in __kvm_map_gfn()/__kvm_unmap_gfn()
  KVM: Unmap protected pages from direct mapping

 arch/powerpc/kvm/book3s_64_mmu_hv.c    |   2 +-
 arch/powerpc/kvm/book3s_64_mmu_radix.c |   2 +-
 arch/x86/Kconfig                       |  11 +-
 arch/x86/include/asm/io.h              |   6 +-
 arch/x86/include/asm/kvm_para.h        |   5 +
 arch/x86/include/asm/pgtable_types.h   |   1 +
 arch/x86/include/uapi/asm/kvm_para.h   |   3 +-
 arch/x86/kernel/kvm.c                  |  24 +-
 arch/x86/kernel/kvmclock.c             |   2 +-
 arch/x86/kernel/pci-swiotlb.c          |   3 +-
 arch/x86/kvm/cpuid.c                   |   3 +
 arch/x86/kvm/mmu/mmu.c                 |   6 +-
 arch/x86/kvm/mmu/paging_tmpl.h         |  10 +-
 arch/x86/kvm/x86.c                     |   9 +
 arch/x86/mm/Makefile                   |   2 +
 arch/x86/mm/ioremap.c                  |  16 +-
 arch/x86/mm/mem_encrypt.c              |  50 ----
 arch/x86/mm/mem_encrypt_common.c       |  62 ++++
 arch/x86/mm/pat/set_memory.c           |   8 +
 drivers/virtio/virtio_ring.c           |   4 +
 include/linux/kvm_host.h               |  14 +-
 include/linux/mm.h                     |  12 +
 include/uapi/linux/kvm_para.h          |   5 +-
 mm/gup.c                               |  20 +-
 mm/huge_memory.c                       |  29 +-
 mm/ksm.c                               |   3 +
 mm/memory.c                            |  16 +
 mm/mmap.c                              |   3 +
 mm/mprotect.c                          |   1 +
 mm/rmap.c                              |   4 +
 virt/kvm/async_pf.c                    |   4 +-
 virt/kvm/kvm_main.c                    | 390 +++++++++++++++++++++++--
 32 files changed, 627 insertions(+), 103 deletions(-)
 create mode 100644 arch/x86/mm/mem_encrypt_common.c

-- 
2.26.2



Thread overview: 62+ messages
2020-05-22 12:51 Kirill A. Shutemov [this message]
2020-05-22 12:51 ` [RFC 01/16] x86/mm: Move force_dma_unencrypted() to common code Kirill A. Shutemov
2020-05-22 12:52 ` [RFC 02/16] x86/kvm: Introduce KVM memory protection feature Kirill A. Shutemov
2020-05-25 14:58   ` Vitaly Kuznetsov
2020-05-25 15:15     ` Kirill A. Shutemov
2020-05-27  5:03       ` Sean Christopherson
2020-05-27  8:39         ` Vitaly Kuznetsov
2020-05-27  8:52           ` Sean Christopherson
2020-06-03  2:09           ` Huang, Kai
2020-06-03 11:14             ` Vitaly Kuznetsov
2020-05-22 12:52 ` [RFC 03/16] x86/kvm: Make DMA pages shared Kirill A. Shutemov
2020-05-22 12:52 ` [RFC 04/16] x86/kvm: Use bounce buffers for KVM memory protection Kirill A. Shutemov
2020-05-22 12:52 ` [RFC 05/16] x86/kvm: Make VirtIO use DMA API in KVM guest Kirill A. Shutemov
2020-05-22 12:52 ` [RFC 06/16] KVM: Use GUP instead of copy_from/to_user() to access guest memory Kirill A. Shutemov
2020-05-25 15:08   ` Vitaly Kuznetsov
2020-05-25 15:17     ` Kirill A. Shutemov
2020-06-01 16:35       ` Paolo Bonzini
2020-06-02 13:33         ` Kirill A. Shutemov
2020-05-26  6:14   ` Mike Rapoport
2020-05-26 21:56     ` Kirill A. Shutemov
2020-05-29 15:24   ` Kees Cook
2020-05-22 12:52 ` [RFC 07/16] KVM: mm: Introduce VM_KVM_PROTECTED Kirill A. Shutemov
2020-05-26  6:15   ` Mike Rapoport
2020-05-26 22:01     ` Kirill A. Shutemov
2020-05-26  6:40   ` John Hubbard
2020-05-26 22:04     ` Kirill A. Shutemov
2020-05-22 12:52 ` [RFC 08/16] KVM: x86: Use GUP for page walk instead of __get_user() Kirill A. Shutemov
2020-05-22 12:52 ` [RFC 09/16] KVM: Protected memory extension Kirill A. Shutemov
2020-05-25 15:26   ` Vitaly Kuznetsov
2020-05-25 15:34     ` Kirill A. Shutemov
2020-06-03  1:34       ` Huang, Kai
2020-05-22 12:52 ` [RFC 10/16] KVM: x86: Enabled protected " Kirill A. Shutemov
2020-05-25 15:26   ` Vitaly Kuznetsov
2020-05-26  6:16   ` Mike Rapoport
2020-05-26 21:58     ` Kirill A. Shutemov
2020-05-22 12:52 ` [RFC 11/16] KVM: Rework copy_to/from_guest() to avoid direct mapping Kirill A. Shutemov
2020-05-22 12:52 ` [RFC 12/16] x86/kvm: Share steal time page with host Kirill A. Shutemov
2020-05-22 12:52 ` [RFC 13/16] x86/kvmclock: Share hvclock memory with the host Kirill A. Shutemov
2020-05-25 15:22   ` Vitaly Kuznetsov
2020-05-25 15:25     ` Kirill A. Shutemov
2020-05-25 15:42       ` Vitaly Kuznetsov
2020-05-22 12:52 ` [RFC 14/16] KVM: Introduce gfn_to_pfn_memslot_protected() Kirill A. Shutemov
2020-05-22 12:52 ` [RFC 15/16] KVM: Handle protected memory in __kvm_map_gfn()/__kvm_unmap_gfn() Kirill A. Shutemov
2020-05-22 12:52 ` [RFC 16/16] KVM: Unmap protected pages from direct mapping Kirill A. Shutemov
2020-05-26  6:16   ` Mike Rapoport
2020-05-26 22:10     ` Kirill A. Shutemov
2020-05-25  5:27 ` [RFC 00/16] KVM protected memory extension Kirill A. Shutemov
2020-05-25 13:47 ` Liran Alon
2020-05-25 14:46   ` Kirill A. Shutemov
2020-05-25 15:56     ` Liran Alon
2020-05-26  6:17   ` Mike Rapoport
2020-05-26 10:16     ` Liran Alon
2020-05-26 11:38       ` Mike Rapoport
2020-05-27 15:45         ` Dave Hansen
2020-05-27 21:22           ` Mike Rapoport
2020-06-04 15:15 ` Marc Zyngier
2020-06-04 15:48   ` Sean Christopherson
2020-06-04 16:27     ` Marc Zyngier
2020-06-04 16:35     ` Will Deacon
2020-06-04 19:09       ` Nakajima, Jun
2020-06-04 21:03         ` Jim Mattson
2020-06-04 23:29           ` Nakajima, Jun
