KVM Archive on lore.kernel.org
From: Andy Lutomirski <luto@kernel.org>
To: "Kirill A. Shutemov" <kirill@shutemov.name>,
	Sean Christopherson <seanjc@google.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Jim Mattson <jmattson@google.com>,
	David Rientjes <rientjes@google.com>,
	"Edgecombe, Rick P" <rick.p.edgecombe@intel.com>,
	"Kleen, Andi" <andi.kleen@intel.com>,
	"Yamahata, Isaku" <isaku.yamahata@intel.com>,
	Erdem Aktas <erdemaktas@google.com>,
	Steve Rutherford <srutherford@google.com>,
	Peter Gonda <pgonda@google.com>,
	David Hildenbrand <david@redhat.com>,
	Chao Peng <chao.p.peng@linux.intel.com>,
	x86@kernel.org, kvm@vger.kernel.org, linux-mm@kvack.org,
Subject: Re: [RFCv2 13/13] KVM: unmap guest memory using poisoned pages
Date: Fri, 4 Jun 2021 10:16:49 -0700
Message-ID: <b367e721-d613-c171-20e7-fe9ea096e411@kernel.org> (raw)
In-Reply-To: <20210521123148.a3t4uh4iezm6ax47@box>

On 5/21/21 5:31 AM, Kirill A. Shutemov wrote:
> Hi Sean,
> The core patch of the approach we've discussed before is below. It
> introduces a new page type with the required semantics.
> The full patchset can be found here:
>  git://git.kernel.org/pub/scm/linux/kernel/git/kas/linux.git kvm-unmapped-guest-only
> but only the patch below is relevant for TDX. QEMU patch is attached.
> CONFIG_HAVE_KVM_PROTECTED_MEMORY has to be changed to what is appropriate
> for TDX and FOLL_GUEST has to be used in hva_to_pfn_slow() when running
> TDX guest.
> When a page gets inserted into the private sept we must make sure it is
> PageGuest(), or SIGBUS otherwise. Inserting a PageGuest() page into shared
> is fine, but the page will not be accessible from userspace.

I may have missed a detail, and I think Sean almost said this, but:

I don't think that a single bit like this is sufficient.  A KVM host can
contain more than one TDX guest, and, to reliably avoid machine checks,
I think we need to make sure that secure pages are only ever mapped into
the guest to which they belong, as well as to the right GPA.  If a host
userspace process using KVM creates a PageGuest page, what stops it from
being mapped into two different guests?  The refcount?
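
To make the concern concrete, here is a toy userspace C model (not kernel
code) of the invariant a single PageGuest bit cannot express: each secure
page is bound to exactly one guest and one GPA, and a mapping request from
any other guest or GPA is refused.  Every name here (struct secure_page,
secure_page_assign(), secure_page_may_map()) is hypothetical and exists
only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-page record: which guest owns the page, and where. */
struct secure_page {
	uint32_t owner_id;	/* 0 means "not assigned to any guest" */
	uint64_t gpa;		/* guest-physical address it was assigned to */
};

/* Bind an unowned page to (guest_id, gpa); fails if it is already bound. */
static bool secure_page_assign(struct secure_page *p, uint32_t guest_id,
			       uint64_t gpa)
{
	if (guest_id == 0 || p->owner_id != 0)
		return false;
	p->owner_id = guest_id;
	p->gpa = gpa;
	return true;
}

/* Allow a mapping only for the owning guest at the recorded GPA. */
static bool secure_page_may_map(const struct secure_page *p,
				uint32_t guest_id, uint64_t gpa)
{
	return p->owner_id != 0 && p->owner_id == guest_id && p->gpa == gpa;
}
```

A boolean page flag corresponds to checking only "owner_id != 0", which is
why it cannot tell two guests apart.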

After contemplating this a bit, I have a somewhat different suggestion,
at least for TDX.  In TDX, a guest logically has two entirely separate
physical address spaces: the secure address space and the shared address
space.  They are managed separately, they have different contents, etc.
Normally one would expect a given numerical address (with the secure
selector bit masked off) to only exist in one of the two spaces, but
nothing in the architecture fundamentally requires this.  And switching
a page between the secure and shared states is a heavyweight operation.

So what if KVM actually treated them completely separately?  The secure
address space doesn't have any of the complex properties that the shared
address space has.  There are no paravirt pages there, no KSM pages
there, no forked pages there, no MAP_ANONYMOUS pages there, no MMIO
pages there, etc.  There could be a KVM API to allocate and deallocate a
secure page, and that would be it.
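
As a rough sketch of how small such an interface could be, the toy
userspace C model below keeps a per-guest table of secure pages keyed by
guest frame number.  It is not a proposed KVM ABI: toy_secure_alloc(),
toy_secure_free(), struct toy_guest and the fixed-size table are all
assumptions made for illustration, and calloc() merely stands in for a
kernel allocation.  The point is that the caller names a GFN and never
receives a pointer to the backing memory.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define PAGE_SIZE_BYTES		4096
#define MAX_SECURE_PAGES	64

struct secure_slot {
	int in_use;
	uint64_t gfn;
	void *backing;		/* stands in for a kernel-owned page */
};

struct toy_guest {
	struct secure_slot slots[MAX_SECURE_PAGES];
};

static struct secure_slot *find_slot(struct toy_guest *g, uint64_t gfn)
{
	for (int i = 0; i < MAX_SECURE_PAGES; i++)
		if (g->slots[i].in_use && g->slots[i].gfn == gfn)
			return &g->slots[i];
	return NULL;
}

/* Allocate a secure page at gfn; returns 0 or a negative errno value. */
static int toy_secure_alloc(struct toy_guest *g, uint64_t gfn)
{
	if (find_slot(g, gfn))
		return -EEXIST;
	for (int i = 0; i < MAX_SECURE_PAGES; i++) {
		struct secure_slot *s = &g->slots[i];

		if (s->in_use)
			continue;
		s->backing = calloc(1, PAGE_SIZE_BYTES);
		if (!s->backing)
			return -ENOMEM;
		s->gfn = gfn;
		s->in_use = 1;
		return 0;
	}
	return -ENOSPC;
}

/* Release the secure page at gfn; returns 0 or -ENOENT if none exists. */
static int toy_secure_free(struct toy_guest *g, uint64_t gfn)
{
	struct secure_slot *s = find_slot(g, gfn);

	if (!s)
		return -ENOENT;
	free(s->backing);
	s->backing = NULL;
	s->in_use = 0;
	return 0;
}
```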

This could be inefficient if a guest does a bunch of MapGPA calls and
makes the shared address space highly sparse.  That could potentially be
mitigated by allowing a (shared) user memory region to come with a
bitmap of which pages are actually mapped.  Pages in the unmapped areas
are holes, and KVM won't look through to the QEMU page tables.
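
The bitmap idea might look roughly like the following toy userspace C
model, with one bit per page in a shared region.  A clear bit marks a
hole, for which the lookup reports "not mapped" without consulting any
userspace mapping.  struct shared_region and both helpers are
hypothetical names, not existing KVM interfaces.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

#define BITS_PER_WORD	(sizeof(unsigned long) * CHAR_BIT)

struct shared_region {
	uint64_t base_gfn;	/* first guest frame covered by the region */
	uint64_t npages;	/* number of pages in the region */
	unsigned long *mapped;	/* one bit per page; set = actually mapped */
};

/* Mark a page as mapped or as a hole; returns false if gfn is out of range. */
static bool region_set_mapped(struct shared_region *r, uint64_t gfn, bool on)
{
	if (gfn < r->base_gfn || gfn - r->base_gfn >= r->npages)
		return false;

	uint64_t idx = gfn - r->base_gfn;
	unsigned long mask = 1UL << (idx % BITS_PER_WORD);

	if (on)
		r->mapped[idx / BITS_PER_WORD] |= mask;
	else
		r->mapped[idx / BITS_PER_WORD] &= ~mask;
	return true;
}

/* Holes and out-of-range frames both report "not mapped". */
static bool region_is_mapped(const struct shared_region *r, uint64_t gfn)
{
	if (gfn < r->base_gfn || gfn - r->base_gfn >= r->npages)
		return false;

	uint64_t idx = gfn - r->base_gfn;

	return (r->mapped[idx / BITS_PER_WORD] >> (idx % BITS_PER_WORD)) & 1UL;
}
```

In this model a MapGPA-style conversion of a page to secure would clear
its bit in the shared region, so a sparse shared space costs one bit per
page rather than a separate memory region per fragment.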

Does this make any sense?  It would completely eliminate all this
PageGuest stuff -- a secure page would be backed by a *kernel*
allocation (potentially a pageable allocation a la tmpfs if TDX ever
learns how to swap, but that's not currently possible nor would it be
user visible), so the kernel can straightforwardly guarantee that user
pagetables will not contain references to secure pages and that GUP will
not be called on them.

I'm not sure how this would map to SEV.  Certainly, with some degree of
host vs guest cooperation, SEV could work the same way, and the
migration support is pushing SEV in that direction.
