KVM Archive on lore.kernel.org
 help / color / Atom feed
From: Sean Christopherson <seanjc@google.com>
To: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Andy Lutomirski <luto@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Jim Mattson <jmattson@google.com>,
	David Rientjes <rientjes@google.com>,
	"Edgecombe, Rick P" <rick.p.edgecombe@intel.com>,
	"Kleen, Andi" <andi.kleen@intel.com>,
	"Yamahata, Isaku" <isaku.yamahata@intel.com>,
	Erdem Aktas <erdemaktas@google.com>,
	Steve Rutherford <srutherford@google.com>,
	Peter Gonda <pgonda@google.com>,
	David Hildenbrand <david@redhat.com>,
	Chao Peng <chao.p.peng@linux.intel.com>,
	x86@kernel.org, kvm@vger.kernel.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org
Subject: Re: [RFCv2 13/13] KVM: unmap guest memory using poisoned pages
Date: Wed, 2 Jun 2021 17:51:02 +0000
Message-ID: <YLfFBgPeWZ91TfH7@google.com> (raw)
In-Reply-To: <20210531200712.qjxghakcaj4s6ara@box.shutemov.name>

On Mon, May 31, 2021, Kirill A. Shutemov wrote:
> On Wed, May 26, 2021 at 07:46:52PM +0000, Sean Christopherson wrote:
> > On Fri, May 21, 2021, Kirill A. Shutemov wrote:
> > > Inserting PageGuest() into shared is fine, but the page will not be accessible
> > > from userspace.
> > 
> > Even if it can be functionally fine, I don't think we want to allow KVM to map
> > PageGuest() as shared memory.  The only reason to map memory shared is to share
> > it with something, e.g. the host, that doesn't have access to private memory, so
> > I can't envision a use case.
> > 
> > On the KVM side, it's trivially easy to omit FOLL_GUEST for shared memory, while
> > always passing FOLL_GUEST would require manually zapping.  Manual zapping isn't
> > a big deal, but I do think it can be avoided if userspace must either remap the
> > hva or define a new KVM memslot (new gpa->hva), both of which will automatically
> > zap any existing translations.
> > 
> > Aha, thought of a concrete problem.  If KVM maps PageGuest() into shared memory,
> > then KVM must ensure that the page is not mapped private via a different hva/gpa,
> > and is not mapped _any_ other guest because the TDX-Module's 1:1 PFN:TD+GPA
> > enforcement only applies to private memory.  The explicit "VM_WRITE | VM_SHARED"
> > requirement below makes me think this wouldn't be prevented.
> 
> Hm. I didn't realize that TDX module doesn't prevent the same page to be
> used as shared and private at the same time.

Ya, only private mappings are routed through the TDX module, e.g. it can prevent
mapping the same page as private into multiple guests, but it can't prevent the
host from mapping the page as non-private.

> Omitting FOLL_GUEST for shared memory doesn't look like a right approach.
> IIUC, it would require the kernel to track what memory is share and what
> private, which defeat the purpose of the rework. I would rather enforce
> !PageGuest() when share SEPT is populated in addition to enforcing
> PageGuest() fro private SEPT.

Isn't that what omitting FOLL_GUEST would accomplish?  For shared memory,
including mapping memory into the shared EPT, KVM will omit FOLL_GUEST and thus
require the memory to be readable/writable according to the guest access type.

By definition, that excludes PageGuest() because PageGuest() pages must always
be unmapped, e.g. PROTNONE.  And for private EPT, because PageGuest() is always
PROTNONE or whatever, it will require FOLL_GUEST to retrieve the PTE/PMD/Pxx.

On a semi-related topic, I don't think can_follow_write_pte() is the correct
place to hook PageGuest().  TDX's S-EPT has a quirk where all private guest
memory must be mapped writable, but that quirk doesn't hold true for non-TDX
guests.  It should be legal to map private guest memory as read-only.  And I
believe the below snippet in follow_page_pte() will be problematic too, since
FOLL_NUMA is added unless FOLL_FORCE is set.  I suspect the correct approach is
to handle FOLL_GUEST as an exception to pte_protnone(), though that might require
adjusting pte_protnone() to be meaningful even when CONFIG_NUMA_BALANCING=n.

	if ((flags & FOLL_NUMA) && pte_protnone(pte))
		goto no_page;
	if ((flags & FOLL_WRITE) && !can_follow_write_pte(pte, flags)) {
		pte_unmap_unlock(ptep, ptl);
		return NULL;
	}

> Do you see any problems with this?
> 
> > Oh, and the other nicety is that I think it would avoid having to explicitly
> > handle PageGuest() memory that is being accessed from kernel/KVM, i.e. if all
> > memory exposed to KVM must be !PageGuest(), then it is also eligible for
> > copy_{to,from}_user().
> 
> copy_{to,from}_user() enforce by setting PTE entries to PROT_NONE.

But KVM does _not_ want those PTEs PROT_NONE.  If KVM is accessing memory that
is also accessible by the the guest, then it must be shared.  And if it's shared,
it must also be accessible to host userspace, i.e. something other than PROT_NONE,
otherwise the memory isn't actually shared with anything.

As above, any guest-accessible memory that is accessed by the host must be
shared, and so must be mapped with the required permissions.

  reply index

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-16 15:40 [RFCv2 00/13] TDX and guest memory unmapping Kirill A. Shutemov
2021-04-16 15:40 ` [RFCv2 01/13] x86/mm: Move force_dma_unencrypted() to common code Kirill A. Shutemov
2021-04-16 15:40 ` [RFCv2 02/13] x86/kvm: Introduce KVM memory protection feature Kirill A. Shutemov
2021-04-16 16:10   ` Borislav Petkov
2021-04-19 10:10     ` Kirill A. Shutemov
2021-04-16 15:40 ` [RFCv2 03/13] x86/kvm: Make DMA pages shared Kirill A. Shutemov
2021-04-16 15:40 ` [RFCv2 04/13] x86/kvm: Use bounce buffers for KVM memory protection Kirill A. Shutemov
2021-04-16 16:21   ` Dave Hansen
2021-04-16 15:40 ` [RFCv2 05/13] x86/kvmclock: Share hvclock memory with the host Kirill A. Shutemov
2021-04-16 15:40 ` [RFCv2 06/13] x86/realmode: Share trampoline area if KVM memory protection enabled Kirill A. Shutemov
2021-04-19 16:49   ` Dave Hansen
2021-04-16 15:41 ` [RFCv2 07/13] mm: Add hwpoison_entry_to_pfn() and hwpoison_entry_to_page() Kirill A. Shutemov
2021-04-16 15:41 ` [RFCv2 08/13] mm/gup: Add FOLL_ALLOW_POISONED Kirill A. Shutemov
2021-04-16 15:41 ` [RFCv2 09/13] shmem: Fail shmem_getpage_gfp() on poisoned pages Kirill A. Shutemov
2021-04-16 15:41 ` [RFCv2 10/13] mm: Keep page reference for hwpoison entries Kirill A. Shutemov
2021-04-16 15:41 ` [RFCv2 11/13] mm: Replace hwpoison entry with present PTE if page got unpoisoned Kirill A. Shutemov
2021-04-16 15:41 ` [RFCv2 12/13] KVM: passdown struct kvm to hva_to_pfn_slow() Kirill A. Shutemov
2021-04-16 15:41 ` [RFCv2 13/13] KVM: unmap guest memory using poisoned pages Kirill A. Shutemov
2021-04-16 17:30   ` Sean Christopherson
2021-04-19 11:32     ` Xiaoyao Li
2021-04-19 14:26     ` Kirill A. Shutemov
2021-04-19 16:01       ` Sean Christopherson
2021-04-19 16:40         ` Kirill A. Shutemov
2021-04-19 18:09           ` Sean Christopherson
2021-04-19 18:12             ` David Hildenbrand
2021-04-19 18:53             ` Kirill A. Shutemov
2021-04-19 20:09               ` Sean Christopherson
2021-04-19 22:57                 ` Kirill A. Shutemov
2021-04-20 17:13                   ` Sean Christopherson
2021-05-21 12:31                     ` Kirill A. Shutemov
2021-05-26 19:46                       ` Sean Christopherson
2021-05-31 20:07                         ` Kirill A. Shutemov
2021-06-02 17:51                           ` Sean Christopherson [this message]
2021-06-02 23:33                             ` Kirill A. Shutemov
2021-06-03 19:46                               ` Sean Christopherson
2021-06-04 14:29                                 ` Kirill A. Shutemov
2021-06-04 17:16                       ` Andy Lutomirski
2021-06-04 17:54                         ` Kirill A. Shutemov
2021-04-16 16:46 ` [RFCv2 00/13] TDX and guest memory unmapping Matthew Wilcox

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YLfFBgPeWZ91TfH7@google.com \
    --to=seanjc@google.com \
    --cc=andi.kleen@intel.com \
    --cc=chao.p.peng@linux.intel.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=david@redhat.com \
    --cc=erdemaktas@google.com \
    --cc=isaku.yamahata@intel.com \
    --cc=jmattson@google.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=kirill@shutemov.name \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=luto@kernel.org \
    --cc=peterz@infradead.org \
    --cc=pgonda@google.com \
    --cc=rick.p.edgecombe@intel.com \
    --cc=rientjes@google.com \
    --cc=srutherford@google.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

KVM Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/kvm/0 kvm/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 kvm kvm/ https://lore.kernel.org/kvm \
		kvm@vger.kernel.org
	public-inbox-index kvm

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.kvm


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git