All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Kirill A. Shutemov" <kirill@shutemov.name>
To: Sean Christopherson <seanjc@google.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Andy Lutomirski <luto@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Jim Mattson <jmattson@google.com>,
	David Rientjes <rientjes@google.com>,
	"Edgecombe, Rick P" <rick.p.edgecombe@intel.com>,
	"Kleen, Andi" <andi.kleen@intel.com>,
	"Yamahata, Isaku" <isaku.yamahata@intel.com>,
	Erdem Aktas <erdemaktas@google.com>,
	Steve Rutherford <srutherford@google.com>,
	Peter Gonda <pgonda@google.com>,
	David Hildenbrand <david@redhat.com>,
	x86@kernel.org, kvm@vger.kernel.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org
Subject: Re: [RFCv2 13/13] KVM: unmap guest memory using poisoned pages
Date: Mon, 19 Apr 2021 21:53:54 +0300	[thread overview]
Message-ID: <20210419185354.v3rgandtrel7bzjj@box> (raw)
In-Reply-To: <YH3HWeOXFiCTZN4y@google.com>

On Mon, Apr 19, 2021 at 06:09:29PM +0000, Sean Christopherson wrote:
> On Mon, Apr 19, 2021, Kirill A. Shutemov wrote:
> > On Mon, Apr 19, 2021 at 04:01:46PM +0000, Sean Christopherson wrote:
> > > But fundamentally the private pages, are well, private.  They can't be shared
> > > across processes, so I think we could (should?) require the VMA to always be
> > > MAP_PRIVATE.  Does that buy us enough to rely on the VMA alone?  I.e. is that
> > > enough to prevent userspace and unaware kernel code from acquiring a reference
> > > to the underlying page?
> > 
> > Shared pages should be fine too (you folks wanted tmpfs support).
> 
> Is that a conflict though?  If the private->shared conversion request is kicked
> out to userspace, then userspace can re-mmap() the files as MAP_SHARED, no?
> 
> Allowing MAP_SHARED for guest private memory feels wrong.  The data can't be
> shared, and dirty data can't be written back to the file.

It can be remapped, but faulting in the page would produce hwpoison entry.
I don't see other way to make Google's use-case with tmpfs-backed guest
memory work.

> > The poisoned pages must be useless outside of the process with the blessed
> > struct kvm. See kvm_pfn_map in the patch.
> 
> The big requirement for kernel TDX support is that the pages are useless in the
> host.  Regarding the guest, for TDX, the TDX Module guarantees that at most a
> single KVM guest can have access to a page at any given time.  I believe the RMP
> provides the same guarantees for SEV-SNP.
> 
> SEV/SEV-ES could still end up with corruption if multiple guests map the same
> private page, but that's obviously not the end of the world since it's the status
> quo today.  Living with that shortcoming might be a worthy tradeoff if punting
> mutual exclusion between guests to firmware/hardware allows us to simplify the
> kernel implementation.

The critical question is whether we ever need to translate hva->pfn after
the page is added to the guest private memory. I believe we do, but I
never checked. And that's the reason we need to keep hwpoison entries
around, which encode pfn.

If we don't, it would simplify the solution: kvm_pfn_map is not needed.
Single bit-per page would be enough.

> > > >  - Add a new GUP flag to retrive such pages from the userspace mapping.
> > > >    Used only for private mapping population.
> > > 
> > > >  - Shared gfn ranges managed by userspace, based on hypercalls from the
> > > >    guest.
> > > > 
> > > >  - Shared mappings get populated via normal VMA. Any poisoned pages here
> > > >    would lead to SIGBUS.
> > > > 
> > > > So far it looks pretty straight-forward.
> > > > 
> > > > The only thing that I don't understand is at way point the page gets tied
> > > > to the KVM instance. Currently we do it just before populating shadow
> > > > entries, but it would not work with the new scheme: as we poison pages
> > > > on fault it they may never get inserted into shadow entries. That's not
> > > > good as we rely on the info to unpoison page on free.
> > > 
> > > Can you elaborate on what you mean by "unpoison"?  If the page is never actually
> > > mapped into the guest, then its poisoned status is nothing more than a software
> > > flag, i.e. nothing extra needs to be done on free.
> > 
> > Normally, poisoned flag preserved for freed pages as it usually indicate
> > hardware issue. In this case we need return page to the normal circulation.
> > So we need a way to differentiate two kinds of page poison. Current patch
> > does this by adding page's pfn to kvm_pfn_map. But this will not work if
> > we uncouple poisoning and adding to shadow PTE.
> 
> Why use PG_hwpoison then?

Page flags are scarce. I don't want to take occupy a new one until I'm
sure I must.

And we can re-use existing infrastructure to SIGBUS on access to such
pages.

-- 
 Kirill A. Shutemov

  parent reply	other threads:[~2021-04-19 18:54 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-16 15:40 [RFCv2 00/13] TDX and guest memory unmapping Kirill A. Shutemov
2021-04-16 15:40 ` [RFCv2 01/13] x86/mm: Move force_dma_unencrypted() to common code Kirill A. Shutemov
2021-04-16 15:40 ` [RFCv2 02/13] x86/kvm: Introduce KVM memory protection feature Kirill A. Shutemov
2021-04-16 16:10   ` Borislav Petkov
2021-04-19 10:10     ` Kirill A. Shutemov
2021-04-16 15:40 ` [RFCv2 03/13] x86/kvm: Make DMA pages shared Kirill A. Shutemov
2021-04-16 15:40 ` [RFCv2 04/13] x86/kvm: Use bounce buffers for KVM memory protection Kirill A. Shutemov
2021-04-16 16:21   ` Dave Hansen
2021-04-16 15:40 ` [RFCv2 05/13] x86/kvmclock: Share hvclock memory with the host Kirill A. Shutemov
2021-04-16 15:40 ` [RFCv2 06/13] x86/realmode: Share trampoline area if KVM memory protection enabled Kirill A. Shutemov
2021-04-19 16:49   ` Dave Hansen
2021-04-16 15:41 ` [RFCv2 07/13] mm: Add hwpoison_entry_to_pfn() and hwpoison_entry_to_page() Kirill A. Shutemov
2021-04-16 15:41 ` [RFCv2 08/13] mm/gup: Add FOLL_ALLOW_POISONED Kirill A. Shutemov
2021-04-16 15:41 ` [RFCv2 09/13] shmem: Fail shmem_getpage_gfp() on poisoned pages Kirill A. Shutemov
2021-04-16 15:41 ` [RFCv2 10/13] mm: Keep page reference for hwpoison entries Kirill A. Shutemov
2021-04-16 15:41 ` [RFCv2 11/13] mm: Replace hwpoison entry with present PTE if page got unpoisoned Kirill A. Shutemov
2021-04-16 15:41 ` [RFCv2 12/13] KVM: passdown struct kvm to hva_to_pfn_slow() Kirill A. Shutemov
2021-04-16 15:41 ` [RFCv2 13/13] KVM: unmap guest memory using poisoned pages Kirill A. Shutemov
2021-04-16 17:30   ` Sean Christopherson
2021-04-19 11:32     ` Xiaoyao Li
2021-04-19 14:26     ` Kirill A. Shutemov
2021-04-19 16:01       ` Sean Christopherson
2021-04-19 16:40         ` Kirill A. Shutemov
2021-04-19 18:09           ` Sean Christopherson
2021-04-19 18:12             ` David Hildenbrand
2021-04-19 18:53             ` Kirill A. Shutemov [this message]
2021-04-19 20:09               ` Sean Christopherson
2021-04-19 22:57                 ` Kirill A. Shutemov
2021-04-20 17:13                   ` Sean Christopherson
2021-05-21 12:31                     ` Kirill A. Shutemov
2021-05-26 19:46                       ` Sean Christopherson
2021-05-31 20:07                         ` Kirill A. Shutemov
2021-06-02 17:51                           ` Sean Christopherson
2021-06-02 23:33                             ` Kirill A. Shutemov
2021-06-03 19:46                               ` Sean Christopherson
2021-06-04 14:29                                 ` Kirill A. Shutemov
2021-06-04 17:16                       ` Andy Lutomirski
2021-06-04 17:54                         ` Kirill A. Shutemov
2021-04-16 16:46 ` [RFCv2 00/13] TDX and guest memory unmapping Matthew Wilcox

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210419185354.v3rgandtrel7bzjj@box \
    --to=kirill@shutemov.name \
    --cc=andi.kleen@intel.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=david@redhat.com \
    --cc=erdemaktas@google.com \
    --cc=isaku.yamahata@intel.com \
    --cc=jmattson@google.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=luto@kernel.org \
    --cc=peterz@infradead.org \
    --cc=pgonda@google.com \
    --cc=rick.p.edgecombe@intel.com \
    --cc=rientjes@google.com \
    --cc=seanjc@google.com \
    --cc=srutherford@google.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.