From: "Nikunj A. Dadhania" <nikunj@amd.com>
To: Sean Christopherson <seanjc@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
	Vitaly Kuznetsov <vkuznets@redhat.com>,
	Wanpeng Li <wanpengli@tencent.com>,
	Jim Mattson <jmattson@google.com>, Joerg Roedel <joro@8bytes.org>,
	Brijesh Singh <brijesh.singh@amd.com>,
	Tom Lendacky <thomas.lendacky@amd.com>,
	Peter Gonda <pgonda@google.com>, Bharata B Rao <bharata@amd.com>,
	"Maciej S . Szmigiero" <mail@maciej.szmigiero.name>,
	Mingwei Zhang <mizhang@google.com>,
	David Hildenbrand <david@redhat.com>,
	kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH RFC v1 0/9] KVM: SVM: Defer page pinning for SEV guests
Date: Wed, 30 Mar 2022 10:12:24 +0530	[thread overview]
Message-ID: <c4b33753-01d7-684e-23ac-1189bd217761@amd.com> (raw)
In-Reply-To: <YkIh8zM7XfhsFN8L@google.com>

On 3/29/2022 2:30 AM, Sean Christopherson wrote:
> On Tue, Mar 08, 2022, Nikunj A Dadhania wrote:
>> This is a follow-up to the RFC implementation [1] that incorporates
>> review feedback and bug fixes. See the "RFC v1" section below for a 
>> list of changes.
> 
> Heh, for future reference, the initial posting of a series/patch/RFC is implicitly
> v1, i.e. this should be RFC v2.

Sure.

> 
>> An SEV guest requires its pages to be pinned in host physical memory,
>> as migration of encrypted pages is not supported. The memory
>> encryption scheme uses the physical address of the memory being
>> encrypted; if the host moves guest pages, the content decrypted in
>> the guest would be incorrect, corrupting the guest's memory.
>>
>> For SEV/SEV-ES guests, the hypervisor doesn't know which pages are
>> encrypted or when the guest is done using them, so it should treat
>> all guest pages as encrypted until they are deallocated or the guest
>> is destroyed.
>>
>> While provisioning a pfn, make KVM aware that guest pages need to be
>> pinned long-term and use the appropriate pin_user_pages() API for
>> these special encrypted memory regions. KVM takes the first reference
>> and holds it until a mapping is done; an extra reference is taken
>> before KVM releases the pfn.
>>
>> The actual pinning management is handled by vendor code via new
>> kvm_x86_ops hooks: the MMU calls into vendor code to pin the page on
>> demand, and the pinning metadata is stored in the architecture-specific
>> memslot area. Guest pages are unpinned in the memslot freeing and
>> deallocation paths.
>>
>> Guest boot time comparison:
>> +---------------+----------------+-------------------+
>> | Guest Memory  |   baseline     |  Demand Pinning + |
>> | Size (GB)     | v5.17-rc6(secs)| v5.17-rc6(secs)   |
>> +---------------+----------------+-------------------+
>> |      4        |     6.16       |      5.71         |
>> +---------------+----------------+-------------------+
>> |     16        |     7.38       |      5.91         |
>> +---------------+----------------+-------------------+
>> |     64        |    12.17       |      6.16         |
>> +---------------+----------------+-------------------+
>> |    128        |    18.20       |      6.50         |
>> +---------------+----------------+-------------------+
>> |    192        |    24.56       |      6.80         |
>> +---------------+----------------+-------------------+
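
To restate the mechanism described above in code, a minimal sketch of the
pinning hook would look roughly like the below (simplified, and with
hypothetical names rather than the exact ones the patches use):

  /*
   * Minimal sketch only (hypothetical names): called from the MMU fault
   * path for encrypted memslots, takes a long-term pin on the page and
   * records it in the arch-specific memslot metadata so it can be
   * unpinned when the memslot is freed.
   */
  static bool sev_pin_pfn(struct kvm_memory_slot *slot, unsigned long hva,
			  gfn_t gfn)
  {
	struct page *page;

	if (pin_user_pages_fast(hva, 1, FOLL_WRITE | FOLL_LONGTERM,
				&page) != 1)
		return false;

	/* Hypothetical per-memslot bitmap tracking pins KVM itself owns. */
	set_bit(gfn - slot->base_gfn, to_sev_memslot(slot)->pinned_bitmap);
	return true;
  }
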
> 
> Let me preface this by saying I generally like the idea and especially the
> performance, but...
> 
> I think we should abandon this approach in favor of committing all our resources
> to fd-based private memory[*], which (if done right) will provide on-demand pinning
> for "free".  

I will give this a try for SEV; it was already on my todo list.

> I would much rather get that support merged sooner than later, and use
> it as a carrot for legacy SEV to get users to move over to its new APIs, with a long
> term goal of deprecating and disallowing SEV/SEV-ES guests without fd-based private
> memory.  

> That would require guest kernel support to communicate private vs. shared,

Could you explain this in more detail? Is this required for punching holes in the private fd for shared pages?
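
My (possibly wrong) mental model of the guest-side piece, reusing the
KVM_HC_MAP_GPA_RANGE hypercall that SEV live migration already uses for
page encryption status, is roughly the sketch below; the helper name is
made up and the flag names are from memory, so treat them as assumptions
rather than verified against kvm_para.h:

  /*
   * Sketch only: guest tells the host that a GPA range has been
   * converted to shared (decrypted). Helper name is hypothetical.
   */
  static void notify_range_shared(unsigned long gpa, unsigned long npages)
  {
	kvm_hypercall3(KVM_HC_MAP_GPA_RANGE, gpa, npages,
		       KVM_MAP_GPA_RANGE_DECRYPTED |
		       KVM_MAP_GPA_RANGE_PAGE_SZ_4K);
  }

Is that the kind of guest support you mean, or something beyond it?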

> but SEV guests already "need" to do that to play nice with live migration, so it's
> not a big ask, just another carrot to entice guests/customers to update their kernel
> (and possibly users to update their guest firmware).
> 
> This series isn't awful by any means, but it requires poking into core flows and
> further complicates paths that are already anything but simple.  And things like
> conditionally grabbing vCPU0 to pin pages in its MMU make me flinch.  And I think
> the situation would only get worse by the time all the bugs and corner cases are
> ironed out.  E.g. this code is wrong:
> 
>   void kvm_release_pfn_clean(kvm_pfn_t pfn)
>   {
> 	if (!is_error_noslot_pfn(pfn) && !kvm_is_reserved_pfn(pfn)) {
> 		struct page *page = pfn_to_page(pfn);
> 
> 		if (page_maybe_dma_pinned(page))
> 			unpin_user_page(page);
> 		else
> 			put_page(page);
> 	}
>   }
>   EXPORT_SYMBOL_GPL(kvm_release_pfn_clean);
> 
> Because (a) page_maybe_dma_pinned() is susceptible to false positives (clearly
> documented), and (b) even if it didn't get false positives, there's no guarantee
> that _KVM_ owns a pin of the page.

Right, the pinning could have been done by some other subsystem.
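
One way out that I can think of is to stop guessing via
page_maybe_dma_pinned() and instead consult the pin metadata this series
already keeps in the arch memslot, so that only pins KVM itself took are
dropped. A rough sketch, with a hypothetical helper:

  void kvm_release_pfn_clean(kvm_pfn_t pfn)
  {
	if (!is_error_noslot_pfn(pfn) && !kvm_is_reserved_pfn(pfn)) {
		struct page *page = pfn_to_page(pfn);

		/*
		 * Hypothetical: checks KVM's own pin tracking rather
		 * than the page's global pin count.
		 */
		if (kvm_arch_pfn_is_pinned_by_kvm(pfn))
			unpin_user_page(page);
		else
			put_page(page);
	}
  }

That said, such a helper needs the kvm/memslot context plumbed into the
release path, which is exactly the kind of churn you are objecting to,
so I agree it doesn't make the fundamental problem go away.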

> 
> It's not an impossible problem to solve, but I suspect any solution will require
> either touching a lot of code or will be fragile and difficult to maintain, e.g.
> by auditing all users to understand which need to pin and which don't.  Even if
> we _always_ pin memory for SEV guests, we'd still need to plumb the "is SEV guest"
> info around.
> 
> And FWIW, my years-old idea of using a software-available SPTE bit to track pinned
> pages is plagued by the same underlying issue: KVM's current management (or lack
> thereof) of SEV guest memory just isn't viable long term.  In all honesty, it
> probably should never have been merged.  We can't change the past, but we can, 
> and IMO should, avoid piling on more code to an approach that is fundamentally flawed.
> 	
> [*] https://lore.kernel.org/all/20220310140911.50924-1-chao.p.peng@linux.intel.com
>

Thanks for the valuable feedback. 

Regards
Nikunj

