KVM Archive on lore.kernel.org
From: Brijesh Singh <brijesh.singh@amd.com>
To: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: brijesh.singh@amd.com, Paolo Bonzini <pbonzini@redhat.com>,
	Vitaly Kuznetsov <vkuznets@redhat.com>,
	Wanpeng Li <wanpengli@tencent.com>,
	Jim Mattson <jmattson@google.com>, Joerg Roedel <joro@8bytes.org>,
	kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	eric van tassell <Eric.VanTassell@amd.com>,
	Tom Lendacky <thomas.lendacky@amd.com>
Subject: Re: [RFC PATCH 0/8] KVM: x86/mmu: Introduce pinned SPTEs framework
Date: Tue, 4 Aug 2020 14:40:49 -0500
Message-ID: <16ef9998-84a9-1db4-d9a3-a0cab055a8a2@amd.com>
In-Reply-To: <20200803171620.GC3151@linux.intel.com>


On 8/3/20 12:16 PM, Sean Christopherson wrote:
> On Mon, Aug 03, 2020 at 10:52:05AM -0500, Brijesh Singh wrote:
>> Thanks for the series, Sean. Some thoughts:
>>
>>
>> On 7/31/20 4:23 PM, Sean Christopherson wrote:
>>> SEV currently needs to pin guest memory as it doesn't support migrating
>>> encrypted pages.  Introduce a framework in KVM's MMU to support pinning
>>> pages on demand without requiring additional memory allocations, and with
>>> (somewhat hazy) line of sight toward supporting more advanced features for
>>> encrypted guest memory, e.g. host page migration.
>>
>> Eric's attempt at lazy pinning suffers from the memory allocation
>> problem, and your series seems to address it. As you have noticed,
>> the current SEV enablement in KVM does not support migrating
>> encrypted pages. However, recent SEV firmware provides support for
>> migrating encrypted pages (e.g. host page migration). The support is
>> available in SEV FW >= 0.17.
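
A gate on that firmware requirement could look roughly like the sketch
below. It assumes "0.17" means API major 0, minor 17, compared field by
field (so 0.2 is older than 0.17). The struct and the toy_* helpers are
hypothetical names for illustration, not an existing KVM or PSP
interface.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical holder for the SEV firmware API version (major.minor). */
struct toy_sev_fw_version {
	uint8_t major;
	uint8_t minor;
};

/* Compare field by field; never treat "0.17" as a decimal fraction. */
static bool toy_sev_fw_at_least(struct toy_sev_fw_version v,
				uint8_t major, uint8_t minor)
{
	if (v.major != major)
		return v.major > major;
	return v.minor >= minor;
}

/* Assumption from the thread: page migration needs SEV FW >= 0.17. */
static bool toy_sev_fw_supports_page_migration(struct toy_sev_fw_version v)
{
	return toy_sev_fw_at_least(v, 0, 17);
}
```
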
> I assume SEV also doesn't support ballooning?  Ballooning would be a good
> first step toward page migration as I think it'd be easier for KVM to
> support, e.g. only needs to deal with the "zap" and not the "move".


Yes, ballooning does not work with SEV.


>
>>> The idea is to use a software available bit in the SPTE to track that a
>>> page has been pinned.  The decision to pin a page and the actual pinning
>>> management is handled by vendor code via kvm_x86_ops hooks.  There are
>>> intentionally two hooks (zap and unzap) introduced that are not needed for
>>> SEV.  I included them to again show how the flag (probably renamed?) could
>>> be used for more than just pin/unpin.
>> If using the available software bits to track pinning is
>> acceptable, then it can also be used for non-SEV guests (if needed). I
>> will look through your patch more carefully, but one immediate
>> question: when do we unpin the pages? In the case of SEV, once a page
>> is pinned it should not be unpinned until the guest terminates. If we
>> unpin the page before the VM terminates, there is a chance that host
>> page migration will kick in and move the pages. The KVM MMU code may
>> drop SPTEs during zap/unzap, which happens a lot during guest
>> execution. That leads to a path where vendor-specific code unpins
>> the pages while the guest is running and causes data corruption for
>> the SEV guest.
> The pages are unpinned by:
>
>   drop_spte()
>   |
>   -> rmap_remove()
>      |
>      -> sev_drop_pinned_spte()
>
>
> The intent is to allow unpinning pages when the mm_struct dies, i.e. when
> the memory is no longer reachable (as opposed to when the last reference to
> KVM is put), but typing that out, I realize there are dependencies and
> assumptions that don't hold true for SEV as implemented.


So, I tried this RFC with an SEV guest (after adding some of the
stuff you highlighted below), and the guest fails randomly. I have seen
two or three types of failure: 1) boot, 2) kernbench execution, and 3)
device addition/removal. The failure signature is not consistent. I
believe that after addressing some of the dependencies we may be able
to make some progress, but it will add new restrictions that did not
exist before.
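
To make the unpin-on-zap concern concrete, here is a minimal user-space
sketch of the pin/unpin lifecycle. It is a toy model under stated
assumptions, not code from the series: the toy_* names, the
struct toy_page pin count, and the choice of bit 62 for the pinned flag
are all hypothetical. Only the drop_spte() -> unpin ordering and the
"bits 2:0" present check come from this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bits 2:0 as the "present" check follows the title of patch 2/8. */
#define TOY_SPTE_PRESENT_MASK 0x7ull
/* Hypothetical software-available bit; the real position is not given. */
#define TOY_SPTE_PINNED (1ull << 62)

/* Stand-in for a host page; pin_count > 0 blocks host page migration. */
struct toy_page {
	int pin_count;
};

static bool toy_spte_is_present(uint64_t spte)
{
	return (spte & TOY_SPTE_PRESENT_MASK) != 0;
}

static bool toy_spte_is_pinned(uint64_t spte)
{
	return (spte & TOY_SPTE_PINNED) != 0;
}

/* Map a page: the vendor hook pins it and the SPTE is tagged as pinned. */
static uint64_t toy_set_spte(struct toy_page *page, uint64_t pfn)
{
	page->pin_count++;
	return (pfn << 12) | TOY_SPTE_PRESENT_MASK | TOY_SPTE_PINNED;
}

/* Models drop_spte() -> rmap_remove() -> vendor unpin hook. */
static uint64_t toy_drop_spte(struct toy_page *page, uint64_t spte)
{
	if (toy_spte_is_present(spte) && toy_spte_is_pinned(spte)) {
		assert(page->pin_count > 0);
		page->pin_count--;
	}
	return 0; /* the SPTE is now not-present and untagged */
}

/* The host may migrate a page only when nothing pins it. */
static bool toy_page_is_migratable(const struct toy_page *page)
{
	return page->pin_count == 0;
}
```

In this model, calling toy_drop_spte() on a pinned SPTE while the guest
is still running leaves the page migratable, which is the window in
which the host could move an encrypted page.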

>
>   - Parent shadow pages won't be zapped.  Recycling MMU pages and zapping
>     all SPs due to memslot updates are the two concerns.
>
>     The easy way out for recycling is to not recycle SPs with pinned
>     children, though that may or may not fly with VMM admins.
>
>     I'm trying to resolve the memslot issue[*], but confirming that there's
>     no longer an issue with not zapping everything is proving difficult as
>     we haven't yet reproduced the original bug.
>
>   - drop_large_spte() won't be invoked.  I believe the only semi-legitimate
>     scenario is if the NX huge page workaround is toggled on while a VM is
>     running.  Disallowing that if there is an SEV guest seems reasonable?
>
>     There might be an issue with the host page size changing, but I don't
>     think that can happen if the page is pinned.  That needs more
>     investigation.
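
The "do not recycle SPs with pinned children" option above can be
sketched as a recycler that skips such shadow pages. This is only an
illustration under assumptions: struct toy_shadow_page, its
pinned_children counter, and toy_recycle_shadow_pages() are
hypothetical and do not correspond to the actual KVM MMU structures.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a shadow page (SP). */
struct toy_shadow_page {
	unsigned int pinned_children; /* assumed per-SP count of pinned SPTEs */
	bool recycled;
};

/*
 * Recycle up to nr_to_free shadow pages, skipping any SP that still has
 * pinned children. Returns the number of SPs actually recycled, which
 * may be smaller than nr_to_free.
 */
static size_t toy_recycle_shadow_pages(struct toy_shadow_page *sps,
				       size_t nr_sps, size_t nr_to_free)
{
	size_t freed = 0;

	for (size_t i = 0; i < nr_sps && freed < nr_to_free; i++) {
		if (sps[i].recycled || sps[i].pinned_children > 0)
			continue;
		sps[i].recycled = true;
		freed++;
	}
	return freed;
}
```

The return value can fall short of the request: a VM with many pinned
pages may leave the recycler with little to reclaim, which is the cost
behind "may or may not fly with VMM admins".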
>
>
> [*] https://lkml.kernel.org/r/20200703025047.13987-1-sean.j.christopherson@intel.com
>
>>> Bugs in the core implementation are pretty much guaranteed.  The basic
>>> concept has been tested, but in a fairly different incarnation.  Most
>>> notably, tagging PRESENT SPTEs as PINNED has not been tested, although
>>> using the PINNED flag to track zapped (and known to be pinned) SPTEs has
>>> been tested.  I cobbled this variation together fairly quickly to get the
>>> code out there for discussion.
>>>
>>> The last patch to pin SEV pages during sev_launch_update_data() is
>>> incomplete; it's there to show how we might leverage MMU-based pinning to
>>> support pinning pages before the guest is live.
>>
>> I will add the SEV-specific bits and give this a try.
>>
>>> Sean Christopherson (8):
>>>   KVM: x86/mmu: Return old SPTE from mmu_spte_clear_track_bits()
>>>   KVM: x86/mmu: Use bits 2:0 to check for present SPTEs
>>>   KVM: x86/mmu: Refactor handling of not-present SPTEs in mmu_set_spte()
>>>   KVM: x86/mmu: Add infrastructure for pinning PFNs on demand
>>>   KVM: SVM: Use the KVM MMU SPTE pinning hooks to pin pages on demand
>>>   KVM: x86/mmu: Move 'pfn' variable to caller of direct_page_fault()
>>>   KVM: x86/mmu: Introduce kvm_mmu_map_tdp_page() for use by SEV
>>>   KVM: SVM: Pin SEV pages in MMU during sev_launch_update_data()
>>>
>>>  arch/x86/include/asm/kvm_host.h |   7 ++
>>>  arch/x86/kvm/mmu.h              |   3 +
>>>  arch/x86/kvm/mmu/mmu.c          | 186 +++++++++++++++++++++++++-------
>>>  arch/x86/kvm/mmu/paging_tmpl.h  |   3 +-
>>>  arch/x86/kvm/svm/sev.c          | 141 +++++++++++++++++++++++-
>>>  arch/x86/kvm/svm/svm.c          |   3 +
>>>  arch/x86/kvm/svm/svm.h          |   3 +
>>>  7 files changed, 302 insertions(+), 44 deletions(-)
>>>
