From: Brijesh Singh <brijesh.singh@amd.com>
To: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: brijesh.singh@amd.com, Paolo Bonzini <pbonzini@redhat.com>,
	Vitaly Kuznetsov <vkuznets@redhat.com>,
	Wanpeng Li <wanpengli@tencent.com>,
	Jim Mattson <jmattson@google.com>, Joerg Roedel <joro@8bytes.org>,
	kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	Eric van Tassell <Eric.VanTassell@amd.com>,
	Tom Lendacky <thomas.lendacky@amd.com>
Subject: Re: [RFC PATCH 0/8] KVM: x86/mmu: Introduce pinned SPTEs framework
Date: Mon, 26 Oct 2020 22:22:24 -0500
Message-ID: <a1ecadce-531f-3698-7962-cbca6a1ebaf3@amd.com>
In-Reply-To: <16ef9998-84a9-1db4-d9a3-a0cab055a8a2@amd.com>

Hi Sean,

On 8/4/20 2:40 PM, Brijesh Singh wrote:
> On 8/3/20 12:16 PM, Sean Christopherson wrote:
>> On Mon, Aug 03, 2020 at 10:52:05AM -0500, Brijesh Singh wrote:
>>> Thanks for the series, Sean. Some thoughts:
>>>
>>>
>>> On 7/31/20 4:23 PM, Sean Christopherson wrote:
>>>> SEV currently needs to pin guest memory as it doesn't support migrating
>>>> encrypted pages.  Introduce a framework in KVM's MMU to support pinning
>>>> pages on demand without requiring additional memory allocations, and with
>>>> a (somewhat hazy) line of sight toward supporting more advanced features for
>>>> encrypted guest memory, e.g. host page migration.
>>> Eric's attempt at lazy pinning suffered from the memory allocation
>>> problem, and your series seems to address it. As you noticed, the
>>> current SEV enablement in KVM does not support migrating encrypted
>>> pages, but recent SEV firmware provides support for migrating
>>> encrypted pages (e.g. host page migration). The support is available
>>> in SEV FW >= 0.17.
>> I assume SEV also doesn't support ballooning?  Ballooning would be a good
>> first step toward page migration as I think it'd be easier for KVM to
>> support, e.g. only needs to deal with the "zap" and not the "move".
>
> Yes, ballooning does not work with SEV.
>
>
>>>> The idea is to use a software available bit in the SPTE to track that a
>>>> page has been pinned.  The decision to pin a page and the actual pinning
>>>> management is handled by vendor code via kvm_x86_ops hooks.  There are
>>>> intentionally two hooks (zap and unzap) introduced that are not needed for
>>>> SEV.  I included them to again show how the flag (probably renamed?) could
>>>> be used for more than just pin/unpin.
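
For illustration, a minimal sketch of how the software-available bit and
the vendor hooks might look; the bit position and all names below are
assumptions for discussion, not taken from the actual patches:

  /* A hardware-ignored SPTE bit repurposed to mark pinned pages
   * (the exact bit position here is an assumption). */
  #define SPTE_PINNED_MASK	BIT_ULL(57)

  static bool is_pinned_spte(u64 spte)
  {
  	return !!(spte & SPTE_PINNED_MASK);
  }

  /* Hypothetical hook pair in kvm_x86_ops: vendor code decides
   * whether to pin a PFN and owns the lifetime of the pin. */
  struct kvm_x86_ops {
  	/* ... existing members ... */
  	bool (*pin_spte)(struct kvm *kvm, gfn_t gfn, int level,
  			 kvm_pfn_t pfn);
  	void (*drop_pinned_spte)(struct kvm *kvm, gfn_t gfn, int level,
  				 kvm_pfn_t pfn);
  };
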
>>> If using the available software bits to track pinning is acceptable,
>>> then it could also be used for non-SEV guests (if needed). I will look
>>> through your patch more carefully, but one immediate question: when do
>>> we unpin the pages? In the case of SEV, once a page is pinned it
>>> should not be unpinned until the guest terminates. If we unpin a page
>>> before the VM terminates, there is a chance that host page migration
>>> will kick in and move the page. The KVM MMU code may drop SPTEs
>>> during zap/unzap, which happens a lot during guest execution, and that
>>> would lead us down a path where vendor-specific code unpins pages
>>> while the guest is running, causing data corruption for the SEV guest.
>> The pages are unpinned by:
>>
>>   drop_spte()
>>   |
>>   -> rmap_remove()
>>      |
>>      -> sev_drop_pinned_spte()
>>
>>
>> The intent is to allow unpinning pages when the mm_struct dies, i.e. when
>> the memory is no longer reachable (as opposed to when the last reference to
>> KVM is put), but typing that out, I realize there are dependencies and
>> assumptions that don't hold true for SEV as implemented.
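
For reference, a minimal sketch of what the unpin hook at the end of
that chain might do, assuming the page was pinned by taking a page
reference at fault time (the actual implementation in the series may
differ):

  static void sev_drop_pinned_spte(struct kvm *kvm, gfn_t gfn, int level,
  				 kvm_pfn_t pfn)
  {
  	/* Drop the reference taken when the page was pinned on #NPF. */
  	put_page(pfn_to_page(pfn));
  }
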
>
> So, I tried this RFC with an SEV guest (of course after addressing some
> of the issues you highlighted below), and the guest fails randomly. I
> have seen two or three types of failures: 1) boot, 2) kernbench
> execution, and 3) device addition/removal; the failure signature is not
> consistent. I believe that after addressing some of the dependencies we
> may be able to make progress, but it will add new restrictions which
> did not exist before.
>
>>   - Parent shadow pages won't be zapped.  Recycling MMU pages and zapping
>>     all SPs due to memslot updates are the two concerns.
>>
>>     The easy way out for recycling is to not recycle SPs with pinned
>>     children, though that may or may not fly with VMM admins.
>>
>>     I'm trying to resolve the memslot issue[*], but confirming that there's
>>     no longer an issue with not zapping everything is proving difficult as
>>     we haven't yet reproduced the original bug.
>>
>>   - drop_large_spte() won't be invoked.  I believe the only semi-legitimate
>>     scenario is if the NX huge page workaround is toggled on while a VM is
>>     running.  Disallowing that if there is an SEV guest seems reasonable?
>>
>>     There might be an issue with the host page size changing, but I don't
>>     think that can happen if the page is pinned.  That needs more
>>     investigation.
>>
>>
>> [*] https://lkml.kernel.org/r/20200703025047.13987-1-sean.j.christopherson@intel.com


We would like to pin the guest memory on #NPF to reduce the boot delay
for SEV guests. Are you planning to proceed with this RFC? With some
fixes, I am able to get the RFC working for an SEV guest, and I can
share those fixes with you so that you can include them in the next
revision. One of the main roadblocks I see is that the proposed
framework depends on the memslot patch you mentioned above. Without the
memslot patch we will end up dropping (i.e. unpinning) SPTEs during
memslot updates, which is not acceptable for an SEV guest. I don't see
any resolution on the memslot patch yet; any updates would be
appreciated. I understand that resolving the memslot issue may be
difficult, so I am wondering if in the meantime we should proceed with
the xarray approach: track the pinned pages in an xarray and release
them on VM termination.
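
For illustration, here is a minimal sketch of that xarray approach; the
pinned_pages field and both function names are hypothetical, not taken
from an existing patch:

  /* Pin a page on #NPF and remember it, keyed by pfn. */
  static int sev_pin_page(struct kvm_sev_info *sev, kvm_pfn_t pfn)
  {
  	struct page *page = pfn_to_page(pfn);
  	int ret;

  	get_page(page);
  	ret = xa_err(xa_store(&sev->pinned_pages, pfn, page,
  			      GFP_KERNEL_ACCOUNT));
  	if (ret)
  		put_page(page);
  	return ret;
  }

  /* Release every pinned page when the VM terminates. */
  static void sev_unpin_pages(struct kvm_sev_info *sev)
  {
  	struct page *page;
  	unsigned long pfn;

  	xa_for_each(&sev->pinned_pages, pfn, page)
  		put_page(page);
  	xa_destroy(&sev->pinned_pages);
  }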


>>>> Bugs in the core implementation are pretty much guaranteed.  The basic
>>>> concept has been tested, but in a fairly different incarnation.  Most
>>>> notably, tagging PRESENT SPTEs as PINNED has not been tested, although
>>>> using the PINNED flag to track zapped (and known to be pinned) SPTEs has
>>>> been tested.  I cobbled this variation together fairly quickly to get the
>>>> code out there for discussion.
>>>>
>>>> The last patch to pin SEV pages during sev_launch_update_data() is
>>>> incomplete; it's there to show how we might leverage MMU-based pinning to
>>>> support pinning pages before the guest is live.
>>> I will add the SEV-specific bits and give this a try.
>>>
>>>> Sean Christopherson (8):
>>>>   KVM: x86/mmu: Return old SPTE from mmu_spte_clear_track_bits()
>>>>   KVM: x86/mmu: Use bits 2:0 to check for present SPTEs
>>>>   KVM: x86/mmu: Refactor handling of not-present SPTEs in mmu_set_spte()
>>>>   KVM: x86/mmu: Add infrastructure for pinning PFNs on demand
>>>>   KVM: SVM: Use the KVM MMU SPTE pinning hooks to pin pages on demand
>>>>   KVM: x86/mmu: Move 'pfn' variable to caller of direct_page_fault()
>>>>   KVM: x86/mmu: Introduce kvm_mmu_map_tdp_page() for use by SEV
>>>>   KVM: SVM: Pin SEV pages in MMU during sev_launch_update_data()
>>>>
>>>>  arch/x86/include/asm/kvm_host.h |   7 ++
>>>>  arch/x86/kvm/mmu.h              |   3 +
>>>>  arch/x86/kvm/mmu/mmu.c          | 186 +++++++++++++++++++++++++-------
>>>>  arch/x86/kvm/mmu/paging_tmpl.h  |   3 +-
>>>>  arch/x86/kvm/svm/sev.c          | 141 +++++++++++++++++++++++-
>>>>  arch/x86/kvm/svm/svm.c          |   3 +
>>>>  arch/x86/kvm/svm/svm.h          |   3 +
>>>>  7 files changed, 302 insertions(+), 44 deletions(-)
>>>>
