Re: [Patch 3/4] KVM:SVM: Pin sev_launch_update_data() pages via sev_get_page()

From: Eric van Tassell <evantass@amd.com>
To: Sean Christopherson <sean.j.christopherson@intel.com>,
	eric van tassell <Eric.VanTassell@amd.com>
Cc: kvm@vger.kernel.org, bp@alien8.de, hpa@zytor.com,
	mingo@redhat.com, jmattson@google.com, joro@8bytes.org,
	pbonzini@redhat.com, tglx@linutronix.de, vkuznets@redhat.com,
	wanpengli@tencent.com, x86@kernel.org
Subject: Re: [Patch 3/4] KVM:SVM: Pin sev_launch_update_data() pages via sev_get_page()
Date: Wed, 19 Aug 2020 11:20:48 -0500	[thread overview]
Message-ID: <2770f023-c243-5467-217a-632ec57f4213@amd.com> (raw)
In-Reply-To: <bd9f4790-66d0-7ca8-4aed-5349e765991f@amd.com>

On 8/2/20 6:55 PM, Eric van Tassell wrote:
> 
> 
> On 7/31/20 3:40 PM, Sean Christopherson wrote:
>> On Fri, Jul 24, 2020 at 06:54:47PM -0500, eric van tassell wrote:
>>> Add 2 small infrastructure functions here which to enable pinning the 
>>> SEV
>>> guest pages used for sev_launch_update_data() using sev_get_page().
>>>
>>> Pin the memory for the data being passed to launch_update_data() 
>>> because it
>>> gets encrypted before the guest is first run and must not be moved which
>>> would corrupt it.
>>>
>>> Signed-off-by: eric van tassell <Eric.VanTassell@amd.com>
>>> ---
>>>   arch/x86/kvm/svm/sev.c | 48 ++++++++++++++++++++++++++++++++++++++++++
>>>   1 file changed, 48 insertions(+)
>>>
>>> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
>>> index 040ae4aa7c5a..e0eed9a20a51 100644
>>> --- a/arch/x86/kvm/svm/sev.c
>>> +++ b/arch/x86/kvm/svm/sev.c
>>> @@ -453,6 +453,37 @@ static int sev_get_page(struct kvm *kvm, gfn_t 
>>> gfn, kvm_pfn_t pfn)
>>>       return 0;
>>>   }
>>> +static struct kvm_memory_slot *hva_to_memslot(struct kvm *kvm,
>>> +                          unsigned long hva)
>>> +{
>>> +    struct kvm_memslots *slots = kvm_memslots(kvm);
>>> +    struct kvm_memory_slot *memslot;
>>> +
>>> +    kvm_for_each_memslot(memslot, slots) {
>>> +        if (hva >= memslot->userspace_addr &&
>>> +            hva < memslot->userspace_addr +
>>> +                  (memslot->npages << PAGE_SHIFT))
>>> +            return memslot;
>>> +    }
>>> +
>>> +    return NULL;
>>> +}
>>> +
>>> +static bool hva_to_gfn(struct kvm *kvm, unsigned long hva, gfn_t *gfn)
>>> +{
>>> +    struct kvm_memory_slot *memslot;
>>> +    gpa_t gpa_offset;
>>> +
>>> +    memslot = hva_to_memslot(kvm, hva);
>>> +    if (!memslot)
>>> +        return false;
>>> +
>>> +    gpa_offset = hva - memslot->userspace_addr;
>>> +    *gfn = ((memslot->base_gfn << PAGE_SHIFT) + gpa_offset) >> 
>>> PAGE_SHIFT;
>>> +
>>> +    return true;
>>> +}
>>> +
>>>   static int sev_launch_update_data(struct kvm *kvm, struct 
>>> kvm_sev_cmd *argp)
>>>   {
>>>       unsigned long vaddr, vaddr_end, next_vaddr, npages, pages, 
>>> size, i;
>>> @@ -483,6 +514,23 @@ static int sev_launch_update_data(struct kvm 
>>> *kvm, struct kvm_sev_cmd *argp)
>>>           goto e_free;
>>>       }
>>> +    /*
>>> +     * Increment the page ref count so that the pages do not get 
>>> migrated or
>>> +     * moved after we are done from the LAUNCH_UPDATE_DATA.
>>> +     */
>>> +    for (i = 0; i < npages; i++) {
>>> +        gfn_t gfn;
>>> +
>>> +        if (!hva_to_gfn(kvm, (vaddr + (i * PAGE_SIZE)) & PAGE_MASK, 
>>> &gfn)) {
>>
>> This needs to hold kvm->srcu to block changes to memslots while 
>> looking up
>> the hva->gpa translation.
> I'll look into this.

fixed

>>
>>> +            ret = -EFAULT;
>>> +            goto e_unpin;
>>> +        }
>>> +
>>> +        ret = sev_get_page(kvm, gfn, page_to_pfn(inpages[i]));
>>
>> Rather than dump everything into an xarray, KVM can instead pin the pages
>> by faulting them into its MMU.  By pinning pages in the MMU proper, 
>> KVM can
>> use software available bits in the SPTEs to track which pages are pinned,
>> can assert/WARN on unexpected behavior with respect to pinned pages, and
>> can drop/unpin pages as soon as they are no longer reachable from KVM, 
>> e.g.
>> when the mm_struct dies or the associated memslot is removed.
>>
>> Leveraging the MMU would also make this extensible to non-SEV features,
>> e.g. it can be shared by VMX if VMX adds a feature that needs similar 
>> hooks
>> in the MMU.  Shoving the tracking in SEV means the core logic would 
>> need to
>> be duplicated for other features.
>>
>> The big caveat is that funneling this through the MMU requires a vCPU[*],
>> i.e. is only viable if userspace has already created at least one vCPU.
>> For QEMU, this is guaranteed.  I don't know about other VMMs.
>>
>> If there are VMMs that support SEV and don't create vCPUs before 
>> encrypting
>> guest memory, one option would be to automatically go down the optimized
>> route iff at least one vCPU has been created.  In other words, don't 
>> break
>> old VMMs, but don't carry more hacks to make them faster either.
>>
>> It just so happens that I have some code that sort of implements the 
>> above.
>> I reworked it to mesh with SEV and will post it as an RFC.  It's far from
>> a tested-and-ready-to-roll implemenation, but I think it's fleshed out
>> enough to start a conversation.
>>
>> [*] This isn't a hard requirement, i.e. KVM could be reworked to 
>> provide a
>>      common MMU for non-nested TDP, but that's a much bigger effort.
>>
> I will think about this. (I'm out of the office Monday and Tuesday.)

Brijesh invested time and could not get this approach to meet SEV's 
needs as he reported in a previous email.

>>> +        if (ret)
>>> +            goto e_unpin;
>>> +    }
>>> +
>>>       /*
>>>        * The LAUNCH_UPDATE command will perform in-place encryption 
>>> of the
>>>        * memory content (i.e it will write the same memory region 
>>> with C=1).
>>> -- 
>>> 2.17.1
>>>