Re: [PATCH v2 2/2] KVM: PPC: Book3S HV: rework secure mem slot dropping

From: Laurent Dufour <ldufour@linux.ibm.com>
To: bharata@linux.ibm.com, linuxram@us.ibm.com
Cc: linux-kernel@vger.kernel.org, kvm-ppc@vger.kernel.org,
	paulus@samba.org, sukadev@linux.ibm.com,
	linuxppc-dev@lists.ozlabs.org, bauerman@linux.ibm.com
Subject: Re: [PATCH v2 2/2] KVM: PPC: Book3S HV: rework secure mem slot dropping
Date: Thu, 23 Jul 2020 16:06:43 +0200
Message-ID: <0631397b-44af-ea3b-b70b-e4a0dc2c0366@linux.ibm.com>
In-Reply-To: <4a3caeaf-cd0c-fcd7-0a97-f367a5f78dac@linux.ibm.com>

On 23/07/2020 at 14:32, Laurent Dufour wrote:
> On 23/07/2020 at 05:36, Bharata B Rao wrote:
>> On Tue, Jul 21, 2020 at 12:42:02PM +0200, Laurent Dufour wrote:
>>> When a secure memslot is dropped, all the pages backed in the secure device
>>> (aka really backed by secure memory by the Ultravisor) should be paged out
>>> to a normal page. Previously, this was achieved by triggering the page
>>> fault mechanism, which calls kvmppc_svm_page_out() on each page.
>>>
>>> This can't work when hot unplugging a memory slot because the memory slot
>>> is flagged as invalid and gfn_to_pfn() is then not trying to access the
>>> page, so the page fault mechanism is not triggered.
>>>
>>> Since the final goal is to make a call to kvmppc_svm_page_out(), it seems
>>> simpler to call it directly instead of triggering such a mechanism. This
>>> way kvmppc_uvmem_drop_pages() can be called even when hot unplugging a
>>> memslot.
>>>
>>> Since kvmppc_uvmem_drop_pages() is already holding kvm->arch.uvmem_lock,
>>> __kvmppc_svm_page_out() is called rather than kvmppc_svm_page_out().
>>> As __kvmppc_svm_page_out() needs the vma pointer to migrate the pages, the
>>> VMA is fetched lazily, to avoid calling find_vma() every time. In
>>> addition, the mmap_sem is held in read mode during that time, not in write
>>> mode, since the virtual memory layout is not impacted, and
>>> kvm->arch.uvmem_lock prevents concurrent operations on the secure device.
>>>
>>> Cc: Ram Pai <linuxram@us.ibm.com>
>>> Cc: Bharata B Rao <bharata@linux.ibm.com>
>>> Cc: Paul Mackerras <paulus@ozlabs.org>
>>> Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
>>> ---
>>>   arch/powerpc/kvm/book3s_hv_uvmem.c | 54 ++++++++++++++++++++----------
>>>   1 file changed, 37 insertions(+), 17 deletions(-)
>>>
>>> diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c b/arch/powerpc/kvm/book3s_hv_uvmem.c
>>> index 5a4b02d3f651..ba5c7c77cc3a 100644
>>> --- a/arch/powerpc/kvm/book3s_hv_uvmem.c
>>> +++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
>>> @@ -624,35 +624,55 @@ static inline int kvmppc_svm_page_out(struct vm_area_struct *vma,
>>>    * fault on them, do fault time migration to replace the device PTEs in
>>>    * QEMU page table with normal PTEs from newly allocated pages.
>>>    */
>>> -void kvmppc_uvmem_drop_pages(const struct kvm_memory_slot *free,
>>> +void kvmppc_uvmem_drop_pages(const struct kvm_memory_slot *slot,
>>>                    struct kvm *kvm, bool skip_page_out)
>>>   {
>>>       int i;
>>>       struct kvmppc_uvmem_page_pvt *pvt;
>>> -    unsigned long pfn, uvmem_pfn;
>>> -    unsigned long gfn = free->base_gfn;
>>> +    struct page *uvmem_page;
>>> +    struct vm_area_struct *vma = NULL;
>>> +    unsigned long uvmem_pfn, gfn;
>>> +    unsigned long addr, end;
>>> +
>>> +    mmap_read_lock(kvm->mm);
>>> +
>>> +    addr = slot->userspace_addr;
>>
>> We typically use gfn_to_hva() for that, but that won't work for a
>> memslot that is already marked INVALID, which is the case here.
>> I think it is OK to access slot->userspace_addr of an INVALID
>> memslot here, but I thought I'd bring this up explicitly.
> 
> Which is explicitly mentioned above in the patch's description:
> 
> This can't work when hot unplugging a memory slot because the memory slot
> is flagged as invalid and gfn_to_pfn() is then not trying to access the
> page, so the page fault mechanism is not triggered.
> 
>>
>>> +    end = addr + (slot->npages * PAGE_SIZE);
>>> -    for (i = free->npages; i; --i, ++gfn) {
>>> -        struct page *uvmem_page;
>>> +    gfn = slot->base_gfn;
>>> +    for (i = slot->npages; i; --i, ++gfn, addr += PAGE_SIZE) {
>>> +
>>> +        /* Fetch the VMA if addr is not in the latest fetched one */
>>> +        if (!vma || (addr < vma->vm_start || addr >= vma->vm_end)) {
>>> +            vma = find_vma_intersection(kvm->mm, addr, end);
>>> +            if (!vma ||
>>> +                vma->vm_start > addr || vma->vm_end < end) {
>>> +                pr_err("Can't find VMA for gfn:0x%lx\n", gfn);
>>> +                break;
>>> +            }
>>> +        }
>>
>> In Ram's series, kvmppc_memslot_page_merge() also walks the VMAs spanning
>> the memslot, but it uses different logic to do so. Why can't these
>> two cases use the same method to walk the VMAs? Is there anything subtly
>> different between the two cases?
> 
> This is probably doable. At the time I wrote that patch, 
> kvmppc_memslot_page_merge() had not yet been introduced, AFAIR.
> 
> That being said, it'd help a lot to factor out that code... I'll let Ram deal 
> with that ;)

Actually, I don't think this is relevant: the loop in kvmppc_memslot_page_merge() 
makes one call (to ksm_madvise()) per VMA, while this code makes one call per 
page of the VMA, which is completely different.

I don't think merging the two would be a good idea.
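
To illustrate the difference, here is a minimal userspace sketch, not kernel
code. Everything in it (struct mock_vma, mock_find_intersection(),
walk_per_vma(), walk_per_page(), MOCK_PAGE_SIZE) is a hypothetical stand-in
made up for the example; it only mimics the shape of the two loops, assuming
the VMAs are kept in a sorted array.

```c
#include <stddef.h>

#define MOCK_PAGE_SIZE 4096UL

/* Hypothetical stand-in for a VMA covering [start, end). */
struct mock_vma {
	unsigned long start;
	unsigned long end;
};

/*
 * Return the first VMA overlapping [addr, end), or NULL if there is none.
 * This plays the role find_vma_intersection() has in the patch, assuming
 * the array is sorted by address.
 */
static const struct mock_vma *
mock_find_intersection(const struct mock_vma *vmas, size_t n,
		       unsigned long addr, unsigned long end)
{
	for (size_t i = 0; i < n; i++)
		if (vmas[i].start < end && vmas[i].end > addr)
			return &vmas[i];
	return NULL;
}

/*
 * Per-VMA walk: one action per VMA overlapping [start, end), then jump
 * straight to the end of that VMA. Returns the number of actions.
 */
static unsigned long walk_per_vma(const struct mock_vma *vmas, size_t n,
				  unsigned long start, unsigned long end)
{
	unsigned long addr = start, calls = 0;

	while (addr < end) {
		const struct mock_vma *vma =
			mock_find_intersection(vmas, n, addr, end);
		if (!vma)
			break;
		calls++;		/* one call for the whole VMA */
		addr = vma->end;
	}
	return calls;
}

/*
 * Per-page walk: one action per page, with the VMA looked up again only
 * when addr leaves the last fetched one. Returns the number of pages
 * handled; *lookups receives the number of VMA lookups performed.
 */
static unsigned long walk_per_page(const struct mock_vma *vmas, size_t n,
				   unsigned long start, unsigned long npages,
				   unsigned long *lookups)
{
	const struct mock_vma *vma = NULL;
	unsigned long addr = start, pages = 0;

	*lookups = 0;
	for (unsigned long i = npages; i; --i, addr += MOCK_PAGE_SIZE) {
		/* Fetch the VMA if addr is not in the latest fetched one */
		if (!vma || addr < vma->start || addr >= vma->end) {
			vma = mock_find_intersection(vmas, n, addr,
						     addr + MOCK_PAGE_SIZE);
			(*lookups)++;
			if (!vma || vma->start > addr)
				break;	/* hole: no VMA backs this page */
		}
		pages++;		/* one call for this page */
	}
	return pages;
}
```

With two adjacent 4-page VMAs spanning an 8-page slot, walk_per_vma() performs
2 actions, while walk_per_page() performs 8 actions with only 2 lookups. The
loop bodies and termination conditions differ enough that a shared helper
would have to serve two quite different callers.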

Cheers,
Laurent.