From: Marc Zyngier <maz@kernel.org>
To: Keqian Zhu <zhukeqian1@huawei.com>
Cc: <linux-kernel@vger.kernel.org>,
	<linux-arm-kernel@lists.infradead.org>, <kvm@vger.kernel.org>,
	<kvmarm@lists.cs.columbia.edu>, <wanghaibin.wang@huawei.com>
Subject: Re: [PATCH v4 2/2] kvm/arm64: Try stage2 block mapping for host device MMIO
Date: Fri, 16 Apr 2021 15:44:22 +0100
Message-ID: <87a6py2ss9.wl-maz@kernel.org>
In-Reply-To: <8f55b64f-b4dd-700e-c997-8de9c5ea282f@huawei.com>

On Thu, 15 Apr 2021 15:08:09 +0100,
Keqian Zhu <zhukeqian1@huawei.com> wrote:
> 
> Hi Marc,
> 
> On 2021/4/15 22:03, Keqian Zhu wrote:
> > The MMIO region of a device may be huge (GB level), so try to use
> > block mapping in stage2 to speed up both map and unmap.
> > 
> > Compared to normal memory mapping, we should consider two more
> > points when trying block mapping for an MMIO region:
> > 
> > 1. For normal memory mapping, the PA (host physical address) and
> > HVA have the same alignment within PUD_SIZE or PMD_SIZE when we use
> > the HVA to request a hugepage, so we don't need to consider PA
> > alignment when verifying block mapping. But for device memory
> > mapping, the PA and HVA may have different alignment.
> > 
> > 2. For normal memory mapping, we are sure the hugepage size properly
> > fits into the vma, so we don't check whether the mapping size exceeds
> > the boundary of the vma. But for device memory mapping, we should pay
> > attention to this.
> > 
> > This adds get_vma_page_shift() to get the page shift for both normal
> > memory and device MMIO regions, and checks these two points when
> > selecting the block mapping size for an MMIO region.
> > 
> > Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com>
> > ---
> >  arch/arm64/kvm/mmu.c | 61 ++++++++++++++++++++++++++++++++++++--------
> >  1 file changed, 51 insertions(+), 10 deletions(-)
> > 
> > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > index c59af5ca01b0..5a1cc7751e6d 100644
> > --- a/arch/arm64/kvm/mmu.c
> > +++ b/arch/arm64/kvm/mmu.c
> > @@ -738,6 +738,35 @@ transparent_hugepage_adjust(struct kvm_memory_slot *memslot,
> >  	return PAGE_SIZE;
> >  }
> >  
> > +static int get_vma_page_shift(struct vm_area_struct *vma, unsigned long hva)
> > +{
> > +	unsigned long pa;
> > +
> > +	if (is_vm_hugetlb_page(vma) && !(vma->vm_flags & VM_PFNMAP))
> > +		return huge_page_shift(hstate_vma(vma));
> > +
> > +	if (!(vma->vm_flags & VM_PFNMAP))
> > +		return PAGE_SHIFT;
> > +
> > +	VM_BUG_ON(is_vm_hugetlb_page(vma));
> > +
> > +	pa = (vma->vm_pgoff << PAGE_SHIFT) + (hva - vma->vm_start);
> > +
> > +#ifndef __PAGETABLE_PMD_FOLDED
> > +	if ((hva & (PUD_SIZE - 1)) == (pa & (PUD_SIZE - 1)) &&
> > +	    ALIGN_DOWN(hva, PUD_SIZE) >= vma->vm_start &&
> > +	    ALIGN(hva, PUD_SIZE) <= vma->vm_end)
> > +		return PUD_SHIFT;
> > +#endif
> > +
> > +	if ((hva & (PMD_SIZE - 1)) == (pa & (PMD_SIZE - 1)) &&
> > +	    ALIGN_DOWN(hva, PMD_SIZE) >= vma->vm_start &&
> > +	    ALIGN(hva, PMD_SIZE) <= vma->vm_end)
> > +		return PMD_SHIFT;
> > +
> > +	return PAGE_SHIFT;
> > +}
> > +
> >  static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> >  			  struct kvm_memory_slot *memslot, unsigned long hva,
> >  			  unsigned long fault_status)
> > @@ -769,7 +798,10 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> >  		return -EFAULT;
> >  	}
> >  
> > -	/* Let's check if we will get back a huge page backed by hugetlbfs */
> > +	/*
> > +	 * Let's check if we will get back a huge page backed by hugetlbfs, or
> > +	 * get block mapping for device MMIO region.
> > +	 */
> >  	mmap_read_lock(current->mm);
> >  	vma = find_vma_intersection(current->mm, hva, hva + 1);
> >  	if (unlikely(!vma)) {
> > @@ -778,15 +810,15 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> >  		return -EFAULT;
> >  	}
> >  
> > -	if (is_vm_hugetlb_page(vma))
> > -		vma_shift = huge_page_shift(hstate_vma(vma));
> > -	else
> > -		vma_shift = PAGE_SHIFT;
> > -
> > -	if (logging_active ||
> > -	    (vma->vm_flags & VM_PFNMAP)) {
> > +	/*
> > +	 * logging_active is guaranteed to never be true for VM_PFNMAP
> > +	 * memslots.
> > +	 */
> > +	if (logging_active) {
> >  		force_pte = true;
> >  		vma_shift = PAGE_SHIFT;
> > +	} else {
> > +		vma_shift = get_vma_page_shift(vma, hva);
> >  	}
> I use an if/else structure in v4; please check it. Thanks very much!

That's fine. However, it is getting a bit late for 5.13, and we don't
have much time to let it simmer in -next. I'll probably wait until
after the merge window to pick it up.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.
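
As a standalone illustration (not part of the patch or this thread), the sketch below mirrors the PA/HVA alignment and vma boundary checks that get_vma_page_shift() performs in the quoted diff. The vma layout, the addresses, and the hard-coded PMD_SIZE/PUD_SIZE values (taken from a 4K-page arm64 configuration) are hypothetical.

/*
 * Minimal userspace sketch of the block-size selection logic in the
 * patch above. A VM_PFNMAP vma records the mapped PA in vm_pgoff, so
 * the PA backing an HVA is vm_pgoff << PAGE_SHIFT plus the offset into
 * the vma; a block mapping is only usable when HVA and PA share the
 * same alignment within the block size and the block fits in the vma.
 */
#include <stdio.h>

#define PAGE_SHIFT	12UL
#define PMD_SIZE	(1UL << 21)	/* 2MiB block (4K pages) */
#define PUD_SIZE	(1UL << 30)	/* 1GiB block (4K pages) */

#define ALIGN_DOWN(x, a)	((x) & ~((a) - 1))
#define ALIGN(x, a)		(((x) + (a) - 1) & ~((a) - 1))

struct fake_vma {
	unsigned long vm_start;
	unsigned long vm_end;
	unsigned long vm_pgoff;		/* PA >> PAGE_SHIFT for VM_PFNMAP */
};

static unsigned long pick_block_shift(const struct fake_vma *vma,
				      unsigned long hva)
{
	unsigned long pa = (vma->vm_pgoff << PAGE_SHIFT) + (hva - vma->vm_start);

	if ((hva & (PUD_SIZE - 1)) == (pa & (PUD_SIZE - 1)) &&
	    ALIGN_DOWN(hva, PUD_SIZE) >= vma->vm_start &&
	    ALIGN(hva, PUD_SIZE) <= vma->vm_end)
		return 30;		/* PUD_SHIFT */

	if ((hva & (PMD_SIZE - 1)) == (pa & (PMD_SIZE - 1)) &&
	    ALIGN_DOWN(hva, PMD_SIZE) >= vma->vm_start &&
	    ALIGN(hva, PMD_SIZE) <= vma->vm_end)
		return 21;		/* PMD_SHIFT */

	return PAGE_SHIFT;
}

int main(void)
{
	/* Hypothetical 4MiB BAR: HVA is 2MiB aligned, PA only 4KiB aligned. */
	struct fake_vma vma = {
		.vm_start = 0x40000000UL,
		.vm_end   = 0x40400000UL,
		.vm_pgoff = 0x80001000UL >> PAGE_SHIFT,
	};

	/* Prints 12: the alignment mismatch forces a PAGE_SIZE mapping. */
	printf("shift = %lu\n", pick_block_shift(&vma, 0x40000000UL));
	return 0;
}

Because the hypothetical PA is offset by 4KiB relative to the HVA, neither the PUD nor the PMD alignment test passes, so the sketch falls back to PAGE_SHIFT, just as the patch would for such a region.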

Thread overview:

2021-04-15 14:03 [PATCH v4 0/2] kvm/arm64: Try stage2 block mapping for host device MMIO Keqian Zhu
2021-04-15 14:03 ` [PATCH v4 1/2] kvm/arm64: Remove the creation time's mapping of MMIO regions Keqian Zhu
2021-04-21  6:38   ` Gavin Shan
2021-04-21  6:28     ` Keqian Zhu
2021-04-22  2:12       ` Gavin Shan
2021-04-22  7:41         ` Keqian Zhu
2021-04-23  1:35           ` Gavin Shan
2021-04-23  1:36             ` Keqian Zhu
2021-04-23  0:36   ` Gavin Shan
2021-04-15 14:03 ` [PATCH v4 2/2] kvm/arm64: Try stage2 block mapping for host device MMIO Keqian Zhu
2021-04-15 14:08   ` Keqian Zhu
2021-04-16 14:44     ` Marc Zyngier [this message]
2021-04-17  1:05       ` Keqian Zhu
2021-04-21  7:52   ` Gavin Shan
2021-04-21  6:36     ` Keqian Zhu
2021-04-22  2:25       ` Gavin Shan
2021-04-22  6:51         ` Marc Zyngier
2021-04-23  0:42           ` Gavin Shan
2021-04-23  0:37   ` Gavin Shan
