Date: Fri, 16 Apr 2021 15:44:22 +0100
Message-ID: <87a6py2ss9.wl-maz@kernel.org>
From: Marc Zyngier <maz@kernel.org>
To: Keqian Zhu <zhukeqian1@huawei.com>
Cc: linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
    kvm@vger.kernel.org, kvmarm@lists.cs.columbia.edu,
    wanghaibin.wang@huawei.com
Subject: Re: [PATCH v4 2/2] kvm/arm64: Try stage2 block mapping for host device MMIO
In-Reply-To: <8f55b64f-b4dd-700e-c997-8de9c5ea282f@huawei.com>
References: <20210415140328.24200-1-zhukeqian1@huawei.com>
	<20210415140328.24200-3-zhukeqian1@huawei.com>
	<8f55b64f-b4dd-700e-c997-8de9c5ea282f@huawei.com>

On Thu, 15 Apr 2021 15:08:09 +0100,
Keqian Zhu <zhukeqian1@huawei.com> wrote:
> 
> Hi Marc,
> 
> On 2021/4/15 22:03, Keqian Zhu wrote:
> > The MMIO region of a device may be huge (GB level), so try to use
> > block mapping at stage2 to speed up both map and unmap.
> >
> > Compared to normal memory mapping, we should consider two more
> > points when trying block mapping for an MMIO region:
> >
> > 1. For normal memory mapping, the PA (host physical address) and
> > HVA have the same alignment within PUD_SIZE or PMD_SIZE when we use
> > the HVA to request a hugepage, so we don't need to consider PA
> > alignment when verifying the block mapping. But for device memory
> > mapping, the PA and HVA may have different alignment.
> >
> > 2. For normal memory mapping, we are sure the hugepage size properly
> > fits into the vma, so we don't check whether the mapping size exceeds
> > the boundary of the vma. But for device memory mapping, we should pay
> > attention to this.
> >
> > This adds get_vma_page_shift() to get the page shift for both normal
> > memory and device MMIO regions, and checks these two points when
> > selecting the block mapping size for MMIO regions.
> >
> > Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com>
> > ---
> >  arch/arm64/kvm/mmu.c | 61 ++++++++++++++++++++++++++++++++++++--------
> >  1 file changed, 51 insertions(+), 10 deletions(-)
> >
> > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > index c59af5ca01b0..5a1cc7751e6d 100644
> > --- a/arch/arm64/kvm/mmu.c
> > +++ b/arch/arm64/kvm/mmu.c
> > @@ -738,6 +738,35 @@ transparent_hugepage_adjust(struct kvm_memory_slot *memslot,
> >  	return PAGE_SIZE;
> >  }
> >  
> > +static int get_vma_page_shift(struct vm_area_struct *vma, unsigned long hva)
> > +{
> > +	unsigned long pa;
> > +
> > +	if (is_vm_hugetlb_page(vma) && !(vma->vm_flags & VM_PFNMAP))
> > +		return huge_page_shift(hstate_vma(vma));
> > +
> > +	if (!(vma->vm_flags & VM_PFNMAP))
> > +		return PAGE_SHIFT;
> > +
> > +	VM_BUG_ON(is_vm_hugetlb_page(vma));
> > +
> > +	pa = (vma->vm_pgoff << PAGE_SHIFT) + (hva - vma->vm_start);
> > +
> > +#ifndef __PAGETABLE_PMD_FOLDED
> > +	if ((hva & (PUD_SIZE - 1)) == (pa & (PUD_SIZE - 1)) &&
> > +	    ALIGN_DOWN(hva, PUD_SIZE) >= vma->vm_start &&
> > +	    ALIGN(hva, PUD_SIZE) <= vma->vm_end)
> > +		return PUD_SHIFT;
> > +#endif
> > +
> > +	if ((hva & (PMD_SIZE - 1)) == (pa & (PMD_SIZE - 1)) &&
> > +	    ALIGN_DOWN(hva, PMD_SIZE) >= vma->vm_start &&
> > +	    ALIGN(hva, PMD_SIZE) <= vma->vm_end)
> > +		return PMD_SHIFT;
> > +
> > +	return PAGE_SHIFT;
> > +}
> > +
> >  static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> >  			  struct kvm_memory_slot *memslot, unsigned long hva,
> >  			  unsigned long fault_status)
> > @@ -769,7 +798,10 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> >  		return -EFAULT;
> >  	}
> >  
> > -	/* Let's check if we will get back a huge page backed by hugetlbfs */
> > +	/*
> > +	 * Let's check if we will get back a huge page backed by hugetlbfs, or
> > +	 * get block mapping for device MMIO region.
> > +	 */
> >  	mmap_read_lock(current->mm);
> >  	vma = find_vma_intersection(current->mm, hva, hva + 1);
> >  	if (unlikely(!vma)) {
> > @@ -778,15 +810,15 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> >  		return -EFAULT;
> >  	}
> >  
> > -	if (is_vm_hugetlb_page(vma))
> > -		vma_shift = huge_page_shift(hstate_vma(vma));
> > -	else
> > -		vma_shift = PAGE_SHIFT;
> > -
> > -	if (logging_active ||
> > -	    (vma->vm_flags & VM_PFNMAP)) {
> > +	/*
> > +	 * logging_active is guaranteed to never be true for VM_PFNMAP
> > +	 * memslots.
> > +	 */
> > +	if (logging_active) {
> >  		force_pte = true;
> >  		vma_shift = PAGE_SHIFT;
> > +	} else {
> > +		vma_shift = get_vma_page_shift(vma, hva);
> >  	}
> I use an if/else manner in v4, please check that. Thanks very much!

That's fine. However, it is getting a bit late for 5.13, and we don't
have much time to let it simmer in -next. I'll probably wait until
after the merge window to pick it up.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.
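
For readers who want to see the alignment/boundary rule from points 1 and 2
in isolation, here is a minimal user-space sketch of the check that
get_vma_page_shift() performs. The PAGE/PMD/PUD constants, the toy_vma
struct and the sample addresses below are illustrative stand-ins, not the
kernel's definitions; they simply assume a 4K/2M/1G arm64-style layout.

/*
 * Standalone sketch: a PUD- or PMD-sized block can only be used when the
 * HVA and the PA share the same offset within the block, and the whole
 * block stays inside the vma.
 */
#include <stdio.h>
#include <stdbool.h>

#define PAGE_SHIFT	12
#define PMD_SHIFT	21
#define PUD_SHIFT	30
#define PAGE_SIZE	(1UL << PAGE_SHIFT)
#define PMD_SIZE	(1UL << PMD_SHIFT)
#define PUD_SIZE	(1UL << PUD_SHIFT)

#define ALIGN_DOWN(x, a)	((x) & ~((a) - 1))
#define ALIGN(x, a)		(((x) + (a) - 1) & ~((a) - 1))

struct toy_vma {
	unsigned long vm_start;
	unsigned long vm_end;
	unsigned long vm_pgoff;	/* first PFN mapped by the vma */
};

static bool fits_block(unsigned long hva, unsigned long pa,
		       unsigned long size, const struct toy_vma *vma)
{
	/* hva and pa must sit at the same offset within the block ... */
	if ((hva & (size - 1)) != (pa & (size - 1)))
		return false;
	/* ... and the aligned block must not spill out of the vma */
	return ALIGN_DOWN(hva, size) >= vma->vm_start &&
	       ALIGN(hva, size) <= vma->vm_end;
}

static int toy_vma_page_shift(const struct toy_vma *vma, unsigned long hva)
{
	/* PA backing hva, derived as for a VM_PFNMAP vma */
	unsigned long pa = (vma->vm_pgoff << PAGE_SHIFT) + (hva - vma->vm_start);

	if (fits_block(hva, pa, PUD_SIZE, vma))
		return PUD_SHIFT;
	if (fits_block(hva, pa, PMD_SIZE, vma))
		return PMD_SHIFT;
	return PAGE_SHIFT;
}

int main(void)
{
	/* 4GB BAR, 1GB-aligned PA, but mapped at a merely 2MB-aligned hva */
	struct toy_vma vma = {
		.vm_start = 0x7f0000200000UL,
		.vm_end   = 0x7f0100200000UL,
		.vm_pgoff = 0x2000000000UL >> PAGE_SHIFT,
	};
	unsigned long hva = vma.vm_start + 0x40000000UL;

	printf("page shift for hva %#lx: %d\n", hva,
	       toy_vma_page_shift(&vma, hva));
	return 0;
}

Built with a plain C compiler, this prints a page shift of 21 (PMD_SHIFT):
because the HVA is only 2MB-aligned, only 2MB blocks are usable for that
fault, even though the PA itself is 1GB-aligned.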