From mboxrd@z Thu Jan 1 00:00:00 1970
From: Leonardo Bras
To:
  linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org,
  kvm-ppc@vger.kernel.org, linux-arch@vger.kernel.org, linux-mm@kvack.org
Cc: Leonardo Bras, Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
  Arnd Bergmann, "Aneesh Kumar K.V", Christophe Leroy, Nicholas Piggin,
  Andrew Morton, Mahesh Salgaonkar, Reza Arbab, Santosh Sivaraj,
  Balbir Singh, Thomas Gleixner, Greg Kroah-Hartman, Mike Rapoport,
  Allison Randal, Jason Gunthorpe, Dan Williams, Vlastimil Babka,
  Christoph Lameter, Logan Gunthorpe, Andrey Ryabinin, Alexey Dobriyan,
  Souptick Joarder, Mathieu Desnoyers, Ralph Campbell,
  Jesper Dangaard Brouer, Jann Horn, Davidlohr Bueso,
  "Peter Zijlstra (Intel)", Ingo Molnar, Christian Brauner, Michal Hocko,
  Elena Reshetova, Roman Gushchin, Andrea Arcangeli, Al Viro,
  "Dmitry V. Levin", Jérôme Glisse, Song Liu, Bartlomiej Zolnierkiewicz,
  Ira Weiny, "Kirill A. Shutemov", John Hubbard, Keith Busch
Subject: [PATCH v5 08/11] powerpc/kvm/book3s_hv: Applies counting method to monitor lockless pgtbl walks
Date: Wed, 2 Oct 2019 22:33:22 -0300
Message-Id: <20191003013325.2614-9-leonardo@linux.ibm.com>
In-Reply-To: <20191003013325.2614-1-leonardo@linux.ibm.com>
References: <20191003013325.2614-1-leonardo@linux.ibm.com>
X-Mailer: git-send-email 2.20.1
MIME-Version: 1.0

Applies the counting-based method for monitoring lockless pagetable walks
to all book3s_hv related functions that perform them.

Adds comments explaining that some lockless pagetable walks don't need
protection, either because the guest pgd is not a target of THP
collapse/split, or because they are called from real mode with MSR_EE = 0.

kvmppc_do_h_enter: fixes where local_irq_restore() must be placed (after
the last use of ptep).

Given that some of these functions can be called in real mode, and others
always are, we use __{begin,end}_lockless_pgtbl_walk() so we can decide
when to disable interrupts.

Signed-off-by: Leonardo Bras
---
 arch/powerpc/kvm/book3s_hv_nested.c | 22 ++++++++++++++++++--
 arch/powerpc/kvm/book3s_hv_rm_mmu.c | 32 ++++++++++++++++-------------
 2 files changed, 38 insertions(+), 16 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_nested.c b/arch/powerpc/kvm/book3s_hv_nested.c
index cdf30c6eaf54..89944c699fd6 100644
--- a/arch/powerpc/kvm/book3s_hv_nested.c
+++ b/arch/powerpc/kvm/book3s_hv_nested.c
@@ -803,7 +803,11 @@ static void kvmhv_update_nest_rmap_rc(struct kvm *kvm, u64 n_rmap,
 	if (!gp)
 		return;

-	/* Find the pte */
+	/* Find the pte:
+	 * We are walking the nested guest (partition-scoped) page table here.
+	 * We can do this without disabling irq because the Linux MM
+	 * subsystem doesn't do THP splits and collapses on this tree.
+	 */
 	ptep = __find_linux_pte(gp->shadow_pgtable, gpa, NULL, &shift);
 	/*
 	 * If the pte is present and the pfn is still the same, update the pte.
@@ -853,7 +857,11 @@ static void kvmhv_remove_nest_rmap(struct kvm *kvm, u64 n_rmap,
 	if (!gp)
 		return;

-	/* Find and invalidate the pte */
+	/* Find and invalidate the pte:
+	 * We are walking the nested guest (partition-scoped) page table here.
+	 * We can do this without disabling irq because the Linux MM
+	 * subsystem doesn't do THP splits and collapses on this tree.
+	 */
 	ptep = __find_linux_pte(gp->shadow_pgtable, gpa, NULL, &shift);
 	/* Don't spuriously invalidate ptes if the pfn has changed */
 	if (ptep && pte_present(*ptep) && ((pte_val(*ptep) & mask) == hpa))
@@ -921,6 +929,11 @@ static bool kvmhv_invalidate_shadow_pte(struct kvm_vcpu *vcpu,
 	int shift;

 	spin_lock(&kvm->mmu_lock);
+	/*
+	 * We are walking the nested guest (partition-scoped) page table here.
+	 * We can do this without disabling irq because the Linux MM
+	 * subsystem doesn't do THP splits and collapses on this tree.
+	 */
 	ptep = __find_linux_pte(gp->shadow_pgtable, gpa, NULL, &shift);
 	if (!shift)
 		shift = PAGE_SHIFT;
@@ -1362,6 +1375,11 @@ static long int __kvmhv_nested_page_fault(struct kvm_run *run,
 	/* See if can find translation in our partition scoped tables for L1 */
 	pte = __pte(0);
 	spin_lock(&kvm->mmu_lock);
+	/*
+	 * We are walking the secondary (partition-scoped) page table here.
+	 * We can do this without disabling irq because the Linux MM
+	 * subsystem doesn't do THP splits and collapses on this tree.
+	 */
 	pte_p = __find_linux_pte(kvm->arch.pgtable, gpa, NULL, &shift);
 	if (!shift)
 		shift = PAGE_SHIFT;
diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index 220305454c23..a8be42f5be1e 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -210,7 +210,7 @@ long kvmppc_do_h_enter(struct kvm *kvm, unsigned long flags,
 	pte_t *ptep;
 	unsigned int writing;
 	unsigned long mmu_seq;
-	unsigned long rcbits, irq_flags = 0;
+	unsigned long rcbits, irq_mask = 0;

 	if (kvm_is_radix(kvm))
 		return H_FUNCTION;
@@ -252,12 +252,8 @@ long kvmppc_do_h_enter(struct kvm *kvm, unsigned long flags,
 	 * If we had a page table table change after lookup, we would
 	 * retry via mmu_notifier_retry.
 	 */
-	if (!realmode)
-		local_irq_save(irq_flags);
-	/*
-	 * If called in real mode we have MSR_EE = 0. Otherwise
-	 * we disable irq above.
-	 */
+	irq_mask = __begin_lockless_pgtbl_walk(kvm->mm, !realmode);
+
 	ptep = __find_linux_pte(pgdir, hva, NULL, &hpage_shift);
 	if (ptep) {
 		pte_t pte;
@@ -272,8 +268,7 @@ long kvmppc_do_h_enter(struct kvm *kvm, unsigned long flags,
 		 * to <= host page size, if host is using hugepage
 		 */
 		if (host_pte_size < psize) {
-			if (!realmode)
-				local_irq_restore(flags);
+			__end_lockless_pgtbl_walk(kvm->mm, irq_mask, !realmode);
 			return H_PARAMETER;
 		}
 		pte = kvmppc_read_update_linux_pte(ptep, writing);
@@ -287,8 +282,6 @@ long kvmppc_do_h_enter(struct kvm *kvm, unsigned long flags,
 			pa |= gpa & ~PAGE_MASK;
 		}
 	}
-	if (!realmode)
-		local_irq_restore(irq_flags);

 	ptel &= HPTE_R_KEY | HPTE_R_PP0 | (psize-1);
 	ptel |= pa;
@@ -302,8 +295,10 @@ long kvmppc_do_h_enter(struct kvm *kvm, unsigned long flags,

 	/*If we had host pte mapping then Check WIMG */
 	if (ptep && !hpte_cache_flags_ok(ptel, is_ci)) {
-		if (is_ci)
+		if (is_ci) {
+			__end_lockless_pgtbl_walk(kvm->mm, irq_mask, !realmode);
 			return H_PARAMETER;
+		}
 		/*
 		 * Allow guest to map emulated device memory as
 		 * uncacheable, but actually make it cacheable.
@@ -311,6 +306,7 @@ long kvmppc_do_h_enter(struct kvm *kvm, unsigned long flags,
 		ptel &= ~(HPTE_R_W|HPTE_R_I|HPTE_R_G);
 		ptel |= HPTE_R_M;
 	}
+	__end_lockless_pgtbl_walk(kvm->mm, irq_mask, !realmode);

 	/* Find and lock the HPTEG slot to use */
  do_insert:
@@ -907,11 +903,19 @@ static int kvmppc_get_hpa(struct kvm_vcpu *vcpu, unsigned long gpa,
 	/* Translate to host virtual address */
 	hva = __gfn_to_hva_memslot(memslot, gfn);

-	/* Try to find the host pte for that virtual address */
+	/* Try to find the host pte for that virtual address:
+	 * Called by hcall_real_table (real mode + MSR_EE=0)
+	 * Interrupts are disabled here.
+	 */
+	__begin_lockless_pgtbl_walk(kvm->mm, false);
 	ptep = __find_linux_pte(vcpu->arch.pgdir, hva, NULL, &shift);
-	if (!ptep)
+	if (!ptep) {
+		__end_lockless_pgtbl_walk(kvm->mm, 0, false);
 		return H_TOO_HARD;
+	}
 	pte = kvmppc_read_update_linux_pte(ptep, writing);
+	__end_lockless_pgtbl_walk(kvm->mm, 0, false);
+
 	if (!pte_present(pte))
 		return H_TOO_HARD;

-- 
2.20.1