Date: Sun, 12 Mar 2023 10:49:28 +0000
Message-ID: <87cz5e5jnr.wl-maz@kernel.org>
From: Marc Zyngier <maz@kernel.org>
To: Ricardo Koller <ricarkol@google.com>
Cc: pbonzini@redhat.com, oupton@google.com, yuzenghui@huawei.com,
	dmatlack@google.com, kvm@vger.kernel.org, kvmarm@lists.linux.dev,
	qperret@google.com, catalin.marinas@arm.com, andrew.jones@linux.dev,
	seanjc@google.com, alexandru.elisei@arm.com, suzuki.poulose@arm.com,
	eric.auger@redhat.com, gshan@redhat.com, reijiw@google.com,
	rananta@google.com, bgardon@google.com, ricarkol@gmail.com,
	Shaoqin Huang <shahuang@redhat.com>
Subject: Re: [PATCH v6 02/12] KVM: arm64: Add KVM_PGTABLE_WALK ctx->flags for skipping BBM and CMO
In-Reply-To: <20230307034555.39733-3-ricarkol@google.com>
References:
 <20230307034555.39733-1-ricarkol@google.com>
 <20230307034555.39733-3-ricarkol@google.com>

On Tue, 07 Mar 2023 03:45:45 +0000,
Ricardo Koller <ricarkol@google.com> wrote:
> 
> Add two flags to kvm_pgtable_visit_ctx, KVM_PGTABLE_WALK_SKIP_BBM and
> KVM_PGTABLE_WALK_SKIP_CMO, to indicate that the walk should not
> perform break-before-make (BBM) nor cache maintenance operations
> (CMO). This will by a future commit to create unlinked tables not

This will *be used*?

> accessible to the HW page-table walker. This is safe as these
> unlinked tables are not visible to the HW page-table walker.

I don't think this last sentence makes much sense. The PTW is always
coherent with the CPU caches and doesn't require cache maintenance
(CMOs are solely for the pages the PTs point to).

But this makes me question this patch further.
The key observation here is that if you are creating new PTs that
shadow an existing structure and still point to the same data pages,
the cache state is independent of the intermediate PT walk, and thus
CMOs are pointless anyway. So skipping CMOs makes sense.

I agree with the assertion that there is little point in doing BBM
when *creating* page tables, as all PTs start in an invalid state.
But then, why do you need to skip it? The invalidation calls are
already gated on the previous pointer being valid, which I presume
won't be the case for what you describe here.

> 
> Signed-off-by: Ricardo Koller <ricarkol@google.com>
> Reviewed-by: Shaoqin Huang <shahuang@redhat.com>
> ---
>  arch/arm64/include/asm/kvm_pgtable.h | 18 ++++++++++++++++++
>  arch/arm64/kvm/hyp/pgtable.c         | 27 ++++++++++++++++-----------
>  2 files changed, 34 insertions(+), 11 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
> index 26a4293726c1..c7a269cad053 100644
> --- a/arch/arm64/include/asm/kvm_pgtable.h
> +++ b/arch/arm64/include/asm/kvm_pgtable.h
> @@ -195,6 +195,12 @@ typedef bool (*kvm_pgtable_force_pte_cb_t)(u64 addr, u64 end,
>   *					with other software walkers.
>   * @KVM_PGTABLE_WALK_HANDLE_FAULT:	Indicates the page-table walk was
>   *					invoked from a fault handler.
> + * @KVM_PGTABLE_WALK_SKIP_BBM:		Visit and update table entries
> + *					without Break-before-make
> + *					requirements.
> + * @KVM_PGTABLE_WALK_SKIP_CMO:		Visit and update table entries
> + *					without Cache maintenance
> + *					operations required.

We have both I and D side CMOs. Is it reasonable to always treat them
identically?
>   */
>  enum kvm_pgtable_walk_flags {
>  	KVM_PGTABLE_WALK_LEAF			= BIT(0),
> @@ -202,6 +208,8 @@ enum kvm_pgtable_walk_flags {
>  	KVM_PGTABLE_WALK_TABLE_POST		= BIT(2),
>  	KVM_PGTABLE_WALK_SHARED			= BIT(3),
>  	KVM_PGTABLE_WALK_HANDLE_FAULT		= BIT(4),
> +	KVM_PGTABLE_WALK_SKIP_BBM		= BIT(5),
> +	KVM_PGTABLE_WALK_SKIP_CMO		= BIT(6),
>  };
>  
>  struct kvm_pgtable_visit_ctx {
> @@ -223,6 +231,16 @@ static inline bool kvm_pgtable_walk_shared(const struct kvm_pgtable_visit_ctx *c
>  	return ctx->flags & KVM_PGTABLE_WALK_SHARED;
>  }
>  
> +static inline bool kvm_pgtable_walk_skip_bbm(const struct kvm_pgtable_visit_ctx *ctx)
> +{
> +	return ctx->flags & KVM_PGTABLE_WALK_SKIP_BBM;

Probably worth wrapping this with an 'unlikely'.

> +}
> +
> +static inline bool kvm_pgtable_walk_skip_cmo(const struct kvm_pgtable_visit_ctx *ctx)
> +{
> +	return ctx->flags & KVM_PGTABLE_WALK_SKIP_CMO;

Same here.

Also, why are these in kvm_pgtable.h? Can't they be moved inside
pgtable.c and thus have the "inline" attribute dropped?

> +}
> +
>  /**
>   * struct kvm_pgtable_walker - Hook into a page-table walk.
>   * @cb:		Callback function to invoke during the walk.
> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> index a3246d6cddec..4f703cc4cb03 100644
> --- a/arch/arm64/kvm/hyp/pgtable.c
> +++ b/arch/arm64/kvm/hyp/pgtable.c
> @@ -741,14 +741,17 @@ static bool stage2_try_break_pte(const struct kvm_pgtable_visit_ctx *ctx,
>  	if (!stage2_try_set_pte(ctx, KVM_INVALID_PTE_LOCKED))
>  		return false;
>  
> -	/*
> -	 * Perform the appropriate TLB invalidation based on the evicted pte
> -	 * value (if any).
> -	 */
> -	if (kvm_pte_table(ctx->old, ctx->level))
> -		kvm_call_hyp(__kvm_tlb_flush_vmid, mmu);
> -	else if (kvm_pte_valid(ctx->old))
> -		kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, mmu, ctx->addr, ctx->level);
> +	if (!kvm_pgtable_walk_skip_bbm(ctx)) {
> +		/*
> +		 * Perform the appropriate TLB invalidation based on the
> +		 * evicted pte value (if any).
> +		 */
> +		if (kvm_pte_table(ctx->old, ctx->level))

You're not skipping BBM here. You're skipping the TLB invalidation.
Not quite the same thing.

> +			kvm_call_hyp(__kvm_tlb_flush_vmid, mmu);
> +		else if (kvm_pte_valid(ctx->old))
> +			kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, mmu,
> +				     ctx->addr, ctx->level);
> +	}
>  
>  	if (stage2_pte_is_counted(ctx->old))
>  		mm_ops->put_page(ctx->ptep);
> @@ -832,11 +835,13 @@ static int stage2_map_walker_try_leaf(const struct kvm_pgtable_visit_ctx *ctx,
>  		return -EAGAIN;
>  
>  	/* Perform CMOs before installation of the guest stage-2 PTE */
> -	if (mm_ops->dcache_clean_inval_poc && stage2_pte_cacheable(pgt, new))
> +	if (!kvm_pgtable_walk_skip_cmo(ctx) && mm_ops->dcache_clean_inval_poc &&
> +	    stage2_pte_cacheable(pgt, new))
>  		mm_ops->dcache_clean_inval_poc(kvm_pte_follow(new, mm_ops),
> -						granule);
> +					       granule);
>  
> -	if (mm_ops->icache_inval_pou && stage2_pte_executable(new))
> +	if (!kvm_pgtable_walk_skip_cmo(ctx) && mm_ops->icache_inval_pou &&
> +	    stage2_pte_executable(new))
>  		mm_ops->icache_inval_pou(kvm_pte_follow(new, mm_ops), granule);
>  
>  	stage2_make_pte(ctx, new);

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.