From mboxrd@z Thu Jan  1 00:00:00 1970
From: Anup Patel
Date: Thu, 22 Aug 2019 18:08:32 +0530
Subject: Re: [PATCH v5 13/20] RISC-V: KVM: Implement stage2 page table programming
To: Alexander Graf
Cc: Anup Patel, Palmer Dabbelt, Paul Walmsley, Paolo Bonzini, Radim K,
 Daniel Lezcano, Thomas Gleixner, Atish Patra, Alistair Francis,
 Damien Le Moal, Christoph Hellwig, kvm@vger.kernel.org,
 linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org
In-Reply-To: <77b9ff3c-292f-ee17-ddbb-134c0666fde7@amazon.com>
References: <20190822084131.114764-1-anup.patel@wdc.com>
 <20190822084131.114764-14-anup.patel@wdc.com>
 <77b9ff3c-292f-ee17-ddbb-134c0666fde7@amazon.com>
X-Mailing-List: kvm@vger.kernel.org

On Thu, Aug 22, 2019 at 5:58 PM Alexander Graf wrote:
>
> On 22.08.19 10:45, Anup Patel wrote:
> > This patch implements all required functions for programming
> > the stage2 page table for each Guest/VM.
> >
> > At a high level, the flow of the stage2-related functions is similar
> > to the KVM ARM/ARM64 implementation, but the stage2 page table
> > format is quite different for KVM RISC-V.
> >
> > Signed-off-by: Anup Patel
> > Acked-by: Paolo Bonzini
> > Reviewed-by: Paolo Bonzini
> > ---
> >  arch/riscv/include/asm/kvm_host.h     |  10 +
> >  arch/riscv/include/asm/pgtable-bits.h |   1 +
> >  arch/riscv/kvm/mmu.c                  | 637 +++++++++++++++++++++++++-
> >  3 files changed, 638 insertions(+), 10 deletions(-)
> >
> > diff --git a/arch/riscv/include/asm/kvm_host.h b/arch/riscv/include/asm/kvm_host.h
> > index 3b09158f80f2..a37775c92586 100644
> > --- a/arch/riscv/include/asm/kvm_host.h
> > +++ b/arch/riscv/include/asm/kvm_host.h
> > @@ -72,6 +72,13 @@ struct kvm_mmio_decode {
> >  	int shift;
> >  };
> >
> > +#define KVM_MMU_PAGE_CACHE_NR_OBJS	32
> > +
> > +struct kvm_mmu_page_cache {
> > +	int nobjs;
> > +	void *objects[KVM_MMU_PAGE_CACHE_NR_OBJS];
> > +};
> > +
> >  struct kvm_cpu_context {
> >  	unsigned long zero;
> >  	unsigned long ra;
> > @@ -163,6 +170,9 @@ struct kvm_vcpu_arch {
> >  	/* MMIO instruction details */
> >  	struct kvm_mmio_decode mmio_decode;
> >
> > +	/* Cache pages needed to program page tables with spinlock held */
> > +	struct kvm_mmu_page_cache mmu_page_cache;
> > +
> >  	/* VCPU power-off state */
> >  	bool power_off;
> >
> > diff --git a/arch/riscv/include/asm/pgtable-bits.h b/arch/riscv/include/asm/pgtable-bits.h
> > index bbaeb5d35842..be49d62fcc2b 100644
> > --- a/arch/riscv/include/asm/pgtable-bits.h
> > +++ b/arch/riscv/include/asm/pgtable-bits.h
> > @@ -26,6 +26,7 @@
> >
> >  #define _PAGE_SPECIAL   _PAGE_SOFT
> >  #define _PAGE_TABLE     _PAGE_PRESENT
> > +#define _PAGE_LEAF      (_PAGE_READ | _PAGE_WRITE | _PAGE_EXEC)
> >
> >  /*
> >   * _PAGE_PROT_NONE is set on not-present pages (and ignored by the hardware) to
> > diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c
> > index 2b965f9aac07..9e95ab6769f6 100644
> > --- a/arch/riscv/kvm/mmu.c
> > +++ b/arch/riscv/kvm/mmu.c
> > @@ -18,6 +18,432 @@
> >  #include
> >  #include
> >
> > +#ifdef CONFIG_64BIT
> > +#define stage2_have_pmd		true
> > +#define stage2_gpa_size		((phys_addr_t)(1ULL << 39))
> > +#define stage2_cache_min_pages	2
> > +#else
> > +#define pmd_index(x)		0
> > +#define pfn_pmd(x, y)		({ pmd_t __x = { 0 }; __x; })
> > +#define stage2_have_pmd		false
> > +#define stage2_gpa_size		((phys_addr_t)(1ULL << 32))
> > +#define stage2_cache_min_pages	1
> > +#endif
> > +
> > +static int stage2_cache_topup(struct kvm_mmu_page_cache *pcache,
> > +			      int min, int max)
> > +{
> > +	void *page;
> > +
> > +	BUG_ON(max > KVM_MMU_PAGE_CACHE_NR_OBJS);
> > +	if (pcache->nobjs >= min)
> > +		return 0;
> > +	while (pcache->nobjs < max) {
> > +		page = (void *)__get_free_page(GFP_KERNEL | __GFP_ZERO);
> > +		if (!page)
> > +			return -ENOMEM;
> > +		pcache->objects[pcache->nobjs++] = page;
> > +	}
> > +
> > +	return 0;
> > +}
> > +
> > +static void stage2_cache_flush(struct kvm_mmu_page_cache *pcache)
> > +{
> > +	while (pcache && pcache->nobjs)
> > +		free_page((unsigned long)pcache->objects[--pcache->nobjs]);
> > +}
> > +
> > +static void *stage2_cache_alloc(struct kvm_mmu_page_cache *pcache)
> > +{
> > +	void *p;
> > +
> > +	if (!pcache)
> > +		return NULL;
> > +
> > +	BUG_ON(!pcache->nobjs);
> > +	p = pcache->objects[--pcache->nobjs];
> > +
> > +	return p;
> > +}
> > +
> > +struct local_guest_tlb_info {
> > +	struct kvm_vmid *vmid;
> > +	gpa_t addr;
> > +};
> > +
> > +static void local_guest_tlb_flush_vmid_gpa(void *info)
> > +{
> > +	struct local_guest_tlb_info *infop = info;
> > +
> > +	__kvm_riscv_hfence_gvma_vmid_gpa(READ_ONCE(infop->vmid->vmid_version),
> > +					 infop->addr);
> > +}
> > +
> > +static void stage2_remote_tlb_flush(struct kvm *kvm, gpa_t addr)
> > +{
> > +	struct local_guest_tlb_info info;
> > +	struct kvm_vmid *vmid = &kvm->arch.vmid;
> > +
> > +	/* TODO: This should be SBI call */
> > +	info.vmid = vmid;
> > +	info.addr = addr;
> > +	preempt_disable();
> > +	smp_call_function_many(cpu_all_mask, local_guest_tlb_flush_vmid_gpa,
> > +			       &info, true);
>
> This is all nice and dandy on the toy 4 core systems we have today, but
> it will become a bottleneck further down the road.
>
> How many VMIDs do you have? Could you just allocate a new one every time
> you switch host CPUs? Then you know exactly which CPUs to flush by
> looking at all your vcpu structs and a local field that tells you which
> pCPU they're on at this moment.
>
> Either way, it's nothing that should block inclusion.

For today, we're fine. We are not happy about this either.

The other two options we have are:
1. Have SBI calls for remote HFENCEs
2. Propose a RISC-V ISA extension for remote FENCEs

Option 1 mostly means extending the SBI spec and implementing it in the
runtime firmware. Option 2 is the ideal solution, but it requires
consensus among a wider audience in the RISC-V foundation.

At this point, we are fine with a simple solution.

Regards,
Anup

>
> Alex