Date: Tue, 31 May 2022 17:45:51 +0100
From: Will Deacon
To: Alexandru Elisei
Cc: kvmarm@lists.cs.columbia.edu, Ard Biesheuvel, Sean Christopherson,
	Andy Lutomirski, Catalin Marinas, James Morse, Chao Peng,
	Quentin Perret, Suzuki K Poulose, Michael Roth, Mark Rutland,
	Fuad Tabba, Oliver Upton, Marc Zyngier, kernel-team@android.com,
	kvm@vger.kernel.org, linux-arm-kernel@lists.infradead.org
Subject: Re: [PATCH 33/89] KVM: arm64: Handle guest stage-2 page-tables entirely at EL2
Message-ID: <20220531164550.GA25631@willie-the-truck>
References: <20220519134204.5379-1-will@kernel.org> <20220519134204.5379-34-will@kernel.org>

On Fri, May 20, 2022 at 05:03:29PM +0100, Alexandru Elisei wrote:
> On Thu, May 19, 2022 at 02:41:08PM +0100, Will Deacon wrote:
> > Now that EL2 is able to manage guest stage-2 page-tables, avoid
> > allocating a separate MMU structure in the host and instead introduce a
> > new fault handler which responds to guest stage-2 faults by sharing
> > GUP-pinned pages with the guest via a hypercall. These pages are
> > recovered (and unpinned) on guest teardown via the page reclaim
> > hypercall.
> >
> > Signed-off-by: Will Deacon
> > ---
> [..]
> > +static int pkvm_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> > +			  unsigned long hva)
> > +{
> > +	struct kvm_hyp_memcache *hyp_memcache = &vcpu->arch.pkvm_memcache;
> > +	struct mm_struct *mm = current->mm;
> > +	unsigned int flags = FOLL_HWPOISON | FOLL_LONGTERM | FOLL_WRITE;
> > +	struct kvm_pinned_page *ppage;
> > +	struct kvm *kvm = vcpu->kvm;
> > +	struct page *page;
> > +	u64 pfn;
> > +	int ret;
> > +
> > +	ret = topup_hyp_memcache(hyp_memcache, kvm_mmu_cache_min_pages(kvm));
> > +	if (ret)
> > +		return -ENOMEM;
> > +
> > +	ppage = kmalloc(sizeof(*ppage), GFP_KERNEL_ACCOUNT);
> > +	if (!ppage)
> > +		return -ENOMEM;
> > +
> > +	ret = account_locked_vm(mm, 1, true);
> > +	if (ret)
> > +		goto free_ppage;
> > +
> > +	mmap_read_lock(mm);
> > +	ret = pin_user_pages(hva, 1, flags, &page, NULL);
>
> When I implemented memory pinning via GUP for the KVM SPE series, I
> discovered that the pages were regularly unmapped at stage 2 because of
> automatic NUMA balancing, as change_prot_numa() ends up calling
> mmu_notifier_invalidate_range_start().
>
> I was curious how you managed to avoid that; I don't know my way around
> pKVM and can't seem to find where that's implemented.

With this series, we don't take any notice of the MMU notifiers at EL2,
so the stage-2 remains intact. The GUP pin will prevent the page from
being migrated, as the rmap walker won't be able to drop the mapcount.

It's functional, but we'd definitely like to do better in the long term.
The fd-based approach that I mentioned in the cover letter gets us some
of the way there for protected guests ("private memory"), but
non-protected guests running under pKVM are proving to be pretty
challenging (we need to deal with things like sharing the zero page...).

Will
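A minimal sketch of the long-term GUP pin/unpin pairing discussed above,
for readers unfamiliar with it. The demo_* helpers are hypothetical and
not part of the patch; only pin_user_pages()/unpin_user_page() (and the
mmap lock helpers) are real kernel APIs, with the signatures they had in
the 5.18 era this series targets:

	#include <linux/mm.h>

	/*
	 * Hypothetical illustration: take a long-term pin on one user page,
	 * as pkvm_mem_abort() does, and release it later, as the teardown/
	 * reclaim path must. FOLL_LONGTERM marks the pin as indefinite (GUP
	 * first migrates the page out of ZONE_MOVABLE/CMA), and the elevated
	 * pin count is what stops migration from completing while the page
	 * is mapped into the guest.
	 */
	static struct page *demo_pin_one_page(unsigned long hva)
	{
		unsigned int flags = FOLL_HWPOISON | FOLL_LONGTERM | FOLL_WRITE;
		struct page *page;
		long ret;

		mmap_read_lock(current->mm);
		ret = pin_user_pages(hva, 1, flags, &page, NULL);
		mmap_read_unlock(current->mm);

		return ret == 1 ? page : NULL;
	}

	/* Every successful pin must be undone exactly once on reclaim. */
	static void demo_unpin_one_page(struct page *page)
	{
		unpin_user_page(page);
	}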
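For context on the NUMA-balancing interaction Alexandru mentions, here is
a rough sketch (simplified and hypothetical, not the exact kernel code) of
why stage-2 mappings normally disappear: change_prot_numa() rewrites PTEs
to PROT_NONE inside an MMU-notifier invalidation window, and ordinary KVM
reacts to that notifier by unmapping the corresponding stage-2 range.
pKVM's EL2-owned stage-2 simply does not hook this path.

	#include <linux/mmu_notifier.h>

	/* Hypothetical condensation of the change_prot_numa() flow. */
	static void demo_numa_balance_step(struct mm_struct *mm,
					   unsigned long start,
					   unsigned long end)
	{
		struct mmu_notifier_range range;

		mmu_notifier_range_init(&range, MMU_NOTIFY_PROTECTION_VMA, 0,
					NULL, mm, start, end);
		/* Ordinary KVM unmaps the stage-2 range from this callback. */
		mmu_notifier_invalidate_range_start(&range);
		/* ... PTEs become PROT_NONE so the next access faults ... */
		mmu_notifier_invalidate_range_end(&range);
	}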