From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Fri, 27 Aug 2021 10:31:50 +0800
From: Yu Zhang
To: David Hildenbrand
Cc: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li,
 Jim Mattson, Joerg Roedel, kvm@vger.kernel.org,
 linux-kernel@vger.kernel.org, Borislav Petkov, Andy Lutomirski,
 Andrew Morton, Andi Kleen, David Rientjes, Vlastimil Babka,
 Tom Lendacky, Thomas Gleixner, Peter Zijlstra, Ingo Molnar,
 Varad Gautam, Dario Faggioli, x86@kernel.org, linux-mm@kvack.org,
 linux-coco@lists.linux.dev, "Kirill A . Shutemov",
 Kuppuswamy Sathyanarayanan, Dave Hansen
Subject: Re: [RFC] KVM: mm: fd-based approach for supporting KVM guest private memory
Message-ID: <20210827023150.jotwvom7mlsawjh4@linux.intel.com>
References: <20210824005248.200037-1-seanjc@google.com> <307d385a-a263-276f-28eb-4bc8dd287e32@redhat.com>
In-Reply-To: <307d385a-a263-276f-28eb-4bc8dd287e32@redhat.com>
X-Mailing-List: linux-coco@lists.linux.dev

On Thu, Aug 26, 2021 at 12:15:48PM +0200, David Hildenbrand wrote:
> On 24.08.21 02:52, Sean Christopherson wrote:
> > The goal of this RFC is to try and align KVM, mm, and anyone else with skin in the
> > game, on an acceptable direction for supporting guest private memory, e.g. for
> > Intel's TDX. The TDX architectural effectively allows KVM guests to crash the
> > host if guest private memory is accessible to host userspace, and thus does not
> > play nice with KVM's existing approach of pulling the pfn and mapping level from
> > the host page tables.
> >
> > This is by no means a complete patch; it's a rough sketch of the KVM changes that
> > would be needed. The kernel side of things is completely omitted from the patch;
> > the design concept is below.
> >
> > There's also fair bit of hand waving on implementation details that shouldn't
> > fundamentally change the overall ABI, e.g. how the backing store will ensure
> > there are no mappings when "converting" to guest private.
> >
>
> This is a lot of complexity and rather advanced approaches (not saying they
> are bad, just that we try to teach the whole stack something completely
> new).
>
>
> What I think would really help is a list of requirements, such that
> everybody is aware of what we actually want to achieve.
> Let me start:
>
> GFN: Guest Frame Number
> EPFN: Encrypted Physical Frame Number
>
>
> 1) An EPFN must not get mapped into more than one VM: it belongs exactly to
> one VM. It must neither be shared between VMs between processes nor between
> VMs within a processes.
>
>
> 2) User space (well, and actually the kernel) must never access an EPFN:
>
> - If we go for an fd, essentially all operations (read/write) have to
> fail.
> - If we have to map an EPFN into user space page tables (e.g., to
> simplify KVM), we could only allow fake swap entries such that "there
> is something" but it cannot be accessed and is flagged accordingly.
> - /proc/kcore and friends have to be careful as well and should not read
> this memory. So there has to be a way to flag these pages.
>
> 3) We need a way to express the GFN<->EPFN mapping and essentially assign an
> EPFN to a GFN.
>
>
> 4) Once we assigned a EPFN to a GFN, that assignment must not longer change.
> Further, an EPFN must not get assigned to multiple GFNs.
>
>
> 5) There has to be a way to "replace" encrypted parts by "shared" parts
> and the other way around.
>
> What else?

Thanks a lot for this summary. A question about the requirements: do we
plan to support device assignment to the protected VM?

If yes, the fd-based solution may need to change the VFIO interface as
well (though the fake swap entry solution would need to touch VFIO too),
because:

1> KVM uses VFIO when assigning devices to a VM.

2> Since we do not know which GPA ranges the VM may use as DMA buffers,
all guest pages have to be mapped in the host IOMMU page table to host
pages, which are pinned for the whole life cycle of the VM.

3> IOMMU mapping is done at VM creation time by VFIO and the IOMMU driver,
in vfio_dma_do_map().

4> However, vfio_dma_do_map() needs the HVA to perform a GUP to get the HPA
and pin the page.
But if we are using the fd-based solution, not every GPA has an HVA,
so the current VFIO interface to map and pin the GPA (IOVA) won't work.
And I doubt that VFIO can be modified to support this easily.

B.R.
Yu
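
To illustrate the VFIO concern above, here is a minimal userspace sketch of how a guest RAM range is handed to the type1 IOMMU backend today. The struct, flags, and ioctl come from <linux/vfio.h>; the helper names fill_dma_map() and map_guest_range() are hypothetical and only for illustration. The point is that the request carries a vaddr (the HVA) which the kernel uses to GUP and pin the pages, so a GPA that is backed only by an fd, with no HVA, has nothing to put in that field.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/*
 * Fill a type1 DMA map request (hypothetical helper).
 * The uAPI struct has no way to name memory other than by vaddr.
 */
void fill_dma_map(struct vfio_iommu_type1_dma_map *map,
                  void *hva, uint64_t iova, uint64_t size)
{
    assert(map != NULL);
    memset(map, 0, sizeof(*map));
    map->argsz = sizeof(*map);
    map->flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;
    map->vaddr = (uint64_t)(uintptr_t)hva; /* HVA: the kernel GUPs and pins from here */
    map->iova  = iova;                      /* GPA used as the device-visible IOVA */
    map->size  = size;
}

/*
 * Map one guest RAM range into the container's IOMMU domain
 * (hypothetical helper). Returns the ioctl() result: 0 on success,
 * -1 with errno set on failure.
 */
int map_guest_range(int container_fd, void *hva, uint64_t gpa, uint64_t size)
{
    struct vfio_iommu_type1_dma_map map;

    fill_dma_map(&map, hva, gpa, size);
    return ioctl(container_fd, VFIO_IOMMU_MAP_DMA, &map);
}
```

A VMM would typically call something like map_guest_range() once per RAM region with the mmap()ed address of that region. With fd-based private memory there may be no such address to pass, which is why the interface itself would probably need to change.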