Date: Mon, 31 Oct 2022 22:14:26 +0800
From: Chao Peng
To: Xiaoyao Li
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
 linux-fsdevel@vger.kernel.org, linux-arch@vger.kernel.org,
 linux-api@vger.kernel.org, linux-doc@vger.kernel.org, qemu-devel@nongnu.org,
 Paolo Bonzini, Jonathan Corbet, Sean Christopherson, Vitaly Kuznetsov,
 Wanpeng Li, Jim Mattson, Joerg Roedel, Thomas Gleixner, Ingo Molnar,
 Borislav Petkov, x86@kernel.org, "H . Peter Anvin", Hugh Dickins,
 Jeff Layton, "J . Bruce Fields", Andrew Morton, Shuah Khan, Mike Rapoport,
 Steven Price, "Maciej S . Szmigiero", Vlastimil Babka, Vishal Annapurve,
 Yu Zhang, "Kirill A . Shutemov", luto@kernel.org, jun.nakajima@intel.com,
 dave.hansen@intel.com, ak@linux.intel.com, david@redhat.com,
 aarcange@redhat.com, ddutile@redhat.com, dhildenb@redhat.com,
 Quentin Perret, tabba@google.com, Michael Roth, mhocko@suse.com,
 Muchun Song, wei.w.wang@intel.com
Subject: Re: [PATCH v9 2/8] KVM: Extend the memslot to support fd-based private memory
Message-ID: <20221031141426.GA3994099@chaop.bj.intel.com>
Reply-To: Chao Peng
References: <20221025151344.3784230-1-chao.p.peng@linux.intel.com>
 <20221025151344.3784230-3-chao.p.peng@linux.intel.com>

On Fri, Oct 28, 2022 at 03:04:27PM +0800, Xiaoyao Li wrote:
> On 10/25/2022 11:13 PM, Chao Peng wrote:
> > In memory encryption usage, guest memory may be encrypted with special
> > key and can be accessed only by the guest itself. We call such memory
> > private memory. It's valueless and sometimes can cause problem to allow
> > userspace to access guest private memory. This new KVM memslot extension
> > allows guest private memory being provided though a restrictedmem
>                                                ^
>
> typo

Thanks!

>
> > backed file descriptor(fd) and userspace is restricted to access the
> > bookmarked memory in the fd.
> >
> > This new extension, indicated by the new flag KVM_MEM_PRIVATE, adds two
> > additional KVM memslot fields restricted_fd/restricted_offset to allow
> > userspace to instruct KVM to provide guest memory through restricted_fd.
> > 'guest_phys_addr' is mapped at the restricted_offset of restricted_fd
> > and the size is 'memory_size'.
> >
> > The extended memslot can still have the userspace_addr(hva). When use, a
> > single memslot can maintain both private memory through restricted_fd
> > and shared memory through userspace_addr. Whether the private or shared
> > part is visible to guest is maintained by other KVM code.
> >
> > A restrictedmem_notifier field is also added to the memslot structure to
> > allow the restricted_fd's backing store to notify KVM the memory change,
> > KVM then can invalidate its page table entries.
> >
> > Together with the change, a new config HAVE_KVM_RESTRICTED_MEM is added
> > and right now it is selected on X86_64 only. A KVM_CAP_PRIVATE_MEM is
> > also introduced to indicate KVM support for KVM_MEM_PRIVATE.
> >
> > To make code maintenance easy, internally we use a binary compatible
> > alias struct kvm_user_mem_region to handle both the normal and the
> > '_ext' variants.
> >
> > Co-developed-by: Yu Zhang
> > Signed-off-by: Yu Zhang
> > Signed-off-by: Chao Peng
> > ---
> >  Documentation/virt/kvm/api.rst | 48 ++++++++++++++++++++++++++++-----
> >  arch/x86/kvm/Kconfig           |  2 ++
> >  arch/x86/kvm/x86.c             |  2 +-
> >  include/linux/kvm_host.h       | 13 +++++++--
> >  include/uapi/linux/kvm.h       | 29 ++++++++++++++++++++
> >  virt/kvm/Kconfig               |  3 +++
> >  virt/kvm/kvm_main.c            | 49 ++++++++++++++++++++++++++++------
> >  7 files changed, 128 insertions(+), 18 deletions(-)
> >
> > diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> > index eee9f857a986..f3fa75649a78 100644
> > --- a/Documentation/virt/kvm/api.rst
> > +++ b/Documentation/virt/kvm/api.rst
> > @@ -1319,7 +1319,7 @@ yet and must be cleared on entry.
> >  :Capability: KVM_CAP_USER_MEMORY
> >  :Architectures: all
> >  :Type: vm ioctl
> > -:Parameters: struct kvm_userspace_memory_region (in)
> > +:Parameters: struct kvm_userspace_memory_region(_ext) (in)
> >  :Returns: 0 on success, -1 on error
> >  ::
> > @@ -1332,9 +1332,18 @@ yet and must be cleared on entry.
> >          __u64 userspace_addr; /* start of the userspace allocated memory */
> >    };
> > +  struct kvm_userspace_memory_region_ext {
> > +        struct kvm_userspace_memory_region region;
> > +        __u64 restricted_offset;
> > +        __u32 restricted_fd;
> > +        __u32 pad1;
> > +        __u64 pad2[14];
> > +  };
> > +
> >    /* for kvm_memory_region::flags */
> >    #define KVM_MEM_LOG_DIRTY_PAGES (1UL << 0)
> >    #define KVM_MEM_READONLY (1UL << 1)
> > +  #define KVM_MEM_PRIVATE (1UL << 2)
> >  This ioctl allows the user to create, modify or delete a guest physical
> >  memory slot. Bits 0-15 of "slot" specify the slot id and this value
> > @@ -1365,12 +1374,27 @@ It is recommended that the lower 21 bits of guest_phys_addr and userspace_addr
> >  be identical. This allows large pages in the guest to be backed by large
> >  pages in the host.
> > -The flags field supports two flags: KVM_MEM_LOG_DIRTY_PAGES and
> > -KVM_MEM_READONLY. The former can be set to instruct KVM to keep track of
> > -writes to memory within the slot. See KVM_GET_DIRTY_LOG ioctl to know how to
> > -use it. The latter can be set, if KVM_CAP_READONLY_MEM capability allows it,
> > -to make a new slot read-only. In this case, writes to this memory will be
> > -posted to userspace as KVM_EXIT_MMIO exits.
> > +kvm_userspace_memory_region_ext struct includes all fields of
> > +kvm_userspace_memory_region struct, while also adds additional fields for some
> > +other features. See below description of flags field for more information.
> > +It's recommended to use kvm_userspace_memory_region_ext in new userspace code.
> > +
> > +The flags field supports following flags:
> > +
> > +- KVM_MEM_LOG_DIRTY_PAGES to instruct KVM to keep track of writes to memory
> > +  within the slot. For more details, see KVM_GET_DIRTY_LOG ioctl.
> > +
> > +- KVM_MEM_READONLY, if KVM_CAP_READONLY_MEM allows, to make a new slot
> > +  read-only. In this case, writes to this memory will be posted to userspace as
> > +  KVM_EXIT_MMIO exits.
> > +
> > +- KVM_MEM_PRIVATE, if KVM_CAP_PRIVATE_MEM allows, to indicate a new slot has
> > +  private memory backed by a file descriptor(fd) and userspace access to the
> > +  fd may be restricted. Userspace should use restricted_fd/restricted_offset in
> > +  kvm_userspace_memory_region_ext to instruct KVM to provide private memory
> > +  to guest. Userspace should guarantee not to map the same pfn indicated by
> > +  restricted_fd/restricted_offset to different gfns with multiple memslots.
> > +  Failed to do this may result undefined behavior.
> >  When the KVM_CAP_SYNC_MMU capability is available, changes in the backing of
> >  the memory region are automatically reflected into the guest. For example, an
> > @@ -8215,6 +8239,16 @@ structure.
> >  When getting the Modified Change Topology Report value, the attr->addr
> >  must point to a byte where the value will be stored or retrieved from.
> > +8.36 KVM_CAP_PRIVATE_MEM
> > +------------------------
> > +
> > +:Architectures: x86
> > +
> > +This capability indicates that private memory is supported and userspace can
> > +set KVM_MEM_PRIVATE flag for KVM_SET_USER_MEMORY_REGION ioctl. See
> > +KVM_SET_USER_MEMORY_REGION for details on the usage of KVM_MEM_PRIVATE and
> > +kvm_userspace_memory_region_ext fields.
> > +
> >  9. Known KVM API problems
> >  =========================
> > diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
> > index 67be7f217e37..8d2bd455c0cd 100644
> > --- a/arch/x86/kvm/Kconfig
> > +++ b/arch/x86/kvm/Kconfig
> > @@ -49,6 +49,8 @@ config KVM
> >          select SRCU
> >          select INTERVAL_TREE
> >          select HAVE_KVM_PM_NOTIFIER if PM
> > +        select HAVE_KVM_RESTRICTED_MEM if X86_64
> > +        select RESTRICTEDMEM if HAVE_KVM_RESTRICTED_MEM
> >          help
> >            Support hosting fully virtualized guest machines using hardware
> >            virtualization extensions. You will need a fairly recent
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 4bd5f8a751de..02ad31f46dd7 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -12425,7 +12425,7 @@ void __user * __x86_set_memory_region(struct kvm *kvm, int id, gpa_t gpa,
> >          }
> >          for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
> > -                struct kvm_userspace_memory_region m;
> > +                struct kvm_user_mem_region m;
> >                  m.slot = id | (i << 16);
> >                  m.flags = 0;
> > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > index 32f259fa5801..739a7562a1f3 100644
> > --- a/include/linux/kvm_host.h
> > +++ b/include/linux/kvm_host.h
> > @@ -44,6 +44,7 @@
> >  #include
> >  #include
> > +#include
> >  #ifndef KVM_MAX_VCPU_IDS
> >  #define KVM_MAX_VCPU_IDS KVM_MAX_VCPUS
> > @@ -575,8 +576,16 @@ struct kvm_memory_slot {
> >          u32 flags;
> >          short id;
> >          u16 as_id;
> > +        struct file *restricted_file;
> > +        loff_t restricted_offset;
> > +        struct restrictedmem_notifier notifier;
> >  };
> > +static inline bool kvm_slot_can_be_private(const struct kvm_memory_slot *slot)
> > +{
> > +        return slot && (slot->flags & KVM_MEM_PRIVATE);
> > +}
> > +
>
> We can introduce this function in patch 6 when it's first used.

Good to me.

Chao
> >
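
For readers following the thread, here is a minimal userspace sketch of how
the proposed kvm_userspace_memory_region_ext might be filled in for a slot
that has both a shared (userspace_addr) and a private (restricted_fd) part,
as the quoted commit message describes. It rests on several assumptions: the
struct layouts and flag values are local copies taken from the quoted api.rst
hunk, not from any installed kernel header; the names struct region, struct
region_ext, make_private_region() and region_is_private() are hypothetical
and exist only for illustration; and no ioctl is issued, so this shows only
the argument layout, not KVM behaviour.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Flag values as listed in the quoted api.rst hunk. */
#define KVM_MEM_LOG_DIRTY_PAGES (1UL << 0)
#define KVM_MEM_READONLY        (1UL << 1)
#define KVM_MEM_PRIVATE         (1UL << 2)

/* Local mirror of struct kvm_userspace_memory_region (hypothetical name). */
struct region {
	uint32_t slot;
	uint32_t flags;
	uint64_t guest_phys_addr;
	uint64_t memory_size;     /* bytes */
	uint64_t userspace_addr;  /* start of the userspace allocated memory */
};

/* Local mirror of the proposed _ext variant from the quoted diff. */
struct region_ext {
	struct region region;
	uint64_t restricted_offset;
	uint32_t restricted_fd;
	uint32_t pad1;
	uint64_t pad2[14];
};

/* The _ext struct starts with the legacy struct and adds 128 bytes. */
static_assert(sizeof(struct region) == 32, "legacy layout is 32 bytes");
static_assert(sizeof(struct region_ext) == 160, "ext layout is 160 bytes");

/*
 * Build an argument describing a slot whose private part lives at
 * 'offset' inside 'fd' and whose shared part lives at 'hva'.
 * Padding is zeroed so unused fields carry no stale data.
 */
static struct region_ext make_private_region(uint32_t slot, uint64_t gpa,
					     uint64_t size, uint64_t hva,
					     uint32_t fd, uint64_t offset)
{
	struct region_ext r;

	memset(&r, 0, sizeof(r));
	r.region.slot = slot;
	r.region.flags = KVM_MEM_PRIVATE;
	r.region.guest_phys_addr = gpa;
	r.region.memory_size = size;
	r.region.userspace_addr = hva;
	r.restricted_fd = fd;
	r.restricted_offset = offset;
	return r;
}

/* Userspace analogue of the kvm_slot_can_be_private() check in the diff. */
static bool region_is_private(const struct region_ext *r)
{
	return r && (r->region.flags & KVM_MEM_PRIVATE);
}
```

A caller would pass the result of make_private_region() to
KVM_SET_USER_MEMORY_REGION only after confirming KVM_CAP_PRIVATE_MEM, and,
per the quoted documentation, must not map the same restricted_fd/offset
range to different gfns through multiple memslots.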