From: "Andy Lutomirski" <luto@kernel.org>
To: "Quentin Perret" <qperret@google.com>
Cc: "Sean Christopherson" <seanjc@google.com>,
    "Steven Price" <steven.price@arm.com>,
    "Chao Peng" <chao.p.peng@linux.intel.com>,
    "kvm list" <kvm@vger.kernel.org>,
    "Linux Kernel Mailing List" <linux-kernel@vger.kernel.org>,
    linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
    "Linux API" <linux-api@vger.kernel.org>, qemu-devel@nongnu.org,
    "Paolo Bonzini" <pbonzini@redhat.com>,
    "Jonathan Corbet" <corbet@lwn.net>,
    "Vitaly Kuznetsov" <vkuznets@redhat.com>,
    "Wanpeng Li" <wanpengli@tencent.com>,
    "Jim Mattson" <jmattson@google.com>,
    "Joerg Roedel" <joro@8bytes.org>,
    "Thomas Gleixner" <tglx@linutronix.de>,
    "Ingo Molnar" <mingo@redhat.com>,
    "Borislav Petkov" <bp@alien8.de>,
    "the arch/x86 maintainers" <x86@kernel.org>,
    "H. Peter Anvin" <hpa@zytor.com>,
    "Hugh Dickins" <hughd@google.com>,
    "Jeff Layton" <jlayton@kernel.org>,
    "J . Bruce Fields" <bfields@fieldses.org>,
    "Andrew Morton" <akpm@linux-foundation.org>,
    "Mike Rapoport" <rppt@kernel.org>,
    "Maciej S . Szmigiero" <mail@maciej.szmigiero.name>,
    "Vlastimil Babka" <vbabka@suse.cz>,
    "Vishal Annapurve" <vannapurve@google.com>,
    "Yu Zhang" <yu.c.zhang@linux.intel.com>,
    "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
    "Nakajima, Jun" <jun.nakajima@intel.com>,
    "Dave Hansen" <dave.hansen@intel.com>,
    "Andi Kleen" <ak@linux.intel.com>,
    "David Hildenbrand" <david@redhat.com>,
    "Marc Zyngier" <maz@kernel.org>,
    "Will Deacon" <will@kernel.org>
Subject: Re: [PATCH v5 00/13] KVM: mm: fd-based approach for supporting KVM guest private memory
Date: Fri, 01 Apr 2022 12:56:50 -0700
Message-ID: <83fd55f8-cd42-4588-9bf6-199cbce70f33@www.fastmail.com>
In-Reply-To: <YkcTTY4YjQs5BRhE@google.com>

On Fri, Apr 1, 2022, at 7:59 AM, Quentin Perret wrote:
> On Thursday 31 Mar 2022 at 09:04:56 (-0700), Andy Lutomirski wrote:
>
> To answer your original question about memory 'conversion', the key
> thing is that the pKVM hypervisor controls the stage-2 page-tables for
> everyone in the system, all guests as well as the host. As such, a page
> 'conversion' is nothing more than a permission change in the relevant
> page-tables.
>

So I can see two different ways to approach this.

One is that you split the whole address space in half and, just like SEV and TDX, allocate one bit to indicate the shared/private status of a page. This makes it work a lot like SEV and TDX.

The other is to have shared and private pages be distinguished only by their hypercall history and the (protected) page tables. This saves some address space and some page table allocations, but it opens some cans of worms too. In particular, the guest and the hypervisor need to coordinate, in a way that the guest can trust, to ensure that the guest's idea of which pages are private matches the host's. This model seems a bit harder to support nicely with the private memory fd model, but not necessarily impossible.

Also, what are you trying to accomplish by having the host userspace mmap private pages? Is the idea that multiple guests could share the same page until such time as one of them tries to write to it? That would be kind of like having a third kind of memory that's visible to host and guests but is read-only for everyone. TDX and SEV can't support this at all (a private page belongs to one guest and one guest only, at least in SEV and in the current TDX SEAM spec). I imagine that this could be supported with private memory fds with some care without mmap, though -- the host could still populate the page with memcpy. Or I suppose a memslot could support using MAP_PRIVATE fds and have approximately the right semantics.

--Andy
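
For readers following the thread, the two models described in the message can be contrasted with a small user-space sketch. This is only an illustration under stated assumptions, not pKVM, SEV or TDX code: the 48-bit address width, the choice of the top bit as the shared bit, the 1024-page guest, and every name below (GPA_SHARED_BIT, hyp_share_page() and so on) are hypothetical and do not come from the patch series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 48-bit guest physical addresses, top bit marks "shared". */
#define GPA_BITS        48
#define GPA_SHARED_BIT  (UINT64_C(1) << (GPA_BITS - 1))
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_NR_PAGES   1024  /* tiny illustrative guest */

/*
 * Model 1: the address space is split in half. The shared/private
 * status is encoded in the address the guest uses, so both sides can
 * read it off the faulting address.
 */
static inline bool gpa_is_shared(uint64_t gpa)
{
    return (gpa & GPA_SHARED_BIT) != 0;
}

static inline uint64_t gpa_make_shared(uint64_t gpa)
{
    return gpa | GPA_SHARED_BIT;
}

static inline uint64_t gpa_strip_shared(uint64_t gpa)
{
    return gpa & ~GPA_SHARED_BIT;
}

/*
 * Model 2: no address bit. The status exists only as state that
 * share/unshare hypercalls have left behind (here a flat array
 * standing in for the protected page tables). Pages start private.
 */
enum sketch_page_state { SKETCH_PRIVATE = 0, SKETCH_SHARED = 1 };

static enum sketch_page_state sketch_states[SKETCH_NR_PAGES];

static inline uint64_t sketch_page_index(uint64_t gpa)
{
    uint64_t idx = gpa >> SKETCH_PAGE_SHIFT;

    assert(idx < SKETCH_NR_PAGES);
    return idx;
}

/* Hypothetical "share" hypercall handler. */
static inline void hyp_share_page(uint64_t gpa)
{
    sketch_states[sketch_page_index(gpa)] = SKETCH_SHARED;
}

/* Hypothetical "unshare" hypercall handler. */
static inline void hyp_unshare_page(uint64_t gpa)
{
    sketch_states[sketch_page_index(gpa)] = SKETCH_PRIVATE;
}

static inline bool page_is_shared(uint64_t gpa)
{
    return sketch_states[sketch_page_index(gpa)] == SKETCH_SHARED;
}
```

In the first model the status costs one address bit but needs no lookup. In the second, the guest has to keep its own record of the hypercalls it has issued, and that record has to agree with the table above, which is the coordination problem the message points out.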