From: "Nikunj A. Dadhania" <nikunj@amd.com>
To: "Gupta, Pankaj" <pankaj.gupta@amd.com>,
Chao Peng <chao.p.peng@linux.intel.com>,
Sean Christopherson <seanjc@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
Jonathan Corbet <corbet@lwn.net>,
Vitaly Kuznetsov <vkuznets@redhat.com>,
Wanpeng Li <wanpengli@tencent.com>,
Jim Mattson <jmattson@google.com>, Joerg Roedel <joro@8bytes.org>,
Thomas Gleixner <tglx@linutronix.de>,
Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
x86@kernel.org, "H . Peter Anvin" <hpa@zytor.com>,
Hugh Dickins <hughd@google.com>, Jeff Layton <jlayton@kernel.org>,
"J . Bruce Fields" <bfields@fieldses.org>,
Andrew Morton <akpm@linux-foundation.org>,
Shuah Khan <shuah@kernel.org>, Mike Rapoport <rppt@kernel.org>,
Steven Price <steven.price@arm.com>,
"Maciej S . Szmigiero" <mail@maciej.szmigiero.name>,
Vlastimil Babka <vbabka@suse.cz>,
Vishal Annapurve <vannapurve@google.com>,
Yu Zhang <yu.c.zhang@linux.intel.com>,
"Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>,
luto@kernel.org, jun.nakajima@intel.com, dave.hansen@intel.com,
ak@linux.intel.com, david@redhat.com, aarcange@redhat.com,
ddutile@redhat.com, dhildenb@redhat.com,
Quentin Perret <qperret@google.com>,
Michael Roth <michael.roth@amd.com>,
mhocko@suse.com, Muchun Song <songmuchun@bytedance.com>,
bharata@amd.com, kvm@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
linux-kselftest@vger.kernel.org, linux-api@vger.kernel.org,
linux-doc@vger.kernel.org, qemu-devel@nongnu.org,
linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH v7 00/14] KVM: mm: fd-based approach for supporting KVM guest private memory
Date: Thu, 11 Aug 2022 22:48:28 +0530 [thread overview]
Message-ID: <9dc91ce8-4cb6-37e6-4c25-27a72dc11dd0@amd.com> (raw)
In-Reply-To: <9e86daea-5619-a216-fe02-0562cf14c501@amd.com>
On 11/08/22 17:00, Gupta, Pankaj wrote:
>
>>> This is the v7 of this series which tries to implement the fd-based KVM
>>> guest private memory. The patches are based on latest kvm/queue branch
>>> commit:
>>>
>>> b9b71f43683a (kvm/queue) KVM: x86/mmu: Buffer nested MMU
>>> split_desc_cache only by default capacity
>>>
>>> Introduction
>>> ------------
>>> In general this patch series introduce fd-based memslot which provides
>>> guest memory through memory file descriptor fd[offset,size] instead of
>>> hva/size. The fd can be created from a supported memory filesystem
>>> like tmpfs/hugetlbfs etc. which we refer as memory backing store. KVM
>>> and the the memory backing store exchange callbacks when such memslot
>>> gets created. At runtime KVM will call into callbacks provided by the
>>> backing store to get the pfn with the fd+offset. Memory backing store
>>> will also call into KVM callbacks when userspace punch hole on the fd
>>> to notify KVM to unmap secondary MMU page table entries.
>>>
>>> Comparing to existing hva-based memslot, this new type of memslot allows
>>> guest memory unmapped from host userspace like QEMU and even the kernel
>>> itself, therefore reduce attack surface and prevent bugs.
>>>
>>> Based on this fd-based memslot, we can build guest private memory that
>>> is going to be used in confidential computing environments such as Intel
>>> TDX and AMD SEV. When supported, the memory backing store can provide
>>> more enforcement on the fd and KVM can use a single memslot to hold both
>>> the private and shared part of the guest memory.
>>>
>>> mm extension
>>> ---------------------
>>> Introduces new MFD_INACCESSIBLE flag for memfd_create(), the file
>>> created with these flags cannot read(), write() or mmap() etc via normal
>>> MMU operations. The file content can only be used with the newly
>>> introduced memfile_notifier extension.
>>>
>>> The memfile_notifier extension provides two sets of callbacks for KVM to
>>> interact with the memory backing store:
>>> - memfile_notifier_ops: callbacks for memory backing store to notify
>>> KVM when memory gets invalidated.
>>> - backing store callbacks: callbacks for KVM to call into memory
>>> backing store to request memory pages for guest private memory.
>>>
>>> The memfile_notifier extension also provides APIs for memory backing
>>> store to register/unregister itself and to trigger the notifier when the
>>> bookmarked memory gets invalidated.
>>>
>>> The patchset also introduces a new memfd seal F_SEAL_AUTO_ALLOCATE to
>>> prevent double allocation caused by unintentional guest when we only
>>> have a single side of the shared/private memfds effective.
>>>
>>> memslot extension
>>> -----------------
>>> Add the private fd and the fd offset to existing 'shared' memslot so
>>> that both private/shared guest memory can live in one single memslot.
>>> A page in the memslot is either private or shared. Whether a guest page
>>> is private or shared is maintained through reusing existing SEV ioctls
>>> KVM_MEMORY_ENCRYPT_{UN,}REG_REGION.
>>>
>>> Test
>>> ----
>>> To test the new functionalities of this patch TDX patchset is needed.
>>> Since TDX patchset has not been merged so I did two kinds of test:
>>>
>>> - Regresion test on kvm/queue (this patchset)
>>> Most new code are not covered. Code also in below repo:
>>> https://github.com/chao-p/linux/tree/privmem-v7
>>>
>>> - New Funational test on latest TDX code
>>> The patch is rebased to latest TDX code and tested the new
>>> funcationalities. See below repos:
>>> Linux: https://github.com/chao-p/linux/tree/privmem-v7-tdx
>>> QEMU: https://github.com/chao-p/qemu/tree/privmem-v7
>>
>> While debugging an issue with SEV+UPM, found that fallocate() returns
>> an error in QEMU which is not handled (EINTR). With the below handling
>> of EINTR subsequent fallocate() succeeds:
>>
>>
>> diff --git a/backends/hostmem-memfd-private.c b/backends/hostmem-memfd-private.c
>> index af8fb0c957..e8597ed28d 100644
>> --- a/backends/hostmem-memfd-private.c
>> +++ b/backends/hostmem-memfd-private.c
>> @@ -39,7 +39,7 @@ priv_memfd_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
>> MachineState *machine = MACHINE(qdev_get_machine());
>> uint32_t ram_flags;
>> char *name;
>> - int fd, priv_fd;
>> + int fd, priv_fd, ret;
>> if (!backend->size) {
>> error_setg(errp, "can't create backend with size 0");
>> @@ -65,7 +65,15 @@ priv_memfd_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
>> backend->size, ram_flags, fd, 0, errp);
>> g_free(name);
>> - fallocate(priv_fd, 0, 0, backend->size);
>> +again:
>> + ret = fallocate(priv_fd, 0, 0, backend->size);
>> + if (ret) {
>> + perror("Fallocate failed: \n");
>> + if (errno == EINTR)
>> + goto again;
>> + else
>> + exit(1);
>> + }
>>
>> However, fallocate() preallocates full guest memory before starting the guest.
>> With this behaviour guest memory is *not* demand pinned. Is there a way to
>> prevent fallocate() from reserving full guest memory?
>
> Isn't the pinning being handled by the corresponding host memory backend with mmu > notifier and architecture support while doing the memory operations e.g page> migration and swapping/reclaim (not supported currently AFAIU). But yes, we need> to allocate entire guest memory with the new flags MEMFILE_F_{UNMOVABLE, UNRECLAIMABLE etc}.
That is correct, but the question is when does the memory allocated, as these flags are set,
memory is neither moved nor reclaimed. In current scenario, if I start a 32GB guest, all 32GB is
allocated.
Regards
Nikunj
next prev parent reply other threads:[~2022-08-11 17:21 UTC|newest]
Thread overview: 172+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-07-06 8:20 [PATCH v7 00/14] KVM: mm: fd-based approach for supporting KVM guest private memory Chao Peng
2022-07-06 8:20 ` [PATCH v7 01/14] mm: Add F_SEAL_AUTO_ALLOCATE seal to memfd Chao Peng
2022-07-21 9:44 ` David Hildenbrand
2022-07-21 9:50 ` David Hildenbrand
2022-07-21 15:05 ` Sean Christopherson
2022-07-25 13:46 ` Chao Peng
2022-07-21 10:27 ` Gupta, Pankaj
2022-07-25 13:54 ` Chao Peng
2022-07-25 14:49 ` Gupta, Pankaj
2022-07-25 13:42 ` Chao Peng
2022-08-05 17:55 ` Paolo Bonzini
2022-08-05 18:06 ` David Hildenbrand
2022-08-10 9:40 ` Chao Peng
2022-08-10 9:38 ` Chao Peng
2022-08-17 23:41 ` Kirill A. Shutemov
2022-08-18 9:09 ` Paolo Bonzini
2022-08-23 7:36 ` David Hildenbrand
2022-08-24 10:20 ` Chao Peng
2022-08-26 15:19 ` Fuad Tabba
2022-08-29 15:18 ` Chao Peng
2022-07-06 8:20 ` [PATCH v7 02/14] selftests/memfd: Add tests for F_SEAL_AUTO_ALLOCATE Chao Peng
2022-08-05 13:11 ` David Hildenbrand
2022-07-06 8:20 ` [PATCH v7 03/14] mm: Introduce memfile_notifier Chao Peng
2022-08-05 13:22 ` David Hildenbrand
2022-08-10 9:22 ` Chao Peng
2022-08-10 10:05 ` David Hildenbrand
2022-08-10 14:38 ` Sean Christopherson
2022-08-11 12:27 ` Quentin Perret
2022-08-11 13:39 ` Chao Peng
2022-07-06 8:20 ` [PATCH v7 04/14] mm/shmem: Support memfile_notifier Chao Peng
2022-07-12 18:02 ` Gupta, Pankaj
2022-07-13 7:44 ` Chao Peng
2022-07-13 10:01 ` Gupta, Pankaj
2022-07-13 23:49 ` Chao Peng
2022-07-14 4:15 ` Gupta, Pankaj
2022-08-05 13:26 ` David Hildenbrand
2022-08-10 9:25 ` Chao Peng
2022-07-06 8:20 ` [PATCH v7 05/14] mm/memfd: Introduce MFD_INACCESSIBLE flag Chao Peng
2022-08-05 13:28 ` David Hildenbrand
2022-08-10 9:37 ` Chao Peng
2022-08-10 9:55 ` David Hildenbrand
2022-08-11 13:17 ` Chao Peng
2022-09-07 16:18 ` Kirill A. Shutemov
2022-07-06 8:20 ` [PATCH v7 06/14] KVM: Rename KVM_PRIVATE_MEM_SLOTS to KVM_INTERNAL_MEM_SLOTS Chao Peng
2022-07-06 8:20 ` [PATCH v7 07/14] KVM: Use gfn instead of hva for mmu_notifier_retry Chao Peng
2022-07-15 11:36 ` Gupta, Pankaj
2022-07-18 13:29 ` Chao Peng
2022-07-18 15:26 ` Sean Christopherson
2022-07-19 14:02 ` Chao Peng
2022-08-04 7:10 ` Isaku Yamahata
2022-08-10 8:19 ` Chao Peng
2022-07-06 8:20 ` [PATCH v7 08/14] KVM: Rename mmu_notifier_* Chao Peng
2022-07-29 19:02 ` Sean Christopherson
2022-08-03 10:13 ` Chao Peng
2022-08-05 19:54 ` Paolo Bonzini
2022-08-10 8:09 ` Chao Peng
2023-05-23 7:19 ` Kautuk Consul
2023-05-23 14:19 ` Sean Christopherson
2023-05-24 6:12 ` Kautuk Consul
2023-05-24 20:16 ` Sean Christopherson
2023-05-24 20:33 ` Peter Zijlstra
2023-05-24 21:39 ` Sean Christopherson
2023-05-25 8:54 ` Peter Zijlstra
2023-05-25 3:52 ` Kautuk Consul
2023-05-24 20:28 ` Peter Zijlstra
2022-07-06 8:20 ` [PATCH v7 09/14] KVM: Extend the memslot to support fd-based private memory Chao Peng
2022-07-29 19:51 ` Sean Christopherson
2022-08-03 10:08 ` Chao Peng
2022-08-03 14:42 ` Sean Christopherson
2022-07-06 8:20 ` [PATCH v7 10/14] KVM: Add KVM_EXIT_MEMORY_FAULT exit Chao Peng
2022-07-06 8:20 ` [PATCH v7 11/14] KVM: Register/unregister the guest private memory regions Chao Peng
2022-07-19 8:00 ` Gupta, Pankaj
2022-07-19 14:08 ` Chao Peng
2022-07-19 14:23 ` Gupta, Pankaj
2022-07-20 15:07 ` Chao Peng
2022-07-20 15:31 ` Gupta, Pankaj
2022-07-20 16:21 ` Sean Christopherson
2022-07-20 17:41 ` Gupta, Pankaj
2022-07-21 7:34 ` Wei Wang
2022-07-21 9:29 ` Chao Peng
2022-07-21 17:58 ` Sean Christopherson
2022-07-25 13:04 ` Chao Peng
2022-07-29 19:54 ` Sean Christopherson
2022-08-02 0:49 ` Sean Christopherson
2022-08-02 16:38 ` Sean Christopherson
2022-08-03 9:48 ` Chao Peng
2022-08-03 15:51 ` Sean Christopherson
2022-08-04 7:58 ` Chao Peng
2022-07-20 16:44 ` Sean Christopherson
2022-07-21 9:37 ` Chao Peng
2022-08-19 19:37 ` Vishal Annapurve
2022-08-24 10:37 ` Chao Peng
2022-08-26 15:19 ` Fuad Tabba
2022-08-29 15:21 ` Chao Peng
2022-07-06 8:20 ` [PATCH v7 12/14] KVM: Handle page fault for private memory Chao Peng
2022-07-29 20:58 ` Sean Christopherson
2022-08-03 9:52 ` Chao Peng
2022-07-06 8:20 ` [PATCH v7 13/14] KVM: Enable and expose KVM_MEM_PRIVATE Chao Peng
2022-07-19 9:55 ` Gupta, Pankaj
2022-07-19 14:12 ` Chao Peng
2022-07-06 8:20 ` [PATCH v7 14/14] memfd_create.2: Describe MFD_INACCESSIBLE flag Chao Peng
2022-08-01 14:40 ` Dave Hansen
2022-08-03 9:53 ` Chao Peng
2022-07-13 3:58 ` [PATCH v7 00/14] KVM: mm: fd-based approach for supporting KVM guest private memory Gupta, Pankaj
2022-07-13 7:57 ` Chao Peng
2022-07-13 10:35 ` Gupta, Pankaj
2022-07-13 23:59 ` Chao Peng
2022-07-14 4:39 ` Gupta, Pankaj
2022-07-14 5:06 ` Gupta, Pankaj
2022-07-14 4:29 ` Andy Lutomirski
2022-07-14 5:13 ` Gupta, Pankaj
2022-08-11 10:02 ` Nikunj A. Dadhania
2022-08-11 11:30 ` Gupta, Pankaj
2022-08-11 13:32 ` Chao Peng
2022-08-11 17:28 ` Nikunj A. Dadhania
2022-08-12 3:22 ` Nikunj A. Dadhania
2022-08-11 17:18 ` Nikunj A. Dadhania [this message]
2022-08-11 23:02 ` Gupta, Pankaj
2022-08-12 6:02 ` Gupta, Pankaj
2022-08-12 7:18 ` Gupta, Pankaj
2022-08-12 8:48 ` Nikunj A. Dadhania
2022-08-12 9:33 ` Gupta, Pankaj
2022-08-15 13:04 ` Chao Peng
2022-08-16 4:28 ` Nikunj A. Dadhania
2022-08-16 11:33 ` Gupta, Pankaj
2022-08-16 12:24 ` Kirill A . Shutemov
2022-08-16 13:03 ` Gupta, Pankaj
2022-08-16 15:38 ` Sean Christopherson
2022-08-17 15:27 ` Michael Roth
2022-08-23 1:25 ` Isaku Yamahata
2022-08-23 17:41 ` Gupta, Pankaj
2022-08-18 5:40 ` Hugh Dickins
2022-08-18 13:24 ` Kirill A . Shutemov
2022-08-19 0:20 ` Sean Christopherson
2022-08-19 3:38 ` Hugh Dickins
2022-08-19 22:53 ` Sean Christopherson
2022-08-23 7:55 ` David Hildenbrand
2022-08-23 16:05 ` Sean Christopherson
2022-08-24 9:41 ` Chao Peng
2022-09-09 4:55 ` Andy Lutomirski
2022-08-19 3:00 ` Hugh Dickins
2022-08-20 0:27 ` Kirill A. Shutemov
2022-08-21 5:15 ` Hugh Dickins
2022-08-31 14:24 ` Kirill A . Shutemov
2022-09-02 10:27 ` Chao Peng
2022-09-02 12:30 ` Kirill A . Shutemov
2022-09-08 1:10 ` Kirill A. Shutemov
2022-09-13 9:44 ` Sean Christopherson
2022-09-13 13:28 ` Kirill A. Shutemov
2022-09-13 14:53 ` Sean Christopherson
2022-09-13 16:00 ` Kirill A. Shutemov
2022-09-13 16:12 ` Sean Christopherson
2022-09-09 4:48 ` Andy Lutomirski
2022-09-09 14:32 ` Kirill A . Shutemov
2022-09-09 19:11 ` Andy Lutomirski
2022-09-09 23:02 ` Kirill A . Shutemov
2022-08-21 10:27 ` Matthew Wilcox
2022-08-24 10:27 ` Chao Peng
2022-09-09 4:44 ` Andy Lutomirski
[not found] ` <diqzlej60z57.fsf@ackerleytng-cloudtop.c.googlers.com>
[not found] ` <20221202061347.1070246-2-chao.p.peng@linux.intel.com>
2023-04-13 15:25 ` Christian Brauner
2023-04-13 22:28 ` Sean Christopherson
2023-04-14 22:38 ` Ackerley Tng
2023-04-14 23:26 ` Sean Christopherson
2023-04-15 0:06 ` Sean Christopherson
2023-04-19 8:29 ` Christian Brauner
2023-04-20 0:49 ` Sean Christopherson
2023-04-20 8:35 ` Christian Brauner
2022-08-26 15:19 ` Fuad Tabba
2022-08-29 15:17 ` Chao Peng
2022-08-31 9:12 ` Fuad Tabba
2022-09-02 10:19 ` Chao Peng
2022-09-09 15:35 ` Michael Roth
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=9dc91ce8-4cb6-37e6-4c25-27a72dc11dd0@amd.com \
--to=nikunj@amd.com \
--cc=aarcange@redhat.com \
--cc=ak@linux.intel.com \
--cc=akpm@linux-foundation.org \
--cc=bfields@fieldses.org \
--cc=bharata@amd.com \
--cc=bp@alien8.de \
--cc=chao.p.peng@linux.intel.com \
--cc=corbet@lwn.net \
--cc=dave.hansen@intel.com \
--cc=david@redhat.com \
--cc=ddutile@redhat.com \
--cc=dhildenb@redhat.com \
--cc=hpa@zytor.com \
--cc=hughd@google.com \
--cc=jlayton@kernel.org \
--cc=jmattson@google.com \
--cc=joro@8bytes.org \
--cc=jun.nakajima@intel.com \
--cc=kirill.shutemov@linux.intel.com \
--cc=kvm@vger.kernel.org \
--cc=linux-api@vger.kernel.org \
--cc=linux-doc@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-kselftest@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=luto@kernel.org \
--cc=mail@maciej.szmigiero.name \
--cc=mhocko@suse.com \
--cc=michael.roth@amd.com \
--cc=mingo@redhat.com \
--cc=pankaj.gupta@amd.com \
--cc=pbonzini@redhat.com \
--cc=qemu-devel@nongnu.org \
--cc=qperret@google.com \
--cc=rppt@kernel.org \
--cc=seanjc@google.com \
--cc=shuah@kernel.org \
--cc=songmuchun@bytedance.com \
--cc=steven.price@arm.com \
--cc=tglx@linutronix.de \
--cc=vannapurve@google.com \
--cc=vbabka@suse.cz \
--cc=vkuznets@redhat.com \
--cc=wanpengli@tencent.com \
--cc=x86@kernel.org \
--cc=yu.c.zhang@linux.intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).