* [PATCH v11 00/35] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support
@ 2023-12-30 17:23 Michael Roth
From: Michael Roth @ 2023-12-30 17:23 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
	jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick, zhi.a.wang

This patchset is also available at:

  https://github.com/amdese/linux/commits/snp-host-v11

and is based on top of the following series:

  "[PATCH v1] Add AMD Secure Nested Paging (SEV-SNP) Initialization Support"
  https://lore.kernel.org/kvm/20231230161954.569267-1-michael.roth@amd.com/

which in turn is based on linux-next tag next-20231222

The host initialization patches have been split off into a separate series,
as noted above, to more easily shepherd them into tip/x86, while this series
now focuses on KVM support. Additionally, the gmem RFC[1] that this series
was previously based on is now included directly for better visibility.
Please see the RFC link for more context on why the gmem changes/hooks are
implemented in their current form.

[1] https://lore.kernel.org/kvm/20231016115028.996656-1-michael.roth@amd.com/


== OVERVIEW ==

This patchset implements SEV-SNP hypervisor support for Linux. It
relies on the SNP host initialization patches noted above, which were
included in this series up until v10 but are now posted separately for
inclusion into the tip/x86 tree, while this series is targeted at the
x86 KVM tree. Both of these patchsets are based on linux-next to
hopefully make it easier to coordinate and test against tip and
kvm-next.


== TESTING ==

For testing this via QEMU, use the following tree:

  https://github.com/amdese/qemu/commits/snp-v3-wip

SEV-SNP with gmem enabled:

  qemu-system-x86_64 -cpu EPYC-Milan-v2 \
    -object memory-backend-memfd,id=ram1,size=2G,share=true,prealloc=false,reserve=false \
    -object sev-snp-guest,id=sev0,cbitpos=51,reduced-phys-bits=1 \
    -machine q35,confidential-guest-support=sev0,memory-backend=ram1 \
    ...

KVM selftests for guest_memfd / KVM_GENERIC_PRIVATE_MEM:

  cd $kernel_src_dir
  make -C tools/testing/selftests TARGETS="kvm" EXTRA_CFLAGS="-DDEBUG -I<path to kernel headers>"
  sudo tools/testing/selftests/kvm/x86_64/private_mem_conversions_test


== BACKGROUND (SEV-SNP) ==

This part of the Secure Nested Paging (SEV-SNP) series focuses on the
changes required to add KVM support for SEV-SNP. The series builds upon
SEV-SNP Guest Support, now part of mainline, and a separate series that
implements basic host initialization requirements for SNP-enabled systems.

This series provides the basic building blocks to support booting SEV-SNP
VMs; it does not cover all of the security enhancements introduced by
SEV-SNP, such as interrupt protection.

The CCP driver is enhanced to provide new APIs that use the SEV-SNP
specific commands defined in the SEV-SNP firmware specification. The KVM
driver uses those APIs to create and manage SEV-SNP guests.

The GHCB specification version 2 introduces a new set of NAE
(Non-Automatic Exit) events that are used by SEV-SNP guests to communicate
with the hypervisor. This series provides support for handling the
following new NAE events:

- Register GHCB GPA
- Page State Change Request
- Hypervisor Feature Support
- Guest Message Request

When pages are marked as guest-owned in the RMP table, they are assigned
to a specific guest/ASID, as well as a specific GFN within the guest. Any
attempt to map such a page in the RMP table to a different guest/ASID, or
to a different GFN within a guest/ASID, will result in an RMP nested page
fault.
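
For reference, each RMP entry is a 16-byte structure covering one 4K page
of DRAM. A rough sketch of the layout (modeled on the struct used by the
host initialization series; see the APM for the authoritative definition):

  /* Illustrative layout of a 16-byte RMP entry; not a uAPI structure. */
  struct rmpentry {
          u64 assigned  :  1,  /* page is assigned to a guest/firmware */
              pagesize  :  1,  /* 0 = 4K, 1 = 2M */
              immutable :  1,  /* firmware-owned, HV cannot modify */
              rsvd1     :  9,
              gpa       : 39,  /* guest physical page it is assigned to */
              asid      : 10,  /* ASID of the owning guest */
              vmsa      :  1,  /* page is a VMSA page */
              validated :  1,  /* set by the guest via PVALIDATE */
              rsvd2     :  1;
          u64 rsvd3;
  } __packed;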

Prior to accessing a guest-owned page, the guest must validate it with the
special PVALIDATE instruction, which sets the Validated bit in the page's
RMP entry. This is the only way to set the Validated bit outside of the
initial pre-encrypted guest payload/image; any attempt outside the guest to
modify the RMP entry from that point forward will result in the Validated
bit being cleared, at which point the guest will trigger an exception if it
attempts to access that page so it can be made aware of possible tampering.
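
For the curious, the guest-side PVALIDATE wrapper looks roughly like the
following (a sketch modeled on the pvalidate() helper in the SEV guest
support code; the raw opcode is emitted since older binutils lack the
mnemonic):

  static int pvalidate(unsigned long vaddr, bool rmp_psize, bool validate)
  {
          bool no_rmpupdate;
          int rc;

          /* rmp_psize: 0 = 4K, 1 = 2M; validate sets/clears Validated */
          asm volatile(".byte 0xF2, 0x0F, 0x01, 0xFF\n\t"
                       "setc %[no_rmpupdate]"
                       : [no_rmpupdate] "=rm" (no_rmpupdate), "=a" (rc)
                       : "a" (vaddr), "c" (rmp_psize), "d" (validate)
                       : "memory", "cc");

          /* CF set: RMP entry was already in the requested state */
          if (no_rmpupdate)
                  return 255; /* PVALIDATE_FAIL_NOUPDATE */

          return rc;
  }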

One exception to this is the initial guest payload, which is pre-validated
by the firmware prior to launching. The guest can use Guest Message requests 
to fetch an attestation report which will include the measurement of the
initial image so that the guest can verify it was booted with the expected
image/environment.

After boot, guests can use Page State Change requests to switch pages
between shared/hypervisor-owned and private/guest-owned to share data for
things like DMA, virtio buffers, and other GHCB requests.
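
For reference, the MSR-protocol form of a Page State Change request packs
the operation and GFN into GHCBData along these lines (a sketch based on
the definitions in arch/x86/include/asm/sev-common.h):

  #define GHCB_MSR_PSC_REQ        0x014   /* GHCBData[11:0] */
  #define SNP_PAGE_STATE_PRIVATE  1
  #define SNP_PAGE_STATE_SHARED   2

  /* GHCBData[55:52] = operation, GHCBData[51:12] = GFN */
  static inline u64 ghcb_msr_psc_req_gfn(u64 gfn, u64 op)
  {
          return ((op & 0xf) << 52) |
                 ((gfn & GENMASK_ULL(39, 0)) << 12) |
                 GHCB_MSR_PSC_REQ;
  }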

In this implementation of SEV-SNP, private guest memory is managed by a
new kernel framework called guest_memfd (gmem). With gmem, a new
KVM_SET_MEMORY_ATTRIBUTES KVM ioctl has been added to tell the KVM MMU
whether a particular GFN should be backed by shared (normal) memory or
private (gmem-allocated) memory. To tie into this, Page State Change
requests are forwarded to userspace via KVM_EXIT_VMGEXIT exits, and
userspace will then issue the corresponding KVM_SET_MEMORY_ATTRIBUTES
call to set the private/shared state in the KVM MMU.
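
From the VMM side, the response to such an exit boils down to something
like the following minimal sketch, where vm_fd/gpa/size are assumed to
come from the VMM's exit handling:

  /* Flip a GPA range to private in response to a forwarded Page State
   * Change request; use .attributes = 0 for a private->shared flip.
   */
  struct kvm_memory_attributes attrs = {
          .address    = gpa,
          .size       = size,  /* must be page-aligned */
          .attributes = KVM_MEMORY_ATTRIBUTE_PRIVATE,
  };

  if (ioctl(vm_fd, KVM_SET_MEMORY_ATTRIBUTES, &attrs) < 0)
          err(1, "KVM_SET_MEMORY_ATTRIBUTES");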

The gmem / KVM MMU hooks implemented in this series will then update the
RMP table entries for the backing PFNs to set them to guest-owned/private
when mapping private pages into the guest via the KVM MMU, or use the
normal KVM MMU handling in the case of shared pages, where the
corresponding RMP table entries are left in the default
shared/hypervisor-owned state.
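
As a rough sketch of the private-page path, the gmem prepare hook ends up
doing something along these lines, using the rmp_make_private() helper
from the host initialization series (the function name here is
illustrative; the real hook also handles 2M assignments and pages that
are already in the expected state):

  static int snp_make_page_private(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn)
  {
          struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;

          /* Assign the backing PFN to this guest/ASID at this GPA */
          return rmp_make_private(pfn, gfn_to_gpa(gfn), PG_LEVEL_4K,
                                  sev->asid, false);
  }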

Feedback/review is very much appreciated!

-Mike

Changes since v10:

 * Split off host initialization patches to separate series
 * Drop the SNP_{SET,GET}_EXT_CONFIG SEV ioctls and the
   KVM_SEV_SNP_{SET,GET}_CERTS KVM ioctls. Instead, all certificate data is
   now fetched from userspace as part of a new KVM_EXIT_VMGEXIT event type.
   (Sean, Dionna)
 * SNP_SET_EXT_CONFIG is replaced with a more basic SNP_SET_CONFIG, which
   is just a light wrapper around the SNP_CONFIG firmware command, and
   SNP_GET_EXT_CONFIG is now redundant with the existing
   SNP_PLATFORM_STATUS, so just stick with that interface
 * Introduce SNP_SET_CONFIG_{START,END}, which can be used to pause extended
   guest requests while reported TCB / certificates are being updated so
   the updates are done atomically relative to running guests.
 * Improve documentation for KVM_EXIT_VMGEXIT event types and tighten down
   the expected input/output for union types rather than exposing GHCB
   page/MSR
 * Various refactorings and commit/comment fixups (Boris, Liam, Vlastimil)
 * Make CONFIG_KVM_AMD_SEV depend on KVM_GENERIC_PRIVATE_MEM instead of
   CONFIG_KVM_SW_PROTECTED_VM (Paolo)
 * Include Sean's patch to add hugepage support to gmem, but modify it based
   on discussions to be best-effort and not rely on explicit flag

Changes since v9:

 * Split off gmem changes to separate RFC series, drop RFC tag from this series
 * Use 2M RMPUPDATE instructions whenever possible when invalidating/releasing
   gmem pages
 * Tighten up RMP #NPF handling to better differentiate spurious cases from
   unexpected behavior
 * Simplify/optimize logic for determining when 2M NPT private mappings are
   possible
 * Be more consistent with PFN data types and stub return values (Dave)
 * Reduce potential flooding from frequently-printed pr_debug()'s (Dave)
 * Use existing #PF handling paths to catch illegal userspace-generated RMP
   faults (Dave)
 * Improve host kexec/kdump support (Ashish)
 * Reduce overhead from unnecessary WBINVD via MMU notifiers (Ashish)
 * Avoid host crashes during CCP module probe if SNP_INIT* is issued while
   guests are running (Tom L.)
 * Simplify AutoIBRS disablement (Kim, Dave)
 * Avoid unnecessary zero'ing in extended guest requests (Alexey)
 * Fix padding in struct sev_user_data_ext_snp_config (Alexey)
 * Report AP creation failures via GHCB error codes rather than inducing #GP in
   guest (Peter)
 * Disallow multiple allocations of snp_context via userspace (Peter)
 * Error out on unsupported SNP policy bits (Tom)
 * Fix snp_leak_pages() stub (Jeremi)
 * Use C99 flexible arrays where appropriate
 * Use helper to handle HVA->PFN conversions prior to dumping RMP entries (Dave)
 * Don't potentially print out all 512 entries when dumping 2MB RMP range (Dave)
 * Don't use a union to dump raw RMP entries, just cast at dump-site (Dave)
 * Don't use helpers to access RMP entry bitfields, use them directly (Dave) 
 * Simplify logic and improve comments for AutoIBRS disablement (Dave)

 # Changes that were split off to separate gmem series
 * Use KVM_X86_SNP_VM to implement SNP-specific checks on whether a fault was
   shared/private and drop the duplicate memslot lookup (Isaku, Sean)
 * Use Isaku's version of patch to plumb 64-bit #NPF error code (Isaku)
 * Fix up stub for kvm_arch_gmem_invalidate() (Boris)

----------------------------------------------------------------
Ashish Kalra (1):
      KVM: SEV: Avoid WBINVD for HVA-based MMU notifications for SNP

Brijesh Singh (14):
      KVM: x86: Define RMP page fault error bits for #NPF
      KVM: SEV: Add GHCB handling for Hypervisor Feature Support requests
      KVM: SEV: Add initial SEV-SNP support
      KVM: SEV: Add KVM_SNP_INIT command
      KVM: SEV: Add KVM_SEV_SNP_LAUNCH_START command
      KVM: SEV: Add KVM_SEV_SNP_LAUNCH_UPDATE command
      KVM: SEV: Add KVM_SEV_SNP_LAUNCH_FINISH command
      KVM: SEV: Add support to handle GHCB GPA register VMGEXIT
      KVM: SEV: Add support to handle MSR based Page State Change VMGEXIT
      KVM: SEV: Add support to handle Page State Change VMGEXIT
      KVM: x86: Export the kvm_zap_gfn_range() for the SNP use
      KVM: SEV: Add support to handle RMP nested page faults
      KVM: SVM: Add module parameter to enable the SEV-SNP
      KVM: SEV: Provide support for SNP_GUEST_REQUEST NAE event

Michael Roth (15):
      mm: Introduce AS_INACCESSIBLE for encrypted/confidential memory
      KVM: Use AS_INACCESSIBLE when creating guest_memfd inode
      KVM: x86: Add gmem hook for initializing memory
      KVM: x86: Add gmem hook for invalidating memory
      KVM: x86/mmu: Pass around full 64-bit error code for KVM page faults
      KVM: x86: Add KVM_X86_SNP_VM vm_type
      KVM: x86: Determine shared/private faults based on vm_type
      KVM: SEV: Do not intercept accesses to MSR_IA32_XSS for SEV-ES guests
      KVM: SEV: Select KVM_GENERIC_PRIVATE_MEM when CONFIG_KVM_AMD_SEV=y
      KVM: SEV: Add support for GHCB-based termination requests
      KVM: SEV: Implement gmem hook for initializing private pages
      KVM: SEV: Implement gmem hook for invalidating private pages
      KVM: x86: Add gmem hook for determining max NPT mapping level
      crypto: ccp: Add the SNP_SET_CONFIG_{START,END} commands
      KVM: SEV: Provide support for SNP_EXTENDED_GUEST_REQUEST NAE event

Sean Christopherson (1):
      KVM: Add hugepage support for dedicated guest memory

Tom Lendacky (3):
      KVM: SEV: Add support to handle AP reset MSR protocol
      KVM: SEV: Use a VMSA physical address variable for populating VMCB
      KVM: SEV: Support SEV-SNP AP Creation NAE event

Vishal Annapurve (1):
      KVM: Add HVA range operator

 Documentation/virt/coco/sev-guest.rst              |   33 +-
 Documentation/virt/kvm/api.rst                     |   73 ++
 .../virt/kvm/x86/amd-memory-encryption.rst         |  103 ++
 arch/x86/include/asm/kvm-x86-ops.h                 |    3 +
 arch/x86/include/asm/kvm_host.h                    |   15 +
 arch/x86/include/asm/sev-common.h                  |   22 +-
 arch/x86/include/asm/sev.h                         |   11 +
 arch/x86/include/asm/svm.h                         |    6 +
 arch/x86/include/uapi/asm/kvm.h                    |    1 +
 arch/x86/kvm/Kconfig                               |    3 +
 arch/x86/kvm/mmu.h                                 |    2 -
 arch/x86/kvm/mmu/mmu.c                             |   28 +-
 arch/x86/kvm/mmu/mmu_internal.h                    |   24 +-
 arch/x86/kvm/mmu/mmutrace.h                        |    2 +-
 arch/x86/kvm/mmu/paging_tmpl.h                     |    2 +-
 arch/x86/kvm/svm/sev.c                             | 1362 +++++++++++++++++++-
 arch/x86/kvm/svm/svm.c                             |   38 +-
 arch/x86/kvm/svm/svm.h                             |   40 +-
 arch/x86/kvm/x86.c                                 |   44 +-
 arch/x86/virt/svm/sev.c                            |   51 +
 drivers/crypto/ccp/sev-dev.c                       |   44 +
 include/linux/kvm_host.h                           |   24 +
 include/linux/pagemap.h                            |    1 +
 include/uapi/linux/kvm.h                           |   84 ++
 include/uapi/linux/psp-sev.h                       |   12 +
 include/uapi/linux/sev-guest.h                     |    9 +
 mm/truncate.c                                      |    3 +-
 virt/kvm/Kconfig                                   |    8 +
 virt/kvm/guest_memfd.c                             |  132 +-
 virt/kvm/kvm_main.c                                |   49 +
 30 files changed, 2175 insertions(+), 54 deletions(-)




* [PATCH v11 01/35] KVM: Add hugepage support for dedicated guest memory
@ 2023-12-30 17:23 ` Michael Roth
From: Michael Roth @ 2023-12-30 17:23 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
	jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick, zhi.a.wang

From: Sean Christopherson <seanjc@google.com>

Extend guest_memfd to allow backing guest memory with hugepages. This
is done on a best-effort basis by default until a better-defined
mechanism is put in place that can provide better control/assurances to
userspace about hugepage allocations.

When reporting the max order when KVM gets a pfn from guest_memfd, force
order-0 pages if the hugepage is not fully contained by the memslot
binding, e.g. if userspace requested hugepages but punches a hole in the
memslot bindings in order to emulate x86's VGA hole.
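
As a quick worked example, assuming PMD_ORDER == 9 (2M, i.e. 512 pages):

  /* A 2M folio spanning indices [0, 512) is only mappable as a hugepage
   * if the slot binding covers all 512 pages. For a slot bound at
   * gmem.pgoff = 0 with npages = 0xa0 (modeling a VGA hole carve-out):
   *
   *   huge_index + 512 > slot->gmem.pgoff + slot->npages   (512 > 0xa0)
   *
   * so *max_order is forced to 0 for faults in that slot.
   */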

Link: https://lore.kernel.org/kvm/20231027182217.3615211-1-seanjc@google.com/T/#mccbd3e8bf9897f0ddbf864e6318d6f2f208b269c
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20231027182217.3615211-18-seanjc@google.com>
[Allow even with CONFIG_TRANSPARENT_HUGEPAGE; dropped momentarily due to
 uneasiness about the API. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[mdr: based on discussion in the Link regarding original patch, make the
      following set of changes:
      - For now, don't introduce an opt-in flag to enable hugepage
        support. By default, just make a best-effort for PMD_ORDER
        allocations so that there are no false assurances to userspace
        that they'll get hugepages. It's better at least than the
        current guarantee that they will get 4K pages every time. A more
        proper opt-in interface can then improve on things later.
      - Pass GFP_NOWARN to alloc_pages() so failures are not disruptive
        to normal operations
      - Drop size checks during creation time. Instead just avoid huge
        allocations if they extend beyond end of the memfd.
      - Drop hugepage-related unit tests since everything is now handled
        transparently to userspace anyway.
      - Update commit message accordingly.]
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 virt/kvm/guest_memfd.c | 63 +++++++++++++++++++++++++++++++++++++-----
 1 file changed, 56 insertions(+), 7 deletions(-)

diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 4aa23b01aa98..784690a664ac 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -13,14 +13,46 @@ struct kvm_gmem {
 	struct list_head entry;
 };
 
-static struct folio *kvm_gmem_get_folio(struct inode *inode, pgoff_t index)
+static struct folio *kvm_gmem_get_huge_folio(struct inode *inode, pgoff_t index,
+					     unsigned int order)
 {
+	pgoff_t npages = 1UL << order;
+	pgoff_t huge_index = round_down(index, npages);
+	struct address_space *mapping  = inode->i_mapping;
+	gfp_t gfp = mapping_gfp_mask(mapping) | __GFP_NOWARN;
+	loff_t size = i_size_read(inode);
 	struct folio *folio;
 
-	/* TODO: Support huge pages. */
-	folio = filemap_grab_folio(inode->i_mapping, index);
-	if (IS_ERR_OR_NULL(folio))
+	/* Make sure hugepages would be fully-contained by inode */
+	if ((huge_index + npages) * PAGE_SIZE > size)
+		return NULL;
+
+	if (filemap_range_has_page(mapping, (loff_t)huge_index << PAGE_SHIFT,
+				   (loff_t)(huge_index + npages - 1) << PAGE_SHIFT))
+		return NULL;
+
+	folio = filemap_alloc_folio(gfp, order);
+	if (!folio)
+		return NULL;
+
+	if (filemap_add_folio(mapping, folio, huge_index, gfp)) {
+		folio_put(folio);
 		return NULL;
+	}
+
+	return folio;
+}
+
+static struct folio *kvm_gmem_get_folio(struct inode *inode, pgoff_t index)
+{
+	struct folio *folio;
+
+	folio = kvm_gmem_get_huge_folio(inode, index, PMD_ORDER);
+	if (!folio) {
+		folio = filemap_grab_folio(inode->i_mapping, index);
+		if (IS_ERR_OR_NULL(folio))
+			return NULL;
+	}
 
 	/*
 	 * Use the up-to-date flag to track whether or not the memory has been
@@ -361,6 +393,7 @@ static int __kvm_gmem_create(struct kvm *kvm, loff_t size, u64 flags)
 	inode->i_mode |= S_IFREG;
 	inode->i_size = size;
 	mapping_set_gfp_mask(inode->i_mapping, GFP_HIGHUSER);
+	mapping_set_large_folios(inode->i_mapping);
 	mapping_set_unmovable(inode->i_mapping);
 	/* Unmovable mappings are supposed to be marked unevictable as well. */
 	WARN_ON_ONCE(!mapping_unevictable(inode->i_mapping));
@@ -486,7 +519,7 @@ void kvm_gmem_unbind(struct kvm_memory_slot *slot)
 int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
 		     gfn_t gfn, kvm_pfn_t *pfn, int *max_order)
 {
-	pgoff_t index = gfn - slot->base_gfn + slot->gmem.pgoff;
+	pgoff_t index, huge_index;
 	struct kvm_gmem *gmem;
 	struct folio *folio;
 	struct page *page;
@@ -499,6 +532,7 @@ int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
 
 	gmem = file->private_data;
 
+	index = gfn - slot->base_gfn + slot->gmem.pgoff;
 	if (WARN_ON_ONCE(xa_load(&gmem->bindings, index) != slot)) {
 		r = -EIO;
 		goto out_fput;
@@ -518,9 +552,24 @@ int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
 	page = folio_file_page(folio, index);
 
 	*pfn = page_to_pfn(page);
-	if (max_order)
-		*max_order = 0;
+	if (!max_order)
+		goto success;
+
+	*max_order = compound_order(compound_head(page));
+	if (!*max_order)
+		goto success;
 
+	/*
+	 * The folio can be mapped with a hugepage if and only if the folio is
+	 * fully contained by the range the memslot is bound to.  Note, the
+	 * caller is responsible for handling gfn alignment, this only deals
+	 * with the file binding.
+	 */
+	huge_index = ALIGN(index, 1ull << *max_order);
+	if (huge_index < ALIGN(slot->gmem.pgoff, 1ull << *max_order) ||
+	    huge_index + (1ull << *max_order) > slot->gmem.pgoff + slot->npages)
+		*max_order = 0;
+success:
 	r = 0;
 
 out_unlock:
-- 
2.25.1



* [PATCH v11 02/35] mm: Introduce AS_INACCESSIBLE for encrypted/confidential memory
@ 2023-12-30 17:23 ` Michael Roth
From: Michael Roth @ 2023-12-30 17:23 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
	jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick, zhi.a.wang, Matthew Wilcox

filemap users like guest_memfd may use page cache pages to
allocate/manage memory that is only intended to be accessed by guests
via hardware protections like encryption. Writes to memory of this sort
in common paths like truncation may cause unexpected behavior such as
writing garbage instead of zeros when attempting to zero pages, or
worse, triggering hardware protections that are considered fatal as far
as the kernel is concerned.

Introduce a new address_space flag, AS_INACCESSIBLE, and use this
initially to prevent zero'ing of pages during truncation, with the
understanding that it is up to the owner of the mapping to handle this
specially if needed.

Link: https://lore.kernel.org/lkml/ZR9LYhpxTaTk6PJX@google.com/
Cc: Matthew Wilcox <willy@infradead.org>
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 include/linux/pagemap.h | 1 +
 mm/truncate.c           | 3 ++-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index c2d90588c0bf..b56081a3512e 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -207,6 +207,7 @@ enum mapping_flags {
 	AS_STABLE_WRITES,	/* must wait for writeback before modifying
 				   folio contents */
 	AS_UNMOVABLE,		/* The mapping cannot be moved, ever */
+	AS_INACCESSIBLE,	/* Do not attempt direct R/W access to the mapping */
 };
 
 /**
diff --git a/mm/truncate.c b/mm/truncate.c
index 725b150e47ac..c501338c7ebd 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -233,7 +233,8 @@ bool truncate_inode_partial_folio(struct folio *folio, loff_t start, loff_t end)
 	 * doing a complex calculation here, and then doing the zeroing
 	 * anyway if the page split fails.
 	 */
-	folio_zero_range(folio, offset, length);
+	if (!test_bit(AS_INACCESSIBLE, &folio->mapping->flags))
+		folio_zero_range(folio, offset, length);
 
 	if (folio_has_private(folio))
 		folio_invalidate(folio, offset, length);
-- 
2.25.1



* [PATCH v11 03/35] KVM: Use AS_INACCESSIBLE when creating guest_memfd inode
@ 2023-12-30 17:23 ` Michael Roth
From: Michael Roth @ 2023-12-30 17:23 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
	jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick, zhi.a.wang

truncate_inode_pages_range() may attempt to zero pages before truncating
them, and this will occur before arch-specific invalidations can be
triggered via the .invalidate_folio/.free_folio hooks in kvm_gmem_aops. For
AMD SEV-SNP this would result in an RMP #PF being generated by the
hardware, which is currently treated as fatal (and even if specifically
allowed for, would not result in anything other than garbage being
written to guest pages due to encryption). On Intel TDX this would also
result in undesirable behavior.

Set the AS_INACCESSIBLE flag to prevent the MM from attempting
unexpected accesses of this sort during operations like truncation.

This may also in some cases yield a decent performance improvement for
guest_memfd userspace implementations that hole-punch ranges immediately
after private->shared conversions via KVM_SET_MEMORY_ATTRIBUTES, since
the current implementation of truncate_inode_pages_range() always ends
up zero'ing an entire 4K range if it is backed by a 2M folio.

Link: https://lore.kernel.org/lkml/ZR9LYhpxTaTk6PJX@google.com/
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 virt/kvm/guest_memfd.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 784690a664ac..4aaa82227978 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -390,6 +390,7 @@ static int __kvm_gmem_create(struct kvm *kvm, loff_t size, u64 flags)
 	inode->i_private = (void *)(unsigned long)flags;
 	inode->i_op = &kvm_gmem_iops;
 	inode->i_mapping->a_ops = &kvm_gmem_aops;
+	__set_bit(AS_INACCESSIBLE, &inode->i_mapping->flags);
 	inode->i_mode |= S_IFREG;
 	inode->i_size = size;
 	mapping_set_gfp_mask(inode->i_mapping, GFP_HIGHUSER);
-- 
2.25.1



* [PATCH v11 04/35] KVM: x86: Add gmem hook for initializing memory
@ 2023-12-30 17:23 ` Michael Roth
From: Michael Roth @ 2023-12-30 17:23 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
	jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick, zhi.a.wang

guest_memfd pages are generally expected to be in some arch-defined
initial state prior to being used for guest memory. For SEV-SNP, this
initial state is 'private', or 'guest-owned', and moving pages into that
state requires additional operations to update the corresponding entries
in the RMP table.

Allow for an arch-defined hook to handle updates of this sort, and go
ahead and implement one for x86 so KVM implementations like AMD SVM can
register a kvm_x86_ops callback to handle these updates for SEV-SNP
guests.

The preparation callback is always called when allocating/grabbing
folios via gmem, and it is up to the architecture to keep track of
whether or not the pages are already in the expected state (e.g. the RMP
table in the case of SEV-SNP).

In some cases, it is necessary to defer the preparation of the pages to
handle things like in-place encryption of initial guest memory payloads
before marking these pages as 'private'/'guest-owned', so also add a
helper that performs the same function as kvm_gmem_get_pfn(), but allows
for the preparation callback to be bypassed to allow for pages to be
accessed beforehand.
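
As a sketch of how a backend is expected to tie into this (the
SNP-specific implementation arrives later in the series; the callback
body here is a stub):

  /* svm.c (sketch): register the prepare hook so gmem can put newly
   * allocated folios into the arch-defined initial state.
   */
  static int sev_gmem_prepare(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn,
                              int max_order)
  {
          /* e.g. update RMP entries for pfn..pfn + (1 << max_order) */
          return 0;
  }

  /* in svm_x86_ops: .gmem_prepare = sev_gmem_prepare, */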

Link: https://lore.kernel.org/lkml/ZLqVdvsF11Ddo7Dq@google.com/
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 arch/x86/include/asm/kvm-x86-ops.h |  1 +
 arch/x86/include/asm/kvm_host.h    |  1 +
 arch/x86/kvm/x86.c                 |  6 ++++
 include/linux/kvm_host.h           | 14 ++++++++
 virt/kvm/Kconfig                   |  4 +++
 virt/kvm/guest_memfd.c             | 56 +++++++++++++++++++++++++++---
 6 files changed, 77 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index ab24ce207988..5e6b58439100 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -139,6 +139,7 @@ KVM_X86_OP(vcpu_deliver_sipi_vector)
 KVM_X86_OP_OPTIONAL_RET0(vcpu_get_apicv_inhibit_reasons);
 KVM_X86_OP_OPTIONAL(get_untagged_addr)
 KVM_X86_OP_OPTIONAL(alloc_apic_backing_page)
+KVM_X86_OP_OPTIONAL_RET0(gmem_prepare)
 
 #undef KVM_X86_OP
 #undef KVM_X86_OP_OPTIONAL
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 9b0f18d096ed..1fc14aa58913 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1795,6 +1795,7 @@ struct kvm_x86_ops {
 
 	gva_t (*get_untagged_addr)(struct kvm_vcpu *vcpu, gva_t gva, unsigned int flags);
 	void *(*alloc_apic_backing_page)(struct kvm_vcpu *vcpu);
+	int (*gmem_prepare)(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, int max_order);
 };
 
 struct kvm_x86_nested_ops {
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 27e23714e960..67d7c9e1331c 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -13512,6 +13512,12 @@ bool kvm_arch_no_poll(struct kvm_vcpu *vcpu)
 }
 EXPORT_SYMBOL_GPL(kvm_arch_no_poll);
 
+#ifdef CONFIG_HAVE_KVM_GMEM_PREPARE
+int kvm_arch_gmem_prepare(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn, int max_order)
+{
+	return static_call(kvm_x86_gmem_prepare)(kvm, pfn, gfn, max_order);
+}
+#endif
 
 int kvm_spec_ctrl_test_value(u64 value)
 {
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 7e7fd25b09b3..22feb4910854 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -2375,9 +2375,19 @@ static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
 #endif /* CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES */
 
 #ifdef CONFIG_KVM_PRIVATE_MEM
+int __kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
+		       gfn_t gfn, kvm_pfn_t *pfn, int *max_order, bool prep);
 int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
 		     gfn_t gfn, kvm_pfn_t *pfn, int *max_order);
 #else
+static inline int __kvm_gmem_get_pfn(struct kvm *kvm,
+				     struct kvm_memory_slot *slot, gfn_t gfn,
+				     kvm_pfn_t *pfn, int *max_order, bool prep)
+{
+	KVM_BUG_ON(1, kvm);
+	return -EIO;
+}
+
 static inline int kvm_gmem_get_pfn(struct kvm *kvm,
 				   struct kvm_memory_slot *slot, gfn_t gfn,
 				   kvm_pfn_t *pfn, int *max_order)
@@ -2387,4 +2397,8 @@ static inline int kvm_gmem_get_pfn(struct kvm *kvm,
 }
 #endif /* CONFIG_KVM_PRIVATE_MEM */
 
+#ifdef CONFIG_HAVE_KVM_GMEM_PREPARE
+int kvm_arch_gmem_prepare(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn, int max_order);
+#endif
+
 #endif
diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
index 6793211a0b64..f5d6256607d2 100644
--- a/virt/kvm/Kconfig
+++ b/virt/kvm/Kconfig
@@ -103,3 +103,7 @@ config KVM_GENERIC_PRIVATE_MEM
        select KVM_GENERIC_MEMORY_ATTRIBUTES
        select KVM_PRIVATE_MEM
        bool
+
+config HAVE_KVM_GMEM_PREPARE
+       bool
+       depends on KVM_PRIVATE_MEM
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 4aaa82227978..5e88e525cf75 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -43,7 +43,40 @@ static struct folio *kvm_gmem_get_huge_folio(struct inode *inode, pgoff_t index,
 	return folio;
 }
 
-static struct folio *kvm_gmem_get_folio(struct inode *inode, pgoff_t index)
+static int kvm_gmem_prepare_folio(struct inode *inode, pgoff_t index, struct folio *folio)
+{
+#ifdef CONFIG_HAVE_KVM_GMEM_PREPARE
+	struct list_head *gmem_list = &inode->i_mapping->i_private_list;
+	struct kvm_gmem *gmem;
+
+	list_for_each_entry(gmem, gmem_list, entry) {
+		struct kvm_memory_slot *slot;
+		struct kvm *kvm = gmem->kvm;
+		struct page *page;
+		kvm_pfn_t pfn;
+		gfn_t gfn;
+		int rc;
+
+		slot = xa_load(&gmem->bindings, index);
+		if (!slot)
+			continue;
+
+		page = folio_file_page(folio, index);
+		pfn = page_to_pfn(page);
+		gfn = slot->base_gfn + index - slot->gmem.pgoff;
+		rc = kvm_arch_gmem_prepare(kvm, gfn, pfn, compound_order(compound_head(page)));
+		if (rc) {
+			pr_warn_ratelimited("gmem: Failed to prepare folio for index %lx, error %d.\n",
+					    index, rc);
+			return rc;
+		}
+	}
+
+#endif
+	return 0;
+}
+
+static struct folio *kvm_gmem_get_folio(struct inode *inode, pgoff_t index, bool prep)
 {
 	struct folio *folio;
 
@@ -73,6 +106,12 @@ static struct folio *kvm_gmem_get_folio(struct inode *inode, pgoff_t index)
 		folio_mark_uptodate(folio);
 	}
 
+	if (prep && kvm_gmem_prepare_folio(inode, index, folio)) {
+		folio_unlock(folio);
+		folio_put(folio);
+		return NULL;
+	}
+
 	/*
 	 * Ignore accessed, referenced, and dirty flags.  The memory is
 	 * unevictable and there is no storage to write back to.
@@ -177,7 +216,7 @@ static long kvm_gmem_allocate(struct inode *inode, loff_t offset, loff_t len)
 			break;
 		}
 
-		folio = kvm_gmem_get_folio(inode, index);
+		folio = kvm_gmem_get_folio(inode, index, true);
 		if (!folio) {
 			r = -ENOMEM;
 			break;
@@ -517,8 +556,8 @@ void kvm_gmem_unbind(struct kvm_memory_slot *slot)
 	fput(file);
 }
 
-int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
-		     gfn_t gfn, kvm_pfn_t *pfn, int *max_order)
+int __kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
+		       gfn_t gfn, kvm_pfn_t *pfn, int *max_order, bool prep)
 {
 	pgoff_t index, huge_index;
 	struct kvm_gmem *gmem;
@@ -539,7 +578,7 @@ int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
 		goto out_fput;
 	}
 
-	folio = kvm_gmem_get_folio(file_inode(file), index);
+	folio = kvm_gmem_get_folio(file_inode(file), index, prep);
 	if (!folio) {
 		r = -ENOMEM;
 		goto out_fput;
@@ -580,4 +619,11 @@ int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
 
 	return r;
 }
+EXPORT_SYMBOL_GPL(__kvm_gmem_get_pfn);
+
+int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
+		     gfn_t gfn, kvm_pfn_t *pfn, int *max_order)
+{
+	return __kvm_gmem_get_pfn(kvm, slot, gfn, pfn, max_order, true);
+}
 EXPORT_SYMBOL_GPL(kvm_gmem_get_pfn);
-- 
2.25.1



* [PATCH v11 05/35] KVM: x86: Add gmem hook for invalidating memory
@ 2023-12-30 17:23 ` Michael Roth
From: Michael Roth @ 2023-12-30 17:23 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
	jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick, zhi.a.wang

In some cases, like with SEV-SNP, guest memory needs to be updated in a
platform-specific manner before it can be safely freed back to the host.
Wire up arch-defined hooks to the .free_folio kvm_gmem_aops callback to
allow for special handling of this sort when freeing memory in response
to FALLOC_FL_PUNCH_HOLE operations and when releasing the inode, and go
ahead and define an arch-specific hook for x86 since it will be needed
for handling memory used for SEV-SNP guests.
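
A sketch of the shape of the callback this enables (the SNP
implementation arrives later in the series; rmp_make_shared() is provided
by the host initialization series):

  /* Return a PFN range to the default shared/hypervisor-owned state
   * before the pages go back to the host allocator.
   */
  static void sev_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end)
  {
          kvm_pfn_t pfn;

          for (pfn = start; pfn < end; pfn++)
                  WARN_ON_ONCE(rmp_make_shared(pfn, PG_LEVEL_4K));
  }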

Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 arch/x86/include/asm/kvm-x86-ops.h |  1 +
 arch/x86/include/asm/kvm_host.h    |  1 +
 arch/x86/kvm/x86.c                 |  7 +++++++
 include/linux/kvm_host.h           |  4 ++++
 virt/kvm/Kconfig                   |  4 ++++
 virt/kvm/guest_memfd.c             | 14 ++++++++++++++
 6 files changed, 31 insertions(+)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index 5e6b58439100..c4b7b0db7be3 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -140,6 +140,7 @@ KVM_X86_OP_OPTIONAL_RET0(vcpu_get_apicv_inhibit_reasons);
 KVM_X86_OP_OPTIONAL(get_untagged_addr)
 KVM_X86_OP_OPTIONAL(alloc_apic_backing_page)
 KVM_X86_OP_OPTIONAL_RET0(gmem_prepare)
+KVM_X86_OP_OPTIONAL(gmem_invalidate)
 
 #undef KVM_X86_OP
 #undef KVM_X86_OP_OPTIONAL
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 1fc14aa58913..63596fe45013 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1796,6 +1796,7 @@ struct kvm_x86_ops {
 	gva_t (*get_untagged_addr)(struct kvm_vcpu *vcpu, gva_t gva, unsigned int flags);
 	void *(*alloc_apic_backing_page)(struct kvm_vcpu *vcpu);
 	int (*gmem_prepare)(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, int max_order);
+	void (*gmem_invalidate)(kvm_pfn_t start, kvm_pfn_t end);
 };
 
 struct kvm_x86_nested_ops {
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 67d7c9e1331c..aaf71e5c1d18 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -13519,6 +13519,13 @@ int kvm_arch_gmem_prepare(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn, int max_ord
 }
 #endif
 
+#ifdef CONFIG_HAVE_KVM_GMEM_INVALIDATE
+void kvm_arch_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end)
+{
+	static_call_cond(kvm_x86_gmem_invalidate)(start, end);
+}
+#endif
+
 int kvm_spec_ctrl_test_value(u64 value)
 {
 	/*
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 22feb4910854..a2a8331fbb94 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -2401,4 +2401,8 @@ static inline int kvm_gmem_get_pfn(struct kvm *kvm,
 int kvm_arch_gmem_prepare(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn, int max_order);
 #endif
 
+#ifdef CONFIG_HAVE_KVM_GMEM_INVALIDATE
+void kvm_arch_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end);
+#endif
+
 #endif
diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
index f5d6256607d2..734b7d467380 100644
--- a/virt/kvm/Kconfig
+++ b/virt/kvm/Kconfig
@@ -107,3 +107,7 @@ config KVM_GENERIC_PRIVATE_MEM
 config HAVE_KVM_GMEM_PREPARE
        bool
        depends on KVM_PRIVATE_MEM
+
+config HAVE_KVM_GMEM_INVALIDATE
+       bool
+       depends on KVM_PRIVATE_MEM
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 5e88e525cf75..feec0da93d98 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -370,10 +370,24 @@ static int kvm_gmem_error_folio(struct address_space *mapping,
 	return MF_DELAYED;
 }
 
+#ifdef CONFIG_HAVE_KVM_GMEM_INVALIDATE
+static void kvm_gmem_free_folio(struct folio *folio)
+{
+	struct page *page = folio_page(folio, 0);
+	kvm_pfn_t pfn = page_to_pfn(page);
+	int order = folio_order(folio);
+
+	kvm_arch_gmem_invalidate(pfn, pfn + (1ul << order));
+}
+#endif
+
 static const struct address_space_operations kvm_gmem_aops = {
 	.dirty_folio = noop_dirty_folio,
 	.migrate_folio	= kvm_gmem_migrate_folio,
 	.error_remove_folio = kvm_gmem_error_folio,
+#ifdef CONFIG_HAVE_KVM_GMEM_INVALIDATE
+	.free_folio = kvm_gmem_free_folio,
+#endif
 };
 
 static int kvm_gmem_getattr(struct mnt_idmap *idmap, const struct path *path,
-- 
2.25.1



* [PATCH v11 06/35] KVM: x86/mmu: Pass around full 64-bit error code for KVM page faults
@ 2023-12-30 17:23 ` Michael Roth
From: Michael Roth @ 2023-12-30 17:23 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
	jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick, zhi.a.wang, Isaku Yamahata

In some cases the full 64-bit error code for the KVM page fault will be
needed to determine things like whether or not a fault was for a private
or shared guest page. Update related code to accept the full 64-bit
value so it can be plumbed all the way through to where it is needed.

The accessors of fault->error_code are changed as follows:

- FNAME(page_fault): change to explicitly use lower_32_bits() since that
                     is no longer done in kvm_mmu_page_fault()
- kvm_mmu_page_fault(): explicit mask with PFERR_RSVD_MASK,
                        PFERR_NESTED_GUEST_PAGE
- mmutrace: changed u32 -> u64

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Link: https://lore.kernel.org/kvm/20230612042559.375660-1-michael.roth@amd.com/T/#mbd0b20c9a2cf50319d5d2a27b63f73c772112076
[mdr: drop references/changes to code not in current gmem tree, update
      commit message]
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 arch/x86/kvm/mmu/mmu.c          | 3 +--
 arch/x86/kvm/mmu/mmu_internal.h | 4 ++--
 arch/x86/kvm/mmu/mmutrace.h     | 2 +-
 arch/x86/kvm/mmu/paging_tmpl.h  | 2 +-
 4 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 3c844e428684..d3fbfe0686a0 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5813,8 +5813,7 @@ int noinline kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 err
 	}
 
 	if (r == RET_PF_INVALID) {
-		r = kvm_mmu_do_page_fault(vcpu, cr2_or_gpa,
-					  lower_32_bits(error_code), false,
+		r = kvm_mmu_do_page_fault(vcpu, cr2_or_gpa, error_code, false,
 					  &emulation_type);
 		if (KVM_BUG_ON(r == RET_PF_INVALID, vcpu->kvm))
 			return -EIO;
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index 0669a8a668ca..21f55e8b4dc6 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -190,7 +190,7 @@ static inline bool is_nx_huge_page_enabled(struct kvm *kvm)
 struct kvm_page_fault {
 	/* arguments to kvm_mmu_do_page_fault.  */
 	const gpa_t addr;
-	const u32 error_code;
+	const u64 error_code;
 	const bool prefetch;
 
 	/* Derived from error_code.  */
@@ -280,7 +280,7 @@ enum {
 };
 
 static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
-					u32 err, bool prefetch, int *emulation_type)
+					u64 err, bool prefetch, int *emulation_type)
 {
 	struct kvm_page_fault fault = {
 		.addr = cr2_or_gpa,
diff --git a/arch/x86/kvm/mmu/mmutrace.h b/arch/x86/kvm/mmu/mmutrace.h
index ae86820cef69..195d98bc8de8 100644
--- a/arch/x86/kvm/mmu/mmutrace.h
+++ b/arch/x86/kvm/mmu/mmutrace.h
@@ -260,7 +260,7 @@ TRACE_EVENT(
 	TP_STRUCT__entry(
 		__field(int, vcpu_id)
 		__field(gpa_t, cr2_or_gpa)
-		__field(u32, error_code)
+		__field(u64, error_code)
 		__field(u64 *, sptep)
 		__field(u64, old_spte)
 		__field(u64, new_spte)
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 4d4e98fe4f35..c418f3b1cfca 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -787,7 +787,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 	 * The bit needs to be cleared before walking guest page tables.
 	 */
 	r = FNAME(walk_addr)(&walker, vcpu, fault->addr,
-			     fault->error_code & ~PFERR_RSVD_MASK);
+			     lower_32_bits(fault->error_code) & ~PFERR_RSVD_MASK);
 
 	/*
 	 * The page is not mapped by the guest.  Let the guest handle it.
-- 
2.25.1



* [PATCH v11 07/35] KVM: x86: Add KVM_X86_SNP_VM vm_type
@ 2023-12-30 17:23 ` Michael Roth
From: Michael Roth @ 2023-12-30 17:23 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
	jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick, zhi.a.wang

In some cases, such as detecting whether a page fault should be handled
as a private fault or not, KVM will need to handle things differently
from the existing KVM_X86_SW_PROTECTED_VM type.

Add a new KVM_X86_SNP_VM to allow for this, along with a helper to query
the vm_type.
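
From userspace, the new type is selected at VM creation time, e.g. (a
minimal sketch with error handling mostly elided; KVM_CAP_VM_TYPES is
the capability used by the vm_type support this builds on):

  int kvm_fd = open("/dev/kvm", O_RDWR | O_CLOEXEC);

  /* KVM_CAP_VM_TYPES reports the supported types as a bitmask */
  int types = ioctl(kvm_fd, KVM_CHECK_EXTENSION, KVM_CAP_VM_TYPES);
  if (!(types & (1 << KVM_X86_SNP_VM)))
          errx(1, "SNP VMs not supported on this host");

  int vm_fd = ioctl(kvm_fd, KVM_CREATE_VM, KVM_X86_SNP_VM);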

Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 arch/x86/include/asm/kvm_host.h |  2 ++
 arch/x86/include/uapi/asm/kvm.h |  1 +
 arch/x86/kvm/x86.c              | 20 +++++++++++++++++---
 3 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 63596fe45013..e38cab5dccae 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -2141,6 +2141,8 @@ void kvm_configure_mmu(bool enable_tdp, int tdp_forced_root_level,
 #define kvm_arch_has_private_mem(kvm) false
 #endif
 
+bool kvm_is_vm_type(struct kvm *kvm, unsigned long type);
+
 static inline u16 kvm_read_ldt(void)
 {
 	u16 ldt;
diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index a448d0964fc0..57e4ba484aa2 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -564,5 +564,6 @@ struct kvm_pmu_event_filter {
 
 #define KVM_X86_DEFAULT_VM	0
 #define KVM_X86_SW_PROTECTED_VM	1
+#define KVM_X86_SNP_VM		3
 
 #endif /* _ASM_X86_KVM_H */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index aaf71e5c1d18..87b78d63e81d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4579,9 +4579,21 @@ static int kvm_ioctl_get_supported_hv_cpuid(struct kvm_vcpu *vcpu,
 
 static bool kvm_is_vm_type_supported(unsigned long type)
 {
-	return type == KVM_X86_DEFAULT_VM ||
-	       (type == KVM_X86_SW_PROTECTED_VM &&
-		IS_ENABLED(CONFIG_KVM_SW_PROTECTED_VM) && tdp_enabled);
+	if (type == KVM_X86_DEFAULT_VM)
+		return true;
+	else if (type == KVM_X86_SW_PROTECTED_VM &&
+		 IS_ENABLED(CONFIG_KVM_SW_PROTECTED_VM) && tdp_enabled)
+		return true;
+	else if (type == KVM_X86_SNP_VM &&
+		 IS_ENABLED(CONFIG_KVM_AMD_SEV) && tdp_enabled)
+		return true;
+
+	return false;
+}
+
+bool kvm_is_vm_type(struct kvm *kvm, unsigned long type)
+{
+	return kvm->arch.vm_type == type;
 }
 
 int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
@@ -4784,6 +4796,8 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 		r = BIT(KVM_X86_DEFAULT_VM);
 		if (kvm_is_vm_type_supported(KVM_X86_SW_PROTECTED_VM))
 			r |= BIT(KVM_X86_SW_PROTECTED_VM);
+		if (kvm_is_vm_type_supported(KVM_X86_SNP_VM))
+			r |= BIT(KVM_X86_SNP_VM);
 		break;
 	default:
 		break;
-- 
2.25.1



* [PATCH v11 08/35] KVM: x86: Define RMP page fault error bits for #NPF
@ 2023-12-30 17:23 ` Michael Roth
From: Michael Roth @ 2023-12-30 17:23 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
	jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick, zhi.a.wang, Brijesh Singh

From: Brijesh Singh <brijesh.singh@amd.com>

When SEV-SNP is enabled globally, the hardware places restrictions on all
memory accesses based on the RMP entry, whether the accesses are performed
by the hypervisor or a VM. When hardware encounters an RMP access violation
during a guest access, it will cause a #VMEXIT(NPF) with a number of
additional bits set to indicate the reasons for the #NPF. Define those
bits here.

See APM2 section 16.36.10 for more details.
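
As a quick example of how these bits combine in practice (illustrative;
the actual fault handling arrives later in the series):

  /* A guest write to a private (encrypted) page whose RMP entry has a
   * smaller page size than the NPT mapping would surface with:
   *
   *   error_code & (PFERR_GUEST_RMP_MASK | PFERR_GUEST_ENC_MASK |
   *                 PFERR_GUEST_SIZEM_MASK)   -> all bits set
   *
   * while a shared access that trips RMP checks would lack ENC.
   */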

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
[mdr: add some additional details to commit message]
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 arch/x86/include/asm/kvm_host.h | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index e38cab5dccae..843695217b4b 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -262,9 +262,13 @@ enum x86_intercept_stage;
 #define PFERR_FETCH_BIT 4
 #define PFERR_PK_BIT 5
 #define PFERR_SGX_BIT 15
+#define PFERR_GUEST_RMP_BIT 31
 #define PFERR_GUEST_FINAL_BIT 32
 #define PFERR_GUEST_PAGE_BIT 33
+#define PFERR_GUEST_ENC_BIT 34
+#define PFERR_GUEST_SIZEM_BIT 35
+#define PFERR_GUEST_VMPL_BIT 36
 #define PFERR_IMPLICIT_ACCESS_BIT 48
 
 #define PFERR_PRESENT_MASK	BIT(PFERR_PRESENT_BIT)
 #define PFERR_WRITE_MASK	BIT(PFERR_WRITE_BIT)
@@ -276,6 +280,10 @@ enum x86_intercept_stage;
 #define PFERR_GUEST_FINAL_MASK	BIT_ULL(PFERR_GUEST_FINAL_BIT)
 #define PFERR_GUEST_PAGE_MASK	BIT_ULL(PFERR_GUEST_PAGE_BIT)
 #define PFERR_IMPLICIT_ACCESS	BIT_ULL(PFERR_IMPLICIT_ACCESS_BIT)
+#define PFERR_GUEST_RMP_MASK	BIT_ULL(PFERR_GUEST_RMP_BIT)
+#define PFERR_GUEST_ENC_MASK	BIT_ULL(PFERR_GUEST_ENC_BIT)
+#define PFERR_GUEST_SIZEM_MASK	BIT_ULL(PFERR_GUEST_SIZEM_BIT)
+#define PFERR_GUEST_VMPL_MASK	BIT_ULL(PFERR_GUEST_VMPL_BIT)
 
 #define PFERR_NESTED_GUEST_PAGE (PFERR_GUEST_PAGE_MASK |	\
 				 PFERR_WRITE_MASK |		\
-- 
2.25.1



* [PATCH v11 09/35] KVM: x86: Determine shared/private faults based on vm_type
@ 2023-12-30 17:23 ` Michael Roth
From: Michael Roth @ 2023-12-30 17:23 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
	jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick, zhi.a.wang

For KVM_X86_SNP_VM, only the PFERR_GUEST_ENC_MASK flag is needed to
determine whether an #NPF is due to a private/shared access by the
guest. Implement that handling here. Also add the handling needed to
deal with SNP guests, which in some cases will make MMIO accesses with
the encryption bit set.

Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 arch/x86/kvm/mmu/mmu.c          | 12 ++++++++++--
 arch/x86/kvm/mmu/mmu_internal.h | 20 +++++++++++++++++++-
 2 files changed, 29 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index d3fbfe0686a0..61213f6648a1 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4331,6 +4331,7 @@ static int kvm_faultin_pfn_private(struct kvm_vcpu *vcpu,
 static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 {
 	struct kvm_memory_slot *slot = fault->slot;
+	bool private_fault = fault->is_private;
 	bool async;
 
 	/*
@@ -4360,12 +4361,19 @@ static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 			return RET_PF_EMULATE;
 	}
 
-	if (fault->is_private != kvm_mem_is_private(vcpu->kvm, fault->gfn)) {
+	/*
+	 * In some cases SNP guests will make MMIO accesses with the encryption
+	 * bit set. Handle these via the normal MMIO fault path.
+	 */
+	if (!slot && private_fault && kvm_is_vm_type(vcpu->kvm, KVM_X86_SNP_VM))
+		private_fault = false;
+
+	if (private_fault != kvm_mem_is_private(vcpu->kvm, fault->gfn)) {
 		kvm_mmu_prepare_memory_fault_exit(vcpu, fault);
 		return -EFAULT;
 	}
 
-	if (fault->is_private)
+	if (private_fault)
 		return kvm_faultin_pfn_private(vcpu, fault);
 
 	async = false;
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index 21f55e8b4dc6..e519dd363c28 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -251,6 +251,24 @@ struct kvm_page_fault {
 
 int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault);
 
+static bool kvm_mmu_fault_is_private(struct kvm *kvm, gpa_t gpa, u64 err)
+{
+	bool private_fault = false;
+
+	if (kvm_is_vm_type(kvm, KVM_X86_SNP_VM)) {
+		private_fault = !!(err & PFERR_GUEST_ENC_MASK);
+	} else if (kvm_is_vm_type(kvm, KVM_X86_SW_PROTECTED_VM)) {
+		/*
+		 * This handling is for gmem self-tests and guests that treat
+		 * userspace as the authority on whether a fault should be
+		 * private or not.
+		 */
+		private_fault = kvm_mem_is_private(kvm, gpa >> PAGE_SHIFT);
+	}
+
+	return private_fault;
+}
+
 /*
  * Return values of handle_mmio_page_fault(), mmu.page_fault(), fast_page_fault(),
  * and of course kvm_mmu_do_page_fault().
@@ -298,7 +316,7 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 		.max_level = KVM_MAX_HUGEPAGE_LEVEL,
 		.req_level = PG_LEVEL_4K,
 		.goal_level = PG_LEVEL_4K,
-		.is_private = kvm_mem_is_private(vcpu->kvm, cr2_or_gpa >> PAGE_SHIFT),
+		.is_private = kvm_mmu_fault_is_private(vcpu->kvm, cr2_or_gpa, err),
 	};
 	int r;
 
-- 
2.25.1



* [PATCH v11 10/35] KVM: SEV: Do not intercept accesses to MSR_IA32_XSS for SEV-ES guests
@ 2023-12-30 17:23 ` Michael Roth
From: Michael Roth @ 2023-12-30 17:23 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
	jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick, zhi.a.wang, Alexey Kardashevskiy

When intercepts are enabled for MSR_IA32_XSS, the host will swap in/out
the guest-defined values while context-switching to/from guest mode.
However, in the case of SEV-ES, vcpu->arch.guest_state_protected is set,
so the guest-defined value is effectively ignored when switching to
guest mode with the understanding that the VMSA will handle swapping
in/out this register state.

However, SVM is still configured to intercept these accesses for SEV-ES
guests, so the initial value of MSR_IA32_XSS is effectively read-only,
and a guest will experience undefined behavior if it actually tries to
write to this MSR. Fortunately, on current SEV-ES-capable systems only
CET/shadow stack, which isn't yet widely used, makes use of this
register, but this may become more of an issue in the future.

Additionally, enabling intercepts of MSR_IA32_XSS results in #VC
exceptions in the guest in certain paths that can lead to unexpected #VC
nesting levels. One example is SEV-SNP guests handling #VC exceptions
for CPUID instructions involving leaf 0xD, subleaf 0x1: the guest
accesses MSR_IA32_XSS while servicing the CPUID #VC, which generates
another #VC, and that can crash the guest if an NMI arrives at that
point. Running perf on a guest while it is issuing such a sequence is
one case where this has proven problematic.

Address this by disabling intercepts of MSR_IA32_XSS for SEV-ES guests
if the host/guest configuration allows it. If the host/guest
configuration doesn't allow for MSR_IA32_XSS, leave it intercepted so
that it can be caught by the existing checks in
kvm_{set,get}_msr_common() if the guest still attempts to access it.
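
For readers unfamiliar with the helper's convention, a hedged restatement
of the new policy (the wrapper name is illustrative; the actual change is
made inline in sev_es_vcpu_after_set_cpuid() below):

  /*
   * The read/write arguments to set_msr_interception() mean "allow direct
   * access", so 1/1 disables the intercept and 0/0 keeps it in place.
   */
  static void sev_es_update_xss_intercept(struct kvm_vcpu *vcpu,
                                          struct vcpu_svm *svm)
  {
          bool direct = guest_can_use(vcpu, X86_FEATURE_XSAVES) &&
                        guest_cpuid_has(vcpu, X86_FEATURE_XSAVES);

          /* direct: guest XSS accesses hit hardware; else: trap and police. */
          set_msr_interception(vcpu, svm->msrpm, MSR_IA32_XSS, direct, direct);
  }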

Fixes: 376c6d285017 ("KVM: SVM: Provide support for SEV-ES vCPU creation/loading")
Cc: Alexey Kardashevskiy <aik@amd.com>
Suggested-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 arch/x86/kvm/svm/sev.c | 19 +++++++++++++++++++
 arch/x86/kvm/svm/svm.c |  1 +
 arch/x86/kvm/svm/svm.h |  2 +-
 3 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 2efe3ed89808..f99435b6648f 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -2978,6 +2978,25 @@ static void sev_es_vcpu_after_set_cpuid(struct vcpu_svm *svm)
 
 		set_msr_interception(vcpu, svm->msrpm, MSR_TSC_AUX, v_tsc_aux, v_tsc_aux);
 	}
+
+	/*
+	 * For SEV-ES, accesses to MSR_IA32_XSS should not be intercepted if
+	 * the host/guest supports its use.
+	 *
+	 * guest_can_use() checks a number of requirements on the host/guest to
+	 * ensure that MSR_IA32_XSS is available, but it might report true even
+	 * if X86_FEATURE_XSAVES isn't configured in the guest to ensure host
+	 * MSR_IA32_XSS is always properly restored. For SEV-ES, it is better
+	 * to further check that the guest CPUID actually supports
+	 * X86_FEATURE_XSAVES so that accesses to MSR_IA32_XSS by misbehaving
+	 * guests will still get intercepted and caught in the normal
+	 * kvm_emulate_rdmsr()/kvm_emulate_wrmsr() paths.
+	 */
+	if (guest_can_use(vcpu, X86_FEATURE_XSAVES) &&
+	    guest_cpuid_has(vcpu, X86_FEATURE_XSAVES))
+		set_msr_interception(vcpu, svm->msrpm, MSR_IA32_XSS, 1, 1);
+	else
+		set_msr_interception(vcpu, svm->msrpm, MSR_IA32_XSS, 0, 0);
 }
 
 void sev_vcpu_after_set_cpuid(struct vcpu_svm *svm)
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index b6179696861a..18d55df7fa5f 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -103,6 +103,7 @@ static const struct svm_direct_access_msrs {
 	{ .index = MSR_IA32_LASTBRANCHTOIP,		.always = false },
 	{ .index = MSR_IA32_LASTINTFROMIP,		.always = false },
 	{ .index = MSR_IA32_LASTINTTOIP,		.always = false },
+	{ .index = MSR_IA32_XSS,			.always = false },
 	{ .index = MSR_EFER,				.always = false },
 	{ .index = MSR_IA32_CR_PAT,			.always = false },
 	{ .index = MSR_AMD64_SEV_ES_GHCB,		.always = true  },
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 9ed9d72546b3..7f1fbd874c45 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -30,7 +30,7 @@
 #define	IOPM_SIZE PAGE_SIZE * 3
 #define	MSRPM_SIZE PAGE_SIZE * 2
 
-#define MAX_DIRECT_ACCESS_MSRS	46
+#define MAX_DIRECT_ACCESS_MSRS	47
 #define MSRPM_OFFSETS	32
 extern u32 msrpm_offsets[MSRPM_OFFSETS] __read_mostly;
 extern bool npt_enabled;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v11 11/35] KVM: SEV: Select KVM_GENERIC_PRIVATE_MEM when CONFIG_KVM_AMD_SEV=y
  2023-12-30 17:23 [PATCH v11 00/35] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (9 preceding siblings ...)
  2023-12-30 17:23 ` [PATCH v11 10/35] KVM: SEV: Do not intercept accesses to MSR_IA32_XSS for SEV-ES guests Michael Roth
@ 2023-12-30 17:23 ` Michael Roth
  2023-12-30 17:23 ` [PATCH v11 12/35] KVM: SEV: Add support to handle AP reset MSR protocol Michael Roth
                   ` (23 subsequent siblings)
  34 siblings, 0 replies; 58+ messages in thread
From: Michael Roth @ 2023-12-30 17:23 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
	jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick, zhi.a.wang

SEV-SNP relies on private/protected guest memory (guest_memfd) to run
guests, so make sure that support is enabled by selecting the
KVM_GENERIC_PRIVATE_MEM config option when CONFIG_KVM_AMD_SEV=y.

Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 arch/x86/kvm/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index e807eb56dc08..4ec53d6d5773 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -124,6 +124,7 @@ config KVM_AMD_SEV
 	bool "AMD Secure Encrypted Virtualization (SEV) support"
 	depends on KVM_AMD && X86_64
 	depends on CRYPTO_DEV_SP_PSP && !(KVM_AMD=y && CRYPTO_DEV_CCP_DD=m)
+	select KVM_GENERIC_PRIVATE_MEM
 	help
 	  Provides support for launching Encrypted VMs (SEV) and Encrypted VMs
 	  with Encrypted State (SEV-ES) on AMD processors.
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v11 12/35] KVM: SEV: Add support to handle AP reset MSR protocol
  2023-12-30 17:23 [PATCH v11 00/35] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (10 preceding siblings ...)
  2023-12-30 17:23 ` [PATCH v11 11/35] KVM: SEV: Select KVM_GENERIC_PRIVATE_MEM when CONFIG_KVM_AMD_SEV=y Michael Roth
@ 2023-12-30 17:23 ` Michael Roth
  2023-12-30 17:23 ` [PATCH v11 13/35] KVM: SEV: Add GHCB handling for Hypervisor Feature Support requests Michael Roth
                   ` (22 subsequent siblings)
  34 siblings, 0 replies; 58+ messages in thread
From: Michael Roth @ 2023-12-30 17:23 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
	jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick, zhi.a.wang, Brijesh Singh

From: Tom Lendacky <thomas.lendacky@amd.com>

Add support for AP Reset Hold being invoked using the GHCB MSR protocol,
available in version 2 of the GHCB specification.
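
For context, a minimal sketch of the guest-side sequence this handler
services, assuming the GHCB MSR index (0xc0010130) and the request/response
encodings from the GHCB version 2 specification (illustration only, not
code from this series):

  #define MSR_AMD64_SEV_ES_GHCB           0xc0010130
  #define GHCB_MSR_AP_RESET_HOLD_REQ      0x006
  #define GHCB_MSR_AP_RESET_HOLD_RESP     0x007

  /* Park this AP until a SIPI is delivered, using only the GHCB MSR. */
  static void ap_reset_hold_msr_proto(void)
  {
          unsigned int lo = GHCB_MSR_AP_RESET_HOLD_REQ, hi = 0;
          unsigned long long resp;

          /* GHCBInfo (bits [11:0]) = request code, GHCBData = 0. */
          asm volatile("wrmsr" :: "c" (MSR_AMD64_SEV_ES_GHCB), "a" (lo), "d" (hi));
          asm volatile("rep; vmmcall");   /* VMGEXIT to the hypervisor */
          asm volatile("rdmsr" : "=a" (lo), "=d" (hi) : "c" (MSR_AMD64_SEV_ES_GHCB));

          resp = ((unsigned long long)hi << 32) | lo;

          /* A non-zero GHCBData[63:12] in the response means a SIPI arrived. */
          if ((resp & 0xfff) != GHCB_MSR_AP_RESET_HOLD_RESP || !(resp >> 12))
                  ;       /* retry or park again, per the GHCB spec */
  }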

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 arch/x86/include/asm/sev-common.h |  6 ++--
 arch/x86/kvm/svm/sev.c            | 56 ++++++++++++++++++++++++++-----
 arch/x86/kvm/svm/svm.h            |  1 +
 3 files changed, 53 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
index b463fcbd4b90..01261f7054ad 100644
--- a/arch/x86/include/asm/sev-common.h
+++ b/arch/x86/include/asm/sev-common.h
@@ -54,8 +54,10 @@
 	(((unsigned long)fn) << 32))
 
 /* AP Reset Hold */
-#define GHCB_MSR_AP_RESET_HOLD_REQ	0x006
-#define GHCB_MSR_AP_RESET_HOLD_RESP	0x007
+#define GHCB_MSR_AP_RESET_HOLD_REQ		0x006
+#define GHCB_MSR_AP_RESET_HOLD_RESP		0x007
+#define GHCB_MSR_AP_RESET_HOLD_RESULT_POS	12
+#define GHCB_MSR_AP_RESET_HOLD_RESULT_MASK	GENMASK_ULL(51, 0)
 
 /* GHCB GPA Register */
 #define GHCB_MSR_REG_GPA_REQ		0x012
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index f99435b6648f..b09bdaed586e 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -65,6 +65,10 @@ module_param_named(debug_swap, sev_es_debug_swap_enabled, bool, 0444);
 #define sev_es_debug_swap_enabled false
 #endif /* CONFIG_KVM_AMD_SEV */
 
+#define AP_RESET_HOLD_NONE		0
+#define AP_RESET_HOLD_NAE_EVENT		1
+#define AP_RESET_HOLD_MSR_PROTO		2
+
 static u8 sev_enc_bit;
 static DECLARE_RWSEM(sev_deactivate_lock);
 static DEFINE_MUTEX(sev_bitmap_lock);
@@ -2600,6 +2604,9 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
 
 void sev_es_unmap_ghcb(struct vcpu_svm *svm)
 {
+	/* Clear any indication that the vCPU is in a type of AP Reset Hold */
+	svm->sev_es.ap_reset_hold_type = AP_RESET_HOLD_NONE;
+
 	if (!svm->sev_es.ghcb)
 		return;
 
@@ -2811,6 +2818,22 @@ static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
 				  GHCB_MSR_INFO_POS);
 		break;
 	}
+	case GHCB_MSR_AP_RESET_HOLD_REQ:
+		svm->sev_es.ap_reset_hold_type = AP_RESET_HOLD_MSR_PROTO;
+		ret = kvm_emulate_ap_reset_hold(&svm->vcpu);
+
+		/*
+		 * Preset the result to a non-SIPI return and then only set
+		 * the result to non-zero when delivering a SIPI.
+		 */
+		set_ghcb_msr_bits(svm, 0,
+				  GHCB_MSR_AP_RESET_HOLD_RESULT_MASK,
+				  GHCB_MSR_AP_RESET_HOLD_RESULT_POS);
+
+		set_ghcb_msr_bits(svm, GHCB_MSR_AP_RESET_HOLD_RESP,
+				  GHCB_MSR_INFO_MASK,
+				  GHCB_MSR_INFO_POS);
+		break;
 	case GHCB_MSR_TERM_REQ: {
 		u64 reason_set, reason_code;
 
@@ -2910,6 +2933,7 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
 		ret = 1;
 		break;
 	case SVM_VMGEXIT_AP_HLT_LOOP:
+		svm->sev_es.ap_reset_hold_type = AP_RESET_HOLD_NAE_EVENT;
 		ret = kvm_emulate_ap_reset_hold(vcpu);
 		break;
 	case SVM_VMGEXIT_AP_JUMP_TABLE: {
@@ -3153,15 +3177,31 @@ void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector)
 		return;
 	}
 
-	/*
-	 * Subsequent SIPI: Return from an AP Reset Hold VMGEXIT, where
-	 * the guest will set the CS and RIP. Set SW_EXIT_INFO_2 to a
-	 * non-zero value.
-	 */
-	if (!svm->sev_es.ghcb)
-		return;
+	/* Subsequent SIPI */
+	switch (svm->sev_es.ap_reset_hold_type) {
+	case AP_RESET_HOLD_NAE_EVENT:
+		/*
+		 * Return from an AP Reset Hold VMGEXIT, where the guest will
+		 * set the CS and RIP. Set SW_EXIT_INFO_2 to a non-zero value.
+		 */
+		ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, 1);
+		break;
+	case AP_RESET_HOLD_MSR_PROTO:
+		/*
+		 * Return from an AP Reset Hold VMGEXIT, where the guest will
+		 * set the CS and RIP. Set GHCB data field to a non-zero value.
+		 */
+		set_ghcb_msr_bits(svm, 1,
+				  GHCB_MSR_AP_RESET_HOLD_RESULT_MASK,
+				  GHCB_MSR_AP_RESET_HOLD_RESULT_POS);
 
-	ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, 1);
+		set_ghcb_msr_bits(svm, GHCB_MSR_AP_RESET_HOLD_RESP,
+				  GHCB_MSR_INFO_MASK,
+				  GHCB_MSR_INFO_POS);
+		break;
+	default:
+		break;
+	}
 }
 
 struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 7f1fbd874c45..eecb2b744d79 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -197,6 +197,7 @@ struct vcpu_sev_es_state {
 	u8 valid_bitmap[16];
 	struct kvm_host_map ghcb_map;
 	bool received_first_sipi;
+	unsigned int ap_reset_hold_type;
 
 	/* SEV-ES scratch area support */
 	u64 sw_scratch;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v11 13/35] KVM: SEV: Add GHCB handling for Hypervisor Feature Support requests
  2023-12-30 17:23 [PATCH v11 00/35] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (11 preceding siblings ...)
  2023-12-30 17:23 ` [PATCH v11 12/35] KVM: SEV: Add support to handle AP reset MSR protocol Michael Roth
@ 2023-12-30 17:23 ` Michael Roth
  2023-12-30 17:23 ` [PATCH v11 14/35] KVM: SEV: Add initial SEV-SNP support Michael Roth
                   ` (21 subsequent siblings)
  34 siblings, 0 replies; 58+ messages in thread
From: Michael Roth @ 2023-12-30 17:23 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
	jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick, zhi.a.wang, Brijesh Singh

From: Brijesh Singh <brijesh.singh@amd.com>

Version 2 of the GHCB specification introduced advertisement of features
that are supported by the Hypervisor.

Now that KVM supports version 2 of the GHCB specification, bump the
maximum supported protocol version.
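
As a sketch of how a guest consumes this via the MSR protocol (hedged:
ghcb_msr_exchange() is a hypothetical helper that performs the
wrmsr/VMGEXIT/rdmsr sequence on the GHCB MSR, and the GHCB_HV_FT_SNP bit
position is assumed from the GHCB specification):

  #define GHCB_MSR_HV_FT_REQ      0x080
  #define GHCB_MSR_HV_FT_RESP     0x081
  #define GHCB_HV_FT_SNP          (1ULL << 0)     /* assumed bit position */

  static int hv_supports_snp(unsigned long long (*ghcb_msr_exchange)(unsigned long long))
  {
          unsigned long long resp = ghcb_msr_exchange(GHCB_MSR_HV_FT_REQ);

          /* GHCBInfo (bits [11:0]) must indicate a feature response. */
          if ((resp & 0xfff) != GHCB_MSR_HV_FT_RESP)
                  return 0;

          /* The feature bitmap is returned in GHCBData[63:12]. */
          return !!((resp >> 12) & GHCB_HV_FT_SNP);
  }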

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 arch/x86/include/asm/sev-common.h |  2 ++
 arch/x86/kvm/svm/sev.c            | 12 ++++++++++++
 arch/x86/kvm/svm/svm.h            |  3 ++-
 3 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
index 01261f7054ad..5a8246dd532f 100644
--- a/arch/x86/include/asm/sev-common.h
+++ b/arch/x86/include/asm/sev-common.h
@@ -101,6 +101,8 @@ enum psc_op {
 /* GHCB Hypervisor Feature Request/Response */
 #define GHCB_MSR_HV_FT_REQ		0x080
 #define GHCB_MSR_HV_FT_RESP		0x081
+#define GHCB_MSR_HV_FT_POS		12
+#define GHCB_MSR_HV_FT_MASK		GENMASK_ULL(51, 0)
 #define GHCB_MSR_HV_FT_RESP_VAL(v)			\
 	/* GHCBData[63:12] */				\
 	(((u64)(v) & GENMASK_ULL(63, 12)) >> 12)
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index b09bdaed586e..d6e206d21750 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -2574,6 +2574,7 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
 	case SVM_VMGEXIT_AP_HLT_LOOP:
 	case SVM_VMGEXIT_AP_JUMP_TABLE:
 	case SVM_VMGEXIT_UNSUPPORTED_EVENT:
+	case SVM_VMGEXIT_HV_FEATURES:
 		break;
 	default:
 		reason = GHCB_ERR_INVALID_EVENT;
@@ -2834,6 +2835,12 @@ static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
 				  GHCB_MSR_INFO_MASK,
 				  GHCB_MSR_INFO_POS);
 		break;
+	case GHCB_MSR_HV_FT_REQ:
+		set_ghcb_msr_bits(svm, GHCB_HV_FT_SUPPORTED,
+				  GHCB_MSR_HV_FT_MASK, GHCB_MSR_HV_FT_POS);
+		set_ghcb_msr_bits(svm, GHCB_MSR_HV_FT_RESP,
+				  GHCB_MSR_INFO_MASK, GHCB_MSR_INFO_POS);
+		break;
 	case GHCB_MSR_TERM_REQ: {
 		u64 reason_set, reason_code;
 
@@ -2958,6 +2965,11 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
 		ret = 1;
 		break;
 	}
+	case SVM_VMGEXIT_HV_FEATURES:
+		ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, GHCB_HV_FT_SUPPORTED);
+
+		ret = 1;
+		break;
 	case SVM_VMGEXIT_UNSUPPORTED_EVENT:
 		vcpu_unimpl(vcpu,
 			    "vmgexit: unsupported event - exit_info_1=%#llx, exit_info_2=%#llx\n",
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index eecb2b744d79..d0f8167ada7c 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -665,9 +665,10 @@ void avic_refresh_virtual_apic_mode(struct kvm_vcpu *vcpu);
 
 /* sev.c */
 
-#define GHCB_VERSION_MAX	1ULL
+#define GHCB_VERSION_MAX	2ULL
 #define GHCB_VERSION_MIN	1ULL
 
+#define GHCB_HV_FT_SUPPORTED	GHCB_HV_FT_SNP
 
 extern unsigned int max_sev_asid;
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v11 14/35] KVM: SEV: Add initial SEV-SNP support
  2023-12-30 17:23 [PATCH v11 00/35] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (12 preceding siblings ...)
  2023-12-30 17:23 ` [PATCH v11 13/35] KVM: SEV: Add GHCB handling for Hypervisor Feature Support requests Michael Roth
@ 2023-12-30 17:23 ` Michael Roth
  2023-12-30 17:23 ` [PATCH v11 15/35] KVM: SEV: Add KVM_SNP_INIT command Michael Roth
                   ` (20 subsequent siblings)
  34 siblings, 0 replies; 58+ messages in thread
From: Michael Roth @ 2023-12-30 17:23 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
	jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick, zhi.a.wang, Brijesh Singh

From: Brijesh Singh <brijesh.singh@amd.com>

The next generation of SEV is called SEV-SNP (Secure Nested Paging).
SEV-SNP builds upon existing SEV and SEV-ES functionality while adding
new hardware-based security protection. SEV-SNP adds strong memory
encryption and integrity protection to help prevent malicious
hypervisor-based attacks such as data replay, memory re-mapping, and
more, creating an isolated execution environment.

Implement some initial infrastructure in KVM to check/report when SNP is
enabled on the system.
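
As an illustration of how the rest of the series builds on this (hedged
sketch; the function below is illustrative, but the predicates are the
ones defined in this patch and in svm.h):

  /* sev_snp_guest() implies sev_es_guest(), which in turn implies SEV. */
  static const char *sev_guest_type(struct kvm *kvm)
  {
          if (sev_snp_guest(kvm))
                  return "SEV-SNP";
          if (sev_es_guest(kvm))
                  return "SEV-ES";
          if (sev_guest(kvm))
                  return "SEV";
          return "none";
  }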

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
[mdr: commit fixups, use similar ASID reporting as with SEV/SEV-ES]
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 arch/x86/kvm/svm/sev.c | 10 ++++++++++
 arch/x86/kvm/svm/svm.h |  8 ++++++++
 2 files changed, 18 insertions(+)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index d6e206d21750..18c09863377b 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -59,10 +59,13 @@ module_param_named(sev_es, sev_es_enabled, bool, 0444);
 /* enable/disable SEV-ES DebugSwap support */
 static bool sev_es_debug_swap_enabled = true;
 module_param_named(debug_swap, sev_es_debug_swap_enabled, bool, 0444);
+
+static bool sev_snp_enabled;
 #else
 #define sev_enabled false
 #define sev_es_enabled false
 #define sev_es_debug_swap_enabled false
+#define sev_snp_enabled false
 #endif /* CONFIG_KVM_AMD_SEV */
 
 #define AP_RESET_HOLD_NONE		0
@@ -2189,6 +2192,7 @@ void __init sev_hardware_setup(void)
 {
 #ifdef CONFIG_KVM_AMD_SEV
 	unsigned int eax, ebx, ecx, edx, sev_asid_count, sev_es_asid_count;
+	bool sev_snp_supported = false;
 	bool sev_es_supported = false;
 	bool sev_supported = false;
 
@@ -2267,6 +2271,7 @@ void __init sev_hardware_setup(void)
 	sev_es_asid_count = min_sev_asid - 1;
 	WARN_ON_ONCE(misc_cg_set_capacity(MISC_CG_RES_SEV_ES, sev_es_asid_count));
 	sev_es_supported = true;
+	sev_snp_supported = sev_snp_enabled && cpu_feature_enabled(X86_FEATURE_SEV_SNP);
 
 out:
 	if (boot_cpu_has(X86_FEATURE_SEV))
@@ -2277,12 +2282,17 @@ void __init sev_hardware_setup(void)
 		pr_info("SEV-ES %s (ASIDs %u - %u)\n",
 			sev_es_supported ? "enabled" : "disabled",
 			min_sev_asid > 1 ? 1 : 0, min_sev_asid - 1);
+	if (boot_cpu_has(X86_FEATURE_SEV_SNP))
+		pr_info("SEV-SNP %s (ASIDs %u - %u)\n",
+			sev_snp_supported ? "enabled" : "disabled",
+			min_sev_asid > 1 ? 1 : 0, min_sev_asid - 1);
 
 	sev_enabled = sev_supported;
 	sev_es_enabled = sev_es_supported;
 	if (!sev_es_enabled || !cpu_feature_enabled(X86_FEATURE_DEBUG_SWAP) ||
 	    !cpu_feature_enabled(X86_FEATURE_NO_NESTED_DATA_BP))
 		sev_es_debug_swap_enabled = false;
+	sev_snp_enabled = sev_snp_supported;
 #endif
 }
 
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index d0f8167ada7c..a3e27c82866b 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -79,6 +79,7 @@ enum {
 struct kvm_sev_info {
 	bool active;		/* SEV enabled guest */
 	bool es_active;		/* SEV-ES enabled guest */
+	bool snp_active;	/* SEV-SNP enabled guest */
 	unsigned int asid;	/* ASID used for this guest */
 	unsigned int handle;	/* SEV firmware handle */
 	int fd;			/* SEV device fd */
@@ -341,6 +342,13 @@ static __always_inline bool sev_es_guest(struct kvm *kvm)
 #endif
 }
 
+static __always_inline bool sev_snp_guest(struct kvm *kvm)
+{
+	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+
+	return sev_es_guest(kvm) && sev->snp_active;
+}
+
 static inline void vmcb_mark_all_dirty(struct vmcb *vmcb)
 {
 	vmcb->control.clean = 0;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v11 15/35] KVM: SEV: Add KVM_SNP_INIT command
  2023-12-30 17:23 [PATCH v11 00/35] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (13 preceding siblings ...)
  2023-12-30 17:23 ` [PATCH v11 14/35] KVM: SEV: Add initial SEV-SNP support Michael Roth
@ 2023-12-30 17:23 ` Michael Roth
  2024-02-06 23:51   ` Paolo Bonzini
  2024-03-20 17:28   ` Paolo Bonzini
  2023-12-30 17:23 ` [PATCH v11 16/35] KVM: SEV: Add KVM_SEV_SNP_LAUNCH_START command Michael Roth
                   ` (19 subsequent siblings)
  34 siblings, 2 replies; 58+ messages in thread
From: Michael Roth @ 2023-12-30 17:23 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
	jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick, zhi.a.wang, Brijesh Singh, Pavan Kumar Paluri

From: Brijesh Singh <brijesh.singh@amd.com>

The KVM_SNP_INIT command is used by the hypervisor to initialize the
SEV-SNP platform context. In a typical workflow, this command should be
the first command issued. When creating an SEV-SNP guest, the VMM must
use this command instead of KVM_SEV_INIT or KVM_SEV_ES_INIT.

The flags value must be zero; it will be extended in future SNP support
to communicate optional features (such as restricted interrupt
injection).
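
As a usage illustration (not part of the patch), a VMM issues the command
through the existing KVM_MEMORY_ENCRYPT_OP ioctl on the VM fd:

  #include <stdint.h>
  #include <sys/ioctl.h>
  #include <linux/kvm.h>

  static int snp_init(int vm_fd, int sev_fd)
  {
          struct kvm_snp_init init = { .flags = 0 };      /* must be zero for now */
          struct kvm_sev_cmd cmd = {
                  .id = KVM_SEV_SNP_INIT,
                  .data = (uint64_t)(uintptr_t)&init,
                  .sev_fd = (uint32_t)sev_fd,             /* fd of /dev/sev */
          };

          /*
           * On -EOPNOTSUPP the kernel writes the supported flag set back
           * into init.flags so the VMM can inspect it.
           */
          return ioctl(vm_fd, KVM_MEMORY_ENCRYPT_OP, &cmd);
  }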

Co-developed-by: Pavan Kumar Paluri <papaluri@amd.com>
Signed-off-by: Pavan Kumar Paluri <papaluri@amd.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 .../virt/kvm/x86/amd-memory-encryption.rst    | 27 +++++++++++++
 arch/x86/include/asm/svm.h                    |  1 +
 arch/x86/kvm/svm/sev.c                        | 39 ++++++++++++++++++-
 arch/x86/kvm/svm/svm.h                        |  4 ++
 include/uapi/linux/kvm.h                      | 13 +++++++
 5 files changed, 83 insertions(+), 1 deletion(-)

diff --git a/Documentation/virt/kvm/x86/amd-memory-encryption.rst b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
index 995780088eb2..b1a19c9a577a 100644
--- a/Documentation/virt/kvm/x86/amd-memory-encryption.rst
+++ b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
@@ -434,6 +434,33 @@ issued by the hypervisor to make the guest ready for execution.
 
 Returns: 0 on success, -negative on error
 
+18. KVM_SNP_INIT
+----------------
+
+The KVM_SNP_INIT command can be used by the hypervisor to initialize the SEV-SNP
+context. In a typical workflow, this command should be the first command issued.
+
+Parameters (in/out): struct kvm_snp_init
+
+Returns: 0 on success, -negative on error
+
+::
+
+        struct kvm_snp_init {
+                __u64 flags;
+        };
+
+The flags bitmap is defined as::
+
+   /* enable the restricted injection */
+   #define KVM_SEV_SNP_RESTRICTED_INJET   (1<<0)
+
+   /* enable the restricted injection timer */
+   #define KVM_SEV_SNP_RESTRICTED_TIMER_INJET   (1<<1)
+
+If the specified flags are not supported, -EOPNOTSUPP is returned and the
+supported flags are written back to the structure.
+
 References
 ==========
 
diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index 87a7b917d30e..ba8ce15b27d7 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -286,6 +286,7 @@ static_assert((X2AVIC_MAX_PHYSICAL_ID & AVIC_PHYSICAL_MAX_INDEX_MASK) == X2AVIC_
 #define AVIC_HPA_MASK	~((0xFFFULL << 52) | 0xFFF)
 
 #define SVM_SEV_FEAT_DEBUG_SWAP                        BIT(5)
+#define SVM_SEV_FEAT_SNP_ACTIVE                        BIT(0)
 
 struct vmcb_seg {
 	u16 selector;
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 18c09863377b..43b8ae7b74f8 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -250,6 +250,25 @@ static void sev_unbind_asid(struct kvm *kvm, unsigned int handle)
 	sev_decommission(handle);
 }
 
+static int verify_snp_init_flags(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+	struct kvm_snp_init params;
+	int ret = 0;
+
+	if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data, sizeof(params)))
+		return -EFAULT;
+
+	if (params.flags & ~SEV_SNP_SUPPORTED_FLAGS)
+		ret = -EOPNOTSUPP;
+
+	params.flags = SEV_SNP_SUPPORTED_FLAGS;
+
+	if (copy_to_user((void __user *)(uintptr_t)argp->data, &params, sizeof(params)))
+		ret = -EFAULT;
+
+	return ret;
+}
+
 static int sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp)
 {
 	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
@@ -264,12 +283,19 @@ static int sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp)
 		return ret;
 
 	sev->active = true;
-	sev->es_active = argp->id == KVM_SEV_ES_INIT;
+	sev->es_active = (argp->id == KVM_SEV_ES_INIT || argp->id == KVM_SEV_SNP_INIT);
+	sev->snp_active = argp->id == KVM_SEV_SNP_INIT;
 	asid = sev_asid_new(sev);
 	if (asid < 0)
 		goto e_no_asid;
 	sev->asid = asid;
 
+	if (sev->snp_active) {
+		ret = verify_snp_init_flags(kvm, argp);
+		if (ret)
+			goto e_free;
+	}
+
 	init_args.probe = false;
 	ret = sev_platform_init(&init_args);
 	if (ret)
@@ -287,6 +313,7 @@ static int sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp)
 	sev_asid_free(sev);
 	sev->asid = 0;
 e_no_asid:
+	sev->snp_active = false;
 	sev->es_active = false;
 	sev->active = false;
 	return ret;
@@ -625,6 +652,10 @@ static int sev_es_sync_vmsa(struct vcpu_svm *svm)
 	if (sev_es_debug_swap_enabled)
 		save->sev_features |= SVM_SEV_FEAT_DEBUG_SWAP;
 
+	/* Enable the SEV-SNP feature */
+	if (sev_snp_guest(svm->vcpu.kvm))
+		save->sev_features |= SVM_SEV_FEAT_SNP_ACTIVE;
+
 	pr_debug("Virtual Machine Save Area (VMSA):\n");
 	print_hex_dump_debug("", DUMP_PREFIX_NONE, 16, 1, save, sizeof(*save), false);
 
@@ -1883,6 +1914,12 @@ int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
 	}
 
 	switch (sev_cmd.id) {
+	case KVM_SEV_SNP_INIT:
+		if (!sev_snp_enabled) {
+			r = -ENOTTY;
+			goto out;
+		}
+		fallthrough;
 	case KVM_SEV_ES_INIT:
 		if (!sev_es_enabled) {
 			r = -ENOTTY;
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index a3e27c82866b..07a9eb5b6ce5 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -76,6 +76,9 @@ enum {
 /* TPR and CR2 are always written before VMRUN */
 #define VMCB_ALWAYS_DIRTY_MASK	((1U << VMCB_INTR) | (1U << VMCB_CR2))
 
+/* Supported init feature flags */
+#define SEV_SNP_SUPPORTED_FLAGS		0x0
+
 struct kvm_sev_info {
 	bool active;		/* SEV enabled guest */
 	bool es_active;		/* SEV-ES enabled guest */
@@ -91,6 +94,7 @@ struct kvm_sev_info {
 	struct list_head mirror_entry; /* Use as a list entry of mirrors */
 	struct misc_cg *misc_cg; /* For misc cgroup accounting */
 	atomic_t migration_in_progress;
+	u64 snp_init_flags;
 };
 
 struct kvm_svm {
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index c3308536482b..73702e9b9d76 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1869,6 +1869,9 @@ enum sev_cmd_id {
 	/* Guest Migration Extension */
 	KVM_SEV_SEND_CANCEL,
 
+	/* SNP specific commands */
+	KVM_SEV_SNP_INIT,
+
 	KVM_SEV_NR_MAX,
 };
 
@@ -1965,6 +1968,16 @@ struct kvm_sev_receive_update_data {
 	__u32 trans_len;
 };
 
+/* enable the restricted injection */
+#define KVM_SEV_SNP_RESTRICTED_INJET   (1 << 0)
+
+/* enable the restricted injection timer */
+#define KVM_SEV_SNP_RESTRICTED_TIMER_INJET   (1 << 1)
+
+struct kvm_snp_init {
+	__u64 flags;
+};
+
 #define KVM_DEV_ASSIGN_ENABLE_IOMMU	(1 << 0)
 #define KVM_DEV_ASSIGN_PCI_2_3		(1 << 1)
 #define KVM_DEV_ASSIGN_MASK_INTX	(1 << 2)
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v11 16/35] KVM: SEV: Add KVM_SEV_SNP_LAUNCH_START command
  2023-12-30 17:23 [PATCH v11 00/35] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (14 preceding siblings ...)
  2023-12-30 17:23 ` [PATCH v11 15/35] KVM: SEV: Add KVM_SNP_INIT command Michael Roth
@ 2023-12-30 17:23 ` Michael Roth
  2023-12-30 17:23 ` [PATCH v11 17/35] KVM: Add HVA range operator Michael Roth
                   ` (18 subsequent siblings)
  34 siblings, 0 replies; 58+ messages in thread
From: Michael Roth @ 2023-12-30 17:23 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
	jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick, zhi.a.wang, Brijesh Singh

From: Brijesh Singh <brijesh.singh@amd.com>

KVM_SEV_SNP_LAUNCH_START begins the launch process for an SEV-SNP guest.
The command initializes a cryptographic digest context used to construct
the measurement of the guest. If the guest is expected to be migrated,
the command also binds a migration agent (MA) to the guest.

For more information see the SEV-SNP specification.
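
As a usage illustration (not part of the patch; the policy value is just
one valid choice: bit 16 allows SMT, which this patch requires, and bit 17
is reserved-must-be-one per the SEV-SNP firmware ABI):

  #include <stdint.h>
  #include <sys/ioctl.h>
  #include <linux/kvm.h>

  static int snp_launch_start(int vm_fd, int sev_fd)
  {
          struct kvm_sev_snp_launch_start start = {
                  .policy = 0x30000,      /* SMT allowed, no migration agent */
          };
          struct kvm_sev_cmd cmd = {
                  .id = KVM_SEV_SNP_LAUNCH_START,
                  .data = (uint64_t)(uintptr_t)&start,
                  .sev_fd = (uint32_t)sev_fd,
          };

          /* gosvw[] is zero here, i.e. no guest OS visible workarounds. */
          return ioctl(vm_fd, KVM_MEMORY_ENCRYPT_OP, &cmd);
  }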

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
[mdr: hold sev_deactivate_lock when calling SEV_CMD_SNP_DECOMMISSION]
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 .../virt/kvm/x86/amd-memory-encryption.rst    |  24 +++
 arch/x86/kvm/svm/sev.c                        | 144 +++++++++++++++++-
 arch/x86/kvm/svm/svm.h                        |   1 +
 include/uapi/linux/kvm.h                      |  10 ++
 4 files changed, 176 insertions(+), 3 deletions(-)

diff --git a/Documentation/virt/kvm/x86/amd-memory-encryption.rst b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
index b1a19c9a577a..b1beb2fe8766 100644
--- a/Documentation/virt/kvm/x86/amd-memory-encryption.rst
+++ b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
@@ -461,6 +461,30 @@ The flags bitmap is defined as::
 If the specified flags is not supported then return -EOPNOTSUPP, and the supported
 flags are returned.
 
+19. KVM_SNP_LAUNCH_START
+------------------------
+
+The KVM_SNP_LAUNCH_START command is used for creating the memory encryption
+context for the SEV-SNP guest. To create the encryption context, the user must
+provide a guest policy, a migration agent (if any) and a guest OS visible
+workarounds value as defined in the SEV-SNP specification.
+
+Parameters (in): struct kvm_sev_snp_launch_start
+
+Returns: 0 on success, -negative on error
+
+::
+
+        struct kvm_sev_snp_launch_start {
+                __u64 policy;           /* Guest policy to use. */
+                __u64 ma_uaddr;         /* userspace address of migration agent */
+                __u8 ma_en;             /* 1 if the migration agent is enabled */
+                __u8 imi_en;            /* set IMI to 1. */
+                __u8 gosvw[16];         /* guest OS visible workarounds */
+        };
+
+See the SEV-SNP specification for further detail on the launch input.
+
 References
 ==========
 
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 43b8ae7b74f8..e2f4d4bc125c 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -24,6 +24,7 @@
 #include <asm/trapnr.h>
 #include <asm/fpu/xcr.h>
 #include <asm/debugreg.h>
+#include <asm/sev.h>
 
 #include "mmu.h"
 #include "x86.h"
@@ -72,6 +73,10 @@ static bool sev_snp_enabled;
 #define AP_RESET_HOLD_NAE_EVENT		1
 #define AP_RESET_HOLD_MSR_PROTO		2
 
+/* As defined by SEV-SNP Firmware ABI, under "Guest Policy". */
+#define SNP_POLICY_MASK_SMT		BIT_ULL(16)
+#define SNP_POLICY_MASK_SINGLE_SOCKET	BIT_ULL(20)
+
 static u8 sev_enc_bit;
 static DECLARE_RWSEM(sev_deactivate_lock);
 static DEFINE_MUTEX(sev_bitmap_lock);
@@ -82,6 +87,8 @@ static unsigned int nr_asids;
 static unsigned long *sev_asid_bitmap;
 static unsigned long *sev_reclaim_asid_bitmap;
 
+static int snp_decommission_context(struct kvm *kvm);
+
 struct enc_region {
 	struct list_head list;
 	unsigned long npages;
@@ -107,12 +114,17 @@ static int sev_flush_asids(int min_asid, int max_asid)
 	down_write(&sev_deactivate_lock);
 
 	wbinvd_on_all_cpus();
-	ret = sev_guest_df_flush(&error);
+
+	if (sev_snp_enabled)
+		ret = sev_do_cmd(SEV_CMD_SNP_DF_FLUSH, NULL, &error);
+	else
+		ret = sev_guest_df_flush(&error);
 
 	up_write(&sev_deactivate_lock);
 
 	if (ret)
-		pr_err("SEV: DF_FLUSH failed, ret=%d, error=%#x\n", ret, error);
+		pr_err("SEV%s: DF_FLUSH failed, ret=%d, error=%#x\n",
+		       sev_snp_enabled ? "-SNP" : "", ret, error);
 
 	return ret;
 }
@@ -1890,6 +1902,94 @@ int sev_vm_move_enc_context_from(struct kvm *kvm, unsigned int source_fd)
 	return ret;
 }
 
+/*
+ * The guest context contains all the information, keys and metadata
+ * associated with the guest that the firmware tracks to implement SEV
+ * and SNP features. The firmware stores the guest context in a
+ * hypervisor-provided page via the SNP_GCTX_CREATE command.
+ */
+static void *snp_context_create(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+	struct sev_data_snp_addr data = {};
+	void *context;
+	int rc;
+
+	/* Allocate memory for context page */
+	context = snp_alloc_firmware_page(GFP_KERNEL_ACCOUNT);
+	if (!context)
+		return NULL;
+
+	data.gctx_paddr = __psp_pa(context);
+	rc = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_GCTX_CREATE, &data, &argp->error);
+	if (rc) {
+		snp_free_firmware_page(context);
+		return NULL;
+	}
+
+	return context;
+}
+
+static int snp_bind_asid(struct kvm *kvm, int *error)
+{
+	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+	struct sev_data_snp_activate data = {0};
+
+	data.gctx_paddr = __psp_pa(sev->snp_context);
+	data.asid   = sev_get_asid(kvm);
+	return sev_issue_cmd(kvm, SEV_CMD_SNP_ACTIVATE, &data, error);
+}
+
+static int snp_launch_start(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+	struct sev_data_snp_launch_start start = {0};
+	struct kvm_sev_snp_launch_start params;
+	int rc;
+
+	if (!sev_snp_guest(kvm))
+		return -ENOTTY;
+
+	if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data, sizeof(params)))
+		return -EFAULT;
+
+	/* Don't allow userspace to allocate memory for more than 1 SNP context. */
+	if (sev->snp_context)
+		return -EINVAL;
+
+	sev->snp_context = snp_context_create(kvm, argp);
+	if (!sev->snp_context)
+		return -ENOTTY;
+
+	if (params.policy & SNP_POLICY_MASK_SINGLE_SOCKET) {
+		pr_warn("SEV-SNP hypervisor does not support limiting guests to a single socket.");
+		return -EINVAL;
+	}
+
+	if (!(params.policy & SNP_POLICY_MASK_SMT)) {
+		pr_warn("SEV-SNP hypervisor does not support limiting guests to a single SMT thread.");
+		return -EINVAL;
+	}
+
+	start.gctx_paddr = __psp_pa(sev->snp_context);
+	start.policy = params.policy;
+	memcpy(start.gosvw, params.gosvw, sizeof(params.gosvw));
+	rc = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_LAUNCH_START, &start, &argp->error);
+	if (rc)
+		goto e_free_context;
+
+	sev->fd = argp->sev_fd;
+	rc = snp_bind_asid(kvm, &argp->error);
+	if (rc)
+		goto e_free_context;
+
+	return 0;
+
+e_free_context:
+	snp_decommission_context(kvm);
+
+	return rc;
+}
+
 int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
 {
 	struct kvm_sev_cmd sev_cmd;
@@ -1980,6 +2080,9 @@ int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
 	case KVM_SEV_RECEIVE_FINISH:
 		r = sev_receive_finish(kvm, &sev_cmd);
 		break;
+	case KVM_SEV_SNP_LAUNCH_START:
+		r = snp_launch_start(kvm, &sev_cmd);
+		break;
 	default:
 		r = -EINVAL;
 		goto out;
@@ -2172,6 +2275,33 @@ int sev_vm_copy_enc_context_from(struct kvm *kvm, unsigned int source_fd)
 	return ret;
 }
 
+static int snp_decommission_context(struct kvm *kvm)
+{
+	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+	struct sev_data_snp_addr data = {};
+	int ret;
+
+	/* If context is not created then do nothing */
+	if (!sev->snp_context)
+		return 0;
+
+	data.gctx_paddr = __sme_pa(sev->snp_context);
+	down_write(&sev_deactivate_lock);
+	ret = sev_do_cmd(SEV_CMD_SNP_DECOMMISSION, &data, NULL);
+	if (WARN_ONCE(ret, "failed to release guest context")) {
+		up_write(&sev_deactivate_lock);
+		return ret;
+	}
+
+	up_write(&sev_deactivate_lock);
+
+	/* free the context page now */
+	snp_free_firmware_page(sev->snp_context);
+	sev->snp_context = NULL;
+
+	return 0;
+}
+
 void sev_vm_destroy(struct kvm *kvm)
 {
 	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
@@ -2213,7 +2343,15 @@ void sev_vm_destroy(struct kvm *kvm)
 		}
 	}
 
-	sev_unbind_asid(kvm, sev->handle);
+	if (sev_snp_guest(kvm)) {
+		if (snp_decommission_context(kvm)) {
+			WARN_ONCE(1, "Failed to free SNP guest context, leaking asid!\n");
+			return;
+		}
+	} else {
+		sev_unbind_asid(kvm, sev->handle);
+	}
+
 	sev_asid_free(sev);
 }
 
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 07a9eb5b6ce5..9c633173b779 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -95,6 +95,7 @@ struct kvm_sev_info {
 	struct misc_cg *misc_cg; /* For misc cgroup accounting */
 	atomic_t migration_in_progress;
 	u64 snp_init_flags;
+	void *snp_context;      /* SNP guest context page */
 };
 
 struct kvm_svm {
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 73702e9b9d76..9fe36408d55b 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1871,6 +1871,7 @@ enum sev_cmd_id {
 
 	/* SNP specific commands */
 	KVM_SEV_SNP_INIT,
+	KVM_SEV_SNP_LAUNCH_START,
 
 	KVM_SEV_NR_MAX,
 };
@@ -1978,6 +1979,15 @@ struct kvm_snp_init {
 	__u64 flags;
 };
 
+struct kvm_sev_snp_launch_start {
+	__u64 policy;
+	__u64 ma_uaddr;
+	__u8 ma_en;
+	__u8 imi_en;
+	__u8 gosvw[16];
+	__u8 pad[6];
+};
+
 #define KVM_DEV_ASSIGN_ENABLE_IOMMU	(1 << 0)
 #define KVM_DEV_ASSIGN_PCI_2_3		(1 << 1)
 #define KVM_DEV_ASSIGN_MASK_INTX	(1 << 2)
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v11 17/35] KVM: Add HVA range operator
  2023-12-30 17:23 [PATCH v11 00/35] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (15 preceding siblings ...)
  2023-12-30 17:23 ` [PATCH v11 16/35] KVM: SEV: Add KVM_SEV_SNP_LAUNCH_START command Michael Roth
@ 2023-12-30 17:23 ` Michael Roth
  2023-12-30 17:23 ` [PATCH v11 18/35] KVM: SEV: Add KVM_SEV_SNP_LAUNCH_UPDATE command Michael Roth
                   ` (17 subsequent siblings)
  34 siblings, 0 replies; 58+ messages in thread
From: Michael Roth @ 2023-12-30 17:23 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
	jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick, zhi.a.wang, Vishal Annapurve

From: Vishal Annapurve <vannapurve@google.com>

Introduce an HVA range operator so that other KVM subsystems can apply
an operation across a range of HVAs.
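
As a hedged in-kernel sketch of a caller (the handler below is
illustrative; the operator and its callback signature are what this patch
adds):

  /* Tally how many GFNs back a given HVA range across all memslots. */
  static int count_gfns_handler(struct kvm *kvm, struct kvm_gfn_range *range,
                                void *data)
  {
          unsigned long *total = data;

          *total += range->end - range->start;
          return 0;       /* a non-zero return aborts the walk */
  }

  static unsigned long count_gfns(struct kvm *kvm, unsigned long hva,
                                  unsigned long len)
  {
          unsigned long total = 0;

          kvm_vm_do_hva_range_op(kvm, hva, hva + len, count_gfns_handler, &total);
          return total;
  }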

Signed-off-by: Vishal Annapurve <vannapurve@google.com>
[mdr: minor checkpatch alignment fixups]
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 include/linux/kvm_host.h |  6 +++++
 virt/kvm/kvm_main.c      | 49 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 55 insertions(+)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index a2a8331fbb94..bc3a468e97e3 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1429,6 +1429,12 @@ void kvm_mmu_invalidate_range_add(struct kvm *kvm, gfn_t start, gfn_t end);
 void kvm_mmu_invalidate_end(struct kvm *kvm);
 bool kvm_mmu_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range);
 
+typedef int (*kvm_hva_range_op_t)(struct kvm *kvm,
+				struct kvm_gfn_range *range, void *data);
+
+int kvm_vm_do_hva_range_op(struct kvm *kvm, unsigned long hva_start,
+			   unsigned long hva_end, kvm_hva_range_op_t handler, void *data);
+
 long kvm_arch_dev_ioctl(struct file *filp,
 			unsigned int ioctl, unsigned long arg);
 long kvm_arch_vcpu_ioctl(struct file *filp,
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 4fd0fb0044f5..03243a7ece08 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -667,6 +667,55 @@ static __always_inline kvm_mn_ret_t __kvm_handle_hva_range(struct kvm *kvm,
 	return r;
 }
 
+int kvm_vm_do_hva_range_op(struct kvm *kvm, unsigned long hva_start,
+			   unsigned long hva_end, kvm_hva_range_op_t handler, void *data)
+{
+	int ret = 0;
+	struct kvm_gfn_range gfn_range;
+	struct kvm_memory_slot *slot;
+	struct kvm_memslots *slots;
+	int i, idx;
+
+	if (WARN_ON_ONCE(hva_end <= hva_start))
+		return -EINVAL;
+
+	idx = srcu_read_lock(&kvm->srcu);
+
+	for (i = 0; i < kvm_arch_nr_memslot_as_ids(kvm); i++) {
+		struct interval_tree_node *node;
+
+		slots = __kvm_memslots(kvm, i);
+		kvm_for_each_memslot_in_hva_range(node, slots,
+						  hva_start, hva_end - 1) {
+			unsigned long start, end;
+
+			slot = container_of(node, struct kvm_memory_slot,
+					    hva_node[slots->node_idx]);
+			start = max(hva_start, slot->userspace_addr);
+			end = min(hva_end, slot->userspace_addr +
+						  (slot->npages << PAGE_SHIFT));
+
+			/*
+			 * {gfn(page) | page intersects with [hva_start, hva_end)} =
+			 * {gfn_start, gfn_start+1, ..., gfn_end-1}.
+			 */
+			gfn_range.start = hva_to_gfn_memslot(start, slot);
+			gfn_range.end = hva_to_gfn_memslot(end + PAGE_SIZE - 1, slot);
+			gfn_range.slot = slot;
+
+			ret = handler(kvm, &gfn_range, data);
+			if (ret)
+				goto e_ret;
+		}
+	}
+
+e_ret:
+	srcu_read_unlock(&kvm->srcu, idx);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(kvm_vm_do_hva_range_op);
+
 static __always_inline int kvm_handle_hva_range(struct mmu_notifier *mn,
 						unsigned long start,
 						unsigned long end,
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v11 18/35] KVM: SEV: Add KVM_SEV_SNP_LAUNCH_UPDATE command
  2023-12-30 17:23 [PATCH v11 00/35] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (16 preceding siblings ...)
  2023-12-30 17:23 ` [PATCH v11 17/35] KVM: Add HVA range operator Michael Roth
@ 2023-12-30 17:23 ` Michael Roth
  2024-01-10 15:45   ` Sean Christopherson
  2024-03-18 21:02   ` Peter Gonda
  2023-12-30 17:23 ` [PATCH v11 19/35] KVM: SEV: Add KVM_SEV_SNP_LAUNCH_FINISH command Michael Roth
                   ` (16 subsequent siblings)
  34 siblings, 2 replies; 58+ messages in thread
From: Michael Roth @ 2023-12-30 17:23 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
	jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick, zhi.a.wang, Brijesh Singh

From: Brijesh Singh <brijesh.singh@amd.com>

The KVM_SEV_SNP_LAUNCH_UPDATE command can be used to insert data into
the guest's memory. The data is encrypted with the cryptographic context
created with KVM_SEV_SNP_LAUNCH_START.

In addition to inserting data, it can insert two special pages into the
guest's memory: the secrets page and the CPUID page.

When the guest is terminated, reclaim the guest pages added to the RMP
table. If the reclaim fails, the pages are no longer safe to release
back to the system, so leak them instead.

For more information see the SEV-SNP specification.
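
As a usage illustration (not part of the patch), encrypting and measuring
a single data page; 'uaddr' is the HVA of the memslot-backed region whose
current contents seed the encrypted page:

  #include <stdint.h>
  #include <sys/ioctl.h>
  #include <linux/kvm.h>

  static int snp_launch_update(int vm_fd, int sev_fd, uint64_t gfn,
                               void *uaddr, uint32_t len)
  {
          struct kvm_sev_snp_launch_update update = {
                  .start_gfn = gfn,
                  .uaddr = (uint64_t)(uintptr_t)uaddr,
                  .len = len,
                  .page_type = KVM_SEV_SNP_PAGE_TYPE_NORMAL,
          };
          struct kvm_sev_cmd cmd = {
                  .id = KVM_SEV_SNP_LAUNCH_UPDATE,
                  .data = (uint64_t)(uintptr_t)&update,
                  .sev_fd = (uint32_t)sev_fd,
          };

          return ioctl(vm_fd, KVM_MEMORY_ENCRYPT_OP, &cmd);
  }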

Co-developed-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
---
 .../virt/kvm/x86/amd-memory-encryption.rst    |  28 +++
 arch/x86/kvm/svm/sev.c                        | 181 ++++++++++++++++++
 include/uapi/linux/kvm.h                      |  19 ++
 3 files changed, 228 insertions(+)

diff --git a/Documentation/virt/kvm/x86/amd-memory-encryption.rst b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
index b1beb2fe8766..d4325b26724c 100644
--- a/Documentation/virt/kvm/x86/amd-memory-encryption.rst
+++ b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
@@ -485,6 +485,34 @@ Returns: 0 on success, -negative on error
 
 See the SEV-SNP specification for further detail on the launch input.
 
+20. KVM_SNP_LAUNCH_UPDATE
+-------------------------
+
+The KVM_SNP_LAUNCH_UPDATE command is used for encrypting a memory region. It also
+calculates a measurement of the memory contents. The measurement is a signature
+of the memory contents that can be sent to the guest owner as an attestation
+that the memory was encrypted correctly by the firmware.
+
+Parameters (in): struct kvm_sev_snp_launch_update
+
+Returns: 0 on success, -negative on error
+
+::
+
+        struct kvm_sev_snp_launch_update {
+                __u64 start_gfn;        /* Guest page number to start from. */
+                __u64 uaddr;            /* userspace address of memory to be encrypted */
+                __u32 len;              /* length of memory region */
+                __u8 imi_page;          /* 1 if memory is part of the IMI */
+                __u8 page_type;         /* page type */
+                __u8 vmpl3_perms;       /* VMPL3 permission mask */
+                __u8 vmpl2_perms;       /* VMPL2 permission mask */
+                __u8 vmpl1_perms;       /* VMPL1 permission mask */
+        };
+
+See the SEV-SNP spec for further details on how to build the VMPL permission
+mask and page type.
+
 References
 ==========
 
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index e2f4d4bc125c..d60209e6e68b 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -245,6 +245,36 @@ static void sev_decommission(unsigned int handle)
 	sev_guest_decommission(&decommission, NULL);
 }
 
+static int snp_page_reclaim(u64 pfn)
+{
+	struct sev_data_snp_page_reclaim data = {0};
+	int err, rc;
+
+	data.paddr = __sme_set(pfn << PAGE_SHIFT);
+	rc = sev_do_cmd(SEV_CMD_SNP_PAGE_RECLAIM, &data, &err);
+	if (rc) {
+		/*
+		 * If the reclaim failed, then page is no longer safe
+		 * to use.
+		 */
+		snp_leak_pages(pfn, 1);
+	}
+
+	return rc;
+}
+
+static int host_rmp_make_shared(u64 pfn, enum pg_level level, bool leak)
+{
+	int rc;
+
+	rc = rmp_make_shared(pfn, level);
+	if (rc && leak)
+		snp_leak_pages(pfn,
+			       page_level_size(level) >> PAGE_SHIFT);
+
+	return rc;
+}
+
 static void sev_unbind_asid(struct kvm *kvm, unsigned int handle)
 {
 	struct sev_data_deactivate deactivate;
@@ -1990,6 +2020,154 @@ static int snp_launch_start(struct kvm *kvm, struct kvm_sev_cmd *argp)
 	return rc;
 }
 
+static int snp_launch_update_gfn_handler(struct kvm *kvm,
+					 struct kvm_gfn_range *range,
+					 void *opaque)
+{
+	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+	struct kvm_memory_slot *memslot = range->slot;
+	struct sev_data_snp_launch_update data = {0};
+	struct kvm_sev_snp_launch_update params;
+	struct kvm_sev_cmd *argp = opaque;
+	int *error = &argp->error;
+	int i, n = 0, ret = 0;
+	unsigned long npages;
+	kvm_pfn_t *pfns;
+	gfn_t gfn;
+
+	if (!kvm_slot_can_be_private(memslot)) {
+		pr_err("SEV-SNP requires private memory support via guest_memfd.\n");
+		return -EINVAL;
+	}
+
+	if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data, sizeof(params))) {
+		pr_err("Failed to copy user parameters for SEV-SNP launch.\n");
+		return -EFAULT;
+	}
+
+	data.gctx_paddr = __psp_pa(sev->snp_context);
+
+	npages = range->end - range->start;
+	pfns = kvmalloc_array(npages, sizeof(*pfns), GFP_KERNEL_ACCOUNT);
+	if (!pfns)
+		return -ENOMEM;
+
+	pr_debug("%s: GFN range 0x%llx-0x%llx, type %d\n", __func__,
+		 range->start, range->end, params.page_type);
+
+	for (gfn = range->start, i = 0; gfn < range->end; gfn++, i++) {
+		int order, level;
+		bool assigned;
+		void *kvaddr;
+
+		ret = __kvm_gmem_get_pfn(kvm, memslot, gfn, &pfns[i], &order, false);
+		if (ret)
+			goto e_release;
+
+		n++;
+		ret = snp_lookup_rmpentry((u64)pfns[i], &assigned, &level);
+		if (ret || assigned) {
+			pr_err("Failed to ensure GFN 0x%llx is in initial shared state, ret: %d, assigned: %d\n", gfn, ret, assigned);
+			ret = -EFAULT;
+			goto e_release;
+		}
+
+		kvaddr = pfn_to_kaddr(pfns[i]);
+		if (!virt_addr_valid(kvaddr)) {
+			pr_err("Invalid HVA 0x%llx for GFN 0x%llx\n", (uint64_t)kvaddr, gfn);
+			ret = -EINVAL;
+			goto e_release;
+		}
+
+		ret = kvm_read_guest_page(kvm, gfn, kvaddr, 0, PAGE_SIZE);
+		if (ret) {
+			pr_err("Guest read failed, ret: 0x%x\n", ret);
+			goto e_release;
+		}
+
+		ret = rmp_make_private(pfns[i], gfn << PAGE_SHIFT, PG_LEVEL_4K,
+				       sev_get_asid(kvm), true);
+		if (ret) {
+			ret = -EFAULT;
+			goto e_release;
+		}
+
+		data.address = __sme_set(pfns[i] << PAGE_SHIFT);
+		data.page_size = PG_LEVEL_TO_RMP(PG_LEVEL_4K);
+		data.page_type = params.page_type;
+		data.vmpl3_perms = params.vmpl3_perms;
+		data.vmpl2_perms = params.vmpl2_perms;
+		data.vmpl1_perms = params.vmpl1_perms;
+		ret = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_LAUNCH_UPDATE,
+				      &data, error);
+		if (ret) {
+			pr_err("SEV-SNP launch update failed, ret: 0x%x, fw_error: 0x%x\n",
+			       ret, *error);
+			snp_page_reclaim(pfns[i]);
+
+			/*
+			 * When invalid CPUID function entries are detected, the firmware
+			 * corrects these entries for debugging purposes and leaves the
+			 * page unencrypted so it can be provided to users for debugging
+			 * and error-reporting.
+			 *
+			 * Copy the corrected CPUID page back to shared memory so
+			 * userspace can retrieve this information.
+			 */
+			if (params.page_type == SNP_PAGE_TYPE_CPUID &&
+			    *error == SEV_RET_INVALID_PARAM) {
+				int ret;
+
+				host_rmp_make_shared(pfns[i], PG_LEVEL_4K, true);
+
+				ret = kvm_write_guest_page(kvm, gfn, kvaddr, 0, PAGE_SIZE);
+				if (ret)
+					pr_err("Failed to write CPUID page back to userspace, ret: 0x%x\n",
+					       ret);
+			}
+
+			goto e_release;
+		}
+	}
+
+e_release:
+	/* Content of memory is updated, mark pages dirty */
+	for (i = 0; i < n; i++) {
+		set_page_dirty(pfn_to_page(pfns[i]));
+		mark_page_accessed(pfn_to_page(pfns[i]));
+
+		/*
+		 * On error, update the RMP entry to change page ownership
+		 * back to the hypervisor.
+		 */
+		if (ret)
+			host_rmp_make_shared(pfns[i], PG_LEVEL_4K, true);
+
+		put_page(pfn_to_page(pfns[i]));
+	}
+
+	kvfree(pfns);
+	return ret;
+}
+
+static int snp_launch_update(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+	struct kvm_sev_snp_launch_update params;
+
+	if (!sev_snp_guest(kvm))
+		return -ENOTTY;
+
+	if (!sev->snp_context)
+		return -EINVAL;
+
+	if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data, sizeof(params)))
+		return -EFAULT;
+
+	return kvm_vm_do_hva_range_op(kvm, params.uaddr, params.uaddr + params.len,
+				      snp_launch_update_gfn_handler, argp);
+}
+
 int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
 {
 	struct kvm_sev_cmd sev_cmd;
@@ -2083,6 +2261,9 @@ int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
 	case KVM_SEV_SNP_LAUNCH_START:
 		r = snp_launch_start(kvm, &sev_cmd);
 		break;
+	case KVM_SEV_SNP_LAUNCH_UPDATE:
+		r = snp_launch_update(kvm, &sev_cmd);
+		break;
 	default:
 		r = -EINVAL;
 		goto out;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 9fe36408d55b..6e6e3a478022 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1872,6 +1872,7 @@ enum sev_cmd_id {
 	/* SNP specific commands */
 	KVM_SEV_SNP_INIT,
 	KVM_SEV_SNP_LAUNCH_START,
+	KVM_SEV_SNP_LAUNCH_UPDATE,
 
 	KVM_SEV_NR_MAX,
 };
@@ -1988,6 +1989,24 @@ struct kvm_sev_snp_launch_start {
 	__u8 pad[6];
 };
 
+#define KVM_SEV_SNP_PAGE_TYPE_NORMAL		0x1
+#define KVM_SEV_SNP_PAGE_TYPE_VMSA		0x2
+#define KVM_SEV_SNP_PAGE_TYPE_ZERO		0x3
+#define KVM_SEV_SNP_PAGE_TYPE_UNMEASURED	0x4
+#define KVM_SEV_SNP_PAGE_TYPE_SECRETS		0x5
+#define KVM_SEV_SNP_PAGE_TYPE_CPUID		0x6
+
+struct kvm_sev_snp_launch_update {
+	__u64 start_gfn;
+	__u64 uaddr;
+	__u32 len;
+	__u8 imi_page;
+	__u8 page_type;
+	__u8 vmpl3_perms;
+	__u8 vmpl2_perms;
+	__u8 vmpl1_perms;
+};
+
 #define KVM_DEV_ASSIGN_ENABLE_IOMMU	(1 << 0)
 #define KVM_DEV_ASSIGN_PCI_2_3		(1 << 1)
 #define KVM_DEV_ASSIGN_MASK_INTX	(1 << 2)
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v11 19/35] KVM: SEV: Add KVM_SEV_SNP_LAUNCH_FINISH command
  2023-12-30 17:23 [PATCH v11 00/35] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (17 preceding siblings ...)
  2023-12-30 17:23 ` [PATCH v11 18/35] KVM: SEV: Add KVM_SEV_SNP_LAUNCH_UPDATE command Michael Roth
@ 2023-12-30 17:23 ` Michael Roth
  2023-12-30 17:23 ` [PATCH v11 20/35] KVM: SEV: Add support to handle GHCB GPA register VMGEXIT Michael Roth
                   ` (15 subsequent siblings)
  34 siblings, 0 replies; 58+ messages in thread
From: Michael Roth @ 2023-12-30 17:23 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
	jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick, zhi.a.wang, Brijesh Singh, Harald Hoyer

From: Brijesh Singh <brijesh.singh@amd.com>

The KVM_SEV_SNP_LAUNCH_FINISH command finalizes the cryptographic digest
and stores it as the measurement of the guest at launch.

While finalizing the launch flow, it also issues the LAUNCH_UPDATE command
to encrypt the VMSA pages.

For an SNP guest, the VMSA is added to the RMP table as a guest-owned
page and is also removed from the kernel direct map, so flush it only
later, after it has been transitioned back to hypervisor state and
restored in the direct map.
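
As a usage illustration (not part of the patch; assumes the
KVM_SEV_SNP_LAUNCH_FINISH command id added to enum sev_cmd_id alongside
this patch), finalizing the launch with no ID block and caller-supplied
HOST_DATA for the attestation report:

  #include <stdint.h>
  #include <string.h>
  #include <sys/ioctl.h>
  #include <linux/kvm.h>

  static int snp_launch_finish(int vm_fd, int sev_fd, const uint8_t host_data[32])
  {
          struct kvm_sev_snp_launch_finish finish = { .id_block_en = 0 };
          struct kvm_sev_cmd cmd = {
                  .id = KVM_SEV_SNP_LAUNCH_FINISH,
                  .data = (uint64_t)(uintptr_t)&finish,
                  .sev_fd = (uint32_t)sev_fd,
          };

          memcpy(finish.host_data, host_data, sizeof(finish.host_data));
          return ioctl(vm_fd, KVM_MEMORY_ENCRYPT_OP, &cmd);
  }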

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Harald Hoyer <harald@profian.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
[mdr: always measure BSP first to get consistent launch measurements]
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 .../virt/kvm/x86/amd-memory-encryption.rst    |  24 +++
 arch/x86/kvm/svm/sev.c                        | 146 ++++++++++++++++++
 include/uapi/linux/kvm.h                      |  14 ++
 3 files changed, 184 insertions(+)

diff --git a/Documentation/virt/kvm/x86/amd-memory-encryption.rst b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
index d4325b26724c..b89634cfcc06 100644
--- a/Documentation/virt/kvm/x86/amd-memory-encryption.rst
+++ b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
@@ -513,6 +513,30 @@ Returns: 0 on success, -negative on error
 See the SEV-SNP spec for further details on how to build the VMPL permission
 mask and page type.
 
+21. KVM_SNP_LAUNCH_FINISH
+-------------------------
+
+After completion of the SNP guest launch flow, the KVM_SNP_LAUNCH_FINISH command
+can be issued to make the guest ready for execution.
+
+Parameters (in): struct kvm_sev_snp_launch_finish
+
+Returns: 0 on success, -negative on error
+
+::
+
+        struct kvm_sev_snp_launch_finish {
+                __u64 id_block_uaddr;
+                __u64 id_auth_uaddr;
+                __u8 id_block_en;
+                __u8 auth_key_en;
+                __u8 host_data[32];
+                __u8 pad[6];
+        };
+
+
+See the SEV-SNP specification for further details on the launch finish input parameters.
+
 References
 ==========
 
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index d60209e6e68b..ada40a79b2f7 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -77,6 +77,8 @@ static bool sev_snp_enabled;
 #define SNP_POLICY_MASK_SMT		BIT_ULL(16)
 #define SNP_POLICY_MASK_SINGLE_SOCKET	BIT_ULL(20)
 
+#define INITIAL_VMSA_GPA 0xFFFFFFFFF000
+
 static u8 sev_enc_bit;
 static DECLARE_RWSEM(sev_deactivate_lock);
 static DEFINE_MUTEX(sev_bitmap_lock);
@@ -749,7 +751,29 @@ static int sev_launch_update_vmsa(struct kvm *kvm, struct kvm_sev_cmd *argp)
 	if (!sev_es_guest(kvm))
 		return -ENOTTY;
 
+	/* Handle boot vCPU first to ensure consistent measurement of initial state. */
+	kvm_for_each_vcpu(i, vcpu, kvm) {
+		if (vcpu->vcpu_id != 0)
+			continue;
+
+		ret = mutex_lock_killable(&vcpu->mutex);
+		if (ret)
+			return ret;
+
+		ret = __sev_launch_update_vmsa(kvm, vcpu, &argp->error);
+
+		mutex_unlock(&vcpu->mutex);
+		if (ret)
+			return ret;
+
+		break;
+	}
+
+	/* Handle remaining vCPUs. */
 	kvm_for_each_vcpu(i, vcpu, kvm) {
+		if (vcpu->vcpu_id == 0)
+			continue;
+
 		ret = mutex_lock_killable(&vcpu->mutex);
 		if (ret)
 			return ret;
@@ -2168,6 +2192,109 @@ static int snp_launch_update(struct kvm *kvm, struct kvm_sev_cmd *argp)
 				      snp_launch_update_gfn_handler, argp);
 }
 
+static int snp_launch_update_vmsa(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+	struct sev_data_snp_launch_update data = {};
+	struct kvm_vcpu *vcpu;
+	unsigned long i;
+	int ret;
+
+	data.gctx_paddr = __psp_pa(sev->snp_context);
+	data.page_type = SNP_PAGE_TYPE_VMSA;
+
+	kvm_for_each_vcpu(i, vcpu, kvm) {
+		struct vcpu_svm *svm = to_svm(vcpu);
+		u64 pfn = __pa(svm->sev_es.vmsa) >> PAGE_SHIFT;
+
+		/* Perform some pre-encryption checks against the VMSA */
+		ret = sev_es_sync_vmsa(svm);
+		if (ret)
+			return ret;
+
+		/* Transition the VMSA page to a firmware state. */
+		ret = rmp_make_private(pfn, INITIAL_VMSA_GPA, PG_LEVEL_4K, sev->asid, true);
+		if (ret)
+			return ret;
+
+		/* Issue the SNP command to encrypt the VMSA */
+		data.address = __sme_pa(svm->sev_es.vmsa);
+		ret = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_LAUNCH_UPDATE,
+				      &data, &argp->error);
+		if (ret) {
+			snp_page_reclaim(pfn);
+			return ret;
+		}
+
+		svm->vcpu.arch.guest_state_protected = true;
+	}
+
+	return 0;
+}
+
+static int snp_launch_finish(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+	struct kvm_sev_snp_launch_finish params;
+	struct sev_data_snp_launch_finish *data;
+	void *id_block = NULL, *id_auth = NULL;
+	int ret;
+
+	if (!sev_snp_guest(kvm))
+		return -ENOTTY;
+
+	if (!sev->snp_context)
+		return -EINVAL;
+
+	if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data, sizeof(params)))
+		return -EFAULT;
+
+	/* Measure all vCPUs using LAUNCH_UPDATE before finalizing the launch flow. */
+	ret = snp_launch_update_vmsa(kvm, argp);
+	if (ret)
+		return ret;
+
+	data = kzalloc(sizeof(*data), GFP_KERNEL_ACCOUNT);
+	if (!data)
+		return -ENOMEM;
+
+	if (params.id_block_en) {
+		id_block = psp_copy_user_blob(params.id_block_uaddr, KVM_SEV_SNP_ID_BLOCK_SIZE);
+		if (IS_ERR(id_block)) {
+			ret = PTR_ERR(id_block);
+			goto e_free;
+		}
+
+		data->id_block_en = 1;
+		data->id_block_paddr = __sme_pa(id_block);
+
+		id_auth = psp_copy_user_blob(params.id_auth_uaddr, KVM_SEV_SNP_ID_AUTH_SIZE);
+		if (IS_ERR(id_auth)) {
+			ret = PTR_ERR(id_auth);
+			goto e_free_id_block;
+		}
+
+		data->id_auth_paddr = __sme_pa(id_auth);
+
+		if (params.auth_key_en)
+			data->auth_key_en = 1;
+	}
+
+	memcpy(data->host_data, params.host_data, KVM_SEV_SNP_FINISH_DATA_SIZE);
+	data->gctx_paddr = __psp_pa(sev->snp_context);
+	ret = sev_issue_cmd(kvm, SEV_CMD_SNP_LAUNCH_FINISH, data, &argp->error);
+
+	kfree(id_auth);
+
+e_free_id_block:
+	kfree(id_block);
+
+e_free:
+	kfree(data);
+
+	return ret;
+}
+
 int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
 {
 	struct kvm_sev_cmd sev_cmd;
@@ -2264,6 +2391,9 @@ int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
 	case KVM_SEV_SNP_LAUNCH_UPDATE:
 		r = snp_launch_update(kvm, &sev_cmd);
 		break;
+	case KVM_SEV_SNP_LAUNCH_FINISH:
+		r = snp_launch_finish(kvm, &sev_cmd);
+		break;
 	default:
 		r = -EINVAL;
 		goto out;
@@ -2736,11 +2866,27 @@ void sev_free_vcpu(struct kvm_vcpu *vcpu)
 
 	svm = to_svm(vcpu);
 
+	/*
+	 * If it's an SNP guest, the VMSA was added to the RMP table as
+	 * a guest-owned page. Transition the page to hypervisor state
+	 * before releasing it back to the system.
+	 * The page is also removed from the kernel direct map, so flush it
+	 * later, after it has been transitioned back to hypervisor state
+	 * and restored in the direct map.
+	 */
+	if (sev_snp_guest(vcpu->kvm)) {
+		u64 pfn = __pa(svm->sev_es.vmsa) >> PAGE_SHIFT;
+
+		if (host_rmp_make_shared(pfn, PG_LEVEL_4K, true))
+			goto skip_vmsa_free;
+	}
+
 	if (vcpu->arch.guest_state_protected)
 		sev_flush_encrypted_page(vcpu, svm->sev_es.vmsa);
 
 	__free_page(virt_to_page(svm->sev_es.vmsa));
 
+skip_vmsa_free:
 	if (svm->sev_es.ghcb_sa_free)
 		kvfree(svm->sev_es.ghcb_sa);
 }
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 6e6e3a478022..5218075fe1f4 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1873,6 +1873,7 @@ enum sev_cmd_id {
 	KVM_SEV_SNP_INIT,
 	KVM_SEV_SNP_LAUNCH_START,
 	KVM_SEV_SNP_LAUNCH_UPDATE,
+	KVM_SEV_SNP_LAUNCH_FINISH,
 
 	KVM_SEV_NR_MAX,
 };
@@ -2007,6 +2008,19 @@ struct kvm_sev_snp_launch_update {
 	__u8 vmpl1_perms;
 };
 
+#define KVM_SEV_SNP_ID_BLOCK_SIZE	96
+#define KVM_SEV_SNP_ID_AUTH_SIZE	4096
+#define KVM_SEV_SNP_FINISH_DATA_SIZE	32
+
+struct kvm_sev_snp_launch_finish {
+	__u64 id_block_uaddr;
+	__u64 id_auth_uaddr;
+	__u8 id_block_en;
+	__u8 auth_key_en;
+	__u8 host_data[KVM_SEV_SNP_FINISH_DATA_SIZE];
+	__u8 pad[6];
+};
+
 #define KVM_DEV_ASSIGN_ENABLE_IOMMU	(1 << 0)
 #define KVM_DEV_ASSIGN_PCI_2_3		(1 << 1)
 #define KVM_DEV_ASSIGN_MASK_INTX	(1 << 2)
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v11 20/35] KVM: SEV: Add support to handle GHCB GPA register VMGEXIT
  2023-12-30 17:23 [PATCH v11 00/35] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (18 preceding siblings ...)
  2023-12-30 17:23 ` [PATCH v11 19/35] KVM: SEV: Add KVM_SEV_SNP_LAUNCH_FINISH command Michael Roth
@ 2023-12-30 17:23 ` Michael Roth
  2023-12-30 17:23 ` [PATCH v11 21/35] KVM: SEV: Add support to handle MSR based Page State Change VMGEXIT Michael Roth
                   ` (14 subsequent siblings)
  34 siblings, 0 replies; 58+ messages in thread
From: Michael Roth @ 2023-12-30 17:23 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
	jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick, zhi.a.wang, Brijesh Singh

From: Brijesh Singh <brijesh.singh@amd.com>

SEV-SNP guests are required to perform GHCB GPA registration. Before
using a GHCB GPA for a vCPU for the first time, a guest must register
the vCPU's GHCB GPA. If the hypervisor can work with the guest-requested
GPA, it must respond back with the same GPA; otherwise, it returns -1.

On VMEXIT, verify that the GHCB GPA matches the registered value. If a
mismatch is detected, abort the guest.
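
For reference, the request encoding is equivalent to the
GHCB_MSR_REG_GPA_REQ_VAL() helper; a guest-side sketch using the
definitions from this patch (u64 stands in for the kernel type):

  /* GHCBData[63:12] = GFN of the GHCB, GHCBInfo[11:0] = 0x012 */
  static u64 ghcb_reg_gpa_req(u64 ghcb_gpa)
  {
          return GHCB_MSR_REG_GPA_REQ |
                 (((ghcb_gpa >> 12) & GHCB_MSR_GPA_VALUE_MASK) <<
                  GHCB_MSR_GPA_VALUE_POS);
  }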

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 arch/x86/include/asm/sev-common.h |  8 ++++++++
 arch/x86/kvm/svm/sev.c            | 27 +++++++++++++++++++++++++++
 arch/x86/kvm/svm/svm.h            |  7 +++++++
 3 files changed, 42 insertions(+)

diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
index 5a8246dd532f..1006bfffe07a 100644
--- a/arch/x86/include/asm/sev-common.h
+++ b/arch/x86/include/asm/sev-common.h
@@ -59,6 +59,14 @@
 #define GHCB_MSR_AP_RESET_HOLD_RESULT_POS	12
 #define GHCB_MSR_AP_RESET_HOLD_RESULT_MASK	GENMASK_ULL(51, 0)
 
+/* Preferred GHCB GPA Request */
+#define GHCB_MSR_PREF_GPA_REQ		0x010
+#define GHCB_MSR_GPA_VALUE_POS		12
+#define GHCB_MSR_GPA_VALUE_MASK		GENMASK_ULL(51, 0)
+
+#define GHCB_MSR_PREF_GPA_RESP		0x011
+#define GHCB_MSR_PREF_GPA_NONE		0xfffffffffffff
+
 /* GHCB GPA Register */
 #define GHCB_MSR_REG_GPA_REQ		0x012
 #define GHCB_MSR_REG_GPA_REQ_VAL(v)			\
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index ada40a79b2f7..0b8837e21705 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3353,6 +3353,26 @@ static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
 		set_ghcb_msr_bits(svm, GHCB_MSR_HV_FT_RESP,
 				  GHCB_MSR_INFO_MASK, GHCB_MSR_INFO_POS);
 		break;
+	case GHCB_MSR_PREF_GPA_REQ:
+		set_ghcb_msr_bits(svm, GHCB_MSR_PREF_GPA_NONE, GHCB_MSR_GPA_VALUE_MASK,
+				  GHCB_MSR_GPA_VALUE_POS);
+		set_ghcb_msr_bits(svm, GHCB_MSR_PREF_GPA_RESP, GHCB_MSR_INFO_MASK,
+				  GHCB_MSR_INFO_POS);
+		break;
+	case GHCB_MSR_REG_GPA_REQ: {
+		u64 gfn;
+
+		gfn = get_ghcb_msr_bits(svm, GHCB_MSR_GPA_VALUE_MASK,
+					GHCB_MSR_GPA_VALUE_POS);
+
+		svm->sev_es.ghcb_registered_gpa = gfn_to_gpa(gfn);
+
+		set_ghcb_msr_bits(svm, gfn, GHCB_MSR_GPA_VALUE_MASK,
+				  GHCB_MSR_GPA_VALUE_POS);
+		set_ghcb_msr_bits(svm, GHCB_MSR_REG_GPA_RESP, GHCB_MSR_INFO_MASK,
+				  GHCB_MSR_INFO_POS);
+		break;
+	}
 	case GHCB_MSR_TERM_REQ: {
 		u64 reason_set, reason_code;
 
@@ -3416,6 +3436,13 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
 	trace_kvm_vmgexit_enter(vcpu->vcpu_id, svm->sev_es.ghcb);
 
 	sev_es_sync_from_ghcb(svm);
+
+	/* SEV-SNP guests require that the GHCB GPA be registered */
+	if (sev_snp_guest(svm->vcpu.kvm) && !ghcb_gpa_is_registered(svm, ghcb_gpa)) {
+		vcpu_unimpl(&svm->vcpu, "vmgexit: GHCB GPA [%#llx] is not registered.\n", ghcb_gpa);
+		return -EINVAL;
+	}
+
 	ret = sev_es_validate_vmgexit(svm);
 	if (ret)
 		return ret;
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 9c633173b779..2bee24017bae 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -211,6 +211,8 @@ struct vcpu_sev_es_state {
 	u32 ghcb_sa_len;
 	bool ghcb_sa_sync;
 	bool ghcb_sa_free;
+
+	u64 ghcb_registered_gpa;
 };
 
 struct vcpu_svm {
@@ -354,6 +356,11 @@ static __always_inline bool sev_snp_guest(struct kvm *kvm)
 	return sev_es_guest(kvm) && sev->snp_active;
 }
 
+static inline bool ghcb_gpa_is_registered(struct vcpu_svm *svm, u64 val)
+{
+	return svm->sev_es.ghcb_registered_gpa == val;
+}
+
 static inline void vmcb_mark_all_dirty(struct vmcb *vmcb)
 {
 	vmcb->control.clean = 0;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v11 21/35] KVM: SEV: Add support to handle MSR based Page State Change VMGEXIT
  2023-12-30 17:23 [PATCH v11 00/35] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (19 preceding siblings ...)
  2023-12-30 17:23 ` [PATCH v11 20/35] KVM: SEV: Add support to handle GHCB GPA register VMGEXIT Michael Roth
@ 2023-12-30 17:23 ` Michael Roth
  2023-12-30 17:23 ` [PATCH v11 22/35] KVM: SEV: Add support to handle " Michael Roth
                   ` (13 subsequent siblings)
  34 siblings, 0 replies; 58+ messages in thread
From: Michael Roth @ 2023-12-30 17:23 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
	jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick, zhi.a.wang, Brijesh Singh

From: Brijesh Singh <brijesh.singh@amd.com>

SEV-SNP VMs can ask the hypervisor to change the page state in the RMP
table to be private or shared using the Page State Change MSR protocol
as defined in the GHCB specification.

When using gmem, private/shared memory is allocated through separate
pools, and KVM relies on userspace issuing a KVM_SET_MEMORY_ATTRIBUTES
KVM ioctl to tell the KVM MMU whether or not a particular GFN should be
backed by private memory.

Forward these page state change requests to userspace so that it can
issue the expected KVM ioctls. The KVM MMU will handle updating the RMP
entries when it is ready to map a private page into a guest.

Define a new KVM_EXIT_VMGEXIT for exits of this type, and structure it
so that it can be extended for other cases where VMGEXITs need some
level of handling in userspace.
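
A rough sketch of how a VMM might service this exit (the helper name and
error handling here are illustrative, not part of this series; it
assumes the KVM_SET_MEMORY_ATTRIBUTES ioctl from the gmem series):

  static void handle_psc_msr(int vm_fd, struct kvm_run *run)
  {
          struct kvm_memory_attributes attrs = {
                  .address = run->vmgexit.psc_msr.gpa,
                  .size = 4096,
                  .attributes =
                          run->vmgexit.psc_msr.op == KVM_USER_VMGEXIT_PSC_MSR_OP_PRIVATE ?
                          KVM_MEMORY_ATTRIBUTE_PRIVATE : 0,
          };

          /* A non-zero 'ret' is reported back to the guest as an error. */
          run->vmgexit.psc_msr.ret =
                  ioctl(vm_fd, KVM_SET_MEMORY_ATTRIBUTES, &attrs) ? 1 : 0;
  }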

Co-developed-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
---
 Documentation/virt/kvm/api.rst    | 33 +++++++++++++++++++++++++++++++
 arch/x86/include/asm/sev-common.h |  6 ++++++
 arch/x86/kvm/svm/sev.c            | 33 +++++++++++++++++++++++++++++++
 include/uapi/linux/kvm.h          | 17 ++++++++++++++++
 4 files changed, 89 insertions(+)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 3ec0b7a455a0..682490230feb 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -7031,6 +7031,39 @@ Please note that the kernel is allowed to use the kvm_run structure as the
 primary storage for certain register types. Therefore, the kernel may use the
 values in kvm_run even if the corresponding bit in kvm_dirty_regs is not set.
 
+::
+
+		/* KVM_EXIT_VMGEXIT */
+		struct kvm_user_vmgexit {
+		#define KVM_USER_VMGEXIT_PSC_MSR	1
+			__u32 type; /* KVM_USER_VMGEXIT_* type */
+			union {
+				struct {
+					__u64 gpa;
+		#define KVM_USER_VMGEXIT_PSC_MSR_OP_PRIVATE	1
+		#define KVM_USER_VMGEXIT_PSC_MSR_OP_SHARED	2
+					__u8 op;
+					__u32 ret;
+				} psc_msr;
+			};
+		};
+
+If the exit reason is KVM_EXIT_VMGEXIT, it indicates that an SEV-SNP guest
+has issued a VMGEXIT instruction (as documented by the AMD Architecture
+Programmer's Manual (APM)) to the hypervisor that needs to be serviced by
+userspace. These are generally handled by the host kernel, but in some
+cases some aspects of handling a VMGEXIT are handled by userspace.
+
+A kvm_user_vmgexit structure is defined to encapsulate the data to be
+sent to or returned by userspace. The type field defines the specific type
+of exit that needs to be serviced, and that type is used as a discriminator
+to determine which union type should be used for input/output.
+
+For the KVM_USER_VMGEXIT_PSC_MSR type, the psc_msr union type is used. The
+kernel will supply the 'gpa' and 'op' fields, and userspace is expected to
+update the private/shared state of the GPA using the corresponding
+KVM_SET_MEMORY_ATTRIBUTES ioctl. The 'ret' field is to be set to 0 by
+userspace on success, or some non-zero value on failure.
 
 6. Capabilities that can be enabled on vCPUs
 ============================================
diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
index 1006bfffe07a..6d68db812de1 100644
--- a/arch/x86/include/asm/sev-common.h
+++ b/arch/x86/include/asm/sev-common.h
@@ -101,11 +101,17 @@ enum psc_op {
 	/* GHCBData[11:0] */				\
 	GHCB_MSR_PSC_REQ)
 
+#define GHCB_MSR_PSC_REQ_TO_GFN(msr) (((msr) & GENMASK_ULL(51, 12)) >> 12)
+#define GHCB_MSR_PSC_REQ_TO_OP(msr) (((msr) & GENMASK_ULL(55, 52)) >> 52)
+
 #define GHCB_MSR_PSC_RESP		0x015
 #define GHCB_MSR_PSC_RESP_VAL(val)			\
 	/* GHCBData[63:32] */				\
 	(((u64)(val) & GENMASK_ULL(63, 32)) >> 32)
 
+/* Set highest bit as a generic error response */
+#define GHCB_MSR_PSC_RESP_ERROR (BIT_ULL(63) | GHCB_MSR_PSC_RESP)
+
 /* GHCB Hypervisor Feature Request/Response */
 #define GHCB_MSR_HV_FT_REQ		0x080
 #define GHCB_MSR_HV_FT_RESP		0x081
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 0b8837e21705..37e65d5700b8 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3275,6 +3275,36 @@ static void set_ghcb_msr(struct vcpu_svm *svm, u64 value)
 	svm->vmcb->control.ghcb_gpa = value;
 }
 
+static int snp_complete_psc_msr(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_svm *svm = to_svm(vcpu);
+	u64 vmm_ret = vcpu->run->vmgexit.psc_msr.ret;
+
+	set_ghcb_msr(svm, (vmm_ret << 32) | GHCB_MSR_PSC_RESP);
+
+	return 1; /* resume guest */
+}
+
+static int snp_begin_psc_msr(struct kvm_vcpu *vcpu, u64 ghcb_msr)
+{
+	u64 gpa = gfn_to_gpa(GHCB_MSR_PSC_REQ_TO_GFN(ghcb_msr));
+	u8 op = GHCB_MSR_PSC_REQ_TO_OP(ghcb_msr);
+	struct vcpu_svm *svm = to_svm(vcpu);
+
+	if (op != SNP_PAGE_STATE_PRIVATE && op != SNP_PAGE_STATE_SHARED) {
+		set_ghcb_msr(svm, GHCB_MSR_PSC_RESP_ERROR);
+		return 1; /* resume guest */
+	}
+
+	vcpu->run->exit_reason = KVM_EXIT_VMGEXIT;
+	vcpu->run->vmgexit.type = KVM_USER_VMGEXIT_PSC_MSR;
+	vcpu->run->vmgexit.psc_msr.gpa = gpa;
+	vcpu->run->vmgexit.psc_msr.op = op;
+	vcpu->arch.complete_userspace_io = snp_complete_psc_msr;
+
+	return 0; /* forward request to userspace */
+}
+
 static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
 {
 	struct vmcb_control_area *control = &svm->vmcb->control;
@@ -3373,6 +3403,9 @@ static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
 				  GHCB_MSR_INFO_POS);
 		break;
 	}
+	case GHCB_MSR_PSC_REQ:
+		ret = snp_begin_psc_msr(vcpu, control->ghcb_gpa);
+		break;
 	case GHCB_MSR_TERM_REQ: {
 		u64 reason_set, reason_code;
 
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 5218075fe1f4..62093ddf7ec3 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -167,6 +167,20 @@ struct kvm_xen_exit {
 	} u;
 };
 
+struct kvm_user_vmgexit {
+#define KVM_USER_VMGEXIT_PSC_MSR	1
+	__u32 type; /* KVM_USER_VMGEXIT_* type */
+	union {
+		struct {
+			__u64 gpa;
+#define KVM_USER_VMGEXIT_PSC_MSR_OP_PRIVATE	1
+#define KVM_USER_VMGEXIT_PSC_MSR_OP_SHARED	2
+			__u8 op;
+			__u32 ret;
+		} psc_msr;
+	};
+};
+
 #define KVM_S390_GET_SKEYS_NONE   1
 #define KVM_S390_SKEYS_MAX        1048576
 
@@ -210,6 +224,7 @@ struct kvm_xen_exit {
 #define KVM_EXIT_NOTIFY           37
 #define KVM_EXIT_LOONGARCH_IOCSR  38
 #define KVM_EXIT_MEMORY_FAULT     39
+#define KVM_EXIT_VMGEXIT          40
 
 /* For KVM_EXIT_INTERNAL_ERROR */
 /* Emulate instruction failed. */
@@ -470,6 +485,8 @@ struct kvm_run {
 			__u64 gpa;
 			__u64 size;
 		} memory_fault;
+		/* KVM_EXIT_VMGEXIT */
+		struct kvm_user_vmgexit vmgexit;
 		/* Fix the size of the union. */
 		char padding[256];
 	};
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v11 22/35] KVM: SEV: Add support to handle Page State Change VMGEXIT
  2023-12-30 17:23 [PATCH v11 00/35] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (20 preceding siblings ...)
  2023-12-30 17:23 ` [PATCH v11 21/35] KVM: SEV: Add support to handle MSR based Page State Change VMGEXIT Michael Roth
@ 2023-12-30 17:23 ` Michael Roth
  2023-12-30 17:23 ` [PATCH v11 23/35] KVM: x86: Export the kvm_zap_gfn_range() for the SNP use Michael Roth
                   ` (12 subsequent siblings)
  34 siblings, 0 replies; 58+ messages in thread
From: Michael Roth @ 2023-12-30 17:23 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
	jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick, zhi.a.wang, Brijesh Singh

From: Brijesh Singh <brijesh.singh@amd.com>

SEV-SNP VMs can ask the hypervisor to change the page state in the RMP
table to be private or shared using the Page State Change NAE event
as defined in the GHCB specification version 2.

Forward these requests to userspace as KVM_EXIT_VMGEXITs, similar to how
it is done for requests that don't use a GHCB page.
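
For context, the shared buffer that userspace processes follows the Page
State Change layout from the GHCB spec, roughly as below (field widths
per the spec; the struct names mirror the kernel's guest-side SEV code
and are shown here for illustration only):

  struct psc_hdr {
          u16 cur_entry;
          u16 end_entry;
          u32 reserved;
  } __packed;

  struct psc_entry {
          u64 cur_page   : 12,
              gfn        : 40,
              operation  : 4,
              pagesize   : 1,
              reserved   : 7;
  } __packed;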

Co-developed-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
---
 Documentation/virt/kvm/api.rst | 14 ++++++++++++++
 arch/x86/kvm/svm/sev.c         | 16 ++++++++++++++++
 include/uapi/linux/kvm.h       |  5 +++++
 3 files changed, 35 insertions(+)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 682490230feb..2a526b4f8e06 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -7036,6 +7036,7 @@ values in kvm_run even if the corresponding bit in kvm_dirty_regs is not set.
 		/* KVM_EXIT_VMGEXIT */
 		struct kvm_user_vmgexit {
 		#define KVM_USER_VMGEXIT_PSC_MSR	1
+		#define KVM_USER_VMGEXIT_PSC		2
 			__u32 type; /* KVM_USER_VMGEXIT_* type */
 			union {
 				struct {
@@ -7045,9 +7046,14 @@ values in kvm_run even if the corresponding bit in kvm_dirty_regs is not set.
 					__u8 op;
 					__u32 ret;
 				} psc_msr;
+				struct {
+					__u64 shared_gpa;
+					__u64 ret;
+				} psc;
 			};
 		};
 
+
 If the exit reason is KVM_EXIT_VMGEXIT, it indicates that an SEV-SNP guest
 has issued a VMGEXIT instruction (as documented by the AMD Architecture
 Programmer's Manual (APM)) to the hypervisor that needs to be serviced by
@@ -7065,6 +7071,14 @@ update the private/shared state of the GPA using the corresponding
 KVM_SET_MEMORY_ATTRIBUTES ioctl. The 'ret' field is to be set to 0 by
 userspace on success, or some non-zero value on failure.
 
+For the KVM_USER_VMGEXIT_PSC type, the psc union type is used. The kernel
+will supply the GPA of the Page State Structure defined in the GHCB spec.
+Userspace will process this structure as defined by the GHCB, and issue
+KVM_SET_MEMORY_ATTRIBUTES ioctls to set the GPAs therein to the expected
+private/shared state. Userspace will return a value in 'ret' that is in
+agreement with the GHCB-defined return values that the guest will expect
+in the SW_EXITINFO2 field of the GHCB in response to these requests.
+
 6. Capabilities that can be enabled on vCPUs
 ============================================
 
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 37e65d5700b8..8b6143110411 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3087,6 +3087,7 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
 	case SVM_VMGEXIT_AP_JUMP_TABLE:
 	case SVM_VMGEXIT_UNSUPPORTED_EVENT:
 	case SVM_VMGEXIT_HV_FEATURES:
+	case SVM_VMGEXIT_PSC:
 		break;
 	default:
 		reason = GHCB_ERR_INVALID_EVENT;
@@ -3305,6 +3306,15 @@ static int snp_begin_psc_msr(struct kvm_vcpu *vcpu, u64 ghcb_msr)
 	return 0; /* forward request to userspace */
 }
 
+static int snp_complete_psc(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_svm *svm = to_svm(vcpu);
+
+	ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, vcpu->run->vmgexit.psc.ret);
+
+	return 1; /* resume guest */
+}
+
 static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
 {
 	struct vmcb_control_area *control = &svm->vmcb->control;
@@ -3542,6 +3552,12 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
 
 		ret = 1;
 		break;
+	case SVM_VMGEXIT_PSC:
+		vcpu->run->exit_reason = KVM_EXIT_VMGEXIT;
+		vcpu->run->vmgexit.type = KVM_USER_VMGEXIT_PSC;
+		vcpu->run->vmgexit.psc.shared_gpa = svm->sev_es.sw_scratch;
+		vcpu->arch.complete_userspace_io = snp_complete_psc;
+		break;
 	case SVM_VMGEXIT_UNSUPPORTED_EVENT:
 		vcpu_unimpl(vcpu,
 			    "vmgexit: unsupported event - exit_info_1=%#llx, exit_info_2=%#llx\n",
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 62093ddf7ec3..e0599144387b 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -169,6 +169,7 @@ struct kvm_xen_exit {
 
 struct kvm_user_vmgexit {
 #define KVM_USER_VMGEXIT_PSC_MSR	1
+#define KVM_USER_VMGEXIT_PSC		2
 	__u32 type; /* KVM_USER_VMGEXIT_* type */
 	union {
 		struct {
@@ -178,6 +179,10 @@ struct kvm_user_vmgexit {
 			__u8 op;
 			__u32 ret;
 		} psc_msr;
+		struct {
+			__u64 shared_gpa;
+			__u64 ret;
+		} psc;
 	};
 };
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v11 23/35] KVM: x86: Export the kvm_zap_gfn_range() for the SNP use
  2023-12-30 17:23 [PATCH v11 00/35] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (21 preceding siblings ...)
  2023-12-30 17:23 ` [PATCH v11 22/35] KVM: SEV: Add support to handle " Michael Roth
@ 2023-12-30 17:23 ` Michael Roth
  2023-12-30 17:23 ` [PATCH v11 24/35] KVM: SEV: Add support to handle RMP nested page faults Michael Roth
                   ` (11 subsequent siblings)
  34 siblings, 0 replies; 58+ messages in thread
From: Michael Roth @ 2023-12-30 17:23 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
	jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick, zhi.a.wang, Brijesh Singh

From: Brijesh Singh <brijesh.singh@amd.com>

While resolving the RMP page fault, there may be cases where the page
level between the RMP entry and the TDP does not match, and the 2M RMP
entry must be split into 4K RMP entries, or a 2M TDP page needs to be
broken into multiple 4K pages.

To keep the RMP and TDP page levels in sync, zap the gfn range after
splitting the pages in the RMP entry. The zap should force the TDP to
get rebuilt with the new page level.
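
The intended call pattern (a sketch; the actual call site is added by
the RMP fault handling patch later in this series) is along these lines:

  /* After PSMASH'ing the 2M RMP entry backing this GFN... */
  gfn_t start = gfn & ~(gfn_t)(PTRS_PER_PMD - 1);

  kvm_zap_gfn_range(kvm, start, start + PTRS_PER_PMD);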

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 arch/x86/include/asm/kvm_host.h | 1 +
 arch/x86/kvm/mmu.h              | 2 --
 arch/x86/kvm/mmu/mmu.c          | 1 +
 3 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 843695217b4b..3fdcbb1da856 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1923,6 +1923,7 @@ void kvm_mmu_slot_leaf_clear_dirty(struct kvm *kvm,
 				   const struct kvm_memory_slot *memslot);
 void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm, u64 gen);
 void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned long kvm_nr_mmu_pages);
+void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end);
 
 int load_pdptrs(struct kvm_vcpu *vcpu, unsigned long cr3);
 
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 60f21bb4c27b..df4d2c137a67 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -252,8 +252,6 @@ static inline bool kvm_mmu_honors_guest_mtrrs(struct kvm *kvm)
 	return __kvm_mmu_honors_guest_mtrrs(kvm_arch_has_noncoherent_dma(kvm));
 }
 
-void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end);
-
 int kvm_arch_write_log_dirty(struct kvm_vcpu *vcpu);
 
 int kvm_mmu_post_init_vm(struct kvm *kvm);
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 61213f6648a1..1882096fba3e 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -6702,6 +6702,7 @@ static bool kvm_mmu_zap_collapsible_spte(struct kvm *kvm,
 
 	return need_tlb_flush;
 }
+EXPORT_SYMBOL_GPL(kvm_zap_gfn_range);
 
 static void kvm_rmap_zap_collapsible_sptes(struct kvm *kvm,
 					   const struct kvm_memory_slot *slot)
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v11 24/35] KVM: SEV: Add support to handle RMP nested page faults
  2023-12-30 17:23 [PATCH v11 00/35] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (22 preceding siblings ...)
  2023-12-30 17:23 ` [PATCH v11 23/35] KVM: x86: Export the kvm_zap_gfn_range() for the SNP use Michael Roth
@ 2023-12-30 17:23 ` Michael Roth
  2023-12-30 17:23 ` [PATCH v11 25/35] KVM: SEV: Use a VMSA physical address variable for populating VMCB Michael Roth
                   ` (10 subsequent siblings)
  34 siblings, 0 replies; 58+ messages in thread
From: Michael Roth @ 2023-12-30 17:23 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
	jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick, zhi.a.wang, Brijesh Singh

From: Brijesh Singh <brijesh.singh@amd.com>

When SEV-SNP is enabled in the guest, the hardware places restrictions
on all memory accesses based on the contents of the RMP table. When the
hardware encounters an RMP check failure caused by a guest memory
access, it raises a #NPF. The error code contains additional information
on the access type. See the APM volume 2 for additional information.

When using gmem, RMP faults resulting from mismatches between the state
in the RMP table vs. what the guest expects via its page table result
in KVM_EXIT_MEMORY_FAULTs being forwarded to userspace to handle. This
means the only expected case that needs to be handled in the kernel is
when the page size of the entry in the RMP table is larger than the
mapping in the nested page table, in which case a PSMASH instruction
needs to be issued to split the large RMP entry into individual 4K
entries so that subsequent accesses can succeed.
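
For reference, the #NPF error code bits consulted here, per APM volume 2
(the mask names follow the kernel's PFERR_* convention; the bit
positions shown are per the APM and worth double-checking there):

  #define PFERR_GUEST_RMP_MASK    BIT_ULL(31)  /* fault caused by an RMP check */
  #define PFERR_GUEST_SIZEM_MASK  BIT_ULL(35)  /* RMP page-size mismatch */

  /* e.g. in the #NPF intercept path: */
  if (error_code & PFERR_GUEST_RMP_MASK)
          handle_rmp_page_fault(vcpu, fault_address, error_code);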

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Co-developed-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
---
 arch/x86/include/asm/sev.h |  3 ++
 arch/x86/kvm/svm/sev.c     | 92 ++++++++++++++++++++++++++++++++++++++
 arch/x86/kvm/svm/svm.c     | 21 +++++++--
 arch/x86/kvm/svm/svm.h     |  1 +
 4 files changed, 113 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index 435ba9bc4510..e84dd1d2d8ab 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -90,6 +90,9 @@ extern bool handle_vc_boot_ghcb(struct pt_regs *regs);
 /* RMUPDATE detected 4K page and 2MB page overlap. */
 #define RMPUPDATE_FAIL_OVERLAP		4
 
+/* PSMASH failed due to concurrent access by another CPU */
+#define PSMASH_FAIL_INUSE		3
+
 /* RMP page size */
 #define RMP_PG_SIZE_4K			0
 #define RMP_PG_SIZE_2M			1
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 8b6143110411..ad1aea7f6266 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3276,6 +3276,13 @@ static void set_ghcb_msr(struct vcpu_svm *svm, u64 value)
 	svm->vmcb->control.ghcb_gpa = value;
 }
 
+static int snp_rmptable_psmash(kvm_pfn_t pfn)
+{
+	pfn = pfn & ~(KVM_PAGES_PER_HPAGE(PG_LEVEL_2M) - 1);
+
+	return psmash(pfn);
+}
+
 static int snp_complete_psc_msr(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
@@ -3835,3 +3842,88 @@ struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu)
 
 	return p;
 }
+
+void handle_rmp_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code)
+{
+	struct kvm_memory_slot *slot;
+	struct kvm *kvm = vcpu->kvm;
+	int order, rmp_level, ret;
+	bool assigned;
+	kvm_pfn_t pfn;
+	gfn_t gfn;
+
+	gfn = gpa >> PAGE_SHIFT;
+
+	/*
+	 * The only time RMP faults occur for shared pages is when the guest is
+	 * triggering an RMP fault for an implicit page-state change from
+	 * shared->private. Implicit page-state changes are forwarded to
+	 * userspace via KVM_EXIT_MEMORY_FAULT events, however, so RMP faults
+	 * for shared pages should not end up here.
+	 */
+	if (!kvm_mem_is_private(kvm, gfn)) {
+		pr_warn_ratelimited("SEV: Unexpected RMP fault, size-mismatch for non-private GPA 0x%llx\n",
+				    gpa);
+		return;
+	}
+
+	slot = gfn_to_memslot(kvm, gfn);
+	if (!kvm_slot_can_be_private(slot)) {
+		pr_warn_ratelimited("SEV: Unexpected RMP fault, non-private slot for GPA 0x%llx\n",
+				    gpa);
+		return;
+	}
+
+	ret = kvm_gmem_get_pfn(kvm, slot, gfn, &pfn, &order);
+	if (ret) {
+		pr_warn_ratelimited("SEV: Unexpected RMP fault, no private backing page for GPA 0x%llx\n",
+				    gpa);
+		return;
+	}
+
+	ret = snp_lookup_rmpentry(pfn, &assigned, &rmp_level);
+	if (ret || !assigned) {
+		pr_warn_ratelimited("SEV: Unexpected RMP fault, no assigned RMP entry found for GPA 0x%llx PFN 0x%llx error %d\n",
+				    gpa, pfn, ret);
+		goto out;
+	}
+
+	/*
+	 * There are 2 cases where a PSMASH may be needed to resolve an #NPF
+	 * with PFERR_GUEST_RMP_BIT set:
+	 *
+	 * 1) RMPADJUST/PVALIDATE can trigger an #NPF with PFERR_GUEST_SIZEM
+	 *    bit set if the guest issues them with a smaller granularity than
+	 *    what is indicated by the page-size bit in the 2MB-aligned RMP
+	 *    entry for the PFN that backs the GPA.
+	 *
+	 * 2) Guest access via NPT can trigger an #NPF if the NPT mapping is
+	 *    smaller than what is indicated by the 2MB-aligned RMP entry for
+	 *    the PFN that backs the GPA.
+	 *
+	 * In both these cases, the corresponding 2M RMP entry needs to
+	 * be PSMASH'd to 512 4K RMP entries.  If the RMP entry is already
+	 * split into 4K RMP entries, then this is likely a spurious case which
+	 * can occur when there are concurrent accesses by the guest to a 2MB
+	 * GPA range that is backed by a 2MB-aligned PFN whose RMP entry is in
+	 * the process of being PSMASH'd into 4K entries. These cases should
+	 * resolve automatically on subsequent accesses, so just ignore them
+	 * here.
+	 */
+	if (rmp_level == PG_LEVEL_4K) {
+		pr_debug_ratelimited("%s: Spurious RMP fault for GPA 0x%llx, error_code 0x%llx",
+				     __func__, gpa, error_code);
+		goto out;
+	}
+
+	pr_debug_ratelimited("%s: Splitting 2M RMP entry for GPA 0x%llx, error_code 0x%llx",
+			     __func__, gpa, error_code);
+	ret = snp_rmptable_psmash(pfn);
+	if (ret && ret != PSMASH_FAIL_INUSE)
+		pr_err_ratelimited("SEV: Unable to split RMP entry for GPA 0x%llx PFN 0x%llx ret %d\n",
+				   gpa, pfn, ret);
+
+	kvm_zap_gfn_range(kvm, gfn, gfn + PTRS_PER_PMD);
+out:
+	put_page(pfn_to_page(pfn));
+}
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 18d55df7fa5f..4367da074612 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -2051,15 +2051,28 @@ static int pf_interception(struct kvm_vcpu *vcpu)
 static int npf_interception(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
+	int rc;
 
 	u64 fault_address = svm->vmcb->control.exit_info_2;
 	u64 error_code = svm->vmcb->control.exit_info_1;
 
 	trace_kvm_page_fault(vcpu, fault_address, error_code);
-	return kvm_mmu_page_fault(vcpu, fault_address, error_code,
-			static_cpu_has(X86_FEATURE_DECODEASSISTS) ?
-			svm->vmcb->control.insn_bytes : NULL,
-			svm->vmcb->control.insn_len);
+	rc = kvm_mmu_page_fault(vcpu, fault_address, error_code,
+				static_cpu_has(X86_FEATURE_DECODEASSISTS) ?
+				svm->vmcb->control.insn_bytes : NULL,
+				svm->vmcb->control.insn_len);
+
+	/*
+	 * rc == 0 indicates a userspace exit is needed to handle page
+	 * transitions, so do that first before updating the RMP table.
+	 */
+	if (error_code & PFERR_GUEST_RMP_MASK) {
+		if (rc == 0)
+			return rc;
+		handle_rmp_page_fault(vcpu, fault_address, error_code);
+	}
+
+	return rc;
 }
 
 static int db_interception(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 2bee24017bae..fb98d88d8124 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -717,6 +717,7 @@ void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector);
 void sev_es_prepare_switch_to_guest(struct sev_es_save_area *hostsa);
 void sev_es_unmap_ghcb(struct vcpu_svm *svm);
 struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu);
+void handle_rmp_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code);
 
 /* vmenter.S */
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v11 25/35] KVM: SEV: Use a VMSA physical address variable for populating VMCB
  2023-12-30 17:23 [PATCH v11 00/35] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (23 preceding siblings ...)
  2023-12-30 17:23 ` [PATCH v11 24/35] KVM: SEV: Add support to handle RMP nested page faults Michael Roth
@ 2023-12-30 17:23 ` Michael Roth
  2023-12-30 17:23 ` [PATCH v11 26/35] KVM: SEV: Support SEV-SNP AP Creation NAE event Michael Roth
                   ` (9 subsequent siblings)
  34 siblings, 0 replies; 58+ messages in thread
From: Michael Roth @ 2023-12-30 17:23 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
	jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick, zhi.a.wang

From: Tom Lendacky <thomas.lendacky@amd.com>

In preparation to support SEV-SNP AP Creation, use a variable that holds
the VMSA physical address rather than converting the virtual address.
This will allow SEV-SNP AP Creation to set the new physical address that
will be used should the vCPU reset path be taken.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 arch/x86/kvm/svm/sev.c | 3 +--
 arch/x86/kvm/svm/svm.c | 9 ++++++++-
 arch/x86/kvm/svm/svm.h | 1 +
 3 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index ad1aea7f6266..996b5a668938 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3659,8 +3659,7 @@ static void sev_es_init_vmcb(struct vcpu_svm *svm)
 	 * the VMSA will be NULL if this vCPU is the destination for intrahost
 	 * migration, and will be copied later.
 	 */
-	if (svm->sev_es.vmsa)
-		svm->vmcb->control.vmsa_pa = __pa(svm->sev_es.vmsa);
+	svm->vmcb->control.vmsa_pa = svm->sev_es.vmsa_pa;
 
 	/* Can't intercept CR register access, HV can't modify CR registers */
 	svm_clr_intercept(svm, INTERCEPT_CR0_READ);
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 4367da074612..da49e4981d75 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1459,9 +1459,16 @@ static int svm_vcpu_create(struct kvm_vcpu *vcpu)
 	svm->vmcb01.pa = __sme_set(page_to_pfn(vmcb01_page) << PAGE_SHIFT);
 	svm_switch_vmcb(svm, &svm->vmcb01);
 
-	if (vmsa_page)
+	if (vmsa_page) {
 		svm->sev_es.vmsa = page_address(vmsa_page);
 
+		/*
+		 * Do not include the encryption mask on the VMSA physical
+		 * address since hardware will access it using the guest key.
+		 */
+		svm->sev_es.vmsa_pa = __pa(svm->sev_es.vmsa);
+	}
+
 	svm->guest_state_loaded = false;
 
 	return 0;
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index fb98d88d8124..4ef41f4d4ee6 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -202,6 +202,7 @@ struct vcpu_sev_es_state {
 	struct ghcb *ghcb;
 	u8 valid_bitmap[16];
 	struct kvm_host_map ghcb_map;
+	hpa_t vmsa_pa;
 	bool received_first_sipi;
 	unsigned int ap_reset_hold_type;
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v11 26/35] KVM: SEV: Support SEV-SNP AP Creation NAE event
  2023-12-30 17:23 [PATCH v11 00/35] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (24 preceding siblings ...)
  2023-12-30 17:23 ` [PATCH v11 25/35] KVM: SEV: Use a VMSA physical address variable for populating VMCB Michael Roth
@ 2023-12-30 17:23 ` Michael Roth
  2024-01-05 22:08   ` Jacob Xu
  2023-12-30 17:23 ` [PATCH v11 27/35] KVM: SEV: Add support for GHCB-based termination requests Michael Roth
                   ` (8 subsequent siblings)
  34 siblings, 1 reply; 58+ messages in thread
From: Michael Roth @ 2023-12-30 17:23 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
	jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick, zhi.a.wang, Brijesh Singh

From: Tom Lendacky <thomas.lendacky@amd.com>

Add support for the SEV-SNP AP Creation NAE event. This allows SEV-SNP
guests to alter the register state of the APs on their own, giving the
guest a way of simulating INIT-SIPI.

A new event, KVM_REQ_UPDATE_PROTECTED_GUEST_STATE, is created and used
so as to avoid updating the VMSA pointer while the vCPU is running.

For CREATE
  The guest supplies the GPA of the VMSA to be used for the vCPU with
  the specified APIC ID. The GPA is saved in the svm struct of the
  target vCPU, the KVM_REQ_UPDATE_PROTECTED_GUEST_STATE event is added
  to the vCPU and then the vCPU is kicked.

For CREATE_ON_INIT:
  The guest supplies the GPA of the VMSA to be used for the vCPU with
  the specified APIC ID the next time an INIT is performed. The GPA is
  saved in the svm struct of the target vCPU.

For DESTROY:
  The guest indicates it wishes to stop the vCPU. The GPA is cleared
  from the svm struct, the KVM_REQ_UPDATE_PROTECTED_GUEST_STATE event is
  added to vCPU and then the vCPU is kicked.

The KVM_REQ_UPDATE_PROTECTED_GUEST_STATE event handler will be invoked
as a result of the event or as a result of an INIT. The handler sets the
vCPU to the KVM_MP_STATE_UNINITIALIZED state, so that any errors will
leave the vCPU as not runnable. Any previous VMSA pages that were
installed as part of an SEV-SNP AP Creation NAE event are un-pinned. If
a new VMSA is to be installed, the VMSA guest page is pinned and set as
the VMSA in the vCPU VMCB and the vCPU state is set to
KVM_MP_STATE_RUNNABLE. If a new VMSA is not to be installed, the VMSA is
cleared in the vCPU VMCB and the vCPU state is left as
KVM_MP_STATE_UNINITIALIZED to prevent it from being run.
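
For reference, the request arrives in the GHCB SW_EXITINFO1 field,
encoded roughly as follows (a guest-side sketch; the helper name is
illustrative):

  /* SW_EXITINFO1[63:32] = APIC ID, SW_EXITINFO1[31:0] = request type */
  static u64 ap_creation_exit_info_1(u32 apic_id, u32 request)
  {
          return ((u64)apic_id << 32) | request;
  }

The VMSA GPA for CREATE/CREATE_ON_INIT is passed in SW_EXITINFO2.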

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
[mdr: add handling for gmem]
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 arch/x86/include/asm/kvm_host.h |   1 +
 arch/x86/include/asm/svm.h      |   5 +
 arch/x86/kvm/svm/sev.c          | 219 ++++++++++++++++++++++++++++++++
 arch/x86/kvm/svm/svm.c          |   3 +
 arch/x86/kvm/svm/svm.h          |   8 +-
 arch/x86/kvm/x86.c              |  11 ++
 6 files changed, 246 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 3fdcbb1da856..9e45402e51bc 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -121,6 +121,7 @@
 	KVM_ARCH_REQ_FLAGS(31, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
 #define KVM_REQ_HV_TLB_FLUSH \
 	KVM_ARCH_REQ_FLAGS(32, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
+#define KVM_REQ_UPDATE_PROTECTED_GUEST_STATE	KVM_ARCH_REQ(34)
 
 #define CR0_RESERVED_BITS                                               \
 	(~(unsigned long)(X86_CR0_PE | X86_CR0_MP | X86_CR0_EM | X86_CR0_TS \
diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index ba8ce15b27d7..4b73cf5e9de0 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -287,6 +287,11 @@ static_assert((X2AVIC_MAX_PHYSICAL_ID & AVIC_PHYSICAL_MAX_INDEX_MASK) == X2AVIC_
 
 #define SVM_SEV_FEAT_DEBUG_SWAP                        BIT(5)
 #define SVM_SEV_FEAT_SNP_ACTIVE                        BIT(0)
+#define SVM_SEV_FEAT_RESTRICTED_INJECTION              BIT(3)
+#define SVM_SEV_FEAT_ALTERNATE_INJECTION	       BIT(4)
+#define SVM_SEV_FEAT_INT_INJ_MODES		\
+	(SVM_SEV_FEAT_RESTRICTED_INJECTION |	\
+	 SVM_SEV_FEAT_ALTERNATE_INJECTION)
 
 struct vmcb_seg {
 	u16 selector;
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 996b5a668938..3bb89c4df5d6 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -652,6 +652,7 @@ static int sev_launch_update_data(struct kvm *kvm, struct kvm_sev_cmd *argp)
 
 static int sev_es_sync_vmsa(struct vcpu_svm *svm)
 {
+	struct kvm_sev_info *sev = &to_kvm_svm(svm->vcpu.kvm)->sev_info;
 	struct sev_es_save_area *save = svm->sev_es.vmsa;
 
 	/* Check some debug related fields before encrypting the VMSA */
@@ -700,6 +701,12 @@ static int sev_es_sync_vmsa(struct vcpu_svm *svm)
 	if (sev_snp_guest(svm->vcpu.kvm))
 		save->sev_features |= SVM_SEV_FEAT_SNP_ACTIVE;
 
+	/*
+	 * Save the SEV features synced to the VMSA. For now, they are the
+	 * same for all vCPUs, so just save them each time.
+	 */
+	sev->sev_features = save->sev_features;
+
 	pr_debug("Virtual Machine Save Area (VMSA):\n");
 	print_hex_dump_debug("", DUMP_PREFIX_NONE, 16, 1, save, sizeof(*save), false);
 
@@ -3082,6 +3089,11 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
 		if (!kvm_ghcb_sw_scratch_is_valid(svm))
 			goto vmgexit_err;
 		break;
+	case SVM_VMGEXIT_AP_CREATION:
+		if (lower_32_bits(control->exit_info_1) != SVM_VMGEXIT_AP_DESTROY)
+			if (!kvm_ghcb_rax_is_valid(svm))
+				goto vmgexit_err;
+		break;
 	case SVM_VMGEXIT_NMI_COMPLETE:
 	case SVM_VMGEXIT_AP_HLT_LOOP:
 	case SVM_VMGEXIT_AP_JUMP_TABLE:
@@ -3322,6 +3334,202 @@ static int snp_complete_psc(struct kvm_vcpu *vcpu)
 	return 1; /* resume guest */
 }
 
+static int __sev_snp_update_protected_guest_state(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_svm *svm = to_svm(vcpu);
+	hpa_t cur_pa;
+
+	WARN_ON(!mutex_is_locked(&svm->sev_es.snp_vmsa_mutex));
+
+	/* Save off the current VMSA PA for later checks */
+	cur_pa = svm->sev_es.vmsa_pa;
+
+	/* Mark the vCPU as offline and not runnable */
+	vcpu->arch.pv.pv_unhalted = false;
+	vcpu->arch.mp_state = KVM_MP_STATE_HALTED;
+
+	/* Clear use of the VMSA */
+	svm->sev_es.vmsa_pa = INVALID_PAGE;
+	svm->vmcb->control.vmsa_pa = INVALID_PAGE;
+
+	/*
+	 * svm->sev_es.vmsa holds the virtual address of the VMSA initially
+	 * allocated by the host. If the guest specified a new VMSA via
+	 * AP_CREATION, it will have been pinned to avoid future issues
+	 * with things like page migration support. Make sure to un-pin it
+	 * before switching to a newer guest-specified VMSA.
+	 */
+	if (cur_pa != __pa(svm->sev_es.vmsa) && VALID_PAGE(cur_pa))
+		kvm_release_pfn_dirty(__phys_to_pfn(cur_pa));
+
+	if (VALID_PAGE(svm->sev_es.snp_vmsa_gpa)) {
+		gfn_t gfn = gpa_to_gfn(svm->sev_es.snp_vmsa_gpa);
+		struct kvm_memory_slot *slot;
+		kvm_pfn_t pfn;
+
+		slot = gfn_to_memslot(vcpu->kvm, gfn);
+		if (!slot)
+			return -EINVAL;
+
+		/*
+		 * The new VMSA will be private guest memory, so
+		 * retrieve the PFN from the gmem backend, and leave the ref
+		 * count of the associated folio elevated to ensure it won't
+		 * ever be migrated.
+		 */
+		if (kvm_gmem_get_pfn(vcpu->kvm, slot, gfn, &pfn, NULL))
+			return -EINVAL;
+
+		/* Use the new VMSA */
+		svm->sev_es.vmsa_pa = pfn_to_hpa(pfn);
+		svm->vmcb->control.vmsa_pa = svm->sev_es.vmsa_pa;
+
+		/* Mark the vCPU as runnable */
+		vcpu->arch.pv.pv_unhalted = false;
+		vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
+
+		svm->sev_es.snp_vmsa_gpa = INVALID_PAGE;
+	}
+
+	/*
+	 * When replacing the VMSA during SEV-SNP AP creation,
+	 * mark the VMCB dirty so that full state is always reloaded.
+	 */
+	vmcb_mark_all_dirty(svm->vmcb);
+
+	return 0;
+}
+
+/*
+ * Invoked as part of svm_vcpu_reset() processing of an init event.
+ */
+void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_svm *svm = to_svm(vcpu);
+	int ret;
+
+	if (!sev_snp_guest(vcpu->kvm))
+		return;
+
+	mutex_lock(&svm->sev_es.snp_vmsa_mutex);
+
+	if (!svm->sev_es.snp_ap_create)
+		goto unlock;
+
+	svm->sev_es.snp_ap_create = false;
+
+	ret = __sev_snp_update_protected_guest_state(vcpu);
+	if (ret)
+		vcpu_unimpl(vcpu, "snp: AP state update on init failed\n");
+
+unlock:
+	mutex_unlock(&svm->sev_es.snp_vmsa_mutex);
+}
+
+static int sev_snp_ap_creation(struct vcpu_svm *svm)
+{
+	struct kvm_sev_info *sev = &to_kvm_svm(svm->vcpu.kvm)->sev_info;
+	struct kvm_vcpu *vcpu = &svm->vcpu;
+	struct kvm_vcpu *target_vcpu;
+	struct vcpu_svm *target_svm;
+	unsigned int request;
+	unsigned int apic_id;
+	bool kick;
+	int ret;
+
+	request = lower_32_bits(svm->vmcb->control.exit_info_1);
+	apic_id = upper_32_bits(svm->vmcb->control.exit_info_1);
+
+	/* Validate the APIC ID */
+	target_vcpu = kvm_get_vcpu_by_id(vcpu->kvm, apic_id);
+	if (!target_vcpu) {
+		vcpu_unimpl(vcpu, "vmgexit: invalid AP APIC ID [%#x] from guest\n",
+			    apic_id);
+		return -EINVAL;
+	}
+
+	ret = 0;
+
+	target_svm = to_svm(target_vcpu);
+
+	/*
+	 * The target vCPU is valid, so the vCPU will be kicked unless the
+	 * request is for CREATE_ON_INIT. For any errors at this stage, the
+	 * kick will place the vCPU in a non-runnable state.
+	 */
+	kick = true;
+
+	mutex_lock(&target_svm->sev_es.snp_vmsa_mutex);
+
+	target_svm->sev_es.snp_vmsa_gpa = INVALID_PAGE;
+	target_svm->sev_es.snp_ap_create = true;
+
+	/* Interrupt injection mode shouldn't change for AP creation */
+	if (request < SVM_VMGEXIT_AP_DESTROY) {
+		u64 sev_features;
+
+		sev_features = vcpu->arch.regs[VCPU_REGS_RAX];
+		sev_features ^= sev->sev_features;
+		if (sev_features & SVM_SEV_FEAT_INT_INJ_MODES) {
+			vcpu_unimpl(vcpu, "vmgexit: invalid AP injection mode [%#lx] from guest\n",
+				    vcpu->arch.regs[VCPU_REGS_RAX]);
+			ret = -EINVAL;
+			goto out;
+		}
+	}
+
+	switch (request) {
+	case SVM_VMGEXIT_AP_CREATE_ON_INIT:
+		kick = false;
+		fallthrough;
+	case SVM_VMGEXIT_AP_CREATE:
+		if (!page_address_valid(vcpu, svm->vmcb->control.exit_info_2)) {
+			vcpu_unimpl(vcpu, "vmgexit: invalid AP VMSA address [%#llx] from guest\n",
+				    svm->vmcb->control.exit_info_2);
+			ret = -EINVAL;
+			goto out;
+		}
+
+		/*
+		 * A malicious guest can RMPADJUST a large page into a VMSA, which
+		 * will hit the SNP erratum where the CPU will incorrectly signal
+		 * an RMP violation #PF if a hugepage collides with the RMP entry
+		 * of the VMSA page. Reject the AP CREATE request if the VMSA
+		 * address from the guest is 2M-aligned.
+		 */
+		if (IS_ALIGNED(svm->vmcb->control.exit_info_2, PMD_SIZE)) {
+			vcpu_unimpl(vcpu,
+				    "vmgexit: AP VMSA address [%llx] from guest is unsafe as it is 2M aligned\n",
+				    svm->vmcb->control.exit_info_2);
+			ret = -EINVAL;
+			goto out;
+		}
+
+		target_svm->sev_es.snp_vmsa_gpa = svm->vmcb->control.exit_info_2;
+		break;
+	case SVM_VMGEXIT_AP_DESTROY:
+		break;
+	default:
+		vcpu_unimpl(vcpu, "vmgexit: invalid AP creation request [%#x] from guest\n",
+			    request);
+		ret = -EINVAL;
+		break;
+	}
+
+out:
+	if (kick) {
+		if (target_vcpu->arch.mp_state == KVM_MP_STATE_UNINITIALIZED)
+			target_vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
+
+		kvm_make_request(KVM_REQ_UPDATE_PROTECTED_GUEST_STATE, target_vcpu);
+		kvm_vcpu_kick(target_vcpu);
+	}
+
+	mutex_unlock(&target_svm->sev_es.snp_vmsa_mutex);
+
+	return ret;
+}
+
 static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
 {
 	struct vmcb_control_area *control = &svm->vmcb->control;
@@ -3565,6 +3773,15 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
 		vcpu->run->vmgexit.psc.shared_gpa = svm->sev_es.sw_scratch;
 		vcpu->arch.complete_userspace_io = snp_complete_psc;
 		break;
+	case SVM_VMGEXIT_AP_CREATION:
+		ret = sev_snp_ap_creation(svm);
+		if (ret) {
+			ghcb_set_sw_exit_info_1(svm->sev_es.ghcb, 2);
+			ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, GHCB_ERR_INVALID_INPUT);
+		}
+
+		ret = 1;
+		break;
 	case SVM_VMGEXIT_UNSUPPORTED_EVENT:
 		vcpu_unimpl(vcpu,
 			    "vmgexit: unsupported event - exit_info_1=%#llx, exit_info_2=%#llx\n",
@@ -3731,6 +3948,8 @@ void sev_es_vcpu_reset(struct vcpu_svm *svm)
 	set_ghcb_msr(svm, GHCB_MSR_SEV_INFO(GHCB_VERSION_MAX,
 					    GHCB_VERSION_MIN,
 					    sev_enc_bit));
+
+	mutex_init(&svm->sev_es.snp_vmsa_mutex);
 }
 
 void sev_es_prepare_switch_to_guest(struct sev_es_save_area *hostsa)
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index da49e4981d75..240518f8d6c7 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1398,6 +1398,9 @@ static void svm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
 	svm->spec_ctrl = 0;
 	svm->virt_spec_ctrl = 0;
 
+	if (init_event)
+		sev_snp_init_protected_guest_state(vcpu);
+
 	init_vmcb(vcpu);
 
 	if (!init_event)
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 4ef41f4d4ee6..d953ae41c619 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -96,6 +96,7 @@ struct kvm_sev_info {
 	atomic_t migration_in_progress;
 	u64 snp_init_flags;
 	void *snp_context;      /* SNP guest context page */
+	u64 sev_features;	/* Features set at VMSA creation */
 };
 
 struct kvm_svm {
@@ -214,6 +215,10 @@ struct vcpu_sev_es_state {
 	bool ghcb_sa_free;
 
 	u64 ghcb_registered_gpa;
+
+	struct mutex snp_vmsa_mutex; /* Used to handle concurrent updates of VMSA. */
+	gpa_t snp_vmsa_gpa;
+	bool snp_ap_create;
 };
 
 struct vcpu_svm {
@@ -689,7 +694,7 @@ void avic_refresh_virtual_apic_mode(struct kvm_vcpu *vcpu);
 #define GHCB_VERSION_MAX	2ULL
 #define GHCB_VERSION_MIN	1ULL
 
-#define GHCB_HV_FT_SUPPORTED	GHCB_HV_FT_SNP
+#define GHCB_HV_FT_SUPPORTED	(GHCB_HV_FT_SNP | GHCB_HV_FT_SNP_AP_CREATION)
 
 extern unsigned int max_sev_asid;
 
@@ -719,6 +724,7 @@ void sev_es_prepare_switch_to_guest(struct sev_es_save_area *hostsa);
 void sev_es_unmap_ghcb(struct vcpu_svm *svm);
 struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu);
 void handle_rmp_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code);
+void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu);
 
 /* vmenter.S */
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 87b78d63e81d..df9ec357d538 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10858,6 +10858,14 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 
 		if (kvm_check_request(KVM_REQ_UPDATE_CPU_DIRTY_LOGGING, vcpu))
 			static_call(kvm_x86_update_cpu_dirty_logging)(vcpu);
+
+		if (kvm_check_request(KVM_REQ_UPDATE_PROTECTED_GUEST_STATE, vcpu)) {
+			kvm_vcpu_reset(vcpu, true);
+			if (vcpu->arch.mp_state != KVM_MP_STATE_RUNNABLE) {
+				r = 1;
+				goto out;
+			}
+		}
 	}
 
 	if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win ||
@@ -13072,6 +13080,9 @@ static inline bool kvm_vcpu_has_events(struct kvm_vcpu *vcpu)
 	if (kvm_test_request(KVM_REQ_PMI, vcpu))
 		return true;
 
+	if (kvm_test_request(KVM_REQ_UPDATE_PROTECTED_GUEST_STATE, vcpu))
+		return true;
+
 	if (kvm_arch_interrupt_allowed(vcpu) &&
 	    (kvm_cpu_has_interrupt(vcpu) ||
 	    kvm_guest_apic_has_interrupt(vcpu)))
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v11 27/35] KVM: SEV: Add support for GHCB-based termination requests
  2023-12-30 17:23 [PATCH v11 00/35] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (25 preceding siblings ...)
  2023-12-30 17:23 ` [PATCH v11 26/35] KVM: SEV: Support SEV-SNP AP Creation NAE event Michael Roth
@ 2023-12-30 17:23 ` Michael Roth
  2023-12-30 17:23 ` [PATCH v11 28/35] KVM: SEV: Implement gmem hook for initializing private pages Michael Roth
                   ` (7 subsequent siblings)
  34 siblings, 0 replies; 58+ messages in thread
From: Michael Roth @ 2023-12-30 17:23 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
	jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick, zhi.a.wang

GHCB version 2 adds support for a GHCB-based termination request that
a guest can issue when it reaches an error state and wishes to inform
the hypervisor that it should be terminated. Implement support for that
similarly to GHCB MSR-based termination requests that are already
available to SEV-ES guests via earlier versions of the GHCB protocol.

See 'Termination Request' in the 'Invoking VMGEXIT' section of the GHCB
specification for more details.
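
A VMM might consume the resulting exit along these lines (a sketch; the
logging and exit policy are up to the implementation):

  case KVM_EXIT_SYSTEM_EVENT:
          if (run->system_event.type == KVM_SYSTEM_EVENT_SEV_TERM) {
                  /* data[0] holds the GHCB GPA of the requesting vCPU */
                  fprintf(stderr, "guest requested termination, GHCB GPA %#llx\n",
                          (unsigned long long)run->system_event.data[0]);
                  exit(1);
          }
          break;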

Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 arch/x86/kvm/svm/sev.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 3bb89c4df5d6..b2ac696c436a 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3100,6 +3100,7 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
 	case SVM_VMGEXIT_UNSUPPORTED_EVENT:
 	case SVM_VMGEXIT_HV_FEATURES:
 	case SVM_VMGEXIT_PSC:
+	case SVM_VMGEXIT_TERM_REQUEST:
 		break;
 	default:
 		reason = GHCB_ERR_INVALID_EVENT;
@@ -3782,6 +3783,14 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
 
 		ret = 1;
 		break;
+	case SVM_VMGEXIT_TERM_REQUEST:
+		pr_info("SEV-ES guest requested termination: reason %#llx info %#llx\n",
+			control->exit_info_1, control->exit_info_2);
+		vcpu->run->exit_reason = KVM_EXIT_SYSTEM_EVENT;
+		vcpu->run->system_event.type = KVM_SYSTEM_EVENT_SEV_TERM;
+		vcpu->run->system_event.ndata = 1;
+		vcpu->run->system_event.data[0] = control->ghcb_gpa;
+		break;
 	case SVM_VMGEXIT_UNSUPPORTED_EVENT:
 		vcpu_unimpl(vcpu,
 			    "vmgexit: unsupported event - exit_info_1=%#llx, exit_info_2=%#llx\n",
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v11 28/35] KVM: SEV: Implement gmem hook for initializing private pages
  2023-12-30 17:23 [PATCH v11 00/35] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (26 preceding siblings ...)
  2023-12-30 17:23 ` [PATCH v11 27/35] KVM: SEV: Add support for GHCB-based termination requests Michael Roth
@ 2023-12-30 17:23 ` Michael Roth
  2024-03-11  5:50   ` Binbin Wu
  2023-12-30 17:23 ` [PATCH v11 29/35] KVM: SEV: Implement gmem hook for invalidating " Michael Roth
                   ` (6 subsequent siblings)
  34 siblings, 1 reply; 58+ messages in thread
From: Michael Roth @ 2023-12-30 17:23 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
	jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick, zhi.a.wang

Implement a gmem hook that handles the RMP table updates and direct map
changes needed to put a page into a private state before mapping it into
an SEV-SNP guest.
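
To illustrate the alignment math the hook relies on (PTRS_PER_PMD == 512
on x86-64, so one 2MB RMP entry covers 512 4K PFNs), here is a tiny
standalone example; the 0x1234 value is arbitrary:

  #include <stdio.h>
  #include <stdint.h>

  #define ALIGN_DOWN(x, a) ((x) & ~((uint64_t)(a) - 1))

  int main(void)
  {
          uint64_t pfn = 0x1234;  /* some PFN backed by a 2MB folio */
          uint64_t pfn_aligned = ALIGN_DOWN(pfn, 512);

          /* 0x1234 -> 0x1200: the 2MB RMP entry spans PFNs [0x1200, 0x1400) */
          printf("pfn %#llx -> 2M-aligned pfn %#llx\n",
                 (unsigned long long)pfn, (unsigned long long)pfn_aligned);
          return 0;
  }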

Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 arch/x86/kvm/Kconfig   |  1 +
 arch/x86/kvm/svm/sev.c | 98 ++++++++++++++++++++++++++++++++++++++++++
 arch/x86/kvm/svm/svm.c |  2 +
 arch/x86/kvm/svm/svm.h |  1 +
 virt/kvm/guest_memfd.c |  4 +-
 5 files changed, 104 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index 4ec53d6d5773..79c002e1bb5c 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -125,6 +125,7 @@ config KVM_AMD_SEV
 	depends on KVM_AMD && X86_64
 	depends on CRYPTO_DEV_SP_PSP && !(KVM_AMD=y && CRYPTO_DEV_CCP_DD=m)
 	select KVM_GENERIC_PRIVATE_MEM
+	select HAVE_KVM_GMEM_PREPARE
 	help
 	  Provides support for launching Encrypted VMs (SEV) and Encrypted VMs
 	  with Encrypted State (SEV-ES) on AMD processors.
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index b2ac696c436a..91f53f4a6059 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -4154,3 +4154,101 @@ void handle_rmp_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code)
 out:
 	put_page(pfn_to_page(pfn));
 }
+
+static bool is_pfn_range_shared(kvm_pfn_t start, kvm_pfn_t end)
+{
+	kvm_pfn_t pfn = start;
+
+	while (pfn < end) {
+		int ret, rmp_level;
+		bool assigned;
+
+		ret = snp_lookup_rmpentry(pfn, &assigned, &rmp_level);
+		if (ret) {
+			pr_warn_ratelimited("SEV: Failed to retrieve RMP entry: PFN 0x%llx GFN start 0x%llx GFN end 0x%llx RMP level %d error %d\n",
+					    pfn, start, end, rmp_level, ret);
+			return false;
+		}
+
+		if (assigned) {
+			pr_debug("%s: overlap detected, PFN 0x%llx start 0x%llx end 0x%llx RMP level %d\n",
+				 __func__, pfn, start, end, rmp_level);
+			return false;
+		}
+
+		pfn++;
+	}
+
+	return true;
+}
+
+static u8 max_level_for_order(int order)
+{
+	if (order >= KVM_HPAGE_GFN_SHIFT(PG_LEVEL_2M))
+		return PG_LEVEL_2M;
+
+	return PG_LEVEL_4K;
+}
+
+static bool is_large_rmp_possible(struct kvm *kvm, kvm_pfn_t pfn, int order)
+{
+	kvm_pfn_t pfn_aligned = ALIGN_DOWN(pfn, PTRS_PER_PMD);
+
+	/*
+	 * If this is a large folio, and the entire 2M range containing the
+	 * PFN is currently shared, then the entire 2M-aligned range can be
+	 * set to private via a single 2M RMP entry.
+	 */
+	if (max_level_for_order(order) > PG_LEVEL_4K &&
+	    is_pfn_range_shared(pfn_aligned, pfn_aligned + PTRS_PER_PMD))
+		return true;
+
+	return false;
+}
+
+int sev_gmem_prepare(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, int max_order)
+{
+	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+	kvm_pfn_t pfn_aligned;
+	gfn_t gfn_aligned;
+	int level, rc;
+	bool assigned;
+
+	if (!sev_snp_guest(kvm))
+		return 0;
+
+	rc = snp_lookup_rmpentry(pfn, &assigned, &level);
+	if (rc) {
+		pr_err_ratelimited("SEV: Failed to look up RMP entry: GFN %llx PFN %llx error %d\n",
+				   gfn, pfn, rc);
+		return -ENOENT;
+	}
+
+	if (assigned) {
+		pr_debug("%s: already assigned: gfn %llx pfn %llx max_order %d level %d\n",
+			 __func__, gfn, pfn, max_order, level);
+		return 0;
+	}
+
+	if (is_large_rmp_possible(kvm, pfn, max_order)) {
+		level = PG_LEVEL_2M;
+		pfn_aligned = ALIGN_DOWN(pfn, PTRS_PER_PMD);
+		gfn_aligned = ALIGN_DOWN(gfn, PTRS_PER_PMD);
+	} else {
+		level = PG_LEVEL_4K;
+		pfn_aligned = pfn;
+		gfn_aligned = gfn;
+	}
+
+	rc = rmp_make_private(pfn_aligned, gfn_to_gpa(gfn_aligned), level, sev->asid, false);
+	if (rc) {
+		pr_err_ratelimited("SEV: Failed to update RMP entry: GFN %llx PFN %llx level %d error %d\n",
+				   gfn, pfn, level, rc);
+		return -EINVAL;
+	}
+
+	pr_debug("%s: updated: gfn %llx pfn %llx pfn_aligned %llx max_order %d level %d\n",
+		 __func__, gfn, pfn, pfn_aligned, max_order, level);
+
+	return 0;
+}
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 240518f8d6c7..32cef8626b57 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -5065,6 +5065,8 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
 	.vcpu_deliver_sipi_vector = svm_vcpu_deliver_sipi_vector,
 	.vcpu_get_apicv_inhibit_reasons = avic_vcpu_get_apicv_inhibit_reasons,
 	.alloc_apic_backing_page = svm_alloc_apic_backing_page,
+
+	.gmem_prepare = sev_gmem_prepare,
 };
 
 /*
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index d953ae41c619..9ece9612dbb9 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -725,6 +725,7 @@ void sev_es_unmap_ghcb(struct vcpu_svm *svm);
 struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu);
 void handle_rmp_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code);
 void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu);
+int sev_gmem_prepare(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, int max_order);
 
 /* vmenter.S */
 
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index feec0da93d98..ddea45279fef 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -66,8 +66,8 @@ static int kvm_gmem_prepare_folio(struct inode *inode, pgoff_t index, struct fol
 		gfn = slot->base_gfn + index - slot->gmem.pgoff;
 		rc = kvm_arch_gmem_prepare(kvm, gfn, pfn, compound_order(compound_head(page)));
 		if (rc) {
-			pr_warn_ratelimited("gmem: Failed to prepare folio for index %lx, error %d.\n",
-					    index, rc);
+			pr_warn_ratelimited("gmem: Failed to prepare folio for index %lx GFN %llx PFN %llx error %d.\n",
+					    index, gfn, pfn, rc);
 			return rc;
 		}
 	}
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v11 29/35] KVM: SEV: Implement gmem hook for invalidating private pages
  2023-12-30 17:23 [PATCH v11 00/35] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (27 preceding siblings ...)
  2023-12-30 17:23 ` [PATCH v11 28/35] KVM: SEV: Implement gmem hook for initializing private pages Michael Roth
@ 2023-12-30 17:23 ` Michael Roth
  2023-12-30 17:23 ` [PATCH v11 30/35] KVM: x86: Add gmem hook for determining max NPT mapping level Michael Roth
                   ` (5 subsequent siblings)
  34 siblings, 0 replies; 58+ messages in thread
From: Michael Roth @ 2023-12-30 17:23 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
	jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick, zhi.a.wang

Implement a platform hook to do the work of restoring the direct map
entries of gmem-managed pages and transitioning the corresponding RMP
table entries back to the default shared/hypervisor-owned state.
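
One run-time path that exercises this hook is userspace discarding a
range of a guest_memfd via hole-punching, roughly as sketched below
('gmem_fd' is assumed to be a descriptor returned by
KVM_CREATE_GUEST_MEMFD):

  #define _GNU_SOURCE
  #include <fcntl.h>
  #include <stdint.h>

  /*
   * Punch out one 2MB private range: gmem frees the backing pages and
   * invokes the invalidate hook, which transitions the RMP entries back
   * to shared and flushes the cache lines before the pages are handed
   * back to the host.
   */
  static int discard_private_range(int gmem_fd, uint64_t offset)
  {
          return fallocate(gmem_fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                           offset, 2 * 1024 * 1024);
  }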

Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 arch/x86/kvm/Kconfig   |  1 +
 arch/x86/kvm/svm/sev.c | 63 ++++++++++++++++++++++++++++++++++++++++++
 arch/x86/kvm/svm/svm.c |  1 +
 arch/x86/kvm/svm/svm.h |  2 ++
 4 files changed, 67 insertions(+)

diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index 79c002e1bb5c..eb8a09f9341a 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -126,6 +126,7 @@ config KVM_AMD_SEV
 	depends on CRYPTO_DEV_SP_PSP && !(KVM_AMD=y && CRYPTO_DEV_CCP_DD=m)
 	select KVM_GENERIC_PRIVATE_MEM
 	select HAVE_KVM_GMEM_PREPARE
+	select HAVE_KVM_GMEM_INVALIDATE
 	help
 	  Provides support for launching Encrypted VMs (SEV) and Encrypted VMs
 	  with Encrypted State (SEV-ES) on AMD processors.
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 91f53f4a6059..85f63b6842b6 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -4252,3 +4252,66 @@ int sev_gmem_prepare(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, int max_order)
 
 	return 0;
 }
+
+void sev_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end)
+{
+	kvm_pfn_t pfn;
+
+	pr_debug("%s: PFN start 0x%llx PFN end 0x%llx\n", __func__, start, end);
+
+	for (pfn = start; pfn < end;) {
+		bool use_2m_update = false;
+		int rc, rmp_level;
+		bool assigned;
+
+		rc = snp_lookup_rmpentry(pfn, &assigned, &rmp_level);
+		if (rc) {
+			pr_debug_ratelimited("SEV: Failed to retrieve RMP entry for PFN 0x%llx error %d\n",
+					     pfn, rc);
+			goto next_pfn;
+		}
+
+		if (!assigned)
+			goto next_pfn;
+
+		use_2m_update = IS_ALIGNED(pfn, PTRS_PER_PMD) &&
+				end >= (pfn + PTRS_PER_PMD) &&
+				rmp_level > PG_LEVEL_4K;
+
+		/*
+		 * If an unaligned PFN corresponds to a 2M region assigned as a
+		 * large page in the RMP table, PSMASH the region into individual
+		 * 4K RMP entries before attempting to convert a 4K sub-page.
+		 */
+		if (!use_2m_update && rmp_level > PG_LEVEL_4K) {
+			rc = snp_rmptable_psmash(pfn);
+			if (rc)
+				pr_err_ratelimited("SEV: Failed to PSMASH RMP entry for PFN 0x%llx error %d\n",
+						   pfn, rc);
+		}
+
+		rc = rmp_make_shared(pfn, use_2m_update ? PG_LEVEL_2M : PG_LEVEL_4K);
+		if (WARN_ON_ONCE(rc)) {
+			pr_err_ratelimited("SEV: Failed to update RMP entry for PFN 0x%llx error %d\n",
+					   pfn, rc);
+			goto next_pfn;
+		}
+
+		/*
+		 * SEV-ES avoids host/guest cache coherency issues through
+		 * WBINVD hooks issued via MMU notifiers during run-time, and
+		 * KVM's VM destroy path at shutdown. Those MMU notifier events
+		 * don't cover gmem since there is no requirement to map pages
+		 * to a HVA in order to use them for a running guest. While the
+		 * shutdown path would still likely cover things for SNP guests,
+		 * userspace may also free gmem pages during run-time via
+		 * hole-punching operations on the guest_memfd, so flush the
+		 * cache entries for these pages before freeing them back to
+		 * the host.
+		 */
+		clflush_cache_range(__va(pfn_to_hpa(pfn)),
+				    use_2m_update ? PMD_SIZE : PAGE_SIZE);
+next_pfn:
+		pfn += use_2m_update ? PTRS_PER_PMD : 1;
+	}
+}
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 32cef8626b57..f26b8c2a8be4 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -5067,6 +5067,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
 	.alloc_apic_backing_page = svm_alloc_apic_backing_page,
 
 	.gmem_prepare = sev_gmem_prepare,
+	.gmem_invalidate = sev_gmem_invalidate,
 };
 
 /*
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 9ece9612dbb9..a56109e100ac 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -726,6 +726,8 @@ struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu);
 void handle_rmp_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code);
 void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu);
 int sev_gmem_prepare(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, int max_order);
+void sev_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end);
+int sev_gmem_max_level(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, u8 *max_level);
 
 /* vmenter.S */
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v11 30/35] KVM: x86: Add gmem hook for determining max NPT mapping level
  2023-12-30 17:23 [PATCH v11 00/35] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (28 preceding siblings ...)
  2023-12-30 17:23 ` [PATCH v11 29/35] KVM: SEV: Implement gmem hook for invalidating " Michael Roth
@ 2023-12-30 17:23 ` Michael Roth
  2024-02-12 10:50   ` Paolo Bonzini
  2023-12-30 17:23 ` [PATCH v11 31/35] KVM: SEV: Avoid WBINVD for HVA-based MMU notifications for SNP Michael Roth
                   ` (4 subsequent siblings)
  34 siblings, 1 reply; 58+ messages in thread
From: Michael Roth @ 2023-12-30 17:23 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
	jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick, zhi.a.wang

In the case of SEV-SNP, whether or not a 2MB page can be mapped via a
2MB mapping in the guest's nested page table depends on whether or not
any subpages within the range have already been initialized as private
in the RMP table. The existing mixed-attribute tracking in KVM is
insufficient here, for instance:

  - gmem allocates 2MB page
  - guest issues PVALIDATE on 2MB page
  - guest later converts a subpage to shared
  - SNP host code issues PSMASH to split 2MB RMP mapping to 4K
  - KVM MMU splits NPT mapping to 4K

At this point there are no mixed attributes, so KVM would normally
allow 2MB NPT mappings again. However, the RMP table entries are now 4K
and cannot be promoted back to 2MB on the hypervisor side, so the NPT
mappings must still be limited to 4K to match them.

Add a hook to determine the max NPT mapping size in situations like
this.
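
The net effect on the fault path is to take the minimum of the gmem
backing size, the RMP entry level, and whatever limit the MMU already
computed. A toy standalone model of that clamping (levels follow the
kernel's PG_LEVEL_* numbering, where 1 = 4K and 2 = 2M):

  #include <stdio.h>

  static int min(int a, int b) { return a < b ? a : b; }

  int main(void)
  {
          int backing_level = 2;   /* gmem handed out a 2MB folio */
          int rmp_level = 1;       /* RMP entry was PSMASHed down to 4K */
          int fault_max_level = 2; /* the MMU's starting limit */

          int level = min(min(backing_level, rmp_level), fault_max_level);

          printf("NPT mapping limited to level %d (4K)\n", level);
          return 0;
  }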

Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 arch/x86/include/asm/kvm-x86-ops.h |  1 +
 arch/x86/include/asm/kvm_host.h    |  1 +
 arch/x86/kvm/mmu/mmu.c             | 12 ++++++++++--
 arch/x86/kvm/svm/sev.c             | 27 +++++++++++++++++++++++++++
 arch/x86/kvm/svm/svm.c             |  1 +
 5 files changed, 40 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index c4b7b0db7be3..b0a174213dad 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -140,6 +140,7 @@ KVM_X86_OP_OPTIONAL_RET0(vcpu_get_apicv_inhibit_reasons);
 KVM_X86_OP_OPTIONAL(get_untagged_addr)
 KVM_X86_OP_OPTIONAL(alloc_apic_backing_page)
 KVM_X86_OP_OPTIONAL_RET0(gmem_prepare)
+KVM_X86_OP_OPTIONAL_RET0(gmem_max_level)
 KVM_X86_OP_OPTIONAL(gmem_invalidate)
 
 #undef KVM_X86_OP
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 9e45402e51bc..ee1e81608e07 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1806,6 +1806,7 @@ struct kvm_x86_ops {
 	void *(*alloc_apic_backing_page)(struct kvm_vcpu *vcpu);
 	int (*gmem_prepare)(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, int max_order);
 	void (*gmem_invalidate)(kvm_pfn_t start, kvm_pfn_t end);
+	int (*gmem_max_level)(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, u8 *max_level);
 };
 
 struct kvm_x86_nested_ops {
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 1882096fba3e..21f44ec37b29 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4308,6 +4308,7 @@ static int kvm_faultin_pfn_private(struct kvm_vcpu *vcpu,
 				   struct kvm_page_fault *fault)
 {
 	int max_order, r;
+	u8 max_level;
 
 	if (!kvm_slot_can_be_private(fault->slot)) {
 		kvm_mmu_prepare_memory_fault_exit(vcpu, fault);
@@ -4321,8 +4322,15 @@ static int kvm_faultin_pfn_private(struct kvm_vcpu *vcpu,
 		return r;
 	}
 
-	fault->max_level = min(kvm_max_level_for_order(max_order),
-			       fault->max_level);
+	max_level = kvm_max_level_for_order(max_order);
+	r = static_call(kvm_x86_gmem_max_level)(vcpu->kvm, fault->pfn,
+						fault->gfn, &max_level);
+	if (r) {
+		kvm_release_pfn_clean(fault->pfn);
+		return r;
+	}
+
+	fault->max_level = min(max_level, fault->max_level);
 	fault->map_writable = !(fault->slot->flags & KVM_MEM_READONLY);
 
 	return RET_PF_CONTINUE;
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 85f63b6842b6..5eb836b73131 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -4315,3 +4315,30 @@ void sev_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end)
 		pfn += use_2m_update ? PTRS_PER_PMD : 1;
 	}
 }
+
+int sev_gmem_max_level(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, u8 *max_level)
+{
+	int level, rc;
+	bool assigned;
+
+	if (!sev_snp_guest(kvm))
+		return 0;
+
+	rc = snp_lookup_rmpentry(pfn, &assigned, &level);
+	if (rc) {
+		pr_err_ratelimited("SEV: RMP entry not found: GFN %llx PFN %llx level %d error %d\n",
+				   gfn, pfn, level, rc);
+		return -ENOENT;
+	}
+
+	if (!assigned) {
+		pr_err_ratelimited("SEV: RMP entry is not assigned: GFN %llx PFN %llx level %d\n",
+				   gfn, pfn, level);
+		return -EINVAL;
+	}
+
+	if (level < *max_level)
+		*max_level = level;
+
+	return 0;
+}
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index f26b8c2a8be4..f745022f7454 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -5067,6 +5067,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
 	.alloc_apic_backing_page = svm_alloc_apic_backing_page,
 
 	.gmem_prepare = sev_gmem_prepare,
+	.gmem_max_level = sev_gmem_max_level,
 	.gmem_invalidate = sev_gmem_invalidate,
 };
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v11 31/35] KVM: SEV: Avoid WBINVD for HVA-based MMU notifications for SNP
  2023-12-30 17:23 [PATCH v11 00/35] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (29 preceding siblings ...)
  2023-12-30 17:23 ` [PATCH v11 30/35] KVM: x86: Add gmem hook for determining max NPT mapping level Michael Roth
@ 2023-12-30 17:23 ` Michael Roth
  2023-12-30 17:23 ` [PATCH v11 32/35] KVM: SVM: Add module parameter to enable the SEV-SNP Michael Roth
                   ` (3 subsequent siblings)
  34 siblings, 0 replies; 58+ messages in thread
From: Michael Roth @ 2023-12-30 17:23 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
	jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick, zhi.a.wang

From: Ashish Kalra <ashish.kalra@amd.com>

With SNP/guest_memfd, private/encrypted memory should not be mappable,
so MMU notifications for HVA-mapped memory will only ever be relevant to
unencrypted guest memory. Therefore, the rationale behind issuing a
wbinvd_on_all_cpus() in sev_guest_memory_reclaimed() does not apply to
SNP guests, and the flush can be skipped for them.

Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
[mdr: Add some clarifications in commit]
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 arch/x86/kvm/svm/sev.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 5eb836b73131..2cc909cc18c1 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -2858,7 +2858,14 @@ static void sev_flush_encrypted_page(struct kvm_vcpu *vcpu, void *va)
 
 void sev_guest_memory_reclaimed(struct kvm *kvm)
 {
-	if (!sev_guest(kvm))
+	/*
+	 * With SNP+gmem, private/encrypted memory should be
+	 * unreachable via the hva-based mmu notifiers. Additionally,
+	 * for shared->private conversions, H/W coherency will ensure
+	 * that the first guest access to the page clears out any existing
+	 * dirty copies of that cacheline.
+	 */
+	if (!sev_guest(kvm) || sev_snp_guest(kvm))
 		return;
 
 	wbinvd_on_all_cpus();
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v11 32/35] KVM: SVM: Add module parameter to enable the SEV-SNP
  2023-12-30 17:23 [PATCH v11 00/35] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (30 preceding siblings ...)
  2023-12-30 17:23 ` [PATCH v11 31/35] KVM: SEV: Avoid WBINVD for HVA-based MMU notifications for SNP Michael Roth
@ 2023-12-30 17:23 ` Michael Roth
  2023-12-30 17:23 ` [PATCH v11 33/35] KVM: SEV: Provide support for SNP_GUEST_REQUEST NAE event Michael Roth
                   ` (2 subsequent siblings)
  34 siblings, 0 replies; 58+ messages in thread
From: Michael Roth @ 2023-12-30 17:23 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
	jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick, zhi.a.wang, Brijesh Singh

From: Brijesh Singh <brijesh.singh@amd.com>

Add a module parameter that can be used to enable or disable the SEV-SNP
feature. Now that KVM contains support for SNP, set the GHCB hypervisor
feature flag to indicate that SNP is supported.
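
Assuming the module keeps its upstream kvm_amd name, the parameter can
then be controlled in the usual ways, e.g.:

  modprobe kvm_amd sev_snp=0      # at module load time
  kvm_amd.sev_snp=0               # on the kernel command line

and, since it is registered with mode 0444, the current value can be
read back from /sys/module/kvm_amd/parameters/sev_snp but not changed
at run-time.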

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 arch/x86/kvm/svm/sev.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 2cc909cc18c1..30a2e75fd94f 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -61,7 +61,8 @@ module_param_named(sev_es, sev_es_enabled, bool, 0444);
 static bool sev_es_debug_swap_enabled = true;
 module_param_named(debug_swap, sev_es_debug_swap_enabled, bool, 0444);
 
-static bool sev_snp_enabled;
+static bool sev_snp_enabled = true;
+module_param_named(sev_snp, sev_snp_enabled, bool, 0444);
 #else
 #define sev_enabled false
 #define sev_es_enabled false
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v11 33/35] KVM: SEV: Provide support for SNP_GUEST_REQUEST NAE event
  2023-12-30 17:23 [PATCH v11 00/35] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (31 preceding siblings ...)
  2023-12-30 17:23 ` [PATCH v11 32/35] KVM: SVM: Add module parameter to enable the SEV-SNP Michael Roth
@ 2023-12-30 17:23 ` Michael Roth
  2023-12-30 17:23 ` [PATCH v11 34/35] crypto: ccp: Add the SNP_SET_CONFIG_{START,END} commands Michael Roth
  2023-12-30 17:23 ` [PATCH v11 35/35] KVM: SEV: Provide support for SNP_EXTENDED_GUEST_REQUEST NAE event Michael Roth
  34 siblings, 0 replies; 58+ messages in thread
From: Michael Roth @ 2023-12-30 17:23 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
	jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick, zhi.a.wang, Brijesh Singh, Alexey Kardashevskiy

From: Brijesh Singh <brijesh.singh@amd.com>

Version 2 of the GHCB specification added support for the SNP Guest
Request Message NAE event. The event allows an SEV-SNP guest to make
requests to the SEV-SNP firmware through the hypervisor using the
SNP_GUEST_REQUEST API defined in the SEV-SNP firmware specification.

This is used by guests primarily to request attestation reports from
firmware. Other request types are available as well, but the specifics
of what requests a guest is making are opaque to the hypervisor, which
only serves as a proxy for the guest requests and firmware responses.

Implement handling for these events.
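
For context, these events typically originate from the sev-guest driver
inside the guest. A rough sketch of how a guest might trigger one via
the /dev/sev-guest interface (error handling and parsing of the returned
report are elided):

  #include <fcntl.h>
  #include <string.h>
  #include <sys/ioctl.h>
  #include <linux/sev-guest.h>

  /*
   * Ask firmware for an attestation report; this surfaces to the
   * hypervisor as an SVM_VMGEXIT_GUEST_REQUEST NAE event.
   */
  static int get_snp_report(void)
  {
          struct snp_report_req req = {};
          struct snp_report_resp resp = {};
          struct snp_guest_request_ioctl io = {
                  .msg_version = 1,
                  .req_data = (__u64)(unsigned long)&req,
                  .resp_data = (__u64)(unsigned long)&resp,
          };
          int fd = open("/dev/sev-guest", O_RDWR);

          if (fd < 0)
                  return -1;

          /* caller-chosen nonce, echoed back in the signed report */
          memset(req.user_data, 0x42, sizeof(req.user_data));

          return ioctl(fd, SNP_GET_REPORT, &io);
  }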

Co-developed-by: Alexey Kardashevskiy <aik@amd.com>
Signed-off-by: Alexey Kardashevskiy <aik@amd.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
[mdr: ensure FW command failures are indicated to guest, drop extended
 request handling to be re-written as separate patch, massage commit]
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 arch/x86/kvm/svm/sev.c         | 83 ++++++++++++++++++++++++++++++++++
 include/uapi/linux/sev-guest.h |  9 ++++
 2 files changed, 92 insertions(+)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 30a2e75fd94f..05051e36926d 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -19,6 +19,7 @@
 #include <linux/misc_cgroup.h>
 #include <linux/processor.h>
 #include <linux/trace_events.h>
+#include <uapi/linux/sev-guest.h>
 
 #include <asm/pkru.h>
 #include <asm/trapnr.h>
@@ -3109,6 +3110,7 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
 	case SVM_VMGEXIT_HV_FEATURES:
 	case SVM_VMGEXIT_PSC:
 	case SVM_VMGEXIT_TERM_REQUEST:
+	case SVM_VMGEXIT_GUEST_REQUEST:
 		break;
 	default:
 		reason = GHCB_ERR_INVALID_EVENT;
@@ -3539,6 +3541,83 @@ static int sev_snp_ap_creation(struct vcpu_svm *svm)
 	return ret;
 }
 
+static bool snp_setup_guest_buf(struct kvm *kvm, struct sev_data_snp_guest_request *data,
+				gpa_t req_gpa, gpa_t resp_gpa)
+{
+	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+	kvm_pfn_t req_pfn, resp_pfn;
+
+	if (!IS_ALIGNED(req_gpa, PAGE_SIZE) || !IS_ALIGNED(resp_gpa, PAGE_SIZE))
+		return false;
+
+	req_pfn = gfn_to_pfn(kvm, gpa_to_gfn(req_gpa));
+	if (is_error_noslot_pfn(req_pfn))
+		return false;
+
+	resp_pfn = gfn_to_pfn(kvm, gpa_to_gfn(resp_gpa));
+	if (is_error_noslot_pfn(resp_pfn))
+		return false;
+
+	if (rmp_make_private(resp_pfn, 0, PG_LEVEL_4K, 0, true))
+		return false;
+
+	data->gctx_paddr = __psp_pa(sev->snp_context);
+	data->req_paddr = __sme_set(req_pfn << PAGE_SHIFT);
+	data->res_paddr = __sme_set(resp_pfn << PAGE_SHIFT);
+
+	return true;
+}
+
+static bool snp_cleanup_guest_buf(struct sev_data_snp_guest_request *data)
+{
+	u64 pfn = __sme_clr(data->res_paddr) >> PAGE_SHIFT;
+
+	if (snp_page_reclaim(pfn))
+		return false;
+
+	if (rmp_make_shared(pfn, PG_LEVEL_4K))
+		return false;
+
+	return true;
+}
+
+static bool __snp_handle_guest_req(struct kvm *kvm, gpa_t req_gpa, gpa_t resp_gpa,
+				   sev_ret_code *fw_err)
+{
+	struct sev_data_snp_guest_request data = {0};
+	struct kvm_sev_info *sev;
+	bool ret = true;
+
+	if (!sev_snp_guest(kvm))
+		return false;
+
+	sev = &to_kvm_svm(kvm)->sev_info;
+
+	if (!snp_setup_guest_buf(kvm, &data, req_gpa, resp_gpa))
+		return false;
+
+	if (sev_issue_cmd(kvm, SEV_CMD_SNP_GUEST_REQUEST, &data, fw_err))
+		ret = false;
+
+	if (!snp_cleanup_guest_buf(&data))
+		ret = false;
+
+	return ret;
+}
+
+static void snp_handle_guest_req(struct vcpu_svm *svm, gpa_t req_gpa, gpa_t resp_gpa)
+{
+	struct kvm_vcpu *vcpu = &svm->vcpu;
+	struct kvm *kvm = vcpu->kvm;
+	sev_ret_code fw_err = 0;
+	int vmm_ret = 0;
+
+	if (!__snp_handle_guest_req(kvm, req_gpa, resp_gpa, &fw_err))
+		vmm_ret = SNP_GUEST_VMM_ERR_GENERIC;
+
+	ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, SNP_GUEST_ERR(vmm_ret, fw_err));
+}
+
 static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
 {
 	struct vmcb_control_area *control = &svm->vmcb->control;
@@ -3799,6 +3878,10 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
 		vcpu->run->system_event.ndata = 1;
 		vcpu->run->system_event.data[0] = control->ghcb_gpa;
 		break;
+	case SVM_VMGEXIT_GUEST_REQUEST:
+		snp_handle_guest_req(svm, control->exit_info_1, control->exit_info_2);
+		ret = 1;
+		break;
 	case SVM_VMGEXIT_UNSUPPORTED_EVENT:
 		vcpu_unimpl(vcpu,
 			    "vmgexit: unsupported event - exit_info_1=%#llx, exit_info_2=%#llx\n",
diff --git a/include/uapi/linux/sev-guest.h b/include/uapi/linux/sev-guest.h
index 154a87a1eca9..7bd78e258569 100644
--- a/include/uapi/linux/sev-guest.h
+++ b/include/uapi/linux/sev-guest.h
@@ -89,8 +89,17 @@ struct snp_ext_report_req {
 #define SNP_GUEST_FW_ERR_MASK		GENMASK_ULL(31, 0)
 #define SNP_GUEST_VMM_ERR_SHIFT		32
 #define SNP_GUEST_VMM_ERR(x)		(((u64)x) << SNP_GUEST_VMM_ERR_SHIFT)
+#define SNP_GUEST_FW_ERR(x)		((x) & SNP_GUEST_FW_ERR_MASK)
+#define SNP_GUEST_ERR(vmm_err, fw_err)	(SNP_GUEST_VMM_ERR(vmm_err) | \
+					 SNP_GUEST_FW_ERR(fw_err))
 
+/*
+ * The GHCB spec only formally defines INVALID_LEN/BUSY VMM errors, so also
+ * define a GENERIC error code here such that it won't ever conflict with
+ * GHCB-defined errors if any get added in the future.
+ */
 #define SNP_GUEST_VMM_ERR_INVALID_LEN	1
 #define SNP_GUEST_VMM_ERR_BUSY		2
+#define SNP_GUEST_VMM_ERR_GENERIC	BIT(31)
 
 #endif /* __UAPI_LINUX_SEV_GUEST_H_ */
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v11 34/35] crypto: ccp: Add the SNP_SET_CONFIG_{START,END} commands
  2023-12-30 17:23 [PATCH v11 00/35] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (32 preceding siblings ...)
  2023-12-30 17:23 ` [PATCH v11 33/35] KVM: SEV: Provide support for SNP_GUEST_REQUEST NAE event Michael Roth
@ 2023-12-30 17:23 ` Michael Roth
  2023-12-30 17:23 ` [PATCH v11 35/35] KVM: SEV: Provide support for SNP_EXTENDED_GUEST_REQUEST NAE event Michael Roth
  34 siblings, 0 replies; 58+ messages in thread
From: Michael Roth @ 2023-12-30 17:23 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
	jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick, zhi.a.wang

These commands can be used to create a transaction so that commands which
update the reported TCB, such as SNP_SET_CONFIG/SNP_COMMIT, along with
updates to userspace-supplied certificates, can be handled atomically
relative to any extended guest requests issued by SNP guests while the
updates are taking place.

Without this interface, there is a risk that a guest will be given
certificate information that does not correspond to the VCEK/VLEK used
to sign a particular attestation report. Avoiding that would otherwise
require pausing all running guests in advance, disrupting every guest in
the system even when no attestation requests are being made. Even then,
care would be needed to ensure that KVM does not pass along certificate
information that was fetched from userspace before the guests were
paused.

This interface also provides flexibility in how similar firmware
maintenance activity can be handled in the future without passing
unnecessary management complexity on to userspace.
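
As a sketch of the intended usage from host userspace (issue_cmd() here
is just a thin wrapper around the existing /dev/sev SEV_ISSUE_CMD ioctl;
the actual SNP_SET_CONFIG/SNP_COMMIT and certificate updates are
elided):

  #include <errno.h>
  #include <sys/ioctl.h>
  #include <linux/psp-sev.h>

  static int issue_cmd(int sev_fd, __u32 cmd, void *data)
  {
          struct sev_issue_cmd input = {
                  .cmd = cmd,
                  .data = (__u64)(unsigned long)data,
          };

          return ioctl(sev_fd, SEV_ISSUE_CMD, &input);
  }

  static int update_reported_tcb_and_certs(int sev_fd)
  {
          struct sev_user_data_snp_config_transaction start = {}, end = {};

          if (issue_cmd(sev_fd, SNP_SET_CONFIG_START, &start))
                  return -1;

          /* ... issue SNP_SET_CONFIG/SNP_COMMIT, refresh certificate blobs ... */

          if (issue_cmd(sev_fd, SNP_SET_CONFIG_END, &end))
                  return -1;

          /* a mismatch means another transaction raced with this one */
          return start.id == end.id ? 0 : -EAGAIN;
  }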

Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 Documentation/virt/coco/sev-guest.rst | 33 ++++++++++++++++++--
 arch/x86/include/asm/sev.h            |  4 +++
 arch/x86/virt/svm/sev.c               | 31 +++++++++++++++++++
 drivers/crypto/ccp/sev-dev.c          | 44 +++++++++++++++++++++++++++
 include/uapi/linux/psp-sev.h          | 12 ++++++++
 5 files changed, 122 insertions(+), 2 deletions(-)

diff --git a/Documentation/virt/coco/sev-guest.rst b/Documentation/virt/coco/sev-guest.rst
index 4f696aacc866..0426ebad7671 100644
--- a/Documentation/virt/coco/sev-guest.rst
+++ b/Documentation/virt/coco/sev-guest.rst
@@ -127,8 +127,6 @@ the SEV-SNP specification for further details.
 
 The SNP_GET_EXT_REPORT ioctl is similar to the SNP_GET_REPORT. The difference is
 related to the additional certificate data that is returned with the report.
-The certificate data returned is being provided by the hypervisor through the
-SNP_SET_EXT_CONFIG.
 
 The ioctl uses the SNP_GUEST_REQUEST (MSG_REPORT_REQ) command provided by the SEV-SNP
 firmware to get the attestation report.
@@ -175,6 +173,37 @@ SNP_CONFIG command defined in the SEV-SNP spec. The current values of the
 firmware parameters affected by this command can be queried via
 SNP_PLATFORM_STATUS.
 
+2.7 SNP_SET_CONFIG_START / SNP_SET_CONFIG_END
+---------------------------------------------
+:Technology: sev-snp
+:Type: hypervisor ioctl cmd
+:Parameters (out): struct sev_user_data_snp_config_transaction
+:Returns (out): 0 on success, -negative on error
+
+When requesting attestation reports, SNP guests have the option of issuing
+an extended guest request which allows host userspace to supply additional
+certificate data that can be used to validate the signature used to sign
+the attestation report. This signature is generated using a key that is
+derived from the reported TCB that can be set via the SNP_SET_CONFIG and
+SNP_COMMIT ioctls, so the accompanying certificate data needs to be kept in
+sync with the changes made to the reported TCB via these ioctls.
+
+To allow for this, SNP_SET_CONFIG_START can be issued prior to performing
+any updates to the reported TCB or certificate data that will be fetched
+from userspace. Any attestation report requests via extended guest requests
+that are in-progress, or received after SNP_SET_CONFIG_START is issued, will
+result in the guest receiving a GHCB-defined error message instructing it to
+retry the request. Once the updates are completed on the host,
+SNP_SET_CONFIG_END must be issued to resume normal servicing of extended
+guest requests.
+
+In general, hosts should avoid having more than one outstanding
+SNP_SET_CONFIG_{START,END} transaction in flight at any point in time. Each
+ioctl returns a transaction ID in the response, so the caller can check
+whether the start and end IDs match. If they don't, the caller should
+assume the transaction has been invalidated and retry the full update
+sequence.
+
 3. SEV-SNP CPUID Enforcement
 ============================
 
diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index e84dd1d2d8ab..925578ad34e6 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -268,6 +268,8 @@ int psmash(u64 pfn);
 int rmp_make_private(u64 pfn, u64 gpa, enum pg_level level, int asid, bool immutable);
 int rmp_make_shared(u64 pfn, enum pg_level level);
 void snp_leak_pages(u64 pfn, unsigned int npages);
+u64 snp_config_transaction_start(void);
+u64 snp_config_transaction_end(void);
 #else
 static inline bool snp_probe_rmptable_info(void) { return false; }
 static inline int snp_lookup_rmpentry(u64 pfn, bool *assigned, int *level) { return -ENODEV; }
@@ -280,6 +282,8 @@ static inline int rmp_make_private(u64 pfn, u64 gpa, enum pg_level level, int as
 }
 static inline int rmp_make_shared(u64 pfn, enum pg_level level) { return -ENODEV; }
 static inline void snp_leak_pages(u64 pfn, unsigned int npages) {}
+static inline u64 snp_config_transaction_start(void) { return 0; }
+static inline u64 snp_config_transaction_end(void) { return 0; }
 #endif
 
 #endif
diff --git a/arch/x86/virt/svm/sev.c b/arch/x86/virt/svm/sev.c
index 0f2e1ce241b5..fc9e1b7fc187 100644
--- a/arch/x86/virt/svm/sev.c
+++ b/arch/x86/virt/svm/sev.c
@@ -71,6 +71,11 @@ static DEFINE_SPINLOCK(snp_leaked_pages_list_lock);
 
 static unsigned long snp_nr_leaked_pages;
 
+/* For synchronizing TCB updates with extended guest requests */
+static DEFINE_MUTEX(snp_transaction_lock);
+static u64 snp_transaction_id;
+static bool snp_transaction_pending;
+
 #undef pr_fmt
 #define pr_fmt(fmt)	"SEV-SNP: " fmt
 
@@ -511,3 +516,29 @@ void snp_leak_pages(u64 pfn, unsigned int npages)
 	spin_unlock(&snp_leaked_pages_list_lock);
 }
 EXPORT_SYMBOL_GPL(snp_leak_pages);
+
+u64 snp_config_transaction_start(void)
+{
+	u64 id;
+
+	mutex_lock(&snp_transaction_lock);
+	snp_transaction_pending = true;
+	id = ++snp_transaction_id;
+	mutex_unlock(&snp_transaction_lock);
+
+	return id;
+}
+EXPORT_SYMBOL_GPL(snp_config_transaction_start);
+
+u64 snp_config_transaction_end(void)
+{
+	u64 id;
+
+	mutex_lock(&snp_transaction_lock);
+	snp_transaction_pending = false;
+	id = snp_transaction_id;
+	mutex_unlock(&snp_transaction_lock);
+
+	return id;
+}
+EXPORT_SYMBOL_GPL(snp_config_transaction_end);
diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index c5b26b3fe7ff..d81f86d2697a 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -2035,6 +2035,44 @@ static int sev_ioctl_do_snp_set_config(struct sev_issue_cmd *argp, bool writable
 	return __sev_do_cmd_locked(SEV_CMD_SNP_CONFIG, &config, &argp->error);
 }
 
+static int sev_ioctl_do_snp_set_config_start(struct sev_issue_cmd *argp, bool writable)
+{
+	struct sev_user_data_snp_config_transaction transaction = {0};
+	struct sev_device *sev = psp_master->sev_data;
+
+	if (!sev->snp_initialized || !argp->data)
+		return -EINVAL;
+
+	if (!writable)
+		return -EPERM;
+
+	transaction.id = snp_config_transaction_start();
+
+	if (copy_to_user((void __user *)argp->data, &transaction, sizeof(transaction)))
+		return -EFAULT;
+
+	return 0;
+}
+
+static int sev_ioctl_do_snp_set_config_end(struct sev_issue_cmd *argp, bool writable)
+{
+	struct sev_user_data_snp_config_transaction transaction = {0};
+	struct sev_device *sev = psp_master->sev_data;
+
+	if (!sev->snp_initialized || !argp->data)
+		return -EINVAL;
+
+	if (!writable)
+		return -EPERM;
+
+	transaction.id = snp_config_transaction_end();
+
+	if (copy_to_user((void __user *)argp->data, &transaction, sizeof(transaction)))
+		return -EFAULT;
+
+	return 0;
+}
+
 static long sev_ioctl(struct file *file, unsigned int ioctl, unsigned long arg)
 {
 	void __user *argp = (void __user *)arg;
@@ -2095,6 +2133,12 @@ static long sev_ioctl(struct file *file, unsigned int ioctl, unsigned long arg)
 	case SNP_SET_CONFIG:
 		ret = sev_ioctl_do_snp_set_config(&input, writable);
 		break;
+	case SNP_SET_CONFIG_START:
+		ret = sev_ioctl_do_snp_set_config_start(&input, writable);
+		break;
+	case SNP_SET_CONFIG_END:
+		ret = sev_ioctl_do_snp_set_config_end(&input, writable);
+		break;
 	default:
 		ret = -EINVAL;
 		goto out;
diff --git a/include/uapi/linux/psp-sev.h b/include/uapi/linux/psp-sev.h
index f28d4fb5bc21..9deacb894b1e 100644
--- a/include/uapi/linux/psp-sev.h
+++ b/include/uapi/linux/psp-sev.h
@@ -31,6 +31,8 @@ enum {
 	SNP_PLATFORM_STATUS,
 	SNP_COMMIT,
 	SNP_SET_CONFIG,
+	SNP_SET_CONFIG_START,
+	SNP_SET_CONFIG_END,
 
 	SEV_MAX,
 };
@@ -214,6 +216,16 @@ struct sev_user_data_snp_config {
 	__u8 rsvd1[52];
 } __packed;
 
+/**
+ * struct sev_user_data_snp_config_transaction - metadata for config transactions
+ *
+ * @id: the ID of the transaction started/ended by a call to SNP_SET_CONFIG_START
+ *	or SNP_SET_CONFIG_END, respectively.
+ */
+struct sev_user_data_snp_config_transaction {
+	__u64 id;		/* Out */
+} __packed;
+
 /**
  * struct sev_issue_cmd - SEV ioctl parameters
  *
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v11 35/35] KVM: SEV: Provide support for SNP_EXTENDED_GUEST_REQUEST NAE event
  2023-12-30 17:23 [PATCH v11 00/35] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (33 preceding siblings ...)
  2023-12-30 17:23 ` [PATCH v11 34/35] crypto: ccp: Add the SNP_SET_CONFIG_{START,END} commands Michael Roth
@ 2023-12-30 17:23 ` Michael Roth
  34 siblings, 0 replies; 58+ messages in thread
From: Michael Roth @ 2023-12-30 17:23 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
	jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick, zhi.a.wang

Version 2 of the GHCB specification added support for the SNP Extended
Guest Request Message NAE event. This event serves a nearly identical
purpose to the previously-added SNP_GUEST_REQUEST event, but additionally
allows certificate data to be supplied via a guest-supplied buffer, to be
used mainly for verifying the signature of an attestation report as
returned by firmware.

This certificate data is supplied by userspace, so unlike with
SNP_GUEST_REQUEST events, SNP_EXTENDED_GUEST_REQUEST events are first
forwarded to userspace via a KVM_EXIT_VMGEXIT exit type, and then the
firmware request is made only afterward.

Implement handling for these events.

There is a potential for race conditions where the userspace-supplied
certificate data may be out-of-sync relative to the reported TCB that
firmware will use when signing attestation reports. To address this,
make use of the transaction/synchronization mechanisms added by the
SNP_SET_CONFIG_{START,END} SEV device ioctls so that the guest is told
to retry the request whenever an update to the reported TCB or the
userspace-supplied certificates may have occurred or may be in progress
while an extended guest request is being processed.
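
From the VMM side, the handling might be sketched as below. This assumes
kvm_run gained the 'vmgexit' member along with the PSC exits earlier in
this series, a flat GPA->HVA mapping for the shared region, and a
hypothetical fetch_certs() helper that returns the certificate blob size
and fills the buffer only when it is large enough:

  #include <linux/kvm.h>
  #include <sys/types.h>

  /*
   * Hypothetical: returns the blob size in bytes (GHCB certificate
   * table format), filling 'buf' only if 'len' suffices; negative if
   * the certificates are temporarily unavailable.
   */
  extern ssize_t fetch_certs(void *buf, size_t len);

  static void handle_ext_guest_req(struct kvm_run *run, char *gpa_to_hva_base)
  {
          __u64 npages = run->vmgexit.ext_guest_req.data_npages;
          void *dst = gpa_to_hva_base + run->vmgexit.ext_guest_req.data_gpa;
          ssize_t need = fetch_certs(dst, npages * 4096);

          if (need < 0) {
                  run->vmgexit.ext_guest_req.ret = 2;  /* tell guest to retry */
          } else if ((__u64)need > npages * 4096) {
                  run->vmgexit.ext_guest_req.data_npages = (need + 4095) / 4096;
                  run->vmgexit.ext_guest_req.ret = 1;  /* buffer too small */
          } else {
                  run->vmgexit.ext_guest_req.ret = 0;  /* certs written */
          }
  }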

Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 Documentation/virt/kvm/api.rst | 26 ++++++++++++
 arch/x86/include/asm/sev.h     |  4 ++
 arch/x86/kvm/svm/sev.c         | 75 ++++++++++++++++++++++++++++++++++
 arch/x86/kvm/svm/svm.h         |  3 ++
 arch/x86/virt/svm/sev.c        | 20 +++++++++
 include/uapi/linux/kvm.h       |  6 +++
 6 files changed, 134 insertions(+)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 2a526b4f8e06..960e2153d468 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -7037,6 +7037,7 @@ values in kvm_run even if the corresponding bit in kvm_dirty_regs is not set.
 		struct kvm_user_vmgexit {
 		#define KVM_USER_VMGEXIT_PSC_MSR	1
 		#define KVM_USER_VMGEXIT_PSC		2
+		#define KVM_USER_VMGEXIT_EXT_GUEST_REQ	3
 			__u32 type; /* KVM_USER_VMGEXIT_* type */
 			union {
 				struct {
@@ -7050,6 +7051,11 @@ values in kvm_run even if the corresponding bit in kvm_dirty_regs is not set.
 					__u64 shared_gpa;
 					__u64 ret;
 				} psc;
+				struct {
+					__u64 data_gpa;
+					__u64 data_npages;
+					__u32 ret;
+				} ext_guest_req;
 			};
 		};
 
@@ -7079,6 +7085,26 @@ private/shared state. Userspace will return a value in 'ret' that is in
 agreement with the GHCB-defined return values that the guest will expect
 in the SW_EXITINFO2 field of the GHCB in response to these requests.
 
+For the KVM_USER_VMGEXIT_EXT_GUEST_REQ type, the ext_guest_req union type
+is used. The kernel will supply in 'data_gpa' the value the guest supplied
+via the RAX field of the GHCB when issuing the extended guest request.
+'data_npages' will similarly contain the value the guest supplied in RBX,
+denoting the number of shared pages available to write the certificate
+data into.
+
+  - If the supplied number of pages is sufficient, userspace should write
+    the certificate data blob (in the format defined by the GHCB spec) in
+    the address indicated by 'data_gpa' and set 'ret' to 0.
+
+  - If the number of pages supplied is not sufficient, userspace must write
+    the required number of pages in 'data_npages' and then set 'ret' to 1.
+
+  - If userspace is temporarily unable to handle the request, 'ret' should
+    be set to 2 to inform the guest to retry later.
+
+  - If some other error occurred, userspace should set 'ret' to a non-zero
+    value that is distinct from the specific return values mentioned above.
+
 6. Capabilities that can be enabled on vCPUs
 ============================================
 
diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index 925578ad34e6..25f533827d62 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -270,6 +270,8 @@ int rmp_make_shared(u64 pfn, enum pg_level level);
 void snp_leak_pages(u64 pfn, unsigned int npages);
 u64 snp_config_transaction_start(void);
 u64 snp_config_transaction_end(void);
+u64 snp_config_transaction_get_id(void);
+bool snp_config_transaction_is_stale(u64 id);
 #else
 static inline bool snp_probe_rmptable_info(void) { return false; }
 static inline int snp_lookup_rmpentry(u64 pfn, bool *assigned, int *level) { return -ENODEV; }
@@ -284,6 +286,8 @@ static inline int rmp_make_shared(u64 pfn, enum pg_level level) { return -ENODEV
 static inline void snp_leak_pages(u64 pfn, unsigned int npages) {}
 static inline u64 snp_config_transaction_start(void) { return 0; }
 static inline u64 snp_config_transaction_end(void) { return 0; }
+static inline u64 snp_config_transaction_get_id(void) { return 0; }
+static inline bool snp_config_transaction_is_stale(u64 id) { return false; }
 #endif
 
 #endif
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 05051e36926d..dda195999c42 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3111,6 +3111,7 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
 	case SVM_VMGEXIT_PSC:
 	case SVM_VMGEXIT_TERM_REQUEST:
 	case SVM_VMGEXIT_GUEST_REQUEST:
+	case SVM_VMGEXIT_EXT_GUEST_REQUEST:
 		break;
 	default:
 		reason = GHCB_ERR_INVALID_EVENT;
@@ -3618,6 +3619,77 @@ static void snp_handle_guest_req(struct vcpu_svm *svm, gpa_t req_gpa, gpa_t resp
 	ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, SNP_GUEST_ERR(vmm_ret, fw_err));
 }
 
+static int snp_complete_ext_guest_req(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_svm *svm = to_svm(vcpu);
+	struct vmcb_control_area *control;
+	struct kvm *kvm = vcpu->kvm;
+	sev_ret_code fw_err = 0;
+	int vmm_ret;
+
+	vmm_ret = vcpu->run->vmgexit.ext_guest_req.ret;
+	if (vmm_ret) {
+		if (vmm_ret == SNP_GUEST_VMM_ERR_INVALID_LEN)
+			vcpu->arch.regs[VCPU_REGS_RBX] =
+				vcpu->run->vmgexit.ext_guest_req.data_npages;
+		goto abort_request;
+	}
+
+	control = &svm->vmcb->control;
+
+	if (!__snp_handle_guest_req(kvm, control->exit_info_1, control->exit_info_2,
+				    &fw_err))
+		vmm_ret = SNP_GUEST_VMM_ERR_GENERIC;
+
+	/*
+	 * Give errors related to stale transactions precedence to provide more
+	 * potential options for servicing firmware while guests are running.
+	 */
+	if (snp_config_transaction_is_stale(svm->snp_transaction_id))
+		vmm_ret = SNP_GUEST_VMM_ERR_BUSY;
+
+abort_request:
+	ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, SNP_GUEST_ERR(vmm_ret, fw_err));
+
+	return 1; /* resume guest */
+}
+
+static int snp_begin_ext_guest_req(struct kvm_vcpu *vcpu)
+{
+	int vmm_ret = SNP_GUEST_VMM_ERR_GENERIC;
+	struct vcpu_svm *svm = to_svm(vcpu);
+	unsigned long data_npages;
+	sev_ret_code fw_err = 0;
+	gpa_t data_gpa;
+
+	if (!sev_snp_guest(vcpu->kvm))
+		goto abort_request;
+
+	data_gpa = vcpu->arch.regs[VCPU_REGS_RAX];
+	data_npages = vcpu->arch.regs[VCPU_REGS_RBX];
+
+	if (!IS_ALIGNED(data_gpa, PAGE_SIZE))
+		goto abort_request;
+
+	svm->snp_transaction_id = snp_config_transaction_get_id();
+	if (snp_config_transaction_is_stale(svm->snp_transaction_id)) {
+		vmm_ret = SNP_GUEST_VMM_ERR_BUSY;
+		goto abort_request;
+	}
+
+	vcpu->run->exit_reason = KVM_EXIT_VMGEXIT;
+	vcpu->run->vmgexit.type = KVM_USER_VMGEXIT_EXT_GUEST_REQ;
+	vcpu->run->vmgexit.ext_guest_req.data_gpa = data_gpa;
+	vcpu->run->vmgexit.ext_guest_req.data_npages = data_npages;
+	vcpu->arch.complete_userspace_io = snp_complete_ext_guest_req;
+
+	return 0; /* forward request to userspace */
+
+abort_request:
+	ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, SNP_GUEST_ERR(vmm_ret, fw_err));
+	return 1; /* resume guest */
+}
+
 static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
 {
 	struct vmcb_control_area *control = &svm->vmcb->control;
@@ -3882,6 +3954,9 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
 		snp_handle_guest_req(svm, control->exit_info_1, control->exit_info_2);
 		ret = 1;
 		break;
+	case SVM_VMGEXIT_EXT_GUEST_REQUEST:
+		ret = snp_begin_ext_guest_req(vcpu);
+		break;
 	case SVM_VMGEXIT_UNSUPPORTED_EVENT:
 		vcpu_unimpl(vcpu,
 			    "vmgexit: unsupported event - exit_info_1=%#llx, exit_info_2=%#llx\n",
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index a56109e100ac..a2ac6dc3a79a 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -307,6 +307,9 @@ struct vcpu_svm {
 
 	/* Guest GIF value, used when vGIF is not enabled */
 	bool guest_gif;
+
+	/* Transaction ID associated with SNP config updates */
+	u64 snp_transaction_id;
 };
 
 struct svm_cpu_data {
diff --git a/arch/x86/virt/svm/sev.c b/arch/x86/virt/svm/sev.c
index fc9e1b7fc187..fee273a84030 100644
--- a/arch/x86/virt/svm/sev.c
+++ b/arch/x86/virt/svm/sev.c
@@ -542,3 +542,23 @@ u64 snp_config_transaction_end(void)
 	return id;
 }
 EXPORT_SYMBOL_GPL(snp_config_transaction_end);
+
+u64 snp_config_transaction_get_id(void)
+{
+	return snp_transaction_id;
+}
+EXPORT_SYMBOL_GPL(snp_config_transaction_get_id);
+
+bool snp_config_transaction_is_stale(u64 id)
+{
+	bool stale = false;
+
+	mutex_lock(&snp_transaction_lock);
+	if (snp_transaction_pending ||
+	    id != snp_transaction_id)
+		stale = true;
+	mutex_unlock(&snp_transaction_lock);
+
+	return stale;
+}
+EXPORT_SYMBOL_GPL(snp_config_transaction_is_stale);
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index e0599144387b..fe8994b95de9 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -170,6 +170,7 @@ struct kvm_xen_exit {
 struct kvm_user_vmgexit {
 #define KVM_USER_VMGEXIT_PSC_MSR	1
 #define KVM_USER_VMGEXIT_PSC		2
+#define KVM_USER_VMGEXIT_EXT_GUEST_REQ	3
 	__u32 type; /* KVM_USER_VMGEXIT_* type */
 	union {
 		struct {
@@ -183,6 +184,11 @@ struct kvm_user_vmgexit {
 			__u64 shared_gpa;
 			__u64 ret;
 		} psc;
+		struct {
+			__u64 data_gpa;
+			__u64 data_npages;
+			__u32 ret;
+		} ext_guest_req;
 	};
 };
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* Re: [PATCH v11 26/35] KVM: SEV: Support SEV-SNP AP Creation NAE event
  2023-12-30 17:23 ` [PATCH v11 26/35] KVM: SEV: Support SEV-SNP AP Creation NAE event Michael Roth
@ 2024-01-05 22:08   ` Jacob Xu
  2024-01-08 15:53     ` Sean Christopherson
  0 siblings, 1 reply; 58+ messages in thread
From: Jacob Xu @ 2024-01-05 22:08 UTC (permalink / raw)
  To: Michael Roth
  Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
	jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick, zhi.a.wang, Brijesh Singh, Adam Dunlap

On Sat, Dec 30, 2023 at 9:32 AM Michael Roth <michael.roth@amd.com> wrote:
>
> From: Tom Lendacky <thomas.lendacky@amd.com>
>
> Add support for the SEV-SNP AP Creation NAE event. This allows SEV-SNP
> guests to alter the register state of the APs on their own. This gives
> the guest a way of simulating INIT-SIPI.
>
> A new event, KVM_REQ_UPDATE_PROTECTED_GUEST_STATE, is created and used
> so as to avoid updating the VMSA pointer while the vCPU is running.
>
> For CREATE
>   The guest supplies the GPA of the VMSA to be used for the vCPU with
>   the specified APIC ID. The GPA is saved in the svm struct of the
>   target vCPU, the KVM_REQ_UPDATE_PROTECTED_GUEST_STATE event is added
>   to the vCPU and then the vCPU is kicked.
>
> For CREATE_ON_INIT:
>   The guest supplies the GPA of the VMSA to be used for the vCPU with
>   the specified APIC ID the next time an INIT is performed. The GPA is
>   saved in the svm struct of the target vCPU.
>
> For DESTROY:
>   The guest indicates it wishes to stop the vCPU. The GPA is cleared
>   from the svm struct, the KVM_REQ_UPDATE_PROTECTED_GUEST_STATE event is
>   added to vCPU and then the vCPU is kicked.
>
> The KVM_REQ_UPDATE_PROTECTED_GUEST_STATE event handler will be invoked
> as a result of the event or as a result of an INIT. The handler sets the
> vCPU to the KVM_MP_STATE_UNINITIALIZED state, so that any errors will
> leave the vCPU as not runnable. Any previous VMSA pages that were
> installed as part of an SEV-SNP AP Creation NAE event are un-pinned. If
> a new VMSA is to be installed, the VMSA guest page is pinned and set as
> the VMSA in the vCPU VMCB and the vCPU state is set to
> KVM_MP_STATE_RUNNABLE. If a new VMSA is not to be installed, the VMSA is
> cleared in the vCPU VMCB and the vCPU state is left as
> KVM_MP_STATE_UNINITIALIZED to prevent it from being run.
>
> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> [mdr: add handling for gmem]
> Signed-off-by: Michael Roth <michael.roth@amd.com>
> ---
>  arch/x86/include/asm/kvm_host.h |   1 +
>  arch/x86/include/asm/svm.h      |   5 +
>  arch/x86/kvm/svm/sev.c          | 219 ++++++++++++++++++++++++++++++++
>  arch/x86/kvm/svm/svm.c          |   3 +
>  arch/x86/kvm/svm/svm.h          |   8 +-
>  arch/x86/kvm/x86.c              |  11 ++
>  6 files changed, 246 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 3fdcbb1da856..9e45402e51bc 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -121,6 +121,7 @@
>         KVM_ARCH_REQ_FLAGS(31, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
>  #define KVM_REQ_HV_TLB_FLUSH \
>         KVM_ARCH_REQ_FLAGS(32, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
> +#define KVM_REQ_UPDATE_PROTECTED_GUEST_STATE   KVM_ARCH_REQ(34)
>
>  #define CR0_RESERVED_BITS                                               \
>         (~(unsigned long)(X86_CR0_PE | X86_CR0_MP | X86_CR0_EM | X86_CR0_TS \
> diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
> index ba8ce15b27d7..4b73cf5e9de0 100644
> --- a/arch/x86/include/asm/svm.h
> +++ b/arch/x86/include/asm/svm.h
> @@ -287,6 +287,11 @@ static_assert((X2AVIC_MAX_PHYSICAL_ID & AVIC_PHYSICAL_MAX_INDEX_MASK) == X2AVIC_
>
>  #define SVM_SEV_FEAT_DEBUG_SWAP                        BIT(5)
>  #define SVM_SEV_FEAT_SNP_ACTIVE                        BIT(0)
> +#define SVM_SEV_FEAT_RESTRICTED_INJECTION              BIT(3)
> +#define SVM_SEV_FEAT_ALTERNATE_INJECTION              BIT(4)
> +#define SVM_SEV_FEAT_INT_INJ_MODES             \
> +       (SVM_SEV_FEAT_RESTRICTED_INJECTION |    \
> +        SVM_SEV_FEAT_ALTERNATE_INJECTION)
>
>  struct vmcb_seg {
>         u16 selector;
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 996b5a668938..3bb89c4df5d6 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -652,6 +652,7 @@ static int sev_launch_update_data(struct kvm *kvm, struct kvm_sev_cmd *argp)
>
>  static int sev_es_sync_vmsa(struct vcpu_svm *svm)
>  {
> +       struct kvm_sev_info *sev = &to_kvm_svm(svm->vcpu.kvm)->sev_info;
>         struct sev_es_save_area *save = svm->sev_es.vmsa;
>
>         /* Check some debug related fields before encrypting the VMSA */
> @@ -700,6 +701,12 @@ static int sev_es_sync_vmsa(struct vcpu_svm *svm)
>         if (sev_snp_guest(svm->vcpu.kvm))
>                 save->sev_features |= SVM_SEV_FEAT_SNP_ACTIVE;
>
> +       /*
> +        * Save the VMSA synced SEV features. For now, they are the same for
> +        * all vCPUs, so just save each time.
> +        */
> +       sev->sev_features = save->sev_features;
> +
>         pr_debug("Virtual Machine Save Area (VMSA):\n");
>         print_hex_dump_debug("", DUMP_PREFIX_NONE, 16, 1, save, sizeof(*save), false);
>
> @@ -3082,6 +3089,11 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
>                 if (!kvm_ghcb_sw_scratch_is_valid(svm))
>                         goto vmgexit_err;
>                 break;
> +       case SVM_VMGEXIT_AP_CREATION:
> +               if (lower_32_bits(control->exit_info_1) != SVM_VMGEXIT_AP_DESTROY)
> +                       if (!kvm_ghcb_rax_is_valid(svm))
> +                               goto vmgexit_err;
> +               break;
>         case SVM_VMGEXIT_NMI_COMPLETE:
>         case SVM_VMGEXIT_AP_HLT_LOOP:
>         case SVM_VMGEXIT_AP_JUMP_TABLE:
> @@ -3322,6 +3334,202 @@ static int snp_complete_psc(struct kvm_vcpu *vcpu)
>         return 1; /* resume guest */
>  }
>
> +static int __sev_snp_update_protected_guest_state(struct kvm_vcpu *vcpu)
> +{
> +       struct vcpu_svm *svm = to_svm(vcpu);
> +       hpa_t cur_pa;
> +
> +       WARN_ON(!mutex_is_locked(&svm->sev_es.snp_vmsa_mutex));
> +
> +       /* Save off the current VMSA PA for later checks */
> +       cur_pa = svm->sev_es.vmsa_pa;
> +
> +       /* Mark the vCPU as offline and not runnable */
> +       vcpu->arch.pv.pv_unhalted = false;
> +       vcpu->arch.mp_state = KVM_MP_STATE_HALTED;
> +
> +       /* Clear use of the VMSA */
> +       svm->sev_es.vmsa_pa = INVALID_PAGE;
> +       svm->vmcb->control.vmsa_pa = INVALID_PAGE;
> +
> +       /*
> +        * svm->sev_es.vmsa holds the virtual address of the VMSA initially
> +        * allocated by the host. If the guest specified a new VMSA via
> +        * AP_CREATION, it will have been pinned to avoid future issues
> +        * with things like page migration support. Make sure to un-pin it
> +        * before switching to a newer guest-specified VMSA.
> +        */
> +       if (cur_pa != __pa(svm->sev_es.vmsa) && VALID_PAGE(cur_pa))
> +               kvm_release_pfn_dirty(__phys_to_pfn(cur_pa));
> +
> +       if (VALID_PAGE(svm->sev_es.snp_vmsa_gpa)) {
> +               gfn_t gfn = gpa_to_gfn(svm->sev_es.snp_vmsa_gpa);
> +               struct kvm_memory_slot *slot;
> +               kvm_pfn_t pfn;
> +
> +               slot = gfn_to_memslot(vcpu->kvm, gfn);
> +               if (!slot)
> +                       return -EINVAL;
> +
> +               /*
> +                * The new VMSA will be private guest memory, so
> +                * retrieve the PFN from the gmem backend, and leave the ref
> +                * count of the associated folio elevated to ensure it won't
> +                * ever be migrated.
> +                */
> +               if (kvm_gmem_get_pfn(vcpu->kvm, slot, gfn, &pfn, NULL))
> +                       return -EINVAL;
> +
> +               /* Use the new VMSA */
> +               svm->sev_es.vmsa_pa = pfn_to_hpa(pfn);
> +               svm->vmcb->control.vmsa_pa = svm->sev_es.vmsa_pa;
> +
> +               /* Mark the vCPU as runnable */
> +               vcpu->arch.pv.pv_unhalted = false;
> +               vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
> +
> +               svm->sev_es.snp_vmsa_gpa = INVALID_PAGE;
> +       }
> +
> +       /*
> +        * When replacing the VMSA during SEV-SNP AP creation,
> +        * mark the VMCB dirty so that full state is always reloaded.
> +        */
> +       vmcb_mark_all_dirty(svm->vmcb);
> +
> +       return 0;
> +}
> +
> +/*
> + * Invoked as part of svm_vcpu_reset() processing of an init event.
> + */
> +void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu)
> +{
> +       struct vcpu_svm *svm = to_svm(vcpu);
> +       int ret;
> +
> +       if (!sev_snp_guest(vcpu->kvm))
> +               return;
> +
> +       mutex_lock(&svm->sev_es.snp_vmsa_mutex);
> +
> +       if (!svm->sev_es.snp_ap_create)
> +               goto unlock;
> +
> +       svm->sev_es.snp_ap_create = false;
> +
> +       ret = __sev_snp_update_protected_guest_state(vcpu);
> +       if (ret)
> +               vcpu_unimpl(vcpu, "snp: AP state update on init failed\n");
> +
> +unlock:
> +       mutex_unlock(&svm->sev_es.snp_vmsa_mutex);
> +}
> +
> +static int sev_snp_ap_creation(struct vcpu_svm *svm)
> +{
> +       struct kvm_sev_info *sev = &to_kvm_svm(svm->vcpu.kvm)->sev_info;
> +       struct kvm_vcpu *vcpu = &svm->vcpu;
> +       struct kvm_vcpu *target_vcpu;
> +       struct vcpu_svm *target_svm;
> +       unsigned int request;
> +       unsigned int apic_id;
> +       bool kick;
> +       int ret;
> +
> +       request = lower_32_bits(svm->vmcb->control.exit_info_1);
> +       apic_id = upper_32_bits(svm->vmcb->control.exit_info_1);
> +
> +       /* Validate the APIC ID */
> +       target_vcpu = kvm_get_vcpu_by_id(vcpu->kvm, apic_id);
> +       if (!target_vcpu) {
> +               vcpu_unimpl(vcpu, "vmgexit: invalid AP APIC ID [%#x] from guest\n",
> +                           apic_id);
> +               return -EINVAL;
> +       }
> +
> +       ret = 0;
> +
> +       target_svm = to_svm(target_vcpu);
> +
> +       /*
> +        * The target vCPU is valid, so the vCPU will be kicked unless the
> +        * request is for CREATE_ON_INIT. For any errors at this stage, the
> +        * kick will place the vCPU in a non-runnable state.
> +        */
> +       kick = true;
> +
> +       mutex_lock(&target_svm->sev_es.snp_vmsa_mutex);
> +
> +       target_svm->sev_es.snp_vmsa_gpa = INVALID_PAGE;
> +       target_svm->sev_es.snp_ap_create = true;
> +
> +       /* Interrupt injection mode shouldn't change for AP creation */
> +       if (request < SVM_VMGEXIT_AP_DESTROY) {
> +               u64 sev_features;
> +
> +               sev_features = vcpu->arch.regs[VCPU_REGS_RAX];
> +               sev_features ^= sev->sev_features;
> +               if (sev_features & SVM_SEV_FEAT_INT_INJ_MODES) {
> +                       vcpu_unimpl(vcpu, "vmgexit: invalid AP injection mode [%#lx] from guest\n",
> +                                   vcpu->arch.regs[VCPU_REGS_RAX]);
> +                       ret = -EINVAL;
> +                       goto out;
> +               }
> +       }
> +
> +       switch (request) {
> +       case SVM_VMGEXIT_AP_CREATE_ON_INIT:
> +               kick = false;
> +               fallthrough;
> +       case SVM_VMGEXIT_AP_CREATE:
> +               if (!page_address_valid(vcpu, svm->vmcb->control.exit_info_2)) {
> +                       vcpu_unimpl(vcpu, "vmgexit: invalid AP VMSA address [%#llx] from guest\n",
> +                                   svm->vmcb->control.exit_info_2);
> +                       ret = -EINVAL;
> +                       goto out;
> +               }
> +
> +               /*
> +                * A malicious guest can RMPADJUST a large page into a VMSA,
> +                * which would hit the SNP erratum where the CPU incorrectly
> +                * signals an RMP violation #PF if a hugepage collides with the
> +                * RMP entry of the VMSA page. Reject the AP CREATE request if
> +                * the VMSA address from the guest is 2M-aligned.
> +                */
> +               if (IS_ALIGNED(svm->vmcb->control.exit_info_2, PMD_SIZE)) {
> +                       vcpu_unimpl(vcpu,
> +                                   "vmgexit: AP VMSA address [%llx] from guest is unsafe as it is 2M aligned\n",
> +                                   svm->vmcb->control.exit_info_2);
> +                       ret = -EINVAL;
> +                       goto out;
> +               }
> +
> +               target_svm->sev_es.snp_vmsa_gpa = svm->vmcb->control.exit_info_2;
> +               break;
> +       case SVM_VMGEXIT_AP_DESTROY:
> +               break;
> +       default:
> +               vcpu_unimpl(vcpu, "vmgexit: invalid AP creation request [%#x] from guest\n",
> +                           request);
> +               ret = -EINVAL;
> +               break;
> +       }
> +
> +out:
> +       if (kick) {
> +               if (target_vcpu->arch.mp_state == KVM_MP_STATE_UNINITIALIZED)
> +                       target_vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
> +
> +               kvm_make_request(KVM_REQ_UPDATE_PROTECTED_GUEST_STATE, target_vcpu);

I think we should switch the order of these two statements for
setting mp_state and for making the request for
KVM_REQ_UPDATE_PROTECTED_GUEST_STATE.
There is a race condition I observed when booting with SVSM where:
1. BSP sets target vcpu to KVM_MP_STATE_RUNNABLE
2. AP thread within the loop of arch/x86/kvm/x86.c:vcpu_run() checks
kvm_vcpu_running()
3. AP enters the guest without having updated the VMSA state from
KVM_REQ_UPDATE_PROTECTED_GUEST_STATE

This results in the AP executing on a bad RIP and then crashing.
If we set the request first, then we avoid the race condition.
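
For illustration, the reordering I have in mind would look roughly like
this (sketch only, not a tested patch):

	if (kick) {
		/*
		 * Raise the request before marking the vCPU runnable so a
		 * racing vcpu_run() can't observe RUNNABLE without also
		 * observing the pending request.
		 */
		kvm_make_request(KVM_REQ_UPDATE_PROTECTED_GUEST_STATE, target_vcpu);

		if (target_vcpu->arch.mp_state == KVM_MP_STATE_UNINITIALIZED)
			target_vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;

		kvm_vcpu_kick(target_vcpu);
	}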

> +               kvm_vcpu_kick(target_vcpu);
> +       }
> +
> +       mutex_unlock(&target_svm->sev_es.snp_vmsa_mutex);
> +
> +       return ret;
> +}
> +
>  static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
>  {
>         struct vmcb_control_area *control = &svm->vmcb->control;
> @@ -3565,6 +3773,15 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
>                 vcpu->run->vmgexit.psc.shared_gpa = svm->sev_es.sw_scratch;
>                 vcpu->arch.complete_userspace_io = snp_complete_psc;
>                 break;
> +       case SVM_VMGEXIT_AP_CREATION:
> +               ret = sev_snp_ap_creation(svm);
> +               if (ret) {
> +                       ghcb_set_sw_exit_info_1(svm->sev_es.ghcb, 2);
> +                       ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, GHCB_ERR_INVALID_INPUT);
> +               }
> +
> +               ret = 1;
> +               break;
>         case SVM_VMGEXIT_UNSUPPORTED_EVENT:
>                 vcpu_unimpl(vcpu,
>                             "vmgexit: unsupported event - exit_info_1=%#llx, exit_info_2=%#llx\n",
> @@ -3731,6 +3948,8 @@ void sev_es_vcpu_reset(struct vcpu_svm *svm)
>         set_ghcb_msr(svm, GHCB_MSR_SEV_INFO(GHCB_VERSION_MAX,
>                                             GHCB_VERSION_MIN,
>                                             sev_enc_bit));
> +
> +       mutex_init(&svm->sev_es.snp_vmsa_mutex);
>  }
>
>  void sev_es_prepare_switch_to_guest(struct sev_es_save_area *hostsa)
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index da49e4981d75..240518f8d6c7 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -1398,6 +1398,9 @@ static void svm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
>         svm->spec_ctrl = 0;
>         svm->virt_spec_ctrl = 0;
>
> +       if (init_event)
> +               sev_snp_init_protected_guest_state(vcpu);
> +
>         init_vmcb(vcpu);
>
>         if (!init_event)
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index 4ef41f4d4ee6..d953ae41c619 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -96,6 +96,7 @@ struct kvm_sev_info {
>         atomic_t migration_in_progress;
>         u64 snp_init_flags;
>         void *snp_context;      /* SNP guest context page */
> +       u64 sev_features;       /* Features set at VMSA creation */
>  };
>
>  struct kvm_svm {
> @@ -214,6 +215,10 @@ struct vcpu_sev_es_state {
>         bool ghcb_sa_free;
>
>         u64 ghcb_registered_gpa;
> +
> +       struct mutex snp_vmsa_mutex; /* Used to handle concurrent updates of VMSA. */
> +       gpa_t snp_vmsa_gpa;
> +       bool snp_ap_create;
>  };
>
>  struct vcpu_svm {
> @@ -689,7 +694,7 @@ void avic_refresh_virtual_apic_mode(struct kvm_vcpu *vcpu);
>  #define GHCB_VERSION_MAX       2ULL
>  #define GHCB_VERSION_MIN       1ULL
>
> -#define GHCB_HV_FT_SUPPORTED   GHCB_HV_FT_SNP
> +#define GHCB_HV_FT_SUPPORTED   (GHCB_HV_FT_SNP | GHCB_HV_FT_SNP_AP_CREATION)
>
>  extern unsigned int max_sev_asid;
>
> @@ -719,6 +724,7 @@ void sev_es_prepare_switch_to_guest(struct sev_es_save_area *hostsa);
>  void sev_es_unmap_ghcb(struct vcpu_svm *svm);
>  struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu);
>  void handle_rmp_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code);
> +void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu);
>
>  /* vmenter.S */
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 87b78d63e81d..df9ec357d538 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -10858,6 +10858,14 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>
>                 if (kvm_check_request(KVM_REQ_UPDATE_CPU_DIRTY_LOGGING, vcpu))
>                         static_call(kvm_x86_update_cpu_dirty_logging)(vcpu);
> +
> +               if (kvm_check_request(KVM_REQ_UPDATE_PROTECTED_GUEST_STATE, vcpu)) {
> +                       kvm_vcpu_reset(vcpu, true);
> +                       if (vcpu->arch.mp_state != KVM_MP_STATE_RUNNABLE) {
> +                               r = 1;
> +                               goto out;
> +                       }
> +               }
>         }
>
>         if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win ||
> @@ -13072,6 +13080,9 @@ static inline bool kvm_vcpu_has_events(struct kvm_vcpu *vcpu)
>         if (kvm_test_request(KVM_REQ_PMI, vcpu))
>                 return true;
>
> +       if (kvm_test_request(KVM_REQ_UPDATE_PROTECTED_GUEST_STATE, vcpu))
> +               return true;
> +
>         if (kvm_arch_interrupt_allowed(vcpu) &&
>             (kvm_cpu_has_interrupt(vcpu) ||
>             kvm_guest_apic_has_interrupt(vcpu)))
> --
> 2.25.1
>
>

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v11 26/35] KVM: SEV: Support SEV-SNP AP Creation NAE event
  2024-01-05 22:08   ` Jacob Xu
@ 2024-01-08 15:53     ` Sean Christopherson
  0 siblings, 0 replies; 58+ messages in thread
From: Sean Christopherson @ 2024-01-08 15:53 UTC (permalink / raw)
  To: Jacob Xu
  Cc: Michael Roth, kvm, linux-coco, linux-mm, linux-crypto, x86,
	linux-kernel, tglx, mingo, jroedel, thomas.lendacky, hpa, ardb,
	pbonzini, vkuznets, jmattson, luto, dave.hansen, slp, pgonda,
	peterz, srinivas.pandruvada, rientjes, dovmurik, tobin, bp,
	vbabka, kirill, ak, tony.luck, sathyanarayanan.kuppuswamy,
	alpergun, jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick, zhi.a.wang, Brijesh Singh, Adam Dunlap

On Fri, Jan 05, 2024, Jacob Xu wrote:
> > +       if (kick) {
> > +               if (target_vcpu->arch.mp_state == KVM_MP_STATE_UNINITIALIZED)
> > +                       target_vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
> > +
> > +               kvm_make_request(KVM_REQ_UPDATE_PROTECTED_GUEST_STATE, target_vcpu);
> 
> I think we should switch the order of these two statements for
> setting mp_state and for making the request for
> KVM_REQ_UPDATE_PROTECTED_GUEST_STATE.
> There is a race condition I observed when booting with SVSM where:
> 1. BSP sets target vcpu to KVM_MP_STATE_RUNNABLE
> 2. AP thread within the loop of arch/x86/kvm/x86.c:vcpu_run() checks
> kvm_vcpu_running()
> 3. AP enters the guest without having updated the VMSA state from
> KVM_REQ_UPDATE_PROTECTED_GUEST_STATE
> 
> This results in the AP executing on a bad RIP and then crashing.
> If we set the request first, then we avoid the race condition.

That just introduces a different race, e.g. if this task gets delayed and the
target vCPU processes KVM_REQ_UPDATE_PROTECTED_GUEST_STATE before it's marked
RUNNABLE, then the target vCPU could end up stuck in the UNINITIALIZED loop.

Reading and writing arch.mp_state across vCPUs is simply not safe.  There's a
reason why KVM atomically manages INITs and SIPIs and only modifies mp_state when
processing events on the target vCPU.
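
One possible shape, as a very rough sketch (illustrative only, not a
reviewed fix): have the initiating vCPU do nothing but raise the request
and kick, and let the *target* vCPU transition its own mp_state when it
processes the request, so there is no cross-vCPU mp_state write at all:

	/* Initiating vCPU: signal only, don't touch the target's mp_state. */
	kvm_make_request(KVM_REQ_UPDATE_PROTECTED_GUEST_STATE, target_vcpu);
	kvm_vcpu_kick(target_vcpu);

	/* Target vCPU, while processing requests in vcpu_enter_guest(): */
	if (kvm_check_request(KVM_REQ_UPDATE_PROTECTED_GUEST_STATE, vcpu)) {
		if (vcpu->arch.mp_state == KVM_MP_STATE_UNINITIALIZED)
			vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
		kvm_vcpu_reset(vcpu, true);
	}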

> > +               kvm_vcpu_kick(target_vcpu);

...

> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 87b78d63e81d..df9ec357d538 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -10858,6 +10858,14 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
> >
> >                 if (kvm_check_request(KVM_REQ_UPDATE_CPU_DIRTY_LOGGING, vcpu))
> >                         static_call(kvm_x86_update_cpu_dirty_logging)(vcpu);
> > +
> > +               if (kvm_check_request(KVM_REQ_UPDATE_PROTECTED_GUEST_STATE, vcpu)) {
> > +                       kvm_vcpu_reset(vcpu, true);
> > +                       if (vcpu->arch.mp_state != KVM_MP_STATE_RUNNABLE) {
> > +                               r = 1;
> > +                               goto out;
> > +                       }
> > +               }
> >         }
> >
> >         if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win ||
> > @@ -13072,6 +13080,9 @@ static inline bool kvm_vcpu_has_events(struct kvm_vcpu *vcpu)
> >         if (kvm_test_request(KVM_REQ_PMI, vcpu))
> >                 return true;
> >
> > +       if (kvm_test_request(KVM_REQ_UPDATE_PROTECTED_GUEST_STATE, vcpu))
> > +               return true;
> > +
> >         if (kvm_arch_interrupt_allowed(vcpu) &&
> >             (kvm_cpu_has_interrupt(vcpu) ||
> >             kvm_guest_apic_has_interrupt(vcpu)))
> > --
> > 2.25.1
> >
> >

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v11 18/35] KVM: SEV: Add KVM_SEV_SNP_LAUNCH_UPDATE command
  2023-12-30 17:23 ` [PATCH v11 18/35] KVM: SEV: Add KVM_SEV_SNP_LAUNCH_UPDATE command Michael Roth
@ 2024-01-10 15:45   ` Sean Christopherson
  2024-01-16  4:14     ` Michael Roth
  2024-03-18 21:02   ` Peter Gonda
  1 sibling, 1 reply; 58+ messages in thread
From: Sean Christopherson @ 2024-01-10 15:45 UTC (permalink / raw)
  To: Michael Roth
  Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, vkuznets,
	jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
	jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick, zhi.a.wang, Brijesh Singh

On Sat, Dec 30, 2023, Michael Roth wrote:
> From: Brijesh Singh <brijesh.singh@amd.com>
> 
> The KVM_SEV_SNP_LAUNCH_UPDATE command can be used to insert data into
> the guest's memory. The data is encrypted with the cryptographic context
> created with the KVM_SEV_SNP_LAUNCH_START.
> 
> In addition to inserting data, it can insert two special pages
> into the guest's memory: the secrets page and the CPUID page.
> 
> When terminating the guest, reclaim the guest pages added in the RMP
> table. If the reclaim fails, the pages are no longer safe to release
> back to the system, so leak them instead.
> 
> For more information see the SEV-SNP specification.

Please rewrite all changelogs to explain what *KVM* support is being added, why
the proposed uAPI looks like it does, and how the new uAPI is intended to be used.

Providing a crash course on the relevant hardware behavior is definitely helpful,
but the changelog absolutely needs to explain/justify the patch.

> Co-developed-by: Michael Roth <michael.roth@amd.com>
> Signed-off-by: Michael Roth <michael.roth@amd.com>
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> ---
>  .../virt/kvm/x86/amd-memory-encryption.rst    |  28 +++
>  arch/x86/kvm/svm/sev.c                        | 181 ++++++++++++++++++
>  include/uapi/linux/kvm.h                      |  19 ++
>  3 files changed, 228 insertions(+)
> 
> diff --git a/Documentation/virt/kvm/x86/amd-memory-encryption.rst b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
> index b1beb2fe8766..d4325b26724c 100644
> --- a/Documentation/virt/kvm/x86/amd-memory-encryption.rst
> +++ b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
> @@ -485,6 +485,34 @@ Returns: 0 on success, -negative on error
>  
>  See the SEV-SNP specification for further detail on the launch input.
>  
> +20. KVM_SNP_LAUNCH_UPDATE
> +-------------------------
> +
> +The KVM_SNP_LAUNCH_UPDATE is used for encrypting a memory region. It also
> +calculates a measurement of the memory contents. The measurement is a signature
> +of the memory contents that can be sent to the guest owner as an attestation
> +that the memory was encrypted correctly by the firmware.
> +
> +Parameters (in): struct kvm_sev_snp_launch_update
> +
> +Returns: 0 on success, -negative on error
> +
> +::
> +
> +        struct kvm_sev_snp_launch_update {
> +                __u64 start_gfn;        /* Guest page number to start from. */
> +                __u64 uaddr;            /* userspace address that needs to be encrypted */

Huh?  Why is KVM taking a userspace address?  IIUC, the address unconditionally
gets translated into a gfn, so why not pass a gfn?

And speaking of gfns, AFAICT start_gfn is never used.

Oof, reading more of the code, this *requires* an effective in-place copy-and-convert
of guest memory.

> +                __u32 len;              /* length of memory region */

Bytes?  Pages?  One field above operates on frame numbers, one apparently operates
on a byte-granularity address.

> +                __u8 imi_page;          /* 1 if memory is part of the IMI */

What's "the IMI"?  Initial Measurement Image?  I assume this is essentially just
a flag that communicates whether or not the page should be measured?

> +                __u8 page_type;         /* page type */
> +                __u8 vmpl3_perms;       /* VMPL3 permission mask */
> +                __u8 vmpl2_perms;       /* VMPL2 permission mask */
> +                __u8 vmpl1_perms;       /* VMPL1 permission mask */

Why?  KVM doesn't support VMPLs.

> +static int snp_page_reclaim(u64 pfn)
> +{
> +	struct sev_data_snp_page_reclaim data = {0};
> +	int err, rc;
> +
> +	data.paddr = __sme_set(pfn << PAGE_SHIFT);
> +	rc = sev_do_cmd(SEV_CMD_SNP_PAGE_RECLAIM, &data, &err);
> +	if (rc) {
> +		/*
> +		 * If the reclaim failed, then page is no longer safe
> +		 * to use.

Uh, why can reclaim fail, and why does the kernel apparently not care about
leaking pages?  AFAICT, nothing ever complains beyond a pr_debug.  That sounds
bonkers to me, i.e. at the very minimum, why doesn't this warrant a WARN_ON_ONCE?

> +		 */
> +		snp_leak_pages(pfn, 1);
> +	}
> +
> +	return rc;
> +}
> +
> +static int host_rmp_make_shared(u64 pfn, enum pg_level level, bool leak)
> +{
> +	int rc;
> +
> +	rc = rmp_make_shared(pfn, level);
> +	if (rc && leak)
> +		snp_leak_pages(pfn,
> +			       page_level_size(level) >> PAGE_SHIFT);

Completely unnecessary wrap.

> +
> +	return rc;
> +}
> +
>  static void sev_unbind_asid(struct kvm *kvm, unsigned int handle)
>  {
>  	struct sev_data_deactivate deactivate;
> @@ -1990,6 +2020,154 @@ static int snp_launch_start(struct kvm *kvm, struct kvm_sev_cmd *argp)
>  	return rc;
>  }
>  
> +static int snp_launch_update_gfn_handler(struct kvm *kvm,
> +					 struct kvm_gfn_range *range,
> +					 void *opaque)
> +{
> +	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> +	struct kvm_memory_slot *memslot = range->slot;
> +	struct sev_data_snp_launch_update data = {0};
> +	struct kvm_sev_snp_launch_update params;
> +	struct kvm_sev_cmd *argp = opaque;
> +	int *error = &argp->error;
> +	int i, n = 0, ret = 0;
> +	unsigned long npages;
> +	kvm_pfn_t *pfns;
> +	gfn_t gfn;
> +
> +	if (!kvm_slot_can_be_private(memslot)) {
> +		pr_err("SEV-SNP requires private memory support via guest_memfd.\n");

Yeah, no.  Sprinkling pr_err() all over the place in user-triggerable error paths
is not acceptable.  I get that it's often hard to extract "what went wrong" out
of the kernel, but adding pr_err() everywhere is not a viable solution.

> +		return -EINVAL;
> +	}
> +
> +	if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data, sizeof(params))) {
> +		pr_err("Failed to copy user parameters for SEV-SNP launch.\n");
> +		return -EFAULT;
> +	}
> +
> +	data.gctx_paddr = __psp_pa(sev->snp_context);
> +
> +	npages = range->end - range->start;
> +	pfns = kvmalloc_array(npages, sizeof(*pfns), GFP_KERNEL_ACCOUNT);
> +	if (!pfns)
> +		return -ENOMEM;
> +
> +	pr_debug("%s: GFN range 0x%llx-0x%llx, type %d\n", __func__,
> +		 range->start, range->end, params.page_type);
> +
> +	for (gfn = range->start, i = 0; gfn < range->end; gfn++, i++) {
> +		int order, level;
> +		bool assigned;
> +		void *kvaddr;
> +
> +		ret = __kvm_gmem_get_pfn(kvm, memslot, gfn, &pfns[i], &order, false);
> +		if (ret)
> +			goto e_release;
> +
> +		n++;
> +		ret = snp_lookup_rmpentry((u64)pfns[i], &assigned, &level);
> +		if (ret || assigned) {
> +			pr_err("Failed to ensure GFN 0x%llx is in initial shared state, ret: %d, assigned: %d\n",
> +			       gfn, ret, assigned);
> +			return -EFAULT;
> +		}
> +
> +		kvaddr = pfn_to_kaddr(pfns[i]);
> +		if (!virt_addr_valid(kvaddr)) {

I really, really don't like that this assume guest_memfd is backed by struct page.

> +			pr_err("Invalid HVA 0x%llx for GFN 0x%llx\n", (uint64_t)kvaddr, gfn);
> +			ret = -EINVAL;
> +			goto e_release;
> +		}
> +
> +		ret = kvm_read_guest_page(kvm, gfn, kvaddr, 0, PAGE_SIZE);

Good gravy.  If I'm reading this correctly, KVM:

  1. Translates an HVA into a GFN.
  2. Gets the PFN for that GFN from guest_memfd
  3. Verifies the PFN is not assigned to the guest
  4. Copies memory from the shared memslot page to the guest_memfd page
  5. Converts the page to private and asks the PSP to encrypt it

(a) As above, why is #1 a thing?
(b) Why are KVM's memory attributes never consulted?
(c) What prevents TOCTOU issues with respect to the RMP?
(d) Why is *KVM* copying memory into guest_memfd?
(e) What guarantees the direct map is valid for guest_memfd?
(f) Why does KVM's uAPI *require* the source page to come from the same memslot?

> +		if (ret) {
> +			pr_err("Guest read failed, ret: 0x%x\n", ret);
> +			goto e_release;
> +		}
> +
> +		ret = rmp_make_private(pfns[i], gfn << PAGE_SHIFT, PG_LEVEL_4K,
> +				       sev_get_asid(kvm), true);
> +		if (ret) {
> +			ret = -EFAULT;
> +			goto e_release;
> +		}
> +
> +		data.address = __sme_set(pfns[i] << PAGE_SHIFT);
> +		data.page_size = PG_LEVEL_TO_RMP(PG_LEVEL_4K);
> +		data.page_type = params.page_type;
> +		data.vmpl3_perms = params.vmpl3_perms;
> +		data.vmpl2_perms = params.vmpl2_perms;
> +		data.vmpl1_perms = params.vmpl1_perms;
> +		ret = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_LAUNCH_UPDATE,
> +				      &data, error);
> +		if (ret) {
> +			pr_err("SEV-SNP launch update failed, ret: 0x%x, fw_error: 0x%x\n",
> +			       ret, *error);
> +			snp_page_reclaim(pfns[i]);
> +
> +			/*
> +			 * When invalid CPUID function entries are detected, the firmware
> +			 * corrects these entries for debugging purposes and leaves the
> +			 * page unencrypted so it can be provided to users for debugging
> +			 * and error-reporting.
> +			 *
> +			 * Copy the corrected CPUID page back to shared memory so
> +			 * userspace can retrieve this information.

Why?  IIUC, this is basically backdooring reads/writes into guest_memfd to avoid
having to add proper mmap() support.
 
> +			 */
> +			if (params.page_type == SNP_PAGE_TYPE_CPUID &&
> +			    *error == SEV_RET_INVALID_PARAM) {
> +				int ret;

Ugh, do not shadow variables.

> +
> +				host_rmp_make_shared(pfns[i], PG_LEVEL_4K, true);
> +
> +				ret = kvm_write_guest_page(kvm, gfn, kvaddr, 0, PAGE_SIZE);
> +				if (ret)
> +					pr_err("Failed to write CPUID page back to userspace, ret: 0x%x\n",
> +					       ret);
> +			}
> +
> +			goto e_release;
> +		}
> +	}
> +
> +e_release:
> +	/* Content of memory is updated, mark pages dirty */
> +	for (i = 0; i < n; i++) {
> +		set_page_dirty(pfn_to_page(pfns[i]));
> +		mark_page_accessed(pfn_to_page(pfns[i]));
> +
> +		/*
> +		 * If its an error, then update RMP entry to change page ownership
> +		 * to the hypervisor.
> +		 */
> +		if (ret)
> +			host_rmp_make_shared(pfns[i], PG_LEVEL_4K, true);
> +
> +		put_page(pfn_to_page(pfns[i]));
> +	}
> +
> +	kvfree(pfns);

Saving PFNs from guest_memfd, which is fully owned by KVM, is so unnecessarily
complex.  Add a guest_memfd API (or three) to do this safely, e.g. to lock the
pages, do (and track) the RMP conversion, etc...

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v11 18/35] KVM: SEV: Add KVM_SEV_SNP_LAUNCH_UPDATE command
  2024-01-10 15:45   ` Sean Christopherson
@ 2024-01-16  4:14     ` Michael Roth
  2024-02-02 22:54       ` Sean Christopherson
  0 siblings, 1 reply; 58+ messages in thread
From: Michael Roth @ 2024-01-16  4:14 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, vkuznets,
	jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
	jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick, zhi.a.wang, Brijesh Singh

On Wed, Jan 10, 2024 at 07:45:36AM -0800, Sean Christopherson wrote:
> On Sat, Dec 30, 2023, Michael Roth wrote:
> > diff --git a/Documentation/virt/kvm/x86/amd-memory-encryption.rst b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
> > index b1beb2fe8766..d4325b26724c 100644
> > --- a/Documentation/virt/kvm/x86/amd-memory-encryption.rst
> > +++ b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
> > @@ -485,6 +485,34 @@ Returns: 0 on success, -negative on error
> >  
> >  See the SEV-SNP specification for further detail on the launch input.
> >  
> > +20. KVM_SNP_LAUNCH_UPDATE
> > +-------------------------
> > +
> > +The KVM_SNP_LAUNCH_UPDATE is used for encrypting a memory region. It also
> > +calculates a measurement of the memory contents. The measurement is a signature
> > +of the memory contents that can be sent to the guest owner as an attestation
> > +that the memory was encrypted correctly by the firmware.
> > +
> > +Parameters (in): struct kvm_sev_snp_launch_update
> > +
> > +Returns: 0 on success, -negative on error
> > +
> > +::
> > +
> > +        struct kvm_sev_snp_launch_update {
> > +                __u64 start_gfn;        /* Guest page number to start from. */
> > +                __u64 uaddr;            /* userspace address that needs to be encrypted */
> 
> Huh?  Why is KVM taking a userspace address?  IIUC, the address unconditionally
> gets translated into a gfn, so why not pass a gfn?
> 
> And speaking of gfns, AFAICT start_gfn is never used.

I think having both the uaddr and start_gfn parameters makes sense. I
think it's only awkward because of how I'm using the memslot to translate
the uaddr to a GFN in the current implementation, but:

 a) It's actually not a requirement that uaddr be associated with a
    memslot. It could just as easily be any random userspace address
    containing the payload that we want to copy into the actual gmem
    pages associated with start_gfn. I think TDX does something similar
    in that regard, and it makes sense to give VMMs the option of
    handling things that way (see the sketch below).

 b) If we switch to just having start_gfn, and no uaddr, then things get
    awkward because you really do need to have a memslot set up to
    get at the payload, and have some way of pre-populating the gmem pages
    prior to conversion, either the way the current code does it (via
    copying shared memory prior to conversion), or by having some way to
    populate the gmem pages directly, which is even more painful.
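
For (a), a rough sketch of what the KVM side could look like (hypothetical
code, ignoring the memslot lookup, error/cleanup paths, and the firmware
call):

	/*
	 * Hypothetical KVM_SNP_LAUNCH_UPDATE flow if uaddr is just a plain
	 * user buffer: copy it into the gmem page backing start_gfn, then
	 * hand that page to firmware for in-place encryption.
	 */
	if (kvm_gmem_get_pfn(kvm, slot, params.start_gfn, &pfn, NULL))
		return -EINVAL;

	if (copy_from_user(pfn_to_kaddr(pfn), (void __user *)params.uaddr,
			   params.len))
		return -EFAULT;

	/* ... rmp_make_private() + SEV_CMD_SNP_LAUNCH_UPDATE as before ... */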

> 
> Oof, reading more of the code, this *requires* an effective in-place copy-and-convert
> of guest memory.

Yes, I'm having some trouble locating the various threads, but initially
there were some discussions about having a userspace API that allows for
UPM/gmem pages to be pre-populated before they are in-place encrypted, but
we'd all eventually decided that having KVM handle this internally was
the simplest approach.

So that's how it's done here, KVM_SNP_LAUNCH_UPDATE copies the pages into
gmem, then passes those pages on to firmware for encryption. Then the
VMM is expected to mark the GFN range as private via
KVM_SET_MEMORY_ATTRIBUTES, since the VMM understands what constitutes
the initial private/encrypted payload. I should document that better in
the KVM_SNP_LAUNCH_UPDATE docs, however.
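
Roughly, the expected userspace sequence would then be something like
this (sketch; the sev_ioctl() helper is illustrative, not a real API):

	struct kvm_sev_snp_launch_update update = {
		.start_gfn = gpa >> PAGE_SHIFT,
		.uaddr = (__u64)payload,	/* unencrypted source data */
		.len = payload_len,
		.page_type = SNP_PAGE_TYPE_NORMAL,
	};
	struct kvm_memory_attributes attrs = {
		.address = gpa,
		.size = payload_len,
		.attributes = KVM_MEMORY_ATTRIBUTE_PRIVATE,
	};

	/* KVM copies the payload into gmem and encrypts it via firmware. */
	sev_ioctl(vm_fd, KVM_SEV_SNP_LAUNCH_UPDATE, &update);

	/* Mark the range private so it gets faulted in from gmem. */
	ioctl(vm_fd, KVM_SET_MEMORY_ATTRIBUTES, &attrs);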

> 
> > +                __u32 len;              /* length of memory region */
> 
> Bytes?  Pages?  One field above operates on frame numbers, one apparently operates
> on a byte-granularity address.

If we implement things as mentioned above, it makes sense to decouple
uaddr from any page alignment/size restrictions since it would always be
copied into the target gmem page starting at byte 0. This sort of
assumes that the gmem page will initially be zero'd however, which is
the case currently, but there's a TODO in kvm_gmem_get_folio() about
potentially off-loading that to firmware. I'm not sure it would ever
be applicable for these pages though. Worst case, KVM_SNP_LAUNCH_UPDATE
can pad with 0's.
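
i.e. something like this per-page sketch:

	/*
	 * Copy 'len' bytes from uaddr, then zero-fill the remainder of the
	 * 4K gmem page so the measured contents are deterministic.
	 */
	if (copy_from_user(kvaddr, (void __user *)params.uaddr, params.len))
		return -EFAULT;
	memset(kvaddr + params.len, 0, PAGE_SIZE - params.len);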

> 
> > +                __u8 imi_page;          /* 1 if memory is part of the IMI */
> 
> What's "the IMI"?  Initial Measurement Image?

Yes, though the SNP Firmware ABI also references it as "Incoming Migration
Image", which I think is a little clearer about its purpose and so that's
the terminology I've been using in the kernel.

> What's "the IMI"?  Initial Measurement Image?  I assume this is essentially just
> a flag that communicates whether or not the page should be measured?

This is actually for loading a measured migration agent into the target
system so that it can handle receiving the encrypted guest data from the
source. There's still a good deal of planning around how live migration
will be handled, however, so if you think it's premature to expose this
via KVM I can remove the related fields.

> 
> > +                __u8 page_type;         /* page type */
> > +                __u8 vmpl3_perms;       /* VMPL3 permission mask */
> > +                __u8 vmpl2_perms;       /* VMPL2 permission mask */
> > +                __u8 vmpl1_perms;       /* VMPL1 permission mask */
> 
> Why?  KVM doesn't support VMPLs.

It does actually get used by the SVSM. I can remove these but then we'd
need some capability bit or something to know when they are available if
they get re-introduced. But that may be needed anyway since KVM needs
some additional changes to handle scheduling threads running at
different VMPL levels.

> 
> > +static int snp_page_reclaim(u64 pfn)
> > +{
> > +	struct sev_data_snp_page_reclaim data = {0};
> > +	int err, rc;
> > +
> > +	data.paddr = __sme_set(pfn << PAGE_SHIFT);
> > +	rc = sev_do_cmd(SEV_CMD_SNP_PAGE_RECLAIM, &data, &err);
> > +	if (rc) {
> > +		/*
> > +		 * If the reclaim failed, then page is no longer safe
> > +		 * to use.
> 
> Uh, why can reclaim fail, and why does the kernel apparently not care about
> leaking pages?  AFAICT, nothing ever complains beyond a pr_debug.  That sounds
> bonkers to me, i.e. at the very minimum, why doesn't this warrant a WARN_ON_ONCE?

SNP_PAGE_RECLAIM failures shouldn't happen in practice, so yes, it makes sense to warn
about this when it does. snp_leak_pages() is probably the most
consistent/user-friendly place to convey these failures so I'll add a
pr_warn() there.

> 
> > +		 */
> > +		snp_leak_pages(pfn, 1);
> > +	}
> > +
> > +	return rc;
> > +}
> > +
> > +static int host_rmp_make_shared(u64 pfn, enum pg_level level, bool leak)
> > +{
> > +	int rc;
> > +
> > +	rc = rmp_make_shared(pfn, level);
> > +	if (rc && leak)
> > +		snp_leak_pages(pfn,
> > +			       page_level_size(level) >> PAGE_SHIFT);
> 
> Completely unnecessary wrap.
> 
> > +
> > +	return rc;
> > +}
> > +
> >  static void sev_unbind_asid(struct kvm *kvm, unsigned int handle)
> >  {
> >  	struct sev_data_deactivate deactivate;
> > @@ -1990,6 +2020,154 @@ static int snp_launch_start(struct kvm *kvm, struct kvm_sev_cmd *argp)
> >  	return rc;
> >  }
> >  
> > +static int snp_launch_update_gfn_handler(struct kvm *kvm,
> > +					 struct kvm_gfn_range *range,
> > +					 void *opaque)
> > +{
> > +	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> > +	struct kvm_memory_slot *memslot = range->slot;
> > +	struct sev_data_snp_launch_update data = {0};
> > +	struct kvm_sev_snp_launch_update params;
> > +	struct kvm_sev_cmd *argp = opaque;
> > +	int *error = &argp->error;
> > +	int i, n = 0, ret = 0;
> > +	unsigned long npages;
> > +	kvm_pfn_t *pfns;
> > +	gfn_t gfn;
> > +
> > +	if (!kvm_slot_can_be_private(memslot)) {
> > +		pr_err("SEV-SNP requires private memory support via guest_memfd.\n");
> 
> Yeah, no.  Sprinkling pr_err() all over the place in user-triggerable error paths
> is not acceptable.  I get that it's often hard to extract "what went wrong" out
> of the kernel, but adding pr_err() everywhere is not a viable solution.

Makes sense, I'll drop this.

> 
> > +		return -EINVAL;
> > +	}
> > +
> > +	if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data, sizeof(params))) {
> > +		pr_err("Failed to copy user parameters for SEV-SNP launch.\n");
> > +		return -EFAULT;
> > +	}
> > +
> > +	data.gctx_paddr = __psp_pa(sev->snp_context);
> > +
> > +	npages = range->end - range->start;
> > +	pfns = kvmalloc_array(npages, sizeof(*pfns), GFP_KERNEL_ACCOUNT);
> > +	if (!pfns)
> > +		return -ENOMEM;
> > +
> > +	pr_debug("%s: GFN range 0x%llx-0x%llx, type %d\n", __func__,
> > +		 range->start, range->end, params.page_type);
> > +
> > +	for (gfn = range->start, i = 0; gfn < range->end; gfn++, i++) {
> > +		int order, level;
> > +		bool assigned;
> > +		void *kvaddr;
> > +
> > +		ret = __kvm_gmem_get_pfn(kvm, memslot, gfn, &pfns[i], &order, false);
> > +		if (ret)
> > +			goto e_release;
> > +
> > +		n++;
> > +		ret = snp_lookup_rmpentry((u64)pfns[i], &assigned, &level);
> > +		if (ret || assigned) {
> > +			pr_err("Failed to ensure GFN 0x%llx is in initial shared state, ret: %d, assigned: %d\n",
> > +			       gfn, ret, assigned);
> > +			return -EFAULT;
> > +		}
> > +
> > +		kvaddr = pfn_to_kaddr(pfns[i]);
> > +		if (!virt_addr_valid(kvaddr)) {
> 
> I really, really don't like that this assume guest_memfd is backed by struct page.

There are similar enforcements in the SEV/SEV-ES code. There was some
initial discussion about relaxing this for SNP so we could support
things like /dev/mem-mapped guest memory, but then guest_memfd came
along and made that an unlikely use-case in the near-term given
that it relies on alloc_pages() currently and explicitly guards against
mmap()'ing pages in userspace.

I think it makes sense to keep the current tightened restrictions in place
until such a use-case comes along, since otherwise we are relaxing a
bunch of currently-useful sanity checks that span all throughout the code
to support some nebulous potential use-case that might never come along.
I think it makes more sense to cross that bridge when we get there.

> 
> > +			pr_err("Invalid HVA 0x%llx for GFN 0x%llx\n", (uint64_t)kvaddr, gfn);
> > +			ret = -EINVAL;
> > +			goto e_release;
> > +		}
> > +
> > +		ret = kvm_read_guest_page(kvm, gfn, kvaddr, 0, PAGE_SIZE);
> 
> Good gravy.  If I'm reading this correctly, KVM:
> 
>   1. Translates an HVA into a GFN.
>   2. Gets the PFN for that GFN from guest_memfd
>   3. Verifies the PFN is not assigned to the guest
>   4. Copies memory from the shared memslot page to the guest_memfd page
>   5. Converts the page to private and asks the PSP to encrypt it
> 
> (a) As above, why is #1 a thing?

Yah, it's probably best to avoid this, as proposed above.

> (b) Why are KVM's memory attributes never consulted?

It doesn't really matter if the attributes are set before or after
KVM_SNP_LAUNCH_UPDATE, only that by the time the guest actually launches
the pages get set to private so they get faulted in from gmem. We could
document our expectations and enforce them here if that's preferable
however. Maybe requiring KVM_SET_MEMORY_ATTRIBUTES(private) in advance
would make it easier to enforce that userspace does the right thing.
I'll see how that looks if there are no objections.

> (c) What prevents TOCTOU issues with respect to the RMP?

Time-of-use will be when the guest faults the gmem page in with C-bit
set. If it is not in the expected Guest-owned/pre-validated state that
SEV_CMD_SNP_LAUNCH_UPDATE expected/set, then the guest will get an RMP
fault or #VC exception for unvalidated page access. It will also fail
attestation. Not sure if that covers the scenarios you had in mind.

> (d) Why is *KVM* copying memory into guest_memfd?

As mentioned above, there were various discussions of ways of allowing
userspace to pwrite() to the guest_memfd in advance of
"sealing"/"binding" it and then encrypting it in place. I think this was
one of the related threads:

  https://lore.kernel.org/linux-mm/YkyKywkQYbr9U0CA@google.com/

My read at the time was that the requirements between pKVM/TDX/SNP were all
so unique that we'd spin forever trying to come up with a userspace ABI
that worked for everyone. At the time you'd suggested for pKVM to handle
their specific requirements internally in pKVM to avoid unnecessary churn on
the TDX/SNP side, and I took the same approach with SNP in implementing it
internally in SNP's KVM interfaces since it seemed unlikely there would
be much common ground with how TDX handles it via KVM_TDX_INIT_MEM_REGION.

> (e) What guarantees the direct map is valid for guest_memfd?

Are you suggesting this may change in the near-term? If so, we can
re-work the code to write to guest_memfd via a temporary mapping or
something, but otherwise it seems awkward to account for that scenario
in current code given that SNP specifically has hooks to remove/re-add
directmap entries based on RMPUPDATEs to avoid host breakage, so we
would necessarily need to implement changes if guest_memfd ever made
any changes in this regard.

And we had prior discussions about handling directmap invalidation in
guest_memfd, but Kirill mentioned here[1] that special handling didn't
actually seem to be a requirement of TDX private memory, and so it
didn't seem likely that pushing that into gmem would be a welcome
change.

All that said, TDX does still seem to invalidate directmap entries as
part of tdh_mem_page_add(), so maybe there is a requirement there and
this is worth revisiting?

If so, though, it's worth mentioning that cpa_lock contention on
directmap updates is actually a significant contributor to some
scalability issues we've noticed with lots of guests/vCPUs doing lazy
acceptance, which requires frequently invalidating directmap entries as
part of rmpupdate() during gmem allocations. We're considering just
forcing a 4K directmap for SNP until directmap updates can scale better,
which might be another reason to not have guest_memfd in the business of
managing directmap updates until there's some concrete use-case in
sight, like being able to rebuild 2MB/1GB directmap entries in a
scalable way at run-time.

[1] https://lore.kernel.org/linux-mm/20221102212656.6giugw542jdxsvhh@amd.com/

> (f) Why does KVM's uAPI *require* the source page to come from the same memslot?

As mentioned above, I think it makes sense to do away with this
requirement and just treat source page as any other user-provided blob.

> 
> > +		if (ret) {
> > +			pr_err("Guest read failed, ret: 0x%x\n", ret);
> > +			goto e_release;
> > +		}
> > +
> > +		ret = rmp_make_private(pfns[i], gfn << PAGE_SHIFT, PG_LEVEL_4K,
> > +				       sev_get_asid(kvm), true);
> > +		if (ret) {
> > +			ret = -EFAULT;
> > +			goto e_release;
> > +		}
> > +
> > +		data.address = __sme_set(pfns[i] << PAGE_SHIFT);
> > +		data.page_size = PG_LEVEL_TO_RMP(PG_LEVEL_4K);
> > +		data.page_type = params.page_type;
> > +		data.vmpl3_perms = params.vmpl3_perms;
> > +		data.vmpl2_perms = params.vmpl2_perms;
> > +		data.vmpl1_perms = params.vmpl1_perms;
> > +		ret = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_LAUNCH_UPDATE,
> > +				      &data, error);
> > +		if (ret) {
> > +			pr_err("SEV-SNP launch update failed, ret: 0x%x, fw_error: 0x%x\n",
> > +			       ret, *error);
> > +			snp_page_reclaim(pfns[i]);
> > +
> > +			/*
> > +			 * When invalid CPUID function entries are detected, the firmware
> > +			 * corrects these entries for debugging purposes and leaves the
> > +			 * page unencrypted so it can be provided to users for debugging
> > +			 * and error-reporting.
> > +			 *
> > +			 * Copy the corrected CPUID page back to shared memory so
> > +			 * userspace can retrieve this information.
> 
> Why?  IIUC, this is basically backdooring reads/writes into guest_memfd to avoid
> having to add proper mmap() support.

The CPUID page is private/encrypted, so it needs to be a gmem page.
SNP firmware is doing the backdooring when it writes the unencrypted,
expected contents into the page during failure. What's wrong with copying
the contents back into the source page so userspace can make use of it?
If we implement the above-mentioned changes then the source page is just
a userspace buffer that isn't necessarily associated with a memslot, so
it becomes even more straightforward.

Would that be acceptable? I'm not sure what you're proposing with
mmap().

>  
> > +			 */
> > +			if (params.page_type == SNP_PAGE_TYPE_CPUID &&
> > +			    *error == SEV_RET_INVALID_PARAM) {
> > +				int ret;
> 
> Ugh, do not shadow variables.

Will fix.

> 
> > +
> > +				host_rmp_make_shared(pfns[i], PG_LEVEL_4K, true);
> > +
> > +				ret = kvm_write_guest_page(kvm, gfn, kvaddr, 0, PAGE_SIZE);
> > +				if (ret)
> > +					pr_err("Failed to write CPUID page back to userspace, ret: 0x%x\n",
> > +					       ret);
> > +			}
> > +
> > +			goto e_release;
> > +		}
> > +	}
> > +
> > +e_release:
> > +	/* Content of memory is updated, mark pages dirty */
> > +	for (i = 0; i < n; i++) {
> > +		set_page_dirty(pfn_to_page(pfns[i]));
> > +		mark_page_accessed(pfn_to_page(pfns[i]));
> > +
> > +		/*
> > +		 * If its an error, then update RMP entry to change page ownership
> > +		 * to the hypervisor.
> > +		 */
> > +		if (ret)
> > +			host_rmp_make_shared(pfns[i], PG_LEVEL_4K, true);
> > +
> > +		put_page(pfn_to_page(pfns[i]));
> > +	}
> > +
> > +	kvfree(pfns);
> 
> Saving PFNs from guest_memfd, which is fully owned by KVM, is so unnecessarily
> complex.  Add a guest_memfd API (or three) to do this safely, e.g. to lock the
> pages, do (and track) the RMP conversion, etc...

Is adding 3 gmem APIs and tracking RMP states inside gmem really less
complex than what's going on here? The PFNs are only held on to for the
duration of this single function so they can be cleanly rolled back, and
we're using blessed interfaces like kvm_gmem_get_pfn() to get at them.

There are some nuances here that I'm not sure will map to a re-usable gmem
API that would benefit other users. For instance, we need to:

  1) grab the gmem PFN
  2) initialize it in some platform-specific way (copy from source buffer in this case)
  3) switch it private in RMP table
  4) execute SEV_CMD_SNP_LAUNCH_UPDATE firmware cmd

If 2-4 can all be done with a self-contained platform-specific callback,
then I could add a gmem API like:

  gmem_initialize_gfn_range(start, end, func, opaque)

  where:
    func: does roughly what snp_launch_update_gfn_handler currently does for
          each PFN it is handed
    opaque: some data structure that would provide the source buffer to
            initialize the gmem pages from

Is that along the lines of what you're suggesting? It wouldn't involve
"tracking" RMP conversions, 'func' would be aware of that for each
PFN it is handed, but it's simple enough that it is easily re-usable for
other platforms without too much fuss.
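
In header-ish form, the helper I'm picturing would be something like this
(hypothetical, names and signature subject to change):

	typedef int (*gmem_init_fn)(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn,
				    void *opaque);

	int gmem_initialize_gfn_range(struct kvm *kvm,
				      struct kvm_memory_slot *slot,
				      gfn_t start, gfn_t end,
				      gmem_init_fn func, void *opaque)
	{
		gfn_t gfn;
		int ret;

		for (gfn = start; gfn < end; gfn++) {
			kvm_pfn_t pfn;

			ret = kvm_gmem_get_pfn(kvm, slot, gfn, &pfn, NULL);
			if (ret)
				return ret;

			/*
			 * func() does the platform-specific bits: copy in
			 * the source payload, flip the page to private in
			 * the RMP, and issue SEV_CMD_SNP_LAUNCH_UPDATE.
			 */
			ret = func(kvm, gfn, pfn, opaque);
			put_page(pfn_to_page(pfn));
			if (ret)
				return ret;
		}

		return 0;
	}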

If you really want to build some deeper tracking of RMP table states
into gmem internals, then I could really use your feedback on the
gmem_prepare() hook I added in this RFC[2] and included as part of this
series, because I ended up not implementing tracking for a number of
reasons detailed under "Hooks for preparing gmem pages" in the RFC cover
letter and would likely need to revisit that aspect first before
building out this interface.

[2] https://lore.kernel.org/kvm/20231016115028.996656-1-michael.roth@amd.com/

Thanks,

Mike

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v11 18/35] KVM: SEV: Add KVM_SEV_SNP_LAUNCH_UPDATE command
  2024-01-16  4:14     ` Michael Roth
@ 2024-02-02 22:54       ` Sean Christopherson
  2024-02-06 23:43         ` Paolo Bonzini
  0 siblings, 1 reply; 58+ messages in thread
From: Sean Christopherson @ 2024-02-02 22:54 UTC (permalink / raw)
  To: Michael Roth
  Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, vkuznets,
	jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
	jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick, zhi.a.wang, Brijesh Singh

On Mon, Jan 15, 2024, Michael Roth wrote:
> On Wed, Jan 10, 2024 at 07:45:36AM -0800, Sean Christopherson wrote:
> > On Sat, Dec 30, 2023, Michael Roth wrote:
> > > diff --git a/Documentation/virt/kvm/x86/amd-memory-encryption.rst b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
> > > index b1beb2fe8766..d4325b26724c 100644
> > > --- a/Documentation/virt/kvm/x86/amd-memory-encryption.rst
> > > +++ b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
> > > @@ -485,6 +485,34 @@ Returns: 0 on success, -negative on error
> > >  
> > >  See the SEV-SNP specification for further detail on the launch input.
> > >  
> > > +20. KVM_SNP_LAUNCH_UPDATE
> > > +-------------------------
> > > +
> > > +The KVM_SNP_LAUNCH_UPDATE is used for encrypting a memory region. It also
> > > +calculates a measurement of the memory contents. The measurement is a signature
> > > +of the memory contents that can be sent to the guest owner as an attestation
> > > +that the memory was encrypted correctly by the firmware.
> > > +
> > > +Parameters (in): struct kvm_sev_snp_launch_update
> > > +
> > > +Returns: 0 on success, -negative on error
> > > +
> > > +::
> > > +
> > > +        struct kvm_sev_snp_launch_update {
> > > +                __u64 start_gfn;        /* Guest page number to start from. */
> > > +                __u64 uaddr;            /* userspace address that needs to be encrypted */
> > 
> > Huh?  Why is KVM taking a userspace address?  IIUC, the address unconditionally
> > gets translated into a gfn, so why not pass a gfn?
> > 
> > And speaking of gfns, AFAICT start_gfn is never used.
> 
> I think having both the uaddr and start_gfn parameters makes sense. I
> think it's only awkward because of how I'm using the memslot to translate
> the uaddr to a GFN in the current implementation,

Yes.

> > Oof, reading more of the code, this *requires* an effective in-place copy-and-convert
> > of guest memory.
> 
> Yes, I'm having some trouble locating the various threads, but initially
> there were some discussions about having a userspace API that allows for
> UPM/gmem pages to be pre-populated before they are in-place encrypted, but
> we'd all eventually decided that having KVM handle this internally was
> the simplest approach.
> 
> So that's how it's done here, KVM_SNP_LAUNCH_UPDATE copies the pages into
> gmem, then passes those pages on to firmware for encryption. Then the
> VMM is expected to mark the GFN range as private via
> KVM_SET_MEMORY_ATTRIBUTES, since the VMM understands what constitutes
> initial private/encrypted payload. I should document that better in
> KVM_SNP_LAUNCH_UPDATE docs however.

That's fine.  As above, my complaint is that the unencrypted source *must* be
covered by a memslot.  That's beyond ugly.

> > What's "the IMI"?  Initial Measurement Image?  I assume this is essentially just
> > a flag that communicates whether or not the page should be measured?
> 
> This is actually for loading a measured migration agent into the target
> system so that it can handle receiving the encrypted guest data from the
> source. There's still a good deal of planning around how live migration
> will be handled however so if you think it's premature to expose this
> via KVM I can remove the related fields.

Yes, please.  Though FWIW, I honestly hope KVM_SEV_SNP_LAUNCH_UPDATE goes away
and we end up with a common uAPI for populating guest memory:

https://lore.kernel.org/all/Zbrj5WKVgMsUFDtb@google.com

> > > +                __u8 page_type;         /* page type */
> > > +                __u8 vmpl3_perms;       /* VMPL3 permission mask */
> > > +                __u8 vmpl2_perms;       /* VMPL2 permission mask */
> > > +                __u8 vmpl1_perms;       /* VMPL1 permission mask */
> > 
> > Why?  KVM doesn't support VMPLs.
> 
> It does actually get used by the SVSM.

> I can remove these but then we'd need some capability bit or something to
> know when they are available if they get re-introduced.

_If_.  We don't merge dead code, and we _definitely_ don't merge dead code that
creates ABI.

> > > +		kvaddr = pfn_to_kaddr(pfns[i]);
> > > +		if (!virt_addr_valid(kvaddr)) {
> > 
> > I really, really don't like that this assume guest_memfd is backed by struct page.
> 
> There are similar enforcements in the SEV/SEV-ES code. There was some
> initial discussion about relaxing this for SNP so we could support
> things like /dev/mem-mapped guest memory, but then guest_memfd came
> along and made that an unlikely use-case in the near-term given
> that it relies on alloc_pages() currently and explicitly guards against
> mmap()'ing pages in userspace.
> 
> I think it makes sense to keep the current tightened restrictions in place
> until such a use-case comes along, since otherwise we are relaxing a
> bunch of currently-useful sanity checks that span all throughout the code
> to support some nebulous potential use-case that might never come along.
> I think it makes more sense to cross that bridge when we get there.

I disagree.  You say "sanity checks", while I see a bunch of arbitrary assumptions
that don't need to exist.  Yes, today guest_memfd always uses struct page memory,
but that should have _zero_ impact on SNP.  Arbitrary assumptions often cause a
lot of confusion for future readers, e.g. a few years from now, if the above code
still exists, someone might wonder what is so special about struct page memory,
and then waste a bunch of time trying to figure out that there's no technical
reason SNP "requires" struct page memory.

This is partly why I was pushing for guest_memfd to handle some of this; the
gmem code _knows_ what backing type it's using, it _knows_ if the direct map is
valid, etc.

> > > +			pr_err("Invalid HVA 0x%llx for GFN 0x%llx\n", (uint64_t)kvaddr, gfn);
> > > +			ret = -EINVAL;
> > > +			goto e_release;
> > > +		}
> > > +
> > > +		ret = kvm_read_guest_page(kvm, gfn, kvaddr, 0, PAGE_SIZE);
> > 
> > Good gravy.  If I'm reading this correctly, KVM:
> > 
> >   1. Translates an HVA into a GFN.
> >   2. Gets the PFN for that GFN from guest_memfd
> >   3. Verifies the PFN is not assigned to the guest
> >   4. Copies memory from the shared memslot page to the guest_memfd page
> >   5. Converts the page to private and asks the PSP to encrypt it
> > 
> > (a) As above, why is #1 a thing?
> 
> Yah, it's probably best to avoid this, as proposed above.
> 
> > (b) Why are KVM's memory attributes never consulted?
> 
> It doesn't really matter if the attributes are set before or after
> KVM_SNP_LAUNCH_UPDATE, only that by the time the guest actually launches
> the pages get set to private so they get faulted in from gmem. We could
> document our expectations and enforce them here if that's preferable
> however. Maybe requiring KVM_SET_MEMORY_ATTRIBUTES(private) in advance
> would make it easier to enforce that userspace does the right thing.
> I'll see how that looks if there are no objections.

Userspace owns whether a page is PRIVATE or SHARED, full stop.  If KVM can't
honor that, then we need to come up with better uAPI.

> > (c) What prevents TOCTOU issues with respect to the RMP?
> 
> Time-of-use will be when the guest faults the gmem page in with C-bit
> set. If it is not in the expected Guest-owned/pre-validated state that
> SEV_CMD_SNP_LAUNCH_UPDATE expected/set, then the guest will get an RMP
> fault or #VC exception for unvalidated page access. It will also fail
> attestation. Not sure if that covers the scenarios you had in mind.

I don't think this covers what I was asking, but I suspect my concern will go
away once the new APIs come along, so let's table this for now.

> 
> > (d) Why is *KVM* copying memory into guest_memfd?
> 
> As mentioned above, there were various discussions of ways of allowing
> userspace to pwrite() to the guest_memfd in advance of
> "sealing"/"binding" it and then encrypting it in place. I think this was
> one of the related threads:
> 
>   https://lore.kernel.org/linux-mm/YkyKywkQYbr9U0CA@google.com/
> 
> My read at the time was that the requirements between pKVM/TDX/SNP were all
> so unique that we'd spin forever trying to come up with a userspace ABI
> that worked for everyone. At the time you'd suggested for pKVM to handle
> their specific requirements internally in pKVM to avoid unnecessary churn on
> the TDX/SNP side, and I took the same approach with SNP in implementing it
> internally in SNP's KVM interfaces since it seemed unlikely there would
> be much common ground with how TDX handles it via KVM_TDX_INIT_MEM_REGION.

Yeah, the whole "intra-memslot copy" thing threw me.

> > (e) What guarantees the direct map is valid for guest_memfd?
> 
> Are you suggesting this may change in the near-term?

I was asking because, when I asked, I was unaware that the plan was to shatter the
direct map to address the 2MiB vs. 4KiB erratum (as opposed to unmapping guest_memfd
pfns).

> > > +		if (ret) {
> > > +			pr_err("SEV-SNP launch update failed, ret: 0x%x, fw_error: 0x%x\n",
> > > +			       ret, *error);
> > > +			snp_page_reclaim(pfns[i]);
> > > +
> > > +			/*
> > > +			 * When invalid CPUID function entries are detected, the firmware
> > > +			 * corrects these entries for debugging purposes and leaves the
> > > +			 * page unencrypted so it can be provided to users for debugging
> > > +			 * and error-reporting.
> > > +			 *
> > > +			 * Copy the corrected CPUID page back to shared memory so
> > > +			 * userspace can retrieve this information.
> > 
> > Why?  IIUC, this is basically backdooring reads/writes into guest_memfd to avoid
> > having to add proper mmap() support.
> 
> The CPUID page is private/encrypted, so it needs to be a gmem page.
> SNP firmware is doing the backdooring when it writes the unencrypted,
> expected contents into the page during failure. What's wrong with copying
> the contents back into the source page so userspace can make use of it?
> If we implement the above-mentioned changes then the source page is just
> a userspace buffer that isn't necessarily associated with a memslot, so
> it becomes even more straightforward.
> 
> Would that be acceptable?

Yes, I am specifically complaining about writing guest memory on failure, which is
all kinds of weird.

> > > +	kvfree(pfns);
> > 
> > Saving PFNs from guest_memfd, which is fully owned by KVM, is so unnecessarily
> > complex.  Add a guest_memfd API (or three) to do this safely, e.g. to lock the
> > pages, do (and track) the RMP conversion, etc...
> 
> Is adding 3 gmem APIs and tracking RMP states inside gmem really less
> complex than what's going on here?

I think we covered this in PUCK?  Holler if you still have questions here.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v11 06/35] KVM: x86/mmu: Pass around full 64-bit error code for KVM page faults
  2023-12-30 17:23 ` [PATCH v11 06/35] KVM: x86/mmu: Pass around full 64-bit error code for KVM page faults Michael Roth
@ 2024-02-06 20:51   ` Sean Christopherson
  2024-02-12 10:00     ` Paolo Bonzini
  0 siblings, 1 reply; 58+ messages in thread
From: Sean Christopherson @ 2024-02-06 20:51 UTC (permalink / raw)
  To: Michael Roth
  Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, vkuznets,
	jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
	jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick, zhi.a.wang, Isaku Yamahata

On Sat, Dec 30, 2023, Michael Roth wrote:
> In some cases the full 64-bit error code for the KVM page fault will be
> needed to determine things like whether or not a fault was for a private
> or shared guest page, so update related code to accept the full 64-bit
> value so it can be plumbed all the way through to where it is needed.
> 
> The accessors of fault->error_code are changed as follows:
> 
> - FNAME(page_fault): change to explicitly use lower_32_bits() since that
>                      is no longer done in kvm_mmu_page_fault()
> - kvm_mmu_page_fault(): explicit mask with PFERR_RSVD_MASK,
>                         PFERR_NESTED_GUEST_PAGE
> - mmutrace: changed u32 -> u64
> 
> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
> Link: https://lore.kernel.org/kvm/20230612042559.375660-1-michael.roth@amd.com/T/#mbd0b20c9a2cf50319d5d2a27b63f73c772112076
> [mdr: drop references/changes to code not in current gmem tree, update
>       commit message]
> Signed-off-by: Michael Roth <michael.roth@amd.com>

I assume Isaku is the original author?  If so, that's missing from this patch.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v11 18/35] KVM: SEV: Add KVM_SEV_SNP_LAUNCH_UPDATE command
  2024-02-02 22:54       ` Sean Christopherson
@ 2024-02-06 23:43         ` Paolo Bonzini
  2024-02-07  2:43           ` Sean Christopherson
  2024-02-09  1:52           ` Michael Roth
  0 siblings, 2 replies; 58+ messages in thread
From: Paolo Bonzini @ 2024-02-06 23:43 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Michael Roth, kvm, linux-coco, linux-mm, linux-crypto, x86,
	linux-kernel, tglx, mingo, jroedel, thomas.lendacky, hpa, ardb,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
	jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick, zhi.a.wang, Brijesh Singh

On Fri, Feb 2, 2024 at 11:55 PM Sean Christopherson <seanjc@google.com> wrote:
> > > > +        struct kvm_sev_snp_launch_update {
> > > > +                __u64 start_gfn;        /* Guest page number to start from. */
> > > > +                __u64 uaddr;            /* userspace address need to be encrypted */
> > >
> > > Huh?  Why is KVM taking a userspace address?  IIUC, the address unconditionally
> > > gets translated into a gfn, so why not pass a gfn?
> > >
> > > And speaking of gfns, AFAICT start_gfn is never used.
> >
> > I think having both the uaddr and start_gfn parameters makes sense. I
> > think it's only awkward because of how I'm using the memslot to translate
> > the uaddr to a GFN in the current implementation,
>
> Yes.
>
> > > Oof, reading more of the code, this *requires* an effective in-place copy-and-convert
> > > of guest memory.
> >
> > So that's how it's done here: KVM_SNP_LAUNCH_UPDATE copies the pages into
> > gmem, then passes those pages on to firmware for encryption. Then the
> > VMM is expected to mark the GFN range as private via
> > KVM_SET_MEMORY_ATTRIBUTES, since the VMM understands what constitutes
> > initial private/encrypted payload. I should document that better in
> > KVM_SNP_LAUNCH_UPDATE docs however.
>
> That's fine.  As above, my complaints are that the unencrypted source *must* be
> covered by a memslot.  That's beyond ugly.

Yes, if there's one field that has to go it's uaddr, and then you'll
have a non-in-place encrypt (any copy performed by KVM is hidden).

> > > > +         kvaddr = pfn_to_kaddr(pfns[i]);
> > > > +         if (!virt_addr_valid(kvaddr)) {
> > >
> > > I really, really don't like that this assume guest_memfd is backed by struct page.
> >
> > There are similar enforcements in the SEV/SEV-ES code. There was some
> > initial discussion about relaxing this for SNP so we could support
> > things like /dev/mem-mapped guest memory, but then guest_memfd came
> > along and made that an unlikely use-case in the near-term given
> > that it relies on alloc_pages() currently and explicitly guards against
> > mmap()'ing pages in userspace.
> >
> > I think it makes sense to keep the current tightened restrictions in-place
> > until such a use-case comes along, since otherwise we are relaxing a
> > bunch of currently-useful sanity checks that span all throughout the code

What sanity is being checked for; in other words, why are these checks useful?
If all you get for breaking the promise is a KVM_BUG_ON, for example,
that's par for the course. If instead you get an oops, then we have a
problem.

I may be a bit less draconian than Sean, but the assumptions need to
be documented and explained because they _are_ going to go away.

> > > (b) Why are KVM's memory attributes never consulted?
> >
> > It doesn't really matter if the attributes are set before or after
> > KVM_SNP_LAUNCH_UPDATE, only that by the time the guest actually launches
> > the pages get set to private so they get faulted in from gmem. We could
> > document our expectations and enforce them here if that's preferable
> > however. Maybe requiring KVM_SET_MEMORY_ATTRIBUTES(private) in advance
> > would make it easier to enforce that userspace does the right thing.
> > I'll see how that looks if there are no objections.
>
> Userspace owns whether a page is PRIVATE or SHARED, full stop.  If KVM can't
> honor that, then we need to come up with better uAPI.

Can you explain more verbosely what you mean?

> > > > +                  * When invalid CPUID function entries are detected, the firmware
> > > > +                  * corrects these entries for debugging purposes and leaves the
> > > > +                  * page unencrypted so it can be provided to users for debugging
> > > > +                  * and error-reporting.
> > >
> > > Why?  IIUC, this is basically backdooring reads/writes into guest_memfd to avoid
> > > having to add proper mmap() support.
>
> Yes, I am specifically complaining about writing guest memory on failure, which is
> all kinds of weird.

It is weird but I am not sure if you are complaining about firmware
behavior or something else.

Paolo


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v11 15/35] KVM: SEV: Add KVM_SNP_INIT command
  2023-12-30 17:23 ` [PATCH v11 15/35] KVM: SEV: Add KVM_SNP_INIT command Michael Roth
@ 2024-02-06 23:51   ` Paolo Bonzini
  2024-03-20 17:28   ` Paolo Bonzini
  1 sibling, 0 replies; 58+ messages in thread
From: Paolo Bonzini @ 2024-02-06 23:51 UTC (permalink / raw)
  To: Michael Roth
  Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, seanjc, vkuznets,
	jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
	jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick, zhi.a.wang, Brijesh Singh, Pavan Kumar Paluri

On Sat, Dec 30, 2023 at 6:26 PM Michael Roth <michael.roth@amd.com> wrote:
>
> From: Brijesh Singh <brijesh.singh@amd.com>
>
> The KVM_SNP_INIT command is used by the hypervisor to initialize the
> SEV-SNP platform context. In a typical workflow, this command should be
> the first command issued. When creating an SEV-SNP guest, the VMM must use
> this command instead of KVM_SEV_INIT or KVM_SEV_ES_INIT.
>
> The flags value must be zero; it will be extended in future SNP support
> to communicate optional features (such as restricted INT injection,
> etc.).

We have a (preexisting) problem in that KVM_SEV_INIT and
KVM_SEV_ES_INIT are not flexible enough. debug_swap has broken
measurements of the VMSA because it changed the contents of the VMSA
under userspace's feet; therefore, VMSA features need to be passed into
the API somehow. It's preexisting but we need to fix it before the new
KVM_SNP_INIT API makes it worse.

I have started prototyping a change to move SEV-ES/SEV-SNP to
KVM_CREATE_VM, and introduce a single KVM_SEV_INIT_VM operation that
can be used for the PSP initialization.
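
As a purely illustrative sketch of that direction (nothing here is a
committed API; the names are hypothetical):

  /* Select the confidential VM flavor at creation time... */
  vm_fd = ioctl(kvm_fd, KVM_CREATE_VM, KVM_X86_SEV_ES_VM);

  /*
   * ...then initialize the PSP context with an explicit, measured set of
   * VMSA features instead of KVM enabling them under userspace's feet.
   */
  struct kvm_sev_init_vm {                /* hypothetical layout */
          __u64 vmsa_features;            /* e.g. explicit debug_swap opt-in */
          __u64 flags;
  };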

> +The flags bitmap is defined as::
> +
> +   /* enable the restricted injection */
> +   #define KVM_SEV_SNP_RESTRICTED_INJET   (1<<0)
> +
> +   /* enable the restricted injection timer */
> +   #define KVM_SEV_SNP_RESTRICTED_TIMER_INJET   (1<<1)

These are not yet supported, so they might as well not be documented.
If you want to document them, you need to provide an API to query
SEV_SNP_SUPPORTED_FLAGS. Let's do that later.

> +       if (params.flags & ~SEV_SNP_SUPPORTED_FLAGS)
> +               ret = -EOPNOTSUPP;
> +
> +       params.flags = SEV_SNP_SUPPORTED_FLAGS;

This assignment is not necessary.
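
i.e., rejecting unknown flags is enough, without writing params.flags back
(sketch):

	if (params.flags & ~SEV_SNP_SUPPORTED_FLAGS)
		return -EOPNOTSUPP;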

Paolo


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v11 18/35] KVM: SEV: Add KVM_SEV_SNP_LAUNCH_UPDATE command
  2024-02-06 23:43         ` Paolo Bonzini
@ 2024-02-07  2:43           ` Sean Christopherson
  2024-02-07  8:03             ` Paolo Bonzini
  2024-02-09  1:52           ` Michael Roth
  1 sibling, 1 reply; 58+ messages in thread
From: Sean Christopherson @ 2024-02-07  2:43 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Michael Roth, kvm, linux-coco, linux-mm, linux-crypto, x86,
	linux-kernel, tglx, mingo, jroedel, thomas.lendacky, hpa, ardb,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
	jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick, zhi.a.wang, Brijesh Singh

On Wed, Feb 07, 2024, Paolo Bonzini wrote:
> On Fri, Feb 2, 2024 at 11:55 PM Sean Christopherson <seanjc@google.com> wrote:
> > > It doesn't really matter if the attributes are set before or after
> > > KVM_SNP_LAUNCH_UPDATE, only that by the time the guest actually launches
> > > the pages get set to private so they get faulted in from gmem. We could
> > > document our expectations and enforce them here if that's preferable
> > > however. Maybe requiring KVM_SET_MEMORY_ATTRIBUTES(private) in advance
> > > would make it easier to enforce that userspace does the right thing.
> > > I'll see how that looks if there are no objections.
> >
> > Userspace owns whether a page is PRIVATE or SHARED, full stop.  If KVM can't
> > honor that, then we need to come up with better uAPI.
> 
> Can you explain more verbosely what you mean?

As proposed, snp_launch_update_gfn_handler() doesn't verify the state of the
gfns' attributes.  But that's a minor problem and probably not a sticking point.

My overarching complaint is that the code looks to be wildly unsafe, or at the very
least brittle.  Without guest_memfd's knowledge, and without holding any locks
beyond kvm->lock, it 

 1) checks if a pfn is shared in the RMP
 2) copies data to the page
 3) converts the page to private in the RMP
 4) does PSP stuff
 5) on failure, converts the page back to shared in RMP
 6) conditionally on failure, writes to the page via a gfn

I'm not at all confident that 1-4 isn't riddled with TOCTOU bugs, and that's
before KVM gains support for intrahost migration, i.e. before KVM allows multiple
VM instances to bind to a single guest_memfd.

But I _think_ we mostly sorted this out at PUCK.  IIRC, the plan is to have guest_memfd
provide (kernel) APIs to allow arch/vendor code to initialize a guest_memfd range.
That will give guest_memfd complete control over the state of a given page, will
allow guest_memfd to take the appropriate locks, and if we're lucky, will be reusable
by other CoCo flavors beyond SNP.
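
For the archives, a rough sketch of the shape such a guest_memfd API could
take (the signature is invented here for illustration; nothing is settled):

  /*
   * guest_memfd-owned populate helper: gmem allocates/locks the target
   * pages itself and invokes a vendor callback to do the RMP/PSP (or
   * TDX) work on each page while its state is guaranteed stable.
   */
  long kvm_gmem_populate(struct kvm *kvm, gfn_t start_gfn,
                         void __user *src, long npages,
                         int (*post_populate)(struct kvm *kvm, gfn_t gfn,
                                              kvm_pfn_t pfn, void __user *src,
                                              int order, void *opaque),
                         void *opaque);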

> > > > > +                  * When invalid CPUID function entries are detected, the firmware
> > > > > +                  * corrects these entries for debugging purposes and leaves the
> > > > > +                  * page unencrypted so it can be provided to users for debugging
> > > > > +                  * and error-reporting.
> > > >
> > > > Why?  IIUC, this is basically backdooring reads/writes into guest_memfd to avoid
> > > > having to add proper mmap() support.
> >
> > Yes, I am specifically complaining about writing guest memory on failure, which is
> > all kinds of weird.
> 
> It is weird but I am not sure if you are complaining about firmware
> behavior or something else.

This proposed KVM code:

+                               host_rmp_make_shared(pfns[i], PG_LEVEL_4K, true);
+
+                               ret = kvm_write_guest_page(kvm, gfn, kvaddr, 0, PAGE_SIZE);
+                               if (ret)
+                                       pr_err("Failed to write CPUID page back to userspace, ret: 0x%x\n",
+                                              ret);


I have no objection to propagating error/debug information back to userspace,
but it needs to be routed through the source page (or I suppose some dedicated
error page, but that seems like overkill).  Shoving the error information into
guest memory is gross.

But this should naturally go away when the requirement that the source be
covered by the same memslot also goes away.
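
Concretely, something along these lines would be fine (sketch; assumes the
separate-source-buffer uAPI discussed above, with src_uaddr being the
userspace source pointer rather than an address covered by a memslot):

	host_rmp_make_shared(pfns[i], PG_LEVEL_4K, true);

	/* Hand the firmware-corrected CPUID page back via the userspace
	 * source buffer instead of writing it into guest memory. */
	if (copy_to_user((void __user *)src_uaddr, kvaddr, PAGE_SIZE))
		ret = -EFAULT;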

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v11 18/35] KVM: SEV: Add KVM_SEV_SNP_LAUNCH_UPDATE command
  2024-02-07  2:43           ` Sean Christopherson
@ 2024-02-07  8:03             ` Paolo Bonzini
  0 siblings, 0 replies; 58+ messages in thread
From: Paolo Bonzini @ 2024-02-07  8:03 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Michael Roth, kvm, linux-coco, linux-mm, linux-crypto, x86,
	linux-kernel, tglx, mingo, jroedel, thomas.lendacky, hpa, ardb,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
	jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick, zhi.a.wang, Brijesh Singh

On Wed, Feb 7, 2024 at 3:43 AM Sean Christopherson <seanjc@google.com> wrote:
> > > Userspace owns whether a page is PRIVATE or SHARED, full stop.  If KVM can't
> > > honor that, then we need to come up with better uAPI.
> >
> > Can you explain more verbosely what you mean?
>
> As proposed, snp_launch_update_gfn_handler() doesn't verify the state of the
> gfns' attributes.  But that's a minor problem and probably not a sticking point.
>
> My overarching complaint is that the code looks to be wildly unsafe, or at the very
> least brittle.  Without guest_memfd's knowledge, and without holding any locks
> beyond kvm->lock, it
>
>  1) checks if a pfn is shared in the RMP
>  2) copies data to the page
>  3) converts the page to private in the RMP
>  4) does PSP stuff
>  5) on failure, converts the page back to shared in RMP
>  6) conditionally on failure, writes to the page via a gfn
>
> I'm not at all confident that 1-4 isn't riddled with TOCTOU bugs, and that's
> before KVM gains support for intrahost migration, i.e. before KVM allows multiple
> VM instances to bind to a single guest_memfd.

Absolutely.

> But I _think_ we mostly sorted this out at PUCK.  IIRC, the plan is to have guest_memfd
> provide (kernel) APIs to allow arch/vendor code to initialize a guest_memfd range.
> That will give guest_memfd complete control over the state of a given page, will
> allow guest_memfd to take the appropriate locks, and if we're lucky, will be reusable
> by other CoCo flavors beyond SNP.

Ok, either way it's clear that guest_memfd needs to be in charge.
Whether it's MEMORY_ENCRYPT_OP that calls into guest_memfd or vice
versa, that only matters so much.

> > > Yes, I am specifically complaining about writing guest memory on failure, which is
> > > all kinds of weird.
> >
> > It is weird but I am not sure if you are complaining about firmware
> > behavior or something else.
>
> This proposed KVM code:
>
> +                               host_rmp_make_shared(pfns[i], PG_LEVEL_4K, true);
> +
> +                               ret = kvm_write_guest_page(kvm, gfn, kvaddr, 0, PAGE_SIZE);
> +                               if (ret)
> +                                       pr_err("Failed to write CPUID page back to userspace, ret: 0x%x\n",
> +                                              ret);
>
>
> I have no objection to propagating error/debug information back to userspace,
> but it needs to be routed through the source page (or I suppose some dedicated
> error page, but that seems like overkill).  Shoving the error information into
> guest memory is gross.

Yes, but it should be just a consequence of not actually using
start_gfn. Having to copy back remains weird, but what can you do.

Paolo


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v11 18/35] KVM: SEV: Add KVM_SEV_SNP_LAUNCH_UPDATE command
  2024-02-06 23:43         ` Paolo Bonzini
  2024-02-07  2:43           ` Sean Christopherson
@ 2024-02-09  1:52           ` Michael Roth
  2024-02-09 14:34             ` Sean Christopherson
  1 sibling, 1 reply; 58+ messages in thread
From: Michael Roth @ 2024-02-09  1:52 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, kvm, linux-coco, linux-mm, linux-crypto,
	x86, linux-kernel, tglx, mingo, jroedel, thomas.lendacky, hpa,
	ardb, vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
	jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick, zhi.a.wang, Brijesh Singh

On Wed, Feb 07, 2024 at 12:43:02AM +0100, Paolo Bonzini wrote:
> On Fri, Feb 2, 2024 at 11:55 PM Sean Christopherson <seanjc@google.com> wrote:
> > > > > +        struct kvm_sev_snp_launch_update {
> > > > > +                __u64 start_gfn;        /* Guest page number to start from. */
> > > > > +                __u64 uaddr;            /* userspace address need to be encrypted */
> > > >
> > > > Huh?  Why is KVM taking a userspace address?  IIUC, the address unconditionally
> > > > gets translated into a gfn, so why not pass a gfn?
> > > >
> > > > And speaking of gfns, AFAICT start_gfn is never used.
> > >
> > > I think having both the uaddr and start_gfn parameters makes sense. I
> > > think it's only awkward because of how I'm using the memslot to translate
> > > the uaddr to a GFN in the current implementation,
> >
> > Yes.
> >
> > > > Oof, reading more of the code, this *requires* an effective in-place copy-and-convert
> > > > of guest memory.
> > >
> > > So that's how it's done here: KVM_SNP_LAUNCH_UPDATE copies the pages into
> > > gmem, then passes those pages on to firmware for encryption. Then the
> > > VMM is expected to mark the GFN range as private via
> > > KVM_SET_MEMORY_ATTRIBUTES, since the VMM understands what constitutes
> > > initial private/encrypted payload. I should document that better in
> > > KVM_SNP_LAUNCH_UPDATE docs however.
> >
> > That's fine.  As above, my complaints are that the unencrypted source *must* be
> > covered by a memslot.  That's beyond ugly.
> 
> Yes, if there's one field that has to go it's uaddr, and then you'll
> have a non-in-place encrypt (any copy performed by KVM is hidden).
> 
> > > > > +         kvaddr = pfn_to_kaddr(pfns[i]);
> > > > > +         if (!virt_addr_valid(kvaddr)) {
> > > >
> > > > I really, really don't like that this assume guest_memfd is backed by struct page.
> > >
> > > There are similar enforcements in the SEV/SEV-ES code. There was some
> > > initial discussion about relaxing this for SNP so we could support
> > > things like /dev/mem-mapped guest memory, but then guest_memfd came
> > > along and made that an unlikely use-case in the near-term given
> > > that it relies on alloc_pages() currently and explicitly guards against
> > > mmap()'ing pages in userspace.
> > >
> > > I think it makes sense to keep the current tightened restrictions in-place
> > > until such a use-case comes along, since otherwise we are relaxing a
> > > bunch of currently-useful sanity checks that span all throughout the code
> 
> What sanity is being checked for; in other words, why are these checks useful?
> If all you get for breaking the promise is a KVM_BUG_ON, for example,
> that's par for the course. If instead you get an oops, then we have a
> problem.
> 
> I may be a bit less draconian than Sean, but the assumptions need to
> be documented and explained because they _are_ going to go away.

Maybe in this case sanity-check isn't the right word, but for instance
the occurrence Sean objected to:

  kvaddr = pfn_to_kaddr(pfns[i]);
  if (!virt_addr_valid(kvaddr)) {
    ...
    ret = -EINVAL;

where there are pfn_valid() checks underneath the covers that provide
some assurance this is normal struct-page-backed/kernel-tracked memory
that has a mapping in the directmap we can use here. Dropping that
assumption means we need to create temporary mappings to access the PFN,
which complicates the code for a potential use-case that doesn't yet
exist. But if the maintainers are telling me this will change then I
have no objection to making those changes :) That was just my thinking
at the time.

And yes, if we move more of this sort of functionality closer to gmem
then those assumptions become reality and we can keep the code more
closely in sync with how memory is actually allocated.

I'll rework this to something closer to what Sean mentioned during the
PUCK call: a gmem interface that can be called to handle populating
initial gmem pages, and drop any remaining assumptions about
struct-page-backed/direct-mapped PFNs in the code that remains afterward.

I'm hoping if we do move to a unified KVM API that a similar approach
will work in that case too. It may be a bit tricky with how TDX does
a lot of this through KVM MMU / SecureEPT hooks, which may complicate
locking expectations and not necessarily fit nicely into the same flow
as SNP, but we'll see how it goes.

-Mike

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v11 18/35] KVM: SEV: Add KVM_SEV_SNP_LAUNCH_UPDATE command
  2024-02-09  1:52           ` Michael Roth
@ 2024-02-09 14:34             ` Sean Christopherson
  0 siblings, 0 replies; 58+ messages in thread
From: Sean Christopherson @ 2024-02-09 14:34 UTC (permalink / raw)
  To: Michael Roth
  Cc: Paolo Bonzini, kvm, linux-coco, linux-mm, linux-crypto, x86,
	linux-kernel, tglx, mingo, jroedel, thomas.lendacky, hpa, ardb,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
	jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick, zhi.a.wang, Brijesh Singh

On Thu, Feb 08, 2024, Michael Roth wrote:
> On Wed, Feb 07, 2024 at 12:43:02AM +0100, Paolo Bonzini wrote:
> > On Fri, Feb 2, 2024 at 11:55 PM Sean Christopherson <seanjc@google.com> wrote:
> > What sanity is being checked for; in other words, why are these checks useful?
> > If all you get for breaking the promise is a KVM_BUG_ON, for example,
> > that's par for the course. If instead you get an oops, then we have a
> > problem.
> > 
> > I may be a bit less draconian than Sean, but the assumptions need to
> > be documented and explained because they _are_ going to go away.
> 
> Maybe in this case sanity-check isn't the right word, but for instance
> the occurance Sean objected to:
> 
>   kvaddr = pfn_to_kaddr(pfns[i]);
>   if (!virt_addr_valid(kvaddr)) {
>     ...
>     ret = -EINVAL;
> 
> where there are pfn_valid() checks underneath the covers that provide
> some assurance this is normal struct-page-backed/kernel-tracked memory
> that has a mapping in the directmap we can use here. Dropping that
> assumption means we need to create temporary mappings to access the PFN,

No, you don't.  kvm_vcpu_map() does all of the lifting for you, with the small
caveat that it obviously needs a vCPU.  But that's trivial to solve with a minor
refactoring, *if* we need to solve that problem (it's not clear to me whether or
not the APIs for copying data into guest_memfd will be VM-scoped or vCPU-scoped).
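
For the vCPU-scoped case it is roughly (sketch; check kvm_host.h for the
exact signatures):

	struct kvm_host_map map;

	/* Maps the gfn whether or not the pfn has a directmap entry. */
	if (kvm_vcpu_map(vcpu, gfn, &map))
		return -EFAULT;
	memcpy(map.hva, src, PAGE_SIZE);
	kvm_vcpu_unmap(vcpu, &map, true);	/* true == mark page dirty */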

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v11 06/35] KVM: x86/mmu: Pass around full 64-bit error code for KVM page faults
  2024-02-06 20:51   ` Sean Christopherson
@ 2024-02-12 10:00     ` Paolo Bonzini
  2024-02-12 16:42       ` Michael Roth
  0 siblings, 1 reply; 58+ messages in thread
From: Paolo Bonzini @ 2024-02-12 10:00 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Michael Roth, kvm, linux-coco, linux-mm, linux-crypto, x86,
	linux-kernel, tglx, mingo, jroedel, thomas.lendacky, hpa, ardb,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
	jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick, zhi.a.wang, Isaku Yamahata

On Tue, Feb 6, 2024 at 9:52 PM Sean Christopherson <seanjc@google.com> wrote:
>
> On Sat, Dec 30, 2023, Michael Roth wrote:
> > In some cases the full 64-bit error code for the KVM page fault will be
> > needed to determine things like whether or not a fault was for a private
> > or shared guest page, so update related code to accept the full 64-bit
> > value so it can be plumbed all the way through to where it is needed.
> >
> > The accessors of fault->error_code are changed as follows:
> >
> > - FNAME(page_fault): change to explicitly use lower_32_bits() since that
> >                      is no longer done in kvm_mmu_page_fault()
> > - kvm_mmu_page_fault(): explicit mask with PFERR_RSVD_MASK,
> >                         PFERR_NESTED_GUEST_PAGE
> > - mmutrace: changed u32 -> u64
> >
> > Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
> > Link: https://lore.kernel.org/kvm/20230612042559.375660-1-michael.roth@amd.com/T/#mbd0b20c9a2cf50319d5d2a27b63f73c772112076
> > [mdr: drop references/changes to code not in current gmem tree, update
> >       commit message]
> > Signed-off-by: Michael Roth <michael.roth@amd.com>
>
> I assume Isaku is the original author?  If so, that's missing from this patch.

The root of this patch seems to be in a reply to "KVM: x86: Add
'fault_is_private' x86 op"
(https://patchew.org/linux/20230220183847.59159-1-michael.roth@amd.com/20230220183847.59159-2-michael.roth@amd.com/),
so yes.

Paolo


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v11 09/35] KVM: x86: Determine shared/private faults based on vm_type
  2023-12-30 17:23 ` [PATCH v11 09/35] KVM: x86: Determine shared/private faults based on vm_type Michael Roth
@ 2024-02-12 10:31   ` Paolo Bonzini
  2024-02-12 16:27     ` Sean Christopherson
  0 siblings, 1 reply; 58+ messages in thread
From: Paolo Bonzini @ 2024-02-12 10:31 UTC (permalink / raw)
  To: Michael Roth
  Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, seanjc, vkuznets,
	jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
	jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick, zhi.a.wang

On Sat, Dec 30, 2023 at 6:24 PM Michael Roth <michael.roth@amd.com> wrote:
>
> For KVM_X86_SNP_VM, only the PFERR_GUEST_ENC_MASK flag is needed to
> determine whether an #NPF is due to a private/shared access by the guest.
> Implement that handling here. Also add handling needed to deal with
> SNP guests which in some cases will make MMIO accesses with the
> encryption bit.
>
> Signed-off-by: Michael Roth <michael.roth@amd.com>
> ---
>  arch/x86/kvm/mmu/mmu.c          | 12 ++++++++++--
>  arch/x86/kvm/mmu/mmu_internal.h | 20 +++++++++++++++++++-
>  2 files changed, 29 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index d3fbfe0686a0..61213f6648a1 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -4331,6 +4331,7 @@ static int kvm_faultin_pfn_private(struct kvm_vcpu *vcpu,
>  static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
>  {
>         struct kvm_memory_slot *slot = fault->slot;
> +       bool private_fault = fault->is_private;

I think it's nicer to just make the fault !is_private in
kvm_mmu_do_page_fault().

> +static bool kvm_mmu_fault_is_private(struct kvm *kvm, gpa_t gpa, u64 err)
> +{
> +       bool private_fault = false;
> +
> +       if (kvm_is_vm_type(kvm, KVM_X86_SNP_VM)) {
> +               private_fault = !!(err & PFERR_GUEST_ENC_MASK);
> +       } else if (kvm_is_vm_type(kvm, KVM_X86_SW_PROTECTED_VM)) {
> +               /*
> +                * This handling is for gmem self-tests and guests that treat
> +                * userspace as the authority on whether a fault should be
> +                * private or not.
> +                */
> +               private_fault = kvm_mem_is_private(kvm, gpa >> PAGE_SHIFT);
> +       }

Any reason to remove the is_private page fault that was there in
previous versions of the patch?  I don't really like having both TDX
and SVM-specific code in this function.

Paolo

> +       return private_fault;
> +}
> +
>  /*
>   * Return values of handle_mmio_page_fault(), mmu.page_fault(), fast_page_fault(),
>   * and of course kvm_mmu_do_page_fault().
> @@ -298,7 +316,7 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
>                 .max_level = KVM_MAX_HUGEPAGE_LEVEL,
>                 .req_level = PG_LEVEL_4K,
>                 .goal_level = PG_LEVEL_4K,
> -               .is_private = kvm_mem_is_private(vcpu->kvm, cr2_or_gpa >> PAGE_SHIFT),
> +               .is_private = kvm_mmu_fault_is_private(vcpu->kvm, cr2_or_gpa, err),
>         };
>         int r;
>
> --
> 2.25.1
>


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v11 30/35] KVM: x86: Add gmem hook for determining max NPT mapping level
  2023-12-30 17:23 ` [PATCH v11 30/35] KVM: x86: Add gmem hook for determining max NPT mapping level Michael Roth
@ 2024-02-12 10:50   ` Paolo Bonzini
  2024-02-12 17:03     ` Michael Roth
  0 siblings, 1 reply; 58+ messages in thread
From: Paolo Bonzini @ 2024-02-12 10:50 UTC (permalink / raw)
  To: Michael Roth
  Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, seanjc, vkuznets,
	jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
	jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick, zhi.a.wang

On Sat, Dec 30, 2023 at 6:32 PM Michael Roth <michael.roth@amd.com> wrote:
>         int max_order, r;
> +       u8 max_level;
>
>         if (!kvm_slot_can_be_private(fault->slot)) {
>                 kvm_mmu_prepare_memory_fault_exit(vcpu, fault);
> @@ -4321,8 +4322,15 @@ static int kvm_faultin_pfn_private(struct kvm_vcpu *vcpu,
>                 return r;
>         }
>
> -       fault->max_level = min(kvm_max_level_for_order(max_order),
> -                              fault->max_level);
> +       max_level = kvm_max_level_for_order(max_order);
> +       r = static_call(kvm_x86_gmem_max_level)(vcpu->kvm, fault->pfn,
> +                                               fault->gfn, &max_level);

Might as well pass &fault->max_level directly to the callback, with no
change to the vendor-specific code.

I'll include the MMU part in a generic series to be the base for both
Intel TDX and AMD SEV-SNP, and will do that change.
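
i.e., the call site reduces to something like (sketch):

  fault->max_level = min(kvm_max_level_for_order(max_order),
                         fault->max_level);
  r = static_call(kvm_x86_gmem_max_level)(vcpu->kvm, fault->pfn,
                                          fault->gfn, &fault->max_level);
  if (r) {
          kvm_release_pfn_clean(fault->pfn);
          return r;
  }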

Paolo

> +       if (r) {
> +               kvm_release_pfn_clean(fault->pfn);
> +               return r;
> +       }
> +
> +       fault->max_level = min(max_level, fault->max_level);
>         fault->map_writable = !(fault->slot->flags & KVM_MEM_READONLY);
>
>         return RET_PF_CONTINUE;
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 85f63b6842b6..5eb836b73131 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -4315,3 +4315,30 @@ void sev_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end)
>                 pfn += use_2m_update ? PTRS_PER_PMD : 1;
>         }
>  }
> +
> +int sev_gmem_max_level(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, u8 *max_level)
> +{
> +       int level, rc;
> +       bool assigned;
> +
> +       if (!sev_snp_guest(kvm))
> +               return 0;
> +
> +       rc = snp_lookup_rmpentry(pfn, &assigned, &level);
> +       if (rc) {
> +               pr_err_ratelimited("SEV: RMP entry not found: GFN %llx PFN %llx level %d error %d\n",
> +                                  gfn, pfn, level, rc);
> +               return -ENOENT;
> +       }
> +
> +       if (!assigned) {
> +               pr_err_ratelimited("SEV: RMP entry is not assigned: GFN %llx PFN %llx level %d\n",
> +                                  gfn, pfn, level);
> +               return -EINVAL;
> +       }
> +
> +       if (level < *max_level)
> +               *max_level = level;
> +
> +       return 0;
> +}
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index f26b8c2a8be4..f745022f7454 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -5067,6 +5067,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
>         .alloc_apic_backing_page = svm_alloc_apic_backing_page,
>
>         .gmem_prepare = sev_gmem_prepare,
> +       .gmem_max_level = sev_gmem_max_level,
>         .gmem_invalidate = sev_gmem_invalidate,
>  };
>
> --
> 2.25.1
>


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v11 09/35] KVM: x86: Determine shared/private faults based on vm_type
  2024-02-12 10:31   ` Paolo Bonzini
@ 2024-02-12 16:27     ` Sean Christopherson
  2024-02-12 16:47       ` Michael Roth
  0 siblings, 1 reply; 58+ messages in thread
From: Sean Christopherson @ 2024-02-12 16:27 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Michael Roth, kvm, linux-coco, linux-mm, linux-crypto, x86,
	linux-kernel, tglx, mingo, jroedel, thomas.lendacky, hpa, ardb,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
	jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick, zhi.a.wang

On Mon, Feb 12, 2024, Paolo Bonzini wrote:
> On Sat, Dec 30, 2023 at 6:24 PM Michael Roth <michael.roth@amd.com> wrote:
> >
> > For KVM_X86_SNP_VM, only the PFERR_GUEST_ENC_MASK flag is needed to
> > determine whether an #NPF is due to a private/shared access by the guest.
> > Implement that handling here. Also add handling needed to deal with
> > SNP guests which in some cases will make MMIO accesses with the
> > encryption bit.
> >
> > Signed-off-by: Michael Roth <michael.roth@amd.com>
> > ---
> >  arch/x86/kvm/mmu/mmu.c          | 12 ++++++++++--
> >  arch/x86/kvm/mmu/mmu_internal.h | 20 +++++++++++++++++++-
> >  2 files changed, 29 insertions(+), 3 deletions(-)
> >
> > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > index d3fbfe0686a0..61213f6648a1 100644
> > --- a/arch/x86/kvm/mmu/mmu.c
> > +++ b/arch/x86/kvm/mmu/mmu.c
> > @@ -4331,6 +4331,7 @@ static int kvm_faultin_pfn_private(struct kvm_vcpu *vcpu,
> >  static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
> >  {
> >         struct kvm_memory_slot *slot = fault->slot;
> > +       bool private_fault = fault->is_private;
> 
> I think it's nicer to just make the fault !is_private in
> kvm_mmu_do_page_fault().

Yeah.  I'm starting to recall more of this discussion.  This is one of the reasons
I suggested/requested stuffing the error code to piggy-back the new SNP bit; doing
so allows is_private to be computed from the get-go without needing any vendor
specific hooks.
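
Roughly (sketch; the exact bit/encoding is still up for discussion):

	/* Synthesize the hardware "encrypted access" bit for VM types that
	 * rely on memory attributes, so that fault->is_private can be
	 * derived from the error code alone, with no vendor hook. */
	if (vcpu->kvm->arch.vm_type == KVM_X86_SW_PROTECTED_VM &&
	    kvm_mem_is_private(vcpu->kvm, cr2_or_gpa >> PAGE_SHIFT))
		err |= PFERR_GUEST_ENC_MASK;

	...
	.is_private = !!(err & PFERR_GUEST_ENC_MASK),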

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v11 06/35] KVM: x86/mmu: Pass around full 64-bit error code for KVM page faults
  2024-02-12 10:00     ` Paolo Bonzini
@ 2024-02-12 16:42       ` Michael Roth
  0 siblings, 0 replies; 58+ messages in thread
From: Michael Roth @ 2024-02-12 16:42 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, kvm, linux-coco, linux-mm, linux-crypto,
	x86, linux-kernel, tglx, mingo, jroedel, thomas.lendacky, hpa,
	ardb, vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
	jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick, zhi.a.wang, Isaku Yamahata

On Mon, Feb 12, 2024 at 11:00:27AM +0100, Paolo Bonzini wrote:
> On Tue, Feb 6, 2024 at 9:52 PM Sean Christopherson <seanjc@google.com> wrote:
> >
> > On Sat, Dec 30, 2023, Michael Roth wrote:
> > > In some cases the full 64-bit error code for the KVM page fault will be
> > > needed to determine things like whether or not a fault was for a private
> > > or shared guest page, so update related code to accept the full 64-bit
> > > value so it can be plumbed all the way through to where it is needed.
> > >
> > > The accessors of fault->error_code are changed as follows:
> > >
> > > - FNAME(page_fault): change to explicitly use lower_32_bits() since that
> > >                      is no longer done in kvm_mmu_page_fault()
> > > - kvm_mmu_page_fault(): explicit mask with PFERR_RSVD_MASK,
> > >                         PFERR_NESTED_GUEST_PAGE
> > > - mmutrace: changed u32 -> u64
> > >
> > > Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
> > > Link: https://lore.kernel.org/kvm/20230612042559.375660-1-michael.roth@amd.com/T/#mbd0b20c9a2cf50319d5d2a27b63f73c772112076
> > > [mdr: drop references/changes to code not in current gmem tree, update
> > >       commit message]
> > > Signed-off-by: Michael Roth <michael.roth@amd.com>
> >
> > I assume Isaku is the original author?  If so, that's missing from this patch.
> 
> The root of this patch seem to be in a reply to "KVM: x86: Add
> 'fault_is_private' x86 op"
> (https://patchew.org/linux/20230220183847.59159-1-michael.roth@amd.com/20230220183847.59159-2-michael.roth@amd.com/),
> so yes.

Yes, this is Isaku's patch; I think the authorship got mangled during a rebase.
I'll make sure to get that fixed up.

-Mike

> 
> Paolo
> 

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v11 09/35] KVM: x86: Determine shared/private faults based on vm_type
  2024-02-12 16:27     ` Sean Christopherson
@ 2024-02-12 16:47       ` Michael Roth
  0 siblings, 0 replies; 58+ messages in thread
From: Michael Roth @ 2024-02-12 16:47 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, kvm, linux-coco, linux-mm, linux-crypto, x86,
	linux-kernel, tglx, mingo, jroedel, thomas.lendacky, hpa, ardb,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
	jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick, zhi.a.wang

On Mon, Feb 12, 2024 at 08:27:21AM -0800, Sean Christopherson wrote:
> On Mon, Feb 12, 2024, Paolo Bonzini wrote:
> > On Sat, Dec 30, 2023 at 6:24 PM Michael Roth <michael.roth@amd.com> wrote:
> > >
> > > For KVM_X86_SNP_VM, only the PFERR_GUEST_ENC_MASK flag is needed to
> > > determine whether an #NPF is due to a private/shared access by the guest.
> > > Implement that handling here. Also add handling needed to deal with
> > > SNP guests which in some cases will make MMIO accesses with the
> > > encryption bit.
> > >
> > > Signed-off-by: Michael Roth <michael.roth@amd.com>
> > > ---
> > >  arch/x86/kvm/mmu/mmu.c          | 12 ++++++++++--
> > >  arch/x86/kvm/mmu/mmu_internal.h | 20 +++++++++++++++++++-
> > >  2 files changed, 29 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > > index d3fbfe0686a0..61213f6648a1 100644
> > > --- a/arch/x86/kvm/mmu/mmu.c
> > > +++ b/arch/x86/kvm/mmu/mmu.c
> > > @@ -4331,6 +4331,7 @@ static int kvm_faultin_pfn_private(struct kvm_vcpu *vcpu,
> > >  static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
> > >  {
> > >         struct kvm_memory_slot *slot = fault->slot;
> > > +       bool private_fault = fault->is_private;
> > 
> > I think it's nicer to just make the fault !is_private in
> > kvm_mmu_do_page_fault().
> 
> Yeah.  I'm starting to recall more of this discussion.  This is one of the reasons
> I suggested/requested stuffing the error code to piggy-back the new SNP bit; doing
> so allows is_private to be computed from the get-go without needing any vendor
> specific hooks.

Makes sense to me. Based on your suggestion here:

  https://lore.kernel.org/kvm/ZcUO5sFEAIH68JIA@google.com/

I was planning to drop this patch and adopt the TDX implementation:

  https://github.com/intel/tdx/commit/3717a903ef453aa7b62e7eb65f230566b7f158d4

-Mike

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v11 30/35] KVM: x86: Add gmem hook for determining max NPT mapping level
  2024-02-12 10:50   ` Paolo Bonzini
@ 2024-02-12 17:03     ` Michael Roth
  0 siblings, 0 replies; 58+ messages in thread
From: Michael Roth @ 2024-02-12 17:03 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, seanjc, vkuznets,
	jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
	jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick, zhi.a.wang

On Mon, Feb 12, 2024 at 11:50:26AM +0100, Paolo Bonzini wrote:
> On Sat, Dec 30, 2023 at 6:32 PM Michael Roth <michael.roth@amd.com> wrote:
> >         int max_order, r;
> > +       u8 max_level;
> >
> >         if (!kvm_slot_can_be_private(fault->slot)) {
> >                 kvm_mmu_prepare_memory_fault_exit(vcpu, fault);
> > @@ -4321,8 +4322,15 @@ static int kvm_faultin_pfn_private(struct kvm_vcpu *vcpu,
> >                 return r;
> >         }
> >
> > -       fault->max_level = min(kvm_max_level_for_order(max_order),
> > -                              fault->max_level);
> > +       max_level = kvm_max_level_for_order(max_order);
> > +       r = static_call(kvm_x86_gmem_max_level)(vcpu->kvm, fault->pfn,
> > +                                               fault->gfn, &max_level);
> 
> Might as well pass &fault->max_level directly to the callback, with no
> change to the vendor-specific code.
> 
> I'll include the MMU part in a generic series to be the base for both
> Intel TDX and AMD SEV-SNP, and will do that change.

Sounds good. I'm not sure why I did it that way originally, but what
you're suggesting does seem like it should be equivalent.

-Mike

> 
> Paolo
> 
> > +       if (r) {
> > +               kvm_release_pfn_clean(fault->pfn);
> > +               return r;
> > +       }
> > +
> > +       fault->max_level = min(max_level, fault->max_level);
> >         fault->map_writable = !(fault->slot->flags & KVM_MEM_READONLY);
> >
> >         return RET_PF_CONTINUE;
> > diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> > index 85f63b6842b6..5eb836b73131 100644
> > --- a/arch/x86/kvm/svm/sev.c
> > +++ b/arch/x86/kvm/svm/sev.c
> > @@ -4315,3 +4315,30 @@ void sev_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end)
> >                 pfn += use_2m_update ? PTRS_PER_PMD : 1;
> >         }
> >  }
> > +
> > +int sev_gmem_max_level(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, u8 *max_level)
> > +{
> > +       int level, rc;
> > +       bool assigned;
> > +
> > +       if (!sev_snp_guest(kvm))
> > +               return 0;
> > +
> > +       rc = snp_lookup_rmpentry(pfn, &assigned, &level);
> > +       if (rc) {
> > +               pr_err_ratelimited("SEV: RMP entry not found: GFN %llx PFN %llx level %d error %d\n",
> > +                                  gfn, pfn, level, rc);
> > +               return -ENOENT;
> > +       }
> > +
> > +       if (!assigned) {
> > +               pr_err_ratelimited("SEV: RMP entry is not assigned: GFN %llx PFN %llx level %d\n",
> > +                                  gfn, pfn, level);
> > +               return -EINVAL;
> > +       }
> > +
> > +       if (level < *max_level)
> > +               *max_level = level;
> > +
> > +       return 0;
> > +}
> > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > index f26b8c2a8be4..f745022f7454 100644
> > --- a/arch/x86/kvm/svm/svm.c
> > +++ b/arch/x86/kvm/svm/svm.c
> > @@ -5067,6 +5067,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
> >         .alloc_apic_backing_page = svm_alloc_apic_backing_page,
> >
> >         .gmem_prepare = sev_gmem_prepare,
> > +       .gmem_max_level = sev_gmem_max_level,
> >         .gmem_invalidate = sev_gmem_invalidate,
> >  };
> >
> > --
> > 2.25.1
> >
> 

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v11 28/35] KVM: SEV: Implement gmem hook for initializing private pages
  2023-12-30 17:23 ` [PATCH v11 28/35] KVM: SEV: Implement gmem hook for initializing private pages Michael Roth
@ 2024-03-11  5:50   ` Binbin Wu
  0 siblings, 0 replies; 58+ messages in thread
From: Binbin Wu @ 2024-03-11  5:50 UTC (permalink / raw)
  To: Michael Roth
  Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
	jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick, zhi.a.wang



On 12/31/2023 1:23 AM, Michael Roth wrote:
> This will handle RMP table updates and direct map changes needed to put
> a page into a private state before mapping it into an SEV-SNP guest.
>
> Signed-off-by: Michael Roth <michael.roth@amd.com>
> ---
>   arch/x86/kvm/Kconfig   |  1 +
>   arch/x86/kvm/svm/sev.c | 98 ++++++++++++++++++++++++++++++++++++++++++
>   arch/x86/kvm/svm/svm.c |  2 +
>   arch/x86/kvm/svm/svm.h |  1 +
>   virt/kvm/guest_memfd.c |  4 +-
>   5 files changed, 104 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
> index 4ec53d6d5773..79c002e1bb5c 100644
> --- a/arch/x86/kvm/Kconfig
> +++ b/arch/x86/kvm/Kconfig
> @@ -125,6 +125,7 @@ config KVM_AMD_SEV
>   	depends on KVM_AMD && X86_64
>   	depends on CRYPTO_DEV_SP_PSP && !(KVM_AMD=y && CRYPTO_DEV_CCP_DD=m)
>   	select KVM_GENERIC_PRIVATE_MEM
> +	select HAVE_KVM_GMEM_PREPARE
>   	help
>   	  Provides support for launching Encrypted VMs (SEV) and Encrypted VMs
>   	  with Encrypted State (SEV-ES) on AMD processors.
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index b2ac696c436a..91f53f4a6059 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -4154,3 +4154,101 @@ void handle_rmp_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code)
>   out:
>   	put_page(pfn_to_page(pfn));
>   }
> +
> +static bool is_pfn_range_shared(kvm_pfn_t start, kvm_pfn_t end)
> +{
> +	kvm_pfn_t pfn = start;
> +
> +	while (pfn < end) {
> +		int ret, rmp_level;
> +		bool assigned;
> +
> +		ret = snp_lookup_rmpentry(pfn, &assigned, &rmp_level);
> +		if (ret) {
> +			pr_warn_ratelimited("SEV: Failed to retrieve RMP entry: PFN 0x%llx GFN start 0x%llx GFN end 0x%llx RMP level %d error %d\n",
> +					    pfn, start, end, rmp_level, ret);
> +			return false;
> +		}
> +
> +		if (assigned) {
> +			pr_debug("%s: overlap detected, PFN 0x%llx start 0x%llx end 0x%llx RMP level %d\n",
> +				 __func__, pfn, start, end, rmp_level);
> +			return false;
> +		}
> +
> +		pfn++;

rmp_level can be obtained from snp_lookup_rmpentry().
I think the pfn can be advanced according to rmp_level to avoid unnecessary
loop iterations for a 2MB large page, right?
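
Something like (sketch):

		if (rmp_level > PG_LEVEL_4K) {
			/* Skip the whole 2M range covered by a large RMP
			 * entry instead of stepping one PFN at a time. */
			pfn = ALIGN_DOWN(pfn, PTRS_PER_PMD) + PTRS_PER_PMD;
		} else {
			pfn++;
		}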

> +	}
> +
> +	return true;
> +}
> +
> +static u8 max_level_for_order(int order)
> +{
> +	if (order >= KVM_HPAGE_GFN_SHIFT(PG_LEVEL_2M))
> +		return PG_LEVEL_2M;
> +
> +	return PG_LEVEL_4K;
> +}
> +
> +static bool is_large_rmp_possible(struct kvm *kvm, kvm_pfn_t pfn, int order)
> +{
> +	kvm_pfn_t pfn_aligned = ALIGN_DOWN(pfn, PTRS_PER_PMD);
> +
> +	/*
> +	 * If this is a large folio, and the entire 2M range containing the
> +	 * PFN is currently shared, then the entire 2M-aligned range can be
> +	 * set to private via a single 2M RMP entry.
> +	 */
> +	if (max_level_for_order(order) > PG_LEVEL_4K &&
> +	    is_pfn_range_shared(pfn_aligned, pfn_aligned + PTRS_PER_PMD))
> +		return true;
> +
> +	return false;
> +}
> +
> +int sev_gmem_prepare(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, int max_order)
> +{
> +	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> +	kvm_pfn_t pfn_aligned;
> +	gfn_t gfn_aligned;
> +	int level, rc;
> +	bool assigned;
> +
> +	if (!sev_snp_guest(kvm))
> +		return 0;
> +
> +	rc = snp_lookup_rmpentry(pfn, &assigned, &level);
> +	if (rc) {
> +		pr_err_ratelimited("SEV: Failed to look up RMP entry: GFN %llx PFN %llx error %d\n",
> +				   gfn, pfn, rc);
> +		return -ENOENT;
> +	}
> +
> +	if (assigned) {
> +		pr_debug("%s: already assigned: gfn %llx pfn %llx max_order %d level %d\n",
> +			 __func__, gfn, pfn, max_order, level);
> +		return 0;
> +	}
> +
> +	if (is_large_rmp_possible(kvm, pfn, max_order)) {
> +		level = PG_LEVEL_2M;
> +		pfn_aligned = ALIGN_DOWN(pfn, PTRS_PER_PMD);
> +		gfn_aligned = ALIGN_DOWN(gfn, PTRS_PER_PMD);
> +	} else {
> +		level = PG_LEVEL_4K;
> +		pfn_aligned = pfn;
> +		gfn_aligned = gfn;
> +	}
> +
> +	rc = rmp_make_private(pfn_aligned, gfn_to_gpa(gfn_aligned), level, sev->asid, false);
> +	if (rc) {
> +		pr_err_ratelimited("SEV: Failed to update RMP entry: GFN %llx PFN %llx level %d error %d\n",
> +				   gfn, pfn, level, rc);
> +		return -EINVAL;
> +	}
> +
> +	pr_debug("%s: updated: gfn %llx pfn %llx pfn_aligned %llx max_order %d level %d\n",
> +		 __func__, gfn, pfn, pfn_aligned, max_order, level);
> +
> +	return 0;
> +}
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 240518f8d6c7..32cef8626b57 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -5065,6 +5065,8 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
>   	.vcpu_deliver_sipi_vector = svm_vcpu_deliver_sipi_vector,
>   	.vcpu_get_apicv_inhibit_reasons = avic_vcpu_get_apicv_inhibit_reasons,
>   	.alloc_apic_backing_page = svm_alloc_apic_backing_page,
> +
> +	.gmem_prepare = sev_gmem_prepare,
>   };
>   
>   /*
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index d953ae41c619..9ece9612dbb9 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -725,6 +725,7 @@ void sev_es_unmap_ghcb(struct vcpu_svm *svm);
>   struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu);
>   void handle_rmp_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code);
>   void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu);
> +int sev_gmem_prepare(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, int max_order);
>   
>   /* vmenter.S */
>   
> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> index feec0da93d98..ddea45279fef 100644
> --- a/virt/kvm/guest_memfd.c
> +++ b/virt/kvm/guest_memfd.c
> @@ -66,8 +66,8 @@ static int kvm_gmem_prepare_folio(struct inode *inode, pgoff_t index, struct fol
>   		gfn = slot->base_gfn + index - slot->gmem.pgoff;
>   		rc = kvm_arch_gmem_prepare(kvm, gfn, pfn, compound_order(compound_head(page)));
>   		if (rc) {
> -			pr_warn_ratelimited("gmem: Failed to prepare folio for index %lx, error %d.\n",
> -					    index, rc);
> +			pr_warn_ratelimited("gmem: Failed to prepare folio for index %lx GFN %llx PFN %llx error %d.\n",
> +					    index, gfn, pfn, rc);
>   			return rc;
>   		}
>   	}


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v11 18/35] KVM: SEV: Add KVM_SEV_SNP_LAUNCH_UPDATE command
  2023-12-30 17:23 ` [PATCH v11 18/35] KVM: SEV: Add KVM_SEV_SNP_LAUNCH_UPDATE command Michael Roth
  2024-01-10 15:45   ` Sean Christopherson
@ 2024-03-18 21:02   ` Peter Gonda
  1 sibling, 0 replies; 58+ messages in thread
From: Peter Gonda @ 2024-03-18 21:02 UTC (permalink / raw)
  To: Michael Roth
  Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
	jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick, zhi.a.wang, Brijesh Singh

On Sat, Dec 30, 2023 at 10:27 AM Michael Roth <michael.roth@amd.com> wrote:
>
> From: Brijesh Singh <brijesh.singh@amd.com>
>
> The KVM_SEV_SNP_LAUNCH_UPDATE command can be used to insert data into
> the guest's memory. The data is encrypted with the cryptographic context
> created with the KVM_SEV_SNP_LAUNCH_START.
>
> In addition to inserting data, it can insert two special pages
> into the guest's memory: the secrets page and the CPUID page.
>
> When terminating the guest, reclaim the guest pages added to the RMP
> table. If the reclaim fails, the pages are no longer safe to release
> back to the system, so leak them.
>
> For more information see the SEV-SNP specification.
>
> Co-developed-by: Michael Roth <michael.roth@amd.com>
> Signed-off-by: Michael Roth <michael.roth@amd.com>
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> ---
>  .../virt/kvm/x86/amd-memory-encryption.rst    |  28 +++
>  arch/x86/kvm/svm/sev.c                        | 181 ++++++++++++++++++
>  include/uapi/linux/kvm.h                      |  19 ++
>  3 files changed, 228 insertions(+)
>
> diff --git a/Documentation/virt/kvm/x86/amd-memory-encryption.rst b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
> index b1beb2fe8766..d4325b26724c 100644
> --- a/Documentation/virt/kvm/x86/amd-memory-encryption.rst
> +++ b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
> @@ -485,6 +485,34 @@ Returns: 0 on success, -negative on error
>
>  See the SEV-SNP specification for further detail on the launch input.
>
> +20. KVM_SNP_LAUNCH_UPDATE
> +-------------------------
> +
> +The KVM_SNP_LAUNCH_UPDATE is used for encrypting a memory region. It also
> +calculates a measurement of the memory contents. The measurement is a signature
> +of the memory contents that can be sent to the guest owner as an attestation
> +that the memory was encrypted correctly by the firmware.

Nit: The measurement is a rolling hash of all the launch-updated pages
and their metadata. The attestation quote contains a signature of
information about the SNP VM, including this measurement.

Also, technically the attestation doesn't confirm to the guest owner that
the memory was encrypted correctly; I don't think we can
cryptographically prove that. But the attestation does give the guest
owner confirmation of the exact steps the ASP took in creating the SNP
VM's initial memory context. If the ASP firmware is bug-free and follows
the spec, your 'memory was encrypted correctly by the firmware' line is
implied.
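
To make the "rolling hash" concrete: per the SEV-SNP ABI spec, each
LAUNCH_UPDATE extends the launch digest (LD) by hashing a PAGE_INFO
structure that embeds the previous digest. A rough sketch, with the field
layout approximated from the spec and sha384() as a hypothetical helper
(the real computation happens inside the ASP firmware, not the kernel):

  struct page_info {
          u8  digest_cur[48];     /* launch digest before this update */
          u8  contents[48];       /* SHA-384 of the page contents */
          u16 length;             /* sizeof(struct page_info) */
          u8  page_type;
          u8  imi_page;
          u8  vmpl_perms[3];      /* VMPL1-3 permission masks */
          u64 gpa;
  } __packed;

  /* LD := SHA-384(PAGE_INFO), so every update chains on the last. */
  static void launch_digest_extend(u8 ld[48], struct page_info *pi)
  {
          memcpy(pi->digest_cur, ld, sizeof(pi->digest_cur));
          sha384((const u8 *)pi, sizeof(*pi), ld);
  }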

> +
> +Parameters (in): struct kvm_sev_snp_launch_update
> +
> +Returns: 0 on success, -negative on error
> +
> +::
> +
> +        struct kvm_sev_snp_launch_update {
> +                __u64 start_gfn;        /* Guest page number to start from. */
> +                __u64 uaddr;            /* userspace address of data to be encrypted */
> +                __u32 len;              /* length of memory region */
> +                __u8 imi_page;          /* 1 if memory is part of the IMI */
> +                __u8 page_type;         /* page type */
> +                __u8 vmpl3_perms;       /* VMPL3 permission mask */
> +                __u8 vmpl2_perms;       /* VMPL2 permission mask */
> +                __u8 vmpl1_perms;       /* VMPL1 permission mask */
> +        };
> +
> +See the SEV-SNP spec for further details on how to build the VMPL permission
> +mask and page type.
> +
>  References
>  ==========
>
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index e2f4d4bc125c..d60209e6e68b 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -245,6 +245,36 @@ static void sev_decommission(unsigned int handle)
>         sev_guest_decommission(&decommission, NULL);
>  }
>
> +static int snp_page_reclaim(u64 pfn)
> +{
> +       struct sev_data_snp_page_reclaim data = {0};
> +       int err, rc;
> +
> +       data.paddr = __sme_set(pfn << PAGE_SHIFT);
> +       rc = sev_do_cmd(SEV_CMD_SNP_PAGE_RECLAIM, &data, &err);
> +       if (rc) {
> +               /*
> +                * If the reclaim failed, then page is no longer safe
> +                * to use.
> +                */
> +               snp_leak_pages(pfn, 1);
> +       }
> +
> +       return rc;
> +}
> +
> +static int host_rmp_make_shared(u64 pfn, enum pg_level level, bool leak)
> +{
> +       int rc;
> +
> +       rc = rmp_make_shared(pfn, level);
> +       if (rc && leak)
> +               snp_leak_pages(pfn,
> +                              page_level_size(level) >> PAGE_SHIFT);
> +
> +       return rc;
> +}
> +
>  static void sev_unbind_asid(struct kvm *kvm, unsigned int handle)
>  {
>         struct sev_data_deactivate deactivate;
> @@ -1990,6 +2020,154 @@ static int snp_launch_start(struct kvm *kvm, struct kvm_sev_cmd *argp)
>         return rc;
>  }
>
> +static int snp_launch_update_gfn_handler(struct kvm *kvm,
> +                                        struct kvm_gfn_range *range,
> +                                        void *opaque)
> +{
> +       struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> +       struct kvm_memory_slot *memslot = range->slot;
> +       struct sev_data_snp_launch_update data = {0};
> +       struct kvm_sev_snp_launch_update params;
> +       struct kvm_sev_cmd *argp = opaque;
> +       int *error = &argp->error;
> +       int i, n = 0, ret = 0;
> +       unsigned long npages;
> +       kvm_pfn_t *pfns;
> +       gfn_t gfn;
> +
> +       if (!kvm_slot_can_be_private(memslot)) {
> +               pr_err("SEV-SNP requires private memory support via guest_memfd.\n");
> +               return -EINVAL;
> +       }
> +
> +       if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data, sizeof(params))) {
> +               pr_err("Failed to copy user parameters for SEV-SNP launch.\n");
> +               return -EFAULT;
> +       }
> +
> +       data.gctx_paddr = __psp_pa(sev->snp_context);
> +
> +       npages = range->end - range->start;
> +       pfns = kvmalloc_array(npages, sizeof(*pfns), GFP_KERNEL_ACCOUNT);
> +       if (!pfns)
> +               return -ENOMEM;
> +
> +       pr_debug("%s: GFN range 0x%llx-0x%llx, type %d\n", __func__,
> +                range->start, range->end, params.page_type);
> +
> +       for (gfn = range->start, i = 0; gfn < range->end; gfn++, i++) {
> +               int order, level;
> +               bool assigned;
> +               void *kvaddr;
> +
> +               ret = __kvm_gmem_get_pfn(kvm, memslot, gfn, &pfns[i], &order, false);
> +               if (ret)
> +                       goto e_release;
> +
> +               n++;
> +               ret = snp_lookup_rmpentry((u64)pfns[i], &assigned, &level);
> +               if (ret || assigned) {
> +                       pr_err("Failed to ensure GFN 0x%llx is in initial shared state, ret: %d, assigned: %d\n",
> +                              gfn, ret, assigned);
> +                       return -EFAULT;
> +               }
> +
> +               kvaddr = pfn_to_kaddr(pfns[i]);
> +               if (!virt_addr_valid(kvaddr)) {
> +                       pr_err("Invalid HVA 0x%llx for GFN 0x%llx\n", (uint64_t)kvaddr, gfn);
> +                       ret = -EINVAL;
> +                       goto e_release;
> +               }
> +
> +               ret = kvm_read_guest_page(kvm, gfn, kvaddr, 0, PAGE_SIZE);
> +               if (ret) {
> +                       pr_err("Guest read failed, ret: 0x%x\n", ret);

Should these be pr_debugs()?  This could get noisy.

> +                       goto e_release;
> +               }
> +
> +               ret = rmp_make_private(pfns[i], gfn << PAGE_SHIFT, PG_LEVEL_4K,
> +                                      sev_get_asid(kvm), true);
> +               if (ret) {
> +                       ret = -EFAULT;
> +                       goto e_release;
> +               }
> +
> +               data.address = __sme_set(pfns[i] << PAGE_SHIFT);
> +               data.page_size = PG_LEVEL_TO_RMP(PG_LEVEL_4K);
> +               data.page_type = params.page_type;
> +               data.vmpl3_perms = params.vmpl3_perms;
> +               data.vmpl2_perms = params.vmpl2_perms;
> +               data.vmpl1_perms = params.vmpl1_perms;
> +               ret = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_LAUNCH_UPDATE,
> +                                     &data, error);
> +               if (ret) {
> +                       pr_err("SEV-SNP launch update failed, ret: 0x%x, fw_error: 0x%x\n",
> +                              ret, *error);
> +                       snp_page_reclaim(pfns[i]);
> +
> +                       /*
> +                        * When invalid CPUID function entries are detected, the firmware
> +                        * corrects these entries for debugging purposes and leaves the
> +                        * page unencrypted so it can be provided to users for debugging
> +                        * and error-reporting.
> +                        *
> +                        * Copy the corrected CPUID page back to shared memory so
> +                        * userpsace can retrieve this information.

Typo: userpsace
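
For completeness, a hedged userspace-side sketch of driving this command
through the standard KVM_MEMORY_ENCRYPT_OP ioctl (vm_fd and sev_fd are
assumed to be set up already; error handling and the surrounding
LAUNCH_START/LAUNCH_FINISH flow are omitted):

  #include <stdint.h>
  #include <sys/ioctl.h>
  #include <linux/kvm.h>

  static int snp_launch_update(int vm_fd, int sev_fd, __u64 start_gfn,
                               void *uaddr, __u32 len, __u8 page_type)
  {
          struct kvm_sev_snp_launch_update update = {
                  .start_gfn = start_gfn,
                  .uaddr = (__u64)(uintptr_t)uaddr,
                  .len = len,
                  .page_type = page_type, /* normal, CPUID, secrets, ... */
          };
          struct kvm_sev_cmd cmd = {
                  .id = KVM_SEV_SNP_LAUNCH_UPDATE,
                  .data = (__u64)(uintptr_t)&update,
                  .sev_fd = (__u32)sev_fd,
          };

          /* VM-scoped multiplexer for all SEV/SNP commands. */
          return ioctl(vm_fd, KVM_MEMORY_ENCRYPT_OP, &cmd);
  }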

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v11 15/35] KVM: SEV: Add KVM_SNP_INIT command
  2023-12-30 17:23 ` [PATCH v11 15/35] KVM: SEV: Add KVM_SNP_INIT command Michael Roth
  2024-02-06 23:51   ` Paolo Bonzini
@ 2024-03-20 17:28   ` Paolo Bonzini
  1 sibling, 0 replies; 58+ messages in thread
From: Paolo Bonzini @ 2024-03-20 17:28 UTC (permalink / raw)
  To: Michael Roth
  Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, seanjc, vkuznets,
	jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
	jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick, zhi.a.wang, Brijesh Singh, Pavan Kumar Paluri

On Sat, Dec 30, 2023 at 6:26 PM Michael Roth <michael.roth@amd.com> wrote:
> +        struct kvm_snp_init {
> +                __u64 flags;
> +        };
> +
> +The flags bitmap is defined as::
> +
> +   /* enable the restricted injection */
> +   #define KVM_SEV_SNP_RESTRICTED_INJET   (1<<0)
> +
> +   /* enable the restricted injection timer */
> +   #define KVM_SEV_SNP_RESTRICTED_TIMER_INJET   (1<<1)

The flags are the same as the vmsa_features introduced by
KVM_SEV_INIT2, which is great - SNP does not need any change in this
department and this patch almost entirely goes away.

>         if (sev_es_debug_swap_enabled)
>                 save->sev_features |= SVM_SEV_FEAT_DEBUG_SWAP;
>
> +       /* Enable the SEV-SNP feature */
> +       if (sev_snp_guest(svm->vcpu.kvm))
> +               save->sev_features |= SVM_SEV_FEAT_SNP_ACTIVE;

... on the other hand this raises the question of whether
SVM_SEV_FEAT_SNP_ACTIVE should be exposed in the
KVM_X86_SEV_VMSA_FEATURES attribute. I think it shouldn't.

This means that this patch becomes a two-liner change to
sev_guest_init() that you can squash in patch 14 ("KVM: SEV: Add
initial SEV-SNP support"):

     sev->es_active = es_active;
     sev->vmsa_features = data->vmsa_features;
+    if (vm_type == KVM_X86_SNP_VM)
+    sev->vmsa_features |= SVM_SEV_FEAT_SNP_ACTIVE;

Also, since there is now sev->vmsa_features (that wasn't there at the
time of your posting), I'd even drop sev->snp_active in favor of
"sev->vmsa_features & SVM_SEV_FEAT_SNP_ACTIVE". It's only ever used in
sev_snp_guest() so it's a useless duplication.
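
FWIW, a minimal sketch of that suggestion, assuming the vmsa_features
field from the KVM_SEV_INIT2 series:

      static __always_inline bool sev_snp_guest(struct kvm *kvm)
      {
              struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;

              return sev->vmsa_features & SVM_SEV_FEAT_SNP_ACTIVE;
      }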

Looking forward to seeing v12. :)  If you have any problems rebasing on
top of https://lore.kernel.org/kvm/20240227232100.478238-1-pbonzini@redhat.com/,
please shout.

Paolo


>         pr_debug("Virtual Machine Save Area (VMSA):\n");
>         print_hex_dump_debug("", DUMP_PREFIX_NONE, 16, 1, save, sizeof(*save), false);
>
> @@ -1883,6 +1914,12 @@ int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
>         }
>
>         switch (sev_cmd.id) {
> +       case KVM_SEV_SNP_INIT:
> +               if (!sev_snp_enabled) {
> +                       r = -ENOTTY;
> +                       goto out;
> +               }
> +               fallthrough;
>         case KVM_SEV_ES_INIT:
>                 if (!sev_es_enabled) {
>                         r = -ENOTTY;
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index a3e27c82866b..07a9eb5b6ce5 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -76,6 +76,9 @@ enum {
>  /* TPR and CR2 are always written before VMRUN */
>  #define VMCB_ALWAYS_DIRTY_MASK ((1U << VMCB_INTR) | (1U << VMCB_CR2))
>
> +/* Supported init feature flags */
> +#define SEV_SNP_SUPPORTED_FLAGS                0x0
> +
>  struct kvm_sev_info {
>         bool active;            /* SEV enabled guest */
>         bool es_active;         /* SEV-ES enabled guest */
> @@ -91,6 +94,7 @@ struct kvm_sev_info {
>         struct list_head mirror_entry; /* Use as a list entry of mirrors */
>         struct misc_cg *misc_cg; /* For misc cgroup accounting */
>         atomic_t migration_in_progress;
> +       u64 snp_init_flags;
>  };
>
>  struct kvm_svm {
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index c3308536482b..73702e9b9d76 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1869,6 +1869,9 @@ enum sev_cmd_id {
>         /* Guest Migration Extension */
>         KVM_SEV_SEND_CANCEL,
>
> +       /* SNP specific commands */
> +       KVM_SEV_SNP_INIT,
> +
>         KVM_SEV_NR_MAX,
>  };
>
> @@ -1965,6 +1968,16 @@ struct kvm_sev_receive_update_data {
>         __u32 trans_len;
>  };
>
> +/* enable the restricted injection */
> +#define KVM_SEV_SNP_RESTRICTED_INJET   (1 << 0)
> +
> +/* enable the restricted injection timer */
> +#define KVM_SEV_SNP_RESTRICTED_TIMER_INJET   (1 << 1)
> +
> +struct kvm_snp_init {
> +       __u64 flags;
> +};
> +
>  #define KVM_DEV_ASSIGN_ENABLE_IOMMU    (1 << 0)
>  #define KVM_DEV_ASSIGN_PCI_2_3         (1 << 1)
>  #define KVM_DEV_ASSIGN_MASK_INTX       (1 << 2)
> --
> 2.25.1
>


^ permalink raw reply	[flat|nested] 58+ messages in thread

end of thread, other threads:[~2024-03-20 17:28 UTC | newest]

Thread overview: 58+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-12-30 17:23 [PATCH v11 00/35] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
2023-12-30 17:23 ` [PATCH v11 01/35] KVM: Add hugepage support for dedicated guest memory Michael Roth
2023-12-30 17:23 ` [PATCH v11 02/35] mm: Introduce AS_INACCESSIBLE for encrypted/confidential memory Michael Roth
2023-12-30 17:23 ` [PATCH v11 03/35] KVM: Use AS_INACCESSIBLE when creating guest_memfd inode Michael Roth
2023-12-30 17:23 ` [PATCH v11 04/35] KVM: x86: Add gmem hook for initializing memory Michael Roth
2023-12-30 17:23 ` [PATCH v11 05/35] KVM: x86: Add gmem hook for invalidating memory Michael Roth
2023-12-30 17:23 ` [PATCH v11 06/35] KVM: x86/mmu: Pass around full 64-bit error code for KVM page faults Michael Roth
2024-02-06 20:51   ` Sean Christopherson
2024-02-12 10:00     ` Paolo Bonzini
2024-02-12 16:42       ` Michael Roth
2023-12-30 17:23 ` [PATCH v11 07/35] KVM: x86: Add KVM_X86_SNP_VM vm_type Michael Roth
2023-12-30 17:23 ` [PATCH v11 08/35] KVM: x86: Define RMP page fault error bits for #NPF Michael Roth
2023-12-30 17:23 ` [PATCH v11 09/35] KVM: x86: Determine shared/private faults based on vm_type Michael Roth
2024-02-12 10:31   ` Paolo Bonzini
2024-02-12 16:27     ` Sean Christopherson
2024-02-12 16:47       ` Michael Roth
2023-12-30 17:23 ` [PATCH v11 10/35] KVM: SEV: Do not intercept accesses to MSR_IA32_XSS for SEV-ES guests Michael Roth
2023-12-30 17:23 ` [PATCH v11 11/35] KVM: SEV: Select KVM_GENERIC_PRIVATE_MEM when CONFIG_KVM_AMD_SEV=y Michael Roth
2023-12-30 17:23 ` [PATCH v11 12/35] KVM: SEV: Add support to handle AP reset MSR protocol Michael Roth
2023-12-30 17:23 ` [PATCH v11 13/35] KVM: SEV: Add GHCB handling for Hypervisor Feature Support requests Michael Roth
2023-12-30 17:23 ` [PATCH v11 14/35] KVM: SEV: Add initial SEV-SNP support Michael Roth
2023-12-30 17:23 ` [PATCH v11 15/35] KVM: SEV: Add KVM_SNP_INIT command Michael Roth
2024-02-06 23:51   ` Paolo Bonzini
2024-03-20 17:28   ` Paolo Bonzini
2023-12-30 17:23 ` [PATCH v11 16/35] KVM: SEV: Add KVM_SEV_SNP_LAUNCH_START command Michael Roth
2023-12-30 17:23 ` [PATCH v11 17/35] KVM: Add HVA range operator Michael Roth
2023-12-30 17:23 ` [PATCH v11 18/35] KVM: SEV: Add KVM_SEV_SNP_LAUNCH_UPDATE command Michael Roth
2024-01-10 15:45   ` Sean Christopherson
2024-01-16  4:14     ` Michael Roth
2024-02-02 22:54       ` Sean Christopherson
2024-02-06 23:43         ` Paolo Bonzini
2024-02-07  2:43           ` Sean Christopherson
2024-02-07  8:03             ` Paolo Bonzini
2024-02-09  1:52           ` Michael Roth
2024-02-09 14:34             ` Sean Christopherson
2024-03-18 21:02   ` Peter Gonda
2023-12-30 17:23 ` [PATCH v11 19/35] KVM: SEV: Add KVM_SEV_SNP_LAUNCH_FINISH command Michael Roth
2023-12-30 17:23 ` [PATCH v11 20/35] KVM: SEV: Add support to handle GHCB GPA register VMGEXIT Michael Roth
2023-12-30 17:23 ` [PATCH v11 21/35] KVM: SEV: Add support to handle MSR based Page State Change VMGEXIT Michael Roth
2023-12-30 17:23 ` [PATCH v11 22/35] KVM: SEV: Add support to handle " Michael Roth
2023-12-30 17:23 ` [PATCH v11 23/35] KVM: x86: Export the kvm_zap_gfn_range() for the SNP use Michael Roth
2023-12-30 17:23 ` [PATCH v11 24/35] KVM: SEV: Add support to handle RMP nested page faults Michael Roth
2023-12-30 17:23 ` [PATCH v11 25/35] KVM: SEV: Use a VMSA physical address variable for populating VMCB Michael Roth
2023-12-30 17:23 ` [PATCH v11 26/35] KVM: SEV: Support SEV-SNP AP Creation NAE event Michael Roth
2024-01-05 22:08   ` Jacob Xu
2024-01-08 15:53     ` Sean Christopherson
2023-12-30 17:23 ` [PATCH v11 27/35] KVM: SEV: Add support for GHCB-based termination requests Michael Roth
2023-12-30 17:23 ` [PATCH v11 28/35] KVM: SEV: Implement gmem hook for initializing private pages Michael Roth
2024-03-11  5:50   ` Binbin Wu
2023-12-30 17:23 ` [PATCH v11 29/35] KVM: SEV: Implement gmem hook for invalidating " Michael Roth
2023-12-30 17:23 ` [PATCH v11 30/35] KVM: x86: Add gmem hook for determining max NPT mapping level Michael Roth
2024-02-12 10:50   ` Paolo Bonzini
2024-02-12 17:03     ` Michael Roth
2023-12-30 17:23 ` [PATCH v11 31/35] KVM: SEV: Avoid WBINVD for HVA-based MMU notifications for SNP Michael Roth
2023-12-30 17:23 ` [PATCH v11 32/35] KVM: SVM: Add module parameter to enable the SEV-SNP Michael Roth
2023-12-30 17:23 ` [PATCH v11 33/35] KVM: SEV: Provide support for SNP_GUEST_REQUEST NAE event Michael Roth
2023-12-30 17:23 ` [PATCH v11 34/35] crypto: ccp: Add the SNP_SET_CONFIG_{START,END} commands Michael Roth
2023-12-30 17:23 ` [PATCH v11 35/35] KVM: SEV: Provide support for SNP_EXTENDED_GUEST_REQUEST NAE event Michael Roth

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).