* [PATCH RFC v9 00/51] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support
@ 2023-06-12  4:25 Michael Roth
  2023-06-12  4:25 ` [PATCH RFC v9 01/51] KVM: x86: Add gmem hook for initializing private memory Michael Roth
                   ` (50 more replies)
  0 siblings, 51 replies; 102+ messages in thread
From: Michael Roth @ 2023-06-12  4:25 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	liam.merwick, zhi.a.wang

This patchset is also available at:

  https://github.com/amdese/linux/commits/snp-host-v9-rfc

and is based on top of the following tree:

  https://github.com/mdroth/linux/commits/kvm_gmem_solo_fixes

which in turn is based on Sean Christopherson's UPM base support tree,
with a couple of fixes/workarounds needed for SEV/SNP support [1].

== OVERVIEW ==

This patchset implements SEV-SNP hypervisor support for Linux.

This version is being posted as an RFC due to fairly extensive changes
relating to transitioning the SEV-SNP implementation to using
guest_memfd (gmem, aka Unmapped Private Memory) to manage private guest
pages instead of the legacy SEV memory registration ioctls.

For this purpose we've added a number of hooks on top of gmem to plumb in
the necessary RMP table updates when mapping private memory into a guest's
nested page table, and to restore pages to a shared/hypervisor-owned state
when freeing gmem-allocated memory back to the host. Our hope is that some
of these hooks can be re-used for other platforms as well, but we have
tried to keep them as minimal as possible in case they prove to be
SNP-specific. For quicker review of this aspect, they are at the beginning
of the series, directly on top of the gmem patchset.
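
For reference, the hook points being added (patches 01 and 02 below) take
roughly the following shape in kvm_x86_ops:

  /*
   * gmem_prepare: prepare gmem-allocated pages prior to mapping them into
   * the guest (and optionally clamp the mapping level);
   * gmem_invalidate: clean up pages before they are freed back to the host.
   */
  int (*gmem_prepare)(struct kvm *kvm, struct kvm_memory_slot *slot,
                      kvm_pfn_t pfn, gfn_t gfn, u8 *max_level);
  void (*gmem_invalidate)(struct kvm *kvm, kvm_pfn_t start, kvm_pfn_t end);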

Outside of UPM-related items, we've also included fairly extensive changes
based on review feedback from v8 and would appreciate any feedback on
those aspects as well.


== LAYOUT ==

PATCH 01-05: Pre-patches that add generic gmem and KVM MMU hooks to handle
             plumbing gmem memory into CoCo guests, and make arch/x86/coco
             re-usable for common SEV host code instead of only guest
             code.
PATCH 06-12: Host SNP initialization code and CCP driver prep for handling
             SNP cmds
PATCH 13-22: general SNP detection/enablement for host and CCP driver
PATCH 23-46: core KVM support for running SEV-SNP guests
PATCH 47-51: misc handling for IOMMU support, guest request handling, and
             debug infrastructure


== TESTING (note updated QEMU command-lines) ==

For testing this via QEMU, use the following tree:

  https://github.com/amdese/qemu/commits/snp-wip-gmem

SEV-SNP with gmem/UPM enabled:

  # set discard=none to disable discarding memory post-conversion: faster
  # boot times, but increased memory usage
  qemu-system-x86_64 -cpu EPYC-Milan-v2 \
    -object memory-backend-memfd-private,id=ram1,size=1G,share=true \
    -object sev-snp-guest,id=sev0,cbitpos=51,reduced-phys-bits=1,discard=both \
    -machine q35,confidential-guest-support=sev0,memory-backend=ram1,kvm-type=protected \
    ...

KVM selftests for UPM:

  cd $kernel_src_dir
  make -C tools/testing/selftests TARGETS="kvm" EXTRA_CFLAGS="-DDEBUG -I<path to kernel headers>"
  sudo tools/testing/selftests/kvm/x86_64/private_mem_conversions_test


== BACKGROUND (SEV-SNP) ==

This part of the Secure Nested Paging (SEV-SNP) series focuses on the
changes required in a host OS for SEV-SNP support. The series builds upon
the SEV-SNP guest support that is now part of mainline.

This series provides the basic building blocks to support booting SEV-SNP
VMs; it does not cover all of the security enhancements introduced by
SEV-SNP, such as interrupt protection.

The CCP driver is enhanced to provide new APIs that use the SEV-SNP-specific
commands defined in the SEV-SNP firmware specification. The KVM driver uses
those APIs to create and manage SEV-SNP guests.

The GHCB specification version 2 introduces a new set of NAE events that
are used by SEV-SNP guests to communicate with the hypervisor. The series
provides support for handling the following new NAE events:

- Register GHCB GPA
- Page State Change Request
- Hypervisor feature
- Guest message request

When pages are marked as guest-owned in the RMP table, they are assigned
to a specific guest/ASID, as well as a specific GFN within the guest. Any
attempt to map such a page in the RMP table to a different guest/ASID, or
to a different GFN within a guest/ASID, will result in an RMP nested page
fault.

Prior to accessing a guest-owned page, the guest must validate it with a
special PVALIDATE instruction, which will set a validated bit in the RMP
table entry for the guest. This is the only way to set the validated bit
outside of the initial pre-encrypted guest payload/image; any attempt
outside the guest to modify the RMP entry from that point forward will
result in the validated bit being cleared, at which point the guest will
trigger an exception if it attempts to access that page, so it can be made
aware of possible tampering.
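
As a rough guest-side sketch of that validation step (modeled on the SNP
guest support already in mainline, which provides the pvalidate() helper;
the termination path is simplified here):

  /* Validate a 4K guest page before first use. A failure, or an
   * unexpected "already validated" result, may indicate tampering. */
  static void snp_validate_4k_page(unsigned long vaddr)
  {
          if (pvalidate(vaddr, RMP_PG_SIZE_4K, true))
                  sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_PVALIDATE);
  }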

One exception to this is the initial guest payload, which is pre-validated
by the firmware prior to launching. The guest can use Guest Message requests 
to fetch an attestation report which will include the measurement of the
initial image so that the guest can verify it was booted with the expected
image/environment.

After boot, guests can use Page State Change requests to switch pages
between shared/hypervisor-owned and private/guest-owned to share data for
things like DMA, virtio buffers, and other GHCB requests.

In this implementation of SEV-SNP, private guest memory is managed by a new
kernel framework called guest_memfd (gmem). With gmem, a new
KVM_SET_MEMORY_ATTRIBUTES KVM ioctl has been added to tell the KVM
MMU whether a particular GFN should be backed by shared (normal) memory or
private (gmem-allocated) memory. To tie into this, Page State Change
requests are forwarded to userspace via KVM_EXIT_VMGEXIT exits, and
userspace then issues the corresponding KVM_SET_MEMORY_ATTRIBUTES call to
set the private/shared state in the KVM MMU.
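
A minimal userspace sketch of that conversion path, assuming the
KVM_SET_MEMORY_ATTRIBUTES uAPI from the gmem/UPM base series:

  /* On a KVM_EXIT_VMGEXIT Page State Change covering [gpa, gpa + size): */
  struct kvm_memory_attributes attrs = {
          .address    = gpa,
          .size       = size,
          /* or 0 to convert the range back to shared */
          .attributes = KVM_MEMORY_ATTRIBUTE_PRIVATE,
  };

  if (ioctl(vm_fd, KVM_SET_MEMORY_ATTRIBUTES, &attrs) < 0)
          err(1, "KVM_SET_MEMORY_ATTRIBUTES");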

The gmem / KVM MMU hooks added in this series will then update the RMP table
entries for the backing PFNs, setting them to guest-owned/private when
mapping private pages into the guest via the KVM MMU. For shared pages,
normal KVM MMU handling is used and the corresponding RMP table entries are
left in the default shared/hypervisor-owned state.

== TODO / KNOWN ISSUES ==

 * Add a per-arch CONFIG option for enabling platform-specific handling
   when invalidating gmem pages and freeing them back to the host, as
   opposed to the current approach which defaults to issuing invalidations
   to a weak-referenced stub implementation for non-x86 builds. Hoping for
   more feedback on the general implementation first.
 * This should incorporate all review feedback from v8, but if anything
   slipped through the cracks please let me know.

[1] https://lore.kernel.org/lkml/20230512002124.3sap3kzxpegwj3n2@amd.com/

Changes since v8:

 * Rework gmem/UPM hooks based on Sean's latest gmem/UPM tree
 * Move SEV lazy-pinning support out to a separate series which uses this
   series as a prereq instead of the other way around.
 * Re-organize extended guest request patches into 3 patches encompassing
   SEV FD ioctls for host-wide certs, KVM ioctls for per-instance certs,
   and the guest request handling that consumes them. Also move them to
   the top of the series to better separate them from the core SNP patches
   (Alexey, Zhi, Ashish, Dov, Dionna, others)
 * Various other changes/fixups for extended guest request handling (Dov,
   Alexey, Dionna)
 * Use helper to calculate max RMP entry size and improve readability (Dave)
 * Use architecture-independent GPA value for initial VMSA pages
 * Ensure SEV_CMD_SNP_GUEST_REQUEST failures are indicated to guest (Alex)
 * Allocate per-instance certs on-demand (Alex)
 * comment fixup for RMP fault handling (Zhi)
 * commit msg rewording for MSR-based PSCs (Zhi)
 * update SNP command/struct definitions based on 1.54 ABI (Saban)
 * use sev_deactivate_lock around SEV_CMD_SNP_DECOMMISSION (Saban)
 * Various comment/commit fixups (Zhi, Alex, Kim, Vlastimil, Dave, ...)
 * kexec fixes for newer SNP firmwares (Ashish)
 * Various other fixups and re-ordering of patches.

Changes since v7:

 * Rebase to Sean's updated UPM base support tree
 * Drop KVM_CAP_UNMAPPED_MEMORY and .private_mem_enabled x86 op in favor
   of kvm_arch_has_private_mem() and vm_type KVM_VM_CREATE arg
 * Drop GHCB map/unmap refactoring and post map/unmap hooks as they are no
   longer needed with UPM
 * Move .fault_is_private implementation to SNP patch range, no longer
   needed for SEV.
 * Don't call attribute update / invalidation hooks under kvm->mmu_lock
   (Tom, Jarkko)
 * Revert switch to using set_memory_p()/set_memory_np() in rmpupdate() due
   to it causing performance regression
 * Commit fixups for 'fault_is_private'/'update_mem_attr' hooks, have
   'fault_is_private' return bool (Boris)
 * Split kvm_vm_set_region_attr() into separate patch. (Jarkko)
 * Copy corrected CPUID page to userspace when firmware rejects it (Tom,
   Jarkko)
 * Fix sev_dump_rmpentry() error-handling (Alper)
 * Use adjusted cmd_buf pointer rather than sev->cmd_buf directly (Alper)
 * Correct typo in SNP_GET_EXT_CONFIG documentation (Dov)
 * Update struct kvm_sev_snp_launch_finish definition in
   amd-memory-encryption.rst (Tom)
 * Fix snp_launch_update_vmsa replacing created_vcpus with online_vcpus
 * Fix SNP_DBG_DECRYPT to not include len parameter.
 * Fix SNP_LAUNCH_FINISH to copy host-data from userspace


Changes since v6:

 * Added support for restrictedmem/UPM, and removed SEV-specific
   implementation of private memory management. As a result of this rework
   the following patches were no longer needed so were dropped:
   - KVM: SVM: Mark the private vma unmergable for SEV-SNP guests
   - KVM: SVM: Disallow registering memory range from HugeTLB for SNP guest
   - KVM: x86/mmu: Introduce kvm_mmu_map_tdp_page() for use by TDX and SNP
   - KVM: x86: Introduce kvm_mmu_get_tdp_walk() for SEV-SNP use
 * Moved the RMP table entry structure definition (struct rmpentry)
   to sev.c so as not to expose this non-architectural definition to
   the rest of the kernel, making the structure private to SNP code.
   Also made the RMP table entry accessors inline functions and
   removed all accessors which are not called more than once.
   Added a new function rmptable_entry() to index into the RMP table
   and return the RMP table entry.
 * Moved the RMPUPDATE and PSMASH helper function declarations from the
   linux include namespace to the x86 arch-specific include namespace.
   Added comments for these helper functions.
 * Introduce set_memory_p() to provide a way to change attributes of a
   memory range to be marked as present and added to the kernel
   directmap; invalidating/restoring pages from the directmap are
   now done using set_memory_np() and set_memory_p().
 * Added detailed comments around user RMP #PF fault handling and
   simplified computation of the faulting pfn for large-pages.
 * Added support to return the pfn from dump_pagetable() to do
   SEV-specific fault handling; this is added as a pre-patch. This
   support is now used to dump the RMP entry in case of an RMP #PF in
   show_fault_oops().
 * Added a new generic SNP command params structure sev_data_snp_addr,
   which is used for all SNP firmware API commands requiring a 
   single physical address parameter.
 * Added support for new SNP_INIT_EX command with support for HV-Fixed
   page range list. 
 * Added support for the new SNP_SHUTDOWN_EX command which allows
   disabling enforcement of SNP in the IOMMU. Also, DF_FLUSH is done
   at SNP shutdown if the firmware indicates a DF_FLUSH is required.
 * Make sev_do_cmd() a generic API interface for the hypervisor
   to issue commands to manage SEV and SNP guests. Also removed
   the API wrappers used by the hypervisor to manage an SEV-SNP guest.
   All these APIs now invoke sev_do_cmd() directly.
 * Introduce an SNP leaked-pages list. Pages that are unsafe to release
   back to the page allocator, because they can't be reclaimed or
   transitioned back to hypervisor/shared state, are now added
   to this internal leaked-pages list to prevent fatal page faults
   when accessing those pages. The function snp_leak_pages() is
   renamed to snp_mark_pages_offline() and is an external function
   available to both the CCP driver and the SNP hypervisor code. Removed
   the call to memory_failure() when leaking/marking pages offline.
 * Remove snp_set_rmp_state() multiplexor code and add new separate
   helpers such as rmp_mark_pages_firmware() & rmp_mark_pages_shared().
   The callers now issue snp_reclaim_pages() directly when needed as
   done by __snp_free_firmware_pages() and unmap_firmware_writeable().
   All callers of snp_set_rmp_state() modified to call helpers
   rmp_mark_pages_firmware() or rmp_mark_pages_shared() as required.
 * Change snp_reclaim_pages() to take physical address as an argument
   and clear C-bit from this physical address argument internally.
 * The output parameter sev_user_data_ext_snp_config in
   sev_ioctl_snp_get_config() is memset to zero to avoid leaking kernel
   memory.
 * Prevent race between sev_ioctl_snp_set_config() and 
   snp_guest_ext_guest_request() for sev->snp_certs_data by acquiring
   sev->snp_certs_lock mutex.
 * Zeroed out struct sev_user_data_snp_config in
   sev_ioctl_snp_set_config() to prevent leaking uninitialized
   kernel memory.
 * Optimized snp_safe_alloc_page() by avoiding multiple calls to
   pfn_to_page() and checking for a hugepage using pfn instead of
   expanding to full physical address.
 * Invoke host_rmp_make_shared() with leak parameter set to true
   if VMSA page cannot be transitioned back to shared state.
 * Fix snp_launch_finish() to always send the ID_AUTH struct to
   the firmware. Use the params.auth_key_en indicator to specify
   whether the ID_AUTH struct contains an author key or not.
 * Cleanup snp_context_create() and allocate certs_data in this
   function using kzalloc() to prevent giving the guest 
   uninitialized kernel memory.
 * Remove the check for guest supplied buffer greater than the data
   provided by the hypervisor in snp_handle_ext_guest_request().
 * Add a check in sev_snp_ap_create() for the case where a malicious
   guest RMPADJUSTs a large page into a VMSA, which would hit the SNP
   erratum where the CPU incorrectly signals an RMP violation #PF if a
   hugepage collides with the RMP entry of a VMSA page; reject the
   AP CREATE request if the VMSA address from the guest is 2MB-aligned.
 * Make VMSAVE target area memory allocation SNP-safe; implemented a
   workaround for an SNP erratum where the CPU will incorrectly signal
   an RMP violation #PF if a hugepage (2MB or 1GB) collides with the
   RMP entry of the VMSAVE target page.
 * Fix handle_split_page_fault() to work with memfd backed pages.
 * Add KVM commands for per-VM instance certificates.
 * Add IOMMU_SNP_SHUTDOWN support; this adds support for host kexec
   with SNP.

 Documentation/virt/coco/sev-guest.rst              |   54 +
 Documentation/virt/kvm/api.rst                     |   34 +
 .../virt/kvm/x86/amd-memory-encryption.rst         |  147 ++
 arch/x86/Kbuild                                    |    2 +-
 arch/x86/coco/Makefile                             |    3 +-
 arch/x86/coco/sev/Makefile                         |    3 +
 arch/x86/coco/sev/host.c                           |  524 ++++++
 arch/x86/include/asm/cpufeatures.h                 |    1 +
 arch/x86/include/asm/disabled-features.h           |    8 +-
 arch/x86/include/asm/kvm-x86-ops.h                 |    3 +
 arch/x86/include/asm/kvm_host.h                    |   23 +
 arch/x86/include/asm/msr-index.h                   |   11 +-
 arch/x86/include/asm/sev-common.h                  |   30 +
 arch/x86/include/asm/sev-host.h                    |   37 +
 arch/x86/include/asm/sev.h                         |    5 +-
 arch/x86/include/asm/svm.h                         |    6 +
 arch/x86/include/asm/trap_pf.h                     |   18 +-
 arch/x86/kernel/cpu/amd.c                          |   24 +-
 arch/x86/kernel/cpu/bugs.c                         |    7 +-
 arch/x86/kvm/Kconfig                               |    1 +
 arch/x86/kvm/lapic.c                               |    5 +-
 arch/x86/kvm/mmu.h                                 |    2 -
 arch/x86/kvm/mmu/mmu.c                             |   15 +-
 arch/x86/kvm/mmu/mmu_internal.h                    |   39 +-
 arch/x86/kvm/svm/nested.c                          |    2 +-
 arch/x86/kvm/svm/sev.c                             | 1802 +++++++++++++++++---
 arch/x86/kvm/svm/svm.c                             |   53 +-
 arch/x86/kvm/svm/svm.h                             |   38 +-
 arch/x86/kvm/x86.c                                 |   17 +
 arch/x86/mm/fault.c                                |   21 +
 drivers/crypto/ccp/sev-dev.c                       | 1064 +++++++++++-
 drivers/crypto/ccp/sev-dev.h                       |   16 +
 drivers/iommu/amd/init.c                           |   57 +-
 include/linux/amd-iommu.h                          |    3 +-
 include/linux/kvm_host.h                           |   10 +
 include/linux/psp-sev.h                            |  304 +++-
 include/uapi/linux/kvm.h                           |   74 +
 include/uapi/linux/psp-sev.h                       |   71 +
 tools/arch/x86/include/asm/cpufeatures.h           |    1 +
 virt/kvm/guest_mem.c                               |   48 +-
 virt/kvm/kvm_main.c                                |   75 +-
 41 files changed, 4383 insertions(+), 275 deletions(-)



* [PATCH RFC v9 01/51] KVM: x86: Add gmem hook for initializing private memory
  2023-06-12  4:25 [PATCH RFC v9 00/51] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
@ 2023-06-12  4:25 ` Michael Roth
  2023-06-12  4:25 ` [PATCH RFC v9 02/51] KVM: x86: Add gmem hook for invalidating " Michael Roth
                   ` (49 subsequent siblings)
  50 siblings, 0 replies; 102+ messages in thread
From: Michael Roth @ 2023-06-12  4:25 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	liam.merwick, zhi.a.wang

All gmem pages are expected to be 'private' as defined by a particular
arch/platform. Platforms like SEV-SNP require additional operations to
move these pages into a private state, so implement a hook that can be
used to prepare this memory prior to mapping it into a guest.

In the case of SEV-SNP, whether or not a 2MB page can be mapped via a
2MB mapping in the guest's nested page table depends on whether or not
any subpages within the range have already been initialized as private
in the RMP table, so this hook will also be used by the KVM MMU to clamp
the maximum mapping size accordingly.
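
As an illustrative sketch (not the actual implementation, which arrives
later in this series), an SNP backend for this hook might clamp the mapping
level based on existing RMP state and then make the page private; the
helper names/signatures below are approximations of the RMP helpers added
by the later patches:

  static int snp_gmem_prepare_sketch(struct kvm *kvm,
                                     struct kvm_memory_slot *slot,
                                     kvm_pfn_t pfn, gfn_t gfn, u8 *max_level)
  {
          bool assigned;
          int level, rc;

          /* If RMP state already exists for this PFN at 4K granularity, a
           * 2MB NPT mapping could collide with it, so clamp to 4K. */
          rc = snp_lookup_rmpentry(pfn, &assigned, &level);
          if (rc)
                  return rc;
          if (assigned && level == PG_LEVEL_4K)
                  *max_level = PG_LEVEL_4K;

          /* Mark the page guest-owned/private in the RMP table before it
           * is mapped into the guest's nested page table. */
          return rmp_make_private(pfn, gfn_to_gpa(gfn), *max_level,
                                  sev_get_asid(kvm), false);
  }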

Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 arch/x86/include/asm/kvm-x86-ops.h |  1 +
 arch/x86/include/asm/kvm_host.h    |  3 +++
 arch/x86/kvm/mmu/mmu.c             | 11 ++++++++++-
 3 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index 13bc212cd4bc..439ba4beb5af 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -133,6 +133,7 @@ KVM_X86_OP(msr_filter_changed)
 KVM_X86_OP(complete_emulated_msr)
 KVM_X86_OP(vcpu_deliver_sipi_vector)
 KVM_X86_OP_OPTIONAL_RET0(vcpu_get_apicv_inhibit_reasons);
+KVM_X86_OP_OPTIONAL_RET0(gmem_prepare)
 
 #undef KVM_X86_OP
 #undef KVM_X86_OP_OPTIONAL
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 8ae131dc645d..bd03b6cf40fb 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1732,6 +1732,9 @@ struct kvm_x86_ops {
 	 * Returns vCPU specific APICv inhibit reasons
 	 */
 	unsigned long (*vcpu_get_apicv_inhibit_reasons)(struct kvm_vcpu *vcpu);
+
+	int (*gmem_prepare)(struct kvm *kvm, struct kvm_memory_slot *slot,
+			    kvm_pfn_t pfn, gfn_t gfn, u8 *max_level);
 };
 
 struct kvm_x86_nested_ops {
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index dc2b9a2f717c..c54672ad6cbc 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4341,6 +4341,7 @@ static int kvm_faultin_pfn_private(struct kvm_vcpu *vcpu,
 				   struct kvm_page_fault *fault)
 {
 	int order, r;
+	u8 max_level;
 
 	if (!kvm_slot_can_be_private(fault->slot))
 		return kvm_do_memory_fault_exit(vcpu, fault);
@@ -4349,7 +4350,15 @@ static int kvm_faultin_pfn_private(struct kvm_vcpu *vcpu,
 	if (r)
 		return r;
 
-	fault->max_level = min(kvm_max_level_for_order(order), fault->max_level);
+	max_level = kvm_max_level_for_order(order);
+	r = static_call(kvm_x86_gmem_prepare)(vcpu->kvm, fault->slot, fault->pfn,
+					      fault->gfn, &max_level);
+	if (r) {
+		kvm_release_pfn_clean(fault->pfn);
+		return r;
+	}
+
+	fault->max_level = min(max_level, fault->max_level);
 	fault->map_writable = !(fault->slot->flags & KVM_MEM_READONLY);
 	return RET_PF_CONTINUE;
 }
-- 
2.25.1



* [PATCH RFC v9 02/51] KVM: x86: Add gmem hook for invalidating private memory
  2023-06-12  4:25 [PATCH RFC v9 00/51] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
  2023-06-12  4:25 ` [PATCH RFC v9 01/51] KVM: x86: Add gmem hook for initializing private memory Michael Roth
@ 2023-06-12  4:25 ` Michael Roth
  2023-06-12 10:49   ` Borislav Petkov
  2023-06-12  4:25 ` [PATCH RFC v9 03/51] KVM: x86: Use full 64-bit error code for kvm_mmu_do_page_fault Michael Roth
                   ` (48 subsequent siblings)
  50 siblings, 1 reply; 102+ messages in thread
From: Michael Roth @ 2023-06-12  4:25 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	liam.merwick, zhi.a.wang

TODO: add a CONFIG option that can be used to completely skip the arch
invalidation loop and avoid __weak references for archs/platforms that
don't need an additional invalidation hook.

In some cases, like with SEV-SNP, guest memory needs to be updated in a
platform-specific manner before it can be safely freed back to the host.
Add hooks to wire up handling of this sort when freeing memory in
response to FALLOC_FL_PUNCH_HOLE operations.

Also issue invalidations of all allocated pages when releasing the gmem
file so that the pages are not left in an unusable state when they get
freed back to the host.
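
For SEV-SNP specifically, the eventual backend for this hook needs to
transition the affected PFNs back to the default shared/hypervisor-owned
RMP state before the pages are freed. A rough sketch, again with helper
names approximating the RMP-update helpers added later in this series:

  static void snp_gmem_invalidate_sketch(struct kvm *kvm, kvm_pfn_t start,
                                         kvm_pfn_t end)
  {
          kvm_pfn_t pfn;

          for (pfn = start; pfn < end; pfn++) {
                  /* Return the page to shared state; if that fails, leak
                   * it rather than risk an RMP check failure when the
                   * host later touches the freed page. */
                  if (rmp_make_shared(pfn, PG_LEVEL_4K))
                          snp_mark_pages_offline(pfn, 1);
          }
  }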

Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 arch/x86/include/asm/kvm-x86-ops.h |  1 +
 arch/x86/include/asm/kvm_host.h    |  1 +
 arch/x86/kvm/x86.c                 |  6 ++++
 include/linux/kvm_host.h           |  3 ++
 virt/kvm/guest_mem.c               | 48 ++++++++++++++++++++++++++++--
 5 files changed, 57 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index 439ba4beb5af..48f043de2ec0 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -134,6 +134,7 @@ KVM_X86_OP(complete_emulated_msr)
 KVM_X86_OP(vcpu_deliver_sipi_vector)
 KVM_X86_OP_OPTIONAL_RET0(vcpu_get_apicv_inhibit_reasons);
 KVM_X86_OP_OPTIONAL_RET0(gmem_prepare)
+KVM_X86_OP_OPTIONAL(gmem_invalidate)
 
 #undef KVM_X86_OP
 #undef KVM_X86_OP_OPTIONAL
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index bd03b6cf40fb..b3bd24f2a390 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1735,6 +1735,7 @@ struct kvm_x86_ops {
 
 	int (*gmem_prepare)(struct kvm *kvm, struct kvm_memory_slot *slot,
 			    kvm_pfn_t pfn, gfn_t gfn, u8 *max_level);
+	void (*gmem_invalidate)(struct kvm *kvm, kvm_pfn_t start, kvm_pfn_t end);
 };
 
 struct kvm_x86_nested_ops {
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c9e1c9369be2..10d76afa23d9 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -13252,6 +13252,12 @@ bool kvm_arch_no_poll(struct kvm_vcpu *vcpu)
 }
 EXPORT_SYMBOL_GPL(kvm_arch_no_poll);
 
+#ifdef CONFIG_KVM_PRIVATE_MEM
+void kvm_arch_gmem_invalidate(struct kvm *kvm, kvm_pfn_t start, kvm_pfn_t end)
+{
+	static_call_cond(kvm_x86_gmem_invalidate)(kvm, start, end);
+}
+#endif
 
 int kvm_spec_ctrl_test_value(u64 value)
 {
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 1a47cedae8a1..7de06add2235 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -2343,6 +2343,7 @@ static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
 #ifdef CONFIG_KVM_PRIVATE_MEM
 int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
 			      gfn_t gfn, kvm_pfn_t *pfn, int *order);
+void kvm_arch_gmem_invalidate(struct kvm *kvm, kvm_pfn_t start, kvm_pfn_t end);
 #else
 static inline int kvm_gmem_get_pfn(struct kvm *kvm,
 				   struct kvm_memory_slot *slot, gfn_t gfn,
@@ -2351,6 +2352,8 @@ static inline int kvm_gmem_get_pfn(struct kvm *kvm,
 	KVM_BUG_ON(1, kvm);
 	return -EIO;
 }
+
+static inline void kvm_arch_gmem_invalidate(struct kvm *kvm, kvm_pfn_t start, kvm_pfn_t end) { }
 #endif /* CONFIG_KVM_PRIVATE_MEM */
 
 #endif
diff --git a/virt/kvm/guest_mem.c b/virt/kvm/guest_mem.c
index cdf2d84683c8..a7e926af4255 100644
--- a/virt/kvm/guest_mem.c
+++ b/virt/kvm/guest_mem.c
@@ -140,16 +140,58 @@ static void kvm_gmem_invalidate_end(struct kvm *kvm, struct kvm_gmem *gmem,
 	KVM_MMU_UNLOCK(kvm);
 }
 
+void __weak kvm_arch_gmem_invalidate(struct kvm *kvm, kvm_pfn_t start, kvm_pfn_t end)
+{
+}
+
+/* Handle arch-specific hooks needed before releasing guarded pages. */
+static void kvm_gmem_issue_arch_invalidate(struct kvm *kvm, struct file *file,
+					   pgoff_t start, pgoff_t end)
+{
+	pgoff_t file_end = i_size_read(file_inode(file)) >> PAGE_SHIFT;
+	pgoff_t index = start;
+
+	end = min(end, file_end);
+
+	while (index < end) {
+		struct folio *folio;
+		unsigned int order;
+		struct page *page;
+		kvm_pfn_t pfn;
+
+		folio = __filemap_get_folio(file->f_mapping, index,
+					    FGP_LOCK, 0);
+		if (!folio) {
+			index++;
+			continue;
+		}
+
+		page = folio_file_page(folio, index);
+		pfn = page_to_pfn(page);
+		order = folio_order(folio);
+
+		kvm_arch_gmem_invalidate(kvm, pfn, pfn + min((1ul << order), end - index));
+
+		index = folio_next_index(folio);
+		folio_unlock(folio);
+		folio_put(folio);
+
+		cond_resched();
+	}
+}
+
 static long kvm_gmem_punch_hole(struct file *file, loff_t offset, loff_t len)
 {
 	struct kvm_gmem *gmem = file->private_data;
-	pgoff_t start = offset >> PAGE_SHIFT;
-	pgoff_t end = (offset + len) >> PAGE_SHIFT;
 	struct kvm *kvm = gmem->kvm;
+	pgoff_t start, end;
 
 	if (!PAGE_ALIGNED(offset) || !PAGE_ALIGNED(len))
 		return 0;
 
+	start = offset >> PAGE_SHIFT;
+	end = (offset + len) >> PAGE_SHIFT;
+
 	/*
 	 * Bindings must be stable across invalidation to ensure the start+end
 	 * are balanced.
@@ -158,6 +200,7 @@ static long kvm_gmem_punch_hole(struct file *file, loff_t offset, loff_t len)
 
 	kvm_gmem_invalidate_begin(kvm, gmem, start, end);
 
+	kvm_gmem_issue_arch_invalidate(kvm, file, start, end);
 	truncate_inode_pages_range(file->f_mapping, offset, offset + len - 1);
 
 	kvm_gmem_invalidate_end(kvm, gmem, start, end);
@@ -264,6 +307,7 @@ static int kvm_gmem_release(struct inode *inode, struct file *file)
 	 * pointed at this file.
 	 */
 	kvm_gmem_invalidate_begin(kvm, gmem, 0, -1ul);
+	kvm_gmem_issue_arch_invalidate(gmem->kvm, file, 0, -1ul);
 	truncate_inode_pages_final(file->f_mapping);
 	kvm_gmem_invalidate_end(kvm, gmem, 0, -1ul);
 
-- 
2.25.1



* [PATCH RFC v9 03/51] KVM: x86: Use full 64-bit error code for kvm_mmu_do_page_fault
  2023-06-12  4:25 [PATCH RFC v9 00/51] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
  2023-06-12  4:25 ` [PATCH RFC v9 01/51] KVM: x86: Add gmem hook for initializing private memory Michael Roth
  2023-06-12  4:25 ` [PATCH RFC v9 02/51] KVM: x86: Add gmem hook for invalidating " Michael Roth
@ 2023-06-12  4:25 ` Michael Roth
  2023-06-14 14:24   ` Isaku Yamahata
  2023-06-12  4:25 ` [PATCH RFC v9 04/51] KVM: x86: Determine shared/private faults using a configurable mask Michael Roth
                   ` (47 subsequent siblings)
  50 siblings, 1 reply; 102+ messages in thread
From: Michael Roth @ 2023-06-12  4:25 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	liam.merwick, zhi.a.wang

The upper bits will be needed in some cases to distinguish between
nested page faults for private/shared pages, so pass along the full
64-bit value.

Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 arch/x86/kvm/mmu/mmu.c          | 3 +--
 arch/x86/kvm/mmu/mmu_internal.h | 4 ++--
 2 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index c54672ad6cbc..0d3983b9aa7e 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5829,8 +5829,7 @@ int noinline kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 err
 	}
 
 	if (r == RET_PF_INVALID) {
-		r = kvm_mmu_do_page_fault(vcpu, cr2_or_gpa,
-					  lower_32_bits(error_code), false,
+		r = kvm_mmu_do_page_fault(vcpu, cr2_or_gpa, error_code, false,
 					  &emulation_type);
 		if (KVM_BUG_ON(r == RET_PF_INVALID, vcpu->kvm))
 			return -EIO;
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index f1786698ae00..780b91e1da9f 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -283,11 +283,11 @@ enum {
 };
 
 static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
-					u32 err, bool prefetch, int *emulation_type)
+					u64 err, bool prefetch, int *emulation_type)
 {
 	struct kvm_page_fault fault = {
 		.addr = cr2_or_gpa,
-		.error_code = err,
+		.error_code = lower_32_bits(err),
 		.exec = err & PFERR_FETCH_MASK,
 		.write = err & PFERR_WRITE_MASK,
 		.present = err & PFERR_PRESENT_MASK,
-- 
2.25.1



* [PATCH RFC v9 04/51] KVM: x86: Determine shared/private faults using a configurable mask
  2023-06-12  4:25 [PATCH RFC v9 00/51] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (2 preceding siblings ...)
  2023-06-12  4:25 ` [PATCH RFC v9 03/51] KVM: x86: Use full 64-bit error code for kvm_mmu_do_page_fault Michael Roth
@ 2023-06-12  4:25 ` Michael Roth
  2023-06-14 16:47   ` Isaku Yamahata
  2023-06-19 16:27   ` Borislav Petkov
  2023-06-12  4:25 ` [PATCH RFC v9 05/51] x86/coco: move CONFIG_HAS_CC_PLATFORM check down into coco/Makefile Michael Roth
                   ` (46 subsequent siblings)
  50 siblings, 2 replies; 102+ messages in thread
From: Michael Roth @ 2023-06-12  4:25 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	liam.merwick, zhi.a.wang

This will be used to determine whether or not an #NPF should be serviced
using a normal page vs. a guarded/gmem one.
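
For SEV-SNP, the intent (wired up later in this series) is to key this off
the encrypted-access bit of the #NPF error code, along the lines of the
following sketch (the mask name here is illustrative):

  /* During SNP VM creation: PFERR_GUEST_ENC_MASK stands in for the
   * encrypted-access bit of the #NPF error code. */
  kvm->arch.mmu_private_fault_mask = PFERR_GUEST_ENC_MASK;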

Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 arch/x86/include/asm/kvm_host.h |  7 +++++++
 arch/x86/kvm/mmu/mmu_internal.h | 35 ++++++++++++++++++++++++++++++++-
 2 files changed, 41 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index b3bd24f2a390..c26f76641121 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1445,6 +1445,13 @@ struct kvm_arch {
 	 */
 #define SPLIT_DESC_CACHE_MIN_NR_OBJECTS (SPTE_ENT_PER_PAGE + 1)
 	struct kvm_mmu_memory_cache split_desc_cache;
+
+	/*
+	 * When set, used to determine whether a fault should be treated as
+	 * private in the context of protected VMs which use a separate gmem
+	 * pool to back private guest pages.
+	 */
+	u64 mmu_private_fault_mask;
 };
 
 struct kvm_vm_stat {
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index 780b91e1da9f..9b9e75aa43f4 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -252,6 +252,39 @@ struct kvm_page_fault {
 
 int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault);
 
+static inline bool kvm_mmu_fault_is_private(struct kvm *kvm, gpa_t gpa, u64 err)
+{
+	struct kvm_memory_slot *slot;
+	bool private_fault = false;
+	gfn_t gfn = gpa_to_gfn(gpa);
+
+	slot = gfn_to_memslot(kvm, gfn);
+	if (!slot) {
+		pr_debug("%s: no slot, GFN: 0x%llx\n", __func__, gfn);
+		goto out;
+	}
+
+	if (!kvm_slot_can_be_private(slot)) {
+		pr_debug("%s: slot is not private, GFN: 0x%llx\n", __func__, gfn);
+		goto out;
+	}
+
+	if (kvm->arch.mmu_private_fault_mask) {
+		private_fault = !!(err & kvm->arch.mmu_private_fault_mask);
+		goto out;
+	}
+
+	/*
+	 * Handling below is for UPM self-tests and guests that treat userspace
+	 * as the authority on whether a fault should be private or not.
+	 */
+	private_fault = kvm_mem_is_private(kvm, gpa >> PAGE_SHIFT);
+
+out:
+	pr_debug("%s: GFN: 0x%llx, private: %d\n", __func__, gfn, private_fault);
+	return private_fault;
+}
+
 /*
  * Return values of handle_mmio_page_fault(), mmu.page_fault(), fast_page_fault(),
  * and of course kvm_mmu_do_page_fault().
@@ -301,7 +334,7 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 		.max_level = KVM_MAX_HUGEPAGE_LEVEL,
 		.req_level = PG_LEVEL_4K,
 		.goal_level = PG_LEVEL_4K,
-		.is_private = kvm_mem_is_private(vcpu->kvm, cr2_or_gpa >> PAGE_SHIFT),
+		.is_private = kvm_mmu_fault_is_private(vcpu->kvm, cr2_or_gpa, err),
 	};
 	int r;
 
-- 
2.25.1



* [PATCH RFC v9 05/51] x86/coco: move CONFIG_HAS_CC_PLATFORM check down into coco/Makefile
  2023-06-12  4:25 [PATCH RFC v9 00/51] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (3 preceding siblings ...)
  2023-06-12  4:25 ` [PATCH RFC v9 04/51] KVM: x86: Determine shared/private faults using a configurable mask Michael Roth
@ 2023-06-12  4:25 ` Michael Roth
  2023-06-12  7:07   ` Kirill A . Shutemov
                     ` (2 more replies)
  2023-06-12  4:25 ` [PATCH RFC v9 06/51] x86/cpufeatures: Add SEV-SNP CPU feature Michael Roth
                   ` (45 subsequent siblings)
  50 siblings, 3 replies; 102+ messages in thread
From: Michael Roth @ 2023-06-12  4:25 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	liam.merwick, zhi.a.wang, Kirill A . Shutemov

Currently CONFIG_HAS_CC_PLATFORM is a prereq for building anything in
arch/x86/coco, but that is generally only applicable for guest support.

For SEV-SNP, helpers related purely to host support will also live in
arch/x86/coco. To allow for CoCo-related host support code in
arch/x86/coco, move that check down into the Makefile and check for it
specifically when needed.

Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Suggested-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 arch/x86/Kbuild        | 2 +-
 arch/x86/coco/Makefile | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/Kbuild b/arch/x86/Kbuild
index 5a83da703e87..1889cef48b58 100644
--- a/arch/x86/Kbuild
+++ b/arch/x86/Kbuild
@@ -1,5 +1,5 @@
 # SPDX-License-Identifier: GPL-2.0
-obj-$(CONFIG_ARCH_HAS_CC_PLATFORM) += coco/
+obj-y += coco/
 
 obj-y += entry/
 
diff --git a/arch/x86/coco/Makefile b/arch/x86/coco/Makefile
index c816acf78b6a..6aa52e719bf5 100644
--- a/arch/x86/coco/Makefile
+++ b/arch/x86/coco/Makefile
@@ -3,6 +3,6 @@ CFLAGS_REMOVE_core.o	= -pg
 KASAN_SANITIZE_core.o	:= n
 CFLAGS_core.o		+= -fno-stack-protector
 
-obj-y += core.o
+obj-$(CONFIG_ARCH_HAS_CC_PLATFORM) += core.o
 
 obj-$(CONFIG_INTEL_TDX_GUEST)	+= tdx/
-- 
2.25.1



* [PATCH RFC v9 06/51] x86/cpufeatures: Add SEV-SNP CPU feature
  2023-06-12  4:25 [PATCH RFC v9 00/51] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (4 preceding siblings ...)
  2023-06-12  4:25 ` [PATCH RFC v9 05/51] x86/coco: move CONFIG_HAS_CC_PLATFORM check down into coco/Makefile Michael Roth
@ 2023-06-12  4:25 ` Michael Roth
  2023-06-12  4:25 ` [PATCH RFC v9 07/51] x86/sev: Add the host SEV-SNP initialization support Michael Roth
                   ` (44 subsequent siblings)
  50 siblings, 0 replies; 102+ messages in thread
From: Michael Roth @ 2023-06-12  4:25 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	liam.merwick, zhi.a.wang, Brijesh Singh, Jarkko Sakkinen,
	Ashish Kalra

From: Brijesh Singh <brijesh.singh@amd.com>

Add CPU feature detection for Secure Encrypted Virtualization with
Secure Nested Paging. This feature adds a strong memory integrity
protection to help prevent malicious hypervisor-based attacks like
data replay, memory re-mapping, and more.

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Jarkko Sakkinen <jarkko@profian.com>
Signed-off-by: Ashish Kalra <Ashish.Kalra@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 arch/x86/include/asm/cpufeatures.h       | 1 +
 arch/x86/kernel/cpu/amd.c                | 5 +++--
 tools/arch/x86/include/asm/cpufeatures.h | 1 +
 3 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 97327a1e3aff..b60b32f47884 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -431,6 +431,7 @@
 #define X86_FEATURE_SEV			(19*32+ 1) /* AMD Secure Encrypted Virtualization */
 #define X86_FEATURE_VM_PAGE_FLUSH	(19*32+ 2) /* "" VM Page Flush MSR is supported */
 #define X86_FEATURE_SEV_ES		(19*32+ 3) /* AMD Secure Encrypted Virtualization - Encrypted State */
+#define X86_FEATURE_SEV_SNP		(19*32+ 4) /* AMD Secure Encrypted Virtualization - Secure Nested Paging */
 #define X86_FEATURE_V_TSC_AUX		(19*32+ 9) /* "" Virtual TSC_AUX */
 #define X86_FEATURE_SME_COHERENT	(19*32+10) /* "" AMD hardware-enforced cache coherency */
 
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index 95cdd08c4cbb..a79774181f22 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -558,8 +558,8 @@ static void early_detect_mem_encrypt(struct cpuinfo_x86 *c)
 	 *	      SME feature (set in scattered.c).
 	 *	      If the kernel has not enabled SME via any means then
 	 *	      don't advertise the SME feature.
-	 *   For SEV: If BIOS has not enabled SEV then don't advertise the
-	 *            SEV and SEV_ES feature (set in scattered.c).
+	 *   For SEV: If BIOS has not enabled SEV then don't advertise SEV and
+	 *	      any additional functionality based on it.
 	 *
 	 *   In all cases, since support for SME and SEV requires long mode,
 	 *   don't advertise the feature under CONFIG_X86_32.
@@ -594,6 +594,7 @@ static void early_detect_mem_encrypt(struct cpuinfo_x86 *c)
 clear_sev:
 		setup_clear_cpu_cap(X86_FEATURE_SEV);
 		setup_clear_cpu_cap(X86_FEATURE_SEV_ES);
+		setup_clear_cpu_cap(X86_FEATURE_SEV_SNP);
 	}
 }
 
diff --git a/tools/arch/x86/include/asm/cpufeatures.h b/tools/arch/x86/include/asm/cpufeatures.h
index b89005819cd5..b4f11a500470 100644
--- a/tools/arch/x86/include/asm/cpufeatures.h
+++ b/tools/arch/x86/include/asm/cpufeatures.h
@@ -424,6 +424,7 @@
 #define X86_FEATURE_SEV			(19*32+ 1) /* AMD Secure Encrypted Virtualization */
 #define X86_FEATURE_VM_PAGE_FLUSH	(19*32+ 2) /* "" VM Page Flush MSR is supported */
 #define X86_FEATURE_SEV_ES		(19*32+ 3) /* AMD Secure Encrypted Virtualization - Encrypted State */
+#define X86_FEATURE_SEV_SNP		(19*32+ 4) /* AMD Secure Encrypted Virtualization - Secure Nested Paging */
 #define X86_FEATURE_V_TSC_AUX		(19*32+ 9) /* "" Virtual TSC_AUX */
 #define X86_FEATURE_SME_COHERENT	(19*32+10) /* "" AMD hardware-enforced cache coherency */
 
-- 
2.25.1



* [PATCH RFC v9 07/51] x86/sev: Add the host SEV-SNP initialization support
  2023-06-12  4:25 [PATCH RFC v9 00/51] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (5 preceding siblings ...)
  2023-06-12  4:25 ` [PATCH RFC v9 06/51] x86/cpufeatures: Add SEV-SNP CPU feature Michael Roth
@ 2023-06-12  4:25 ` Michael Roth
  2023-06-12 15:34   ` Dave Hansen
                     ` (2 more replies)
  2023-06-12  4:25 ` [PATCH RFC v9 08/51] x86/speculation: Do not enable Automatic IBRS if SEV SNP is enabled Michael Roth
                   ` (43 subsequent siblings)
  50 siblings, 3 replies; 102+ messages in thread
From: Michael Roth @ 2023-06-12  4:25 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	liam.merwick, zhi.a.wang, Brijesh Singh

From: Brijesh Singh <brijesh.singh@amd.com>

The memory integrity guarantees of SEV-SNP are enforced through a new
structure called the Reverse Map Table (RMP). The RMP is a single data
structure shared across the system that contains one entry for every 4K
page of DRAM that may be used by SEV-SNP VMs. APM2 section 15.36 details
a number of steps needed to detect/enable SEV-SNP and RMP table support
on the host:

 - Detect SEV-SNP support based on CPUID bit
 - Initialize the RMP table memory reported by the RMP base/end MSR
   registers and configure IOMMU to be compatible with RMP access
   restrictions
 - Set the MtrrFixDramModEn bit in SYSCFG MSR
 - Set the SecureNestedPagingEn and VMPLEn bits in the SYSCFG MSR
 - Configure IOMMU

RMP table entry format is non-architectural and it can vary by
processor. It is defined by the PPR. Restrict SNP support to CPU
models/families which are compatible with the current RMP table entry
format to guard against any undefined behavior when running on other
system types. Future models/support will handle this through an
architectural mechanism to allow for broader compatibility.

SNP host code depends on the CONFIG_KVM_AMD_SEV config flag, which may be
enabled even when CONFIG_AMD_MEM_ENCRYPT isn't set, so update the
SNP-specific IOMMU helpers used here to rely on CONFIG_KVM_AMD_SEV
instead of CONFIG_AMD_MEM_ENCRYPT.

Co-developed-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Co-developed-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
[mdr: rework commit message to be clearer about what patch does, squash
      in early_rmptable_check() handling from Tom]
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 arch/x86/coco/Makefile                   |   1 +
 arch/x86/coco/sev/Makefile               |   3 +
 arch/x86/coco/sev/host.c                 | 212 +++++++++++++++++++++++
 arch/x86/include/asm/disabled-features.h |   8 +-
 arch/x86/include/asm/msr-index.h         |  11 +-
 arch/x86/include/asm/sev.h               |   2 +
 arch/x86/kernel/cpu/amd.c                |  19 ++
 drivers/iommu/amd/init.c                 |   2 +-
 include/linux/amd-iommu.h                |   2 +-
 9 files changed, 256 insertions(+), 4 deletions(-)
 create mode 100644 arch/x86/coco/sev/Makefile
 create mode 100644 arch/x86/coco/sev/host.c

diff --git a/arch/x86/coco/Makefile b/arch/x86/coco/Makefile
index 6aa52e719bf5..6a7d876130e2 100644
--- a/arch/x86/coco/Makefile
+++ b/arch/x86/coco/Makefile
@@ -6,3 +6,4 @@ CFLAGS_core.o		+= -fno-stack-protector
 obj-$(CONFIG_ARCH_HAS_CC_PLATFORM) += core.o
 
 obj-$(CONFIG_INTEL_TDX_GUEST)	+= tdx/
+obj-$(CONFIG_KVM_AMD_SEV)	+= sev/
diff --git a/arch/x86/coco/sev/Makefile b/arch/x86/coco/sev/Makefile
new file mode 100644
index 000000000000..27c0500d75c8
--- /dev/null
+++ b/arch/x86/coco/sev/Makefile
@@ -0,0 +1,3 @@
+# SPDX-License-Identifier: GPL-2.0
+
+obj-y += host.o
diff --git a/arch/x86/coco/sev/host.c b/arch/x86/coco/sev/host.c
new file mode 100644
index 000000000000..6907ce887b23
--- /dev/null
+++ b/arch/x86/coco/sev/host.c
@@ -0,0 +1,212 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * AMD SVM-SEV Host Support.
+ *
+ * Copyright (C) 2023 Advanced Micro Devices, Inc.
+ *
+ * Author: Ashish Kalra <ashish.kalra@amd.com>
+ *
+ */
+
+#include <linux/cc_platform.h>
+#include <linux/printk.h>
+#include <linux/mm_types.h>
+#include <linux/set_memory.h>
+#include <linux/memblock.h>
+#include <linux/kernel.h>
+#include <linux/mm.h>
+#include <linux/cpumask.h>
+#include <linux/iommu.h>
+#include <linux/amd-iommu.h>
+
+#include <asm/sev.h>
+#include <asm/processor.h>
+#include <asm/setup.h>
+#include <asm/svm.h>
+#include <asm/smp.h>
+#include <asm/cpu.h>
+#include <asm/apic.h>
+#include <asm/cpuid.h>
+#include <asm/cmdline.h>
+#include <asm/iommu.h>
+
+/*
+ * The first 16KB from RMP_BASE is used by the processor for bookkeeping;
+ * this range must be accounted for during the RMP entry lookup.
+ */
+#define RMPTABLE_CPU_BOOKKEEPING_SZ	0x4000
+
+static unsigned long rmptable_start __ro_after_init;
+static unsigned long rmptable_end __ro_after_init;
+
+#undef pr_fmt
+#define pr_fmt(fmt)	"SEV-SNP: " fmt
+
+static int __mfd_enable(unsigned int cpu)
+{
+	u64 val;
+
+	if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
+		return 0;
+
+	rdmsrl(MSR_AMD64_SYSCFG, val);
+
+	val |= MSR_AMD64_SYSCFG_MFDM;
+
+	wrmsrl(MSR_AMD64_SYSCFG, val);
+
+	return 0;
+}
+
+static __init void mfd_enable(void *arg)
+{
+	__mfd_enable(smp_processor_id());
+}
+
+static int __snp_enable(unsigned int cpu)
+{
+	u64 val;
+
+	if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
+		return 0;
+
+	rdmsrl(MSR_AMD64_SYSCFG, val);
+
+	val |= MSR_AMD64_SYSCFG_SNP_EN;
+	val |= MSR_AMD64_SYSCFG_SNP_VMPL_EN;
+
+	wrmsrl(MSR_AMD64_SYSCFG, val);
+
+	return 0;
+}
+
+static __init void snp_enable(void *arg)
+{
+	__snp_enable(smp_processor_id());
+}
+
+bool snp_get_rmptable_info(u64 *start, u64 *len)
+{
+	u64 max_rmp_pfn, calc_rmp_sz, rmp_sz, rmp_base, rmp_end;
+
+	rdmsrl(MSR_AMD64_RMP_BASE, rmp_base);
+	rdmsrl(MSR_AMD64_RMP_END, rmp_end);
+
+	if (!rmp_base || !rmp_end) {
+		pr_err("Memory for the RMP table has not been reserved by BIOS\n");
+		return false;
+	}
+
+	rmp_sz = rmp_end - rmp_base + 1;
+
+	/*
+	 * Calculate the amount of memory that must be reserved by the BIOS to
+	 * address the whole RAM, including the bookkeeping area. The RMP itself
+	 * must also be covered.
+	 */
+	max_rmp_pfn = max_pfn;
+	if (PHYS_PFN(rmp_end) > max_pfn)
+		max_rmp_pfn = PHYS_PFN(rmp_end);
+
+	calc_rmp_sz = (max_rmp_pfn << 4) + RMPTABLE_CPU_BOOKKEEPING_SZ;
+
+	if (calc_rmp_sz > rmp_sz) {
+		pr_err("Memory reserved for the RMP table does not cover full system RAM (expected 0x%llx got 0x%llx)\n",
+		       calc_rmp_sz, rmp_sz);
+		return false;
+	}
+
+	*start = rmp_base;
+	*len = rmp_sz;
+
+	return true;
+}
+
+static __init int __snp_rmptable_init(void)
+{
+	u64 rmp_base, sz;
+	void *start;
+	u64 val;
+
+	if (!snp_get_rmptable_info(&rmp_base, &sz))
+		return 1;
+
+	pr_info("RMP table physical address [0x%016llx - 0x%016llx]\n",
+		rmp_base, rmp_base + sz - 1);
+
+	start = memremap(rmp_base, sz, MEMREMAP_WB);
+	if (!start) {
+		pr_err("Failed to map RMP table addr 0x%llx size 0x%llx\n", rmp_base, sz);
+		return 1;
+	}
+
+	/*
+	 * Check if SEV-SNP is already enabled, this can happen in case of
+	 * kexec boot.
+	 */
+	rdmsrl(MSR_AMD64_SYSCFG, val);
+	if (val & MSR_AMD64_SYSCFG_SNP_EN)
+		goto skip_enable;
+
+	/* Initialize the RMP table to zero */
+	memset(start, 0, sz);
+
+	/* Flush the caches to ensure that data is written before SNP is enabled. */
+	wbinvd_on_all_cpus();
+
+	/* MFDM must be enabled on all the CPUs prior to enabling SNP. */
+	on_each_cpu(mfd_enable, NULL, 1);
+
+	/* Enable SNP on all CPUs. */
+	on_each_cpu(snp_enable, NULL, 1);
+
+skip_enable:
+	rmptable_start = (unsigned long)start;
+	rmptable_end = rmptable_start + sz - 1;
+
+	return 0;
+}
+
+static int __init snp_rmptable_init(void)
+{
+	int family, model;
+
+	if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
+		return 0;
+
+	family = boot_cpu_data.x86;
+	model  = boot_cpu_data.x86_model;
+
+	/*
+	 * The RMP table entry format is not architectural; it can vary by
+	 * processor and is defined by the per-processor PPR. Restrict SNP support
+	 * to the known CPU models and families for which the RMP table entry
+	 * format is currently defined.
+	 */
+	if (!(family == 0x19 && model <= 0xaf) && !(family == 0x1a && model <= 0xf))
+		goto nosnp;
+
+	if (amd_iommu_snp_enable())
+		goto nosnp;
+
+	if (__snp_rmptable_init())
+		goto nosnp;
+
+	cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "x86/rmptable_init:online", __snp_enable, NULL);
+
+	return 0;
+
+nosnp:
+	setup_clear_cpu_cap(X86_FEATURE_SEV_SNP);
+	return -ENOSYS;
+}
+
+/*
+ * This must be called after the PCI subsystem, because amd_iommu_snp_enable()
+ * is used to ensure the IOMMU supports the SEV-SNP feature, and it can only
+ * be called after subsys_initcall().
+ *
+ * NOTE: IOMMU is enforced by SNP to ensure that hypervisor cannot program DMA
+ * directly into guest private memory. In case of SNP, the IOMMU ensures that
+ * the page(s) used for DMA are hypervisor owned.
+ */
+fs_initcall(snp_rmptable_init);
diff --git a/arch/x86/include/asm/disabled-features.h b/arch/x86/include/asm/disabled-features.h
index 5dfa4fb76f4b..0a9938aea305 100644
--- a/arch/x86/include/asm/disabled-features.h
+++ b/arch/x86/include/asm/disabled-features.h
@@ -99,6 +99,12 @@
 # define DISABLE_TDX_GUEST	(1 << (X86_FEATURE_TDX_GUEST & 31))
 #endif
 
+#ifdef CONFIG_KVM_AMD_SEV
+# define DISABLE_SEV_SNP	0
+#else
+# define DISABLE_SEV_SNP	(1 << (X86_FEATURE_SEV_SNP & 31))
+#endif
+
 /*
  * Make sure to add features to the correct mask
  */
@@ -123,7 +129,7 @@
 			 DISABLE_ENQCMD)
 #define DISABLED_MASK17	0
 #define DISABLED_MASK18	0
-#define DISABLED_MASK19	0
+#define DISABLED_MASK19	(DISABLE_SEV_SNP)
 #define DISABLED_MASK20	0
 #define DISABLED_MASK_CHECK BUILD_BUG_ON_ZERO(NCAPINTS != 21)
 
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index ad35355ee43e..db0f3a041930 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -574,6 +574,8 @@
 #define MSR_AMD64_SEV_ENABLED		BIT_ULL(MSR_AMD64_SEV_ENABLED_BIT)
 #define MSR_AMD64_SEV_ES_ENABLED	BIT_ULL(MSR_AMD64_SEV_ES_ENABLED_BIT)
 #define MSR_AMD64_SEV_SNP_ENABLED	BIT_ULL(MSR_AMD64_SEV_SNP_ENABLED_BIT)
+#define MSR_AMD64_RMP_BASE		0xc0010132
+#define MSR_AMD64_RMP_END		0xc0010133
 
 /* SNP feature bits enabled by the hypervisor */
 #define MSR_AMD64_SNP_VTOM			BIT_ULL(3)
@@ -675,7 +677,14 @@
 #define MSR_K8_TOP_MEM2			0xc001001d
 #define MSR_AMD64_SYSCFG		0xc0010010
 #define MSR_AMD64_SYSCFG_MEM_ENCRYPT_BIT	23
-#define MSR_AMD64_SYSCFG_MEM_ENCRYPT	BIT_ULL(MSR_AMD64_SYSCFG_MEM_ENCRYPT_BIT)
+#define MSR_AMD64_SYSCFG_MEM_ENCRYPT		BIT_ULL(MSR_AMD64_SYSCFG_MEM_ENCRYPT_BIT)
+#define MSR_AMD64_SYSCFG_SNP_EN_BIT		24
+#define MSR_AMD64_SYSCFG_SNP_EN		BIT_ULL(MSR_AMD64_SYSCFG_SNP_EN_BIT)
+#define MSR_AMD64_SYSCFG_SNP_VMPL_EN_BIT	25
+#define MSR_AMD64_SYSCFG_SNP_VMPL_EN		BIT_ULL(MSR_AMD64_SYSCFG_SNP_VMPL_EN_BIT)
+#define MSR_AMD64_SYSCFG_MFDM_BIT		19
+#define MSR_AMD64_SYSCFG_MFDM			BIT_ULL(MSR_AMD64_SYSCFG_MFDM_BIT)
+
 #define MSR_K8_INT_PENDING_MSG		0xc0010055
 /* C1E active bits in int pending message */
 #define K8_INTP_C1E_ACTIVE_MASK		0x18000000
diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index ebc271bb6d8e..d34c46db7dd1 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -197,6 +197,7 @@ void snp_set_wakeup_secondary_cpu(void);
 bool snp_init(struct boot_params *bp);
 void __init __noreturn snp_abort(void);
 int snp_issue_guest_request(u64 exit_code, struct snp_req_data *input, unsigned long *fw_err);
+bool snp_get_rmptable_info(u64 *start, u64 *len);
 #else
 static inline void sev_es_ist_enter(struct pt_regs *regs) { }
 static inline void sev_es_ist_exit(void) { }
@@ -221,6 +222,7 @@ static inline int snp_issue_guest_request(u64 exit_code, struct snp_req_data *in
 {
 	return -ENOTTY;
 }
+static inline bool snp_get_rmptable_info(u64 *start, u64 *len) { return false; }
 #endif
 
 #endif
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index a79774181f22..1493ddf89fdf 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -20,6 +20,7 @@
 #include <asm/delay.h>
 #include <asm/debugreg.h>
 #include <asm/resctrl.h>
+#include <asm/sev.h>
 
 #ifdef CONFIG_X86_64
 # include <asm/mmconfig.h>
@@ -546,6 +547,20 @@ static void bsp_init_amd(struct cpuinfo_x86 *c)
 	resctrl_cpu_detect(c);
 }
 
+static bool early_rmptable_check(void)
+{
+	u64 rmp_base, rmp_size;
+
+	/*
+	 * For early BSP initialization, max_pfn won't be set up yet, wait until
+	 * it is set before performing the RMP table calculations.
+	 */
+	if (!max_pfn)
+		return true;
+
+	return snp_get_rmptable_info(&rmp_base, &rmp_size);
+}
+
 static void early_detect_mem_encrypt(struct cpuinfo_x86 *c)
 {
 	u64 msr;
@@ -587,6 +602,9 @@ static void early_detect_mem_encrypt(struct cpuinfo_x86 *c)
 		if (!(msr & MSR_K7_HWCR_SMMLOCK))
 			goto clear_sev;
 
+		if (cpu_has(c, X86_FEATURE_SEV_SNP) && !early_rmptable_check())
+			goto clear_snp;
+
 		return;
 
 clear_all:
@@ -594,6 +612,7 @@ static void early_detect_mem_encrypt(struct cpuinfo_x86 *c)
 clear_sev:
 		setup_clear_cpu_cap(X86_FEATURE_SEV);
 		setup_clear_cpu_cap(X86_FEATURE_SEV_ES);
+clear_snp:
 		setup_clear_cpu_cap(X86_FEATURE_SEV_SNP);
 	}
 }
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 19a46b9f7357..33ea62d93540 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -3665,7 +3665,7 @@ int amd_iommu_pc_set_reg(struct amd_iommu *iommu, u8 bank, u8 cntr, u8 fxn, u64
 	return iommu_pc_get_set_reg(iommu, bank, cntr, fxn, value, true);
 }
 
-#ifdef CONFIG_AMD_MEM_ENCRYPT
+#ifdef CONFIG_KVM_AMD_SEV
 int amd_iommu_snp_enable(void)
 {
 	/*
diff --git a/include/linux/amd-iommu.h b/include/linux/amd-iommu.h
index 953e6f12fa1c..8f0cde2d451c 100644
--- a/include/linux/amd-iommu.h
+++ b/include/linux/amd-iommu.h
@@ -206,7 +206,7 @@ int amd_iommu_pc_get_reg(struct amd_iommu *iommu, u8 bank, u8 cntr, u8 fxn,
 		u64 *value);
 struct amd_iommu *get_amd_iommu(unsigned int idx);
 
-#ifdef CONFIG_AMD_MEM_ENCRYPT
+#ifdef CONFIG_KVM_AMD_SEV
 int amd_iommu_snp_enable(void);
 #endif
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH RFC v9 08/51] x86/speculation: Do not enable Automatic IBRS if SEV SNP is enabled
  2023-06-12  4:25 [PATCH RFC v9 00/51] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (6 preceding siblings ...)
  2023-06-12  4:25 ` [PATCH RFC v9 07/51] x86/sev: Add the host SEV-SNP initialization support Michael Roth
@ 2023-06-12  4:25 ` Michael Roth
  2023-06-12 15:39   ` Dave Hansen
  2023-06-12  4:25 ` [PATCH RFC v9 09/51] x86/sev: Add RMP entry lookup helpers Michael Roth
                   ` (42 subsequent siblings)
  50 siblings, 1 reply; 102+ messages in thread
From: Michael Roth @ 2023-06-12  4:25 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	liam.merwick, zhi.a.wang, Kim Phillips

From: Kim Phillips <kim.phillips@amd.com>

A hardware limitation prevents the host from enabling Automatic IBRS
when SNP is enabled.  Instead, fall back to retpolines.

Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 arch/x86/kernel/cpu/bugs.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index f9d060e71c3e..3fba3623ff64 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -1507,7 +1507,12 @@ static void __init spectre_v2_select_mitigation(void)
 
 	if (spectre_v2_in_ibrs_mode(mode)) {
 		if (boot_cpu_has(X86_FEATURE_AUTOIBRS)) {
-			msr_set_bit(MSR_EFER, _EFER_AUTOIBRS);
+			if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP)) {
+				msr_set_bit(MSR_EFER, _EFER_AUTOIBRS);
+			} else {
+				pr_err("SNP feature available, not enabling AutoIBRS on the host.\n");
+				mode = spectre_v2_select_retpoline();
+			}
 		} else {
 			x86_spec_ctrl_base |= SPEC_CTRL_IBRS;
 			update_spec_ctrl(x86_spec_ctrl_base);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH RFC v9 09/51] x86/sev: Add RMP entry lookup helpers
  2023-06-12  4:25 [PATCH RFC v9 00/51] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (7 preceding siblings ...)
  2023-06-12  4:25 ` [PATCH RFC v9 08/51] x86/speculation: Do not enable Automatic IBRS if SEV SNP is enabled Michael Roth
@ 2023-06-12  4:25 ` Michael Roth
  2023-06-12 16:08   ` Dave Hansen
  2023-06-12  4:25 ` [PATCH RFC v9 10/51] x86/fault: Add helper for dumping RMP entries Michael Roth
                   ` (41 subsequent siblings)
  50 siblings, 1 reply; 102+ messages in thread
From: Michael Roth @ 2023-06-12  4:25 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	liam.merwick, zhi.a.wang, Brijesh Singh

From: Brijesh Singh <brijesh.singh@amd.com>

The snp_lookup_rmpentry() helper can be used by the host to read the RMP
entry for a given page. The RMP entry format is documented in the AMD PPR;
see https://bugzilla.kernel.org/attachment.cgi?id=296015.
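
A minimal usage sketch (hypothetical caller, not part of this patch):

  bool assigned;
  int level, ret;

  ret = snp_lookup_rmpentry(pfn, &assigned, &level);
  if (!ret && assigned)
          pr_info("pfn 0x%llx is guest-owned, RMP level %d\n", pfn, level);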

Co-developed-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
[mdr: separate 'assigned' indicator from return code]
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 arch/x86/coco/sev/host.c          | 85 +++++++++++++++++++++++++++++++
 arch/x86/include/asm/sev-common.h |  4 ++
 arch/x86/include/asm/sev-host.h   | 22 ++++++++
 arch/x86/include/asm/sev.h        |  3 --
 4 files changed, 111 insertions(+), 3 deletions(-)
 create mode 100644 arch/x86/include/asm/sev-host.h

diff --git a/arch/x86/coco/sev/host.c b/arch/x86/coco/sev/host.c
index 6907ce887b23..0cc5a6d11b25 100644
--- a/arch/x86/coco/sev/host.c
+++ b/arch/x86/coco/sev/host.c
@@ -30,11 +30,36 @@
 #include <asm/cmdline.h>
 #include <asm/iommu.h>
 
+/*
+ * The RMP entry format is not architectural. The format is defined in the
+ * PPR for AMD Family 19h Model 01h, Rev B1 processors.
+ */
+struct rmpentry {
+	union {
+		struct {
+			u64	assigned	: 1,
+				pagesize	: 1,
+				immutable	: 1,
+				rsvd1		: 9,
+				gpa		: 39,
+				asid		: 10,
+				vmsa		: 1,
+				validated	: 1,
+				rsvd2		: 1;
+		} info;
+		u64 low;
+	};
+	u64 high;
+} __packed;
+
 /*
  * The first 16KB from the RMP_BASE is used by the processor for the
  * bookkeeping, the range needs to be added during the RMP entry lookup.
  */
 #define RMPTABLE_CPU_BOOKKEEPING_SZ	0x4000
+#define RMPENTRY_SHIFT			8
+#define rmptable_page_offset(x)	(RMPTABLE_CPU_BOOKKEEPING_SZ +		\
+				 (((unsigned long)(x)) >> RMPENTRY_SHIFT))
 
 static unsigned long rmptable_start __ro_after_init;
 static unsigned long rmptable_end __ro_after_init;
@@ -210,3 +235,63 @@ static int __init snp_rmptable_init(void)
  * the page(s) used for DMA are hypervisor owned.
  */
 fs_initcall(snp_rmptable_init);
+
+static inline unsigned int rmpentry_assigned(const struct rmpentry *e)
+{
+	return e->info.assigned;
+}
+
+static inline unsigned int rmpentry_pagesize(const struct rmpentry *e)
+{
+	return e->info.pagesize;
+}
+
+static int rmptable_entry(unsigned long paddr, struct rmpentry *entry)
+{
+	unsigned long vaddr;
+
+	vaddr = rmptable_start + rmptable_page_offset(paddr);
+	if (unlikely(vaddr > rmptable_end))
+		return -EFAULT;
+
+	*entry = *(struct rmpentry *)vaddr;
+
+	return 0;
+}
+
+static int __snp_lookup_rmpentry(u64 pfn, struct rmpentry *entry, int *level)
+{
+	unsigned long paddr = pfn << PAGE_SHIFT;
+	struct rmpentry large_entry;
+	int ret;
+
+	if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
+		return -ENXIO;
+
+	ret = rmptable_entry(paddr, entry);
+	if (ret)
+		return ret;
+
+	/* Read the 2MB-aligned RMP entry to determine the page level in use. */
+	ret = rmptable_entry(paddr & PMD_MASK, &large_entry);
+	if (ret)
+		return ret;
+
+	*level = RMP_TO_X86_PG_LEVEL(rmpentry_pagesize(&large_entry));
+
+	return 0;
+}
+
+int snp_lookup_rmpentry(u64 pfn, bool *assigned, int *level)
+{
+	struct rmpentry e;
+	int ret;
+
+	ret = __snp_lookup_rmpentry(pfn, &e, level);
+	if (ret)
+		return ret;
+
+	*assigned = !!rmpentry_assigned(&e);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(snp_lookup_rmpentry);
diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
index b8357d6ecd47..bf0378136289 100644
--- a/arch/x86/include/asm/sev-common.h
+++ b/arch/x86/include/asm/sev-common.h
@@ -171,4 +171,8 @@ struct snp_psc_desc {
 #define GHCB_ERR_INVALID_INPUT		5
 #define GHCB_ERR_INVALID_EVENT		6
 
+/* RMP page size */
+#define RMP_PG_SIZE_4K			0
+#define RMP_TO_X86_PG_LEVEL(level)	(((level) == RMP_PG_SIZE_4K) ? PG_LEVEL_4K : PG_LEVEL_2M)
+
 #endif
diff --git a/arch/x86/include/asm/sev-host.h b/arch/x86/include/asm/sev-host.h
new file mode 100644
index 000000000000..30d47e20081d
--- /dev/null
+++ b/arch/x86/include/asm/sev-host.h
@@ -0,0 +1,22 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * AMD SVM-SEV Host Support.
+ *
+ * Copyright (C) 2023 Advanced Micro Devices, Inc.
+ *
+ * Author: Ashish Kalra <ashish.kalra@amd.com>
+ *
+ */
+
+#ifndef __ASM_X86_SEV_HOST_H
+#define __ASM_X86_SEV_HOST_H
+
+#include <asm/sev-common.h>
+
+#ifdef CONFIG_KVM_AMD_SEV
+int snp_lookup_rmpentry(u64 pfn, bool *assigned, int *level);
+#else
+static inline int snp_lookup_rmpentry(u64 pfn, bool *assigned, int *level) { return 0; }
+#endif
+
+#endif
diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index d34c46db7dd1..446fc7a9f7b0 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -81,9 +81,6 @@ extern bool handle_vc_boot_ghcb(struct pt_regs *regs);
 /* Software defined (when rFlags.CF = 1) */
 #define PVALIDATE_FAIL_NOUPDATE		255
 
-/* RMP page size */
-#define RMP_PG_SIZE_4K			0
-
 #define RMPADJUST_VMSA_PAGE_BIT		BIT(16)
 
 /* SNP Guest message request */
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH RFC v9 10/51] x86/fault: Add helper for dumping RMP entries
  2023-06-12  4:25 [PATCH RFC v9 00/51] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (8 preceding siblings ...)
  2023-06-12  4:25 ` [PATCH RFC v9 09/51] x86/sev: Add RMP entry lookup helpers Michael Roth
@ 2023-06-12  4:25 ` Michael Roth
  2023-06-12 16:12   ` Dave Hansen
  2023-06-12  4:25 ` [PATCH RFC v9 11/51] x86/traps: Define RMP violation #PF error code Michael Roth
                   ` (40 subsequent siblings)
  50 siblings, 1 reply; 102+ messages in thread
From: Michael Roth @ 2023-06-12  4:25 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	liam.merwick, zhi.a.wang, Brijesh Singh

From: Brijesh Singh <brijesh.singh@amd.com>

This information will be useful for debugging things like page faults
due to RMP access violations and RMPUPDATE failures.
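
A usage sketch (hypothetical call site; a later patch wires this into the
page fault path, where 'pte' is the entry for the faulting address):

  /* dump the RMP entry (or surrounding 2MB region) for a faulting pfn */
  sev_dump_rmpentry(pte_pfn(*pte));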

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
[mdr: move helper to standalone patch]
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 arch/x86/coco/sev/host.c        | 43 +++++++++++++++++++++++++++++++++
 arch/x86/include/asm/sev-host.h |  2 ++
 2 files changed, 45 insertions(+)

diff --git a/arch/x86/coco/sev/host.c b/arch/x86/coco/sev/host.c
index 0cc5a6d11b25..d766b3bc6647 100644
--- a/arch/x86/coco/sev/host.c
+++ b/arch/x86/coco/sev/host.c
@@ -295,3 +295,46 @@ int snp_lookup_rmpentry(u64 pfn, bool *assigned, int *level)
 	return 0;
 }
 EXPORT_SYMBOL_GPL(snp_lookup_rmpentry);
+
+void sev_dump_rmpentry(u64 pfn)
+{
+	unsigned long pfn_end;
+	struct rmpentry e;
+	int level, ret;
+
+	ret = __snp_lookup_rmpentry(pfn, &e, &level);
+	if (ret) {
+		pr_info("Failed to read RMP entry for PFN 0x%llx, error %d\n", pfn, ret);
+		return;
+	}
+
+	if (rmpentry_assigned(&e)) {
+		pr_info("RMPEntry paddr 0x%llx: [high=0x%016llx low=0x%016llx]\n",
+			pfn << PAGE_SHIFT, e.high, e.low);
+		return;
+	}
+
+	/*
+	 * If the RMP entry for the faulting pfn was not assigned, then it is
+	 * not clear what caused the RMP violation. To get some useful debug
+	 * information, iterate through the entire 2MB region and dump any
+	 * RMP entries that have bits set.
+	 */
+	pfn = pfn & ~(PTRS_PER_PMD - 1);
+	pfn_end = pfn + PTRS_PER_PMD;
+
+	while (pfn < pfn_end) {
+		ret = __snp_lookup_rmpentry(pfn, &e, &level);
+		if (ret) {
+			pr_info("Failed to read RMP entry for PFN 0x%llx\n", pfn);
+			pfn++;
+			continue;
+		}
+
+		if (e.low || e.high)
+			pr_info("RMPEntry paddr 0x%llx: [high=0x%016llx low=0x%016llx]\n",
+				pfn << PAGE_SHIFT, e.high, e.low);
+		pfn++;
+	}
+}
+EXPORT_SYMBOL_GPL(sev_dump_rmpentry);
diff --git a/arch/x86/include/asm/sev-host.h b/arch/x86/include/asm/sev-host.h
index 30d47e20081d..85cfe577155c 100644
--- a/arch/x86/include/asm/sev-host.h
+++ b/arch/x86/include/asm/sev-host.h
@@ -15,8 +15,10 @@
 
 #ifdef CONFIG_KVM_AMD_SEV
 int snp_lookup_rmpentry(u64 pfn, bool *assigned, int *level);
+void sev_dump_rmpentry(u64 pfn);
 #else
 static inline int snp_lookup_rmpentry(u64 pfn, bool *assigned, int *level) { return 0; }
+static inline void sev_dump_rmpentry(u64 pfn) {}
 #endif
 
 #endif
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH RFC v9 11/51] x86/traps: Define RMP violation #PF error code
  2023-06-12  4:25 [PATCH RFC v9 00/51] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (9 preceding siblings ...)
  2023-06-12  4:25 ` [PATCH RFC v9 10/51] x86/fault: Add helper for dumping RMP entries Michael Roth
@ 2023-06-12  4:25 ` Michael Roth
  2023-06-12 16:26   ` Dave Hansen
  2023-06-12  4:25 ` [PATCH RFC v9 12/51] x86/fault: Report RMP page faults for kernel addresses Michael Roth
                   ` (39 subsequent siblings)
  50 siblings, 1 reply; 102+ messages in thread
From: Michael Roth @ 2023-06-12  4:25 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	liam.merwick, zhi.a.wang, Brijesh Singh

From: Brijesh Singh <brijesh.singh@amd.com>

Bit 31 in the page fault error code will be set when the processor
encounters an RMP violation.

While at it, convert the existing error code bits to use the BIT() macro.
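
A hypothetical check in a fault handler might then look like:

  if (error_code & X86_PF_RMP)
          pr_alert("RMP violation at address 0x%lx\n", address);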

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 arch/x86/include/asm/trap_pf.h | 18 +++++++++++-------
 arch/x86/mm/fault.c            |  1 +
 2 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/trap_pf.h b/arch/x86/include/asm/trap_pf.h
index 10b1de500ab1..295be06f8db7 100644
--- a/arch/x86/include/asm/trap_pf.h
+++ b/arch/x86/include/asm/trap_pf.h
@@ -2,6 +2,8 @@
 #ifndef _ASM_X86_TRAP_PF_H
 #define _ASM_X86_TRAP_PF_H
 
+#include <linux/bits.h>  /* BIT() macro */
+
 /*
  * Page fault error code bits:
  *
@@ -12,15 +14,17 @@
  *   bit 4 ==				1: fault was an instruction fetch
  *   bit 5 ==				1: protection keys block access
  *   bit 15 ==				1: SGX MMU page-fault
+ *   bit 31 ==				1: fault was due to RMP violation
  */
 enum x86_pf_error_code {
-	X86_PF_PROT	=		1 << 0,
-	X86_PF_WRITE	=		1 << 1,
-	X86_PF_USER	=		1 << 2,
-	X86_PF_RSVD	=		1 << 3,
-	X86_PF_INSTR	=		1 << 4,
-	X86_PF_PK	=		1 << 5,
-	X86_PF_SGX	=		1 << 15,
+	X86_PF_PROT	=		BIT(0),
+	X86_PF_WRITE	=		BIT(1),
+	X86_PF_USER	=		BIT(2),
+	X86_PF_RSVD	=		BIT(3),
+	X86_PF_INSTR	=		BIT(4),
+	X86_PF_PK	=		BIT(5),
+	X86_PF_SGX	=		BIT(15),
+	X86_PF_RMP	=		BIT(31),
 };
 
 #endif /* _ASM_X86_TRAP_PF_H */
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index a498ae1fbe66..95791071e3cd 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -546,6 +546,7 @@ show_fault_oops(struct pt_regs *regs, unsigned long error_code, unsigned long ad
 		 !(error_code & X86_PF_PROT) ? "not-present page" :
 		 (error_code & X86_PF_RSVD)  ? "reserved bit violation" :
 		 (error_code & X86_PF_PK)    ? "protection keys violation" :
+		 (error_code & X86_PF_RMP)   ? "RMP violation" :
 					       "permissions violation");
 
 	if (!(error_code & X86_PF_USER) && user_mode(regs)) {
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH RFC v9 12/51] x86/fault: Report RMP page faults for kernel addresses
  2023-06-12  4:25 [PATCH RFC v9 00/51] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (10 preceding siblings ...)
  2023-06-12  4:25 ` [PATCH RFC v9 11/51] x86/traps: Define RMP violation #PF error code Michael Roth
@ 2023-06-12  4:25 ` Michael Roth
  2023-06-12 16:30   ` Dave Hansen
  2023-06-12  4:25 ` [PATCH RFC v9 13/51] x86/fault: Handle RMP page faults for user addresses Michael Roth
                   ` (38 subsequent siblings)
  50 siblings, 1 reply; 102+ messages in thread
From: Michael Roth @ 2023-06-12  4:25 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	liam.merwick, zhi.a.wang

RMP #PFs on kernel addresses are fatal and should never happen in
practice. They indicate a bug in the host kernel somewhere, so dump some
information about any RMP entries related to the faulting address to aid
with debugging.

Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 arch/x86/mm/fault.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 95791071e3cd..d46b9cf832b9 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -33,6 +33,7 @@
 #include <asm/kvm_para.h>		/* kvm_handle_async_pf		*/
 #include <asm/vdso.h>			/* fixup_vdso_exception()	*/
 #include <asm/irq_stack.h>
+#include <asm/sev-host.h>		/* sev_dump_rmpentry()          */
 
 #define CREATE_TRACE_POINTS
 #include <asm/trace/exceptions.h>
@@ -579,6 +580,18 @@ show_fault_oops(struct pt_regs *regs, unsigned long error_code, unsigned long ad
 	}
 
 	dump_pagetable(address);
+
+	if (error_code & X86_PF_RMP) {
+		unsigned int level;
+		pgd_t *pgd;
+		pte_t *pte;
+
+		pgd = __va(read_cr3_pa());
+		pgd += pgd_index(address);
+		pte = lookup_address_in_pgd(pgd, address, &level);
+		if (pte && pte_present(*pte))
+			sev_dump_rmpentry(pte_pfn(*pte));
+	}
 }
 
 static noinline void
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH RFC v9 13/51] x86/fault: Handle RMP page faults for user addresses
  2023-06-12  4:25 [PATCH RFC v9 00/51] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (11 preceding siblings ...)
  2023-06-12  4:25 ` [PATCH RFC v9 12/51] x86/fault: Report RMP page faults for kernel addresses Michael Roth
@ 2023-06-12  4:25 ` Michael Roth
  2023-06-12 16:40   ` Dave Hansen
  2023-06-12  4:25 ` [PATCH RFC v9 14/51] x86/sev: Add helper functions for RMPUPDATE and PSMASH instruction Michael Roth
                   ` (37 subsequent siblings)
  50 siblings, 1 reply; 102+ messages in thread
From: Michael Roth @ 2023-06-12  4:25 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	liam.merwick, zhi.a.wang, Brijesh Singh, Jarkko Sakkinen

From: Brijesh Singh <brijesh.singh@amd.com>

When SEV-SNP is enabled globally, a write from the host is subject to
checks performed by the hardware against the RMP table (APM2 15.36.10)
at the end of a page walk:

  1. Assigned bit in the RMP table is not set (i.e page is shared).
  2. Immutable bit in the RMP table is not set.
  3. If the page table entry that gives the sPA indicates that the
     target page size is a large page, then all RMP entries for the
     constituent 4KB pages of the target must have the assigned bit 0.

Nothing constructive can come of an attempt by userspace to violate case
1) (which will result in writing garbage due to page encryption) or case
2) (userspace should not ever need or be allowed to write to a page that
the host has specifically needed to mark immutable).

Case 3) is dependent on the hypervisor. In case of KVM, due to how
shared/private pages are partitioned into separate memory pools via
restricted/guarded memory, there should never be a case where a page in
the private pool overlaps with a shared page: either it is a
hugepage-sized allocation and all the sub-pages are private, or it is a
single-page allocation, in which case it cannot overlap with anything
but itself.

Therefore, for all 3 cases, it is appropriate to simply kill the
userspace process if it ever generates an RMP #PF. Implement that logic
here.

Co-developed-by: Jarkko Sakkinen <jarkko.sakkinen@profian.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@profian.com>
Co-developed-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
[mdr: drop all previous page-splitting logic since it is no longer
 needed with restricted/guarded memory, update commit message
 accordingly]
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 arch/x86/mm/fault.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index d46b9cf832b9..6465bff9d1ba 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -1329,6 +1329,13 @@ void do_user_addr_fault(struct pt_regs *regs,
 	if (error_code & X86_PF_INSTR)
 		flags |= FAULT_FLAG_INSTRUCTION;
 
+	if (error_code & X86_PF_RMP) {
+		pr_err("Unexpected RMP page fault for address 0x%lx, terminating process\n",
+		       address);
+		do_sigbus(regs, error_code, address, VM_FAULT_SIGBUS);
+		return;
+	}
+
 #ifdef CONFIG_X86_64
 	/*
 	 * Faults in the vsyscall page might need emulation.  The
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH RFC v9 14/51] x86/sev: Add helper functions for RMPUPDATE and PSMASH instruction
  2023-06-12  4:25 [PATCH RFC v9 00/51] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (12 preceding siblings ...)
  2023-06-12  4:25 ` [PATCH RFC v9 13/51] x86/fault: Handle RMP page faults for user addresses Michael Roth
@ 2023-06-12  4:25 ` Michael Roth
  2023-06-12 17:00   ` Dave Hansen
  2023-06-12  4:25 ` [PATCH RFC v9 15/51] x86/sev: Invalidate pages from the direct map when adding them to the RMP table Michael Roth
                   ` (36 subsequent siblings)
  50 siblings, 1 reply; 102+ messages in thread
From: Michael Roth @ 2023-06-12  4:25 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	liam.merwick, zhi.a.wang, Brijesh Singh

From: Brijesh Singh <brijesh.singh@amd.com>

The RMPUPDATE instruction writes a new RMP entry in the RMP Table. The
hypervisor will use the instruction to add pages to the RMP table. See
APM3 for details on the instruction operations.

The PSMASH instruction expands a 2MB RMP entry into a corresponding set
of contiguous 4KB-Page RMP entries. The hypervisor will use this
instruction to adjust the RMP entry without invalidating the previous
RMP entry.

Add the following external interface API functions:

psmash():
  Used to smash a 2MB aligned page into 4K pages while preserving the
  Validated bit in the RMP.

rmp_make_private():
  Used to assign a page to guest using the RMPUPDATE instruction.

rmp_make_shared():
  Used to transition a page to hypervisor/shared state using the
  RMPUPDATE instruction.
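
A rough usage flow for these helpers (hypothetical caller; pfn, gpa and
asid are placeholders):

  /* assign the page to a guest at 4K granularity, not immutable */
  ret = rmp_make_private(pfn, gpa, PG_LEVEL_4K, asid, false);

  /* if a 2MB RMP entry must later be split, smash it into 4K entries */
  ret = psmash(pfn & ~(PTRS_PER_PMD - 1));

  /* on teardown, transition the page back to hypervisor/shared state */
  ret = rmp_make_shared(pfn, PG_LEVEL_4K);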

Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
[mdr: add RMPUPDATE retry logic for transient FAIL_OVERLAP errors]
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 arch/x86/coco/sev/host.c          | 96 +++++++++++++++++++++++++++++++
 arch/x86/include/asm/sev-common.h | 14 +++++
 arch/x86/include/asm/sev-host.h   | 10 ++++
 3 files changed, 120 insertions(+)

diff --git a/arch/x86/coco/sev/host.c b/arch/x86/coco/sev/host.c
index d766b3bc6647..9df690b0b263 100644
--- a/arch/x86/coco/sev/host.c
+++ b/arch/x86/coco/sev/host.c
@@ -338,3 +338,99 @@ static int __init snp_rmptable_init(void)
 	}
 }
 EXPORT_SYMBOL_GPL(sev_dump_rmpentry);
+
+/*
+ * PSMASH a 2MB aligned page into 4K pages in the RMP table while preserving the
+ * Validated bit.
+ */
+int psmash(u64 pfn)
+{
+	unsigned long paddr = pfn << PAGE_SHIFT;
+	int ret;
+
+	pr_debug("%s: PFN: 0x%llx\n", __func__, pfn);
+
+	if (!pfn_valid(pfn))
+		return -EINVAL;
+
+	if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
+		return -ENXIO;
+
+	/* Binutils version 2.36 supports the PSMASH mnemonic. */
+	asm volatile(".byte 0xF3, 0x0F, 0x01, 0xFF"
+		      : "=a"(ret)
+		      : "a"(paddr)
+		      : "memory", "cc");
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(psmash);
+
+static int rmpupdate(u64 pfn, struct rmp_state *val)
+{
+	unsigned long paddr = pfn << PAGE_SHIFT;
+	int level = RMP_TO_X86_PG_LEVEL(val->pagesize);
+	int npages = page_level_size(level) / PAGE_SIZE;
+	int attempts = 0;
+	int ret;
+
+	if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
+		return -ENXIO;
+
+	do {
+		/* Binutils version 2.36 supports the RMPUPDATE mnemonic. */
+		asm volatile(".byte 0xF2, 0x0F, 0x01, 0xFE"
+			     : "=a"(ret)
+			     : "a"(paddr), "c"((unsigned long)val)
+			     : "memory", "cc");
+
+		attempts++;
+	} while (ret == RMPUPDATE_FAIL_OVERLAP);
+
+	if (ret) {
+		pr_err("RMPUPDATE failed after %d attempts, ret: %d, pfn: %llx, npages: %d, level: %d\n",
+		       attempts, ret, pfn, npages, level);
+		sev_dump_rmpentry(pfn);
+		dump_stack();
+		return -EFAULT;
+	}
+
+	return 0;
+}
+
+/*
+ * Assign a page to guest using the RMPUPDATE instruction.
+ */
+int rmp_make_private(u64 pfn, u64 gpa, enum pg_level level, int asid, bool immutable)
+{
+	struct rmp_state val;
+
+	pr_debug("%s: GPA: 0x%llx, PFN: 0x%llx, level: %d, immutable: %d\n",
+		 __func__, gpa, pfn, level, immutable);
+
+	memset(&val, 0, sizeof(val));
+	val.assigned = 1;
+	val.asid = asid;
+	val.immutable = immutable;
+	val.gpa = gpa;
+	val.pagesize = X86_TO_RMP_PG_LEVEL(level);
+
+	return rmpupdate(pfn, &val);
+}
+EXPORT_SYMBOL_GPL(rmp_make_private);
+
+/*
+ * Transition a page to hypervisor/shared state using the RMPUPDATE instruction.
+ */
+int rmp_make_shared(u64 pfn, enum pg_level level)
+{
+	struct rmp_state val;
+
+	pr_debug("%s: PFN: 0x%llx, level: %d\n", __func__, pfn, level);
+
+	memset(&val, 0, sizeof(val));
+	val.pagesize = X86_TO_RMP_PG_LEVEL(level);
+
+	return rmpupdate(pfn, &val);
+}
+EXPORT_SYMBOL_GPL(rmp_make_shared);
diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
index bf0378136289..9eb20b416251 100644
--- a/arch/x86/include/asm/sev-common.h
+++ b/arch/x86/include/asm/sev-common.h
@@ -171,8 +171,22 @@ struct snp_psc_desc {
 #define GHCB_ERR_INVALID_INPUT		5
 #define GHCB_ERR_INVALID_EVENT		6
 
+/* RMPUPDATE detected a 4K-page and 2MB-page overlap. */
+#define RMPUPDATE_FAIL_OVERLAP		4
+
 /* RMP page size */
 #define RMP_PG_SIZE_4K			0
+#define RMP_PG_SIZE_2M			1
 #define RMP_TO_X86_PG_LEVEL(level)	(((level) == RMP_PG_SIZE_4K) ? PG_LEVEL_4K : PG_LEVEL_2M)
+#define X86_TO_RMP_PG_LEVEL(level)	(((level) == PG_LEVEL_4K) ? RMP_PG_SIZE_4K : RMP_PG_SIZE_2M)
+
+struct rmp_state {
+	u64 gpa;
+	u8 assigned;
+	u8 pagesize;
+	u8 immutable;
+	u8 rsvd;
+	u32 asid;
+} __packed;
 
 #endif
diff --git a/arch/x86/include/asm/sev-host.h b/arch/x86/include/asm/sev-host.h
index 85cfe577155c..753e80d16433 100644
--- a/arch/x86/include/asm/sev-host.h
+++ b/arch/x86/include/asm/sev-host.h
@@ -16,9 +16,19 @@
 #ifdef CONFIG_KVM_AMD_SEV
 int snp_lookup_rmpentry(u64 pfn, bool *assigned, int *level);
 void sev_dump_rmpentry(u64 pfn);
+int psmash(u64 pfn);
+int rmp_make_private(u64 pfn, u64 gpa, enum pg_level level, int asid, bool immutable);
+int rmp_make_shared(u64 pfn, enum pg_level level);
 #else
 static inline int snp_lookup_rmpentry(u64 pfn, bool *assigned, int *level) { return 0; }
 static inline void sev_dump_rmpentry(u64 pfn) {}
+static inline int psmash(u64 pfn) { return -ENXIO; }
+static inline int rmp_make_private(u64 pfn, u64 gpa, enum pg_level level, int asid,
+				   bool immutable)
+{
+	return -ENODEV;
+}
+static inline int rmp_make_shared(u64 pfn, enum pg_level level) { return -ENODEV; }
 #endif
 
 #endif
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH RFC v9 15/51] x86/sev: Invalidate pages from the direct map when adding them to the RMP table
  2023-06-12  4:25 [PATCH RFC v9 00/51] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (13 preceding siblings ...)
  2023-06-12  4:25 ` [PATCH RFC v9 14/51] x86/sev: Add helper functions for RMPUPDATE and PSMASH instruction Michael Roth
@ 2023-06-12  4:25 ` Michael Roth
  2023-06-12  4:25 ` [PATCH RFC v9 16/51] crypto: ccp: Define the SEV-SNP commands Michael Roth
                   ` (35 subsequent siblings)
  50 siblings, 0 replies; 102+ messages in thread
From: Michael Roth @ 2023-06-12  4:25 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	liam.merwick, zhi.a.wang, Brijesh Singh

From: Brijesh Singh <brijesh.singh@amd.com>

The integrity guarantee of SEV-SNP is enforced through the RMP table.
The RMP is used with standard x86 and IOMMU page tables to enforce
memory restrictions and page access rights. The RMP check is enforced as
soon as SEV-SNP is enabled globally in the system. When hardware
encounters an RMP-check failure, it raises a page-fault exception.

The rmp_make_private() and rmp_make_shared() helpers are used to add
or remove pages from the RMP table. Improve rmp_make_private() to
invalidate state so that pages cannot be used in the direct map after
they are added to the RMP table, and restore them to their default valid
permissions after they are removed from the RMP table.
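
To illustrate the hazard being avoided (illustrative sketch only, not part
of this patch): once a page is assigned in the RMP table, a host-side
write through its direct-map alias would trigger an RMP #PF, e.g.:

  rmp_make_private(pfn, gpa, PG_LEVEL_4K, asid, false);
  /* the direct-map alias is now invalid; without that, a host write
   * such as the following would fault with an RMP violation:
   *   memcpy(page_address(pfn_to_page(pfn)), buf, PAGE_SIZE);
   */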

Co-developed-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 arch/x86/coco/sev/host.c | 59 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 59 insertions(+)

diff --git a/arch/x86/coco/sev/host.c b/arch/x86/coco/sev/host.c
index 9df690b0b263..cd3b4c6a25bc 100644
--- a/arch/x86/coco/sev/host.c
+++ b/arch/x86/coco/sev/host.c
@@ -366,6 +366,42 @@ int psmash(u64 pfn)
 }
 EXPORT_SYMBOL_GPL(psmash);
 
+static int restore_direct_map(u64 pfn, int npages)
+{
+	int i, ret = 0;
+
+	for (i = 0; i < npages; i++) {
+		ret = set_direct_map_default_noflush(pfn_to_page(pfn + i));
+		if (ret)
+			break;
+	}
+
+	if (ret)
+		pr_warn("Failed to restore direct map for pfn 0x%llx, ret: %d\n",
+			pfn + i, ret);
+
+	return ret;
+}
+
+static int invalidate_direct_map(u64 pfn, int npages)
+{
+	int i, ret = 0;
+
+	for (i = 0; i < npages; i++) {
+		ret = set_direct_map_invalid_noflush(pfn_to_page(pfn + i));
+		if (ret)
+			break;
+	}
+
+	if (ret) {
+		pr_warn("Failed to invalidate direct map for pfn 0x%llx, ret: %d\n",
+			pfn + i, ret);
+		restore_direct_map(pfn, i);
+	}
+
+	return ret;
+}
+
 static int rmpupdate(u64 pfn, struct rmp_state *val)
 {
 	unsigned long paddr = pfn << PAGE_SHIFT;
@@ -377,6 +413,18 @@ static int rmpupdate(u64 pfn, struct rmp_state *val)
 	if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
 		return -ENXIO;
 
+	/*
+	 * If page is getting assigned in the RMP table then unmap it from the
+	 * direct map.
+	 */
+	if (val->assigned) {
+		if (invalidate_direct_map(pfn, npages)) {
+			pr_err("Failed to unmap %d pages at pfn 0x%llx from the direct_map\n",
+			       npages, pfn);
+			return -EFAULT;
+		}
+	}
+
 	do {
 		/* Binutils version 2.36 supports the RMPUPDATE mnemonic. */
 		asm volatile(".byte 0xF2, 0x0F, 0x01, 0xFE"
@@ -395,6 +443,17 @@ static int rmpupdate(u64 pfn, struct rmp_state *val)
 		return -EFAULT;
 	}
 
+	/*
+	 * Restore the direct map after the page is removed from the RMP table.
+	 */
+	if (!val->assigned) {
+		if (restore_direct_map(pfn, npages)) {
+			pr_err("Failed to map %d pages at pfn 0x%llx into the direct_map\n",
+			       npages, pfn);
+			return -EFAULT;
+		}
+	}
+
 	return 0;
 }
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH RFC v9 16/51] crypto: ccp: Define the SEV-SNP commands
  2023-06-12  4:25 [PATCH RFC v9 00/51] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (14 preceding siblings ...)
  2023-06-12  4:25 ` [PATCH RFC v9 15/51] x86/sev: Invalidate pages from the direct map when adding them to the RMP table Michael Roth
@ 2023-06-12  4:25 ` Michael Roth
  2023-06-12  4:25 ` [PATCH RFC v9 17/51] crypto: ccp: Add support to initialize the AMD-SP for SEV-SNP Michael Roth
                   ` (34 subsequent siblings)
  50 siblings, 0 replies; 102+ messages in thread
From: Michael Roth @ 2023-06-12  4:25 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	liam.merwick, zhi.a.wang, Brijesh Singh

From: Brijesh Singh <brijesh.singh@amd.com>

AMD introduced the next generation of SEV called SEV-SNP (Secure Nested
Paging). SEV-SNP builds upon existing SEV and SEV-ES functionality
while adding new hardware security protection.

Define the commands and structures used to communicate with the AMD-SP
when creating and managing the SEV-SNP guests. The SEV-SNP firmware spec
is available at developer.amd.com/sev.
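
As a rough illustration of how these definitions are used (hypothetical
caller; 'paddr' is a placeholder), the SNP_PAGE_RECLAIM command encodes
the page size in bit 0 of the physical address:

  /* reclaim a firmware-owned 2MB page back to the hypervisor */
  struct sev_data_snp_page_reclaim data = {
          .paddr = paddr | RMP_PG_SIZE_2M,  /* bit 0 encodes page size */
  };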

Co-developed-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
[mdr: update SNP command list and SNP status struct based on current
      spec]
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 drivers/crypto/ccp/sev-dev.c |  16 +++
 include/linux/psp-sev.h      | 246 +++++++++++++++++++++++++++++++++++
 include/uapi/linux/psp-sev.h |  53 ++++++++
 3 files changed, 315 insertions(+)

diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index e2f25926eb51..ab3572286755 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -128,6 +128,8 @@ static int sev_cmd_buffer_len(int cmd)
 	switch (cmd) {
 	case SEV_CMD_INIT:			return sizeof(struct sev_data_init);
 	case SEV_CMD_INIT_EX:                   return sizeof(struct sev_data_init_ex);
+	case SEV_CMD_SNP_SHUTDOWN_EX:		return sizeof(struct sev_data_snp_shutdown_ex);
+	case SEV_CMD_SNP_INIT_EX:		return sizeof(struct sev_data_snp_init_ex);
 	case SEV_CMD_PLATFORM_STATUS:		return sizeof(struct sev_user_data_status);
 	case SEV_CMD_PEK_CSR:			return sizeof(struct sev_data_pek_csr);
 	case SEV_CMD_PEK_CERT_IMPORT:		return sizeof(struct sev_data_pek_cert_import);
@@ -156,6 +158,20 @@ static int sev_cmd_buffer_len(int cmd)
 	case SEV_CMD_GET_ID:			return sizeof(struct sev_data_get_id);
 	case SEV_CMD_ATTESTATION_REPORT:	return sizeof(struct sev_data_attestation_report);
 	case SEV_CMD_SEND_CANCEL:		return sizeof(struct sev_data_send_cancel);
+	case SEV_CMD_SNP_GCTX_CREATE:		return sizeof(struct sev_data_snp_addr);
+	case SEV_CMD_SNP_LAUNCH_START:		return sizeof(struct sev_data_snp_launch_start);
+	case SEV_CMD_SNP_LAUNCH_UPDATE:		return sizeof(struct sev_data_snp_launch_update);
+	case SEV_CMD_SNP_ACTIVATE:		return sizeof(struct sev_data_snp_activate);
+	case SEV_CMD_SNP_DECOMMISSION:		return sizeof(struct sev_data_snp_addr);
+	case SEV_CMD_SNP_PAGE_RECLAIM:		return sizeof(struct sev_data_snp_page_reclaim);
+	case SEV_CMD_SNP_GUEST_STATUS:		return sizeof(struct sev_data_snp_guest_status);
+	case SEV_CMD_SNP_LAUNCH_FINISH:		return sizeof(struct sev_data_snp_launch_finish);
+	case SEV_CMD_SNP_DBG_DECRYPT:		return sizeof(struct sev_data_snp_dbg);
+	case SEV_CMD_SNP_DBG_ENCRYPT:		return sizeof(struct sev_data_snp_dbg);
+	case SEV_CMD_SNP_PAGE_UNSMASH:		return sizeof(struct sev_data_snp_page_unsmash);
+	case SEV_CMD_SNP_PLATFORM_STATUS:	return sizeof(struct sev_data_snp_addr);
+	case SEV_CMD_SNP_GUEST_REQUEST:		return sizeof(struct sev_data_snp_guest_request);
+	case SEV_CMD_SNP_CONFIG:		return sizeof(struct sev_user_data_snp_config);
 	default:				return 0;
 	}
 
diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h
index 1595088c428b..06d0619ca442 100644
--- a/include/linux/psp-sev.h
+++ b/include/linux/psp-sev.h
@@ -86,6 +86,36 @@ enum sev_cmd {
 	SEV_CMD_DBG_DECRYPT		= 0x060,
 	SEV_CMD_DBG_ENCRYPT		= 0x061,
 
+	/* SNP specific commands */
+	SEV_CMD_SNP_INIT			= 0x81,
+	SEV_CMD_SNP_SHUTDOWN			= 0x82,
+	SEV_CMD_SNP_PLATFORM_STATUS		= 0x83,
+	SEV_CMD_SNP_DF_FLUSH			= 0x84,
+	SEV_CMD_SNP_INIT_EX			= 0x85,
+	SEV_CMD_SNP_SHUTDOWN_EX			= 0x86,
+	SEV_CMD_SNP_DECOMMISSION		= 0x90,
+	SEV_CMD_SNP_ACTIVATE			= 0x91,
+	SEV_CMD_SNP_GUEST_STATUS		= 0x92,
+	SEV_CMD_SNP_GCTX_CREATE			= 0x93,
+	SEV_CMD_SNP_GUEST_REQUEST		= 0x94,
+	SEV_CMD_SNP_ACTIVATE_EX			= 0x95,
+	SEV_CMD_SNP_LAUNCH_START		= 0xA0,
+	SEV_CMD_SNP_LAUNCH_UPDATE		= 0xA1,
+	SEV_CMD_SNP_LAUNCH_FINISH		= 0xA2,
+	SEV_CMD_SNP_DBG_DECRYPT			= 0xB0,
+	SEV_CMD_SNP_DBG_ENCRYPT			= 0xB1,
+	SEV_CMD_SNP_PAGE_SWAP_OUT		= 0xC0,
+	SEV_CMD_SNP_PAGE_SWAP_IN		= 0xC1,
+	SEV_CMD_SNP_PAGE_MOVE			= 0xC2,
+	SEV_CMD_SNP_PAGE_MD_INIT		= 0xC3,
+	SEV_CMD_SNP_PAGE_SET_STATE		= 0xC6,
+	SEV_CMD_SNP_PAGE_RECLAIM		= 0xC7,
+	SEV_CMD_SNP_PAGE_UNSMASH		= 0xC8,
+	SEV_CMD_SNP_CONFIG			= 0xC9,
+	SEV_CMD_SNP_DOWNLOAD_FIRMWARE_EX	= 0xCA,
+	SEV_CMD_SNP_COMMIT			= 0xCB,
+	SEV_CMD_SNP_VLEK_LOAD			= 0xCD,
+
 	SEV_CMD_MAX,
 };
 
@@ -531,6 +561,222 @@ struct sev_data_attestation_report {
 	u32 len;				/* In/Out */
 } __packed;
 
+/**
+ * struct sev_data_snp_download_firmware - SNP_DOWNLOAD_FIRMWARE command params
+ *
+ * @address: physical address of firmware image
+ * @len: length of the firmware image
+ */
+struct sev_data_snp_download_firmware {
+	u64 address;				/* In */
+	u32 len;				/* In */
+} __packed;
+
+/**
+ * struct sev_data_snp_activate - SNP_ACTIVATE command params
+ *
+ * @gctx_paddr: system physical address guest context page
+ * @asid: ASID to bind to the guest
+ */
+struct sev_data_snp_activate {
+	u64 gctx_paddr;				/* In */
+	u32 asid;				/* In */
+} __packed;
+
+/**
+ * struct sev_data_snp_addr - generic SNP command params
+ *
+ * @gctx_paddr: system physical address of guest context page
+ */
+struct sev_data_snp_addr {
+	u64 gctx_paddr;				/* In */
+} __packed;
+
+/**
+ * struct sev_data_snp_launch_start - SNP_LAUNCH_START command params
+ *
+ * @gctx_paddr: system physical address of guest context page
+ * @policy: guest policy
+ * @ma_gctx_paddr: system physical address of migration agent context page
+ * @imi_en: launch flow is launching an IMI for the purpose of
+ *   guest-assisted migration
+ * @ma_en: the guest is associated with a migration agent
+ */
+struct sev_data_snp_launch_start {
+	u64 gctx_paddr;				/* In */
+	u64 policy;				/* In */
+	u64 ma_gctx_paddr;			/* In */
+	u32 ma_en:1;				/* In */
+	u32 imi_en:1;				/* In */
+	u32 rsvd:30;
+	u8 gosvw[16];				/* In */
+} __packed;
+
+/* SNP support page type */
+enum {
+	SNP_PAGE_TYPE_NORMAL		= 0x1,
+	SNP_PAGE_TYPE_VMSA		= 0x2,
+	SNP_PAGE_TYPE_ZERO		= 0x3,
+	SNP_PAGE_TYPE_UNMEASURED	= 0x4,
+	SNP_PAGE_TYPE_SECRET		= 0x5,
+	SNP_PAGE_TYPE_CPUID		= 0x6,
+
+	SNP_PAGE_TYPE_MAX
+};
+
+/**
+ * struct sev_data_snp_launch_update - SNP_LAUNCH_UPDATE command params
+ *
+ * @gctx_paddr: system physical address of guest context page
+ * @imi_page: indicates that this page is part of the IMI of the guest
+ * @page_type: encoded page type
+ * @page_size: page size 0 indicates 4K and 1 indicates 2MB page
+ * @address: system physical address of destination page to encrypt
+ * @vmpl1_perms: VMPL permission mask for VMPL1
+ * @vmpl2_perms: VMPL permission mask for VMPL2
+ * @vmpl3_perms: VMPL permission mask for VMPL3
+ */
+struct sev_data_snp_launch_update {
+	u64 gctx_paddr;				/* In */
+	u32 page_size:1;			/* In */
+	u32 page_type:3;			/* In */
+	u32 imi_page:1;				/* In */
+	u32 rsvd:27;
+	u32 rsvd2;
+	u64 address;				/* In */
+	u32 rsvd3:8;
+	u32 vmpl1_perms:8;			/* In */
+	u32 vmpl2_perms:8;			/* In */
+	u32 vmpl3_perms:8;			/* In */
+	u32 rsvd4;
+} __packed;
+
+/**
+ * struct sev_data_snp_launch_finish - SNP_LAUNCH_FINISH command params
+ *
+ * @gctx_addr: system physical address of guest context page
+ */
+struct sev_data_snp_launch_finish {
+	u64 gctx_paddr;
+	u64 id_block_paddr;
+	u64 id_auth_paddr;
+	u8 id_block_en:1;
+	u8 auth_key_en:1;
+	u64 rsvd:62;
+	u8 host_data[32];
+} __packed;
+
+/**
+ * struct sev_data_snp_guest_status - SNP_GUEST_STATUS command params
+ *
+ * @gctx_paddr: system physical address of guest context page
+ * @address: system physical address of guest status page
+ */
+struct sev_data_snp_guest_status {
+	u64 gctx_paddr;
+	u64 address;
+} __packed;
+
+/**
+ * struct sev_data_snp_page_reclaim - SNP_PAGE_RECLAIM command params
+ *
+ * @paddr: system physical address of page to be claimed. The 0th bit
+ *	in the address indicates the page size. 0h indicates 4 kB and
+ *	1h indicates 2 MB page.
+ */
+struct sev_data_snp_page_reclaim {
+	u64 paddr;
+} __packed;
+
+/**
+ * struct sev_data_snp_page_unsmash - SNP_PAGE_UNSMASH command params
+ *
+ * @paddr: system physical address of page to be unsmashed. The 0th bit
+ *	in the address indicates the page size. 0h indicates 4 kB and
+ *	1h indicates 2 MB page.
+ */
+struct sev_data_snp_page_unsmash {
+	u64 paddr;
+} __packed;
+
+/**
+ * struct sev_data_snp_dbg - SNP_DBG_ENCRYPT/SNP_DBG_DECRYPT command params
+ *
+ * @gctx_paddr: system physical address of guest context page
+ * @src_addr: source address of data to operate on
+ * @dst_addr: destination address of data to operate on
+ */
+struct sev_data_snp_dbg {
+	u64 gctx_paddr;				/* In */
+	u64 src_addr;				/* In */
+	u64 dst_addr;				/* In */
+} __packed;
+
+/**
+ * struct sev_data_snp_guest_request - SNP_GUEST_REQUEST command params
+ *
+ * @gctx_paddr: system physical address of guest context page
+ * @req_paddr: system physical address of request page
+ * @res_paddr: system physical address of response page
+ */
+struct sev_data_snp_guest_request {
+	u64 gctx_paddr;				/* In */
+	u64 req_paddr;				/* In */
+	u64 res_paddr;				/* In */
+} __packed;
+
+/**
+ * struct sev_data_snp_init_ex - SNP_INIT_EX command params
+ *
+ * @init_rmp: indicate that the RMP should be initialized.
+ * @list_paddr_en: indicate that list_paddr is valid
+ * @list_paddr: system physical address of range list
+ */
+struct sev_data_snp_init_ex {
+	u32 init_rmp:1;
+	u32 list_paddr_en:1;
+	u32 rsvd:30;
+	u32 rsvd1;
+	u64 list_paddr;
+	u8  rsvd2[48];
+} __packed;
+
+/**
+ * struct sev_data_range - RANGE structure
+ *
+ * @base: system physical address of first byte of range
+ * @page_count: number of 4KB pages in this range
+ */
+struct sev_data_range {
+	u64 base;
+	u32 page_count;
+	u32 rsvd;
+} __packed;
+
+/**
+ * struct sev_data_range_list - RANGE_LIST structure
+ *
+ * @num_elements: number of elements in RANGE_ARRAY
+ * @ranges: array of num_elements of type RANGE
+ */
+struct sev_data_range_list {
+	u32 num_elements;
+	u32 rsvd;
+	struct sev_data_range ranges[];
+} __packed;
+
+/**
+ * struct sev_data_snp_shutdown_ex - SNP_SHUTDOWN_EX structure
+ *
+ * @length: length of the command buffer read by the PSP
+ * @iommu_snp_shutdown: Disable enforcement of SNP in the IOMMU
+ */
+struct sev_data_snp_shutdown_ex {
+	u32 length;
+	u32 iommu_snp_shutdown:1;
+	u32 rsvd1:31;
+} __packed;
+
 #ifdef CONFIG_CRYPTO_DEV_SP_PSP
 
 /**
diff --git a/include/uapi/linux/psp-sev.h b/include/uapi/linux/psp-sev.h
index 91b4c63d5cbf..7d8a2dd20273 100644
--- a/include/uapi/linux/psp-sev.h
+++ b/include/uapi/linux/psp-sev.h
@@ -61,6 +61,13 @@ typedef enum {
 	SEV_RET_INVALID_PARAM,
 	SEV_RET_RESOURCE_LIMIT,
 	SEV_RET_SECURE_DATA_INVALID,
+	SEV_RET_INVALID_PAGE_SIZE,
+	SEV_RET_INVALID_PAGE_STATE,
+	SEV_RET_INVALID_MDATA_ENTRY,
+	SEV_RET_INVALID_PAGE_OWNER,
+	SEV_RET_INVALID_PAGE_AEAD_OFLOW,
+	SEV_RET_RMP_INIT_REQUIRED,
+
 	SEV_RET_MAX,
 } sev_ret_code;
 
@@ -147,6 +154,52 @@ struct sev_user_data_get_id2 {
 	__u32 length;				/* In/Out */
 } __packed;
 
+/**
+ * struct sev_user_data_snp_status - SNP status
+ *
+ * @api_major: API major version
+ * @api_minor: API minor version
+ * @state: current platform state
+ * @is_rmp_initialized: whether the RMP is initialized or not
+ * @build_id: firmware build id for this API version
+ * @mask_chip_id: whether chip id is present in attestation reports or not
+ * @mask_chip_key: whether attestation reports are signed or not
+ * @vlek_en: whether a VLEK hashstick is loaded
+ * @guest_count: the number of guests currently managed by the firmware
+ * @current_tcb_version: current TCB version
+ * @reported_tcb_version: reported TCB version
+ */
+struct sev_user_data_snp_status {
+	__u8 api_major;			/* Out */
+	__u8 api_minor;			/* Out */
+	__u8 state;			/* Out */
+	__u8 is_rmp_initialized:1;	/* Out */
+	__u8 rsvd:7;
+	__u32 build_id;			/* Out */
+	__u32 mask_chip_id:1;		/* Out */
+	__u32 mask_chip_key:1;		/* Out */
+	__u32 vlek_en:1;		/* Out */
+	__u32 rsvd1:29;
+	__u32 guest_count;		/* Out */
+	__u64 current_tcb_version;	/* Out */
+	__u64 reported_tcb_version;	/* Out */
+} __packed;
+
+/*
+ * struct sev_user_data_snp_config - system wide configuration value for SNP.
+ *
+ * @reported_tcb: The TCB version to report in the guest attestation report.
+ * @mask_chip_id: Indicates that the CHIP_ID field in the attestation report
+ * will always be zero.
+ */
+struct sev_user_data_snp_config {
+	__u64 reported_tcb;	/* In */
+	__u32 mask_chip_id:1;   /* In */
+	__u32 mask_chip_key:1;  /* In */
+	__u32 rsvd:30;          /* In */
+	__u8 rsvd1[52];
+} __packed;
+
 /**
  * struct sev_issue_cmd - SEV ioctl parameters
  *
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH RFC v9 17/51] crypto: ccp: Add support to initialize the AMD-SP for SEV-SNP
  2023-06-12  4:25 [PATCH RFC v9 00/51] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (15 preceding siblings ...)
  2023-06-12  4:25 ` [PATCH RFC v9 16/51] crypto: ccp: Define the SEV-SNP commands Michael Roth
@ 2023-06-12  4:25 ` Michael Roth
  2023-06-12  4:25 ` [PATCH RFC v9 18/51] crypto: ccp: Provide API to issue SEV and SNP commands Michael Roth
                   ` (33 subsequent siblings)
  50 siblings, 0 replies; 102+ messages in thread
From: Michael Roth @ 2023-06-12  4:25 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	liam.merwick, zhi.a.wang, Brijesh Singh, Jarkko Sakkinen

From: Brijesh Singh <brijesh.singh@amd.com>

Before SNP VMs can be launched, the platform must be appropriately
configured and initialized. Platform initialization is accomplished via
the SNP_INIT command. Make sure to do a WBINVD and issue the DF_FLUSH
command to prepare for the first SNP guest launch after INIT.

During the execution of SNP_INIT command, the firmware configures
and enables SNP security policy enforcement in many system components.
Some system components write to regions of memory reserved by early
x86 firmware (e.g. UEFI). Other system components write to regions
provided by the operating system, hypervisor, or x86 firmware.
Such system components can only write to HV-fixed pages or Default
pages. They will error when attempting to write to other page states
after SNP_INIT enables their SNP enforcement.

Starting in SNP firmware v1.52, the SNP_INIT_EX command takes a list of
system physical address ranges to convert into the HV-fixed page states
during the RMP initialization. If INIT_RMP is 1, hypervisors should
provide all system physical address ranges that the hypervisor will
never assign to a guest until the next RMP re-initialization.
For instance, the memory that UEFI reserves should be included in the
range list. This allows system components that occasionally write to
memory (e.g. logging to UEFI reserved regions) to not fail due to
RMP initialization and SNP enablement.
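
For reference (illustrative arithmetic, assuming the single 4KB page used
to back the range list in this patch), the list can hold:

  (PAGE_SIZE - sizeof(struct sev_data_range_list)) / sizeof(struct sev_data_range)
      = (4096 - 8) / 16
      = 255 entries

which is the limit that the -E2BIG check in
snp_filter_reserved_mem_regions() enforces.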

Co-developed-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Co-developed-by: Jarkko Sakkinen <jarkko@profian.com>
Signed-off-by: Jarkko Sakkinen <jarkko@profian.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 drivers/crypto/ccp/sev-dev.c | 242 +++++++++++++++++++++++++++++++++--
 drivers/crypto/ccp/sev-dev.h |   2 +
 2 files changed, 234 insertions(+), 10 deletions(-)

diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index ab3572286755..d3764ee073f3 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -27,6 +27,7 @@
 
 #include <asm/smp.h>
 #include <asm/cacheflush.h>
+#include <asm/e820/types.h>
 
 #include "psp-dev.h"
 #include "sev-dev.h"
@@ -35,6 +36,10 @@
 #define SEV_FW_FILE		"amd/sev.fw"
 #define SEV_FW_NAME_SIZE	64
 
+/* Minimum firmware version required for the SEV-SNP support */
+#define SNP_MIN_API_MAJOR	1
+#define SNP_MIN_API_MINOR	51
+
 static DEFINE_MUTEX(sev_cmd_mutex);
 static struct sev_misc_dev *misc_dev;
 
@@ -78,6 +83,14 @@ static void *sev_es_tmr;
 #define NV_LENGTH (32 * 1024)
 static void *sev_init_ex_buffer;
 
+/*
+ * SEV_DATA_RANGE_LIST:
+ *   Array containing ranges of pages that the firmware transitions to
+ *   the HV-fixed page state.
+ */
+static struct sev_data_range_list *snp_range_list;
+static int __sev_snp_init_locked(int *error);
+
 static inline bool sev_version_greater_or_equal(u8 maj, u8 min)
 {
 	struct sev_device *sev = psp_master->sev_data;
@@ -462,7 +475,8 @@ static int __sev_platform_init_locked(int *error)
 {
 	struct psp_device *psp = psp_master;
 	struct sev_device *sev;
-	int rc = 0, psp_ret = -1;
+	int psp_ret = -1;
+	int rc;
 	int (*init_function)(int *error);
 
 	if (!psp || !psp->sev_data)
@@ -473,6 +487,26 @@ static int __sev_platform_init_locked(int *error)
 	if (sev->state == SEV_STATE_INIT)
 		return 0;
 
+	rc = __sev_snp_init_locked(error);
+	if (rc && rc != -ENODEV) {
+		/*
+		 * Don't abort the probe if SNP INIT failed; continue to
+		 * initialize the legacy SEV firmware.
+		 */
+		dev_err(sev->dev, "SEV-SNP: failed to INIT rc %d, error %#x\n", rc, *error);
+	}
+
+	if (!sev_es_tmr) {
+		/* Obtain the TMR memory area for SEV-ES use */
+		sev_es_tmr = sev_fw_alloc(SEV_ES_TMR_SIZE);
+		if (sev_es_tmr)
+			/* Must flush the cache before giving it to the firmware */
+			clflush_cache_range(sev_es_tmr, SEV_ES_TMR_SIZE);
+		else
+			dev_warn(sev->dev,
+				 "SEV: TMR allocation failed, SEV-ES support unavailable\n");
+	}
+
 	if (sev_init_ex_buffer) {
 		init_function = __sev_init_ex_locked;
 		rc = sev_read_init_ex_file();
@@ -832,6 +866,191 @@ static int sev_update_firmware(struct device *dev)
 	return ret;
 }
 
+static void snp_set_hsave_pa(void *arg)
+{
+	wrmsrl(MSR_VM_HSAVE_PA, 0);
+}
+
+static int snp_filter_reserved_mem_regions(struct resource *rs, void *arg)
+{
+	struct sev_data_range_list *range_list = arg;
+	struct sev_data_range *range = &range_list->ranges[range_list->num_elements];
+	size_t size;
+
+	if (((range_list->num_elements + 1) * sizeof(struct sev_data_range) +
+	     sizeof(struct sev_data_range_list)) > PAGE_SIZE)
+		return -E2BIG;
+
+	switch (rs->desc) {
+	case E820_TYPE_RESERVED:
+	case E820_TYPE_PMEM:
+	case E820_TYPE_ACPI:
+		range->base = rs->start & PAGE_MASK;
+		size = (rs->end + 1) - rs->start;
+		range->page_count = size >> PAGE_SHIFT;
+		range_list->num_elements++;
+		break;
+	default:
+		break;
+	}
+
+	return 0;
+}
+
+static int __sev_snp_init_locked(int *error)
+{
+	struct psp_device *psp = psp_master;
+	struct sev_data_snp_init_ex data;
+	struct sev_device *sev;
+	int rc = 0;
+
+	if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
+		return -ENODEV;
+
+	if (!psp || !psp->sev_data)
+		return -ENODEV;
+
+	sev = psp->sev_data;
+
+	if (sev->snp_initialized)
+		return 0;
+
+	if (!sev_version_greater_or_equal(SNP_MIN_API_MAJOR, SNP_MIN_API_MINOR)) {
+		dev_dbg(sev->dev, "SEV-SNP support requires firmware version >= %d:%d\n",
+			SNP_MIN_API_MAJOR, SNP_MIN_API_MINOR);
+		return 0;
+	}
+
+	/*
+	 * SNP_INIT requires MSR_VM_HSAVE_PA to be set to 0h across all
+	 * cores.
+	 */
+	on_each_cpu(snp_set_hsave_pa, NULL, 1);
+
+	/*
+	 * Starting in SNP firmware v1.52, the SNP_INIT_EX command takes a list of
+	 * system physical address ranges to convert into the HV-fixed page states
+	 * during the RMP initialization.  For instance, the memory that UEFI
+	 * reserves should be included in the range list. This allows system
+	 * components that occasionally write to memory (e.g. logging to UEFI
+	 * reserved regions) to not fail due to RMP initialization and SNP enablement.
+	 */
+	if (sev_version_greater_or_equal(SNP_MIN_API_MAJOR, 52)) {
+		/*
+		 * Firmware checks that the pages containing the ranges enumerated
+		 * in the RANGES structure are either in the Default page state or in the
+		 * firmware page state.
+		 */
+		snp_range_list = kzalloc(PAGE_SIZE, GFP_KERNEL);
+		if (!snp_range_list) {
+			dev_err(sev->dev,
+				"SEV: SNP_INIT_EX range list memory allocation failed\n");
+			return -ENOMEM;
+		}
+
+		/*
+		 * Retrieve all reserved memory regions setup by UEFI from the e820 memory map
+		 * to be setup as HV-fixed pages.
+		 */
+
+		rc = walk_iomem_res_desc(IORES_DESC_NONE, IORESOURCE_MEM, 0, ~0,
+					 snp_range_list, snp_filter_reserved_mem_regions);
+		if (rc) {
+			dev_err(sev->dev,
+				"SEV: SNP_INIT_EX walk_iomem_res_desc failed rc = %d\n", rc);
+			return rc;
+		}
+
+		memset(&data, 0, sizeof(data));
+		data.init_rmp = 1;
+		data.list_paddr_en = 1;
+		data.list_paddr = __psp_pa(snp_range_list);
+
+		/*
+		 * Before invoking SNP_INIT_EX with INIT_RMP=1, make sure that
+		 * all dirty cache lines containing the RMP are flushed.
+		 *
+		 * NOTE: that includes writes via RMPUPDATE instructions, which
+		 * are also cacheable writes.
+		 */
+		wbinvd_on_all_cpus();
+
+		rc = __sev_do_cmd_locked(SEV_CMD_SNP_INIT_EX, &data, error);
+		if (rc)
+			return rc;
+	} else {
+		/*
+		 * SNP_INIT is equivalent to SNP_INIT_EX with INIT_RMP=1, so
+		 * just as with that case, make sure all dirty cache lines
+		 * containing the RMP are flushed.
+		 */
+		wbinvd_on_all_cpus();
+
+		rc = __sev_do_cmd_locked(SEV_CMD_SNP_INIT, NULL, error);
+		if (rc)
+			return rc;
+	}
+
+	/* Prepare for first SNP guest launch after INIT */
+	wbinvd_on_all_cpus();
+	rc = __sev_do_cmd_locked(SEV_CMD_SNP_DF_FLUSH, NULL, error);
+	if (rc)
+		return rc;
+
+	sev->snp_initialized = true;
+	dev_dbg(sev->dev, "SEV-SNP firmware initialized\n");
+
+	return rc;
+}
+
+static int __sev_snp_shutdown_locked(int *error)
+{
+	struct sev_device *sev = psp_master->sev_data;
+	struct sev_data_snp_shutdown_ex data;
+	int ret;
+
+	if (!sev->snp_initialized)
+		return 0;
+
+	memset(&data, 0, sizeof(data));
+	data.length = sizeof(data);
+	data.iommu_snp_shutdown = 1;
+
+	wbinvd_on_all_cpus();
+
+retry:
+	ret = __sev_do_cmd_locked(SEV_CMD_SNP_SHUTDOWN_EX, &data, error);
+	/* SHUTDOWN may require DF_FLUSH */
+	if (*error == SEV_RET_DFFLUSH_REQUIRED) {
+		ret = __sev_do_cmd_locked(SEV_CMD_SNP_DF_FLUSH, NULL, NULL);
+		if (ret) {
+			dev_err(sev->dev, "SEV-SNP DF_FLUSH failed\n");
+			return ret;
+		}
+		goto retry;
+	}
+	if (ret) {
+		dev_err(sev->dev, "SEV-SNP firmware shutdown failed\n");
+		return ret;
+	}
+
+	sev->snp_initialized = false;
+	dev_dbg(sev->dev, "SEV-SNP firmware shutdown\n");
+
+	return ret;
+}
+
+static int sev_snp_shutdown(int *error)
+{
+	int rc;
+
+	mutex_lock(&sev_cmd_mutex);
+	rc = __sev_snp_shutdown_locked(error);
+	mutex_unlock(&sev_cmd_mutex);
+
+	return rc;
+}
+
 static int sev_ioctl_do_pek_import(struct sev_issue_cmd *argp, bool writable)
 {
 	struct sev_device *sev = psp_master->sev_data;
@@ -1279,6 +1498,8 @@ int sev_dev_init(struct psp_device *psp)
 
 static void sev_firmware_shutdown(struct sev_device *sev)
 {
+	int error;
+
 	sev_platform_shutdown(NULL);
 
 	if (sev_es_tmr) {
@@ -1295,6 +1516,13 @@ static void sev_firmware_shutdown(struct sev_device *sev)
 			   get_order(NV_LENGTH));
 		sev_init_ex_buffer = NULL;
 	}
+
+	if (snp_range_list) {
+		kfree(snp_range_list);
+		snp_range_list = NULL;
+	}
+
+	sev_snp_shutdown(&error);
 }
 
 void sev_dev_destroy(struct psp_device *psp)
@@ -1350,15 +1578,6 @@ void sev_pci_init(void)
 		}
 	}
 
-	/* Obtain the TMR memory area for SEV-ES use */
-	sev_es_tmr = sev_fw_alloc(SEV_ES_TMR_SIZE);
-	if (sev_es_tmr)
-		/* Must flush the cache before giving it to the firmware */
-		clflush_cache_range(sev_es_tmr, SEV_ES_TMR_SIZE);
-	else
-		dev_warn(sev->dev,
-			 "SEV: TMR allocation failed, SEV-ES support unavailable\n");
-
 	if (!psp_init_on_probe)
 		return;
 
@@ -1368,6 +1587,9 @@ void sev_pci_init(void)
 		dev_err(sev->dev, "SEV: failed to INIT error %#x, rc %d\n",
 			error, rc);
 
+	dev_info(sev->dev, "SEV%s API:%d.%d build:%d\n", sev->snp_initialized ?
+		"-SNP" : "", sev->api_major, sev->api_minor, sev->build);
+
 	return;
 
 err:
diff --git a/drivers/crypto/ccp/sev-dev.h b/drivers/crypto/ccp/sev-dev.h
index 666c21eb81ab..34767657beb5 100644
--- a/drivers/crypto/ccp/sev-dev.h
+++ b/drivers/crypto/ccp/sev-dev.h
@@ -52,6 +52,8 @@ struct sev_device {
 	u8 build;
 
 	void *cmd_buf;
+
+	bool snp_initialized;
 };
 
 int sev_dev_init(struct psp_device *psp);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH RFC v9 18/51] crypto: ccp: Provide API to issue SEV and SNP commands
  2023-06-12  4:25 [PATCH RFC v9 00/51] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (16 preceding siblings ...)
  2023-06-12  4:25 ` [PATCH RFC v9 17/51] crypto: ccp: Add support to initialize the AMD-SP for SEV-SNP Michael Roth
@ 2023-06-12  4:25 ` Michael Roth
  2023-06-12  4:25 ` [PATCH RFC v9 19/51] x86/sev: Introduce snp leaked pages list Michael Roth
                   ` (32 subsequent siblings)
  50 siblings, 0 replies; 102+ messages in thread
From: Michael Roth @ 2023-06-12  4:25 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	liam.merwick, zhi.a.wang, Brijesh Singh

From: Brijesh Singh <brijesh.singh@amd.com>

Make sev_do_cmd() a generic API interface that the hypervisor can
use to issue commands to manage SEV and SNP guests. The commands
for SEV and SNP are defined in the SEV and SEV-SNP firmware
specifications.
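
As a rough usage sketch (illustrative only, not part of this patch), an
external caller such as KVM could then issue a data-less SNP command
directly:

  #include <linux/psp-sev.h>

  /* Hypothetical caller: flush the data fabric before an SNP launch. */
  static int example_snp_df_flush(void)
  {
          int psp_ret = SEV_RET_SUCCESS;
          int rc;

          /* SNP_DF_FLUSH takes no command buffer, so data is NULL. */
          rc = sev_do_cmd(SEV_CMD_SNP_DF_FLUSH, NULL, &psp_ret);
          if (rc)
                  pr_err("SNP_DF_FLUSH failed, rc=%d fw_err=%d\n", rc, psp_ret);

          return rc;
  }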

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 drivers/crypto/ccp/sev-dev.c |  3 ++-
 include/linux/psp-sev.h      | 19 +++++++++++++++++++
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index d3764ee073f3..88c5bf264a87 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -418,7 +418,7 @@ static int __sev_do_cmd_locked(int cmd, void *data, int *psp_ret)
 	return ret;
 }
 
-static int sev_do_cmd(int cmd, void *data, int *psp_ret)
+int sev_do_cmd(int cmd, void *data, int *psp_ret)
 {
 	int rc;
 
@@ -428,6 +428,7 @@ static int sev_do_cmd(int cmd, void *data, int *psp_ret)
 
 	return rc;
 }
+EXPORT_SYMBOL_GPL(sev_do_cmd);
 
 static int __sev_init_locked(int *error)
 {
diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h
index 06d0619ca442..c8656a36baeb 100644
--- a/include/linux/psp-sev.h
+++ b/include/linux/psp-sev.h
@@ -891,6 +891,22 @@ int sev_guest_df_flush(int *error);
  */
 int sev_guest_decommission(struct sev_data_decommission *data, int *error);
 
+/**
+ * sev_do_cmd - issue an SEV or SEV-SNP command
+ *
+ * @cmd: SEV command ID
+ * @data: command buffer for the firmware command, or NULL if none
+ * @psp_ret: SEV command return code
+ *
+ * Returns:
+ * 0 if the SEV device successfully processed the command
+ * -%ENODEV    if the SEV device is not available
+ * -%ENOTSUPP  if SEV is not supported
+ * -%ETIMEDOUT if the SEV command timed out
+ * -%EIO       if the SEV device returned a non-zero return code
+ */
+int sev_do_cmd(int cmd, void *data, int *psp_ret);
+
 void *psp_copy_user_blob(u64 uaddr, u32 len);
 
 #else	/* !CONFIG_CRYPTO_DEV_SP_PSP */
@@ -906,6 +920,9 @@ sev_guest_deactivate(struct sev_data_deactivate *data, int *error) { return -ENO
 static inline int
 sev_guest_decommission(struct sev_data_decommission *data, int *error) { return -ENODEV; }
 
+static inline int
+sev_do_cmd(int cmd, void *data, int *psp_ret) { return -ENODEV; }
+
 static inline int
 sev_guest_activate(struct sev_data_activate *data, int *error) { return -ENODEV; }
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH RFC v9 19/51] x86/sev: Introduce snp leaked pages list
  2023-06-12  4:25 [PATCH RFC v9 00/51] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (17 preceding siblings ...)
  2023-06-12  4:25 ` [PATCH RFC v9 18/51] crypto: ccp: Provide API to issue SEV and SNP commands Michael Roth
@ 2023-06-12  4:25 ` Michael Roth
  2023-08-09 12:46   ` Jeremi Piotrowski
  2023-06-12  4:25 ` [PATCH RFC v9 20/51] crypto: ccp: Handle the legacy TMR allocation when SNP is enabled Michael Roth
                   ` (31 subsequent siblings)
  50 siblings, 1 reply; 102+ messages in thread
From: Michael Roth @ 2023-06-12  4:25 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	liam.merwick, zhi.a.wang

From: Ashish Kalra <ashish.kalra@amd.com>

Pages are unsafe to be released back to the page allocator if they
have been transitioned to firmware/guest state and can't be reclaimed
or transitioned back to hypervisor/shared state. In this case, add
them to an internal leaked pages list to ensure that they are never
freed or touched/accessed, which would otherwise cause fatal page
faults.
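
As a sketch of the intended call pattern (illustrative; rmp_make_shared()
is from an earlier patch in this series, snp_leak_pages() from this one):

  /* Transition a firmware-owned page back to shared before freeing it. */
  if (rmp_make_shared(pfn, PG_LEVEL_4K)) {
          /* State can't be restored; freeing could cause fatal #PFs. */
          snp_leak_pages(pfn, 1);
          return -EFAULT;
  }
  __free_page(pfn_to_page(pfn));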

Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
[mdr: relocate to arch/x86/coco/sev/host.c]
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 arch/x86/coco/sev/host.c        | 29 +++++++++++++++++++++++++++++
 arch/x86/include/asm/sev-host.h |  3 +++
 2 files changed, 32 insertions(+)

diff --git a/arch/x86/coco/sev/host.c b/arch/x86/coco/sev/host.c
index cd3b4c6a25bc..373e91f5a337 100644
--- a/arch/x86/coco/sev/host.c
+++ b/arch/x86/coco/sev/host.c
@@ -64,6 +64,12 @@ struct rmpentry {
 static unsigned long rmptable_start __ro_after_init;
 static unsigned long rmptable_end __ro_after_init;
 
+/* list of pages which are leaked and cannot be reclaimed */
+static LIST_HEAD(snp_leaked_pages_list);
+static DEFINE_SPINLOCK(snp_leaked_pages_list_lock);
+
+static atomic_long_t snp_nr_leaked_pages = ATOMIC_LONG_INIT(0);
+
 #undef pr_fmt
 #define pr_fmt(fmt)	"SEV-SNP: " fmt
 
@@ -494,3 +500,26 @@ int rmp_make_shared(u64 pfn, enum pg_level level)
 	return rmpupdate(pfn, &val);
 }
 EXPORT_SYMBOL_GPL(rmp_make_shared);
+
+void snp_leak_pages(unsigned long pfn, unsigned int npages)
+{
+	struct page *page = pfn_to_page(pfn);
+
+	WARN(1, "psc failed, pfn 0x%lx pages %u (marked offline)\n", pfn, npages);
+
+	spin_lock(&snp_leaked_pages_list_lock);
+	while (npages--) {
+		/*
+		 * Reuse the page's buddy list for chaining into the leaked
+		 * pages list. This page should not be on a free list currently
+		 * and is also unsafe to be added to a free list.
+		 */
+		list_add_tail(&page->buddy_list, &snp_leaked_pages_list);
+		sev_dump_rmpentry(pfn);
+		pfn++;
+		page++;
+	}
+	spin_unlock(&snp_leaked_pages_list_lock);
+	atomic_long_inc(&snp_nr_leaked_pages);
+}
+EXPORT_SYMBOL_GPL(snp_leak_pages);
diff --git a/arch/x86/include/asm/sev-host.h b/arch/x86/include/asm/sev-host.h
index 753e80d16433..bab3b226777a 100644
--- a/arch/x86/include/asm/sev-host.h
+++ b/arch/x86/include/asm/sev-host.h
@@ -19,6 +19,8 @@ void sev_dump_rmpentry(u64 pfn);
 int psmash(u64 pfn);
 int rmp_make_private(u64 pfn, u64 gpa, enum pg_level level, int asid, bool immutable);
 int rmp_make_shared(u64 pfn, enum pg_level level);
+void snp_leak_pages(unsigned long pfn, unsigned int npages);
+
 #else
 static inline int snp_lookup_rmpentry(u64 pfn, bool *assigned, int *level) { return 0; }
 static inline void sev_dump_rmpentry(u64 pfn) {}
@@ -29,6 +31,7 @@ static inline int rmp_make_private(u64 pfn, u64 gpa, enum pg_level level, int as
 	return -ENODEV;
 }
 static inline int rmp_make_shared(u64 pfn, enum pg_level level) { return -ENODEV; }
+static inline void snp_leak_pages(unsigned long pfn, unsigned int npages) {}
 #endif
 
 #endif
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH RFC v9 20/51] crypto: ccp: Handle the legacy TMR allocation when SNP is enabled
  2023-06-12  4:25 [PATCH RFC v9 00/51] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (18 preceding siblings ...)
  2023-06-12  4:25 ` [PATCH RFC v9 19/51] x86/sev: Introduce snp leaked pages list Michael Roth
@ 2023-06-12  4:25 ` Michael Roth
  2023-06-12  4:25 ` [PATCH RFC v9 21/51] crypto: ccp: Handle the legacy SEV command " Michael Roth
                   ` (30 subsequent siblings)
  50 siblings, 0 replies; 102+ messages in thread
From: Michael Roth @ 2023-06-12  4:25 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	liam.merwick, zhi.a.wang, Brijesh Singh

From: Brijesh Singh <brijesh.singh@amd.com>

The behavior and requirements for SEV-legacy commands are altered when
the SNP firmware is in the INIT state. See the SEV-SNP firmware
specification for more details.

Allocate the Trusted Memory Region (TMR) as a 2MB-sized/aligned region
when SNP is enabled to satisfy the new SNP requirements. Continue
allocating a 1MB region for !SNP configurations.

While at it, provide an API that others can use to allocate a page for
use by the firmware. The immediate user for this API will be the KVM
driver, which needs to allocate a firmware context page during guest
creation. The context page is then updated by the firmware. See the
SEV-SNP specification for further details.
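
A minimal usage sketch of the new helpers (the KVM call site is only
anticipated here, and the SNP_GCTX_CREATE step is mentioned purely for
illustration):

  /* Allocate a page that is safe for the firmware to own and write. */
  void *context = snp_alloc_firmware_page(GFP_KERNEL_ACCOUNT);

  if (!context)
          return -ENOMEM;

  /* ... hand the page to firmware, e.g. via SEV_CMD_SNP_GCTX_CREATE ... */

  /* Reclaims the page from the firmware state before freeing it. */
  snp_free_firmware_page(context);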

Co-developed-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
[mdr: use struct sev_data_snp_page_reclaim instead of passing paddr
      directly to SEV_CMD_SNP_PAGE_RECLAIM]
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 drivers/crypto/ccp/sev-dev.c | 150 ++++++++++++++++++++++++++++++++---
 include/linux/psp-sev.h      |   9 +++
 2 files changed, 150 insertions(+), 9 deletions(-)

diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index 88c5bf264a87..d8124d33c831 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -91,6 +91,13 @@ static void *sev_init_ex_buffer;
 struct sev_data_range_list *snp_range_list;
 static int __sev_snp_init_locked(int *error);
 
+/* When SEV-SNP is enabled, the TMR needs to be 2MB-aligned and 2MB in size. */
+#define SEV_SNP_ES_TMR_SIZE	(2 * 1024 * 1024)
+
+static size_t sev_es_tmr_size = SEV_ES_TMR_SIZE;
+
+static int __sev_do_cmd_locked(int cmd, void *data, int *psp_ret);
+
 static inline bool sev_version_greater_or_equal(u8 maj, u8 min)
 {
 	struct sev_device *sev = psp_master->sev_data;
@@ -191,11 +198,131 @@ static int sev_cmd_buffer_len(int cmd)
 	return 0;
 }
 
+static int snp_reclaim_pages(unsigned long paddr, unsigned int npages, bool locked)
+{
+	/* The C-bit may be set in the paddr */
+	unsigned long pfn = __sme_clr(paddr) >> PAGE_SHIFT;
+	int ret, err, i, n = 0;
+
+	for (i = 0; i < npages; i++, pfn++, n++) {
+		struct sev_data_snp_page_reclaim data = {0};
+
+		data.paddr = pfn << PAGE_SHIFT;
+
+		if (locked)
+			ret = __sev_do_cmd_locked(SEV_CMD_SNP_PAGE_RECLAIM, &data, &err);
+		else
+			ret = sev_do_cmd(SEV_CMD_SNP_PAGE_RECLAIM, &data, &err);
+
+		if (ret)
+			goto cleanup;
+
+		ret = rmp_make_shared(pfn, PG_LEVEL_4K);
+		if (ret)
+			goto cleanup;
+	}
+
+	return 0;
+
+cleanup:
+	/*
+	 * If the page cannot be reclaimed, it is no longer safe to
+	 * release it back to the system; leak it.
+	 */
+	snp_leak_pages(pfn, npages - n);
+	return ret;
+}
+
+static int rmp_mark_pages_firmware(unsigned long paddr, unsigned int npages, bool locked)
+{
+	/* The C-bit may be set in the paddr */
+	unsigned long pfn = __sme_clr(paddr) >> PAGE_SHIFT;
+	int rc, n = 0, i;
+
+	for (i = 0; i < npages; i++, n++, pfn++) {
+		rc = rmp_make_private(pfn, 0, PG_LEVEL_4K, 0, true);
+		if (rc)
+			goto cleanup;
+	}
+
+	return 0;
+
+cleanup:
+	/*
+	 * Try to unwind the firmware state changes by
+	 * reclaiming the pages which were already transitioned
+	 * to the firmware state.
+	 */
+	snp_reclaim_pages(paddr, n, locked);
+
+	return rc;
+}
+
+static struct page *__snp_alloc_firmware_pages(gfp_t gfp_mask, int order, bool locked)
+{
+	unsigned long npages = 1ul << order, paddr;
+	struct sev_device *sev;
+	struct page *page;
+
+	if (!psp_master || !psp_master->sev_data)
+		return NULL;
+
+	page = alloc_pages(gfp_mask, order);
+	if (!page)
+		return NULL;
+
+	/* If SEV-SNP is initialized, add the page to the RMP table. */
+	sev = psp_master->sev_data;
+	if (!sev->snp_initialized)
+		return page;
+
+	paddr = __pa((unsigned long)page_address(page));
+	if (rmp_mark_pages_firmware(paddr, npages, locked))
+		return NULL;
+
+	return page;
+}
+
+void *snp_alloc_firmware_page(gfp_t gfp_mask)
+{
+	struct page *page;
+
+	page = __snp_alloc_firmware_pages(gfp_mask, 0, false);
+
+	return page ? page_address(page) : NULL;
+}
+EXPORT_SYMBOL_GPL(snp_alloc_firmware_page);
+
+static void __snp_free_firmware_pages(struct page *page, int order, bool locked)
+{
+	struct sev_device *sev = psp_master->sev_data;
+	unsigned long paddr, npages = 1ul << order;
+
+	if (!page)
+		return;
+
+	paddr = __pa((unsigned long)page_address(page));
+	if (sev->snp_initialized &&
+	    snp_reclaim_pages(paddr, npages, locked))
+		return;
+
+	__free_pages(page, order);
+}
+
+void snp_free_firmware_page(void *addr)
+{
+	if (!addr)
+		return;
+
+	__snp_free_firmware_pages(virt_to_page(addr), 0, false);
+}
+EXPORT_SYMBOL_GPL(snp_free_firmware_page);
+
 static void *sev_fw_alloc(unsigned long len)
 {
 	struct page *page;
 
-	page = alloc_pages(GFP_KERNEL, get_order(len));
+	page = __snp_alloc_firmware_pages(GFP_KERNEL, get_order(len), false);
 	if (!page)
 		return NULL;
 
@@ -443,7 +570,7 @@ static int __sev_init_locked(int *error)
 		data.tmr_address = __pa(sev_es_tmr);
 
 		data.flags |= SEV_INIT_FLAGS_SEV_ES;
-		data.tmr_len = SEV_ES_TMR_SIZE;
+		data.tmr_len = sev_es_tmr_size;
 	}
 
 	return __sev_do_cmd_locked(SEV_CMD_INIT, &data, error);
@@ -466,7 +593,7 @@ static int __sev_init_ex_locked(int *error)
 		data.tmr_address = __pa(sev_es_tmr);
 
 		data.flags |= SEV_INIT_FLAGS_SEV_ES;
-		data.tmr_len = SEV_ES_TMR_SIZE;
+		data.tmr_len = sev_es_tmr_size;
 	}
 
 	return __sev_do_cmd_locked(SEV_CMD_INIT_EX, &data, error);
@@ -499,14 +626,16 @@ static int __sev_platform_init_locked(int *error)
 
 	if (!sev_es_tmr) {
 		/* Obtain the TMR memory area for SEV-ES use */
-		sev_es_tmr = sev_fw_alloc(SEV_ES_TMR_SIZE);
-		if (sev_es_tmr)
+		sev_es_tmr = sev_fw_alloc(sev_es_tmr_size);
+		if (sev_es_tmr) {
 			/* Must flush the cache before giving it to the firmware */
-			clflush_cache_range(sev_es_tmr, SEV_ES_TMR_SIZE);
-		else
+			if (!sev->snp_initialized)
+				clflush_cache_range(sev_es_tmr, sev_es_tmr_size);
+		} else {
 			dev_warn(sev->dev,
 				 "SEV: TMR allocation failed, SEV-ES support unavailable\n");
 		}
+	}
 
 	if (sev_init_ex_buffer) {
 		init_function = __sev_init_ex_locked;
@@ -1001,6 +1130,8 @@ static int __sev_snp_init_locked(int *error)
 	sev->snp_initialized = true;
 	dev_dbg(sev->dev, "SEV-SNP firmware initialized\n");
 
+	sev_es_tmr_size = SEV_SNP_ES_TMR_SIZE;
+
 	return rc;
 }
 
@@ -1507,8 +1638,9 @@ static void sev_firmware_shutdown(struct sev_device *sev)
 		/* The TMR area was encrypted, flush it from the cache */
 		wbinvd_on_all_cpus();
 
-		free_pages((unsigned long)sev_es_tmr,
-			   get_order(SEV_ES_TMR_SIZE));
+		__snp_free_firmware_pages(virt_to_page(sev_es_tmr),
+					  get_order(sev_es_tmr_size),
+					  false);
 		sev_es_tmr = NULL;
 	}
 
diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h
index c8656a36baeb..5ae61de96e44 100644
--- a/include/linux/psp-sev.h
+++ b/include/linux/psp-sev.h
@@ -906,6 +906,8 @@ int sev_guest_decommission(struct sev_data_decommission *data, int *error);
 int sev_do_cmd(int cmd, void *data, int *psp_ret);
 
 void *psp_copy_user_blob(u64 uaddr, u32 len);
+void *snp_alloc_firmware_page(gfp_t mask);
+void snp_free_firmware_page(void *addr);
 
 #else	/* !CONFIG_CRYPTO_DEV_SP_PSP */
 
@@ -933,6 +935,13 @@ sev_issue_cmd_external_user(struct file *filep, unsigned int id, void *data, int
 
 static inline void *psp_copy_user_blob(u64 __user uaddr, u32 len) { return ERR_PTR(-EINVAL); }
 
+static inline void *snp_alloc_firmware_page(gfp_t mask)
+{
+	return NULL;
+}
+
+static inline void snp_free_firmware_page(void *addr) { }
+
 #endif	/* CONFIG_CRYPTO_DEV_SP_PSP */
 
 #endif	/* __PSP_SEV_H__ */
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH RFC v9 21/51] crypto: ccp: Handle the legacy SEV command when SNP is enabled
  2023-06-12  4:25 [PATCH RFC v9 00/51] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (19 preceding siblings ...)
  2023-06-12  4:25 ` [PATCH RFC v9 20/51] crypto: ccp: Handle the legacy TMR allocation when SNP is enabled Michael Roth
@ 2023-06-12  4:25 ` Michael Roth
  2023-06-12  4:25 ` [PATCH RFC v9 22/51] crypto: ccp: Add the SNP_PLATFORM_STATUS command Michael Roth
                   ` (29 subsequent siblings)
  50 siblings, 0 replies; 102+ messages in thread
From: Michael Roth @ 2023-06-12  4:25 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	liam.merwick, zhi.a.wang, Brijesh Singh

From: Brijesh Singh <brijesh.singh@amd.com>

The behavior of the SEV-legacy commands is altered when the SNP firmware
is in the INIT state. When SNP is in the INIT state, all the memory that
the firmware may write to as part of an SEV-legacy command must be in
the firmware state before issuing the command.

A command buffer may contain a system physical address that the firmware
may write to. There are two cases that need to be handled:

1) the system physical address points to guest memory
2) the system physical address points to host memory

To handle case #1, change the page state to firmware in the RMP table
before issuing the command and restore the state to shared after the
command completes.

For case #2, use a bounce buffer to complete the request, as sketched
below.
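
In rough pseudo-C, the flow implemented below is (illustrative;
orig_paddr, len and all error handling are elided):

  /* Case #1: the firmware writes into guest-owned memory. */
  rmp_mark_pages_firmware(guest_paddr, npages, true);
  __sev_do_cmd_locked(cmd, cmd_buf, &err);
  snp_reclaim_pages(guest_paddr, npages, true);     /* back to shared */

  /* Case #2: the firmware writes into host memory - bounce it. */
  rmp_mark_pages_firmware(__pa(map->host), npages, true);
  *paddr = __psp_pa(map->host);                     /* redirect to bounce buffer */
  __sev_do_cmd_locked(cmd, cmd_buf, &err);
  snp_reclaim_pages(__pa(map->host), npages, true); /* back to shared */
  memcpy(__va(__sme_clr(orig_paddr)), map->host, len);  /* copy results out */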

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 drivers/crypto/ccp/sev-dev.c | 371 ++++++++++++++++++++++++++++++++++-
 drivers/crypto/ccp/sev-dev.h |  12 ++
 2 files changed, 373 insertions(+), 10 deletions(-)

diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index d8124d33c831..10bb0a7dcfd6 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -28,6 +28,7 @@
 #include <asm/smp.h>
 #include <asm/cacheflush.h>
 #include <asm/e820/types.h>
+#include <asm/sev-host.h>
 
 #include "psp-dev.h"
 #include "sev-dev.h"
@@ -258,6 +259,30 @@ static int rmp_mark_pages_firmware(unsigned long paddr, unsigned int npages, boo
 	return rc;
 }
 
+static int rmp_mark_pages_shared(unsigned long paddr, unsigned int npages)
+{
+	/* The C-bit may be set in the paddr */
+	unsigned long pfn = __sme_clr(paddr) >> PAGE_SHIFT;
+	int rc, n = 0, i;
+
+	for (i = 0; i < npages; i++, pfn++, n++) {
+		rc = rmp_make_shared(pfn, PG_LEVEL_4K);
+		if (rc)
+			goto cleanup;
+	}
+
+	return 0;
+
+cleanup:
+	/*
+	 * If changing the page state to shared fails, it is not safe
+	 * to release the page back to the system; leak it.
+	 */
+	snp_leak_pages(pfn, npages - n);
+
+	return rc;
+}
+
 static struct page *__snp_alloc_firmware_pages(gfp_t gfp_mask, int order, bool locked)
 {
 	unsigned long npages = 1ul << order, paddr;
@@ -459,12 +484,295 @@ static int sev_write_init_ex_file_if_required(int cmd_id)
 	return sev_write_init_ex_file();
 }
 
+static int alloc_snp_host_map(struct sev_device *sev)
+{
+	struct page *page;
+	int i;
+
+	for (i = 0; i < MAX_SNP_HOST_MAP_BUFS; i++) {
+		struct snp_host_map *map = &sev->snp_host_map[i];
+
+		memset(map, 0, sizeof(*map));
+
+		page = alloc_pages(GFP_KERNEL_ACCOUNT, get_order(SEV_FW_BLOB_MAX_SIZE));
+		if (!page)
+			return -ENOMEM;
+
+		map->host = page_address(page);
+	}
+
+	return 0;
+}
+
+static void free_snp_host_map(struct sev_device *sev)
+{
+	int i;
+
+	for (i = 0; i < MAX_SNP_HOST_MAP_BUFS; i++) {
+		struct snp_host_map *map = &sev->snp_host_map[i];
+
+		if (map->host) {
+			__free_pages(virt_to_page(map->host), get_order(SEV_FW_BLOB_MAX_SIZE));
+			memset(map, 0, sizeof(*map));
+		}
+	}
+}
+
+static int map_firmware_writeable(u64 *paddr, u32 len, bool guest, struct snp_host_map *map)
+{
+	unsigned int npages = PAGE_ALIGN(len) >> PAGE_SHIFT;
+
+	map->active = false;
+
+	if (!paddr || !len)
+		return 0;
+
+	map->paddr = *paddr;
+	map->len = len;
+
+	/* If paddr points to guest memory, change the page state to firmware. */
+	if (guest) {
+		if (rmp_mark_pages_firmware(*paddr, npages, true))
+			return -EFAULT;
+
+		goto done;
+	}
+
+	if (!map->host)
+		return -ENOMEM;
+
+	/* Check if the pre-allocated buffer can be used to fulfill the request. */
+	if (len > SEV_FW_BLOB_MAX_SIZE)
+		return -EINVAL;
+
+	/* Transition the pre-allocated buffer to the firmware state. */
+	if (rmp_mark_pages_firmware(__pa(map->host), npages, true))
+		return -EFAULT;
+
+	/* Set the paddr to use pre-allocated firmware buffer */
+	*paddr = __psp_pa(map->host);
+
+done:
+	map->active = true;
+	return 0;
+}
+
+static int unmap_firmware_writeable(u64 *paddr, u32 len, bool guest, struct snp_host_map *map)
+{
+	unsigned int npages = PAGE_ALIGN(len) >> PAGE_SHIFT;
+
+	if (!map->active)
+		return 0;
+
+	/* If paddr points to a guest memory then restore the page state to hypervisor. */
+	if (guest) {
+		if (snp_reclaim_pages(*paddr, npages, true))
+			return -EFAULT;
+
+		goto done;
+	}
+
+	/*
+	 * Transition the pre-allocated buffer to hypervisor state before the access.
+	 *
+	 * This is because while changing the page state to firmware, the kernel unmaps
+	 * the pages from the direct map, and to restore the direct map the pages must
+	 * be transitioned back to the shared state.
+	 */
+	if (snp_reclaim_pages(__pa(map->host), npages, true))
+		return -EFAULT;
+
+	/* Copy the response data from the firmware buffer to the caller's buffer. */
+	memcpy(__va(__sme_clr(map->paddr)), map->host, min_t(size_t, len, map->len));
+	*paddr = map->paddr;
+
+done:
+	map->active = false;
+	return 0;
+}
+
+static bool sev_legacy_cmd_buf_writable(int cmd)
+{
+	switch (cmd) {
+	case SEV_CMD_PLATFORM_STATUS:
+	case SEV_CMD_GUEST_STATUS:
+	case SEV_CMD_LAUNCH_START:
+	case SEV_CMD_RECEIVE_START:
+	case SEV_CMD_LAUNCH_MEASURE:
+	case SEV_CMD_SEND_START:
+	case SEV_CMD_SEND_UPDATE_DATA:
+	case SEV_CMD_SEND_UPDATE_VMSA:
+	case SEV_CMD_PEK_CSR:
+	case SEV_CMD_PDH_CERT_EXPORT:
+	case SEV_CMD_GET_ID:
+	case SEV_CMD_ATTESTATION_REPORT:
+		return true;
+	default:
+		return false;
+	}
+}
+
+#define prep_buffer(name, addr, len, guest, map) \
+	func(&((typeof(name *))cmd_buf)->addr, ((typeof(name *))cmd_buf)->len, guest, map)
+
+static int __snp_cmd_buf_copy(int cmd, void *cmd_buf, bool to_fw, int fw_err)
+{
+	int (*func)(u64 *paddr, u32 len, bool guest, struct snp_host_map *map);
+	struct sev_device *sev = psp_master->sev_data;
+	bool from_fw = !to_fw;
+
+	/*
+	 * After the command is completed, change the command buffer memory to
+	 * hypervisor state.
+	 *
+	 * The immutable bit is automatically cleared by the firmware, so
+	 * there is no need to reclaim the page.
+	 */
+	if (from_fw && sev_legacy_cmd_buf_writable(cmd)) {
+		if (rmp_mark_pages_shared(__pa(cmd_buf), 1))
+			return -EFAULT;
+
+		/* No need to go further if firmware failed to execute command. */
+		if (fw_err)
+			return 0;
+	}
+
+	if (to_fw)
+		func = map_firmware_writeable;
+	else
+		func = unmap_firmware_writeable;
+
+	/*
+	 * A command buffer may contain a system physical address. If the address
+	 * points to host memory, use an intermediate firmware page; otherwise,
+	 * change the page state in the RMP table.
+	 */
+	switch (cmd) {
+	case SEV_CMD_PDH_CERT_EXPORT:
+		if (prep_buffer(struct sev_data_pdh_cert_export, pdh_cert_address,
+				pdh_cert_len, false, &sev->snp_host_map[0]))
+			goto err;
+		if (prep_buffer(struct sev_data_pdh_cert_export, cert_chain_address,
+				cert_chain_len, false, &sev->snp_host_map[1]))
+			goto err;
+		break;
+	case SEV_CMD_GET_ID:
+		if (prep_buffer(struct sev_data_get_id, address, len,
+				false, &sev->snp_host_map[0]))
+			goto err;
+		break;
+	case SEV_CMD_PEK_CSR:
+		if (prep_buffer(struct sev_data_pek_csr, address, len,
+				false, &sev->snp_host_map[0]))
+			goto err;
+		break;
+	case SEV_CMD_LAUNCH_UPDATE_DATA:
+		if (prep_buffer(struct sev_data_launch_update_data, address, len,
+				true, &sev->snp_host_map[0]))
+			goto err;
+		break;
+	case SEV_CMD_LAUNCH_UPDATE_VMSA:
+		if (prep_buffer(struct sev_data_launch_update_vmsa, address, len,
+				true, &sev->snp_host_map[0]))
+			goto err;
+		break;
+	case SEV_CMD_LAUNCH_MEASURE:
+		if (prep_buffer(struct sev_data_launch_measure, address, len,
+				false, &sev->snp_host_map[0]))
+			goto err;
+		break;
+	case SEV_CMD_LAUNCH_UPDATE_SECRET:
+		if (prep_buffer(struct sev_data_launch_secret, guest_address, guest_len,
+				true, &sev->snp_host_map[0]))
+			goto err;
+		break;
+	case SEV_CMD_DBG_DECRYPT:
+		if (prep_buffer(struct sev_data_dbg, dst_addr, len, false,
+				&sev->snp_host_map[0]))
+			goto err;
+		break;
+	case SEV_CMD_DBG_ENCRYPT:
+		if (prep_buffer(struct sev_data_dbg, dst_addr, len, true,
+				&sev->snp_host_map[0]))
+			goto err;
+		break;
+	case SEV_CMD_ATTESTATION_REPORT:
+		if (prep_buffer(struct sev_data_attestation_report, address, len,
+				false, &sev->snp_host_map[0]))
+			goto err;
+		break;
+	case SEV_CMD_SEND_START:
+		if (prep_buffer(struct sev_data_send_start, session_address,
+				session_len, false, &sev->snp_host_map[0]))
+			goto err;
+		break;
+	case SEV_CMD_SEND_UPDATE_DATA:
+		if (prep_buffer(struct sev_data_send_update_data, hdr_address, hdr_len,
+				false, &sev->snp_host_map[0]))
+			goto err;
+		if (prep_buffer(struct sev_data_send_update_data, trans_address,
+				trans_len, false, &sev->snp_host_map[1]))
+			goto err;
+		break;
+	case SEV_CMD_SEND_UPDATE_VMSA:
+		if (prep_buffer(struct sev_data_send_update_vmsa, hdr_address, hdr_len,
+				false, &sev->snp_host_map[0]))
+			goto err;
+		if (prep_buffer(struct sev_data_send_update_vmsa, trans_address,
+				trans_len, false, &sev->snp_host_map[1]))
+			goto err;
+		break;
+	case SEV_CMD_RECEIVE_UPDATE_DATA:
+		if (prep_buffer(struct sev_data_receive_update_data, guest_address,
+				guest_len, true, &sev->snp_host_map[0]))
+			goto err;
+		break;
+	case SEV_CMD_RECEIVE_UPDATE_VMSA:
+		if (prep_buffer(struct sev_data_receive_update_vmsa, guest_address,
+				guest_len, true, &sev->snp_host_map[0]))
+			goto err;
+		break;
+	default:
+		break;
+	}
+
+	/* The command buffer needs to be in the firmware state. */
+	if (to_fw && sev_legacy_cmd_buf_writable(cmd)) {
+		if (rmp_mark_pages_firmware(__pa(cmd_buf), 1, true))
+			return -EFAULT;
+	}
+
+	return 0;
+
+err:
+	return -EINVAL;
+}
+
+static inline bool need_firmware_copy(int cmd)
+{
+	struct sev_device *sev = psp_master->sev_data;
+
+	/* After SNP is INIT'ed, the behavior of legacy SEV commands is changed. */
+	return cmd < SEV_CMD_SNP_INIT && sev->snp_initialized;
+}
+
+static int snp_aware_copy_to_firmware(int cmd, void *data)
+{
+	return __snp_cmd_buf_copy(cmd, data, true, 0);
+}
+
+static int snp_aware_copy_from_firmware(int cmd, void *data, int fw_err)
+{
+	return __snp_cmd_buf_copy(cmd, data, false, fw_err);
+}
+
 static int __sev_do_cmd_locked(int cmd, void *data, int *psp_ret)
 {
 	struct psp_device *psp = psp_master;
 	struct sev_device *sev;
 	unsigned int phys_lsb, phys_msb;
 	unsigned int reg, ret = 0;
+	void *cmd_buf;
 	int buf_len;
 
 	if (!psp || !psp->sev_data)
@@ -484,12 +792,28 @@ static int __sev_do_cmd_locked(int cmd, void *data, int *psp_ret)
 	 * work for some memory, e.g. vmalloc'd addresses, and @data may not be
 	 * physically contiguous.
 	 */
-	if (data)
-		memcpy(sev->cmd_buf, data, buf_len);
+	if (data) {
+		if (sev->cmd_buf_active >= 2)
+			return -EBUSY;
+
+		cmd_buf = sev->cmd_buf_active ? sev->cmd_buf_backup : sev->cmd_buf;
+
+		memcpy(cmd_buf, data, buf_len);
+		sev->cmd_buf_active++;
+
+		/*
+		 * The behavior of the SEV-legacy commands is altered when the
+		 * SNP firmware is in the INIT state.
+		 */
+		if (need_firmware_copy(cmd) && snp_aware_copy_to_firmware(cmd, cmd_buf))
+			return -EFAULT;
+	} else {
+		cmd_buf = sev->cmd_buf;
+	}
 
 	/* Get the physical address of the command buffer */
-	phys_lsb = data ? lower_32_bits(__psp_pa(sev->cmd_buf)) : 0;
-	phys_msb = data ? upper_32_bits(__psp_pa(sev->cmd_buf)) : 0;
+	phys_lsb = data ? lower_32_bits(__psp_pa(cmd_buf)) : 0;
+	phys_msb = data ? upper_32_bits(__psp_pa(cmd_buf)) : 0;
 
 	dev_dbg(sev->dev, "sev command id %#x buffer 0x%08x%08x timeout %us\n",
 		cmd, phys_msb, phys_lsb, psp_timeout);
@@ -532,15 +856,24 @@ static int __sev_do_cmd_locked(int cmd, void *data, int *psp_ret)
 		ret = sev_write_init_ex_file_if_required(cmd);
 	}
 
-	print_hex_dump_debug("(out): ", DUMP_PREFIX_OFFSET, 16, 2, data,
-			     buf_len, false);
-
 	/*
 	 * Copy potential output from the PSP back to data.  Do this even on
 	 * failure in case the caller wants to glean something from the error.
 	 */
-	if (data)
-		memcpy(data, sev->cmd_buf, buf_len);
+	if (data) {
+		/*
+		 * Restore the page state after the command completes.
+		 */
+		if (need_firmware_copy(cmd) &&
+		    snp_aware_copy_from_firmware(cmd, cmd_buf, ret))
+			return -EFAULT;
+
+		memcpy(data, cmd_buf, buf_len);
+		sev->cmd_buf_active--;
+	}
+
+	print_hex_dump_debug("(out): ", DUMP_PREFIX_OFFSET, 16, 2, data,
+			     buf_len, false);
 
 	return ret;
 }
@@ -624,6 +957,14 @@ static int __sev_platform_init_locked(int *error)
 		dev_err(sev->dev, "SEV-SNP: failed to INIT rc %d, error %#x\n", rc, *error);
 	}
 
+	/*
+	 * Allocate the intermediate buffers used for the legacy command handling.
+	 */
+	if (rc != -ENODEV && alloc_snp_host_map(sev)) {
+		dev_notice(sev->dev, "Failed to alloc host map (disabling legacy SEV)\n");
+		goto skip_legacy;
+	}
+
 	if (!sev_es_tmr) {
 		/* Obtain the TMR memory area for SEV-ES use */
 		sev_es_tmr = sev_fw_alloc(sev_es_tmr_size);
@@ -677,6 +1018,7 @@ static int __sev_platform_init_locked(int *error)
 	dev_info(sev->dev, "SEV API:%d.%d build:%d\n", sev->api_major,
 		 sev->api_minor, sev->build);
 
+skip_legacy:
 	return 0;
 }
 
@@ -1586,10 +1928,12 @@ int sev_dev_init(struct psp_device *psp)
 	if (!sev)
 		goto e_err;
 
-	sev->cmd_buf = (void *)devm_get_free_pages(dev, GFP_KERNEL, 0);
+	sev->cmd_buf = (void *)devm_get_free_pages(dev, GFP_KERNEL, 1);
 	if (!sev->cmd_buf)
 		goto e_sev;
 
+	sev->cmd_buf_backup = (u8 *)sev->cmd_buf + PAGE_SIZE;
+
 	psp->sev_data = sev;
 
 	sev->dev = dev;
@@ -1655,6 +1999,12 @@ static void sev_firmware_shutdown(struct sev_device *sev)
 		snp_range_list = NULL;
 	}
 
+	/*
+	 * The host map buffers need to have their immutable bit cleared, so they
+	 * must be freed before the SNP firmware shutdown.
+	 */
+	free_snp_host_map(sev);
+
 	sev_snp_shutdown(&error);
 }
 
@@ -1726,6 +2076,7 @@ void sev_pci_init(void)
 	return;
 
 err:
+	free_snp_host_map(sev);
 	psp_master->sev_data = NULL;
 }
 
diff --git a/drivers/crypto/ccp/sev-dev.h b/drivers/crypto/ccp/sev-dev.h
index 34767657beb5..19d79f9d4212 100644
--- a/drivers/crypto/ccp/sev-dev.h
+++ b/drivers/crypto/ccp/sev-dev.h
@@ -29,11 +29,20 @@
 #define SEV_CMDRESP_CMD_SHIFT		16
 #define SEV_CMDRESP_IOC			BIT(0)
 
+#define MAX_SNP_HOST_MAP_BUFS		2
+
 struct sev_misc_dev {
 	struct kref refcount;
 	struct miscdevice misc;
 };
 
+struct snp_host_map {
+	u64 paddr;
+	u32 len;
+	void *host;
+	bool active;
+};
+
 struct sev_device {
 	struct device *dev;
 	struct psp_device *psp;
@@ -52,8 +61,11 @@ struct sev_device {
 	u8 build;
 
 	void *cmd_buf;
+	void *cmd_buf_backup;
+	int cmd_buf_active;
 
 	bool snp_initialized;
+	struct snp_host_map snp_host_map[MAX_SNP_HOST_MAP_BUFS];
 };
 
 int sev_dev_init(struct psp_device *psp);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH RFC v9 22/51] crypto: ccp: Add the SNP_PLATFORM_STATUS command
  2023-06-12  4:25 [PATCH RFC v9 00/51] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (20 preceding siblings ...)
  2023-06-12  4:25 ` [PATCH RFC v9 21/51] crypto: ccp: Handle the legacy SEV command " Michael Roth
@ 2023-06-12  4:25 ` Michael Roth
  2023-06-12  4:25 ` [PATCH RFC v9 23/51] KVM: SEV: Select CONFIG_KVM_PROTECTED_VM when CONFIG_KVM_AMD_SEV=y Michael Roth
                   ` (28 subsequent siblings)
  50 siblings, 0 replies; 102+ messages in thread
From: Michael Roth @ 2023-06-12  4:25 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	liam.merwick, zhi.a.wang, Brijesh Singh

From: Brijesh Singh <brijesh.singh@amd.com>

The command can be used by userspace to query the SNP platform status
report. See the SEV-SNP spec for more details.
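
As a rough userspace sketch (struct sev_user_data_snp_status and its
fields here are as defined earlier in this series; error handling is
minimal):

  #include <fcntl.h>
  #include <stdio.h>
  #include <unistd.h>
  #include <sys/ioctl.h>
  #include <linux/psp-sev.h>

  int main(void)
  {
          struct sev_user_data_snp_status status = { 0 };
          struct sev_issue_cmd cmd = {
                  .cmd  = SNP_PLATFORM_STATUS,
                  .data = (unsigned long)&status,
          };
          int fd = open("/dev/sev", O_RDWR);

          if (fd < 0 || ioctl(fd, SEV_ISSUE_CMD, &cmd) < 0) {
                  perror("SNP_PLATFORM_STATUS");
                  return 1;
          }

          printf("SNP API %u.%u, state %u\n",
                 status.api_major, status.api_minor, status.state);
          close(fd);
          return 0;
  }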

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 Documentation/virt/coco/sev-guest.rst | 27 ++++++++++++++++
 drivers/crypto/ccp/sev-dev.c          | 45 +++++++++++++++++++++++++++
 include/uapi/linux/psp-sev.h          |  1 +
 3 files changed, 73 insertions(+)

diff --git a/Documentation/virt/coco/sev-guest.rst b/Documentation/virt/coco/sev-guest.rst
index bf593e88cfd9..11ea67c944df 100644
--- a/Documentation/virt/coco/sev-guest.rst
+++ b/Documentation/virt/coco/sev-guest.rst
@@ -61,6 +61,22 @@ counter (e.g. counter overflow), then -EIO will be returned.
                 __u64 fw_err;
         };
 
+The host ioctl should be issued on the /dev/sev device. The ioctl accepts a
+command id and a command input structure.
+
+::
+        struct sev_issue_cmd {
+                /* Command ID */
+                __u32 cmd;
+
+                /* Command request structure */
+                __u64 data;
+
+                /* firmware error code on failure (see psp-sev.h) */
+                __u32 error;
+        };
+
+
 2.1 SNP_GET_REPORT
 ------------------
 
@@ -118,6 +134,17 @@ be updated with the expected value.
 
 See GHCB specification for further detail on how to parse the certificate blob.
 
+2.4 SNP_PLATFORM_STATUS
+-----------------------
+:Technology: sev-snp
+:Type: hypervisor ioctl cmd
+:Parameters (in): struct sev_data_snp_platform_status
+:Returns (out): 0 on success, -negative on error
+
+The SNP_PLATFORM_STATUS command is used to query the SNP platform status. The
+status includes the API major and minor versions and more. See the SEV-SNP
+specification for further details.
+
 3. SEV-SNP CPUID Enforcement
 ============================
 
diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index 10bb0a7dcfd6..0bfe9721c977 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -1767,6 +1767,48 @@ static int sev_ioctl_do_pdh_export(struct sev_issue_cmd *argp, bool writable)
 	return ret;
 }
 
+static int sev_ioctl_snp_platform_status(struct sev_issue_cmd *argp)
+{
+	struct sev_device *sev = psp_master->sev_data;
+	struct sev_data_snp_addr buf;
+	struct page *status_page;
+	void *data;
+	int ret;
+
+	if (!sev->snp_initialized || !argp->data)
+		return -EINVAL;
+
+	status_page = alloc_page(GFP_KERNEL_ACCOUNT);
+	if (!status_page)
+		return -ENOMEM;
+
+	data = page_address(status_page);
+	if (rmp_mark_pages_firmware(__pa(data), 1, true)) {
+		__free_pages(status_page, 0);
+		return -EFAULT;
+	}
+
+	buf.gctx_paddr = __psp_pa(data);
+	ret = __sev_do_cmd_locked(SEV_CMD_SNP_PLATFORM_STATUS, &buf, &argp->error);
+
+	/* Change the page state before accessing it */
+	if (snp_reclaim_pages(__pa(data), 1, true)) {
+		snp_leak_pages(__pa(data) >> PAGE_SHIFT, 1);
+		return -EFAULT;
+	}
+
+	if (ret)
+		goto cleanup;
+
+	if (copy_to_user((void __user *)argp->data, data,
+			 sizeof(struct sev_user_data_snp_status)))
+		ret = -EFAULT;
+
+cleanup:
+	__free_pages(status_page, 0);
+	return ret;
+}
+
 static long sev_ioctl(struct file *file, unsigned int ioctl, unsigned long arg)
 {
 	void __user *argp = (void __user *)arg;
@@ -1818,6 +1860,9 @@ static long sev_ioctl(struct file *file, unsigned int ioctl, unsigned long arg)
 	case SEV_GET_ID2:
 		ret = sev_ioctl_do_get_id2(&input);
 		break;
+	case SNP_PLATFORM_STATUS:
+		ret = sev_ioctl_snp_platform_status(&input);
+		break;
 	default:
 		ret = -EINVAL;
 		goto out;
diff --git a/include/uapi/linux/psp-sev.h b/include/uapi/linux/psp-sev.h
index 7d8a2dd20273..4dc6a3e7b3d5 100644
--- a/include/uapi/linux/psp-sev.h
+++ b/include/uapi/linux/psp-sev.h
@@ -28,6 +28,7 @@ enum {
 	SEV_PEK_CERT_IMPORT,
 	SEV_GET_ID,	/* This command is deprecated, use SEV_GET_ID2 */
 	SEV_GET_ID2,
+	SNP_PLATFORM_STATUS,
 
 	SEV_MAX,
 };
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH RFC v9 23/51] KVM: SEV: Select CONFIG_KVM_PROTECTED_VM when CONFIG_KVM_AMD_SEV=y
  2023-06-12  4:25 [PATCH RFC v9 00/51] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (21 preceding siblings ...)
  2023-06-12  4:25 ` [PATCH RFC v9 22/51] crypto: ccp: Add the SNP_PLATFORM_STATUS command Michael Roth
@ 2023-06-12  4:25 ` Michael Roth
  2023-06-12  4:25 ` [PATCH RFC v9 24/51] KVM: SVM: Add support to handle AP reset MSR protocol Michael Roth
                   ` (27 subsequent siblings)
  50 siblings, 0 replies; 102+ messages in thread
From: Michael Roth @ 2023-06-12  4:25 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	liam.merwick, zhi.a.wang

AMD SEV relies on the restricted/protected memory support to run guests
in some cases (such as SEV lazy-pinning), so make sure to enable that
support with the CONFIG_KVM_PROTECTED_VM build option.

Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 arch/x86/kvm/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index 718010600956..638679a4e5dc 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -126,6 +126,7 @@ config KVM_AMD_SEV
 	bool "AMD Secure Encrypted Virtualization (SEV) support"
 	depends on KVM_AMD && X86_64
 	depends on CRYPTO_DEV_SP_PSP && !(KVM_AMD=y && CRYPTO_DEV_CCP_DD=m)
+	select KVM_PROTECTED_VM
 	help
 	  Provides support for launching Encrypted VMs (SEV) and Encrypted VMs
 	  with Encrypted State (SEV-ES) on AMD processors.
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH RFC v9 24/51] KVM: SVM: Add support to handle AP reset MSR protocol
  2023-06-12  4:25 [PATCH RFC v9 00/51] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (22 preceding siblings ...)
  2023-06-12  4:25 ` [PATCH RFC v9 23/51] KVM: SEV: Select CONFIG_KVM_PROTECTED_VM when CONFIG_KVM_AMD_SEV=y Michael Roth
@ 2023-06-12  4:25 ` Michael Roth
  2023-06-12  4:25 ` [PATCH RFC v9 25/51] KVM: SVM: Add GHCB handling for Hypervisor Feature Support requests Michael Roth
                   ` (26 subsequent siblings)
  50 siblings, 0 replies; 102+ messages in thread
From: Michael Roth @ 2023-06-12  4:25 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	liam.merwick, zhi.a.wang, Brijesh Singh

From: Tom Lendacky <thomas.lendacky@amd.com>

Add support for AP Reset Hold being invoked using the GHCB MSR protocol,
available in version 2 of the GHCB specification.
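
For reference, a guest-side fragment of the MSR-protocol handshake this
enables (illustrative; val is a u64, and MSR_AMD64_SEV_ES_GHCB is the
GHCB MSR, 0xc0010130):

  /* AP requests a reset hold via the GHCB MSR protocol. */
  wrmsrl(MSR_AMD64_SEV_ES_GHCB, GHCB_MSR_AP_RESET_HOLD_REQ);
  VMGEXIT();
  rdmsrl(MSR_AMD64_SEV_ES_GHCB, val);

  /*
   * The hypervisor presets the result field to 0 and only sets it to a
   * non-zero value when delivering a SIPI, so a zero result means a
   * spurious wakeup and the hold should be requested again.
   */
  if (GHCB_MSR_INFO(val) == GHCB_MSR_AP_RESET_HOLD_RESP &&
      ((val >> GHCB_MSR_AP_RESET_HOLD_RESULT_POS) &
       GHCB_MSR_AP_RESET_HOLD_RESULT_MASK))
          /* SIPI was delivered: proceed to the new CS:RIP */;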

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 arch/x86/include/asm/sev-common.h |  2 ++
 arch/x86/kvm/svm/sev.c            | 56 ++++++++++++++++++++++++++-----
 arch/x86/kvm/svm/svm.h            |  1 +
 3 files changed, 51 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
index 9eb20b416251..a4fb53fd15d7 100644
--- a/arch/x86/include/asm/sev-common.h
+++ b/arch/x86/include/asm/sev-common.h
@@ -56,6 +56,8 @@
 /* AP Reset Hold */
 #define GHCB_MSR_AP_RESET_HOLD_REQ	0x006
 #define GHCB_MSR_AP_RESET_HOLD_RESP	0x007
+#define GHCB_MSR_AP_RESET_HOLD_RESULT_POS	12
+#define GHCB_MSR_AP_RESET_HOLD_RESULT_MASK	GENMASK_ULL(51, 0)
 
 /* GHCB GPA Register */
 #define GHCB_MSR_REG_GPA_REQ		0x012
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index c25aeb550cd9..b88295aa7124 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -58,6 +58,10 @@ module_param_named(sev_es, sev_es_enabled, bool, 0444);
 #define sev_es_enabled false
 #endif /* CONFIG_KVM_AMD_SEV */
 
+#define AP_RESET_HOLD_NONE		0
+#define AP_RESET_HOLD_NAE_EVENT		1
+#define AP_RESET_HOLD_MSR_PROTO		2
+
 static u8 sev_enc_bit;
 static DECLARE_RWSEM(sev_deactivate_lock);
 static DEFINE_MUTEX(sev_bitmap_lock);
@@ -2569,6 +2573,9 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
 
 void sev_es_unmap_ghcb(struct vcpu_svm *svm)
 {
+	/* Clear any indication that the vCPU is in a type of AP Reset Hold */
+	svm->sev_es.ap_reset_hold_type = AP_RESET_HOLD_NONE;
+
 	if (!svm->sev_es.ghcb)
 		return;
 
@@ -2781,6 +2788,22 @@ static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
 				  GHCB_MSR_INFO_POS);
 		break;
 	}
+	case GHCB_MSR_AP_RESET_HOLD_REQ:
+		svm->sev_es.ap_reset_hold_type = AP_RESET_HOLD_MSR_PROTO;
+		ret = kvm_emulate_ap_reset_hold(&svm->vcpu);
+
+		/*
+		 * Preset the result to a non-SIPI return and then only set
+		 * the result to non-zero when delivering a SIPI.
+		 */
+		set_ghcb_msr_bits(svm, 0,
+				  GHCB_MSR_AP_RESET_HOLD_RESULT_MASK,
+				  GHCB_MSR_AP_RESET_HOLD_RESULT_POS);
+
+		set_ghcb_msr_bits(svm, GHCB_MSR_AP_RESET_HOLD_RESP,
+				  GHCB_MSR_INFO_MASK,
+				  GHCB_MSR_INFO_POS);
+		break;
 	case GHCB_MSR_TERM_REQ: {
 		u64 reason_set, reason_code;
 
@@ -2880,6 +2903,7 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
 		ret = svm_invoke_exit_handler(vcpu, SVM_EXIT_IRET);
 		break;
 	case SVM_VMGEXIT_AP_HLT_LOOP:
+		svm->sev_es.ap_reset_hold_type = AP_RESET_HOLD_NAE_EVENT;
 		ret = kvm_emulate_ap_reset_hold(vcpu);
 		break;
 	case SVM_VMGEXIT_AP_JUMP_TABLE: {
@@ -3040,13 +3064,29 @@ void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector)
 		return;
 	}
 
-	/*
-	 * Subsequent SIPI: Return from an AP Reset Hold VMGEXIT, where
-	 * the guest will set the CS and RIP. Set SW_EXIT_INFO_2 to a
-	 * non-zero value.
-	 */
-	if (!svm->sev_es.ghcb)
-		return;
+	/* Subsequent SIPI */
+	switch (svm->sev_es.ap_reset_hold_type) {
+	case AP_RESET_HOLD_NAE_EVENT:
+		/*
+		 * Return from an AP Reset Hold VMGEXIT, where the guest will
+		 * set the CS and RIP. Set SW_EXIT_INFO_2 to a non-zero value.
+		 */
+		ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, 1);
+		break;
+	case AP_RESET_HOLD_MSR_PROTO:
+		/*
+		 * Return from an AP Reset Hold VMGEXIT, where the guest will
+		 * set the CS and RIP. Set GHCB data field to a non-zero value.
+		 */
+		set_ghcb_msr_bits(svm, 1,
+				  GHCB_MSR_AP_RESET_HOLD_RESULT_MASK,
+				  GHCB_MSR_AP_RESET_HOLD_RESULT_POS);
 
-	ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, 1);
+		set_ghcb_msr_bits(svm, GHCB_MSR_AP_RESET_HOLD_RESP,
+				  GHCB_MSR_INFO_MASK,
+				  GHCB_MSR_INFO_POS);
+		break;
+	default:
+		break;
+	}
 }
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index f44751dd8d5d..50be41fa16a0 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -192,6 +192,7 @@ struct vcpu_sev_es_state {
 	struct ghcb *ghcb;
 	struct kvm_host_map ghcb_map;
 	bool received_first_sipi;
+	unsigned int ap_reset_hold_type;
 
 	/* SEV-ES scratch area support */
 	void *ghcb_sa;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH RFC v9 25/51] KVM: SVM: Add GHCB handling for Hypervisor Feature Support requests
  2023-06-12  4:25 [PATCH RFC v9 00/51] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (23 preceding siblings ...)
  2023-06-12  4:25 ` [PATCH RFC v9 24/51] KVM: SVM: Add support to handle AP reset MSR protocol Michael Roth
@ 2023-06-12  4:25 ` Michael Roth
  2023-06-12  4:25 ` [PATCH RFC v9 26/51] KVM: SVM: Make AVIC backing, VMSA and VMCB memory allocation SNP safe Michael Roth
                   ` (25 subsequent siblings)
  50 siblings, 0 replies; 102+ messages in thread
From: Michael Roth @ 2023-06-12  4:25 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	liam.merwick, zhi.a.wang, Brijesh Singh

From: Brijesh Singh <brijesh.singh@amd.com>

Version 2 of the GHCB specification introduced advertisement of features
that are supported by the hypervisor.

Now that KVM supports version 2 of the GHCB specification, bump the
maximum supported protocol version.
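
A guest would discover these features via the MSR protocol along these
lines (illustrative fragment; val is a u64):

  wrmsrl(MSR_AMD64_SEV_ES_GHCB, GHCB_MSR_HV_FT_REQ);
  VMGEXIT();
  rdmsrl(MSR_AMD64_SEV_ES_GHCB, val);

  if (GHCB_MSR_INFO(val) == GHCB_MSR_HV_FT_RESP &&
      (GHCB_MSR_HV_FT_RESP_VAL(val) & GHCB_HV_FT_SNP))
          /* the hypervisor supports running SEV-SNP guests */;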

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 arch/x86/include/asm/sev-common.h |  2 ++
 arch/x86/kvm/svm/sev.c            | 14 ++++++++++++++
 arch/x86/kvm/svm/svm.h            |  3 ++-
 3 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
index a4fb53fd15d7..aaea4afcda98 100644
--- a/arch/x86/include/asm/sev-common.h
+++ b/arch/x86/include/asm/sev-common.h
@@ -101,6 +101,8 @@ enum psc_op {
 /* GHCB Hypervisor Feature Request/Response */
 #define GHCB_MSR_HV_FT_REQ		0x080
 #define GHCB_MSR_HV_FT_RESP		0x081
+#define GHCB_MSR_HV_FT_POS		12
+#define GHCB_MSR_HV_FT_MASK		GENMASK_ULL(51, 0)
 #define GHCB_MSR_HV_FT_RESP_VAL(v)			\
 	/* GHCBData[63:12] */				\
 	(((u64)(v) & GENMASK_ULL(63, 12)) >> 12)
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index b88295aa7124..2bceb0060880 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -2538,6 +2538,7 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
 	case SVM_VMGEXIT_AP_HLT_LOOP:
 	case SVM_VMGEXIT_AP_JUMP_TABLE:
 	case SVM_VMGEXIT_UNSUPPORTED_EVENT:
+	case SVM_VMGEXIT_HV_FEATURES:
 		break;
 	default:
 		reason = GHCB_ERR_INVALID_EVENT;
@@ -2804,6 +2805,13 @@ static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
 				  GHCB_MSR_INFO_MASK,
 				  GHCB_MSR_INFO_POS);
 		break;
+	case GHCB_MSR_HV_FT_REQ: {
+		set_ghcb_msr_bits(svm, GHCB_HV_FT_SUPPORTED,
+				  GHCB_MSR_HV_FT_MASK, GHCB_MSR_HV_FT_POS);
+		set_ghcb_msr_bits(svm, GHCB_MSR_HV_FT_RESP,
+				  GHCB_MSR_INFO_MASK, GHCB_MSR_INFO_POS);
+		break;
+	}
 	case GHCB_MSR_TERM_REQ: {
 		u64 reason_set, reason_code;
 
@@ -2928,6 +2936,12 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
 		ret = 1;
 		break;
 	}
+	case SVM_VMGEXIT_HV_FEATURES: {
+		ghcb_set_sw_exit_info_2(ghcb, GHCB_HV_FT_SUPPORTED);
+
+		ret = 1;
+		break;
+	}
 	case SVM_VMGEXIT_UNSUPPORTED_EVENT:
 		vcpu_unimpl(vcpu,
 			    "vmgexit: unsupported event - exit_info_1=%#llx, exit_info_2=%#llx\n",
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 50be41fa16a0..1ab117daebd9 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -711,9 +711,10 @@ void avic_refresh_virtual_apic_mode(struct kvm_vcpu *vcpu);
 
 /* sev.c */
 
-#define GHCB_VERSION_MAX	1ULL
+#define GHCB_VERSION_MAX	2ULL
 #define GHCB_VERSION_MIN	1ULL
 
+#define GHCB_HV_FT_SUPPORTED	GHCB_HV_FT_SNP
 
 extern unsigned int max_sev_asid;
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH RFC v9 26/51] KVM: SVM: Make AVIC backing, VMSA and VMCB memory allocation SNP safe
  2023-06-12  4:25 [PATCH RFC v9 00/51] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (24 preceding siblings ...)
  2023-06-12  4:25 ` [PATCH RFC v9 25/51] KVM: SVM: Add GHCB handling for Hypervisor Feature Support requests Michael Roth
@ 2023-06-12  4:25 ` Michael Roth
  2023-06-12  4:25 ` [PATCH RFC v9 27/51] KVM: SVM: Add initial SEV-SNP support Michael Roth
                   ` (24 subsequent siblings)
  50 siblings, 0 replies; 102+ messages in thread
From: Michael Roth @ 2023-06-12  4:25 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	liam.merwick, zhi.a.wang, Brijesh Singh

From: Brijesh Singh <brijesh.singh@amd.com>

Implement a workaround for an SNP erratum where the CPU will incorrectly
signal an RMP violation #PF if a hugepage (2MB or 1GB) collides with the
RMP entry of a VMCB, VMSA or AVIC backing page.

When SEV-SNP is globally enabled, the CPU marks the VMCB, VMSA, and AVIC
backing pages as "in-use" via a reserved bit in the corresponding RMP
entry after a successful VMRUN. This is done for _all_ VMs, not just
SNP-Active VMs.

If the hypervisor accesses an in-use page through a writable
translation, the CPU will throw an RMP violation #PF. On early SNP
hardware, if an in-use page is 2MB-aligned and software accesses any
part of the associated 2MB region with a hugepage, the CPU will
incorrectly treat the entire 2MB region as in-use and signal a spurious
RMP violation #PF.

The recommended workaround is to not use hugepages for the VMCB, VMSA or
AVIC backing pages. Add a generic allocator that will ensure that the
page returned is not a hugepage (2MB or 1GB) and is safe to be used when
SEV-SNP is enabled. Also implement similar handling for the VMCB/VMSA
pages of nested guests; see the sketch of the alignment trick below.
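
The allocator's trick works because an order-1 allocation is aligned to
two pages: of the two pages returned, at most one can be 2MB-aligned,
so that one is freed and the other returned. A worked example
(illustrative, with PTRS_PER_PMD = 512 pages per 2MB region):

  pfn = 0x200  (2MB-aligned):      free the page at 0x200, return 0x201
  pfn = 0x1fe  (not 2MB-aligned):  return the page at 0x1fe, free 0x1ff

The returned pfn is either odd or not a multiple of 512, so it can never
be the start of a 2MB (or 1GB) mapping.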

Co-developed-by: Marc Orr <marcorr@google.com>
Signed-off-by: Marc Orr <marcorr@google.com>
Reported-by: Alper Gun <alpergun@google.com> # for nested VMSA case
Co-developed-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
[mdr: squash in nested guest handling from Ashish]
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 arch/x86/include/asm/kvm-x86-ops.h |  1 +
 arch/x86/include/asm/kvm_host.h    |  1 +
 arch/x86/kvm/lapic.c               |  5 ++++-
 arch/x86/kvm/svm/nested.c          |  2 +-
 arch/x86/kvm/svm/sev.c             | 33 ++++++++++++++++++++++++++++++
 arch/x86/kvm/svm/svm.c             | 17 ++++++++++++---
 arch/x86/kvm/svm/svm.h             |  1 +
 7 files changed, 55 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index 48f043de2ec0..28456b497198 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -135,6 +135,7 @@ KVM_X86_OP(vcpu_deliver_sipi_vector)
 KVM_X86_OP_OPTIONAL_RET0(vcpu_get_apicv_inhibit_reasons);
 KVM_X86_OP_OPTIONAL_RET0(gmem_prepare)
 KVM_X86_OP_OPTIONAL(gmem_invalidate)
+KVM_X86_OP_OPTIONAL(alloc_apic_backing_page)
 
 #undef KVM_X86_OP
 #undef KVM_X86_OP_OPTIONAL
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index c26f76641121..8d2bb3ff66a2 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1743,6 +1743,7 @@ struct kvm_x86_ops {
 	int (*gmem_prepare)(struct kvm *kvm, struct kvm_memory_slot *slot,
 			    kvm_pfn_t pfn, gfn_t gfn, u8 *max_level);
 	void (*gmem_invalidate)(struct kvm *kvm, kvm_pfn_t start, kvm_pfn_t end);
+	void *(*alloc_apic_backing_page)(struct kvm_vcpu *vcpu);
 };
 
 struct kvm_x86_nested_ops {
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index e542cf285b51..94311938651a 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2769,7 +2769,10 @@ int kvm_create_lapic(struct kvm_vcpu *vcpu, int timer_advance_ns)
 
 	vcpu->arch.apic = apic;
 
-	apic->regs = (void *)get_zeroed_page(GFP_KERNEL_ACCOUNT);
+	if (kvm_x86_ops.alloc_apic_backing_page)
+		apic->regs = static_call(kvm_x86_alloc_apic_backing_page)(vcpu);
+	else
+		apic->regs = (void *)get_zeroed_page(GFP_KERNEL_ACCOUNT);
 	if (!apic->regs) {
 		printk(KERN_ERR "malloc apic regs error for vcpu %x\n",
 		       vcpu->vcpu_id);
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 96936ddf1b3c..fb981c8b82c4 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -1185,7 +1185,7 @@ int svm_allocate_nested(struct vcpu_svm *svm)
 	if (svm->nested.initialized)
 		return 0;
 
-	vmcb02_page = alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
+	vmcb02_page = snp_safe_alloc_page(&svm->vcpu);
 	if (!vmcb02_page)
 		return -ENOMEM;
 	svm->nested.vmcb02.ptr = page_address(vmcb02_page);
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 2bceb0060880..69b57e8f0a7f 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3104,3 +3104,36 @@ void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector)
 		break;
 	}
 }
+
+struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu)
+{
+	unsigned long pfn;
+	struct page *p;
+
+	if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
+		return alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
+
+	/*
+	 * Allocate an SNP-safe page to work around the SNP erratum where
+	 * the CPU will incorrectly signal an RMP violation #PF if a
+	 * hugepage (2MB or 1GB) collides with the RMP entry of a VMCB, VMSA
+	 * or AVIC backing page. The recommended workaround is to not use
+	 * hugepages.
+	 *
+	 * Allocate one extra page, use a page which is not 2MB-aligned
+	 * and free the other.
+	 */
+	p = alloc_pages(GFP_KERNEL_ACCOUNT | __GFP_ZERO, 1);
+	if (!p)
+		return NULL;
+
+	split_page(p, 1);
+
+	pfn = page_to_pfn(p);
+	if (IS_ALIGNED(pfn, PTRS_PER_PMD))
+		__free_page(p++);
+	else
+		__free_page(p + 1);
+
+	return p;
+}
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index eb308c9994f9..065167b42f90 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -668,7 +668,7 @@ static int svm_cpu_init(int cpu)
 	int ret = -ENOMEM;
 
 	memset(sd, 0, sizeof(struct svm_cpu_data));
-	sd->save_area = alloc_page(GFP_KERNEL | __GFP_ZERO);
+	sd->save_area = snp_safe_alloc_page(NULL);
 	if (!sd->save_area)
 		return ret;
 
@@ -1381,7 +1381,7 @@ static int svm_vcpu_create(struct kvm_vcpu *vcpu)
 	svm = to_svm(vcpu);
 
 	err = -ENOMEM;
-	vmcb01_page = alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
+	vmcb01_page = snp_safe_alloc_page(vcpu);
 	if (!vmcb01_page)
 		goto out;
 
@@ -1390,7 +1390,7 @@ static int svm_vcpu_create(struct kvm_vcpu *vcpu)
 		 * SEV-ES guests require a separate VMSA page used to contain
 		 * the encrypted register state of the guest.
 		 */
-		vmsa_page = alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
+		vmsa_page = snp_safe_alloc_page(vcpu);
 		if (!vmsa_page)
 			goto error_free_vmcb_page;
 
@@ -4770,6 +4770,16 @@ static int svm_vm_init(struct kvm *kvm)
 	return 0;
 }
 
+static void *svm_alloc_apic_backing_page(struct kvm_vcpu *vcpu)
+{
+	struct page *page = snp_safe_alloc_page(vcpu);
+
+	if (!page)
+		return NULL;
+
+	return page_address(page);
+}
+
 static struct kvm_x86_ops svm_x86_ops __initdata = {
 	.name = KBUILD_MODNAME,
 
@@ -4900,6 +4910,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
 
 	.vcpu_deliver_sipi_vector = svm_vcpu_deliver_sipi_vector,
 	.vcpu_get_apicv_inhibit_reasons = avic_vcpu_get_apicv_inhibit_reasons,
+	.alloc_apic_backing_page = svm_alloc_apic_backing_page,
 };
 
 /*
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 1ab117daebd9..e45b54e95495 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -741,6 +741,7 @@ void sev_es_vcpu_reset(struct vcpu_svm *svm);
 void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector);
 void sev_es_prepare_switch_to_guest(struct sev_es_save_area *hostsa);
 void sev_es_unmap_ghcb(struct vcpu_svm *svm);
+struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu);
 
 /* vmenter.S */
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH RFC v9 27/51] KVM: SVM: Add initial SEV-SNP support
  2023-06-12  4:25 [PATCH RFC v9 00/51] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (25 preceding siblings ...)
  2023-06-12  4:25 ` [PATCH RFC v9 26/51] KVM: SVM: Make AVIC backing, VMSA and VMCB memory allocation SNP safe Michael Roth
@ 2023-06-12  4:25 ` Michael Roth
  2023-06-12  4:25 ` [PATCH RFC v9 28/51] KVM: SVM: Add KVM_SNP_INIT command Michael Roth
                   ` (23 subsequent siblings)
  50 siblings, 0 replies; 102+ messages in thread
From: Michael Roth @ 2023-06-12  4:25 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	liam.merwick, zhi.a.wang, Brijesh Singh

From: Brijesh Singh <brijesh.singh@amd.com>

The next generation of SEV is called SEV-SNP (Secure Nested Paging).
SEV-SNP builds upon existing SEV and SEV-ES functionality while adding
new hardware-based security protections. SEV-SNP adds strong memory
integrity protection to help prevent malicious hypervisor-based attacks
such as data replay, memory re-mapping, and more, in order to create an
isolated execution environment.

The SNP support is added incrementally; later patches add a new module
parameter that can be used to enable SEV-SNP in KVM.

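As a rough sketch of how SNP-awareness is expected to be used by later
patches (the surrounding function here is purely hypothetical), code can
branch on the new sev_snp_guest() helper, which implies sev_es_guest():

  /* Hypothetical call site illustrating the guest-type checks */
  static int svm_handle_snp_only_op(struct kvm *kvm)
  {
          if (!sev_snp_guest(kvm))
                  return -ENOTTY; /* SNP-only operation */

          /* sev_snp_guest() implies sev_es_guest(), which implies sev_guest() */
          return 0;
  }
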
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 arch/x86/kvm/svm/sev.c | 10 +++++++++-
 arch/x86/kvm/svm/svm.h |  8 ++++++++
 2 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 69b57e8f0a7f..f5fcf6c33583 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -58,6 +58,9 @@ module_param_named(sev_es, sev_es_enabled, bool, 0444);
 #define sev_es_enabled false
 #endif /* CONFIG_KVM_AMD_SEV */
 
+/* enable/disable SEV-SNP support */
+static bool sev_snp_enabled;
+
 #define AP_RESET_HOLD_NONE		0
 #define AP_RESET_HOLD_NAE_EVENT		1
 #define AP_RESET_HOLD_MSR_PROTO		2
@@ -2169,6 +2172,7 @@ void __init sev_hardware_setup(void)
 {
 #ifdef CONFIG_KVM_AMD_SEV
 	unsigned int eax, ebx, ecx, edx, sev_asid_count, sev_es_asid_count;
+	bool sev_snp_supported = false;
 	bool sev_es_supported = false;
 	bool sev_supported = false;
 
@@ -2248,12 +2252,16 @@ void __init sev_hardware_setup(void)
 	if (misc_cg_set_capacity(MISC_CG_RES_SEV_ES, sev_es_asid_count))
 		goto out;
 
-	pr_info("SEV-ES supported: %u ASIDs\n", sev_es_asid_count);
 	sev_es_supported = true;
+	sev_snp_supported = sev_snp_enabled && cpu_feature_enabled(X86_FEATURE_SEV_SNP);
+
+	pr_info("SEV-ES %ssupported: %u ASIDs\n",
+		sev_snp_supported ? "and SEV-SNP " : "", sev_es_asid_count);
 
 out:
 	sev_enabled = sev_supported;
 	sev_es_enabled = sev_es_supported;
+	sev_snp_enabled = sev_snp_supported;
 #endif
 }
 
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index e45b54e95495..6974d63c84f9 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -77,6 +77,7 @@ enum {
 struct kvm_sev_info {
 	bool active;		/* SEV enabled guest */
 	bool es_active;		/* SEV-ES enabled guest */
+	bool snp_active;	/* SEV-SNP enabled guest */
 	unsigned int asid;	/* ASID used for this guest */
 	unsigned int handle;	/* SEV firmware handle */
 	int fd;			/* SEV device fd */
@@ -346,6 +347,13 @@ static __always_inline bool sev_es_guest(struct kvm *kvm)
 #endif
 }
 
+static __always_inline bool sev_snp_guest(struct kvm *kvm)
+{
+	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+
+	return sev_es_guest(kvm) && sev->snp_active;
+}
+
 static inline void vmcb_mark_all_dirty(struct vmcb *vmcb)
 {
 	vmcb->control.clean = 0;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH RFC v9 28/51] KVM: SVM: Add KVM_SNP_INIT command
  2023-06-12  4:25 [PATCH RFC v9 00/51] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (26 preceding siblings ...)
  2023-06-12  4:25 ` [PATCH RFC v9 27/51] KVM: SVM: Add initial SEV-SNP support Michael Roth
@ 2023-06-12  4:25 ` Michael Roth
  2023-06-12  4:25 ` [PATCH RFC v9 29/51] KVM: SVM: Add KVM_SEV_SNP_LAUNCH_START command Michael Roth
                   ` (22 subsequent siblings)
  50 siblings, 0 replies; 102+ messages in thread
From: Michael Roth @ 2023-06-12  4:25 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	liam.merwick, zhi.a.wang, Brijesh Singh, Pavan Kumar Paluri

From: Brijesh Singh <brijesh.singh@amd.com>

The KVM_SNP_INIT command is used by the hypervisor to initialize the
SEV-SNP platform context. In a typical workflow, this command should be
the first command issued. When creating an SEV-SNP guest, the VMM must
use this command instead of KVM_SEV_INIT or KVM_SEV_ES_INIT.

The flags value must be zero; it will be extended in future SNP support
to communicate optional features (such as restricted interrupt
injection).

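For reference, a minimal userspace sketch of issuing the command; 'vm_fd'
and 'sev_fd' are assumed to be open descriptors for the VM and /dev/sev,
and error handling is omitted:

  struct kvm_snp_init init = { .flags = 0 };
  struct kvm_sev_cmd cmd = {
          .id = KVM_SEV_SNP_INIT,
          .data = (__u64)(uintptr_t)&init,
          .sev_fd = sev_fd,
  };

  /* On -EOPNOTSUPP, init.flags is updated with the supported flags. */
  ioctl(vm_fd, KVM_MEMORY_ENCRYPT_OP, &cmd);
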
Co-developed-by: Pavan Kumar Paluri <papaluri@amd.com>
Signed-off-by: Pavan Kumar Paluri <papaluri@amd.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 .../virt/kvm/x86/amd-memory-encryption.rst    | 27 +++++++++++++
 arch/x86/include/asm/svm.h                    |  1 +
 arch/x86/kvm/svm/sev.c                        | 39 ++++++++++++++++++-
 arch/x86/kvm/svm/svm.h                        |  4 ++
 include/uapi/linux/kvm.h                      | 13 +++++++
 5 files changed, 83 insertions(+), 1 deletion(-)

diff --git a/Documentation/virt/kvm/x86/amd-memory-encryption.rst b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
index 487b6328b3e7..1240d28badd6 100644
--- a/Documentation/virt/kvm/x86/amd-memory-encryption.rst
+++ b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
@@ -434,6 +434,33 @@ issued by the hypervisor to make the guest ready for execution.
 
 Returns: 0 on success, -negative on error
 
+18. KVM_SNP_INIT
+----------------
+
+The KVM_SNP_INIT command can be used by the hypervisor to initialize the SEV-SNP
+context. In a typical workflow, this command should be the first command issued.
+
+Parameters (in/out): struct kvm_snp_init
+
+Returns: 0 on success, -negative on error
+
+::
+
+        struct kvm_snp_init {
+                __u64 flags;
+        };
+
+The flags bitmap is defined as::
+
+   /* enable the restricted injection */
+   #define KVM_SEV_SNP_RESTRICTED_INJECT   (1<<0)
+
+   /* enable the restricted injection timer */
+   #define KVM_SEV_SNP_RESTRICTED_TIMER_INJECT   (1<<1)
+
+If the specified flags are not supported, -EOPNOTSUPP is returned and the
+supported flags are written back to userspace.
+
 References
 ==========
 
diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index e7c7379d6ac7..ac8edfdd60fa 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -288,6 +288,7 @@ static_assert((X2AVIC_MAX_PHYSICAL_ID & AVIC_PHYSICAL_MAX_INDEX_MASK) == X2AVIC_
 
 #define AVIC_HPA_MASK	~((0xFFFULL << 52) | 0xFFF)
 
+#define SVM_SEV_FEAT_SNP_ACTIVE		BIT(0)
 
 struct vmcb_seg {
 	u16 selector;
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index f5fcf6c33583..70e0576a32d0 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -243,6 +243,25 @@ static void sev_unbind_asid(struct kvm *kvm, unsigned int handle)
 	sev_decommission(handle);
 }
 
+static int verify_snp_init_flags(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+	struct kvm_snp_init params;
+	int ret = 0;
+
+	if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data, sizeof(params)))
+		return -EFAULT;
+
+	if (params.flags & ~SEV_SNP_SUPPORTED_FLAGS)
+		ret = -EOPNOTSUPP;
+
+	params.flags = SEV_SNP_SUPPORTED_FLAGS;
+
+	if (copy_to_user((void __user *)(uintptr_t)argp->data, &params, sizeof(params)))
+		ret = -EFAULT;
+
+	return ret;
+}
+
 static int sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp)
 {
 	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
@@ -256,12 +275,19 @@ static int sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp)
 		return ret;
 
 	sev->active = true;
-	sev->es_active = argp->id == KVM_SEV_ES_INIT;
+	sev->es_active = (argp->id == KVM_SEV_ES_INIT || argp->id == KVM_SEV_SNP_INIT);
+	sev->snp_active = argp->id == KVM_SEV_SNP_INIT;
 	asid = sev_asid_new(sev);
 	if (asid < 0)
 		goto e_no_asid;
 	sev->asid = asid;
 
+	if (sev->snp_active) {
+		ret = verify_snp_init_flags(kvm, argp);
+		if (ret)
+			goto e_free;
+	}
+
 	ret = sev_platform_init(&argp->error);
 	if (ret)
 		goto e_free;
@@ -277,6 +303,7 @@ static int sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp)
 	sev_asid_free(sev);
 	sev->asid = 0;
 e_no_asid:
+	sev->snp_active = false;
 	sev->es_active = false;
 	sev->active = false;
 	return ret;
@@ -612,6 +639,10 @@ static int sev_es_sync_vmsa(struct vcpu_svm *svm)
 	save->xss  = svm->vcpu.arch.ia32_xss;
 	save->dr6  = svm->vcpu.arch.dr6;
 
+	/* Enable the SEV-SNP feature */
+	if (sev_snp_guest(svm->vcpu.kvm))
+		save->sev_features |= SVM_SEV_FEAT_SNP_ACTIVE;
+
 	pr_debug("Virtual Machine Save Area (VMSA):\n");
 	print_hex_dump_debug("", DUMP_PREFIX_NONE, 16, 1, save, sizeof(*save), false);
 
@@ -1864,6 +1895,12 @@ int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
 	}
 
 	switch (sev_cmd.id) {
+	case KVM_SEV_SNP_INIT:
+		if (!sev_snp_enabled) {
+			r = -ENOTTY;
+			goto out;
+		}
+		fallthrough;
 	case KVM_SEV_ES_INIT:
 		if (!sev_es_enabled) {
 			r = -ENOTTY;
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 6974d63c84f9..4360cf04f53a 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -74,6 +74,9 @@ enum {
 /* TPR and CR2 are always written before VMRUN */
 #define VMCB_ALWAYS_DIRTY_MASK	((1U << VMCB_INTR) | (1U << VMCB_CR2))
 
+/* Supported init feature flags */
+#define SEV_SNP_SUPPORTED_FLAGS		0x0
+
 struct kvm_sev_info {
 	bool active;		/* SEV enabled guest */
 	bool es_active;		/* SEV-ES enabled guest */
@@ -89,6 +92,7 @@ struct kvm_sev_info {
 	struct list_head mirror_entry; /* Use as a list entry of mirrors */
 	struct misc_cg *misc_cg; /* For misc cgroup accounting */
 	atomic_t migration_in_progress;
+	u64 snp_init_flags;
 };
 
 struct kvm_svm {
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 0fa665e8862a..43b6291e3a80 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1926,6 +1926,9 @@ enum sev_cmd_id {
 	/* Guest Migration Extension */
 	KVM_SEV_SEND_CANCEL,
 
+	/* SNP specific commands */
+	KVM_SEV_SNP_INIT,
+
 	KVM_SEV_NR_MAX,
 };
 
@@ -2022,6 +2025,16 @@ struct kvm_sev_receive_update_data {
 	__u32 trans_len;
 };
 
+/* enable the restricted injection */
+#define KVM_SEV_SNP_RESTRICTED_INJECT   (1 << 0)
+
+/* enable the restricted injection timer */
+#define KVM_SEV_SNP_RESTRICTED_TIMER_INJECT   (1 << 1)
+
+struct kvm_snp_init {
+	__u64 flags;
+};
+
 #define KVM_DEV_ASSIGN_ENABLE_IOMMU	(1 << 0)
 #define KVM_DEV_ASSIGN_PCI_2_3		(1 << 1)
 #define KVM_DEV_ASSIGN_MASK_INTX	(1 << 2)
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH RFC v9 29/51] KVM: SVM: Add KVM_SEV_SNP_LAUNCH_START command
  2023-06-12  4:25 [PATCH RFC v9 00/51] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (27 preceding siblings ...)
  2023-06-12  4:25 ` [PATCH RFC v9 28/51] KVM: SVM: Add KVM_SNP_INIT command Michael Roth
@ 2023-06-12  4:25 ` Michael Roth
  2023-06-12 17:08   ` Peter Gonda
  2023-06-12  4:25 ` [PATCH RFC v9 30/51] KVM: Add HVA range operator Michael Roth
                   ` (21 subsequent siblings)
  50 siblings, 1 reply; 102+ messages in thread
From: Michael Roth @ 2023-06-12  4:25 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	liam.merwick, zhi.a.wang, Brijesh Singh

From: Brijesh Singh <brijesh.singh@amd.com>

KVM_SEV_SNP_LAUNCH_START begins the launch process for an SEV-SNP guest.
The command initializes a cryptographic digest context used to construct
the measurement of the guest. If the guest is expected to be migrated,
the command also binds a migration agent (MA) to the guest.

For more information see the SEV-SNP specification.

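To illustrate the intended flow, a hypothetical userspace invocation might
look like the following; the policy value is an example only (see the
SEV-SNP specification for the policy bit definitions), and 'vm_fd'/'sev_fd'
are assumed descriptors:

  struct kvm_sev_snp_launch_start start = {
          .policy = 0x30000,   /* e.g. SMT allowed plus required reserved bit */
  };
  struct kvm_sev_cmd cmd = {
          .id = KVM_SEV_SNP_LAUNCH_START,
          .data = (__u64)(uintptr_t)&start,
          .sev_fd = sev_fd,
  };

  ioctl(vm_fd, KVM_MEMORY_ENCRYPT_OP, &cmd);
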
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
[mdr: hold sev_deactivate_lock when calling SEV_CMD_SNP_DECOMMISSION]
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 .../virt/kvm/x86/amd-memory-encryption.rst    |  24 ++++
 arch/x86/kvm/svm/sev.c                        | 126 +++++++++++++++++-
 arch/x86/kvm/svm/svm.h                        |   1 +
 include/uapi/linux/kvm.h                      |  10 ++
 4 files changed, 158 insertions(+), 3 deletions(-)

diff --git a/Documentation/virt/kvm/x86/amd-memory-encryption.rst b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
index 1240d28badd6..3293e86f9b8a 100644
--- a/Documentation/virt/kvm/x86/amd-memory-encryption.rst
+++ b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
@@ -461,6 +461,30 @@ The flags bitmap is defined as::
 If the specified flags is not supported then return -EOPNOTSUPP, and the supported
 flags are returned.
 
+19. KVM_SNP_LAUNCH_START
+------------------------
+
+The KVM_SNP_LAUNCH_START command is used for creating the memory encryption
+context for the SEV-SNP guest. To create the encryption context, the user must
+provide a guest policy, a migration agent (if any) and a guest OS visible
+workarounds value as defined in the SEV-SNP specification.
+
+Parameters (in): struct kvm_sev_snp_launch_start
+
+Returns: 0 on success, -negative on error
+
+::
+
+        struct kvm_sev_snp_launch_start {
+                __u64 policy;           /* Guest policy to use. */
+                __u64 ma_uaddr;         /* userspace address of migration agent */
+                __u8 ma_en;             /* 1 if the migration agent is enabled */
+                __u8 imi_en;            /* set IMI to 1. */
+                __u8 gosvw[16];         /* guest OS visible workarounds */
+        };
+
+See the SEV-SNP specification for further detail on the launch input.
+
 References
 ==========
 
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 70e0576a32d0..e65f3be67c23 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -22,6 +22,7 @@
 #include <asm/pkru.h>
 #include <asm/trapnr.h>
 #include <asm/fpu/xcr.h>
+#include <asm/sev-host.h>
 
 #include "mmu.h"
 #include "x86.h"
@@ -75,6 +76,8 @@ static unsigned int nr_asids;
 static unsigned long *sev_asid_bitmap;
 static unsigned long *sev_reclaim_asid_bitmap;
 
+static int snp_decommission_context(struct kvm *kvm);
+
 struct enc_region {
 	struct list_head list;
 	unsigned long npages;
@@ -100,12 +103,17 @@ static int sev_flush_asids(int min_asid, int max_asid)
 	down_write(&sev_deactivate_lock);
 
 	wbinvd_on_all_cpus();
-	ret = sev_guest_df_flush(&error);
+
+	if (sev_snp_enabled)
+		ret = sev_do_cmd(SEV_CMD_SNP_DF_FLUSH, NULL, &error);
+	else
+		ret = sev_guest_df_flush(&error);
 
 	up_write(&sev_deactivate_lock);
 
 	if (ret)
-		pr_err("SEV: DF_FLUSH failed, ret=%d, error=%#x\n", ret, error);
+		pr_err("SEV%s: DF_FLUSH failed, ret=%d, error=%#x\n",
+		       sev_snp_enabled ? "-SNP" : "", ret, error);
 
 	return ret;
 }
@@ -1871,6 +1879,80 @@ int sev_vm_move_enc_context_from(struct kvm *kvm, unsigned int source_fd)
 	return ret;
 }
 
+/*
+ * The guest context contains all the information, keys and metadata
+ * associated with the guest that the firmware tracks to implement SEV
+ * and SNP features. The firmware stores the guest context in a
+ * hypervisor-provided page via the SNP_GCTX_CREATE command.
+ */
+static void *snp_context_create(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+	struct sev_data_snp_addr data = {};
+	void *context;
+	int rc;
+
+	/* Allocate memory for context page */
+	context = snp_alloc_firmware_page(GFP_KERNEL_ACCOUNT);
+	if (!context)
+		return NULL;
+
+	data.gctx_paddr = __psp_pa(context);
+	rc = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_GCTX_CREATE, &data, &argp->error);
+	if (rc) {
+		snp_free_firmware_page(context);
+		return NULL;
+	}
+
+	return context;
+}
+
+static int snp_bind_asid(struct kvm *kvm, int *error)
+{
+	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+	struct sev_data_snp_activate data = {0};
+
+	data.gctx_paddr = __psp_pa(sev->snp_context);
+	data.asid   = sev_get_asid(kvm);
+	return sev_issue_cmd(kvm, SEV_CMD_SNP_ACTIVATE, &data, error);
+}
+
+static int snp_launch_start(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+	struct sev_data_snp_launch_start start = {0};
+	struct kvm_sev_snp_launch_start params;
+	int rc;
+
+	if (!sev_snp_guest(kvm))
+		return -ENOTTY;
+
+	if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data, sizeof(params)))
+		return -EFAULT;
+
+	sev->snp_context = snp_context_create(kvm, argp);
+	if (!sev->snp_context)
+		return -ENOTTY;
+
+	start.gctx_paddr = __psp_pa(sev->snp_context);
+	start.policy = params.policy;
+	memcpy(start.gosvw, params.gosvw, sizeof(params.gosvw));
+	rc = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_LAUNCH_START, &start, &argp->error);
+	if (rc)
+		goto e_free_context;
+
+	sev->fd = argp->sev_fd;
+	rc = snp_bind_asid(kvm, &argp->error);
+	if (rc)
+		goto e_free_context;
+
+	return 0;
+
+e_free_context:
+	snp_decommission_context(kvm);
+
+	return rc;
+}
+
 int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
 {
 	struct kvm_sev_cmd sev_cmd;
@@ -1961,6 +2043,9 @@ int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
 	case KVM_SEV_RECEIVE_FINISH:
 		r = sev_receive_finish(kvm, &sev_cmd);
 		break;
+	case KVM_SEV_SNP_LAUNCH_START:
+		r = snp_launch_start(kvm, &sev_cmd);
+		break;
 	default:
 		r = -EINVAL;
 		goto out;
@@ -2152,6 +2237,33 @@ int sev_vm_copy_enc_context_from(struct kvm *kvm, unsigned int source_fd)
 	return ret;
 }
 
+static int snp_decommission_context(struct kvm *kvm)
+{
+	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+	struct sev_data_snp_addr data = {};
+	int ret;
+
+	/* If context is not created then do nothing */
+	if (!sev->snp_context)
+		return 0;
+
+	data.gctx_paddr = __sme_pa(sev->snp_context);
+	down_write(&sev_deactivate_lock);
+	ret = sev_do_cmd(SEV_CMD_SNP_DECOMMISSION, &data, NULL);
+	if (WARN_ONCE(ret, "failed to release guest context")) {
+		up_write(&sev_deactivate_lock);
+		return ret;
+	}
+
+	up_write(&sev_deactivate_lock);
+
+	/* free the context page now */
+	snp_free_firmware_page(sev->snp_context);
+	sev->snp_context = NULL;
+
+	return 0;
+}
+
 void sev_vm_destroy(struct kvm *kvm)
 {
 	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
@@ -2193,7 +2305,15 @@ void sev_vm_destroy(struct kvm *kvm)
 		}
 	}
 
-	sev_unbind_asid(kvm, sev->handle);
+	if (sev_snp_guest(kvm)) {
+		if (snp_decommission_context(kvm)) {
+			WARN_ONCE(1, "Failed to free SNP guest context, leaking asid!\n");
+			return;
+		}
+	} else {
+		sev_unbind_asid(kvm, sev->handle);
+	}
+
 	sev_asid_free(sev);
 }
 
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 4360cf04f53a..9a7cafb018fe 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -93,6 +93,7 @@ struct kvm_sev_info {
 	struct misc_cg *misc_cg; /* For misc cgroup accounting */
 	atomic_t migration_in_progress;
 	u64 snp_init_flags;
+	void *snp_context;      /* SNP guest context page */
 };
 
 struct kvm_svm {
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 43b6291e3a80..b4c7ac9710d3 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1928,6 +1928,7 @@ enum sev_cmd_id {
 
 	/* SNP specific commands */
 	KVM_SEV_SNP_INIT,
+	KVM_SEV_SNP_LAUNCH_START,
 
 	KVM_SEV_NR_MAX,
 };
@@ -2035,6 +2036,15 @@ struct kvm_snp_init {
 	__u64 flags;
 };
 
+struct kvm_sev_snp_launch_start {
+	__u64 policy;
+	__u64 ma_uaddr;
+	__u8 ma_en;
+	__u8 imi_en;
+	__u8 gosvw[16];
+	__u8 pad[6];
+};
+
 #define KVM_DEV_ASSIGN_ENABLE_IOMMU	(1 << 0)
 #define KVM_DEV_ASSIGN_PCI_2_3		(1 << 1)
 #define KVM_DEV_ASSIGN_MASK_INTX	(1 << 2)
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH RFC v9 30/51] KVM: Add HVA range operator
  2023-06-12  4:25 [PATCH RFC v9 00/51] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (28 preceding siblings ...)
  2023-06-12  4:25 ` [PATCH RFC v9 29/51] KVM: SVM: Add KVM_SEV_SNP_LAUNCH_START command Michael Roth
@ 2023-06-12  4:25 ` Michael Roth
  2023-06-12  4:25 ` [PATCH RFC v9 31/51] KVM: Split out memory attribute xarray updates to helper function Michael Roth
                   ` (20 subsequent siblings)
  50 siblings, 0 replies; 102+ messages in thread
From: Michael Roth @ 2023-06-12  4:25 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	liam.merwick, zhi.a.wang, Vishal Annapurve

From: Vishal Annapurve <vannapurve@google.com>

Introduce an HVA range operator so that other KVM subsystems
can operate on an HVA range.

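As a usage sketch, a caller provides a handler that is invoked once per
intersecting memslot range (both the handler and the caller below are
hypothetical):

  static int count_gfns_handler(struct kvm *kvm, struct kvm_gfn_range *range,
                                void *data)
  {
          unsigned long *total = data;

          *total += range->end - range->start;
          return 0;
  }

  /* ...in some hypothetical caller: */
  unsigned long total = 0;
  int ret = kvm_vm_do_hva_range_op(kvm, hva_start, hva_end,
                                   count_gfns_handler, &total);
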
Signed-off-by: Vishal Annapurve <vannapurve@google.com>
[mdr: minor checkpatch alignment fixups]
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 include/linux/kvm_host.h |  6 +++++
 virt/kvm/kvm_main.c      | 49 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 55 insertions(+)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 7de06add2235..9a9d4141ba74 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1400,6 +1400,12 @@ void kvm_mmu_invalidate_begin(struct kvm *kvm);
 void kvm_mmu_invalidate_range_add(struct kvm *kvm, gfn_t start, gfn_t end);
 void kvm_mmu_invalidate_end(struct kvm *kvm);
 
+typedef int (*kvm_hva_range_op_t)(struct kvm *kvm,
+				struct kvm_gfn_range *range, void *data);
+
+int kvm_vm_do_hva_range_op(struct kvm *kvm, unsigned long hva_start,
+			   unsigned long hva_end, kvm_hva_range_op_t handler, void *data);
+
 long kvm_arch_dev_ioctl(struct file *filp,
 			unsigned int ioctl, unsigned long arg);
 long kvm_arch_vcpu_ioctl(struct file *filp,
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 422d49634c56..48beffca6b67 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -642,6 +642,55 @@ static __always_inline int __kvm_handle_hva_range(struct kvm *kvm,
 	return (int)ret;
 }
 
+int kvm_vm_do_hva_range_op(struct kvm *kvm, unsigned long hva_start,
+			   unsigned long hva_end, kvm_hva_range_op_t handler, void *data)
+{
+	int ret = 0;
+	struct kvm_gfn_range gfn_range;
+	struct kvm_memory_slot *slot;
+	struct kvm_memslots *slots;
+	int i, idx;
+
+	if (WARN_ON_ONCE(hva_end <= hva_start))
+		return -EINVAL;
+
+	idx = srcu_read_lock(&kvm->srcu);
+
+	for (i = 0; i < kvm_arch_nr_memslot_as_ids(kvm); i++) {
+		struct interval_tree_node *node;
+
+		slots = __kvm_memslots(kvm, i);
+		kvm_for_each_memslot_in_hva_range(node, slots,
+						  hva_start, hva_end - 1) {
+			unsigned long start, end;
+
+			slot = container_of(node, struct kvm_memory_slot,
+					    hva_node[slots->node_idx]);
+			start = max(hva_start, slot->userspace_addr);
+			end = min(hva_end, slot->userspace_addr +
+						  (slot->npages << PAGE_SHIFT));
+
+			/*
+			 * {gfn(page) | page intersects with [hva_start, hva_end)} =
+			 * {gfn_start, gfn_start+1, ..., gfn_end-1}.
+			 */
+			gfn_range.start = hva_to_gfn_memslot(start, slot);
+			gfn_range.end = hva_to_gfn_memslot(end + PAGE_SIZE - 1, slot);
+			gfn_range.slot = slot;
+
+			ret = handler(kvm, &gfn_range, data);
+			if (ret)
+				goto e_ret;
+		}
+	}
+
+e_ret:
+	srcu_read_unlock(&kvm->srcu, idx);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(kvm_vm_do_hva_range_op);
+
 static __always_inline int kvm_handle_hva_range(struct mmu_notifier *mn,
 						unsigned long start,
 						unsigned long end,
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH RFC v9 31/51] KVM: Split out memory attribute xarray updates to helper function
  2023-06-12  4:25 [PATCH RFC v9 00/51] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (29 preceding siblings ...)
  2023-06-12  4:25 ` [PATCH RFC v9 30/51] KVM: Add HVA range operator Michael Roth
@ 2023-06-12  4:25 ` Michael Roth
  2023-06-12  4:25 ` [PATCH RFC v9 32/51] KVM: SVM: Add KVM_SEV_SNP_LAUNCH_UPDATE command Michael Roth
                   ` (19 subsequent siblings)
  50 siblings, 0 replies; 102+ messages in thread
From: Michael Roth @ 2023-06-12  4:25 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	liam.merwick, zhi.a.wang

This will be useful to other callers that need to update memory
attributes for things like setting up the initial private memory payload
for a guest.

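A rough sketch of a hypothetical caller; note that the return value is the
first index that was not updated, so a fully successful update returns 'end':

  gfn_t idx = kvm_vm_set_region_attr(kvm, start, end,
                                     KVM_MEMORY_ATTRIBUTE_PRIVATE);
  if (idx < end)
          pr_warn("attributes only updated for GFNs [%llx, %llx)\n",
                  start, idx);
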
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 include/linux/kvm_host.h |  1 +
 virt/kvm/kvm_main.c      | 26 ++++++++++++++++++--------
 2 files changed, 19 insertions(+), 8 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 9a9d4141ba74..56055692cdfa 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -991,6 +991,7 @@ int kvm_init(unsigned vcpu_size, unsigned vcpu_align, struct module *module);
 void kvm_exit(void);
 
 void kvm_get_kvm(struct kvm *kvm);
+int kvm_vm_set_region_attr(struct kvm *kvm, gfn_t start, gfn_t end, u64 attributes);
 bool kvm_get_kvm_safe(struct kvm *kvm);
 void kvm_put_kvm(struct kvm *kvm);
 bool file_is_kvm(struct file *file);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 48beffca6b67..0c0a75affab6 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2466,12 +2466,28 @@ static void kvm_mem_attrs_changed(struct kvm *kvm, unsigned long attrs,
 		kvm_flush_remote_tlbs(kvm);
 }
 
+int kvm_vm_set_region_attr(struct kvm *kvm, gfn_t start, gfn_t end,
+			   u64 attributes)
+{
+	gfn_t index;
+	void *entry;
+
+	entry = attributes ? xa_mk_value(attributes) : NULL;
+
+	for (index = start; index < end; index++)
+		if (xa_err(xa_store(&kvm->mem_attr_array, index, entry,
+				    GFP_KERNEL_ACCOUNT)))
+			break;
+
+	return index;
+}
+EXPORT_SYMBOL_GPL(kvm_vm_set_region_attr);
+
 static int kvm_vm_ioctl_set_mem_attributes(struct kvm *kvm,
 					   struct kvm_memory_attributes *attrs)
 {
 	gfn_t start, end;
 	unsigned long i;
-	void *entry;
 
 	/* flags is currently not used. */
 	if (attrs->flags)
@@ -2486,8 +2502,6 @@ static int kvm_vm_ioctl_set_mem_attributes(struct kvm *kvm,
 	start = attrs->address >> PAGE_SHIFT;
 	end = (attrs->address + attrs->size - 1 + PAGE_SIZE) >> PAGE_SHIFT;
 
-	entry = attrs->attributes ? xa_mk_value(attrs->attributes) : NULL;
-
 	mutex_lock(&kvm->slots_lock);
 
 	KVM_MMU_LOCK(kvm);
@@ -2495,11 +2509,7 @@ static int kvm_vm_ioctl_set_mem_attributes(struct kvm *kvm,
 	kvm_mmu_invalidate_range_add(kvm, start, end);
 	KVM_MMU_UNLOCK(kvm);
 
-	for (i = start; i < end; i++) {
-		if (xa_err(xa_store(&kvm->mem_attr_array, i, entry,
-				    GFP_KERNEL_ACCOUNT)))
-			break;
-	}
+	i = kvm_vm_set_region_attr(kvm, start, end, attrs->attributes);
 
 	KVM_MMU_LOCK(kvm);
 	if (i > start)
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH RFC v9 32/51] KVM: SVM: Add KVM_SEV_SNP_LAUNCH_UPDATE command
  2023-06-12  4:25 [PATCH RFC v9 00/51] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (30 preceding siblings ...)
  2023-06-12  4:25 ` [PATCH RFC v9 31/51] KVM: Split out memory attribute xarray updates to helper function Michael Roth
@ 2023-06-12  4:25 ` Michael Roth
  2023-06-12  4:25 ` [PATCH RFC v9 33/51] KVM: SVM: Add KVM_SEV_SNP_LAUNCH_FINISH command Michael Roth
                   ` (18 subsequent siblings)
  50 siblings, 0 replies; 102+ messages in thread
From: Michael Roth @ 2023-06-12  4:25 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	liam.merwick, zhi.a.wang, Brijesh Singh

From: Brijesh Singh <brijesh.singh@amd.com>

The KVM_SEV_SNP_LAUNCH_UPDATE command can be used to insert data into the
guest's memory. The data is encrypted with the cryptographic context
created with KVM_SEV_SNP_LAUNCH_START.

In addition to inserting data, it can insert two special pages into the
guest's memory: the secrets page and the CPUID page.

While terminating the guest, reclaim the guest pages added to the RMP
table. If the reclaim fails, then a page is no longer safe to release
back to the system, so leak it instead.

For more information see the SEV-SNP specification.

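A hypothetical userspace invocation; 'gpa' is the guest-physical placement
of the payload, 'hva' is the userspace mapping of the corresponding guest
memory region, and 'len' must be page-aligned:

  struct kvm_sev_snp_launch_update update = {
          .start_gfn = gpa >> 12,
          .uaddr = (__u64)(uintptr_t)hva,
          .len = len,
          .page_type = KVM_SEV_SNP_PAGE_TYPE_NORMAL,
  };
  struct kvm_sev_cmd cmd = {
          .id = KVM_SEV_SNP_LAUNCH_UPDATE,
          .data = (__u64)(uintptr_t)&update,
          .sev_fd = sev_fd,
  };

  ioctl(vm_fd, KVM_MEMORY_ENCRYPT_OP, &cmd);
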
Co-developed-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
---
 .../virt/kvm/x86/amd-memory-encryption.rst    |  28 +++
 arch/x86/kvm/svm/sev.c                        | 189 ++++++++++++++++++
 include/uapi/linux/kvm.h                      |  19 ++
 3 files changed, 236 insertions(+)

diff --git a/Documentation/virt/kvm/x86/amd-memory-encryption.rst b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
index 3293e86f9b8a..d8492af09796 100644
--- a/Documentation/virt/kvm/x86/amd-memory-encryption.rst
+++ b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
@@ -485,6 +485,34 @@ Returns: 0 on success, -negative on error
 
 See the SEV-SNP specification for further detail on the launch input.
 
+20. KVM_SNP_LAUNCH_UPDATE
+-------------------------
+
+The KVM_SNP_LAUNCH_UPDATE command is used for encrypting a memory region. It
+also calculates a measurement of the memory contents. The measurement is a
+signature of the memory contents that can be sent to the guest owner as an
+attestation that the memory was encrypted correctly by the firmware.
+
+Parameters (in): struct kvm_sev_snp_launch_update
+
+Returns: 0 on success, -negative on error
+
+::
+
+        struct kvm_sev_snp_launch_update {
+                __u64 start_gfn;        /* Guest page number to start from. */
+                __u64 uaddr;            /* userspace address of memory to be encrypted */
+                __u32 len;              /* length of memory region */
+                __u8 imi_page;          /* 1 if memory is part of the IMI */
+                __u8 page_type;         /* page type */
+                __u8 vmpl3_perms;       /* VMPL3 permission mask */
+                __u8 vmpl2_perms;       /* VMPL2 permission mask */
+                __u8 vmpl1_perms;       /* VMPL1 permission mask */
+        };
+
+See the SEV-SNP spec for further details on how to build the VMPL permission
+mask and page type.
+
 References
 ==========
 
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index e65f3be67c23..6a82767d940f 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -234,6 +234,36 @@ static void sev_decommission(unsigned int handle)
 	sev_guest_decommission(&decommission, NULL);
 }
 
+static int snp_page_reclaim(u64 pfn)
+{
+	struct sev_data_snp_page_reclaim data = {0};
+	int err, rc;
+
+	data.paddr = __sme_set(pfn << PAGE_SHIFT);
+	rc = sev_do_cmd(SEV_CMD_SNP_PAGE_RECLAIM, &data, &err);
+	if (rc) {
+		/*
+		 * If the reclaim failed, then page is no longer safe
+		 * to use.
+		 */
+		snp_leak_pages(pfn, 1);
+	}
+
+	return rc;
+}
+
+static int host_rmp_make_shared(u64 pfn, enum pg_level level, bool leak)
+{
+	int rc;
+
+	rc = rmp_make_shared(pfn, level);
+	if (rc && leak)
+		snp_leak_pages(pfn,
+			       page_level_size(level) >> PAGE_SHIFT);
+
+	return rc;
+}
+
 static void sev_unbind_asid(struct kvm *kvm, unsigned int handle)
 {
 	struct sev_data_deactivate deactivate;
@@ -1953,6 +1983,162 @@ static int snp_launch_start(struct kvm *kvm, struct kvm_sev_cmd *argp)
 	return rc;
 }
 
+static int snp_launch_update_gfn_handler(struct kvm *kvm,
+					 struct kvm_gfn_range *range,
+					 void *opaque)
+{
+	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+	struct kvm_memory_slot *memslot = range->slot;
+	struct sev_data_snp_launch_update data = {0};
+	struct kvm_sev_snp_launch_update params;
+	struct kvm_sev_cmd *argp = opaque;
+	int *error = &argp->error;
+	int i, n = 0, ret = 0;
+	unsigned long npages;
+	kvm_pfn_t *pfns;
+	gfn_t gfn;
+
+	if (!kvm_slot_can_be_private(memslot)) {
+		pr_err("SEV-SNP requires restricted memory.\n");
+		return -EINVAL;
+	}
+
+	if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data, sizeof(params))) {
+		pr_err("Failed to copy user parameters for SEV-SNP launch.\n");
+		return -EFAULT;
+	}
+
+	data.gctx_paddr = __psp_pa(sev->snp_context);
+
+	npages = range->end - range->start;
+	pfns = kvmalloc_array(npages, sizeof(*pfns), GFP_KERNEL_ACCOUNT);
+	if (!pfns)
+		return -ENOMEM;
+
+	pr_debug("%s: GFN range 0x%llx-0x%llx, type %d\n", __func__,
+		 range->start, range->end, params.page_type);
+
+	for (gfn = range->start, i = 0; gfn < range->end; gfn++, i++) {
+		int order, level;
+		bool assigned;
+		void *kvaddr;
+
+		ret = kvm_gmem_get_pfn(kvm, memslot, gfn, &pfns[i], &order);
+		if (ret)
+			goto e_release;
+
+		n++;
+		ret = snp_lookup_rmpentry((u64)pfns[i], &assigned, &level);
+		if (ret || assigned) {
+			pr_err("Failed to ensure GFN 0x%llx is in initial shared state, ret: %d, assigned: %d\n",
+			       gfn, ret, assigned);
+			ret = -EFAULT;
+			goto e_release;
+		}
+
+		kvaddr = pfn_to_kaddr(pfns[i]);
+		if (!virt_addr_valid(kvaddr)) {
+			pr_err("Invalid HVA 0x%llx for GFN 0x%llx\n", (uint64_t)kvaddr, gfn);
+			ret = -EINVAL;
+			goto e_release;
+		}
+
+		ret = kvm_read_guest_page(kvm, gfn, kvaddr, 0, PAGE_SIZE);
+		if (ret) {
+			pr_err("Guest read failed, ret: 0x%x\n", ret);
+			goto e_release;
+		}
+
+		ret = rmp_make_private(pfns[i], gfn << PAGE_SHIFT, PG_LEVEL_4K,
+				       sev_get_asid(kvm), true);
+		if (ret) {
+			ret = -EFAULT;
+			goto e_release;
+		}
+
+		data.address = __sme_set(pfns[i] << PAGE_SHIFT);
+		data.page_size = X86_TO_RMP_PG_LEVEL(PG_LEVEL_4K);
+		data.page_type = params.page_type;
+		data.vmpl3_perms = params.vmpl3_perms;
+		data.vmpl2_perms = params.vmpl2_perms;
+		data.vmpl1_perms = params.vmpl1_perms;
+		ret = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_LAUNCH_UPDATE,
+				      &data, error);
+		if (ret) {
+			pr_err("SEV-SNP launch update failed, ret: 0x%x, fw_error: 0x%x\n",
+			       ret, *error);
+			snp_page_reclaim(pfns[i]);
+
+			/*
+			 * When invalid CPUID function entries are detected, the firmware
+			 * corrects these entries for debugging purpose and leaves the
+			 * page unencrypted so it can be provided users for debugging
+			 * and error-reporting.
+			 *
+			 * Copy the corrected CPUID page back to shared memory so
+			 * userpsace can retrieve this information.
+			 */
+			if (params.page_type == SNP_PAGE_TYPE_CPUID &&
+			    *error == SEV_RET_INVALID_PARAM) {
+				int ret;
+
+				host_rmp_make_shared(pfns[i], PG_LEVEL_4K, true);
+
+				ret = kvm_write_guest_page(kvm, gfn, kvaddr, 0, PAGE_SIZE);
+				if (ret)
+					pr_err("Failed to write CPUID page back to userspace, ret: 0x%x\n",
+					       ret);
+			}
+
+			goto e_release;
+		}
+	}
+
+	/*
+	 * Memory attribute updates via KVM_SET_MEMORY_ATTRIBUTES are serialized
+	 * via kvm->slots_lock, so use the same protocol for updating them here.
+	 */
+	mutex_lock(&kvm->slots_lock);
+	kvm_vm_set_region_attr(kvm, range->start, range->end, KVM_MEMORY_ATTRIBUTE_PRIVATE);
+	mutex_unlock(&kvm->slots_lock);
+
+e_release:
+	/* Content of memory is updated, mark pages dirty */
+	for (i = 0; i < n; i++) {
+		set_page_dirty(pfn_to_page(pfns[i]));
+		mark_page_accessed(pfn_to_page(pfns[i]));
+
+		/*
+		 * On error, update the RMP entry to return page ownership
+		 * to the hypervisor.
+		 */
+		if (ret)
+			host_rmp_make_shared(pfns[i], PG_LEVEL_4K, true);
+
+		put_page(pfn_to_page(pfns[i]));
+	}
+
+	kvfree(pfns);
+	return ret;
+}
+
+static int snp_launch_update(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+	struct kvm_sev_snp_launch_update params;
+
+	if (!sev_snp_guest(kvm))
+		return -ENOTTY;
+
+	if (!sev->snp_context)
+		return -EINVAL;
+
+	if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data, sizeof(params)))
+		return -EFAULT;
+
+	return kvm_vm_do_hva_range_op(kvm, params.uaddr, params.uaddr + params.len,
+				      snp_launch_update_gfn_handler, argp);
+}
+
 int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
 {
 	struct kvm_sev_cmd sev_cmd;
@@ -2046,6 +2232,9 @@ int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
 	case KVM_SEV_SNP_LAUNCH_START:
 		r = snp_launch_start(kvm, &sev_cmd);
 		break;
+	case KVM_SEV_SNP_LAUNCH_UPDATE:
+		r = snp_launch_update(kvm, &sev_cmd);
+		break;
 	default:
 		r = -EINVAL;
 		goto out;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index b4c7ac9710d3..4961d2e67a4b 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1929,6 +1929,7 @@ enum sev_cmd_id {
 	/* SNP specific commands */
 	KVM_SEV_SNP_INIT,
 	KVM_SEV_SNP_LAUNCH_START,
+	KVM_SEV_SNP_LAUNCH_UPDATE,
 
 	KVM_SEV_NR_MAX,
 };
@@ -2045,6 +2046,24 @@ struct kvm_sev_snp_launch_start {
 	__u8 pad[6];
 };
 
+#define KVM_SEV_SNP_PAGE_TYPE_NORMAL		0x1
+#define KVM_SEV_SNP_PAGE_TYPE_VMSA		0x2
+#define KVM_SEV_SNP_PAGE_TYPE_ZERO		0x3
+#define KVM_SEV_SNP_PAGE_TYPE_UNMEASURED	0x4
+#define KVM_SEV_SNP_PAGE_TYPE_SECRETS		0x5
+#define KVM_SEV_SNP_PAGE_TYPE_CPUID		0x6
+
+struct kvm_sev_snp_launch_update {
+	__u64 start_gfn;
+	__u64 uaddr;
+	__u32 len;
+	__u8 imi_page;
+	__u8 page_type;
+	__u8 vmpl3_perms;
+	__u8 vmpl2_perms;
+	__u8 vmpl1_perms;
+};
+
 #define KVM_DEV_ASSIGN_ENABLE_IOMMU	(1 << 0)
 #define KVM_DEV_ASSIGN_PCI_2_3		(1 << 1)
 #define KVM_DEV_ASSIGN_MASK_INTX	(1 << 2)
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH RFC v9 33/51] KVM: SVM: Add KVM_SEV_SNP_LAUNCH_FINISH command
  2023-06-12  4:25 [PATCH RFC v9 00/51] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (31 preceding siblings ...)
  2023-06-12  4:25 ` [PATCH RFC v9 32/51] KVM: SVM: Add KVM_SEV_SNP_LAUNCH_UPDATE command Michael Roth
@ 2023-06-12  4:25 ` Michael Roth
  2023-06-12  4:25 ` [PATCH RFC v9 34/51] KVM: SVM: Add support to handle GHCB GPA register VMGEXIT Michael Roth
                   ` (17 subsequent siblings)
  50 siblings, 0 replies; 102+ messages in thread
From: Michael Roth @ 2023-06-12  4:25 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	liam.merwick, zhi.a.wang, Brijesh Singh, Harald Hoyer

From: Brijesh Singh <brijesh.singh@amd.com>

The KVM_SEV_SNP_LAUNCH_FINISH command finalizes the cryptographic digest
and stores it as the measurement of the guest at launch.

While finalizing the launch flow, it also issues the LAUNCH_UPDATE command
to encrypt the VMSA pages.

If it is an SNP guest, then the VMSA was added to the RMP table as a
guest-owned page and also removed from the kernel direct map, so flush
it only after it has been transitioned back to hypervisor state and
restored in the direct map.

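A minimal userspace sketch with no ID block; 'my_host_data' is a
hypothetical 32-byte buffer of opaque data that the firmware will include
in subsequent attestation reports:

  struct kvm_sev_snp_launch_finish finish = { 0 };
  struct kvm_sev_cmd cmd = {
          .id = KVM_SEV_SNP_LAUNCH_FINISH,
          .data = (__u64)(uintptr_t)&finish,
          .sev_fd = sev_fd,
  };

  memcpy(finish.host_data, my_host_data, sizeof(finish.host_data));
  ioctl(vm_fd, KVM_MEMORY_ENCRYPT_OP, &cmd);
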
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Harald Hoyer <harald@profian.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
[mdr: always measure BSP first to get consistent launch measurements]
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 .../virt/kvm/x86/amd-memory-encryption.rst    |  24 +++
 arch/x86/kvm/svm/sev.c                        | 146 ++++++++++++++++++
 include/uapi/linux/kvm.h                      |  14 ++
 3 files changed, 184 insertions(+)

diff --git a/Documentation/virt/kvm/x86/amd-memory-encryption.rst b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
index d8492af09796..cd77a19577fe 100644
--- a/Documentation/virt/kvm/x86/amd-memory-encryption.rst
+++ b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
@@ -513,6 +513,30 @@ Returns: 0 on success, -negative on error
 See the SEV-SNP spec for further details on how to build the VMPL permission
 mask and page type.
 
+21. KVM_SNP_LAUNCH_FINISH
+-------------------------
+
+After completion of the SNP guest launch flow, the KVM_SNP_LAUNCH_FINISH command
+can be issued to make the guest ready for execution.
+
+Parameters (in): struct kvm_sev_snp_launch_finish
+
+Returns: 0 on success, -negative on error
+
+::
+
+        struct kvm_sev_snp_launch_finish {
+                __u64 id_block_uaddr;
+                __u64 id_auth_uaddr;
+                __u8 id_block_en;
+                __u8 auth_key_en;
+                __u8 host_data[32];
+                __u8 pad[6];
+        };
+
+
+See SEV-SNP specification for further details on launch finish input parameters.
+
 References
 ==========
 
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 6a82767d940f..a7cbdc24ccdb 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -66,6 +66,8 @@ static bool sev_snp_enabled;
 #define AP_RESET_HOLD_NAE_EVENT		1
 #define AP_RESET_HOLD_MSR_PROTO		2
 
+#define INITIAL_VMSA_GPA 0xFFFFFFFFF000
+
 static u8 sev_enc_bit;
 static DECLARE_RWSEM(sev_deactivate_lock);
 static DEFINE_MUTEX(sev_bitmap_lock);
@@ -727,7 +729,29 @@ static int sev_launch_update_vmsa(struct kvm *kvm, struct kvm_sev_cmd *argp)
 	if (!sev_es_guest(kvm))
 		return -ENOTTY;
 
+	/* Handle boot vCPU first to ensure consistent measurement of initial state. */
+	kvm_for_each_vcpu(i, vcpu, kvm) {
+		if (vcpu->vcpu_id != 0)
+			continue;
+
+		ret = mutex_lock_killable(&vcpu->mutex);
+		if (ret)
+			return ret;
+
+		ret = __sev_launch_update_vmsa(kvm, vcpu, &argp->error);
+
+		mutex_unlock(&vcpu->mutex);
+		if (ret)
+			return ret;
+
+		break;
+	}
+
+	/* Handle remaining vCPUs. */
 	kvm_for_each_vcpu(i, vcpu, kvm) {
+		if (vcpu->vcpu_id == 0)
+			continue;
+
 		ret = mutex_lock_killable(&vcpu->mutex);
 		if (ret)
 			return ret;
@@ -2139,6 +2163,109 @@ static int snp_launch_update(struct kvm *kvm, struct kvm_sev_cmd *argp)
 				      snp_launch_update_gfn_handler, argp);
 }
 
+static int snp_launch_update_vmsa(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+	struct sev_data_snp_launch_update data = {};
+	struct kvm_vcpu *vcpu;
+	unsigned long i;
+	int ret;
+
+	data.gctx_paddr = __psp_pa(sev->snp_context);
+	data.page_type = SNP_PAGE_TYPE_VMSA;
+
+	kvm_for_each_vcpu(i, vcpu, kvm) {
+		struct vcpu_svm *svm = to_svm(vcpu);
+		u64 pfn = __pa(svm->sev_es.vmsa) >> PAGE_SHIFT;
+
+		/* Perform some pre-encryption checks against the VMSA */
+		ret = sev_es_sync_vmsa(svm);
+		if (ret)
+			return ret;
+
+		/* Transition the VMSA page to a firmware state. */
+		ret = rmp_make_private(pfn, INITIAL_VMSA_GPA, PG_LEVEL_4K, sev->asid, true);
+		if (ret)
+			return ret;
+
+		/* Issue the SNP command to encrypt the VMSA */
+		data.address = __sme_pa(svm->sev_es.vmsa);
+		ret = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_LAUNCH_UPDATE,
+				      &data, &argp->error);
+		if (ret) {
+			snp_page_reclaim(pfn);
+			return ret;
+		}
+
+		svm->vcpu.arch.guest_state_protected = true;
+	}
+
+	return 0;
+}
+
+static int snp_launch_finish(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+	struct kvm_sev_snp_launch_finish params;
+	struct sev_data_snp_launch_finish *data;
+	void *id_block = NULL, *id_auth = NULL;
+	int ret;
+
+	if (!sev_snp_guest(kvm))
+		return -ENOTTY;
+
+	if (!sev->snp_context)
+		return -EINVAL;
+
+	if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data, sizeof(params)))
+		return -EFAULT;
+
+	/* Measure all vCPUs using LAUNCH_UPDATE before finalizing the launch flow. */
+	ret = snp_launch_update_vmsa(kvm, argp);
+	if (ret)
+		return ret;
+
+	data = kzalloc(sizeof(*data), GFP_KERNEL_ACCOUNT);
+	if (!data)
+		return -ENOMEM;
+
+	if (params.id_block_en) {
+		id_block = psp_copy_user_blob(params.id_block_uaddr, KVM_SEV_SNP_ID_BLOCK_SIZE);
+		if (IS_ERR(id_block)) {
+			ret = PTR_ERR(id_block);
+			goto e_free;
+		}
+
+		data->id_block_en = 1;
+		data->id_block_paddr = __sme_pa(id_block);
+
+		id_auth = psp_copy_user_blob(params.id_auth_uaddr, KVM_SEV_SNP_ID_AUTH_SIZE);
+		if (IS_ERR(id_auth)) {
+			ret = PTR_ERR(id_auth);
+			goto e_free_id_block;
+		}
+
+		data->id_auth_paddr = __sme_pa(id_auth);
+
+		if (params.auth_key_en)
+			data->auth_key_en = 1;
+	}
+
+	memcpy(data->host_data, params.host_data, KVM_SEV_SNP_FINISH_DATA_SIZE);
+	data->gctx_paddr = __psp_pa(sev->snp_context);
+	ret = sev_issue_cmd(kvm, SEV_CMD_SNP_LAUNCH_FINISH, data, &argp->error);
+
+	kfree(id_auth);
+
+e_free_id_block:
+	kfree(id_block);
+
+e_free:
+	kfree(data);
+
+	return ret;
+}
+
 int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
 {
 	struct kvm_sev_cmd sev_cmd;
@@ -2235,6 +2362,9 @@ int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
 	case KVM_SEV_SNP_LAUNCH_UPDATE:
 		r = snp_launch_update(kvm, &sev_cmd);
 		break;
+	case KVM_SEV_SNP_LAUNCH_FINISH:
+		r = snp_launch_finish(kvm, &sev_cmd);
+		break;
 	default:
 		r = -EINVAL;
 		goto out;
@@ -2695,11 +2825,27 @@ void sev_free_vcpu(struct kvm_vcpu *vcpu)
 
 	svm = to_svm(vcpu);
 
+	/*
+	 * If it is an SNP guest, then the VMSA was added to the RMP table
+	 * as a guest-owned page. Transition the page to hypervisor state
+	 * before releasing it back to the system.
+	 * The page is also removed from the kernel direct map, so flush it
+	 * only after it has been transitioned back to hypervisor state and
+	 * restored in the direct map.
+	 */
+	if (sev_snp_guest(vcpu->kvm)) {
+		u64 pfn = __pa(svm->sev_es.vmsa) >> PAGE_SHIFT;
+
+		if (host_rmp_make_shared(pfn, PG_LEVEL_4K, true))
+			goto skip_vmsa_free;
+	}
+
 	if (vcpu->arch.guest_state_protected)
 		sev_flush_encrypted_page(vcpu, svm->sev_es.vmsa);
 
 	__free_page(virt_to_page(svm->sev_es.vmsa));
 
+skip_vmsa_free:
 	if (svm->sev_es.ghcb_sa_free)
 		kvfree(svm->sev_es.ghcb_sa);
 }
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 4961d2e67a4b..1fb6a6615d09 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1930,6 +1930,7 @@ enum sev_cmd_id {
 	KVM_SEV_SNP_INIT,
 	KVM_SEV_SNP_LAUNCH_START,
 	KVM_SEV_SNP_LAUNCH_UPDATE,
+	KVM_SEV_SNP_LAUNCH_FINISH,
 
 	KVM_SEV_NR_MAX,
 };
@@ -2064,6 +2065,19 @@ struct kvm_sev_snp_launch_update {
 	__u8 vmpl1_perms;
 };
 
+#define KVM_SEV_SNP_ID_BLOCK_SIZE	96
+#define KVM_SEV_SNP_ID_AUTH_SIZE	4096
+#define KVM_SEV_SNP_FINISH_DATA_SIZE	32
+
+struct kvm_sev_snp_launch_finish {
+	__u64 id_block_uaddr;
+	__u64 id_auth_uaddr;
+	__u8 id_block_en;
+	__u8 auth_key_en;
+	__u8 host_data[KVM_SEV_SNP_FINISH_DATA_SIZE];
+	__u8 pad[6];
+};
+
 #define KVM_DEV_ASSIGN_ENABLE_IOMMU	(1 << 0)
 #define KVM_DEV_ASSIGN_PCI_2_3		(1 << 1)
 #define KVM_DEV_ASSIGN_MASK_INTX	(1 << 2)
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH RFC v9 34/51] KVM: SVM: Add support to handle GHCB GPA register VMGEXIT
  2023-06-12  4:25 [PATCH RFC v9 00/51] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (32 preceding siblings ...)
  2023-06-12  4:25 ` [PATCH RFC v9 33/51] KVM: SVM: Add KVM_SEV_SNP_LAUNCH_FINISH command Michael Roth
@ 2023-06-12  4:25 ` Michael Roth
  2023-06-12  4:25 ` [PATCH RFC v9 35/51] KVM: SVM: Add KVM_EXIT_VMGEXIT Michael Roth
                   ` (16 subsequent siblings)
  50 siblings, 0 replies; 102+ messages in thread
From: Michael Roth @ 2023-06-12  4:25 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	liam.merwick, zhi.a.wang, Brijesh Singh

From: Brijesh Singh <brijesh.singh@amd.com>

SEV-SNP guests are required to perform GHCB GPA registration. Before
using a GHCB GPA for a vCPU the first time, a guest must register the
vCPU GHCB GPA. If the hypervisor can work with the guest-requested GPA,
then it must respond back with the same GPA; otherwise it returns -1.

On VMEXIT, verify that the GHCB GPA matches the registered value. If a
mismatch is detected, then abort the guest.

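For reference, the guest side of this handshake looks roughly like the
Linux guest's snp_register_ghcb_early(); this is a sketch only, with the
helpers taken from the guest-side SEV support code:

  unsigned long pfn = ghcb_gpa >> PAGE_SHIFT;
  u64 val;

  sev_es_wr_ghcb_msr(GHCB_MSR_REG_GPA_REQ_VAL(pfn));
  VMGEXIT();
  val = sev_es_rd_ghcb_msr();

  /* The hypervisor must echo the registered GFN back in the response */
  if (GHCB_RESP_CODE(val) != GHCB_MSR_REG_GPA_RESP ||
      GHCB_MSR_REG_GPA_RESP_VAL(val) != pfn)
          sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_REGISTER);
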
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 arch/x86/include/asm/sev-common.h |  8 ++++++++
 arch/x86/kvm/svm/sev.c            | 27 +++++++++++++++++++++++++++
 arch/x86/kvm/svm/svm.h            |  7 +++++++
 3 files changed, 42 insertions(+)

diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
index aaea4afcda98..d499730e1f15 100644
--- a/arch/x86/include/asm/sev-common.h
+++ b/arch/x86/include/asm/sev-common.h
@@ -59,6 +59,14 @@
 #define GHCB_MSR_AP_RESET_HOLD_RESULT_POS	12
 #define GHCB_MSR_AP_RESET_HOLD_RESULT_MASK	GENMASK_ULL(51, 0)
 
+/* Preferred GHCB GPA Request */
+#define GHCB_MSR_PREF_GPA_REQ		0x010
+#define GHCB_MSR_GPA_VALUE_POS		12
+#define GHCB_MSR_GPA_VALUE_MASK		GENMASK_ULL(51, 0)
+
+#define GHCB_MSR_PREF_GPA_RESP		0x011
+#define GHCB_MSR_PREF_GPA_NONE		0xfffffffffffff
+
 /* GHCB GPA Register */
 #define GHCB_MSR_REG_GPA_REQ		0x012
 #define GHCB_MSR_REG_GPA_REQ_VAL(v)			\
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index a7cbdc24ccdb..44fdcf407759 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3312,6 +3312,27 @@ static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
 				  GHCB_MSR_INFO_MASK, GHCB_MSR_INFO_POS);
 		break;
 	}
+	case GHCB_MSR_PREF_GPA_REQ: {
+		set_ghcb_msr_bits(svm, GHCB_MSR_PREF_GPA_NONE, GHCB_MSR_GPA_VALUE_MASK,
+				  GHCB_MSR_GPA_VALUE_POS);
+		set_ghcb_msr_bits(svm, GHCB_MSR_PREF_GPA_RESP, GHCB_MSR_INFO_MASK,
+				  GHCB_MSR_INFO_POS);
+		break;
+	}
+	case GHCB_MSR_REG_GPA_REQ: {
+		u64 gfn;
+
+		gfn = get_ghcb_msr_bits(svm, GHCB_MSR_GPA_VALUE_MASK,
+					GHCB_MSR_GPA_VALUE_POS);
+
+		svm->sev_es.ghcb_registered_gpa = gfn_to_gpa(gfn);
+
+		set_ghcb_msr_bits(svm, gfn, GHCB_MSR_GPA_VALUE_MASK,
+				  GHCB_MSR_GPA_VALUE_POS);
+		set_ghcb_msr_bits(svm, GHCB_MSR_REG_GPA_RESP, GHCB_MSR_INFO_MASK,
+				  GHCB_MSR_INFO_POS);
+		break;
+	}
 	case GHCB_MSR_TERM_REQ: {
 		u64 reason_set, reason_code;
 
@@ -3378,6 +3399,12 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
 
 	exit_code = ghcb_get_sw_exit_code(ghcb);
 
+	/* SEV-SNP guest requires that the GHCB GPA must be registered */
+	if (sev_snp_guest(svm->vcpu.kvm) && !ghcb_gpa_is_registered(svm, ghcb_gpa)) {
+		vcpu_unimpl(&svm->vcpu, "vmgexit: GHCB GPA [%#llx] is not registered.\n", ghcb_gpa);
+		return -EINVAL;
+	}
+
 	ret = sev_es_validate_vmgexit(svm);
 	if (ret)
 		return ret;
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 9a7cafb018fe..02edbdd443e4 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -205,6 +205,8 @@ struct vcpu_sev_es_state {
 	u32 ghcb_sa_len;
 	bool ghcb_sa_sync;
 	bool ghcb_sa_free;
+
+	u64 ghcb_registered_gpa;
 };
 
 struct vcpu_svm {
@@ -359,6 +361,11 @@ static __always_inline bool sev_snp_guest(struct kvm *kvm)
 	return sev_es_guest(kvm) && sev->snp_active;
 }
 
+static inline bool ghcb_gpa_is_registered(struct vcpu_svm *svm, u64 val)
+{
+	return svm->sev_es.ghcb_registered_gpa == val;
+}
+
 static inline void vmcb_mark_all_dirty(struct vmcb *vmcb)
 {
 	vmcb->control.clean = 0;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH RFC v9 35/51] KVM: SVM: Add KVM_EXIT_VMGEXIT
  2023-06-12  4:25 [PATCH RFC v9 00/51] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (33 preceding siblings ...)
  2023-06-12  4:25 ` [PATCH RFC v9 34/51] KVM: SVM: Add support to handle GHCB GPA register VMGEXIT Michael Roth
@ 2023-06-12  4:25 ` Michael Roth
  2023-06-12  4:25 ` [PATCH RFC v9 36/51] KVM: SVM: Add support to handle MSR based Page State Change VMGEXIT Michael Roth
                   ` (15 subsequent siblings)
  50 siblings, 0 replies; 102+ messages in thread
From: Michael Roth @ 2023-06-12  4:25 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	liam.merwick, zhi.a.wang

For private memslots, GHCB page state change requests will be forwarded
to userspace for processing. Define a new KVM_EXIT_VMGEXIT exit reason for
exits of this type, which can also accommodate other potential userspace
handling of VMGEXITs in the future.

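A hypothetical sketch of the corresponding VMM-side handling;
'msr_protocol_request', 'handle_msr_psc' and 'handle_psc' are placeholder
helpers implementing the processing described by the GHCB specification:

  struct kvm_run *run = vcpu_run;  /* assumed mmap'd kvm_run region */

  switch (run->exit_reason) {
  case KVM_EXIT_VMGEXIT:
          if (msr_protocol_request(run->vmgexit.ghcb_msr))
                  /* MSR-based PSC: respond via the GHCB MSR value */
                  run->vmgexit.ghcb_msr = handle_msr_psc(run->vmgexit.ghcb_msr);
          else
                  /* page-based PSC: ghcb_msr holds the GHCB GPA */
                  run->vmgexit.ret = handle_psc(vm, run->vmgexit.ghcb_msr);
          break;
  }
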
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 Documentation/virt/kvm/api.rst | 34 ++++++++++++++++++++++++++++++++++
 include/uapi/linux/kvm.h       |  6 ++++++
 2 files changed, 40 insertions(+)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index df37aa11512d..028fd3fa50a7 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -6780,6 +6780,40 @@ Please note that the kernel is allowed to use the kvm_run structure as the
 primary storage for certain register types. Therefore, the kernel may use the
 values in kvm_run even if the corresponding bit in kvm_dirty_regs is not set.
 
+::
+
+		/* KVM_EXIT_VMGEXIT */
+		struct {
+			__u64 ghcb_msr; /* GHCB MSR contents */
+			__u64 ret;      /* user -> kernel return value */
+		} vmgexit;
+
+If the exit reason is KVM_EXIT_VMGEXIT, it indicates that an SEV-SNP guest has
+issued a VMGEXIT instruction (as documented by the AMD Architecture
+Programmer's Manual (APM)) to the hypervisor that needs to be serviced by
+userspace. This is generally handled via the Guest-Hypervisor Communication
+Block (GHCB) specification. The value of 'ghcb_msr' will be the contents of
+the GHCB MSR register at the time of the VMGEXIT, which can either be the GPA
+of the GHCB page for page-based GHCB requests, or an encoding of an MSR-based
+GHCB request. The mechanism to distinguish between these two and determine the
+type of request is the same as what is documented in the GHCB specification.
+
+Not all VMGEXITs or GHCB requests will be forwarded to userspace. Currently
+this will only be the case for "SNP Page State Change" requests (PSCs), and
+only for the subset of these which involve actual shared <-> private
+transitions. Userspace is expected to process these requests in accordance
+with the GHCB specification and issue KVM_SET_MEMORY_ATTRIBUTES ioctls to
+perform the shared/private transitions.
+
+GHCB page-based PSC requests require returning a 64-bit return value to the
+guest via the SW_EXITINFO2 field of the vCPU's VMCB structure, as documented
+in the GHCB specification. Userspace must set 'ret' to the value that the GHCB
+specification documents SW_EXITINFO2 should hold after processing a PSC request.
+
+For MSR-based PSC requests, userspace must set the value of 'ghcb_msr' to be
+the same as what the GHCB specification documents the actual GHCB MSR register
+should be set to after processing a PSC request.
+
 
 6. Capabilities that can be enabled on vCPUs
 ============================================
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 1fb6a6615d09..175b958f103f 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -279,6 +279,7 @@ struct kvm_xen_exit {
 #define KVM_EXIT_RISCV_CSR        36
 #define KVM_EXIT_NOTIFY           37
 #define KVM_EXIT_MEMORY_FAULT     38
+#define KVM_EXIT_VMGEXIT          50
 
 /* For KVM_EXIT_INTERNAL_ERROR */
 /* Emulate instruction failed. */
@@ -527,6 +528,11 @@ struct kvm_run {
 			__u64 gpa;
 			__u64 size;
 		} memory;
+		/* KVM_EXIT_VMGEXIT */
+		struct {
+			__u64 ghcb_msr; /* GHCB MSR contents */
+			__u64 ret; /* user -> kernel */
+		} vmgexit;
 		/* Fix the size of the union. */
 		char padding[256];
 	};
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH RFC v9 36/51] KVM: SVM: Add support to handle MSR based Page State Change VMGEXIT
  2023-06-12  4:25 [PATCH RFC v9 00/51] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (34 preceding siblings ...)
  2023-06-12  4:25 ` [PATCH RFC v9 35/51] KVM: SVM: Add KVM_EXIT_VMGEXIT Michael Roth
@ 2023-06-12  4:25 ` Michael Roth
  2023-06-12  4:25 ` [PATCH RFC v9 37/51] KVM: SVM: Add support to handle " Michael Roth
                   ` (14 subsequent siblings)
  50 siblings, 0 replies; 102+ messages in thread
From: Michael Roth @ 2023-06-12  4:25 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	liam.merwick, zhi.a.wang, Brijesh Singh

From: Brijesh Singh <brijesh.singh@amd.com>

SEV-SNP VMs can ask the hypervisor to change the page state in the RMP
table to be private or shared using the Page State Change MSR protocol
as defined in the GHCB specification.

When using gmem, private/shared memory is allocated through separate
pools, and KVM relies on userspace issuing a KVM_SET_MEMORY_ATTRIBUTES
ioctl to tell the KVM MMU whether a particular GFN should be backed by
private memory.

Forward these page state change requests to userspace so that it can
issue the expected KVM ioctls. The KVM MMU will handle updating the RMP
entries when it is ready to map a private page into a guest.
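
For illustration, and continuing the hypothetical VMM sketch from the
previous patch: using the MSR protocol encodings from the GHCB
specification (operation in GHCBData[55:52], GFN in GHCBData[51:12],
response info value 0x015 with an error code in bits [63:32]), a
userspace handler might look like the sketch below. The
KVM_SET_MEMORY_ATTRIBUTES ioctl and struct come from the UPM/gmem series
this is based on; treat the details as assumptions, not a reference:

  #define PSC_OP_PRIVATE  1       /* page operation values per the GHCB spec */
  #define PSC_OP_SHARED   2

  /* Assumes <sys/ioctl.h> and <linux/kvm.h> from a UPM-enabled kernel. */
  static __u64 handle_msr_psc(int vm_fd, __u64 req)
  {
          __u64 gfn = (req >> 12) & ((1ULL << 40) - 1);
          __u64 op  = (req >> 52) & 0xf;
          struct kvm_memory_attributes attrs = {
                  .address    = gfn << 12,
                  .size       = 4096,
                  .attributes = (op == PSC_OP_PRIVATE) ?
                                KVM_MEMORY_ATTRIBUTE_PRIVATE : 0,
          };
          /* A nonzero error code in bits [63:32] reports failure to the guest. */
          __u64 err = ioctl(vm_fd, KVM_SET_MEMORY_ATTRIBUTES, &attrs) ? 1 : 0;

          return (err << 32) | 0x015;
  }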

Co-developed-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
---
 arch/x86/kvm/svm/sev.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 44fdcf407759..2afc59b86b91 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3233,6 +3233,15 @@ static void set_ghcb_msr(struct vcpu_svm *svm, u64 value)
 	svm->vmcb->control.ghcb_gpa = value;
 }
 
+static int snp_complete_psc_msr_protocol(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_svm *svm = to_svm(vcpu);
+
+	set_ghcb_msr(svm, vcpu->run->vmgexit.ghcb_msr);
+
+	return 1; /* resume */
+}
+
 static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
 {
 	struct vmcb_control_area *control = &svm->vmcb->control;
@@ -3333,6 +3342,13 @@ static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
 				  GHCB_MSR_INFO_POS);
 		break;
 	}
+	case GHCB_MSR_PSC_REQ:
+		vcpu->run->exit_reason = KVM_EXIT_VMGEXIT;
+		vcpu->run->vmgexit.ghcb_msr = control->ghcb_gpa;
+		vcpu->arch.complete_userspace_io = snp_complete_psc_msr_protocol;
+
+		ret = -1;
+		break;
 	case GHCB_MSR_TERM_REQ: {
 		u64 reason_set, reason_code;
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH RFC v9 37/51] KVM: SVM: Add support to handle Page State Change VMGEXIT
  2023-06-12  4:25 [PATCH RFC v9 00/51] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (35 preceding siblings ...)
  2023-06-12  4:25 ` [PATCH RFC v9 36/51] KVM: SVM: Add support to handle MSR based Page State Change VMGEXIT Michael Roth
@ 2023-06-12  4:25 ` Michael Roth
  2023-06-12  4:25 ` [PATCH RFC v9 38/51] KVM: x86: Export the kvm_zap_gfn_range() for the SNP use Michael Roth
                   ` (13 subsequent siblings)
  50 siblings, 0 replies; 102+ messages in thread
From: Michael Roth @ 2023-06-12  4:25 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	liam.merwick, zhi.a.wang, Brijesh Singh

From: Brijesh Singh <brijesh.singh@amd.com>

SEV-SNP VMs can ask the hypervisor to change the page state in the RMP
table to be private or shared using the Page State Change NAE event
as defined in the GHCB specification version 2.

Forward these requests to userspace as KVM_EXIT_VMGEXITs, similar to how
it is done for requests that don't use a GHCB page.
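
Continuing the hypothetical VMM sketch from the two previous patches:
for the page-based protocol, userspace parses the PSC descriptor out of
the shared GHCB page (whose GPA arrives in run->vmgexit.ghcb_msr),
applies the requested attribute changes, and then reports a result that
snp_complete_psc() below writes into SW_EXITINFO2. Zero means success
per the GHCB specification; PSC_ERROR here stands in for a spec-defined
error encoding:

  run->vmgexit.ret = all_entries_processed_ok ? 0 : PSC_ERROR;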

Co-developed-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
---
 arch/x86/kvm/svm/sev.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 2afc59b86b91..9b9dff7728c8 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3039,6 +3039,7 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
 	case SVM_VMGEXIT_AP_JUMP_TABLE:
 	case SVM_VMGEXIT_UNSUPPORTED_EVENT:
 	case SVM_VMGEXIT_HV_FEATURES:
+	case SVM_VMGEXIT_PSC:
 		break;
 	default:
 		reason = GHCB_ERR_INVALID_EVENT;
@@ -3242,6 +3243,15 @@ static int snp_complete_psc_msr_protocol(struct kvm_vcpu *vcpu)
 	return 1; /* resume */
 }
 
+static int snp_complete_psc(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_svm *svm = to_svm(vcpu);
+
+	ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, vcpu->run->vmgexit.ret);
+
+	return 1; /* resume */
+}
+
 static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
 {
 	struct vmcb_control_area *control = &svm->vmcb->control;
@@ -3485,6 +3495,12 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
 		ret = 1;
 		break;
 	}
+	case SVM_VMGEXIT_PSC:
+		/* Let userspace handle allocating/deallocating backing pages. */
+		vcpu->run->exit_reason = KVM_EXIT_VMGEXIT;
+		vcpu->run->vmgexit.ghcb_msr = ghcb_gpa;
+		vcpu->arch.complete_userspace_io = snp_complete_psc;
+		break;
 	case SVM_VMGEXIT_UNSUPPORTED_EVENT:
 		vcpu_unimpl(vcpu,
 			    "vmgexit: unsupported event - exit_info_1=%#llx, exit_info_2=%#llx\n",
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH RFC v9 38/51] KVM: x86: Export the kvm_zap_gfn_range() for the SNP use
  2023-06-12  4:25 [PATCH RFC v9 00/51] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (36 preceding siblings ...)
  2023-06-12  4:25 ` [PATCH RFC v9 37/51] KVM: SVM: Add support to handle " Michael Roth
@ 2023-06-12  4:25 ` Michael Roth
  2023-06-12  4:25 ` [PATCH RFC v9 39/51] KVM: x86: Define RMP page fault error bits for #NPF Michael Roth
                   ` (12 subsequent siblings)
  50 siblings, 0 replies; 102+ messages in thread
From: Michael Roth @ 2023-06-12  4:25 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	liam.merwick, zhi.a.wang, Brijesh Singh

From: Brijesh Singh <brijesh.singh@amd.com>

While resolving the RMP page fault, there may be cases where the page
level of the RMP entry and of the TDP mapping do not match, and the 2M
RMP entry must be split into 4K RMP entries, or a 2M TDP page needs to
be broken into multiple 4K pages.

To keep the RMP and TDP page levels in sync, zap the gfn range after
splitting the pages in the RMP entry. The zap forces the TDP mappings
to be rebuilt at the new page level.
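
A minimal sketch of the intended calling pattern (the actual caller is
the RMP fault handler added later in this series; the explicit 2M
alignment is assumed here for illustration):

  /*
   * Split the 2M RMP entry covering pfn into 4K entries, then zap the
   * covering GFN range so the NPT is rebuilt at the matching level.
   */
  gfn_t gfn_2m = gfn & ~(gfn_t)(PTRS_PER_PMD - 1);

  psmash(pfn & ~(KVM_PAGES_PER_HPAGE(PG_LEVEL_2M) - 1));
  kvm_zap_gfn_range(kvm, gfn_2m, gfn_2m + PTRS_PER_PMD);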

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 arch/x86/include/asm/kvm_host.h | 2 ++
 arch/x86/kvm/mmu.h              | 2 --
 arch/x86/kvm/mmu/mmu.c          | 1 +
 3 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 8d2bb3ff66a2..026bfc4446ee 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1851,6 +1851,8 @@ void kvm_mmu_slot_leaf_clear_dirty(struct kvm *kvm,
 void kvm_mmu_zap_all(struct kvm *kvm);
 void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm, u64 gen);
 void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned long kvm_nr_mmu_pages);
+void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end);
+
 
 int load_pdptrs(struct kvm_vcpu *vcpu, unsigned long cr3);
 
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 92d5a1924fc1..963c734642f6 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -235,8 +235,6 @@ static inline u8 permission_fault(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
 	return -(u32)fault & errcode;
 }
 
-void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end);
-
 int kvm_arch_write_log_dirty(struct kvm_vcpu *vcpu);
 
 int kvm_mmu_post_init_vm(struct kvm *kvm);
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 0d3983b9aa7e..dff0eb018b27 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -6732,6 +6732,7 @@ static bool kvm_mmu_zap_collapsible_spte(struct kvm *kvm,
 
 	return need_tlb_flush;
 }
+EXPORT_SYMBOL_GPL(kvm_zap_gfn_range);
 
 static void kvm_rmap_zap_collapsible_sptes(struct kvm *kvm,
 					   const struct kvm_memory_slot *slot)
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH RFC v9 39/51] KVM: x86: Define RMP page fault error bits for #NPF
  2023-06-12  4:25 [PATCH RFC v9 00/51] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (37 preceding siblings ...)
  2023-06-12  4:25 ` [PATCH RFC v9 38/51] KVM: x86: Export the kvm_zap_gfn_range() for the SNP use Michael Roth
@ 2023-06-12  4:25 ` Michael Roth
  2023-06-12  4:25 ` [PATCH RFC v9 40/51] KVM: SVM: Add support to handle RMP nested page faults Michael Roth
                   ` (11 subsequent siblings)
  50 siblings, 0 replies; 102+ messages in thread
From: Michael Roth @ 2023-06-12  4:25 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	liam.merwick, zhi.a.wang, Brijesh Singh

From: Brijesh Singh <brijesh.singh@amd.com>

When SEV-SNP is enabled globally, the hardware places restrictions on all
memory accesses based on the RMP entry, whether the hypervisor or a VM
performs the access. When hardware encounters an RMP access violation
during a guest access, it causes a #VMEXIT(NPF).

See APM2 section 16.36.10 for more details.
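
For reference, handlers can classify such faults by combining the new
masks; a small illustrative helper (not part of this patch):

  /* An RMP page-size mismatch, e.g. a 4K PVALIDATE against a 2M RMP entry. */
  static bool npf_is_rmp_size_mismatch(u64 error_code)
  {
          const u64 mask = PFERR_GUEST_RMP_MASK | PFERR_GUEST_SIZEM_MASK;

          return (error_code & mask) == mask;
  }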

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 arch/x86/include/asm/kvm_host.h | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 026bfc4446ee..2fcd309fd9fb 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -253,9 +253,13 @@ enum x86_intercept_stage;
 #define PFERR_FETCH_BIT 4
 #define PFERR_PK_BIT 5
 #define PFERR_SGX_BIT 15
+#define PFERR_GUEST_RMP_BIT 31
 #define PFERR_GUEST_FINAL_BIT 32
 #define PFERR_GUEST_PAGE_BIT 33
+#define PFERR_GUEST_ENC_BIT 34
+#define PFERR_GUEST_SIZEM_BIT 35
+#define PFERR_GUEST_VMPL_BIT 36
 #define PFERR_IMPLICIT_ACCESS_BIT 48
 
 #define PFERR_PRESENT_MASK	BIT(PFERR_PRESENT_BIT)
 #define PFERR_WRITE_MASK	BIT(PFERR_WRITE_BIT)
@@ -267,6 +271,10 @@ enum x86_intercept_stage;
 #define PFERR_GUEST_FINAL_MASK	BIT_ULL(PFERR_GUEST_FINAL_BIT)
 #define PFERR_GUEST_PAGE_MASK	BIT_ULL(PFERR_GUEST_PAGE_BIT)
 #define PFERR_IMPLICIT_ACCESS	BIT_ULL(PFERR_IMPLICIT_ACCESS_BIT)
+#define PFERR_GUEST_RMP_MASK	BIT_ULL(PFERR_GUEST_RMP_BIT)
+#define PFERR_GUEST_ENC_MASK	BIT_ULL(PFERR_GUEST_ENC_BIT)
+#define PFERR_GUEST_SIZEM_MASK	BIT_ULL(PFERR_GUEST_SIZEM_BIT)
+#define PFERR_GUEST_VMPL_MASK	BIT_ULL(PFERR_GUEST_VMPL_BIT)
 
 #define PFERR_NESTED_GUEST_PAGE (PFERR_GUEST_PAGE_MASK |	\
 				 PFERR_WRITE_MASK |		\
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH RFC v9 40/51] KVM: SVM: Add support to handle RMP nested page faults
  2023-06-12  4:25 [PATCH RFC v9 00/51] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (38 preceding siblings ...)
  2023-06-12  4:25 ` [PATCH RFC v9 39/51] KVM: x86: Define RMP page fault error bits for #NPF Michael Roth
@ 2023-06-12  4:25 ` Michael Roth
  2023-06-12  4:25 ` [PATCH RFC v9 41/51] KVM: SVM: Use a VMSA physical address variable for populating VMCB Michael Roth
                   ` (10 subsequent siblings)
  50 siblings, 0 replies; 102+ messages in thread
From: Michael Roth @ 2023-06-12  4:25 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	liam.merwick, zhi.a.wang, Brijesh Singh

From: Brijesh Singh <brijesh.singh@amd.com>

When SEV-SNP is enabled in the guest, the hardware places restrictions
on all memory accesses based on the contents of the RMP table. When
hardware encounters an RMP check failure caused by a guest memory
access, it raises a #NPF. The error code contains additional information
on the access type. See the APM volume 2 for additional information.

When using gmem, RMP faults resulting from mismatches between the state
in the RMP table vs. what the guest expects via its page table result
in KVM_EXIT_MEMORY_FAULTs being forwarded to userspace to handle. This
means the only expected case that needs to be handled in the kernel is
when the page size of the entry in the RMP table is larger than the
mapping in the nested page table, in which case a PSMASH instruction
needs to be issued to split the large RMP entry into individual 4K
entries so that subsequent accesses can succeed.
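
Since a 2M RMP entry covers 512 4K PFNs, the handler rounds the
faulting PFN down to the 2M boundary before issuing PSMASH; for
example, PFN 0x1234abc & ~0x1ff = 0x1234a00, and the resulting 4K RMP
entries cover PFNs 0x1234a00 through 0x1234bff.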

Co-developed-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
---
 arch/x86/kvm/svm/sev.c | 85 ++++++++++++++++++++++++++++++++++++++++++
 arch/x86/kvm/svm/svm.c | 21 +++++++++--
 arch/x86/kvm/svm/svm.h |  1 +
 3 files changed, 103 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 9b9dff7728c8..1ba49c5ebaed 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3234,6 +3234,13 @@ static void set_ghcb_msr(struct vcpu_svm *svm, u64 value)
 	svm->vmcb->control.ghcb_gpa = value;
 }
 
+static int snp_rmptable_psmash(struct kvm *kvm, kvm_pfn_t pfn)
+{
+	pfn = pfn & ~(KVM_PAGES_PER_HPAGE(PG_LEVEL_2M) - 1);
+
+	return psmash(pfn);
+}
+
 static int snp_complete_psc_msr_protocol(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
@@ -3696,3 +3703,81 @@ struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu)
 
 	return p;
 }
+
+void handle_rmp_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code)
+{
+	struct kvm_memory_slot *slot;
+	struct kvm *kvm = vcpu->kvm;
+	int order, rmp_level, ret;
+	bool assigned;
+	kvm_pfn_t pfn;
+	gfn_t gfn;
+
+	/*
+	 * Private memslots forward handling of implicit page state changes
+	 * to userspace, so the only RMP faults expected here are for
+	 * PFERR_GUEST_SIZEM_MASK. Anything else generally suggests that the
+	 * RMP table has gotten out of sync with the private memslot.
+	 *
+	 * However, there is a transient case where access to an NPT mapping
+	 * that has just been split/PSMASH'd can generate an RMP fault. In this
+	 * case the PFERR_GUEST_SIZEM bit might not be set. In these cases it
+	 * should be safe to ignore and let the guest retry, but allow for
+	 * these to be optionally logged to diagnose exceptional cases.
+	 */
+	if (!(error_code & PFERR_GUEST_SIZEM_MASK)) {
+		pr_debug_ratelimited("Unexpected RMP fault for GPA 0x%llx, error_code 0x%llx",
+				     gpa, error_code);
+		return;
+	}
+
+	gfn = gpa >> PAGE_SHIFT;
+
+	/*
+	 * Only RMPADJUST/PVALIDATE should cause PFERR_GUEST_SIZEM.
+	 *
+	 * For PVALIDATE, this should only happen if a guest PVALIDATEs a 4K GFN
+	 * that is backed by a huge page in the host whose RMP entry has the
+	 * hugepage/assigned bits set. With UPM, that should only ever happen
+	 * for private pages.
+	 *
+	 * For RMPADJUST, this assumption might not hold, in which case handling
+	 * for obtaining the PFN from HVA-backed memory may be needed. For now,
+	 * just print warnings.
+	 */
+	if (!kvm_mem_is_private(kvm, gfn)) {
+		pr_warn_ratelimited("Unexpected RMP fault, size-mismatch for non-private GPA 0x%llx\n",
+				    gpa);
+		return;
+	}
+
+	slot = gfn_to_memslot(kvm, gfn);
+	if (!kvm_slot_can_be_private(slot)) {
+		pr_warn_ratelimited("Unexpected RMP fault, non-private slot for GPA 0x%llx\n",
+				    gpa);
+		return;
+	}
+
+	ret = kvm_gmem_get_pfn(kvm, slot, gfn, &pfn, &order);
+	if (ret) {
+		pr_warn_ratelimited("Unexpected RMP fault, no private backing page for GPA 0x%llx\n",
+				    gpa);
+		return;
+	}
+
+	ret = snp_lookup_rmpentry(pfn, &assigned, &rmp_level);
+	if (ret || !assigned) {
+		pr_warn_ratelimited("Unexpected RMP fault, no assigned RMP entry found for GPA 0x%llx PFN 0x%llx error %d\n",
+				    gpa, pfn, ret);
+		goto out;
+	}
+
+	ret = snp_rmptable_psmash(kvm, pfn);
+	if (ret)
+		pr_err_ratelimited("Unable to split RMP entries for GPA 0x%llx PFN 0x%llx ret %d\n",
+				   gpa, pfn, ret);
+
+out:
+	kvm_zap_gfn_range(kvm, gfn, gfn + PTRS_PER_PMD);
+	put_page(pfn_to_page(pfn));
+}
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 065167b42f90..0cff050bf5bb 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1995,15 +1995,28 @@ static int pf_interception(struct kvm_vcpu *vcpu)
 static int npf_interception(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
+	int rc;
 
 	u64 fault_address = svm->vmcb->control.exit_info_2;
 	u64 error_code = svm->vmcb->control.exit_info_1;
 
 	trace_kvm_page_fault(vcpu, fault_address, error_code);
-	return kvm_mmu_page_fault(vcpu, fault_address, error_code,
-			static_cpu_has(X86_FEATURE_DECODEASSISTS) ?
-			svm->vmcb->control.insn_bytes : NULL,
-			svm->vmcb->control.insn_len);
+	rc = kvm_mmu_page_fault(vcpu, fault_address, error_code,
+				static_cpu_has(X86_FEATURE_DECODEASSISTS) ?
+				svm->vmcb->control.insn_bytes : NULL,
+				svm->vmcb->control.insn_len);
+
+	/*
+	 * rc == 0 indicates a userspace exit is needed to handle page
+	 * transitions, so do that first before updating the RMP table.
+	 */
+	if (error_code & PFERR_GUEST_RMP_MASK) {
+		if (rc == 0)
+			return rc;
+		handle_rmp_page_fault(vcpu, fault_address, error_code);
+	}
+
+	return rc;
 }
 
 static int db_interception(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 02edbdd443e4..4cf9dbc442e9 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -762,6 +762,7 @@ void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector);
 void sev_es_prepare_switch_to_guest(struct sev_es_save_area *hostsa);
 void sev_es_unmap_ghcb(struct vcpu_svm *svm);
 struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu);
+void handle_rmp_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code);
 
 /* vmenter.S */
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH RFC v9 41/51] KVM: SVM: Use a VMSA physical address variable for populating VMCB
  2023-06-12  4:25 [PATCH RFC v9 00/51] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (39 preceding siblings ...)
  2023-06-12  4:25 ` [PATCH RFC v9 40/51] KVM: SVM: Add support to handle RMP nested page faults Michael Roth
@ 2023-06-12  4:25 ` Michael Roth
  2023-06-12  4:25 ` [PATCH RFC v9 42/51] KVM: SVM: Support SEV-SNP AP Creation NAE event Michael Roth
                   ` (9 subsequent siblings)
  50 siblings, 0 replies; 102+ messages in thread
From: Michael Roth @ 2023-06-12  4:25 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	liam.merwick, zhi.a.wang

From: Tom Lendacky <thomas.lendacky@amd.com>

In preparation to support SEV-SNP AP Creation, use a variable that holds
the VMSA physical address rather than converting the virtual address.
This will allow SEV-SNP AP Creation to set the new physical address that
will be used should the vCPU reset path be taken.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 arch/x86/kvm/svm/sev.c | 5 ++---
 arch/x86/kvm/svm/svm.c | 9 ++++++++-
 arch/x86/kvm/svm/svm.h | 1 +
 3 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 1ba49c5ebaed..111e43eede15 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3551,10 +3551,9 @@ static void sev_es_init_vmcb(struct vcpu_svm *svm)
 
 	/*
 	 * An SEV-ES guest requires a VMSA area that is a separate from the
-	 * VMCB page. Do not include the encryption mask on the VMSA physical
-	 * address since hardware will access it using the guest key.
+	 * VMCB page.
 	 */
-	svm->vmcb->control.vmsa_pa = __pa(svm->sev_es.vmsa);
+	svm->vmcb->control.vmsa_pa = svm->sev_es.vmsa_pa;
 
 	/* Can't intercept CR register access, HV can't modify CR registers */
 	svm_clr_intercept(svm, INTERCEPT_CR0_READ);
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 0cff050bf5bb..77195d8c1aa3 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1419,9 +1419,16 @@ static int svm_vcpu_create(struct kvm_vcpu *vcpu)
 	svm->vmcb01.pa = __sme_set(page_to_pfn(vmcb01_page) << PAGE_SHIFT);
 	svm_switch_vmcb(svm, &svm->vmcb01);
 
-	if (vmsa_page)
+	if (vmsa_page) {
 		svm->sev_es.vmsa = page_address(vmsa_page);
 
+		/*
+		 * Do not include the encryption mask on the VMSA physical
+		 * address since hardware will access it using the guest key.
+		 */
+		svm->sev_es.vmsa_pa = __pa(svm->sev_es.vmsa);
+	}
+
 	svm->guest_state_loaded = false;
 
 	return 0;
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 4cf9dbc442e9..8dc7946ab634 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -197,6 +197,7 @@ struct vcpu_sev_es_state {
 	struct sev_es_save_area *vmsa;
 	struct ghcb *ghcb;
 	struct kvm_host_map ghcb_map;
+	hpa_t vmsa_pa;
 	bool received_first_sipi;
 	unsigned int ap_reset_hold_type;
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH RFC v9 42/51] KVM: SVM: Support SEV-SNP AP Creation NAE event
  2023-06-12  4:25 [PATCH RFC v9 00/51] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (40 preceding siblings ...)
  2023-06-12  4:25 ` [PATCH RFC v9 41/51] KVM: SVM: Use a VMSA physical address variable for populating VMCB Michael Roth
@ 2023-06-12  4:25 ` Michael Roth
  2023-08-15 16:00   ` Peter Gonda
  2023-06-12  4:25 ` [PATCH RFC v9 43/51] KVM: SEV: Configure MMU to check for private fault flags Michael Roth
                   ` (8 subsequent siblings)
  50 siblings, 1 reply; 102+ messages in thread
From: Michael Roth @ 2023-06-12  4:25 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	liam.merwick, zhi.a.wang, Brijesh Singh

From: Tom Lendacky <thomas.lendacky@amd.com>

Add support for the SEV-SNP AP Creation NAE event. This allows SEV-SNP
guests to alter the register state of the APs on their own, giving the
guest a way to simulate INIT-SIPI.

A new event, KVM_REQ_UPDATE_PROTECTED_GUEST_STATE, is created and used
so as to avoid updating the VMSA pointer while the vCPU is running.

For CREATE
  The guest supplies the GPA of the VMSA to be used for the vCPU with
  the specified APIC ID. The GPA is saved in the svm struct of the
  target vCPU, the KVM_REQ_UPDATE_PROTECTED_GUEST_STATE event is added
  to the vCPU and then the vCPU is kicked.

For CREATE_ON_INIT:
  The guest supplies the GPA of the VMSA to be used for the vCPU with
  the specified APIC ID the next time an INIT is performed. The GPA is
  saved in the svm struct of the target vCPU.

For DESTROY:
  The guest indicates it wishes to stop the vCPU. The GPA is cleared
  from the svm struct, the KVM_REQ_UPDATE_PROTECTED_GUEST_STATE event is
  added to vCPU and then the vCPU is kicked.

The KVM_REQ_UPDATE_PROTECTED_GUEST_STATE event handler will be invoked
as a result of the event or as a result of an INIT. The handler first
marks the vCPU as not runnable (KVM_MP_STATE_HALTED), so that any errors
will leave the vCPU in that state. Any previous VMSA pages that were
installed as part of an SEV-SNP AP Creation NAE event are un-pinned. If
a new VMSA is to be installed, the VMSA guest page is pinned and set as
the VMSA in the vCPU VMCB and the vCPU state is set to
KVM_MP_STATE_RUNNABLE. If a new VMSA is not to be installed, the VMSA is
cleared in the vCPU VMCB and the vCPU state is left as not runnable to
prevent it from being run.
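
For reference, the NAE event parameters as consumed by the handler in
this patch (summarized from the code below):

  request      = lower_32_bits(vmcb->control.exit_info_1); /* CREATE, CREATE_ON_INIT, DESTROY */
  apic_id      = upper_32_bits(vmcb->control.exit_info_1);
  vmsa_gpa     = vmcb->control.exit_info_2;                /* CREATE* requests only */
  sev_features = vcpu->arch.regs[VCPU_REGS_RAX];           /* must not change injection mode */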

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
[mdr: add handling for restrictedmem]
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 arch/x86/include/asm/kvm_host.h |   1 +
 arch/x86/include/asm/svm.h      |   7 +-
 arch/x86/kvm/svm/sev.c          | 240 ++++++++++++++++++++++++++++++++
 arch/x86/kvm/svm/svm.c          |   3 +
 arch/x86/kvm/svm/svm.h          |   8 +-
 arch/x86/kvm/x86.c              |  11 ++
 6 files changed, 268 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 2fcd309fd9fb..8f515e0386a0 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -113,6 +113,7 @@
 	KVM_ARCH_REQ_FLAGS(31, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
 #define KVM_REQ_HV_TLB_FLUSH \
 	KVM_ARCH_REQ_FLAGS(32, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
+#define KVM_REQ_UPDATE_PROTECTED_GUEST_STATE	KVM_ARCH_REQ(34)
 
 #define CR0_RESERVED_BITS                                               \
 	(~(unsigned long)(X86_CR0_PE | X86_CR0_MP | X86_CR0_EM | X86_CR0_TS \
diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index ac8edfdd60fa..0deb83ac800b 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -288,7 +288,12 @@ static_assert((X2AVIC_MAX_PHYSICAL_ID & AVIC_PHYSICAL_MAX_INDEX_MASK) == X2AVIC_
 
 #define AVIC_HPA_MASK	~((0xFFFULL << 52) | 0xFFF)
 
-#define SVM_SEV_FEAT_SNP_ACTIVE		BIT(0)
+#define SVM_SEV_FEAT_SNP_ACTIVE			BIT(0)
+#define SVM_SEV_FEAT_RESTRICTED_INJECTION	BIT(3)
+#define SVM_SEV_FEAT_ALTERNATE_INJECTION	BIT(4)
+#define SVM_SEV_FEAT_INT_INJ_MODES		\
+	(SVM_SEV_FEAT_RESTRICTED_INJECTION |	\
+	 SVM_SEV_FEAT_ALTERNATE_INJECTION)
 
 struct vmcb_seg {
 	u16 selector;
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 111e43eede15..ec74ff5e09c7 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -638,6 +638,7 @@ static int sev_launch_update_data(struct kvm *kvm, struct kvm_sev_cmd *argp)
 
 static int sev_es_sync_vmsa(struct vcpu_svm *svm)
 {
+	struct kvm_sev_info *sev = &to_kvm_svm(svm->vcpu.kvm)->sev_info;
 	struct sev_es_save_area *save = svm->sev_es.vmsa;
 
 	/* Check some debug related fields before encrypting the VMSA */
@@ -683,6 +684,12 @@ static int sev_es_sync_vmsa(struct vcpu_svm *svm)
 	if (sev_snp_guest(svm->vcpu.kvm))
 		save->sev_features |= SVM_SEV_FEAT_SNP_ACTIVE;
 
+	/*
+	 * Save the VMSA synced SEV features. For now, they are the same for
+	 * all vCPUs, so just save each time.
+	 */
+	sev->sev_features = save->sev_features;
+
 	pr_debug("Virtual Machine Save Area (VMSA):\n");
 	print_hex_dump_debug("", DUMP_PREFIX_NONE, 16, 1, save, sizeof(*save), false);
 
@@ -3034,6 +3041,11 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
 		if (!ghcb_sw_scratch_is_valid(ghcb))
 			goto vmgexit_err;
 		break;
+	case SVM_VMGEXIT_AP_CREATION:
+		if (lower_32_bits(ghcb_get_sw_exit_info_1(ghcb)) != SVM_VMGEXIT_AP_DESTROY)
+			if (!ghcb_rax_is_valid(ghcb))
+				goto vmgexit_err;
+		break;
 	case SVM_VMGEXIT_NMI_COMPLETE:
 	case SVM_VMGEXIT_AP_HLT_LOOP:
 	case SVM_VMGEXIT_AP_JUMP_TABLE:
@@ -3259,6 +3271,220 @@ static int snp_complete_psc(struct kvm_vcpu *vcpu)
 	return 1; /* resume */
 }
 
+static kvm_pfn_t gfn_to_pfn_gmem(struct kvm *kvm, gfn_t gfn)
+{
+	struct kvm_memory_slot *slot;
+	kvm_pfn_t pfn;
+	int order = 0;
+
+	slot = gfn_to_memslot(kvm, gfn);
+	if (!kvm_slot_can_be_private(slot)) {
+		pr_err("SEV: Failure retrieving restricted memslot for GFN 0x%llx, flags 0x%x, userspace_addr: 0x%lx\n",
+		       gfn, slot->flags, slot->userspace_addr);
+		return INVALID_PAGE;
+	}
+
+	if (!kvm_mem_is_private(kvm, gfn)) {
+		pr_err("SEV: Failure retrieving restricted PFN for GFN 0x%llx\n", gfn);
+		return INVALID_PAGE;
+	}
+
+	if (kvm_gmem_get_pfn(kvm, slot, gfn, &pfn, &order)) {
+		pr_err("SEV: Failure retrieving restricted PFN for GFN 0x%llx\n", gfn);
+		return INVALID_PAGE;
+	}
+
+	return pfn;
+}
+
+static int __sev_snp_update_protected_guest_state(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_svm *svm = to_svm(vcpu);
+	kvm_pfn_t pfn;
+	hpa_t cur_pa;
+
+	WARN_ON(!mutex_is_locked(&svm->sev_es.snp_vmsa_mutex));
+
+	/* Save off the current VMSA PA for later checks */
+	cur_pa = svm->sev_es.vmsa_pa;
+
+	/* Mark the vCPU as offline and not runnable */
+	vcpu->arch.pv.pv_unhalted = false;
+	vcpu->arch.mp_state = KVM_MP_STATE_HALTED;
+
+	/* Clear use of the VMSA */
+	svm->sev_es.vmsa_pa = INVALID_PAGE;
+	svm->vmcb->control.vmsa_pa = INVALID_PAGE;
+
+	/*
+	 * sev->sev_es.vmsa holds the virtual address of the VMSA initially
+	 * allocated by the host. If the guest specified a new VMSA via
+	 * AP_CREATION, it will have been pinned to avoid future issues
+	 * with things like page migration support. Make sure to un-pin it
+	 * before switching to a newer guest-specified VMSA.
+	 */
+	if (cur_pa != __pa(svm->sev_es.vmsa) && VALID_PAGE(cur_pa))
+		kvm_release_pfn_dirty(__phys_to_pfn(cur_pa));
+
+	if (VALID_PAGE(svm->sev_es.snp_vmsa_gpa)) {
+		/*
+		 * The VMSA is referenced by the hypervisor physical address,
+		 * so retrieve the PFN and ensure it is restricted memory.
+		 */
+		pfn = gfn_to_pfn_gmem(vcpu->kvm, gpa_to_gfn(svm->sev_es.snp_vmsa_gpa));
+		if (!VALID_PAGE(pfn) || is_error_pfn(pfn))
+			return -EINVAL;
+
+		/* Use the new VMSA */
+		svm->sev_es.vmsa_pa = pfn_to_hpa(pfn);
+		svm->vmcb->control.vmsa_pa = svm->sev_es.vmsa_pa;
+
+		/* Mark the vCPU as runnable */
+		vcpu->arch.pv.pv_unhalted = false;
+		vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
+
+		svm->sev_es.snp_vmsa_gpa = INVALID_PAGE;
+	}
+
+	/*
+	 * When replacing the VMSA during SEV-SNP AP creation,
+	 * mark the VMCB dirty so that full state is always reloaded.
+	 */
+	vmcb_mark_all_dirty(svm->vmcb);
+
+	return 0;
+}
+
+/*
+ * Invoked as part of svm_vcpu_reset() processing of an init event.
+ */
+void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_svm *svm = to_svm(vcpu);
+	int ret;
+
+	if (!sev_snp_guest(vcpu->kvm))
+		return;
+
+	mutex_lock(&svm->sev_es.snp_vmsa_mutex);
+
+	if (!svm->sev_es.snp_ap_create)
+		goto unlock;
+
+	svm->sev_es.snp_ap_create = false;
+
+	ret = __sev_snp_update_protected_guest_state(vcpu);
+	if (ret)
+		vcpu_unimpl(vcpu, "snp: AP state update on init failed\n");
+
+unlock:
+	mutex_unlock(&svm->sev_es.snp_vmsa_mutex);
+}
+
+static int sev_snp_ap_creation(struct vcpu_svm *svm)
+{
+	struct kvm_sev_info *sev = &to_kvm_svm(svm->vcpu.kvm)->sev_info;
+	struct kvm_vcpu *vcpu = &svm->vcpu;
+	struct kvm_vcpu *target_vcpu;
+	struct vcpu_svm *target_svm;
+	unsigned int request;
+	unsigned int apic_id;
+	bool kick;
+	int ret;
+
+	request = lower_32_bits(svm->vmcb->control.exit_info_1);
+	apic_id = upper_32_bits(svm->vmcb->control.exit_info_1);
+
+	/* Validate the APIC ID */
+	target_vcpu = kvm_get_vcpu_by_id(vcpu->kvm, apic_id);
+	if (!target_vcpu) {
+		vcpu_unimpl(vcpu, "vmgexit: invalid AP APIC ID [%#x] from guest\n",
+			    apic_id);
+		return -EINVAL;
+	}
+
+	ret = 0;
+
+	target_svm = to_svm(target_vcpu);
+
+	/*
+	 * The target vCPU is valid, so the vCPU will be kicked unless the
+	 * request is for CREATE_ON_INIT. For any errors at this stage, the
+	 * kick will place the vCPU in an non-runnable state.
+	 */
+	kick = true;
+
+	mutex_lock(&target_svm->sev_es.snp_vmsa_mutex);
+
+	target_svm->sev_es.snp_vmsa_gpa = INVALID_PAGE;
+	target_svm->sev_es.snp_ap_create = true;
+
+	/* Interrupt injection mode shouldn't change for AP creation */
+	if (request < SVM_VMGEXIT_AP_DESTROY) {
+		u64 sev_features;
+
+		sev_features = vcpu->arch.regs[VCPU_REGS_RAX];
+		sev_features ^= sev->sev_features;
+		if (sev_features & SVM_SEV_FEAT_INT_INJ_MODES) {
+			vcpu_unimpl(vcpu, "vmgexit: invalid AP injection mode [%#lx] from guest\n",
+				    vcpu->arch.regs[VCPU_REGS_RAX]);
+			ret = -EINVAL;
+			goto out;
+		}
+	}
+
+	switch (request) {
+	case SVM_VMGEXIT_AP_CREATE_ON_INIT:
+		kick = false;
+		fallthrough;
+	case SVM_VMGEXIT_AP_CREATE:
+		if (!page_address_valid(vcpu, svm->vmcb->control.exit_info_2)) {
+			vcpu_unimpl(vcpu, "vmgexit: invalid AP VMSA address [%#llx] from guest\n",
+				    svm->vmcb->control.exit_info_2);
+			ret = -EINVAL;
+			goto out;
+		}
+
+		/*
+		 * A malicious guest can RMPADJUST a large page into a VMSA, which
+		 * hits the SNP erratum where the CPU incorrectly signals an RMP
+		 * violation #PF if a hugepage collides with the RMP entry of the
+		 * VMSA page. Reject the AP CREATE request if the VMSA address
+		 * from the guest is 2M-aligned.
+		 */
+		if (IS_ALIGNED(svm->vmcb->control.exit_info_2, PMD_SIZE)) {
+			vcpu_unimpl(vcpu,
+				    "vmgexit: AP VMSA address [%llx] from guest is unsafe as it is 2M aligned\n",
+				    svm->vmcb->control.exit_info_2);
+			ret = -EINVAL;
+			goto out;
+		}
+
+		target_svm->sev_es.snp_vmsa_gpa = svm->vmcb->control.exit_info_2;
+		break;
+	case SVM_VMGEXIT_AP_DESTROY:
+		break;
+	default:
+		vcpu_unimpl(vcpu, "vmgexit: invalid AP creation request [%#x] from guest\n",
+			    request);
+		ret = -EINVAL;
+		break;
+	}
+
+out:
+	if (kick) {
+		if (target_vcpu->arch.mp_state == KVM_MP_STATE_UNINITIALIZED)
+			target_vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
+
+		kvm_make_request(KVM_REQ_UPDATE_PROTECTED_GUEST_STATE, target_vcpu);
+		kvm_vcpu_kick(target_vcpu);
+	}
+
+	mutex_unlock(&target_svm->sev_es.snp_vmsa_mutex);
+
+	return ret;
+}
+
 static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
 {
 	struct vmcb_control_area *control = &svm->vmcb->control;
@@ -3508,6 +3734,18 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
 		vcpu->run->vmgexit.ghcb_msr = ghcb_gpa;
 		vcpu->arch.complete_userspace_io = snp_complete_psc;
 		break;
+	case SVM_VMGEXIT_AP_CREATION:
+		ret = sev_snp_ap_creation(svm);
+		if (ret) {
+			ghcb_set_sw_exit_info_1(ghcb, 1);
+			ghcb_set_sw_exit_info_2(ghcb,
+						X86_TRAP_GP |
+						SVM_EVTINJ_TYPE_EXEPT |
+						SVM_EVTINJ_VALID);
+		}
+
+		ret = 1;
+		break;
 	case SVM_VMGEXIT_UNSUPPORTED_EVENT:
 		vcpu_unimpl(vcpu,
 			    "vmgexit: unsupported event - exit_info_1=%#llx, exit_info_2=%#llx\n",
@@ -3612,6 +3850,8 @@ void sev_es_vcpu_reset(struct vcpu_svm *svm)
 	set_ghcb_msr(svm, GHCB_MSR_SEV_INFO(GHCB_VERSION_MAX,
 					    GHCB_VERSION_MIN,
 					    sev_enc_bit));
+
+	mutex_init(&svm->sev_es.snp_vmsa_mutex);
 }
 
 void sev_es_prepare_switch_to_guest(struct sev_es_save_area *hostsa)
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 77195d8c1aa3..81b9f4e04a8d 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1358,6 +1358,9 @@ static void svm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
 	svm->spec_ctrl = 0;
 	svm->virt_spec_ctrl = 0;
 
+	if (init_event)
+		sev_snp_init_protected_guest_state(vcpu);
+
 	init_vmcb(vcpu);
 
 	if (!init_event)
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 8dc7946ab634..e73a58e489c7 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -94,6 +94,7 @@ struct kvm_sev_info {
 	atomic_t migration_in_progress;
 	u64 snp_init_flags;
 	void *snp_context;      /* SNP guest context page */
+	u64 sev_features;	/* Features set at VMSA creation */
 };
 
 struct kvm_svm {
@@ -208,6 +209,10 @@ struct vcpu_sev_es_state {
 	bool ghcb_sa_free;
 
 	u64 ghcb_registered_gpa;
+
+	struct mutex snp_vmsa_mutex; /* Used to handle concurrent updates of VMSA. */
+	gpa_t snp_vmsa_gpa;
+	bool snp_ap_create;
 };
 
 struct vcpu_svm {
@@ -735,7 +740,7 @@ void avic_refresh_virtual_apic_mode(struct kvm_vcpu *vcpu);
 #define GHCB_VERSION_MAX	2ULL
 #define GHCB_VERSION_MIN	1ULL
 
-#define GHCB_HV_FT_SUPPORTED	GHCB_HV_FT_SNP
+#define GHCB_HV_FT_SUPPORTED	(GHCB_HV_FT_SNP | GHCB_HV_FT_SNP_AP_CREATION)
 
 extern unsigned int max_sev_asid;
 
@@ -764,6 +769,7 @@ void sev_es_prepare_switch_to_guest(struct sev_es_save_area *hostsa);
 void sev_es_unmap_ghcb(struct vcpu_svm *svm);
 struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu);
 void handle_rmp_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code);
+void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu);
 
 /* vmenter.S */
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 10d76afa23d9..9e3c41e2a3ef 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10621,6 +10621,14 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 
 		if (kvm_check_request(KVM_REQ_UPDATE_CPU_DIRTY_LOGGING, vcpu))
 			static_call(kvm_x86_update_cpu_dirty_logging)(vcpu);
+
+		if (kvm_check_request(KVM_REQ_UPDATE_PROTECTED_GUEST_STATE, vcpu)) {
+			kvm_vcpu_reset(vcpu, true);
+			if (vcpu->arch.mp_state != KVM_MP_STATE_RUNNABLE) {
+				r = 1;
+				goto out;
+			}
+		}
 	}
 
 	if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win ||
@@ -12816,6 +12824,9 @@ static inline bool kvm_vcpu_has_events(struct kvm_vcpu *vcpu)
 		return true;
 #endif
 
+	if (kvm_test_request(KVM_REQ_UPDATE_PROTECTED_GUEST_STATE, vcpu))
+		return true;
+
 	if (kvm_arch_interrupt_allowed(vcpu) &&
 	    (kvm_cpu_has_interrupt(vcpu) ||
 	    kvm_guest_apic_has_interrupt(vcpu)))
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH RFC v9 43/51] KVM: SEV: Configure MMU to check for private fault flags
  2023-06-12  4:25 [PATCH RFC v9 00/51] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (41 preceding siblings ...)
  2023-06-12  4:25 ` [PATCH RFC v9 42/51] KVM: SVM: Support SEV-SNP AP Creation NAE event Michael Roth
@ 2023-06-12  4:25 ` Michael Roth
  2023-06-12  4:25 ` [PATCH RFC v9 44/51] KVM: SEV: Implement gmem hook for initializing private pages Michael Roth
                   ` (7 subsequent siblings)
  50 siblings, 0 replies; 102+ messages in thread
From: Michael Roth @ 2023-06-12  4:25 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	liam.merwick, zhi.a.wang

SEV-SNP uses a #NPF flag to indicate whether the guest expects a shared
or a private page, which in turn dictates whether the KVM MMU should
map a normal page or a gmem page. Set the appropriate mask to enable
this handling for SEV-SNP guests.
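
A minimal sketch of how the KVM MMU side (added earlier in this series)
is expected to consume the mask when classifying an #NPF; the helper
name and shape here are illustrative:

  static bool fault_is_private(struct kvm *kvm, u64 error_code)
  {
          return kvm->arch.mmu_private_fault_mask &&
                 (error_code & kvm->arch.mmu_private_fault_mask);
  }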

Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 arch/x86/kvm/svm/sev.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index ec74ff5e09c7..909ecd90d199 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -2006,6 +2006,8 @@ static int snp_launch_start(struct kvm *kvm, struct kvm_sev_cmd *argp)
 	if (rc)
 		goto e_free_context;
 
+	kvm->arch.mmu_private_fault_mask = PFERR_GUEST_ENC_MASK;
+
 	return 0;
 
 e_free_context:
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH RFC v9 44/51] KVM: SEV: Implement gmem hook for initializing private pages
  2023-06-12  4:25 [PATCH RFC v9 00/51] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (42 preceding siblings ...)
  2023-06-12  4:25 ` [PATCH RFC v9 43/51] KVM: SEV: Configure MMU to check for private fault flags Michael Roth
@ 2023-06-12  4:25 ` Michael Roth
  2023-06-12  4:25 ` [PATCH RFC v9 45/51] KVM: SEV: Implement gmem hook for invalidating " Michael Roth
                   ` (6 subsequent siblings)
  50 siblings, 0 replies; 102+ messages in thread
From: Michael Roth @ 2023-06-12  4:25 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	liam.merwick, zhi.a.wang

This will handle RMP table updates and direct map changes needed to put
a page into a private state before mapping it into an SEV-SNP guest.

Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 arch/x86/kvm/svm/sev.c | 95 ++++++++++++++++++++++++++++++++++++++++++
 arch/x86/kvm/svm/svm.c |  2 +
 arch/x86/kvm/svm/svm.h |  2 +
 3 files changed, 99 insertions(+)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 909ecd90d199..c5a1706387bf 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -4022,3 +4022,98 @@ void handle_rmp_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code)
 	kvm_zap_gfn_range(kvm, gfn, gfn + PTRS_PER_PMD);
 	put_page(pfn_to_page(pfn));
 }
+
+/* Check if GFN range is marked private in the KVM/gmem xarray. */
+static bool is_gfn_range_private(struct kvm *kvm, gfn_t start, gfn_t end)
+{
+	gfn_t gfn;
+
+	for (gfn = start; gfn < end; gfn++) {
+		if (!kvm_mem_is_private(kvm, gfn)) {
+			pr_debug("%s: overlap detected, GFN 0x%llx start 0x%llx end 0x%llx\n",
+				 __func__, gfn, start, end);
+			return false;
+		}
+	}
+
+	return true;
+}
+
+/* Check that no pages in PFN range have already been set to private in RMP table. */
+static bool is_pfn_range_shared(kvm_pfn_t start, kvm_pfn_t end)
+{
+	kvm_pfn_t pfn;
+
+	for (pfn = start; pfn < end; pfn++) {
+		int ret, rmp_level;
+		bool assigned;
+
+		ret = snp_lookup_rmpentry(pfn, &assigned, &rmp_level);
+		if (ret) {
+			pr_debug("%s: failed to retrieve RMP entry, assuming overlap, PFN 0x%llx start 0x%llx end 0x%llx RMP level %d error %d\n",
+				 __func__, pfn, start, end, rmp_level, ret);
+			return false;
+		}
+
+		if (assigned) {
+			pr_debug("%s: overlap detected, PFN 0x%llx start 0x%llx end 0x%llx RMP level %d\n",
+				 __func__, pfn, start, end, rmp_level);
+			return false;
+		}
+	}
+
+	return true;
+}
+
+static int get_supported_rmp_level(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn)
+{
+	if (!IS_ALIGNED(pfn, PTRS_PER_PMD) || !IS_ALIGNED(gfn, PTRS_PER_PMD))
+		return PG_LEVEL_4K;
+
+	/*
+	 * Check that both the desired GFN range states in the xarray, and
+	 * current PFN range states in the RMP table, are conducive to
+	 * creating a 2M private RMP entry.
+	 */
+	if (is_gfn_range_private(kvm, gfn, gfn + PTRS_PER_PMD) &&
+	    is_pfn_range_shared(pfn, pfn + PTRS_PER_PMD))
+		return PG_LEVEL_2M;
+
+	return PG_LEVEL_4K;
+}
+
+int sev_gmem_prepare(struct kvm *kvm, struct kvm_memory_slot *slot,
+		     kvm_pfn_t pfn, gfn_t gfn, u8 *max_level)
+{
+	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+	int level, rc = 0;
+	bool assigned;
+
+	if (!sev_snp_guest(kvm))
+		return 0;
+
+	rc = snp_lookup_rmpentry(pfn, &assigned, &level);
+	if (rc)
+		return rc;
+
+	/* No conversion needed, just clamp max_level according to the RMP entry. */
+	if (assigned)
+		goto out_adjust_level;
+
+	if (*max_level == PG_LEVEL_4K)
+		level = PG_LEVEL_4K;
+	else
+		level = get_supported_rmp_level(kvm, pfn, gfn);
+
+	rc = rmp_make_private(pfn, gfn_to_gpa(gfn), level, sev->asid, false);
+	if (rc)
+		pr_err_ratelimited("%s: failed gfn %llx pfn %llx level %d rc %d\n",
+				   __func__, gfn, pfn, level, rc);
+
+out_adjust_level:
+	pr_debug("%s: pfn %llx gfn %llx max_level %d level %d assigned %d\n",
+		 __func__, pfn, gfn, *max_level, level, assigned);
+	if (*max_level > level)
+		*max_level = level;
+
+	return rc;
+}
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 81b9f4e04a8d..9085a122907c 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4934,6 +4934,8 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
 	.vcpu_deliver_sipi_vector = svm_vcpu_deliver_sipi_vector,
 	.vcpu_get_apicv_inhibit_reasons = avic_vcpu_get_apicv_inhibit_reasons,
 	.alloc_apic_backing_page = svm_alloc_apic_backing_page,
+
+	.gmem_prepare = sev_gmem_prepare,
 };
 
 /*
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index e73a58e489c7..0438f52e4396 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -770,6 +770,8 @@ void sev_es_unmap_ghcb(struct vcpu_svm *svm);
 struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu);
 void handle_rmp_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code);
 void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu);
+int sev_gmem_prepare(struct kvm *kvm, struct kvm_memory_slot *slot,
+		     kvm_pfn_t pfn, gfn_t gfn, u8 *max_level);
 
 /* vmenter.S */
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH RFC v9 45/51] KVM: SEV: Implement gmem hook for invalidating private pages
  2023-06-12  4:25 [PATCH RFC v9 00/51] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (43 preceding siblings ...)
  2023-06-12  4:25 ` [PATCH RFC v9 44/51] KVM: SEV: Implement gmem hook for initializing private pages Michael Roth
@ 2023-06-12  4:25 ` Michael Roth
  2023-06-12  4:25 ` [PATCH RFC v9 46/51] KVM: SVM: Add module parameter to enable the SEV-SNP Michael Roth
                   ` (5 subsequent siblings)
  50 siblings, 0 replies; 102+ messages in thread
From: Michael Roth @ 2023-06-12  4:25 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	liam.merwick, zhi.a.wang

Implement a platform hook to do the work of restoring the direct map
entries of gmem-managed pages and transitioning the corresponding RMP
table entries back to the default shared/hypervisor-owned state.

Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 arch/x86/kvm/svm/sev.c | 43 ++++++++++++++++++++++++++++++++++++++++++
 arch/x86/kvm/svm/svm.c |  1 +
 arch/x86/kvm/svm/svm.h |  1 +
 3 files changed, 45 insertions(+)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index c5a1706387bf..543926fa3200 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -4117,3 +4117,46 @@ int sev_gmem_prepare(struct kvm *kvm, struct kvm_memory_slot *slot,
 
 	return rc;
 }
+
+void sev_gmem_invalidate(struct kvm *kvm, kvm_pfn_t start, kvm_pfn_t end)
+{
+	kvm_pfn_t pfn;
+
+	if (!sev_snp_guest(kvm))
+		return;
+
+	pr_debug("%s: kvm %p pfn 0x%llx pfn_end 0x%llx\n",
+		 __func__, kvm, start, end);
+
+	for (pfn = start; pfn < end; pfn++) {
+		int rc, rmp_level;
+		bool assigned;
+
+		rc = snp_lookup_rmpentry(pfn, &assigned, &rmp_level);
+		if (rc) {
+			pr_warn_ratelimited("SEV: Failed to retrieve RMP entry for PFN 0x%llx error %d\n",
+					    pfn, rc);
+			continue;
+		}
+
+		if (!assigned)
+			continue;
+
+		/*
+		 * If PFN is currently assigned as a 2M page, PSMASH it into
+		 * individual 4K RMP entries before attempting to convert a
+		 * 4K sub-page.
+		 */
+		if (rmp_level > PG_LEVEL_4K) {
+			rc = snp_rmptable_psmash(kvm, pfn);
+			if (rc)
+				pr_warn_ratelimited("SEV: Failed to PSMASH RMP entry for PFN 0x%llx error %d\n",
+						    pfn, rc);
+		}
+
+		rc = rmp_make_shared(pfn, PG_LEVEL_4K);
+		if (rc)
+			pr_warn_ratelimited("SEV: Failed to update RMP entry for PFN 0x%llx error %d\n",
+					    pfn, rc);
+	}
+}
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 9085a122907c..1390e47d0aa5 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4936,6 +4936,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
 	.alloc_apic_backing_page = svm_alloc_apic_backing_page,
 
 	.gmem_prepare = sev_gmem_prepare,
+	.gmem_invalidate = sev_gmem_invalidate,
 };
 
 /*
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 0438f52e4396..0d4c29a4300a 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -772,6 +772,7 @@ void handle_rmp_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code);
 void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu);
 int sev_gmem_prepare(struct kvm *kvm, struct kvm_memory_slot *slot,
 		     kvm_pfn_t pfn, gfn_t gfn, u8 *max_level);
+void sev_gmem_invalidate(struct kvm *kvm, kvm_pfn_t start, kvm_pfn_t end);
 
 /* vmenter.S */
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH RFC v9 46/51] KVM: SVM: Add module parameter to enable the SEV-SNP
  2023-06-12  4:25 [PATCH RFC v9 00/51] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (44 preceding siblings ...)
  2023-06-12  4:25 ` [PATCH RFC v9 45/51] KVM: SEV: Implement gmem hook for invalidating " Michael Roth
@ 2023-06-12  4:25 ` Michael Roth
  2023-06-12  4:25 ` [PATCH RFC v9 47/51] iommu/amd: Add IOMMU_SNP_SHUTDOWN support Michael Roth
                   ` (4 subsequent siblings)
  50 siblings, 0 replies; 102+ messages in thread
From: Michael Roth @ 2023-06-12  4:25 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	liam.merwick, zhi.a.wang, Brijesh Singh

From: Brijesh Singh <brijesh.singh@amd.com>

Add a module parameter that can be used to enable or disable the SEV-SNP
feature. Now that KVM contains support for SNP, set the GHCB hypervisor
feature flag to indicate that SNP is supported.
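
For example (illustrative usage, assuming KVM is built as the kvm_amd
module), SNP support can be disabled at load time and the effective
value inspected afterwards:

  modprobe kvm_amd sev_snp=0
  cat /sys/module/kvm_amd/parameters/sev_snp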

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 arch/x86/kvm/svm/sev.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 543926fa3200..adbe8c242d81 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -54,14 +54,16 @@ module_param_named(sev, sev_enabled, bool, 0444);
 /* enable/disable SEV-ES support */
 static bool sev_es_enabled = true;
 module_param_named(sev_es, sev_es_enabled, bool, 0444);
+
+/* enable/disable SEV-SNP support */
+static bool sev_snp_enabled = true;
+module_param_named(sev_snp, sev_snp_enabled, bool, 0444);
 #else
 #define sev_enabled false
 #define sev_es_enabled false
+#define sev_snp_enabled false
 #endif /* CONFIG_KVM_AMD_SEV */
 
-/* enable/disable SEV-SNP support */
-static bool sev_snp_enabled;
-
 #define AP_RESET_HOLD_NONE		0
 #define AP_RESET_HOLD_NAE_EVENT		1
 #define AP_RESET_HOLD_MSR_PROTO		2
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH RFC v9 47/51] iommu/amd: Add IOMMU_SNP_SHUTDOWN support
  2023-06-12  4:25 [PATCH RFC v9 00/51] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (45 preceding siblings ...)
  2023-06-12  4:25 ` [PATCH RFC v9 46/51] KVM: SVM: Add module parameter to enable the SEV-SNP Michael Roth
@ 2023-06-12  4:25 ` Michael Roth
  2023-09-07 10:31   ` Suthikulpanit, Suravee
  2023-06-12  4:25 ` [PATCH RFC v9 48/51] crypto: ccp: Add the SNP_{SET,GET}_EXT_CONFIG command Michael Roth
                   ` (3 subsequent siblings)
  50 siblings, 1 reply; 102+ messages in thread
From: Michael Roth @ 2023-06-12  4:25 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	liam.merwick, zhi.a.wang

From: Ashish Kalra <ashish.kalra@amd.com>

Add a new IOMMU API interface, amd_iommu_snp_disable(), to transition
IOMMU pages from the Reclaim state back to the Hypervisor state after the
SNP_SHUTDOWN_EX command. Invoke this API from the CCP driver after issuing
SNP_SHUTDOWN_EX.

Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 drivers/crypto/ccp/sev-dev.c | 20 +++++++++++++
 drivers/iommu/amd/init.c     | 55 ++++++++++++++++++++++++++++++++++++
 include/linux/amd-iommu.h    |  1 +
 3 files changed, 76 insertions(+)

diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index 0bfe9721c977..b8e8c4da4025 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -24,6 +24,7 @@
 #include <linux/cpufeature.h>
 #include <linux/fs.h>
 #include <linux/fs_struct.h>
+#include <linux/amd-iommu.h>
 
 #include <asm/smp.h>
 #include <asm/cacheflush.h>
@@ -1508,6 +1509,25 @@ static int __sev_snp_shutdown_locked(int *error)
 		return ret;
 	}
 
+	/*
+	 * SNP_SHUTDOWN_EX with IOMMU_SNP_SHUTDOWN set to 1 disables SNP
+	 * enforcement by the IOMMU and also transitions all pages
+	 * associated with the IOMMU to the Reclaim state.
+	 * Firmware older than version 1.53 transitioned the IOMMU pages
+	 * directly to Hypervisor state, but accounted for the assigned 4KB
+	 * pages within a 2MB page incorrectly because the Reclaim state was
+	 * skipped. This resulted in an RMP #PF when the 2MB page containing
+	 * those pages was later accessed during kexec boot. Hence, the
+	 * firmware now transitions these pages to the Reclaim state and the
+	 * hypervisor must transition them to shared state. SNP firmware
+	 * version 1.53 or higher is required for kexec boot.
+	 */
+	ret = amd_iommu_snp_disable();
+	if (ret) {
+		dev_err(sev->dev, "SNP IOMMU shutdown failed\n");
+		return ret;
+	}
+
 	sev->snp_initialized = false;
 	dev_dbg(sev->dev, "SEV-SNP firmware shutdown\n");
 
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 33ea62d93540..a84ec81cfbb5 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -30,6 +30,7 @@
 #include <asm/io_apic.h>
 #include <asm/irq_remapping.h>
 #include <asm/set_memory.h>
+#include <asm/sev-host.h>
 
 #include <linux/crash_dump.h>
 
@@ -3701,4 +3702,58 @@ int amd_iommu_snp_enable(void)
 
 	return 0;
 }
+
+static int iommu_page_make_shared(void *page)
+{
+	unsigned long paddr, pfn;
+
+	paddr = iommu_virt_to_phys(page);
+	/* Cbit maybe set in the paddr */
+	pfn = __sme_clr(paddr) >> PAGE_SHIFT;
+	return rmp_make_shared(pfn, PG_LEVEL_4K);
+}
+
+static int iommu_make_shared(void *va, size_t size)
+{
+	void *page;
+	int ret;
+
+	if (!va)
+		return 0;
+
+	for (page = va; page < (va + size); page += PAGE_SIZE) {
+		ret = iommu_page_make_shared(page);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+
+int amd_iommu_snp_disable(void)
+{
+	struct amd_iommu *iommu;
+	int ret;
+
+	if (!amd_iommu_snp_en)
+		return 0;
+
+	for_each_iommu(iommu) {
+		ret = iommu_make_shared(iommu->evt_buf, EVT_BUFFER_SIZE);
+		if (ret)
+			return ret;
+
+		ret = iommu_make_shared(iommu->ppr_log, PPR_LOG_SIZE);
+		if (ret)
+			return ret;
+
+		ret = iommu_make_shared((void *)iommu->cmd_sem, PAGE_SIZE);
+		if (ret)
+			return ret;
+	}
+
+	amd_iommu_snp_en = false;
+	return 0;
+}
+EXPORT_SYMBOL_GPL(amd_iommu_snp_disable);
 #endif
diff --git a/include/linux/amd-iommu.h b/include/linux/amd-iommu.h
index 8f0cde2d451c..7ba46118d0f1 100644
--- a/include/linux/amd-iommu.h
+++ b/include/linux/amd-iommu.h
@@ -208,6 +208,7 @@ struct amd_iommu *get_amd_iommu(unsigned int idx);
 
 #ifdef CONFIG_KVM_AMD_SEV
 int amd_iommu_snp_enable(void);
+int amd_iommu_snp_disable(void);
 #endif
 
 #endif /* _ASM_X86_AMD_IOMMU_H */
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH RFC v9 48/51] crypto: ccp: Add the SNP_{SET,GET}_EXT_CONFIG command
  2023-06-12  4:25 [PATCH RFC v9 00/51] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (46 preceding siblings ...)
  2023-06-12  4:25 ` [PATCH RFC v9 47/51] iommu/amd: Add IOMMU_SNP_SHUTDOWN support Michael Roth
@ 2023-06-12  4:25 ` Michael Roth
  2023-06-13  6:24   ` Alexey Kardashevskiy
  2023-06-12  4:25 ` [PATCH RFC v9 49/51] x86/sev: Add KVM commands for per-instance certs Michael Roth
                   ` (2 subsequent siblings)
  50 siblings, 1 reply; 102+ messages in thread
From: Michael Roth @ 2023-06-12  4:25 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	liam.merwick, zhi.a.wang, Brijesh Singh, Alexey Kardashevskiy,
	Dionna Glaze

From: Brijesh Singh <brijesh.singh@amd.com>

The SEV-SNP firmware provides the SNP_CONFIG command used to set the
system-wide configuration value for SNP guests. The information includes
the TCB version string to be reported in guest attestation reports.

Version 2 of the GHCB specification adds an NAE event (SNP extended guest
request) that a guest can use to query reports that include additional
certificates.

In both cases, additional data provided by userspace is included in the
attestation reports. Userspace uses the SNP_SET_EXT_CONFIG command to
provide the certificate blob and the reported TCB version string in one
call. Note that the specification defines the certificate blob in a
specific GUID-based format; userspace is responsible for building a
proper certificate blob, and the ioctl treats it as an opaque blob.

While it is not defined in the spec, also add an SNP_GET_EXT_CONFIG
command that can be used to obtain the data programmed through
SNP_SET_EXT_CONFIG.

Co-developed-by: Alexey Kardashevskiy <aik@amd.com>
Signed-off-by: Alexey Kardashevskiy <aik@amd.com>
Co-developed-by: Dionna Glaze <dionnaglaze@google.com>
Signed-off-by: Dionna Glaze <dionnaglaze@google.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
[mdr: squash in doc patch from Dionna]
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 Documentation/virt/coco/sev-guest.rst |  27 ++++
 drivers/crypto/ccp/sev-dev.c          | 178 ++++++++++++++++++++++++++
 drivers/crypto/ccp/sev-dev.h          |   2 +
 include/linux/psp-sev.h               |  10 ++
 include/uapi/linux/psp-sev.h          |  17 +++
 5 files changed, 234 insertions(+)

diff --git a/Documentation/virt/coco/sev-guest.rst b/Documentation/virt/coco/sev-guest.rst
index 11ea67c944df..6cad4226c348 100644
--- a/Documentation/virt/coco/sev-guest.rst
+++ b/Documentation/virt/coco/sev-guest.rst
@@ -145,6 +145,33 @@ The SNP_PLATFORM_STATUS command is used to query the SNP platform status. The
 status includes API major, minor version and more. See the SEV-SNP
 specification for further details.
 
+2.5 SNP_SET_EXT_CONFIG
+----------------------
+:Technology: sev-snp
+:Type: hypervisor ioctl cmd
+:Parameters (in): struct sev_user_data_ext_snp_config
+:Returns (out): 0 on success, -negative on error
+
+The SNP_SET_EXT_CONFIG is used to set the system-wide configuration such as
+the reported TCB version in the attestation report. The command is similar
+to the SNP_CONFIG command defined in the SEV-SNP spec. The main difference
+is that this command also accepts an additional certificate blob defined in
+the GHCB specification.
+
+If the certs_address is zero, then the previous certificate blob will be deleted.
+For more information on the certificate blob layout, see the GHCB spec
+(extended guest request message).
+
+2.6 SNP_GET_EXT_CONFIG
+----------------------
+:Technology: sev-snp
+:Type: hypervisor ioctl cmd
+:Parameters (in): struct sev_user_data_ext_snp_config
+:Returns (out): 0 on success, -negative on error
+
+The SNP_GET_EXT_CONFIG is used to query the system-wide configuration set
+through the SNP_SET_EXT_CONFIG.
+
 3. SEV-SNP CPUID Enforcement
 ============================
 
diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index b8e8c4da4025..175c24163ba0 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -1491,6 +1491,10 @@ static int __sev_snp_shutdown_locked(int *error)
 	data.length = sizeof(data);
 	data.iommu_snp_shutdown = 1;
 
+	/* Free the memory used for caching the certificate data */
+	sev_snp_certs_put(sev->snp_certs);
+	sev->snp_certs = NULL;
+
 	wbinvd_on_all_cpus();
 
 retry:
@@ -1829,6 +1833,126 @@ static int sev_ioctl_snp_platform_status(struct sev_issue_cmd *argp)
 	return ret;
 }
 
+static int sev_ioctl_snp_get_config(struct sev_issue_cmd *argp)
+{
+	struct sev_device *sev = psp_master->sev_data;
+	struct sev_user_data_ext_snp_config input;
+	struct sev_snp_certs *snp_certs;
+	int ret;
+
+	if (!sev->snp_initialized || !argp->data)
+		return -EINVAL;
+
+	memset(&input, 0, sizeof(input));
+
+	if (copy_from_user(&input, (void __user *)argp->data, sizeof(input)))
+		return -EFAULT;
+
+	/* Copy the TCB version programmed through the SET_CONFIG to userspace */
+	if (input.config_address) {
+		if (copy_to_user((void __user *)input.config_address,
+				 &sev->snp_config, sizeof(struct sev_user_data_snp_config)))
+			return -EFAULT;
+	}
+
+	snp_certs = sev_snp_certs_get(sev->snp_certs);
+
+	/* Copy the extended certs programmed through the SNP_SET_CONFIG */
+	if (input.certs_address && snp_certs) {
+		if (input.certs_len < snp_certs->len) {
+			/* Return the certs length to userspace */
+			input.certs_len = snp_certs->len;
+
+			ret = -EIO;
+			goto e_done;
+		}
+
+		if (copy_to_user((void __user *)input.certs_address,
+				 snp_certs->data, snp_certs->len)) {
+			ret = -EFAULT;
+			goto put_exit;
+		}
+	}
+
+	ret = 0;
+
+e_done:
+	if (copy_to_user((void __user *)argp->data, &input, sizeof(input)))
+		ret = -EFAULT;
+
+put_exit:
+	sev_snp_certs_put(snp_certs);
+
+	return ret;
+}
+
+static int sev_ioctl_snp_set_config(struct sev_issue_cmd *argp, bool writable)
+{
+	struct sev_device *sev = psp_master->sev_data;
+	struct sev_user_data_ext_snp_config input;
+	struct sev_user_data_snp_config config;
+	struct sev_snp_certs *snp_certs = NULL;
+	void *certs = NULL;
+	int ret = 0;
+
+	if (!sev->snp_initialized || !argp->data)
+		return -EINVAL;
+
+	if (!writable)
+		return -EPERM;
+
+	memset(&input, 0, sizeof(input));
+
+	if (copy_from_user(&input, (void __user *)argp->data, sizeof(input)))
+		return -EFAULT;
+
+	/* Copy the certs from userspace */
+	if (input.certs_address) {
+		if (!input.certs_len || !IS_ALIGNED(input.certs_len, PAGE_SIZE))
+			return -EINVAL;
+
+		certs = psp_copy_user_blob(input.certs_address, input.certs_len);
+		if (IS_ERR(certs))
+			return PTR_ERR(certs);
+	}
+
+	/* Issue the PSP command to update the TCB version using the SNP_CONFIG. */
+	if (input.config_address) {
+		memset(&config, 0, sizeof(config));
+		if (copy_from_user(&config,
+				   (void __user *)input.config_address, sizeof(config))) {
+			ret = -EFAULT;
+			goto e_free;
+		}
+
+		ret = __sev_do_cmd_locked(SEV_CMD_SNP_CONFIG, &config, &argp->error);
+		if (ret)
+			goto e_free;
+
+		memcpy(&sev->snp_config, &config, sizeof(config));
+	}
+
+	/*
+	 * Cache the new certs if provided; the old certs are put below.
+	 */
+	if (input.certs_len) {
+		snp_certs = sev_snp_certs_new(certs, input.certs_len);
+		if (!snp_certs) {
+			ret = -ENOMEM;
+			goto e_free;
+		}
+	}
+
+	sev_snp_certs_put(sev->snp_certs);
+	sev->snp_certs = snp_certs;
+
+	return 0;
+
+e_free:
+	kfree(certs);
+	return ret;
+}
+
 static long sev_ioctl(struct file *file, unsigned int ioctl, unsigned long arg)
 {
 	void __user *argp = (void __user *)arg;
@@ -1883,6 +2007,12 @@ static long sev_ioctl(struct file *file, unsigned int ioctl, unsigned long arg)
 	case SNP_PLATFORM_STATUS:
 		ret = sev_ioctl_snp_platform_status(&input);
 		break;
+	case SNP_SET_EXT_CONFIG:
+		ret = sev_ioctl_snp_set_config(&input, writable);
+		break;
+	case SNP_GET_EXT_CONFIG:
+		ret = sev_ioctl_snp_get_config(&input);
+		break;
 	default:
 		ret = -EINVAL;
 		goto out;
@@ -1931,6 +2061,54 @@ int sev_guest_df_flush(int *error)
 }
 EXPORT_SYMBOL_GPL(sev_guest_df_flush);
 
+static void sev_snp_certs_release(struct kref *kref)
+{
+	struct sev_snp_certs *certs = container_of(kref, struct sev_snp_certs, kref);
+
+	kfree(certs->data);
+	kfree(certs);
+}
+
+struct sev_snp_certs *sev_snp_certs_new(void *data, u32 len)
+{
+	struct sev_snp_certs *certs;
+
+	if (!len || !data)
+		return NULL;
+
+	certs = kzalloc(sizeof(*certs), GFP_KERNEL);
+	if (!certs)
+		return NULL;
+
+	certs->data = data;
+	certs->len = len;
+	kref_init(&certs->kref);
+
+	return certs;
+}
+EXPORT_SYMBOL_GPL(sev_snp_certs_new);
+
+struct sev_snp_certs *sev_snp_certs_get(struct sev_snp_certs *certs)
+{
+	if (!certs)
+		return NULL;
+
+	if (!kref_get_unless_zero(&certs->kref))
+		return NULL;
+
+	return certs;
+}
+EXPORT_SYMBOL_GPL(sev_snp_certs_get);
+
+void sev_snp_certs_put(struct sev_snp_certs *certs)
+{
+	if (!certs)
+		return;
+
+	kref_put(&certs->kref, sev_snp_certs_release);
+}
+EXPORT_SYMBOL_GPL(sev_snp_certs_put);
+
 static void sev_exit(struct kref *ref)
 {
 	misc_deregister(&misc_dev->misc);
diff --git a/drivers/crypto/ccp/sev-dev.h b/drivers/crypto/ccp/sev-dev.h
index 19d79f9d4212..22374f3d3e2e 100644
--- a/drivers/crypto/ccp/sev-dev.h
+++ b/drivers/crypto/ccp/sev-dev.h
@@ -66,6 +66,8 @@ struct sev_device {
 
 	bool snp_initialized;
 	struct snp_host_map snp_host_map[MAX_SNP_HOST_MAP_BUFS];
+	struct sev_snp_certs *snp_certs;
+	struct sev_user_data_snp_config snp_config;
 };
 
 int sev_dev_init(struct psp_device *psp);
diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h
index 5ae61de96e44..2191d8b5423a 100644
--- a/include/linux/psp-sev.h
+++ b/include/linux/psp-sev.h
@@ -24,6 +24,16 @@
 
 #define SEV_FW_BLOB_MAX_SIZE	0x4000	/* 16KB */
 
+struct sev_snp_certs {
+	void *data;
+	u32 len;
+	struct kref kref;
+};
+
+struct sev_snp_certs *sev_snp_certs_new(void *data, u32 len);
+struct sev_snp_certs *sev_snp_certs_get(struct sev_snp_certs *certs);
+void sev_snp_certs_put(struct sev_snp_certs *certs);
+
 /**
  * SEV platform state
  */
diff --git a/include/uapi/linux/psp-sev.h b/include/uapi/linux/psp-sev.h
index 4dc6a3e7b3d5..d1e6a0615546 100644
--- a/include/uapi/linux/psp-sev.h
+++ b/include/uapi/linux/psp-sev.h
@@ -29,6 +29,8 @@ enum {
 	SEV_GET_ID,	/* This command is deprecated, use SEV_GET_ID2 */
 	SEV_GET_ID2,
 	SNP_PLATFORM_STATUS,
+	SNP_SET_EXT_CONFIG,
+	SNP_GET_EXT_CONFIG,
 
 	SEV_MAX,
 };
@@ -201,6 +203,21 @@ struct sev_user_data_snp_config {
 	__u8 rsvd1[52];
 } __packed;
 
+/**
+ * struct sev_user_data_ext_snp_config - system-wide configuration value for SNP.
+ *
+ * @config_address: address of the struct sev_user_data_snp_config or 0 when
+ *		reported_tcb does not need to be updated.
+ * @certs_address: address of extended guest request certificate chain or
+ *              0 when previous certificate should be removed on SNP_SET_EXT_CONFIG.
+ * @certs_len: length of the certificate blob in bytes; must be page-aligned
+ */
+struct sev_user_data_ext_snp_config {
+	__u64 config_address;		/* In */
+	__u64 certs_address;		/* In */
+	__u32 certs_len;		/* In */
+};
+
 /**
  * struct sev_issue_cmd - SEV ioctl parameters
  *
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 102+ messages in thread
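
For illustration, a minimal userspace sketch of invoking SNP_SET_EXT_CONFIG
through the /dev/sev SEV_ISSUE_CMD ioctl. The helper is hypothetical; the
structures and command IDs are the uapi additions from the patch above, and
error handling is abbreviated:

  #include <fcntl.h>
  #include <stdint.h>
  #include <sys/ioctl.h>
  #include <linux/psp-sev.h>

  /* Install a page-aligned certificate blob and a reported TCB config.
   * sev_fd is an open fd on /dev/sev. */
  static int set_ext_config(int sev_fd, void *certs, uint32_t certs_len,
  			  struct sev_user_data_snp_config *config)
  {
  	struct sev_user_data_ext_snp_config ext = {
  		.config_address = (uint64_t)(uintptr_t)config,
  		.certs_address  = (uint64_t)(uintptr_t)certs,
  		.certs_len      = certs_len,	/* must be 4K-aligned */
  	};
  	struct sev_issue_cmd cmd = {
  		.cmd  = SNP_SET_EXT_CONFIG,
  		.data = (uint64_t)(uintptr_t)&ext,
  	};

  	return ioctl(sev_fd, SEV_ISSUE_CMD, &cmd);
  }

SNP_GET_EXT_CONFIG follows the same pattern in reverse: the kernel fills the
user-supplied config and certs buffers, and reports the required certs_len
when the supplied buffer is too small.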

* [PATCH RFC v9 49/51] x86/sev: Add KVM commands for per-instance certs
  2023-06-12  4:25 [PATCH RFC v9 00/51] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (47 preceding siblings ...)
  2023-06-12  4:25 ` [PATCH RFC v9 48/51] crypto: ccp: Add the SNP_{SET,GET}_EXT_CONFIG command Michael Roth
@ 2023-06-12  4:25 ` Michael Roth
  2023-06-12  4:25 ` [PATCH RFC v9 50/51] KVM: SVM: Provide support for SNP_GUEST_REQUEST NAE event Michael Roth
  2023-06-12  4:25 ` [PATCH RFC v9 51/51] crypto: ccp: Add debug support for decrypting pages Michael Roth
  50 siblings, 0 replies; 102+ messages in thread
From: Michael Roth @ 2023-06-12  4:25 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	liam.merwick, zhi.a.wang, Dionna Glaze, Tom Lendacky,
	Alexey Kardashevskiy

From: Dionna Glaze <dionnaglaze@google.com>

The /dev/sev device has the ability to store host-wide certificates for
the key used by the AMD-SP for SEV-SNP attestation report signing,
but for hosts that want to specify additional certificates that are
specific to the image launched in a VM, a different way is needed to
communicate those certificates.

Add two new KVM ioctls to handle this: KVM_SEV_SNP_{GET,SET}_CERTS

The certificates that are set with this command are expected to follow
the same format as the host certificates, but that format is opaque
to the kernel.

With instance-specific certificates installed, the extended guest
request command now returns those certificates instead of the host-wide
ones. Likewise, the error path for a too-small data buffer reports the
size of the instance-specific certificate data if such a set is
installed.

Setting a zero-length certificate restores the default behavior of
returning only the host certificates on an extended guest request.

Also increase SEV_FW_BLOB_MAX_SIZE by another 4K page to allow space
for an extra certificate.

Cc: Tom Lendacky <Thomas.Lendacky@amd.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>

Signed-off-by: Dionna Glaze <dionnaglaze@google.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
[mdr: remove used of "we" and "this patch" in commit log, squash in
      documentation patch]
Signed-off-by: Michael Roth <michael.roth@amd.com>
[aik: snp_handle_ext_guest_request() now uses the CCP's cert object
      without copying things over, only refcounting needed.]
Signed-off-by: Alexey Kardashevskiy <aik@amd.com>
---
 .../virt/kvm/x86/amd-memory-encryption.rst    |  44 +++++++
 arch/x86/kvm/svm/sev.c                        | 115 ++++++++++++++++++
 arch/x86/kvm/svm/svm.h                        |   1 +
 include/linux/psp-sev.h                       |   2 +-
 include/uapi/linux/kvm.h                      |  12 ++
 5 files changed, 173 insertions(+), 1 deletion(-)

diff --git a/Documentation/virt/kvm/x86/amd-memory-encryption.rst b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
index cd77a19577fe..21c1894d78ef 100644
--- a/Documentation/virt/kvm/x86/amd-memory-encryption.rst
+++ b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
@@ -537,6 +537,50 @@ Returns: 0 on success, -negative on error
 
 See SEV-SNP specification for further details on launch finish input parameters.
 
+22. KVM_SEV_SNP_GET_CERTS
+-------------------------
+
+After the SNP guest launch flow has started, the KVM_SEV_SNP_GET_CERTS command
+can be issued to request the data that has been installed with the
+KVM_SEV_SNP_SET_CERTS command.
+
+Parameters (in/out): struct kvm_sev_snp_get_certs
+
+Returns: 0 on success, -negative on error
+
+::
+
+	struct kvm_sev_snp_get_certs {
+		__u64 certs_uaddr;
+		__u64 certs_len;
+	};
+
+If no certs have been installed, the return value is -ENOENT. If the
+buffer specified in the struct is too small, the certs_len field will be
+overwritten with the number of bytes required to receive all the
+certificate data, and the return value will be -EINVAL.
+
+23. KVM_SEV_SNP_SET_CERTS
+-------------------------
+
+After the SNP guest launch flow has started, the KVM_SEV_SNP_SET_CERTS command
+can be issued to override the /dev/sev certs data that is returned when a
+guest issues an extended guest request. This is useful for instance-specific
+extensions to the host certificates.
+
+Parameters (in/out): struct kvm_sev_snp_set_certs
+
+Returns: 0 on success, -negative on error
+
+::
+
+	struct kvm_sev_snp_set_certs {
+		__u64 certs_uaddr;
+		__u64 certs_len;
+	};
+
+The certs_len field may not exceed SEV_FW_BLOB_MAX_SIZE.
+
 References
 ==========
 
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index adbe8c242d81..bdf32aa971d8 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -2277,6 +2277,113 @@ static int snp_launch_finish(struct kvm *kvm, struct kvm_sev_cmd *argp)
 	return ret;
 }
 
+static int snp_get_instance_certs(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+	struct kvm_sev_snp_get_certs params;
+	struct sev_snp_certs *snp_certs;
+	int rc = 0;
+
+	if (!sev_snp_guest(kvm))
+		return -ENOTTY;
+
+	if (!sev->snp_context)
+		return -EINVAL;
+
+	if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data,
+			   sizeof(params)))
+		return -EFAULT;
+
+	snp_certs = sev_snp_certs_get(sev->snp_certs);
+	/* No instance certs set. */
+	if (!snp_certs)
+		return -ENOENT;
+
+	if (params.certs_len < snp_certs->len) {
+		/* Output buffer too small. Return the required size. */
+		params.certs_len = snp_certs->len;
+
+		if (copy_to_user((void __user *)(uintptr_t)argp->data, &params,
+				 sizeof(params)))
+			rc = -EFAULT;
+		else
+			rc = -EINVAL; /* May be ENOSPC? */
+	} else {
+		if (copy_to_user((void __user *)(uintptr_t)params.certs_uaddr,
+				 snp_certs->data, snp_certs->len))
+			rc = -EFAULT;
+	}
+
+	sev_snp_certs_put(snp_certs);
+
+	return rc;
+}
+
+static void snp_replace_certs(struct kvm_sev_info *sev, struct sev_snp_certs *snp_certs)
+{
+	sev_snp_certs_put(sev->snp_certs);
+	sev->snp_certs = snp_certs;
+}
+
+static int snp_set_instance_certs(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+	unsigned long length = SEV_FW_BLOB_MAX_SIZE;
+	struct kvm_sev_snp_set_certs params;
+	struct sev_snp_certs *snp_certs;
+	void *to_certs;
+	int ret;
+
+	if (!sev_snp_guest(kvm))
+		return -ENOTTY;
+
+	if (!sev->snp_context)
+		return -EINVAL;
+
+	if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data,
+			   sizeof(params)))
+		return -EFAULT;
+
+	if (params.certs_len > SEV_FW_BLOB_MAX_SIZE)
+		return -EINVAL;
+
+	/*
+	 * Setting a length of 0 is the same as "uninstalling" instance-
+	 * specific certificates.
+	 */
+	if (params.certs_len == 0) {
+		snp_replace_certs(sev, NULL);
+		return 0;
+	}
+
+	/* Page-align the length */
+	length = ALIGN(params.certs_len, PAGE_SIZE);
+
+	to_certs = kmalloc(length, GFP_KERNEL | __GFP_ZERO);
+	if (!to_certs)
+		return -ENOMEM;
+
+	if (copy_from_user(to_certs,
+			   (void __user *)(uintptr_t)params.certs_uaddr,
+			   params.certs_len)) {
+		ret = -EFAULT;
+		goto error_exit;
+	}
+
+	snp_certs = sev_snp_certs_new(to_certs, length);
+	if (!snp_certs) {
+		ret = -ENOMEM;
+		goto error_exit;
+	}
+
+	snp_replace_certs(sev, snp_certs);
+
+	return 0;
+error_exit:
+	kfree(to_certs);
+	return ret;
+}
+
 int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
 {
 	struct kvm_sev_cmd sev_cmd;
@@ -2376,6 +2483,12 @@ int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
 	case KVM_SEV_SNP_LAUNCH_FINISH:
 		r = snp_launch_finish(kvm, &sev_cmd);
 		break;
+	case KVM_SEV_SNP_GET_CERTS:
+		r = snp_get_instance_certs(kvm, &sev_cmd);
+		break;
+	case KVM_SEV_SNP_SET_CERTS:
+		r = snp_set_instance_certs(kvm, &sev_cmd);
+		break;
 	default:
 		r = -EINVAL;
 		goto out;
@@ -2591,6 +2704,8 @@ static int snp_decommission_context(struct kvm *kvm)
 	snp_free_firmware_page(sev->snp_context);
 	sev->snp_context = NULL;
 
+	sev_snp_certs_put(sev->snp_certs);
+
 	return 0;
 }
 
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 0d4c29a4300a..72be0c440b16 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -95,6 +95,7 @@ struct kvm_sev_info {
 	u64 snp_init_flags;
 	void *snp_context;      /* SNP guest context page */
 	u64 sev_features;	/* Features set at VMSA creation */
+	struct sev_snp_certs *snp_certs;
 };
 
 struct kvm_svm {
diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h
index 2191d8b5423a..7b65dd5808a1 100644
--- a/include/linux/psp-sev.h
+++ b/include/linux/psp-sev.h
@@ -22,7 +22,7 @@
 #define __psp_pa(x)	__pa(x)
 #endif
 
-#define SEV_FW_BLOB_MAX_SIZE	0x4000	/* 16KB */
+#define SEV_FW_BLOB_MAX_SIZE	0x5000	/* 20KB */
 
 struct sev_snp_certs {
 	void *data;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 175b958f103f..fa1c300303d6 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1937,6 +1937,8 @@ enum sev_cmd_id {
 	KVM_SEV_SNP_LAUNCH_START,
 	KVM_SEV_SNP_LAUNCH_UPDATE,
 	KVM_SEV_SNP_LAUNCH_FINISH,
+	KVM_SEV_SNP_GET_CERTS,
+	KVM_SEV_SNP_SET_CERTS,
 
 	KVM_SEV_NR_MAX,
 };
@@ -2084,6 +2086,16 @@ struct kvm_sev_snp_launch_finish {
 	__u8 pad[6];
 };
 
+struct kvm_sev_snp_get_certs {
+	__u64 certs_uaddr;
+	__u64 certs_len;
+};
+
+struct kvm_sev_snp_set_certs {
+	__u64 certs_uaddr;
+	__u64 certs_len;
+};
+
 #define KVM_DEV_ASSIGN_ENABLE_IOMMU	(1 << 0)
 #define KVM_DEV_ASSIGN_PCI_2_3		(1 << 1)
 #define KVM_DEV_ASSIGN_MASK_INTX	(1 << 2)
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 102+ messages in thread
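
Similarly, a hypothetical userspace sketch of installing instance-specific
certs on a VM via KVM_MEMORY_ENCRYPT_OP, using the uapi additions from the
patch above (error handling abbreviated):

  #include <stdint.h>
  #include <sys/ioctl.h>
  #include <linux/kvm.h>

  /* vm_fd is the KVM VM fd, sev_fd an open fd on /dev/sev. */
  static int set_instance_certs(int vm_fd, int sev_fd,
  			      void *certs, uint64_t certs_len)
  {
  	struct kvm_sev_snp_set_certs params = {
  		.certs_uaddr = (uint64_t)(uintptr_t)certs,
  		.certs_len   = certs_len,	/* <= SEV_FW_BLOB_MAX_SIZE */
  	};
  	struct kvm_sev_cmd cmd = {
  		.id     = KVM_SEV_SNP_SET_CERTS,
  		.data   = (uint64_t)(uintptr_t)&params,
  		.sev_fd = sev_fd,
  	};

  	return ioctl(vm_fd, KVM_MEMORY_ENCRYPT_OP, &cmd);
  }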

* [PATCH RFC v9 50/51] KVM: SVM: Provide support for SNP_GUEST_REQUEST NAE event
  2023-06-12  4:25 [PATCH RFC v9 00/51] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (48 preceding siblings ...)
  2023-06-12  4:25 ` [PATCH RFC v9 49/51] x86/sev: Add KVM commands for per-instance certs Michael Roth
@ 2023-06-12  4:25 ` Michael Roth
  2023-06-12  4:25 ` [PATCH RFC v9 51/51] crypto: ccp: Add debug support for decrypting pages Michael Roth
  50 siblings, 0 replies; 102+ messages in thread
From: Michael Roth @ 2023-06-12  4:25 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	liam.merwick, zhi.a.wang, Brijesh Singh, Alexey Kardashevskiy

From: Brijesh Singh <brijesh.singh@amd.com>

Version 2 of the GHCB specification added support for two SNP Guest
Request Message NAE events. These events allow an SEV-SNP guest to make
requests to the SEV-SNP firmware through the hypervisor, using the
SNP_GUEST_REQUEST API defined in the SEV-SNP firmware specification.

The SNP_EXT_GUEST_REQUEST event is similar to SNP_GUEST_REQUEST, with
the difference of an additional certificate blob that can be passed
through the SNP_SET_EXT_CONFIG ioctl defined in the CCP driver. KVM
takes a reference on the CCP driver's cached certificate object so it
can return both the report and the certificate data at once.

Co-developed-by: Alexey Kardashevskiy <aik@amd.com>
Signed-off-by: Alexey Kardashevskiy <aik@amd.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
[mdr: ensure FW command failures are indicated to guest]
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 arch/x86/kvm/svm/sev.c       | 175 +++++++++++++++++++++++++++++++++++
 arch/x86/kvm/svm/svm.h       |   1 +
 drivers/crypto/ccp/sev-dev.c |  15 +++
 include/linux/psp-sev.h      |   1 +
 4 files changed, 192 insertions(+)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index bdf32aa971d8..9f7defce1988 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -328,6 +328,8 @@ static int sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp)
 		ret = verify_snp_init_flags(kvm, argp);
 		if (ret)
 			goto e_free;
+
+		mutex_init(&sev->guest_req_lock);
 	}
 
 	ret = sev_platform_init(&argp->error);
@@ -2321,8 +2323,10 @@ static int snp_get_instance_certs(struct kvm *kvm, struct kvm_sev_cmd *argp)
 
 static void snp_replace_certs(struct kvm_sev_info *sev, struct sev_snp_certs *snp_certs)
 {
+	mutex_lock(&sev->guest_req_lock);
 	sev_snp_certs_put(sev->snp_certs);
 	sev->snp_certs = snp_certs;
+	mutex_unlock(&sev->guest_req_lock);
 }
 
 static int snp_set_instance_certs(struct kvm *kvm, struct kvm_sev_cmd *argp)
@@ -3171,6 +3175,8 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
 	case SVM_VMGEXIT_UNSUPPORTED_EVENT:
 	case SVM_VMGEXIT_HV_FEATURES:
 	case SVM_VMGEXIT_PSC:
+	case SVM_VMGEXIT_GUEST_REQUEST:
+	case SVM_VMGEXIT_EXT_GUEST_REQUEST:
 		break;
 	default:
 		reason = GHCB_ERR_INVALID_EVENT;
@@ -3604,6 +3610,163 @@ static int sev_snp_ap_creation(struct vcpu_svm *svm)
 	return ret;
 }
 
+static unsigned long snp_setup_guest_buf(struct vcpu_svm *svm,
+					 struct sev_data_snp_guest_request *data,
+					 gpa_t req_gpa, gpa_t resp_gpa)
+{
+	struct kvm_vcpu *vcpu = &svm->vcpu;
+	struct kvm *kvm = vcpu->kvm;
+	kvm_pfn_t req_pfn, resp_pfn;
+	struct kvm_sev_info *sev;
+
+	sev = &to_kvm_svm(kvm)->sev_info;
+
+	if (!IS_ALIGNED(req_gpa, PAGE_SIZE) || !IS_ALIGNED(resp_gpa, PAGE_SIZE))
+		return SEV_RET_INVALID_PARAM;
+
+	req_pfn = gfn_to_pfn(kvm, gpa_to_gfn(req_gpa));
+	if (is_error_noslot_pfn(req_pfn))
+		return SEV_RET_INVALID_ADDRESS;
+
+	resp_pfn = gfn_to_pfn(kvm, gpa_to_gfn(resp_gpa));
+	if (is_error_noslot_pfn(resp_pfn))
+		return SEV_RET_INVALID_ADDRESS;
+
+	if (rmp_make_private(resp_pfn, 0, PG_LEVEL_4K, 0, true))
+		return SEV_RET_INVALID_ADDRESS;
+
+	data->gctx_paddr = __psp_pa(sev->snp_context);
+	data->req_paddr = __sme_set(req_pfn << PAGE_SHIFT);
+	data->res_paddr = __sme_set(resp_pfn << PAGE_SHIFT);
+
+	return 0;
+}
+
+static void snp_cleanup_guest_buf(struct sev_data_snp_guest_request *data, unsigned long *rc)
+{
+	u64 pfn = __sme_clr(data->res_paddr) >> PAGE_SHIFT;
+	int ret;
+
+	ret = snp_page_reclaim(pfn);
+	if (ret)
+		*rc = SEV_RET_INVALID_ADDRESS;
+
+	ret = rmp_make_shared(pfn, PG_LEVEL_4K);
+	if (ret)
+		*rc = SEV_RET_INVALID_ADDRESS;
+}
+
+static void snp_handle_guest_request(struct vcpu_svm *svm, gpa_t req_gpa, gpa_t resp_gpa)
+{
+	struct sev_data_snp_guest_request data = {0};
+	struct kvm_vcpu *vcpu = &svm->vcpu;
+	struct kvm *kvm = vcpu->kvm;
+	struct kvm_sev_info *sev;
+	unsigned long rc;
+	int err;
+
+	if (!sev_snp_guest(vcpu->kvm)) {
+		rc = SEV_RET_INVALID_GUEST;
+		goto e_fail;
+	}
+
+	sev = &to_kvm_svm(kvm)->sev_info;
+
+	mutex_lock(&sev->guest_req_lock);
+
+	rc = snp_setup_guest_buf(svm, &data, req_gpa, resp_gpa);
+	if (rc)
+		goto unlock;
+
+	rc = sev_issue_cmd(kvm, SEV_CMD_SNP_GUEST_REQUEST, &data, &err);
+	if (rc)
+		/* Ensure an error value is returned to guest. */
+		rc = err ? err : SEV_RET_INVALID_ADDRESS;
+
+	snp_cleanup_guest_buf(&data, &rc);
+
+unlock:
+	mutex_unlock(&sev->guest_req_lock);
+
+e_fail:
+	ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, rc);
+}
+
+static void snp_handle_ext_guest_request(struct vcpu_svm *svm, gpa_t req_gpa, gpa_t resp_gpa)
+{
+	struct sev_data_snp_guest_request req = {0};
+	struct sev_snp_certs *snp_certs = NULL;
+	struct kvm_vcpu *vcpu = &svm->vcpu;
+	struct kvm *kvm = vcpu->kvm;
+	unsigned long data_npages;
+	struct kvm_sev_info *sev;
+	unsigned long exitcode = 0;
+	u64 data_gpa;
+	int err, rc;
+
+	if (!sev_snp_guest(vcpu->kvm)) {
+		rc = SEV_RET_INVALID_GUEST;
+		goto e_fail;
+	}
+
+	sev = &to_kvm_svm(kvm)->sev_info;
+
+	data_gpa = vcpu->arch.regs[VCPU_REGS_RAX];
+	data_npages = vcpu->arch.regs[VCPU_REGS_RBX];
+
+	if (!IS_ALIGNED(data_gpa, PAGE_SIZE)) {
+		exitcode = SEV_RET_INVALID_ADDRESS;
+		goto e_fail;
+	}
+
+	mutex_lock(&sev->guest_req_lock);
+
+	rc = snp_setup_guest_buf(svm, &req, req_gpa, resp_gpa);
+	if (rc)
+		goto unlock;
+
+	/*
+	 * If a VMM-specific certificate blob hasn't been provided, grab the
+	 * host-wide one.
+	 */
+	snp_certs = sev_snp_certs_get(sev->snp_certs);
+	if (!snp_certs)
+		snp_certs = sev_snp_global_certs_get();
+
+	/*
+	 * If there is a host-wide or VMM-specific certificate blob available,
+	 * make sure the guest has allocated enough space to store it.
+	 * Otherwise, inform the guest how much space is needed.
+	 */
+	if (snp_certs && (data_npages << PAGE_SHIFT) < snp_certs->len) {
+		vcpu->arch.regs[VCPU_REGS_RBX] = snp_certs->len >> PAGE_SHIFT;
+		exitcode = SNP_GUEST_REQ_INVALID_LEN;
+		goto cleanup;
+	}
+
+	rc = sev_issue_cmd(kvm, SEV_CMD_SNP_GUEST_REQUEST, &req, &err);
+	if (rc) {
+		/* pass the firmware error code */
+		exitcode = err;
+		goto cleanup;
+	}
+
+	/* Copy the certificate blob into guest memory */
+	if (snp_certs &&
+	    kvm_write_guest(kvm, data_gpa, snp_certs->data, snp_certs->len))
+		exitcode = SEV_RET_INVALID_ADDRESS;
+
+cleanup:
+	sev_snp_certs_put(snp_certs);
+	snp_cleanup_guest_buf(&req, &exitcode);
+
+unlock:
+	mutex_unlock(&sev->guest_req_lock);
+
+e_fail:
+	ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, exitcode);
+}
+
 static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
 {
 	struct vmcb_control_area *control = &svm->vmcb->control;
@@ -3863,6 +4026,18 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
 						SVM_EVTINJ_VALID);
 		}
 
+		ret = 1;
+		break;
+	case SVM_VMGEXIT_GUEST_REQUEST:
+		snp_handle_guest_request(svm, control->exit_info_1, control->exit_info_2);
+
+		ret = 1;
+		break;
+	case SVM_VMGEXIT_EXT_GUEST_REQUEST:
+		snp_handle_ext_guest_request(svm,
+					     control->exit_info_1,
+					     control->exit_info_2);
+
 		ret = 1;
 		break;
 	case SVM_VMGEXIT_UNSUPPORTED_EVENT:
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 72be0c440b16..31cd8b3e6877 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -96,6 +96,7 @@ struct kvm_sev_info {
 	void *snp_context;      /* SNP guest context page */
 	u64 sev_features;	/* Features set at VMSA creation */
 	struct sev_snp_certs *snp_certs;
+	struct mutex guest_req_lock; /* Lock for guest request handling */
 };
 
 struct kvm_svm {
diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index 175c24163ba0..096ba15d0740 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -2109,6 +2109,21 @@ void sev_snp_certs_put(struct sev_snp_certs *certs)
 }
 EXPORT_SYMBOL_GPL(sev_snp_certs_put);
 
+struct sev_snp_certs *sev_snp_global_certs_get(void)
+{
+	struct sev_device *sev;
+
+	if (!psp_master || !psp_master->sev_data)
+		return NULL;
+
+	sev = psp_master->sev_data;
+	if (!sev->snp_initialized)
+		return NULL;
+
+	return sev_snp_certs_get(sev->snp_certs);
+}
+EXPORT_SYMBOL_GPL(sev_snp_global_certs_get);
+
 static void sev_exit(struct kref *ref)
 {
 	misc_deregister(&misc_dev->misc);
diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h
index 7b65dd5808a1..1235eb3110cb 100644
--- a/include/linux/psp-sev.h
+++ b/include/linux/psp-sev.h
@@ -33,6 +33,7 @@ struct sev_snp_certs {
 struct sev_snp_certs *sev_snp_certs_new(void *data, u32 len);
 struct sev_snp_certs *sev_snp_certs_get(struct sev_snp_certs *certs);
 void sev_snp_certs_put(struct sev_snp_certs *certs);
+struct sev_snp_certs *sev_snp_global_certs_get(void);
 
 /**
  * SEV platform state
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 102+ messages in thread
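
To make the buffer-resize protocol concrete, a guest-side sketch of the
SVM_VMGEXIT_EXT_GUEST_REQUEST retry loop. vmgexit_ext_guest_request() and
grow_shared_buffer() are hypothetical wrappers; real guest code drives the
GHCB directly:

  /* On SNP_GUEST_REQ_INVALID_LEN the hypervisor reports the number of
   * pages needed for the certificate data back in RBX, so the guest
   * grows its shared buffer and retries the request. */
  static u64 snp_issue_ext_guest_request(u64 req_gpa, u64 resp_gpa,
  				       u64 *data_gpa, u64 *data_npages)
  {
  	u64 ret;

  	for (;;) {
  		ret = vmgexit_ext_guest_request(req_gpa, resp_gpa,
  						*data_gpa, data_npages);
  		if (ret != SNP_GUEST_REQ_INVALID_LEN)
  			return ret;
  		/* *data_npages now holds the required page count */
  		*data_gpa = grow_shared_buffer(*data_npages);
  	}
  }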

* [PATCH RFC v9 51/51] crypto: ccp: Add debug support for decrypting pages
  2023-06-12  4:25 [PATCH RFC v9 00/51] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (49 preceding siblings ...)
  2023-06-12  4:25 ` [PATCH RFC v9 50/51] KVM: SVM: Provide support for SNP_GUEST_REQUEST NAE event Michael Roth
@ 2023-06-12  4:25 ` Michael Roth
  50 siblings, 0 replies; 102+ messages in thread
From: Michael Roth @ 2023-06-12  4:25 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	liam.merwick, zhi.a.wang, Brijesh Singh

From: Brijesh Singh <brijesh.singh@amd.com>

Add support for decrypting guest encrypted memory. This API interface
can be used, for example, to dump VMCBs on SNP guest exit.

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
[mdr: minor commit fixups]
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 drivers/crypto/ccp/sev-dev.c | 32 ++++++++++++++++++++++++++++++++
 include/linux/psp-sev.h      | 19 +++++++++++++++++++
 2 files changed, 51 insertions(+)

diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index 096ba15d0740..3c8cd2d20016 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -2061,6 +2061,38 @@ int sev_guest_df_flush(int *error)
 }
 EXPORT_SYMBOL_GPL(sev_guest_df_flush);
 
+int snp_guest_dbg_decrypt_page(u64 gctx_pfn, u64 src_pfn, u64 dst_pfn, int *error)
+{
+	struct sev_data_snp_dbg data = {0};
+	struct sev_device *sev;
+	int ret;
+
+	if (!psp_master || !psp_master->sev_data)
+		return -ENODEV;
+
+	sev = psp_master->sev_data;
+
+	if (!sev->snp_initialized)
+		return -EINVAL;
+
+	data.gctx_paddr = sme_me_mask | (gctx_pfn << PAGE_SHIFT);
+	data.src_addr = sme_me_mask | (src_pfn << PAGE_SHIFT);
+	data.dst_addr = sme_me_mask | (dst_pfn << PAGE_SHIFT);
+
+	/* The destination page must be in the firmware state. */
+	if (rmp_mark_pages_firmware(data.dst_addr, 1, false))
+		return -EIO;
+
+	ret = sev_do_cmd(SEV_CMD_SNP_DBG_DECRYPT, &data, error);
+
+	/* Restore the page state */
+	if (snp_reclaim_pages(data.dst_addr, 1, false))
+		ret = -EIO;
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(snp_guest_dbg_decrypt_page);
+
 static void sev_snp_certs_release(struct kref *kref)
 {
 	struct sev_snp_certs *certs = container_of(kref, struct sev_snp_certs, kref);
diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h
index 1235eb3110cb..55f6dfc2580d 100644
--- a/include/linux/psp-sev.h
+++ b/include/linux/psp-sev.h
@@ -916,6 +916,20 @@ int sev_guest_decommission(struct sev_data_decommission *data, int *error);
  */
 int sev_do_cmd(int cmd, void *data, int *psp_ret);
 
+/**
+ * snp_guest_dbg_decrypt_page - perform SEV SNP_DBG_DECRYPT command
+ *
+ * @gctx_pfn: pfn of the SNP guest context page
+ * @src_pfn: pfn of the encrypted source page
+ * @dst_pfn: pfn of the destination page for the decrypted data
+ * @error: SEV command return code
+ *
+ * Returns:
+ * 0 if the SEV device successfully processed the command
+ * -%ENODEV    if the SEV device is not available
+ * -%EINVAL    if SNP is not initialized
+ * -%ETIMEDOUT if the SEV command timed out
+ * -%EIO       if the page state change or the SEV command failed
+ */
+int snp_guest_dbg_decrypt_page(u64 gctx_pfn, u64 src_pfn, u64 dst_pfn, int *error);
+
 void *psp_copy_user_blob(u64 uaddr, u32 len);
 void *snp_alloc_firmware_page(gfp_t mask);
 void snp_free_firmware_page(void *addr);
@@ -946,6 +960,11 @@ sev_issue_cmd_external_user(struct file *filep, unsigned int id, void *data, int
 
 static inline void *psp_copy_user_blob(u64 __user uaddr, u32 len) { return ERR_PTR(-EINVAL); }
 
+static inline int snp_guest_dbg_decrypt_page(u64 gctx_pfn, u64 src_pfn, u64 dst_pfn, int *error)
+{
+	return -ENODEV;
+}
+
 static inline void *snp_alloc_firmware_page(gfp_t mask)
 {
 	return NULL;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 102+ messages in thread
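
A hypothetical in-kernel caller sketch for the API above, decrypting a single
guest page for debugging; the plumbing for obtaining src_pfn from a faulting
gpa is elided:

  static int snp_dbg_dump_page(struct kvm_sev_info *sev, u64 src_pfn)
  {
  	struct page *dst = alloc_page(GFP_KERNEL);
  	u64 gctx_pfn = __pa(sev->snp_context) >> PAGE_SHIFT;
  	int fw_err = 0, ret;

  	if (!dst)
  		return -ENOMEM;

  	ret = snp_guest_dbg_decrypt_page(gctx_pfn, src_pfn,
  					 page_to_pfn(dst), &fw_err);
  	if (ret)
  		pr_err("SNP_DBG_DECRYPT failed: ret=%d fw_err=%d\n",
  		       ret, fw_err);
  	else
  		print_hex_dump(KERN_DEBUG, "page: ", DUMP_PREFIX_OFFSET,
  			       16, 1, page_address(dst), 64, false);

  	__free_page(dst);
  	return ret;
  }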

* Re: [PATCH RFC v9 05/51] x86/coco: move CONFIG_HAS_CC_PLATFORM check down into coco/Makefile
  2023-06-12  4:25 ` [PATCH RFC v9 05/51] x86/coco: move CONFIG_HAS_CC_PLATFORM check down into coco/Makefile Michael Roth
@ 2023-06-12  7:07   ` Kirill A . Shutemov
  2023-06-20 12:09   ` Borislav Petkov
  2023-07-10  3:05   ` Sathyanarayanan Kuppuswamy
  2 siblings, 0 replies; 102+ messages in thread
From: Kirill A . Shutemov @ 2023-06-12  7:07 UTC (permalink / raw)
  To: Michael Roth, bp
  Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, vbabka, kirill,
	ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy, alpergun,
	dgilbert, jarkko, ashish.kalra, nikunj.dadhania, liam.merwick,
	zhi.a.wang, kai.huang, isaku.yamahata

On Sun, Jun 11, 2023 at 11:25:13PM -0500, Michael Roth wrote:
> Currently CONFIG_HAS_CC_PLATFORM is a prereq for building anything in
> arch/x86/coco, but that is generally only applicable for guest support.
> 
> For SEV-SNP, helpers related purely to host support will also live in
> arch/x86/coco. To allow for CoCo-related host support code in
> arch/x86/coco, move that check down into the Makefile and check for it
> specifically when needed.

Hm. TDX host support uses arch/x86/virt/vmx/tdx/. I think we need to be
consistent here.

IIRC, Borislav proposed the scheme that TDX uses.


-- 
  Kiryl Shutsemau / Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH RFC v9 02/51] KVM: x86: Add gmem hook for invalidating private memory
  2023-06-12  4:25 ` [PATCH RFC v9 02/51] KVM: x86: Add gmem hook for invalidating " Michael Roth
@ 2023-06-12 10:49   ` Borislav Petkov
  2023-06-19 13:39     ` Borislav Petkov
  0 siblings, 1 reply; 102+ messages in thread
From: Borislav Petkov @ 2023-06-12 10:49 UTC (permalink / raw)
  To: Michael Roth
  Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, vbabka, kirill,
	ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy, alpergun,
	dgilbert, jarkko, ashish.kalra, nikunj.dadhania, liam.merwick,
	zhi.a.wang

On Sun, Jun 11, 2023 at 11:25:10PM -0500, Michael Roth wrote:
> TODO: add a CONFIG option that can be used to completely skip the arch
> invalidation loop and avoid __weak references for arch/platforms that
> don't need an additional invalidation hook.
> 
> In some cases, like with SEV-SNP, guest memory needs to be updated in a
> platform-specific manner before it can be safely freed back to the host.
> Add hooks to wire up handling of this sort when freeing memory in
> response to FALLOC_FL_PUNCH_HOLE operations.
> 
> Also issue invalidations of all allocated pages when releasing the gmem
> file so that the pages are not left in an unusable state when they get
> freed back to the host.
> 
> Signed-off-by: Michael Roth <michael.roth@amd.com>
> ---
>  arch/x86/include/asm/kvm-x86-ops.h |  1 +
>  arch/x86/include/asm/kvm_host.h    |  1 +
>  arch/x86/kvm/x86.c                 |  6 ++++
>  include/linux/kvm_host.h           |  3 ++
>  virt/kvm/guest_mem.c               | 48 ++++++++++++++++++++++++++++--
>  5 files changed, 57 insertions(+), 2 deletions(-)

ld: arch/x86/kvm/../../../virt/kvm/eventfd.o: in function `kvm_arch_gmem_invalidate':
/home/boris/kernel/2nd/linux/./include/linux/kvm_host.h:2356: multiple definition of `kvm_arch_gmem_invalidate'; arch/x86/kvm/../../../virt/kvm/kvm_main.o:/home/boris/kernel/2nd/linux/./include/linux/kvm_host.h:2356: first defined here
ld: arch/x86/kvm/../../../virt/kvm/binary_stats.o: in function `kvm_arch_gmem_invalidate':
/home/boris/kernel/2nd/linux/arch/x86/kvm/../../../virt/kvm/binary_stats.c:2356: multiple definition of `kvm_arch_gmem_invalidate'; arch/x86/kvm/../../../virt/kvm/kvm_main.o:/home/boris/kernel/2nd/linux/./include/linux/kvm_host.h:2356: first defined here
ld: arch/x86/kvm/../../../virt/kvm/vfio.o: in function `kvm_arch_gmem_invalidate':
/home/boris/kernel/2nd/linux/./include/linux/kvm_host.h:2356: multiple definition of `kvm_arch_gmem_invalidate'; arch/x86/kvm/../../../virt/kvm/kvm_main.o:/home/boris/kernel/2nd/linux/./include/linux/kvm_host.h:2356: first defined here
ld: arch/x86/kvm/../../../virt/kvm/coalesced_mmio.o: in function `kvm_arch_gmem_invalidate':
/home/boris/kernel/2nd/linux/./include/linux/kvm_host.h:2356: multiple definition of `kvm_arch_gmem_invalidate'; arch/x86/kvm/../../../virt/kvm/kvm_main.o:/home/boris/kernel/2nd/linux/./include/linux/kvm_host.h:2356: first defined here
ld: arch/x86/kvm/../../../virt/kvm/async_pf.o: in function `kvm_arch_gmem_invalidate':
/home/boris/kernel/2nd/linux/./include/linux/kvm_host.h:2356: multiple definition of `kvm_arch_gmem_invalidate'; arch/x86/kvm/../../../virt/kvm/kvm_main.o:/home/boris/kernel/2nd/linux/./include/linux/kvm_host.h:2356: first defined here
ld: arch/x86/kvm/../../../virt/kvm/irqchip.o: in function `kvm_arch_gmem_invalidate':
/home/boris/kernel/2nd/linux/./include/linux/kvm_host.h:2356: multiple definition of `kvm_arch_gmem_invalidate'; arch/x86/kvm/../../../virt/kvm/kvm_main.o:/home/boris/kernel/2nd/linux/./include/linux/kvm_host.h:2356: first defined here
ld: arch/x86/kvm/../../../virt/kvm/dirty_ring.o: in function `kvm_arch_gmem_invalidate':
/home/boris/kernel/2nd/linux/./include/linux/kvm_host.h:2356: multiple definition of `kvm_arch_gmem_invalidate'; arch/x86/kvm/../../../virt/kvm/kvm_main.o:/home/boris/kernel/2nd/linux/./include/linux/kvm_host.h:2356: first defined here
ld: arch/x86/kvm/../../../virt/kvm/pfncache.o: in function `kvm_arch_gmem_invalidate':
/home/boris/kernel/2nd/linux/./include/linux/kvm_host.h:2356: multiple definition of `kvm_arch_gmem_invalidate'; arch/x86/kvm/../../../virt/kvm/kvm_main.o:/home/boris/kernel/2nd/linux/./include/linux/kvm_host.h:2356: first defined here
ld: arch/x86/kvm/x86.o: in function `kvm_arch_gmem_invalidate':
/home/boris/kernel/2nd/linux/./include/linux/kvm_host.h:2356: multiple definition of `kvm_arch_gmem_invalidate'; arch/x86/kvm/../../../virt/kvm/kvm_main.o:/home/boris/kernel/2nd/linux/./include/linux/kvm_host.h:2356: first defined here
ld: arch/x86/kvm/emulate.o: in function `kvm_arch_gmem_invalidate':
/home/boris/kernel/2nd/linux/./include/linux/kvm_host.h:2356: multiple definition of `kvm_arch_gmem_invalidate'; arch/x86/kvm/../../../virt/kvm/kvm_main.o:/home/boris/kernel/2nd/linux/./include/linux/kvm_host.h:2356: first defined here
ld: arch/x86/kvm/i8259.o: in function `kvm_arch_gmem_invalidate':
/home/boris/kernel/2nd/linux/./include/linux/kvm_host.h:2356: multiple definition of `kvm_arch_gmem_invalidate'; arch/x86/kvm/../../../virt/kvm/kvm_main.o:/home/boris/kernel/2nd/linux/./include/linux/kvm_host.h:2356: first defined here
ld: arch/x86/kvm/irq.o: in function `kvm_arch_gmem_invalidate':
/home/boris/kernel/2nd/linux/./include/linux/kvm_host.h:2356: multiple definition of `kvm_arch_gmem_invalidate'; arch/x86/kvm/../../../virt/kvm/kvm_main.o:/home/boris/kernel/2nd/linux/./include/linux/kvm_host.h:2356: first defined here
ld: arch/x86/kvm/lapic.o: in function `kvm_arch_gmem_invalidate':
/home/boris/kernel/2nd/linux/./include/linux/kvm_host.h:2356: multiple definition of `kvm_arch_gmem_invalidate'; arch/x86/kvm/../../../virt/kvm/kvm_main.o:/home/boris/kernel/2nd/linux/./include/linux/kvm_host.h:2356: first defined here
ld: arch/x86/kvm/i8254.o: in function `kvm_arch_gmem_invalidate':
/home/boris/kernel/2nd/linux/./include/linux/kvm_host.h:2356: multiple definition of `kvm_arch_gmem_invalidate'; arch/x86/kvm/../../../virt/kvm/kvm_main.o:/home/boris/kernel/2nd/linux/./include/linux/kvm_host.h:2356: first defined here
ld: arch/x86/kvm/ioapic.o: in function `kvm_arch_gmem_invalidate':
/home/boris/kernel/2nd/linux/./include/linux/kvm_host.h:2356: multiple definition of `kvm_arch_gmem_invalidate'; arch/x86/kvm/../../../virt/kvm/kvm_main.o:/home/boris/kernel/2nd/linux/./include/linux/kvm_host.h:2356: first defined here
ld: arch/x86/kvm/irq_comm.o: in function `kvm_arch_gmem_invalidate':
/home/boris/kernel/2nd/linux/./include/linux/kvm_host.h:2356: multiple definition of `kvm_arch_gmem_invalidate'; arch/x86/kvm/../../../virt/kvm/kvm_main.o:/home/boris/kernel/2nd/linux/./include/linux/kvm_host.h:2356: first defined here
ld: arch/x86/kvm/cpuid.o: in function `kvm_arch_gmem_invalidate':
/home/boris/kernel/2nd/linux/./include/linux/kvm_host.h:2356: multiple definition of `kvm_arch_gmem_invalidate'; arch/x86/kvm/../../../virt/kvm/kvm_main.o:/home/boris/kernel/2nd/linux/./include/linux/kvm_host.h:2356: first defined here
ld: arch/x86/kvm/pmu.o: in function `kvm_arch_gmem_invalidate':
/home/boris/kernel/2nd/linux/./include/linux/kvm_host.h:2356: multiple definition of `kvm_arch_gmem_invalidate'; arch/x86/kvm/../../../virt/kvm/kvm_main.o:/home/boris/kernel/2nd/linux/./include/linux/kvm_host.h:2356: first defined here
ld: arch/x86/kvm/mtrr.o: in function `kvm_arch_gmem_invalidate':
/home/boris/kernel/2nd/linux/./include/linux/kvm_host.h:2356: multiple definition of `kvm_arch_gmem_invalidate'; arch/x86/kvm/../../../virt/kvm/kvm_main.o:/home/boris/kernel/2nd/linux/./include/linux/kvm_host.h:2356: first defined here
ld: arch/x86/kvm/hyperv.o: in function `kvm_arch_gmem_invalidate':
/home/boris/kernel/2nd/linux/./include/linux/kvm_host.h:2356: multiple definition of `kvm_arch_gmem_invalidate'; arch/x86/kvm/../../../virt/kvm/kvm_main.o:/home/boris/kernel/2nd/linux/./include/linux/kvm_host.h:2356: first defined here
ld: arch/x86/kvm/debugfs.o: in function `kvm_arch_gmem_invalidate':
/home/boris/kernel/2nd/linux/./include/linux/kvm_host.h:2356: multiple definition of `kvm_arch_gmem_invalidate'; arch/x86/kvm/../../../virt/kvm/kvm_main.o:/home/boris/kernel/2nd/linux/./include/linux/kvm_host.h:2356: first defined here
ld: arch/x86/kvm/mmu/mmu.o: in function `kvm_arch_gmem_invalidate':
/home/boris/kernel/2nd/linux/./include/linux/kvm_host.h:2356: multiple definition of `kvm_arch_gmem_invalidate'; arch/x86/kvm/../../../virt/kvm/kvm_main.o:/home/boris/kernel/2nd/linux/./include/linux/kvm_host.h:2356: first defined here
ld: arch/x86/kvm/mmu/page_track.o: in function `kvm_arch_gmem_invalidate':
/home/boris/kernel/2nd/linux/./include/linux/kvm_host.h:2356: multiple definition of `kvm_arch_gmem_invalidate'; arch/x86/kvm/../../../virt/kvm/kvm_main.o:/home/boris/kernel/2nd/linux/./include/linux/kvm_host.h:2356: first defined here
ld: arch/x86/kvm/mmu/spte.o: in function `kvm_arch_gmem_invalidate':
/home/boris/kernel/2nd/linux/./include/linux/kvm_host.h:2356: multiple definition of `kvm_arch_gmem_invalidate'; arch/x86/kvm/../../../virt/kvm/kvm_main.o:/home/boris/kernel/2nd/linux/./include/linux/kvm_host.h:2356: first defined here
ld: arch/x86/kvm/mmu/tdp_iter.o: in function `kvm_arch_gmem_invalidate':
/home/boris/kernel/2nd/linux/arch/x86/kvm/mmu/tdp_iter.c:2356: multiple definition of `kvm_arch_gmem_invalidate'; arch/x86/kvm/../../../virt/kvm/kvm_main.o:/home/boris/kernel/2nd/linux/./include/linux/kvm_host.h:2356: first defined here
ld: arch/x86/kvm/mmu/tdp_mmu.o: in function `kvm_arch_gmem_invalidate':
/home/boris/kernel/2nd/linux/./include/linux/kvm_host.h:2356: multiple definition of `kvm_arch_gmem_invalidate'; arch/x86/kvm/../../../virt/kvm/kvm_main.o:/home/boris/kernel/2nd/linux/./include/linux/kvm_host.h:2356: first defined here
ld: arch/x86/kvm/smm.o: in function `kvm_arch_gmem_invalidate':
/home/boris/kernel/2nd/linux/./include/linux/kvm_host.h:2356: multiple definition of `kvm_arch_gmem_invalidate'; arch/x86/kvm/../../../virt/kvm/kvm_main.o:/home/boris/kernel/2nd/linux/./include/linux/kvm_host.h:2356: first defined here
make[3]: *** [scripts/Makefile.build:452: arch/x86/kvm/kvm.o] Error 1
make[3]: *** Waiting for unfinished jobs....
make[2]: *** [scripts/Makefile.build:494: arch/x86/kvm] Error 2
make[2]: *** Waiting for unfinished jobs....
make[1]: *** [scripts/Makefile.build:494: arch/x86] Error 2
make[1]: *** Waiting for unfinished jobs....
make: *** [Makefile:2028: .] Error 2

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 102+ messages in thread
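
The errors above are the classic symptom of a non-inline stub defined in a
shared header: every translation unit that includes kvm_host.h emits its own
copy of kvm_arch_gmem_invalidate. A conventional fix, sketched here with an
illustrative CONFIG gate name in line with the TODO in the quoted commit
message (signature following the sev_gmem_invalidate() hook shown earlier in
the thread):

  /* include/linux/kvm_host.h: declare the hook when an arch provides
   * it, and fall back to a static inline no-op otherwise. */
  #ifdef CONFIG_HAVE_KVM_ARCH_GMEM_INVALIDATE
  void kvm_arch_gmem_invalidate(struct kvm *kvm, kvm_pfn_t start, kvm_pfn_t end);
  #else
  static inline void kvm_arch_gmem_invalidate(struct kvm *kvm, kvm_pfn_t start,
  					    kvm_pfn_t end)
  {
  }
  #endif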

* Re: [PATCH RFC v9 07/51] x86/sev: Add the host SEV-SNP initialization support
  2023-06-12  4:25 ` [PATCH RFC v9 07/51] x86/sev: Add the host SEV-SNP initialization support Michael Roth
@ 2023-06-12 15:34   ` Dave Hansen
  2023-06-21  9:15     ` Borislav Petkov
  2023-06-21  9:42   ` Borislav Petkov
  2023-08-09 13:03   ` Jeremi Piotrowski
  2 siblings, 1 reply; 102+ messages in thread
From: Dave Hansen @ 2023-06-12 15:34 UTC (permalink / raw)
  To: Michael Roth, kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	liam.merwick, zhi.a.wang, Brijesh Singh

On 6/11/23 21:25, Michael Roth wrote:
> +	/*
> +	 * Calculate the amount of memory that must be reserved by the BIOS to
> +	 * address the whole RAM, including the bookkeeping area. The RMP itself
> +	 * must also be covered.
> +	 */
> +	max_rmp_pfn = max_pfn;
> +	if (PHYS_PFN(rmp_end) > max_pfn)
> +		max_rmp_pfn = PHYS_PFN(rmp_end);

Could you say a little here about how this deals with memory hotplug?


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH RFC v9 08/51] x86/speculation: Do not enable Automatic IBRS if SEV SNP is enabled
  2023-06-12  4:25 ` [PATCH RFC v9 08/51] x86/speculation: Do not enable Automatic IBRS if SEV SNP is enabled Michael Roth
@ 2023-06-12 15:39   ` Dave Hansen
  2023-07-18 22:34     ` Kim Phillips
  0 siblings, 1 reply; 102+ messages in thread
From: Dave Hansen @ 2023-06-12 15:39 UTC (permalink / raw)
  To: Michael Roth, kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	liam.merwick, zhi.a.wang, Kim Phillips

On 6/11/23 21:25, Michael Roth wrote:
> A hardware limitation prevents the host from enabling Automatic IBRS
> when SNP is enabled.  Instead, fall back to retpolines.

"Hardware limitation"?  As in, it is a documented, architectural
restriction?  Or, it's a CPU bug?

> diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
> index f9d060e71c3e..3fba3623ff64 100644
> --- a/arch/x86/kernel/cpu/bugs.c
> +++ b/arch/x86/kernel/cpu/bugs.c
> @@ -1507,7 +1507,12 @@ static void __init spectre_v2_select_mitigation(void)
>  
>  	if (spectre_v2_in_ibrs_mode(mode)) {
>  		if (boot_cpu_has(X86_FEATURE_AUTOIBRS)) {
> -			msr_set_bit(MSR_EFER, _EFER_AUTOIBRS);
> +			if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP)) {
> +				msr_set_bit(MSR_EFER, _EFER_AUTOIBRS);
> +			} else {
> +				pr_err("SNP feature available, not enabling AutoIBRS on the host.\n");
> +				mode = spectre_v2_select_retpoline();
> +			}

I think this would be nicer if you did something like:

	if (cpu_feature_enabled(X86_FEATURE_SEV_SNP))
		setup_clear_cpu_cap(X86_FEATURE_AUTOIBRS);

somewhere _else_ in the code instead of smack-dab in the middle of the
mitigation selection.

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH RFC v9 09/51] x86/sev: Add RMP entry lookup helpers
  2023-06-12  4:25 ` [PATCH RFC v9 09/51] x86/sev: Add RMP entry lookup helpers Michael Roth
@ 2023-06-12 16:08   ` Dave Hansen
  2023-06-30 21:57     ` Michael Roth
  0 siblings, 1 reply; 102+ messages in thread
From: Dave Hansen @ 2023-06-12 16:08 UTC (permalink / raw)
  To: Michael Roth, kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	liam.merwick, zhi.a.wang, Brijesh Singh

On 6/11/23 21:25, Michael Roth wrote:
> +/*
> + * The RMP entry format is not architectural. The format is defined in PPR
> + * Family 19h Model 01h, Rev B1 processor.
> + */
> +struct rmpentry {
> +	union {
> +		struct {
> +			u64	assigned	: 1,
> +				pagesize	: 1,
> +				immutable	: 1,
> +				rsvd1		: 9,
> +				gpa		: 39,
> +				asid		: 10,
> +				vmsa		: 1,
> +				validated	: 1,
> +				rsvd2		: 1;
> +		} info;
> +		u64 low;
> +	};
> +	u64 high;
> +} __packed;

What's 'high' used for?  The PPR says it's reserved.  Why not call it
reserved?

It _looks_ like it's only used for a debugging pr_info().  It makes the
struct look kinda goofy.  I'd much rather limit the goofiness to the
"dumping" code, like:

     u64 *__e = (void *)e;
     ....
     pr_info("RMPEntry paddr 0x%llx: [high=0x%016llx low=0x%016llx]\n",
                               pfn << PAGE_SHIFT, __e[1], __e[0]);

BTW, why does it do any good to dump all these reserved fields?

>  /*
>   * The first 16KB from the RMP_BASE is used by the processor for the
>   * bookkeeping, the range needs to be added during the RMP entry lookup.
>   */
>  #define RMPTABLE_CPU_BOOKKEEPING_SZ	0x4000
> +#define RMPENTRY_SHIFT			8
> +#define rmptable_page_offset(x)	(RMPTABLE_CPU_BOOKKEEPING_SZ +		\
> +				 (((unsigned long)x) >> RMPENTRY_SHIFT))

Is RMPENTRY_SHIFT just PAGE_SHIFT - log2(sizeof(struct rmpentry))?  If
so, wouldn't this be a *LOT* more obvious to be written:

unsigned long rmptable_offset(unsigned long paddr)
{
	return RMPTABLE_CPU_BOOKKEEPING_SZ +
	       PHYS_PFN(paddr) * sizeof(struct rmpentry);
}

??

It removes a cast, gives 'x' a real name, removes a random magic number
#define and is clearer to boot.

>  static unsigned long rmptable_start __ro_after_init;
>  static unsigned long rmptable_end __ro_after_init;
> @@ -210,3 +235,63 @@ static int __init snp_rmptable_init(void)
>   * the page(s) used for DMA are hypervisor owned.
>   */
>  fs_initcall(snp_rmptable_init);
> +
> +static inline unsigned int rmpentry_assigned(const struct rmpentry *e)
> +{
> +	return e->info.assigned;
> +}
> +
> +static inline unsigned int rmpentry_pagesize(const struct rmpentry *e)
> +{
> +	return e->info.pagesize;
> +}

I think these are a little silly.  If you're going to go to the trouble
of using bitfields, you don't need helpers for every bitfield.  I say
either use bitfields without helpers or just regular old:

#define RMPENTRY_ASSIGNED_MASK	BIT(0)

and then *maybe* you can make helpers for them.
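
e.g. something like (sketch):

static inline bool rmpentry_assigned(const struct rmpentry *e)
{
	return e->low & RMPENTRY_ASSIGNED_MASK;
}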

> +static int rmptable_entry(unsigned long paddr, struct rmpentry *entry)
> +{
> +	unsigned long vaddr;
> +
> +	vaddr = rmptable_start + rmptable_page_offset(paddr);
> +	if (unlikely(vaddr > rmptable_end))
> +		return -EFAULT;

This seems like more of a WARN_ON_ONCE() kind of thing than just an
error return.  Isn't it a big deal if an invalid (non-RMP-covered)
address makes it in here?
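
i.e. something like (sketch):

	if (WARN_ON_ONCE(vaddr > rmptable_end))
		return -EFAULT;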

> +	*entry = *(struct rmpentry *)vaddr;
> +
> +	return 0;
> +}
> +
> +static int __snp_lookup_rmpentry(u64 pfn, struct rmpentry *entry, int *level)
> +{

I've been bitten by pfn vs. paddr quite a few times.  I'd really
encourage you to add it to the names if you're going to pass them
around, like __snp_lookup_rmpentry_pfn().

> +	unsigned long paddr = pfn << PAGE_SHIFT;
> +	struct rmpentry large_entry;
> +	int ret;
> +
> +	if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
> +		return -ENXIO;

OK, so if you're running on non-SNP hardware, you return -ENXIO here.
Remember this please.

> +	ret = rmptable_entry(paddr, entry);
> +	if (ret)
> +		return ret;
> +
> +	/* Read a large RMP entry to get the correct page level used in RMP entry. */
> +	ret = rmptable_entry(paddr & PMD_MASK, &large_entry);
> +	if (ret)
> +		return ret;
> +
> +	*level = RMP_TO_X86_PG_LEVEL(rmpentry_pagesize(&large_entry));
> +
> +	return 0;
> +}

This is a bit weird.  Should it say something like this?

To do an 4k RMP lookup the hardware looks at two places in the RMP:

	1. The actual 4k RMP entry
	2. The 2M entry that "covers" the 4k entry

The page size is not indicated in the 4k entries at all.  It is solely
indicated by the 2M entry.

> +int snp_lookup_rmpentry(u64 pfn, bool *assigned, int *level)
> +{
> +	struct rmpentry e;
> +	int ret;
> +
> +	ret = __snp_lookup_rmpentry(pfn, &e, level);
> +	if (ret)
> +		return ret;
> +
> +	*assigned = !!rmpentry_assigned(&e);
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(snp_lookup_rmpentry);
> diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
> index b8357d6ecd47..bf0378136289 100644
> --- a/arch/x86/include/asm/sev-common.h
> +++ b/arch/x86/include/asm/sev-common.h
> @@ -171,4 +171,8 @@ struct snp_psc_desc {
>  #define GHCB_ERR_INVALID_INPUT		5
>  #define GHCB_ERR_INVALID_EVENT		6
>  
> +/* RMP page size */
> +#define RMP_PG_SIZE_4K			0
> +#define RMP_TO_X86_PG_LEVEL(level)	(((level) == RMP_PG_SIZE_4K) ? PG_LEVEL_4K : PG_LEVEL_2M)
> +
>  #endif
> diff --git a/arch/x86/include/asm/sev-host.h b/arch/x86/include/asm/sev-host.h
> new file mode 100644
> index 000000000000..30d47e20081d
> --- /dev/null
> +++ b/arch/x86/include/asm/sev-host.h
> @@ -0,0 +1,22 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * AMD SVM-SEV Host Support.
> + *
> + * Copyright (C) 2023 Advanced Micro Devices, Inc.
> + *
> + * Author: Ashish Kalra <ashish.kalra@amd.com>
> + *
> + */
> +
> +#ifndef __ASM_X86_SEV_HOST_H
> +#define __ASM_X86_SEV_HOST_H
> +
> +#include <asm/sev-common.h>
> +
> +#ifdef CONFIG_KVM_AMD_SEV
> +int snp_lookup_rmpentry(u64 pfn, bool *assigned, int *level);
> +#else
> +static inline int snp_lookup_rmpentry(u64 pfn, bool *assigned, int *level) { return 0; }
> +#endif

Above, -ENXIO was returned when SEV-SNP was not supported.  Here, 0 is
returned when it is compiled out.  That's inconsistent.

Is calling snp_lookup_rmpentry() acceptable when SEV-SNP is not in play?
I'd like to see consistency between when it is compiled out and when it
is compiled in but unsupported on the CPU.
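
One way to keep them consistent would be to have the stub mirror the
runtime check (sketch, assuming -ENXIO is what callers should see in
both cases):

static inline int snp_lookup_rmpentry(u64 pfn, bool *assigned, int *level)
{
	return -ENXIO;
}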

> +#endif
> diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
> index d34c46db7dd1..446fc7a9f7b0 100644
> --- a/arch/x86/include/asm/sev.h
> +++ b/arch/x86/include/asm/sev.h
> @@ -81,9 +81,6 @@ extern bool handle_vc_boot_ghcb(struct pt_regs *regs);
>  /* Software defined (when rFlags.CF = 1) */
>  #define PVALIDATE_FAIL_NOUPDATE		255
>  
> -/* RMP page size */
> -#define RMP_PG_SIZE_4K			0
> -
>  #define RMPADJUST_VMSA_PAGE_BIT		BIT(16)
>  
>  /* SNP Guest message request */


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH RFC v9 10/51] x86/fault: Add helper for dumping RMP entries
  2023-06-12  4:25 ` [PATCH RFC v9 10/51] x86/fault: Add helper for dumping RMP entries Michael Roth
@ 2023-06-12 16:12   ` Dave Hansen
  0 siblings, 0 replies; 102+ messages in thread
From: Dave Hansen @ 2023-06-12 16:12 UTC (permalink / raw)
  To: Michael Roth, kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	liam.merwick, zhi.a.wang, Brijesh Singh

On 6/11/23 21:25, Michael Roth wrote:
> +	/*
> +	 * If the RMP entry at the faulting pfn was not assigned, then it is
> +	 * not clear what caused the RMP violation. To get some useful debug
> +	 * information, iterate through the entire 2MB region, and dump the
> +	 * RMP entries if any of the bits in an RMP entry are set.
> +	 */
> +	pfn = pfn & ~(PTRS_PER_PMD - 1);
> +	pfn_end = pfn + PTRS_PER_PMD;
> +
> +	while (pfn < pfn_end) {
> +		ret = __snp_lookup_rmpentry(pfn, &e, &level);
> +		if (ret) {
> +			pr_info("Failed to read RMP entry for PFN 0x%llx\n", pfn);
> +			pfn++;
> +			continue;
> +		}
> +
> +		if (e.low || e.high)
> +			pr_info("RMPEntry paddr 0x%llx: [high=0x%016llx low=0x%016llx]\n",
> +				pfn << PAGE_SHIFT, e.high, e.low);
> +		pfn++;
> +	}
> +}

Dumping 511 lines of (possible) junk into the dmesg buffer seems a _bit_
rude here.  I can see dumping out the 2M RMP entry, but not the other 510.

This also destroys the information about which pfn was being targeted
for the dump in the first place.  That seems unfortunate.

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH RFC v9 11/51] x86/traps: Define RMP violation #PF error code
  2023-06-12  4:25 ` [PATCH RFC v9 11/51] x86/traps: Define RMP violation #PF error code Michael Roth
@ 2023-06-12 16:26   ` Dave Hansen
  0 siblings, 0 replies; 102+ messages in thread
From: Dave Hansen @ 2023-06-12 16:26 UTC (permalink / raw)
  To: Michael Roth, kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	liam.merwick, zhi.a.wang, Brijesh Singh

On 6/11/23 21:25, Michael Roth wrote:
> From: Brijesh Singh <brijesh.singh@amd.com>
> 
> Bit 31 in the page fault error code will be set when the processor
> encounters an RMP violation.
> 
> While at it, use the BIT_ULL() macro.
...
>  enum x86_pf_error_code {
> -	X86_PF_PROT	=		1 << 0,
> -	X86_PF_WRITE	=		1 << 1,
> -	X86_PF_USER	=		1 << 2,
> -	X86_PF_RSVD	=		1 << 3,
> -	X86_PF_INSTR	=		1 << 4,
> -	X86_PF_PK	=		1 << 5,
> -	X86_PF_SGX	=		1 << 15,
> +	X86_PF_PROT	=		BIT(0),
> +	X86_PF_WRITE	=		BIT(1),
> +	X86_PF_USER	=		BIT(2),
> +	X86_PF_RSVD	=		BIT(3),
> +	X86_PF_INSTR	=		BIT(4),
> +	X86_PF_PK	=		BIT(5),
> +	X86_PF_SGX	=		BIT(15),
> +	X86_PF_RMP	=		BIT(31),
>  };

It would be nice if the changelog "BIT_ULL()" matched the code "BIT()".  :)

With that fixed,

Acked-by: Dave Hansen <dave.hansen@linux.intel.com>


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH RFC v9 12/51] x86/fault: Report RMP page faults for kernel addresses
  2023-06-12  4:25 ` [PATCH RFC v9 12/51] x86/fault: Report RMP page faults for kernel addresses Michael Roth
@ 2023-06-12 16:30   ` Dave Hansen
  0 siblings, 0 replies; 102+ messages in thread
From: Dave Hansen @ 2023-06-12 16:30 UTC (permalink / raw)
  To: Michael Roth, kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	liam.merwick, zhi.a.wang

On 6/11/23 21:25, Michael Roth wrote:
>  	dump_pagetable(address);
> +
> +	if (error_code & X86_PF_RMP) {
> +		unsigned int level;
> +		pgd_t *pgd;
> +		pte_t *pte;
> +
> +		pgd = __va(read_cr3_pa());
> +		pgd += pgd_index(address);
> +		pte = lookup_address_in_pgd(pgd, address, &level);
> +
> +		sev_dump_rmpentry(pte_pfn(*pte));
> +	}
>  }

It would be nice to trim this hunk down.  Can you make it:

	if (error_code & X86_PF_RMP)
		sev_dump_rmpentry(address);

and hide the rest of the logic in the helper?

Oh, and lookup_address_in_pgd() can return NULL.  It's not great to page
fault in the page fault handler.  Could you fix that up too, please?
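
Something like the below, perhaps (sketch only):

void sev_dump_rmpentry(unsigned long address)
{
	unsigned int level;
	pgd_t *pgd;
	pte_t *pte;

	pgd = __va(read_cr3_pa());
	pgd += pgd_index(address);
	pte = lookup_address_in_pgd(pgd, address, &level);
	if (!pte)
		return;

	/* ... dump the RMP entry for pte_pfn(*pte) as before ... */
}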

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH RFC v9 13/51] x86/fault: Handle RMP page faults for user addresses
  2023-06-12  4:25 ` [PATCH RFC v9 13/51] x86/fault: Handle RMP page faults for user addresses Michael Roth
@ 2023-06-12 16:40   ` Dave Hansen
  0 siblings, 0 replies; 102+ messages in thread
From: Dave Hansen @ 2023-06-12 16:40 UTC (permalink / raw)
  To: Michael Roth, kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	liam.merwick, zhi.a.wang, Brijesh Singh, Jarkko Sakkinen

On 6/11/23 21:25, Michael Roth wrote:
> From: Brijesh Singh <brijesh.singh@amd.com>
> 
> When SEV-SNP is enabled globally, a write from the host is subject to
> checks performed by the hardware against the RMP table (APM2 15.36.10)
> at the end of a page walk:
> 
> >   1. Assigned bit in the RMP table is not set (i.e. the page is shared).
> >   2. Immutable bit in the RMP table is not set.
> >   3. If the page table entry that gives the sPA indicates that the
> >      target page size is a large page, then all RMP entries for the
> >      constituent 4KB pages of the target must have the assigned bit 0.
> 
> Nothing constructive can come of an attempt by userspace to violate case
> 1) (which will result in writing garbage due to page encryption) or case
> 2) (userspace should not ever need or be allowed to write to a page that
> the host has specifically needed to mark immutable).

What does this _mean_?  If nothing constructive can come of it, what
does that mean for the kernel?

> Case 3) is dependent on the hypervisor. In case of KVM, due to how
> shared/private pages are partitioned into separate memory pools via
> restricted/guarded memory, there should never be a case where a page in
> the private pool overlaps with a shared page: either it is a
> hugepage-sized allocation and all the sub-pages are private, or it is a
> single-page allocation, in which case it cannot overlap with anything
> but itself.
> 
> Therefore, for all 3 cases, it is appropriate to simply kill the
> userspace process if it ever generates an RMP #PF. Implement that logic
> here.
...
> +	if (error_code & X86_PF_RMP) {
> +		pr_err("Unexpected RMP page fault for address 0x%lx, terminating process\n",
> +		       address);
> +		do_sigbus(regs, error_code, address, VM_FAULT_SIGBUS);
> +		return;
> +	}
> +

This is special-snowflake code.  You're making the argument that an RMP
fault is a special snowflake and needs special handling.

Why should an RMP violation be any different than, say a write to a
read-only page (that also ends in signal delivery)?

I kinda dislike the entire changelog here.  I really don't know what
point it's making or what it is arguing.

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH RFC v9 14/51] x86/sev: Add helper functions for RMPUPDATE and PSMASH instruction
  2023-06-12  4:25 ` [PATCH RFC v9 14/51] x86/sev: Add helper functions for RMPUPDATE and PSMASH instruction Michael Roth
@ 2023-06-12 17:00   ` Dave Hansen
  0 siblings, 0 replies; 102+ messages in thread
From: Dave Hansen @ 2023-06-12 17:00 UTC (permalink / raw)
  To: Michael Roth, kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	liam.merwick, zhi.a.wang, Brijesh Singh

On 6/11/23 21:25, Michael Roth wrote:
> +/*
> + * Assign a page to guest using the RMPUPDATE instruction.
> + */
> +int rmp_make_private(u64 pfn, u64 gpa, enum pg_level level, int asid, bool immutable)
> +{
> +	struct rmp_state val;
> +
> +	pr_debug("%s: GPA: 0x%llx, PFN: 0x%llx, level: %d, immutable: %d\n",
> +		 __func__, gpa, pfn, level, immutable);

Is this needed *EVERY* time a page is assigned to a guest?  As in, if I
create a 4GB guest, I'll see a literal million of these pr_debug()s in
dmesg?


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH RFC v9 29/51] KVM: SVM: Add KVM_SEV_SNP_LAUNCH_START command
  2023-06-12  4:25 ` [PATCH RFC v9 29/51] KVM: SVM: Add KVM_SEV_SNP_LAUNCH_START command Michael Roth
@ 2023-06-12 17:08   ` Peter Gonda
  0 siblings, 0 replies; 102+ messages in thread
From: Peter Gonda @ 2023-06-12 17:08 UTC (permalink / raw)
  To: Michael Roth
  Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	liam.merwick, zhi.a.wang, Brijesh Singh

> +
> +static int snp_launch_start(struct kvm *kvm, struct kvm_sev_cmd *argp)
> +{
> +       struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> +       struct sev_data_snp_launch_start start = {0};
> +       struct kvm_sev_snp_launch_start params;
> +       int rc;
> +
> +       if (!sev_snp_guest(kvm))
> +               return -ENOTTY;
> +
> +       if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data, sizeof(params)))
> +               return -EFAULT;
> +
> +       sev->snp_context = snp_context_create(kvm, argp);
> +       if (!sev->snp_context)
> +               return -ENOTTY;


I commented on a  previous series but I think the bug is still here. I
think users can repeatedly call KVM_SEV_SNP_LAUNCH_START to have KVM
keep allocating more snp_contexts above.

Should we check if the VM already has a |snp_context| and error out if so?
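
i.e. something like (sketch):

	if (sev->snp_context)
		return -EINVAL;

	sev->snp_context = snp_context_create(kvm, argp);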

>
> +
> +       start.gctx_paddr = __psp_pa(sev->snp_context);
> +       start.policy = params.policy;
> +       memcpy(start.gosvw, params.gosvw, sizeof(params.gosvw));
> +       rc = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_LAUNCH_START, &start, &argp->error);
> +       if (rc)
> +               goto e_free_context;
> +
> +       sev->fd = argp->sev_fd;
> +       rc = snp_bind_asid(kvm, &argp->error);
> +       if (rc)
> +               goto e_free_context;
> +
> +       return 0;
> +
> +e_free_context:
> +       snp_decommission_context(kvm);
> +
> +       return rc;
> +}
> +

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH RFC v9 48/51] crypto: ccp: Add the SNP_{SET,GET}_EXT_CONFIG command
  2023-06-12  4:25 ` [PATCH RFC v9 48/51] crypto: ccp: Add the SNP_{SET,GET}_EXT_CONFIG command Michael Roth
@ 2023-06-13  6:24   ` Alexey Kardashevskiy
  0 siblings, 0 replies; 102+ messages in thread
From: Alexey Kardashevskiy @ 2023-06-13  6:24 UTC (permalink / raw)
  To: Michael Roth, kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	liam.merwick, zhi.a.wang, Brijesh Singh, Dionna Glaze



On 12/6/23 14:25, Michael Roth wrote:
> From: Brijesh Singh <brijesh.singh@amd.com>
> 
> The SEV-SNP firmware provides the SNP_CONFIG command used to set the
> system-wide configuration value for SNP guests. The information includes
> the TCB version string to be reported in guest attestation reports.
> 
> Version 2 of the GHCB specification adds an NAE (SNP extended guest
> request) that a guest can use to query the reports that include additional
> certificates.
> 
> In both cases, userspace-provided additional data is included in the
> attestation reports. The userspace will use the SNP_SET_EXT_CONFIG
> command to give the certificate blob and the reported TCB version string
> at once. Note that the specification defines the certificate blob with a
> specific GUID format; the userspace is responsible for building the
> proper certificate blob. The ioctl treats it as an opaque blob.
> 
> While it is not defined in the spec, let's add an SNP_GET_EXT_CONFIG
> command that can be used to obtain the data programmed through
> SNP_SET_EXT_CONFIG.
> 
> Co-developed-by: Alexey Kardashevskiy <aik@amd.com>
> Signed-off-by: Alexey Kardashevskiy <aik@amd.com>
> Co-developed-by: Dionna Glaze <dionnaglaze@google.com>
> Signed-off-by: Dionna Glaze <dionnaglaze@google.com>
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> [mdr: squash in doc patch from Dionna]
> Signed-off-by: Michael Roth <michael.roth@amd.com>
> ---
>   Documentation/virt/coco/sev-guest.rst |  27 ++++
>   drivers/crypto/ccp/sev-dev.c          | 178 ++++++++++++++++++++++++++
>   drivers/crypto/ccp/sev-dev.h          |   2 +
>   include/linux/psp-sev.h               |  10 ++
>   include/uapi/linux/psp-sev.h          |  17 +++
>   5 files changed, 234 insertions(+)
> 
> diff --git a/Documentation/virt/coco/sev-guest.rst b/Documentation/virt/coco/sev-guest.rst
> index 11ea67c944df..6cad4226c348 100644
> --- a/Documentation/virt/coco/sev-guest.rst
> +++ b/Documentation/virt/coco/sev-guest.rst
> @@ -145,6 +145,33 @@ The SNP_PLATFORM_STATUS command is used to query the SNP platform status. The
>   status includes API major, minor version and more. See the SEV-SNP
>   specification for further details.
>   
> +2.5 SNP_SET_EXT_CONFIG
> +----------------------
> +:Technology: sev-snp
> +:Type: hypervisor ioctl cmd
> +:Parameters (in): struct sev_data_snp_ext_config
> +:Returns (out): 0 on success, -negative on error
> +
> +The SNP_SET_EXT_CONFIG is used to set the system-wide configuration such as
> +reported TCB version in the attestation report. The command is similar to
> +SNP_CONFIG command defined in the SEV-SNP spec. The main difference is the
> +command also accepts an additional certificate blob defined in the GHCB
> +specification.
> +
> +If the certs_address is zero, then the previous certificate blob will be deleted.
> +For more information on the certificate blob layout, see the GHCB spec
> +(extended guest request message).
> +
> +2.6 SNP_GET_EXT_CONFIG
> +----------------------
> +:Technology: sev-snp
> +:Type: hypervisor ioctl cmd
> +:Parameters (in): struct sev_data_snp_ext_config
> +:Returns (out): 0 on success, -negative on error
> +
> +The SNP_GET_EXT_CONFIG is used to query the system-wide configuration set
> +through the SNP_SET_EXT_CONFIG.
> +
>   3. SEV-SNP CPUID Enforcement
>   ============================
>   
> diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
> index b8e8c4da4025..175c24163ba0 100644
> --- a/drivers/crypto/ccp/sev-dev.c
> +++ b/drivers/crypto/ccp/sev-dev.c
> @@ -1491,6 +1491,10 @@ static int __sev_snp_shutdown_locked(int *error)
>   	data.length = sizeof(data);
>   	data.iommu_snp_shutdown = 1;
>   
> +	/* Free the memory used for caching the certificate data */
> +	sev_snp_certs_put(sev->snp_certs);
> +	sev->snp_certs = NULL;
> +
>   	wbinvd_on_all_cpus();
>   
>   retry:
> @@ -1829,6 +1833,126 @@ static int sev_ioctl_snp_platform_status(struct sev_issue_cmd *argp)
>   	return ret;
>   }
>   
> +static int sev_ioctl_snp_get_config(struct sev_issue_cmd *argp)
> +{
> +	struct sev_device *sev = psp_master->sev_data;
> +	struct sev_user_data_ext_snp_config input;

input = {0} would do as well as the memset() below but shorter.

> +	struct sev_snp_certs *snp_certs;
> +	int ret;
> +
> +	if (!sev->snp_initialized || !argp->data)
> +		return -EINVAL;
> +
> +	memset(&input, 0, sizeof(input));


but this memset() seems useless anyway because of copy_from_user() below.

> +
> +	if (copy_from_user(&input, (void __user *)argp->data, sizeof(input)))
> +		return -EFAULT;
> +
> +	/* Copy the TCB version programmed through the SET_CONFIG to userspace */
> +	if (input.config_address) {
> +		if (copy_to_user((void * __user)input.config_address,
> +				 &sev->snp_config, sizeof(struct sev_user_data_snp_config)))
> +			return -EFAULT;
> +	}
> +
> +	snp_certs = sev_snp_certs_get(sev->snp_certs);
> +
> +	/* Copy the extended certs programmed through the SNP_SET_CONFIG */
> +	if (input.certs_address && snp_certs) {
> +		if (input.certs_len < snp_certs->len) {
> +			/* Return the certs length to userspace */
> +			input.certs_len = snp_certs->len;
> +
> +			ret = -EIO;
> +			goto e_done;
> +		}
> +
> +		if (copy_to_user((void * __user)input.certs_address,
> +				 snp_certs->data, snp_certs->len)) {
> +			ret = -EFAULT;
> +			goto put_exit;
> +		}
> +	}
> +
> +	ret = 0;
> +
> +e_done:
> +	if (copy_to_user((void __user *)argp->data, &input, sizeof(input)))
> +		ret = -EFAULT;
> +
> +put_exit:
> +	sev_snp_certs_put(snp_certs);
> +
> +	return ret;
> +}
> +
> +static int sev_ioctl_snp_set_config(struct sev_issue_cmd *argp, bool writable)
> +{
> +	struct sev_device *sev = psp_master->sev_data;
> +	struct sev_user_data_ext_snp_config input;
> +	struct sev_user_data_snp_config config;
> +	struct sev_snp_certs *snp_certs = NULL;
> +	void *certs = NULL;
> +	int ret = 0;

This '0' is never used below - it is always overwritten on the "goto
e_free" paths, and the good exit is "return 0", not "return ret". I'd
suggest either not initializing it and letting gcc barf when some future
change uses it, or initializing it to something like -EFAULT.


> +
> +	if (!sev->snp_initialized || !argp->data)
> +		return -EINVAL;
> +
> +	if (!writable)
> +		return -EPERM;
> +
> +	memset(&input, 0, sizeof(input));

same here.

> +
> +	if (copy_from_user(&input, (void __user *)argp->data, sizeof(input)))
> +		return -EFAULT;
> +
> +	/* Copy the certs from userspace */
> +	if (input.certs_address) {
> +		if (!input.certs_len || !IS_ALIGNED(input.certs_len, PAGE_SIZE))
> +			return -EINVAL;
> +
> +		certs = psp_copy_user_blob(input.certs_address, input.certs_len);
> +		if (IS_ERR(certs))
> +			return PTR_ERR(certs);
> +	}
> +
> +	/* Issue the PSP command to update the TCB version using the SNP_CONFIG. */
> +	if (input.config_address) {
> +		memset(&config, 0, sizeof(config));


and here.

> +		if (copy_from_user(&config,
> +				   (void __user *)input.config_address, sizeof(config))) {
> +			ret = -EFAULT;
> +			goto e_free;
> +		}
> +
> +		ret = __sev_do_cmd_locked(SEV_CMD_SNP_CONFIG, &config, &argp->error);
> +		if (ret)
> +			goto e_free;
> +
> +		memcpy(&sev->snp_config, &config, sizeof(config));
> +	}
> +
> +	/*
> +	 * If new certs are passed then cache them, else free the old certs.
> +	 */
> +	if (input.certs_len) {
> +		snp_certs = sev_snp_certs_new(certs, input.certs_len);
> +		if (!snp_certs) {
> +			ret = -ENOMEM;
> +			goto e_free;
> +		}
> +	}
> +
> +	sev_snp_certs_put(sev->snp_certs);
> +	sev->snp_certs = snp_certs;
> +
> +	return 0;
> +
> +e_free:
> +	kfree(certs);
> +	return ret;
> +}
> +
>   static long sev_ioctl(struct file *file, unsigned int ioctl, unsigned long arg)
>   {
>   	void __user *argp = (void __user *)arg;
> @@ -1883,6 +2007,12 @@ static long sev_ioctl(struct file *file, unsigned int ioctl, unsigned long arg)
>   	case SNP_PLATFORM_STATUS:
>   		ret = sev_ioctl_snp_platform_status(&input);
>   		break;
> +	case SNP_SET_EXT_CONFIG:
> +		ret = sev_ioctl_snp_set_config(&input, writable);
> +		break;
> +	case SNP_GET_EXT_CONFIG:
> +		ret = sev_ioctl_snp_get_config(&input);
> +		break;
>   	default:
>   		ret = -EINVAL;
>   		goto out;
> @@ -1931,6 +2061,54 @@ int sev_guest_df_flush(int *error)
>   }
>   EXPORT_SYMBOL_GPL(sev_guest_df_flush);
>   
> +static void sev_snp_certs_release(struct kref *kref)
> +{
> +	struct sev_snp_certs *certs = container_of(kref, struct sev_snp_certs, kref);
> +
> +	kfree(certs->data);
> +	kfree(certs);
> +}
> +
> +struct sev_snp_certs *sev_snp_certs_new(void *data, u32 len)
> +{
> +	struct sev_snp_certs *certs;
> +
> +	if (!len || !data)
> +		return NULL;
> +
> +	certs = kzalloc(sizeof(*certs), GFP_KERNEL);
> +	if (!certs)
> +		return NULL;
> +
> +	certs->data = data;
> +	certs->len = len;
> +	kref_init(&certs->kref);
> +
> +	return certs;
> +}
> +EXPORT_SYMBOL_GPL(sev_snp_certs_new);
> +
> +struct sev_snp_certs *sev_snp_certs_get(struct sev_snp_certs *certs)
> +{
> +	if (!certs)
> +		return NULL;
> +
> +	if (!kref_get_unless_zero(&certs->kref))
> +		return NULL;
> +
> +	return certs;
> +}
> +EXPORT_SYMBOL_GPL(sev_snp_certs_get);
> +
> +void sev_snp_certs_put(struct sev_snp_certs *certs)
> +{
> +	if (!certs)
> +		return;
> +
> +	kref_put(&certs->kref, sev_snp_certs_release);
> +}
> +EXPORT_SYMBOL_GPL(sev_snp_certs_put);
> +
>   static void sev_exit(struct kref *ref)
>   {
>   	misc_deregister(&misc_dev->misc);
> diff --git a/drivers/crypto/ccp/sev-dev.h b/drivers/crypto/ccp/sev-dev.h
> index 19d79f9d4212..22374f3d3e2e 100644
> --- a/drivers/crypto/ccp/sev-dev.h
> +++ b/drivers/crypto/ccp/sev-dev.h
> @@ -66,6 +66,8 @@ struct sev_device {
>   
>   	bool snp_initialized;
>   	struct snp_host_map snp_host_map[MAX_SNP_HOST_MAP_BUFS];
> +	struct sev_snp_certs *snp_certs;
> +	struct sev_user_data_snp_config snp_config;
>   };
>   
>   int sev_dev_init(struct psp_device *psp);
> diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h
> index 5ae61de96e44..2191d8b5423a 100644
> --- a/include/linux/psp-sev.h
> +++ b/include/linux/psp-sev.h
> @@ -24,6 +24,16 @@
>   
>   #define SEV_FW_BLOB_MAX_SIZE	0x4000	/* 16KB */
>   
> +struct sev_snp_certs {
> +	void *data;
> +	u32 len;
> +	struct kref kref;
> +};
> +
> +struct sev_snp_certs *sev_snp_certs_new(void *data, u32 len);
> +struct sev_snp_certs *sev_snp_certs_get(struct sev_snp_certs *certs);
> +void sev_snp_certs_put(struct sev_snp_certs *certs);
> +
>   /**
>    * SEV platform state
>    */
> diff --git a/include/uapi/linux/psp-sev.h b/include/uapi/linux/psp-sev.h
> index 4dc6a3e7b3d5..d1e6a0615546 100644
> --- a/include/uapi/linux/psp-sev.h
> +++ b/include/uapi/linux/psp-sev.h
> @@ -29,6 +29,8 @@ enum {
>   	SEV_GET_ID,	/* This command is deprecated, use SEV_GET_ID2 */
>   	SEV_GET_ID2,
>   	SNP_PLATFORM_STATUS,
> +	SNP_SET_EXT_CONFIG,
> +	SNP_GET_EXT_CONFIG,
>   
>   	SEV_MAX,
>   };
> @@ -201,6 +203,21 @@ struct sev_user_data_snp_config {
>   	__u8 rsvd1[52];
>   } __packed;
>   
> +/**
> + * struct sev_data_snp_ext_config - system wide configuration value for SNP.
> + *
> + * @config_address: address of the struct sev_user_data_snp_config or 0 when
> + *		reported_tcb does not need to be updated.
> + * @certs_address: address of extended guest request certificate chain or
> + *              0 when previous certificate should be removed on SNP_SET_EXT_CONFIG.
> + * @certs_len: length of the certs
> + */
> +struct sev_user_data_ext_snp_config {
> +	__u64 config_address;		/* In */
> +	__u64 certs_address;		/* In */
> +	__u32 certs_len;		/* In */
> +};

__packed or padding is missing (there are other places like this btw, I
remember seeing quite a few of those). Thanks,


> +
>   /**
>    * struct sev_issue_cmd - SEV ioctl parameters
>    *

-- 
Alexey

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH RFC v9 03/51] KVM: x86: Use full 64-bit error code for kvm_mmu_do_page_fault
  2023-06-12  4:25 ` [PATCH RFC v9 03/51] KVM: x86: Use full 64-bit error code for kvm_mmu_do_page_fault Michael Roth
@ 2023-06-14 14:24   ` Isaku Yamahata
  0 siblings, 0 replies; 102+ messages in thread
From: Isaku Yamahata @ 2023-06-14 14:24 UTC (permalink / raw)
  To: Michael Roth
  Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	liam.merwick, zhi.a.wang, isaku.yamahata

On Sun, Jun 11, 2023 at 11:25:11PM -0500,
Michael Roth <michael.roth@amd.com> wrote:

> The upper bits will be needed in some cases to distinguish between
> nested page faults for private/shared pages, so pass along the full
> 64-bit value.
> 
> Signed-off-by: Michael Roth <michael.roth@amd.com>
> ---
>  arch/x86/kvm/mmu/mmu.c          | 3 +--
>  arch/x86/kvm/mmu/mmu_internal.h | 4 ++--
>  2 files changed, 3 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index c54672ad6cbc..0d3983b9aa7e 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -5829,8 +5829,7 @@ int noinline kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 err
>  	}
>  
>  	if (r == RET_PF_INVALID) {
> -		r = kvm_mmu_do_page_fault(vcpu, cr2_or_gpa,
> -					  lower_32_bits(error_code), false,
> +		r = kvm_mmu_do_page_fault(vcpu, cr2_or_gpa, error_code, false,
>  					  &emulation_type);
>  		if (KVM_BUG_ON(r == RET_PF_INVALID, vcpu->kvm))
>  			return -EIO;
> diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
> index f1786698ae00..780b91e1da9f 100644
> --- a/arch/x86/kvm/mmu/mmu_internal.h
> +++ b/arch/x86/kvm/mmu/mmu_internal.h
> @@ -283,11 +283,11 @@ enum {
>  };
>  
>  static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
> -					u32 err, bool prefetch, int *emulation_type)
> +					u64 err, bool prefetch, int *emulation_type)
>  {
>  	struct kvm_page_fault fault = {
>  		.addr = cr2_or_gpa,
> -		.error_code = err,
> +		.error_code = lower_32_bits(err),
>  		.exec = err & PFERR_FETCH_MASK,
>  		.write = err & PFERR_WRITE_MASK,
>  		.present = err & PFERR_PRESENT_MASK,
> -- 
> 2.25.1

Hi. I'd like to pass around the full 64-bit error code for TDX and came
up with the following patch. Does it work for you?

From 53273b67e9be09129d35ac00cf9ce739d3fb4e2c Mon Sep 17 00:00:00 2001
Message-Id: <53273b67e9be09129d35ac00cf9ce739d3fb4e2c.1686752340.git.isaku.yamahata@intel.com>
From: Isaku Yamahata <isaku.yamahata@intel.com>
Date: Fri, 17 Mar 2023 12:58:42 -0700
Subject: [PATCH] KVM: x86/mmu: Pass round full 64-bit error code for the KVM
 page fault

In some cases the full 64-bit error code for the KVM page fault will be
needed to make this determination, so also update kvm_mmu_do_page_fault()
to accept the full 64-bit value so it can be plumbed through to the
callback.

The upper 32 bits of the error code are discarded at kvm_mmu_page_fault()
by lower_32_bits().  Now it is passed down as the full 64 bits. It turns
out that only FNAME(page_fault) depends on it.  Move lower_32_bits() into
FNAME(page_fault).

The accesses of fault->error_code are as follows
- FNAME(page_fault): change to explicitly use lower_32_bits()
- kvm_tdp_page_fault(): explicit mask with PFERR_LEVEL_MASK
- kvm_mmu_page_fault(): explicit mask with PFERR_RSVD_MASK,
                        PFERR_NESTED_GUEST_PAGE
- mmutrace: changed u32 -> u64
- pgprintk(): change %x -> %llx

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
---
 arch/x86/kvm/mmu.h              | 2 +-
 arch/x86/kvm/mmu/mmu.c          | 7 +++----
 arch/x86/kvm/mmu/mmu_internal.h | 4 ++--
 arch/x86/kvm/mmu/mmutrace.h     | 2 +-
 arch/x86/kvm/mmu/paging_tmpl.h  | 4 ++--
 5 files changed, 9 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 6cc2558e977d..b6c9c2e27d0b 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -176,7 +176,7 @@ static inline void kvm_mmu_refresh_passthrough_bits(struct kvm_vcpu *vcpu,
 }
 
 kvm_pfn_t kvm_mmu_map_tdp_page(struct kvm_vcpu *vcpu, gpa_t gpa,
-			       u32 error_code, int max_level);
+			       u64 error_code, int max_level);
 
 /*
  * Check if a given access (described through the I/D, W/R and U/S bits of a
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 2c6d9d8d2c10..e4457eaa10a9 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4941,7 +4941,7 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 static int nonpaging_page_fault(struct kvm_vcpu *vcpu,
 				struct kvm_page_fault *fault)
 {
-	pgprintk("%s: gva %llx error %x\n", __func__, fault->addr, fault->error_code);
+	pgprintk("%s: gva %llx error %llx\n", __func__, fault->addr, fault->error_code);
 
 	/* This path builds a PAE pagetable, we can map 2mb pages at maximum. */
 	fault->max_level = PG_LEVEL_2M;
@@ -5062,7 +5062,7 @@ int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 }
 
 kvm_pfn_t kvm_mmu_map_tdp_page(struct kvm_vcpu *vcpu, gpa_t gpa,
-			       u32 error_code, int max_level)
+			       u64 error_code, int max_level)
 {
 	int r;
 	struct kvm_page_fault fault = (struct kvm_page_fault) {
@@ -6317,8 +6317,7 @@ int noinline kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 err
 	}
 
 	if (r == RET_PF_INVALID) {
-		r = kvm_mmu_do_page_fault(vcpu, cr2_or_gpa,
-					  lower_32_bits(error_code), false,
+		r = kvm_mmu_do_page_fault(vcpu, cr2_or_gpa, error_code, false,
 					  &emulation_type);
 		if (KVM_BUG_ON(r == RET_PF_INVALID, vcpu->kvm))
 			return -EIO;
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index 2d8a5df56f20..979fd7c26610 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -342,7 +342,7 @@ static inline bool is_nx_huge_page_enabled(struct kvm *kvm)
 struct kvm_page_fault {
 	/* arguments to kvm_mmu_do_page_fault.  */
 	const gpa_t addr;
-	const u32 error_code;
+	const u64 error_code;
 	const bool prefetch;
 
 	/* Derived from error_code.  */
@@ -437,7 +437,7 @@ enum {
 };
 
 static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
-					u32 err, bool prefetch, int *emulation_type)
+					u64 err, bool prefetch, int *emulation_type)
 {
 	struct kvm_page_fault fault = {
 		.addr = cr2_or_gpa,
diff --git a/arch/x86/kvm/mmu/mmutrace.h b/arch/x86/kvm/mmu/mmutrace.h
index 2d7555381955..2e77883c92f6 100644
--- a/arch/x86/kvm/mmu/mmutrace.h
+++ b/arch/x86/kvm/mmu/mmutrace.h
@@ -261,7 +261,7 @@ TRACE_EVENT(
 	TP_STRUCT__entry(
 		__field(int, vcpu_id)
 		__field(gpa_t, cr2_or_gpa)
-		__field(u32, error_code)
+		__field(u64, error_code)
 		__field(u64 *, sptep)
 		__field(u64, old_spte)
 		__field(u64, new_spte)
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index d87f95245ee9..8aeafd9178ac 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -758,7 +758,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 	struct guest_walker walker;
 	int r;
 
-	pgprintk("%s: addr %llx err %x\n", __func__, fault->addr, fault->error_code);
+	pgprintk("%s: addr %llx err %llx\n", __func__, fault->addr, fault->error_code);
 	WARN_ON_ONCE(fault->is_tdp);
 
 	/*
@@ -767,7 +767,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 	 * The bit needs to be cleared before walking guest page tables.
 	 */
 	r = FNAME(walk_addr)(&walker, vcpu, fault->addr,
-			     fault->error_code & ~PFERR_RSVD_MASK);
+			     lower_32_bits(fault->error_code) & ~PFERR_RSVD_MASK);
 
 	/*
 	 * The page is not mapped by the guest.  Let the guest handle it.
-- 
2.25.1




-- 
Isaku Yamahata <isaku.yamahata@gmail.com>

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* Re: [PATCH RFC v9 04/51] KVM: x86: Determine shared/private faults using a configurable mask
  2023-06-12  4:25 ` [PATCH RFC v9 04/51] KVM: x86: Determine shared/private faults using a configurable mask Michael Roth
@ 2023-06-14 16:47   ` Isaku Yamahata
  2023-06-20 20:28     ` Michael Roth
  2023-06-19 16:27   ` Borislav Petkov
  1 sibling, 1 reply; 102+ messages in thread
From: Isaku Yamahata @ 2023-06-14 16:47 UTC (permalink / raw)
  To: Michael Roth
  Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	liam.merwick, zhi.a.wang, isaku.yamahata

On Sun, Jun 11, 2023 at 11:25:12PM -0500,
Michael Roth <michael.roth@amd.com> wrote:

> This will be used to determine whether or not an #NPF should be serviced
> using a normal page vs. a guarded/gmem one.
> 
> Signed-off-by: Michael Roth <michael.roth@amd.com>
> ---
>  arch/x86/include/asm/kvm_host.h |  7 +++++++
>  arch/x86/kvm/mmu/mmu_internal.h | 35 ++++++++++++++++++++++++++++++++-
>  2 files changed, 41 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index b3bd24f2a390..c26f76641121 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1445,6 +1445,13 @@ struct kvm_arch {
>  	 */
>  #define SPLIT_DESC_CACHE_MIN_NR_OBJECTS (SPTE_ENT_PER_PAGE + 1)
>  	struct kvm_mmu_memory_cache split_desc_cache;
> +
> +	/*
> +	 * When set, used to determine whether a fault should be treated as
> +	 * private in the context of protected VMs which use a separate gmem
> +	 * pool to back private guest pages.
> +	 */
> +	u64 mmu_private_fault_mask;
>  };
>  
>  struct kvm_vm_stat {
> diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
> index 780b91e1da9f..9b9e75aa43f4 100644
> --- a/arch/x86/kvm/mmu/mmu_internal.h
> +++ b/arch/x86/kvm/mmu/mmu_internal.h
> @@ -252,6 +252,39 @@ struct kvm_page_fault {
>  
>  int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault);
>  
> +static bool kvm_mmu_fault_is_private(struct kvm *kvm, gpa_t gpa, u64 err)
> +{
> +	struct kvm_memory_slot *slot;
> +	bool private_fault = false;
> +	gfn_t gfn = gpa_to_gfn(gpa);
> +
> +	slot = gfn_to_memslot(kvm, gfn);
> +	if (!slot) {
> +		pr_debug("%s: no slot, GFN: 0x%llx\n", __func__, gfn);
> +		goto out;
> +	}
> +
> +	if (!kvm_slot_can_be_private(slot)) {
> +		pr_debug("%s: slot is not private, GFN: 0x%llx\n", __func__, gfn);
> +		goto out;
> +	}
> +
> +	if (kvm->arch.mmu_private_fault_mask) {
> +		private_fault = !!(err & kvm->arch.mmu_private_fault_mask);
> +		goto out;
> +	}

What's the convention of err? Can we abstract it by introducing a new bit
PFERR_PRIVATE_MASK? The caller sets it based on an arch-specific value.
The logic will be
        .is_private = err & PFERR_PRIVATE_MASK;
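
Spelled out a bit more (sketch; the bit position and the helper name
below are purely illustrative):

#define PFERR_PRIVATE_MASK	BIT_ULL(48)	/* software-defined, not a HW bit */

	/* the arch-specific caller translates its hardware encoding: */
	if (arch_npf_error_code_is_private(hw_error_code))
		err |= PFERR_PRIVATE_MASK;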


> +
> +	/*
> +	 * Handling below is for UPM self-tests and guests that treat userspace
> +	 * as the authority on whether a fault should be private or not.
> +	 */
> +	private_fault = kvm_mem_is_private(kvm, gpa >> PAGE_SHIFT);

This code path is sad. One extra slot lookup and xarray lookup.
Without the mmu lock, the result can be changed by another vcpu.
Let's find a better way.

> +
> +out:
> +	pr_debug("%s: GFN: 0x%llx, private: %d\n", __func__, gfn, private_fault);
> +	return private_fault;
> +}
> +
>  /*
>   * Return values of handle_mmio_page_fault(), mmu.page_fault(), fast_page_fault(),
>   * and of course kvm_mmu_do_page_fault().
> @@ -301,7 +334,7 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
>  		.max_level = KVM_MAX_HUGEPAGE_LEVEL,
>  		.req_level = PG_LEVEL_4K,
>  		.goal_level = PG_LEVEL_4K,
> -		.is_private = kvm_mem_is_private(vcpu->kvm, cr2_or_gpa >> PAGE_SHIFT),
> +		.is_private = kvm_mmu_fault_is_private(vcpu->kvm, cr2_or_gpa, err),
>  	};
>  	int r;
>  
> -- 
> 2.25.1
> 

-- 
Isaku Yamahata <isaku.yamahata@gmail.com>

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH RFC v9 02/51] KVM: x86: Add gmem hook for invalidating private memory
  2023-06-12 10:49   ` Borislav Petkov
@ 2023-06-19 13:39     ` Borislav Petkov
  0 siblings, 0 replies; 102+ messages in thread
From: Borislav Petkov @ 2023-06-19 13:39 UTC (permalink / raw)
  To: Michael Roth
  Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, vbabka, kirill,
	ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy, alpergun,
	dgilbert, jarkko, ashish.kalra, nikunj.dadhania, liam.merwick,
	zhi.a.wang

On Mon, Jun 12, 2023 at 12:49:05PM +0200, Borislav Petkov wrote:
> ld: arch/x86/kvm/../../../virt/kvm/eventfd.o: in function `kvm_arch_gmem_invalidate':
> /home/boris/kernel/2nd/linux/./include/linux/kvm_host.h:2356: multiple definition of `kvm_arch_gmem_invalidate'; arch/x86/kvm/../../../virt/kvm/kvm_main.o:/home/boris/kernel/2nd/linux/./include/linux/kvm_host.h:2356: first defined here
> ld: arch/x86/kvm/../../../virt/kvm/binary_stats.o: in function `kvm_arch_gmem_invalidate':

Fix is trivial:

---
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 7de06add2235..67fdfb683cb9 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -2353,7 +2353,7 @@ static inline int kvm_gmem_get_pfn(struct kvm *kvm,
 	return -EIO;
 }
 
-void kvm_arch_gmem_invalidate(struct kvm *kvm, kvm_pfn_t start, kvm_pfn_t end) { }
+static inline void kvm_arch_gmem_invalidate(struct kvm *kvm, kvm_pfn_t start, kvm_pfn_t end) { }
 #endif /* CONFIG_KVM_PRIVATE_MEM */
 
 #endif

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* Re: [PATCH RFC v9 04/51] KVM: x86: Determine shared/private faults using a configurable mask
  2023-06-12  4:25 ` [PATCH RFC v9 04/51] KVM: x86: Determine shared/private faults using a configurable mask Michael Roth
  2023-06-14 16:47   ` Isaku Yamahata
@ 2023-06-19 16:27   ` Borislav Petkov
  2023-06-20 20:36     ` Michael Roth
  1 sibling, 1 reply; 102+ messages in thread
From: Borislav Petkov @ 2023-06-19 16:27 UTC (permalink / raw)
  To: Michael Roth
  Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, vbabka, kirill,
	ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy, alpergun,
	dgilbert, jarkko, ashish.kalra, nikunj.dadhania, liam.merwick,
	zhi.a.wang

On Sun, Jun 11, 2023 at 11:25:12PM -0500, Michael Roth wrote:
> This will be used to determine whether or not an #NPF should be serviced
> using a normal page vs. a guarded/gmem one.
> 
> Signed-off-by: Michael Roth <michael.roth@amd.com>
> ---
>  arch/x86/include/asm/kvm_host.h |  7 +++++++
>  arch/x86/kvm/mmu/mmu_internal.h | 35 ++++++++++++++++++++++++++++++++-
>  2 files changed, 41 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index b3bd24f2a390..c26f76641121 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1445,6 +1445,13 @@ struct kvm_arch {
>  	 */
>  #define SPLIT_DESC_CACHE_MIN_NR_OBJECTS (SPTE_ENT_PER_PAGE + 1)
>  	struct kvm_mmu_memory_cache split_desc_cache;
> +
> +	/*
> +	 * When set, used to determine whether a fault should be treated as
	   ^^^^^^^^

And when not set? Invalid?

I guess so, judging by the code below.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH RFC v9 05/51] x86/coco: move CONFIG_HAS_CC_PLATFORM check down into coco/Makefile
  2023-06-12  4:25 ` [PATCH RFC v9 05/51] x86/coco: move CONFIG_HAS_CC_PLATFORM check down into coco/Makefile Michael Roth
  2023-06-12  7:07   ` Kirill A . Shutemov
@ 2023-06-20 12:09   ` Borislav Petkov
  2023-06-20 20:43     ` Michael Roth
  2023-07-10  3:05   ` Sathyanarayanan Kuppuswamy
  2 siblings, 1 reply; 102+ messages in thread
From: Borislav Petkov @ 2023-06-20 12:09 UTC (permalink / raw)
  To: Michael Roth
  Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, vbabka, kirill,
	ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy, alpergun,
	dgilbert, jarkko, ashish.kalra, nikunj.dadhania, liam.merwick,
	zhi.a.wang, Kirill A . Shutemov

On Sun, Jun 11, 2023 at 11:25:13PM -0500, Michael Roth wrote:
> Currently CONFIG_HAS_CC_PLATFORM is a prereq for building anything in
					^^^^^^

Use proper english words pls.

> arch/x86/coco, but that is generally only applicable for guest support.
> 
> For SEV-SNP, helpers related purely to host support will also live in
> arch/x86/coco. To allow for CoCo-related host support code in
> arch/x86/coco, move that check down into the Makefile and check for it
> specifically when needed.

I have no clue what that means. Example?

The last time we talked about paths, we ended up agreeing on:

https://lore.kernel.org/all/Yg5nh1RknPRwIrb8@zn.tnic/

So your "helpers related purely to host support" should go to

arch/x86/virt/svm/sev*.c

And just to keep it simple, that should be

arch/x86/virt/svm/sev.c

and if there's real need to split that, we can do that later.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH RFC v9 04/51] KVM: x86: Determine shared/private faults using a configurable mask
  2023-06-14 16:47   ` Isaku Yamahata
@ 2023-06-20 20:28     ` Michael Roth
  2023-06-20 21:18       ` Isaku Yamahata
  0 siblings, 1 reply; 102+ messages in thread
From: Michael Roth @ 2023-06-20 20:28 UTC (permalink / raw)
  To: Isaku Yamahata
  Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	liam.merwick, zhi.a.wang

On Wed, Jun 14, 2023 at 09:47:09AM -0700, Isaku Yamahata wrote:
> On Sun, Jun 11, 2023 at 11:25:12PM -0500,
> Michael Roth <michael.roth@amd.com> wrote:
> 
> > This will be used to determine whether or not an #NPF should be serviced
> > using a normal page vs. a guarded/gmem one.
> > 
> > Signed-off-by: Michael Roth <michael.roth@amd.com>
> > ---
> >  arch/x86/include/asm/kvm_host.h |  7 +++++++
> >  arch/x86/kvm/mmu/mmu_internal.h | 35 ++++++++++++++++++++++++++++++++-
> >  2 files changed, 41 insertions(+), 1 deletion(-)
> > 
> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > index b3bd24f2a390..c26f76641121 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -1445,6 +1445,13 @@ struct kvm_arch {
> >  	 */
> >  #define SPLIT_DESC_CACHE_MIN_NR_OBJECTS (SPTE_ENT_PER_PAGE + 1)
> >  	struct kvm_mmu_memory_cache split_desc_cache;
> > +
> > +	/*
> > +	 * When set, used to determine whether a fault should be treated as
> > +	 * private in the context of protected VMs which use a separate gmem
> > +	 * pool to back private guest pages.
> > +	 */
> > +	u64 mmu_private_fault_mask;
> >  };
> >  
> >  struct kvm_vm_stat {
> > diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
> > index 780b91e1da9f..9b9e75aa43f4 100644
> > --- a/arch/x86/kvm/mmu/mmu_internal.h
> > +++ b/arch/x86/kvm/mmu/mmu_internal.h
> > @@ -252,6 +252,39 @@ struct kvm_page_fault {
> >  
> >  int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault);
> >  
> > +static bool kvm_mmu_fault_is_private(struct kvm *kvm, gpa_t gpa, u64 err)
> > +{
> > +	struct kvm_memory_slot *slot;
> > +	bool private_fault = false;
> > +	gfn_t gfn = gpa_to_gfn(gpa);
> > +
> > +	slot = gfn_to_memslot(kvm, gfn);
> > +	if (!slot) {
> > +		pr_debug("%s: no slot, GFN: 0x%llx\n", __func__, gfn);
> > +		goto out;
> > +	}
> > +
> > +	if (!kvm_slot_can_be_private(slot)) {
> > +		pr_debug("%s: slot is not private, GFN: 0x%llx\n", __func__, gfn);
> > +		goto out;
> > +	}
> > +
> > +	if (kvm->arch.mmu_private_fault_mask) {
> > +		private_fault = !!(err & kvm->arch.mmu_private_fault_mask);
> > +		goto out;
> > +	}
> 
> What's the convention of err? Can we abstract it by introducing a new bit
> PFERR_PRIVATE_MASK? The caller sets it based on arch specific value.
> the logic will be
>         .is_private = err & PFERR_PRIVATE_MASK;

I'm not sure I understand the question. 'err' is just the page fault flags,
and arch.mmu_private_fault_mask is something that can be set on a
per-platform basis when running in a mode where shared/private access
is recorded in the page fault flags during a #NPF.

I'm not sure how we'd keep the handling cross-platform by moving to a macro,
since TDX uses a different bit, and we'd want to be able to build a
SNP+TDX kernel that could run on either type of hardware.

Are you suggesting to reverse that and have err be set in a platform-specific
way and then use a common PFERR_PRIVATE_MASK that's software-defined and
consistent across platforms? That could work, but existing handling seems
to use page fault flags as-is, keeping the hardware-set values, rather than
modifying them to pass additional metadata, so it seems like it might
make things more confusing to make an exception to that here. Or are
there other cases where it's done that way?

> 
> 
> > +
> > +	/*
> > +	 * Handling below is for UPM self-tests and guests that treat userspace
> > +	 * as the authority on whether a fault should be private or not.
> > +	 */
> > +	private_fault = kvm_mem_is_private(kvm, gpa >> PAGE_SHIFT);
> 
> This code path is sad. One extra slot lookup and xarray look up.
> Without mmu lock, the result can change by other vcpu.
> Let's find a better way.

The intention was to rely on fault->mmu_seq to determine if a
KVM_SET_MEMORY_ATTRIBUTES update came in after .private_fault was set so
that fault handling could be retried, but that doesn't happen until
kvm_faultin_pfn() which is *after* this is logged. So yes, I think there
is a race here, and the approach you took in your Misc. series of
keeping the kvm_mem_is_private() check inside kvm_faultin_pfn() is more
efficient/correct.

If we can figure out a way to handle checking the fault flags in a way
that works for both TDX/SNP (and KVM self-test use-case) we can
consolidate around that.

-Mike

> 
> > +
> > +out:
> > +	pr_debug("%s: GFN: 0x%llx, private: %d\n", __func__, gfn, private_fault);
> > +	return private_fault;
> > +}
> > +
> >  /*
> >   * Return values of handle_mmio_page_fault(), mmu.page_fault(), fast_page_fault(),
> >   * and of course kvm_mmu_do_page_fault().
> > @@ -301,7 +334,7 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
> >  		.max_level = KVM_MAX_HUGEPAGE_LEVEL,
> >  		.req_level = PG_LEVEL_4K,
> >  		.goal_level = PG_LEVEL_4K,
> > -		.is_private = kvm_mem_is_private(vcpu->kvm, cr2_or_gpa >> PAGE_SHIFT),
> > +		.is_private = kvm_mmu_fault_is_private(vcpu->kvm, cr2_or_gpa, err),
> >  	};
> >  	int r;
> >  
> > -- 
> > 2.25.1
> > 
> 
> -- 
> Isaku Yamahata <isaku.yamahata@gmail.com>

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH RFC v9 04/51] KVM: x86: Determine shared/private faults using a configurable mask
  2023-06-19 16:27   ` Borislav Petkov
@ 2023-06-20 20:36     ` Michael Roth
  0 siblings, 0 replies; 102+ messages in thread
From: Michael Roth @ 2023-06-20 20:36 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, vbabka, kirill,
	ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy, alpergun,
	dgilbert, jarkko, ashish.kalra, nikunj.dadhania, liam.merwick,
	zhi.a.wang

On Mon, Jun 19, 2023 at 06:27:03PM +0200, Borislav Petkov wrote:
> On Sun, Jun 11, 2023 at 11:25:12PM -0500, Michael Roth wrote:
> > This will be used to determine whether or not an #NPF should be serviced
> > using a normal page vs. a guarded/gmem one.
> > 
> > Signed-off-by: Michael Roth <michael.roth@amd.com>
> > ---
> >  arch/x86/include/asm/kvm_host.h |  7 +++++++
> >  arch/x86/kvm/mmu/mmu_internal.h | 35 ++++++++++++++++++++++++++++++++-
> >  2 files changed, 41 insertions(+), 1 deletion(-)
> > 
> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > index b3bd24f2a390..c26f76641121 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -1445,6 +1445,13 @@ struct kvm_arch {
> >  	 */
> >  #define SPLIT_DESC_CACHE_MIN_NR_OBJECTS (SPTE_ENT_PER_PAGE + 1)
> >  	struct kvm_mmu_memory_cache split_desc_cache;
> > +
> > +	/*
> > +	 * When set, used to determine whether a fault should be treated as
> 	   ^^^^^^^^
> 
> And when not set? Invalid?
> 
> I guess so, judging by the code below.

Yes, or more specifically, "When not set, treat the value set by
userspace via KVM_SET_MEMORY_ATTRIBUTES as the authority on whether to
treat a fault as private or not. In this case, KVM_EXIT_MEMORY_FAULT
events won't be generated, so there will never be a mismatch between
what hardware indicates via page fault flags vs. what software has
assigned via KVM_SET_MEMORY_ATTRIBUTES".

-Mike

> 
> -- 
> Regards/Gruss,
>     Boris.
> 
> https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH RFC v9 05/51] x86/coco: move CONFIG_HAS_CC_PLATFORM check down into coco/Makefile
  2023-06-20 12:09   ` Borislav Petkov
@ 2023-06-20 20:43     ` Michael Roth
  2023-06-21  8:54       ` Borislav Petkov
  0 siblings, 1 reply; 102+ messages in thread
From: Michael Roth @ 2023-06-20 20:43 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, vbabka, kirill,
	ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy, alpergun,
	dgilbert, jarkko, ashish.kalra, nikunj.dadhania, liam.merwick,
	zhi.a.wang, Kirill A . Shutemov

On Tue, Jun 20, 2023 at 02:09:20PM +0200, Borislav Petkov wrote:
> On Sun, Jun 11, 2023 at 11:25:13PM -0500, Michael Roth wrote:
> > Currently CONFIG_HAS_CC_PLATFORM is a prereq for building anything in
> 					^^^^^^
> 
> Use proper english words pls.
> 
> > arch/x86/coco, but that is generally only applicable for guest support.
> > 
> > For SEV-SNP, helpers related purely to host support will also live in
> > arch/x86/coco. To allow for CoCo-related host support code in
> > arch/x86/coco, move that check down into the Makefile and check for it
> > specifically when needed.
> 
> I have no clue what that means. Example?

Basically, arch/x86/coco/Makefile is never processed if arch/x86/Kbuild
indicates that CONFIG_HAS_CC_PLATFORM is not set. So if we want to have
stuff in arch/x86/coco/Makefile that builds for !CONFIG_HAS_CC_PLATFORM,
like SNP host support, which does not rely on CONFIG_HAS_CC_PLATFORM
being set, that check needs to be moved down into arch/x86/coco/Makefile.
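
Concretely, the idea is along these lines (illustrative only -- the sev/
subdirectory and object names are from this series, not mainline):

  # arch/x86/Kbuild: descend into coco/ unconditionally
  obj-y += coco/

  # arch/x86/coco/Makefile: guest code keeps the dependency, while
  # host-side SNP code can build with CONFIG_HAS_CC_PLATFORM unset
  obj-$(CONFIG_HAS_CC_PLATFORM)	+= core.o
  obj-$(CONFIG_KVM_AMD_SEV)	+= sev/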

> 
> The last time we talked about paths, we ended up agreeing on:
> 
> https://lore.kernel.org/all/Yg5nh1RknPRwIrb8@zn.tnic/
> 
> So your "helpers related purely to host support" should go to
> 
> arch/x86/virt/svm/sev*.c
> 
> And just to keep it simple, that should be
> 
> arch/x86/virt/svm/sev.c
> 
> and if there's real need to split that, we can do that later.

Ok, makes sense.

Thanks,

Mike

> 
> Thx.
> 
> -- 
> Regards/Gruss,
>     Boris.
> 
> https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH RFC v9 04/51] KVM: x86: Determine shared/private faults using a configurable mask
  2023-06-20 20:28     ` Michael Roth
@ 2023-06-20 21:18       ` Isaku Yamahata
  2023-06-21 23:00         ` Michael Roth
  0 siblings, 1 reply; 102+ messages in thread
From: Isaku Yamahata @ 2023-06-20 21:18 UTC (permalink / raw)
  To: Michael Roth
  Cc: Isaku Yamahata, kvm, linux-coco, linux-mm, linux-crypto, x86,
	linux-kernel, tglx, mingo, jroedel, thomas.lendacky, hpa, ardb,
	pbonzini, seanjc, vkuznets, jmattson, luto, dave.hansen, slp,
	pgonda, peterz, srinivas.pandruvada, rientjes, dovmurik, tobin,
	bp, vbabka, kirill, ak, tony.luck, marcorr,
	sathyanarayanan.kuppuswamy, alpergun, dgilbert, jarkko,
	ashish.kalra, nikunj.dadhania, liam.merwick, zhi.a.wang

On Tue, Jun 20, 2023 at 03:28:41PM -0500,
Michael Roth <michael.roth@amd.com> wrote:

> On Wed, Jun 14, 2023 at 09:47:09AM -0700, Isaku Yamahata wrote:
> > On Sun, Jun 11, 2023 at 11:25:12PM -0500,
> > Michael Roth <michael.roth@amd.com> wrote:
> > 
> > > This will be used to determine whether or not an #NPF should be serviced
> > > using a normal page vs. a guarded/gmem one.
> > > 
> > > Signed-off-by: Michael Roth <michael.roth@amd.com>
> > > ---
> > >  arch/x86/include/asm/kvm_host.h |  7 +++++++
> > >  arch/x86/kvm/mmu/mmu_internal.h | 35 ++++++++++++++++++++++++++++++++-
> > >  2 files changed, 41 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > > index b3bd24f2a390..c26f76641121 100644
> > > --- a/arch/x86/include/asm/kvm_host.h
> > > +++ b/arch/x86/include/asm/kvm_host.h
> > > @@ -1445,6 +1445,13 @@ struct kvm_arch {
> > >  	 */
> > >  #define SPLIT_DESC_CACHE_MIN_NR_OBJECTS (SPTE_ENT_PER_PAGE + 1)
> > >  	struct kvm_mmu_memory_cache split_desc_cache;
> > > +
> > > +	/*
> > > +	 * When set, used to determine whether a fault should be treated as
> > > +	 * private in the context of protected VMs which use a separate gmem
> > > +	 * pool to back private guest pages.
> > > +	 */
> > > +	u64 mmu_private_fault_mask;
> > >  };
> > >  
> > >  struct kvm_vm_stat {
> > > diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
> > > index 780b91e1da9f..9b9e75aa43f4 100644
> > > --- a/arch/x86/kvm/mmu/mmu_internal.h
> > > +++ b/arch/x86/kvm/mmu/mmu_internal.h
> > > @@ -252,6 +252,39 @@ struct kvm_page_fault {
> > >  
> > >  int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault);
> > >  
> > > +static bool kvm_mmu_fault_is_private(struct kvm *kvm, gpa_t gpa, u64 err)
> > > +{
> > > +	struct kvm_memory_slot *slot;
> > > +	bool private_fault = false;
> > > +	gfn_t gfn = gpa_to_gfn(gpa);
> > > +
> > > +	slot = gfn_to_memslot(kvm, gfn);
> > > +	if (!slot) {
> > > +		pr_debug("%s: no slot, GFN: 0x%llx\n", __func__, gfn);
> > > +		goto out;
> > > +	}
> > > +
> > > +	if (!kvm_slot_can_be_private(slot)) {
> > > +		pr_debug("%s: slot is not private, GFN: 0x%llx\n", __func__, gfn);
> > > +		goto out;
> > > +	}
> > > +
> > > +	if (kvm->arch.mmu_private_fault_mask) {
> > > +		private_fault = !!(err & kvm->arch.mmu_private_fault_mask);
> > > +		goto out;
> > > +	}
> > 
> > What's the convention of err? Can we abstract it by introducing a new bit
> > PFERR_PRIVATE_MASK? The caller sets it based on arch specific value.
> > the logic will be
> >         .is_private = err & PFERR_PRIVATE_MASK;
> 
> I'm not sure I understand the question. 'err' is just the page fault flags,
> and arch.mmu_private_fault_mask is something that can be set on a
> per-platform basis when running in a mode where shared/private access
> is recorded in the page fault flags during a #NPF.
> 
> I'm not sure how we'd keep the handling cross-platform by moving to a macro,
> since TDX uses a different bit, and we'd want to be able to build a
> SNP+TDX kernel that could run on either type of hardware.
> 
> Are you suggesting to reverse that and have err be set in a platform-specific
> way and then use a common PFERR_PRIVATE_MASK that's software-defined and
> consistent across platforms? That could work, but existing handling seems
> to use page fault flags as-is, keeping the hardware-set values, rather than
> modifying them to pass additional metadata, so it seems like it might
> make things more confusing to make an exception to that here. Or are
> there other cases where it's done that way?

I meant the latter, making PFERR_PRIVATE_MASK a common software-defined bit.

I think the SVM fault handler can use the hardware value directly by carefully
defining those PFERR values.

TDX doesn't have an architectural bit in the error code to indicate a private
fault. It's encoded in the faulted address as the shared bit (GPA bit 51 or 47).
PFERR_{USER, WRITE, FETCH, PRESENT} are already software-defined values for VMX
(and TDX).  The fault handler for VMX, handle_ept_violation(), converts the
encoding.  For TDX, PFERR_PRIVATE_MASK is just one more software-defined bit.
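
A minimal sketch of the PFERR_PRIVATE_MASK option (the bit position is
hypothetical here, just something no hardware error code or existing
software-defined flag uses):

  /* software-defined; set by the per-vendor fault handler, no hw sets it */
  #define PFERR_PRIVATE_BIT	49
  #define PFERR_PRIVATE_MASK	BIT_ULL(PFERR_PRIVATE_BIT)

  /* common MMU code then only needs: */
  .is_private = !!(error_code & PFERR_PRIVATE_MASK);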

I'm fine with either way, variable or macro. Which do you prefer?

- Define a variable mmu_private_fault_mask (global or per struct kvm).
  The module initialization code, hardware_setup(), sets mmu_private_fault_mask.
- Define the software-defined value, PFERR_PRIVATE_MASK.
  The caller of kvm_mmu_page_fault() parses the hardware value and constructs
  a software-defined error_code.
- any other?
  

> > > +
> > > +	/*
> > > +	 * Handling below is for UPM self-tests and guests that treat userspace
> > > +	 * as the authority on whether a fault should be private or not.
> > > +	 */
> > > +	private_fault = kvm_mem_is_private(kvm, gpa >> PAGE_SHIFT);
> > 
> > This code path is sad. One extra slot lookup and xarray look up.
> > Without mmu lock, the result can change by other vcpu.
> > Let's find a better way.
> 
> The intention was to rely on fault->mmu_seq to determine if a
> KVM_SET_MEMORY_ATTRIBUTES update came in after .private_fault was set so
> that fault handling could be retried, but that doesn't happen until
> kvm_faultin_pfn() which is *after* this is logged. So yes, I think there
> is a race here, and the approach you took in your Misc. series of
> keeping the kvm_mem_is_private() check inside kvm_faultin_pfn() is more
> efficient/correct.
> 
> If we can figure out a way to handle checking the fault flags in a way
> that works for both TDX/SNP (and KVM self-test use-case) we can
> consolidate around that.

I can think of the following ways. I think the second option is better because
we don't need an extra bit in the error code.

- Introduce a software-defined error code
- Add a flag to struct kvm_arch for the self-test use-case VM_TYPE_PROTECTED_VM.
  Set it to true for the VM_TYPE_PROTECTED_VM case.
- any other?

Thanks,
-- 
Isaku Yamahata <isaku.yamahata@gmail.com>

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH RFC v9 05/51] x86/coco: move CONFIG_HAS_CC_PLATFORM check down into coco/Makefile
  2023-06-20 20:43     ` Michael Roth
@ 2023-06-21  8:54       ` Borislav Petkov
  2023-06-29 21:02         ` Michael Roth
  0 siblings, 1 reply; 102+ messages in thread
From: Borislav Petkov @ 2023-06-21  8:54 UTC (permalink / raw)
  To: Michael Roth
  Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, vbabka, kirill,
	ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy, alpergun,
	dgilbert, jarkko, ashish.kalra, nikunj.dadhania, liam.merwick,
	zhi.a.wang, Kirill A . Shutemov

On Tue, Jun 20, 2023 at 03:43:15PM -0500, Michael Roth wrote:
> Basically, arch/x86/coco/Makefile is never processed if arch/x86/Kbuild
> indicates that CONFIG_HAS_CC_PLATFORM is not set. So if we want to have
> stuff in arch/x86/coco/Makefile that builds for !CONFIG_HAS_CC_PLATFORM,
> like SNP host support, which does not rely on CONFIG_HAS_CC_PLATFORM
> being set, that check needs to be moved down into arch/x86/coco/Makefile.

Ok, so if you put SNP host support into arch/x86/virt/svm/sev.c, that
should work too and won't have any relation to CONFIG_HAS_CC_PLATFORM,
right?

The CC_PLATFORM thing is a way to check for confidential computing guest
features by abstracting the capabilities so that you don't have to check
*each* and *every* conf guest type in the conditionals and thus go nuts.
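
I.e., callers just do

	if (cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT))
		...

instead of checking for SEV, TDX, etc. one by one. (cc_platform_has() and
CC_ATTR_GUEST_MEM_ENCRYPT are the existing interface; the attribute is just
an example.)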

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH RFC v9 07/51] x86/sev: Add the host SEV-SNP initialization support
  2023-06-12 15:34   ` Dave Hansen
@ 2023-06-21  9:15     ` Borislav Petkov
  2023-06-21 14:31       ` Dave Hansen
  0 siblings, 1 reply; 102+ messages in thread
From: Borislav Petkov @ 2023-06-21  9:15 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Michael Roth, kvm, linux-coco, linux-mm, linux-crypto, x86,
	linux-kernel, tglx, mingo, jroedel, thomas.lendacky, hpa, ardb,
	pbonzini, seanjc, vkuznets, jmattson, luto, dave.hansen, slp,
	pgonda, peterz, srinivas.pandruvada, rientjes, dovmurik, tobin,
	vbabka, kirill, ak, tony.luck, marcorr,
	sathyanarayanan.kuppuswamy, alpergun, dgilbert, jarkko,
	ashish.kalra, nikunj.dadhania, liam.merwick, zhi.a.wang,
	Brijesh Singh

On Mon, Jun 12, 2023 at 08:34:02AM -0700, Dave Hansen wrote:
> On 6/11/23 21:25, Michael Roth wrote:
> > +	/*
> > +	 * Calculate the amount of memory that must be reserved by the BIOS to
> > +	 * address the whole RAM, including the bookkeeping area. The RMP itself
> > +	 * must also be covered.
> > +	 */
> > +	max_rmp_pfn = max_pfn;
> > +	if (PHYS_PFN(rmp_end) > max_pfn)
> > +		max_rmp_pfn = PHYS_PFN(rmp_end);
> 
> Could you say a little here about how this deals with memory hotplug?

Does SNP hw even support memory hotplug?

I think in order to support that, you'd need some special dance because
of the RMP table etc...

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH RFC v9 07/51] x86/sev: Add the host SEV-SNP initialization support
  2023-06-12  4:25 ` [PATCH RFC v9 07/51] x86/sev: Add the host SEV-SNP initialization support Michael Roth
  2023-06-12 15:34   ` Dave Hansen
@ 2023-06-21  9:42   ` Borislav Petkov
  2023-06-21 14:36     ` Tom Lendacky
  2023-06-21 19:15     ` Kalra, Ashish
  2023-08-09 13:03   ` Jeremi Piotrowski
  2 siblings, 2 replies; 102+ messages in thread
From: Borislav Petkov @ 2023-06-21  9:42 UTC (permalink / raw)
  To: Michael Roth
  Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, vbabka, kirill,
	ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy, alpergun,
	dgilbert, jarkko, ashish.kalra, nikunj.dadhania, liam.merwick,
	zhi.a.wang, Brijesh Singh

On Sun, Jun 11, 2023 at 11:25:15PM -0500, Michael Roth wrote:
> From: Brijesh Singh <brijesh.singh@amd.com>
> 
> The memory integrity guarantees of SEV-SNP are enforced through a new
> structure called the Reverse Map Table (RMP). The RMP is a single data
> structure shared across the system that contains one entry for every 4K
> page of DRAM that may be used by SEV-SNP VMs. APM2 section 15.36 details

Rather say 'APM v2, section "Secure Nested Paging (SEV-SNP)"' because
the numbering is more likely to change than the name in the future. With
the name, people can find it faster.

> a number of steps needed to detect/enable SEV-SNP and RMP table support
> on the host:
> 
>  - Detect SEV-SNP support based on CPUID bit
>  - Initialize the RMP table memory reported by the RMP base/end MSR
>    registers and configure IOMMU to be compatible with RMP access
>    restrictions
>  - Set the MtrrFixDramModEn bit in SYSCFG MSR
>  - Set the SecureNestedPagingEn and VMPLEn bits in the SYSCFG MSR
>  - Configure IOMMU
> 
> RMP table entry format is non-architectural and it can vary by
> processor. It is defined by the PPR. Restrict SNP support to CPU
> models/families which are compatible with the current RMP table entry
> format to guard against any undefined behavior when running on other
> system types. Future models/support will handle this through an
> architectural mechanism to allow for broader compatibility.

I'm guessing this is all for live migration between SNP hosts. If so,
then there will have to be a guest API to handle the differences.

> SNP host code depends on CONFIG_KVM_AMD_SEV config flag, which may be
> enabled even when CONFIG_AMD_MEM_ENCRYPT isn't set, so update the
> SNP-specific IOMMU helpers used here to rely on CONFIG_KVM_AMD_SEV
> instead of CONFIG_AMD_MEM_ENCRYPT.

Does that mean that even on CONFIG_AMD_MEM_ENCRYPT=n kernels, host SNP
can function?

Do we even want that?

I'd expect that a host SNP kernel should have SME enabled too even
though it is not absolutely necessary.

> Co-developed-by: Ashish Kalra <ashish.kalra@amd.com>
> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> Co-developed-by: Tom Lendacky <thomas.lendacky@amd.com>
> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> [mdr: rework commit message to be clearer about what patch does, squash
>       in early_rmptable_check() handling from Tom]
> Signed-off-by: Michael Roth <michael.roth@amd.com>
> ---
>  arch/x86/coco/Makefile                   |   1 +
>  arch/x86/coco/sev/Makefile               |   3 +
>  arch/x86/coco/sev/host.c                 | 212 +++++++++++++++++++++++
>  arch/x86/include/asm/disabled-features.h |   8 +-
>  arch/x86/include/asm/msr-index.h         |  11 +-
>  arch/x86/include/asm/sev.h               |   2 +
>  arch/x86/kernel/cpu/amd.c                |  19 ++
>  drivers/iommu/amd/init.c                 |   2 +-
>  include/linux/amd-iommu.h                |   2 +-
>  9 files changed, 256 insertions(+), 4 deletions(-)
>  create mode 100644 arch/x86/coco/sev/Makefile
>  create mode 100644 arch/x86/coco/sev/host.c

Ignored review comments here:

https://lore.kernel.org/r/Y9ubi0i4Z750gdMm@zn.tnic

Ignoring this one for now too.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH RFC v9 07/51] x86/sev: Add the host SEV-SNP initialization support
  2023-06-21  9:15     ` Borislav Petkov
@ 2023-06-21 14:31       ` Dave Hansen
  2023-06-21 15:59         ` Borislav Petkov
  0 siblings, 1 reply; 102+ messages in thread
From: Dave Hansen @ 2023-06-21 14:31 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Michael Roth, kvm, linux-coco, linux-mm, linux-crypto, x86,
	linux-kernel, tglx, mingo, jroedel, thomas.lendacky, hpa, ardb,
	pbonzini, seanjc, vkuznets, jmattson, luto, dave.hansen, slp,
	pgonda, peterz, srinivas.pandruvada, rientjes, dovmurik, tobin,
	vbabka, kirill, ak, tony.luck, marcorr,
	sathyanarayanan.kuppuswamy, alpergun, dgilbert, jarkko,
	ashish.kalra, nikunj.dadhania, liam.merwick, zhi.a.wang,
	Brijesh Singh

On 6/21/23 02:15, Borislav Petkov wrote:
> On Mon, Jun 12, 2023 at 08:34:02AM -0700, Dave Hansen wrote:
>> On 6/11/23 21:25, Michael Roth wrote:
>>> +	/*
>>> +	 * Calculate the amount of memory that must be reserved by the BIOS to
>>> +	 * address the whole RAM, including the bookkeeping area. The RMP itself
>>> +	 * must also be covered.
>>> +	 */
>>> +	max_rmp_pfn = max_pfn;
>>> +	if (PHYS_PFN(rmp_end) > max_pfn)
>>> +		max_rmp_pfn = PHYS_PFN(rmp_end);
>> Could you say a little here about how this deals with memory hotplug?
> Does SNP hw even support memory hotplug?
> 
> I think in order to support that, you'd need some special dance because
> of the RMP table etc...

Yep, there's the hardware side and then there are fun nuggets like using
mem= and then doing a software-only hot-add later after boot.

Also, if the hardware doesn't support any kind of hotplug, it would be
great to point to the place in the spec where it says that.

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH RFC v9 07/51] x86/sev: Add the host SEV-SNP initialization support
  2023-06-21  9:42   ` Borislav Petkov
@ 2023-06-21 14:36     ` Tom Lendacky
  2023-06-21 19:15     ` Kalra, Ashish
  1 sibling, 0 replies; 102+ messages in thread
From: Tom Lendacky @ 2023-06-21 14:36 UTC (permalink / raw)
  To: Borislav Petkov, Michael Roth
  Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, hpa, ardb, pbonzini, seanjc, vkuznets, jmattson,
	luto, dave.hansen, slp, pgonda, peterz, srinivas.pandruvada,
	rientjes, dovmurik, tobin, vbabka, kirill, ak, tony.luck,
	marcorr, sathyanarayanan.kuppuswamy, alpergun, dgilbert, jarkko,
	ashish.kalra, nikunj.dadhania, liam.merwick, zhi.a.wang,
	Brijesh Singh

On 6/21/23 04:42, Borislav Petkov wrote:
> On Sun, Jun 11, 2023 at 11:25:15PM -0500, Michael Roth wrote:
>> From: Brijesh Singh <brijesh.singh@amd.com>
>>
>> The memory integrity guarantees of SEV-SNP are enforced through a new
>> structure called the Reverse Map Table (RMP). The RMP is a single data
>> structure shared across the system that contains one entry for every 4K
>> page of DRAM that may be used by SEV-SNP VMs. APM2 section 15.36 details
> 
> Rather say 'APM v2, section "Secure Nested Paging (SEV-SNP)"' because
> the numbering is more likely to change than the name in the future. With
> the name, people can find it faster.
> 
>> a number of steps needed to detect/enable SEV-SNP and RMP table support
>> on the host:
>>
>>   - Detect SEV-SNP support based on CPUID bit
>>   - Initialize the RMP table memory reported by the RMP base/end MSR
>>     registers and configure IOMMU to be compatible with RMP access
>>     restrictions
>>   - Set the MtrrFixDramModEn bit in SYSCFG MSR
>>   - Set the SecureNestedPagingEn and VMPLEn bits in the SYSCFG MSR
>>   - Configure IOMMU
>>
>> RMP table entry format is non-architectural and it can vary by
>> processor. It is defined by the PPR. Restrict SNP support to CPU
>> models/families which are compatible with the current RMP table entry
>> format to guard against any undefined behavior when running on other
>> system types. Future models/support will handle this through an
>> architectural mechanism to allow for broader compatibility.
> 
> I'm guessing this is all for live migration between SNP hosts. If so,
> then there will have to be a guest API to handle the differences.
> 
>> SNP host code depends on CONFIG_KVM_AMD_SEV config flag, which may be
>> enabled even when CONFIG_AMD_MEM_ENCRYPT isn't set, so update the
>> SNP-specific IOMMU helpers used here to rely on CONFIG_KVM_AMD_SEV
>> instead of CONFIG_AMD_MEM_ENCRYPT.
> 
> Does that mean that even on CONFIG_AMD_MEM_ENCRYPT=n kernels, host SNP
> can function?

Yes, because CONFIG_AMD_MEM_ENCRYPT is mainly for dealing with the 
encryption bit.

> 
> Do we even want that?

We support that today with SEV and SEV-ES guests. The host/hypervisor 
kernel does not need CONFIG_AMD_MEM_ENCRYPT=y in order to run SEV guests.

> 
> I'd expect that a host SNP kernel should have SME enabled too even
> though it is not absolutely necessary.

I recommend using TSME over SME.

Thanks,
Tom

> 
>> Co-developed-by: Ashish Kalra <ashish.kalra@amd.com>
>> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
>> Co-developed-by: Tom Lendacky <thomas.lendacky@amd.com>
>> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
>> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
>> [mdr: rework commit message to be clearer about what patch does, squash
>>        in early_rmptable_check() handling from Tom]
>> Signed-off-by: Michael Roth <michael.roth@amd.com>
>> ---
>>   arch/x86/coco/Makefile                   |   1 +
>>   arch/x86/coco/sev/Makefile               |   3 +
>>   arch/x86/coco/sev/host.c                 | 212 +++++++++++++++++++++++
>>   arch/x86/include/asm/disabled-features.h |   8 +-
>>   arch/x86/include/asm/msr-index.h         |  11 +-
>>   arch/x86/include/asm/sev.h               |   2 +
>>   arch/x86/kernel/cpu/amd.c                |  19 ++
>>   drivers/iommu/amd/init.c                 |   2 +-
>>   include/linux/amd-iommu.h                |   2 +-
>>   9 files changed, 256 insertions(+), 4 deletions(-)
>>   create mode 100644 arch/x86/coco/sev/Makefile
>>   create mode 100644 arch/x86/coco/sev/host.c
> 
> Ignored review comments here:
> 
> https://lore.kernel.org/r/Y9ubi0i4Z750gdMm@zn.tnic
> 
> Ignoring this one for now too.
> 

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH RFC v9 07/51] x86/sev: Add the host SEV-SNP initialization support
  2023-06-21 14:31       ` Dave Hansen
@ 2023-06-21 15:59         ` Borislav Petkov
  0 siblings, 0 replies; 102+ messages in thread
From: Borislav Petkov @ 2023-06-21 15:59 UTC (permalink / raw)
  To: Dave Hansen, Michael Roth
  Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, vbabka, kirill,
	ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy, alpergun,
	dgilbert, jarkko, ashish.kalra, nikunj.dadhania, liam.merwick,
	zhi.a.wang, Brijesh Singh

On Wed, Jun 21, 2023 at 07:31:34AM -0700, Dave Hansen wrote:
> Yep, there's the hardware side and then there are fun nuggets like using
> mem= and then doing a software-only hot-add later after boot.

Ah, right, Mike, I think we need to check whether mem= has any effect on
SNP.

> Also, if the hardware doesn't support any kind of hotplug, it would be
> great to point to the place in the spec where it says that.

I honestly don't know. But I can't recall ever hearing about hardware
memory hotplug, so that's something worth checking too.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH RFC v9 07/51] x86/sev: Add the host SEV-SNP initialization support
  2023-06-21  9:42   ` Borislav Petkov
  2023-06-21 14:36     ` Tom Lendacky
@ 2023-06-21 19:15     ` Kalra, Ashish
  1 sibling, 0 replies; 102+ messages in thread
From: Kalra, Ashish @ 2023-06-21 19:15 UTC (permalink / raw)
  To: Borislav Petkov, Michael Roth
  Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, vbabka, kirill,
	ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy, alpergun,
	dgilbert, jarkko, nikunj.dadhania, liam.merwick, zhi.a.wang,
	Brijesh Singh

Hello Boris,

On 6/21/2023 4:42 AM, Borislav Petkov wrote:
> On Sun, Jun 11, 2023 at 11:25:15PM -0500, Michael Roth wrote:
>> From: Brijesh Singh <brijesh.singh@amd.com>
>>
>> The memory integrity guarantees of SEV-SNP are enforced through a new
>> structure called the Reverse Map Table (RMP). The RMP is a single data
>> structure shared across the system that contains one entry for every 4K
>> page of DRAM that may be used by SEV-SNP VMs. APM2 section 15.36 details
> 
> Rather say 'APM v2, section "Secure Nested Paging (SEV-SNP)"' because
> the numbering is more likely to change than the name in the future. With
> the name, people can find it faster.
> 
>> a number of steps needed to detect/enable SEV-SNP and RMP table support
>> on the host:
>>
>>   - Detect SEV-SNP support based on CPUID bit
>>   - Initialize the RMP table memory reported by the RMP base/end MSR
>>     registers and configure IOMMU to be compatible with RMP access
>>     restrictions
>>   - Set the MtrrFixDramModEn bit in SYSCFG MSR
>>   - Set the SecureNestedPagingEn and VMPLEn bits in the SYSCFG MSR
>>   - Configure IOMMU
>>
>> RMP table entry format is non-architectural and it can vary by
>> processor. It is defined by the PPR. Restrict SNP support to CPU
>> models/families which are compatible with the current RMP table entry
>> format to guard against any undefined behavior when running on other
>> system types. Future models/support will handle this through an
>> architectural mechanism to allow for broader compatibility.
> 
> I'm guessing this is all for live migration between SNP hosts. If so,
> then there will have to be a guest API to handle the differences.

This is basically for the RMP table entry format/structure definition in
arch/x86/coco/sev/host.c. As this is non-architectural, it is defined in
a .c file instead of a header file so that the structure remains private
to the SNP host code (restricted to that file) and is not exposed to the
rest of the kernel.
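
Roughly along these lines (a sketch -- the field layout is recalled from this
series' host.c and trimmed, so treat it as illustrative of the pattern rather
than authoritative):

  /* arch/x86/coco/sev/host.c -- deliberately not in any header */
  struct rmpentry {
  	union {
  		struct {
  			u64 assigned	: 1,
  			    pagesize	: 1,
  			    immutable	: 1,
  			    rsvd	: 9,
  			    gpa		: 39,
  			    asid	: 10,
  			    vmsa	: 1,
  			    validated	: 1,
  			    rsvd2	: 1;
  		} info;
  		u64 low;
  	};
  	u64 high;
  } __packed;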

As mentioned in the comments above, future CPU models may support RMP 
table accesses in an architectural way.

> 
>> SNP host code depends on CONFIG_KVM_AMD_SEV config flag, which may be
>> enabled even when CONFIG_AMD_MEM_ENCRYPT isn't set, so update the
>> SNP-specific IOMMU helpers used here to rely on CONFIG_KVM_AMD_SEV
>> instead of CONFIG_AMD_MEM_ENCRYPT.
> 
> Does that mean that even on CONFIG_AMD_MEM_ENCRYPT=n kernels, host SNP
> can function?
> 

Yes, host SNP is supposed to function with CONFIG_AMD_MEM_ENCRYPT=n.

CONFIG_AMD_MEM_ENCRYPT=y is needed for SNP guest.

> Do we even want that?
> 
> I'd expect that a host SNP kernel should have SME enabled too even
> though it is not absolutely necessary.

Yes, we typically test host SNP kernel with SME enabled.

Thanks,
Ashish

> 
>> Co-developed-by: Ashish Kalra <ashish.kalra@amd.com>
>> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
>> Co-developed-by: Tom Lendacky <thomas.lendacky@amd.com>
>> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
>> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
>> [mdr: rework commit message to be clearer about what patch does, squash
>>        in early_rmptable_check() handling from Tom]
>> Signed-off-by: Michael Roth <michael.roth@amd.com>
>> ---
>>   arch/x86/coco/Makefile                   |   1 +
>>   arch/x86/coco/sev/Makefile               |   3 +
>>   arch/x86/coco/sev/host.c                 | 212 +++++++++++++++++++++++
>>   arch/x86/include/asm/disabled-features.h |   8 +-
>>   arch/x86/include/asm/msr-index.h         |  11 +-
>>   arch/x86/include/asm/sev.h               |   2 +
>>   arch/x86/kernel/cpu/amd.c                |  19 ++
>>   drivers/iommu/amd/init.c                 |   2 +-
>>   include/linux/amd-iommu.h                |   2 +-
>>   9 files changed, 256 insertions(+), 4 deletions(-)
>>   create mode 100644 arch/x86/coco/sev/Makefile
>>   create mode 100644 arch/x86/coco/sev/host.c
> 
> Ignored review comments here:
> 
> https://lore.kernel.org/r/Y9ubi0i4Z750gdMm@zn.tnic
> 
> Ignoring this one for now too.
> 

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH RFC v9 04/51] KVM: x86: Determine shared/private faults using a configurable mask
  2023-06-20 21:18       ` Isaku Yamahata
@ 2023-06-21 23:00         ` Michael Roth
  2023-06-22  8:01           ` Isaku Yamahata
  2023-06-22  9:55           ` Huang, Kai
  0 siblings, 2 replies; 102+ messages in thread
From: Michael Roth @ 2023-06-21 23:00 UTC (permalink / raw)
  To: Isaku Yamahata
  Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	liam.merwick, zhi.a.wang, Vishal Annapurve

On Tue, Jun 20, 2023 at 02:18:45PM -0700, Isaku Yamahata wrote:
> On Tue, Jun 20, 2023 at 03:28:41PM -0500,
> Michael Roth <michael.roth@amd.com> wrote:
> 
> > On Wed, Jun 14, 2023 at 09:47:09AM -0700, Isaku Yamahata wrote:
> > > On Sun, Jun 11, 2023 at 11:25:12PM -0500,
> > > Michael Roth <michael.roth@amd.com> wrote:
> > > 
> > > > This will be used to determine whether or not an #NPF should be serviced
> > > > using a normal page vs. a guarded/gmem one.
> > > > 
> > > > Signed-off-by: Michael Roth <michael.roth@amd.com>
> > > > ---
> > > >  arch/x86/include/asm/kvm_host.h |  7 +++++++
> > > >  arch/x86/kvm/mmu/mmu_internal.h | 35 ++++++++++++++++++++++++++++++++-
> > > >  2 files changed, 41 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > > > index b3bd24f2a390..c26f76641121 100644
> > > > --- a/arch/x86/include/asm/kvm_host.h
> > > > +++ b/arch/x86/include/asm/kvm_host.h
> > > > @@ -1445,6 +1445,13 @@ struct kvm_arch {
> > > >  	 */
> > > >  #define SPLIT_DESC_CACHE_MIN_NR_OBJECTS (SPTE_ENT_PER_PAGE + 1)
> > > >  	struct kvm_mmu_memory_cache split_desc_cache;
> > > > +
> > > > +	/*
> > > > +	 * When set, used to determine whether a fault should be treated as
> > > > +	 * private in the context of protected VMs which use a separate gmem
> > > > +	 * pool to back private guest pages.
> > > > +	 */
> > > > +	u64 mmu_private_fault_mask;
> > > >  };
> > > >  
> > > >  struct kvm_vm_stat {
> > > > diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
> > > > index 780b91e1da9f..9b9e75aa43f4 100644
> > > > --- a/arch/x86/kvm/mmu/mmu_internal.h
> > > > +++ b/arch/x86/kvm/mmu/mmu_internal.h
> > > > @@ -252,6 +252,39 @@ struct kvm_page_fault {
> > > >  
> > > >  int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault);
> > > >  
> > > > +static bool kvm_mmu_fault_is_private(struct kvm *kvm, gpa_t gpa, u64 err)
> > > > +{
> > > > +	struct kvm_memory_slot *slot;
> > > > +	bool private_fault = false;
> > > > +	gfn_t gfn = gpa_to_gfn(gpa);
> > > > +
> > > > +	slot = gfn_to_memslot(kvm, gfn);
> > > > +	if (!slot) {
> > > > +		pr_debug("%s: no slot, GFN: 0x%llx\n", __func__, gfn);
> > > > +		goto out;
> > > > +	}
> > > > +
> > > > +	if (!kvm_slot_can_be_private(slot)) {
> > > > +		pr_debug("%s: slot is not private, GFN: 0x%llx\n", __func__, gfn);
> > > > +		goto out;
> > > > +	}
> > > > +
> > > > +	if (kvm->arch.mmu_private_fault_mask) {
> > > > +		private_fault = !!(err & kvm->arch.mmu_private_fault_mask);
> > > > +		goto out;
> > > > +	}
> > > 
> > > What's the convention of err? Can we abstract it by introducing a new bit
> > > PFERR_PRIVATE_MASK? The caller sets it based on arch specific value.
> > > the logic will be
> > >         .is_private = err & PFERR_PRIVATE_MASK;
> > 
> > I'm not sure I understand the question. 'err' is just the page fault flags,
> > and arch.mmu_private_fault_mask is something that can be set on a
> > per-platform basis when running in a mode where shared/private access
> > is recorded in the page fault flags during a #NPF.
> > 
> > I'm not sure how we'd keep the handling cross-platform by moving to a macro,
> > since TDX uses a different bit, and we'd want to be able to build a
> > SNP+TDX kernel that could run on either type of hardware.
> > 
> > Are you suggesting to reverse that and have err be set in a platform-specific
> > way and then use a common PFERR_PRIVATE_MASK that's software-defined and
> > consistent across platforms? That could work, but existing handling seems
> > to use page fault flags as-is, keeping the hardware-set values, rather than
> > modifying them to pass additional metadata, so it seems like it might
> > make things more confusing to make an exception to that here. Or are
> > there other cases where it's done that way?
> 
> I meant the latter, making PFERR_PRIVATE_MASK a common software-defined bit.
> 
> I think the SVM fault handler can use the hardware value directly by carefully
> defining those PFERR values.
> 
> TDX doesn't have an architectural bit in the error code to indicate a private
> fault. It's encoded in the faulted address as the shared bit (GPA bit 51 or 47).
> PFERR_{USER, WRITE, FETCH, PRESENT} are already software-defined values for VMX
> (and TDX).  The fault handler for VMX, handle_ept_violation(), converts the
> encoding.  For TDX, PFERR_PRIVATE_MASK is just one more software-defined bit.
> 
> I'm fine with either way, variable or macro. Which do you prefer?
> 
> - Define a variable mmu_private_fault_mask (global or per struct kvm).
>   The module initialization code, hardware_setup(), sets mmu_private_fault_mask.
> - Define the software-defined value, PFERR_PRIVATE_MASK.
>   The caller of kvm_mmu_page_fault() parses the hardware value and constructs
>   a software-defined error_code.
> - any other?
>   
> 
> > > > +
> > > > +	/*
> > > > +	 * Handling below is for UPM self-tests and guests that treat userspace
> > > > +	 * as the authority on whether a fault should be private or not.
> > > > +	 */
> > > > +	private_fault = kvm_mem_is_private(kvm, gpa >> PAGE_SHIFT);
> > > 
> > > This code path is sad. One extra slot lookup and xarray look up.
> > > Without mmu lock, the result can change by other vcpu.
> > > Let's find a better way.
> > 
> > The intention was to rely on fault->mmu_seq to determine if a
> > KVM_SET_MEMORY_ATTRIBUTES update came in after .private_fault was set so
> > that fault handling could be retried, but that doesn't happen until
> > kvm_faultin_pfn() which is *after* this is logged. So yes, I think there
> > is a race here, and the approach you took in your Misc. series of
> > keeping the kvm_mem_is_private() check inside kvm_faultin_pfn() is more
> > efficient/correct.
> > 
> > If we can figure out a way to handle checking the fault flags in a way
> > that works for both TDX/SNP (and KVM self-test use-case) we can
> > consolidate around that.
> 
> I can think of the following ways. I think the second option is better because
> we don't need an extra bit in the error code.
> 
> - Introduce a software-defined error code
> - Add a flag to struct kvm_arch for the self-test use-case VM_TYPE_PROTECTED_VM.
>   Set it to true for the VM_TYPE_PROTECTED_VM case.
> - any other?

Vishal: hoping to get your thoughts here as well from the perspective of
the KVM self-test use-case.

I was thinking that once we set fault->is_private, that sort of
becomes our "software-defined" bit, and what KVM would use from that
point forward to determine whether or not the access should be treated
as a private one or not, and that whatever handler sets
fault->is_private would encapsulate away all the platform-specific
bit-checking needed to do that.

So if we were to straight-forwardly implement that based on how TDX
currently handles checking for the shared bit in GPA, paired with how
SEV-SNP handles checking for private bit in fault flags, it would look
something like:

  bool kvm_fault_is_private(kvm, gpa, err)
  {
    /* SEV-SNP handling */
    if (kvm->arch.mmu_private_fault_mask)
      return !!(err & arch.mmu_private_fault_mask);

    /* TDX handling */
    if (kvm->arch.gfn_shared_mask)
      return !!(gpa & arch.gfn_shared_mask);

    return false;
  }

  kvm_mmu_do_page_fault(vcpu, gpa, err, ...)
  {
    struct kvm_page_fault fault = {
      ...
      .is_private = kvm_fault_is_private(vcpu->kvm, gpa, err)
    };

    ...
  }

And then arch.mmu_private_fault_mask and arch.gfn_shared_mask would be
set per-KVM-instance, just like they are now with current SNP and TDX
patchsets, since stuff like KVM self-test wouldn't be setting those
masks, so it makes sense to do it per-instance in that regard.

But that still gets a little awkward for the KVM self-test use-case where
.is_private should sort of be ignored in favor of whatever the xarray
reports via kvm_mem_is_private(). In your Misc. series I believe you
handled this by introducing a PFERR_HASATTR_MASK bit so we can determine
whether existing value of fault->is_private should be
ignored/overwritten or not.

So maybe kvm_fault_is_private() needs to return an integer value
instead, like:

  enum {
    KVM_FAULT_VMM_DEFINED,
    KVM_FAULT_SHARED,
    KVM_FAULT_PRIVATE,
  };

  int kvm_fault_is_private(kvm, gpa, err)
  {
    /* SEV-SNP handling */
    if (kvm->arch.mmu_private_fault_mask)
      return (err & arch.mmu_private_fault_mask) ?
        KVM_FAULT_PRIVATE : KVM_FAULT_SHARED;

    /* TDX handling */
    if (kvm->arch.gfn_shared_mask)
      return (gpa & arch.gfn_shared_mask) ?
        KVM_FAULT_SHARED : KVM_FAULT_PRIVATE;

    return KVM_FAULT_VMM_DEFINED;
  }

And then down in __kvm_faultin_pfn() we do:

  if (fault->is_private == KVM_FAULT_VMM_DEFINED)
    fault->is_private = kvm_mem_is_private(vcpu->kvm, fault->gfn);
  else if (fault->is_private != kvm_mem_is_private(vcpu->kvm, fault->gfn))
    return kvm_do_memory_fault_exit(vcpu, fault);

  if (fault->is_private)
    return kvm_faultin_pfn_private(vcpu, fault);

Maybe kvm_fault_is_private() can be simplified based on what direction
we end up taking WRT ongoing discussions like whether we decide to define
KVM_X86_{SNP,TDX}_VM vm_types in addition to the KVM_X86_PROTECTED_VM
type that the selftests uses, but hoping that for this path, any changes
along that line can be encapsulated away in kvm_fault_is_private() without
any/much further churn at the various call-sites like __kvm_faultin_pfn().

We could even push all the above logic down into the KVM self-tests, but
have:

  int kvm_fault_is_private(kvm, gpa, err) {
    return KVM_FAULT_VMM_DEFINED;
  }

And that would be enough to run the self-tests as a standalone series, with
TDX/SNP then filling in kvm_fault_is_private() with their
platform-specific handling.

Does that seem reasonable to you? At least as a starting point. 

-Mike

> 
> Thanks,
> -- 
> Isaku Yamahata <isaku.yamahata@gmail.com>

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH RFC v9 04/51] KVM: x86: Determine shared/private faults using a configurable mask
  2023-06-21 23:00         ` Michael Roth
@ 2023-06-22  8:01           ` Isaku Yamahata
  2023-06-22  9:55           ` Huang, Kai
  1 sibling, 0 replies; 102+ messages in thread
From: Isaku Yamahata @ 2023-06-22  8:01 UTC (permalink / raw)
  To: Michael Roth
  Cc: Isaku Yamahata, kvm, linux-coco, linux-mm, linux-crypto, x86,
	linux-kernel, tglx, mingo, jroedel, thomas.lendacky, hpa, ardb,
	pbonzini, seanjc, vkuznets, jmattson, luto, dave.hansen, slp,
	pgonda, peterz, srinivas.pandruvada, rientjes, dovmurik, tobin,
	bp, vbabka, kirill, ak, tony.luck, marcorr,
	sathyanarayanan.kuppuswamy, alpergun, dgilbert, jarkko,
	ashish.kalra, nikunj.dadhania, liam.merwick, zhi.a.wang,
	Vishal Annapurve

On Wed, Jun 21, 2023 at 06:00:31PM -0500,
Michael Roth <michael.roth@amd.com> wrote:

> On Tue, Jun 20, 2023 at 02:18:45PM -0700, Isaku Yamahata wrote:
> > On Tue, Jun 20, 2023 at 03:28:41PM -0500,
> > Michael Roth <michael.roth@amd.com> wrote:
> > 
> > > On Wed, Jun 14, 2023 at 09:47:09AM -0700, Isaku Yamahata wrote:
> > > > On Sun, Jun 11, 2023 at 11:25:12PM -0500,
> > > > Michael Roth <michael.roth@amd.com> wrote:
> > > > 
> > > > > This will be used to determine whether or not an #NPF should be serviced
> > > > > using a normal page vs. a guarded/gmem one.
> > > > > 
> > > > > Signed-off-by: Michael Roth <michael.roth@amd.com>
> > > > > ---
> > > > >  arch/x86/include/asm/kvm_host.h |  7 +++++++
> > > > >  arch/x86/kvm/mmu/mmu_internal.h | 35 ++++++++++++++++++++++++++++++++-
> > > > >  2 files changed, 41 insertions(+), 1 deletion(-)
> > > > > 
> > > > > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > > > > index b3bd24f2a390..c26f76641121 100644
> > > > > --- a/arch/x86/include/asm/kvm_host.h
> > > > > +++ b/arch/x86/include/asm/kvm_host.h
> > > > > @@ -1445,6 +1445,13 @@ struct kvm_arch {
> > > > >  	 */
> > > > >  #define SPLIT_DESC_CACHE_MIN_NR_OBJECTS (SPTE_ENT_PER_PAGE + 1)
> > > > >  	struct kvm_mmu_memory_cache split_desc_cache;
> > > > > +
> > > > > +	/*
> > > > > +	 * When set, used to determine whether a fault should be treated as
> > > > > +	 * private in the context of protected VMs which use a separate gmem
> > > > > +	 * pool to back private guest pages.
> > > > > +	 */
> > > > > +	u64 mmu_private_fault_mask;
> > > > >  };
> > > > >  
> > > > >  struct kvm_vm_stat {
> > > > > diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
> > > > > index 780b91e1da9f..9b9e75aa43f4 100644
> > > > > --- a/arch/x86/kvm/mmu/mmu_internal.h
> > > > > +++ b/arch/x86/kvm/mmu/mmu_internal.h
> > > > > @@ -252,6 +252,39 @@ struct kvm_page_fault {
> > > > >  
> > > > >  int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault);
> > > > >  
> > > > > +static bool kvm_mmu_fault_is_private(struct kvm *kvm, gpa_t gpa, u64 err)
> > > > > +{
> > > > > +	struct kvm_memory_slot *slot;
> > > > > +	bool private_fault = false;
> > > > > +	gfn_t gfn = gpa_to_gfn(gpa);
> > > > > +
> > > > > +	slot = gfn_to_memslot(kvm, gfn);
> > > > > +	if (!slot) {
> > > > > +		pr_debug("%s: no slot, GFN: 0x%llx\n", __func__, gfn);
> > > > > +		goto out;
> > > > > +	}
> > > > > +
> > > > > +	if (!kvm_slot_can_be_private(slot)) {
> > > > > +		pr_debug("%s: slot is not private, GFN: 0x%llx\n", __func__, gfn);
> > > > > +		goto out;
> > > > > +	}
> > > > > +
> > > > > +	if (kvm->arch.mmu_private_fault_mask) {
> > > > > +		private_fault = !!(err & kvm->arch.mmu_private_fault_mask);
> > > > > +		goto out;
> > > > > +	}
> > > > 
> > > > What's the convention of err? Can we abstract it by introducing a new bit
> > > > PFERR_PRIVATE_MASK? The caller sets it based on arch specific value.
> > > > the logic will be
> > > >         .is_private = err & PFERR_PRIVATE_MASK;
> > > 
> > > I'm not sure I understand the question. 'err' is just the page fault flags,
> > > and arch.mmu_private_fault_mask is something that can be set on a
> > > per-platform basis when running in a mode where shared/private access
> > > is recorded in the page fault flags during a #NPF.
> > > 
> > > I'm not sure how we'd keep the handling cross-platform by moving to a macro,
> > > since TDX uses a different bit, and we'd want to be able to build a
> > > SNP+TDX kernel that could run on either type of hardware.
> > > 
> > > Are you suggesting to reverse that and have err be set in a platform-specific
> > > way and then use a common PFERR_PRIVATE_MASK that's software-defined and
> > > consistent across platforms? That could work, but existing handling seems
> > > to use page fault flags as-is, keeping the hardware-set values, rather than
> > > modifying them to pass additional metadata, so it seems like it might
> > > make things more confusing to make an exception to that here. Or are
> > > there other cases where it's done that way?
> > 
> > I meant the latter, making PFERR_PRIVATE_MASK a common software-defined bit.
> > 
> > I think the SVM fault handler can use the hardware value directly by carefully
> > defining those PFERR values.
> > 
> > TDX doesn't have an architectural bit in the error code to indicate a private
> > fault. It's encoded in the faulted address as the shared bit (GPA bit 51 or 47).
> > PFERR_{USER, WRITE, FETCH, PRESENT} are already software-defined values for VMX
> > (and TDX).  The fault handler for VMX, handle_ept_violation(), converts the
> > encoding.  For TDX, PFERR_PRIVATE_MASK is just one more software-defined bit.
> > 
> > I'm fine with either way, variable or macro. Which do you prefer?
> > 
> > - Define a variable mmu_private_fault_mask (global or per struct kvm).
> >   The module initialization code, hardware_setup(), sets mmu_private_fault_mask.
> > - Define the software-defined value, PFERR_PRIVATE_MASK.
> >   The caller of kvm_mmu_page_fault() parses the hardware value and constructs
> >   a software-defined error_code.
> > - any other?
> >   
> > 
> > > > > +
> > > > > +	/*
> > > > > +	 * Handling below is for UPM self-tests and guests that treat userspace
> > > > > +	 * as the authority on whether a fault should be private or not.
> > > > > +	 */
> > > > > +	private_fault = kvm_mem_is_private(kvm, gpa >> PAGE_SHIFT);
> > > > 
> > > > This code path is sad. One extra slot lookup and xarray look up.
> > > > Without mmu lock, the result can change by other vcpu.
> > > > Let's find a better way.
> > > 
> > > The intention was to rely on fault->mmu_seq to determine if a
> > > KVM_SET_MEMORY_ATTRIBUTES update came in after .private_fault was set so
> > > that fault handling could be retried, but that doesn't happen until
> > > kvm_faultin_pfn() which is *after* this is logged. So yes, I think there
> > > is a race here, and the approach you took in your Misc. series of
> > > keeping the kvm_mem_is_private() check inside kvm_faultin_pfn() is more
> > > efficient/correct.
> > > 
> > > If we can figure out a way to handle checking the fault flags in a way
> > > that works for both TDX/SNP (and KVM self-test use-case) we can
> > > consolidate around that.
> > 
> > I can think of the following ways. I think the second option is better because
> > we don't need an extra bit in the error code.
> > 
> > - Introduce a software-defined error code
> > - Add a flag to struct kvm_arch for the self-test use-case VM_TYPE_PROTECTED_VM.
> >   Set it to true for the VM_TYPE_PROTECTED_VM case.
> > - any other?
> 
> Vishal: hoping to get your thoughts here as well from the perspective of
> the KVM self-test use-case.
> 
> I was thinking that once we set fault->is_private, that sort of
> becomes our "software-defined" bit, and what KVM would use from that
> point forward to determine whether or not the access should be treated
> as a private one or not, and that whatever handler sets
> fault->is_private would encapsulate away all the platform-specific
> bit-checking needed to do that.
> 
> So if we were to straight-forwardly implement that based on how TDX
> currently handles checking for the shared bit in GPA, paired with how
> SEV-SNP handles checking for private bit in fault flags, it would look
> something like:
> 
>   bool kvm_fault_is_private(kvm, gpa, err)
>   {
>     /* SEV-SNP handling */
>     if (kvm->arch.mmu_private_fault_mask)
>       return !!(err & arch.mmu_private_fault_mask);
> 
>     /* TDX handling */
>     if (kvm->arch.gfn_shared_mask)
>       return !!(gpa & arch.gfn_shared_mask);
> 
>     return false;
>   }
> 
>   kvm_mmu_do_page_fault(vcpu, gpa, err, ...)
>   {
>     struct kvm_page_fault fault = {
>       ...
>       .is_private = kvm_fault_is_private(vcpu->kvm, gpa, err)
>     };
> 
>     ...
>   }
> 
> And then arch.mmu_private_fault_mask and arch.gfn_shared_mask would be
> set per-KVM-instance, just like they are now with current SNP and TDX
> patchsets, since stuff like KVM self-test wouldn't be setting those
> masks, so it makes sense to do it per-instance in that regard.
> 
> But that still gets a little awkward for the KVM self-test use-case where
> .is_private should sort of be ignored in favor of whatever the xarray
> reports via kvm_mem_is_private(). In your Misc. series I believe you
> handled this by introducing a PFERR_HASATTR_MASK bit so we can determine
> whether existing value of fault->is_private should be
> ignored/overwritten or not.
> 
> So maybe kvm_fault_is_private() needs to return an integer value
> instead, like:
> 
>   enum {
>     KVM_FAULT_VMM_DEFINED,
>     KVM_FAULT_SHARED,
>     KVM_FAULT_PRIVATE,
>   };
> 
>   int kvm_fault_is_private(kvm, gpa, err)
>   {
>     /* SEV-SNP handling */
>     if (kvm->arch.mmu_private_fault_mask)
>       return (err & arch.mmu_private_fault_mask) ?
>         KVM_FAULT_PRIVATE : KVM_FAULT_SHARED;
> 
>     /* TDX handling */
>     if (kvm->arch.gfn_shared_mask)
>       return (gpa & arch.gfn_shared_mask) ?
>         KVM_FAULT_SHARED : KVM_FAULT_PRIVATE;
> 
>     return KVM_FAULT_VMM_DEFINED;
>   }
> 
> And then down in __kvm_faultin_pfn() we do:
> 
>   if (fault->is_private == KVM_FAULT_VMM_DEFINED)
>     fault->is_private = kvm_mem_is_private(vcpu->kvm, fault->gfn);
>   else if (fault->is_private != kvm_mem_is_private(vcpu->kvm, fault->gfn))
>     return kvm_do_memory_fault_exit(vcpu, fault);
> 
>   if (fault->is_private)
>     return kvm_faultin_pfn_private(vcpu, fault);
> 
> Maybe kvm_fault_is_private() can be simplified based on what direction
> we end up taking WRT ongoing discussions like whether we decide to define
> KVM_X86_{SNP,TDX}_VM vm_types in addition to the KVM_X86_PROTECTED_VM
> type that the selftests uses, but hoping that for this path, any changes
> along that line can be encapsulated away in kvm_fault_is_private() without
> any/much further churn at the various call-sites like __kvm_faultin_pfn().
> 
> We could even push all the above logic down into the KVM self-tests, but
> have:
> 
>   int kvm_fault_is_private(kvm, gpa, err) {
>     return KVM_FAULT_VMM_DEFINED;
>   }
> 
> And that would be enough to run the self-tests as a standalone series, with
> TDX/SNP then filling in kvm_fault_is_private() with their
> platform-specific handling.
> 
> Does that seem reasonable to you? At least as a starting point. 

I tried to move the logic to the caller site and hide it from KVM MMU code.
You'd like to move the logic down into KVM MMU side and make the difference
explicit.  Either way works for me.  Let me respin my patches for the direction
you propose.
-- 
Isaku Yamahata <isaku.yamahata@gmail.com>

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH RFC v9 04/51] KVM: x86: Determine shared/private faults using a configurable mask
  2023-06-21 23:00         ` Michael Roth
  2023-06-22  8:01           ` Isaku Yamahata
@ 2023-06-22  9:55           ` Huang, Kai
  2023-06-22 15:32             ` Michael Roth
  1 sibling, 1 reply; 102+ messages in thread
From: Huang, Kai @ 2023-06-22  9:55 UTC (permalink / raw)
  To: isaku.yamahata, michael.roth
  Cc: tglx, tobin, liam.merwick, kvm, Luck, Tony, jmattson, Lutomirski,
	Andy, ak, pbonzini, pgonda, srinivas.pandruvada, slp, rientjes,
	alpergun, peterz, linux-kernel, dovmurik, thomas.lendacky, Wang,
	Zhi A, x86, bp, Annapurve, Vishal, dgilbert, Christopherson,,
	Sean, vkuznets, vbabka, marcorr, ashish.kalra, linux-coco,
	nikunj.dadhania, Rodel, Jorg, mingo, sathyanarayanan.kuppuswamy,
	hpa, kirill, jarkko, ardb, linux-crypto, linux-mm, dave.hansen


> 
> So if we were to straight-forwardly implement that based on how TDX
> currently handles checking for the shared bit in GPA, paired with how
> SEV-SNP handles checking for private bit in fault flags, it would look
> something like:
> 
>   bool kvm_fault_is_private(kvm, gpa, err)
>   {
>     /* SEV-SNP handling */
>     if (kvm->arch.mmu_private_fault_mask)
>       return !!(err & arch.mmu_private_fault_mask);
> 
>     /* TDX handling */
>     if (kvm->arch.gfn_shared_mask)
>       return !!(gpa & arch.gfn_shared_mask);

The logic of the two is identical.  I think they need to be converged.

Either SEV-SNP should convert the error code private bit to the gfn_shared_mask,
or TDX's shared bit should be converted to some private error bit.

Perhaps converting SEV-SNP makes more sense because if I recall correctly SEV
guest also has a C-bit, correct?

Or, ...

> 
>     return false;
>   }
> 
>   kvm_mmu_do_page_fault(vcpu, gpa, err, ...)
>   {
>     struct kvm_page_fault fault = {
>       ...
>       .is_private = kvm_fault_is_private(vcpu->kvm, gpa, err)

... should we do something like:

	.is_private = static_call(kvm_x86_fault_is_private)(vcpu->kvm, gpa, 
							    err);

?

>     };
> 
>     ...
>   }
> 
> And then arch.mmu_private_fault_mask and arch.gfn_shared_mask would be
> set per-KVM-instance, just like they are now with current SNP and TDX
> patchsets, since stuff like KVM self-test wouldn't be setting those
> masks, so it makes sense to do it per-instance in that regard.
> 
> But that still gets a little awkward for the KVM self-test use-case where
> .is_private should sort of be ignored in favor of whatever the xarray
> reports via kvm_mem_is_private(). 
> 

I must have missed something.  Why does the KVM self-test have an impact on
how KVM handles private faults?

> In your Misc. series I believe you
> handled this by introducing a PFERR_HASATTR_MASK bit so we can determine
> whether existing value of fault->is_private should be
> ignored/overwritten or not.
> 
> So maybe kvm_fault_is_private() needs to return an integer value
> instead, like:
> 
>   enum {
>     KVM_FAULT_VMM_DEFINED,
>     KVM_FAULT_SHARED,
>     KVM_FAULT_PRIVATE,
>   };
> 
>   int kvm_fault_is_private(kvm, gpa, err)
>   {
>     /* SEV-SNP handling */
>     if (kvm->arch.mmu_private_fault_mask)
>       return (err & arch.mmu_private_fault_mask) ?
>         KVM_FAULT_PRIVATE : KVM_FAULT_SHARED;
> 
>     /* TDX handling */
>     if (kvm->arch.gfn_shared_mask)
>       return (gpa & arch.gfn_shared_mask) ?
>         KVM_FAULT_SHARED : KVM_FAULT_PRIVATE;
> 
>     return KVM_FAULT_VMM_DEFINED;
>   }
> 
> And then down in __kvm_faultin_pfn() we do:
> 
>   if (fault->is_private == KVM_FAULT_VMM_DEFINED)
>     fault->is_private = kvm_mem_is_private(vcpu->kvm, fault->gfn);
>   else if (fault->is_private != kvm_mem_is_private(vcpu->kvm, fault->gfn))
>     return kvm_do_memory_fault_exit(vcpu, fault);
> 
>   if (fault->is_private)
>     return kvm_faultin_pfn_private(vcpu, fault);


What does KVM_FAULT_VMM_DEFINED mean, exactly?

Shouldn't the fault type come from _hardware_?

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH RFC v9 04/51] KVM: x86: Determine shared/private faults using a configurable mask
  2023-06-22  9:55           ` Huang, Kai
@ 2023-06-22 15:32             ` Michael Roth
  2023-06-22 22:31               ` Huang, Kai
  0 siblings, 1 reply; 102+ messages in thread
From: Michael Roth @ 2023-06-22 15:32 UTC (permalink / raw)
  To: Huang, Kai
  Cc: isaku.yamahata, tglx, tobin, liam.merwick, kvm, Luck, Tony,
	jmattson, Lutomirski, Andy, ak, pbonzini, pgonda,
	srinivas.pandruvada, slp, rientjes, alpergun, peterz,
	linux-kernel, dovmurik, thomas.lendacky, Wang, Zhi A, x86, bp,
	Annapurve, Vishal, dgilbert, Christopherson,,
	Sean, vkuznets, vbabka, marcorr, ashish.kalra, linux-coco,
	nikunj.dadhania, Rodel, Jorg, mingo, sathyanarayanan.kuppuswamy,
	hpa, kirill, jarkko, ardb, linux-crypto, linux-mm, dave.hansen

On Thu, Jun 22, 2023 at 09:55:22AM +0000, Huang, Kai wrote:
> 
> > 
> > So if we were to straight-forwardly implement that based on how TDX
> > currently handles checking for the shared bit in GPA, paired with how
> > SEV-SNP handles checking for private bit in fault flags, it would look
> > something like:
> > 
> >   bool kvm_fault_is_private(kvm, gpa, err)
> >   {
> >     /* SEV-SNP handling */
> >     if (kvm->arch.mmu_private_fault_mask)
> >       return !!(err & arch.mmu_private_fault_mask);
> > 
> >     /* TDX handling */
> >     if (kvm->arch.gfn_shared_mask)
> >       return !!(gpa & arch.gfn_shared_mask);
> 
> The logic of the two is identical.  I think they need to be converged.

I think they're just different enough that trying too hard to converge
them might obfuscate things. If the determination didn't come from 2
completely different fields (gpa vs. fault flags) maybe it could be
simplified a bit more, but having a well-defined open-coded handler that
gets called once to set fault->is_private at initial fault time, so that
that ugliness never needs to be looked at again by the KVM MMU, seems
like a good way to keep things simple through the rest of the handling.

> 
> Either SEV-SNP should convert the error code private bit to the gfn_shared_mask,
> or TDX's shared bit should be converted to some private error bit.

struct kvm_page_fault seems to be the preferred way to pass additional
params/metadata around, and .is_private field was introduced to track
this private/shared state as part of UPM base series:

  https://lore.kernel.org/lkml/20221202061347.1070246-9-chao.p.peng@linux.intel.com/

So it seems like unnecessary complexity to track/encode that state into
other additional places rather than just encapsulating it all in
fault->is_private (or some similar field). Synthesizing all this
platform-specific handling into a well-defined value that's conveyed
by something like fault->is_private, in a way where the KVM MMU doesn't
need to worry as much about platform-specific stuff, seems like a good
thing, and in line with what the UPM base series was trying to do by
adding the fault->is_private field.

So all I'm really proposing is that whatever SNP and TDX end up doing
should center around setting that fault->is_private field and keeping
everything contained there. If there are better ways to handle *how*
that's done I don't have any complaints there, but moving/adding bits
to GPA/error_flags after fault time just seems unnecessary to me when
the fault->is_private field can serve that purpose just as well.

> 
> Perhaps converting SEV-SNP makes more sense because if I recall correctly SEV
> guest also has a C-bit, correct?

That's correct, but the C-bit doesn't show up in the GPA that gets passed
up to KVM during an #NPF, and instead gets encoded into the fault's
error_flags.
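
As a rough sketch of what that ends up looking like for SNP (the exact
PFERR bit position here is my assumption, not something spelled out in
this form in the series):

    #define PFERR_GUEST_ENC_MASK    BIT_ULL(34)    /* assumed bit position */

    static bool snp_fault_is_private(struct kvm *kvm, u64 error_code)
    {
            /* For SNP, arch.mmu_private_fault_mask would be set to the bit above. */
            return !!(error_code & kvm->arch.mmu_private_fault_mask);
    }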

> 
> Or, ...
> 
> > 
> >     return false;
> >   }
> > 
> >   kvm_mmu_do_page_fault(vcpu, gpa, err, ...)
> >   {
> >     struct kvm_page_fault fault = {
> >       ...
> >       .is_private = kvm_fault_is_private(vcpu->kvm, gpa, err)
> 
> ... should we do something like:
> 
> 	.is_private = static_call(kvm_x86_fault_is_private)(vcpu->kvm, gpa, 
> 							    err);

We actually had exactly this in v7 of SNP hypervisor patches:

  https://lore.kernel.org/linux-coco/20221214194056.161492-7-michael.roth@amd.com/T/#m17841f5bfdfb8350d69d78c6831dd8f3a4748638

but Sean was hoping to avoid a callback, which is why we ended up using
a bitmask in this version, since that's basically all the callback would
need to do. It's unfortunate that we don't have a common scheme
between SNP/TDX, but maybe that's still possible; I just think that
whatever it ends up being, it should live and be contained inside
whatever helper ends up setting fault->is_private.

There's some other awkwardness with a callback approach. It sort of ties
into your question about selftests so I'll address it below...


> 
> ?
> 
> >     };
> > 
> >     ...
> >   }
> > 
> > And then arch.mmu_private_fault_mask and arch.gfn_shared_mask would be
> > set per-KVM-instance, just like they are now with current SNP and TDX
> > patchsets, since stuff like KVM self-test wouldn't be setting those
> > masks, so it makes sense to do it per-instance in that regard.
> > 
> > But that still gets a little awkward for the KVM self-test use-case where
> > .is_private should sort of be ignored in favor of whatever the xarray
> > reports via kvm_mem_is_private(). 
> > 
> 
> I must have missed something.  Why do the KVM self-tests have any impact on
> how KVM handles private faults?

The self-tests I'm referring to here are the ones from Vishal that shipped with
v10 of Chao's UPM/fd-based private memory series, and also as part of Sean's
gmem tree:

  https://github.com/sean-jc/linux/commit/a0f5f8c911804f55935094ad3a277301704330a6

These exercise gmem/UPM handling without the need for any SNP/TDX
hardware support. They do so by "trusting" the shared/private state
that the VMM sets via KVM_SET_MEMORY_ATTRIBUTES. So if the VMM says it
should be private, the KVM MMU will treat it as private; we'd never
get a mismatch, so KVM_EXIT_MEMORY_FAULT would never be generated.
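
For reference, the self-tests (or a VMM) mark a range private with
something along these lines -- a sketch based on the UPM series, so the
exact struct/flag names may differ in the final API:

    struct kvm_memory_attributes attr = {
            .address    = gpa,          /* start of the range */
            .size       = size,         /* length of the range */
            .attributes = KVM_MEMORY_ATTRIBUTE_PRIVATE,
    };

    /* vm_fd is the KVM VM file descriptor */
    ioctl(vm_fd, KVM_SET_MEMORY_ATTRIBUTES, &attr);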

> 
> > In your Misc. series I believe you
> > handled this by introducing a PFERR_HASATTR_MASK bit so we can determine
> > whether existing value of fault->is_private should be
> > ignored/overwritten or not.
> > 
> > So maybe kvm_fault_is_private() needs to return an integer value
> > instead, like:
> > 
> >   enum {
> >     KVM_FAULT_VMM_DEFINED,
> >     KVM_FAULT_SHARED,
> >     KVM_FAULT_PRIVATE,
> >   };
> > 
> >   int kvm_fault_is_private(kvm, gpa, err)
> >   {
> >     /* SEV-SNP handling */
> >     if (kvm->arch.mmu_private_fault_mask)
> >       return (err & arch.mmu_private_fault_mask) ? KVM_FAULT_PRIVATE : KVM_FAULT_SHARED;
> > 
> >     /* TDX handling */
> >     if (kvm->arch.gfn_shared_mask)
> >       return (gpa & arch.gfn_shared_mask) ? KVM_FAULT_SHARED : KVM_FAULT_PRIVATE;
> > 
> >     return KVM_FAULT_VMM_DEFINED;
> >   }
> > 
> > And then down in __kvm_faultin_pfn() we do:
> > 
> >   if (fault->is_private == KVM_FAULT_VMM_DEFINED)
> >     fault->is_private = kvm_mem_is_private(vcpu->kvm, fault->gfn);
> >   else if (fault->is_private != kvm_mem_is_private(vcpu->kvm, fault->gfn))
> >     return kvm_do_memory_fault_exit(vcpu, fault);
> > 
> >   if (fault->is_private)
> >     return kvm_faultin_pfn_private(vcpu, fault);
> 
> 
> What does KVM_FAULT_VMM_DEFINED mean, exactly?
> 
> Shouldn't the fault type come from _hardware_?

In the above self-test use-case, there is no reliance on hardware support,
and fault->is_private should always be treated as being whatever was
set by the VMM via KVM_SET_MEMORY_ATTRIBUTES. That's why I proposed
the KVM_FAULT_VMM_DEFINED value: to encode that case into
fault->is_private so the KVM MMU can handle protected self-test VMs of
this sort.

In the future, these protected self-test VMs might become the basis of
a new protected VM type where some sort of guest-issued hypercall can
be used to set whether a fault should be treated as shared/private,
rather than relying on a VMM-defined value. There's some current
discussion about that here:

  https://lore.kernel.org/lkml/20230620190443.GU2244082@ls.amr.corp.intel.com/T/#me627bed3d9acf73ea882e8baa76dfcb27759c440

Going back to your callback question above, that makes things a little
awkward, since kvm_x86_ops is statically defined for both the
kvm_amd/kvm_intel modules, and either can run these self-test guests as
well as the proposed "non-CC VMs", which rely on enlightened guest
kernels instead of TDX/SNP hardware support for managing private/shared
access.

So you either need to duplicate the handling for how to determine
private/shared for these other types into the kvm_intel/kvm_amd callbacks,
or have some way for the callback to say "fall back to the common
handling for self-tests and non-CC VMs". The latter is what we implemented
in v8 of this series, but Isaku suggested it was a bit too heavyweight
and proposed dropping the fall-back logic in favor of updating the
kvm_x86_ops at run-time once we know whether or not it's a TDX/SNP guest:

  https://lkml.iu.edu/hypermail/linux/kernel/2303.2/03009.html

which could work, but it still doesn't address Sean's desire to avoid
callbacks completely, and still amounts to a somewhat convoluted way
to hide away TDX/SNP-specific bit checks for shared/private. Rather
than hide them away in callbacks that are already frowned upon by the
maintainer, I think it makes sense to "open-code" all these checks in a
common handler like kvm_fault_is_private() so we can make some progress
toward a consensus, and then iterate on it from there rather than
refining what may already be a dead-end path.

-Mike


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH RFC v9 04/51] KVM: x86: Determine shared/private faults using a configurable mask
  2023-06-22 15:32             ` Michael Roth
@ 2023-06-22 22:31               ` Huang, Kai
  2023-06-22 23:39                 ` Isaku Yamahata
  0 siblings, 1 reply; 102+ messages in thread
From: Huang, Kai @ 2023-06-22 22:31 UTC (permalink / raw)
  To: michael.roth
  Cc: srinivas.pandruvada, liam.merwick, tobin, alpergun, Luck, Tony,
	jmattson, Lutomirski, Andy, ak, pbonzini, kvm, tglx, slp,
	dovmurik, Wang, Zhi A, linux-kernel, pgonda, thomas.lendacky,
	rientjes, peterz, x86, bp, Annapurve, Vishal, dgilbert,
	Christopherson,,
	Sean, vkuznets, marcorr, vbabka, ashish.kalra, linux-coco,
	nikunj.dadhania, Rodel, Jorg, isaku.yamahata, mingo,
	sathyanarayanan.kuppuswamy, kirill, jarkko, hpa, ardb,
	linux-crypto, linux-mm, dave.hansen

On Thu, 2023-06-22 at 10:32 -0500, Michael Roth wrote:
> On Thu, Jun 22, 2023 at 09:55:22AM +0000, Huang, Kai wrote:
> > 
> > > 
> > > So if we were to straight-forwardly implement that based on how TDX
> > > currently handles checking for the shared bit in GPA, paired with how
> > > SEV-SNP handles checking for private bit in fault flags, it would look
> > > something like:
> > > 
> > >   bool kvm_fault_is_private(kvm, gpa, err)
> > >   {
> > >     /* SEV-SNP handling */
> > >     if (kvm->arch.mmu_private_fault_mask)
> > >       return !!(err & arch.mmu_private_fault_mask);
> > > 
> > >     /* TDX handling */
> > >     if (kvm->arch.gfn_shared_mask)
> > >       return !!(gpa & arch.gfn_shared_mask);
> > 
> > The logic of the two is identical.  I think they need to be converged.
> 
> I think they're just different enough that trying too hard to converge
> them might obfuscate things. If the determination didn't come from 2
> completely different fields (gpa vs. fault flags) maybe it could be
> simplified a bit more, but having a well-defined, open-coded handler that
> gets called once to set fault->is_private at initial fault time, so that
> that ugliness never needs to be looked at again by the KVM MMU, seems
> like a good way to keep things simple through the rest of the handling.

I actually agree with the second half, i.e. the part after "but ...", but
IIUC it doesn't conflict with converging arch.mmu_private_fault_mask and
arch.gfn_shared_mask into one.  No?

> 
> > 
> > Either SEV-SNP should convert the error code private bit to the gfn_shared_mask,
> > or TDX's shared bit should be converted to some private error bit.
> 
> struct kvm_page_fault seems to be the preferred way to pass additional
> params/metadata around, and .is_private field was introduced to track
> this private/shared state as part of UPM base series:
> 
>   https://lore.kernel.org/lkml/20221202061347.1070246-9-chao.p.peng@linux.intel.com/

Sure.  Agreed.

> 
> So it seems like unnecessary complexity to track/encode that state into
> other additional places rather than just encapsulating it all in
> fault->is_private (or some similar field). Synthesizing all this
> platform-specific handling into a well-defined value that's conveyed
> by something like fault->is_private, in a way where the KVM MMU doesn't
> need to worry as much about platform-specific stuff, seems like a good
> thing, and in line with what the UPM base series was trying to do by
> adding the fault->is_private field.

Yes we should just set fault->is_private and pass to the rest of the common KVM
MMU code.

> 
> So all I'm really proposing is that whatever SNP and TDX end up doing
> should center around setting that fault->is_private field and keeping
> everything contained there. 
> 

I agree.

> If there are better ways to handle *how*
> that's done I don't have any complaints there, but moving/adding bits
> to GPA/error_flags after fault time just seems unnecessary to me when
> the fault->is_private field can serve that purpose just as well.

Perhaps you missed my point.  My point is arch.mmu_private_fault_mask and
arch.gfn_shared_mask seem redundant because the logic around them is exactly
the same.  I do believe we should have fault->is_private passed to the common
MMU code.

In fact, now I am wondering why we need to have "mmu_private_fault_mask" and
"gfn_shared_mask" in _common_ KVM MMU code.  We already have enough mechanism in
KVM common code:

  1) fault->is_private
  2) kvm_mmu_page_role.private
  3) an Xarray to tell whether a GFN is private or shared

I am not convinced that we need to have "mmu_private_fault_mask" and
"gfn_shared_mask" in common KVM MMU code.  Instead, they can be in AMD and
Intel's vendor code.

Maybe it makes sense to have "gfn_shared_mask" in the KVM common code so that
the fault handler can just strip away the "shared bit" at the very beginning (at
least for TDX), but for the rest of the time I think we should already have
enough infrastructure to handle private/shared mapping.

Btw, one minor issue is, if I recall correctly, for TDX the shared bit must be
applied to the GFN for shared mapping in normal EPT.  I guess AMD doesn't need
that for shared mapping.  So "gfn_shared_mask" may be useful in this case, but
w/o it I believe we can also achieve this in another way via a vendor callback.
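
To illustrate, something along these lines is all the common code would
need for applying/stripping that bit -- helper names here are just my
guess, the TDX series may spell them differently:

    static inline gfn_t kvm_gfn_to_shared(struct kvm *kvm, gfn_t gfn)
    {
            return gfn | kvm->arch.gfn_shared_mask;
    }

    static inline gfn_t kvm_gfn_to_private(struct kvm *kvm, gfn_t gfn)
    {
            return gfn & ~kvm->arch.gfn_shared_mask;
    }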

> 
> > 
> > Perhaps converting SEV-SNP makes more sense because if I recall correctly SEV
> > guest also has a C-bit, correct?
> 
> That's correct, but the C-bit doesn't show up in the GPA that gets passed
> up to KVM during an #NPF, and instead gets encoded into the fault's
> error_flags.
> 
> > 
> > Or, ...
> > 
> > > 
> > >     return false;
> > >   }
> > > 
> > >   kvm_mmu_do_page_fault(vcpu, gpa, err, ...)
> > >   {
> > >     struct kvm_page_fault fault = {
> > >       ...
> > >       .is_private = kvm_fault_is_private(vcpu->kvm, gpa, err)
> > 
> > ... should we do something like:
> > 
> > 	.is_private = static_call(kvm_x86_fault_is_private)(vcpu->kvm, gpa, 
> > 							    err);
> 
> We actually had exactly this in v7 of SNP hypervisor patches:
> 
>   https://lore.kernel.org/linux-coco/20221214194056.161492-7-michael.roth@amd.com/T/#m17841f5bfdfb8350d69d78c6831dd8f3a4748638
> 
> but Sean was hoping to avoid a callback, which is why we ended up using
> a bitmask in this version, since that's basically all the callback would
> need to do. It's unfortunate that we don't have a common scheme
> between SNP/TDX, but maybe that's still possible; I just think that
> whatever it ends up being, it should live and be contained inside
> whatever helper ends up setting fault->is_private.

Sure.  If the function call can be avoided in the fault handler then we should.

> 
> There's some other awkwardness with a callback approach. It sort of ties
> into your question about selftests so I'll address it below...
> 
> 
> > 
> > ?
> > 
> > >     };
> > > 
> > >     ...
> > >   }
> > > 
> > > And then arch.mmu_private_fault_mask and arch.gfn_shared_mask would be
> > > set per-KVM-instance, just like they are now with current SNP and TDX
> > > patchsets, since stuff like KVM self-test wouldn't be setting those
> > > masks, so it makes sense to do it per-instance in that regard.
> > > 
> > > But that still gets a little awkward for the KVM self-test use-case where
> > > .is_private should sort of be ignored in favor of whatever the xarray
> > > reports via kvm_mem_is_private(). 
> > > 
> > 
> > I must have missed something.  Why do the KVM self-tests have any impact on
> > how KVM handles private faults?
> 
> The self-tests I'm referring to here are the ones from Vishal that shipped with
> v10 of Chao's UPM/fd-based private memory series, and also as part of Sean's
> gmem tree:
> 
>   https://github.com/sean-jc/linux/commit/a0f5f8c911804f55935094ad3a277301704330a6
> 
> These exercise gmem/UPM handling without the need for any SNP/TDX
> hardware support. They do so by "trusting" the shared/private state
> that the VMM sets via KVM_SET_MEMORY_ATTRIBUTES. So if the VMM says it
> should be private, the KVM MMU will treat it as private; we'd never
> get a mismatch, so KVM_EXIT_MEMORY_FAULT would never be generated.

If KVM supports a test mode, or, going by another thread, KVM wants to
support a software-based KVM_X86_PROTECTED_VM and distinguish it from
hardware-based confidential VMs such as TDX and SEV-SNP:

https://lore.kernel.org/lkml/9e3e99f78fcbd7db21368b5fe1d931feeb4db567.1686858861.git.isaku.yamahata@intel.com/T/#me627bed3d9acf73ea882e8baa76dfcb27759c440

then it's fine to handle that when setting up fault->is_private.  But my
point is the KVM self-tests themselves should not impact how KVM
implements the fault handler -- it is KVM itself that wants to support
additional software-based things.

[...]

(Thanks for explaining additional background).

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH RFC v9 04/51] KVM: x86: Determine shared/private faults using a configurable mask
  2023-06-22 22:31               ` Huang, Kai
@ 2023-06-22 23:39                 ` Isaku Yamahata
  2023-06-22 23:52                   ` Huang, Kai
  0 siblings, 1 reply; 102+ messages in thread
From: Isaku Yamahata @ 2023-06-22 23:39 UTC (permalink / raw)
  To: Huang, Kai
  Cc: michael.roth, srinivas.pandruvada, liam.merwick, tobin, alpergun,
	Luck, Tony, jmattson, Lutomirski, Andy, ak, pbonzini, kvm, tglx,
	slp, dovmurik, Wang, Zhi A, linux-kernel, pgonda,
	thomas.lendacky, rientjes, peterz, x86, bp, Annapurve, Vishal,
	dgilbert, Christopherson,,
	Sean, vkuznets, marcorr, vbabka, ashish.kalra, linux-coco,
	nikunj.dadhania, Rodel, Jorg, isaku.yamahata, mingo,
	sathyanarayanan.kuppuswamy, kirill, jarkko, hpa, ardb,
	linux-crypto, linux-mm, dave.hansen

On Thu, Jun 22, 2023 at 10:31:08PM +0000,
"Huang, Kai" <kai.huang@intel.com> wrote:

> > If there are better ways to handle *how*
> > that's done I don't have any complaints there, but moving/adding bits
> > to GPA/error_flags after fault time just seems unnecessary to me when
> > the fault->is_private field can serve that purpose just as well.
> 
> Perhaps you missed my point.  My point is arch.mmu_private_fault_mask and
> arch.gfn_shared_mask seem redundant because the logic around them is exactly
> the same.  I do believe we should have fault->is_private passed to the common
> MMU code.
> 
> In fact, now I am wondering why we need to have "mmu_private_fault_mask" and
> "gfn_shared_mask" in _common_ KVM MMU code.  We already have enough mechanism in
> KVM common code:
> 
>   1) fault->is_private
>   2) kvm_mmu_page_role.private
>   3) an Xarray to tell whether a GFN is private or shared
> 
> I am not convinced that we need to have "mmu_private_fault_mask" and
> "gfn_shared_mask" in common KVM MMU code.  Instead, they can be in AMD and
> Intel's vendor code.
> 
> Maybe it makes sense to have "gfn_shared_mask" in the KVM common code so that
> the fault handler can just strip away the "shared bit" at the very beginning (at
> least for TDX), but for the rest of the time I think we should already have
> enough infrastructure to handle private/shared mapping.
> 
> Btw, one minor issue is, if I recall correctly, for TDX the shared bit must be
> applied to the GFN for shared mapping in normal EPT.  I guess AMD doesn't need
> that for shared mapping.  So "gfn_shared_mask" may be useful in this case, but
> w/o it I believe we can also achieve this in another way via a vendor callback.


"2) kvm_mmu_page_role.private" above has different meaning.

a). The fault is private or not.
b). page table the fault handler is walking is private or conventional.

a.) is common for SNP, TDX and PROTECTED_VM. It makes sense in
kvm_mmu_do_page_fault() and __kvm_faultin_pfn(). After kvm_faultin_pfn(), the
fault handler can mostly forget it for SNP and PROTECTED_VM. (large page
adjustment needs it, though.) This is what we're discussing in this thread.

b.) is specific to TDX. TDX KVM MMU introduces one more page table.

-- 
Isaku Yamahata <isaku.yamahata@gmail.com>

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH RFC v9 04/51] KVM: x86: Determine shared/private faults using a configurable mask
  2023-06-22 23:39                 ` Isaku Yamahata
@ 2023-06-22 23:52                   ` Huang, Kai
  2023-06-23 14:43                     ` Isaku Yamahata
  0 siblings, 1 reply; 102+ messages in thread
From: Huang, Kai @ 2023-06-22 23:52 UTC (permalink / raw)
  To: isaku.yamahata
  Cc: tglx, liam.merwick, tobin, alpergun, Luck, Tony, jmattson,
	Lutomirski, Andy, ak, pbonzini, kvm, srinivas.pandruvada, slp,
	dovmurik, michael.roth, peterz, linux-kernel, pgonda,
	thomas.lendacky, rientjes, Wang, Zhi A, x86, bp, Annapurve,
	Vishal, dgilbert, Christopherson,,
	Sean, vkuznets, marcorr, vbabka, ashish.kalra, linux-coco,
	nikunj.dadhania, Rodel, Jorg, mingo, sathyanarayanan.kuppuswamy,
	hpa, kirill, jarkko, ardb, linux-crypto, linux-mm, dave.hansen

On Thu, 2023-06-22 at 16:39 -0700, Isaku Yamahata wrote:
> On Thu, Jun 22, 2023 at 10:31:08PM +0000,
> "Huang, Kai" <kai.huang@intel.com> wrote:
> 
> > > If there are better ways to handle *how*
> > > that's done I don't have any complaints there, but moving/adding bits
> > > to GPA/error_flags after fault time just seems unnecessary to me when
> > > the fault->is_private field can serve that purpose just as well.
> > 
> > Perhaps you missed my point.  My point is arch.mmu_private_fault_mask and
> > arch.gfn_shared_mask seem redundant because the logic around them is exactly
> > the same.  I do believe we should have fault->is_private passed to the common
> > MMU code.
> > 
> > In fact, now I am wondering why we need to have "mmu_private_fault_mask" and
> > "gfn_shared_mask" in _common_ KVM MMU code.  We already have enough mechanism in
> > KVM common code:
> > 
> >   1) fault->is_private
> >   2) kvm_mmu_page_role.private
> >   3) an Xarray to tell whether a GFN is private or shared
> > 
> > I am not convinced that we need to have "mmu_private_fault_mask" and
> > "gfn_shared_mask" in common KVM MMU code.  Instead, they can be in AMD and
> > Intel's vendor code.
> > 
> > Maybe it makes sense to have "gfn_shared_mask" in the KVM common code so that
> > the fault handler can just strip away the "shared bit" at the very beginning (at
> > least for TDX), but for the rest of the time I think we should already have
> > enough infrastructure to handle private/shared mapping.
> > 
> > Btw, one minor issue is, if I recall correctly, for TDX the shared bit must be
> > applied to the GFN for shared mapping in normal EPT.  I guess AMD doesn't need
> > that for shared mapping.  So "gfn_shared_mask" may be useful in this case, but
> > w/o it I believe we can also achieve this in another way via a vendor callback.
> 
> 
> "2) kvm_mmu_page_role.private" above has different meaning.
> 
> a). The fault is private or not.
> b). page table the fault handler is walking is private or conventional.
> 
> a.) is common for SNP, TDX and PROTECTED_VM. It makes sense in
> kvm_mmu_do_page_fault() and __kvm_faultin_pfn(). After kvm_faultin_pfn(), the
> fault handler can mostly forget it for SNP and PROTECTED_VM. (large page
> adjustment needs it, though.) This is what we're discussing in this thread.
> 
> b.) is specific to TDX. TDX KVM MMU introduces one more page table.
> 
> 

I don't buy the last sentence.  Even if it's not necessary for AMD from
the hardware's perspective, the concept remains true for AMD too.  So why
can't we use it for AMD?

Note "gfn_shared_mask" and kvm_mmu_page_role.private are kinda redundant
too; I do believe we should avoid introducing a lot of fields in KVM
*common* code due to vendor differences (i.e., in this case,
"gfn_shared_mask" and "mmu_private_fault_mask").

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH RFC v9 04/51] KVM: x86: Determine shared/private faults using a configurable mask
  2023-06-22 23:52                   ` Huang, Kai
@ 2023-06-23 14:43                     ` Isaku Yamahata
  0 siblings, 0 replies; 102+ messages in thread
From: Isaku Yamahata @ 2023-06-23 14:43 UTC (permalink / raw)
  To: Huang, Kai
  Cc: isaku.yamahata, tglx, liam.merwick, tobin, alpergun, Luck, Tony,
	jmattson, Lutomirski, Andy, ak, pbonzini, kvm,
	srinivas.pandruvada, slp, dovmurik, michael.roth, peterz,
	linux-kernel, pgonda, thomas.lendacky, rientjes, Wang, Zhi A,
	x86, bp, Annapurve, Vishal, dgilbert, Christopherson,,
	Sean, vkuznets, marcorr, vbabka, ashish.kalra, linux-coco,
	nikunj.dadhania, Rodel, Jorg, mingo, sathyanarayanan.kuppuswamy,
	hpa, kirill, jarkko, ardb, linux-crypto, linux-mm, dave.hansen

On Thu, Jun 22, 2023 at 11:52:56PM +0000,
"Huang, Kai" <kai.huang@intel.com> wrote:

> On Thu, 2023-06-22 at 16:39 -0700, Isaku Yamahata wrote:
> > On Thu, Jun 22, 2023 at 10:31:08PM +0000,
> > "Huang, Kai" <kai.huang@intel.com> wrote:
> > 
> > > > If there are better ways to handle *how*
> > > > that's done I don't have any complaints there, but moving/adding bits
> > > > to GPA/error_flags after fault time just seems unnecessary to me when
> > > > the fault->is_private field can serve that purpose just as well.
> > > 
> > > Perhaps you missed my point.  My point is arch.mmu_private_fault_mask and
> > > arch.gfn_shared_mask seem redundant because the logic around them is exactly
> > > the same.  I do believe we should have fault->is_private passed to the common
> > > MMU code.
> > > 
> > > In fact, now I am wondering why we need to have "mmu_private_fault_mask" and
> > > "gfn_shared_mask" in _common_ KVM MMU code.  We already have enough mechanism in
> > > KVM common code:
> > > 
> > >   1) fault->is_private
> > >   2) kvm_mmu_page_role.private
> > >   3) an Xarray to tell whether a GFN is private or shared
> > > 
> > > I am not convinced that we need to have "mmu_private_fault_mask" and
> > > "gfn_shared_mask" in common KVM MMU code.  Instead, they can be in AMD and
> > > Intel's vendor code.
> > > 
> > > Maybe it makes sense to have "gfn_shared_mask" in the KVM common code so that
> > > the fault handler can just strip away the "shared bit" at the very beginning (at
> > > least for TDX), but for the rest of the time I think we should already have
> > > enough infrastructure to handle private/shared mapping.
> > > 
> > > Btw, one minor issue is, if I recall correctly, for TDX the shared bit must be
> > > applied to the GFN for shared mapping in normal EPT.  I guess AMD doesn't need
> > > that for shared mapping.  So "gfn_shared_mask" may be useful in this case, but
> > > w/o it I believe we can also achieve this in another way via a vendor callback.
> > 
> > 
> > "2) kvm_mmu_page_role.private" above has different meaning.
> > 
> > a). The fault is private or not.
> > b). page table the fault handler is walking is private or conventional.
> > 
> > a.) is common for SNP, TDX and PROTECTED_VM. It makes sense in
> > kvm_mmu_do_page_fault() and __kvm_faultin_pfn(). After kvm_faultin_pfn(), the
> > fault handler can mostly forget it for SNP and PROTECTED_VM. (large page
> > adjustment needs it, though.) This is what we're discussing in this thread.
> > 
> > b.) is specific to TDX. TDX KVM MMU introduces one more page table.
> > 
> > 
> 
> I don't buy the last sentence.  Even if it's not necessary for AMD from
> the hardware's perspective, the concept remains true for AMD too.  So why
> can't we use it for AMD?

We can use it for AMD. Let me rephrase: only TDX uses it now; SEV-SNP may
or may not use it, at its option.
-- 
Isaku Yamahata <isaku.yamahata@gmail.com>

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH RFC v9 05/51] x86/coco: move CONFIG_HAS_CC_PLATFORM check down into coco/Makefile
  2023-06-21  8:54       ` Borislav Petkov
@ 2023-06-29 21:02         ` Michael Roth
  0 siblings, 0 replies; 102+ messages in thread
From: Michael Roth @ 2023-06-29 21:02 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, vbabka, kirill,
	ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy, alpergun,
	dgilbert, jarkko, ashish.kalra, nikunj.dadhania, liam.merwick,
	zhi.a.wang, Kirill A . Shutemov

On Wed, Jun 21, 2023 at 10:54:00AM +0200, Borislav Petkov wrote:
> On Tue, Jun 20, 2023 at 03:43:15PM -0500, Michael Roth wrote:
> > Basically, arch/x86/coco/Makefile is never processed if arch/x86/Kbuild
> > indicates that CONFIG_HAS_CC_PLATFORM is not set. So if we want to have
> > stuff in arch/x86/coco/Makefile that build for !CONFIG_HAS_CC_PLATFORM,
> > like SNP host support, which does not rely on CONFIG_HAS_CC_PLATFORM
> > being set, that check needs to be moved down into arch/x86/coco/Makefile.
> 
> Ok, so if you put SNP host support into arch/x86/virt/svm/sev.c, that
> should work too and won't have any relation to CONFIG_HAS_CC_PLATFORM,
> right?

Right, that works out just as well, and ends up being a bit more
straightforward. I have it implemented here:

  https://github.com/mdroth/linux/commits/snp-host-latest-v9b

  https://github.com/mdroth/linux/commit/a889a2dd64b62d9c3bf74cf02e7d8d71c7061667

and dropped the patch that reworks arch/x86/coco/Makefile.

Thanks,

Mike

> 
> The CC_PLATFORM thing is a way to check for confidential computing guest
> features by abstracting the capabilities so that you don't have to check
> *each* and *every* conf guest type in the conditionals and thus go nuts.
> 
> Thx.
> 
> -- 
> Regards/Gruss,
>     Boris.
> 
> https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH RFC v9 09/51] x86/sev: Add RMP entry lookup helpers
  2023-06-12 16:08   ` Dave Hansen
@ 2023-06-30 21:57     ` Michael Roth
  2023-06-30 22:29       ` Dave Hansen
  0 siblings, 1 reply; 102+ messages in thread
From: Michael Roth @ 2023-06-30 21:57 UTC (permalink / raw)
  To: Dave Hansen
  Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	liam.merwick, zhi.a.wang, Brijesh Singh

On Mon, Jun 12, 2023 at 09:08:58AM -0700, Dave Hansen wrote:
> On 6/11/23 21:25, Michael Roth wrote:
> > +/*
> > + * The RMP entry format is not architectural. The format is defined in PPR
> > + * Family 19h Model 01h, Rev B1 processor.
> > + */
> > +struct rmpentry {
> > +	union {
> > +		struct {
> > +			u64	assigned	: 1,
> > +				pagesize	: 1,
> > +				immutable	: 1,
> > +				rsvd1		: 9,
> > +				gpa		: 39,
> > +				asid		: 10,
> > +				vmsa		: 1,
> > +				validated	: 1,
> > +				rsvd2		: 1;
> > +		} info;
> > +		u64 low;
> > +	};
> > +	u64 high;
> > +} __packed;
> 
> What's 'high' used for?  The PPR says it's reserved.  Why not call it
> reserved?
>
> It _looks_ like it's only used for a debugging pr_info().  It makes the
> struct look kinda goofy.  I'd much rather limit the goofiness to the
> "dumping" code, like:
> 
>      u64 *__e = (void *)e;
>      ....
>      pr_info("RMPEntry paddr 0x%llx: [high=0x%016llx low=0x%016llx]\n",
>                                pfn << PAGE_SHIFT, __e[0], __e[1]);
> 
> BTW, why does it do any good to dump all these reserved fields?
> 

The reserved bits sometimes contain information that can be useful to
pass along to folks on the firmware side, so it would definitely be helpful
to provide the full raw contents of the RMP entry.

So maybe something like this better captures the intended usage:

    struct rmpentry {
        union {
            struct {
                u64 assigned        : 1,
                    pagesize        : 1,
                    immutable       : 1,
                    rsvd1           : 9,
                    gpa             : 39,
                    asid            : 10,
                    vmsa            : 1,
                    validated       : 1,
                    rsvd2           : 1;
                u64 rsvd3;
            } info;
            u64 data[2];
        };
    } __packed;

But dropping the union and casting to u64[] locally in the debug/dumping
routine should work fine as well.
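
E.g., a rough sketch of keeping the casting confined to the dump routine
(the function name here is made up):

    static void dump_rmpentry(u64 pfn, struct rmpentry *e)
    {
            u64 *raw = (u64 *)e;

            /* Dump raw contents, reserved bits included, for the firmware folks. */
            pr_info("RMP entry for PFN 0x%llx: [0x%016llx 0x%016llx]\n",
                    pfn, raw[0], raw[1]);
    }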

> >  /*
> >   * The first 16KB from the RMP_BASE is used by the processor for the
> >   * bookkeeping, the range needs to be added during the RMP entry lookup.
> >   */
> >  #define RMPTABLE_CPU_BOOKKEEPING_SZ	0x4000
> > +#define RMPENTRY_SHIFT			8
> > +#define rmptable_page_offset(x)	(RMPTABLE_CPU_BOOKKEEPING_SZ +		\
> > +				 (((unsigned long)x) >> RMPENTRY_SHIFT))
> 
> Is RMPENTRY_SHIFT just log2(sizeof(struct rmpentry))?  If so, wouldn't
> this be a *LOT* more obvious to be written:
> 
> unsigned long rmptable_offset(unsigned long paddr)
> {
> 	return 	RMPTABLE_CPU_BOOKKEEPING_SZ +
> 		paddr / sizeof(struct rmpentry);
> }
> 
> ??
> 
> It removes a cast, gives 'x' a real name, removes a random magic number
> #define and is clearer to boot.

Yes, we ended up reworking this to be an array of struct rmpentry's that
are indexed by PFN. That helps to simplify the lookup logic a good bit.
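
Roughly along these lines (sketch only, variable names are placeholders):

    static struct rmpentry *rmptable;       /* PFN-indexed RMP entry array */
    static u64 rmptable_max_pfn;

    static struct rmpentry *get_rmpentry(u64 pfn)
    {
            if (WARN_ON_ONCE(pfn > rmptable_max_pfn))
                    return ERR_PTR(-EFAULT);

            return &rmptable[pfn];
    }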

> 
> >  static unsigned long rmptable_start __ro_after_init;
> >  static unsigned long rmptable_end __ro_after_init;
> > @@ -210,3 +235,63 @@ static int __init snp_rmptable_init(void)
> >   * the page(s) used for DMA are hypervisor owned.
> >   */
> >  fs_initcall(snp_rmptable_init);
> > +
> > +static inline unsigned int rmpentry_assigned(const struct rmpentry *e)
> > +{
> > +	return e->info.assigned;
> > +}
> > +
> > +static inline unsigned int rmpentry_pagesize(const struct rmpentry *e)
> > +{
> > +	return e->info.pagesize;
> > +}
> 
> I think these are a little silly.  If you're going to go to the trouble
> of using bitfields, you don't need helpers for every bitfield.  I say
> either use bitfields without helpers or just regular old:
> 
> #define RMPENTRY_ASSIGNED_MASK	BIT(1)
> 
> and then *maybe* you can make helpers for them.

Yah, if we have the bitfields it makes sense to use them directly.

> 
> > +static int rmptable_entry(unsigned long paddr, struct rmpentry *entry)
> > +{
> > +	unsigned long vaddr;
> > +
> > +	vaddr = rmptable_start + rmptable_page_offset(paddr);
> > +	if (unlikely(vaddr > rmptable_end))
> > +		return -EFAULT;
> 
> This seems like more of a WARN_ON_ONCE() kind of thing than just an
> error return.  Isn't it a big deal if an invalid (non-RMP-covered)
> address makes it in here?

Yes, now that the lookups are done by indexing into an array of struct
rmpentry's, that scenario is basically the same as an attempted
out-of-bounds access, so it's definitely worth a warning.

> 
> > +	*entry = *(struct rmpentry *)vaddr;
> > +
> > +	return 0;
> > +}
> > +
> > +static int __snp_lookup_rmpentry(u64 pfn, struct rmpentry *entry, int *level)
> > +{
> 
> I've been bitten by pfn vs. paddr quite a few times.  I'd really
> encourage you to add it to the names if you're going to pass them
> around, like __snp_lookup_rmpentry_pfn().
> 
> > +	unsigned long paddr = pfn << PAGE_SHIFT;
> > +	struct rmpentry large_entry;
> > +	int ret;
> > +
> > +	if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
> > +		return -ENXIO;
> 
> OK, so if you're running on non-SNP hardware, you return -ENXIO here.
> Remember this please.
> 
> > +	ret = rmptable_entry(paddr, entry);
> > +	if (ret)
> > +		return ret;
> > +
> > +	/* Read a large RMP entry to get the correct page level used in RMP entry. */
> > +	ret = rmptable_entry(paddr & PMD_MASK, &large_entry);
> > +	if (ret)
> > +		return ret;
> > +
> > +	*level = RMP_TO_X86_PG_LEVEL(rmpentry_pagesize(&large_entry));
> > +
> > +	return 0;
> > +}
> 
> This is a bit weird.  Should it say something like this?
> 
> To do a 4k RMP lookup the hardware looks at two places in the RMP:

I'd word this as:

  "To query all the relevant bit of an 4k RMP entry, the kernel must access
   2 entries in the RMP table:"

Because it's possible hardware only looks at the 2M entry for
hardware-based lookups, depending on where the access is coming from, or
how the memory at the PFN range is mapped.

But otherwise it seems like an accurate description.

> 
> 	1. The actual 4k RMP entry
> 	2. The 2M entry that "covers" the 4k entry
> 
> The page size is not indicated in the 4k entries at all.  It is solely
> indicated by the 2M entry.
> 
> > +int snp_lookup_rmpentry(u64 pfn, bool *assigned, int *level)
> > +{
> > +	struct rmpentry e;
> > +	int ret;
> > +
> > +	ret = __snp_lookup_rmpentry(pfn, &e, level);
> > +	if (ret)
> > +		return ret;
> > +
> > +	*assigned = !!rmpentry_assigned(&e);
> > +	return 0;
> > +}
> > +EXPORT_SYMBOL_GPL(snp_lookup_rmpentry);
> > diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
> > index b8357d6ecd47..bf0378136289 100644
> > --- a/arch/x86/include/asm/sev-common.h
> > +++ b/arch/x86/include/asm/sev-common.h
> > @@ -171,4 +171,8 @@ struct snp_psc_desc {
> >  #define GHCB_ERR_INVALID_INPUT		5
> >  #define GHCB_ERR_INVALID_EVENT		6
> >  
> > +/* RMP page size */
> > +#define RMP_PG_SIZE_4K			0
> > +#define RMP_TO_X86_PG_LEVEL(level)	(((level) == RMP_PG_SIZE_4K) ? PG_LEVEL_4K : PG_LEVEL_2M)
> > +
> >  #endif
> > diff --git a/arch/x86/include/asm/sev-host.h b/arch/x86/include/asm/sev-host.h
> > new file mode 100644
> > index 000000000000..30d47e20081d
> > --- /dev/null
> > +++ b/arch/x86/include/asm/sev-host.h
> > @@ -0,0 +1,22 @@
> > +/* SPDX-License-Identifier: GPL-2.0-only */
> > +/*
> > + * AMD SVM-SEV Host Support.
> > + *
> > + * Copyright (C) 2023 Advanced Micro Devices, Inc.
> > + *
> > + * Author: Ashish Kalra <ashish.kalra@amd.com>
> > + *
> > + */
> > +
> > +#ifndef __ASM_X86_SEV_HOST_H
> > +#define __ASM_X86_SEV_HOST_H
> > +
> > +#include <asm/sev-common.h>
> > +
> > +#ifdef CONFIG_KVM_AMD_SEV
> > +int snp_lookup_rmpentry(u64 pfn, bool *assigned, int *level);
> > +#else
> > +static inline int snp_lookup_rmpentry(u64 pfn, bool *assigned, int *level) { return 0; }
> > +#endif
> 
> Above, -ENXIO was returned when SEV-SNP was not supported.  Here, 0 is
> returned when it is compiled out.  That inconsistent.
> 
> Is snp_lookup_rmpentry() acceptable when SEV-SNP is in play?  I'd like
> to see consistency between when it is compiled out and when it is
> compiled in but unsupported on the CPU.

I really don't think anything in the kernel should be calling
snp_lookup_rmpentry() in those cases, so I think it makes sense to adopt
the -ENXIO convention here and in any other stubs where that applies.
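
I.e., the stub would become something like:

    #ifdef CONFIG_KVM_AMD_SEV
    int snp_lookup_rmpentry(u64 pfn, bool *assigned, int *level);
    #else
    static inline int snp_lookup_rmpentry(u64 pfn, bool *assigned, int *level)
    {
            return -ENXIO;  /* match the !X86_FEATURE_SEV_SNP case */
    }
    #endif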

Thanks,

Mike

> 
> > +#endif
> > diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
> > index d34c46db7dd1..446fc7a9f7b0 100644
> > --- a/arch/x86/include/asm/sev.h
> > +++ b/arch/x86/include/asm/sev.h
> > @@ -81,9 +81,6 @@ extern bool handle_vc_boot_ghcb(struct pt_regs *regs);
> >  /* Software defined (when rFlags.CF = 1) */
> >  #define PVALIDATE_FAIL_NOUPDATE		255
> >  
> > -/* RMP page size */
> > -#define RMP_PG_SIZE_4K			0
> > -
> >  #define RMPADJUST_VMSA_PAGE_BIT		BIT(16)
> >  
> >  /* SNP Guest message request */
> 

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH RFC v9 09/51] x86/sev: Add RMP entry lookup helpers
  2023-06-30 21:57     ` Michael Roth
@ 2023-06-30 22:29       ` Dave Hansen
  0 siblings, 0 replies; 102+ messages in thread
From: Dave Hansen @ 2023-06-30 22:29 UTC (permalink / raw)
  To: Michael Roth
  Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	liam.merwick, zhi.a.wang, Brijesh Singh

On 6/30/23 14:57, Michael Roth wrote:
> On Mon, Jun 12, 2023 at 09:08:58AM -0700, Dave Hansen wrote:
>> On 6/11/23 21:25, Michael Roth wrote:
>>> +/*
>>> + * The RMP entry format is not architectural. The format is defined in PPR
>>> + * Family 19h Model 01h, Rev B1 processor.
>>> + */
>>> +struct rmpentry {
>>> +	union {
>>> +		struct {
>>> +			u64	assigned	: 1,
>>> +				pagesize	: 1,
>>> +				immutable	: 1,
>>> +				rsvd1		: 9,
>>> +				gpa		: 39,
>>> +				asid		: 10,
>>> +				vmsa		: 1,
>>> +				validated	: 1,
>>> +				rsvd2		: 1;
>>> +		} info;
>>> +		u64 low;
>>> +	};
>>> +	u64 high;
>>> +} __packed;
>>
>> What's 'high' used for?  The PPR says it's reserved.  Why not call it
>> reserved?
>>
>> It _looks_ like it's only used for a debugging pr_info().  It makes the
>> struct look kinda goofy.  I'd much rather limit the goofiness to the
>> "dumping" code, like:
>>
>>      u64 *__e = (void *)e;
>>      ....
>>      pr_info("RMPEntry paddr 0x%llx: [high=0x%016llx low=0x%016llx]\n",
>>                                pfn << PAGE_SHIFT, __e[0], __e[1]);
>>
>> BTW, why does it do any good to dump all these reserved fields?
>>
> 
> The reserved bits sometimes contain information that can be useful to
> pass along to folks on the firmware side, so it would definitely be helpful
> to provide the full raw contents of the RMP entry.

Ahh, OK.  Could you include a comment to that effect, please?

> So maybe something like this better captures the intended usage:
> 
>     struct rmpentry {
>         union {
>             struct {
>                 u64 assigned        : 1,
>                     pagesize        : 1,
>                     immutable       : 1,
>                     rsvd1           : 9,
>                     gpa             : 39,
>                     asid            : 10,
>                     vmsa            : 1,
>                     validated       : 1,
>                     rsvd2           : 1;
>                 u64 rsvd3;
>             } info;
>             u64 data[2];
>         };
>     } __packed;
> 
> But dropping the union and casting to u64[] locally in the debug/dumping
> routine should work fine as well.

Yeah, I'd suggest doing the nasty casting in the debug function.  That
makes it much more clear what the hardware is doing with the entries.
The hardware doesn't treat the struct as 2*u64's at all.

...
>>> +	ret = rmptable_entry(paddr, entry);
>>> +	if (ret)
>>> +		return ret;
>>> +
>>> +	/* Read a large RMP entry to get the correct page level used in RMP entry. */
>>> +	ret = rmptable_entry(paddr & PMD_MASK, &large_entry);
>>> +	if (ret)
>>> +		return ret;
>>> +
>>> +	*level = RMP_TO_X86_PG_LEVEL(rmpentry_pagesize(&large_entry));
>>> +
>>> +	return 0;
>>> +}
>>
>> This is a bit weird.  Should it say something like this?
>>
>> To do a 4k RMP lookup the hardware looks at two places in the RMP:
> 
> I'd word this as:
> 
>   "To query all the relevant bit of an 4k RMP entry, the kernel must access
>    2 entries in the RMP table:"
> 
> Because it's possible hardware only looks at the 2M entry for
> hardware-based lookups, depending on where the access is coming from, or
> how the memory at the PFN range is mapped.
> 
> But otherwise it seems like an accurate description.

The wording you suggest is a bit imprecise.  For a 2M-aligned 4k page,
there is only *one* location, *one* entry.

Also, we're not doing a lookup for an RMP entry.  We're doing it for a
_pfn_ that results in an RMP entry.

How about this:

/*
 * Find the authoritative RMP entry for a PFN.  This can be either a 4k
 * RMP entry or a special large RMP entry that is authoritative for a
 * whole 2M area.
 */
...
>>> +#ifdef CONFIG_KVM_AMD_SEV
>>> +int snp_lookup_rmpentry(u64 pfn, bool *assigned, int *level);
>>> +#else
>>> +static inline int snp_lookup_rmpentry(u64 pfn, bool *assigned, int *level) { return 0; }
>>> +#endif
>>
>> Above, -ENXIO was returned when SEV-SNP was not supported.  Here, 0 is
>> returned when it is compiled out.  That inconsistent.
>>
>> Is snp_lookup_rmpentry() acceptable when SEV-SNP is in play?  I'd like
>> to see consistency between when it is compiled out and when it is
>> compiled in but unsupported on the CPU.
> 
> I really don't think anything in the kernel should be calling
> snp_lookup_rmpentry() in those cases, so I think it makes sense to adopt
> the -ENXIO convention here and in any other stubs where that applies.

Sounds good to me.  Just please make them consistent.

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH RFC v9 05/51] x86/coco: move CONFIG_HAS_CC_PLATFORM check down into coco/Makefile
  2023-06-12  4:25 ` [PATCH RFC v9 05/51] x86/coco: move CONFIG_HAS_CC_PLATFORM check down into coco/Makefile Michael Roth
  2023-06-12  7:07   ` Kirill A . Shutemov
  2023-06-20 12:09   ` Borislav Petkov
@ 2023-07-10  3:05   ` Sathyanarayanan Kuppuswamy
  2023-07-10 13:11     ` Tom Lendacky
  2 siblings, 1 reply; 102+ messages in thread
From: Sathyanarayanan Kuppuswamy @ 2023-07-10  3:05 UTC (permalink / raw)
  To: Michael Roth, kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, alpergun, dgilbert, jarkko,
	ashish.kalra, nikunj.dadhania, liam.merwick, zhi.a.wang,
	Kirill A . Shutemov

Hi,

On 6/11/23 9:25 PM, Michael Roth wrote:
> Currently CONFIG_HAS_CC_PLATFORM is a prereq for building anything in
> arch/x86/coco, but that is generally only applicable for guest support.
> 
> For SEV-SNP, helpers related purely to host support will also live in
> arch/x86/coco. To allow for CoCo-related host support code in
> arch/x86/coco, move that check down into the Makefile and check for it
> specifically when needed.


I think CONFIG_HAS_CC_PLATFORM is not meant to be guest-specific (otherwise,
we could have named it CONFIG_HAS_CC_GUEST). Will it create any issues if
we enable it on the host?

> 
> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Suggested-by: Tom Lendacky <thomas.lendacky@amd.com>
> Signed-off-by: Michael Roth <michael.roth@amd.com>
> ---
>  arch/x86/Kbuild        | 2 +-
>  arch/x86/coco/Makefile | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/Kbuild b/arch/x86/Kbuild
> index 5a83da703e87..1889cef48b58 100644
> --- a/arch/x86/Kbuild
> +++ b/arch/x86/Kbuild
> @@ -1,5 +1,5 @@
>  # SPDX-License-Identifier: GPL-2.0
> -obj-$(CONFIG_ARCH_HAS_CC_PLATFORM) += coco/
> +obj-y += coco/
>  
>  obj-y += entry/
>  
> diff --git a/arch/x86/coco/Makefile b/arch/x86/coco/Makefile
> index c816acf78b6a..6aa52e719bf5 100644
> --- a/arch/x86/coco/Makefile
> +++ b/arch/x86/coco/Makefile
> @@ -3,6 +3,6 @@ CFLAGS_REMOVE_core.o	= -pg
>  KASAN_SANITIZE_core.o	:= n
>  CFLAGS_core.o		+= -fno-stack-protector
>  
> -obj-y += core.o
> +obj-$(CONFIG_ARCH_HAS_CC_PLATFORM) += core.o
>  
>  obj-$(CONFIG_INTEL_TDX_GUEST)	+= tdx/

-- 
Sathyanarayanan Kuppuswamy
Linux Kernel Developer

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH RFC v9 05/51] x86/coco: move CONFIG_HAS_CC_PLATFORM check down into coco/Makefile
  2023-07-10  3:05   ` Sathyanarayanan Kuppuswamy
@ 2023-07-10 13:11     ` Tom Lendacky
  0 siblings, 0 replies; 102+ messages in thread
From: Tom Lendacky @ 2023-07-10 13:11 UTC (permalink / raw)
  To: Sathyanarayanan Kuppuswamy, Michael Roth, kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, hpa, ardb, pbonzini, seanjc, vkuznets, jmattson,
	luto, dave.hansen, slp, pgonda, peterz, srinivas.pandruvada,
	rientjes, dovmurik, tobin, bp, vbabka, kirill, ak, tony.luck,
	marcorr, alpergun, dgilbert, jarkko, ashish.kalra,
	nikunj.dadhania, liam.merwick, zhi.a.wang, Kirill A . Shutemov

On 7/9/23 22:05, Sathyanarayanan Kuppuswamy wrote:
> Hi,
> 
> On 6/11/23 9:25 PM, Michael Roth wrote:
>> Currently CONFIG_HAS_CC_PLATFORM is a prereq for building anything in
>> arch/x86/coco, but that is generally only applicable for guest support.
>>
>> For SEV-SNP, helpers related purely to host support will also live in
>> arch/x86/coco. To allow for CoCo-related host support code in
>> arch/x86/coco, move that check down into the Makefile and check for it
>> specifically when needed.
> 
> 
> I think CONFIG_HAS_CC_PLATFORM is not meant to be guest-specific (otherwise,

Correct, it is used on bare metal for SME support, so that needs to
continue to work.

Thanks,
Tom

> we could have named it CONFIG_HAS_CC_GUEST). Will it create any issues if
> we enable it on the host?
> 
>>
>> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
>> Suggested-by: Tom Lendacky <thomas.lendacky@amd.com>
>> Signed-off-by: Michael Roth <michael.roth@amd.com>
>> ---
>>   arch/x86/Kbuild        | 2 +-
>>   arch/x86/coco/Makefile | 2 +-
>>   2 files changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/x86/Kbuild b/arch/x86/Kbuild
>> index 5a83da703e87..1889cef48b58 100644
>> --- a/arch/x86/Kbuild
>> +++ b/arch/x86/Kbuild
>> @@ -1,5 +1,5 @@
>>   # SPDX-License-Identifier: GPL-2.0
>> -obj-$(CONFIG_ARCH_HAS_CC_PLATFORM) += coco/
>> +obj-y += coco/
>>   
>>   obj-y += entry/
>>   
>> diff --git a/arch/x86/coco/Makefile b/arch/x86/coco/Makefile
>> index c816acf78b6a..6aa52e719bf5 100644
>> --- a/arch/x86/coco/Makefile
>> +++ b/arch/x86/coco/Makefile
>> @@ -3,6 +3,6 @@ CFLAGS_REMOVE_core.o	= -pg
>>   KASAN_SANITIZE_core.o	:= n
>>   CFLAGS_core.o		+= -fno-stack-protector
>>   
>> -obj-y += core.o
>> +obj-$(CONFIG_ARCH_HAS_CC_PLATFORM) += core.o
>>   
>>   obj-$(CONFIG_INTEL_TDX_GUEST)	+= tdx/
> 

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH RFC v9 08/51] x86/speculation: Do not enable Automatic IBRS if SEV SNP is enabled
  2023-06-12 15:39   ` Dave Hansen
@ 2023-07-18 22:34     ` Kim Phillips
  2023-07-18 23:17       ` Dave Hansen
  0 siblings, 1 reply; 102+ messages in thread
From: Kim Phillips @ 2023-07-18 22:34 UTC (permalink / raw)
  To: Dave Hansen, Michael Roth, kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	liam.merwick, zhi.a.wang

On 6/12/23 10:39 AM, Dave Hansen wrote:
> On 6/11/23 21:25, Michael Roth wrote:
>> A hardware limitation prevents the host from enabling Automatic IBRS
>> when SNP is enabled.  Instead, fall back to retpolines.
> 
> "Hardware limitation"?  As in, it is a documented, architectural
> restriction?  Or, it's a CPU bug?

It's a documented, architectural restriction:  When Secure Nested Paging
(SEV-SNP) is enabled, processes running as host are protected, but
those running with a non-zero ASID are not.

>> diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
>> index f9d060e71c3e..3fba3623ff64 100644
>> --- a/arch/x86/kernel/cpu/bugs.c
>> +++ b/arch/x86/kernel/cpu/bugs.c
>> @@ -1507,7 +1507,12 @@ static void __init spectre_v2_select_mitigation(void)
>>   
>>   	if (spectre_v2_in_ibrs_mode(mode)) {
>>   		if (boot_cpu_has(X86_FEATURE_AUTOIBRS)) {
>> -			msr_set_bit(MSR_EFER, _EFER_AUTOIBRS);
>> +			if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP)) {
>> +				msr_set_bit(MSR_EFER, _EFER_AUTOIBRS);
>> +			} else {
>> +				pr_err("SNP feature available, not enabling AutoIBRS on the host.\n");
>> +				mode = spectre_v2_select_retpoline();
>> +			}
> 
> I think this would be nicer if you did something like:
> 
> 	if (cpu_feature_enabled(X86_FEATURE_SEV_SNP))
> 		setup_clear_cpu_cap(X86_FEATURE_AUTOIBRS);
> 
> somewhere _else_ in the code instead of smack-dab in the middle of the
> mitigation selection.

Good idea.  How about this?:

From 6cf32e8d8426190b1bf1b1e04ceb35bf0bac784b Mon Sep 17 00:00:00 2001
From: Kim Phillips <kim.phillips@amd.com>
Date: Mon, 17 Jul 2023 14:08:15 -0500
Subject: [PATCH] x86/speculation: Do not enable Automatic IBRS if SEV SNP is
  enabled

Automatic IBRS provides protection to [1]:

  - Processes running at CPL=0
  - Processes running as host when Secure Nested Paging (SEV-SNP) is enabled

i.e.,

	(CPL < 3) || ((ASID == 0) && SNP)

Because of this limitation, do not enable Automatic IBRS when SNP is enabled.
Instead, fall back to retpolines.

Note that the AutoIBRS feature may continue to be used within the guest.

[1] "AMD64 Architecture Programmer's Manual Volume 2: System Programming",
     Pub. 24593, rev. 3.41, June 2023, Part 1, Section 3.1.7 "Extended Feature
     Enable Register (EFER)", Automatic IBRS Enable (AIBRSE) Bit.

Link: https://bugzilla.kernel.org/attachment.cgi?id=304652
Signed-off-by: Kim Phillips <kim.phillips@amd.com>
---
  arch/x86/kernel/cpu/common.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 8cd4126d8253..311c0a6422b5 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1348,7 +1348,8 @@ static void __init cpu_set_bug_bits(struct cpuinfo_x86 *c)
  	 * AMD's AutoIBRS is equivalent to Intel's eIBRS - use the Intel feature
  	 * flag and protect from vendor-specific bugs via the whitelist.
  	 */
-	if ((ia32_cap & ARCH_CAP_IBRS_ALL) || cpu_has(c, X86_FEATURE_AUTOIBRS)) {
+	if ((ia32_cap & ARCH_CAP_IBRS_ALL) || (cpu_has(c, X86_FEATURE_AUTOIBRS) &&
+	    !cpu_feature_enabled(X86_FEATURE_SEV_SNP))) {
  		setup_force_cpu_cap(X86_FEATURE_IBRS_ENHANCED);
  		if (!cpu_matches(cpu_vuln_whitelist, NO_EIBRS_PBRSB) &&
  		    !(ia32_cap & ARCH_CAP_PBRSB_NO))
-- 
2.34.1

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* Re: [PATCH RFC v9 08/51] x86/speculation: Do not enable Automatic IBRS if SEV SNP is enabled
  2023-07-18 22:34     ` Kim Phillips
@ 2023-07-18 23:17       ` Dave Hansen
  2023-07-20 19:11         ` Kim Phillips
  0 siblings, 1 reply; 102+ messages in thread
From: Dave Hansen @ 2023-07-18 23:17 UTC (permalink / raw)
  To: Kim Phillips, Michael Roth, kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	liam.merwick, zhi.a.wang

On 7/18/23 15:34, Kim Phillips wrote:
...
> Automatic IBRS provides protection to [1]:
> 
>  - Processes running at CPL=0
>  - Processes running as host when Secure Nested Paging (SEV-SNP) is enabled
> 
> i.e.,
> 
>     (CPL < 3) || ((ASID == 0) && SNP)
> 
> Because of this limitation, do not enable Automatic IBRS when SNP is
> enabled.

Gah, I found that hard to parse.  I think it's because you're talking
about an SEV-SNP host in one part and "SNP" in the other but _meaning_
SNP host and SNP guest.

Could I maybe suggest that you folks follow the TDX convention and
actually add _GUEST and _HOST to the feature names to be explicit about
which side is which?

> Instead, fall back to retpolines.

Now I'm totally lost.

This is talking about falling back to retpolines ... in the kernel.  But
"Automatic IBRS provides protection to ... CPL < 3", aka. the kernel.

> Note that the AutoIBRS feature may continue to be used within the
> guest.

What is this trying to say?

"AutoIBRS can still be used in a guest since it protects CPL < 3"

or

"The AutoIBRS bits can still be twiddled within the guest even though it
doesn't do any good"

?

> diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
> index 8cd4126d8253..311c0a6422b5 100644
> --- a/arch/x86/kernel/cpu/common.c
> +++ b/arch/x86/kernel/cpu/common.c
> @@ -1348,7 +1348,8 @@ static void __init cpu_set_bug_bits(struct
> cpuinfo_x86 *c)
>       * AMD's AutoIBRS is equivalent to Intel's eIBRS - use the Intel feature
>       * flag and protect from vendor-specific bugs via the whitelist.
>       */
> -    if ((ia32_cap & ARCH_CAP_IBRS_ALL) || cpu_has(c, X86_FEATURE_AUTOIBRS)) {
> +    if ((ia32_cap & ARCH_CAP_IBRS_ALL) || (cpu_has(c, X86_FEATURE_AUTOIBRS) &&
> +        !cpu_feature_enabled(X86_FEATURE_SEV_SNP))) {
>          setup_force_cpu_cap(X86_FEATURE_IBRS_ENHANCED);
>          if (!cpu_matches(cpu_vuln_whitelist, NO_EIBRS_PBRSB) &&
>              !(ia32_cap & ARCH_CAP_PBRSB_NO))


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH RFC v9 08/51] x86/speculation: Do not enable Automatic IBRS if SEV SNP is enabled
  2023-07-18 23:17       ` Dave Hansen
@ 2023-07-20 19:11         ` Kim Phillips
  2023-07-20 22:24           ` Dave Hansen
  0 siblings, 1 reply; 102+ messages in thread
From: Kim Phillips @ 2023-07-20 19:11 UTC (permalink / raw)
  To: Dave Hansen, Michael Roth, kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	liam.merwick, zhi.a.wang

On 7/18/23 6:17 PM, Dave Hansen wrote:
> On 7/18/23 15:34, Kim Phillips wrote:
> ...
>> Automatic IBRS provides protection to [1]:
>>
>>   - Processes running at CPL=0
>>   - Processes running as host when Secure Nested Paging (SEV-SNP) is enabled
>>
>> i.e.,
>>
>>      (CPL < 3) || ((ASID == 0) && SNP)
>>
>> Because of this limitation, do not enable Automatic IBRS when SNP is
>> enabled.
> 
> Gah, I found that hard to parse.  I think it's because you're talking
> about an SEV-SNP host in one part and "SNP" in the other but _meaning_
> SNP host and SNP guest.
> 
> Could I maybe suggest that you folks follow the TDX convention and
> actually add _GUEST and _HOST to the feature name to be explicit about
> which side is which?
> 
>> Instead, fall back to retpolines.
> 
> Now I'm totally lost.
> 
> This is talking about falling back to retpolines ... in the kernel.  But
> "Automatic IBRS provides protection to ... CPL < 3", aka. the kernel.
> 
>> Note that the AutoIBRS feature may continue to be used within the
>> guest.
> 
> What is this trying to say?
> 
> "AutoIBRS can still be used in a guest since it protects CPL < 3"
> 
> or
> 
> "The AutoIBRS bits can still be twiddled within the guest even though it
> doesn't do any good"
> 
> ?

Hopefully the commit text in this version will help answer all your
questions?:

From 96dbd72d018287bc5b72f6083884e2125c9d09bc Mon Sep 17 00:00:00 2001
From: Kim Phillips <kim.phillips@amd.com>
Date: Mon, 17 Jul 2023 14:08:15 -0500
Subject: [PATCH] x86/speculation: Do not enable Automatic IBRS if SEV SNP is
  enabled

Automatic IBRS provides protection to [1]:

  - Processes running at CPL=0
  - Processes running as host when Secure Nested Paging (SEV-SNP) is enabled

I.e., from the host side (ASID=0, based on host EFER.AutoIBRS):

If SYSCFG[SNPEn]=0 then:
      IBRS is enabled for supervisor mode (CPL < 3) only

If SYSCFG[SNPEn]=1 then:
      IBRS is enabled at all CPLs

From the guest side (ASID!=0, based on guest EFER.AutoIBRS):
      IBRS is enabled for supervisor mode (CPL < 3)

Therefore, don't enable Automatic IBRS in host mode if SNP is enabled,
because it will penalize user-mode indirect branch performance.  Have
the kernel fall back to retpolines instead.

Note that the AutoIBRS feature may continue to be used within guests,
where ASID != 0.

[1] "AMD64 Architecture Programmer's Manual Volume 2: System Programming",
     Pub. 24593, rev. 3.41, June 2023, Part 1, Section 3.1.7 "Extended
     Feature Enable Register (EFER)" - accessible via Link.

Link: https://bugzilla.kernel.org/attachment.cgi?id=304652
Signed-off-by: Kim Phillips <kim.phillips@amd.com>
---
  arch/x86/kernel/cpu/common.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 8cd4126d8253..311c0a6422b5 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1348,7 +1348,8 @@ static void __init cpu_set_bug_bits(struct cpuinfo_x86 *c)
  	 * AMD's AutoIBRS is equivalent to Intel's eIBRS - use the Intel feature
  	 * flag and protect from vendor-specific bugs via the whitelist.
  	 */
-	if ((ia32_cap & ARCH_CAP_IBRS_ALL) || cpu_has(c, X86_FEATURE_AUTOIBRS)) {
+	if ((ia32_cap & ARCH_CAP_IBRS_ALL) || (cpu_has(c, X86_FEATURE_AUTOIBRS) &&
+	    !cpu_feature_enabled(X86_FEATURE_SEV_SNP))) {
  		setup_force_cpu_cap(X86_FEATURE_IBRS_ENHANCED);
  		if (!cpu_matches(cpu_vuln_whitelist, NO_EIBRS_PBRSB) &&
  		    !(ia32_cap & ARCH_CAP_PBRSB_NO))
-- 
2.34.1

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* Re: [PATCH RFC v9 08/51] x86/speculation: Do not enable Automatic IBRS if SEV SNP is enabled
  2023-07-20 19:11         ` Kim Phillips
@ 2023-07-20 22:24           ` Dave Hansen
  2023-07-21 16:56             ` Kim Phillips
  0 siblings, 1 reply; 102+ messages in thread
From: Dave Hansen @ 2023-07-20 22:24 UTC (permalink / raw)
  To: Kim Phillips, Michael Roth, kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	liam.merwick, zhi.a.wang

On 7/20/23 12:11, Kim Phillips wrote:
> Hopefully the commit text in this version will help answer all your
> questions?:

To be honest, it didn't really.  I kinda feel like I was having the APM
contents tossed casually in my direction rather than being provided a
fully considered explanation.

Here's what I came up with instead:

Host-side Automatic IBRS has different behavior based on whether SEV-SNP
is enabled.

Without SEV-SNP, Automatic IBRS protects only the kernel.  But when
SEV-SNP is enabled, the Automatic IBRS protection umbrella widens to all
host-side code, including userspace.  This protection comes at a cost:
reduced userspace indirect branch performance.

To avoid this performance loss, nix using Automatic IBRS on SEV-SNP
hosts.  Fall back to retpolines instead.

=====

Is that about right?

I don't think any chit-chat about the guest side is even relevant.

This also absolutely needs a comment.  Perhaps just pull the code up to
the top level of the function and do this:

	/*
	 * Automatic IBRS imposes unacceptable overhead on host
	 * userspace for SEV-SNP systems.  Zap it instead.
	 */
	if (cpu_feature_enabled(X86_FEATURE_SEV_SNP))
		setup_clear_cpu_cap(X86_FEATURE_AUTOIBRS);

BTW, I assume you've grumbled to folks about this.  It's an awful shame
the hardware (or ucode) was built this way.  It's just throwing
Automatic IBRS out the window because it's not architected in a nice way.

Is there any plan to improve this?

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH RFC v9 08/51] x86/speculation: Do not enable Automatic IBRS if SEV SNP is enabled
  2023-07-20 22:24           ` Dave Hansen
@ 2023-07-21 16:56             ` Kim Phillips
  0 siblings, 0 replies; 102+ messages in thread
From: Kim Phillips @ 2023-07-21 16:56 UTC (permalink / raw)
  To: Dave Hansen, Michael Roth, kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	liam.merwick, zhi.a.wang

On 7/20/23 5:24 PM, Dave Hansen wrote:
> On 7/20/23 12:11, Kim Phillips wrote:
>> Hopefully the commit text in this version will help answer all your
>> questions?:
> 
> To be honest, it didn't really.  I kinda feel like I was having the APM
> contents tossed casually in my direction rather than being provided a
> fully considered explanation.

Sorry to hear that, it wasn't the intention.

> Here's what I came up with instead:
> 
> Host-side Automatic IBRS has different behavior based on whether SEV-SNP
> is enabled.
> 
> Without SEV-SNP, Automatic IBRS protects only the kernel.  But when
> SEV-SNP is enabled, the Automatic IBRS protection umbrella widens to all
> host-side code, including userspace.  This protection comes at a cost:
> reduced userspace indirect branch performance.
> 
> To avoid this performance loss, nix using Automatic IBRS on SEV-SNP
> hosts.  Fall back to retpolines instead.
> 
> =====
> 
> Is that about right?

Sure, see new version below.

> I don't think any chit-chat about the guest side is even relevant.
> 
> This also absolutely needs a comment.  Perhaps just pull the code up to
> the top level of the function and do this:
> 
> 	/*
> 	 * Automatic IBRS imposes unacceptable overhead on host
> 	 * userspace for SEV-SNP systems.  Zap it instead.
> 	 */
> 	if (cpu_feature_enabled(X86_FEATURE_SEV_SNP))
> 		setup_clear_cpu_cap(X86_FEATURE_AUTOIBRS);

Clearing X86_FEATURE_AUTOIBRS won't work because it'll unnecessarily
prohibit KVM from subsequently advertising the feature to guests.
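
Roughly speaking, that's because KVM derives its guest-visible
capabilities from the host's boot CPU caps.  A sketch of the relevant
step, modeled on the kvm_set_cpu_caps() flow in arch/x86/kvm/cpuid.c
(neighboring flags elided):

	/*
	 * kvm_cpu_cap_mask() ANDs the advertised bits against
	 * boot_cpu_data, so a bit removed via setup_clear_cpu_cap()
	 * on the host can never be advertised to guests:
	 */
	kvm_cpu_cap_mask(CPUID_8000_0021_EAX,
			 F(NO_NESTED_DATA_BP) | F(LFENCE_RDTSC) |
			 F(NULL_SEL_CLR_BASE) | F(AUTOIBRS));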

I've tried to address the comment comment, see below.

> BTW, I assume you've grumbled to folks about this.  It's an awful shame
> the hardware (or ucode) was built this was.  It's just throwing
> Automatic IBRS out the window because it's not architected in a nice way.
> 
> Is there any plan to improve this?

Sure.  Until then:

From fb55df544ed066a3c8fdb1581932a23c03ce6d2c Mon Sep 17 00:00:00 2001
From: Kim Phillips <kim.phillips@amd.com>
Date: Mon, 17 Jul 2023 14:08:15 -0500
Subject: [PATCH] x86/speculation: Don't enable Automatic IBRS if SEV SNP is
  enabled

Host-side Automatic IBRS has a different behaviour depending on whether
SEV-SNP is enabled.

Without SEV-SNP, Automatic IBRS protects only the kernel.  But when
SEV-SNP is enabled, the Automatic IBRS protection umbrella widens to all
host-side code, including userspace.  This protection comes at a cost:
reduced userspace indirect branch performance.

To avoid this performance loss, don't use Automatic IBRS on SEV-SNP
hosts.  Fall back to retpolines instead.

Signed-off-by: Kim Phillips <kim.phillips@amd.com>
---
  arch/x86/kernel/cpu/common.c | 5 ++++-
  1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 8cd4126d8253..a93e6204d847 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1347,8 +1347,11 @@ static void __init cpu_set_bug_bits(struct cpuinfo_x86 *c)
  	/*
  	 * AMD's AutoIBRS is equivalent to Intel's eIBRS - use the Intel feature
  	 * flag and protect from vendor-specific bugs via the whitelist.
+	 * Don't use AutoIBRS when SNP is enabled because it degrades host
+	 * userspace indirect branch performance.
  	 */
-	if ((ia32_cap & ARCH_CAP_IBRS_ALL) || cpu_has(c, X86_FEATURE_AUTOIBRS)) {
+	if ((ia32_cap & ARCH_CAP_IBRS_ALL) || (cpu_has(c, X86_FEATURE_AUTOIBRS) &&
+	    !cpu_feature_enabled(X86_FEATURE_SEV_SNP))) {
  		setup_force_cpu_cap(X86_FEATURE_IBRS_ENHANCED);
  		if (!cpu_matches(cpu_vuln_whitelist, NO_EIBRS_PBRSB) &&
  		    !(ia32_cap & ARCH_CAP_PBRSB_NO))
-- 
2.34.1

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* Re: [PATCH RFC v9 19/51] x86/sev: Introduce snp leaked pages list
  2023-06-12  4:25 ` [PATCH RFC v9 19/51] x86/sev: Introduce snp leaked pages list Michael Roth
@ 2023-08-09 12:46   ` Jeremi Piotrowski
  0 siblings, 0 replies; 102+ messages in thread
From: Jeremi Piotrowski @ 2023-08-09 12:46 UTC (permalink / raw)
  To: Michael Roth
  Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	liam.merwick, zhi.a.wang

On Sun, Jun 11, 2023 at 11:25:27PM -0500, Michael Roth wrote:
> From: Ashish Kalra <ashish.kalra@amd.com>
> 
> Pages are unsafe to release back to the page allocator if they have
> been transitioned to firmware/guest state and can't be reclaimed or
> transitioned back to hypervisor/shared state. In this case, add them
> to an internal leaked-pages list to ensure that they are not freed or
> touched/accessed, which would cause fatal page faults.
> 
> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> [mdr: relocate to arch/x86/coco/sev/host.c]
> Signed-off-by: Michael Roth <michael.roth@amd.com>
> ---
>  arch/x86/coco/sev/host.c        | 28 ++++++++++++++++++++++++++++
>  arch/x86/include/asm/sev-host.h |  3 +++
>  2 files changed, 31 insertions(+)
> 
> diff --git a/arch/x86/coco/sev/host.c b/arch/x86/coco/sev/host.c
> index cd3b4c6a25bc..373e91f5a337 100644
> --- a/arch/x86/coco/sev/host.c
> +++ b/arch/x86/coco/sev/host.c
> @@ -64,6 +64,12 @@ struct rmpentry {
>  static unsigned long rmptable_start __ro_after_init;
>  static unsigned long rmptable_end __ro_after_init;
>  
> +/* list of pages which are leaked and cannot be reclaimed */
> +static LIST_HEAD(snp_leaked_pages_list);
> +static DEFINE_SPINLOCK(snp_leaked_pages_list_lock);
> +
> +static atomic_long_t snp_nr_leaked_pages = ATOMIC_LONG_INIT(0);
> +
>  #undef pr_fmt
>  #define pr_fmt(fmt)	"SEV-SNP: " fmt
>  
> @@ -494,3 +500,25 @@ int rmp_make_shared(u64 pfn, enum pg_level level)
>  	return rmpupdate(pfn, &val);
>  }
>  EXPORT_SYMBOL_GPL(rmp_make_shared);
> +
> +void snp_leak_pages(unsigned long pfn, unsigned int npages)
> +{
> +	struct page *page = pfn_to_page(pfn);
> +
> +	WARN(1, "psc failed, pfn 0x%lx pages %d (marked offline)\n", pfn, npages);
> +
> +	spin_lock(&snp_leaked_pages_list_lock);
> +	while (npages--) {
> +		/*
> +		 * Reuse the page's buddy list for chaining into the leaked
> +		 * pages list. This page should not be on a free list currently
> +		 * and is also unsafe to be added to a free list.
> +		 */
> +		list_add_tail(&page->buddy_list, &snp_leaked_pages_list);
> +		sev_dump_rmpentry(pfn);
> +		/* Advance both pfn and page, and count every leaked page: */
> +		pfn++;
> +		page++;
> +		atomic_long_inc(&snp_nr_leaked_pages);
> +	}
> +	spin_unlock(&snp_leaked_pages_list_lock);
> +}
> +EXPORT_SYMBOL_GPL(snp_leak_pages);
> diff --git a/arch/x86/include/asm/sev-host.h b/arch/x86/include/asm/sev-host.h
> index 753e80d16433..bab3b226777a 100644
> --- a/arch/x86/include/asm/sev-host.h
> +++ b/arch/x86/include/asm/sev-host.h
> @@ -19,6 +19,8 @@ void sev_dump_rmpentry(u64 pfn);
>  int psmash(u64 pfn);
>  int rmp_make_private(u64 pfn, u64 gpa, enum pg_level level, int asid, bool immutable);
>  int rmp_make_shared(u64 pfn, enum pg_level level);
> +void snp_leak_pages(unsigned long pfn, unsigned int npages);
> +
>  #else
>  static inline int snp_lookup_rmpentry(u64 pfn, bool *assigned, int *level) { return 0; }
>  static inline void sev_dump_rmpentry(u64 pfn) {}
> @@ -29,6 +31,7 @@ static inline int rmp_make_private(u64 pfn, u64 gpa, enum pg_level level, int as
>  	return -ENODEV;
>  }
>  static inline int rmp_make_shared(u64 pfn, enum pg_level level) { return -ENODEV; }
> +void snp_leak_pages(unsigned long pfn, unsigned int npages) {}

This needs to be 'static inline' or the build fails with multiple definition errors.
I'm building a guest kernel with CONFIG_KVM_AMD_SEV disabled.
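
A minimal sketch of the fixed stub:

	/* 'static inline' keeps the no-op stub local to each includer
	 * instead of emitting a global definition in every object file: */
	static inline void snp_leak_pages(unsigned long pfn, unsigned int npages) {}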

Jeremi

>  #endif
>  
>  #endif
> -- 
> 2.25.1
> 

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH RFC v9 07/51] x86/sev: Add the host SEV-SNP initialization support
  2023-06-12  4:25 ` [PATCH RFC v9 07/51] x86/sev: Add the host SEV-SNP initialization support Michael Roth
  2023-06-12 15:34   ` Dave Hansen
  2023-06-21  9:42   ` Borislav Petkov
@ 2023-08-09 13:03   ` Jeremi Piotrowski
  2 siblings, 0 replies; 102+ messages in thread
From: Jeremi Piotrowski @ 2023-08-09 13:03 UTC (permalink / raw)
  To: Michael Roth
  Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	liam.merwick, zhi.a.wang, Brijesh Singh

On Sun, Jun 11, 2023 at 11:25:15PM -0500, Michael Roth wrote:
> From: Brijesh Singh <brijesh.singh@amd.com>
> 
> The memory integrity guarantees of SEV-SNP are enforced through a new
> structure called the Reverse Map Table (RMP). The RMP is a single data
> structure shared across the system that contains one entry for every 4K
> page of DRAM that may be used by SEV-SNP VMs. APM2 section 15.36 details
> a number of steps needed to detect/enable SEV-SNP and RMP table support
> on the host:
> 
>  - Detect SEV-SNP support based on CPUID bit
>  - Initialize the RMP table memory reported by the RMP base/end MSR
>    registers and configure IOMMU to be compatible with RMP access
>    restrictions
>  - Set the MtrrFixDramModEn bit in SYSCFG MSR
>  - Set the SecureNestedPagingEn and VMPLEn bits in the SYSCFG MSR
>  - Configure IOMMU
> 
> RMP table entry format is non-architectural and it can vary by
> processor. It is defined by the PPR. Restrict SNP support to CPU
> models/families which are compatible with the current RMP table entry
> format to guard against any undefined behavior when running on other
> system types. Future models/support will handle this through an
> architectural mechanism to allow for broader compatibility.
> 
> SNP host code depends on CONFIG_KVM_AMD_SEV config flag, which may be
> enabled even when CONFIG_AMD_MEM_ENCRYPT isn't set, so update the
> SNP-specific IOMMU helpers used here to rely on CONFIG_KVM_AMD_SEV
> instead of CONFIG_AMD_MEM_ENCRYPT.
> 
> Co-developed-by: Ashish Kalra <ashish.kalra@amd.com>
> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> Co-developed-by: Tom Lendacky <thomas.lendacky@amd.com>
> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> [mdr: rework commit message to be clearer about what patch does, squash
>       in early_rmptable_check() handling from Tom]
> Signed-off-by: Michael Roth <michael.roth@amd.com>
> ---
>  arch/x86/coco/Makefile                   |   1 +
>  arch/x86/coco/sev/Makefile               |   3 +
>  arch/x86/coco/sev/host.c                 | 212 +++++++++++++++++++++++
>  arch/x86/include/asm/disabled-features.h |   8 +-
>  arch/x86/include/asm/msr-index.h         |  11 +-
>  arch/x86/include/asm/sev.h               |   2 +
>  arch/x86/kernel/cpu/amd.c                |  19 ++
>  drivers/iommu/amd/init.c                 |   2 +-
>  include/linux/amd-iommu.h                |   2 +-
>  9 files changed, 256 insertions(+), 4 deletions(-)
>  create mode 100644 arch/x86/coco/sev/Makefile
>  create mode 100644 arch/x86/coco/sev/host.c
> 
> diff --git a/arch/x86/coco/Makefile b/arch/x86/coco/Makefile
> index 6aa52e719bf5..6a7d876130e2 100644
> --- a/arch/x86/coco/Makefile
> +++ b/arch/x86/coco/Makefile
> @@ -6,3 +6,4 @@ CFLAGS_core.o		+= -fno-stack-protector
>  obj-$(CONFIG_ARCH_HAS_CC_PLATFORM) += core.o
>  
>  obj-$(CONFIG_INTEL_TDX_GUEST)	+= tdx/
> +obj-$(CONFIG_KVM_AMD_SEV)	+= sev/
> diff --git a/arch/x86/coco/sev/Makefile b/arch/x86/coco/sev/Makefile
> new file mode 100644
> index 000000000000..27c0500d75c8
> --- /dev/null
> +++ b/arch/x86/coco/sev/Makefile
> @@ -0,0 +1,3 @@
> +# SPDX-License-Identifier: GPL-2.0
> +
> +obj-y += host.o
> diff --git a/arch/x86/coco/sev/host.c b/arch/x86/coco/sev/host.c
> new file mode 100644
> index 000000000000..6907ce887b23
> --- /dev/null
> +++ b/arch/x86/coco/sev/host.c
> @@ -0,0 +1,212 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * AMD SVM-SEV Host Support.
> + *
> + * Copyright (C) 2023 Advanced Micro Devices, Inc.
> + *
> + * Author: Ashish Kalra <ashish.kalra@amd.com>
> + *
> + */
> +
> +#include <linux/cc_platform.h>
> +#include <linux/printk.h>
> +#include <linux/mm_types.h>
> +#include <linux/set_memory.h>
> +#include <linux/memblock.h>
> +#include <linux/kernel.h>
> +#include <linux/mm.h>
> +#include <linux/cpumask.h>
> +#include <linux/iommu.h>
> +#include <linux/amd-iommu.h>
> +
> +#include <asm/sev.h>
> +#include <asm/processor.h>
> +#include <asm/setup.h>
> +#include <asm/svm.h>
> +#include <asm/smp.h>
> +#include <asm/cpu.h>
> +#include <asm/apic.h>
> +#include <asm/cpuid.h>
> +#include <asm/cmdline.h>
> +#include <asm/iommu.h>
> +
> +/*
> + * The first 16KB from RMP_BASE is used by the processor for
> + * bookkeeping; this offset needs to be added during RMP entry lookups.
> + */
> +#define RMPTABLE_CPU_BOOKKEEPING_SZ	0x4000
> +
> +static unsigned long rmptable_start __ro_after_init;
> +static unsigned long rmptable_end __ro_after_init;
> +
> +#undef pr_fmt
> +#define pr_fmt(fmt)	"SEV-SNP: " fmt
> +
> +static int __mfd_enable(unsigned int cpu)
> +{
> +	u64 val;
> +
> +	if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
> +		return 0;
> +
> +	rdmsrl(MSR_AMD64_SYSCFG, val);
> +
> +	val |= MSR_AMD64_SYSCFG_MFDM;
> +
> +	wrmsrl(MSR_AMD64_SYSCFG, val);
> +
> +	return 0;
> +}
> +
> +static __init void mfd_enable(void *arg)
> +{
> +	__mfd_enable(smp_processor_id());
> +}
> +
> +static int __snp_enable(unsigned int cpu)
> +{
> +	u64 val;
> +
> +	if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
> +		return 0;
> +
> +	rdmsrl(MSR_AMD64_SYSCFG, val);
> +
> +	val |= MSR_AMD64_SYSCFG_SNP_EN;
> +	val |= MSR_AMD64_SYSCFG_SNP_VMPL_EN;
> +
> +	wrmsrl(MSR_AMD64_SYSCFG, val);
> +
> +	return 0;
> +}
> +
> +static __init void snp_enable(void *arg)
> +{
> +	__snp_enable(smp_processor_id());
> +}
> +
> +bool snp_get_rmptable_info(u64 *start, u64 *len)
> +{
> +	u64 max_rmp_pfn, calc_rmp_sz, rmp_sz, rmp_base, rmp_end;
> +
> +	rdmsrl(MSR_AMD64_RMP_BASE, rmp_base);
> +	rdmsrl(MSR_AMD64_RMP_END, rmp_end);
> +
> +	if (!rmp_base || !rmp_end) {
> +		pr_err("Memory for the RMP table has not been reserved by BIOS\n");
> +		return false;
> +	}
> +
> +	rmp_sz = rmp_end - rmp_base + 1;
> +
> +	/*
> +	 * Calculate the amount of memory that must be reserved by the BIOS to
> +	 * address the whole RAM, including the bookkeeping area. The RMP itself
> +	 * must also be covered.
> +	 */
> +	max_rmp_pfn = max_pfn;
> +	if (PHYS_PFN(rmp_end) > max_pfn)
> +		max_rmp_pfn = PHYS_PFN(rmp_end);
> +
> +	calc_rmp_sz = (max_rmp_pfn << 4) + RMPTABLE_CPU_BOOKKEEPING_SZ;
> +
> +	if (calc_rmp_sz > rmp_sz) {
> +		pr_err("Memory reserved for the RMP table does not cover full system RAM (expected 0x%llx got 0x%llx)\n",
> +		       calc_rmp_sz, rmp_sz);
> +		return false;
> +	}
> +
> +	*start = rmp_base;
> +	*len = rmp_sz;
> +
> +	return true;
> +}
> +
> +static __init int __snp_rmptable_init(void)
> +{
> +	u64 rmp_base, sz;
> +	void *start;
> +	u64 val;
> +
> +	if (!snp_get_rmptable_info(&rmp_base, &sz))
> +		return 1;
> +
> +	pr_info("RMP table physical address [0x%016llx - 0x%016llx]\n",
> +		rmp_base, rmp_base + sz - 1);
> +
> +	start = memremap(rmp_base, sz, MEMREMAP_WB);
> +	if (!start) {
> +		pr_err("Failed to map RMP table addr 0x%llx size 0x%llx\n", rmp_base, sz);
> +		return 1;
> +	}
> +
> +	/*
> +	 * Check if SEV-SNP is already enabled, this can happen in case of
> +	 * kexec boot.
> +	 */
> +	rdmsrl(MSR_AMD64_SYSCFG, val);
> +	if (val & MSR_AMD64_SYSCFG_SNP_EN)
> +		goto skip_enable;
> +
> +	/* Initialize the RMP table to zero */
> +	memset(start, 0, sz);
> +
> +	/* Flush the caches to ensure that data is written before SNP is enabled. */
> +	wbinvd_on_all_cpus();
> +
> +	/* MFDM must be enabled on all the CPUs prior to enabling SNP. */
> +	on_each_cpu(mfd_enable, NULL, 1);
> +
> +	/* Enable SNP on all CPUs. */
> +	on_each_cpu(snp_enable, NULL, 1);
> +
> +skip_enable:
> +	rmptable_start = (unsigned long)start;
> +	rmptable_end = rmptable_start + sz - 1;
> +
> +	return 0;
> +}
> +
> +static int __init snp_rmptable_init(void)
> +{
> +	int family, model;
> +
> +	if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
> +		return 0;
> +
> +	family = boot_cpu_data.x86;
> +	model  = boot_cpu_data.x86_model;
> +
> +	/*
> +	 * The RMP table entry format is not architectural; it can vary by
> +	 * processor and is defined by the per-processor PPR. Restrict SNP
> +	 * support to the known CPU models and families for which the RMP
> +	 * table entry format is currently defined.
> +	 */
> +	if (!(family == 0x19 && model <= 0xaf) && !(family == 0x1a && model <= 0xf))
> +		goto nosnp;
> +
> +	if (amd_iommu_snp_enable())
> +		goto nosnp;
> +
> +	if (__snp_rmptable_init())
> +		goto nosnp;
> +
> +	cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "x86/rmptable_init:online", __snp_enable, NULL);
> +
> +	return 0;
> +
> +nosnp:
> +	setup_clear_cpu_cap(X86_FEATURE_SEV_SNP);
> +	return -ENOSYS;
> +}
> +
> +/*
> + * This must be called after the PCI subsystem because amd_iommu_snp_enable()
> + * is used to verify that the IOMMU supports the SEV-SNP feature, which can
> + * only be done after subsys_initcall().
> + *
> + * NOTE: IOMMU use is enforced by SNP to ensure that the hypervisor cannot
> + * program DMA directly into guest private memory. With SNP, the IOMMU
> + * ensures that the page(s) used for DMA are hypervisor-owned.
> + */
> +fs_initcall(snp_rmptable_init);
> diff --git a/arch/x86/include/asm/disabled-features.h b/arch/x86/include/asm/disabled-features.h
> index 5dfa4fb76f4b..0a9938aea305 100644
> --- a/arch/x86/include/asm/disabled-features.h
> +++ b/arch/x86/include/asm/disabled-features.h
> @@ -99,6 +99,12 @@
>  # define DISABLE_TDX_GUEST	(1 << (X86_FEATURE_TDX_GUEST & 31))
>  #endif
>  
> +#ifdef CONFIG_KVM_AMD_SEV
> +# define DISABLE_SEV_SNP	0
> +#else
> +# define DISABLE_SEV_SNP	(1 << (X86_FEATURE_SEV_SNP & 31))
> +#endif
> +
>  /*
>   * Make sure to add features to the correct mask
>   */
> @@ -123,7 +129,7 @@
>  			 DISABLE_ENQCMD)
>  #define DISABLED_MASK17	0
>  #define DISABLED_MASK18	0
> -#define DISABLED_MASK19	0
> +#define DISABLED_MASK19	(DISABLE_SEV_SNP)
>  #define DISABLED_MASK20	0
>  #define DISABLED_MASK_CHECK BUILD_BUG_ON_ZERO(NCAPINTS != 21)
>  
> diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
> index ad35355ee43e..db0f3a041930 100644
> --- a/arch/x86/include/asm/msr-index.h
> +++ b/arch/x86/include/asm/msr-index.h
> @@ -574,6 +574,8 @@
>  #define MSR_AMD64_SEV_ENABLED		BIT_ULL(MSR_AMD64_SEV_ENABLED_BIT)
>  #define MSR_AMD64_SEV_ES_ENABLED	BIT_ULL(MSR_AMD64_SEV_ES_ENABLED_BIT)
>  #define MSR_AMD64_SEV_SNP_ENABLED	BIT_ULL(MSR_AMD64_SEV_SNP_ENABLED_BIT)
> +#define MSR_AMD64_RMP_BASE		0xc0010132
> +#define MSR_AMD64_RMP_END		0xc0010133
>  
>  /* SNP feature bits enabled by the hypervisor */
>  #define MSR_AMD64_SNP_VTOM			BIT_ULL(3)
> @@ -675,7 +677,14 @@
>  #define MSR_K8_TOP_MEM2			0xc001001d
>  #define MSR_AMD64_SYSCFG		0xc0010010
>  #define MSR_AMD64_SYSCFG_MEM_ENCRYPT_BIT	23
> -#define MSR_AMD64_SYSCFG_MEM_ENCRYPT	BIT_ULL(MSR_AMD64_SYSCFG_MEM_ENCRYPT_BIT)
> +#define MSR_AMD64_SYSCFG_MEM_ENCRYPT		BIT_ULL(MSR_AMD64_SYSCFG_MEM_ENCRYPT_BIT)
> +#define MSR_AMD64_SYSCFG_SNP_EN_BIT		24
> +#define MSR_AMD64_SYSCFG_SNP_EN		BIT_ULL(MSR_AMD64_SYSCFG_SNP_EN_BIT)
> +#define MSR_AMD64_SYSCFG_SNP_VMPL_EN_BIT	25
> +#define MSR_AMD64_SYSCFG_SNP_VMPL_EN		BIT_ULL(MSR_AMD64_SYSCFG_SNP_VMPL_EN_BIT)
> +#define MSR_AMD64_SYSCFG_MFDM_BIT		19
> +#define MSR_AMD64_SYSCFG_MFDM			BIT_ULL(MSR_AMD64_SYSCFG_MFDM_BIT)
> +
>  #define MSR_K8_INT_PENDING_MSG		0xc0010055
>  /* C1E active bits in int pending message */
>  #define K8_INTP_C1E_ACTIVE_MASK		0x18000000
> diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
> index ebc271bb6d8e..d34c46db7dd1 100644
> --- a/arch/x86/include/asm/sev.h
> +++ b/arch/x86/include/asm/sev.h
> @@ -197,6 +197,7 @@ void snp_set_wakeup_secondary_cpu(void);
>  bool snp_init(struct boot_params *bp);
>  void __init __noreturn snp_abort(void);
>  int snp_issue_guest_request(u64 exit_code, struct snp_req_data *input, unsigned long *fw_err);
> +bool snp_get_rmptable_info(u64 *start, u64 *len);
>  #else
>  static inline void sev_es_ist_enter(struct pt_regs *regs) { }
>  static inline void sev_es_ist_exit(void) { }
> @@ -221,6 +222,7 @@ static inline int snp_issue_guest_request(u64 exit_code, struct snp_req_data *in
>  {
>  	return -ENOTTY;
>  }
> +static inline bool snp_get_rmptable_info(u64 *start, u64 *len) { return false; }
>  #endif
>  
>  #endif
> diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
> index a79774181f22..1493ddf89fdf 100644
> --- a/arch/x86/kernel/cpu/amd.c
> +++ b/arch/x86/kernel/cpu/amd.c
> @@ -20,6 +20,7 @@
>  #include <asm/delay.h>
>  #include <asm/debugreg.h>
>  #include <asm/resctrl.h>
> +#include <asm/sev.h>
>  
>  #ifdef CONFIG_X86_64
>  # include <asm/mmconfig.h>
> @@ -546,6 +547,20 @@ static void bsp_init_amd(struct cpuinfo_x86 *c)
>  	resctrl_cpu_detect(c);
>  }
>  
> +static bool early_rmptable_check(void)
> +{
> +	u64 rmp_base, rmp_size;
> +
> +	/*
> +	 * For early BSP initialization, max_pfn won't be set up yet; wait until
> +	 * it is set before performing the RMP table calculations.
> +	 */
> +	if (!max_pfn)
> +		return true;
> +
> +	return snp_get_rmptable_info(&rmp_base, &rmp_size);
> +}
> +

When CONFIG_AMD_MEM_ENCRYPT=y && CONFIG_KVM=n (=> CONFIG_KVM_AMD_SEV=n) this
results in an undefined reference to snp_get_rmptable_info when linking this
file. The header provides a stub when AMD_MEM_ENCRYPT=n but the definition is
only compiled in when KVM_AMD_SEV=y.
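
One way to keep the stub and the definition in sync (a sketch only) is
to key the prototype in sev.h on the config symbol that actually builds
arch/x86/coco/sev/host.c:

	#ifdef CONFIG_KVM_AMD_SEV
	bool snp_get_rmptable_info(u64 *start, u64 *len);
	#else
	/* A stub is always present when host.c isn't compiled in: */
	static inline bool snp_get_rmptable_info(u64 *start, u64 *len) { return false; }
	#endif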

Jeremi

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH RFC v9 42/51] KVM: SVM: Support SEV-SNP AP Creation NAE event
  2023-06-12  4:25 ` [PATCH RFC v9 42/51] KVM: SVM: Support SEV-SNP AP Creation NAE event Michael Roth
@ 2023-08-15 16:00   ` Peter Gonda
  0 siblings, 0 replies; 102+ messages in thread
From: Peter Gonda @ 2023-08-15 16:00 UTC (permalink / raw)
  To: Michael Roth
  Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	liam.merwick, zhi.a.wang, Brijesh Singh

> +
> +       switch (request) {
> +       case SVM_VMGEXIT_AP_CREATE_ON_INIT:
> +               kick = false;
> +               fallthrough;
> +       case SVM_VMGEXIT_AP_CREATE:
> +               if (!page_address_valid(vcpu, svm->vmcb->control.exit_info_2)) {
> +                       vcpu_unimpl(vcpu, "vmgexit: invalid AP VMSA address [%#llx] from guest\n",
> +                                   svm->vmcb->control.exit_info_2);
> +                       ret = -EINVAL;
> +                       goto out;
> +               }
> +
> +               /*
> +                * A malicious guest can RMPADJUST a large page into a VMSA,
> +                * which hits the SNP erratum where the CPU incorrectly signals
> +                * an RMP violation #PF if a hugepage collides with the RMP
> +                * entry of the VMSA page. Reject the AP CREATE request if the
> +                * VMSA address from the guest is 2M aligned.
> +                */
> +               if (IS_ALIGNED(svm->vmcb->control.exit_info_2, PMD_SIZE)) {
> +                       vcpu_unimpl(vcpu,
> +                                   "vmgexit: AP VMSA address [%llx] from guest is unsafe as it is 2M aligned\n",
> +                                   svm->vmcb->control.exit_info_2);

In this case, and maybe the above case, can we instead return an error
to the guest rather than a #GP?

I think Tom suggested: set SW_EXITINFO1[31:0] to 2 (meaning there was an
error with the VMGEXIT request) and SW_EXITINFO2 to 5 (the NAE event
input was not valid, e.g., an invalid SW_EXITINFO1 value for the AP Jump
Table NAE event).

This way a non-malicious but buggy or naive guest gets a little more
information and a chance to retry the request?

> +                       ret = -EINVAL;
> +                       goto out;
> +               }
> +
> +               target_svm->sev_es.snp_vmsa_gpa = svm->vmcb->control.exit_info_2;
> +               break;
> +       case SVM_VMGEXIT_AP_DESTROY:
> +               break;
> +       default:
> +               vcpu_unimpl(vcpu, "vmgexit: invalid AP creation request [%#x] from guest\n",
> +                           request);
> +               ret = -EINVAL;
> +               break;
> +       }
> +
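
Something along these lines (a sketch, assuming the generated
ghcb_set_sw_exit_info_*() accessors; the exact error values follow
Tom's suggestion above):

	/* Flag the VMGEXIT as failed (2) with "invalid NAE event input"
	 * (5) so the guest can detect the error and retry: */
	ghcb_set_sw_exit_info_1(svm->sev_es.ghcb, 2);
	ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, 5);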

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH RFC v9 47/51] iommu/amd: Add IOMMU_SNP_SHUTDOWN support
  2023-06-12  4:25 ` [PATCH RFC v9 47/51] iommu/amd: Add IOMMU_SNP_SHUTDOWN support Michael Roth
@ 2023-09-07 10:31   ` Suthikulpanit, Suravee
  0 siblings, 0 replies; 102+ messages in thread
From: Suthikulpanit, Suravee @ 2023-09-07 10:31 UTC (permalink / raw)
  To: Michael Roth, kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	liam.merwick, zhi.a.wang

Mike / Ashish

FYI, you might need to start including the change from this patch

https://lore.kernel.org/linux-iommu/ZPiAUx9Qysw0AKNq@ziepe.ca/T/

in this series since Christoph would like to remove the code from the 
current upstream, and re-introduce the change within this series instead.

Regards,
Suravee

On 6/12/2023 11:25 AM, Michael Roth wrote:
> From: Ashish Kalra <ashish.kalra@amd.com>
> 
> Add a new IOMMU API interface, amd_iommu_snp_disable(), to transition
> IOMMU pages from Reclaim state to Hypervisor state after the
> SNP_SHUTDOWN_EX command. Invoke this API from the CCP driver after the
> SNP_SHUTDOWN_EX command completes.
> 
> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> Signed-off-by: Michael Roth <michael.roth@amd.com>
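
For orientation, a rough sketch of that ordering on the CCP side
(helper and command names as used in this series; the surrounding
driver plumbing is assumed):

	/* Shut down SNP firmware first, then reclaim the IOMMU pages so
	 * they can transition from Reclaim back to Hypervisor state: */
	ret = __sev_do_cmd_locked(SEV_CMD_SNP_SHUTDOWN_EX, &data, &error);
	if (!ret)
		ret = amd_iommu_snp_disable();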

^ permalink raw reply	[flat|nested] 102+ messages in thread

end of thread, other threads:[~2023-09-07 15:47 UTC | newest]

Thread overview: 102+ messages
2023-06-12  4:25 [PATCH RFC v9 00/51] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
2023-06-12  4:25 ` [PATCH RFC v9 01/51] KVM: x86: Add gmem hook for initializing private memory Michael Roth
2023-06-12  4:25 ` [PATCH RFC v9 02/51] KVM: x86: Add gmem hook for invalidating " Michael Roth
2023-06-12 10:49   ` Borislav Petkov
2023-06-19 13:39     ` Borislav Petkov
2023-06-12  4:25 ` [PATCH RFC v9 03/51] KVM: x86: Use full 64-bit error code for kvm_mmu_do_page_fault Michael Roth
2023-06-14 14:24   ` Isaku Yamahata
2023-06-12  4:25 ` [PATCH RFC v9 04/51] KVM: x86: Determine shared/private faults using a configurable mask Michael Roth
2023-06-14 16:47   ` Isaku Yamahata
2023-06-20 20:28     ` Michael Roth
2023-06-20 21:18       ` Isaku Yamahata
2023-06-21 23:00         ` Michael Roth
2023-06-22  8:01           ` Isaku Yamahata
2023-06-22  9:55           ` Huang, Kai
2023-06-22 15:32             ` Michael Roth
2023-06-22 22:31               ` Huang, Kai
2023-06-22 23:39                 ` Isaku Yamahata
2023-06-22 23:52                   ` Huang, Kai
2023-06-23 14:43                     ` Isaku Yamahata
2023-06-19 16:27   ` Borislav Petkov
2023-06-20 20:36     ` Michael Roth
2023-06-12  4:25 ` [PATCH RFC v9 05/51] x86/coco: move CONFIG_HAS_CC_PLATFORM check down into coco/Makefile Michael Roth
2023-06-12  7:07   ` Kirill A . Shutemov
2023-06-20 12:09   ` Borislav Petkov
2023-06-20 20:43     ` Michael Roth
2023-06-21  8:54       ` Borislav Petkov
2023-06-29 21:02         ` Michael Roth
2023-07-10  3:05   ` Sathyanarayanan Kuppuswamy
2023-07-10 13:11     ` Tom Lendacky
2023-06-12  4:25 ` [PATCH RFC v9 06/51] x86/cpufeatures: Add SEV-SNP CPU feature Michael Roth
2023-06-12  4:25 ` [PATCH RFC v9 07/51] x86/sev: Add the host SEV-SNP initialization support Michael Roth
2023-06-12 15:34   ` Dave Hansen
2023-06-21  9:15     ` Borislav Petkov
2023-06-21 14:31       ` Dave Hansen
2023-06-21 15:59         ` Borislav Petkov
2023-06-21  9:42   ` Borislav Petkov
2023-06-21 14:36     ` Tom Lendacky
2023-06-21 19:15     ` Kalra, Ashish
2023-08-09 13:03   ` Jeremi Piotrowski
2023-06-12  4:25 ` [PATCH RFC v9 08/51] x86/speculation: Do not enable Automatic IBRS if SEV SNP is enabled Michael Roth
2023-06-12 15:39   ` Dave Hansen
2023-07-18 22:34     ` Kim Phillips
2023-07-18 23:17       ` Dave Hansen
2023-07-20 19:11         ` Kim Phillips
2023-07-20 22:24           ` Dave Hansen
2023-07-21 16:56             ` Kim Phillips
2023-06-12  4:25 ` [PATCH RFC v9 09/51] x86/sev: Add RMP entry lookup helpers Michael Roth
2023-06-12 16:08   ` Dave Hansen
2023-06-30 21:57     ` Michael Roth
2023-06-30 22:29       ` Dave Hansen
2023-06-12  4:25 ` [PATCH RFC v9 10/51] x86/fault: Add helper for dumping RMP entries Michael Roth
2023-06-12 16:12   ` Dave Hansen
2023-06-12  4:25 ` [PATCH RFC v9 11/51] x86/traps: Define RMP violation #PF error code Michael Roth
2023-06-12 16:26   ` Dave Hansen
2023-06-12  4:25 ` [PATCH RFC v9 12/51] x86/fault: Report RMP page faults for kernel addresses Michael Roth
2023-06-12 16:30   ` Dave Hansen
2023-06-12  4:25 ` [PATCH RFC v9 13/51] x86/fault: Handle RMP page faults for user addresses Michael Roth
2023-06-12 16:40   ` Dave Hansen
2023-06-12  4:25 ` [PATCH RFC v9 14/51] x86/sev: Add helper functions for RMPUPDATE and PSMASH instruction Michael Roth
2023-06-12 17:00   ` Dave Hansen
2023-06-12  4:25 ` [PATCH RFC v9 15/51] x86/sev: Invalidate pages from the direct map when adding them to the RMP table Michael Roth
2023-06-12  4:25 ` [PATCH RFC v9 16/51] crypto: ccp: Define the SEV-SNP commands Michael Roth
2023-06-12  4:25 ` [PATCH RFC v9 17/51] crypto: ccp: Add support to initialize the AMD-SP for SEV-SNP Michael Roth
2023-06-12  4:25 ` [PATCH RFC v9 18/51] crypto: ccp: Provide API to issue SEV and SNP commands Michael Roth
2023-06-12  4:25 ` [PATCH RFC v9 19/51] x86/sev: Introduce snp leaked pages list Michael Roth
2023-08-09 12:46   ` Jeremi Piotrowski
2023-06-12  4:25 ` [PATCH RFC v9 20/51] crypto: ccp: Handle the legacy TMR allocation when SNP is enabled Michael Roth
2023-06-12  4:25 ` [PATCH RFC v9 21/51] crypto: ccp: Handle the legacy SEV command " Michael Roth
2023-06-12  4:25 ` [PATCH RFC v9 22/51] crypto: ccp: Add the SNP_PLATFORM_STATUS command Michael Roth
2023-06-12  4:25 ` [PATCH RFC v9 23/51] KVM: SEV: Select CONFIG_KVM_PROTECTED_VM when CONFIG_KVM_AMD_SEV=y Michael Roth
2023-06-12  4:25 ` [PATCH RFC v9 24/51] KVM: SVM: Add support to handle AP reset MSR protocol Michael Roth
2023-06-12  4:25 ` [PATCH RFC v9 25/51] KVM: SVM: Add GHCB handling for Hypervisor Feature Support requests Michael Roth
2023-06-12  4:25 ` [PATCH RFC v9 26/51] KVM: SVM: Make AVIC backing, VMSA and VMCB memory allocation SNP safe Michael Roth
2023-06-12  4:25 ` [PATCH RFC v9 27/51] KVM: SVM: Add initial SEV-SNP support Michael Roth
2023-06-12  4:25 ` [PATCH RFC v9 28/51] KVM: SVM: Add KVM_SNP_INIT command Michael Roth
2023-06-12  4:25 ` [PATCH RFC v9 29/51] KVM: SVM: Add KVM_SEV_SNP_LAUNCH_START command Michael Roth
2023-06-12 17:08   ` Peter Gonda
2023-06-12  4:25 ` [PATCH RFC v9 30/51] KVM: Add HVA range operator Michael Roth
2023-06-12  4:25 ` [PATCH RFC v9 31/51] KVM: Split out memory attribute xarray updates to helper function Michael Roth
2023-06-12  4:25 ` [PATCH RFC v9 32/51] KVM: SVM: Add KVM_SEV_SNP_LAUNCH_UPDATE command Michael Roth
2023-06-12  4:25 ` [PATCH RFC v9 33/51] KVM: SVM: Add KVM_SEV_SNP_LAUNCH_FINISH command Michael Roth
2023-06-12  4:25 ` [PATCH RFC v9 34/51] KVM: SVM: Add support to handle GHCB GPA register VMGEXIT Michael Roth
2023-06-12  4:25 ` [PATCH RFC v9 35/51] KVM: SVM: Add KVM_EXIT_VMGEXIT Michael Roth
2023-06-12  4:25 ` [PATCH RFC v9 36/51] KVM: SVM: Add support to handle MSR based Page State Change VMGEXIT Michael Roth
2023-06-12  4:25 ` [PATCH RFC v9 37/51] KVM: SVM: Add support to handle " Michael Roth
2023-06-12  4:25 ` [PATCH RFC v9 38/51] KVM: x86: Export the kvm_zap_gfn_range() for the SNP use Michael Roth
2023-06-12  4:25 ` [PATCH RFC v9 39/51] KVM: x86: Define RMP page fault error bits for #NPF Michael Roth
2023-06-12  4:25 ` [PATCH RFC v9 40/51] KVM: SVM: Add support to handle RMP nested page faults Michael Roth
2023-06-12  4:25 ` [PATCH RFC v9 41/51] KVM: SVM: Use a VMSA physical address variable for populating VMCB Michael Roth
2023-06-12  4:25 ` [PATCH RFC v9 42/51] KVM: SVM: Support SEV-SNP AP Creation NAE event Michael Roth
2023-08-15 16:00   ` Peter Gonda
2023-06-12  4:25 ` [PATCH RFC v9 43/51] KVM: SEV: Configure MMU to check for private fault flags Michael Roth
2023-06-12  4:25 ` [PATCH RFC v9 44/51] KVM: SEV: Implement gmem hook for initializing private pages Michael Roth
2023-06-12  4:25 ` [PATCH RFC v9 45/51] KVM: SEV: Implement gmem hook for invalidating " Michael Roth
2023-06-12  4:25 ` [PATCH RFC v9 46/51] KVM: SVM: Add module parameter to enable the SEV-SNP Michael Roth
2023-06-12  4:25 ` [PATCH RFC v9 47/51] iommu/amd: Add IOMMU_SNP_SHUTDOWN support Michael Roth
2023-09-07 10:31   ` Suthikulpanit, Suravee
2023-06-12  4:25 ` [PATCH RFC v9 48/51] crypto: ccp: Add the SNP_{SET,GET}_EXT_CONFIG command Michael Roth
2023-06-13  6:24   ` Alexey Kardashevskiy
2023-06-12  4:25 ` [PATCH RFC v9 49/51] x86/sev: Add KVM commands for per-instance certs Michael Roth
2023-06-12  4:25 ` [PATCH RFC v9 50/51] KVM: SVM: Provide support for SNP_GUEST_REQUEST NAE event Michael Roth
2023-06-12  4:25 ` [PATCH RFC v9 51/51] crypto: ccp: Add debug support for decrypting pages Michael Roth
