* [PATCH RFC v8 00/56] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support
@ 2023-02-20 18:37 Michael Roth
  2023-02-20 18:37 ` [PATCH RFC v8 01/56] KVM: x86: Add 'fault_is_private' x86 op Michael Roth
                   ` (57 more replies)
  0 siblings, 58 replies; 147+ messages in thread
From: Michael Roth @ 2023-02-20 18:37 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania

This patchset is also available at:

  https://github.com/amdese/linux/commits/upmv10-host-snp-v8-rfc

and is based on top of the following tree:

  https://github.com/mdroth/linux/commits/upm_base_support_fixes

which in turn is based on Sean Christopherson's UPM base support tree,
with some fixes/workarounds needed for SEV/SNP support.[1]

== OVERVIEW ==

This version is being posted as an RFC due to fairly extensive changes
relating to transitioning the SEV-SNP implementation to using
restricted/private memslots (aka Unmapped Private Memory) to manage
private guest pages instead of the legacy SEV memory registration ioctls.

Alongside that work we've also been investigating leveraging UPM to
implement lazy-pinning support for SEV guests, rather than relying on the
legacy SEV memory registration ioctls, which pin everything in advance.

For both of these SEV and SEV-SNP use-cases we've needed to add a
number of hooks in the restrictedmem code, so we thought it would be useful
for this version at least to include both UPM-based SEV and SNP
implementations so reviewers can see whether these hooks might be needed for
other archs/platforms and start consolidating around whether/how they should
be defined for general usage. There are still some TODOs in this area,
but we hope this implementation is complete enough to at least outline
the additions needed for using UPM for these use-cases.

Outside of UPM-related items, we've also included fairly extensive changes
based on review feedback from v6/v7 and would appreciate any feedback on
those aspects as well.

== LAYOUT ==

PATCH 01-03: pre-patches that add the UPM hooks and KVM capability needed
             to switch between UPM and legacy SEV memory registration.
PATCH 04-09: implement SEV lazy-pinning using UPM to manage private memory
PATCH 10-28: general SNP detection/enablement for host and CCP driver
PATCH 29-56: SNP hypervisor support

== TESTING (note updated QEMU command-lines) ==

For testing this via QEMU, use the following tree:

  https://github.com/amdese/qemu/commits/upmv10b-snpv3-wip

SEV-SNP with UPM:

  qemu-system-x86_64 -cpu EPYC-Milan-v2 \
    -object memory-backend-memfd-private,id=ram1,size=1G,share=true \
    -object sev-snp-guest,id=sev0,cbitpos=51,reduced-phys-bits=1 \
    -machine q35,confidential-guest-support=sev0,memory-backend=ram1,kvm-type=protected \
    ...

SEV with UPM (requires patched OVMF[2]):

  qemu-system-x86_64 -cpu EPYC-Milan-v2 \
    -object memory-backend-memfd-private,id=ram1,size=1G,share=true \
    -object sev-guest,id=sev0,cbitpos=51,reduced-phys-bits=1 \
    -machine q35,confidential-guest-support=sev0,memory-backend=ram1,kvm-type=protected \
    ...

KVM selftests for UPM:

  cd $kernel_src_dir
  make -C tools/testing/selftests TARGETS="kvm" EXTRA_CFLAGS="-DDEBUG"
  sudo tools/testing/selftests/kvm/x86_64/private_mem_conversions_test


== BACKGROUND (SEV-SNP) ==

This part of the Secure Nested Paging (SEV-SNP) series focuses on the
changes required in a host OS for SEV-SNP support. The series builds upon
the SEV-SNP guest support that is now part of mainline.

This series provides the basic building blocks to support booting SEV-SNP
VMs; it does not cover all of the security enhancements introduced by
SEV-SNP, such as interrupt protection.

The CCP driver is enhanced to provide new APIs that use the SEV-SNP
specific commands defined in the SEV-SNP firmware specification. The KVM
driver uses those APIs to create and manage the SEV-SNP guests.

The GHCB specification version 2 introduces a new set of NAE events that
are used by the SEV-SNP guest to communicate with the hypervisor. The series
provides support to handle the following new NAE events:
- Register GHCB GPA
- Page State Change Request
- Hypervisor feature
- Guest message request

The RMP check is enforced as soon as SEV-SNP is enabled. Not every memory
access requires an RMP check. In particular, read accesses from the
hypervisor do not require RMP checks because data confidentiality is
already protected via memory encryption. When hardware encounters an RMP
check failure, it raises a page-fault exception. If the RMP check failure
is due to a page-size mismatch, the large page is split to resolve
the fault.
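
As a rough illustration of that split-on-size-mismatch handling, here is a
sketch only: psmash() is the helper added later in this series and is assumed
here to take the 2MB-aligned PFN of the large RMP entry, while the wrapper
name and the X86_PF_RMP error-code check are simplified for illustration and
are not the actual fault-handler code from this series:

  /*
   * Illustrative sketch, not the fault-handler code from this series.
   * psmash() splits a 2MB RMP entry into 512 4K entries.
   */
  static int example_handle_rmp_size_mismatch(unsigned long error_code, u64 pfn)
  {
          if (!(error_code & X86_PF_RMP))
                  return -EINVAL;

          /* Align down to the 2MB boundary and split the large RMP entry. */
          return psmash(pfn & ~(u64)(PTRS_PER_PMD - 1));
  }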

The series does not provide support for interrupt security or migration;
those features will be added on top of the base support.

== BACKGROUND (SEV Lazy-pinning) ==

The current implementation of SEV relies on the KVM_MEMORY_ENCRYPT_REG_REGION
ioctl to pin all pages in advance, since they may be used as private pages
by the guest. Previous iterations of SNP also relied on this interface. With
UPM however, the initial shared pages are not converted to private; instead,
private pages are allocated from a separate restrictedmem backend, which,
by design, handles the pinning internally upon allocation, which
generally occurs when the guest faults the page in for the first time.
This provides all the necessary characteristics to support lazy-pinning to
improve boot times for SEV guests, as well as a bare minimum implementation
of a UPM-enabled guest that can provide basic infrastructure and testing
flexibility that SNP can then build on, which is why it's included with
this series for now.
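
As a minimal sketch of that flow: the restrictedmem backend allocates (and
pins) the private page only when KVM first asks for it, e.g. while resolving
a private fault. kvm_restrictedmem_get_pfn() and kvm_release_pfn_clean() are
the interfaces used later in this series; the wrapper below and its name are
purely illustrative:

  /* Illustrative only; not code from this series. */
  static int example_map_private_gfn(struct kvm_memory_slot *slot, gfn_t gfn)
  {
          kvm_pfn_t pfn;
          int order, ret;

          /* Backing page is allocated/pinned by restrictedmem on first use. */
          ret = kvm_restrictedmem_get_pfn(slot, gfn, &pfn, &order);
          if (ret)
                  return ret;

          /* ... map pfn into the guest's nested page tables ... */

          kvm_release_pfn_clean(pfn);
          return 0;
  }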

It should be noted that SEV guests don't have a means to issue explicit
page state changes the way SNP guests can via GHCB page-state change requests.
They do however need a way to inform the host of shared pages so that
non-restricted memory can be used. This is done via the MAP_GPA_RANGE
KVM hypercall interface, which was introduced in guest kernels to inform
the host of shared pages to support live migration. This allows support
for existing guest kernels; however, OVMF is still lacking enablement
for MAP_GPA_RANGE hypercalls, which is why a patched version is needed
there.
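
For reference, a guest-side sketch of such a notification, assuming the
existing KVM_HC_MAP_GPA_RANGE hypercall and KVM_MAP_GPA_RANGE_* attribute
flags from include/uapi/linux/kvm_para.h; the wrapper itself is illustrative
and not part of this series:

  /*
   * Illustrative guest-side sketch: advertise a range of guest memory as
   * shared (decrypted) via the MAP_GPA_RANGE hypercall added for live
   * migration support.
   */
  static void example_notify_shared(unsigned long gpa, unsigned long npages)
  {
          kvm_hypercall3(KVM_HC_MAP_GPA_RANGE, gpa, npages,
                         KVM_MAP_GPA_RANGE_DECRYPTED |
                         KVM_MAP_GPA_RANGE_PAGE_SZ_4K);
  }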

== TODO / KNOWN ISSUES ==

 * Failures in the rmpupdate() helper have been observed when running many
   concurrent SNP guests while THP is disabled for restricted/private guest
   pages. This has not been observed when running with THP enabled via
   `echo always >/sys/kernel/mm/transparent_hugepage/shmem_enabled`
 * CCP driver init ordering considerations (Jarkko)
 * Reclaim firmware pages in case of SNP command failure (Alper)
 * Return 0 to userspace when generating KVM_EXIT_VMGEXIT (Tom)
 * Rework interfaces for issuing SNP guest requests to firmware (Alexey)
 * Incorporate guest request throttling patches from Dionna
 * Incorporate suggested updates for instant certificate support (Dionna)


[1] https://lore.kernel.org/lkml/20230213130102.two7q3kkcf254uof@amd.com/
[2] https://github.com/mdroth/edk2/commits/upmv8-seves-v1


Changes since v7:

 * Rebase to Sean's updated UPM base support tree
 * Drop KVM_CAP_UNMAPPED_MEMORY and .private_mem_enabled x86 op in favor
   of kvm_arch_has_private_mem() and vm_type KVM_VM_CREATE arg
 * Drop GHCB map/unmap refactoring and post map/unmap hooks as they are no
   longer needed with UPM
 * Move .fault_is_private implementation to SNP patch range, no longer
   needed for SEV.
 * Don't call attribute update / invalidation hooks under kvm->mmu_lock
   (Tom, Jarkko)
 * Revert switch to using set_memory_p()/set_memory_np() in rmpupdate() due
   to it causing performance regression
 * Commit fixups for 'fault_is_private'/'update_mem_attr' hooks, have
   'fault_is_private' return bool (Boris)
 * Split kvm_vm_set_region_attr() into separate patch. (Jarkko)
 * Copy corrected CPUID page to userspace when firmware rejects it (Tom,
   Jarkko)
 * Fix sev_dump_rmpentry() error-handling (Alper)
 * Use adjusted cmd_buf pointer rather than sev->cmd_buf directly (Alper)
 * Correct typo in SNP_GET_EXT_CONFIG documentation (Dov)
 * Update struct kvm_sev_snp_launch_finish definition in
   amd-memory-encryption.rst (Tom)
 * Fix snp_launch_update_vmsa replacing created_vcpus with online_vcpus
 * Fix SNP_DBG_DECRYPT to not include len parameter.
 * Fix SNP_LAUNCH_FINISH to copy host-data from userspace


Changes since v6:

 * Added support for restrictedmem/UPM, and removed SEV-specific
   implementation of private memory management. As a result of this rework
   the following patches were no longer needed so were dropped:
   - KVM: SVM: Mark the private vma unmergable for SEV-SNP guests
   - KVM: SVM: Disallow registering memory range from HugeTLB for SNP guest
   - KVM: x86/mmu: Introduce kvm_mmu_map_tdp_page() for use by TDX and SNP
   - KVM: x86: Introduce kvm_mmu_get_tdp_walk() for SEV-SNP use
 * Moved RMP table entry structure definition (struct rmpentry)
   to sev.c so as to not expose this non-architectural definition to the
   rest of the kernel, making the structure private to the SNP code.
   Also made the RMP table entry accessors inline functions and
   removed all accessors that are not called more than once.
   Added a new function rmptable_entry() to index into the RMP table
   and return the RMP table entry.
 * Moved RMPUPDATE, PSMASH helper function declarations from the linux
   namespace to the x86 arch-specific include namespace. Added comments
   for these helper functions.
 * Introduce set_memory_p() to provide a way to change attributes of a
   memory range so it is marked present and added back to the kernel
   direct map; invalidating/restoring pages from the direct map is now
   done using set_memory_np() and set_memory_p() (see the sketch after
   this list).
 * Added detailed comments around user RMP #PF fault handling and
   simplified computation of the faulting pfn for large-pages.
 * Added support to return pfn from dump_pagetable() to do SEV-specific
   fault handling; this is added as a pre-patch. This support is now
   used to dump the RMP entry in case of an RMP #PF in show_fault_oops().
 * Added a new generic SNP command params structure sev_data_snp_addr,
   which is used for all SNP firmware API commands requiring a 
   single physical address parameter.
 * Added support for new SNP_INIT_EX command with support for HV-Fixed
   page range list. 
 * Added support for new SNP_SHUTDOWN_EX command which allows
   disabling enforcement of SNP in the IOMMU. Also, a DF_FLUSH is done
   at SNP shutdown if the firmware indicates that DF_FLUSH is required.
 * Make sev_do_cmd() a generic API interface for the hypervisor
   to issue commands to manage SEV and SNP guests. Also removed
   the API wrappers used by the hypervisor to manage an SEV-SNP guest.
   All these APIs now invoke sev_do_cmd() directly.
 * Introduce an SNP leaked pages list. Pages that are unsafe to release
   back to the page allocator because they can't be reclaimed or
   transitioned back to hypervisor/shared state are now added
   to this internal leaked pages list to prevent fatal page faults
   when accessing these pages. The function snp_leak_pages() is
   renamed to snp_mark_pages_offline() and is an external function
   available to both the CCP driver and the SNP hypervisor code. Removed
   the call to memory_failure() when leaking/marking pages offline.
 * Remove snp_set_rmp_state() multiplexor code and add new separate
   helpers such as rmp_mark_pages_firmware() & rmp_mark_pages_shared().
   The callers now issue snp_reclaim_pages() directly when needed as
   done by __snp_free_firmware_pages() and unmap_firmware_writeable().
   All callers of snp_set_rmp_state() modified to call helpers
   rmp_mark_pages_firmware() or rmp_mark_pages_shared() as required.
 * Change snp_reclaim_pages() to take physical address as an argument
   and clear C-bit from this physical address argument internally.
 * Output parameter sev_user_data_ext_snp_config in sev_ioctl_snp_get_config()
   is memset to zero to avoid kernel memory leaking.
 * Prevent race between sev_ioctl_snp_set_config() and 
   snp_guest_ext_guest_request() for sev->snp_certs_data by acquiring
   sev->snp_certs_lock mutex.
 * Zeroed out struct sev_user_data_snp_config in
   sev_ioctl_snp_set_config() to prevent leaking uninitialized
   kernel memory.
 * Optimized snp_safe_alloc_page() by avoiding multiple calls to
   pfn_to_page() and checking for a hugepage using pfn instead of
   expanding to full physical address.
 * Invoke host_rmp_make_shared() with leak parameter set to true
   if VMSA page cannot be transitioned back to shared state.
 * Fix snp_launch_finish() to always send the ID_AUTH struct to
   the firmware. Use the params.auth_key_en indicator to set
   whether the ID_AUTH struct contains an author key or not.
 * Cleanup snp_context_create() and allocate certs_data in this
   function using kzalloc() to prevent giving the guest 
   uninitialized kernel memory.
 * Remove the check for guest supplied buffer greater than the data
   provided by the hypervisor in snp_handle_ext_guest_request().
 * Add a check in sev_snp_ap_create() for the case where a malicious guest
   RMPADJUSTs a large page into a VMSA, which would hit the SNP erratum
   where the CPU incorrectly signals an RMP violation #PF if a
   hugepage collides with the RMP entry of the VMSA page; reject the
   AP CREATE request if the VMSA address from the guest is 2M-aligned.
 * Make VMSAVE target area memory allocation SNP safe, implementing a
   workaround for an SNP erratum where the CPU will incorrectly signal
   an RMP violation #PF if a hugepage (2MB or 1GB) collides with the
   RMP entry of the VMSAVE target page.
 * Fix handle_split_page_fault() to work with memfd backed pages.
 * Add KVM commands for per-VM instance certificates.
 * Add IOMMU_SNP_SHUTDOWN support, which enables host kexec support
   with SNP.
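
Regarding the set_memory_p() item in the list above, a sketch of the
direct-map handling it describes: pages headed for the RMP table as
guest-private are dropped from the kernel direct map and restored when they
return to shared/hypervisor state. set_memory_np() is the existing kernel
helper and set_memory_p() is the counterpart this series introduces; the
wrapper below and its name are illustrative only:

  /* Illustrative only; not code from this series. */
  static int example_adjust_directmap(u64 pfn, bool make_private)
  {
          unsigned long vaddr = (unsigned long)pfn_to_kaddr(pfn);

          return make_private ? set_memory_np(vaddr, 1)
                              : set_memory_p(vaddr, 1);
  }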

----------------------------------------------------------------
Ashish Kalra (4):
      x86/fault: Return pfn from dump_pagetable() for SEV-specific fault handling.
      crypto: ccp: Introduce snp leaked pages list
      KVM: SVM: Make VMSAVE target area memory allocation SNP safe
      iommu/amd: Add IOMMU_SNP_SHUTDOWN support

Brijesh Singh (32):
      x86/cpufeatures: Add SEV-SNP CPU feature
      x86/sev: Add the host SEV-SNP initialization support
      x86/sev: Add RMP entry lookup helpers
      x86/sev: Add helper functions for RMPUPDATE and PSMASH instruction
      x86/sev: Invalidate pages from the direct map when adding them to the RMP table
      x86/traps: Define RMP violation #PF error code
      x86/fault: Add support to handle the RMP fault for user address
      crypto:ccp: Define the SEV-SNP commands
      crypto: ccp: Add support to initialize the AMD-SP for SEV-SNP
      crypto:ccp: Provide API to issue SEV and SNP commands
      crypto: ccp: Handle the legacy TMR allocation when SNP is enabled
      crypto: ccp: Handle the legacy SEV command when SNP is enabled
      crypto: ccp: Add the SNP_PLATFORM_STATUS command
      crypto: ccp: Add the SNP_{SET,GET}_EXT_CONFIG command
      crypto: ccp: Provide APIs to query extended attestation report
      KVM: SVM: Provide the Hypervisor Feature support VMGEXIT
      KVM: SVM: Make AVIC backing, VMSA and VMCB memory allocation SNP safe
      KVM: SVM: Add initial SEV-SNP support
      KVM: SVM: Add KVM_SNP_INIT command
      KVM: SVM: Add KVM_SEV_SNP_LAUNCH_START command
      KVM: SVM: Add KVM_SEV_SNP_LAUNCH_UPDATE command
      KVM: SVM: Add KVM_SEV_SNP_LAUNCH_FINISH command
      KVM: X86: Keep the NPT and RMP page level in sync
      KVM: x86: Define RMP page fault error bits for #NPF
      KVM: SVM: Add support to handle GHCB GPA register VMGEXIT
      KVM: SVM: Add support to handle MSR based Page State Change VMGEXIT
      KVM: SVM: Add support to handle Page State Change VMGEXIT
      KVM: x86: Export the kvm_zap_gfn_range() for the SNP use
      KVM: SVM: Add support to handle the RMP nested page fault
      KVM: SVM: Provide support for SNP_GUEST_REQUEST NAE event
      KVM: SVM: Add module parameter to enable the SEV-SNP
      ccp: Add support to decrypt the page

Dionna Glaze (2):
      x86/sev: Add KVM commands for instance certs
      x86/sev: Document KVM_SEV_SNP_{G,S}ET_CERTS

Hugh Dickins (1):
      x86/fault: fix handle_split_page_fault() to work with memfd backed pages

Michael Roth (10):
      KVM: x86: Add 'fault_is_private' x86 op
      KVM: x86: Add 'update_mem_attr' x86 op
      KVM: x86: Add platform hooks for private memory invalidations
      KVM: SEV: Require KVM_PROTECTED_VM when AMD_MEM_ENCRYPT is enabled
      KVM: Split out memory attribute xarray updates to helper function
      x86/fault: Add helper for dumping RMP entries
      KVM: SVM: Add KVM_EXIT_VMGEXIT
      KVM: SVM: Add SNP-specific handling for memory attribute updates
      KVM: SVM: Implement .fault_is_private callback for SNP
      KVM: SEV: Handle restricted memory invalidations for SNP

Nikunj A Dadhania (2):
      KVM: SEV: Rename sev_{pin,unpin}_memory
      KVM: SEV: Handle memory backed by restricted memfd

Tom Lendacky (3):
      KVM: SVM: Add support to handle AP reset MSR protocol
      KVM: SVM: Use a VMSA physical address variable for populating VMCB
      KVM: SVM: Support SEV-SNP AP Creation NAE event

Vishal Annapurve (2):
      KVM: Add HVA range operator
      KVM: SEV: Populate private memory fd during LAUNCH_UPDATE_DATA

 Documentation/virt/coco/sev-guest.rst              |   54 +
 .../virt/kvm/x86/amd-memory-encryption.rst         |  147 ++
 arch/x86/Kconfig                                   |    1 +
 arch/x86/include/asm/cpufeatures.h                 |    1 +
 arch/x86/include/asm/disabled-features.h           |    8 +-
 arch/x86/include/asm/kvm-x86-ops.h                 |    5 +
 arch/x86/include/asm/kvm_host.h                    |   19 +
 arch/x86/include/asm/msr-index.h                   |   11 +-
 arch/x86/include/asm/sev-common.h                  |   28 +
 arch/x86/include/asm/sev.h                         |   28 +
 arch/x86/include/asm/svm.h                         |    6 +
 arch/x86/include/asm/trap_pf.h                     |   18 +-
 arch/x86/kernel/cpu/amd.c                          |    5 +-
 arch/x86/kernel/sev.c                              |  472 +++++
 arch/x86/kvm/lapic.c                               |    5 +-
 arch/x86/kvm/mmu.h                                 |    2 -
 arch/x86/kvm/mmu/mmu.c                             |   31 +-
 arch/x86/kvm/mmu/mmu_internal.h                    |   37 +-
 arch/x86/kvm/svm/sev.c                             | 2089 +++++++++++++++++---
 arch/x86/kvm/svm/svm.c                             |   55 +-
 arch/x86/kvm/svm/svm.h                             |   43 +-
 arch/x86/kvm/trace.h                               |   34 +
 arch/x86/kvm/x86.c                                 |   10 +
 arch/x86/mm/fault.c                                |  118 +-
 drivers/crypto/ccp/sev-dev.c                       | 1054 +++++++++-
 drivers/crypto/ccp/sev-dev.h                       |   18 +
 drivers/iommu/amd/init.c                           |   53 +
 include/linux/amd-iommu.h                          |    1 +
 include/linux/kvm_host.h                           |   14 +
 include/linux/mm.h                                 |    3 +-
 include/linux/mm_types.h                           |    3 +
 include/linux/psp-sev.h                            |  351 +++-
 include/uapi/linux/kvm.h                           |   74 +
 include/uapi/linux/psp-sev.h                       |   62 +
 mm/memory.c                                        |   15 +
 mm/restrictedmem.c                                 |   12 +-
 tools/arch/x86/include/asm/cpufeatures.h           |    1 +
 virt/kvm/kvm_main.c                                |  117 +-
 38 files changed, 4714 insertions(+), 291 deletions(-)




* [PATCH RFC v8 01/56] KVM: x86: Add 'fault_is_private' x86 op
  2023-02-20 18:37 [PATCH RFC v8 00/56] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
@ 2023-02-20 18:37 ` Michael Roth
  2023-03-01 10:25   ` Zhi Wang
                     ` (2 more replies)
  2023-02-20 18:37 ` [PATCH RFC v8 02/56] KVM: x86: Add 'update_mem_attr' " Michael Roth
                   ` (56 subsequent siblings)
  57 siblings, 3 replies; 147+ messages in thread
From: Michael Roth @ 2023-02-20 18:37 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania

This callback is used by the KVM MMU to check whether a #NPF was for a
private GPA or not.

In some cases the full 64-bit error code for the #NPF will be needed to
make this determination, so also update kvm_mmu_do_page_fault() to
accept the full 64-bit value so it can be plumbed through to the
callback.

Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 arch/x86/include/asm/kvm-x86-ops.h |  1 +
 arch/x86/include/asm/kvm_host.h    |  1 +
 arch/x86/kvm/mmu/mmu.c             |  3 +--
 arch/x86/kvm/mmu/mmu_internal.h    | 37 +++++++++++++++++++++++++++---
 4 files changed, 37 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index 8dc345cc6318..72183da010b8 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -131,6 +131,7 @@ KVM_X86_OP(msr_filter_changed)
 KVM_X86_OP(complete_emulated_msr)
 KVM_X86_OP(vcpu_deliver_sipi_vector)
 KVM_X86_OP_OPTIONAL_RET0(vcpu_get_apicv_inhibit_reasons);
+KVM_X86_OP_OPTIONAL_RET0(fault_is_private);
 
 #undef KVM_X86_OP
 #undef KVM_X86_OP_OPTIONAL
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index e552374f2357..f856d689dda0 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1643,6 +1643,7 @@ struct kvm_x86_ops {
 
 	void (*load_mmu_pgd)(struct kvm_vcpu *vcpu, hpa_t root_hpa,
 			     int root_level);
+	bool (*fault_is_private)(struct kvm *kvm, gpa_t gpa, u64 error_code, bool *private_fault);
 
 	bool (*has_wbinvd_exit)(void);
 
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index eda615f3951c..fb3f34b7391c 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5724,8 +5724,7 @@ int noinline kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 err
 	}
 
 	if (r == RET_PF_INVALID) {
-		r = kvm_mmu_do_page_fault(vcpu, cr2_or_gpa,
-					  lower_32_bits(error_code), false);
+		r = kvm_mmu_do_page_fault(vcpu, cr2_or_gpa, error_code, false);
 		if (KVM_BUG_ON(r == RET_PF_INVALID, vcpu->kvm))
 			return -EIO;
 	}
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index e642d431df4b..557a001210df 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -231,6 +231,37 @@ struct kvm_page_fault {
 
 int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault);
 
+static bool kvm_mmu_fault_is_private(struct kvm *kvm, gpa_t gpa, u64 err)
+{
+	struct kvm_memory_slot *slot;
+	bool private_fault = false;
+	gfn_t gfn = gpa_to_gfn(gpa);
+
+	slot = gfn_to_memslot(kvm, gfn);
+	if (!slot) {
+		pr_debug("%s: no slot, GFN: 0x%llx\n", __func__, gfn);
+		goto out;
+	}
+
+	if (!kvm_slot_can_be_private(slot)) {
+		pr_debug("%s: slot is not private, GFN: 0x%llx\n", __func__, gfn);
+		goto out;
+	}
+
+	if (static_call(kvm_x86_fault_is_private)(kvm, gpa, err, &private_fault))
+		goto out;
+
+	/*
+	 * Handling below is for UPM self-tests and guests that treat userspace
+	 * as the authority on whether a fault should be private or not.
+	 */
+	private_fault = kvm_mem_is_private(kvm, gpa >> PAGE_SHIFT);
+
+out:
+	pr_debug("%s: GFN: 0x%llx, private: %d\n", __func__, gfn, private_fault);
+	return private_fault;
+}
+
 /*
  * Return values of handle_mmio_page_fault(), mmu.page_fault(), fast_page_fault(),
  * and of course kvm_mmu_do_page_fault().
@@ -262,11 +293,11 @@ enum {
 };
 
 static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
-					u32 err, bool prefetch)
+					u64 err, bool prefetch)
 {
 	struct kvm_page_fault fault = {
 		.addr = cr2_or_gpa,
-		.error_code = err,
+		.error_code = lower_32_bits(err),
 		.exec = err & PFERR_FETCH_MASK,
 		.write = err & PFERR_WRITE_MASK,
 		.present = err & PFERR_PRESENT_MASK,
@@ -280,7 +311,7 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 		.max_level = KVM_MAX_HUGEPAGE_LEVEL,
 		.req_level = PG_LEVEL_4K,
 		.goal_level = PG_LEVEL_4K,
-		.is_private = kvm_mem_is_private(vcpu->kvm, cr2_or_gpa >> PAGE_SHIFT),
+		.is_private = kvm_mmu_fault_is_private(vcpu->kvm, cr2_or_gpa, err),
 	};
 	int r;
 
-- 
2.25.1



* [PATCH RFC v8 02/56] KVM: x86: Add 'update_mem_attr' x86 op
  2023-02-20 18:37 [PATCH RFC v8 00/56] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
  2023-02-20 18:37 ` [PATCH RFC v8 01/56] KVM: x86: Add 'fault_is_private' x86 op Michael Roth
@ 2023-02-20 18:37 ` Michael Roth
  2023-03-18  4:56   ` Isaku Yamahata
  2023-02-20 18:37 ` [PATCH RFC v8 03/56] KVM: x86: Add platform hooks for private memory invalidations Michael Roth
                   ` (55 subsequent siblings)
  57 siblings, 1 reply; 147+ messages in thread
From: Michael Roth @ 2023-02-20 18:37 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania

This callback will do any platform-specific handling needed for
converting pages between shared/private.

Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 arch/x86/include/asm/kvm-x86-ops.h |  1 +
 arch/x86/include/asm/kvm_host.h    |  2 ++
 arch/x86/kvm/mmu/mmu.c             | 13 +++++++++++++
 include/linux/kvm_host.h           |  4 ++++
 virt/kvm/kvm_main.c                | 29 +++++++++++++++++++++++++++++
 5 files changed, 49 insertions(+)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index 72183da010b8..a8aaf532c2ab 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -132,6 +132,7 @@ KVM_X86_OP(complete_emulated_msr)
 KVM_X86_OP(vcpu_deliver_sipi_vector)
 KVM_X86_OP_OPTIONAL_RET0(vcpu_get_apicv_inhibit_reasons);
 KVM_X86_OP_OPTIONAL_RET0(fault_is_private);
+KVM_X86_OP_OPTIONAL_RET0(update_mem_attr)
 
 #undef KVM_X86_OP
 #undef KVM_X86_OP_OPTIONAL
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index f856d689dda0..2da3fb2d5d1b 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1644,6 +1644,8 @@ struct kvm_x86_ops {
 	void (*load_mmu_pgd)(struct kvm_vcpu *vcpu, hpa_t root_hpa,
 			     int root_level);
 	bool (*fault_is_private)(struct kvm *kvm, gpa_t gpa, u64 error_code, bool *private_fault);
+	int (*update_mem_attr)(struct kvm_memory_slot *slot, unsigned int attr,
+			       gfn_t start, gfn_t end);
 
 	bool (*has_wbinvd_exit)(void);
 
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index fb3f34b7391c..053bd77bbf52 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -7251,4 +7251,17 @@ void kvm_arch_set_memory_attributes(struct kvm *kvm,
 		linfo_update_mixed(gfn, slot, level, mixed);
 	}
 }
+
+void kvm_arch_post_set_memory_attributes(struct kvm *kvm,
+					 struct kvm_memory_slot *slot,
+					 unsigned long attrs,
+					 gfn_t start, gfn_t end)
+{
+	int ret;
+
+	ret = static_call(kvm_x86_update_mem_attr)(slot, attrs, start, end);
+	if (ret)
+		pr_warn_ratelimited("Failed to update GFN range 0x%llx-0x%llx with attributes 0x%lx. Ret: %d\n",
+				    start, end, attrs, ret);
+}
 #endif
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index fdc59479b3e2..d200b8f45583 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -2330,6 +2330,10 @@ void kvm_arch_set_memory_attributes(struct kvm *kvm,
 				    struct kvm_memory_slot *slot,
 				    unsigned long attrs,
 				    gfn_t start, gfn_t end);
+void kvm_arch_post_set_memory_attributes(struct kvm *kvm,
+					 struct kvm_memory_slot *slot,
+					 unsigned long attrs,
+					 gfn_t start, gfn_t end);
 
 static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
 {
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index b68574ff6c30..8ec985f1c57d 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2561,6 +2561,32 @@ static void kvm_mem_attrs_changed(struct kvm *kvm, unsigned long attrs,
 		kvm_flush_remote_tlbs(kvm);
 }
 
+static void kvm_post_mem_attrs_changed(struct kvm *kvm, unsigned long attrs,
+				       gfn_t start_orig, gfn_t end_orig)
+{
+	struct kvm_memory_slot *slot;
+	struct kvm_memslots *slots;
+	struct kvm_memslot_iter iter;
+	int i;
+
+	for (i = 0; i < kvm_arch_nr_memslot_as_ids(kvm); i++) {
+		slots = __kvm_memslots(kvm, i);
+
+		kvm_for_each_memslot_in_gfn_range(&iter, slots, start_orig, end_orig) {
+			gfn_t start, end;
+
+			slot = iter.slot;
+			start = max(start_orig, slot->base_gfn);
+			end = min(end_orig, slot->base_gfn + slot->npages);
+
+			if (start >= end)
+				continue;
+
+			kvm_arch_post_set_memory_attributes(kvm, slot, attrs, start, end);
+		}
+	}
+}
+
 static int kvm_vm_ioctl_set_mem_attributes(struct kvm *kvm,
 					   struct kvm_memory_attributes *attrs)
 {
@@ -2602,6 +2628,9 @@ static int kvm_vm_ioctl_set_mem_attributes(struct kvm *kvm,
 	kvm_mmu_invalidate_end(kvm);
 	KVM_MMU_UNLOCK(kvm);
 
+	if (i > start)
+		kvm_post_mem_attrs_changed(kvm, attrs->attributes, start, i);
+
 	mutex_unlock(&kvm->slots_lock);
 
 	attrs->address = i << PAGE_SHIFT;
-- 
2.25.1



* [PATCH RFC v8 03/56] KVM: x86: Add platform hooks for private memory invalidations
  2023-02-20 18:37 [PATCH RFC v8 00/56] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
  2023-02-20 18:37 ` [PATCH RFC v8 01/56] KVM: x86: Add 'fault_is_private' x86 op Michael Roth
  2023-02-20 18:37 ` [PATCH RFC v8 02/56] KVM: x86: Add 'update_mem_attr' " Michael Roth
@ 2023-02-20 18:37 ` Michael Roth
  2023-03-18  5:13   ` Isaku Yamahata
  2023-02-20 18:37 ` [PATCH RFC v8 04/56] KVM: Add HVA range operator Michael Roth
                   ` (54 subsequent siblings)
  57 siblings, 1 reply; 147+ messages in thread
From: Michael Roth @ 2023-02-20 18:37 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania

In some cases, like with SEV-SNP, guest memory needs to be updated in a
platform-specific manner before it can be safely freed back to the host.
Add hooks to wire up handling of this sort to the invalidation notifiers
for restricted memory.

Also issue invalidations of all allocated pages during notifier/memslot
unbinding so that the pages are not left in an unusable state when
they eventually get freed back to the host upon FD release.

Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 arch/x86/include/asm/kvm-x86-ops.h |  1 +
 arch/x86/include/asm/kvm_host.h    |  1 +
 arch/x86/kvm/mmu/mmu.c             |  5 +++++
 include/linux/kvm_host.h           |  3 +++
 mm/restrictedmem.c                 | 12 +++++++++++-
 virt/kvm/kvm_main.c                | 12 +++++++++++-
 6 files changed, 32 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index a8aaf532c2ab..6a885f024a00 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -133,6 +133,7 @@ KVM_X86_OP(vcpu_deliver_sipi_vector)
 KVM_X86_OP_OPTIONAL_RET0(vcpu_get_apicv_inhibit_reasons);
 KVM_X86_OP_OPTIONAL_RET0(fault_is_private);
 KVM_X86_OP_OPTIONAL_RET0(update_mem_attr)
+KVM_X86_OP_OPTIONAL(invalidate_restricted_mem)
 
 #undef KVM_X86_OP
 #undef KVM_X86_OP_OPTIONAL
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 2da3fb2d5d1b..37c92412035f 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1646,6 +1646,7 @@ struct kvm_x86_ops {
 	bool (*fault_is_private)(struct kvm *kvm, gpa_t gpa, u64 error_code, bool *private_fault);
 	int (*update_mem_attr)(struct kvm_memory_slot *slot, unsigned int attr,
 			       gfn_t start, gfn_t end);
+	void (*invalidate_restricted_mem)(struct kvm_memory_slot *slot, gfn_t start, gfn_t end);
 
 	bool (*has_wbinvd_exit)(void);
 
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 053bd77bbf52..360af0c9997e 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -7264,4 +7264,9 @@ void kvm_arch_post_set_memory_attributes(struct kvm *kvm,
 		pr_warn_ratelimited("Failed to update GFN range 0x%llx-0x%llx with attributes 0x%lx. Ret: %d\n",
 				    start, end, attrs, ret);
 }
+
+void kvm_arch_invalidate_restricted_mem(struct kvm_memory_slot *slot, gfn_t start, gfn_t end)
+{
+	static_call_cond(kvm_x86_invalidate_restricted_mem)(slot, start, end);
+}
 #endif
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index d200b8f45583..4d542060cd93 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -2341,6 +2341,9 @@ static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
 	       kvm_get_memory_attributes(kvm, gfn) & KVM_MEMORY_ATTRIBUTE_PRIVATE;
 
 }
+
+void kvm_arch_invalidate_restricted_mem(struct kvm_memory_slot *slot, gfn_t start, gfn_t end);
+
 #else
 static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
 {
diff --git a/mm/restrictedmem.c b/mm/restrictedmem.c
index fd6f3c66033f..c8353c592cfe 100644
--- a/mm/restrictedmem.c
+++ b/mm/restrictedmem.c
@@ -17,7 +17,7 @@ struct restrictedmem {
 
 static int restrictedmem_release(struct inode *inode, struct file *file)
 {
-	struct restrictedmem *rm = inode->i_mapping->private_data;
+	struct restrictedmem *rm = file->f_mapping->private_data;
 
 	xa_destroy(&rm->bindings);
 	fput(rm->memfd);
@@ -305,10 +305,20 @@ void restrictedmem_unbind(struct file *file, pgoff_t start, pgoff_t end,
 			  struct restrictedmem_notifier *notifier)
 {
 	struct restrictedmem *rm = file->f_mapping->private_data;
+	unsigned long index;
 
+	pr_debug("%s: unregistering notifier, invalidating page offsets 0x%lx-0x%lx\n",
+		 __func__, start, end);
 	down_write(&rm->lock);
+
+	xa_for_each_range(&rm->bindings, index, notifier, start, end)
+		notifier->ops->invalidate_start(notifier, start, end);
+	xa_for_each_range(&rm->bindings, index, notifier, start, end)
+		notifier->ops->invalidate_end(notifier, start, end);
+
 	xa_store_range(&rm->bindings, start, end, NULL, GFP_KERNEL);
 	synchronize_rcu();
+
 	up_write(&rm->lock);
 }
 EXPORT_SYMBOL_GPL(restrictedmem_unbind);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 8ec985f1c57d..f7e00593cc5d 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -960,8 +960,15 @@ static void kvm_restrictedmem_invalidate_begin(struct restrictedmem_notifier *no
 	struct kvm *kvm = slot->kvm;
 	int idx;
 
-	if (restrictedmem_get_gfn_range(slot, start, end, &gfn_range))
+	if (restrictedmem_get_gfn_range(slot, start, end, &gfn_range)) {
+		pr_debug("%s: Invalidation skipped, slot: %d, start: 0x%lx, end: 0x%lx, restrictedmem.index: 0x%lx\n",
+			 __func__, slot->id, start, end, slot->restrictedmem.index);
 		return;
+	}
+
+	pr_debug("%s: slot: %d, start: 0x%lx, end: 0x%lx, restrictedmem.index: 0x%lx, gfn_start: 0x%llx, gfn_end: 0x%llx\n",
+		 __func__, slot->id, start, end, slot->restrictedmem.index, gfn_range.start,
+		 gfn_range.end);
 
 	idx = srcu_read_lock(&kvm->srcu);
 	KVM_MMU_LOCK(kvm);
@@ -972,7 +979,10 @@ static void kvm_restrictedmem_invalidate_begin(struct restrictedmem_notifier *no
 		kvm_flush_remote_tlbs(kvm);
 
 	KVM_MMU_UNLOCK(kvm);
+
 	srcu_read_unlock(&kvm->srcu, idx);
+
+	kvm_arch_invalidate_restricted_mem(slot, gfn_range.start, gfn_range.end);
 }
 
 static void kvm_restrictedmem_invalidate_end(struct restrictedmem_notifier *notifier,
-- 
2.25.1



* [PATCH RFC v8 04/56] KVM: Add HVA range operator
  2023-02-20 18:37 [PATCH RFC v8 00/56] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (2 preceding siblings ...)
  2023-02-20 18:37 ` [PATCH RFC v8 03/56] KVM: x86: Add platform hooks for private memory invalidations Michael Roth
@ 2023-02-20 18:37 ` Michael Roth
  2023-02-20 21:37   ` Zhi Wang
  2023-02-20 18:37 ` [PATCH RFC v8 05/56] KVM: SEV: Require KVM_PROTECTED_VM when AMD_MEM_ENCRYPT is enabled Michael Roth
                   ` (53 subsequent siblings)
  57 siblings, 1 reply; 147+ messages in thread
From: Michael Roth @ 2023-02-20 18:37 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	Vishal Annapurve

From: Vishal Annapurve <vannapurve@google.com>

Introduce an HVA range operator so that other KVM subsystems
can operate on an HVA range.

Signed-off-by: Vishal Annapurve <vannapurve@google.com>
[mdr: minor checkpatch alignment fixups]
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 include/linux/kvm_host.h |  6 +++++
 virt/kvm/kvm_main.c      | 48 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 54 insertions(+)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 4d542060cd93..c615650ed256 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1402,6 +1402,12 @@ void kvm_mmu_invalidate_begin(struct kvm *kvm);
 void kvm_mmu_invalidate_range_add(struct kvm *kvm, gfn_t start, gfn_t end);
 void kvm_mmu_invalidate_end(struct kvm *kvm);
 
+typedef int (*kvm_hva_range_op_t)(struct kvm *kvm,
+				struct kvm_gfn_range *range, void *data);
+
+int kvm_vm_do_hva_range_op(struct kvm *kvm, unsigned long hva_start,
+			   unsigned long hva_end, kvm_hva_range_op_t handler, void *data);
+
 long kvm_arch_dev_ioctl(struct file *filp,
 			unsigned int ioctl, unsigned long arg);
 long kvm_arch_vcpu_ioctl(struct file *filp,
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index f7e00593cc5d..4ccd655dd5af 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -642,6 +642,54 @@ static __always_inline int __kvm_handle_hva_range(struct kvm *kvm,
 	return (int)ret;
 }
 
+int kvm_vm_do_hva_range_op(struct kvm *kvm, unsigned long hva_start,
+			   unsigned long hva_end, kvm_hva_range_op_t handler, void *data)
+{
+	int ret = 0;
+	struct kvm_gfn_range gfn_range;
+	struct kvm_memory_slot *slot;
+	struct kvm_memslots *slots;
+	int i, idx;
+
+	if (WARN_ON_ONCE(hva_end <= hva_start))
+		return -EINVAL;
+
+	idx = srcu_read_lock(&kvm->srcu);
+
+	for (i = 0; i < kvm_arch_nr_memslot_as_ids(kvm); i++) {
+		struct interval_tree_node *node;
+
+		slots = __kvm_memslots(kvm, i);
+		kvm_for_each_memslot_in_hva_range(node, slots,
+						  hva_start, hva_end - 1) {
+			unsigned long start, end;
+
+			slot = container_of(node, struct kvm_memory_slot,
+					    hva_node[slots->node_idx]);
+			start = max(hva_start, slot->userspace_addr);
+			end = min(hva_end, slot->userspace_addr +
+						  (slot->npages << PAGE_SHIFT));
+
+			/*
+			 * {gfn(page) | page intersects with [hva_start, hva_end)} =
+			 * {gfn_start, gfn_start+1, ..., gfn_end-1}.
+			 */
+			gfn_range.start = hva_to_gfn_memslot(start, slot);
+			gfn_range.end = hva_to_gfn_memslot(end + PAGE_SIZE - 1, slot);
+			gfn_range.slot = slot;
+
+			ret = handler(kvm, &gfn_range, data);
+			if (ret)
+				goto e_ret;
+		}
+	}
+
+e_ret:
+	srcu_read_unlock(&kvm->srcu, idx);
+
+	return ret;
+}
+
 static __always_inline int kvm_handle_hva_range(struct mmu_notifier *mn,
 						unsigned long start,
 						unsigned long end,
-- 
2.25.1



* [PATCH RFC v8 05/56] KVM: SEV: Require KVM_PROTECTED_VM when AMD_MEM_ENCRYPT is enabled
  2023-02-20 18:37 [PATCH RFC v8 00/56] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (3 preceding siblings ...)
  2023-02-20 18:37 ` [PATCH RFC v8 04/56] KVM: Add HVA range operator Michael Roth
@ 2023-02-20 18:37 ` Michael Roth
  2023-02-20 18:37 ` [PATCH RFC v8 06/56] KVM: Split out memory attribute xarray updates to helper function Michael Roth
                   ` (52 subsequent siblings)
  57 siblings, 0 replies; 147+ messages in thread
From: Michael Roth @ 2023-02-20 18:37 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania

AMD_MEM_ENCRYPT implies SEV support, which now relies on support
provided by the KVM_PROTECTED_VM config option.

An argument can be made that SEV running in non-protected-VM-mode is
still possible, and so this should be configurable, but AMD_MEM_ENCRYPT
will also imply SEV-SNP, for which KVM_PROTECTED_VM is required in all
cases.

Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 arch/x86/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 67745ceab0db..f0d8f6bbc1a7 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1546,6 +1546,7 @@ config AMD_MEM_ENCRYPT
 	select INSTRUCTION_DECODER
 	select ARCH_HAS_CC_PLATFORM
 	select X86_MEM_ENCRYPT
+	select KVM_PROTECTED_VM
 	help
 	  Say yes to enable support for the encryption of system memory.
 	  This requires an AMD processor that supports Secure Memory
-- 
2.25.1



* [PATCH RFC v8 06/56] KVM: Split out memory attribute xarray updates to helper function
  2023-02-20 18:37 [PATCH RFC v8 00/56] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (4 preceding siblings ...)
  2023-02-20 18:37 ` [PATCH RFC v8 05/56] KVM: SEV: Require KVM_PROTECTED_VM when AMD_MEM_ENCRYPT is enabled Michael Roth
@ 2023-02-20 18:37 ` Michael Roth
  2023-02-20 18:37 ` [PATCH RFC v8 07/56] KVM: SEV: Populate private memory fd during LAUNCH_UPDATE_DATA Michael Roth
                   ` (51 subsequent siblings)
  57 siblings, 0 replies; 147+ messages in thread
From: Michael Roth @ 2023-02-20 18:37 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania

This will be useful to other callers that need to update memory
attributes for things like setting up the initial private memory payload
for a guest.

Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 include/linux/kvm_host.h |  1 +
 virt/kvm/kvm_main.c      | 26 ++++++++++++++++++--------
 2 files changed, 19 insertions(+), 8 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index c615650ed256..57d56cd09a61 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -993,6 +993,7 @@ int kvm_init(unsigned vcpu_size, unsigned vcpu_align, struct module *module);
 void kvm_exit(void);
 
 void kvm_get_kvm(struct kvm *kvm);
+int kvm_vm_set_region_attr(struct kvm *kvm, gfn_t start, gfn_t end, u64 attributes);
 bool kvm_get_kvm_safe(struct kvm *kvm);
 void kvm_put_kvm(struct kvm *kvm);
 bool file_is_kvm(struct file *file);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 4ccd655dd5af..c740b56d6ba4 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2645,12 +2645,28 @@ static void kvm_post_mem_attrs_changed(struct kvm *kvm, unsigned long attrs,
 	}
 }
 
+int kvm_vm_set_region_attr(struct kvm *kvm, gfn_t start, gfn_t end,
+			   u64 attributes)
+{
+	gfn_t index;
+	void *entry;
+
+	entry = attributes ? xa_mk_value(attributes) : NULL;
+
+	for (index = start; index < end; index++)
+		if (xa_err(xa_store(&kvm->mem_attr_array, index, entry,
+				    GFP_KERNEL_ACCOUNT)))
+			break;
+
+	return index;
+}
+EXPORT_SYMBOL_GPL(kvm_vm_set_region_attr);
+
 static int kvm_vm_ioctl_set_mem_attributes(struct kvm *kvm,
 					   struct kvm_memory_attributes *attrs)
 {
 	gfn_t start, end;
 	unsigned long i;
-	void *entry;
 
 	/* flags is currently not used. */
 	if (attrs->flags)
@@ -2665,8 +2681,6 @@ static int kvm_vm_ioctl_set_mem_attributes(struct kvm *kvm,
 	start = attrs->address >> PAGE_SHIFT;
 	end = (attrs->address + attrs->size - 1 + PAGE_SIZE) >> PAGE_SHIFT;
 
-	entry = attrs->attributes ? xa_mk_value(attrs->attributes) : NULL;
-
 	mutex_lock(&kvm->slots_lock);
 
 	KVM_MMU_LOCK(kvm);
@@ -2674,11 +2688,7 @@ static int kvm_vm_ioctl_set_mem_attributes(struct kvm *kvm,
 	kvm_mmu_invalidate_range_add(kvm, start, end);
 	KVM_MMU_UNLOCK(kvm);
 
-	for (i = start; i < end; i++)
-		if (xa_err(xa_store(&kvm->mem_attr_array, i, entry,
-				    GFP_KERNEL_ACCOUNT)))
-			break;
-
+	i = kvm_vm_set_region_attr(kvm, start, end, attrs->attributes);
 
 	KVM_MMU_LOCK(kvm);
 	if (i > start)
-- 
2.25.1



* [PATCH RFC v8 07/56] KVM: SEV: Populate private memory fd during LAUNCH_UPDATE_DATA
  2023-02-20 18:37 [PATCH RFC v8 00/56] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (5 preceding siblings ...)
  2023-02-20 18:37 ` [PATCH RFC v8 06/56] KVM: Split out memory attribute xarray updates to helper function Michael Roth
@ 2023-02-20 18:37 ` Michael Roth
  2023-02-20 18:37 ` [PATCH RFC v8 08/56] KVM: SEV: Rename sev_{pin,unpin}_memory Michael Roth
                   ` (50 subsequent siblings)
  57 siblings, 0 replies; 147+ messages in thread
From: Michael Roth @ 2023-02-20 18:37 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	Vishal Annapurve

From: Vishal Annapurve <vannapurve@google.com>

This change adds handling of HVA ranges to copy contents
to private memory while doing SEV LAUNCH_UPDATE_DATA.

The mem_attr array is updated during LAUNCH_UPDATE_DATA to ensure
that encrypted memory is marked as private.

Signed-off-by: Vishal Annapurve <vannapurve@google.com>
[mdr: Use gfn_to_hva_memslot_prot() for shared GFN handler to deal with
 read-only slots for ROMs. Split kvm_vm_set_region_attr into separate
 patch.]
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 arch/x86/kvm/svm/sev.c | 103 +++++++++++++++++++++++++++++++++++++----
 virt/kvm/kvm_main.c    |   2 +
 2 files changed, 96 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 273cba809328..fad7fb34ef9e 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -494,23 +494,26 @@ static unsigned long get_num_contig_pages(unsigned long idx,
 	return pages;
 }
 
-static int sev_launch_update_data(struct kvm *kvm, struct kvm_sev_cmd *argp)
+static int sev_launch_update_shared_gfn_handler(struct kvm *kvm,
+						struct kvm_gfn_range *range,
+						struct kvm_sev_cmd *argp)
 {
 	unsigned long vaddr, vaddr_end, next_vaddr, npages, pages, size, i;
 	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
-	struct kvm_sev_launch_update_data params;
 	struct sev_data_launch_update_data data;
 	struct page **inpages;
 	int ret;
 
-	if (!sev_guest(kvm))
-		return -ENOTTY;
-
-	if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data, sizeof(params)))
-		return -EFAULT;
+	vaddr = gfn_to_hva_memslot_prot(range->slot, range->start, NULL);
+	pr_debug("%s: shared GFN: %llx, slot.id: %d, slot.base_gfn: %llx, slot.userspace_addr: %lx, slot.flags: %x, vaddr: %lx\n",
+		 __func__, range->start, range->slot->id, range->slot->base_gfn,
+		 range->slot->userspace_addr, range->slot->flags, vaddr);
+	if (kvm_is_error_hva(vaddr)) {
+		pr_err("vaddr is erroneous 0x%lx\n", vaddr);
+		return -EINVAL;
+	}
 
-	vaddr = params.uaddr;
-	size = params.len;
+	size = (range->end - range->start) << PAGE_SHIFT;
 	vaddr_end = vaddr + size;
 
 	/* Lock the user memory. */
@@ -562,6 +565,88 @@ static int sev_launch_update_data(struct kvm *kvm, struct kvm_sev_cmd *argp)
 	return ret;
 }
 
+static int sev_launch_update_priv_gfn_handler(struct kvm *kvm,
+					      struct kvm_gfn_range *range,
+					      struct kvm_sev_cmd *argp)
+{
+	struct sev_data_launch_update_data data;
+	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+	gfn_t gfn;
+	kvm_pfn_t pfn;
+	struct kvm_memory_slot *memslot = range->slot;
+	int ret = 0;
+
+	data.reserved = 0;
+	data.handle = sev->handle;
+
+	for (gfn = range->start; gfn < range->end; gfn++) {
+		int order;
+		void *kvaddr;
+
+		ret = kvm_restrictedmem_get_pfn(memslot, gfn, &pfn, &order);
+		if (ret)
+			goto e_ret;
+
+		kvaddr = pfn_to_kaddr(pfn);
+		if (!virt_addr_valid(kvaddr)) {
+			pr_debug("%s: Invalid kvaddr 0x%llx\n", __func__, (uint64_t)kvaddr);
+			ret = -EINVAL;
+			goto e_ret;
+		}
+
+		ret = kvm_read_guest_page(kvm, gfn, kvaddr, 0, PAGE_SIZE);
+		if (ret) {
+			pr_debug("%s: Guest read failed 0x%x\n", __func__, ret);
+			goto e_ret;
+		}
+
+		if (!cpu_feature_enabled(X86_FEATURE_SME_COHERENT))
+			clflush_cache_range(kvaddr, PAGE_SIZE);
+
+		data.len = PAGE_SIZE;
+		data.address = __sme_set(pfn << PAGE_SHIFT);
+		ret = sev_issue_cmd(kvm, SEV_CMD_LAUNCH_UPDATE_DATA, &data, &argp->error);
+		if (ret)
+			goto e_ret;
+		kvm_release_pfn_clean(pfn);
+	}
+
+	/*
+	 * Memory attribute updates via KVM_SET_MEMORY_ATTRIBUTES are serialized
+	 * via kvm->slots_lock, so use the same protocol for updating them here.
+	 */
+	mutex_lock(&kvm->slots_lock);
+	kvm_vm_set_region_attr(kvm, range->start, range->end, KVM_MEMORY_ATTRIBUTE_PRIVATE);
+	mutex_unlock(&kvm->slots_lock);
+e_ret:
+	return ret;
+}
+
+static int sev_launch_update_gfn_handler(struct kvm *kvm, struct kvm_gfn_range *range,
+					 void *data)
+{
+	struct kvm_sev_cmd *argp = (struct kvm_sev_cmd *)data;
+
+	if (kvm_slot_can_be_private(range->slot))
+		return sev_launch_update_priv_gfn_handler(kvm, range, argp);
+
+	return sev_launch_update_shared_gfn_handler(kvm, range, argp);
+}
+
+static int sev_launch_update_data(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+	struct kvm_sev_launch_update_data params;
+
+	if (!sev_guest(kvm))
+		return -ENOTTY;
+
+	if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data, sizeof(params)))
+		return -EFAULT;
+
+	return kvm_vm_do_hva_range_op(kvm, params.uaddr, params.uaddr + params.len,
+		sev_launch_update_gfn_handler, argp);
+}
+
 static int sev_es_sync_vmsa(struct vcpu_svm *svm)
 {
 	struct sev_es_save_area *save = svm->sev_es.vmsa;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index c740b56d6ba4..003cb199ba4b 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -689,6 +689,7 @@ int kvm_vm_do_hva_range_op(struct kvm *kvm, unsigned long hva_start,
 
 	return ret;
 }
+EXPORT_SYMBOL_GPL(kvm_vm_do_hva_range_op);
 
 static __always_inline int kvm_handle_hva_range(struct mmu_notifier *mn,
 						unsigned long start,
@@ -2850,6 +2851,7 @@ unsigned long gfn_to_hva_memslot_prot(struct kvm_memory_slot *slot,
 
 	return hva;
 }
+EXPORT_SYMBOL_GPL(gfn_to_hva_memslot_prot);
 
 unsigned long gfn_to_hva_prot(struct kvm *kvm, gfn_t gfn, bool *writable)
 {
-- 
2.25.1



* [PATCH RFC v8 08/56] KVM: SEV: Rename sev_{pin,unpin}_memory
  2023-02-20 18:37 [PATCH RFC v8 00/56] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (6 preceding siblings ...)
  2023-02-20 18:37 ` [PATCH RFC v8 07/56] KVM: SEV: Populate private memory fd during LAUNCH_UPDATE_DATA Michael Roth
@ 2023-02-20 18:37 ` Michael Roth
  2023-03-03 14:00   ` Vlastimil Babka
  2023-02-20 18:38 ` [PATCH RFC v8 09/56] KVM: SEV: Handle memory backed by restricted memfd Michael Roth
                   ` (49 subsequent siblings)
  57 siblings, 1 reply; 147+ messages in thread
From: Michael Roth @ 2023-02-20 18:37 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	Nikunj A Dadhania

From: Nikunj A Dadhania <nikunj@amd.com>

Rename sev_{pin|unpin}_memory to sev_memory_{get|put}_pages. Apart
from pinning the pages, sev_pin_memory also populates the pages array,
which is used by its callers. SEV guests using restricted memfd do not
need to pin the memory but still require the pages array to be populated.
Rename the functions appropriately.

No functional change intended.

Signed-off-by: Nikunj A Dadhania <nikunj@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 arch/x86/kvm/svm/sev.c | 62 ++++++++++++++++++++++--------------------
 1 file changed, 33 insertions(+), 29 deletions(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index fad7fb34ef9e..523c78bbff3f 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -383,9 +383,13 @@ static int sev_launch_start(struct kvm *kvm, struct kvm_sev_cmd *argp)
 	return ret;
 }
 
-static struct page **sev_pin_memory(struct kvm *kvm, unsigned long uaddr,
-				    unsigned long ulen, unsigned long *n,
-				    int write)
+/*
+ * Legacy SEV guest pin the pages and return the array populated with pinned
+ * pages.
+ */
+static struct page **sev_memory_get_pages(struct kvm *kvm, unsigned long uaddr,
+					  unsigned long ulen, unsigned long *n,
+					  int write)
 {
 	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
 	unsigned long npages, size;
@@ -446,8 +450,8 @@ static struct page **sev_pin_memory(struct kvm *kvm, unsigned long uaddr,
 	return ERR_PTR(ret);
 }
 
-static void sev_unpin_memory(struct kvm *kvm, struct page **pages,
-			     unsigned long npages)
+static void sev_memory_put_pages(struct kvm *kvm, struct page **pages,
+				 unsigned long npages)
 {
 	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
 
@@ -517,7 +521,7 @@ static int sev_launch_update_shared_gfn_handler(struct kvm *kvm,
 	vaddr_end = vaddr + size;
 
 	/* Lock the user memory. */
-	inpages = sev_pin_memory(kvm, vaddr, size, &npages, 1);
+	inpages = sev_memory_get_pages(kvm, vaddr, size, &npages, 1);
 	if (IS_ERR(inpages))
 		return PTR_ERR(inpages);
 
@@ -548,20 +552,20 @@ static int sev_launch_update_shared_gfn_handler(struct kvm *kvm,
 		data.address = __sme_page_pa(inpages[i]) + offset;
 		ret = sev_issue_cmd(kvm, SEV_CMD_LAUNCH_UPDATE_DATA, &data, &argp->error);
 		if (ret)
-			goto e_unpin;
+			goto e_put_pages;
 
 		size -= len;
 		next_vaddr = vaddr + len;
 	}
 
-e_unpin:
+e_put_pages:
 	/* content of memory is updated, mark pages dirty */
 	for (i = 0; i < npages; i++) {
 		set_page_dirty_lock(inpages[i]);
 		mark_page_accessed(inpages[i]);
 	}
 	/* unlock the user pages */
-	sev_unpin_memory(kvm, inpages, npages);
+	sev_memory_put_pages(kvm, inpages, npages);
 	return ret;
 }
 
@@ -1028,13 +1032,13 @@ static int sev_dbg_crypt(struct kvm *kvm, struct kvm_sev_cmd *argp, bool dec)
 		int len, s_off, d_off;
 
 		/* lock userspace source and destination page */
-		src_p = sev_pin_memory(kvm, vaddr & PAGE_MASK, PAGE_SIZE, &n, 0);
+		src_p = sev_memory_get_pages(kvm, vaddr & PAGE_MASK, PAGE_SIZE, &n, 0);
 		if (IS_ERR(src_p))
 			return PTR_ERR(src_p);
 
-		dst_p = sev_pin_memory(kvm, dst_vaddr & PAGE_MASK, PAGE_SIZE, &n, 1);
+		dst_p = sev_memory_get_pages(kvm, dst_vaddr & PAGE_MASK, PAGE_SIZE, &n, 1);
 		if (IS_ERR(dst_p)) {
-			sev_unpin_memory(kvm, src_p, n);
+			sev_memory_put_pages(kvm, src_p, n);
 			return PTR_ERR(dst_p);
 		}
 
@@ -1068,8 +1072,8 @@ static int sev_dbg_crypt(struct kvm *kvm, struct kvm_sev_cmd *argp, bool dec)
 						     (void __user *)dst_vaddr,
 						     len, &argp->error);
 
-		sev_unpin_memory(kvm, src_p, n);
-		sev_unpin_memory(kvm, dst_p, n);
+		sev_memory_put_pages(kvm, src_p, n);
+		sev_memory_put_pages(kvm, dst_p, n);
 
 		if (ret)
 			goto err;
@@ -1098,7 +1102,7 @@ static int sev_launch_secret(struct kvm *kvm, struct kvm_sev_cmd *argp)
 	if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data, sizeof(params)))
 		return -EFAULT;
 
-	pages = sev_pin_memory(kvm, params.guest_uaddr, params.guest_len, &n, 1);
+	pages = sev_memory_get_pages(kvm, params.guest_uaddr, params.guest_len, &n, 1);
 	if (IS_ERR(pages))
 		return PTR_ERR(pages);
 
@@ -1114,7 +1118,7 @@ static int sev_launch_secret(struct kvm *kvm, struct kvm_sev_cmd *argp)
 	 */
 	if (get_num_contig_pages(0, pages, n) != n) {
 		ret = -EINVAL;
-		goto e_unpin_memory;
+		goto e_put_pages;
 	}
 
 	memset(&data, 0, sizeof(data));
@@ -1126,7 +1130,7 @@ static int sev_launch_secret(struct kvm *kvm, struct kvm_sev_cmd *argp)
 	blob = psp_copy_user_blob(params.trans_uaddr, params.trans_len);
 	if (IS_ERR(blob)) {
 		ret = PTR_ERR(blob);
-		goto e_unpin_memory;
+		goto e_put_pages;
 	}
 
 	data.trans_address = __psp_pa(blob);
@@ -1147,13 +1151,13 @@ static int sev_launch_secret(struct kvm *kvm, struct kvm_sev_cmd *argp)
 
 e_free_blob:
 	kfree(blob);
-e_unpin_memory:
+e_put_pages:
 	/* content of memory is updated, mark pages dirty */
 	for (i = 0; i < n; i++) {
 		set_page_dirty_lock(pages[i]);
 		mark_page_accessed(pages[i]);
 	}
-	sev_unpin_memory(kvm, pages, n);
+	sev_memory_put_pages(kvm, pages, n);
 	return ret;
 }
 
@@ -1383,8 +1387,8 @@ static int sev_send_update_data(struct kvm *kvm, struct kvm_sev_cmd *argp)
 		return -EINVAL;
 
 	/* Pin guest memory */
-	guest_page = sev_pin_memory(kvm, params.guest_uaddr & PAGE_MASK,
-				    PAGE_SIZE, &n, 0);
+	guest_page = sev_memory_get_pages(kvm, params.guest_uaddr & PAGE_MASK,
+					  PAGE_SIZE, &n, 0);
 	if (IS_ERR(guest_page))
 		return PTR_ERR(guest_page);
 
@@ -1392,7 +1396,7 @@ static int sev_send_update_data(struct kvm *kvm, struct kvm_sev_cmd *argp)
 	ret = -ENOMEM;
 	hdr = kzalloc(params.hdr_len, GFP_KERNEL_ACCOUNT);
 	if (!hdr)
-		goto e_unpin;
+		goto e_put_pages;
 
 	trans_data = kzalloc(params.trans_len, GFP_KERNEL_ACCOUNT);
 	if (!trans_data)
@@ -1431,8 +1435,8 @@ static int sev_send_update_data(struct kvm *kvm, struct kvm_sev_cmd *argp)
 	kfree(trans_data);
 e_free_hdr:
 	kfree(hdr);
-e_unpin:
-	sev_unpin_memory(kvm, guest_page, n);
+e_put_pages:
+	sev_memory_put_pages(kvm, guest_page, n);
 
 	return ret;
 }
@@ -1579,8 +1583,8 @@ static int sev_receive_update_data(struct kvm *kvm, struct kvm_sev_cmd *argp)
 	data.trans_len = params.trans_len;
 
 	/* Pin guest memory */
-	guest_page = sev_pin_memory(kvm, params.guest_uaddr & PAGE_MASK,
-				    PAGE_SIZE, &n, 1);
+	guest_page = sev_memory_get_pages(kvm, params.guest_uaddr & PAGE_MASK,
+					  PAGE_SIZE, &n, 1);
 	if (IS_ERR(guest_page)) {
 		ret = PTR_ERR(guest_page);
 		goto e_free_trans;
@@ -1602,7 +1606,7 @@ static int sev_receive_update_data(struct kvm *kvm, struct kvm_sev_cmd *argp)
 	ret = sev_issue_cmd(kvm, SEV_CMD_RECEIVE_UPDATE_DATA, &data,
 				&argp->error);
 
-	sev_unpin_memory(kvm, guest_page, n);
+	sev_memory_put_pages(kvm, guest_page, n);
 
 e_free_trans:
 	kfree(trans);
@@ -2037,7 +2041,7 @@ int sev_mem_enc_register_region(struct kvm *kvm,
 		return -ENOMEM;
 
 	mutex_lock(&kvm->lock);
-	region->pages = sev_pin_memory(kvm, range->addr, range->size, &region->npages, 1);
+	region->pages = sev_memory_get_pages(kvm, range->addr, range->size, &region->npages, 1);
 	if (IS_ERR(region->pages)) {
 		ret = PTR_ERR(region->pages);
 		mutex_unlock(&kvm->lock);
@@ -2084,7 +2088,7 @@ find_enc_region(struct kvm *kvm, struct kvm_enc_region *range)
 static void __unregister_enc_region_locked(struct kvm *kvm,
 					   struct enc_region *region)
 {
-	sev_unpin_memory(kvm, region->pages, region->npages);
+	sev_memory_put_pages(kvm, region->pages, region->npages);
 	list_del(&region->list);
 	kfree(region);
 }
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 147+ messages in thread

* [PATCH RFC v8 09/56] KVM: SEV: Handle memory backed by restricted memfd
  2023-02-20 18:37 [PATCH RFC v8 00/56] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (7 preceding siblings ...)
  2023-02-20 18:37 ` [PATCH RFC v8 08/56] KVM: SEV: Rename sev_{pin,unpin}_memory Michael Roth
@ 2023-02-20 18:38 ` Michael Roth
  2023-03-03 14:05   ` Vlastimil Babka
  2023-02-20 18:38 ` [PATCH RFC v8 10/56] x86/cpufeatures: Add SEV-SNP CPU feature Michael Roth
                   ` (48 subsequent siblings)
  57 siblings, 1 reply; 147+ messages in thread
From: Michael Roth @ 2023-02-20 18:38 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	Nikunj A Dadhania

From: Nikunj A Dadhania <nikunj@amd.com>

Do not pin guest memory that is backed by a restrictedmem backend, as
pages in restrictedmem are already pinned. Instead, populate the
pages array for these guests using the already-pinned pages provided by
the restrictedmem backend.
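
For reference, here is a minimal caller-side sketch (not part of this
patch; variable names are placeholders and error handling is abbreviated)
showing that callers stay backend-agnostic, since sev_memory_get_pages()
handles both the pinned and the restrictedmem-backed cases:

  pages = sev_memory_get_pages(kvm, uaddr, size, &npages, 1);
  if (IS_ERR(pages))
          return PTR_ERR(pages);
  /* ... issue the SEV firmware command against pages[] ... */
  sev_memory_put_pages(kvm, pages, npages);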

Signed-off-by: Nikunj A Dadhania <nikunj@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 arch/x86/kvm/svm/sev.c | 68 +++++++++++++++++++++++++++++++++++-------
 1 file changed, 58 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 523c78bbff3f..ad9b29ff4590 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -383,9 +383,46 @@ static int sev_launch_start(struct kvm *kvm, struct kvm_sev_cmd *argp)
 	return ret;
 }
 
+static int sev_private_mem_get_pages_handler(struct kvm *kvm, struct kvm_gfn_range *range,
+					     void *data)
+{
+	struct kvm_memory_slot *memslot = range->slot;
+	struct page **pages = data;
+	int ret = 0, i = 0;
+	kvm_pfn_t pfn;
+	gfn_t gfn;
+
+	for (gfn = range->start; gfn < range->end; gfn++) {
+		int order;
+
+		ret = kvm_restrictedmem_get_pfn(memslot, gfn, &pfn, &order);
+		if (ret)
+			return ret;
+
+		if (is_error_noslot_pfn(pfn))
+			return -EFAULT;
+
+		pages[i++] = pfn_to_page(pfn);
+	}
+
+	return ret;
+}
+
+static int sev_private_mem_get_pages(struct kvm *kvm, unsigned long addr,
+				     unsigned long size, unsigned long npages,
+				     struct page **pages)
+{
+	return kvm_vm_do_hva_range_op(kvm, addr, addr + size,
+				      sev_private_mem_get_pages_handler, pages);
+}
+
 /*
  * Legacy SEV guests pin the pages and return the array populated with the
  * pinned pages.
+ *
+ * For SEV guests using a restricted memfd backend, the pages are already
+ * marked unmovable and unevictable. Populate the pages array for these
+ * guests using the restrictedmem get_pfn interface.
  */
 static struct page **sev_memory_get_pages(struct kvm *kvm, unsigned long uaddr,
 					  unsigned long ulen, unsigned long *n,
@@ -393,7 +430,7 @@ static struct page **sev_memory_get_pages(struct kvm *kvm, unsigned long uaddr,
 {
 	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
 	unsigned long npages, size;
-	int npinned;
+	int npinned = 0;
 	unsigned long locked, lock_limit;
 	struct page **pages;
 	unsigned long first, last;
@@ -429,16 +466,25 @@ static struct page **sev_memory_get_pages(struct kvm *kvm, unsigned long uaddr,
 	if (!pages)
 		return ERR_PTR(-ENOMEM);
 
-	/* Pin the user virtual address. */
-	npinned = pin_user_pages_fast(uaddr, npages, write ? FOLL_WRITE : 0, pages);
-	if (npinned != npages) {
-		pr_err("SEV: Failure locking %lu pages.\n", npages);
-		ret = -ENOMEM;
-		goto err;
+	if (kvm_arch_has_private_mem(kvm)) {
+		/* Get the PFN from memfile */
+		if (sev_private_mem_get_pages(kvm, uaddr, ulen, npages, pages)) {
+			pr_err("%s: ERROR: unable to find slot for uaddr %lx", __func__, uaddr);
+			ret = -ENOMEM;
+			goto err;
+		}
+	} else {
+		/* Pin the user virtual address. */
+		npinned = pin_user_pages_fast(uaddr, npages, write ? FOLL_WRITE : 0, pages);
+		if (npinned != npages) {
+			pr_err("SEV: Failure locking %lu pages.\n", npages);
+			ret = -ENOMEM;
+			goto err;
+		}
+		sev->pages_locked = locked;
 	}
 
 	*n = npages;
-	sev->pages_locked = locked;
 
 	return pages;
 
@@ -455,9 +501,11 @@ static void sev_memory_put_pages(struct kvm *kvm, struct page **pages,
 {
 	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
 
-	unpin_user_pages(pages, npages);
+	if (!kvm_arch_has_private_mem(kvm)) {
+		unpin_user_pages(pages, npages);
+		sev->pages_locked -= npages;
+	}
 	kvfree(pages);
-	sev->pages_locked -= npages;
 }
 
 static void sev_clflush_pages(struct page *pages[], unsigned long npages)
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 147+ messages in thread

* [PATCH RFC v8 10/56] x86/cpufeatures: Add SEV-SNP CPU feature
  2023-02-20 18:37 [PATCH RFC v8 00/56] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (8 preceding siblings ...)
  2023-02-20 18:38 ` [PATCH RFC v8 09/56] KVM: SEV: Handle memory backed by restricted memfd Michael Roth
@ 2023-02-20 18:38 ` Michael Roth
  2023-02-21 21:21   ` Sathyanarayanan Kuppuswamy
  2023-02-20 18:38 ` [PATCH RFC v8 11/56] x86/sev: Add the host SEV-SNP initialization support Michael Roth
                   ` (47 subsequent siblings)
  57 siblings, 1 reply; 147+ messages in thread
From: Michael Roth @ 2023-02-20 18:38 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	Brijesh Singh, Jarkko Sakkinen, Ashish Kalra

From: Brijesh Singh <brijesh.singh@amd.com>

Add CPU feature detection for Secure Encrypted Virtualization with
Secure Nested Paging. This feature adds strong memory integrity
protection to help prevent malicious hypervisor-based attacks such as
data replay, memory re-mapping, and more.
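
As a purely illustrative sketch (not part of this patch), host code can
then gate SNP-specific paths on the new feature bit:

  if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
          return -ENODEV;  /* no SNP support, fall back to SEV/SEV-ES paths */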

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Jarkko Sakkinen <jarkko@profian.com>
Signed-off-by: Ashish Kalra <Ashish.Kalra@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 arch/x86/include/asm/cpufeatures.h       | 1 +
 arch/x86/kernel/cpu/amd.c                | 5 +++--
 tools/arch/x86/include/asm/cpufeatures.h | 1 +
 3 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 1419c4e04d45..480b4eaef310 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -420,6 +420,7 @@
 #define X86_FEATURE_SEV			(19*32+ 1) /* AMD Secure Encrypted Virtualization */
 #define X86_FEATURE_VM_PAGE_FLUSH	(19*32+ 2) /* "" VM Page Flush MSR is supported */
 #define X86_FEATURE_SEV_ES		(19*32+ 3) /* AMD Secure Encrypted Virtualization - Encrypted State */
+#define X86_FEATURE_SEV_SNP		(19*32+ 4) /* AMD Secure Encrypted Virtualization - Secure Nested Paging */
 #define X86_FEATURE_V_TSC_AUX		(19*32+ 9) /* "" Virtual TSC_AUX */
 #define X86_FEATURE_SME_COHERENT	(19*32+10) /* "" AMD hardware-enforced cache coherency */
 
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index 860b60273df3..c7884198ad5b 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -558,8 +558,8 @@ static void early_detect_mem_encrypt(struct cpuinfo_x86 *c)
 	 *	      SME feature (set in scattered.c).
 	 *	      If the kernel has not enabled SME via any means then
 	 *	      don't advertise the SME feature.
-	 *   For SEV: If BIOS has not enabled SEV then don't advertise the
-	 *            SEV and SEV_ES feature (set in scattered.c).
+	 *   For SEV: If BIOS has not enabled SEV then don't advertise SEV and
+	 *	      any additional functionality based on it.
 	 *
 	 *   In all cases, since support for SME and SEV requires long mode,
 	 *   don't advertise the feature under CONFIG_X86_32.
@@ -594,6 +594,7 @@ static void early_detect_mem_encrypt(struct cpuinfo_x86 *c)
 clear_sev:
 		setup_clear_cpu_cap(X86_FEATURE_SEV);
 		setup_clear_cpu_cap(X86_FEATURE_SEV_ES);
+		setup_clear_cpu_cap(X86_FEATURE_SEV_SNP);
 	}
 }
 
diff --git a/tools/arch/x86/include/asm/cpufeatures.h b/tools/arch/x86/include/asm/cpufeatures.h
index b71f4f2ecdd5..e81606fcd2ab 100644
--- a/tools/arch/x86/include/asm/cpufeatures.h
+++ b/tools/arch/x86/include/asm/cpufeatures.h
@@ -417,6 +417,7 @@
 #define X86_FEATURE_SEV			(19*32+ 1) /* AMD Secure Encrypted Virtualization */
 #define X86_FEATURE_VM_PAGE_FLUSH	(19*32+ 2) /* "" VM Page Flush MSR is supported */
 #define X86_FEATURE_SEV_ES		(19*32+ 3) /* AMD Secure Encrypted Virtualization - Encrypted State */
+#define X86_FEATURE_SEV_SNP		(19*32+ 4) /* AMD Secure Encrypted Virtualization - Secure Nested Paging */
 #define X86_FEATURE_V_TSC_AUX		(19*32+ 9) /* "" Virtual TSC_AUX */
 #define X86_FEATURE_SME_COHERENT	(19*32+10) /* "" AMD hardware-enforced cache coherency */
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 147+ messages in thread

* [PATCH RFC v8 11/56] x86/sev: Add the host SEV-SNP initialization support
  2023-02-20 18:37 [PATCH RFC v8 00/56] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (9 preceding siblings ...)
  2023-02-20 18:38 ` [PATCH RFC v8 10/56] x86/cpufeatures: Add SEV-SNP CPU feature Michael Roth
@ 2023-02-20 18:38 ` Michael Roth
  2023-02-20 20:12   ` Zhi Wang
  2023-02-20 18:38 ` [PATCH RFC v8 12/56] x86/sev: Add RMP entry lookup helpers Michael Roth
                   ` (46 subsequent siblings)
  57 siblings, 1 reply; 147+ messages in thread
From: Michael Roth @ 2023-02-20 18:38 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	Brijesh Singh

From: Brijesh Singh <brijesh.singh@amd.com>

The memory integrity guarantees of SEV-SNP are enforced through a new
structure called the Reverse Map Table (RMP). The RMP is a single data
structure shared across the system that contains one entry for every 4K
page of DRAM that may be used by SEV-SNP VMs. The goal of RMP is to
track the owner of each page of memory. Pages of memory can be owned by
the hypervisor, owned by a specific VM or owned by the AMD-SP. See APM2
section 15.36.3 for more detail on RMP.

The RMP table is used to enforce access control to memory. The table
itself is not directly writable by the software. New CPU instructions
(RMPUPDATE, PVALIDATE, RMPADJUST) are used to manipulate the RMP
entries.

Based on the platform configuration, the BIOS reserves the memory used
for the RMP table. The start and end addresses of the RMP table must be
queried by reading the RMP_BASE and RMP_END MSRs. If RMP_BASE and
RMP_END are not set, then disable the SEV-SNP feature.

The SEV-SNP feature is enabled only after the RMP table is successfully
initialized.

Also set SYSCFG.MFDM when enabling SNP, as SEV-SNP FW >= 1.51 requires
that SYSCFG.MFDM be set.

The RMP table entry format is not architectural; it can vary by processor
and is defined by the PPR. Restrict SNP support to the known CPU models
and families for which the RMP table entry format is currently defined.

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 arch/x86/include/asm/disabled-features.h |   8 +-
 arch/x86/include/asm/msr-index.h         |  11 +-
 arch/x86/kernel/sev.c                    | 175 +++++++++++++++++++++++
 3 files changed, 192 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/disabled-features.h b/arch/x86/include/asm/disabled-features.h
index 33d2cd04d254..9b5a2cc8064a 100644
--- a/arch/x86/include/asm/disabled-features.h
+++ b/arch/x86/include/asm/disabled-features.h
@@ -87,6 +87,12 @@
 # define DISABLE_TDX_GUEST	(1 << (X86_FEATURE_TDX_GUEST & 31))
 #endif
 
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+# define DISABLE_SEV_SNP	0
+#else
+# define DISABLE_SEV_SNP	(1 << (X86_FEATURE_SEV_SNP & 31))
+#endif
+
 /*
  * Make sure to add features to the correct mask
  */
@@ -110,7 +116,7 @@
 			 DISABLE_ENQCMD)
 #define DISABLED_MASK17	0
 #define DISABLED_MASK18	0
-#define DISABLED_MASK19	0
+#define DISABLED_MASK19	(DISABLE_SEV_SNP)
 #define DISABLED_MASK_CHECK BUILD_BUG_ON_ZERO(NCAPINTS != 20)
 
 #endif /* _ASM_X86_DISABLED_FEATURES_H */
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 10ac52705892..35100c630617 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -565,6 +565,8 @@
 #define MSR_AMD64_SEV_ENABLED		BIT_ULL(MSR_AMD64_SEV_ENABLED_BIT)
 #define MSR_AMD64_SEV_ES_ENABLED	BIT_ULL(MSR_AMD64_SEV_ES_ENABLED_BIT)
 #define MSR_AMD64_SEV_SNP_ENABLED	BIT_ULL(MSR_AMD64_SEV_SNP_ENABLED_BIT)
+#define MSR_AMD64_RMP_BASE		0xc0010132
+#define MSR_AMD64_RMP_END		0xc0010133
 
 #define MSR_AMD64_VIRT_SPEC_CTRL	0xc001011f
 
@@ -649,7 +651,14 @@
 #define MSR_K8_TOP_MEM2			0xc001001d
 #define MSR_AMD64_SYSCFG		0xc0010010
 #define MSR_AMD64_SYSCFG_MEM_ENCRYPT_BIT	23
-#define MSR_AMD64_SYSCFG_MEM_ENCRYPT	BIT_ULL(MSR_AMD64_SYSCFG_MEM_ENCRYPT_BIT)
+#define MSR_AMD64_SYSCFG_MEM_ENCRYPT		BIT_ULL(MSR_AMD64_SYSCFG_MEM_ENCRYPT_BIT)
+#define MSR_AMD64_SYSCFG_SNP_EN_BIT		24
+#define MSR_AMD64_SYSCFG_SNP_EN		BIT_ULL(MSR_AMD64_SYSCFG_SNP_EN_BIT)
+#define MSR_AMD64_SYSCFG_SNP_VMPL_EN_BIT	25
+#define MSR_AMD64_SYSCFG_SNP_VMPL_EN		BIT_ULL(MSR_AMD64_SYSCFG_SNP_VMPL_EN_BIT)
+#define MSR_AMD64_SYSCFG_MFDM_BIT		19
+#define MSR_AMD64_SYSCFG_MFDM			BIT_ULL(MSR_AMD64_SYSCFG_MFDM_BIT)
+
 #define MSR_K8_INT_PENDING_MSG		0xc0010055
 /* C1E active bits in int pending message */
 #define K8_INTP_C1E_ACTIVE_MASK		0x18000000
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index a428c62330d3..e54e412c9916 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -22,6 +22,9 @@
 #include <linux/efi.h>
 #include <linux/platform_device.h>
 #include <linux/io.h>
+#include <linux/cpumask.h>
+#include <linux/iommu.h>
+#include <linux/amd-iommu.h>
 
 #include <asm/cpu_entry_area.h>
 #include <asm/stacktrace.h>
@@ -38,6 +41,7 @@
 #include <asm/apic.h>
 #include <asm/cpuid.h>
 #include <asm/cmdline.h>
+#include <asm/iommu.h>
 
 #define DR7_RESET_VALUE        0x400
 
@@ -57,6 +61,12 @@
 #define AP_INIT_CR0_DEFAULT		0x60000010
 #define AP_INIT_MXCSR_DEFAULT		0x1f80
 
+/*
+ * The first 16KB from RMP_BASE is used by the processor for bookkeeping;
+ * this range needs to be added in during an RMP entry lookup.
+ */
+#define RMPTABLE_CPU_BOOKKEEPING_SZ	0x4000
+
 /* For early boot hypervisor communication in SEV-ES enabled guests */
 static struct ghcb boot_ghcb_page __bss_decrypted __aligned(PAGE_SIZE);
 
@@ -69,6 +79,9 @@ static struct ghcb *boot_ghcb __section(".data");
 /* Bitmap of SEV features supported by the hypervisor */
 static u64 sev_hv_features __ro_after_init;
 
+static unsigned long rmptable_start __ro_after_init;
+static unsigned long rmptable_end __ro_after_init;
+
 /* #VC handler runtime per-CPU data */
 struct sev_es_runtime_data {
 	struct ghcb ghcb_page;
@@ -2260,3 +2273,165 @@ static int __init snp_init_platform_device(void)
 	return 0;
 }
 device_initcall(snp_init_platform_device);
+
+#undef pr_fmt
+#define pr_fmt(fmt)	"SEV-SNP: " fmt
+
+static int __mfd_enable(unsigned int cpu)
+{
+	u64 val;
+
+	if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
+		return 0;
+
+	rdmsrl(MSR_AMD64_SYSCFG, val);
+
+	val |= MSR_AMD64_SYSCFG_MFDM;
+
+	wrmsrl(MSR_AMD64_SYSCFG, val);
+
+	return 0;
+}
+
+static __init void mfd_enable(void *arg)
+{
+	__mfd_enable(smp_processor_id());
+}
+
+static int __snp_enable(unsigned int cpu)
+{
+	u64 val;
+
+	if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
+		return 0;
+
+	rdmsrl(MSR_AMD64_SYSCFG, val);
+
+	val |= MSR_AMD64_SYSCFG_SNP_EN;
+	val |= MSR_AMD64_SYSCFG_SNP_VMPL_EN;
+
+	wrmsrl(MSR_AMD64_SYSCFG, val);
+
+	return 0;
+}
+
+static __init void snp_enable(void *arg)
+{
+	__snp_enable(smp_processor_id());
+}
+
+static bool get_rmptable_info(u64 *start, u64 *len)
+{
+	u64 calc_rmp_sz, rmp_sz, rmp_base, rmp_end;
+
+	rdmsrl(MSR_AMD64_RMP_BASE, rmp_base);
+	rdmsrl(MSR_AMD64_RMP_END, rmp_end);
+
+	if (!rmp_base || !rmp_end) {
+		pr_err("Memory for the RMP table has not been reserved by BIOS\n");
+		return false;
+	}
+
+	rmp_sz = rmp_end - rmp_base + 1;
+
+	/*
+	 * Calculate the amount of memory that must be reserved by the BIOS to
+	 * cover all of system RAM. The reserved memory should also cover the
+	 * RMP table itself.
+	 */
+	calc_rmp_sz = (((rmp_sz >> PAGE_SHIFT) + totalram_pages()) << 4)
+		      + RMPTABLE_CPU_BOOKKEEPING_SZ;
+
+	if (calc_rmp_sz > rmp_sz) {
+		pr_err("Memory reserved for the RMP table does not cover full system RAM (expected 0x%llx got 0x%llx)\n",
+		       calc_rmp_sz, rmp_sz);
+		return false;
+	}
+
+	*start = rmp_base;
+	*len = rmp_sz;
+
+	pr_info("RMP table physical address [0x%016llx - 0x%016llx]\n", rmp_base, rmp_end);
+
+	return true;
+}
+
+static __init int snp_rmptable_init(void)
+{
+	u64 rmp_base, sz;
+	void *start;
+	u64 val;
+
+	if (!get_rmptable_info(&rmp_base, &sz))
+		return 1;
+
+	start = memremap(rmp_base, sz, MEMREMAP_WB);
+	if (!start) {
+		pr_err("Failed to map RMP table addr 0x%llx size 0x%llx\n", rmp_base, sz);
+		return 1;
+	}
+
+	/*
+	 * Check if SEV-SNP is already enabled, this can happen in case of
+	 * kexec boot.
+	 */
+	rdmsrl(MSR_AMD64_SYSCFG, val);
+	if (val & MSR_AMD64_SYSCFG_SNP_EN)
+		goto skip_enable;
+
+	memset(start, 0, sz);
+
+	/* Flush the caches to ensure that data is written before SNP is enabled. */
+	wbinvd_on_all_cpus();
+
+	/* MFDM must be enabled on all the CPUs prior to enabling SNP. */
+	on_each_cpu(mfd_enable, NULL, 1);
+
+	/* Enable SNP on all CPUs. */
+	on_each_cpu(snp_enable, NULL, 1);
+
+skip_enable:
+	rmptable_start = (unsigned long)start;
+	rmptable_end = rmptable_start + sz - 1;
+
+	return 0;
+}
+
+static int __init snp_host_init(void)
+{
+	if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
+		return 0;
+
+	/*
+	 * RMP table entry format is not architectural and it can vary by processor and
+	 * is defined by the per-processor PPR. Restrict SNP support on the known CPU
+	 * model and family for which the RMP table entry format is currently defined for.
+	 */
+	if (boot_cpu_data.x86 != 0x19 || boot_cpu_data.x86_model > 0xaf)
+		goto nosnp;
+
+	if (amd_iommu_snp_enable())
+		goto nosnp;
+
+	if (snp_rmptable_init())
+		goto nosnp;
+
+	cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "x86/rmptable_init:online", __snp_enable, NULL);
+
+	return 0;
+
+nosnp:
+	setup_clear_cpu_cap(X86_FEATURE_SEV_SNP);
+	return -ENODEV;
+}
+
+/*
+ * This must be called after the PCI subsystem because amd_iommu_snp_enable()
+ * is used to ensure the IOMMU supports the SEV-SNP feature, and it can only
+ * be called after subsys_initcall().
+ *
+ * NOTE: IOMMU is enforced by SNP to ensure that the hypervisor cannot program
+ * DMA directly into guest private memory. In the case of SNP, the IOMMU
+ * ensures that the page(s) used for DMA are hypervisor-owned.
+ */
+fs_initcall(snp_host_init);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 147+ messages in thread

* [PATCH RFC v8 12/56] x86/sev: Add RMP entry lookup helpers
  2023-02-20 18:37 [PATCH RFC v8 00/56] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (10 preceding siblings ...)
  2023-02-20 18:38 ` [PATCH RFC v8 11/56] x86/sev: Add the host SEV-SNP initialization support Michael Roth
@ 2023-02-20 18:38 ` Michael Roth
  2023-03-03 15:28   ` Vlastimil Babka
  2023-02-20 18:38 ` [PATCH RFC v8 13/56] x86/fault: Add helper for dumping RMP entries Michael Roth
                   ` (45 subsequent siblings)
  57 siblings, 1 reply; 147+ messages in thread
From: Michael Roth @ 2023-02-20 18:38 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	Brijesh Singh

From: Brijesh Singh <brijesh.singh@amd.com>

The snp_lookup_rmpentry() helper can be used by the host to read the RMP
entry for a given page. The RMP entry format is documented in the AMD PPR;
see https://bugzilla.kernel.org/attachment.cgi?id=296015.
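
For illustration only (not part of this patch; 'pfn' is a placeholder
u64), a sketch of how a caller is expected to consume the return
convention, where 1 means the entry is assigned, 0 means it is not
assigned, and -errno means there is no usable RMP entry:

  int level, rc;

  rc = snp_lookup_rmpentry(pfn, &level);
  if (rc < 0)
          return rc;      /* no corresponding RMP entry */
  if (rc)
          pr_warn("pfn 0x%llx is guest-owned, RMP level %d\n", pfn, level);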

Co-developed-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 arch/x86/include/asm/sev.h |  4 +-
 arch/x86/kernel/sev.c      | 84 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 87 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index ebc271bb6d8e..8d3ce2ad27da 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -83,7 +83,7 @@ extern bool handle_vc_boot_ghcb(struct pt_regs *regs);
 
 /* RMP page size */
 #define RMP_PG_SIZE_4K			0
-
+#define RMP_TO_X86_PG_LEVEL(level)	(((level) == RMP_PG_SIZE_4K) ? PG_LEVEL_4K : PG_LEVEL_2M)
 #define RMPADJUST_VMSA_PAGE_BIT		BIT(16)
 
 /* SNP Guest message request */
@@ -197,6 +197,7 @@ void snp_set_wakeup_secondary_cpu(void);
 bool snp_init(struct boot_params *bp);
 void __init __noreturn snp_abort(void);
 int snp_issue_guest_request(u64 exit_code, struct snp_req_data *input, unsigned long *fw_err);
+int snp_lookup_rmpentry(u64 pfn, int *level);
 #else
 static inline void sev_es_ist_enter(struct pt_regs *regs) { }
 static inline void sev_es_ist_exit(void) { }
@@ -221,6 +222,7 @@ static inline int snp_issue_guest_request(u64 exit_code, struct snp_req_data *in
 {
 	return -ENOTTY;
 }
+static inline int snp_lookup_rmpentry(u64 pfn, int *level) { return 0; }
 #endif
 
 #endif
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index e54e412c9916..a063c1b98034 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -61,11 +61,36 @@
 #define AP_INIT_CR0_DEFAULT		0x60000010
 #define AP_INIT_MXCSR_DEFAULT		0x1f80
 
+/*
+ * The RMP entry format is not architectural. The format is defined in the
+ * PPR for Family 19h Model 01h, Rev B1 processors.
+ */
+struct rmpentry {
+	union {
+		struct {
+			u64	assigned	: 1,
+				pagesize	: 1,
+				immutable	: 1,
+				rsvd1		: 9,
+				gpa		: 39,
+				asid		: 10,
+				vmsa		: 1,
+				validated	: 1,
+				rsvd2		: 1;
+		} info;
+		u64 low;
+	};
+	u64 high;
+} __packed;
+
 /*
  * The first 16KB from RMP_BASE is used by the processor for bookkeeping;
  * this range needs to be added in during an RMP entry lookup.
  */
 #define RMPTABLE_CPU_BOOKKEEPING_SZ	0x4000
+#define RMPENTRY_SHIFT			8
+#define rmptable_page_offset(x)	(RMPTABLE_CPU_BOOKKEEPING_SZ + \
+				 (((unsigned long)x) >> RMPENTRY_SHIFT))
 
 /* For early boot hypervisor communication in SEV-ES enabled guests */
 static struct ghcb boot_ghcb_page __bss_decrypted __aligned(PAGE_SIZE);
@@ -2435,3 +2460,62 @@ static int __init snp_host_init(void)
  * the page(s) used for DMA are hypervisor owned.
  */
 fs_initcall(snp_host_init);
+
+static inline unsigned int rmpentry_assigned(struct rmpentry *e)
+{
+	return e->info.assigned;
+}
+
+static inline unsigned int rmpentry_pagesize(struct rmpentry *e)
+{
+	return e->info.pagesize;
+}
+
+static struct rmpentry *rmptable_entry(unsigned long paddr)
+{
+	unsigned long vaddr;
+
+	vaddr = rmptable_start + rmptable_page_offset(paddr);
+	if (unlikely(vaddr > rmptable_end))
+		return ERR_PTR(-EFAULT);
+
+	return (struct rmpentry *)vaddr;
+}
+
+static struct rmpentry *__snp_lookup_rmpentry(u64 pfn, int *level)
+{
+	unsigned long paddr = pfn << PAGE_SHIFT;
+	struct rmpentry *entry, *large_entry;
+
+	if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
+		return ERR_PTR(-ENXIO);
+
+	if (!pfn_valid(pfn))
+		return ERR_PTR(-EINVAL);
+
+	entry = rmptable_entry(paddr);
+	if (IS_ERR(entry))
+		return entry;
+
+	/* Read a large RMP entry to get the correct page level used in RMP entry. */
+	large_entry = rmptable_entry(paddr & PMD_MASK);
+	*level = RMP_TO_X86_PG_LEVEL(rmpentry_pagesize(large_entry));
+
+	return entry;
+}
+
+/*
+ * Return 1 if the RMP entry is assigned, 0 if it exists but is not assigned,
+ * and -errno if there is no corresponding RMP entry.
+ */
+int snp_lookup_rmpentry(u64 pfn, int *level)
+{
+	struct rmpentry *e;
+
+	e = __snp_lookup_rmpentry(pfn, level);
+	if (IS_ERR(e))
+		return PTR_ERR(e);
+
+	return !!rmpentry_assigned(e);
+}
+EXPORT_SYMBOL_GPL(snp_lookup_rmpentry);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 147+ messages in thread

* [PATCH RFC v8 13/56] x86/fault: Add helper for dumping RMP entries
  2023-02-20 18:37 [PATCH RFC v8 00/56] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (11 preceding siblings ...)
  2023-02-20 18:38 ` [PATCH RFC v8 12/56] x86/sev: Add RMP entry lookup helpers Michael Roth
@ 2023-02-20 18:38 ` Michael Roth
  2023-02-20 18:38 ` [PATCH RFC v8 14/56] x86/sev: Add helper functions for RMPUPDATE and PSMASH instruction Michael Roth
                   ` (44 subsequent siblings)
  57 siblings, 0 replies; 147+ messages in thread
From: Michael Roth @ 2023-02-20 18:38 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	Brijesh Singh

This information will be useful for debugging things like page faults
due to RMP access violations and RMPUPDATE failures.
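
As a hypothetical usage sketch (the actual call sites are added later in
this series; 'pfn' is a placeholder), an error path would simply do:

  pr_err("RMPUPDATE failed for pfn 0x%llx\n", pfn);
  sev_dump_rmpentry(pfn);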

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
[mdr: move helper to standalone patch]
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 arch/x86/include/asm/sev.h |  2 ++
 arch/x86/kernel/sev.c      | 48 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 50 insertions(+)

diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index 8d3ce2ad27da..edbb7fa488af 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -198,6 +198,7 @@ bool snp_init(struct boot_params *bp);
 void __init __noreturn snp_abort(void);
 int snp_issue_guest_request(u64 exit_code, struct snp_req_data *input, unsigned long *fw_err);
 int snp_lookup_rmpentry(u64 pfn, int *level);
+void sev_dump_rmpentry(u64 pfn);
 #else
 static inline void sev_es_ist_enter(struct pt_regs *regs) { }
 static inline void sev_es_ist_exit(void) { }
@@ -223,6 +224,7 @@ static inline int snp_issue_guest_request(u64 exit_code, struct snp_req_data *in
 	return -ENOTTY;
 }
 static inline int snp_lookup_rmpentry(u64 pfn, int *level) { return 0; }
+static inline void sev_dump_rmpentry(u64 pfn) {}
 #endif
 
 #endif
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index a063c1b98034..a01741c4a1b8 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -2504,6 +2504,54 @@ static struct rmpentry *__snp_lookup_rmpentry(u64 pfn, int *level)
 	return entry;
 }
 
+void sev_dump_rmpentry(u64 pfn)
+{
+	unsigned long pfn_end;
+	struct rmpentry *e;
+	int level;
+
+	e = __snp_lookup_rmpentry(pfn, &level);
+	if (IS_ERR(e)) {
+		pr_info("Failed to read RMP entry for PFN 0x%llx\n", pfn);
+		return;
+	}
+
+	if (rmpentry_assigned(e)) {
+		pr_info("RMPEntry paddr 0x%llx [assigned=%d immutable=%d pagesize=%d gpa=0x%lx asid=%d vmsa=%d validated=%d]\n",
+			pfn << PAGE_SHIFT, rmpentry_assigned(e), e->info.immutable,
+			rmpentry_pagesize(e), (unsigned long)e->info.gpa, e->info.asid,
+			e->info.vmsa, e->info.validated);
+
+		/* Dump all the non-zero entries if debug enabled */
+		if (!sev_cfg.debug)
+			return;
+	}
+
+	/*
+	 * If the RMP entry at the faulting pfn was not assigned, then it is
+	 * not clear what caused the RMP violation. To get some useful debug
+	 * information, iterate through the entire 2MB region and dump any RMP
+	 * entries that have at least one bit set.
+	 */
+	pfn = pfn & ~(PTRS_PER_PMD - 1);
+	pfn_end = pfn + PTRS_PER_PMD;
+
+	while (pfn < pfn_end) {
+		e = __snp_lookup_rmpentry(pfn, &level);
+		if (IS_ERR(e)) {
+			pr_info("Failed to read RMP entry for PFN 0x%llx\n", pfn);
+			pfn++;
+			continue;
+		}
+
+		if (e->low || e->high)
+			pr_info("RMPEntry paddr 0x%llx: [high=0x%016llx low=0x%016llx]\n",
+				pfn << PAGE_SHIFT, e->high, e->low);
+		pfn++;
+	}
+}
+EXPORT_SYMBOL_GPL(sev_dump_rmpentry);
+
 /*
  * Return 1 if the RMP entry is assigned, 0 if it exists but is not assigned,
  * and -errno if there is no corresponding RMP entry.
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 147+ messages in thread

* [PATCH RFC v8 14/56] x86/sev: Add helper functions for RMPUPDATE and PSMASH instruction
  2023-02-20 18:37 [PATCH RFC v8 00/56] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (12 preceding siblings ...)
  2023-02-20 18:38 ` [PATCH RFC v8 13/56] x86/fault: Add helper for dumping RMP entries Michael Roth
@ 2023-02-20 18:38 ` Michael Roth
  2023-02-20 18:38 ` [PATCH RFC v8 15/56] x86/sev: Invalidate pages from the direct map when adding them to the RMP table Michael Roth
                   ` (43 subsequent siblings)
  57 siblings, 0 replies; 147+ messages in thread
From: Michael Roth @ 2023-02-20 18:38 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	Brijesh Singh

From: Brijesh Singh <brijesh.singh@amd.com>

The RMPUPDATE instruction writes a new RMP entry in the RMP Table. The
hypervisor will use the instruction to add pages to the RMP table. See
APM3 for details on the instruction operations.

The PSMASH instruction expands a 2MB RMP entry into a corresponding set
of contiguous 4KB-Page RMP entries. The hypervisor will use this
instruction to adjust the RMP entry without invalidating the previous
RMP entry.

Add the following external interface API functions (a brief usage sketch
follows the list):

psmash():
  Used to smash a 2MB aligned page into 4K pages while preserving the
  Validated bit in the RMP.

rmp_make_private():
  Used to assign a page to guest using the RMPUPDATE instruction.

rmp_make_shared():
  Used to transition a page to hypervisor/shared state using the
  RMPUPDATE instruction.
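
Taken together, a hedged sketch (not part of this patch; pfn, gpa and
asid are placeholder values) of how a hypervisor might drive these
helpers:

  /* Assign a 2MB page to the guest with the given ASID. */
  ret = rmp_make_private(pfn, gpa, PG_LEVEL_2M, asid, false);
  if (ret)
          return ret;

  /* Later, smash the 2MB RMP entry into 4K entries while keeping them
   * assigned, preserving the Validated bit...
   */
  ret = psmash(pfn);
  if (ret)
          return ret;

  /* ...and eventually transition an individual 4K page back to the
   * hypervisor/shared state.
   */
  ret = rmp_make_shared(pfn, PG_LEVEL_4K);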

Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
[mdr: add RMPUPDATE retry logic for transient FAIL_OVERLAP errors]
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 arch/x86/include/asm/sev.h |  24 +++++++++
 arch/x86/kernel/sev.c      | 108 +++++++++++++++++++++++++++++++++++++
 2 files changed, 132 insertions(+)

diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index edbb7fa488af..7d728b30319c 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -80,10 +80,15 @@ extern bool handle_vc_boot_ghcb(struct pt_regs *regs);
 
 /* Software defined (when rFlags.CF = 1) */
 #define PVALIDATE_FAIL_NOUPDATE		255
+/* RMPUPDATE detected 4K page and 2MB page overlap. */
+#define RMPUPDATE_FAIL_OVERLAP		7
 
 /* RMP page size */
 #define RMP_PG_SIZE_4K			0
+#define RMP_PG_SIZE_2M			1
 #define RMP_TO_X86_PG_LEVEL(level)	(((level) == RMP_PG_SIZE_4K) ? PG_LEVEL_4K : PG_LEVEL_2M)
+#define X86_TO_RMP_PG_LEVEL(level)	(((level) == PG_LEVEL_4K) ? RMP_PG_SIZE_4K : RMP_PG_SIZE_2M)
+
 #define RMPADJUST_VMSA_PAGE_BIT		BIT(16)
 
 /* SNP Guest message request */
@@ -133,6 +138,15 @@ struct snp_secrets_page_layout {
 	u8 rsvd3[3840];
 } __packed;
 
+struct rmp_state {
+	u64 gpa;
+	u8 assigned;
+	u8 pagesize;
+	u8 immutable;
+	u8 rsvd;
+	u32 asid;
+} __packed;
+
 #ifdef CONFIG_AMD_MEM_ENCRYPT
 extern struct static_key_false sev_es_enable_key;
 extern void __sev_es_ist_enter(struct pt_regs *regs);
@@ -199,6 +213,9 @@ void __init __noreturn snp_abort(void);
 int snp_issue_guest_request(u64 exit_code, struct snp_req_data *input, unsigned long *fw_err);
 int snp_lookup_rmpentry(u64 pfn, int *level);
 void sev_dump_rmpentry(u64 pfn);
+int psmash(u64 pfn);
+int rmp_make_private(u64 pfn, u64 gpa, enum pg_level level, int asid, bool immutable);
+int rmp_make_shared(u64 pfn, enum pg_level level);
 #else
 static inline void sev_es_ist_enter(struct pt_regs *regs) { }
 static inline void sev_es_ist_exit(void) { }
@@ -225,6 +242,13 @@ static inline int snp_issue_guest_request(u64 exit_code, struct snp_req_data *in
 }
 static inline int snp_lookup_rmpentry(u64 pfn, int *level) { return 0; }
 static inline void sev_dump_rmpentry(u64 pfn) {}
+static inline int psmash(u64 pfn) { return -ENXIO; }
+static inline int rmp_make_private(u64 pfn, u64 gpa, enum pg_level level, int asid,
+				   bool immutable)
+{
+	return -ENODEV;
+}
+static inline int rmp_make_shared(u64 pfn, enum pg_level level) { return -ENODEV; }
 #endif
 
 #endif
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index a01741c4a1b8..a49f30c10dc1 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -2567,3 +2567,111 @@ int snp_lookup_rmpentry(u64 pfn, int *level)
 	return !!rmpentry_assigned(e);
 }
 EXPORT_SYMBOL_GPL(snp_lookup_rmpentry);
+
+/*
+ * psmash is used to smash a 2MB aligned page into 4K
+ * pages while preserving the Validated bit in the RMP.
+ */
+int psmash(u64 pfn)
+{
+	unsigned long paddr = pfn << PAGE_SHIFT;
+	int ret;
+
+	pr_debug("%s: PFN: 0x%llx\n", __func__, pfn);
+
+	if (!pfn_valid(pfn))
+		return -EINVAL;
+
+	if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
+		return -ENXIO;
+
+	/* Binutils version 2.36 supports the PSMASH mnemonic. */
+	asm volatile(".byte 0xF3, 0x0F, 0x01, 0xFF"
+		      : "=a"(ret)
+		      : "a"(paddr)
+		      : "memory", "cc");
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(psmash);
+
+static int rmpupdate(u64 pfn, struct rmp_state *val)
+{
+	int max_attempts = 4 * num_present_cpus();
+	unsigned long paddr = pfn << PAGE_SHIFT;
+	int ret, level, npages;
+	int attempts = 0;
+
+	if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
+		return -ENXIO;
+
+	do {
+		/* Binutils version 2.36 supports the RMPUPDATE mnemonic. */
+		asm volatile(".byte 0xF2, 0x0F, 0x01, 0xFE"
+			     : "=a"(ret)
+			     : "a"(paddr), "c"((unsigned long)val)
+			     : "memory", "cc");
+
+		attempts++;
+		if (ret)
+			pr_debug("RMPUPDATE retry needed, ASID: %d, ret: %d, pfn: %llx, npages: %d, level: %d, assigned: %d, attempts: %d (max: %d)\n",
+				 ret, val->asid, pfn, npages, level, val->assigned,
+				 val->asid, ret, pfn, npages, level, val->assigned,
+	} while (ret && attempts < max_attempts);
+
+	if (ret) {
+		pr_err("RMPUPDATE failed after %d attempts, ret: %d, pfn: %llx, npages: %d, level: %d\n",
+		       attempts, ret, pfn, npages, level);
+		sev_dump_rmpentry(pfn);
+		dump_stack();
+		return -EFAULT;
+	} else if (attempts > 1) {
+		pr_debug("RMPUPDATE succeeded after %d attempts, ASID: %d, ret: %d, pfn: %llx, npages: %d",
+			 attempts, val->asid, ret, pfn, npages);
+	}
+
+	return 0;
+}
+
+/*
+ * Assign a page to guest using the RMPUPDATE instruction.
+ */
+int rmp_make_private(u64 pfn, u64 gpa, enum pg_level level, int asid, bool immutable)
+{
+	struct rmp_state val;
+
+	pr_debug("%s: GPA: 0x%llx, PFN: 0x%llx, level: %d, immutable: %d\n",
+		 __func__, gpa, pfn, level, immutable);
+
+	if (!pfn_valid(pfn))
+		return -EINVAL;
+
+	memset(&val, 0, sizeof(val));
+	val.assigned = 1;
+	val.asid = asid;
+	val.immutable = immutable;
+	val.gpa = gpa;
+	val.pagesize = X86_TO_RMP_PG_LEVEL(level);
+
+	return rmpupdate(pfn, &val);
+}
+EXPORT_SYMBOL_GPL(rmp_make_private);
+
+/*
+ * Transition a page to hypervisor/shared state using the RMPUPDATE instruction.
+ */
+int rmp_make_shared(u64 pfn, enum pg_level level)
+{
+	struct rmp_state val;
+
+	pr_debug("%s: PFN: 0x%llx, level: %d\n", __func__, pfn, level);
+
+	if (!pfn_valid(pfn))
+		return -EINVAL;
+
+	memset(&val, 0, sizeof(val));
+	val.pagesize = X86_TO_RMP_PG_LEVEL(level);
+
+	return rmpupdate(pfn, &val);
+}
+EXPORT_SYMBOL_GPL(rmp_make_shared);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 147+ messages in thread

* [PATCH RFC v8 15/56] x86/sev: Invalidate pages from the direct map when adding them to the RMP table
  2023-02-20 18:37 [PATCH RFC v8 00/56] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (13 preceding siblings ...)
  2023-02-20 18:38 ` [PATCH RFC v8 14/56] x86/sev: Add helper functions for RMPUPDATE and PSMASH instruction Michael Roth
@ 2023-02-20 18:38 ` Michael Roth
  2023-03-01 12:07   ` Tom Dohrmann
  2023-03-01 16:15   ` Dave Hansen
  2023-02-20 18:38 ` [PATCH RFC v8 16/56] x86/traps: Define RMP violation #PF error code Michael Roth
                   ` (42 subsequent siblings)
  57 siblings, 2 replies; 147+ messages in thread
From: Michael Roth @ 2023-02-20 18:38 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	Brijesh Singh

From: Brijesh Singh <brijesh.singh@amd.com>

The integrity guarantee of SEV-SNP is enforced through the RMP table.
The RMP is used with standard x86 and IOMMU page tables to enforce
memory restrictions and page access rights. The RMP check is enforced as
soon as SEV-SNP is enabled globally in the system. When hardware
encounters an RMP-check failure, it raises a page-fault exception.

The rmp_make_private() and rmp_make_shared() helpers are used to add
or remove the pages from the RMP table. Improve the rmp_make_private()
to invalidate state so that pages cannot be used in the direct-map after
they are added the RMP table, and restored to their default valid
permission after the pages are removed from the RMP table.

Co-developed-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 arch/x86/kernel/sev.c | 57 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 57 insertions(+)

diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index a49f30c10dc1..3e5ff5934e83 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -2595,6 +2595,37 @@ int psmash(u64 pfn)
 }
 EXPORT_SYMBOL_GPL(psmash);
 
+static int restore_direct_map(u64 pfn, int npages)
+{
+	int i, ret = 0;
+
+	for (i = 0; i < npages; i++) {
+		ret = set_direct_map_default_noflush(pfn_to_page(pfn + i));
+		if (ret)
+			goto cleanup;
+	}
+
+cleanup:
+	WARN(ret > 0, "Failed to restore direct map for pfn 0x%llx\n", pfn + i);
+	return ret;
+}
+
+static int invalidate_direct_map(u64 pfn, int npages)
+{
+	int i, ret = 0;
+
+	for (i = 0; i < npages; i++) {
+		ret = set_direct_map_invalid_noflush(pfn_to_page(pfn + i));
+		if (ret)
+			goto cleanup;
+	}
+
+cleanup:
+	WARN(ret > 0, "Failed to invalidate direct map for pfn 0x%llx\n", pfn + i);
+	restore_direct_map(pfn, i);
+	return ret;
+}
+
 static int rmpupdate(u64 pfn, struct rmp_state *val)
 {
 	int max_attempts = 4 * num_present_cpus();
@@ -2605,6 +2636,21 @@ static int rmpupdate(u64 pfn, struct rmp_state *val)
 	if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
 		return -ENXIO;
 
+	level = RMP_TO_X86_PG_LEVEL(val->pagesize);
+	npages = page_level_size(level) / PAGE_SIZE;
+
+	/*
+	 * If page is getting assigned in the RMP table then unmap it from the
+	 * direct map.
+	 */
+	if (val->assigned) {
+		if (invalidate_direct_map(pfn, npages)) {
+			pr_err("Failed to unmap %d pages at pfn 0x%llx from the direct_map\n",
+			       npages, pfn);
+			return -EFAULT;
+		}
+	}
+
 	do {
 		/* Binutils version 2.36 supports the RMPUPDATE mnemonic. */
 		asm volatile(".byte 0xF2, 0x0F, 0x01, 0xFE"
@@ -2630,6 +2676,17 @@ static int rmpupdate(u64 pfn, struct rmp_state *val)
 			 attempts, val->asid, ret, pfn, npages);
 	}
 
+	/*
+	 * Restore the direct map after the page is removed from the RMP table.
+	 */
+	if (!val->assigned) {
+		if (restore_direct_map(pfn, npages)) {
+			pr_err("Failed to map %d pages at pfn 0x%llx into the direct_map\n",
+			       npages, pfn);
+			return -EFAULT;
+		}
+	}
+
 	return 0;
 }
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 147+ messages in thread

* [PATCH RFC v8 16/56] x86/traps: Define RMP violation #PF error code
  2023-02-20 18:37 [PATCH RFC v8 00/56] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (14 preceding siblings ...)
  2023-02-20 18:38 ` [PATCH RFC v8 15/56] x86/sev: Invalidate pages from the direct map when adding them to the RMP table Michael Roth
@ 2023-02-20 18:38 ` Michael Roth
  2023-02-20 18:38 ` [PATCH RFC v8 17/56] x86/fault: Add support to handle the RMP fault for user address Michael Roth
                   ` (41 subsequent siblings)
  57 siblings, 0 replies; 147+ messages in thread
From: Michael Roth @ 2023-02-20 18:38 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	Brijesh Singh

From: Brijesh Singh <brijesh.singh@amd.com>

Bit 31 in the page fault error code will be set when the processor
encounters an RMP violation.

While at it, use the BIT_ULL() macro.

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 arch/x86/include/asm/trap_pf.h | 18 +++++++++++-------
 arch/x86/mm/fault.c            |  1 +
 2 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/trap_pf.h b/arch/x86/include/asm/trap_pf.h
index 10b1de500ab1..295be06f8db7 100644
--- a/arch/x86/include/asm/trap_pf.h
+++ b/arch/x86/include/asm/trap_pf.h
@@ -2,6 +2,8 @@
 #ifndef _ASM_X86_TRAP_PF_H
 #define _ASM_X86_TRAP_PF_H
 
+#include <linux/bits.h>  /* BIT() macro */
+
 /*
  * Page fault error code bits:
  *
@@ -12,15 +14,17 @@
  *   bit 4 ==				1: fault was an instruction fetch
  *   bit 5 ==				1: protection keys block access
  *   bit 15 ==				1: SGX MMU page-fault
+ *   bit 31 ==				1: fault was due to RMP violation
  */
 enum x86_pf_error_code {
-	X86_PF_PROT	=		1 << 0,
-	X86_PF_WRITE	=		1 << 1,
-	X86_PF_USER	=		1 << 2,
-	X86_PF_RSVD	=		1 << 3,
-	X86_PF_INSTR	=		1 << 4,
-	X86_PF_PK	=		1 << 5,
-	X86_PF_SGX	=		1 << 15,
+	X86_PF_PROT	=		BIT(0),
+	X86_PF_WRITE	=		BIT(1),
+	X86_PF_USER	=		BIT(2),
+	X86_PF_RSVD	=		BIT(3),
+	X86_PF_INSTR	=		BIT(4),
+	X86_PF_PK	=		BIT(5),
+	X86_PF_SGX	=		BIT(15),
+	X86_PF_RMP	=		BIT(31),
 };
 
 #endif /* _ASM_X86_TRAP_PF_H */
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 7b0d4ab894c8..f8193b99e9c8 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -567,6 +567,7 @@ show_fault_oops(struct pt_regs *regs, unsigned long error_code, unsigned long ad
 		 !(error_code & X86_PF_PROT) ? "not-present page" :
 		 (error_code & X86_PF_RSVD)  ? "reserved bit violation" :
 		 (error_code & X86_PF_PK)    ? "protection keys violation" :
+		 (error_code & X86_PF_RMP)   ? "RMP violation" :
 					       "permissions violation");
 
 	if (!(error_code & X86_PF_USER) && user_mode(regs)) {
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 147+ messages in thread

* [PATCH RFC v8 17/56] x86/fault: Add support to handle the RMP fault for user address
  2023-02-20 18:37 [PATCH RFC v8 00/56] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (15 preceding siblings ...)
  2023-02-20 18:38 ` [PATCH RFC v8 16/56] x86/traps: Define RMP violation #PF error code Michael Roth
@ 2023-02-20 18:38 ` Michael Roth
  2023-03-01 16:21   ` Dave Hansen
  2023-03-03 15:31   ` Vlastimil Babka
  2023-02-20 18:38 ` [PATCH RFC v8 18/56] x86/fault: fix handle_split_page_fault() to work with memfd backed pages Michael Roth
                   ` (40 subsequent siblings)
  57 siblings, 2 replies; 147+ messages in thread
From: Michael Roth @ 2023-02-20 18:38 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	Brijesh Singh, Jarkko Sakkinen

From: Brijesh Singh <brijesh.singh@amd.com>

When SEV-SNP is enabled globally, a write from the host goes through the
RMP check. When the host writes to pages, hardware checks the following
conditions at the end of page walk:

1. Assigned bit in the RMP table is zero (i.e. the page is shared).
2. If the page table entry that gives the sPA indicates that the target
   page size is a large page, then all RMP entries for the constituent
   4KB pages of the target must have the assigned bit 0.
3. Immutable bit in the RMP table is not zero.

The hardware will raise a page fault if one of the above conditions is not
met. Try resolving the fault instead of taking the fault again and again.
If the host attempts to write to guest private memory, then send a SIGBUS
signal to kill the process. If the page level between the host page table
and the RMP entry does not match, then split the host mapping to keep the
RMP and host page levels in sync.

Co-developed-by: Jarkko Sakkinen <jarkko.sakkinen@profian.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@profian.com>
Co-developed-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 arch/x86/mm/fault.c      | 104 ++++++++++++++++++++++++++++++++++++++-
 include/linux/mm.h       |   3 +-
 include/linux/mm_types.h |   3 ++
 mm/memory.c              |  10 ++++
 4 files changed, 118 insertions(+), 2 deletions(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index f8193b99e9c8..afd4cde17001 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -33,6 +33,7 @@
 #include <asm/kvm_para.h>		/* kvm_handle_async_pf		*/
 #include <asm/vdso.h>			/* fixup_vdso_exception()	*/
 #include <asm/irq_stack.h>
+#include <asm/sev.h>			/* snp_lookup_rmpentry()	*/
 
 #define CREATE_TRACE_POINTS
 #include <asm/trace/exceptions.h>
@@ -414,6 +415,7 @@ static void dump_pagetable(unsigned long address)
 	pr_cont("PTE %lx", pte_val(*pte));
 out:
 	pr_cont("\n");
+
 	return;
 bad:
 	pr_info("BAD\n");
@@ -527,6 +529,8 @@ static void show_ldttss(const struct desc_ptr *gdt, const char *name, u16 index)
 static void
 show_fault_oops(struct pt_regs *regs, unsigned long error_code, unsigned long address)
 {
+	unsigned long pfn;
+
 	if (!oops_may_print())
 		return;
 
@@ -599,7 +603,10 @@ show_fault_oops(struct pt_regs *regs, unsigned long error_code, unsigned long ad
 		show_ldttss(&gdt, "TR", tr);
 	}
 
-	dump_pagetable(address);
+	pfn = dump_pagetable(address);
+
+	if (error_code & X86_PF_RMP)
+		sev_dump_rmpentry(pfn);
 }
 
 static noinline void
@@ -1240,6 +1247,90 @@ do_kern_addr_fault(struct pt_regs *regs, unsigned long hw_error_code,
 }
 NOKPROBE_SYMBOL(do_kern_addr_fault);
 
+enum rmp_pf_ret {
+	RMP_PF_SPLIT	= 0,
+	RMP_PF_RETRY	= 1,
+	RMP_PF_UNMAP	= 2,
+};
+
+/*
+ * The goal of the RMP fault handling routine is to check whether the
+ * page that faulted should be accessible.  That can be determined
+ * simply by looking at the RMP entry for the 4k address being accessed.
+ * If that entry has Assigned=1 then it's a bad address. It could be
+ * because the 2MB region was assigned as a large page, or it could be
+ * because the region is all 4k pages and that 4k was assigned.
+ * In either case, it's a bad access.
+ * There are basically two main possibilities:
+ * 1. The 2M entry has Assigned=1 and Page_Size=1. Then all 511 middle
+ * entries also have Assigned=1. This entire 2M region is a guest page.
+ * 2. The 2M entry has Assigned=0 and Page_Size=0. Then the 511 middle
+ * entries can be anything, this region consists of individual 4k assignments.
+ */
+static int handle_user_rmp_page_fault(struct pt_regs *regs, unsigned long error_code,
+				      unsigned long address)
+{
+	int rmp_level, level;
+	pgd_t *pgd;
+	pte_t *pte;
+	u64 pfn;
+
+	pgd = __va(read_cr3_pa());
+	pgd += pgd_index(address);
+
+	pte = lookup_address_in_pgd(pgd, address, &level);
+
+	/*
+	 * It can happen if there was a race between an unmap event and
+	 * the RMP fault delivery.
+	 */
+	if (!pte || !pte_present(*pte))
+		return RMP_PF_UNMAP;
+
+	/*
+	 * RMP page fault handler follows this algorithm:
+	 * 1. Compute the pfn for the 4kb page being accessed
+	 * 2. Read that RMP entry -- If it is assigned then kill the process
+	 * 3. Otherwise, check the level from the host page table
+	 *    If level=PG_LEVEL_4K then the page is already smashed
+	 *    so just retry the instruction
+	 * 4. If level=PG_LEVEL_2M/1G, then the host page needs to be split
+	 */
+
+	pfn = pte_pfn(*pte);
+
+	/* If it's a large page then calculate the fault pfn */
+	if (level > PG_LEVEL_4K)
+		pfn = pfn | PFN_DOWN(address & (page_level_size(level) - 1));
+
+	/*
+	 * If it's a guest private page, then the fault cannot be resolved.
+	 * Send a SIGBUS to terminate the process.
+	 *
+	 * As documented in APM vol3 pseudo-code for RMPUPDATE, when the 2M range
+	 * is covered by a valid (Assigned=1) 2M entry, the middle 511 4k entries
+	 * also have Assigned=1. This means that if there is an access to a page
+	 * which happens to lie within an Assigned 2M entry, the 4k RMP entry
+	 * will also have Assigned=1. Therefore, the kernel should see that
+	 * the page is not a valid page and the fault cannot be resolved.
+	 */
+	if (snp_lookup_rmpentry(pfn, &rmp_level)) {
+		pr_info("Fatal RMP page fault, terminating process, entry assigned for pfn 0x%llx\n",
+			pfn);
+		do_sigbus(regs, error_code, address, VM_FAULT_SIGBUS);
+		return RMP_PF_RETRY;
+	}
+
+	/*
+	 * The backing page level is higher than the RMP page level, request
+	 * to split the page.
+	 */
+	if (level > rmp_level)
+		return RMP_PF_SPLIT;
+
+	return RMP_PF_RETRY;
+}
+
 /*
  * Handle faults in the user portion of the address space.  Nothing in here
  * should check X86_PF_USER without a specific justification: for almost
@@ -1337,6 +1428,17 @@ void do_user_addr_fault(struct pt_regs *regs,
 	if (error_code & X86_PF_INSTR)
 		flags |= FAULT_FLAG_INSTRUCTION;
 
+	/*
+	 * If it's an RMP violation, try resolving it.
+	 */
+	if (error_code & X86_PF_RMP) {
+		if (handle_user_rmp_page_fault(regs, error_code, address))
+			return;
+
+		/* Ask to split the page */
+		flags |= FAULT_FLAG_PAGE_SPLIT;
+	}
+
 #ifdef CONFIG_X86_64
 	/*
 	 * Faults in the vsyscall page might need emulation.  The
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 3c84f4e48cd7..2fd8e16d149c 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -466,7 +466,8 @@ static inline bool fault_flag_allow_retry_first(enum fault_flag flags)
 	{ FAULT_FLAG_USER,		"USER" }, \
 	{ FAULT_FLAG_REMOTE,		"REMOTE" }, \
 	{ FAULT_FLAG_INSTRUCTION,	"INSTRUCTION" }, \
-	{ FAULT_FLAG_INTERRUPTIBLE,	"INTERRUPTIBLE" }
+	{ FAULT_FLAG_INTERRUPTIBLE,	"INTERRUPTIBLE" }, \
+	{ FAULT_FLAG_PAGE_SPLIT,	"PAGESPLIT" }
 
 /*
  * vm_fault is filled by the pagefault handler and passed to the vma's
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 500e536796ca..06ba34d51638 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -962,6 +962,8 @@ typedef struct {
  *                      mapped R/O.
  * @FAULT_FLAG_ORIG_PTE_VALID: whether the fault has vmf->orig_pte cached.
  *                        We should only access orig_pte if this flag set.
+ * @FAULT_FLAG_PAGE_SPLIT: The fault was due to a page size mismatch; split
+ *                         the region to a smaller page size and retry.
  *
  * About @FAULT_FLAG_ALLOW_RETRY and @FAULT_FLAG_TRIED: we can specify
  * whether we would allow page faults to retry by specifying these two
@@ -999,6 +1001,7 @@ enum fault_flag {
 	FAULT_FLAG_INTERRUPTIBLE =	1 << 9,
 	FAULT_FLAG_UNSHARE =		1 << 10,
 	FAULT_FLAG_ORIG_PTE_VALID =	1 << 11,
+	FAULT_FLAG_PAGE_SPLIT =		1 << 12,
 };
 
 typedef unsigned int __bitwise zap_flags_t;
diff --git a/mm/memory.c b/mm/memory.c
index f88c351aecd4..e68da7e403c6 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4996,6 +4996,12 @@ static vm_fault_t handle_pte_fault(struct vm_fault *vmf)
 	return 0;
 }
 
+static int handle_split_page_fault(struct vm_fault *vmf)
+{
+	__split_huge_pmd(vmf->vma, vmf->pmd, vmf->address, false, NULL);
+	return 0;
+}
+
 /*
  * By the time we get here, we already hold the mm semaphore
  *
@@ -5078,6 +5084,10 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
 				pmd_migration_entry_wait(mm, vmf.pmd);
 			return 0;
 		}
+
+		if (flags & FAULT_FLAG_PAGE_SPLIT)
+			return handle_split_page_fault(&vmf);
+
 		if (pmd_trans_huge(vmf.orig_pmd) || pmd_devmap(vmf.orig_pmd)) {
 			if (pmd_protnone(vmf.orig_pmd) && vma_is_accessible(vma))
 				return do_huge_pmd_numa_page(&vmf);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 147+ messages in thread

* [PATCH RFC v8 18/56] x86/fault: fix handle_split_page_fault() to work with memfd backed pages
  2023-02-20 18:37 [PATCH RFC v8 00/56] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (16 preceding siblings ...)
  2023-02-20 18:38 ` [PATCH RFC v8 17/56] x86/fault: Add support to handle the RMP fault for user address Michael Roth
@ 2023-02-20 18:38 ` Michael Roth
  2023-02-20 19:57   ` Hugh Dickins
  2023-02-20 18:38 ` [PATCH RFC v8 19/56] x86/fault: Return pfn from dump_pagetable() for SEV-specific fault handling Michael Roth
                   ` (39 subsequent siblings)
  57 siblings, 1 reply; 147+ messages in thread
From: Michael Roth @ 2023-02-20 18:38 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	Hugh Dickins

From: Hugh Dickins <hughd@google.com>

When the address is backed by a memfd, the code to split the page does
nothing more than remove the PMD from the page tables. So immediately
install a PTE to ensure that any other pages in that 2MB region are
brought back in as 4K pages.

Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 mm/memory.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/mm/memory.c b/mm/memory.c
index e68da7e403c6..33c9020ba1f8 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4999,6 +4999,11 @@ static vm_fault_t handle_pte_fault(struct vm_fault *vmf)
 static int handle_split_page_fault(struct vm_fault *vmf)
 {
 	__split_huge_pmd(vmf->vma, vmf->pmd, vmf->address, false, NULL);
+	/*
+	 * Install a PTE immediately to ensure that any other pages in
+	 * this 2MB region are brought back in as 4K pages.
+	 */
+	__pte_alloc(vmf->vma->vm_mm, vmf->pmd);
 	return 0;
 }
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 147+ messages in thread

* [PATCH RFC v8 19/56] x86/fault: Return pfn from dump_pagetable() for SEV-specific fault handling.
  2023-02-20 18:37 [PATCH RFC v8 00/56] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (17 preceding siblings ...)
  2023-02-20 18:38 ` [PATCH RFC v8 18/56] x86/fault: fix handle_split_page_fault() to work with memfd backed pages Michael Roth
@ 2023-02-20 18:38 ` Michael Roth
  2023-02-20 21:13   ` Zhi Wang
  2023-02-28 10:53   ` Wu Zongyong
  2023-02-20 18:38 ` [PATCH RFC v8 20/56] crypto:ccp: Define the SEV-SNP commands Michael Roth
                   ` (38 subsequent siblings)
  57 siblings, 2 replies; 147+ messages in thread
From: Michael Roth @ 2023-02-20 18:38 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania

From: Ashish Kalra <ashish.kalra@amd.com>

Return the pfn from dump_pagetable() so that it can be used for
SEV-specific fault handling, i.e. for handling SNP RMP page faults.
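
A rough sketch of the intended usage (illustrative only; the helper name
below is hypothetical, while dump_pagetable(), sev_dump_rmpentry() and
X86_PF_RMP are the names used in this series):

  /* Hypothetical helper: dump the RMP entry for the faulting pfn. */
  static void dump_rmpentry_on_rmp_fault(unsigned long error_code, unsigned long address)
  {
          unsigned long pfn = dump_pagetable(address);

          if (error_code & X86_PF_RMP)
                  sev_dump_rmpentry(pfn);
  }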

Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 arch/x86/mm/fault.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index afd4cde17001..f2b16dcfbd9a 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -311,7 +311,7 @@ static bool low_pfn(unsigned long pfn)
 	return pfn < max_low_pfn;
 }
 
-static void dump_pagetable(unsigned long address)
+static unsigned long dump_pagetable(unsigned long address)
 {
 	pgd_t *base = __va(read_cr3_pa());
 	pgd_t *pgd = &base[pgd_index(address)];
@@ -345,8 +345,10 @@ static void dump_pagetable(unsigned long address)
 
 	pte = pte_offset_kernel(pmd, address);
 	pr_cont("*pte = %0*Lx ", sizeof(*pte) * 2, (u64)pte_val(*pte));
+	return 0;
 out:
 	pr_cont("\n");
+	return 0;
 }
 
 #else /* CONFIG_X86_64: */
@@ -367,10 +369,11 @@ static int bad_address(void *p)
 	return get_kernel_nofault(dummy, (unsigned long *)p);
 }
 
-static void dump_pagetable(unsigned long address)
+static unsigned long dump_pagetable(unsigned long address)
 {
 	pgd_t *base = __va(read_cr3_pa());
 	pgd_t *pgd = base + pgd_index(address);
+	unsigned long pfn;
 	p4d_t *p4d;
 	pud_t *pud;
 	pmd_t *pmd;
@@ -388,6 +391,7 @@ static void dump_pagetable(unsigned long address)
 	if (bad_address(p4d))
 		goto bad;
 
+	pfn = p4d_pfn(*p4d);
 	pr_cont("P4D %lx ", p4d_val(*p4d));
 	if (!p4d_present(*p4d) || p4d_large(*p4d))
 		goto out;
@@ -396,6 +400,7 @@ static void dump_pagetable(unsigned long address)
 	if (bad_address(pud))
 		goto bad;
 
+	pfn = pud_pfn(*pud);
 	pr_cont("PUD %lx ", pud_val(*pud));
 	if (!pud_present(*pud) || pud_large(*pud))
 		goto out;
@@ -404,6 +409,7 @@ static void dump_pagetable(unsigned long address)
 	if (bad_address(pmd))
 		goto bad;
 
+	pfn = pmd_pfn(*pmd);
 	pr_cont("PMD %lx ", pmd_val(*pmd));
 	if (!pmd_present(*pmd) || pmd_large(*pmd))
 		goto out;
@@ -412,13 +418,14 @@ static void dump_pagetable(unsigned long address)
 	if (bad_address(pte))
 		goto bad;
 
+	pfn = pte_pfn(*pte);
 	pr_cont("PTE %lx", pte_val(*pte));
 out:
 	pr_cont("\n");
-
-	return;
+	return pfn;
 bad:
 	pr_info("BAD\n");
+	return -1;
 }
 
 #endif /* CONFIG_X86_64 */
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 147+ messages in thread

* [PATCH RFC v8 20/56] crypto:ccp: Define the SEV-SNP commands
  2023-02-20 18:37 [PATCH RFC v8 00/56] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (18 preceding siblings ...)
  2023-02-20 18:38 ` [PATCH RFC v8 19/56] x86/fault: Return pfn from dump_pagetable() for SEV-specific fault handling Michael Roth
@ 2023-02-20 18:38 ` Michael Roth
  2023-04-17 14:54   ` Sabin Rapan
  2023-02-20 18:38 ` [PATCH RFC v8 21/56] crypto: ccp: Add support to initialize the AMD-SP for SEV-SNP Michael Roth
                   ` (37 subsequent siblings)
  57 siblings, 1 reply; 147+ messages in thread
From: Michael Roth @ 2023-02-20 18:38 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	Brijesh Singh

From: Brijesh Singh <brijesh.singh@amd.com>

AMD introduced the next generation of SEV called SEV-SNP (Secure Nested
Paging). SEV-SNP builds upon existing SEV and SEV-ES functionality
while adding new hardware security protection.

Define the commands and structures used to communicate with the AMD-SP
when creating and managing the SEV-SNP guests. The SEV-SNP firmware spec
is available at developer.amd.com/sev.
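
For illustration, a later patch in this series exports sev_do_cmd() for
issuing these commands; a minimal, hypothetical use of one of the new
structures to bind an ASID to a guest context page might look like the
sketch below (gctx_page and the error handling are assumed):

  struct sev_data_snp_activate activate = {};
  int rc, psp_ret;

  /* gctx_page: an assumed, previously created SNP guest context page */
  activate.gctx_paddr = __psp_pa(gctx_page);
  activate.asid = 1;

  rc = sev_do_cmd(SEV_CMD_SNP_ACTIVATE, &activate, &psp_ret);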

Co-developed-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 drivers/crypto/ccp/sev-dev.c |  16 +++
 include/linux/psp-sev.h      | 247 +++++++++++++++++++++++++++++++++++
 include/uapi/linux/psp-sev.h |  44 +++++++
 3 files changed, 307 insertions(+)

diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index 06fc7156c04f..9d84720a41d7 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -126,6 +126,8 @@ static int sev_cmd_buffer_len(int cmd)
 	switch (cmd) {
 	case SEV_CMD_INIT:			return sizeof(struct sev_data_init);
 	case SEV_CMD_INIT_EX:                   return sizeof(struct sev_data_init_ex);
+	case SEV_CMD_SNP_SHUTDOWN_EX:		return sizeof(struct sev_data_snp_shutdown_ex);
+	case SEV_CMD_SNP_INIT_EX:		return sizeof(struct sev_data_snp_init_ex);
 	case SEV_CMD_PLATFORM_STATUS:		return sizeof(struct sev_user_data_status);
 	case SEV_CMD_PEK_CSR:			return sizeof(struct sev_data_pek_csr);
 	case SEV_CMD_PEK_CERT_IMPORT:		return sizeof(struct sev_data_pek_cert_import);
@@ -154,6 +156,20 @@ static int sev_cmd_buffer_len(int cmd)
 	case SEV_CMD_GET_ID:			return sizeof(struct sev_data_get_id);
 	case SEV_CMD_ATTESTATION_REPORT:	return sizeof(struct sev_data_attestation_report);
 	case SEV_CMD_SEND_CANCEL:		return sizeof(struct sev_data_send_cancel);
+	case SEV_CMD_SNP_GCTX_CREATE:		return sizeof(struct sev_data_snp_addr);
+	case SEV_CMD_SNP_LAUNCH_START:		return sizeof(struct sev_data_snp_launch_start);
+	case SEV_CMD_SNP_LAUNCH_UPDATE:		return sizeof(struct sev_data_snp_launch_update);
+	case SEV_CMD_SNP_ACTIVATE:		return sizeof(struct sev_data_snp_activate);
+	case SEV_CMD_SNP_DECOMMISSION:		return sizeof(struct sev_data_snp_addr);
+	case SEV_CMD_SNP_PAGE_RECLAIM:		return sizeof(struct sev_data_snp_page_reclaim);
+	case SEV_CMD_SNP_GUEST_STATUS:		return sizeof(struct sev_data_snp_guest_status);
+	case SEV_CMD_SNP_LAUNCH_FINISH:		return sizeof(struct sev_data_snp_launch_finish);
+	case SEV_CMD_SNP_DBG_DECRYPT:		return sizeof(struct sev_data_snp_dbg);
+	case SEV_CMD_SNP_DBG_ENCRYPT:		return sizeof(struct sev_data_snp_dbg);
+	case SEV_CMD_SNP_PAGE_UNSMASH:		return sizeof(struct sev_data_snp_page_unsmash);
+	case SEV_CMD_SNP_PLATFORM_STATUS:	return sizeof(struct sev_data_snp_addr);
+	case SEV_CMD_SNP_GUEST_REQUEST:		return sizeof(struct sev_data_snp_guest_request);
+	case SEV_CMD_SNP_CONFIG:		return sizeof(struct sev_user_data_snp_config);
 	default:				return 0;
 	}
 
diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h
index 1595088c428b..31b045e1926f 100644
--- a/include/linux/psp-sev.h
+++ b/include/linux/psp-sev.h
@@ -86,6 +86,35 @@ enum sev_cmd {
 	SEV_CMD_DBG_DECRYPT		= 0x060,
 	SEV_CMD_DBG_ENCRYPT		= 0x061,
 
+	/* SNP specific commands */
+	SEV_CMD_SNP_INIT		= 0x81,
+	SEV_CMD_SNP_SHUTDOWN		= 0x82,
+	SEV_CMD_SNP_PLATFORM_STATUS	= 0x83,
+	SEV_CMD_SNP_DF_FLUSH		= 0x84,
+	SEV_CMD_SNP_INIT_EX		= 0x85,
+	SEV_CMD_SNP_SHUTDOWN_EX		= 0x86,
+	SEV_CMD_SNP_DECOMMISSION	= 0x90,
+	SEV_CMD_SNP_ACTIVATE		= 0x91,
+	SEV_CMD_SNP_GUEST_STATUS	= 0x92,
+	SEV_CMD_SNP_GCTX_CREATE		= 0x93,
+	SEV_CMD_SNP_GUEST_REQUEST	= 0x94,
+	SEV_CMD_SNP_ACTIVATE_EX		= 0x95,
+	SEV_CMD_SNP_LAUNCH_START	= 0xA0,
+	SEV_CMD_SNP_LAUNCH_UPDATE	= 0xA1,
+	SEV_CMD_SNP_LAUNCH_FINISH	= 0xA2,
+	SEV_CMD_SNP_DBG_DECRYPT		= 0xB0,
+	SEV_CMD_SNP_DBG_ENCRYPT		= 0xB1,
+	SEV_CMD_SNP_PAGE_SWAP_OUT	= 0xC0,
+	SEV_CMD_SNP_PAGE_SWAP_IN	= 0xC1,
+	SEV_CMD_SNP_PAGE_MOVE		= 0xC2,
+	SEV_CMD_SNP_PAGE_MD_INIT	= 0xC3,
+	SEV_CMD_SNP_PAGE_MD_RECLAIM	= 0xC4,
+	SEV_CMD_SNP_PAGE_RO_RECLAIM	= 0xC5,
+	SEV_CMD_SNP_PAGE_RO_RESTORE	= 0xC6,
+	SEV_CMD_SNP_PAGE_RECLAIM	= 0xC7,
+	SEV_CMD_SNP_PAGE_UNSMASH	= 0xC8,
+	SEV_CMD_SNP_CONFIG		= 0xC9,
+
 	SEV_CMD_MAX,
 };
 
@@ -531,6 +560,224 @@ struct sev_data_attestation_report {
 	u32 len;				/* In/Out */
 } __packed;
 
+/**
+ * struct sev_data_snp_download_firmware - SNP_DOWNLOAD_FIRMWARE command params
+ *
+ * @address: physical address of firmware image
+ * @len: length of the firmware image
+ */
+struct sev_data_snp_download_firmware {
+	u64 address;				/* In */
+	u32 len;				/* In */
+} __packed;
+
+/**
+ * struct sev_data_snp_activate - SNP_ACTIVATE command params
+ *
+ * @gctx_paddr: system physical address of guest context page
+ * @asid: ASID to bind to the guest
+ */
+struct sev_data_snp_activate {
+	u64 gctx_paddr;				/* In */
+	u32 asid;				/* In */
+} __packed;
+
+/**
+ * struct sev_data_snp_addr - generic SNP command params
+ *
+ * @gctx_paddr: system physical address of guest context page
+ */
+struct sev_data_snp_addr {
+	u64 gctx_paddr;				/* In */
+} __packed;
+
+/**
+ * struct sev_data_snp_launch_start - SNP_LAUNCH_START command params
+ *
+ * @gctx_paddr: system physical address of guest context page
+ * @policy: guest policy
+ * @ma_gctx_paddr: system physical address of migration agent
+ * @imi_en: launch flow is launching an IMI for the purpose of
+ *   guest-assisted migration.
+ * @ma_en: the guest is associated with a migration agent
+ */
+struct sev_data_snp_launch_start {
+	u64 gctx_paddr;				/* In */
+	u64 policy;				/* In */
+	u64 ma_gctx_paddr;			/* In */
+	u32 ma_en:1;				/* In */
+	u32 imi_en:1;				/* In */
+	u32 rsvd:30;
+	u8 gosvw[16];				/* In */
+} __packed;
+
+/* SNP support page type */
+enum {
+	SNP_PAGE_TYPE_NORMAL		= 0x1,
+	SNP_PAGE_TYPE_VMSA		= 0x2,
+	SNP_PAGE_TYPE_ZERO		= 0x3,
+	SNP_PAGE_TYPE_UNMEASURED	= 0x4,
+	SNP_PAGE_TYPE_SECRET		= 0x5,
+	SNP_PAGE_TYPE_CPUID		= 0x6,
+
+	SNP_PAGE_TYPE_MAX
+};
+
+/**
+ * struct sev_data_snp_launch_update - SNP_LAUNCH_UPDATE command params
+ *
+ * @gctx_paddr: system physical address of guest context page
+ * @imi_page: indicates that this page is part of the IMI of the guest
+ * @page_type: encoded page type
+ * @page_size: page size; 0 indicates 4K and 1 indicates a 2MB page
+ * @address: system physical address of destination page to encrypt
+ * @vmpl1_perms: VMPL permission mask for VMPL1
+ * @vmpl2_perms: VMPL permission mask for VMPL2
+ * @vmpl3_perms: VMPL permission mask for VMPL3
+ */
+struct sev_data_snp_launch_update {
+	u64 gctx_paddr;				/* In */
+	u32 page_size:1;			/* In */
+	u32 page_type:3;			/* In */
+	u32 imi_page:1;				/* In */
+	u32 rsvd:27;
+	u32 rsvd2;
+	u64 address;				/* In */
+	u32 rsvd3:8;
+	u32 vmpl1_perms:8;			/* In */
+	u32 vmpl2_perms:8;			/* In */
+	u32 vmpl3_perms:8;			/* In */
+	u32 rsvd4;
+} __packed;
+
+/**
+ * struct sev_data_snp_launch_finish - SNP_LAUNCH_FINISH command params
+ *
+ * @gctx_paddr: system physical address of guest context page
+ */
+struct sev_data_snp_launch_finish {
+	u64 gctx_paddr;
+	u64 id_block_paddr;
+	u64 id_auth_paddr;
+	u8 id_block_en:1;
+	u8 auth_key_en:1;
+	u64 rsvd:62;
+	u8 host_data[32];
+} __packed;
+
+/**
+ * struct sev_data_snp_guest_status - SNP_GUEST_STATUS command params
+ *
+ * @gctx_paddr: system physical address of guest context page
+ * @address: system physical address of guest status page
+ */
+struct sev_data_snp_guest_status {
+	u64 gctx_paddr;
+	u64 address;
+} __packed;
+
+/**
+ * struct sev_data_snp_page_reclaim - SNP_PAGE_RECLAIM command params
+ *
+ * @paddr: system physical address of page to be reclaimed. The 0th bit
+ *	in the address indicates the page size. 0h indicates 4 kB and
+ *	1h indicates 2 MB page.
+ */
+struct sev_data_snp_page_reclaim {
+	u64 paddr;
+} __packed;
+
+/**
+ * struct sev_data_snp_page_unsmash - SNP_PAGE_UNSMASH command params
+ *
+ * @paddr: system physical address of page to be unsmashed. The 0th bit
+ *	in the address indicates the page size. 0h indicates 4 kB and
+ *	1h indicates 2 MB page.
+ */
+struct sev_data_snp_page_unsmash {
+	u64 paddr;
+} __packed;
+
+/**
+ * struct sev_data_snp_dbg - SNP_DBG_ENCRYPT/SNP_DBG_DECRYPT command parameters
+ *
+ * @gctx_paddr: system physical address of guest context page
+ * @src_addr: source address of data to operate on
+ * @dst_addr: destination address of data to operate on
+ * @len: length of data to operate on
+ */
+struct sev_data_snp_dbg {
+	u64 gctx_paddr;				/* In */
+	u64 src_addr;				/* In */
+	u64 dst_addr;				/* In */
+	u32 len;				/* In */
+} __packed;
+
+/**
+ * struct sev_data_snp_guest_request - SNP_GUEST_REQUEST command params
+ *
+ * @gctx_paddr: system physical address of guest context page
+ * @req_paddr: system physical address of request page
+ * @res_paddr: system physical address of response page
+ */
+struct sev_data_snp_guest_request {
+	u64 gctx_paddr;				/* In */
+	u64 req_paddr;				/* In */
+	u64 res_paddr;				/* In */
+} __packed;
+
+/**
+ * struct sev_data_snp_init_ex - SNP_INIT_EX structure
+ *
+ * @init_rmp: indicates that the RMP should be initialized
+ * @list_paddr_en: indicates that list_paddr is valid
+ * @list_paddr: system physical address of range list
+ */
+struct sev_data_snp_init_ex {
+	u32 init_rmp:1;
+	u32 list_paddr_en:1;
+	u32 rsvd:30;
+	u32 rsvd1;
+	u64 list_paddr;
+	u8  rsvd2[48];
+} __packed;
+
+/**
+ * struct sev_data_range - RANGE structure
+ *
+ * @base: system physical address of first byte of range
+ * @page_count: number of 4KB pages in this range
+ */
+struct sev_data_range {
+	u64 base;
+	u32 page_count;
+	u32 rsvd;
+} __packed;
+
+/**
+ * struct sev_data_range_list - RANGE_LIST structure
+ *
+ * @num_elements: number of elements in RANGE_ARRAY
+ * @ranges: array of num_elements of type RANGE
+ */
+struct sev_data_range_list {
+	u32 num_elements;
+	u32 rsvd;
+	struct sev_data_range ranges[0];
+} __packed;
+
+/**
+ * struct sev_data_snp_shutdown_ex - SNP_SHUTDOWN_EX structure
+ *
+ * @length: length of the command buffer read by the PSP
+ * @iommu_snp_shutdown: Disable enforcement of SNP in the IOMMU
+ */
+struct sev_data_snp_shutdown_ex {
+	u32 length;
+	u32 iommu_snp_shutdown:1;
+	u32 rsvd1:31;
+} __packed;
+
 #ifdef CONFIG_CRYPTO_DEV_SP_PSP
 
 /**
diff --git a/include/uapi/linux/psp-sev.h b/include/uapi/linux/psp-sev.h
index 91b4c63d5cbf..c66f7c372645 100644
--- a/include/uapi/linux/psp-sev.h
+++ b/include/uapi/linux/psp-sev.h
@@ -61,6 +61,13 @@ typedef enum {
 	SEV_RET_INVALID_PARAM,
 	SEV_RET_RESOURCE_LIMIT,
 	SEV_RET_SECURE_DATA_INVALID,
+	SEV_RET_INVALID_PAGE_SIZE,
+	SEV_RET_INVALID_PAGE_STATE,
+	SEV_RET_INVALID_MDATA_ENTRY,
+	SEV_RET_INVALID_PAGE_OWNER,
+	SEV_RET_INVALID_PAGE_AEAD_OFLOW,
+	SEV_RET_RMP_INIT_REQUIRED,
+
 	SEV_RET_MAX,
 } sev_ret_code;
 
@@ -147,6 +154,43 @@ struct sev_user_data_get_id2 {
 	__u32 length;				/* In/Out */
 } __packed;
 
+/**
+ * struct sev_user_data_snp_status - SNP status
+ *
+ * @api_major: API major version
+ * @api_minor: API minor version
+ * @state: current platform state
+ * @build_id: firmware build id for the API version
+ * @guest_count: the number of guests currently managed by the firmware
+ * @tcb_version: current TCB version
+ */
+struct sev_user_data_snp_status {
+	__u8 api_major;		/* Out */
+	__u8 api_minor;		/* Out */
+	__u8 state;		/* Out */
+	__u8 rsvd;
+	__u32 build_id;		/* Out */
+	__u32 rsvd1;
+	__u32 guest_count;	/* Out */
+	__u64 tcb_version;	/* Out */
+	__u64 rsvd2;
+} __packed;
+
+/*
+ * struct sev_user_data_snp_config - system wide configuration value for SNP.
+ *
+ * @reported_tcb: The TCB version to report in the guest attestation report.
+ * @mask_chip_id: Indicates that the CHIP_ID field in the attestation report
+ * will always be zero.
+ */
+struct sev_user_data_snp_config {
+	__u64 reported_tcb;     /* In */
+	__u32 mask_chip_id:1;   /* In */
+	__u32 mask_chip_key:1;  /* In */
+	__u32 rsvd:30;          /* In */
+	__u8 rsvd1[52];
+} __packed;
+
 /**
  * struct sev_issue_cmd - SEV ioctl parameters
  *
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 147+ messages in thread

* [PATCH RFC v8 21/56] crypto: ccp: Add support to initialize the AMD-SP for SEV-SNP
  2023-02-20 18:37 [PATCH RFC v8 00/56] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (19 preceding siblings ...)
  2023-02-20 18:38 ` [PATCH RFC v8 20/56] crypto:ccp: Define the SEV-SNP commands Michael Roth
@ 2023-02-20 18:38 ` Michael Roth
  2023-02-20 18:38 ` [PATCH RFC v8 22/56] crypto:ccp: Provide API to issue SEV and SNP commands Michael Roth
                   ` (36 subsequent siblings)
  57 siblings, 0 replies; 147+ messages in thread
From: Michael Roth @ 2023-02-20 18:38 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	Brijesh Singh

From: Brijesh Singh <brijesh.singh@amd.com>

Before SNP VMs can be launched, the platform must be appropriately
configured and initialized. Platform initialization is accomplished via
the SNP_INIT command. Make sure to do a WBINVD and issue DF_FLUSH
command to prepare for the first SNP guest launch after INIT.

During the execution of SNP_INIT command, the firmware configures
and enables SNP security policy enforcement in many system components.
Some system components write to regions of memory reserved by early
x86 firmware (e.g. UEFI). Other system components write to regions
provided by the operating system, hypervisor, or x86 firmware.
Such system components can only write to HV-fixed pages or Default
pages. They will error when attempting to write to other page states
after SNP_INIT enables their SNP enforcement.

Starting in SNP firmware v1.52, the SNP_INIT_EX command takes a list of
system physical address ranges to convert into the HV-fixed page states
during the RMP initialization. If INIT_RMP is 1, hypervisors should
provide all system physical address ranges that the hypervisor will
never assign to a guest until the next RMP re-initialization.
For instance, the memory that UEFI reserves should be included in the
range list. This allows system components that occasionally write to
memory (e.g. logging to UEFI reserved regions) to not fail due to
RMP initialization and SNP enablement.
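
As an illustrative sketch (not part of this patch), a single reserved
region would be described to the firmware roughly as follows, where 'buf'
is an assumed zeroed, page-sized buffer and 'start'/'size' are the assumed
bounds of the reserved region:

  struct sev_data_range_list *list = buf;
  struct sev_data_range *range = &list->ranges[list->num_elements];

  range->base = start & PAGE_MASK;
  range->page_count = size >> PAGE_SHIFT;
  list->num_elements++;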

Co-developed-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 drivers/crypto/ccp/sev-dev.c | 225 +++++++++++++++++++++++++++++++++++
 drivers/crypto/ccp/sev-dev.h |   2 +
 include/linux/psp-sev.h      |  17 +++
 3 files changed, 244 insertions(+)

diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index 9d84720a41d7..af20420bd6c2 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -26,6 +26,7 @@
 #include <linux/fs_struct.h>
 
 #include <asm/smp.h>
+#include <asm/e820/types.h>
 
 #include "psp-dev.h"
 #include "sev-dev.h"
@@ -34,6 +35,10 @@
 #define SEV_FW_FILE		"amd/sev.fw"
 #define SEV_FW_NAME_SIZE	64
 
+/* Minimum firmware version required for the SEV-SNP support */
+#define SNP_MIN_API_MAJOR	1
+#define SNP_MIN_API_MINOR	51
+
 static DEFINE_MUTEX(sev_cmd_mutex);
 static struct sev_misc_dev *misc_dev;
 
@@ -76,6 +81,13 @@ static void *sev_es_tmr;
 #define NV_LENGTH (32 * 1024)
 static void *sev_init_ex_buffer;
 
+/*
+ * SEV_DATA_RANGE_LIST:
+ *   Array containing range of pages that firmware transitions to HV-fixed
+ *   page state.
+ */
+struct sev_data_range_list *snp_range_list;
+
 static inline bool sev_version_greater_or_equal(u8 maj, u8 min)
 {
 	struct sev_device *sev = psp_master->sev_data;
@@ -830,6 +842,186 @@ static int sev_update_firmware(struct device *dev)
 	return ret;
 }
 
+static void snp_set_hsave_pa(void *arg)
+{
+	wrmsrl(MSR_VM_HSAVE_PA, 0);
+}
+
+static int snp_filter_reserved_mem_regions(struct resource *rs, void *arg)
+{
+	struct sev_data_range_list *range_list = arg;
+	struct sev_data_range *range = &range_list->ranges[range_list->num_elements];
+	size_t size;
+
+	if ((range_list->num_elements * sizeof(struct sev_data_range) +
+	     sizeof(struct sev_data_range_list)) > PAGE_SIZE)
+		return -E2BIG;
+
+	switch (rs->desc) {
+	case E820_TYPE_RESERVED:
+	case E820_TYPE_PMEM:
+	case E820_TYPE_ACPI:
+		range->base = rs->start & PAGE_MASK;
+		size = (rs->end + 1) - rs->start;
+		range->page_count = size >> PAGE_SHIFT;
+		range_list->num_elements++;
+		break;
+	default:
+		break;
+	}
+
+	return 0;
+}
+
+static int __sev_snp_init_locked(int *error)
+{
+	struct psp_device *psp = psp_master;
+	struct sev_data_snp_init_ex data;
+	struct sev_device *sev;
+	int rc = 0;
+
+	if (!psp || !psp->sev_data)
+		return -ENODEV;
+
+	sev = psp->sev_data;
+
+	if (sev->snp_initialized)
+		return 0;
+
+	/*
+	 * SNP_INIT requires MSR_VM_HSAVE_PA to be set to 0h across
+	 * all cores.
+	 */
+	on_each_cpu(snp_set_hsave_pa, NULL, 1);
+
+	/*
+	 * Starting in SNP firmware v1.52, the SNP_INIT_EX command takes a list of
+	 * system physical address ranges to convert into the HV-fixed page states
+	 * during the RMP initialization.  For instance, the memory that UEFI
+	 * reserves should be included in the range list. This allows system
+	 * components that occasionally write to memory (e.g. logging to UEFI
+	 * reserved regions) to not fail due to RMP initialization and SNP enablement.
+	 */
+	if (sev_version_greater_or_equal(SNP_MIN_API_MAJOR, 52)) {
+		/*
+		 * Firmware checks that the pages containing the ranges enumerated
+		 * in the RANGES structure are either in the Default page state or in the
+		 * firmware page state.
+		 */
+		snp_range_list = sev_fw_alloc(PAGE_SIZE);
+		if (!snp_range_list) {
+			dev_err(sev->dev,
+				"SEV: SNP_INIT_EX range list memory allocation failed\n");
+			return -ENOMEM;
+		}
+
+		memset(snp_range_list, 0, PAGE_SIZE);
+
+		/*
+		 * Retrieve all reserved memory regions setup by UEFI from the e820 memory map
+		 * to be setup as HV-fixed pages.
+		 */
+
+		rc = walk_iomem_res_desc(IORES_DESC_NONE, IORESOURCE_MEM, 0, ~0,
+					 snp_range_list, snp_filter_reserved_mem_regions);
+		if (rc) {
+			dev_err(sev->dev,
+				"SEV: SNP_INIT_EX walk_iomem_res_desc failed rc = %d\n", rc);
+			return rc;
+		}
+
+		memset(&data, 0, sizeof(data));
+		data.init_rmp = 1;
+		data.list_paddr_en = 1;
+		data.list_paddr = __pa(snp_range_list);
+
+		rc = __sev_do_cmd_locked(SEV_CMD_SNP_INIT_EX, &data, error);
+		if (rc)
+			return rc;
+	} else {
+		rc = __sev_do_cmd_locked(SEV_CMD_SNP_INIT, NULL, error);
+		if (rc)
+			return rc;
+	}
+
+	/* Prepare for first SNP guest launch after INIT */
+	wbinvd_on_all_cpus();
+	rc = __sev_do_cmd_locked(SEV_CMD_SNP_DF_FLUSH, NULL, error);
+	if (rc)
+		return rc;
+
+	sev->snp_initialized = true;
+	dev_dbg(sev->dev, "SEV-SNP firmware initialized\n");
+
+	return rc;
+}
+
+int sev_snp_init(int *error, bool init_on_probe)
+{
+	int rc;
+
+	if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
+		return -ENODEV;
+
+	if (init_on_probe && !psp_init_on_probe)
+		return 0;
+
+	mutex_lock(&sev_cmd_mutex);
+	rc = __sev_snp_init_locked(error);
+	mutex_unlock(&sev_cmd_mutex);
+
+	return rc;
+}
+EXPORT_SYMBOL_GPL(sev_snp_init);
+
+static int __sev_snp_shutdown_locked(int *error)
+{
+	struct sev_device *sev = psp_master->sev_data;
+	struct sev_data_snp_shutdown_ex data;
+	int ret;
+
+	if (!sev->snp_initialized)
+		return 0;
+
+	memset(&data, 0, sizeof(data));
+	data.length = sizeof(data);
+	data.iommu_snp_shutdown = 1;
+
+	wbinvd_on_all_cpus();
+
+retry:
+	ret = __sev_do_cmd_locked(SEV_CMD_SNP_SHUTDOWN_EX, &data, error);
+	/* SHUTDOWN may require DF_FLUSH */
+	if (*error == SEV_RET_DFFLUSH_REQUIRED) {
+		ret = __sev_do_cmd_locked(SEV_CMD_SNP_DF_FLUSH, NULL, NULL);
+		if (ret) {
+			dev_err(sev->dev, "SEV-SNP DF_FLUSH failed\n");
+			return ret;
+		}
+		goto retry;
+	}
+	if (ret) {
+		dev_err(sev->dev, "SEV-SNP firmware shutdown failed\n");
+		return ret;
+	}
+
+	sev->snp_initialized = false;
+	dev_dbg(sev->dev, "SEV-SNP firmware shutdown\n");
+
+	return ret;
+}
+
+static int sev_snp_shutdown(int *error)
+{
+	int rc;
+
+	mutex_lock(&sev_cmd_mutex);
+	rc = __sev_snp_shutdown_locked(error);
+	mutex_unlock(&sev_cmd_mutex);
+
+	return rc;
+}
+
 static int sev_ioctl_do_pek_import(struct sev_issue_cmd *argp, bool writable)
 {
 	struct sev_device *sev = psp_master->sev_data;
@@ -1270,6 +1462,8 @@ int sev_dev_init(struct psp_device *psp)
 
 static void sev_firmware_shutdown(struct sev_device *sev)
 {
+	int error;
+
 	sev_platform_shutdown(NULL);
 
 	if (sev_es_tmr) {
@@ -1286,6 +1480,14 @@ static void sev_firmware_shutdown(struct sev_device *sev)
 			   get_order(NV_LENGTH));
 		sev_init_ex_buffer = NULL;
 	}
+
+	if (snp_range_list) {
+		free_pages((unsigned long)snp_range_list,
+			   get_order(PAGE_SIZE));
+		snp_range_list = NULL;
+	}
+
+	sev_snp_shutdown(&error);
 }
 
 void sev_dev_destroy(struct psp_device *psp)
@@ -1341,6 +1543,26 @@ void sev_pci_init(void)
 		}
 	}
 
+	/*
+	 * If the boot CPU supports SNP, then first attempt to initialize
+	 * the SNP firmware.
+	 */
+	if (cpu_feature_enabled(X86_FEATURE_SEV_SNP)) {
+		if (!sev_version_greater_or_equal(SNP_MIN_API_MAJOR, SNP_MIN_API_MINOR)) {
+			dev_err(sev->dev, "SEV-SNP support requires firmware version >= %d:%d\n",
+				SNP_MIN_API_MAJOR, SNP_MIN_API_MINOR);
+		} else {
+			rc = sev_snp_init(&error, true);
+			if (rc) {
+				/*
+				 * Don't abort the probe if SNP INIT failed,
+				 * continue to initialize the legacy SEV firmware.
+				 */
+				dev_err(sev->dev, "SEV-SNP: failed to INIT error %#x\n", error);
+			}
+		}
+	}
+
 	/* Obtain the TMR memory area for SEV-ES use */
 	sev_es_tmr = sev_fw_alloc(SEV_ES_TMR_SIZE);
 	if (!sev_es_tmr)
@@ -1356,6 +1578,9 @@ void sev_pci_init(void)
 		dev_err(sev->dev, "SEV: failed to INIT error %#x, rc %d\n",
 			error, rc);
 
+	dev_info(sev->dev, "SEV%s API:%d.%d build:%d\n", sev->snp_initialized ?
+		"-SNP" : "", sev->api_major, sev->api_minor, sev->build);
+
 	return;
 
 err:
diff --git a/drivers/crypto/ccp/sev-dev.h b/drivers/crypto/ccp/sev-dev.h
index 666c21eb81ab..34767657beb5 100644
--- a/drivers/crypto/ccp/sev-dev.h
+++ b/drivers/crypto/ccp/sev-dev.h
@@ -52,6 +52,8 @@ struct sev_device {
 	u8 build;
 
 	void *cmd_buf;
+
+	bool snp_initialized;
 };
 
 int sev_dev_init(struct psp_device *psp);
diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h
index 31b045e1926f..8cfe92e82743 100644
--- a/include/linux/psp-sev.h
+++ b/include/linux/psp-sev.h
@@ -794,6 +794,21 @@ struct sev_data_snp_shutdown_ex {
  */
 int sev_platform_init(int *error);
 
+/**
+ * sev_snp_init - perform SEV SNP_INIT command
+ *
+ * @error: SEV command return code
+ * @init_on_probe: indicates if called during module probe/init
+ *
+ * Returns:
+ * 0 if the SEV successfully processed the command
+ * -%ENODEV    if the SEV device is not available
+ * -%ENOTSUPP  if the SEV device does not support SEV-SNP
+ * -%ETIMEDOUT if the SEV command timed out
+ * -%EIO       if the SEV returned a non-zero return code
+ */
+int sev_snp_init(int *error, bool init_on_probe);
+
 /**
  * sev_platform_status - perform SEV PLATFORM_STATUS command
  *
@@ -901,6 +916,8 @@ sev_platform_status(struct sev_user_data_status *status, int *error) { return -E
 
 static inline int sev_platform_init(int *error) { return -ENODEV; }
 
+static inline int sev_snp_init(int *error, bool init_on_probe) { return -ENODEV; }
+
 static inline int
 sev_guest_deactivate(struct sev_data_deactivate *data, int *error) { return -ENODEV; }
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 147+ messages in thread

* [PATCH RFC v8 22/56] crypto:ccp: Provide API to issue SEV and SNP commands
  2023-02-20 18:37 [PATCH RFC v8 00/56] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (20 preceding siblings ...)
  2023-02-20 18:38 ` [PATCH RFC v8 21/56] crypto: ccp: Add support to initialize the AMD-SP for SEV-SNP Michael Roth
@ 2023-02-20 18:38 ` Michael Roth
  2023-02-20 18:38 ` [PATCH RFC v8 23/56] crypto: ccp: Introduce snp leaked pages list Michael Roth
                   ` (35 subsequent siblings)
  57 siblings, 0 replies; 147+ messages in thread
From: Michael Roth @ 2023-02-20 18:38 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	Brijesh Singh

From: Brijesh Singh <brijesh.singh@amd.com>

Make sev_do_cmd() a generic API interface for the hypervisor
to issue commands to manage SEV and SNP guests. The commands
for SEV and SNP are defined in the SEV and SEV-SNP firmware
specifications.
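
For illustration, a caller that only needs to flush the data fabric (a
command that takes no data buffer) could do something like the following
sketch; the error handling shown is assumed:

  int rc, psp_ret;

  rc = sev_do_cmd(SEV_CMD_SNP_DF_FLUSH, NULL, &psp_ret);
  if (rc)
          pr_err("SNP_DF_FLUSH failed, rc %d psp_ret %d\n", rc, psp_ret);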

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 drivers/crypto/ccp/sev-dev.c |  3 ++-
 include/linux/psp-sev.h      | 17 +++++++++++++++++
 2 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index af20420bd6c2..35f605936f1b 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -415,7 +415,7 @@ static int __sev_do_cmd_locked(int cmd, void *data, int *psp_ret)
 	return ret;
 }
 
-static int sev_do_cmd(int cmd, void *data, int *psp_ret)
+int sev_do_cmd(int cmd, void *data, int *psp_ret)
 {
 	int rc;
 
@@ -425,6 +425,7 @@ static int sev_do_cmd(int cmd, void *data, int *psp_ret)
 
 	return rc;
 }
+EXPORT_SYMBOL_GPL(sev_do_cmd);
 
 static int __sev_init_locked(int *error)
 {
diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h
index 8cfe92e82743..46f61e3ae33b 100644
--- a/include/linux/psp-sev.h
+++ b/include/linux/psp-sev.h
@@ -907,6 +907,20 @@ int sev_guest_df_flush(int *error);
  */
 int sev_guest_decommission(struct sev_data_decommission *data, int *error);
 
+/**
+ * sev_do_cmd - perform SEV command
+ *
+ * @cmd: SEV or SNP command to issue
+ * @data: command buffer for the command
+ * @psp_ret: SEV command return code
+ *
+ * Returns:
+ * 0 if the SEV successfully processed the command
+ * -%ENODEV    if the SEV device is not available
+ * -%ENOTSUPP  if the SEV device does not support the command
+ * -%ETIMEDOUT if the SEV command timed out
+ * -%EIO       if the SEV returned a non-zero return code
+ */
+int sev_do_cmd(int cmd, void *data, int *psp_ret);
+
 void *psp_copy_user_blob(u64 uaddr, u32 len);
 
 #else	/* !CONFIG_CRYPTO_DEV_SP_PSP */
@@ -924,6 +938,9 @@ sev_guest_deactivate(struct sev_data_deactivate *data, int *error) { return -ENO
 static inline int
 sev_guest_decommission(struct sev_data_decommission *data, int *error) { return -ENODEV; }
 
+static inline int
+sev_do_cmd(int cmd, void *data, int *psp_ret) { return -ENODEV; }
+
 static inline int
 sev_guest_activate(struct sev_data_activate *data, int *error) { return -ENODEV; }
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 147+ messages in thread

* [PATCH RFC v8 23/56] crypto: ccp: Introduce snp leaked pages list
  2023-02-20 18:37 [PATCH RFC v8 00/56] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (21 preceding siblings ...)
  2023-02-20 18:38 ` [PATCH RFC v8 22/56] crypto:ccp: Provide API to issue SEV and SNP commands Michael Roth
@ 2023-02-20 18:38 ` Michael Roth
  2023-03-03 15:54   ` Vlastimil Babka
  2023-02-20 18:38 ` [PATCH RFC v8 24/56] crypto: ccp: Handle the legacy TMR allocation when SNP is enabled Michael Roth
                   ` (34 subsequent siblings)
  57 siblings, 1 reply; 147+ messages in thread
From: Michael Roth @ 2023-02-20 18:38 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania

From: Ashish Kalra <ashish.kalra@amd.com>

Pages are unsafe to be released back to the page allocator if they
have been transitioned to firmware/guest state and can't be reclaimed
or transitioned back to hypervisor/shared state. In this case, add
them to an internal leaked-pages list to ensure that they are never
freed or touched/accessed, which would cause fatal page faults.
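
For illustration, a caller that fails to transition a page back to the
shared/hypervisor state would leak it roughly as in the sketch below
(the surrounding error handling is assumed):

  if (rmp_make_shared(pfn, PG_LEVEL_4K)) {
          /* Not safe to free or touch this page anymore: leak it. */
          snp_mark_pages_offline(pfn, 1);
          return -EIO;
  }

  __free_page(pfn_to_page(pfn));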

Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 drivers/crypto/ccp/sev-dev.c | 28 ++++++++++++++++++++++++++++
 include/linux/psp-sev.h      |  8 ++++++++
 2 files changed, 36 insertions(+)

diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index 35f605936f1b..eca4e59b0f44 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -42,6 +42,12 @@
 static DEFINE_MUTEX(sev_cmd_mutex);
 static struct sev_misc_dev *misc_dev;
 
+/* list of pages which are leaked and cannot be reclaimed */
+static LIST_HEAD(snp_leaked_pages_list);
+static DEFINE_SPINLOCK(snp_leaked_pages_list_lock);
+
+static atomic_long_t snp_nr_leaked_pages = ATOMIC_LONG_INIT(0);
+
 static int psp_cmd_timeout = 100;
 module_param(psp_cmd_timeout, int, 0644);
 MODULE_PARM_DESC(psp_cmd_timeout, " default timeout value, in seconds, for PSP commands");
@@ -188,6 +194,28 @@ static int sev_cmd_buffer_len(int cmd)
 	return 0;
 }
 
+void snp_mark_pages_offline(unsigned long pfn, unsigned int npages)
+{
+	struct page *page;
+
+	WARN(1, "psc failed, pfn 0x%lx pages %d (marked offline)\n", pfn, npages);
+
+	spin_lock(&snp_leaked_pages_list_lock);
+	while (npages--) {
+		page = pfn_to_page(pfn);
+		/*
+		 * Reuse the page's buddy list for chaining into the leaked
+		 * pages list. This page should not be on a free list currently
+		 * and is also unsafe to be added to a free list.
+		 */
+		list_add_tail(&page->buddy_list, &snp_leaked_pages_list);
+		sev_dump_rmpentry(pfn);
+		pfn++;
+		atomic_long_inc(&snp_nr_leaked_pages);
+	}
+	spin_unlock(&snp_leaked_pages_list_lock);
+}
+EXPORT_SYMBOL_GPL(snp_mark_pages_offline);
+
 static void *sev_fw_alloc(unsigned long len)
 {
 	struct page *page;
diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h
index 46f61e3ae33b..8edf5c548fbf 100644
--- a/include/linux/psp-sev.h
+++ b/include/linux/psp-sev.h
@@ -923,6 +923,12 @@ int sev_do_cmd(int cmd, void *data, int *psp_ret);
 
 void *psp_copy_user_blob(u64 uaddr, u32 len);
 
+/**
+ * snp_mark_pages_offline - insert non-reclaimed firmware/guest pages
+ * into the leaked pages list.
+ */
+void snp_mark_pages_offline(unsigned long pfn, unsigned int npages);
+
 #else	/* !CONFIG_CRYPTO_DEV_SP_PSP */
 
 static inline int
@@ -951,6 +957,8 @@ sev_issue_cmd_external_user(struct file *filep, unsigned int id, void *data, int
 
 static inline void *psp_copy_user_blob(u64 __user uaddr, u32 len) { return ERR_PTR(-EINVAL); }
 
+static inline void snp_mark_pages_offline(unsigned long pfn, unsigned int npages) {}
+
 #endif	/* CONFIG_CRYPTO_DEV_SP_PSP */
 
 #endif	/* __PSP_SEV_H__ */
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 147+ messages in thread

* [PATCH RFC v8 24/56] crypto: ccp: Handle the legacy TMR allocation when SNP is enabled
  2023-02-20 18:37 [PATCH RFC v8 00/56] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (22 preceding siblings ...)
  2023-02-20 18:38 ` [PATCH RFC v8 23/56] crypto: ccp: Introduce snp leaked pages list Michael Roth
@ 2023-02-20 18:38 ` Michael Roth
  2023-02-21  9:28   ` Zhi Wang
  2023-02-20 18:38 ` [PATCH RFC v8 25/56] crypto: ccp: Handle the legacy SEV command " Michael Roth
                   ` (33 subsequent siblings)
  57 siblings, 1 reply; 147+ messages in thread
From: Michael Roth @ 2023-02-20 18:38 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	Brijesh Singh

From: Brijesh Singh <brijesh.singh@amd.com>

The behavior and requirements for the SEV-legacy commands are altered when
the SNP firmware is in the INIT state. See the SEV-SNP firmware specification
for more details.

Allocate the Trusted Memory Region (TMR) as a 2MB sized/aligned region
when SNP is enabled to satisfy the new SNP requirements. Continue
allocating a 1MB region for the !SNP configuration.

While at it, provide an API that can be used by others to allocate a page
that can be used by the firmware. The immediate user for this API will
be the KVM driver, which needs to allocate a firmware context page
during guest creation. The context page needs to be updated by the
firmware. See the SEV-SNP specification for further details.
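
A rough sketch of the expected KVM-side usage of the new helpers
(illustrative only; the gctx_page naming and error handling are assumed):

  /* Allocate a page the firmware is allowed to write to. */
  void *gctx_page = snp_alloc_firmware_page(GFP_KERNEL_ACCOUNT);

  if (!gctx_page)
          return -ENOMEM;

  /* ... hand the page to firmware, e.g. as the SNP_GCTX_CREATE target ... */

  snp_free_firmware_page(gctx_page);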

Co-developed-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 drivers/crypto/ccp/sev-dev.c | 148 +++++++++++++++++++++++++++++++++--
 include/linux/psp-sev.h      |   9 +++
 2 files changed, 149 insertions(+), 8 deletions(-)

diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index eca4e59b0f44..4c12e98a1219 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -94,6 +94,13 @@ static void *sev_init_ex_buffer;
  */
 struct sev_data_range_list *snp_range_list;
 
+/* When SEV-SNP is enabled, the TMR needs to be 2MB aligned and 2MB in size. */
+#define SEV_SNP_ES_TMR_SIZE	(2 * 1024 * 1024)
+
+static size_t sev_es_tmr_size = SEV_ES_TMR_SIZE;
+
+static int __sev_do_cmd_locked(int cmd, void *data, int *psp_ret);
+
 static inline bool sev_version_greater_or_equal(u8 maj, u8 min)
 {
 	struct sev_device *sev = psp_master->sev_data;
@@ -216,11 +223,134 @@ void snp_mark_pages_offline(unsigned long pfn, unsigned int npages)
 }
 EXPORT_SYMBOL_GPL(snp_mark_pages_offline);
 
+static int snp_reclaim_pages(unsigned long paddr, unsigned int npages, bool locked)
+{
+	/* The C-bit may be set in the paddr */
+	unsigned long pfn = __sme_clr(paddr) >> PAGE_SHIFT;
+	int ret, err, i, n = 0;
+
+	if (!pfn_valid(pfn)) {
+		pr_err("%s: Invalid PFN %lx\n", __func__, pfn);
+		return 0;
+	}
+
+	for (i = 0; i < npages; i++, pfn++, n++) {
+		paddr = pfn << PAGE_SHIFT;
+
+		if (locked)
+			ret = __sev_do_cmd_locked(SEV_CMD_SNP_PAGE_RECLAIM, &paddr, &err);
+		else
+			ret = sev_do_cmd(SEV_CMD_SNP_PAGE_RECLAIM, &paddr, &err);
+
+		if (ret)
+			goto cleanup;
+
+		ret = rmp_make_shared(pfn, PG_LEVEL_4K);
+		if (ret)
+			goto cleanup;
+	}
+
+	return 0;
+
+cleanup:
+	/*
+	 * If a page failed to be reclaimed, it is no longer safe to
+	 * release it back to the system; leak it.
+	 */
+	snp_mark_pages_offline(pfn, npages - n);
+	return ret;
+}
+
+static int rmp_mark_pages_firmware(unsigned long paddr, unsigned int npages, bool locked)
+{
+	/* The C-bit may be set in the paddr */
+	unsigned long pfn = __sme_clr(paddr) >> PAGE_SHIFT;
+	int rc, n = 0, i;
+
+	for (i = 0; i < npages; i++, n++, pfn++) {
+		rc = rmp_make_private(pfn, 0, PG_LEVEL_4K, 0, true);
+		if (rc)
+			goto cleanup;
+	}
+
+	return 0;
+
+cleanup:
+	/*
+	 * Try to unwind the firmware state changes by reclaiming the
+	 * pages which were already transitioned to firmware state.
+	 */
+	snp_reclaim_pages(paddr, n, locked);
+
+	return rc;
+}
+
+static struct page *__snp_alloc_firmware_pages(gfp_t gfp_mask, int order, bool locked)
+{
+	unsigned long npages = 1ul << order, paddr;
+	struct sev_device *sev;
+	struct page *page;
+
+	if (!psp_master || !psp_master->sev_data)
+		return NULL;
+
+	page = alloc_pages(gfp_mask, order);
+	if (!page)
+		return NULL;
+
+	/* If SEV-SNP is initialized then add the page in RMP table. */
+	sev = psp_master->sev_data;
+	if (!sev->snp_initialized)
+		return page;
+
+	paddr = __pa((unsigned long)page_address(page));
+	if (rmp_mark_pages_firmware(paddr, npages, locked))
+		return NULL;
+
+	return page;
+}
+
+void *snp_alloc_firmware_page(gfp_t gfp_mask)
+{
+	struct page *page;
+
+	page = __snp_alloc_firmware_pages(gfp_mask, 0, false);
+
+	return page ? page_address(page) : NULL;
+}
+EXPORT_SYMBOL_GPL(snp_alloc_firmware_page);
+
+static void __snp_free_firmware_pages(struct page *page, int order, bool locked)
+{
+	struct sev_device *sev = psp_master->sev_data;
+	unsigned long paddr, npages = 1ul << order;
+
+	if (!page)
+		return;
+
+	paddr = __pa((unsigned long)page_address(page));
+	if (sev->snp_initialized &&
+	    snp_reclaim_pages(paddr, npages, locked))
+		return;
+
+	__free_pages(page, order);
+}
+
+void snp_free_firmware_page(void *addr)
+{
+	if (!addr)
+		return;
+
+	__snp_free_firmware_pages(virt_to_page(addr), 0, false);
+}
+EXPORT_SYMBOL_GPL(snp_free_firmware_page);
+
 static void *sev_fw_alloc(unsigned long len)
 {
 	struct page *page;
 
-	page = alloc_pages(GFP_KERNEL, get_order(len));
+	page = __snp_alloc_firmware_pages(GFP_KERNEL, get_order(len), false);
 	if (!page)
 		return NULL;
 
@@ -468,7 +598,7 @@ static int __sev_init_locked(int *error)
 		data.tmr_address = __pa(sev_es_tmr);
 
 		data.flags |= SEV_INIT_FLAGS_SEV_ES;
-		data.tmr_len = SEV_ES_TMR_SIZE;
+		data.tmr_len = sev_es_tmr_size;
 	}
 
 	return __sev_do_cmd_locked(SEV_CMD_INIT, &data, error);
@@ -491,7 +621,7 @@ static int __sev_init_ex_locked(int *error)
 		data.tmr_address = __pa(sev_es_tmr);
 
 		data.flags |= SEV_INIT_FLAGS_SEV_ES;
-		data.tmr_len = SEV_ES_TMR_SIZE;
+		data.tmr_len = sev_es_tmr_size;
 	}
 
 	return __sev_do_cmd_locked(SEV_CMD_INIT_EX, &data, error);
@@ -982,6 +1112,8 @@ static int __sev_snp_init_locked(int *error)
 	sev->snp_initialized = true;
 	dev_dbg(sev->dev, "SEV-SNP firmware initialized\n");
 
+	sev_es_tmr_size = SEV_SNP_ES_TMR_SIZE;
+
 	return rc;
 }
 
@@ -1499,8 +1631,9 @@ static void sev_firmware_shutdown(struct sev_device *sev)
 		/* The TMR area was encrypted, flush it from the cache */
 		wbinvd_on_all_cpus();
 
-		free_pages((unsigned long)sev_es_tmr,
-			   get_order(SEV_ES_TMR_SIZE));
+		__snp_free_firmware_pages(virt_to_page(sev_es_tmr),
+					  get_order(sev_es_tmr_size),
+					  false);
 		sev_es_tmr = NULL;
 	}
 
@@ -1511,8 +1644,7 @@ static void sev_firmware_shutdown(struct sev_device *sev)
 	}
 
 	if (snp_range_list) {
-		free_pages((unsigned long)snp_range_list,
-			   get_order(PAGE_SIZE));
+		snp_free_firmware_page(snp_range_list);
 		snp_range_list = NULL;
 	}
 
@@ -1593,7 +1725,7 @@ void sev_pci_init(void)
 	}
 
 	/* Obtain the TMR memory area for SEV-ES use */
-	sev_es_tmr = sev_fw_alloc(SEV_ES_TMR_SIZE);
+	sev_es_tmr = sev_fw_alloc(sev_es_tmr_size);
 	if (!sev_es_tmr)
 		dev_warn(sev->dev,
 			 "SEV: TMR allocation failed, SEV-ES support unavailable\n");
diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h
index 8edf5c548fbf..d19744807471 100644
--- a/include/linux/psp-sev.h
+++ b/include/linux/psp-sev.h
@@ -922,6 +922,8 @@ int sev_guest_decommission(struct sev_data_decommission *data, int *error);
 int sev_do_cmd(int cmd, void *data, int *psp_ret);
 
 void *psp_copy_user_blob(u64 uaddr, u32 len);
+void *snp_alloc_firmware_page(gfp_t mask);
+void snp_free_firmware_page(void *addr);
 
 /**
 * snp_mark_pages_offline - insert non-reclaimed firmware/guest pages
@@ -959,6 +961,13 @@ static inline void *psp_copy_user_blob(u64 __user uaddr, u32 len) { return ERR_P
 
 static inline void snp_mark_pages_offline(unsigned long pfn, unsigned int npages) {}
 
+static inline void *snp_alloc_firmware_page(gfp_t mask)
+{
+	return NULL;
+}
+
+static inline void snp_free_firmware_page(void *addr) { }
+
 #endif	/* CONFIG_CRYPTO_DEV_SP_PSP */
 
 #endif	/* __PSP_SEV_H__ */
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 147+ messages in thread

* [PATCH RFC v8 25/56] crypto: ccp: Handle the legacy SEV command when SNP is enabled
  2023-02-20 18:37 [PATCH RFC v8 00/56] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (23 preceding siblings ...)
  2023-02-20 18:38 ` [PATCH RFC v8 24/56] crypto: ccp: Handle the legacy TMR allocation when SNP is enabled Michael Roth
@ 2023-02-20 18:38 ` Michael Roth
  2023-02-20 18:38 ` [PATCH RFC v8 26/56] crypto: ccp: Add the SNP_PLATFORM_STATUS command Michael Roth
                   ` (32 subsequent siblings)
  57 siblings, 0 replies; 147+ messages in thread
From: Michael Roth @ 2023-02-20 18:38 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	Brijesh Singh

From: Brijesh Singh <brijesh.singh@amd.com>

The behavior of the SEV-legacy commands is altered when the SNP firmware
is in the INIT state. When SNP is in the INIT state, any memory that the
firmware writes to as part of an SEV-legacy command must be in the
firmware page state before the command is issued.

A command buffer may contain a system physical address that the firmware
may write to. There are two cases that need to be handled:

1) the system physical address points to guest memory
2) the system physical address points to host memory

To handle case #1, change the page state to firmware in the RMP table
before issuing the command and restore the state to shared after the
command completes.

For case #2, use a bounce buffer to complete the request.
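
Roughly, the expected flow around a legacy command is sketched below
(illustrative only; __snp_cmd_buf_copy() is the helper added by this
patch, and its wiring into __sev_do_cmd_locked() is simplified here):

  /* Transition writable buffers to the firmware state before the command. */
  if (sev->snp_initialized && __snp_cmd_buf_copy(cmd, cmd_buf, true, 0))
          return -EFAULT;

  ret = __sev_do_cmd_locked(cmd, cmd_buf, psp_ret);

  /* Reclaim the buffers and copy back any bounce-buffered response data. */
  if (sev->snp_initialized && __snp_cmd_buf_copy(cmd, cmd_buf, false, *psp_ret))
          ret = -EFAULT;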

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 drivers/crypto/ccp/sev-dev.c | 371 ++++++++++++++++++++++++++++++++++-
 drivers/crypto/ccp/sev-dev.h |  12 ++
 2 files changed, 373 insertions(+), 10 deletions(-)

diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index 4c12e98a1219..fd8893af6ed7 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -27,6 +27,7 @@
 
 #include <asm/smp.h>
 #include <asm/e820/types.h>
+#include <asm/sev.h>
 
 #include "psp-dev.h"
 #include "sev-dev.h"
@@ -286,6 +287,30 @@ static int rmp_mark_pages_firmware(unsigned long paddr, unsigned int npages, boo
 	return rc;
 }
 
+static int rmp_mark_pages_shared(unsigned long paddr, unsigned int npages)
+{
+	/* The C-bit may be set in the paddr */
+	unsigned long pfn = __sme_clr(paddr) >> PAGE_SHIFT;
+	int rc, n = 0, i;
+
+	for (i = 0; i < npages; i++, pfn++, n++) {
+		rc = rmp_make_shared(pfn, PG_LEVEL_4K);
+		if (rc)
+			goto cleanup;
+	}
+
+	return 0;
+
+cleanup:
+	/*
+	 * If the page state could not be changed back to shared, it is
+	 * not safe to release the page back to the system; leak it.
+	 */
+	snp_mark_pages_offline(pfn, npages - n);
+
+	return rc;
+}
+
 static struct page *__snp_alloc_firmware_pages(gfp_t gfp_mask, int order, bool locked)
 {
 	unsigned long npages = 1ul << order, paddr;
@@ -487,12 +512,295 @@ static int sev_write_init_ex_file_if_required(int cmd_id)
 	return sev_write_init_ex_file();
 }
 
+static int alloc_snp_host_map(struct sev_device *sev)
+{
+	struct page *page;
+	int i;
+
+	for (i = 0; i < MAX_SNP_HOST_MAP_BUFS; i++) {
+		struct snp_host_map *map = &sev->snp_host_map[i];
+
+		memset(map, 0, sizeof(*map));
+
+		page = alloc_pages(GFP_KERNEL_ACCOUNT, get_order(SEV_FW_BLOB_MAX_SIZE));
+		if (!page)
+			return -ENOMEM;
+
+		map->host = page_address(page);
+	}
+
+	return 0;
+}
+
+static void free_snp_host_map(struct sev_device *sev)
+{
+	int i;
+
+	for (i = 0; i < MAX_SNP_HOST_MAP_BUFS; i++) {
+		struct snp_host_map *map = &sev->snp_host_map[i];
+
+		if (map->host) {
+			__free_pages(virt_to_page(map->host), get_order(SEV_FW_BLOB_MAX_SIZE));
+			memset(map, 0, sizeof(*map));
+		}
+	}
+}
+
+static int map_firmware_writeable(u64 *paddr, u32 len, bool guest, struct snp_host_map *map)
+{
+	unsigned int npages = PAGE_ALIGN(len) >> PAGE_SHIFT;
+
+	map->active = false;
+
+	if (!paddr || !len)
+		return 0;
+
+	map->paddr = *paddr;
+	map->len = len;
+
+	/* If paddr points to guest memory then change the page state to firmware. */
+	if (guest) {
+		if (rmp_mark_pages_firmware(*paddr, npages, true))
+			return -EFAULT;
+
+		goto done;
+	}
+
+	if (!map->host)
+		return -ENOMEM;
+
+	/* Check if the pre-allocated buffer can be used to fulfill the request. */
+	if (len > SEV_FW_BLOB_MAX_SIZE)
+		return -EINVAL;
+
+	/* Transition the pre-allocated buffer to the firmware state. */
+	if (rmp_mark_pages_firmware(__pa(map->host), npages, true))
+		return -EFAULT;
+
+	/* Set the paddr to use pre-allocated firmware buffer */
+	*paddr = __psp_pa(map->host);
+
+done:
+	map->active = true;
+	return 0;
+}
+
+static int unmap_firmware_writeable(u64 *paddr, u32 len, bool guest, struct snp_host_map *map)
+{
+	unsigned int npages = PAGE_ALIGN(len) >> PAGE_SHIFT;
+
+	if (!map->active)
+		return 0;
+
+	/* If paddr points to guest memory then restore the page state to hypervisor. */
+	if (guest) {
+		if (snp_reclaim_pages(*paddr, npages, true))
+			return -EFAULT;
+
+		goto done;
+	}
+
+	/*
+	 * Transition the pre-allocated buffer to hypervisor state before the access.
+	 *
+	 * This is because while changing the page state to firmware, the kernel unmaps
+	 * the pages from the direct map, and to restore the direct map the pages must
+	 * be transitioned back to the shared state.
+	 */
+	if (snp_reclaim_pages(__pa(map->host), npages, true))
+		return -EFAULT;
+
+	/* Copy the response data from the firmware buffer to the caller's buffer. */
+	memcpy(__va(__sme_clr(map->paddr)), map->host, min_t(size_t, len, map->len));
+	*paddr = map->paddr;
+
+done:
+	map->active = false;
+	return 0;
+}
+
+static bool sev_legacy_cmd_buf_writable(int cmd)
+{
+	switch (cmd) {
+	case SEV_CMD_PLATFORM_STATUS:
+	case SEV_CMD_GUEST_STATUS:
+	case SEV_CMD_LAUNCH_START:
+	case SEV_CMD_RECEIVE_START:
+	case SEV_CMD_LAUNCH_MEASURE:
+	case SEV_CMD_SEND_START:
+	case SEV_CMD_SEND_UPDATE_DATA:
+	case SEV_CMD_SEND_UPDATE_VMSA:
+	case SEV_CMD_PEK_CSR:
+	case SEV_CMD_PDH_CERT_EXPORT:
+	case SEV_CMD_GET_ID:
+	case SEV_CMD_ATTESTATION_REPORT:
+		return true;
+	default:
+		return false;
+	}
+}
+
+#define prep_buffer(name, addr, len, guest, map) \
+	func(&((typeof(name *))cmd_buf)->addr, ((typeof(name *))cmd_buf)->len, guest, map)
+
+static int __snp_cmd_buf_copy(int cmd, void *cmd_buf, bool to_fw, int fw_err)
+{
+	int (*func)(u64 *paddr, u32 len, bool guest, struct snp_host_map *map);
+	struct sev_device *sev = psp_master->sev_data;
+	bool from_fw = !to_fw;
+
+	/*
+	 * After the command is completed, change the command buffer memory to
+	 * hypervisor state.
+	 *
+	 * The immutable bit is automatically cleared by the firmware, so
+	 * there is no need to reclaim the page.
+	 */
+	if (from_fw && sev_legacy_cmd_buf_writable(cmd)) {
+		if (rmp_mark_pages_shared(__pa(cmd_buf), 1))
+			return -EFAULT;
+
+		/* No need to go further if firmware failed to execute command. */
+		if (fw_err)
+			return 0;
+	}
+
+	if (to_fw)
+		func = map_firmware_writeable;
+	else
+		func = unmap_firmware_writeable;
+
+	/*
+	 * A command buffer may contain a system physical address. If the address
+	 * points to host memory then use an intermediate firmware page, otherwise
+	 * change the page state in the RMP table.
+	 */
+	switch (cmd) {
+	case SEV_CMD_PDH_CERT_EXPORT:
+		if (prep_buffer(struct sev_data_pdh_cert_export, pdh_cert_address,
+				pdh_cert_len, false, &sev->snp_host_map[0]))
+			goto err;
+		if (prep_buffer(struct sev_data_pdh_cert_export, cert_chain_address,
+				cert_chain_len, false, &sev->snp_host_map[1]))
+			goto err;
+		break;
+	case SEV_CMD_GET_ID:
+		if (prep_buffer(struct sev_data_get_id, address, len,
+				false, &sev->snp_host_map[0]))
+			goto err;
+		break;
+	case SEV_CMD_PEK_CSR:
+		if (prep_buffer(struct sev_data_pek_csr, address, len,
+				false, &sev->snp_host_map[0]))
+			goto err;
+		break;
+	case SEV_CMD_LAUNCH_UPDATE_DATA:
+		if (prep_buffer(struct sev_data_launch_update_data, address, len,
+				true, &sev->snp_host_map[0]))
+			goto err;
+		break;
+	case SEV_CMD_LAUNCH_UPDATE_VMSA:
+		if (prep_buffer(struct sev_data_launch_update_vmsa, address, len,
+				true, &sev->snp_host_map[0]))
+			goto err;
+		break;
+	case SEV_CMD_LAUNCH_MEASURE:
+		if (prep_buffer(struct sev_data_launch_measure, address, len,
+				false, &sev->snp_host_map[0]))
+			goto err;
+		break;
+	case SEV_CMD_LAUNCH_UPDATE_SECRET:
+		if (prep_buffer(struct sev_data_launch_secret, guest_address, guest_len,
+				true, &sev->snp_host_map[0]))
+			goto err;
+		break;
+	case SEV_CMD_DBG_DECRYPT:
+		if (prep_buffer(struct sev_data_dbg, dst_addr, len, false,
+				&sev->snp_host_map[0]))
+			goto err;
+		break;
+	case SEV_CMD_DBG_ENCRYPT:
+		if (prep_buffer(struct sev_data_dbg, dst_addr, len, true,
+				&sev->snp_host_map[0]))
+			goto err;
+		break;
+	case SEV_CMD_ATTESTATION_REPORT:
+		if (prep_buffer(struct sev_data_attestation_report, address, len,
+				false, &sev->snp_host_map[0]))
+			goto err;
+		break;
+	case SEV_CMD_SEND_START:
+		if (prep_buffer(struct sev_data_send_start, session_address,
+				session_len, false, &sev->snp_host_map[0]))
+			goto err;
+		break;
+	case SEV_CMD_SEND_UPDATE_DATA:
+		if (prep_buffer(struct sev_data_send_update_data, hdr_address, hdr_len,
+				false, &sev->snp_host_map[0]))
+			goto err;
+		if (prep_buffer(struct sev_data_send_update_data, trans_address,
+				trans_len, false, &sev->snp_host_map[1]))
+			goto err;
+		break;
+	case SEV_CMD_SEND_UPDATE_VMSA:
+		if (prep_buffer(struct sev_data_send_update_vmsa, hdr_address, hdr_len,
+				false, &sev->snp_host_map[0]))
+			goto err;
+		if (prep_buffer(struct sev_data_send_update_vmsa, trans_address,
+				trans_len, false, &sev->snp_host_map[1]))
+			goto err;
+		break;
+	case SEV_CMD_RECEIVE_UPDATE_DATA:
+		if (prep_buffer(struct sev_data_receive_update_data, guest_address,
+				guest_len, true, &sev->snp_host_map[0]))
+			goto err;
+		break;
+	case SEV_CMD_RECEIVE_UPDATE_VMSA:
+		if (prep_buffer(struct sev_data_receive_update_vmsa, guest_address,
+				guest_len, true, &sev->snp_host_map[0]))
+			goto err;
+		break;
+	default:
+		break;
+	}
+
+	/* The command buffer needs to be in the firmware state. */
+	if (to_fw && sev_legacy_cmd_buf_writable(cmd)) {
+		if (rmp_mark_pages_firmware(__pa(cmd_buf), 1, true))
+			return -EFAULT;
+	}
+
+	return 0;
+
+err:
+	return -EINVAL;
+}
+
+static inline bool need_firmware_copy(int cmd)
+{
+	struct sev_device *sev = psp_master->sev_data;
+
+	/* After SNP is INIT'ed, the behavior of the legacy SEV commands is changed. */
+	return (cmd < SEV_CMD_SNP_INIT) && sev->snp_initialized;
+}
+
+static int snp_aware_copy_to_firmware(int cmd, void *data)
+{
+	return __snp_cmd_buf_copy(cmd, data, true, 0);
+}
+
+static int snp_aware_copy_from_firmware(int cmd, void *data, int fw_err)
+{
+	return __snp_cmd_buf_copy(cmd, data, false, fw_err);
+}
+
 static int __sev_do_cmd_locked(int cmd, void *data, int *psp_ret)
 {
 	struct psp_device *psp = psp_master;
 	struct sev_device *sev;
 	unsigned int phys_lsb, phys_msb;
 	unsigned int reg, ret = 0;
+	void *cmd_buf;
 	int buf_len;
 
 	if (!psp || !psp->sev_data)
@@ -512,12 +820,28 @@ static int __sev_do_cmd_locked(int cmd, void *data, int *psp_ret)
 	 * work for some memory, e.g. vmalloc'd addresses, and @data may not be
 	 * physically contiguous.
 	 */
-	if (data)
-		memcpy(sev->cmd_buf, data, buf_len);
+	if (data) {
+		if (sev->cmd_buf_active > 2)
+			return -EBUSY;
+
+		cmd_buf = sev->cmd_buf_active ? sev->cmd_buf_backup : sev->cmd_buf;
+
+		memcpy(cmd_buf, data, buf_len);
+		sev->cmd_buf_active++;
+
+		/*
+		 * The behavior of the SEV-legacy commands is altered when the
+		 * SNP firmware is in the INIT state.
+		 */
+		if (need_firmware_copy(cmd) && snp_aware_copy_to_firmware(cmd, cmd_buf))
+			return -EFAULT;
+	} else {
+		cmd_buf = sev->cmd_buf;
+	}
 
 	/* Get the physical address of the command buffer */
-	phys_lsb = data ? lower_32_bits(__psp_pa(sev->cmd_buf)) : 0;
-	phys_msb = data ? upper_32_bits(__psp_pa(sev->cmd_buf)) : 0;
+	phys_lsb = data ? lower_32_bits(__psp_pa(cmd_buf)) : 0;
+	phys_msb = data ? upper_32_bits(__psp_pa(cmd_buf)) : 0;
 
 	dev_dbg(sev->dev, "sev command id %#x buffer 0x%08x%08x timeout %us\n",
 		cmd, phys_msb, phys_lsb, psp_timeout);
@@ -560,15 +884,24 @@ static int __sev_do_cmd_locked(int cmd, void *data, int *psp_ret)
 		ret = sev_write_init_ex_file_if_required(cmd);
 	}
 
-	print_hex_dump_debug("(out): ", DUMP_PREFIX_OFFSET, 16, 2, data,
-			     buf_len, false);
-
 	/*
 	 * Copy potential output from the PSP back to data.  Do this even on
 	 * failure in case the caller wants to glean something from the error.
 	 */
-	if (data)
-		memcpy(data, sev->cmd_buf, buf_len);
+	if (data) {
+		/*
+		 * Restore the page state after the command completes.
+		 */
+		if (need_firmware_copy(cmd) &&
+		    snp_aware_copy_from_firmware(cmd, cmd_buf, ret))
+			return -EFAULT;
+
+		memcpy(data, cmd_buf, buf_len);
+		sev->cmd_buf_active--;
+	}
+
+	print_hex_dump_debug("(out): ", DUMP_PREFIX_OFFSET, 16, 2, data,
+			     buf_len, false);
 
 	return ret;
 }
@@ -1579,10 +1912,12 @@ int sev_dev_init(struct psp_device *psp)
 	if (!sev)
 		goto e_err;
 
-	sev->cmd_buf = (void *)devm_get_free_pages(dev, GFP_KERNEL, 0);
+	sev->cmd_buf = (void *)devm_get_free_pages(dev, GFP_KERNEL, 1);
 	if (!sev->cmd_buf)
 		goto e_sev;
 
+	sev->cmd_buf_backup = (uint8_t *)sev->cmd_buf + PAGE_SIZE;
+
 	psp->sev_data = sev;
 
 	sev->dev = dev;
@@ -1648,6 +1983,12 @@ static void sev_firmware_shutdown(struct sev_device *sev)
 		snp_range_list = NULL;
 	}
 
+	/*
+	 * The host map pages need to have their immutable bit cleared, so they must
+	 * be freed before the SNP firmware shutdown.
+	 */
+	free_snp_host_map(sev);
+
 	sev_snp_shutdown(&error);
 }
 
@@ -1722,6 +2063,14 @@ void sev_pci_init(void)
 				dev_err(sev->dev, "SEV-SNP: failed to INIT error %#x\n", error);
 			}
 		}
+
+		/*
+		 * Allocate the intermediate buffers used for the legacy command handling.
+		 */
+		if (alloc_snp_host_map(sev)) {
+			dev_notice(sev->dev, "Failed to alloc host map (disabling legacy SEV)\n");
+			goto skip_legacy;
+		}
 	}
 
 	/* Obtain the TMR memory area for SEV-ES use */
@@ -1739,12 +2088,14 @@ void sev_pci_init(void)
 		dev_err(sev->dev, "SEV: failed to INIT error %#x, rc %d\n",
 			error, rc);
 
+skip_legacy:
 	dev_info(sev->dev, "SEV%s API:%d.%d build:%d\n", sev->snp_initialized ?
 		"-SNP" : "", sev->api_major, sev->api_minor, sev->build);
 
 	return;
 
 err:
+	free_snp_host_map(sev);
 	psp_master->sev_data = NULL;
 }
 
diff --git a/drivers/crypto/ccp/sev-dev.h b/drivers/crypto/ccp/sev-dev.h
index 34767657beb5..19d79f9d4212 100644
--- a/drivers/crypto/ccp/sev-dev.h
+++ b/drivers/crypto/ccp/sev-dev.h
@@ -29,11 +29,20 @@
 #define SEV_CMDRESP_CMD_SHIFT		16
 #define SEV_CMDRESP_IOC			BIT(0)
 
+#define MAX_SNP_HOST_MAP_BUFS		2
+
 struct sev_misc_dev {
 	struct kref refcount;
 	struct miscdevice misc;
 };
 
+struct snp_host_map {
+	u64 paddr;
+	u32 len;
+	void *host;
+	bool active;
+};
+
 struct sev_device {
 	struct device *dev;
 	struct psp_device *psp;
@@ -52,8 +61,11 @@ struct sev_device {
 	u8 build;
 
 	void *cmd_buf;
+	void *cmd_buf_backup;
+	int cmd_buf_active;
 
 	bool snp_initialized;
+	struct snp_host_map snp_host_map[MAX_SNP_HOST_MAP_BUFS];
 };
 
 int sev_dev_init(struct psp_device *psp);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 147+ messages in thread

* [PATCH RFC v8 26/56] crypto: ccp: Add the SNP_PLATFORM_STATUS command
  2023-02-20 18:37 [PATCH RFC v8 00/56] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (24 preceding siblings ...)
  2023-02-20 18:38 ` [PATCH RFC v8 25/56] crypto: ccp: Handle the legacy SEV command " Michael Roth
@ 2023-02-20 18:38 ` Michael Roth
  2023-02-20 18:38 ` [PATCH RFC v8 27/56] crypto: ccp: Add the SNP_{SET,GET}_EXT_CONFIG command Michael Roth
                   ` (31 subsequent siblings)
  57 siblings, 0 replies; 147+ messages in thread
From: Michael Roth @ 2023-02-20 18:38 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	Brijesh Singh

From: Brijesh Singh <brijesh.singh@amd.com>

The command can be used by userspace to query the SNP platform status
report. See the SEV-SNP specification for more details.
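
For reference, a minimal userspace sketch of issuing this command through
the /dev/sev ioctl interface could look roughly like the following. This is
only an illustration (the function name is made up, error handling is
trimmed); the structures and command IDs come from the uapi psp-sev.h
updated by this series:

  #include <fcntl.h>
  #include <string.h>
  #include <unistd.h>
  #include <sys/ioctl.h>
  #include <linux/psp-sev.h>

  static int snp_platform_status(struct sev_user_data_snp_status *status)
  {
          struct sev_issue_cmd cmd;
          int fd, ret;

          fd = open("/dev/sev", O_RDWR);
          if (fd < 0)
                  return -1;

          memset(&cmd, 0, sizeof(cmd));
          cmd.cmd = SNP_PLATFORM_STATUS;
          cmd.data = (unsigned long)status;

          /* On failure, cmd.error carries the firmware error code. */
          ret = ioctl(fd, SEV_ISSUE_CMD, &cmd);
          close(fd);
          return ret;
  }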

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 Documentation/virt/coco/sev-guest.rst | 27 ++++++++++++++++
 drivers/crypto/ccp/sev-dev.c          | 45 +++++++++++++++++++++++++++
 include/uapi/linux/psp-sev.h          |  1 +
 3 files changed, 73 insertions(+)

diff --git a/Documentation/virt/coco/sev-guest.rst b/Documentation/virt/coco/sev-guest.rst
index bf593e88cfd9..11ea67c944df 100644
--- a/Documentation/virt/coco/sev-guest.rst
+++ b/Documentation/virt/coco/sev-guest.rst
@@ -61,6 +61,22 @@ counter (e.g. counter overflow), then -EIO will be returned.
                 __u64 fw_err;
         };
 
+The host ioctls should be issued on the /dev/sev device. The ioctl accepts a
+command ID and a command input structure.
+
+::
+        struct sev_issue_cmd {
+                /* Command ID */
+                __u32 cmd;
+
+                /* Command request structure */
+                __u64 data;
+
+                /* firmware error code on failure (see psp-sev.h) */
+                __u32 error;
+        };
+
+
 2.1 SNP_GET_REPORT
 ------------------
 
@@ -118,6 +134,17 @@ be updated with the expected value.
 
 See GHCB specification for further detail on how to parse the certificate blob.
 
+2.4 SNP_PLATFORM_STATUS
+-----------------------
+:Technology: sev-snp
+:Type: hypervisor ioctl cmd
+:Parameters (in): struct sev_data_snp_platform_status
+:Returns (out): 0 on success, -negative on error
+
+The SNP_PLATFORM_STATUS command is used to query the SNP platform status. The
+status includes the API major and minor versions and more. See the SEV-SNP
+specification for further details.
+
 3. SEV-SNP CPUID Enforcement
 ============================
 
diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index fd8893af6ed7..65e13a562f3b 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -1751,6 +1751,48 @@ static int sev_ioctl_do_pdh_export(struct sev_issue_cmd *argp, bool writable)
 	return ret;
 }
 
+static int sev_ioctl_snp_platform_status(struct sev_issue_cmd *argp)
+{
+	struct sev_device *sev = psp_master->sev_data;
+	struct sev_data_snp_addr buf;
+	struct page *status_page;
+	void *data;
+	int ret;
+
+	if (!sev->snp_initialized || !argp->data)
+		return -EINVAL;
+
+	status_page = alloc_page(GFP_KERNEL_ACCOUNT);
+	if (!status_page)
+		return -ENOMEM;
+
+	data = page_address(status_page);
+	if (rmp_mark_pages_firmware(__pa(data), 1, true)) {
+		__free_pages(status_page, 0);
+		return -EFAULT;
+	}
+
+	buf.gctx_paddr = __psp_pa(data);
+	ret = __sev_do_cmd_locked(SEV_CMD_SNP_PLATFORM_STATUS, &buf, &argp->error);
+
+	/* Change the page state before accessing it */
+	if (snp_reclaim_pages(__pa(data), 1, true)) {
+		snp_mark_pages_offline(__pa(data) >> PAGE_SHIFT, 1);
+		return -EFAULT;
+	}
+
+	if (ret)
+		goto cleanup;
+
+	if (copy_to_user((void __user *)argp->data, data,
+			 sizeof(struct sev_user_data_snp_status)))
+		ret = -EFAULT;
+
+cleanup:
+	__free_pages(status_page, 0);
+	return ret;
+}
+
 static long sev_ioctl(struct file *file, unsigned int ioctl, unsigned long arg)
 {
 	void __user *argp = (void __user *)arg;
@@ -1802,6 +1844,9 @@ static long sev_ioctl(struct file *file, unsigned int ioctl, unsigned long arg)
 	case SEV_GET_ID2:
 		ret = sev_ioctl_do_get_id2(&input);
 		break;
+	case SNP_PLATFORM_STATUS:
+		ret = sev_ioctl_snp_platform_status(&input);
+		break;
 	default:
 		ret = -EINVAL;
 		goto out;
diff --git a/include/uapi/linux/psp-sev.h b/include/uapi/linux/psp-sev.h
index c66f7c372645..5adfaea7df97 100644
--- a/include/uapi/linux/psp-sev.h
+++ b/include/uapi/linux/psp-sev.h
@@ -28,6 +28,7 @@ enum {
 	SEV_PEK_CERT_IMPORT,
 	SEV_GET_ID,	/* This command is deprecated, use SEV_GET_ID2 */
 	SEV_GET_ID2,
+	SNP_PLATFORM_STATUS,
 
 	SEV_MAX,
 };
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 147+ messages in thread

* [PATCH RFC v8 27/56] crypto: ccp: Add the SNP_{SET,GET}_EXT_CONFIG command
  2023-02-20 18:37 [PATCH RFC v8 00/56] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (25 preceding siblings ...)
  2023-02-20 18:38 ` [PATCH RFC v8 26/56] crypto: ccp: Add the SNP_PLATFORM_STATUS command Michael Roth
@ 2023-02-20 18:38 ` Michael Roth
  2023-02-22 12:32   ` Zhi Wang
  2023-02-20 18:38 ` [PATCH RFC v8 28/56] crypto: ccp: Provide APIs to query extended attestation report Michael Roth
                   ` (30 subsequent siblings)
  57 siblings, 1 reply; 147+ messages in thread
From: Michael Roth @ 2023-02-20 18:38 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	Brijesh Singh

From: Brijesh Singh <brijesh.singh@amd.com>

The SEV-SNP firmware provides the SNP_CONFIG command used to set the
system-wide configuration value for SNP guests. The information includes
the TCB version string to be reported in guest attestation reports.

Version 2 of the GHCB specification adds an NAE event (SNP extended guest
request) that a guest can use to query reports that include additional
certificates.

In both cases, userspace-provided additional data is included in the
attestation reports. Userspace will use the SNP_SET_EXT_CONFIG command to
provide the certificate blob and the reported TCB version string at once.
Note that the specification defines the certificate blob with a specific
GUID format; userspace is responsible for building the proper certificate
blob. The ioctl treats it as an opaque blob.

While it is not defined in the spec, also add an SNP_GET_EXT_CONFIG command
that can be used to obtain the data programmed through the
SNP_SET_EXT_CONFIG command.
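
As a rough illustration (not part of this patch), userspace could drive
SNP_SET_EXT_CONFIG along the following lines. The helper name is made up,
sev_fd is assumed to be an open /dev/sev descriptor, and the certificate
blob is assumed to already be in the GUID format required by the GHCB spec
with a page-aligned length:

  static int snp_set_ext_config(int sev_fd, void *certs, __u32 certs_len,
                                struct sev_user_data_snp_config *config)
  {
          struct sev_user_data_ext_snp_config ext = {
                  .config_address = (unsigned long)config, /* or 0 to skip */
                  .certs_address  = (unsigned long)certs,  /* or 0 to clear */
                  .certs_len      = certs_len,             /* multiple of 4K */
          };
          struct sev_issue_cmd cmd = {
                  .cmd  = SNP_SET_EXT_CONFIG,
                  .data = (unsigned long)&ext,
          };

          /* On failure, cmd.error carries the firmware error code. */
          return ioctl(sev_fd, SEV_ISSUE_CMD, &cmd);
  }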

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 Documentation/virt/coco/sev-guest.rst |  27 ++++++
 drivers/crypto/ccp/sev-dev.c          | 123 ++++++++++++++++++++++++++
 drivers/crypto/ccp/sev-dev.h          |   4 +
 include/uapi/linux/psp-sev.h          |  17 ++++
 4 files changed, 171 insertions(+)

diff --git a/Documentation/virt/coco/sev-guest.rst b/Documentation/virt/coco/sev-guest.rst
index 11ea67c944df..6cad4226c348 100644
--- a/Documentation/virt/coco/sev-guest.rst
+++ b/Documentation/virt/coco/sev-guest.rst
@@ -145,6 +145,33 @@ The SNP_PLATFORM_STATUS command is used to query the SNP platform status. The
 status includes API major, minor version and more. See the SEV-SNP
 specification for further details.
 
+2.5 SNP_SET_EXT_CONFIG
+----------------------
+:Technology: sev-snp
+:Type: hypervisor ioctl cmd
+:Parameters (in): struct sev_data_snp_ext_config
+:Returns (out): 0 on success, -negative on error
+
+The SNP_SET_EXT_CONFIG is used to set the system-wide configuration, such as
+the reported TCB version in the attestation report. The command is similar to
+the SNP_CONFIG command defined in the SEV-SNP spec. The main difference is that
+this command also accepts an additional certificate blob as defined in the GHCB
+specification.
+
+If the certs_address is zero, then the previous certificate blob will be deleted.
+For more information on the certificate blob layout, see the GHCB spec
+(extended guest request message).
+
+2.6 SNP_GET_EXT_CONFIG
+----------------------
+:Technology: sev-snp
+:Type: hypervisor ioctl cmd
+:Parameters (in): struct sev_data_snp_ext_config
+:Returns (out): 0 on success, -negative on error
+
+The SNP_GET_EXT_CONFIG is used to query the system-wide configuration set
+through the SNP_SET_EXT_CONFIG.
+
 3. SEV-SNP CPUID Enforcement
 ============================
 
diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index 65e13a562f3b..b56b00ca2cd4 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -1481,6 +1481,10 @@ static int __sev_snp_shutdown_locked(int *error)
 	data.length = sizeof(data);
 	data.iommu_snp_shutdown = 1;
 
+	/* Free the memory used for caching the certificate data */
+	kfree(sev->snp_certs_data);
+	sev->snp_certs_data = NULL;
+
 	wbinvd_on_all_cpus();
 
 retry:
@@ -1793,6 +1797,118 @@ static int sev_ioctl_snp_platform_status(struct sev_issue_cmd *argp)
 	return ret;
 }
 
+static int sev_ioctl_snp_get_config(struct sev_issue_cmd *argp)
+{
+	struct sev_device *sev = psp_master->sev_data;
+	struct sev_user_data_ext_snp_config input;
+	int ret;
+
+	if (!sev->snp_initialized || !argp->data)
+		return -EINVAL;
+
+	memset(&input, 0, sizeof(input));
+
+	if (copy_from_user(&input, (void __user *)argp->data, sizeof(input)))
+		return -EFAULT;
+
+	/* Copy the TCB version programmed through SNP_SET_EXT_CONFIG to userspace */
+	if (input.config_address) {
+		if (copy_to_user((void __user *)input.config_address,
+				 &sev->snp_config, sizeof(struct sev_user_data_snp_config)))
+			return -EFAULT;
+	}
+
+	/* Copy the extended certs programmed through SNP_SET_EXT_CONFIG */
+	if (input.certs_address && sev->snp_certs_data) {
+		if (input.certs_len < sev->snp_certs_len) {
+			/* Return the certs length to userspace */
+			input.certs_len = sev->snp_certs_len;
+
+			ret = -ENOSR;
+			goto e_done;
+		}
+
+		if (copy_to_user((void __user *)input.certs_address,
+				 sev->snp_certs_data, sev->snp_certs_len))
+			return -EFAULT;
+	}
+
+	ret = 0;
+
+e_done:
+	if (copy_to_user((void __user *)argp->data, &input, sizeof(input)))
+		ret = -EFAULT;
+
+	return ret;
+}
+
+static int sev_ioctl_snp_set_config(struct sev_issue_cmd *argp, bool writable)
+{
+	struct sev_device *sev = psp_master->sev_data;
+	struct sev_user_data_ext_snp_config input;
+	struct sev_user_data_snp_config config;
+	void *certs = NULL;
+	int ret = 0;
+
+	if (!sev->snp_initialized || !argp->data)
+		return -EINVAL;
+
+	if (!writable)
+		return -EPERM;
+
+	memset(&input, 0, sizeof(input));
+
+	if (copy_from_user(&input, (void __user *)argp->data, sizeof(input)))
+		return -EFAULT;
+
+	/* Copy the certs from userspace */
+	if (input.certs_address) {
+		if (!input.certs_len || !IS_ALIGNED(input.certs_len, PAGE_SIZE))
+			return -EINVAL;
+
+		certs = psp_copy_user_blob(input.certs_address, input.certs_len);
+		if (IS_ERR(certs))
+			return PTR_ERR(certs);
+	}
+
+	/* Issue the PSP command to update the TCB version using the SNP_CONFIG. */
+	if (input.config_address) {
+		memset(&config, 0, sizeof(config));
+		if (copy_from_user(&config,
+				   (void __user *)input.config_address, sizeof(config))) {
+			ret = -EFAULT;
+			goto e_free;
+		}
+
+		ret = __sev_do_cmd_locked(SEV_CMD_SNP_CONFIG, &config, &argp->error);
+		if (ret)
+			goto e_free;
+
+		memcpy(&sev->snp_config, &config, sizeof(config));
+	}
+
+	/*
+	 * If new certs are provided then cache them, otherwise free the old certs.
+	 */
+	mutex_lock(&sev->snp_certs_lock);
+	if (certs) {
+		kfree(sev->snp_certs_data);
+		sev->snp_certs_data = certs;
+		sev->snp_certs_len = input.certs_len;
+	} else {
+		kfree(sev->snp_certs_data);
+		sev->snp_certs_data = NULL;
+		sev->snp_certs_len = 0;
+	}
+	mutex_unlock(&sev->snp_certs_lock);
+
+	return 0;
+
+e_free:
+	kfree(certs);
+	return ret;
+}
+
 static long sev_ioctl(struct file *file, unsigned int ioctl, unsigned long arg)
 {
 	void __user *argp = (void __user *)arg;
@@ -1847,6 +1963,12 @@ static long sev_ioctl(struct file *file, unsigned int ioctl, unsigned long arg)
 	case SNP_PLATFORM_STATUS:
 		ret = sev_ioctl_snp_platform_status(&input);
 		break;
+	case SNP_SET_EXT_CONFIG:
+		ret = sev_ioctl_snp_set_config(&input, writable);
+		break;
+	case SNP_GET_EXT_CONFIG:
+		ret = sev_ioctl_snp_get_config(&input);
+		break;
 	default:
 		ret = -EINVAL;
 		goto out;
@@ -1962,6 +2084,7 @@ int sev_dev_init(struct psp_device *psp)
 		goto e_sev;
 
 	sev->cmd_buf_backup = (uint8_t *)sev->cmd_buf + PAGE_SIZE;
+	mutex_init(&sev->snp_certs_lock);
 
 	psp->sev_data = sev;
 
diff --git a/drivers/crypto/ccp/sev-dev.h b/drivers/crypto/ccp/sev-dev.h
index 19d79f9d4212..41d5353d5bab 100644
--- a/drivers/crypto/ccp/sev-dev.h
+++ b/drivers/crypto/ccp/sev-dev.h
@@ -66,6 +66,10 @@ struct sev_device {
 
 	bool snp_initialized;
 	struct snp_host_map snp_host_map[MAX_SNP_HOST_MAP_BUFS];
+	void *snp_certs_data;
+	u32 snp_certs_len;
+	struct mutex snp_certs_lock;
+	struct sev_user_data_snp_config snp_config;
 };
 
 int sev_dev_init(struct psp_device *psp);
diff --git a/include/uapi/linux/psp-sev.h b/include/uapi/linux/psp-sev.h
index 5adfaea7df97..c20d37586d21 100644
--- a/include/uapi/linux/psp-sev.h
+++ b/include/uapi/linux/psp-sev.h
@@ -29,6 +29,8 @@ enum {
 	SEV_GET_ID,	/* This command is deprecated, use SEV_GET_ID2 */
 	SEV_GET_ID2,
 	SNP_PLATFORM_STATUS,
+	SNP_SET_EXT_CONFIG,
+	SNP_GET_EXT_CONFIG,
 
 	SEV_MAX,
 };
@@ -192,6 +194,21 @@ struct sev_user_data_snp_config {
 	__u8 rsvd1[52];
 } __packed;
 
+/**
+ * struct sev_data_snp_ext_config - system wide configuration value for SNP.
+ *
+ * @config_address: address of the struct sev_user_data_snp_config or 0 when
+ *		reported_tcb does not need to be updated.
+ * @certs_address: address of extended guest request certificate chain or
+ *              0 when previous certificate should be removed on SNP_SET_EXT_CONFIG.
+ * @certs_len: length of the certs
+ */
+struct sev_user_data_ext_snp_config {
+	__u64 config_address;		/* In */
+	__u64 certs_address;		/* In */
+	__u32 certs_len;		/* In */
+};
+
 /**
  * struct sev_issue_cmd - SEV ioctl parameters
  *
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 147+ messages in thread

* [PATCH RFC v8 28/56] crypto: ccp: Provide APIs to query extended attestation report
  2023-02-20 18:37 [PATCH RFC v8 00/56] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (26 preceding siblings ...)
  2023-02-20 18:38 ` [PATCH RFC v8 27/56] crypto: ccp: Add the SNP_{SET,GET}_EXT_CONFIG command Michael Roth
@ 2023-02-20 18:38 ` Michael Roth
  2023-02-22 20:24   ` Zhi Wang
  2023-02-20 18:38 ` [PATCH RFC v8 29/56] KVM: SVM: Add support to handle AP reset MSR protocol Michael Roth
                   ` (29 subsequent siblings)
  57 siblings, 1 reply; 147+ messages in thread
From: Michael Roth @ 2023-02-20 18:38 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	Brijesh Singh

From: Brijesh Singh <brijesh.singh@amd.com>

Version 2 of the GHCB specification defines a VMGEXIT NAE event that is used
to get the extended attestation report. The extended attestation report
includes the certificate blob provided through SNP_SET_EXT_CONFIG.

The snp_guest_ext_guest_request() will be used by the hypervisor to get
the extended attestation report. See the GHCB specification for more
details.
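
A rough sketch of how a hypervisor-side caller might use this helper, based
on the semantics described above (the function name and buffer setup are
illustrative only; req is assumed to have been populated for the guest
request, and certs_va/npages describe a scratch buffer for the certificates):

  static int fetch_extended_report(struct sev_data_snp_guest_request *req,
                                   void *certs_va, unsigned long npages)
  {
          unsigned long fw_err = 0;
          int rc;

          rc = snp_guest_ext_guest_request(req, (unsigned long)certs_va,
                                           &npages, &fw_err);
          if (rc && fw_err == SNP_GUEST_REQ_INVALID_LEN) {
                  /*
                   * Buffer too small for the certificate chain: npages now
                   * holds the required page count, which can be reported
                   * back to the guest so it retries with a larger buffer.
                   */
          }

          return rc;
  }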

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 drivers/crypto/ccp/sev-dev.c | 47 ++++++++++++++++++++++++++++++++++++
 include/linux/psp-sev.h      | 33 +++++++++++++++++++++++++
 2 files changed, 80 insertions(+)

diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index b56b00ca2cd4..e65563bc8298 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -2017,6 +2017,53 @@ int sev_guest_df_flush(int *error)
 }
 EXPORT_SYMBOL_GPL(sev_guest_df_flush);
 
+int snp_guest_ext_guest_request(struct sev_data_snp_guest_request *data,
+				unsigned long vaddr, unsigned long *npages, unsigned long *fw_err)
+{
+	unsigned long expected_npages;
+	struct sev_device *sev;
+	int rc;
+
+	if (!psp_master || !psp_master->sev_data)
+		return -ENODEV;
+
+	sev = psp_master->sev_data;
+
+	if (!sev->snp_initialized)
+		return -EINVAL;
+
+	mutex_lock(&sev->snp_certs_lock);
+	/*
+	 * Check if there is enough space to copy the certificate chain. Otherwise
+	 * return the error code defined in the GHCB specification.
+	 */
+	expected_npages = sev->snp_certs_len >> PAGE_SHIFT;
+	if (*npages < expected_npages) {
+		*npages = expected_npages;
+		*fw_err = SNP_GUEST_REQ_INVALID_LEN;
+		mutex_unlock(&sev->snp_certs_lock);
+		return -EINVAL;
+	}
+
+	rc = sev_do_cmd(SEV_CMD_SNP_GUEST_REQUEST, data, (int *)fw_err);
+	if (rc) {
+		mutex_unlock(&sev->snp_certs_lock);
+		return rc;
+	}
+
+	/* Copy the certificate blob */
+	if (sev->snp_certs_data) {
+		*npages = expected_npages;
+		memcpy((void *)vaddr, sev->snp_certs_data, *npages << PAGE_SHIFT);
+	} else {
+		*npages = 0;
+	}
+
+	mutex_unlock(&sev->snp_certs_lock);
+	return rc;
+}
+EXPORT_SYMBOL_GPL(snp_guest_ext_guest_request);
+
 static void sev_exit(struct kref *ref)
 {
 	misc_deregister(&misc_dev->misc);
diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h
index d19744807471..81bafc049eca 100644
--- a/include/linux/psp-sev.h
+++ b/include/linux/psp-sev.h
@@ -931,6 +931,32 @@ void snp_free_firmware_page(void *addr);
  */
 void snp_mark_pages_offline(unsigned long pfn, unsigned int npages);
 
+/**
+ * snp_guest_ext_guest_request - perform the SNP extended guest request command
+ *  defined in the GHCB specification.
+ *
+ * @data: the input guest request structure
+ * @vaddr: address where the certificate blob needs to be copied.
+ * @npages: number of pages for the certificate blob.
+ *    If the specified page count is less than the certificate blob size, then the
+ *    required page count is returned with the error code defined in the GHCB spec.
+ *    If the specified page count is more than the certificate blob size, then the
+ *    page count is updated to reflect the amount of valid data copied to
+ *    vaddr.
+ *
+ * @error: SEV command return code
+ *
+ * Returns:
+ * 0 if the SEV firmware successfully processed the command
+ * -%ENODEV    if the SEV device is not available
+ * -%ENOTSUPP  if the SEV device does not support SNP
+ * -%ETIMEDOUT if the SEV command timed out
+ * -%EIO       if the SEV firmware returned a non-zero return code
+ */
+int snp_guest_ext_guest_request(struct sev_data_snp_guest_request *data,
+				unsigned long vaddr, unsigned long *npages,
+				unsigned long *error);
+
 #else	/* !CONFIG_CRYPTO_DEV_SP_PSP */
 
 static inline int
@@ -968,6 +994,13 @@ static inline void *snp_alloc_firmware_page(gfp_t mask)
 
 static inline void snp_free_firmware_page(void *addr) { }
 
+static inline int snp_guest_ext_guest_request(struct sev_data_snp_guest_request *data,
+					      unsigned long vaddr, unsigned long *n,
+					      unsigned long *error)
+{
+	return -ENODEV;
+}
+
 #endif	/* CONFIG_CRYPTO_DEV_SP_PSP */
 
 #endif	/* __PSP_SEV_H__ */
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 147+ messages in thread

* [PATCH RFC v8 29/56] KVM: SVM: Add support to handle AP reset MSR protocol
  2023-02-20 18:37 [PATCH RFC v8 00/56] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (27 preceding siblings ...)
  2023-02-20 18:38 ` [PATCH RFC v8 28/56] crypto: ccp: Provide APIs to query extended attestation report Michael Roth
@ 2023-02-20 18:38 ` Michael Roth
  2023-02-20 18:38 ` [PATCH RFC v8 30/56] KVM: SVM: Provide the Hypervisor Feature support VMGEXIT Michael Roth
                   ` (28 subsequent siblings)
  57 siblings, 0 replies; 147+ messages in thread
From: Michael Roth @ 2023-02-20 18:38 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	Brijesh Singh

From: Tom Lendacky <thomas.lendacky@amd.com>

Add support for AP Reset Hold being invoked using the GHCB MSR protocol,
available in version 2 of the GHCB specification.
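
For context, the guest side of this exchange boils down to the following
sketch. wr_ghcb_msr()/rd_ghcb_msr() are placeholders for the guest's GHCB
MSR accessors; the constants are the ones used in this patch:

  static bool ap_reset_hold_msr_proto(void)
  {
          u64 val;

          /* GHCBData[11:0] = 0x006: AP Reset Hold Request */
          wr_ghcb_msr(GHCB_MSR_AP_RESET_HOLD_REQ);
          VMGEXIT();

          val = rd_ghcb_msr();

          /* Expect GHCBData[11:0] = 0x007: AP Reset Hold Response */
          if ((val & GHCB_MSR_INFO_MASK) != GHCB_MSR_AP_RESET_HOLD_RESP)
                  return false;

          /* A non-zero GHCBData[63:12] means a SIPI arrived while parked */
          return (val >> GHCB_MSR_AP_RESET_HOLD_RESULT_POS) != 0;
  }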

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 arch/x86/include/asm/sev-common.h |  2 ++
 arch/x86/kvm/svm/sev.c            | 56 ++++++++++++++++++++++++++-----
 arch/x86/kvm/svm/svm.h            |  1 +
 3 files changed, 51 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
index b8357d6ecd47..e15548d88f2a 100644
--- a/arch/x86/include/asm/sev-common.h
+++ b/arch/x86/include/asm/sev-common.h
@@ -56,6 +56,8 @@
 /* AP Reset Hold */
 #define GHCB_MSR_AP_RESET_HOLD_REQ	0x006
 #define GHCB_MSR_AP_RESET_HOLD_RESP	0x007
+#define GHCB_MSR_AP_RESET_HOLD_RESULT_POS	12
+#define GHCB_MSR_AP_RESET_HOLD_RESULT_MASK	GENMASK_ULL(51, 0)
 
 /* GHCB GPA Register */
 #define GHCB_MSR_REG_GPA_REQ		0x012
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index ad9b29ff4590..05eda0940e22 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -58,6 +58,10 @@ module_param_named(sev_es, sev_es_enabled, bool, 0444);
 #define sev_es_enabled false
 #endif /* CONFIG_KVM_AMD_SEV */
 
+#define AP_RESET_HOLD_NONE		0
+#define AP_RESET_HOLD_NAE_EVENT		1
+#define AP_RESET_HOLD_MSR_PROTO		2
+
 static u8 sev_enc_bit;
 static DECLARE_RWSEM(sev_deactivate_lock);
 static DEFINE_MUTEX(sev_bitmap_lock);
@@ -2706,6 +2710,9 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
 
 void sev_es_unmap_ghcb(struct vcpu_svm *svm)
 {
+	/* Clear any indication that the vCPU is in a type of AP Reset Hold */
+	svm->sev_es.ap_reset_hold_type = AP_RESET_HOLD_NONE;
+
 	if (!svm->sev_es.ghcb)
 		return;
 
@@ -2918,6 +2925,22 @@ static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
 				  GHCB_MSR_INFO_POS);
 		break;
 	}
+	case GHCB_MSR_AP_RESET_HOLD_REQ:
+		svm->sev_es.ap_reset_hold_type = AP_RESET_HOLD_MSR_PROTO;
+		ret = kvm_emulate_ap_reset_hold(&svm->vcpu);
+
+		/*
+		 * Preset the result to a non-SIPI return and then only set
+		 * the result to non-zero when delivering a SIPI.
+		 */
+		set_ghcb_msr_bits(svm, 0,
+				  GHCB_MSR_AP_RESET_HOLD_RESULT_MASK,
+				  GHCB_MSR_AP_RESET_HOLD_RESULT_POS);
+
+		set_ghcb_msr_bits(svm, GHCB_MSR_AP_RESET_HOLD_RESP,
+				  GHCB_MSR_INFO_MASK,
+				  GHCB_MSR_INFO_POS);
+		break;
 	case GHCB_MSR_TERM_REQ: {
 		u64 reason_set, reason_code;
 
@@ -3017,6 +3040,7 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
 		ret = svm_invoke_exit_handler(vcpu, SVM_EXIT_IRET);
 		break;
 	case SVM_VMGEXIT_AP_HLT_LOOP:
+		svm->sev_es.ap_reset_hold_type = AP_RESET_HOLD_NAE_EVENT;
 		ret = kvm_emulate_ap_reset_hold(vcpu);
 		break;
 	case SVM_VMGEXIT_AP_JUMP_TABLE: {
@@ -3177,13 +3201,29 @@ void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector)
 		return;
 	}
 
-	/*
-	 * Subsequent SIPI: Return from an AP Reset Hold VMGEXIT, where
-	 * the guest will set the CS and RIP. Set SW_EXIT_INFO_2 to a
-	 * non-zero value.
-	 */
-	if (!svm->sev_es.ghcb)
-		return;
+	/* Subsequent SIPI */
+	switch (svm->sev_es.ap_reset_hold_type) {
+	case AP_RESET_HOLD_NAE_EVENT:
+		/*
+		 * Return from an AP Reset Hold VMGEXIT, where the guest will
+		 * set the CS and RIP. Set SW_EXIT_INFO_2 to a non-zero value.
+		 */
+		ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, 1);
+		break;
+	case AP_RESET_HOLD_MSR_PROTO:
+		/*
+		 * Return from an AP Reset Hold VMGEXIT, where the guest will
+		 * set the CS and RIP. Set GHCB data field to a non-zero value.
+		 */
+		set_ghcb_msr_bits(svm, 1,
+				  GHCB_MSR_AP_RESET_HOLD_RESULT_MASK,
+				  GHCB_MSR_AP_RESET_HOLD_RESULT_POS);
 
-	ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, 1);
+		set_ghcb_msr_bits(svm, GHCB_MSR_AP_RESET_HOLD_RESP,
+				  GHCB_MSR_INFO_MASK,
+				  GHCB_MSR_INFO_POS);
+		break;
+	default:
+		break;
+	}
 }
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 56e306a1f0c7..f4848e6aba28 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -191,6 +191,7 @@ struct vcpu_sev_es_state {
 	struct ghcb *ghcb;
 	struct kvm_host_map ghcb_map;
 	bool received_first_sipi;
+	unsigned int ap_reset_hold_type;
 
 	/* SEV-ES scratch area support */
 	void *ghcb_sa;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 147+ messages in thread

* [PATCH RFC v8 30/56] KVM: SVM: Provide the Hypervisor Feature support VMGEXIT
  2023-02-20 18:37 [PATCH RFC v8 00/56] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (28 preceding siblings ...)
  2023-02-20 18:38 ` [PATCH RFC v8 29/56] KVM: SVM: Add support to handle AP reset MSR protocol Michael Roth
@ 2023-02-20 18:38 ` Michael Roth
  2023-02-20 18:38 ` [PATCH RFC v8 31/56] KVM: SVM: Make AVIC backing, VMSA and VMCB memory allocation SNP safe Michael Roth
                   ` (27 subsequent siblings)
  57 siblings, 0 replies; 147+ messages in thread
From: Michael Roth @ 2023-02-20 18:38 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	Brijesh Singh

From: Brijesh Singh <brijesh.singh@amd.com>

Version 2 of the GHCB specification introduced advertisement of features
that are supported by the Hypervisor.

Now that KVM supports version 2 of the GHCB specification, bump the
maximum supported protocol version.
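
For reference, the guest-side decode of this request is a short sketch along
the same lines as the other MSR-protocol requests (wr_ghcb_msr()/rd_ghcb_msr()
are again placeholders for the guest's GHCB MSR accessors):

  static u64 query_hv_features(void)
  {
          u64 val;

          wr_ghcb_msr(GHCB_MSR_HV_FT_REQ);
          VMGEXIT();

          val = rd_ghcb_msr();
          if ((val & GHCB_MSR_INFO_MASK) != GHCB_MSR_HV_FT_RESP)
                  return 0;

          /* GHCBData[63:12] holds the hypervisor feature bitmap */
          return GHCB_MSR_HV_FT_RESP_VAL(val);
  }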

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 arch/x86/include/asm/sev-common.h |  2 ++
 arch/x86/kvm/svm/sev.c            | 14 ++++++++++++++
 arch/x86/kvm/svm/svm.h            |  3 ++-
 3 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
index e15548d88f2a..539de6b93420 100644
--- a/arch/x86/include/asm/sev-common.h
+++ b/arch/x86/include/asm/sev-common.h
@@ -101,6 +101,8 @@ enum psc_op {
 /* GHCB Hypervisor Feature Request/Response */
 #define GHCB_MSR_HV_FT_REQ		0x080
 #define GHCB_MSR_HV_FT_RESP		0x081
+#define GHCB_MSR_HV_FT_POS		12
+#define GHCB_MSR_HV_FT_MASK		GENMASK_ULL(51, 0)
 #define GHCB_MSR_HV_FT_RESP_VAL(v)			\
 	/* GHCBData[63:12] */				\
 	(((u64)(v) & GENMASK_ULL(63, 12)) >> 12)
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 05eda0940e22..c1f0d4898ce3 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -2675,6 +2675,7 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
 	case SVM_VMGEXIT_AP_HLT_LOOP:
 	case SVM_VMGEXIT_AP_JUMP_TABLE:
 	case SVM_VMGEXIT_UNSUPPORTED_EVENT:
+	case SVM_VMGEXIT_HV_FEATURES:
 		break;
 	default:
 		reason = GHCB_ERR_INVALID_EVENT;
@@ -2941,6 +2942,13 @@ static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
 				  GHCB_MSR_INFO_MASK,
 				  GHCB_MSR_INFO_POS);
 		break;
+	case GHCB_MSR_HV_FT_REQ: {
+		set_ghcb_msr_bits(svm, GHCB_HV_FT_SUPPORTED,
+				  GHCB_MSR_HV_FT_MASK, GHCB_MSR_HV_FT_POS);
+		set_ghcb_msr_bits(svm, GHCB_MSR_HV_FT_RESP,
+				  GHCB_MSR_INFO_MASK, GHCB_MSR_INFO_POS);
+		break;
+	}
 	case GHCB_MSR_TERM_REQ: {
 		u64 reason_set, reason_code;
 
@@ -3065,6 +3073,12 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
 		ret = 1;
 		break;
 	}
+	case SVM_VMGEXIT_HV_FEATURES: {
+		ghcb_set_sw_exit_info_2(ghcb, GHCB_HV_FT_SUPPORTED);
+
+		ret = 1;
+		break;
+	}
 	case SVM_VMGEXIT_UNSUPPORTED_EVENT:
 		vcpu_unimpl(vcpu,
 			    "vmgexit: unsupported event - exit_info_1=%#llx, exit_info_2=%#llx\n",
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index f4848e6aba28..c249c360fe36 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -662,9 +662,10 @@ void avic_refresh_virtual_apic_mode(struct kvm_vcpu *vcpu);
 
 /* sev.c */
 
-#define GHCB_VERSION_MAX	1ULL
+#define GHCB_VERSION_MAX	2ULL
 #define GHCB_VERSION_MIN	1ULL
 
+#define GHCB_HV_FT_SUPPORTED	0
 
 extern unsigned int max_sev_asid;
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 147+ messages in thread

* [PATCH RFC v8 31/56] KVM: SVM: Make AVIC backing, VMSA and VMCB memory allocation SNP safe
  2023-02-20 18:37 [PATCH RFC v8 00/56] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (29 preceding siblings ...)
  2023-02-20 18:38 ` [PATCH RFC v8 30/56] KVM: SVM: Provide the Hypervisor Feature support VMGEXIT Michael Roth
@ 2023-02-20 18:38 ` Michael Roth
  2023-02-22 20:42   ` Zhi Wang
  2023-02-20 18:38 ` [PATCH RFC v8 32/56] KVM: SVM: Add initial SEV-SNP support Michael Roth
                   ` (26 subsequent siblings)
  57 siblings, 1 reply; 147+ messages in thread
From: Michael Roth @ 2023-02-20 18:38 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	Brijesh Singh

From: Brijesh Singh <brijesh.singh@amd.com>

Implement a workaround for an SNP erratum where the CPU will incorrectly
signal an RMP violation #PF if a hugepage (2mb or 1gb) collides with the
RMP entry of a VMCB, VMSA or AVIC backing page.

When SEV-SNP is globally enabled, the CPU marks the VMCB, VMSA, and AVIC
backing pages as "in-use" in the RMP after a successful VMRUN. This
is done for _all_ VMs, not just SNP-Active VMs.

If the hypervisor accesses an in-use page through a writable
translation, the CPU will throw an RMP violation #PF. On early SNP
hardware, if an in-use page is 2mb aligned and software accesses any
part of the associated 2mb region with a hugepage, the CPU will
incorrectly treat the entire 2mb region as in-use and signal a spurious
RMP violation #PF.

The recommended workaround is to not use a hugepage for the VMCB, VMSA or
AVIC backing page. Add a generic allocator that will ensure that the
page returned is not part of a hugepage (2mb or 1gb) and is safe to use when
SEV-SNP is enabled.

Co-developed-by: Marc Orr <marcorr@google.com>
Signed-off-by: Marc Orr <marcorr@google.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 arch/x86/include/asm/kvm-x86-ops.h |  1 +
 arch/x86/include/asm/kvm_host.h    |  2 ++
 arch/x86/kvm/lapic.c               |  5 ++++-
 arch/x86/kvm/svm/sev.c             | 33 ++++++++++++++++++++++++++++++
 arch/x86/kvm/svm/svm.c             | 15 ++++++++++++--
 arch/x86/kvm/svm/svm.h             |  1 +
 6 files changed, 54 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index 6a885f024a00..e116405cbb5f 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -131,6 +131,7 @@ KVM_X86_OP(msr_filter_changed)
 KVM_X86_OP(complete_emulated_msr)
 KVM_X86_OP(vcpu_deliver_sipi_vector)
 KVM_X86_OP_OPTIONAL_RET0(vcpu_get_apicv_inhibit_reasons);
+KVM_X86_OP_OPTIONAL(alloc_apic_backing_page)
 KVM_X86_OP_OPTIONAL_RET0(fault_is_private);
 KVM_X86_OP_OPTIONAL_RET0(update_mem_attr)
 KVM_X86_OP_OPTIONAL(invalidate_restricted_mem)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 37c92412035f..a9363a6f779d 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1729,6 +1729,8 @@ struct kvm_x86_ops {
 	 * Returns vCPU specific APICv inhibit reasons
 	 */
 	unsigned long (*vcpu_get_apicv_inhibit_reasons)(struct kvm_vcpu *vcpu);
+
+	void *(*alloc_apic_backing_page)(struct kvm_vcpu *vcpu);
 };
 
 struct kvm_x86_nested_ops {
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 80f92cbc4029..72e46d5b4201 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2740,7 +2740,10 @@ int kvm_create_lapic(struct kvm_vcpu *vcpu, int timer_advance_ns)
 
 	vcpu->arch.apic = apic;
 
-	apic->regs = (void *)get_zeroed_page(GFP_KERNEL_ACCOUNT);
+	if (kvm_x86_ops.alloc_apic_backing_page)
+		apic->regs = static_call(kvm_x86_alloc_apic_backing_page)(vcpu);
+	else
+		apic->regs = (void *)get_zeroed_page(GFP_KERNEL_ACCOUNT);
 	if (!apic->regs) {
 		printk(KERN_ERR "malloc apic regs error for vcpu %x\n",
 		       vcpu->vcpu_id);
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index c1f0d4898ce3..9e9efb42a766 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3241,3 +3241,36 @@ void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector)
 		break;
 	}
 }
+
+struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu)
+{
+	unsigned long pfn;
+	struct page *p;
+
+	if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
+		return alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
+
+	/*
+	 * Allocate an SNP-safe page to work around the SNP erratum where
+	 * the CPU will incorrectly signal an RMP violation #PF if a
+	 * hugepage (2mb or 1gb) collides with the RMP entry of a VMCB, VMSA
+	 * or AVIC backing page. The recommended workaround is to not use
+	 * a hugepage.
+	 *
+	 * Allocate one extra page, use a page which is not 2mb aligned
+	 * and free the other.
+	 */
+	p = alloc_pages(GFP_KERNEL_ACCOUNT | __GFP_ZERO, 1);
+	if (!p)
+		return NULL;
+
+	split_page(p, 1);
+
+	pfn = page_to_pfn(p);
+	if (IS_ALIGNED(pfn, PTRS_PER_PMD))
+		__free_page(p++);
+	else
+		__free_page(p + 1);
+
+	return p;
+}
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 213593dbd7a1..1061aaf66f0a 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1372,7 +1372,7 @@ static int svm_vcpu_create(struct kvm_vcpu *vcpu)
 	svm = to_svm(vcpu);
 
 	err = -ENOMEM;
-	vmcb01_page = alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
+	vmcb01_page = snp_safe_alloc_page(vcpu);
 	if (!vmcb01_page)
 		goto out;
 
@@ -1381,7 +1381,7 @@ static int svm_vcpu_create(struct kvm_vcpu *vcpu)
 		 * SEV-ES guests require a separate VMSA page used to contain
 		 * the encrypted register state of the guest.
 		 */
-		vmsa_page = alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
+		vmsa_page = snp_safe_alloc_page(vcpu);
 		if (!vmsa_page)
 			goto error_free_vmcb_page;
 
@@ -4696,6 +4696,16 @@ static int svm_vm_init(struct kvm *kvm)
 	return 0;
 }
 
+static void *svm_alloc_apic_backing_page(struct kvm_vcpu *vcpu)
+{
+	struct page *page = snp_safe_alloc_page(vcpu);
+
+	if (!page)
+		return NULL;
+
+	return page_address(page);
+}
+
 static struct kvm_x86_ops svm_x86_ops __initdata = {
 	.name = KBUILD_MODNAME,
 
@@ -4824,6 +4834,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
 
 	.vcpu_deliver_sipi_vector = svm_vcpu_deliver_sipi_vector,
 	.vcpu_get_apicv_inhibit_reasons = avic_vcpu_get_apicv_inhibit_reasons,
+	.alloc_apic_backing_page = svm_alloc_apic_backing_page,
 };
 
 /*
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index c249c360fe36..5efcf036ccad 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -692,6 +692,7 @@ void sev_es_vcpu_reset(struct vcpu_svm *svm);
 void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector);
 void sev_es_prepare_switch_to_guest(struct sev_es_save_area *hostsa);
 void sev_es_unmap_ghcb(struct vcpu_svm *svm);
+struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu);
 
 /* vmenter.S */
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 147+ messages in thread

* [PATCH RFC v8 32/56] KVM: SVM: Add initial SEV-SNP support
  2023-02-20 18:37 [PATCH RFC v8 00/56] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (30 preceding siblings ...)
  2023-02-20 18:38 ` [PATCH RFC v8 31/56] KVM: SVM: Make AVIC backing, VMSA and VMCB memory allocation SNP safe Michael Roth
@ 2023-02-20 18:38 ` Michael Roth
  2023-02-23 17:46   ` Zhi Wang
  2023-02-20 18:38 ` [PATCH RFC v8 33/56] KVM: SVM: Add KVM_SNP_INIT command Michael Roth
                   ` (25 subsequent siblings)
  57 siblings, 1 reply; 147+ messages in thread
From: Michael Roth @ 2023-02-20 18:38 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	Brijesh Singh

From: Brijesh Singh <brijesh.singh@amd.com>

The next generation of SEV is called SEV-SNP (Secure Nested Paging).
SEV-SNP builds upon existing SEV and SEV-ES functionality while adding new
hardware-based security protection. SEV-SNP adds strong memory integrity
protection to help prevent malicious hypervisor-based attacks such as data
replay, memory re-mapping, and more, to create an isolated execution
environment.

The SNP feature is added incrementally; later patches add a new module
parameter that can be used to enable SEV-SNP in KVM.

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 arch/x86/kvm/svm/sev.c | 10 +++++++++-
 arch/x86/kvm/svm/svm.h |  8 ++++++++
 2 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 9e9efb42a766..51db01b282eb 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -58,6 +58,9 @@ module_param_named(sev_es, sev_es_enabled, bool, 0444);
 #define sev_es_enabled false
 #endif /* CONFIG_KVM_AMD_SEV */
 
+/* enable/disable SEV-SNP support */
+static bool sev_snp_enabled;
+
 #define AP_RESET_HOLD_NONE		0
 #define AP_RESET_HOLD_NAE_EVENT		1
 #define AP_RESET_HOLD_MSR_PROTO		2
@@ -2306,6 +2309,7 @@ void __init sev_hardware_setup(void)
 {
 #ifdef CONFIG_KVM_AMD_SEV
 	unsigned int eax, ebx, ecx, edx, sev_asid_count, sev_es_asid_count;
+	bool sev_snp_supported = false;
 	bool sev_es_supported = false;
 	bool sev_supported = false;
 
@@ -2385,12 +2389,16 @@ void __init sev_hardware_setup(void)
 	if (misc_cg_set_capacity(MISC_CG_RES_SEV_ES, sev_es_asid_count))
 		goto out;
 
-	pr_info("SEV-ES supported: %u ASIDs\n", sev_es_asid_count);
 	sev_es_supported = true;
+	sev_snp_supported = sev_snp_enabled && cpu_feature_enabled(X86_FEATURE_SEV_SNP);
+
+	pr_info("SEV-ES %ssupported: %u ASIDs\n",
+		sev_snp_supported ? "and SEV-SNP " : "", sev_es_asid_count);
 
 out:
 	sev_enabled = sev_supported;
 	sev_es_enabled = sev_es_supported;
+	sev_snp_enabled = sev_snp_supported;
 #endif
 }
 
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 5efcf036ccad..8eb1b51e92f5 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -76,6 +76,7 @@ enum {
 struct kvm_sev_info {
 	bool active;		/* SEV enabled guest */
 	bool es_active;		/* SEV-ES enabled guest */
+	bool snp_active;	/* SEV-SNP enabled guest */
 	unsigned int asid;	/* ASID used for this guest */
 	unsigned int handle;	/* SEV firmware handle */
 	int fd;			/* SEV device fd */
@@ -323,6 +324,13 @@ static __always_inline bool sev_es_guest(struct kvm *kvm)
 #endif
 }
 
+static inline bool sev_snp_guest(struct kvm *kvm)
+{
+	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+
+	return sev_es_guest(kvm) && sev->snp_active;
+}
+
 static inline void vmcb_mark_all_dirty(struct vmcb *vmcb)
 {
 	vmcb->control.clean = 0;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 147+ messages in thread

* [PATCH RFC v8 33/56] KVM: SVM: Add KVM_SNP_INIT command
  2023-02-20 18:37 [PATCH RFC v8 00/56] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (31 preceding siblings ...)
  2023-02-20 18:38 ` [PATCH RFC v8 32/56] KVM: SVM: Add initial SEV-SNP support Michael Roth
@ 2023-02-20 18:38 ` Michael Roth
  2023-02-20 18:38 ` [PATCH RFC v8 34/56] KVM: SVM: Add KVM_SEV_SNP_LAUNCH_START command Michael Roth
                   ` (24 subsequent siblings)
  57 siblings, 0 replies; 147+ messages in thread
From: Michael Roth @ 2023-02-20 18:38 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	Brijesh Singh, Pavan Kumar Paluri

From: Brijesh Singh <brijesh.singh@amd.com>

The KVM_SNP_INIT command is used by the hypervisor to initialize the
SEV-SNP platform context. In a typical workflow, this command should be the
first command issued. When creating an SEV-SNP guest, the VMM must use this
command instead of KVM_SEV_INIT or KVM_SEV_ES_INIT.

The flags value must be zero; it will be extended in future SNP support to
communicate optional features (such as restricted INT injection, etc.).
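
For illustration, a VMM would typically issue this through the existing
KVM_MEMORY_ENCRYPT_OP ioctl on the VM fd, roughly as follows (the helper
name is made up and vm_fd/sev_fd are assumed to be open already):

  static int snp_guest_init(int vm_fd, int sev_fd)
  {
          struct kvm_snp_init init = { .flags = 0 };
          struct kvm_sev_cmd cmd = {
                  .id     = KVM_SEV_SNP_INIT,
                  .data   = (unsigned long)&init,
                  .sev_fd = sev_fd,
          };

          /*
           * On -EOPNOTSUPP the kernel writes the supported flags back into
           * init.flags so the VMM can inspect them.
           */
          return ioctl(vm_fd, KVM_MEMORY_ENCRYPT_OP, &cmd);
  }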

Co-developed-by: Pavan Kumar Paluri <papaluri@amd.com>
Signed-off-by: Pavan Kumar Paluri <papaluri@amd.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 .../virt/kvm/x86/amd-memory-encryption.rst    | 27 ++++++++++++
 arch/x86/include/asm/svm.h                    |  1 +
 arch/x86/kvm/svm/sev.c                        | 44 ++++++++++++++++++-
 arch/x86/kvm/svm/svm.h                        |  4 ++
 include/uapi/linux/kvm.h                      | 13 ++++++
 5 files changed, 87 insertions(+), 2 deletions(-)

diff --git a/Documentation/virt/kvm/x86/amd-memory-encryption.rst b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
index 935aaeb97fe6..2432213bd0ea 100644
--- a/Documentation/virt/kvm/x86/amd-memory-encryption.rst
+++ b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
@@ -434,6 +434,33 @@ issued by the hypervisor to make the guest ready for execution.
 
 Returns: 0 on success, -negative on error
 
+18. KVM_SNP_INIT
+----------------
+
+The KVM_SNP_INIT command can be used by the hypervisor to initialize SEV-SNP
+context. In a typical workflow, this command should be the first command issued.
+
+Parameters (in/out): struct kvm_snp_init
+
+Returns: 0 on success, -negative on error
+
+::
+
+        struct kvm_snp_init {
+                __u64 flags;
+        };
+
+The flags bitmap is defined as::
+
+   /* enable the restricted injection */
+   #define KVM_SEV_SNP_RESTRICTED_INJET   (1<<0)
+
+   /* enable the restricted injection timer */
+   #define KVM_SEV_SNP_RESTRICTED_TIMER_INJET   (1<<1)
+
+If the specified flags are not supported then -EOPNOTSUPP is returned, and the
+supported flags are written back to the flags field.
+
 References
 ==========
 
diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index cb1ee53ad3b1..c18d78d5e505 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -278,6 +278,7 @@ enum avic_ipi_failure_cause {
 #define AVIC_HPA_MASK	~((0xFFFULL << 52) | 0xFFF)
 #define VMCB_AVIC_APIC_BAR_MASK		0xFFFFFFFFFF000ULL
 
+#define SVM_SEV_FEAT_SNP_ACTIVE		BIT(0)
 
 struct vmcb_seg {
 	u16 selector;
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 51db01b282eb..a8efe1f6bf77 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -243,6 +243,25 @@ static void sev_unbind_asid(struct kvm *kvm, unsigned int handle)
 	sev_decommission(handle);
 }
 
+static int verify_snp_init_flags(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+	struct kvm_snp_init params;
+	int ret = 0;
+
+	if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data, sizeof(params)))
+		return -EFAULT;
+
+	if (params.flags & ~SEV_SNP_SUPPORTED_FLAGS)
+		ret = -EOPNOTSUPP;
+
+	params.flags = SEV_SNP_SUPPORTED_FLAGS;
+
+	if (copy_to_user((void __user *)(uintptr_t)argp->data, &params, sizeof(params)))
+		ret = -EFAULT;
+
+	return ret;
+}
+
 static int sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp)
 {
 	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
@@ -256,13 +275,23 @@ static int sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp)
 		return ret;
 
 	sev->active = true;
-	sev->es_active = argp->id == KVM_SEV_ES_INIT;
+	sev->es_active = (argp->id == KVM_SEV_ES_INIT || argp->id == KVM_SEV_SNP_INIT);
+	sev->snp_active = argp->id == KVM_SEV_SNP_INIT;
 	asid = sev_asid_new(sev);
 	if (asid < 0)
 		goto e_no_asid;
 	sev->asid = asid;
 
-	ret = sev_platform_init(&argp->error);
+	if (sev->snp_active) {
+		ret = verify_snp_init_flags(kvm, argp);
+		if (ret)
+			goto e_free;
+
+		ret = sev_snp_init(&argp->error, false);
+	} else {
+		ret = sev_platform_init(&argp->error);
+	}
+
 	if (ret)
 		goto e_free;
 
@@ -277,6 +306,7 @@ static int sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp)
 	sev_asid_free(sev);
 	sev->asid = 0;
 e_no_asid:
+	sev->snp_active = false;
 	sev->es_active = false;
 	sev->active = false;
 	return ret;
@@ -749,6 +779,10 @@ static int sev_es_sync_vmsa(struct vcpu_svm *svm)
 	save->xss  = svm->vcpu.arch.ia32_xss;
 	save->dr6  = svm->vcpu.arch.dr6;
 
+	/* Enable the SEV-SNP feature */
+	if (sev_snp_guest(svm->vcpu.kvm))
+		save->sev_features |= SVM_SEV_FEAT_SNP_ACTIVE;
+
 	pr_debug("Virtual Machine Save Area (VMSA):\n");
 	print_hex_dump_debug("", DUMP_PREFIX_NONE, 16, 1, save, sizeof(*save), false);
 
@@ -2001,6 +2035,12 @@ int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
 	}
 
 	switch (sev_cmd.id) {
+	case KVM_SEV_SNP_INIT:
+		if (!sev_snp_enabled) {
+			r = -ENOTTY;
+			goto out;
+		}
+		fallthrough;
 	case KVM_SEV_ES_INIT:
 		if (!sev_es_enabled) {
 			r = -ENOTTY;
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 8eb1b51e92f5..56a5c96d8a36 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -73,6 +73,9 @@ enum {
 /* TPR and CR2 are always written before VMRUN */
 #define VMCB_ALWAYS_DIRTY_MASK	((1U << VMCB_INTR) | (1U << VMCB_CR2))
 
+/* Supported init feature flags */
+#define SEV_SNP_SUPPORTED_FLAGS		0x0
+
 struct kvm_sev_info {
 	bool active;		/* SEV enabled guest */
 	bool es_active;		/* SEV-ES enabled guest */
@@ -88,6 +91,7 @@ struct kvm_sev_info {
 	struct list_head mirror_entry; /* Use as a list entry of mirrors */
 	struct misc_cg *misc_cg; /* For misc cgroup accounting */
 	atomic_t migration_in_progress;
+	u64 snp_init_flags;
 };
 
 struct kvm_svm {
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 2fba29125ec2..499cc323f793 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1917,6 +1917,9 @@ enum sev_cmd_id {
 	/* Guest Migration Extension */
 	KVM_SEV_SEND_CANCEL,
 
+	/* SNP specific commands */
+	KVM_SEV_SNP_INIT,
+
 	KVM_SEV_NR_MAX,
 };
 
@@ -2013,6 +2016,16 @@ struct kvm_sev_receive_update_data {
 	__u32 trans_len;
 };
 
+/* enable the restricted injection */
+#define KVM_SEV_SNP_RESTRICTED_INJET   (1 << 0)
+
+/* enable the restricted injection timer */
+#define KVM_SEV_SNP_RESTRICTED_TIMER_INJET   (1 << 1)
+
+struct kvm_snp_init {
+	__u64 flags;
+};
+
 #define KVM_DEV_ASSIGN_ENABLE_IOMMU	(1 << 0)
 #define KVM_DEV_ASSIGN_PCI_2_3		(1 << 1)
 #define KVM_DEV_ASSIGN_MASK_INTX	(1 << 2)
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 147+ messages in thread

* [PATCH RFC v8 34/56] KVM: SVM: Add KVM_SEV_SNP_LAUNCH_START command
  2023-02-20 18:37 [PATCH RFC v8 00/56] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (32 preceding siblings ...)
  2023-02-20 18:38 ` [PATCH RFC v8 33/56] KVM: SVM: Add KVM_SNP_INIT command Michael Roth
@ 2023-02-20 18:38 ` Michael Roth
  2023-02-23 21:41   ` Zhi Wang
  2023-04-26 17:06   ` Sabin Rapan
  2023-02-20 18:38 ` [PATCH RFC v8 35/56] KVM: SVM: Add KVM_SEV_SNP_LAUNCH_UPDATE command Michael Roth
                   ` (23 subsequent siblings)
  57 siblings, 2 replies; 147+ messages in thread
From: Michael Roth @ 2023-02-20 18:38 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	Brijesh Singh

From: Brijesh Singh <brijesh.singh@amd.com>

KVM_SEV_SNP_LAUNCH_START begins the launch process for an SEV-SNP guest.
The command initializes a cryptographic digest context used to construct
the measurement of the guest. If the guest is expected to be migrated,
the command also binds a migration agent (MA) to the guest.

For more information see the SEV-SNP specification.
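
As a rough illustration only (not code from this series), a VMM could drive
this command through the existing KVM_MEMORY_ENCRYPT_OP ioctl roughly as
follows; the helper name, file descriptors and policy value are placeholders,
and the struct layout is the one added to the uapi header below:

  #include <sys/ioctl.h>
  #include <linux/kvm.h>

  static int vmm_snp_launch_start(int vm_fd, int sev_fd, __u64 policy)
  {
          struct kvm_sev_snp_launch_start start = {
                  .policy = policy,        /* placeholder policy value */
          };
          struct kvm_sev_cmd cmd = {
                  .id     = KVM_SEV_SNP_LAUNCH_START,
                  .data   = (__u64)(unsigned long)&start,
                  .sev_fd = sev_fd,
          };

          /* On failure, cmd.error holds the firmware error code. */
          return ioctl(vm_fd, KVM_MEMORY_ENCRYPT_OP, &cmd);
  }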

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 .../virt/kvm/x86/amd-memory-encryption.rst    |  24 ++++
 arch/x86/kvm/svm/sev.c                        | 121 +++++++++++++++++-
 arch/x86/kvm/svm/svm.h                        |   1 +
 include/uapi/linux/kvm.h                      |  10 ++
 4 files changed, 153 insertions(+), 3 deletions(-)

diff --git a/Documentation/virt/kvm/x86/amd-memory-encryption.rst b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
index 2432213bd0ea..58971fc02a15 100644
--- a/Documentation/virt/kvm/x86/amd-memory-encryption.rst
+++ b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
@@ -461,6 +461,30 @@ The flags bitmap is defined as::
 If the specified flags is not supported then return -EOPNOTSUPP, and the supported
 flags are returned.
 
+19. KVM_SNP_LAUNCH_START
+------------------------
+
+The KVM_SNP_LAUNCH_START command is used for creating the memory encryption
+context for the SEV-SNP guest. To create the encryption context, the user must
+provide a guest policy, migration agent (if any) and guest OS visible
+workarounds value as defined in the SEV-SNP specification.
+
+Parameters (in): struct kvm_sev_snp_launch_start
+
+Returns: 0 on success, -negative on error
+
+::
+
+        struct kvm_sev_snp_launch_start {
+                __u64 policy;           /* Guest policy to use. */
+                __u64 ma_uaddr;         /* userspace address of migration agent */
+                __u8 ma_en;             /* 1 if the migration agent is enabled */
+                __u8 imi_en;            /* set IMI to 1. */
+                __u8 gosvw[16];         /* guest OS visible workarounds */
+        };
+
+See the SEV-SNP specification for further detail on the launch input.
+
 References
 ==========
 
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index a8efe1f6bf77..097bb2138360 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -22,6 +22,7 @@
 #include <asm/pkru.h>
 #include <asm/trapnr.h>
 #include <asm/fpu/xcr.h>
+#include <asm/sev.h>
 
 #include "mmu.h"
 #include "x86.h"
@@ -75,6 +76,8 @@ static unsigned int nr_asids;
 static unsigned long *sev_asid_bitmap;
 static unsigned long *sev_reclaim_asid_bitmap;
 
+static int snp_decommission_context(struct kvm *kvm);
+
 struct enc_region {
 	struct list_head list;
 	unsigned long npages;
@@ -100,12 +103,17 @@ static int sev_flush_asids(int min_asid, int max_asid)
 	down_write(&sev_deactivate_lock);
 
 	wbinvd_on_all_cpus();
-	ret = sev_guest_df_flush(&error);
+
+	if (sev_snp_enabled)
+		ret = sev_do_cmd(SEV_CMD_SNP_DF_FLUSH, NULL, &error);
+	else
+		ret = sev_guest_df_flush(&error);
 
 	up_write(&sev_deactivate_lock);
 
 	if (ret)
-		pr_err("SEV: DF_FLUSH failed, ret=%d, error=%#x\n", ret, error);
+		pr_err("SEV%s: DF_FLUSH failed, ret=%d, error=%#x\n",
+		       sev_snp_enabled ? "-SNP" : "", ret, error);
 
 	return ret;
 }
@@ -2011,6 +2019,80 @@ int sev_vm_move_enc_context_from(struct kvm *kvm, unsigned int source_fd)
 	return ret;
 }
 
+/*
+ * The guest context contains all the information, keys and metadata
+ * associated with the guest that the firmware tracks to implement SEV
+ * and SNP features. The firmware stores the guest context in a
+ * hypervisor-provided page via the SNP_GCTX_CREATE command.
+ */
+static void *snp_context_create(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+	struct sev_data_snp_addr data = {};
+	void *context;
+	int rc;
+
+	/* Allocate memory for context page */
+	context = snp_alloc_firmware_page(GFP_KERNEL_ACCOUNT);
+	if (!context)
+		return NULL;
+
+	data.gctx_paddr = __psp_pa(context);
+	rc = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_GCTX_CREATE, &data, &argp->error);
+	if (rc) {
+		snp_free_firmware_page(context);
+		return NULL;
+	}
+
+	return context;
+}
+
+static int snp_bind_asid(struct kvm *kvm, int *error)
+{
+	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+	struct sev_data_snp_activate data = {0};
+
+	data.gctx_paddr = __psp_pa(sev->snp_context);
+	data.asid   = sev_get_asid(kvm);
+	return sev_issue_cmd(kvm, SEV_CMD_SNP_ACTIVATE, &data, error);
+}
+
+static int snp_launch_start(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+	struct sev_data_snp_launch_start start = {0};
+	struct kvm_sev_snp_launch_start params;
+	int rc;
+
+	if (!sev_snp_guest(kvm))
+		return -ENOTTY;
+
+	if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data, sizeof(params)))
+		return -EFAULT;
+
+	sev->snp_context = snp_context_create(kvm, argp);
+	if (!sev->snp_context)
+		return -ENOTTY;
+
+	start.gctx_paddr = __psp_pa(sev->snp_context);
+	start.policy = params.policy;
+	memcpy(start.gosvw, params.gosvw, sizeof(params.gosvw));
+	rc = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_LAUNCH_START, &start, &argp->error);
+	if (rc)
+		goto e_free_context;
+
+	sev->fd = argp->sev_fd;
+	rc = snp_bind_asid(kvm, &argp->error);
+	if (rc)
+		goto e_free_context;
+
+	return 0;
+
+e_free_context:
+	snp_decommission_context(kvm);
+
+	return rc;
+}
+
 int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
 {
 	struct kvm_sev_cmd sev_cmd;
@@ -2101,6 +2183,9 @@ int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
 	case KVM_SEV_RECEIVE_FINISH:
 		r = sev_receive_finish(kvm, &sev_cmd);
 		break;
+	case KVM_SEV_SNP_LAUNCH_START:
+		r = snp_launch_start(kvm, &sev_cmd);
+		break;
 	default:
 		r = -EINVAL;
 		goto out;
@@ -2292,6 +2377,28 @@ int sev_vm_copy_enc_context_from(struct kvm *kvm, unsigned int source_fd)
 	return ret;
 }
 
+static int snp_decommission_context(struct kvm *kvm)
+{
+	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+	struct sev_data_snp_addr data = {};
+	int ret;
+
+	/* If context is not created then do nothing */
+	if (!sev->snp_context)
+		return 0;
+
+	data.gctx_paddr = __sme_pa(sev->snp_context);
+	ret = sev_do_cmd(SEV_CMD_SNP_DECOMMISSION, &data, NULL);
+	if (WARN_ONCE(ret, "failed to release guest context"))
+		return ret;
+
+	/* free the context page now */
+	snp_free_firmware_page(sev->snp_context);
+	sev->snp_context = NULL;
+
+	return 0;
+}
+
 void sev_vm_destroy(struct kvm *kvm)
 {
 	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
@@ -2333,7 +2440,15 @@ void sev_vm_destroy(struct kvm *kvm)
 		}
 	}
 
-	sev_unbind_asid(kvm, sev->handle);
+	if (sev_snp_guest(kvm)) {
+		if (snp_decommission_context(kvm)) {
+			WARN_ONCE(1, "Failed to free SNP guest context, leaking asid!\n");
+			return;
+		}
+	} else {
+		sev_unbind_asid(kvm, sev->handle);
+	}
+
 	sev_asid_free(sev);
 }
 
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 56a5c96d8a36..740969b57425 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -92,6 +92,7 @@ struct kvm_sev_info {
 	struct misc_cg *misc_cg; /* For misc cgroup accounting */
 	atomic_t migration_in_progress;
 	u64 snp_init_flags;
+	void *snp_context;      /* SNP guest context page */
 };
 
 struct kvm_svm {
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 499cc323f793..cf19799ca5ce 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1919,6 +1919,7 @@ enum sev_cmd_id {
 
 	/* SNP specific commands */
 	KVM_SEV_SNP_INIT,
+	KVM_SEV_SNP_LAUNCH_START,
 
 	KVM_SEV_NR_MAX,
 };
@@ -2026,6 +2027,15 @@ struct kvm_snp_init {
 	__u64 flags;
 };
 
+struct kvm_sev_snp_launch_start {
+	__u64 policy;
+	__u64 ma_uaddr;
+	__u8 ma_en;
+	__u8 imi_en;
+	__u8 gosvw[16];
+	__u8 pad[6];
+};
+
 #define KVM_DEV_ASSIGN_ENABLE_IOMMU	(1 << 0)
 #define KVM_DEV_ASSIGN_PCI_2_3		(1 << 1)
 #define KVM_DEV_ASSIGN_MASK_INTX	(1 << 2)
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 147+ messages in thread

* [PATCH RFC v8 35/56] KVM: SVM: Add KVM_SEV_SNP_LAUNCH_UPDATE command
  2023-02-20 18:37 [PATCH RFC v8 00/56] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (33 preceding siblings ...)
  2023-02-20 18:38 ` [PATCH RFC v8 34/56] KVM: SVM: Add KVM_SEV_SNP_LAUNCH_START command Michael Roth
@ 2023-02-20 18:38 ` Michael Roth
  2023-02-24 11:55   ` Zhi Wang
  2023-02-20 18:38 ` [PATCH RFC v8 36/56] KVM: SVM: Add KVM_SEV_SNP_LAUNCH_FINISH command Michael Roth
                   ` (22 subsequent siblings)
  57 siblings, 1 reply; 147+ messages in thread
From: Michael Roth @ 2023-02-20 18:38 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	Brijesh Singh

From: Brijesh Singh <brijesh.singh@amd.com>

The KVM_SEV_SNP_LAUNCH_UPDATE command can be used to insert data into the
guest's memory. The data is encrypted with the cryptographic context
created with KVM_SEV_SNP_LAUNCH_START.

In addition to inserting data, it can insert two special pages
into the guest's memory: the secrets page and the CPUID page.

While terminating the guest, reclaim the guest pages added to the RMP
table. If the reclaim fails, then the pages are no longer safe to be
released back to the system, so leak them instead.

For more information see the SEV-SNP specification.
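
For illustration (again, not code from this series), a VMM could issue the
command against a range of guest memory that is already backed by a
restricted memslot; the helper name and field values here are placeholders:

  #include <sys/ioctl.h>
  #include <linux/kvm.h>

  static int vmm_snp_launch_update(int vm_fd, int sev_fd, __u64 start_gfn,
                                   __u64 uaddr, __u32 len, __u8 page_type)
  {
          struct kvm_sev_snp_launch_update update = {
                  .start_gfn = start_gfn,
                  .uaddr     = uaddr,
                  .len       = len,
                  .page_type = page_type, /* e.g. KVM_SEV_SNP_PAGE_TYPE_NORMAL */
          };
          struct kvm_sev_cmd cmd = {
                  .id     = KVM_SEV_SNP_LAUNCH_UPDATE,
                  .data   = (__u64)(unsigned long)&update,
                  .sev_fd = sev_fd,
          };

          return ioctl(vm_fd, KVM_MEMORY_ENCRYPT_OP, &cmd);
  }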

Co-developed-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
---
 .../virt/kvm/x86/amd-memory-encryption.rst    |  29 +++
 arch/x86/kvm/svm/sev.c                        | 190 ++++++++++++++++++
 include/uapi/linux/kvm.h                      |  19 ++
 3 files changed, 238 insertions(+)

diff --git a/Documentation/virt/kvm/x86/amd-memory-encryption.rst b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
index 58971fc02a15..c94be8e6d657 100644
--- a/Documentation/virt/kvm/x86/amd-memory-encryption.rst
+++ b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
@@ -485,6 +485,35 @@ Returns: 0 on success, -negative on error
 
 See the SEV-SNP specification for further detail on the launch input.
 
+20. KVM_SNP_LAUNCH_UPDATE
+-------------------------
+
+The KVM_SNP_LAUNCH_UPDATE command is used for encrypting a memory region. It also
+calculates a measurement of the memory contents. The measurement is a signature
+of the memory contents that can be sent to the guest owner as an attestation
+that the memory was encrypted correctly by the firmware.
+
+Parameters (in): struct kvm_sev_snp_launch_update
+
+Returns: 0 on success, -negative on error
+
+::
+
+        struct kvm_sev_snp_launch_update {
+                __u64 start_gfn;        /* Guest page number to start from. */
+                __u64 uaddr;            /* userspace address of the memory to encrypt */
+                __u32 len;              /* length of memory region */
+                __u8 imi_page;          /* 1 if memory is part of the IMI */
+                __u8 page_type;         /* page type */
+                __u8 vmpl3_perms;       /* VMPL3 permission mask */
+                __u8 vmpl2_perms;       /* VMPL2 permission mask */
+                __u8 vmpl1_perms;       /* VMPL1 permission mask */
+        };
+
+See the SEV-SNP spec for further details on how to build the VMPL permission
+mask and page type.
+
+
 References
 ==========
 
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 097bb2138360..03dd227f6090 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -234,6 +234,37 @@ static void sev_decommission(unsigned int handle)
 	sev_guest_decommission(&decommission, NULL);
 }
 
+static int snp_page_reclaim(u64 pfn)
+{
+	struct sev_data_snp_page_reclaim data = {0};
+	int err, rc;
+
+	data.paddr = __sme_set(pfn << PAGE_SHIFT);
+	rc = sev_do_cmd(SEV_CMD_SNP_PAGE_RECLAIM, &data, &err);
+	if (rc) {
+		/*
+		 * If the reclaim failed, then page is no longer safe
+		 * to use.
+		 */
+		snp_mark_pages_offline(pfn,
+				       page_level_size(PG_LEVEL_4K) >> PAGE_SHIFT);
+	}
+
+	return rc;
+}
+
+static int host_rmp_make_shared(u64 pfn, enum pg_level level, bool leak)
+{
+	int rc;
+
+	rc = rmp_make_shared(pfn, level);
+	if (rc && leak)
+		snp_mark_pages_offline(pfn,
+				       page_level_size(level) >> PAGE_SHIFT);
+
+	return rc;
+}
+
 static void sev_unbind_asid(struct kvm *kvm, unsigned int handle)
 {
 	struct sev_data_deactivate deactivate;
@@ -2093,6 +2124,162 @@ static int snp_launch_start(struct kvm *kvm, struct kvm_sev_cmd *argp)
 	return rc;
 }
 
+static int snp_launch_update_gfn_handler(struct kvm *kvm,
+					 struct kvm_gfn_range *range,
+					 void *opaque)
+{
+	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+	struct kvm_memory_slot *memslot = range->slot;
+	struct sev_data_snp_launch_update data = {0};
+	struct kvm_sev_snp_launch_update params;
+	struct kvm_sev_cmd *argp = opaque;
+	int *error = &argp->error;
+	int i, n = 0, ret = 0;
+	unsigned long npages;
+	kvm_pfn_t *pfns;
+	gfn_t gfn;
+
+	if (!kvm_slot_can_be_private(memslot)) {
+		pr_err("SEV-SNP requires restricted memory.\n");
+		return -EINVAL;
+	}
+
+	if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data, sizeof(params))) {
+		pr_err("Failed to copy user parameters for SEV-SNP launch.\n");
+		return -EFAULT;
+	}
+
+	data.gctx_paddr = __psp_pa(sev->snp_context);
+
+	npages = range->end - range->start;
+	pfns = kvmalloc_array(npages, sizeof(*pfns), GFP_KERNEL_ACCOUNT);
+	if (!pfns)
+		return -ENOMEM;
+
+	pr_debug("%s: GFN range 0x%llx-0x%llx, type %d\n", __func__,
+		 range->start, range->end, params.page_type);
+
+	for (gfn = range->start, i = 0; gfn < range->end; gfn++, i++) {
+		int order, level;
+		void *kvaddr;
+
+		ret = kvm_restrictedmem_get_pfn(memslot, gfn, &pfns[i], &order);
+		if (ret)
+			goto e_release;
+
+		n++;
+		ret = snp_lookup_rmpentry((u64)pfns[i], &level);
+		if (ret) {
+			pr_err("Failed to ensure GFN 0x%llx is in initial shared state, ret: %d\n",
+			       gfn, ret);
+			return -EFAULT;
+		}
+
+		kvaddr = pfn_to_kaddr(pfns[i]);
+		if (!virt_addr_valid(kvaddr)) {
+			pr_err("Invalid HVA 0x%llx for GFN 0x%llx\n", (uint64_t)kvaddr, gfn);
+			ret = -EINVAL;
+			goto e_release;
+		}
+
+		ret = kvm_read_guest_page(kvm, gfn, kvaddr, 0, PAGE_SIZE);
+		if (ret) {
+			pr_err("Guest read failed, ret: 0x%x\n", ret);
+			goto e_release;
+		}
+
+		ret = rmp_make_private(pfns[i], gfn << PAGE_SHIFT, PG_LEVEL_4K,
+				       sev_get_asid(kvm), true);
+		if (ret) {
+			ret = -EFAULT;
+			goto e_release;
+		}
+
+		data.address = __sme_set(pfns[i] << PAGE_SHIFT);
+		data.page_size = X86_TO_RMP_PG_LEVEL(PG_LEVEL_4K);
+		data.page_type = params.page_type;
+		data.vmpl3_perms = params.vmpl3_perms;
+		data.vmpl2_perms = params.vmpl2_perms;
+		data.vmpl1_perms = params.vmpl1_perms;
+		ret = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_LAUNCH_UPDATE,
+				      &data, error);
+		if (ret) {
+			pr_err("SEV-SNP launch update failed, ret: 0x%x, fw_error: 0x%x\n",
+			       ret, *error);
+			snp_page_reclaim(pfns[i]);
+
+			/*
+			 * When invalid CPUID function entries are detected, the firmware
+			 * corrects these entries for debugging purposes and leaves the
+			 * page unencrypted so it can be provided to users for debugging
+			 * and error-reporting.
+			 *
+			 * Copy the corrected CPUID page back to shared memory so
+			 * userspace can retrieve this information.
+			 */
+			if (params.page_type == SNP_PAGE_TYPE_CPUID &&
+			    *error == SEV_RET_INVALID_PARAM) {
+				int ret;
+
+				host_rmp_make_shared(pfns[i], PG_LEVEL_4K, true);
+
+				ret = kvm_write_guest_page(kvm, gfn, kvaddr, 0, PAGE_SIZE);
+				if (ret)
+					pr_err("Failed to write CPUID page back to userspace, ret: 0x%x\n",
+					       ret);
+			}
+
+			goto e_release;
+		}
+	}
+
+	/*
+	 * Memory attribute updates via KVM_SET_MEMORY_ATTRIBUTES are serialized
+	 * via kvm->slots_lock, so use the same protocol for updating them here.
+	 */
+	mutex_lock(&kvm->slots_lock);
+	kvm_vm_set_region_attr(kvm, range->start, range->end, KVM_MEMORY_ATTRIBUTE_PRIVATE);
+	mutex_unlock(&kvm->slots_lock);
+
+e_release:
+	/* Content of memory is updated, mark pages dirty */
+	for (i = 0; i < n; i++) {
+		set_page_dirty(pfn_to_page(pfns[i]));
+		mark_page_accessed(pfn_to_page(pfns[i]));
+
+		/*
+		 * If there was an error, update the RMP entry to change page
+		 * ownership back to the hypervisor.
+		 */
+		if (ret)
+			host_rmp_make_shared(pfns[i], PG_LEVEL_4K, true);
+
+		put_page(pfn_to_page(pfns[i]));
+	}
+
+	kvfree(pfns);
+	return ret;
+}
+
+static int snp_launch_update(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+	struct kvm_sev_snp_launch_update params;
+
+	if (!sev_snp_guest(kvm))
+		return -ENOTTY;
+
+	if (!sev->snp_context)
+		return -EINVAL;
+
+	if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data, sizeof(params)))
+		return -EFAULT;
+
+	return kvm_vm_do_hva_range_op(kvm, params.uaddr, params.uaddr + params.len,
+				      snp_launch_update_gfn_handler, argp);
+}
+
 int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
 {
 	struct kvm_sev_cmd sev_cmd;
@@ -2186,6 +2373,9 @@ int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
 	case KVM_SEV_SNP_LAUNCH_START:
 		r = snp_launch_start(kvm, &sev_cmd);
 		break;
+	case KVM_SEV_SNP_LAUNCH_UPDATE:
+		r = snp_launch_update(kvm, &sev_cmd);
+		break;
 	default:
 		r = -EINVAL;
 		goto out;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index cf19799ca5ce..4098bba17aa4 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1920,6 +1920,7 @@ enum sev_cmd_id {
 	/* SNP specific commands */
 	KVM_SEV_SNP_INIT,
 	KVM_SEV_SNP_LAUNCH_START,
+	KVM_SEV_SNP_LAUNCH_UPDATE,
 
 	KVM_SEV_NR_MAX,
 };
@@ -2036,6 +2037,24 @@ struct kvm_sev_snp_launch_start {
 	__u8 pad[6];
 };
 
+#define KVM_SEV_SNP_PAGE_TYPE_NORMAL		0x1
+#define KVM_SEV_SNP_PAGE_TYPE_VMSA		0x2
+#define KVM_SEV_SNP_PAGE_TYPE_ZERO		0x3
+#define KVM_SEV_SNP_PAGE_TYPE_UNMEASURED	0x4
+#define KVM_SEV_SNP_PAGE_TYPE_SECRETS		0x5
+#define KVM_SEV_SNP_PAGE_TYPE_CPUID		0x6
+
+struct kvm_sev_snp_launch_update {
+	__u64 start_gfn;
+	__u64 uaddr;
+	__u32 len;
+	__u8 imi_page;
+	__u8 page_type;
+	__u8 vmpl3_perms;
+	__u8 vmpl2_perms;
+	__u8 vmpl1_perms;
+};
+
 #define KVM_DEV_ASSIGN_ENABLE_IOMMU	(1 << 0)
 #define KVM_DEV_ASSIGN_PCI_2_3		(1 << 1)
 #define KVM_DEV_ASSIGN_MASK_INTX	(1 << 2)
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 147+ messages in thread

* [PATCH RFC v8 36/56] KVM: SVM: Add KVM_SEV_SNP_LAUNCH_FINISH command
  2023-02-20 18:37 [PATCH RFC v8 00/56] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (34 preceding siblings ...)
  2023-02-20 18:38 ` [PATCH RFC v8 35/56] KVM: SVM: Add KVM_SEV_SNP_LAUNCH_UPDATE command Michael Roth
@ 2023-02-20 18:38 ` Michael Roth
  2023-03-24 14:40   ` Alexander Graf
  2023-02-20 18:38 ` [PATCH RFC v8 37/56] KVM: X86: Keep the NPT and RMP page level in sync Michael Roth
                   ` (21 subsequent siblings)
  57 siblings, 1 reply; 147+ messages in thread
From: Michael Roth @ 2023-02-20 18:38 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	Brijesh Singh, Harald Hoyer

From: Brijesh Singh <brijesh.singh@amd.com>

The KVM_SEV_SNP_LAUNCH_FINISH command finalizes the cryptographic digest and
stores it as the measurement of the guest at launch.

While finalizing the launch flow, it also issues the LAUNCH_UPDATE command
to encrypt the VMSA pages.

For an SNP guest, the VMSA page is added to the RMP table as a
guest-owned page and is also removed from the kernel direct map, so
flush it only later, after it has been transitioned back to hypervisor
state and restored in the direct map.
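
A hedged sketch of the corresponding VMM call (not part of this patch; the
helper name is made up and no ID block is supplied in this example):

  #include <string.h>
  #include <sys/ioctl.h>
  #include <linux/kvm.h>

  static int vmm_snp_launch_finish(int vm_fd, int sev_fd,
                                   const __u8 host_data[KVM_SEV_SNP_FINISH_DATA_SIZE])
  {
          struct kvm_sev_snp_launch_finish finish = {};
          struct kvm_sev_cmd cmd = {
                  .id     = KVM_SEV_SNP_LAUNCH_FINISH,
                  .data   = (__u64)(unsigned long)&finish,
                  .sev_fd = sev_fd,
          };

          /* Opaque host-supplied data; see the SEV-SNP spec for how it is used. */
          memcpy(finish.host_data, host_data, KVM_SEV_SNP_FINISH_DATA_SIZE);

          return ioctl(vm_fd, KVM_MEMORY_ENCRYPT_OP, &cmd);
  }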

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Harald Hoyer <harald@profian.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 .../virt/kvm/x86/amd-memory-encryption.rst    |  23 ++++
 arch/x86/kvm/svm/sev.c                        | 122 ++++++++++++++++++
 include/uapi/linux/kvm.h                      |  14 ++
 3 files changed, 159 insertions(+)

diff --git a/Documentation/virt/kvm/x86/amd-memory-encryption.rst b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
index c94be8e6d657..dafb0c9984f1 100644
--- a/Documentation/virt/kvm/x86/amd-memory-encryption.rst
+++ b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
@@ -513,6 +513,29 @@ Returns: 0 on success, -negative on error
 See the SEV-SNP spec for further details on how to build the VMPL permission
 mask and page type.
 
+21. KVM_SNP_LAUNCH_FINISH
+-------------------------
+
+After completion of the SNP guest launch flow, the KVM_SNP_LAUNCH_FINISH command can be
+issued to make the guest ready for execution.
+
+Parameters (in): struct kvm_sev_snp_launch_finish
+
+Returns: 0 on success, -negative on error
+
+::
+
+        struct kvm_sev_snp_launch_finish {
+                __u64 id_block_uaddr;
+                __u64 id_auth_uaddr;
+                __u8 id_block_en;
+                __u8 auth_key_en;
+                __u8 host_data[32];
+                __u8 pad[6];
+        };
+
+
+See the SEV-SNP specification for further details on the launch finish input parameters.
 
 References
 ==========
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 03dd227f6090..515e22d0dc30 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -2280,6 +2280,109 @@ static int snp_launch_update(struct kvm *kvm, struct kvm_sev_cmd *argp)
 				      snp_launch_update_gfn_handler, argp);
 }
 
+static int snp_launch_update_vmsa(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+	struct sev_data_snp_launch_update data = {};
+	struct kvm_vcpu *vcpu;
+	unsigned long i;
+	int ret;
+
+	data.gctx_paddr = __psp_pa(sev->snp_context);
+	data.page_type = SNP_PAGE_TYPE_VMSA;
+
+	kvm_for_each_vcpu(i, vcpu, kvm) {
+		struct vcpu_svm *svm = to_svm(vcpu);
+		u64 pfn = __pa(svm->sev_es.vmsa) >> PAGE_SHIFT;
+
+		/* Perform some pre-encryption checks against the VMSA */
+		ret = sev_es_sync_vmsa(svm);
+		if (ret)
+			return ret;
+
+		/* Transition the VMSA page to a firmware state. */
+		ret = rmp_make_private(pfn, -1, PG_LEVEL_4K, sev->asid, true);
+		if (ret)
+			return ret;
+
+		/* Issue the SNP command to encrypt the VMSA */
+		data.address = __sme_pa(svm->sev_es.vmsa);
+		ret = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_LAUNCH_UPDATE,
+				      &data, &argp->error);
+		if (ret) {
+			snp_page_reclaim(pfn);
+			return ret;
+		}
+
+		svm->vcpu.arch.guest_state_protected = true;
+	}
+
+	return 0;
+}
+
+static int snp_launch_finish(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+	struct kvm_sev_snp_launch_finish params;
+	struct sev_data_snp_launch_finish *data;
+	void *id_block = NULL, *id_auth = NULL;
+	int ret;
+
+	if (!sev_snp_guest(kvm))
+		return -ENOTTY;
+
+	if (!sev->snp_context)
+		return -EINVAL;
+
+	if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data, sizeof(params)))
+		return -EFAULT;
+
+	/* Measure all vCPUs using LAUNCH_UPDATE before finalizing the launch flow. */
+	ret = snp_launch_update_vmsa(kvm, argp);
+	if (ret)
+		return ret;
+
+	data = kzalloc(sizeof(*data), GFP_KERNEL_ACCOUNT);
+	if (!data)
+		return -ENOMEM;
+
+	if (params.id_block_en) {
+		id_block = psp_copy_user_blob(params.id_block_uaddr, KVM_SEV_SNP_ID_BLOCK_SIZE);
+		if (IS_ERR(id_block)) {
+			ret = PTR_ERR(id_block);
+			goto e_free;
+		}
+
+		data->id_block_en = 1;
+		data->id_block_paddr = __sme_pa(id_block);
+
+		id_auth = psp_copy_user_blob(params.id_auth_uaddr, KVM_SEV_SNP_ID_AUTH_SIZE);
+		if (IS_ERR(id_auth)) {
+			ret = PTR_ERR(id_auth);
+			goto e_free_id_block;
+		}
+
+		data->id_auth_paddr = __sme_pa(id_auth);
+
+		if (params.auth_key_en)
+			data->auth_key_en = 1;
+	}
+
+	memcpy(data->host_data, params.host_data, KVM_SEV_SNP_FINISH_DATA_SIZE);
+	data->gctx_paddr = __psp_pa(sev->snp_context);
+	ret = sev_issue_cmd(kvm, SEV_CMD_SNP_LAUNCH_FINISH, data, &argp->error);
+
+	kfree(id_auth);
+
+e_free_id_block:
+	kfree(id_block);
+
+e_free:
+	kfree(data);
+
+	return ret;
+}
+
 int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
 {
 	struct kvm_sev_cmd sev_cmd;
@@ -2376,6 +2479,9 @@ int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
 	case KVM_SEV_SNP_LAUNCH_UPDATE:
 		r = snp_launch_update(kvm, &sev_cmd);
 		break;
+	case KVM_SEV_SNP_LAUNCH_FINISH:
+		r = snp_launch_finish(kvm, &sev_cmd);
+		break;
 	default:
 		r = -EINVAL;
 		goto out;
@@ -2831,11 +2937,27 @@ void sev_free_vcpu(struct kvm_vcpu *vcpu)
 
 	svm = to_svm(vcpu);
 
+	/*
+	 * If it's an SNP guest, then the VMSA was added in the RMP entry as
+	 * a guest owned page. Transition the page to hypervisor state
+	 * before releasing it back to the system.
+	 * Also the page is removed from the kernel direct map, so flush it
+	 * later after it is transitioned back to hypervisor state and
+	 * restored in the direct map.
+	 */
+	if (sev_snp_guest(vcpu->kvm)) {
+		u64 pfn = __pa(svm->sev_es.vmsa) >> PAGE_SHIFT;
+
+		if (host_rmp_make_shared(pfn, PG_LEVEL_4K, true))
+			goto skip_vmsa_free;
+	}
+
 	if (vcpu->arch.guest_state_protected)
 		sev_flush_encrypted_page(vcpu, svm->sev_es.vmsa);
 
 	__free_page(virt_to_page(svm->sev_es.vmsa));
 
+skip_vmsa_free:
 	if (svm->sev_es.ghcb_sa_free)
 		kvfree(svm->sev_es.ghcb_sa);
 }
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 4098bba17aa4..2bab08a5b5d7 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1921,6 +1921,7 @@ enum sev_cmd_id {
 	KVM_SEV_SNP_INIT,
 	KVM_SEV_SNP_LAUNCH_START,
 	KVM_SEV_SNP_LAUNCH_UPDATE,
+	KVM_SEV_SNP_LAUNCH_FINISH,
 
 	KVM_SEV_NR_MAX,
 };
@@ -2055,6 +2056,19 @@ struct kvm_sev_snp_launch_update {
 	__u8 vmpl1_perms;
 };
 
+#define KVM_SEV_SNP_ID_BLOCK_SIZE	96
+#define KVM_SEV_SNP_ID_AUTH_SIZE	4096
+#define KVM_SEV_SNP_FINISH_DATA_SIZE	32
+
+struct kvm_sev_snp_launch_finish {
+	__u64 id_block_uaddr;
+	__u64 id_auth_uaddr;
+	__u8 id_block_en;
+	__u8 auth_key_en;
+	__u8 host_data[KVM_SEV_SNP_FINISH_DATA_SIZE];
+	__u8 pad[6];
+};
+
 #define KVM_DEV_ASSIGN_ENABLE_IOMMU	(1 << 0)
 #define KVM_DEV_ASSIGN_PCI_2_3		(1 << 1)
 #define KVM_DEV_ASSIGN_MASK_INTX	(1 << 2)
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 147+ messages in thread

* [PATCH RFC v8 37/56] KVM: X86: Keep the NPT and RMP page level in sync
  2023-02-20 18:37 [PATCH RFC v8 00/56] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (35 preceding siblings ...)
  2023-02-20 18:38 ` [PATCH RFC v8 36/56] KVM: SVM: Add KVM_SEV_SNP_LAUNCH_FINISH command Michael Roth
@ 2023-02-20 18:38 ` Michael Roth
  2023-02-20 18:38 ` [PATCH RFC v8 38/56] KVM: x86: Define RMP page fault error bits for #NPF Michael Roth
                   ` (20 subsequent siblings)
  57 siblings, 0 replies; 147+ messages in thread
From: Michael Roth @ 2023-02-20 18:38 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	Brijesh Singh, Jarkko Sakkinen, Ashish Kalra

From: Brijesh Singh <brijesh.singh@amd.com>

When running an SEV-SNP VM, the sPA used to index the RMP entry is
obtained through the NPT translation (gva->gpa->spa). The NPT page
level is checked against the page level programmed in the RMP entry.
If the page levels do not match, hardware raises a nested page
fault with the RMP bit set to indicate the RMP violation.

Co-developed-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Jarkko Sakkinen <jarkko@profian.com>
Signed-off-by: Ashish Kalra <Ashish.Kalra@amd.com>
---
 arch/x86/include/asm/kvm-x86-ops.h |  1 +
 arch/x86/include/asm/kvm_host.h    |  2 ++
 arch/x86/kvm/mmu/mmu.c             |  9 ++++++
 arch/x86/kvm/svm/sev.c             | 51 ++++++++++++++++++++++++++++++
 arch/x86/kvm/svm/svm.c             |  2 ++
 arch/x86/kvm/svm/svm.h             |  1 +
 6 files changed, 66 insertions(+)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index e116405cbb5f..87a087ec3277 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -135,6 +135,7 @@ KVM_X86_OP_OPTIONAL(alloc_apic_backing_page)
 KVM_X86_OP_OPTIONAL_RET0(fault_is_private);
 KVM_X86_OP_OPTIONAL_RET0(update_mem_attr)
 KVM_X86_OP_OPTIONAL(invalidate_restricted_mem)
+KVM_X86_OP_OPTIONAL(adjust_mapping_level)
 
 #undef KVM_X86_OP
 #undef KVM_X86_OP_OPTIONAL
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index a9363a6f779d..456b42cb167b 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1731,6 +1731,8 @@ struct kvm_x86_ops {
 	unsigned long (*vcpu_get_apicv_inhibit_reasons)(struct kvm_vcpu *vcpu);
 
 	void *(*alloc_apic_backing_page)(struct kvm_vcpu *vcpu);
+
+	void (*adjust_mapping_level)(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn, int *level);
 };
 
 struct kvm_x86_nested_ops {
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 360af0c9997e..d8e5254f314d 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3081,6 +3081,7 @@ static int host_pfn_mapping_level(struct kvm *kvm, gfn_t gfn,
 
 out:
 	local_irq_restore(flags);
+
 	return level;
 }
 
@@ -3141,6 +3142,14 @@ void kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 	fault->req_level = __kvm_mmu_max_mapping_level(vcpu->kvm, slot,
 						       fault->gfn, fault->max_level,
 						       fault->is_private);
+	if (kvm_slot_can_be_private(slot)) {
+		int req_level = fault->req_level;
+
+		static_call_cond(kvm_x86_adjust_mapping_level)(vcpu->kvm, fault->gfn, fault->pfn,
+							       &req_level);
+		fault->req_level = req_level;
+	}
+
 	if (fault->req_level == PG_LEVEL_4K || fault->huge_page_disallowed)
 		return;
 
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 515e22d0dc30..e8740c35be39 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3749,3 +3749,54 @@ struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu)
 
 	return p;
 }
+
+static bool is_gfn_range_shared(struct kvm *kvm, gfn_t start, gfn_t end)
+{
+	while (start++ < end)
+		if (kvm_mem_is_private(kvm, start))
+			return false;
+
+	return true;
+}
+
+void sev_adjust_mapping_level(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn, int *level)
+{
+	int assigned;
+	int rmp_level = 1;
+	int level_orig = *level;
+
+	if (!sev_snp_guest(kvm))
+		return;
+
+	/* If there's an error retrieving RMP entry, stick with 4K mappings */
+	assigned = snp_lookup_rmpentry(pfn, &rmp_level);
+	if (unlikely(assigned < 0))
+		goto out_adjust;
+
+	if (!assigned) {
+		gfn_t huge_gfn;
+
+		/*
+		 * If all the pages are shared then no need to keep the RMP
+		 * and NPT in sync.
+		 */
+		huge_gfn = gfn & ~(PTRS_PER_PMD - 1);
+		if (is_gfn_range_shared(kvm, huge_gfn, huge_gfn + PTRS_PER_PMD))
+			goto out;
+	}
+
+	/*
+	 * The hardware installs 2MB TLB entries to access 1GB pages,
+	 * therefore allow NPT to use 1GB pages when pfn was added as 2MB
+	 * in the RMP table.
+	 */
+	if (rmp_level == PG_LEVEL_2M && (*level == PG_LEVEL_1G))
+		goto out;
+
+out_adjust:
+	/* Adjust the level to keep the NPT and RMP in sync */
+	*level = min_t(size_t, *level, rmp_level);
+out:
+	pr_debug("%s: GFN: 0x%llx, PFN: 0x%llx, level: %d, rmp_level: %d, level_orig: %d, assigned: %d\n",
+		 __func__, gfn, pfn, *level, rmp_level, level_orig, assigned);
+}
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 1061aaf66f0a..9eb750c8b04c 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4835,6 +4835,8 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
 	.vcpu_deliver_sipi_vector = svm_vcpu_deliver_sipi_vector,
 	.vcpu_get_apicv_inhibit_reasons = avic_vcpu_get_apicv_inhibit_reasons,
 	.alloc_apic_backing_page = svm_alloc_apic_backing_page,
+
+	.adjust_mapping_level = sev_adjust_mapping_level,
 };
 
 /*
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 740969b57425..cbd4594f1cca 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -706,6 +706,7 @@ void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector);
 void sev_es_prepare_switch_to_guest(struct sev_es_save_area *hostsa);
 void sev_es_unmap_ghcb(struct vcpu_svm *svm);
 struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu);
+void sev_adjust_mapping_level(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn, int *level);
 
 /* vmenter.S */
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 147+ messages in thread

* [PATCH RFC v8 38/56] KVM: x86: Define RMP page fault error bits for #NPF
  2023-02-20 18:37 [PATCH RFC v8 00/56] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (36 preceding siblings ...)
  2023-02-20 18:38 ` [PATCH RFC v8 37/56] KVM: X86: Keep the NPT and RMP page level in sync Michael Roth
@ 2023-02-20 18:38 ` Michael Roth
  2023-02-20 18:38 ` [PATCH RFC v8 39/56] KVM: SVM: Add support to handle GHCB GPA register VMGEXIT Michael Roth
                   ` (19 subsequent siblings)
  57 siblings, 0 replies; 147+ messages in thread
From: Michael Roth @ 2023-02-20 18:38 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	Brijesh Singh

From: Brijesh Singh <brijesh.singh@amd.com>

When SEV-SNP is enabled globally, the hardware places restrictions on all
memory accesses based on the RMP entry, whether the hypervisor or a VM
performs the access. When hardware encounters an RMP access violation
during a guest access, it causes a #VMEXIT(NPF).

See APM2 section 16.36.10 for more details.
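
For reference, a minimal sketch (not part of this patch) of how a fault
handler might test these new bits in the #NPF error code; it assumes only
the mask definitions added below:

  static inline bool npf_is_rmp_fault(u64 error_code)
  {
          /* The access violated an RMP check */
          return error_code & PFERR_GUEST_RMP_MASK;
  }

  static inline bool npf_is_sizem_fault(u64 error_code)
  {
          /* RMP violation due to a page-size mismatch */
          return error_code & PFERR_GUEST_SIZEM_MASK;
  }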

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 arch/x86/include/asm/kvm_host.h | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 456b42cb167b..d2e1c109dde5 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -253,9 +253,13 @@ enum x86_intercept_stage;
 #define PFERR_FETCH_BIT 4
 #define PFERR_PK_BIT 5
 #define PFERR_SGX_BIT 15
+#define PFERR_GUEST_RMP_BIT 31
 #define PFERR_GUEST_FINAL_BIT 32
 #define PFERR_GUEST_PAGE_BIT 33
 #define PFERR_IMPLICIT_ACCESS_BIT 48
+#define PFERR_GUEST_ENC_BIT 34
+#define PFERR_GUEST_SIZEM_BIT 35
+#define PFERR_GUEST_VMPL_BIT 36
 
 #define PFERR_PRESENT_MASK	BIT(PFERR_PRESENT_BIT)
 #define PFERR_WRITE_MASK	BIT(PFERR_WRITE_BIT)
@@ -267,6 +271,10 @@ enum x86_intercept_stage;
 #define PFERR_GUEST_FINAL_MASK	BIT_ULL(PFERR_GUEST_FINAL_BIT)
 #define PFERR_GUEST_PAGE_MASK	BIT_ULL(PFERR_GUEST_PAGE_BIT)
 #define PFERR_IMPLICIT_ACCESS	BIT_ULL(PFERR_IMPLICIT_ACCESS_BIT)
+#define PFERR_GUEST_RMP_MASK	BIT_ULL(PFERR_GUEST_RMP_BIT)
+#define PFERR_GUEST_ENC_MASK	BIT_ULL(PFERR_GUEST_ENC_BIT)
+#define PFERR_GUEST_SIZEM_MASK	BIT_ULL(PFERR_GUEST_SIZEM_BIT)
+#define PFERR_GUEST_VMPL_MASK	BIT_ULL(PFERR_GUEST_VMPL_BIT)
 
 #define PFERR_NESTED_GUEST_PAGE (PFERR_GUEST_PAGE_MASK |	\
 				 PFERR_WRITE_MASK |		\
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 147+ messages in thread

* [PATCH RFC v8 39/56] KVM: SVM: Add support to handle GHCB GPA register VMGEXIT
  2023-02-20 18:37 [PATCH RFC v8 00/56] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (37 preceding siblings ...)
  2023-02-20 18:38 ` [PATCH RFC v8 38/56] KVM: x86: Define RMP page fault error bits for #NPF Michael Roth
@ 2023-02-20 18:38 ` Michael Roth
  2023-02-20 18:38 ` [PATCH RFC v8 40/56] KVM: SVM: Add KVM_EXIT_VMGEXIT Michael Roth
                   ` (18 subsequent siblings)
  57 siblings, 0 replies; 147+ messages in thread
From: Michael Roth @ 2023-02-20 18:38 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	Brijesh Singh

From: Brijesh Singh <brijesh.singh@amd.com>

SEV-SNP guests are required to perform a GHCB GPA registration. Before
using a GHCB GPA for a vCPU the first time, a guest must register the
vCPU GHCB GPA. If the hypervisor can work with the guest-requested GPA,
it must respond back with the same GPA; otherwise it returns -1.

On VMEXIT, verify that the GHCB GPA matches the registered value. If a
mismatch is detected, then abort the guest.
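
To illustrate the MSR protocol encoding (a sketch from the guest's point of
view, not code in this patch; GHCB_MSR_INFO_MASK is the pre-existing
GHCBData[11:0] mask and GHCB_MSR_REG_GPA_RESP is the response code used in
the handler below):

  /* Value a guest writes to the GHCB MSR to register its GHCB GFN */
  static u64 ghcb_gpa_register_req(u64 gfn)
  {
          return GHCB_MSR_REG_GPA_REQ |
                 ((gfn & GHCB_MSR_GPA_VALUE_MASK) << GHCB_MSR_GPA_VALUE_POS);
  }

  /* The hypervisor must echo the same GFN back in the response */
  static bool ghcb_gpa_register_ok(u64 resp, u64 gfn)
  {
          if ((resp & GHCB_MSR_INFO_MASK) != GHCB_MSR_REG_GPA_RESP)
                  return false;

          return ((resp >> GHCB_MSR_GPA_VALUE_POS) & GHCB_MSR_GPA_VALUE_MASK) == gfn;
  }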

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 arch/x86/include/asm/sev-common.h |  8 ++++++++
 arch/x86/kvm/svm/sev.c            | 27 +++++++++++++++++++++++++++
 arch/x86/kvm/svm/svm.h            |  7 +++++++
 3 files changed, 42 insertions(+)

diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
index 539de6b93420..0a9055cdfae2 100644
--- a/arch/x86/include/asm/sev-common.h
+++ b/arch/x86/include/asm/sev-common.h
@@ -59,6 +59,14 @@
 #define GHCB_MSR_AP_RESET_HOLD_RESULT_POS	12
 #define GHCB_MSR_AP_RESET_HOLD_RESULT_MASK	GENMASK_ULL(51, 0)
 
+/* Preferred GHCB GPA Request */
+#define GHCB_MSR_PREF_GPA_REQ		0x010
+#define GHCB_MSR_GPA_VALUE_POS		12
+#define GHCB_MSR_GPA_VALUE_MASK		GENMASK_ULL(51, 0)
+
+#define GHCB_MSR_PREF_GPA_RESP		0x011
+#define GHCB_MSR_PREF_GPA_NONE		0xfffffffffffff
+
 /* GHCB GPA Register */
 #define GHCB_MSR_REG_GPA_REQ		0x012
 #define GHCB_MSR_REG_GPA_REQ_VAL(v)			\
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index e8740c35be39..2613311f4fcc 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3424,6 +3424,27 @@ static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
 				  GHCB_MSR_INFO_MASK, GHCB_MSR_INFO_POS);
 		break;
 	}
+	case GHCB_MSR_PREF_GPA_REQ: {
+		set_ghcb_msr_bits(svm, GHCB_MSR_PREF_GPA_NONE, GHCB_MSR_GPA_VALUE_MASK,
+				  GHCB_MSR_GPA_VALUE_POS);
+		set_ghcb_msr_bits(svm, GHCB_MSR_PREF_GPA_RESP, GHCB_MSR_INFO_MASK,
+				  GHCB_MSR_INFO_POS);
+		break;
+	}
+	case GHCB_MSR_REG_GPA_REQ: {
+		u64 gfn;
+
+		gfn = get_ghcb_msr_bits(svm, GHCB_MSR_GPA_VALUE_MASK,
+					GHCB_MSR_GPA_VALUE_POS);
+
+		svm->sev_es.ghcb_registered_gpa = gfn_to_gpa(gfn);
+
+		set_ghcb_msr_bits(svm, gfn, GHCB_MSR_GPA_VALUE_MASK,
+				  GHCB_MSR_GPA_VALUE_POS);
+		set_ghcb_msr_bits(svm, GHCB_MSR_REG_GPA_RESP, GHCB_MSR_INFO_MASK,
+				  GHCB_MSR_INFO_POS);
+		break;
+	}
 	case GHCB_MSR_TERM_REQ: {
 		u64 reason_set, reason_code;
 
@@ -3490,6 +3511,12 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
 
 	exit_code = ghcb_get_sw_exit_code(ghcb);
 
+	/* An SEV-SNP guest requires that the GHCB GPA be registered */
+	if (sev_snp_guest(svm->vcpu.kvm) && !ghcb_gpa_is_registered(svm, ghcb_gpa)) {
+		vcpu_unimpl(&svm->vcpu, "vmgexit: GHCB GPA [%#llx] is not registered.\n", ghcb_gpa);
+		return -EINVAL;
+	}
+
 	ret = sev_es_validate_vmgexit(svm);
 	if (ret)
 		return ret;
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index cbd4594f1cca..0c655a4d32d5 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -204,6 +204,8 @@ struct vcpu_sev_es_state {
 	u32 ghcb_sa_len;
 	bool ghcb_sa_sync;
 	bool ghcb_sa_free;
+
+	u64 ghcb_registered_gpa;
 };
 
 struct vcpu_svm {
@@ -336,6 +338,11 @@ static inline bool sev_snp_guest(struct kvm *kvm)
 	return sev_es_guest(kvm) && sev->snp_active;
 }
 
+static inline bool ghcb_gpa_is_registered(struct vcpu_svm *svm, u64 val)
+{
+	return svm->sev_es.ghcb_registered_gpa == val;
+}
+
 static inline void vmcb_mark_all_dirty(struct vmcb *vmcb)
 {
 	vmcb->control.clean = 0;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 147+ messages in thread

* [PATCH RFC v8 40/56] KVM: SVM: Add KVM_EXIT_VMGEXIT
  2023-02-20 18:37 [PATCH RFC v8 00/56] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (38 preceding siblings ...)
  2023-02-20 18:38 ` [PATCH RFC v8 39/56] KVM: SVM: Add support to handle GHCB GPA register VMGEXIT Michael Roth
@ 2023-02-20 18:38 ` Michael Roth
  2023-02-20 18:38 ` [PATCH RFC v8 41/56] KVM: SVM: Add support to handle MSR based Page State Change VMGEXIT Michael Roth
                   ` (17 subsequent siblings)
  57 siblings, 0 replies; 147+ messages in thread
From: Michael Roth @ 2023-02-20 18:38 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania

For private memslots, GHCB page state change requests will be forwarded
to userspace for processing. Define a new KVM_EXIT_VMGEXIT for exits of
this type.
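
A hedged sketch of the userspace side (not part of this patch;
handle_psc_request() is a hypothetical VMM helper that decodes the request
and updates memory attributes accordingly):

  #include <linux/kvm.h>

  int handle_psc_request(int vm_fd, __u64 ghcb_msr);  /* hypothetical helper */

  /* Called from the VMM's KVM_RUN loop after the ioctl returns. */
  static void vmm_handle_vmgexit(int vm_fd, struct kvm_run *run)
  {
          if (run->exit_reason != KVM_EXIT_VMGEXIT)
                  return;

          /* A non-zero 'error' tells KVM the request was not satisfied. */
          run->vmgexit.error = handle_psc_request(vm_fd, run->vmgexit.ghcb_msr) ? 1 : 0;
  }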

Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 include/uapi/linux/kvm.h | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 2bab08a5b5d7..6e684bf5f723 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -279,6 +279,7 @@ struct kvm_xen_exit {
 #define KVM_EXIT_RISCV_CSR        36
 #define KVM_EXIT_NOTIFY           37
 #define KVM_EXIT_MEMORY_FAULT     38
+#define KVM_EXIT_VMGEXIT          50
 
 /* For KVM_EXIT_INTERNAL_ERROR */
 /* Emulate instruction failed. */
@@ -527,6 +528,11 @@ struct kvm_run {
 			__u64 gpa;
 			__u64 size;
 		} memory;
+		/* KVM_EXIT_VMGEXIT */
+		struct {
+			__u64 ghcb_msr; /* GHCB MSR contents */
+			__u8 error; /* user -> kernel */
+		} vmgexit;
 		/* Fix the size of the union. */
 		char padding[256];
 	};
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 147+ messages in thread

* [PATCH RFC v8 41/56] KVM: SVM: Add support to handle MSR based Page State Change VMGEXIT
  2023-02-20 18:37 [PATCH RFC v8 00/56] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (39 preceding siblings ...)
  2023-02-20 18:38 ` [PATCH RFC v8 40/56] KVM: SVM: Add KVM_EXIT_VMGEXIT Michael Roth
@ 2023-02-20 18:38 ` Michael Roth
  2023-02-24 15:06   ` Zhi Wang
  2023-02-20 18:38 ` [PATCH RFC v8 42/56] KVM: SVM: Add support to handle " Michael Roth
                   ` (16 subsequent siblings)
  57 siblings, 1 reply; 147+ messages in thread
From: Michael Roth @ 2023-02-20 18:38 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	Brijesh Singh

From: Brijesh Singh <brijesh.singh@amd.com>

SEV-SNP VMs can ask the hypervisor to change the page state in the RMP
table to be private or shared using the Page State Change MSR protocol
as defined in the GHCB specification.

Forward these requests to userspace via KVM_EXIT_VMGEXIT so the VMM can
issue the KVM ioctls to update the page state accordingly.
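
For reference, decoding the MSR-based request in userspace is just a matter
of applying the masks added below (a sketch, not code from this patch):

  static void decode_psc_msr_req(u64 ghcb_msr, u64 *gfn, u8 *op)
  {
          *gfn = (ghcb_msr >> GHCB_MSR_PSC_GFN_POS) & GHCB_MSR_PSC_GFN_MASK;
          /* op is SNP_PAGE_STATE_PRIVATE or SNP_PAGE_STATE_SHARED */
          *op  = (ghcb_msr >> GHCB_MSR_PSC_OP_POS) & GHCB_MSR_PSC_OP_MASK;
  }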

Co-developed-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
---
 arch/x86/include/asm/sev-common.h |  9 ++++++++
 arch/x86/kvm/svm/sev.c            | 25 +++++++++++++++++++++++
 arch/x86/kvm/trace.h              | 34 +++++++++++++++++++++++++++++++
 arch/x86/kvm/x86.c                |  1 +
 4 files changed, 69 insertions(+)

diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
index 0a9055cdfae2..ee38f7408470 100644
--- a/arch/x86/include/asm/sev-common.h
+++ b/arch/x86/include/asm/sev-common.h
@@ -93,6 +93,10 @@ enum psc_op {
 };
 
 #define GHCB_MSR_PSC_REQ		0x014
+#define GHCB_MSR_PSC_GFN_POS		12
+#define GHCB_MSR_PSC_GFN_MASK		GENMASK_ULL(39, 0)
+#define GHCB_MSR_PSC_OP_POS		52
+#define GHCB_MSR_PSC_OP_MASK		0xf
 #define GHCB_MSR_PSC_REQ_GFN(gfn, op)			\
 	/* GHCBData[55:52] */				\
 	(((u64)((op) & 0xf) << 52) |			\
@@ -102,6 +106,11 @@ enum psc_op {
 	GHCB_MSR_PSC_REQ)
 
 #define GHCB_MSR_PSC_RESP		0x015
+#define GHCB_MSR_PSC_ERROR_POS		32
+#define GHCB_MSR_PSC_ERROR_MASK		GENMASK_ULL(31, 0)
+#define GHCB_MSR_PSC_ERROR		GENMASK_ULL(31, 0)
+#define GHCB_MSR_PSC_RSVD_POS		12
+#define GHCB_MSR_PSC_RSVD_MASK		GENMASK_ULL(19, 0)
 #define GHCB_MSR_PSC_RESP_VAL(val)			\
 	/* GHCBData[63:32] */				\
 	(((u64)(val) & GENMASK_ULL(63, 32)) >> 32)
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 2613311f4fcc..a1a2686dde7b 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -30,6 +30,7 @@
 #include "svm_ops.h"
 #include "cpuid.h"
 #include "trace.h"
+#include "mmu.h"
 
 #ifndef CONFIG_KVM_AMD_SEV
 /*
@@ -3345,6 +3346,23 @@ static void set_ghcb_msr(struct vcpu_svm *svm, u64 value)
 	svm->vmcb->control.ghcb_gpa = value;
 }
 
+/*
+ * TODO: need to get the value set by userspace in vcpu->run->vmgexit.ghcb_msr
+ * and process that here accordingly.
+ */
+static int snp_complete_psc_msr_protocol(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_svm *svm = to_svm(vcpu);
+
+	set_ghcb_msr_bits(svm, 0,
+			  GHCB_MSR_PSC_ERROR_MASK, GHCB_MSR_PSC_ERROR_POS);
+
+	set_ghcb_msr_bits(svm, 0, GHCB_MSR_PSC_RSVD_MASK, GHCB_MSR_PSC_RSVD_POS);
+	set_ghcb_msr_bits(svm, GHCB_MSR_PSC_RESP, GHCB_MSR_INFO_MASK, GHCB_MSR_INFO_POS);
+
+	return 1; /* resume */
+}
+
 static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
 {
 	struct vmcb_control_area *control = &svm->vmcb->control;
@@ -3445,6 +3463,13 @@ static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
 				  GHCB_MSR_INFO_POS);
 		break;
 	}
+	case GHCB_MSR_PSC_REQ:
+		vcpu->run->exit_reason = KVM_EXIT_VMGEXIT;
+		vcpu->run->vmgexit.ghcb_msr = control->ghcb_gpa;
+		vcpu->arch.complete_userspace_io = snp_complete_psc_msr_protocol;
+
+		ret = -1;
+		break;
 	case GHCB_MSR_TERM_REQ: {
 		u64 reason_set, reason_code;
 
diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
index 83843379813e..65861d2d086c 100644
--- a/arch/x86/kvm/trace.h
+++ b/arch/x86/kvm/trace.h
@@ -7,6 +7,7 @@
 #include <asm/svm.h>
 #include <asm/clocksource.h>
 #include <asm/pvclock-abi.h>
+#include <asm/sev-common.h>
 
 #undef TRACE_SYSTEM
 #define TRACE_SYSTEM kvm
@@ -1831,6 +1832,39 @@ TRACE_EVENT(kvm_vmgexit_msr_protocol_exit,
 		  __entry->vcpu_id, __entry->ghcb_gpa, __entry->result)
 );
 
+/*
+ * Tracepoint for the SEV-SNP page state change processing
+ */
+#define psc_operation					\
+	{SNP_PAGE_STATE_PRIVATE, "private"},		\
+	{SNP_PAGE_STATE_SHARED,  "shared"}		\
+
+TRACE_EVENT(kvm_snp_psc,
+	TP_PROTO(unsigned int vcpu_id, u64 pfn, u64 gpa, u8 op, int level),
+	TP_ARGS(vcpu_id, pfn, gpa, op, level),
+
+	TP_STRUCT__entry(
+		__field(int, vcpu_id)
+		__field(u64, pfn)
+		__field(u64, gpa)
+		__field(u8, op)
+		__field(int, level)
+	),
+
+	TP_fast_assign(
+		__entry->vcpu_id = vcpu_id;
+		__entry->pfn = pfn;
+		__entry->gpa = gpa;
+		__entry->op = op;
+		__entry->level = level;
+	),
+
+	TP_printk("vcpu %u, pfn %llx, gpa %llx, op %s, level %d",
+		  __entry->vcpu_id, __entry->pfn, __entry->gpa,
+		  __print_symbolic(__entry->op, psc_operation),
+		  __entry->level)
+);
+
 #endif /* _TRACE_KVM_H */
 
 #undef TRACE_INCLUDE_PATH
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 268c3d16894d..0154fc7a28c1 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -13515,6 +13515,7 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_vmgexit_enter);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_vmgexit_exit);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_vmgexit_msr_protocol_enter);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_vmgexit_msr_protocol_exit);
+EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_snp_psc);
 
 static int __init kvm_x86_init(void)
 {
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 147+ messages in thread

* [PATCH RFC v8 42/56] KVM: SVM: Add support to handle Page State Change VMGEXIT
  2023-02-20 18:37 [PATCH RFC v8 00/56] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (40 preceding siblings ...)
  2023-02-20 18:38 ` [PATCH RFC v8 41/56] KVM: SVM: Add support to handle MSR based Page State Change VMGEXIT Michael Roth
@ 2023-02-20 18:38 ` Michael Roth
  2023-02-20 18:38 ` [PATCH RFC v8 43/56] KVM: x86: Export the kvm_zap_gfn_range() for the SNP use Michael Roth
                   ` (15 subsequent siblings)
  57 siblings, 0 replies; 147+ messages in thread
From: Michael Roth @ 2023-02-20 18:38 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	Brijesh Singh

From: Brijesh Singh <brijesh.singh@amd.com>

SEV-SNP VMs can ask the hypervisor to change the page state in the RMP
table to be private or shared using the Page State Change NAE event
as defined in the GHCB specification version 2.

Forward these requests to userspace as KVM_EXIT_VMGEXITs, similar to how
it is done for requests that don't use a GHCB page.

Co-developed-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
---
 arch/x86/include/asm/sev-common.h |  7 +++++++
 arch/x86/kvm/svm/sev.c            | 20 ++++++++++++++++++++
 2 files changed, 27 insertions(+)

diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
index ee38f7408470..1b111cde8c82 100644
--- a/arch/x86/include/asm/sev-common.h
+++ b/arch/x86/include/asm/sev-common.h
@@ -130,6 +130,13 @@ enum psc_op {
 /* SNP Page State Change NAE event */
 #define VMGEXIT_PSC_MAX_ENTRY		253
 
+/* The page state change hdr structure is not valid */
+#define PSC_INVALID_HDR			1
+/* The hdr.cur_entry or hdr.end_entry is not valid */
+#define PSC_INVALID_ENTRY		2
+/* Page state change encountered undefined error */
+#define PSC_UNDEF_ERR			3
+
 struct psc_hdr {
 	u16 cur_entry;
 	u16 end_entry;
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index a1a2686dde7b..102966c43e28 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3152,6 +3152,7 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
 	case SVM_VMGEXIT_AP_JUMP_TABLE:
 	case SVM_VMGEXIT_UNSUPPORTED_EVENT:
 	case SVM_VMGEXIT_HV_FEATURES:
+	case SVM_VMGEXIT_PSC:
 		break;
 	default:
 		reason = GHCB_ERR_INVALID_EVENT;
@@ -3363,6 +3364,19 @@ static int snp_complete_psc_msr_protocol(struct kvm_vcpu *vcpu)
 	return 1; /* resume */
 }
 
+/*
+ * TODO: need to process the GHCB contents and report the proper error code
+ * instead of assuming success.
+ */
+static int snp_complete_psc(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_svm *svm = to_svm(vcpu);
+
+	ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, 0);
+
+	return 1;
+}
+
 static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
 {
 	struct vmcb_control_area *control = &svm->vmcb->control;
@@ -3606,6 +3620,12 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
 		ret = 1;
 		break;
 	}
+	case SVM_VMGEXIT_PSC:
+		/* Let userspace handle allocating/deallocating backing pages. */
+		vcpu->run->exit_reason = KVM_EXIT_VMGEXIT;
+		vcpu->run->vmgexit.ghcb_msr = ghcb_gpa;
+		vcpu->arch.complete_userspace_io = snp_complete_psc;
+		break;
 	case SVM_VMGEXIT_UNSUPPORTED_EVENT:
 		vcpu_unimpl(vcpu,
 			    "vmgexit: unsupported event - exit_info_1=%#llx, exit_info_2=%#llx\n",
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 147+ messages in thread

* [PATCH RFC v8 43/56] KVM: x86: Export the kvm_zap_gfn_range() for the SNP use
  2023-02-20 18:37 [PATCH RFC v8 00/56] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (41 preceding siblings ...)
  2023-02-20 18:38 ` [PATCH RFC v8 42/56] KVM: SVM: Add support to handle " Michael Roth
@ 2023-02-20 18:38 ` Michael Roth
  2023-02-20 18:38 ` [PATCH RFC v8 44/56] KVM: SVM: Add support to handle the RMP nested page fault Michael Roth
                   ` (14 subsequent siblings)
  57 siblings, 0 replies; 147+ messages in thread
From: Michael Roth @ 2023-02-20 18:38 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	Brijesh Singh

From: Brijesh Singh <brijesh.singh@amd.com>

While resolving the RMP page fault, there may be cases where the page
level between the RMP entry and TDP does not match and the 2M RMP entry
must be split into 4K RMP entries, or a 2M TDP page may need to be
broken into multiple 4K pages.

To keep the RMP and TDP page levels in sync, zap the gfn range after
splitting the pages in the RMP entry. The zap should force the TDP to
be rebuilt with the new page level.
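
The intended flow, as a hedged sketch (not the code added by this patch;
split_rmp_2m_to_4k() is a made-up name for whatever helper performs the
actual RMP split):

  int split_rmp_2m_to_4k(kvm_pfn_t pfn);  /* hypothetical helper */

  static int sync_npt_after_rmp_split(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn)
  {
          int ret;

          /* Break the 2M RMP entry covering pfn into 512 4K entries. */
          ret = split_rmp_2m_to_4k(pfn);
          if (ret)
                  return ret;

          /*
           * Zap the 2M-aligned GFN range so the NPT mapping is rebuilt
           * at the new (4K) level on the next fault.
           */
          gfn &= ~(KVM_PAGES_PER_HPAGE(PG_LEVEL_2M) - 1);
          kvm_zap_gfn_range(kvm, gfn, gfn + KVM_PAGES_PER_HPAGE(PG_LEVEL_2M));

          return 0;
  }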

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 arch/x86/include/asm/kvm_host.h | 2 ++
 arch/x86/kvm/mmu.h              | 2 --
 arch/x86/kvm/mmu/mmu.c          | 1 +
 3 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index d2e1c109dde5..28b01cc7f64d 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1845,6 +1845,8 @@ void kvm_mmu_slot_leaf_clear_dirty(struct kvm *kvm,
 void kvm_mmu_zap_all(struct kvm *kvm);
 void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm, u64 gen);
 void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned long kvm_nr_mmu_pages);
+void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end);
+
 
 int load_pdptrs(struct kvm_vcpu *vcpu, unsigned long cr3);
 
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 168c46fd8dd1..0afaf8ff2bb8 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -211,8 +211,6 @@ static inline u8 permission_fault(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
 	return -(u32)fault & errcode;
 }
 
-void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end);
-
 int kvm_arch_write_log_dirty(struct kvm_vcpu *vcpu);
 
 int kvm_mmu_post_init_vm(struct kvm *kvm);
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index d8e5254f314d..d7847af3e177 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -6615,6 +6615,7 @@ static bool kvm_mmu_zap_collapsible_spte(struct kvm *kvm,
 
 	return need_tlb_flush;
 }
+EXPORT_SYMBOL_GPL(kvm_zap_gfn_range);
 
 static void kvm_rmap_zap_collapsible_sptes(struct kvm *kvm,
 					   const struct kvm_memory_slot *slot)
-- 
2.25.1



* [PATCH RFC v8 44/56] KVM: SVM: Add support to handle the RMP nested page fault
  2023-02-20 18:37 [PATCH RFC v8 00/56] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (42 preceding siblings ...)
  2023-02-20 18:38 ` [PATCH RFC v8 43/56] KVM: x86: Export the kvm_zap_gfn_range() for the SNP use Michael Roth
@ 2023-02-20 18:38 ` Michael Roth
  2023-02-28 19:11   ` Zhi Wang
  2023-02-20 18:38 ` [PATCH RFC v8 45/56] KVM: SVM: Provide support for SNP_GUEST_REQUEST NAE event Michael Roth
                   ` (13 subsequent siblings)
  57 siblings, 1 reply; 147+ messages in thread
From: Michael Roth @ 2023-02-20 18:38 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	Brijesh Singh

From: Brijesh Singh <brijesh.singh@amd.com>

When SEV-SNP is enabled in the guest, the hardware places restrictions
on all memory accesses based on the contents of the RMP table. When the
hardware encounters an RMP check failure caused by a guest memory
access, it raises a #NPF. The error code contains additional information
on the access type. See APM Volume 2 for additional information.

Page state changes are handled by userspace, so if an RMP fault is
triggered as a result of an RMP NPT fault, exit to userspace just like
with explicit page-state change requests.

RMP NPT faults can also occur if the guest pvalidates a 2M page as 4K,
in which case the RMP entries need to be PSMASH'd. Handle this case
immediately in the kernel.
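
In condensed form, the dispatch described above keys off two #NPF error
code bits (a sketch; the helper name is hypothetical, the PFERR_* bits are
the ones used by this patch):

  static bool rmp_fault_needs_psmash(u64 error_code)
  {
  	/* Not an RMP-induced #NPF at all */
  	if (!(error_code & PFERR_GUEST_RMP_MASK))
  		return false;

  	/*
  	 * A size mismatch means the guest PVALIDATE'd at 4K while the RMP
  	 * still holds a 2M entry: split it in the kernel. Anything else is
  	 * an implicit page-state change and is punted to userspace.
  	 */
  	return !!(error_code & PFERR_GUEST_SIZEM_MASK);
  }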

Co-developed-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
---
 arch/x86/kvm/svm/sev.c | 84 ++++++++++++++++++++++++++++++++++++++++++
 arch/x86/kvm/svm/svm.c | 21 +++++++++--
 arch/x86/kvm/svm/svm.h |  1 +
 3 files changed, 102 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 102966c43e28..197b1f904567 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3347,6 +3347,13 @@ static void set_ghcb_msr(struct vcpu_svm *svm, u64 value)
 	svm->vmcb->control.ghcb_gpa = value;
 }
 
+static int snp_rmptable_psmash(struct kvm *kvm, kvm_pfn_t pfn)
+{
+	pfn = pfn & ~(KVM_PAGES_PER_HPAGE(PG_LEVEL_2M) - 1);
+
+	return psmash(pfn);
+}
+
 /*
  * TODO: need to get the value set by userspace in vcpu->run->vmgexit.ghcb_msr
  * and process that here accordingly.
@@ -3872,3 +3879,80 @@ void sev_adjust_mapping_level(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn, int *le
 	pr_debug("%s: GFN: 0x%llx, PFN: 0x%llx, level: %d, rmp_level: %d, level_orig: %d, assigned: %d\n",
 		 __func__, gfn, pfn, *level, rmp_level, level_orig, assigned);
 }
+
+void handle_rmp_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code)
+{
+	int order, rmp_level, assigned, ret;
+	struct kvm_memory_slot *slot;
+	struct kvm *kvm = vcpu->kvm;
+	kvm_pfn_t pfn;
+	gfn_t gfn;
+
+	/*
+	 * Private memslots punt handling of implicit page state changes to
+	 * userspace, so the only RMP faults expected here are for
+	 * PFERR_GUEST_SIZEM_MASK. Anything else suggests that the RMP table
+	 * has gotten out of sync with the private memslot.
+	 *
+	 * TODO: However, this case has also been noticed when an access occurs
+	 * to an NPT mapping that has just been split/PSMASHED, in which case
+	 * PFERR_GUEST_SIZEM_MASK might not be set. In those cases it should be
+	 * safe to ignore and let the guest retry, but log these just in case
+	 * for now.
+	 */
+	if (!(error_code & PFERR_GUEST_SIZEM_MASK)) {
+		pr_warn_ratelimited("Unexpected RMP fault for GPA 0x%llx, error_code 0x%llx",
+				    gpa, error_code);
+		return;
+	}
+
+	gfn = gpa >> PAGE_SHIFT;
+
+	/*
+	 * Only RMPADJUST/PVALIDATE should cause PFERR_GUEST_SIZEM.
+	 *
+	 * For PVALIDATE, this should only happen if a guest PVALIDATEs a 4K GFN
+	 * that is backed by a huge page in the host whose RMP entry has the
+	 * hugepage/assigned bits set. With UPM, that should only ever happen
+	 * for private pages.
+	 *
+	 * For RMPADJUST, this assumption might not hold, in which case handling
+	 * for obtaining the PFN from HVA-backed memory may be needed. For now,
+	 * just print warnings.
+	 */
+	if (!kvm_mem_is_private(kvm, gfn)) {
+		pr_warn_ratelimited("Unexpected RMP fault, size-mismatch for non-private GPA 0x%llx\n",
+				    gpa);
+		return;
+	}
+
+	slot = gfn_to_memslot(kvm, gfn);
+	if (!kvm_slot_can_be_private(slot)) {
+		pr_warn_ratelimited("Unexpected RMP fault, non-private slot for GPA 0x%llx\n",
+				    gpa);
+		return;
+	}
+
+	ret = kvm_restrictedmem_get_pfn(slot, gfn, &pfn, &order);
+	if (ret) {
+		pr_warn_ratelimited("Unexpected RMP fault, no private backing page for GPA 0x%llx\n",
+				    gpa);
+		return;
+	}
+
+	assigned = snp_lookup_rmpentry(pfn, &rmp_level);
+	if (assigned != 1) {
+		pr_warn_ratelimited("Unexpected RMP fault, no assigned RMP entry for GPA 0x%llx\n",
+				    gpa);
+		goto out;
+	}
+
+	ret = snp_rmptable_psmash(kvm, pfn);
+	if (ret)
+		pr_err_ratelimited("Unable to split RMP entries for GPA 0x%llx PFN 0x%llx ret %d\n",
+				   gpa, pfn, ret);
+
+out:
+	kvm_zap_gfn_range(kvm, gfn, gfn + PTRS_PER_PMD);
+	put_page(pfn_to_page(pfn));
+}
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 9eb750c8b04c..f9ab4bf6d245 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1976,15 +1976,28 @@ static int pf_interception(struct kvm_vcpu *vcpu)
 static int npf_interception(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
+	int rc;
 
 	u64 fault_address = svm->vmcb->control.exit_info_2;
 	u64 error_code = svm->vmcb->control.exit_info_1;
 
 	trace_kvm_page_fault(vcpu, fault_address, error_code);
-	return kvm_mmu_page_fault(vcpu, fault_address, error_code,
-			static_cpu_has(X86_FEATURE_DECODEASSISTS) ?
-			svm->vmcb->control.insn_bytes : NULL,
-			svm->vmcb->control.insn_len);
+	rc = kvm_mmu_page_fault(vcpu, fault_address, error_code,
+				static_cpu_has(X86_FEATURE_DECODEASSISTS) ?
+				svm->vmcb->control.insn_bytes : NULL,
+				svm->vmcb->control.insn_len);
+
+	/*
+	 * rc == 0 indicates a userspace exit is needed to handle page
+	 * transitions, so do that first before updating the RMP table.
+	 */
+	if (error_code & PFERR_GUEST_RMP_MASK) {
+		if (rc == 0)
+			return rc;
+		handle_rmp_page_fault(vcpu, fault_address, error_code);
+	}
+
+	return rc;
 }
 
 static int db_interception(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 0c655a4d32d5..13b00233b315 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -714,6 +714,7 @@ void sev_es_prepare_switch_to_guest(struct sev_es_save_area *hostsa);
 void sev_es_unmap_ghcb(struct vcpu_svm *svm);
 struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu);
 void sev_adjust_mapping_level(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn, int *level);
+void handle_rmp_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code);
 
 /* vmenter.S */
 
-- 
2.25.1



* [PATCH RFC v8 45/56] KVM: SVM: Provide support for SNP_GUEST_REQUEST NAE event
  2023-02-20 18:37 [PATCH RFC v8 00/56] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (43 preceding siblings ...)
  2023-02-20 18:38 ` [PATCH RFC v8 44/56] KVM: SVM: Add support to handle the RMP nested page fault Michael Roth
@ 2023-02-20 18:38 ` Michael Roth
  2023-02-24 11:01   ` Alexander Graf
                     ` (2 more replies)
  2023-02-20 18:38 ` [PATCH RFC v8 46/56] KVM: SVM: Use a VMSA physical address variable for populating VMCB Michael Roth
                   ` (12 subsequent siblings)
  57 siblings, 3 replies; 147+ messages in thread
From: Michael Roth @ 2023-02-20 18:38 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	Brijesh Singh

From: Brijesh Singh <brijesh.singh@amd.com>

Version 2 of the GHCB specification added support for two SNP Guest
Request Message NAE events. These events allow an SEV-SNP guest to make
requests to the SEV-SNP firmware through the hypervisor using the
SNP_GUEST_REQUEST API defined in the SEV-SNP firmware specification.

The SNP_EXT_GUEST_REQUEST is similar to SNP_GUEST_REQUEST, with the
difference being an additional certificate blob that can be passed
through the SNP_SET_CONFIG ioctl defined in the CCP driver. The CCP
driver provides snp_guest_ext_guest_request(), which is used by KVM to
get both the report and certificate data at once.
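
For reference, the extended request follows a simple retry convention on
the guest side: the certificate buffer GPA is passed in RAX, its length in
4K pages in RBX, and on SNP_GUEST_REQ_INVALID_LEN the hypervisor writes
the required page count back into RBX. A hedged guest-side sketch (the
helper names are hypothetical, not APIs from this series):

  unsigned long npages = certs_size >> PAGE_SHIFT;
  int rc;

  rc = issue_ext_guest_request(req_gpa, resp_gpa, certs_gpa, &npages);
  if (rc == SNP_GUEST_REQ_INVALID_LEN) {
  	/* npages now holds the required certificate buffer size in 4K pages */
  	certs_gpa = alloc_shared_pages(npages << PAGE_SHIFT);
  	rc = issue_ext_guest_request(req_gpa, resp_gpa, certs_gpa, &npages);
  }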

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 arch/x86/kvm/svm/sev.c | 185 +++++++++++++++++++++++++++++++++++++++--
 arch/x86/kvm/svm/svm.h |   2 +
 2 files changed, 181 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 197b1f904567..92179614102e 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -327,6 +327,7 @@ static int sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp)
 		if (ret)
 			goto e_free;
 
+		mutex_init(&sev->guest_req_lock);
 		ret = sev_snp_init(&argp->error, false);
 	} else {
 		ret = sev_platform_init(&argp->error);
@@ -2059,23 +2060,34 @@ int sev_vm_move_enc_context_from(struct kvm *kvm, unsigned int source_fd)
  */
 static void *snp_context_create(struct kvm *kvm, struct kvm_sev_cmd *argp)
 {
+	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
 	struct sev_data_snp_addr data = {};
-	void *context;
+	void *context, *certs_data;
 	int rc;
 
+	/* Allocate memory used for the certs data in SNP guest request */
+	certs_data = kzalloc(SEV_FW_BLOB_MAX_SIZE, GFP_KERNEL_ACCOUNT);
+	if (!certs_data)
+		return NULL;
+
 	/* Allocate memory for context page */
 	context = snp_alloc_firmware_page(GFP_KERNEL_ACCOUNT);
 	if (!context)
-		return NULL;
+		goto e_free;
 
 	data.gctx_paddr = __psp_pa(context);
 	rc = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_GCTX_CREATE, &data, &argp->error);
-	if (rc) {
-		snp_free_firmware_page(context);
-		return NULL;
-	}
+	if (rc)
+		goto e_free;
+
+	sev->snp_certs_data = certs_data;
 
 	return context;
+
+e_free:
+	snp_free_firmware_page(context);
+	kfree(certs_data);
+	return NULL;
 }
 
 static int snp_bind_asid(struct kvm *kvm, int *error)
@@ -2693,6 +2705,8 @@ static int snp_decommission_context(struct kvm *kvm)
 	snp_free_firmware_page(sev->snp_context);
 	sev->snp_context = NULL;
 
+	kfree(sev->snp_certs_data);
+
 	return 0;
 }
 
@@ -3153,6 +3167,8 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
 	case SVM_VMGEXIT_UNSUPPORTED_EVENT:
 	case SVM_VMGEXIT_HV_FEATURES:
 	case SVM_VMGEXIT_PSC:
+	case SVM_VMGEXIT_GUEST_REQUEST:
+	case SVM_VMGEXIT_EXT_GUEST_REQUEST:
 		break;
 	default:
 		reason = GHCB_ERR_INVALID_EVENT;
@@ -3384,6 +3400,149 @@ static int snp_complete_psc(struct kvm_vcpu *vcpu)
 	return 1;
 }
 
+static unsigned long snp_setup_guest_buf(struct vcpu_svm *svm,
+					 struct sev_data_snp_guest_request *data,
+					 gpa_t req_gpa, gpa_t resp_gpa)
+{
+	struct kvm_vcpu *vcpu = &svm->vcpu;
+	struct kvm *kvm = vcpu->kvm;
+	kvm_pfn_t req_pfn, resp_pfn;
+	struct kvm_sev_info *sev;
+
+	sev = &to_kvm_svm(kvm)->sev_info;
+
+	if (!IS_ALIGNED(req_gpa, PAGE_SIZE) || !IS_ALIGNED(resp_gpa, PAGE_SIZE))
+		return SEV_RET_INVALID_PARAM;
+
+	req_pfn = gfn_to_pfn(kvm, gpa_to_gfn(req_gpa));
+	if (is_error_noslot_pfn(req_pfn))
+		return SEV_RET_INVALID_ADDRESS;
+
+	resp_pfn = gfn_to_pfn(kvm, gpa_to_gfn(resp_gpa));
+	if (is_error_noslot_pfn(resp_pfn))
+		return SEV_RET_INVALID_ADDRESS;
+
+	if (rmp_make_private(resp_pfn, 0, PG_LEVEL_4K, 0, true))
+		return SEV_RET_INVALID_ADDRESS;
+
+	data->gctx_paddr = __psp_pa(sev->snp_context);
+	data->req_paddr = __sme_set(req_pfn << PAGE_SHIFT);
+	data->res_paddr = __sme_set(resp_pfn << PAGE_SHIFT);
+
+	return 0;
+}
+
+static void snp_cleanup_guest_buf(struct sev_data_snp_guest_request *data, unsigned long *rc)
+{
+	u64 pfn = __sme_clr(data->res_paddr) >> PAGE_SHIFT;
+	int ret;
+
+	ret = snp_page_reclaim(pfn);
+	if (ret)
+		*rc = SEV_RET_INVALID_ADDRESS;
+
+	ret = rmp_make_shared(pfn, PG_LEVEL_4K);
+	if (ret)
+		*rc = SEV_RET_INVALID_ADDRESS;
+}
+
+static void snp_handle_guest_request(struct vcpu_svm *svm, gpa_t req_gpa, gpa_t resp_gpa)
+{
+	struct sev_data_snp_guest_request data = {0};
+	struct kvm_vcpu *vcpu = &svm->vcpu;
+	struct kvm *kvm = vcpu->kvm;
+	struct kvm_sev_info *sev;
+	unsigned long rc;
+	int err;
+
+	if (!sev_snp_guest(vcpu->kvm)) {
+		rc = SEV_RET_INVALID_GUEST;
+		goto e_fail;
+	}
+
+	sev = &to_kvm_svm(kvm)->sev_info;
+
+	mutex_lock(&sev->guest_req_lock);
+
+	rc = snp_setup_guest_buf(svm, &data, req_gpa, resp_gpa);
+	if (rc)
+		goto unlock;
+
+	rc = sev_issue_cmd(kvm, SEV_CMD_SNP_GUEST_REQUEST, &data, &err);
+	if (rc)
+		/* use the firmware error code */
+		rc = err;
+
+	snp_cleanup_guest_buf(&data, &rc);
+
+unlock:
+	mutex_unlock(&sev->guest_req_lock);
+
+e_fail:
+	ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, rc);
+}
+
+static void snp_handle_ext_guest_request(struct vcpu_svm *svm, gpa_t req_gpa, gpa_t resp_gpa)
+{
+	struct sev_data_snp_guest_request req = {0};
+	struct kvm_vcpu *vcpu = &svm->vcpu;
+	struct kvm *kvm = vcpu->kvm;
+	unsigned long data_npages;
+	struct kvm_sev_info *sev;
+	unsigned long rc, err;
+	u64 data_gpa;
+
+	if (!sev_snp_guest(vcpu->kvm)) {
+		rc = SEV_RET_INVALID_GUEST;
+		goto e_fail;
+	}
+
+	sev = &to_kvm_svm(kvm)->sev_info;
+
+	data_gpa = vcpu->arch.regs[VCPU_REGS_RAX];
+	data_npages = vcpu->arch.regs[VCPU_REGS_RBX];
+
+	if (!IS_ALIGNED(data_gpa, PAGE_SIZE)) {
+		rc = SEV_RET_INVALID_ADDRESS;
+		goto e_fail;
+	}
+
+	mutex_lock(&sev->guest_req_lock);
+
+	rc = snp_setup_guest_buf(svm, &req, req_gpa, resp_gpa);
+	if (rc)
+		goto unlock;
+
+	rc = snp_guest_ext_guest_request(&req, (unsigned long)sev->snp_certs_data,
+					 &data_npages, &err);
+	if (rc) {
+		/*
+		 * If the buffer length is too small, return the expected
+		 * length in RBX.
+		 */
+		if (err == SNP_GUEST_REQ_INVALID_LEN)
+			vcpu->arch.regs[VCPU_REGS_RBX] = data_npages;
+
+		/* pass the firmware error code */
+		rc = err;
+		goto cleanup;
+	}
+
+	/* Copy the certificate blob in the guest memory */
+	if (data_npages &&
+	    kvm_write_guest(kvm, data_gpa, sev->snp_certs_data, data_npages << PAGE_SHIFT))
+		rc = SEV_RET_INVALID_ADDRESS;
+
+cleanup:
+	snp_cleanup_guest_buf(&req, &rc);
+
+unlock:
+	mutex_unlock(&sev->guest_req_lock);
+
+e_fail:
+	ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, rc);
+}
+
 static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
 {
 	struct vmcb_control_area *control = &svm->vmcb->control;
@@ -3633,6 +3792,20 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
 		vcpu->run->vmgexit.ghcb_msr = ghcb_gpa;
 		vcpu->arch.complete_userspace_io = snp_complete_psc;
 		break;
+	case SVM_VMGEXIT_GUEST_REQUEST: {
+		snp_handle_guest_request(svm, control->exit_info_1, control->exit_info_2);
+
+		ret = 1;
+		break;
+	}
+	case SVM_VMGEXIT_EXT_GUEST_REQUEST: {
+		snp_handle_ext_guest_request(svm,
+					     control->exit_info_1,
+					     control->exit_info_2);
+
+		ret = 1;
+		break;
+	}
 	case SVM_VMGEXIT_UNSUPPORTED_EVENT:
 		vcpu_unimpl(vcpu,
 			    "vmgexit: unsupported event - exit_info_1=%#llx, exit_info_2=%#llx\n",
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 13b00233b315..4a9ffb7e5139 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -93,6 +93,8 @@ struct kvm_sev_info {
 	atomic_t migration_in_progress;
 	u64 snp_init_flags;
 	void *snp_context;      /* SNP guest context page */
+	void *snp_certs_data;
+	struct mutex guest_req_lock; /* Lock for guest request handling */
 };
 
 struct kvm_svm {
-- 
2.25.1



* [PATCH RFC v8 46/56] KVM: SVM: Use a VMSA physical address variable for populating VMCB
  2023-02-20 18:37 [PATCH RFC v8 00/56] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (44 preceding siblings ...)
  2023-02-20 18:38 ` [PATCH RFC v8 45/56] KVM: SVM: Provide support for SNP_GUEST_REQUEST NAE event Michael Roth
@ 2023-02-20 18:38 ` Michael Roth
  2023-02-20 18:38 ` [PATCH RFC v8 47/56] KVM: SVM: Support SEV-SNP AP Creation NAE event Michael Roth
                   ` (11 subsequent siblings)
  57 siblings, 0 replies; 147+ messages in thread
From: Michael Roth @ 2023-02-20 18:38 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania

From: Tom Lendacky <thomas.lendacky@amd.com>

In preparation to support SEV-SNP AP Creation, use a variable that holds
the VMSA physical address rather than converting the virtual address.
This will allow SEV-SNP AP Creation to set the new physical address that
will be used should the vCPU reset path be taken.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 arch/x86/kvm/svm/sev.c | 5 ++---
 arch/x86/kvm/svm/svm.c | 9 ++++++++-
 arch/x86/kvm/svm/svm.h | 1 +
 3 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 92179614102e..6bec2712ecc6 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3849,10 +3849,9 @@ static void sev_es_init_vmcb(struct vcpu_svm *svm)
 
 	/*
 	 * An SEV-ES guest requires a VMSA area that is a separate from the
-	 * VMCB page. Do not include the encryption mask on the VMSA physical
-	 * address since hardware will access it using the guest key.
+	 * VMCB page.
 	 */
-	svm->vmcb->control.vmsa_pa = __pa(svm->sev_es.vmsa);
+	svm->vmcb->control.vmsa_pa = svm->sev_es.vmsa_pa;
 
 	/* Can't intercept CR register access, HV can't modify CR registers */
 	svm_clr_intercept(svm, INTERCEPT_CR0_READ);
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index f9ab4bf6d245..745f736d9c98 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1410,9 +1410,16 @@ static int svm_vcpu_create(struct kvm_vcpu *vcpu)
 	svm->vmcb01.pa = __sme_set(page_to_pfn(vmcb01_page) << PAGE_SHIFT);
 	svm_switch_vmcb(svm, &svm->vmcb01);
 
-	if (vmsa_page)
+	if (vmsa_page) {
 		svm->sev_es.vmsa = page_address(vmsa_page);
 
+		/*
+		 * Do not include the encryption mask on the VMSA physical
+		 * address since hardware will access it using the guest key.
+		 */
+		svm->sev_es.vmsa_pa = __pa(svm->sev_es.vmsa);
+	}
+
 	svm->guest_state_loaded = false;
 
 	return 0;
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 4a9ffb7e5139..b6ca6657aa6c 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -198,6 +198,7 @@ struct vcpu_sev_es_state {
 	struct sev_es_save_area *vmsa;
 	struct ghcb *ghcb;
 	struct kvm_host_map ghcb_map;
+	hpa_t vmsa_pa;
 	bool received_first_sipi;
 	unsigned int ap_reset_hold_type;
 
-- 
2.25.1



* [PATCH RFC v8 47/56] KVM: SVM: Support SEV-SNP AP Creation NAE event
  2023-02-20 18:37 [PATCH RFC v8 00/56] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (45 preceding siblings ...)
  2023-02-20 18:38 ` [PATCH RFC v8 46/56] KVM: SVM: Use a VMSA physical address variable for populating VMCB Michael Roth
@ 2023-02-20 18:38 ` Michael Roth
  2023-02-24 12:37   ` Alexander Graf
  2023-02-20 18:38 ` [PATCH RFC v8 48/56] KVM: SVM: Add SNP-specific handling for memory attribute updates Michael Roth
                   ` (10 subsequent siblings)
  57 siblings, 1 reply; 147+ messages in thread
From: Michael Roth @ 2023-02-20 18:38 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	Brijesh Singh

From: Tom Lendacky <thomas.lendacky@amd.com>

Add support for the SEV-SNP AP Creation NAE event. This allows SEV-SNP
guests to alter the register state of the APs on their own. This gives
the guest a way of simulating INIT-SIPI.

A new event, KVM_REQ_UPDATE_PROTECTED_GUEST_STATE, is created and used
so as to avoid updating the VMSA pointer while the vCPU is running.

For CREATE:
  The guest supplies the GPA of the VMSA to be used for the vCPU with
  the specified APIC ID. The GPA is saved in the svm struct of the
  target vCPU, the KVM_REQ_UPDATE_PROTECTED_GUEST_STATE event is added
  to the vCPU and then the vCPU is kicked.

For CREATE_ON_INIT:
  The guest supplies the GPA of the VMSA to be used for the vCPU with
  the specified APIC ID the next time an INIT is performed. The GPA is
  saved in the svm struct of the target vCPU.

For DESTROY:
  The guest indicates it wishes to stop the vCPU. The GPA is cleared
  from the svm struct, the KVM_REQ_UPDATE_PROTECTED_GUEST_STATE event is
  added to vCPU and then the vCPU is kicked.

The KVM_REQ_UPDATE_PROTECTED_GUEST_STATE event handler will be invoked
as a result of the event or as a result of an INIT. The handler sets the
vCPU to the KVM_MP_STATE_UNINITIALIZED state, so that any errors will
leave the vCPU as not runnable. Any previous VMSA pages that were
installed as part of an SEV-SNP AP Creation NAE event are un-pinned. If
a new VMSA is to be installed, the VMSA guest page is pinned and set as
the VMSA in the vCPU VMCB and the vCPU state is set to
KVM_MP_STATE_RUNNABLE. If a new VMSA is not to be installed, the VMSA is
cleared in the vCPU VMCB and the vCPU state is left as
KVM_MP_STATE_UNINITIALIZED to prevent it from being run.
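
For reference, the AP Creation parameters arrive in the following GHCB
fields (an illustrative decode matching the handler added below, not
additional code):

  request      = lower_32_bits(control->exit_info_1); /* CREATE / CREATE_ON_INIT / DESTROY */
  apic_id      = upper_32_bits(control->exit_info_1); /* APIC ID of the target vCPU */
  vmsa_gpa     = control->exit_info_2;                /* GPA of the new VMSA (CREATE*) */
  sev_features = vcpu->arch.regs[VCPU_REGS_RAX];      /* interrupt-injection mode bits must
                                                          match the BSP's SEV_FEATURES */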

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
[mdr: add handling for restrictedmem]
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 arch/x86/include/asm/kvm_host.h |   1 +
 arch/x86/include/asm/svm.h      |   7 +-
 arch/x86/kvm/svm/sev.c          | 245 ++++++++++++++++++++++++++++++++
 arch/x86/kvm/svm/svm.c          |   3 +
 arch/x86/kvm/svm/svm.h          |   7 +
 arch/x86/kvm/x86.c              |   9 ++
 6 files changed, 271 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 28b01cc7f64d..09b36462582c 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -113,6 +113,7 @@
 	KVM_ARCH_REQ_FLAGS(31, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
 #define KVM_REQ_HV_TLB_FLUSH \
 	KVM_ARCH_REQ_FLAGS(32, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
+#define KVM_REQ_UPDATE_PROTECTED_GUEST_STATE	KVM_ARCH_REQ(34)
 
 #define CR0_RESERVED_BITS                                               \
 	(~(unsigned long)(X86_CR0_PE | X86_CR0_MP | X86_CR0_EM | X86_CR0_TS \
diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index c18d78d5e505..e76ad26ba64f 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -278,7 +278,12 @@ enum avic_ipi_failure_cause {
 #define AVIC_HPA_MASK	~((0xFFFULL << 52) | 0xFFF)
 #define VMCB_AVIC_APIC_BAR_MASK		0xFFFFFFFFFF000ULL
 
-#define SVM_SEV_FEAT_SNP_ACTIVE		BIT(0)
+#define SVM_SEV_FEAT_SNP_ACTIVE			BIT(0)
+#define SVM_SEV_FEAT_RESTRICTED_INJECTION	BIT(3)
+#define SVM_SEV_FEAT_ALTERNATE_INJECTION	BIT(4)
+#define SVM_SEV_FEAT_INT_INJ_MODES		\
+	(SVM_SEV_FEAT_RESTRICTED_INJECTION |	\
+	 SVM_SEV_FEAT_ALTERNATE_INJECTION)
 
 struct vmcb_seg {
 	u16 selector;
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 6bec2712ecc6..b2f1a12685ed 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -779,6 +779,7 @@ static int sev_launch_update_data(struct kvm *kvm, struct kvm_sev_cmd *argp)
 
 static int sev_es_sync_vmsa(struct vcpu_svm *svm)
 {
+	struct kvm_sev_info *sev = &to_kvm_svm(svm->vcpu.kvm)->sev_info;
 	struct sev_es_save_area *save = svm->sev_es.vmsa;
 
 	/* Check some debug related fields before encrypting the VMSA */
@@ -824,6 +825,12 @@ static int sev_es_sync_vmsa(struct vcpu_svm *svm)
 	if (sev_snp_guest(svm->vcpu.kvm))
 		save->sev_features |= SVM_SEV_FEAT_SNP_ACTIVE;
 
+	/*
+	 * Save the VMSA synced SEV features. For now, they are the same for
+	 * all vCPUs, so just save each time.
+	 */
+	sev->sev_features = save->sev_features;
+
 	pr_debug("Virtual Machine Save Area (VMSA):\n");
 	print_hex_dump_debug("", DUMP_PREFIX_NONE, 16, 1, save, sizeof(*save), false);
 
@@ -3161,6 +3168,10 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
 		if (!ghcb_sw_scratch_is_valid(ghcb))
 			goto vmgexit_err;
 		break;
+	case SVM_VMGEXIT_AP_CREATION:
+		if (!ghcb_rax_is_valid(ghcb))
+			goto vmgexit_err;
+		break;
 	case SVM_VMGEXIT_NMI_COMPLETE:
 	case SVM_VMGEXIT_AP_HLT_LOOP:
 	case SVM_VMGEXIT_AP_JUMP_TABLE:
@@ -3543,6 +3554,226 @@ static void snp_handle_ext_guest_request(struct vcpu_svm *svm, gpa_t req_gpa, gp
 	ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, rc);
 }
 
+static kvm_pfn_t gfn_to_pfn_restricted(struct kvm *kvm, gfn_t gfn)
+{
+	struct kvm_memory_slot *slot;
+	kvm_pfn_t pfn;
+	int order = 0;
+
+	slot = gfn_to_memslot(kvm, gfn);
+	if (!kvm_slot_can_be_private(slot)) {
+		pr_err("SEV: Failure retrieving restricted memslot for GFN 0x%llx, flags 0x%x, userspace_addr: 0x%lx\n",
+		       gfn, slot->flags, slot->userspace_addr);
+		return INVALID_PAGE;
+	}
+
+	if (!kvm_mem_is_private(kvm, gfn)) {
+		pr_err("SEV: Failure retrieving restricted PFN for GFN 0x%llx\n", gfn);
+		return INVALID_PAGE;
+	}
+
+	if (kvm_restrictedmem_get_pfn(slot, gfn, &pfn, &order)) {
+		pr_err("SEV: Failure retrieving restricted PFN for GFN 0x%llx\n", gfn);
+		return INVALID_PAGE;
+	}
+
+	put_page(pfn_to_page(pfn));
+
+	return pfn;
+}
+
+static int __sev_snp_update_protected_guest_state(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_svm *svm = to_svm(vcpu);
+	kvm_pfn_t pfn;
+	hpa_t cur_pa;
+
+	WARN_ON(!mutex_is_locked(&svm->sev_es.snp_vmsa_mutex));
+
+	/* Save off the current VMSA PA for later checks */
+	cur_pa = svm->sev_es.vmsa_pa;
+
+	/* Mark the vCPU as offline and not runnable */
+	vcpu->arch.pv.pv_unhalted = false;
+	vcpu->arch.mp_state = KVM_MP_STATE_STOPPED;
+
+	/* Clear use of the VMSA */
+	svm->sev_es.vmsa_pa = INVALID_PAGE;
+	svm->vmcb->control.vmsa_pa = INVALID_PAGE;
+
+	if (cur_pa != __pa(svm->sev_es.vmsa) && VALID_PAGE(cur_pa)) {
+		/*
+		 * The svm->sev_es.vmsa_pa field holds the hypervisor physical
+		 * address of the about to be replaced VMSA which will no longer
+		 * be used or referenced, so un-pin it. However, restricted
+		 * pages (e.g. via AP creation) should be left to the
+		 * restrictedmem backend to deal with, so don't release the
+		 * page in that case.
+		 */
+		if (!VALID_PAGE(gfn_to_pfn_restricted(vcpu->kvm,
+						      gpa_to_gfn(svm->sev_es.snp_vmsa_gpa))))
+			kvm_release_pfn_dirty(__phys_to_pfn(cur_pa));
+	}
+
+	if (VALID_PAGE(svm->sev_es.snp_vmsa_gpa)) {
+		/*
+		 * The VMSA is referenced by the hypervisor physical address,
+		 * so retrieve the PFN and ensure it is restricted memory.
+		 */
+		pfn = gfn_to_pfn_restricted(vcpu->kvm, gpa_to_gfn(svm->sev_es.snp_vmsa_gpa));
+		if (!VALID_PAGE(pfn))
+			return pfn;
+
+		/* Use the new VMSA */
+		svm->sev_es.vmsa_pa = pfn_to_hpa(pfn);
+		svm->vmcb->control.vmsa_pa = svm->sev_es.vmsa_pa;
+
+		/* Mark the vCPU as runnable */
+		vcpu->arch.pv.pv_unhalted = false;
+		vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
+
+		svm->sev_es.snp_vmsa_gpa = INVALID_PAGE;
+	}
+
+	/*
+	 * When replacing the VMSA during SEV-SNP AP creation,
+	 * mark the VMCB dirty so that full state is always reloaded.
+	 */
+	vmcb_mark_all_dirty(svm->vmcb);
+
+	return 0;
+}
+
+/*
+ * Invoked as part of svm_vcpu_reset() processing of an init event.
+ */
+void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_svm *svm = to_svm(vcpu);
+	int ret;
+
+	if (!sev_snp_guest(vcpu->kvm))
+		return;
+
+	mutex_lock(&svm->sev_es.snp_vmsa_mutex);
+
+	if (!svm->sev_es.snp_ap_create)
+		goto unlock;
+
+	svm->sev_es.snp_ap_create = false;
+
+	ret = __sev_snp_update_protected_guest_state(vcpu);
+	if (ret)
+		vcpu_unimpl(vcpu, "snp: AP state update on init failed\n");
+
+unlock:
+	mutex_unlock(&svm->sev_es.snp_vmsa_mutex);
+}
+
+static int sev_snp_ap_creation(struct vcpu_svm *svm)
+{
+	struct kvm_sev_info *sev = &to_kvm_svm(svm->vcpu.kvm)->sev_info;
+	struct kvm_vcpu *vcpu = &svm->vcpu;
+	struct kvm_vcpu *target_vcpu;
+	struct vcpu_svm *target_svm;
+	unsigned int request;
+	unsigned int apic_id;
+	bool kick;
+	int ret;
+
+	request = lower_32_bits(svm->vmcb->control.exit_info_1);
+	apic_id = upper_32_bits(svm->vmcb->control.exit_info_1);
+
+	/* Validate the APIC ID */
+	target_vcpu = kvm_get_vcpu_by_id(vcpu->kvm, apic_id);
+	if (!target_vcpu) {
+		vcpu_unimpl(vcpu, "vmgexit: invalid AP APIC ID [%#x] from guest\n",
+			    apic_id);
+		return -EINVAL;
+	}
+
+	ret = 0;
+
+	target_svm = to_svm(target_vcpu);
+
+	/*
+	 * The target vCPU is valid, so the vCPU will be kicked unless the
+	 * request is for CREATE_ON_INIT. For any errors at this stage, the
+	 * kick will place the vCPU in a non-runnable state.
+	 */
+	kick = true;
+
+	mutex_lock(&target_svm->sev_es.snp_vmsa_mutex);
+
+	target_svm->sev_es.snp_vmsa_gpa = INVALID_PAGE;
+	target_svm->sev_es.snp_ap_create = true;
+
+	/* Interrupt injection mode shouldn't change for AP creation */
+	if (request < SVM_VMGEXIT_AP_DESTROY) {
+		u64 sev_features;
+
+		sev_features = vcpu->arch.regs[VCPU_REGS_RAX];
+		sev_features ^= sev->sev_features;
+		if (sev_features & SVM_SEV_FEAT_INT_INJ_MODES) {
+			vcpu_unimpl(vcpu, "vmgexit: invalid AP injection mode [%#lx] from guest\n",
+				    vcpu->arch.regs[VCPU_REGS_RAX]);
+			ret = -EINVAL;
+			goto out;
+		}
+	}
+
+	switch (request) {
+	case SVM_VMGEXIT_AP_CREATE_ON_INIT:
+		kick = false;
+		fallthrough;
+	case SVM_VMGEXIT_AP_CREATE:
+		if (!page_address_valid(vcpu, svm->vmcb->control.exit_info_2)) {
+			vcpu_unimpl(vcpu, "vmgexit: invalid AP VMSA address [%#llx] from guest\n",
+				    svm->vmcb->control.exit_info_2);
+			ret = -EINVAL;
+			goto out;
+		}
+
+		/*
+		 * A malicious guest can RMPADJUST a large page into a VMSA,
+		 * which will hit the SNP erratum where the CPU incorrectly
+		 * signals an RMP violation #PF if a hugepage collides with the
+		 * RMP entry of the VMSA page. Reject the AP CREATE request if
+		 * the VMSA address from the guest is 2M-aligned.
+		 */
+		if (IS_ALIGNED(svm->vmcb->control.exit_info_2, PMD_SIZE)) {
+			vcpu_unimpl(vcpu,
+				    "vmgexit: AP VMSA address [%llx] from guest is unsafe as it is 2M aligned\n",
+				    svm->vmcb->control.exit_info_2);
+			ret = -EINVAL;
+			goto out;
+		}
+
+		target_svm->sev_es.snp_vmsa_gpa = svm->vmcb->control.exit_info_2;
+		break;
+	case SVM_VMGEXIT_AP_DESTROY:
+		break;
+	default:
+		vcpu_unimpl(vcpu, "vmgexit: invalid AP creation request [%#x] from guest\n",
+			    request);
+		ret = -EINVAL;
+		break;
+	}
+
+out:
+	if (kick) {
+		if (target_vcpu->arch.mp_state == KVM_MP_STATE_UNINITIALIZED)
+			target_vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
+
+		kvm_make_request(KVM_REQ_UPDATE_PROTECTED_GUEST_STATE, target_vcpu);
+		kvm_vcpu_kick(target_vcpu);
+	}
+
+	mutex_unlock(&target_svm->sev_es.snp_vmsa_mutex);
+
+	return ret;
+}
+
 static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
 {
 	struct vmcb_control_area *control = &svm->vmcb->control;
@@ -3806,6 +4037,18 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
 		ret = 1;
 		break;
 	}
+	case SVM_VMGEXIT_AP_CREATION:
+		ret = sev_snp_ap_creation(svm);
+		if (ret) {
+			ghcb_set_sw_exit_info_1(ghcb, 1);
+			ghcb_set_sw_exit_info_2(ghcb,
+						X86_TRAP_GP |
+						SVM_EVTINJ_TYPE_EXEPT |
+						SVM_EVTINJ_VALID);
+		}
+
+		ret = 1;
+		break;
 	case SVM_VMGEXIT_UNSUPPORTED_EVENT:
 		vcpu_unimpl(vcpu,
 			    "vmgexit: unsupported event - exit_info_1=%#llx, exit_info_2=%#llx\n",
@@ -3910,6 +4153,8 @@ void sev_es_vcpu_reset(struct vcpu_svm *svm)
 	set_ghcb_msr(svm, GHCB_MSR_SEV_INFO(GHCB_VERSION_MAX,
 					    GHCB_VERSION_MIN,
 					    sev_enc_bit));
+
+	mutex_init(&svm->sev_es.snp_vmsa_mutex);
 }
 
 void sev_es_prepare_switch_to_guest(struct sev_es_save_area *hostsa)
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 745f736d9c98..539926b07ee5 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1349,6 +1349,9 @@ static void svm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
 	svm->spec_ctrl = 0;
 	svm->virt_spec_ctrl = 0;
 
+	if (init_event)
+		sev_snp_init_protected_guest_state(vcpu);
+
 	init_vmcb(vcpu);
 
 	if (!init_event)
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index b6ca6657aa6c..37bd7b728d52 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -95,6 +95,8 @@ struct kvm_sev_info {
 	void *snp_context;      /* SNP guest context page */
 	void *snp_certs_data;
 	struct mutex guest_req_lock; /* Lock for guest request handling */
+
+	u64 sev_features;	/* Features set at VMSA creation */
 };
 
 struct kvm_svm {
@@ -209,6 +211,10 @@ struct vcpu_sev_es_state {
 	bool ghcb_sa_free;
 
 	u64 ghcb_registered_gpa;
+
+	struct mutex snp_vmsa_mutex;
+	gpa_t snp_vmsa_gpa;
+	bool snp_ap_create;
 };
 
 struct vcpu_svm {
@@ -718,6 +724,7 @@ void sev_es_unmap_ghcb(struct vcpu_svm *svm);
 struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu);
 void sev_adjust_mapping_level(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn, int *level);
 void handle_rmp_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code);
+void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu);
 
 /* vmenter.S */
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 0154fc7a28c1..9872217e3a06 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10501,6 +10501,12 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 
 		if (kvm_check_request(KVM_REQ_UPDATE_CPU_DIRTY_LOGGING, vcpu))
 			static_call(kvm_x86_update_cpu_dirty_logging)(vcpu);
+
+		if (kvm_check_request(KVM_REQ_UPDATE_PROTECTED_GUEST_STATE, vcpu)) {
+			kvm_vcpu_reset(vcpu, true);
+			if (vcpu->arch.mp_state != KVM_MP_STATE_RUNNABLE)
+				goto out;
+		}
 	}
 
 	if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win ||
@@ -12698,6 +12704,9 @@ static inline bool kvm_vcpu_has_events(struct kvm_vcpu *vcpu)
 		return true;
 #endif
 
+	if (kvm_test_request(KVM_REQ_UPDATE_PROTECTED_GUEST_STATE, vcpu))
+		return true;
+
 	if (kvm_arch_interrupt_allowed(vcpu) &&
 	    (kvm_cpu_has_interrupt(vcpu) ||
 	    kvm_guest_apic_has_interrupt(vcpu)))
-- 
2.25.1



* [PATCH RFC v8 48/56] KVM: SVM: Add SNP-specific handling for memory attribute updates
  2023-02-20 18:37 [PATCH RFC v8 00/56] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (46 preceding siblings ...)
  2023-02-20 18:38 ` [PATCH RFC v8 47/56] KVM: SVM: Support SEV-SNP AP Creation NAE event Michael Roth
@ 2023-02-20 18:38 ` Michael Roth
  2023-03-01 23:37   ` Dave Hansen
  2023-02-20 18:38 ` [PATCH RFC v8 49/56] KVM: SVM: Implement .fault_is_private callback for SNP Michael Roth
                   ` (9 subsequent siblings)
  57 siblings, 1 reply; 147+ messages in thread
From: Michael Roth @ 2023-02-20 18:38 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania

This will handle RMP table updates and direct map changes needed for
page state conversions requested by userspace.
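
In condensed form, the mapping from a KVM memory attribute update to an
RMP operation is (a sketch distilled from the handler below, with looping
and error handling omitted):

  enum psc_op op = (attr & KVM_MEMORY_ATTRIBUTE_PRIVATE) ?
  		 SNP_PAGE_STATE_PRIVATE : SNP_PAGE_STATE_SHARED;

  if (op == SNP_PAGE_STATE_PRIVATE)
  	rc = rmp_make_private(pfn, gfn_to_gpa(gfn), level, sev->asid, false);
  else
  	rc = snp_make_page_shared(kvm, gfn_to_gpa(gfn), pfn, level);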

Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 arch/x86/kvm/svm/sev.c | 126 +++++++++++++++++++++++++++++++++++++++++
 arch/x86/kvm/svm/svm.c |   1 +
 arch/x86/kvm/svm/svm.h |   2 +
 3 files changed, 129 insertions(+)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index b2f1a12685ed..73d614c538da 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3381,6 +3381,31 @@ static int snp_rmptable_psmash(struct kvm *kvm, kvm_pfn_t pfn)
 	return psmash(pfn);
 }
 
+static int snp_make_page_shared(struct kvm *kvm, gpa_t gpa, kvm_pfn_t pfn, int level)
+{
+	int rc, rmp_level;
+
+	rc = snp_lookup_rmpentry(pfn, &rmp_level);
+	if (rc < 0)
+		return -EINVAL;
+
+	/* If page is not assigned then do nothing */
+	if (!rc)
+		return 0;
+
+	/*
+	 * Is the page part of an existing 2MB RMP entry? Split the 2MB entry
+	 * into multiple 4K pages before making the memory shared.
+	 */
+	if (level == PG_LEVEL_4K && rmp_level == PG_LEVEL_2M) {
+		rc = snp_rmptable_psmash(kvm, pfn);
+		if (rc)
+			return rc;
+	}
+
+	return rmp_make_shared(pfn, level);
+}
+
 /*
  * TODO: need to get the value set by userspace in vcpu->run->vmgexit.ghcb_msr
  * and process that here accordingly.
@@ -4373,3 +4398,104 @@ void handle_rmp_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code)
 	kvm_zap_gfn_range(kvm, gfn, gfn + PTRS_PER_PMD);
 	put_page(pfn_to_page(pfn));
 }
+
+static inline u8 order_to_level(int order)
+{
+	BUILD_BUG_ON(KVM_MAX_HUGEPAGE_LEVEL > PG_LEVEL_1G);
+
+	if (order >= KVM_HPAGE_GFN_SHIFT(PG_LEVEL_1G))
+		return PG_LEVEL_1G;
+
+	if (order >= KVM_HPAGE_GFN_SHIFT(PG_LEVEL_2M))
+		return PG_LEVEL_2M;
+
+	return PG_LEVEL_4K;
+}
+
+int sev_update_mem_attr(struct kvm_memory_slot *slot, unsigned int attr,
+			gfn_t start, gfn_t end)
+{
+	struct kvm_sev_info *sev = &to_kvm_svm(slot->kvm)->sev_info;
+	enum psc_op op = (attr & KVM_MEMORY_ATTRIBUTE_PRIVATE) ? SNP_PAGE_STATE_PRIVATE
+							       : SNP_PAGE_STATE_SHARED;
+	gfn_t gfn = start;
+
+	pr_debug("%s: GFN 0x%llx - 0x%llx, op: %d\n", __func__, start, end, op);
+
+	if (!sev_snp_guest(slot->kvm))
+		return 0;
+
+	if (!kvm_slot_can_be_private(slot)) {
+		pr_err_ratelimited("%s: memslot for gfn: 0x%llx is not private.\n",
+				   __func__, gfn);
+		return -EPERM;
+	}
+
+	while (gfn < end) {
+		kvm_pfn_t pfn;
+		int level = PG_LEVEL_4K; /* TODO: take actual order into account */
+		gpa_t gpa = gfn_to_gpa(gfn);
+		int npages = 1;
+		int order;
+		int rc;
+
+		/*
+		 * No work to do if there was never a page allocated from private
+		 * memory. If there was a page that was deallocated previously,
+		 * the invalidation notifier should have restored the page to
+		 * shared.
+		 */
+		rc = kvm_restrictedmem_get_pfn(slot, gfn, &pfn, &order);
+		if (rc) {
+			pr_warn_ratelimited("%s: failed to retrieve gfn 0x%llx from private FD\n",
+					    __func__, gfn);
+			gfn++;
+			continue;
+		}
+
+		/*
+		 * TODO: The RMP entry's hugepage bit is ignored for
+		 * shared/unassigned pages. Either handle looping through each
+		 * sub-page as part of snp_make_page_shared(), or remove the
+		 * level argument.
+		 */
+		if (op == SNP_PAGE_STATE_PRIVATE && order &&
+		    IS_ALIGNED(gfn, 1 << order) && (gfn + (1 << order)) <= end) {
+			level = order_to_level(order);
+			npages = 1 << order;
+		}
+
+		/*
+		 * Grab the PFN from private memslot and update the RMP entry.
+		 * It may be worthwhile to go ahead and map it into the TDP at
+		 * this point if the guest is doing lazy acceptance, but for
+		 * up-front bulk shared->private conversions it's not likely
+		 * the guest will try to access the PFN any time soon, so for
+		 * now just let the KVM MMU handle faulting it in on the next
+		 * access.
+		 */
+		switch (op) {
+		case SNP_PAGE_STATE_SHARED:
+			rc = snp_make_page_shared(slot->kvm, gpa, pfn, level);
+			break;
+		case SNP_PAGE_STATE_PRIVATE:
+			rc = rmp_make_private(pfn, gpa, level, sev->asid, false);
+			break;
+		default:
+			rc = PSC_INVALID_ENTRY;
+			break;
+		}
+
+		put_page(pfn_to_page(pfn));
+
+		if (rc) {
+			pr_err_ratelimited("%s: failed op %d gpa %llx pfn %llx level %d rc %d\n",
+					   __func__, op, gpa, pfn, level, rc);
+			return -EINVAL;
+		}
+
+		gfn += npages;
+	}
+
+	return 0;
+}
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 539926b07ee5..e2edc4700e55 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4860,6 +4860,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
 	.alloc_apic_backing_page = svm_alloc_apic_backing_page,
 
 	.adjust_mapping_level = sev_adjust_mapping_level,
+	.update_mem_attr = sev_update_mem_attr,
 };
 
 /*
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 37bd7b728d52..50a2bcaf3fd7 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -725,6 +725,8 @@ struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu);
 void sev_adjust_mapping_level(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn, int *level);
 void handle_rmp_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code);
 void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu);
+int sev_update_mem_attr(struct kvm_memory_slot *slot, unsigned int attr,
+			gfn_t start, gfn_t end);
 
 /* vmenter.S */
 
-- 
2.25.1



* [PATCH RFC v8 49/56] KVM: SVM: Implement .fault_is_private callback for SNP
  2023-02-20 18:37 [PATCH RFC v8 00/56] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (47 preceding siblings ...)
  2023-02-20 18:38 ` [PATCH RFC v8 48/56] KVM: SVM: Add SNP-specific handling for memory attribute updates Michael Roth
@ 2023-02-20 18:38 ` Michael Roth
  2023-02-20 18:38 ` [PATCH RFC v8 50/56] KVM: SEV: Handle restricted memory invalidations " Michael Roth
                   ` (8 subsequent siblings)
  57 siblings, 0 replies; 147+ messages in thread
From: Michael Roth @ 2023-02-20 18:38 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania

KVM MMU will use this to determine whether an #NPF should be serviced
with restricted memory or not.
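
The callback's contract is small; in essence (an illustrative condensation
of the code below):

  /* SNP hardware sets the encrypted bit in the #NPF error code for
   * private (C-bit) guest accesses, which decides the fault type. */
  *private_fault = !!(error_code & PFERR_GUEST_ENC_MASK);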

Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 arch/x86/kvm/svm/sev.c | 10 ++++++++++
 arch/x86/kvm/svm/svm.c |  1 +
 arch/x86/kvm/svm/svm.h |  2 ++
 3 files changed, 13 insertions(+)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 73d614c538da..7a74a92cb39a 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -4499,3 +4499,13 @@ int sev_update_mem_attr(struct kvm_memory_slot *slot, unsigned int attr,
 
 	return 0;
 }
+
+bool sev_fault_is_private(struct kvm *kvm, gpa_t gpa, u64 error_code, bool *private_fault)
+{
+	if (!sev_snp_guest(kvm))
+		return false;
+
+	*private_fault = (error_code & PFERR_GUEST_ENC_MASK) ? true : false;
+
+	return true;
+}
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index e2edc4700e55..18e4a6c17d11 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4861,6 +4861,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
 
 	.adjust_mapping_level = sev_adjust_mapping_level,
 	.update_mem_attr = sev_update_mem_attr,
+	.fault_is_private = sev_fault_is_private,
 };
 
 /*
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 50a2bcaf3fd7..97038afa8020 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -728,6 +728,8 @@ void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu);
 int sev_update_mem_attr(struct kvm_memory_slot *slot, unsigned int attr,
 			gfn_t start, gfn_t end);
 
+bool sev_fault_is_private(struct kvm *kvm, gpa_t gpa, u64 error_code, bool *private_fault);
+
 /* vmenter.S */
 
 void __svm_sev_es_vcpu_run(struct vcpu_svm *svm, bool spec_ctrl_intercepted);
-- 
2.25.1



* [PATCH RFC v8 50/56] KVM: SEV: Handle restricted memory invalidations for SNP
  2023-02-20 18:37 [PATCH RFC v8 00/56] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (48 preceding siblings ...)
  2023-02-20 18:38 ` [PATCH RFC v8 49/56] KVM: SVM: Implement .fault_is_private callback for SNP Michael Roth
@ 2023-02-20 18:38 ` Michael Roth
  2023-03-01 10:41   ` Zhi Wang
  2023-02-20 18:38 ` [PATCH RFC v8 51/56] KVM: SVM: Add module parameter to enable the SEV-SNP Michael Roth
                   ` (7 subsequent siblings)
  57 siblings, 1 reply; 147+ messages in thread
From: Michael Roth @ 2023-02-20 18:38 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania

Implement a platform hook to do the work of restoring the direct map
entries and cleaning up RMP table entries for restricted memory that is
being freed back to the host.
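
A condensed sketch of the per-page work done by this hook (simplified from
the handler below; the 2M fast path and error handling are omitted):

  for (gfn = start; gfn <= end; gfn++) {
  	if (kvm_restrictedmem_get_pfn(slot, gfn, &pfn, &order))
  		continue;

  	/* Transition the RMP entry back to the shared/hypervisor state */
  	snp_make_page_shared(slot->kvm, gfn_to_gpa(gfn), pfn, PG_LEVEL_4K);
  	put_page(pfn_to_page(pfn));
  }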

Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 arch/x86/kvm/svm/sev.c | 62 ++++++++++++++++++++++++++++++++++++++++++
 arch/x86/kvm/svm/svm.c |  1 +
 arch/x86/kvm/svm/svm.h |  1 +
 3 files changed, 64 insertions(+)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 7a74a92cb39a..bedec90d034f 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -4509,3 +4509,65 @@ bool sev_fault_is_private(struct kvm *kvm, gpa_t gpa, u64 error_code, bool *priv
 
 	return true;
 }
+
+void sev_invalidate_private_range(struct kvm_memory_slot *slot, gfn_t start, gfn_t end)
+{
+	gfn_t gfn = start;
+
+	if (!sev_snp_guest(slot->kvm))
+		return;
+
+	if (!kvm_slot_can_be_private(slot)) {
+		pr_warn_ratelimited("SEV: Memslot for GFN: 0x%llx is not private.\n",
+				    gfn);
+		return;
+	}
+
+	while (gfn <= end) {
+		gpa_t gpa = gfn_to_gpa(gfn);
+		int level = PG_LEVEL_4K;
+		int order, rc;
+		kvm_pfn_t pfn;
+
+		rc = kvm_restrictedmem_get_pfn(slot, gfn, &pfn, &order);
+		if (rc) {
+			pr_warn_ratelimited("SEV: Failed to retrieve restricted PFN for GFN 0x%llx, rc: %d\n",
+					    gfn, rc);
+			gfn++;
+			continue;
+		}
+
+		if (order) {
+			int rmp_level;
+
+			if (IS_ALIGNED(gpa, page_level_size(PG_LEVEL_2M)) &&
+			    gpa + page_level_size(PG_LEVEL_2M) <= gfn_to_gpa(end))
+				level = PG_LEVEL_2M;
+			else
+				pr_debug("%s: GPA 0x%llx is not aligned to 2M, skipping 2M directmap restoration\n",
+					 __func__, gpa);
+
+			/*
+			 * TODO: It may still be possible to restore 2M mapping here,
+			 * but keep it simple for now.
+			 */
+			if (level == PG_LEVEL_2M &&
+			    (!snp_lookup_rmpentry(pfn, &rmp_level) || rmp_level == PG_LEVEL_4K)) {
+				pr_debug("%s: PFN 0x%llx is not mapped as 2M private range, skipping 2M directmap restoration\n",
+					 __func__, pfn);
+				level = PG_LEVEL_4K;
+			}
+		}
+
+		pr_debug("%s: GPA %llx PFN %llx order %d level %d\n",
+			 __func__, gpa, pfn, order, level);
+		rc = snp_make_page_shared(slot->kvm, gpa, pfn, level);
+		if (rc)
+			pr_err("SEV: Failed to restore page to shared, GPA: 0x%llx PFN: 0x%llx order: %d rc: %d\n",
+			       gpa, pfn, order, rc);
+
+		gfn += page_level_size(level) >> PAGE_SHIFT;
+		put_page(pfn_to_page(pfn));
+		cond_resched();
+	}
+}
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 18e4a6c17d11..3fe5f13b5f3a 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4862,6 +4862,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
 	.adjust_mapping_level = sev_adjust_mapping_level,
 	.update_mem_attr = sev_update_mem_attr,
 	.fault_is_private = sev_fault_is_private,
+	.invalidate_restricted_mem = sev_invalidate_private_range,
 };
 
 /*
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 97038afa8020..857b674e68f0 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -727,6 +727,7 @@ void handle_rmp_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code);
 void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu);
 int sev_update_mem_attr(struct kvm_memory_slot *slot, unsigned int attr,
 			gfn_t start, gfn_t end);
+void sev_invalidate_private_range(struct kvm_memory_slot *slot, gfn_t start, gfn_t end);
 
 bool sev_fault_is_private(struct kvm *kvm, gpa_t gpa, u64 error_code, bool *private_fault);
 
-- 
2.25.1



* [PATCH RFC v8 51/56] KVM: SVM: Add module parameter to enable the SEV-SNP
  2023-02-20 18:37 [PATCH RFC v8 00/56] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (49 preceding siblings ...)
  2023-02-20 18:38 ` [PATCH RFC v8 50/56] KVM: SEV: Handle restricted memory invalidations " Michael Roth
@ 2023-02-20 18:38 ` Michael Roth
  2023-03-01 10:45   ` Zhi Wang
  2023-02-20 18:38 ` [PATCH RFC v8 52/56] ccp: Add support to decrypt the page Michael Roth
                   ` (6 subsequent siblings)
  57 siblings, 1 reply; 147+ messages in thread
From: Michael Roth @ 2023-02-20 18:38 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	Brijesh Singh

From: Brijesh Singh <brijesh.singh@amd.com>

Add a module parameter that can be used to enable or disable the SEV-SNP
feature. Now that KVM contains support for SNP, set the GHCB hypervisor
feature flag to indicate that SNP is supported.
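
For reference, the new parameter behaves like the existing sev/sev_es
knobs. Assuming kvm_amd is built with CONFIG_KVM_AMD_SEV, a host admin
can for example disable SNP while keeping SEV/SEV-ES enabled:

  # at module load time
  modprobe kvm_amd sev=1 sev_es=1 sev_snp=0

  # or on the kernel command line
  kvm_amd.sev_snp=0

  # the read-only parameter can be inspected at runtime
  cat /sys/module/kvm_amd/parameters/sev_snp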

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 arch/x86/kvm/svm/sev.c | 7 ++++---
 arch/x86/kvm/svm/svm.h | 2 +-
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index bedec90d034f..70d5650d8d95 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -55,14 +55,15 @@ module_param_named(sev, sev_enabled, bool, 0444);
 /* enable/disable SEV-ES support */
 static bool sev_es_enabled = true;
 module_param_named(sev_es, sev_es_enabled, bool, 0444);
+
+/* enable/disable SEV-SNP support */
+static bool sev_snp_enabled = true;
+module_param_named(sev_snp, sev_snp_enabled, bool, 0444);
 #else
 #define sev_enabled false
 #define sev_es_enabled false
 #endif /* CONFIG_KVM_AMD_SEV */
 
-/* enable/disable SEV-SNP support */
-static bool sev_snp_enabled;
-
 #define AP_RESET_HOLD_NONE		0
 #define AP_RESET_HOLD_NAE_EVENT		1
 #define AP_RESET_HOLD_MSR_PROTO		2
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 857b674e68f0..221b38d3c845 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -694,7 +694,7 @@ void avic_refresh_virtual_apic_mode(struct kvm_vcpu *vcpu);
 #define GHCB_VERSION_MAX	2ULL
 #define GHCB_VERSION_MIN	1ULL
 
-#define GHCB_HV_FT_SUPPORTED	0
+#define GHCB_HV_FT_SUPPORTED	(GHCB_HV_FT_SNP | GHCB_HV_FT_SNP_AP_CREATION)
 
 extern unsigned int max_sev_asid;
 
-- 
2.25.1



* [PATCH RFC v8 52/56] ccp: Add support to decrypt the page
  2023-02-20 18:37 [PATCH RFC v8 00/56] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (50 preceding siblings ...)
  2023-02-20 18:38 ` [PATCH RFC v8 51/56] KVM: SVM: Add module parameter to enable the SEV-SNP Michael Roth
@ 2023-02-20 18:38 ` Michael Roth
  2023-03-01 21:20   ` Zhi Wang
  2023-02-20 18:38 ` [PATCH RFC v8 53/56] KVM: SVM: Make VMSAVE target area memory allocation SNP safe Michael Roth
                   ` (5 subsequent siblings)
  57 siblings, 1 reply; 147+ messages in thread
From: Michael Roth @ 2023-02-20 18:38 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	Brijesh Singh

From: Brijesh Singh <brijesh.singh@amd.com>

Add support to decrypt guest encrypted memory. These API interfaces can
be used, for example, to dump VMCBs on SNP guest exit.
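
A minimal caller sketch matching the signature exported here; the
surrounding debug path and the sourcing of gctx_pfn/src_pfn are
assumptions for illustration:

  struct page *dst = alloc_page(GFP_KERNEL);
  int fw_err = 0, rc;

  if (!dst)
  	return -ENOMEM;

  /* Decrypt one 4K guest page (src_pfn) into dst using the guest context page */
  rc = snp_guest_dbg_decrypt_page(gctx_pfn, src_pfn, page_to_pfn(dst), &fw_err);
  if (rc)
  	pr_err("SNP_DBG_DECRYPT failed: rc=%d fw_err=%#x\n", rc, fw_err);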

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
[mdr: minor commit fixups]
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 drivers/crypto/ccp/sev-dev.c | 32 ++++++++++++++++++++++++++++++++
 include/linux/psp-sev.h      | 22 ++++++++++++++++++++--
 2 files changed, 52 insertions(+), 2 deletions(-)

diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index e65563bc8298..bf5167b2acfc 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -2017,6 +2017,38 @@ int sev_guest_df_flush(int *error)
 }
 EXPORT_SYMBOL_GPL(sev_guest_df_flush);
 
+int snp_guest_dbg_decrypt_page(u64 gctx_pfn, u64 src_pfn, u64 dst_pfn, int *error)
+{
+	struct sev_data_snp_dbg data = {0};
+	struct sev_device *sev;
+	int ret;
+
+	if (!psp_master || !psp_master->sev_data)
+		return -ENODEV;
+
+	sev = psp_master->sev_data;
+
+	if (!sev->snp_initialized)
+		return -EINVAL;
+
+	data.gctx_paddr = sme_me_mask | (gctx_pfn << PAGE_SHIFT);
+	data.src_addr = sme_me_mask | (src_pfn << PAGE_SHIFT);
+	data.dst_addr = sme_me_mask | (dst_pfn << PAGE_SHIFT);
+
+	/* The destination page must be in the firmware state. */
+	if (rmp_mark_pages_firmware(data.dst_addr, 1, false))
+		return -EIO;
+
+	ret = sev_do_cmd(SEV_CMD_SNP_DBG_DECRYPT, &data, error);
+
+	/* Restore the page state */
+	if (snp_reclaim_pages(data.dst_addr, 1, false))
+		ret = -EIO;
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(snp_guest_dbg_decrypt_page);
+
 int snp_guest_ext_guest_request(struct sev_data_snp_guest_request *data,
 				unsigned long vaddr, unsigned long *npages, unsigned long *fw_err)
 {
diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h
index 81bafc049eca..92116e2b74fd 100644
--- a/include/linux/psp-sev.h
+++ b/include/linux/psp-sev.h
@@ -710,7 +710,6 @@ struct sev_data_snp_dbg {
 	u64 gctx_paddr;				/* In */
 	u64 src_addr;				/* In */
 	u64 dst_addr;				/* In */
-	u32 len;				/* In */
 } __packed;
 
 /**
@@ -913,13 +912,27 @@ int sev_guest_decommission(struct sev_data_decommission *data, int *error);
  * @error: SEV command return code
  *
  * Returns:
+ * 0 if the sev successfully processed the command
+ * -%ENODEV    if the sev device is not available
+ * -%ENOTSUPP  if the sev does not support SEV
+ * -%ETIMEDOUT if the sev command timed out
+ * -%EIO       if the sev returned a non-zero return code
+ */
+int sev_do_cmd(int cmd, void *data, int *psp_ret);
+
+/**
+ * snp_guest_dbg_decrypt_page - perform SEV SNP_DBG_DECRYPT command
+ *
+ * @sev_ret: sev command return code
+ *
+ * Returns:
  * 0 if the SEV successfully processed the command
  * -%ENODEV    if the SEV device is not available
  * -%ENOTSUPP  if the SEV does not support SEV
  * -%ETIMEDOUT if the SEV command timed out
  * -%EIO       if the SEV returned a non-zero return code
  */
-int sev_do_cmd(int cmd, void *data, int *psp_ret);
+int snp_guest_dbg_decrypt_page(u64 gctx_pfn, u64 src_pfn, u64 dst_pfn, int *error);
 
 void *psp_copy_user_blob(u64 uaddr, u32 len);
 void *snp_alloc_firmware_page(gfp_t mask);
@@ -987,6 +1000,11 @@ static inline void *psp_copy_user_blob(u64 __user uaddr, u32 len) { return ERR_P
 
 void snp_mark_pages_offline(unsigned long pfn, unsigned int npages) {}
 
+static inline int snp_guest_dbg_decrypt_page(u64 gctx_pfn, u64 src_pfn, u64 dst_pfn, int *error)
+{
+	return -ENODEV;
+}
+
 static inline void *snp_alloc_firmware_page(gfp_t mask)
 {
 	return NULL;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 147+ messages in thread

* [PATCH RFC v8 53/56] KVM: SVM: Make VMSAVE target area memory allocation SNP safe
  2023-02-20 18:37 [PATCH RFC v8 00/56] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (51 preceding siblings ...)
  2023-02-20 18:38 ` [PATCH RFC v8 52/56] ccp: Add support to decrypt the page Michael Roth
@ 2023-02-20 18:38 ` Michael Roth
  2023-03-01 21:23   ` Zhi Wang
  2023-02-20 18:38 ` [PATCH RFC v8 54/56] x86/sev: Add KVM commands for instance certs Michael Roth
                   ` (4 subsequent siblings)
  57 siblings, 1 reply; 147+ messages in thread
From: Michael Roth @ 2023-02-20 18:38 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania

From: Ashish Kalra <ashish.kalra@amd.com>

Implement a workaround for an SNP erratum where the CPU will incorrectly
signal an RMP violation #PF if a hugepage (2MB or 1GB) collides with the
RMP entry of the VMSAVE target page.

When SEV-SNP is globally enabled, the CPU marks the VMSAVE target page
as "InUse" while the VMSAVE instruction is executing. If another
CPU writes to a different page in the same 2MB region while the VMSAVE
is executing, the CPU will throw an RMP violation #PF.

Use the generic SNP-safe allocator for allocating the VMSAVE target
page, which ensures that the returned page is not part of a hugepage;
the same allocator is already used for the VMCB, VMSA and AVIC backing
pages.
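
For reference, the SNP-safe allocator introduced earlier in this series
conceptually works along the following lines (a sketch of the approach,
not the exact code):

  static struct page *snp_safe_alloc_page_sketch(void)
  {
      struct page *p;

      /* Allocate two contiguous 4K pages. */
      p = alloc_pages(GFP_KERNEL_ACCOUNT | __GFP_ZERO, 1);
      if (!p)
          return NULL;

      /* Make the two pages independently freeable. */
      split_page(p, 1);

      /* Keep the page whose PFN is not 2MB-aligned, free the other. */
      if (IS_ALIGNED(page_to_pfn(p), PTRS_PER_PMD)) {
          __free_page(p);
          return p + 1;
      }
      __free_page(p + 1);
      return p;
  }

The key property is that the returned page is never 2MB-aligned, so it
cannot sit at the start of a hugepage mapping of that region.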

Co-developed-by: Marc Orr <marcorr@google.com>
Signed-off-by: Marc Orr <marcorr@google.com>
Reported-by: Alper Gun <alpergun@google.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 arch/x86/kvm/svm/svm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 3fe5f13b5f3a..8bda31a61757 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -665,7 +665,7 @@ static int svm_cpu_init(int cpu)
 	int ret = -ENOMEM;
 
 	memset(sd, 0, sizeof(struct svm_cpu_data));
-	sd->save_area = alloc_page(GFP_KERNEL | __GFP_ZERO);
+	sd->save_area = snp_safe_alloc_page(NULL);
 	if (!sd->save_area)
 		return ret;
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 147+ messages in thread

* [PATCH RFC v8 54/56] x86/sev: Add KVM commands for instance certs
  2023-02-20 18:37 [PATCH RFC v8 00/56] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (52 preceding siblings ...)
  2023-02-20 18:38 ` [PATCH RFC v8 53/56] KVM: SVM: Make VMSAVE target area memory allocation SNP safe Michael Roth
@ 2023-02-20 18:38 ` Michael Roth
  2023-02-21 12:40   ` Dov Murik
                     ` (2 more replies)
  2023-02-20 18:38 ` [PATCH RFC v8 55/56] x86/sev: Document KVM_SEV_SNP_{G,S}ET_CERTS Michael Roth
                   ` (3 subsequent siblings)
  57 siblings, 3 replies; 147+ messages in thread
From: Michael Roth @ 2023-02-20 18:38 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	Dionna Glaze, Tom Lendacky

From: Dionna Glaze <dionnaglaze@google.com>

The /dev/sev device has the ability to store host-wide certificates for
the key used by the AMD-SP for SEV-SNP attestation report signing,
but for hosts that want to specify additional certificates that are
specific to the image launched in a VM, a different way is needed to
communicate those certificates.

Add two new KVM ioctls to handle this: KVM_SEV_SNP_{GET,SET}_CERTS.

The certificates that are set with this command are expected to follow
the same format as the host certificates, but that format is opaque
to the kernel.

The new behavior for custom certificates is that the extended guest
request command will now return the overridden certificates if they
were installed for the instance. The error condition for a too-small
data buffer is changed to report the size of the overridden certificate
data when an instance-specific certificate set is installed.

Setting a zero-length certificate blob restores the default behavior of
returning only the host certificates on an extended guest request.

Also increase SEV_FW_BLOB_MAX_SIZE by another 4K page to allow space
for an extra certificate.
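
As an illustration (not part of this patch), a VMM could drive the new
commands through KVM_MEMORY_ENCRYPT_OP roughly as sketched below. The
helper names are made up for the example, and vm_fd/sev_fd are assumed
to be the usual VM and /dev/sev file descriptors:

  #include <stdint.h>
  #include <errno.h>
  #include <sys/ioctl.h>
  #include <linux/kvm.h>

  static int vmm_snp_set_certs(int vm_fd, int sev_fd, void *certs,
                               __u64 certs_len)
  {
      struct kvm_sev_snp_set_certs params = {
          .certs_uaddr = (__u64)(uintptr_t)certs,
          .certs_len   = certs_len,   /* 0 uninstalls the override */
      };
      struct kvm_sev_cmd cmd = {
          .id     = KVM_SEV_SNP_SET_CERTS,
          .data   = (__u64)(uintptr_t)&params,
          .sev_fd = sev_fd,
      };

      return ioctl(vm_fd, KVM_MEMORY_ENCRYPT_OP, &cmd);
  }

  static int vmm_snp_get_certs(int vm_fd, int sev_fd, void *buf, __u64 *len)
  {
      struct kvm_sev_snp_get_certs params = {
          .certs_uaddr = (__u64)(uintptr_t)buf,
          .certs_len   = *len,
      };
      struct kvm_sev_cmd cmd = {
          .id     = KVM_SEV_SNP_GET_CERTS,
          .data   = (__u64)(uintptr_t)&params,
          .sev_fd = sev_fd,
      };
      int rc = ioctl(vm_fd, KVM_MEMORY_ENCRYPT_OP, &cmd);

      /* On a too-small buffer, certs_len reports the required size. */
      if (rc < 0 && errno == EINVAL)
          *len = params.certs_len;

      return rc;
  }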

Cc: Tom Lendacky <Thomas.Lendacky@amd.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>

Signed-off-by: Dionna Glaze <dionnaglaze@google.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
[mdr: remove used of "we" and "this patch" in commit log]
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 arch/x86/kvm/svm/sev.c   | 111 ++++++++++++++++++++++++++++++++++++++-
 arch/x86/kvm/svm/svm.h   |   1 +
 include/linux/psp-sev.h  |   2 +-
 include/uapi/linux/kvm.h |  12 +++++
 4 files changed, 123 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 70d5650d8d95..18b64b7005e7 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -2089,6 +2089,7 @@ static void *snp_context_create(struct kvm *kvm, struct kvm_sev_cmd *argp)
 		goto e_free;
 
 	sev->snp_certs_data = certs_data;
+	sev->snp_certs_len = 0;
 
 	return context;
 
@@ -2404,6 +2405,86 @@ static int snp_launch_finish(struct kvm *kvm, struct kvm_sev_cmd *argp)
 	return ret;
 }
 
+static int snp_get_instance_certs(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+	struct kvm_sev_snp_get_certs params;
+
+	if (!sev_snp_guest(kvm))
+		return -ENOTTY;
+
+	if (!sev->snp_context)
+		return -EINVAL;
+
+	if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data,
+			   sizeof(params)))
+		return -EFAULT;
+
+	/* No instance certs set. */
+	if (!sev->snp_certs_len)
+		return -ENOENT;
+
+	if (params.certs_len < sev->snp_certs_len) {
+		/* Output buffer too small. Return the required size. */
+		params.certs_len = sev->snp_certs_len;
+
+		if (copy_to_user((void __user *)(uintptr_t)argp->data, &params,
+				 sizeof(params)))
+			return -EFAULT;
+
+		return -EINVAL;
+	}
+
+	if (copy_to_user((void __user *)(uintptr_t)params.certs_uaddr,
+			 sev->snp_certs_data, sev->snp_certs_len))
+		return -EFAULT;
+
+	return 0;
+}
+
+static int snp_set_instance_certs(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+	unsigned long length = SEV_FW_BLOB_MAX_SIZE;
+	void *to_certs = sev->snp_certs_data;
+	struct kvm_sev_snp_set_certs params;
+
+	if (!sev_snp_guest(kvm))
+		return -ENOTTY;
+
+	if (!sev->snp_context)
+		return -EINVAL;
+
+	if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data,
+			   sizeof(params)))
+		return -EFAULT;
+
+	if (params.certs_len > SEV_FW_BLOB_MAX_SIZE)
+		return -EINVAL;
+
+	/*
+	 * Setting a length of 0 is the same as "uninstalling" instance-
+	 * specific certificates.
+	 */
+	if (params.certs_len == 0) {
+		sev->snp_certs_len = 0;
+		return 0;
+	}
+
+	/* Page-align the length */
+	length = (params.certs_len + PAGE_SIZE - 1) & PAGE_MASK;
+
+	if (copy_from_user(to_certs,
+			   (void __user *)(uintptr_t)params.certs_uaddr,
+			   params.certs_len)) {
+		return -EFAULT;
+	}
+
+	sev->snp_certs_len = length;
+
+	return 0;
+}
+
 int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
 {
 	struct kvm_sev_cmd sev_cmd;
@@ -2503,6 +2584,12 @@ int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
 	case KVM_SEV_SNP_LAUNCH_FINISH:
 		r = snp_launch_finish(kvm, &sev_cmd);
 		break;
+	case KVM_SEV_SNP_GET_CERTS:
+		r = snp_get_instance_certs(kvm, &sev_cmd);
+		break;
+	case KVM_SEV_SNP_SET_CERTS:
+		r = snp_set_instance_certs(kvm, &sev_cmd);
+		break;
 	default:
 		r = -EINVAL;
 		goto out;
@@ -3550,8 +3637,28 @@ static void snp_handle_ext_guest_request(struct vcpu_svm *svm, gpa_t req_gpa, gp
 	if (rc)
 		goto unlock;
 
-	rc = snp_guest_ext_guest_request(&req, (unsigned long)sev->snp_certs_data,
-					 &data_npages, &err);
+	/*
+	 * If the VMM has overridden the certs, then change the error message
+	 * if the size is inappropriate for the override. Otherwise, use a
+	 * regular guest request and copy back the instance certs.
+	 */
+	if (sev->snp_certs_len) {
+		if ((data_npages << PAGE_SHIFT) < sev->snp_certs_len) {
+			rc = -EINVAL;
+			err = SNP_GUEST_REQ_INVALID_LEN;
+			goto datalen;
+		}
+		rc = sev_issue_cmd(kvm, SEV_CMD_SNP_GUEST_REQUEST, &req,
+				   (int *)&err);
+	} else {
+		rc = snp_guest_ext_guest_request(&req,
+						 (unsigned long)sev->snp_certs_data,
+						 &data_npages, &err);
+	}
+datalen:
+	if (sev->snp_certs_len)
+		data_npages = sev->snp_certs_len >> PAGE_SHIFT;
+
 	if (rc) {
 		/*
 		 * If buffer length is small then return the expected
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 221b38d3c845..dced46559508 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -94,6 +94,7 @@ struct kvm_sev_info {
 	u64 snp_init_flags;
 	void *snp_context;      /* SNP guest context page */
 	void *snp_certs_data;
+	unsigned int snp_certs_len; /* Size of instance override for certs */
 	struct mutex guest_req_lock; /* Lock for guest request handling */
 
 	u64 sev_features;	/* Features set at VMSA creation */
diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h
index 92116e2b74fd..3b28b78938f6 100644
--- a/include/linux/psp-sev.h
+++ b/include/linux/psp-sev.h
@@ -22,7 +22,7 @@
 #define __psp_pa(x)	__pa(x)
 #endif
 
-#define SEV_FW_BLOB_MAX_SIZE	0x4000	/* 16KB */
+#define SEV_FW_BLOB_MAX_SIZE	0x5000	/* 20KB */
 
 /**
  * SEV platform state
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 6e684bf5f723..ad7e24e43547 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1928,6 +1928,8 @@ enum sev_cmd_id {
 	KVM_SEV_SNP_LAUNCH_START,
 	KVM_SEV_SNP_LAUNCH_UPDATE,
 	KVM_SEV_SNP_LAUNCH_FINISH,
+	KVM_SEV_SNP_GET_CERTS,
+	KVM_SEV_SNP_SET_CERTS,
 
 	KVM_SEV_NR_MAX,
 };
@@ -2075,6 +2077,16 @@ struct kvm_sev_snp_launch_finish {
 	__u8 pad[6];
 };
 
+struct kvm_sev_snp_get_certs {
+	__u64 certs_uaddr;
+	__u64 certs_len;
+};
+
+struct kvm_sev_snp_set_certs {
+	__u64 certs_uaddr;
+	__u64 certs_len;
+};
+
 #define KVM_DEV_ASSIGN_ENABLE_IOMMU	(1 << 0)
 #define KVM_DEV_ASSIGN_PCI_2_3		(1 << 1)
 #define KVM_DEV_ASSIGN_MASK_INTX	(1 << 2)
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 147+ messages in thread

* [PATCH RFC v8 55/56] x86/sev: Document KVM_SEV_SNP_{G,S}ET_CERTS
  2023-02-20 18:37 [PATCH RFC v8 00/56] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (53 preceding siblings ...)
  2023-02-20 18:38 ` [PATCH RFC v8 54/56] x86/sev: Add KVM commands for instance certs Michael Roth
@ 2023-02-20 18:38 ` Michael Roth
  2023-02-20 18:38 ` [PATCH RFC v8 56/56] iommu/amd: Add IOMMU_SNP_SHUTDOWN support Michael Roth
                   ` (2 subsequent siblings)
  57 siblings, 0 replies; 147+ messages in thread
From: Michael Roth @ 2023-02-20 18:38 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	Dionna Glaze, Thomas Lendacky

From: Dionna Glaze <dionnaglaze@google.com>

Update the KVM_MEMORY_ENCRYPT_OP documentation to include the new
commands for overriding the host certificates that the guest receives
from an extended guest request.

Cc: Thomas Lendacky <Thomas.Lendacky@amd.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>

Signed-off-by: Dionna Glaze <dionnaglaze@google.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 .../virt/kvm/x86/amd-memory-encryption.rst    | 44 +++++++++++++++++++
 1 file changed, 44 insertions(+)

diff --git a/Documentation/virt/kvm/x86/amd-memory-encryption.rst b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
index dafb0c9984f1..153003ff2c51 100644
--- a/Documentation/virt/kvm/x86/amd-memory-encryption.rst
+++ b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
@@ -537,6 +537,50 @@ Returns: 0 on success, -negative on error
 
 See SEV-SNP specification for further details on launch finish input parameters.
 
+22. KVM_SEV_SNP_GET_CERTS
+-------------------------
+
+After the SNP guest launch flow has started, the KVM_SEV_SNP_GET_CERTS command
+can be issued to request the data that has been installed with the
+KVM_SEV_SNP_SET_CERTS command.
+
+Parameters (in/out): struct kvm_sev_snp_get_certs
+
+Returns: 0 on success, -negative on error
+
+::
+
+	struct kvm_sev_snp_get_certs {
+		__u64 certs_uaddr;
+		__u64 certs_len
+	};
+
+If no certs have been installed, then the return value is -ENOENT.
+If the buffer specified in the struct is too small, the certs_len field will be
+overwritten with the required bytes to receive all the certificate bytes and the
+return value will be -EINVAL.
+
+23. KVM_SEV_SNP_SET_CERTS
+-------------------------
+
+After the SNP guest launch flow has started, the KVM_SEV_SNP_SET_CERTS command
+can be issued to override the /dev/sev certs data that is returned when a
+guest issues an extended guest request. This is useful for instance-specific
+extensions to the host certificates.
+
+Parameters (in/out): struct kvm_sev_snp_set_certs
+
+Returns: 0 on success, -negative on error
+
+::
+
+	struct kvm_sev_snp_set_certs {
+		__u64 certs_uaddr;
+		__u64 certs_len
+	};
+
+The certs_len field may not exceed SEV_FW_BLOB_MAX_SIZE.
+
 References
 ==========
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 147+ messages in thread

* [PATCH RFC v8 56/56] iommu/amd: Add IOMMU_SNP_SHUTDOWN support
  2023-02-20 18:37 [PATCH RFC v8 00/56] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (54 preceding siblings ...)
  2023-02-20 18:38 ` [PATCH RFC v8 55/56] x86/sev: Document KVM_SEV_SNP_{G,S}ET_CERTS Michael Roth
@ 2023-02-20 18:38 ` Michael Roth
  2023-03-01 16:56 ` [PATCH RFC v8 00/56] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Dave Hansen
  2023-08-03 18:27 ` Schander, Johanna 'Mimoja' Amelie
  57 siblings, 0 replies; 147+ messages in thread
From: Michael Roth @ 2023-02-20 18:38 UTC (permalink / raw)
  To: kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania

From: Ashish Kalra <ashish.kalra@amd.com>

Add a new IOMMU API, amd_iommu_snp_disable(), to transition IOMMU pages
from the Reclaim state back to the Hypervisor state after the
SNP_SHUTDOWN_EX command. Invoke this API from the CCP driver after
issuing SNP_SHUTDOWN_EX.

Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 drivers/crypto/ccp/sev-dev.c | 20 ++++++++++++++
 drivers/iommu/amd/init.c     | 53 ++++++++++++++++++++++++++++++++++++
 include/linux/amd-iommu.h    |  1 +
 3 files changed, 74 insertions(+)

diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index bf5167b2acfc..7ded2f9111e0 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -24,6 +24,7 @@
 #include <linux/cpufeature.h>
 #include <linux/fs.h>
 #include <linux/fs_struct.h>
+#include <linux/amd-iommu.h>
 
 #include <asm/smp.h>
 #include <asm/e820/types.h>
@@ -1503,6 +1504,25 @@ static int __sev_snp_shutdown_locked(int *error)
 		return ret;
 	}
 
+	/*
+	 * SNP_SHUTDOWN_EX with IOMMU_SNP_SHUTDOWN set to 1 disables SNP
+	 * enforcement by the IOMMU and also transitions all pages
+	 * associated with the IOMMU to the Reclaim state.
+	 * Firmware was transitioning the IOMMU pages to Hypervisor state
+	 * before version 1.53. But, accounting for the number of assigned
+	 * 4kB pages in a 2M page was done incorrectly by not transitioning
+	 * to the Reclaim state. This resulted in RMP #PF when later accessing
+	 * the 2M page containing those pages during kexec boot. Hence, the
+	 * firmware now transitions these pages to Reclaim state and hypervisor
+	 * needs to transition these pages to shared state. SNP Firmware
+	 * version 1.53 and above are needed for kexec boot.
+	 */
+	ret = amd_iommu_snp_disable();
+	if (ret) {
+		dev_err(sev->dev, "SNP IOMMU shutdown failed\n");
+		return ret;
+	}
+
 	sev->snp_initialized = false;
 	dev_dbg(sev->dev, "SEV-SNP firmware shutdown\n");
 
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 1a2d425bf568..d1270e3c5baf 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -30,6 +30,7 @@
 #include <asm/io_apic.h>
 #include <asm/irq_remapping.h>
 #include <asm/set_memory.h>
+#include <asm/sev.h>
 
 #include <linux/crash_dump.h>
 
@@ -3651,4 +3652,56 @@ int amd_iommu_snp_enable(void)
 
 	return 0;
 }
+
+static int iommu_page_make_shared(void *page)
+{
+	unsigned long pfn;
+
+	pfn = iommu_virt_to_phys(page) >> PAGE_SHIFT;
+	return rmp_make_shared(pfn, PG_LEVEL_4K);
+}
+
+static int iommu_make_shared(void *va, size_t size)
+{
+	void *page;
+	int ret;
+
+	if (!va)
+		return 0;
+
+	for (page = va; page < (va + size); page += PAGE_SIZE) {
+		ret = iommu_page_make_shared(page);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+
+int amd_iommu_snp_disable(void)
+{
+	struct amd_iommu *iommu;
+	int ret;
+
+	if (!amd_iommu_snp_en)
+		return 0;
+
+	for_each_iommu(iommu) {
+		ret = iommu_make_shared(iommu->evt_buf, EVT_BUFFER_SIZE);
+		if (ret)
+			return ret;
+
+		ret = iommu_make_shared(iommu->ppr_log, PPR_LOG_SIZE);
+		if (ret)
+			return ret;
+
+		ret = iommu_make_shared((void *)iommu->cmd_sem, PAGE_SIZE);
+		if (ret)
+			return ret;
+	}
+
+	amd_iommu_snp_en = false;
+	return 0;
+}
+EXPORT_SYMBOL_GPL(amd_iommu_snp_disable);
 #endif
diff --git a/include/linux/amd-iommu.h b/include/linux/amd-iommu.h
index 953e6f12fa1c..a1b33b838842 100644
--- a/include/linux/amd-iommu.h
+++ b/include/linux/amd-iommu.h
@@ -208,6 +208,7 @@ struct amd_iommu *get_amd_iommu(unsigned int idx);
 
 #ifdef CONFIG_AMD_MEM_ENCRYPT
 int amd_iommu_snp_enable(void);
+int amd_iommu_snp_disable(void);
 #endif
 
 #endif /* _ASM_X86_AMD_IOMMU_H */
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 147+ messages in thread

* Re: [PATCH RFC v8 18/56] x86/fault: fix handle_split_page_fault() to work with memfd backed pages
  2023-02-20 18:38 ` [PATCH RFC v8 18/56] x86/fault: fix handle_split_page_fault() to work with memfd backed pages Michael Roth
@ 2023-02-20 19:57   ` Hugh Dickins
  2023-02-20 20:31     ` Michael Roth
  0 siblings, 1 reply; 147+ messages in thread
From: Hugh Dickins @ 2023-02-20 19:57 UTC (permalink / raw)
  To: Michael Roth
  Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	Hugh Dickins

On Mon, 20 Feb 2023, Michael Roth wrote:

> From: Hugh Dickins <hughd@google.com>

No.

> 
> When the address is backed by a memfd, the code to split the page does
> nothing more than remove the PMD from the page tables. So immediately
> install a PTE to ensure that any other pages in that 2MB region are
> brought back as in 4K pages.
> 
> Signed-off-by: Hugh Dickins <hughd@google.com>

No.  Suggested-by would be okay.

> Cc: Hugh Dickins <hughd@google.com>

Thanks.  I'm really sorry to be such a jobsworth,
and have nothing more constructive to say than I did before in
https://lore.kernel.org/linux-mm/7f2228c4-1586-2934-7b92-1a9d23b6046@google.com/
(please re-read), but adding a Signed-off-by where none was given is wrong;
and I'm not ever going to comprehend enough of the context to give it.

Best of luck for the series,
Hugh

> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> Signed-off-by: Michael Roth <michael.roth@amd.com>
> ---
>  mm/memory.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/mm/memory.c b/mm/memory.c
> index e68da7e403c6..33c9020ba1f8 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -4999,6 +4999,11 @@ static vm_fault_t handle_pte_fault(struct vm_fault *vmf)
>  static int handle_split_page_fault(struct vm_fault *vmf)
>  {
>  	__split_huge_pmd(vmf->vma, vmf->pmd, vmf->address, false, NULL);
> +	/*
> +	 * Install a PTE immediately to ensure that any other pages in
> +	 * this 2MB region are brought back in as 4K pages.
> +	 */
> +	__pte_alloc(vmf->vma->vm_mm, vmf->pmd);
>  	return 0;
>  }
>  
> -- 
> 2.25.1

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH RFC v8 11/56] x86/sev: Add the host SEV-SNP initialization support
  2023-02-20 18:38 ` [PATCH RFC v8 11/56] x86/sev: Add the host SEV-SNP initialization support Michael Roth
@ 2023-02-20 20:12   ` Zhi Wang
  0 siblings, 0 replies; 147+ messages in thread
From: Zhi Wang @ 2023-02-20 20:12 UTC (permalink / raw)
  To: Michael Roth
  Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	Brijesh Singh

On Mon, 20 Feb 2023 12:38:02 -0600
Michael Roth <michael.roth@amd.com> wrote:

> From: Brijesh Singh <brijesh.singh@amd.com>
> 
> The memory integrity guarantees of SEV-SNP are enforced through a new
> structure called the Reverse Map Table (RMP). The RMP is a single data
> structure shared across the system that contains one entry for every 4K
> page of DRAM that may be used by SEV-SNP VMs. The goal of RMP is to
> track the owner of each page of memory. Pages of memory can be owned by
> the hypervisor, owned by a specific VM or owned by the AMD-SP. See APM2
> section 15.36.3 for more detail on RMP.
> 
> The RMP table is used to enforce access control to memory. The table
> itself is not directly writable by the software. New CPU instructions
> (RMPUPDATE, PVALIDATE, RMPADJUST) are used to manipulate the RMP
> entries.
> 
> Based on the platform configuration, the BIOS reserves the memory used
> for the RMP table. The start and end address of the RMP table must be
> queried by reading the RMP_BASE and RMP_END MSRs. If the RMP_BASE and
> RMP_END are not set then disable the SEV-SNP feature.
> 
> The SEV-SNP feature is enabled only after the RMP table is successfully
> initialized.
> 
> Also set SYSCFG.MFMD when enabling SNP as SEV-SNP FW >= 1.51 requires
> that SYSCFG.MFMD must be se
> 
                           ^ unfinished sentence.
> RMP table entry format is non-architectural and it can vary by processor
> and is defined by the PPR. Restrict SNP support on the known CPU model
> and family for which the RMP table entry format is currently defined
> for.
> 
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> Signed-off-by: Michael Roth <michael.roth@amd.com>
> ---
>  arch/x86/include/asm/disabled-features.h |   8 +-
>  arch/x86/include/asm/msr-index.h         |  11 +-
>  arch/x86/kernel/sev.c                    | 175 +++++++++++++++++++++++
>  3 files changed, 192 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/include/asm/disabled-features.h b/arch/x86/include/asm/disabled-features.h
> index 33d2cd04d254..9b5a2cc8064a 100644
> --- a/arch/x86/include/asm/disabled-features.h
> +++ b/arch/x86/include/asm/disabled-features.h
> @@ -87,6 +87,12 @@
>  # define DISABLE_TDX_GUEST	(1 << (X86_FEATURE_TDX_GUEST & 31))
>  #endif
>  
> +#ifdef CONFIG_AMD_MEM_ENCRYPT
> +# define DISABLE_SEV_SNP	0
> +#else
> +# define DISABLE_SEV_SNP	(1 << (X86_FEATURE_SEV_SNP & 31))
> +#endif
> +
>  /*
>   * Make sure to add features to the correct mask
>   */
> @@ -110,7 +116,7 @@
>  			 DISABLE_ENQCMD)
>  #define DISABLED_MASK17	0
>  #define DISABLED_MASK18	0
> -#define DISABLED_MASK19	0
> +#define DISABLED_MASK19	(DISABLE_SEV_SNP)
>  #define DISABLED_MASK_CHECK BUILD_BUG_ON_ZERO(NCAPINTS != 20)
>  
>  #endif /* _ASM_X86_DISABLED_FEATURES_H */
> diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
> index 10ac52705892..35100c630617 100644
> --- a/arch/x86/include/asm/msr-index.h
> +++ b/arch/x86/include/asm/msr-index.h
> @@ -565,6 +565,8 @@
>  #define MSR_AMD64_SEV_ENABLED		BIT_ULL(MSR_AMD64_SEV_ENABLED_BIT)
>  #define MSR_AMD64_SEV_ES_ENABLED	BIT_ULL(MSR_AMD64_SEV_ES_ENABLED_BIT)
>  #define MSR_AMD64_SEV_SNP_ENABLED	BIT_ULL(MSR_AMD64_SEV_SNP_ENABLED_BIT)
> +#define MSR_AMD64_RMP_BASE		0xc0010132
> +#define MSR_AMD64_RMP_END		0xc0010133
>  
>  #define MSR_AMD64_VIRT_SPEC_CTRL	0xc001011f
>  
> @@ -649,7 +651,14 @@
>  #define MSR_K8_TOP_MEM2			0xc001001d
>  #define MSR_AMD64_SYSCFG		0xc0010010
>  #define MSR_AMD64_SYSCFG_MEM_ENCRYPT_BIT	23
> -#define MSR_AMD64_SYSCFG_MEM_ENCRYPT	BIT_ULL(MSR_AMD64_SYSCFG_MEM_ENCRYPT_BIT)
> +#define MSR_AMD64_SYSCFG_MEM_ENCRYPT		BIT_ULL(MSR_AMD64_SYSCFG_MEM_ENCRYPT_BIT)
> +#define MSR_AMD64_SYSCFG_SNP_EN_BIT		24
> +#define MSR_AMD64_SYSCFG_SNP_EN		BIT_ULL(MSR_AMD64_SYSCFG_SNP_EN_BIT)
> +#define MSR_AMD64_SYSCFG_SNP_VMPL_EN_BIT	25
> +#define MSR_AMD64_SYSCFG_SNP_VMPL_EN		BIT_ULL(MSR_AMD64_SYSCFG_SNP_VMPL_EN_BIT)
> +#define MSR_AMD64_SYSCFG_MFDM_BIT		19
> +#define MSR_AMD64_SYSCFG_MFDM			BIT_ULL(MSR_AMD64_SYSCFG_MFDM_BIT)
						^ an extra tab?
> +
>  #define MSR_K8_INT_PENDING_MSG		0xc0010055
>  /* C1E active bits in int pending message */
>  #define K8_INTP_C1E_ACTIVE_MASK		0x18000000
> diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
> index a428c62330d3..e54e412c9916 100644
> --- a/arch/x86/kernel/sev.c
> +++ b/arch/x86/kernel/sev.c
> @@ -22,6 +22,9 @@
>  #include <linux/efi.h>
>  #include <linux/platform_device.h>
>  #include <linux/io.h>
> +#include <linux/cpumask.h>
> +#include <linux/iommu.h>
> +#include <linux/amd-iommu.h>
>  
>  #include <asm/cpu_entry_area.h>
>  #include <asm/stacktrace.h>
> @@ -38,6 +41,7 @@
>  #include <asm/apic.h>
>  #include <asm/cpuid.h>
>  #include <asm/cmdline.h>
> +#include <asm/iommu.h>
>  
>  #define DR7_RESET_VALUE        0x400
>  
> @@ -57,6 +61,12 @@
>  #define AP_INIT_CR0_DEFAULT		0x60000010
>  #define AP_INIT_MXCSR_DEFAULT		0x1f80
>  
> +/*
> + * The first 16KB from the RMP_BASE is used by the processor for the
> + * bookkeeping, the range needs to be added during the RMP entry lookup.
> + */
> +#define RMPTABLE_CPU_BOOKKEEPING_SZ	0x4000
> +
>  /* For early boot hypervisor communication in SEV-ES enabled guests */
>  static struct ghcb boot_ghcb_page __bss_decrypted __aligned(PAGE_SIZE);
>  
> @@ -69,6 +79,9 @@ static struct ghcb *boot_ghcb __section(".data");
>  /* Bitmap of SEV features supported by the hypervisor */
>  static u64 sev_hv_features __ro_after_init;
>  
> +static unsigned long rmptable_start __ro_after_init;
> +static unsigned long rmptable_end __ro_after_init;
> +
>  /* #VC handler runtime per-CPU data */
>  struct sev_es_runtime_data {
>  	struct ghcb ghcb_page;
> @@ -2260,3 +2273,165 @@ static int __init snp_init_platform_device(void)
>  	return 0;
>  }
>  device_initcall(snp_init_platform_device);
> +
> +#undef pr_fmt
> +#define pr_fmt(fmt)	"SEV-SNP: " fmt
> +
> +static int __mfd_enable(unsigned int cpu)
> +{
> +	u64 val;
> +
> +	if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
> +		return 0;
> +
> +	rdmsrl(MSR_AMD64_SYSCFG, val);
> +
> +	val |= MSR_AMD64_SYSCFG_MFDM;
> +
> +	wrmsrl(MSR_AMD64_SYSCFG, val);
> +
> +	return 0;
> +}
> +
> +static __init void mfd_enable(void *arg)
> +{
> +	__mfd_enable(smp_processor_id());
> +}
> +
> +static int __snp_enable(unsigned int cpu)
> +{
> +	u64 val;
> +
> +	if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
> +		return 0;
> +
> +	rdmsrl(MSR_AMD64_SYSCFG, val);
> +
> +	val |= MSR_AMD64_SYSCFG_SNP_EN;
> +	val |= MSR_AMD64_SYSCFG_SNP_VMPL_EN;
> +
> +	wrmsrl(MSR_AMD64_SYSCFG, val);
> +
> +	return 0;
> +}
> +
> +static __init void snp_enable(void *arg)
> +{
> +	__snp_enable(smp_processor_id());
> +}
> +
> +static bool get_rmptable_info(u64 *start, u64 *len)
> +{
> +	u64 calc_rmp_sz, rmp_sz, rmp_base, rmp_end;
> +
> +	rdmsrl(MSR_AMD64_RMP_BASE, rmp_base);
> +	rdmsrl(MSR_AMD64_RMP_END, rmp_end);
> +
> +	if (!rmp_base || !rmp_end) {
> +		pr_err("Memory for the RMP table has not been reserved by BIOS\n");
> +		return false;
> +	}
> +
> +	rmp_sz = rmp_end - rmp_base + 1;
> +
> +	/*
> +	 * Calculate the amount the memory that must be reserved by the BIOS to
> +	 * address the whole RAM. The reserved memory should also cover the
> +	 * RMP table itself.
> +	 */
> +	calc_rmp_sz = (((rmp_sz >> PAGE_SHIFT) + totalram_pages()) << 4)
> +		      + RMPTABLE_CPU_BOOKKEEPING_SZ;
> +
> +	if (calc_rmp_sz > rmp_sz) {
> +		pr_err("Memory reserved for the RMP table does not cover full system RAM (expected 0x%llx got 0x%llx)\n",
> +		       calc_rmp_sz, rmp_sz);
> +		return false;
> +	}
> +
> +	*start = rmp_base;
> +	*len = rmp_sz;
> +
> +	pr_info("RMP table physical address [0x%016llx - 0x%016llx]\n", rmp_base, rmp_end);
> +
> +	return true;
> +}
> +
> +static __init int snp_rmptable_init(void)
> +{
> +	u64 rmp_base, sz;
> +	void *start;
> +	u64 val;
> +
> +	if (!get_rmptable_info(&rmp_base, &sz))
> +		return 1;
> +
> +	start = memremap(rmp_base, sz, MEMREMAP_WB);
> +	if (!start) {
> +		pr_err("Failed to map RMP table addr 0x%llx size 0x%llx\n", rmp_base, sz);
> +		return 1;
> +	}
> +
> +	/*
> +	 * Check if SEV-SNP is already enabled, this can happen in case of
> +	 * kexec boot.
> +	 */
> +	rdmsrl(MSR_AMD64_SYSCFG, val);
> +	if (val & MSR_AMD64_SYSCFG_SNP_EN)
> +		goto skip_enable;
> +
> +	memset(start, 0, sz);
> +
> +	/* Flush the caches to ensure that data is written before SNP is enabled. */
> +	wbinvd_on_all_cpus();
> +
> +	/* MFDM must be enabled on all the CPUs prior to enabling SNP. */
> +	on_each_cpu(mfd_enable, NULL, 1);
> +
> +	/* Enable SNP on all CPUs. */
> +	on_each_cpu(snp_enable, NULL, 1);
> +
> +skip_enable:
> +	rmptable_start = (unsigned long)start;
> +	rmptable_end = rmptable_start + sz - 1;
> +
> +	return 0;
> +}
> +
> +static int __init snp_host_init(void)
> +{
> +	if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
> +		return 0;
> +
> +	/*
> +	 * RMP table entry format is not architectural and it can vary by processor and
> +	 * is defined by the per-processor PPR. Restrict SNP support on the known CPU
> +	 * model and family for which the RMP table entry format is currently defined for.
> +	 */
> +	if (boot_cpu_data.x86 != 0x19 || boot_cpu_data.x86_model > 0xaf)
> +		goto nosnp;
> +
> +	if (amd_iommu_snp_enable())
> +		goto nosnp;
> +
> +	if (snp_rmptable_init())
> +		goto nosnp;
> +
> +	cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "x86/rmptable_init:online", __snp_enable, NULL);
> +
> +	return 0;
> +
> +nosnp:
> +	setup_clear_cpu_cap(X86_FEATURE_SEV_SNP);
> +	return -ENODEV;
> +}
> +
> +/*
> + * This must be called after the PCI subsystem. This is because amd_iommu_snp_enable()
> + * is called to ensure the IOMMU supports the SEV-SNP feature, which can only be
> + * called after subsys_initcall().
> + *
> + * NOTE: IOMMU is enforced by SNP to ensure that hypervisor cannot program DMA
> + * directly into guest private memory. In case of SNP, the IOMMU ensures that
> + * the page(s) used for DMA are hypervisor owned.
> + */
> +fs_initcall(snp_host_init);


^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH RFC v8 18/56] x86/fault: fix handle_split_page_fault() to work with memfd backed pages
  2023-02-20 19:57   ` Hugh Dickins
@ 2023-02-20 20:31     ` Michael Roth
  0 siblings, 0 replies; 147+ messages in thread
From: Michael Roth @ 2023-02-20 20:31 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania

On Mon, Feb 20, 2023 at 11:57:59AM -0800, Hugh Dickins wrote:
> On Mon, 20 Feb 2023, Michael Roth wrote:
> 
> > From: Hugh Dickins <hughd@google.com>
> 
> No.
> 
> > 
> > When the address is backed by a memfd, the code to split the page does
> > nothing more than remove the PMD from the page tables. So immediately
> > install a PTE to ensure that any other pages in that 2MB region are
> > brought back as in 4K pages.
> > 
> > Signed-off-by: Hugh Dickins <hughd@google.com>
> 
> No.  Suggested-by would be okay.
> 
> > Cc: Hugh Dickins <hughd@google.com>
> 
> Thanks.  I'm really sorry to be such a jobsworth,
> and have nothing more constructive to say than I did before in
> https://lore.kernel.org/linux-mm/7f2228c4-1586-2934-7b92-1a9d23b6046@google.com/
> (please re-read), but adding a Signed-off-by where none was given is wrong;
> and I'm not ever going to comprehend enough of the context to give it.

Hi Hugh,

Sorry for the mix-up, I'd intended to remove this patch before submitting. I'll
make sure to remove it from future postings.

> 
> Best of luck for the series,

Thank you!

-Mike

> Hugh
> 
> > Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> > Signed-off-by: Michael Roth <michael.roth@amd.com>
> > ---
> >  mm/memory.c | 5 +++++
> >  1 file changed, 5 insertions(+)
> > 
> > diff --git a/mm/memory.c b/mm/memory.c
> > index e68da7e403c6..33c9020ba1f8 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -4999,6 +4999,11 @@ static vm_fault_t handle_pte_fault(struct vm_fault *vmf)
> >  static int handle_split_page_fault(struct vm_fault *vmf)
> >  {
> >  	__split_huge_pmd(vmf->vma, vmf->pmd, vmf->address, false, NULL);
> > +	/*
> > +	 * Install a PTE immediately to ensure that any other pages in
> > +	 * this 2MB region are brought back in as 4K pages.
> > +	 */
> > +	__pte_alloc(vmf->vma->vm_mm, vmf->pmd);
> >  	return 0;
> >  }
> >  
> > -- 
> > 2.25.1

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH RFC v8 19/56] x86/fault: Return pfn from dump_pagetable() for SEV-specific fault handling.
  2023-02-20 18:38 ` [PATCH RFC v8 19/56] x86/fault: Return pfn from dump_pagetable() for SEV-specific fault handling Michael Roth
@ 2023-02-20 21:13   ` Zhi Wang
  2023-02-28 10:53   ` Wu Zongyong
  1 sibling, 0 replies; 147+ messages in thread
From: Zhi Wang @ 2023-02-20 21:13 UTC (permalink / raw)
  To: Michael Roth
  Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania

On Mon, 20 Feb 2023 12:38:10 -0600
Michael Roth <michael.roth@amd.com> wrote:

> From: Ashish Kalra <ashish.kalra@amd.com>
> 
> Return pfn from dump_pagetable() to do SEV-specific
> fault handling. Used for handling SNP RMP page fault.
> 

It would be better to merge this patch with PATCH 13.

> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> Signed-off-by: Michael Roth <michael.roth@amd.com>
> ---
>  arch/x86/mm/fault.c | 15 +++++++++++----
>  1 file changed, 11 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> index afd4cde17001..f2b16dcfbd9a 100644
> --- a/arch/x86/mm/fault.c
> +++ b/arch/x86/mm/fault.c
> @@ -311,7 +311,7 @@ static bool low_pfn(unsigned long pfn)
>  	return pfn < max_low_pfn;
>  }
>  
> -static void dump_pagetable(unsigned long address)
> +static unsigned long dump_pagetable(unsigned long address)
>  {
>  	pgd_t *base = __va(read_cr3_pa());
>  	pgd_t *pgd = &base[pgd_index(address)];
> @@ -345,8 +345,10 @@ static void dump_pagetable(unsigned long address)
>  
>  	pte = pte_offset_kernel(pmd, address);
>  	pr_cont("*pte = %0*Lx ", sizeof(*pte) * 2, (u64)pte_val(*pte));
> +	return 0;
>  out:
>  	pr_cont("\n");
> +	return 0;
>  }
>  
>  #else /* CONFIG_X86_64: */
> @@ -367,10 +369,11 @@ static int bad_address(void *p)
>  	return get_kernel_nofault(dummy, (unsigned long *)p);
>  }
>  
> -static void dump_pagetable(unsigned long address)
> +static unsigned long dump_pagetable(unsigned long address)
>  {
>  	pgd_t *base = __va(read_cr3_pa());
>  	pgd_t *pgd = base + pgd_index(address);
> +	unsigned long pfn;
>  	p4d_t *p4d;
>  	pud_t *pud;
>  	pmd_t *pmd;
> @@ -388,6 +391,7 @@ static void dump_pagetable(unsigned long address)
>  	if (bad_address(p4d))
>  		goto bad;
>  
> +	pfn = p4d_pfn(*p4d);
>  	pr_cont("P4D %lx ", p4d_val(*p4d));
>  	if (!p4d_present(*p4d) || p4d_large(*p4d))
>  		goto out;
> @@ -396,6 +400,7 @@ static void dump_pagetable(unsigned long address)
>  	if (bad_address(pud))
>  		goto bad;
>  
> +	pfn = pud_pfn(*pud);
>  	pr_cont("PUD %lx ", pud_val(*pud));
>  	if (!pud_present(*pud) || pud_large(*pud))
>  		goto out;
> @@ -404,6 +409,7 @@ static void dump_pagetable(unsigned long address)
>  	if (bad_address(pmd))
>  		goto bad;
>  
> +	pfn = pmd_pfn(*pmd);
>  	pr_cont("PMD %lx ", pmd_val(*pmd));
>  	if (!pmd_present(*pmd) || pmd_large(*pmd))
>  		goto out;
> @@ -412,13 +418,14 @@ static void dump_pagetable(unsigned long address)
>  	if (bad_address(pte))
>  		goto bad;
>  
> +	pfn = pte_pfn(*pte);
>  	pr_cont("PTE %lx", pte_val(*pte));
>  out:
>  	pr_cont("\n");
> -
> -	return;
> +	return pfn;
>  bad:
>  	pr_info("BAD\n");
> +	return -1;
>  }
>  
>  #endif /* CONFIG_X86_64 */


^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH RFC v8 04/56] KVM: Add HVA range operator
  2023-02-20 18:37 ` [PATCH RFC v8 04/56] KVM: Add HVA range operator Michael Roth
@ 2023-02-20 21:37   ` Zhi Wang
  2023-03-27  0:34     ` Michael Roth
  0 siblings, 1 reply; 147+ messages in thread
From: Zhi Wang @ 2023-02-20 21:37 UTC (permalink / raw)
  To: Michael Roth
  Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	Vishal Annapurve

On Mon, 20 Feb 2023 12:37:55 -0600
Michael Roth <michael.roth@amd.com> wrote:

> From: Vishal Annapurve <vannapurve@google.com>
> 
> Introduce HVA range operator so that other KVM subsystems
> can operate on HVA range.
> 
> Signed-off-by: Vishal Annapurve <vannapurve@google.com>
> [mdr: minor checkpatch alignment fixups]
> Signed-off-by: Michael Roth <michael.roth@amd.com>
> ---
>  include/linux/kvm_host.h |  6 +++++
>  virt/kvm/kvm_main.c      | 48 ++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 54 insertions(+)
> 
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 4d542060cd93..c615650ed256 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -1402,6 +1402,12 @@ void kvm_mmu_invalidate_begin(struct kvm *kvm);
>  void kvm_mmu_invalidate_range_add(struct kvm *kvm, gfn_t start, gfn_t end);
>  void kvm_mmu_invalidate_end(struct kvm *kvm);
>  
> +typedef int (*kvm_hva_range_op_t)(struct kvm *kvm,
> +				struct kvm_gfn_range *range, void *data);
> +
> +int kvm_vm_do_hva_range_op(struct kvm *kvm, unsigned long hva_start,
> +			   unsigned long hva_end, kvm_hva_range_op_t handler, void *data);
> +
>  long kvm_arch_dev_ioctl(struct file *filp,
>  			unsigned int ioctl, unsigned long arg);
>  long kvm_arch_vcpu_ioctl(struct file *filp,
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index f7e00593cc5d..4ccd655dd5af 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -642,6 +642,54 @@ static __always_inline int __kvm_handle_hva_range(struct kvm *kvm,
>  	return (int)ret;
>  }
>  

The function below seems to be a reduced duplicate of __kvm_handle_hva_range()
in virt/kvm/kvm_main.c. It would be nice to factor out and reuse
__kvm_handle_hva_range() instead.

> +int kvm_vm_do_hva_range_op(struct kvm *kvm, unsigned long hva_start,
> +			   unsigned long hva_end, kvm_hva_range_op_t handler, void *data)
> +{
> +	int ret = 0;
> +	struct kvm_gfn_range gfn_range;
> +	struct kvm_memory_slot *slot;
> +	struct kvm_memslots *slots;
> +	int i, idx;
> +
> +	if (WARN_ON_ONCE(hva_end <= hva_start))
> +		return -EINVAL;
> +
> +	idx = srcu_read_lock(&kvm->srcu);
> +
> +	for (i = 0; i < kvm_arch_nr_memslot_as_ids(kvm); i++) {
> +		struct interval_tree_node *node;
> +
> +		slots = __kvm_memslots(kvm, i);
> +		kvm_for_each_memslot_in_hva_range(node, slots,
> +						  hva_start, hva_end - 1) {
> +			unsigned long start, end;
> +
> +			slot = container_of(node, struct kvm_memory_slot,
> +					    hva_node[slots->node_idx]);
> +			start = max(hva_start, slot->userspace_addr);
> +			end = min(hva_end, slot->userspace_addr +
> +						  (slot->npages << PAGE_SHIFT));
> +
> +			/*
> +			 * {gfn(page) | page intersects with [hva_start, hva_end)} =
> +			 * {gfn_start, gfn_start+1, ..., gfn_end-1}.
> +			 */
> +			gfn_range.start = hva_to_gfn_memslot(start, slot);
> +			gfn_range.end = hva_to_gfn_memslot(end + PAGE_SIZE - 1, slot);
> +			gfn_range.slot = slot;
> +
> +			ret = handler(kvm, &gfn_range, data);
> +			if (ret)
> +				goto e_ret;
> +		}
> +	}
> +
> +e_ret:
> +	srcu_read_unlock(&kvm->srcu, idx);
> +
> +	return ret;
> +}
> +
>  static __always_inline int kvm_handle_hva_range(struct mmu_notifier *mn,
>  						unsigned long start,
>  						unsigned long end,


^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH RFC v8 24/56] crypto: ccp: Handle the legacy TMR allocation when SNP is enabled
  2023-02-20 18:38 ` [PATCH RFC v8 24/56] crypto: ccp: Handle the legacy TMR allocation when SNP is enabled Michael Roth
@ 2023-02-21  9:28   ` Zhi Wang
  2023-02-21 15:31     ` Kalra, Ashish
  0 siblings, 1 reply; 147+ messages in thread
From: Zhi Wang @ 2023-02-21  9:28 UTC (permalink / raw)
  To: Michael Roth
  Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	Brijesh Singh

On Mon, 20 Feb 2023 12:38:15 -0600
Michael Roth <michael.roth@amd.com> wrote:

> From: Brijesh Singh <brijesh.singh@amd.com>
> 
> The behavior and requirement for the SEV-legacy command is altered when
> the SNP firmware is in the INIT state. See SEV-SNP firmware specification
> for more details.
> 
> Allocate the Trusted Memory Region (TMR) as a 2mb sized/aligned region
> when SNP is enabled to satisfy new requirements for the SNP. Continue
> allocating a 1mb region for !SNP configuration.
> 
> While at it, provide API that can be used by others to allocate a page
> that can be used by the firmware. The immediate user for this API will
> be the KVM driver. The KVM driver to need to allocate a firmware context
> page during the guest creation. The context page need to be updated
> by the firmware. See the SEV-SNP specification for further details.
> 
> Co-developed-by: Ashish Kalra <ashish.kalra@amd.com>
> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> Signed-off-by: Michael Roth <michael.roth@amd.com>
> ---
>  drivers/crypto/ccp/sev-dev.c | 148 +++++++++++++++++++++++++++++++++--
>  include/linux/psp-sev.h      |   9 +++
>  2 files changed, 149 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
> index eca4e59b0f44..4c12e98a1219 100644
> --- a/drivers/crypto/ccp/sev-dev.c
> +++ b/drivers/crypto/ccp/sev-dev.c
> @@ -94,6 +94,13 @@ static void *sev_init_ex_buffer;
>   */
>  struct sev_data_range_list *snp_range_list;
>  
> +/* When SEV-SNP is enabled the TMR needs to be 2MB aligned and 2MB size. */
> +#define SEV_SNP_ES_TMR_SIZE	(2 * 1024 * 1024)

It would be better to re-use the kernel size definition macros, e.g. SZ_2M.

> +
> +static size_t sev_es_tmr_size = SEV_ES_TMR_SIZE;
> +
> +static int __sev_do_cmd_locked(int cmd, void *data, int *psp_ret);
> +
>  static inline bool sev_version_greater_or_equal(u8 maj, u8 min)
>  {
>  	struct sev_device *sev = psp_master->sev_data;
> @@ -216,11 +223,134 @@ void snp_mark_pages_offline(unsigned long pfn, unsigned int npages)
>  }
>  EXPORT_SYMBOL_GPL(snp_mark_pages_offline);
>  
> +static int snp_reclaim_pages(unsigned long paddr, unsigned int npages, bool locked)
> +{
> +	/* Cbit maybe set in the paddr */

This is confusing.

I suppose the C-bit is treated as an attribute of the PTE in the kernel, not
as part of the PA. That means only a PTE might carry a C-bit.

The paddr here comes from __pa(page_address()). It is not extracted from a
PTE, so the value returned should never have a C-bit set.

BTW: wouldn't it be better to take a pfn as the input param instead of a paddr?

The caller already has a struct page, so calling
snp_reclaim_pages(page_to_pfn(page), xxxxx) would be much clearer than the
current conversion chain: page_address() (struct page converted to VA),
__pa() (VA converted to PA) in the caller, and then PA converted back to a
pfn here.
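
For illustration, the suggested interface would be something along these
lines (sketch only):

  static int snp_reclaim_pages(unsigned long pfn, unsigned int npages,
                               bool locked);

  /* caller */
  ret = snp_reclaim_pages(page_to_pfn(page), npages, locked);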

> +	unsigned long pfn = __sme_clr(paddr) >> PAGE_SHIFT;
> +	int ret, err, i, n = 0;
> +

These should be "unsigned int i, n;" since the input param npages is unsigned int.

> +	if (!pfn_valid(pfn)) {
> +		pr_err("%s: Invalid PFN %lx\n", __func__, pfn);
> +		return 0;
> +	}
> +
> +	for (i = 0; i < npages; i++, pfn++, n++) {
> +		paddr = pfn << PAGE_SHIFT;
> +
> +		if (locked)
> +			ret = __sev_do_cmd_locked(SEV_CMD_SNP_PAGE_RECLAIM, &paddr, &err);
> +		else
> +			ret = sev_do_cmd(SEV_CMD_SNP_PAGE_RECLAIM, &paddr, &err);
> +
> +		if (ret)
> +			goto cleanup;
> +
> +		ret = rmp_make_shared(pfn, PG_LEVEL_4K);
> +		if (ret)
> +			goto cleanup;
> +	}
> +
> +	return 0;
> +
> +cleanup:
> +	/*
> +	 * If failed to reclaim the page then page is no longer safe to
> +	 * be release back to the system, leak it.
> +	 */
> +	snp_mark_pages_offline(pfn, npages - n);
> +	return ret;
> +}
> +
> +static int rmp_mark_pages_firmware(unsigned long paddr, unsigned int npages, bool locked)

The same comment as above. Better take pfn or page instead of paddr with
redundant conversions.

> +{
> +	/* Cbit maybe set in the paddr */
> +	unsigned long pfn = __sme_clr(paddr) >> PAGE_SHIFT;
> +	int rc, n = 0, i;
> +
> +	for (i = 0; i < npages; i++, n++, pfn++) {
> +		rc = rmp_make_private(pfn, 0, PG_LEVEL_4K, 0, true);
> +		if (rc)
> +			goto cleanup;
> +	}
> +
> +	return 0;
> +
> +cleanup:
> +	/*
> +	 * Try unrolling the firmware state changes by
> +	 * reclaiming the pages which were already changed to the
> +	 * firmware state.
> +	 */
> +	snp_reclaim_pages(paddr, n, locked);
> +
> +	return rc;
> +}
> +
> +static struct page *__snp_alloc_firmware_pages(gfp_t gfp_mask, int order, bool locked)
> +{
> +	unsigned long npages = 1ul << order, paddr;
> +	struct sev_device *sev;
> +	struct page *page;
> +
> +	if (!psp_master || !psp_master->sev_data)
> +		return NULL;
> +
> +	page = alloc_pages(gfp_mask, order);
> +	if (!page)
> +		return NULL;
> +
> +	/* If SEV-SNP is initialized then add the page in RMP table. */
> +	sev = psp_master->sev_data;
> +	if (!sev->snp_initialized)
> +		return page;
> +
> +	paddr = __pa((unsigned long)page_address(page));
> +	if (rmp_mark_pages_firmware(paddr, npages, locked))
> +		return NULL;
> +
> +	return page;
> +}
> +
> +void *snp_alloc_firmware_page(gfp_t gfp_mask)
> +{
> +	struct page *page;
> +
> +	page = __snp_alloc_firmware_pages(gfp_mask, 0, false);
> +
> +	return page ? page_address(page) : NULL;
> +}
> +EXPORT_SYMBOL_GPL(snp_alloc_firmware_page);
> +
> +static void __snp_free_firmware_pages(struct page *page, int order, bool locked)
> +{
> +	struct sev_device *sev = psp_master->sev_data;
> +	unsigned long paddr, npages = 1ul << order;
> +
> +	if (!page)
> +		return;
> +
> +	paddr = __pa((unsigned long)page_address(page));
> +	if (sev->snp_initialized &&
> +	    snp_reclaim_pages(paddr, npages, locked))
> +		return;
> +
> +	__free_pages(page, order);
> +}
> +
> +void snp_free_firmware_page(void *addr)
> +{
> +	if (!addr)
> +		return;
> +
> +	__snp_free_firmware_pages(virt_to_page(addr), 0, false);
> +}
> +EXPORT_SYMBOL_GPL(snp_free_firmware_page);
> +
>  static void *sev_fw_alloc(unsigned long len)
>  {
>  	struct page *page;
>  
> -	page = alloc_pages(GFP_KERNEL, get_order(len));
> +	page = __snp_alloc_firmware_pages(GFP_KERNEL, get_order(len), false);
>  	if (!page)
>  		return NULL;
>  
> @@ -468,7 +598,7 @@ static int __sev_init_locked(int *error)
>  		data.tmr_address = __pa(sev_es_tmr);
>  
>  		data.flags |= SEV_INIT_FLAGS_SEV_ES;
> -		data.tmr_len = SEV_ES_TMR_SIZE;
> +		data.tmr_len = sev_es_tmr_size;
>  	}
>  
>  	return __sev_do_cmd_locked(SEV_CMD_INIT, &data, error);
> @@ -491,7 +621,7 @@ static int __sev_init_ex_locked(int *error)
>  		data.tmr_address = __pa(sev_es_tmr);
>  
>  		data.flags |= SEV_INIT_FLAGS_SEV_ES;
> -		data.tmr_len = SEV_ES_TMR_SIZE;
> +		data.tmr_len = sev_es_tmr_size;
>  	}
>  
>  	return __sev_do_cmd_locked(SEV_CMD_INIT_EX, &data, error);
> @@ -982,6 +1112,8 @@ static int __sev_snp_init_locked(int *error)
>  	sev->snp_initialized = true;
>  	dev_dbg(sev->dev, "SEV-SNP firmware initialized\n");
>  
> +	sev_es_tmr_size = SEV_SNP_ES_TMR_SIZE;
> +
>  	return rc;
>  }
>  
> @@ -1499,8 +1631,9 @@ static void sev_firmware_shutdown(struct sev_device *sev)
>  		/* The TMR area was encrypted, flush it from the cache */
>  		wbinvd_on_all_cpus();
>  
> -		free_pages((unsigned long)sev_es_tmr,
> -			   get_order(SEV_ES_TMR_SIZE));
> +		__snp_free_firmware_pages(virt_to_page(sev_es_tmr),
> +					  get_order(sev_es_tmr_size),
> +					  false);
>  		sev_es_tmr = NULL;
>  	}
>  
> @@ -1511,8 +1644,7 @@ static void sev_firmware_shutdown(struct sev_device *sev)
>  	}
>  
>  	if (snp_range_list) {
> -		free_pages((unsigned long)snp_range_list,
> -			   get_order(PAGE_SIZE));
> +		snp_free_firmware_page(snp_range_list);
>  		snp_range_list = NULL;
>  	}
>  
> @@ -1593,7 +1725,7 @@ void sev_pci_init(void)
>  	}
>  
>  	/* Obtain the TMR memory area for SEV-ES use */
> -	sev_es_tmr = sev_fw_alloc(SEV_ES_TMR_SIZE);
> +	sev_es_tmr = sev_fw_alloc(sev_es_tmr_size);
>  	if (!sev_es_tmr)
>  		dev_warn(sev->dev,
>  			 "SEV: TMR allocation failed, SEV-ES support unavailable\n");
> diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h
> index 8edf5c548fbf..d19744807471 100644
> --- a/include/linux/psp-sev.h
> +++ b/include/linux/psp-sev.h
> @@ -922,6 +922,8 @@ int sev_guest_decommission(struct sev_data_decommission *data, int *error);
>  int sev_do_cmd(int cmd, void *data, int *psp_ret);
>  
>  void *psp_copy_user_blob(u64 uaddr, u32 len);
> +void *snp_alloc_firmware_page(gfp_t mask);
> +void snp_free_firmware_page(void *addr);
>  
>  /**
>   * sev_mark_pages_offline - insert non-reclaimed firmware/guest pages
> @@ -959,6 +961,13 @@ static inline void *psp_copy_user_blob(u64 __user uaddr, u32 len) { return ERR_P
>  
>  void snp_mark_pages_offline(unsigned long pfn, unsigned int npages) {}
>  
> +static inline void *snp_alloc_firmware_page(gfp_t mask)
> +{
> +	return NULL;
> +}
> +
> +static inline void snp_free_firmware_page(void *addr) { }
> +
>  #endif	/* CONFIG_CRYPTO_DEV_SP_PSP */
>  
>  #endif	/* __PSP_SEV_H__ */


^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH RFC v8 54/56] x86/sev: Add KVM commands for instance certs
  2023-02-20 18:38 ` [PATCH RFC v8 54/56] x86/sev: Add KVM commands for instance certs Michael Roth
@ 2023-02-21 12:40   ` Dov Murik
  2023-03-02  0:02   ` Zhi Wang
  2023-03-02 11:34   ` Dov Murik
  2 siblings, 0 replies; 147+ messages in thread
From: Dov Murik @ 2023-02-21 12:40 UTC (permalink / raw)
  To: Michael Roth, kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, tobin, bp, vbabka, kirill, ak,
	tony.luck, marcorr, sathyanarayanan.kuppuswamy, alpergun,
	dgilbert, jarkko, ashish.kalra, nikunj.dadhania, Dionna Glaze,
	Dov Murik

Hi Mike,

On 20/02/2023 20:38, Michael Roth wrote:
> From: Dionna Glaze <dionnaglaze@google.com>
> 
> The /dev/sev device has the ability to store host-wide certificates for
> the key used by the AMD-SP for SEV-SNP attestation report signing,
> but for hosts that want to specify additional certificates that are
> specific to the image launched in a VM, a different way is needed to
> communicate those certificates.
> 
> Add two new KVM ioctl to handle this: KVM_SEV_SNP_{GET,SET}_CERTS
> 
> The certificates that are set with this command are expected to follow
> the same format as the host certificates, but that format is opaque
> to the kernel.
> 
> The new behavior for custom certificates is that the extended guest
> request command will now return the overridden certificates if they
> were installed for the instance. The error condition for a too small
> data buffer is changed to return the overridden certificate data size
> if there is an overridden certificate set installed.
> 
> Setting a 0 length certificate returns the system state to only return
> the host certificates on an extended guest request.
> 
> Also increase the SEV_FW_BLOB_MAX_SIZE another 4K page to allow space
> for an extra certificate.
> 
> Cc: Tom Lendacky <Thomas.Lendacky@amd.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> 
> Signed-off-by: Dionna Glaze <dionnaglaze@google.com>
> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> [mdr: remove used of "we" and "this patch" in commit log]
> Signed-off-by: Michael Roth <michael.roth@amd.com>
> ---
>  arch/x86/kvm/svm/sev.c   | 111 ++++++++++++++++++++++++++++++++++++++-
>  arch/x86/kvm/svm/svm.h   |   1 +
>  include/linux/psp-sev.h  |   2 +-
>  include/uapi/linux/kvm.h |  12 +++++
>  4 files changed, 123 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 70d5650d8d95..18b64b7005e7 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -2089,6 +2089,7 @@ static void *snp_context_create(struct kvm *kvm, struct kvm_sev_cmd *argp)
>  		goto e_free;
>  
>  	sev->snp_certs_data = certs_data;
> +	sev->snp_certs_len = 0;
>  
>  	return context;
>  
> @@ -2404,6 +2405,86 @@ static int snp_launch_finish(struct kvm *kvm, struct kvm_sev_cmd *argp)
>  	return ret;
>  }
>  
> +static int snp_get_instance_certs(struct kvm *kvm, struct kvm_sev_cmd *argp)
> +{
> +	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> +	struct kvm_sev_snp_get_certs params;
> +
> +	if (!sev_snp_guest(kvm))
> +		return -ENOTTY;
> +
> +	if (!sev->snp_context)
> +		return -EINVAL;
> +
> +	if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data,
> +			   sizeof(params)))
> +		return -EFAULT;
> +
> +	/* No instance certs set. */
> +	if (!sev->snp_certs_len)
> +		return -ENOENT;
> +
> +	if (params.certs_len < sev->snp_certs_len) {
> +		/* Output buffer too small. Return the required size. */
> +		params.certs_len = sev->snp_certs_len;
> +
> +		if (copy_to_user((void __user *)(uintptr_t)argp->data, &params,
> +				 sizeof(params)))
> +			return -EFAULT;
> +
> +		return -EINVAL;
> +	}
> +
> +	if (copy_to_user((void __user *)(uintptr_t)params.certs_uaddr,
> +			 sev->snp_certs_data, sev->snp_certs_len))
> +		return -EFAULT;
> +
> +	return 0;
> +}
> +
> +static int snp_set_instance_certs(struct kvm *kvm, struct kvm_sev_cmd *argp)
> +{
> +	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> +	unsigned long length = SEV_FW_BLOB_MAX_SIZE;
> +	void *to_certs = sev->snp_certs_data;
> +	struct kvm_sev_snp_set_certs params;
> +
> +	if (!sev_snp_guest(kvm))
> +		return -ENOTTY;
> +
> +	if (!sev->snp_context)
> +		return -EINVAL;
> +
> +	if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data,
> +			   sizeof(params)))
> +		return -EFAULT;
> +
> +	if (params.certs_len > SEV_FW_BLOB_MAX_SIZE)
> +		return -EINVAL;
> +
> +	/*
> +	 * Setting a length of 0 is the same as "uninstalling" instance-
> +	 * specific certificates.
> +	 */
> +	if (params.certs_len == 0) {
> +		sev->snp_certs_len = 0;
> +		return 0;
> +	}
> +
> +	/* Page-align the length */
> +	length = (params.certs_len + PAGE_SIZE - 1) & PAGE_MASK;
> +

In comments on v7 [1] Dionna agreed to add a cleanup here:

  /* The size could shrink and leave garbage at the end. */
  memset(sev->snp_certs_data, 0, SEV_FW_BLOB_MAX_SIZE);

(we can use 'to_certs' in the first argument)

but it wasn't added in v8.

-Dov

[1] https://lore.kernel.org/linux-coco/CAAH4kHYOtzgqSTZQFcRiZwPLCkLAThjsCMdjUCdsBTiP=W0Vxw@mail.gmail.com/
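
For illustration, the suggested cleanup would sit between the page-align step
quoted above and the copy_from_user() quoted below, roughly (a sketch, not the
actual patch):

	/* Page-align the length */
	length = (params.certs_len + PAGE_SIZE - 1) & PAGE_MASK;

	/* The size could shrink and leave garbage at the end. */
	memset(to_certs, 0, SEV_FW_BLOB_MAX_SIZE);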


> +	if (copy_from_user(to_certs,
> +			   (void __user *)(uintptr_t)params.certs_uaddr,
> +			   params.certs_len)) {
> +		return -EFAULT;
> +	}
> +
> +	sev->snp_certs_len = length;
> +
> +	return 0;
> +}
> +
>  int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
>  {
>  	struct kvm_sev_cmd sev_cmd;
> @@ -2503,6 +2584,12 @@ int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
>  	case KVM_SEV_SNP_LAUNCH_FINISH:
>  		r = snp_launch_finish(kvm, &sev_cmd);
>  		break;
> +	case KVM_SEV_SNP_GET_CERTS:
> +		r = snp_get_instance_certs(kvm, &sev_cmd);
> +		break;
> +	case KVM_SEV_SNP_SET_CERTS:
> +		r = snp_set_instance_certs(kvm, &sev_cmd);
> +		break;
>  	default:
>  		r = -EINVAL;
>  		goto out;
> @@ -3550,8 +3637,28 @@ static void snp_handle_ext_guest_request(struct vcpu_svm *svm, gpa_t req_gpa, gp
>  	if (rc)
>  		goto unlock;
>  
> -	rc = snp_guest_ext_guest_request(&req, (unsigned long)sev->snp_certs_data,
> -					 &data_npages, &err);
> +	/*
> +	 * If the VMM has overridden the certs, then change the error message
> +	 * if the size is inappropriate for the override. Otherwise, use a
> +	 * regular guest request and copy back the instance certs.
> +	 */
> +	if (sev->snp_certs_len) {
> +		if ((data_npages << PAGE_SHIFT) < sev->snp_certs_len) {
> +			rc = -EINVAL;
> +			err = SNP_GUEST_REQ_INVALID_LEN;
> +			goto datalen;
> +		}
> +		rc = sev_issue_cmd(kvm, SEV_CMD_SNP_GUEST_REQUEST, &req,
> +				   (int *)&err);
> +	} else {
> +		rc = snp_guest_ext_guest_request(&req,
> +						 (unsigned long)sev->snp_certs_data,
> +						 &data_npages, &err);
> +	}
> +datalen:
> +	if (sev->snp_certs_len)
> +		data_npages = sev->snp_certs_len >> PAGE_SHIFT;
> +
>  	if (rc) {
>  		/*
>  		 * If buffer length is small then return the expected
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index 221b38d3c845..dced46559508 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -94,6 +94,7 @@ struct kvm_sev_info {
>  	u64 snp_init_flags;
>  	void *snp_context;      /* SNP guest context page */
>  	void *snp_certs_data;
> +	unsigned int snp_certs_len; /* Size of instance override for certs */
>  	struct mutex guest_req_lock; /* Lock for guest request handling */
>  
>  	u64 sev_features;	/* Features set at VMSA creation */
> diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h
> index 92116e2b74fd..3b28b78938f6 100644
> --- a/include/linux/psp-sev.h
> +++ b/include/linux/psp-sev.h
> @@ -22,7 +22,7 @@
>  #define __psp_pa(x)	__pa(x)
>  #endif
>  
> -#define SEV_FW_BLOB_MAX_SIZE	0x4000	/* 16KB */
> +#define SEV_FW_BLOB_MAX_SIZE	0x5000	/* 20KB */
>  
>  /**
>   * SEV platform state
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 6e684bf5f723..ad7e24e43547 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1928,6 +1928,8 @@ enum sev_cmd_id {
>  	KVM_SEV_SNP_LAUNCH_START,
>  	KVM_SEV_SNP_LAUNCH_UPDATE,
>  	KVM_SEV_SNP_LAUNCH_FINISH,
> +	KVM_SEV_SNP_GET_CERTS,
> +	KVM_SEV_SNP_SET_CERTS,
>  
>  	KVM_SEV_NR_MAX,
>  };
> @@ -2075,6 +2077,16 @@ struct kvm_sev_snp_launch_finish {
>  	__u8 pad[6];
>  };
>  
> +struct kvm_sev_snp_get_certs {
> +	__u64 certs_uaddr;
> +	__u64 certs_len;
> +};
> +
> +struct kvm_sev_snp_set_certs {
> +	__u64 certs_uaddr;
> +	__u64 certs_len;
> +};
> +
>  #define KVM_DEV_ASSIGN_ENABLE_IOMMU	(1 << 0)
>  #define KVM_DEV_ASSIGN_PCI_2_3		(1 << 1)
>  #define KVM_DEV_ASSIGN_MASK_INTX	(1 << 2)

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH RFC v8 24/56] crypto: ccp: Handle the legacy TMR allocation when SNP is enabled
  2023-02-21  9:28   ` Zhi Wang
@ 2023-02-21 15:31     ` Kalra, Ashish
  2023-02-21 21:15       ` Zhi Wang
  0 siblings, 1 reply; 147+ messages in thread
From: Kalra, Ashish @ 2023-02-21 15:31 UTC (permalink / raw)
  To: Zhi Wang, Michael Roth
  Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, nikunj.dadhania, Brijesh Singh

>> +static int snp_reclaim_pages(unsigned long paddr, unsigned int npages, bool locked)
>> +{
>> +	/* Cbit maybe set in the paddr */
> 
> This is confusing.
> 
> I suppose C-bit is treated as an attribute of PTE in the kernel not part of the
> PA. It means only a PTE might carry a C-bit.
> 

snp_reclaim_pages() is also called for reclaiming guest memory, in which 
case the (guest) paddr will have the C-bit set. Hence this C-bit 
handling is done within snp_reclaim_pages() so that the callers don't 
need to handle it explicitly.
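
For illustration only (the bit position and addresses here are made up; the
real C-bit position comes from CPUID 0x8000001F and is folded into
sme_me_mask), the __sme_clr() in the quoted code makes both forms land on the
same pfn:

	/* host paddr from __pa(page_address(page)): C-bit clear */
	pfn = __sme_clr(0x0000000123456000UL) >> PAGE_SHIFT;	/* 0x123456 */

	/* same frame seen as a guest paddr with the C-bit (say, bit 51) set */
	pfn = __sme_clr(0x0008000123456000UL) >> PAGE_SHIFT;	/* still 0x123456 */

so callers can pass either form and the masking is handled in one place.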


> The paddr is from __pa(page_address()). It is not extracted from a PTE. Thus, the
> return from them should never have a C-bit.
> 
> BTW: Wouldn't it be better to have pfn as input param instead of paddr?
> 
> The caller has struct page, calling snp_reclaim_pages(page_to_pfn(page), xxxxx)
> would be much clearer than the current conversion:
> page_address() (struct page is converted to VA), __pa() (VA is converted to PA)
> in the caller and then PA is converted to pfn here.
> 
>> +	unsigned long pfn = __sme_clr(paddr) >> PAGE_SHIFT;
>> +	int ret, err, i, n = 0;
>> +
> 
> should be unsigned int i, n; as the input param npage is unsigned int.
> 
>> +	if (!pfn_valid(pfn)) {
>> +		pr_err("%s: Invalid PFN %lx\n", __func__, pfn);
>> +		return 0;
>> +	}
>> +
>> +	for (i = 0; i < npages; i++, pfn++, n++) {
>> +		paddr = pfn << PAGE_SHIFT;
>> +
>> +		if (locked)
>> +			ret = __sev_do_cmd_locked(SEV_CMD_SNP_PAGE_RECLAIM, &paddr, &err);
>> +		else
>> +			ret = sev_do_cmd(SEV_CMD_SNP_PAGE_RECLAIM, &paddr, &err);
>> +
>> +		if (ret)
>> +			goto cleanup;
>> +
>> +		ret = rmp_make_shared(pfn, PG_LEVEL_4K);
>> +		if (ret)
>> +			goto cleanup;
>> +	}
>> +
>> +	return 0;
>> +
>> +cleanup:
>> +	/*
>> +	 * If failed to reclaim the page then page is no longer safe to
>> +	 * be release back to the system, leak it.
>> +	 */
>> +	snp_mark_pages_offline(pfn, npages - n);
>> +	return ret;
>> +}
>> +
>> +static int rmp_mark_pages_firmware(unsigned long paddr, unsigned int npages, bool locked)
> 
> The same comment as above. Better take pfn or page instead of paddr with
> redundant conversions.
> 

Again, the paddr can point to guest memory so it can have C-bit set.

Thanks,
Ashish

>> +{
>> +	/* Cbit maybe set in the paddr */
>> +	unsigned long pfn = __sme_clr(paddr) >> PAGE_SHIFT;
>> +	int rc, n = 0, i;
>> +
>> +	for (i = 0; i < npages; i++, n++, pfn++) {
>> +		rc = rmp_make_private(pfn, 0, PG_LEVEL_4K, 0, true);
>> +		if (rc)
>> +			goto cleanup;
>> +	}
>> +
>> +	return 0;
>> +
>> +cleanup:
>> +	/*
>> +	 * Try unrolling the firmware state changes by
>> +	 * reclaiming the pages which were already changed to the
>> +	 * firmware state.
>> +	 */
>> +	snp_reclaim_pages(paddr, n, locked);
>> +
>> +	return rc;
>> +}
>> +

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH RFC v8 24/56] crypto: ccp: Handle the legacy TMR allocation when SNP is enabled
  2023-02-21 15:31     ` Kalra, Ashish
@ 2023-02-21 21:15       ` Zhi Wang
  2023-02-21 22:06         ` Kalra, Ashish
  0 siblings, 1 reply; 147+ messages in thread
From: Zhi Wang @ 2023-02-21 21:15 UTC (permalink / raw)
  To: Kalra, Ashish
  Cc: Michael Roth, kvm, linux-coco, linux-mm, linux-crypto, x86,
	linux-kernel, tglx, mingo, jroedel, thomas.lendacky, hpa, ardb,
	pbonzini, seanjc, vkuznets, jmattson, luto, dave.hansen, slp,
	pgonda, peterz, srinivas.pandruvada, rientjes, dovmurik, tobin,
	bp, vbabka, kirill, ak, tony.luck, marcorr,
	sathyanarayanan.kuppuswamy, alpergun, dgilbert, jarkko,
	nikunj.dadhania, Brijesh Singh

On Tue, 21 Feb 2023 09:31:01 -0600
"Kalra, Ashish" <ashish.kalra@amd.com> wrote:

> >> +static int snp_reclaim_pages(unsigned long paddr, unsigned int npages, bool locked)
> >> +{
> >> +	/* Cbit maybe set in the paddr */
> > 
> > This is confusing.
> > 
> > I suppose C-bit is treated as an attribute of PTE in the kernel not part of the
> > PA. It means only a PTE might carry a C-bit.
> > 
> 
> snp_reclaim_pages() is also called for reclaiming guest memory, in which 
> case the (guest) paddr will have the C-bit set. Hence this C-bit 
> handling is done within snp_reclaim_pages() so that the callers don't 
> need to handle it explicitly.

Thanks for the explanation.

Do you mean it will be used like that in the later patch? Sorry if it is in the
later patch as I was making progress slowly. It is quite a big patch set.

At least, I don't see that kind of usage in the current patch. Feel free to
correct me if I am wrong.

The call chains:

__snp_free_firmware_page()
    snp_reclaim_pages();

As __snp_free_firmware_page() takes a struct page*, all the following conversions
from it would not carry the C-bit.

__snp_alloc_firmware_pages()
  rmp_mark_pages_firmware()
    snp_reclaim_pages()

As __snp_alloc_firmware_page() allocates page with struct page*, the same
conclusion as above.

> 
> 
> > The paddr is from __pa(page_address()). It is not extracted from a PTE. Thus, the
> > return from them should never have a C-bit.
> > 
> > BTW: Wouldn't it be better to have pfn as input param instead of paddr?
> > 
> > The caller has struct page, calling snp_reclaim_pages(page_to_pfn(page), xxxxx)
> > would be much clearer than the current conversion:
> > page_address() (struct page is converted to VA), __pa() (VA is converted to PA)
> > in the caller and then PA is converted to pfn here.
> > 
> >> +	unsigned long pfn = __sme_clr(paddr) >> PAGE_SHIFT;
> >> +	int ret, err, i, n = 0;
> >> +
> > 
> > should be unsigned int i, n; as the input param npage is unsigned int.
> > 
> >> +	if (!pfn_valid(pfn)) {
> >> +		pr_err("%s: Invalid PFN %lx\n", __func__, pfn);
> >> +		return 0;
> >> +	}
> >> +
> >> +	for (i = 0; i < npages; i++, pfn++, n++) {
> >> +		paddr = pfn << PAGE_SHIFT;
> >> +
> >> +		if (locked)
> >> +			ret = __sev_do_cmd_locked(SEV_CMD_SNP_PAGE_RECLAIM, &paddr, &err);
> >> +		else
> >> +			ret = sev_do_cmd(SEV_CMD_SNP_PAGE_RECLAIM, &paddr, &err);
> >> +
> >> +		if (ret)
> >> +			goto cleanup;
> >> +
> >> +		ret = rmp_make_shared(pfn, PG_LEVEL_4K);
> >> +		if (ret)
> >> +			goto cleanup;
> >> +	}
> >> +
> >> +	return 0;
> >> +
> >> +cleanup:
> >> +	/*
> >> +	 * If failed to reclaim the page then page is no longer safe to
> >> +	 * be release back to the system, leak it.
> >> +	 */
> >> +	snp_mark_pages_offline(pfn, npages - n);
> >> +	return ret;
> >> +}
> >> +
> >> +static int rmp_mark_pages_firmware(unsigned long paddr, unsigned int npages, bool locked)
> > 
> > The same comment as above. Better take pfn or page instead of paddr with
> > redundant conversions.
> > 
> 
> Again, the paddr can point to guest memory so it can have C-bit set.
> 
> Thanks,
> Ashish
> 
> >> +{
> >> +	/* Cbit maybe set in the paddr */
> >> +	unsigned long pfn = __sme_clr(paddr) >> PAGE_SHIFT;
> >> +	int rc, n = 0, i;
> >> +
> >> +	for (i = 0; i < npages; i++, n++, pfn++) {
> >> +		rc = rmp_make_private(pfn, 0, PG_LEVEL_4K, 0, true);
> >> +		if (rc)
> >> +			goto cleanup;
> >> +	}
> >> +
> >> +	return 0;
> >> +
> >> +cleanup:
> >> +	/*
> >> +	 * Try unrolling the firmware state changes by
> >> +	 * reclaiming the pages which were already changed to the
> >> +	 * firmware state.
> >> +	 */
> >> +	snp_reclaim_pages(paddr, n, locked);
> >> +
> >> +	return rc;
> >> +}
> >> +


^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH RFC v8 10/56] x86/cpufeatures: Add SEV-SNP CPU feature
  2023-02-20 18:38 ` [PATCH RFC v8 10/56] x86/cpufeatures: Add SEV-SNP CPU feature Michael Roth
@ 2023-02-21 21:21   ` Sathyanarayanan Kuppuswamy
  2023-02-22 23:27     ` Kalra, Ashish
  0 siblings, 1 reply; 147+ messages in thread
From: Sathyanarayanan Kuppuswamy @ 2023-02-21 21:21 UTC (permalink / raw)
  To: Michael Roth, kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, alpergun, dgilbert, jarkko,
	ashish.kalra, nikunj.dadhania, Brijesh Singh, Jarkko Sakkinen



On 2/20/23 10:38 AM, Michael Roth wrote:
> From: Brijesh Singh <brijesh.singh@amd.com>
> 
> Add CPU feature detection for Secure Encrypted Virtualization with
> Secure Nested Paging. This feature adds a strong memory integrity
> protection to help prevent malicious hypervisor-based attacks like
> data replay, memory re-mapping, and more.
> 
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> Signed-off-by: Jarkko Sakkinen <jarkko@profian.com>
> Signed-off-by: Ashish Kalra <Ashish.Kalra@amd.com>

Too many signed-off-by's. Are you missing Co-developed-by?
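
(For reference, and assuming Jarkko and Ashish actually co-developed the change
rather than only handled it, Documentation/process/submitting-patches.rst asks
for each Co-developed-by: to be immediately followed by that co-author's
Signed-off-by:, with the submitter's Signed-off-by: last, e.g.:

  Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
  Co-developed-by: Jarkko Sakkinen <jarkko@profian.com>
  Signed-off-by: Jarkko Sakkinen <jarkko@profian.com>
  Co-developed-by: Ashish Kalra <Ashish.Kalra@amd.com>
  Signed-off-by: Ashish Kalra <Ashish.Kalra@amd.com>
  Signed-off-by: Michael Roth <michael.roth@amd.com>
)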

> Signed-off-by: Michael Roth <michael.roth@amd.com>
> ---
>  arch/x86/include/asm/cpufeatures.h       | 1 +
>  arch/x86/kernel/cpu/amd.c                | 5 +++--
>  tools/arch/x86/include/asm/cpufeatures.h | 1 +
>  3 files changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
> index 1419c4e04d45..480b4eaef310 100644
> --- a/arch/x86/include/asm/cpufeatures.h
> +++ b/arch/x86/include/asm/cpufeatures.h
> @@ -420,6 +420,7 @@
>  #define X86_FEATURE_SEV			(19*32+ 1) /* AMD Secure Encrypted Virtualization */
>  #define X86_FEATURE_VM_PAGE_FLUSH	(19*32+ 2) /* "" VM Page Flush MSR is supported */
>  #define X86_FEATURE_SEV_ES		(19*32+ 3) /* AMD Secure Encrypted Virtualization - Encrypted State */
> +#define X86_FEATURE_SEV_SNP		(19*32+ 4) /* AMD Secure Encrypted Virtualization - Secure Nested Paging */
>  #define X86_FEATURE_V_TSC_AUX		(19*32+ 9) /* "" Virtual TSC_AUX */
>  #define X86_FEATURE_SME_COHERENT	(19*32+10) /* "" AMD hardware-enforced cache coherency */
>  
> diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
> index 860b60273df3..c7884198ad5b 100644
> --- a/arch/x86/kernel/cpu/amd.c
> +++ b/arch/x86/kernel/cpu/amd.c
> @@ -558,8 +558,8 @@ static void early_detect_mem_encrypt(struct cpuinfo_x86 *c)
>  	 *	      SME feature (set in scattered.c).
>  	 *	      If the kernel has not enabled SME via any means then
>  	 *	      don't advertise the SME feature.
> -	 *   For SEV: If BIOS has not enabled SEV then don't advertise the
> -	 *            SEV and SEV_ES feature (set in scattered.c).

Did you remove the related scattered.c code mentioned above in a different patch?

> +	 *   For SEV: If BIOS has not enabled SEV then don't advertise SEV and
> +	 *	      any additional functionality based on it.
>  	 *
>  	 *   In all cases, since support for SME and SEV requires long mode,
>  	 *   don't advertise the feature under CONFIG_X86_32.
> @@ -594,6 +594,7 @@ static void early_detect_mem_encrypt(struct cpuinfo_x86 *c)
>  clear_sev:
>  		setup_clear_cpu_cap(X86_FEATURE_SEV);
>  		setup_clear_cpu_cap(X86_FEATURE_SEV_ES);
> +		setup_clear_cpu_cap(X86_FEATURE_SEV_SNP);
>  	}
>  }
>  
> diff --git a/tools/arch/x86/include/asm/cpufeatures.h b/tools/arch/x86/include/asm/cpufeatures.h
> index b71f4f2ecdd5..e81606fcd2ab 100644
> --- a/tools/arch/x86/include/asm/cpufeatures.h
> +++ b/tools/arch/x86/include/asm/cpufeatures.h
> @@ -417,6 +417,7 @@
>  #define X86_FEATURE_SEV			(19*32+ 1) /* AMD Secure Encrypted Virtualization */
>  #define X86_FEATURE_VM_PAGE_FLUSH	(19*32+ 2) /* "" VM Page Flush MSR is supported */
>  #define X86_FEATURE_SEV_ES		(19*32+ 3) /* AMD Secure Encrypted Virtualization - Encrypted State */
> +#define X86_FEATURE_SEV_SNP		(19*32+ 4) /* AMD Secure Encrypted Virtualization - Secure Nested Paging */
>  #define X86_FEATURE_V_TSC_AUX		(19*32+ 9) /* "" Virtual TSC_AUX */
>  #define X86_FEATURE_SME_COHERENT	(19*32+10) /* "" AMD hardware-enforced cache coherency */
>  

-- 
Sathyanarayanan Kuppuswamy
Linux Kernel Developer

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH RFC v8 24/56] crypto: ccp: Handle the legacy TMR allocation when SNP is enabled
  2023-02-21 21:15       ` Zhi Wang
@ 2023-02-21 22:06         ` Kalra, Ashish
  0 siblings, 0 replies; 147+ messages in thread
From: Kalra, Ashish @ 2023-02-21 22:06 UTC (permalink / raw)
  To: Zhi Wang
  Cc: Michael Roth, kvm, linux-coco, linux-mm, linux-crypto, x86,
	linux-kernel, tglx, mingo, jroedel, thomas.lendacky, hpa, ardb,
	pbonzini, seanjc, vkuznets, jmattson, luto, dave.hansen, slp,
	pgonda, peterz, srinivas.pandruvada, rientjes, dovmurik, tobin,
	bp, vbabka, kirill, ak, tony.luck, marcorr,
	sathyanarayanan.kuppuswamy, alpergun, dgilbert, jarkko,
	nikunj.dadhania, Brijesh Singh

On 2/21/2023 3:15 PM, Zhi Wang wrote:
> On Tue, 21 Feb 2023 09:31:01 -0600
> "Kalra, Ashish" <ashish.kalra@amd.com> wrote:
> 
>>>> +static int snp_reclaim_pages(unsigned long paddr, unsigned int npages, bool locked)
>>>> +{
>>>> +	/* Cbit maybe set in the paddr */
>>>
>>> This is confusing.
>>>
> >> I suppose C-bit is treated as an attribute of PTE in the kernel not part of the
>>> PA. It means only a PTE might carry a C-bit.
>>>
>>
>> snp_reclaim_pages() is also called for reclaiming guest memory, in which
>> case the (guest) paddr will have the C-bit set. Hence this C-bit
>> handling is done within snp_reclaim_pages() so that the callers don't
>> need to handle it explicitly.
> 
> Thanks for the explanation.
> 
> Do you mean it will be used like that in the later patch? Sorry if it is in the
> later patch as I was making progress slowly. It is quite a big patch set.
>

Yes, these are callers in later patches, like the following code path in 
patch 25:

static int unmap_firmware_writeable(u64 *paddr, u32 len, bool guest,
				    struct snp_host_map *map)
{
	unsigned int npages = PAGE_ALIGN(len) >> PAGE_SHIFT;
	...
	/* If paddr points to a guest memory then restore the page state to hypervisor. */
	if (guest) {
		if (snp_reclaim_pages(*paddr, npages, true))
			return -EFAULT;

		goto done;
	}

	...
	...

Or, the following as part of patch 52:

int snp_guest_dbg_decrypt_page(u64 gctx_pfn, u64 src_pfn, u64 dst_pfn,
			       int *error)
{
	...
	data.gctx_paddr = sme_me_mask | (gctx_pfn << PAGE_SHIFT);
	data.src_addr = sme_me_mask | (src_pfn << PAGE_SHIFT);
	data.dst_addr = sme_me_mask | (dst_pfn << PAGE_SHIFT);

	/* The destination page must be in the firmware state. */
	if (rmp_mark_pages_firmware(data.dst_addr, 1, false))
		return -EIO;

	ret = sev_do_cmd(SEV_CMD_SNP_DBG_DECRYPT, &data, error);

	/* Restore the page state */
	if (snp_reclaim_pages(data.dst_addr, 1, false))
	...
	...

Thanks,
Ashish

> At least, I don't see that kind of usage in the current patch. Feel free to
> correct me if I am wrong.
> 
> The call chains:
> 
> __snp_free_firmware_page()
>      snp_reclaim_pages();
> 
> As __snp_free_firmware_page() takes a struct page*, all the following conversions
> from it would not carry the C-bit.
> 
> __snp_alloc_firmware_pages()
>    rmp_mark_pages_firmware()
>      snp_reclaim_pages()
> 
> As __snp_alloc_firmware_page() allocates page with struct page*, the same
> conclusion as above.
> 
>>
>>
>>> The paddr is from __pa(page_address()). It is not extracted from a PTE. Thus, the
>>> return from them should never have a C-bit.
>>>
>>> BTW: Wouldn't it be better to have pfn as input param instead of paddr?
>>>
>>> The caller has struct page, calling snp_reclaim_pages(page_to_pfn(page), xxxxx)
>>> would be much clearer than the current conversion:
>>> page_address() (struct page is converted to VA), __pa() (VA is converted to PA)
>>> in the caller and then PA is converted to pfn here.
>>>
>>>> +	unsigned long pfn = __sme_clr(paddr) >> PAGE_SHIFT;
>>>> +	int ret, err, i, n = 0;
>>>> +
>>>
>>> should be unsigned int i, n; as the input param npage is unsigned int.
>>>
>>>> +	if (!pfn_valid(pfn)) {
>>>> +		pr_err("%s: Invalid PFN %lx\n", __func__, pfn);
>>>> +		return 0;
>>>> +	}
>>>> +
>>>> +	for (i = 0; i < npages; i++, pfn++, n++) {
>>>> +		paddr = pfn << PAGE_SHIFT;
>>>> +
>>>> +		if (locked)
>>>> +			ret = __sev_do_cmd_locked(SEV_CMD_SNP_PAGE_RECLAIM, &paddr, &err);
>>>> +		else
>>>> +			ret = sev_do_cmd(SEV_CMD_SNP_PAGE_RECLAIM, &paddr, &err);
>>>> +
>>>> +		if (ret)
>>>> +			goto cleanup;
>>>> +
>>>> +		ret = rmp_make_shared(pfn, PG_LEVEL_4K);
>>>> +		if (ret)
>>>> +			goto cleanup;
>>>> +	}
>>>> +
>>>> +	return 0;
>>>> +
>>>> +cleanup:
>>>> +	/*
>>>> +	 * If failed to reclaim the page then page is no longer safe to
>>>> +	 * be release back to the system, leak it.
>>>> +	 */
>>>> +	snp_mark_pages_offline(pfn, npages - n);
>>>> +	return ret;
>>>> +}
>>>> +
>>>> +static int rmp_mark_pages_firmware(unsigned long paddr, unsigned int npages, bool locked)
>>>
>>> The same comment as above. Better take pfn or page instead of paddr with
>>> redundant conversions.
>>>
>>
>> Again, the paddr can point to guest memory so it can have C-bit set.
>>
>> Thanks,
>> Ashish
>>
>>>> +{
>>>> +	/* Cbit maybe set in the paddr */
>>>> +	unsigned long pfn = __sme_clr(paddr) >> PAGE_SHIFT;
>>>> +	int rc, n = 0, i;
>>>> +
>>>> +	for (i = 0; i < npages; i++, n++, pfn++) {
>>>> +		rc = rmp_make_private(pfn, 0, PG_LEVEL_4K, 0, true);
>>>> +		if (rc)
>>>> +			goto cleanup;
>>>> +	}
>>>> +
>>>> +	return 0;
>>>> +
>>>> +cleanup:
>>>> +	/*
>>>> +	 * Try unrolling the firmware state changes by
>>>> +	 * reclaiming the pages which were already changed to the
>>>> +	 * firmware state.
>>>> +	 */
>>>> +	snp_reclaim_pages(paddr, n, locked);
>>>> +
>>>> +	return rc;
>>>> +}
>>>> +
> 

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH RFC v8 27/56] crypto: ccp: Add the SNP_{SET,GET}_EXT_CONFIG command
  2023-02-20 18:38 ` [PATCH RFC v8 27/56] crypto: ccp: Add the SNP_{SET,GET}_EXT_CONFIG command Michael Roth
@ 2023-02-22 12:32   ` Zhi Wang
  2023-02-22 16:50     ` Tom Lendacky
  2023-02-22 22:43     ` Kalra, Ashish
  0 siblings, 2 replies; 147+ messages in thread
From: Zhi Wang @ 2023-02-22 12:32 UTC (permalink / raw)
  To: Michael Roth
  Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	Brijesh Singh

On Mon, 20 Feb 2023 12:38:18 -0600
Michael Roth <michael.roth@amd.com> wrote:

> From: Brijesh Singh <brijesh.singh@amd.com>
> 
> The SEV-SNP firmware provides the SNP_CONFIG command used to set the
> system-wide configuration value for SNP guests. The information includes
> the TCB version string to be reported in guest attestation reports.
> 
> Version 2 of the GHCB specification adds an NAE (SNP extended guest
> request) that a guest can use to query the reports that include additional
> certificates.
> 
> In both cases, userspace provided additional data is included in the
> attestation reports. The userspace will use the SNP_SET_EXT_CONFIG
> command to give the certificate blob and the reported TCB version string
> at once. Note that the specification defines certificate blob with a
> specific GUID format; the userspace is responsible for building the
> proper certificate blob. The ioctl treats it as an opaque blob.
> 
> While it is not defined in the spec, let's add an SNP_GET_EXT_CONFIG
> command that can be used to obtain the data programmed through the
> SNP_SET_EXT_CONFIG.
> 
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> Signed-off-by: Michael Roth <michael.roth@amd.com>
> ---
>  Documentation/virt/coco/sev-guest.rst |  27 ++++++
>  drivers/crypto/ccp/sev-dev.c          | 123 ++++++++++++++++++++++++++
>  drivers/crypto/ccp/sev-dev.h          |   4 +
>  include/uapi/linux/psp-sev.h          |  17 ++++
>  4 files changed, 171 insertions(+)
> 
> diff --git a/Documentation/virt/coco/sev-guest.rst b/Documentation/virt/coco/sev-guest.rst
> index 11ea67c944df..6cad4226c348 100644
> --- a/Documentation/virt/coco/sev-guest.rst
> +++ b/Documentation/virt/coco/sev-guest.rst
> @@ -145,6 +145,33 @@ The SNP_PLATFORM_STATUS command is used to query the SNP platform status. The
>  status includes API major, minor version and more. See the SEV-SNP
>  specification for further details.
>  
> +2.5 SNP_SET_EXT_CONFIG
> +----------------------
> +:Technology: sev-snp
> +:Type: hypervisor ioctl cmd
> +:Parameters (in): struct sev_data_snp_ext_config
> +:Returns (out): 0 on success, -negative on error
> +
> +The SNP_SET_EXT_CONFIG is used to set the system-wide configuration such as
> +reported TCB version in the attestation report. The command is similar to
> +SNP_CONFIG command defined in the SEV-SNP spec. The main difference is the
> +command also accepts an additional certificate blob defined in the GHCB
> +specification.
> +
> +If the certs_address is zero, then the previous certificate blob will be deleted.
> +For more information on the certificate blob layout, see the GHCB spec
> +(extended guest request message).
> +
> +2.6 SNP_GET_EXT_CONFIG
> +----------------------
> +:Technology: sev-snp
> +:Type: hypervisor ioctl cmd
> +:Parameters (in): struct sev_data_snp_ext_config
> +:Returns (out): 0 on success, -negative on error
> +
> +The SNP_GET_EXT_CONFIG is used to query the system-wide configuration set
> +through the SNP_SET_EXT_CONFIG.
> +
>  3. SEV-SNP CPUID Enforcement
>  ============================
>  
> diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
> index 65e13a562f3b..b56b00ca2cd4 100644
> --- a/drivers/crypto/ccp/sev-dev.c
> +++ b/drivers/crypto/ccp/sev-dev.c
> @@ -1481,6 +1481,10 @@ static int __sev_snp_shutdown_locked(int *error)
>  	data.length = sizeof(data);
>  	data.iommu_snp_shutdown = 1;
>  
> +	/* Free the memory used for caching the certificate data */
> +	kfree(sev->snp_certs_data);
> +	sev->snp_certs_data = NULL;
> +
>  	wbinvd_on_all_cpus();
>  
>  retry:
> @@ -1793,6 +1797,118 @@ static int sev_ioctl_snp_platform_status(struct sev_issue_cmd *argp)
>  	return ret;
>  }
>  
> +static int sev_ioctl_snp_get_config(struct sev_issue_cmd *argp)
> +{
> +	struct sev_device *sev = psp_master->sev_data;
> +	struct sev_user_data_ext_snp_config input;
> +	int ret;
> +
> +	if (!sev->snp_initialized || !argp->data)
> +		return -EINVAL;
> +
> +	memset(&input, 0, sizeof(input));
> +
> +	if (copy_from_user(&input, (void __user *)argp->data, sizeof(input)))
> +		return -EFAULT;
> +
> +	/* Copy the TCB version programmed through the SET_CONFIG to userspace */
> +	if (input.config_address) {
> +		if (copy_to_user((void * __user)input.config_address,
> +				 &sev->snp_config, sizeof(struct sev_user_data_snp_config)))
> +			return -EFAULT;
> +	}
> +
> +	/* Copy the extended certs programmed through the SNP_SET_CONFIG */
> +	if (input.certs_address && sev->snp_certs_data) {
> +		if (input.certs_len < sev->snp_certs_len) {
> +			/* Return the certs length to userspace */
> +			input.certs_len = sev->snp_certs_len;
> +
> +			ret = -ENOSR;
> +			goto e_done;
> +		}
> +

What happens if input.certs_len > sev->snp_certs_len? Is it possible for the
userspace to know the length of data in the buffer? (I guess it might be able
to know the certs len through the blob data, but a comment here would be nice)

> +		if (copy_to_user((void * __user)input.certs_address,
> +				 sev->snp_certs_data, sev->snp_certs_len))
> +			return -EFAULT;
> +	}
> +
> +	ret = 0;
> +
> +e_done:
> +	if (copy_to_user((void __user *)argp->data, &input, sizeof(input)))
> +		ret = -EFAULT;
> +
> +	return ret;
> +}
> +
> +static int sev_ioctl_snp_set_config(struct sev_issue_cmd *argp, bool writable)
> +{
> +	struct sev_device *sev = psp_master->sev_data;
> +	struct sev_user_data_ext_snp_config input;
> +	struct sev_user_data_snp_config config;
> +	void *certs = NULL;
> +	int ret = 0;
> +
> +	if (!sev->snp_initialized || !argp->data)
> +		return -EINVAL;
> +
> +	if (!writable)
> +		return -EPERM;
> +
> +	memset(&input, 0, sizeof(input));
> +
> +	if (copy_from_user(&input, (void __user *)argp->data, sizeof(input)))
> +		return -EFAULT;
> +
> +	/* Copy the certs from userspace */
> +	if (input.certs_address) {
> +		if (!input.certs_len || !IS_ALIGNED(input.certs_len, PAGE_SIZE))
> +			return -EINVAL;
> +
> +		certs = psp_copy_user_blob(input.certs_address, input.certs_len);
> +		if (IS_ERR(certs))
> +			return PTR_ERR(certs);
> +	}
> +
> +	/* Issue the PSP command to update the TCB version using the SNP_CONFIG. */
> +	if (input.config_address) {
> +		memset(&config, 0, sizeof(config));
> +		if (copy_from_user(&config,
> +				   (void __user *)input.config_address, sizeof(config))) {
> +			ret = -EFAULT;
> +			goto e_free;
> +		}
> +
> +		ret = __sev_do_cmd_locked(SEV_CMD_SNP_CONFIG, &config, &argp->error);
> +		if (ret)
> +			goto e_free;
> +
> +		memcpy(&sev->snp_config, &config, sizeof(config));
> +	}
> +
> +	/*
> +	 * If the new certs are passed then cache it else free the old certs.
> +	 */
> +	mutex_lock(&sev->snp_certs_lock);
> +	if (certs) {
> +		kfree(sev->snp_certs_data);
> +		sev->snp_certs_data = certs;
> +		sev->snp_certs_len = input.certs_len;
> +	} else {
> +		kfree(sev->snp_certs_data);
> +		sev->snp_certs_data = NULL;
> +		sev->snp_certs_len = 0;
> +	}
> +	mutex_unlock(&sev->snp_certs_lock);
> +
> +	return 0;
> +
> +e_free:
> +	kfree(certs);
> +	return ret;
> +}
> +
>  static long sev_ioctl(struct file *file, unsigned int ioctl, unsigned long arg)
>  {
>  	void __user *argp = (void __user *)arg;
> @@ -1847,6 +1963,12 @@ static long sev_ioctl(struct file *file, unsigned int ioctl, unsigned long arg)
>  	case SNP_PLATFORM_STATUS:
>  		ret = sev_ioctl_snp_platform_status(&input);
>  		break;
> +	case SNP_SET_EXT_CONFIG:
> +		ret = sev_ioctl_snp_set_config(&input, writable);
> +		break;
> +	case SNP_GET_EXT_CONFIG:
> +		ret = sev_ioctl_snp_get_config(&input);
> +		break;
>  	default:
>  		ret = -EINVAL;
>  		goto out;
> @@ -1962,6 +2084,7 @@ int sev_dev_init(struct psp_device *psp)
>  		goto e_sev;
>  
>  	sev->cmd_buf_backup = (uint8_t *)sev->cmd_buf + PAGE_SIZE;
> +	mutex_init(&sev->snp_certs_lock);
>  
>  	psp->sev_data = sev;
>  
> diff --git a/drivers/crypto/ccp/sev-dev.h b/drivers/crypto/ccp/sev-dev.h
> index 19d79f9d4212..41d5353d5bab 100644
> --- a/drivers/crypto/ccp/sev-dev.h
> +++ b/drivers/crypto/ccp/sev-dev.h
> @@ -66,6 +66,10 @@ struct sev_device {
>  
>  	bool snp_initialized;
>  	struct snp_host_map snp_host_map[MAX_SNP_HOST_MAP_BUFS];
> +	void *snp_certs_data;
> +	u32 snp_certs_len;
> +	struct mutex snp_certs_lock;
> +	struct sev_user_data_snp_config snp_config;
>  };
>  
>  int sev_dev_init(struct psp_device *psp);
> diff --git a/include/uapi/linux/psp-sev.h b/include/uapi/linux/psp-sev.h
> index 5adfaea7df97..c20d37586d21 100644
> --- a/include/uapi/linux/psp-sev.h
> +++ b/include/uapi/linux/psp-sev.h
> @@ -29,6 +29,8 @@ enum {
>  	SEV_GET_ID,	/* This command is deprecated, use SEV_GET_ID2 */
>  	SEV_GET_ID2,
>  	SNP_PLATFORM_STATUS,
> +	SNP_SET_EXT_CONFIG,
> +	SNP_GET_EXT_CONFIG,
>  
>  	SEV_MAX,
>  };
> @@ -192,6 +194,21 @@ struct sev_user_data_snp_config {
>  	__u8 rsvd1[52];
>  } __packed;
>  
> +/**
> + * struct sev_data_snp_ext_config - system wide configuration value for SNP.
> + *
> + * @config_address: address of the struct sev_user_data_snp_config or 0 when
> + *		reported_tcb does not need to be updated.
> + * @certs_address: address of extended guest request certificate chain or
> + *              0 when previous certificate should be removed on SNP_SET_EXT_CONFIG.
> + * @certs_len: length of the certs
> + */
> +struct sev_user_data_ext_snp_config {
> +	__u64 config_address;		/* In */
> +	__u64 certs_address;		/* In */
> +	__u32 certs_len;		/* In */
> +};
> +
>  /**
>   * struct sev_issue_cmd - SEV ioctl parameters
>   *


^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH RFC v8 27/56] crypto: ccp: Add the SNP_{SET,GET}_EXT_CONFIG command
  2023-02-22 12:32   ` Zhi Wang
@ 2023-02-22 16:50     ` Tom Lendacky
  2023-02-22 22:43     ` Kalra, Ashish
  1 sibling, 0 replies; 147+ messages in thread
From: Tom Lendacky @ 2023-02-22 16:50 UTC (permalink / raw)
  To: Zhi Wang, Michael Roth
  Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, hpa, ardb, pbonzini, seanjc, vkuznets, jmattson,
	luto, dave.hansen, slp, pgonda, peterz, srinivas.pandruvada,
	rientjes, dovmurik, tobin, bp, vbabka, kirill, ak, tony.luck,
	marcorr, sathyanarayanan.kuppuswamy, alpergun, dgilbert, jarkko,
	ashish.kalra, nikunj.dadhania, Brijesh Singh

On 2/22/23 06:32, Zhi Wang wrote:
> On Mon, 20 Feb 2023 12:38:18 -0600
> Michael Roth <michael.roth@amd.com> wrote:
> 
>> From: Brijesh Singh <brijesh.singh@amd.com>
>>
>> The SEV-SNP firmware provides the SNP_CONFIG command used to set the
>> system-wide configuration value for SNP guests. The information includes
>> the TCB version string to be reported in guest attestation reports.
>>
>> Version 2 of the GHCB specification adds an NAE (SNP extended guest
>> request) that a guest can use to query the reports that include additional
>> certificates.
>>
>> In both cases, userspace provided additional data is included in the
>> attestation reports. The userspace will use the SNP_SET_EXT_CONFIG
>> command to give the certificate blob and the reported TCB version string
>> at once. Note that the specification defines certificate blob with a
>> specific GUID format; the userspace is responsible for building the
>> proper certificate blob. The ioctl treats it as an opaque blob.
>>
>> While it is not defined in the spec, let's add an SNP_GET_EXT_CONFIG
>> command that can be used to obtain the data programmed through the
>> SNP_SET_EXT_CONFIG.
>>
>> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
>> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
>> Signed-off-by: Michael Roth <michael.roth@amd.com>
>> ---
>>   Documentation/virt/coco/sev-guest.rst |  27 ++++++
>>   drivers/crypto/ccp/sev-dev.c          | 123 ++++++++++++++++++++++++++
>>   drivers/crypto/ccp/sev-dev.h          |   4 +
>>   include/uapi/linux/psp-sev.h          |  17 ++++
>>   4 files changed, 171 insertions(+)
>>

>> diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
>> index 65e13a562f3b..b56b00ca2cd4 100644
>> --- a/drivers/crypto/ccp/sev-dev.c
>> +++ b/drivers/crypto/ccp/sev-dev.c
>> @@ -1481,6 +1481,10 @@ static int __sev_snp_shutdown_locked(int *error)
>>   	data.length = sizeof(data);
>>   	data.iommu_snp_shutdown = 1;
>>   
>> +	/* Free the memory used for caching the certificate data */
>> +	kfree(sev->snp_certs_data);
>> +	sev->snp_certs_data = NULL;
>> +
>>   	wbinvd_on_all_cpus();
>>   
>>   retry:
>> @@ -1793,6 +1797,118 @@ static int sev_ioctl_snp_platform_status(struct sev_issue_cmd *argp)
>>   	return ret;
>>   }
>>   
>> +static int sev_ioctl_snp_get_config(struct sev_issue_cmd *argp)
>> +{
>> +	struct sev_device *sev = psp_master->sev_data;
>> +	struct sev_user_data_ext_snp_config input;
>> +	int ret;
>> +
>> +	if (!sev->snp_initialized || !argp->data)
>> +		return -EINVAL;
>> +
>> +	memset(&input, 0, sizeof(input));
>> +
>> +	if (copy_from_user(&input, (void __user *)argp->data, sizeof(input)))
>> +		return -EFAULT;
>> +
>> +	/* Copy the TCB version programmed through the SET_CONFIG to userspace */
>> +	if (input.config_address) {
>> +		if (copy_to_user((void * __user)input.config_address,
>> +				 &sev->snp_config, sizeof(struct sev_user_data_snp_config)))
>> +			return -EFAULT;
>> +	}
>> +
>> +	/* Copy the extended certs programmed through the SNP_SET_CONFIG */
>> +	if (input.certs_address && sev->snp_certs_data) {
>> +		if (input.certs_len < sev->snp_certs_len) {
>> +			/* Return the certs length to userspace */
>> +			input.certs_len = sev->snp_certs_len;
>> +
>> +			ret = -ENOSR;

We should be consistent with the other SEV ioctls that return required
lengths and return -EIO here instead of -ENOSR.
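
For illustration, against the hunk quoted above that would just be:

-			ret = -ENOSR;
+			ret = -EIO;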

Thanks,
Tom

>> +			goto e_done;
>> +		}
>> +
> 
> What happens if input.certs_len > sev->snp_certs_len? Is it possible for the
> userspace to know the length of data in the buffer? (I guess it might be able
> to know the certs len through the blob data, but a comment here would be nice)
> 
>> +		if (copy_to_user((void * __user)input.certs_address,
>> +				 sev->snp_certs_data, sev->snp_certs_len))
>> +			return -EFAULT;
>> +	}
>> +
>> +	ret = 0;
>> +
>> +e_done:
>> +	if (copy_to_user((void __user *)argp->data, &input, sizeof(input)))
>> +		ret = -EFAULT;
>> +
>> +	return ret;
>> +}


^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH RFC v8 28/56] crypto: ccp: Provide APIs to query extended attestation report
  2023-02-20 18:38 ` [PATCH RFC v8 28/56] crypto: ccp: Provide APIs to query extended attestation report Michael Roth
@ 2023-02-22 20:24   ` Zhi Wang
  2023-02-22 22:35     ` Kalra, Ashish
  0 siblings, 1 reply; 147+ messages in thread
From: Zhi Wang @ 2023-02-22 20:24 UTC (permalink / raw)
  To: Michael Roth
  Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	Brijesh Singh

On Mon, 20 Feb 2023 12:38:19 -0600
Michael Roth <michael.roth@amd.com> wrote:

It seems from the discussion at:
https://lore.kernel.org/lkml/f18fae8b-a928-cd82-e0b3-eac62ad3e106@amd.com/
that this API is going to be removed. Will that fix land in this patch series or not?
If not, it would be better to mention it in the commit message of this one
or of patch 45.
If yes, I guess this patch is not needed.

> From: Brijesh Singh <brijesh.singh@amd.com>
> 
> Version 2 of the GHCB specification defines VMGEXIT that is used to get
> the extended attestation report. The extended attestation report includes
> the certificate blobs provided through the SNP_SET_EXT_CONFIG.
> 
> The snp_guest_ext_guest_request() will be used by the hypervisor to get
> the extended attestation report. See the GHCB specification for more
> details.
> 
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> Signed-off-by: Michael Roth <michael.roth@amd.com>
> ---
>  drivers/crypto/ccp/sev-dev.c | 47 ++++++++++++++++++++++++++++++++++++
>  include/linux/psp-sev.h      | 33 +++++++++++++++++++++++++
>  2 files changed, 80 insertions(+)
> 
> diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
> index b56b00ca2cd4..e65563bc8298 100644
> --- a/drivers/crypto/ccp/sev-dev.c
> +++ b/drivers/crypto/ccp/sev-dev.c
> @@ -2017,6 +2017,53 @@ int sev_guest_df_flush(int *error)
>  }
>  EXPORT_SYMBOL_GPL(sev_guest_df_flush);
>  
> +int snp_guest_ext_guest_request(struct sev_data_snp_guest_request *data,
> +				unsigned long vaddr, unsigned long *npages, unsigned long *fw_err)
> +{
> +	unsigned long expected_npages;
> +	struct sev_device *sev;
> +	int rc;
> +
> +	if (!psp_master || !psp_master->sev_data)
> +		return -ENODEV;
> +
> +	sev = psp_master->sev_data;
> +
> +	if (!sev->snp_initialized)
> +		return -EINVAL;
> +
> +	mutex_lock(&sev->snp_certs_lock);
> +	/*
> +	 * Check if there is enough space to copy the certificate chain. Otherwise
> +	 * return ERROR code defined in the GHCB specification.
> +	 */
> +	expected_npages = sev->snp_certs_len >> PAGE_SHIFT;
> +	if (*npages < expected_npages) {
> +		*npages = expected_npages;
> +		*fw_err = SNP_GUEST_REQ_INVALID_LEN;
> +		mutex_unlock(&sev->snp_certs_lock);
> +		return -EINVAL;
> +	}
> +
> +	rc = sev_do_cmd(SEV_CMD_SNP_GUEST_REQUEST, data, (int *)fw_err);
> +	if (rc) {
> +		mutex_unlock(&sev->snp_certs_lock);
> +		return rc;
> +	}
> +
> +	/* Copy the certificate blob */
> +	if (sev->snp_certs_data) {
> +		*npages = expected_npages;
> +		memcpy((void *)vaddr, sev->snp_certs_data, *npages << PAGE_SHIFT);
> +	} else {
> +		*npages = 0;
> +	}
> +
> +	mutex_unlock(&sev->snp_certs_lock);
> +	return rc;
> +}
> +EXPORT_SYMBOL_GPL(snp_guest_ext_guest_request);
> +
>  static void sev_exit(struct kref *ref)
>  {
>  	misc_deregister(&misc_dev->misc);
> diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h
> index d19744807471..81bafc049eca 100644
> --- a/include/linux/psp-sev.h
> +++ b/include/linux/psp-sev.h
> @@ -931,6 +931,32 @@ void snp_free_firmware_page(void *addr);
>   */
>  void snp_mark_pages_offline(unsigned long pfn, unsigned int npages);
>  
> +/**
> + * snp_guest_ext_guest_request - perform the SNP extended guest request command
> + *  defined in the GHCB specification.
> + *
> + * @data: the input guest request structure
> + * @vaddr: address where the certificate blob need to be copied.
> + * @npages: number of pages for the certificate blob.
> + *    If the specified page count is less than the certificate blob size, then the
> + *    required page count is returned with error code defined in the GHCB spec.
> + *    If the specified page count is more than the certificate blob size, then
> + *    page count is updated to reflect the amount of valid data copied in the
> + *    vaddr.
> + *
> + * @sev_ret: sev command return code
> + *
> + * Returns:
> + * 0 if the sev successfully processed the command
> + * -%ENODEV    if the sev device is not available
> + * -%ENOTSUPP  if the sev does not support SEV
> + * -%ETIMEDOUT if the sev command timed out
> + * -%EIO       if the sev returned a non-zero return code
> + */
> +int snp_guest_ext_guest_request(struct sev_data_snp_guest_request *data,
> +				unsigned long vaddr, unsigned long *npages,
> +				unsigned long *error);
> +
>  #else	/* !CONFIG_CRYPTO_DEV_SP_PSP */
>  
>  static inline int
> @@ -968,6 +994,13 @@ static inline void *snp_alloc_firmware_page(gfp_t mask)
>  
>  static inline void snp_free_firmware_page(void *addr) { }
>  
> +static inline int snp_guest_ext_guest_request(struct sev_data_snp_guest_request *data,
> +					      unsigned long vaddr, unsigned long *n,
> +					      unsigned long *error)
> +{
> +	return -ENODEV;
> +}
> +
>  #endif	/* CONFIG_CRYPTO_DEV_SP_PSP */
>  
>  #endif	/* __PSP_SEV_H__ */


^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH RFC v8 31/56] KVM: SVM: Make AVIC backing, VMSA and VMCB memory allocation SNP safe
  2023-02-20 18:38 ` [PATCH RFC v8 31/56] KVM: SVM: Make AVIC backing, VMSA and VMCB memory allocation SNP safe Michael Roth
@ 2023-02-22 20:42   ` Zhi Wang
  0 siblings, 0 replies; 147+ messages in thread
From: Zhi Wang @ 2023-02-22 20:42 UTC (permalink / raw)
  To: Michael Roth
  Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	Brijesh Singh

On Mon, 20 Feb 2023 12:38:22 -0600
Michael Roth <michael.roth@amd.com> wrote:

> From: Brijesh Singh <brijesh.singh@amd.com>
> 
> Implement a workaround for an SNP erratum where the CPU will incorrectly
> signal an RMP violation #PF if a hugepage (2mb or 1gb) collides with the
> RMP entry of a VMCB, VMSA or AVIC backing page.
> 
> When SEV-SNP is globally enabled, the CPU marks the VMCB, VMSA, and AVIC
> backing   pages as "in-use" in the RMP after a successful VMRUN.  This

Is this "in-use" bit part of an RMP entry? If yes, better list its name 
in APM.

> is done for _all_ VMs, not just SNP-Active VMs.
_All_ VMs? Do you mean SEV VMs and SEV-SNP VMs? I guess legacy VMs are not
affected, right?
>
> If the hypervisor accesses an in-use page through a writable
> translation, the CPU will throw an RMP violation #PF. On early SNP
> hardware, if an in-use page is 2mb aligned and software accesses any
> part of the associated 2mb region with a hupage, the CPU will
                                            ^hugepage
> incorrectly treat the entire 2mb region as in-use and signal a spurious
> RMP violation #PF.
> 
> The recommended workaround is to not use a hugepage for the VMCB, VMSA or
> AVIC backing page. Add a generic allocator that will ensure that the
> page returned is not a hugepage (2mb or 1gb) and is safe to be used when
> SEV-SNP is enabled.
> 
> Co-developed-by: Marc Orr <marcorr@google.com>
> Signed-off-by: Marc Orr <marcorr@google.com>
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> Signed-off-by: Michael Roth <michael.roth@amd.com>
> ---
>  arch/x86/include/asm/kvm-x86-ops.h |  1 +
>  arch/x86/include/asm/kvm_host.h    |  2 ++
>  arch/x86/kvm/lapic.c               |  5 ++++-
>  arch/x86/kvm/svm/sev.c             | 33 ++++++++++++++++++++++++++++++
>  arch/x86/kvm/svm/svm.c             | 15 ++++++++++++--
>  arch/x86/kvm/svm/svm.h             |  1 +
>  6 files changed, 54 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
> index 6a885f024a00..e116405cbb5f 100644
> --- a/arch/x86/include/asm/kvm-x86-ops.h
> +++ b/arch/x86/include/asm/kvm-x86-ops.h
> @@ -131,6 +131,7 @@ KVM_X86_OP(msr_filter_changed)
>  KVM_X86_OP(complete_emulated_msr)
>  KVM_X86_OP(vcpu_deliver_sipi_vector)
>  KVM_X86_OP_OPTIONAL_RET0(vcpu_get_apicv_inhibit_reasons);
> +KVM_X86_OP_OPTIONAL(alloc_apic_backing_page)
>  KVM_X86_OP_OPTIONAL_RET0(fault_is_private);
>  KVM_X86_OP_OPTIONAL_RET0(update_mem_attr)
>  KVM_X86_OP_OPTIONAL(invalidate_restricted_mem)
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 37c92412035f..a9363a6f779d 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1729,6 +1729,8 @@ struct kvm_x86_ops {
>  	 * Returns vCPU specific APICv inhibit reasons
>  	 */
>  	unsigned long (*vcpu_get_apicv_inhibit_reasons)(struct kvm_vcpu *vcpu);
> +
> +	void *(*alloc_apic_backing_page)(struct kvm_vcpu *vcpu);
>  };
>  
>  struct kvm_x86_nested_ops {
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 80f92cbc4029..72e46d5b4201 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -2740,7 +2740,10 @@ int kvm_create_lapic(struct kvm_vcpu *vcpu, int timer_advance_ns)
>  
>  	vcpu->arch.apic = apic;
>  
> -	apic->regs = (void *)get_zeroed_page(GFP_KERNEL_ACCOUNT);
> +	if (kvm_x86_ops.alloc_apic_backing_page)
> +		apic->regs = static_call(kvm_x86_alloc_apic_backing_page)(vcpu);
> +	else
> +		apic->regs = (void *)get_zeroed_page(GFP_KERNEL_ACCOUNT);
>  	if (!apic->regs) {
>  		printk(KERN_ERR "malloc apic regs error for vcpu %x\n",
>  		       vcpu->vcpu_id);
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index c1f0d4898ce3..9e9efb42a766 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -3241,3 +3241,36 @@ void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector)
>  		break;
>  	}
>  }
> +
> +struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu)
> +{
> +	unsigned long pfn;
> +	struct page *p;
> +
> +	if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
> +		return alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
> +
> +	/*
> +	 * Allocate an SNP safe page to workaround the SNP erratum where
> +	 * the CPU will incorrectly signal an RMP violation  #PF if a
> +	 * hugepage (2mb or 1gb) collides with the RMP entry of VMCB, VMSA
> +	 * or AVIC backing page. The recommeded workaround is to not use the
> +	 * hugepage.
> +	 *
> +	 * Allocate one extra page, use a page which is not 2mb aligned
> +	 * and free the other.
> +	 */
> +	p = alloc_pages(GFP_KERNEL_ACCOUNT | __GFP_ZERO, 1);
> +	if (!p)
> +		return NULL;
> +
> +	split_page(p, 1);
> +
> +	pfn = page_to_pfn(p);
> +	if (IS_ALIGNED(pfn, PTRS_PER_PMD))
> +		__free_page(p++);
> +	else
> +		__free_page(p + 1);
> +
> +	return p;
> +}

The duplicate allocation routine in snp_alloc_vmsa_page() in sev.c can
be replaced with snp_safe_alloc_page().

> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 213593dbd7a1..1061aaf66f0a 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -1372,7 +1372,7 @@ static int svm_vcpu_create(struct kvm_vcpu *vcpu)
>  	svm = to_svm(vcpu);
>  
>  	err = -ENOMEM;
> -	vmcb01_page = alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
> +	vmcb01_page = snp_safe_alloc_page(vcpu);
>  	if (!vmcb01_page)
>  		goto out;
>  
> @@ -1381,7 +1381,7 @@ static int svm_vcpu_create(struct kvm_vcpu *vcpu)
>  		 * SEV-ES guests require a separate VMSA page used to contain
>  		 * the encrypted register state of the guest.
>  		 */
> -		vmsa_page = alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
> +		vmsa_page = snp_safe_alloc_page(vcpu);
>  		if (!vmsa_page)
>  			goto error_free_vmcb_page;
>  
> @@ -4696,6 +4696,16 @@ static int svm_vm_init(struct kvm *kvm)
>  	return 0;
>  }
>  
> +static void *svm_alloc_apic_backing_page(struct kvm_vcpu *vcpu)
> +{
> +	struct page *page = snp_safe_alloc_page(vcpu);
> +
> +	if (!page)
> +		return NULL;
> +
> +	return page_address(page);
> +}
> +
>  static struct kvm_x86_ops svm_x86_ops __initdata = {
>  	.name = KBUILD_MODNAME,
>  
> @@ -4824,6 +4834,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
>  
>  	.vcpu_deliver_sipi_vector = svm_vcpu_deliver_sipi_vector,
>  	.vcpu_get_apicv_inhibit_reasons = avic_vcpu_get_apicv_inhibit_reasons,
> +	.alloc_apic_backing_page = svm_alloc_apic_backing_page,
>  };
>  
>  /*
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index c249c360fe36..5efcf036ccad 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -692,6 +692,7 @@ void sev_es_vcpu_reset(struct vcpu_svm *svm);
>  void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector);
>  void sev_es_prepare_switch_to_guest(struct sev_es_save_area *hostsa);
>  void sev_es_unmap_ghcb(struct vcpu_svm *svm);
> +struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu);
>  
>  /* vmenter.S */
>  


^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH RFC v8 28/56] crypto: ccp: Provide APIs to query extended attestation report
  2023-02-22 20:24   ` Zhi Wang
@ 2023-02-22 22:35     ` Kalra, Ashish
  2023-02-23  8:14       ` Zhi Wang
  0 siblings, 1 reply; 147+ messages in thread
From: Kalra, Ashish @ 2023-02-22 22:35 UTC (permalink / raw)
  To: Zhi Wang, Michael Roth
  Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, nikunj.dadhania, Brijesh Singh

On 2/22/2023 2:24 PM, Zhi Wang wrote:
> On Mon, 20 Feb 2023 12:38:19 -0600
> Michael Roth <michael.roth@amd.com> wrote:
> 
> It seems from the discussion at:
> https://lore.kernel.org/lkml/f18fae8b-a928-cd82-e0b3-eac62ad3e106@amd.com/
> that this API is going to be removed. Will that fix land in this patch series or not?
> If not, it would be better to mention it in the commit message of this one
> or of patch 45.
> If yes, I guess this patch is not needed.
> 

This API is definitely not going to be removed.

There will be some fixes and optimizations added to the API 
implementation (as per the discussions) and that will be included in v9.

Thanks,
Ashish

>> From: Brijesh Singh <brijesh.singh@amd.com>
>>
>> Version 2 of the GHCB specification defines VMGEXIT that is used to get
>> the extended attestation report. The extended attestation report includes
>> the certificate blobs provided through the SNP_SET_EXT_CONFIG.
>>
>> The snp_guest_ext_guest_request() will be used by the hypervisor to get
>> the extended attestation report. See the GHCB specification for more
>> details.
>>
>> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
>> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
>> Signed-off-by: Michael Roth <michael.roth@amd.com>
>> ---
>>   drivers/crypto/ccp/sev-dev.c | 47 ++++++++++++++++++++++++++++++++++++
>>   include/linux/psp-sev.h      | 33 +++++++++++++++++++++++++
>>   2 files changed, 80 insertions(+)
>>
>> diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
>> index b56b00ca2cd4..e65563bc8298 100644
>> --- a/drivers/crypto/ccp/sev-dev.c
>> +++ b/drivers/crypto/ccp/sev-dev.c
>> @@ -2017,6 +2017,53 @@ int sev_guest_df_flush(int *error)
>>   }
>>   EXPORT_SYMBOL_GPL(sev_guest_df_flush);
>>   
>> +int snp_guest_ext_guest_request(struct sev_data_snp_guest_request *data,
>> +				unsigned long vaddr, unsigned long *npages, unsigned long *fw_err)
>> +{
>> +	unsigned long expected_npages;
>> +	struct sev_device *sev;
>> +	int rc;
>> +
>> +	if (!psp_master || !psp_master->sev_data)
>> +		return -ENODEV;
>> +
>> +	sev = psp_master->sev_data;
>> +
>> +	if (!sev->snp_initialized)
>> +		return -EINVAL;
>> +
>> +	mutex_lock(&sev->snp_certs_lock);
>> +	/*
>> +	 * Check if there is enough space to copy the certificate chain. Otherwise
>> +	 * return ERROR code defined in the GHCB specification.
>> +	 */
>> +	expected_npages = sev->snp_certs_len >> PAGE_SHIFT;
>> +	if (*npages < expected_npages) {
>> +		*npages = expected_npages;
>> +		*fw_err = SNP_GUEST_REQ_INVALID_LEN;
>> +		mutex_unlock(&sev->snp_certs_lock);
>> +		return -EINVAL;
>> +	}
>> +
>> +	rc = sev_do_cmd(SEV_CMD_SNP_GUEST_REQUEST, data, (int *)fw_err);
>> +	if (rc) {
>> +		mutex_unlock(&sev->snp_certs_lock);
>> +		return rc;
>> +	}
>> +
>> +	/* Copy the certificate blob */
>> +	if (sev->snp_certs_data) {
>> +		*npages = expected_npages;
>> +		memcpy((void *)vaddr, sev->snp_certs_data, *npages << PAGE_SHIFT);
>> +	} else {
>> +		*npages = 0;
>> +	}
>> +
>> +	mutex_unlock(&sev->snp_certs_lock);
>> +	return rc;
>> +}
>> +EXPORT_SYMBOL_GPL(snp_guest_ext_guest_request);
>> +
>>   static void sev_exit(struct kref *ref)
>>   {
>>   	misc_deregister(&misc_dev->misc);
>> diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h
>> index d19744807471..81bafc049eca 100644
>> --- a/include/linux/psp-sev.h
>> +++ b/include/linux/psp-sev.h
>> @@ -931,6 +931,32 @@ void snp_free_firmware_page(void *addr);
>>    */
>>   void snp_mark_pages_offline(unsigned long pfn, unsigned int npages);
>>   
>> +/**
>> + * snp_guest_ext_guest_request - perform the SNP extended guest request command
>> + *  defined in the GHCB specification.
>> + *
>> + * @data: the input guest request structure
>> + * @vaddr: address where the certificate blob needs to be copied.
>> + * @npages: number of pages for the certificate blob.
>> + *    If the specified page count is less than the certificate blob size, then the
>> + *    required page count is returned with error code defined in the GHCB spec.
>> + *    If the specified page count is more than the certificate blob size, then
>> + *    page count is updated to reflect the amount of valid data copied in the
>> + *    vaddr.
>> + *
>> + * @sev_ret: sev command return code
>> + *
>> + * Returns:
>> + * 0 if the sev successfully processed the command
>> + * -%ENODEV    if the sev device is not available
>> + * -%ENOTSUPP  if the sev does not support SEV
>> + * -%ETIMEDOUT if the sev command timed out
>> + * -%EIO       if the sev returned a non-zero return code
>> + */
>> +int snp_guest_ext_guest_request(struct sev_data_snp_guest_request *data,
>> +				unsigned long vaddr, unsigned long *npages,
>> +				unsigned long *error);
>> +
>>   #else	/* !CONFIG_CRYPTO_DEV_SP_PSP */
>>   
>>   static inline int
>> @@ -968,6 +994,13 @@ static inline void *snp_alloc_firmware_page(gfp_t mask)
>>   
>>   static inline void snp_free_firmware_page(void *addr) { }
>>   
>> +static inline int snp_guest_ext_guest_request(struct sev_data_snp_guest_request *data,
>> +					      unsigned long vaddr, unsigned long *n,
>> +					      unsigned long *error)
>> +{
>> +	return -ENODEV;
>> +}
>> +
>>   #endif	/* CONFIG_CRYPTO_DEV_SP_PSP */
>>   
>>   #endif	/* __PSP_SEV_H__ */
> 

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH RFC v8 27/56] crypto: ccp: Add the SNP_{SET,GET}_EXT_CONFIG command
  2023-02-22 12:32   ` Zhi Wang
  2023-02-22 16:50     ` Tom Lendacky
@ 2023-02-22 22:43     ` Kalra, Ashish
  2023-02-23  6:38       ` Zhi Wang
  1 sibling, 1 reply; 147+ messages in thread
From: Kalra, Ashish @ 2023-02-22 22:43 UTC (permalink / raw)
  To: Zhi Wang, Michael Roth
  Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, nikunj.dadhania, Brijesh Singh

On 2/22/2023 6:32 AM, Zhi Wang wrote:
> On Mon, 20 Feb 2023 12:38:18 -0600
> Michael Roth <michael.roth@amd.com> wrote:
> 
>> From: Brijesh Singh <brijesh.singh@amd.com>
>>
>> The SEV-SNP firmware provides the SNP_CONFIG command used to set the
>> system-wide configuration value for SNP guests. The information includes
>> the TCB version string to be reported in guest attestation reports.
>>
>> Version 2 of the GHCB specification adds an NAE (SNP extended guest
>> request) that a guest can use to query the reports that include additional
>> certificates.
>>
>> In both cases, userspace provided additional data is included in the
>> attestation reports. The userspace will use the SNP_SET_EXT_CONFIG
>> command to give the certificate blob and the reported TCB version string
>> at once. Note that the specification defines certificate blob with a
>> specific GUID format; the userspace is responsible for building the
>> proper certificate blob. The ioctl treats it as an opaque blob.
>>
>> While it is not defined in the spec, let's also add an SNP_GET_EXT_CONFIG
>> command that can be used to obtain the data programmed through the
>> SNP_SET_EXT_CONFIG.
>>
>> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
>> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
>> Signed-off-by: Michael Roth <michael.roth@amd.com>
>> ---
>>   Documentation/virt/coco/sev-guest.rst |  27 ++++++
>>   drivers/crypto/ccp/sev-dev.c          | 123 ++++++++++++++++++++++++++
>>   drivers/crypto/ccp/sev-dev.h          |   4 +
>>   include/uapi/linux/psp-sev.h          |  17 ++++
>>   4 files changed, 171 insertions(+)
>>
>> diff --git a/Documentation/virt/coco/sev-guest.rst b/Documentation/virt/coco/sev-guest.rst
>> index 11ea67c944df..6cad4226c348 100644
>> --- a/Documentation/virt/coco/sev-guest.rst
>> +++ b/Documentation/virt/coco/sev-guest.rst
>> @@ -145,6 +145,33 @@ The SNP_PLATFORM_STATUS command is used to query the SNP platform status. The
>>   status includes API major, minor version and more. See the SEV-SNP
>>   specification for further details.
>>   
>> +2.5 SNP_SET_EXT_CONFIG
>> +----------------------
>> +:Technology: sev-snp
>> +:Type: hypervisor ioctl cmd
>> +:Parameters (in): struct sev_data_snp_ext_config
>> +:Returns (out): 0 on success, -negative on error
>> +
>> +The SNP_SET_EXT_CONFIG is used to set the system-wide configuration such as
>> +reported TCB version in the attestation report. The command is similar to
>> +SNP_CONFIG command defined in the SEV-SNP spec. The main difference is the
>> +command also accepts an additional certificate blob defined in the GHCB
>> +specification.
>> +
>> +If the certs_address is zero, then the previous certificate blob will be deleted.
>> +For more information on the certificate blob layout, see the GHCB spec
>> +(extended guest request message).
>> +
>> +2.6 SNP_GET_EXT_CONFIG
>> +----------------------
>> +:Technology: sev-snp
>> +:Type: hypervisor ioctl cmd
>> +:Parameters (in): struct sev_data_snp_ext_config
>> +:Returns (out): 0 on success, -negative on error
>> +
>> +The SNP_GET_EXT_CONFIG is used to query the system-wide configuration set
>> +through the SNP_SET_EXT_CONFIG.
>> +
>>   3. SEV-SNP CPUID Enforcement
>>   ============================
>>   
>> diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
>> index 65e13a562f3b..b56b00ca2cd4 100644
>> --- a/drivers/crypto/ccp/sev-dev.c
>> +++ b/drivers/crypto/ccp/sev-dev.c
>> @@ -1481,6 +1481,10 @@ static int __sev_snp_shutdown_locked(int *error)
>>   	data.length = sizeof(data);
>>   	data.iommu_snp_shutdown = 1;
>>   
>> +	/* Free the memory used for caching the certificate data */
>> +	kfree(sev->snp_certs_data);
>> +	sev->snp_certs_data = NULL;
>> +
>>   	wbinvd_on_all_cpus();
>>   
>>   retry:
>> @@ -1793,6 +1797,118 @@ static int sev_ioctl_snp_platform_status(struct sev_issue_cmd *argp)
>>   	return ret;
>>   }
>>   
>> +static int sev_ioctl_snp_get_config(struct sev_issue_cmd *argp)
>> +{
>> +	struct sev_device *sev = psp_master->sev_data;
>> +	struct sev_user_data_ext_snp_config input;
>> +	int ret;
>> +
>> +	if (!sev->snp_initialized || !argp->data)
>> +		return -EINVAL;
>> +
>> +	memset(&input, 0, sizeof(input));
>> +
>> +	if (copy_from_user(&input, (void __user *)argp->data, sizeof(input)))
>> +		return -EFAULT;
>> +
>> +	/* Copy the TCB version programmed through the SET_CONFIG to userspace */
>> +	if (input.config_address) {
>> +		if (copy_to_user((void * __user)input.config_address,
>> +				 &sev->snp_config, sizeof(struct sev_user_data_snp_config)))
>> +			return -EFAULT;
>> +	}
>> +
>> +	/* Copy the extended certs programmed through the SNP_SET_CONFIG */
>> +	if (input.certs_address && sev->snp_certs_data) {
>> +		if (input.certs_len < sev->snp_certs_len) {
>> +			/* Return the certs length to userspace */
>> +			input.certs_len = sev->snp_certs_len;
>> +
>> +			ret = -ENOSR;
>> +			goto e_done;
>> +		}
>> +
> 
> What about if input.certs_len > sev->snp_certs_len? Is it possible for the
> userspace to know the length of data in the buffer? (I guess it might be able
> to know the certs len through the blob data, but a comment here would be nice)
> 

If userspace provides an input buffer/length smaller than snp_certs_len,
then the above returns the "required" certs length back to userspace.

And what is the issue if input.certs_len > sev->snp_certs_len? The data
copied back to userspace is sev->snp_certs_len bytes, as below.
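
Something like this on the userspace side (a rough, untested sketch; the
sev_fd and buffer-size handling are just for illustration -- the ioctl
plumbing is the existing SEV_ISSUE_CMD interface that this patch extends):

  #include <errno.h>
  #include <stdint.h>
  #include <stdlib.h>
  #include <sys/ioctl.h>
  #include <linux/psp-sev.h>

  /* Fetch the cached certificate blob, retrying once with the size the
   * driver reports back when the first buffer is too small (-ENOSR above). */
  static void *fetch_ext_certs(int sev_fd, uint32_t *buf_len)
  {
          struct sev_user_data_ext_snp_config ext = {};
          struct sev_issue_cmd cmd = { .cmd = SNP_GET_EXT_CONFIG };
          uint32_t len = 4096;            /* initial guess, one page */
          void *buf = NULL;
          int tries;

          for (tries = 0; tries < 2; tries++) {
                  free(buf);
                  buf = aligned_alloc(4096, len);
                  if (!buf)
                          return NULL;

                  ext.config_address = 0; /* only query the certs here */
                  ext.certs_address = (uint64_t)(uintptr_t)buf;
                  ext.certs_len = len;
                  cmd.data = (uint64_t)(uintptr_t)&ext;

                  if (!ioctl(sev_fd, SEV_ISSUE_CMD, &cmd)) {
                          /*
                           * Note: certs_len is not updated on success, so
                           * the actual data length has to come from the blob
                           * itself, as discussed in this thread.
                           */
                          *buf_len = len;
                          return buf;
                  }
                  if (errno != ENOSR)
                          break;
                  /* Driver reported the required size; round up to a page. */
                  len = (ext.certs_len + 4095) & ~4095u;
          }

          free(buf);
          return NULL;
  }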

Thanks,
Ashish

>> +		if (copy_to_user((void * __user)input.certs_address,
>> +				 sev->snp_certs_data, sev->snp_certs_len))
>> +			return -EFAULT;
>> +	}
>> +
>> +	ret = 0;
>> +
>> +e_done:
>> +	if (copy_to_user((void __user *)argp->data, &input, sizeof(input)))
>> +		ret = -EFAULT;
>> +
>> +	return ret;
>> +}
>> +
>> +static int sev_ioctl_snp_set_config(struct sev_issue_cmd *argp, bool writable)
>> +{
>> +	struct sev_device *sev = psp_master->sev_data;
>> +	struct sev_user_data_ext_snp_config input;
>> +	struct sev_user_data_snp_config config;
>> +	void *certs = NULL;
>> +	int ret = 0;
>> +
>> +	if (!sev->snp_initialized || !argp->data)
>> +		return -EINVAL;
>> +
>> +	if (!writable)
>> +		return -EPERM;
>> +
>> +	memset(&input, 0, sizeof(input));
>> +
>> +	if (copy_from_user(&input, (void __user *)argp->data, sizeof(input)))
>> +		return -EFAULT;
>> +
>> +	/* Copy the certs from userspace */
>> +	if (input.certs_address) {
>> +		if (!input.certs_len || !IS_ALIGNED(input.certs_len, PAGE_SIZE))
>> +			return -EINVAL;
>> +
>> +		certs = psp_copy_user_blob(input.certs_address, input.certs_len);
>> +		if (IS_ERR(certs))
>> +			return PTR_ERR(certs);
>> +	}
>> +
>> +	/* Issue the PSP command to update the TCB version using the SNP_CONFIG. */
>> +	if (input.config_address) {
>> +		memset(&config, 0, sizeof(config));
>> +		if (copy_from_user(&config,
>> +				   (void __user *)input.config_address, sizeof(config))) {
>> +			ret = -EFAULT;
>> +			goto e_free;
>> +		}
>> +
>> +		ret = __sev_do_cmd_locked(SEV_CMD_SNP_CONFIG, &config, &argp->error);
>> +		if (ret)
>> +			goto e_free;
>> +
>> +		memcpy(&sev->snp_config, &config, sizeof(config));
>> +	}
>> +
>> +	/*
>> +	 * If the new certs are passed then cache it else free the old certs.
>> +	 */
>> +	mutex_lock(&sev->snp_certs_lock);
>> +	if (certs) {
>> +		kfree(sev->snp_certs_data);
>> +		sev->snp_certs_data = certs;
>> +		sev->snp_certs_len = input.certs_len;
>> +	} else {
>> +		kfree(sev->snp_certs_data);
>> +		sev->snp_certs_data = NULL;
>> +		sev->snp_certs_len = 0;
>> +	}
>> +	mutex_unlock(&sev->snp_certs_lock);
>> +
>> +	return 0;
>> +
>> +e_free:
>> +	kfree(certs);
>> +	return ret;
>> +}
>> +
>>   static long sev_ioctl(struct file *file, unsigned int ioctl, unsigned long arg)
>>   {
>>   	void __user *argp = (void __user *)arg;
>> @@ -1847,6 +1963,12 @@ static long sev_ioctl(struct file *file, unsigned int ioctl, unsigned long arg)
>>   	case SNP_PLATFORM_STATUS:
>>   		ret = sev_ioctl_snp_platform_status(&input);
>>   		break;
>> +	case SNP_SET_EXT_CONFIG:
>> +		ret = sev_ioctl_snp_set_config(&input, writable);
>> +		break;
>> +	case SNP_GET_EXT_CONFIG:
>> +		ret = sev_ioctl_snp_get_config(&input);
>> +		break;
>>   	default:
>>   		ret = -EINVAL;
>>   		goto out;
>> @@ -1962,6 +2084,7 @@ int sev_dev_init(struct psp_device *psp)
>>   		goto e_sev;
>>   
>>   	sev->cmd_buf_backup = (uint8_t *)sev->cmd_buf + PAGE_SIZE;
>> +	mutex_init(&sev->snp_certs_lock);
>>   
>>   	psp->sev_data = sev;
>>   
>> diff --git a/drivers/crypto/ccp/sev-dev.h b/drivers/crypto/ccp/sev-dev.h
>> index 19d79f9d4212..41d5353d5bab 100644
>> --- a/drivers/crypto/ccp/sev-dev.h
>> +++ b/drivers/crypto/ccp/sev-dev.h
>> @@ -66,6 +66,10 @@ struct sev_device {
>>   
>>   	bool snp_initialized;
>>   	struct snp_host_map snp_host_map[MAX_SNP_HOST_MAP_BUFS];
>> +	void *snp_certs_data;
>> +	u32 snp_certs_len;
>> +	struct mutex snp_certs_lock;
>> +	struct sev_user_data_snp_config snp_config;
>>   };
>>   
>>   int sev_dev_init(struct psp_device *psp);
>> diff --git a/include/uapi/linux/psp-sev.h b/include/uapi/linux/psp-sev.h
>> index 5adfaea7df97..c20d37586d21 100644
>> --- a/include/uapi/linux/psp-sev.h
>> +++ b/include/uapi/linux/psp-sev.h
>> @@ -29,6 +29,8 @@ enum {
>>   	SEV_GET_ID,	/* This command is deprecated, use SEV_GET_ID2 */
>>   	SEV_GET_ID2,
>>   	SNP_PLATFORM_STATUS,
>> +	SNP_SET_EXT_CONFIG,
>> +	SNP_GET_EXT_CONFIG,
>>   
>>   	SEV_MAX,
>>   };
>> @@ -192,6 +194,21 @@ struct sev_user_data_snp_config {
>>   	__u8 rsvd1[52];
>>   } __packed;
>>   
>> +/**
>> + * struct sev_data_snp_ext_config - system wide configuration value for SNP.
>> + *
>> + * @config_address: address of the struct sev_user_data_snp_config or 0 when
>> + *		reported_tcb does not need to be updated.
>> + * @certs_address: address of extended guest request certificate chain or
>> + *              0 when previous certificate should be removed on SNP_SET_EXT_CONFIG.
>> + * @certs_len: length of the certs
>> + */
>> +struct sev_user_data_ext_snp_config {
>> +	__u64 config_address;		/* In */
>> +	__u64 certs_address;		/* In */
>> +	__u32 certs_len;		/* In */
>> +};
>> +
>>   /**
>>    * struct sev_issue_cmd - SEV ioctl parameters
>>    *
> 

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH RFC v8 10/56] x86/cpufeatures: Add SEV-SNP CPU feature
  2023-02-21 21:21   ` Sathyanarayanan Kuppuswamy
@ 2023-02-22 23:27     ` Kalra, Ashish
  0 siblings, 0 replies; 147+ messages in thread
From: Kalra, Ashish @ 2023-02-22 23:27 UTC (permalink / raw)
  To: Sathyanarayanan Kuppuswamy, Michael Roth, kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, alpergun, dgilbert, jarkko,
	nikunj.dadhania, Brijesh Singh, Jarkko Sakkinen

On 2/21/2023 3:21 PM, Sathyanarayanan Kuppuswamy wrote:
> 
> 
> On 2/20/23 10:38 AM, Michael Roth wrote:
>> From: Brijesh Singh <brijesh.singh@amd.com>
>>
>> Add CPU feature detection for Secure Encrypted Virtualization with
>> Secure Nested Paging. This feature adds a strong memory integrity
>> protection to help prevent malicious hypervisor-based attacks like
>> data replay, memory re-mapping, and more.
>>
>> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
>> Signed-off-by: Jarkko Sakkinen <jarkko@profian.com>
>> Signed-off-by: Ashish Kalra <Ashish.Kalra@amd.com>
> 
> Too many signed-off-by's. Are you missing Co-developed-by?
> 
>> Signed-off-by: Michael Roth <michael.roth@amd.com>
>> ---
>>   arch/x86/include/asm/cpufeatures.h       | 1 +
>>   arch/x86/kernel/cpu/amd.c                | 5 +++--
>>   tools/arch/x86/include/asm/cpufeatures.h | 1 +
>>   3 files changed, 5 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
>> index 1419c4e04d45..480b4eaef310 100644
>> --- a/arch/x86/include/asm/cpufeatures.h
>> +++ b/arch/x86/include/asm/cpufeatures.h
>> @@ -420,6 +420,7 @@
>>   #define X86_FEATURE_SEV			(19*32+ 1) /* AMD Secure Encrypted Virtualization */
>>   #define X86_FEATURE_VM_PAGE_FLUSH	(19*32+ 2) /* "" VM Page Flush MSR is supported */
>>   #define X86_FEATURE_SEV_ES		(19*32+ 3) /* AMD Secure Encrypted Virtualization - Encrypted State */
>> +#define X86_FEATURE_SEV_SNP		(19*32+ 4) /* AMD Secure Encrypted Virtualization - Secure Nested Paging */
>>   #define X86_FEATURE_V_TSC_AUX		(19*32+ 9) /* "" Virtual TSC_AUX */
>>   #define X86_FEATURE_SME_COHERENT	(19*32+10) /* "" AMD hardware-enforced cache coherency */
>>   
>> diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
>> index 860b60273df3..c7884198ad5b 100644
>> --- a/arch/x86/kernel/cpu/amd.c
>> +++ b/arch/x86/kernel/cpu/amd.c
>> @@ -558,8 +558,8 @@ static void early_detect_mem_encrypt(struct cpuinfo_x86 *c)
>>   	 *	      SME feature (set in scattered.c).
>>   	 *	      If the kernel has not enabled SME via any means then
>>   	 *	      don't advertise the SME feature.
>> -	 *   For SEV: If BIOS has not enabled SEV then don't advertise the
>> -	 *            SEV and SEV_ES feature (set in scattered.c).
> 
> Did you remove the related scattered.c code mentioned above in a different patch?
>

That is part of the following commit:

commit fb35d30fe5b06cc24444f0405da8fbe0be5330d1
Author: Sean Christopherson <seanjc@google.com>
Date:   Fri Jan 22 12:40:46 2021 -0800

     x86/cpufeatures: Assign dedicated feature word for CPUID_0x8000001F[EAX]

     Collect the scattered SME/SEV related feature flags into a dedicated
     word.  There are now five recognized features in CPUID.0x8000001F.EAX,
     with at least one more on the horizon (SEV-SNP).  Using a dedicated word
     allows KVM to use its automagic CPUID adjustment logic when reporting
     the set of supported features to userspace.

     No functional change intended.

     Signed-off-by: Sean Christopherson <seanjc@google.com>
     Signed-off-by: Borislav Petkov <bp@suse.de>
     Reviewed-by: Brijesh Singh <brijesh.singh@amd.com>
     Link: https://lkml.kernel.org/r/20210122204047.2860075-2-seanjc@google.com

Thanks,
Ashish
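
(For reference, that dedicated word corresponds to CPUID Fn8000_001F[EAX], so
a quick way to check the raw capability bit from userspace is to test bit 4 of
that leaf -- a rough standalone sketch, not part of this series:)

  #include <cpuid.h>
  #include <stdbool.h>
  #include <stdio.h>

  /* CPUID Fn8000_001F[EAX]: bit 1 = SEV, bit 3 = SEV-ES, bit 4 = SEV-SNP. */
  static bool cpu_has_sev_snp(void)
  {
          unsigned int eax, ebx, ecx, edx;

          if (!__get_cpuid(0x8000001f, &eax, &ebx, &ecx, &edx))
                  return false;

          return eax & (1u << 4);
  }

  int main(void)
  {
          printf("SEV-SNP CPUID bit: %s\n", cpu_has_sev_snp() ? "present" : "absent");
          return 0;
  }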

>> +	 *   For SEV: If BIOS has not enabled SEV then don't advertise SEV and
>> +	 *	      any additional functionality based on it.
>>   	 *
>>   	 *   In all cases, since support for SME and SEV requires long mode,
>>   	 *   don't advertise the feature under CONFIG_X86_32.
>> @@ -594,6 +594,7 @@ static void early_detect_mem_encrypt(struct cpuinfo_x86 *c)
>>   clear_sev:
>>   		setup_clear_cpu_cap(X86_FEATURE_SEV);
>>   		setup_clear_cpu_cap(X86_FEATURE_SEV_ES);
>> +		setup_clear_cpu_cap(X86_FEATURE_SEV_SNP);
>>   	}
>>   }
>>   
>> diff --git a/tools/arch/x86/include/asm/cpufeatures.h b/tools/arch/x86/include/asm/cpufeatures.h
>> index b71f4f2ecdd5..e81606fcd2ab 100644
>> --- a/tools/arch/x86/include/asm/cpufeatures.h
>> +++ b/tools/arch/x86/include/asm/cpufeatures.h
>> @@ -417,6 +417,7 @@
>>   #define X86_FEATURE_SEV			(19*32+ 1) /* AMD Secure Encrypted Virtualization */
>>   #define X86_FEATURE_VM_PAGE_FLUSH	(19*32+ 2) /* "" VM Page Flush MSR is supported */
>>   #define X86_FEATURE_SEV_ES		(19*32+ 3) /* AMD Secure Encrypted Virtualization - Encrypted State */
>> +#define X86_FEATURE_SEV_SNP		(19*32+ 4) /* AMD Secure Encrypted Virtualization - Secure Nested Paging */
>>   #define X86_FEATURE_V_TSC_AUX		(19*32+ 9) /* "" Virtual TSC_AUX */
>>   #define X86_FEATURE_SME_COHERENT	(19*32+10) /* "" AMD hardware-enforced cache coherency */
>>   
> 

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH RFC v8 27/56] crypto: ccp: Add the SNP_{SET,GET}_EXT_CONFIG command
  2023-02-22 22:43     ` Kalra, Ashish
@ 2023-02-23  6:38       ` Zhi Wang
  2023-02-23 14:19         ` Tom Lendacky
  0 siblings, 1 reply; 147+ messages in thread
From: Zhi Wang @ 2023-02-23  6:38 UTC (permalink / raw)
  To: Kalra, Ashish
  Cc: Michael Roth, kvm, linux-coco, linux-mm, linux-crypto, x86,
	linux-kernel, tglx, mingo, jroedel, thomas.lendacky, hpa, ardb,
	pbonzini, seanjc, vkuznets, jmattson, luto, dave.hansen, slp,
	pgonda, peterz, srinivas.pandruvada, rientjes, dovmurik, tobin,
	bp, vbabka, kirill, ak, tony.luck, marcorr,
	sathyanarayanan.kuppuswamy, alpergun, dgilbert, jarkko,
	nikunj.dadhania, Brijesh Singh

On Wed, 22 Feb 2023 16:43:54 -0600
"Kalra, Ashish" <ashish.kalra@amd.com> wrote:

> On 2/22/2023 6:32 AM, Zhi Wang wrote:
> > On Mon, 20 Feb 2023 12:38:18 -0600
> > Michael Roth <michael.roth@amd.com> wrote:
> > 
> >> From: Brijesh Singh <brijesh.singh@amd.com>
> >>
> >> The SEV-SNP firmware provides the SNP_CONFIG command used to set the
> >> system-wide configuration value for SNP guests. The information includes
> >> the TCB version string to be reported in guest attestation reports.
> >>
> >> Version 2 of the GHCB specification adds an NAE (SNP extended guest
> >> request) that a guest can use to query the reports that include additional
> >> certificates.
> >>
> >> In both cases, userspace provided additional data is included in the
> >> attestation reports. The userspace will use the SNP_SET_EXT_CONFIG
> >> command to give the certificate blob and the reported TCB version string
> >> at once. Note that the specification defines certificate blob with a
> >> specific GUID format; the userspace is responsible for building the
> >> proper certificate blob. The ioctl treats it as an opaque blob.
> >>
> >> While it is not defined in the spec, let's also add an SNP_GET_EXT_CONFIG
> >> command that can be used to obtain the data programmed through the
> >> SNP_SET_EXT_CONFIG.
> >>
> >> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> >> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> >> Signed-off-by: Michael Roth <michael.roth@amd.com>
> >> ---
> >>   Documentation/virt/coco/sev-guest.rst |  27 ++++++
> >>   drivers/crypto/ccp/sev-dev.c          | 123 ++++++++++++++++++++++++++
> >>   drivers/crypto/ccp/sev-dev.h          |   4 +
> >>   include/uapi/linux/psp-sev.h          |  17 ++++
> >>   4 files changed, 171 insertions(+)
> >>
> >> diff --git a/Documentation/virt/coco/sev-guest.rst b/Documentation/virt/coco/sev-guest.rst
> >> index 11ea67c944df..6cad4226c348 100644
> >> --- a/Documentation/virt/coco/sev-guest.rst
> >> +++ b/Documentation/virt/coco/sev-guest.rst
> >> @@ -145,6 +145,33 @@ The SNP_PLATFORM_STATUS command is used to query the SNP platform status. The
> >>   status includes API major, minor version and more. See the SEV-SNP
> >>   specification for further details.
> >>   
> >> +2.5 SNP_SET_EXT_CONFIG
> >> +----------------------
> >> +:Technology: sev-snp
> >> +:Type: hypervisor ioctl cmd
> >> +:Parameters (in): struct sev_data_snp_ext_config
> >> +:Returns (out): 0 on success, -negative on error
> >> +
> >> +The SNP_SET_EXT_CONFIG is used to set the system-wide configuration such as
> >> +reported TCB version in the attestation report. The command is similar to
> >> +SNP_CONFIG command defined in the SEV-SNP spec. The main difference is the
> >> +command also accepts an additional certificate blob defined in the GHCB
> >> +specification.
> >> +
> >> +If the certs_address is zero, then the previous certificate blob will be deleted.
> >> +For more information on the certificate blob layout, see the GHCB spec
> >> +(extended guest request message).
> >> +
> >> +2.6 SNP_GET_EXT_CONFIG
> >> +----------------------
> >> +:Technology: sev-snp
> >> +:Type: hypervisor ioctl cmd
> >> +:Parameters (in): struct sev_data_snp_ext_config
> >> +:Returns (out): 0 on success, -negative on error
> >> +
> >> +The SNP_GET_EXT_CONFIG is used to query the system-wide configuration set
> >> +through the SNP_SET_EXT_CONFIG.
> >> +
> >>   3. SEV-SNP CPUID Enforcement
> >>   ============================
> >>   
> >> diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
> >> index 65e13a562f3b..b56b00ca2cd4 100644
> >> --- a/drivers/crypto/ccp/sev-dev.c
> >> +++ b/drivers/crypto/ccp/sev-dev.c
> >> @@ -1481,6 +1481,10 @@ static int __sev_snp_shutdown_locked(int *error)
> >>   	data.length = sizeof(data);
> >>   	data.iommu_snp_shutdown = 1;
> >>   
> >> +	/* Free the memory used for caching the certificate data */
> >> +	kfree(sev->snp_certs_data);
> >> +	sev->snp_certs_data = NULL;
> >> +
> >>   	wbinvd_on_all_cpus();
> >>   
> >>   retry:
> >> @@ -1793,6 +1797,118 @@ static int sev_ioctl_snp_platform_status(struct sev_issue_cmd *argp)
> >>   	return ret;
> >>   }
> >>   
> >> +static int sev_ioctl_snp_get_config(struct sev_issue_cmd *argp)
> >> +{
> >> +	struct sev_device *sev = psp_master->sev_data;
> >> +	struct sev_user_data_ext_snp_config input;
> >> +	int ret;
> >> +
> >> +	if (!sev->snp_initialized || !argp->data)
> >> +		return -EINVAL;
> >> +
> >> +	memset(&input, 0, sizeof(input));
> >> +
> >> +	if (copy_from_user(&input, (void __user *)argp->data, sizeof(input)))
> >> +		return -EFAULT;
> >> +
> >> +	/* Copy the TCB version programmed through the SET_CONFIG to userspace */
> >> +	if (input.config_address) {
> >> +		if (copy_to_user((void * __user)input.config_address,
> >> +				 &sev->snp_config, sizeof(struct sev_user_data_snp_config)))
> >> +			return -EFAULT;
> >> +	}
> >> +
> >> +	/* Copy the extended certs programmed through the SNP_SET_CONFIG */
> >> +	if (input.certs_address && sev->snp_certs_data) {
> >> +		if (input.certs_len < sev->snp_certs_len) {
> >> +			/* Return the certs length to userspace */
> >> +			input.certs_len = sev->snp_certs_len;
> >> +
> >> +			ret = -ENOSR;
> >> +			goto e_done;
> >> +		}
> >> +
> > 
> > What about if input.certs_len > sev->snp_certs_len? Is it possible for the
> > userspace to know the length of data in the buffer? (I guess it might be able
> > to know the certs len through the blob data, but a comment here would be nice)
> > 
> 
> If userspace provides an input buffer/length smaller than snp_certs_len,
> then the above returns the "required" certs length back to userspace.
> 
> And what is the issue if input.certs_len > sev->snp_certs_len? The data
> copied back to userspace is sev->snp_certs_len bytes, as below.
> 

My point is: how can userspace know that the length of the returned data is
shorter than input.certs_len when input.certs_len > sev->snp_certs_len? The
length is only returned when input.certs_len < sev->snp_certs_len.

> Thanks,
> Ashish
> 
> >> +		if (copy_to_user((void * __user)input.certs_address,
> >> +				 sev->snp_certs_data, sev->snp_certs_len))
> >> +			return -EFAULT;
> >> +	}
> >> +
> >> +	ret = 0;
> >> +
> >> +e_done:
> >> +	if (copy_to_user((void __user *)argp->data, &input, sizeof(input)))
> >> +		ret = -EFAULT;
> >> +
> >> +	return ret;
> >> +}
> >> +
> >> +static int sev_ioctl_snp_set_config(struct sev_issue_cmd *argp, bool writable)
> >> +{
> >> +	struct sev_device *sev = psp_master->sev_data;
> >> +	struct sev_user_data_ext_snp_config input;
> >> +	struct sev_user_data_snp_config config;
> >> +	void *certs = NULL;
> >> +	int ret = 0;
> >> +
> >> +	if (!sev->snp_initialized || !argp->data)
> >> +		return -EINVAL;
> >> +
> >> +	if (!writable)
> >> +		return -EPERM;
> >> +
> >> +	memset(&input, 0, sizeof(input));
> >> +
> >> +	if (copy_from_user(&input, (void __user *)argp->data, sizeof(input)))
> >> +		return -EFAULT;
> >> +
> >> +	/* Copy the certs from userspace */
> >> +	if (input.certs_address) {
> >> +		if (!input.certs_len || !IS_ALIGNED(input.certs_len, PAGE_SIZE))
> >> +			return -EINVAL;
> >> +
> >> +		certs = psp_copy_user_blob(input.certs_address, input.certs_len);
> >> +		if (IS_ERR(certs))
> >> +			return PTR_ERR(certs);
> >> +	}
> >> +
> >> +	/* Issue the PSP command to update the TCB version using the SNP_CONFIG. */
> >> +	if (input.config_address) {
> >> +		memset(&config, 0, sizeof(config));
> >> +		if (copy_from_user(&config,
> >> +				   (void __user *)input.config_address, sizeof(config))) {
> >> +			ret = -EFAULT;
> >> +			goto e_free;
> >> +		}
> >> +
> >> +		ret = __sev_do_cmd_locked(SEV_CMD_SNP_CONFIG, &config, &argp->error);
> >> +		if (ret)
> >> +			goto e_free;
> >> +
> >> +		memcpy(&sev->snp_config, &config, sizeof(config));
> >> +	}
> >> +
> >> +	/*
> >> +	 * If the new certs are passed then cache it else free the old certs.
> >> +	 */
> >> +	mutex_lock(&sev->snp_certs_lock);
> >> +	if (certs) {
> >> +		kfree(sev->snp_certs_data);
> >> +		sev->snp_certs_data = certs;
> >> +		sev->snp_certs_len = input.certs_len;
> >> +	} else {
> >> +		kfree(sev->snp_certs_data);
> >> +		sev->snp_certs_data = NULL;
> >> +		sev->snp_certs_len = 0;
> >> +	}
> >> +	mutex_unlock(&sev->snp_certs_lock);
> >> +
> >> +	return 0;
> >> +
> >> +e_free:
> >> +	kfree(certs);
> >> +	return ret;
> >> +}
> >> +
> >>   static long sev_ioctl(struct file *file, unsigned int ioctl, unsigned long arg)
> >>   {
> >>   	void __user *argp = (void __user *)arg;
> >> @@ -1847,6 +1963,12 @@ static long sev_ioctl(struct file *file, unsigned int ioctl, unsigned long arg)
> >>   	case SNP_PLATFORM_STATUS:
> >>   		ret = sev_ioctl_snp_platform_status(&input);
> >>   		break;
> >> +	case SNP_SET_EXT_CONFIG:
> >> +		ret = sev_ioctl_snp_set_config(&input, writable);
> >> +		break;
> >> +	case SNP_GET_EXT_CONFIG:
> >> +		ret = sev_ioctl_snp_get_config(&input);
> >> +		break;
> >>   	default:
> >>   		ret = -EINVAL;
> >>   		goto out;
> >> @@ -1962,6 +2084,7 @@ int sev_dev_init(struct psp_device *psp)
> >>   		goto e_sev;
> >>   
> >>   	sev->cmd_buf_backup = (uint8_t *)sev->cmd_buf + PAGE_SIZE;
> >> +	mutex_init(&sev->snp_certs_lock);
> >>   
> >>   	psp->sev_data = sev;
> >>   
> >> diff --git a/drivers/crypto/ccp/sev-dev.h b/drivers/crypto/ccp/sev-dev.h
> >> index 19d79f9d4212..41d5353d5bab 100644
> >> --- a/drivers/crypto/ccp/sev-dev.h
> >> +++ b/drivers/crypto/ccp/sev-dev.h
> >> @@ -66,6 +66,10 @@ struct sev_device {
> >>   
> >>   	bool snp_initialized;
> >>   	struct snp_host_map snp_host_map[MAX_SNP_HOST_MAP_BUFS];
> >> +	void *snp_certs_data;
> >> +	u32 snp_certs_len;
> >> +	struct mutex snp_certs_lock;
> >> +	struct sev_user_data_snp_config snp_config;
> >>   };
> >>   
> >>   int sev_dev_init(struct psp_device *psp);
> >> diff --git a/include/uapi/linux/psp-sev.h b/include/uapi/linux/psp-sev.h
> >> index 5adfaea7df97..c20d37586d21 100644
> >> --- a/include/uapi/linux/psp-sev.h
> >> +++ b/include/uapi/linux/psp-sev.h
> >> @@ -29,6 +29,8 @@ enum {
> >>   	SEV_GET_ID,	/* This command is deprecated, use SEV_GET_ID2 */
> >>   	SEV_GET_ID2,
> >>   	SNP_PLATFORM_STATUS,
> >> +	SNP_SET_EXT_CONFIG,
> >> +	SNP_GET_EXT_CONFIG,
> >>   
> >>   	SEV_MAX,
> >>   };
> >> @@ -192,6 +194,21 @@ struct sev_user_data_snp_config {
> >>   	__u8 rsvd1[52];
> >>   } __packed;
> >>   
> >> +/**
> >> + * struct sev_data_snp_ext_config - system wide configuration value for SNP.
> >> + *
> >> + * @config_address: address of the struct sev_user_data_snp_config or 0 when
> >> + *		reported_tcb does not need to be updated.
> >> + * @certs_address: address of extended guest request certificate chain or
> >> + *              0 when previous certificate should be removed on SNP_SET_EXT_CONFIG.
> >> + * @certs_len: length of the certs
> >> + */
> >> +struct sev_user_data_ext_snp_config {
> >> +	__u64 config_address;		/* In */
> >> +	__u64 certs_address;		/* In */
> >> +	__u32 certs_len;		/* In */
> >> +};
> >> +
> >>   /**
> >>    * struct sev_issue_cmd - SEV ioctl parameters
> >>    *
> > 


^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH RFC v8 28/56] crypto: ccp: Provide APIs to query extended attestation report
  2023-02-22 22:35     ` Kalra, Ashish
@ 2023-02-23  8:14       ` Zhi Wang
  0 siblings, 0 replies; 147+ messages in thread
From: Zhi Wang @ 2023-02-23  8:14 UTC (permalink / raw)
  To: Kalra, Ashish
  Cc: Michael Roth, kvm, linux-coco, linux-mm, linux-crypto, x86,
	linux-kernel, tglx, mingo, jroedel, thomas.lendacky, hpa, ardb,
	pbonzini, seanjc, vkuznets, jmattson, luto, dave.hansen, slp,
	pgonda, peterz, srinivas.pandruvada, rientjes, dovmurik, tobin,
	bp, vbabka, kirill, ak, tony.luck, marcorr,
	sathyanarayanan.kuppuswamy, alpergun, dgilbert, jarkko,
	nikunj.dadhania, Brijesh Singh

On Wed, 22 Feb 2023 16:35:43 -0600
"Kalra, Ashish" <ashish.kalra@amd.com> wrote:

> On 2/22/2023 2:24 PM, Zhi Wang wrote:
> > On Mon, 20 Feb 2023 12:38:19 -0600
> > Michael Roth <michael.roth@amd.com> wrote:
> > 
> > It seems in the discussion:
> > https://lore.kernel.org/lkml/f18fae8b-a928-cd82-e0b3-eac62ad3e106@amd.com/,
> > this API is going to be removed. Will that fix land in this patch series or not?
> > If not, it would be better to mention it in the commit message of this one
> > or patch 45.
> > If yes, I guess this patch is not needed.
> >   
> 
> This API is definitely not going to be removed.
> 
> There will be some fixes and optimizations added to the API 
> implementation (as per the discussions) and that will be included in v9.
> 

Thanks.

I should have used the term "this API is going to be refined", since
snp_guest_ext_guest_request() is going to be renamed and refined. I gave
this comment because, while digging into this patch, I found that the API was
going to be changed in the discussion based on v7. It would be really nice to
mention that in v8 so that some review effort can be saved. For example, some
people might choose to skip reviewing this one in v8 and get back to it in the
next version when it is ready, or they can evaluate the possible v9 changes
when reviewing this part.

> Thanks,
> Ashish
> 
> >> From: Brijesh Singh <brijesh.singh@amd.com>
> >>
> >> Version 2 of the GHCB specification defines VMGEXIT that is used to get
> >> the extended attestation report. The extended attestation report includes
> >> the certificate blobs provided through the SNP_SET_EXT_CONFIG.
> >>
> >> The snp_guest_ext_guest_request() will be used by the hypervisor to get
> >> the extended attestation report. See the GHCB specification for more
> >> details.
> >>
> >> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> >> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> >> Signed-off-by: Michael Roth <michael.roth@amd.com>
> >> ---
> >>   drivers/crypto/ccp/sev-dev.c | 47 ++++++++++++++++++++++++++++++++++++
> >>   include/linux/psp-sev.h      | 33 +++++++++++++++++++++++++
> >>   2 files changed, 80 insertions(+)
> >>
> >> diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
> >> index b56b00ca2cd4..e65563bc8298 100644
> >> --- a/drivers/crypto/ccp/sev-dev.c
> >> +++ b/drivers/crypto/ccp/sev-dev.c
> >> @@ -2017,6 +2017,53 @@ int sev_guest_df_flush(int *error)
> >>   }
> >>   EXPORT_SYMBOL_GPL(sev_guest_df_flush);
> >>   
> >> +int snp_guest_ext_guest_request(struct sev_data_snp_guest_request *data,
> >> +				unsigned long vaddr, unsigned long *npages, unsigned long *fw_err)
> >> +{
> >> +	unsigned long expected_npages;
> >> +	struct sev_device *sev;
> >> +	int rc;
> >> +
> >> +	if (!psp_master || !psp_master->sev_data)
> >> +		return -ENODEV;
> >> +
> >> +	sev = psp_master->sev_data;
> >> +
> >> +	if (!sev->snp_initialized)
> >> +		return -EINVAL;
> >> +
> >> +	mutex_lock(&sev->snp_certs_lock);
> >> +	/*
> >> +	 * Check if there is enough space to copy the certificate chain. Otherwise
> >> +	 * return ERROR code defined in the GHCB specification.
> >> +	 */
> >> +	expected_npages = sev->snp_certs_len >> PAGE_SHIFT;
> >> +	if (*npages < expected_npages) {
> >> +		*npages = expected_npages;
> >> +		*fw_err = SNP_GUEST_REQ_INVALID_LEN;
> >> +		mutex_unlock(&sev->snp_certs_lock);
> >> +		return -EINVAL;
> >> +	}
> >> +
> >> +	rc = sev_do_cmd(SEV_CMD_SNP_GUEST_REQUEST, data, (int *)fw_err);
> >> +	if (rc) {
> >> +		mutex_unlock(&sev->snp_certs_lock);
> >> +		return rc;
> >> +	}
> >> +
> >> +	/* Copy the certificate blob */
> >> +	if (sev->snp_certs_data) {
> >> +		*npages = expected_npages;
> >> +		memcpy((void *)vaddr, sev->snp_certs_data, *npages << PAGE_SHIFT);
> >> +	} else {
> >> +		*npages = 0;
> >> +	}
> >> +
> >> +	mutex_unlock(&sev->snp_certs_lock);
> >> +	return rc;
> >> +}
> >> +EXPORT_SYMBOL_GPL(snp_guest_ext_guest_request);
> >> +
> >>   static void sev_exit(struct kref *ref)
> >>   {
> >>   	misc_deregister(&misc_dev->misc);
> >> diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h
> >> index d19744807471..81bafc049eca 100644
> >> --- a/include/linux/psp-sev.h
> >> +++ b/include/linux/psp-sev.h
> >> @@ -931,6 +931,32 @@ void snp_free_firmware_page(void *addr);
> >>    */
> >>   void snp_mark_pages_offline(unsigned long pfn, unsigned int npages);
> >>   
> >> +/**
> >> + * snp_guest_ext_guest_request - perform the SNP extended guest request command
> >> + *  defined in the GHCB specification.
> >> + *
> >> + * @data: the input guest request structure
> >> + * @vaddr: address where the certificate blob needs to be copied.
> >> + * @npages: number of pages for the certificate blob.
> >> + *    If the specified page count is less than the certificate blob size, then the
> >> + *    required page count is returned with error code defined in the GHCB spec.
> >> + *    If the specified page count is more than the certificate blob size, then
> >> + *    page count is updated to reflect the amount of valid data copied in the
> >> + *    vaddr.
> >> + *
> >> + * @sev_ret: sev command return code
> >> + *
> >> + * Returns:
> >> + * 0 if the sev successfully processed the command
> >> + * -%ENODEV    if the sev device is not available
> >> + * -%ENOTSUPP  if the sev does not support SEV
> >> + * -%ETIMEDOUT if the sev command timed out
> >> + * -%EIO       if the sev returned a non-zero return code
> >> + */
> >> +int snp_guest_ext_guest_request(struct sev_data_snp_guest_request *data,
> >> +				unsigned long vaddr, unsigned long *npages,
> >> +				unsigned long *error);
> >> +
> >>   #else	/* !CONFIG_CRYPTO_DEV_SP_PSP */
> >>   
> >>   static inline int
> >> @@ -968,6 +994,13 @@ static inline void *snp_alloc_firmware_page(gfp_t mask)
> >>   
> >>   static inline void snp_free_firmware_page(void *addr) { }
> >>   
> >> +static inline int snp_guest_ext_guest_request(struct sev_data_snp_guest_request *data,
> >> +					      unsigned long vaddr, unsigned long *n,
> >> +					      unsigned long *error)
> >> +{
> >> +	return -ENODEV;
> >> +}
> >> +
> >>   #endif	/* CONFIG_CRYPTO_DEV_SP_PSP */
> >>   
> >>   #endif	/* __PSP_SEV_H__ */  
> >   


^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH RFC v8 27/56] crypto: ccp: Add the SNP_{SET,GET}_EXT_CONFIG command
  2023-02-23  6:38       ` Zhi Wang
@ 2023-02-23 14:19         ` Tom Lendacky
  0 siblings, 0 replies; 147+ messages in thread
From: Tom Lendacky @ 2023-02-23 14:19 UTC (permalink / raw)
  To: Zhi Wang, Kalra, Ashish
  Cc: Michael Roth, kvm, linux-coco, linux-mm, linux-crypto, x86,
	linux-kernel, tglx, mingo, jroedel, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, nikunj.dadhania, Brijesh Singh

On 2/23/23 00:38, Zhi Wang wrote:
> On Wed, 22 Feb 2023 16:43:54 -0600
> "Kalra, Ashish" <ashish.kalra@amd.com> wrote:
> 
>> On 2/22/2023 6:32 AM, Zhi Wang wrote:
>>> On Mon, 20 Feb 2023 12:38:18 -0600
>>> Michael Roth <michael.roth@amd.com> wrote:
>>>
>>>> From: Brijesh Singh <brijesh.singh@amd.com>
>>>>
>>>> The SEV-SNP firmware provides the SNP_CONFIG command used to set the
>>>> system-wide configuration value for SNP guests. The information includes
>>>> the TCB version string to be reported in guest attestation reports.
>>>>
>>>> Version 2 of the GHCB specification adds an NAE (SNP extended guest
>>>> request) that a guest can use to query the reports that include additional
>>>> certificates.
>>>>
>>>> In both cases, userspace provided additional data is included in the
>>>> attestation reports. The userspace will use the SNP_SET_EXT_CONFIG
>>>> command to give the certificate blob and the reported TCB version string
>>>> at once. Note that the specification defines certificate blob with a
>>>> specific GUID format; the userspace is responsible for building the
>>>> proper certificate blob. The ioctl treats it as an opaque blob.
>>>>
>>>> While it is not defined in the spec, let's also add an SNP_GET_EXT_CONFIG
>>>> command that can be used to obtain the data programmed through the
>>>> SNP_SET_EXT_CONFIG.
>>>>
>>>> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
>>>> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
>>>> Signed-off-by: Michael Roth <michael.roth@amd.com>
>>>> ---
>>>>    Documentation/virt/coco/sev-guest.rst |  27 ++++++
>>>>    drivers/crypto/ccp/sev-dev.c          | 123 ++++++++++++++++++++++++++
>>>>    drivers/crypto/ccp/sev-dev.h          |   4 +
>>>>    include/uapi/linux/psp-sev.h          |  17 ++++
>>>>    4 files changed, 171 insertions(+)
>>>>
>>>> diff --git a/Documentation/virt/coco/sev-guest.rst b/Documentation/virt/coco/sev-guest.rst
>>>> index 11ea67c944df..6cad4226c348 100644
>>>> --- a/Documentation/virt/coco/sev-guest.rst
>>>> +++ b/Documentation/virt/coco/sev-guest.rst
>>>> @@ -145,6 +145,33 @@ The SNP_PLATFORM_STATUS command is used to query the SNP platform status. The
>>>>    status includes API major, minor version and more. See the SEV-SNP
>>>>    specification for further details.
>>>>    
>>>> +2.5 SNP_SET_EXT_CONFIG
>>>> +----------------------
>>>> +:Technology: sev-snp
>>>> +:Type: hypervisor ioctl cmd
>>>> +:Parameters (in): struct sev_data_snp_ext_config
>>>> +:Returns (out): 0 on success, -negative on error
>>>> +
>>>> +The SNP_SET_EXT_CONFIG is used to set the system-wide configuration such as
>>>> +reported TCB version in the attestation report. The command is similar to
>>>> +SNP_CONFIG command defined in the SEV-SNP spec. The main difference is the
>>>> +command also accepts an additional certificate blob defined in the GHCB
>>>> +specification.
>>>> +
>>>> +If the certs_address is zero, then the previous certificate blob will be deleted.
>>>> +For more information on the certificate blob layout, see the GHCB spec
>>>> +(extended guest request message).
>>>> +
>>>> +2.6 SNP_GET_EXT_CONFIG
>>>> +----------------------
>>>> +:Technology: sev-snp
>>>> +:Type: hypervisor ioctl cmd
>>>> +:Parameters (in): struct sev_data_snp_ext_config
>>>> +:Returns (out): 0 on success, -negative on error
>>>> +
>>>> +The SNP_GET_EXT_CONFIG is used to query the system-wide configuration set
>>>> +through the SNP_SET_EXT_CONFIG.
>>>> +
>>>>    3. SEV-SNP CPUID Enforcement
>>>>    ============================
>>>>    
>>>> diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
>>>> index 65e13a562f3b..b56b00ca2cd4 100644
>>>> --- a/drivers/crypto/ccp/sev-dev.c
>>>> +++ b/drivers/crypto/ccp/sev-dev.c
>>>> @@ -1481,6 +1481,10 @@ static int __sev_snp_shutdown_locked(int *error)
>>>>    	data.length = sizeof(data);
>>>>    	data.iommu_snp_shutdown = 1;
>>>>    
>>>> +	/* Free the memory used for caching the certificate data */
>>>> +	kfree(sev->snp_certs_data);
>>>> +	sev->snp_certs_data = NULL;
>>>> +
>>>>    	wbinvd_on_all_cpus();
>>>>    
>>>>    retry:
>>>> @@ -1793,6 +1797,118 @@ static int sev_ioctl_snp_platform_status(struct sev_issue_cmd *argp)
>>>>    	return ret;
>>>>    }
>>>>    
>>>> +static int sev_ioctl_snp_get_config(struct sev_issue_cmd *argp)
>>>> +{
>>>> +	struct sev_device *sev = psp_master->sev_data;
>>>> +	struct sev_user_data_ext_snp_config input;
>>>> +	int ret;
>>>> +
>>>> +	if (!sev->snp_initialized || !argp->data)
>>>> +		return -EINVAL;
>>>> +
>>>> +	memset(&input, 0, sizeof(input));
>>>> +
>>>> +	if (copy_from_user(&input, (void __user *)argp->data, sizeof(input)))
>>>> +		return -EFAULT;
>>>> +
>>>> +	/* Copy the TCB version programmed through the SET_CONFIG to userspace */
>>>> +	if (input.config_address) {
>>>> +		if (copy_to_user((void * __user)input.config_address,
>>>> +				 &sev->snp_config, sizeof(struct sev_user_data_snp_config)))
>>>> +			return -EFAULT;
>>>> +	}
>>>> +
>>>> +	/* Copy the extended certs programmed through the SNP_SET_CONFIG */
>>>> +	if (input.certs_address && sev->snp_certs_data) {
>>>> +		if (input.certs_len < sev->snp_certs_len) {
>>>> +			/* Return the certs length to userspace */
>>>> +			input.certs_len = sev->snp_certs_len;
>>>> +
>>>> +			ret = -ENOSR;
>>>> +			goto e_done;
>>>> +		}
>>>> +
>>>
>>> What about if input.certs_len > sev->snp_certs_len? Is it possible for the
>>> userspace to know the length of data in the buffer? (I guess it might be able
>>> to know the certs len through the blob data, but a comment here would be nice)
>>>
>>
>> If userspace provides an input buffer/length smaller than snp_certs_len,
>> then the above returns the "required" certs length back to userspace.
>>
>> And what is the issue if input.certs_len > sev->snp_certs_len? The data
>> copied back to userspace is sev->snp_certs_len bytes, as below.
>>
> 
> My point is: how can userspace know that the length of the returned data is
> shorter than input.certs_len when input.certs_len > sev->snp_certs_len? The
> length is only returned when input.certs_len < sev->snp_certs_len.

The returned data has a defined format that can be used to calculate the 
overall length.

Thanks,
Tom
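
For example (a sketch only -- the entry layout below is my reading of the GHCB
spec's certificate table, i.e. a 16-byte GUID plus 32-bit offset and length per
entry, terminated by a zeroed GUID; it is not something defined by this patch):

  #include <stddef.h>
  #include <stdint.h>
  #include <string.h>

  struct cert_table_entry {
          uint8_t  guid[16];
          uint32_t offset;        /* from the start of the blob */
          uint32_t length;
  } __attribute__((packed));

  /* Walk the table until the zeroed terminator; return the overall blob size. */
  static size_t ext_certs_total_len(const void *blob)
  {
          static const uint8_t zero_guid[16];
          const struct cert_table_entry *e = blob;
          size_t total = 0;

          for (; memcmp(e->guid, zero_guid, sizeof(zero_guid)); e++) {
                  size_t end = (size_t)e->offset + e->length;

                  if (end > total)
                          total = end;
          }

          return total;
  }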

> 
>> Thanks,
>> Ashish
>>
>>>> +		if (copy_to_user((void * __user)input.certs_address,
>>>> +				 sev->snp_certs_data, sev->snp_certs_len))
>>>> +			return -EFAULT;
>>>> +	}
>>>> +
>>>> +	ret = 0;
>>>> +
>>>> +e_done:
>>>> +	if (copy_to_user((void __user *)argp->data, &input, sizeof(input)))
>>>> +		ret = -EFAULT;
>>>> +
>>>> +	return ret;
>>>> +}
>>>> +
>>>> +static int sev_ioctl_snp_set_config(struct sev_issue_cmd *argp, bool writable)
>>>> +{
>>>> +	struct sev_device *sev = psp_master->sev_data;
>>>> +	struct sev_user_data_ext_snp_config input;
>>>> +	struct sev_user_data_snp_config config;
>>>> +	void *certs = NULL;
>>>> +	int ret = 0;
>>>> +
>>>> +	if (!sev->snp_initialized || !argp->data)
>>>> +		return -EINVAL;
>>>> +
>>>> +	if (!writable)
>>>> +		return -EPERM;
>>>> +
>>>> +	memset(&input, 0, sizeof(input));
>>>> +
>>>> +	if (copy_from_user(&input, (void __user *)argp->data, sizeof(input)))
>>>> +		return -EFAULT;
>>>> +
>>>> +	/* Copy the certs from userspace */
>>>> +	if (input.certs_address) {
>>>> +		if (!input.certs_len || !IS_ALIGNED(input.certs_len, PAGE_SIZE))
>>>> +			return -EINVAL;
>>>> +
>>>> +		certs = psp_copy_user_blob(input.certs_address, input.certs_len);
>>>> +		if (IS_ERR(certs))
>>>> +			return PTR_ERR(certs);
>>>> +	}
>>>> +
>>>> +	/* Issue the PSP command to update the TCB version using the SNP_CONFIG. */
>>>> +	if (input.config_address) {
>>>> +		memset(&config, 0, sizeof(config));
>>>> +		if (copy_from_user(&config,
>>>> +				   (void __user *)input.config_address, sizeof(config))) {
>>>> +			ret = -EFAULT;
>>>> +			goto e_free;
>>>> +		}
>>>> +
>>>> +		ret = __sev_do_cmd_locked(SEV_CMD_SNP_CONFIG, &config, &argp->error);
>>>> +		if (ret)
>>>> +			goto e_free;
>>>> +
>>>> +		memcpy(&sev->snp_config, &config, sizeof(config));
>>>> +	}
>>>> +
>>>> +	/*
>>>> +	 * If the new certs are passed then cache it else free the old certs.
>>>> +	 */
>>>> +	mutex_lock(&sev->snp_certs_lock);
>>>> +	if (certs) {
>>>> +		kfree(sev->snp_certs_data);
>>>> +		sev->snp_certs_data = certs;
>>>> +		sev->snp_certs_len = input.certs_len;
>>>> +	} else {
>>>> +		kfree(sev->snp_certs_data);
>>>> +		sev->snp_certs_data = NULL;
>>>> +		sev->snp_certs_len = 0;
>>>> +	}
>>>> +	mutex_unlock(&sev->snp_certs_lock);
>>>> +
>>>> +	return 0;
>>>> +
>>>> +e_free:
>>>> +	kfree(certs);
>>>> +	return ret;
>>>> +}
>>>> +
>>>>    static long sev_ioctl(struct file *file, unsigned int ioctl, unsigned long arg)
>>>>    {
>>>>    	void __user *argp = (void __user *)arg;
>>>> @@ -1847,6 +1963,12 @@ static long sev_ioctl(struct file *file, unsigned int ioctl, unsigned long arg)
>>>>    	case SNP_PLATFORM_STATUS:
>>>>    		ret = sev_ioctl_snp_platform_status(&input);
>>>>    		break;
>>>> +	case SNP_SET_EXT_CONFIG:
>>>> +		ret = sev_ioctl_snp_set_config(&input, writable);
>>>> +		break;
>>>> +	case SNP_GET_EXT_CONFIG:
>>>> +		ret = sev_ioctl_snp_get_config(&input);
>>>> +		break;
>>>>    	default:
>>>>    		ret = -EINVAL;
>>>>    		goto out;
>>>> @@ -1962,6 +2084,7 @@ int sev_dev_init(struct psp_device *psp)
>>>>    		goto e_sev;
>>>>    
>>>>    	sev->cmd_buf_backup = (uint8_t *)sev->cmd_buf + PAGE_SIZE;
>>>> +	mutex_init(&sev->snp_certs_lock);
>>>>    
>>>>    	psp->sev_data = sev;
>>>>    
>>>> diff --git a/drivers/crypto/ccp/sev-dev.h b/drivers/crypto/ccp/sev-dev.h
>>>> index 19d79f9d4212..41d5353d5bab 100644
>>>> --- a/drivers/crypto/ccp/sev-dev.h
>>>> +++ b/drivers/crypto/ccp/sev-dev.h
>>>> @@ -66,6 +66,10 @@ struct sev_device {
>>>>    
>>>>    	bool snp_initialized;
>>>>    	struct snp_host_map snp_host_map[MAX_SNP_HOST_MAP_BUFS];
>>>> +	void *snp_certs_data;
>>>> +	u32 snp_certs_len;
>>>> +	struct mutex snp_certs_lock;
>>>> +	struct sev_user_data_snp_config snp_config;
>>>>    };
>>>>    
>>>>    int sev_dev_init(struct psp_device *psp);
>>>> diff --git a/include/uapi/linux/psp-sev.h b/include/uapi/linux/psp-sev.h
>>>> index 5adfaea7df97..c20d37586d21 100644
>>>> --- a/include/uapi/linux/psp-sev.h
>>>> +++ b/include/uapi/linux/psp-sev.h
>>>> @@ -29,6 +29,8 @@ enum {
>>>>    	SEV_GET_ID,	/* This command is deprecated, use SEV_GET_ID2 */
>>>>    	SEV_GET_ID2,
>>>>    	SNP_PLATFORM_STATUS,
>>>> +	SNP_SET_EXT_CONFIG,
>>>> +	SNP_GET_EXT_CONFIG,
>>>>    
>>>>    	SEV_MAX,
>>>>    };
>>>> @@ -192,6 +194,21 @@ struct sev_user_data_snp_config {
>>>>    	__u8 rsvd1[52];
>>>>    } __packed;
>>>>    
>>>> +/**
>>>> + * struct sev_data_snp_ext_config - system wide configuration value for SNP.
>>>> + *
>>>> + * @config_address: address of the struct sev_user_data_snp_config or 0 when
>>>> + *		reported_tcb does not need to be updated.
>>>> + * @certs_address: address of extended guest request certificate chain or
>>>> + *              0 when previous certificate should be removed on SNP_SET_EXT_CONFIG.
>>>> + * @certs_len: length of the certs
>>>> + */
>>>> +struct sev_user_data_ext_snp_config {
>>>> +	__u64 config_address;		/* In */
>>>> +	__u64 certs_address;		/* In */
>>>> +	__u32 certs_len;		/* In */
>>>> +};
>>>> +
>>>>    /**
>>>>     * struct sev_issue_cmd - SEV ioctl parameters
>>>>     *
>>>
> 

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH RFC v8 32/56] KVM: SVM: Add initial SEV-SNP support
  2023-02-20 18:38 ` [PATCH RFC v8 32/56] KVM: SVM: Add initial SEV-SNP support Michael Roth
@ 2023-02-23 17:46   ` Zhi Wang
  0 siblings, 0 replies; 147+ messages in thread
From: Zhi Wang @ 2023-02-23 17:46 UTC (permalink / raw)
  To: Michael Roth
  Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	Brijesh Singh

On Mon, 20 Feb 2023 12:38:23 -0600
Michael Roth <michael.roth@amd.com> wrote:

> From: Brijesh Singh <brijesh.singh@amd.com>
> 
> The next generation of SEV is called SEV-SNP (Secure Nested Paging).
> SEV-SNP builds upon existing SEV and SEV-ES functionality while adding new
> hardware-based security protection. SEV-SNP adds strong memory integrity
> protection to help prevent malicious hypervisor-based attacks such as data
> replay, memory re-mapping, and more, to create an isolated execution
> environment.
> 
> The SNP feature is added incrementally; later patches add a new module
> parameter that can be used to enable SEV-SNP in KVM.
> 
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> Signed-off-by: Michael Roth <michael.roth@amd.com>
> ---
>  arch/x86/kvm/svm/sev.c | 10 +++++++++-
>  arch/x86/kvm/svm/svm.h |  8 ++++++++
>  2 files changed, 17 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 9e9efb42a766..51db01b282eb 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -58,6 +58,9 @@ module_param_named(sev_es, sev_es_enabled, bool, 0444);
>  #define sev_es_enabled false
>  #endif /* CONFIG_KVM_AMD_SEV */
>  
> +/* enable/disable SEV-SNP support */
> +static bool sev_snp_enabled;
> +
>  #define AP_RESET_HOLD_NONE		0
>  #define AP_RESET_HOLD_NAE_EVENT		1
>  #define AP_RESET_HOLD_MSR_PROTO		2
> @@ -2306,6 +2309,7 @@ void __init sev_hardware_setup(void)
>  {
>  #ifdef CONFIG_KVM_AMD_SEV
>  	unsigned int eax, ebx, ecx, edx, sev_asid_count, sev_es_asid_count;
> +	bool sev_snp_supported = false;
>  	bool sev_es_supported = false;
>  	bool sev_supported = false;
>  
> @@ -2385,12 +2389,16 @@ void __init sev_hardware_setup(void)
>  	if (misc_cg_set_capacity(MISC_CG_RES_SEV_ES, sev_es_asid_count))
>  		goto out;
>  
> -	pr_info("SEV-ES supported: %u ASIDs\n", sev_es_asid_count);
>  	sev_es_supported = true;
> +	sev_snp_supported = sev_snp_enabled && cpu_feature_enabled(X86_FEATURE_SEV_SNP);
> +
> +	pr_info("SEV-ES %ssupported: %u ASIDs\n",
> +		sev_snp_supported ? "and SEV-SNP " : "", sev_es_asid_count);
>  
>  out:
>  	sev_enabled = sev_supported;
>  	sev_es_enabled = sev_es_supported;
> +	sev_snp_enabled = sev_snp_supported;
>  #endif
>  }
>  
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index 5efcf036ccad..8eb1b51e92f5 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -76,6 +76,7 @@ enum {
>  struct kvm_sev_info {
>  	bool active;		/* SEV enabled guest */
>  	bool es_active;		/* SEV-ES enabled guest */
> +	bool snp_active;	/* SEV-SNP enabled guest */
>  	unsigned int asid;	/* ASID used for this guest */
>  	unsigned int handle;	/* SEV firmware handle */
>  	int fd;			/* SEV device fd */
> @@ -323,6 +324,13 @@ static __always_inline bool sev_es_guest(struct kvm *kvm)
>  #endif
>  }
>  
> +static inline bool sev_snp_guest(struct kvm *kvm)
> +{
> +	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> +
> +	return sev_es_guest(kvm) && sev->snp_active;
> +}
> +

Maybe also use __always_inline like sev_es_guest() above?

It seems this solved some warnings before:

https://lore.kernel.org/all/20210624095147.880513802@infradead.org/
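
i.e. something like the following (same body as in the patch, only the
annotation changed):

  static __always_inline bool sev_snp_guest(struct kvm *kvm)
  {
          struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;

          return sev_es_guest(kvm) && sev->snp_active;
  }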

>  static inline void vmcb_mark_all_dirty(struct vmcb *vmcb)
>  {
>  	vmcb->control.clean = 0;


^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH RFC v8 34/56] KVM: SVM: Add KVM_SEV_SNP_LAUNCH_START command
  2023-02-20 18:38 ` [PATCH RFC v8 34/56] KVM: SVM: Add KVM_SEV_SNP_LAUNCH_START command Michael Roth
@ 2023-02-23 21:41   ` Zhi Wang
  2023-02-24 16:22     ` Tom Lendacky
  2023-04-26 17:06   ` Sabin Rapan
  1 sibling, 1 reply; 147+ messages in thread
From: Zhi Wang @ 2023-02-23 21:41 UTC (permalink / raw)
  To: Michael Roth
  Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	Brijesh Singh

On Mon, 20 Feb 2023 12:38:25 -0600
Michael Roth <michael.roth@amd.com> wrote:

> From: Brijesh Singh <brijesh.singh@amd.com>
> 
> KVM_SEV_SNP_LAUNCH_START begins the launch process for an SEV-SNP guest.
> The command initializes a cryptographic digest context used to construct
> the measurement of the guest. If the guest is expected to be migrated,
> the command also binds a migration agent (MA) to the guest.
> 
> For more information see the SEV-SNP specification.
> 
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> Signed-off-by: Michael Roth <michael.roth@amd.com>
> ---
>  .../virt/kvm/x86/amd-memory-encryption.rst    |  24 ++++
>  arch/x86/kvm/svm/sev.c                        | 121 +++++++++++++++++-
>  arch/x86/kvm/svm/svm.h                        |   1 +
>  include/uapi/linux/kvm.h                      |  10 ++
>  4 files changed, 153 insertions(+), 3 deletions(-)
> 
> diff --git a/Documentation/virt/kvm/x86/amd-memory-encryption.rst b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
> index 2432213bd0ea..58971fc02a15 100644
> --- a/Documentation/virt/kvm/x86/amd-memory-encryption.rst
> +++ b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
> @@ -461,6 +461,30 @@ The flags bitmap is defined as::
>  If the specified flags is not supported then return -EOPNOTSUPP, and the supported
>  flags are returned.
>  
> +19. KVM_SNP_LAUNCH_START
> +------------------------
> +
> +The KVM_SNP_LAUNCH_START command is used for creating the memory encryption
> +context for the SEV-SNP guest. To create the encryption context, the user must
> +provide a guest policy, migration agent (if any) and guest OS visible
> +workarounds value as defined in the SEV-SNP specification.
> +
> +Parameters (in): struct  kvm_snp_launch_start
> +
> +Returns: 0 on success, -negative on error
> +
> +::
> +
> +        struct kvm_sev_snp_launch_start {
> +                __u64 policy;           /* Guest policy to use. */
> +                __u64 ma_uaddr;         /* userspace address of migration agent */
> +                __u8 ma_en;             /* 1 if the migration agent is enabled */
> +                __u8 imi_en;            /* set IMI to 1. */
> +                __u8 gosvw[16];         /* guest OS visible workarounds */
> +        };
> +
> +See the SEV-SNP specification for further detail on the launch input.
> +
>  References
>  ==========
>  
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index a8efe1f6bf77..097bb2138360 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -22,6 +22,7 @@
>  #include <asm/pkru.h>
>  #include <asm/trapnr.h>
>  #include <asm/fpu/xcr.h>
> +#include <asm/sev.h>
>  
>  #include "mmu.h"
>  #include "x86.h"
> @@ -75,6 +76,8 @@ static unsigned int nr_asids;
>  static unsigned long *sev_asid_bitmap;
>  static unsigned long *sev_reclaim_asid_bitmap;
>  
> +static int snp_decommission_context(struct kvm *kvm);
> +
>  struct enc_region {
>  	struct list_head list;
>  	unsigned long npages;
> @@ -100,12 +103,17 @@ static int sev_flush_asids(int min_asid, int max_asid)
>  	down_write(&sev_deactivate_lock);
>  
>  	wbinvd_on_all_cpus();
> -	ret = sev_guest_df_flush(&error);
> +
> +	if (sev_snp_enabled)
> +		ret = sev_do_cmd(SEV_CMD_SNP_DF_FLUSH, NULL, &error);
> +	else
> +		ret = sev_guest_df_flush(&error);
>  
>  	up_write(&sev_deactivate_lock);
>  
>  	if (ret)
> -		pr_err("SEV: DF_FLUSH failed, ret=%d, error=%#x\n", ret, error);
> +		pr_err("SEV%s: DF_FLUSH failed, ret=%d, error=%#x\n",
> +		       sev_snp_enabled ? "-SNP" : "", ret, error);
>  
>  	return ret;
>  }
> @@ -2011,6 +2019,80 @@ int sev_vm_move_enc_context_from(struct kvm *kvm, unsigned int source_fd)
>  	return ret;
>  }
>  
> +/*
> + * The guest context contains all the information, keys and metadata
> + * associated with the guest that the firmware tracks to implement SEV
> + * and SNP features. The firmware stores the guest context in hypervisor
> + * provide page via the SNP_GCTX_CREATE command.
> + */
> +static void *snp_context_create(struct kvm *kvm, struct kvm_sev_cmd *argp)
> +{
> +	struct sev_data_snp_addr data = {};
> +	void *context;
> +	int rc;
> +
> +	/* Allocate memory for context page */
> +	context = snp_alloc_firmware_page(GFP_KERNEL_ACCOUNT);
> +	if (!context)
> +		return NULL;
> +
> +	data.gctx_paddr = __psp_pa(context);
> +	rc = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_GCTX_CREATE, &data, &argp->error);
> +	if (rc) {
> +		snp_free_firmware_page(context);
> +		return NULL;
> +	}
> +
> +	return context;
> +}
> +
> +static int snp_bind_asid(struct kvm *kvm, int *error)
> +{
> +	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> +	struct sev_data_snp_activate data = {0};
> +
> +	data.gctx_paddr = __psp_pa(sev->snp_context);
> +	data.asid   = sev_get_asid(kvm);
> +	return sev_issue_cmd(kvm, SEV_CMD_SNP_ACTIVATE, &data, error);

According to the SNP ABI specification[1] 8.10 SNP_ACTIVATE:

"The firmware checks that a DF_FLUSH is not required. If a DF_FLUSH is
required, the firmware returns DFFLUSH_REQUIRED. Note that all ASIDs are
marked to require a DF_FLUSH at reset."

Do we need an SNP_DF_FLUSH here before calling SNP_ACTIVATE, or should we
handle the situation where the PSP firmware returns DFFLUSH_REQUIRED?

[1] https://www.amd.com/system/files/TechDocs/56860.pdf
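
If handling turned out to be needed here, it might look roughly like the
following (purely an illustrative sketch; SEV_RET_DFFLUSH_REQUIRED is the
error code name from psp-sev.h, and whether any of this is required is
exactly the question above):

    rc = sev_issue_cmd(kvm, SEV_CMD_SNP_ACTIVATE, &data, error);
    if (rc && *error == SEV_RET_DFFLUSH_REQUIRED) {
            /* flush, then retry the activation once */
            if (!sev_do_cmd(SEV_CMD_SNP_DF_FLUSH, NULL, error))
                    rc = sev_issue_cmd(kvm, SEV_CMD_SNP_ACTIVATE, &data, error);
    }
    return rc;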

> +}
> +
> +static int snp_launch_start(struct kvm *kvm, struct kvm_sev_cmd *argp)
> +{
> +	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> +	struct sev_data_snp_launch_start start = {0};
> +	struct kvm_sev_snp_launch_start params;
> +	int rc;
> +
> +	if (!sev_snp_guest(kvm))
> +		return -ENOTTY;
> +
> +	if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data, sizeof(params)))
> +		return -EFAULT;
> +
> +	sev->snp_context = snp_context_create(kvm, argp);
> +	if (!sev->snp_context)
> +		return -ENOTTY;
> +
> +	start.gctx_paddr = __psp_pa(sev->snp_context);
> +	start.policy = params.policy;
> +	memcpy(start.gosvw, params.gosvw, sizeof(params.gosvw));
> +	rc = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_LAUNCH_START, &start, &argp->error);
> +	if (rc)
> +		goto e_free_context;
> +
> +	sev->fd = argp->sev_fd;
> +	rc = snp_bind_asid(kvm, &argp->error);
> +	if (rc)
> +		goto e_free_context;
> +
> +	return 0;
> +
> +e_free_context:
> +	snp_decommission_context(kvm);
> +
> +	return rc;
> +}
> +
>  int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
>  {
>  	struct kvm_sev_cmd sev_cmd;
> @@ -2101,6 +2183,9 @@ int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
>  	case KVM_SEV_RECEIVE_FINISH:
>  		r = sev_receive_finish(kvm, &sev_cmd);
>  		break;
> +	case KVM_SEV_SNP_LAUNCH_START:
> +		r = snp_launch_start(kvm, &sev_cmd);
> +		break;
>  	default:
>  		r = -EINVAL;
>  		goto out;
> @@ -2292,6 +2377,28 @@ int sev_vm_copy_enc_context_from(struct kvm *kvm, unsigned int source_fd)
>  	return ret;
>  }
>  
> +static int snp_decommission_context(struct kvm *kvm)
> +{
> +	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> +	struct sev_data_snp_addr data = {};
> +	int ret;
> +
> +	/* If context is not created then do nothing */
> +	if (!sev->snp_context)
> +		return 0;
> +
> +	data.gctx_paddr = __sme_pa(sev->snp_context);
> +	ret = sev_do_cmd(SEV_CMD_SNP_DECOMMISSION, &data, NULL);
> +	if (WARN_ONCE(ret, "failed to release guest context"))
> +		return ret;
> +
> +	/* free the context page now */
> +	snp_free_firmware_page(sev->snp_context);
> +	sev->snp_context = NULL;
> +
> +	return 0;
> +}
> +
>  void sev_vm_destroy(struct kvm *kvm)
>  {
>  	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> @@ -2333,7 +2440,15 @@ void sev_vm_destroy(struct kvm *kvm)
>  		}
>  	}
>  
> -	sev_unbind_asid(kvm, sev->handle);
> +	if (sev_snp_guest(kvm)) {
> +		if (snp_decommission_context(kvm)) {
> +			WARN_ONCE(1, "Failed to free SNP guest context, leaking asid!\n");
> +			return;
> +		}
> +	} else {
> +		sev_unbind_asid(kvm, sev->handle);
> +	}
> +
>  	sev_asid_free(sev);
>  }
>  
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index 56a5c96d8a36..740969b57425 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -92,6 +92,7 @@ struct kvm_sev_info {
>  	struct misc_cg *misc_cg; /* For misc cgroup accounting */
>  	atomic_t migration_in_progress;
>  	u64 snp_init_flags;
> +	void *snp_context;      /* SNP guest context page */
>  };
>  
>  struct kvm_svm {
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 499cc323f793..cf19799ca5ce 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1919,6 +1919,7 @@ enum sev_cmd_id {
>  
>  	/* SNP specific commands */
>  	KVM_SEV_SNP_INIT,
> +	KVM_SEV_SNP_LAUNCH_START,
>  
>  	KVM_SEV_NR_MAX,
>  };
> @@ -2026,6 +2027,15 @@ struct kvm_snp_init {
>  	__u64 flags;
>  };
>  
> +struct kvm_sev_snp_launch_start {
> +	__u64 policy;
> +	__u64 ma_uaddr;
> +	__u8 ma_en;
> +	__u8 imi_en;
> +	__u8 gosvw[16];
> +	__u8 pad[6];
> +};
> +
>  #define KVM_DEV_ASSIGN_ENABLE_IOMMU	(1 << 0)
>  #define KVM_DEV_ASSIGN_PCI_2_3		(1 << 1)
>  #define KVM_DEV_ASSIGN_MASK_INTX	(1 << 2)


^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH RFC v8 45/56] KVM: SVM: Provide support for SNP_GUEST_REQUEST NAE event
  2023-02-20 18:38 ` [PATCH RFC v8 45/56] KVM: SVM: Provide support for SNP_GUEST_REQUEST NAE event Michael Roth
@ 2023-02-24 11:01   ` Alexander Graf
  2023-02-28 19:34   ` Zhi Wang
  2023-04-17 13:05   ` Alexander Graf
  2 siblings, 0 replies; 147+ messages in thread
From: Alexander Graf @ 2023-02-24 11:01 UTC (permalink / raw)
  To: Michael Roth, kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	Brijesh Singh


On 20.02.23 19:38, Michael Roth wrote:
> From: Brijesh Singh <brijesh.singh@amd.com>
>
> Version 2 of the GHCB specification added support for two SNP Guest
> Request Message NAE events. The events allow an SEV-SNP guest to
> make requests to the SEV-SNP firmware through the hypervisor using the
> SNP_GUEST_REQUEST API defined in the SEV-SNP firmware specification.
>
> The SNP_EXT_GUEST_REQUEST is similar to SNP_GUEST_REQUEST with the
> difference of an additional certificate blob that can be passed through
> the SNP_SET_CONFIG ioctl defined in the CCP driver. The CCP driver
> provides snp_guest_ext_guest_request() that is used by the KVM to get
> both the report and certificate data at once.
>
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> Signed-off-by: Michael Roth <michael.roth@amd.com>
> ---
>   arch/x86/kvm/svm/sev.c | 185 +++++++++++++++++++++++++++++++++++++++--
>   arch/x86/kvm/svm/svm.h |   2 +
>   2 files changed, 181 insertions(+), 6 deletions(-)
>
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 197b1f904567..92179614102e 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -327,6 +327,7 @@ static int sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp)
>                  if (ret)
>                          goto e_free;
>
> +               mutex_init(&sev->guest_req_lock);
>                  ret = sev_snp_init(&argp->error, false);
>          } else {
>                  ret = sev_platform_init(&argp->error);
> @@ -2059,23 +2060,34 @@ int sev_vm_move_enc_context_from(struct kvm *kvm, unsigned int source_fd)
>    */
>   static void *snp_context_create(struct kvm *kvm, struct kvm_sev_cmd *argp)
>   {
> +       struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
>          struct sev_data_snp_addr data = {};
> -       void *context;
> +       void *context, *certs_data;
>          int rc;
>
> +       /* Allocate memory used for the certs data in SNP guest request */
> +       certs_data = kzalloc(SEV_FW_BLOB_MAX_SIZE, GFP_KERNEL_ACCOUNT);
> +       if (!certs_data)
> +               return NULL;


I don't understand why this is part of the context creation, which in turn 
is part of the KVM_SEV_SNP_LAUNCH_START op. Would you mind creating a 
separate op for this and then checking, at the point the buffer is used, 
whether it was ever allocated?
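
Something along these lines, perhaps (a rough sketch only; the
sev->snp_certs_data field name and the lazy-allocation placement are
assumptions, not taken from this patch):

    /* in the guest-request path, before relying on the buffer: */
    if (!sev->snp_certs_data) {
            sev->snp_certs_data = kzalloc(SEV_FW_BLOB_MAX_SIZE, GFP_KERNEL_ACCOUNT);
            if (!sev->snp_certs_data)
                    return -ENOMEM;
    }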


Alex






^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH RFC v8 35/56] KVM: SVM: Add KVM_SEV_SNP_LAUNCH_UPDATE command
  2023-02-20 18:38 ` [PATCH RFC v8 35/56] KVM: SVM: Add KVM_SEV_SNP_LAUNCH_UPDATE command Michael Roth
@ 2023-02-24 11:55   ` Zhi Wang
  0 siblings, 0 replies; 147+ messages in thread
From: Zhi Wang @ 2023-02-24 11:55 UTC (permalink / raw)
  To: Michael Roth
  Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	Brijesh Singh

On Mon, 20 Feb 2023 12:38:26 -0600
Michael Roth <michael.roth@amd.com> wrote:

> From: Brijesh Singh <brijesh.singh@amd.com>
> 
> The KVM_SEV_SNP_LAUNCH_UPDATE command can be used to insert data into the
> guest's memory. The data is encrypted with the cryptographic context
> created with the KVM_SEV_SNP_LAUNCH_START.
> 
> In addition to inserting data, it can insert two special pages
> into the guest's memory: the secrets page and the CPUID page.
> 
> While terminating the guest, reclaim the guest pages added to the RMP
> table. If the reclaim fails, then the pages are no longer safe to be
> released back to the system, so leak them.
> 
> For more information see the SEV-SNP specification.
> 
> Co-developed-by: Michael Roth <michael.roth@amd.com>
> Signed-off-by: Michael Roth <michael.roth@amd.com>
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> ---
>  .../virt/kvm/x86/amd-memory-encryption.rst    |  29 +++
>  arch/x86/kvm/svm/sev.c                        | 190 ++++++++++++++++++
>  include/uapi/linux/kvm.h                      |  19 ++
>  3 files changed, 238 insertions(+)
> 
> diff --git a/Documentation/virt/kvm/x86/amd-memory-encryption.rst b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
> index 58971fc02a15..c94be8e6d657 100644
> --- a/Documentation/virt/kvm/x86/amd-memory-encryption.rst
> +++ b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
> @@ -485,6 +485,35 @@ Returns: 0 on success, -negative on error
>  
>  See the SEV-SNP specification for further detail on the launch input.
>  
> +20. KVM_SNP_LAUNCH_UPDATE
> +-------------------------
> +
> +The KVM_SNP_LAUNCH_UPDATE is used for encrypting a memory region. It also
> +calculates a measurement of the memory contents. The measurement is a signature
> +of the memory contents that can be sent to the guest owner as an attestation
> +that the memory was encrypted correctly by the firmware.
> +
> +Parameters (in): struct  kvm_snp_launch_update
> +
> +Returns: 0 on success, -negative on error
> +
> +::
> +
> +        struct kvm_sev_snp_launch_update {
> +                __u64 start_gfn;        /* Guest page number to start from. */
> +                __u64 uaddr;            /* userspace address need to be encrypted */
> +                __u32 len;              /* length of memory region */
> +                __u8 imi_page;          /* 1 if memory is part of the IMI */
> +                __u8 page_type;         /* page type */
> +                __u8 vmpl3_perms;       /* VMPL3 permission mask */
> +                __u8 vmpl2_perms;       /* VMPL2 permission mask */
> +                __u8 vmpl1_perms;       /* VMPL1 permission mask */
> +        };
> +
> +See the SEV-SNP spec for further details on how to build the VMPL permission
> +mask and page type.
> +
> +
>  References
>  ==========
>  
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 097bb2138360..03dd227f6090 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -234,6 +234,37 @@ static void sev_decommission(unsigned int handle)
>  	sev_guest_decommission(&decommission, NULL);
>  }
>  
> +static int snp_page_reclaim(u64 pfn)
> +{
> +	struct sev_data_snp_page_reclaim data = {0};
> +	int err, rc;
> +
> +	data.paddr = __sme_set(pfn << PAGE_SHIFT);
> +	rc = sev_do_cmd(SEV_CMD_SNP_PAGE_RECLAIM, &data, &err);
> +	if (rc) {
> +		/*
> +		 * If the reclaim failed, then page is no longer safe
> +		 * to use.
> +		 */
> +		snp_mark_pages_offline(pfn,
> +				       page_level_size(PG_LEVEL_4K) >> PAGE_SHIFT);
> +	}
> +
> +	return rc;
> +}
> +
> +static int host_rmp_make_shared(u64 pfn, enum pg_level level, bool leak)
> +{
> +	int rc;
> +
> +	rc = rmp_make_shared(pfn, level);
> +	if (rc && leak)
> +		snp_mark_pages_offline(pfn,
> +				       page_level_size(level) >> PAGE_SHIFT);
> +
> +	return rc;
> +}
> +

PATCH 24 has similar functions. It would be better to expose them.
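
For instance, the declarations could live in a shared header so both call
sites use the same helpers (a sketch; the exact header is an assumption):

    /* arch/x86/kvm/svm/svm.h, or a common SEV header */
    int snp_page_reclaim(u64 pfn);
    int host_rmp_make_shared(u64 pfn, enum pg_level level, bool leak);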

>  static void sev_unbind_asid(struct kvm *kvm, unsigned int handle)
>  {
>  	struct sev_data_deactivate deactivate;
> @@ -2093,6 +2124,162 @@ static int snp_launch_start(struct kvm *kvm, struct kvm_sev_cmd *argp)
>  	return rc;
>  }
>  
> +static int snp_launch_update_gfn_handler(struct kvm *kvm,
> +					 struct kvm_gfn_range *range,
> +					 void *opaque)
> +{
> +	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> +	struct kvm_memory_slot *memslot = range->slot;
> +	struct sev_data_snp_launch_update data = {0};
> +	struct kvm_sev_snp_launch_update params;
> +	struct kvm_sev_cmd *argp = opaque;
> +	int *error = &argp->error;
> +	int i, n = 0, ret = 0;
> +	unsigned long npages;
> +	kvm_pfn_t *pfns;
> +	gfn_t gfn;
> +
> +	if (!kvm_slot_can_be_private(memslot)) {
> +		pr_err("SEV-SNP requires restricted memory.\n");
> +		return -EINVAL;
> +	}
> +
> +	if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data, sizeof(params))) {
> +		pr_err("Failed to copy user parameters for SEV-SNP launch.\n");
> +		return -EFAULT;
> +	}
> +
> +	data.gctx_paddr = __psp_pa(sev->snp_context);
> +
> +	npages = range->end - range->start;
> +	pfns = kvmalloc_array(npages, sizeof(*pfns), GFP_KERNEL_ACCOUNT);
> +	if (!pfns)
> +		return -ENOMEM;
> +
> +	pr_debug("%s: GFN range 0x%llx-0x%llx, type %d\n", __func__,
> +		 range->start, range->end, params.page_type);
> +
> +	for (gfn = range->start, i = 0; gfn < range->end; gfn++, i++) {
> +		int order, level;
> +		void *kvaddr;
> +
> +		ret = kvm_restrictedmem_get_pfn(memslot, gfn, &pfns[i], &order);
> +		if (ret)
> +			goto e_release;
> +
> +		n++;
> +		ret = snp_lookup_rmpentry((u64)pfns[i], &level);
> +		if (ret) {
> +			pr_err("Failed to ensure GFN 0x%llx is in initial shared state, ret: %d\n",
> +			       gfn, ret);
> +			return -EFAULT;
> +		}
> +
> +		kvaddr = pfn_to_kaddr(pfns[i]);
> +		if (!virt_addr_valid(kvaddr)) {
> +			pr_err("Invalid HVA 0x%llx for GFN 0x%llx\n", (uint64_t)kvaddr, gfn);
> +			ret = -EINVAL;
> +			goto e_release;
> +		}
> +
> +		ret = kvm_read_guest_page(kvm, gfn, kvaddr, 0, PAGE_SIZE);
> +		if (ret) {
> +			pr_err("Guest read failed, ret: 0x%x\n", ret);
> +			goto e_release;
> +		}
> +
> +		ret = rmp_make_private(pfns[i], gfn << PAGE_SHIFT, PG_LEVEL_4K,
> +				       sev_get_asid(kvm), true);
> +		if (ret) {
> +			ret = -EFAULT;
> +			goto e_release;
> +		}
> +
> +		data.address = __sme_set(pfns[i] << PAGE_SHIFT);
> +		data.page_size = X86_TO_RMP_PG_LEVEL(PG_LEVEL_4K);
> +		data.page_type = params.page_type;
> +		data.vmpl3_perms = params.vmpl3_perms;
> +		data.vmpl2_perms = params.vmpl2_perms;
> +		data.vmpl1_perms = params.vmpl1_perms;
> +		ret = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_LAUNCH_UPDATE,
> +				      &data, error);
> +		if (ret) {
> +			pr_err("SEV-SNP launch update failed, ret: 0x%x, fw_error: 0x%x\n",
> +			       ret, *error);
> +			snp_page_reclaim(pfns[i]);
> +
> +			/*
> +			 * When invalid CPUID function entries are detected, the firmware
> +			 * corrects these entries for debugging purpose and leaves the
> +			 * page unencrypted so it can be provided users for debugging
> +			 * and error-reporting.
> +			 *
> +			 * Copy the corrected CPUID page back to shared memory so
> +			 * userpsace can retrieve this information.
> +			 */
> +			if (params.page_type == SNP_PAGE_TYPE_CPUID &&
> +			    *error == SEV_RET_INVALID_PARAM) {
> +				int ret;
> +
> +				host_rmp_make_shared(pfns[i], PG_LEVEL_4K, true);
> +
> +				ret = kvm_write_guest_page(kvm, gfn, kvaddr, 0, PAGE_SIZE);
> +				if (ret)
> +					pr_err("Failed to write CPUID page back to userspace, ret: 0x%x\n",
> +					       ret);
> +			}
> +
> +
> +			goto e_release;
> +		}
> +	}
> +
> +	/*
> +	 * Memory attribute updates via KVM_SET_MEMORY_ATTRIBUTES are serialized
> +	 * via kvm->slots_lock, so use the same protocol for updating them here.
> +	 */
> +	mutex_lock(&kvm->slots_lock);
> +	kvm_vm_set_region_attr(kvm, range->start, range->end, KVM_MEMORY_ATTRIBUTE_PRIVATE);
> +	mutex_unlock(&kvm->slots_lock);
> +
> +e_release:
> +	/* Content of memory is updated, mark pages dirty */
> +	for (i = 0; i < n; i++) {
> +		set_page_dirty(pfn_to_page(pfns[i]));
> +		mark_page_accessed(pfn_to_page(pfns[i]));
> +
> +		/*
> +		 * If its an error, then update RMP entry to change page ownership
> +		 * to the hypervisor.
> +		 */
> +		if (ret)
> +			host_rmp_make_shared(pfns[i], PG_LEVEL_4K, true);
> +
> +		put_page(pfn_to_page(pfns[i]));
> +	}
> +
> +	kvfree(pfns);
> +	return ret;
> +}
> +
> +static int snp_launch_update(struct kvm *kvm, struct kvm_sev_cmd *argp)
> +{
> +	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> +	struct kvm_sev_snp_launch_update params;
> +
> +	if (!sev_snp_guest(kvm))
> +		return -ENOTTY;
> +
> +	if (!sev->snp_context)
> +		return -EINVAL;
> +
> +	if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data, sizeof(params)))
> +		return -EFAULT;
> +
> +	return kvm_vm_do_hva_range_op(kvm, params.uaddr, params.uaddr + params.len,
> +				      snp_launch_update_gfn_handler, argp);
> +}
> +
>  int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
>  {
>  	struct kvm_sev_cmd sev_cmd;
> @@ -2186,6 +2373,9 @@ int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
>  	case KVM_SEV_SNP_LAUNCH_START:
>  		r = snp_launch_start(kvm, &sev_cmd);
>  		break;
> +	case KVM_SEV_SNP_LAUNCH_UPDATE:
> +		r = snp_launch_update(kvm, &sev_cmd);
> +		break;
>  	default:
>  		r = -EINVAL;
>  		goto out;
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index cf19799ca5ce..4098bba17aa4 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1920,6 +1920,7 @@ enum sev_cmd_id {
>  	/* SNP specific commands */
>  	KVM_SEV_SNP_INIT,
>  	KVM_SEV_SNP_LAUNCH_START,
> +	KVM_SEV_SNP_LAUNCH_UPDATE,
>  
>  	KVM_SEV_NR_MAX,
>  };
> @@ -2036,6 +2037,24 @@ struct kvm_sev_snp_launch_start {
>  	__u8 pad[6];
>  };
>  
> +#define KVM_SEV_SNP_PAGE_TYPE_NORMAL		0x1
> +#define KVM_SEV_SNP_PAGE_TYPE_VMSA		0x2
> +#define KVM_SEV_SNP_PAGE_TYPE_ZERO		0x3
> +#define KVM_SEV_SNP_PAGE_TYPE_UNMEASURED	0x4
> +#define KVM_SEV_SNP_PAGE_TYPE_SECRETS		0x5
> +#define KVM_SEV_SNP_PAGE_TYPE_CPUID		0x6
> +
> +struct kvm_sev_snp_launch_update {
> +	__u64 start_gfn;
> +	__u64 uaddr;
> +	__u32 len;
> +	__u8 imi_page;
> +	__u8 page_type;
> +	__u8 vmpl3_perms;
> +	__u8 vmpl2_perms;
> +	__u8 vmpl1_perms;
> +};
> +
>  #define KVM_DEV_ASSIGN_ENABLE_IOMMU	(1 << 0)
>  #define KVM_DEV_ASSIGN_PCI_2_3		(1 << 1)
>  #define KVM_DEV_ASSIGN_MASK_INTX	(1 << 2)


^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH RFC v8 47/56] KVM: SVM: Support SEV-SNP AP Creation NAE event
  2023-02-20 18:38 ` [PATCH RFC v8 47/56] KVM: SVM: Support SEV-SNP AP Creation NAE event Michael Roth
@ 2023-02-24 12:37   ` Alexander Graf
  2023-02-28 20:47     ` Zhi Wang
  2023-04-04 22:48     ` Michael Roth
  0 siblings, 2 replies; 147+ messages in thread
From: Alexander Graf @ 2023-02-24 12:37 UTC (permalink / raw)
  To: Michael Roth, kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	Brijesh Singh


On 20.02.23 19:38, Michael Roth wrote:
> From: Tom Lendacky <thomas.lendacky@amd.com>
>
> Add support for the SEV-SNP AP Creation NAE event. This allows SEV-SNP
> guests to alter the register state of the APs on their own. This allows
> the guest a way of simulating INIT-SIPI.
>
> A new event, KVM_REQ_UPDATE_PROTECTED_GUEST_STATE, is created and used
> so as to avoid updating the VMSA pointer while the vCPU is running.
>
> For CREATE
>    The guest supplies the GPA of the VMSA to be used for the vCPU with
>    the specified APIC ID. The GPA is saved in the svm struct of the
>    target vCPU, the KVM_REQ_UPDATE_PROTECTED_GUEST_STATE event is added
>    to the vCPU and then the vCPU is kicked.
>
> For CREATE_ON_INIT:
>    The guest supplies the GPA of the VMSA to be used for the vCPU with
>    the specified APIC ID the next time an INIT is performed. The GPA is
>    saved in the svm struct of the target vCPU.
>
> For DESTROY:
>    The guest indicates it wishes to stop the vCPU. The GPA is cleared
>    from the svm struct, the KVM_REQ_UPDATE_PROTECTED_GUEST_STATE event is
>    added to vCPU and then the vCPU is kicked.
>
> The KVM_REQ_UPDATE_PROTECTED_GUEST_STATE event handler will be invoked
> as a result of the event or as a result of an INIT. The handler sets the
> vCPU to the KVM_MP_STATE_UNINITIALIZED state, so that any errors will
> leave the vCPU as not runnable. Any previous VMSA pages that were
> installed as part of an SEV-SNP AP Creation NAE event are un-pinned. If
> a new VMSA is to be installed, the VMSA guest page is pinned and set as
> the VMSA in the vCPU VMCB and the vCPU state is set to
> KVM_MP_STATE_RUNNABLE. If a new VMSA is not to be installed, the VMSA is
> cleared in the vCPU VMCB and the vCPU state is left as
> KVM_MP_STATE_UNINITIALIZED to prevent it from being run.
>
> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> [mdr: add handling for restrictedmem]
> Signed-off-by: Michael Roth <michael.roth@amd.com>


What is the intended boot sequence for SEV-SNP guests? FWIW, with this 
interface in place, guests will typically use in-guest VMSA pages to 
hold secondary vcpu state. But that means we're now allocating 4kb of 
memory for every vcpu we create that will be superfluous for most of the 
guest's lifetime.

Wouldn't it make more sense to have a model where we only allocate the 
VMSA for the boot CPU and leave secondary allocation to the guest? We 
already need firmware changes for SEV-SNP - may as well make this one more.

[...]

> +
> +static int sev_snp_ap_creation(struct vcpu_svm *svm)
> +{
> +       struct kvm_sev_info *sev = &to_kvm_svm(svm->vcpu.kvm)->sev_info;
> +       struct kvm_vcpu *vcpu = &svm->vcpu;
> +       struct kvm_vcpu *target_vcpu;
> +       struct vcpu_svm *target_svm;
> +       unsigned int request;
> +       unsigned int apic_id;
> +       bool kick;
> +       int ret;
> +
> +       request = lower_32_bits(svm->vmcb->control.exit_info_1);
> +       apic_id = upper_32_bits(svm->vmcb->control.exit_info_1);
> +
> +       /* Validate the APIC ID */
> +       target_vcpu = kvm_get_vcpu_by_id(vcpu->kvm, apic_id);


Out of curiosity: The target CPU can be my own vCPU, right?


> +       if (!target_vcpu) {
> +               vcpu_unimpl(vcpu, "vmgexit: invalid AP APIC ID [%#x] from guest\n",
> +                           apic_id);
> +               return -EINVAL;
> +       }
> +
> +       ret = 0;
> +
> +       target_svm = to_svm(target_vcpu);
> +
> +       /*
> +        * The target vCPU is valid, so the vCPU will be kicked unless the
> +        * request is for CREATE_ON_INIT. For any errors at this stage, the
> +        * kick will place the vCPU in an non-runnable state.
> +        */
> +       kick = true;
> +
> +       mutex_lock(&target_svm->sev_es.snp_vmsa_mutex);
> +
> +       target_svm->sev_es.snp_vmsa_gpa = INVALID_PAGE;
> +       target_svm->sev_es.snp_ap_create = true;
> +
> +       /* Interrupt injection mode shouldn't change for AP creation */
> +       if (request < SVM_VMGEXIT_AP_DESTROY) {
> +               u64 sev_features;
> +
> +               sev_features = vcpu->arch.regs[VCPU_REGS_RAX];
> +               sev_features ^= sev->sev_features;
> +               if (sev_features & SVM_SEV_FEAT_INT_INJ_MODES) {
> +                       vcpu_unimpl(vcpu, "vmgexit: invalid AP injection mode [%#lx] from guest\n",
> +                                   vcpu->arch.regs[VCPU_REGS_RAX]);
> +                       ret = -EINVAL;
> +                       goto out;
> +               }
> +       }
> +
> +       switch (request) {
> +       case SVM_VMGEXIT_AP_CREATE_ON_INIT:
> +               kick = false;
> +               fallthrough;
> +       case SVM_VMGEXIT_AP_CREATE:
> +               if (!page_address_valid(vcpu, svm->vmcb->control.exit_info_2)) {
> +                       vcpu_unimpl(vcpu, "vmgexit: invalid AP VMSA address [%#llx] from guest\n",
> +                                   svm->vmcb->control.exit_info_2);
> +                       ret = -EINVAL;
> +                       goto out;
> +               }
> +
> +               /*
> +                * Malicious guest can RMPADJUST a large page into VMSA which
> +                * will hit the SNP erratum where the CPU will incorrectly signal
> +                * an RMP violation #PF if a hugepage collides with the RMP entry
> +                * of VMSA page, reject the AP CREATE request if VMSA address from
> +                * guest is 2M aligned.


This will break genuine current Linux kernels that just happen to 
allocate a 2M-aligned page for the VMSA, no? In fact, given enough vCPUs 
you're almost guaranteed to hit an aligned structure somewhere. What is 
the guest supposed to do in that situation?


> +                */
> +               if (IS_ALIGNED(svm->vmcb->control.exit_info_2, PMD_SIZE)) {
> +                       vcpu_unimpl(vcpu,
> +                                   "vmgexit: AP VMSA address [%llx] from guest is unsafe as it is 2M aligned\n",
> +                                   svm->vmcb->control.exit_info_2);
> +                       ret = -EINVAL;
> +                       goto out;
> +               }
> +
> +               target_svm->sev_es.snp_vmsa_gpa = svm->vmcb->control.exit_info_2;
> +               break;
> +       case SVM_VMGEXIT_AP_DESTROY:


I don't understand the destroy path. Why does this case destroy anything?


> +               break;
> +       default:
> +               vcpu_unimpl(vcpu, "vmgexit: invalid AP creation request [%#x] from guest\n",
> +                           request);
> +               ret = -EINVAL;
> +               break;
> +       }
> +
> +out:
> +       if (kick) {
> +               if (target_vcpu->arch.mp_state == KVM_MP_STATE_UNINITIALIZED)
> +                       target_vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;


What if the guest AP goes through a create -> destroy -> create cycle? 
Will it stay runnable while destroyed?


Alex

> +
> +               kvm_make_request(KVM_REQ_UPDATE_PROTECTED_GUEST_STATE, target_vcpu);
> +               kvm_vcpu_kick(target_vcpu);
> +       }
> +
> +       mutex_unlock(&target_svm->sev_es.snp_vmsa_mutex);
> +
> +       return ret;
> +}
> +
>   static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
>   {
>          struct vmcb_control_area *control = &svm->vmcb->control;




^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH RFC v8 41/56] KVM: SVM: Add support to handle MSR based Page State Change VMGEXIT
  2023-02-20 18:38 ` [PATCH RFC v8 41/56] KVM: SVM: Add support to handle MSR based Page State Change VMGEXIT Michael Roth
@ 2023-02-24 15:06   ` Zhi Wang
  0 siblings, 0 replies; 147+ messages in thread
From: Zhi Wang @ 2023-02-24 15:06 UTC (permalink / raw)
  To: Michael Roth
  Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	Brijesh Singh

On Mon, 20 Feb 2023 12:38:32 -0600
Michael Roth <michael.roth@amd.com> wrote:

> From: Brijesh Singh <brijesh.singh@amd.com>
> 
> SEV-SNP VMs can ask the hypervisor to change the page state in the RMP
> table to be private or shared using the Page State Change MSR protocol
> as defined in the GHCB specification.
> 
> Forward these requests to userspace via KVM_EXIT_VMGEXIT so the VMM can
> issue the KVM ioctls to update the page state accordingly.
> 

It would be better to describe the design rationale here. For example, why
should the page state change VMGEXIT be forwarded to userspace instead of
being handled in the kernel?

> Co-developed-by: Michael Roth <michael.roth@amd.com>
> Signed-off-by: Michael Roth <michael.roth@amd.com>
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> ---
>  arch/x86/include/asm/sev-common.h |  9 ++++++++
>  arch/x86/kvm/svm/sev.c            | 25 +++++++++++++++++++++++
>  arch/x86/kvm/trace.h              | 34 +++++++++++++++++++++++++++++++
>  arch/x86/kvm/x86.c                |  1 +
>  4 files changed, 69 insertions(+)
> 
> diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
> index 0a9055cdfae2..ee38f7408470 100644
> --- a/arch/x86/include/asm/sev-common.h
> +++ b/arch/x86/include/asm/sev-common.h
> @@ -93,6 +93,10 @@ enum psc_op {
>  };
>  
>  #define GHCB_MSR_PSC_REQ		0x014
> +#define GHCB_MSR_PSC_GFN_POS		12
> +#define GHCB_MSR_PSC_GFN_MASK		GENMASK_ULL(39, 0)
> +#define GHCB_MSR_PSC_OP_POS		52
> +#define GHCB_MSR_PSC_OP_MASK		0xf
>  #define GHCB_MSR_PSC_REQ_GFN(gfn, op)			\
>  	/* GHCBData[55:52] */				\
>  	(((u64)((op) & 0xf) << 52) |			\
> @@ -102,6 +106,11 @@ enum psc_op {
>  	GHCB_MSR_PSC_REQ)
>  
>  #define GHCB_MSR_PSC_RESP		0x015
> +#define GHCB_MSR_PSC_ERROR_POS		32
> +#define GHCB_MSR_PSC_ERROR_MASK		GENMASK_ULL(31, 0)
> +#define GHCB_MSR_PSC_ERROR		GENMASK_ULL(31, 0)
> +#define GHCB_MSR_PSC_RSVD_POS		12
> +#define GHCB_MSR_PSC_RSVD_MASK		GENMASK_ULL(19, 0)
>  #define GHCB_MSR_PSC_RESP_VAL(val)			\
>  	/* GHCBData[63:32] */				\
>  	(((u64)(val) & GENMASK_ULL(63, 32)) >> 32)
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 2613311f4fcc..a1a2686dde7b 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -30,6 +30,7 @@
>  #include "svm_ops.h"
>  #include "cpuid.h"
>  #include "trace.h"
> +#include "mmu.h"
>  
>  #ifndef CONFIG_KVM_AMD_SEV
>  /*
> @@ -3345,6 +3346,23 @@ static void set_ghcb_msr(struct vcpu_svm *svm, u64 value)
>  	svm->vmcb->control.ghcb_gpa = value;
>  }
>  
> +/*
> + * TODO: need to get the value set by userspace in vcpu->run->vmgexit.ghcb_msr
> + * and process that here accordingly.
> + */
> +static int snp_complete_psc_msr_protocol(struct kvm_vcpu *vcpu)
> +{
> +	struct vcpu_svm *svm = to_svm(vcpu);
> +
> +	set_ghcb_msr_bits(svm, 0,
> +			  GHCB_MSR_PSC_ERROR_MASK, GHCB_MSR_PSC_ERROR_POS);
> +
> +	set_ghcb_msr_bits(svm, 0, GHCB_MSR_PSC_RSVD_MASK, GHCB_MSR_PSC_RSVD_POS);
> +	set_ghcb_msr_bits(svm, GHCB_MSR_PSC_RESP, GHCB_MSR_INFO_MASK, GHCB_MSR_INFO_POS);
> +
> +	return 1; /* resume */
> +}
> +
>  static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
>  {
>  	struct vmcb_control_area *control = &svm->vmcb->control;
> @@ -3445,6 +3463,13 @@ static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
>  				  GHCB_MSR_INFO_POS);
>  		break;
>  	}
> +	case GHCB_MSR_PSC_REQ:
> +		vcpu->run->exit_reason = KVM_EXIT_VMGEXIT;
> +		vcpu->run->vmgexit.ghcb_msr = control->ghcb_gpa;
> +		vcpu->arch.complete_userspace_io = snp_complete_psc_msr_protocol;
> +
> +		ret = -1;
> +		break;
>  	case GHCB_MSR_TERM_REQ: {
>  		u64 reason_set, reason_code;
>  
> diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
> index 83843379813e..65861d2d086c 100644
> --- a/arch/x86/kvm/trace.h
> +++ b/arch/x86/kvm/trace.h
> @@ -7,6 +7,7 @@
>  #include <asm/svm.h>
>  #include <asm/clocksource.h>
>  #include <asm/pvclock-abi.h>
> +#include <asm/sev-common.h>
>  
>  #undef TRACE_SYSTEM
>  #define TRACE_SYSTEM kvm
> @@ -1831,6 +1832,39 @@ TRACE_EVENT(kvm_vmgexit_msr_protocol_exit,
>  		  __entry->vcpu_id, __entry->ghcb_gpa, __entry->result)
>  );
>  
> +/*
> + * Tracepoint for the SEV-SNP page state change processing
> + */
> +#define psc_operation					\
> +	{SNP_PAGE_STATE_PRIVATE, "private"},		\
> +	{SNP_PAGE_STATE_SHARED,  "shared"}		\
> +
> +TRACE_EVENT(kvm_snp_psc,
> +	TP_PROTO(unsigned int vcpu_id, u64 pfn, u64 gpa, u8 op, int level),
> +	TP_ARGS(vcpu_id, pfn, gpa, op, level),
> +
> +	TP_STRUCT__entry(
> +		__field(int, vcpu_id)
> +		__field(u64, pfn)
> +		__field(u64, gpa)
> +		__field(u8, op)
> +		__field(int, level)
> +	),
> +
> +	TP_fast_assign(
> +		__entry->vcpu_id = vcpu_id;
> +		__entry->pfn = pfn;
> +		__entry->gpa = gpa;
> +		__entry->op = op;
> +		__entry->level = level;
> +	),
> +
> +	TP_printk("vcpu %u, pfn %llx, gpa %llx, op %s, level %d",
> +		  __entry->vcpu_id, __entry->pfn, __entry->gpa,
> +		  __print_symbolic(__entry->op, psc_operation),
> +		  __entry->level)
> +);
> +
>  #endif /* _TRACE_KVM_H */
>  
>  #undef TRACE_INCLUDE_PATH
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 268c3d16894d..0154fc7a28c1 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -13515,6 +13515,7 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_vmgexit_enter);
>  EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_vmgexit_exit);
>  EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_vmgexit_msr_protocol_enter);
>  EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_vmgexit_msr_protocol_exit);
> +EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_snp_psc);
>  
>  static int __init kvm_x86_init(void)
>  {


^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH RFC v8 34/56] KVM: SVM: Add KVM_SEV_SNP_LAUNCH_START command
  2023-02-23 21:41   ` Zhi Wang
@ 2023-02-24 16:22     ` Tom Lendacky
  0 siblings, 0 replies; 147+ messages in thread
From: Tom Lendacky @ 2023-02-24 16:22 UTC (permalink / raw)
  To: Zhi Wang, Michael Roth
  Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, hpa, ardb, pbonzini, seanjc, vkuznets, jmattson,
	luto, dave.hansen, slp, pgonda, peterz, srinivas.pandruvada,
	rientjes, dovmurik, tobin, bp, vbabka, kirill, ak, tony.luck,
	marcorr, sathyanarayanan.kuppuswamy, alpergun, dgilbert, jarkko,
	ashish.kalra, nikunj.dadhania, Brijesh Singh

On 2/23/23 15:41, Zhi Wang wrote:
> On Mon, 20 Feb 2023 12:38:25 -0600
> Michael Roth <michael.roth@amd.com> wrote:
> 
>> From: Brijesh Singh <brijesh.singh@amd.com>
>>
>> KVM_SEV_SNP_LAUNCH_START begins the launch process for an SEV-SNP guest.
>> The command initializes a cryptographic digest context used to construct
>> the measurement of the guest. If the guest is expected to be migrated,
>> the command also binds a migration agent (MA) to the guest.
>>
>> For more information see the SEV-SNP specification.
>>
>> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
>> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
>> Signed-off-by: Michael Roth <michael.roth@amd.com>
>> ---
>>   .../virt/kvm/x86/amd-memory-encryption.rst    |  24 ++++
>>   arch/x86/kvm/svm/sev.c                        | 121 +++++++++++++++++-
>>   arch/x86/kvm/svm/svm.h                        |   1 +
>>   include/uapi/linux/kvm.h                      |  10 ++
>>   4 files changed, 153 insertions(+), 3 deletions(-)
>>
>> diff --git a/Documentation/virt/kvm/x86/amd-memory-encryption.rst b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
>> index 2432213bd0ea..58971fc02a15 100644
>> --- a/Documentation/virt/kvm/x86/amd-memory-encryption.rst
>> +++ b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
>> @@ -461,6 +461,30 @@ The flags bitmap is defined as::
>>   If the specified flags is not supported then return -EOPNOTSUPP, and the supported
>>   flags are returned.
>>   
>> +19. KVM_SNP_LAUNCH_START
>> +------------------------
>> +
>> +The KVM_SNP_LAUNCH_START command is used for creating the memory encryption
>> +context for the SEV-SNP guest. To create the encryption context, user must
>> +provide a guest policy, migration agent (if any) and guest OS visible
>> +workarounds value as defined SEV-SNP specification.
>> +
>> +Parameters (in): struct  kvm_snp_launch_start
>> +
>> +Returns: 0 on success, -negative on error
>> +
>> +::
>> +
>> +        struct kvm_sev_snp_launch_start {
>> +                __u64 policy;           /* Guest policy to use. */
>> +                __u64 ma_uaddr;         /* userspace address of migration agent */
>> +                __u8 ma_en;             /* 1 if the migration agent is enabled */
>> +                __u8 imi_en;            /* set IMI to 1. */
>> +                __u8 gosvw[16];         /* guest OS visible workarounds */
>> +        };
>> +
>> +See the SEV-SNP specification for further detail on the launch input.
>> +
>>   References
>>   ==========
>>   
>> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
>> index a8efe1f6bf77..097bb2138360 100644
>> --- a/arch/x86/kvm/svm/sev.c
>> +++ b/arch/x86/kvm/svm/sev.c
>> @@ -22,6 +22,7 @@
>>   #include <asm/pkru.h>
>>   #include <asm/trapnr.h>
>>   #include <asm/fpu/xcr.h>
>> +#include <asm/sev.h>
>>   
>>   #include "mmu.h"
>>   #include "x86.h"
>> @@ -75,6 +76,8 @@ static unsigned int nr_asids;
>>   static unsigned long *sev_asid_bitmap;
>>   static unsigned long *sev_reclaim_asid_bitmap;
>>   
>> +static int snp_decommission_context(struct kvm *kvm);
>> +
>>   struct enc_region {
>>   	struct list_head list;
>>   	unsigned long npages;
>> @@ -100,12 +103,17 @@ static int sev_flush_asids(int min_asid, int max_asid)
>>   	down_write(&sev_deactivate_lock);
>>   
>>   	wbinvd_on_all_cpus();
>> -	ret = sev_guest_df_flush(&error);
>> +
>> +	if (sev_snp_enabled)
>> +		ret = sev_do_cmd(SEV_CMD_SNP_DF_FLUSH, NULL, &error);
>> +	else
>> +		ret = sev_guest_df_flush(&error);
>>   
>>   	up_write(&sev_deactivate_lock);
>>   
>>   	if (ret)
>> -		pr_err("SEV: DF_FLUSH failed, ret=%d, error=%#x\n", ret, error);
>> +		pr_err("SEV%s: DF_FLUSH failed, ret=%d, error=%#x\n",
>> +		       sev_snp_enabled ? "-SNP" : "", ret, error);
>>   
>>   	return ret;
>>   }
>> @@ -2011,6 +2019,80 @@ int sev_vm_move_enc_context_from(struct kvm *kvm, unsigned int source_fd)
>>   	return ret;
>>   }
>>   
>> +/*
>> + * The guest context contains all the information, keys and metadata
>> + * associated with the guest that the firmware tracks to implement SEV
>> + * and SNP features. The firmware stores the guest context in hypervisor
>> + * provide page via the SNP_GCTX_CREATE command.
>> + */
>> +static void *snp_context_create(struct kvm *kvm, struct kvm_sev_cmd *argp)
>> +{
>> +	struct sev_data_snp_addr data = {};
>> +	void *context;
>> +	int rc;
>> +
>> +	/* Allocate memory for context page */
>> +	context = snp_alloc_firmware_page(GFP_KERNEL_ACCOUNT);
>> +	if (!context)
>> +		return NULL;
>> +
>> +	data.gctx_paddr = __psp_pa(context);
>> +	rc = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_GCTX_CREATE, &data, &argp->error);
>> +	if (rc) {
>> +		snp_free_firmware_page(context);
>> +		return NULL;
>> +	}
>> +
>> +	return context;
>> +}
>> +
>> +static int snp_bind_asid(struct kvm *kvm, int *error)
>> +{
>> +	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
>> +	struct sev_data_snp_activate data = {0};
>> +
>> +	data.gctx_paddr = __psp_pa(sev->snp_context);
>> +	data.asid   = sev_get_asid(kvm);
>> +	return sev_issue_cmd(kvm, SEV_CMD_SNP_ACTIVATE, &data, error);
> 
> According to the SNP ABI specification[1] 8.10 SNP_ACTIVATE:
> 
> "The firmware checks that a DF_FLUSH is not required. If a DF_FLUSH is
> required, the firmware returns DFFLUSH_REQUIRED. Note that all ASIDs are
> marked to require a DF_FLUSH at reset."
> 
> Do we need a SNP_DF_FLUSH here before calling SNP_ACTIVATE or handle the
> situation if the PSP firmware returns DFFLUSH_REQUIRED?
> 
> [1] https://www.amd.com/system/files/TechDocs/56860.pdf

This is related to ASID use. An initial DF_FLUSH is done which allows any 
SNP ASID to be used once without requiring a DF_FLUSH. Once an ASID has 
been used, it cannot be re-used until a DF_FLUSH is performed. The ASID 
recycling code takes care of that.
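
Roughly, the flow being referred to looks like this (simplified from
sev_asid_new()/__sev_recycle_asids() in sev.c; error handling elided):

    again:
            asid = find_next_zero_bit(sev_asid_bitmap, max_asid + 1, min_asid);
            if (asid > max_asid) {
                    if (retry && __sev_recycle_asids(min_asid, max_asid)) {
                            /* the recycle path issues the DF_FLUSH before
                             * returning reclaimed ASIDs to the free bitmap
                             */
                            retry = false;
                            goto again;
                    }
                    return -EBUSY;
            }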

Thanks,
Tom

> 
>> +}
>> +
>> +static int snp_launch_start(struct kvm *kvm, struct kvm_sev_cmd *argp)
>> +{
>> +	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
>> +	struct sev_data_snp_launch_start start = {0};
>> +	struct kvm_sev_snp_launch_start params;
>> +	int rc;
>> +
>> +	if (!sev_snp_guest(kvm))
>> +		return -ENOTTY;
>> +
>> +	if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data, sizeof(params)))
>> +		return -EFAULT;
>> +
>> +	sev->snp_context = snp_context_create(kvm, argp);
>> +	if (!sev->snp_context)
>> +		return -ENOTTY;
>> +
>> +	start.gctx_paddr = __psp_pa(sev->snp_context);
>> +	start.policy = params.policy;
>> +	memcpy(start.gosvw, params.gosvw, sizeof(params.gosvw));
>> +	rc = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_LAUNCH_START, &start, &argp->error);
>> +	if (rc)
>> +		goto e_free_context;
>> +
>> +	sev->fd = argp->sev_fd;
>> +	rc = snp_bind_asid(kvm, &argp->error);
>> +	if (rc)
>> +		goto e_free_context;
>> +
>> +	return 0;
>> +
>> +e_free_context:
>> +	snp_decommission_context(kvm);
>> +
>> +	return rc;
>> +}
>> +
>>   int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
>>   {
>>   	struct kvm_sev_cmd sev_cmd;
>> @@ -2101,6 +2183,9 @@ int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
>>   	case KVM_SEV_RECEIVE_FINISH:
>>   		r = sev_receive_finish(kvm, &sev_cmd);
>>   		break;
>> +	case KVM_SEV_SNP_LAUNCH_START:
>> +		r = snp_launch_start(kvm, &sev_cmd);
>> +		break;
>>   	default:
>>   		r = -EINVAL;
>>   		goto out;
>> @@ -2292,6 +2377,28 @@ int sev_vm_copy_enc_context_from(struct kvm *kvm, unsigned int source_fd)
>>   	return ret;
>>   }
>>   
>> +static int snp_decommission_context(struct kvm *kvm)
>> +{
>> +	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
>> +	struct sev_data_snp_addr data = {};
>> +	int ret;
>> +
>> +	/* If context is not created then do nothing */
>> +	if (!sev->snp_context)
>> +		return 0;
>> +
>> +	data.gctx_paddr = __sme_pa(sev->snp_context);
>> +	ret = sev_do_cmd(SEV_CMD_SNP_DECOMMISSION, &data, NULL);
>> +	if (WARN_ONCE(ret, "failed to release guest context"))
>> +		return ret;
>> +
>> +	/* free the context page now */
>> +	snp_free_firmware_page(sev->snp_context);
>> +	sev->snp_context = NULL;
>> +
>> +	return 0;
>> +}
>> +
>>   void sev_vm_destroy(struct kvm *kvm)
>>   {
>>   	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
>> @@ -2333,7 +2440,15 @@ void sev_vm_destroy(struct kvm *kvm)
>>   		}
>>   	}
>>   
>> -	sev_unbind_asid(kvm, sev->handle);
>> +	if (sev_snp_guest(kvm)) {
>> +		if (snp_decommission_context(kvm)) {
>> +			WARN_ONCE(1, "Failed to free SNP guest context, leaking asid!\n");
>> +			return;
>> +		}
>> +	} else {
>> +		sev_unbind_asid(kvm, sev->handle);
>> +	}
>> +
>>   	sev_asid_free(sev);
>>   }
>>   
>> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
>> index 56a5c96d8a36..740969b57425 100644
>> --- a/arch/x86/kvm/svm/svm.h
>> +++ b/arch/x86/kvm/svm/svm.h
>> @@ -92,6 +92,7 @@ struct kvm_sev_info {
>>   	struct misc_cg *misc_cg; /* For misc cgroup accounting */
>>   	atomic_t migration_in_progress;
>>   	u64 snp_init_flags;
>> +	void *snp_context;      /* SNP guest context page */
>>   };
>>   
>>   struct kvm_svm {
>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
>> index 499cc323f793..cf19799ca5ce 100644
>> --- a/include/uapi/linux/kvm.h
>> +++ b/include/uapi/linux/kvm.h
>> @@ -1919,6 +1919,7 @@ enum sev_cmd_id {
>>   
>>   	/* SNP specific commands */
>>   	KVM_SEV_SNP_INIT,
>> +	KVM_SEV_SNP_LAUNCH_START,
>>   
>>   	KVM_SEV_NR_MAX,
>>   };
>> @@ -2026,6 +2027,15 @@ struct kvm_snp_init {
>>   	__u64 flags;
>>   };
>>   
>> +struct kvm_sev_snp_launch_start {
>> +	__u64 policy;
>> +	__u64 ma_uaddr;
>> +	__u8 ma_en;
>> +	__u8 imi_en;
>> +	__u8 gosvw[16];
>> +	__u8 pad[6];
>> +};
>> +
>>   #define KVM_DEV_ASSIGN_ENABLE_IOMMU	(1 << 0)
>>   #define KVM_DEV_ASSIGN_PCI_2_3		(1 << 1)
>>   #define KVM_DEV_ASSIGN_MASK_INTX	(1 << 2)
> 

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH RFC v8 19/56] x86/fault: Return pfn from dump_pagetable() for SEV-specific fault handling.
  2023-02-20 18:38 ` [PATCH RFC v8 19/56] x86/fault: Return pfn from dump_pagetable() for SEV-specific fault handling Michael Roth
  2023-02-20 21:13   ` Zhi Wang
@ 2023-02-28 10:53   ` Wu Zongyong
  1 sibling, 0 replies; 147+ messages in thread
From: Wu Zongyong @ 2023-02-28 10:53 UTC (permalink / raw)
  To: Michael Roth
  Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	wutu.xq2

On Mon, Feb 20, 2023 at 12:38:10PM -0600, Michael Roth wrote:
> From: Ashish Kalra <ashish.kalra@amd.com>
> 
> Return pfn from dump_pagetable() to do SEV-specific
> fault handling. This is used for handling SNP RMP page faults.
> 
> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> Signed-off-by: Michael Roth <michael.roth@amd.com>
> ---
>  arch/x86/mm/fault.c | 15 +++++++++++----
>  1 file changed, 11 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> index afd4cde17001..f2b16dcfbd9a 100644
> --- a/arch/x86/mm/fault.c
> +++ b/arch/x86/mm/fault.c
> @@ -311,7 +311,7 @@ static bool low_pfn(unsigned long pfn)
>  	return pfn < max_low_pfn;
>  }
>  
> -static void dump_pagetable(unsigned long address)
> +static unsigned long dump_pagetable(unsigned long address)
>  {
>  	pgd_t *base = __va(read_cr3_pa());
>  	pgd_t *pgd = &base[pgd_index(address)];
> @@ -345,8 +345,10 @@ static void dump_pagetable(unsigned long address)
>  
>  	pte = pte_offset_kernel(pmd, address);
>  	pr_cont("*pte = %0*Lx ", sizeof(*pte) * 2, (u64)pte_val(*pte));
> +	return 0;
>  out:
>  	pr_cont("\n");
> +	return 0;
>  }
>  
>  #else /* CONFIG_X86_64: */
> @@ -367,10 +369,11 @@ static int bad_address(void *p)
>  	return get_kernel_nofault(dummy, (unsigned long *)p);
>  }
>  
> -static void dump_pagetable(unsigned long address)
> +static unsigned long dump_pagetable(unsigned long address)
>  {
>  	pgd_t *base = __va(read_cr3_pa());
>  	pgd_t *pgd = base + pgd_index(address);
> +	unsigned long pfn;

pfn should be initialized; otherwise this function may return an
uninitialized value.
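
A minimal sketch of the fix being suggested (the exact sentinel value is a
design choice):

    unsigned long pfn = 0;	/* avoid returning garbage on the early "goto out" paths */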

>  	p4d_t *p4d;
>  	pud_t *pud;
>  	pmd_t *pmd;
> @@ -388,6 +391,7 @@ static void dump_pagetable(unsigned long address)
>  	if (bad_address(p4d))
>  		goto bad;
>  
> +	pfn = p4d_pfn(*p4d);
>  	pr_cont("P4D %lx ", p4d_val(*p4d));
>  	if (!p4d_present(*p4d) || p4d_large(*p4d))
>  		goto out;
> @@ -396,6 +400,7 @@ static void dump_pagetable(unsigned long address)
>  	if (bad_address(pud))
>  		goto bad;
>  
> +	pfn = pud_pfn(*pud);
>  	pr_cont("PUD %lx ", pud_val(*pud));
>  	if (!pud_present(*pud) || pud_large(*pud))
>  		goto out;
> @@ -404,6 +409,7 @@ static void dump_pagetable(unsigned long address)
>  	if (bad_address(pmd))
>  		goto bad;
>  
> +	pfn = pmd_pfn(*pmd);
>  	pr_cont("PMD %lx ", pmd_val(*pmd));
>  	if (!pmd_present(*pmd) || pmd_large(*pmd))
>  		goto out;
> @@ -412,13 +418,14 @@ static void dump_pagetable(unsigned long address)
>  	if (bad_address(pte))
>  		goto bad;
>  
> +	pfn = pte_pfn(*pte);
>  	pr_cont("PTE %lx", pte_val(*pte));
>  out:
>  	pr_cont("\n");
> -
> -	return;
> +	return pfn;
>  bad:
>  	pr_info("BAD\n");
> +	return -1;
>  }
>  
>  #endif /* CONFIG_X86_64 */
> -- 
> 2.25.1
> 
> 
> 
> 

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH RFC v8 44/56] KVM: SVM: Add support to handle the RMP nested page fault
  2023-02-20 18:38 ` [PATCH RFC v8 44/56] KVM: SVM: Add support to handle the RMP nested page fault Michael Roth
@ 2023-02-28 19:11   ` Zhi Wang
  0 siblings, 0 replies; 147+ messages in thread
From: Zhi Wang @ 2023-02-28 19:11 UTC (permalink / raw)
  To: Michael Roth
  Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	Brijesh Singh

On Mon, 20 Feb 2023 12:38:35 -0600
Michael Roth <michael.roth@amd.com> wrote:

> From: Brijesh Singh <brijesh.singh@amd.com>
> 
> When SEV-SNP is enabled in the guest, the hardware places restrictions
> on all memory accesses based on the contents of the RMP table. When
> hardware encounters RMP check failure caused by the guest memory access
> it raises the #NPF. The error code contains additional information on
> the access type. See the APM volume 2 for additional information.
> 
> Page state changes are handled by userspace, so if an RMP fault is
> triggered as a result of an RMP NPT fault, exit to userspace just like
> with explicit page-state change requests.
> 
> RMP NPT faults can also occur if the guest pvalidates a 2M page as 4K,
> in which case the RMP entries need to be PSMASH'd. Handle this case
> immediately in the kernel.
> 
> Co-developed-by: Michael Roth <michael.roth@amd.com>
> Signed-off-by: Michael Roth <michael.roth@amd.com>
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> ---
>  arch/x86/kvm/svm/sev.c | 84 ++++++++++++++++++++++++++++++++++++++++++
>  arch/x86/kvm/svm/svm.c | 21 +++++++++--
>  arch/x86/kvm/svm/svm.h |  1 +
>  3 files changed, 102 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 102966c43e28..197b1f904567 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -3347,6 +3347,13 @@ static void set_ghcb_msr(struct vcpu_svm *svm, u64 value)
>  	svm->vmcb->control.ghcb_gpa = value;
>  }
>  
> +static int snp_rmptable_psmash(struct kvm *kvm, kvm_pfn_t pfn)
> +{
> +	pfn = pfn & ~(KVM_PAGES_PER_HPAGE(PG_LEVEL_2M) - 1);
> +
> +	return psmash(pfn);
> +}
> +
>  /*
>   * TODO: need to get the value set by userspace in vcpu->run->vmgexit.ghcb_msr
>   * and process that here accordingly.
> @@ -3872,3 +3879,80 @@ void sev_adjust_mapping_level(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn, int *le
>  	pr_debug("%s: GFN: 0x%llx, PFN: 0x%llx, level: %d, rmp_level: %d, level_orig: %d, assigned: %d\n",
>  		 __func__, gfn, pfn, *level, rmp_level, level_orig, assigned);
>  }
> +
> +void handle_rmp_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code)
> +{
> +	int order, rmp_level, assigned, ret;
> +	struct kvm_memory_slot *slot;
> +	struct kvm *kvm = vcpu->kvm;
> +	kvm_pfn_t pfn;
> +	gfn_t gfn;
> +
> +	/*
> +	 * Private memslots punt handling of implicit page state changes to
                             ^put
> +	 * userspace, so the only RMP faults expected here for
> +	 * PFERR_GUEST_SIZEM_MASK. Anything else suggests that the RMP table has
> +	 * gotten out of sync with the private memslot.
> +	 *
> +	 * TODO: However, this case has also been noticed when an access occurs
> +	 * to an NPT mapping that has just been split/PSMASHED, in which case
> +	 * PFERR_GUEST_SIZEM_MASK might not be set. In those cases it should be
> +	 * safe to ignore and let the guest retry, but log these just in case
> +	 * for now.
> +	 */
> +	if (!(error_code & PFERR_GUEST_SIZEM_MASK)) {
> +		pr_warn_ratelimited("Unexpected RMP fault for GPA 0x%llx, error_code 0x%llx",
> +				    gpa, error_code);
> +		return;
> +	}
> +
> +	gfn = gpa >> PAGE_SHIFT;
> +
> +	/*
> +	 * Only RMPADJUST/PVALIDATE should cause PFERR_GUEST_SIZEM.
> +	 *
> +	 * For PVALIDATE, this should only happen if a guest PVALIDATEs a 4K GFN
> +	 * that is backed by a huge page in the host whose RMP entry has the
> +	 * hugepage/assigned bits set. With UPM, that should only ever happen
> +	 * for private pages.
> +	 *
> +	 * For RMPADJUST, this assumption might not hold, in which case handling
> +	 * for obtaining the PFN from HVA-backed memory may be needed. For now,
> +	 * just print warnings.
> +	 */
> +	if (!kvm_mem_is_private(kvm, gfn)) {
> +		pr_warn_ratelimited("Unexpected RMP fault, size-mismatch for non-private GPA 0x%llx\n",
> +				    gpa);
> +		return;
> +	}
> +
> +	slot = gfn_to_memslot(kvm, gfn);
> +	if (!kvm_slot_can_be_private(slot)) {
> +		pr_warn_ratelimited("Unexpected RMP fault, non-private slot for GPA 0x%llx\n",
> +				    gpa);
> +		return;
> +	}
> +
> +	ret = kvm_restrictedmem_get_pfn(slot, gfn, &pfn, &order);
> +	if (ret) {
> +		pr_warn_ratelimited("Unexpected RMP fault, no private backing page for GPA 0x%llx\n",
> +				    gpa);
> +		return;
> +	}
> +
> +	assigned = snp_lookup_rmpentry(pfn, &rmp_level);
> +	if (assigned != 1) {
> +		pr_warn_ratelimited("Unexpected RMP fault, no assigned RMP entry for GPA 0x%llx\n",
> +				    gpa);
> +		goto out;
> +	}
> +
> +	ret = snp_rmptable_psmash(kvm, pfn);
> +	if (ret)
> +		pr_err_ratelimited("Unable to split RMP entries for GPA 0x%llx PFN 0x%llx ret %d\n",
> +				   gpa, pfn, ret);
> +
> +out:
> +	kvm_zap_gfn_range(kvm, gfn, gfn + PTRS_PER_PMD);
> +	put_page(pfn_to_page(pfn));
> +}
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 9eb750c8b04c..f9ab4bf6d245 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -1976,15 +1976,28 @@ static int pf_interception(struct kvm_vcpu *vcpu)
>  static int npf_interception(struct kvm_vcpu *vcpu)
>  {
>  	struct vcpu_svm *svm = to_svm(vcpu);
> +	int rc;
>  
>  	u64 fault_address = svm->vmcb->control.exit_info_2;
>  	u64 error_code = svm->vmcb->control.exit_info_1;
>  
>  	trace_kvm_page_fault(vcpu, fault_address, error_code);
> -	return kvm_mmu_page_fault(vcpu, fault_address, error_code,
> -			static_cpu_has(X86_FEATURE_DECODEASSISTS) ?
> -			svm->vmcb->control.insn_bytes : NULL,
> -			svm->vmcb->control.insn_len);
> +	rc = kvm_mmu_page_fault(vcpu, fault_address, error_code,
> +				static_cpu_has(X86_FEATURE_DECODEASSISTS) ?
> +				svm->vmcb->control.insn_bytes : NULL,
> +				svm->vmcb->control.insn_len);
> +
> +	/*
> +	 * rc == 0 indicates a userspace exit is needed to handle page
> +	 * transitions, so do that first before updating the RMP table.
> +	 */
> +	if (error_code & PFERR_GUEST_RMP_MASK) {
> +		if (rc == 0)
> +			return rc;
> +		handle_rmp_page_fault(vcpu, fault_address, error_code);
> +	}
> +
> +	return rc;
>  }
>  
>  static int db_interception(struct kvm_vcpu *vcpu)
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index 0c655a4d32d5..13b00233b315 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -714,6 +714,7 @@ void sev_es_prepare_switch_to_guest(struct sev_es_save_area *hostsa);
>  void sev_es_unmap_ghcb(struct vcpu_svm *svm);
>  struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu);
>  void sev_adjust_mapping_level(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn, int *level);
> +void handle_rmp_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code);
>  
>  /* vmenter.S */
>  


^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH RFC v8 45/56] KVM: SVM: Provide support for SNP_GUEST_REQUEST NAE event
  2023-02-20 18:38 ` [PATCH RFC v8 45/56] KVM: SVM: Provide support for SNP_GUEST_REQUEST NAE event Michael Roth
  2023-02-24 11:01   ` Alexander Graf
@ 2023-02-28 19:34   ` Zhi Wang
  2023-04-17 13:05   ` Alexander Graf
  2 siblings, 0 replies; 147+ messages in thread
From: Zhi Wang @ 2023-02-28 19:34 UTC (permalink / raw)
  To: Michael Roth
  Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	Brijesh Singh

On Mon, 20 Feb 2023 12:38:36 -0600
Michael Roth <michael.roth@amd.com> wrote:

> From: Brijesh Singh <brijesh.singh@amd.com>
> 
> Version 2 of the GHCB specification added support for two SNP Guest
> Request Message NAE events. These events allow an SEV-SNP guest to
> make requests to the SEV-SNP firmware through the hypervisor using the
> SNP_GUEST_REQUEST API defined in the SEV-SNP firmware specification.
> 
> SNP_EXT_GUEST_REQUEST is similar to SNP_GUEST_REQUEST, with the
> difference of an additional certificate blob that can be passed through
> the SNP_SET_CONFIG ioctl defined in the CCP driver. The CCP driver
> provides snp_guest_ext_guest_request(), which is used by KVM to get
> both the report and certificate data at once.
> 
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> Signed-off-by: Michael Roth <michael.roth@amd.com>
> ---
>  arch/x86/kvm/svm/sev.c | 185 +++++++++++++++++++++++++++++++++++++++--
>  arch/x86/kvm/svm/svm.h |   2 +
>  2 files changed, 181 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 197b1f904567..92179614102e 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -327,6 +327,7 @@ static int sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp)
>  		if (ret)
>  			goto e_free;
>  
> +		mutex_init(&sev->guest_req_lock);
>  		ret = sev_snp_init(&argp->error, false);
>  	} else {
>  		ret = sev_platform_init(&argp->error);
> @@ -2059,23 +2060,34 @@ int sev_vm_move_enc_context_from(struct kvm *kvm, unsigned int source_fd)
>   */
>  static void *snp_context_create(struct kvm *kvm, struct kvm_sev_cmd *argp)
>  {
> +	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
>  	struct sev_data_snp_addr data = {};
> -	void *context;
> +	void *context, *certs_data;
>  	int rc;
>  
> +	/* Allocate memory used for the certs data in SNP guest request */
> +	certs_data = kzalloc(SEV_FW_BLOB_MAX_SIZE, GFP_KERNEL_ACCOUNT);
> +	if (!certs_data)
> +		return NULL;
> +
>  	/* Allocate memory for context page */
>  	context = snp_alloc_firmware_page(GFP_KERNEL_ACCOUNT);
>  	if (!context)
> -		return NULL;
> +		goto e_free;
>  
>  	data.gctx_paddr = __psp_pa(context);
>  	rc = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_GCTX_CREATE, &data, &argp->error);
> -	if (rc) {
> -		snp_free_firmware_page(context);
> -		return NULL;
> -	}
> +	if (rc)
> +		goto e_free;
> +
> +	sev->snp_certs_data = certs_data;
>  
>  	return context;
> +
> +e_free:
> +	snp_free_firmware_page(context);
> +	kfree(certs_data);
> +	return NULL;
>  }
>  
>  static int snp_bind_asid(struct kvm *kvm, int *error)
> @@ -2693,6 +2705,8 @@ static int snp_decommission_context(struct kvm *kvm)
>  	snp_free_firmware_page(sev->snp_context);
>  	sev->snp_context = NULL;
>  
> +	kfree(sev->snp_certs_data);
> +
>  	return 0;
>  }
>  
> @@ -3153,6 +3167,8 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
>  	case SVM_VMGEXIT_UNSUPPORTED_EVENT:
>  	case SVM_VMGEXIT_HV_FEATURES:
>  	case SVM_VMGEXIT_PSC:
> +	case SVM_VMGEXIT_GUEST_REQUEST:
> +	case SVM_VMGEXIT_EXT_GUEST_REQUEST:
>  		break;
>  	default:
>  		reason = GHCB_ERR_INVALID_EVENT;
> @@ -3384,6 +3400,149 @@ static int snp_complete_psc(struct kvm_vcpu *vcpu)
>  	return 1;
>  }
>  
> +static unsigned long snp_setup_guest_buf(struct vcpu_svm *svm,
> +					 struct sev_data_snp_guest_request *data,
> +					 gpa_t req_gpa, gpa_t resp_gpa)
> +{
> +	struct kvm_vcpu *vcpu = &svm->vcpu;
> +	struct kvm *kvm = vcpu->kvm;
> +	kvm_pfn_t req_pfn, resp_pfn;
> +	struct kvm_sev_info *sev;
> +
> +	sev = &to_kvm_svm(kvm)->sev_info;
> +
> +	if (!IS_ALIGNED(req_gpa, PAGE_SIZE) || !IS_ALIGNED(resp_gpa, PAGE_SIZE))
> +		return SEV_RET_INVALID_PARAM;
> +
> +	req_pfn = gfn_to_pfn(kvm, gpa_to_gfn(req_gpa));
> +	if (is_error_noslot_pfn(req_pfn))
> +		return SEV_RET_INVALID_ADDRESS;
> +
> +	resp_pfn = gfn_to_pfn(kvm, gpa_to_gfn(resp_gpa));
> +	if (is_error_noslot_pfn(resp_pfn))
> +		return SEV_RET_INVALID_ADDRESS;
> +
> +	if (rmp_make_private(resp_pfn, 0, PG_LEVEL_4K, 0, true))
> +		return SEV_RET_INVALID_ADDRESS;
> +
> +	data->gctx_paddr = __psp_pa(sev->snp_context);
> +	data->req_paddr = __sme_set(req_pfn << PAGE_SHIFT);
> +	data->res_paddr = __sme_set(resp_pfn << PAGE_SHIFT);
> +
> +	return 0;
> +}
> +
> +static void snp_cleanup_guest_buf(struct sev_data_snp_guest_request *data, unsigned long *rc)
> +{
> +	u64 pfn = __sme_clr(data->res_paddr) >> PAGE_SHIFT;
> +	int ret;
> +
> +	ret = snp_page_reclaim(pfn);
> +	if (ret)
> +		*rc = SEV_RET_INVALID_ADDRESS;
> +
> +	ret = rmp_make_shared(pfn, PG_LEVEL_4K);
> +	if (ret)
> +		*rc = SEV_RET_INVALID_ADDRESS;
> +}
> +
> +static void snp_handle_guest_request(struct vcpu_svm *svm, gpa_t req_gpa, gpa_t resp_gpa)
> +{
> +	struct sev_data_snp_guest_request data = {0};
> +	struct kvm_vcpu *vcpu = &svm->vcpu;
> +	struct kvm *kvm = vcpu->kvm;
> +	struct kvm_sev_info *sev;
> +	unsigned long rc;
> +	int err;
> +
> +	if (!sev_snp_guest(vcpu->kvm)) {
> +		rc = SEV_RET_INVALID_GUEST;
> +		goto e_fail;
> +	}
> +
> +	sev = &to_kvm_svm(kvm)->sev_info;
> +
> +	mutex_lock(&sev->guest_req_lock);
> +
> +	rc = snp_setup_guest_buf(svm, &data, req_gpa, resp_gpa);
> +	if (rc)
> +		goto unlock;
> +
> +	rc = sev_issue_cmd(kvm, SEV_CMD_SNP_GUEST_REQUEST, &data, &err);
> +	if (rc)
> +		/* use the firmware error code */
> +		rc = err;
> +
> +	snp_cleanup_guest_buf(&data, &rc);
> +

I am curious about the reason for having a shared-to-private and private-to-shared
conversion before and after issuing the command to the firmware.

Is it because the firmware requires the resp page to be a private page,
while the req page does not have to be? (I understand that the req/resp pages
should be shared before returning to the guest, per the GHCB spec.)

> +unlock:
> +	mutex_unlock(&sev->guest_req_lock);
> +
> +e_fail:
> +	ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, rc);
> +}
> +
> +static void snp_handle_ext_guest_request(struct vcpu_svm *svm, gpa_t req_gpa, gpa_t resp_gpa)
> +{
> +	struct sev_data_snp_guest_request req = {0};
> +	struct kvm_vcpu *vcpu = &svm->vcpu;
> +	struct kvm *kvm = vcpu->kvm;
> +	unsigned long data_npages;
> +	struct kvm_sev_info *sev;
> +	unsigned long rc, err;
> +	u64 data_gpa;
> +
> +	if (!sev_snp_guest(vcpu->kvm)) {
> +		rc = SEV_RET_INVALID_GUEST;
> +		goto e_fail;
> +	}
> +
> +	sev = &to_kvm_svm(kvm)->sev_info;
> +
> +	data_gpa = vcpu->arch.regs[VCPU_REGS_RAX];
> +	data_npages = vcpu->arch.regs[VCPU_REGS_RBX];
> +
> +	if (!IS_ALIGNED(data_gpa, PAGE_SIZE)) {
> +		rc = SEV_RET_INVALID_ADDRESS;
> +		goto e_fail;
> +	}
> +
> +	mutex_lock(&sev->guest_req_lock);
> +
> +	rc = snp_setup_guest_buf(svm, &req, req_gpa, resp_gpa);
> +	if (rc)
> +		goto unlock;
> +
> +	rc = snp_guest_ext_guest_request(&req, (unsigned long)sev->snp_certs_data,
> +					 &data_npages, &err);
> +	if (rc) {
> +		/*
> +		 * If buffer length is small then return the expected
> +		 * length in rbx.
> +		 */
> +		if (err == SNP_GUEST_REQ_INVALID_LEN)
> +			vcpu->arch.regs[VCPU_REGS_RBX] = data_npages;
> +
> +		/* pass the firmware error code */
> +		rc = err;
> +		goto cleanup;
> +	}
> +
> +	/* Copy the certificate blob in the guest memory */
> +	if (data_npages &&
> +	    kvm_write_guest(kvm, data_gpa, sev->snp_certs_data, data_npages << PAGE_SHIFT))
> +		rc = SEV_RET_INVALID_ADDRESS;
> +
> +cleanup:
> +	snp_cleanup_guest_buf(&req, &rc);
> +
> +unlock:
> +	mutex_unlock(&sev->guest_req_lock);
> +
> +e_fail:
> +	ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, rc);
> +}
> +
>  static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
>  {
>  	struct vmcb_control_area *control = &svm->vmcb->control;
> @@ -3633,6 +3792,20 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
>  		vcpu->run->vmgexit.ghcb_msr = ghcb_gpa;
>  		vcpu->arch.complete_userspace_io = snp_complete_psc;
>  		break;
> +	case SVM_VMGEXIT_GUEST_REQUEST: {
> +		snp_handle_guest_request(svm, control->exit_info_1, control->exit_info_2);
> +
> +		ret = 1;
> +		break;
> +	}
> +	case SVM_VMGEXIT_EXT_GUEST_REQUEST: {
> +		snp_handle_ext_guest_request(svm,
> +					     control->exit_info_1,
> +					     control->exit_info_2);
> +
> +		ret = 1;
> +		break;
> +	}
>  	case SVM_VMGEXIT_UNSUPPORTED_EVENT:
>  		vcpu_unimpl(vcpu,
>  			    "vmgexit: unsupported event - exit_info_1=%#llx, exit_info_2=%#llx\n",
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index 13b00233b315..4a9ffb7e5139 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -93,6 +93,8 @@ struct kvm_sev_info {
>  	atomic_t migration_in_progress;
>  	u64 snp_init_flags;
>  	void *snp_context;      /* SNP guest context page */
> +	void *snp_certs_data;
> +	struct mutex guest_req_lock; /* Lock for guest request handling */
>  };
>  
>  struct kvm_svm {


^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH RFC v8 47/56] KVM: SVM: Support SEV-SNP AP Creation NAE event
  2023-02-24 12:37   ` Alexander Graf
@ 2023-02-28 20:47     ` Zhi Wang
  2023-03-01 21:14       ` Alexander Graf
  2023-04-04 22:48     ` Michael Roth
  1 sibling, 1 reply; 147+ messages in thread
From: Zhi Wang @ 2023-02-28 20:47 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Michael Roth, kvm, linux-coco, linux-mm, linux-crypto, x86,
	linux-kernel, tglx, mingo, jroedel, thomas.lendacky, hpa, ardb,
	pbonzini, seanjc, vkuznets, jmattson, luto, dave.hansen, slp,
	pgonda, peterz, srinivas.pandruvada, rientjes, dovmurik, tobin,
	bp, vbabka, kirill, ak, tony.luck, marcorr,
	sathyanarayanan.kuppuswamy, alpergun, dgilbert, jarkko,
	ashish.kalra, nikunj.dadhania, Brijesh Singh

On Fri, 24 Feb 2023 13:37:48 +0100
Alexander Graf <graf@amazon.com> wrote:

> 
> On 20.02.23 19:38, Michael Roth wrote:
> > From: Tom Lendacky <thomas.lendacky@amd.com>
> >
> > Add support for the SEV-SNP AP Creation NAE event. This allows SEV-SNP
> > guests to alter the register state of the APs on their own. This allows
> > the guest a way of simulating INIT-SIPI.
> >
> > A new event, KVM_REQ_UPDATE_PROTECTED_GUEST_STATE, is created and used
> > so as to avoid updating the VMSA pointer while the vCPU is running.
> >
> > For CREATE
> >    The guest supplies the GPA of the VMSA to be used for the vCPU with
> >    the specified APIC ID. The GPA is saved in the svm struct of the
> >    target vCPU, the KVM_REQ_UPDATE_PROTECTED_GUEST_STATE event is added
> >    to the vCPU and then the vCPU is kicked.
> >
> > For CREATE_ON_INIT:
> >    The guest supplies the GPA of the VMSA to be used for the vCPU with
> >    the specified APIC ID the next time an INIT is performed. The GPA is
> >    saved in the svm struct of the target vCPU.
> >
> > For DESTROY:
> >    The guest indicates it wishes to stop the vCPU. The GPA is cleared
> >    from the svm struct, the KVM_REQ_UPDATE_PROTECTED_GUEST_STATE event is
> >    added to vCPU and then the vCPU is kicked.
> >
> > The KVM_REQ_UPDATE_PROTECTED_GUEST_STATE event handler will be invoked
> > as a result of the event or as a result of an INIT. The handler sets the
> > vCPU to the KVM_MP_STATE_UNINITIALIZED state, so that any errors will
> > leave the vCPU as not runnable. Any previous VMSA pages that were
> > installed as part of an SEV-SNP AP Creation NAE event are un-pinned. If
> > a new VMSA is to be installed, the VMSA guest page is pinned and set as
> > the VMSA in the vCPU VMCB and the vCPU state is set to
> > KVM_MP_STATE_RUNNABLE. If a new VMSA is not to be installed, the VMSA is
> > cleared in the vCPU VMCB and the vCPU state is left as
> > KVM_MP_STATE_UNINITIALIZED to prevent it from being run.
> >
> > Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
> > Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> > Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> > [mdr: add handling for restrictedmem]
> > Signed-off-by: Michael Roth <michael.roth@amd.com>
> 
> 
> What is the intended boot sequence for SEV-SNP guests? FWIW with this 
> interface in place, guests will typically use in-guest VMSA pages to 
> hold secondary vcpu state. But that means we're now allocating 4kb of 
> memory for every vcpu that we create that will be for most of the 
> guest's lifetime superfluous.
> 
> Wouldn't it make more sense to have a model where we only allocate the 
> VMSA for the boot CPU and leave secondary allocation to the guest? We 
> already need firmware changes for SEV-SNP - may as well make this one more.
> 
> [...]
> 
> > +
> > +static int sev_snp_ap_creation(struct vcpu_svm *svm)
> > +{
> > +       struct kvm_sev_info *sev = &to_kvm_svm(svm->vcpu.kvm)->sev_info;
> > +       struct kvm_vcpu *vcpu = &svm->vcpu;
> > +       struct kvm_vcpu *target_vcpu;
> > +       struct vcpu_svm *target_svm;
> > +       unsigned int request;
> > +       unsigned int apic_id;
> > +       bool kick;
> > +       int ret;
> > +
> > +       request = lower_32_bits(svm->vmcb->control.exit_info_1);
> > +       apic_id = upper_32_bits(svm->vmcb->control.exit_info_1);
> > +
> > +       /* Validate the APIC ID */
> > +       target_vcpu = kvm_get_vcpu_by_id(vcpu->kvm, apic_id);
> 
> 
> Out of curiosity: The target CPU can be my own vCPU, right?
> 
> 
> > +       if (!target_vcpu) {
> > +               vcpu_unimpl(vcpu, "vmgexit: invalid AP APIC ID [%#x] from guest\n",
> > +                           apic_id);
> > +               return -EINVAL;
> > +       }
> > +
> > +       ret = 0;
> > +
> > +       target_svm = to_svm(target_vcpu);
> > +
> > +       /*
> > +        * The target vCPU is valid, so the vCPU will be kicked unless the
> > +        * request is for CREATE_ON_INIT. For any errors at this stage, the
> > +        * kick will place the vCPU in an non-runnable state.
> > +        */
> > +       kick = true;
> > +
> > +       mutex_lock(&target_svm->sev_es.snp_vmsa_mutex);
> > +
> > +       target_svm->sev_es.snp_vmsa_gpa = INVALID_PAGE;
> > +       target_svm->sev_es.snp_ap_create = true;
> > +
> > +       /* Interrupt injection mode shouldn't change for AP creation */
> > +       if (request < SVM_VMGEXIT_AP_DESTROY) {
> > +               u64 sev_features;
> > +
> > +               sev_features = vcpu->arch.regs[VCPU_REGS_RAX];
> > +               sev_features ^= sev->sev_features;
> > +               if (sev_features & SVM_SEV_FEAT_INT_INJ_MODES) {
> > +                       vcpu_unimpl(vcpu, "vmgexit: invalid AP injection mode [%#lx] from guest\n",
> > +                                   vcpu->arch.regs[VCPU_REGS_RAX]);
> > +                       ret = -EINVAL;
> > +                       goto out;
> > +               }
> > +       }
> > +
> > +       switch (request) {
> > +       case SVM_VMGEXIT_AP_CREATE_ON_INIT:
> > +               kick = false;
> > +               fallthrough;
> > +       case SVM_VMGEXIT_AP_CREATE:
> > +               if (!page_address_valid(vcpu, svm->vmcb->control.exit_info_2)) {
> > +                       vcpu_unimpl(vcpu, "vmgexit: invalid AP VMSA address [%#llx] from guest\n",
> > +                                   svm->vmcb->control.exit_info_2);
> > +                       ret = -EINVAL;
> > +                       goto out;
> > +               }
> > +
> > +               /*
> > +                * Malicious guest can RMPADJUST a large page into VMSA which
> > +                * will hit the SNP erratum where the CPU will incorrectly signal
> > +                * an RMP violation #PF if a hugepage collides with the RMP entry
> > +                * of VMSA page, reject the AP CREATE request if VMSA address from
> > +                * guest is 2M aligned.
> 
> 
> This will break genuine current Linux kernels that just happen to 
> allocate a guest page, no? In fact, given enough vCPUs you're almost 
> guaranteed to hit an aligned structure somewhere. What is the guest 
> supposed to do in that situation?
> 
> 
> > +                */
> > +               if (IS_ALIGNED(svm->vmcb->control.exit_info_2, PMD_SIZE)) {
> > +                       vcpu_unimpl(vcpu,
> > +                                   "vmgexit: AP VMSA address [%llx] from guest is unsafe as it is 2M aligned\n",
> > +                                   svm->vmcb->control.exit_info_2);
> > +                       ret = -EINVAL;
> > +                       goto out;
> > +               }
> > +
> > +               target_svm->sev_es.snp_vmsa_gpa = svm->vmcb->control.exit_info_2;
> > +               break;
> > +       case SVM_VMGEXIT_AP_DESTROY:
> 
> 
> I don't understand the destroy path. Why does this case destroy anything?
> 
> 
> > +               break;
> > +       default:
> > +               vcpu_unimpl(vcpu, "vmgexit: invalid AP creation request [%#x] from guest\n",
> > +                           request);
> > +               ret = -EINVAL;
> > +               break;
> > +       }
> > +
> > +out:
> > +       if (kick) {
> > +               if (target_vcpu->arch.mp_state == KVM_MP_STATE_UNINITIALIZED)
> > +                       target_vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
> 
> 
> What if the guest AP goes through a create -> destroy -> create cycle? 
> Will it stay runnable while destroyed?

The code is not very straightforward.

1) target_svm->sev_es.snp_vmsa_gpa is set to INVALID_PAGE at the beginning of this function.

2) If a DESTROY is hit in this function, target_svm->sev_es.snp_vmsa_gpa will be
left as INVALID_PAGE.

3) At the end of this function, it calls kvm_make_request(KVM_REQ_UPDATE_PROTECTED_GUEST_STATE).

4) In vcpu_enter_guest(), kvm_vcpu_reset()->sev_snp_init_protected_guest_state()
->__sev_snp_init_protected_guest_state() is called.

5) The mp_state is set to KVM_MP_STATE_STOPPED by default and the runtime VMSA is
cleared. Then it will be initialized according to the guest's
configuration.

6) As snp_vmsa_gpa was set to INVALID_PAGE in step 1, the mp_state will be left as
KVM_MP_STATE_STOPPED.

7) With this code piece:

+			kvm_vcpu_reset(vcpu, true);
+			if (vcpu->arch.mp_state != KVM_MP_STATE_RUNNABLE)
+				goto out;

vcpu_enter_guest() bails out.
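
In short, the flow is roughly (names as in the patch, not the literal code):

	/* sev_snp_ap_creation(): for DESTROY the GPA stays invalid (steps 1-3) */
	target_svm->sev_es.snp_vmsa_gpa = INVALID_PAGE;
	...
	kvm_make_request(KVM_REQ_UPDATE_PROTECTED_GUEST_STATE, target_vcpu);
	kvm_vcpu_kick(target_vcpu);

	/* vcpu_enter_guest() on the target vCPU (steps 4-7) */
	kvm_vcpu_reset(vcpu, true);  /* -> __sev_snp_init_protected_guest_state() */
	if (vcpu->arch.mp_state != KVM_MP_STATE_RUNNABLE)  /* left STOPPED */
		goto out;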

> 
> 
> Alex
> 
> > +
> > +               kvm_make_request(KVM_REQ_UPDATE_PROTECTED_GUEST_STATE, target_vcpu);
> > +               kvm_vcpu_kick(target_vcpu);
> > +       }
> > +
> > +       mutex_unlock(&target_svm->sev_es.snp_vmsa_mutex);
> > +
> > +       return ret;
> > +}
> > +
> >   static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
> >   {
> >          struct vmcb_control_area *control = &svm->vmcb->control;
> 
> 
> 
> Amazon Development Center Germany GmbH
> Krausenstr. 38
> 10117 Berlin
> Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
> Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
> Sitz: Berlin
> Ust-ID: DE 289 237 879
> 
> 


^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH RFC v8 01/56] KVM: x86: Add 'fault_is_private' x86 op
  2023-02-20 18:37 ` [PATCH RFC v8 01/56] KVM: x86: Add 'fault_is_private' x86 op Michael Roth
@ 2023-03-01 10:25   ` Zhi Wang
  2023-03-18  4:51   ` Isaku Yamahata
  2023-03-18  4:53   ` Isaku Yamahata
  2 siblings, 0 replies; 147+ messages in thread
From: Zhi Wang @ 2023-03-01 10:25 UTC (permalink / raw)
  To: Michael Roth
  Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania

On Mon, 20 Feb 2023 12:37:52 -0600
Michael Roth <michael.roth@amd.com> wrote:

Basically, I don't think kvm_mmu_fault_is_private() is promising after going
through both SNP and TDX patches:

1) The fault path is performance-critical. kvm_mmu_fault_is_private() always does a
gfn_to_memslot() no matter whether SNP/TDX is enabled or not. It will mostly hit
slots->last_used_slot, but the worst case is going through an RB-tree search.

Adding extra overhead on the generic fault path needs to be reconsidered
carefully. At the very least, check whether the guest is a CC (SNP/TDX) guest first.

2) Just after the gfn_to_memslot() in kvm_mmu_fault_is_private(), there is
another gfn_to_memslot():

static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
                                        u64 err, bool prefetch)
{
        struct kvm_page_fault fault = {
                .addr = cr2_or_gpa,
                .error_code = lower_32_bits(err),
                .exec = err & PFERR_FETCH_MASK,
                .write = err & PFERR_WRITE_MASK,
                .present = err & PFERR_PRESENT_MASK,
                .rsvd = err & PFERR_RSVD_MASK,
                .user = err & PFERR_USER_MASK,
                .prefetch = prefetch,
                .is_tdp = likely(vcpu->arch.mmu->page_fault == kvm_tdp_page_fault),
                .nx_huge_page_workaround_enabled =
                        is_nx_huge_page_enabled(vcpu->kvm),

                .max_level = KVM_MAX_HUGEPAGE_LEVEL,
                .req_level = PG_LEVEL_4K,
                .goal_level = PG_LEVEL_4K,
                .is_private = kvm_mmu_fault_is_private(vcpu->kvm, cr2_or_gpa, err),
        };
        int r;

        if (vcpu->arch.mmu->root_role.direct) {
                fault.gfn = fault.addr >> PAGE_SHIFT;
		/* here */
                fault.slot = kvm_vcpu_gfn_to_memslot(vcpu, fault.gfn);
        }

I was wondering whether checking the private slot and kvm_slot_can_be_private() is
really necessary in kvm_mmu_fault_is_private().

The TDP MMU expects fault.is_private to indicate whether the CPU thinks the fault
is private or not (for SNP, it is in the PF error code; for TDX, it is the shared
bit in the fault GPA). The TDP MMU will then check whether the slot is a private
slot or not, and leave it to userspace to handle when the two disagree.

My points:

1) Resolving the PFERR in kvm_x86_ops.fault_is_private and setting
fault.is_private is enough. The rest can be handled by the TDP MMU.

2) Put the kvm_x86_ops.fault_is_private in a separate patch so that the TDX series
can include it. (The 64-bit error code part can stay in another patch.)
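
For point 1, I mean roughly something like this (just a sketch reusing the
names from the patch below, not a tested implementation):

static bool kvm_mmu_fault_is_private(struct kvm *kvm, gpa_t gpa, u64 err)
{
	bool private_fault = false;

	/*
	 * Let the platform decode the HW-provided hint (PF error code for
	 * SNP, shared GPA bit for TDX).
	 */
	if (static_call(kvm_x86_fault_is_private)(kvm, gpa, err, &private_fault))
		return private_fault;

	/* Otherwise fall back to userspace's view of the GFN attributes. */
	return kvm_mem_is_private(kvm, gpa >> PAGE_SHIFT);
}

The memslot/kvm_slot_can_be_private() checks would then stay in the TDP MMU
proper.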

> This callback is used by the KVM MMU to check whether a #NPF was for a
> private GPA or not.
> 
> In some cases the full 64-bit error code for the #NPF will be needed to
> make this determination, so also update kvm_mmu_do_page_fault() to
> accept the full 64-bit value so it can be plumbed through to the
> callback.
> 
> Signed-off-by: Michael Roth <michael.roth@amd.com>
> ---
>  arch/x86/include/asm/kvm-x86-ops.h |  1 +
>  arch/x86/include/asm/kvm_host.h    |  1 +
>  arch/x86/kvm/mmu/mmu.c             |  3 +--
>  arch/x86/kvm/mmu/mmu_internal.h    | 37 +++++++++++++++++++++++++++---
>  4 files changed, 37 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
> index 8dc345cc6318..72183da010b8 100644
> --- a/arch/x86/include/asm/kvm-x86-ops.h
> +++ b/arch/x86/include/asm/kvm-x86-ops.h
> @@ -131,6 +131,7 @@ KVM_X86_OP(msr_filter_changed)
>  KVM_X86_OP(complete_emulated_msr)
>  KVM_X86_OP(vcpu_deliver_sipi_vector)
>  KVM_X86_OP_OPTIONAL_RET0(vcpu_get_apicv_inhibit_reasons);
> +KVM_X86_OP_OPTIONAL_RET0(fault_is_private);
>  
>  #undef KVM_X86_OP
>  #undef KVM_X86_OP_OPTIONAL
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index e552374f2357..f856d689dda0 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1643,6 +1643,7 @@ struct kvm_x86_ops {
>  
>  	void (*load_mmu_pgd)(struct kvm_vcpu *vcpu, hpa_t root_hpa,
>  			     int root_level);
> +	bool (*fault_is_private)(struct kvm *kvm, gpa_t gpa, u64 error_code, bool *private_fault);
>  
>  	bool (*has_wbinvd_exit)(void);
>  
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index eda615f3951c..fb3f34b7391c 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -5724,8 +5724,7 @@ int noinline kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 err
>  	}
>  
>  	if (r == RET_PF_INVALID) {
> -		r = kvm_mmu_do_page_fault(vcpu, cr2_or_gpa,
> -					  lower_32_bits(error_code), false);
> +		r = kvm_mmu_do_page_fault(vcpu, cr2_or_gpa, error_code, false);
>  		if (KVM_BUG_ON(r == RET_PF_INVALID, vcpu->kvm))
>  			return -EIO;
>  	}
> diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
> index e642d431df4b..557a001210df 100644
> --- a/arch/x86/kvm/mmu/mmu_internal.h
> +++ b/arch/x86/kvm/mmu/mmu_internal.h
> @@ -231,6 +231,37 @@ struct kvm_page_fault {
>  
>  int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault);
>  
> +static bool kvm_mmu_fault_is_private(struct kvm *kvm, gpa_t gpa, u64 err)
> +{
> +	struct kvm_memory_slot *slot;
> +	bool private_fault = false;
> +	gfn_t gfn = gpa_to_gfn(gpa);
> +
> +	slot = gfn_to_memslot(kvm, gfn);
> +	if (!slot) {
> +		pr_debug("%s: no slot, GFN: 0x%llx\n", __func__, gfn);
> +		goto out;
> +	}
> +
> +	if (!kvm_slot_can_be_private(slot)) {
> +		pr_debug("%s: slot is not private, GFN: 0x%llx\n", __func__, gfn);
> +		goto out;
> +	}
> +
> +	if (static_call(kvm_x86_fault_is_private)(kvm, gpa, err, &private_fault))
> +		goto out;
> +
> +	/*
> +	 * Handling below is for UPM self-tests and guests that treat userspace
> +	 * as the authority on whether a fault should be private or not.
> +	 */
> +	private_fault = kvm_mem_is_private(kvm, gpa >> PAGE_SHIFT);
> +
> +out:
> +	pr_debug("%s: GFN: 0x%llx, private: %d\n", __func__, gfn, private_fault);
> +	return private_fault;
> +}
> +
>  /*
>   * Return values of handle_mmio_page_fault(), mmu.page_fault(), fast_page_fault(),
>   * and of course kvm_mmu_do_page_fault().
> @@ -262,11 +293,11 @@ enum {
>  };
>  
>  static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
> -					u32 err, bool prefetch)
> +					u64 err, bool prefetch)
>  {
>  	struct kvm_page_fault fault = {
>  		.addr = cr2_or_gpa,
> -		.error_code = err,
> +		.error_code = lower_32_bits(err),
>  		.exec = err & PFERR_FETCH_MASK,
>  		.write = err & PFERR_WRITE_MASK,
>  		.present = err & PFERR_PRESENT_MASK,
> @@ -280,7 +311,7 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
>  		.max_level = KVM_MAX_HUGEPAGE_LEVEL,
>  		.req_level = PG_LEVEL_4K,
>  		.goal_level = PG_LEVEL_4K,
> -		.is_private = kvm_mem_is_private(vcpu->kvm, cr2_or_gpa >> PAGE_SHIFT),
> +		.is_private = kvm_mmu_fault_is_private(vcpu->kvm, cr2_or_gpa, err),
>  	};
>  	int r;
>  


^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH RFC v8 50/56] KVM: SEV: Handle restricted memory invalidations for SNP
  2023-02-20 18:38 ` [PATCH RFC v8 50/56] KVM: SEV: Handle restricted memory invalidations " Michael Roth
@ 2023-03-01 10:41   ` Zhi Wang
  0 siblings, 0 replies; 147+ messages in thread
From: Zhi Wang @ 2023-03-01 10:41 UTC (permalink / raw)
  To: Michael Roth
  Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania

On Mon, 20 Feb 2023 12:38:41 -0600
Michael Roth <michael.roth@amd.com> wrote:

> Implement a platform hook to do the work of restoring the direct map
> entries and cleaning up RMP table entries for restricted memory that is
> being freed back to the host.
> 
> Signed-off-by: Michael Roth <michael.roth@amd.com>
> ---
>  arch/x86/kvm/svm/sev.c | 62 ++++++++++++++++++++++++++++++++++++++++++
>  arch/x86/kvm/svm/svm.c |  1 +
>  arch/x86/kvm/svm/svm.h |  1 +
>  3 files changed, 64 insertions(+)
> 
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 7a74a92cb39a..bedec90d034f 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -4509,3 +4509,65 @@ bool sev_fault_is_private(struct kvm *kvm, gpa_t gpa, u64 error_code, bool *priv
>  
>  	return true;
>  }
> +
> +void sev_invalidate_private_range(struct kvm_memory_slot *slot, gfn_t start, gfn_t end)
> +{
> +	gfn_t gfn = start;
> +
> +	if (!sev_snp_guest(slot->kvm))
> +		return;
> +
> +	if (!kvm_slot_can_be_private(slot)) {
> +		pr_warn_ratelimited("SEV: Memslot for GFN: 0x%llx is not private.\n",
> +				    gfn);
> +		return;
> +	}
> +

This is a generic check for both SNP and TDX; it should be moved to
kvm_restrictedmem_invalidate_begin().

> +	while (gfn <= end) {
> +		gpa_t gpa = gfn_to_gpa(gfn);
> +		int level = PG_LEVEL_4K;
> +		int order, rc;
> +		kvm_pfn_t pfn;
> +
> +		rc = kvm_restrictedmem_get_pfn(slot, gfn, &pfn, &order);
> +		if (rc) {
> +			pr_warn_ratelimited("SEV: Failed to retrieve restricted PFN for GFN 0x%llx, rc: %d\n",
> +					    gfn, rc);
> +			gfn++;
> +			continue;
> +		}
> +
> +		if (order) {
> +			int rmp_level;
> +
> +			if (IS_ALIGNED(gpa, page_level_size(PG_LEVEL_2M)) &&
> +			    gpa + page_level_size(PG_LEVEL_2M) <= gfn_to_gpa(end))
> +				level = PG_LEVEL_2M;
> +			else
> +				pr_debug("%s: GPA 0x%llx is not aligned to 2M, skipping 2M directmap restoration\n",
> +					 __func__, gpa);
> +
> +			/*
> +			 * TODO: It may still be possible to restore 2M mapping here,
> +			 * but keep it simple for now.
> +			 */
> +			if (level == PG_LEVEL_2M &&
> +			    (!snp_lookup_rmpentry(pfn, &rmp_level) || rmp_level == PG_LEVEL_4K)) {
> +				pr_debug("%s: PFN 0x%llx is not mapped as 2M private range, skipping 2M directmap restoration\n",
> +					 __func__, pfn);
> +				level = PG_LEVEL_4K;
> +			}
> +		}
> +
> +		pr_debug("%s: GPA %llx PFN %llx order %d level %d\n",
> +			 __func__, gpa, pfn, order, level);
> +		rc = snp_make_page_shared(slot->kvm, gpa, pfn, level);
> +		if (rc)
> +			pr_err("SEV: Failed to restore page to shared, GPA: 0x%llx PFN: 0x%llx order: %d rc: %d\n",
> +			       gpa, pfn, order, rc);
> +
> +		gfn += page_level_size(level) >> PAGE_SHIFT;
> +		put_page(pfn_to_page(pfn));
> +		cond_resched();
> +	}
> +}
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 18e4a6c17d11..3fe5f13b5f3a 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -4862,6 +4862,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
>  	.adjust_mapping_level = sev_adjust_mapping_level,
>  	.update_mem_attr = sev_update_mem_attr,
>  	.fault_is_private = sev_fault_is_private,
> +	.invalidate_restricted_mem = sev_invalidate_private_range,
>  };
>  
>  /*
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index 97038afa8020..857b674e68f0 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -727,6 +727,7 @@ void handle_rmp_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code);
>  void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu);
>  int sev_update_mem_attr(struct kvm_memory_slot *slot, unsigned int attr,
>  			gfn_t start, gfn_t end);
> +void sev_invalidate_private_range(struct kvm_memory_slot *slot, gfn_t start, gfn_t end);
>  
>  bool sev_fault_is_private(struct kvm *kvm, gpa_t gpa, u64 error_code, bool *private_fault);
>  


^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH RFC v8 51/56] KVM: SVM: Add module parameter to enable the SEV-SNP
  2023-02-20 18:38 ` [PATCH RFC v8 51/56] KVM: SVM: Add module parameter to enable the SEV-SNP Michael Roth
@ 2023-03-01 10:45   ` Zhi Wang
  0 siblings, 0 replies; 147+ messages in thread
From: Zhi Wang @ 2023-03-01 10:45 UTC (permalink / raw)
  To: Michael Roth
  Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	Brijesh Singh

On Mon, 20 Feb 2023 12:38:42 -0600
Michael Roth <michael.roth@amd.com> wrote:

> From: Brijesh Singh <brijesh.singh@amd.com>
> 
> Add a module parameter than can be used to enable or disable the SEV-SNP
> feature. Now that KVM contains the support for the SNP set the GHCB
> hypervisor feature flag to indicate that SNP is supported.
> 
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> Signed-off-by: Michael Roth <michael.roth@amd.com>
> ---
>  arch/x86/kvm/svm/sev.c | 7 ++++---
>  arch/x86/kvm/svm/svm.h | 2 +-
>  2 files changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index bedec90d034f..70d5650d8d95 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -55,14 +55,15 @@ module_param_named(sev, sev_enabled, bool, 0444);
>  /* enable/disable SEV-ES support */
>  static bool sev_es_enabled = true;
>  module_param_named(sev_es, sev_es_enabled, bool, 0444);
> +
> +/* enable/disable SEV-SNP support */
> +static bool sev_snp_enabled = true;
> +module_param_named(sev_snp, sev_snp_enabled, bool, 0444);
>  #else
>  #define sev_enabled false
>  #define sev_es_enabled false

Guess we also need #define sev_snp_enabled false.
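
i.e. so the !CONFIG_KVM_AMD_SEV branch ends up as:

#else
#define sev_enabled false
#define sev_es_enabled false
#define sev_snp_enabled false
#endif /* CONFIG_KVM_AMD_SEV */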

>  #endif /* CONFIG_KVM_AMD_SEV */
>  
> -/* enable/disable SEV-SNP support */
> -static bool sev_snp_enabled;
> -
>  #define AP_RESET_HOLD_NONE		0
>  #define AP_RESET_HOLD_NAE_EVENT		1
>  #define AP_RESET_HOLD_MSR_PROTO		2
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index 857b674e68f0..221b38d3c845 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -694,7 +694,7 @@ void avic_refresh_virtual_apic_mode(struct kvm_vcpu *vcpu);
>  #define GHCB_VERSION_MAX	2ULL
>  #define GHCB_VERSION_MIN	1ULL
>  
> -#define GHCB_HV_FT_SUPPORTED	0
> +#define GHCB_HV_FT_SUPPORTED	(GHCB_HV_FT_SNP | GHCB_HV_FT_SNP_AP_CREATION)

This is not related to the topic of this patch; it should be merged into the related
patches.

>  
>  extern unsigned int max_sev_asid;
>  


^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH RFC v8 15/56] x86/sev: Invalidate pages from the direct map when adding them to the RMP table
  2023-02-20 18:38 ` [PATCH RFC v8 15/56] x86/sev: Invalidate pages from the direct map when adding them to the RMP table Michael Roth
@ 2023-03-01 12:07   ` Tom Dohrmann
  2023-03-01 16:15   ` Dave Hansen
  1 sibling, 0 replies; 147+ messages in thread
From: Tom Dohrmann @ 2023-03-01 12:07 UTC (permalink / raw)
  To: Michael Roth
  Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	Brijesh Singh

On Mon, Feb 20, 2023 at 12:38:06PM -0600, Michael Roth wrote:
> From: Brijesh Singh <brijesh.singh@amd.com>
>
> The integrity guarantee of SEV-SNP is enforced through the RMP table.
> The RMP is used with standard x86 and IOMMU page tables to enforce
> memory restrictions and page access rights. The RMP check is enforced as
> soon as SEV-SNP is enabled globally in the system. When hardware
> encounters an RMP-check failure, it raises a page-fault exception.
>
> The rmp_make_private() and rmp_make_shared() helpers are used to add
> or remove the pages from the RMP table. Improve the rmp_make_private()
> to invalidate state so that pages cannot be used in the direct-map after
> they are added the RMP table, and restored to their default valid
> permission after the pages are removed from the RMP table.
>
> Co-developed-by: Ashish Kalra <ashish.kalra@amd.com>
> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> Signed-off-by: Michael Roth <michael.roth@amd.com>
> ---
>  arch/x86/kernel/sev.c | 57 +++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 57 insertions(+)
>
> diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
> index a49f30c10dc1..3e5ff5934e83 100644
> --- a/arch/x86/kernel/sev.c
> +++ b/arch/x86/kernel/sev.c
> @@ -2595,6 +2595,37 @@ int psmash(u64 pfn)
>  }
>  EXPORT_SYMBOL_GPL(psmash);
>
> +static int restore_direct_map(u64 pfn, int npages)
> +{
> +	int i, ret = 0;
> +
> +	for (i = 0; i < npages; i++) {
> +		ret = set_direct_map_default_noflush(pfn_to_page(pfn + i));
> +		if (ret)
> +			goto cleanup;
> +	}
> +
> +cleanup:
> +	WARN(ret > 0, "Failed to restore direct map for pfn 0x%llx\n", pfn + i);
> +	return ret;
> +}
> +
> +static int invalidate_direct_map(u64 pfn, int npages)
> +{
> +	int i, ret = 0;
> +
> +	for (i = 0; i < npages; i++) {
> +		ret = set_direct_map_invalid_noflush(pfn_to_page(pfn + i));
> +		if (ret)
> +			goto cleanup;
> +	}
> +
> +cleanup:
> +	WARN(ret > 0, "Failed to invalidate direct map for pfn 0x%llx\n", pfn + i);
> +	restore_direct_map(pfn, i);

This immediately restores the direct map after invalidating it, even on
success. The restore probably needs to be put behind if (ret).
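
Something along these lines (untested sketch), so the restore only runs on the
error path:

cleanup:
	if (ret) {
		WARN(1, "Failed to invalidate direct map for pfn 0x%llx\n", pfn + i);
		/* Undo only the entries that were already invalidated. */
		restore_direct_map(pfn, i);
	}
	return ret;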

Regards, Tom

> +	return ret;
> +}
> +
>  static int rmpupdate(u64 pfn, struct rmp_state *val)
>  {
>  	int max_attempts = 4 * num_present_cpus();
> @@ -2605,6 +2636,21 @@ static int rmpupdate(u64 pfn, struct rmp_state *val)
>  	if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
>  		return -ENXIO;
>
> +	level = RMP_TO_X86_PG_LEVEL(val->pagesize);
> +	npages = page_level_size(level) / PAGE_SIZE;
> +
> +	/*
> +	 * If page is getting assigned in the RMP table then unmap it from the
> +	 * direct map.
> +	 */
> +	if (val->assigned) {
> +		if (invalidate_direct_map(pfn, npages)) {
> +			pr_err("Failed to unmap %d pages at pfn 0x%llx from the direct_map\n",
> +			       npages, pfn);
> +			return -EFAULT;
> +		}
> +	}
> +
>  	do {
>  		/* Binutils version 2.36 supports the RMPUPDATE mnemonic. */
>  		asm volatile(".byte 0xF2, 0x0F, 0x01, 0xFE"
> @@ -2630,6 +2676,17 @@ static int rmpupdate(u64 pfn, struct rmp_state *val)
>  			 attempts, val->asid, ret, pfn, npages);
>  	}
>
> +	/*
> +	 * Restore the direct map after the page is removed from the RMP table.
> +	 */
> +	if (!val->assigned) {
> +		if (restore_direct_map(pfn, npages)) {
> +			pr_err("Failed to map %d pages at pfn 0x%llx into the direct_map\n",
> +			       npages, pfn);
> +			return -EFAULT;
> +		}
> +	}
> +
>  	return 0;
>  }
>
> --
> 2.25.1
>

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH RFC v8 15/56] x86/sev: Invalidate pages from the direct map when adding them to the RMP table
  2023-02-20 18:38 ` [PATCH RFC v8 15/56] x86/sev: Invalidate pages from the direct map when adding them to the RMP table Michael Roth
  2023-03-01 12:07   ` Tom Dohrmann
@ 2023-03-01 16:15   ` Dave Hansen
  2023-03-28 22:12     ` Michael Roth
  1 sibling, 1 reply; 147+ messages in thread
From: Dave Hansen @ 2023-03-01 16:15 UTC (permalink / raw)
  To: Michael Roth, kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	Brijesh Singh

On 2/20/23 10:38, Michael Roth wrote:
> From: Brijesh Singh <brijesh.singh@amd.com>
> 
> The integrity guarantee of SEV-SNP is enforced through the RMP table.
> The RMP is used with standard x86 and IOMMU page tables to enforce
> memory restrictions and page access rights. The RMP check is enforced as
> soon as SEV-SNP is enabled globally in the system. When hardware
> encounters an RMP-check failure, it raises a page-fault exception.
> 
> The rmp_make_private() and rmp_make_shared() helpers are used to add
> or remove the pages from the RMP table. Improve the rmp_make_private()
> to invalidate state so that pages cannot be used in the direct-map after
> they are added the RMP table, and restored to their default valid
> permission after the pages are removed from the RMP table.

This is a purely "what" changelog.  It doesn't explain the "why" at all.

Could you please elaborate on why this unmapping operation is necessary?

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH RFC v8 17/56] x86/fault: Add support to handle the RMP fault for user address
  2023-02-20 18:38 ` [PATCH RFC v8 17/56] x86/fault: Add support to handle the RMP fault for user address Michael Roth
@ 2023-03-01 16:21   ` Dave Hansen
  2023-03-28 23:31     ` Michael Roth
  2023-03-03 15:31   ` Vlastimil Babka
  1 sibling, 1 reply; 147+ messages in thread
From: Dave Hansen @ 2023-03-01 16:21 UTC (permalink / raw)
  To: Michael Roth, kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	Brijesh Singh, Jarkko Sakkinen

On 2/20/23 10:38, Michael Roth wrote:
> +static int handle_split_page_fault(struct vm_fault *vmf)
> +{
> +	__split_huge_pmd(vmf->vma, vmf->pmd, vmf->address, false, NULL);
> +	return 0;
> +}
> +
>  /*
>   * By the time we get here, we already hold the mm semaphore
>   *
> @@ -5078,6 +5084,10 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
>  				pmd_migration_entry_wait(mm, vmf.pmd);
>  			return 0;
>  		}
> +
> +		if (flags & FAULT_FLAG_PAGE_SPLIT)
> +			return handle_split_page_fault(&vmf);

I asked this long ago, but how do you prevent these faults from
occurring on hugetlbfs mappings that can't be split?


^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH RFC v8 00/56] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support
  2023-02-20 18:37 [PATCH RFC v8 00/56] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (55 preceding siblings ...)
  2023-02-20 18:38 ` [PATCH RFC v8 56/56] iommu/amd: Add IOMMU_SNP_SHUTDOWN support Michael Roth
@ 2023-03-01 16:56 ` Dave Hansen
  2023-03-01 22:59   ` Zhi Wang
  2023-08-03 18:27 ` Schander, Johanna 'Mimoja' Amelie
  57 siblings, 1 reply; 147+ messages in thread
From: Dave Hansen @ 2023-03-01 16:56 UTC (permalink / raw)
  To: Michael Roth, kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania

On 2/20/23 10:37, Michael Roth wrote:
> The RMP check is enforced as soon as SEV-SNP is enabled. Not every memory
> access requires an RMP check. In particular, the read accesses from the
> hypervisor do not require RMP checks because the data confidentiality is
> already protected via memory encryption. When hardware encounters an RMP
> checks failure, it raises a page-fault exception. If RMP check failure
> is due to the page-size mismatch, then split the large page to resolve
> the fault.

What does this all _mean_?

When does the kernel need to care about a "page-size mismatch"?

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH RFC v8 47/56] KVM: SVM: Support SEV-SNP AP Creation NAE event
  2023-02-28 20:47     ` Zhi Wang
@ 2023-03-01 21:14       ` Alexander Graf
  2023-04-05  0:54         ` Michael Roth
  0 siblings, 1 reply; 147+ messages in thread
From: Alexander Graf @ 2023-03-01 21:14 UTC (permalink / raw)
  To: Zhi Wang
  Cc: Michael Roth, kvm, linux-coco, linux-mm, linux-crypto, x86,
	linux-kernel, tglx, mingo, jroedel, thomas.lendacky, hpa, ardb,
	pbonzini, seanjc, vkuznets, jmattson, luto, dave.hansen, slp,
	pgonda, peterz, srinivas.pandruvada, rientjes, dovmurik, tobin,
	bp, vbabka, kirill, ak, tony.luck, marcorr,
	sathyanarayanan.kuppuswamy, alpergun, dgilbert, jarkko,
	ashish.kalra, nikunj.dadhania, Brijesh Singh


On 28.02.23 21:47, Zhi Wang wrote:
> On Fri, 24 Feb 2023 13:37:48 +0100
> Alexander Graf <graf@amazon.com> wrote:
>
>> On 20.02.23 19:38, Michael Roth wrote:
>>> From: Tom Lendacky <thomas.lendacky@amd.com>
>>>
>>> Add support for the SEV-SNP AP Creation NAE event. This allows SEV-SNP
>>> guests to alter the register state of the APs on their own. This allows
>>> the guest a way of simulating INIT-SIPI.
>>>
>>> A new event, KVM_REQ_UPDATE_PROTECTED_GUEST_STATE, is created and used
>>> so as to avoid updating the VMSA pointer while the vCPU is running.
>>>
>>> For CREATE
>>>     The guest supplies the GPA of the VMSA to be used for the vCPU with
>>>     the specified APIC ID. The GPA is saved in the svm struct of the
>>>     target vCPU, the KVM_REQ_UPDATE_PROTECTED_GUEST_STATE event is added
>>>     to the vCPU and then the vCPU is kicked.
>>>
>>> For CREATE_ON_INIT:
>>>     The guest supplies the GPA of the VMSA to be used for the vCPU with
>>>     the specified APIC ID the next time an INIT is performed. The GPA is
>>>     saved in the svm struct of the target vCPU.
>>>
>>> For DESTROY:
>>>     The guest indicates it wishes to stop the vCPU. The GPA is cleared
>>>     from the svm struct, the KVM_REQ_UPDATE_PROTECTED_GUEST_STATE event is
>>>     added to vCPU and then the vCPU is kicked.
>>>
>>> The KVM_REQ_UPDATE_PROTECTED_GUEST_STATE event handler will be invoked
>>> as a result of the event or as a result of an INIT. The handler sets the
>>> vCPU to the KVM_MP_STATE_UNINITIALIZED state, so that any errors will
>>> leave the vCPU as not runnable. Any previous VMSA pages that were
>>> installed as part of an SEV-SNP AP Creation NAE event are un-pinned. If
>>> a new VMSA is to be installed, the VMSA guest page is pinned and set as
>>> the VMSA in the vCPU VMCB and the vCPU state is set to
>>> KVM_MP_STATE_RUNNABLE. If a new VMSA is not to be installed, the VMSA is
>>> cleared in the vCPU VMCB and the vCPU state is left as
>>> KVM_MP_STATE_UNINITIALIZED to prevent it from being run.
>>>
>>> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
>>> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
>>> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
>>> [mdr: add handling for restrictedmem]
>>> Signed-off-by: Michael Roth <michael.roth@amd.com>
>>
>> What is the intended boot sequence for SEV-SNP guests? FWIW with this
>> interface in place, guests will typically use in-guest VMSA pages to
>> hold secondary vcpu state. But that means we're now allocating 4kb of
>> memory for every vcpu that we create that will be for most of the
>> guest's lifetime superfluous.
>>
>> Wouldn't it make more sense to have a model where we only allocate the
>> VMSA for the boot CPU and leave secondary allocation to the guest? We
>> already need firmware changes for SEV-SNP - may as well make this one more.
>>
>> [...]
>>
>>> +
>>> +static int sev_snp_ap_creation(struct vcpu_svm *svm)
>>> +{
>>> +       struct kvm_sev_info *sev = &to_kvm_svm(svm->vcpu.kvm)->sev_info;
>>> +       struct kvm_vcpu *vcpu = &svm->vcpu;
>>> +       struct kvm_vcpu *target_vcpu;
>>> +       struct vcpu_svm *target_svm;
>>> +       unsigned int request;
>>> +       unsigned int apic_id;
>>> +       bool kick;
>>> +       int ret;
>>> +
>>> +       request = lower_32_bits(svm->vmcb->control.exit_info_1);
>>> +       apic_id = upper_32_bits(svm->vmcb->control.exit_info_1);
>>> +
>>> +       /* Validate the APIC ID */
>>> +       target_vcpu = kvm_get_vcpu_by_id(vcpu->kvm, apic_id);
>>
>> Out of curiosity: The target CPU can be my own vCPU, right?
>>
>>
>>> +       if (!target_vcpu) {
>>> +               vcpu_unimpl(vcpu, "vmgexit: invalid AP APIC ID [%#x] from guest\n",
>>> +                           apic_id);
>>> +               return -EINVAL;
>>> +       }
>>> +
>>> +       ret = 0;
>>> +
>>> +       target_svm = to_svm(target_vcpu);
>>> +
>>> +       /*
>>> +        * The target vCPU is valid, so the vCPU will be kicked unless the
>>> +        * request is for CREATE_ON_INIT. For any errors at this stage, the
>>> +        * kick will place the vCPU in an non-runnable state.
>>> +        */
>>> +       kick = true;
>>> +
>>> +       mutex_lock(&target_svm->sev_es.snp_vmsa_mutex);
>>> +
>>> +       target_svm->sev_es.snp_vmsa_gpa = INVALID_PAGE;
>>> +       target_svm->sev_es.snp_ap_create = true;
>>> +
>>> +       /* Interrupt injection mode shouldn't change for AP creation */
>>> +       if (request < SVM_VMGEXIT_AP_DESTROY) {
>>> +               u64 sev_features;
>>> +
>>> +               sev_features = vcpu->arch.regs[VCPU_REGS_RAX];
>>> +               sev_features ^= sev->sev_features;
>>> +               if (sev_features & SVM_SEV_FEAT_INT_INJ_MODES) {
>>> +                       vcpu_unimpl(vcpu, "vmgexit: invalid AP injection mode [%#lx] from guest\n",
>>> +                                   vcpu->arch.regs[VCPU_REGS_RAX]);
>>> +                       ret = -EINVAL;
>>> +                       goto out;
>>> +               }
>>> +       }
>>> +
>>> +       switch (request) {
>>> +       case SVM_VMGEXIT_AP_CREATE_ON_INIT:
>>> +               kick = false;
>>> +               fallthrough;
>>> +       case SVM_VMGEXIT_AP_CREATE:
>>> +               if (!page_address_valid(vcpu, svm->vmcb->control.exit_info_2)) {
>>> +                       vcpu_unimpl(vcpu, "vmgexit: invalid AP VMSA address [%#llx] from guest\n",
>>> +                                   svm->vmcb->control.exit_info_2);
>>> +                       ret = -EINVAL;
>>> +                       goto out;
>>> +               }
>>> +
>>> +               /*
>>> +                * Malicious guest can RMPADJUST a large page into VMSA which
>>> +                * will hit the SNP erratum where the CPU will incorrectly signal
>>> +                * an RMP violation #PF if a hugepage collides with the RMP entry
>>> +                * of VMSA page, reject the AP CREATE request if VMSA address from
>>> +                * guest is 2M aligned.
>>
>> This will break genuine current Linux kernels that just happen to
>> allocate a guest page, no? In fact, given enough vCPUs you're almost
>> guaranteed to hit an aligned structure somewhere. What is the guest
>> supposed to do in that situation?
>>
>>
>>> +                */
>>> +               if (IS_ALIGNED(svm->vmcb->control.exit_info_2, PMD_SIZE)) {
>>> +                       vcpu_unimpl(vcpu,
>>> +                                   "vmgexit: AP VMSA address [%llx] from guest is unsafe as it is 2M aligned\n",
>>> +                                   svm->vmcb->control.exit_info_2);
>>> +                       ret = -EINVAL;
>>> +                       goto out;
>>> +               }
>>> +
>>> +               target_svm->sev_es.snp_vmsa_gpa = svm->vmcb->control.exit_info_2;
>>> +               break;
>>> +       case SVM_VMGEXIT_AP_DESTROY:
>>
>> I don't understand the destroy path. Why does this case destroy anything?
>>
>>
>>> +               break;
>>> +       default:
>>> +               vcpu_unimpl(vcpu, "vmgexit: invalid AP creation request [%#x] from guest\n",
>>> +                           request);
>>> +               ret = -EINVAL;
>>> +               break;
>>> +       }
>>> +
>>> +out:
>>> +       if (kick) {
>>> +               if (target_vcpu->arch.mp_state == KVM_MP_STATE_UNINITIALIZED)
>>> +                       target_vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
>>
>> What if the guest AP goes through a create -> destroy -> create cycle?
>> Will it stay runnable while destroyed?
> The code is not very straightforward.
>
> 1) target_svm->sev_es.snp_vmsa_gpa is set to INVALID_PAGE at the beginning of this function.
>
> 2) If a DESTROY is hit in this function, target_svm->sev_es.snp_vmsa_gpa will be
> left as INVALID_PAGE.
>
> 3) At the end of this function, it calls kvm_make_request(KVM_REQ_UPDATE_PROTECTED_GUEST_STATE).
>
> 4) In vcpu_enter_guest(), kvm_vcpu_reset()->sev_snp_init_protected_guest_state()
> ->__sev_snp_init_protected_guest_state() is called.
>
> 5) The mp_state is set to KVM_MP_STATE_STOPPED by default and the runtime VMSA is
> cleared. Then it will be initialized according to the guest's
> configuration.
>
> 6) As snp_vmsa_gpa was set to INVALID_PAGE in step 1, the mp_state will be left as
> KVM_MP_STATE_STOPPED.
>
> 7) With this code piece:
>
> +                       kvm_vcpu_reset(vcpu, true);
> +                       if (vcpu->arch.mp_state != KVM_MP_STATE_RUNNABLE)
> +                               goto out;
>
> vcpu_enter_guest() bails out.


Thanks a lot Zhi for the detailed explanation! I think this code flow 
wants to become slightly more obvious. For example, if we just said

   case SVM_VMGEXIT_AP_DESTROY:
     /* This will tell __sev_snp_update_protected_guest_state to unmap the VMSA */
     target_svm->sev_es.snp_vmsa_gpa = INVALID_PAGE;
     break;

We'd get a big win in readability with little effort. It makes it 
immediately obvious where to look for the destroy operation.


Alex





Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879



^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH RFC v8 52/56] ccp: Add support to decrypt the page
  2023-02-20 18:38 ` [PATCH RFC v8 52/56] ccp: Add support to decrypt the page Michael Roth
@ 2023-03-01 21:20   ` Zhi Wang
  2023-03-02  5:59     ` Dov Murik
  0 siblings, 1 reply; 147+ messages in thread
From: Zhi Wang @ 2023-03-01 21:20 UTC (permalink / raw)
  To: Michael Roth
  Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	Brijesh Singh

On Mon, 20 Feb 2023 12:38:43 -0600
Michael Roth <michael.roth@amd.com> wrote:

> From: Brijesh Singh <brijesh.singh@amd.com>
> 
> Add support to decrypt guest encrypted memory. These API interfaces can
> be used for example to dump VMCBs on SNP guest exit.
> 

What kinds of checks will be applied by the firmware when the VMM decrypts this
page? I suppose there has to be some kind of mechanism to prevent the VMM from
decrypting arbitrary pages in the guest. It would be nice to have some explanation
of it in the comments.

> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> [mdr: minor commit fixups]
> Signed-off-by: Michael Roth <michael.roth@amd.com>
> ---
>  drivers/crypto/ccp/sev-dev.c | 32 ++++++++++++++++++++++++++++++++
>  include/linux/psp-sev.h      | 22 ++++++++++++++++++++--
>  2 files changed, 52 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
> index e65563bc8298..bf5167b2acfc 100644
> --- a/drivers/crypto/ccp/sev-dev.c
> +++ b/drivers/crypto/ccp/sev-dev.c
> @@ -2017,6 +2017,38 @@ int sev_guest_df_flush(int *error)
>  }
>  EXPORT_SYMBOL_GPL(sev_guest_df_flush);
>  
> +int snp_guest_dbg_decrypt_page(u64 gctx_pfn, u64 src_pfn, u64 dst_pfn, int *error)
> +{
> +	struct sev_data_snp_dbg data = {0};
> +	struct sev_device *sev;
> +	int ret;
> +
> +	if (!psp_master || !psp_master->sev_data)
> +		return -ENODEV;
> +
> +	sev = psp_master->sev_data;
> +
> +	if (!sev->snp_initialized)
> +		return -EINVAL;
> +
> +	data.gctx_paddr = sme_me_mask | (gctx_pfn << PAGE_SHIFT);
> +	data.src_addr = sme_me_mask | (src_pfn << PAGE_SHIFT);
> +	data.dst_addr = sme_me_mask | (dst_pfn << PAGE_SHIFT);
> +
> +	/* The destination page must be in the firmware state. */
> +	if (rmp_mark_pages_firmware(data.dst_addr, 1, false))
> +		return -EIO;
> +
> +	ret = sev_do_cmd(SEV_CMD_SNP_DBG_DECRYPT, &data, error);
> +
> +	/* Restore the page state */
> +	if (snp_reclaim_pages(data.dst_addr, 1, false))
> +		ret = -EIO;
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(snp_guest_dbg_decrypt_page);
> +
>  int snp_guest_ext_guest_request(struct sev_data_snp_guest_request *data,
>  				unsigned long vaddr, unsigned long *npages, unsigned long *fw_err)
>  {
> diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h
> index 81bafc049eca..92116e2b74fd 100644
> --- a/include/linux/psp-sev.h
> +++ b/include/linux/psp-sev.h
> @@ -710,7 +710,6 @@ struct sev_data_snp_dbg {
>  	u64 gctx_paddr;				/* In */
>  	u64 src_addr;				/* In */
>  	u64 dst_addr;				/* In */
> -	u32 len;				/* In */
>  } __packed;
>  
>  /**
> @@ -913,13 +912,27 @@ int sev_guest_decommission(struct sev_data_decommission *data, int *error);
>   * @error: SEV command return code
>   *
>   * Returns:
> + * 0 if the sev successfully processed the command
> + * -%ENODEV    if the sev device is not available
> + * -%ENOTSUPP  if the sev does not support SEV
> + * -%ETIMEDOUT if the sev command timed out
> + * -%EIO       if the sev returned a non-zero return code
> + */
> +int sev_do_cmd(int cmd, void *data, int *psp_ret);
> +
> +/**
> + * snp_guest_dbg_decrypt_page - perform SEV SNP_DBG_DECRYPT command
> + *
> + * @sev_ret: sev command return code
> + *
> + * Returns:
>   * 0 if the SEV successfully processed the command
>   * -%ENODEV    if the SEV device is not available
>   * -%ENOTSUPP  if the SEV does not support SEV
>   * -%ETIMEDOUT if the SEV command timed out
>   * -%EIO       if the SEV returned a non-zero return code
>   */
> -int sev_do_cmd(int cmd, void *data, int *psp_ret);
> +int snp_guest_dbg_decrypt_page(u64 gctx_pfn, u64 src_pfn, u64 dst_pfn, int *error);
>  
>  void *psp_copy_user_blob(u64 uaddr, u32 len);
>  void *snp_alloc_firmware_page(gfp_t mask);
> @@ -987,6 +1000,11 @@ static inline void *psp_copy_user_blob(u64 __user uaddr, u32 len) { return ERR_P
>  
>  void snp_mark_pages_offline(unsigned long pfn, unsigned int npages) {}
>  
> +static inline int snp_guest_dbg_decrypt_page(u64 gctx_pfn, u64 src_pfn, u64 dst_pfn, int *error)
> +{
> +	return -ENODEV;
> +}
> +
>  static inline void *snp_alloc_firmware_page(gfp_t mask)
>  {
>  	return NULL;


^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH RFC v8 53/56] KVM: SVM: Make VMSAVE target area memory allocation SNP safe
  2023-02-20 18:38 ` [PATCH RFC v8 53/56] KVM: SVM: Make VMSAVE target area memory allocation SNP safe Michael Roth
@ 2023-03-01 21:23   ` Zhi Wang
  0 siblings, 0 replies; 147+ messages in thread
From: Zhi Wang @ 2023-03-01 21:23 UTC (permalink / raw)
  To: Michael Roth
  Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania

On Mon, 20 Feb 2023 12:38:44 -0600
Michael Roth <michael.roth@amd.com> wrote:

> From: Ashish Kalra <ashish.kalra@amd.com>
> 
> Implement a workaround for an SNP erratum where the CPU will incorrectly
> signal an RMP violation #PF if a hugepage (2mb or 1gb) collides with the
> RMP entry of the VMSAVE target page.
> 
> When SEV-SNP is globally enabled, the CPU marks the VMSAVE target page
> as "InUse" while the VMSAVE instruction is executing. If another
> CPU writes to a different page in the same 2MB region while the VMSAVE
> is executing, the CPU will throw an RMP violation #PF.
> 
> Use the snp safe generic allocator for allocating the VMSA target
> page which will ensure that the page returned is not a hugepage, as it
> is already being used for the allocating the VMCB, VMSA and AVIC backing
> page.
> 

This should be merged with the patch that implements snp_safe_alloc_page().

> Co-developed-by: Marc Orr <marcorr@google.com>
> Signed-off-by: Marc Orr <marcorr@google.com>
> Reported-by: Alper Gun <alpergun@google.com>
> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> Signed-off-by: Michael Roth <michael.roth@amd.com>
> ---
>  arch/x86/kvm/svm/svm.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 3fe5f13b5f3a..8bda31a61757 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -665,7 +665,7 @@ static int svm_cpu_init(int cpu)
>  	int ret = -ENOMEM;
>  
>  	memset(sd, 0, sizeof(struct svm_cpu_data));
> -	sd->save_area = alloc_page(GFP_KERNEL | __GFP_ZERO);
> +	sd->save_area = snp_safe_alloc_page(NULL);
>  	if (!sd->save_area)
>  		return ret;
>  


^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH RFC v8 00/56] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support
  2023-03-01 16:56 ` [PATCH RFC v8 00/56] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Dave Hansen
@ 2023-03-01 22:59   ` Zhi Wang
  2023-03-01 23:39     ` Dave Hansen
  0 siblings, 1 reply; 147+ messages in thread
From: Zhi Wang @ 2023-03-01 22:59 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Michael Roth, kvm, linux-coco, linux-mm, linux-crypto, x86,
	linux-kernel, tglx, mingo, jroedel, thomas.lendacky, hpa, ardb,
	pbonzini, seanjc, vkuznets, jmattson, luto, dave.hansen, slp,
	pgonda, peterz, srinivas.pandruvada, rientjes, dovmurik, tobin,
	bp, vbabka, kirill, ak, tony.luck, marcorr,
	sathyanarayanan.kuppuswamy, alpergun, dgilbert, jarkko,
	ashish.kalra, nikunj.dadhania

On Wed, 1 Mar 2023 08:56:05 -0800
Dave Hansen <dave.hansen@intel.com> wrote:

> On 2/20/23 10:37, Michael Roth wrote:
> > The RMP check is enforced as soon as SEV-SNP is enabled. Not every memory
> > access requires an RMP check. In particular, the read accesses from the
> > hypervisor do not require RMP checks because the data confidentiality is
> > already protected via memory encryption. When hardware encounters an RMP
> > checks failure, it raises a page-fault exception. If RMP check failure
> > is due to the page-size mismatch, then split the large page to resolve
> > the fault.
> 
> What does this all _mean_?
>

Unlike TDX, which implements a secure EPT to hold the restricted memory mapping,
SEV-SNP still uses a single NPT (similar to Intel EPT) while adding another
level of HW-enforced checks controlled by the "RMP" table. Similar to TDX,
it has firmware calls to modify the attributes in the RMP table to achieve
isolation and shared-private memory conversion.

The purpose and structure of the RMP are quite similar to the PAMT table in TDX
from the perspective of managing per-page attributes. Each system page has a
collection of attribute bits. But TDX only uses the PAMT as metadata, since it
has a separate secure EPT to do the HW-enforced checks.

The RMP memory access checks have their own semantics. E.g. data writes and
page table accesses from the VMM are checked, but data reads are not, mostly,
I guess, due to performance considerations. More details can be found in
Table 15-39 "RMP Memory Access Checks" in [1].

> When does the kernel need to care about a "page-size mismatch"?

The RMP table has the ability to describe a large page (similar to a
large-page PAMT entry in TDX, which can be demoted via TDX SEAMCALLs),
e.g. a 2MB page.

When userspace sets the memory attributes of a GFN range through the
restricted memory ioctl, the SEV logic (sev_update_mem_attr() in PATCH 48, to
be precise) will try to build a large-page description in the RMP table if the
PFNs are contiguous. When the kernel mm later breaks the large page (e.g. a
THP split), KVM updates the NPT accordingly.

Then there will be a page-size mismatch between the NPT and the RMP. It will
be resolved by an RMP fault later. Kind of a lazy sync.

[1] https://www.amd.com/system/files/TechDocs/24593.pdf
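
As a concrete illustration of that "lazy sync", a minimal sketch of the
fault-side reconciliation (snp_lookup_rmpentry() is from this series;
psmash() and split_host_mapping() are assumed names here, and the overall
structure is illustrative rather than the actual patch code):

static int reconcile_rmp_size_mismatch(u64 pfn, int mapping_level)
{
	int rmp_level;
	int assigned = snp_lookup_rmpentry(pfn, &rmp_level);

	if (assigned < 0)
		return assigned;		/* no RMP entry for this pfn */

	if (mapping_level > rmp_level)
		/* NPT/host maps 2M but the RMP tracks 4K: split the mapping */
		return split_host_mapping(pfn);

	if (rmp_level > mapping_level)
		/* RMP tracks 2M but the mapping is 4K: smash the RMP entry */
		return psmash(pfn);

	return 0;				/* sizes agree, just retry */
}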

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH RFC v8 48/56] KVM: SVM: Add SNP-specific handling for memory attribute updates
  2023-02-20 18:38 ` [PATCH RFC v8 48/56] KVM: SVM: Add SNP-specific handling for memory attribute updates Michael Roth
@ 2023-03-01 23:37   ` Dave Hansen
  2023-04-05 23:48     ` Michael Roth
  0 siblings, 1 reply; 147+ messages in thread
From: Dave Hansen @ 2023-03-01 23:37 UTC (permalink / raw)
  To: Michael Roth, kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania

On 2/20/23 10:38, Michael Roth wrote:
> +		/*
> +		 * TODO: The RMP entry's hugepage bit is ignored for
> +		 * shared/unassigned pages. Either handle looping through each
> +		 * sub-page as part of snp_make_page_shared(), or remove the
> +		 * level argument.
> +		 */
> +		if (op == SNP_PAGE_STATE_PRIVATE && order &&
> +		    IS_ALIGNED(gfn, 1 << order) && (gfn + (1 << order)) <= end) {
> +			level = order_to_level(order);
> +			npages = 1 << order;
> +		}

That's a wee bit obtuse.

First of all, I assume that the 'RFC' is because of these TODOs and they
won't survive to the point when you ask for this to be merged.

BTW, what keeps the restrictedmem_get_page() offset and the gfn aligned?

Let's start with this:

> +static inline u8 order_to_level(int order)
> +{
> +	BUILD_BUG_ON(KVM_MAX_HUGEPAGE_LEVEL > PG_LEVEL_1G);
> +
> +	if (order >= KVM_HPAGE_GFN_SHIFT(PG_LEVEL_1G))
> +		return PG_LEVEL_1G;
> +
> +	if (order >= KVM_HPAGE_GFN_SHIFT(PG_LEVEL_2M))
> +		return PG_LEVEL_2M;
> +
> +	return PG_LEVEL_4K;
> +}

Right now, 'order' comes only from restrictedmem_get_page(), which I dug
out of:

> https://github.com/mdroth/linux/commit/c6792672cd11737fd255dff10b2d5b6bccc626a8

That order is *only* filled in by THPs.  That makes the PG_LEVEL_1G
stuff here kinda silly.  I guess it might be seen as thorough, but it's
dead code.  I'd probably just make this work on order==9 || order==0 and
warn on anything else.
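
Something along these lines, as a sketch of that simplification (illustrative
only, not the patch's code):

static u8 order_to_level(int order)
{
	/* restrictedmem currently hands out 4K (order 0) or THP (order 9) pages */
	if (order >= KVM_HPAGE_GFN_SHIFT(PG_LEVEL_2M)) {
		WARN_ON_ONCE(order != KVM_HPAGE_GFN_SHIFT(PG_LEVEL_2M));
		return PG_LEVEL_2M;
	}

	WARN_ON_ONCE(order != 0);
	return PG_LEVEL_4K;
}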

I'd also highly recommend some comments about how racy this all is.  I
guess it probably works, but it would be good to add some comments about
page splits and collapsing.

It's also not obvious why this only cares about private pages.

Anyway, this is the exact kind of thing where I really like a
well-commented helper:

bool can_install_large_rmp_entry(gfn, order)
{
	// small pages, blah blah
	if (!order)
		return false;

	// The region being updated must be aligned
	if (!IS_ALIGNED(gfn, 1 << order))
		return false;
	// ... and fit
	if (gfn + (1 << order) > end)
		return false;

	return true;	
}

Which gets used like this:

	if (op == SNP_PAGE_STATE_PRIVATE &&
	    can_install_large_rmp_entry(gfn, order)) {
		level = ...
	}

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH RFC v8 00/56] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support
  2023-03-01 22:59   ` Zhi Wang
@ 2023-03-01 23:39     ` Dave Hansen
  0 siblings, 0 replies; 147+ messages in thread
From: Dave Hansen @ 2023-03-01 23:39 UTC (permalink / raw)
  To: Zhi Wang
  Cc: Michael Roth, kvm, linux-coco, linux-mm, linux-crypto, x86,
	linux-kernel, tglx, mingo, jroedel, thomas.lendacky, hpa, ardb,
	pbonzini, seanjc, vkuznets, jmattson, luto, dave.hansen, slp,
	pgonda, peterz, srinivas.pandruvada, rientjes, dovmurik, tobin,
	bp, vbabka, kirill, ak, tony.luck, marcorr,
	sathyanarayanan.kuppuswamy, alpergun, dgilbert, jarkko,
	ashish.kalra, nikunj.dadhania

On 3/1/23 14:59, Zhi Wang wrote:
> When the userspace sets the memory attribute of a GFN range through the
> restricted memory ioctl, the sev logic (sev_update_mem_attr() in PATCH 48, to
> be precise) will try to build a large page description in the RMP table if the
> PFNs are continuous. When kernel mm breaks the the large page due to THP, KVM
> updates the NPT accordingly.

Gah, this really confused me.

It's *NOT* looking for contiguous PFNs.  It's looking for a
restrictedmem THP, which really is something different.  Restrictedmem
THPs have contiguous PFNs, but not all contiguous PFNs will result in
trying to build a large page.

Anyway, I'll reply over to the other patch.

But, either way, I'd appreciate this kind of summary in the changelogs
and probably a comment or two:

	The RMP needs to be consistent with the contents of the NPT.
	KVM updates the NPT but will neglect to update the RMP.  It is
	updated in response to faults when RMP and NPT get out of sync.

Right?

BTW, why doesn't KVM just update the RMP?  Why bother taking the fault?

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH RFC v8 54/56] x86/sev: Add KVM commands for instance certs
  2023-02-20 18:38 ` [PATCH RFC v8 54/56] x86/sev: Add KVM commands for instance certs Michael Roth
  2023-02-21 12:40   ` Dov Murik
@ 2023-03-02  0:02   ` Zhi Wang
  2023-03-02  1:41     ` Dionna Amalie Glaze
  2023-03-02 11:34   ` Dov Murik
  2 siblings, 1 reply; 147+ messages in thread
From: Zhi Wang @ 2023-03-02  0:02 UTC (permalink / raw)
  To: Michael Roth
  Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	Dionna Glaze

On Mon, 20 Feb 2023 12:38:45 -0600
Michael Roth <michael.roth@amd.com> wrote:

For the host-wide certificates, installing/uninstalling takes a lock to
prevent races between the cert upload/download path and the handling of
the guest request path.

I guess we need a similar lock to prevent races between
snp_{set,get}_instance_certs() and snp_handle_ext_guest_request().

This patch and PATCH 27 handle certs from different sources. It would be
better to abstract this: with common functions and a data structure for the
certs, the installing/uninstalling, the checks, and the expectations on the
output buffer can be unified. Then we don't need mostly-duplicated routines
in two different paths.
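
A minimal sketch of the locking part of that suggestion (guest_req_lock is
the per-VM mutex added earlier in this series; the wrapping below is only an
illustration, with error paths omitted):

	mutex_lock(&sev->guest_req_lock);
	r = snp_set_instance_certs(kvm, &sev_cmd);	/* likewise for _get_ */
	mutex_unlock(&sev->guest_req_lock);

That way an in-flight snp_handle_ext_guest_request(), assuming it serializes
on the same mutex, can never observe a half-updated
snp_certs_data/snp_certs_len pair.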

> From: Dionna Glaze <dionnaglaze@google.com>
> 
> The /dev/sev device has the ability to store host-wide certificates for
> the key used by the AMD-SP for SEV-SNP attestation report signing,
> but for hosts that want to specify additional certificates that are
> specific to the image launched in a VM, a different way is needed to
> communicate those certificates.
> 
> Add two new KVM ioctl to handle this: KVM_SEV_SNP_{GET,SET}_CERTS
> 
> The certificates that are set with this command are expected to follow
> the same format as the host certificates, but that format is opaque
> to the kernel.
> 
> The new behavior for custom certificates is that the extended guest
> request command will now return the overridden certificates if they
> were installed for the instance. The error condition for a too small
> data buffer is changed to return the overridden certificate data size
> if there is an overridden certificate set installed.
> 
> Setting a 0 length certificate returns the system state to only return
> the host certificates on an extended guest request.
> 
> Also increase the SEV_FW_BLOB_MAX_SIZE another 4K page to allow space
> for an extra certificate.
> 
> Cc: Tom Lendacky <Thomas.Lendacky@amd.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> 
> Signed-off-by: Dionna Glaze <dionnaglaze@google.com>
> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> [mdr: remove used of "we" and "this patch" in commit log]
> Signed-off-by: Michael Roth <michael.roth@amd.com>
> ---
>  arch/x86/kvm/svm/sev.c   | 111 ++++++++++++++++++++++++++++++++++++++-
>  arch/x86/kvm/svm/svm.h   |   1 +
>  include/linux/psp-sev.h  |   2 +-
>  include/uapi/linux/kvm.h |  12 +++++
>  4 files changed, 123 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 70d5650d8d95..18b64b7005e7 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -2089,6 +2089,7 @@ static void *snp_context_create(struct kvm *kvm, struct kvm_sev_cmd *argp)
>  		goto e_free;
>  
>  	sev->snp_certs_data = certs_data;
> +	sev->snp_certs_len = 0;
>  
>  	return context;
>  

Better to move the fix to PATCH 45.

> @@ -2404,6 +2405,86 @@ static int snp_launch_finish(struct kvm *kvm, struct kvm_sev_cmd *argp)
>  	return ret;
>  }
>  
> +static int snp_get_instance_certs(struct kvm *kvm, struct kvm_sev_cmd *argp)
> +{
> +	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> +	struct kvm_sev_snp_get_certs params;
> +
> +	if (!sev_snp_guest(kvm))
> +		return -ENOTTY;
> +
> +	if (!sev->snp_context)
> +		return -EINVAL;
> +
> +	if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data,
> +			   sizeof(params)))
> +		return -EFAULT;
> +
> +	/* No instance certs set. */
> +	if (!sev->snp_certs_len)
> +		return -ENOENT;
> +
> +	if (params.certs_len < sev->snp_certs_len) {
> +		/* Output buffer too small. Return the required size. */
> +		params.certs_len = sev->snp_certs_len;
> +
> +		if (copy_to_user((void __user *)(uintptr_t)argp->data, &params,
> +				 sizeof(params)))
> +			return -EFAULT;
> +
> +		return -EINVAL;
> +	}
> +
> +	if (copy_to_user((void __user *)(uintptr_t)params.certs_uaddr,
> +			 sev->snp_certs_data, sev->snp_certs_len))
> +		return -EFAULT;
> +
> +	return 0;
> +}
> +
> +static int snp_set_instance_certs(struct kvm *kvm, struct kvm_sev_cmd *argp)
> +{
> +	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> +	unsigned long length = SEV_FW_BLOB_MAX_SIZE;
> +	void *to_certs = sev->snp_certs_data;
> +	struct kvm_sev_snp_set_certs params;
> +
> +	if (!sev_snp_guest(kvm))
> +		return -ENOTTY;
> +
> +	if (!sev->snp_context)
> +		return -EINVAL;
> +
> +	if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data,
> +			   sizeof(params)))
> +		return -EFAULT;
> +
> +	if (params.certs_len > SEV_FW_BLOB_MAX_SIZE)
> +		return -EINVAL;
> +
> +	/*
> +	 * Setting a length of 0 is the same as "uninstalling" instance-
> +	 * specific certificates.
> +	 */
> +	if (params.certs_len == 0) {
> +		sev->snp_certs_len = 0;
> +		return 0;
> +	}
> +
> +	/* Page-align the length */
> +	length = (params.certs_len + PAGE_SIZE - 1) & PAGE_MASK;
> +
> +	if (copy_from_user(to_certs,
> +			   (void __user *)(uintptr_t)params.certs_uaddr,
> +			   params.certs_len)) {
> +		return -EFAULT;
> +	}
> +
> +	sev->snp_certs_len = length;
> +
> +	return 0;
> +}
> +
>  int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
>  {
>  	struct kvm_sev_cmd sev_cmd;
> @@ -2503,6 +2584,12 @@ int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
>  	case KVM_SEV_SNP_LAUNCH_FINISH:
>  		r = snp_launch_finish(kvm, &sev_cmd);
>  		break;
> +	case KVM_SEV_SNP_GET_CERTS:
> +		r = snp_get_instance_certs(kvm, &sev_cmd);
> +		break;
> +	case KVM_SEV_SNP_SET_CERTS:
> +		r = snp_set_instance_certs(kvm, &sev_cmd);
> +		break;
>  	default:
>  		r = -EINVAL;
>  		goto out;
> @@ -3550,8 +3637,28 @@ static void snp_handle_ext_guest_request(struct vcpu_svm *svm, gpa_t req_gpa, gp
>  	if (rc)
>  		goto unlock;
>  
> -	rc = snp_guest_ext_guest_request(&req, (unsigned long)sev->snp_certs_data,
> -					 &data_npages, &err);
> +	/*
> +	 * If the VMM has overridden the certs, then change the error message
> +	 * if the size is inappropriate for the override. Otherwise, use a
> +	 * regular guest request and copy back the instance certs.
> +	 */
> +	if (sev->snp_certs_len) {
> +		if ((data_npages << PAGE_SHIFT) < sev->snp_certs_len) {
> +			rc = -EINVAL;
> +			err = SNP_GUEST_REQ_INVALID_LEN;
> +			goto datalen;
> +		}
> +		rc = sev_issue_cmd(kvm, SEV_CMD_SNP_GUEST_REQUEST, &req,
> +				   (int *)&err);
> +	} else {
> +		rc = snp_guest_ext_guest_request(&req,
> +						 (unsigned long)sev->snp_certs_data,
> +						 &data_npages, &err);
> +	}
> +datalen:
> +	if (sev->snp_certs_len)
> +		data_npages = sev->snp_certs_len >> PAGE_SHIFT;
> +
>  	if (rc) {
>  		/*
>  		 * If buffer length is small then return the expected
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index 221b38d3c845..dced46559508 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -94,6 +94,7 @@ struct kvm_sev_info {
>  	u64 snp_init_flags;
>  	void *snp_context;      /* SNP guest context page */
>  	void *snp_certs_data;
> +	unsigned int snp_certs_len; /* Size of instance override for certs */
>  	struct mutex guest_req_lock; /* Lock for guest request handling */
>  
>  	u64 sev_features;	/* Features set at VMSA creation */
> diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h
> index 92116e2b74fd..3b28b78938f6 100644
> --- a/include/linux/psp-sev.h
> +++ b/include/linux/psp-sev.h
> @@ -22,7 +22,7 @@
>  #define __psp_pa(x)	__pa(x)
>  #endif
>  
> -#define SEV_FW_BLOB_MAX_SIZE	0x4000	/* 16KB */
> +#define SEV_FW_BLOB_MAX_SIZE	0x5000	/* 20KB */
>  
>  /**
>   * SEV platform state
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 6e684bf5f723..ad7e24e43547 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1928,6 +1928,8 @@ enum sev_cmd_id {
>  	KVM_SEV_SNP_LAUNCH_START,
>  	KVM_SEV_SNP_LAUNCH_UPDATE,
>  	KVM_SEV_SNP_LAUNCH_FINISH,
> +	KVM_SEV_SNP_GET_CERTS,
> +	KVM_SEV_SNP_SET_CERTS,
>  
>  	KVM_SEV_NR_MAX,
>  };
> @@ -2075,6 +2077,16 @@ struct kvm_sev_snp_launch_finish {
>  	__u8 pad[6];
>  };
>  
> +struct kvm_sev_snp_get_certs {
> +	__u64 certs_uaddr;
> +	__u64 certs_len;
> +};
> +
> +struct kvm_sev_snp_set_certs {
> +	__u64 certs_uaddr;
> +	__u64 certs_len;
> +};
> +
>  #define KVM_DEV_ASSIGN_ENABLE_IOMMU	(1 << 0)
>  #define KVM_DEV_ASSIGN_PCI_2_3		(1 << 1)
>  #define KVM_DEV_ASSIGN_MASK_INTX	(1 << 2)


^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH RFC v8 54/56] x86/sev: Add KVM commands for instance certs
  2023-03-02  0:02   ` Zhi Wang
@ 2023-03-02  1:41     ` Dionna Amalie Glaze
  2023-03-02 11:27       ` Zhi Wang
  0 siblings, 1 reply; 147+ messages in thread
From: Dionna Amalie Glaze @ 2023-03-02  1:41 UTC (permalink / raw)
  To: Zhi Wang
  Cc: Michael Roth, kvm, linux-coco, linux-mm, linux-crypto, x86,
	linux-kernel, tglx, mingo, jroedel, thomas.lendacky, hpa, ardb,
	pbonzini, seanjc, vkuznets, jmattson, luto, dave.hansen, slp,
	pgonda, peterz, srinivas.pandruvada, rientjes, dovmurik, tobin,
	bp, vbabka, kirill, ak, tony.luck, marcorr,
	sathyanarayanan.kuppuswamy, alpergun, dgilbert, jarkko,
	ashish.kalra, nikunj.dadhania

> > @@ -2089,6 +2089,7 @@ static void *snp_context_create(struct kvm *kvm, struct kvm_sev_cmd *argp)
> >               goto e_free;
> >
> >       sev->snp_certs_data = certs_data;
> > +     sev->snp_certs_len = 0;
> >
> >       return context;
> >
>
> Better to move the fix to PATCH 45.
>

This part isn't a fix, but part of the implementation, since
snp_certs_len is added in this patch here:

> > diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> > index 221b38d3c845..dced46559508 100644
> > --- a/arch/x86/kvm/svm/svm.h
> > +++ b/arch/x86/kvm/svm/svm.h
> > @@ -94,6 +94,7 @@ struct kvm_sev_info {
> >       u64 snp_init_flags;
> >       void *snp_context;      /* SNP guest context page */
> >       void *snp_certs_data;
> > +     unsigned int snp_certs_len; /* Size of instance override for certs */
> >       struct mutex guest_req_lock; /* Lock for guest request handling */
> >
> >       u64 sev_features;       /* Features set at VMSA creation */


-- 
-Dionna Glaze, PhD (she/her)

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH RFC v8 52/56] ccp: Add support to decrypt the page
  2023-03-01 21:20   ` Zhi Wang
@ 2023-03-02  5:59     ` Dov Murik
  2023-03-02 14:33       ` Tom Lendacky
  0 siblings, 1 reply; 147+ messages in thread
From: Dov Murik @ 2023-03-02  5:59 UTC (permalink / raw)
  To: Zhi Wang, Michael Roth
  Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, tobin, bp, vbabka, kirill, ak,
	tony.luck, marcorr, sathyanarayanan.kuppuswamy, alpergun,
	dgilbert, jarkko, ashish.kalra, nikunj.dadhania, Brijesh Singh,
	Dov Murik

Hi Mike, Zhi,

On 01/03/2023 23:20, Zhi Wang wrote:
> On Mon, 20 Feb 2023 12:38:43 -0600
> Michael Roth <michael.roth@amd.com> wrote:
> 
>> From: Brijesh Singh <brijesh.singh@amd.com>
>>
>> Add support to decrypt guest encrypted memory. These API interfaces can
>> be used for example to dump VMCBs on SNP guest exit.
>>
> 
> What kinds of check will be applied from firmware when VMM decrypts this
> page? I suppose there has to be kinda mechanism to prevent VMM to decrypt
> any page in the guest. It would be nice to have some introduction about
> it in the comments.
> 

The SNP ABI spec says (section 8.27.2 SNP_DBG_DECRYPT):

  The firmware checks that the guest's policy allows debugging. If not,
  the firmware returns POLICY_FAILURE.

and in the Guest Policy (section 4.3):

  Bit 19 - DEBUG
  0: Debugging is disallowed.
  1: Debugging is allowed.

In the kernel, that firmware error code is defined as
SEV_RET_POLICY_FAILURE.
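
So on the host side, a caller of this API could recognize the policy-denied
case roughly like this (SEV_RET_POLICY_FAILURE already exists in
include/uapi/linux/psp-sev.h; the snippet itself is just an illustration):

	int fw_err;

	if (snp_guest_dbg_decrypt_page(gctx_pfn, src_pfn, dst_pfn, &fw_err) &&
	    fw_err == SEV_RET_POLICY_FAILURE)
		pr_debug("guest policy disallows debugging (DEBUG bit clear)\n");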


>> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
>> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
>> [mdr: minor commit fixups]
>> Signed-off-by: Michael Roth <michael.roth@amd.com>
>> ---
>>  drivers/crypto/ccp/sev-dev.c | 32 ++++++++++++++++++++++++++++++++
>>  include/linux/psp-sev.h      | 22 ++++++++++++++++++++--
>>  2 files changed, 52 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
>> index e65563bc8298..bf5167b2acfc 100644
>> --- a/drivers/crypto/ccp/sev-dev.c
>> +++ b/drivers/crypto/ccp/sev-dev.c
>> @@ -2017,6 +2017,38 @@ int sev_guest_df_flush(int *error)
>>  }
>>  EXPORT_SYMBOL_GPL(sev_guest_df_flush);
>>  
>> +int snp_guest_dbg_decrypt_page(u64 gctx_pfn, u64 src_pfn, u64 dst_pfn, int *error)
>> +{
>> +	struct sev_data_snp_dbg data = {0};
>> +	struct sev_device *sev;
>> +	int ret;
>> +
>> +	if (!psp_master || !psp_master->sev_data)
>> +		return -ENODEV;
>> +
>> +	sev = psp_master->sev_data;
>> +
>> +	if (!sev->snp_initialized)
>> +		return -EINVAL;
>> +
>> +	data.gctx_paddr = sme_me_mask | (gctx_pfn << PAGE_SHIFT);
>> +	data.src_addr = sme_me_mask | (src_pfn << PAGE_SHIFT);
>> +	data.dst_addr = sme_me_mask | (dst_pfn << PAGE_SHIFT);

I guess this works, but I wonder why we need to turn on sme_me_mask on
the dst_addr.  I thought that the firmware decrypts the guest page
(src_addr) to a plaintext page.  Couldn't find this requirement in the
SNP spec.


>> +
>> +	/* The destination page must be in the firmware state. */
>> +	if (rmp_mark_pages_firmware(data.dst_addr, 1, false))
>> +		return -EIO;
>> +
>> +	ret = sev_do_cmd(SEV_CMD_SNP_DBG_DECRYPT, &data, error);
>> +
>> +	/* Restore the page state */
>> +	if (snp_reclaim_pages(data.dst_addr, 1, false))
>> +		ret = -EIO;
>> +
>> +	return ret;
>> +}
>> +EXPORT_SYMBOL_GPL(snp_guest_dbg_decrypt_page);
>> +
>>  int snp_guest_ext_guest_request(struct sev_data_snp_guest_request *data,
>>  				unsigned long vaddr, unsigned long *npages, unsigned long *fw_err)
>>  {
>> diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h
>> index 81bafc049eca..92116e2b74fd 100644
>> --- a/include/linux/psp-sev.h
>> +++ b/include/linux/psp-sev.h
>> @@ -710,7 +710,6 @@ struct sev_data_snp_dbg {
>>  	u64 gctx_paddr;				/* In */
>>  	u64 src_addr;				/* In */
>>  	u64 dst_addr;				/* In */
>> -	u32 len;				/* In */
>>  } __packed;

The comment above this ^^^ struct still lists the 'len' field, and also
calls the first field 'handle' instead of 'gctx_paddr'.

Also - why is this change happening in this patch? Why was the incorrect
'len' field added in the first place in "[PATCH RFC v8 20/56]
crypto:ccp: Define the SEV-SNP commands" ? (the comment fixes should
probably go there too).



>>  
>>  /**
>> @@ -913,13 +912,27 @@ int sev_guest_decommission(struct sev_data_decommission *data, int *error);
>>   * @error: SEV command return code
>>   *
>>   * Returns:
>> + * 0 if the sev successfully processed the command
>> + * -%ENODEV    if the sev device is not available
>> + * -%ENOTSUPP  if the sev does not support SEV
>> + * -%ETIMEDOUT if the sev command timed out
>> + * -%EIO       if the sev returned a non-zero return code
>> + */

I think that if the word 'sev' were 'SEV' in this comment, the diff
would be a bit less misleading (basically this patch should not introduce
changes to sev_do_cmd).

-Dov

>> +int sev_do_cmd(int cmd, void *data, int *psp_ret);
>> +
>> +/**
>> + * snp_guest_dbg_decrypt_page - perform SEV SNP_DBG_DECRYPT command
>> + *
>> + * @sev_ret: sev command return code
>> + *
>> + * Returns:
>>   * 0 if the SEV successfully processed the command
>>   * -%ENODEV    if the SEV device is not available
>>   * -%ENOTSUPP  if the SEV does not support SEV
>>   * -%ETIMEDOUT if the SEV command timed out
>>   * -%EIO       if the SEV returned a non-zero return code
>>   */
>> -int sev_do_cmd(int cmd, void *data, int *psp_ret);
>> +int snp_guest_dbg_decrypt_page(u64 gctx_pfn, u64 src_pfn, u64 dst_pfn, int *error);
>>  
>>  void *psp_copy_user_blob(u64 uaddr, u32 len);
>>  void *snp_alloc_firmware_page(gfp_t mask);
>> @@ -987,6 +1000,11 @@ static inline void *psp_copy_user_blob(u64 __user uaddr, u32 len) { return ERR_P
>>  
>>  void snp_mark_pages_offline(unsigned long pfn, unsigned int npages) {}
>>  
>> +static inline int snp_guest_dbg_decrypt_page(u64 gctx_pfn, u64 src_pfn, u64 dst_pfn, int *error)
>> +{
>> +	return -ENODEV;
>> +}
>> +
>>  static inline void *snp_alloc_firmware_page(gfp_t mask)
>>  {
>>  	return NULL;
> 

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH RFC v8 54/56] x86/sev: Add KVM commands for instance certs
  2023-03-02  1:41     ` Dionna Amalie Glaze
@ 2023-03-02 11:27       ` Zhi Wang
  0 siblings, 0 replies; 147+ messages in thread
From: Zhi Wang @ 2023-03-02 11:27 UTC (permalink / raw)
  To: Dionna Amalie Glaze
  Cc: Michael Roth, kvm, linux-coco, linux-mm, linux-crypto, x86,
	linux-kernel, tglx, mingo, jroedel, thomas.lendacky, hpa, ardb,
	pbonzini, seanjc, vkuznets, jmattson, luto, dave.hansen, slp,
	pgonda, peterz, srinivas.pandruvada, rientjes, dovmurik, tobin,
	bp, vbabka, kirill, ak, tony.luck, marcorr,
	sathyanarayanan.kuppuswamy, alpergun, dgilbert, jarkko,
	ashish.kalra, nikunj.dadhania

On Wed, 1 Mar 2023 17:41:11 -0800
Dionna Amalie Glaze <dionnaglaze@google.com> wrote:

> > > @@ -2089,6 +2089,7 @@ static void *snp_context_create(struct kvm *kvm, struct kvm_sev_cmd *argp)
> > >               goto e_free;
> > >
> > >       sev->snp_certs_data = certs_data;
> > > +     sev->snp_certs_len = 0;
> > >
> > >       return context;
> > >
> >
> > Better to move the fix to PATCH 45.
> >
> 
> This part isn't a fix, but part of the implementation since
> snp_certs_len is added in this patch here
>

I see. My bad. I was thinking it was the snp_certs_len in the global sev, as
they have the same name.
 
> > > diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> > > index 221b38d3c845..dced46559508 100644
> > > --- a/arch/x86/kvm/svm/svm.h
> > > +++ b/arch/x86/kvm/svm/svm.h
> > > @@ -94,6 +94,7 @@ struct kvm_sev_info {
> > >       u64 snp_init_flags;
> > >       void *snp_context;      /* SNP guest context page */
> > >       void *snp_certs_data;
> > > +     unsigned int snp_certs_len; /* Size of instance override for certs */
> > >       struct mutex guest_req_lock; /* Lock for guest request handling */
> > >
> > >       u64 sev_features;       /* Features set at VMSA creation */
> 
> 


^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH RFC v8 54/56] x86/sev: Add KVM commands for instance certs
  2023-02-20 18:38 ` [PATCH RFC v8 54/56] x86/sev: Add KVM commands for instance certs Michael Roth
  2023-02-21 12:40   ` Dov Murik
  2023-03-02  0:02   ` Zhi Wang
@ 2023-03-02 11:34   ` Dov Murik
  2 siblings, 0 replies; 147+ messages in thread
From: Dov Murik @ 2023-03-02 11:34 UTC (permalink / raw)
  To: Michael Roth, kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, tobin, bp, vbabka, kirill, ak,
	tony.luck, marcorr, sathyanarayanan.kuppuswamy, alpergun,
	dgilbert, jarkko, ashish.kalra, nikunj.dadhania, Dionna Glaze,
	Dov Murik



On 20/02/2023 20:38, Michael Roth wrote:
> From: Dionna Glaze <dionnaglaze@google.com>
> 
> The /dev/sev device has the ability to store host-wide certificates for
> the key used by the AMD-SP for SEV-SNP attestation report signing,
> but for hosts that want to specify additional certificates that are
> specific to the image launched in a VM, a different way is needed to
> communicate those certificates.
> 
> Add two new KVM ioctl to handle this: KVM_SEV_SNP_{GET,SET}_CERTS
> 
> The certificates that are set with this command are expected to follow
> the same format as the host certificates, but that format is opaque
> to the kernel.
> 
> The new behavior for custom certificates is that the extended guest
> request command will now return the overridden certificates if they
> were installed for the instance. The error condition for a too small
> data buffer is changed to return the overridden certificate data size
> if there is an overridden certificate set installed.
> 
> Setting a 0 length certificate returns the system state to only return
> the host certificates on an extended guest request.
> 
> Also increase the SEV_FW_BLOB_MAX_SIZE another 4K page to allow space
> for an extra certificate.
> 
> Cc: Tom Lendacky <Thomas.Lendacky@amd.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> 
> Signed-off-by: Dionna Glaze <dionnaglaze@google.com>
> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> [mdr: remove used of "we" and "this patch" in commit log]
> Signed-off-by: Michael Roth <michael.roth@amd.com>
> ---
>  arch/x86/kvm/svm/sev.c   | 111 ++++++++++++++++++++++++++++++++++++++-
>  arch/x86/kvm/svm/svm.h   |   1 +
>  include/linux/psp-sev.h  |   2 +-
>  include/uapi/linux/kvm.h |  12 +++++
>  4 files changed, 123 insertions(+), 3 deletions(-)
> 

[...]

> diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h
> index 92116e2b74fd..3b28b78938f6 100644
> --- a/include/linux/psp-sev.h
> +++ b/include/linux/psp-sev.h
> @@ -22,7 +22,7 @@
>  #define __psp_pa(x)	__pa(x)
>  #endif
>  
> -#define SEV_FW_BLOB_MAX_SIZE	0x4000	/* 16KB */
> +#define SEV_FW_BLOB_MAX_SIZE	0x5000	/* 20KB */

This change should be removed (it was also discussed in v7).  If I
understand correctly, 16KB is a limit of the PSP.

-Dov


>  
>  /**
>   * SEV platform state

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH RFC v8 52/56] ccp: Add support to decrypt the page
  2023-03-02  5:59     ` Dov Murik
@ 2023-03-02 14:33       ` Tom Lendacky
  2023-03-02 21:11         ` Dov Murik
  0 siblings, 1 reply; 147+ messages in thread
From: Tom Lendacky @ 2023-03-02 14:33 UTC (permalink / raw)
  To: Dov Murik, Zhi Wang, Michael Roth
  Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, hpa, ardb, pbonzini, seanjc, vkuznets, jmattson,
	luto, dave.hansen, slp, pgonda, peterz, srinivas.pandruvada,
	rientjes, tobin, bp, vbabka, kirill, ak, tony.luck, marcorr,
	sathyanarayanan.kuppuswamy, alpergun, dgilbert, jarkko,
	ashish.kalra, nikunj.dadhania, Brijesh Singh

On 3/1/23 23:59, Dov Murik wrote:
> Hi Mike, Zhi,
> 
> On 01/03/2023 23:20, Zhi Wang wrote:
>> On Mon, 20 Feb 2023 12:38:43 -0600
>> Michael Roth <michael.roth@amd.com> wrote:
>>
>>> From: Brijesh Singh <brijesh.singh@amd.com>
>>>
>>> Add support to decrypt guest encrypted memory. These API interfaces can
>>> be used for example to dump VMCBs on SNP guest exit.
>>>
>>
>> What kinds of check will be applied from firmware when VMM decrypts this
>> page? I suppose there has to be kinda mechanism to prevent VMM to decrypt
>> any page in the guest. It would be nice to have some introduction about
>> it in the comments.
>>
> 
> The SNP ABI spec says (section 8.27.2 SNP_DBG_DECRYPT):
> 
>    The firmware checks that the guest's policy allows debugging. If not,
>    the firmware returns POLICY_FAILURE.
> 
> and in the Guest Policy (section 4.3):
> 
>    Bit 19 - DEBUG
>    0: Debugging is disallowed.
>    1: Debugging is allowed.
> 
> In the kernel, that firmware error code is defined as
> SEV_RET_POLICY_FAILURE.
> 
> 
>>> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
>>> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
>>> [mdr: minor commit fixups]
>>> Signed-off-by: Michael Roth <michael.roth@amd.com>
>>> ---
>>>   drivers/crypto/ccp/sev-dev.c | 32 ++++++++++++++++++++++++++++++++
>>>   include/linux/psp-sev.h      | 22 ++++++++++++++++++++--
>>>   2 files changed, 52 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
>>> index e65563bc8298..bf5167b2acfc 100644
>>> --- a/drivers/crypto/ccp/sev-dev.c
>>> +++ b/drivers/crypto/ccp/sev-dev.c
>>> @@ -2017,6 +2017,38 @@ int sev_guest_df_flush(int *error)
>>>   }
>>>   EXPORT_SYMBOL_GPL(sev_guest_df_flush);
>>>   
>>> +int snp_guest_dbg_decrypt_page(u64 gctx_pfn, u64 src_pfn, u64 dst_pfn, int *error)
>>> +{
>>> +	struct sev_data_snp_dbg data = {0};
>>> +	struct sev_device *sev;
>>> +	int ret;
>>> +
>>> +	if (!psp_master || !psp_master->sev_data)
>>> +		return -ENODEV;
>>> +
>>> +	sev = psp_master->sev_data;
>>> +
>>> +	if (!sev->snp_initialized)
>>> +		return -EINVAL;
>>> +
>>> +	data.gctx_paddr = sme_me_mask | (gctx_pfn << PAGE_SHIFT);
>>> +	data.src_addr = sme_me_mask | (src_pfn << PAGE_SHIFT);
>>> +	data.dst_addr = sme_me_mask | (dst_pfn << PAGE_SHIFT);
> 
> I guess this works, but I wonder why we need to turn on sme_me_mask on
> teh dst_addr.  I thought that the firmware decrypts the guest page
> (src_addr) to a plaintext page.  Couldn't find this requirement in the
> SNP spec.

This sme_me_mask tells the firmware how to access the host memory (similar 
to how DMA uses sme_me_mask when supplying addresses to devices under 
SME). This needs to match the pagetable mapping being used by the host, 
otherwise the contents will appear as ciphertext to the host if they are 
not in sync. Since the default pagetable mapping is encrypted, the 
sme_me_mask bit must be provided on the destination address. So it is not 
a spec requirement, but an SME implementation requirement.

Thanks,
Tom

> 
> 
>>> +
>>> +	/* The destination page must be in the firmware state. */
>>> +	if (rmp_mark_pages_firmware(data.dst_addr, 1, false))
>>> +		return -EIO;
>>> +
>>> +	ret = sev_do_cmd(SEV_CMD_SNP_DBG_DECRYPT, &data, error);
>>> +
>>> +	/* Restore the page state */
>>> +	if (snp_reclaim_pages(data.dst_addr, 1, false))
>>> +		ret = -EIO;
>>> +
>>> +	return ret;
>>> +}
>>> +EXPORT_SYMBOL_GPL(snp_guest_dbg_decrypt_page);
>>> +
>>>   int snp_guest_ext_guest_request(struct sev_data_snp_guest_request *data,
>>>   				unsigned long vaddr, unsigned long *npages, unsigned long *fw_err)
>>>   {
>>> diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h
>>> index 81bafc049eca..92116e2b74fd 100644
>>> --- a/include/linux/psp-sev.h
>>> +++ b/include/linux/psp-sev.h
>>> @@ -710,7 +710,6 @@ struct sev_data_snp_dbg {
>>>   	u64 gctx_paddr;				/* In */
>>>   	u64 src_addr;				/* In */
>>>   	u64 dst_addr;				/* In */
>>> -	u32 len;				/* In */
>>>   } __packed;
> 
> The comment above this ^^^ struct still lists the 'len' field, and also
> calls the first field 'handle' instead of 'gctx_paddr'.
> 
> Also - why is this change happening in this patch? Why was the incorrect
> 'len' field added in the first place in "[PATCH RFC v8 20/56]
> crypto:ccp: Define the SEV-SNP commands" ? (the comment fixes should
> probably go there too).
> 
> 
> 
>>>   
>>>   /**
>>> @@ -913,13 +912,27 @@ int sev_guest_decommission(struct sev_data_decommission *data, int *error);
>>>    * @error: SEV command return code
>>>    *
>>>    * Returns:
>>> + * 0 if the sev successfully processed the command
>>> + * -%ENODEV    if the sev device is not available
>>> + * -%ENOTSUPP  if the sev does not support SEV
>>> + * -%ETIMEDOUT if the sev command timed out
>>> + * -%EIO       if the sev returned a non-zero return code
>>> + */
> 
> I think that if the word 'sev' would be 'SEV' in this comment, the diff
> will be a bit less misleading (basically this patch should not introduce
> changes to sev_do_cmd).
> 
> -Dov
> 
>>> +int sev_do_cmd(int cmd, void *data, int *psp_ret);
>>> +
>>> +/**
>>> + * snp_guest_dbg_decrypt_page - perform SEV SNP_DBG_DECRYPT command
>>> + *
>>> + * @sev_ret: sev command return code
>>> + *
>>> + * Returns:
>>>    * 0 if the SEV successfully processed the command
>>>    * -%ENODEV    if the SEV device is not available
>>>    * -%ENOTSUPP  if the SEV does not support SEV
>>>    * -%ETIMEDOUT if the SEV command timed out
>>>    * -%EIO       if the SEV returned a non-zero return code
>>>    */
>>> -int sev_do_cmd(int cmd, void *data, int *psp_ret);
>>> +int snp_guest_dbg_decrypt_page(u64 gctx_pfn, u64 src_pfn, u64 dst_pfn, int *error);
>>>   
>>>   void *psp_copy_user_blob(u64 uaddr, u32 len);
>>>   void *snp_alloc_firmware_page(gfp_t mask);
>>> @@ -987,6 +1000,11 @@ static inline void *psp_copy_user_blob(u64 __user uaddr, u32 len) { return ERR_P
>>>   
>>>   void snp_mark_pages_offline(unsigned long pfn, unsigned int npages) {}
>>>   
>>> +static inline int snp_guest_dbg_decrypt_page(u64 gctx_pfn, u64 src_pfn, u64 dst_pfn, int *error)
>>> +{
>>> +	return -ENODEV;
>>> +}
>>> +
>>>   static inline void *snp_alloc_firmware_page(gfp_t mask)
>>>   {
>>>   	return NULL;
>>

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH RFC v8 52/56] ccp: Add support to decrypt the page
  2023-03-02 14:33       ` Tom Lendacky
@ 2023-03-02 21:11         ` Dov Murik
  0 siblings, 0 replies; 147+ messages in thread
From: Dov Murik @ 2023-03-02 21:11 UTC (permalink / raw)
  To: Tom Lendacky, Zhi Wang, Michael Roth
  Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, hpa, ardb, pbonzini, seanjc, vkuznets, jmattson,
	luto, dave.hansen, slp, pgonda, peterz, srinivas.pandruvada,
	rientjes, tobin, bp, vbabka, kirill, ak, tony.luck, marcorr,
	sathyanarayanan.kuppuswamy, alpergun, dgilbert, jarkko,
	ashish.kalra, nikunj.dadhania, Brijesh Singh, Dov Murik



On 02/03/2023 16:33, Tom Lendacky wrote:
> On 3/1/23 23:59, Dov Murik wrote:
>> Hi Mike, Zhi,
>>
>> On 01/03/2023 23:20, Zhi Wang wrote:
>>> On Mon, 20 Feb 2023 12:38:43 -0600
>>> Michael Roth <michael.roth@amd.com> wrote:
>>>
>>>> From: Brijesh Singh <brijesh.singh@amd.com>
>>>>
>>>> Add support to decrypt guest encrypted memory. These API interfaces can
>>>> be used for example to dump VMCBs on SNP guest exit.
>>>>
>>>
>>> What kinds of check will be applied from firmware when VMM decrypts this
>>> page? I suppose there has to be kinda mechanism to prevent VMM to
>>> decrypt
>>> any page in the guest. It would be nice to have some introduction about
>>> it in the comments.
>>>
>>
>> The SNP ABI spec says (section 8.27.2 SNP_DBG_DECRYPT):
>>
>>    The firmware checks that the guest's policy allows debugging. If not,
>>    the firmware returns POLICY_FAILURE.
>>
>> and in the Guest Policy (section 4.3):
>>
>>    Bit 19 - DEBUG
>>    0: Debugging is disallowed.
>>    1: Debugging is allowed.
>>
>> In the kernel, that firmware error code is defined as
>> SEV_RET_POLICY_FAILURE.
>>
>>
>>>> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
>>>> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
>>>> [mdr: minor commit fixups]
>>>> Signed-off-by: Michael Roth <michael.roth@amd.com>
>>>> ---
>>>>   drivers/crypto/ccp/sev-dev.c | 32 ++++++++++++++++++++++++++++++++
>>>>   include/linux/psp-sev.h      | 22 ++++++++++++++++++++--
>>>>   2 files changed, 52 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/crypto/ccp/sev-dev.c
>>>> b/drivers/crypto/ccp/sev-dev.c
>>>> index e65563bc8298..bf5167b2acfc 100644
>>>> --- a/drivers/crypto/ccp/sev-dev.c
>>>> +++ b/drivers/crypto/ccp/sev-dev.c
>>>> @@ -2017,6 +2017,38 @@ int sev_guest_df_flush(int *error)
>>>>   }
>>>>   EXPORT_SYMBOL_GPL(sev_guest_df_flush);
>>>>   +int snp_guest_dbg_decrypt_page(u64 gctx_pfn, u64 src_pfn, u64
>>>> dst_pfn, int *error)
>>>> +{
>>>> +    struct sev_data_snp_dbg data = {0};
>>>> +    struct sev_device *sev;
>>>> +    int ret;
>>>> +
>>>> +    if (!psp_master || !psp_master->sev_data)
>>>> +        return -ENODEV;
>>>> +
>>>> +    sev = psp_master->sev_data;
>>>> +
>>>> +    if (!sev->snp_initialized)
>>>> +        return -EINVAL;
>>>> +
>>>> +    data.gctx_paddr = sme_me_mask | (gctx_pfn << PAGE_SHIFT);
>>>> +    data.src_addr = sme_me_mask | (src_pfn << PAGE_SHIFT);
>>>> +    data.dst_addr = sme_me_mask | (dst_pfn << PAGE_SHIFT);
>>
>> I guess this works, but I wonder why we need to turn on sme_me_mask on
>> teh dst_addr.  I thought that the firmware decrypts the guest page
>> (src_addr) to a plaintext page.  Couldn't find this requirement in the
>> SNP spec.
> 
> This sme_me_mask tells the firmware how to access the host memory
> (similar to how DMA uses sme_me_mask when supplying addresses to devices
> under SME). This needs to match the pagetable mapping being used by the
> host, otherwise the contents will appears as ciphertext to the host if
> they are not in sync. Since the default pagetable mapping is encrypted,
> the sme_me_mask bit must be provided on the destination address. So it
> is not a spec requirement, but an SME implementation requirement.
> 

Ah, OK, that's clear now. Thanks Tom.

-Dov

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH RFC v8 08/56] KVM: SEV: Rename sev_{pin,unpin}_memory
  2023-02-20 18:37 ` [PATCH RFC v8 08/56] KVM: SEV: Rename sev_{pin,unpin}_memory Michael Roth
@ 2023-03-03 14:00   ` Vlastimil Babka
  2023-03-06 11:01     ` Nikunj A. Dadhania
  0 siblings, 1 reply; 147+ messages in thread
From: Vlastimil Babka @ 2023-03-03 14:00 UTC (permalink / raw)
  To: Michael Roth, kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, kirill, ak,
	tony.luck, marcorr, sathyanarayanan.kuppuswamy, alpergun,
	dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	Nikunj A Dadhania

On 2/20/23 19:37, Michael Roth wrote:
> From: Nikunj A Dadhania <nikunj@amd.com>
> 
> Rename sev_{pin|unpin}_memory to sev_memory_{get|put}_pages. Apart
> from pinning the pages, sev_pin_memory also populates the pages array
> which is used by its callers. SEV guest using restricted memfd do not
> to pin the memory but will require the pages array to be populated.

  ^need to?

> Rename the function appropriately.
> 
> No functional change intended.
> 
> Signed-off-by: Nikunj A Dadhania <nikunj@amd.com>
> Signed-off-by: Michael Roth <michael.roth@amd.com>


^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH RFC v8 09/56] KVM: SEV: Handle memory backed by restricted memfd
  2023-02-20 18:38 ` [PATCH RFC v8 09/56] KVM: SEV: Handle memory backed by restricted memfd Michael Roth
@ 2023-03-03 14:05   ` Vlastimil Babka
  2023-03-06 11:03     ` Nikunj A. Dadhania
  0 siblings, 1 reply; 147+ messages in thread
From: Vlastimil Babka @ 2023-03-03 14:05 UTC (permalink / raw)
  To: Michael Roth, kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, kirill, ak,
	tony.luck, marcorr, sathyanarayanan.kuppuswamy, alpergun,
	dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	Nikunj A Dadhania

On 2/20/23 19:38, Michael Roth wrote:
> From: Nikunj A Dadhania <nikunj@amd.com>
> 
> Do not pin the guest memory backed by a restrictedmem backend, as
> pages in the restrictedmem are already pinned. Instead, populate the
> pages array for these guests using the already-pinned pages provided by

IIUC the "already pinned" became "effectively unmovable and unevictable"
since the earlier versions, so that would be more accurate now?

> +static int sev_private_mem_get_pages(struct kvm *kvm, unsigned long addr,
> +				     unsigned long size, unsigned long npages,
> +				     struct page **pages)
> +{
> +	return kvm_vm_do_hva_range_op(kvm, addr, addr + size,
> +				      sev_private_mem_get_pages_handler, pages);
> +}
> +
>  /*
>   * Legacy SEV guest pin the pages and return the array populated with pinned
>   * pages.
> + *
> + * SEV guests using restricted memfd backend, pages are already marked as
> + * unmovable and unevictable. Populate the pages array for these guests using
> + * restrictedmem get_pfn.

Right.



^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH RFC v8 12/56] x86/sev: Add RMP entry lookup helpers
  2023-02-20 18:38 ` [PATCH RFC v8 12/56] x86/sev: Add RMP entry lookup helpers Michael Roth
@ 2023-03-03 15:28   ` Vlastimil Babka
  2023-03-29 22:59     ` Michael Roth
  0 siblings, 1 reply; 147+ messages in thread
From: Vlastimil Babka @ 2023-03-03 15:28 UTC (permalink / raw)
  To: Michael Roth, kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, kirill, ak,
	tony.luck, marcorr, sathyanarayanan.kuppuswamy, alpergun,
	dgilbert, jarkko, ashish.kalra, nikunj.dadhania, Brijesh Singh

On 2/20/23 19:38, Michael Roth wrote:
> From: Brijesh Singh <brijesh.singh@amd.com>
> 
> The snp_lookup_page_in_rmptable() can be used by the host to read the RMP
> entry for a given page. The RMP entry format is documented in AMD PPR, see
> https://bugzilla.kernel.org/attachment.cgi?id=296015.
> 
> Co-developed-by: Ashish Kalra <ashish.kalra@amd.com>
> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> Signed-off-by: Michael Roth <michael.roth@amd.com>
> ---

> +/*
> + * Return 1 if the RMP entry is assigned, 0 if it exists but is not assigned,
> + * and -errno if there is no corresponding RMP entry.
> + */

Hmm, IMHO the kernel's idiomatic way is to return 0 on "success", and I'd
assume the more intuitive notion of success here is that the entry is
assigned? The various callers seem to differ though, so I guess it depends on
context. Some, however, don't distinguish their "failure" from an ERR, and
maybe they should, at least for the purposes of the various printks?
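
E.g. callers could make the distinction explicit, something like this
(sketch only, the messages are made up):

	ret = snp_lookup_rmpentry(pfn, &rmp_level);
	if (ret < 0)
		pr_err("failed to read RMP entry for pfn 0x%llx: %d\n", pfn, ret);
	else if (!ret)
		pr_warn("pfn 0x%llx unexpectedly not assigned in the RMP\n", pfn);

instead of folding both cases into a single failure path.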

> +int snp_lookup_rmpentry(u64 pfn, int *level)
> +{
> +	struct rmpentry *e;
> +
> +	e = __snp_lookup_rmpentry(pfn, level);
> +	if (IS_ERR(e))
> +		return PTR_ERR(e);
> +
> +	return !!rmpentry_assigned(e);
> +}
> +EXPORT_SYMBOL_GPL(snp_lookup_rmpentry);


^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH RFC v8 17/56] x86/fault: Add support to handle the RMP fault for user address
  2023-02-20 18:38 ` [PATCH RFC v8 17/56] x86/fault: Add support to handle the RMP fault for user address Michael Roth
  2023-03-01 16:21   ` Dave Hansen
@ 2023-03-03 15:31   ` Vlastimil Babka
  1 sibling, 0 replies; 147+ messages in thread
From: Vlastimil Babka @ 2023-03-03 15:31 UTC (permalink / raw)
  To: Michael Roth, kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, kirill, ak,
	tony.luck, marcorr, sathyanarayanan.kuppuswamy, alpergun,
	dgilbert, jarkko, ashish.kalra, nikunj.dadhania, Brijesh Singh,
	Jarkko Sakkinen

On 2/20/23 19:38, Michael Roth wrote:
> +static int handle_user_rmp_page_fault(struct pt_regs *regs, unsigned long error_code,
> +				      unsigned long address)
> +{
> +	int rmp_level, level;
> +	pgd_t *pgd;
> +	pte_t *pte;
> +	u64 pfn;
> +
> +	pgd = __va(read_cr3_pa());
> +	pgd += pgd_index(address);
> +
> +	pte = lookup_address_in_pgd(pgd, address, &level);
> +
> +	/*
> +	 * It can happen if there was a race between an unmap event and
> +	 * the RMP fault delivery.
> +	 */
> +	if (!pte || !pte_present(*pte))
> +		return RMP_PF_UNMAP;
> +
> +	/*
> +	 * RMP page fault handler follows this algorithm:
> +	 * 1. Compute the pfn for the 4kb page being accessed
> +	 * 2. Read that RMP entry -- If it is assigned then kill the process
> +	 * 3. Otherwise, check the level from the host page table
> +	 *    If level=PG_LEVEL_4K then the page is already smashed
> +	 *    so just retry the instruction
> +	 * 4. If level=PG_LEVEL_2M/1G, then the host page needs to be split
> +	 */
> +
> +	pfn = pte_pfn(*pte);
> +
> +	/* If its large page then calculte the fault pfn */
> +	if (level > PG_LEVEL_4K)
> +		pfn = pfn | PFN_DOWN(address & (page_level_size(level) - 1));
> +
> +	/*
> +	 * If its a guest private page, then the fault cannot be resolved.
> +	 * Send a SIGBUS to terminate the process.
> +	 *
> +	 * As documented in APM vol3 pseudo-code for RMPUPDATE, when the 2M range
> +	 * is covered by a valid (Assigned=1) 2M entry, the middle 511 4k entries
> +	 * also have Assigned=1. This means that if there is an access to a page
> +	 * which happens to lie within an Assigned 2M entry, the 4k RMP entry
> +	 * will also have Assigned=1. Therefore, the kernel should see that
> +	 * the page is not a valid page and the fault cannot be resolved.
> +	 */
> +	if (snp_lookup_rmpentry(pfn, &rmp_level)) {
> +		pr_info("Fatal RMP page fault, terminating process, entry assigned for pfn 0x%llx\n",
> +			pfn);
> +		do_sigbus(regs, error_code, address, VM_FAULT_SIGBUS);
> +		return RMP_PF_RETRY;
> +	}

WRT my reply to 12/56: for example, here it might be useful to distinguish
the RMP entry being assigned from an error returned by snp_lookup_rmpentry()?

> +
> +	/*
> +	 * The backing page level is higher than the RMP page level, request
> +	 * to split the page.
> +	 */
> +	if (level > rmp_level)
> +		return RMP_PF_SPLIT;
> +
> +	return RMP_PF_RETRY;
> +}
> +
>  /*


^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH RFC v8 23/56] crypto: ccp: Introduce snp leaked pages list
  2023-02-20 18:38 ` [PATCH RFC v8 23/56] crypto: ccp: Introduce snp leaked pages list Michael Roth
@ 2023-03-03 15:54   ` Vlastimil Babka
  0 siblings, 0 replies; 147+ messages in thread
From: Vlastimil Babka @ 2023-03-03 15:54 UTC (permalink / raw)
  To: Michael Roth, kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, kirill, ak,
	tony.luck, marcorr, sathyanarayanan.kuppuswamy, alpergun,
	dgilbert, jarkko, ashish.kalra, nikunj.dadhania

On 2/20/23 19:38, Michael Roth wrote:
> From: Ashish Kalra <ashish.kalra@amd.com>
> 
> Pages are unsafe to release back to the page allocator if they have
> been transitioned to firmware/guest state and can't be reclaimed or
> transitioned back to hypervisor/shared state. In this case, add them to
> an internal leaked pages list to ensure that they are not freed or
> touched/accessed, which would cause fatal page faults.
> 
> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> Signed-off-by: Michael Roth <michael.roth@amd.com>
> ---
>  drivers/crypto/ccp/sev-dev.c | 28 ++++++++++++++++++++++++++++
>  include/linux/psp-sev.h      |  8 ++++++++
>  2 files changed, 36 insertions(+)
> 
> diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
> index 35f605936f1b..eca4e59b0f44 100644
> --- a/drivers/crypto/ccp/sev-dev.c
> +++ b/drivers/crypto/ccp/sev-dev.c
> @@ -42,6 +42,12 @@
>  static DEFINE_MUTEX(sev_cmd_mutex);
>  static struct sev_misc_dev *misc_dev;
>  
> +/* list of pages which are leaked and cannot be reclaimed */
> +static LIST_HEAD(snp_leaked_pages_list);
> +static DEFINE_SPINLOCK(snp_leaked_pages_list_lock);
> +
> +static atomic_long_t snp_nr_leaked_pages = ATOMIC_LONG_INIT(0);
> +
>  static int psp_cmd_timeout = 100;
>  module_param(psp_cmd_timeout, int, 0644);
>  MODULE_PARM_DESC(psp_cmd_timeout, " default timeout value, in seconds, for PSP commands");
> @@ -188,6 +194,28 @@ static int sev_cmd_buffer_len(int cmd)
>  	return 0;
>  }
>  
> +void snp_mark_pages_offline(unsigned long pfn, unsigned int npages)

Why call it "offline", which usually has a memory hotplug-related meaning?
What about e.g. snp_leak_bad_pages()?

> +{
> +	struct page *page = pfn_to_page(pfn);
> +
> +	WARN(1, "psc failed, pfn 0x%lx pages %d (marked offline)\n", pfn, npages);
> +
> +	spin_lock(&snp_leaked_pages_list_lock);
> +	while (npages--) {
> +		/*
> +		 * Reuse the page's buddy list for chaining into the leaked
> +		 * pages list. This page should not be on a free list currently
> +		 * and is also unsafe to be added to a free list.
> +		 */
> +		list_add_tail(&page->buddy_list, &snp_leaked_pages_list);
> +		sev_dump_rmpentry(pfn);
> +		pfn++;
> +	}
> +	spin_unlock(&snp_leaked_pages_list_lock);
> +	atomic_long_inc(&snp_nr_leaked_pages);
> +}
> +EXPORT_SYMBOL_GPL(snp_mark_pages_offline);
> +
>  static void *sev_fw_alloc(unsigned long len)
>  {
>  	struct page *page;
> diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h
> index 46f61e3ae33b..8edf5c548fbf 100644
> --- a/include/linux/psp-sev.h
> +++ b/include/linux/psp-sev.h
> @@ -923,6 +923,12 @@ int sev_do_cmd(int cmd, void *data, int *psp_ret);
>  
>  void *psp_copy_user_blob(u64 uaddr, u32 len);
>  
> +/**
> + * snp_mark_pages_offline - insert non-reclaimed firmware/guest pages
> + * into a leaked pages list.
> + */
> +void snp_mark_pages_offline(unsigned long pfn, unsigned int npages);
> +
>  #else	/* !CONFIG_CRYPTO_DEV_SP_PSP */
>  
>  static inline int
> @@ -951,6 +957,8 @@ sev_issue_cmd_external_user(struct file *filep, unsigned int id, void *data, int
>  
>  static inline void *psp_copy_user_blob(u64 __user uaddr, u32 len) { return ERR_PTR(-EINVAL); }
>  
> +static inline void snp_mark_pages_offline(unsigned long pfn, unsigned int npages) {}
> +
>  #endif	/* CONFIG_CRYPTO_DEV_SP_PSP */
>  
>  #endif	/* __PSP_SEV_H__ */


^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH RFC v8 08/56] KVM: SEV: Rename sev_{pin,unpin}_memory
  2023-03-03 14:00   ` Vlastimil Babka
@ 2023-03-06 11:01     ` Nikunj A. Dadhania
  0 siblings, 0 replies; 147+ messages in thread
From: Nikunj A. Dadhania @ 2023-03-06 11:01 UTC (permalink / raw)
  To: Vlastimil Babka, Michael Roth, kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, kirill, ak,
	tony.luck, marcorr, sathyanarayanan.kuppuswamy, alpergun,
	dgilbert, jarkko, ashish.kalra, nikunj.dadhania



On 03/03/23 19:30, Vlastimil Babka wrote:
> On 2/20/23 19:37, Michael Roth wrote:
>> From: Nikunj A Dadhania <nikunj@amd.com>
>>
>> Rename sev_{pin|unpin}_memory to sev_memory_{get|put}_pages. Apart
>> from pinning the pages, sev_pin_memory also populates the pages array
>> which is used by its callers. SEV guest using restricted memfd do not
>> to pin the memory but will require the pages array to be populated.
> 
>   ^need to?
> 

Sure

Regards,
Nikunj

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH RFC v8 09/56] KVM: SEV: Handle memory backed by restricted memfd
  2023-03-03 14:05   ` Vlastimil Babka
@ 2023-03-06 11:03     ` Nikunj A. Dadhania
  0 siblings, 0 replies; 147+ messages in thread
From: Nikunj A. Dadhania @ 2023-03-06 11:03 UTC (permalink / raw)
  To: Vlastimil Babka, Michael Roth, kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, kirill, ak,
	tony.luck, marcorr, sathyanarayanan.kuppuswamy, alpergun,
	dgilbert, jarkko, ashish.kalra, nikunj.dadhania



On 03/03/23 19:35, Vlastimil Babka wrote:
> On 2/20/23 19:38, Michael Roth wrote:
>> From: Nikunj A Dadhania <nikunj@amd.com>
>>
>> Do not pin the guest memory backed by a restrictedmem backend, as
>> pages in the restrictedmem are already pinned. Instead, populate the
>> pages array for these guests using the already-pinned pages provided by
> 
> IIUC the "already pinned" became "effectively unmovable and unevictable"
> since the earlier versions, so that would be more accurate now?

Yes, that makes sense.

Regards
Nikunj

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH RFC v8 01/56] KVM: x86: Add 'fault_is_private' x86 op
  2023-02-20 18:37 ` [PATCH RFC v8 01/56] KVM: x86: Add 'fault_is_private' x86 op Michael Roth
  2023-03-01 10:25   ` Zhi Wang
@ 2023-03-18  4:51   ` Isaku Yamahata
  2023-03-20 17:46     ` Michael Roth
  2023-03-18  4:53   ` Isaku Yamahata
  2 siblings, 1 reply; 147+ messages in thread
From: Isaku Yamahata @ 2023-03-18  4:51 UTC (permalink / raw)
  To: Michael Roth
  Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	isaku.yamahata

On Mon, Feb 20, 2023 at 12:37:52PM -0600,
Michael Roth <michael.roth@amd.com> wrote:

> This callback is used by the KVM MMU to check whether a #NPF was for a
> private GPA or not.
> 
> In some cases the full 64-bit error code for the #NPF will be needed to
> make this determination, so also update kvm_mmu_do_page_fault() to
> accept the full 64-bit value so it can be plumbed through to the
> callback.

We can split the 64-bit part out into an independent patch.

> Signed-off-by: Michael Roth <michael.roth@amd.com>
> ---
>  arch/x86/include/asm/kvm-x86-ops.h |  1 +
>  arch/x86/include/asm/kvm_host.h    |  1 +
>  arch/x86/kvm/mmu/mmu.c             |  3 +--
>  arch/x86/kvm/mmu/mmu_internal.h    | 37 +++++++++++++++++++++++++++---
>  4 files changed, 37 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
> index 8dc345cc6318..72183da010b8 100644
> --- a/arch/x86/include/asm/kvm-x86-ops.h
> +++ b/arch/x86/include/asm/kvm-x86-ops.h
> @@ -131,6 +131,7 @@ KVM_X86_OP(msr_filter_changed)
>  KVM_X86_OP(complete_emulated_msr)
>  KVM_X86_OP(vcpu_deliver_sipi_vector)
>  KVM_X86_OP_OPTIONAL_RET0(vcpu_get_apicv_inhibit_reasons);
> +KVM_X86_OP_OPTIONAL_RET0(fault_is_private);
>  
>  #undef KVM_X86_OP
>  #undef KVM_X86_OP_OPTIONAL
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index e552374f2357..f856d689dda0 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1643,6 +1643,7 @@ struct kvm_x86_ops {
>  
>  	void (*load_mmu_pgd)(struct kvm_vcpu *vcpu, hpa_t root_hpa,
>  			     int root_level);
> +	bool (*fault_is_private)(struct kvm *kvm, gpa_t gpa, u64 error_code, bool *private_fault);
>  
>  	bool (*has_wbinvd_exit)(void);
>  
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index eda615f3951c..fb3f34b7391c 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -5724,8 +5724,7 @@ int noinline kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 err
>  	}
>  
>  	if (r == RET_PF_INVALID) {
> -		r = kvm_mmu_do_page_fault(vcpu, cr2_or_gpa,
> -					  lower_32_bits(error_code), false);
> +		r = kvm_mmu_do_page_fault(vcpu, cr2_or_gpa, error_code, false);
>  		if (KVM_BUG_ON(r == RET_PF_INVALID, vcpu->kvm))
>  			return -EIO;
>  	}
> diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
> index e642d431df4b..557a001210df 100644
> --- a/arch/x86/kvm/mmu/mmu_internal.h
> +++ b/arch/x86/kvm/mmu/mmu_internal.h
> @@ -231,6 +231,37 @@ struct kvm_page_fault {
>  
>  int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault);
>  
> +static bool kvm_mmu_fault_is_private(struct kvm *kvm, gpa_t gpa, u64 err)
> +{
> +	struct kvm_memory_slot *slot;
> +	bool private_fault = false;
> +	gfn_t gfn = gpa_to_gfn(gpa);
> +
> +	slot = gfn_to_memslot(kvm, gfn);
> +	if (!slot) {
> +		pr_debug("%s: no slot, GFN: 0x%llx\n", __func__, gfn);
> +		goto out;
> +	}
> +
> +	if (!kvm_slot_can_be_private(slot)) {
> +		pr_debug("%s: slot is not private, GFN: 0x%llx\n", __func__, gfn);
> +		goto out;
> +	}
> +
> +	if (static_call(kvm_x86_fault_is_private)(kvm, gpa, err, &private_fault))
> +		goto out;
> +
> +	/*
> +	 * Handling below is for UPM self-tests and guests that treat userspace
> +	 * as the authority on whether a fault should be private or not.
> +	 */
> +	private_fault = kvm_mem_is_private(kvm, gpa >> PAGE_SHIFT);
> +
> +out:
> +	pr_debug("%s: GFN: 0x%llx, private: %d\n", __func__, gfn, private_fault);
> +	return private_fault;
> +}
> +
>  /*
>   * Return values of handle_mmio_page_fault(), mmu.page_fault(), fast_page_fault(),
>   * and of course kvm_mmu_do_page_fault().
> @@ -262,11 +293,11 @@ enum {
>  };
>  
>  static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
> -					u32 err, bool prefetch)
> +					u64 err, bool prefetch)
>  {
>  	struct kvm_page_fault fault = {
>  		.addr = cr2_or_gpa,
> -		.error_code = err,
> +		.error_code = lower_32_bits(err),
>  		.exec = err & PFERR_FETCH_MASK,
>  		.write = err & PFERR_WRITE_MASK,
>  		.present = err & PFERR_PRESENT_MASK,
> @@ -280,7 +311,7 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
>  		.max_level = KVM_MAX_HUGEPAGE_LEVEL,
>  		.req_level = PG_LEVEL_4K,
>  		.goal_level = PG_LEVEL_4K,
> -		.is_private = kvm_mem_is_private(vcpu->kvm, cr2_or_gpa >> PAGE_SHIFT),
> +		.is_private = kvm_mmu_fault_is_private(vcpu->kvm, cr2_or_gpa, err),

I don't think we need kvm_mmu_fault_is_private() here; it's too heavy. We can
make it its own callback, i.e. the following.

From b0f914a1a4d154f076c0294831ce9ef0df7eb3d3 Mon Sep 17 00:00:00 2001
Message-Id: <b0f914a1a4d154f076c0294831ce9ef0df7eb3d3.1679114841.git.isaku.yamahata@intel.com>
In-Reply-To: <428a676face7a06a90e59dca1c32941c9b6ee001.1679114841.git.isaku.yamahata@intel.com>
References: <428a676face7a06a90e59dca1c32941c9b6ee001.1679114841.git.isaku.yamahata@intel.com>
From: Isaku Yamahata <isaku.yamahata@intel.com>
Date: Fri, 17 Mar 2023 11:18:13 -0700
Subject: [PATCH 2/4] KVM: x86: Add 'fault_is_private' x86 op

This callback is used by the KVM MMU to check whether a KVM page fault was
for a private GPA or not.

Originally-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
---
 arch/x86/include/asm/kvm-x86-ops.h |  1 +
 arch/x86/include/asm/kvm_host.h    |  1 +
 arch/x86/kvm/mmu.h                 | 19 +++++++++++++++++++
 arch/x86/kvm/mmu/mmu_internal.h    |  2 +-
 arch/x86/kvm/x86.c                 |  8 ++++++++
 5 files changed, 30 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index e1f57905c8fe..dc5f18ac0bd5 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -99,6 +99,7 @@ KVM_X86_OP_OPTIONAL_RET0(set_tss_addr)
 KVM_X86_OP_OPTIONAL_RET0(set_identity_map_addr)
 KVM_X86_OP_OPTIONAL_RET0(get_mt_mask)
 KVM_X86_OP(load_mmu_pgd)
+KVM_X86_OP(fault_is_private)
 KVM_X86_OP_OPTIONAL(link_private_spt)
 KVM_X86_OP_OPTIONAL(free_private_spt)
 KVM_X86_OP_OPTIONAL(split_private_spt)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 59196a80c3c8..0382d236fbf4 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1730,6 +1730,7 @@ struct kvm_x86_ops {
 
 	void (*load_mmu_pgd)(struct kvm_vcpu *vcpu, hpa_t root_hpa,
 			     int root_level);
+	bool (*fault_is_private)(struct kvm *kvm, gpa_t gpa, u64 error_code);
 
 	int (*link_private_spt)(struct kvm *kvm, gfn_t gfn, enum pg_level level,
 				void *private_spt);
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 4aaef2132b97..1f21680b9b97 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -289,6 +289,25 @@ static inline gpa_t kvm_translate_gpa(struct kvm_vcpu *vcpu,
 	return translate_nested_gpa(vcpu, gpa, access, exception);
 }
 
+static inline bool kvm_mmu_fault_is_private_default(struct kvm *kvm, gpa_t gpa, u64 err)
+{
+	struct kvm_memory_slot *slot;
+	gfn_t gfn = gpa_to_gfn(gpa);
+
+	slot = gfn_to_memslot(kvm, gfn);
+	if (!slot)
+		return false;
+
+	if (!kvm_slot_can_be_private(slot))
+		return false;
+
+	/*
+	 * Handling below is for UPM self-tests and guests that treat userspace
+	 * as the authority on whether a fault should be private or not.
+	 */
+	return kvm_mem_is_private(kvm, gfn);
+}
+
 static inline gfn_t kvm_gfn_shared_mask(const struct kvm *kvm)
 {
 #ifdef CONFIG_KVM_MMU_PRIVATE
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index bb5709f1cb57..6b54b069d1ed 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -445,7 +445,7 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 		.max_level = vcpu->kvm->arch.tdp_max_page_level,
 		.req_level = PG_LEVEL_4K,
 		.goal_level = PG_LEVEL_4K,
-		.is_private = kvm_is_private_gpa(vcpu->kvm, cr2_or_gpa),
+		.is_private = static_call(kvm_x86_fault_is_private)(vcpu->kvm, cr2_or_gpa, err),
 	};
 	int r;
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index fd14368c6bc8..0311ab450330 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9419,6 +9419,14 @@ static inline void kvm_ops_update(struct kvm_x86_init_ops *ops)
 #undef __KVM_X86_OP
 
 	kvm_pmu_ops_update(ops->pmu_ops);
+
+	/*
+	 * TODO: Once all backends fill in this callback, remove this and the default
+	 * function.
+	 */
+	if (!ops->runtime_ops->fault_is_private)
+		static_call_update(kvm_x86_fault_is_private,
+				   kvm_mmu_fault_is_private_default);
 }
 
 static int kvm_x86_check_processor_compatibility(void)
-- 
2.25.1




-- 
Isaku Yamahata <isaku.yamahata@gmail.com>

^ permalink raw reply related	[flat|nested] 147+ messages in thread

* Re: [PATCH RFC v8 01/56] KVM: x86: Add 'fault_is_private' x86 op
  2023-02-20 18:37 ` [PATCH RFC v8 01/56] KVM: x86: Add 'fault_is_private' x86 op Michael Roth
  2023-03-01 10:25   ` Zhi Wang
  2023-03-18  4:51   ` Isaku Yamahata
@ 2023-03-18  4:53   ` Isaku Yamahata
  2 siblings, 0 replies; 147+ messages in thread
From: Isaku Yamahata @ 2023-03-18  4:53 UTC (permalink / raw)
  To: Michael Roth
  Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	isaku.yamahata

On Mon, Feb 20, 2023 at 12:37:52PM -0600,
Michael Roth <michael.roth@amd.com> wrote:

> This callback is used by the KVM MMU to check whether a #NPF was for a
> private GPA or not.
> 
> In some cases the full 64-bit error code for the #NPF will be needed to
> make this determination, so also update kvm_mmu_do_page_fault() to
> accept the full 64-bit value so it can be plumbed through to the
> callback.

Here is a patch to change error code 64-bit.

From 428a676face7a06a90e59dca1c32941c9b6ee001 Mon Sep 17 00:00:00 2001
Message-Id: <428a676face7a06a90e59dca1c32941c9b6ee001.1679114841.git.isaku.yamahata@intel.com>
From: Isaku Yamahata <isaku.yamahata@intel.com>
Date: Fri, 17 Mar 2023 12:58:42 -0700
Subject: [PATCH 1/4] KVM: x86/mmu: Pass round full 64-bit error code for the
 KVM page fault

In some cases the full 64-bit error code for the KVM page fault will be
needed to determine whether the fault is private, so update
kvm_mmu_do_page_fault() to accept the full 64-bit value so it can be
plumbed through to the callback.

The upper 32 bits of the error code are currently discarded at
kvm_mmu_page_fault() by lower_32_bits().  Now the full 64-bit value is
passed down. It turns out that only FNAME(page_fault) depends on the
truncation, so move lower_32_bits() into FNAME(page_fault).

The accesses of fault->error_code are as follows
- FNAME(page_fault): change to explicitly use lower_32_bits()
- kvm_tdp_page_fault(): explicit mask with PFERR_LEVEL_MASK
- kvm_mmu_page_fault(): explicit mask with PFERR_RSVD_MASK,
                        PFERR_NESTED_GUEST_PAGE
- mmutrace: changed u32 -> u64
- pgprintk(): change %x -> %llx

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
---
 arch/x86/kvm/mmu.h              | 2 +-
 arch/x86/kvm/mmu/mmu.c          | 7 +++----
 arch/x86/kvm/mmu/mmu_internal.h | 4 ++--
 arch/x86/kvm/mmu/mmutrace.h     | 2 +-
 arch/x86/kvm/mmu/paging_tmpl.h  | 4 ++--
 5 files changed, 9 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index de9c6b98c41b..4aaef2132b97 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -156,7 +156,7 @@ static inline void kvm_mmu_load_pgd(struct kvm_vcpu *vcpu)
 }
 
 kvm_pfn_t kvm_mmu_map_tdp_page(struct kvm_vcpu *vcpu, gpa_t gpa,
-			       u32 error_code, int max_level);
+			       u64 error_code, int max_level);
 
 /*
  * Check if a given access (described through the I/D, W/R and U/S bits of a
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 960609d72dd6..0ec94c72895c 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4860,7 +4860,7 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 static int nonpaging_page_fault(struct kvm_vcpu *vcpu,
 				struct kvm_page_fault *fault)
 {
-	pgprintk("%s: gva %llx error %x\n", __func__, fault->addr, fault->error_code);
+	pgprintk("%s: gva %llx error %llx\n", __func__, fault->addr, fault->error_code);
 
 	/* This path builds a PAE pagetable, we can map 2mb pages at maximum. */
 	fault->max_level = PG_LEVEL_2M;
@@ -4986,7 +4986,7 @@ int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 }
 
 kvm_pfn_t kvm_mmu_map_tdp_page(struct kvm_vcpu *vcpu, gpa_t gpa,
-			       u32 error_code, int max_level)
+			       u64 error_code, int max_level)
 {
 	int r;
 	struct kvm_page_fault fault = (struct kvm_page_fault) {
@@ -6238,8 +6238,7 @@ int noinline kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 err
 	}
 
 	if (r == RET_PF_INVALID) {
-		r = kvm_mmu_do_page_fault(vcpu, cr2_or_gpa,
-					  lower_32_bits(error_code), false);
+		r = kvm_mmu_do_page_fault(vcpu, cr2_or_gpa, error_code, false);
 		if (KVM_BUG_ON(r == RET_PF_INVALID, vcpu->kvm))
 			return -EIO;
 	}
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index aa0836191b5a..bb5709f1cb57 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -341,7 +341,7 @@ static inline bool is_nx_huge_page_enabled(struct kvm *kvm)
 struct kvm_page_fault {
 	/* arguments to kvm_mmu_do_page_fault.  */
 	const gpa_t addr;
-	const u32 error_code;
+	const u64 error_code;
 	const bool prefetch;
 
 	/* Derived from error_code.  */
@@ -427,7 +427,7 @@ enum {
 };
 
 static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
-					u32 err, bool prefetch)
+					u64 err, bool prefetch)
 {
 	struct kvm_page_fault fault = {
 		.addr = cr2_or_gpa,
diff --git a/arch/x86/kvm/mmu/mmutrace.h b/arch/x86/kvm/mmu/mmutrace.h
index 2d7555381955..2e77883c92f6 100644
--- a/arch/x86/kvm/mmu/mmutrace.h
+++ b/arch/x86/kvm/mmu/mmutrace.h
@@ -261,7 +261,7 @@ TRACE_EVENT(
 	TP_STRUCT__entry(
 		__field(int, vcpu_id)
 		__field(gpa_t, cr2_or_gpa)
-		__field(u32, error_code)
+		__field(u64, error_code)
 		__field(u64 *, sptep)
 		__field(u64, old_spte)
 		__field(u64, new_spte)
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 594af2e1fd2f..cab6822709e2 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -791,7 +791,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 	int r;
 	bool is_self_change_mapping;
 
-	pgprintk("%s: addr %llx err %x\n", __func__, fault->addr, fault->error_code);
+	pgprintk("%s: addr %llx err %llx\n", __func__, fault->addr, fault->error_code);
 	WARN_ON_ONCE(fault->is_tdp);
 
 	/*
@@ -800,7 +800,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 	 * The bit needs to be cleared before walking guest page tables.
 	 */
 	r = FNAME(walk_addr)(&walker, vcpu, fault->addr,
-			     fault->error_code & ~PFERR_RSVD_MASK);
+			     lower_32_bits(fault->error_code) & ~PFERR_RSVD_MASK);
 
 	/*
 	 * The page is not mapped by the guest.  Let the guest handle it.
-- 
2.25.1


-- 
Isaku Yamahata <isaku.yamahata@gmail.com>

^ permalink raw reply related	[flat|nested] 147+ messages in thread

* Re: [PATCH RFC v8 02/56] KVM: x86: Add 'update_mem_attr' x86 op
  2023-02-20 18:37 ` [PATCH RFC v8 02/56] KVM: x86: Add 'update_mem_attr' " Michael Roth
@ 2023-03-18  4:56   ` Isaku Yamahata
  2023-03-20 18:05     ` Michael Roth
  0 siblings, 1 reply; 147+ messages in thread
From: Isaku Yamahata @ 2023-03-18  4:56 UTC (permalink / raw)
  To: Michael Roth
  Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	isaku.yamahata

On Mon, Feb 20, 2023 at 12:37:53PM -0600,
Michael Roth <michael.roth@amd.com> wrote:

> This callback will do any platform-specific handling needed for
> converting pages between shared/private.
> 
> Signed-off-by: Michael Roth <michael.roth@amd.com>
> ---
>  arch/x86/include/asm/kvm-x86-ops.h |  1 +
>  arch/x86/include/asm/kvm_host.h    |  2 ++
>  arch/x86/kvm/mmu/mmu.c             | 13 +++++++++++++
>  include/linux/kvm_host.h           |  4 ++++
>  virt/kvm/kvm_main.c                | 29 +++++++++++++++++++++++++++++
>  5 files changed, 49 insertions(+)
> 
> diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
> index 72183da010b8..a8aaf532c2ab 100644
> --- a/arch/x86/include/asm/kvm-x86-ops.h
> +++ b/arch/x86/include/asm/kvm-x86-ops.h
> @@ -132,6 +132,7 @@ KVM_X86_OP(complete_emulated_msr)
>  KVM_X86_OP(vcpu_deliver_sipi_vector)
>  KVM_X86_OP_OPTIONAL_RET0(vcpu_get_apicv_inhibit_reasons);
>  KVM_X86_OP_OPTIONAL_RET0(fault_is_private);
> +KVM_X86_OP_OPTIONAL_RET0(update_mem_attr)
>  
>  #undef KVM_X86_OP
>  #undef KVM_X86_OP_OPTIONAL
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index f856d689dda0..2da3fb2d5d1b 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1644,6 +1644,8 @@ struct kvm_x86_ops {
>  	void (*load_mmu_pgd)(struct kvm_vcpu *vcpu, hpa_t root_hpa,
>  			     int root_level);
>  	bool (*fault_is_private)(struct kvm *kvm, gpa_t gpa, u64 error_code, bool *private_fault);
> +	int (*update_mem_attr)(struct kvm_memory_slot *slot, unsigned int attr,
> +			       gfn_t start, gfn_t end);
>  
>  	bool (*has_wbinvd_exit)(void);
>  
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index fb3f34b7391c..053bd77bbf52 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -7251,4 +7251,17 @@ void kvm_arch_set_memory_attributes(struct kvm *kvm,
>  		linfo_update_mixed(gfn, slot, level, mixed);
>  	}
>  }
> +
> +void kvm_arch_post_set_memory_attributes(struct kvm *kvm,
> +					 struct kvm_memory_slot *slot,
> +					 unsigned long attrs,
> +					 gfn_t start, gfn_t end)
> +{
> +	int ret;
> +
> +	ret = static_call(kvm_x86_update_mem_attr)(slot, attrs, start, end);
> +	if (ret)
> +		pr_warn_ratelimited("Failed to update GFN range 0x%llx-0x%llx with attributes 0x%lx. Ret: %d\n",
> +				    start, end, attrs, ret);
> +}
>  #endif
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index fdc59479b3e2..d200b8f45583 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -2330,6 +2330,10 @@ void kvm_arch_set_memory_attributes(struct kvm *kvm,
>  				    struct kvm_memory_slot *slot,
>  				    unsigned long attrs,
>  				    gfn_t start, gfn_t end);
> +void kvm_arch_post_set_memory_attributes(struct kvm *kvm,
> +					 struct kvm_memory_slot *slot,
> +					 unsigned long attrs,
> +					 gfn_t start, gfn_t end);
>  
>  static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
>  {
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index b68574ff6c30..8ec985f1c57d 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -2561,6 +2561,32 @@ static void kvm_mem_attrs_changed(struct kvm *kvm, unsigned long attrs,
>  		kvm_flush_remote_tlbs(kvm);
>  }
>  
> +static void kvm_post_mem_attrs_changed(struct kvm *kvm, unsigned long attrs,
> +				       gfn_t start_orig, gfn_t end_orig)
> +{
> +	struct kvm_memory_slot *slot;
> +	struct kvm_memslots *slots;
> +	struct kvm_memslot_iter iter;
> +	int i;
> +
> +	for (i = 0; i < kvm_arch_nr_memslot_as_ids(kvm); i++) {
> +		slots = __kvm_memslots(kvm, i);
> +
> +		kvm_for_each_memslot_in_gfn_range(&iter, slots, start_orig, end_orig) {
> +			gfn_t start, end;
> +
> +			slot = iter.slot;
> +			start = max(start_orig, slot->base_gfn);
> +			end = min(end_orig, slot->base_gfn + slot->npages);
> +
> +			if (start >= end)
> +				continue;
> +
> +			kvm_arch_post_set_memory_attributes(kvm, slot, attrs, start, end);
> +		}
> +	}
> +}
> +
>  static int kvm_vm_ioctl_set_mem_attributes(struct kvm *kvm,
>  					   struct kvm_memory_attributes *attrs)
>  {
> @@ -2602,6 +2628,9 @@ static int kvm_vm_ioctl_set_mem_attributes(struct kvm *kvm,
>  	kvm_mmu_invalidate_end(kvm);
>  	KVM_MMU_UNLOCK(kvm);
>  
> +	if (i > start)
> +		kvm_post_mem_attrs_changed(kvm, attrs->attributes, start, i);
> +

Doesn't kvm_arch_set_memory_attributes() work for you? i.e the following patch.
The error check and pr_warn_ratelimited() can be pushed down into the callback.

From 7c618c1f3c236c382e64680efcbe7d8a672aa870 Mon Sep 17 00:00:00 2001
Message-Id: <7c618c1f3c236c382e64680efcbe7d8a672aa870.1679114841.git.isaku.yamahata@intel.com>
In-Reply-To: <428a676face7a06a90e59dca1c32941c9b6ee001.1679114841.git.isaku.yamahata@intel.com>
References: <428a676face7a06a90e59dca1c32941c9b6ee001.1679114841.git.isaku.yamahata@intel.com>
From: Isaku Yamahata <isaku.yamahata@intel.com>
Date: Fri, 17 Mar 2023 12:00:09 -0700
Subject: [PATCH 4/4] KVM: x86: Add 'set_mem_attr' x86 op

This callback will do any platform-specific handling needed for
converting pages between shared/private.

Originally-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
---
 arch/x86/include/asm/kvm-x86-ops.h | 1 +
 arch/x86/include/asm/kvm_host.h    | 2 ++
 arch/x86/kvm/mmu/mmu.c             | 1 +
 3 files changed, 4 insertions(+)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index dc5f18ac0bd5..956db2ee25a5 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -100,6 +100,7 @@ KVM_X86_OP_OPTIONAL_RET0(set_identity_map_addr)
 KVM_X86_OP_OPTIONAL_RET0(get_mt_mask)
 KVM_X86_OP(load_mmu_pgd)
 KVM_X86_OP(fault_is_private)
+KVM_X86_OP_OPTIONAL(set_mem_attr)
 KVM_X86_OP_OPTIONAL(link_private_spt)
 KVM_X86_OP_OPTIONAL(free_private_spt)
 KVM_X86_OP_OPTIONAL(split_private_spt)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 0382d236fbf4..88e11dd3afde 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1731,6 +1731,8 @@ struct kvm_x86_ops {
 	void (*load_mmu_pgd)(struct kvm_vcpu *vcpu, hpa_t root_hpa,
 			     int root_level);
 	bool (*fault_is_private)(struct kvm *kvm, gpa_t gpa, u64 error_code);
+	void (*set_mem_attr)(struct kvm *kvm, struct kvm_memory_slot *slot,
+			     unsigned int attr, gfn_t start, gfn_t end);
 
 	int (*link_private_spt)(struct kvm *kvm, gfn_t gfn, enum pg_level level,
 				void *private_spt);
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 0ec94c72895c..329333486e64 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -7908,6 +7908,7 @@ void kvm_arch_set_memory_attributes(struct kvm *kvm,
 				    gfn_t start, gfn_t end)
 {
 	kvm_update_lpage_mixed_flag(kvm, slot, true, attrs, start, end);
+	static_call(kvm_x86_set_mem_attr)(kvm, slot, attrs, start, end);
 }
 
 void kvm_memory_attributes_create_memslot(struct kvm *kvm,
-- 
2.25.1

-- 
Isaku Yamahata <isaku.yamahata@gmail.com>

^ permalink raw reply related	[flat|nested] 147+ messages in thread

* Re: [PATCH RFC v8 03/56] KVM: x86: Add platform hooks for private memory invalidations
  2023-02-20 18:37 ` [PATCH RFC v8 03/56] KVM: x86: Add platform hooks for private memory invalidations Michael Roth
@ 2023-03-18  5:13   ` Isaku Yamahata
  2023-03-20 18:09     ` Michael Roth
  0 siblings, 1 reply; 147+ messages in thread
From: Isaku Yamahata @ 2023-03-18  5:13 UTC (permalink / raw)
  To: Michael Roth
  Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	isaku.yamahata

On Mon, Feb 20, 2023 at 12:37:54PM -0600,
Michael Roth <michael.roth@amd.com> wrote:

> In some cases, like with SEV-SNP, guest memory needs to be updated in a
> platform-specific manner before it can be safely freed back to the host.
> Add hooks to wire up handling of this sort to the invalidation notifiers
> for restricted memory.
> 
> Also issue invalidations of all allocated pages during notifier/memslot
> unbinding so that the pages are not left in an unusable state when
> they eventually get freed back to the host upon FD release.

I'm just curious. Could you please elaborate?
Unbind happens only when a memory slot is deleted or the VM is destroyed.  In the
case of memory slot deletion, the GPA region is zapped via
kvm_arch_commit_memory_region().  In the case of VM destroy, we have
kvm_flush_shadow_all(), which calls
kvm_arch_flush_shadow_all() => kvm_mmu_zap_all().  Doesn't that work?

Thanks,
-- 
Isaku Yamahata <isaku.yamahata@gmail.com>

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH RFC v8 01/56] KVM: x86: Add 'fault_is_private' x86 op
  2023-03-18  4:51   ` Isaku Yamahata
@ 2023-03-20 17:46     ` Michael Roth
  0 siblings, 0 replies; 147+ messages in thread
From: Michael Roth @ 2023-03-20 17:46 UTC (permalink / raw)
  To: Isaku Yamahata
  Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania

On Fri, Mar 17, 2023 at 09:51:37PM -0700, Isaku Yamahata wrote:
> On Mon, Feb 20, 2023 at 12:37:52PM -0600,
> Michael Roth <michael.roth@amd.com> wrote:
> 
> > This callback is used by the KVM MMU to check whether a #NPF was for a
> > private GPA or not.
> > 
> > In some cases the full 64-bit error code for the #NPF will be needed to
> > make this determination, so also update kvm_mmu_do_page_fault() to
> > accept the full 64-bit value so it can be plumbed through to the
> > callback.

Hi Isaku, Zhi,

Thanks for your efforts trying to get us in sync on these shared
interfaces. Would be great to have a common base we can build on for the
SNP/TDX series. You mentioned a couple of patches here that I couldn't find
on the list; are you planning to submit these as a separate series?

> 
> We can split the 64-bit part out into an independent patch.

Agreed that makes sense.

> 
> > Signed-off-by: Michael Roth <michael.roth@amd.com>
> > ---

<snip>

> > +static bool kvm_mmu_fault_is_private(struct kvm *kvm, gpa_t gpa, u64 err)
> > +{
> > +	struct kvm_memory_slot *slot;
> > +	bool private_fault = false;
> > +	gfn_t gfn = gpa_to_gfn(gpa);
> > +
> > +	slot = gfn_to_memslot(kvm, gfn);
> > +	if (!slot) {
> > +		pr_debug("%s: no slot, GFN: 0x%llx\n", __func__, gfn);
> > +		goto out;
> > +	}
> > +
> > +	if (!kvm_slot_can_be_private(slot)) {
> > +		pr_debug("%s: slot is not private, GFN: 0x%llx\n", __func__, gfn);
> > +		goto out;
> > +	}
> > +
> > +	if (static_call(kvm_x86_fault_is_private)(kvm, gpa, err, &private_fault))
> > +		goto out;
> > +
> > +	/*
> > +	 * Handling below is for UPM self-tests and guests that treat userspace
> > +	 * as the authority on whether a fault should be private or not.
> > +	 */
> > +	private_fault = kvm_mem_is_private(kvm, gpa >> PAGE_SHIFT);
> > +
> > +out:
> > +	pr_debug("%s: GFN: 0x%llx, private: %d\n", __func__, gfn, private_fault);
> > +	return private_fault;
> > +}
> > +
> >  /*
> >   * Return values of handle_mmio_page_fault(), mmu.page_fault(), fast_page_fault(),
> >   * and of course kvm_mmu_do_page_fault().
> > @@ -262,11 +293,11 @@ enum {
> >  };
> >  
> >  static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
> > -					u32 err, bool prefetch)
> > +					u64 err, bool prefetch)
> >  {
> >  	struct kvm_page_fault fault = {
> >  		.addr = cr2_or_gpa,
> > -		.error_code = err,
> > +		.error_code = lower_32_bits(err),
> >  		.exec = err & PFERR_FETCH_MASK,
> >  		.write = err & PFERR_WRITE_MASK,
> >  		.present = err & PFERR_PRESENT_MASK,
> > @@ -280,7 +311,7 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
> >  		.max_level = KVM_MAX_HUGEPAGE_LEVEL,
> >  		.req_level = PG_LEVEL_4K,
> >  		.goal_level = PG_LEVEL_4K,
> > -		.is_private = kvm_mem_is_private(vcpu->kvm, cr2_or_gpa >> PAGE_SHIFT),
> > +		.is_private = kvm_mmu_fault_is_private(vcpu->kvm, cr2_or_gpa, err),
> 
> I don't think we need kvm_mmu_fault_is_private() here; it's too heavy. We can
> make it its own callback, i.e. the following.

Is it causing performance issues? If the overhead is mainly due to the
gfn_to_memslot()/kvm_slot_can_be_private() check, then maybe that part
can be dropped. In the past Sean has mentioned that we shouldn't have to
do kvm_slot_can_be_private() checks prior to kvm_mem_is_private(), but I
haven't tried removing those yet to see if things still work as expected.

> 
> From b0f914a1a4d154f076c0294831ce9ef0df7eb3d3 Mon Sep 17 00:00:00 2001
> Message-Id: <b0f914a1a4d154f076c0294831ce9ef0df7eb3d3.1679114841.git.isaku.yamahata@intel.com>
> In-Reply-To: <428a676face7a06a90e59dca1c32941c9b6ee001.1679114841.git.isaku.yamahata@intel.com>
> References: <428a676face7a06a90e59dca1c32941c9b6ee001.1679114841.git.isaku.yamahata@intel.com>
> From: Isaku Yamahata <isaku.yamahata@intel.com>
> Date: Fri, 17 Mar 2023 11:18:13 -0700
> Subject: [PATCH 2/4] KVM: x86: Add 'fault_is_private' x86 op
> 
> This callback is used by the KVM MMU to check whether a KVM page fault was
> for a private GPA or not.
> 
> Originally-by: Michael Roth <michael.roth@amd.com>
> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
> ---
>  arch/x86/include/asm/kvm-x86-ops.h |  1 +
>  arch/x86/include/asm/kvm_host.h    |  1 +
>  arch/x86/kvm/mmu.h                 | 19 +++++++++++++++++++
>  arch/x86/kvm/mmu/mmu_internal.h    |  2 +-
>  arch/x86/kvm/x86.c                 |  8 ++++++++
>  5 files changed, 30 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
> index e1f57905c8fe..dc5f18ac0bd5 100644
> --- a/arch/x86/include/asm/kvm-x86-ops.h
> +++ b/arch/x86/include/asm/kvm-x86-ops.h
> @@ -99,6 +99,7 @@ KVM_X86_OP_OPTIONAL_RET0(set_tss_addr)
>  KVM_X86_OP_OPTIONAL_RET0(set_identity_map_addr)
>  KVM_X86_OP_OPTIONAL_RET0(get_mt_mask)
>  KVM_X86_OP(load_mmu_pgd)
> +KVM_X86_OP(fault_is_private)
>  KVM_X86_OP_OPTIONAL(link_private_spt)
>  KVM_X86_OP_OPTIONAL(free_private_spt)
>  KVM_X86_OP_OPTIONAL(split_private_spt)
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 59196a80c3c8..0382d236fbf4 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1730,6 +1730,7 @@ struct kvm_x86_ops {
>  
>  	void (*load_mmu_pgd)(struct kvm_vcpu *vcpu, hpa_t root_hpa,
>  			     int root_level);
> +	bool (*fault_is_private)(struct kvm *kvm, gpa_t gpa, u64 error_code);
>  
>  	int (*link_private_spt)(struct kvm *kvm, gfn_t gfn, enum pg_level level,
>  				void *private_spt);
> diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
> index 4aaef2132b97..1f21680b9b97 100644
> --- a/arch/x86/kvm/mmu.h
> +++ b/arch/x86/kvm/mmu.h
> @@ -289,6 +289,25 @@ static inline gpa_t kvm_translate_gpa(struct kvm_vcpu *vcpu,
>  	return translate_nested_gpa(vcpu, gpa, access, exception);
>  }
>  
> +static inline bool kvm_mmu_fault_is_private_default(struct kvm *kvm, gpa_t gpa, u64 err)
> +{
> +	struct kvm_memory_slot *slot;
> +	gfn_t gfn = gpa_to_gfn(gpa);
> +
> +	slot = gfn_to_memslot(kvm, gfn);
> +	if (!slot)
> +		return false;
> +
> +	if (!kvm_slot_can_be_private(slot))
> +		return false;
> +
> +	/*
> +	 * Handling below is for UPM self-tests and guests that treat userspace
> +	 * as the authority on whether a fault should be private or not.
> +	 */
> +	return kvm_mem_is_private(kvm, gfn);
> +}
> +
>  static inline gfn_t kvm_gfn_shared_mask(const struct kvm *kvm)
>  {
>  #ifdef CONFIG_KVM_MMU_PRIVATE
> diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
> index bb5709f1cb57..6b54b069d1ed 100644
> --- a/arch/x86/kvm/mmu/mmu_internal.h
> +++ b/arch/x86/kvm/mmu/mmu_internal.h
> @@ -445,7 +445,7 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
>  		.max_level = vcpu->kvm->arch.tdp_max_page_level,
>  		.req_level = PG_LEVEL_4K,
>  		.goal_level = PG_LEVEL_4K,
> -		.is_private = kvm_is_private_gpa(vcpu->kvm, cr2_or_gpa),
> +		.is_private = static_call(kvm_x86_fault_is_private)(vcpu->kvm, cr2_or_gpa, err),
>  	};
>  	int r;
>  
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index fd14368c6bc8..0311ab450330 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -9419,6 +9419,14 @@ static inline void kvm_ops_update(struct kvm_x86_init_ops *ops)
>  #undef __KVM_X86_OP
>  
>  	kvm_pmu_ops_update(ops->pmu_ops);
> +
> +	/*
> +	 * TODO: Once all backends fill in this callback, remove this and the default
> +	 * function.
> +	 */
> +	if (!ops->runtime_ops->fault_is_private)
> +		static_call_update(kvm_x86_fault_is_private,
> +				   kvm_mmu_fault_is_private_default);

I'm not sure about this approach, since the self-tests (and possibly SEV,
which doesn't use a separate #NPF error bit like SNP/TDX) currently
rely on that kvm_mem_is_private() call to determine whether to handle a
fault as private or not. But to run either of those, we would need to
load the kvm_amd module, which will have already introduced its own
kvm_x86_fault_is_private implementation via svm_init(), so the handling
provided by kvm_mmu_fault_is_private_default would never be available and
we wouldn't be able to run the UPM self-tests.

To me it seems like that handling always needs to be in place as a
fallback when not running SNP/TDX. It doesn't necessarily need to be in the
kvm_x86_fault_is_private handler though, maybe some generic handling for
UPM selftests can be pushed down into KVM MMU. Doing so could also
address a race that Sean mentioned between the time kvm_mem_is_private()
is called here (which happens before mmu_invalidate_seq is recorded for
the #NPF) vs. when it actually gets used in __kvm_faultin_pfn().

If we take that approach, then the requirements for TDX/SNP-specific
handling are reduced as well, since we only need to check the
encryption/shared bit. That could maybe be done as a simple setting
where you tell KVM MMU the position of the bit and whether it
indicates shared vs. private; then both TDX/SNP could re-use a simple
helper that checks the #NPF error code and sets .is_private based on that.

Then KVM MMU could, if no bit is indicated, just fall back to using the
value of kvm_mem_is_private() somewhere in __kvm_faultin_pfn() or
something.
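
To make that concrete, a rough sketch of the kind of helper I have in mind
(purely illustrative: mmu_private_fault_mask is an assumed per-VM field,
and TDX would additionally need a polarity flag since its bit marks shared
GPAs rather than private ones):

	static bool kvm_fault_is_private(struct kvm *kvm, gpa_t gpa, u64 err)
	{
		/*
		 * SNP/TDX would configure the position of the encryption/shared
		 * bit at VM creation; if one is configured, the fault's error
		 * code alone tells us whether the fault is private.
		 */
		if (kvm->arch.mmu_private_fault_mask)
			return !!(err & kvm->arch.mmu_private_fault_mask);

		/* No bit configured: fall back to the UPM memory attributes. */
		return kvm_mem_is_private(kvm, gpa_to_gfn(gpa));
	}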

I mentioned this to Sean a while back, which I think is compatible with
what he was looking for:

  https://lore.kernel.org/lkml/20230220162210.42rjdgbdwbjiextz@amd.com/

Would be good to get his input before spending too much time adding new
state/configuration stuff in KVM MMU though.

As an interim solution, would my original patch work if we could
confirm that the gfn_to_memslot()/kvm_slot_can_be_private() sequence is
no longer needed?
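
E.g. if the slot checks can really go away, the helper from my patch would
shrink to something like this (sketch only):

	static bool kvm_mmu_fault_is_private(struct kvm *kvm, gpa_t gpa, u64 err)
	{
		bool private_fault = false;

		if (static_call(kvm_x86_fault_is_private)(kvm, gpa, err, &private_fault))
			return private_fault;

		/*
		 * Fallback for UPM self-tests and guests that treat userspace
		 * as the authority on whether a fault should be private.
		 */
		return kvm_mem_is_private(kvm, gpa >> PAGE_SHIFT);
	}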

Thanks!

-Mike

>  }
>  
>  static int kvm_x86_check_processor_compatibility(void)
> -- 
> 2.25.1
> 
> 
> 
> 
> -- 
> Isaku Yamahata <isaku.yamahata@gmail.com>

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH RFC v8 02/56] KVM: x86: Add 'update_mem_attr' x86 op
  2023-03-18  4:56   ` Isaku Yamahata
@ 2023-03-20 18:05     ` Michael Roth
  2023-03-21 11:21       ` Zhi Wang
  0 siblings, 1 reply; 147+ messages in thread
From: Michael Roth @ 2023-03-20 18:05 UTC (permalink / raw)
  To: Isaku Yamahata
  Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania

On Fri, Mar 17, 2023 at 09:56:11PM -0700, Isaku Yamahata wrote:
> On Mon, Feb 20, 2023 at 12:37:53PM -0600,
> Michael Roth <michael.roth@amd.com> wrote:
> 
> > This callback will do any platform-specific handling needed for
> > converting pages between shared/private.
> > 
> > Signed-off-by: Michael Roth <michael.roth@amd.com>
> > ---
> >  arch/x86/include/asm/kvm-x86-ops.h |  1 +
> >  arch/x86/include/asm/kvm_host.h    |  2 ++
> >  arch/x86/kvm/mmu/mmu.c             | 13 +++++++++++++
> >  include/linux/kvm_host.h           |  4 ++++
> >  virt/kvm/kvm_main.c                | 29 +++++++++++++++++++++++++++++
> >  5 files changed, 49 insertions(+)
> > 
> > diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
> > index 72183da010b8..a8aaf532c2ab 100644
> > --- a/arch/x86/include/asm/kvm-x86-ops.h
> > +++ b/arch/x86/include/asm/kvm-x86-ops.h
> > @@ -132,6 +132,7 @@ KVM_X86_OP(complete_emulated_msr)
> >  KVM_X86_OP(vcpu_deliver_sipi_vector)
> >  KVM_X86_OP_OPTIONAL_RET0(vcpu_get_apicv_inhibit_reasons);
> >  KVM_X86_OP_OPTIONAL_RET0(fault_is_private);
> > +KVM_X86_OP_OPTIONAL_RET0(update_mem_attr)
> >  
> >  #undef KVM_X86_OP
> >  #undef KVM_X86_OP_OPTIONAL
> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > index f856d689dda0..2da3fb2d5d1b 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -1644,6 +1644,8 @@ struct kvm_x86_ops {
> >  	void (*load_mmu_pgd)(struct kvm_vcpu *vcpu, hpa_t root_hpa,
> >  			     int root_level);
> >  	bool (*fault_is_private)(struct kvm *kvm, gpa_t gpa, u64 error_code, bool *private_fault);
> > +	int (*update_mem_attr)(struct kvm_memory_slot *slot, unsigned int attr,
> > +			       gfn_t start, gfn_t end);
> >  
> >  	bool (*has_wbinvd_exit)(void);
> >  
> > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > index fb3f34b7391c..053bd77bbf52 100644
> > --- a/arch/x86/kvm/mmu/mmu.c
> > +++ b/arch/x86/kvm/mmu/mmu.c
> > @@ -7251,4 +7251,17 @@ void kvm_arch_set_memory_attributes(struct kvm *kvm,
> >  		linfo_update_mixed(gfn, slot, level, mixed);
> >  	}
> >  }
> > +
> > +void kvm_arch_post_set_memory_attributes(struct kvm *kvm,
> > +					 struct kvm_memory_slot *slot,
> > +					 unsigned long attrs,
> > +					 gfn_t start, gfn_t end)
> > +{
> > +	int ret;
> > +
> > +	ret = static_call(kvm_x86_update_mem_attr)(slot, attrs, start, end);
> > +	if (ret)
> > +		pr_warn_ratelimited("Failed to update GFN range 0x%llx-0x%llx with attributes 0x%lx. Ret: %d\n",
> > +				    start, end, attrs, ret);
> > +}
> >  #endif
> > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > index fdc59479b3e2..d200b8f45583 100644
> > --- a/include/linux/kvm_host.h
> > +++ b/include/linux/kvm_host.h
> > @@ -2330,6 +2330,10 @@ void kvm_arch_set_memory_attributes(struct kvm *kvm,
> >  				    struct kvm_memory_slot *slot,
> >  				    unsigned long attrs,
> >  				    gfn_t start, gfn_t end);
> > +void kvm_arch_post_set_memory_attributes(struct kvm *kvm,
> > +					 struct kvm_memory_slot *slot,
> > +					 unsigned long attrs,
> > +					 gfn_t start, gfn_t end);
> >  
> >  static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
> >  {
> > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > index b68574ff6c30..8ec985f1c57d 100644
> > --- a/virt/kvm/kvm_main.c
> > +++ b/virt/kvm/kvm_main.c
> > @@ -2561,6 +2561,32 @@ static void kvm_mem_attrs_changed(struct kvm *kvm, unsigned long attrs,
> >  		kvm_flush_remote_tlbs(kvm);
> >  }
> >  
> > +static void kvm_post_mem_attrs_changed(struct kvm *kvm, unsigned long attrs,
> > +				       gfn_t start_orig, gfn_t end_orig)
> > +{
> > +	struct kvm_memory_slot *slot;
> > +	struct kvm_memslots *slots;
> > +	struct kvm_memslot_iter iter;
> > +	int i;
> > +
> > +	for (i = 0; i < kvm_arch_nr_memslot_as_ids(kvm); i++) {
> > +		slots = __kvm_memslots(kvm, i);
> > +
> > +		kvm_for_each_memslot_in_gfn_range(&iter, slots, start_orig, end_orig) {
> > +			gfn_t start, end;
> > +
> > +			slot = iter.slot;
> > +			start = max(start_orig, slot->base_gfn);
> > +			end = min(end_orig, slot->base_gfn + slot->npages);
> > +
> > +			if (start >= end)
> > +				continue;
> > +
> > +			kvm_arch_post_set_memory_attributes(kvm, slot, attrs, start, end);
> > +		}
> > +	}
> > +}
> > +
> >  static int kvm_vm_ioctl_set_mem_attributes(struct kvm *kvm,
> >  					   struct kvm_memory_attributes *attrs)
> >  {
> > @@ -2602,6 +2628,9 @@ static int kvm_vm_ioctl_set_mem_attributes(struct kvm *kvm,
> >  	kvm_mmu_invalidate_end(kvm);
> >  	KVM_MMU_UNLOCK(kvm);
> >  
> > +	if (i > start)
> > +		kvm_post_mem_attrs_changed(kvm, attrs->attributes, start, i);
> > +
> 
> Doesn't kvm_arch_set_memory_attributes() work for you? i.e the following patch.
> The error check and pr_warn_ratelimited() can be pushed down into the callback.

This is originally how I had it, but when CONFIG_PREEMPT_COUNT is set this
will generate warnings for this callback as well as the invalidation
callback, as reported in v7 here:

  https://lore.kernel.org/lkml/Y80vhKwQyw8hS%2F22@notebook/

The main issue is that kvm_mem_attrs_changed() is called while holding
the KVM MMU lock, which disables preemption. But when updating
attributes for SNP, we also need to remove private pages from the kernel
directmap, which involves acquiring a mutex and results in
"BUG: scheduling while atomic" warnings.

So that's why we ended up somewhat duplicating some of the logic and
using a separate callback chain that runs outside the KVM MMU lock.
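
In rough terms the ordering constraint looks like this (sketch, not the
actual code in the patch):

	KVM_MMU_LOCK(kvm);			/* spinlock, preemption disabled */
	kvm_mem_attrs_changed(kvm, attrs, start, end);	/* zap/flush only, must not sleep */
	kvm_mmu_invalidate_end(kvm);
	KVM_MMU_UNLOCK(kvm);

	/* Sleepable context again: directmap/RMP updates may take mutexes here. */
	kvm_post_mem_attrs_changed(kvm, attrs, start, end);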

-Mike

> 
> From 7c618c1f3c236c382e64680efcbe7d8a672aa870 Mon Sep 17 00:00:00 2001
> Message-Id: <7c618c1f3c236c382e64680efcbe7d8a672aa870.1679114841.git.isaku.yamahata@intel.com>
> In-Reply-To: <428a676face7a06a90e59dca1c32941c9b6ee001.1679114841.git.isaku.yamahata@intel.com>
> References: <428a676face7a06a90e59dca1c32941c9b6ee001.1679114841.git.isaku.yamahata@intel.com>
> From: Isaku Yamahata <isaku.yamahata@intel.com>
> Date: Fri, 17 Mar 2023 12:00:09 -0700
> Subject: [PATCH 4/4] KVM: x86: Add 'set_mem_attr' x86 op
> 
> This callback will do any platform-specific handling needed for
> converting pages between shared/private.
> 
> Originally-by: Michael Roth <michael.roth@amd.com>
> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
> ---
>  arch/x86/include/asm/kvm-x86-ops.h | 1 +
>  arch/x86/include/asm/kvm_host.h    | 2 ++
>  arch/x86/kvm/mmu/mmu.c             | 1 +
>  3 files changed, 4 insertions(+)
> 
> diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
> index dc5f18ac0bd5..956db2ee25a5 100644
> --- a/arch/x86/include/asm/kvm-x86-ops.h
> +++ b/arch/x86/include/asm/kvm-x86-ops.h
> @@ -100,6 +100,7 @@ KVM_X86_OP_OPTIONAL_RET0(set_identity_map_addr)
>  KVM_X86_OP_OPTIONAL_RET0(get_mt_mask)
>  KVM_X86_OP(load_mmu_pgd)
>  KVM_X86_OP(fault_is_private)
> +KVM_X86_OP_OPTIONAL(set_mem_attr)
>  KVM_X86_OP_OPTIONAL(link_private_spt)
>  KVM_X86_OP_OPTIONAL(free_private_spt)
>  KVM_X86_OP_OPTIONAL(split_private_spt)
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 0382d236fbf4..88e11dd3afde 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1731,6 +1731,8 @@ struct kvm_x86_ops {
>  	void (*load_mmu_pgd)(struct kvm_vcpu *vcpu, hpa_t root_hpa,
>  			     int root_level);
>  	bool (*fault_is_private)(struct kvm *kvm, gpa_t gpa, u64 error_code);
> +	void (*set_mem_attr)(struct kvm *kvm, struct kvm_memory_slot *slot,
> +			     unsigned int attr, gfn_t start, gfn_t end);
>  
>  	int (*link_private_spt)(struct kvm *kvm, gfn_t gfn, enum pg_level level,
>  				void *private_spt);
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 0ec94c72895c..329333486e64 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -7908,6 +7908,7 @@ void kvm_arch_set_memory_attributes(struct kvm *kvm,
>  				    gfn_t start, gfn_t end)
>  {
>  	kvm_update_lpage_mixed_flag(kvm, slot, true, attrs, start, end);
> +	static_call(kvm_x86_set_mem_attr)(kvm, slot, attrs, start, end);
>  }
>  
>  void kvm_memory_attributes_create_memslot(struct kvm *kvm,
> -- 
> 2.25.1
> 
> -- 
> Isaku Yamahata <isaku.yamahata@gmail.com>

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH RFC v8 03/56] KVM: x86: Add platform hooks for private memory invalidations
  2023-03-18  5:13   ` Isaku Yamahata
@ 2023-03-20 18:09     ` Michael Roth
  0 siblings, 0 replies; 147+ messages in thread
From: Michael Roth @ 2023-03-20 18:09 UTC (permalink / raw)
  To: Isaku Yamahata
  Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania

On Fri, Mar 17, 2023 at 10:13:22PM -0700, Isaku Yamahata wrote:
> On Mon, Feb 20, 2023 at 12:37:54PM -0600,
> Michael Roth <michael.roth@amd.com> wrote:
> 
> > In some cases, like with SEV-SNP, guest memory needs to be updated in a
> > platform-specific manner before it can be safely freed back to the host.
> > Add hooks to wire up handling of this sort to the invalidation notifiers
> > for restricted memory.
> > 
> > Also issue invalidations of all allocated pages during notifier/memslot
> > unbinding so that the pages are not left in an unusable state when
> > they eventually get freed back to the host upon FD release.
> 
> I'm just curious. Could you please elaborate?
> Unbind happens only when a memory slot is deleted or the VM is destroyed.  In the
> case of memory slot deletion, the GPA region is zapped via
> kvm_arch_commit_memory_region().  In the case of VM destroy, we have
> kvm_flush_shadow_all(), which calls
> kvm_arch_flush_shadow_all() => kvm_mmu_zap_all().  Doesn't that work?

The main thing here is that unbind happens right before the restrictedmem
pages are released back to the host, and for SNP we need to clear the
associated RMP table entries to switch them from guest-owned to
hypervisor-owned. It doesn't necessarily need to be a separate callback,
but I'm not sure if it makes sense to squash that down into the various
MMU zapping helpers.
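
Roughly speaking (every name below is made up for illustration, it's not
what the series does verbatim), the invalidation hook walks the pfns
backing the invalidated range and flips each guest-owned RMP entry back to
hypervisor-owned, leaking any page that can't be flipped:

	static void snp_invalidate_pfn_range(kvm_pfn_t start, kvm_pfn_t end)
	{
		kvm_pfn_t pfn;

		for (pfn = start; pfn < end; pfn++) {
			/* rmp_make_hypervisor_owned() is an assumed helper. */
			if (rmp_make_hypervisor_owned(pfn))
				snp_mark_pages_offline(pfn, 1);
		}
	}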

-Mike

> 
> Thanks,
> -- 
> Isaku Yamahata <isaku.yamahata@gmail.com>

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH RFC v8 02/56] KVM: x86: Add 'update_mem_attr' x86 op
  2023-03-20 18:05     ` Michael Roth
@ 2023-03-21 11:21       ` Zhi Wang
  2023-03-22  1:58         ` Michael Roth
  0 siblings, 1 reply; 147+ messages in thread
From: Zhi Wang @ 2023-03-21 11:21 UTC (permalink / raw)
  To: Michael Roth
  Cc: Isaku Yamahata, kvm, linux-coco, linux-mm, linux-crypto, x86,
	linux-kernel, tglx, mingo, jroedel, thomas.lendacky, hpa, ardb,
	pbonzini, seanjc, vkuznets, jmattson, luto, dave.hansen, slp,
	pgonda, peterz, srinivas.pandruvada, rientjes, dovmurik, tobin,
	bp, vbabka, kirill, ak, tony.luck, marcorr,
	sathyanarayanan.kuppuswamy, alpergun, dgilbert, jarkko,
	ashish.kalra, nikunj.dadhania

On Mon, 20 Mar 2023 13:05:43 -0500
Michael Roth <michael.roth@amd.com> wrote:

> On Fri, Mar 17, 2023 at 09:56:11PM -0700, Isaku Yamahata wrote:
> > On Mon, Feb 20, 2023 at 12:37:53PM -0600,
> > Michael Roth <michael.roth@amd.com> wrote:
> >   
> > > This callback will do any platform-specific handling needed for
> > > converting pages between shared/private.
> > > 
> > > Signed-off-by: Michael Roth <michael.roth@amd.com>
> > > ---
> > >  arch/x86/include/asm/kvm-x86-ops.h |  1 +
> > >  arch/x86/include/asm/kvm_host.h    |  2 ++
> > >  arch/x86/kvm/mmu/mmu.c             | 13 +++++++++++++
> > >  include/linux/kvm_host.h           |  4 ++++
> > >  virt/kvm/kvm_main.c                | 29 +++++++++++++++++++++++++++++
> > >  5 files changed, 49 insertions(+)
> > > 
> > > diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
> > > index 72183da010b8..a8aaf532c2ab 100644
> > > --- a/arch/x86/include/asm/kvm-x86-ops.h
> > > +++ b/arch/x86/include/asm/kvm-x86-ops.h
> > > @@ -132,6 +132,7 @@ KVM_X86_OP(complete_emulated_msr)
> > >  KVM_X86_OP(vcpu_deliver_sipi_vector)
> > >  KVM_X86_OP_OPTIONAL_RET0(vcpu_get_apicv_inhibit_reasons);
> > >  KVM_X86_OP_OPTIONAL_RET0(fault_is_private);
> > > +KVM_X86_OP_OPTIONAL_RET0(update_mem_attr)
> > >  
> > >  #undef KVM_X86_OP
> > >  #undef KVM_X86_OP_OPTIONAL
> > > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > > index f856d689dda0..2da3fb2d5d1b 100644
> > > --- a/arch/x86/include/asm/kvm_host.h
> > > +++ b/arch/x86/include/asm/kvm_host.h
> > > @@ -1644,6 +1644,8 @@ struct kvm_x86_ops {
> > >  	void (*load_mmu_pgd)(struct kvm_vcpu *vcpu, hpa_t root_hpa,
> > >  			     int root_level);
> > >  	bool (*fault_is_private)(struct kvm *kvm, gpa_t gpa, u64 error_code, bool *private_fault);
> > > +	int (*update_mem_attr)(struct kvm_memory_slot *slot, unsigned int attr,
> > > +			       gfn_t start, gfn_t end);
> > >  
> > >  	bool (*has_wbinvd_exit)(void);
> > >  
> > > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > > index fb3f34b7391c..053bd77bbf52 100644
> > > --- a/arch/x86/kvm/mmu/mmu.c
> > > +++ b/arch/x86/kvm/mmu/mmu.c
> > > @@ -7251,4 +7251,17 @@ void kvm_arch_set_memory_attributes(struct kvm *kvm,
> > >  		linfo_update_mixed(gfn, slot, level, mixed);
> > >  	}
> > >  }
> > > +
> > > +void kvm_arch_post_set_memory_attributes(struct kvm *kvm,
> > > +					 struct kvm_memory_slot *slot,
> > > +					 unsigned long attrs,
> > > +					 gfn_t start, gfn_t end)
> > > +{
> > > +	int ret;
> > > +
> > > +	ret = static_call(kvm_x86_update_mem_attr)(slot, attrs, start, end);
> > > +	if (ret)
> > > +		pr_warn_ratelimited("Failed to update GFN range 0x%llx-0x%llx with attributes 0x%lx. Ret: %d\n",
> > > +				    start, end, attrs, ret);
> > > +}
> > >  #endif
> > > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > > index fdc59479b3e2..d200b8f45583 100644
> > > --- a/include/linux/kvm_host.h
> > > +++ b/include/linux/kvm_host.h
> > > @@ -2330,6 +2330,10 @@ void kvm_arch_set_memory_attributes(struct kvm *kvm,
> > >  				    struct kvm_memory_slot *slot,
> > >  				    unsigned long attrs,
> > >  				    gfn_t start, gfn_t end);
> > > +void kvm_arch_post_set_memory_attributes(struct kvm *kvm,
> > > +					 struct kvm_memory_slot *slot,
> > > +					 unsigned long attrs,
> > > +					 gfn_t start, gfn_t end);
> > >  
> > >  static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
> > >  {
> > > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > > index b68574ff6c30..8ec985f1c57d 100644
> > > --- a/virt/kvm/kvm_main.c
> > > +++ b/virt/kvm/kvm_main.c
> > > @@ -2561,6 +2561,32 @@ static void kvm_mem_attrs_changed(struct kvm *kvm, unsigned long attrs,
> > >  		kvm_flush_remote_tlbs(kvm);
> > >  }
> > >  
> > > +static void kvm_post_mem_attrs_changed(struct kvm *kvm, unsigned long attrs,
> > > +				       gfn_t start_orig, gfn_t end_orig)
> > > +{
> > > +	struct kvm_memory_slot *slot;
> > > +	struct kvm_memslots *slots;
> > > +	struct kvm_memslot_iter iter;
> > > +	int i;
> > > +
> > > +	for (i = 0; i < kvm_arch_nr_memslot_as_ids(kvm); i++) {
> > > +		slots = __kvm_memslots(kvm, i);
> > > +
> > > +		kvm_for_each_memslot_in_gfn_range(&iter, slots, start_orig, end_orig) {
> > > +			gfn_t start, end;
> > > +
> > > +			slot = iter.slot;
> > > +			start = max(start_orig, slot->base_gfn);
> > > +			end = min(end_orig, slot->base_gfn + slot->npages);
> > > +
> > > +			if (start >= end)
> > > +				continue;
> > > +
> > > +			kvm_arch_post_set_memory_attributes(kvm, slot, attrs, start, end);
> > > +		}
> > > +	}
> > > +}
> > > +
> > >  static int kvm_vm_ioctl_set_mem_attributes(struct kvm *kvm,
> > >  					   struct kvm_memory_attributes *attrs)
> > >  {
> > > @@ -2602,6 +2628,9 @@ static int kvm_vm_ioctl_set_mem_attributes(struct kvm *kvm,
> > >  	kvm_mmu_invalidate_end(kvm);
> > >  	KVM_MMU_UNLOCK(kvm);
> > >  
> > > +	if (i > start)
> > > +		kvm_post_mem_attrs_changed(kvm, attrs->attributes, start, i);
> > > +  
> > 
> > Doesn't kvm_arch_set_memory_attributes() work for you? i.e the following patch.
> > The error check and pr_warn_ratelimited() can be pushed down into the callback.  
> 
> This is originally how I had but when CONFIG_PREEMPT_COUNT is set this
> will generate warnings for this callback as well as the invalidation
> callback as reported in v7 here:
> 
>   https://lore.kernel.org/lkml/Y80vhKwQyw8hS%2F22@notebook/
> 
> The main issue is that kvm_mem_attrs_changed() is called while holding
> the KVM MMU lock, which disables preemption. But when updating
> attributes for SNP, we also need to remove private pages from kernel
> directmap, which involves acquiring a mutex which results in
> "BUG: scheduling while atomic" warnings.
> 
> So that's why we ended up somewhat duplicating some of the logic and
> using a separate callback chain that happens out of KVM MMU lock.

Let's split up the steps involved in changing memory attributes:

1) Update the memory attributes in the xa array (Both TDX and SNP)
2) Zapping the EPT/NPT mappings (Required by TDX)
3) Update RMP table (Required by SNP)
4) Update the kernel directmap (SNP, but I guess TDX needs it as well)

Does SNP really need to zap the NPT mappings when changing the memory
attributes? (The new mappings will be created later in the fault path.) I
don't see this requirement in the APM.

If yes, can we postpone the RMP table update to the later fault, like TDX
does? That way we could drop this update_mem_attr x86 op, as everything
would be handled in the SNP-specific fault handler.

If not, I guess we need an x86 op to tell whether zapping is required.

Back to the lock: updating the RMP table doesn't require a mutex. The mutex
is only required when updating the directmap, and both TDX and SNP require
updating the directmap when changing memory attributes.

Wouldn't it be better to factor out the part that touches the kernel directmap?

Then you could call the x86 op update_mem_attr() from kvm_mem_attrs_changed(),
and update the kernel direct mapping for both TDX and SNP in
kvm_post_mem_attrs_changed().
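
Roughly, in pseudocode (just a sketch to illustrate the split; the directmap
helper at the end is a placeholder name, not an existing function):

  kvm_vm_ioctl_set_mem_attributes():
    ...
    KVM_MMU_LOCK(kvm)
    kvm_mem_attrs_changed():
      flush |= kvm_unmap_gfn_range()
      static_call(kvm_x86_update_mem_attr)()   // e.g. RMPUPDATE for SNP
      if (flush)
        kvm_flush_remote_tlbs()
    KVM_MMU_UNLOCK(kvm)
    ...
    kvm_post_mem_attrs_changed():
      kvm_update_directmap()                   // placeholder: may sleep, so outside MMU lock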

> 
> -Mike
> 
> > 
> > From 7c618c1f3c236c382e64680efcbe7d8a672aa870 Mon Sep 17 00:00:00 2001
> > Message-Id: <7c618c1f3c236c382e64680efcbe7d8a672aa870.1679114841.git.isaku.yamahata@intel.com>
> > In-Reply-To: <428a676face7a06a90e59dca1c32941c9b6ee001.1679114841.git.isaku.yamahata@intel.com>
> > References: <428a676face7a06a90e59dca1c32941c9b6ee001.1679114841.git.isaku.yamahata@intel.com>
> > From: Isaku Yamahata <isaku.yamahata@intel.com>
> > Date: Fri, 17 Mar 2023 12:00:09 -0700
> > Subject: [PATCH 4/4] KVM: x86: Add 'set_mem_attr' x86 op
> > 
> > This callback will do any platform-specific handling needed for
> > converting pages between shared/private.
> > 
> > Originally-by: Michael Roth <michael.roth@amd.com>
> > Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
> > ---
> >  arch/x86/include/asm/kvm-x86-ops.h | 1 +
> >  arch/x86/include/asm/kvm_host.h    | 2 ++
> >  arch/x86/kvm/mmu/mmu.c             | 1 +
> >  3 files changed, 4 insertions(+)
> > 
> > diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
> > index dc5f18ac0bd5..956db2ee25a5 100644
> > --- a/arch/x86/include/asm/kvm-x86-ops.h
> > +++ b/arch/x86/include/asm/kvm-x86-ops.h
> > @@ -100,6 +100,7 @@ KVM_X86_OP_OPTIONAL_RET0(set_identity_map_addr)
> >  KVM_X86_OP_OPTIONAL_RET0(get_mt_mask)
> >  KVM_X86_OP(load_mmu_pgd)
> >  KVM_X86_OP(fault_is_private)
> > +KVM_X86_OP_OPTIONAL(set_mem_attr)
> >  KVM_X86_OP_OPTIONAL(link_private_spt)
> >  KVM_X86_OP_OPTIONAL(free_private_spt)
> >  KVM_X86_OP_OPTIONAL(split_private_spt)
> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > index 0382d236fbf4..88e11dd3afde 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -1731,6 +1731,8 @@ struct kvm_x86_ops {
> >  	void (*load_mmu_pgd)(struct kvm_vcpu *vcpu, hpa_t root_hpa,
> >  			     int root_level);
> >  	bool (*fault_is_private)(struct kvm *kvm, gpa_t gpa, u64 error_code);
> > +	void (*set_mem_attr)(struct kvm *kvm, struct kvm_memory_slot *slot,
> > +			     unsigned int attr, gfn_t start, gfn_t end);
> >  
> >  	int (*link_private_spt)(struct kvm *kvm, gfn_t gfn, enum pg_level level,
> >  				void *private_spt);
> > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > index 0ec94c72895c..329333486e64 100644
> > --- a/arch/x86/kvm/mmu/mmu.c
> > +++ b/arch/x86/kvm/mmu/mmu.c
> > @@ -7908,6 +7908,7 @@ void kvm_arch_set_memory_attributes(struct kvm *kvm,
> >  				    gfn_t start, gfn_t end)
> >  {
> >  	kvm_update_lpage_mixed_flag(kvm, slot, true, attrs, start, end);
> > +	static_call(kvm_x86_set_mem_attr)(kvm, slot, attrs, start, end);
> >  }
> >  
> >  void kvm_memory_attributes_create_memslot(struct kvm *kvm,
> > -- 
> > 2.25.1
> > 
> > -- 
> > Isaku Yamahata <isaku.yamahata@gmail.com>  


^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH RFC v8 02/56] KVM: x86: Add 'update_mem_attr' x86 op
  2023-03-21 11:21       ` Zhi Wang
@ 2023-03-22  1:58         ` Michael Roth
  2023-03-23 18:17           ` Zhi Wang
  0 siblings, 1 reply; 147+ messages in thread
From: Michael Roth @ 2023-03-22  1:58 UTC (permalink / raw)
  To: Zhi Wang
  Cc: Isaku Yamahata, kvm, linux-coco, linux-mm, linux-crypto, x86,
	linux-kernel, tglx, mingo, jroedel, thomas.lendacky, hpa, ardb,
	pbonzini, seanjc, vkuznets, jmattson, luto, dave.hansen, slp,
	pgonda, peterz, srinivas.pandruvada, rientjes, dovmurik, tobin,
	bp, vbabka, kirill, ak, tony.luck, marcorr,
	sathyanarayanan.kuppuswamy, alpergun, dgilbert, jarkko,
	ashish.kalra, nikunj.dadhania

On Tue, Mar 21, 2023 at 01:21:36PM +0200, Zhi Wang wrote:
> On Mon, 20 Mar 2023 13:05:43 -0500
> Michael Roth <michael.roth@amd.com> wrote:
> 
> > On Fri, Mar 17, 2023 at 09:56:11PM -0700, Isaku Yamahata wrote:
> > > On Mon, Feb 20, 2023 at 12:37:53PM -0600,
> > > Michael Roth <michael.roth@amd.com> wrote:
> > >   
> > > > This callback will do any platform-specific handling needed for
> > > > converting pages between shared/private.
> > > > 
> > > > Signed-off-by: Michael Roth <michael.roth@amd.com>
> > > > ---
> > > >  arch/x86/include/asm/kvm-x86-ops.h |  1 +
> > > >  arch/x86/include/asm/kvm_host.h    |  2 ++
> > > >  arch/x86/kvm/mmu/mmu.c             | 13 +++++++++++++
> > > >  include/linux/kvm_host.h           |  4 ++++
> > > >  virt/kvm/kvm_main.c                | 29 +++++++++++++++++++++++++++++
> > > >  5 files changed, 49 insertions(+)
> > > > 
> > > > diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
> > > > index 72183da010b8..a8aaf532c2ab 100644
> > > > --- a/arch/x86/include/asm/kvm-x86-ops.h
> > > > +++ b/arch/x86/include/asm/kvm-x86-ops.h
> > > > @@ -132,6 +132,7 @@ KVM_X86_OP(complete_emulated_msr)
> > > >  KVM_X86_OP(vcpu_deliver_sipi_vector)
> > > >  KVM_X86_OP_OPTIONAL_RET0(vcpu_get_apicv_inhibit_reasons);
> > > >  KVM_X86_OP_OPTIONAL_RET0(fault_is_private);
> > > > +KVM_X86_OP_OPTIONAL_RET0(update_mem_attr)
> > > >  
> > > >  #undef KVM_X86_OP
> > > >  #undef KVM_X86_OP_OPTIONAL
> > > > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > > > index f856d689dda0..2da3fb2d5d1b 100644
> > > > --- a/arch/x86/include/asm/kvm_host.h
> > > > +++ b/arch/x86/include/asm/kvm_host.h
> > > > @@ -1644,6 +1644,8 @@ struct kvm_x86_ops {
> > > >  	void (*load_mmu_pgd)(struct kvm_vcpu *vcpu, hpa_t root_hpa,
> > > >  			     int root_level);
> > > >  	bool (*fault_is_private)(struct kvm *kvm, gpa_t gpa, u64 error_code, bool *private_fault);
> > > > +	int (*update_mem_attr)(struct kvm_memory_slot *slot, unsigned int attr,
> > > > +			       gfn_t start, gfn_t end);
> > > >  
> > > >  	bool (*has_wbinvd_exit)(void);
> > > >  
> > > > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > > > index fb3f34b7391c..053bd77bbf52 100644
> > > > --- a/arch/x86/kvm/mmu/mmu.c
> > > > +++ b/arch/x86/kvm/mmu/mmu.c
> > > > @@ -7251,4 +7251,17 @@ void kvm_arch_set_memory_attributes(struct kvm *kvm,
> > > >  		linfo_update_mixed(gfn, slot, level, mixed);
> > > >  	}
> > > >  }
> > > > +
> > > > +void kvm_arch_post_set_memory_attributes(struct kvm *kvm,
> > > > +					 struct kvm_memory_slot *slot,
> > > > +					 unsigned long attrs,
> > > > +					 gfn_t start, gfn_t end)
> > > > +{
> > > > +	int ret;
> > > > +
> > > > +	ret = static_call(kvm_x86_update_mem_attr)(slot, attrs, start, end);
> > > > +	if (ret)
> > > > +		pr_warn_ratelimited("Failed to update GFN range 0x%llx-0x%llx with attributes 0x%lx. Ret: %d\n",
> > > > +				    start, end, attrs, ret);
> > > > +}
> > > >  #endif
> > > > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > > > index fdc59479b3e2..d200b8f45583 100644
> > > > --- a/include/linux/kvm_host.h
> > > > +++ b/include/linux/kvm_host.h
> > > > @@ -2330,6 +2330,10 @@ void kvm_arch_set_memory_attributes(struct kvm *kvm,
> > > >  				    struct kvm_memory_slot *slot,
> > > >  				    unsigned long attrs,
> > > >  				    gfn_t start, gfn_t end);
> > > > +void kvm_arch_post_set_memory_attributes(struct kvm *kvm,
> > > > +					 struct kvm_memory_slot *slot,
> > > > +					 unsigned long attrs,
> > > > +					 gfn_t start, gfn_t end);
> > > >  
> > > >  static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
> > > >  {
> > > > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > > > index b68574ff6c30..8ec985f1c57d 100644
> > > > --- a/virt/kvm/kvm_main.c
> > > > +++ b/virt/kvm/kvm_main.c
> > > > @@ -2561,6 +2561,32 @@ static void kvm_mem_attrs_changed(struct kvm *kvm, unsigned long attrs,
> > > >  		kvm_flush_remote_tlbs(kvm);
> > > >  }
> > > >  
> > > > +static void kvm_post_mem_attrs_changed(struct kvm *kvm, unsigned long attrs,
> > > > +				       gfn_t start_orig, gfn_t end_orig)
> > > > +{
> > > > +	struct kvm_memory_slot *slot;
> > > > +	struct kvm_memslots *slots;
> > > > +	struct kvm_memslot_iter iter;
> > > > +	int i;
> > > > +
> > > > +	for (i = 0; i < kvm_arch_nr_memslot_as_ids(kvm); i++) {
> > > > +		slots = __kvm_memslots(kvm, i);
> > > > +
> > > > +		kvm_for_each_memslot_in_gfn_range(&iter, slots, start_orig, end_orig) {
> > > > +			gfn_t start, end;
> > > > +
> > > > +			slot = iter.slot;
> > > > +			start = max(start_orig, slot->base_gfn);
> > > > +			end = min(end_orig, slot->base_gfn + slot->npages);
> > > > +
> > > > +			if (start >= end)
> > > > +				continue;
> > > > +
> > > > +			kvm_arch_post_set_memory_attributes(kvm, slot, attrs, start, end);
> > > > +		}
> > > > +	}
> > > > +}
> > > > +
> > > >  static int kvm_vm_ioctl_set_mem_attributes(struct kvm *kvm,
> > > >  					   struct kvm_memory_attributes *attrs)
> > > >  {
> > > > @@ -2602,6 +2628,9 @@ static int kvm_vm_ioctl_set_mem_attributes(struct kvm *kvm,
> > > >  	kvm_mmu_invalidate_end(kvm);
> > > >  	KVM_MMU_UNLOCK(kvm);
> > > >  
> > > > +	if (i > start)
> > > > +		kvm_post_mem_attrs_changed(kvm, attrs->attributes, start, i);
> > > > +  
> > > 
> > > Doesn't kvm_arch_set_memory_attributes() work for you? i.e the following patch.
> > > The error check and pr_warn_ratelimited() can be pushed down into the callback.  
> > 
> > This is originally how I had but when CONFIG_PREEMPT_COUNT is set this
> > will generate warnings for this callback as well as the invalidation
> > callback as reported in v7 here:
> > 
> >   https://lore.kernel.org/lkml/Y80vhKwQyw8hS%2F22@notebook/
> > 
> > The main issue is that kvm_mem_attrs_changed() is called while holding
> > the KVM MMU lock, which disables preemption. But when updating
> > attributes for SNP, we also need to remove private pages from kernel
> > directmap, which involves acquiring a mutex which results in
> > "BUG: scheduling while atomic" warnings.
> > 
> > So that's why we ended up somewhat duplicating some of the logic and
> > using a separate callback chain that happens out of KVM MMU lock.
> 
> Let's split the things of changing memory attributes:
> 
> 1) Update the memory attributes in the xa array (Both TDX and SNP)
> 2) Zapping the EPT/NPT mappings (Required by TDX)
> 3) Update RMP table (Required by SNP)
> 4) Update the directmap of kernel (SNP, but I guess TDX needs it as well)

I'm not so sure TDX requires this. I was under that impression, but
Kirill raised some doubts about this and I'm not sure it's been
confirmed. If it's purely an SNP thing then there may not be much value
in creating a separate callback for it:

  https://lore.kernel.org/linux-mm/20221031141426.GA3994099@chaop.bj.intel.com/T/#meba4ce80709cd3afd3818b61e6419fd800287b9e

And for SNP, the current code does the unmapping/RMP update in the same
function:

  [PATCH RFC v8 15/56] x86/sev: Invalidate pages from the direct map when adding them to the RMP table

I'm not against splitting RMP/directmap handling, but just want to
understand what the requirements are around that a bit better.

Does handling the #3 / RMP update / kvm_arch_post_set_memory_attributes
stuff outside of MMU lock cause issues on TDX side? What sort of
handling is needed in these callbacks for TDX (if anything)?

> 
> Does SNP really need to zap the NPT mappings when changing the memory
> attributes? (The new mappings will be created later in the fault). I don't
> find this requirement from APM.

I don't think we've added anything specifically for SNP. Do you mean the
generic kvm_unmap_gfn_range/kvm_flush_remote_tlbs sequence below?

  kvm_vm_ioctl_set_mem_attributes():
    KVM_MMU_LOCK(kvm)
    kvm_mmu_invalidate_begin()
    ...
    KVM_MMU_UNLOCK(kvm)

    kvm_vm_set_region_attr()  // xarray/attribute update

    ...
    KVM_MMU_LOCK(kvm)
    kvm_mem_attrs_changed():
      flush |= kvm_unmap_gfn_range()
      if (flush)
        kvm_flush_remote_tlbs()
    KVM_MMU_UNLOCK(kvm)

In general, when the RMPUPDATE instruction happens, the TLB entries for
the GPAs being modified will be flushed, so a subsequent nested page fault
should be able to obtain the updated mapping based on xarray/#NPF at that
point. In that respect *maybe* we don't need to zap the entries there.

But if the nested page fault occurs before the RMPUPDATE, I think we would
have a race if the above sequence isn't in place to handle the unmap/flush,
since in that case we might get a stale mapping because nothing would've
forced a tlbflush.

There's also stuff like the UPM selftests and SEV lazy-pinning where I
think that kvm_unmap_gfn_range() sequence is also needed. But I might be
misunderstanding the question here.

> If yes, can we postpone the update of the RMP table in the later fault,
> like TDX? So that we can save this update_mem_attr x86 ops as things
> will be solved in the SNP-specific fault handler.

Hmm, I think this would be possible. But it's nice to be able to handle
the RMPUPDATE as part of KVM_SET_MEMORY_ATTRIBUTES, since it allows
KVM MMU code to rely solely on xarray state and not have to query RMP
table to check if a particular PFN needs an RMPUPDATE before mapping it
into RMP table.

At least... it would *in theory*, if the RMPUPDATE happened under
protection of mmu_invalidate_seq (in which case it could inherit all the
same protections KVM MMU has around mmu_invalidate_seq/fault->mmu_seq,
e.g. letting the guest retry the #PF if fault->mmu_seq is stale).

But currently, RMPUPDATE (via kvm_arch_post_set_memory_attributes) happens
*after* the invalidation sequence above, so in theory a guest could fault
on a page just after xarray state is updated, but before the RMPUPDATE has
been done, in which case the KVM MMU code would properly map the page
according to the xarray, but since the RMPUPDATE wouldn't have happened yet, the
state of the corresponding PFN in RMP table won't match the shared/private
access type expected by the guest, so when it tries to access it, it will
get another #NPF with the RMP bit set in the error code, which will get
handled as a no-op in handle_rmp_page_fault() (patch #44), and it will loop
like this until the RMPUPDATE is finally done. So it still works out, but
the RMP table isn't kept as closely in sync with the xarray state as it could be.
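
To illustrate, the racy case looks roughly like this (a sketch of the flow as
I understand it, not a guarantee of exact ordering):

  vCPU (guest)                              VMM (KVM_SET_MEMORY_ATTRIBUTES)
  ------------                              -------------------------------
                                            invalidate begin/end + xarray -> private
  #NPF: KVM MMU maps PFN as private per xarray
  guest accesses the page
  #NPF with RMP bit set (RMP entry still shared)
  handle_rmp_page_fault(): no-op, guest retries
  ...loops...
                                            kvm_post_mem_attrs_changed():
                                              RMPUPDATE shared -> private
  retry succeeds, guest makes progress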

But deferring RMPUPDATE to fault time goes in the other direction of
that. Are there benefits/requirements for doing things this way for TDX?
I could see it being beneficial in terms of reducing overhead for
unneeded page-state transitions, since they would only be done on-demand,
but that doesn't seem like much overhead compared to some of the other
operations being done.

> 
> If no, guess we need a x86 ops to tell if a zapping is required.

Sorry, I don't think I quite understand the suggestion. What would this
zapping be covering vs. the invalidation sequence that currently happens
in kvm_vm_ioctl_set_mem_attributes()?

> 
> Back to the lock, updating RMP table doesn't require a mutex. Taking
> the lock is required when updating the directmap. both TDX/SNP requires
> this update the directmap when changing memory attributes.

Is that confirmed? If so, do you have a pointer to the associated
documentation? I'm a bit unclear on this due to the above-mentioned
discussion.

> 
> Wouldn't it better to factor the touching directmap of kernel part out?

It actually needs to happen before the RMPUPDATE. As soon as there is a
shared->private conversion in the RMP table for a particular PFN, then
any access via directmap by any particular kernel thread to any PFN that
happens to be in the same physical 2M range can cause an RMP fault on
the host, which would be fatal. So the rmpupdate() helper in this series
will unmap the directmap entry corresponding to the PFN before a shared->private
RMPUPDATE, and restore the mapping after a private->shared RMPUPDATE.

So we could still factor it out, but it would be something like:

  if (attr == private)
    kvm_unmap_directmap(start, end)
  kvm_mem_attrs_changed()
  if (attr == shared)
    kvm_map_directmap(start, end)

> 
> Then you can call the x86 ops.update_mem_attr() in kvm_mem_attrs_changed().
> And update the direct kernel mapping for both TDX/SNP in the
> kvm_post_mem_attrs_changed().

Or, adjusting for the above logic, move the unmapping/mapping to a new
kvm_pre_mem_attrs_changed() and kvm_post_mem_attrs_changed(), respectively.

Which seems pretty reasonable to me; there's a rough sketch of the resulting
flow below the list. Then we can:
 - drop duplicating the kvm_for_each_memslot_in_gfn_range() walk stuff because
   we'd just need to know what PFNs to map/unmap from directmap
   (although we'd still need a loop around kvm_restrictedmem_get_pfn()
   for the GFN range so not necessarily prettier)
 - call the RMPUPDATE / corresponding TDX handling via kvm_mem_attrs_changed()
   which brings it both under KVM MMU lock and also lets it piggyback
   off the fault->mmu_seq handling so it doesn't get out of sync with
   xarray during fault time.
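
Putting that together, the resulting flow would be something like (sketch
only; the pre/post helpers are the hypothetical new hooks named above):

  kvm_vm_ioctl_set_mem_attributes():
    if (attrs == private)
      kvm_pre_mem_attrs_changed()     // unmap directmap entries, can sleep

    KVM_MMU_LOCK(kvm)
    kvm_mmu_invalidate_begin()
    ...
    KVM_MMU_UNLOCK(kvm)

    kvm_vm_set_region_attr()          // xarray/attribute update

    KVM_MMU_LOCK(kvm)
    kvm_mem_attrs_changed()           // zap/flush + RMPUPDATE (or TDX handling)
    KVM_MMU_UNLOCK(kvm)

    if (attrs == shared)
      kvm_post_mem_attrs_changed()    // restore directmap entries, can sleep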

But it would be good to hear others' opinions on this, and also to confirm
whether TDX needs that pre/post directmap handling or not.

Thanks!

-Mike

> 
> > 
> > -Mike
> > 
> > > 
> > > From 7c618c1f3c236c382e64680efcbe7d8a672aa870 Mon Sep 17 00:00:00 2001
> > > Message-Id: <7c618c1f3c236c382e64680efcbe7d8a672aa870.1679114841.git.isaku.yamahata@intel.com>
> > > In-Reply-To: <428a676face7a06a90e59dca1c32941c9b6ee001.1679114841.git.isaku.yamahata@intel.com>
> > > References: <428a676face7a06a90e59dca1c32941c9b6ee001.1679114841.git.isaku.yamahata@intel.com>
> > > From: Isaku Yamahata <isaku.yamahata@intel.com>
> > > Date: Fri, 17 Mar 2023 12:00:09 -0700
> > > Subject: [PATCH 4/4] KVM: x86: Add 'set_mem_attr' x86 op
> > > 
> > > This callback will do any platform-specific handling needed for
> > > converting pages between shared/private.
> > > 
> > > Originally-by: Michael Roth <michael.roth@amd.com>
> > > Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
> > > ---
> > >  arch/x86/include/asm/kvm-x86-ops.h | 1 +
> > >  arch/x86/include/asm/kvm_host.h    | 2 ++
> > >  arch/x86/kvm/mmu/mmu.c             | 1 +
> > >  3 files changed, 4 insertions(+)
> > > 
> > > diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
> > > index dc5f18ac0bd5..956db2ee25a5 100644
> > > --- a/arch/x86/include/asm/kvm-x86-ops.h
> > > +++ b/arch/x86/include/asm/kvm-x86-ops.h
> > > @@ -100,6 +100,7 @@ KVM_X86_OP_OPTIONAL_RET0(set_identity_map_addr)
> > >  KVM_X86_OP_OPTIONAL_RET0(get_mt_mask)
> > >  KVM_X86_OP(load_mmu_pgd)
> > >  KVM_X86_OP(fault_is_private)
> > > +KVM_X86_OP_OPTIONAL(set_mem_attr)
> > >  KVM_X86_OP_OPTIONAL(link_private_spt)
> > >  KVM_X86_OP_OPTIONAL(free_private_spt)
> > >  KVM_X86_OP_OPTIONAL(split_private_spt)
> > > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > > index 0382d236fbf4..88e11dd3afde 100644
> > > --- a/arch/x86/include/asm/kvm_host.h
> > > +++ b/arch/x86/include/asm/kvm_host.h
> > > @@ -1731,6 +1731,8 @@ struct kvm_x86_ops {
> > >  	void (*load_mmu_pgd)(struct kvm_vcpu *vcpu, hpa_t root_hpa,
> > >  			     int root_level);
> > >  	bool (*fault_is_private)(struct kvm *kvm, gpa_t gpa, u64 error_code);
> > > +	void (*set_mem_attr)(struct kvm *kvm, struct kvm_memory_slot *slot,
> > > +			     unsigned int attr, gfn_t start, gfn_t end);
> > >  
> > >  	int (*link_private_spt)(struct kvm *kvm, gfn_t gfn, enum pg_level level,
> > >  				void *private_spt);
> > > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > > index 0ec94c72895c..329333486e64 100644
> > > --- a/arch/x86/kvm/mmu/mmu.c
> > > +++ b/arch/x86/kvm/mmu/mmu.c
> > > @@ -7908,6 +7908,7 @@ void kvm_arch_set_memory_attributes(struct kvm *kvm,
> > >  				    gfn_t start, gfn_t end)
> > >  {
> > >  	kvm_update_lpage_mixed_flag(kvm, slot, true, attrs, start, end);
> > > +	static_call(kvm_x86_set_mem_attr)(kvm, slot, attrs, start, end);
> > >  }
> > >  
> > >  void kvm_memory_attributes_create_memslot(struct kvm *kvm,
> > > -- 
> > > 2.25.1
> > > 
> > > -- 
> > > Isaku Yamahata <isaku.yamahata@gmail.com>  
> 
> 

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH RFC v8 02/56] KVM: x86: Add 'update_mem_attr' x86 op
  2023-03-22  1:58         ` Michael Roth
@ 2023-03-23 18:17           ` Zhi Wang
  2023-03-28  4:36             ` Michael Roth
  0 siblings, 1 reply; 147+ messages in thread
From: Zhi Wang @ 2023-03-23 18:17 UTC (permalink / raw)
  To: Michael Roth
  Cc: Isaku Yamahata, kvm, linux-coco, linux-mm, linux-crypto, x86,
	linux-kernel, tglx, mingo, jroedel, thomas.lendacky, hpa, ardb,
	pbonzini, seanjc, vkuznets, jmattson, luto, dave.hansen, slp,
	pgonda, peterz, srinivas.pandruvada, rientjes, dovmurik, tobin,
	bp, vbabka, kirill, ak, tony.luck, marcorr,
	sathyanarayanan.kuppuswamy, alpergun, dgilbert, jarkko,
	ashish.kalra, nikunj.dadhania

On Tue, 21 Mar 2023 20:58:38 -0500
Michael Roth <michael.roth@amd.com> wrote:

> On Tue, Mar 21, 2023 at 01:21:36PM +0200, Zhi Wang wrote:
> > On Mon, 20 Mar 2023 13:05:43 -0500
> > Michael Roth <michael.roth@amd.com> wrote:
> > 
> > > On Fri, Mar 17, 2023 at 09:56:11PM -0700, Isaku Yamahata wrote:
> > > > On Mon, Feb 20, 2023 at 12:37:53PM -0600,
> > > > Michael Roth <michael.roth@amd.com> wrote:
> > > >   
> > > > > This callback will do any platform-specific handling needed for
> > > > > converting pages between shared/private.
> > > > > 
> > > > > Signed-off-by: Michael Roth <michael.roth@amd.com>
> > > > > ---
> > > > >  arch/x86/include/asm/kvm-x86-ops.h |  1 +
> > > > >  arch/x86/include/asm/kvm_host.h    |  2 ++
> > > > >  arch/x86/kvm/mmu/mmu.c             | 13 +++++++++++++
> > > > >  include/linux/kvm_host.h           |  4 ++++
> > > > >  virt/kvm/kvm_main.c                | 29 +++++++++++++++++++++++++++++
> > > > >  5 files changed, 49 insertions(+)
> > > > > 
> > > > > diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
> > > > > index 72183da010b8..a8aaf532c2ab 100644
> > > > > --- a/arch/x86/include/asm/kvm-x86-ops.h
> > > > > +++ b/arch/x86/include/asm/kvm-x86-ops.h
> > > > > @@ -132,6 +132,7 @@ KVM_X86_OP(complete_emulated_msr)
> > > > >  KVM_X86_OP(vcpu_deliver_sipi_vector)
> > > > >  KVM_X86_OP_OPTIONAL_RET0(vcpu_get_apicv_inhibit_reasons);
> > > > >  KVM_X86_OP_OPTIONAL_RET0(fault_is_private);
> > > > > +KVM_X86_OP_OPTIONAL_RET0(update_mem_attr)
> > > > >  
> > > > >  #undef KVM_X86_OP
> > > > >  #undef KVM_X86_OP_OPTIONAL
> > > > > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > > > > index f856d689dda0..2da3fb2d5d1b 100644
> > > > > --- a/arch/x86/include/asm/kvm_host.h
> > > > > +++ b/arch/x86/include/asm/kvm_host.h
> > > > > @@ -1644,6 +1644,8 @@ struct kvm_x86_ops {
> > > > >  	void (*load_mmu_pgd)(struct kvm_vcpu *vcpu, hpa_t root_hpa,
> > > > >  			     int root_level);
> > > > >  	bool (*fault_is_private)(struct kvm *kvm, gpa_t gpa, u64 error_code, bool *private_fault);
> > > > > +	int (*update_mem_attr)(struct kvm_memory_slot *slot, unsigned int attr,
> > > > > +			       gfn_t start, gfn_t end);
> > > > >  
> > > > >  	bool (*has_wbinvd_exit)(void);
> > > > >  
> > > > > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > > > > index fb3f34b7391c..053bd77bbf52 100644
> > > > > --- a/arch/x86/kvm/mmu/mmu.c
> > > > > +++ b/arch/x86/kvm/mmu/mmu.c
> > > > > @@ -7251,4 +7251,17 @@ void kvm_arch_set_memory_attributes(struct kvm *kvm,
> > > > >  		linfo_update_mixed(gfn, slot, level, mixed);
> > > > >  	}
> > > > >  }
> > > > > +
> > > > > +void kvm_arch_post_set_memory_attributes(struct kvm *kvm,
> > > > > +					 struct kvm_memory_slot *slot,
> > > > > +					 unsigned long attrs,
> > > > > +					 gfn_t start, gfn_t end)
> > > > > +{
> > > > > +	int ret;
> > > > > +
> > > > > +	ret = static_call(kvm_x86_update_mem_attr)(slot, attrs, start, end);
> > > > > +	if (ret)
> > > > > +		pr_warn_ratelimited("Failed to update GFN range 0x%llx-0x%llx with attributes 0x%lx. Ret: %d\n",
> > > > > +				    start, end, attrs, ret);
> > > > > +}
> > > > >  #endif
> > > > > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > > > > index fdc59479b3e2..d200b8f45583 100644
> > > > > --- a/include/linux/kvm_host.h
> > > > > +++ b/include/linux/kvm_host.h
> > > > > @@ -2330,6 +2330,10 @@ void kvm_arch_set_memory_attributes(struct kvm *kvm,
> > > > >  				    struct kvm_memory_slot *slot,
> > > > >  				    unsigned long attrs,
> > > > >  				    gfn_t start, gfn_t end);
> > > > > +void kvm_arch_post_set_memory_attributes(struct kvm *kvm,
> > > > > +					 struct kvm_memory_slot *slot,
> > > > > +					 unsigned long attrs,
> > > > > +					 gfn_t start, gfn_t end);
> > > > >  
> > > > >  static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
> > > > >  {
> > > > > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > > > > index b68574ff6c30..8ec985f1c57d 100644
> > > > > --- a/virt/kvm/kvm_main.c
> > > > > +++ b/virt/kvm/kvm_main.c
> > > > > @@ -2561,6 +2561,32 @@ static void kvm_mem_attrs_changed(struct kvm *kvm, unsigned long attrs,
> > > > >  		kvm_flush_remote_tlbs(kvm);
> > > > >  }
> > > > >  
> > > > > +static void kvm_post_mem_attrs_changed(struct kvm *kvm, unsigned long attrs,
> > > > > +				       gfn_t start_orig, gfn_t end_orig)
> > > > > +{
> > > > > +	struct kvm_memory_slot *slot;
> > > > > +	struct kvm_memslots *slots;
> > > > > +	struct kvm_memslot_iter iter;
> > > > > +	int i;
> > > > > +
> > > > > +	for (i = 0; i < kvm_arch_nr_memslot_as_ids(kvm); i++) {
> > > > > +		slots = __kvm_memslots(kvm, i);
> > > > > +
> > > > > +		kvm_for_each_memslot_in_gfn_range(&iter, slots, start_orig, end_orig) {
> > > > > +			gfn_t start, end;
> > > > > +
> > > > > +			slot = iter.slot;
> > > > > +			start = max(start_orig, slot->base_gfn);
> > > > > +			end = min(end_orig, slot->base_gfn + slot->npages);
> > > > > +
> > > > > +			if (start >= end)
> > > > > +				continue;
> > > > > +
> > > > > +			kvm_arch_post_set_memory_attributes(kvm, slot, attrs, start, end);
> > > > > +		}
> > > > > +	}
> > > > > +}
> > > > > +
> > > > >  static int kvm_vm_ioctl_set_mem_attributes(struct kvm *kvm,
> > > > >  					   struct kvm_memory_attributes *attrs)
> > > > >  {
> > > > > @@ -2602,6 +2628,9 @@ static int kvm_vm_ioctl_set_mem_attributes(struct kvm *kvm,
> > > > >  	kvm_mmu_invalidate_end(kvm);
> > > > >  	KVM_MMU_UNLOCK(kvm);
> > > > >  
> > > > > +	if (i > start)
> > > > > +		kvm_post_mem_attrs_changed(kvm, attrs->attributes, start, i);
> > > > > +  
> > > > 
> > > > Doesn't kvm_arch_set_memory_attributes() work for you? i.e the following patch.
> > > > The error check and pr_warn_ratelimited() can be pushed down into the callback.  
> > > 
> > > This is originally how I had but when CONFIG_PREEMPT_COUNT is set this
> > > will generate warnings for this callback as well as the invalidation
> > > callback as reported in v7 here:
> > > 
> > >   https://lore.kernel.org/lkml/Y80vhKwQyw8hS%2F22@notebook/
> > > 
> > > The main issue is that kvm_mem_attrs_changed() is called while holding
> > > the KVM MMU lock, which disables preemption. But when updating
> > > attributes for SNP, we also need to remove private pages from kernel
> > > directmap, which involves acquiring a mutex which results in
> > > "BUG: scheduling while atomic" warnings.
> > > 
> > > So that's why we ended up somewhat duplicating some of the logic and
> > > using a separate callback chain that happens out of KVM MMU lock.
> > 
> > Let's split the things of changing memory attributes:
> > 
> > 1) Update the memory attributes in the xa array (Both TDX and SNP)
> > 2) Zapping the EPT/NPT mappings (Required by TDX)
> > 3) Update RMP table (Required by SNP)
> > 4) Update the directmap of kernel (SNP, but I guess TDX needs it as well)
> 

Thanks for the detailed reply, it is very informative.

> I'm not so sure TDX requires this. I was under that impression, but
> Kirill raised some doubts about this and I'm not sure it's been
> confirmed. If it's purely an SNP thing then there may not be much value
> in creating a separate callback for it:
> 
>   https://lore.kernel.org/linux-mm/20221031141426.GA3994099@chaop.bj.intel.com/T/#meba4ce80709cd3afd3818b61e6419fd800287b9e
> 

Hmm, Kirill and Isaku, can you confirm that TDX doesn't need this?

I think it is a generic requirement for both TDX and SNP that the host is
not expected to touch a private page, either from the kernel or from userspace.

> And for SNP, the current code does the unmapping/RMP update in the same
> function:
> 
>   [PATCH RFC v8 15/56] x86/sev: Invalidate pages from the direct map when adding them to the RMP table
> 
> I'm not against splitting RMP/directmap handling, but just want to
> understand what the requirements are around that a bit better.
> 
> Does handling the #3 / RMP update / kvm_arch_post_set_memory_attributes
> stuff outside of MMU lock cause issues on TDX side? What sort of
> handling is needed in these callbacks for TDX (if anything)?
> 

No, it doesn't cause a problem for TDX, as TDX doesn't need such a callback.

Unlike SNP, which has (1 NPT + 1 RMP) with the enforced HW check done by the RMP, TDX has
two EPTs (similar to NPT): 1 shared + 1 private. Converting the memory attribute is achieved
by zapping the mapping from one EPT and creating the mapping in the other one in the fault
path when the guest accesses the memory. The faulting GPA will carry the "SHARED" bit (!C-BIT),
so KVM knows which EPT should be chosen for populating the mapping.
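
Conceptually, the TDX fault path does something like this (pseudocode only;
the names here are illustrative, not the actual ones in the TDX series):

  handle_ept_violation(gpa):
    if (gpa & SHARED_BIT)
      populate the mapping in the shared EPT
    else
      populate the mapping in the private EPT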

I was trying to figure out what the proper callback should be, and at which layer it should
sit, to handle changing memory attributes for both TDX and SNP. The current callback looks a
little bit hacky: duplicated code paths caused by the locking imply the SW structure might
need to be re-considered.

> > 
> > Does SNP really need to zap the NPT mappings when changing the memory
> > attributes? (The new mappings will be created later in the fault). I don't
> > find this requirement from APM.
> 
> I don't think we've added anything specifically for SNP. Do you mean the
> generic kvm_unmap_gfn_range/kvm_flush_remote_tlbs sequence below?
> 
>   kvm_vm_ioctl_set_mem_attributes():
>     KVM_MMU_LOCK(kvm)
>     kvm_mmu_invalidate_begin()
>     ...
>     KVM_MMU_UNLOCK(kvm)
> 
>     kvm_vm_set_region_attr()  // xarray/attribute update
> 
>     ...
>     KVM_MMU_LOCK(kvm)
>     kvm_mem_attrs_changed():
>       flush |= kvm_unmap_gfn_range()
>       if (flush)
>         kvm_flush_remote_tlbs()
>     KVM_MMU_UNLOCK(kvm)
>

Yes, I was talking about the sequence above. I was confused about why changing
the RMP requires a zap-and-recreate flow for the NPT in SNP.

> In general, when the RMPUPDATE instruction happens, the TLB entries for
> the GPAs being modified will be flushed, so subsequent nested page fault
> should be able to obtain the updated mapping based on xarray/#NPF at that
> point. In that respect *maybe* we don't need to zap the entries there.
> 
> But if the nested page fault occurs before the RMPUPDATE, I think we would
> have a race if the above sequence isn't in place to handle the unmap/flush,
> since in that case we might get a stale mapping because nothing would've
> forced a tlbflush.
> 
> There's also stuff like the UPM selftests and SEV lazy-pinning where I
> think that kvm_unmap_gfn_range() sequence is also needed. But I might be
> misunderstanding the question here.
> 

In this case, would an extra TLB flush solve it? Still, the unnecessary
zapping/re-creating of mappings is not appealing. I understand that the way
this patch goes is probably meant to minimize the changes, but it would be
nice to focus more on what is really needed in a common path and abstract
and re-factor from there.

Can you elaborate on how the lazy-pinning unpin path is connected
to the zapping here, so that I can dig into it more?

The selftests are a minor case; I guess we could deal with them via a switch,
e.g. a knob in debugfs.

> > If yes, can we postpone the update of the RMP table in the later fault,
> > like TDX? So that we can save this update_mem_attr x86 ops as things
> > will be solved in the SNP-specific fault handler.
> 
> Hmm, I think this would be possible. But it's nice to be able to handle
> the RMPUPDATE as part of KVM_SET_MEMORY_ATTRIBUTES, since it allows
> KVM MMU code to rely solely on xarray state and not have to query RMP
> table to check if a particular PFN needs an RMPUPDATE before mapping it
> into RMP table.
> 
> At least... it would *in theory*, if the RMPUPDATE happened under
> protection of mmu_invalidate_seq (in which case it could inherit all the
> same protections KVM MMU has around mmu_invalidate_seq/fault->mmu_seq,
> e.g. letting the guest retry the #PF if fault->mmu_seq is stale).
> 
> But currently, RMPUPDATE (via kvm_arch_post_set_memory_attributes) happens
> *after* the invalidation sequence above, so in theory a guest could fault
> on a page just after xarray state is updated, but before the RMPUPDATE has
> been done, in which case the KVM MMU code would properly map the page
> accordingly to xarray, but since RMPUPDATE wouldn't have happened yet, the
> state of the corresponding PFN in RMP table won't match the shared/private
> access type expected by the guest, so when it tries to access it it will
> get another #NPF with RMP bit set in the error code, which will get
> handled as a no-op in handle_rmp_page_fault() (patch #44) and loop like
> this until the RMPUPDATE is finally done. So it still works out, but
> maybe not keeping as much in sync with xarray state and could be.
> 

I see. The RMP fault handler only deals with page-size mismatches for now.

> But deferring RMPUPDATE to fault time goes in the other direction of
> that. Are there benefits/requirements for doing things this way for TDX?
> I could see it being beneficial in terms of reducing overhead for
> uneeded page-state transitions, since they are only done on-demand but
> doesn't seem like it would be that much overhead compared to some of the
> other operations being done.
> 

Besides the HW design, I guess one major purpose is to optimize the
boot time of VMs with large memory. Post-migration could be another case.

Out of curiosity, what is the average cost of RMPUPDATE? I assume it is an x86
instruction and doesn't go through PSP firmware.

> > 
> > If no, guess we need a x86 ops to tell if a zapping is required.
> 
> Sorry don't think I quite understand the suggestion. What would this
> zapping be covering vs. the invalidation sequence that currently happens
> in kvm_vm_ioctl_set_mem_attributes()?

I was thinking that zapping the mapping in the EPT/NPT was required by TDX,
while SNP might only need an RMP update + TLB flush. Thus, the abstraction
of kvm_x86_ops.update_mem_attr should sit at that level. But let's
scratch this for now, as I need to dig more into the lazy-pinning stuff.

> 
> > 
> > Back to the lock, updating RMP table doesn't require a mutex. Taking
> > the lock is required when updating the directmap. both TDX/SNP requires
> > this update the directmap when changing memory attributes.
> 
> Is that confirmed? If so, do you have a pointer to the associated
> documentation? I'm a bit unclear on this due to above-mentioned
> discussion.
> 
> > 
> > Wouldn't it better to factor the touching directmap of kernel part out?
> 
> It actually needs to happen before the RMPUPDATE. As soon as there is a
> shared->private conversion in the RMP table for a particular PFN, then
> any access via directmap by any particular kernel thread to any PFN that
> happens to be in the same physical 2M range can cause an RMP fault on
> the host, which would be fatal. So the rmpupdate() helper in this series
> will unmap directmap entry corresponding the PFN before a shared->private
> RMPUPDATE, and restore mappings after private->shared RMPUPDATE
> 
> So we could still factor it out, but it would be something like:
> 
>   if (attr == private)
>     kvm_unmap_directmap(start, end)
>   kvm_mem_attrs_changed()
>   if (attr == shared)
>     kvm_map_directmap(start, end)
> 
> > 
> > Then you can call the x86 ops.update_mem_attr() in kvm_mem_attrs_changed().
> > And update the direct kernel mapping for both TDX/SNP in the
> > kvm_post_mem_attrs_changed().
> 
> Or, adjusting for the above logic, move the unmapping/mapping to a new
> kvm_pre_mem_attrs_changed() and kvm_post_mem_attrs_changed(), respectively.
> 
> Which seems pretty reasonable to me. Then we can:
>  - drop duplicating the kvm_for_each_memslot_in_gfn_range() walk stuff because
>    we'd just need to know what PFNs to map/unmap from directmap
>    (although we'd still need a loop around kvm_restrictedmem_get_pfn()
>    for the GFN range so not necessarily prettier)
>  - call the RMPUPDATE / corresponding TDX handling via kvm_mem_attrs_changed()
>    which brings it both under KVM MMU lock and also let's it piggyback
>    off the fault->mmu_seq handling so it doesn't get out of sync with
>    xarray during fault time.
> 

That sounds better. I am just a little bit worried that update_mem_attr() will
end up as an SNP-only callback.

> But would be good to hear others' opinions on this. And also confirm
> whether TDX needs that pre/post directmap handle or not.

Yes.

> 
> Thanks!
> 
> -Mike
> 
> > 
> > > 
> > > -Mike
> > > 
> > > > 
> > > > From 7c618c1f3c236c382e64680efcbe7d8a672aa870 Mon Sep 17 00:00:00 2001
> > > > Message-Id: <7c618c1f3c236c382e64680efcbe7d8a672aa870.1679114841.git.isaku.yamahata@intel.com>
> > > > In-Reply-To: <428a676face7a06a90e59dca1c32941c9b6ee001.1679114841.git.isaku.yamahata@intel.com>
> > > > References: <428a676face7a06a90e59dca1c32941c9b6ee001.1679114841.git.isaku.yamahata@intel.com>
> > > > From: Isaku Yamahata <isaku.yamahata@intel.com>
> > > > Date: Fri, 17 Mar 2023 12:00:09 -0700
> > > > Subject: [PATCH 4/4] KVM: x86: Add 'set_mem_attr' x86 op
> > > > 
> > > > This callback will do any platform-specific handling needed for
> > > > converting pages between shared/private.
> > > > 
> > > > Originally-by: Michael Roth <michael.roth@amd.com>
> > > > Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
> > > > ---
> > > >  arch/x86/include/asm/kvm-x86-ops.h | 1 +
> > > >  arch/x86/include/asm/kvm_host.h    | 2 ++
> > > >  arch/x86/kvm/mmu/mmu.c             | 1 +
> > > >  3 files changed, 4 insertions(+)
> > > > 
> > > > diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
> > > > index dc5f18ac0bd5..956db2ee25a5 100644
> > > > --- a/arch/x86/include/asm/kvm-x86-ops.h
> > > > +++ b/arch/x86/include/asm/kvm-x86-ops.h
> > > > @@ -100,6 +100,7 @@ KVM_X86_OP_OPTIONAL_RET0(set_identity_map_addr)
> > > >  KVM_X86_OP_OPTIONAL_RET0(get_mt_mask)
> > > >  KVM_X86_OP(load_mmu_pgd)
> > > >  KVM_X86_OP(fault_is_private)
> > > > +KVM_X86_OP_OPTIONAL(set_mem_attr)
> > > >  KVM_X86_OP_OPTIONAL(link_private_spt)
> > > >  KVM_X86_OP_OPTIONAL(free_private_spt)
> > > >  KVM_X86_OP_OPTIONAL(split_private_spt)
> > > > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > > > index 0382d236fbf4..88e11dd3afde 100644
> > > > --- a/arch/x86/include/asm/kvm_host.h
> > > > +++ b/arch/x86/include/asm/kvm_host.h
> > > > @@ -1731,6 +1731,8 @@ struct kvm_x86_ops {
> > > >  	void (*load_mmu_pgd)(struct kvm_vcpu *vcpu, hpa_t root_hpa,
> > > >  			     int root_level);
> > > >  	bool (*fault_is_private)(struct kvm *kvm, gpa_t gpa, u64 error_code);
> > > > +	void (*set_mem_attr)(struct kvm *kvm, struct kvm_memory_slot *slot,
> > > > +			     unsigned int attr, gfn_t start, gfn_t end);
> > > >  
> > > >  	int (*link_private_spt)(struct kvm *kvm, gfn_t gfn, enum pg_level level,
> > > >  				void *private_spt);
> > > > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > > > index 0ec94c72895c..329333486e64 100644
> > > > --- a/arch/x86/kvm/mmu/mmu.c
> > > > +++ b/arch/x86/kvm/mmu/mmu.c
> > > > @@ -7908,6 +7908,7 @@ void kvm_arch_set_memory_attributes(struct kvm *kvm,
> > > >  				    gfn_t start, gfn_t end)
> > > >  {
> > > >  	kvm_update_lpage_mixed_flag(kvm, slot, true, attrs, start, end);
> > > > +	static_call(kvm_x86_set_mem_attr)(kvm, slot, attrs, start, end);
> > > >  }
> > > >  
> > > >  void kvm_memory_attributes_create_memslot(struct kvm *kvm,
> > > > -- 
> > > > 2.25.1
> > > > 
> > > > -- 
> > > > Isaku Yamahata <isaku.yamahata@gmail.com>  
> > 
> > 


^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH RFC v8 36/56] KVM: SVM: Add KVM_SEV_SNP_LAUNCH_FINISH command
  2023-02-20 18:38 ` [PATCH RFC v8 36/56] KVM: SVM: Add KVM_SEV_SNP_LAUNCH_FINISH command Michael Roth
@ 2023-03-24 14:40   ` Alexander Graf
  0 siblings, 0 replies; 147+ messages in thread
From: Alexander Graf @ 2023-03-24 14:40 UTC (permalink / raw)
  To: Michael Roth, kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	Brijesh Singh, Harald Hoyer

On 20.02.23 19:38, Michael Roth wrote:
> From: Brijesh Singh <brijesh.singh@amd.com>
>
> The KVM_SEV_SNP_LAUNCH_FINISH command finalizes the cryptographic digest and
> stores it as the measurement of the guest at launch.
>
> While finalizing the launch flow, it also issues the LAUNCH_UPDATE command
> to encrypt the VMSA pages.
>
> If it's an SNP guest, then the VMSA was added in the RMP entry as
> a guest-owned page and also removed from the kernel direct map,
> so flush it later, after it is transitioned back to hypervisor
> state and restored in the direct map.
>
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> Signed-off-by: Harald Hoyer <harald@profian.com>
> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> Signed-off-by: Michael Roth <michael.roth@amd.com>
> ---
>   .../virt/kvm/x86/amd-memory-encryption.rst    |  23 ++++
>   arch/x86/kvm/svm/sev.c                        | 122 ++++++++++++++++++
>   include/uapi/linux/kvm.h                      |  14 ++
>   3 files changed, 159 insertions(+)
>
[...]


> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 03dd227f6090..515e22d0dc30 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -2280,6 +2280,109 @@ static int snp_launch_update(struct kvm *kvm, struct kvm_sev_cmd *argp)
>                                        snp_launch_update_gfn_handler, argp);
>   }
>
> +static int snp_launch_update_vmsa(struct kvm *kvm, struct kvm_sev_cmd *argp)
> +{
> +       struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> +       struct sev_data_snp_launch_update data = {};
> +       struct kvm_vcpu *vcpu;
> +       unsigned long i;
> +       int ret;
> +
> +       data.gctx_paddr = __psp_pa(sev->snp_context);
> +       data.page_type = SNP_PAGE_TYPE_VMSA;
> +
> +       kvm_for_each_vcpu(i, vcpu, kvm) {
> +               struct vcpu_svm *svm = to_svm(vcpu);
> +               u64 pfn = __pa(svm->sev_es.vmsa) >> PAGE_SHIFT;
> +
> +               /* Perform some pre-encryption checks against the VMSA */
> +               ret = sev_es_sync_vmsa(svm);
> +               if (ret)
> +                       return ret;
> +
> +               /* Transition the VMSA page to a firmware state. */
> +               ret = rmp_make_private(pfn, -1, PG_LEVEL_4K, sev->asid, true);
> +               if (ret)
> +                       return ret;
> +
> +               /* Issue the SNP command to encrypt the VMSA */
> +               data.address = __sme_pa(svm->sev_es.vmsa);
> +               ret = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_LAUNCH_UPDATE,
> +                                     &data, &argp->error);


There is no contract in KVM that dictates that the first entry in the 
vcpu list needs to be vcpu_id==0 (BSP). That means if you use a user 
space that spawns vCPUs in parallel on init, you will end up with the 
BSP behind APs in the LAUNCH_UPDATE order.

This is a problem because for LAUNCH_UPDATE, the order matters. BSP and 
AP vCPUs have different initial state and so if you want to reconstruct 
the launch digest, you need to ensure that the guest knows the order.

The easiest way I can think of to fix this is to call 
snp_launch_update_vmsa twice: Once filtering for vcpu_id == 0 and once 
for vcpu_id != 0.
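
Something along these lines (completely untested sketch; snp_launch_update_vmsa_one()
is a made-up helper standing in for the per-vCPU body of the existing loop):

        /* First pass: BSP only, so it always lands first in the launch digest. */
        kvm_for_each_vcpu(i, vcpu, kvm) {
                if (vcpu->vcpu_id != 0)
                        continue;
                ret = snp_launch_update_vmsa_one(kvm, argp, vcpu);
                if (ret)
                        return ret;
        }

        /* Second pass: all the APs. */
        kvm_for_each_vcpu(i, vcpu, kvm) {
                if (vcpu->vcpu_id == 0)
                        continue;
                ret = snp_launch_update_vmsa_one(kvm, argp, vcpu);
                if (ret)
                        return ret;
        }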


Thanks,

Alex





Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879



^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH RFC v8 04/56] KVM: Add HVA range operator
  2023-02-20 21:37   ` Zhi Wang
@ 2023-03-27  0:34     ` Michael Roth
  2023-04-04 14:40       ` Zhi Wang
  0 siblings, 1 reply; 147+ messages in thread
From: Michael Roth @ 2023-03-27  0:34 UTC (permalink / raw)
  To: Zhi Wang
  Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	Vishal Annapurve

On Mon, Feb 20, 2023 at 11:37:09PM +0200, Zhi Wang wrote:
> On Mon, 20 Feb 2023 12:37:55 -0600
> Michael Roth <michael.roth@amd.com> wrote:
> 
> > From: Vishal Annapurve <vannapurve@google.com>
> > 
> > Introduce HVA range operator so that other KVM subsystems
> > can operate on HVA range.
> > 
> > Signed-off-by: Vishal Annapurve <vannapurve@google.com>
> > [mdr: minor checkpatch alignment fixups]
> > Signed-off-by: Michael Roth <michael.roth@amd.com>
> > ---
> >  include/linux/kvm_host.h |  6 +++++
> >  virt/kvm/kvm_main.c      | 48 ++++++++++++++++++++++++++++++++++++++++
> >  2 files changed, 54 insertions(+)
> > 
> > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > index 4d542060cd93..c615650ed256 100644
> > --- a/include/linux/kvm_host.h
> > +++ b/include/linux/kvm_host.h
> > @@ -1402,6 +1402,12 @@ void kvm_mmu_invalidate_begin(struct kvm *kvm);
> >  void kvm_mmu_invalidate_range_add(struct kvm *kvm, gfn_t start, gfn_t end);
> >  void kvm_mmu_invalidate_end(struct kvm *kvm);
> >  
> > +typedef int (*kvm_hva_range_op_t)(struct kvm *kvm,
> > +				struct kvm_gfn_range *range, void *data);
> > +
> > +int kvm_vm_do_hva_range_op(struct kvm *kvm, unsigned long hva_start,
> > +			   unsigned long hva_end, kvm_hva_range_op_t handler, void *data);
> > +
> >  long kvm_arch_dev_ioctl(struct file *filp,
> >  			unsigned int ioctl, unsigned long arg);
> >  long kvm_arch_vcpu_ioctl(struct file *filp,
> > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > index f7e00593cc5d..4ccd655dd5af 100644
> > --- a/virt/kvm/kvm_main.c
> > +++ b/virt/kvm/kvm_main.c
> > @@ -642,6 +642,54 @@ static __always_inline int __kvm_handle_hva_range(struct kvm *kvm,
> >  	return (int)ret;
> >  }
> >  
> 
> Below function seems a reduced duplicate of __kvm_handle_hva_range()
> in virt/kvm/kvm_main.c. It would be nice to factor __kvm_handle_hva_range().

A few differences make it difficult to refactor this clearly:

  - This handler is mainly used for loading initial contents into the guest
    image before booting and doesn't rely on the MMU lock being held. It
    also *can't* be called with the MMU lock held, because it suffers from the
    same issue as the mem_attr_update() hook, where it needs to take a
    mutex as part of unmapping from the directmap when transitioning a page to
    the private state in the RMP table.
  - This handler wants to return an error code, as opposed to the existing
    handlers, which return true/false values that are passed along to the
    MMU notifier call-site and handled differently.
  - This handler wants to terminate iterating through memslots as soon
    as it encounters the first failure, whereas the existing handlers
    expect to be called for each slot regardless of return value.

So it's a pretty different use-case, and it adds enough complexity to
__kvm_handle_hva_range() that it might not be worth refactoring it,
since it complicates some bits that are closely tied to dealing with
invalidations, where any extra complexity probably needs to be
well-warranted.

I took a stab at it here for reference, but even with what seems to be
the minimal set of changes it doesn't save on any code and ultimately I
think it makes it harder to make sense of what's going on:

  https://github.com/mdroth/linux/commit/976c5fb708f7babe899fd80e27e19f8ba3f6818d

Is there a better approach?

Thanks,

-Mike

> 
> > +int kvm_vm_do_hva_range_op(struct kvm *kvm, unsigned long hva_start,
> > +			   unsigned long hva_end, kvm_hva_range_op_t handler, void *data)
> > +{
> > +	int ret = 0;
> > +	struct kvm_gfn_range gfn_range;
> > +	struct kvm_memory_slot *slot;
> > +	struct kvm_memslots *slots;
> > +	int i, idx;
> > +
> > +	if (WARN_ON_ONCE(hva_end <= hva_start))
> > +		return -EINVAL;
> > +
> > +	idx = srcu_read_lock(&kvm->srcu);
> > +
> > +	for (i = 0; i < kvm_arch_nr_memslot_as_ids(kvm); i++) {
> > +		struct interval_tree_node *node;
> > +
> > +		slots = __kvm_memslots(kvm, i);
> > +		kvm_for_each_memslot_in_hva_range(node, slots,
> > +						  hva_start, hva_end - 1) {
> > +			unsigned long start, end;
> > +
> > +			slot = container_of(node, struct kvm_memory_slot,
> > +					    hva_node[slots->node_idx]);
> > +			start = max(hva_start, slot->userspace_addr);
> > +			end = min(hva_end, slot->userspace_addr +
> > +						  (slot->npages << PAGE_SHIFT));
> > +
> > +			/*
> > +			 * {gfn(page) | page intersects with [hva_start, hva_end)} =
> > +			 * {gfn_start, gfn_start+1, ..., gfn_end-1}.
> > +			 */
> > +			gfn_range.start = hva_to_gfn_memslot(start, slot);
> > +			gfn_range.end = hva_to_gfn_memslot(end + PAGE_SIZE - 1, slot);
> > +			gfn_range.slot = slot;
> > +
> > +			ret = handler(kvm, &gfn_range, data);
> > +			if (ret)
> > +				goto e_ret;
> > +		}
> > +	}
> > +
> > +e_ret:
> > +	srcu_read_unlock(&kvm->srcu, idx);
> > +
> > +	return ret;
> > +}
> > +
> >  static __always_inline int kvm_handle_hva_range(struct mmu_notifier *mn,
> >  						unsigned long start,
> >  						unsigned long end,
> 

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH RFC v8 02/56] KVM: x86: Add 'update_mem_attr' x86 op
  2023-03-23 18:17           ` Zhi Wang
@ 2023-03-28  4:36             ` Michael Roth
  2023-03-28 23:00               ` Zhi Wang
  0 siblings, 1 reply; 147+ messages in thread
From: Michael Roth @ 2023-03-28  4:36 UTC (permalink / raw)
  To: Zhi Wang
  Cc: Isaku Yamahata, kvm, linux-coco, linux-mm, linux-crypto, x86,
	linux-kernel, tglx, mingo, jroedel, thomas.lendacky, hpa, ardb,
	pbonzini, seanjc, vkuznets, jmattson, luto, dave.hansen, slp,
	pgonda, peterz, srinivas.pandruvada, rientjes, dovmurik, tobin,
	bp, vbabka, kirill, ak, tony.luck, marcorr,
	sathyanarayanan.kuppuswamy, alpergun, dgilbert, jarkko,
	ashish.kalra, nikunj.dadhania

On Thu, Mar 23, 2023 at 08:17:16PM +0200, Zhi Wang wrote:
> On Tue, 21 Mar 2023 20:58:38 -0500
> Michael Roth <michael.roth@amd.com> wrote:
> 
> > On Tue, Mar 21, 2023 at 01:21:36PM +0200, Zhi Wang wrote:
> > > On Mon, 20 Mar 2023 13:05:43 -0500
> > > Michael Roth <michael.roth@amd.com> wrote:
> > > 
> > > > On Fri, Mar 17, 2023 at 09:56:11PM -0700, Isaku Yamahata wrote:
> > > > > On Mon, Feb 20, 2023 at 12:37:53PM -0600,
> > > > > Michael Roth <michael.roth@amd.com> wrote:
> > > > >   
> > > > > > This callback will do any platform-specific handling needed for
> > > > > > converting pages between shared/private.
> > > > > > 
> > > > > > Signed-off-by: Michael Roth <michael.roth@amd.com>
> > > > > > ---

<snip>

> > > > > >  static int kvm_vm_ioctl_set_mem_attributes(struct kvm *kvm,
> > > > > >  					   struct kvm_memory_attributes *attrs)
> > > > > >  {
> > > > > > @@ -2602,6 +2628,9 @@ static int kvm_vm_ioctl_set_mem_attributes(struct kvm *kvm,
> > > > > >  	kvm_mmu_invalidate_end(kvm);
> > > > > >  	KVM_MMU_UNLOCK(kvm);
> > > > > >  
> > > > > > +	if (i > start)
> > > > > > +		kvm_post_mem_attrs_changed(kvm, attrs->attributes, start, i);
> > > > > > +  
> > > > > 
> > > > > Doesn't kvm_arch_set_memory_attributes() work for you? i.e the following patch.
> > > > > The error check and pr_warn_ratelimited() can be pushed down into the callback.  
> > > > 
> > > > This is originally how I had but when CONFIG_PREEMPT_COUNT is set this
> > > > will generate warnings for this callback as well as the invalidation
> > > > callback as reported in v7 here:
> > > > 
> > > >   https://lore.kernel.org/lkml/Y80vhKwQyw8hS%2F22@notebook/
> > > > 
> > > > The main issue is that kvm_mem_attrs_changed() is called while holding
> > > > the KVM MMU lock, which disables preemption. But when updating
> > > > attributes for SNP, we also need to remove private pages from kernel
> > > > directmap, which involves acquiring a mutex which results in
> > > > "BUG: scheduling while atomic" warnings.
> > > > 
> > > > So that's why we ended up somewhat duplicating some of the logic and
> > > > using a separate callback chain that happens out of KVM MMU lock.
> > > 
> > > Let's split the things of changing memory attributes:
> > > 
> > > 1) Update the memory attributes in the xa array (Both TDX and SNP)
> > > 2) Zapping the EPT/NPT mappings (Required by TDX)
> > > 3) Update RMP table (Required by SNP)
> > > 4) Update the directmap of kernel (SNP, but I guess TDX needs it as well)
> > 
> 
> Thanks for the effort of detailed reply. It is very informative.
> 
> > I'm not so sure TDX requires this. I was under that impression, but
> > Kirill raised some doubts about this and I'm not sure it's been
> > confirmed. If it's purely an SNP thing then there may not be much value
> > in creating a separate callback for it:
> > 
> >   https://lore.kernel.org/linux-mm/20221031141426.GA3994099@chaop.bj.intel.com/T/#meba4ce80709cd3afd3818b61e6419fd800287b9e
> > 
> 
> Hmm, Krill and Isaku, can you confirm that TDX doesn't need this?
> 
> I think it is a generic requirement that TDX/SNP are not expecting the
> host to touch a private page either from the kernel or the userspace. 

The main issue is that in the case of the RMP table, a write to a 2M
mapping is interpreted as a write to a private page even if the write
didn't actually touch any private pages within the range. It may be
possible in firmware/hardware to distinguish between these two cases, but I'm
not sure whether that's the case for TDX or not.

> 
> > And for SNP, the current code does the unmapping/RMP update in the same
> > function:
> > 
> >   [PATCH RFC v8 15/56] x86/sev: Invalidate pages from the direct map when adding them to the RMP table
> > 
> > I'm not against splitting RMP/directmap handling, but just want to
> > understand what the requirements are around that a bit better.
> > 
> > Does handling the #3 / RMP update / kvm_arch_post_set_memory_attributes
> > stuff outside of MMU lock cause issues on TDX side? What sort of
> > handling is needed in these callbacks for TDX (if anything)?
> > 
> 
> No, it doesn't cause problem for TDX as TDX doesn't need such callback.
> 
> Unlike SNP, which has (1 NPT + 1 RMP) and the enforced HW check is done by RMP, TDX has
> two EPT(smiliar with NPT)s (1 shared + 1 private). Converting the memory attr is achieved
> by zapping the mapping from one EPT and creating the mapping in the other one in the fault
> when guest access the memory. The fault GPA will carry the "SHARED" bit (!C-BIT), so
> KVM knows which EPT should be chosen for populating the mapping.
> 
> I was trying to figure out what should be the proper callback and at which layer it should
> sit for achieving changing memory attr for both TDX/SNP. The current callback looks a little
> bit hacky. Duplicate code pieces because of locks implies the SW structure might need to be
> re-considered.
> 
> > > 
> > > Does SNP really need to zap the NPT mappings when changing the memory
> > > attributes? (The new mappings will be created later in the fault). I don't
> > > find this requirement from APM.
> > 
> > I don't think we've added anything specifically for SNP. Do you mean the
> > generic kvm_unmap_gfn_range/kvm_flush_remote_tlbs sequence below?
> > 
> >   kvm_vm_ioctl_set_mem_attributes():
> >     KVM_MMU_LOCK(kvm)
> >     kvm_mmu_invalidate_begin()
> >     ...
> >     KVM_MMU_UNLOCK(kvm)
> > 
> >     kvm_vm_set_region_attr()  // xarray/attribute update
> > 
> >     ...
> >     KVM_MMU_LOCK(kvm)
> >     kvm_mem_attrs_changed():
> >       flush |= kvm_unmap_gfn_range()
> >       if (flush)
> >         kvm_flush_remote_tlbs()
> >     KVM_MMU_UNLOCK(kvm)
> >
> 
> Yes, I was talking about the sequence above. I was confused of why changing
> RMP requires a zapping-recreating flow of NPT in SNP.

Hmm, so you're suggesting doing something like moving the
kvm_unmap_gfn_range()/kvm_flush_remote_tlbs() calls into the .update_mem_attr()
callback so the platform can decide if it needs the zap/flush rather
than handling it in generic code?

If SNP really didn't need the handling it's an interesting thought, but
I think it is needed there after all...

> 
> > In general, when the RMPUPDATE instruction happens, the TLB entries for
> > the GPAs being modified will be flushed, so subsequent nested page fault
> > should be able to obtain the updated mapping based on xarray/#NPF at that
> > point. In that respect *maybe* we don't need to zap the entries there.
> > 
> > But if the nested page fault occurs before the RMPUPDATE, I think we would
> > have a race if the above sequence isn't in place to handle the unmap/flush,
> > since in that case we might get a stale mapping because nothing would've
> > forced a tlbflush.
> > 
> > There's also stuff like the UPM selftests and SEV lazy-pinning where I
> > think that kvm_unmap_gfn_range() sequence is also needed. But I might be
> > misunderstanding the question here.
> > 
> 
> In this case, would an extra tlbflush solve it? Still, the unnecessary
> zapping/recreating of mappings is not promising. I understand that the way
> this patch goes is probably to minimize the changes, but it would be
> nice to focus more on what is really needed in a common path and abstract
> and re-factor from there.

I tested this and just doing the tlbflush is insufficient. If the
entries are still present in the nested page table, the HV doesn't necessarily
get an #NPF, so the guest can still end up with a stale mapping. After a
shared->private conversion the guest would cause an #NPF with the RMP/ENC bits
set once it tries to access it as a private page, but for a
private->shared conversion the guest can subsequently access the page
with ENC-bit=0 without causing an #NPF. In that case the GFN can still
be mapped to the PFN restrictedmem was using to back it when it was in
the private state, instead of normal non-restricted memory.
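
In other words, a rough (hypothetical) trace of the problematic ordering
without the zap would be:

  guest: requests private -> shared conversion for a GFN
  host:  xarray attribute updated, RMPUPDATE marks the PFN shared, but the
         existing NPT entry still points at the old restrictedmem PFN
  guest: accesses the GFN with ENC-bit=0 -> no #NPF is taken, so the stale
         restrictedmem-backed PFN is what actually gets accessed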

So it seems like SNP needs the zapping behavior as well, and that it
isn't very different from the TDX/SEV-lazy/selftest users. So having
common handling seems worthwhile.

> 
> Can you elaborate more about how the lazy-pinning unpin path is connected
> with the zapping here? So that I can dig more about it.

Just to be clear, this is with regard to SEV lazy-pinning, which makes
use of restrictedmem mainly for lazy-pinning of private pages, rather
than SNP (which also inherits lazy-pinning from restrictedmem).

The main thing with SEV is that the private/shared status of a page is
completely up to how the guest decides to map it in its page tables,
unlike with SNP where a mismatch between guest-expected status and
actual status in the RMP table will generate a #NPF. So SEV would be
even more reliant on current zapping behavior to ensure the NPT will
be updated.

> 
> Selftest is a minor case, guess we deal with them via enabling a switch.
> E.g. a prop in debugfs.

Wouldn't want to break things if a guest was running while selftest was
running, or something along those lines. Especially since it seems to be a
common requirement given the above.

> 
> > > If yes, can we postpone the update of the RMP table in the later fault,
> > > like TDX? So that we can save this update_mem_attr x86 ops as things
> > > will be solved in the SNP-specific fault handler.
> > 
> > Hmm, I think this would be possible. But it's nice to be able to handle
> > the RMPUPDATE as part of KVM_SET_MEMORY_ATTRIBUTES, since it allows
> > KVM MMU code to rely solely on xarray state and not have to query RMP
> > table to check if a particular PFN needs an RMPUPDATE before mapping it
> > into RMP table.
> > 
> > At least... it would *in theory*, if the RMPUPDATE happened under
> > protection of mmu_invalidate_seq (in which case it could inherit all the
> > same protections KVM MMU has around mmu_invalidate_seq/fault->mmu_seq,
> > e.g. letting the guest retry the #PF if fault->mmu_seq is stale).
> > 
> > But currently, RMPUPDATE (via kvm_arch_post_set_memory_attributes) happens
> > *after* the invalidation sequence above, so in theory a guest could fault
> > on a page just after xarray state is updated, but before the RMPUPDATE has
> > been done, in which case the KVM MMU code would properly map the page
> > accordingly to xarray, but since RMPUPDATE wouldn't have happened yet, the
> > state of the corresponding PFN in RMP table won't match the shared/private
> > access type expected by the guest, so when it tries to access it it will
> > get another #NPF with RMP bit set in the error code, which will get
> > handled as a no-op in handle_rmp_page_fault() (patch #44) and loop like
> > this until the RMPUPDATE is finally done. So it still works out, but
> > maybe not keeping as much in sync with xarray state and could be.
> > 
> 
> I see. rmp fault handler only deals with page size mismatch for now.

That's correct.

> 
> > But deferring RMPUPDATE to fault time goes in the other direction of
> > that. Are there benefits/requirements for doing things this way for TDX?
> > I could see it being beneficial in terms of reducing overhead for
> > uneeded page-state transitions, since they are only done on-demand but
> > doesn't seem like it would be that much overhead compared to some of the
> > other operations being done.
> > 
> 
> Besides the HW design, I guess one major purpose is to optimize the
> booting time of VMs with large memory. Also, post migration can be another case.

It seems like without lazy-acceptance support in the guest there isn't
too much reason to optimize here, since the guest will necessarily fault
in every page as part of pre-accepting the memory in OVMF.

And if we're making use of lazy-acceptance, for SNP at least, we wouldn't
end up getting .update_mem_attr() callbacks in the first place since
those are ultimately the result of the guest issuing a shared->private
conversion request, which generally wouldn't happen until just before the
guest decides to accept/pvalidate that GFN. So with lazy-acceptance the
guest optimizes most of this potential overhead away already.

> 
> Out of curiosity, what is the avg cost of RMPUPDATE? I suppose it is an x86
> instruction and not going through PSP firmware.

Yes, it's an x86 instruction, no firmware calls there. Average seems to
be about 1us per instruction. It's not insignificant, up to 5s for a
16GB guest in a worst case scenario where the guest does not optimize
for 2MB shared->private conversions and has no lazy-acceptance support,
but it doesn't seem like it would be common to try to boot large guests
in such a configuration.
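
(Back-of-the-envelope, just to put rough numbers on that:

  16GB in 4KB conversions:  ~4.2M RMPUPDATEs x ~1us = ~4.2s of RMPUPDATE alone
  16GB in 2MB conversions:    8192 RMPUPDATEs x ~1us = ~8ms

so 2MB-aware conversions and/or lazy-acceptance in the guest make this
mostly a non-issue.)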

> 
> > > 
> > > If no, guess we need a x86 ops to tell if a zapping is required.
> > 
> > Sorry don't think I quite understand the suggestion. What would this
> > zapping be covering vs. the invalidation sequence that currently happens
> > in kvm_vm_ioctl_set_mem_attributes()?
> 
> I was thinking that zapping of the mapping in EPT/NPT was required by TDX
> while SNP might only need an RMP update + TLB flush. Thus, the abstraction
> of the kvm_x86_ops.update_mem_attr should sit at this level. But let's
> scratch this for now as I need to dig more about the lazy pinning stuff.
> 
> > 
> > > 
> > > Back to the lock, updating RMP table doesn't require a mutex. Taking
> > > the lock is required when updating the directmap. both TDX/SNP requires
> > > this update the directmap when changing memory attributes.
> > 
> > Is that confirmed? If so, do you have a pointer to the associated
> > documentation? I'm a bit unclear on this due to above-mentioned
> > discussion.
> > 
> > > 
> > > Wouldn't it better to factor the touching directmap of kernel part out?
> > 
> > It actually needs to happen before the RMPUPDATE. As soon as there is a
> > shared->private conversion in the RMP table for a particular PFN, then
> > any access via directmap by any particular kernel thread to any PFN that
> > happens to be in the same physical 2M range can cause an RMP fault on
> > the host, which would be fatal. So the rmpupdate() helper in this series
> > will unmap directmap entry corresponding the PFN before a shared->private
> > RMPUPDATE, and restore mappings after private->shared RMPUPDATE
> > 
> > So we could still factor it out, but it would be something like:
> > 
> >   if (attr == private)
> >     kvm_unmap_directmap(start, end)
> >   kvm_mem_attrs_changed()
> >   if (attr == shared)
> >     kvm_map_directmap(start, end)
> > 
> > > 
> > > Then you can call the x86 ops.update_mem_attr() in kvm_mem_attrs_changed().
> > > And update the direct kernel mapping for both TDX/SNP in the
> > > kvm_post_mem_attrs_changed().
> > 
> > Or, adjusting for the above logic, move the unmapping/mapping to a new
> > kvm_pre_mem_attrs_changed() and kvm_post_mem_attrs_changed(), respectively.
> > 
> > Which seems pretty reasonable to me. Then we can:
> >  - drop duplicating the kvm_for_each_memslot_in_gfn_range() walk stuff because
> >    we'd just need to know what PFNs to map/unmap from directmap
> >    (although we'd still need a loop around kvm_restrictedmem_get_pfn()
> >    for the GFN range so not necessarily prettier)
> >  - call the RMPUPDATE / corresponding TDX handling via kvm_mem_attrs_changed()
> >    which brings it both under KVM MMU lock and also let's it piggyback
> >    off the fault->mmu_seq handling so it doesn't get out of sync with
> >    xarray during fault time.
> > 
> 
> That sounds better. I am just little bit worried that update_mem_attr() will
> end up as an SNP-only callback.

If it really ends up looking like an SNP-only thing, I don't see any
immediate issue with deferring this handling until fault time. But the
previous constraints remain:

 - directmap unmap needs to happen before shared->private RMPUPDATE,
   can't be called while holding KVM MMU lock or other spinlock
 - RMPUPDATE, and that needs to happen outside 
 - directmap restore needs to happen after private->shared RMPUPDATE,
   can't be called while holding KVM MMU lock or other spinlock
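
So whatever this ends up hooking into would need to preserve roughly the
following ordering (unmap_directmap()/restore_directmap() are just
placeholder names here; they take a mutex, hence the locking constraint):

  /* shared -> private conversion */
  unmap_directmap(pfn);            /* before the RMPUPDATE */
  rmp_make_private(pfn, ...);

  /* private -> shared conversion */
  rmp_make_shared(pfn, ...);
  restore_directmap(pfn);          /* after the RMPUPDATE */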

I saw the TDX patches added some x86 ops / hooks in KVM MMU to handle
mapping secure pages. Is there anything there you think is worth
re-using/re-purposing for the SNP use-case?

Thanks,

-Mike

> 
> > But would be good to hear others' opinions on this. And also confirm
> > whether TDX needs that pre/post directmap handle or not.
> 
> Yes.
> 
> > 
> > Thanks!
> > 
> > -Mike
> > 
> > > 
> > > > 
> > > > -Mike
> > > > 
> > > > > 
> > > > > From 7c618c1f3c236c382e64680efcbe7d8a672aa870 Mon Sep 17 00:00:00 2001
> > > > > Message-Id: <7c618c1f3c236c382e64680efcbe7d8a672aa870.1679114841.git.isaku.yamahata@intel.com>
> > > > > In-Reply-To: <428a676face7a06a90e59dca1c32941c9b6ee001.1679114841.git.isaku.yamahata@intel.com>
> > > > > References: <428a676face7a06a90e59dca1c32941c9b6ee001.1679114841.git.isaku.yamahata@intel.com>
> > > > > From: Isaku Yamahata <isaku.yamahata@intel.com>
> > > > > Date: Fri, 17 Mar 2023 12:00:09 -0700
> > > > > Subject: [PATCH 4/4] KVM: x86: Add 'set_mem_attr' x86 op
> > > > > 
> > > > > This callback will do any platform-specific handling needed for
> > > > > converting pages between shared/private.
> > > > > 
> > > > > Originally-by: Michael Roth <michael.roth@amd.com>
> > > > > Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
> > > > > ---
> > > > >  arch/x86/include/asm/kvm-x86-ops.h | 1 +
> > > > >  arch/x86/include/asm/kvm_host.h    | 2 ++
> > > > >  arch/x86/kvm/mmu/mmu.c             | 1 +
> > > > >  3 files changed, 4 insertions(+)
> > > > > 
> > > > > diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
> > > > > index dc5f18ac0bd5..956db2ee25a5 100644
> > > > > --- a/arch/x86/include/asm/kvm-x86-ops.h
> > > > > +++ b/arch/x86/include/asm/kvm-x86-ops.h
> > > > > @@ -100,6 +100,7 @@ KVM_X86_OP_OPTIONAL_RET0(set_identity_map_addr)
> > > > >  KVM_X86_OP_OPTIONAL_RET0(get_mt_mask)
> > > > >  KVM_X86_OP(load_mmu_pgd)
> > > > >  KVM_X86_OP(fault_is_private)
> > > > > +KVM_X86_OP_OPTIONAL(set_mem_attr)
> > > > >  KVM_X86_OP_OPTIONAL(link_private_spt)
> > > > >  KVM_X86_OP_OPTIONAL(free_private_spt)
> > > > >  KVM_X86_OP_OPTIONAL(split_private_spt)
> > > > > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > > > > index 0382d236fbf4..88e11dd3afde 100644
> > > > > --- a/arch/x86/include/asm/kvm_host.h
> > > > > +++ b/arch/x86/include/asm/kvm_host.h
> > > > > @@ -1731,6 +1731,8 @@ struct kvm_x86_ops {
> > > > >  	void (*load_mmu_pgd)(struct kvm_vcpu *vcpu, hpa_t root_hpa,
> > > > >  			     int root_level);
> > > > >  	bool (*fault_is_private)(struct kvm *kvm, gpa_t gpa, u64 error_code);
> > > > > +	void (*set_mem_attr)(struct kvm *kvm, struct kvm_memory_slot *slot,
> > > > > +			     unsigned int attr, gfn_t start, gfn_t end);
> > > > >  
> > > > >  	int (*link_private_spt)(struct kvm *kvm, gfn_t gfn, enum pg_level level,
> > > > >  				void *private_spt);
> > > > > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > > > > index 0ec94c72895c..329333486e64 100644
> > > > > --- a/arch/x86/kvm/mmu/mmu.c
> > > > > +++ b/arch/x86/kvm/mmu/mmu.c
> > > > > @@ -7908,6 +7908,7 @@ void kvm_arch_set_memory_attributes(struct kvm *kvm,
> > > > >  				    gfn_t start, gfn_t end)
> > > > >  {
> > > > >  	kvm_update_lpage_mixed_flag(kvm, slot, true, attrs, start, end);
> > > > > +	static_call(kvm_x86_set_mem_attr)(kvm, slot, attrs, start, end);
> > > > >  }
> > > > >  
> > > > >  void kvm_memory_attributes_create_memslot(struct kvm *kvm,
> > > > > -- 
> > > > > 2.25.1
> > > > > 
> > > > > -- 
> > > > > Isaku Yamahata <isaku.yamahata@gmail.com>  
> > > 
> > > 
> 

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH RFC v8 15/56] x86/sev: Invalidate pages from the direct map when adding them to the RMP table
  2023-03-01 16:15   ` Dave Hansen
@ 2023-03-28 22:12     ` Michael Roth
  0 siblings, 0 replies; 147+ messages in thread
From: Michael Roth @ 2023-03-28 22:12 UTC (permalink / raw)
  To: Dave Hansen
  Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	Brijesh Singh

On Wed, Mar 01, 2023 at 08:15:46AM -0800, Dave Hansen wrote:
> On 2/20/23 10:38, Michael Roth wrote:
> > From: Brijesh Singh <brijesh.singh@amd.com>
> > 
> > The integrity guarantee of SEV-SNP is enforced through the RMP table.
> > The RMP is used with standard x86 and IOMMU page tables to enforce
> > memory restrictions and page access rights. The RMP check is enforced as
> > soon as SEV-SNP is enabled globally in the system. When hardware
> > encounters an RMP-check failure, it raises a page-fault exception.
> > 
> > The rmp_make_private() and rmp_make_shared() helpers are used to add
> > or remove the pages from the RMP table. Improve the rmp_make_private()
> > to invalidate state so that pages cannot be used in the direct-map after
> > they are added the RMP table, and restored to their default valid
> > permission after the pages are removed from the RMP table.
> 
> This is a purely "what" changelog.  It doesn't explain the "why" at all.
> 
> Could you please elaborate on why this unmapping operation is necessary?
> 

Here's my attempt at an updated changelog that explains why this handling
is needed:

  With SEV-SNP, if a host attempts to write to a page that is owned by a
  guest (according to the SEV-SNP RMP table), the host will get a #PF with
  a bit set to indicate that an RMP violation occurred. This shouldn't
  normally occur, since guest-owned pages are encrypted, so any attempt to
  write to them would just result in garbage being written.
  
  However, if a host writes to a page via a 2M/1G mapping in the host
  process' page table, the above #PF condition will trigger if *any*
  4K sub-pages mapped by that PTE are guest-owned, even if the write
  is only to 4K pages that are owned by the host.
  
  This becomes an issue with the kernel directmap, which provides a
  static/linear mapping of all kernel memory and defaults to using 2M
  pages. In cases where a page from one of these 2M ranges gets assigned
  to a guest, the kernel can inadvertently trigger a #PF by writing to some
  other page in that 2M region via the virtual addresses provided by the
  directmap.
  
  Address this by removing directmap mappings for these PFNs before
  marking them as guest-owned in the RMP table, which will result in the
  original 2M mapping being split and ensure that guest-owned pages can't
  be written to via the kernel directmap.
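
For reference, the shape of that handling is roughly the following. This is a
simplified, hypothetical sketch rather than the literal patch code: the
set_direct_map_*() primitives are one way to drop/restore the mapping, the
wrapper name and 'to_private' flag are placeholders, and the real helper
covers every PFN affected by the operation:

  /* sketch: pair the directmap update with the RMPUPDATE for one PFN */
  static int rmpupdate_with_directmap(u64 pfn, bool to_private)
  {
          int ret;

          /* shared -> private: unmap from the directmap before the RMPUPDATE */
          if (to_private && set_direct_map_invalid_noflush(pfn_to_page(pfn)))
                  return -EFAULT;

          ret = rmpupdate(pfn, to_private);     /* issues the RMPUPDATE */

          /* private -> shared: restore the directmap after the RMPUPDATE */
          if (!ret && !to_private)
                  set_direct_map_default_noflush(pfn_to_page(pfn));

          return ret;
  }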

-Mike

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH RFC v8 02/56] KVM: x86: Add 'update_mem_attr' x86 op
  2023-03-28  4:36             ` Michael Roth
@ 2023-03-28 23:00               ` Zhi Wang
  2023-03-29 23:50                 ` Michael Roth
  0 siblings, 1 reply; 147+ messages in thread
From: Zhi Wang @ 2023-03-28 23:00 UTC (permalink / raw)
  To: Michael Roth
  Cc: Isaku Yamahata, kvm, linux-coco, linux-mm, linux-crypto, x86,
	linux-kernel, tglx, mingo, jroedel, thomas.lendacky, hpa, ardb,
	pbonzini, seanjc, vkuznets, jmattson, luto, dave.hansen, slp,
	pgonda, peterz, srinivas.pandruvada, rientjes, dovmurik, tobin,
	bp, vbabka, kirill, ak, tony.luck, marcorr,
	sathyanarayanan.kuppuswamy, alpergun, dgilbert, jarkko,
	ashish.kalra, nikunj.dadhania

On Mon, 27 Mar 2023 23:36:35 -0500
Michael Roth <michael.roth@amd.com> wrote:

> On Thu, Mar 23, 2023 at 08:17:16PM +0200, Zhi Wang wrote:
> > On Tue, 21 Mar 2023 20:58:38 -0500
> > Michael Roth <michael.roth@amd.com> wrote:
> > 
> > > On Tue, Mar 21, 2023 at 01:21:36PM +0200, Zhi Wang wrote:
> > > > On Mon, 20 Mar 2023 13:05:43 -0500
> > > > Michael Roth <michael.roth@amd.com> wrote:
> > > > 
> > > > > On Fri, Mar 17, 2023 at 09:56:11PM -0700, Isaku Yamahata wrote:
> > > > > > On Mon, Feb 20, 2023 at 12:37:53PM -0600,
> > > > > > Michael Roth <michael.roth@amd.com> wrote:
> > > > > >   
> > > > > > > This callback will do any platform-specific handling needed for
> > > > > > > converting pages between shared/private.
> > > > > > > 
> > > > > > > Signed-off-by: Michael Roth <michael.roth@amd.com>
> > > > > > > ---
> 
> <snip>
> 
> > > > > > >  static int kvm_vm_ioctl_set_mem_attributes(struct kvm *kvm,
> > > > > > >  					   struct kvm_memory_attributes *attrs)
> > > > > > >  {
> > > > > > > @@ -2602,6 +2628,9 @@ static int kvm_vm_ioctl_set_mem_attributes(struct kvm *kvm,
> > > > > > >  	kvm_mmu_invalidate_end(kvm);
> > > > > > >  	KVM_MMU_UNLOCK(kvm);
> > > > > > >  
> > > > > > > +	if (i > start)
> > > > > > > +		kvm_post_mem_attrs_changed(kvm, attrs->attributes, start, i);
> > > > > > > +  
> > > > > > 
> > > > > > Doesn't kvm_arch_set_memory_attributes() work for you? i.e the following patch.
> > > > > > The error check and pr_warn_ratelimited() can be pushed down into the callback.  
> > > > > 
> > > > > This is originally how I had but when CONFIG_PREEMPT_COUNT is set this
> > > > > will generate warnings for this callback as well as the invalidation
> > > > > callback as reported in v7 here:
> > > > > 
> > > > >   https://lore.kernel.org/lkml/Y80vhKwQyw8hS%2F22@notebook/
> > > > > 
> > > > > The main issue is that kvm_mem_attrs_changed() is called while holding
> > > > > the KVM MMU lock, which disables preemption. But when updating
> > > > > attributes for SNP, we also need to remove private pages from kernel
> > > > > directmap, which involves acquiring a mutex which results in
> > > > > "BUG: scheduling while atomic" warnings.
> > > > > 
> > > > > So that's why we ended up somewhat duplicating some of the logic and
> > > > > using a separate callback chain that happens out of KVM MMU lock.
> > > > 
> > > > Let's split the things of changing memory attributes:
> > > > 
> > > > 1) Update the memory attributes in the xa array (Both TDX and SNP)
> > > > 2) Zapping the EPT/NPT mappings (Required by TDX)
> > > > 3) Update RMP table (Required by SNP)
> > > > 4) Update the directmap of kernel (SNP, but I guess TDX needs it as well)
> > > 
> > 
> > Thanks for the effort of detailed reply. It is very informative.
> > 
> > > I'm not so sure TDX requires this. I was under that impression, but
> > > Kirill raised some doubts about this and I'm not sure it's been
> > > confirmed. If it's purely an SNP thing then there may not be much value
> > > in creating a separate callback for it:
> > > 
> > >   https://lore.kernel.org/linux-mm/20221031141426.GA3994099@chaop.bj.intel.com/T/#meba4ce80709cd3afd3818b61e6419fd800287b9e
> > > 
> > 
> > Hmm, Krill and Isaku, can you confirm that TDX doesn't need this?
> > 
> > I think it is a generic requirement that TDX/SNP are not expecting the
> > host to touch a private page either from the kernel or the userspace. 
> 
> The main issue is that in the case of the RMP table, a write to a 2M
> mapping is interpreted as a write to a private page even if the write
> didn't actually touch any private pages within the range. It may be
> possible in firmware/hardware to distinguish between these 2 cases, but I'm
> not sure whether that's the case for TDX or not.
> 

I was thinking about this today. SNP uses this trick to cope with the existing
HW design, so it is mandatory for SNP. What would the benefit of
removing/restoring the directmap be for TDX, where the purpose is restricting
access from the kernel? Probably catching a wrong touch to a private page from
kernel space instead of triggering an MCE.

My point is to let these stay SNP-only for now, as they seem closely coupled
with RMPUPDATE so far and the timing of doing RMPUPDATE is still under
discussion. Maybe we invent some new pre/post callbacks, but they may turn out
to be unnecessary later if the timing of RMPUPDATE is changed.

> > 
> > > And for SNP, the current code does the unmapping/RMP update in the same
> > > function:
> > > 
> > >   [PATCH RFC v8 15/56] x86/sev: Invalidate pages from the direct map when adding them to the RMP table
> > > 
> > > I'm not against splitting RMP/directmap handling, but just want to
> > > understand what the requirements are around that a bit better.
> > > 
> > > Does handling the #3 / RMP update / kvm_arch_post_set_memory_attributes
> > > stuff outside of MMU lock cause issues on TDX side? What sort of
> > > handling is needed in these callbacks for TDX (if anything)?
> > > 
> > 
> > No, it doesn't cause problem for TDX as TDX doesn't need such callback.
> > 
> > Unlike SNP, which has (1 NPT + 1 RMP) and the enforced HW check is done by RMP, TDX has
> > two EPT(smiliar with NPT)s (1 shared + 1 private). Converting the memory attr is achieved
> > by zapping the mapping from one EPT and creating the mapping in the other one in the fault
> > when guest access the memory. The fault GPA will carry the "SHARED" bit (!C-BIT), so
> > KVM knows which EPT should be chosen for populating the mapping.
> > 
> > I was trying to figure out what should be the proper callback and at which layer it should
> > sit for achieving changing memory attr for both TDX/SNP. The current callback looks a little
> > bit hacky. Duplicate code pieces because of locks implies the SW structure might need to be
> > re-considered.
> > 
> > > > 
> > > > Does SNP really need to zap the NPT mappings when changing the memory
> > > > attributes? (The new mappings will be created later in the fault). I don't
> > > > find this requirement from APM.
> > > 
> > > I don't think we've added anything specifically for SNP. Do you mean the
> > > generic kvm_unmap_gfn_range/kvm_flush_remote_tlbs sequence below?
> > > 
> > >   kvm_vm_ioctl_set_mem_attributes():
> > >     KVM_MMU_LOCK(kvm)
> > >     kvm_mmu_invalidate_begin()
> > >     ...
> > >     KVM_MMU_UNLOCK(kvm)
> > > 
> > >     kvm_vm_set_region_attr()  // xarray/attribute update
> > > 
> > >     ...
> > >     KVM_MMU_LOCK(kvm)
> > >     kvm_mem_attrs_changed():
> > >       flush |= kvm_unmap_gfn_range()
> > >       if (flush)
> > >         kvm_flush_remote_tlbs()
> > >     KVM_MMU_UNLOCK(kvm)
> > >
> > 
> > Yes, I was talking about the sequence above. I was confused of why changing
> > RMP requires a zapping-recreating flow of NPT in SNP.
> 
> Hmm, so you're suggesting doing something like moving the
> kvm_unmap_gfn_range()/kvm_flush_remote_tlbs() into .update_mem_attr()
> callback so the platform can decide if it needs the zap/flush rather
> than handling it in generic code?
> 
> If SNP really didn't need the handling it's an interesting thought, but
> I think it is needed there after all...

Yes, I was thinking so. When I was reading the APM, I thought one of the
advantages of the SNP architecture was splitting the HW-enforced check from NPT
mapping management, so that when updating the RMP the NPT can stay as it is. I
guess that's why I was so keen on eliminating the extra zapping.

After reading the UPM code, I think we do indeed need the zapping here for
SNP, as pages for private memory and shared memory come from different
sources. A shared<->private memory conversion needs to wipe the mapping of
pages from one source and go through a fault-mapping flow on the pages
from the other source. Still wondering if there is any good way to improve
it based on the current approaches.

> 
> > 
> > > In general, when the RMPUPDATE instruction happens, the TLB entries for
> > > the GPAs being modified will be flushed, so subsequent nested page fault
> > > should be able to obtain the updated mapping based on xarray/#NPF at that
> > > point. In that respect *maybe* we don't need to zap the entries there.
> > > 
> > > But if the nested page fault occurs before the RMPUPDATE, I think we would
> > > have a race if the above sequence isn't in place to handle the unmap/flush,
> > > since in that case we might get a stale mapping because nothing would've
> > > forced a tlbflush.
> > > 
> > > There's also stuff like the UPM selftests and SEV lazy-pinning where I
> > > think that kvm_unmap_gfn_range() sequence is also needed. But I might be
> > > misunderstanding the question here.
> > > 
> > 
> > In this case, an extra tlbflush would solve? Still, the unnecessary
> > zapping/recreating of mapping is not promising. I understand that the way
> > how this patch goes is probably to minimize the changes, but it would be
> > nice to focus more on what is really needed in a common path and abstract
> > and re-factor from there.
> 
> I tested this and just doing the tlbflush is insufficient. If the
> entries are still present in nested page table HV doesn't necessarily
> get an #NPF so the guest can still end up with a stale mapping. After a
> shared->private conversion the guest would cause #NPF with RMP/ENC bits
> set once it tries to access it as a private page, but for a
> private->shared conversion the guest can subsequently access the page
> with ENC-bit=0 without causing an #NPF, but in this case GFN can still
> be mapped to the PFN restrictedmem was using to back it when it was in
> private state, instead of normal non-restricted memory.
> 
> So it seems like SNP needs the zapping behavior as well, and that it
> isn't very different from the TDX/SEV-lazy/selftest users. So having
> common handling seems worthwhile.
> 
> > 
> > Can you elaborate more about how the lazy-pinning unpin path is connected
> > with the zapping here? So that I can dig more about it.
> 
> Just to be clear this is with regard to SEV lazy-pinning, which makes
> used of restrictedmem mainly for lazy-pinning of private pages, rather
> than SNP (which also inherits lazy-pinning from restrictedmem).
> 
> The main thing with SEV is that the private/shared status of a page is
> completely up to how the guest decides to map it in its page tables,
> unlike with SNP where a mismatch between guest-expected status and
> actual status in the RMP table will generate a #NPF. So SEV would be
> even more reliant on current zapping behavior to ensure the NPT will
> be updated.
> 
> > 
> > Selftest is a minor case, guess we deal with them via enabling a switch.
> > E.g. a prop in debugfs.
> 
> Wouldn't want to break things if a guest was running while selftest was
> running, or something along that line. Especially since it seems to be a
> common requirement given the above.
> 
> > 
> > > > If yes, can we postpone the update of the RMP table in the later fault,
> > > > like TDX? So that we can save this update_mem_attr x86 ops as things
> > > > will be solved in the SNP-specific fault handler.
> > > 
> > > Hmm, I think this would be possible. But it's nice to be able to handle
> > > the RMPUPDATE as part of KVM_SET_MEMORY_ATTRIBUTES, since it allows
> > > KVM MMU code to rely solely on xarray state and not have to query RMP
> > > table to check if a particular PFN needs an RMPUPDATE before mapping it
> > > into RMP table.
> > > 
> > > At least... it would *in theory*, if the RMPUPDATE happened under
> > > protection of mmu_invalidate_seq (in which case it could inherit all the
> > > same protections KVM MMU has around mmu_invalidate_seq/fault->mmu_seq,
> > > e.g. letting the guest retry the #PF if fault->mmu_seq is stale).
> > > 
> > > But currently, RMPUPDATE (via kvm_arch_post_set_memory_attributes) happens
> > > *after* the invalidation sequence above, so in theory a guest could fault
> > > on a page just after xarray state is updated, but before the RMPUPDATE has
> > > been done, in which case the KVM MMU code would properly map the page
> > > accordingly to xarray, but since RMPUPDATE wouldn't have happened yet, the
> > > state of the corresponding PFN in RMP table won't match the shared/private
> > > access type expected by the guest, so when it tries to access it it will
> > > get another #NPF with RMP bit set in the error code, which will get
> > > handled as a no-op in handle_rmp_page_fault() (patch #44) and loop like
> > > this until the RMPUPDATE is finally done. So it still works out, but
> > > maybe not keeping as much in sync with xarray state and could be.
> > > 
> > 
> > I see. rmp fault handler only deals with page size mismatch for now.
> 
> That's correct.
> 
> > 
> > > But deferring RMPUPDATE to fault time goes in the other direction of
> > > that. Are there benefits/requirements for doing things this way for TDX?
> > > I could see it being beneficial in terms of reducing overhead for
> > > uneeded page-state transitions, since they are only done on-demand but
> > > doesn't seem like it would be that much overhead compared to some of the
> > > other operations being done.
> > > 
> > 
> > Besides the HW design, I guess one major purpose is to optimize the
> > booting time of VMs with large memory. Also, post migration can be another case.
> 
> It seems like without lazy-acceptance support in the guest there isn't
> too much reason to optimize here, since the guest will necessarily fault
> in every page as part of pre-accepting the memory in OVMF.
> 

Do you mean pre-accepting the whole of system memory, or only the initially
allocated memory for OVMF?

TDX has already been working in a lazy-acceptance style. I believe
lazy-acceptance will be the future direction, but I understand that part
can be in stage 2 with the lazy-acceptance for SNP.

> And if we're making use of lazy-acceptance, for SNP at least, we wouldn't
> end up getting .update_mem_attr() callbacks in the first place since
> those are ultimately the result of the guest issuing a shared->private
> conversion request, which generally wouldn't happen until just before the
> guest decides to accept/pvalidate that GFN. So with lazy-acceptance the
> guest optimizes most of this potential overhead away already.
I understand there can be 
> 
> > 
> > Out of curiosity, What is the avg cost of RMUPDATE? Suppose it is an x86
> > instruction and not going through PSP firmware.
> 
> Yes, it's an x86 instruction, no firmware calls there. Average seems to
> be about 1us per instruction. It's not insignificant, up to 5s for a
> 16GB guest in a worst case scenario where the guest does not optimize
> for 2MB shared->private conversions and has no lazy-acceptance support,
> but it doesn't seem like it would be common to try to boot large guests
> in such a configuration.
> 
I see.
> > 
> > > > 
> > > > If no, guess we need a x86 ops to tell if a zapping is required.
> > > 
> > > Sorry don't think I quite understand the suggestion. What would this
> > > zapping be covering vs. the invalidation sequence that currently happens
> > > in kvm_vm_ioctl_set_mem_attributes()?
> > 
> > I was thinking that zapping of the mapping in EPT/NPT was required by TDX
> > while SNP might only need an RMP update + TLB flush. Thus, the abstraction
> > of the kvm_x86_ops.update_mem_attr should sit at this level. But let's
> > scratch this for now as I need to dig more about the lazy pinning stuff.
> > 
> > > 
> > > > 
> > > > Back to the lock, updating RMP table doesn't require a mutex. Taking
> > > > the lock is required when updating the directmap. both TDX/SNP requires
> > > > this update the directmap when changing memory attributes.
> > > 
> > > Is that confirmed? If so, do you have a pointer to the associated
> > > documentation? I'm a bit unclear on this due to above-mentioned
> > > discussion.
> > > 
> > > > 
> > > > Wouldn't it better to factor the touching directmap of kernel part out?
> > > 
> > > It actually needs to happen before the RMPUPDATE. As soon as there is a
> > > shared->private conversion in the RMP table for a particular PFN, then
> > > any access via directmap by any particular kernel thread to any PFN that
> > > happens to be in the same physical 2M range can cause an RMP fault on
> > > the host, which would be fatal. So the rmpupdate() helper in this series
> > > will unmap directmap entry corresponding the PFN before a shared->private
> > > RMPUPDATE, and restore mappings after private->shared RMPUPDATE
> > > 
> > > So we could still factor it out, but it would be something like:
> > > 
> > >   if (attr == private)
> > >     kvm_unmap_directmap(start, end)
> > >   kvm_mem_attrs_changed()
> > >   if (attr == shared)
> > >     kvm_map_directmap(start, end)
> > > 
> > > > 
> > > > Then you can call the x86 ops.update_mem_attr() in kvm_mem_attrs_changed().
> > > > And update the direct kernel mapping for both TDX/SNP in the
> > > > kvm_post_mem_attrs_changed().
> > > 
> > > Or, adjusting for the above logic, move the unmapping/mapping to a new
> > > kvm_pre_mem_attrs_changed() and kvm_post_mem_attrs_changed(), respectively.
> > > 
> > > Which seems pretty reasonable to me. Then we can:
> > >  - drop duplicating the kvm_for_each_memslot_in_gfn_range() walk stuff because
> > >    we'd just need to know what PFNs to map/unmap from directmap
> > >    (although we'd still need a loop around kvm_restrictedmem_get_pfn()
> > >    for the GFN range so not necessarily prettier)
> > >  - call the RMPUPDATE / corresponding TDX handling via kvm_mem_attrs_changed()
> > >    which brings it both under KVM MMU lock and also let's it piggyback
> > >    off the fault->mmu_seq handling so it doesn't get out of sync with
> > >    xarray during fault time.
> > > 
> > 
> > That sounds better. I am just little bit worried that update_mem_attr() will
> > end up as an SNP-only callback.
> 
> If it really ends up looking like an SNP-only thing, I don't see any
> immediate issue with deferring this handling until fault time. But the
> previous constraints remain:
> 
>  - directmap unmap needs to happen before shared->private RMPUPDATE,
>    can't be called while holding KVM MMU lock or other spinlock
>  - RMPUPDATE, and that needs to happen outside 
>  - directmap restore needs to happen after private->shared RMPUPDATE,
>    can't be called while holding KVM MMU lock or other spinlock
> 
> I saw the TDX patches added some x86 ops / hooks in KVM MMU to handle
> mapping secure pages. Is there anything there you think is worth
> re-using/re-purposing for SNP use-case?
>

They are mainly for the TDP MMU to manipulate the TDX secure EPT through
firmware calls. Due to the TDX architecture (1 shared + 1 secure EPT) and the
mirror secure EPT design, the TDP MMU is aware that there is a "secure EPT but
with reduced features", which EPT should be operated on when a #PF happens, and
which EPT should be zapped when the memory attr is changed. Those callbacks
in the TDP MMU are used for maintaining the mirror secure EPT.

SNP is designed in a different way. It doesn't need the TDP MMU to be aware
that "there is another kind of NPT" since the enforced check is done in the
RMP, so SNP can easily keep all the RMP routines out of the TDP MMU. Also, SNP
doesn't need a mirror NPT. Thus, SNP doesn't need those callbacks.

In *theory*, it is possible to re-purpose some of the callbacks and merge the
RMP update into the TDP MMU, e.g. omitting the mirror-secure-EPT manipulation
for SNP but updating the RMP in the callback that sets the SPTE. But I have to
think about what the benefit would be, as it doesn't fit the SNP HW nature.


> Thanks,
> 
> -Mike
> 
> > 
> > > But would be good to hear others' opinions on this. And also confirm
> > > whether TDX needs that pre/post directmap handle or not.
> > 
> > Yes.
> > 
> > > 
> > > Thanks!
> > > 
> > > -Mike
> > > 
> > > > 
> > > > > 
> > > > > -Mike
> > > > > 
> > > > > > 
> > > > > > From 7c618c1f3c236c382e64680efcbe7d8a672aa870 Mon Sep 17 00:00:00 2001
> > > > > > Message-Id: <7c618c1f3c236c382e64680efcbe7d8a672aa870.1679114841.git.isaku.yamahata@intel.com>
> > > > > > In-Reply-To: <428a676face7a06a90e59dca1c32941c9b6ee001.1679114841.git.isaku.yamahata@intel.com>
> > > > > > References: <428a676face7a06a90e59dca1c32941c9b6ee001.1679114841.git.isaku.yamahata@intel.com>
> > > > > > From: Isaku Yamahata <isaku.yamahata@intel.com>
> > > > > > Date: Fri, 17 Mar 2023 12:00:09 -0700
> > > > > > Subject: [PATCH 4/4] KVM: x86: Add 'set_mem_attr' x86 op
> > > > > > 
> > > > > > This callback will do any platform-specific handling needed for
> > > > > > converting pages between shared/private.
> > > > > > 
> > > > > > Originally-by: Michael Roth <michael.roth@amd.com>
> > > > > > Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
> > > > > > ---
> > > > > >  arch/x86/include/asm/kvm-x86-ops.h | 1 +
> > > > > >  arch/x86/include/asm/kvm_host.h    | 2 ++
> > > > > >  arch/x86/kvm/mmu/mmu.c             | 1 +
> > > > > >  3 files changed, 4 insertions(+)
> > > > > > 
> > > > > > diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
> > > > > > index dc5f18ac0bd5..956db2ee25a5 100644
> > > > > > --- a/arch/x86/include/asm/kvm-x86-ops.h
> > > > > > +++ b/arch/x86/include/asm/kvm-x86-ops.h
> > > > > > @@ -100,6 +100,7 @@ KVM_X86_OP_OPTIONAL_RET0(set_identity_map_addr)
> > > > > >  KVM_X86_OP_OPTIONAL_RET0(get_mt_mask)
> > > > > >  KVM_X86_OP(load_mmu_pgd)
> > > > > >  KVM_X86_OP(fault_is_private)
> > > > > > +KVM_X86_OP_OPTIONAL(set_mem_attr)
> > > > > >  KVM_X86_OP_OPTIONAL(link_private_spt)
> > > > > >  KVM_X86_OP_OPTIONAL(free_private_spt)
> > > > > >  KVM_X86_OP_OPTIONAL(split_private_spt)
> > > > > > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > > > > > index 0382d236fbf4..88e11dd3afde 100644
> > > > > > --- a/arch/x86/include/asm/kvm_host.h
> > > > > > +++ b/arch/x86/include/asm/kvm_host.h
> > > > > > @@ -1731,6 +1731,8 @@ struct kvm_x86_ops {
> > > > > >  	void (*load_mmu_pgd)(struct kvm_vcpu *vcpu, hpa_t root_hpa,
> > > > > >  			     int root_level);
> > > > > >  	bool (*fault_is_private)(struct kvm *kvm, gpa_t gpa, u64 error_code);
> > > > > > +	void (*set_mem_attr)(struct kvm *kvm, struct kvm_memory_slot *slot,
> > > > > > +			     unsigned int attr, gfn_t start, gfn_t end);
> > > > > >  
> > > > > >  	int (*link_private_spt)(struct kvm *kvm, gfn_t gfn, enum pg_level level,
> > > > > >  				void *private_spt);
> > > > > > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > > > > > index 0ec94c72895c..329333486e64 100644
> > > > > > --- a/arch/x86/kvm/mmu/mmu.c
> > > > > > +++ b/arch/x86/kvm/mmu/mmu.c
> > > > > > @@ -7908,6 +7908,7 @@ void kvm_arch_set_memory_attributes(struct kvm *kvm,
> > > > > >  				    gfn_t start, gfn_t end)
> > > > > >  {
> > > > > >  	kvm_update_lpage_mixed_flag(kvm, slot, true, attrs, start, end);
> > > > > > +	static_call(kvm_x86_set_mem_attr)(kvm, slot, attrs, start, end);
> > > > > >  }
> > > > > >  
> > > > > >  void kvm_memory_attributes_create_memslot(struct kvm *kvm,
> > > > > > -- 
> > > > > > 2.25.1
> > > > > > 
> > > > > > -- 
> > > > > > Isaku Yamahata <isaku.yamahata@gmail.com>  
> > > > 
> > > > 
> > 


^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH RFC v8 17/56] x86/fault: Add support to handle the RMP fault for user address
  2023-03-01 16:21   ` Dave Hansen
@ 2023-03-28 23:31     ` Michael Roth
  2023-04-11 18:27       ` Dave Hansen
  0 siblings, 1 reply; 147+ messages in thread
From: Michael Roth @ 2023-03-28 23:31 UTC (permalink / raw)
  To: Dave Hansen
  Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	Brijesh Singh, Jarkko Sakkinen

On Wed, Mar 01, 2023 at 08:21:17AM -0800, Dave Hansen wrote:
> On 2/20/23 10:38, Michael Roth wrote:
> > +static int handle_split_page_fault(struct vm_fault *vmf)
> > +{
> > +	__split_huge_pmd(vmf->vma, vmf->pmd, vmf->address, false, NULL);
> > +	return 0;
> > +}
> > +
> >  /*
> >   * By the time we get here, we already hold the mm semaphore
> >   *
> > @@ -5078,6 +5084,10 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
> >  				pmd_migration_entry_wait(mm, vmf.pmd);
> >  			return 0;
> >  		}
> > +
> > +		if (flags & FAULT_FLAG_PAGE_SPLIT)
> > +			return handle_split_page_fault(&vmf);
> 
> I asked this long ago, but how do you prevent these faults from
> occurring on hugetlbfs mappings that can't be split?
> 

In v6 there used to be a KVM ioctl to register a user HVA range for use
with SEV-SNP guests, and as part of that registration the code would scan
all the VMAs encompassed by that range and check for VM_HUGETLB in
vma->vm_flags.

With v7+ this registration mechanism has been replaced with the
new restricted memfd implementation provided by UPM to manage private guest
memory. Normal shmem/memfd backend can specify HugeTLBFS via a
MFD_HUGETLB flag when creating the memfd, but for restricted memfd no
special flags are allowed, so HugeTLBFS isn't possible for the pages
that are used for private memory. Though it might make sense to enforce
that in SNP-specific code still, in case restricted memfd does
eventually gain that ability...

But now, with v7+, the non-private memory that doesn't get allocated via
restricted memfd (and thus can actually be mapped into userspace and
used for things like buffers shared between host/guest), can still be
allocated via HugeTLBFS since there is nothing SNP is doing to
specifically guard against that. So we'd probably want to reimplement
similar logic to what was in v6 to guard against this, since it's these
mappings that would potentially be triggering the RMP faults and require
splitting.
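
(E.g. something along the lines of the following sketch, mirroring the v6
check; 'hva' and the exact placement are hypothetical here, just to
illustrate the idea:

  struct vm_area_struct *vma;

  mmap_read_lock(kvm->mm);
  vma = find_vma(kvm->mm, hva);
  if (vma && is_vm_hugetlb_page(vma))
          ret = -EINVAL;
  mmap_read_unlock(kvm->mm);

applied across the range being registered/checked.)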

However...

The fact that any pages potentially triggering these #PFs are able to be
mapped as 2M in the first place means that all the PFNs covered by that
2M mapping must also have been allocated via mappable/VMA memory rather
than via restricted memfd where userspace mappings are not possible.

So I think we should be able to drop this patch entirely, as well as
allow the use of HugeTLBFS for non-restricted memfd memory (though
eventually the guest will switch all its memory to private/restricted
so not gaining much there other than reducing management complexity).

-Mike

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH RFC v8 12/56] x86/sev: Add RMP entry lookup helpers
  2023-03-03 15:28   ` Vlastimil Babka
@ 2023-03-29 22:59     ` Michael Roth
  2023-04-20 16:31       ` Vlastimil Babka
  0 siblings, 1 reply; 147+ messages in thread
From: Michael Roth @ 2023-03-29 22:59 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, kirill, ak,
	tony.luck, marcorr, sathyanarayanan.kuppuswamy, alpergun,
	dgilbert, jarkko, ashish.kalra, nikunj.dadhania, Brijesh Singh

On Fri, Mar 03, 2023 at 04:28:39PM +0100, Vlastimil Babka wrote:
> On 2/20/23 19:38, Michael Roth wrote:
> > From: Brijesh Singh <brijesh.singh@amd.com>
> > 
> > The snp_lookup_page_in_rmptable() can be used by the host to read the RMP
> > entry for a given page. The RMP entry format is documented in AMD PPR, see
> > https://bugzilla.kernel.org/attachment.cgi?id=296015.
> > 
> > Co-developed-by: Ashish Kalra <ashish.kalra@amd.com>
> > Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> > Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> > Signed-off-by: Michael Roth <michael.roth@amd.com>
> > ---
> 
> > +/*
> > + * Return 1 if the RMP entry is assigned, 0 if it exists but is not assigned,
> > + * and -errno if there is no corresponding RMP entry.
> > + */
> 
> Hmm IMHO the kernel's idiomatic way is to return 0 on "success" and I'd
> assume the more intuitive expectation of success here if the entry is
> assigned?

In general I'd agree. Here it's a little awkward though.
snp_lookup_rmpentry() sort of wants to be a bool, where true indicates
an assigned entry was found, false indicates no assigned entry.

But it also has to deal with error values, so the most direct way to
encapsulate that is true == 1, false == 0, and < 0 for errors.

Inverting it to align more with kernel expectations of 0 == success/true
gets awkward too, because stuff like:

  if (snp_lookup_rmpentry(...))
    //error

still doesn't work the way most other functions written in this way
would, since it could still be "successful" if we were expecting the PFN to
be in the shared state. So the return value needs special handling there
too.

Would it make sense to define it something like this?:

  /*
   * Query information about the RMP entry corresponding to the given
   * PFN.
   * 
   * Returns 0 on success, and -errno if there was a problem accessing
   * the RMP entry.
   */
  int snp_lookup_rmpentry(u64 pfn, int *level, bool *assigned)
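
with the implementation basically just shuffling the existing logic
around, e.g.:

  int snp_lookup_rmpentry(u64 pfn, int *level, bool *assigned)
  {
          struct rmpentry *e;

          e = __snp_lookup_rmpentry(pfn, level);
          if (IS_ERR(e))
                  return PTR_ERR(e);

          *assigned = !!rmpentry_assigned(e);

          return 0;
  }

and then callers can distinguish a failed lookup from "PFN is shared":

  ret = snp_lookup_rmpentry(pfn, &level, &assigned);
  if (ret)
          /* warn: couldn't access the RMP entry */
  else if (!assigned)
          /* PFN is in the shared state */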

> The various callers seem to differ though so I guess it depends on
> context. Some however don't distinguish their "failure" from an ERR and
> maybe they should, at least for the purposes of the various printks?

Yes, regardless of what we decide above, the call-sites should properly
distinguish between failure/assigned/not-assigned and report the
information accordingly. I'll get those fixed up where needed.

Thanks,

-Mike

> 
> > +int snp_lookup_rmpentry(u64 pfn, int *level)
> > +{
> > +	struct rmpentry *e;
> > +
> > +	e = __snp_lookup_rmpentry(pfn, level);
> > +	if (IS_ERR(e))
> > +		return PTR_ERR(e);
> > +
> > +	return !!rmpentry_assigned(e);
> > +}
> > +EXPORT_SYMBOL_GPL(snp_lookup_rmpentry);
> 

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH RFC v8 02/56] KVM: x86: Add 'update_mem_attr' x86 op
  2023-03-28 23:00               ` Zhi Wang
@ 2023-03-29 23:50                 ` Michael Roth
  0 siblings, 0 replies; 147+ messages in thread
From: Michael Roth @ 2023-03-29 23:50 UTC (permalink / raw)
  To: Zhi Wang
  Cc: Isaku Yamahata, kvm, linux-coco, linux-mm, linux-crypto, x86,
	linux-kernel, tglx, mingo, jroedel, thomas.lendacky, hpa, ardb,
	pbonzini, seanjc, vkuznets, jmattson, luto, dave.hansen, slp,
	pgonda, peterz, srinivas.pandruvada, rientjes, dovmurik, tobin,
	bp, vbabka, kirill, ak, tony.luck, marcorr,
	sathyanarayanan.kuppuswamy, alpergun, dgilbert, jarkko,
	ashish.kalra, nikunj.dadhania

On Wed, Mar 29, 2023 at 02:00:11AM +0300, Zhi Wang wrote:
> On Mon, 27 Mar 2023 23:36:35 -0500
> Michael Roth <michael.roth@amd.com> wrote:
> 
> > On Thu, Mar 23, 2023 at 08:17:16PM +0200, Zhi Wang wrote:
> > > On Tue, 21 Mar 2023 20:58:38 -0500
> > > Michael Roth <michael.roth@amd.com> wrote:
> > > 
> > > > On Tue, Mar 21, 2023 at 01:21:36PM +0200, Zhi Wang wrote:
> > > > > On Mon, 20 Mar 2023 13:05:43 -0500
> > > > > Michael Roth <michael.roth@amd.com> wrote:
> > > > > 
> > > > > > On Fri, Mar 17, 2023 at 09:56:11PM -0700, Isaku Yamahata wrote:
> > > > > > > On Mon, Feb 20, 2023 at 12:37:53PM -0600,
> > > > > > > Michael Roth <michael.roth@amd.com> wrote:
> > > > > > >   
> > > > > > > > This callback will do any platform-specific handling needed for
> > > > > > > > converting pages between shared/private.
> > > > > > > > 
> > > > > > > > Signed-off-by: Michael Roth <michael.roth@amd.com>
> > > > > > > > ---
> > 
> > <snip>
> > 
> > > > > > > >  static int kvm_vm_ioctl_set_mem_attributes(struct kvm *kvm,
> > > > > > > >  					   struct kvm_memory_attributes *attrs)
> > > > > > > >  {
> > > > > > > > @@ -2602,6 +2628,9 @@ static int kvm_vm_ioctl_set_mem_attributes(struct kvm *kvm,
> > > > > > > >  	kvm_mmu_invalidate_end(kvm);
> > > > > > > >  	KVM_MMU_UNLOCK(kvm);
> > > > > > > >  
> > > > > > > > +	if (i > start)
> > > > > > > > +		kvm_post_mem_attrs_changed(kvm, attrs->attributes, start, i);
> > > > > > > > +  
> > > > > > > 
> > > > > > > Doesn't kvm_arch_set_memory_attributes() work for you? i.e the following patch.
> > > > > > > The error check and pr_warn_ratelimited() can be pushed down into the callback.  
> > > > > > 
> > > > > > This is originally how I had but when CONFIG_PREEMPT_COUNT is set this
> > > > > > will generate warnings for this callback as well as the invalidation
> > > > > > callback as reported in v7 here:
> > > > > > 
> > > > > >   https://lore.kernel.org/lkml/Y80vhKwQyw8hS%2F22@notebook/
> > > > > > 
> > > > > > The main issue is that kvm_mem_attrs_changed() is called while holding
> > > > > > the KVM MMU lock, which disables preemption. But when updating
> > > > > > attributes for SNP, we also need to remove private pages from kernel
> > > > > > directmap, which involves acquiring a mutex which results in
> > > > > > "BUG: scheduling while atomic" warnings.
> > > > > > 
> > > > > > So that's why we ended up somewhat duplicating some of the logic and
> > > > > > using a separate callback chain that happens out of KVM MMU lock.
> > > > > 
> > > > > Let's split the things of changing memory attributes:
> > > > > 
> > > > > 1) Update the memory attributes in the xa array (Both TDX and SNP)
> > > > > 2) Zapping the EPT/NPT mappings (Required by TDX)
> > > > > 3) Update RMP table (Required by SNP)
> > > > > 4) Update the directmap of kernel (SNP, but I guess TDX needs it as well)
> > > > 
> > > 
> > > Thanks for the effort of detailed reply. It is very informative.
> > > 
> > > > I'm not so sure TDX requires this. I was under that impression, but
> > > > Kirill raised some doubts about this and I'm not sure it's been
> > > > confirmed. If it's purely an SNP thing then there may not be much value
> > > > in creating a separate callback for it:
> > > > 
> > > >   https://lore.kernel.org/linux-mm/20221031141426.GA3994099@chaop.bj.intel.com/T/#meba4ce80709cd3afd3818b61e6419fd800287b9e
> > > > 
> > > 
> > > Hmm, Krill and Isaku, can you confirm that TDX doesn't need this?
> > > 
> > > I think it is a generic requirement that TDX/SNP are not expecting the
> > > host to touch a private page either from the kernel or the userspace. 
> > 
> > This main issue is that in the case of the RMP table, a write to a 2M
> > mapping is interpreted as a write to a private page even if the write
> > didn't actually touch any private pages within the range. It may be
> > possible in firmware/hardware to distinguish between these 2 cases, but I'm
> > not sure whether that's the case for TDX or not.
> > 
> 
> I was thinking this today. SNP uses this trick to cope with the existing
> HW design. It is mandatory for SNP. What was the benefit of
> removing/restoring the directmap for the purpose of restricting the access
> from the kernel for TDX? Probably catching a wrong touch to a private
> kernel page from the kernel space instead of triggering MCE.

Not sure, but yah might've been something like that.

> 
> My point goes to let them stay in SNP for now as they seems closely coupled
> with RMUPDATE so far and the timing of doing RMUPDATE is still under
> discussion. Maybe we invent some new pre/post, but they turn not necessary
> later if the timing of RMUPDATE is changed.

Makes sense.

> 
> > > 
> > > > And for SNP, the current code does the unmapping/RMP update in the same
> > > > function:
> > > > 
> > > >   [PATCH RFC v8 15/56] x86/sev: Invalidate pages from the direct map when adding them to the RMP table
> > > > 
> > > > I'm not against splitting RMP/directmap handling, but just want to
> > > > understand what the requirements are around that a bit better.
> > > > 
> > > > Does handling the #3 / RMP update / kvm_arch_post_set_memory_attributes
> > > > stuff outside of MMU lock cause issues on TDX side? What sort of
> > > > handling is needed in these callbacks for TDX (if anything)?
> > > > 
> > > 
> > > No, it doesn't cause a problem for TDX, as TDX doesn't need such a callback.
> > > 
> > > Unlike SNP, which has (1 NPT + 1 RMP) with the enforced HW check done by the RMP, TDX has
> > > two EPTs (similar to the NPT): 1 shared + 1 private. Converting the memory attribute is
> > > achieved by zapping the mapping from one EPT and creating the mapping in the other one in
> > > the fault when the guest accesses the memory. The fault GPA will carry the "SHARED" bit
> > > (!C-BIT), so KVM knows which EPT should be chosen for populating the mapping.
> > > 
> > > I was trying to figure out what the proper callback should be and at which layer it should
> > > sit for changing the memory attributes for both TDX and SNP. The current callback looks a
> > > little bit hacky. Duplicating code pieces because of locking implies the SW structure might
> > > need to be re-considered.
> > > 
> > > > > 
> > > > > Does SNP really need to zap the NPT mappings when changing the memory
> > > > > attributes? (The new mappings will be created later in the fault.) I don't
> > > > > find this requirement in the APM.
> > > > 
> > > > I don't think we've added anything specifically for SNP. Do you mean the
> > > > generic kvm_unmap_gfn_range/kvm_flush_remote_tlbs sequence below?
> > > > 
> > > >   kvm_vm_ioctl_set_mem_attributes():
> > > >     KVM_MMU_LOCK(kvm)
> > > >     kvm_mmu_invalidate_begin()
> > > >     ...
> > > >     KVM_MMU_UNLOCK(kvm)
> > > > 
> > > >     kvm_vm_set_region_attr()  // xarray/attribute update
> > > > 
> > > >     ...
> > > >     KVM_MMU_LOCK(kvm)
> > > >     kvm_mem_attrs_changed():
> > > >       flush |= kvm_unmap_gfn_range()
> > > >       if (flush)
> > > >         kvm_flush_remote_tlbs()
> > > >     KVM_MMU_UNLOCK(kvm)
> > > >
> > > 
> > > Yes, I was talking about the sequence above. I was confused about why changing
> > > the RMP requires a zap-and-recreate flow of the NPT in SNP.
> > 
> > Hmm, so you're suggesting doing something like moving the
> > kvm_unmap_gfn_range()/kvm_flush_remote_tlbs() into .update_mem_attr()
> > callback so the platform can decide if it needs the zap/flush rather
> > than handling it in generic code?
> > 
> > If SNP really didn't need the handling it would be an interesting thought, but
> > I think it is needed there after all...
> 
> Yes, I was thinking so. When I was reading the APM, I thought one of the
> advantages of the SNP architecture was splitting the HW-enforced check from
> NPT mapping management. Thus, when updating the RMP, the NPT can stay as it
> is. I guess that's why I was so keen on eliminating the extra zapping.

Yah, it was an interesting thought. Maybe before the switch to UPM it might've
been possible to avoid the zapping, but it seems like it's a fairly general
requirement now.

> 
> After reading the UPM code, I think we do need the zapping here for
> SNP, as the pages for private memory and shared memory come from different
> sources. A shared<->private memory conversion needs to wipe the mapping of
> pages from one source and go through a fault-mapping flow on the pages
> from the other source. Still wondering if there is any good way to improve
> this based on the current approaches.

It may still be possible to avoid zapping for shared->private
conversions, since the subsequent access with the C-bit set would cause an
#NPF with the RMP bit set, but yah, it won't work for private->shared. It
also doesn't seem worth complicating the logic for anything SNP-specific
unless it would make a significant performance impact, and I think any gain
would be lost in the noise vs. RMPUPDATE/initial faulting of private
pages/directmap handling/guest exits for page-state changes/etc.

> 
> > 
> > > 
> > > > In general, when the RMPUPDATE instruction happens, the TLB entries for
> > > > the GPAs being modified will be flushed, so a subsequent nested page fault
> > > > should be able to obtain the updated mapping based on xarray/#NPF at that
> > > > point. In that respect *maybe* we don't need to zap the entries there.
> > > > 
> > > > But if the nested page fault occurs before the RMPUPDATE, I think we would
> > > > have a race if the above sequence isn't in place to handle the unmap/flush,
> > > > since in that case we might get a stale mapping because nothing would've
> > > > forced a tlbflush.
> > > > 
> > > > There's also stuff like the UPM selftests and SEV lazy-pinning where I
> > > > think that kvm_unmap_gfn_range() sequence is also needed. But I might be
> > > > misunderstanding the question here.
> > > > 
> > > 
> > > In this case, would an extra tlbflush solve it? Still, the unnecessary
> > > zapping/recreating of mappings is not promising. I understand that the way
> > > this patch goes is probably meant to minimize the changes, but it would be
> > > nice to focus more on what is really needed in a common path and abstract
> > > and re-factor from there.
> > 
> > I tested this, and just doing the tlbflush is insufficient. If the
> > entries are still present in the nested page table, the HV doesn't
> > necessarily get an #NPF, so the guest can still end up with a stale
> > mapping. After a shared->private conversion the guest would cause an #NPF
> > with the RMP/ENC bits set once it tries to access it as a private page,
> > but for a private->shared conversion the guest can subsequently access
> > the page with ENC-bit=0 without causing an #NPF. In that case the GFN can
> > still be mapped to the PFN restrictedmem was using to back it when it was
> > in the private state, instead of normal non-restricted memory.
> > 
> > So it seems like SNP needs the zapping behavior as well, and that it
> > isn't very different from the TDX/SEV-lazy/selftest users. So having
> > common handling seems worthwhile.
> > 
> > > 
> > > Can you elaborate more on how the lazy-pinning unpin path is connected
> > > with the zapping here? So that I can dig into it further.
> > 
> > Just to be clear, this is with regard to SEV lazy-pinning, which makes
> > use of restrictedmem mainly for lazy-pinning of private pages, rather
> > than SNP (which also inherits lazy-pinning from restrictedmem).
> > 
> > The main thing with SEV is that the private/shared status of a page is
> > completely up to how the guest decides to map it in its page tables,
> > unlike with SNP where a mismatch between guest-expected status and
> > actual status in the RMP table will generate a #NPF. So SEV would be
> > even more reliant on the current zapping behavior to ensure the NPT will
> > be updated.
> > 
> > > 
> > > Selftests are a minor case; I guess we could deal with them via a switch,
> > > e.g. a prop in debugfs.
> > 
> > We wouldn't want to break things if a guest was running while a selftest was
> > running, or something along those lines. Especially since it seems to be a
> > common requirement given the above.
> > 
> > > 
> > > > > If yes, can we postpone the update of the RMP table to the later fault,
> > > > > like TDX does? That way we could spare this update_mem_attr x86 op, as
> > > > > things would be handled in the SNP-specific fault handler.
> > > > 
> > > > Hmm, I think this would be possible. But it's nice to be able to handle
> > > > the RMPUPDATE as part of KVM_SET_MEMORY_ATTRIBUTES, since it allows the
> > > > KVM MMU code to rely solely on xarray state and not have to query the RMP
> > > > table to check if a particular PFN needs an RMPUPDATE before mapping it
> > > > into the guest.
> > > > 
> > > > At least... it would *in theory*, if the RMPUPDATE happened under
> > > > protection of mmu_invalidate_seq (in which case it could inherit all the
> > > > same protections KVM MMU has around mmu_invalidate_seq/fault->mmu_seq,
> > > > e.g. letting the guest retry the #PF if fault->mmu_seq is stale).
> > > > 
> > > > But currently, RMPUPDATE (via kvm_arch_post_set_memory_attributes) happens
> > > > *after* the invalidation sequence above. So in theory a guest could fault
> > > > on a page just after the xarray state is updated, but before the RMPUPDATE
> > > > has been done, in which case the KVM MMU code would properly map the page
> > > > according to the xarray. But since the RMPUPDATE wouldn't have happened yet,
> > > > the state of the corresponding PFN in the RMP table won't match the
> > > > shared/private access type expected by the guest, so when it tries to
> > > > access it, it will get another #NPF with the RMP bit set in the error code,
> > > > which will get handled as a no-op in handle_rmp_page_fault() (patch #44)
> > > > and loop like this until the RMPUPDATE is finally done. So it still works
> > > > out, but maybe not keeping things as in-sync with the xarray state as they
> > > > could be.
> > > > 
> > > 
> > > I see. The RMP fault handler only deals with page size mismatches for now.
> > 
> > That's correct.
> > 
> > > 
> > > > But deferring RMPUPDATE to fault time goes in the other direction of
> > > > that. Are there benefits/requirements for doing things this way for TDX?
> > > > I could see it being beneficial in terms of reducing overhead for
> > > > unneeded page-state transitions, since they would only be done on-demand,
> > > > but it doesn't seem like that would be much overhead compared to some of
> > > > the other operations being done.
> > > > 
> > > 
> > > Besides the HW design, I guess one major purpose is to optimize the
> > > boot time of VMs with large memory. Post-migration can be another case.
> > 
> > It seems like without lazy-acceptance support in the guest there isn't
> > too much reason to optimize here, since the guest will necessarily fault
> > in every page as part of pre-accepting the memory in OVMF.
> > 
> 
> Do you mean pre-accepting the whole of system memory, or only the initial
> memory allocated for OVMF?

I mean "pre-accepting" in the sense of "guests that don't have
lazy-acceptance enabled", so yes the whole of system memory.

> 
> TDX has already been working in a lazy-acceptance style. I believe
> lazy-acceptance will be the future direction. But I understand that part
> can come in stage 2 along with lazy-acceptance for SNP.

For SNP the OVMF bits are there:
  https://www.mail-archive.com/devel@edk2.groups.io/msg53857.html

but guest-side kernel support still requires these patches to land upstream:

  Kirill's base+TDX lazy-acceptance support:
    https://lore.kernel.org/linux-mm/Y7QzI0GqNeG9OHy1@zn.tnic/T/

  Tom's follow-on SNP support:
    https://lore.kernel.org/lkml/cover.1670513353.git.thomas.lendacky@amd.com/

  Dionna's patch to instruct OVMF to not pre-accept remaining memory
  after EFI_EXIT_BOOT_SERVICES:
    https://lore.kernel.org/lkml/def9b0b5-b880-be99-fa95-b05d76a91824@intel.com/T/

But yes, I agree lazy-acceptance will probably be the common default for
TDX/SNP guests, where we'd only be zapping entries that we know the
guest is going to be using anyway, as well as avoiding all the other
related overhead.

> 
> > And if we're making use of lazy-acceptance, for SNP at least, we wouldn't
> > end up getting .update_mem_attr() callbacks in the first place, since
> > those are ultimately the result of the guest issuing a shared->private
> > conversion request, which generally wouldn't happen until just before the
> > guest decides to accept/pvalidate that GFN. So with lazy-acceptance the
> > guest optimizes most of this potential overhead away already.
> I understand there can be 
> > 
> > > 
> > > Out of curiosity, what is the average cost of RMPUPDATE? I suppose it is an
> > > x86 instruction and does not go through PSP firmware.
> > 
> > Yes, it's an x86 instruction, no firmware calls there. Average seems to
> > be about 1us per instruction. It's not insignificant, up to 5s for a
> > 16GB guest in a worst-case scenario where the guest does not optimize
> > for 2MB shared->private conversions and has no lazy-acceptance support,
> > but it doesn't seem like it would be common to try to boot large guests
> > in such a configuration.
> > 
> I see.
> > > 
> > > > > 
> > > > > If no, guess we need a x86 ops to tell if a zapping is required.
> > > > 
> > > > Sorry, I don't think I quite understand the suggestion. What would this
> > > > zapping be covering vs. the invalidation sequence that currently happens
> > > > in kvm_vm_ioctl_set_mem_attributes()?
> > > 
> > > I was thinking that zapping the mapping in the EPT/NPT was required by TDX,
> > > while SNP might only need an RMP update + TLB flush. Thus, the abstraction
> > > of kvm_x86_ops.update_mem_attr should sit at this level. But let's
> > > scratch this for now, as I need to dig more into the lazy-pinning stuff.
> > > 
> > > > 
> > > > > 
> > > > > Back to the lock: updating the RMP table doesn't require a mutex. Taking
> > > > > a mutex is only required when updating the directmap, and both TDX and
> > > > > SNP require updating the directmap when changing memory attributes.
> > > > 
> > > > Is that confirmed? If so, do you have a pointer to the associated
> > > > documentation? I'm a bit unclear on this due to above-mentioned
> > > > discussion.
> > > > 
> > > > > 
> > > > > Wouldn't it be better to factor out the part that touches the kernel directmap?
> > > > 
> > > > It actually needs to happen before the RMPUPDATE. As soon as there is a
> > > > shared->private conversion in the RMP table for a particular PFN, any
> > > > access via the directmap by any kernel thread to any PFN that
> > > > happens to be in the same physical 2M range can cause an RMP fault on
> > > > the host, which would be fatal. So the rmpupdate() helper in this series
> > > > will unmap the directmap entry corresponding to the PFN before a
> > > > shared->private RMPUPDATE, and restore the mapping after a
> > > > private->shared RMPUPDATE.
> > > > 
> > > > So we could still factor it out, but it would be something like:
> > > > 
> > > >   if (attr == private)
> > > >     kvm_unmap_directmap(start, end)
> > > >   kvm_mem_attrs_changed()
> > > >   if (attr == shared)
> > > >     kvm_map_directmap(start, end)
> > > > 
> > > > > 
> > > > > Then you can call the x86 op .update_mem_attr() in kvm_mem_attrs_changed(),
> > > > > and update the kernel direct mapping for both TDX/SNP in
> > > > > kvm_post_mem_attrs_changed().
> > > > 
> > > > Or, adjusting for the above logic, move the unmapping/mapping to a new
> > > > kvm_pre_mem_attrs_changed() and kvm_post_mem_attrs_changed(), respectively.
> > > > 
> > > > Which seems pretty reasonable to me. Then we can:
> > > >  - drop duplicating the kvm_for_each_memslot_in_gfn_range() walk stuff because
> > > >    we'd just need to know what PFNs to map/unmap from the directmap
> > > >    (although we'd still need a loop around kvm_restrictedmem_get_pfn()
> > > >    for the GFN range, so not necessarily prettier)
> > > >  - call the RMPUPDATE / corresponding TDX handling via kvm_mem_attrs_changed(),
> > > >    which brings it both under the KVM MMU lock and also lets it piggyback
> > > >    off the fault->mmu_seq handling so it doesn't get out of sync with the
> > > >    xarray during fault time.
> > > > 
> > > 
> > > That sounds better. I am just a little bit worried that update_mem_attr() will
> > > end up as an SNP-only callback.
> > 
> > If it really ends up looking like an SNP-only thing, I don't see any
> > immediate issue with deferring this handling until fault time. But the
> > previous constraints remain:
> > 
> >  - directmap unmap needs to happen before shared->private RMPUPDATE,
> >    can't be called while holding KVM MMU lock or other spinlock
> >  - RMPUPDATE, and that needs to happen outside 
> >  - directmap restore needs to happen after private->shared RMPUPDATE,
> >    can't be called while holding KVM MMU lock or other spinlock
> > 
> > I saw the TDX patches added some x86 ops / hooks in the KVM MMU to handle
> > mapping secure pages. Is there anything there you think is worth
> > re-using/re-purposing for the SNP use-case?
> >
> 
> They are mainly for the TDP MMU to manipulate the TDX secure EPT through
> firmware calls. Due to the TDX architecture (1 shared + 1 secure EPT) and
> the mirror secure EPT design, the TDP MMU is aware that there is a "secure
> EPT with reduced features", which EPT should be operated on when a #PF
> happens, and which EPT should be zapped when the memory attributes change.
> Those callbacks in the TDP MMU are used for maintaining the mirror secure EPT.
> 
> SNP is designed in a different way. It doesn't need the TDP MMU to be aware
> that "there is another kind of NPT": as the enforced check is done in the RMP,
> SNP can easily keep all the RMP routines out of the TDP MMU. Also, SNP doesn't
> need a mirror NPT. Thus, SNP doesn't need those callbacks.
> 
> In *theory*, it is possible to re-purpose some of the callbacks and merge the
> RMP update into the TDP MMU, e.g. omitting the mirror secure EPT manipulation
> for SNP but updating the RMP in the SPTE-setting callback. But I have to
> think about what the benefit would be, as it doesn't fit the SNP HW nature.

Yah, can't think of any benefit currently. With both pre-acceptance and
lazy-acceptance those pages will get touched shortly after
KVM_SET_MEMORY_ATTRIBUTES, so the more obvious benefit of avoiding
unnecessary RMPUPDATEs doesn't end up materializing here. And in terms of
re-using callbacks, it seems like the current TDX and SNP hooks are each
already where it makes the most sense to have them. It would be nice to have
a shared callback for this, but it might not be the right approach here.

-Mike

> 
> 
> > Thanks,
> > 
> > -Mike
> > 
> > > 
> > > > But would be good to hear others' opinions on this. And also confirm
> > > > whether TDX needs that pre/post directmap handle or not.
> > > 
> > > Yes.
> > > 
> > > > 
> > > > Thanks!
> > > > 
> > > > -Mike
> > > > 
> > > > > 
> > > > > > 
> > > > > > -Mike
> > > > > > 
> > > > > > > 
> > > > > > > From 7c618c1f3c236c382e64680efcbe7d8a672aa870 Mon Sep 17 00:00:00 2001
> > > > > > > Message-Id: <7c618c1f3c236c382e64680efcbe7d8a672aa870.1679114841.git.isaku.yamahata@intel.com>
> > > > > > > In-Reply-To: <428a676face7a06a90e59dca1c32941c9b6ee001.1679114841.git.isaku.yamahata@intel.com>
> > > > > > > References: <428a676face7a06a90e59dca1c32941c9b6ee001.1679114841.git.isaku.yamahata@intel.com>
> > > > > > > From: Isaku Yamahata <isaku.yamahata@intel.com>
> > > > > > > Date: Fri, 17 Mar 2023 12:00:09 -0700
> > > > > > > Subject: [PATCH 4/4] KVM: x86: Add 'set_mem_attr' x86 op
> > > > > > > 
> > > > > > > This callback will do any platform-specific handling needed for
> > > > > > > converting pages between shared/private.
> > > > > > > 
> > > > > > > Originally-by: Michael Roth <michael.roth@amd.com>
> > > > > > > Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
> > > > > > > ---
> > > > > > >  arch/x86/include/asm/kvm-x86-ops.h | 1 +
> > > > > > >  arch/x86/include/asm/kvm_host.h    | 2 ++
> > > > > > >  arch/x86/kvm/mmu/mmu.c             | 1 +
> > > > > > >  3 files changed, 4 insertions(+)
> > > > > > > 
> > > > > > > diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
> > > > > > > index dc5f18ac0bd5..956db2ee25a5 100644
> > > > > > > --- a/arch/x86/include/asm/kvm-x86-ops.h
> > > > > > > +++ b/arch/x86/include/asm/kvm-x86-ops.h
> > > > > > > @@ -100,6 +100,7 @@ KVM_X86_OP_OPTIONAL_RET0(set_identity_map_addr)
> > > > > > >  KVM_X86_OP_OPTIONAL_RET0(get_mt_mask)
> > > > > > >  KVM_X86_OP(load_mmu_pgd)
> > > > > > >  KVM_X86_OP(fault_is_private)
> > > > > > > +KVM_X86_OP_OPTIONAL(set_mem_attr)
> > > > > > >  KVM_X86_OP_OPTIONAL(link_private_spt)
> > > > > > >  KVM_X86_OP_OPTIONAL(free_private_spt)
> > > > > > >  KVM_X86_OP_OPTIONAL(split_private_spt)
> > > > > > > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > > > > > > index 0382d236fbf4..88e11dd3afde 100644
> > > > > > > --- a/arch/x86/include/asm/kvm_host.h
> > > > > > > +++ b/arch/x86/include/asm/kvm_host.h
> > > > > > > @@ -1731,6 +1731,8 @@ struct kvm_x86_ops {
> > > > > > >  	void (*load_mmu_pgd)(struct kvm_vcpu *vcpu, hpa_t root_hpa,
> > > > > > >  			     int root_level);
> > > > > > >  	bool (*fault_is_private)(struct kvm *kvm, gpa_t gpa, u64 error_code);
> > > > > > > +	void (*set_mem_attr)(struct kvm *kvm, struct kvm_memory_slot *slot,
> > > > > > > +			     unsigned int attr, gfn_t start, gfn_t end);
> > > > > > >  
> > > > > > >  	int (*link_private_spt)(struct kvm *kvm, gfn_t gfn, enum pg_level level,
> > > > > > >  				void *private_spt);
> > > > > > > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > > > > > > index 0ec94c72895c..329333486e64 100644
> > > > > > > --- a/arch/x86/kvm/mmu/mmu.c
> > > > > > > +++ b/arch/x86/kvm/mmu/mmu.c
> > > > > > > @@ -7908,6 +7908,7 @@ void kvm_arch_set_memory_attributes(struct kvm *kvm,
> > > > > > >  				    gfn_t start, gfn_t end)
> > > > > > >  {
> > > > > > >  	kvm_update_lpage_mixed_flag(kvm, slot, true, attrs, start, end);
> > > > > > > +	static_call(kvm_x86_set_mem_attr)(kvm, slot, attrs, start, end);
> > > > > > >  }
> > > > > > >  
> > > > > > >  void kvm_memory_attributes_create_memslot(struct kvm *kvm,
> > > > > > > -- 
> > > > > > > 2.25.1
> > > > > > > 
> > > > > > > -- 
> > > > > > > Isaku Yamahata <isaku.yamahata@gmail.com>  
> > > > > 
> > > > > 
> > > 
> 

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH RFC v8 04/56] KVM: Add HVA range operator
  2023-03-27  0:34     ` Michael Roth
@ 2023-04-04 14:40       ` Zhi Wang
  0 siblings, 0 replies; 147+ messages in thread
From: Zhi Wang @ 2023-04-04 14:40 UTC (permalink / raw)
  To: Michael Roth
  Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	Vishal Annapurve

On Sun, 26 Mar 2023 19:34:44 -0500
Michael Roth <michael.roth@amd.com> wrote:

> On Mon, Feb 20, 2023 at 11:37:09PM +0200, Zhi Wang wrote:
> > On Mon, 20 Feb 2023 12:37:55 -0600
> > Michael Roth <michael.roth@amd.com> wrote:
> > 
> > > From: Vishal Annapurve <vannapurve@google.com>
> > > 
> > > Introduce HVA range operator so that other KVM subsystems
> > > can operate on HVA range.
> > > 
> > > Signed-off-by: Vishal Annapurve <vannapurve@google.com>
> > > [mdr: minor checkpatch alignment fixups]
> > > Signed-off-by: Michael Roth <michael.roth@amd.com>
> > > ---
> > >  include/linux/kvm_host.h |  6 +++++
> > >  virt/kvm/kvm_main.c      | 48 ++++++++++++++++++++++++++++++++++++++++
> > >  2 files changed, 54 insertions(+)
> > > 
> > > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > > index 4d542060cd93..c615650ed256 100644
> > > --- a/include/linux/kvm_host.h
> > > +++ b/include/linux/kvm_host.h
> > > @@ -1402,6 +1402,12 @@ void kvm_mmu_invalidate_begin(struct kvm *kvm);
> > >  void kvm_mmu_invalidate_range_add(struct kvm *kvm, gfn_t start, gfn_t end);
> > >  void kvm_mmu_invalidate_end(struct kvm *kvm);
> > >  
> > > +typedef int (*kvm_hva_range_op_t)(struct kvm *kvm,
> > > +				struct kvm_gfn_range *range, void *data);
> > > +
> > > +int kvm_vm_do_hva_range_op(struct kvm *kvm, unsigned long hva_start,
> > > +			   unsigned long hva_end, kvm_hva_range_op_t handler, void *data);
> > > +
> > >  long kvm_arch_dev_ioctl(struct file *filp,
> > >  			unsigned int ioctl, unsigned long arg);
> > >  long kvm_arch_vcpu_ioctl(struct file *filp,
> > > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > > index f7e00593cc5d..4ccd655dd5af 100644
> > > --- a/virt/kvm/kvm_main.c
> > > +++ b/virt/kvm/kvm_main.c
> > > @@ -642,6 +642,54 @@ static __always_inline int __kvm_handle_hva_range(struct kvm *kvm,
> > >  	return (int)ret;
> > >  }
> > >  
> > 
> > The function below seems to be a reduced duplicate of __kvm_handle_hva_range()
> > in virt/kvm/kvm_main.c. It would be nice to refactor __kvm_handle_hva_range().
> 
> A few differences make it difficult to refactor this clearly:
> 
>   - This handler is mainly used for loading initial contents into the guest
>     image before booting and doesn't rely on the MMU lock being held. It
>     also *can't* be called with the MMU lock held, because it suffers from
>     the same issue as the mem_attr_update() hook, where it needs to take a
>     mutex as part of unmapping from the directmap when transitioning a page
>     to the private state in the RMP table.
>   - This handler wants to return an error code, as opposed to existing
>     handlers which return a true/false values which are passed along to
>     MMU notifier call-site and handled differently.
>   - This handler wants to terminate iterating through memslots as soon
>     as it encounters the first failure, whereas the existing handlers
>     expect to be called for each slot regardless of return value.
> 
> So it's a pretty different use-case that adds enough complexity to
> __kvm_handle_hva_range() that it might not be worth refactoring it,
> since it complicates some bits that are closely tied to dealing with
> invalidations, where any extra complexity probably needs to be
> well-warranted.
> 
> I took a stab at it here for reference, but even with what seems to be
> the minimal set of changes it doesn't save on any code, and ultimately I
> think it makes it harder to make sense of what's going on:
> 
>   https://github.com/mdroth/linux/commit/976c5fb708f7babe899fd80e27e19f8ba3f6818d
> 
> Is there a better approach?
> 

Those requirements look pretty suitable for kvm_handle_hva_range(). I guess
we just need to extend the iterator a little bit.

My ideas:

1) Add a lock flag in struct kvm_hva_range to indicate whether the MMU lock is
required or not during the iteration. Check the flag with
if (!locked && hva_range.need_lock). Then the unlock part can be left untouched.

2) Add an error code in struct kvm_gfn_range; the handler can set it so that
__kvm_handle_hva_range() can check gfn_range.err after ret |= handler(...).
If the err is set, bail out.

3) Return the gfn_range.err to the caller. The caller can decide how to convert
it (to a boolean, or keep it as-is).

4) Set hva_range.need_lock in the existing and the new callers.

How about this?
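
To make that a bit more concrete, something along these lines (a rough sketch
only, not actual kvm_main.c code: the need_lock/err fields and the trimmed-down
struct layout are assumptions of this proposal, and the existing memslot walk
and pte/on_lock plumbing are elided):

  struct kvm_hva_range {
          unsigned long start;
          unsigned long end;
          hva_handler_t handler;
          bool need_lock;         /* new: take mmu_lock around handler calls */
          bool flush_on_ret;
          bool may_block;
  };

  static __always_inline int __kvm_handle_hva_range(struct kvm *kvm,
                                                    const struct kvm_hva_range *range)
  {
          struct kvm_gfn_range gfn_range = {};
          bool locked = false, ret = false;
          int i, idx;

          idx = srcu_read_lock(&kvm->srcu);

          for (i = 0; i < kvm_arch_nr_memslot_as_ids(kvm); i++) {
                  /*
                   * ... existing kvm_for_each_memslot_in_hva_range() walk fills
                   * in gfn_range for each intersecting memslot ...
                   */

                  if (range->need_lock && !locked) {
                          locked = true;
                          KVM_MMU_LOCK(kvm);
                  }

                  ret |= range->handler(kvm, &gfn_range);

                  /* new: let a handler report a hard error and stop the walk */
                  if (gfn_range.err)
                          break;
          }

          if (range->flush_on_ret && ret)
                  kvm_flush_remote_tlbs(kvm);

          if (locked)
                  KVM_MMU_UNLOCK(kvm);

          srcu_read_unlock(&kvm->srcu, idx);

          /* new: callers can convert gfn_range.err to a boolean or pass it on */
          return gfn_range.err ?: (int)ret;
  }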

> Thanks,
> 
> -Mike
> 
> > 
> > > +int kvm_vm_do_hva_range_op(struct kvm *kvm, unsigned long hva_start,
> > > +			   unsigned long hva_end, kvm_hva_range_op_t handler, void *data)
> > > +{
> > > +	int ret = 0;
> > > +	struct kvm_gfn_range gfn_range;
> > > +	struct kvm_memory_slot *slot;
> > > +	struct kvm_memslots *slots;
> > > +	int i, idx;
> > > +
> > > +	if (WARN_ON_ONCE(hva_end <= hva_start))
> > > +		return -EINVAL;
> > > +
> > > +	idx = srcu_read_lock(&kvm->srcu);
> > > +
> > > +	for (i = 0; i < kvm_arch_nr_memslot_as_ids(kvm); i++) {
> > > +		struct interval_tree_node *node;
> > > +
> > > +		slots = __kvm_memslots(kvm, i);
> > > +		kvm_for_each_memslot_in_hva_range(node, slots,
> > > +						  hva_start, hva_end - 1) {
> > > +			unsigned long start, end;
> > > +
> > > +			slot = container_of(node, struct kvm_memory_slot,
> > > +					    hva_node[slots->node_idx]);
> > > +			start = max(hva_start, slot->userspace_addr);
> > > +			end = min(hva_end, slot->userspace_addr +
> > > +						  (slot->npages << PAGE_SHIFT));
> > > +
> > > +			/*
> > > +			 * {gfn(page) | page intersects with [hva_start, hva_end)} =
> > > +			 * {gfn_start, gfn_start+1, ..., gfn_end-1}.
> > > +			 */
> > > +			gfn_range.start = hva_to_gfn_memslot(start, slot);
> > > +			gfn_range.end = hva_to_gfn_memslot(end + PAGE_SIZE - 1, slot);
> > > +			gfn_range.slot = slot;
> > > +
> > > +			ret = handler(kvm, &gfn_range, data);
> > > +			if (ret)
> > > +				goto e_ret;
> > > +		}
> > > +	}
> > > +
> > > +e_ret:
> > > +	srcu_read_unlock(&kvm->srcu, idx);
> > > +
> > > +	return ret;
> > > +}
> > > +
> > >  static __always_inline int kvm_handle_hva_range(struct mmu_notifier *mn,
> > >  						unsigned long start,
> > >  						unsigned long end,
> > 


^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH RFC v8 47/56] KVM: SVM: Support SEV-SNP AP Creation NAE event
  2023-02-24 12:37   ` Alexander Graf
  2023-02-28 20:47     ` Zhi Wang
@ 2023-04-04 22:48     ` Michael Roth
  2023-04-05 15:20       ` Tom Lendacky
  1 sibling, 1 reply; 147+ messages in thread
From: Michael Roth @ 2023-04-04 22:48 UTC (permalink / raw)
  To: Alexander Graf
  Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	Brijesh Singh

On Fri, Feb 24, 2023 at 01:37:48PM +0100, Alexander Graf wrote:
> 
> On 20.02.23 19:38, Michael Roth wrote:
> > From: Tom Lendacky <thomas.lendacky@amd.com>
> > 
> > Add support for the SEV-SNP AP Creation NAE event. This allows SEV-SNP
> > guests to alter the register state of the APs on their own. This allows
> > the guest a way of simulating INIT-SIPI.
> > 
> > A new event, KVM_REQ_UPDATE_PROTECTED_GUEST_STATE, is created and used
> > so as to avoid updating the VMSA pointer while the vCPU is running.
> > 
> > For CREATE
> >    The guest supplies the GPA of the VMSA to be used for the vCPU with
> >    the specified APIC ID. The GPA is saved in the svm struct of the
> >    target vCPU, the KVM_REQ_UPDATE_PROTECTED_GUEST_STATE event is added
> >    to the vCPU and then the vCPU is kicked.
> > 
> > For CREATE_ON_INIT:
> >    The guest supplies the GPA of the VMSA to be used for the vCPU with
> >    the specified APIC ID the next time an INIT is performed. The GPA is
> >    saved in the svm struct of the target vCPU.
> > 
> > For DESTROY:
> >    The guest indicates it wishes to stop the vCPU. The GPA is cleared
> >    from the svm struct, the KVM_REQ_UPDATE_PROTECTED_GUEST_STATE event is
> >    added to vCPU and then the vCPU is kicked.
> > 
> > The KVM_REQ_UPDATE_PROTECTED_GUEST_STATE event handler will be invoked
> > as a result of the event or as a result of an INIT. The handler sets the
> > vCPU to the KVM_MP_STATE_UNINITIALIZED state, so that any errors will
> > leave the vCPU as not runnable. Any previous VMSA pages that were
> > installed as part of an SEV-SNP AP Creation NAE event are un-pinned. If
> > a new VMSA is to be installed, the VMSA guest page is pinned and set as
> > the VMSA in the vCPU VMCB and the vCPU state is set to
> > KVM_MP_STATE_RUNNABLE. If a new VMSA is not to be installed, the VMSA is
> > cleared in the vCPU VMCB and the vCPU state is left as
> > KVM_MP_STATE_UNINITIALIZED to prevent it from being run.
> > 
> > Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
> > Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> > Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> > [mdr: add handling for restrictedmem]
> > Signed-off-by: Michael Roth <michael.roth@amd.com>
> 
> 
> What is the intended boot sequence for SEV-SNP guests? FWIW, with this
> interface in place, guests will typically use in-guest VMSA pages to hold
> secondary vCPU state. But that means we're now allocating 4KB of memory for
> every vCPU that we create that will be superfluous for most of the guest's
> lifetime.
> 
> Wouldn't it make more sense to have a model where we only allocate the VMSA
> for the boot CPU and leave secondary allocation to the guest? We already
> need firmware changes for SEV-SNP - may as well make this one more.

I don't think we'd necessarily need a firmware change. We could just
free the original VMSA back to the hypervisor as soon as those APs come
online. The down-side to that versus deferring cleanup until guest
shutdown is that there is some flushing activity (see:
sev_flush_encrypted_page()) that would now likely be occurring during
guest boot-up, where the overhead might be more noticeable. But for SNP
the host likely supports X86_FEATURE_SME_COHERENT, so the overhead
probably isn't that bad.

> 
> [...]
> 
> > +
> > +static int sev_snp_ap_creation(struct vcpu_svm *svm)
> > +{
> > +       struct kvm_sev_info *sev = &to_kvm_svm(svm->vcpu.kvm)->sev_info;
> > +       struct kvm_vcpu *vcpu = &svm->vcpu;
> > +       struct kvm_vcpu *target_vcpu;
> > +       struct vcpu_svm *target_svm;
> > +       unsigned int request;
> > +       unsigned int apic_id;
> > +       bool kick;
> > +       int ret;
> > +
> > +       request = lower_32_bits(svm->vmcb->control.exit_info_1);
> > +       apic_id = upper_32_bits(svm->vmcb->control.exit_info_1);
> > +
> > +       /* Validate the APIC ID */
> > +       target_vcpu = kvm_get_vcpu_by_id(vcpu->kvm, apic_id);
> 
> 
> Out of curiosity: The target CPU can be my own vCPU, right?

I don't think that would be the normal behavior, but maybe with some
care it's possible for a guest to do things that way. I haven't seen
anything strictly prohibiting this in the relevant specs.

> 
> 
> > +       if (!target_vcpu) {
> > +               vcpu_unimpl(vcpu, "vmgexit: invalid AP APIC ID [%#x] from guest\n",
> > +                           apic_id);
> > +               return -EINVAL;
> > +       }
> > +
> > +       ret = 0;
> > +
> > +       target_svm = to_svm(target_vcpu);
> > +
> > +       /*
> > +        * The target vCPU is valid, so the vCPU will be kicked unless the
> > +        * request is for CREATE_ON_INIT. For any errors at this stage, the
> > +        * kick will place the vCPU in an non-runnable state.
> > +        */
> > +       kick = true;
> > +
> > +       mutex_lock(&target_svm->sev_es.snp_vmsa_mutex);
> > +
> > +       target_svm->sev_es.snp_vmsa_gpa = INVALID_PAGE;
> > +       target_svm->sev_es.snp_ap_create = true;
> > +
> > +       /* Interrupt injection mode shouldn't change for AP creation */
> > +       if (request < SVM_VMGEXIT_AP_DESTROY) {
> > +               u64 sev_features;
> > +
> > +               sev_features = vcpu->arch.regs[VCPU_REGS_RAX];
> > +               sev_features ^= sev->sev_features;
> > +               if (sev_features & SVM_SEV_FEAT_INT_INJ_MODES) {
> > +                       vcpu_unimpl(vcpu, "vmgexit: invalid AP injection mode [%#lx] from guest\n",
> > +                                   vcpu->arch.regs[VCPU_REGS_RAX]);
> > +                       ret = -EINVAL;
> > +                       goto out;
> > +               }
> > +       }
> > +
> > +       switch (request) {
> > +       case SVM_VMGEXIT_AP_CREATE_ON_INIT:
> > +               kick = false;
> > +               fallthrough;
> > +       case SVM_VMGEXIT_AP_CREATE:
> > +               if (!page_address_valid(vcpu, svm->vmcb->control.exit_info_2)) {
> > +                       vcpu_unimpl(vcpu, "vmgexit: invalid AP VMSA address [%#llx] from guest\n",
> > +                                   svm->vmcb->control.exit_info_2);
> > +                       ret = -EINVAL;
> > +                       goto out;
> > +               }
> > +
> > +               /*
> > +                * Malicious guest can RMPADJUST a large page into VMSA which
> > +                * will hit the SNP erratum where the CPU will incorrectly signal
> > +                * an RMP violation #PF if a hugepage collides with the RMP entry
> > +                * of VMSA page, reject the AP CREATE request if VMSA address from
> > +                * guest is 2M aligned.
> 
> 
> This will break genuine current Linux kernels that just happen to allocate a
> guest page, no? In fact, given enough vCPUs you're almost guaranteed to hit
> an aligned structure somewhere. What is the guest supposed to do in that
> situation?

The initial SNP support for guest kernels already made use of
snp_alloc_vmsa_page() to do the appropriate workaround to avoid allocating
2MB-aligned VMSA pages.
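
For reference, the in-guest allocation helper works roughly like the sketch
below (based on snp_alloc_vmsa_page() in arch/x86/kernel/sev.c; treat the
details as approximate): it over-allocates by one page so it can always hand
back a VMSA page that cannot be 2MB-aligned.

  /*
   * Rough sketch of the guest-side workaround: allocate an order-1 (8K)
   * block, split it, and free the first 4K page, which is the only one
   * that could be 2MB/1GB-aligned. The remaining page is guaranteed not
   * to collide with a large-page RMP entry.
   */
  static void *snp_alloc_vmsa_page(void)
  {
          struct page *p;

          p = alloc_pages(GFP_KERNEL_ACCOUNT | __GFP_ZERO, 1);
          if (!p)
                  return NULL;

          split_page(p, 1);

          /* Free the first 4K; it may be 2M/1G-aligned and cannot be used. */
          __free_page(p);

          return page_address(p + 1);
  }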

-Mike

> 
> 
> 
> Amazon Development Center Germany GmbH
> Krausenstr. 38
> 10117 Berlin
> Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
> Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
> Sitz: Berlin
> Ust-ID: DE 289 237 879
> 
> 

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH RFC v8 47/56] KVM: SVM: Support SEV-SNP AP Creation NAE event
  2023-03-01 21:14       ` Alexander Graf
@ 2023-04-05  0:54         ` Michael Roth
  0 siblings, 0 replies; 147+ messages in thread
From: Michael Roth @ 2023-04-05  0:54 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Zhi Wang, kvm, linux-coco, linux-mm, linux-crypto, x86,
	linux-kernel, tglx, mingo, jroedel, thomas.lendacky, hpa, ardb,
	pbonzini, seanjc, vkuznets, jmattson, luto, dave.hansen, slp,
	pgonda, peterz, srinivas.pandruvada, rientjes, dovmurik, tobin,
	bp, vbabka, kirill, ak, tony.luck, marcorr,
	sathyanarayanan.kuppuswamy, alpergun, dgilbert, jarkko,
	ashish.kalra, nikunj.dadhania, Brijesh Singh

On Wed, Mar 01, 2023 at 10:14:19PM +0100, Alexander Graf wrote:
> 
> On 28.02.23 21:47, Zhi Wang wrote:
> > On Fri, 24 Feb 2023 13:37:48 +0100
> > Alexander Graf <graf@amazon.com> wrote:
> > 
> > > On 20.02.23 19:38, Michael Roth wrote:
> > > > From: Tom Lendacky <thomas.lendacky@amd.com>
> > > > 
> > > > Add support for the SEV-SNP AP Creation NAE event. This allows SEV-SNP
> > > > guests to alter the register state of the APs on their own. This allows
> > > > the guest a way of simulating INIT-SIPI.

<snip>

> > > > +                */
> > > > +               if (IS_ALIGNED(svm->vmcb->control.exit_info_2, PMD_SIZE)) {
> > > > +                       vcpu_unimpl(vcpu,
> > > > +                                   "vmgexit: AP VMSA address [%llx] from guest is unsafe as it is 2M aligned\n",
> > > > +                                   svm->vmcb->control.exit_info_2);
> > > > +                       ret = -EINVAL;
> > > > +                       goto out;
> > > > +               }
> > > > +
> > > > +               target_svm->sev_es.snp_vmsa_gpa = svm->vmcb->control.exit_info_2;
> > > > +               break;
> > > > +       case SVM_VMGEXIT_AP_DESTROY:
> > > 
> > > I don't understand the destroy path. Why does this case destroy anything?
> > > 
> > > 
> > > > +               break;
> > > > +       default:
> > > > +               vcpu_unimpl(vcpu, "vmgexit: invalid AP creation request [%#x] from guest\n",
> > > > +                           request);
> > > > +               ret = -EINVAL;
> > > > +               break;
> > > > +       }
> > > > +
> > > > +out:
> > > > +       if (kick) {
> > > > +               if (target_vcpu->arch.mp_state == KVM_MP_STATE_UNINITIALIZED)
> > > > +                       target_vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
> > > 
> > > What if the guest AP goes through a create -> destroy -> create cycle?
> > > Will it stay runnable while destroyed?
> > The code is not very straightforward.
> > 
> > 1) target_svm->sev_es.snp_vmsa_gpa is set as INVALID_PAGE in the beginning of this function.
> > 
> > 2) If a DESTROY is hit in this function, target_svm->sev_es.snp_vmsa_gpa will be
> > left as INVALID_PAGE.
> > 
> > 3) At the end of this function, it calls kvm_make_request(KVM_REQ_UPDATE_PROTECTED_GUEST_STATE).
> > 
> > 4) In the vcpu_enter_guest(), the kvm_vcpu_reset()->sev_snp_init_protected_guest_state()
> > ->__sev_snp_init_protected_guest_state() is called.
> > 
> > 5) The mp_state is set to KVM_MP_STATE_STOPPED by default and the runtime VMSA is
> > cleared. Then it will be initialized according to the guest's
> > configuration.
> > 
> > 6) As the snp_vmsa_gpa is set as INVALID_PAGE in 1, the mp_state will be left as
> > KVM_MP_STATE_STOPPED.
> > 
> > 7) With this code piece:
> > 
> > +                       kvm_vcpu_reset(vcpu, true);
> > +                       if (vcpu->arch.mp_state != KVM_MP_STATE_RUNNABLE)
> > +                               goto out;
> > 
> > vcpu_enter_guest() bails out.
> 
> 
> Thanks a lot Zhi for the detailed explanation! I think this code flow wants
> to become slightly more obvious. For example, if we just said
> 
>   case SVM_VMGEXIT_AP_DESTROY:
>     /* This will tell __sev_snp_update_protected_guest_state to unmap the
> VMSA */
>     target_svm->sev_es.snp_vmsa_gpa = INVALID_PAGE;
>     break;
> 
> We'd get a big win in readability with little effort. It makes it
> immediately obvious where to look for the destroy operation.

The target->snp_vmsa_gpa value mainly serves to cache the GPA of the new
VMSA up until the point where KVM points the target vCPU's VMCB over to
using it as part of sev_snp_init_protected_guest_state(). We want to
make sure it is either INVALID_PAGE, or whatever GPA the guest wants us
to set it to.

Once sev_snp_init_protected_guest_state() processes the update, it will
set it back to INVALID_PAGE. So if we initialized it once during VM
creation it wouldn't be necessary to keep setting it to INVALID_PAGE at
all, except for one case ...

There is a window where the guest could do an AP_CREATE with a new GPA,
followed immediately by AP_DESTROY. If that AP_DESTROY got processed
before the vCPU re-enters the guest, it might instead restart the vCPU
(due to taking the VALID_PAGE(snp_vmsa_gpa) branch in
__sev_snp_update_protected_guest_state()). So for that reason it makes
sense to always initialize it to INVALID_PAGE when an AP_CREATION
request comes in, to essentially clear any pending update from a
previous AP_CREATION for the same vCPU. If we moved it to the
AP_DESTROY case, but hit a failure elsewhere in sev_snp_ap_creation(),
then the calling vCPU would get a #GP, but the target vCPU might get
set up with the VMSA for the previous pending request. That might be
appropriate though; just because the calling vCPU hosed itself doesn't mean
a previous valid AP_CREATION request shouldn't go through for the target.
Hmm...

And having said all that... yes, I guess this basically boils down to
INVALID_PAGE being used as an indicator of whether or not to unmap the
VMSA. I'll try to come up with some wording to convey all this
a bit more clearly.
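
For illustration (a condensed restatement of the flow being discussed,
derived from the quoted patch hunks rather than the actual patch code), the
intent is roughly:

  /*
   * snp_vmsa_gpa is reset up-front so a later AP_DESTROY cancels any
   * still-pending AP_CREATE for the same target vCPU.
   */
  mutex_lock(&target_svm->sev_es.snp_vmsa_mutex);
  target_svm->sev_es.snp_vmsa_gpa = INVALID_PAGE;  /* cancel any pending CREATE */

  switch (request) {
  case SVM_VMGEXIT_AP_CREATE_ON_INIT:
  case SVM_VMGEXIT_AP_CREATE:
          target_svm->sev_es.snp_vmsa_gpa = svm->vmcb->control.exit_info_2;
          break;
  case SVM_VMGEXIT_AP_DESTROY:
          /* Leave snp_vmsa_gpa as INVALID_PAGE so the old VMSA gets unmapped. */
          break;
  }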

Thanks,

-Mike

> 
> 
> Alex
> 
> 
> 
> 
> 
> Amazon Development Center Germany GmbH
> Krausenstr. 38
> 10117 Berlin
> Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
> Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
> Sitz: Berlin
> Ust-ID: DE 289 237 879
> 
> 

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH RFC v8 47/56] KVM: SVM: Support SEV-SNP AP Creation NAE event
  2023-04-04 22:48     ` Michael Roth
@ 2023-04-05 15:20       ` Tom Lendacky
  0 siblings, 0 replies; 147+ messages in thread
From: Tom Lendacky @ 2023-04-05 15:20 UTC (permalink / raw)
  To: Michael Roth, Alexander Graf
  Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, hpa, ardb, pbonzini, seanjc, vkuznets, jmattson,
	luto, dave.hansen, slp, pgonda, peterz, srinivas.pandruvada,
	rientjes, dovmurik, tobin, bp, vbabka, kirill, ak, tony.luck,
	marcorr, sathyanarayanan.kuppuswamy, alpergun, dgilbert, jarkko,
	ashish.kalra, nikunj.dadhania, Brijesh Singh

On 4/4/23 17:48, Michael Roth wrote:
> On Fri, Feb 24, 2023 at 01:37:48PM +0100, Alexander Graf wrote:
>>
>> On 20.02.23 19:38, Michael Roth wrote:
>>> From: Tom Lendacky <thomas.lendacky@amd.com>
>>>
>>> Add support for the SEV-SNP AP Creation NAE event. This allows SEV-SNP
>>> guests to alter the register state of the APs on their own. This allows
>>> the guest a way of simulating INIT-SIPI.
>>>
>>> A new event, KVM_REQ_UPDATE_PROTECTED_GUEST_STATE, is created and used
>>> so as to avoid updating the VMSA pointer while the vCPU is running.
>>>
>>> For CREATE
>>>     The guest supplies the GPA of the VMSA to be used for the vCPU with
>>>     the specified APIC ID. The GPA is saved in the svm struct of the
>>>     target vCPU, the KVM_REQ_UPDATE_PROTECTED_GUEST_STATE event is added
>>>     to the vCPU and then the vCPU is kicked.
>>>
>>> For CREATE_ON_INIT:
>>>     The guest supplies the GPA of the VMSA to be used for the vCPU with
>>>     the specified APIC ID the next time an INIT is performed. The GPA is
>>>     saved in the svm struct of the target vCPU.
>>>
>>> For DESTROY:
>>>     The guest indicates it wishes to stop the vCPU. The GPA is cleared
>>>     from the svm struct, the KVM_REQ_UPDATE_PROTECTED_GUEST_STATE event is
>>>     added to vCPU and then the vCPU is kicked.
>>>
>>> The KVM_REQ_UPDATE_PROTECTED_GUEST_STATE event handler will be invoked
>>> as a result of the event or as a result of an INIT. The handler sets the
>>> vCPU to the KVM_MP_STATE_UNINITIALIZED state, so that any errors will
>>> leave the vCPU as not runnable. Any previous VMSA pages that were
>>> installed as part of an SEV-SNP AP Creation NAE event are un-pinned. If
>>> a new VMSA is to be installed, the VMSA guest page is pinned and set as
>>> the VMSA in the vCPU VMCB and the vCPU state is set to
>>> KVM_MP_STATE_RUNNABLE. If a new VMSA is not to be installed, the VMSA is
>>> cleared in the vCPU VMCB and the vCPU state is left as
>>> KVM_MP_STATE_UNINITIALIZED to prevent it from being run.
>>>
>>> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
>>> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
>>> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
>>> [mdr: add handling for restrictedmem]
>>> Signed-off-by: Michael Roth <michael.roth@amd.com>
>>
>>
>> What is the intended boot sequence for SEV-SNP guests? FWIW, with this
>> interface in place, guests will typically use in-guest VMSA pages to hold
>> secondary vCPU state. But that means we're now allocating 4KB of memory for
>> every vCPU that we create that will be superfluous for most of the guest's
>> lifetime.
>>
>> Wouldn't it make more sense to have a model where we only allocate the VMSA
>> for the boot CPU and leave secondary allocation to the guest? We already
>> need firmware changes for SEV-SNP - may as well make this one more.
> 
> I don't think we'd necessarily need a firmware change. We could just
> free the original VMSA back to the hypervisor as soon as those APs come
> online. The down-side to that versus deferring cleanup until guest
> shutdown is that there is some flushing activity (see:
> sev_flush_encrypted_page()) that would now likely be occurring during
> guest boot-up, where the overhead might be more noticeable. But for SNP
> the host likely supports X86_FEATURE_SME_COHERENT, so the overhead
> probably isn't that bad.

Currently, OVMF code will perform a broadcast IPI to start all the APs 
because it doesn't know the APIC IDs until they start for the first time. 
Until the APIC IDs are known, the guest BSP can't create the VMSAs.

However, a new GHCB event is planned to retrieve the APIC IDs for the
guest. Once that is in place, you could create just a single VMSA for
the BSP and then allow the guest to create the remainder (the current OVMF
PoC patches to support an SVSM do this). The VMM would have to know that
the hypervisor and the firmware both support that, though. That could be
advertised as part of the GUID table of the firmware (in the case of OVMF)
and as a capability from KVM.

Thanks,
Tom

> 
>>
>> [...]
>>
>>> +
>>> +static int sev_snp_ap_creation(struct vcpu_svm *svm)
>>> +{
>>> +       struct kvm_sev_info *sev = &to_kvm_svm(svm->vcpu.kvm)->sev_info;
>>> +       struct kvm_vcpu *vcpu = &svm->vcpu;
>>> +       struct kvm_vcpu *target_vcpu;
>>> +       struct vcpu_svm *target_svm;
>>> +       unsigned int request;
>>> +       unsigned int apic_id;
>>> +       bool kick;
>>> +       int ret;
>>> +
>>> +       request = lower_32_bits(svm->vmcb->control.exit_info_1);
>>> +       apic_id = upper_32_bits(svm->vmcb->control.exit_info_1);
>>> +
>>> +       /* Validate the APIC ID */
>>> +       target_vcpu = kvm_get_vcpu_by_id(vcpu->kvm, apic_id);
>>
>>
>> Out of curiosity: The target CPU can be my own vCPU, right?
> 
> I don't think that would be the normal behavior, but maybe with some
> care it's possible for a guest to do things that way. I haven't seen
> anything strictly prohibiting this in the relevant specs.
> 
>>
>>
>>> +       if (!target_vcpu) {
>>> +               vcpu_unimpl(vcpu, "vmgexit: invalid AP APIC ID [%#x] from guest\n",
>>> +                           apic_id);
>>> +               return -EINVAL;
>>> +       }
>>> +
>>> +       ret = 0;
>>> +
>>> +       target_svm = to_svm(target_vcpu);
>>> +
>>> +       /*
>>> +        * The target vCPU is valid, so the vCPU will be kicked unless the
>>> +        * request is for CREATE_ON_INIT. For any errors at this stage, the
>>> +        * kick will place the vCPU in an non-runnable state.
>>> +        */
>>> +       kick = true;
>>> +
>>> +       mutex_lock(&target_svm->sev_es.snp_vmsa_mutex);
>>> +
>>> +       target_svm->sev_es.snp_vmsa_gpa = INVALID_PAGE;
>>> +       target_svm->sev_es.snp_ap_create = true;
>>> +
>>> +       /* Interrupt injection mode shouldn't change for AP creation */
>>> +       if (request < SVM_VMGEXIT_AP_DESTROY) {
>>> +               u64 sev_features;
>>> +
>>> +               sev_features = vcpu->arch.regs[VCPU_REGS_RAX];
>>> +               sev_features ^= sev->sev_features;
>>> +               if (sev_features & SVM_SEV_FEAT_INT_INJ_MODES) {
>>> +                       vcpu_unimpl(vcpu, "vmgexit: invalid AP injection mode [%#lx] from guest\n",
>>> +                                   vcpu->arch.regs[VCPU_REGS_RAX]);
>>> +                       ret = -EINVAL;
>>> +                       goto out;
>>> +               }
>>> +       }
>>> +
>>> +       switch (request) {
>>> +       case SVM_VMGEXIT_AP_CREATE_ON_INIT:
>>> +               kick = false;
>>> +               fallthrough;
>>> +       case SVM_VMGEXIT_AP_CREATE:
>>> +               if (!page_address_valid(vcpu, svm->vmcb->control.exit_info_2)) {
>>> +                       vcpu_unimpl(vcpu, "vmgexit: invalid AP VMSA address [%#llx] from guest\n",
>>> +                                   svm->vmcb->control.exit_info_2);
>>> +                       ret = -EINVAL;
>>> +                       goto out;
>>> +               }
>>> +
>>> +               /*
>>> +                * Malicious guest can RMPADJUST a large page into VMSA which
>>> +                * will hit the SNP erratum where the CPU will incorrectly signal
>>> +                * an RMP violation #PF if a hugepage collides with the RMP entry
>>> +                * of VMSA page, reject the AP CREATE request if VMSA address from
>>> +                * guest is 2M aligned.
>>
>>
>> This will break genuine current Linux kernels that just happen to allocate a
>> guest page, no? In fact, given enough vCPUs you're almost guaranteed to hit
>> an aligned structure somewhere. What is the guest supposed to do in that
>> situation?
> 
> The initial SNP support for guest kernels already made use of
> snp_alloc_vmsa_page() to do the appropriate workaround to avoid allocating
> 2MB-aligned VMSA pages.
> 
> -Mike
> 
>>
>>
>>
>> Amazon Development Center Germany GmbH
>> Krausenstr. 38
>> 10117 Berlin
>> Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
>> Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
>> Sitz: Berlin
>> Ust-ID: DE 289 237 879
>>
>>

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH RFC v8 48/56] KVM: SVM: Add SNP-specific handling for memory attribute updates
  2023-03-01 23:37   ` Dave Hansen
@ 2023-04-05 23:48     ` Michael Roth
  0 siblings, 0 replies; 147+ messages in thread
From: Michael Roth @ 2023-04-05 23:48 UTC (permalink / raw)
  To: Dave Hansen
  Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania

On Wed, Mar 01, 2023 at 03:37:00PM -0800, Dave Hansen wrote:
> On 2/20/23 10:38, Michael Roth wrote:
> > +		/*
> > +		 * TODO: The RMP entry's hugepage bit is ignored for
> > +		 * shared/unassigned pages. Either handle looping through each
> > +		 * sub-page as part of snp_make_page_shared(), or remove the
> > +		 * level argument.
> > +		 */
> > +		if (op == SNP_PAGE_STATE_PRIVATE && order &&
> > +		    IS_ALIGNED(gfn, 1 << order) && (gfn + (1 << order)) <= end) {
> > +			level = order_to_level(order);
> > +			npages = 1 << order;
> > +		}
> 
> That's a wee bit obtuse.
> 
> First of all, I assume that the 'RFC' is because of these TODOs and they
> won't survive to the point when you ask for this to be merged.

Yes, will make sure to have all the TODOs in the tree addressed before
dropping the RFC tag.

> 
> BTW, what keeps the restrictedmem_get_page() offset and the gfn aligned?

I don't think anything enforces that currently, but there is a TODO in the
UPM v10 patchset to enforce that:

  https://github.com/AMDESE/linux/commit/5c86db7f98701f614c48946b733f2542c962f139#diff-e7514a224c92c2e47224f99919405a37ee7edc4612953135229cfb6e07a680d8R2131

So currently this patch relies on the following:

 - the fact that the memslot alignment/sizes for a standard x86 guest's
   associated memory regions are 2M-aligned, so when they are bound to a
   restrictedmem FD they are naturally packed in at restrictedmem offsets
   that are also 2M-aligned. But of course we can't assume userspace will
   live up to this assumption, so we need the above TODO in KVM to enforce
   this when registering new memslots.

 - that restrictedmem/shmem will ensure that THPs are only allocated for
   restrictedmem offsets that are 2M-aligned. I think this enforcement
   happens in shmem_alloc_hugefolio().

which both seem to hold in testing. But it's probably a good idea to add
an explicit check for this (sketched below), at least until KVM implements
something to enforce this earlier in the guest life-cycle.
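
A minimal version of such a check could look like this (a sketch under the
assumptions above; the parameter names mirror the restricted_offset /
guest_phys_addr fields from the UPM patches, but this is illustrative rather
than the actual enforcement planned in the UPM TODO):

  /*
   * Sketch of an explicit congruence check: a memslot's guest physical base
   * and its restrictedmem offset must be congruent modulo the 2M hugepage
   * size, otherwise a 2M-aligned THP in the backing file would not
   * correspond to a 2M-aligned GPA range, and the hugepage RMPUPDATE path
   * discussed here would be unsafe.
   */
  static bool restrictedmem_offset_is_hugepage_safe(u64 guest_phys_addr,
                                                    u64 restricted_offset)
  {
          return ((guest_phys_addr ^ restricted_offset) & (PMD_SIZE - 1)) == 0;
  }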

> 
> Let's start with this:
> 
> > +static inline u8 order_to_level(int order)
> > +{
> > +	BUILD_BUG_ON(KVM_MAX_HUGEPAGE_LEVEL > PG_LEVEL_1G);
> > +
> > +	if (order >= KVM_HPAGE_GFN_SHIFT(PG_LEVEL_1G))
> > +		return PG_LEVEL_1G;
> > +
> > +	if (order >= KVM_HPAGE_GFN_SHIFT(PG_LEVEL_2M))
> > +		return PG_LEVEL_2M;
> > +
> > +	return PG_LEVEL_4K;
> > +}
> 
> Right now, 'order' comes only from restrictedmem_get_page(), which I dug
> out of:
> 
> > https://github.com/mdroth/linux/commit/c6792672cd11737fd255dff10b2d5b6bccc626a8
> 
> That order is *only* filled in by THPs.  That makes the PG_LEVEL_1G
> stuff here kinda silly.  I guess it might be seen as thorough, but it's
> dead code.  I'd probably just make this work on order==9 || order==0 and
> warn on anything else.

Ok, makes sense.

> 
> I'd also highly recommend some comments about how racy this all is.  I
> guess it probably works, but it would be good to add some comments about
> page splits and collapsing.

Collapsing while in this code path should be OK, since the 4K sub-pages will
just end up getting mapped as 4K in the RMP table. The KVM MMU will then map
them into the nested page table as 4K as well and we'll get non-optimal
performance, but things should otherwise work.

Splitting is a similar story: if we map as 2M in the RMP table, and then
afterward the page gets split, the KVM MMU at fault time would map
the pages in the NPT as 4K, and when the guest attempts to access
private pages of this sort they'll generate a nested page fault with
PFERR_GUEST_RMP_BIT and PFERR_GUEST_SIZEM_BIT set, and the code in
handle_rmp_page_fault() will issue a PSMASH instruction to split the
2M RMP entry into 512 4K RMP entries.

Will add some comments around this.
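
A minimal sketch of that split path, assuming the psmash()/snp_lookup_rmpentry()
helpers introduced earlier in this series (their exact signatures are
approximations here, and this is not a copy of the actual
handle_rmp_page_fault() implementation):

  /* Sketch only: split a 2M RMP entry when the NPT mapping is now 4K. */
  static int rmp_split_on_size_mismatch(kvm_pfn_t pfn)
  {
          int level;

          if (snp_lookup_rmpentry(pfn, &level) < 0)
                  return -EINVAL;

          /* Already 4K in the RMP table: nothing to split, let the guest retry. */
          if (level == PG_LEVEL_4K)
                  return 0;

          /* PSMASH the 2M-aligned RMP entry into 512 4K entries. */
          return psmash(pfn & ~(KVM_PAGES_PER_HPAGE(PG_LEVEL_2M) - 1));
  }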

> 
> It's also not obvious why this only cares about private pages.

Mainly because the shared memory that actually get mapped into the guest
is always shared in the RMP table. It is normal VMA memory that is not
allocated by UPM/restrictedmem. We will never attempt to make them
private, so there is never a need to bother with switching them back to
shared.

So we only need to handle RMP updates for the UPM/restrictedmem PFNs.
Obviously for shared->private conversion before mapping it into the
guest, but also for private->shared conversion since we will still
get RMP check failures if we try to leave the PFNs as private in the
RMP table and map the above-mentioned VMA memory into the guest
instead.

Will add some more comments around this.

> 
> Anyway, this is the exact kind of thing where I really like a
> well-commented helper:
> 
> bool can_install_large_rmp_entry(gfn, order)
> {
> 	// small pages, blah blah
> 	if (!order)
> 		return false;
> 
> 	// The region being updated must be aligned
> 	if (!IS_ALIGNED(gfn, 1 << order))
> 		return false;
> 	// ... and fit
> 	if (gfn + (1 << order)) > end)
> 		return false;
> 
> 	return true;	
> }
> 
> Which gets used like this:
> 
> 	if (op == SNP_PAGE_STATE_PRIVATE &&
> 	    can_install_large_rmp_entry(gfn, order)) {
> 		level = ...
> 	}

Makes sense, will implement something along these lines.
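
E.g. the call site would then end up looking something like this (sketch
only, with the WARN covering the "only order 0 or 9 expected" point from
earlier):

	level = PG_LEVEL_4K;
	if (op == SNP_PAGE_STATE_PRIVATE &&
	    can_install_large_rmp_entry(gfn, order)) {
		/* Only THP-sized (order 9) large entries are expected here. */
		WARN_ON_ONCE(order != KVM_HPAGE_GFN_SHIFT(PG_LEVEL_2M));
		level = PG_LEVEL_2M;
	}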

Thanks!

-Mike


^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH RFC v8 17/56] x86/fault: Add support to handle the RMP fault for user address
  2023-03-28 23:31     ` Michael Roth
@ 2023-04-11 18:27       ` Dave Hansen
  0 siblings, 0 replies; 147+ messages in thread
From: Dave Hansen @ 2023-04-11 18:27 UTC (permalink / raw)
  To: Michael Roth
  Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	Brijesh Singh, Jarkko Sakkinen

On 3/28/23 16:31, Michael Roth wrote:
> However...
> 
> The fact that any pages potentially triggering these #PFs are able to be
> mapped as 2M in the first place means that all the PFNs covered by that
> 2M mapping must also been allocated by via mappable/VMA memory rather
> than via restricted memfd where userspace mappings are not possible.
> 
> So I think we should be able to drop this patch entirely, as well as
> allow the use of HugeTLBFS for non-restricted memfd memory (though
> eventually the guest will switch all its memory to private/restricted
> so not gaining much there other than reducing management complexity).

This is sounding a bit voodoo-ish to me.

This whole series seems to be predicated on having its memory supplied via
one very specific ABI with very specific behavior.

That connection and the associated contract aren't spelled out very
clearly in this series.  I'm sure it works on your machine and is clear
to _you_, but I'm worried that nobody else is going to be able to figure
out the voodoo.

Could we make sure that this stuff is made very clear in the
Documentation and cover letter, please?

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH RFC v8 45/56] KVM: SVM: Provide support for SNP_GUEST_REQUEST NAE event
  2023-02-20 18:38 ` [PATCH RFC v8 45/56] KVM: SVM: Provide support for SNP_GUEST_REQUEST NAE event Michael Roth
  2023-02-24 11:01   ` Alexander Graf
  2023-02-28 19:34   ` Zhi Wang
@ 2023-04-17 13:05   ` Alexander Graf
  2 siblings, 0 replies; 147+ messages in thread
From: Alexander Graf @ 2023-04-17 13:05 UTC (permalink / raw)
  To: Michael Roth, kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	Brijesh Singh, Rapan, Sabin


On 20.02.23 19:38, Michael Roth wrote:

> From: Brijesh Singh <brijesh.singh@amd.com>
>
> Version 2 of the GHCB specification added support for two SNP Guest
> Request Message NAE events. These events allow an SEV-SNP guest to
> make requests to the SEV-SNP firmware through the hypervisor using
> the SNP_GUEST_REQUEST API defined in the SEV-SNP firmware
> specification.
>
> SNP_EXT_GUEST_REQUEST is similar to SNP_GUEST_REQUEST, with the
> difference being an additional certificate blob that can be passed
> through the SNP_SET_CONFIG ioctl defined in the CCP driver. The CCP
> driver provides snp_guest_ext_guest_request(), which is used by KVM
> to get both the report and certificate data at once.
>
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> Signed-off-by: Michael Roth <michael.roth@amd.com>
> ---
>   arch/x86/kvm/svm/sev.c | 185 +++++++++++++++++++++++++++++++++++++++--
>   arch/x86/kvm/svm/svm.h |   2 +
>   2 files changed, 181 insertions(+), 6 deletions(-)
>
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 197b1f904567..92179614102e 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -327,6 +327,7 @@ static int sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp)
>                  if (ret)
>                          goto e_free;
>
> +               mutex_init(&sev->guest_req_lock);
>                  ret = sev_snp_init(&argp->error, false);
>          } else {
>                  ret = sev_platform_init(&argp->error);
> @@ -2059,23 +2060,34 @@ int sev_vm_move_enc_context_from(struct kvm *kvm, unsigned int source_fd)
>    */
>   static void *snp_context_create(struct kvm *kvm, struct kvm_sev_cmd *argp)
>   {
> +       struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
>          struct sev_data_snp_addr data = {};
> -       void *context;
> +       void *context, *certs_data;
>          int rc;
>
> +       /* Allocate memory used for the certs data in SNP guest request */
> +       certs_data = kzalloc(SEV_FW_BLOB_MAX_SIZE, GFP_KERNEL_ACCOUNT);
> +       if (!certs_data)
> +               return NULL;
> +
>          /* Allocate memory for context page */
>          context = snp_alloc_firmware_page(GFP_KERNEL_ACCOUNT);
>          if (!context)
> -               return NULL;
> +               goto e_free;
>
>          data.gctx_paddr = __psp_pa(context);
>          rc = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_GCTX_CREATE, &data, &argp->error);
> -       if (rc) {
> -               snp_free_firmware_page(context);
> -               return NULL;
> -       }
> +       if (rc)
> +               goto e_free;
> +
> +       sev->snp_certs_data = certs_data;
>
>          return context;
> +
> +e_free:
> +       snp_free_firmware_page(context);
> +       kfree(certs_data);
> +       return NULL;
>   }
>
>   static int snp_bind_asid(struct kvm *kvm, int *error)
> @@ -2693,6 +2705,8 @@ static int snp_decommission_context(struct kvm *kvm)
>          snp_free_firmware_page(sev->snp_context);
>          sev->snp_context = NULL;
>
> +       kfree(sev->snp_certs_data);
> +
>          return 0;
>   }
>
> @@ -3153,6 +3167,8 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
>          case SVM_VMGEXIT_UNSUPPORTED_EVENT:
>          case SVM_VMGEXIT_HV_FEATURES:
>          case SVM_VMGEXIT_PSC:
> +       case SVM_VMGEXIT_GUEST_REQUEST:
> +       case SVM_VMGEXIT_EXT_GUEST_REQUEST:
>                  break;
>          default:
>                  reason = GHCB_ERR_INVALID_EVENT;
> @@ -3384,6 +3400,149 @@ static int snp_complete_psc(struct kvm_vcpu *vcpu)
>          return 1;
>   }
>
> +static unsigned long snp_setup_guest_buf(struct vcpu_svm *svm,
> +                                        struct sev_data_snp_guest_request *data,
> +                                        gpa_t req_gpa, gpa_t resp_gpa)
> +{
> +       struct kvm_vcpu *vcpu = &svm->vcpu;
> +       struct kvm *kvm = vcpu->kvm;
> +       kvm_pfn_t req_pfn, resp_pfn;
> +       struct kvm_sev_info *sev;
> +
> +       sev = &to_kvm_svm(kvm)->sev_info;
> +
> +       if (!IS_ALIGNED(req_gpa, PAGE_SIZE) || !IS_ALIGNED(resp_gpa, PAGE_SIZE))
> +               return SEV_RET_INVALID_PARAM;
> +
> +       req_pfn = gfn_to_pfn(kvm, gpa_to_gfn(req_gpa));
> +       if (is_error_noslot_pfn(req_pfn))
> +               return SEV_RET_INVALID_ADDRESS;
> +
> +       resp_pfn = gfn_to_pfn(kvm, gpa_to_gfn(resp_gpa));
> +       if (is_error_noslot_pfn(resp_pfn))
> +               return SEV_RET_INVALID_ADDRESS;
> +
> +       if (rmp_make_private(resp_pfn, 0, PG_LEVEL_4K, 0, true))
> +               return SEV_RET_INVALID_ADDRESS;
> +
> +       data->gctx_paddr = __psp_pa(sev->snp_context);
> +       data->req_paddr = __sme_set(req_pfn << PAGE_SHIFT);
> +       data->res_paddr = __sme_set(resp_pfn << PAGE_SHIFT);
> +
> +       return 0;
> +}
> +
> +static void snp_cleanup_guest_buf(struct sev_data_snp_guest_request *data, unsigned long *rc)
> +{
> +       u64 pfn = __sme_clr(data->res_paddr) >> PAGE_SHIFT;
> +       int ret;
> +
> +       ret = snp_page_reclaim(pfn);
> +       if (ret)
> +               *rc = SEV_RET_INVALID_ADDRESS;
> +
> +       ret = rmp_make_shared(pfn, PG_LEVEL_4K);
> +       if (ret)
> +               *rc = SEV_RET_INVALID_ADDRESS;
> +}
> +
> +static void snp_handle_guest_request(struct vcpu_svm *svm, gpa_t req_gpa, gpa_t resp_gpa)
> +{
> +       struct sev_data_snp_guest_request data = {0};
> +       struct kvm_vcpu *vcpu = &svm->vcpu;
> +       struct kvm *kvm = vcpu->kvm;
> +       struct kvm_sev_info *sev;
> +       unsigned long rc;
> +       int err;
> +
> +       if (!sev_snp_guest(vcpu->kvm)) {
> +               rc = SEV_RET_INVALID_GUEST;
> +               goto e_fail;
> +       }
> +
> +       sev = &to_kvm_svm(kvm)->sev_info;
> +
> +       mutex_lock(&sev->guest_req_lock);
> +
> +       rc = snp_setup_guest_buf(svm, &data, req_gpa, resp_gpa);
> +       if (rc)
> +               goto unlock;
> +
> +       rc = sev_issue_cmd(kvm, SEV_CMD_SNP_GUEST_REQUEST, &data, &err);
> +       if (rc)
> +               /* use the firmware error code */
> +               rc = err;


There are cases where sev_issue_cmd can fail, but not set err. For 
example, when the file descriptor is incorrect. In that case, this code 
path leaks uninitialized state from the host stack (err) all the way 
into a guest.
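
Something along these lines would avoid that (just a sketch; the choice
of SEV_RET_INVALID_PARAM as the fallback is arbitrary):

	int err = 0;

	...

	rc = sev_issue_cmd(kvm, SEV_CMD_SNP_GUEST_REQUEST, &data, &err);
	if (rc)
		/* only trust the firmware error code if one was actually set */
		rc = err ? err : SEV_RET_INVALID_PARAM;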


Alex







^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH RFC v8 20/56] crypto:ccp: Define the SEV-SNP commands
  2023-02-20 18:38 ` [PATCH RFC v8 20/56] crypto:ccp: Define the SEV-SNP commands Michael Roth
@ 2023-04-17 14:54   ` Sabin Rapan
  0 siblings, 0 replies; 147+ messages in thread
From: Sabin Rapan @ 2023-04-17 14:54 UTC (permalink / raw)
  To: Michael Roth, kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	Brijesh Singh



On 20.02.2023 20:38, Michael Roth wrote:
> From: Brijesh Singh <brijesh.singh@amd.com>
> 
> AMD introduced the next generation of SEV called SEV-SNP (Secure Nested
> Paging). SEV-SNP builds upon existing SEV and SEV-ES functionality
> while adding new hardware security protection.
> 
> Define the commands and structures used to communicate with the AMD-SP
> when creating and managing the SEV-SNP guests. The SEV-SNP firmware spec
> is available at developer.amd.com/sev.
> 
> Co-developed-by: Ashish Kalra <ashish.kalra@amd.com>
> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> Signed-off-by: Michael Roth <michael.roth@amd.com>
> ---
>  drivers/crypto/ccp/sev-dev.c |  16 +++
>  include/linux/psp-sev.h      | 247 +++++++++++++++++++++++++++++++++++
>  include/uapi/linux/psp-sev.h |  44 +++++++
>  3 files changed, 307 insertions(+)
> 
> diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
> index 06fc7156c04f..9d84720a41d7 100644
> --- a/drivers/crypto/ccp/sev-dev.c
> +++ b/drivers/crypto/ccp/sev-dev.c
> @@ -126,6 +126,8 @@ static int sev_cmd_buffer_len(int cmd)
>  	switch (cmd) {
>  	case SEV_CMD_INIT:			return sizeof(struct sev_data_init);
>  	case SEV_CMD_INIT_EX:                   return sizeof(struct sev_data_init_ex);
> +	case SEV_CMD_SNP_SHUTDOWN_EX:		return sizeof(struct sev_data_snp_shutdown_ex);
> +	case SEV_CMD_SNP_INIT_EX:		return sizeof(struct sev_data_snp_init_ex);
>  	case SEV_CMD_PLATFORM_STATUS:		return sizeof(struct sev_user_data_status);
>  	case SEV_CMD_PEK_CSR:			return sizeof(struct sev_data_pek_csr);
>  	case SEV_CMD_PEK_CERT_IMPORT:		return sizeof(struct sev_data_pek_cert_import);
> @@ -154,6 +156,20 @@ static int sev_cmd_buffer_len(int cmd)
>  	case SEV_CMD_GET_ID:			return sizeof(struct sev_data_get_id);
>  	case SEV_CMD_ATTESTATION_REPORT:	return sizeof(struct sev_data_attestation_report);
>  	case SEV_CMD_SEND_CANCEL:		return sizeof(struct sev_data_send_cancel);
> +	case SEV_CMD_SNP_GCTX_CREATE:		return sizeof(struct sev_data_snp_addr);
> +	case SEV_CMD_SNP_LAUNCH_START:		return sizeof(struct sev_data_snp_launch_start);
> +	case SEV_CMD_SNP_LAUNCH_UPDATE:		return sizeof(struct sev_data_snp_launch_update);
> +	case SEV_CMD_SNP_ACTIVATE:		return sizeof(struct sev_data_snp_activate);
> +	case SEV_CMD_SNP_DECOMMISSION:		return sizeof(struct sev_data_snp_addr);
> +	case SEV_CMD_SNP_PAGE_RECLAIM:		return sizeof(struct sev_data_snp_page_reclaim);
> +	case SEV_CMD_SNP_GUEST_STATUS:		return sizeof(struct sev_data_snp_guest_status);
> +	case SEV_CMD_SNP_LAUNCH_FINISH:		return sizeof(struct sev_data_snp_launch_finish);
> +	case SEV_CMD_SNP_DBG_DECRYPT:		return sizeof(struct sev_data_snp_dbg);
> +	case SEV_CMD_SNP_DBG_ENCRYPT:		return sizeof(struct sev_data_snp_dbg);
> +	case SEV_CMD_SNP_PAGE_UNSMASH:		return sizeof(struct sev_data_snp_page_unsmash);
> +	case SEV_CMD_SNP_PLATFORM_STATUS:	return sizeof(struct sev_data_snp_addr);
> +	case SEV_CMD_SNP_GUEST_REQUEST:		return sizeof(struct sev_data_snp_guest_request);
> +	case SEV_CMD_SNP_CONFIG:		return sizeof(struct sev_user_data_snp_config);

This needs SEV_CMD_SNP_DOWNLOAD_FIRMWARE_EX, SEV_CMD_SNP_COMMIT and
SEV_CMD_SNP_VLEK_LOAD from 1.54 ABI release.
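
I.e. something like the following (the struct names here are just
placeholders; the corresponding 1.54 command buffer definitions would
need to be added alongside):

	case SEV_CMD_SNP_DOWNLOAD_FIRMWARE_EX:	return sizeof(struct sev_data_snp_download_firmware_ex);
	case SEV_CMD_SNP_COMMIT:		return sizeof(struct sev_data_snp_commit);
	case SEV_CMD_SNP_VLEK_LOAD:		return sizeof(struct sev_user_data_snp_vlek_load);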

>  	default:				return 0;
>  	}
>  
> +/**
> + * struct sev_user_data_snp_status - SNP status
> + *
> + * @major: API major version
> + * @minor: API minor version
> + * @state: current platform state
> + * @build: firmware build id for the API version
> + * @guest_count: the number of guest currently managed by the firmware
> + * @tcb_version: current TCB version
> + */
> +struct sev_user_data_snp_status {
> +	__u8 api_major;		/* Out */
> +	__u8 api_minor;		/* Out */
> +	__u8 state;		/* Out */
> +	__u8 rsvd;
> +	__u32 build_id;		/* Out */
> +	__u32 rsvd1;
> +	__u32 guest_count;	/* Out */
> +	__u64 tcb_version;	/* Out */
> +	__u64 rsvd2;
> +} __packed;

Could you please update this to 1.54 ABI version?
Should include something along these lines:

diff --git a/include/uapi/linux/psp-sev.h b/include/uapi/linux/psp-sev.h
index 60e7a8d1a18e..e9ebd24ef085 100644
--- a/include/uapi/linux/psp-sev.h
+++ b/include/uapi/linux/psp-sev.h
@@ -163,20 +163,29 @@ struct sev_user_data_get_id2 {
  * @major: API major version
  * @minor: API minor version
  * @state: current platform state
+ * @is_rmp_initialized: whether RMP is initialized or not
  * @build: firmware build id for the API version
+ * @mask_chip_id: whether chip id is present in attestation reports or not
+ * @mask_chip_key: whether attestation reports are signed or not
+ * @vlek_en: VLEK hashstick is loaded
  * @guest_count: the number of guest currently managed by the firmware
- * @tcb_version: current TCB version
+ * @current_tcb_version: current TCB version
+ * @reported_tcb_version: reported TCB version
  */
 struct sev_user_data_snp_status {
-       __u8 api_major;         /* Out */
-       __u8 api_minor;         /* Out */
-       __u8 state;             /* Out */
-       __u8 rsvd;
-       __u32 build_id;         /* Out */
-       __u32 rsvd1;
-       __u32 guest_count;      /* Out */
-       __u64 tcb_version;      /* Out */
-       __u64 rsvd2;
+       __u8 api_major;             /* Out */
+       __u8 api_minor;             /* Out */
+       __u8 state;                 /* Out */
+       __u8 is_rmp_initialized:1;  /* Out */
+       __u8 rsvd:7;
+       __u32 build_id;             /* Out */
+       __u32 mask_chip_id:1;       /* Out */
+       __u32 mask_chip_key:1;      /* Out */
+       __u32 vlek_en:1;            /* Out */
+       __u32 rsvd1:29;
+       __u32 guest_count;          /* Out */
+       __u64 current_tcb_version;  /* Out */
+       __u64 reported_tcb_version; /* Out */
 } __packed;

 /*




^ permalink raw reply related	[flat|nested] 147+ messages in thread

* Re: [PATCH RFC v8 12/56] x86/sev: Add RMP entry lookup helpers
  2023-03-29 22:59     ` Michael Roth
@ 2023-04-20 16:31       ` Vlastimil Babka
  0 siblings, 0 replies; 147+ messages in thread
From: Vlastimil Babka @ 2023-04-20 16:31 UTC (permalink / raw)
  To: Michael Roth
  Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, kirill, ak,
	tony.luck, marcorr, sathyanarayanan.kuppuswamy, alpergun,
	dgilbert, jarkko, ashish.kalra, nikunj.dadhania, Brijesh Singh

On 3/30/23 00:59, Michael Roth wrote:
> On Fri, Mar 03, 2023 at 04:28:39PM +0100, Vlastimil Babka wrote:
>> On 2/20/23 19:38, Michael Roth wrote:
>> > From: Brijesh Singh <brijesh.singh@amd.com>
>> > 
>> > The snp_lookup_page_in_rmptable() can be used by the host to read the RMP
>> > entry for a given page. The RMP entry format is documented in AMD PPR, see
>> > https://bugzilla.kernel.org/attachment.cgi?id=296015.
>> > 
>> > Co-developed-by: Ashish Kalra <ashish.kalra@amd.com>
>> > Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
>> > Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
>> > Signed-off-by: Michael Roth <michael.roth@amd.com>
>> > ---
>> 
>> > +/*
>> > + * Return 1 if the RMP entry is assigned, 0 if it exists but is not assigned,
>> > + * and -errno if there is no corresponding RMP entry.
>> > + */
>> 
>> Hmm IMHO the kernel's idiomatic way is to return 0 on "success" and I'd
>> assume the more intuitive expectation of success here if the entry is
>> assigned?
> 
> In general I'd agree. Here it's a little awkward though.
> snp_lookup_rmpentry() sort of wants to be a bool, where true indicates
> an assigned entry was found, false indicates no assigned entry.
> 
> But it also has to deal with error values, so the most direct way to
> encapsulate that is true == 1, false == 0, and < 0 for errors.
> 
> Inverting it to align more with kernel expectations of 0 == success/true
> gets awkward too, because stuff like:
> 
>   if (snp_lookup_rmpentry(...))
>     //error
> 
> still doesn't work the way most other functions written in this way
> would, since it could still be "successful" if we were expecting the PFN
> to be in the shared state. So the return value needs special handling
> there too.
> 
> Would it make sense to define it something like this?:
> 
>   /*
>    * Query information about the RMP entry corresponding to the given
>    * PFN.
>    * 
>    * Returns 0 on success, and -errno if there was a problem accessing
>    * the RMP entry.
>    */
>   int snp_lookup_rmpentry(u64 pfn, int *level, bool *assigned)

Yeah that looks fine to me. Hope you find that it makes things easier to
work with in the callers too.
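
E.g. a typical caller could then be written as (just a sketch):

	bool assigned;
	int level, ret;

	ret = snp_lookup_rmpentry(pfn, &level, &assigned);
	if (ret)
		return ret;	/* problem accessing the RMP entry */

	if (!assigned) {
		pr_warn("RMP entry for pfn 0x%llx unexpectedly not assigned\n", pfn);
		return -EINVAL;
	}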

> 
>> The various callers seem to differ though so I guess it depends on
>> context. Some however don't distinguish their "failure" from an ERR and
>> maybe they should, at least for the purposes of the various printks?
> 
> Yes, regardless of what we decide above, the call-sites should properly
> distinguish between failure/assigned/not-assigned and report the
> information accordingly. I'll get those fixed up where needed.

Great, thanks!

> Thanks,
> 
> -Mike
> 
>> 
>> > +int snp_lookup_rmpentry(u64 pfn, int *level)
>> > +{
>> > +	struct rmpentry *e;
>> > +
>> > +	e = __snp_lookup_rmpentry(pfn, level);
>> > +	if (IS_ERR(e))
>> > +		return PTR_ERR(e);
>> > +
>> > +	return !!rmpentry_assigned(e);
>> > +}
>> > +EXPORT_SYMBOL_GPL(snp_lookup_rmpentry);
>> 


^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH RFC v8 34/56] KVM: SVM: Add KVM_SEV_SNP_LAUNCH_START command
  2023-02-20 18:38 ` [PATCH RFC v8 34/56] KVM: SVM: Add KVM_SEV_SNP_LAUNCH_START command Michael Roth
  2023-02-23 21:41   ` Zhi Wang
@ 2023-04-26 17:06   ` Sabin Rapan
  2023-04-26 18:02     ` Tom Lendacky
  1 sibling, 1 reply; 147+ messages in thread
From: Sabin Rapan @ 2023-04-26 17:06 UTC (permalink / raw)
  To: Michael Roth, kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
	kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
	alpergun, dgilbert, jarkko, ashish.kalra, nikunj.dadhania,
	Brijesh Singh



On 20.02.2023 20:38, Michael Roth wrote:
> From: Brijesh Singh <brijesh.singh@amd.com>
> 
>  
> +static int snp_decommission_context(struct kvm *kvm)
> +{
> +	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> +	struct sev_data_snp_addr data = {};
> +	int ret;
> +
> +	/* If context is not created then do nothing */
> +	if (!sev->snp_context)
> +		return 0;
> +
> +	data.gctx_paddr = __sme_pa(sev->snp_context);
> +	ret = sev_do_cmd(SEV_CMD_SNP_DECOMMISSION, &data, NULL);
> +	if (WARN_ONCE(ret, "failed to release guest context"))
> +		return ret;
> +
> +	/* free the context page now */
> +	snp_free_firmware_page(sev->snp_context);
> +	sev->snp_context = NULL;
> +
> +	return 0;
> +}
> +

Even though it's not documented, SNP_DECOMMISSION seems to clear the
WBINVD indicator just like DEACTIVATE does for SEV.
Won't ASID recycling race with SNP_DECOMMISSION if the latter isn't
guarded with sev_deactivate_lock?


>  void sev_vm_destroy(struct kvm *kvm)
>  {
>  	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> @@ -2333,7 +2440,15 @@ void sev_vm_destroy(struct kvm *kvm)
>  		}
>  	}
>  
> -	sev_unbind_asid(kvm, sev->handle);
> +	if (sev_snp_guest(kvm)) {
> +		if (snp_decommission_context(kvm)) {
> +			WARN_ONCE(1, "Failed to free SNP guest context, leaking asid!\n");
> +			return;
> +		}
> +	} else {
> +		sev_unbind_asid(kvm, sev->handle);
> +	}
> +
>  	sev_asid_free(sev);
>  }
>  




^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH RFC v8 34/56] KVM: SVM: Add KVM_SEV_SNP_LAUNCH_START command
  2023-04-26 17:06   ` Sabin Rapan
@ 2023-04-26 18:02     ` Tom Lendacky
  0 siblings, 0 replies; 147+ messages in thread
From: Tom Lendacky @ 2023-04-26 18:02 UTC (permalink / raw)
  To: Sabin Rapan, Michael Roth, kvm
  Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
	mingo, jroedel, hpa, ardb, pbonzini, seanjc, vkuznets, jmattson,
	luto, dave.hansen, slp, pgonda, peterz, srinivas.pandruvada,
	rientjes, dovmurik, tobin, bp, vbabka, kirill, ak, tony.luck,
	marcorr, sathyanarayanan.kuppuswamy, alpergun, dgilbert, jarkko,
	ashish.kalra, nikunj.dadhania, Brijesh Singh

On 4/26/23 12:06, Sabin Rapan wrote:
> On 20.02.2023 20:38, Michael Roth wrote:
>> From: Brijesh Singh <brijesh.singh@amd.com>
>>
>>   
>> +static int snp_decommission_context(struct kvm *kvm)
>> +{
>> +	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
>> +	struct sev_data_snp_addr data = {};
>> +	int ret;
>> +
>> +	/* If context is not created then do nothing */
>> +	if (!sev->snp_context)
>> +		return 0;
>> +
>> +	data.gctx_paddr = __sme_pa(sev->snp_context);
>> +	ret = sev_do_cmd(SEV_CMD_SNP_DECOMMISSION, &data, NULL);
>> +	if (WARN_ONCE(ret, "failed to release guest context"))
>> +		return ret;
>> +
>> +	/* free the context page now */
>> +	snp_free_firmware_page(sev->snp_context);
>> +	sev->snp_context = NULL;
>> +
>> +	return 0;
>> +}
>> +
> 
> Even though it's not documented, SNP_DECOMMISSION seems to clear the
> WBINVD indicator just like DEACTIVATE does for SEV.
> Won't ASID recycling race with SNP_DECOMMISSION if the latter isn't
> guarded with sev_deactivate_lock?

Good catch, yes, this needs to use the sev_deactivate_lock around the 
DECOMMISSION command.
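
I.e. roughly (sketch; whether the read side, as used for DEACTIVATE, or
the write side is required depends on the interaction with the DF_FLUSH
path):

	data.gctx_paddr = __sme_pa(sev->snp_context);

	down_read(&sev_deactivate_lock);
	ret = sev_do_cmd(SEV_CMD_SNP_DECOMMISSION, &data, NULL);
	up_read(&sev_deactivate_lock);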

Thanks,
Tom

> 
> 
>>   void sev_vm_destroy(struct kvm *kvm)
>>   {
>>   	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
>> @@ -2333,7 +2440,15 @@ void sev_vm_destroy(struct kvm *kvm)
>>   		}
>>   	}
>>   
>> -	sev_unbind_asid(kvm, sev->handle);
>> +	if (sev_snp_guest(kvm)) {
>> +		if (snp_decommission_context(kvm)) {
>> +			WARN_ONCE(1, "Failed to free SNP guest context, leaking asid!\n");
>> +			return;
>> +		}
>> +	} else {
>> +		sev_unbind_asid(kvm, sev->handle);
>> +	}
>> +
>>   	sev_asid_free(sev);
>>   }
>>   
> 
> 
> 

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH RFC v8 00/56] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support
  2023-02-20 18:37 [PATCH RFC v8 00/56] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
                   ` (56 preceding siblings ...)
  2023-03-01 16:56 ` [PATCH RFC v8 00/56] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Dave Hansen
@ 2023-08-03 18:27 ` Schander, Johanna 'Mimoja' Amelie
  2023-08-04  1:01   ` Kalra, Ashish
  57 siblings, 1 reply; 147+ messages in thread
From: Schander, Johanna 'Mimoja' Amelie @ 2023-08-03 18:27 UTC (permalink / raw)
  To: Graf (AWS), Alexander, michael.roth, mimoja
  Cc: tglx, srinivas.pandruvada, tobin, kvm, alpergun, jmattson, luto,
	ak, slp, dovmurik, pbonzini, rientjes, peterz, linux-kernel,
	pgonda, thomas.lendacky, tony.luck, x86, bp, dgilbert, seanjc,
	vkuznets, marcorr, vbabka, ashish.kalra, linux-coco,
	nikunj.dadhania, hpa, mingo, sathyanarayanan.kuppuswamy, jroedel,
	kirill, jarkko, ardb, linux-crypto, linux-mm, dave.hansen

We discovered that the kdump crashkernel would not work with our
SEV-SNP configuration.

After reading the device table from the previous kernel we would
see a lot of
  AMD-Vi: Completion-Wait loop timed out
errors and finally crash:
  Kernel panic - not syncing: timer doesn't work through Interrupt-remapped IO-APIC

We found that disabling SNP in the outgoing (crashing) kernel
would enable the crashkernel to take over the IOMMU config and
boot from there.

We opened a PR over on github against the rfc-v9 branch to discuss
the issue:
https://github.com/AMDESE/linux/pull/5

Cheers Johanna










^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH RFC v8 00/56] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support
  2023-08-03 18:27 ` Schander, Johanna 'Mimoja' Amelie
@ 2023-08-04  1:01   ` Kalra, Ashish
  0 siblings, 0 replies; 147+ messages in thread
From: Kalra, Ashish @ 2023-08-04  1:01 UTC (permalink / raw)
  To: Schander, Johanna 'Mimoja' Amelie, Graf (AWS),
	Alexander, michael.roth, mimoja
  Cc: tglx, srinivas.pandruvada, tobin, kvm, alpergun, jmattson, luto,
	ak, slp, dovmurik, pbonzini, rientjes, peterz, linux-kernel,
	pgonda, thomas.lendacky, tony.luck, x86, bp, dgilbert, seanjc,
	vkuznets, marcorr, vbabka, linux-coco, nikunj.dadhania, hpa,
	mingo, sathyanarayanan.kuppuswamy, jroedel, kirill, jarkko, ardb,
	linux-crypto, linux-mm, dave.hansen

Please look at the response on github and followup discussion.

Thanks,
Ashish

On 8/3/2023 1:27 PM, Schander, Johanna 'Mimoja' Amelie wrote:
> We discovered that the kdump crashkernel would not work with our
> SEV-SNP configuration.
> 
> After reading the device table from the previous kernel we would
> see a lot of
>    AMD-Vi: Completion-Wait loop timed out
> errors and finally crash:
>    Kernel panic - not syncing: timer doesn't work through Interrupt-remapped IO-APIC
> 
> We found that disabling SNP in the outgoing (crashing) kernel
> would enable the crashkernel to take over the IOMMU config and
> boot from there.
> 
> We opened a PR over on github against the rfc-v9 branch to discuss
> the issue:
> https://github.com/AMDESE/linux/pull/5
> 
> Cheers Johanna
> 
> 
> 
> 
> 
> 
> 
> 
> 

^ permalink raw reply	[flat|nested] 147+ messages in thread

end of thread, other threads:[~2023-08-04  1:02 UTC | newest]

Thread overview: 147+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-02-20 18:37 [PATCH RFC v8 00/56] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
2023-02-20 18:37 ` [PATCH RFC v8 01/56] KVM: x86: Add 'fault_is_private' x86 op Michael Roth
2023-03-01 10:25   ` Zhi Wang
2023-03-18  4:51   ` Isaku Yamahata
2023-03-20 17:46     ` Michael Roth
2023-03-18  4:53   ` Isaku Yamahata
2023-02-20 18:37 ` [PATCH RFC v8 02/56] KVM: x86: Add 'update_mem_attr' " Michael Roth
2023-03-18  4:56   ` Isaku Yamahata
2023-03-20 18:05     ` Michael Roth
2023-03-21 11:21       ` Zhi Wang
2023-03-22  1:58         ` Michael Roth
2023-03-23 18:17           ` Zhi Wang
2023-03-28  4:36             ` Michael Roth
2023-03-28 23:00               ` Zhi Wang
2023-03-29 23:50                 ` Michael Roth
2023-02-20 18:37 ` [PATCH RFC v8 03/56] KVM: x86: Add platform hooks for private memory invalidations Michael Roth
2023-03-18  5:13   ` Isaku Yamahata
2023-03-20 18:09     ` Michael Roth
2023-02-20 18:37 ` [PATCH RFC v8 04/56] KVM: Add HVA range operator Michael Roth
2023-02-20 21:37   ` Zhi Wang
2023-03-27  0:34     ` Michael Roth
2023-04-04 14:40       ` Zhi Wang
2023-02-20 18:37 ` [PATCH RFC v8 05/56] KVM: SEV: Require KVM_PROTECTED_VM when AMD_MEM_ENCRYPT is enabled Michael Roth
2023-02-20 18:37 ` [PATCH RFC v8 06/56] KVM: Split out memory attribute xarray updates to helper function Michael Roth
2023-02-20 18:37 ` [PATCH RFC v8 07/56] KVM: SEV: Populate private memory fd during LAUNCH_UPDATE_DATA Michael Roth
2023-02-20 18:37 ` [PATCH RFC v8 08/56] KVM: SEV: Rename sev_{pin,unpin}_memory Michael Roth
2023-03-03 14:00   ` Vlastimil Babka
2023-03-06 11:01     ` Nikunj A. Dadhania
2023-02-20 18:38 ` [PATCH RFC v8 09/56] KVM: SEV: Handle memory backed by restricted memfd Michael Roth
2023-03-03 14:05   ` Vlastimil Babka
2023-03-06 11:03     ` Nikunj A. Dadhania
2023-02-20 18:38 ` [PATCH RFC v8 10/56] x86/cpufeatures: Add SEV-SNP CPU feature Michael Roth
2023-02-21 21:21   ` Sathyanarayanan Kuppuswamy
2023-02-22 23:27     ` Kalra, Ashish
2023-02-20 18:38 ` [PATCH RFC v8 11/56] x86/sev: Add the host SEV-SNP initialization support Michael Roth
2023-02-20 20:12   ` Zhi Wang
2023-02-20 18:38 ` [PATCH RFC v8 12/56] x86/sev: Add RMP entry lookup helpers Michael Roth
2023-03-03 15:28   ` Vlastimil Babka
2023-03-29 22:59     ` Michael Roth
2023-04-20 16:31       ` Vlastimil Babka
2023-02-20 18:38 ` [PATCH RFC v8 13/56] x86/fault: Add helper for dumping RMP entries Michael Roth
2023-02-20 18:38 ` [PATCH RFC v8 14/56] x86/sev: Add helper functions for RMPUPDATE and PSMASH instruction Michael Roth
2023-02-20 18:38 ` [PATCH RFC v8 15/56] x86/sev: Invalidate pages from the direct map when adding them to the RMP table Michael Roth
2023-03-01 12:07   ` Tom Dohrmann
2023-03-01 16:15   ` Dave Hansen
2023-03-28 22:12     ` Michael Roth
2023-02-20 18:38 ` [PATCH RFC v8 16/56] x86/traps: Define RMP violation #PF error code Michael Roth
2023-02-20 18:38 ` [PATCH RFC v8 17/56] x86/fault: Add support to handle the RMP fault for user address Michael Roth
2023-03-01 16:21   ` Dave Hansen
2023-03-28 23:31     ` Michael Roth
2023-04-11 18:27       ` Dave Hansen
2023-03-03 15:31   ` Vlastimil Babka
2023-02-20 18:38 ` [PATCH RFC v8 18/56] x86/fault: fix handle_split_page_fault() to work with memfd backed pages Michael Roth
2023-02-20 19:57   ` Hugh Dickins
2023-02-20 20:31     ` Michael Roth
2023-02-20 18:38 ` [PATCH RFC v8 19/56] x86/fault: Return pfn from dump_pagetable() for SEV-specific fault handling Michael Roth
2023-02-20 21:13   ` Zhi Wang
2023-02-28 10:53   ` Wu Zongyong
2023-02-20 18:38 ` [PATCH RFC v8 20/56] crypto:ccp: Define the SEV-SNP commands Michael Roth
2023-04-17 14:54   ` Sabin Rapan
2023-02-20 18:38 ` [PATCH RFC v8 21/56] crypto: ccp: Add support to initialize the AMD-SP for SEV-SNP Michael Roth
2023-02-20 18:38 ` [PATCH RFC v8 22/56] crypto:ccp: Provide API to issue SEV and SNP commands Michael Roth
2023-02-20 18:38 ` [PATCH RFC v8 23/56] crypto: ccp: Introduce snp leaked pages list Michael Roth
2023-03-03 15:54   ` Vlastimil Babka
2023-02-20 18:38 ` [PATCH RFC v8 24/56] crypto: ccp: Handle the legacy TMR allocation when SNP is enabled Michael Roth
2023-02-21  9:28   ` Zhi Wang
2023-02-21 15:31     ` Kalra, Ashish
2023-02-21 21:15       ` Zhi Wang
2023-02-21 22:06         ` Kalra, Ashish
2023-02-20 18:38 ` [PATCH RFC v8 25/56] crypto: ccp: Handle the legacy SEV command " Michael Roth
2023-02-20 18:38 ` [PATCH RFC v8 26/56] crypto: ccp: Add the SNP_PLATFORM_STATUS command Michael Roth
2023-02-20 18:38 ` [PATCH RFC v8 27/56] crypto: ccp: Add the SNP_{SET,GET}_EXT_CONFIG command Michael Roth
2023-02-22 12:32   ` Zhi Wang
2023-02-22 16:50     ` Tom Lendacky
2023-02-22 22:43     ` Kalra, Ashish
2023-02-23  6:38       ` Zhi Wang
2023-02-23 14:19         ` Tom Lendacky
2023-02-20 18:38 ` [PATCH RFC v8 28/56] crypto: ccp: Provide APIs to query extended attestation report Michael Roth
2023-02-22 20:24   ` Zhi Wang
2023-02-22 22:35     ` Kalra, Ashish
2023-02-23  8:14       ` Zhi Wang
2023-02-20 18:38 ` [PATCH RFC v8 29/56] KVM: SVM: Add support to handle AP reset MSR protocol Michael Roth
2023-02-20 18:38 ` [PATCH RFC v8 30/56] KVM: SVM: Provide the Hypervisor Feature support VMGEXIT Michael Roth
2023-02-20 18:38 ` [PATCH RFC v8 31/56] KVM: SVM: Make AVIC backing, VMSA and VMCB memory allocation SNP safe Michael Roth
2023-02-22 20:42   ` Zhi Wang
2023-02-20 18:38 ` [PATCH RFC v8 32/56] KVM: SVM: Add initial SEV-SNP support Michael Roth
2023-02-23 17:46   ` Zhi Wang
2023-02-20 18:38 ` [PATCH RFC v8 33/56] KVM: SVM: Add KVM_SNP_INIT command Michael Roth
2023-02-20 18:38 ` [PATCH RFC v8 34/56] KVM: SVM: Add KVM_SEV_SNP_LAUNCH_START command Michael Roth
2023-02-23 21:41   ` Zhi Wang
2023-02-24 16:22     ` Tom Lendacky
2023-04-26 17:06   ` Sabin Rapan
2023-04-26 18:02     ` Tom Lendacky
2023-02-20 18:38 ` [PATCH RFC v8 35/56] KVM: SVM: Add KVM_SEV_SNP_LAUNCH_UPDATE command Michael Roth
2023-02-24 11:55   ` Zhi Wang
2023-02-20 18:38 ` [PATCH RFC v8 36/56] KVM: SVM: Add KVM_SEV_SNP_LAUNCH_FINISH command Michael Roth
2023-03-24 14:40   ` Alexander Graf
2023-02-20 18:38 ` [PATCH RFC v8 37/56] KVM: X86: Keep the NPT and RMP page level in sync Michael Roth
2023-02-20 18:38 ` [PATCH RFC v8 38/56] KVM: x86: Define RMP page fault error bits for #NPF Michael Roth
2023-02-20 18:38 ` [PATCH RFC v8 39/56] KVM: SVM: Add support to handle GHCB GPA register VMGEXIT Michael Roth
2023-02-20 18:38 ` [PATCH RFC v8 40/56] KVM: SVM: Add KVM_EXIT_VMGEXIT Michael Roth
2023-02-20 18:38 ` [PATCH RFC v8 41/56] KVM: SVM: Add support to handle MSR based Page State Change VMGEXIT Michael Roth
2023-02-24 15:06   ` Zhi Wang
2023-02-20 18:38 ` [PATCH RFC v8 42/56] KVM: SVM: Add support to handle " Michael Roth
2023-02-20 18:38 ` [PATCH RFC v8 43/56] KVM: x86: Export the kvm_zap_gfn_range() for the SNP use Michael Roth
2023-02-20 18:38 ` [PATCH RFC v8 44/56] KVM: SVM: Add support to handle the RMP nested page fault Michael Roth
2023-02-28 19:11   ` Zhi Wang
2023-02-20 18:38 ` [PATCH RFC v8 45/56] KVM: SVM: Provide support for SNP_GUEST_REQUEST NAE event Michael Roth
2023-02-24 11:01   ` Alexander Graf
2023-02-28 19:34   ` Zhi Wang
2023-04-17 13:05   ` Alexander Graf
2023-02-20 18:38 ` [PATCH RFC v8 46/56] KVM: SVM: Use a VMSA physical address variable for populating VMCB Michael Roth
2023-02-20 18:38 ` [PATCH RFC v8 47/56] KVM: SVM: Support SEV-SNP AP Creation NAE event Michael Roth
2023-02-24 12:37   ` Alexander Graf
2023-02-28 20:47     ` Zhi Wang
2023-03-01 21:14       ` Alexander Graf
2023-04-05  0:54         ` Michael Roth
2023-04-04 22:48     ` Michael Roth
2023-04-05 15:20       ` Tom Lendacky
2023-02-20 18:38 ` [PATCH RFC v8 48/56] KVM: SVM: Add SNP-specific handling for memory attribute updates Michael Roth
2023-03-01 23:37   ` Dave Hansen
2023-04-05 23:48     ` Michael Roth
2023-02-20 18:38 ` [PATCH RFC v8 49/56] KVM: SVM: Implement .fault_is_private callback for SNP Michael Roth
2023-02-20 18:38 ` [PATCH RFC v8 50/56] KVM: SEV: Handle restricted memory invalidations " Michael Roth
2023-03-01 10:41   ` Zhi Wang
2023-02-20 18:38 ` [PATCH RFC v8 51/56] KVM: SVM: Add module parameter to enable the SEV-SNP Michael Roth
2023-03-01 10:45   ` Zhi Wang
2023-02-20 18:38 ` [PATCH RFC v8 52/56] ccp: Add support to decrypt the page Michael Roth
2023-03-01 21:20   ` Zhi Wang
2023-03-02  5:59     ` Dov Murik
2023-03-02 14:33       ` Tom Lendacky
2023-03-02 21:11         ` Dov Murik
2023-02-20 18:38 ` [PATCH RFC v8 53/56] KVM: SVM: Make VMSAVE target area memory allocation SNP safe Michael Roth
2023-03-01 21:23   ` Zhi Wang
2023-02-20 18:38 ` [PATCH RFC v8 54/56] x86/sev: Add KVM commands for instance certs Michael Roth
2023-02-21 12:40   ` Dov Murik
2023-03-02  0:02   ` Zhi Wang
2023-03-02  1:41     ` Dionna Amalie Glaze
2023-03-02 11:27       ` Zhi Wang
2023-03-02 11:34   ` Dov Murik
2023-02-20 18:38 ` [PATCH RFC v8 55/56] x86/sev: Document KVM_SEV_SNP_{G,S}ET_CERTS Michael Roth
2023-02-20 18:38 ` [PATCH RFC v8 56/56] iommu/amd: Add IOMMU_SNP_SHUTDOWN support Michael Roth
2023-03-01 16:56 ` [PATCH RFC v8 00/56] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Dave Hansen
2023-03-01 22:59   ` Zhi Wang
2023-03-01 23:39     ` Dave Hansen
2023-08-03 18:27 ` Schander, Johanna 'Mimoja' Amelie
2023-08-04  1:01   ` Kalra, Ashish

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).